Validity and Approaches to Measuring It


Validity is the extent to which a test measures the attribute it was designed to measure. The term is also used for the strength of conclusions or inferences, and it indicates whether the objective of the research has been achieved. Validity depends on the person administering the test, the materials constructed for the test, the correlation between the test and tests done earlier, and the suitability of the person taking the test. Based on these factors, validity can be grouped into content, face, predictive, and concurrent validity.

Another classification divides validity into four groups: internal validity, external validity, construct validity, and conclusion validity. Face validity concerns whether the test appears suitable to both the user and the subject. Content validity is the degree to which the test specification achieves the intended purpose. Predictive validity is the correlation between the test score and a criterion, while concurrent validity shows how the test relates to previous tests that measure the same item (Mangala, 2007). Conclusion validity is determined by looking at the relationship between the outcome of the research and the test carried out. Internal validity shows whether the test carried out actually has an effect on the observed results. External validity refers to the researcher’s ability to generalize the results to other settings or situations. Construct validity shows whether there is a relationship between the test carried out and the actual phenomenon under study; it can be determined by evaluating the usefulness of the test against the phenomenon that the theory predicts (Osborn, n.d.).

The Relationship between Measurement (Internal) Validity and External Validity

Having looked at the various forms of validity, one can conclude that there are two major types: external validity and measurement validity. Measurement is the process of making observations and recording them for the purpose of the research. When taking measurements, the level of measurement must be taken into account; the levels are nominal, ordinal, interval, and ratio. The reliability of the measurement must also be taken into account if credible results are to be expected from the research. There are also categories of measurement to consider: survey research, which involves designing and implementing questionnaires and interviews; scaling, which involves developing and implementing scales; qualitative research, which provides a range of non-numerical methods of measurement; and unobtrusive measures, which provide the researcher with a range of measurement methods that do not interfere with the context of the research (Trochim, 2006).

Sampling is the process of picking a portion of a population to represent the whole population in a study. A sample, therefore, is a small subset of the population chosen for the purpose of the study. The two major classes of sampling techniques are probability and non-probability sampling. Probability sampling can be grouped into simple random, stratified, disproportional, cluster, and systematic random sampling. Simple random sampling involves numbering the members of the population and picking numbers at random, so that every member has an equal chance of being selected. Systematic random sampling is preferred in many studies because the sample is picked at regular intervals, which makes it easy to handle, although it leaves the selection process susceptible to manipulation by other people involved in the research. Stratified sampling involves grouping the objects of study into strata such as age or sex; this method is useful when there is a need to compare the results of various groups. Non-probability sampling techniques include snowball sampling, in which subjects already selected are used to identify others with the same attributes; quota sampling, which is normally done to ensure that the population is well represented, thereby reducing bias; and convenience sampling, which selects places and subjects that are easy to reach and is therefore cheaper and less time-consuming. The other types are consecutive sampling and judgmental sampling, both of which are forms of convenience sampling and are quite cost-effective (Lunsford and Lunsford, 1995).
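The three probability sampling techniques described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function names, the population of 100 made-up records, and the 20 percent sampling fraction are all assumptions chosen for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; each unit is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Pick a random start, then take every k-th unit (k = N // n).

    Assumes n is no larger than the population size.
    """
    rng = random.Random(seed)
    k = len(population) // n          # sampling interval
    start = rng.randrange(k)          # random start inside the first interval
    return [population[start + i * k] for i in range(n)]

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Draw the same fraction at random from every stratum."""
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

# Hypothetical population: 100 people, a quarter of them under 30.
population = [
    {"id": i, "age_group": "under_30" if i % 4 == 0 else "30_plus"}
    for i in range(100)
]

random_ids = [u["id"] for u in simple_random_sample(population, 10, seed=1)]
periodic_ids = [u["id"] for u in systematic_sample(population, 10, seed=1)]
by_age = stratified_sample(population, lambda u: u["age_group"], 0.2, seed=1)
```

The systematic sample makes the trade-off in the text visible: once the starting point is known, every later selection is determined, which is what makes the method easy to handle but also open to manipulation.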

Measurement validity is also referred to as internal validity, and it requires the researcher to ensure that he or she is measuring what was intended. For example, if you conduct research on the mode of transport people prefer but on further investigation realize that you have collected information on the types of cars people like, then your method is invalid. The researcher needs to control the situation when conducting research, because failure to do so means that he or she will not be able to tell whether the results collected are real. A lack of internal validity shows that the researcher has not achieved the objective of the research; this is normally caused by extraneous variables that give a wrong explanation of the outcome of the research.

Construct validity is divided into translation validity and criterion-related validity. Translation validity refers to the degree to which the construct has been accurately translated into an operationalization. Face validity and content validity are its two main types, and they are determined by examining the operationalization and its content to check whether it is a good translation of the construct. Criterion-related validity checks the performance of the operationalization against some criterion. Predictive, concurrent, convergent, and discriminant validity are all types of criterion-related validity; they differ in the criterion used for judgment. Predictive validity assesses the operationalization’s ability to predict something it should theoretically be able to predict. For example, we can assess a learner’s ability in Physics to gauge how he or she is likely to perform in a physics-related career such as electrical engineering. If there is a strong relationship between the two measurements, the researcher can conclude that the measure has predictive validity. Concurrent validity assesses the measure’s ability to distinguish between the groups that it is expected to distinguish. Convergent validity assesses the degree to which the operationalization is similar to others that it is theoretically expected to be similar to; a high correlation is evidence of convergent validity.

Finally, discriminant validity assesses the extent to which the operationalization is not similar to others that it is theoretically expected to differ from. For instance, one can carry out research to show that performance in Mathematics is not correlated with performance in History (Trochim, 2006).
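The correlation evidence discussed above can be illustrated with a short Python sketch. It assumes that the Pearson correlation coefficient is the measure of relationship, which the essay does not specify, and the scores for six learners are invented purely for illustration; they are not research data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(spread_x * spread_y)

# Invented scores for six learners (illustration only).
physics = [55, 62, 70, 78, 85, 91]
engineering_performance = [58, 60, 73, 75, 88, 90]
mathematics = [55, 62, 70, 78, 85, 91]
history = [72, 65, 70, 74, 66, 71]

# A value near 1 would support predictive validity in the Physics example.
predictive_r = pearson_r(physics, engineering_performance)

# A value near 0 would support discriminant validity in the
# Mathematics and History example.
discriminant_r = pearson_r(mathematics, history)
```

With these invented numbers the first coefficient is close to 1 and the second is close to 0, which is the pattern the two examples in the text would lead one to expect.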

External validity refers to how far the results of a study can be generalized to the population from which the sample was drawn. In other words, if the research lacks external validity, its findings cannot be projected to other situations but apply only to the sample itself. The conclusions of a valid study should therefore hold for other researchers and for people in different places and at different times. The two approaches to generalization are the sampling model and the proximal similarity model. In the sampling model, the researcher identifies the population to generalize to, selects a sample from that population, and conducts the research using the sample. The sample is therefore representative of the population, and generalizations can be made from the results obtained. In the proximal similarity model, the researcher considers the various contexts to which generalizations might be made and develops a theory about which contexts are more like the study and which are less similar. Once this framework has been developed, the results of the study are generalized to other people and situations (Trochim, 2006).

How Poor Measurement and Sampling could Affect the Ability to Test a Theory

External validity can be threatened by a wrong generalization about the people, place, or time of the study, because a study done with one group of people will not always match another group. A wrong generalization will therefore expose the research findings to criticism. For instance, a study on behavior may not generalize to all times and regions, since people’s behavior changes with time as well as with place of origin. Internal validity can be compromised by a number of threats, though their effects can be reduced by adding a control group that can be compared with the program group. These threats are classified into single-group and multiple-group threats. Single-group threats include history, where a historical event rather than the treatment given by the researcher decides the outcome; maturation, where normal events such as aging occur over the course of the study; pre-testing; poor instrumentation; and subjects dropping out of the study. The multiple-group threats to internal validity include selection history, which refers to an event occurring between the pretest and the posttest that affects the two groups differently; selection maturation, which occurs when the two groups mature at different rates; selection testing, which occurs when taking the pretest affects the two groups differently; selection mortality, which occurs when subjects drop out of the two groups at different rates; selection instrumentation, which occurs when the implementation of the tests affects the two groups differently; and selection regression, which refers to a scenario where the two groups under assessment move towards the average value at different rates (Wimmer, 2000).

Efforts to Ensure That the Samples and the Measurements are Valid

Both internal and external validity can be achieved in research through a number of adjustments. External validity can be improved by using random samples, since they are less prone to bias; by using a heterogeneous sample, which helps to ensure proper coverage of the population under study; by drawing the sample from the group to which the results are meant to be generalized; and by repeating the research several times until the results are consistent. Internal validity can be improved by adding a control group that can be compared with the program group.
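The role of the control group can be shown with a small Python sketch. It assumes a pretest and posttest design, which the essay does not spell out, and the scores below are invented for illustration. Subtracting the control group's gain from the program group's gain removes the change that both groups would have shown anyway, for example through history or maturation.

```python
def mean(values):
    return sum(values) / len(values)

def mean_gain(pretest, posttest):
    """Average improvement from pretest to posttest within one group."""
    return mean([post - pre for pre, post in zip(pretest, posttest)])

# Invented scores (illustration only): four subjects per group.
program_pre = [50, 55, 60, 65]
program_post = [62, 66, 71, 77]
control_pre = [51, 54, 61, 64]
control_post = [56, 58, 66, 70]

program_gain = mean_gain(program_pre, program_post)
control_gain = mean_gain(control_pre, control_post)

# Gain that the program group shows beyond what the control group shows.
effect_estimate = program_gain - control_gain
```

Without the control group, the whole of the program group's gain would be attributed to the program, including the part that the control group also shows.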


Validity is a very important tool for assessing how successful a study has been, since it helps to ensure that the way the test is implemented, the instrumentation used, and the measurements taken are relevant to the study. External validity gives meaning to the research, since it tests whether the research findings apply to other real issues at hand. External validity can be maximized through efficient sampling and the avoidance of bias in the collection of samples, while internal validity can be optimized through the introduction of control groups so as to avoid the interference caused by extraneous variables. Miller and Kirk explain that a measurement procedure can be said to have instrumental validity when the results it gives are the same as those given by another procedure (Miller and Kirk, 1986, p. 22).


Lunsford, T. R., and Lunsford, B. R. (1995). Research forum: The research sample, part 1: Sampling. Journal of Prosthetics and Orthotics: American Academy of Prosthetics and Orthotics. 2009. Web.

Mangala, S. (2007). Validity of a psychometric test. 2009. Web.

Miller, M. L., and Kirk, J. (1986). Reliability and Validity in Qualitative Research. London: Sage Publications Inc.

Osborn, D. R. (n.d.). Reliability. 2009. Web.

Trochim, W. M. K. (2006). External validity. 2009. Web.
