The terms “reliable” and “valid” come up every day in conversations about assessment, whether in a political interview or in a blog about testing in schools. Unfortunately, reliability and validity have both everyday and technical meanings, and the gap between them causes misunderstanding among politicians, schools, parents, and students. This misunderstanding, in turn, can lead to decisions and conclusions that undermine the effectiveness of testing, such as using test results for purposes they were never intended to serve.

This post is relatively short, but I encourage you to read Questar’s assessment briefs on reliability and validity for more in-depth information on the topics. The goal here is to explain the basics of the technical meanings of reliability and validity so that you can better judge whether these terms are being used correctly.

Let’s begin with these two precepts:

  • A reliable test does not have to be valid.
  • A valid test must also be reliable.

A reliable test is consistent, meaning that a student would obtain the same or a similar score if he or she took the test multiple times. There are many ways to calculate a test’s reliability, but the important point is that reliability is a characteristic of the test itself, and it can exist even when a particular use of the test results is invalid. For example, an algebra test could reliably measure a student’s algebra knowledge but would not be valid for evaluating the student’s ability to do trigonometry.

Validity, however, is not a characteristic of the test but of the use of the test results. In psychometrics, validity is often summarized as the extent to which a test measures what it is intended to measure, but it is not a single piece of information. Rather, it is a body of evidence indicating the purposes for which the test results may be used appropriately and accurately. Therefore, each use of a test requires its own validity argument or study to confirm that the use is, in fact, valid.

An unreliable test produces inconsistent, inaccurate scores, which makes any use of those scores suspect. Thus, a valid test must also be reliable, even though a reliable test may not be valid. Consider a test of students’ numeracy ability that has a high reliability coefficient of .92, which indicates that the test measures numeracy consistently. If the test user wants to use the results to judge whether a student is suited to be an airline pilot, the validity of this use must be established by investigating whether numeracy results are appropriate for judging fitness as an airline pilot. If a study shows that pilots with greater numeracy ability fly planes better because they are more skilled at understanding the plane’s instruments, the use of the numeracy test results would be valid. However, if the study shows no difference in flying ability between pilots with greater and lesser numeracy, the use of the results as a pilot decision tool would be invalid.

In education, using test results for purposes that have not been evaluated is problematic. For example, summative assessments, which are used for accountability purposes, are designed to provide a snapshot of student learning after a period of instruction and to reliably place students into performance levels. These tests were not designed to guide individual instruction on specific lessons, subtopics, or standards, or to address learning disabilities. Nor were they designed for teacher evaluation. Using accountability results for these other purposes would therefore require studies, such as one examining the relationship between student test scores and other measures of teacher effectiveness, to provide evidence of the efficacy and appropriateness of those uses. Misusing test results can lead to incorrect or spurious decisions that harm those involved or, worse, give rise to litigation.

In conclusion, the terms “reliable” and “valid” should be treated with some caution, and this is especially true of validity. The accurate, appropriate use of test results affects students, schools, and the state, so ensuring that a test measures what it was designed to measure is extremely important. To be effective, a test must first be reliable (i.e., provide accurate and consistent scores), and its validity must be evaluated for each decision based on its results (e.g., predicting student success in the next grade).

Technical reports produced for assessment programs provide evidence of reliability and validity. For a psychometrician, stating that an assessment is reliable and valid without this kind of evidence is unethical. For anyone else, using these terms without fully understanding their meaning is misleading and may be harmful.