As described in Part 5 of this blog series, a test is designed for a specific purpose, such as granting certification or placing students in entry-level college courses. In K–12 education, testing aims to learn something specific about students, such as their knowledge of a content area or their yearly progress in learning the curriculum.

A test’s purpose is related to test type. Once the purpose is established, the next step is to design and develop the assessment, ensuring that it accurately measures the construct or content, serves the stated purpose, and is delivered in the appropriate format. This helps ensure that the assessment is valid and reliable so that the results can be used properly. The basic purposes of various educational assessments might be classified into these test types:

  • Diagnostic: determines specific learning difficulties or problems of a single student. Diagnostic tests are usually administered individually.
  • Placement: determines where a student might be placed in an instructional setting or curriculum.
  • Formative: assesses learning as it occurs in the classroom to improve instruction. The purpose is to inform instruction before moving to the next unit. That is, did the students learn the material well enough? If not, teachers will typically reteach the weak areas and test again.
  • Interim/Benchmark: monitors learning progress across larger entities such as a school or district. These assessments provide teachers with periodic results so that instruction can be altered if necessary to better meet the students’ needs. While formative assessments often evaluate recently covered course material, interim assessments cover material taught over weeks or months.
  • Summative/Accountability: classifies or groups students based on the amount or level of content mastery. It is used most often for accountability purposes and is administered at the end of a course of study to determine student proficiency or achievement. Examples include comprehensive assessments administered to all students in the same grade at the end of the school year and end-of-course assessments administered at the end of instruction in specific courses rather than grade levels.

To dispel a common misconception, “local assessment” is not a test type and is not related to a test’s purpose. Rather, local assessment refers to local decisions about selecting and using tests: deciding whether to use a test, determining which test to use (for example, the XYZ test of basic vocabulary), and understanding how to use the results. In the context of today’s politics, local assessment may also refer to tests built locally (for example, teacher-made tests).

To build a reliable and valid test, the purpose must be defined before the test is built, because the purpose determines much of what takes place in the design, development, and administration of the test and in the use of its results.

Furthermore, building a reliable and valid test requires equal parts science and art. The science lies in the sound use of language, grammar, and readability in the items; the rules of thumb for item construction; and the various psychometric indices and decisions used in determining item and test quality. The art lies in understanding the content, the students, and other factors that may affect how students react and respond to the items and to the test as a whole.
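To make the "psychometric indices" part of the science a little more concrete, here is a minimal sketch of two indices that are commonly used in classical test analysis: item difficulty (the proportion of students answering an item correctly) and Cronbach's alpha (an internal-consistency estimate of reliability). These two were chosen for illustration only; this post does not name specific indices, and the function names and the small score matrix below are hypothetical.

```python
from statistics import pvariance


def item_difficulty(responses):
    """Return the proportion of students answering each item correctly.

    `responses` is a list of rows, one per student, with 1 for a correct
    answer and 0 for an incorrect one (an assumed, simplified scoring).
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_students
            for i in range(n_items)]


def cronbach_alpha(responses):
    """Return Cronbach's alpha for a students-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance of totals)
    Undefined when every student has the same total score (zero variance)
    or when the test has only one item.
    """
    n_items = len(responses[0])
    item_vars = [pvariance([row[i] for row in responses])
                 for i in range(n_items)]
    total_var = pvariance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 students (rows) by 4 items (columns).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

print(item_difficulty(responses))            # [0.8, 0.6, 0.4, 0.2]
print(round(cronbach_alpha(responses), 2))   # 0.8
```

With real data, indices like these are only one input; the judgment about whether an item or test is good enough still depends on the purpose of the test.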

Regardless of the test type, constructing a reliable and valid test requires a great deal of planning, effort, skill, knowledge, ability, and teamwork by assessment specialists and psychometricians. A key resource that test developers and users should know is the Standards for Educational and Psychological Testing (AERA, APA, and NCME, 2014). These standards contain three sections divided into 13 chapters that cover everything from test design to responsible use in areas such as education and the workplace.

This set of guidelines explains how to be a good steward of educational assessment and to make good decisions about students, teachers, schools, districts, and states. For example, locally developed tests could be valid for formative testing but would most likely not be highly reliable. An algebra test built by teachers from an uncalibrated, unresearched pool of items would most likely contain items of varying difficulty and quality. Locally developed tests for accountability will most likely be neither valid nor reliable across schools, districts, or states unless those tests are anchored to some national metric.

Remember, a reliable and valid test:

  • is only as good as the purpose for which it was intended
  • requires a great deal of effort that entails language use, readability, standardization, and psychometric soundness
  • serves a single purpose best

Knowing the purpose of the assessment and choosing the appropriate test type will help users make good use of tests and their results. Every test developer and user should ask, “What information do I want from a test, and what decisions do I wish to make using that information?”

It may take more than one test to achieve all the desired outcomes and make all of the decisions a user wishes to make about a student. For example, if a teacher wants to determine where to place a student in an instructional setting, he or she would use a placement test to help make that decision. If a teacher also wants to determine the student’s achievement level after most of the instruction is completed, such as classifying the student as proficient or deficient in the mastery of content, a summative test would be the most appropriate test for this purpose.

A test’s purpose is a critical element in its development and use. When the purpose is not defined ahead of time and the appropriate test type is not chosen, the results are often misused, invalidating the test. The best way to understand the critical nature of the purpose of a test may be the old axiom, “use the right tool for the job.”