A test typically has a single specified purpose. In fact, if a test serves more than one purpose, the secondary or tertiary purposes are rarely well served. For example, the single purpose of an English language learner (ELL) assessment is to determine a student’s English language proficiency — it cannot serve diagnostic, instructional support, growth, and classification purposes all at once.

An assessment’s purpose impacts test design, item selection, and even the psychometric methods used in constructing the test. Therefore, the use of test results should always take into account the purpose for which the test was intended in order to ensure a valid use of the assessment results. It is always better to use a test designed to measure what you want to measure rather than making inferences based on a related yet indirect assessment. While a test designed to measure a student’s intelligence may also be related to how the student does on an algebra test — because the innate ability of a student (I.Q., for example) is often related to achievement — an algebra test, rather than an intelligence test, would be a much better indicator of a student’s achievement in algebra since the purpose is direct and much more refined.

What, then, is the purpose of accountability testing? Possibly the best, or at least the most succinct, definition is from Robert Linn (2010[1]):

“The primary goals of test-based educational accountability systems are (1) to increase student achievement and (2) to increase equity in performance among racial-ethnic subpopulations and between students who are poor and their more affluent peers.”

Accountability testing is not aimed at individual students in the way that diagnostic, formative, or other classroom- and instruction-based testing is. Instead, accountability is aimed at systemic change and at evaluating subgroups defined by gender, ethnicity, and socioeconomic status. That is, a local education agency (LEA) would evaluate how well students performed in a specific grade, content area, and subgroup. Should it find a need for improvement, the LEA would take appropriate measures (for example, increased instructional time or a change of curricular materials) to improve that group’s performance in the next academic year.

Today’s No Child Left Behind (NCLB) accountability tests are relatively short: typically 50–75 items to evaluate student achievement across approximately 150 instructional days covering 100–125 academic standards. As a result, these accountability tests are only a snapshot of what students know and can do. The brevity of the tests in use today reflects both their purpose and the constraints on testing time.
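To see why "snapshot" is the right word, here is a rough back-of-the-envelope calculation using only the ranges quoted above. It assumes, purely for simplicity, that items are spread evenly across standards, which a real test blueprint may not do; the function name is made up for this sketch.

```python
# Figures quoted in the text: 50-75 items per test, 100-125 standards per year.
ITEMS_MIN, ITEMS_MAX = 50, 75
STANDARDS_MIN, STANDARDS_MAX = 100, 125


def items_per_standard(items: int, standards: int) -> float:
    """Average number of test items available for each academic standard.

    Assumes an even spread of items across standards (a simplification).
    """
    return items / standards


# Sparsest case: the shortest test spread over the most standards.
lowest = items_per_standard(ITEMS_MIN, STANDARDS_MAX)
# Densest case: the longest test spread over the fewest standards.
highest = items_per_standard(ITEMS_MAX, STANDARDS_MIN)

print(f"Items per standard: {lowest:.2f} to {highest:.2f}")
# prints "Items per standard: 0.40 to 0.75"
```

Under that assumption, even the longest test averages less than one item per standard, which is far too little to say anything reliable about an individual student's grasp of any single standard.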

Given the purpose of understanding group differences in achievement, these tests can serve the stated purpose of accountability well at the system level. However, they will not be useful for pinpointing a specific learning deficiency; another test or series of tests, along with close teacher observation, would be most appropriate for evaluating and identifying specific student needs. Still, because the snapshots are aggregated across all the students in a specific grade within a state, district, or school, stakeholders get a fair picture of overall student achievement, and the test can therefore fulfill the two goals of accountability testing.

As such, the test results are best used as indicators of overall achievement levels. That is, the results can be used to identify districts or schools where achievement is high or low, where subgroup differences are large or small, or where achievement differs greatly across the subject areas assessed under NCLB. The results are a jumping-off point for further analysis, observation, or additional testing to identify specific challenges and opportunities.
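As a minimal sketch of that kind of indicator, the snippet below averages scores by school and subgroup and flags schools where the gap between subgroup averages is large. Everything in it is hypothetical: the scores are made up for illustration, the school and subgroup labels are placeholders, and the threshold is an arbitrary cutoff rather than any published standard.

```python
from collections import defaultdict
from statistics import mean

# Made-up scale scores for illustration only: (school, subgroup, score).
SCORES = [
    ("School A", "group_1", 310),
    ("School A", "group_1", 330),
    ("School A", "group_2", 300),
    ("School A", "group_2", 320),
    ("School B", "group_1", 340),
    ("School B", "group_1", 360),
    ("School B", "group_2", 280),
    ("School B", "group_2", 300),
]

GAP_THRESHOLD = 25  # hypothetical cutoff, not a published standard


def subgroup_means(scores):
    """Average score for each (school, subgroup) pair."""
    grouped = defaultdict(list)
    for school, subgroup, score in scores:
        grouped[(school, subgroup)].append(score)
    return {key: mean(values) for key, values in grouped.items()}


def subgroup_gaps(scores):
    """Largest difference between subgroup averages within each school."""
    by_school = defaultdict(list)
    for (school, _subgroup), avg in subgroup_means(scores).items():
        by_school[school].append(avg)
    return {school: max(avgs) - min(avgs) for school, avgs in by_school.items()}


def flag_schools(scores, threshold=GAP_THRESHOLD):
    """Names of schools whose subgroup gap exceeds the threshold, sorted."""
    return sorted(s for s, gap in subgroup_gaps(scores).items() if gap > threshold)


print(subgroup_gaps(SCORES))  # {'School A': 10, 'School B': 60}
print(flag_schools(SCORES))   # ['School B']
```

A flag here says only that a gap exists, not why. Finding the cause is the job of the follow-up analysis, observation, or additional testing.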

Here’s a common scenario that works the same way. Many cars today have a display that shows the car’s average gas mileage as it is driven. Suppose the driver notices that the average gas mileage is going down. That result is an indicator that something is not quite right. To better understand what might be going on with the car, a trip to the local garage or mechanic is in order. Once there, the car’s computer is accessed, and the various tests and data are analyzed to determine what specifically should be done. The remedy might be new spark plugs, a new fuel filter, or some other adjustment to return the car to its optimal efficiency.

The same thing should happen with accountability test results. If the aggregated snapshots of student achievement indicate that a group of students is lagging behind, further investigation is required, and a remedy must eventually be identified and put into place to improve the learning of that group of students.

The old design axiom “form follows function” holds true for testing as well. We could design a testing program that would meet a number of different purposes, but it would not be a single test. The Partnership for Assessment of Readiness for College and Careers (PARCC) testing program attempted to do more than the previous NCLB tests were designed to do. That is, PARCC designed a system in which periodic testing would sit alongside two separate components for the accountability portion: a section of technology-enhanced and constructed-response items that attempted to assess higher-order thinking skills, and a section of mostly selected-response items that attempted to measure overall student achievement. As a result, the testing time for the PARCC test was significantly longer than for the previous tests, and a backlash ensued.

Accountability testing has a specific purpose. It is somewhat misguided to think that the accountability tests currently in place or being developed can serve all the purposes of the many stakeholders in the educational environment, described in more detail in part three of this blog series. Accountability testing serves as a reliable and hopefully valid indicator of overall student achievement and learning. These tests provide indices for further analyses, observation, and testing to determine the specific goals and remedies for each student or group of students.

[1] Linn, Robert L. (2010). Test-based accountability. The Gordon Commissions: University of Colorado at Boulder Center for Research on Evaluation, Standards and Student Testing.