At its core, the purpose of assessment is to help students grow, learn, and achieve. Palomba and Banta’s 1999 work, “Assessment Essentials,” defined assessment as “the systematic collection, review and use of information about educational programs undertaken for the purpose of improving student learning and development.” That being said, formative assessments (the low-stakes questions and tests given throughout a school year) are usually the type of assessment associated with helping students learn. Correctly or incorrectly, high-stakes assessments aren’t typically considered a driver of greater student achievement.

But why is that? Why do we largely disassociate high-stakes, standardized tests from the benefits of helping kids learn? It’s certainly in the charter of large-scale assessment: “[i]nformation from summative assessments,” says a report from the Carnegie Mellon Eberly Center for Teaching Excellence, “can be used formatively when students or faculty use it to guide their efforts and activities in subsequent courses.” Others agree. In a 2009 paper, Johnson and Jenkins wrote: “…summative assessment can serve both as a guide to teaching methods and to improving curriculum to better match the interests and needs of the students.”

So it’s clear that summative assessments should be used for both learning and accountability, even if in practice they aren’t routinely used in both capacities. Using large-scale assessments for accountability alone not only deprives students and teachers of their potential learning value but also reinforces the perception that large-scale summative assessments are only a drain on instructional time.

The question then becomes why large-scale assessments are currently relegated to such a limited role. To understand why, we need to go back to the early 2000s, when No Child Left Behind (NCLB) became law. NCLB emphasized testing for achievement and accountability, setting standards for grade-level performance and requiring the tracking of student and subgroup achievement.

Although NCLB drove valid changes in student testing, it also emphasized accountability, at times to the point where student learning became an afterthought of high-stakes assessments. Most large-scale assessment programs in the early 2000s (notably, from 2001 through about 2006) offered test designs and structures purposefully tailored to the express goal of accountability: tests were monolithic devices given at the end of the school year and composed of fixed forms of closed-response questions that could be scored easily and quickly. Educational assessment companies provided student results to state and federal departments of education but offered those results to districts and teachers almost in passing, in a crude, nearly raw-data form that teachers found difficult to use to direct classroom instruction and curriculum. Teachers were left with little to no professional development or training on how to interpret the data and apply it to their classroom instruction.

Large-scale testing did evolve somewhat over subsequent years as educators attempted to use summative assessments to more directly affect student learning. Assessment providers developed new forms of testing, such as computer-adaptive tests, to better measure student performance and growth and to reduce time spent taking tests. They also added new item types, such as constructed response and early forms of performance tasks, to better gauge higher-order learning. Complementing these front-end changes to tests and test design were changes in how student results were presented: assessment programs and technologies began presenting results in more teacher- and parent-friendly formats. Teachers and district administrators were given drill-down style reporting, allowing them to understand trends and data in new and easier ways. And the assessment industry began to offer services to states, districts, and teachers to help them understand how best to use the data on both broad and individualized levels.

While these changes have been a welcome improvement, they often still fall short of realizing the full potential benefits to students. Summative assessments can still take considerable time away from instruction at the end of each year; their test items often do not probe as deeply into student understanding as they could; and they often require significant effort not only to create and deploy but also to interpret, if the results are to yield meaningful conclusions that can affect students on an individual level.

This is where we are today: at the tail end of the second modern generation of large-scale assessment, with every good intention to provide both accountability and individual student benefit but only a partially fulfilled promise.

That is the industry’s present, and the history that led to it. In Part 2, I will look at what the next generation of large-scale assessment looks like and provide a model to describe leading, third-generation large-scale assessment programs.