A performance assessment is a test in which a student performs one or more tasks to demonstrate his or her knowledge, skills, and abilities in a particular area, such as conducting a science experiment. That is, a student must show how to solve a problem using what he or she knows about the assessment prompt. The best performance assessments are authentic, meaning the task is realistic and mirrors something that would be done in the real world, such as applying sales tax when balancing a budget. Performance assessments can be used for both formative and summative purposes, but we’ll focus on summative assessments here.
Like any assessment type, performance assessments have pros and cons. The following points are open to some interpretation, but they illustrate the main advantages and disadvantages of a performance assessment.
• Pro: Educators like this type of assessment because they can see a student’s problem-solving process, which helps them to understand what the student can accomplish based on what he or she has learned.
• Con: Systematically administering and scoring the responses can be unwieldy, both within the classroom and in large-scale assessment. The same item is not appropriate for all students, and students will arrive at a solution using completely different, albeit correct, methods, so scorers need both strong knowledge of the content being assessed and a strong grasp of proper scoring techniques.
• Pro: A teacher administers the item once he or she determines that the student is ready to apply the skills being assessed, so a classroom of students will not necessarily take the performance assessment at the same time. Because not all students are ready at once, being able to assess students individually ensures that each student is assessed when he or she is ready.
• Con: Because there is no set test administration window, testing windows can be 12 months long in some instances, or at least as long as a school year. This is a drawback from a reporting standpoint.
Questar’s goal, and the goal of most assessment vendors, is to create items in which the pros outweigh the cons for all students. Items must have fair content, be accessible to all students, and be extensively reviewed and approved by subject matter experts. Performance assessments are rich sources of information if designed correctly. We must ensure that we follow Universal Design principles when developing the items so that all students can access the content. We must also ensure that extended windows are immaterial, meaning the items should assess all students’ problem-solving ability fairly and accurately regardless of when the items are administered throughout the year. We must be able to score student responses in a way that minimizes the burden on the educator. Finally, and most importantly for the psychometricians, we must make sure the data received from the items can be analyzed using either traditional measurement models (e.g., the Rasch model) or newer, more innovative models (e.g., the multidimensional item response theory (MIRT) model).
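To make the measurement-model idea concrete: under the Rasch model mentioned above, the probability that a student answers an item correctly is a logistic function of the gap between the student’s ability and the item’s difficulty (both on the same logit scale). The sketch below is a minimal illustration of that formula only, not a depiction of any vendor’s actual scoring or calibration pipeline.

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """Rasch (one-parameter logistic) model: probability of a correct
    response for a person with ability `theta` on an item with
    difficulty `b`, i.e. P = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals difficulty, the model predicts a 50% chance
# of a correct response.
print(rasch_probability(0.0, 0.0))  # → 0.5

# Ability well above the item's difficulty pushes the probability
# toward 1; well below pushes it toward 0.
print(rasch_probability(2.0, 0.0))
print(rasch_probability(-2.0, 0.0))
```

Because each student’s response probability depends only on the difference between ability and difficulty, the model supports comparing students who took the item at different times, which is one reason the extended administration windows discussed above can be made workable analytically.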
In the next few posts, we will discuss how to create effective and valid performance assessment items in more detail. We will consider questions such as, “What is the educator trying to learn about the students by including performance assessment items?”
The answer to this question varies by item and by point in time, which means the purpose of the performance assessment must be clearly defined in order to ensure validity. Validity refers to the extent to which a test measures what it is intended to measure, so the purpose of any assessment must be well defined; knowing it will guide your decisions in the rest of the “choose your own adventure” decision matrix. In the “Choose Your Own Adventure” book series, readers reach different endings depending on the options they choose for the main character throughout the book. Similarly, the decisions made using assessment results will affect the outcome and meaning of those results, so having a very clear purpose, or intended outcome, for the assessment will help you when making these decisions.
We will also discuss creating strong items that can be applied to all students, as well as scoring these items. Finally, we will discuss what must be considered when analyzing performance items on a large scale.
– Canda D. Mueller, Ph.D.
Vice President of Assessment Design & Psychometrics