What is a good test item? What are the characteristics of a test item? There are many approaches to answering these questions, and a good basic one is to look at an item trace. An item trace plots the probability of correctly answering an item at each scale score value, where the scale score is the best available measure of the student’s overall ability.
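Here is a minimal sketch of how an empirical item trace could be computed. The Python function below groups students into scale-score bins and takes the proportion of students in each bin who answered the item correctly. The function name, the 5-point bin width, and the toy data are hypothetical and chosen only for illustration. They are not taken from the graphic.

```python
from collections import defaultdict

def empirical_item_trace(scores, responses, bin_width=5):
    """Return {bin_start: proportion correct} for a single item.

    scores: each student's total test scale score
    responses: 1 if the student answered this item correctly, else 0
    bin_width: width of the scale-score bins (a hypothetical choice)
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for score, right in zip(scores, responses):
        bin_start = (score // bin_width) * bin_width  # lower edge of the bin
        totals[bin_start] += 1
        correct[bin_start] += right
    # Proportion correct in each bin, ordered from low to high scale score
    return {b: correct[b] / totals[b] for b in sorted(totals)}

if __name__ == "__main__":
    # Six made-up students, for illustration only
    print(empirical_item_trace([42, 44, 61, 63, 78, 79], [0, 0, 0, 1, 1, 1]))
```

With these six made-up students, the function returns `{40: 0.0, 60: 0.5, 75: 1.0}`. Binning is only one simple choice. With 50,000 or more examinees, narrower bins or some smoothing would also be reasonable.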
The graphic shows 4 Algebra test items, each taken by over 50,000 high school students. For a low Total Test Scale Score the probability of a correct answer is low. It is zero for Items 1 and 4, which are 1-point Constructed Response (CR) items scored by experts and therefore have no guessing potential. Items 2 and 3 are basic 4-option Multiple Choice (MC) items, and at very low scores they show probabilities of a correct answer in the 0.2 to 0.3 range. At total scale scores of 50 and below they average about 0.25. That is what you would expect on a 4-option MC item if guessing were the answering strategy (1 chance in 4)! On the high side of the scale, the probabilities of a correct answer asymptote, or level off, at 1.0, meaning those students are certain to get the item correct. If an item were too difficult for the population of examinees, its trace would not reach 1.0 even at the highest total test scale score.
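The 0.25 figure is simply 1 divided by the number of options. The sketch below computes that chance level and then checks it with a small seeded simulation. It assumes that guessing students pick uniformly at random among the 4 options, which is a simplification because real guessing is rarely that uniform. The function names and the choice of option 0 as the keyed answer are hypothetical.

```python
import random

def chance_level(n_options):
    """Probability of a correct answer when guessing uniformly among n_options."""
    return 1 / n_options

def simulate_guessing(n_options, n_students, seed=0):
    """Share of simulated students who happen to pick the keyed option.

    Every student picks one of n_options uniformly at random. Option 0 is
    treated as the keyed (correct) answer, which is a hypothetical labelling choice.
    """
    rng = random.Random(seed)
    hits = sum(rng.randrange(n_options) == 0 for _ in range(n_students))
    return hits / n_students

if __name__ == "__main__":
    print(chance_level(4))               # chance level for a 4-option MC item
    print(simulate_guessing(4, 50_000))  # near the chance level; exact value depends on the seed
```

A CR item scored by experts has no option list to guess from, so its floor stays at zero, as Items 1 and 4 show.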
There are some things to note about these curves. First, they are not straight lines, and few item traces are. Second, they slope up at different points on the total test scale score. They follow a pattern well described by Item Response Theory, which is an advanced Psychometric topic to be covered another time. If you assume that simply taking a bunch of items and adding up the number correct for a student gives you an informative scale score, then you are implicitly assuming that every item trace is a straight line like the dashed line in the graph (Linear Assumption).
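The sketch below illustrates the difference between an S-shaped trace and the Linear Assumption. It compares a logistic curve, the kind used in Item Response Theory, with a straight line. Two assumptions apply here. First, formal IRT models place the curve on a latent ability scale rather than directly on the total test scale score. Second, the parameter values a, b, and c are hypothetical and were not estimated from the four Algebra items.

```python
import math

def logistic_trace(score, a, b, c=0.0):
    """Logistic item trace with an optional guessing floor (c = 0 gives the 2-parameter form).

    a: steepness of the curve (called discrimination in IRT)
    b: location of the steepest point (called difficulty in IRT)
    c: lower asymptote from guessing, e.g. about 0.25 for a 4-option MC item
    """
    return c + (1 - c) / (1 + math.exp(-a * (score - b)))

def linear_trace(score, low, high):
    """Straight-line trace implied by the Linear Assumption: 0 at `low`, 1 at `high`."""
    return min(1.0, max(0.0, (score - low) / (high - low)))

if __name__ == "__main__":
    # Hypothetical CR-like item (no guessing floor) centred at 65,
    # compared with a straight line over an assumed 30-90 score range.
    for score in range(30, 91, 10):
        curve = logistic_trace(score, a=0.3, b=65)
        line = linear_trace(score, low=30, high=90)
        print(f"{score:3d}  logistic={curve:.3f}  linear={line:.3f}")
```

Setting c to about 0.25 mimics the floor seen for the MC items, and c = 0 suits the CR items. The straight line rises at the same rate everywhere, whereas the logistic curve stays flat at both ends and rises mostly near b.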
Items differ in their difficulty and measurement precision at different points on the ability scale, which the graphic shows as the total test scale score. An item provides the most information about a student’s ability where its trace is steepest, and the location of that steepest point also defines the item’s difficulty. Consider Item 1 again: its steepest slope is near a total scale score of 65, so it best separates students below a scale score of 65 from those above it. Item 3 differentiates best at about 57, so Item 1 is more difficult than Item 3.
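One way to locate the steepest point of a trace is to estimate the slope at each sampled score with a central difference and keep the score where that slope is largest, as sketched below. The two curves are hypothetical logistic stand-ins centred at the values of 65 and 57 mentioned above for Items 1 and 3, and both use an assumed steepness of a = 0.3. They are not the actual traces from the graphic.

```python
import math

def steepest_point(scores, probs):
    """Return the score at which the trace rises fastest.

    scores must be sorted ascending, and probs[i] is the trace value at scores[i].
    The slope at each interior point is estimated with a central difference.
    The first and last points are skipped because they lack a neighbour on one side.
    """
    best_score, best_slope = None, float("-inf")
    for i in range(1, len(scores) - 1):
        slope = (probs[i + 1] - probs[i - 1]) / (scores[i + 1] - scores[i - 1])
        if slope > best_slope:
            best_score, best_slope = scores[i], slope
    return best_score

def logistic(score, a, b):
    """Hypothetical two-parameter logistic trace used only for this demo."""
    return 1 / (1 + math.exp(-a * (score - b)))

grid = list(range(30, 91))  # assumed range of total test scale scores
item_1 = [logistic(s, a=0.3, b=65) for s in grid]
item_3 = [logistic(s, a=0.3, b=57) for s in grid]

if __name__ == "__main__":
    print(steepest_point(grid, item_1))  # 65
    print(steepest_point(grid, item_3))  # 57
```

For a logistic trace, the steepest point falls exactly at the location parameter b, where the slope equals a/4, or (1 - c) * a / 4 when there is a guessing floor c. This is why the location of the steepest point is read as the item's difficulty. The same function can also be applied to an empirical trace computed from real responses.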
There are many other things that an item trace can tell us about how a test item functions. Those things interest Psychometricians, who model these curves with equations and statistics to help make tests comparable across time and across different test forms. However, even a simple graphic tool like an item trace is useful for learning about and evaluating items for a test.