In August 2015, I attended a National Center for Research on Evaluation, Standards, and Student Testing (CRESST) conference on gaming, simulations, and assessment. However, this is not as new a topic as one might think. In 1998, Dr. Joy McLarty and I presented at a conference in Chicago on simulations for assessing workplace readiness. In 2001, Dr. Randy Bennett of Educational Testing Service (ETS) and I presented on games and simulations at a conference sponsored by the Bill and Melinda Gates Foundation. A number of conferences have dealt with this concept since then, and hundreds of researchers have presented and published on this topic. The question now is, “Why aren’t we using games and simulations to assess students?”

Some reasons include technological change, the difficulty and expense of producing games and simulations, and the measurement challenges of analyzing the data such activities generate; the list goes on and on. Those who have been involved in developing and deploying technology-enhanced items have already seen some of the challenges of using modern technology in accountability testing. Assessment systems cannot fail because the stakes are too high, and when they do fail, they make both local and national headlines. Therefore, any major change to assessments should be handled with great care.

Assessment progress has often been slow. For example, when I was a student in the 1950s it was commonplace to put all the students in the cafeteria to take a standardized test such as the Kuder-Richardson or the Iowa Tests of Basic Skills. In those early days, we did not use pencils but rather marked our answers using a pin with a T-shaped handle to punch a hole in the answer sheet. I don’t have to tell you what young boys were like with those pins in a confined space. The pins were replaced by the no. 2 pencil pretty quickly. Over the course of 30 to 40 years, the testing industry perfected the mode of paper/pencil testing and the optical and image scanning of test documents.

Similarly, it will take time to perfect online administrations that use the increasing capabilities of our digital world. We may not, however, have 30 years to get it right. The expectations for high-stakes accountability testing have never been greater and will only increase.

From a measurement viewpoint, how do we analyze the enormous amount of data contained in a student’s data stream from a simulation or game? Our current measurement models assume that the test is unidimensional, that the items are independent of each other, and that the data are sample independent. Multidimensional item response theory (MIRT) is a newer development, but it is not yet used operationally, and the number of dimensions it can handle is limited. Therefore, as we move to more complex formats such as simulations or games, we will need new methods for evaluating their effectiveness and for scoring tests that contain them. The psychometric challenges are great, but they can be overcome. In the simplest approach, the data stream could be broken down into component parts that are unidimensional and then analyzed as a series of independent events. This would require a very large number of computations and an algorithm to put the parts back together in order to make inferences about the whole simulation or game section. MIRT might be applied in a similar fashion, limiting the number of dimensions in each set of analyses.
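To make the decomposition idea concrete, here is a minimal sketch in Python. It assumes each event in a hypothetical student data stream is tagged with the single dimension it measures; the stream is split into unidimensional slices, each slice is scored on its own, and the pieces are reassembled into one profile. The dimension tags, the smoothed-logit scoring rule, and the sample data are all illustrative assumptions, not an operational psychometric model.

```python
import math
from collections import defaultdict

def split_by_dimension(stream):
    """Group (dimension, correct) events into unidimensional slices."""
    slices = defaultdict(list)
    for dimension, correct in stream:
        slices[dimension].append(correct)
    return slices

def score_slice(responses):
    """Score one unidimensional slice as a logit of the smoothed
    proportion correct (a crude stand-in for an ability estimate)."""
    p = (sum(responses) + 0.5) / (len(responses) + 1.0)
    return math.log(p / (1 - p))

def score_stream(stream):
    """Decompose, score each part independently, then reassemble."""
    return {dim: score_slice(resp)
            for dim, resp in split_by_dimension(stream).items()}

# Illustrative (made-up) event stream from one simulation session:
stream = [("planning", 1), ("planning", 0), ("planning", 1),
          ("computation", 1), ("computation", 1),
          ("collaboration", 0)]
profile = score_stream(stream)
```

In a real system the per-slice scoring would be an IRT or MIRT calibration rather than a proportion-correct logit, and the reassembly step would need a defensible weighting scheme, but the shape of the computation is the same.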

Add to this the validity question of assessing students in ways in which they are not yet being taught. Think about providing a graphing calculator for an algebra test to a student who has never used one. It creates an invalid situation for that student. Or, put another way, it disadvantages students who have not used — or have not been taught to use — graphing calculators.

With all this in mind, one should look at the games and simulations at PBS Kids. Although aimed at younger children, the group produces some very interesting pieces. In addition, groups like GlassLab are building simulations, and games such as SimCityEDU attempt to define the educational outcomes that result from playing them. This points to the need to plan games and simulations carefully with respect to the educational outcomes they are meant to measure. However, such games and simulations will require far more time and expertise to develop than the simple items we work on today.

The bottom line is that a great deal of work remains before we use simulations and games in high-stakes accountability testing. It will require the inclusion of simulations and games in classroom instruction; the meticulous documentation of the learning outcomes to be assessed by games and simulations; new methods of capturing and analyzing student data streams; and new ways of reporting student results that use the enormous amounts of data these types of assessments contain. Yet we should continue to work on this concept, as it holds great promise for assessing students in a more compelling environment and, hopefully, in a more accurate manner as well.