2014/2015 QMeHSS Workshop (5/29/2015)

Dear Colleagues,

We hope that you all are enjoying the extended holiday weekend. The next Workshop on Quantitative Methods in Education, Health and the Social Sciences (QMeHSS) is this Friday, May 29th, from 11:00 a.m. to 12:30 p.m. The workshop will be led by Dr. Robert Mislevy from Educational Testing Service (ETS) and will be held in NORC conference room 344. NORC is located at 1155 E. 60th Street. We look forward to seeing you there for another exciting discussion.


How Advances in Psychology and Technology Challenge Educational Measurement: Statistics Ain’t the Half of It!

Robert J. Mislevy

Frederic M. Lord Chair in Measurement and Statistics

Educational Testing Service


The vast majority of educational assessments over the past century have been conducted under what might be called the standard educational measurement paradigm (SEMP): 

The goal is measuring a “construct.”  The construct is framed in a trait or behaviorist psychological perspective.  A single measure is desired.  Self-contained and independent tasks (usually items) evoke behavior believed to provide evidence about the construct.  An examinee produces a response, which provides an item score. A test score accumulates item scores, as a sum or as an estimate through a model such as item response theory (IRT).
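As a rough illustration of the two ways a test score can accumulate item scores, the sketch below computes a sum score and a maximum-likelihood ability estimate for the same responses. It assumes the Rasch (one-parameter logistic) model as one simple instance of IRT; the function names, the response pattern, and the item difficulties are hypothetical and are not taken from the talk.

```python
import math


def sum_score(responses):
    """Number-correct score: the simple sum of 0/1 item scores."""
    return sum(responses)


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def rasch_mle(responses, difficulties, iterations=25):
    """Newton-Raphson maximum-likelihood estimate of ability (theta),
    treating the item difficulties as known."""
    raw = sum(responses)
    if raw == 0 or raw == len(responses):
        # The likelihood has no finite maximum for these patterns.
        raise ValueError("MLE is not finite for all-0 or all-1 responses")
    theta = 0.0
    for _ in range(iterations):
        probs = [rasch_prob(theta, b) for b in difficulties]
        gradient = raw - sum(probs)                     # d logL / d theta
        information = sum(p * (1 - p) for p in probs)   # -d2 logL / d theta2
        theta += gradient / information
    return theta


# Hypothetical data: five items, three answered correctly.
responses = [1, 1, 0, 1, 0]
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]

print("sum score:", sum_score(responses))
print("Rasch theta estimate:", round(rasch_mle(responses, difficulties), 3))
```

Under the Rasch model the sum score is a sufficient statistic for ability, so the two summaries rank examinees who took the same items identically. Both reduce a response pattern to a single number, which is the "single measure" the paradigm calls for.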

At the Fiftieth Anniversary Celebration of the Psychometric Society, Charles Lewis (1986) noted that “Much of the recent progress in test theory has been made by treating the study of the relationship between responses to a set of test items and a hypothesized trait (or traits) of an individual as a problem of statistical inference.”  Much of this work addressed inference framed within the SEMP.

Both psychology and technology have far outstripped the SEMP and the test theory associated with it.  From psychology we draw concepts for what we want to assess, what evidence might look like, and how we recognize and synthesize it.  With technology we craft situations for examinees to act in, capture aspects of their performance, and identify and synthesize evidence.  Cases in point include game- and simulation-based assessments with continuous activity and massive data streams, and assessments of communicative and collaborative capabilities.

I offer the following propositions that point the way to a suitable statistical methodology:

  • Measurement methodology under the SEMP embodies, in forms that evolved to suit sparse SEMP data, principles of evidentiary reasoning and social values that remain important.
  • These principles can be made explicit, and expressed in forms that guide the design and analysis of assessments beyond the SEMP.
  • Progress requires not only statistical advances but an understanding of design principles for assessment that are at once more abstract, deeper, and capable of finer detail.
  • Regarding language, the words in bold in the definition of the SEMP are not adequate to address the design and analysis challenge we face.
  • Regarding psychology, design and analysis must draw on concepts and methods from situative, sociocognitive, and information-processing psychology.  (“Good practice” under the SEMP actually does this implicitly, within its aegis.)
  • Regarding statistical modeling, we can build on experience from both SEMP and less common forms of assessment; from statistics more broadly construed and evidentiary-reasoning scholarship; and from data-exploration methodologies.
  • Assessment design must jointly address, from the beginning, the assessment purpose(s), argument construction, task design (broadly conceived), identification of evidence, and analytic framework.  (It is not sufficient to “design a great assessment, capture rich data, and throw it over the wall to psychometricians to figure out ‘how to score it.’”)