So Many Purposes, So Few Tests
In a new NBER working paper, economist Derek Neal makes an important point, one that many people in education recognize but that is infrequently reflected in actual policy: using the same assessment to measure both student and teacher performance often contaminates the results for both purposes.
In fact, as Neal notes, some of the very features required to measure student performance are the ones that make this contamination possible when the tests are used in high-stakes accountability systems. Consider, for example, a state or district that wants to compare the test scores of one year's cohort of fourth graders with those of the next year's. One common means of ensuring this comparability is administering some of the same questions to both groups (or to a “pilot” sample of students prior to those being tested). Otherwise, any difference in scores between the two cohorts might simply be due to differences in the difficulty of the questions, and without a way to check that, it's tough to make meaningful comparisons.
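To make the logic of this common-item design concrete, here is a minimal sketch of the simplest version of the idea, sometimes called mean equating: shift the second cohort's scores by the gap between the two cohorts on the shared anchor questions. Real testing programs use far more sophisticated equating methods (e.g., IRT-based linking), and all names and numbers below are hypothetical.

```python
# Minimal sketch of common-item "mean equating" (hypothetical data;
# real state testing programs use more sophisticated methods).
def mean_equate(anchor_year1, anchor_year2, scores_year2):
    """Shift year-2 scores by the cohorts' gap on shared anchor items.

    If the year-2 cohort does worse on the *same* anchor questions,
    that gap is attributed to cohort/form differences rather than to
    the difficulty of the questions unique to the year-2 test.
    """
    shift = (sum(anchor_year1) / len(anchor_year1)
             - sum(anchor_year2) / len(anchor_year2))
    return [s + shift for s in scores_year2]

# Hypothetical points earned on the shared anchor items (per student)
anchor_y1 = [7.0, 8.0, 7.5]   # year-1 cohort on the anchor items
anchor_y2 = [6.0, 7.0, 6.5]   # year-2 cohort on the same items
raw_y2    = [50.0, 60.0, 70.0]  # year-2 scores on the full test

print(mean_equate(anchor_y1, anchor_y2, raw_y2))  # [51.0, 61.0, 71.0]
```

The key point for the post's argument: this adjustment only works if some questions really are repeated across years, which is exactly what creates the coaching opportunity described next.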
But it’s precisely this need to repeat questions that enables one form of so-called “teaching to the test,” in which administrators and educators use questions from prior assessments to guide their instruction for the current year.
This kind of behavior not only potentially corrupts any use of the results to measure teacher/school performance, but it may also serve to compromise the validity of the assessment as a gauge of student performance – if the students are being “coached” in this fashion, increases in their measured performance may not reflect “true” increases in their knowledge of the subject matter.*
To address this conundrum, Neal recommends using two different assessments – one for measuring student performance and one for measuring teacher performance. He argues that the latter type of test need not meet the requirement of being equated across years or groups of students – e.g., by repeating certain questions or formats in multiple years. As a result, there would be far less opportunity for coaching or other forms of score inflation.
I am not qualified to assess the technical merit of Neal’s “two-test” proposal, but the motivation for presenting it – the inappropriateness of using a single assessment both to measure student performance and to gauge teacher/school effectiveness in high-stakes accountability systems – is a remarkably important point, one that is too often casually dismissed.
Whether we like it or not, the way we are using these tests may compromise their utility, and if we’re going to continue down this road of increasing reliance on their results, it’s a shortcoming that will inevitably have to be addressed.
- Matt Di Carlo
* This is also why the introduction of new tests often results in a sharp drop in scores during the first year, followed by an increase in subsequent years, as educators become familiar with the new content and format.