A new working paper, published by the National Bureau of Economic Research, is the first high-quality assessment of one of the new teacher evaluation systems sweeping across the nation. The study, by Thomas Dee and James Wyckoff, both highly respected economists, focuses on the first three years of IMPACT, the evaluation system put into place in the District of Columbia Public Schools in 2009.
Under IMPACT, each teacher receives a point total based on a combination of test-based and non-test-based measures (the formula varies between teachers who are and are not in tested grades/subjects). These point totals are then sorted into one of four categories – highly effective, effective, minimally effective and ineffective. Teachers who receive a highly effective (HE) rating are eligible for salary increases, whereas teachers rated ineffective are dismissed immediately and those receiving minimally effective (ME) for two consecutive years can also be terminated. The design of this study exploits that incentive structure by, put very simply, comparing the teachers who were just above the ME and HE thresholds to those who were just below them, to see whether the two groups differed in terms of retention and performance. The basic idea is that these teachers are all very similar in terms of their measured performance, so any differences in outcomes can be (cautiously) attributed to the system’s incentives.
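For readers who want to see the logic of that comparison spelled out, here is a minimal sketch. It is not Dee and Wyckoff's estimator; it is a plain difference in means within a narrow window around a cutoff, and everything in it is an assumption made for illustration: the function name `local_gap`, the cutoff of 250, the window of 10 points, and the toy numbers, which are made up and are not IMPACT data.

```python
from statistics import mean


def local_gap(records, cutoff, bandwidth):
    """Mean outcome just below the cutoff minus mean outcome just above it.

    records: iterable of (score, outcome) pairs.
    Only scores within `bandwidth` points of the cutoff are compared,
    on the idea that teachers that close together are otherwise similar.
    """
    below = [y for s, y in records if cutoff - bandwidth <= s < cutoff]
    above = [y for s, y in records if cutoff <= s <= cutoff + bandwidth]
    if not below or not above:
        raise ValueError("no observations on one side of the cutoff")
    return mean(below) - mean(above)


# Hypothetical toy data: (rating score this year, score change next year).
# These values are invented for illustration only.
toy = [
    (180, 20),  # far below the cutoff: ignored
    (240, 12), (245, 10), (248, 14),  # just below
    (251, 4), (255, 6), (259, 5),  # just above
    (300, 0),  # far above the cutoff: ignored
]

gap = local_gap(toy, cutoff=250, bandwidth=10)
print(gap)  # 7: the just-below group gained 7 more points on average
```

The teachers far from the cutoff drop out of the comparison entirely, which is also why results from this kind of design speak only to teachers near the thresholds.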
The short answer is that there were meaningful differences.
Teachers with scores just below the ME cutoff were substantially more likely to quit, while those who remained exhibited modest but meaningful relative performance gains the following year (equivalent to about five percentile points). Similarly, teachers who earned an HE rating, and were therefore eligible for a large, permanent salary increase if they did so again, also seemed to improve their performance (though the retention “effects” were not statistically significant).
This is very promising, albeit tentative, evidence that teachers responded to the incentives embedded in the IMPACT system. There are a few additional points that bear brief mention.
This is not an overall assessment of IMPACT. This is an excellent paper by talented, capable researchers. But there are several (mostly standard) caveats, all of which are discussed thoroughly by Dee and Wyckoff. The most important of these is that the estimated effects pertain only to the group of teachers who were near the ME and HE thresholds. Whether IMPACT influenced the performance and retention of all other teachers is not addressed directly by this study. Of course, this is not to say that the findings are unimportant – after all, it is teachers at the low and high ends of the spectrum who are of particular policy significance. Nevertheless, it is important to avoid drawing sweeping conclusions about the “average effect” of this system.
The fact that teachers seem to have responded conflicts with some of the prior research on incentives… It is well known that “traditional” merit pay systems – essentially, paying teachers cash for improving test scores – have a poor track record in the U.S. Several recent studies, including experimental evaluations, have found little to no effect of these programs on performance or retention. A few possible factors, including the dismissal threat and/or the fact that IMPACT is a multi-measure evaluation system, might help explain the discrepancy (the former possibility is particularly compelling given the stronger results for teachers near the ME compared with the HE threshold).
… but it is not particularly surprising. It is not really shocking that the promise of a huge, permanent raise or, especially, the threat of dismissal would influence teachers’ labor market choices and behavior. Of course, when it comes to the estimated performance effects, it is impossible to know what kinds of behavioral changes actually led to the improvement, but the study’s results suggest it wasn’t manipulation (e.g., principal favoritism), as the improvements were not concentrated in just one of IMPACT’s components (though this is somewhat less true among teachers near the HE cutoff). Moreover, it is tough to say whether teacher labor markets in other cities (or, perhaps, in D.C. over the long term) could withstand the cycle of forced and voluntary attrition that stems from systems like IMPACT, or whether and how this kind of system might influence the type of people who pursue a teaching career. In any case, it will be important, going forward, to see whether these findings are confirmed by strong research on other systems.
The estimated effects were mostly concentrated in the second year. It seems that the performance and retention effects didn’t really show up after the first year of IMPACT. Dee and Wyckoff speculate that this may be due to the system having gained “credibility” after its first year. There’s something to this, especially since the second year was the first in which ME teachers might be dismissed (as they had to receive that rating twice in a row). Nevertheless, replicating this study using additional years of data would seem to be an important next step (especially given that the design of IMPACT was changed this year).
What can we conclude from this study? This is just the first round in what will hopefully be a large body of strong evidence on the effects of these new evaluation systems, and the incentives attached to them. The fair (albeit tentative) conclusion from this particular analysis is that IMPACT seems to be having the intended effect of changing behavior, at least in the second year of the data, and among teachers close to the thresholds. That is promising and should not be diminished. On the other hand, there are still many open questions here, and it’s a good idea to keep a level head going forward.
- Matt Di Carlo