A Few Additional Points About The IMPACT Study

The recently released study of IMPACT, the teacher evaluation system in the District of Columbia Public Schools (DCPS), has garnered a great deal of attention over the past couple of months (see our post here).

Much of the commentary from the system’s opponents was predictably (and unfairly) dismissive, but I’d like to quickly discuss the reaction from supporters. Some took the opportunity to make grand proclamations about how “IMPACT is working,” and there was a lot of back and forth about the need to ensure that various states’ evaluations are as “rigorous” as IMPACT (as well as skepticism as to whether this is the case).

The claim that this study shows that “IMPACT is working” is somewhat misleading, and the idea that states should now rush to replicate IMPACT is misguided. Both reactions also miss the most important points about the study and what we can learn from its results.

First, to reiterate from our first post about the study, the analysis focuses solely on the teachers who are near the minimally effective (ME) and highly effective (HE) cutoff points. It is not an “overall” assessment of the system, as there is no way to know how teachers who are not close to these thresholds (i.e., the vast majority of teachers) are responding to the system. And improvement among all teachers is an extremely important outcome (as is how the system might affect the teacher labor supply).

(Side note: Given the very weak effects in the first year, I would really like to see this study replicated using 2-3 more years of data before drawing any strong conclusions.)

Moreover, this study does not really speak to the “quality” or “rigor” of the IMPACT scores and ratings. In a sense, it actually assumes that issue away insofar as teacher improvement is gauged in terms of IMPACT scores the following year. Thus, the “improvement” is only real to the degree that IMPACT scores are actually valid measures. Similarly, the idea that the system helps in “getting rid of the worst teachers” depends on the underlying assumption that these “worst teachers” were identified accurately. In reality, these are all somewhat open questions.

Finally, and perhaps most importantly, the whole idea that we should spread IMPACT to the states – and/or that various state systems are not as “rigorous” as IMPACT – is a little odd. This is not only because, of course, policy effects may vary wildly by location/context, but also because this study focuses entirely on incentives and disincentives, with most of the effects occurring at the low end of the distribution (the potential for ME teachers to be dismissed) – a feature that is already central to most states’ new evaluation plans.

Actually, if there’s anything that sets IMPACT apart in terms of design, it is the fact that teachers can be dismissed based on a single year’s rating (i.e., one rating as ineffective). This feature is not examined directly by this study, but, if anything, it seems open to question in light of the findings on teacher improvement. In other words, teachers who are dismissed for receiving ineffective ratings might have actually improved given the chance – we’ll never know (there was never even a pilot year before IMPACT went into force).*

Instead, once again, the best way to view these findings is that they focus on whether teachers responded to the incentives embedded in the IMPACT system. In other words, teachers were responding to the firing/bonus incentives by working to raise their IMPACT scores and, in the case of low-scoring teachers, voluntarily leaving the district. This is important, especially given the poor track record of teacher incentives in the U.S.

It is also important because – and this may sound obvious, but it really does matter – it suggests that many teachers were actually able to affect their scores. Now, the immediate suspicion among some folks might be that there was gaming occurring (e.g., principals giving higher observation scores to teachers in danger), but Dee and Wyckoff check out that possibility to the degree they can, and the findings, though not conclusive, do not provide any particularly strong suggestion of manipulation, at least among ME teachers (there was more cause for potential concern among HE teachers).

An alternative, more positive explanation is that DCPS officials rolled out IMPACT in such a manner that teachers were well-informed about the system’s details, received clear feedback on how to raise their scores, and the scores were sensitive enough to pick up on those efforts. It may also be the case that the support received by ME teachers (e.g., instructional coaches) worked well.

So, overall, although there is good reason for optimism among supporters of new teacher evaluation systems, the results do not justify overzealousness.

- Matt Di Carlo


* In addition, it may be the case that teachers exhibit similarly productive behavioral changes under other, rather different systems in other states and districts. For instance, a previous study in Cincinnati found that teachers improved as a result of evaluations that consisted entirely of observations.