Starting around 2005 and up until a few years ago, education policy discourse and policymaking were dominated by the issue of improving “teacher quality.” We haven’t heard nearly as much about it over the past couple of years. One of the major reasons is that the vast majority of states have enacted policies ostensibly designed to improve teacher quality.
Thanks in no small part to the Race to the Top grant program, and the subsequent ESEA waiver program, virtually all states reformed their teacher evaluation systems, the “flagship” policy of the teacher quality push. Many of these states also tied their new evaluation results to high-stakes personnel decisions, such as tenure, dismissal, layoffs, and compensation. Predictably, the details of these new systems vary quite a bit, both within and between states. Many advocates are unsatisfied with how the new policies were designed, and one could write a book on all the different issues. Yet it would be tough to deny that this national policy effort was among the fastest shifts in recent educational history, particularly given the controversy surrounding it.
So, what happened to all the attention to teacher quality? It was put into practice. The evidence on its effects is already emerging, but this will take a while, and so it is still a quiet time in teacher quality land, at least compared to the previous 5-7 years. Even so, there are already many lessons out there, too many for a post. Looking back, though, one big picture lesson – and definitely not a new one – is about how the evaluation reform effort stands out (in a very competitive field) for the degree to which it was driven by the promise of immediate, large results.
In short, as we all remember, the promise was that new evaluations, particularly those tied in part to test-based teacher productivity measures, and subsequently linked to personnel decisions (e.g., pay, tenure, dismissal) would generate quick (and large) improvements in testing outcomes for students. These quick and large improvements have not materialized.
Now, let’s be clear about two things. First, there is near consensus that previous evaluation systems were poorly designed and implemented, and (I dare say) wide agreement that improving the recruitment, training, and retention of teachers is at least one of the most effective ways for education policy to generate improvements in student outcomes, even if there is massive disagreement about how to do this. Second, saying that the miracle improvements didn’t occur is not much of a lesson, since anyone with even superficial knowledge of education policy did not expect massive short-term gains, and relatively few promised them explicitly.
Yet there was more than a little bit of implicit promising that the teacher quality reform endeavor would have such dramatic effects. Teacher evaluations, particularly those tied to test-based productivity measures, were pushed forward by a phalanx of out-of-context talking points about the difference between the “top” and “bottom” teachers, and the impact of firing the latter. These talking points were grounded in important and policy-relevant research, and should have been part of the discussion. But they were so ubiquitous less because they imply the potential for evaluation reform to improve student performance (a perfectly defensible, even compelling argument), and more because they promise rapid and dramatic improvement.
Now, to reiterate, there are legitimate debates to be had here about the design of the new evaluation systems, whether and how they are tied to personnel decisions, etc. Yet even the most perfectly designed systems weren’t going to produce huge short-term gains.
We all know this, but it still bears repetition: Real improvement, particularly at any aggregate level, is almost always slow and sustained, even multi-generational.
Yet, when you’re trying to get policy changes done, promising immediate results is hardly unusual. It is quite standard in education (and many other policy fields). Maybe it’s inevitable. There is every incentive to dangle quick results in front of decision makers, because doing so gives you a better shot at advancing your policy preferences. Moreover, it is unusual for the danglers to be held accountable afterward.
But the fact remains that it can be destructive.
Expectations of large, short-term gains make it more likely that effective policies will be abandoned if they do not work quickly enough, or that effective policies will not be enacted if they do not promise short-term gains. Conversely, ineffective policies might be lauded or even replicated for short-term gains that are transitory or not “real” (e.g., due to error or test inflation). And a short-term gain orientation provides a disincentive to invest in high-quality policy evaluation, which takes time. I could go on (for example, I could argue that the short-term gain mindset had some ill effects on the policymaking process).
Hopefully, the debate about the effects of teacher evaluation reform will rely not on raw aggregate test score/rate changes (which are not appropriate for policy evaluation), but rather on high-quality studies designed to isolate causal effects (imperfectly, of course). Even in that case, however, caution is warranted. There is a tendency for people to latch on to the first one or two studies that confirm their predictions, and draw immediate conclusions.
To be clear, these early studies are extremely useful for assessing the initial impact of evaluation reform in specific contexts on specific outcomes (e.g., test results, teacher turnover, etc.). But they cannot by themselves address the most important questions, which are: 1) whether the teacher evaluation reform effort produced persistent improvements in teacher quality (and, thus, student outcomes) in a variety of different contexts; and 2) how it did so.
The latter question, though it gets relatively little attention, is in many respects even more important than the former. The impact of any large-scale policy shift is not uniform. This is always the case, but it is particularly salient in the context of teacher evaluation reform, as the new systems’ designs and implementations were quite different between and even within states (other crucial factors, such as labor supply, also vary between locations). In addition, a huge part of the “improving teacher quality” enterprise hinges on the possibility of attracting new and more effective people into the profession, and retaining effective incumbents. These are not short-term processes. They take time (as does good research).
In reality, even over many years, it is unlikely that researchers will provide a definitive yes/no answer about whether evaluation reform “worked,” even by the standard of testing results. It is not necessarily a binary situation. There will be findings of modest successes in some places, and no effect or maybe even a negative effect elsewhere. Both types of findings are most fruitfully viewed as an opportunity to piece together slowly what worked and what didn’t, and why. That is an annoying, frustrating process. It won’t produce strong political talking points. And it will take years. But the sooner everyone starts treating improvement as a long game, the sooner we will start playing that game.