The 5-10 Percent Solution

** Also posted here on “Valerie Strauss’ Answer Sheet” in the Washington Post.

In the world of education policy, the following assertion has become ubiquitous: If we just fire the bottom 5-10 percent of teachers, our test scores will be at the level of the highest-performing nations, such as Finland. Michelle Rhee likes to make this claim. So does Bill Gates.

The source and sole support for this claim is a calculation by economist Eric Hanushek, which he sketches out roughly in a chapter of the edited volume Creating a New Teaching Profession (published by the Urban Institute). The chapter is called "Teacher Deselection" (“deselection” is a polite way of saying “firing”). Hanushek is a respected economist, who has been researching education for over 30 years. He is willing to say some of the things that many other market-based reformers also believe, and say privately, but won’t always admit to in public.

So, would systematically firing large proportions of teachers every year based solely on their students’ test scores improve overall scores over time? Of course it would, at least to some degree. When you repeatedly select (or, in this case, deselect) on a measurable variable, even when the measurement is imperfect, you can usually change that outcome overall.

But anyone who says that firing the bottom 5-10 percent of teachers is all we have to do to boost our scores to Finland-like levels is selling magic beans—and not only because of cross-national poverty differences or the inherent limitations of most tests as valid measures of student learning (we’ll put these very real concerns aside for this post).

Before addressing the argument directly, it bears noting that this policy, even if it went down perfectly, would not be a quick fix. The simulation does not entail a one-time layoff. We would have to fire the “bottom” 5-10 percent of teachers permanently. Then, according to the calculation, and if everything went as planned, it would take around 10 years for U.S. test scores to rise to the level of the world’s higher-performing nations.

It also seems improbable that we could ever legislate, design, and carry out such a policy on a large, nationwide scale, even if it had widespread support (which it doesn’t). Yet that’s what would be needed to produce the promised benefits (again, assuming everything went perfectly).

But what if we could do it? Would it work? As I said, there would almost certainly be some increase in overall test scores, at least in the short term (whether or not that would signal proportional true improvement is a different matter entirely). But would the gains be large and sustained? It's always difficult to project the impact of an untried, drastic intervention like this, but I would argue probably not. In fact, there is a risk that this type of policy would end up hurting overall education performance in the long run, especially in higher-poverty, hard-to-staff schools and districts.

The presumed benefits of this proposal rely on several shaky assumptions, some of which would, if violated, carry negative consequences. One assumption, which I have discussed before, is that the replacement teachers will be of sufficient quality (on the whole) to produce at least average student test score gains. Hanushek’s calculation assumes that the replacements will do so (though, among other things, it’s unclear whether he uses the average gains for a first-year teacher, which are lower).

Currently, around 8-9 percent of teachers leave the profession every year, and this will probably increase as baby boomers retire. Maintaining the deselection might place substantial strain on the labor pool (of course, there would be some overlap – teachers who would be fired under the proposal would have left anyway).

In particular, high-poverty and other hard-to-staff schools—which already have problems finding good new teachers—would have to replace even more teachers every year, while choosing from an ever-narrowing applicant pool (it seems that much of California is in trouble right now). The assumption that the quality of replacements would remain stable is rather unsafe, and the calculation hinges on it.

Moreover, you can bet that many teachers, faced with the annual possibility of being fired based on test scores alone, would be even more likely to switch to higher-performing, lower-poverty schools (and/or schools that didn’t have the layoff policy). This would create additional, disruptive churn, as well as exacerbate the shortage of highly-qualified teachers in poorer schools and districts.

When all is said and done, it’s conceivable that, taking the firings, attrition, and switching into account, the total annual mobility rate for all teachers could approach 25 percent, and it would be much higher in poorer schools and districts (making these students bear a disproportionate burden for this unintended consequence). It’s hard to imagine a public education system that could function effectively under those circumstances, let alone thrive.
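To make the arithmetic behind a figure like this concrete, here is a minimal Python sketch. The attrition and firing rates come from the ranges mentioned above; the overlap share and the school-switching rate are hypothetical values chosen purely for illustration, not figures from Hanushek's chapter or any dataset.

```python
def total_mobility(attrition, fired, overlap, switching):
    """Rough annual share of teachers who leave or change schools.

    attrition: share leaving the profession anyway (the post cites 8-9 percent)
    fired:     share dismissed under the policy (5-10 percent)
    overlap:   ASSUMED share of fired teachers who would have left anyway
    switching: ASSUMED share who move between schools in a given year
    """
    net_fired = fired * (1 - overlap)  # avoid double-counting leavers
    return attrition + net_fired + switching


if __name__ == "__main__":
    rate = total_mobility(attrition=0.085, fired=0.10, overlap=0.2, switching=0.08)
    print(f"Illustrative total mobility: {rate:.1%}")
```

With these inputs the total comes to roughly 24.5 percent. The point is not the specific number, which depends entirely on the assumed overlap and switching rates, but that three moderate-looking rates add up quickly.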

Remember also that a widespread test-based firing policy would almost certainly change the “type” of person who chooses to pursue teaching (or, for that matter, chooses to remain). I find it hard to believe that any top-notch applicant would be attracted to a low-paying profession because of a systematic layoff policy (see here for an alternative view). There’s no way to know, but my guess is that the opposite is true. If so, the policy’s projected benefits would be further mitigated.

The simulation also assumes that all the dismissed teachers would leave the profession permanently. Again, this seems highly unlikely, especially if replacements are in short supply. Rather, I would speculate that a significant proportion of dismissed teachers would get jobs in other districts. In doing so, they would seriously dilute the policy’s effects, while also creating needless turnover for schools.

Then there is the issue of error. Due to the well-known imprecision of value-added models, and the year-to-year fluctuation of teacher effects, many replacement teachers would be no better or worse than the fired teachers would have been (error will be particularly high among newer teachers, due to small samples). There is something unethical about firing people based solely on measures that may be wrong due to nothing more than random statistical error, yet these mistakes would have to be tolerated, as collateral damage, in the name of productivity. But, if the replacement pool runs dry, there would also be practical consequences: we will have fired many solid teachers, whom we might have identified as such with more nuanced measures.
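The effect of measurement error on who gets fired can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reproduction of any value-added model: true teacher effects and measurement noise are both drawn from normal distributions, and the reliability of 0.4 used in the example run is a hypothetical value chosen for illustration.

```python
import random


def simulate_deselection(n_teachers=10000, cut=0.10, reliability=0.4, seed=0):
    """Fire the bottom `cut` share by a noisy score; report who was hit.

    reliability is ASSUMED: var(true effect) / var(observed score).
    """
    rng = random.Random(seed)
    # True effects have variance 1; pick the noise SD to match the reliability.
    noise_sd = ((1 - reliability) / reliability) ** 0.5
    true = [rng.gauss(0, 1) for _ in range(n_teachers)]
    observed = [t + rng.gauss(0, noise_sd) for t in true]

    k = int(n_teachers * cut)
    ids = range(n_teachers)
    fired = set(sorted(ids, key=lambda i: observed[i])[:k])
    truly_bottom = set(sorted(ids, key=lambda i: true[i])[:k])

    return {
        # Fired teachers who really are in the bottom group.
        "share_truly_bottom": len(fired & truly_bottom) / k,
        # Fired teachers whose true effect is above the average of zero.
        "share_fired_above_average": sum(1 for i in fired if true[i] > 0) / k,
    }


if __name__ == "__main__":
    for r in (1.0, 0.4):
        print(r, simulate_deselection(reliability=r))
```

When the measure is perfectly reliable, everyone fired is truly in the bottom group. As reliability falls, a growing share of those dismissed are teachers whose true effect is not in the bottom group at all, which is the "collateral damage" described above.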

Finally, on a similar note, the quality of teachers who constitute the "bottom" 5-10 percent varies by location, and by poverty level (though not drastically). Imposing a widespread dismissal system would therefore result in the deselection of many teachers who would have done quite well in a different school or district. Firing these teachers solely to meet a quota is a harmful practice (again - especially if there are shortages).

In short, this proposal would be slow, risky, and unfair, and it would require us to deliberately engineer test score gains for their own sake, in the most brutal manner possible. I would also argue that it is unlikely to work, at least not to anywhere near the advertised degree.

Is this really our best option?

Hanushek doesn’t think so. Talking about the systematic firings, he notes, “In the long run, it would probably be superior…to develop systems that upgrade the overall effectiveness of teachers." He points out, however, that these efforts have not been successful in the past. But have we really tried?

Instead of trying to fire our way to the high performance of Finland or anywhere else, why not try to emulate the policies that these nations actually employ? It seems very strange to shoot for the achievement levels of these nations by doing the exact opposite of what they do.

In any case, Gates, Rhee, et al. constantly repeat the “fire 5-10 percent” talking point, along with the promise of miracle results, because of its potent political message: all we have to do is fire bad teachers, and everything will be fixed. They use Hanushek’s calculation to provide an empirical basis for this message. They do not, however, seem at all attuned to the fact that the proposal is less an actual policy recommendation than a stylized illustration of the wide variation in teacher effects.

Let’s stick with meaningful conversations about how to identify, improve, and, failing that, remove ineffective teachers. Test-based measures may have a role in the evaluation of both teachers and overall school performance, but not a dominant one, and certainly not an exclusive one.

Systematically firing large numbers of teachers based solely on test scores is an incredibly crude, blunt instrument, fraught with risk. We’re better than that.

But see the recent paper by Figlio/Sass and others showing that teacher value-added scores in Florida and NC aren't that different in high-poverty schools. If the teachers to whom you refer don't know the difference between a simplistic look at test score levels and a value-added system that takes poverty into account, perhaps it would be helpful if people who do know the difference didn't muddy the waters.

Stuart - thank you for the comment, as always.

I am very much aware of that paper. In fact, it is cited in this post, along with the very point that you make: that VA scores are lower in high-poverty schools, but not drastically so. I suppose I might have repeated it, or moved it to the section you quote, but I don’t think I muddied the waters.

Anyway, you're correct that teachers who think they're *guaranteed* to get better VA scores in lower-poverty schools are misinformed, but your claim that "a value-added system takes poverty into account" is true only to a degree (as you know). There are unobserved advantages to working in a lower-poverty school that the models don’t capture (e.g., peer effects, school environment), even if they don’t translate into huge aggregate differences. Consider also that, as Sass et al. find, the returns to experience seem to be stronger in lower-poverty schools.

Finally, I would bet that many movers would be motivated by working conditions, rather than job security (I might have made this more clear). For example, teachers might move to more affluent schools to avoid the exacerbated turnover problem that many high-poverty schools would likely face under this policy.

Thanks again. Please keep reading and commenting.

Thanks for these clarifying remarks. I hadn't seen that you linked to that paper.

I do think that if there's a problem of teachers migrating away from poorer schools under any regime that takes value-added into account, that migration will be mainly the result of teachers failing to understand what value-added really means. Even if researchers can't fully take everything into account, they can select the basis of comparison: to other teachers in the same school or to teachers across a district. It seems intuitive to me that if teachers in a poor school are being compared only to other teachers within the same school, it is much harder to make the excuse that poor value-added scores are due to anything unobservable about the school -- the other teachers to whom you're being compared suffer from the same school-wide obstacles. (Other objections to value-added remain, of course, such as a small n for any given teacher.)

Thanks for the detailed and thoughtful analysis of an idea that usually receives only a superficial glance. I also agree with you, and I think polls support the idea, that work conditions are the #1 motivating factor in staying at a school or leaving.

Stuart, I must take issue with your comment about teacher comparisons within schools. There is a considerable obstacle in the fact that teacher-student assignments are not random. Some VAM research found "false positives" - correlation between fifth-grade teachers and fourth-graders' test scores. I don't think I've ever heard of a school engaging in any truly random assignment unless it was for a study! And as an elementary school parent, I wouldn't want random assignment - I prefer thoughtful assignment. Then as a secondary school teacher, I can tell you that there are HUGE variables among sections of the same course, with students drawn from the same pool. The pushes and pulls on a high school schedule ensure that certain clusters will form and move through their day together. If your class meets at the same time as certain honors or remedial classes, you'll have the contrasting group disproportionately represented in your room. If you teach one high-needs special education student with an instructional aide, there's an extra adult in the room and usually a positive effect. Same student, different time of day, no aide, and the class is harder to teach. I could go on and on.

David,

You’re correct: There’s a pretty solid body of evidence showing that non-economic working conditions – rather than salary or job security – are the primary factor driving mobility decisions.

The characteristics of students seem particularly important. For example, see:
http://edpro.stanford.edu/hanushek/admin/pages/files/uploads/Hanushek+K…

But salary does matter too:
http://faculty.smu.edu/millimet/classes/eco7321/papers/clotfelter%20et%…

Here’s a good review of the retention literature:
http://www.aera.net/uploadedFiles/Publications/Journals/Review_of_Educa…

Thanks for the comment.

Just to be clear, since there appears to be some confusion, nothing in these calculations or in the accompanying article says anything about test-based decision making or firing. Value-added measures do provide information, but nobody advocates making decisions solely on the basis of such scores.

What the article says is that the bottom teachers are harming kids and that we need to find a way to do something about that. The best would be to transform these teachers -- through coaching, professional development, or what have you -- into better teachers. Unfortunately, we have been unable to find a way to do that systematically and consistently.

The continual citation of Finland does not help either. What the Finnish have learned is how to make sure that an ineffective teacher does not remain in the classroom for very long. This is something we have to learn in the U.S.

I also do not understand why the vast majority of hardworking and able teachers are willing to be lumped together with the small number of truly ineffective teachers. It surely is not any confusion about who the ineffective teachers are. Parents, other teachers, and principals do appear to know who the ineffective teachers are.

Developing a good evaluation system for teachers would be a start. Again, we have talked about that for many years, but it has not happened in many districts.

David -- non-random assignment is one of the other potential problems to which I referred, but I don't see how it has anything to do with the problem I was addressing: the allegation that teachers will leave high-poverty schools in droves because they will be afraid of low value-added scores. If teachers in high-poverty schools are compared to other teachers within the same school, then the fact that the school is high-poverty -- in and of itself -- ought to have no effect on the value-added scores.

Mr. Hanushek,

Your comment is much appreciated.

While I understand what you’re saying about the confusion, I do think I characterized your argument in the manner you describe. I pointed out that it wasn’t an actual policy proposal, but rather an illustration. I also noted your position that improvement is the preferable course. If this was not clear enough, I apologize.

Nevertheless, your own words are easily misunderstood. In this chapter, in the front end, you write, “This discussion provides a quantitative statement of one approach to achieving the governors’ (and the nation’s) goals – teacher deselection. Specifically, how much progress in student achievement could be accomplished by instituting a program of removing, or deselecting, the least-effective teachers?” And the approach consists of deselection based entirely on value-added estimates.

This type of statement might be easily interpreted in a manner quite different from your comment. Surely you know how subtlety is lost in our public discourse, and how, taken literally, your calculation represents the intoxicating promise of a “quick fix.” And, indeed, I have heard many people misuse your research to advocate, implicitly or explicitly, for a policy of systematic firing based solely or predominantly on value-added estimates. Perhaps you aren’t aware of how often this happens.

So many people with whom I have spoken were surprised, reading my post, to learn that you favor, albeit with skepticism, improvement over dismissals. Correct that misperception. I realize you’re a researcher and not an advocate, but your voice carries a lot of weight. When you speak to reporters and policymakers, I hope you lead off with the improvement message. I hope you tell them that evaluations and other measures to increase effectiveness should be our priority. For whatever it’s worth, you’d get tremendous support from many people, including many thousands of the great teachers you celebrate.

Thanks again,
Matt

Why, Mr. Hanushek, do "able teachers" not wish to separate themselves from "truly ineffective teachers"? Because there, but for the grace of God, go I. Many of the teachers who have thus far received that unfair label in Los Angeles were actually very good teachers who chose to work with the most challenging students. Teachers like Rigoberto Ruelas, who received this unfair label.

We see through the smokescreen and know that the data is faulty and not a true measure of a teacher's worth. We reject the labels placed on teachers through this faulty measurement. And we will not be divided to facilitate the dismantling of our profession because someone has to stay behind to protect the students from the privatization forces that see both teachers and students as a dollar sign or data point, or in your case, a percentage.

Martha Infante
California Council for the Social Studies Teacher of the Year 2009

Great post, thanks for this. What I can't understand as a psychologist is that the economic models seem to ignore the psychological consequences. As you point out, firing, especially if it is perceived as arbitrary (leaving aside for a moment whether it actually is arbitrary), has an impact on everyone. It changes pedagogy, and narrows curriculum. If people know that the bottom 5-10% will be fired every few years, it will destroy any chemistry that a school needs to thrive.

I also find it obfuscatory for Hanushek to claim this:

What the article says is that the bottom teachers are harming kids and that we need to find a way to do something about that. The best would be to transform these teachers — through coaching, professional development, or what have you — into better teachers. Unfortunately, we have been unable to find a way to do that systematically and consistently.

This takes a tenuous, uncertain relationship (that teachers are the most important in-school factor for predicting growth in student test scores) and assumes that the best way would be somehow to "transform" the teachers themselves. As Dan Willingham points out, this is not an immutable truth of the world, but a fact of our system. If we had a standard curriculum, or more support, or smaller class sizes in general, this might not be the case. "Coaching, professional development, what have you" assumes that his model (of the relative importance of teaching) is set in stone.
Further, as he has in other areas, he puts forth the fiction that "resources don't matter": we've tried professional development, we've tried coaching, we've tried spending more per pupil, and nothing is working, so we should scrap these approaches. Sure, you could point out that we spend more per pupil, but this requires that you ignore the details of how we have done this. DC, for example, mismanaged how it administered its Special Ed programs, vastly inflating its per-pupil costs. Does this mean that since costs per pupil went up, and test scores didn't, resources don't matter?