
When The Legend Becomes Fact, Print The Fact Sheet

The New Teacher Project (TNTP) just released a "fact sheet" on value-added (VA) analysis. I’m all for efforts to clarify complex topics such as VA, and, without question, there is a great deal of misinformation floating around on this subject, both "pro-" and "anti-."

The fact sheet presents five sets of “myths and facts." Three of the “myths” seem somewhat unnecessary: that there’s no research behind VA; that teachers will be evaluated based solely on test scores; and that VA is useless because it’s not perfect. Almost nobody believes or makes these arguments (at least in my experience). But I guess it never hurts to clarify.

In contrast, the other two are very common arguments, but they are not myths. They are serious issues with concrete policy implications. If there are any myths, they're in the "facts" column.

The first objection, that the models aren’t “fair to teachers who work in high-needs schools, where students tend to lag far behind academically,” is a little confusing. In one sense, it’s correct to point out that value-added models focus on growth, not absolute scores, and teachers aren’t necessarily penalized just because their students “start out” low.

But most of the response to this "myth" addresses a rather different question: whether or not the models can fully account for the many factors out of teachers' hands. TNTP's take is that VA models “control for students’ past academic performance and demographic factors," which, they say, means that teachers “aren’t penalized for the effects of factors beyond their control." Even under ideal circumstances, that's just not accurate.

The evidence they cite is a frequently misinterpreted paper by researchers at Vanderbilt University and the SAS Institute, published in 2004. What the analysis finds is that the results of a specific type of VA model (TVAAS) are not substantially different when variables measuring student characteristics (i.e., free/reduced-price lunch eligibility and race) are added to the models. That model has very extensive data requirements, spanning multiple years (five, in this analysis) and subjects, and the study covers only one specific location (Tennessee).

This does not, however, mean that the TVAAS model (or any other) can account for all the factors that teachers can’t control. For one thing, the free/reduced-price lunch variable is not a very good income proxy. It records little more than whether a student qualifies, yet eligible students vary widely in family circumstances. This is a particular problem in high-poverty areas, where virtually all the students qualify and the variable therefore does almost nothing to distinguish among them.

That paper aside, it's true that students' prior achievement scores account for much of the income-based variation in achievement gains (ironically, prior test scores are probably better at this than free/reduced-price lunch). But not all of poverty's impacts are measurable or observed, and, perhaps more importantly, there are several other potential sources of bias, including the fact that students are not randomly assigned to classrooms (also here). VA scores are also affected by the choice of model, data quality and the test used. And, of course, even if there is no bias at all, many teachers will be “treated unfairly” by simple random error.

These are the important issues, the ones that need discussion. If we're going to use these VA estimates in education policy, we need to at least do it correctly and minimize mistakes. In many places around the nation, this isn't happening (also see Bruce Baker's discussion of growth models). As a result, the number of teachers "penalized" unfairly - whether because they have high-needs students or for other reasons beyond their control - may actually be destructively high. TNTP calls this a "myth." It's not.

The second “myth” they look at is the very common argument that VA scores are too volatile between years to be useful. This too is not a “myth,” but it is indeed an issue that could use some clarifying discussion. TNTP points out that all performance measures fluctuate between years, and that they all entail uncertainty. These are valid points. However, their strongest rebuttal is that “teachers who earn very high value-added scores early in their career rarely go on to earn low scores later, and vice-versa.”

Their “evidence” is an influential 2009 paper by researchers from Florida State University and the RAND Corporation. The analysis focuses on the stability of VA estimates over time. While everyone might have a different definition of “rarely,” it’s safe to say that the word doesn’t quite apply in this case. Across all teachers, for instance, only about 25 to 40 percent of the top-quintile (top 20%) teachers in one year were in the top quintile the next year, while 20 to 30 percent of them ended up in the bottom 40%. Some of this volatility appears to have been a result of “true” improvement or degradation (within-teacher variation), but a very large proportion was due to nothing more than random error.
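The point about random error can be made concrete with a small simulation. This is a hypothetical toy model, not TVAAS and not the model used in the FSU/RAND paper: each teacher gets a fixed “true” effect, and each year's estimate adds independent random noise of the same size as the true differences between teachers (an arbitrary assumption chosen for illustration, not an estimate from any study). Because no teacher actually gets better or worse, every move between quintiles is pure noise. The function names `ranks` and `quintile_persistence` are made up for this example.

```python
import random


def ranks(scores):
    """Return each teacher's rank within a list of scores (0 = lowest)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for rank, i in enumerate(order):
        result[i] = rank
    return result


def quintile_persistence(n_teachers=20000, noise_sd=1.0, seed=1):
    """Simulate two years of value-added estimates for the same teachers.

    Hypothetical model: each teacher's "true" effect is fixed and drawn
    from N(0, 1); each year's estimate adds independent error drawn from
    N(0, noise_sd**2). Nobody's true effect changes between years, so any
    movement between quintiles is caused by random error alone.
    Returns (share of year-1 top-quintile teachers still in the top
    quintile in year 2, share of them who fell to the bottom 40%).
    """
    rng = random.Random(seed)
    true_effect = [rng.gauss(0.0, 1.0) for _ in range(n_teachers)]
    year1 = [t + rng.gauss(0.0, noise_sd) for t in true_effect]
    year2 = [t + rng.gauss(0.0, noise_sd) for t in true_effect]

    r1, r2 = ranks(year1), ranks(year2)
    top_cut = n_teachers - n_teachers // 5     # ranks at or above this: top 20%
    bottom40_cut = 2 * n_teachers // 5         # ranks below this: bottom 40%

    top_in_year1 = [i for i in range(n_teachers) if r1[i] >= top_cut]
    stayed_top = sum(r2[i] >= top_cut for i in top_in_year1)
    fell_to_bottom40 = sum(r2[i] < bottom40_cut for i in top_in_year1)
    n_top = len(top_in_year1)
    return stayed_top / n_top, fell_to_bottom40 / n_top


stay, fall = quintile_persistence()
print(f"Top quintile again in year 2: {stay:.0%}")
print(f"Dropped to the bottom 40% in year 2: {fall:.0%}")
```

Under these assumptions, the share of top-quintile teachers who stay on top should come out below half, and roughly one in seven should land in the bottom 40%, even though nobody's true effectiveness changed. Setting `noise_sd=0.0` makes the two years identical, so every top teacher stays on top; the larger the noise relative to true differences, the closer the “stayed on top” share gets to the 20% you would expect by pure chance.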

The accurate interpretation of this paper is that value-added estimates are, on average, moderately stable from year to year, but that stability improves with multiple years of data and better models (also see here and here for papers reaching similar conclusions). This does not mean that teachers' scores "rarely" change over time, nor does it disprove TNTP's "myth." In fact, the papers' results show that VA estimates from poorly specified models with smaller samples are indeed very unstable, probably to the point of being useless. And, again, since many states and districts are making these poor choices, the instability "myth" is, in many places, very much a reality.
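Why more years of data help can be seen in the same kind of toy model. If each year's estimate is the true effect plus independent noise, averaging several years shrinks the noise variance in proportion to the number of years, while the true effect stays put. The sketch below is hypothetical: `pearson`, `window_stability` and `expected_stability` are illustrative names, and the noise level is an assumption rather than a figure from any of the papers. It compares two non-overlapping one-year windows with two non-overlapping three-year windows.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def window_stability(years, n_teachers=20000, noise_sd=1.0, seed=2):
    """Correlation between two non-overlapping `years`-long averages per teacher.

    Hypothetical model: fixed true effect ~ N(0, 1) plus independent yearly
    error ~ N(0, noise_sd**2). Averaging `years` estimates shrinks the error
    variance to noise_sd**2 / years.
    """
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_teachers):
        true_effect = rng.gauss(0.0, 1.0)
        first.append(sum(true_effect + rng.gauss(0.0, noise_sd)
                         for _ in range(years)) / years)
        second.append(sum(true_effect + rng.gauss(0.0, noise_sd)
                          for _ in range(years)) / years)
    return pearson(first, second)


def expected_stability(years, noise_sd=1.0):
    """Correlation implied by the model: true variance / (true + error variance)."""
    return 1.0 / (1.0 + noise_sd ** 2 / years)


for years in (1, 3):
    print(f"{years}-year windows: simulated r = {window_stability(years):.2f}, "
          f"model-implied r = {expected_stability(years):.2f}")
```

In this model the correlation between windows equals the share of variance that reflects true differences, 1 / (1 + noise_sd**2 / years); with `noise_sd=1.0` it rises from 0.5 for single years to 0.75 for three-year averages, the same answer the Spearman-Brown formula gives. Averaging only addresses random error, though; it does nothing about the bias and model-specification problems discussed earlier.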

Value-added models are sophisticated and have a lot of potential, but we have no idea how they are best used or whether they will work. It is, however, likely that poor models implemented in the wrong way would "penalize" critically large numbers of teachers for reasons beyond their control, as well as generate estimates that are too unstable to be useful for any purpose, even low-stakes decisions. These are not myths; they are serious risks. Given that TNTP is actively involved in redesigning teacher quality policies in dozens of states and large districts, it is somewhat disturbing that they don't seem to know the difference.

- Matt Di Carlo


Mary: I understand your frustration, but I've tried to be very clear that using VA in high-stakes decisions might easily be destructive. Actually, I said so in this very piece (paragraphs 9, 12 and 13), as well as in several previous posts. My overall view is that there is a potentially useful role for these methods in education policy, but that they are being misused in most places.

Stuart: The question is whether anyone says that new evaluations will be 100 percent test scores. You know nobody says this, nor do they say that there is no research behind VA or that systems must be perfect. Like I said in the post, it's fine for TNTP to clarify nonetheless, but let's not accuse people of making arguments they're not making.

Thank you both for your comments,
MD

So, you've gotten to the point where "it is somewhat disturbing" to you that this thing is being done to the nation's schools, in defiance of all reason and decency, but you won't stand all the way up and oppose it. In claiming that the problem with VAM is only "poor models implemented in the wrong way," you place yourself as the data-masters' last line of defense as their credibility crumbles. You have to try to be clearer, Matt. You say it "might easily be destructive." You've played an enabling role, so far, in allowing this random-number-driven decimation machine to be installed in thousands of actual schools, by force of law! The AFT betrayed children, as well as their teachers, in embracing Gates in exchange for a place at his table. Teachers are moving to take our unions back from leader/collaborators like Weingarten (and yourself, so far). I am, again, inviting you to come on over to the actual opposition. Reread your own last paragraph, and realize how disgusted you really are. You can't straddle this question.

What Mary Porter said.

"You know nobody says this, nor do they say that there is no research behind VA or that systems must be perfect." To the contrary, I'm pretty sure that Diane Ravitch has made all three points. I don't have time to dig through her nearly 22,000 tweets, but she exaggerates and simplifies all the time on issues like that. But you're technically right about one thing: rather than saying there's no research behind VAM, she's more likely to say that all the research is against it, which is a closely-related myth. Moreover, if you do a Google search that uses the phrase "only on test scores," you'll find plenty of people arguing that teachers should not be evaluated "only on test scores." So there's definitely a popular perception that evaluation only on test scores is a possibility, even if well-informed people know that no such thing has been contemplated. By the way, TNTP isn't "accusing" anyone by name, so you're basically trying to prove a universal negative (come on, do you really think that no one has ever made the points that they're refuting?)



This web site and the information contained herein are provided as a service to those who are interested in the work of the Albert Shanker Institute (ASI). ASI makes no warranties, either express or implied, concerning the information contained on or linked from The visitor uses the information provided herein at his/her own risk. ASI, its officers, board members, agents, and employees specifically disclaim any and all liability from damages which may result from the utilization of the information provided herein. The content in the Shanker Blog may not necessarily reflect the views or official policy positions of ASI or any related entity or organization.