When The Legend Becomes Fact, Print The Fact Sheet

The New Teacher Project (TNTP) just released a "fact sheet" on value-added (VA) analysis. I’m all for efforts to clarify complex topics such as VA, and, without question, there is a great deal of misinformation floating around on this subject, both "pro-" and "anti-."

The fact sheet presents five sets of “myths and facts." Three of the “myths” seem somewhat unnecessary: that there’s no research behind VA; that teachers will be evaluated based solely on test scores; and that VA is useless because it’s not perfect. Almost nobody believes or makes these arguments (at least in my experience). But I guess it never hurts to clarify.

In contrast, the other two are very common arguments, but they are not myths. They are serious issues with concrete policy implications. If there are any myths, they're in the "facts" column.

The first objection – that the models aren’t “fair to teachers who work in high-needs schools, where students tend to lag far behind academically” - is a little confusing. In one sense, it’s correct to point out that value-added models focus on growth, not absolute scores, and teachers aren’t necessarily penalized just because their students “start out” low.

But most of the response to this "myth" addresses a rather different question - whether or not the models can fully account for the many factors out of teachers' hands. TNTP's take is that VA models “control for students’ past academic performance and demographic factors," which, they say, means that teachers “aren’t penalized for the effects of factors beyond their control." Even under ideal circumstances, that's just not accurate.

The evidence they cite is a frequently-misinterpreted paper by researchers at Vanderbilt University and the SAS Institute, published in 2004. What the analysis finds is that the results of a specific type of VA model (TVAAS) – one with very extensive data requirements, spanning multiple (in this analysis, five) years and subjects, in one specific location (Tennessee) - are not substantially different when variables measuring student characteristics (i.e., free/reduced lunch eligibility and race) are added to the models.

This does not, however, mean that the TVAAS model – or any other – can account for all the factors that teachers can’t control. For one thing, the free/reduced-price lunch variable is not a very good income proxy. Eligible students vary widely in family circumstances, which is a particular problem in high-poverty areas where virtually all the students qualify.

That paper aside, it's true that students' prior achievement scores account for much of the income-based variation in achievement gains (ironically, prior test scores are probably better at this than free/reduced-price lunch). But not all of poverty's impacts are measurable/observed, and, perhaps more importantly, there are several other potential sources of bias, including the fact that students are not randomly assigned to classrooms (also here). VA scores are also affected by the choice of model, data quality and the test used. And, of course, even if there is no bias at all, many teachers will be “treated unfairly” by simple random error.

These are the important issues, the ones that need discussion. If we're going to use these VA estimates in education policy, we need to at least do it correctly and minimize mistakes. In many places around the nation, this isn't happening (also see Bruce Baker's discussion of growth models). As a result, the number of teachers "penalized" unfairly - whether because they have high-needs students or for other reasons beyond their control - may actually be destructively high. TNTP calls this a "myth." It's not.

The second “myth” they look at is the very common argument that VA scores are too volatile between years to be useful. This too is not a “myth," but it is indeed an issue that could use some clarifying discussion. TNTP points out that all performance measures fluctuate between years, and that they all entail uncertainty. These are valid points. However, their strongest rebuttal is that “teachers who earn very high value-added scores early in their career rarely go on to earn low scores later, and vice-versa."

Their “evidence” is an influential paper by researchers from Florida State University and the RAND Corporation (it was published in 2009). The analysis focuses on the stability of VA estimates over time. While everyone might have a different definition of “rarely," it’s safe to say that the word doesn’t quite apply in this case. Across all teachers, for instance, only about 25-40 percent of the top quintile (top 20%) teachers in one year were in the top quintile the next year, while between 20-30 percent of them ended up in the bottom 40%. Some of this volatility appears to have been a result of “true” improvement or degradation (within-teacher variation), but a very large proportion was due to nothing more than random error.

The accurate interpretation of this paper is that value-added estimates are, on average, moderately stable from year-to-year, but that stability improves with multiple years of data and better models (also see here and here for papers reaching similar conclusions). This does not mean that teachers' scores "rarely" change over time, nor does it disprove TNTP's "myth." In fact, the papers' results show that VA estimates from poorly-specified models with smaller samples are indeed very unstable, probably to the point of being useless. And, again, since many states and districts are making these poor choices, the instability "myth" is to some degree very much a reality.
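To make the role of random error concrete, here is a rough simulation sketch. It is not a replication of the paper discussed above, and every number in it is an assumption chosen for illustration: each hypothetical teacher has a fixed "true" effect drawn from a normal distribution, each period's estimate adds independent normal noise, and `noise_sd` and `years_averaged` are made-up parameters rather than values from any real VA model. The point is only that a top-quintile teacher can land in a very different quintile the next period without any real change in performance, and that averaging more years of data reduces this churn.

```python
import random


def quintile_transitions(n_teachers, noise_sd, years_averaged=1, seed=0):
    """Return (share of period-1 top-quintile teachers still in the top 20%
    in period 2, share of them who fall into the bottom 40%)."""
    rng = random.Random(seed)
    # Assumption: each teacher has a stable "true" effect that never changes.
    true_effect = [rng.gauss(0.0, 1.0) for _ in range(n_teachers)]

    def estimate(t):
        # Estimate = true effect + average of independent yearly errors.
        noise = sum(rng.gauss(0.0, noise_sd) for _ in range(years_averaged))
        return t + noise / years_averaged

    period1 = [estimate(t) for t in true_effect]
    period2 = [estimate(t) for t in true_effect]

    def ranks(scores):
        # Position 0 is the lowest score, n_teachers - 1 the highest.
        order = sorted(range(len(scores)), key=lambda i: scores[i])
        position = [0] * len(scores)
        for pos, i in enumerate(order):
            position[i] = pos
        return position

    r1, r2 = ranks(period1), ranks(period2)
    top_cut = n_teachers * 4 // 5      # ranks at or above this are top 20%
    bottom_cut = n_teachers * 2 // 5   # ranks below this are bottom 40%
    top_then = [i for i in range(n_teachers) if r1[i] >= top_cut]
    stay = sum(r2[i] >= top_cut for i in top_then) / len(top_then)
    fall = sum(r2[i] < bottom_cut for i in top_then) / len(top_then)
    return stay, fall


if __name__ == "__main__":
    for years in (1, 3):
        stay, fall = quintile_transitions(5000, noise_sd=2.0, years_averaged=years)
        print(f"{years} year(s) averaged: stay in top 20% = {stay:.2f}, "
              f"fall to bottom 40% = {fall:.2f}")
```

In this sketch all of the movement between quintiles is noise by construction, since the true effects are held fixed. Real data would also include genuine improvement or decline, which the simulation deliberately leaves out.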

Value-added models are sophisticated and have a lot of potential, but we have no idea how they are best used or whether they will work. It is, however, likely that poor models implemented in the wrong way would "penalize" critically large numbers of teachers for reasons beyond their control, as well as generate estimates that are too unstable to be useful for any purpose, even low-stakes decisions. These are not myths; they are serious risks. Given that TNTP is actively involved in redesigning teacher quality policies in dozens of states and large districts, it is somewhat disturbing that they don't seem to know the difference.

- Matt Di Carlo

"that there’s no research behind VA; that teachers will be evaluated based solely on test scores; and that VA is useless because it’s not perfect"

Um, Diane Ravitch and her followers make those three arguments all the time.

The Los Angeles CMOs, with funding from Gates, are attempting to use a VA model which rewards/penalizes teachers for their students' percentile (as opposed to raw) change each year. I tried to point out that this implicitly assumes a zero sum game and would absolutely ensure that every year a large number of teachers would be labelled negatively, no matter how much absolute improvement was made. Let's just say that they didn't want to hear it. At meetings they essentially just trotted out a version of the VA fact sheet from TNTP that you have written out.

"Um, Diane Ravitch and her followers make those three arguments all the time."

The arguments made are that VA research is not as robust or generalizable as it is represented to be. That teachers will be rewarded/punished and labelled by methods that rely too heavily on test scores and poor tests at that, and that VA is not important enough to make the basis of such major education policies/decisions especially without a longer, more open, discussion. In fact, I think you will find Ravitch arguing that data like VA are more useful when they are used as just that, data, instead of as carrots and sticks for an "accountability" culture.

In regard to the “myth” that VA models may be unfair to teachers in high-needs schools, see also this recent study by Newton, Linda Darling-Hammond, Haertel, and Thomas -- http://epaa.asu.edu/ojs/article/view/810.

The authors find that “judgments of teacher effectiveness for a given teacher can vary substantially across statistical models, classes taught, and years. Furthermore, student characteristics can impact teacher rankings, sometimes dramatically, even when such characteristics have been previously controlled statistically in the value-added model. A teacher who teaches less advantaged students in a given course or year typically receives lower effectiveness ratings than the same teacher teaching more advantaged students in a different course or year.”

No, Diane Ravitch "and her followers" never argued that there is no research behind VAM. We argue that the research doesn't support the validity or usefulness of the models, and Diane has cited the research and discussed it in detail. Matt was correct in his assessment that "nobody" is saying there's no research.

It's his third dismissal that needs to be looked at: yes, some people really do have the courage to stand up and say VAM is useless. We don't say "that VA is useless because it’s not perfect." We say it's a useless and defective product because it doesn't churn its bogus "data" into useful information in any sense whatsoever; not for policy, not for 100% of teacher evaluations, not for 40% of teacher evaluations, not to grade schools, and not for policy decisions.

Matt says, "Value-added models are sophisticated and have a lot of potential." What if they don't, and you're afraid to say so out loud? What if it's really a data-industry hoax, like real estate derivatives? Why can't a columnist for the Shanker Institute itself even admit of the possibility?

The emperor is naked, right now, parading down the streets of Tennessee and Florida, but his tailors still have their heels on all our throats.

http://www.miamiherald.com/2011/11/05/2488961/complex-new-teacher-evalu…

http://www.nytimes.com/2011/11/07/education/tennessees-rules-on-teacher…

People who just haven't been paying attention saw these stories five days ago, and they're looking into it right now. They're finding evasive, apologist columns like this one, still trying to do damage control. This very morning, people are judging the enablers of this long, ugly drive to hijack public education for private profit.

Ravitch has repeated an argument she found elsewhere, that VAM is supposedly like a car that explodes 2 times out of 5. OK, but what if the current evaluation system explodes 2 times out of 5 too? Or even more? Unless someone has the ability to compare the two systems, denigrating VAM for its level of inaccuracy is really nothing more than saying that she's happy with whatever level of inaccuracy exists in the current system (without even caring what that level is) but won't accept a different system that has any inaccuracies in it.

Mary: I understand your frustration, but I've tried to be very clear that using VA in high-stakes decisions might easily be destructive. Actually, I said so in this very piece (paragraphs 9, 12 and 13), as well as in several previous posts (e.g., http://shankerblog.org/?p=3165). My overall view is that there is a potentially useful role for these methods in education policy, but that they are being misused in most places.

Stuart: The question is whether anyone says that new evaluations will be 100 percent test scores. You know nobody says this, nor do they say that there is no research behind VA or that systems must be perfect. Like I said in the post, it's fine for TNTP to clarify nonetheless, but let's not accuse people of making arguments they're not making.

Thank you both for your comments,
MD

So, you've gotten to the point where "it is somewhat disturbing" to you that this thing is being done to the nation's schools, in defiance of all reason and decency, but you won't stand all the way up and oppose it. In claiming that the problem with VAM is only "poor models implemented in the wrong way," you place yourself as the data-masters' last line of defense as their credibility crumbles.

You have to try to be clearer, Matt. You say it "might easily be destructive." You've played an enabling role, so far, in allowing this random-number-driven decimation machine to be installed in thousands of actual schools, by force of law! The AFT betrayed children, as well as their teachers, in embracing Gates in exchange for a place at his table. Teachers are moving to take our unions back from leader/collaborators like Weingarten (and yourself, so far).

I am, again, inviting you to come on over to the actual opposition. Reread your own last paragraph, and realize how disgusted you really are. You can't straddle this question.

What Mary Porter said.

"You know nobody says this, nor do they say that there is no research behind VA or that systems must be perfect."

To the contrary, I'm pretty sure that Diane Ravitch has made all three points. I don't have time to dig through her nearly 22,000 tweets, but she exaggerates and simplifies all the time on issues like that. But you're technically right about one thing: rather than saying there's no research behind VAM, she's more likely to say that all the research is against it, which is a closely-related myth.

Moreover, if you do a Google search that uses the phrase "only on test scores," you'll find plenty of people arguing that teachers should not be evaluated "only on test scores." So there's definitely a popular perception that evaluation only on test scores is a possibility, even if well-informed people know that no such thing has been contemplated.

By the way, TNTP isn't "accusing" anyone by name, so you're basically trying to prove a universal negative (come on, do you really think that no one has ever made the points that they're refuting?)