People often ask me for my “bottom line” on using value-added (or other growth model) estimates in teacher evaluations. I’ve written on this topic many times, and while I have in fact given my overall opinion a couple of times, I have avoided expressing it in a strong “yes or no” format. There's a reason for this, and I thought maybe I would write a short piece and explain myself.
My first reaction to the queries about where I stand on value-added is a shot of appreciation that people are interested in my views, followed quickly by an acute rush of humility and reticence. I know think tank people aren’t supposed to say things like this, but when it comes to sweeping, big picture conclusions about the design of new evaluations, I’m not sure my personal opinion is particularly important.
Frankly, given the importance of how people on the ground respond to these types of policies, as well as, of course, their knowledge of how schools operate, I would be more interested in the views of experienced, well-informed teachers and administrators than my own. And I am frequently taken aback by the unadulterated certainty I hear coming from advocates and others about this completely untested policy. That’s why I tend to focus on aspects such as design details and explaining the research – these are things I feel qualified to discuss. (I also, by the way, acknowledge that it’s very easy for me to play armchair policy general when it's not my job or working conditions that might be on the line.)
That said, here’s my general viewpoint, in two parts. First, my sense, based on the available evidence, is that value-added should be given a try in new teacher evaluations.
I cannot say how large a role the estimates should play, but I confess that I am uncomfortable with the 40-50 percent weighting requirements, not only because that strikes me as too high a starting point, but also because districts vary in their needs, preferences and current policy environments. In addition, I am seriously concerned about other details – the treatment of error, nominal versus actual weights, the manner in which the estimates are converted to ratings, etc. I think these issues are being largely ignored in many states and districts, even though they might, in my view, compromise this whole endeavor. I have strong opinions on these fronts, and I express them regularly.
Now, at this point, you might ask: How can you say you take teachers' opinions seriously and still support value-added, which most teachers don’t like? That’s a fair question. In my experience, the views of teachers who oppose value-added are not absolute.
Here’s my little field test (I recommend trying it yourself): When I’m talking to someone, especially a teacher, who is dead set against value-added, I ask them whether they could support using these estimates as 10-15 percent of their final evaluation score, with checks on the accuracy of the datasets and other basic precautions. Far more often than not, they are receptive (if not enthusiastic).
In other words, I would suggest that views on this issue, as is usually the case, are not binary – it’s a continuum. Teachers are open to trying new things, even if they're not crazy about them; they do this all the time (experienced teachers can [and will] tell you about dozens of programs and products that have come and gone over the years). So, while there’s plenty of blame to go around, this debate might have been a bit less divisive but for the unfounded, dealbreaker insistence on very high weights, for every district, right out of the gate.
This brings me to the second thing I want to say, which is more of a meta-opinion: Whether or not we use these measures in teacher evaluations is an important decision, but the attention it gets seems way overblown.
And I think this is because the intense debate surrounding value-added isn’t entirely – or perhaps even mostly – about value-added itself. Instead, for many people on both “sides” of this issue, it has become intertwined with – a kind of symbol of – firing teachers.
Supporters of these measures are extremely eager to use the estimates as a major criterion for dismissals, as many believe (unrealistically, in my view) that this will lead to very quick, drastic improvements in aggregate performance. Opponents, on the other hand, frequently assert (perhaps unfairly) that value-added represents an attempt to erect a scientific facade around the institutionalization of automatic dismissals that will end up being arbitrary and harmful. Both views (my descriptions of them are obviously generalizations) are less focused on the merits of the measures than on the connected but often severely conflated issue of how they’re going to be used.
Think about it: If, hypothetically, we were designing new evaluations solely for the purpose of helping teachers improve, without also tying them to dismissals or other high-stakes decisions, would there be as much controversy? I very much doubt it. We would certainly find plenty to argue about, but the areas of major disagreement today – e.g., how high the weights should be – might not be particularly salient, since teachers and administrators would presumably be given all the information to use as they saw fit.
Now, here’s a more interesting hypothetical: If we were designing new evaluations and did plan to use them for dismissals and other high-stakes decisions, but value-added wasn’t on the table for whatever reason, would there still be relentless controversy over the measures we were using and how they were combined? I suspect there would be (actually, for teachers in untested grades/subjects, there already is).
That’s because, again, much of the fuss is about the decisions for which the ratings will be used, and the manner in which many of these systems are being imposed on teachers and other school staff. Value-added is the front-line soldier in that larger war.
Thus, when I say that I think we should give value-added a try, that is really just saying that I believe, based on the available evidence, that the estimates transmit useful, albeit imperfect, information about teacher performance. Whether and how this information – or that from other types of measures – is appropriate for dismissals or other high-stakes decisions is a related yet in many respects separate question, and a largely empirical one at that. That's the whole idea of giving something a try – to see how it works out.
- Matt Di Carlo