Value-Added: Theory Versus Practice


Hey Leonie, If I understand you correctly, you're asking how we can determine that there is overall variation in teacher effects if we can't isolate them at the individual level. If that's what you meant, it's an excellent question. I might have explained it more clearly in the post. I think the best way to answer is with an extremely ironic (albeit imperfect) analogy.

Let's say I'm gathering data on dart throwing – I have a large group of regular dart players, and I give each one ten tries to hit the bull's eye, recording the results for each throw. Over a large sample of throwers, there would probably be a lot of "spread" in the results. Some people would hit it 8-9 times, some 6-7 times, some 3-4 times. Using these data, I could demonstrate reasonably – with statistical measures or just by eyeballing it (I might need more than ten tries per person to do it statistically) – that some players were better than others at this particular task. That is, there is overall variation in performance that was not just random fluctuation (if it were all just "luck," most people would hit roughly the same number, and there'd be less spread).

But let's say I then took it a step further and tried to assign each *individual* a score – the percentage of throws hitting the bull's eye – and called that their "bull's eye performance index." This is a whole different matter, as anyone who has played darts will tell you. At the *individual* level, my index would be pretty weak. Maybe some of my throwers were drunk, or distracted by poor lighting, or just having a bad day, etc. I might *tentatively* say that the people who hit 8-9 bull's eyes were at least above average, and those who hit none were likely below average (at least for regular dart players), but for the vast majority, it would be tough to tell either way.
So, my darts "analysis" could certainly show that there seems to be quite a bit of variation in the ability to hit the bull's eye (or at least that it wasn't just random), but NOT which individual players are definitely good or bad, at least in the vast majority of cases. (Note that, as is the case with teachers, accuracy would improve if I gave them 100 throws instead of ten, or if I made an effort to control for factors like environment or sobriety.)

Obviously, teaching is nothing like darts, but I think this gives the basic idea. Dart throwing, like teaching, is a skill, and some people are better than others at it. But when it comes to showing WHO is good or bad with any degree of accuracy, that's a far more difficult endeavor. And we could also, of course, make an even more important point: being a good dart player is about much more than hitting bull's eyes, just as good teaching is about much more than test results. For instance, someone might be poor at hitting bull's eyes, but very good at hitting other parts of the board. So, a good performance measure – for dart players and teachers – must be multidimensional.

In the case of the LAT/NEPC, we obviously cannot know the "correct model," but the fact that results varied with different models is itself one of the critical issues (I would add that Briggs and Domingue did, in my view, present compelling evidence that alternative specifications were superior). All models will show that teacher effects (at least as measured by test score gains) are not solely random, but different models will show different teachers to be more or less effective. And that was my main argument: the overall variation was not really the point, and by focusing so strongly on it, the LA Times (and others) miss all the other issues that bear on the debate over using value-added.

Sorry for the unacceptably long reply, but I hope this answers your question. Thanks for the comment. MD
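The darts logic above can be sketched with a quick simulation (an illustrative sketch only; the player counts and skill numbers are assumed, not from the post). When every thrower has identical true skill, the spread in ten-throw hit counts is just what chance alone predicts; genuine skill differences push the spread noticeably wider, which is how "overall variation" can be detected even when no individual's score is trustworthy.

```python
import random

random.seed(42)

def simulate(skills, throws=10):
    """Hit counts for players with the given bull's-eye probabilities."""
    return [sum(random.random() < p for _ in range(throws)) for p in skills]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

n_players = 1000

# Pure luck: every player has the same true skill (50% per throw)
same_skill = simulate([0.5] * n_players)

# Real variation: true skill spread between 20% and 80% per throw
mixed_skill = simulate([random.uniform(0.2, 0.8) for _ in range(n_players)])

# With identical skills, the spread is roughly the binomial prediction
# (10 * 0.5 * 0.5 = 2.5); with real skill differences it is clearly larger.
print(variance(same_skill))
print(variance(mixed_skill))
```

Note that even in the second scenario, any single player's ten-throw score remains a noisy estimate of that player's true skill – which is exactly the individual-level problem described in the comment.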

Would the Times claim that there is a wide variation in the number of assists by NBA players, so the way to win championships would be to cut the players who don't make enough assists according to a statistical model? Or would it say the same about rebounds? Or points scored? Would it say that we should go ahead and fire basketball players, now, based on any one of those factors, because someday over the rainbow, a statistical model may be developed that takes all three into account?

I'm not sure I understand this. I agree that the quality of teaching probably varies, as in any profession, but I'm not sure how we can know this for sure if we can't ascertain or measure what the actual quality of any individual teacher might be. If the estimates of a teacher's effectiveness (based solely on test scores) vary widely from year to year, or from one VA formula to another, or from one specific test to another, or even perhaps from one class to another -- how do we know how much the "quality" of teachers varies over an entire cohort? Which does not even begin to address the issue that test scores should never be the sole measure of teacher quality. There are some teachers who might be good at keeping students from dropping out and engaging their interest -- but not at raising their test scores, for example. Are you counting that as "quality" as well?

Thanks for continuing to shine a light on the misuse of statistics and value-added analysis. As Mark Twain's famous quote*, "Lies--damned lies--and statistics," reveals, numbers are easily manipulated to convey truth or a highly nuanced meaning, when in fact their value is highly dependent upon the logic used to produce them. The methods that produce quantifiable data are assumed to be valid on their face simply because data allow easy comparisons. This reduces the complex to the facile, allowing anyone with an opinion to rail against the object quantified, irrespective of the validity of the measure, or model, and hence, the data. "If the LA Times reports this, and it is data, surely it must be correct," they think.

Sadly, right is not might in our world very often. Those with access to nearly limitless amounts of capital, and lest we forget - axes to grind, persist in perpetuating falsehoods like the LA Times value-added analysis of LAUSD teachers. While value-added offers some value as a management tool, its misuse at the individual teacher level is not apparent to the average person. So, like the days of old, good citizens rally around the town square, pitchforks and torches in hand, ready to rid the castle of the purported evildoers who, when viewed properly, are simply those trying to do what the townsfolk want and could not do themselves.

* Twain quoting Benjamin Disraeli in his autobiography. According to Stephen Goranson, "Twain's Autobiography attribution of a remark about lies and statistics to Disraeli is generally not accepted. Evidence is now available to conclude that the phrase originally appeared in 1895 in an article by Leonard H. Courtney."


This web site and the information contained herein are provided as a service to those who are interested in the work of the Albert Shanker Institute (ASI). ASI makes no warranties, either express or implied, concerning the information contained on or linked from this site. The visitor uses the information provided herein at his/her own risk. ASI, its officers, board members, agents, and employees specifically disclaim any and all liability from damages which may result from the utilization of the information provided herein. The content in the Shanker Blog may not necessarily reflect the views or official policy positions of ASI or any related entity or organization.