About Value-Added And "Junk Science"
One can often hear opponents of value-added referring to these methods as “junk science.” The term is meant to express the argument that value-added is unreliable and/or invalid, and that its scientific “façade” is without merit.
Now, I personally am not opposed to using these estimates in evaluations and other personnel policies, but I certainly understand opponents’ skepticism. For one thing, there are some states and districts in which design and implementation have been somewhat careless, and, in these situations, I very much share the skepticism. Moreover, the common argument that evaluations, in order to be "meaningful," must consist of value-added measures in a heavily-weighted role (e.g., 45-50 percent) is, in my view, unsupportable.
All that said, calling value-added “junk science” completely obscures the important issues. The real questions here are less about the merits of the models per se than how they're being used.
If value-added is “junk science” regardless of how it's employed, then a fairly large chunk of social scientific research is “junk science.” If that’s your opinion, then okay – you’re entitled to it – but it’s not very compelling, at least in my (admittedly biased) view.
And those who hold this opinion will find that their options for using evidence to support their policy views are extremely limited. They should, for instance, cease citing the CREDO charter school study, which uses a somewhat similar approach – i.e., put simply, judging effectiveness by statistical comparison with schools serving similar students. In this sense, CREDO (and most of the charter school literature) must also be called “junk science.”
Furthermore, what is the case against calling classroom observations “junk science” too? Even when done properly -- by well-trained observers observing multiple times throughout the year -- observation scores also fluctuate over time and between raters, and they are subject to systematic bias (e.g., poorly-trained or vindictive principals).
You might believe that human judgment is a better way to assess performance than analyzing large-scale test score datasets, and you might be correct, but that's just an opinion, and it hardly means that all alternative measures are "junk" no matter their policy deployment.
It's also important to bear in mind the obvious fact that value-added has a wide range of research-related uses outside of high-stakes personnel decisions (including program evaluation). In fact, many of the conclusions from this literature are things with which few teachers would disagree - e.g., teachers vary widely in their measured performance, they improve a great deal during their first few years, etc.
In short, value-added models are what they are – sophisticated but imperfect tools that must be used properly. We can and should disagree about their proper uses, but calling the models "junk science" adds almost nothing of substance to that debate.
- Matt Di Carlo
I couldn't agree more. I also agree that the correlations found in the MET report, for example, are too low to base teacher evaluations on - at present. However, the science I'm basing my opinions on is this so-called "junk science." In fact, the only way to make statements about the imperfect ability of state tests to be used in evaluations is to use that very science which is being called "junk."
In reality, the term "junk science" (as I've seen it used) is, simply put, propaganda. Opponents of teacher evaluation via state test aren't interested in sticking with a professional discussion of the topic, but in trying to add emotional connotations to the arguments. These emotional connotations are, in my opinion, not only distracting, but make opponents of evaluation via state test seem immature, unprofessional, and unprepared to have a real discussion. This is very unfortunate to me as I see them having more truth on their side than not.
Thanks for being another voice in the discussion. Hopefully folks using the term "junk science" will realize the distraction from their argument and shift accordingly.
But what say you to the instability of value-added scores? Does not that seem to indicate a lack of reliability, and if there is no reliability, how can one draw valid inferences?
Worth noting that people have been overclaiming for value-added for more than a decade. Sanders claimed he could show teacher effects for three years out. The independent studies of his black box methodology commissioned by the TN Auditors office, by Bock, Wolfe, & Fisher, said that could not be sustained by the data, iirc. And if I may paraphrase from their report, which can be read here, they found the results of TVAAS lacking sufficient accuracy to be reported the way they were. The sad thing is that in the 17 years since that report was issued, the problems they identified still exist, at least as I read the more recent critiques.
The Bock & Wolfe paper has long been superseded by other work on growth models, most of which Matt is referring to implicitly here about the weaknesses. The fact that there is instability in a model is a feature of a particular model, not something that makes it "junk science." ALL statistical models have uncertainty of various types.
Fisher's paper (also commissioned by the TN legislative audit office) was on the contract between the state and Sanders' unit at UTK and was very specific to Sanders' insistence on keeping his particular model proprietary. In both that paper and as the head of Florida's state assessment office, Tom Fisher was an advocate of high-stakes testing who was skeptical of Sanders' way of running business and skeptical of growth models for judging teachers. But he was (and presumably is) still an advocate of high-stakes testing.
Well, considering that the concepts of educational standards and standardized testing are so fraught with error, as shown by Noel Wilson, that it renders them invalid, and once invalidity is proven the reliability goes out the window, any talk of VAM having any value whatsoever is "vain and illusory" (to quote Wilson).
I invite Matt and Sherman to refute/disprove what Wilson says in his "Educational Standards and the Problem of Error" found at: http://epaa.asu.edu/ojs/article/view/577/700 or his essay review of the testing bible "A Little Less than Valid: An Essay Review" found at: www.edrev.info/essays/v10n5.pdf
Very, very weak assertion. First, judges don't care about piddly assertions. When a group of teachers are fired via VAM, and those teachers have the resources for a class action suit, they will win - plain and simple. We're waiting for a large scale case, but there are already some promising examples of teachers fighting VAM outcomes in court and winning (see D.C. and Houston TX).
Moreover, I am a REAL scientist that researches within and teaches physics and chemistry. On the other hand, I have received a doctorate in an educational field, so I am well versed in the social science and the manipulation of data to try and make it fit social circumstances. It was hard for me to keep my mouth shut going through my doctoral program because all along I felt as though the statistics that were being used hardly meant a thing in terms of inferring broad circumstantial conclusions to an ever broader social scale. People are complicated in that they work differently given different situations. Studies involving gravity are measurable, testable, repeatable, and falsifiable, whereas social science topics are not.
The problem here is that VAM zeros in on individuals and hurts them for no reason other than a computer applying VAM formulas has spit out values. It's GIGO. Judges will not go for this.
Another huge issue with your post is that you are attempting to uncover a contradiction among non-reformers in that we accept studies like the one from CREDO, yet we do not accept VAM outcomes. But you miss the mark. At least the CREDO study, and other large scale studies, attempt to describe SCHOOLS and school systems. VAM is zeroing in on teachers and taking them to task for something out of their hands. Student outcomes are terribly difficult to measure, and it is even more difficult to measure covariates in the learning process.
This is a big difference that you seem to ignore. Also, VAM causes many unintended consequences that old-fashioned teacher evaluation does not. The old-fashioned methods may not have been very valid; however, they didn't hurt children. VAM causes teachers to find out what's on the test, by any means necessary, and forces them to teach to that test in order to make it seem as though students are successful. And there is even worse - blatant cheating. There are too many unintended consequences to even address here.
The bottom line is this - public education is under attack and VAM is a tool used to beat up teachers. Politicians love this stuff - it enables them to unload veterans and save on pensions and wages. Churn is what they want, and they've enjoyed this for years as many teachers quit before their first 5 years.
Teachers should be evaluated like any other public servant - by the services they offer - not by the outcomes of the services. Police, fire, and teachers are unable to make the public do what people are supposed to do - all we do is OFFER a service.
If the point is to make good teaching happen and the instrument to make that happen is VAM then, while the underlying methodology may be fine in a scientific sense, it's being hopelessly misapplied. Knowing how to apply scientific methods correctly is good science; misapplying them is bad science, i.e. "junk science".
VAM assumes that the value that a teacher adds to a student can be completely (i.e. covers every aspect of value that the teacher adds) and truthfully (i.e. does so without error) ascertained by two scores on a standardised test.
There are two different ways of teaching that could get similar results on a standardised test - a great teacher can do inspiring lessons that teach not only advanced cognitive skills but also social skills (i.e. working together, self-directed learning, addressing conflict etc) or a ho-hum/under-pressure teacher can teach-to-the-test, drill-and-kill and practice/practice/practice-the-test. The two teachers may get the same VAM but there is definitely better teaching going on in one of the classrooms. That VAM is unable to distinguish the two teaching methods means it's pretty poor applied science.
...I should also add that most teachers agree that we get better, not because of test scores, but because we cease to go home in tears like we did our first few years. I don't personally accept ANY measures involving high-stakes testing, whether it be measuring teacher effectiveness or the pass/fail status of a student.
Remember this - ANY measure using high-stakes testing is now in error. I don't care what it is - whether it be a state exam or ACT. For example, because the ACT now counts towards accountability for our schools in my state, guess what we are doing? Preparing our kids to take the ACT. Guess what that does? INVALIDATES the results.
Talk NAEP, I'll listen. Talk PISA, TIMSS, I'll listen. Talk any state standardized test, A.P. test, or ACT, when those measures count towards school accountability, and the meaningfulness equals zero.
See Campbell's Law. This whole testing thing is a facade - smoke and mirrors.
Our poor children.
I'd like to offer a clarification here on the use of the term junk science. I am completely opposed to the use of VAM as an evaluation tool at the individual teacher and school level due to the oft-cited reasons not to do so. If VAM is used as it was intended, for large scale, longer term understanding and evaluations of programs, then it is no longer junk science. VAM becomes junk science when it is used for purposes it cannot hope to accomplish, much like using a wire brush to clean the dust from the wings of a butterfly collection instead of for getting the rust and grime off of a barbecue grill. We have politicians, lobbyists and their funders to thank for VAM being relegated to junk science status. Though this is the point of Mr. Di Carlo's article, he could have made it more affirmatively. Having read many things he has written over the years, I think that his strong desire for accuracy via neutrality has sometimes gone a bit farther than is useful and has had the unintended effect of sanitizing salient aspects of the topics he holds forth on. I can't really fault him for what may be an aversion to wading into or even approaching what is all too often the cesspool of policy making inhabited by those habitués who mistakenly think they are swimming in champagne.
Does the poor application of a valid statistical model qualify as junk science? Linus Pauling misapplied scientific knowledge about Vitamin C to megadoses of Vitamin C. Was that junk science?
The value-added measurement is closer to a pseudoscience when it comes to teacher evaluation. VAM is hard to prove false, and those who advocate its use show a serious lack of skepticism. VAM uses student test scores--a flawed proxy of a teacher's effectiveness--to measure a teacher's effect. Wikipedia defines junk science as "any scientific data, research, or analysis considered to be spurious or fraudulent."
Saying it's the people who use value-added measurement for teacher evaluation, rather than value-added measurement, that harm people, is like saying "Guns don't kill people. People kill people."