Quality Control, When You Don't Know The Product

Last week, New York State’s Supreme Court issued an important ruling on the state’s teacher evaluations. The aspect of the ruling that got the most attention was the proportion of evaluations – or “weight” – that could be assigned to measures based on state assessments (in the form of estimates from value-added models). Specifically, the Court ruled that these measures can comprise only 20 percent of a teacher’s evaluation, compared with the option of up to 40 percent for which Governor Cuomo and others were pushing. Under the decision, the other test-based 20 percent must consist entirely of alternative measures (e.g., local assessments).

Joe Williams, head of Democrats for Education Reform, one of the flagship organizations of the market-based reform movement, called the ruling “a slap in the face” and “a huge win for the teachers unions.” He characterized the policy impact as follows: “A mediocre teacher evaluation just got even weaker.”

This statement illustrates perfectly the strange reasoning that seems to be driving our debate about evaluations.

As I have noted before, while I am receptive to the idea of using test-based teacher productivity measures in evaluations (though not to how it is being implemented in most places), there is not a shred of evidence that doing so will improve the performance of teachers or students. Moreover, there is only scant empirical research on the types of measures that should be included in evaluations, or on how they should be combined. These systems are brand new, and we have almost no idea what they’re supposed to look like or what impact they are likely to have.

I find Mr. Williams’ bold claims rather perplexing: the system “prior to” the ruling (in which up to 40 percent of evaluations could be based on state tests) is “mediocre,” while the “post-ruling” guideline (at most 20 percent state tests) is “even weaker.” These systems are still being designed. How does he reach his conclusions about the overall quality of the evaluations when he has no idea what the final systems will look like?

It’s difficult to decipher Mr. Williams’ criteria for judging the quality of the legislation guiding evaluation systems. Perhaps he prefers the state assessments over locally-designed tests (certainly a defensible viewpoint). But, given that he regards even the “pre-ruling” system as “mediocre,” despite the fact that an incredibly high proportion (40 percent) might have been based on value-added scores, it also seems possible that he objects to the fact that the systems must be bargained. In other words, Mr. Williams might be assuming that, to the degree evaluation systems require teacher input (through their unions), they are inevitably going to be subpar. That objection does not hinge on the ruling: even if it had gone the other way, 80 percent of the new evaluation systems would still be subject to bargaining – the remaining test-based 20 percent, as well as the 60 percent that must be based on non-testing data, such as observations.

If this is a correct interpretation (I don’t want to put words in Mr. Williams’ mouth, as his reasons are not stated directly; see a similar perspective here), it’s an unfortunate viewpoint. No matter how hard some people try to cultivate the (mostly false) distinction between teachers and their unions, the opinion that teachers’ unions should not be involved in something is tantamount to saying that teachers themselves need not be involved.

Now, it’s obviously true that teachers should not be a party to every education decision, but I should think that the complex task of designing brand new evaluation systems would, for most people, scream out for teacher input. Conversely, the unilateral imposition of evaluations runs the awful risk that teachers will not understand or support them, and that is a route to failure. Are we really at the point in education policy where people think that, to the degree teachers have a right to a say in how they’re evaluated, evaluation systems will be worse?

It also bears mentioning the obvious but often overlooked fact that subjecting evaluations to collective bargaining means that districts will still be the party mainly accountable for designing them. Anyone who thinks that unions walk into bargaining and dictate terms has never been anywhere near a bargaining table. And the idea that unions and districts cannot get together and hammer out effective evaluations is, in my view, a borderline insult to both teachers and administrators.

In any case, I am once again concerned by the degree to which some people seem certain about the quality of evaluations – whether not yet designed or in their first year or two of operation – without an iota of evidence to back up their opinions. The New York system, which leaves the design of evaluations to local-level bargaining, will no doubt produce a variety of outcomes. Some of the evaluations that districts and unions design will be great, others will be subpar, and most will be somewhere in between. The optimal designs will certainly vary by location. Hopefully, districts will all learn from each other and adjust course as time passes.

It would be nice if we knew how to design the perfect system. We don’t, and we never will. So let’s all try to avoid making sweeping statements about the quality of these systems, especially when we don’t even know what they look like.

- Matt Di Carlo