A Big Open Question: Do Value-Added Estimates Match Up With Teachers' Opinions Of Their Colleagues?

A recent article about the implementation of new teacher evaluations in Tennessee details some of the complicated issues with which state officials, teachers and administrators are dealing in adapting to the new system. One of these issues is somewhat technical: whether the various components of the evaluations, most notably principal observations and test-based productivity measures (e.g., value-added), tend to "match up." That is, whether teachers who score high on one measure tend to do similarly well on the other (see here for more on this issue).

In discussing this type of validation exercise, the article notes:

"If they don't match up, the system's usefulness and reliability could come into question, and it could lose credibility among educators."

Value-added and other test-based measures of teacher productivity may have a credibility problem among many (but definitely not all) teachers, but I don't think it's due to, or can be helped much by, whether these estimates match up with observations or other measures being incorporated into states' new systems. I'm all for this type of research (see here and here), but I've never seen what I think would be an extremely useful study for addressing the credibility issue among teachers: one that looked at the relationship between value-added estimates and teachers' opinions of each other.

If a teacher has a high opinion of one of her colleagues' effectiveness in the classroom, then it's unlikely that a negative assessment from any external source – whether value-added models, the principal or even a fellow teacher who doesn't work at that school – will override that judgment. This is perfectly normal – people tend to trust their own judgment above all else, especially when it comes to professionals assessing on-the-job performance.*

If, on the other hand, a teacher found that all the colleagues he or she respected as educators received strong value-added scores, this is the kind of validation that might cause even the most ardent skeptics to rethink their position.

That's one reason why a systematic analysis of the relationship between teacher value-added estimates and teachers' assessments of their colleagues could be so powerful. Obviously, such an examination would not allow individual teachers to view their own assessments by their colleagues, or those of other teachers. That would not only completely taint the results (many teachers would not be candid if they knew the individual-level results would be shared), but it would also be unethical, and would likely cause serious problems within a school. The data would have to be completely private and the results reported overall, not school-by-school, and certainly not individually. For the same general reasons, I don't think this type of measure could be incorporated into actual evaluations.

If, however, several of these analyses, or a big one that was conducted in a diverse set of schools and districts, showed that, on the whole, teachers who are highly regarded by their colleagues also tend to receive high value-added scores, this might not only boost the credibility of value-added estimates among some teachers, but, for the rest of us, it would represent a fairly powerful partial validation of these estimates’ ability to gauge teacher effectiveness (though, as always, the analysis would probably only include math and reading teachers).

And, of course, the converse is true: If a group of studies found only a weak relationship, this might erode the estimates' credibility among some people and compel less favorable policy conclusions (of course, the association would be a matter of degree, and different people could interpret it differently, especially if it turned out to be moderate).

It’s a little strange that such a study has not, to my knowledge, been conducted. After all, if the correlation between value-added and students’ opinions has policy relevance (see here), then so does the estimates’ relationship with teachers’ opinions.

One reason why this type of analysis hasn't been conducted might be that there are significant hurdles. For example, among other problems, teachers vary in their familiarity with each other's abilities, and they also maintain personal relationships that can color judgments. In addition, teachers may hesitate to bash their colleagues, even if they're assured that their responses are completely confidential. But these issues are not uncommon in survey research, and I believe there are means of dealing with them.
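To make the proposed analysis concrete, here is a minimal sketch of the kind of correlation such a study would estimate. Everything below is invented for illustration: the value-added scores, the anonymized peer ratings, and the idea of one averaged rating per teacher are assumptions, not real data or results from any actual study.

```python
# Hypothetical sketch: how strongly do value-added estimates track
# confidential peer ratings? All numbers below are invented.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# One value-added estimate and one averaged, anonymized peer rating
# (1-5 scale) per teacher -- entirely fabricated example data.
value_added = [0.3, -0.1, 0.8, 0.0, -0.5, 0.6]
peer_rating = [3.8, 2.9, 4.5, 3.1, 2.2, 4.0]

r = pearson(value_added, peer_rating)
print(f"correlation: {r:.2f}")
```

A real study would of course need far more teachers, adjustments for rater familiarity and personal relationships, and probably a rank-based measure (e.g., Spearman) since peer ratings are ordinal; this sketch only shows the basic quantity being estimated.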

Still, any rigorous project of this sort would require full cooperation from everyone involved. It would have to be carefully designed, and would probably require some financial investment. But I think it would be well worth it.

- Matt Di Carlo


* The relationship between value-added and peer observations is clearly a similar approach, one that could be pursued immediately, anywhere such a system exists. This is a good idea, but it's not the same thing as what I'm proposing. Peer observation is a one-shot deal, and is typically (and correctly) carried out by an observer who does not have a day-to-day working relationship with the observed teacher.


When value-added and other measures, like peer or principal observation, are very highly correlated, it raises the question of whether the value-added component of teacher evaluation is even necessary.
From the research I've seen on the topic, both measures, observation and value-added, suffer from the same issue: they can identify a school's very best and worst teachers, but they struggle to make fine distinctions among the mass in the middle.


Agreed. But how often do teachers really observe each other in the classroom? If teachers don't regularly observe each other's classrooms, their opinions of each other's teaching ability may not be all that valid.

My guess would be you'd have to do such a study in one of the relatively few districts that are trying to encourage the Japanese practice of "lesson study." At least in Japan, teachers from an entire school will regularly observe one of their colleagues teaching a class and then engage in discussion and criticism of how it went. Teachers who do that kind of thing will have a much stronger basis for assessing their colleagues' performance.


On the other hand, teachers are more in tune with the teaching context and will provide feedback grounded in that context. Tests change and value-added parameters change; the chances of those measures being wrong shift more often than teachers' judgment does.

Spend the money on teachers not tests.


@Stuart - I think part of the point is that while teachers' opinions of each other's teaching ability, if not based on classroom observations, may not be valid, they are likely to nevertheless be psychologically important to teachers in the way Matt describes.


I hear value added but when is there going to be talk about subtracting invaluable practices?

When it comes to teacher assessment using a value-added system, there is a small percentage of teachers whom the assessment tool doesn't really measure well at all, due to their position: Special Education teachers of alternative-assessment students. Administrators often don't even know what current practice is, or should look like, in classrooms with students with multiple and complex needs. 99% of society doesn't.

Trying to use one assessment tool to assess 100% of a population doesn't work for students and isn't legal, so why does anyone think it's a good idea for teachers?