A recent article about the implementation of new teacher evaluations in Tennessee details some of the complicated issues that state officials, teachers and administrators are confronting as they adapt to the new system. One of these issues is somewhat technical – whether the various components of evaluations, most notably principal observations and test-based productivity measures (e.g., value-added), tend to “match up.” That is, whether teachers who score high on one measure tend to do similarly well on the other (see here for more on this issue).
In discussing this type of validation exercise, the article notes:
If they don't match up, the system's usefulness and reliability could come into question, and it could lose credibility among educators.

Value-added and other test-based measures of teacher productivity may have a credibility problem among many (but definitely not all) teachers, but I don’t think it’s due to – or can be helped much by – whether or not these estimates match up with observations or other measures being incorporated into states’ new systems. I’m all for this type of research (see here and here), but I’ve never seen what I think would be an extremely useful study for addressing the credibility issue among teachers: one that looked at the relationship between value-added estimates and teachers’ opinions of each other.
If a teacher has a high opinion of one of her colleagues’ effectiveness in the classroom, then it’s unlikely that a negative assessment from any external source – whether value-added models, the principal or even a fellow teacher who doesn’t work at that school – will override that judgment. This is perfectly normal – people tend to trust their own judgment above all else, especially when it comes to professionals assessing on-the-job performance.*
If, on the other hand, a teacher found that all the colleagues he or she respected as educators received strong value-added scores, this is the kind of validation that might cause even the most ardent skeptic to rethink their position.
That’s one reason why a systematic analysis of the relationship between teacher value-added estimates and teachers’ assessments of their colleagues could be so powerful. Obviously, such an examination would not allow individual teachers to view their own assessments by their colleagues, or those of other teachers. That would not only completely taint the results (many teachers would not be candid if they knew the individual-level results would be shared), but it’s also unethical, and would likely cause serious problems within a school. The data would have to be completely private and the results reported overall, not school-by-school, and certainly not individually. For the same general reasons, I don't think this type of measure could be incorporated into actual evaluations.
If, however, several of these analyses, or a big one that was conducted in a diverse set of schools and districts, showed that, on the whole, teachers who are highly regarded by their colleagues also tend to receive high value-added scores, this might not only boost the credibility of value-added estimates among some teachers, but, for the rest of us, it would represent a fairly powerful partial validation of these estimates’ ability to gauge teacher effectiveness (though, as always, the analysis would probably only include math and reading teachers).
And, of course, the converse is true: If a group of studies found only a weak relationship, this might erode the estimates’ credibility among some people and compel less favorable policy conclusions (of course, the association would be a matter of degree, and different people could interpret it differently, especially if it turned out to be moderate).
It’s a little strange that such a study has not, to my knowledge, been conducted. After all, if the correlation between value-added and students’ opinions has policy relevance (see here), then so does the estimates’ relationship with teachers’ opinions.
One reason why this type of analysis hasn't been conducted might be that it faces significant hurdles. For example, among other problems, teachers vary in their familiarity with each other’s abilities, and they also maintain personal relationships that can color judgments. In addition, teachers may hesitate to bash their colleagues, even if they’re assured that their responses are completely confidential. But these issues are not uncommon in survey research, and I believe there are means of dealing with them.
Still, any rigorous project of this sort would require full cooperation from everyone involved. It would have to be carefully designed, and would probably require some financial investment. But I think it would be well worth it.
- Matt Di Carlo
* Examining the relationship between value-added and peer observations is clearly a similar approach, one that could be done immediately, anywhere such a system exists. This is a good idea, but it's not the same thing as what I'm proposing. Peer observation is a one-shot deal, and is typically (and correctly) carried out by an observer who does not have a day-to-day working relationship with the observed teacher.