Evaluating Individual Teachers Won't Solve Systemic Educational Problems

Also posted here on Valerie Strauss' "Answer Sheet" in the Washington Post

Our guest author today is David K. Cohen, John Dewey Collegiate Professor of Education and professor of public policy at the University of Michigan, and a member of the Shanker Institute’s board of directors.

What are we to make of recent articles (here and here) extolling IMPACT, Washington DC’s fledgling teacher evaluation system, for how many "ineffective" teachers have been identified and fired and how many "highly effective" teachers have been rewarded? It’s hard to say.

In a forthcoming book, Teaching and Its Predicaments (Harvard University Press, August 2011), I argue that fragmented school governance in the U.S. and the lack of a coherent educational infrastructure make it difficult either to broadly improve teaching and learning or to have valid knowledge of the extent of improvement. Merriam-Webster defines "infrastructure" as: "the underlying foundation or basic framework (as of a system or organization)." The term is commonly used to refer to the roads, rail systems, and other frameworks that facilitate the movement of things and people, or to the physical and electronic mechanisms that enable voice and video communication. But social systems also can have such "underlying foundations or basic frameworks." For school systems around the world, the infrastructure commonly includes student curricula or curriculum frameworks, exams to assess students’ learning of the curricula, instruction that centers on teaching that curriculum, and teacher education that aims to help prospective teachers learn how to teach the curricula. The U.S. has had no such common and unifying infrastructure for schools, owing in part to fragmented government (including local control) and traditions of weak state guidance about curriculum and teacher education.

Like many recent reform efforts that focus on teacher performance and accountability, IMPACT does not attempt to build infrastructure, but rather assumes that weak individual teachers are the problem. There are some weak individual teachers, but the chief problem has been a non-system that offers no guidance or support for strong teaching and learning, precisely because there has been no infrastructure. IMPACT frames reform as a matter of solving individual problems when the weakness is systemic.

IMPACT and similar programs aim to distinguish more and less qualified individual teachers by using longitudinal measures of student achievement—especially value-added calculations—to estimate each teacher’s contribution to student learning. The goal is to reward teachers whose students gain more, or eliminate those teachers whose students gain less, or both. These programs, which promise large improvements in student performance without serious investment in system redesign, understandably have wide appeal, because they offer the appearance of a simple solution and cost little. President Obama and Education Secretary Duncan favor such programs, as do a growing number of governors, state legislators, business leaders, and several large foundations. As with many states and localities, DC’s efforts were undertaken with the support of federal and foundation incentives.

But niche "reforms" like this could not do enough by themselves to offer real improvement, even if they were accurate and reliable, which they are not. In the case of performance pay, one problem is that the United States lacks an instructional system that would enable valid determinations of which teachers boost students’ test scores. Another is that researchers report that performance pay does not boost student test scores (the most recent case in point is New York City’s decision to cancel its scheme after a RAND study found that monetary rewards had no effect on students’ test scores). And still another is that existing tests do not support defensible determinations of teaching quality, except perhaps at the very extremes of the distribution (see here). (One reason for that last point is that the tests have limited reliability: scores on one administration of a test weakly predict scores on another administration of the same test a week or two later.) Tests are also internally inconsistent; different parts of the same test that attempt to measure the same academic content seem to yield different results. Moreover, both the students who take such tests and their teachers have unequal access to educational resources, and some teachers systematically get more or less able students (see here, here, and here). For these reasons and several others, the existing tests can misclassify teachers, labeling effective teachers as ineffective and vice versa. Hence this approach is suspect even in niche terms.

In making teachers the culprit for system failure, these policies assume that the causes of weak student learning lie chiefly in teachers’ deficient sense of responsibility, determination, and hard work. It’s true that some teachers are not responsible or determined, but dealing with that small fraction of the teaching force will do little to remedy the chief school-related causes of weak student performance: the absence of systemic clarity about what is to be taught and learned and how best to teach it, and the absence of support for teachers to learn those things, all of which well-designed infrastructure could offer. The lack of infrastructure has been especially damaging in the high-poverty schools at which teacher accountability has chiefly been aimed. One consequence is that most accountability policies have set off a chain of disappointing results, including the gaming of tests by states that set the bar very low and cheating by district and school personnel (most recently in Atlanta).

To be fair, efforts to refine niche reforms have had several constructive effects: They have helped call attention to America’s longest-running educational problems; they have stimulated public and private work on these problems; and they have drawn attention to inequality in public education. But they have done little to provide the systemic support that infrastructure could offer for the quality instruction that students need.

A coherent educational infrastructure in the United States could enable valid judgments about the quality of teaching and learning and about which teachers do a better job of helping students learn. If teachers and students used common curricula, for example, they would have more equal chances to teach and learn. Teachers could have meaningful opportunities to learn to teach the common curriculum in preservice or later professional education. And there could be assessments of students’ learning that were valid for the common curriculum, so students could have less unequal chances to be tested on what they were supposed to have been taught. Reform should aim to build these key elements of infrastructure, and build educators’ capability to use it well.

The mere presence of these things would not, of course, assure quality education. That would depend on how infrastructure was designed and how educators used it, and use would depend on the capability of school systems, the people who work in them, and how society supported their work. But because teachers in the United States have lacked these resources, they have had great difficulty building shared occupational knowledge and skills. They have had no common framework with which to make valid judgments about students’ work and no common vocabulary with which to identify, investigate, discuss, and solve problems of teaching and learning. Hence, they also have little common knowledge that could be systematized for use in the education of intending teachers. Individual teachers have developed their own knowledge and skills, and some have become quite expert—but public education has had no organized means to turn teachers’ individual knowledge and skill into common know-how, let alone remember it, improve it by analysis, and make it available to novices. Thus, even aside from the question of whether they are valid and reliable (and they are not), small, narrow programs such as IMPACT can distract the nation from how best to solve the schools’ central problems.

- David K. Cohen

Thank you, Dr. Cohen, for this most helpful and concise explanation of why most of our current attempts at education reform either fail or produce only spotty results. I'm encouraged that some, particularly within the teaching profession, are beginning to understand the systemic nature of the problem and the need to address it at that level.

I'm looking forward to reading your book. It sounds quite interesting.

Do you have a way to estimate the impact of the infrastructure reforms you're suggesting?

I ask, because part of the appeal of the "get better teachers" argument is that the benefits have been quantified by researchers like Hanushek (sp?), and they are large. Most arguments, such as yours, don't make a serious attempt to quantify the impact of the suggested reform. Perhaps these answers are in your book or another source I am unfamiliar with?

I am a teacher. I have had the experience of teaching students whose parents support what I do in the classroom. These students become more confident in their learning; you see it in the work they do and in their participation. I have also had parents who couldn't care less, and their children decline in participation and become weaker in their work and in socializing with their peers. The teacher knows and can sense the difference in the caliber of students. Why, then, do politicians and governmental entities want to demonize teachers? Yes, teachers differ from one another. We are different individuals, as in any other profession.