Evaluating Individual Teachers Won't Solve Systemic Educational Problems

** Also posted here on "Valerie Strauss' Answer Sheet" in the Washington Post

Our guest author today is David K. Cohen, John Dewey Collegiate Professor of Education and professor of public policy at the University of Michigan, and a member of the Shanker Institute’s board of directors.  

What are we to make of recent articles (here and here) extolling IMPACT, Washington DC’s fledgling teacher evaluation system, for how many "ineffective" teachers have been identified and fired, and how many "highly effective" teachers rewarded? It’s hard to say.

In a forthcoming book, Teaching and Its Predicaments (Harvard University Press, August 2011), I argue that fragmented school governance in the U.S. coupled with the lack of coherent educational infrastructure make it difficult either to broadly improve teaching and learning or to have valid knowledge of the extent of improvement. Merriam-Webster defines "infrastructure" as: "the underlying foundation or basic framework (as of a system or organization)." The term is commonly used to refer to the roads, rail systems, and other frameworks that facilitate the movement of things and people, or to the physical and electronic mechanisms that enable voice and video communication. But social systems also can have such "underlying foundations or basic frameworks". For school systems around the world, the infrastructure commonly includes student curricula or curriculum frameworks, exams to assess students’ learning of the curricula, instruction that centers on teaching that curriculum, and teacher education that aims to help prospective teachers learn how to teach the curricula. The U.S. has had no such common and unifying infrastructure for schools, owing in part to fragmented government (including local control) and traditions of weak state guidance about curriculum and teacher education.

Like many recent reform efforts that focus on teacher performance and accountability, IMPACT does not attempt to build infrastructure, but rather assumes that weak individual teachers are the problem. There are some weak individual teachers, but the chief problem has been a non-system that offers no guidance or support for strong teaching and learning, precisely because there has been no infrastructure. IMPACT frames reform as a matter of solving individual problems when the weakness is systemic.

First, Know-What; Then, Know-How

It is satisfying to read a book that examines education without claiming to be an education book. Small Is Beautiful: Economics as if People Mattered feels fresh and inspiring, despite having been around since the early 1970s. In it, British economist E.F. Schumacher attempts to address fundamental questions, as opposed to dwelling on the politics around nonessential issues, even the politics around the politics.

Schumacher argues that education will only help society if it helps that society become wiser. And we get wiser by thinking first about where we want to go (i.e., know-what), not how to get there. Today, the education world seems focused on the latter. Science, technology, engineering, all teach know-how. But who is concerned with the know-what? In my view, efforts like the Albert Shanker Institute’s "Call for Common Content" are a step in this direction.

Schumacher points out that we often look at education as the answer to all kinds of problems. "[A]ll history – as well as all current experience – points to the fact that it is man, not nature, who provides the primary resource: that the key factor of all economic development comes out of the mind of man." If our civilization is in a state of crisis "it is not far-fetched to suggest that there may be something wrong with its education." We believe that for every new challenge ahead there ought to be a scientific and technological solution: more and better education will solve all problems to come. Yet, with all of our scientific and technological advances, our social problems still seem intractable. Why is that?

Atlanta: Bellwether Or Whistleblower For Test-Driven Reform?

Early in the life of No Child Left Behind, one amateur but insightful futurist on the Shanker Institute Board remarked to me: "Well, if you tie teacher pay, labeling failing schools, and evaluations of teachers and principals all to student test results – guess what? – you’ll get student test results. But some 20 years down the road when these kids get out of high school, we may discover they don’t know anything."

The quip did not necessarily suggest that we were headed for massive cheating scandals. Nor did it mean that students should never be assessed to find out how well they were learning what had been taught. It was just a warning that the incentives to produce score results would produce them – one way or another – whether or not they stood for any true reflection of learning. In this case, it meant that a system that defines success narrowly in terms of test score gains will, at minimum, invite exaggerated claims and, at worst, encourage corruption.

An important report was released this spring that should bring some U.S. education "reformers" up short as they pursue policies based on test-based incentives. Instead, Incentives and Test-Based Accountability in Education, by the National Research Council (NRC), was received as a blip on their screens. A serious research review, the report looked at "15 test-based incentive programs, including large scale policies of NCLB, its predecessors, and state high school exit exams as well as a number of experiments and programs carried out in the United States and other countries." Its conclusion: "Despite using them [test-based incentives] for several decades, policymakers and educators do not yet know how to consistently generate positive effects on achievement and to improve education."

In other words, given the methods we are now using to grant performance pay, design evaluation plans, or fix low performing schools, these incentives don’t work. Moreover, looking at recent education history, they haven’t worked for quite a long time.

Teacher Evaluations: Don't Begin Assembly Until You Have All The Parts

** Also posted here on “Valerie Strauss’ Answer Sheet” in the Washington Post

Over the past year or two, roughly 15-20 states have passed or are considering legislation calling for the overhaul of teacher evaluation. The central feature of most of these laws is a mandate to incorporate measures of student test score growth, in most cases specifying a minimum percentage of a teacher’s total score that must consist of these estimates.

There’s some variation across states, but the percentages are all quite high. For example, Florida and Colorado both require that at least 50 percent of an evaluation must be based on growth measures, while New York mandates a minimum of 40 percent. These laws also vary in terms of other specifics, such as the degree to which the growth measure proportion must be based on state tests (rather than other assessments), how much flexibility districts have in designing their systems, and how teachers in untested grades and subjects are evaluated. But they all share that defining feature of mandating a minimum proportion – or “weight” – that must be attached to a test-based estimate of teacher effects (at least for those teachers in tested grades and subjects).

Unfortunately, this is typical of the misguided manner in which many lawmakers (and the advocates advising them) have approached the difficult task of overhauling teacher evaluation systems. For instance, I have discussed previously the failure of most systems to account for random error. The weighting issue is another important example, and it violates a basic rule of designing performance assessment systems: You should exercise extreme caution in pre-deciding the importance of any one component until you know what the other components will be. Put simply, you should have all the parts in front of you before you begin the assembly process.

Peer Effects And Attrition In High-Profile Charter Schools

An article in last week’s New York Times tells the story of a child who was accepted (via lottery) into the highly-acclaimed Harlem Success Academy (HSA), a charter school in New York City. The boy’s mother was thrilled, saying she felt like she had just gotten her son a tuition-free spot in an elite private school. From the very first day of kindergarten, however, her child was in trouble. Sometimes he was sent home early; other times he was forced to stay late and “practice walking the hallways” as punishment for acting out. During his third week, he was suspended for three days.

Shortly thereafter, the mother, who had been corresponding with the principal and others about these incidents, received an e-mail message from HSA founder Eva Moskowitz. Moskowitz told her that, at this school, it is “extremely important that children feel successful," and that HSA, with its nine-hour days, during which children are “being constantly asked to focus and concentrate," can sometimes “overwhelm children and be a bad environment." The mother understood this to be a veiled threat of sorts, but was not upset at the time. Indeed, she credits HSA staff with helping her to find a regular public school for her child to attend. Happily, her son eventually ended up doing very well at his new school.

It’s very important to remember that this is really only one side of the story. It’s also an anecdote, and there is no way to tell how widespread this practice might be at HSA, or at charter schools in general. I retell it here because it helps to illustrate a difficult-to-measure “advantage” that some charter schools have when compared with regular neighborhood schools – the peer effects of attrition without replacement.

The Faulty Logic Of Using Student Surveys In Accountability Systems

In a recent post, I discussed the questionable value of student survey data to inform teacher evaluation models. Not only is there little research support for such surveys, but the very framing of the idea often reflects faulty reasoning.

A quote from a recent Educators 4 Excellence white paper helps to illustrate the point:

For a system that aims to serve students, young people’s interests are far too often pushed aside. Students’ voices should be at the forefront of the education debate today, especially when it comes to determining the effectiveness of their teacher.

This sounds noble… but seriously, why should students’ opinions be "at the forefront of the education debate"? Are students’ needs better served when we ask students what they need directly? Research on this is explicit: no, not really.

The Implications Of An Extreme "No Excuses" Perspective

In an article in this week’s New York Times Magazine, author Paul Tough notifies supporters of market-based reform that they cannot simply dismiss the "no excuses" maxim when it is convenient. He cites two recent examples of charter schools (the Bruce Randolph School in Denver, CO, and the Urban Prep Academy in Chicago) that were criticized for their low overall performance. Both schools have been defended publicly by "pro-reform" types (the former by Jonathan Alter; the latter by the school’s founder, Tim King), arguing that comparisons of school performance must be valid – that is, the schools’ test scores must be compared with those of similar neighborhood schools.

For example, Tim King notes that, while his school does have a very low proficiency rate – 17 percent – his students are mostly poor African-Americans, whose scores should be compared with those of peers in nearby schools. Paul Tough’s rejoinder is to proclaim that statements like these represent the "very same excuses for failure that the education reform movement was founded to oppose." His basic argument is that a 17 percent pass rate is not good enough, regardless of where a school is located or how disadvantaged its students are, and that pointing to the low performance of comparable schools is really just shedding the "no excuses" mantra when it serves one’s purposes.

Without a doubt, the sentiment behind this argument is noble, not only because it calls out hypocrisy, but because it epitomizes the mantra that "all children can achieve." In this extreme form, however, it also carries a problematic implication: Virtually every piece of high-quality education research, so often cited by market-based reformers to advance the cause, is also built around such "excuses."

A 'Summary Opinion' Of The Hoxby NYC Charter School Study

Almost two years ago, a report on New York City charter schools rocked the education policy world. It was written by Hoover Institution scholar Caroline Hoxby with co-authors Sonali Murarka and Jenny Kang. Their primary finding was that:

On average, a student who attended a charter school for all of grades kindergarten through eight would close about 86 percent of the “Scarsdale-Harlem achievement gap” [the difference in scores between students in Harlem and those in the affluent NYC suburb] in math, and 66 percent of the achievement gap in English.

The headline-grabbing conclusion was uncritically repeated by most major news outlets, including the New York Post, which called the charter effects “off the charts," and the NY Daily News, which announced that, from that day forward, anyone who opposed charter schools was “fighting to block thousands of children from getting superior educations." A week or two later, Mayor Michael Bloomberg specifically cited the study in announcing that he was moving to expand the number of NYC charter schools. Even today, the report is often mentioned as primary evidence favoring the efficacy of charter schools.

I would like to revisit this study, but not as a means to relitigate the “do charters work?" debate. Indeed, I have argued previously that we spend too much time debating whether charter schools “work," and too little time asking why some few are successful. Instead, my purpose is to illustrate an important research maxim: Even well-designed, sophisticated analyses with important conclusions can be compromised by a misleading presentation of results.

What Do We Do When Second Graders Think Math Is Not For Girls?

Although the past several generations have seen declining gender inequalities in educational attainment, gender-based differences in the fields of study we choose seem to persist (see here). For example, the percentage of women obtaining degrees in the science, technology, engineering, and mathematics (STEM) fields has remained remarkably static in the last few decades (see here).

In trying to explain this persistent trend, some conclude that (1) women are not as interested in these fields, and/or that (2) women just aren’t as good as men in these domains. But how would one tell whether these explanations are right or wrong?

One problem is that people share specific, culturally based ideas about what men and women are and should be. Numerous studies demonstrate that the dearth of women in STEM fields can be directly linked to negative associations regarding girls and the sciences, and especially girls and math ability (see here and here).

What Do State And Local Governments Do?

Those who wish to dismantle public services in the U.S. seem to share a general belief – accepted, to some extent, even by people who generally support public sector spending – that government is a massive, incompetent blob. At the federal level, I have always found this somewhat strange, since around two-thirds of federal spending goes towards Social Security, Medicare/Medicaid and national defense, programs that are generally popular and widely regarded as successful.

Survey data indicate that people do trust state and local government more than they do federal government, but the level of confidence is still not particularly high. Americans also appear generally unwilling to pay higher taxes to preserve public services (except for education), and most accept that state and local government is too large and much of it is superfluous. But when people are asked about specific programs, they tend to respond favorably. This suggests, among other things, that people may have general perceptions of "government" without full knowledge of all the roles government plays.

So, I thought it might be useful to take a quick look at how public dollars actually are spent. After all, it’s our money, and it’s always good to keep track of how our elected officials are spending it.