Underestimating Context (But Selectively)

Imagine that for some reason you were lifted out of your usual place in society and dropped into somebody else’s spot — the place of someone whose behavior you have never understood. For example, you are an anarchist who suddenly becomes a top cabinet member. Or you are an environmentalist, critical of big business, who suddenly becomes responsible for developing environmental policy at ExxonMobil or BP.

As systems thinker Donella Meadows points out in her book Thinking in Systems, in any given position, "you experience the information flows, the incentives and disincentives, the goals and discrepancies, the pressure […] that goes with that position." It’s possible, but highly unlikely, that you might remember how things looked from where you were before. If you become a manager, you’ll probably see labor less as a deserving partner, and more as a cost to be minimized. If you become a labor leader, every questionable business decision will start to seem like a deliberate attack on your members.

How do we know?

The best psychological experiments ask questions about human nature. What makes a person strong? Or evil? Are good and evil dispositional, hardwired traits, permanent once unleashed? Or is there something about the situations in which people find themselves that influences their behavior?

Success Via The Presumption Of Accuracy

In our previous post, Professor David K. Cohen argued that reforms such as D.C.’s new teacher evaluation system (IMPACT) will not by themselves lead to real educational improvement, because they focus on individual rather than systemic causes of low performance. He framed this argument in terms of the new round of IMPACT results, which were released two weeks ago. While the preliminary information was limited, it seems that the distribution of teachers across the four ratings categories (highly effective, effective, minimally effective, and ineffective) was roughly similar to last year’s - including a small group of teachers fired for receiving the lowest “ineffective” rating, and a somewhat larger group (roughly 200) fired for having received the “minimally effective” label for two consecutive years.

Cohen’s argument on the importance of infrastructure does not necessarily mean that we should abandon the testing of new evaluation systems, only that we should be very careful about how we interpret their results and the policy conclusions we draw from them (which is good advice at all times). Unfortunately, however, it seems that caution is in short supply. For instance, shortly after the IMPACT results were announced, the Washington Post ran an editorial, entitled “DC Teacher Performance Evaluations Are Working," in which a couple of pieces of “powerful evidence” were put forward in an attempt to support this bold claim. The first was that 58 percent of the teachers who received a “minimally effective” rating last year and remained in the district were rated either “effective” or “highly effective” this year. The second was that around 16 percent of DC teachers were rated “highly effective” this year, and will be offered bonuses, which the editorial writers argued shows that most teachers “are doing a good job” and being rewarded for it.

The Post’s claim that these facts represent evidence - much less “powerful evidence” - of IMPACT’s success is a picture-perfect example of the flawed evidentiary standards that too often drive our education debate. The unfortunate reality is that we have virtually no idea whether IMPACT is actually “working," and we won’t have even a preliminary grasp for some time. Let’s quickly review the Post’s evidence.
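One statistical reason for such caution, often raised about results like the Post’s first piece of evidence: whenever ratings contain measurement noise, a large share of the teachers rated lowest in one year will rate higher the next, even if nothing real changed. The toy simulation below is my illustration, not IMPACT’s actual scoring model, and every number in it is invented; it shows how much apparent "improvement" appears in a world where all teachers have identical true effectiveness.

```python
import random

random.seed(1)

N = 4000           # hypothetical number of teachers (not DC's actual count)
TRUE_SCORE = 50.0  # identical true effectiveness for everyone
NOISE_SD = 10.0    # measurement noise in the annual rating

def observed_score():
    # Observed rating = (constant) true score + random noise.
    return TRUE_SCORE + random.gauss(0, NOISE_SD)

year1 = [observed_score() for _ in range(N)]
year2 = [observed_score() for _ in range(N)]

# Call the bottom 15% in year 1 "minimally effective" (illustrative cutoff).
cutoff = sorted(year1)[int(0.15 * N)]
low_year1 = [i for i in range(N) if year1[i] < cutoff]

# How many of those teachers clear the same cutoff in year 2?
improved = sum(1 for i in low_year1 if year2[i] >= cutoff)
print(f"{improved / len(low_year1):.0%} of low-rated teachers 'improved'")
```

Real teachers do differ in true effectiveness, but the logic carries over to any noisy measure: substantial year-to-year "improvement" among the lowest-rated group is expected by chance alone, which is why a figure like 58 percent cannot, by itself, show that a system is working.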

Evaluating Individual Teachers Won't Solve Systemic Educational Problems

** Also posted here on "Valerie Strauss' Answer Sheet" in the Washington Post

Our guest author today is David K. Cohen, John Dewey Collegiate Professor of Education and professor of public policy at the University of Michigan, and a member of the Shanker Institute’s board of directors.  

What are we to make of recent articles (here and here) extolling IMPACT, Washington DC’s fledgling teacher evaluation system, for how many "ineffective" teachers have been identified and fired, how many "highly effective" teachers rewarded? It’s hard to say.

In a forthcoming book, Teaching and Its Predicaments (Harvard University Press, August 2011), I argue that fragmented school governance in the U.S., coupled with the lack of coherent educational infrastructure, makes it difficult either to broadly improve teaching and learning or to have valid knowledge of the extent of improvement. Merriam-Webster defines "infrastructure" as: "the underlying foundation or basic framework (as of a system or organization)." The term is commonly used to refer to the roads, rail systems, and other frameworks that facilitate the movement of things and people, or to the physical and electronic mechanisms that enable voice and video communication. But social systems also can have such "underlying foundations or basic frameworks". For school systems around the world, the infrastructure commonly includes student curricula or curriculum frameworks, exams to assess students’ learning of the curricula, instruction that centers on teaching those curricula, and teacher education that aims to help prospective teachers learn how to teach the curricula. The U.S. has had no such common and unifying infrastructure for schools, owing in part to fragmented government (including local control) and traditions of weak state guidance about curriculum and teacher education.

Like many recent reform efforts that focus on teacher performance and accountability, IMPACT does not attempt to build infrastructure, but rather assumes that weak individual teachers are the problem. There are some weak individual teachers, but the chief problem has been a non-system that offers no guidance or support for strong teaching and learning, precisely because there has been no infrastructure. IMPACT frames reform as a matter of solving individual problems when the weakness is systemic.

First, Know-What; Then, Know-How

It is satisfying to read a book that examines education without claiming to be an education book. Small Is Beautiful: Economics as if People Mattered feels fresh and inspiring, despite having been around since the early 1970s. In it, British economist E.F. Schumacher attempts to address fundamental questions, as opposed to dwelling on the politics around nonessential issues, even the politics around the politics.

Schumacher argues that education will only help society if it helps that society become wiser. And we get wiser by thinking first about where we want to go (i.e., know-what), not how to get there. Today, the education world seems focused on the latter. Science, technology, and engineering all teach know-how. But who is concerned with the know-what? In my view, efforts like the Albert Shanker Institute’s "Call for Common Content" are a step in this direction.

Schumacher points out that we often look at education as the answer to all kinds of problems. "[A]ll history – as well as all current experience – points to the fact that it is man, not nature, who provides the primary resource: that the key factor of all economic development comes out of the mind of man." If our civilization is in a state of crisis "it is not far-fetched to suggest that there may be something wrong with its education." We believe that for every new challenge ahead there ought to be a scientific and technological solution: more and better education will solve all problems to come. Yet, with all of our scientific and technological advances, our social problems still seem intractable. Why is that?

Atlanta: Bellwether Or Whistleblower For Test-Driven Reform?

Early in the life of No Child Left Behind, one amateur but insightful futurist on the Shanker Institute Board remarked to me: "Well, if you tie teacher pay, labeling failing schools, and evaluations of teachers and principals all to student test results—guess what?—you’ll get student test results. But some 20 years down the road when these kids get out of high school, we may discover they don’t know anything."

The quip did not necessarily suggest that we were headed for massive cheating scandals. Nor did it mean that students should never be assessed to find out how well they were learning what had been taught. It was just a warning that the incentives to produce score gains would, one way or another, produce them, whether or not they reflected real learning. In other words, a system that defines success narrowly in terms of test score gains will, at minimum, invite exaggerated claims and, at worst, encourage corruption.

An important report was released this spring that should bring some U.S. education "reformers" up short as they pursue policies based on test-based incentives. Instead, Incentives and Test-Based Accountability in Education, by the National Research Council (NRC), was received as a blip on their screens. A serious research review, the report looked at "15 test-based incentive programs, including large scale policies of NCLB, its predecessors, and state high school exit exams as well as a number of experiments and programs carried out in the United States and other countries." Its conclusion: "Despite using them [test-based incentives] for several decades, policymakers and educators do not yet know how to consistently generate positive effects on achievement and to improve education."

In other words, given the methods we are now using to grant performance pay, design evaluation plans, or fix low performing schools, these incentives don’t work. Moreover, looking at recent education history, they haven’t worked for quite a long time.

Teacher Evaluations: Don't Begin Assembly Until You Have All The Parts

** Also posted here on “Valerie Strauss’ Answer Sheet” in the Washington Post

Over the past year or two, roughly 15-20 states have passed or are considering legislation calling for the overhaul of teacher evaluation. The central feature of most of these laws is a mandate to incorporate measures of student test score growth, in most cases specifying a minimum percentage of a teacher’s total score that must consist of these estimates.

There’s some variation across states, but the percentages are all quite high. For example, Florida and Colorado both require that at least 50 percent of an evaluation must be based on growth measures, while New York mandates a minimum of 40 percent. These laws also vary in terms of other specifics, such as the degree to which the growth measure proportion must be based on state tests (rather than other assessments), how much flexibility districts have in designing their systems, and how teachers in untested grades and subjects are evaluated. But they all share that defining feature of mandating a minimum proportion – or “weight” – that must be attached to a test-based estimate of teacher effects (at least for those teachers in tested grades and subjects).

Unfortunately, this is typical of the misguided manner in which many lawmakers (and the advocates advising them) have approached the difficult task of overhauling teacher evaluation systems. For instance, I have discussed previously the failure of most systems to account for random error. The weighting issue is another important example, and it violates a basic rule of designing performance assessment systems: You should exercise extreme caution in pre-deciding the importance of any one component until you know what the other components will be. Put simply, you should have all the parts in front of you before you begin the assembly process.
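A standard measurement caveat makes the assembly point concrete (the example below is mine, with invented numbers, not drawn from any state’s actual system): a percentage written into a law is only a nominal weight, while a component’s effective influence on the composite also depends on how much that component varies relative to the others. A 50/50 split on paper can behave more like 90/10 in practice.

```python
import statistics

# Hypothetical evaluation with two components, each scored 0-100:
# a value-added "growth" score and a classroom observation score.
# The statute says 50/50, but growth scores are far more spread out.
growth      = [20, 35, 50, 65, 80, 95, 10, 70, 40, 90]
observation = [78, 80, 79, 81, 80, 82, 79, 80, 81, 80]

composite = [0.5 * g + 0.5 * o for g, o in zip(growth, observation)]

# A component's effective influence depends on its variability, not just
# its nominal weight: here, differences between teachers' composites are
# driven almost entirely by the growth measure.
print("growth SD:     ", statistics.stdev(growth))
print("observation SD:", statistics.stdev(observation))
print("composite SD:  ", statistics.stdev(composite))
```

Before fixing a minimum weight for growth measures, then, a designer would want to know the variability (and reliability) of every other component; otherwise the statutory percentage says little about how much the test-based estimate will actually drive final ratings.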

Peer Effects And Attrition In High-Profile Charter Schools

An article in last week’s New York Times tells the story of a child who was accepted (via lottery) into the highly acclaimed Harlem Success Academy (HSA), a charter school in New York City. The boy’s mother was thrilled, saying she felt like she had just gotten her son a tuition-free spot in an elite private school. From the very first day of kindergarten, however, her child was in trouble. Sometimes he was sent home early; other times he was forced to stay late and “practice walking the hallways” as punishment for acting out. During his third week, he was suspended for three days.

Shortly thereafter, the mother, who had been corresponding with the principal and others about these incidents, received an e-mail message from HSA founder Eva Moskowitz. Moskowitz told her that, at this school, it is “extremely important that children feel successful," and that HSA, with its nine-hour days, during which children are “being constantly asked to focus and concentrate," can sometimes “overwhelm children and be a bad environment." The mother understood this to be a veiled threat of sorts, but was not upset at the time. Indeed, she credits HSA staff with helping her to find a regular public school for her child to attend. Happily, her son eventually ended up doing very well at his new school.

It’s very important to remember that this is really only one side of the story. It’s also an anecdote, and there is no way to tell how widespread this practice might be at HSA, or at charter schools in general. I retell it here because it helps to illustrate a difficult-to-measure “advantage” that some charter schools have when compared with regular neighborhood schools – the peer effects of attrition without replacement.

The Faulty Logic Of Using Student Surveys In Accountability Systems

In a recent post, I discussed the questionable value of student survey data to inform teacher evaluation models. Not only is there little research support for such surveys, but the very framing of the idea often reflects faulty reasoning.

A quote from a recent Educators 4 Excellence white paper helps to illustrate the point:

For a system that aims to serve students, young people’s interests are far too often pushed aside. Students’ voices should be at the forefront of the education debate today, especially when it comes to determining the effectiveness of their teacher.

This sounds noble… but why, exactly, should students’ opinions be "at the forefront of the education debate"? Are students’ needs better served when we ask them directly what they need? The research on this is clear: no, not really.

The Implications Of An Extreme "No Excuses" Perspective

In an article in this week’s New York Times Magazine, author Paul Tough warns supporters of market-based reform that they cannot simply dismiss the "no excuses" maxim when it is convenient. He cites two recent examples of charter schools (the Bruce Randolph School in Denver, CO, and the Urban Prep Academy in Chicago) that were criticized for their low overall performance. Both schools have been defended publicly by "pro-reform" types (the former by Jonathan Alter; the latter by the school’s founder, Tim King), who argue that comparisons of school performance must be valid – that is, the schools’ test scores must be compared with those of similar neighborhood schools.

For example, Tim King notes that, while his school does have a very low proficiency rate – 17 percent – his students are mostly poor African-Americans, whose scores should be compared with those of peers in nearby schools. Paul Tough’s rejoinder is to proclaim that statements like these represent the "very same excuses for failure that the education reform movement was founded to oppose." His basic argument is that a 17 percent pass rate is not good enough, regardless of where a school is located or how disadvantaged its students are, and that pointing to the low performance of comparable schools is really just shedding the "no excuses" mantra when it serves one’s purposes.

Without a doubt, the sentiment behind this argument is noble, not only because it calls out hypocrisy, but because it epitomizes the mantra that "all children can achieve." In this extreme form, however, it also carries a problematic implication: Virtually every piece of high-quality education research, so often cited by market-based reformers to advance the cause, is also built around such "excuses."

A 'Summary Opinion' Of The Hoxby NYC Charter School Study

Almost two years ago, a report on New York City charter schools rocked the education policy world. It was written by Hoover Institution scholar Caroline Hoxby with co-authors Sonali Murarka and Jenny Kang. Their primary finding was that:

On average, a student who attended a charter school for all of grades kindergarten through eight would close about 86 percent of the “Scarsdale-Harlem achievement gap” [the difference in scores between students in Harlem and those in the affluent NYC suburb] in math, and 66 percent of the achievement gap in English.

The headline-grabbing conclusion was uncritically repeated by most major news outlets, including the New York Post, which called the charter effects “off the charts," and the NY Daily News, which announced that, from that day forward, anyone who opposed charter schools was “fighting to block thousands of children from getting superior educations." A week or two later, Mayor Michael Bloomberg specifically cited the study in announcing that he was moving to expand the number of NYC charter schools. Even today, the report is often mentioned as primary evidence favoring the efficacy of charter schools.

I would like to revisit this study, but not as a means to relitigate the “do charters work?" debate. Indeed, I have argued previously that we spend too much time debating whether charter schools “work," and too little time asking why some few are successful. Instead, my purpose is to illustrate an important research maxim: Even well-designed, sophisticated analyses with important conclusions can be compromised by a misleading presentation of results.