Certainty And Good Policymaking Don't Mix

Using value-added and other types of growth model estimates in teacher evaluations is probably the most controversial and oft-discussed issue in education policy over the past few years.

Many people (including a large proportion of teachers) are opposed to using student test scores in their evaluations, as they feel that the measures are not valid or reliable, and that they will incentivize perverse behavior, such as cheating or competition between teachers. Advocates, on the other hand, argue that student performance is a vital part of teachers’ performance evaluations, and that the growth model estimates, while imperfect, represent the best available option.

I am sympathetic to both views. In fact, in my opinion, there are only two unsupportable positions in this debate: certainty that using these measures in evaluations will work; and certainty that it won’t. Unfortunately, that’s often how the debate has proceeded – two deeply entrenched sides convinced of their absolutist positions, and resolved that any nuance in or compromise of their views will only preclude the success of their efforts. You’re with them or against them. The problem is that it's the nuance – the details – that determines policy effects.

Let’s be clear about something: I'm not aware of a shred of evidence – not a shred – that the use of growth model estimates in teacher evaluations improves performance of either teachers or students.

Teachers' Preparation Routes And Policy Views

In a previous post, I lamented the scarcity of survey data measuring what teachers think of different education policy reforms. A couple of weeks ago, the National Center for Education Information (NCEI) released the results of their teacher survey (conducted every five years), which provides a useful snapshot of teachers’ opinions toward different policies (albeit not at the level of detail that one might wish).

There are too many interesting results to review in one post, and I encourage you to take a look at the full set yourself. There was, however, one thing about the survey tabulations that I found particularly striking, and that was the high degree to which policy opinions differed between traditionally-certified teachers and those who entered teaching through alternative certification (alt-cert).

In the figure below, I reproduce data from the NCEI report’s battery of questions about whether teachers think different policies would “improve education.” Respondents are divided by preparation route – traditional and alternative.

Test-Based Teacher Evaluations Are The Status Quo

We talk a lot about the “status quo” in our education debates. For instance, there is a common argument that the failure to use evidence of “student learning” (in practice, usually defined in terms of test scores) in teacher evaluations represents the “status quo” in this (very important) area.

Now, the implication that “anything is better than the status quo” is a rather massive fallacy in public policy, as it assumes that the costs of alternatives will not outweigh their benefits, and that there is no chance the replacement policy will have a negative impact (almost always an unsafe assumption). But, in the case of teacher evaluations, the “status quo” is no longer what people seem to think.

Not counting Puerto Rico and Hawaii, the ten largest school districts in the U.S. are (in order): New York City; Los Angeles; Chicago; Dade County (FL); Clark County (NV); Broward County (FL); Houston; Hillsborough (FL); Orange County (FL); and Palm Beach County (FL). Together, they serve about eight percent of all K-12 public school students in the U.S., and over one in ten of the nation’s low-income children.

Although details vary, every single one of them is either currently using test-based measures of effectiveness in its evaluations, or is in the process of designing/implementing these systems (most due to statewide legislation).

Attracting The "Best Candidates" To Teaching

** Also posted here on "Valerie Strauss' Answer Sheet" in the Washington Post

One of the few issues that all sides in the education debate agree upon is the desirability of attracting “better people” into the teaching profession. While this certainly includes the possibility of using policy to lure career-switchers, most of the focus is on attracting “top” candidates right out of college or graduate school.

The common metric that is used to identify these “top” candidates is their pre-service (especially college) characteristics and performance. Most commonly, people call for the need to attract teachers from the “top third” of graduating classes, an outcome that is frequently cited as being the case in high-performing nations such as Finland. Now, it bears noting that “attracting better people,” like “improving teacher quality,” is a policy goal, not a concrete policy proposal – it tells us what we want, not how to get it. And how to make teaching more enticing for “top” candidates is still very much an open question (as is the equally important question of how to improve the performance of existing teachers).

In order to answer that question, we need to have some idea of whom we’re pursuing – who are these “top” candidates, and what do they want? I sometimes worry that our conception of this group – in terms of the “top third” and similar constructions – doesn’t quite square with the evidence, and that this misconception might actually be misguiding rather than focusing our policy discussions.

Again, Niche Reforms Are Not The Answer

Our guest author today is David K. Cohen, John Dewey Collegiate Professor of Education and professor of public policy at the University of Michigan, and a member of the Shanker Institute’s board of directors.

A recent response to my previous post on these pages helps to underscore one of my central points: If there is no clarity about what it will take to improve schools, it will be difficult to design a system that can do it. In a recent essay in the Sunday New York Times Magazine, Paul Tough wrote that education reformers who advocated "no excuses" schooling were now making excuses for reformed schools' weak performance. He explained why: "Most likely for the same reason that urban educators from an earlier generation made excuses: successfully educating large numbers of low-income kids is very, very hard."

In his post criticizing my initial essay, "What does it mean to ‘fix the system’?," the Fordham Institute’s Chris Tessone told the story of how Newark Public Schools tried to meet the requirements of a federal school turnaround grant. The terms of the grant required that each of three failing high schools replace at least half of their staff. The schools, he wrote, met this requirement largely by swapping a portion of their staffs with one another, a process which Tessone and school administrators refer to as the “dance of the lemons.” Would such replacement be likely to solve the problem?

Even if all of the replaced teachers had been weak (which we do not know), I doubt that such replacement could have done much to help.

If Gifted And Talented Programs Don't Boost Scores, Should We Eliminate Them?

In education policy debates, the phrase “what works” is sometimes used to mean “what increases test scores.” Among those of us who believe that testing data have a productive role to play in education policy (even if we disagree on the details of that role), there is a constant struggle to interpret test-based evidence properly and put it in context. This effort to craft and maintain a framework for using assessment data productively is very important but, despite the careless claims of some public figures, it is also extremely difficult.

Equally important and difficult is the need to apply that framework consistently. For instance, a recent working paper from the National Bureau of Economic Research (NBER) looked at the question of whether gifted and talented (GT) programs boost student achievement. The researchers found that GT programs (and magnet schools as well) have little discernible impact on students’ test score gains. Another recent NBER paper reached the same conclusion about the highly-selective “exam schools” in New York and Boston. Now, it’s certainly true that high-quality research on the test-based effect of these programs is still somewhat scarce, and these are only two (as yet unpublished) analyses, but their conclusions are certainly worth noting.

Still, let’s speculate for a moment: Let’s say that, over the next few years, several other good studies also reached the same conclusion. Would anyone, based on this evidence, be calling for the elimination of GT programs? I doubt it. Yet, if we applied faithfully the standards by which we sometimes judge other policy interventions, we would have to make a case for getting rid of GT.

In The Classroom, Differences Can Become Assets

Author, speaker and education expert Sir Ken Robinson argues that today’s education system is anachronistic and needs to be rethought. Robinson notes that our current model, shaped by the industrial revolution, reveals a "production line" approach: for example, we group kids by "date of manufacture", instruct them "by batches", and subject them all to standardized tests. Yet, we often miss the most fundamental questions - for example, Robinson asks, "Why is age the most important thing kids have in common?"

In spite of the various theories about the stages of cognitive development (Piaget, etc.), it is difficult to decide how to group children. Academically and linguistically diverse classrooms have become a prevalent phenomenon in the U.S. and other parts of the world, posing important challenges for educators whose mission is to support the learning of all students.

It’s not only that children are dissimilar in terms of their interests, ethnicity, social class, skills, and other attributes; what’s even more consequential is that human interactions are built on the basis of those differences. In other words, individuals create patterns of relations that reflect and perpetuate social distinctions.

Atlanta: Bellwether Or Whistleblower For Test-Driven Reform?

Early in the life of No Child Left Behind, one amateur but insightful futurist on the Shanker Institute Board remarked to me: "Well, if you tie teacher pay, labeling failing schools, and evaluations of teachers and principals all to student test results – guess what? – you’ll get student test results. But some 20 years down the road when these kids get out of high school, we may discover they don’t know anything."

The quip did not necessarily suggest that we were headed for massive cheating scandals. Nor did it mean that students should never be assessed to find out how well they were learning what had been taught. It was just a warning that the incentives to produce score results would produce them – one way or another – whether or not they stood for any true reflection of learning. Meaning, in this case, that a system that defines success narrowly in terms of test score gains will, at minimum, invite exaggerated claims and, at worst, encourage corruption.

An important report was released this spring that should bring some U.S. education "reformers" up short as they pursue policies based on test-based incentives. Instead, Incentives and Test-Based Accountability in Education, by the National Research Council (NRC), was received as a blip on their screens. A serious research review, the report looked at "15 test-based incentive programs, including large scale policies of NCLB, its predecessors, and state high school exit exams as well as a number of experiments and programs carried out in the United States and other countries." Its conclusion: "Despite using them [test-based incentives] for several decades, policymakers and educators do not yet know how to consistently generate positive effects on achievement and to improve education."

In other words, given the methods we are now using to grant performance pay, design evaluation plans, or fix low performing schools, these incentives don’t work. Moreover, looking at recent education history, they haven’t worked for quite a long time.

The Implications Of An Extreme "No Excuses" Perspective

In an article in this week’s New York Times Magazine, author Paul Tough notifies supporters of market-based reform that they cannot simply dismiss the "no excuses" maxim when it is convenient. He cites two recent examples of charter schools (the Bruce Randolph School in Denver, CO, and the Urban Prep Academy in Chicago) that were criticized for their low overall performance. Both schools have been defended publicly by "pro-reform" types (the former by Jonathan Alter; the latter by the school’s founder, Tim King), arguing that comparisons of school performance must be valid – that is, the schools’ test scores must be compared with those of similar neighborhood schools.

For example, Tim King notes that, while his school does have a very low proficiency rate – 17 percent – his students are mostly poor African-Americans, whose scores should be compared with those of peers in nearby schools. Paul Tough’s rejoinder is to proclaim that statements like these represent the "very same excuses for failure that the education reform movement was founded to oppose." His basic argument is that a 17 percent pass rate is not good enough, regardless of where a school is located or how disadvantaged its students are, and that pointing to the low performance of comparable schools is really just shedding the "no excuses" mantra when it serves one’s purposes.

Without a doubt, the sentiment behind this argument is noble, not only because it calls out hypocrisy, but because it epitomizes the mantra that "all children can achieve." In this extreme form, however, it also carries a problematic implication: Virtually every piece of high-quality education research, so often cited by market-based reformers to advance the cause, is also built around such "excuses."

No Excuses In Anti-Poverty Policy As Well

Much of the current education debate consists of a constant, ongoing argument about the role of poverty. One “side” is accused of using poverty as an excuse for not improving schools, and of saying that poverty is destiny in regard to educational outcomes. The other “side” is accused of completely ignoring the detrimental effects of poverty, and of arguing that market-based reforms can by themselves transform our public education system.

Both portrayals are inaccurate, and both “sides” know it, yet the accusations continue. Of course there is a core of truth in the characterizations, but the differences are far more nuanced than the opponents usually communicate. It’s really a matter of degree. In addition to differences in the specifics of what should be done, a lot boils down to variations in how much improvement we believe can be gained by teacher-focused education reform (or by education reform in general) by itself. In other words, some people have higher expectations than others.

I have previously argued that the reasonable expectation for teacher quality-based reforms is that, if everything goes perfectly (which is far from certain), they will generate very slow, gradual improvement over a period of years and decades. This means we should make these changes, but be very careful to design them sensibly, monitor their effects, and maintain realistic expectations (for the record, I think we are, in many respects, falling short on all three counts).

But the thing that I find a little frustrating about the whole poverty/education thing is that, while nobody should use poverty as an excuse in education policy, it’s not uncommon to hear education used as an excuse, of sorts, in discussions about anti-poverty policy.