Certainty And Good Policymaking Don't Mix

Using value-added and other types of growth model estimates in teacher evaluations has probably been the most controversial and most-discussed issue in education policy over the past few years.

Many people (including a large proportion of teachers) are opposed to using student test scores in their evaluations, as they feel that the measures are not valid or reliable, and that they will incentivize perverse behavior, such as cheating or competition between teachers. Advocates, on the other hand, argue that student performance is a vital part of teachers’ performance evaluations, and that the growth model estimates, while imperfect, represent the best available option.

I am sympathetic to both views. In fact, in my opinion, there are only two unsupportable positions in this debate: certainty that using these measures in evaluations will work, and certainty that it won’t. Unfortunately, that’s often how the debate has proceeded – two deeply entrenched sides, each convinced of its absolutist position and resolved that any nuance in, or compromise of, its views will only preclude the success of its efforts. You’re with them or against them. The problem is that it’s the nuance – the details – that determine policy effects.

Let’s be clear about something: I'm not aware of a shred of evidence – not a shred – that the use of growth model estimates in teacher evaluations improves the performance of either teachers or students.

Our Annual Testing Data Charade

Every year, around this time, states and districts throughout the nation release their official testing results. Schools are closed and reputations are made or broken by these data. But this annual tradition is, in some places, becoming a charade.

Most states and districts release two types of assessment data every year (by student subgroup, school and grade): average scores (“scale scores”); and the percentage of students who meet the standards to be labeled below basic, basic, proficient and advanced. The latter – the rates – are, of course, derived from the scores: they tell us the proportion of students whose scale score was at or above the minimum necessary to be considered proficient, advanced, etc.
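To make the relationship between the two types of data concrete, here is a minimal Python sketch using made-up scale scores and a hypothetical cut score of 650 (neither comes from any real state test). It shows that a proficiency rate is simply the share of scores at or above the cut, and that this rate can rise even while the average score falls, because the rate records only which side of the line each student lands on.

```python
from statistics import mean


def proficiency_rate(scores, cut_score):
    """Share of students whose scale score is at or above the cut score."""
    return sum(score >= cut_score for score in scores) / len(scores)


# Hypothetical scale scores for two different cohorts (e.g., third graders
# in 2010 vs. third graders in 2011). These are made-up numbers, not real data.
cohort_2010 = [640, 645, 648, 660, 690]
cohort_2011 = [600, 651, 652, 653, 667]

# Hypothetical proficiency cut score; real cut scores vary by state and test.
CUT = 650

if __name__ == "__main__":
    for year, scores in (("2010", cohort_2010), ("2011", cohort_2011)):
        print(f"{year}: average scale score {mean(scores):.1f}, "
              f"proficiency rate {proficiency_rate(scores, CUT):.0%}")
```

In this toy example the rate doubles, from 40 to 80 percent, while the average score falls by 12 points: most of the 2011 students sit just over the cut, and one scores far below it. The threshold discards how far above or below the line each student is.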

Both types of data are cross-sectional. They don’t follow individual students over time, but rather give a “snapshot” of aggregate performance among two different groups of students (for example, third graders in 2010 compared with third graders in 2011). Calling the change in these results “progress” or “gains” is inaccurate; they are cohort changes, and might just as well be chalked up to differences in the characteristics of the students (especially when changes are small). Even averaged across an entire school or district, there can be huge differences in the groups compared between years – not only is there often considerable student mobility in and out of schools/districts, but every year, a new cohort enters at the lowest tested grade, while a whole other cohort exits at the highest tested grade (except for those retained).

For these reasons, any comparisons between years must be done with extreme caution, but the most common way – simply comparing proficiency rates between years – is in many respects the worst. A closer look at this year’s New York City results illustrates this perfectly.

Test-Based Teacher Evaluations Are The Status Quo

We talk a lot about the “status quo” in our education debates. For instance, there is a common argument that the failure to use evidence of “student learning” (in practice, usually defined in terms of test scores) in teacher evaluations represents the “status quo” in this (very important) area.

Now, the implication that “anything is better than the status quo” is a rather massive fallacy in public policy, as it assumes that the benefits of alternatives will outweigh their costs, and that there is no chance the replacement policy will have a negative impact (almost always an unsafe assumption). But, in the case of teacher evaluations, the “status quo” is no longer what people seem to think.

Not counting Puerto Rico and Hawaii, the ten largest school districts in the U.S. are (in order): New York City; Los Angeles; Chicago; Dade County (FL); Clark County (NV); Broward County (FL); Houston; Hillsborough (FL); Orange County (FL); and Palm Beach County (FL). Together, they serve about eight percent of all K-12 public school students in the U.S., and over one in ten of the nation’s low-income children.

Although details vary, every single one of them is either currently using test-based measures of effectiveness in its evaluations, or is in the process of designing/implementing these systems (most due to statewide legislation).

Melodramatic

At a press conference earlier this week, New York City Mayor Michael Bloomberg announced the city’s 2011 test results. Wall Street Journal reporter Lisa Fleisher, who was on the scene, tweeted Mayor Bloomberg’s remarks. According to Fleisher, the mayor claimed that there was a “dramatic difference” between his city’s testing progress from 2010 to 2011 and that of the rest of the state.

Putting aside the fact that the results do not measure “progress” per se, but rather cohort changes – a comparison of cross-sectional data that measures the aggregate performance of two different groups of students – I must say that I was a little astounded by this claim. Fleisher was also kind enough to tweet a photograph that the mayor put on the screen in order to illustrate the “dramatic difference” between the gains of NYC students relative to their non-NYC counterparts across the state.  Here it is:

If Gifted And Talented Programs Don't Boost Scores, Should We Eliminate Them?

In education policy debates, the phrase “what works” is sometimes used to mean “what increases test scores.” Among those of us who believe that testing data have a productive role to play in education policy (even if we disagree on the details of that role), there is a constant struggle to interpret test-based evidence properly and put it in context. This effort to craft and maintain a framework for using assessment data productively is very important but, despite the careless claims of some public figures, it is also extremely difficult.

Equally important and difficult is the need to apply that framework consistently. For instance, a recent working paper from the National Bureau of Economic Research (NBER) looked at the question of whether gifted and talented (GT) programs boost student achievement. The researchers found that GT programs (and magnet schools as well) have little discernible impact on students’ test score gains. Another recent NBER paper reached the same conclusion about the highly selective “exam schools” in New York and Boston. Now, it’s certainly true that high-quality research on the test-based effect of these programs is still somewhat scarce, and these are only two (as yet unpublished) analyses, but their conclusions are worth noting.

Still, let’s speculate for a moment: Let’s say that, over the next few years, several other good studies also reached the same conclusion. Would anyone, based on this evidence, be calling for the elimination of GT programs? I doubt it. Yet, if we faithfully applied the standards by which we sometimes judge other policy interventions, we would have to make a case for getting rid of GT.

First, Know-What; Then, Know-How

It is satisfying to read a book that examines education without claiming to be an education book. Small Is Beautiful: Economics as if People Mattered feels fresh and inspiring, despite having been around since the early 1970s. In it, British economist E.F. Schumacher attempts to address fundamental questions, as opposed to dwelling on the politics around nonessential issues, even the politics around the politics.

Schumacher argues that education will only help society if it helps that society become wiser. And we get wiser by thinking first about where we want to go (i.e., know-what), not how to get there. Today, the education world seems focused on the latter. Science, technology and engineering all teach know-how. But who is concerned with the know-what? In my view, efforts like the Albert Shanker Institute’s "Call for Common Content" are a step in this direction.

Schumacher points out that we often look at education as the answer to all kinds of problems. "[A]ll history – as well as all current experience – points to the fact that it is man, not nature, who provides the primary resource: that the key factor of all economic development comes out of the mind of man." If our civilization is in a state of crisis "it is not far-fetched to suggest that there may be something wrong with its education." We believe that for every new challenge ahead there ought to be a scientific and technological solution: more and better education will solve all problems to come. Yet, with all of our scientific and technological advances, our social problems still seem intractable. Why is that?

The Implications Of An Extreme "No Excuses" Perspective

In an article in this week’s New York Times Magazine, author Paul Tough warns supporters of market-based reform that they cannot simply dismiss the "no excuses" maxim when it is convenient. He cites two recent examples of schools (the Bruce Randolph School in Denver, CO, and the Urban Prep Academy in Chicago) that were criticized for their low overall performance. Both schools have been defended publicly by "pro-reform" types (the former by Jonathan Alter; the latter by the school’s founder, Tim King), who argue that comparisons of school performance must be valid – that is, the schools’ test scores must be compared with those of similar neighborhood schools.

For example, Tim King notes that, while his school does have a very low proficiency rate – 17 percent – his students are mostly poor African-Americans, whose scores should be compared with those of peers in nearby schools. Paul Tough’s rejoinder is to proclaim that statements like these represent the "very same excuses for failure that the education reform movement was founded to oppose." His basic argument is that a 17 percent pass rate is not good enough, regardless of where a school is located or how disadvantaged its students are, and that pointing to the low performance of comparable schools is really just shedding the "no excuses" mantra when it serves one’s purposes.

Without a doubt, the sentiment behind this argument is noble, not only because it calls out hypocrisy, but because it epitomizes the mantra that "all children can achieve." In this extreme form, however, it also carries a problematic implication: Virtually every piece of high-quality education research, so often cited by market-based reformers to advance the cause, is also built around such "excuses."

Great Expectations

A couple of years ago, Eat, Pray, Love author Elizabeth Gilbert explored the negative side of our unrealistically high expectations for artists and, more generally, for those who rely on their creativity to make a living. In ancient Rome, Gilbert recounts, creativity was associated with a sort of divine spirit that came to human beings from some distant and unknowable source, for distant and unfathomable reasons. The Romans referred to this intangible spirit as a genius. An individual was not a genius, but rather had a genius - a magical entity who was believed to live in the walls of an artist's studio and who would come out and invisibly assist the artist with his/her work. The lesson Gilbert draws is one of humility (i.e., successes are not entirely ours – don’t be such a narcissist) and emancipatory relief (i.e., failures are not completely our fault either – can’t hurt to try).

What does all this have to do with education and teachers? It seems to me that our expectations for both teachers and artists are sometimes unrealistic and unproductive, if not detrimental. Great teachers are often portrayed as superheroes, unencumbered by anything that might distract them from their teaching crusade – "refusing to surrender to the combined menaces of poverty, bureaucracy, and budgetary shortfalls." As a recent article in The Atlantic explained, Teach for America now asks applicants to talk about how they have overcome the challenges in their lives and uses these answers to rate their perseverance.

Yet the meaning of "Great Teacher" rarely gets analyzed. Instead, our definition of greatness – or even competence – remains a convenient black box, leading some to suggest that the question of what makes a teacher great is less important than separating the wheat from the chaff. In turn, this reveals a simplistic and, in my view, negative assumption that greatness, unlike Gilbert’s genius, is a stable, static, innate, and independent attribute. You either have it or you don’t.

No Excuses In Anti-Poverty Policy As Well

Much of the current education debate consists of a constant, ongoing argument about the role of poverty. One “side” is accused of using poverty as an excuse for not improving schools, and of saying that poverty is destiny in regard to educational outcomes. The other “side” is accused of completely ignoring the detrimental effects of poverty, and of arguing that market-based reforms can by themselves transform our public education system.

Both portrayals are inaccurate, and both “sides” know it, yet the accusations continue. Of course, there is a core of truth in the characterizations, but the differences are far more nuanced than either side usually acknowledges. It’s really a matter of degree. In addition to differences in the specifics of what should be done, a lot boils down to variations in how much improvement we believe can be gained by teacher-focused education reform (or by education reform in general) by itself. In other words, some people have higher expectations than others.

I have previously argued that the reasonable expectation for teacher quality-based reforms is that, if everything goes perfectly (which is far from certain), they will generate very slow, gradual improvement over a period of years and decades. This means we should make these changes, but be very careful to design them sensibly, monitor their effects, and maintain realistic expectations (for the record, I think we are, in many respects, falling short on all three counts).

But the thing that I find a little frustrating about the whole poverty/education thing is that, while nobody should use poverty as an excuse in education policy, it’s not uncommon to hear education used as an excuse, of sorts, in discussions about anti-poverty policy.

What Do Teachers Really Think About Education Reform?

There has recently been a lot of talk about teachers’ views on education policy. Many teachers have been quite vocal in their opposition to certain policies, and many more have expressed their views democratically – through their unions – especially in states where teachers have collective bargaining rights.

We should listen carefully to these views, but it’s also important to bear in mind that there are millions of public school teachers out there, with a wide variety of opinions on any particular education policy, and not all of their voices might be getting through.

So, the question remains: How do most teachers feel about the current wave of education policy reforms spreading throughout states and districts, including (but not at all limited to) merit pay, eliminating tenure and incorporating test-based measures into teacher evaluations?

The logical mechanism by which we might learn more about teachers’ views on these policies is, of course, a survey. Unfortunately, useful national surveys are quite rare. In order to get accurate estimates, you need an unusually large number of teachers to take the survey (a deliberate "oversample"), and they must be randomly selected (to avoid selection bias). In my last post, I suggested that states/districts conduct their own teacher surveys. In the meantime, some national evidence is already available, and if the data make one thing clear, it’s that we need more. When it comes to supporting or opposing different policies, teachers’ opinions, like everyone’s, depend a great deal on the details.