Merit Pay: The End Of Innocence?

** Also posted here on “Valerie Strauss’ Answer Sheet” in the Washington Post

The current teacher salary scale has come under increasing fire, and for good reason. Systems where people are treated more or less the same suffer from two basic problems. First, there will always be a number of "free riders." Second, and relatedly, some people may feel their contributions aren’t sufficiently recognized. So, what are good alternatives? I am not sure; but based on decades’ worth of economic and psychological research, measures such as merit pay are not it.

Although individual pay for performance (or merit pay) is a widespread practice among U.S. businesses, the research on its effectiveness shows it to be of limited utility (see here, here, here, and here), mostly because it’s easy for its benefits to be swamped by unintended consequences. Indeed, psychological research indicates that a focus on financial rewards may serve to (a) reduce intrinsic motivation, (b) heighten stress to the point that it impairs performance, and (c) promote a narrow focus, reducing how well people do in all dimensions except the one being measured.

In 1971, a research psychologist named Edward Deci published a paper concluding that, while verbal reinforcement and positive feedback tend to strengthen intrinsic motivation, monetary rewards tend to weaken it. In 1999, Deci and his colleagues published a meta-analysis of 128 studies (see here), again concluding that, when people do things in exchange for external rewards, their intrinsic motivation tends to diminish. That is, once a certain activity is associated with a tangible reward, such as money, people will be less inclined to participate in the task when the reward is not present. Deci concluded that extrinsic rewards make it harder for people to sustain self-motivation.

Attracting The "Best Candidates" To Teaching

** Also posted here on "Valerie Strauss' Answer Sheet" in the Washington Post

One of the few issues that all sides in the education debate agree upon is the desirability of attracting “better people” into the teaching profession. While this certainly includes the possibility of using policy to lure career-switchers, most of the focus is on attracting “top” candidates right out of college or graduate school.

The common metric that is used to identify these “top” candidates is their pre-service (especially college) characteristics and performance. Most commonly, people call for the need to attract teachers from the “top third” of graduating classes, an outcome that is frequently cited as being the case in high-performing nations such as Finland. Now, it bears noting that “attracting better people," like “improving teacher quality," is a policy goal, not a concrete policy proposal – it tells us what we want, not how to get it. And how to make teaching more enticing for “top” candidates is still very much an open question (as is the equally important question of how to improve the performance of existing teachers).

In order to answer that question, we need to have some idea of whom we’re pursuing – who are these “top” candidates, and what do they want? I sometimes worry that our conception of this group – in terms of the “top third” and similar constructions – doesn’t quite square with the evidence, and that this misconception might actually be misguiding rather than focusing our policy discussions.

Comparing Teacher Turnover In Charter And Regular Public Schools

** Also posted here on “Valerie Strauss’ Answer Sheet” in the Washington Post

A couple of weeks ago, a new working paper on teacher turnover in Los Angeles got a lot of attention, and for good reason. Teacher turnover, which tends to be alarmingly high in lower-income schools and districts, has been identified as a major impediment to improvements in student achievement.

Unfortunately, some of the media coverage of this paper has tended to miss the mark. Mostly, we have seen horserace stories focusing on the fact that many charter schools have very high teacher turnover rates, much higher than most regular public schools in LA. The problem is that, as a group, charter school teachers are significantly dissimilar to their public school peers. For instance, they tend to be younger and/or less experienced than public school teachers overall; and younger, less experienced teachers tend to exhibit higher levels of turnover across all types of schools. So, if there is more overall churn in charter schools, this may simply be a result of the demographics of the teaching force or other factors, rather than any direct effect of charter schools per se (e.g., more difficult working conditions).

But the important results in this paper aren’t about the amount of turnover in charters versus regular public schools, which can be measured very easily, but rather the likelihood that similar teachers in these schools will exit.

Melodramatic

At a press conference earlier this week, New York City Mayor Michael Bloomberg announced the city’s 2011 test results. Wall Street Journal reporter Lisa Fleisher, who was on the scene, tweeted Mayor Bloomberg’s remarks. According to Fleisher, the mayor claimed that there was a “dramatic difference” between his city’s testing progress from 2010 to 2011 and that of the rest of the state.

Putting aside the fact that the results do not measure “progress” per se, but rather cohort changes – a comparison of cross-sectional data that measures the aggregate performance of two different groups of students – I must say that I was a little astounded by this claim. Fleisher was also kind enough to tweet a photograph that the mayor put on the screen in order to illustrate the “dramatic difference” between the gains of NYC students relative to their non-NYC counterparts across the state.  Here it is:

Again, Niche Reforms Are Not The Answer

Our guest author today is David K. Cohen, John Dewey Collegiate Professor of Education and professor of public policy at the University of Michigan, and a member of the Shanker Institute’s board of directors.

A recent response to my previous post on these pages helps to underscore one of my central points: If there is no clarity about what it will take to improve schools, it will be difficult to design a system that can do it. In a recent essay in the Sunday New York Times Magazine, Paul Tough wrote that education reformers who advocated "no excuses" schooling were now making excuses for reformed schools' weak performance. He explained why: "Most likely for the same reason that urban educators from an earlier generation made excuses: successfully educating large numbers of low-income kids is very, very hard."

In his post criticizing my initial essay, "What does it mean to ‘fix the system’?," the Fordham Institute’s Chris Tessone told the story of how Newark Public Schools tried to meet the requirements of a federal school turnaround grant. The terms of the grant required that each of three failing high schools replace at least half of its staff. The schools, he wrote, met this requirement largely by swapping a portion of their staffs with one another, a process which Tessone and school administrators refer to as the “dance of the lemons.” Would such replacement be likely to solve the problem?

Even if all of the replaced teachers had been weak (which we do not know), I doubt that such replacement could have done much to help.

If Gifted And Talented Programs Don't Boost Scores, Should We Eliminate Them?

In education policy debates, the phrase “what works” is sometimes used to mean “what increases test scores." Among those of us who believe that testing data have a productive role to play in education policy (even if we disagree on the details of that role), there is a constant struggle to interpret test-based evidence properly and put it in context. This effort to craft and maintain a framework for using assessment data productively is very important but, despite the careless claims of some public figures, it is also extremely difficult.

Equally important and difficult is the need to apply that framework consistently. For instance, a recent working paper from the National Bureau of Economic Research (NBER) looked at the question of whether gifted and talented (GT) programs boost student achievement. The researchers found that GT programs (and magnet schools as well) have little discernible impact on students’ test score gains. Another recent NBER paper reached the same conclusion about the highly-selective “exam schools” in New York and Boston. Now, it’s certainly true that high-quality research on the test-based effect of these programs is still somewhat scarce, and these are only two (as yet unpublished) analyses, but their conclusions are certainly worth noting.

Still, let’s speculate for a moment: Let’s say that, over the next few years, several other good studies also reached the same conclusion. Would anyone, based on this evidence, be calling for the elimination of GT programs? I doubt it. Yet, if we applied faithfully the standards by which we sometimes judge other policy interventions, we would have to make a case for getting rid of GT.

In The Classroom, Differences Can Become Assets

Author, speaker and education expert Sir Ken Robinson argues that today’s education system is anachronistic and needs to be rethought. Robinson notes that our current model, shaped by the industrial revolution, reveals a "production line" approach: for example, we group kids by "date of manufacture", instruct them "by batches", and subject them all to standardized tests. Yet, we often miss the most fundamental questions - for example, Robinson asks, "Why is age the most important thing kids have in common?"

In spite of the various theories about the stages of cognitive development (Piaget, etc.), it is difficult to decide how to group children. Academically and linguistically diverse classrooms have become a prevalent phenomenon in the U.S. and other parts of the world, posing important challenges for educators whose mission is to support the learning of all students.

It’s not only that children are dissimilar in terms of their interests, ethnicity, social class, skills, and other attributes; what’s even more consequential is that human interactions are built on the basis of those differences. In other words, individuals create patterns of relations that reflect and perpetuate social distinctions.

Matt Damon, Jon Stewart And The "Teacher In The Family Effect"

Over the past year or so, two high-profile celebrities – Jon Stewart and Matt Damon – have expressed skepticism about the market-based education reform policies currently spreading throughout the U.S. One cannot help but notice that they share one characteristic that they both acknowledge has helped to guide their opinions: Their mothers were both PK-12 educators. I’m also the son of a teacher and I know that this has had a substantial effect on my opinions about public education. No doubt the same is true of people who are married to teachers.

It’s hardly surprising that your occupation can help to influence the views of your family members, especially those pertaining directly to that career (e.g., education policy and teachers’ families). But I found myself wondering if there was some way to get a sense of just how strong this “effect” might be. In other words, how much more likely are non-teachers from “teacher families” – those with a mother, father, or spouse who is a K-12 teacher – to hold different views toward education policy, compared with non-teachers who don’t have any teachers in their immediate families?

Let’s take a very quick look.

Underestimating Context (But Selectively)

Imagine that for some reason you were lifted out of your usual place in society and dropped into somebody else’s spot: the place of someone whose behavior you have never understood. For example, you are an anarchist who suddenly becomes a top cabinet member. Or you are an environmentalist, critical of big business, who suddenly becomes responsible for developing environmental policy for ExxonMobil or BP.

As systems thinker Donella Meadows points out in her book Thinking in Systems, in any given position, "you experience the information flows, the incentives and disincentives, the goals and discrepancies, the pressure […] that goes with that position." It’s possible, but highly unlikely, that you might remember how things looked from where you were before. If you become a manager, you’ll probably see labor less as a deserving partner, and more as a cost to be minimized. If you become a labor leader, every questionable business decision will start to seem like a deliberate attack on your members.

How do we know?

The best psychological experiments ask questions about human nature. What makes a person strong? Or evil? Are good and evil dispositional, hardwired traits, permanent once unleashed? Or is there something about the situations in which people find themselves that influences their behavior?

Success Via The Presumption Of Accuracy

In our previous post, Professor David K. Cohen argued that reforms such as D.C.’s new teacher evaluation system (IMPACT) will not by themselves lead to real educational improvement, because they focus on the individual rather than systemic causes of low performance. He framed this argument in terms of the new round of IMPACT results, which were released two weeks ago. While the preliminary information was limited, it seems that the distribution of teachers across the four ratings categories (highly effective, effective, minimally effective, and ineffective) was roughly similar to last year’s - including a small group of teachers fired for receiving the lowest “ineffective” rating, and a somewhat larger group (roughly 200) fired for having received the “minimally effective” label for two consecutive years.

Cohen’s argument on the importance of infrastructure does not necessarily mean that we should abandon the testing of new evaluation systems, only that we should be very careful about how we interpret their results and the policy conclusions we draw from them (which is good advice at all times). Unfortunately, however, it seems that caution is in short supply. For instance, shortly after the IMPACT results were announced, the Washington Post ran an editorial, entitled “DC Teacher Performance Evaluations Are Working," in which a couple of pieces of “powerful evidence” were put forward in an attempt to support this bold claim. The first was that 58 percent of the teachers who received a “minimally effective” rating last year and remained in the district were rated either “effective” or “highly effective” this year. The second was that around 16 percent of DC teachers were rated “highly effective” this year, and will be offered bonuses, which the editorial writers argued shows that most teachers “are doing a good job” and being rewarded for it.

The Post’s claim that these facts represent evidence - much less “powerful evidence” - of IMPACT’s success is a picture-perfect example of the flawed evidentiary standards that too often drive our education debate. The unfortunate reality is that we have virtually no idea whether IMPACT is actually “working," and we won’t have even a preliminary grasp for some time. Let’s quickly review the Post’s evidence.