How Can We Tell If Vouchers "Work"?

Brookings recently released an evaluation of New York City’s voucher program, the School Choice Scholarships Foundation (SCSF) program, which was implemented in the late 1990s. Voucher offers were randomized, and the authors estimated the impact of being offered a voucher (and of actually using one) on a very important medium-term outcome – college enrollment (they were also able to follow an unusually high proportion of the original voucher recipients to check this outcome).
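
As a rough illustration of how this kind of design is typically analyzed (a simplified sketch with simulated data, not the Brookings analysis itself): the random offer supports an "intent-to-treat" comparison of everyone offered a voucher against everyone not offered one, and a Wald/instrumental-variables rescaling by the take-up rate gives the effect of actually using a voucher. The take-up rate and enrollment probabilities below are entirely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000

# Randomly assigned voucher offer (the randomized "instrument").
offered = rng.integers(0, 2, size=n)

# Not everyone offered a voucher uses it; assume ~70 percent take-up (hypothetical).
used = offered * (rng.random(n) < 0.7)

# Simulated college enrollment: a 40 percent base rate, raised by 5 points for voucher users.
p_enroll = 0.40 + 0.05 * used
enrolled = rng.random(n) < p_enroll

# Intent-to-treat (ITT): difference in enrollment rates by offer status.
itt = enrolled[offered == 1].mean() - enrolled[offered == 0].mean()

# Effect of voucher use (Wald estimator): ITT divided by the difference in take-up rates.
take_up_diff = used[offered == 1].mean() - used[offered == 0].mean()
effect_of_use = itt / take_up_diff

print(f"ITT effect of the offer:      {itt:+.3f}")
print(f"Effect of voucher use (Wald): {effect_of_use:+.3f}")
```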

The short version of the story is that, overall, the vouchers didn’t have any statistically discernible impact on college enrollment. But, as is often the case, there was some underlying variation in the results, including positive estimated impacts among African-American students, which certainly merit discussion.*

Unfortunately, such nuance was not always evident in the coverage of and reaction to the report, with some voucher supporters (strangely, given the results) exclaiming that the program was an unqualified success, and some opponents questioning the affiliations of the researchers. For my part, I’d like to make a quick, not-particularly-original point about voucher studies in general: Even the best of them don’t necessarily tell us much about whether “vouchers work."

Trial And Error Is Fine, So Long As You Know The Difference

It’s fair to say that improved teacher evaluation is the cornerstone of most current education reform efforts. Although very few people have disagreed on the need to design and implement new evaluation systems, there has been a great deal of disagreement over how best to do so – specifically with regard to the incorporation of test-based measures of teacher productivity (i.e., value-added and other growth model estimates).

The use of these measures has become a polarizing issue. Opponents tend to object adamantly to any degree of incorporation, while many proponents do not consider new evaluations meaningful unless test-based measures carry major weight (say, at least 40-50 percent). Despite the air of certainty on both sides, this debate has proceeded largely on speculation: the new evaluations are just getting up and running, and there is virtually no evidence on their effects under actual high-stakes implementation.
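
To make the weighting question concrete, here is a minimal sketch of the kind of composite scoring scheme being debated, in which a test-based measure counts for a fixed share of a teacher's overall rating. The weights, score scales, and rating cutoffs are entirely hypothetical, not those of any actual evaluation system.

```python
from dataclasses import dataclass

@dataclass
class TeacherScores:
    value_added: float   # test-based growth estimate, rescaled to 0-100 (hypothetical scale)
    observation: float   # classroom observation rating, rescaled to 0-100 (hypothetical scale)

def composite_score(scores: TeacherScores, test_weight: float = 0.40) -> float:
    """Weighted average of the test-based and observation components."""
    return test_weight * scores.value_added + (1 - test_weight) * scores.observation

def rating(score: float) -> str:
    """Map a composite score to a summative rating (illustrative cutoffs only)."""
    if score >= 85:
        return "highly effective"
    if score >= 65:
        return "effective"
    if score >= 50:
        return "developing"
    return "ineffective"

teacher = TeacherScores(value_added=58.0, observation=82.0)
score = composite_score(teacher, test_weight=0.40)   # the contested 40 percent share
print(f"composite = {score:.1f}, rating = {rating(score)}")
```

Under these made-up numbers, shifting the test-based weight up or down by even ten points can move a teacher across a rating cutoff, which is precisely why the choice of weight is so contested.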

For my part, I’ve said many times that I'm receptive to trying value-added as a component in evaluations (see here and here), though I disagree strongly with the details of how it’s being done in most places. But there’s nothing necessarily wrong with divergent opinions about an untested policy intervention, or with trying one. There is, however, something wrong with fully implementing such a policy without adequate field testing, or at least without ensuring that the costs and effects will be carefully evaluated post-implementation. To date, virtually no state or district that I'm aware of has mandated a large-scale, independent evaluation of its new system.*

If this is indeed the case, the breathless, speculative debate happening now will only continue in perpetuity.

What The "No Excuses" Model Really Teaches Us About Education Reform

** Also posted here on “Valerie Strauss’ Answer Sheet” in the Washington Post

In a previous post, I discussed “Apollo 20," a Houston pilot program in which a group of low-performing regular public schools are implementing the so-called “no excuses” education model common among high-profile charter schools such as KIPP. In the Houston implementation, “no excuses” consists of five basic policies: a longer day and year, resulting in 21 percent more school time; different human capital policies, including performance bonuses and firing and selectively rehiring all principals and half of teachers (the latter is one of the "turnaround models" being pushed by the Obama Administration); extensive 2-on-1 tutoring; regular assessments and data analysis; and “high expectations” for behavior and achievement, including parental contracts.

A couple of weeks ago, Harvard professor Roland Fryer, the lead project researcher, released the results of the pilot’s first year. I haven’t seen much national coverage of the report, but I’ve seen a few people characterize the results as evidence that “‘no excuses’ works in regular public schools.” Now, it’s true that there were effects – strong in math – and that the results appear to be robust across different model specifications.

But, when it comes to the question of whether “no excuses works," the reality is a bit more complicated. There are four main things to keep in mind when interpreting the results of this paper, a couple of which bear on the larger debate about "no excuses" charter schools and education reform in general.

When It Comes To How We Use Evidence, Is Education Reform The New Welfare Reform?

** Also posted here on “Valerie Strauss’ Answer Sheet” in the Washington Post

In the mid-1990s, after a long and contentious debate, the U.S. Congress passed the Personal Responsibility and Work Opportunity Reconciliation Act of 1996, which President Clinton signed into law. It is usually called the “Welfare Reform Act," as it effectively ended the Aid to Families with Dependent Children (AFDC) program (which is what most people mean when they say “welfare," even though it was [and its successor is] only a tiny part of our welfare state). Established during the New Deal, AFDC was mostly designed to give assistance to needy young children (it was later expanded to include support for their parents/caretakers as well).

In place of AFDC was a new program – Temporary Assistance for Needy Families (TANF). TANF gave block grants to states, which were directed to design their own “welfare” programs. Although the states were given considerable leeway, their new programs were to have two basic features: first, recipients had to be working in order to receive benefits; and second, there was to be a time limit on benefits, usually 3-5 years over a lifetime, after which individuals were no longer eligible for cash assistance (states could exempt a proportion of their caseload from these requirements). The general idea was that time limits and work requirements would “break the cycle of poverty”; recipients would be motivated (read: forced) to work, and in doing so, would acquire the experience and confidence necessary for a bootstrap-esque transformation.

There are several similarities between the bipartisan welfare reform movement of the 1990s and the general thrust of the education reform movement happening today. For example, there is the reliance on market-based mechanisms to “cure” longstanding problems, and the unusually strong liberal-conservative alliance of the proponents. Nevertheless, while calling education reform “the new welfare reform” might be a good sound bite, it would also take the analogy way too far.

My intention here is not to draw a direct parallel between the two movements in terms of how they approach their respective problems (poverty/unemployment and student achievement), but rather in how we evaluate their success in doing so. In other words, I am concerned that the manner in which we assess the success or failure of education reform in our public debate will proceed using the same flawed and misguided methods that were used by many for welfare reform.

A Matter Of Time

Extended school time is an education reform option that seems to be gaining in popularity. President Obama gave his endorsement earlier this year, while districts such as DCPS have extended-time legislation under consideration.

The idea is fairly simple: Make the school day and/or year longer, so kids will have more time to learn. Unlike many of the policy proposals flying around these days, it’s an idea that actually has some basis in research. While more time by itself yields negligible improvements in achievement, there is some evidence (albeit mixed) that additional time devoted to “academic learning” can have a positive effect, especially for students with low initial test scores. So, more time may have benefits (at least in terms of test scores), but the time must be used wisely.

Still, extending school days/years, like all policy options, must of course be evaluated in terms of cost-effectiveness. Small increases, such as adding a few days to the school calendar, are inconsistently and minimally effective, while larger increases in school time are an expensive intervention that must be weighed against alternatives, as well as against the fact that states and districts, facing a few more years of fiscal crisis, are cutting other potentially effective programs.

Three Questions For Those Who Dismiss The Nashville Merit Pay Study

The reaction from many performance pay advocates to the Nashville evaluation released last week has been that the study is relatively meaningless (see here and here for examples).  The general interpretation: The results show that the pay bonuses do not improve student achievement, but short-term test score gains are not the "true purpose" of these incentive programs. What they are really supposed to improve, so the line goes, is the quality of people who pursue teaching as a career, as well as their retention rates.

While I disagree that the findings are not important (they are, if for no other reason than that they discredit the idea that teachers are holding their effort hostage for higher pay), I am sympathetic to the view that the study didn’t tackle the big issues. Attracting the best possible people into the profession – and keeping them there – are much more meaningful goals than short-term test score gains, and they are not addressed in this study (though some results for retention are reported).

But this argument also raises a few important questions that I hope we can answer before the Nashville study fades into evaluation oblivion. I have three of them.

Persistently Low-Performing Incentives

Today, the National Center on Performance Incentives (NCPI) and the RAND Corporation released a long-awaited experimental evaluation of teacher performance pay in Nashville, Tenn. It finds that performance bonuses have virtually no effect on student math test scores (there were small but statistically significant gains among fifth graders, but only in two of the three years examined, and the gains did not persist into sixth grade).
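
For readers curious about the mechanics, the core of a randomized evaluation like this one is a comparison of outcomes between those assigned to the bonus condition and those assigned to the control condition. The sketch below is not the NCPI/RAND analysis – it omits clustering by teacher, covariates, and the multi-year design – and all of its data are simulated under a true effect of zero, roughly consistent with the headline finding.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated student math scores (standardized scale) under bonus-eligible
# and control teachers; the true difference here is set to zero.
bonus_group = rng.normal(loc=0.0, scale=1.0, size=1_500)
control_group = rng.normal(loc=0.0, scale=1.0, size=1_500)

# Estimated treatment effect: simple difference in mean scores.
effect = bonus_group.mean() - control_group.mean()

# Two-sample t-test for whether the difference is statistically discernible.
t_stat, p_value = stats.ttest_ind(bonus_group, control_group)

print(f"estimated effect: {effect:+.3f} SD (t = {t_stat:.2f}, p = {p_value:.2f})")
```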

Since this is such a politically contentious issue, these findings are likely to spark a lot of posturing and debate. So it’s worth trying to put them in context. As I discussed in a prior post, we now have at least preliminary results from three randomized experimental evaluations of merit pay in the U.S., the first contemporary, high-quality evidence of its kind. This Nashville report and the two previously released studies – one from Chicago and one from New York City's schoolwide bonus program – reached the same conclusion: Performance bonuses for teachers have little or no discernible effect on student test scores.

There are caveats – the NYC and Chicago findings are preliminary (those evaluations are still in progress), the NYC program provides schoolwide rather than individual bonuses, and one additional study (Round Rock, Tex.) has yet to be released – but the three reports released so far do represent a fairly impressive, though still very tentative, body of evidence on merit pay’s utility as a means of improving test scores.

And at this point, it’s a good bet that, when all the evaluations are final and the smoke has cleared, we will have to conclude that performance bonuses are, at the very least, an unpromising policy for producing short-term test score gains.

Performance Pay On (Randomized) Trial

This is an exciting time for those of us who are strange enough to find research on teacher performance pay exciting. It is also, most likely, an anxious time for those with unyielding faith in its effectiveness. From all the chatter on performance incentives, and all the money we are putting into encouraging them, one might think they are a sure bet to work. But there's actually very little good evidence on their effects in the U.S. As with a lot of education policy in fashion today, investing in performance pay is a leap of faith.

But now, just in time to be way too late, four high-quality evaluations of teacher performance pay programs are in progress – the first large-scale experimental studies of how these bonuses affect performance in the U.S.