Immediate Gratification And Education Policy

A couple of months ago, Bill Gates said something that received a lot of attention. With regard to his foundation’s education reform efforts, which focus most prominently on teacher evaluations, but encompass many other areas, he noted, “we don’t know if it will work.” In fact, according to Mr. Gates, “we won’t know for probably a decade.”

He’s absolutely correct. Most education policies, including (but not limited to) those geared toward shifting the distribution of teacher quality, take a long time to work (if they do work), and the research assessing these policies requires a great deal of patience. Yet so many of the most prominent figures in education policy routinely espouse the opposite viewpoint: Policies are expected to have an immediate, measurable impact (and their effects are assessed in the crudest manner imaginable).

A perfect example was the reaction to the recent release of results of the National Assessment of Educational Progress (NAEP).

Several state and district educational agency leaders (not to mention dozens of advocates) were quick to attribute increases in average scores to recent policy changes. This is, I suppose, predictable – these are highly politicized offices, and one might expect their incumbents to take every opportunity to score points. It is, however, unbecoming of their important positions, and does a disservice to the public.

Some of the worst commentary this year came from the editorial boards of the nation’s largest newspapers. In particular, the New York Times, Washington Post and Wall Street Journal all published editorials that could easily serve as a lesson in an undergraduate research methods course – a lesson entitled, “How not to use evidence.” All three editorials made quite bold – and totally inappropriate – claims about how the NAEP scores in places such as Tennessee and D.C. showed that the reforms in those states were working.

To be clear, it is entirely possible that part of the increases in NAEP scores in places like Tennessee and D.C. was indeed due to the reforms in those states (including less discussed policies, such as the expansion of pre-K in D.C.). But the increases themselves cannot be used to support that supposition, and too many of the people suggesting otherwise should know better. Good research takes time (and it will not consist of unadjusted NAEP cohort changes).

If we rely on bad evidence, we will draw incorrect conclusions. That is dangerous at any time, but it is particularly risky now, given the pace and extent of educational policy change.

Even more generally, though, there is serious cause for concern here regarding expectations for how quickly policies can work. For example, if new teacher evaluations are eventually going to be effective in improving instruction and the quality of the teacher workforce, it will take time for these effects to show up in overall testing results. The most meaningful and persistent impact will likely be in the slow, steady improvement of teachers’ performance and, perhaps, in the attraction of strong candidates to the profession. The reasonable bet is that the impact this will have on aggregate student performance will be modest and cumulative over many years (and remember that this is no guarantee, as these are largely untried policies).

When supporters of a given policy expect immediate gratification – or, by the way, when a policy’s opponents trumpet poor test results as evidence that the interventions don’t work – nobody wins. We risk ending policies that are working simply because we couldn’t muster the patience to give them time to have an effect, and, just as hazardously, we will double down on policies that aren’t really doing any good.

So, in this case, we should listen to Bill Gates, and his admonition that policies take time to work, as does the research on whether or not they are working.

- Matt Di Carlo


The frustration comes from high-stakes decisions being tied to unproven education reforms.


Agree with you, Matt.

But what do you suggest as an alternative?

Let's use Tennessee as an example. Kevin Huffman is getting killed by critics.

I think we can safely say that they'd use any data, whether DiCarlo approved or not, to bash him if it's trending down. Agreed?

Is your advice along the lines of "Kevin, while you may be unfairly assailed using spurious data, you cannot use better-but-still-not-good-enough data to defend yourself"?