SIG And The High Price Of Cheap Evidence

A few months ago, the U.S. Department of Education (USED) released the latest data from schools that received grants via the School Improvement Grants (SIG) program. These data -- consisting solely of changes in proficiency rates -- were widely reported as an indication of “disappointing” or “mixed” results. Some went so far as to proclaim the program a complete failure.

Once again, I have to point out that this breaks almost every rule of testing data interpretation and policy analysis. I’m not going to repeat the arguments about why changes in cross-sectional proficiency rates are not policy evidence (see our posts here, here and here, or examples from the research literature here, here and here). Suffice it to say that the changes themselves are not even particularly good indicators of whether students’ test-based performance in these schools actually improved, to say nothing of whether it was the SIG grants that were responsible for the changes. There’s more to policy analysis than subtraction.
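To make the subtraction point concrete, here is a minimal sketch -- using purely hypothetical scores and a hypothetical cutoff, not real SIG data -- of how a school's cross-sectional proficiency rate can shift from one year to the next even when no individual student's performance changes at all: the tested cohort simply turns over.

```python
import random

random.seed(0)

CUTOFF = 60  # hypothetical proficiency cutoff score

# Year 1: a hypothetical cohort of 100 students scoring near the cutoff.
year1 = [random.gauss(58, 10) for _ in range(100)]

# Year 2: the tested cohort turns over (a new grade of students), drawn
# from the same underlying distribution -- no improvement is simulated.
year2 = [random.gauss(58, 10) for _ in range(100)]

def proficiency_rate(scores, cutoff=CUTOFF):
    """Share of students scoring at or above the cutoff."""
    return sum(s >= cutoff for s in scores) / len(scores)

change = proficiency_rate(year2) - proficiency_rate(year1)
print(f"Year 1 rate: {proficiency_rate(year1):.1%}")
print(f"Year 2 rate: {proficiency_rate(year2):.1%}")
print(f"Change:      {change:+.1%}")  # nonzero purely from cohort turnover
```

Run this with different seeds and the "change" bounces around zero, even though, by construction, nothing about student performance has improved or declined. Schools with rates near the cutoff and relatively small tested cohorts are especially prone to this kind of noise, which is one reason raw rate changes tell us so little on their own.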

So, in some respects, I would like to come to the defense of Secretary Arne Duncan and USED right now -- not because I’m a big fan of the SIG program (I’m ambivalent at best), but rather because I believe in strong, patient policy evaluation, and these proficiency rate changes are virtually meaningless. Unfortunately, however, USED was itself the first to portray these rate changes, albeit very cautiously, as evidence of SIG’s impact. In doing so, they provided a very effective example of why relying on bad evidence is a bad idea even when it supports your desired conclusions.

To make a long story short, last year, USED released the first year of proficiency rate changes for schools that received SIG grants. The results suggested that there were rate increases among most SIG schools (note that the data they presented were even less useful than this year’s, insofar as they didn’t even bother to compare the SIG schools to non-SIG schools). The USED press release, in fairness, was not at all reckless in its interpretations. It portrayed the data release as part of an effort to be transparent, and included warnings about drawing causal inferences from the data (though much of this focused on the assumption that it was "too early" to draw conclusions, which is misleading insofar as the raw changes couldn't be used to assess SIG regardless of how many years of data were available).

Nevertheless, the release stated that the results showed "positive momentum and progress in many SIG schools," and was very clearly intended to suggest that the program was working.

(Side note: As mentioned last week, I think there's also an important underlying discussion here about expectations of how much and how quickly programs such as SIG would produce results.)

This year, the "results" look less favorable, which puts the Department in the somewhat uncomfortable position of having borderline endorsed a deeply flawed approach to program evaluation that can now be used to argue that SIG isn't working. The statement accompanying the second data release did its best to cast the changes in a positive light, including a spotlight on three individual SIG schools, but the overall description was of "incremental" progress. 

In other words, it is dangerous to rely on bad evidence over the medium and long term, so you should avoid using it even when it helps you in the short term.

The SIG program is a complex intervention that requires complex analysis. It will be a few years before there is enough decent evidence to draw conclusions about the program’s impact (in the meantime, however, there is opportunity for solid analysis, such as that presented in this paper).

In part, this is because good research takes time, but it’s also because policies take a while to be implemented and to have a measurable impact, particularly when the intervention in question is as drastic as this one (it is, in my view, a little strange that anyone would think a complete overhaul of a school could be evaluated as a success or failure after a couple of years). There are no shortcuts here.

- Matt Di Carlo