Thinking About Tests While Rethinking Test-Based Accountability

Earlier this week, per the late-summer ritual, New York State released its testing results for the 2015-2016 school year. New York City (NYC), whose results are always the most closely watched in the state, showed a 7.6 percentage point increase in its ELA proficiency rate, along with a 1.2 percentage point increase in its math rate. These increases were roughly equivalent to the statewide changes.

City officials were quick to seize on the results, calling them “historic” and “pure hard evidence” that the city’s new education policies are working. This interpretation, while standard in the U.S. education debate, is, of course, inappropriate for many reasons, all of which we’ve discussed here countless times and will not detail again (see here). Suffice it to say that even under the best of circumstances, changes in proficiency rates are only very tentative evidence that students improved their performance over time, to say nothing of whether that improvement was due to a specific policy or set of policies.

Still, the results represent good news. A larger proportion of NYC students are scoring proficient in math and ELA than did last year. Real improvement tends to be slow and sustained, and this is improvement. In addition, the proficiency rate in NYC is now on par with the statewide rate, which is unprecedented. There are, however, a couple of additional issues with these results that are worth discussing briefly.

The first issue, one that also applies to the state as a whole, is that these results were explicitly presented by the state as not comparable between years, due to changes in test design and administration (which may be why there were large statewide increases in ELA proficiency but virtually none in math). In short, comparisons of this year’s rates with last year’s should be undertaken with extreme caution (in my opinion, they should have been ignored altogether).

Note also that the NYC proficiency rate in math was essentially flat. Even putting aside the incomparability, this makes victory dances over rate increases seem rather less than appropriate.

But the final issue here, which is far less technical, is the question of why the de Blasio/Fariña administration, which has committed itself to moving beyond the diehard test-based accountability approach that characterized the NYC education department under Mayor Bloomberg, would choose to publicize test results so forcefully as evidence of the efficacy of its policies.

Now, on the one hand, the decision to do so is easy to understand. Test scores are the coin of the realm in U.S. education. Elected and appointed officials across the nation, particularly in districts serving large proportions of disadvantaged students, are under immense pressure to “show results,” and their success or failure on this front can make or break administrations. It’s hardly out of line to publicize results when they are positive. Moreover, while the U.S., in my opinion, relies a bit too heavily on test-based accountability, the fact remains that testing data, used properly, are perhaps the most important currently available means of assessing student and school performance, as well as policy effects.

That said, no matter what you think of the new NYC approach (and I think there are arguments on both sides), there was an opportunity here, not to ignore the testing results completely, but rather to present them in a manner more consistent with building the kind of comprehensive infrastructure for assessing student and school performance that NYC officials say they are attempting to build.

There might, for instance, have been more effort to say that the testing results look pretty good, but that it’s far too early to know whether the new policies are working; that education outcomes reflect the contributions of many individuals, past and present; that true policy evaluation takes time and rarely provides clear-cut answers; and, of course, that there’s much more to assessing schools and policies than testing results.

I believe this framing of the results would have been applauded by supporters as well as critics of the administration’s policy positions. Or maybe I’m just speaking for myself.
