The Ratings Game: New York City Edition

Gotham Schools reports that the New York City Department of Education rolled out this year’s school report card grades by highlighting the grades’ stability between this year and last. That is, they argued that schools’ grades were roughly the same between years, which is supposed to serve as evidence of the system’s quality.

The city’s logic here is generally sound. As I’ve noted before, most schools don’t undergo drastic changes in their operations over the course of a year, and so fluctuations in grades among a large number of schools might serve as a warning sign that there’s something wrong with the measures being used. Conversely, it’s not unreasonable to expect from a high-quality rating system that, over a two-year period, some schools would get higher grades and some lower, but that most would stay put. That was the city’s argument this year.

The only problem is that this wasn’t really the case.

The NYCDOE’s specific evidence of system “stability” is that the vast majority of schools (about 90 percent) either received the same grade in 2010-11 as in 2009-10, or “moved” only one grade level, whether up or down (e.g., from D to C or A to B).

Notice how the “or” in that statement masks how many schools changed one grade by lumping them together with schools that didn’t change at all. This is important because there are only five possible grades (A, B, C, D and F) in the NYC system, and over 90 percent of schools received an A, B or C. Given this distribution, a change of one grade is rather large. At the very least, you certainly can’t group together schools that changed one grade with those that didn’t change, and portray that percentage as a stability rate.

So, let’s take a look at the actual stability rates. Using data from the NYCDOE website, here’s the simple breakdown in the “movement” of grades among schools that got a grade both this year and last.

As you can see, it’s true that roughly nine out of ten schools “moved” one grade or fewer (up or down), but it’s also true that only about 45 percent of schools’ ratings were stable, while most schools’ grades (about 55 percent) were different in 2010-11 compared with 2009-10.
(Side note: The year-to-year stability of the city’s grades was almost identical to that among public schools in Ohio, as I showed in this post.)
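
For readers who want to run this kind of breakdown themselves, here is a minimal sketch of the tabulation. It separates schools whose grade did not change from those that moved one grade, rather than lumping the two together. The function name and the ten grade pairs are made up for illustration; they are not the NYCDOE data.

```python
from collections import Counter

# Grades ordered from lowest to highest so that movement can be measured in steps.
GRADE_ORDER = ["F", "D", "C", "B", "A"]
GRADE_RANK = {grade: rank for rank, grade in enumerate(GRADE_ORDER)}


def movement_breakdown(pairs):
    """Return the share of schools with the same grade, a one-grade move,
    or a move of two or more grades.

    `pairs` is an iterable of (last_year_grade, this_year_grade) tuples.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one school with a grade in both years")

    counts = Counter()
    for previous, current in pairs:
        steps = abs(GRADE_RANK[current] - GRADE_RANK[previous])
        if steps == 0:
            counts["same"] += 1
        elif steps == 1:
            counts["moved_one"] += 1
        else:
            counts["moved_two_plus"] += 1

    total = len(pairs)
    return {key: counts[key] / total
            for key in ("same", "moved_one", "moved_two_plus")}


# Hypothetical grade pairs for ten schools (illustrative only, not real data).
toy_pairs = [
    ("A", "A"), ("B", "B"), ("C", "C"), ("B", "B"),              # unchanged
    ("B", "A"), ("A", "B"), ("C", "B"), ("B", "C"), ("D", "C"),  # moved one grade
    ("A", "C"),                                                  # moved two grades
]

shares = movement_breakdown(toy_pairs)
within_one = shares["same"] + shares["moved_one"]
print(f"same grade: {shares['same']:.0%}")
print(f"same grade or moved one: {within_one:.0%}")
```

In this toy example, 90 percent of schools are “within one grade,” but only 40 percent actually kept the same grade. The two figures describe the same data and give very different impressions.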

Look – as stated above, it’s reasonable to expect a degree of instability from even a well-designed system, and I can’t say whether these results are “good” or “bad” by any absolute standard. Designing school ratings systems is not an exact science, and it’s tough to separate the “real” improvement (or degradation) from the various forms of error that might emerge.

What I can say is that the level of year-to-year stability summarized in the graph is most definitely not something that I would hold up as evidence of the system’s quality. If anything, I would say that most schools getting different grades is a cause for concern, not celebration. I’m frankly surprised the city chose to do this.

This is the second time during the past three months that the city has characterized its performance data in a misleading fashion. In August, Mayor Bloomberg called the city’s test results “dramatic progress,” though they were neither dramatic nor progress. Of course, such misrepresentation is hardly limited to New York, and it is in large part attributable to a political environment in which test scores can make or break the careers of elected officials, superintendents and teachers. When the stakes are that high, presenting data can become a political exercise, and that’s precisely the opposite of what it’s supposed to be.

- Matt Di Carlo


Post-script: Although this post addresses New York City’s claims of stability using schools’ letter grades as their evidence, it’s also worth noting that the grades are derived from actual index scores. The scores range from zero to 100. The correlation between this year’s scores and last year’s is 0.61 (r² = 0.37, p < .01). In the context of this post, I would interpret this as a modest relationship (see here for a related debate).
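
As a rough illustration of how those two figures relate, the sketch below computes a Pearson correlation by hand and squares it. The helper name is my own, and the only number taken from the post is the reported correlation of 0.61; no school-level scores are reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, with at least two values")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)

    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / math.sqrt(sxx * syy)


# The correlation reported in the post-script, and its square.
reported_r = 0.61
r_squared = reported_r ** 2
print(f"r = {reported_r:.2f}, r squared = {r_squared:.2f}")
```

Squaring 0.61 gives roughly 0.37, which is to say that last year’s scores account for a little over a third of the variation in this year’s scores. Applying `pearson_r` to the two years of actual index scores would be the way to check the 0.61 figure itself.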