New York State is set to release its annual testing data today. Throughout the state, and especially in New York City, we will hear a lot about changes in school and district proficiency rates. The rates themselves have advantages – they are easy to understand, comparable across grades and reflect a standards-based goal. But they also suffer severe weaknesses, such as their sensitivity to where the bar is set and the fact that proficiency rates and the actual scores upon which they’re based can paint very different pictures of student performance, both in a given year and over time. I’ve discussed this latter issue before in the NYC context (and elsewhere), but I’d like to revisit it quickly.
Proficiency rates can only tell you how many students scored above a certain line; they are completely uninformative as to how far above or below that line the scores might be. Consider a hypothetical example: A student who is rated as proficient in year one might make large gains in his or her score in year two, but this would not be reflected in the proficiency rate for his or her school – in both years, the student would just be coded as “proficient” (the same goes for large decreases that do not “cross the line”). As a result, across a group of students, the average score could go up or down while proficiency rates remained flat or moved in the opposite direction. Things are even messier when data are cross-sectional (as public data almost always are), since you’re comparing two different groups of students (see this very recent NYC IBO report).
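To make the hypothetical concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the cut score of 650, the two sets of five scores, and the function name `proficiency_rate` are assumptions, not actual New York State data or methods.

```python
from statistics import mean

# Hypothetical proficiency cut score (an assumption for illustration only).
CUT_SCORE = 650


def proficiency_rate(scores, cut=CUT_SCORE):
    """Return the percent of students scoring at or above the cut score."""
    return 100 * sum(s >= cut for s in scores) / len(scores)


# Invented scale scores for one grade in one school, two successive years.
year1 = [640, 648, 652, 655, 660]
year2 = [630, 645, 649, 690, 700]

score_change = mean(year2) - mean(year1)
rate_change = proficiency_rate(year2) - proficiency_rate(year1)

print(f"Change in average score: {score_change:+.1f}")
print(f"Change in proficiency rate: {rate_change:+.1f} points")
```

In this made-up grade, two students well above the line gain a lot, which pulls the average up, while one student slips just below the line, which pulls the rate down. The two measures move in opposite directions even though they come from the same scores.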
Let’s take a rough look at how frequently rates and scores diverge in New York City.
Unlike rates (and unlike the hypothetical example above), scale scores on New York State’s tests are only comparable within grades – e.g., fourth graders in one year can be compared with fourth graders the next year, but not with students in any other grade. This means we have to compare scores and rates not only school-by-school, but grade-by-grade as well.
In the scatterplot below, each dot (roughly 4,000 of them) is a single tested grade in a single school. The horizontal axis is the change in this grade’s ELA proficiency rate between 2010 and 2011. The vertical axis is the change in its actual scale score. The red lines in the middle of the graph are zero change. So, if the dot is located in the upper right quadrant, both the score and the rate increased. If it’s in the lower left quadrant, both decreased.
You can see that most of the dots are in one of these two quadrants, as we would expect. When scores increase or decrease, rates also tend to increase or decrease. But not always. You’ll also notice that there are a large number of dots in the upper left and lower right quadrants of the graph (particularly the latter), which means that these ELA scores and rates moved in opposite directions.
(The scatterplot looks extremely similar for math, and it also changes little if I exclude grade/school combinations with smaller numbers of tested students.)
Let’s see if we can sum this up to give you a very rough idea of how many grade/school combinations exhibited such a trend (note that these figures don't count extremely small changes – see the first footnote for details).*
We’ll start with ELA.
Around 30 percent of grade/school groups had disparate trends in their scores and rates. In one in five cases, the two moved in opposite directions. In another 11 percent of grades, either the score or the rate moved while the other was relatively stable.
So, if you were summarizing student performance (at least cohort changes for individual grades) based solely on changes in rates between 2010 and 2011, there’s a 30 percent chance you’d reach a different conclusion if you checked the scores too.
Here’s the same graph for math.
The situation is similar. In about one in four grades, the rates and scores either moved in opposite directions, or one was stable while the other moved (once again, the figures don't change much if I exclude grades with small numbers of students).
Certainly, these results for NYC are not necessarily representative of all districts, or even of NYC results in other years. It all depends on the distribution of successive cohorts' scores vis-a-vis the proficiency line.
That said, what this shows is that changes in proficiency rates can give a very different picture than trends in average scores. They measure different things.**
Yet, when states and districts, including New York, release their testing results for the 2011-2012 school year (if they haven’t already), virtually all of the presentations and stories about these data will focus on trends in the proficiency rates, and it's not unusual to see officials issue glowing proclamations about very small changes in these rates.
If, however, policymakers and reporters fail to look at the scores too (grade-by-grade, if necessary), they risk drawing incomplete, potentially misleading conclusions about student performance. And states and districts, such as the D.C. Public Schools, that don’t release their scores should do so.
- Matt Di Carlo
* Since many of the grades are composed of relatively small samples, I (somewhat crudely) coded minor changes in the rate or score (one point or less) as “stable,” so as to avoid characterizing grades as moving in opposite directions when the changes are very small. If, for example, the score and rate move in opposite directions, but both changes are small, this does not “count” as divergence. Furthermore, if it just so happens that a “stable” score or rate (i.e., one that changes only a little) moves in the same direction as a larger change in the other measure, that grade/school change is coded as convergent. This recoding reduces the number of divergent changes by a small but noteworthy amount.
** I would argue that the scores are better for looking at changes over time, since they give a better idea of the change in performance of the "typical student." But, as mentioned above, scores usually aren't comparable between grades, so score changes would have to be examined on a grade-by-grade basis. In addition, of course, the scores don't mean much to most people.
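For readers who want to try this with their own data, the coding rule described in the first footnote could be sketched roughly as follows. This is my reading of that rule rather than the exact procedure behind the figures above: the function name `classify`, the category labels, and the treatment of a change of exactly zero (counted as stable, and not as moving in the same direction as anything) are all assumptions.

```python
# "One point or less" is treated as stable, per the first footnote.
STABLE_THRESHOLD = 1.0


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def classify(rate_change, score_change, threshold=STABLE_THRESHOLD):
    """Label one grade/school combination by how its rate and score moved."""
    rate_stable = abs(rate_change) <= threshold
    score_stable = abs(score_change) <= threshold

    # Both changes are small: never counted as divergence.
    if rate_stable and score_stable:
        return "both stable"

    # Both changes are large: compare their directions directly.
    if not rate_stable and not score_stable:
        if sign(rate_change) == sign(score_change):
            return "same direction"
        return "opposite directions"

    # Exactly one measure is stable. If its small change happens to point
    # the same way as the larger change, code the pair as convergent.
    if sign(rate_change) == sign(score_change):
        return "same direction"
    return "one stable, one moved"


# Invented (rate change, score change) pairs for illustration.
examples = [(3.0, 4.0), (-3.0, 4.0), (0.5, -0.8), (0.5, 4.0), (-0.5, 4.0)]
for rate, score in examples:
    print(rate, score, classify(rate, score))
```

Under this sketch, the "opposite directions" and "one stable, one moved" labels together correspond to what the post calls disparate trends.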