Data Driving: At The Intersection Of Arbitrary And Meaningful
In his State of the City address last month, New York City Mayor Michael Bloomberg made some brief comments about the upcoming adoption of new assessments aligned with the Common Core State Standards (CCSS), including the following statement:
But no matter where the definition of proficiency is arbitrarily set on the new tests, I expect that our students’ progress will continue outpacing the rest of the State’s[,] the only meaningful measurement of progress we have.
On the surface, this may seem like just a little bit of healthy bravado. But a few things about this single sentence struck me, and the sentence also helps to illustrate an important point about the relationship between standards and testing results.
The first thing to note is the reference to the definition of proficiency (i.e., the choice of cut score above which students are deemed “proficient”) as “arbitrary.” On the one hand, Mayor Bloomberg is absolutely correct. The specification of cut scores, to put it mildly, is not an exact science.
On the other hand, the dismissal of these threshold choices as “arbitrary” is more than a little strange coming from a mayor who relies so heavily on proficiency rates to judge the performance of schools. Indeed, in the very same sentence, he refers to the changes in these rates vis-à-vis the state’s as “the only meaningful measurement of progress we have.”
If proficiency thresholds are “arbitrary,” how can they be the basis for the “only meaningful measurement” the city has?
One plausible explanation for this seemingly contradictory statement is that Mayor Bloomberg thinks that changes in proficiency rates bear no relationship to where the proficiency line is set. In other words, he seems to assume that adopting the new standards will cause a downward shift in proficiency rates in the first year, but will have no impact on rate changes going forward (or on the comparison of the city’s changes with those of the state). I suspect that this belief is common.
In reality, though, even if the tests themselves don’t change in terms of content, where one sets the bar can, by itself, have a big impact on the trajectory of proficiency rates. To illustrate this, the graph below is taken from this terrific article by Andrew Dean Ho, which was published in the journal Educational Researcher.
Here we have six illustrative states in which the bar for proficiency is set at different levels. State A has the lowest cut score and thus the highest proficiency rates, while the opposite is true for state F.
The lines represent each state’s simulated trajectory between 2006 and 2014. Clearly, these states are on different paths. For instance, states A and B show huge increases in the first few years, and then level off. In contrast, the rates of states E and F are quite flat at first, and then increase sharply. As Ho notes, these results might lead to much speculation about which policy changes served to bring about such meaningful differences.
The kicker, in case you haven’t already guessed, is that these six “states” are the same state, using the exact same dataset of scores. The only difference between them is the proficiency cut score. The primary reason the trends look so different is that the distribution of students’ scores around the proficiency line plays a substantial role in shaping the trajectory of changes in proficiency rates. If you change the line, you change the trajectory, whether for a school, district or entire state.
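A minimal simulation can reproduce the gist of this. It assumes, purely for illustration (this is not Ho’s data or method), that scores in every year are normally distributed with a standard deviation of 1 and that the mean rises by the same hypothetical 0.25 standard deviations every year from 2006 to 2014. The six cut scores in `CUTS` are hypothetical stand-ins for “states” A through F, all applied to the same score distributions.

```python
from statistics import NormalDist

YEARS = list(range(2006, 2015))  # 2006..2014, the span shown in the graph
START_MEAN = -1.0     # hypothetical 2006 mean score, in SD units (assumption)
GAIN_PER_YEAR = 0.25  # hypothetical, identical improvement every year (assumption)
# Hypothetical cut scores: "state" A has the lowest bar, F the highest.
CUTS = {"A": -1.5, "B": -0.9, "C": -0.3, "D": 0.3, "E": 0.9, "F": 1.5}


def proficiency_rate(mean, cut, sd=1.0):
    """Share of students scoring at or above `cut` when scores ~ Normal(mean, sd)."""
    return 1.0 - NormalDist(mean, sd).cdf(cut)


def trajectories():
    """Proficiency-rate trajectory per cut score, using the SAME yearly distributions."""
    means = [START_MEAN + GAIN_PER_YEAR * i for i in range(len(YEARS))]
    return {state: [proficiency_rate(m, cut) for m in means]
            for state, cut in CUTS.items()}


if __name__ == "__main__":
    for state, rates in trajectories().items():
        changes = [later - earlier for earlier, later in zip(rates, rates[1:])]
        path = " ".join(f"{100 * r:5.1f}" for r in rates)
        print(f"{state}: {path} | first change {100 * changes[0]:+.1f}, "
              f"last change {100 * changes[-1]:+.1f}")
```

Because the underlying improvement is identical in every year, any difference between the six trajectories comes entirely from where the cut sits relative to the bulk of the distribution. Rates move fastest when many students are close to the line: the low cut (A) gains quickly at first and then levels off, while the high cut (F) barely moves at first and then climbs sharply.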
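Thus, the effect of adopting the CCSS on proficiency rates will not be a one-shot deal, whether in New York or elsewhere. It will persist: as long as the new cut scores are in place, they will shape the year-to-year changes in rates, not just the drop in the first year.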
Now, back to the mayor’s statement. If you fail to acknowledge that the choice of thresholds affects trajectories, that rates and average scores often move in different directions, or that year-to-year cohort changes compare two different groups of students, you might very well think that rate changes are “the only meaningful measurement of progress we have.” In reality, they are not measures of “progress” at all, and they aren’t particularly meaningful for assessing district or school performance.
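To see how a rate and an average can move in opposite directions, here is a toy example. Everything in it is hypothetical: a cut score of 300 and two different cohorts of ten students each (last year’s and this year’s test takers), with invented scale scores.

```python
from statistics import mean

CUT = 300  # hypothetical proficiency cut score

# Hypothetical scale scores for two DIFFERENT cohorts of ten students each.
last_year = [240] * 5 + [305] * 5
this_year = [280] * 5 + [305] * 3 + [295] * 2


def proficiency_rate(scores, cut=CUT):
    """Fraction of students scoring at or above the cut."""
    return sum(s >= cut for s in scores) / len(scores)


for label, scores in (("last year", last_year), ("this year", this_year)):
    print(f"{label}: mean = {mean(scores):.1f}, "
          f"proficient = {proficiency_rate(scores):.0%}")
# last year: mean = 272.5, proficient = 50%
# this year: mean = 290.5, proficient = 30%
```

The average rises by 18 points because the lowest scorers are much higher this year without crossing the line, while the rate falls by 20 points because two students land just below it. A rate only registers movement across the threshold, so it ignores most of what happens in the rest of the distribution.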
In fairness, I’m nitpicking this one sentence; Mayor Bloomberg has a day job, to say the least, and strong familiarity with the fine-grained details of educational measurement is not among the requirements for that position. That said, these kinds of misconceptions, though common, are still a little disconcerting coming from individuals who are in charge of school systems, especially when, like Mayor Bloomberg, they are strong advocates for the high-stakes use of testing data.
As I’ve said before, I would have a lot more faith in “data-driven decision making” if the people making the decisions had a little more time behind the wheel.