The Thrill Of Success, The Agony Of Measurement
** Reprinted here in the Washington Post
The recent release of the latest New York State testing results created a little public relations coup for the controversial Success Academies charter chain, which operates over 20 schools in New York City, and is seeking to expand.
Shortly after the release of the data, the New York Post published a laudatory article noting that seven of the Success Academies had overall proficiency rates that were among the highest in the state, and arguing that the schools “live up to their name.” The Daily News followed up by publishing an op-ed that compares the Success Academies' combined 94 percent math proficiency rate to the overall city rate of 35 percent, and uses that to argue that the chain should be allowed to expand because its students “aced the test” (this is not really what high proficiency rates mean, but fair enough).
On the one hand, this is great news, and a wonderfully impressive showing by these students. On the other, decidedly less sensational hand, it's also another example of the use of absolute performance indicators (e.g., proficiency rates) as measures of school rather than student performance, despite the fact that they are not particularly useful for the former purpose since, among other reasons, they do not account for where students start out upon entry to the school. I personally don't care whether Success Academy gets good or bad press. I do, however, believe that how one gauges effectiveness, test-based or otherwise, is important, even if one reaches the same conclusion using different measures.
It’s not possible, using available data, to check the testing performance for Success Academies’ students upon initial entry (which is kindergarten for all the schools currently operating). This is because the state begins testing in third grade. We can, however, try to get a rough idea (or, more accurately, illustrate the futility of trying) by taking a quick look at the latest (2014) math results for third graders attending Success Academies (at least those campuses for which testing data are available). The distributions are presented in the table below.
As you can see, the Success Academy third grade math proficiency rates range from 85 to 100 percent, with most of them falling above 90 percent. Moreover, huge proportions of these third graders -- between about 50 percent and almost 90 percent -- score above the cutoff for level 4 ("advanced"). Citywide, the third grade rates in 2014 were 14.8 percent (level 4) and 38.6 percent (level 3 + level 4). The comparisons are similar in previous years.
Now, it may be the case that Success Academy students enter kindergarten performing at very low levels, and that the school performs remarkable feats for its students before they reach third grade, bringing virtually all of them up to proficiency level, and a very large group to the "advanced" level 4, over that 3-4 year time period. Or, perhaps, Success Academy kindergartners, for whatever reason, enter their schools at higher levels, on average, than their NYC peers attending most other schools, and that's a big reason why the Success Academies exhibit much higher overall proficiency and level 4 rates - a larger share of the students they serve are proficient/advanced, or at least closer to the thresholds, from the start.
These two interpretations are not mutually exclusive - both can be true, and it is a matter of degree (other factors, including attrition and grade-repeating, not to mention the distortion from converting scores to these cutpoint-based rates, may also influence the results). The point here is that it's not possible to tell using simple cross-sectional rates. If you don't know where the students start, and you can't follow the same group over time, it's tough to assign any school-based meaning to where they end up (and raw proficiency rate changes are only marginally better).
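To make the distinction concrete, here is a minimal sketch using two entirely hypothetical schools. Every number in it (the cutoff of 300, the scores, the school labels) is invented for illustration and is not drawn from New York data, and the simple average gain it computes is a stand-in for, not an example of, a proper growth model.

```python
# All numbers are invented for illustration; none come from NYC or state data.
PROFICIENT_CUTOFF = 300  # hypothetical scale-score threshold for "proficient"


def proficiency_rate(scores, cutoff=PROFICIENT_CUTOFF):
    """Share of students scoring at or above the cutoff (a cross-sectional snapshot)."""
    return sum(s >= cutoff for s in scores) / len(scores)


def mean_gain(start, end):
    """Average score change for the same students, matched by list position."""
    return sum(e - s for s, e in zip(start, end)) / len(start)


# Hypothetical School A: students enter near or above the cutoff and gain little.
school_a_start = [295, 305, 310, 320, 330]
school_a_end = [300, 310, 315, 325, 335]

# Hypothetical School B: students enter far behind and gain a great deal.
school_b_start = [240, 250, 260, 270, 285]
school_b_end = [265, 275, 285, 295, 310]

rate_a = proficiency_rate(school_a_end)
rate_b = proficiency_rate(school_b_end)
gain_a = mean_gain(school_a_start, school_a_end)
gain_b = mean_gain(school_b_start, school_b_end)

print(f"School A: proficiency rate {rate_a:.0%}, mean gain {gain_a:.1f} points")
print(f"School B: proficiency rate {rate_b:.0%}, mean gain {gain_b:.1f} points")
```

In this made-up case, School A posts the far higher proficiency rate while School B produces the far larger gains. Someone looking only at the end-of-period rates would have no way of knowing that, which is the whole problem with using them to judge schools.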
At the very least, it is bad practice simply to assume that the (unobserved) performance levels of Success Academy entrants are representative of those of their peers citywide, especially when one needn't make such assumptions to get a sense of performance.*
To be clear, I am not implying that the Success Academies are low-performing. In fact, by a test-based standard of school effectiveness, for which one should be relying on growth model estimates, the Success Academies score very well (as do many other regular public and charter schools). They may or may not bring huge masses of very low-scoring students up to proficient/advanced levels by third grade, but they do seem to compel highly impressive testing increases from their students, and that should not be minimized or dismissed (rather, the question, about which I've written extensively, is how they do it). It should, however, be assessed with the best available tools (imperfect though those tools may be).
(Side note: Charter advocates may be interested to learn that the city's KIPP schools had 2014 math proficiency rates ranging from 25 to 60 percent. Should they be judged by these absolute rates?)
Finally, one might accuse me of being overly cautious. After all, if the Success Academies receive high growth model scores, and insofar as their high proficient/advanced rates at least partially reflect that growth, does it really do any harm to use the latter as evidence of their effectiveness in raising test scores? I would argue that it does. If the goal here is to learn from successful schools, we have to be very cautious -- and consistent -- in how we identify successful schools (including, hopefully, the use of non-test measures). Bad measurement leads to bad decisions and counterproductive incentives.
There are, for instance, a lot of schools, in NYC and elsewhere, that are extremely effective in terms of boosting testing outcomes, but they may not exhibit high proficiency rates for the simple reason that their students enter way behind. One can only cringe at the thought of these schools, which compel improvement from those students who most need it, being ignored in terms of what they can teach us, and perhaps even being labeled as "failing," due to nothing more than the misinterpretation of testing data.
Yet this happens every day, and it will keep happening so long as we continue to interpret testing data in a manner that fails to acknowledge the obvious fact that learning doesn't begin and end inside of school buildings.
- Matt Di Carlo
* Though there is at least some evidence that cohorts entering charter middle schools run by one similarly high-profile chain -- KIPP -- do not test at significantly higher levels than their counterparts in nearby regular public schools.