Fundamental Flaws In The IFF Report On D.C. Schools

A new report, commissioned by District of Columbia Mayor Vincent Gray and conducted by the Chicago-based consulting organization IFF, was supposed to provide guidance on how the District might act and invest strategically in school improvement, including optimizing the distribution of students across schools, many of which are either over- or under-enrolled.

Needless to say, this is a monumental task. It entails not only identifying high- and low-performing schools, but also developing plans for improving them. Even the most rigorous efforts to achieve these goals, especially in a large city like D.C., would be to some degree speculative and error-prone.

This is not a rigorous effort. IFF’s final report is polished and attractive, with lovely maps and color-coded tables presenting a lot of summary statistics. But there’s no emperor underneath those clothes. The report's data and analysis are so deeply flawed that its (rather non-specific) recommendations should not be taken seriously.

The Perilous Conflation Of Student And School Performance

Unlike many of my colleagues and friends, I personally support the use of standardized testing results in education policy, even, with caution and in a limited role, in high-stakes decisions. That said, I also think that the focus on test scores has gone way too far and that their use is being implemented unwisely, in many cases to a degree at which I believe the policies will not only fail to generate improvement, but may even do harm.

In addition, of course, tests have a very productive low-stakes role to play on the ground – for example, when teachers and administrators use the results for diagnosis and to inform instruction.

Frankly, I would be a lot more comfortable with the role of testing data – whether in policy, on the ground, or in our public discourse – but for the relentless flow of misinterpretation from both supporters and opponents. In my experience (which I acknowledge may not be representative of reality), by far the most common mistake is the conflation of student and school performance, as measured by testing results.

Consider the following three stylized arguments, which you can hear in some form almost every week:

Schools' Effectiveness Varies By What They Do, Not What They Are

There may be a mini-trend emerging in certain types of charter school analyses, one that seems a bit trivial but has interesting implications that bear on the debate about charter schools in general. It pertains to how charter effects are presented.

Usually, when researchers estimate the effect of some intervention, the main finding is the overall impact, perhaps accompanied by a breakdown by subgroups and supplemental analyses. In the case of charter schools, this would be the estimated overall difference in performance (usually testing gains) between students attending charters and their counterparts in comparable regular public schools.
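To make that distinction concrete, here is a stylized sketch of what an "overall impact" estimate and a subgroup breakdown look like in practice. The data are entirely hypothetical and the bare-bones regression is not the methodology of any particular study; it simply illustrates the two kinds of findings.

    # Stylized sketch with hypothetical data: an overall estimated charter
    # effect on test-score gains, plus a breakdown by a single subgroup.
    # Real studies use far richer data and more careful comparison groups.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.DataFrame({
        "gain":       [4.0, 6.5, 3.0, 5.5, 2.0, 7.0, 1.5, 4.5],
        "charter":    [1, 1, 1, 1, 0, 0, 0, 0],
        "low_income": [1, 0, 1, 0, 1, 0, 1, 0],
    })

    # Overall impact: the estimated difference in gains between charter
    # students and comparison students in regular public schools.
    overall = smf.ols("gain ~ charter", data=df).fit()
    print(overall.params["charter"])

    # Subgroup breakdown: does the estimated effect differ for low-income
    # students? The interaction term captures that difference.
    by_group = smf.ols("gain ~ charter * low_income", data=df).fit()
    print(by_group.params)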

Two relatively recent charter school reports, however – both generally well-done given their scope and available data – have taken a somewhat different approach, at least in the “public roll-out” of their results.

The Ratings Game: New York City Edition

Gotham Schools reports that the New York City Department of Education rolled out this year’s school report card grades by highlighting the grades’ stability between this year and last. That is, they argued that schools’ grades were roughly the same between years, which is supposed to serve as evidence of the system’s quality.

The city’s logic here is generally sound. As I’ve noted before, most schools don’t undergo drastic changes in their operations over the course of a year, and so fluctuations in grades among a large number of schools might serve as a warning sign that there’s something wrong with the measures being used. By the same token, it’s not unreasonable to expect from a high-quality rating system that, over a two-year period, some schools would get higher grades and some lower, but that most would stay put. That was the city’s argument this year.

The only problem is that this wasn’t really the case.

Making (Up) The Grade In Ohio

In a post last week over at Flypaper, the Fordham Institute’s Terry Ryan took a “frank look” at the ratings of the handful of Ohio charter schools that Fordham’s Ohio branch manages. He noted that the Fordham schools didn’t make a particularly strong showing, ranking 24th among the state’s 47 charter authorizers in terms of the aggregate “performance index” among the schools it authorizes. Mr. Ryan took the opportunity to offer a few valid explanations as to why Fordham ranked in the middle of the charter authorizer pack, such as the fact that the state’s “dropout recovery schools,” which accept especially hard-to-serve students who left public schools, aren’t included (which would likely bump up Fordham’s relative ranking).

Mr. Ryan doth protest too little. His primary argument, which he touches on but does not flesh out, should be that Ohio’s performance index is more a measure of student characteristics than of any defensible concept of school effectiveness. By itself, it reveals relatively little about the “quality” of schools operated by Ohio’s charter authorizers.

But the limitations of measures like the performance index, which are discussed below (and in the post linked above), have implications far beyond Ohio’s charter authorizers. The primary means by which Ohio assesses school/district performance is the state’s overall “report card grades,” which are composite ratings composed of multiple test-based measures, including the performance index. Unfortunately, these ratings are also not a particularly useful measure of school effectiveness. Not only are the grades unstable between years, but they also rely too heavily on test-based measures, including the index, that fail to account for student characteristics. While any attempt to measure school performance using testing data is subject to imprecision, Ohio’s effort falls short.
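The basic concern is easy to check in principle. As a purely illustrative sketch (the data below are hypothetical, not Ohio's), one might look at how strongly a test-based status measure tracks the composition of a school's student body:

    # Illustrative check with hypothetical data: if a test-based index largely
    # tracks student characteristics, its correlation with, say, the share of
    # low-income students will be strong, and it will tell us relatively
    # little about school effectiveness per se.
    import pandas as pd

    schools = pd.DataFrame({
        "performance_index": [78, 85, 92, 70, 96, 74, 88],
        "pct_low_income":    [82, 60, 35, 90, 20, 85, 48],
    })

    corr = schools["performance_index"].corr(schools["pct_low_income"])
    print(f"Correlation between index and percent low-income: {corr:.2f}")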

The Stability Of Ohio's School Value-Added Ratings And Why It Matters

I have discussed before how most testing data released to the public are cross-sectional, and how comparing them between years entails the comparison of two different groups of students. One way to address these issues is to calculate and release school- and district-level value-added scores.

Value-added estimates are not only longitudinal (i.e., they follow students over time), but the models go a long way toward accounting for differences in the characteristics of students between schools and districts. Put simply, these models calculate “expectations” for student test score gains based on student (and sometimes school) characteristics, which are then used to gauge whether schools’ students did better or worse than expected.
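In rough outline, the calculation looks something like the following sketch, which uses made-up student records and a deliberately simple regression. Actual value-added models (including Ohio's) are far more elaborate, but the logic of comparing actual to expected performance is the same.

    # Minimal, illustrative value-added-style calculation (hypothetical data).
    # Step 1: model "expected" current scores from prior scores and a student
    # characteristic. Step 2: a school's estimate is the average amount by
    # which its students out- or under-performed those expectations.
    import pandas as pd
    import statsmodels.formula.api as smf

    students = pd.DataFrame({
        "school_id":   ["A", "A", "B", "B", "C", "C"],
        "prior_score": [410, 455, 380, 400, 470, 430],
        "score":       [432, 470, 388, 415, 505, 452],
        "low_income":  [1, 0, 1, 1, 0, 0],
    })

    model = smf.ols("score ~ prior_score + low_income", data=students).fit()
    students["expected"] = model.predict(students)

    students["residual"] = students["score"] - students["expected"]
    value_added = students.groupby("school_id")["residual"].mean()
    print(value_added)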

Ohio is among the few states that release school- and district-level value-added estimates (though this number will probably increase very soon). These results are also used in high-stakes decisions, as they are a major component of Ohio’s “report card” grades for schools, which can be used to close or sanction specific schools. So, I thought it might be useful to take a look at these data and their stability over the past two years. In other words, what proportion of the schools that receive a given rating in one year will get that same rating the next year?
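Answering that question is straightforward once the ratings are in hand. Here is a bare-bones sketch of the calculation, using hypothetical schools and made-up rating categories rather than Ohio's actual data:

    # Sketch of the stability question: among schools receiving a given
    # value-added rating in year one, what share receive the same rating in
    # year two? Schools and ratings below are hypothetical.
    import pandas as pd

    ratings = pd.DataFrame({
        "school": ["S1", "S2", "S3", "S4", "S5", "S6"],
        "year1":  ["Above", "Met", "Below", "Met", "Above", "Below"],
        "year2":  ["Met",   "Met", "Below", "Above", "Above", "Met"],
    })

    # Year-to-year transition table: rows are year-one ratings, columns show
    # the share of those schools landing in each year-two rating.
    transitions = pd.crosstab(ratings["year1"], ratings["year2"], normalize="index")
    print(transitions)

    # Overall share of schools keeping the same rating in both years.
    stability = (ratings["year1"] == ratings["year2"]).mean()
    print(f"Share with identical ratings in both years: {stability:.0%}")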

Remorse Code (Or Comments From A Crib Strangler)

Those who publicly advocate for the kind of education policies put forth in "Waiting for Superman" are now seeing the equivalent of a letter-high fastball down the middle. They can wait on it and crank it out, using the buzz created by a major motion picture to advance the movie’s/campaign’s arguments at face value. I'm a little late on this (at least by blog standards), but over at Fordham’s Flypaper blog, Mike Petrilli saw this fastball, yet instead of unloading, he sacrifice bunted the runner into scoring position for the good of the team.

Responding to an interview in which Davis Guggenheim, the film’s director, claims that charter schools have "cracked the code" on how to educate even the poorest kids, Petrilli warns against the hubris of thinking that we are anywhere beyond first steps when it comes to fixing urban schools. He points out that charters like KIPP benefit from selection effects (more motivated and informed parents seek out the schools), and that the degree to which these schools have actually "closed the gap" between poor and affluent schools has been somewhat oversold. Petrilli also notes that while some of these schools seem to have "cracked the code," there is still little idea of how to expand them to serve more than a tiny minority of poor kids.

Thoughtful comments like these should remind those of us who care about expanding quality education that, although we may have canyon-sized differences between us on what needs to be done (Petrilli claims that those who disagree with him are trying to "strangle" reforms "in their crib"), there may be a few important respects in which we are closer than we may appear. Still (in addition to the crib-strangling allegations), I would take issue with one of Petrilli’s central points – that charters like KIPP may have "cracked the code," and the main problem now is how to scale them up. From my perspective, the "code" is specific policies and practices that produce results. And on this front, we’re practically still using decoder rings from cereal boxes.

What Is "Charterness," Exactly?

** Also posted here on Valerie Strauss' Answer Sheet in the Washington Post.

Two weeks ago, researchers from Mathematica dropped a bomb on the education policy community. It didn’t go off.

The report (prepared for the Institute of Education Sciences, a division of the USDOE) includes students in 36 charter schools throughout 15 states. The central conclusion: the vast majority of charter students do neither better nor worse than their regular public school counterparts in math and reading scores (or on most of the other 35 outcomes examined). On the other hand, charter parents and students are more satisfied with their schools, and charters are more effective at boosting the scores of lower-income students.