Quality Control In Charter School Research

There's a fairly large body of research showing that charter schools vary widely in test-based performance relative to regular public schools, by both location and subgroup. Yet, you'll often hear people point out that the highest-quality evidence suggests otherwise (see here, here and here) - i.e., that there are a handful of studies using experimental methods (randomized controlled trials, or RCTs) and these analyses generally find stronger, more uniform positive charter impacts.

Sometimes, this argument is used to imply that the evidence, as a whole, clearly favors charters, and, perhaps by extension, that many of the rigorous non-experimental charter studies - those using sophisticated techniques to control for differences between students - would lead to different conclusions were they RCTs.*

Though these latter assertions are based on a valid point about the power of experimental studies (the few we have are often ignored in the debate over charters), they are overstated for a couple of reasons, discussed below. But a new report from the (indispensable) organization Mathematica addresses the issue head on, by directly comparing estimates of charter school effects that come from an experimental analysis with those from non-experimental analyses of the same group of schools.

The researchers find that there are differences in the results, but many are not statistically significant and those that are don't usually alter the conclusions. This is an important (and somewhat rare) study, one that does not, of course, settle the issue, but does provide some additional tentative support for the use of strong non-experimental charter research in policy decisions.

As most people know, one of the big issues in charter school research, common elsewhere as well, is selection effects – the idea that applicants to charter schools are different from non-applicants in terms of unobserved characteristics such as motivation, social networks, family involvement in their education and whether or not they're thriving in their current school.

Researchers who wish to isolate the effect of charter schools must address this issue by attempting to control for these differences between students, using variables such as prior achievement, lunch program eligibility and special education classification. When done correctly, this approach can be quite powerful, but it does entail the (unlikely and untestable) assumption that, once these variables are accounted for, the two groups (treatment and control) do not differ on any unmeasured characteristics that might influence the results, at least to some extent.

RCTs, on the other hand, account for the differences between students using the magic of random assignment. Put simply, they focus on applicants only, and compare those who did and did not attend a charter school. Since the factor determining which schools these students attend is a random lottery, this means that charter students should be, on average, the same as non-charter students in terms of student and family characteristics that are known to affect achievement.
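To make the logic concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the `simulate_lottery` function, the unobserved "motivation" variable, the 50/50 lottery, the assumed true effect of 0.2 and the assumption that every lottery winner attends. It is not drawn from any of the studies discussed here. It simply contrasts a crude comparison of all charter and non-charter students with a comparison of lottery winners and losers among applicants.

```python
import random
from statistics import mean


def simulate_lottery(n=50000, true_effect=0.2, seed=0):
    """Return (naive_gap, lottery_gap) for one simulated population."""
    rng = random.Random(seed)
    charter, non_charter = [], []  # all students, split by attendance
    winners, losers = [], []       # applicants only, split by lottery result
    for _ in range(n):
        motivation = rng.gauss(0, 1)                   # never observed by the analyst
        applies = motivation + rng.gauss(0, 1) > 0.5   # motivated families apply more often
        wins = applies and rng.random() < 0.5          # coin-flip admissions lottery
        # Assumption: every winner attends the charter and no loser does.
        effect = true_effect if wins else 0.0
        score = 0.5 * motivation + effect + rng.gauss(0, 1)
        (charter if wins else non_charter).append(score)
        if applies:
            (winners if wins else losers).append(score)
    naive_gap = mean(charter) - mean(non_charter)   # mixes in the motivation gap
    lottery_gap = mean(winners) - mean(losers)      # applicants only, randomized
    return naive_gap, lottery_gap


naive_gap, lottery_gap = simulate_lottery()
print(f"naive: {naive_gap:.2f}  lottery: {lottery_gap:.2f}  assumed true effect: 0.20")
```

Under these assumptions, the crude gap should come out well above the assumed effect, because charter students are more motivated on average, while the winner/loser gap should land close to it. The sizes are artifacts of the invented parameters and say nothing about real charter schools.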

Due in large part to the infrequency of lotteries, there are only a handful of charter RCTs out there. Most notably, a lottery study of New York City charters found generally positive, meaningful effects on a variety of different outcomes, as did this analysis of charters in Boston (also see here for more on Massachusetts, and this RCT of a small group of charters in Chicago).**

More recently, a 2010 Mathematica lottery study of 36 charter middle schools across six states, a relatively large proportion of which were in rural or suburban areas, found no discernible effect overall (there was, however, a positive impact among low-income students, one which was very modest in reading and strong in math, along with a strong negative impact among non-low-income students in both subjects). The data from this study were expanded and used in the new Mathematica analysis discussed below.

So, while it's true that the lottery studies generally show positive effects, it's also worth noting that we are really only talking about a small group of studies in a limited set of locations, most of which include relatively few schools.

Moreover, one might reasonably assert that oversubscribed charters tend to be among the better ones out there (as evident in the number of applications they receive). So, it’s hard to argue that they are representative of charters as a whole, whether in a given location or nationally (see here and here for more on this issue of external validity).***

That said, the “best evidence” argument is obviously correct in its premise that RCTs are to be preferred, even if they are not without their own issues (e.g., limited scope). Since, however, experimental analyses are simply not feasible in most cases, it is important to assess whether different non-experimental alternatives are suitable for informing policy decisions.

The Mathematica study is among the first to do so in the charter school context, and they do it in a manner that deals explicitly with the formidable complications involved in comparing experimental and non-experimental estimates. The basic approach, highly simplified, consists of two steps.

First, they estimate charter effects the way they would have to if there were no random assignment – comparing charter students with their counterparts in regular public schools, including those who didn’t apply to the lottery, and controlling for differences between these groups of students using variables such as prior achievement and lunch program eligibility (the researchers employ four different non-experimental methods that are generally considered rigorous).

Second, they compare these estimates with an experimental "benchmark" - an analysis of the same schools that exploits the randomization of the lotteries – i.e., comparing applicants who attended charters with applicants who did not.
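The two steps can be sketched on simulated data. To be clear, this is not Mathematica's method or data. The function names, the single control variable (a noisy prior test score), the plain regression adjustment and every parameter value are assumptions chosen only to show the shape of the comparison. The report itself uses four more sophisticated non-experimental techniques.

```python
import random
from statistics import mean


def coef_on_treatment(y, d, x):
    """OLS coefficient on d in y ~ 1 + d + x, via the Frisch-Waugh steps."""
    def residualize(a, b):
        # Residuals from regressing a on b (with an intercept).
        ma, mb = mean(a), mean(b)
        slope = (sum((bi - mb) * (ai - ma) for ai, bi in zip(a, b))
                 / sum((bi - mb) ** 2 for bi in b))
        return [(ai - ma) - slope * (bi - mb) for ai, bi in zip(a, b)]

    ry, rd = residualize(y, x), residualize(d, x)
    return sum(u * v for u, v in zip(ry, rd)) / sum(v * v for v in rd)


def compare_designs(n=50000, true_effect=0.2, seed=1):
    rng = random.Random(seed)
    scores, attended, prior = [], [], []
    winners, losers = [], []
    for _ in range(n):
        motivation = rng.gauss(0, 1)                  # unobserved
        prior_score = motivation + rng.gauss(0, 1)    # observed, noisy proxy for motivation
        applies = motivation + rng.gauss(0, 1) > 0.5
        wins = applies and rng.random() < 0.5         # assumption: all winners attend
        score = 0.5 * motivation + (true_effect if wins else 0.0) + rng.gauss(0, 1)
        scores.append(score)
        attended.append(1.0 if wins else 0.0)
        prior.append(prior_score)
        if applies:
            (winners if wins else losers).append(score)
    charter = [s for s, a in zip(scores, attended) if a == 1.0]
    others = [s for s, a in zip(scores, attended) if a == 0.0]
    # Step 1: non-experimental estimate, all students, adjusting for prior score.
    adjusted = coef_on_treatment(scores, attended, prior)
    # Step 2: experimental benchmark, lottery winners vs. losers among applicants.
    benchmark = mean(winners) - mean(losers)
    return {
        "unadjusted": mean(charter) - mean(others),
        "adjusted": adjusted,
        "benchmark": benchmark,
        "gap": adjusted - benchmark,  # estimated bias of the adjusted approach
    }


for name, value in compare_designs().items():
    print(f"{name}: {value:.2f}")
```

In this toy setup, the adjusted estimate should sit between the unadjusted gap and the benchmark, since the prior score only partly captures motivation. How much bias is left depends entirely on the invented parameters, which is why the empirical comparison in the report is the interesting part.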

In short, the Mathematica researchers find that the two general types of analyses do yield different results, the extent and statistical significance of which vary by technique, but that the magnitude of these differences (if any) is, in most cases, quite modest and doesn’t change the policy conclusions very much. In other words, there does (unsurprisingly) appear to be some bias in the non-experimental estimates, but the results suggest that it is usually not particularly large, at least for this set of schools (see similar findings here).

Of course, this is only one study, one that is exceedingly rigorous and geographically diverse (with schools from several different states), but includes only a relatively small sample of (middle) schools over a relatively short time period.****

Still, it's another important early step validating the use of non-experimental charter school research designs in policy decisions, including which specific approaches provide estimates comparable to those of experimental designs. We will hopefully see more of these replications, since they are important for researchers assessing not only whether charter schools generate better testing outcomes than comparable regular public schools, but also the (I would argue) more pressing question of why, in a few cases, some do.

- Matt Di Carlo


* Just to be perfectly clear, if you define the body of charter school research in terms of any study from any researcher/organization that has ever been released, the majority are not at all useful, as so many consist of crude, unadjusted comparisons. When I talk about the high-quality non-experimental evidence, I'm talking about a few dozen analyses (see this meta-analysis) that employ sophisticated techniques designed to control (albeit imperfectly) for differences between students/schools.

** Incidentally, a non-experimental study using data from New York City, done by CREDO, also found positive impacts, but they were smaller in magnitude, prompting an interesting and somewhat heated exchange between the researchers (see here, here, here and here, in chronological order). For whatever it's worth, one of the non-experimental techniques examined in the Mathematica analysis discussed above is a rough approximation of CREDO's approach, and the estimates yielded by these models were similar to those from the experimental benchmarks (but remember this study includes a set of schools that is very different from New York City's).

*** I have also raised the possibility that the results from the RCTs in New York City and Boston are influenced in part by charter market share. That is, RCTs, especially single-location RCTs, are only possible when charters are oversubscribed, which is more likely to occur in places with relatively fewer charters (as is the case in Boston and New York City). A small charter sector itself offers benefits to these schools, as they face less competition for limited resources such as personnel and private funding.

**** For instance, there is some evidence that experimental and non-experimental estimates may vary by the type and level of school to which regular public schools are compared. In addition, as mentioned briefly above, there is some variation by non-experimental approach, subject (math/reading) and subsample within the Mathematica study.

