A Few Points About The New CREDO Charter School Analysis

A new report from CREDO on charter schools’ test-based performance received a great deal of attention, and rightfully so - it includes 27 states, which together serve 95 percent of the nation's charter students.

The analysis as a whole, like its predecessor, is a great contribution. Its sheer scope, as well as a few specific parts (examination of trends), are new and important. And most of the findings serve to reaffirm the core conclusions of the existing research on charters' estimated test-based effects. Such an interpretation may not be particularly satisfying to charter supporters and opponents looking for new ammunition, but the fact that this national analysis will not settle anything in the contentious debate about charter schools once again suggests the need to start asking a different set of questions.

Along these lines, as well as others, there are a few points worth discussing quickly.

The overall finding is that charter schools vary widely in test-based performance, and on the whole are not meaningfully better or worse. First, very briefly, the overall difference between charters and district schools in the full 27-state sample is extremely small - about 0.01 standard deviations (and not statistically significant in math). In other words, in this analysis, which includes a huge share of the nation’s charters, the impact of these schools relative to their comparable traditional public counterparts is negligible.*

There are two caveats here. First, as discussed below, the estimated impact overall masks considerable variation within and between states. Second, when interpreting the effect sizes, remember that education is cumulative - single-year gains can add up.*

The relative impact of charters included in the 2009 study is better using the more recent data, but that's mostly a function of a decrease in comparison schools' impacts (and the discrepancy is arguably modest). One of the sub-analyses in this report is a look at the trend in charter performance - i.e., comparing the results before 2007-2008 with those between 2008 and 2010-11 - for the schools in the 16 states that were included in the 2009 report (at least those that remained open).**

The big finding here is that the relative impact of schools included in CREDO’s 2009 sample improved over time. This is basically true, but the key word is "relative." The primary reason why charters "improved" is that the estimated effect of regular public schools was lower in the later time period compared with the earlier. In other words, put simply, continuing charters' performance was actually quite stable, but it "looks better" because the performance of the schools to which charters are compared is slightly lower. This of course does not mean the trend is unimportant or can be dismissed - charter schools should be assessed primarily versus their regular public alternatives - but the underlying dynamics are also relevant.

In addition, the difference in (relative) estimated impacts between the earlier and later time periods is not large (across all students/states, as well as when separated by student subgroup, it is no more than 0.02 standard deviations). Thus, the overall effect is modest in both time periods (though the reading estimate went from tiny and negative to tiny and positive, which is certainly meaningful). Moreover, the charters that opened since 2008 are modestly less effective than continuing schools in both reading and math.

On the other hand, one shouldn’t lose sight of the fact that real improvement is almost always gradual, particularly across such a large group of schools, and that, despite the unrealistic expectations that sometimes dominate education debates, 4-5 years is not a very long time in policy terms. If, for example, CREDO replicates this analysis a few years from now, and finds another modest increase – well, a standard deviation here and a standard deviation there, and sooner or later you’re talking about large differences. For now, though, the improvement is significant, but probably shouldn't be overinterpreted.

The more interesting findings are those by state… As I and others have argued many times, the important policy question at this point is less about whether charters are better (overall, they’re not), but rather why some do well and some do not. And the state-level results are far more useful for this purpose than those overall. Although, predictably, most states' estimated effects are similar to the national average, there is still quite a bit of variation.

For instance, a few states stand out with meaningfully large effect sizes in both subjects, including Louisiana, D.C., Rhode Island and Tennessee. In contrast, only Nevada exhibits large negative estimated impacts. This squares with prior research, which suggests that most charter sectors are for the most part only modestly different from comparable regular public schools in terms of test-based performance, but relatively few seem to have a negative (relative) impact on these outcomes.

There is also inter-state variation in terms of improvement. For example, it looks like a decent part of the overall change in estimated impact is driven by a few states, most notably D.C. and Arkansas (which, again, means that we should pay attention to these states).

Although attempting to explain these differences should be the priority, it is also extremely difficult. For example, some explanations of these inter-state differences, within and between time periods, have focused on the “authorization theory” – i.e., that the rules and regulations by which charters are approved and renewed are driving the differences. There is some support for that perspective in this report, particularly CREDO’s finding that closure drove some of the improvements (though, again, in the 16 original CREDO states, the schools that opened since 2007-08 are a bit lower-performing than those that remained open).

As argued here, my guess (speculation) is that the impact is meaningful but modest, and that effects vary as much within-authorizer (and state) as between them. In any case, we will need a lot more work to determine which factors, authorization included, drive these differences. And, to reiterate, remember that these findings also depend on the quality of the regular public schools to which charters are compared - a given effect in one state means something different than the same estimate in another state.

…and by student subgroup. Most of the estimated charter effects by subgroup, like those overall, are not especially large (and haven’t changed too much either), but there are positive impacts among traditionally lower-scoring groups. Bearing in mind the caveats about interpreting effect sizes discussed above, the estimated charter impacts on black and special education students' growth are positive but small, slightly larger for low-income students, and even larger among English Language Learners.

This is a pretty common result in the charter literature - i.e., that charters do at least a bit better with disadvantaged student subgroups than regular public schools, and slightly worse with more advantaged subgroups. Less well-known, however, is why this appears to be the case. In very simple terms, it may be that these students do better because they tend to live in the areas with higher-performing charters (vis-a-vis regular public schools), or, perhaps, the fact that charters tend to be better in these areas (e.g., large cities versus suburbs) is because they are more adept at serving these subgroups.

Remember: This is all based on math and reading tests. I hesitate to point this out, not only because it is mundane, but also because I am often just as guilty of it as anyone. That said, let's be careful not to conflate "performance" with "relative estimated testing gains in two subjects." Testing data are currently the best option for assessing school performance, particularly for such a huge group of schools across so many states, but we all know their limitations. And this standing caveat should appeal not only to those who tend to oppose high-stakes testing, but also to charter supporters regardless of their views on testing, since there is limited evidence that charters might look better if different performance indicators, such as graduation or parental satisfaction, were more widely available/used.

Overall, what do these results mean? The differences within and between time periods are still quite small, and, overall, the major conclusion is no different than before: There is substantial variability in estimated charter school effects, and little meaningful difference on the whole. That said, the finding that charter schools' relative performance may be getting better is significant, and should not be disregarded. It will be very interesting to see if this improvement keeps up.

And, of course, the most important question – how do we explain these differences within and between time periods, states and subgroups – remains an open one, and is severely constrained by the difficulty of gathering these data, but this report provides some useful information toward that goal (actually, having school-level estimates across 27 states is by itself a big asset). Going forward, this will hopefully be the focus of charter research.

One final point: It’s a little striking to consider that it’s been over 20 years since charter schools appeared on the public educational landscape, and opinions about them, positive and negative, tend to be exceedingly strong, but we’re still in the earlier phases of figuring them out. Good policy research, like good policy, requires time and patience.

- Matt Di Carlo

*****

* As a very rough guide, 0.01 standard deviations is about one percent of the typical test-based achievement gap between white and black students. However, keep in mind that the larger estimates for individual states compared with the effect overall are in part a function of sample size (i.e., the overall sample is larger, which means the results are more precisely estimated and will tend to be smaller than those for individual state samples).

** Actual data years vary a bit by state; see Table 2 in this supplemental report.
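The sample-size point in the first footnote can be illustrated with a small simulation. This is only a sketch under stated assumptions, not CREDO's method: the function name, the sample sizes (100 versus 2,500 students), the number of trials, and the choice of normally distributed student gains are all hypothetical and picked purely for illustration. The true effect is set to the same 0.01 standard deviations in both cases, so any difference in how extreme the estimates look comes from sample size alone.

```python
import random
import statistics


def simulate_estimates(true_effect, n_students, n_trials, seed=0):
    """Return n_trials estimated effects, each the mean of n_students draws.

    Assumption (illustrative only): each student's gain is drawn from a
    normal distribution centered on true_effect with SD 1, so everything
    is expressed in standard deviation units.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        sample = [rng.gauss(true_effect, 1.0) for _ in range(n_students)]
        estimates.append(statistics.fmean(sample))
    return estimates


# Same true effect (0.01 SD) in both cases; only the sample size differs.
small = simulate_estimates(0.01, n_students=100, n_trials=200)
large = simulate_estimates(0.01, n_students=2500, n_trials=200)

# Spread of the estimates across repeated samples.
spread_small = statistics.pstdev(small)
spread_large = statistics.pstdev(large)

print(f"spread of estimates, n=100:  {spread_small:.3f}")
print(f"spread of estimates, n=2500: {spread_large:.3f}")
```

Under these assumptions the estimates from the smaller samples scatter much more widely around 0.01 than those from the larger samples, which is one reason individual state estimates can look bigger than the pooled estimate even if the underlying effects were identical.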

A few thoughts:

- CREDO is hardly alone in using "days of learning" to explain the test score gains. But it's still highly misleading; the correct way to express this is in the terms of what we are measuring, which is test score items answered correctly. To state that a group gained "seven days of reading" over another has no analog in real-world teaching.

- One of the largest problems with CREDO's methodology is that they don't disaggregate the data about student demographics as well as they could. There is no reason, for example, that CREDO couldn't have done a separate analysis for students eligible for free lunch AND students eligible for free or reduced-price lunch. One is an indicator of deeper poverty and could have provided a better match. Why conflate the two when the data is available to allow a finer distinction?

- Same with special education, although I realize the data often doesn't allow for these distinctions. Still, there is all the difference in the world between a child with a severe cognitive impairment and one with a mild speech impediment.

- Matt's right that the real question here is "why" some charters are better than others, but I'd put it a different way: Are "successful" charters replicable? Can we reproduce the gains of a "good" charter for a large number of students? Ultimately, CREDO never even addresses this question. If the small differences are mostly peer effect, it doesn't speak very well for charters, does it?

Matthew, I think you missed what is possibly the biggest problem with the study: 8% of the worst charters closed between 2009 and 2013. This introduces massive survivorship bias in the study, and renders the comparison to public schools essentially meaningless. See here, for instance:

http://edushyster.com/?p=2878

Out of curiosity, what does a .01 standard deviation of change reflect in terms of a test score? For example, would that correspond with one out of a hundred students completing one more question correctly? And while single-year gains can add up, they can also fluctuate from one year to the next, as one year's cohort of students is better or worse than the students who precede or follow that group.

I am struck by the fact that an improvement of .01 standard deviations is considered significant. 0.01 S.D. of improvement is not "a standard deviation here and a standard deviation there, and sooner or later you’re talking about large differences."

On the other hand, I think you may be on the mark in suggesting "the need to start asking a different set of questions." It may be that, as time goes on, regular public schools and charter schools look more and more alike in terms of aggregate performance measures. In that case, you have to wonder whether the results simply reflect the variability of the particular school in question, and whether they really reflect being a charter or regular public school or the underlying demographics of that school. This is not to discount the success a particular charter school (or regular public school) has achieved, but if we can't get at a curricular difference or a structural difference that is portable, what is the purpose of the charter school movement?