For almost two decades now, educational accountability policy in the U.S. has included a focus on the performance of student subgroups, such as those defined by race and ethnicity, income, or special education status. The (very sensible) logic behind this focus is the simple fact that aggregate performance measures, whether at the state, district, or school level, often mask large gaps between subgroups.
Yet one of the unintended consequences of this subgroup focus has been confusion among both policymakers and the public as to how to interpret and use subgroup indicators in formal school accountability systems, particularly when those indicators are expressed as simple “achievement gaps” or “gap closing” measures. This is not only because achievement gaps can narrow for undesirable reasons and widen for desirable reasons, but also because many gaps exist prior to entry into the school (or district). If, for instance, a large Hispanic/White achievement gap for a given cohort exists at the start of kindergarten, it is misleading and potentially damaging to hold a school accountable for the persistence of that gap in later grades – particularly in cases where public policy has failed to provide the extra resources and supports that might help lower-performing students make accelerated achievement gains every year. In addition, the coarseness of current educational variables, particularly those usually used as income proxies, limits the detail and utility of some subgroup measures.
A helpful and timely little analysis by David Figlio and Krzysztof Karbownik, published by the Brookings Institution, addresses some of these issues, and the findings have clear policy implications.
Figlio and Karbownik use a unique dataset of 568 elementary schools in Florida between 1999 and 2012. They are able to match students’ schooling records to their birth certificates, thus providing data on parental education, marital status, poverty status, and other characteristics. This allows for the construction of a socioeconomic status (SES) index that is far more complete than the typical subsidized lunch eligibility variable. Students are sorted into SES quartiles, and most of the analysis focuses on comparing the top SES students (top 25 percent) to the bottom SES students (bottom 25 percent), using testing (FCAT) data from grades three and five.
They find, first, that the correlation between the average standardized score of high and low SES students in each school is quite modest. In some schools, both groups score highly (relative to their peers in other schools); in other schools, both score relatively poorly; in still other schools, one group does well and the other does poorly. This is another way of showing that SES-based gaps in test scores – i.e., the gap in scores between the top and bottom SES quartiles – vary widely among schools.
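To make the idea concrete, here is a minimal sketch of how one might compute per-school SES gaps and the cross-school correlation between the two groups' average scores. The four schools and their standardized scores are invented for illustration; they are not taken from Figlio and Karbownik's data, and the names (`schools`, `pearson`) are hypothetical.

```python
from math import sqrt
from statistics import mean

# Hypothetical school-level mean standardized scores (z-scores) for the
# top and bottom SES quartiles. All numbers are invented for illustration.
schools = {
    "A": {"high": 0.9, "low": 0.4},    # both groups score relatively well
    "B": {"high": -0.2, "low": -0.7},  # both groups score relatively poorly
    "C": {"high": 0.8, "low": -0.6},   # high SES does well, low SES does not
    "D": {"high": -0.1, "low": 0.3},   # low SES outscores high SES
}

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

high = [s["high"] for s in schools.values()]
low = [s["low"] for s in schools.values()]

# Within-school SES gap: top-quartile mean minus bottom-quartile mean.
gaps = {name: round(s["high"] - s["low"], 2) for name, s in schools.items()}

# Correlation across schools between the two groups' average scores.
r = pearson(high, low)

print(gaps)
print(round(r, 2))
```

In this toy example the correlation is positive but weak, and the gaps differ a great deal from school to school, which is the pattern the paragraph above describes.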
There are two possible reasons for this variation and they are not mutually exclusive. The first is differences in scores upon entry into the schools. If, for example, kindergarteners enter one school with a much wider SES gap than their peers in a different school, this could account for any differences in the size of achievement gaps between these two schools when these students reach third and fifth grade. The second possible explanation is that something happened while the students were attending the schools – e.g., a group in one school made more progress than in the other, thus exacerbating or attenuating any gaps that existed upon entry.
Figlio and Karbownik cannot isolate precisely the degree to which either of these explanations holds, but they do provide some initial evidence by exploiting the fact that they have kindergarten testing data for five of their eight cohorts. As one would expect, they find that SES-linked student achievement gaps in kindergarten do go a long way toward predicting the SES-linked gaps in third and fifth grade, but also that a substantial proportion of the variation remains. In other words, SES-related student achievement gaps in third and fifth grade vary by school, even after taking students’ starting points into account.
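One simple way to think about "taking starting points into account" is to regress each school's later gap on its kindergarten gap and see how much variation is left over. The sketch below does this with ordinary least squares on invented numbers; it is an illustration of the general idea, not a reproduction of the authors' actual model, and the variable names are hypothetical.

```python
from statistics import mean

# Hypothetical per-school SES gaps, in standard deviations (invented numbers).
k_gap = [0.2, 0.4, 0.6, 0.8, 1.0]   # gap at kindergarten entry
g3_gap = [0.5, 0.3, 0.9, 0.7, 1.1]  # gap for the same cohorts in third grade

def ols(xs, ys):
    """Slope and intercept of a one-variable least-squares fit."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

slope, intercept = ols(k_gap, g3_gap)

# Residuals: the part of each school's third-grade gap that the
# kindergarten gap does not predict.
residuals = [y - (intercept + slope * x) for x, y in zip(k_gap, g3_gap)]

ss_res = sum(e ** 2 for e in residuals)
ss_tot = sum((y - mean(g3_gap)) ** 2 for y in g3_gap)
r_squared = 1 - ss_res / ss_tot  # share of variation explained at entry

print(round(slope, 2), round(r_squared, 2))
```

With these made-up figures, the kindergarten gap explains roughly two thirds of the variation in third-grade gaps, leaving about one third that could reflect what happened while students were enrolled (or other unmeasured factors).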
The amount of growth between third and fifth grade exhibited by high and low SES students, while correlated, also varies quite a bit by school. In some schools, both groups make strong relative progress, in other schools both make poor progress, and in others one group grows a lot and the other does not. Once again, this suggests that schools vary in their (test-based) effectiveness with these groups.
Finally, this variation is not just a statewide phenomenon. There are big differences in the SES-related achievement gap even between schools within the same district.
If we accept the very plausible implication of these findings – that schools vary in how well they serve high and low SES students, at least in terms of test scores – then two big questions might follow. The first is why. Why would some schools, even in the same district, seem to do well with both groups, others with neither, and others with one and not the other? Are there specific school practices or policies that might help explain these discrepancies? These are difficult empirical questions to answer, yet they are important insofar as accountability systems are about changing behavior productively (see, for instance, Rouse et al. 2013).
The second big question that might flow from these findings is what to do about them. Figlio and Karbownik suggest, correctly, that researchers and policymakers should focus on the practices of specific schools rather than solely on districts (since the gaps and other outcomes vary between schools within the same district). They also propose that accountability systems should focus on the performance of student subgroups, rather than solely on schoolwide measures, since outcomes often vary between subgroups (in this case, SES groups) within the same schools.
The latter is, of course, a perfectly defensible conclusion, and it is very much in line with current federal policy. Note, however, that these findings do not necessarily support the use of simple achievement gap or “gap closing” measures (which some states are using). This is not only because kindergarten gaps explain so much of the gaps that are found in third and fifth grade (and schools might be blamed for the gaps that existed before students even enrolled), but also because progress between grades three and five varies for high and low SES students in the same school, which again means that gaps might narrow due to undesirable underlying outcomes or widen due to desirable outcomes.
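A bit of arithmetic shows why a raw "gap closing" number can mislead. The sketch below uses invented mean scale scores for two hypothetical schools; the function name `gap_change` is likewise just for illustration.

```python
# Hypothetical mean scale scores; all numbers are invented for illustration.
def gap_change(high_before, high_after, low_before, low_after):
    """Change in the high-minus-low gap between two time points.

    Negative means the gap narrowed; positive means it widened.
    """
    gap_before = high_before - low_before
    gap_after = high_after - low_after
    return gap_after - gap_before

# School X: the gap narrows, but only because high SES scores fell
# while low SES scores did not move at all.
school_x = gap_change(high_before=320, high_after=310,
                      low_before=290, low_after=290)

# School Y: the gap widens, even though both groups improved
# (low SES students gained 10 points, high SES students gained 15).
school_y = gap_change(high_before=320, high_after=335,
                      low_before=290, low_after=300)

print(school_x, school_y)
```

A simple gap-closing measure would reward School X and penalize School Y, even though low SES students made progress only in School Y.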
Figlio and Karbownik’s results do, however, suggest the need to consider some types of subgroup-focused accountability measures, such as a component for growth among traditionally low-scoring subgroups (though it is not entirely clear how much typical schoolwide growth model estimates vary from those applied to specific subgroups – see our quick analysis for an example).
This type of component would carry with it a clear incentive for schools to focus on improving the performance of traditionally lower-scoring student subgroups, such as students living in poverty, special education students, or students from racial/ethnic and language minority populations. The snag here is that we would presumably want this to happen without any negative consequences for the other students. And policymakers would need to provide schools with the wherewithal to accomplish these goals. Needless to say, this is much easier said than done.