*Reprinted in the Washington Post*
I’ve written many times about how absolute performance levels – how highly students score – are not by themselves valid indicators of school quality, since, most basically, they don’t account for the fact that students enter the schooling system at different levels. One of the most blatant (and common) manifestations of this mistake is when people use NAEP results to determine the quality of a state's schools.
For instance, you’ll often hear that Massachusetts has the “best” schools in the U.S. and Mississippi the “worst,” with both claims based solely on average scores on the NAEP (though, technically, Massachusetts public school students' scores are statistically tied with at least one other state on two of the four main NAEP exams, while Mississippi's rankings vary a bit by grade/subject, and its scores are also not statistically different from several other states').
But we all know that these two states are very different in terms of basic characteristics such as income, parental education, etc. Any assessment of educational quality, whether at the state or local level, is necessarily complicated, and ignoring differences between students precludes any meaningful comparisons of school effectiveness. Schooling quality is important, but it cannot be assessed by sorting and ranking raw test scores in a spreadsheet.
Income is one of the most common variables used to illustrate the interconnectedness of student background and educational outcomes such as test scores (even though it is the conditions often associated with income that exert influence, rather than income itself). And, indeed, the proportion of Mississippi’s public school students eligible for federal lunch subsidies, an income/poverty proxy, is roughly twice as high as that of Massachusetts (63 versus 29 percent statewide, and 67 versus 32 percent in the NAEP reading results below).
Let’s see how this simple bivariate relationship looks across all states. The scatterplot below presents free/reduced-price lunch (FRL) eligibility rates of test takers by average NAEP reading scores from 2011. We’ll use eighth rather than fourth grade scores, since the latter only reflect 3-4 years of schooling; the sample is also limited to public school students only.
Each red dot is a single state (D.C. is excluded), while the line in the middle of the plot represents the average relationship between students’ FRL rates and their NAEP reading scores. Predictably, this is a strong association (the correlation coefficient is -0.83 [and -0.79 in math]). There is some deviation of dots (states) from the line, but, on the whole, scores tend to be lower in states with higher poverty.
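For readers curious about the mechanics, a correlation coefficient like the one above is a straightforward calculation on two state-level series. The sketch below uses made-up, purely illustrative numbers (not the actual FRL/NAEP data) to show how such a coefficient is computed:

```python
import numpy as np

# Hypothetical state-level values for illustration only;
# these are NOT the actual FRL/NAEP figures discussed above.
frl = np.array([29, 35, 42, 50, 58, 63])          # FRL eligibility rates (%)
score = np.array([276, 268, 267, 259, 258, 251])  # avg. reading scale scores

# Pearson correlation between the two series; np.corrcoef returns
# the full 2x2 correlation matrix, so take an off-diagonal entry.
r = np.corrcoef(frl, score)[0, 1]
print(r)  # strongly negative: higher poverty rates, lower scores
```

In these invented numbers, as in the real data, the coefficient comes out strongly negative; the sign and magnitude, not the individual dots, carry the story.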
And income is of course not the only relevant observable student characteristic. Although one must be very careful about interpreting models that use state-level data (i.e., this is purely illustrative), a simple regression that includes FRL, as well as the percent of students who are minorities, special education, and limited English proficient (LEP) explains about three-quarters of the variation in NAEP reading scores.
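To make the "explains about three-quarters of the variation" idea concrete, here is a minimal sketch of a four-predictor OLS model and its R-squared. Everything below is simulated for demonstration; the variable names mirror those in the text, but none of the numbers come from the actual state-level dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50  # hypothetical "states"; all values below are simulated

# Simulated predictors loosely mimicking the variables in the text.
frl = rng.uniform(25, 70, n)       # % FRL-eligible
minority = rng.uniform(5, 65, n)   # % minority
sped = rng.uniform(8, 18, n)       # % special education
lep = rng.uniform(1, 16, n)        # % limited English proficient

# Simulated scores driven mostly by demographics, plus noise.
score = (290 - 0.35 * frl - 0.10 * minority
         - 0.30 * sped - 0.20 * lep + rng.normal(0, 2, n))

# Ordinary least squares via a design matrix with an intercept column.
X = np.column_stack([np.ones(n), frl, minority, sped, lep])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)

# R-squared: the share of score variance the model explains.
resid = score - X @ beta
r2 = 1 - resid.var() / score.var()
```

The R-squared here is just the fraction of variance in scores accounted for by the demographic predictors, which is what the "three-quarters" figure in the text refers to.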
These crude results are indicative of what we know from other, more rigorous research: How highly students score on tests is mostly a function of their backgrounds, rather than where they attend school.
That is precisely why most value-added models, which are specifically designed to isolate (albeit imperfectly) schools’ effects on the test performance of their students, focus on growth – how quickly students improve – and treat absolute performance levels as control variables rather than as outcomes.
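The core logic of treating prior performance as a control can be sketched in a few lines: regress current scores on prior scores, and read the residual as "growth" beyond what a student's starting point predicts. The data below are simulated, and this is a drastic simplification of real value-added models, which include many more controls and aggregate residuals to the school or teacher level:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200  # hypothetical students; all values below are simulated

prior = rng.normal(250, 15, n)                     # last year's score
current = 30 + 0.9 * prior + rng.normal(0, 5, n)   # this year's score

# Bare-bones growth model: regress current on prior scores, so the
# residual measures improvement beyond what the starting point predicts.
X = np.column_stack([np.ones(n), prior])
beta, *_ = np.linalg.lstsq(X, current, rcond=None)
growth = current - X @ beta  # residual "growth" measure
```

Because prior performance is on the right-hand side of the equation, a student (or school) starting from a low absolute level can still show high growth, which is the whole point of the approach.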
Even using this growth-oriented perspective, however, any attempt to determine which state has the “best schools” would be rife with complications. Most basically, NAEP is really the only test administered to a representative sample of students in all states at regular intervals, but the data are cross-sectional, which means that changes over time may reflect differences between cohorts (see here). Also, school effectiveness, like education policy in general, likely varies more within than between states.
What we can use NAEP for is to determine – with a reasonable degree of confidence – which states have the highest-performing students (at least to the extent tests can measure this). Yet this valuable information is frequently lost in a barrage of misinterpretation by adults, some of whom are advocating for their policy preferences or their personal reputations.
On the whole, interpreting testing and other outcome data requires a humble, nuanced approach. The choice of measures must be guided by what one is trying to assess. This is not easily compatible with the highly charged political environment surrounding today's education policy debates. But we’ll know we’ve made progress when we stop hearing claims about the “best” and “worst” schools based solely on absolute scores.
- Matt Di Carlo