Louisiana's "School Performance Score" Doesn't Measure School Performance

Louisiana’s "School Performance Score" (SPS) is the state’s primary accountability measure, and it determines whether schools are subject to high-stakes decisions, most notably state takeover. For elementary and middle schools, 90 percent of the SPS is based on testing outcomes. For secondary schools, 70 percent is based on testing outcomes and the remaining 30 percent on graduation rates.*

The SPS is largely calculated using absolute performance measures – specifically, the proportion of students falling into the state’s cutpoint-based categories (e.g., advanced, mastery, basic, etc.). This means that it is mostly measuring student performance, rather than school performance. That is, insofar as the SPS only tells you how high students score on the test, rather than how much they have improved, schools serving more advantaged populations will tend to do better (since their students already tended to perform well when they entered the school) while those in impoverished neighborhoods will tend to do worse (even those whose students have made the largest testing gains).
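To make the distinction between status and growth concrete, here is a minimal sketch with two made-up schools. The cut score, the student scores, and the function names are all hypothetical, and the calculation is not the actual SPS formula; it only contrasts "percent at or above a cutpoint" with "average gain since entry."

```python
# Hypothetical cut score; not Louisiana's actual scale or SPS formula.
BASIC_CUT = 300

def pct_at_or_above(scores, cut=BASIC_CUT):
    """Status measure: percent of students scoring at or above the cutpoint."""
    return 100 * sum(s >= cut for s in scores) / len(scores)

def mean_gain(entry_scores, exit_scores):
    """Growth measure: average score change for the same students."""
    gains = [after - before for before, after in zip(entry_scores, exit_scores)]
    return sum(gains) / len(gains)

# Made-up data: school A enrolls high scorers, school B enrolls low scorers.
school_a = {"entry": [320, 340, 310, 290], "exit": [325, 342, 312, 296]}
school_b = {"entry": [230, 250, 270, 280], "exit": [270, 285, 299, 310]}

for name, school in (("A", school_a), ("B", school_b)):
    status = pct_at_or_above(school["exit"])
    growth = mean_gain(school["entry"], school["exit"])
    print(f"School {name}: {status:.0f}% at or above cut, mean gain {growth:.2f}")
```

In this toy example, school A looks far better on the status measure even though school B's students gained much more, which is the pattern the paragraph above describes.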

One rough way to assess this bias is to check the association between SPS and student characteristics, such as poverty. So let’s take a quick look.

The scatterplot below presents the relationship between schools’ SPS in 2011 and the percent of their students receiving subsidized lunch, an imperfect but for our purposes adequate proxy for student poverty.

There is a moderate-to-strong negative relationship between poverty and SPS (the correlation coefficient has a magnitude of 0.64). Virtually all of the lowest-poverty schools score above 100, while virtually all the highest-poverty schools score below 100, indicating that Louisiana’s primary test-based accountability measure is pretty severely biased by student characteristics.**
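For readers who want to run this kind of check on their own data, the following is a rough sketch of the correlation calculation. The seven schools listed are invented for illustration and are not the Louisiana data; `pearson_r` is a hand-written helper, not part of any state reporting tool.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical schools: percent subsidized lunch and SPS (not real data).
pct_lunch = [15, 30, 45, 60, 75, 90, 95]
sps = [125, 118, 104, 101, 92, 85, 70]

r = pearson_r(pct_lunch, sps)
print(f"correlation between poverty proxy and SPS: {r:.2f}")
```

A coefficient near zero would suggest the rating is largely unrelated to the poverty proxy; a large negative value, as in this made-up sample, means higher-poverty schools tend to receive lower scores.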

This is also the case in states using similar measures, such as Ohio and Florida.

Now, it is true that schools can receive some credit in their accountability ratings if their SPS increases between years. Unfortunately, this is calculated through simple subtraction. There is no attempt to control for measurement error, student characteristics, or other factors outside of schools’ control. Nor is there an attempt to address the problems inherent in looking at year-to-year changes in proficiency- and other cutpoint-based measures.

As was the case with California and New York, this means that, to no small extent, year-to-year changes in schools’ SPS are attributable to random error and/or factors outside of schools’ control (also see here and here). If this is the case – if a school’s “SPS growth” in one year does not predict its “growth” the next year – then it makes little sense to punish or reward schools based on these changes.

This is evident in the second scatterplot, below, which compares the change in SPS for each school (each dot) between 2009 and 2010 (the vertical axis) with that between 2010 and 2011 (the horizontal axis). If the 2009-2010 change has predictive power, we might expect to see an upward slope in the dots.

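Obviously, annual change in the SPS does a rather poor job of predicting change in the next year. Schools that show increases in their SPS in one year are likely to show decreases the next year, and vice versa, which is what one would expect if the changes mostly reflect transitory error rather than real shifts in school effectiveness.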
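A small simulation can show why noise alone would produce this pattern. The sketch below is purely illustrative: it assumes each hypothetical school has a fixed "true" score and that each year's observed score adds independent random error. The function names, the score range, and the noise level are all assumptions, and nothing here uses the actual Louisiana data.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def simulate_changes(n_schools=5000, noise_sd=5.0, seed=0):
    """Simulate two consecutive year-to-year changes for schools whose
    true performance never changes (assumption: only noise varies)."""
    rng = random.Random(seed)
    first_change, second_change = [], []
    for _ in range(n_schools):
        true_score = rng.uniform(60, 120)  # hypothetical fixed school score
        year1, year2, year3 = (true_score + rng.gauss(0, noise_sd) for _ in range(3))
        first_change.append(year2 - year1)   # simple subtraction, as in the rating
        second_change.append(year3 - year2)
    return first_change, second_change

first, second = simulate_changes()
r_changes = pearson_r(first, second)
print(f"correlation between consecutive changes: {r_changes:.2f}")
```

Under these assumptions the correlation between consecutive changes comes out close to -0.5, because the middle year's error enters the first change with a plus sign and the second with a minus sign. In other words, a school that "improves" one year tends to "decline" the next even though nothing about the school has changed.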

So, Louisiana’s school rating system relies almost exclusively on a measure (the SPS) that is heavily biased by student characteristics, and year-to-year changes in this measure are borderline random. It says little about the actual effectiveness of schools.

- Matt Di Carlo

*****

* Effective in 2012-13, these proportions will change slightly, per the state's NCLB waiver application. I should also note that not all actions taken based on low SPS are punitive. For example, schools might also be targeted for additional resources, such as tutoring.

** In fairness, even a valid measure of school performance might still be correlated with student poverty, if lower-performing schools were concentrated in poorer neighborhoods. However, this is not particularly plausible in the case of the SPS, given the extent of the bias, as well as the fact that it is so clearly due to the choice of measures.