The District of Columbia Public Charter School Board (PCSB) recently released the 2014 results of their “Performance Management Framework” (PMF), which is the rating system that the PCSB uses for its schools.
Very quick background: This system sorts schools into one of three “tiers," with Tier 1 being the highest-performing, as measured by the system, and Tier 3 being the lowest. The ratings are based on a weighted combination of four types of factors -- progress, achievement, gateway, and leading -- which are described in detail in the first footnote.* As discussed in a previous post, the PCSB system, in my opinion, is better than many others out there, since growth measures play a fairly prominent role in the ratings, and, as a result, the final scores are only moderately correlated with key student characteristics such as subsidized lunch eligibility.** In addition, the PCSB is quite diligent about making the PMF results accessible to parents and other stakeholders, and, for the record, I have found the staff very open to sharing data and answering questions.
That said, PCSB's big message this year was that schools’ ratings are improving over time, and that, as a result, a substantially larger proportion of DC charter students are attending top-rated schools. This was reported uncritically by several media outlets, including this story in the Washington Post. It is also based on a somewhat questionable use of the data. Let’s take a very simple look at the PMF dataset, first to examine this claim and then, more importantly, to see what we can learn about the PMF and DC charter schools in 2013 and 2014.
First, it bears emphasizing that I am very much opposed to judging school performance per se, or changes in that performance, based on distributions of ratings from these kinds of systems, in large part because doing so assumes that the ratings are valid for this purpose. In my view, some of the measures used to calculate the ratings can be useful in that particular context, but others are more accurately viewed as ratings of students' test-based performance than that of the schools they attend. The PMF system, to PCSB's credit, may weight the former measures more heavily than do a bunch of other systems, but the DC charter ratings are still driven substantially by measures of student rather than school performance (i.e., how highly students score, rather than schools' contribution to that performance). These two general types of measures can be useful for accountability purposes, but there is a critical difference between them, one that is not consistent with the practice of giving schools single ratings and characterizing those ratings in terms of "school performance."
That said, we’ll put aside this inevitable caveat for the moment, and move on. Since this is a rather long post, I have relegated some of the more technical discussion to footnotes, and I will provide a summary of its major points right at the outset:
- In reality, the number of students attending schools that actually received Tier 1 ratings was basically flat between 2013 and 2014;
- Among schools receiving ratings in both 2013 and 2014, PMF scores were, on average, relatively stable (and this is probably a good sign);
- A superficial look at the four subcomponents of the PMF framework suggests, as would be expected, that they exhibit somewhat different properties within and between years, with a couple of issues that might be examined further.
My intention in performing this quick analysis was not to "fact check," but I came across an issue that, I think, requires clarification. As mentioned above, the PCSB press release, as well as the Post story, reports that 2014 enrollment in the 22 schools rated Tier 1 was over 12,000 students, and that this is a roughly nine percent increase from 2013. The actual total is about 8,000 students, and the proportion of students attending Tier 1 schools is more or less unchanged since 2013. There are a few reasons for these discrepancies. They are summarized in the list below, with additional details in the third footnote:
- The 12,000 figure also includes enrollment in schools that did not receive ratings at all, but were simply affiliated with networks whose other schools did receive Tier 1 ratings. This is not defensible. If schools or campuses don't receive tier ratings, they should not be included in tabulations of the students attending schools that received a given rating;
- The PCSB figure is based on 2014-15 pre-audited enrollment, which means it does not actually reflect the number of students who attended these schools during the 2013-14 school year. The ratings, which (as shown below) fluctuate quite a bit between years, apply to 2013-14, and any statistics about enrollment in these schools should also use 2013-14 data;
- From what I can tell, several "low-performing" charter schools are excluded entirely from receiving 2014 ratings because they closed at the end of that year, even though they were in full operation, and should be included in any statistics about the sector's performance in that year.***
As you can see, the percentage of students attending Tier 1 schools was basically flat between 2013 and 2014 (there was a one percentage point increase), which is not consistent with the claims in PCSB’s press release. Again, this is mostly because the 12,000 figure includes schools that did not actually receive tier ratings.
In contrast, the percentage of students attending the lowest-rated Tier 3 schools dropped quite sharply, but this is in no small part due to the closure issue discussed above (#3 in the list). As a rough illustration of the impact this had on the figures, including these closed schools in the 2014 data with their 2013 tier ratings, and using their 2013 enrollments as rough estimates, increases the proportion of students attending Tier 3 schools in 2014 to about 14 percent, and reduces the enrollment in Tier 1 schools to slightly lower than its 2014 level.
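The rough calculation described above can be sketched as follows. This is only an illustration under stated assumptions: the enrollment numbers are invented, and `tier_shares` is a hypothetical helper, not anything PCSB publishes.

```python
from collections import defaultdict


def tier_shares(schools):
    """Return each tier's share of total enrollment, as a percentage.

    `schools` is a list of (tier, enrollment) pairs.
    """
    totals = defaultdict(int)
    for tier, enrollment in schools:
        totals[tier] += enrollment
    grand_total = sum(totals.values())
    return {tier: 100.0 * n / grand_total for tier, n in sorted(totals.items())}


# Hypothetical rated schools for one year: (tier, enrollment).
rated = [(1, 400), (1, 350), (2, 500), (2, 450), (3, 300)]

# Hypothetical schools that operated that year but went unrated because they
# closed; they carry their prior-year tier and enrollment as rough stand-ins.
closed = [(3, 250), (3, 250)]

shares_without_closed = tier_shares(rated)
shares_with_closed = tier_shares(rated + closed)
```

With these made-up numbers, adding the closed schools back raises the Tier 3 share and lowers the Tier 1 share, even though no Tier 1 school changed. That is the same mechanical effect described in the text.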
In addition to the validity of the ratings themselves, these results are another good illustration of why claims about performance improvement based on the results of school rating systems should be viewed with skepticism (see this post for a far more blatant example from Florida).
Moving on, let’s take a slightly different approach to looking at changes in the ratings, one that doesn't depend on the sample of schools included, and one that focuses not on examining the claims of PCSB, but rather on the far more important issue of what we can learn about the results of the PMF system and its constituent parts.
Of the 64 schools that were rated in 2014, 59 also received ratings in 2013. We will focus on these schools for most of the remainder of this post.*****
For starters, the table below summarizes the “movement” in ratings between 2013 and 2014. For example, 16 schools (the upper left cell, shaded in yellow) received the highest rating (Tier 1) in both 2013 and 2014. Moving over one cell to the right, six schools received a Tier 1 rating in 2013 but were “downgraded” to Tier 2 in 2014 (remember that Tier 1 is the highest rating, so an increase in tier number is actually a lower rating).
Overall, then, just over 70 percent of DC charters (42 out of 59) received the same rating in 2013 and 2014. Among the 17 schools that received different ratings between years, ten did worse, and seven did better. (Note, once again, that this level of ratings "turnover," while hardly unusual, does illustrate one of the problems with using 2014-15 pre-audited enrollment to characterize enrollment in tiers calculated using 2013-14 data - doing so assumes implicitly that ratings will be stable between years, and they are not. Over one in four Tier 1 schools in 2013 received a lower rating in 2014.)
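For readers who want to build this kind of movement table themselves, here is a minimal sketch. The ten rating pairs are invented for illustration, and `summarize_transitions` is a hypothetical helper name.

```python
from collections import Counter


def summarize_transitions(pairs):
    """Tally tier changes for schools rated in both years.

    `pairs` is a list of (tier_year1, tier_year2) tuples. Tier 1 is the best
    rating, so a larger tier number in year 2 means the school did worse.
    """
    table = Counter(pairs)
    same = sum(n for (a, b), n in table.items() if a == b)
    worse = sum(n for (a, b), n in table.items() if b > a)
    better = sum(n for (a, b), n in table.items() if b < a)
    return table, {"same": same, "worse": worse, "better": better}


# Hypothetical (2013 tier, 2014 tier) pairs for ten schools.
pairs = [(1, 1), (1, 1), (1, 2), (2, 2), (2, 2),
         (2, 2), (2, 1), (2, 3), (3, 3), (3, 2)]

table, summary = summarize_transitions(pairs)
share_same = summary["same"] / len(pairs)

# The same arithmetic applied to the counts reported in the post.
share_same_reported = 42 / 59
```

The last line simply reproduces the "just over 70 percent" figure from the reported counts (42 of 59 schools).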
The fact that almost three out of ten schools got a different rating in 2014 than in 2013 may seem like quite a bit of turnover over such a short period of time, particularly for a system that sorts schools into just three categories. As is the case with proficiency rates, however, some of this movement is just schools that were close to the thresholds, and might therefore have moved up or down a tier with little change in their score.
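To make the threshold point concrete, here is a small sketch. The cutoffs (65 and 35) are placeholders chosen for illustration rather than figures taken from this post, and the scores are invented.

```python
def assign_tier(score, tier1_cutoff=65.0, tier3_cutoff=35.0):
    """Map a 0-100 score to a tier; the default cutoffs are hypothetical."""
    if score >= tier1_cutoff:
        return 1
    if score < tier3_cutoff:
        return 3
    return 2


# Hypothetical (2013 score, 2014 score) pairs for two schools.
near_cutoff = (66.0, 64.0)  # a 2-point drop that straddles the Tier 1 cutoff
mid_tier = (60.0, 40.0)     # a 20-point drop that stays inside Tier 2

near_tiers = tuple(assign_tier(s) for s in near_cutoff)
mid_tiers = tuple(assign_tier(s) for s in mid_tier)
```

Under these assumptions the first school changes tiers after a two-point drop, while the second keeps its tier despite a twenty-point drop. This is why tier changes alone are a noisy summary of score changes.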
So, let’s take a look at how actual numerical scores compared between 2013 and 2014. In the scatterplot below, each dot is a charter school, and their scores are plotted for 2013 (horizontal axis) and 2014 (vertical axis). The blue line is the "average relationship" between scores in these two years.
You can see that there is a pretty tight relationship between scores in 2013 and 2014. The correlation coefficient is 0.85, which is very high.
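For reference, a year-to-year correlation of this kind is a standard Pearson coefficient. The sketch below shows one way to compute it; the six score pairs are invented, and `pearson_r` is a hypothetical helper rather than the exact routine used for the figure above.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for six schools in two consecutive years.
scores_2013 = [30.0, 45.0, 55.0, 60.0, 72.0, 85.0]
scores_2014 = [35.0, 41.0, 58.0, 57.0, 70.0, 88.0]

r = pearson_r(scores_2013, scores_2014)
```

A value near 1 means schools kept roughly the same relative position from one year to the next.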
If we sum up the results for each year, the average score among these 59 schools was 58.4 in 2013, and dropped to 57.7 in 2014 (the averages are approximately the same if they are weighted by student enrollment, and the changes vary only modestly by school type, with the largest drop found among middle schools). This change in the average score is so minor that the safest interpretation is to regard it as more or less flat. This, as discussed below, is probably a good sign for the system.
Now let’s see how each of the four measures that comprise the system – progress, achievement, gateway and leading – look in 2013 and 2014 (see the first footnote for a description of these components). I compiled these data manually, using the school report cards. The table below presents means and standard deviations (a common measure of variation) for each measure in both years, as well as their year-to-year correlations between 2013 and 2014 (note that the figures below apply to scores [actually, percentages of possible points earned] that PCSB calculates based on the raw outcomes, such as proficient/advanced rates and median growth percentiles, and not the outcomes themselves).
As you can see in the table, components do differ somewhat in how much they vary in any given year, as is usually the case when measures are not standardized. In addition, the average score on all but one of the four sub-components dropped a bit between 2013 and 2014, with the sole exception being the “leading indicators," which exhibited a modest increase. In all cases, however, the changes were once again very minor.
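A table like this one can be assembled with a few lines of code once the report card data are in hand. The sketch below uses invented scores for a single component and assumes the sample standard deviation (the post does not say which variant was used).

```python
import statistics


def summarize(scores_by_year):
    """Return {year: (mean, sample SD)} for one component's school scores."""
    return {
        year: (statistics.mean(vals), statistics.stdev(vals))
        for year, vals in scores_by_year.items()
    }


# Hypothetical percent-of-points scores for five schools on one component.
leading = {
    2013: [70.0, 80.0, 90.0, 60.0, 100.0],
    2014: [75.0, 85.0, 85.0, 70.0, 95.0],
}

summary = summarize(leading)
mean_change = summary[2014][0] - summary[2013][0]
```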
(Side note: For a discussion of how differences between measures in the degree to which they vary in any given year can influence the “true weighting” of a system, see this post.)
Taking a quick look at the year-to-year correlations, as would be expected, the achievement and gateway components exhibit the most stability between years. This is mostly because they are based on absolute performance measures (e.g., proficient/advanced rates), which do not fluctuate much over short periods of time. The stability of the growth measure (median growth percentiles) may seem high to those accustomed to looking at results for teacher-level value-added estimates, but schoolwide growth measures are more stable, largely because they are based on larger samples than teacher-level estimates and are thus estimated more precisely. This level of stability in D.C. is roughly in line with what I've found in other states using similar models, such as Colorado.
What stands out most in this table is the relatively high volatility of the “leading indicator” component. This measure is based on attendance and re-enrollment rates for elementary and middle schools, and, for high schools, attendance, re-enrollment and the proportion of ninth graders on track to graduate. The correlation coefficient (0.42) is a little low. Perhaps more importantly, unlike the other components, it varies drastically by school type (though these sub-samples are small). It is comparatively high (roughly 0.65) for high schools and combined elementary/middle schools, considerably lower for elementary schools (0.37), and actually negative for middle schools (-0.27 – i.e., the relationship is basically random).
Determining whether it is attendance or re-enrollment (or both) that is driving this volatility, particularly among middle schools, would require another laborious manual collection of data (my bet is that it’s re-enrollment). In any case, this is something that the PCSB might want to look into, since any indicator that is essentially random between years may not be suitable for inclusion in a rating system (for example, the board might want to calculate this measure differently depending on school type). These are interesting non-test measures that may play a useful role in school rating systems, but they may require some tweaking.
Finally, let's check the relationship not over time, but rather between these components in 2014. The table below presents correlations between each measure and all three of the other measures (since these are single-year estimates, the sample includes all 64 schools that received a 2014 tier rating).
As expected, the strongest correlation (0.66) is that between the achievement and gateway measures (largely because they are both based on absolute test performance). The rest are moderate at best, which is typical of rating systems' components, and indicative of the fact that they are measuring very different things (measurement error, of course, also plays a role). The progress scores, for example, are best seen as (highly imperfect) measures of actual school performance, whereas the other three are more appropriately viewed as measures of student performance, which is partially a function of school effectiveness proper, but mostly of non-school factors such as student background.
And the fact that multiple measures in a system are only moderately correlated is not only to be expected, it may actually be viewed as a good thing, assuming that the idea here is for different measures to capture different facets of school and student performance. This, however, depends on avoiding interpretation of the ratings simply as "school performance measures," which they are not. That said, the correlations for the leading indicators are once again particularly low, and this, while hardly a deal-breaker, is something that warrants checking out.
In summary, then, the results for DC charter schools, at least as measured by the PMF system, were quite stable between 2013 and 2014. Although I can understand PCSB’s desire to declare substantial improvement, this is not supported by any of the data presented above. Moreover, if anything, relative stability between consecutive years is a good sign for a rating system. Even when volatility is not driven by error, aggregate school and student performance tend not to change quickly, and any measure that suggests otherwise may be more of a cause for caution than celebration.
Finally, to reiterate, the performance of schools or charter sectors as a whole, as well as trends in that performance, really should not be gauged with these rating systems. As the data above suggest, each measure captures different things (in many cases, student rather than school performance), and exhibits different properties, both in any given year and over time. The most useful application of these systems is for users to examine each measure separately, armed with clear guidance as to how it should be interpreted. This is the purpose for which PCSB designed the system, and it is evident in the user-friendly manner in which they present the school-by-school data. For future releases, they might consider skipping the PR fishing expedition, and instead renewing their past practice of emphasizing the PMF's informational value to stakeholders.
- Matt Di Carlo
* The measures are calculated as follows for elementary and middle schools (the system is a bit different for high schools, but quite similar): 40 percent based on student progress (math/reading median growth percentiles); 25 percent based on student achievement (math/reading proficient/advanced rates); 15 percent for gateway measures (math/reading proficient/advanced rates in grade 3 for elementary schools, and grade 8 for middle schools); and 20 percent for leading indicators (attendance and re-enrollment rates). For each measure, PCSB calculates the percentage of total possible points earned by the school, and that is the school’s score on that measure. The percentages are then combined using the weights specified above, and sorted into tiers 1-3.
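As a rough sketch of the weighting just described (elementary and middle schools only), the combined score is a weighted sum of the four component percentages. The 20 percent weight on leading indicators is the figure given in the second footnote; the component values and the `composite_score` helper are hypothetical.

```python
# Weights for elementary and middle schools, as described in the footnotes.
WEIGHTS = {
    "progress": 0.40,
    "achievement": 0.25,
    "gateway": 0.15,
    "leading": 0.20,
}


def composite_score(component_pcts, weights=WEIGHTS):
    """Combine component scores (percent of possible points) into one score."""
    if set(component_pcts) != set(weights):
        raise ValueError("component names must match the weight names")
    return sum(weights[name] * component_pcts[name] for name in weights)


# Hypothetical component scores for one school.
example = {"progress": 60.0, "achievement": 50.0, "gateway": 40.0, "leading": 80.0}
score = composite_score(example)
```

For this invented school the combined score works out to about 58.5 (24 + 12.5 + 6 + 16).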
** The system weights quite heavily absolute performance levels – how highly students score, which is largely a function of background. Although 40 percent of the ratings are based on median growth percentiles, another 40 percent of schools' ratings are based on absolute test-based performance, including extra weight on third grade proficient/advanced rates for elementary schools (the "gateway" indicators), which exacerbates the problem, since these students have only attended the school for 3-4 years at most. The remaining 20 percent comes from "leading indicators" (e.g., attendance and re-enrollment). I think these are a decent attempt at non-test alternatives, though the degree to which raw attendance is a function of school quality is probably on the low side (i.e., it may basically be a non-test status measure). Re-enrollment rates are an interesting possibility, but could pose a problem to the degree student attrition varies systematically for reasons other than school performance or parent satisfaction.
*** Regarding the first and most significant issue, in calculating the 12,000 enrollment figure, PCSB coded several early childhood campuses as Tier 1, even though these campuses were not actually rated, but are affiliated with networks whose schools received Tier 1 ratings. For example, in 2014 there were over 1,000 students in four KIPP schools that, because they serve only early childhood students, did not receive tiered ratings. Yet these students are included in the total number of students who attend Tier 1 schools. This imputation is unsupportable. If schools did not receive ratings, they should not be included in statistics about enrollment in rated schools. As for the second issue (i.e., using pre-audited 2014-15 enrollment), PCSB is using 2014-15 because they want to provide an estimate of the current charter school population, but this strikes me as indefensible, partially because pre-audited enrollment is subject to error, but mostly because the ratings pertain to 2013-14, and so any estimates of enrollment in schools by tier should use 2013-14 enrollment (this is particularly salient given the fact, as shown below, that there is quite a bit of ratings turnover between years). The third and final issue is that five schools are excluded from the 2014 ratings -- not just from the 12,000 figure, but from receiving ratings at all -- because they closed at the end of that year. Specifically, PCSB decided to close Arts and Technology Academy, Community Academy (Amos 3), and Booker T. Washington at the end of the 2013-14 school year, and to hand over control of Imagine Southeast to a different operator (Democracy Prep). The situation is similar for Maya Angelou (Evans Middle School), which seems to have been closed by its own board. All of these schools received Tier 3 ratings in previous years except Maya Angelou (Tier 2), but none received a rating in 2014 even though they were in full operation. 
I suppose PCSB excluded them from the 12,000 figure because it is based on 2014-15 (pre-audited) enrollment, and these schools were closed in 2014-15, but, again, this approach (mismatching 2014-15 enrollment with ratings calculated using 2013-14 data) is quite misleading. In addition, it is strange that these schools are simply excluded from the PMF even though they were in operation, as parents might still want to know how their child's school was rated even if that child is enrolled elsewhere.
**** It is difficult to tell whether the enrollment figures provided by the PCSB’s data website include students enrolled in schools’ early childhood education programs. Based on the fact that schools that serve both early childhood and older students are given two separate report cards (one for their early childhood program and one for their regular program), and that, among these schools, both report cards list the same enrollment, my guess is that the enrollment figures include both types of students. This of course means that the totals in the first table include a bunch of students who attend schools that did receive tiered ratings, but attended the early childhood program of that school, which did not technically receive a tier rating (early childhood programs are currently evaluated with a different system). This is a coding error, to be sure, but, unlike the issue described in the text, it at least counts students as attending a school with a given tier that actually received a tier rating (rather than just being affiliated with a network that received a given tier rating). In addition, it bears noting that there are three rated schools with two different campuses (either an elementary and middle campus, or a middle and high school campus) that received separate tier ratings, but list the same enrollment for both campuses. This of course suggests that the enrollments are pooled across campuses. For these schools, I simply divided the enrollment in half and assigned each half to a campus. For one school, this makes no difference in the table, since both campuses received the same rating. For the other two, the ratings are different, so there is some slight coding error, but it could not possibly affect the figures in the first table by more than a fraction of a percentage point.