At the end of February, the District of Columbia Council’s Education Committee held its annual hearing on the performance of the District’s Public Schools (DCPS). The hearing (full video is available here) lasted over four hours, and included discussion on a variety of topics, but there was, inevitably, a block of time devoted to the discussion of DCPS testing results (and these questions were the focus of the news coverage).
These exchanges between Council members and DCPS Chancellor Kaya Henderson focused particularly on the low-stakes Trial Urban District Assessment (TUDA).* Though it was all very constructive and not even remotely hostile, it’s fair to say that Ms. Henderson was grilled quite a bit (as is often the case at these kinds of hearings). Unfortunately, the arguments from both sides of the dais were fraught with the typical misinterpretations of TUDA, and I could not get past how tragic it is to see legislators question the superintendent of a large urban school district based on a misinterpretation of what the data mean - and to hear that superintendent respond based on the same flawed premises.
But what I really kept thinking -- as I have before in similar contexts -- was how effective Chancellor Henderson could have been in answering the Council’s questions had she chosen to interpret the data properly (and I still hold out hope that this will become the norm some day). So, let’s take a quick look at a few major arguments that were raised during the hearing, and how they might have been answered.
The improvements in TUDA/NAEP scores are due to demographic changes in the district.
Council Member David Catania made some (well-informed) arguments regarding the composition of the TUDA sample and how it might have been driving the 2009-2011 changes.
There is quite a bit of irony here: TUDA changes are being discussed based on the false premise that they represent "growth," but the substance of the conversation is about demographic changes in the sample, which is the most salient manifestation of the fact that they're not growth measures, but rather comparisons between two different groups of students.
So, instead of trying to fit a cross-sectional peg into a longitudinal hole, or asserting that painfully simplistic subgroup breakdowns can "prove" that the "growth" was "real" (they cannot), Ms. Henderson should simply have established the correct premises and made her point.
In other words, she should have said that student populations can shift quite a bit, even over short periods of time, and raw NAEP/TUDA changes, which do not follow students over time, reflect these shifts. That is not an opinion but a fact, one that is, by the way, particularly salient in DC, given its high residential mobility, as well as the movement of students in and out of the District’s large charter school sector.**
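To make the composition point concrete, here is a minimal sketch in Python. Every number in it is hypothetical and chosen only for illustration (none comes from NAEP/TUDA or DCPS data), and the function name `overall_mean` is my own. It shows how an overall average can rise between two cross-sectional samples even when no subgroup's average changes at all, simply because the mix of students differs.

```python
def overall_mean(groups):
    """Enrollment-weighted mean over (n_students, mean_score) pairs."""
    total_students = sum(n for n, _ in groups)
    return sum(n * score for n, score in groups) / total_students


# HYPOTHETICAL figures, for illustration only (not real NAEP/TUDA data).
# Each subgroup's mean score is identical in both years; only the
# share of students in each subgroup shifts.
year_1 = [(800, 240), (200, 280)]  # (students, mean score) per subgroup
year_2 = [(700, 240), (300, 280)]

mean_1 = overall_mean(year_1)
mean_2 = overall_mean(year_2)
change = mean_2 - mean_1

print(mean_1, mean_2, change)  # 248.0 252.0 4.0
```

In this made-up case the overall average rises four points with no change in any subgroup's performance, which is why a raw cross-sectional change cannot, on its own, be read as growth.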
Thus, while it is possible that any increases reflect, at least to some degree, “real” progress, it is very difficult to tell. It would have been perfectly acceptable (and, I think, rather charming) for the Chancellor to state that she likes to speculate they do, and that they at least give some indication of "moving in the right direction." Put differently, it's not good evidence, but it's not discouraging either.
Then, she might have added that the cross-sectional nature of the changes is precisely why both DCPS and DC charter schools have begun calculating actual growth measures for their annual school report cards, as these do employ longitudinal data and attempt to control (however imperfectly) for student characteristics that might influence testing performance.
And finally, Ms. Henderson could have closed this out beautifully by asking a simple question: If it is indeed the case that the DCPS student body is becoming more ethnically diverse and economically prosperous, isn’t that a good thing?
Achievement gaps in the District are large and, in some cases, growing.
Mr. Catania also asked about achievement gaps in the District, and noted that the gaps since 2007 had not narrowed, and had in some cases widened.
This was actually a softball. But for past public statements on this issue, Ms. Henderson could easily have responded. First, she might quickly have reiterated the issue regarding changes in the samples of students taking the test (and achievement gap changes are even more subject to this imprecision than overall changes).
That said, here's the story: Between 2011 and 2013, achievement gaps based on subsidized lunch eligibility (a rough proxy for income/poverty) grew because the reading and math scores of eligible and ineligible student cohorts both increased, but the jump was larger for the latter than the former.
For example, between 2011 and 2013, the math scores of eighth graders who were low-income (i.e., eligible) increased five points, but they increased a full 15 points for students who were not low-income (not eligible). This widened the achievement gap between these subgroups, even though the underlying dynamics -- increases among both groups -- can hardly be considered a "bad thing."
In other words, changes in raw achievement gaps, particularly over the short-term, are not only poor measures of district and school performance, but they often mask important underlying realities - e.g., widening gaps frequently reflect underlying changes that are positive (e.g., both subgroups increase).
(Conversely, it’s possible for gaps to shrink for reasons that are undesirable, for instance, if they mask improvement or flat scores among lower-scoring subgroups, and a decline in higher-performing subgroups.)
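The arithmetic behind both cases can be sketched in a few lines of Python. The function name `gap_change` is my own, the first example uses the eighth-grade math changes cited above, and the second uses hypothetical numbers that I made up purely to illustrate the converse case.

```python
def gap_change(low_group_change, high_group_change):
    """Change in the gap (higher-scoring group minus lower-scoring group)
    implied by each group's own score change."""
    return high_group_change - low_group_change


# Figures from the text: eighth-grade math, 2011 to 2013.
# Eligible (low-income) students: +5; ineligible students: +15.
widening = gap_change(low_group_change=5, high_group_change=15)
print(widening)  # 10: the gap widens even though both groups improved

# HYPOTHETICAL figures, for illustration only (not real data):
# the lower-scoring group is flat and the higher-scoring group declines.
narrowing = gap_change(low_group_change=0, high_group_change=-4)
print(narrowing)  # -4: the gap narrows even though no group improved
```

The sign of the gap change, in other words, says nothing by itself about whether either subgroup's scores went up or down.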
Basically, Ms. Henderson could have responded to this question by saying that it’s critical that traditionally lower-scoring subgroups exhibit stronger increases, but that this need not come at the expense of higher-performing subgroups, which is often what raw achievement gap changes reflect (to the degree they reflect growth at all). Moreover, again, these are not growth measures, and the District's shifting student population is a factor in these changes.
Results are not increasing quickly enough to meet the District’s goals within any reasonable amount of time.
Answering this question would have taken some courage. Putting aside, once again, the fact that neither NAEP/TUDA, nor publicly available results for D.C.’s state test, are actually measures of growth, real educational improvement is slow and sustained, and it occurs over decades, not years (and much of it is intergenerational). However, even small improvements, dissatisfying as they may be to the casual observer, can make a huge difference for thousands of students over time. In addition, the quality of schooling inputs is very important, and contributes to these improvements.
Of course, Ms. Henderson (and the vast majority of urban superintendents across the nation) cannot say this, as our education policy debate has consistently failed to include realistic conversations about expectations – a fact, for example, that is very clearly reflected in the mandates of NCLB. Superintendents must continue to promise results that are impossible to deliver, lest they be accused of defeatism and low expectations. This is unfortunate: It is difficult to believe that there is no achievable middle ground between complacency and fantasy.
So, in short, even if Ms. Henderson herself were not among the more aggressive proponents of this viewpoint, it would have been politically risky for her to tell the Council that patience is required, and that improvement will take longer than most people seem willing to acknowledge. But it would have been an important moment in education reform, one, for whatever it’s worth, that would have earned my respect and admiration.
Overall, the part of this hearing that focused on DCPS testing results did not go particularly well for Ms. Henderson, largely because the questions posed to her were based on false premises – and she was either unwilling or unable to correct them. The hearing was yet another good example of how inappropriate interpretation of testing data pollutes our education discourse, even at the highest levels, and what a huge difference it would make if well-informed leaders who understand testing data could help reshape that discussion.
- Matt Di Carlo
* It is interesting to note that D.C.'s state exam, the DC-CAS, was not really discussed.
** Perhaps even more importantly, the fact that average scores went up or down tells you almost nothing about the effectiveness of the district’s schools, since it’s entirely plausible, for example, that fourth graders made huge progress between kindergarten and fourth grade, but simply “ended up” at the same point as their predecessors.