It is a gross understatement to say that the No Child Left Behind (NCLB) law was, is, and will continue to be a controversial piece of legislation. Although opinion tends toward the negative, there are certain features, such as the focus on student subgroup data, that many people support. And it’s difficult to generalize about whether the law’s impact on U.S. public education was “good” or “bad” by some absolute standard.
The one thing I would say about NCLB is that it has helped to institutionalize the improper interpretation of testing data.
Most of the attention to the methodological shortcomings of the law focuses on “adequate yearly progress” (AYP) – the crude requirement that all schools must make “adequate progress” toward the goal of 100 percent proficiency by 2014. And AYP is indeed an inept measure. But the problems are actually much deeper than AYP.
Rather, it’s the underlying methods and assumptions of NCLB (including AYP) that have had a persistent, negative impact on the way we interpret testing data.
I’m not going to get into all of the details here, since I discuss them so frequently (follow the links below), but the most important issues include:
- The exclusive reliance on proficiency and other cutpoint-based rates;
- The presentation of changes in cross-sectional proficiency rates as measures of “progress”;
- The severe blurring of the boundary between test scores as an indicator of school performance and test scores as an indicator of student performance (e.g., all schools must reach the same absolute performance benchmarks, no matter who their students are or where they start out).
For example, when states release test scores every year, AYP may or may not get a quick mention, but most reporters still focus on proficiency rates and/or changes in those rates, despite their well-documented limitations, as well as the fact that the actual scores upon which they’re based often suggest different conclusions (#1). When schools’ or districts’ rates increase, even by tiny amounts, that is called “growth” or “progress” (#2), even though it is not. Moreover, schools with high rates are called “high-performing,” even though this is not a valid measure of school effectiveness, per se, while schools with increasing rates are assumed to have improved, even though test score changes can occur for a variety of reasons, both school- and non-school-related (#3).
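A quick sketch can make the rate-versus-score problem concrete. The numbers below are entirely hypothetical (invented scale scores and an invented proficiency cutoff of 300), but they show how a proficiency rate can rise even as the underlying scores fall, and why a change between two different cohorts is not a measure of growth:

```python
# Hypothetical illustration: proficiency rates only count how many
# students clear a cutpoint, so they can move in the opposite
# direction from the actual scores. Cutoff and scores are invented.

CUTOFF = 300

year1 = [250, 280, 296, 310, 350, 390]  # this year's 4th graders
year2 = [240, 260, 301, 302, 303, 305]  # next year's 4th graders (different students)

def proficiency_rate(scores):
    """Share of students at or above the proficiency cutpoint."""
    return sum(s >= CUTOFF for s in scores) / len(scores)

def average(scores):
    """Mean scale score."""
    return sum(scores) / len(scores)

print(f"Year 1: rate={proficiency_rate(year1):.0%}, mean={average(year1):.0f}")
print(f"Year 2: rate={proficiency_rate(year2):.0%}, mean={average(year2):.0f}")
# The rate rises (50% -> 67%) while the mean score falls, because several
# students cluster just above the cutoff. And since the two years are
# different cohorts of students, neither change measures "growth."
```

The point is not that rates are useless, but that a single cutpoint discards most of the information in the score distribution, and a year-to-year rate change compares different groups of students.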
In other words, AYP is generally disregarded, but we still routinely lend credibility to many of the basic methods and assumptions upon which AYP is based.
Don’t get me wrong – the painful simplicity of NCLB’s metrics is not without its advantages; for instance, they are easy to calculate and understand (and that does matter). Also, one might argue that NCLB’s focus on data collection and subgroup disaggregation was a step forward, regardless of the specific measures used. Nevertheless, NCLB’s metrics are deeply, deeply flawed, and there were plenty of warnings about this.
Even though recent efforts, such as the increasing prevalence of growth models, suggest that we are changing course, the “data legacy” of NCLB remains difficult to supplant. For example, states applying for NCLB “flexibility” are still required to set “annual measurable objectives” (AMOs) that aren’t particularly different from AYP – they still embody most of the misguided principles listed above – except that states now have the ability to choose their own targets and subgroups. Even those states that have designed their own rating systems still rely heavily on absolute proficiency.
In part, this persistence may also be due to the fact that states’ systems – e.g., their data collection and analysis infrastructure – have been structured around NCLB-style measures, and rapid change is logistically challenging.
But this isn't solely a logistical issue. For over a decade now, administrators, reporters and other stakeholders have become accustomed to relying on NCLB-style measures to characterize school performance. The assumptions embedded in these measures, and the incentives they represent, seem to be deeply rooted, and it may be some time before we are able to think differently about what testing data are and how we should use them.
- Matt Di Carlo