Guessing About NAEP Results

Every two years, the release of data from the National Assessment of Educational Progress (NAEP) generates a wave of research and commentary trying to explain short- and long-term trends. For instance, there have been a bunch of recent attempts to “explain” an increase in aggregate NAEP scores during the late 1990s and 2000s. Some analyses postulate that the accountability provisions of NCLB were responsible, while more recent arguments have focused on the “effect” (or lack thereof) of newer market-based reforms – for example, looking to NAEP data to “prove” or “disprove” the idea that changes in teacher personnel and other policies have (or have not) generated “gains” in student test scores.

The basic idea here is that, for every increase or decrease in cross-sectional NAEP scores over a given period of time (both for all students and especially for subgroups such as minority and low-income students), there must be “something” in our education system that explains it. In many (but not all) cases, these discussions consist of little more than speculation. Discernible trends in NAEP test score data are almost certainly due to a combination of factors, and it’s unlikely that one policy or set of policies is dominant enough to be identified as “the one.” Now, there’s nothing necessarily wrong with speculation, so long as it is clearly identified as such, and conclusions presented accordingly. But I find it curious that some people involved with these speculative arguments seem a bit too willing to assume that schooling factors – rather than changes in cohorts’ circumstances outside of school – are the primary driver of NAEP trends.

So, let me try a little bit of illustrative speculation of my own: I might argue that changes in the economic conditions of American schoolchildren and their families are the most compelling explanation for changes in NAEP.

Here’s my story: Early childhood (ages 0-5) is the most important period for children’s cognitive and non-cognitive development. When the economic circumstances – and all they entail – of families with young children improve relative to those of previous cohorts, the kids will perform better over the long term. The mid- to late-1990s was a time of remarkable economic growth, and the benefits were shared by most workers – higher- and lower-earners, whites and minorities alike. On average, children who were born or very young during this era experienced more economic stability and resources than their “predecessors.” We might therefore expect them to perform better on tests like NAEP.

To provide a ballpark identification of these children, let’s look at babies born around 1995, when U.S. economic growth started to take off. The two main NAEP assessments – reading and math – are administered to fourth and eighth graders every two years. Children born around 1995, who would have experienced improved economic conditions, on average, during early childhood, would have taken their first NAEP test at roughly age eight or nine (fourth grade), which means we should look at the NAEP results for fourth graders taking the test around 2003.
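The cohort arithmetic above can be sketched in a few lines of code. This is only an illustration of the back-of-the-envelope logic, not an analysis: the list of administration years and the age-at-test parameter are simplifying assumptions (NAEP schedules vary by subject, and fourth graders span ages eight to ten).

```python
# Rough cohort arithmetic: a child born in a given year reaches fourth
# grade at roughly age eight or nine (the post's assumption), and the
# relevant administration is the next NAEP year at or after that point.
# The list of years below is an approximation for illustration only.
NAEP_YEARS = [1998, 2000, 2002, 2003, 2005, 2007, 2009, 2011]

def first_fourth_grade_naep(birth_year, age_at_test=8):
    """Return the first NAEP year at or after the cohort hits fourth grade."""
    eligible = birth_year + age_at_test
    return next((y for y in NAEP_YEARS if y >= eligible), None)

print(first_fourth_grade_naep(1995))  # cohort born ~1995 -> 2003
```

Under these assumptions, the 1995 birth cohort first shows up in the 2003 fourth-grade results, which is why the post focuses on the 2002-2003 administrations.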

The trend in fourth grade math and reading is presented in the two simple graphs below, which are taken directly from the NAEP reports on reading and math (note that the two y-axes are a bit different).

The "evidence" is fairly clear – the cohort of fourth graders tested in 2002 (reading) and 2003 (math) exhibited substantial increases relative to those tested in prior years. In reading, the increase was around six points between 2000 and 2002, equivalent to roughly half a “year of learning.” The increase in math was no less impressive – a jump of nine points between 2000 and 2003. In both subjects, the 2002-2003 gains were followed by significant, but much smaller, positive changes throughout the 2000s, leveling out after 2007.

These raw data would seem to suggest that the NAEP increases since the start of this century – the subject of endless debate and advocacy, virtually all of it focused on NCLB and other education policies – may, to a substantial degree, be due to the economic circumstances of families, rather than to the quality of schooling. This supposition is supported by the literature demonstrating the influence of non-school factors (e.g., family background, mostly unobserved) on student performance as measured by tests.

Now, let me reiterate my point: This “story,” and the descriptive evidence I present to support it, is no less a matter of speculation than anyone else’s. Education and learning are complex and subject to an interconnected web of causal influences, school- and non-school alike. Without question, good education policy can generate test score increases, which hopefully reflect “real” improvement, and it’s very plausible that federal, state, and local education policies are factors in the increases shown above.

That said, when you simply eyeball the data and then make causal arguments, you’re really just guessing (as I did above). That's fine, but if you’re guessing without considering the fact that non-school factors are often the primary drivers of aggregate testing outcomes, it’s probably not a very good guess. This is especially true in the case of NAEP and other aggregate, cross-sectional data – non-school inputs can alter the characteristics of students taking the tests, which can influence results, both nationally and for individual states/districts. The knee-jerk, “something has to explain it” tendency is to blame or replicate the policies that we are most focused on today, while ignoring the web of complex factors and practices, past and present, that may actually be much more salient to student success (see here and here for more thorough, but still limited, analyses using NAEP, and this article summarizing recent NAEP research and its issues).

Trying to make sense of an inherently messy situation, such as the causes and implications of testing trends, is a worthwhile endeavor – we have to try to understand these things. But ignoring the diversity of factors that influence learning makes it much, much messier. The best course is to rely on evidence that controls for time-varying school and non-school factors, preferably using longitudinal data, before drawing anything beyond tentative conclusions.

- Matt Di Carlo