Resources on Testing and School Accountability

School accountability systems in the U.S. rely heavily on standardized testing data. The No Child Left Behind (NCLB) law enacted in 2001 required all states to test students in grades 3-8, and in one high school grade, in math and reading every year, and to hold schools accountable for the results. In addition, since then (and, in many cases, before), numerous states and districts have implemented their own school accountability systems and measures, virtually all of which rely mainly or entirely on testing outcomes.

Most of the debate over the role of standardized testing in U.S. education understandably focuses on whether these assessments should be used in accountability systems, but there is relatively little informed discussion of how they should be used. That question matters, because testing data by themselves cannot be valid or invalid; validity is a characteristic of how the data are interpreted.

We have published a great deal of research-based content about the proper interpretation of testing data in school accountability systems. This page provides links to this content, as well as external resources.

Shortcuts

Key distinctions about test scores

School versus student performance
Growth versus cohort changes
Test scores versus proficiency rates

School rating systems
Further reading (external papers and reports)

Key distinctions about test scores

The following three distinctions are extremely important for understanding the role of standardized tests in school accountability systems.

School versus student performance

Tests provide information, albeit imperfect and incomplete information, about student knowledge of a given block of content at a given time. One can use these data on student performance to get some approximate idea of schools’ contribution to that performance, but doing so requires sophisticated methods and careful interpretation. The vast majority of school accountability policies in the U.S., including NCLB, seriously conflate school and student performance.

In general, absolute performance, or “status” measures, which indicate how highly students score on tests (e.g., proficiency rates), are appropriate for gauging student performance. Schools’ actual contribution to testing progress, on the other hand, must be gauged using growth – that is, how much progress students make while attending a given school. Both types of measures have a potentially useful role to play in accountability systems, but it is critical to understand the distinction between them, and to make sure that it is reflected in how the data are presented and used in decision making.

Selected posts

Growth versus cohort changes

In our public discourse about education, changes in proficiency rates or average scores, whether using state assessments or the National Assessment of Educational Progress (NAEP), are often called “growth” or “progress.” They are not. In reality, they compare the performance of two different groups (i.e., cohorts) of students.

The rotation of cohorts in and out of the tested sample means that schoolwide rates or scores can remain flat between years even when students exhibit strong growth (or declines). In addition, changes between cohorts in students’ characteristics, which are often unmeasurable using standard educational variables such as subsidized lunch eligibility, can have a substantial impact on the magnitude of changes in scores or rates between years. Measuring “growth” or “progress” properly requires following the same group of students over time.
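The difference between a cohort change and growth can be made concrete with a small sketch. The school, student identifiers, and scores below are entirely hypothetical, and the example assumes scores sit on a scale that is comparable across years. It compares the change in the schoolwide average (which mixes different students) with the average gain among students tested in both years.

```python
from statistics import mean

# Hypothetical scores for one school, keyed by made-up student IDs.
# s1 and s2 are in the highest tested grade in year 1 and leave the sample;
# s5 and s6 enter the tested sample in year 2.
year1 = {"s1": 240, "s2": 250, "s3": 200, "s4": 210}
year2 = {"s3": 220, "s4": 230, "s5": 220, "s6": 230}


def cohort_change(before, after):
    """Change in the schoolwide average, ignoring who was tested."""
    return mean(after.values()) - mean(before.values())


def matched_growth(before, after):
    """Average gain among students who were tested in both years."""
    matched = before.keys() & after.keys()
    return mean(after[s] - before[s] for s in matched)


print("Change in schoolwide average:", cohort_change(year1, year2))
print("Average gain, same students:", matched_growth(year1, year2))
```

With these made-up numbers the schoolwide average does not move at all, even though every student who can be followed across both years gained 20 points. The flat average reflects who rotated in and out of the tested sample, not a lack of progress.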

Selected posts

Test scores versus proficiency rates

In NCLB-style accountability systems, test scores are often sorted into performance categories such as below basic, basic, proficient and advanced. Most commonly, results for a given school or district are summarized in terms of the proportion of students who score above the threshold for “proficient” (i.e., proficiency rates). It is rather common for these rates to be portrayed as “test scores,” when they are in reality one big step removed from the scores upon which they’re based.

Presenting rates instead of scores can be useful because it provides a standard for which to aim, and also because most people cannot interpret raw test scores. This conversion, however, also entails a great deal of data loss and potential distortion, and it is important to bear that in mind when interpreting the rates, whether in any given year or over time.
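A rough illustration of that data loss, using invented numbers: the cut score of 300, the two schools, and all of the scores below are hypothetical, and "proficient" is assumed here to mean scoring at or above the cut. The sketch shows how the same kind of information (a change in scores) can look very different once it is converted to a rate.

```python
from statistics import mean

CUT_SCORE = 300  # hypothetical threshold for "proficient"


def proficiency_rate(scores, cut=CUT_SCORE):
    """Share of students scoring at or above the cut score."""
    return sum(score >= cut for score in scores) / len(scores)


# Hypothetical school whose students cluster near the cut score;
# every student gains 5 points.
near_before = [292, 296, 298, 304]
near_after = [score + 5 for score in near_before]

# Hypothetical school whose students score far below the cut score;
# every student gains 25 points.
far_before = [240, 250, 260, 270]
far_after = [score + 25 for score in far_before]

for label, before, after in [
    ("near cut", near_before, near_after),
    ("far below cut", far_before, far_after),
]:
    print(label)
    print("  average score change:", mean(after) - mean(before))
    print("  proficiency rate:", proficiency_rate(before), "->", proficiency_rate(after))
```

In this made-up case, the school with the smaller score gain sees its rate rise from 25 to 75 percent because several students happened to cross the line, while the school with a gain five times as large stays at zero. The rate only records which side of the threshold each student falls on.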

Selected posts

School rating systems

Both before and since the passage of NCLB, but particularly in recent years, numerous states and districts have implemented their own school rating systems. All of these systems rely predominantly on standardized testing data, but they vary considerably in their design. We’ve published analyses of several of these systems, with an emphasis on how the ratings and their components can be interpreted usefully.

Original analyses of state and district rating systems

We have also published numerous discussions of key concepts and components of these systems and test-based accountability in general.

Selected posts
