Resources on Testing and School Accountability

School accountability systems in the U.S. rely heavily on standardized testing data. The No Child Left Behind (NCLB) Act, passed in 2001, required every state to test students in grades 3-8, and in one high school grade, in math and reading each year, and to hold schools accountable for the results. In addition, since then (and, in many cases, before), numerous states and districts have implemented their own school accountability systems and measures, virtually all of which rely largely, if not entirely, on testing outcomes.

Although most of the debate over the role of standardized testing in U.S. education understandably focuses on whether these assessments should be used in accountability systems, there is relatively little informed discussion of how they should be used. Testing data by themselves cannot be valid or invalid; validity is a characteristic of how the data are interpreted.

We have published a great deal of research-based content about the proper interpretation of testing data in school accountability systems. This page provides links to this content, as well as external resources.

Key distinctions about test scores

The following three distinctions are extremely important for understanding the role of standardized tests in school accountability systems.

School versus student performance

Tests provide information, albeit imperfect and incomplete information, about student knowledge of a given block of content at a given time. One can use these data on student performance to get an approximate idea of schools’ contribution to that performance, but doing so requires sophisticated methods and careful interpretation. The vast majority of school accountability policies in the U.S., including NCLB, instead reflect a rather serious conflation of school and student performance.

In general, absolute performance, or “status” measures, which indicate how highly students score on tests (e.g., proficiency rates), are appropriate for gauging student performance. Schools’ actual contribution to testing progress, on the other hand, must be gauged using growth – that is, how much progress students make while attending a given school. Both types of measures have a potentially useful role to play in accountability systems, but it is critical to understand the distinction between them, and to make sure that it is reflected in how the data are presented and used in decision making.
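The status/growth distinction can be made concrete with a minimal sketch. All numbers below are made up, and the proficiency cutoff is hypothetical; the point is only that a school can post a low status measure while its students make substantial progress.

```python
# Toy illustration (hypothetical scores and cutoff): the same school can
# look very different on a status measure than on a growth measure.
scores_fall = [310, 325, 290, 340, 305]    # hypothetical scale scores, start of year
scores_spring = [335, 350, 318, 362, 331]  # same students, end of year
PROFICIENT = 350                           # hypothetical proficiency cutoff

# Status: share of students at or above the cutoff in spring.
status = sum(s >= PROFICIENT for s in scores_spring) / len(scores_spring)

# Growth: average gain made by the same students over the year.
growth = sum(post - pre for pre, post in zip(scores_fall, scores_spring)) / len(scores_fall)

print(f"Status (proficiency rate): {status:.0%}")  # a low rate...
print(f"Average growth: {growth:.1f} points")      # ...despite solid gains
```

Here the school's proficiency rate is only 40 percent, yet every student gained more than 20 points, which is why status measures alone say little about a school's contribution.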

Selected posts

Growth versus cohort changes 

In our public discourse about education, changes in proficiency rates or average scores, whether using state assessments or the National Assessment of Educational Progress (NAEP), are often called “growth” or “progress.” They are not. In reality, they compare the performance of two different groups (i.e., cohorts) of students.

The rotation of cohorts in and out of the tested sample means that schoolwide rates or scores can remain flat between years even when students exhibit strong growth (or declines). In addition, changes between cohorts in students’ characteristics, which are often unmeasurable using standard educational variables such as subsidized lunch eligibility, can have a substantial impact on the magnitude of changes in scores or rates between years. Measuring “growth” or “progress” properly requires following the same group of students over time.
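A small sketch with made-up student names and scores shows why successive-cohort comparisons are not growth. The "change" below is zero even though the tracked students gained substantially, because the grade 4 comparison involves two entirely different groups of children.

```python
# Toy illustration (hypothetical data): comparing this year's grade 4
# to last year's grade 4 compares two different cohorts of students.
grade4_2022 = {"Ana": 300, "Ben": 320, "Cam": 310}  # cohort A in grade 4
grade4_2023 = {"Dee": 302, "Eli": 318, "Fay": 310}  # cohort B, one year later
grade5_2023 = {"Ana": 330, "Ben": 345, "Cam": 338}  # cohort A, now in grade 5

def avg(scores):
    return sum(scores.values()) / len(scores)

# "Cohort change": same grade, different students -- flat between years,
# even though the students themselves learned a great deal.
cohort_change = avg(grade4_2023) - avg(grade4_2022)

# True growth: follow the same students from grade 4 to grade 5.
growth = avg({name: grade5_2023[name] - grade4_2022[name] for name in grade4_2022})

print(f"Cohort change: {cohort_change:+.1f}")
print(f"Average growth: {growth:+.1f}")
```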

Selected posts

Test scores versus proficiency rates 

In NCLB-style accountability systems, test scores are often sorted into performance categories such as below basic, basic, proficient and advanced. Most commonly, results for a given school or district are summarized in terms of the proportion of students who score above the threshold for “proficient” (i.e., proficiency rates). It is rather common for these rates to be portrayed as “test scores,” when they are in reality one big step removed from the scores upon which they’re based.

Presenting rates instead of scores can be useful because it provides a standard for which to aim, and also because most people cannot interpret raw test scores. This conversion, however, also entails a great deal of data loss and potential distortion, and it is important to bear that in mind when interpreting the rates, whether in any given year or over time.
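The data loss from dichotomizing scores can be sketched with a few hypothetical numbers. Two schools with radically different score distributions can report identical proficiency rates, and a large gain that stays below the cutoff registers as no change at all.

```python
# Toy illustration (hypothetical cutoff and scores): collapsing scores
# into a proficient / not-proficient dichotomy discards most of the
# information in the underlying distribution.
PROFICIENT = 350  # hypothetical proficiency cutoff

school_a = [349, 349, 351, 351]  # everyone clustered at the cutoff
school_b = [250, 260, 440, 450]  # widely dispersed scores

def rate(scores):
    """Share of students at or above the proficiency cutoff."""
    return sum(s >= PROFICIENT for s in scores) / len(scores)

# Both schools report the same proficiency rate...
print(rate(school_a), rate(school_b))

# ...and a 48-point gain below the cutoff registers as no change.
before, after = [300, 360], [348, 360]
print(rate(before), rate(after))
```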

Selected posts

School rating systems

Both before and since the passage of NCLB, and particularly in recent years, numerous states and districts have implemented their own school rating systems. All of these systems rely predominantly on standardized testing data, but they vary considerably in design. We’ve published analyses of several of these systems, with an emphasis on how the ratings and their components can usefully be interpreted.

Original analyses of state and district rating systems

We have also published numerous discussions of key concepts and components of these systems and test-based accountability in general.

Selected posts

Further reading

  • Booher-Jennings, J. 2005. Below the bubble: “Educational triage” and the Texas accountability system. American Educational Research Journal, 42(2), 231–268.
  • Carnoy, M., and Loeb, S. 2002. Does external accountability affect student outcomes? A cross state analysis. Educational Evaluation and Policy Analysis, 24(4), 305–331.
  • Chiang, H. 2009. How accountability pressure on failing schools affects student achievement. Journal of Public Economics 93(9-10), 1045-1057.
  • Chingos, M. M. 2012. Strength in numbers: State spending on K-12 assessment systems. Washington, DC: Brown Center on Education Policy, Brookings Institution.
  • Davidson, E., Reback, R., Rockoff, J. E., and Schwartz, H. L. 2013. Fifty ways to leave a child behind: Idiosyncrasies and discrepancies in states’ implementation of NCLB (NBER Working Paper No. 18988). Cambridge, MA: National Bureau of Economic Research.
  • Dee, T. S., and Jacob, B. 2011. The Impact of No Child Left Behind on Student Achievement. Journal of Policy Analysis and Management 30(3), 418-446.
  • Dee, T. S., Jacob, B., and Schwartz, N. L. 2013. The Effects of NCLB on School Resources and Practices. Educational Evaluation and Policy Analysis 35(2), 252-279.
  • Ehlert, M., Koedel, C., Parsons, E., and Podgursky, M. 2012. Selecting Growth Measures for School and Teacher Evaluations. CALDER Working Paper 80. Washington, D.C.: National Center for Analysis of Longitudinal Data in Education Research.
  • Figlio, D. N., and Ladd, H. F. 2008. School accountability and student achievement. In: Helen F. Ladd and Edward B. Fiske (eds.), Handbook of Research in Education Finance and Policy. New York: Routledge.
  • Figlio, D., and Rouse, C. 2006. Do Accountability and Voucher Threats Improve Low Performing Schools? Journal of Public Economics 90, 239-255.
  • Glazerman, S. M., and Potamites, L. 2011. False Performance Gains: A Critique of Successive Cohort Indicators. Washington, D.C.: Mathematica Policy Research.
  • Goldhaber, D., Gabele, B. and Walch, J. 2012. Does the Model Matter? Exploring the Relationship Between Different Achievement-based Teacher Assessments. CEDR Working Paper 2012-6. Seattle, WA: University of Washington.
  • Harris, D. N. 2011. Value-Added Measures in Education: What Every Educator Needs to Know. Cambridge, MA: Harvard Education Press.
  • Hanushek, E. A., and Raymond, M. E. 2005. Does School Accountability Lead to Improved Student Performance? Journal of Policy Analysis and Management 24(2), 297-327.
  • Ho, A. D. 2008. The Problem With “Proficiency”: Limitations of Statistics and Policy Under No Child Left Behind. Educational Researcher 37(6), 351-360.
  • Ho, A. D., and Reardon, S. F. 2012. Estimating achievement gaps from test scores reported in ordinal “proficiency” categories. Journal of Educational and Behavior Statistics 37(4), 489-518.
  • Jacob, B. 2005. Accountability, Incentives and Behavior: The Impact of High-Stakes Testing in the Chicago Public Schools. Journal of Public Economics 89(5-6), 761-796.
  • Jacob, B. A., and Levitt, S. D. 2003. Rotten Apples: An Investigation of the Prevalence And Predictors of Teacher Cheating. The Quarterly Journal of Economics 118(3), 843-877.
  • Kane, T., and Staiger, D. 2002. The Promise and Pitfalls of Using Imprecise School Accountability Measures. Journal of Economic Perspectives 16(4), 91-114.
  • Kane, T., and Staiger, D. 2002. Volatility in School Test Scores: Implications for Test-Based Accountability Systems. Brookings Papers on Education Policy 2002, 235-283.
  • Ladd, H. F., and Lauen, D. L. 2010. Status versus growth: The distribution effects of school accountability policies. Journal of Policy Analysis and Management 29(3), 426-450.
  • Ladd, H. F., and Walsh, R. P. 2002. Implementing value-added measures of school effectiveness: Getting the incentives right. Economics of Education Review 21(1), 1-17.
  • Ladd, H. F., and Zelli, A. 2002. School-based accountability in North Carolina: The responses of school principals. Educational Administration Quarterly 38(4), 494-529.
  • Linn, R. L. 2000. Assessments and accountability. Educational Researcher, 29(2), 4-16.
  • Linn, R. L. 2007. Validity of inferences from test-based educational accountability systems. Journal of Personnel Evaluation in Education 19, 5–15.
  • Linn, R. L., and Haug, C. 2002. Stability of School Building Accountability Scores and Gains. Educational Evaluation and Policy Analysis 24(1), 29-36.
  • McEachin, A., and Polikoff, M. S. 2012. We are the 5%. Which Schools Would Be Held Accountable Under a Proposed Revision of the Elementary and Secondary Education Act? Educational Researcher 41(7), 243-251.
  • Neal, D., and Schanzenbach, D. W. 2010. Left Behind by Design: Proficiency Counts and Test-Based Accountability. The Review of Economics and Statistics, 92(2), 263-283.
  • Polikoff, M. S., McEachin, A. J., Wrabel, S. L., and Duque, M. 2013. The Waive of the Future? School Accountability in the Waiver Era. Educational Researcher 43(1), 45-54.
  • Polikoff, M. S., and Wrabel, S. L. 2013. When is 100% not 100%? The Use of Safe Harbor to Make Adequate Yearly Progress. Education Finance and Policy 8(2), 251-270.
  • Porter, A. C., Linn, R. L., and Trimble, C. S. 2005. The effects of state decisions about NCLB adequate yearly progress targets. Educational Measurement: Issues and Practice, 24(4), 32–39.
  • Riddle, W. 2012. Major Accountability Themes of Second-Round State Applications for NCLB Waivers. Washington, D.C.: Center for Education Policy.
  • Rockoff, J. E., and Turner, L. E. 2008. Short run impacts of accountability on school quality. NBER Working Paper 14564. Cambridge, MA: National Bureau of Economic Research.
  • Rouse, C. E., Hannaway, J., Goldhaber, D., and Figlio, D. N. 2013. Feeling the Florida heat? How low-performing schools respond to voucher and accountability pressure. American Economic Journal: Economic Policy 5(2), 251-281.
  • Sims, D. P. 2013. Can failure succeed? Using racial subgroup rules to analyze the effect of school accountability failure on student performance. Economics of Education Review 32, 262–274.
  • Winters, M. A., and Cowen, J. M. 2012. Grading New York: Accountability and Student Proficiency in America’s Largest School District. Educational Evaluation and Policy Analysis 34(3), 313-327.
