Resources on Testing and School Accountability
School accountability systems in the U.S. rely heavily on standardized testing data. The No Child Left Behind (NCLB) law, enacted in 2001, required all states to test students in math and reading every year in grades 3-8 and in one high school grade, and to hold schools accountable for the results. In addition, since then (and, in many cases, before), numerous states and districts have implemented their own school accountability systems and measures, virtually all of which rely mostly or entirely on testing outcomes.
Most of the debate over the role of standardized testing in U.S. education understandably focuses on whether these assessments should be used in accountability systems. There is relatively little informed discussion of how they should be used. Testing data by themselves cannot be valid or invalid; validity is a characteristic of how the data are interpreted.
We have published a great deal of research-based content about the proper interpretation of testing data in school accountability systems. This page provides links to this content, as well as external resources.
- School rating systems
- Further reading (external papers and reports)
Key distinctions about test scores
The following three distinctions are extremely important for understanding the role of standardized tests in school accountability systems.
School versus student performance
Tests provide information, albeit imperfect and incomplete, about student knowledge of a given block of content at a given time. One can use these data on student performance to get an approximate idea of schools’ contribution to that performance, but doing so requires sophisticated methods and careful interpretation. The vast majority of school accountability policies in the U.S., including NCLB, seriously conflate school and student performance.
In general, absolute performance, or “status” measures, which indicate how highly students score on tests (e.g., proficiency rates), are appropriate for gauging student performance. Schools’ actual contribution to testing progress, on the other hand, must be gauged using growth – that is, how much progress students make while attending a given school. Both types of measures have a potentially useful role to play in accountability systems, but it is critical to understand the distinction between them, and to make sure that it is reflected in how the data are presented and used in decision making.
- The Perilous Conflation Of Student And School Performance
- There’s No One Correct Way To Rate Schools
- The Great Proficiency Debate
- The Case Against Assigning Single Ratings To Schools
- Which State Has The Best Schools?
- ESEA Waivers And The Perpetuation Of Poor Educational Measurement
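The status versus growth distinction can be made concrete with a small sketch. The two schools, their scores, and the function names `status` and `growth` below are all invented for illustration, and the simple average gain used here is only a stand-in for the more sophisticated growth models that real accountability systems would need.

```python
from statistics import mean

# Hypothetical (prior-year score, current-year score) pairs for the SAME
# students in two invented schools. None of these numbers are real data.
school_a = [(340, 342), (350, 351), (360, 363)]  # students arrive scoring high
school_b = [(270, 290), (280, 298), (290, 312)]  # students arrive scoring low

def status(pairs):
    """Absolute performance: average current-year score."""
    return mean(current for _, current in pairs)

def growth(pairs):
    """Simplified growth: average gain for students tested in both years."""
    return mean(current - prior for prior, current in pairs)

for name, pairs in [("School A", school_a), ("School B", school_b)]:
    print(name, "status:", status(pairs), "growth:", growth(pairs))
```

In this made-up case School A has the higher status measure, while School B's students gain far more during the year. A rating based only on status would rank the schools in the opposite order from one based only on growth.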
Growth versus cohort changes
In our public discourse about education, changes in proficiency rates or average scores, whether using state assessments or the National Assessment of Educational Progress (NAEP), are often called “growth” or “progress.” They are not. In reality, they compare the performance of two different groups (i.e., cohorts) of students.
The rotation of cohorts in and out of the tested sample means that schoolwide rates or scores can remain flat between years even when students exhibit strong growth (or declines). In addition, changes between cohorts in students’ characteristics, which are often unmeasurable using standard educational variables such as subsidized lunch eligibility, can have a substantial impact on the magnitude of changes in scores or rates between years. Measuring “growth” or “progress” properly requires following the same group of students over time.
- When Growth Isn’t Really Growth
- If Your Evidence Is Changes In Proficiency Rates, You Probably Don’t Have Much Evidence
- Actual Growth Measures Make A Big Difference When Measuring Growth
- How Cross-Sectional Are Cross-Sectional Testing Data?
- The Ever-Changing NAEP Sample
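A minimal sketch of how cohort rotation can hide growth, assuming a hypothetical school that tests grades 3-5 with two students per grade. The student IDs, scores, and function names are invented for illustration.

```python
from statistics import mean

# Hypothetical scale scores keyed by student ID; all values are invented.
year1 = {
    "s1": 300, "s2": 310,  # grade 3
    "s3": 320, "s4": 330,  # grade 4
    "s5": 340, "s6": 350,  # grade 5 (leaves the tested sample after year 1)
}
year2 = {
    "s7": 300, "s8": 310,  # new grade 3 cohort entering the tested sample
    "s1": 320, "s2": 330,  # last year's grade 3, now in grade 4
    "s3": 340, "s4": 350,  # last year's grade 4, now in grade 5
}

def cohort_change(before, after):
    """Change in the schoolwide average, ignoring which students were tested."""
    return mean(after.values()) - mean(before.values())

def matched_growth(before, after):
    """Average gain among only those students tested in both years."""
    matched = before.keys() & after.keys()
    return mean(after[sid] - before[sid] for sid in matched)

print("Change in schoolwide average:", cohort_change(year1, year2))
print("Average gain, matched students:", matched_growth(year1, year2))
```

With these invented numbers the schoolwide average does not move at all between years, even though every student who was tested in both years gained 20 points. The flat schoolwide result reflects who entered and left the tested sample, not how much the students learned.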
Test scores versus proficiency rates
In NCLB-style accountability systems, test scores are often sorted into performance categories such as below basic, basic, proficient and advanced. Most commonly, results for a given school or district are summarized in terms of the proportion of students who score above the threshold for “proficient” (i.e., proficiency rates). It is rather common for these rates to be portrayed as “test scores,” when they are in reality one big step removed from the scores upon which they’re based.
Presenting rates instead of scores can be useful because it provides a standard for which to aim, and also because most people cannot interpret raw test scores. This conversion, however, also entails a great deal of data loss and potential distortion, and it is important to bear that in mind when interpreting the rates, whether in any given year or over time.
- It’s Test Score Season, But Some States Don’t Release Test Scores
- How Often Do Proficiency Rates And Average Scores Move In Different Directions?
- A New Idea For Test-Based Accountability In D.C.: Actual Test Scores
- Proficiency Rates And Achievement Gaps
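The data loss involved in converting scores to rates can be illustrated with a short sketch. The cut score of 300, the four scores per year, and the function name `proficiency_rate` are all hypothetical, and the sketch counts a score at or above the cut as proficient, which is an assumption rather than any state's actual rule.

```python
from statistics import mean

PROFICIENT_CUTOFF = 300  # hypothetical cut score, not any state's real threshold

def proficiency_rate(scores, cutoff=PROFICIENT_CUTOFF):
    """Share of scores at or above the cut score (assumed definition)."""
    return sum(score >= cutoff for score in scores) / len(scores)

# Invented scores for the same four students in two consecutive years.
year1 = [295, 295, 305, 350]
year2 = [301, 301, 301, 302]

print("Year 1 rate:", proficiency_rate(year1), "average:", mean(year1))
print("Year 2 rate:", proficiency_rate(year2), "average:", mean(year2))
```

In this invented example the proficiency rate doubles from one year to the next while the average score falls by 10 points. The rate registers only the two students who crossed the cut score by a few points, and it cannot register the large drop of the student who started well above it.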
School rating systems
Both before and since the passage of NCLB, but particularly in recent years, numerous states and districts have implemented their own school rating systems. All of these systems rely predominantly on standardized testing data, but they vary considerably in their design. We’ve published analyses of several of these systems, with an emphasis on how the ratings and their components can be interpreted usefully.
Original analyses of state and district rating systems
- California (API)
- District of Columbia charter schools
- District of Columbia Public Schools
- Louisiana (SPS)
- New York City
We have also published numerous discussions of key concepts and components of these systems and test-based accountability in general.
- The Persistent Misidentification Of Low Performing Schools
- The Debate And Evidence On The Impact Of NCLB
- Rethinking The Use Of Simple Achievement Gap Measures In School Accountability Systems
- Interpreting Achievement Gaps In New Jersey And Beyond
- Performance Measurement In Healthcare And Education
- Sample Size And Volatility In School Accountability Systems
- Under The Hood Of School Rating Systems
- Why Did Florida Schools’ Grades Improve Dramatically Between 1999 and 2005?
- Redesigning Florida's School Report Cards
- A Few Quick Fixes For School Accountability Systems (by Morgan Polikoff and Andrew McEachin)
- Does It Matter How We Measure Schools’ Test-Based Performance?
- Five Recommendations For Reporting On (Or Just Interpreting) State Test Scores
Further reading (external papers and reports)
- Booher-Jennings, J. 2005. Below the bubble: “Educational triage” and the Texas accountability system. American Educational Research Journal, 42(2), 231–268.
- Carnoy, M., and Loeb, S. 2002. Does external accountability affect student outcomes? A cross state analysis. Educational Evaluation and Policy Analysis, 24(4), 305–331.
- Chiang, H. 2009. How accountability pressure on failing schools affects student achievement. Journal of Public Economics 93(9-10), 1045-1057.
- Chingos, M. M. 2012. Strength in numbers: State spending on K-12 assessment systems. Washington, DC: Brown Center on Education Policy, Brookings Institution.
- Davidson, E., Reback, R., Rockoff, J. E., and Schwartz, H. L. 2013. Fifty ways to leave a child behind: Idiosyncrasies and discrepancies in states’ implementation of NCLB (NBER Working Paper No. 18988). Cambridge, MA: National Bureau of Economic Research.
- Dee, T. S., and Jacob, B. 2011. The Impact of No Child Left Behind on Student Achievement. Journal of Policy Analysis and Management 30(3), 418-446.
- Dee, T. S., Jacob, B., and Schwartz, N. L. 2013. The Effects of NCLB on School Resources and Practices. Educational Evaluation and Policy Analysis 35(2), 252-279.
- Ehlert, M., Koedel, C., Parsons, E., and Podgursky, M. 2012. Selecting Growth Measures for School and Teacher Evaluations. CALDER Working Paper 80. Washington, D.C.: National Center for Analysis of Longitudinal Data in Education Research.
- Figlio, D. N. and H. F. Ladd. 2008. School accountability and student achievement. In: Helen F. Ladd and Edward B. Fiske (eds.) Handbook of Research in Education Finance and Policy. Routledge Press.
- Figlio, D., and Rouse, C. 2006. Do Accountability and Voucher Threats Improve Low Performing Schools? Journal of Public Economics 90, 239-255.
- Glazerman, S. M., and Potamites, L. 2011. False Performance Gains: A Critique of Successive Cohort Indicators. Washington, D.C.: Mathematica Policy Research.
- Goldhaber, D., Gabele, B. and Walch, J. 2012. Does the Model Matter? Exploring the Relationship Between Different Achievement-based Teacher Assessments. CEDR Working Paper 2012-6. Seattle, WA: University of Washington.
- Harris, D. N. 2011. Value-Added Measures in Education: What Every Educator Needs to Know. Cambridge, MA: Harvard Education Press.
- Hanushek, E. A., and Raymond, M. E. 2005. Does School Accountability Lead to Improved Student Performance? Journal of Policy Analysis and Management 24(2), 297-327.
- Ho, A. D. 2008. The Problem With “Proficiency”: Limitations of Statistics and Policy Under No Child Left Behind. Educational Researcher 37(6), 351-360.
- Ho, A. D., and Reardon, S. F. 2012. Estimating achievement gaps from test scores reported in ordinal “proficiency” categories. Journal of Educational and Behavioral Statistics 37(4), 489-518.
- Jacob, B. 2005. Accountability, Incentives and Behavior: The Impact of High-Stakes Testing in the Chicago Public Schools. Journal of Public Economics 89(5-6), 761-796.
- Jacob, B. A., and Levitt, S. D. 2003. Rotten Apples: An Investigation of the Prevalence And Predictors of Teacher Cheating. The Quarterly Journal of Economics 118(3), 843-877.
- Kane, T., and Staiger, D. 2002. The Promise and Pitfalls of Using Imprecise School Accountability Measures. Journal of Economic Perspectives 16(4), 91–114.
- Kane, T., and Staiger, D. 2002. Volatility in School Test Scores: Implications for Test-Based Accountability Systems. Brookings Papers on Education Policy 2002, 235-283.
- Ladd, H. F., and Lauen, D. L. 2010. Status versus growth: The distribution effects of school accountability policies. Journal of Policy Analysis and Management 29(3), 426-450.
- Ladd, H. F., and Walsh, R. P. 2002. Implementing value-added measures of school effectiveness: Getting the incentives right. Economics of Education Review 21(1), 1-17.
- Ladd, H., and Zelli, A. 2002. School-based accountability in North Carolina: The responses of school principals. Educational Administration Quarterly 38(4), 494-529.
- Linn, R. L. 2000. Assessments and accountability. Educational Researcher, 29(2), 4–16.
- Linn, R. L. 2007. Validity of inferences from test-based educational accountability systems. Journal of Personnel Evaluation in Education 19, 5–15.
- Linn, R. L., and Haug, C. 2002. Stability of School Building Accountability Scores and Gains. Educational Evaluation and Policy Analysis 24(1), 29-36.
- McEachin, A., and Polikoff, M. S. 2012. We are the 5%: Which Schools Would Be Held Accountable Under a Proposed Revision of the Elementary and Secondary Education Act? Educational Researcher 41(7), 243-251.
- Neal, D., and Schanzenbach, D. W. 2010. Left Behind by Design: Proficiency Counts and Test-Based Accountability. The Review of Economics and Statistics, 92(2), 263-283.
- Polikoff, M. S., McEachin, A. J., Wrabel, S. L., and Duque, M. 2013. The Waive of the Future? School Accountability in the Waiver Era. Educational Researcher 43(1), 45-54.
- Polikoff, M. S., and Wrabel, S. L. 2013. When is 100% not 100%? The Use of Safe Harbor to Make Adequate Yearly Progress. Education Finance and Policy 8(2), 251-270.
- Porter, A. C., Linn, R. L., and Trimble, C. S. 2005. The effects of state decisions about NCLB adequate yearly progress targets. Educational Measurement: Issues and Practice, 24(4), 32–39.
- Riddle, W. 2012. Major Accountability Themes of Second-Round State Applications for NCLB Waivers. Washington, D.C.: Center on Education Policy.
- Rockoff, J. E., and Turner, L. E. 2008. Short run impacts of accountability on school quality (NBER Working Paper No. 14564). Cambridge, MA: National Bureau of Economic Research.
- Rouse, C. E., Hannaway, J., Goldhaber, D., & Figlio, D. N. 2007. Feeling the Florida heat? How low-performing schools respond to voucher and accountability pressure. American Economic Journal: Economic Policy 5(2), 251-281.
- Sims, D. P. 2013. Can failure succeed? Using racial subgroup rules to analyze the effect of school accountability failure on student performance. Economics of Education Review 32, 262–274.
- Winters, M. A., and Cowen, J. M. 2012. Grading New York: Accountability and Student Proficiency in America’s Largest School District. Educational Evaluation and Policy Analysis 34(3), 313-327.