Testing Data

  • Guessing About NAEP Results

    Written on February 15, 2012

    Every two years, the release of data from the National Assessment of Educational Progress (NAEP) generates a wave of research and commentary trying to explain short- and long-term trends. For instance, there have been a bunch of recent attempts to “explain” an increase in aggregate NAEP scores during the late 1990s and 2000s. Some analyses postulate that the accountability provisions of NCLB were responsible, while more recent arguments have focused on the “effect” (or lack thereof) of newer market-based reforms – for example, looking to NAEP data to “prove” or “disprove” the idea that changes in teacher personnel and other policies have (or have not) generated “gains” in student test scores.

    The basic idea here is that, for every increase or decrease in cross-sectional NAEP scores over a given period of time (both for all students and especially for subgroups such as minority and low-income students), there must be “something” in our education system that explains it. In many (but not all) cases, these discussions consist of little more than speculation. Discernible trends in NAEP test score data are almost certainly due to a combination of factors, and it’s unlikely that one policy or set of policies is dominant enough to be identified as “the one.” Now, there’s nothing necessarily wrong with speculation, so long as it is clearly identified as such, and conclusions are presented accordingly. But I find it curious that some people involved with these speculative arguments seem a bit too willing to assume that schooling factors – rather than changes in cohorts’ circumstances outside of school – are the primary driver of NAEP trends.

    So, let me try a little bit of illustrative speculation of my own: I might argue that changes in the economic conditions of American schoolchildren and their families are the most compelling explanation for changes in NAEP.

    READ MORE
  • Fundamental Flaws In The IFF Report On D.C. Schools

    Written on February 6, 2012

    A new report, commissioned by District of Columbia Mayor Vincent Gray and conducted by the Chicago-based consulting organization IFF, was supposed to provide guidance on how the District might act and invest strategically in school improvement, including optimizing the distribution of students across schools, many of which are either over- or under-enrolled.

    Needless to say, this is a monumental task. Not only does it entail identifying high- and low-performing schools, it also entails developing plans to improve them. Even the most rigorous efforts to achieve these goals, especially in a large city like D.C., would be to some degree speculative and error-prone.

    This is not a rigorous effort. IFF’s final report is polished and attractive, with lovely maps and color-coded tables presenting a lot of summary statistics. But there’s no emperor underneath those clothes. The report's data and analysis are so deeply flawed that its (rather non-specific) recommendations should not be taken seriously.

    READ MORE
  • The Perilous Conflation Of Student And School Performance

    Written on February 2, 2012

    Unlike many of my colleagues and friends, I personally support the use of standardized testing results in education policy, even, with caution and in a limited role, in high-stakes decisions. That said, I also think that the focus on test scores has gone way too far and that the scores are being used unwisely, in many cases to a degree at which I believe the policies will not only fail to generate improvement, but may even risk harm.

    In addition, of course, tests have a very productive low-stakes role to play on the ground – for example, when teachers and administrators use the results for diagnosis and to inform instruction.

    Frankly, I would be a lot more comfortable with the role of testing data – whether in policy, on the ground, or in our public discourse – but for the relentless flow of misinterpretation from both supporters and opponents. In my experience (which I acknowledge may not be representative of reality), by far the most common mistake is the conflation of student and school performance, as measured by testing results.

    Consider the following three stylized arguments, which you can hear in some form almost every week:

    READ MORE
  • A Dark Day For Educational Measurement In The Sunshine State

    Written on January 25, 2012

    Just this week, Florida announced its new district grading system. These systems have been popping up all over the nation, and given the fact that designing one is a requirement of states applying for No Child Left Behind waivers, we are sure to see more.

    I acknowledge that the designers of these schemes have the difficult job of balancing accessibility and accuracy. Moreover, the latter requirement – accuracy – cannot be directly tested, since we cannot know “true” school quality. As a result, to whatever degree it can be partially approximated using test scores, disagreements over what specific measures to include and how to include them are inevitable (see these brief analyses of Ohio and California).

    As I’ve discussed before, there are two general types of test-based measures that typically comprise these systems: absolute performance and growth. Each has its strengths and weaknesses. Florida’s attempt to balance these components is a near total failure, and it shows in the results.

    READ MORE
  • Performance And Chance In New York's Competitive District Grant Program

    Written on January 23, 2012

    New York State recently announced a new $75 million competitive grant program, which is part of its Race to the Top plan. In order to receive some of the money, districts must apply, and their applications receive a score between zero and 115. Almost a third of the points (35) are based on proposals for programs geared toward boosting student achievement, 10 points are based on need, and there are 20 possible points awarded for a description of how the proposal fits into districts’ budgets.

    The remaining 50 points – almost half of the total – are based on “academic performance” over the prior year. Four measures are used to produce the 0-50 point score: One is the year-to-year change (between 2010 and 2011) in the district’s graduation rate, and the other three are changes in the state “performance index” in math, English Language Arts (ELA) and science. The “performance index” in these three subjects is calculated using a simple weighting formula that accounts for the proportion of students scoring at levels 2 (basic), 3 (proficient) and 4 (advanced).
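
    To make the point allocation concrete, here is a minimal sketch that computes each component's share of the 115 possible points. The point values come from the description above; the dictionary keys and the function name are labels chosen for illustration, not terms from the state's application.

```python
# Point allocations as described in the post.
# The keys are illustrative labels, not official component names.
RUBRIC_POINTS = {
    "achievement_proposals": 35,
    "need": 10,
    "budget_fit": 20,
    "academic_performance": 50,
}


def rubric_shares(points):
    """Return each component's share of the total possible points."""
    total = sum(points.values())
    return {name: value / total for name, value in points.items()}


shares = rubric_shares(RUBRIC_POINTS)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

    Under these figures, the proposal component is about 30 percent of the total and the prior-year performance component is about 43 percent.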

    The idea of using testing results as a criterion in the awarding of grants is to reward those districts that are performing well. Unfortunately, due to the choice of measures and how they are used, the 50 points will be biased and to no small extent based on chance.
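
    One way to see how chance can enter a year-to-year change measure is a small simulation. This is only a sketch under stated assumptions, not the analysis from the full post: each district has a fixed "true" rate that does not change, each student's outcome is an independent draw, and the district sizes (50 and 5,000), the rate (0.6), and all names in the code are hypothetical.

```python
import random
import statistics


def simulated_rate_changes(n_students, true_rate, n_trials, rng):
    """Simulate year-to-year changes in an observed rate.

    The underlying rate is held constant, so any nonzero change
    is due to chance alone (an assumption of this sketch).
    """
    changes = []
    for _ in range(n_trials):
        year1 = sum(rng.random() < true_rate for _ in range(n_students)) / n_students
        year2 = sum(rng.random() < true_rate for _ in range(n_students)) / n_students
        changes.append(year2 - year1)
    return changes


rng = random.Random(0)  # fixed seed so the run is repeatable
small = simulated_rate_changes(50, 0.6, 200, rng)    # hypothetical small district
large = simulated_rate_changes(5000, 0.6, 200, rng)  # hypothetical large district

small_sd = statistics.stdev(small)
large_sd = statistics.stdev(large)
print(f"spread of changes, small district: {small_sd:.3f}")
print(f"spread of changes, large district: {large_sd:.3f}")
```

    Even though nothing "real" changes in either simulated district, the smaller one shows much larger swings, which is the sense in which points awarded for a one-year change can partly reflect chance.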

    READ MORE
  • Is California's "API Growth" A Good Measure Of School Performance?

    Written on January 4, 2012

    California calls its “Academic Performance Index” (API) the “cornerstone” of its accountability system. The API is calculated as a weighted average of the proportions of students meeting proficiency and other cutoffs on the state exams.

    It is a high-stakes measure. “Growth” in schools’ API scores determines whether they meet federal AYP requirements, and it is also important in the state’s own accountability regime. In addition, toward the middle of last month, the California Charter Schools Association called for the closing of ten charter schools based in part on their (three-year) API “growth” rates.

    Putting aside the question of whether the API is a valid measure of student performance in any given year, using year-to-year changes in API scores in high-stakes decisions is highly problematic. The API is a cross-sectional measure – it doesn’t follow students over time – and so one must assume that year-to-year changes in a school’s index do not reflect a shift in demographics or other characteristics of the cohorts of students taking the tests. Moreover, even if the changes in API scores do in fact reflect “real” progress, they do not account for all the factors outside of schools’ control that might affect performance, such as funding and differences in students’ backgrounds (see here and here, or this Mathematica paper, for more on these issues).

    Better data are needed to test these assumptions directly, but we might get some idea of whether changes in schools’ API are good measures of school performance by testing how stable they are over time.
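
    A simple stability check of this kind might look like the sketch below: compute each school's change between consecutive years, then correlate one period's changes with the next. The scores are hypothetical numbers invented for illustration, and the use of a Pearson correlation is an assumption about how one might measure stability, not a description of the analysis in the full post.

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical index scores for five schools over three years.
year1 = [700, 720, 750, 780, 800]
year2 = [715, 715, 760, 770, 815]
year3 = [705, 730, 755, 790, 805]

# "Growth" in each period is the simple year-to-year difference.
change_1_to_2 = [b - a for a, b in zip(year1, year2)]
change_2_to_3 = [c - b for b, c in zip(year2, year3)]

stability = pearson(change_1_to_2, change_2_to_3)
print(f"correlation between consecutive changes: {stability:.2f}")
```

    If changes were a stable trait of schools, this correlation would be positive; a value near zero or below it would suggest that a school's "growth" in one period says little about its growth in the next.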

    READ MORE
  • NAEP Shifting

    Written on October 31, 2011

    ** Also posted here on “Valerie Strauss’ Answer Sheet” in the Washington Post

    Tomorrow, the education world will get the results of the 2011 National Assessment of Educational Progress (NAEP), often referred to as the “nation’s report card.” The findings – reading and math scores among a representative sample of fourth and eighth graders – will drive at least part of the debate for the next two years, until the next round comes out.

    I’m going to make a prediction, one that is definitely a generalization, but is hardly uncommon in policy debates: People on all “sides” will interpret the results favorably no matter how they turn out.

    If NAEP scores are positive – i.e., overall scores rise by a statistically significant margin, and/or there are encouraging increases among key subgroups such as low performers or low-income students – supporters of market-based reform will say that their preferred policies are working. They’ll claim that the era of test-based accountability, which began with the enactment of No Child Left Behind ten years ago, has produced real results. Market reform skeptics, on the other hand, will say that virtually none of the policies for which reformers are pushing, such as test-based teacher evaluations and merit pay, were in force in more than a handful of locations between 2009 and 2011. Therefore, they’ll claim, the NAEP progress shows that the system is working without these changes.

    If the NAEP results are not encouraging – i.e., overall progress is flat (or negative), and there are no strong gains among key subgroups – the market-based crowd will use the occasion to argue that the “status quo” isn’t producing results, and they will strengthen their call for policies like new evaluations and merit pay. Skeptics, in contrast, will claim that NCLB and standardized test-based accountability were failures from the get-go. Some will even use the NAEP results to advocate for the wholesale elimination of standardized testing.

    READ MORE
  • The Education Reporter's Dilemma

    Written on September 26, 2011

    I’ve written so many posts about the misinterpretation of testing data in news stories that I’m starting to annoy myself. For example, I’ve shown that year-to-year changes in testing results might be attributable to the fact that, each year, a different set of students takes the test. I’ve discussed the fact that proficiency rates are not test scores – they only tell you the proportion of students above a given line – and that the rates and actual scores can move in opposite directions (see this simple illustration). And I’ve pleaded with journalists, most of whom I like and respect, to write with care about these issues (and, I should note, many of them do so).
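
    The point that proficiency rates and average scores can move in opposite directions is easy to check with a toy example. The sketch below uses hypothetical scores and a hypothetical cutoff of 300, and it counts a student as proficient when the score is at or above the cutoff (an assumption about how the line is drawn).

```python
def proficiency_rate(scores, cutoff):
    """Share of students scoring at or above the cutoff."""
    return sum(score >= cutoff for score in scores) / len(scores)


def mean_score(scores):
    """Simple average of the scores."""
    return sum(scores) / len(scores)


CUTOFF = 300  # hypothetical proficiency line

# Hypothetical scores for two different groups of tested students.
year1_scores = [250, 295, 305, 400]
year2_scores = [200, 301, 302, 303]

for label, scores in (("year 1", year1_scores), ("year 2", year2_scores)):
    rate = proficiency_rate(scores, CUTOFF)
    average = mean_score(scores)
    print(f"{label}: proficiency rate {rate:.0%}, average score {average:.1f}")
```

    In this example the proficiency rate rises from 50 to 75 percent while the average score falls from 312.5 to 276.5, because the rate only registers which side of the line each student is on.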

    Yet here I am, back on my soapbox again. This time the culprit is the recent release of SAT testing data, generating dozens of error-plagued stories from newspapers and organizations. Like virtually all public testing data, the SAT results are cross-sectional – each year, the test is taken by a different group of students. This means that demographic changes in the sample of test takers influence the results. This problem is even more acute in the case of the SAT, since it is voluntary. Despite the best efforts of the College Board (see their press release), a slew of stories improperly equated the decline in average SAT scores since the previous year with an overall decline in student performance – a confirmation of educational malaise (in fairness, there were many exceptions).
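
    To illustrate how a change in who takes a test can move the overall average, here is a minimal sketch with two hypothetical groups of test takers. All group labels, counts, and averages are invented for illustration and are not SAT figures.

```python
def overall_average(groups):
    """Weighted average score across groups of (n_students, mean_score)."""
    total_students = sum(n for n, _ in groups.values())
    total_points = sum(n * mean for n, mean in groups.values())
    return total_points / total_students


# Hypothetical (number of test takers, average score) per group.
year1 = {"group_a": (800, 550), "group_b": (200, 450)}
year2 = {"group_a": (800, 555), "group_b": (400, 455)}

year1_avg = overall_average(year1)
year2_avg = overall_average(year2)
print(f"year 1 overall average: {year1_avg:.1f}")
print(f"year 2 overall average: {year2_avg:.1f}")
```

    Both groups score five points higher in the second year, yet the overall average drops from 530.0 to about 521.7, solely because the lower-scoring group makes up a larger share of test takers.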

    I’ve come to think that there’s a fundamental problem here: When you interpret testing data properly, you don’t have much of a story.

    READ MORE
  • How Cross-Sectional Are Cross-Sectional Testing Data?

    Written on September 7, 2011

    In several posts, I’ve complained about how, in our public discourse, we misinterpret changes in proficiency rates (or actual test scores) as “gains” or “progress,” when they actually represent cohort changes – that is, they are performance snapshots for different groups of students who are potentially quite dissimilar.

    For example, the most common way testing results are presented in news coverage and press releases is as year-to-year comparisons across entire schools or districts – e.g., the overall proficiency rate across all grades in one year compared with the next. One reason why the two groups of students being compared (the first versus the second year) are different is obvious. In most districts, tests are only administered to students in grades 3-8. As a result, the eighth graders who take the test in Year 1 will not take it in Year 2, as they will have moved on to the ninth grade (unless they are retained). At the same time, a new cohort of third graders will take the test in Year 2 despite not having been tested in Year 1 (because they were in second grade). That’s a large amount of inherent “turnover” between years (this same situation applies when results are averaged for elementary and secondary grades). Variations in cohort performance can generate the illusion of "real" change in performance, positive or negative.
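
    The size of this built-in turnover can be sketched with a quick calculation. It assumes equally sized grade cohorts, no retention, and no students entering or leaving the district; the function name is hypothetical.

```python
def returning_share(tested_grades):
    """Share of Year 2 test takers who were also tested in Year 1.

    Assumes equal cohort sizes, no retention, and no mobility.
    Cohorts are labeled by the grade they were in during Year 1.
    """
    grades = list(tested_grades)
    year1_cohorts = set(grades)              # tested in Year 1
    year2_cohorts = {g - 1 for g in grades}  # tested in Year 2, by Year 1 grade
    overlap = year1_cohorts & year2_cohorts
    return len(overlap) / len(year2_cohorts)


share = returning_share(range(3, 9))  # grades 3 through 8
print(f"tested in both years: {share:.1%}")
print(f"new to testing in Year 2: {1 - share:.1%}")
```

    Under these simplifying assumptions, one in six Year 2 test takers (the new third graders) was not tested the year before, and one in six Year 1 test takers (the departing eighth graders) is gone.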

    But there’s another big cause of incomparability between years: Student mobility. Students move in and out of districts every year. In urban areas, mobility is particularly high. And, in many places, this mobility includes students who move to charter schools, which are often run as separate school districts.

    I think we all know intuitively about these issues, but I’m not sure many people realize just how different the group of tested students across an entire district can be in one year compared with the next. In order to give an idea of this magnitude, we might do a rough calculation for the District of Columbia Public Schools (DCPS).

    READ MORE
  • Charter And Regular Public School Performance In "Ohio 8" Districts, 2010-11

    Written on August 29, 2011

    Every year, the state of Ohio releases an enormous amount of district- and school-level performance data. Since Ohio has one of the largest charter school populations in the nation, the data provide an opportunity to examine performance differences between charters and regular public schools in the state.

    Ohio’s charters are concentrated largely in the urban “Ohio 8” districts (sometimes called the “Big 8”): Akron; Canton; Cincinnati; Cleveland; Columbus; Dayton; Toledo; and Youngstown. Charter coverage varies considerably between the “Ohio 8” districts, but it is, on average, about 20 percent, compared with roughly five percent across the whole state. I will therefore limit my quick analysis to these districts.

    Let’s start with the measure that gets the most attention in the state: Overall “report card grades.” Schools (and districts) can receive one of six possible ratings: Academic emergency; academic watch; continuous improvement; effective; excellent; and excellent with distinction.

    These ratings represent a weighted combination of four measures. Two of them measure performance “growth,” while the other two measure “absolute” performance levels. The growth measures are AYP (yes or no) and value-added (whether schools meet, exceed, or come in below the growth expectations set by the state’s value-added model). The first “absolute” performance measure is the state’s “performance index,” which is calculated based on the percentage of a school’s students who fall into the four NCLB categories of advanced, proficient, basic and below basic. The second is the number of “state standards” that schools meet as a percentage of the number of standards for which they are “eligible.” For example, the state requires 75 percent proficiency in all the grade/subject tests that a given school administers, and schools are “awarded” a “standard met” for each grade/subject in which three-quarters of their students score above the proficiency cutoff (state standards also include targets for attendance and a couple of other non-test outcomes).
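
    The “standards met” measure can be sketched as follows. This is a simplified illustration that covers only the test-based standards and ignores the attendance and other non-test targets; the proficiency rates, the test labels, and the function name are hypothetical, and treating exactly 75 percent as meeting the standard is an assumption.

```python
def share_of_standards_met(proficiency_by_test, threshold=0.75):
    """Fraction of eligible grade/subject standards a school meets.

    A standard counts as met when the proficiency rate is at or
    above the threshold (test-based standards only).
    """
    met = sum(rate >= threshold for rate in proficiency_by_test.values())
    return met / len(proficiency_by_test)


# Hypothetical proficiency rates for the tests one school administers.
school = {
    "grade3_reading": 0.80,
    "grade3_math": 0.70,
    "grade4_reading": 0.75,
    "grade4_math": 0.60,
}

print(f"standards met: {share_of_standards_met(school):.0%}")
```

    This hypothetical school meets two of its four eligible standards, or 50 percent. Note that a school at 74 percent on every test would meet none, which shows how sensitive the measure is to the placement of the line.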

    The graph below presents the raw breakdown in report card ratings for charter and regular public schools.

    READ MORE

DISCLAIMER

This web site and the information contained herein are provided as a service to those who are interested in the work of the Albert Shanker Institute (ASI). ASI makes no warranties, either express or implied, concerning the information contained on or linked from shankerblog.org. The visitor uses the information provided herein at his/her own risk. ASI, its officers, board members, agents, and employees specifically disclaim any and all liability from damages which may result from the utilization of the information provided herein. The content in the Shanker Blog may not necessarily reflect the views or official policy positions of ASI or any related entity or organization.