Accountability

  • The False Choice Of Growth Versus Proficiency

    Written on October 1, 2019

    Tennessee is considering changing its school accountability system such that schools have the choice of having their test-based performance judged by either status (how highly students score) or growth (how much progress students make over the course of the year). In other words, if schools do poorly on one measure, they are judged by the other (apparently, Texas already has a similar system in place).

    As we’ve discussed here many times in the past, status measures, such as proficiency rates, are poor measures of school performance, since some students, particularly those living in poverty, enter their schools far behind their more affluent peers. As a result, schools serving larger proportions of poor students will exhibit lower scores and proficiency rates, even if they are very effective in compelling progress from their students. That is why growth models, which focus on individual student gains over time, are a superior measure of school performance per se.
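    To make the distinction concrete, here is a minimal sketch with invented numbers. The cut score, the two schools, and every student score are hypothetical, and the simple average gain stands in for a real growth model, which would be considerably more sophisticated.

```python
from statistics import mean

# Hypothetical proficiency cut score; real cut scores vary by state, grade, and test.
PROFICIENCY_CUT = 300


def proficiency_rate(records):
    """Status measure: share of students whose current score meets the cut."""
    return mean(1.0 if current >= PROFICIENCY_CUT else 0.0 for _, current in records)


def mean_gain(records):
    """Crude growth stand-in: average of (current score - prior score)."""
    return mean(current - prior for prior, current in records)


# Invented (prior, current) scale scores for two hypothetical schools.
affluent_school = [(310, 315), (320, 322), (305, 309)]
high_poverty_school = [(240, 275), (250, 290), (260, 305)]

for name, records in [("affluent", affluent_school), ("high-poverty", high_poverty_school)]:
    print(f"{name}: proficiency={proficiency_rate(records):.2f}, "
          f"mean gain={mean_gain(records):.1f}")
```

    In this made-up case, the high-poverty school has the lower proficiency rate but much larger average gains, which is the pattern a status measure alone would miss.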

    This so-called “growth versus proficiency” debate has resurfaced several times over the years, and it was particularly prevalent during the time when states were submitting proposals for their accountability systems during reauthorization of the Elementary and Secondary Education Act. The policy that came out of these discussions was generally promising, as many states moved at least somewhat toward weighting growth model estimates more heavily. 

    At the same time, however, it is important to mention that the “growth versus proficiency” debate sometimes implies that states must choose between these two types of indicators. This is misleading. And the Tennessee proposal is a very interesting context for discussing this, since they are essentially using these two types of measures interchangeably. The reality, of course, is that both types of measures transmit valuable but different information, and both have a potentially useful role to play in accountability systems.

  • Tests Worth Teaching To

    Written on September 24, 2019

    Our guest authors today are Chester E. Finn, Jr. and Andrew E. Scanlan. Finn is a distinguished senior fellow and president emeritus at the Thomas B. Fordham Institute and a senior fellow at Stanford University’s Hoover Institution. Scanlan is a research and policy associate at the Thomas B. Fordham Institute.

    This year, some 165,000 American educators are teaching Advanced Placement (AP) classes—a veritable army, mobilized to serve some three million students as they embark on coursework leading to the AP program’s rigorous three-hour exams each May. As we explore in our new book, Learning in the Fast Lane: The Past, Present and Future of Advanced Placement, preparing these young people to succeed on the tests (scored from 1 to 5, with 3 or better deemed “qualifying”) is a major instructional objective for teachers as well as for the students (and their families) who recognize the program’s potential to significantly enhance their post-secondary prospects.

    For AP teachers, one might suppose that this objective would be vexing—yet another end-of-year exam that will constrain their curricular choices, stunt their classroom autonomy, and turn their pupils into cram-and-memorize machines rather than eager, deeper learners, creative thinkers, and inquisitive intellectuals.

    One might also suppose that the AP program, as it has infiltrated 70 percent of U.S. public (and half of private) high schools, would be vulnerable to the anti-testing resentments and revolts of recent years. These have been largely driven by government-imposed school accountability regimes that are mostly based on the scores kids get on state-mandated assessments, especially in math and English. That’s led many schools to press teachers to devote more hours to “test prep,” minimize time spent on other subjects, and neglect topics that aren’t included in state standards (and therefore won’t be tested). It’s not unreasonable, then, to expect resistance to AP as well.

  • The Offline Implications Of The Research About Online Charter Schools

    Written on February 27, 2019

    It’s rare to find an educational intervention with as unambiguous a research track record as online charter schools. Now, to be clear, it’s not a large body of research by any stretch, its conclusions may change in time, and the online charter sub-sector remains relatively small and concentrated in a few states. For now, though, the results seem incredibly bad (Zimmer et al. 2009; Woodworth et al. 2015). In virtually every state where these schools have been studied, across virtually all student subgroups, and in both reading and math, the estimated impact of online charter schools on student testing performance is negative and large in magnitude.

    Predictably, and not without justification, those who oppose charter schools in general are particularly vehement when it comes to online charter schools – they should, according to many of these folks, be closed down, even outlawed. Charter school supporters, on the other hand, tend to acknowledge the negative results (to their credit) but make less drastic suggestions, such as greater oversight, including selective closure, and stricter authorizing practices.

    Regardless of your opinion on what to do about online charter schools’ poor (test-based) results, they are truly an interesting phenomenon for a few reasons.

  • Why Teacher Evaluation Reform Is Not A Failure

    Written on August 23, 2018

    The RAND Corporation recently released an important report on the impact of the Gates Foundation’s “Intensive Partnerships for Effective Teaching” (IPET) initiative. IPET was a very thorough and well-funded attempt to improve teaching quality in schools in three districts and four charter management organizations (CMOs). The initiative was multi-faceted, but its centerpiece was the implementation of multi-measure teacher evaluation systems and the linking of ratings from those systems to professional development and high stakes personnel decisions, including compensation, tenure, and dismissal. This policy, particularly the inclusion in teacher evaluations of test-based productivity measures (e.g., value-added scores), has been among the most controversial issues in education policy throughout the past 10 years.

    The report is extremely rich and there are a lot of interesting findings in it, so I would encourage everyone to read it themselves (at least the executive summary), but the headline finding was that IPET had no discernible effect on student outcomes, namely test scores and graduation rates, in the districts that participated, vis-à-vis similar districts that did not. Given that IPET was so thoroughly designed and implemented, and that it was well-funded, it can potentially be viewed as a "best case scenario" test of the type of evaluation reform that most states have enacted. Accordingly, critics of these reforms, who typically focus their opposition on the high stakes use of evaluation measures, particularly value-added and other test-based measures, have portrayed the findings as vindication of their opposition.

    This reaction has merit. The most important reason why is that evaluation reform was portrayed by advocates as a means to immediate and drastic improvements in student outcomes. This promise was misguided from the outset, and evaluation reform opponents are (and were) correct in pointing this out. At the same time, however, it would be wise not to dismiss evaluation reform as a whole, for several reasons, a few of which are discussed below.

  • The Theory And Practice Of School Closures

    Written on September 13, 2017

    The idea of closing “low performing schools” has undeniable appeal, at least in theory. The basic notion is that some schools are so dysfunctional that they cannot be saved and may be doing irreparable harm to their students every day they are open. Thus, it is argued, closing such schools and sending their students elsewhere is the best option – even if students end up in “average” schools, proponents argue, they will be better off.

    Such closures are very controversial, however, and for good reason. For one thing, given adequate time and resources, schools may improve – i.e., there are less drastic interventions that might be equally (or more) effective as a way to help students. Moreover, closing a school represents a disruption in students’ lives (and often, by the way, to the larger community). In this sense, any closure must offer cumulative positive effects sufficient to offset an initial negative effect. Much depends on how and why schools are identified for closure, and the quality of the schools that displaced students attend. In practice, then, closure is a fairly risky policy, both educationally and (perhaps especially) politically. This disconnect between the appeal of theoretical school closures and the actual risks, in practice, may help explain why U.S. educational policy has been designed such that many schools operate at some risk of closure, but relatively few ever end up shutting their doors.

    Despite the always contentious debates about the risks and merits of closing “low performing schools,” there has not been a tremendous amount of strong evidence about effects (in part because such closures have been somewhat rare). A new report by the Center for Research on Education Outcomes (CREDO) helps fill the gap, using a very large dataset to examine the test-based impact of school closures (among other things). The results speak directly to the closure debate, in both specific and general terms, but interpreting them is complicated by the fact that this analysis evaluates what is at best a policy done poorly.

  • Where Do Achievement Gaps Come From?

    Written on August 10, 2017

    For almost two decades now, educational accountability policy in the U.S. has included a focus on the performance of student subgroups, such as those defined by race and ethnicity, income, or special education status. The (very sensible) logic behind this focus is the simple fact that aggregate performance measures, whether at the state, district, or school level, often mask large gaps between subgroups.

    Yet one of the unintended consequences of this subgroup focus has been confusion among both policymakers and the public as to how to interpret and use subgroup indicators in formal school accountability systems, particularly when those indicators are expressed as simple “achievement gaps” or “gap closing” measures. This is not only because achievement gaps can narrow for undesirable reasons and widen for desirable reasons, but also because many gaps exist prior to entry into the school (or district). If, for instance, a large Hispanic/White achievement gap for a given cohort exists at the start of kindergarten, it is misleading and potentially damaging to hold a school accountable for the persistence of that gap in later grades – particularly in cases where public policy has failed to provide the extra resources and supports that might help lower-performing students make accelerated achievement gains every year. In addition, the coarseness of current educational variables, particularly those usually used as income proxies, limits the detail and utility of some subgroup measures.

    A helpful and timely little analysis by David Figlio and Krzysztof Karbownik, published by the Brookings Institution, addresses some of these issues, and the findings have clear policy implications.

  • For Florida's School Grading System, A Smart Change With Unexpected Effects

    Written on July 13, 2017

    Last year, we discussed a small but potentially meaningful change that Florida made to its school grading system, one that might have attenuated a long-standing tendency of its student “gains” measures, by design, to favor schools that serve more advantaged students. Unfortunately, this result doesn’t seem to have been achieved.

    Prior to 2014-15, one of the criteria by which Florida students could be counted as having “made gains” was scoring as proficient or better in two consecutive years, without having dropped a level (e.g., from advanced to proficient). Put simply, this meant that students scoring above the proficiency threshold would be counted as making “gains,” even if they in fact made only average or even below average progress, so long as they stayed above the line. As a result of this somewhat crude “growth” measure, schools serving large proportions of students scoring above the proficiency line (i.e., schools in affluent neighborhoods) were virtually guaranteed to receive strong “gains” scores. Such “double counting” in the “gains” measures likely contributed to a very strong relationship between schools’ grades and their students’ socio-economic status (as gauged, albeit roughly, by subsidized lunch eligibility rates).

    Florida, to its credit, changed this “double counting” rule effective in 2014-15. Students who score as proficient in two consecutive years are no longer automatically counted as making “gains.” They must also exhibit some score growth in order to receive the designation.
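    The rule change described above can be sketched roughly as follows. This is an illustration only: the level names, the record fields, and the reading of "some score growth" as any increase in scale score are assumptions, and the sketch covers just this one criterion, not the other ways a Florida student could be counted as having made gains.

```python
from dataclasses import dataclass

# Hypothetical achievement levels, ordered low to high (names are illustrative).
LEVELS = ["below_basic", "basic", "proficient", "advanced"]
PROFICIENT = LEVELS.index("proficient")


@dataclass
class StudentRecord:
    prior_level: str
    current_level: str
    prior_score: float
    current_score: float


def made_gains_old_rule(s: StudentRecord) -> bool:
    """Pre-2014-15 criterion: proficient or better in both years, with no drop in level."""
    prior = LEVELS.index(s.prior_level)
    current = LEVELS.index(s.current_level)
    return prior >= PROFICIENT and current >= PROFICIENT and current >= prior


def made_gains_new_rule(s: StudentRecord) -> bool:
    """Revised criterion: as above, plus score growth (assumed here: any increase)."""
    return made_gains_old_rule(s) and s.current_score > s.prior_score


# A student who stays proficient while the scale score slips slightly.
steady = StudentRecord("proficient", "proficient", prior_score=320, current_score=318)
print(made_gains_old_rule(steady), made_gains_new_rule(steady))
```

    Under these assumptions, a student who stays above the line without improving counts toward "gains" under the old rule but not under the revised one.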

  • Improving Accountability Measurement Under ESSA

    Written on May 25, 2017

    Despite the recent repeal of federal guidelines for states’ compliance with the Every Student Succeeds Act (ESSA), states are steadily submitting their proposals, and they are rightfully receiving some attention. The policies in these proposals will have far-reaching consequences for the future of school accountability (among many other types of policies), as well as, of course, for educators and students in U.S. public schools.

    There are plenty of positive signs in these proposals, which are indicative of progress in the role of proper measurement in school accountability policy. It is important to recognize this progress, but impossible not to see that ESSA perpetuates long-standing measurement problems that were institutionalized under No Child Left Behind (NCLB). These issues, particularly the ongoing failure to distinguish between student and school performance, continue to dominate accountability policy to this day. Part of the confusion stems from the fact that school and student performance are not independent of each other. For example, a test score, by itself, gauges student performance, but it also reflects, at least in part, school effectiveness (i.e., the score might have been higher or lower had the student attended a different school).

    Both student and school performance measures have an important role to play in accountability, but distinguishing between them is crucial. States’ ESSA proposals make the distinction in some respects but not in others. The result may end up being accountability systems that, while better than those under NCLB, are still severely hampered by improper inference and misaligned incentives. Let’s take a look at some of the key areas where we find these issues manifested.

  • Subgroup-Specific Accountability, Teacher Job Assignments, And Teacher Attrition: Lessons For States

    Written on April 5, 2017

    Our guest author today is Matthew Shirrell, assistant professor of educational leadership and administration in the Graduate School of Education and Human Development at the George Washington University.

    Racial/ethnic gaps in student achievement persist, despite a wide variety of interventions designed to address them (see Reardon, Robinson-Cimpian, & Weathers, 2015). The No Child Left Behind Act of 2001 (NCLB) took a novel approach to closing these achievement gaps, requiring that schools make yearly improvements not only in overall student achievement, but also in the achievement of students of various subgroups, including racial/ethnic minority subgroups and students from economically disadvantaged families.

    Evidence is mixed on whether NCLB’s “subgroup-specific accountability” accomplished its goal of narrowing racial/ethnic and other achievement gaps. Research on the impacts of the policy, however, has largely neglected the effects of this policy on teachers. Understanding any effects on teachers is important to gaining a more complete picture of the policy’s overall impact; if the policy increased student achievement but resulted in the turnover or attrition of large numbers of teachers, for example, these benefits and costs should be weighed together when assessing the policy’s overall effects.

    In a study just published online in Education Finance and Policy (and supported by funding from the Albert Shanker Institute), I explore the effects of NCLB’s subgroup-specific accountability on teachers. Specifically, I examine whether teaching in a school that was held accountable for a particular subgroup’s performance in the first year of NCLB affected teachers’ job assignments, turnover, and attrition.

  • Do Subgroup Accountability Measures Affect School Ratings Systems?

    Written on October 28, 2016

    The school accountability provisions of No Child Left Behind (NCLB) institutionalized a focus on the (test-based) performance of student subgroups, such as English language learners, racial and ethnic groups, and students eligible for free- and reduced-price lunch (FRL). The idea was to shine a spotlight on achievement gaps in the U.S., and to hold schools accountable for serving all students.

    This was a laudable goal, and disaggregating data by student subgroups is a wise policy, as there is much to learn from such comparisons. Unfortunately, however, NCLB also institutionalized the poor measurement of school performance, and so-called subgroup accountability was not immune. The problem, which we’ve discussed here many times, is that test-based accountability systems in the U.S. tend to interpret how highly students score as a measure of school performance, when it is largely a function of factors out of schools' control, such as student background. In other words, schools (or subgroups of their students) may exhibit higher average scores or proficiency rates simply because their students entered the schools at higher levels, regardless of how effective the school may be in raising scores. Although NCLB’s successor, the Every Student Succeeds Act (ESSA), perpetuates many of these misinterpretations, it still represents some limited progress, as it encourages greater reliance on growth-based measures, which look at how quickly students progress while they attend a school, rather than how highly they score in any given year.

    Yet this evolution, slow though it may be, presents a somewhat unique challenge for the inclusion of subgroup-based measures in formal school accountability systems. That is, if we stipulate that growth model estimates are the best available test-based way to measure school (rather than student) performance, how should accountability systems apply these models to traditionally lower scoring student subgroups?


DISCLAIMER

This web site and the information contained herein are provided as a service to those who are interested in the work of the Albert Shanker Institute (ASI). ASI makes no warranties, either express or implied, concerning the information contained on or linked from shankerblog.org. The visitor uses the information provided herein at his/her own risk. ASI, its officers, board members, agents, and employees specifically disclaim any and all liability from damages which may result from the utilization of the information provided herein. The content in the Shanker Blog may not necessarily reflect the views or official policy positions of ASI or any related entity or organization.