Accountability

  • Improving Accountability Measurement Under ESSA

    Written on May 25, 2017

    Despite the recent repeal of the federal guidelines for states’ compliance with the Every Student Succeeds Act (ESSA), states are steadily submitting their proposals, and those proposals are rightfully receiving some attention. The policies they contain will have far-reaching consequences for the future of school accountability (among many other policy areas), as well as, of course, for educators and students in U.S. public schools.

    There are plenty of positive signs in these proposals, which indicate progress in how measurement is used in school accountability policy. It is important to recognize this progress, but impossible not to see that ESSA perpetuates long-standing measurement problems that were institutionalized under No Child Left Behind (NCLB). These issues, particularly the ongoing failure to distinguish between student and school performance, continue to dominate accountability policy. Part of the confusion stems from the fact that school and student performance are not independent of each other. For example, a test score, by itself, gauges student performance, but it also reflects, at least in part, school effectiveness (i.e., the score might have been higher or lower had the student attended a different school).

    Both student and school performance measures have an important role to play in accountability, but distinguishing between them is crucial. States’ ESSA proposals make the distinction in some respects but not in others. The result may end up being accountability systems that, while better than those under NCLB, are still severely hampered by improper inference and misaligned incentives. Let’s take a look at some of the key areas where we find these issues manifested.

  • Subgroup-Specific Accountability, Teacher Job Assignments, And Teacher Attrition: Lessons For States

    Written on April 5, 2017

    Our guest author today is Matthew Shirrell, assistant professor of educational leadership and administration in the Graduate School of Education and Human Development at the George Washington University.

    Racial/ethnic gaps in student achievement persist, despite a wide variety of interventions designed to address them (see Reardon, Robinson-Cimpian, & Weathers, 2015). The No Child Left Behind Act of 2001 (NCLB) took a novel approach to closing these achievement gaps, requiring that schools make yearly improvements not only in overall student achievement, but also in the achievement of students in various subgroups, including racial/ethnic minority subgroups and students from economically disadvantaged families.

    Evidence is mixed on whether NCLB’s “subgroup-specific accountability” accomplished its goal of narrowing racial/ethnic and other achievement gaps. Research on the impacts of the policy, however, has largely neglected the effects of this policy on teachers. Understanding any effects on teachers is important to gaining a more complete picture of the policy’s overall impact; if the policy increased student achievement but resulted in the turnover or attrition of large numbers of teachers, for example, these benefits and costs should be weighed together when assessing the policy’s overall effects.

    In a study just published online in Education Finance and Policy (and supported by funding from the Albert Shanker Institute), I explore the effects of NCLB’s subgroup-specific accountability on teachers. Specifically, I examine whether teaching in a school that was held accountable for a particular subgroup’s performance in the first year of NCLB affected teachers’ job assignments, turnover, and attrition.

  • Do Subgroup Accountability Measures Affect School Ratings Systems?

    Written on October 28, 2016

    The school accountability provisions of No Child Left Behind (NCLB) institutionalized a focus on the (test-based) performance of student subgroups, such as English language learners, racial and ethnic groups, and students eligible for free- and reduced-price lunch (FRL). The idea was to shine a spotlight on achievement gaps in the U.S., and to hold schools accountable for serving all students.

    This was a laudable goal, and disaggregating data by student subgroups is a wise policy, as there is much to learn from such comparisons. Unfortunately, however, NCLB also institutionalized the poor measurement of school performance, and so-called subgroup accountability was not immune. The problem, which we’ve discussed here many times, is that test-based accountability systems in the U.S. tend to interpret how highly students score as a measure of school performance, when it is largely a function of factors out of schools' control, such as student background. In other words, schools (or subgroups of those students) may exhibit higher average scores or proficiency rates simply because their students entered the schools at higher levels, regardless of how effective the school may be in raising scores. Although NCLB’s successor, the Every Student Succeeds Act (ESSA), perpetuates many of these misinterpretations, it still represents some limited progress, as it encourages greater reliance on growth-based measures, which look at how quickly students progress while they attend a school, rather than how highly they score in any given year (see here for more on this).

    Yet this evolution, slow though it may be, presents a somewhat unique challenge for the inclusion of subgroup-based measures in formal school accountability systems. That is, if we stipulate that growth model estimates are the best available test-based way to measure school (rather than student) performance, how should accountability systems apply these models to traditionally lower scoring student subgroups?

  • A Few Reactions To The Final Teacher Preparation Accountability Regulations

    Written on October 19, 2016

    The U.S. Department of Education (USED) has just released the long-anticipated final regulations for teacher preparation (TP) program accountability. These regulations will guide states, which are required to design their own systems for assessing TP program performance, with full implementation in 2018-19. The earliest year in which stakes (namely, eligibility for federal grants) will be attached to the ratings is 2021-22.

    Among the provisions receiving attention is the softening of the requirement regarding the use of test-based productivity measures, such as value-added and other growth models (see Goldhaber et al. 2013; Mihaly et al. 2013; Koedel et al. 2015). Specifically, the final regulations allow greater “flexibility” in how and how much these indicators must count toward final ratings. For the reasons that Cory Koedel and I laid out in this piece (and I will not reiterate here), this is a wise decision. Although it is possible that value-added estimates will eventually play a significant role in these TP program accountability systems, the USED timeline provides insufficient time for the requisite empirical groundwork.

    Yet this does not resolve the issues facing those who must design these systems, since putting partial brakes on value-added for TP programs also puts increased focus on the other measures which might be used to gauge program performance. And, as is often the case with formal accountability systems, the non-test-based bench is not particularly deep.

  • The Details Matter In Teacher Evaluations

    Written on September 22, 2016

    Throughout the process of reforming teacher evaluation systems over the past 5-10 years, perhaps the most contentious and widely discussed issue was the importance, or weight, assigned to different components. Specifically, there was a great deal of debate about the proper weight to assign to test-based teacher productivity measures, such as estimates from value-added and other growth models.

    Some commentators, particularly those more enthusiastic about test-based accountability, argued that the new teacher evaluations somehow were not meaningful unless value-added or growth model estimates constituted a substantial proportion of teachers’ final evaluation ratings. Skeptics of test-based accountability, on the other hand, tended toward a rather different viewpoint – that test-based teacher performance measures should play little or no role in the new evaluation systems. Moreover, virtually all of the discussion of these systems’ results, once they were finally implemented, focused on the distribution of final ratings, particularly the proportions of teachers rated “ineffective.”

    A recent working paper by Matthew Steinberg and Matthew Kraft directly addresses and informs this debate. Their very straightforward analysis shows just how consequential these weighting decisions, along with choices of where to set the cutpoints between final rating categories (e.g., how many points a teacher needs to receive an “effective” rather than an “ineffective” rating), are for the distribution of final ratings.
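
    To see the basic mechanics, consider a minimal simulation. The score distributions, weights, and cutpoints below are invented for illustration; they are not Steinberg and Kraft’s data or any state’s actual rules:

    ```python
    # Hypothetical sketch: how component weights and rating cutpoints shape
    # the share of teachers rated "ineffective." Observation scores tend to
    # cluster high with little spread; growth-model estimates are noisier.
    import random

    random.seed(1)
    teachers = [
        (random.gauss(80, 6), random.gauss(60, 15))  # (observation, growth) on a 0-100 scale
        for _ in range(10_000)
    ]

    def is_ineffective(obs, growth, w_growth, cutpoint):
        composite = (1 - w_growth) * obs + w_growth * growth
        return composite < cutpoint

    for w_growth in (0.2, 0.5):
        for cutpoint in (60, 70):
            share = sum(
                is_ineffective(obs, gr, w_growth, cutpoint) for obs, gr in teachers
            ) / len(teachers)
            print(f"growth weight {w_growth:.0%}, cutpoint {cutpoint}: "
                  f"{share:.1%} rated ineffective")

    # With these invented distributions, the "ineffective" share ranges from
    # well under 1 percent (20% growth weight, cutpoint of 60) to roughly half
    # of all teachers (50% weight, cutpoint of 70) -- same teachers, different rules.
    ```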

  • Thinking About Tests While Rethinking Test-Based Accountability

    Written on August 5, 2016

    Earlier this week, per the late summer ritual, New York State released its testing results for the 2015-2016 school year. New York City (NYC), whose results are always the most closely watched in the state, showed a 7.6 percentage point increase in its ELA proficiency rate, along with a 1.2 percentage point increase in its math rate. These increases were roughly equivalent to the statewide changes.

    City officials were quick to pounce on the results, calling them “historic” and “pure hard evidence” that the city’s new education policies are working. This interpretation, while standard in the U.S. education debate, is, of course, inappropriate for many reasons, all of which we’ve discussed here countless times and will not detail again (see here). Suffice it to say that even under the best of circumstances these changes in proficiency rates are only very tentative evidence that students improved their performance over time, to say nothing of whether that improvement was due to a specific policy or set of policies.
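
    A stylized example (with invented numbers, not the actual New York score distributions) shows one reason why: proficiency rates only register movement across a single cut score, so the same small improvement can look dramatic or invisible depending on where students happen to sit:

    ```python
    # Hypothetical illustration: proficiency rates only capture movement across
    # the cut score. A tiny, uniform gain can produce a large rate jump if
    # students happen to cluster just below the cut. All numbers are invented.

    CUT = 300  # hypothetical proficiency cut score

    def proficiency_rate(scores):
        return sum(s >= CUT for s in scores) / len(scores)

    year1 = [290, 295, 297, 298, 299, 305, 310, 320, 330, 340]
    year2 = [s + 2 for s in year1]  # every student gains just 2 points

    print(f"year 1: {proficiency_rate(year1):.0%}")  # 50%
    print(f"year 2: {proficiency_rate(year2):.0%}")  # 70%

    # A uniform 2-point gain moves the rate 20 percentage points, because two
    # students (at 298 and 299) sat within 2 points of the cut. Had they started
    # a few points lower, the identical gain would not have moved the rate at all.
    ```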

    Still, the results represent good news. A larger proportion of NYC students are scoring proficient in math and ELA than did last year. Real improvement is slow and sustained, and this is improvement. In addition, the proficiency rate in NYC is now on par with the statewide rate, which is unprecedented. There are, however, a couple of additional issues with these results that are worth discussing quickly.

  • A Small But Meaningful Change In Florida's School Grades System

    Written on July 28, 2016

    Beginning in the late 1990s, Florida became one of the first states to assign performance ratings to public schools. The purpose of these ratings, which are in the form of A-F grades, is to communicate to the public “how schools are performing relative to state standards.” For elementary and middle schools, the grades are based entirely on standardized testing results.

    We have written extensively here about Florida’s school grading system (see here for just one example), and have used it to illustrate features that can be found in most other states’ school ratings. The primary issue is the heavy reliance that states place on how highly students score on tests, which tells you more about the students the schools serve than about how well they serve those students – i.e., it conflates school and student performance. Put simply, some schools exhibit lower absolute testing performance levels than do other schools, largely because their students enter performing at lower levels. As a result, schools in poorer neighborhoods tend to receive lower grades, even though many of these schools are very successful in helping their students make fast progress during their few short years of attendance.

    Although virtually every state’s school rating system has this same basic structure to varying degrees, Florida’s system warrants special attention, as it was one of the first in the nation and has been widely touted and copied (as well as researched -- see our policy brief for a review of this evidence). It is also noteworthy because it contains a couple of interesting features, one of which exacerbates the aforementioned conflation of student and school performance in a largely unnoticed manner. But this feature, discussed below, has just been changed by the Florida Department of Education (FLDOE). This correction merits discussion, as it may be a sign of improvement in how policymakers think about these systems.

  • Teachers' Opinions Of Teacher Evaluation Systems

    Written on June 17, 2016

    The primary test of the new teacher evaluation systems implemented throughout the nation over the past 5-10 years is whether they improve teacher and ultimately student performance. Although the kinds of policy evaluations that will address these critical questions are just beginning to surface (e.g., Dee and Wyckoff 2015), among the most important early indicators of how well the new systems are working is their credibility among educators. Put simply, if teachers and administrators don’t believe in the systems, they are unlikely to respond productively to them.

    A new report from the Institute of Education Sciences (IES) provides a useful little snapshot of teachers’ opinions of their evaluation systems using a nationally representative survey. It is important to bear in mind that the data are from the 2011-12 Schools and Staffing Survey (SASS) and the 2012-13 Teacher Follow Up Survey, a time in which most of the new evaluations in force today were either still on the drawing board or in their first year or two of implementation. But the results reported by IES might still serve as a useful baseline going forward.

    The primary outcome in this particular analysis is a survey item querying whether teachers were “satisfied” with their evaluation process. And almost four in five respondents either strongly or somewhat agreed that they were satisfied with their evaluation. Of course, satisfaction with an evaluation system does not necessarily signal anything about its potential to improve or capture teacher performance, but it certainly tells us something about teachers’ overall views of how they are evaluated.

  • Getting Serious About Measuring Collaborative Teacher Practice

    Written on April 8, 2016

    Our guest author today is Nathan D. Jones, an assistant professor of special education at Boston University. His research focuses on teacher quality, teacher development, and school improvement. Dr. Jones previously worked as a middle school special education teacher in the Mississippi Delta. In this column, he introduces a new Albert Shanker Institute publication, which was written with colleagues Elizabeth Bettini and Mary Brownell.

    The current policy landscape presents a dilemma. Teacher evaluation has dominated recent state and local reform efforts, resulting in broad changes in teacher evaluation systems nationwide. The reforms have spawned countless research studies on whether emerging evaluation systems use measures that are reliable and valid, whether they result in changes in how teachers are rated, what happens to teachers who receive particularly high or low ratings, and whether the net results of these changes have had an effect on student learning.

    At the same time, there has been increasing enthusiasm about the promise of teacher collaboration (see here and here), spurred in part by new empirical evidence linking teacher collaboration to student outcomes (see Goddard et al., 2007; Ronfeldt, 2015; Sun, Grissom, & Loeb, 2016). When teachers work together, such as when they jointly analyze student achievement data (Gallimore et al., 2009; Saunders, Goldenberg, & Gallimore, 2009) or when high-performing teachers are matched with low-performing peers (Papay, Taylor, Tyler, & Laski, 2016), students have shown substantially better growth on standardized tests.

    This new work adds to a long line of descriptive research on the importance of colleagues and other social aspects of the school organization. Research has documented that informal relationships with colleagues play an important role in promoting positive teacher outcomes, such as planned and actual retention decisions (e.g., Bryk & Schneider, 2002; Pogodzinski, Youngs, & Frank, 2013; Youngs, Pogodzinski, Grogan, & Perrone, 2015). Further, a number of initiatives aimed at improving teacher learning – e.g., professional learning communities (Giles & Hargreaves, 2006) and lesson study (Lewis, Perry, & Murata, 2006) – rely on teachers planning instruction collaboratively.

  • Evaluating The Results Of New Teacher Evaluation Systems

    Written on March 24, 2016

    A new working paper by researchers Matthew Kraft and Allison Gilmour presents a useful summary of teacher evaluation results in 19 states, all of which designed and implemented new evaluation systems at some point over the past five years. As with previous evaluation results, the headline finding of this paper is that only a small proportion of teachers (2-5 percent) were given low, “below proficiency” ratings under the new systems, and the vast majority of teachers continue to be rated as satisfactory or better.

    Kraft and Gilmour present their results in the context of the “Widget Effect,” a well-known 2009 report by the New Teacher Project showing that the overwhelming majority of teachers in the 12 districts for which they had data received “satisfactory” ratings. The more recent results from Kraft and Gilmour indicate that this hasn’t changed much due to the adoption of new evaluation systems, or, at least, not enough to satisfy some policymakers and commentators who read the paper.

    The paper also presents a set of findings from surveys of and interviews with observers (e.g., principals). These are, in many respects, the more interesting and important results from a research and policy perspective, but let’s nevertheless focus on the findings regarding the distribution of teachers across rating categories, as they caused something of a stir. I have several comments to make about them, but will concentrate on three in particular (all of which, by the way, pertain not to the paper’s discussion, which is cautious and thorough, but rather to some of the reaction to it in our education policy discourse).

