Accountability

  • Measurement And Incentives In The USED Teacher Preparation Regulations

    Written on April 22, 2015

    Late last year, the U.S. Department of Education (USED) released a set of regulations, the primary purpose of which is to require states to design formal systems of accountability for teacher preparation (TP) programs. Specifically, states are required to evaluate annually the programs operating within their boundaries, and assign performance ratings. Importantly, the regulations specify that programs receiving low ratings should face possible consequences, such as the loss of federal funding.

    The USED regulations on TP accountability put forth several outcomes that states are to employ in their ratings, including: Student outcomes (e.g., test-based effectiveness of graduates); employment outcomes (e.g., placement/retention); and surveys (e.g., satisfaction among graduates/employers). USED proposes that states have their initial designs completed by the end of this year, and start generating ratings in 2017-18.

As was the case with the previous generation of teacher evaluations, teacher preparation is an area in which there is widespread agreement about the need for improvement. And formal high-stakes accountability systems can (even should) be a part of that at some point. Right now, however, requiring all states to begin assigning performance ratings to programs, and imposing high-stakes accountability for those ratings within a few years, is premature. The available measures have very serious problems, and the research on them is in its relative infancy. If we cannot reliably distinguish between programs in terms of their effectiveness, it is ill-advised to hold them formally accountable for that effectiveness. The primary rationale for the current focus on teacher quality and evaluations was established over decades of good research. We are nowhere near that point for TP programs. This is one of those circumstances in which the familiar refrain of “it’s imperfect but better than nothing” is false, and potentially dangerous.

    READ MORE
  • Charter Schools, Special Education Students, And Test-Based Accountability

    Written on April 7, 2015

Opponents often argue that charter schools tend to serve a disproportionately small number of special education students. And, while there may be exceptions and certainly a great deal of variation, that argument is essentially accurate. Regardless of why this is the case (and there is plenty of contentious debate about that), some charter school supporters have acknowledged that it may be a problem insofar as charters are viewed as a large-scale alternative to regular public schools.

    For example, Robin Lake, writing for the Center for Reinventing Public Education, takes issue with her fellow charter supporters who assert that “we cannot expect every school to be all things to every child.” She argues instead that schools, regardless of their governance structures, should never “send the soft message that kids with significant differences are not welcome,” or treat them as if “they are somebody else’s problem.” Rather, Ms. Lake calls upon charter school operators to take up the banner of serving the most vulnerable and challenging students and “work for systemic special education solutions.”

These are, needless to say, noble thoughts, with which many charter opponents and supporters can agree. Still, there is a somewhat more technocratic but perhaps more actionable issue lurking beneath the surface here: Put simply, until test-based accountability systems in the U.S. are redesigned such that they stop penalizing schools for the students they serve, rather than for how effectively they serve those students, there will be a rather strong disincentive for charters to focus aggressively on serving special education students. Moreover, whatever accountability disadvantage may be faced by regular public schools that serve higher proportions of special education students pales in comparison with that faced by all schools, charter and regular public, located in higher-poverty areas. In this sense, then, addressing this problem is something that charter supporters and opponents should be doing together.

    READ MORE
  • How Not To Improve New Teacher Evaluation Systems

    Written on March 9, 2015

    One of the more interesting recurring education stories over the past couple of years has been the release of results from several states’ and districts’ new teacher evaluation systems, including those from New York, Indiana, Minneapolis, Michigan and Florida. In most of these instances, the primary focus has been on the distribution of teachers across ratings categories. Specifically, there seems to be a pattern emerging, in which the vast majority of teachers receive one of the higher ratings, whereas very few receive the lowest ratings.

This has prompted some advocates, and even some high-level officials, essentially to deem the new systems failures, since their results suggest that the vast majority of teachers are “effective” or better. As I have written before, this issue cuts both ways. On the one hand, the results coming out of some states and districts seem problematic, and these systems may need adjustment. On the other hand, there is a danger here: States may respond by making rash, ill-advised changes in order to achieve “differentiation for the sake of differentiation,” and the changes may end up undermining the credibility and threatening the validity of the systems on which these states have spent so much time and money.

    Granted, whether and how to alter new evaluations are difficult decisions, and there is no tried and true playbook. That said, New York Governor Andrew Cuomo’s proposals provide a stunning example of how not to approach these changes. To see why, let’s look at some sound general principles for improving teacher evaluation systems based on the first rounds of results, and how they compare with the New York approach.*

    READ MORE
  • The Status Fallacy: New York State Edition

    Written on March 5, 2015

A recent New York Times story directly addresses New York Governor Andrew Cuomo’s suggestion, in his annual “State of the State” speech, that New York schools are in a state of crisis and “need dramatic reform.” The article’s general conclusion is that the “data suggest otherwise.”

    There are a bunch of important points raised in the article, but most of the piece is really just discussing student rather than school performance. Simple statistics about how highly students score on tests – i.e., “status measures” – tell you virtually nothing about the effectiveness of the schools those students attend, since, among other reasons, they don’t account for the fact that many students enter the system at low levels. How much students in a school know in a given year is very different from how much they learned over the course of that year.
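To make the distinction concrete, here is a toy sketch (all numbers hypothetical) of how a status measure and a growth measure can tell very different stories about two schools with identical scores this year:

```python
# Toy illustration (hypothetical numbers): two schools with identical
# average scores this year -- the same "status" -- can differ sharply
# in how much their students actually learned over the year.
schools = {
    # school: (avg score last year, avg score this year), 0-100 scale
    "School A": (80, 85),  # students entered high, gained 5 points
    "School B": (55, 85),  # students entered low, gained 30 points
}

for name, (prior, current) in schools.items():
    print(f"{name}: status = {current}, growth = {current - prior}")
```

By the status measure the two schools look interchangeable; by the growth measure, School B did vastly more for its students.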

    I (and many others) have written about this “status fallacy” dozens of times (see our resources page), not because I enjoy repeating myself (I don’t), but rather because I am continually amazed just how insidious it is, and how much of an impact it has on education policy and debate in the U.S. And it feels like every time I see signs that things might be changing for the better, there is an incident, such as Governor Cuomo’s speech, that makes me question how much progress there really has been at the highest levels.

    READ MORE
  • Actual Growth Measures Make A Big Difference When Measuring Growth

    Written on February 25, 2015

    As a frequent critic of how states and districts present and interpret their annual testing results, I am also obliged (and indeed quite happy) to note when there is progress.

Recently, I happened to be browsing through New York City’s presentation of their 2014 testing results, and to my great surprise, on slide number four, I found proficiency rate changes between 2013 and 2014 among students who were in the sample in both years (which they call “matched changes”). As it turns out, last year, for the first time, New York State as a whole began publishing these “matched” year-to-year proficiency rate changes for all schools and districts. This is an excellent policy. As we’ve discussed here many times, NCLB-style proficiency rate changes, which compare overall rates of all students, many of whom are only in the tested sample in one of the years, are usually portrayed as “growth” or “progress.” They are not. They compare different groups of students, and, as we’ll see, this can have a substantial impact on the conclusions one reaches from the data. Limiting the sample to students who were tested in both years, though not perfect, at least permits one to measure actual growth, and provides a much better idea of whether students are progressing over time.
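As a sketch of why the distinction matters, consider a hypothetical four-student dataset, where the unmatched (cross-sectional) and matched (longitudinal) calculations yield different answers:

```python
# Hypothetical mini-dataset: each record is (student_id, year, proficient?).
# The unmatched ("NCLB-style") change compares all tested students in each
# year; the matched change restricts to students tested in both years.
records = [
    ("s1", 2013, False), ("s1", 2014, True),
    ("s2", 2013, True),  ("s2", 2014, True),
    ("s3", 2013, False),               # left after 2013
    ("s4", 2014, False),               # entered in 2014
]

def rate(year, ids=None):
    """Proficiency rate for a year, optionally limited to given students."""
    flags = [p for (s, y, p) in records
             if y == year and (ids is None or s in ids)]
    return sum(flags) / len(flags)

both_years = ({s for (s, y, _) in records if y == 2013}
              & {s for (s, y, _) in records if y == 2014})

unmatched_change = rate(2014) - rate(2013)             # different student groups
matched_change = rate(2014, both_years) - rate(2013, both_years)
print(f"unmatched: {unmatched_change:+.2f}, matched: {matched_change:+.2f}")
```

Here the unmatched comparison understates the matched one, because a low-scoring entrant replaced a low-scoring leaver; with other enrollment patterns the bias can run in either direction.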

    This is an encouraging sign that New York State is taking steps to improve the quality and interpretation of their testing data. And, just to prove that no good deed goes unpunished, let’s see what we can learn using the new “matched” data – specifically, by seeing how often the matched (longitudinal) and unmatched (cross-sectional) changes lead to different conclusions about student “growth” in schools.

    READ MORE
  • Sample Size And Volatility In School Accountability Systems

    Written on February 24, 2015

    It is generally well-known that sample size has an important effect on measurement and, therefore, incentives in test-based school accountability systems.

    Within a given class or school, for example, there may be students who are sick on testing day, or get distracted by a noisy peer, or just have a bad day. Larger samples attenuate the degree to which unusual results among individual students (or classes) can influence results overall. In addition, schools draw their students from a population (e.g., a neighborhood). Even if the characteristics of the neighborhood from which the students come stay relatively stable, the pool of students entering the school (or tested sample) can vary substantially from one year to the next, particularly when that pool is small.

    Classes and schools tend to be quite small, and test scores vary far more between- than within-student (i.e., over time). As a result, testing results often exhibit a great deal of nonpersistent variation (Kane and Staiger 2002). In other words, much of the differences in test scores between schools, and over time, is fleeting, and this problem is particularly pronounced in smaller schools. One very simple, though not original, way to illustrate this relationship is to compare the results for smaller and larger schools.
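A quick simulation (with assumed score parameters) illustrates the relationship: even when every school draws students from an identical population, so that all year-to-year variation is pure sampling noise, mean scores bounce around far more for small schools:

```python
import random
import statistics

# Simulated sketch (assumed parameters: individual scores ~ N(500, 100)).
# Every "school" draws from the same population, so all year-to-year
# movement in its mean score is sampling noise.
random.seed(0)

def yearly_means(n_students, n_years=50):
    """Mean score for one school of a given size, over many simulated years."""
    return [statistics.mean(random.gauss(500, 100) for _ in range(n_students))
            for _ in range(n_years)]

for n in (25, 400):
    sd = statistics.stdev(yearly_means(n))
    print(f"school size {n:>3}: year-to-year SD of mean score ~ {sd:.1f}")
```

The small school’s mean swings roughly four times as much as the large school’s (the standard error of a mean shrinks with the square root of sample size), even though nothing about either school’s “effectiveness” changed.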

    READ MORE
  • Preparing Effective Teachers For Every Community

    Written on February 19, 2015

Our guest authors today are Frank Hernandez, Corinne Mantle-Bromley and Benjamin Riley. Dr. Hernandez is the dean of the College of Education at the University of Texas of the Permian Basin, and previously served as a classroom teacher and school and district administrator for 12 years. Dr. Mantle-Bromley is dean of the University of Idaho’s College of Education and taught in rural Idaho prior to her work preparing teachers for diverse K-12 populations. Mr. Riley is the founder of Deans for Impact, a new organization composed of deans of colleges of education working together to transform educator preparation in the U.S.

Students of color in the U.S., and those who live in rural communities, face unique challenges in receiving a high-quality education. All too often, new teachers have been inadequately prepared for these students’ specific needs. Perhaps just as often, their teachers do not look like them, and do not understand the communities in which these students live. Lacking adequate preparation and the cultural sensitivities that come only from time and experience within a community, many of our nation’s teachers are thrust into an almost unimaginably challenging situation. We simply do not have enough well-prepared teachers of color, or teachers from rural communities, who can successfully navigate the complexities of these education ecosystems.

Some have described the lack of teachers of color and teachers who will serve in rural communities as a crisis of social justice. We agree. And, as the leaders of two colleges of education that prepare teachers who serve in these communities, we think the solution requires elevating the expectations for every program that prepares teachers and educators in this country.

    READ MORE
  • The Debate And Evidence On The Impact Of NCLB

    Written on February 17, 2015

    There is currently a flurry of debate focused on the question of whether “NCLB worked.” This question, which surfaces regularly in the education field, is particularly salient in recent weeks, as Congress holds hearings on reauthorizing the law.

Any time there is a spell of “did NCLB work?” activity, one can hear and read numerous attempts to use simple NAEP changes in order to assess its impact. Individuals and organizations, including both supporters and detractors of the law, attempt to make their cases by presenting trends in scores, parsing subgroup estimates, and so on. These efforts, though typically well-intentioned, do not, of course, tell us much of anything about the law’s impact. One can use simple, unadjusted NAEP changes to prove or disprove any policy argument. And the reason is that they are not valid evidence of an intervention’s effects. There’s more to policy analysis than subtraction.

    But it’s not just the inappropriate use of evidence that makes these “did NCLB work?” debates frustrating and, often, unproductive. It is also the fact that NCLB really cannot be judged in simple, binary terms. It is a complex, national policy with considerable inter-state variation in design/implementation and various types of effects, intended and unintended. This is not a situation that lends itself to clear cut yes/no answers to the “did it work?” question.

    READ MORE
  • The Persistent Misidentification Of "Low Performing Schools"

    Written on February 3, 2015

In education, we hear the terms “failing school” and “low-performing school” quite frequently. Usually, they are used in soundbite-style catchphrases such as, “We can’t keep students trapped in ‘failing schools.’” Sometimes, however, they are used to refer to a specific group of schools in a given state or district that are identified as “failing” or “low-performing” as part of a state or federal law or program (e.g., waivers, SIG). There is, of course, interstate variation in these policies, but one common definition is that schools are “failing/low-performing” if their proficiency rates are in the bottom five percent statewide.

    Putting aside the (important) issues with judging schools based solely on standardized testing results, low proficiency rates (or low average scores) tell you virtually nothing about whether or not a school is “failing.” As we’ve discussed here many times, students enter their schools performing at different levels, and schools cannot control the students they serve, only how much progress those students make while they’re in attendance (see here for more).

    From this perspective, then, there may be many schools that are labeled “failing” or “low performing” but are actually of above average effectiveness in raising test scores. And, making things worse, virtually all of these will be schools that serve the most disadvantaged students. If that’s true, it’s difficult to think of anything more ill-advised than closing these schools, or even labeling them as “low performing.” Let’s take a quick, illustrative look at this possibility using the “bottom five percent” criterion, and data from Colorado in 2013-14 (note that this simple analysis is similar to what I did in this post, but this one is a little more specific; also see Glazerman and Potamites 2011; Ladd and Lauen 2010; and especially Chingos and West 2015).
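One way to see how this can happen is a small simulation (all numbers assumed, not drawn from the Colorado data): when schools’ entering achievement varies much more than their effectiveness, a “bottom five percent” cutoff based on status-style scores mostly picks out schools serving low-scoring entrants, not ineffective schools:

```python
import random

# Illustrative simulation (assumed parameters): entering achievement varies
# widely across schools, while effectiveness ("growth") varies modestly and
# is unrelated to entering level. We then flag the bottom 5% by score.
random.seed(1)
schools = []
for i in range(200):
    entering = random.gauss(50, 15)   # entering achievement, varies widely
    growth = random.gauss(5, 2)       # effectiveness, independent of entering
    schools.append({"id": i, "score": entering + growth, "growth": growth})

by_score = sorted(schools, key=lambda s: s["score"])
bottom_5pct = by_score[: len(schools) // 20]

median_growth = sorted(s["growth"] for s in schools)[len(schools) // 2]
above_median = sum(s["growth"] > median_growth for s in bottom_5pct)
print(f"{above_median} of {len(bottom_5pct)} 'bottom 5%' schools "
      f"beat the median growth")
```

Because growth is independent of entering achievement in this setup, roughly half of the “failing” schools are in fact above the statewide median in effectiveness; the label is driven almost entirely by whom they serve.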

    READ MORE
  • Fixing Our Broken System Of Testing And Accountability: The Reauthorization Of ESEA

    Written on January 21, 2015

    ** Reprinted here in the Washington Post

Our guest author today is Stephen Lazar, a founding teacher at Harvest Collegiate High School in New York City, where he teaches Social Studies. A National Board certified teacher, he blogs at Outside the Cave. Stephen is also one of the organizers of Insightful Social Studies, a grassroots campaign of teachers to reform the newly proposed New York State Social Studies standards. The following is Steve’s testimony this morning in front of the Senate HELP committee’s hearing on ESEA reauthorization.

    Sen. Lamar Alexander, Sen. Patty Murray and distinguished members of the Senate Committee on Health, Education, Labor and Pensions, it is my honor to testify before you today on the reauthorization of the Elementary and Secondary Education Act (ESEA), and to share with you the perspective of a classroom teacher on how the ESEA should address the issue of testing and assessment.

    I am a proud New York City public high school teacher. Currently, I teach both English and U.S. history to 11th-grade students at Harvest Collegiate High School in Manhattan, a school I helped found with a group of teachers three years ago. I also serve as our dean of Academic Progress, overseeing our school’s assessment system and supporting student learning schoolwide. My students, who are listening to us now—and who I need to remind to study for their test tomorrow—represent the full diversity of New York City. Over 70 percent receive free or reduced-price lunch; 75 percent are black and/or Latino; 25 percent have special education needs; and the overwhelming majority are immigrants or the children of immigrants.

    READ MORE

