The Past Is Prologue To The Future

Our guest author today is Stanley Litow, Professor at Duke and Columbia Universities, where he teaches about the role of corporations in society, and the author of The Challenge for Business and Society: From Risk to Reward. He formerly led Corporate Social Responsibility at IBM, where he was twice selected as CEO of the Year by Corporate Responsibility Magazine.

It was thirty years ago this month that Joseph Fernandez began his tenure as Chancellor of the New York City Public Schools. Born and raised in New York, Fernandez led the public school system in Miami before assuming leadership of New York City’s schools, the nation’s largest school system. Even before becoming Chancellor in NYC, Fernandez was already acknowledged as a premier leader of a large city school system. Over nearly four years under Fernandez's leadership, the NYC schools accomplished a great deal despite significant challenges. In fact, at the end of his first six months on the job, Joseph Berger wrote in the New York Times that Fernandez had “enjoyed a string of triumphs as he maneuvered to gain control of [the school] system.”

Among his many reforms, Fernandez championed the creation of dozens of new, innovative small schools across NYC, a model that ultimately spread across the nation. Decades later, a set of longitudinal evaluations of these schools conducted by MDRC has documented significant gains in achievement. His successors, who have disagreed sharply about many other things, have all continued to support and sustain the NYC small schools effort. Fernandez also championed the first diversity curriculum in any US school district. That reform, Children of the Rainbow, sought to help early childhood and elementary educators provide equity and excellence for students whose families might be nontraditional; its appendix included a book titled "Heather Has Two Mommies." In the midst of the AIDS crisis, he also established a structured program giving students in New York City high schools access to condoms, helping to protect their health and safety.

The False Choice Of Growth Versus Proficiency

Tennessee is considering changing its school accountability system such that schools have the choice of having their test-based performance judged by either status (how highly students score) or growth (how much progress students make over the course of the year). In other words, if schools do poorly on one measure, they are judged by the other (apparently, Texas already has a similar system in place).
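To make the proposed "either/or" rule concrete, here is a minimal Python sketch. The post does not describe Tennessee's actual formula, so the 1-5 rating scale, the example schools, and the `either_or_rating` helper are all hypothetical; the only assumption carried over from the text is that a school is judged by whichever of the two measures it does better on.

```python
# Hypothetical schools: (status rating, growth rating), on an assumed 1-5 scale
schools = {
    "High scores, little progress": (5, 1),
    "Low scores, strong progress": (1, 5),
    "Middling on both": (3, 3),
}


def either_or_rating(status: int, growth: int) -> int:
    """Hypothetical rule: judge the school by whichever rating is higher."""
    return max(status, growth)


for name, (status, growth) in schools.items():
    final = either_or_rating(status, growth)
    print(f"{name}: status={status}, growth={growth} -> final={final}")
```

The first two schools end up with identical final ratings even though they look very different on the underlying measures. This illustrates what is lost when the two measures are treated as substitutes rather than as complementary sources of information.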

As we’ve discussed here many times in the past, status measures, such as proficiency rates, are poor measures of school performance, since some students, particularly those living in poverty, enter their schools far behind their more affluent peers. As a result, schools serving larger proportions of poor students will exhibit lower scores and proficiency rates, even if they are very effective in compelling progress from their students. That is why growth models, which focus on individual students’ gains over time, are a superior measure of school performance per se.
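The arithmetic behind this argument can be illustrated with a short Python sketch. The scores, the cut score of 70, and the two schools are invented for illustration. The "growth" measure here is a simple average fall-to-spring gain, which is a much cruder stand-in for the growth models (such as value-added) that states actually use.

```python
from statistics import mean

PROFICIENCY_CUTOFF = 70  # hypothetical cut score

# Hypothetical (fall, spring) scale scores for the same students in each school
schools = {
    "School A (affluent intake)": [(78, 80), (82, 84), (85, 86)],
    "School B (high-poverty intake)": [(48, 58), (55, 66), (60, 71)],
}


def proficiency_rate(pairs):
    """Status: share of students scoring at or above the cut score in spring."""
    return sum(spring >= PROFICIENCY_CUTOFF for _, spring in pairs) / len(pairs)


def mean_gain(pairs):
    """Crude growth: average fall-to-spring gain for the same students."""
    return mean(spring - fall for fall, spring in pairs)


for name, pairs in schools.items():
    print(f"{name}: proficiency {proficiency_rate(pairs):.0%}, "
          f"mean gain {mean_gain(pairs):+.1f}")
```

Under these made-up numbers, School A has a 100 percent proficiency rate but its students gain less than two points on average. School B has a 33 percent rate but its students gain more than ten. A status measure ranks School A far ahead, while a growth measure reverses the order.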

This so-called “growth versus proficiency” debate has resurfaced several times over the years, and it was particularly prevalent when states were submitting proposals for their accountability systems during the reauthorization of the Elementary and Secondary Education Act. The policy that came out of these discussions was generally promising, as many states moved at least somewhat toward weighting growth model estimates more heavily.

At the same time, however, it is important to note that the “growth versus proficiency” debate sometimes implies that states must choose between these two types of indicators. This is misleading. The Tennessee proposal is an interesting context for discussing this, since it essentially treats the two types of measures as interchangeable. The reality, of course, is that both types of measures transmit valuable but different information, and both have a potentially useful role to play in accountability systems.

The Offline Implications Of The Research About Online Charter Schools

It’s rare to find an educational intervention with as unambiguous a research track record as online charter schools. Now, to be clear, it’s not a large body of research by any stretch, its conclusions may change in time, and the online charter sub-sector remains relatively small and concentrated in a few states. For now, though, the results seem incredibly bad (Zimmer et al. 2009; Woodworth et al. 2015). In virtually every state where these schools have been studied, across virtually all student subgroups, and in both reading and math, the estimated impact of online charter schools on student testing performance is negative and large in magnitude.

Predictably, and not without justification, those who oppose charter schools in general are particularly vehement when it comes to online charter schools – they should, according to many of these folks, be closed down, even outlawed. Charter school supporters, on the other hand, tend to acknowledge the negative results (to their credit) but make less drastic suggestions, such as greater oversight, including selective closure, and stricter authorizing practices.

Regardless of your opinion on what to do about online charter schools’ poor (test-based) results, they are truly an interesting phenomenon for a few reasons.

Why Teacher Evaluation Reform Is Not A Failure

The RAND Corporation recently released an important report on the impact of the Gates Foundation’s “Intensive Partnerships for Effective Teaching” (IPET) initiative. IPET was a very thorough and well-funded attempt to improve teaching quality in schools in three districts and four charter management organizations (CMOs). The initiative was multi-faceted, but its centerpiece was the implementation of multi-measure teacher evaluation systems and the linking of ratings from those systems to professional development and high-stakes personnel decisions, including compensation, tenure, and dismissal. This policy, particularly the inclusion in teacher evaluations of test-based productivity measures (e.g., value-added scores), has been among the most controversial issues in education policy over the past 10 years.

The report is extremely rich and contains many interesting findings, so I would encourage everyone to read it (at least the executive summary), but the headline finding was that IPET had no discernible effect on student outcomes, namely test scores and graduation rates, in the districts that participated, vis-à-vis similar districts that did not. Given that IPET was so thoroughly designed, implemented, and funded, it can potentially be viewed as a "best case scenario" test of the type of evaluation reform that most states have enacted. Accordingly, critics of these reforms, who typically focus their opposition on the high-stakes use of evaluation measures, particularly value-added and other test-based measures, have portrayed the findings as vindication of their opposition.

This reaction has merit. The most important reason is that advocates portrayed evaluation reform as a means to immediate and drastic improvements in student outcomes. This promise was misguided from the outset, and evaluation reform opponents are (and were) correct in pointing this out. At the same time, however, it would be wise not to dismiss evaluation reform as a whole, for several reasons, a few of which are discussed below.

What Happened To Teacher Quality?

Starting around 2005 and until a few years ago, education policy discourse and policymaking were dominated by the issue of improving “teacher quality.” We haven’t heard nearly as much about it over the past couple of years. One of the major reasons is that the vast majority of states have enacted policies ostensibly designed to improve teacher quality.

Thanks in no small part to the Race to the Top grant program, and the subsequent ESEA waiver program, virtually all states reformed their teacher evaluation systems, the “flagship” policy of the teacher quality push. Many of these states also tied their new evaluation results to high-stakes personnel decisions, such as granting tenure, dismissals, layoffs, and compensation. Predictably, the details of these new systems vary quite a bit, both within and between states. Many advocates are dissatisfied with how the new policies were designed, and one could write a book on all the different issues. Yet it would be tough to deny that this national policy effort was among the fastest shifts in recent educational history, particularly given the controversy surrounding it.

So, what happened to all the attention to teacher quality? It was put into practice. Evidence on the effects of these policies is beginning to emerge, but accumulating it will take a while, so it is still a quiet time in teacher quality land, at least compared with the previous 5-7 years. Even so, there are already many lessons, too many for a single post. Looking back, though, one big-picture lesson, and definitely not a new one, is how the evaluation reform effort stands out (in a very competitive field) for the degree to which it was driven by the promise of immediate, large results.

Improving Accountability Measurement Under ESSA

Despite the recent repeal of federal guidelines for states’ compliance with the Every Student Succeeds Act (ESSA), states are steadily submitting their proposals, and they are rightfully receiving some attention. The policies in these proposals will have far-reaching consequences for the future of school accountability (among many other types of policies), as well as, of course, for educators and students in U.S. public schools.

There are plenty of positive signs in these proposals, which are indicative of progress in the role of proper measurement in school accountability policy. It is important to recognize this progress, but impossible not to see that ESSA perpetuates long-standing measurement problems that were institutionalized under No Child Left Behind (NCLB). These issues, particularly the ongoing failure to distinguish between student and school performance, continue to dominate accountability policy to this day. Part of the confusion stems from the fact that school and student performance are not independent of each other. For example, a test score, by itself, gauges student performance, but it also reflects, at least in part, school effectiveness (i.e., the score might have been higher or lower had the student attended a different school).

Both student and school performance measures have an important role to play in accountability, but distinguishing between them is crucial. States’ ESSA proposals make the distinction in some respects but not in others. The result may end up being accountability systems that, while better than those under NCLB, are still severely hampered by improper inference and misaligned incentives. Let’s take a look at some of the key areas where we find these issues manifested.

Organizing For Adaptive Change Management

Our guest author today is Joshua P. Starr, chief executive officer of PDK International. This piece was originally published in Phi Delta Kappan, and it is adapted from his chapter in Teaching in Context: The Social Side of Education Reform, edited by Esther Quintero (Harvard Education Press, 2017).

One day, when I was a district superintendent, I visited two high schools we had identified as “needing improvement.” I was there to share our strategy to help them boost student achievement and also give teachers and staff a chance to air their thoughts and concerns. The schools faced similar challenges, and they served similar student populations, but the comments I heard on my visits were totally different.

At one school, faculty complained that students lacked respect for authority, had been poorly prepared by their middle schools, and were being raised by parents who didn’t value education. In short, they pointed to problems beyond their control. They wanted me to remove the kids who were giving them the most trouble, and they also wanted more money.

At the other school, teachers and staff told me about their collective struggle to improve instruction, talked about their desire for more professional learning, and described how they were challenging and changing their own beliefs about student abilities. That is, they found specific problems lurking in their own teaching practices and believed they had to learn and grow so they could serve students better.

Teacher Evaluations And Turnover In Houston

We are now entering a period in which we are likely to see many studies of the impact of new teacher evaluations. This incredibly rapid policy shift, perhaps the centerpiece of the Obama Administration’s education efforts, was sold based on illustrations of the importance of teacher quality.

The basic argument was that teacher effectiveness is perhaps the most important factor under schools’ control, and the best way to improve that effectiveness was to identify and remove ineffective teachers via new teacher evaluations. Without question, there was a logic to this approach, but dismissing or compelling the exits of low-performing teachers does not occur in a vacuum. Even if a given policy causes more low performers to exit, the effects of this shift can be attenuated by turnover among higher performers, not to mention other important factors, such as the quality of applicants (Adnot et al. 2016).

A new NBER working paper by Julie Berry Cullen, Cory Koedel, and Eric Parsons addresses this dynamic directly by looking at the impact on turnover of a new evaluation system in Houston, Texas. It is an important piece of early evidence on one new evaluation system, but the results also speak more broadly to how these systems work.

New Teacher Evaluations And Teacher Job Satisfaction

Job satisfaction among teachers is a perennially popular topic of conversation in education policy circles. There is good reason for this. For example, whether or not teachers are satisfied with their work has been linked to their likelihood of changing schools or professions (e.g., Ingersoll 2001).

Yet much of the discussion of teacher satisfaction consists of advocates’ speculation that their policy preferences will make for a more rewarding profession, whereas opponents’ policies are sure to disillusion masses of educators. This was certainly true of the debate surrounding the rapid wave of teacher evaluation reform over the past ten or so years.

A paper just published in the American Educational Research Journal directly addresses the impact of new evaluation systems on teacher job satisfaction. It is not only among the first analyses to examine the impact of these systems, but also the first to look at their effect on teachers’ attitudes.

New Evidence On Teaching Quality And The Achievement Gap

It is an extensively documented fact that low-income students score more poorly on standardized tests than do their higher income peers. This so-called “achievement gap” has persisted for generations and is still one of the most significant challenges confronting the American educational system.

Some people tend to overstate, while others tend to understate, the degree to which this gap is attributable to differences in teacher (and school) effectiveness between lower- and higher-income students (with income usually defined in terms of students’ eligibility for subsidized lunch assistance). As discussed below, the evidence thus far suggests that lower-income students are more likely than higher-income students to have less “effective” teachers, with effectiveness defined in terms of the ability to help raise student test scores, or value-added, although the magnitude of these discrepancies varies by study. There are also some compelling theories as to the possible mechanisms behind these (often modest) discrepancies, most notably the fact that schools in low-income neighborhoods tend to have fewer resources, as well as more trouble recruiting and retaining highly qualified, experienced teachers.

Mathematica Policy Research recently released a very large, very important study that addresses these issues directly. It sheds additional light on the magnitude of any measurable differences in access to effective teaching among students of different incomes (the “Effective Teaching Gap”), as well as on how hiring, mobility, and retention might contribute to these gaps. The analysis uses data on teachers in grades 4-8 or 6-8 (depending on data availability) over five school years (2008-09 to 2012-13) in 26 districts across the nation.