Fix Schools, Not Teachers

This post was originally published at the Harvard Education Press blog.

Both John and Jasmine are fifth-grade teachers. Jasmine has a lot of experience under her belt, has been recognized as an excellent educator and, as a content expert in math and science, is sought out by her colleagues as a major resource at her school. John has been teaching math and science for two years. His job evaluations show room for improvement, but he isn’t always sure how to get there. Due to life circumstances, both teachers switch schools the following year. John starts working at a school where faculty routinely work collaboratively, a rather new experience for him. In Jasmine’s new school, teachers are friendly, but they work independently and don’t function as a learning community the way her old school did.

After a year John’s practice has improved considerably; he attributes much of it to the culture of his new school, which is clearly oriented toward professional learning. Jasmine’s instruction continues to be strong but she misses her old school, being sought out by her colleagues for advice, and the mutual learning that she felt resulted from those frequent professional exchanges.

This story helps to illustrate the limitations of how teachers’ knowledge and skills are often viewed: as rather static and existing in a vacuum, unaffected by the contexts where teachers work. Increasing evidence suggests that understanding teaching and supporting its improvement requires a recognition that the context of teachers’ work, particularly its interpersonal dimension, matters a great deal. Teachers’ professional relations and interactions with colleagues and supervisors can constrain or support their learning and, consequently, that of their students.

Subgroup-Specific Accountability, Teacher Job Assignments, And Teacher Attrition: Lessons For States

Our guest author today is Matthew Shirrell, assistant professor of educational leadership and administration in the Graduate School of Education and Human Development at the George Washington University.

Racial/ethnic gaps in student achievement persist, despite a wide variety of interventions designed to address them (see Reardon, Robinson-Cimpian, & Weathers, 2015). The No Child Left Behind Act of 2001 (NCLB) took a novel approach to closing these achievement gaps, requiring that schools make yearly improvements not only in overall student achievement, but also in the achievement of students of various subgroups, including racial/ethnic minority subgroups and students from economically disadvantaged families.

Evidence is mixed on whether NCLB’s “subgroup-specific accountability” accomplished its goal of narrowing racial/ethnic and other achievement gaps. Research on the impacts of the policy, however, has largely neglected the effects of this policy on teachers. Understanding any effects on teachers is important to gaining a more complete picture of the policy’s overall impact; if the policy increased student achievement but resulted in the turnover or attrition of large numbers of teachers, for example, these benefits and costs should be weighed together when assessing the policy’s overall effects.

In a study just published online in Education Finance and Policy (and supported by funding from the Albert Shanker Institute), I explore the effects of NCLB’s subgroup-specific accountability on teachers. Specifically, I examine whether teaching in a school that was held accountable for a particular subgroup’s performance in the first year of NCLB affected teachers’ job assignments, turnover, and attrition.

Teacher Evaluations And Turnover In Houston

We are now entering a time period in which we might start to see a lot of studies released about the impact of new teacher evaluations. This incredibly rapid policy shift, perhaps the centerpiece of the Obama Administration’s education efforts, was sold largely on the basis of illustrations of the importance of teacher quality.

The basic argument was that teacher effectiveness is perhaps the most important factor under schools’ control, and that the best way to improve that effectiveness was to identify and remove ineffective teachers via new teacher evaluations. Without question, there was a logic to this approach, but dismissing or compelling the exits of low-performing teachers does not occur in a vacuum. Even if a given policy causes more low performers to exit, the effects of this shift can be attenuated by turnover among higher performers, not to mention other important factors, such as the quality of applicants (Adnot et al. 2016).
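To make the attenuation point concrete, here is a minimal simulation with hypothetical effectiveness scores. The numbers and scenarios are invented for illustration; they are not drawn from any of the studies discussed here:

```python
import random

random.seed(1)

# Hypothetical standardized effectiveness scores (mean 0, SD 1) for 100 teachers.
teachers = sorted(random.gauss(0, 1) for _ in range(100))

def mean(xs):
    return sum(xs) / len(xs)

baseline = mean(teachers)

# Scenario 1: only the bottom five performers exit, replaced by average new hires.
scenario1 = teachers[5:] + [0.0] * 5

# Scenario 2: the same five low performers exit, but five top performers also
# leave (e.g., out of dissatisfaction), all replaced by average new hires.
scenario2 = teachers[5:-5] + [0.0] * 10

print(f"baseline mean effectiveness:  {baseline:+.3f}")
print(f"low-performer exits only:     {mean(scenario1):+.3f}")
print(f"plus high-performer exits:    {mean(scenario2):+.3f}")
```

In the second scenario, the gains from removing low performers are partly offset by the loss of high performers, which is the dynamic the Houston study discussed below examines directly.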

A new NBER working paper by Julie Berry Cullen, Cory Koedel, and Eric Parsons addresses this dynamic directly by looking at the impact on turnover of a new evaluation system in Houston, Texas. It is an important piece of early evidence on one new evaluation system, but the results also speak more broadly to how these systems work.

New Teacher Evaluations And Teacher Job Satisfaction

Job satisfaction among teachers is a perennially popular topic of conversation in education policy circles. There is good reason for this. For example, whether or not teachers are satisfied with their work has been linked to their likelihood of changing schools or professions (e.g., Ingersoll 2001).

Yet much of the discussion of teacher satisfaction consists of advocates’ speculation that their policy preferences will make for a more rewarding profession, whereas opponents’ policies are sure to disillusion masses of educators. This was certainly true of the debate surrounding the rapid wave of teacher evaluation reform over the past ten or so years.

A paper just published in the American Educational Research Journal directly addresses the impact of new evaluation systems on teacher job satisfaction. It is among the first analyses to examine the impact of these systems, and the first to look at their effect on teachers’ attitudes.

Our Request For Simple Data From The District Of Columbia

For our 2015 report, “The State of Teacher Diversity in American Education,” we requested data on teacher race and ethnicity between roughly 2000 and 2012 from nine of the largest school districts in the nation: Boston; Chicago; Cleveland; District of Columbia; Los Angeles; New Orleans; New York; Philadelphia; and San Francisco.

Only one of these districts failed to provide us with data that we could use to conduct our analysis: the District of Columbia.

To be clear, the data we requested are public record. Most of the eight other districts to which we submitted requests complied in a timely fashion. A couple of them took months to fill the request and required a little follow-up. But all of them gave us what we needed. We were actually able to get charter school data for virtually all of these eight cities (usually through the state).

Even New Orleans, which, during the years for which we requested data, was destroyed by a hurricane and underwent a comprehensive restructuring of its entire school system, provided the data.

But not DC.

New Evidence On Teaching Quality And The Achievement Gap

It is an extensively documented fact that low-income students score more poorly on standardized tests than do their higher income peers. This so-called “achievement gap” has persisted for generations and is still one of the most significant challenges confronting the American educational system.

Some people tend to overstate -- while others tend to understate -- the degree to which this gap is attributable to differences in teacher (and school) effectiveness between lower and higher income students (with income usually defined in terms of students’ eligibility for subsidized lunch assistance). As discussed below, the evidence thus far suggests that lower income students are more likely than higher income students to have less “effective” teachers -- with effectiveness defined in terms of the ability to help raise student test scores, or value-added -- although the magnitude of these discrepancies varies by study. There are also some compelling theories as to the possible mechanisms behind these (often modest) discrepancies, most notably the fact that schools in low-income neighborhoods tend to have fewer resources, as well as more trouble recruiting and retaining highly qualified, experienced teachers.

The Mathematica Policy Research organization recently released a very large, very important study that addresses these issues directly. It focuses on shedding additional light on the magnitude of any measurable differences in access to effective teaching among students of different incomes (the “Effective Teaching Gap”), as well as the way in which hiring, mobility, and retention might contribute to these gaps. The analysis uses data on teachers in grades 4-8 or 6-8 (depending on data availability) over five years (2008-09 to 2012-13) in 26 districts across the nation.
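As a rough illustration of how such a gap is computed, consider the sketch below. The records and numbers are hypothetical, and the calculation is a drastic simplification of the study’s actual methodology (which spans multiple years, grades, and districts):

```python
from statistics import mean

# Hypothetical records: (teacher value-added estimate, student FRL-eligible?).
# Both the values and the pairing are invented for illustration.
assignments = [
    (0.10, True), (-0.05, True), (-0.12, True), (0.02, True),
    (0.15, False), (0.08, False), (-0.01, False), (0.11, False),
]

va_frl     = mean(va for va, frl in assignments if frl)
va_non_frl = mean(va for va, frl in assignments if not frl)

# The "effective teaching gap": average value-added of teachers assigned to
# non-FRL students minus that of teachers assigned to FRL students.
print(f"effective teaching gap: {va_non_frl - va_frl:+.3f} (test-score SD units)")
```

The study’s real estimates are, of course, based on far more elaborate value-added models and on comparisons made within districts before aggregating.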

The Details Matter In Teacher Evaluations

Throughout the process of reforming teacher evaluation systems over the past 5-10 years, perhaps the most contentious and widely discussed issue was the importance, or weights, assigned to different components. Specifically, there was a great deal of debate about the proper weight to assign to test-based teacher productivity measures, such as estimates from value-added and other growth models.

Some commentators, particularly those more enthusiastic about test-based accountability, argued that the new teacher evaluations somehow were not meaningful unless value-added or growth model estimates constituted a substantial proportion of teachers’ final evaluation ratings. Skeptics of test-based accountability, on the other hand, tended toward a rather different viewpoint – that test-based teacher performance measures should play little or no role in the new evaluation systems. Moreover, virtually all of the discussion of these systems’ results, once they were finally implemented, focused on the distribution of final ratings, particularly the proportions of teachers rated “ineffective.”

A recent working paper by Matthew Steinberg and Matthew Kraft directly addresses and informs this debate. Their very straightforward analysis shows just how consequential these weighting decisions are for the distribution of final ratings, as are choices of where to set the cutpoints for the final rating categories (e.g., how many points a teacher needs to earn an “effective” versus an “ineffective” rating).
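To see the mechanics at work, here is a minimal sketch of how weights and cutpoints interact. The component scores, weights, and thresholds below are hypothetical; they are not Steinberg and Kraft’s actual parameters:

```python
# Illustrative composite scoring under two weighting schemes and two sets of
# cutpoints. All scores are on a 0-100 scale and entirely made up.

def rating(composite, cutpoints):
    """Map a composite score to a rating category (cutpoints sorted descending)."""
    for threshold, label in cutpoints:
        if composite >= threshold:
            return label
    return "ineffective"

# Each teacher: (observation score, value-added score).
teachers = [(85, 40), (70, 55), (60, 75), (90, 30), (50, 65)]

schemes = {"VA weight 20%": 0.20, "VA weight 50%": 0.50}
cutpoint_sets = {
    "lenient": [(75, "highly effective"), (50, "effective"), (30, "developing")],
    "strict":  [(85, "highly effective"), (65, "effective"), (45, "developing")],
}

for scheme, va_weight in schemes.items():
    for name, cuts in cutpoint_sets.items():
        ratings = [
            rating((1 - va_weight) * obs + va_weight * va, cuts)
            for obs, va in teachers
        ]
        print(f"{scheme} / {name}: {ratings}")
```

Even with identical underlying scores, shifting the value-added weight or the cutpoints reassigns several of these five hypothetical teachers to different categories, which is the paper’s central point.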

An Alternative Income Measure Using Administrative Education Data

The relationship between family background and educational outcomes is well documented and the topic, rightfully, of endless debate and discussion. A student’s background is most often measured in terms of family income (even though it is actually the factors associated with income, such as health, early childhood education, etc., that are the direct causal agents).

Most education analyses rely on a single income/poverty indicator – i.e., whether or not students are eligible for federally subsidized lunch (free/reduced-price lunch, or FRL). For instance, income-based achievement gaps are calculated by comparing test scores between students who are eligible for FRL and those who are not, while multivariate models almost always use FRL eligibility as a control variable. Similarly, schools and districts with relatively high FRL eligibility rates are characterized as “high poverty.” The primary advantages of FRL status are that it is simple and collected by virtually every school district in the nation (collecting actual income would not be feasible). Yet it is also a notoriously crude and noisy indicator. In addition to the fact that FRL eligibility is often called “poverty” even though the cutoff is by design 85 percent higher than the federal poverty line, FRL rates, like proficiency rates, mask a great deal of heterogeneity. Families of two students who are FRL eligible can have quite different incomes, as could two families of students who are not eligible. As a result, FRL-based estimates such as achievement gaps might differ quite a bit from those calculated using actual family income from surveys.

A new working paper by Michigan researchers Katherine Michelmore and Susan Dynarski presents a very clever means of obtaining a more accurate income/poverty proxy using the same administrative data that states and districts have been collecting for years.
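The core idea, as I understand it, is to exploit the longitudinal dimension of those records: rather than a single-year eligibility flag, one can measure how persistently each student has been FRL-eligible over time. Below is a minimal sketch of that kind of measure; the data structure, student IDs, and the 80 percent cutoff are all assumptions for illustration, not the paper’s actual specification:

```python
# Persistence-based proxy: share of observed years each student was
# FRL-eligible, computed from longitudinal administrative records.

# student_id -> yearly FRL-eligibility flags (hypothetical data)
frl_history = {
    "s001": [True, True, True, True, True, True],      # persistently eligible
    "s002": [False, True, False, False, True, False],  # occasionally eligible
    "s003": [False, False, False, False, False, False],
}

def share_eligible(flags):
    return sum(flags) / len(flags)

for sid, flags in frl_history.items():
    share = share_eligible(flags)
    # Labeling students eligible in (nearly) every observed year as
    # "persistent" -- this 80 percent threshold is an assumed cutoff.
    label = "persistent" if share >= 0.8 else ("transitory" if share > 0 else "never")
    print(sid, f"{share:.0%}", label)
```

A measure like this distinguishes persistently disadvantaged students from those who dip in and out of eligibility, a distinction the simple binary flag erases.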

On Focus Groups, Elections, and Predictions

Focus groups, a method in which small groups of subjects are questioned by researchers, are widely used in politics, marketing, and other areas. In education policy, focus groups, particularly those composed of teachers or administrators, are often used to design or shape policy. And, of course, during national election cycles they are particularly widespread; there are even television networks that broadcast focus groups as a way to gauge the public’s reaction to debates or other events.

There are good reasons for using focus groups. Analyzing surveys can provide information about self-reported behaviors and the rankings of issues at a given point in time, along with correlations between those responses and certain demographic and social variables of interest. Focus groups, on the other hand, can help map out the issues important to voters (which can inform survey question design), as well as investigate what reactions certain presentations (verbal or symbolic) evoke (which can, for example, help frame messages in political or informational campaigns).

Both polling/surveys and focus groups provide insights that the other method alone could not. Neither of them, however, can answer questions about why certain patterns occur or how likely they are to occur in the future. That said, having heard some of the commentary about focus groups, and particularly having seen them being broadcast live and discussed on cable news stations, I feel strongly compelled to comment, as I do whenever data are used improperly or methodologies are misinterpreted.

A Myth Grows In The Garden State

New Jersey Governor Chris Christie recently announced a new "fairness funding" plan to provide every school district in his state roughly the same amount of per-pupil state funding. This would represent a huge change from the current system, in which more state funds are allocated to the districts that serve a larger proportion of economically disadvantaged students. Thus, the Christie proposal would result in an increase in state funding for middle class and affluent districts, and a substantial decrease in money for poorer districts. According to the Governor, the change would reduce the property tax burden on many districts by replacing some of their revenue with state money.
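The arithmetic of such a shift is straightforward. Here is a deliberately simple sketch with invented enrollment and aid figures (these are not New Jersey’s actual numbers or formula), showing how holding the total pot constant while flattening per-pupil aid redistributes money from poorer to wealthier districts:

```python
# Hypothetical illustration of moving from need-weighted to flat per-pupil
# state aid. All figures are made up for arithmetic clarity.

districts = {
    # name: (enrollment, current per-pupil state aid under a weighted formula)
    "high-poverty district": (10_000, 12_000),
    "middle-class district": (10_000, 4_000),
    "affluent district":     (10_000, 1_000),
}

total_aid = sum(n * aid for n, aid in districts.values())
total_enrollment = sum(n for n, _ in districts.values())
flat_per_pupil = total_aid / total_enrollment  # same pot, spread evenly

for name, (n, aid) in districts.items():
    change = flat_per_pupil - aid
    print(f"{name}: {aid:,.0f} -> {flat_per_pupil:,.0f} per pupil ({change:+,.0f})")
```

Under these made-up numbers, the poorest district loses more than half of its state aid, which is precisely the concern raised below.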

This is a very bad idea. For one thing, NJ state funding of education is already about 7-8 percent lower than it was in 2008 (Leachman et al. 2015). And this plan would, most likely, cut revenue in the state’s poorest districts by dramatic amounts, absent an implausible increase in property tax rates. It is perfectly reasonable to have a discussion about how education money is spent and allocated, and/or about tax structure. But it is difficult to grasp how serious people could actually conceive of this particular idea. And it’s actually a perfect example of how dangerous it is when huge complicated bodies of empirical evidence are boiled down to talking points (and this happens on all “sides” of the education debate).

Put simply, Governor Christie believes that “money doesn’t matter” in education. He and his advisors have been told that how much you spend on schools has little real impact on results. This is also a talking point that, in many respects, coincides with an ideological framework of skepticism toward government and government spending, which Christie shares.