Recent Evidence On Teacher Experience And Productivity

The idea that teachers’ test-based productivity does not improve after their first few years in the classroom is, it is fair to say, the “conventional wisdom” among many in the education reform arena. It has been repeated endlessly, and used to advocate forcefully for recent changes in teacher personnel policies, such as those regarding compensation, transfers, and layoffs. 

Following a few other recent analyses (e.g., Harris and Sass 2011; Wiswall 2013; Ladd and Sorensen 2013), a new working paper by researchers John Papay and Matthew Kraft examines this claim about the relationship between experience and (test-based) performance. In this case, the authors compare the various approaches with which the productivity returns to experience have been estimated in the literature, and put forth a new one. The paper did receive some attention, and will hopefully have some impact on the policy debate, as well as on the production of future work on this topic.

It might nevertheless be worthwhile to take a closer look at the “nuts and bolts” of this study, both because it is interesting (at least in my opinion) and policy relevant, and also because it illustrates some important lessons regarding the relationship between research and policy, specifically the fact that what we think we know is not always as straightforward as it appears.

Putting aside the common issues with using value-added models and other test-based measures as a proxy for teacher performance, estimating the (test-based) returns to teacher experience may seem to be a fairly simple endeavor. You simply look at whether value-added scores (or some other performance measure) are higher among teachers with more experience.

As usual, it’s far from that easy. For one thing, it's important to monitor individual teachers’ improvement over time using longitudinal data, since teacher attrition (differences between leavers and stayers in terms of their measured performance) and variations in initial productivity between cohorts might mask or distort the relationship between experience and performance. But then there's the issue of separating “year effects” from “experience effects.” In other words, because factors unrelated to teacher experience might generate “system-wide” changes in teacher productivity, it is potentially necessary to separate the effect of a given year-to-year transition from that of teachers gaining an additional year of experience. If, for example, a school or district changes its curriculum in a given year, its students may experience larger (or smaller) achievement gains as a result, which might be conflated with the impact of that school or district’s teachers gaining another year in the classroom.
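
To make the "year versus experience" problem concrete, here is a minimal sketch (not taken from the paper) of why the two cannot be separated once you follow individual teachers over time. The function name, the three hypothetical teachers, and their entry years are all my own assumptions, and I use simple linear year and experience columns rather than the full sets of indicators a real model would include. The point is just that, for a teacher with an uninterrupted career, experience is the calendar year minus the entry year, so the experience column carries no information beyond the teacher and year columns.

```python
import numpy as np

def teacher_panel_design(entry_years, n_years):
    """Stack [teacher dummies | year | experience] for uninterrupted careers.

    entry_years and n_years are hypothetical inputs chosen for illustration.
    """
    n_teachers = len(entry_years)
    rows = []
    for j, entry in enumerate(entry_years):
        for year in range(entry, n_years):
            dummies = [0.0] * n_teachers
            dummies[j] = 1.0
            # Experience is fully determined by the teacher (entry year) and the year.
            rows.append(dummies + [float(year), float(year - entry)])
    return np.array(rows)

X = teacher_panel_design(entry_years=[0, 1, 2], n_years=6)
rank = np.linalg.matrix_rank(X)
print(f"columns: {X.shape[1]}, rank: {rank}")
```

The matrix has five columns but one fewer independent direction, which is the formal version of saying that the data alone cannot tell a year effect from an experience effect. Every approach discussed below is, in one way or another, an assumption that breaks this exact dependence.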

Papay and Kraft identify three previous approaches, characterized as types of models, that have been employed to address this issue:

  1. Censored growth model (e.g., Rockoff 2004): Relying on previous research showing that teachers don’t exhibit productivity increases after 10 years in the classroom, the general idea here is to estimate the "year effects" using teachers with more than 10 years in the classroom (based on the assumption that these teachers will exhibit no experience effects, and so any within-teacher change over time will represent year effects), and then to use these estimates to isolate experience effects among teachers with fewer than 10 years in the classroom. The big question mark here is whether there are in fact any gains after year 10. If so, then this would generate bias in (specifically, understate) estimated experience effects;
  2. Indicator variable model (e.g., Harris and Sass 2011): Put simply, this approach sorts teachers into experience categories -- for example, 1-2, 3-4, 5-9, 10-15, 15-25, 25 or more years -- instead of using the normal continuous experience variable, and then estimates the "year effects" using variation between teachers in each “band” (in a manner similar to the censored growth model, which does the same thing with teachers who have more than 10 years of experience). The possible downside is that it requires the assumption that teachers do not experience productivity gains within the bands. To the degree this assumption is violated, experience-driven productivity gains are again conflated with year-to-year transitions, and returns to experience are therefore underestimated;
  3. Discontinuous career model (e.g., Wiswall 2013): This approach is quite different from the other two. It focuses on a subset of teachers: those who leave the profession and then return at a later date (e.g., due to family leave or medical reasons). Because of the interruptions, these teachers’ experience profiles are not collinear with year effects (i.e., the relationship is not 1:1), thus allowing researchers to address the problem of conflating experience and year. It is also, however, subject to bias if: 1) teachers who leave and return are not representative of all teachers (external validity); and/or 2) leaving and returning itself has an impact on productivity gains (internal validity).
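
One way to see what these three approaches have in common is to check, for a toy panel, whether year and experience can be told apart under each one. The sketch below is my own illustration, not the authors' code: the `design` helper, the three hypothetical teachers (one veteran hired 12 years before the panel starts, two novices), and the band cut points are all assumptions. For simplicity it recodes experience as a single numeric column, whereas the actual indicator variable model uses a separate dummy per band.

```python
import numpy as np

def design(entry_years, n_years, recode=lambda e: e, leave=None):
    """Stack [teacher dummies | year | recoded experience] for a toy panel.

    leave maps a teacher index to the set of years that teacher is away
    (not observed, and no experience accrued). All inputs are hypothetical.
    """
    leave = leave or {}
    n_teachers = len(entry_years)
    rows = []
    for j, entry in enumerate(entry_years):
        off = leave.get(j, set())
        for year in range(max(entry, 0), n_years):
            if year in off:
                continue
            # Years since entry, minus any earlier years spent on leave.
            exp = (year - entry) - sum(1 for away in off if away < year)
            fe = [0.0] * n_teachers
            fe[j] = 1.0
            rows.append(fe + [float(year), float(recode(exp))])
    return np.array(rows)

entries = [-12, 0, 1]  # negative entry year = hired before the panel begins
scenarios = {
    "no restriction": design(entries, 6),
    "censored at 10 years": design(entries, 6, recode=lambda e: min(e, 10)),
    "experience bands": design(entries, 6, recode=lambda e: int(np.digitize(e, [2, 5, 10]))),
    "career interruption": design(entries, 6, leave={1: {2}}),
}
ranks = {name: int(np.linalg.matrix_rank(X)) for name, X in scenarios.items()}
for name, r in ranks.items():
    print(f"{name}: rank {r} of 5 columns")
```

In this toy setup, only the unrestricted version is rank deficient. Each of the other three reaches full rank because the recoding (or the interruption) stops experience from moving in lockstep with the calendar. That is also why each one is only as good as the assumption doing the work: no gains after year 10, no gains within bands, or leavers who look like everyone else.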

Papay and Kraft propose a fourth approach, one that uses the full sample of teachers (not just those with nonstandard career patterns), and makes different assumptions about the relationship between experience and year. They call it the two-stage model. In the first stage, they estimate the year effects by, in a sense, "holding off" on modeling the improvement of each individual teacher over time (i.e., in technical terms, omitting teacher fixed effects), and instead modeling the relationship between year and productivity, controlling for experience. In the second stage, they incorporate the "year effects" estimated in stage one as a control, which enables isolation of the productivity gains accruing to experience per se. This also, however, requires assuming that each cohort of new teachers has the same initial effectiveness as all previous and subsequent cohorts in the data - i.e., that new teacher productivity does not vary over time.
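
For readers who want to see the mechanics, here is a rough sketch of a two-stage procedure along the lines described above, run on simulated data. It is my own simplified reading, not the paper's specification: the function names, the concave "true" experience profile, the year shocks, and the noise-free outcomes are all assumptions made so the logic is easy to follow. The `cohort_drift` parameter is a hypothetical knob that lets each entering cohort start at a different level, which is exactly the assumption the two-stage model rules out.

```python
import numpy as np

def true_returns(exp):
    """Hypothetical concave experience profile (an assumption of this sketch)."""
    return 0.1 * np.log1p(exp)

def simulate_panel(n_cohorts=4, n_years=6, cohort_drift=0.0):
    """Noise-free toy panel: outcome = teacher effect + experience + year effect."""
    year_effect = 0.05 * np.sin(np.arange(n_years))  # arbitrary system-wide shocks
    teacher, year, exp, y = [], [], [], []
    tid = 0
    for cohort in range(n_cohorts):
        for offset in (0.1, -0.1):  # two teachers per cohort, averaging to the cohort mean
            for t in range(cohort, n_years):
                e = t - cohort
                teacher.append(tid); year.append(t); exp.append(e)
                y.append(offset + cohort_drift * cohort + true_returns(e) + year_effect[t])
            tid += 1
    return np.array(teacher), np.array(year), np.array(exp), np.array(y)

def dummies(values, drop_first=True):
    levels = np.unique(values)
    if drop_first:
        levels = levels[1:]
    return (values[:, None] == levels[None, :]).astype(float)

def two_stage_returns(teacher, year, exp, y):
    # Stage 1: year and experience indicators, no teacher fixed effects.
    X1 = np.column_stack([np.ones(len(y)), dummies(year), dummies(exp)])
    b1 = np.linalg.lstsq(X1, y, rcond=None)[0]
    n_year_coefs = len(np.unique(year)) - 1
    year_hat = np.concatenate([[0.0], b1[1:1 + n_year_coefs]])
    year_index = np.unique(year, return_inverse=True)[1]
    # Stage 2: remove the estimated year effects, then add teacher fixed effects.
    y_adj = y - year_hat[year_index]
    X2 = np.column_stack([dummies(teacher, drop_first=False), dummies(exp)])
    b2 = np.linalg.lstsq(X2, y_adj, rcond=None)[0]
    return b2[len(np.unique(teacher)):]  # gains relative to a first-year teacher

truth = true_returns(np.arange(1, 6)) - true_returns(0)
print("true gains:        ", np.round(truth, 3))
print("equal cohorts:     ", np.round(two_stage_returns(*simulate_panel()), 3))
print("cohorts improving: ", np.round(two_stage_returns(*simulate_panel(cohort_drift=0.02)), 3))
```

When every cohort starts at the same level, the sketch recovers the simulated experience profile. When later cohorts start out stronger, stage one folds that improvement into the year effects, and the estimated returns to experience come out too low. Real data are noisy and messier than this, so treat it as an illustration of the logic rather than a replication.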

In order to compare these four approaches, specifically the degree to which they violate the assumptions upon which they are based and whether or not such violations generate bias in the estimation of the experience/productivity relationship, Papay and Kraft use both simulations and “real” data. Let’s summarize their results in broad strokes:

  • There is evidence that teachers’ productivity, particularly in mathematics, improves most rapidly during their first few years, but that improvement continues in the later years. This violates the big assumption of the censored growth model and biases productivity returns estimates downward (#1 above);
  • Similarly, the results of the indicator variable model (#2) are quite sensitive to the specification of the experience “bands.” As the “bands” get more narrow (e.g., instead of 5-10 years, one uses 5-7 and 8-10 years), the returns to experience become larger, suggesting that there may be bias from the assumption of no within-band productivity gains;
  • It seems that the experience/productivity relationship varies meaningfully between teachers with standard and nonstandard career paths, which calls into question interpreting the results of the discontinuous career model as valid for all teachers (#3);
  • There is some evidence that the main assumption of the two-stage model proposed by Papay and Kraft -- that new teachers’ initial productivity is not changing over time -- is violated, particularly in reading, but the magnitude of the resulting bias appears to be moderate.
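
The downward bias in the first bullet can be reproduced in a small simulation. Again, this is a sketch under my own assumptions rather than anything from the paper: the function name, the piecewise-linear "true" profile (a steeper slope through year 10 plus a smaller slope that never stops), the 16 hypothetical teachers, and the noise-free outcomes are all made up for illustration. The model fitted is a censored growth specification with teacher fixed effects, year indicators, and experience indicators for years 0 through 9, with everyone at 10 or more years as the reference group.

```python
import numpy as np

def one_hot(values):
    levels = np.unique(values)
    return (values[:, None] == levels[None, :]).astype(float)

def censored_growth_gain(early_slope=0.03, late_slope=0.01, cap=10, n_years=8):
    """Return (estimated, true) productivity gain over the first `cap` years."""
    entries = np.arange(-15, 1)  # hypothetical entry years; negative = hired before the panel
    year_effect = 0.05 * np.sin(np.arange(n_years))  # arbitrary system-wide shocks
    teacher, year, exp, y = [], [], [], []
    for j, entry in enumerate(entries):
        for t in range(n_years):
            e = t - entry
            teacher.append(j); year.append(t); exp.append(e)
            # Gains continue at late_slope forever; early_slope stops at the cap.
            y.append(0.01 * j + early_slope * min(e, cap) + late_slope * e + year_effect[t])
    teacher, year, exp, y = map(np.array, (teacher, year, exp, y))
    X = np.column_stack([
        one_hot(teacher),                        # teacher fixed effects
        one_hot(year)[:, 1:],                    # year indicators (first year omitted)
        one_hot(np.minimum(exp, cap))[:, :-1],   # experience 0..cap-1; cap+ is the reference
    ])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    estimated_gain = -beta[-cap]                 # first-year teachers vs. the cap+ group
    true_gain = (early_slope + late_slope) * cap
    return estimated_gain, true_gain

est, truth = censored_growth_gain()
print(f"estimated gain over the first 10 years: {est:.3f}, simulated truth: {truth:.3f}")
```

In this setup the model credits the veterans' continued improvement to the year effects, and then subtracts it from everyone else, so the early-career gain is understated by the amount of post-cap growth. Setting `late_slope=0.0` makes the assumption true and the bias disappears.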

These results as a whole indicate that teacher productivity improves most rapidly during teachers' first years, but they also suggest that improvement continues beyond five years, and perhaps even throughout the late career years, especially in math.

This obviously is not the final word on this topic. For example, the samples of teachers with more than 20-25 years of experience in this analysis are small, and estimates are therefore imprecise. It is possible that things look different in other locations, and these results do not speak to the likelihood that organizational context matters for teacher improvement. And it bears reiterating that the results are consistent with previous research in finding that the productivity returns to experience are concentrated in the first few years in the classroom.

Others are better qualified than I to adjudicate between the different approaches evaluated in this paper. But there may be an additional lesson here, which is about the relationship between research and policy more generally. 

That is, a decent amount of what we think we know about education, particularly over the past 15-20 years, comes from research employing highly sophisticated methods that require a lot of training. Now, I would argue -- and I know many would disagree -- that this generally is a good thing. Education is complicated, and benefits greatly from complex analysis, particularly when one wishes to draw causal conclusions in non-experimental contexts. But these methods, for all their advantages, almost invariably rely on assumptions and decisions that, while an essential part of the research endeavor, are opaque to most policymakers, journalists, advocates, or other interested parties.

All of us, myself included, would do well to keep this in mind when making generalizations about "what the research shows," even when (perhaps especially when) those generalizations have been repeated so often that they seem beyond reproach.
