New Teacher Evaluations And Teacher Job Satisfaction

Job satisfaction among teachers is a perennially popular topic of conversation in education policy circles. There is good reason for this. For example, whether or not teachers are satisfied with their work has been linked to their likelihood of changing schools or professions (e.g., Ingersoll 2001).

Yet much of the discussion of teacher satisfaction consists of advocates’ speculation that their policy preferences will make for a more rewarding profession, whereas opponents’ policies are sure to disillusion masses of educators. This was certainly true of the debate surrounding the rapid wave of teacher evaluation reform over the past ten or so years.

A paper just published in the American Educational Research Journal addresses directly the impact of new evaluation systems on teacher job satisfaction. It is, therefore, not only among the first analyses to examine the impact of these systems, but also the first to look at their effect on teachers’ attitudes.

The paper’s authors, Cory Koedel, Jiaxi Li, Matt Springer, and Li Tan, use data from Tennessee, a first-round Race to the Top winner. Teachers’ evaluation ratings are linked to results from a teacher survey, which was conducted by the state and included a battery of questions about job satisfaction.

Koedel et al.’s approach, put simply, is to compare teachers with evaluation scores right below each rating threshold to their colleagues right above the line. For example, they compare teachers whose final scores were close to but did not meet the highest rating threshold with teachers who just made the cut. The logic behind this approach, called regression discontinuity (RD), is that these two groups of teachers should be relatively similar except for one thing: one group got the higher rating and the other did not. Any differences between them might therefore be attributed to a causal effect of the rating itself.
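For readers who want to see the mechanics, here is a minimal sketch of the regression discontinuity comparison described above, written in Python. It is not the authors' code or their specification. The data are simulated, and the cutoff, bandwidth, slope, noise level, and built-in jump of 0.10 are all assumptions chosen purely for illustration. The sketch fits a separate straight line to the teachers just below and just above a threshold and reports the gap between the two lines at the threshold itself.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rd_estimate(scores, outcomes, cutoff, bandwidth):
    """Local linear RD estimate: the jump in the fitted outcome at the cutoff.

    Only observations within `bandwidth` of the cutoff are used. Scores are
    centered on the cutoff so each fitted intercept is the predicted outcome
    exactly at the threshold.
    """
    below = [(s - cutoff, y) for s, y in zip(scores, outcomes)
             if cutoff - bandwidth <= s < cutoff]
    above = [(s - cutoff, y) for s, y in zip(scores, outcomes)
             if cutoff <= s <= cutoff + bandwidth]
    intercept_below, _ = fit_line(*zip(*below))
    intercept_above, _ = fit_line(*zip(*above))
    return intercept_above - intercept_below


def simulate(n, cutoff, true_jump, noise_sd=1.0, seed=0):
    """Hypothetical data: satisfaction rises smoothly with the evaluation
    score, plus an assumed extra `true_jump` for scores at or above the cutoff."""
    rng = random.Random(seed)
    scores, outcomes = [], []
    for _ in range(n):
        s = rng.uniform(cutoff - 1.0, cutoff + 1.0)
        y = 0.5 * (s - cutoff) + rng.gauss(0.0, noise_sd)
        if s >= cutoff:
            y += true_jump
        scores.append(s)
        outcomes.append(y)
    return scores, outcomes


# Illustrative run on simulated data (all values are assumptions).
scores, outcomes = simulate(n=20000, cutoff=0.0, true_jump=0.10)
estimate = rd_estimate(scores, outcomes, cutoff=0.0, bandwidth=0.5)
print(f"Estimated jump at the threshold: {estimate:.3f}")
```

The narrower the bandwidth, the more similar the two groups are, but the fewer teachers there are to compare, so the estimate gets noisier. That tradeoff is inherent to the method and is one reason RD findings apply only to teachers near the thresholds.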

They find that higher evaluation ratings indeed do seem to boost teacher satisfaction, and the magnitude of the estimated effect is not huge but certainly meaningful – almost one-tenth of a standard deviation. In addition, and importantly, the effects are concentrated at the tails – i.e., the highest and lowest ratings. For example, Tennessee’s evaluation system has five categories, from “significantly above expectation” (rating of 5) to “significantly below expectation” (rating of 1), though too few teachers received the lowest rating to include it in the analysis. Koedel et al. find a positive impact of receiving a 5 (versus a 4) and a negative impact of receiving a 2 (versus a 3), but the impact is muted at the 3/4 threshold.

There are a few big picture implications of these important results that I’d like to discuss briefly.

First, just as a quick framing caveat, these findings do not imply that teachers are more satisfied with Tennessee’s new evaluation system than they were with the old system. The study uses data collected in the first year of the state’s new evaluation system. Without a comparison pre- and post-implementation, it is not possible to draw any conclusions about the overall impact of the new system on the satisfaction of teachers. Moreover, the findings only apply to teachers who were close to the ratings thresholds, not to all teachers. It is possible, but cannot be assumed, that teachers toward the midpoints of the categories responded similarly (note that Tennessee teachers didn’t know how close they were, because they do not receive their actual evaluation scores, only their ratings, which is a very curious policy). This study, rather, focuses on the “satisfaction effect” of the ratings.

That said, this is the first evidence that differentiated teacher ratings have a differentiating effect on teacher job satisfaction. This means, put crudely, that teachers are not just ignoring the results. Teachers who receive higher ratings report greater job satisfaction, and those who receive lower ratings report less, relative to colleagues whose scores fell just on the other side of the threshold.

On the one hand, it’s not surprising that teachers feel better about their jobs when given positive feedback. On the other hand, so much of the debate about the potential of new evaluations has focused on sorting and dismissing teachers, with insufficient attention to the crucial question of how teachers might react to the new systems. The findings reported by Koedel et al. bear directly on the latter issue, and they have implications for a range of outcomes, including, most notably, teacher attrition and retention.

There is also some indication that the reasons for the satisfaction effect are as much about informal and psychological factors as they are about “concrete” rewards and consequences. For instance, the authors of this study find no “satisfaction effect” among teachers who didn’t actually see their ratings. If “concrete” rewards and consequences, such as tenure or transfer preferences, were driving satisfaction, the effect presumably would be found regardless of whether or not teachers actually saw their ratings. It was not. In addition, the primary statewide stakes attached to the ratings were about granting tenure. If the stakes were driving the impact, one might expect less impact among teachers not subject to those stakes (i.e., tenured teachers). Yet the results held for both tenured and non-tenured teachers.

Clearly, this is far from conclusive evidence, but it does provide some insight into the important question of why ratings might influence satisfaction. If, for example, teachers who receive higher ratings are more satisfied because of the concrete benefits (or avoiding the concrete costs), this carries different implications from a situation in which the satisfaction is more intrinsic. The latter case suggests that teachers attach some degree of credibility to the ratings.

Finally, this study is only a starting point, but it’s an important one, and progress will depend on collecting these data. The results reported by Koedel et al. apply to teachers in one state and only in the first year of a brand new evaluation system. It is possible, for example, that the satisfaction effects are driven in part by the fact that this was the first time many teachers received feedback on their performance that was more than just a formality. It will be interesting to see how things look in subsequent years.

In addition, one cannot help but wonder whether these effects would hold if ratings were less differentiated – e.g., if virtually all teachers received one of the top two ratings, as has been the case in other states. For instance, is the satisfaction of receiving a 5, the highest rating, attenuated in a system where virtually no teacher receives a 4 or lower?

In any case, answering these and other important questions won’t happen unless states collect the data. Teacher surveys are neither free nor easy to administer. However, given the crucial importance of teachers’ opinions of and reactions to these new evaluation systems, such efforts are essential. It would be a shame to invest all this time and money in evaluation reform but skip the step of seeing how and why it works.
