Do Top Teachers Produce "A Year And A Half Of Learning?"

One claim that gets tossed around a lot in education circles is that “the most effective teachers produce a year and a half of learning per year, while the least effective produce a half of a year of learning.”

This talking point is used all the time in advocacy materials and news articles. Its implications are pretty clear: Effective teachers can make all the difference, while ineffective teachers can do permanent damage.

As with most prepackaged talking points circulated in education debates, the “year and a half of learning” argument, when used without qualification, is both somewhat valid and somewhat misleading. So, seeing as it comes up so often, let’s very quickly identify its origins and what it means.

This particular finding is traceable to a 1992 paper by economist Eric Hanushek, one which focused primarily on the relationship between achievement and family composition in Gary, Indiana (the data are from the early- to mid-1970s, and include only low-income students). After reviewing his (very interesting) main results on the relationship between student achievement and family size, birth order and the interval between births, Hanushek presents an analysis of test-based teacher effects.

Put simply, his results show that adding a teacher effect variable to his models increases their explanatory power substantially, and that the difference between the gains of students with a highly-scoring teacher (in this case, at the 84th percentile) and a lower-scoring teacher (16th percentile) is roughly equivalent to one full grade level (i.e., the difference between half a year and a year and a half). Consistent with the prior and subsequent literature, he also finds that teacher effectiveness is not well explained by “traditional” measures such as teacher education (Hanushek also quickly explores the relationship between family characteristics and teacher effectiveness, which would presumably arise due to search behavior [as it turns out, there is little evidence for this, at least in these data]).

The reflexive reaction to this sourcing might be to dismiss the claim entirely, as it is based on a single analysis of a particular set of students in one place 40 years ago. In a limited sense, that's fair enough - it's obviously true that findings for low-income students and their teachers in Gary, Indiana during the 1970s cannot be generalized to the rest of the nation in 2012.

But it is, as always, much more complicated than that, as there are dozens of studies finding wide variation between the “top” and “bottom” teachers in any given year, at least in terms of test-based productivity (see here).

The size of these estimated discrepancies depends on so many factors, including (but not limited to): the subject (in general, there is larger variation in math than reading); the models and tests used; the years of data available; and the choice of comparison groups (definition of “top” and “bottom”). Sometimes the differences are equivalent to 4-5 "months of learning," sometimes they are not (especially in reading).*

So, here’s the deal (and this is strictly my opinion): There is a research consensus that estimated test-based teacher effects vary widely between the top and bottom of the distribution, but the “year and a half” assertion should probably be put out to pasture, at least when it's used without elaboration or qualification.

It implies a precision that belies the diversity of findings within the research literature, and it ignores the importance of context, data availability, variation between test subjects, etc. There are plenty of ways to express the fact that teachers matter without boiling a large, nuanced body of evidence down to a single effect estimate.

Accessible generalizations certainly have their role in policy discussions, but oversimplification has really crippled the debate about value-added and other growth models, on both "sides" of the issue.

One final note: The truly important point about the “year and a half of learning” argument, just like the ubiquitous “three great teachers in a row can close the achievement gap” talking point, is what they mean for policy (especially since they're usually used to make policy arguments). They are both stylized ways of saying the same thing, about which there is really very little disagreement – teachers are important. But, even if you take these points at face value, they do little to help answer the critical question that comes next, which is how the distribution of teacher quality can be improved. This is inarguably one of the most urgent issues facing education policy today, from which arguments about these talking points may just serve as a distraction.

- Matt Di Carlo

*****

* It bears noting that Hanushek's Gary, Indiana analysis used reading and vocabulary tests.

My school of 1900 7-12th graders averaged double the yearly predicted growth across all tested grade levels and all tested teachers. Our students, some straight out of refugee camps, others from generational poverty, are still years behind. We'd be called failures, but charter schools would get labeled "beating the odds".

"They are both stylized ways of saying the same thing, about which there is really very little disagreement – teachers are important."

Hmm. Feels like you're side-stepping a question. Or perhaps I'm just not understanding it.

The Hanushek side explicitly argues that teachers aren't just important, but that it's plausible to make a large dent in the Achievement Gap with certain teacher policies.

The other side explicitly argues that teachers are perhaps "nominally important" -- nominal isn't the right word, but you know what I mean -- that even several unusually effective teachers will NOT make a large dent in the Achievement Gap.

I'm a fan of your blogging precisely because of your propensity to "bring people together," but there really is an empirical question here, that people seem to SHARPLY disagree about.

MG,

Such a good comment. As a reward, I offer this annoyingly long and complicated response.

You’ve raised what I think is an extremely important point (which I may have failed to delineate in this particular post).

Let’s start by clarifying the paragraph in question, in which I discuss two separate questions. The first is whether/how much teachers matter (more specifically, how much does “true teacher performance,” which cannot be observed, vary?). No doubt there are differences in opinion as to the extent of the “true” variation, and there’s definitely disagreement about using growth models to gauge performance at the individual level, but I don’t think there are too many people who would disagree that there are more and less effective teachers, and that there’s at least a substantial difference between them.

The second issue I discuss is which policies will shift (improve) the distribution of teacher quality (“the critical question”). As we both know, there is plenty of disagreement here. However, again, I don’t think there are many people who think that absolutely nothing can be done to improve teacher quality, even if they disagree about which policies will accomplish this goal.

So, I discussed the questions of whether “true” teacher quality varies and which policies will help. If I understand you correctly, you’re raising a related, critical question - the degree to which *any* teacher-related policy or policies can effect a shift in the distribution. In other words, if we chose the “correct” policies, no matter what they might be, how much difference would they actually make?

Although I wouldn’t necessarily frame it in terms of the achievement gap, I completely agree with you - I think that this is a *major* sticking point in not only the teacher quality debate, but also the education policy debate in general (please see these posts, for example, a couple of which are linked in the paragraph in question: http://shankerblog.org/?p=4496; http://shankerblog.org/?p=6363; http://shankerblog.org/?p=2319; and http://shankerblog.org/?p=5681; there are several others, but I’ll spare you the self-citations).

I guess I should have more clearly distinguished it in this instance, so that’s my bad.

However, you confuse me a bit by subsequently raising what I consider to be a different issue/question, which is whether “several unusually effective teachers will [or will] NOT make a large dent in the achievement gap.” I’m not sure how to address this issue, as it is tautological – if you define “effective” in terms of test-based productivity, then yes, you can divide any given test-based goal (e.g., closing the average achievement gap) by the single-year effect estimate of the “effective” teachers, and come up with a number of consecutive teachers required to reach that goal. But that’s not particularly helpful as far as policy implications go (see this post: http://shankerblog.org/?p=2156, which is also linked in the final paragraph).

The big questions are which policies will improve the distribution (e.g., using *individual* level growth model estimates in high-stake decisions), and how much they can actually do so. The former gets all the attention, but I actually think the latter is just as divisive. Both are, as you note, empirical questions, and, in the teacher quality context, largely open ones at that.

I hope that makes sense. Thanks, as always, for the comment.

Thanks for the thoughtful response.

Ah, yes, I asked a tautological question. But with the help of you pointing that out, it allows me to pose a better question.

Let's say teachers at the 80th percentile in math value-added raise scores by (I'm just picking a number) 0.15 SDs. (I'm sure you can insert a better number).

1. The Hanushek theory is if you're a kid who has 3 of these folks in a row, you'd get at least 0.45 SDs of gain.

Is that a correct restatement of his thesis?

2. If it is, I wonder -- has anyone actually ever run the numbers with actual kids and teachers?

3. If not, someone should do it!

I.e., in a large school district like NYC, with 80,000 teachers or whatever, that means about 20,000 will have value-added data.

A kid has a .2*.2*.2 chance of getting an 80th percentile teacher (or better) 3 years in a row. That's slightly less than a 1% chance of what we might call "Teacher Jackpot." (Catchy phrases seem to help).

But that's 1% of 1 million kids who have Teacher Jackpot this year. Pull their numbers. See their true gains.

I wonder if it rises in a simple, linear way -- .15 + .15 + .15

Or maybe gains are harder to come by in Year 3. .15 + .10 + .05 (regression to mean)

Or maybe gains are EASIER to come by in Year 3.

.15 + .20 + .25 (momentum)
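The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the back-of-the-envelope numbers in this comment, not an estimate of anything: the 0.2 per-year probability, the round figure of 1 million students, and the three gain patterns are the illustrative values already given, and the calculation assumes (as the comment implicitly does) that teacher assignment is independent from year to year.

```python
# Illustrative figures from the comment; none of these are real estimates.
P_TOP = 0.2            # chance of an 80th-percentile-or-better teacher in one year
YEARS = 3              # consecutive years needed for the "Teacher Jackpot"
STUDENTS = 1_000_000   # round district size used in the comment

# Assumes assignments are independent across years (an assumption, not a finding).
p_jackpot = P_TOP ** YEARS
expected_students = p_jackpot * STUDENTS

# The three hypothetical gain patterns, in standard deviations per year.
scenarios = {
    "linear": [0.15, 0.15, 0.15],
    "diminishing": [0.15, 0.10, 0.05],
    "momentum": [0.15, 0.20, 0.25],
}
totals = {name: round(sum(gains), 2) for name, gains in scenarios.items()}

print(f"P(jackpot) = {p_jackpot:.3f}")
print(f"Expected jackpot students = {expected_students:,.0f}")
for name, total in totals.items():
    print(f"{name}: {total:.2f} SD over {YEARS} years")
```

Under these made-up inputs, the jackpot probability is 0.8 percent, or about 8,000 of 1 million students, and the three patterns sum to 0.45, 0.30 and 0.60 SDs. Which pattern, if any, resembles real data is the empirical question being asked here.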

MG,

See the discussion in the post linked in the third to last paragraph of my reply. It covers most of the issues you raise (the "jackpot," fade-out, etc.). The short answer on fade-out is no – you can’t simply multiply a single-year effect that way, as there is substantial “degradation.” But you can follow a group of students for three years, find those who ended up hitting the “jackpot,” and calculate gains (most analyses don’t do this, but a couple do [see the post]). And yes, the gains are large, but seeing as that's how "top" teachers are identified, that's hardly surprising (and, of course, the estimates are noisy).
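To illustrate why simple multiplication overstates the cumulative effect, here is a minimal sketch. It assumes, purely for illustration, that each year's effect shrinks by a constant factor in every later year; both that geometric form and the 0.5 persistence value are hypothetical choices, not figures from the research, and the 0.15 SD effect is the illustrative number from the comment above.

```python
def cumulative_gain(yearly_effects, persistence):
    """Total gain at the end of the final year.

    Each year's effect is multiplied by `persistence` once for every
    later year that passes (a hypothetical geometric fade-out).
    """
    n = len(yearly_effects)
    return sum(
        effect * persistence ** (n - 1 - i)
        for i, effect in enumerate(yearly_effects)
    )

effects = [0.15, 0.15, 0.15]                # illustrative single-year effects, in SDs
no_fade = cumulative_gain(effects, 1.0)     # simple addition, no fade-out
with_fade = cumulative_gain(effects, 0.5)   # hypothetical 50% persistence per year

print(f"No fade-out:   {no_fade:.4f} SD")
print(f"With fade-out: {with_fade:.4f} SD")
```

With no fade-out the function reproduces the simple sum; with any persistence below 1, the three-year total falls short of three times the single-year effect.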

So, again, the question of how many students receive "top" teachers (say, top quintile) for three years in a row, and what kind of gains they make, carries few policy implications. The important questions are: a) whether you can predict who those “top” teachers will be in any given year (for example, very few teachers are top quintile for three consecutive years); and b) whether you can get more of them (via improvement or selection).

In general, the “N consecutive teachers” thing is best viewed as an illustration of the variation in test-based teacher effects for people who aren’t accustomed to thinking in terms of metrics like standard deviations. It is not a policy argument beyond suggesting that teachers are important, an issue upon which (dare I say?) there isn’t much disagreement.

I can't help but think that sometimes the problem starts with what everyone assumes to be true. You state "...but I don’t think there are too many people who would disagree that there are more and less effective teachers, and that there’s at least a substantial difference between them." Maybe there are not many people who would question that, but the "substantial difference between them" part is a critical assumption. While it's reasonable there must be a difference between the very best and the very worst teachers, it's not at all clear that the possible differences between the vast majority of teachers are significant or are not overwhelmed by other factors.

Given scaled scores and advanced and proficient cut scores, it's easy to rank teachers, but when you look at the raw data, the students of the "better" teachers in a school or district may only average one or two correct answers above the students of the "poorer" teachers (again not assuming extremes). Just how significant is that difference, despite the very wide differences between the teachers themselves in terms of background, style and temperament?

Teacher effectiveness issues become further complicated when we consider the flow of time. It seems as if these types of conversations have a static picture of a teacher in mind and not one in which a teacher gets better as they become more experienced, better as they become more acquainted with teaching a particular subject to a particular age of students, or even that teachers like everyone else can have up and down years. Everyone assumes the differences between more effective and less effective teachers are so substantial that this noise, if you will, doesn't overwhelm the signal.

This, of course, leads to your second assumption which is "... again, I don’t think there are many people who think that absolutely nothing can be done to improve teacher quality, even if they disagree about which policies will accomplish this goal." Or maybe more than one assumption. Leaving aside the everything or nothing dichotomy the statement implies, the conversation around teacher improvement assumes there is a problem of teacher quality that needs or maybe even desperately needs to be solved that is focused on the teacher. It also assumes there is a ready supply of available effective teachers waiting in the wings if we could just get rid of the poor ones.

The hue and cry over poor test scores so often revolves around getting rid of the poorest performing teachers. It almost never focuses on the educational environment the teachers teach in. When a football or baseball team finishes poorly, the coach or manager is often fired. This act is not automatically paired with getting rid of a significant portion of the team (though it is suggested for poorly performing schools). The thinking in sports is the hope the team will respond with someone else in charge. The same thinking is not usually mentioned with regard to teachers.

Maybe a poorly performing teacher won't ever "make it", though one has to wonder just how many of these teachers stay in teaching. On the other hand, maybe some of these less effective teachers would be more effective with a proper mentoring or supporting system around them. Few people outside of teaching realize just how isolated teachers can be on a day to day basis. As long as the assumption about teacher quality is an assumption focused on the teacher and not the system (or team) the teacher works in, it's difficult to see how policy changes on teacher quality will ever make much headway.

As a final thought, there are studies that say class size doesn't really matter; statistically, smaller class sizes have minimal or no improvement in test scores. Anyone who actually tries to teach a class of nearly 40 students versus teaching only 20 to 25 students can tell you it really does matter and yet the studies are constantly mentioned as an argument against smaller class sizes. Just as you mentioned above with the Gary, Indiana study, it is truly questionable whether some studies are reliable or for that matter scalable.

We take snapshots of dynamic systems and assume there must be a real connection between the results and the effectiveness of the teacher, that the results are summative and causative, and that we can change this one thing and ignore all the others because they are truly independent. As you say, there are still many things that could be done, but those things are rarely what the conversation is about.