A Big Open Question: Do Value-Added Estimates Match Up With Teachers' Opinions Of Their Colleagues?

A recent article about the implementation of new teacher evaluations in Tennessee details some of the complicated issues with which state officials, teachers and administrators are dealing in adapting to the new system. One of these issues is somewhat technical – whether the various components of evaluations – most notably principal observations and test-based productivity measures (e.g., value-added) – tend to “match up.” That is, whether teachers who score high on one measure tend to do similarly well on the other (see here for more on this issue).

In discussing this type of validation exercise, the article notes:

"If they don't match up, the system's usefulness and reliability could come into question, and it could lose credibility among educators."

Value-added and other test-based measures of teacher productivity may have a credibility problem among many (but definitely not all) teachers, but I don’t think it’s due to – or can be helped much by – whether or not these estimates match up with observations or other measures being incorporated into states’ new systems. I’m all for this type of research (see here and here), but I’ve never seen what I think would be an extremely useful study for addressing the credibility issue among teachers: One that looked at the relationship between value-added estimates and teachers’ opinions of each other.
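
To illustrate what such a validation exercise looks like in practice, here is a minimal sketch in Python of correlating teachers' value-added estimates with a second measure (observation ratings or, as suggested above, colleagues' opinions of each other). Everything here – the number of teachers, the noise levels, the data itself – is simulated purely for illustration.

```python
import random

random.seed(42)

def spearman_rho(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, idx in enumerate(order):
            r[idx] = float(rank)
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Simulate 200 teachers: one underlying "true effectiveness" signal plus
# independent noise in each measure, so the two agree only partially.
true_quality = [random.gauss(0, 1) for _ in range(200)]
value_added = [q + random.gauss(0, 1) for q in true_quality]
peer_rating = [q + random.gauss(0, 1) for q in true_quality]

print(f"rank correlation: {spearman_rho(value_added, peer_rating):.2f}")
```

A rank correlation is used because ratings of this sort are often ordinal; with the assumed noise levels, the two measures agree only moderately, and quantifying that degree of agreement is precisely the question such a study would put to real data.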

A Look Inside Principals' Decisions To Dismiss Teachers

Despite all the heated talk about how to identify and dismiss low-performing teachers, there’s relatively little research on how administrators choose whom to dismiss, whether various dismissal options might actually serve to improve performance, and other aspects of this process. A paper by economist Brian Jacob, released as a working paper in 2010 and published late last year in the journal Educational Evaluation and Policy Analysis, helps to fill at least one of these gaps by providing one of the few recent glimpses into administrators’ actual dismissal decisions.

Jacob exploits a change in Chicago Public Schools (CPS) personnel policy that took effect for the 2004-05 school year, one which strengthened principals’ ability to dismiss probationary teachers, allowing non-renewal for any reason, with minimal documentation. He was able to link these personnel records to student test scores, teacher and school characteristics and other variables, in order to examine the characteristics that principals might be considering, directly or indirectly, in deciding who would and would not be dismissed.
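
A stylized version of this kind of analysis can be sketched as a logistic regression of dismissal on teacher characteristics. The sketch below simulates its own data – the predictors, sample size and coefficients are invented, not taken from Jacob's paper – and fits the model with plain gradient ascent.

```python
import math
import random

random.seed(0)

# Simulate probationary teachers with two standardized characteristics: a
# value-added estimate and an absence rate (both mean 0, sd 1). The assumed
# relationship -- dismissal more likely for low value-added and high
# absences -- is invented for this sketch.
teachers = []
for _ in range(500):
    va = random.gauss(0, 1)
    absences = random.gauss(0, 1)
    true_logit = -2.0 - 1.0 * va + 0.6 * absences
    p = 1 / (1 + math.exp(-true_logit))
    teachers.append((va, absences, 1 if random.random() < p else 0))

# Fit a logistic regression of dismissal on the two characteristics by
# plain gradient ascent on the (concave) log-likelihood.
b0 = b_va = b_ab = 0.0
learning_rate = 0.5
for _ in range(2000):
    g0 = g_va = g_ab = 0.0
    for va, ab, dismissed in teachers:
        p = 1 / (1 + math.exp(-(b0 + b_va * va + b_ab * ab)))
        g0 += dismissed - p
        g_va += (dismissed - p) * va
        g_ab += (dismissed - p) * ab
    n = len(teachers)
    b0 += learning_rate * g0 / n
    b_va += learning_rate * g_va / n
    b_ab += learning_rate * g_ab / n

print(f"intercept={b0:.2f}  value_added={b_va:.2f}  absences={b_ab:.2f}")
```

The recovered coefficients indicate which characteristics predict dismissal, holding the others constant – the basic logic behind asking what principals might be considering, directly or indirectly.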

Jacob’s findings are intriguing, suggesting a more complicated situation than is sometimes acknowledged in the ongoing debate over teacher dismissal policy.

Fundamental Flaws In The IFF Report On D.C. Schools

A new report, commissioned by District of Columbia Mayor Vincent Gray and conducted by the Chicago-based consulting organization IFF, was supposed to provide guidance on how the District might act and invest strategically in school improvement, including optimizing the distribution of students across schools, many of which are either over- or under-enrolled.

Needless to say, this is a monumental task. Not only does it entail the identification of high- and low-performing schools, but also the development of plans for improving them. Even the most rigorous efforts to achieve these goals, especially in a large city like D.C., would be to some degree speculative and error-prone.

This is not a rigorous effort. IFF’s final report is polished and attractive, with lovely maps and color-coded tables presenting a lot of summary statistics. But there’s no emperor underneath those clothes. The report's data and analysis are so deeply flawed that its (rather non-specific) recommendations should not be taken seriously.

The Perilous Conflation Of Student And School Performance

Unlike many of my colleagues and friends, I personally support the use of standardized testing results in education policy – even, with caution and in a limited role, in high-stakes decisions. That said, I also think that the focus on test scores has gone way too far, and that they are being used unwisely – in many cases to a degree at which I believe the policies will not only fail to generate improvement, but may even do harm.

In addition, of course, tests have a very productive low-stakes role to play on the ground – for example, when teachers and administrators use the results for diagnosis and to inform instruction.

Frankly, I would be a lot more comfortable with the role of testing data – whether in policy, on the ground, or in our public discourse – but for the relentless flow of misinterpretation from both supporters and opponents. In my experience (which I acknowledge may not be representative of reality), by far the most common mistake is the conflation of student and school performance, as measured by testing results.

Consider the following three stylized arguments, which you can hear in some form almost every week:

Schools' Effectiveness Varies By What They Do, Not What They Are

There may be a mini-trend emerging in certain types of charter school analyses, one that seems a bit trivial but has interesting implications that bear on the debate about charter schools in general. It pertains to how charter effects are presented.

Usually, when researchers estimate the effect of some intervention, the main finding is the overall impact, perhaps accompanied by a breakdown by subgroups and supplemental analyses. In the case of charter schools, this would be the estimated overall difference in performance (usually testing gains) between students attending charters versus their counterparts in comparable regular public schools.
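
To make that presentation concrete, here is a minimal sketch of the "usual" approach: an overall charter-versus-comparison difference in test score gains, plus a subgroup breakdown. The data, effect size and subgroup labels are all invented, and real studies construct the comparison group far more carefully (e.g., via matching or admission lotteries).

```python
import random

random.seed(1)

def mean(values):
    return sum(values) / len(values)

# Simulate (sector, subgroup, test score gain) records. The +0.05 SD
# "charter effect" is assumed purely for illustration.
students = []
for _ in range(2000):
    sector = random.choice(["charter", "regular"])
    subgroup = random.choice(["low_income", "not_low_income"])
    effect = 0.05 if sector == "charter" else 0.0
    students.append((sector, subgroup, random.gauss(effect, 1.0)))

def charter_gap(records):
    """Difference in mean gains, charter minus comparison."""
    charter = [gain for sec, _, gain in records if sec == "charter"]
    regular = [gain for sec, _, gain in records if sec == "regular"]
    return mean(charter) - mean(regular)

print(f"overall estimated effect: {charter_gap(students):+.3f} SD")
for sub in ("low_income", "not_low_income"):
    subset = [rec for rec in students if rec[1] == sub]
    print(f"  {sub}: {charter_gap(subset):+.3f} SD")
```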

Two relatively recent charter school reports, however – both generally well-done given their scope and available data – have taken a somewhat different approach, at least in the “public roll-out” of their results.

Income And Educational Outcomes

The role of poverty in shaping educational outcomes is one of the most common debates going on today. It can also be one of the most shallow.

The debate tends to focus on income. For example (and I’m generalizing a bit here), one “side” argues that income and test scores are strongly correlated; the other “side” points to the fact that many low-income students do very well and cautions against making excuses for schools’ failure to help poor kids.

Both arguments have merit, but it bears quickly mentioning that the focus on the relationship between income and achievement is a rather crude conceptualization of the importance of family background (and non-schooling factors in general) for education outcomes. Income is probably among the best widely available proxies for these factors, insofar as it is correlated with many of the conditions that can hinder learning, especially during a child’s earliest years. These conditions include (but are not at all limited to): peer effects; parental education; access to print and background knowledge; parental involvement; family stressors; access to healthcare; and, of course, the quality of neighborhood schools and their teachers.

And that is why, when researchers try to examine school performance – while holding constant the effect of factors outside of schools’ control – income or some kind of income-based proxy (usually eligibility for free/reduced-price lunch) can be a useful variable. It is, however, quite limited.
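
As a rough illustration of what "holding constant" means here, the sketch below regresses simulated schools' average scores on their free/reduced-price lunch share and treats the residual as a crude poverty-adjusted performance measure. The schools and coefficients are invented; real models include many more controls and typically use student-level data.

```python
import random

random.seed(7)

# Simulate schools: a higher free/reduced-price lunch (FRL) share pulls
# the average score down, plus a school-specific component that the
# adjustment is meant to isolate. All numbers are invented.
schools = []
for i in range(100):
    frl_share = random.uniform(0.1, 0.9)
    school_component = random.gauss(0, 3)
    avg_score = 70 - 20 * frl_share + school_component
    schools.append((f"school_{i}", frl_share, avg_score))

# One-predictor OLS: slope = cov(x, y) / var(x).
xs = [frl for _, frl, _ in schools]
ys = [score for _, _, score in schools]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

# Residual = actual score minus the score "predicted" by poverty alone;
# a positive residual means the school beats its poverty-based prediction.
for name, frl, score in schools[:5]:
    residual = score - (intercept + slope * frl)
    print(f"{name}: raw={score:5.1f}  poverty-adjusted={residual:+5.1f}")
```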

Trial And Error Is Fine, So Long As You Know The Difference

It’s fair to say that improved teacher evaluation is the cornerstone of most current education reform efforts. Although very few people have disagreed on the need to design and implement new evaluation systems, there has been a great deal of disagreement over how best to do so – specifically with regard to the incorporation of test-based measures of teacher productivity (i.e., value-added and other growth model estimates).

The use of these measures has become a polarizing issue. Opponents tend to object adamantly to any degree of incorporation, while many proponents do not consider new evaluations meaningful unless they include test-based measures as a major element (say, at least 40-50 percent of a teacher’s total score). Despite the air of certainty on both sides, this debate has proceeded mostly on the basis of speculation. The new evaluations are just getting up and running, and there is virtually no evidence as to their effects under actual high-stakes implementation.
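
For concreteness, here is a minimal sketch of how a composite evaluation score with a fixed test-based weight might be assembled. The 50/40/10 weights and component scores are hypothetical; actual state formulas vary widely.

```python
def composite_score(components, weights):
    """Weighted average of evaluation components, each scaled 0-100."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(components[name] * w for name, w in weights.items())

# Hypothetical 50/40/10 formula: observations, value-added, "other"
# (e.g., student surveys). These names and numbers are illustrative only.
weights = {"observation": 0.50, "value_added": 0.40, "other": 0.10}
teacher = {"observation": 82.0, "value_added": 61.0, "other": 75.0}

print(f"composite score: {composite_score(teacher, weights):.1f}")
```

The substance of the debate is, in effect, over the size of the "value_added" entry in that weights dictionary – which is exactly why its consequences deserve field testing rather than assumption.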

For my part, I’ve said many times that I'm receptive to trying value-added as a component in evaluations (see here and here), though I disagree strongly with the details of how it’s being done in most places. But there’s nothing necessarily wrong with divergent opinions over an untested policy intervention, or with trying one. There is, however, something wrong with fully implementing such a policy without adequate field testing, or at least ensuring that the costs and effects will be carefully evaluated post-implementation. To date, virtually no states/districts of which I'm aware have mandated large-scale, independent evaluations of their new systems.*

If this is indeed the case, the breathless, speculative debate happening now will only continue in perpetuity.

The Persistence Of Both Teacher Effects And Misinterpretations Of Research About Them

In a new National Bureau of Economic Research working paper on teacher value-added, researchers Raj Chetty, John Friedman and Jonah Rockoff present results from their analysis of an incredibly detailed dataset linking teachers and students in one large urban school district. The data include students’ testing results between 1991 and 2009, as well as proxies for future student outcomes, mostly from tax records, including college attendance (whether they were reported to have paid tuition or received scholarships), childbearing (whether they claimed dependents) and eventual earnings (as reported on the returns). Needless to say, the actual analysis includes only those students for whom testing data were available, and who could be successfully linked with teachers (with the latter group of course limited to those teaching math or reading in grades 4-8).

The paper caused a remarkable stir last week, and for good reason: It’s one of the most dense, important and interesting analyses on this topic in a very long time. Much of the reaction, however, was less than cautious, specifically the manner in which the research findings were interpreted to support actual policy implications (also see Bruce Baker’s excellent post).

What this paper shows – using an extremely detailed dataset and sophisticated, thoroughly-documented methods – is that teachers matter, perhaps in ways that some didn’t realize. What it does not show is how to measure and improve teacher quality, which are still open questions. This is a crucial distinction, one which has been discussed on this blog numerous times (also here and here), as it is frequently obscured or outright ignored in discussions of how research findings should inform concrete education policy.

New Report: Does Money Matter?

Over the past few years, due to massive budget deficits, governors, legislators and other elected officials have had to slash education spending. As a result, incredibly, there are at least 30 states in which state funding for 2011 is actually lower than in 2008. In some cases, including California, funding is more than 20 percent lower.

Only the tiniest slice of Americans believes that we should spend less on education, while a large majority actually supports increased funding. At the same time, however, there’s a concerted effort among some advocates, elected officials and others to convince the public that spending more money on education will not improve outcomes, while huge cuts need not do any harm.

Often, their evidence comes down to some form of the following graph:

Do Half Of New Teachers Leave The Profession Within Five Years?

You’ll often hear the argument that half or almost half of all beginning U.S. public school teachers leave the profession within five years.

The implications of this statistic are, of course, that we are losing a huge proportion of our new teachers, creating a “revolving door” of sorts, with teachers constantly leaving the profession and having to be replaced. This is costly, both financially (it is expensive to recruit and train new teachers) and in terms of productivity (we are losing teachers before they reach their peak effectiveness). And this doesn’t even include teachers who stay in the profession but switch schools and/or districts (i.e., teacher mobility).*

Needless to say, some attrition is inevitable, and not all of it is necessarily harmful. Many new teachers, like all workers, leave (or are dismissed) because they just aren’t good at it – and, indeed, there is test-based evidence that novice leavers are, on average, less effective. But there are many other excellent teachers who exit due to working conditions or other negative factors that might be improved (for reviews of the literature on attrition/retention, see here and here).

So, the “almost half of new teachers leave within five years” statistic might serve as a useful diagnosis of the extent of the problem. As is so often the case, however, it's rarely accompanied by a citation. Let’s quickly see where it comes from, how it might be interpreted, and, finally, take a look at some other relevant evidence.
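
As a preview of the arithmetic involved, cumulative five-year figures of this kind are built by compounding annual attrition rates. The sketch below uses invented yearly rates simply to show the calculation; actual estimates come from longitudinal survey data on beginning teachers.

```python
# Invented annual attrition rates for years 1-5 of a new-teacher cohort;
# real figures come from longitudinal surveys, not these numbers.
annual_leave_rates = [0.10, 0.10, 0.09, 0.09, 0.08]

remaining = 1.0
for year, rate in enumerate(annual_leave_rates, start=1):
    remaining *= 1 - rate
    print(f"after year {year}: {1 - remaining:.1%} of the cohort has left")
```

Note that the cumulative figure is quite sensitive to the assumed yearly rates – and, in real surveys, to how "leaving the profession" is defined – which is one reason the headline statistic deserves scrutiny.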