Value-Added, For The Record

People often ask me for my “bottom line” on using value-added (or other growth model) estimates in teacher evaluations. I’ve written on this topic many times, and while I have in fact given my overall opinion a couple of times, I have avoided expressing it in a strong “yes or no” format. There's a reason for this, and I thought maybe I would write a short piece and explain myself.

My first reaction to the queries about where I stand on value-added is a shot of appreciation that people are interested in my views, followed quickly by an acute rush of humility and reticence. I know think tank people aren’t supposed to say things like this, but when it comes to sweeping, big picture conclusions about the design of new evaluations, I’m not sure my personal opinion is particularly important.

Frankly, given the importance of how people on the ground respond to these types of policies, as well as, of course, their knowledge of how schools operate, I would be more interested in the views of experienced, well-informed teachers and administrators than my own. And I am frequently taken aback by the unadulterated certainty I hear coming from advocates and others about this completely untested policy. That’s why I tend to focus on aspects such as design details and explaining the research – these are things I feel qualified to discuss.  (I also, by the way, acknowledge that it’s very easy for me to play armchair policy general when it's not my job or working conditions that might be on the line.)

That said, here’s my general viewpoint, in two parts. First, my sense, based on the available evidence, is that value-added should be given a try in new teacher evaluations.

Describing, Explaining And Affecting Teacher Retention In D.C.

The New Teacher Project (TNTP) has released a new report on teacher retention in D.C. Public Schools (DCPS). It is a spinoff of their “The Irreplaceables” report, which was released a few months ago, and which is discussed in this post. The four (unnamed) districts from that report are also used in this one, and their results are compared with those from DCPS.

I want to look quickly at this new supplemental analysis, not to rehash the issues I raised about “The Irreplaceables,” but rather because of DCPS’s potential importance as a field test site for a host of policy reform ideas – indeed, the majority of core market-based reform policies have been in place in D.C. for several years, including teacher evaluations in which test-based measures are the dominant component, automatic dismissals based on those ratings, large performance bonuses, mutual consent for excessed teachers and a huge charter sector. There are many people itching to render a sweeping verdict, positive or negative, on these reforms, most often based on pre-existing beliefs rather than solid evidence.

Although I will take issue with a couple of the conclusions offered in this report, I'm not going to review it systematically. I think research on retention is important, and it’s difficult to produce reports with original analysis, while very easy to pick them apart. Instead, I’m going to list a couple of findings in the report that I think are worth examining, mostly because they speak to larger issues.

Surveying The Teacher Opinion Landscape

I’m a big fan of surveys of teachers’ opinions of education policy, not only because of educators' valuable policy-relevant knowledge, but also because their views are sometimes misrepresented or disregarded in our public discourse.

For instance, the diverse set of ideas that might be loosely characterized as “market-based reform” faces a bit of tension when it comes to teacher support. Without question, some teachers support the more controversial market-based policy ideas, such as pay and evaluations based substantially on test scores, but most do not. The relatively low levels of teacher endorsement don’t necessarily mean these ideas are “bad," and much of the disagreement is less about the desirability of general policies (e.g., new teacher evaluations) than the specifics (e.g., the measures that comprise those evaluations). In any case, it's a somewhat awkward juxtaposition: A focus on “respecting and elevating the teaching profession” by means of policies that most teachers do not like.

Sometimes (albeit too infrequently) this tension is discussed meaningfully; other times it is obscured – e.g., by attempts to portray teachers' disagreement as "union opposition." But, as mentioned above, teachers are not a monolith and their opinions can and do change (see here). This is, in my view, a situation always worth monitoring, so I thought I’d take a look at a recent report from the organization Teach Plus, which presents data from a survey that they collected themselves.

The Data-Driven Education Movement

** Also reprinted here in the Washington Post

In the education community, many proclaim themselves to be "completely data-driven." Data Driven Decision Making (DDDM) has been a buzz phrase for a while now, and continues to be a badge many wear with pride. And yet, every time I hear it, I cringe.

Let me explain. During my first year in graduate school, I was taught that excessive attention to quantitative data impedes – rather than aids – in-depth understanding of social phenomena. In other words, explanations cannot simply be cranked out of statistical analyses without a prior theory of some kind; the attempt to do so – a.k.a. “variable sociology” – is a major obstacle to the advancement of knowledge.

I am no longer in graduate school, so part of me says: Okay, I know what data-driven means in education. But then, at times, I still think: No, really, what does “data-driven” mean even in this context?

NCLB And The Institutionalization Of Data Interpretation

It is a gross understatement to say that the No Child Left Behind (NCLB) law is, was – and will continue to be – a controversial piece of legislation. Although opinion tends toward the negative, there are certain features, such as a focus on student subgroup data, that many people support. And it’s difficult to make generalizations about whether the law’s impact on U.S. public education was “good” or “bad” by some absolute standard.

The one thing I would say about NCLB is that it has helped to institutionalize the improper interpretation of testing data.

Most of the attention to the methodological shortcomings of the law focuses on “adequate yearly progress” (AYP) – the crude requirement that all schools must make “adequate progress” toward the goal of 100 percent proficiency by 2014. And AYP is indeed an inept measure. But the problems are actually much deeper than AYP.

It is the underlying methods and assumptions of NCLB (including, but not limited to, AYP) that have had a persistent, negative impact on the way we interpret testing data.
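To give a concrete sense of the mechanism being criticized here, below is a purely hypothetical sketch (in Python) of the kind of proficiency-rate schedule AYP imposed: a straight-line trajectory from an invented baseline to 100 percent by 2014. Actual state targets, subgroup rules and "safe harbor" provisions were more involved; every number below is made up for illustration.

```python
# Hypothetical AYP-style proficiency schedule: a school starting at 40 percent
# proficient, with annual targets rising on a straight line to 100 percent by
# 2014. The baseline year and percentage are invented; real state trajectories
# (and the full AYP rules) were more complicated.

start_year, end_year = 2002, 2014
baseline = 40.0  # hypothetical percent proficient in the baseline year

step = (100.0 - baseline) / (end_year - start_year)
for i, year in enumerate(range(start_year, end_year + 1)):
    target = baseline + step * i
    print(f"{year}: target = {target:.1f}% proficient")
```

The point of the toy schedule is only that the core AYP calculation was a fixed schedule of proficiency-rate targets, not a measure of student growth.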

Assessing Ourselves To Death

** Reprinted here in the Washington Post

I have two points to make. The first is something that I think everyone knows: Educational outcomes, such as graduation and test scores, are signals of or proxies for the traits that lead to success in life, not the cause of that success.

For example, it is well-documented that high school graduates earn more, on average, than non-graduates. Thus, one often hears arguments that increasing graduation rates will drastically improve students’ future prospects, and the performance of the economy overall. Well, not exactly.

The piece of paper, of course, only goes so far. Rather, the benefits of graduation arise because graduates are more likely to possess the skills – including the critical non-cognitive sort – that make people good employees (and, on a highly related note, because employers know that, and use credentials to screen applicants).

We could very easily increase the graduation rate by easing requirements, but this wouldn’t do much to help kids advance in the labor market. They might get a few more calls for interviews, but over the long haul, they’d still be at a tremendous disadvantage if they lacked the required skills and work habits.

Are Charter Caps Keeping Great Schools From Opening?

** Reprinted here in the Washington Post

Charter school “caps” are state-imposed limits on the size or growth of charter sectors. Currently, around 25 states set caps on schools or enrollment, with wide variation in terms of specifics: Some states simply set a cap on the number of schools (or charters in force); others limit annual growth; and still others specify caps on both growth and size (there are also a few places that cap proportional spending, coverage by individual operators and other dimensions).

Many charter school supporters argue strongly for lifting these restrictions, on the grounds that they prevent the opening of high-quality schools. This is, of course, an oversimplification at best, as lifting caps could just as easily lead to the proliferation of unsuccessful charters. If the charter school experiment has taught us anything, it’s that these schools are anything but sure bets – a caveat that applies even to the tiny handful of highly successful models, such as KIPP.*

Overall, the only direct impact of charter caps is to limit the potential size or growth of a state’s charter school sector. Assessing their implications for quality, on the other hand, is complicated, and there is every reason to believe that the impact of caps, and thus the basis of arguments for lifting them, varies by context – including the size and quality of states’ current sectors, as well as the criteria by which low-performing charters are closed and new ones are authorized. 

New Teacher Evaluations Are A Long-Term Investment, Not Test Score Arbitrage

One of the most important things to keep an eye on in education policy is the first round of changes to new teacher evaluation systems. Given all the moving parts and the lack of evidence about how these systems should be designed and what their impact will be, course adjustments along the way are not just inevitable, but absolutely essential.

Changes might be guided by different types of evidence, such as feedback from teachers and administrators or analysis of ratings data. And, of course, human judgment will play a big role. One thing that states and districts should not be doing, however, is assessing their new systems – or making changes to them – based on whether raw overall test scores go up or down within the first few years.

Here’s a little reality check: Even the best-designed, best-implemented new evaluations are unlikely to have an immediate, measurable impact on aggregate student performance. Evaluations are an investment, not a quick fix. And they are not risk-free. Their effects will depend on the quality of the systems, on how current teachers and administrators react to them, and on how all of this shapes, and plays out in, the teacher labor market. As I’ve said before, the realistic expectation for overall performance – and this is no guarantee – is that there will be some very small, gradual improvements, unfolding over a period of years, even decades.

States and districts that expect anything more risk making poor decisions during these crucial, early phases.

Does It Matter How We Measure Schools' Test-Based Performance?

In education policy debates, we like the "big picture." We love to say things like “hold schools accountable” and “set high expectations." Much less frequent are substantive discussions about the details of accountability systems, but it’s these details that make or break policy. The technical specs just aren’t that sexy. But even the best ideas with the sexiest catchphrases won’t improve things a bit unless they’re designed and executed well.

In this vein, I want to recommend a very interesting CALDER working paper by Mark Ehlert, Cory Koedel, Eric Parsons and Michael Podgursky. The paper takes a quick look at one of these extremely important, yet frequently under-discussed details in school (and teacher) accountability systems: The choice of growth model.

When value-added or other growth models come up in our debates, they’re usually discussed en masse, as if they’re all the same. They’re not. It's well-known (though perhaps overstated) that different models can, in many cases, lead to different conclusions for the same school or teacher. This paper, which focuses on school-level models but might easily be extended to teacher evaluations as well, helps illustrate this point in a policy-relevant manner.
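To make the "different models, different conclusions" point concrete, here is a minimal, purely illustrative sketch – synthetic data in Python, not the models or data from the CALDER paper – comparing two simple school-level specifications: average score gains versus a regression that adjusts for prior scores and a school poverty measure. All variable names and numbers are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_schools, n_students = 20, 100

poverty = rng.uniform(0, 1, n_schools)      # hypothetical school poverty rate
true_effect = rng.normal(0, 3, n_schools)   # each school's "true" contribution

# Simulate students: prior and current scores, both related to school poverty.
school = np.repeat(np.arange(n_schools), n_students)
prior = rng.normal(50 - 10 * poverty[school], 10, school.size)
post = (0.8 * prior + true_effect[school] - 5 * poverty[school]
        + rng.normal(0, 8, school.size))

# Model 1: simple average gain (current minus prior score) for each school.
gains = np.array([(post - prior)[school == s].mean() for s in range(n_schools)])

# Model 2: regression-adjusted growth -- regress current scores on prior scores
# and school poverty, then average each school's residuals.
X = np.column_stack([np.ones_like(prior), prior, poverty[school]])
beta, *_ = np.linalg.lstsq(X, post, rcond=None)
adjusted = np.array([(post - X @ beta)[school == s].mean()
                     for s in range(n_schools)])

# Rank schools under each model (0 = highest-rated) and compare.
rank_gain = np.argsort(np.argsort(-gains))
rank_adj = np.argsort(np.argsort(-adjusted))
for s in range(n_schools):
    print(f"school {s:2d}  poverty={poverty[s]:.2f}  "
          f"gain rank={rank_gain[s]:2d}  adjusted rank={rank_adj[s]:2d}")
```

Because prior achievement and poverty are correlated with raw gains in this toy setup, the two specifications rank some of the same schools quite differently – the general kind of policy-relevant discrepancy at issue when choosing among growth models.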

The Impact Of Race To The Top Is An Open Question (But At Least It's Being Asked)

You don’t have to look very far to find very strong opinions about Race to the Top (RTTT), the U.S. Department of Education’s (USED) stimulus-funded state-level grant program (which has recently been joined by a district-level spinoff). There are those who think it is a smashing success, while others assert that it is a dismal failure. The truth, of course, is that these claims, particularly the extreme views on either side, are little more than speculation.*

To win the grants, states were strongly encouraged to make several different types of changes, such as the adoption of new standards, the lifting/raising of charter school caps, the installation of new data systems and the implementation of brand new teacher evaluations. This means that any real evaluation of the program’s impact will take some years and will have to be multifaceted – that is, implementation and effects are certain to vary not only across these components, but also between states.

In other words, the success or failure of RTTT is an empirical question, one that is still almost entirely open. But there is a silver lining here: USED is at least asking that question, in the form of a five-year, $19 million evaluation program, administered through the National Center for Education Evaluation and Regional Assistance, designed to assess the impact and implementation of various RTTT-fueled policy changes, as well as those of the controversial School Improvement Grants (SIGs).