Teacher Evaluation

  • The Ineffective Rating Fetish

    Written on March 21, 2013

    In a story for Education Week, the always reliable Stephen Sawchuk reports on what may be a trend in states’ first results from their new teacher evaluation systems: The ratings are skewed toward the top.

    For example, the article notes that, in Michigan, Florida and Georgia, a high proportion of teachers (more than 90 percent) received one of the two top ratings (out of four or five). This has led to some grumbling among advocates and others, citing similarities between these results and those of the old systems, in which the vast majority of teachers were rated “satisfactory," and very few were found to be “unsatisfactory."

    Differentiation is very important in teacher evaluations – it’s kind of the whole point. Thus, it’s a problem when ratings are too heavily concentrated toward one end of the distribution. However, as Aaron Pallas points out, these important conversations about evaluation results sometimes seem less focused on good measurement or even the spread of teachers across categories than on the narrower question of how many teachers end up with the lowest rating - i.e., how many teachers will be fired.

  • Through The Sunshine State's Accountability Systems, Darkly

    Written on February 20, 2013

    Some Florida officials are still having trouble understanding why they're finding no relationship between the grades schools receive and the evaluation ratings of teachers in those schools. For his part, new Florida education Commissioner Tony Bennett is also concerned. According to the article linked above, he acknowledges (to his credit) that the two measures are different, but is also considering "revis[ing] the models to get some fidelity between the two rankings."

    This may be turning into a potentially risky situation. As discussed in a recent post, it is important to examine the results of the new teacher evaluations, but there is no reason one would expect to find a strong relationship between these ratings and the school grades, as they are in large part measuring different things (and imprecisely at that). The school grades are mostly (but not entirely) driven by how highly students score, whereas teacher evaluations are, to the degree possible, designed to be independent of these absolute performance levels. Florida cannot validate one system using the other.

    However, as also mentioned in that post, this is not to say that there should be no relationship at all. For example, both systems include growth-oriented measures (albeit using very different approaches). In addition, schools with lower average performance levels sometimes have trouble recruiting and retaining good teachers. Due to these and other factors, the reasonable expectation is to find some association overall, just not one that's extremely strong. And that's basically what one finds, even using the same set of results upon which the claims that there is no relationship are based.

  • Teacher Leadership As A School Improvement Strategy

    Written on February 19, 2013

    Our guest author today is David B. Cohen, a National Board Certified high school English teacher in Palo Alto, CA, and the associate director of Accomplished California Teachers (ACT). His blog is at InterACT.

    As we settle into 2013, I find myself increasingly optimistic about the future of the teaching profession. There are battles ahead, debates to be had and elections to be contested, but, as Sam Cooke sang, “A change is gonna come."

    The change that I’m most excited about is the potential for a shift towards teacher leadership in schools and school systems. I’m not naive enough to believe it will be a linear or rapid shift, but I’m confident in the long-term growth of teacher leadership because it provides a common ground for stakeholders to achieve their goals, because it’s replicable and scalable, and because it’s working already.

    Much of my understanding of school improvement comes from my teaching career - now approaching two decades in the classroom, mostly in public high schools. However, until six years ago, I hadn’t seen teachers putting forth a compelling argument about how we might begin to transform our profession. A key transition for me was reading a Teacher Solutions report from the Center for Teaching Quality (CTQ). That 2007 report, Performance-Pay for Teachers: Designing a System that Students Deserve, showed how the concept of performance pay could be modified and improved upon with better definitions of a variety of performance, and differentiated pay based on differentiated professional practice, rather than arbitrary test score targets. I ended up joining the CTQ Teacher Leaders Network the same year, and have had the opportunity ever since to learn from exceptional teachers from around the country.

  • Value-Added As A Screening Device: Part II

    Written on January 29, 2013

    Our guest author today is Douglas N. Harris, associate professor of economics and University Endowed Chair in Public Education at Tulane University in New Orleans. His latest book, Value-Added Measures in Education, provides an accessible review of the technical and practical issues surrounding these models.

    This past November, I wrote a post for this blog about shifting course in the teacher evaluation movement and using value-added as a “screening device.”  This means that the measures would be used: (1) to help identify teachers who might be struggling and for whom additional classroom observations (and perhaps other information) should be gathered; and (2) to identify classroom observers who might not be doing an effective job.

    Screening takes advantage of the low cost of value-added and the fact that the estimates are more accurate in making general assessments of performance patterns across teachers, while avoiding the weaknesses of value-added—especially that the measures are often inaccurate for individual teachers, as well as confusing and not very credible among teachers when used for high-stakes decisions.

    I want to thank the many people who responded to the first post. There were three main camps.

  • Making Sense Of Florida's School And Teacher Performance Ratings

    Written on January 28, 2013

    Last week, Florida State Senate President Don Gaetz (R – Niceville) expressed his skepticism about the recently-released results of the state’s new teacher evaluation system. The senator was particularly concerned about his comparison of the ratings with schools’ “A-F” grades. He noted, “If you have a C school, 90 percent of the teachers in a C school can’t be highly effective. That doesn’t make sense."

    There’s an important discussion to be had about the results of both the school and teacher evaluation systems, and the distributions of the ratings can definitely be part of that discussion (even if this issue is sometimes approached in a superficial manner). However, arguing that we can validate Florida’s teacher evaluations using its school grades, or vice-versa, suggests little understanding of either. Actually, given the design of both systems, finding a modest or even weak association between them would make pretty good sense.

    In order to understand why, there are two facts to consider.

  • A Few Points About The Instability Of Value-Added Estimates

    Written on January 17, 2013

    One of the most frequent criticisms of value-added and other growth models is that they are "unstable" (or, more accurately, modestly stable). For instance, a teacher who is rated highly in one year might very well score toward the middle of the distribution – or even lower – in the next year (see here, here and here, or this accessible review).

    Some of this year-to-year variation is “real.” A teacher might get better over the course of a year, or might have a personal problem that impedes their job performance. In addition, there could be changes in educational circumstances that are not captured by the models – e.g., a change in school leadership, new instructional policies, etc. However, a great deal of the recorded variation is actually due to sampling error, or idiosyncrasies in student testing performance. In other words, there is a lot of “purely statistical” imprecision in any given year, and so the scores don’t always “match up” so well between years. As a result, value-added critics, including many teachers, argue that it’s not only unfair to use such error-prone measures for any decisions, but that it’s also bad policy, since we might reward or punish teachers based on estimates that could be completely different the next year.

    The concerns underlying these arguments are well-founded (and, often, casually dismissed by supporters and policymakers). At the same time, however, there are a few points about the stability of value-added (or lack thereof) that are frequently ignored or downplayed in our public discourse. All of them are pretty basic and have been noted many times elsewhere, but it might be useful to discuss them very briefly. Three in particular stand out.
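
    As a rough illustration of the sampling-error point above, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration: the function names, the normal distributions and the noise level are hypothetical, and none of it is drawn from an actual value-added model or data set. Each simulated teacher has a fixed "true" effect that does not change between years, and each year's estimate adds independent noise. Under these assumptions the expected year-to-year correlation is the share of variance that is "real" (here 1 / (1 + 1.5²), or roughly 0.31), so estimates can look unstable even when nothing about the teachers has changed.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_year_to_year(n_teachers=5000, true_sd=1.0, noise_sd=1.5, seed=0):
    """Correlation between two years of noisy estimates (hypothetical setup).

    Assumption: each teacher's true effect is identical in both years, so
    all year-to-year disagreement comes from the added noise.
    """
    rng = random.Random(seed)
    true_effects = [rng.gauss(0.0, true_sd) for _ in range(n_teachers)]
    # Each year's estimate is the true effect plus independent noise.
    year1 = [t + rng.gauss(0.0, noise_sd) for t in true_effects]
    year2 = [t + rng.gauss(0.0, noise_sd) for t in true_effects]
    return pearson(year1, year2)


if __name__ == "__main__":
    print("noisy estimates:", round(simulate_year_to_year(), 2))
    print("no noise:       ", round(simulate_year_to_year(noise_sd=0.0), 2))
```

    Raising or lowering noise_sd in this sketch shows how strongly the apparent stability depends on the precision of each year's estimate, rather than on any change in the simulated teachers themselves.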

  • The Year In Research On Market-Based Education Reform: 2012 Edition

    Written on December 20, 2012

    ** Reprinted here in the Washington Post

    2012 was another busy year for market-based education reform. The rapid proliferation of charter schools continued, while states and districts went about the hard work of designing and implementing new teacher evaluations that incorporate student testing data, and, in many cases, performance pay programs to go along with them.

    As in previous years (see our 2010 and 2011 reviews), much of the research on these three “core areas” – merit pay, charter schools, and the use of value-added and other growth models in teacher evaluations – appeared rather responsive to the direction of policy making, but could not always keep up with its breakneck pace.*

    Some lag time is inevitable, not only because good research takes time, but also because there's a degree to which you have to try things before you can see how they work. Nevertheless, what we don't know about these policies far exceeds what we know, and, given the sheer scope and rapid pace of reforms over the past few years, one cannot help but get the occasional “flying blind" feeling. Moreover, as is often the case, the only unsupportable position is certainty.

  • The Sensitive Task Of Sorting Value-Added Scores

    Written on December 18, 2012

    The New Teacher Project’s (TNTP) recent report on teacher retention, called “The Irreplaceables," garnered quite a bit of media attention. In a discussion of this report, I argued, among other things, that the label “irreplaceable” is a highly exaggerated way of describing their definitions, which, by the way, varied between the five districts included in the analysis. In general, TNTP's definitions are better-described as “probably above average in at least one subject" (and this distinction matters for how one interprets the results).

    I’d like to elaborate a bit on this issue – that is, how to categorize teachers’ growth model estimates, which one might do, for example, when incorporating them into a final evaluation score. This choice, which receives virtually no discussion in TNTP’s report, is always a judgment call to some degree, but it’s an important one for accountability policies. Many states and districts are drawing those very lines between teachers (and schools), and attaching consequences and rewards to the outcomes.

    Let's take a very quick look, using the publicly-released 2010 “teacher data reports” from New York City (there are details about the data in the first footnote*). Keep in mind that these are just value-added estimates, and are thus, at best, incomplete measures of the performance of teachers (however, importantly, the discussion below is not specific to growth models; it can apply to many different types of performance measures).

  • Creating A Valid Process For Using Teacher Value-Added Measures

    Written on November 28, 2012

    ** Reprinted here in the Washington Post

    Our guest author today is Douglas N. Harris, associate professor of economics and University Endowed Chair in Public Education at Tulane University in New Orleans. His latest book, Value-Added Measures in Education, provides an excellent, accessible review of the technical and practical issues surrounding these models. 

    Now that the election is over, the Obama Administration and policymakers nationally can return to governing. Of all the education-related decisions that have to be made, the future of teacher evaluation has to be front and center.

    In particular, how should “value-added” measures be used in teacher evaluation? President Obama’s Race to the Top initiative expanded the use of these measures, which attempt to identify how much each teacher contributes to student test scores. In doing so, the initiative embraced and expanded the controversial reliance on standardized tests that started under President Bush’s No Child Left Behind.

    In many respects, The Race was well designed. It addresses an important problem - the vast majority of teachers report receiving limited quality feedback on instruction. As a competitive grants program, it was voluntary for states to participate (though involuntary for many districts within those states). The Administration also smartly embraced the idea of multiple measures of teacher performance.

    But they also made one decision that I think was a mistake.  They encouraged—or required, depending on your vantage point—states to lump value-added or other growth model estimates together with other measures. The raging debate since then has been over what percentage of teachers’ final ratings should be given to value-added versus the other measures. I believe there is a better way to approach this issue, one that focuses on teacher evaluations not as a measure, but rather as a process.

  • Value-Added, For The Record

    Written on November 13, 2012

    People often ask me for my “bottom line” on using value-added (or other growth model) estimates in teacher evaluations. I’ve written on this topic many times, and while I have in fact given my overall opinion a couple of times, I have avoided expressing it in a strong “yes or no” format. There's a reason for this, and I thought maybe I would write a short piece and explain myself.

    My first reaction to the queries about where I stand on value-added is a shot of appreciation that people are interested in my views, followed quickly by an acute rush of humility and reticence. I know think tank people aren’t supposed to say things like this, but when it comes to sweeping, big picture conclusions about the design of new evaluations, I’m not sure my personal opinion is particularly important.

    Frankly, given the importance of how people on the ground respond to these types of policies, as well as, of course, their knowledge of how schools operate, I would be more interested in the views of experienced, well-informed teachers and administrators than my own. And I am frequently taken aback by the unadulterated certainty I hear coming from advocates and others about this completely untested policy. That’s why I tend to focus on aspects such as design details and explaining the research – these are things I feel qualified to discuss.  (I also, by the way, acknowledge that it’s very easy for me to play armchair policy general when it's not my job or working conditions that might be on the line.)

    That said, here’s my general viewpoint, in two parts. First, my sense, based on the available evidence, is that value-added should be given a try in new teacher evaluations.





This web site and the information contained herein are provided as a service to those who are interested in the work of the Albert Shanker Institute (ASI). ASI makes no warranties, either express or implied, concerning the information contained on or linked from The visitor uses the information provided herein at his/her own risk. ASI, its officers, board members, agents, and employees specifically disclaim any and all liability from damages which may result from the utilization of the information provided herein. The content in the Shanker Blog may not necessarily reflect the views or official policy positions of ASI or any related entity or organization.