Student Sorting And Teacher Classroom Observations

Although value added and other growth models tend to be the focus of debates surrounding new teacher evaluation systems, the widely known but frequently unacknowledged reality is that most teachers don’t teach in the tested grades and subjects, and won’t even receive these test-based scores. The quality and impact of the new systems therefore will depend heavily upon the quality and impact of other measures, primarily classroom observations.

These systems have been in use for decades, and yet, until recently, relatively little was known about their properties, such as their association with student and teacher characteristics, and there are, as yet, only a handful of studies of their impact on teachers’ performance (e.g., Taylor and Tyler 2012). The Measures of Effective Teaching (MET) Project, conducted a few years ago, was a huge step forward in this area, though at the time it was perhaps underappreciated that MET’s contribution lay not just in the (very important) reports it produced, but also in the extensive dataset it collected for researchers to use going forward. A new paper, just published in Educational Evaluation and Policy Analysis, is among the many analyses that have used or will use MET data to address important questions surrounding teacher evaluation.

The authors, Rachel Garrett and Matthew Steinberg, look at classroom observation scores, specifically those from Charlotte Danielson’s widely employed Framework for Teaching (FFT) protocol. Their results are yet another example of how observation scores share most of the widely cited (statistical) criticisms of value added scores, most notably their sensitivity to which students are assigned to teachers.

Beyond Teacher Quality

Beyond PD: Teacher Professional Learning in High-Performing Systems is a recent report from the Learning First Alliance and the Center on International Education Benchmarking at the National Center on Education and the Economy. The paper describes practices and policies from four high-performing school systems – British Columbia, Hong Kong, Shanghai, and Singapore – where professional learning is believed to be the primary vehicle for school improvement.

My first reaction was: This sounds great, but where is the ubiquitous discussion of “teacher quality?” Frankly, I was somewhat baffled that a report on school improvement never even mentioned the phrase.* Upon close reading, I found the report to be full of radical (and very good) ideas. It’s not that the report proposed anything that would require an overhaul of the U.S. education system; rather, its ideas were groundbreaking because they did not rely on the typical assumptions about how the youth or the adults in these systems learn and achieve mastery. While things are changing a bit in the U.S. with regard to our understanding of student learning – e.g., we now talk about “deep learning” – we have still not made this transition when it comes to teachers.

In the U.S., a number of unstated but common assumptions about “teacher quality” suffuse the entire school improvement conversation. As researchers have noted (see here and here), instructional effectiveness is implicitly viewed as an attribute of individuals, a quality that exists in a sort of vacuum (or independent of the context of teachers’ work), and which, as a result, teachers can carry with them, across and between schools. Effectiveness also is often perceived as fairly stable: teachers learn their craft within the first few years in the classroom and then plateau,** but, at the end of the day, some teachers have what it takes and others just don’t. So, the general assumption is that a “good teacher” will be effective under any conditions, and the quality of a given school is determined by how many individual “good teachers” it has acquired.

The IMPACT Of Teacher Turnover In DCPS

Teacher turnover has long been a flashpoint in education policy, yet these debates are rife with complications. For example, it is often implied that turnover is a “bad thing,” even though some turnover, as when low-performing teachers leave, can be beneficial, whereas some retention, as when low-performing teachers stay, can be harmful. The impact of turnover also depends heavily on other factors, such as the pool of candidates available to serve as replacements, and how disruptive turnover is to the teachers who are retained.

The recent widespread reform of teacher evaluation systems has made the turnover issue, never far below the surface, even more salient. Critics contend that the new evaluations, particularly their use of test-based productivity measures, will cause teachers to flee the profession. Supporters, on the other hand, are in a sense hoping for this outcome, as they anticipate that, under the new systems, voluntary and involuntary separations will serve to improve the quality of the teacher workforce.

A new working paper takes a close look at the impact of teacher turnover under what is perhaps the most controversial teacher evaluation system in the nation – that used in the District of Columbia Public Schools (DCPS). It's a very strong analysis that speaks directly to policy in a manner that does not fit well into the tribal structure of education debates today.

Evidence From A Teacher Evaluation Pilot Program In Chicago

The majority of U.S. states have adopted new teacher evaluation systems over the past 5-10 years. Although these new systems remain among the most contentious issues in education policy today, there is still only minimal evidence on their impact on student performance or other outcomes. This is largely because good research takes time.

A new article, published in the journal Education Finance and Policy, is among the handful of analyses examining the preliminary impact of teacher evaluation systems. The researchers, Matthew Steinberg and Lauren Sartain, take a look at the Excellence in Teaching Project (EITP), a pilot program carried out in Chicago Public Schools starting in the 2008-09 school year. A total of 44 elementary schools participated in EITP in the first year (cohort 1), while an additional 49 schools (cohort 2) implemented the new evaluation systems the following year (2009-10). Participating schools were randomly selected, which permits researchers to gauge the impact of the evaluations experimentally.

The results of this study are important in themselves, and they also suggest some more general points about new teacher evaluations and the growing body of evidence surrounding them.

The Magic Of Multiple Measures

Our guest author today is Cara Jackson, Assistant Director of Research and Evaluation at the Urban Teacher Center.

Teacher evaluation has become a contentious issue in the U.S.  Some observers see the primary purpose of these reforms as the identification and removal of ineffective teachers; the popular media as well as politicians and education reform advocates have all played a role in the framing of teacher evaluation as such.  But, while removal of ineffective teachers was a criterion under Race to the Top, so too was the creation of evaluation systems to be used for teacher development and support.

I think most people would agree that teacher development and improvement should be the primary purpose, as argued here.  Some empirical evidence supports the efficacy of evaluation for this purpose (see here).  And given the sheer number of teachers we need, declining enrollment in teacher preparation programs, and the difficulty disadvantaged schools have retaining teachers, school principals are probably none too enthusiastic about dismissing teachers, as discussed here.

Of course, to achieve the ambitious goal of improving teaching practice, an evaluation system must be implemented well.  Fans of Harry Potter might remember when Dolores Umbridge from the Ministry of Magic takes over as High Inquisitor at Hogwarts and conducts “inspections” of Hogwarts teachers in Book 5 of J.K. Rowling’s series.  These inspections pretty much demonstrate how not to approach classroom observations: she dictates the timing, fails to provide any indication of what aspects of teaching practice she will be evaluating, interrupts lessons with pointed questions and comments, and evidently does no pre- or post-conferencing with the teachers.

Research On Teacher Evaluation Metrics: The Weaponization Of Correlations

Our guest author today is Cara Jackson, Assistant Director of Research and Evaluation at the Urban Teacher Center.

In recent years, many districts have implemented multiple-measure teacher evaluation systems, partly in response to federal pressure from No Child Left Behind waivers and incentives from the Race to the Top grant program. These systems have not been without controversy, largely owing to the perception – not entirely unfounded – that such systems might be used to penalize teachers.  One ongoing controversy in the field of teacher evaluation is whether these measures are sufficiently reliable and valid to be used for high-stakes decisions, such as dismissal or tenure.  That is a topic that deserves considerably more attention than a single post; here, I discuss just one of the issues that arise when investigating validity.

The diagram below is a visualization of a multiple-measure evaluation system, one that combines information on teaching practice (e.g. ratings from a classroom observation rubric) with student achievement-based measures (e.g. value-added or student growth percentiles) and student surveys.  The system need not be limited to three components; the point is simply that classroom observations are not the sole means of evaluating teachers.

In validating the various components of an evaluation system, researchers often examine their correlation with other components.  To the extent that each component is an attempt to capture something about the teacher’s underlying effectiveness, it’s reasonable to expect that different measurements taken of the same teacher will be positively related.  For example, we might examine whether ratings from a classroom observation rubric are positively correlated with value-added.
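To make that check concrete, here is a minimal sketch of how one might compute the correlation between two components. The scores are made up for six hypothetical teachers, and the variable names and scales (a 1-4 rubric average, value-added in test-score standard deviation units) are assumptions for illustration, not data or conventions from any actual evaluation system.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a measure has no variation")
    return cov / sqrt(var_x * var_y)


# Made-up scores for six hypothetical teachers (NOT real data).
observation_ratings = [2.1, 2.8, 3.0, 3.4, 2.5, 3.7]  # assumed 1-4 rubric average
value_added = [-0.20, 0.05, 0.00, 0.15, -0.10, 0.30]  # assumed test-score SD units

r = pearson(observation_ratings, value_added)
print(f"correlation between observation ratings and value-added: {r:.2f}")
```

A value near +1 would mean the two measures rank these teachers almost identically, a value near 0 that they are essentially unrelated. In a real analysis one would use far more teachers, and would treat the coefficient as one piece of evidence rather than a verdict on either measure.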

Will Value-Added Reinforce The Walls Of The Egg-Crate School?

Our guest author today is Susan Moore Johnson, Jerome T. Murphy Research Professor in Education at Harvard Graduate School of Education. Johnson directs the Project on the Next Generation of Teachers, which examines how best to recruit, develop, and retain a strong teaching force.

Academic scholars are often dismayed when policymakers pass laws that disregard or misinterpret their research findings. The use of value-added methods (VAMS) in education policy is a case in point.

About a decade ago, researchers reported that teachers are the most important school-level factor in students’ learning, and that their effectiveness varies widely within schools (McCaffrey, Koretz, Lockwood, & Hamilton 2004; Rivkin, Hanushek, & Kain 2005; Rockoff 2004). Many policymakers interpreted these findings to mean that teacher quality rests with the individual rather than the school and that, because some teachers are more effective than others, schools should concentrate on increasing their number of effective teachers.

Based on these assumptions, proponents of VAMS began to argue that schools could be improved substantially if they would only dismiss teachers with low VAMS ratings and replace them with teachers who have average or higher ratings (Hanushek 2009). Although panels of scholars warned against using VAMS to make high-stakes decisions because of their statistical limitations (American Statistical Association 2014; National Research Council & National Academy of Education 2010), policymakers in many states and districts moved quickly to do just that, requiring that VAMS scores be used as a substantial component in teacher evaluation.

Teacher Quality - Still Plenty Of Room For Debate

On March 3, the New York Times published one of their “Room for Debate” features, in which panelists were asked "How To Ensure and Improve Teacher Quality?" When I read through the various perspectives, my first reaction was: "Is that it?"

It's not that I don't think there is value in many of the ideas presented -- I actually do. The problem is that there are important aspects of teacher quality that continue to be ignored in policy discussions, despite compelling evidence suggesting that they matter in the quality equation. In other words, I wasn’t disappointed with what was said but, rather, with what wasn’t. Let’s take a look at the panelists’ responses after making a couple of observations on the actual question and issue at hand.

The first thing that jumped out at me is that teacher quality is presented in a somewhat decontextualized manner. Teachers don't work in a vacuum; quality is produced in specific settings. Placing the quality question in context can help to broaden the conversation to include: 1) the role of the organization in shaping educator learning and effectiveness; and 2) the intersection between teachers and schools, including the vital issue of employee-organization "fit."

Second, the manner in which teacher quality is typically framed -- including in the Times question -- suggests that effectiveness is a (fixed) individual attribute (i.e., human capital) that teachers carry with them across contexts (i.e., it's portable). In reality, however, it is context-dependent and can be (and is indeed) developed among individuals -- as a result of their networks, their professional interactions, and their shared norms and trust (i.e., social capital). In sum, it's not just what teachers know but who they know and where they work -- as well as the interaction of these three.

How Not To Improve New Teacher Evaluation Systems

One of the more interesting recurring education stories over the past couple of years has been the release of results from several states’ and districts’ new teacher evaluation systems, including those from New York, Indiana, Minneapolis, Michigan and Florida. In most of these instances, the primary focus has been on the distribution of teachers across ratings categories. Specifically, there seems to be a pattern emerging, in which the vast majority of teachers receive one of the higher ratings, whereas very few receive the lowest ratings.

This has prompted some advocates, and even some high-level officials, essentially to deem the new systems failures, since their results suggest that the vast majority of teachers are “effective” or better. As I have written before, this issue cuts both ways. On the one hand, the results coming out of some states and districts seem problematic, and these systems may need adjustment. On the other hand, there is a danger here: States may respond by making rash, ill-advised changes in order to achieve “differentiation for the sake of differentiation,” and the changes may end up undermining the credibility and threatening the validity of the systems on which these states have spent so much time and money.

Granted, whether and how to alter new evaluations are difficult decisions, and there is no tried and true playbook. That said, New York Governor Andrew Cuomo’s proposals provide a stunning example of how not to approach these changes. To see why, let’s look at some sound general principles for improving teacher evaluation systems based on the first rounds of results, and how they compare with the New York approach.*

New York Public Schools And Governor Andrew Cuomo: An Essay, In List Form

A point-by-point commentary on Governor Andrew Cuomo’s newly-announced education plan.*

  1. New York State now has the most racially and economically segregated schools in the nation, worse than Mississippi.
  2. New York is violating the Campaign for Fiscal Equity ruling of the state's highest court to provide full, equitable funding to high poverty schools.
  3. As a result, New York State owes $6 billion it had promised to school districts with concentrations of poverty.
  4. One would think that a Democratic Governor would be focused on correcting such educational injustices.  But not Andrew Cuomo.
  5. Cuomo is proposing tax credits (aka vouchers) that would divert funds and resources from underfunded public schools to private schools.
  6. Poor and working-class kids and students of color who attend public schools would be hurt.
  7. Cuomo is the first-ever Democratic Governor to propose tax credits for private schools, says conservative Checker Finn.
  8. League of Women Voters, Civil Liberties Union, school boards association, superintendents association, teachers union all opposed to Cuomo’s tax credit scheme.
  9. The problem with our public schools, Cuomo says, is teachers.
  10. Teachers think: how convenient that Cuomo, who ignores his responsibilities regarding school segregation and funding, blames us.