Getting Teacher Evaluation Right
Linda Darling-Hammond’s new book, Getting Teacher Evaluation Right, is a detailed, practical guide about how to improve the teaching profession. It leverages the best research and best practices, offering actionable, illustrated steps to getting teacher evaluation right, with rich examples from the U.S. and abroad.
Here I offer a summary of the book’s main arguments and conclude with a couple of broad questions prompted by the book. But, before I delve into the details, here’s my quick take on Darling-Hammond’s overall stance.
We are at a crossroads in education; two paths lie before us. The first seems shorter, easier and more straightforward. The second seems long, winding and difficult. The big problem is that the first path does not really lead to where we need to go; in fact, it is taking us in the opposite direction. So, despite appearances, steadier progress will be made if we take the more difficult route. This book is a guide on how to get teacher evaluation right, not how to do it quickly or with minimal effort. So, in a way, the big message or takeaway is: There are no shortcuts.
The original inspiration for the book – says Darling-Hammond, who serves on our board of directors – was the Albert Shanker Institute’s Good Schools Seminar Series, the goal of which is to build a network of union leaders, district superintendents, and researchers by creating a safe, off-the-record space where they can work collaboratively on issues related to improving teaching and learning. Getting Teacher Evaluation Right is a response to requests from these stakeholders, and is intended to help all sides “imagine and create coherent systems for evaluating teachers in ways that support continuous improvement in classrooms and schools."
Despite the recent, intense and controversial focus on teacher evaluation as a means of increasing student learning, “existing [teacher evaluation] systems rarely help teachers improve or clearly distinguish those who are succeeding from those who are struggling." (p. 24)* One problem is that they are not really systems. Judging from the attention to teacher evaluation these days, one wouldn’t suspect that teacher evaluation is really only one small piece of the educational improvement puzzle: “Changing on-the-job evaluation will not, by itself, transform the quality of teaching."
We cannot fire our way into Finland, Darling-Hammond says:
We will not really improve the quality of the profession if we do not also cultivate an excellent supply of good teachers who are well prepared and committed to career-long learning. (p. 26)
Getting Teacher Evaluation Right provides a framework for thinking comprehensively about “the development, support, and assessment of teaching," providing strategies based on research and successful experiences currently found in the field.
A second, important theme is that improving the skills of individual teachers is not enough to bring about large-scale, durable change:
We need to create and sustain productive, collegial working conditions that allow teachers to work collectively in an environment that supports learning for them and their students.
In other words, the way forward does not lie so much with teachers’ human capital as with their social capital or, as Hargreaves and Fullan call it, their professional capital. Carrie Leana and others have brilliantly called this the Missing Link in School Reform, and social capital/network scholars refer to the same basic phenomenon as the Social Side of the Reform Equation. What’s interesting to me is that such diverse disciplines, including sociology, organizational studies, and business management, as well as different methodological approaches, including systems thinking, social network analysis, etc., are coalescing around the exact same idea: We need to go beyond the individual and reflect more systematically about the behavior of groups if we are to really address our current educational challenges.
But how do we begin to build this system? What are its components? According to Darling-Hammond, in addition to high-quality curriculum and assessments, the ideal system should include five key elements, discussed in depth in each chapter of the book:
- Common statewide standards.
- Performance-based assessments, based on these standards.
- Local evaluation systems, aligned to the same standards.
- Aligned professional learning opportunities.
- Support structures to ensure proper evaluations, mentoring, etc.
Next, the book argues for working toward evaluation and state licensing and certification systems that are grounded in the same standards and conceptualized as a continuum, so that they can jointly reinforce teacher development. Well-designed performance assessments for such a system would: 1) capture teaching in action; 2) observe and assess aspects of teaching related to teachers’ effectiveness; 3) consider and examine teachers’ intentions and strategies; 4) look at teaching in relation to student learning; and 5) use rubrics that vividly describe performance standards.
Examples of such systems include the NBPTS certification process, the Connecticut BEST assessments and the Performance Assessment for California Teachers (PACT). All three “collect evidence of teachers’ actual instruction, through videotapes, curriculum plans, and samples of student work (…)" (p. 71) to assemble teacher scores that have been found to predict student gains.
Third is the question of what it takes to develop and sustain a good teacher. Darling-Hammond explains that the process of teacher development rests on three legs – 1) observation of practice, 2) evidence of student learning, and 3) evidence of professional contributions. In a school that is a learning organization, “it is as important to be committed to the learning and improvement of the whole school as it is to be committed to one’s own development." In fact, says Darling-Hammond, teachers’ commitment to working collaboratively should be an aspect that is evaluated during the hiring process.
Chapter 5 focuses on the use of value-added models (VAM) in teacher evaluation, a topic Darling-Hammond recently addressed at a Shanker Institute event. These models look at changes in student test scores over time, while trying to control for students’ prior test scores and (in most cases) socio-demographic characteristics known to be associated with achievement growth. An attempt is then made to link score changes to individual teachers so as to gauge the effectiveness of each teacher. But, as Darling-Hammond and others have pointed out, drawing this conclusion requires assumptions that many education researchers and methods experts find problematic: 1) that student learning is measured well by a given test; 2) that a student’s learning is influenced by the teacher alone; and 3) that it occurs independently of other features of the school and classroom context.
In other words, the book states, value-added measures are highly unstable, may reflect as much about whom a teacher teaches as they do about how well they teach, and cannot fully disentangle and account for the wide array of factors that influence student growth. Additional concerns are raised by how value-added models are being used in some states and the unintended effects of such uses. For example, the book notes that the focus on test score gains may act as an incentive to narrow the curriculum (i.e., “teach to the test”), avoid certain students with different needs, and replace cooperation among teachers with competition.
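To make the mechanics described above more concrete, here is a minimal sketch of the basic value-added idea. It is not the model discussed in the book or used by any state; it assumes a single prior-score control and a handful of made-up students, and the names `value_added` and `records` are hypothetical. Operational models typically add more controls, multiple years of data, and statistical adjustments that are omitted here.

```python
from collections import defaultdict
from statistics import mean


def value_added(records):
    """Toy value-added estimate (illustrative assumption, not a real VAM).

    records: list of (teacher, prior_score, current_score) tuples.
    Step 1: fit a straight line predicting current score from prior score.
    Step 2: average each teacher's residuals (actual minus predicted).
    """
    priors = [p for _, p, _ in records]
    currents = [c for _, _, c in records]
    p_bar, c_bar = mean(priors), mean(currents)

    # Ordinary least squares with one predictor.
    sxx = sum((p - p_bar) ** 2 for p in priors)
    sxy = sum((p - p_bar) * (c - c_bar) for p, c in zip(priors, currents))
    slope = sxy / sxx
    intercept = c_bar - slope * p_bar

    # Whatever the line does not explain is attributed to the teacher.
    residuals = defaultdict(list)
    for teacher, prior, current in records:
        residuals[teacher].append(current - (intercept + slope * prior))
    return {teacher: mean(r) for teacher, r in residuals.items()}


# Made-up data: two teachers, two students each.
records = [
    ("A", 40, 55), ("A", 60, 75),
    ("B", 40, 45), ("B", 60, 65),
]
estimates = value_added(records)
```

Because the estimate is simply what is left over after the controls, anything the model omits (class composition, school resources, student motivation) ends up in the teacher's number, which is the core of the concern raised above.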
So, what might be a reasonable alternative to VAM? Darling-Hammond proposes that good evaluation start with rigorous, ongoing assessment by experts who review teachers’ instruction based on professional standards. These experts would look at classroom practice, evidence of student outcomes from classroom work, and school or district assessments. The author notes that feedback from this kind of evaluation process improves student achievement, because it helps teachers get better at what they do – an element completely missing in VAM.
Darling-Hammond describes professional development as the fourth core component of a well-designed system, while warning that not all professional learning opportunities are created equal. High-quality professional development should: 1) be intensive, ongoing, and connected to practice; 2) focus on the teaching and learning of specific academic content; 3) be connected to the other school initiatives; and 4) build strong working relationships among teachers.
In Chapter 7, she lays out the elements of fair, effective and sound evaluation systems, and gives several examples – e.g., in Cincinnati, Columbus, and Toledo, OH; Rochester, NY; Poway and San Juan, CA; and Seattle, WA. According to Darling-Hammond, these systems have been studied and found successful in identifying teachers for continuation and tenure, as well as in identifying those who need intensive assistance and/or dismissal. She also notes that the systems that work well are invariably those with collaborations between unions and school boards. Thus, systems such as Peer Assistance and Review (PAR) “have proven more effective than traditional evaluation systems at both improving and efficiently dismissing teachers while avoiding union grievances."
This last point left me wanting more detail: If so many places have teacher evaluation systems that work well (in the case of PAR, for over 30 years), why haven’t these models been adopted more broadly? How much would it cost and how long would it take to move a district in the direction of implementing a PAR-like system? What would be the steps and potential barriers to implementing such a system? And where would the average reader go to find such information?
Another issue I would have liked to know more about is how student effort and student motivation figure into this. One would think that, even given comparable conditions (e.g., teacher, family background, curriculum, etc.), students who work harder, are more self-motivated and more engaged in their learning will experience greater educational gains. Does this matter in teacher evaluation systems and, if so, how?
My takeaway from this book is that we tend to think about teacher evaluation in partial (not systemic) ways and we measure teacher effectiveness either in mechanistic ways or by avoiding it (sort of). As Darling-Hammond notes, it is troubling that “our behavioral measurement systems fail to understand that teaching is more than implementing a set of canned routines in each lesson." VAM, on the other hand, seems to be more about measuring the contours of teaching: Since capturing teacher practice and teacher effectiveness is complex, these models try to account for all else (even though they can’t), arguing that what is left unaccounted for is a useful, approximate measure of instructional effects. I find the first, mechanistic approach simplistic and the second (e.g., VAM) somewhat unintentional. It leaves me wishing we were more creative about the study of teaching and teachers, and more deliberate about developing new tools that, as Darling-Hammond puts it, can measure “teaching in action” and can be used to improve it.
- Esther Quintero
* Page numbers are from the e-book version.