The following is written by Morgan S. Polikoff and Matthew Di Carlo. Morgan is Assistant Professor in the Rossier School of Education at the University of Southern California.
One of the primary policy levers now being employed in states and districts nationwide is teacher evaluation reform. Well-designed evaluations, which should include measures that capture both teacher practice and student learning, have great potential to inform and improve the performance of teachers and, thus, students. Furthermore, nearly everyone agrees that the previous systems were largely pro forma, failed to provide useful feedback, and needed replacement.
The attitude among many policymakers and advocates is that we must implement these systems and begin using them rapidly for decisions about teachers, while design flaws can be fixed later. Such urgency is undoubtedly influenced by the history of slow, incremental progress in education policy. However, we believe this attitude to be imprudent.
The risks of excessive haste are likely higher than whatever opportunity costs would be incurred by proceeding more cautiously. Moving too quickly gives policymakers and educators less time to devise and test the new systems, and to become familiar with how they work and the results they provide.
Moreover, careless rushing may result in avoidable, erroneous high-stakes decisions about individual teachers. Such decisions are harmful to the profession, they threaten the credibility of the evaluations, and they may well promote widespread backlash (such as the recent Florida lawsuits and the growing "opt-out" movement). Making things worse, the opposition will likely “spill over” into other promising policies, such as the already fragile effort to enact the Common Core standards and aligned assessments.
Finally, we must not underestimate the costs, financial and otherwise, of making large changes to these systems once they are in place. A perfect example is NCLB – it had many obvious design flaws that were known early on, but few of these have been corrected, even in states’ ESEA “flexibility” applications.
In short, given these risks and the difficulty of fairly and accurately measuring teacher effectiveness, it seems short-sighted to rush into full-blown implementation without ensuring that the new systems are up to the task.
To that end, we would like to highlight four issues to which states and districts must pay attention in the short term.
The first is that the details of the evaluations, some of which may seem esoteric, in fact matter tremendously. Important choices include (but are not limited to): selecting measures, particularly for teachers in non-tested grades and subjects; reporting evaluation results to educators in a manner that is useful to their practice; ensuring accuracy in state data systems; choosing cut scores (if desired) to separate more and less effective educators; and designing scoring systems that preserve each measure’s intended importance, or “weight.” All of these decisions are important, but even a cursory read of states' new evaluation policies under the waivers or Race to the Top highlights many decisions that contradict what little we know about effective teacher evaluation systems.
And, as is often the case with new policies, the flow of research in this area lags far behind the breakneck pace of policymaking. For instance, a large number of states have chosen as their growth models for teacher evaluation a variant of what’s commonly called the “student growth percentile” (SGP) model. However, recent evidence suggests that value-added models can do a better job of leveling the playing field across classes. Similarly, the Measures of Effective Teaching project offered useful guidance for designing evaluation systems, but its results were released after many states and districts had already made these decisions.
A second issue is simple bad timing: The roll-out of the Common Core standards and new Core-aligned assessments creates serious complications for new teacher evaluation systems. Perhaps the most important of these is that curriculum, standards, and assessments are not yet in sync. New York has recently experienced this issue, administering new assessments before teachers had received the curriculum materials they needed to implement the Common Core. And, while the stated hope is that the tests, curricula, and standards will be seamlessly aligned in a few years, if history is any guide, this is far from guaranteed.
Doing evaluation reform and Common Core implementation at the same time may well be too much for states, districts, and schools to handle. Furthermore, evaluating teachers on the basis of tests that are not aligned with what they are supposed to be teaching is a fundamentally invalid use of those data.
The third issue is the need for states to avoid being overly prescriptive. Most notably, many schools and districts have well-established evaluation systems already in place, and it makes little sense to uproot these systems and impose a state-mandated model. Similarly, districts should be given room to experiment with system design and with different ways to use the results for personnel decisions. The state's optimal role may be to enforce a minimum standard for teacher evaluation, rather than mandating a particular evaluation model statewide.
Fourth and finally, new evaluations – as with any major policy – require significant time and resources to plan and pilot, and there must be substantial capacity building for educators to understand and carry out these systems. Policies should not move directly from the drawing board to high-stakes implementation if the goal is maximizing the policies' effectiveness and minimizing their negative unintended consequences. We recommend that schools and districts have a year for planning and two years of implementation prior to tying ratings to high-stakes decisions.
We conclude where we began – as two individuals who believe that improved teacher evaluation systems could indeed help elevate teaching and learning in U.S. schools. We are concerned that the overly quick, insufficiently careful manner in which many new systems are being installed threatens their likelihood of success.
Put simply, we need to slow down and work to create the best systems possible. Schools and districts in the middle of the design and implementation process should focus on the details of their systems and partner with researchers and other sites to study system effectiveness. In those places where evaluations are already in force, we would strongly advise policymakers to take a step back and consider our suggestions.
And, no matter the situation, high-stakes decisions about teachers should not be made on the basis of assessment data collected during Common Core roll-out. Doing so is unfair and inappropriate and may cause serious harm.
We acknowledge that our arguments here do not fit neatly into the polarized, tribal framework that defines education policy discourse today. In fact, they may not resonate with either “side” of the reform debates, as we support evaluation reform but not unconditionally. To be clear, we do not expect that the new systems will ever be "perfect," and we fully acknowledge that there will be mistakes and adjustments going forward. Nevertheless, we believe that research and history show that time and attention to detail are usually the difference between policies that succeed and those that fail to improve outcomes. If this is worth doing, it is worth doing correctly.
- Morgan Polikoff and Matt Di Carlo