The Great Teacher Evaluation Evaluation: New York Edition
A couple of weeks ago, the New York State Education Department (NYSED) released data from the first year of the state's new teacher and principal evaluation system (called the “Annual Professional Performance Review," or APPR). In what has become a familiar pattern, this prompted a wave of criticism from advocates, much of it focused on the proportion of teachers in the state to receive the lowest ratings.
To be clear, evaluation systems that produce non-credible results should be examined and improved, and that includes those that put implausible proportions of teachers in the highest and lowest categories. Much of the commentary surrounding this and other issues has been thoughtful and measured. As usual, though, there have been some oversimplified reactions, as exemplified by this piece on the APPR results from Students First NY (SFNY).
SFNY notes what it considers to be the low proportion of teachers rated “ineffective," and points out that there was more differentiation across rating categories for the state growth measure (worth 20 percent of teachers’ final scores), compared with the local “student learning” measure (20 percent) and the classroom observation components (60 percent). Based on this, they conclude that New York’s "state test is the only reliable measure of teacher performance" (they are actually talking about validity, not reliability, but we’ll let that go). Again, this argument is not representative of the commentary surrounding the APPR results, but let’s use it as a springboard for making a few points, most of which are not particularly original. (UPDATE: After publication of this post, SFNY changed the headline of their piece from "the only reliable measure of teacher performance" to "the most reliable measure of teacher performance.")