Value-Added Versus Observations, Part One: Reliability


I think you are missing the bigger picture. The purpose of an observation is to reinforce effective teaching behaviors and eliminate ineffective practices. Which of these behaviors appear will vary from lesson to lesson. Observations were never designed to evaluate. We need to stop worrying about measurement and focus on improving instruction.

Matt, with respect to value-added models, you state: "On a related note, different model specifications and different tests can yield very different results for the same teacher/class." You seem to acknowledge the importance of this issue in your note: "**** The choice of models (or, perhaps, of observation protocols) is a related issue here, though it is in many respects more of a validity issue. There are trade-offs in model selection, and, as usual, no “correct answer,” but many states have adopted models that are not appropriate for causal inferences, and thus for high-stakes decisions (see Bruce Baker’s discussions here and here)." The fact that different model specifications and outcome tests result in substantially different attributions of effectiveness is certainly a validity issue, if not THE validity issue for value-added measurement. If different, defensible VAMs "can yield very different results for the same teacher/class," how can value-added measurement provide valid measures of teacher effectiveness? To my knowledge, no legislated evaluation system attempts to compare or average the results of different models, and none uses alternative standardized tests. Until this validity question is addressed, the entire VAM enterprise seems fundamentally suspect to me.

Hey Matt, as usual, I love your insights into this. A quick question about implementation and, specifically, the time-to-implement side of it: is there any research that quantifies the value of a longer phase-in period for new initiatives? I see the "do it right, take your time" argument, and yet, as a former teacher, I also saw the urgent need for these systems ASAP. Is there any way to represent this tension mathematically, or at least to find some sort of "sweet spot"? Thanks, and keep up the great work.


