There is currently a push to evaluate teacher preparation programs based in part on the value-added of their graduates. Predictably, this is a highly controversial issue, and the research supporting it is, to be charitable, still underdeveloped. At present, the evidence suggests that the differences in effectiveness between teachers trained by different prep programs may not be particularly large (see here, here, and here), though there may be exceptions (see this paper).
In the meantime, there’s an interesting little conflict underlying the debate about measuring preparation programs’ effectiveness, one that’s worth pointing out. For the purposes of this discussion, let’s put aside the very important issue of whether the models are able to account fully for where teaching candidates end up working (i.e., bias in the estimates based on school assignments/preferences), as well as (valid) concerns about judging teachers and preparation programs based solely on testing outcomes. All that aside, any assessment of preparation programs using the test-based effectiveness of their graduates is picking up on two separate factors: how well they prepare their candidates, and who applies to their programs in the first place.
In other words, programs that attract and enroll highly talented candidates might look good even if they don’t do a particularly good job preparing teachers for their eventual assignments. But does that really matter?
Put differently, should we judge the effectiveness of preparation programs despite knowledge that we are most likely doing so based in part on the candidates they attract, and not just how well the programs actually “work”?
The first point I would make is that these two factors are unlikely to be completely independent – i.e., perhaps programs attract talented candidates in part because they are high-quality.
Moreover, there are important contexts in which the information might be useful even if there is selection bias. A school or district looking to recruit strong new teachers may not place much emphasis on this distinction – they just want to hire good people, and if a given preparation program offers them good people, schools and districts may not be particularly concerned about separating selection from program effects. In short, whether or not preparation programs attract strong candidates is part of the package.
But this approach also carries risks, particularly in an accountability context. For instance, many factors besides the actual quality of programs, such as tuition costs or location, might influence the applicants they attract. If states decide to reward or punish preparation programs based on their graduates’ value-added, they might end up punishing effective programs due to nothing more than the students they enroll (and, once again, there are pretty serious issues with where these candidates end up teaching, and how that affects the scores).
In addition, to the degree prep programs look at each other’s value-added scores as a signal of how good their programs are, there may be a tendency to adopt practices of high-scoring programs, and move away from practices of low-scoring programs, under the extremely shaky assumption that the value-added scores are valid for that purpose.
This is a matter of how the estimates are used, and who is using them. If you’re a school or district looking to hire teachers, you might benefit from including the value-added scores as one of your criteria (hopefully, one among many others).
In contrast, if your purpose is to hold prep programs accountable for their results, then you’d be well advised to interpret these estimates with extreme caution, as they’re picking up on confounding factors (selection) that the programs cannot control. And the same basic message applies to prospective teachers choosing a program for themselves. The fact that a given program’s graduates have high value-added scores doesn’t necessarily mean that the program is the reason why – i.e., it may not speak to whether or not the program will make the prospective student more effective.
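To make the distinction concrete, here is a toy sketch with two hypothetical programs. Every number in it is invented for illustration, and it rests on a strong simplifying assumption: that a program's observed value-added is just the sum of what its candidates would have brought anyway (selection) and what the program adds (the program effect), with no bias from where graduates end up teaching. Real estimates cannot observe these two pieces separately, which is the whole difficulty.

```python
from dataclasses import dataclass

# All numbers below are made up for illustration; they are not estimates
# from any real preparation program.


@dataclass
class Program:
    name: str
    incoming_talent: float  # effectiveness candidates would have anyway (selection)
    program_effect: float   # effectiveness added by the preparation itself


def observed_value_added(p: Program) -> float:
    # Simplifying assumption: the two factors simply add up, and there is
    # no bias from where graduates end up teaching.
    return p.incoming_talent + p.program_effect


def best_by(programs, key):
    # Return the program that ranks highest on the given criterion.
    return max(programs, key=key)


programs = [
    Program("A", incoming_talent=0.30, program_effect=0.00),   # selective, adds little
    Program("B", incoming_talent=-0.10, program_effect=0.20),  # less selective, adds more
]

# What a hiring district sees: graduates' value-added, whatever its source.
top_for_hiring = best_by(programs, observed_value_added)

# What an accountability system or a prospective student would want to know,
# but cannot observe directly: the program's own contribution.
top_for_training = best_by(programs, lambda p: p.program_effect)

print(top_for_hiring.name)    # A
print(top_for_training.name)  # B
```

In this made-up case, the program a district would sensibly recruit from (A) is not the one that does more for its candidates (B), so the same scores give a reasonable answer to one question and a misleading answer to the other.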
This problem – disentangling program from selection effects – is a very common one in education research (and in other fields as well). As is the case with school rating systems, the best solution is for people to understand that the estimates can mean different things depending on the context in which they're employed.
And I am very concerned that this critical detail isn’t receiving enough attention, and that states (e.g., Tennessee) are rushing to publish the prep program value-added ratings not only before the evidence is mature enough, but, even more basically, without sufficient explanation of how they should be interpreted.
- Matt Di Carlo