The U.S. Department of Education (USED) has just released the long-anticipated final regulations for teacher preparation (TP) program accountability. These regulations will guide states, which are required to design their own systems for assessing TP program performance, with full implementation in 2018-19. The earliest year in which stakes (namely, eligibility for federal grants) will be attached to the ratings is 2021-22.
Among the provisions receiving attention is the softening of the requirement regarding the use of test-based productivity measures, such as value-added and other growth models (see Goldhaber et al. 2013; Mihaly et al. 2013; Koedel et al. 2015). Specifically, the final regulations allow greater “flexibility” in how and how much these indicators must count toward final ratings. For the reasons that Cory Koedel and I laid out in this piece (and I will not reiterate here), this is a wise decision. Although it is possible that value-added estimates will eventually play a significant role in these TP program accountability systems, the USED timeline provides insufficient time for the requisite empirical groundwork.
Yet this does not resolve the issues facing those who must design these systems, since putting partial brakes on value-added for TP programs also puts increased focus on the other measures which might be used to gauge program performance. And, as is often the case with formal accountability systems, the non-test-based bench is not particularly deep.
For instance, as a component of the ratings, the USED regulations encourage states to use placement and retention rates of TP program graduates, including placement/retention in high-needs schools. This is a laudable idea to be sure, but the degree to which any observed variation between programs in placement and retention is due to the programs themselves, rather than to factors such as selection (i.e., which students apply) and geography (e.g., proximity to big cities), is questionable. Moreover, as is the case with TP value-added, the pool of a program's graduates in any given year is usually quite small, which means, for instance, that the results of any placement or retention measure might vary year to year based on the decisions of just a handful of graduates.
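To make the small-sample point concrete, here is a minimal sketch (the cohort sizes and placement counts are hypothetical, chosen purely for illustration, not drawn from any actual program):

```python
# Illustrative only: hypothetical numbers, not data from any real TP program.
# With a small graduating cohort, a placement (or retention) rate can swing
# substantially based on the decisions of just a couple of graduates.

def placement_rate(placed: int, cohort_size: int) -> float:
    """Share of a cohort's graduates who were placed (or retained)."""
    return placed / cohort_size

# A hypothetical program graduating 15 teachers per year:
year_one = placement_rate(12, 15)  # 0.80
year_two = placement_rate(10, 15)  # about 0.67

# Only two graduates' choices differ, yet the rate moves ~13 points,
# enough to change a program's rating under many plausible cut scores.
swing = year_one - year_two
```

The same arithmetic applies to graduate-survey averages: with samples this small, year-to-year movement may reflect noise as much as any real change in program quality.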
Another recommended component is feedback from program graduates (presumably gathered via surveys). This makes intuitive sense and could plausibly capture aspects of program quality, but there is, to my knowledge, very little research on this measure in an accountability context, which means that the signal it offers, and even its basic properties (such as how much responses vary by program), remain uncertain. And, once again, the samples for these surveys in any given year will also be small in many cases.
Now, to be clear, these issues are not insurmountable. And the best way to address them (and designs for TP program value-added as well) – really the only way – is to try them out in real accountability systems. Toward that end, I have three general reactions to the final regulations.
First, to reiterate, USED’s decision to allow states more leeway in designing their systems, particularly rolling back the requirement for including value-added in TP program evaluation systems, is a smart move, as it reflects the current (uncertain) state of the research, which does not yet support ex ante prioritization of any one type of measure. In addition, the flexibility will probably encourage greater variation in system design, which can in turn be exploited in policy evaluation.
Second, it is very important to keep in mind that, although TP program value-added gets all the attention, it is really no more untested or problematic than the alternatives. The wise course would be for states to try a bunch of different measures and – this is critical – have their systems undergo a rigorous policy evaluation by independent researchers.
On a third, final, and related note, since there is still so little known about whether these measures and systems are able to gauge TP program quality "accurately" (or whether they might be useful in improving quality), the longest possible trial period, during which few if any stakes are attached, would be prudent. Giving national TP program accountability a try might not be a bad idea, if it’s done properly, but it will be impossible to assess this endeavor without strong policy evaluation, and that takes time.