I have reviewed, albeit superficially, the test-based components of several state and city school rating systems (e.g., OH, FL, NYC, LA, CO), with a particular focus on the degree to which they are actually measuring student performance (how highly students score), rather than school effectiveness per se (whether students are making progress). Both types of measures have a role to play in accountability systems, but they are often confused or conflated. The result is widespread misinterpretation of what the final ratings actually mean, and a failure in many systems to tailor interventions to the indicators being used.
One aspect of these systems that I rarely discuss is the possibility that the rating systems are an end in themselves. That is, the idea that public ratings, no matter how they are constructed, provide an incentive for schools to get better. From this perspective, even if the ratings are misinterpreted or imprecise, they might still “work.”*
There’s obviously something to this. After all, the central purpose of any accountability system is less about closing or intervening in a few schools than about giving all schools an incentive to up their respective games. And, no matter how you feel about school rating systems, there can be little doubt that people pay attention to them. Educators and school administrators do so, not only because they fear closure or desire monetary rewards; they also take pride in what they do, and they like being recognized for it. In short, my somewhat technocratic viewpoint on school ratings ignores the fact that their purpose is less about rigorous measurement than about encouraging improvement.
The available evidence, which is scarce and significantly limited by the difficulty of assessing the effect of policies implemented at a statewide (or, in the case of NCLB, national) level, suggests that state-level accountability systems may have a very modest positive impact on aggregate test-based performance, especially when there are consequences attached to the ratings (also see here). Results may also vary by subject and the type of measure being used. Finally, one recent paper, which looked at the impact of NCLB, found positive effects in only one of the four main NAEP assessments.
This evidence is hardly clear-cut (and this is far from a comprehensive review), and it is always difficult to discern the validity of increases in test scores immediately after the introduction of an accountability regime that is heavily focused on those test scores. Nevertheless, there is reason to believe that people respond to these systems, even if, in many cases, the measures they employ are poorly designed.**
I have always recognized this, yet I cannot accept the idea that measurement is merely a side issue, for two main reasons:
First, people aren’t fools – if ratings presented as school performance measures are heavily biased by student characteristics, teachers and administrators will catch on quickly (in many places, they already have). When this happens, the people who are supposed to be incentivized to improve the system can actually end up resenting it. People don’t like being held accountable for things that they cannot control. One can only cringe at the thought of the groups of hard-working teachers in high-poverty schools being shamed annually by a system that dooms them to low ratings by virtue of their dedication to serving the kids who need them the most.
Second, and more practically, there is no sense in tolerating bad measurement when there are alternatives (especially given that there is no guarantee the systems will have a discernible impact). School rating systems can be improved – e.g., the measures can be constructed to more accurately reflect school performance, and/or different types of measures can be used for different purposes. If done correctly – with stakeholder input – this could actually improve the efficacy of the incentives embedded in the system, as well as the merit of the policy decisions that are based on the ratings.
So, while I acknowledge the possibility that a deeply flawed school rating system is better than none at all, I don’t think that’s the choice we are facing. It’s not “imperfect versus nothing,” or “the perfect versus the good.” Rather, it’s the better versus the worse. We should opt for the former.
- Matt Di Carlo
* This (highly unoriginal) concept might also be applied to other policies, such as teacher evaluations and rating teacher preparation programs.
** The irony of this perspective, at least for me, is that one might argue it partially depends on the misinterpretation of testing data. For example, hypothetically, if schools were rated entirely according to their value-added estimates (which I consider to be a more advanced, albeit imperfect, measure of schools’ performance), it might actually “water down” the incentives since, due to instability, few schools would receive consistently high or low ratings. This would not only harm the credibility of the ratings (everyone would notice that the ratings jumped around every year), but, eventually, most schools would get a decent rating, which might encourage complacency. Similarly, if people actually interpreted the existing ratings accurately – in most places, they are predominantly measures of student performance rather than school performance – then the public, as well as teachers and administrators, could more easily dismiss them as “beyond our control.”