Lost In Citation
The so-called Vergara trial in California, in which the state’s tenure and layoff statutes were deemed unconstitutional, already has its first “spin-off," this time in New York, where a newly-formed organization, the Partnership for Educational Justice (PEJ), is among the organizations and entities spearheading the effort.
Upon first visiting PEJ’s new website, I was immediately (and predictably) drawn to the “Research” tab. It contains five statements (which, I guess, PEJ would characterize as “facts”). Each argument is presented in the most accessible form possible, typically accompanied by one citation (or two at most). I assume that the presentation of evidence in the actual trial will be a lot more thorough than that offered on this webpage, which seems geared toward the public rather than the more extensive evidentiary requirements of the courtroom (also see Bruce Baker’s comments on many of these same issues surrounding the New York situation).
That said, I thought it might be useful to review the basic arguments and evidence PEJ presents, not really in the context of whether they will “work” in the lawsuit (a judgment I am unqualified to make), but rather because they're very common, and also because it's been my observation that advocates, on both “sides” of the education debate, tend to be fairly good at using data and research to describe problems and/or situations, yet sometimes fall a bit short when it comes to evidence-based discussions of what to do about them (including the essential task of acknowledging when the evidence is still undeveloped). PEJ’s five bullet points, discussed below, are pretty good examples of what I mean.
Teachers matter. This is an extremely commonplace talking point, one that is not particularly controversial (except in how it is measured). The evidence presented by PEJ boils down a rather large and complex literature to findings from a single paper, though an important one (which is discussed here). Nevertheless, fair enough - it’s an important starting point when talking about tenure/layoff reform. There is almost certainly a big difference between the most and least effective teachers.*
I would, however, hasten to add one critical point (besides the usual caveat that value-added estimates are just one measure of teacher performance). As we’ve discussed here many times, the issue is not whether teachers matter, but rather how policy can be used to shift the distribution of teacher effectiveness (however you measure it). This bullet point by PEJ does not speak to the latter aspect, and it's the one that really matters, particularly for an organization whose website is replete with promises of “common sense changes” that will “fix outdated policies." Teacher quality is a target, not an arrow.
Schools can accurately identify their most and least effective teachers. This claim is so oversimplified that it is, at best, highly misleading. PEJ’s evidence here is the Measures of Effective Teaching (MET) project. MET was an extremely thorough, well-done and important analysis of teacher performance measures. It did not, however, prove that states and districts can “accurately identify their best and worst teachers” (although it should be noted that the MET packaging to the press and public did in fact make this assertion).
What MET did, among other things, is test whether value-added estimates varied when students were randomly assigned to teachers, as well as show that varying combinations of different measures – e.g., value-added estimates, classroom observations and student surveys – can do a modestly good (but still inevitably imprecise) job of predicting themselves (e.g., value-added in the following year). That’s a long way from grand statements such as “schools can accurately identify their most and least effective teachers."
There are currently great strides being made in the research on this topic, including MET, and now fueled mostly by the flood of available data from and studies of the new evaluation systems. But measuring teacher performance is obviously very complicated, will require a great deal more work, and will never reach a point where the phrase "accurately identify" is quite appropriate for a general audience.
Moreover, even if such measures are or become available, there is still the question of how to use them in a manner that will improve performance (e.g., through incentives). This is all still in its first stages (see Dee and Wyckoff 2013 and Loeb et al. 2014 for some very early evidence), and a very liberal dose of humility is required.
Low-income students get less effective teachers. Although, to reiterate, the degree to which available measures do a good job of capturing who is an “effective teacher” remains a contentious, unanswered question, this argument is generally accurate. PEJ offers one citation (Isenberg et al. 2013), but no matter how you measure teacher effectiveness, the available evidence suggests that disadvantaged students are at least somewhat more likely to get less qualified, less effective teachers (Lankford et al. 2002; Hannaway, et al. 2009; Glazerman and Max 2011; Goldhaber et al. 2014; Sass et al. 2012). And this too is an important point, though it's not a new one.
That said, the literature that focuses on value-added as the teacher quality proxy is also clear that the magnitude of the differences varies widely (they are quite large in some places and nil in others), and on balance tends to be on the modest side (though even modest differences of course can matter).**
And, once again, the key here is that this talking point describes a problem, not a solution. In other words, insofar as the whole point of PEJ is to advocate for “common sense policy," the big question is not just whether there is unequal access to “effective teaching," no matter how it’s defined, but also why that’s the case, and what can be done about it (e.g., see Glazerman et al. 2013).
Although it’s quite difficult to isolate why teachers apply, leave or stay in certain types of schools, the available research suggests, for example, that teachers tend to prefer working in schools serving more affluent student populations (Hanushek et al. 2004; Lankford et al. 2002), or schools closer to their homes (Boyd et al. 2004). These factors, as well as, on a potentially related note, organizational forces (Ingersoll 2001), are probably among the big ones contributing to the gaps in measured effectiveness between higher- and lower-poverty schools, and many are not easily addressed by policy interventions. It is certainly possible that existing HR policies related to tenure and transfers fuel this impact, but it’s a stretch to argue forcefully that they are among the crucial factors. And PEJ offers no evidence pertaining to any policies, to say nothing of those for which they are advocating.
Quality-blind layoffs can cost students entire months of learning. This conclusion defines “months of learning” entirely in terms of testing performance, and it is a crude summary of analyses (Boyd et al. 2010; Goldhaber and Theobald 2010) that compare two types of layoff policies: one based on seniority with an alternative that uses test-based effectiveness (i.e., value-added). It is, therefore, hardly surprising that a layoff based on value-added scores might improve testing performance relative to available alternatives, since value-added scores are a purely test-based measure.***
Thus, while this talking point comes closer than the others to indicating a possible solution (mostly by implication), it is only an illustrative alternative, since most teachers (i.e., those in untested grades/subjects) don’t yet receive value-added scores.
Instead of this overstated, rather circular argument, one might argue fairly that there could be alternatives for layoff criteria that do a better job than seniority of capturing effectiveness, in terms of efficiency, fairness and other considerations.
(Side note: The somewhat uncomfortable role of serving as a citation for an unfalsifiable statement is given to a report by the New Teacher Project, which we discussed in detail here.)
And, finally, one more time – accepting the argument that schools could do a better job keeping the “right teachers," where do we go from there? The real question, again, is how this can be done (for a thorough review of the retention literature, see Guarino et al. 2004). If PEJ’s point here is to say that tenure, layoff and transfer policies result in the retention of more of the “wrong teachers," and fewer of the “right teachers," than their proposed alternative(s), they might just say that (including specifying their alternatives) and back it up.
Again, I realize this lawsuit is going to present a great deal more evidence than the bullets on this webpage, which seems meant for the public. Still, as stated in the beginning of this piece, I often wonder whether our policy discourse is well-served by what seems like the common assumption that “short and sweet” is the best way to carry out a policy campaign. I realize this is an all-too-common, perhaps even mundane point, but I feel obligated to make it (as I so often do).
And, more importantly, I am hoping that we who are involved in contemporary education debates (myself included) might be more careful about conflating the description of problems with explanations of why they arise and what can be done about them. All of these are necessary steps in the policymaking process, of course, but they are distinct elements of it. For instance, if PEJ’s stated mission is “fixing outdated policies” with “common sense changes," pointing out the variation in measured teacher effects on test scores is relevant and important, but it’s not a concrete policy proposal, to say nothing of a proposal with some basis in research.
And I for one would prefer to approach the difficult task of designing and implementing education policy armed with a bit more than just “common sense."
- Matt Di Carlo
* News coverage of PEJ's filing of its actual lawsuit, which occurred after this post was written, indicates that the documents submitted include "hundreds of pages of academic and journalistic articles on the importance of effective teachers." This does not affect the point I am making about the distinction between describing and affecting the distribution of teacher effectiveness.
** For example, one of the analyses on this topic actually uses New York City data between 2001 and 2005 (Hannaway, et al. 2009). The findings suggest that teachers in low-poverty schools (those with free/reduced-price lunch rates under 70 percent) are generally more effective in raising test scores than their counterparts in higher-poverty schools (over 70 percent FRL), but that the differences are not huge, and do not show up in all comparisons.
*** There are other issues here, such as the fact that layoffs, which are not commonplace and often relatively small, must be quite large to have much of an impact on aggregate student testing performance (frankly, PEJ’s headline argument -- "can cost students months..." -- is worded such that it might apply to any policy), and that the imprecision of value-added estimates means that the impact of layoffs based on those estimates shrinks considerably in future years.