Dispatches From The Nexus Of Bad Research And Bad Journalism

In a recent story, the New York Daily News uses the recently-released teacher data reports (TDRs) to “prove” that the city’s charter school teachers are better than their counterparts in regular public schools. The headline announces boldly: New York City charter schools have a higher percentage of better teachers than public schools (it has since been changed to: "Charters outshine public schools").

Taking things even further, within the article itself, the reporters note, “The newly released records indicate charters have higher performing teachers than regular public schools."

So, not only are they equating words like “better” with value-added scores, but they’re obviously comfortable drawing conclusions about these traits based on the TDR data.

The article is a pretty remarkable display of both poor journalism and poor research. The reporters not only attempted to do something they couldn’t do, but they did it badly to boot. It’s unfortunate to have to waste one’s time addressing this kind of thing, but, no matter your opinion on charter schools, it's a good example of how not to use the data that the Daily News and other newspapers released to the public.

Ready, Aim, Hire: Predicting The Future Performance Of Teacher Candidates

In a previous post, I discussed the idea of “attracting the best candidates” to teaching by reviewing the research on the association between pre-service characteristics and future performance (usually defined in terms of teachers’ estimated effect on test scores once they get into the classroom). In general, this body of work indicates that predicting who will be an “effective” teacher based on paper traits, while far from futile, is extremely difficult – even for the traits typically used to define “top candidates,” such as the selectivity of candidates’ undergraduate institutions, certification test scores and GPA (see here, here, here and here, for examples).

There is some very limited evidence that other, “non-traditional” measures might help. For example, a working paper released last year found a statistically discernible, fairly strong association between first-year math value-added and an index constructed from surveys administered to Teach for America candidates. There was, however, no association in reading (note that the sample was small), and no relationship in either subject was found during these teachers’ second year.*

A recently-published paper – which appears in the peer-reviewed journal Education Finance and Policy and was originally released as a working paper in 2008 – represents another step forward in this area. The analysis, presented by the respected quartet of Jonah Rockoff, Brian Jacob, Thomas Kane, and Douglas Staiger (RJKS), attempts to look beyond the set of characteristics that researchers are typically constrained (by data availability) to examine.

In short, the results do reveal some meaningful, potentially policy-relevant associations between pre-service characteristics and future outcomes. From a more general perspective, however, they are also a testament to the difficulties inherent in predicting who will be a good teacher based on observable traits.

Do Value-Added Models "Control For Poverty?"

There is some controversy over the fact that Florida’s recently-announced value-added model (one of a class often called “covariate adjustment models”), which will be used to determine merit pay bonuses and other high-stakes decisions, doesn’t include a direct measure of poverty.

Personally, I support adding a direct income proxy to these models, if for no other reason than to avoid this type of debate (and to facilitate the disaggregation of results for instructional purposes). It does bear pointing out, however, that the measure that’s almost always used as a proxy for income/poverty – students’ eligibility for free/reduced-price lunch – is terrible as a poverty (or income) gauge. It tells you only whether a student’s family has earnings below (or above) a given threshold (usually 185 percent of the poverty line), and this masks most of the variation among both eligible and non-eligible students. For example, families with incomes of $5,000 and $20,000 might both be coded as eligible, while families earning $40,000 and $400,000 are both coded as not eligible. A lot of hugely important information gets ignored this way, especially when the vast majority of students are (or are not) eligible, as is the case in many schools and districts.
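To see how much information that binary indicator discards, here is a minimal illustrative sketch in Python. The $41,000 cutoff is only a rough stand-in for 185 percent of the poverty line for a family of four, and the incomes are the hypothetical figures from the example above:

```python
# Illustrative sketch: how a binary free/reduced-price lunch (FRL) indicator
# collapses the income distribution. The cutoff is a rough stand-in for
# 185 percent of the poverty line for a family of four; incomes are hypothetical.
FRL_CUTOFF = 41_000

family_incomes = [5_000, 20_000, 40_000, 400_000]

for income in family_incomes:
    eligible = income <= FRL_CUTOFF
    print(f"income = ${income:>9,}   FRL-eligible = {eligible}")

# The $5,000 and $20,000 families are both coded "eligible," and the $40,000
# and $400,000 families are both coded "not eligible" -- the indicator says
# nothing about the (large) income differences within each group.
```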

That said, it’s not quite accurate to assert that Florida and similar models “don’t control for poverty." The model may not include a direct income measure, but it does control for prior achievement (a student’s test score in the previous year[s]). And a student’s test score is probably a better proxy for income than whether or not they’re eligible for free/reduced-price lunch.
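To make the “controls for prior achievement” point concrete, here is a stylized covariate adjustment specification. This is a generic textbook form, not Florida’s actual model, and the notation is mine:

$$
y_{it} = \lambda\, y_{i,t-1} + X_{it}\beta + \theta_{j(i,t)} + \varepsilon_{it}
$$

Here $y_{it}$ is student $i$’s test score in year $t$, $y_{i,t-1}$ is the prior-year score, $X_{it}$ is a vector of observed student characteristics (which may or may not include an income proxy), $\theta_{j(i,t)}$ is the effect attributed to the student’s teacher $j$, and $\varepsilon_{it}$ is an error term. Because the prior-year score already reflects much of the cumulative influence of income and other background factors, the model absorbs a good deal of that variation even without a direct poverty measure.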

Even more importantly, however, the key issue about bias is not whether the models “control for poverty," but rather whether they control for the range of factors – school and non-school – that are known to affect student test score growth, independent of teachers’ performance. Income is only one part of this issue, which is relevant to all teachers, regardless of the characteristics of the students that they teach.

Guessing About NAEP Results

Every two years, the release of data from the National Assessment of Educational Progress (NAEP) generates a wave of research and commentary trying to explain short- and long-term trends. For instance, there have been a bunch of recent attempts to “explain” an increase in aggregate NAEP scores during the late 1990s and 2000s. Some analyses postulate that the accountability provisions of NCLB were responsible, while more recent arguments have focused on the “effect” (or lack thereof) of newer market-based reforms – for example, looking to NAEP data to “prove” or “disprove” the idea that changes in teacher personnel and other policies have (or have not) generated “gains” in student test scores.

The basic idea here is that, for every increase or decrease in cross-sectional NAEP scores over a given period of time (both for all students and especially for subgroups such as minority and low-income students), there must be “something” in our education system that explains it. In many (but not all) cases, these discussions consist of little more than speculation. Discernible trends in NAEP test score data are almost certainly due to a combination of factors, and it’s unlikely that one policy or set of policies is dominant enough to be identified as “the one.” Now, there’s nothing necessarily wrong with speculation, so long as it is clearly identified as such and conclusions are presented accordingly. But I find it curious that some people involved with these speculative arguments seem a bit too willing to assume that schooling factors – rather than changes in cohorts’ circumstances outside of school – are the primary driver of NAEP trends.

So, let me try a little bit of illustrative speculation of my own: I might argue that changes in the economic conditions of American schoolchildren and their families are the most compelling explanation for changes in NAEP.

A Big Open Question: Do Value-Added Estimates Match Up With Teachers' Opinions Of Their Colleagues?

A recent article about the implementation of new teacher evaluations in Tennessee details some of the complicated issues with which state officials, teachers and administrators are dealing in adapting to the new system. One of these issues is somewhat technical – whether the various components of the evaluations, most notably principal observations and test-based productivity measures (e.g., value-added), tend to “match up.” That is, whether teachers who score high on one measure tend to do similarly well on the other (see here for more on this issue).

In discussing this type of validation exercise, the article notes:

If they don't match up, the system's usefulness and reliability could come into question, and it could lose credibility among educators.

Value-added and other test-based measures of teacher productivity may have a credibility problem among many (but definitely not all) teachers, but I don’t think it’s due to – or can be helped much by – whether or not these estimates match up with observations or other measures being incorporated into states’ new systems. I’m all for this type of research (see here and here), but I’ve never seen what I think would be an extremely useful study for addressing the credibility issue among teachers: One that looked at the relationship between value-added estimates and teachers’ opinions of each other.
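For what it’s worth, the basic “match up” check is straightforward to describe: compute the correlation between teachers’ value-added estimates and whatever other measure is at hand – observation scores, or, as suggested above, colleagues’ opinions. A minimal sketch in Python, with entirely made-up numbers:

```python
# Minimal sketch of a "match up" check: how strongly do value-added estimates
# correlate with another measure (here, a hypothetical peer-rating score)?
# All data below are invented for illustration only.
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

value_added = [0.12, -0.05, 0.30, -0.22, 0.08, 0.15]  # estimated teacher effects
peer_rating = [3.8, 3.1, 4.5, 2.9, 3.5, 3.9]          # hypothetical colleague ratings

print(f"correlation: {pearson_r(value_added, peer_rating):.2f}")
```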

A Look Inside Principals' Decisions To Dismiss Teachers

Despite all the heated talk about how to identify and dismiss low-performing teachers, there’s relatively little research on how administrators choose whom to dismiss, whether various dismissal options might actually serve to improve performance, and other aspects of these decisions. A paper by economist Brian Jacob, released as a working paper in 2010 and published late last year in the journal Educational Evaluation and Policy Analysis, helps fill at least one of these gaps by providing one of the few recent glimpses into administrators’ actual dismissal decisions.

Jacob exploits a change in Chicago Public Schools (CPS) personnel policy that took effect for the 2004-05 school year, one which strengthened principals’ ability to dismiss probationary teachers, allowing non-renewal for any reason, with minimal documentation. He was able to link these personnel records to student test scores, teacher and school characteristics and other variables, in order to examine the characteristics that principals might be considering, directly or indirectly, in deciding who would and would not be dismissed.

Jacob’s findings are intriguing, suggesting a more complicated situation than is sometimes acknowledged in the ongoing debate over teacher dismissal policy.

Fundamental Flaws In The IFF Report On D.C. Schools

A new report, commissioned by District of Columbia Mayor Vincent Gray and conducted by the Chicago-based consulting organization IFF, was supposed to provide guidance on how the District might act and invest strategically in school improvement, including optimizing the distribution of students across schools, many of which are either over- or under-enrolled.

Needless to say, this is a monumental task. It entails not only identifying high- and low-performing schools, but also developing plans for improving them. Even the most rigorous efforts to achieve these goals, especially in a large city like D.C., would be to some degree speculative and error-prone.

This is not a rigorous effort. IFF’s final report is polished and attractive, with lovely maps and color-coded tables presenting a lot of summary statistics. But there’s no emperor underneath those clothes. The report's data and analysis are so deeply flawed that its (rather non-specific) recommendations should not be taken seriously.

The Perilous Conflation Of Student And School Performance

Unlike many of my colleagues and friends, I personally support the use of standardized testing results in education policy, even – with caution and in a limited role – in high-stakes decisions. That said, I also think that the focus on test scores has gone way too far and that they are being used unwisely, in many cases to a degree at which I believe the policies will not only fail to generate improvement, but may even do harm.

In addition, of course, tests have a very productive low-stakes role to play on the ground – for example, when teachers and administrators use the results for diagnosis and to inform instruction.

Frankly, I would be a lot more comfortable with the role of testing data – whether in policy, on the ground, or in our public discourse – but for the relentless flow of misinterpretation from both supporters and opponents. In my experience (which I acknowledge may not be representative of reality), by far the most common mistake is the conflation of student and school performance, as measured by testing results.

Consider the following three stylized arguments, which you can hear in some form almost every week:

Schools' Effectiveness Varies By What They Do, Not What They Are

There may be a mini-trend emerging in certain types of charter school analyses, one that seems a bit trivial but has interesting implications that bear on the debate about charter schools in general. It pertains to how charter effects are presented.

Usually, when researchers estimate the effect of some intervention, the main finding is the overall impact, perhaps accompanied by a breakdown by subgroups and supplemental analyses. In the case of charter schools, this would be the estimated overall difference in performance (usually testing gains) between students attending charters versus their counterparts in comparable regular public schools.

Two relatively recent charter school reports, however – both generally well-done given their scope and available data – have taken a somewhat different approach, at least in the “public roll-out” of their results.

Income And Educational Outcomes

The role of poverty in shaping educational outcomes is one of the most common debates going on today. It can also be one of the most shallow.

The debate tends to focus on income. For example (and I’m generalizing a bit here), one “side” argues that income and test scores are strongly correlated; the other “side” points to the fact that many low-income students do very well and cautions against making excuses for schools’ failure to help poor kids.

Both arguments have merit, but it bears quickly mentioning that the focus on the relationship between income and achievement is a rather crude conceptualization of the importance of family background (and non-schooling factors in general) for education outcomes. Income is probably among the best widely available proxies for these factors, insofar as it is correlated with many of the conditions that can hinder learning, especially during a child’s earliest years. This includes (but is not at all limited to): peer effects; parental education; access to print and background knowledge; parental involvement; family stressors; access to healthcare; and, of course, the quality of neighborhood schools and their teachers.

And that is why, when researchers try to examine school performance – while holding constant the effect of factors outside of schools’ control – income or some kind of income-based proxy (usually free/reduced-price lunch) can be a useful variable. It is, however, quite limited.