New Policy Brief: The Evidence On Charter Schools And Test Scores

In case you missed it, today we released a new policy brief, which provides an accessible review of the research on charter schools’ testing effects, how their varying impacts might be explained, and what this evidence suggests about the ongoing proliferation of these schools.

The brief is an adaptation of a three-part series of posts on this blog (here is part one, part two and part three).

Download the policy brief (PDF)

The abstract is pasted directly below.

The Deafening Silence Of Unstated Assumptions

Here’s a thought experiment. Let’s say we were magically granted the ability to perfectly design our public education system. In other words, we were somehow given the knowledge of the most effective policies and how to implement them, and we put everything in place. How quickly would schools improve? Where would we be after 20 years of having the best possible policies in place? What about after 50 years?

I suspect there is much disagreement here, and that answers would vary widely. But, since there is a tendency in education policy to shy away from even talking realistically about expectations, we may never really know. We sometimes operate as though we expect immediate gratification – quick gains, every single year. When schools or districts fail to achieve gains, even over a short period of time, they risk being labeled as failures.

Without question, we need to set and maintain high expectations, and no school or district should ever cease trying to improve. Yet, in the context of serious policy discussions, the failure to even discuss expectations in a realistic manner hinders our ability to interpret and talk about evidence, as it often means that we have no productive standard by which to judge our progress or the effects of the policies we try.

The Year In Research On Market-Based Education Reform: 2011 Edition

** Also posted here on "Valerie Strauss' Answer Sheet" in the Washington Post

If 2010 was the year of the bombshell in research in the three “major areas” of market-based education reform – charter schools, performance pay, and value-added in evaluations – then 2011 was the year of the slow, sustained march.

Last year, the landmark Race to the Top program was accompanied by a set of extremely consequential research reports, ranging from the policy-related importance of the first experimental study of teacher-level performance pay (the POINT program in Nashville) and the preliminary report of the $45 million Measures of Effective Teaching project, to the political controversy of the Los Angeles Times’ release of teachers’ scores from its commissioned analysis of Los Angeles testing data.

In 2011, on the other hand, as new schools opened and states and districts went about the hard work of designing and implementing new evaluation and compensation systems, the research almost seemed to adapt to the situation. There were few (if any) “milestones,” but rather a steady flow of papers and reports focused on the finer-grained details of actual policy.*

Nevertheless, a review of this year's research shows that one thing remained constant: Despite all the lofty rhetoric, what we don’t know about these interventions outweighs what we do know by an order of magnitude.

Do Teachers Really Come From The "Bottom Third" Of College Graduates?

** Also posted here on "Valerie Strauss' Answer Sheet" in the Washington Post

The conventional wisdom among many education commentators is that U.S. public school teachers “come from the bottom third” of their classes. Most recently, New York City Mayor Michael Bloomberg took this talking point a step further, and asserted at a press conference last week that teachers are drawn from the bottom 20 percent of graduates.

All of this is supposed to imply that the U.S. has a serious problem with the “quality” of applicants to the profession.

Despite the ubiquity of the “bottom third” and similar arguments (which are sometimes phrased as sweeping generalizations, with no reference to actual proportions), it’s unclear how many of those who offer them know specifically what they refer to (e.g., GPA, SAT/ACT, college rank, etc.). This is especially important since so many of these measurable characteristics are not associated with future test-based effectiveness in the classroom, while those that are associated are only modestly so.

Still, given how often it is used, and because it is always useful to examine the characteristics of the teacher labor supply, it’s worth taking a quick look at where the “bottom third” claim comes from and what it might or might not mean.

What Value-Added Research Does And Does Not Show

Value-added and other types of growth models are probably the most controversial issue in education today. These methods, which use sophisticated statistical techniques to attempt to isolate a teacher’s effect on student test score growth, are rapidly assuming a central role in policy, particularly in the new teacher evaluation systems currently being designed and implemented. Proponents view them as a primary tool for differentiating teachers based on performance/effectiveness.

Opponents, on the other hand, including a great many teachers, argue that the models’ estimates are unstable over time, subject to bias and imprecision, and that they rely entirely on standardized test scores, which are, at best, an extremely partial measure of student performance. Many have come to view growth models as exemplifying all that’s wrong with the market-based approach to education policy.

It’s very easy to understand this frustration. But it's also important to separate the research on value-added from the manner in which the estimates are being used. Virtually all of the contention pertains to the latter, not the former. Actually, you would be hard-pressed to find many solid findings in the value-added literature that wouldn't ring true to most educators.
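
For readers unfamiliar with how these estimates are produced, it may help to see a stylized sketch. A common type of value-added model regresses a student’s current test score on his or her prior-year score and other characteristics, and treats the teacher-level component of what remains as the teacher’s estimated effect. Purely as an illustration (actual specifications vary considerably across states and model developers), such a model might look like:

    y_{it} = \lambda y_{i,t-1} + X_{it}\beta + \theta_{j(i,t)} + \varepsilon_{it}

where y_{it} is student i’s score in year t, y_{i,t-1} is the prior-year score, X_{it} is a set of student and classroom characteristics, \theta_{j(i,t)} is the estimated “effect” of the teacher to whom the student is assigned in year t, and \varepsilon_{it} is an error term. Much of the debate over instability, bias and imprecision is, in essence, a debate over how well \theta actually isolates the teacher’s contribution from everything else in that equation.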

Has Teacher Quality Declined Over Time?

** Also posted here on "Valerie Strauss' Answer Sheet" in the Washington Post

One of the common assumptions lurking in the background of our education debates is that “quality” of the teaching workforce has declined a great deal over the past few decades (see here, here, here and here [slide 16]). There is a very plausible storyline supporting this assertion: Prior to the dramatic rise in female labor force participation since the 1960s, professional women were concentrated in a handful of female-dominated occupations, chief among them teaching. Since then, women’s options have changed, and many have moved into professions such as law and medicine instead of the classroom.

The result of this dynamic, so the story goes, is that the pool of candidates to the teaching profession has been “watered down.” This in turn has generated a decline in the aggregate “quality” of U.S. teachers, and, it follows, a stagnation of student achievement growth. This portrayal is often used as a set-up for a preferred set of solutions – e.g., remaking teaching in the image of the other professions into which women are moving, largely by increasing risk and rewards.

Although the argument that “teacher quality” has declined substantially is sometimes taken for granted, its empirical backing is actually quite thin, and not as clear-cut as some might believe.

Smear Review

A few weeks ago, the National Education Policy Center (NEPC) issued a review of the research on virtual learning. Several proponents of online education issued responses that didn't offer much substance beyond pointing out NEPC’s funding sources. A similar reaction ensued after the release last year of the Gates Foundation's preliminary report on the Measures of Effective Teaching Project. There were plenty of substantive critiques, but many of the reactions amounted to knee-jerk dismissals of the report based on pre-existing attitudes toward the foundation's agenda.

More recently, we’ve even seen unbelievably puerile schemes in which political operatives actually pretend to represent legitimate organizations requesting consulting services. They record the phone calls, and post out-of-context snippets online to discredit the researchers.

Almost all of the people who partake in this behavior share at least one fundamental characteristic: They are unable to judge research for themselves, on its merits. They can’t tell good work from bad, so they default to attacking substantive work based on nothing more than the affiliations and/or viewpoints of the researchers.

The Categorical Imperative In New Teacher Evaluations

There is a push among many individuals and groups advocating new teacher evaluations to predetermine the number of outcome categories – e.g., highly effective, effective, developing, ineffective, etc. – that these new systems will include. For instance, a “statement of principles” signed by 25 education advocacy organizations recommends that the reauthorized ESEA law require “four or more levels of teacher performance.” The New Teacher Project’s primary report on redesigning evaluations made the same suggestion.* For their part, many states have followed suit, mandating new systems with a minimum of four or five categories.

The rationale here is pretty simple on the surface: Those pushing for a minimum number of outcome categories believe that teacher performance must be adequately differentiated, a goal on which prior systems, most of which relied on dichotomous satisfactory/unsatisfactory schemes, fell short. In other words, the categories in new evaluation systems must reflect the variation in teacher performance, and that cannot be accomplished when there are only a couple of categories.

It’s certainly true that the number of categories matters – it is an implicit statement as to the system’s ability to tease out the “true” variation in teacher performance. The number of categories a teacher evaluation system employs should depend on how well it can differentiate teachers with a reasonable degree of accuracy. If a system is unable to pick up this “true” variation, then using several categories may end up doing more harm than good, because it will be providing faulty information. And, at this early stage, despite the appearance of certainty among some advocates, it remains unclear whether all new teacher evaluation systems should require four or more levels of “effectiveness.”

The Uncertain Future Of Charter School Proliferation

This is the third in a series of three posts about charter schools. Here are the first and second parts.

As discussed in prior posts, high-quality analyses of charter school effects show that there is wide variation in the test-based effects of these schools but that, overall, charter students do no better than their comparable regular public school counterparts. The existing evidence, though very tentative, suggests that the few schools achieving large gains tend to be well-funded, offer massive amounts of additional time, provide extensive tutoring services and maintain strict, often high-stakes discipline policies.

There will always be a few high-flying chains dispersed throughout the nation that get results, and we should learn from them. But there’s also the question of whether many charter schools with different operators using diverse approaches can expand within a single location and produce consistent results.

Charter supporters typically argue that state and local policies can be leveraged to “close the bad charters and replicate the good ones.” Opponents, on the other hand, contend that successful charters can’t expand beyond a certain point because they rely on the selection of the best students into these schools (so-called “cream skimming”), as well as the exclusion of high-needs students.

Given the current push to increase the number of charter schools, these are critical issues, and there is, once again, some very tentative evidence that might provide insights.

Explaining The Consistently Inconsistent Results Of Charter Schools

This is the second in a series of three posts about charter schools. Here is the first part, and here is the third.

As discussed in a previous post, there is a fairly well-developed body of evidence showing that charter and regular public schools vary widely in their impacts on achievement growth. This research finds that, on the whole, there is usually not much of a difference between them, and when there are differences, they tend to be very modest. In other words, there is nothing about "charterness" that leads to strong results.

It is, however, the exceptions that are often most instructive to policy. By taking a look at the handful of schools that are successful, we might finally start moving past the “horse race” incarnation of the charter debate, and start figuring out which specific policies and conditions are associated with success, at least in terms of test score improvement (which is the focus of this post).

Unfortunately, this question is also extremely difficult to answer – policies and conditions are not randomly assigned to schools, and it’s very tough to disentangle all the factors (many unmeasurable) that might affect achievement. But the available evidence at this point is sufficient to start drawing a few highly tentative conclusions about “what works.”