The Ratings Game: New York City Edition

Gotham Schools reports that the New York City Department of Education rolled out this year’s school report card grades by highlighting the grades’ stability between this year and last. That is, they argued that schools’ grades were roughly the same between years, which is supposed to serve as evidence of the system’s quality.

The city’s logic here is generally sound. As I’ve noted before, most schools don’t undergo drastic changes in their operations over the course of a year, and so fluctuations in grades among a large number of schools might serve as a warning sign that there’s something wrong with the measures being used. Conversely, it’s not unreasonable to expect from a high-quality rating system that, over a two-year period, some schools would get higher grades and some lower, but that most would stay put. That was the city’s argument this year.

The only problem is that this wasn’t really the case.

Quality Control, When You Don't Know The Product

Last week, New York State’s Supreme Court issued an important ruling on the state’s teacher evaluations. The aspect of the ruling that got the most attention was the proportion of evaluations – or “weight” – that could be assigned to measures based on state assessments (in the form of estimates from value-added models). Specifically, the Court ruled that these measures can only comprise 20 percent of a teacher’s evaluation, compared with the option of up to 40 percent for which Governor Cuomo and others were pushing. Under the decision, the other 20 percent must consist entirely of alternative test-based measures (e.g., local assessments).

Joe Williams, head of Democrats for Education Reform, one of the flagship organizations of the market-based reform movement, called the ruling “a slap in the face” and “a huge win for the teachers unions." He characterized the policy impact as follows: “A mediocre teacher evaluation just got even weaker."

This statement illustrates perfectly the strange reasoning that seems to be driving our debate about evaluations.

Our Annual Testing Data Charade

Every year, around this time, states and districts throughout the nation release their official testing results. Schools are closed and reputations are made or broken by these data. But this annual tradition is, in some places, becoming a charade.

Most states and districts release two types of assessment data every year (by student subgroup, school and grade): average scores (“scale scores”), and the percent of students who meet the standards to be labeled proficient, advanced, basic and below basic. The latter type, the rates, is of course derived from the scores. That is, the rates tell us the proportion of students whose scale score was above the minimum necessary to be considered proficient, advanced, etc.

Both types of data are cross-sectional. They don’t follow individual students over time, but rather give a “snapshot” of aggregate performance among two different groups of students (for example, third graders in 2010 compared with third graders in 2011). Calling the change in these results “progress” or “gains” is inaccurate; they are cohort changes, and might just as well be chalked up to differences in the characteristics of the students (especially when changes are small). Even averaged across an entire school or district, there can be huge differences in the groups compared between years – not only is there often considerable student mobility in and out of schools/districts, but every year, a new cohort enters at the lowest tested grade, while a whole other cohort exits at the highest tested grade (except for those retained).

For these reasons, any comparisons between years must be done with extreme caution, but the most common way - simply comparing proficiency rates between years - is in many respects the worst. A closer look at this year’s New York City results illustrates this perfectly.
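One reason rate comparisons can mislead is that a proficiency rate only records which side of a cut score each student falls on. The sketch below is a minimal illustration with invented numbers, not actual New York City data: the two cohorts, the cut score of 650, and the helper `proficiency_rate` are all hypothetical.

```python
from statistics import mean


def proficiency_rate(scores, cut_score):
    """Share of students whose scale score is at or above the cut score."""
    return sum(s >= cut_score for s in scores) / len(scores)


# Hypothetical scale scores for two DIFFERENT cohorts of third graders.
# These values are invented purely for illustration.
cohort_2010 = [640, 645, 648, 649, 649, 655, 660, 670]
cohort_2011 = [641, 646, 650, 650, 651, 656, 661, 671]

CUT_SCORE = 650  # hypothetical "proficient" threshold

# Change in the average scale score between the two cohorts.
mean_change = mean(cohort_2011) - mean(cohort_2010)

# Change in the proficiency rate between the two cohorts.
rate_2010 = proficiency_rate(cohort_2010, CUT_SCORE)
rate_2011 = proficiency_rate(cohort_2011, CUT_SCORE)
rate_change = rate_2011 - rate_2010

print(f"Average scale score change: {mean_change:.2f} points")
print(f"Proficiency rate: {rate_2010:.1%} -> {rate_2011:.1%}")
```

In this made-up case the average score differs by only 1.25 points between cohorts, yet the proficiency rate doubles from 37.5 to 75 percent, because several students sit just below the cut in one year and just at or above it in the next. The reverse can also happen: large score differences far from the cut leave the rate unchanged.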

Melodramatic

At a press conference earlier this week, New York City Mayor Michael Bloomberg announced the city’s 2011 test results. Wall Street Journal reporter Lisa Fleisher, who was on the scene, tweeted Mayor Bloomberg’s remarks. According to Fleisher, the mayor claimed that there was a “dramatic difference” between his city’s testing progress from 2010 to 2011 and that of the rest of the state.

Putting aside the fact that the results do not measure “progress” per se, but rather cohort changes – a comparison of cross-sectional data that measures the aggregate performance of two different groups of students – I must say that I was a little astounded by this claim. Fleisher was also kind enough to tweet a photograph that the mayor put on the screen in order to illustrate the “dramatic difference” between the gains of NYC students relative to their non-NYC counterparts across the state. Here it is:

A 'Summary Opinion' Of The Hoxby NYC Charter School Study

Almost two years ago, a report on New York City charter schools rocked the education policy world. It was written by Hoover Institution scholar Caroline Hoxby with co-authors Sonali Murarka and Jenny Kang. Their primary finding was that:

On average, a student who attended a charter school for all of grades kindergarten through eight would close about 86 percent of the “Scarsdale-Harlem achievement gap” [the difference in scores between students in Harlem and those in the affluent NYC suburb] in math, and 66 percent of the achievement gap in English.

The headline-grabbing conclusion was uncritically repeated by most major news outlets, including the New York Post, which called the charter effects “off the charts,” and the NY Daily News, which announced that, from that day forward, anyone who opposed charter schools was “fighting to block thousands of children from getting superior educations.” A week or two later, Mayor Michael Bloomberg specifically cited the study in announcing that he was moving to expand the number of NYC charter schools. Even today, the report is often mentioned as primary evidence favoring the efficacy of charter schools.

I would like to revisit this study, but not as a means to relitigate the “do charters work?" debate. Indeed, I have argued previously that we spend too much time debating whether charter schools “work," and too little time asking why some few are successful. Instead, my purpose is to illustrate an important research maxim: Even well-designed, sophisticated analyses with important conclusions can be compromised by a misleading presentation of results.

Would The New York City Layoffs Hurt Poor Schools More?

As first reported by the New York Times, the New York City Department of Education released a dataset this past Sunday, which lists the number of potential teacher layoffs that would occur in each school absent a budget infusion.

Layoffs are a terrible thing for schools and students, and this list is sobering. But the primary impetus for releasing this dataset appears to be the city’s ongoing push to end so-called seniority-based layoffs, and its support for seniority-ending legislation that is now making its way through the state legislature. One of the big talking points on this issue has always been that layoffs that take experience into account would hurt high-poverty schools the most, because these schools tend to have the least experienced teachers. As I discussed in a prior post, Michelle Rhee is making this argument everywhere she goes, and it was one of the primary themes in a new report by the New Teacher Project (released last week). Although I have not heard city officials use the argument since the database was released over the weekend, similar assertions have very recently been made by Mayor Bloomberg, former Chancellor Joel Klein, and current Chancellor Cathie Black.

I find all this a bit curious, given that the best research on the topic finds that the argument is untrue (including a study of New York City, and a statewide analysis of Washington [also here]). Now, it is at least possible that, if layoffs were conducted strictly on the basis of seniority, higher-poverty schools could end up bearing the brunt of dismissals. This is almost never the case, however – layoffs in almost every district proceed based on a variety of criteria, among which seniority is only one (albeit often the most important).

It is fortuitous, then, that the city’s dataset provides an opportunity to test the claim that the “worst-case scenario” – over 4,500 layoffs using current New York City procedures – would hurt high-poverty schools the most. Let’s take a look.

Why Does Joel Klein Keep Misrepresenting Al Shanker?

Outgoing New York City Chancellor Klein loves to try to wrap himself in the mantle of Al Shanker. He is especially fond of pulling clipped Shanker quotes out of his hat—and out of context—when speaking about his favorite education “reforms." At first this may seem puzzling, because the ex-Chancellor is disinclined to give either the United Federation of Teachers or its parent organization, the American Federation of Teachers, credit for much of anything except intransigence. It must be an inconvenient truth for Klein that Shanker devoted his life to making both organizations into the strong and aggressive advocates for teachers and teaching that they continue to be.

In "What I Learned at the Barricades," a December 6 Wall Street Journal column, Klein leads up to his latest Shanker references with a characteristic litany of inaccurate claims – ones that Al would be quick to correct:

“First, it is wrong to assert that students’ poverty and family circumstances severely limit their educational potential.” And “Second, traditional proposals for improving education – more money, better curriculum, smaller classes, etc. – aren’t going to get the job done.”

Really? It’s hard to imagine which barricades Klein learned at. There is plenty of evidence to support the impact of all of these.

But, for those of us who knew and worked closely with Al (I did from 1967-1984 and from 1989 until his death in 1997), what’s truly galling is Klein’s distorted use of Al’s thinking to shore up a simplistic, narrowly punitive agenda that Shanker would have discredited.

Teacher Value-Added Scores: Publish And Perish

On the heels of the Los Angeles Times’ August decision to publish a database of teachers’ value-added scores, New York City newspapers are poised to do the same, with a hearing on the matter scheduled for late November.

Here’s a proposition: Those who support the use of value-added models (VAM) for any purpose should be lobbying against the release of teachers’ names and value-added scores.

The reason? Publishing the names directly compromises the accuracy of an already-compromised measure. Those who blindly advocate for publication – often saying things like “what’s the harm?" – betray their lack of knowledge about the importance of the models’ core assumptions, and the implications they carry for the accuracy of results. Indeed, the widespread publication of these databases may even threaten VAM’s future utility in public education.

"Outsource Everybody?"

The New York Times apparently thinks that outsourcing all city services in times of financial stress is such a great innovation that it merits page one treatment. The case in point: Maywood, Calif., where city officials last month fired every city employee and outsourced their work. According to the Times, many Maywood residents seem delighted, hence the headline: "A City Outsources Everything. Sky Doesn’t Fall."

The article describes Maywood as a city that was abysmally managed for so long - its police department was especially singled out as a source of financial, legal, and political problems - that city officials claimed it faced bankruptcy unless drastic measures were taken. After reading the article, the first solution that came to my mind was to fire the city council that was responsible for the mess. But, of course, council members did not choose to fire themselves. Instead, after Maywood lost its liability insurance on June 30, city officials abruptly fired all city employees.

The Anti-Sweatshop Label: Dignity At 80 Cents A Pop

Can “doing the right thing” sell as well as “Just Do It”?

That’s the premise of a recent New York Times story, which describes the Knights Apparel company’s efforts to pay a living wage to unionized workers at a model factory in the Dominican Republic (hat tip to Jeff Ballinger). Knights’ Alta Gracia factory will pay factory workers $2.83 an hour to make college-label clothing for the U.S. market. This is enough to support a Dominican family of four, and nearly three and a half times the prevailing minimum wage paid by other factories making products bound for the U.S.