• Charter And Regular Public School Performance In "Ohio 8" Districts, 2010-11

    Every year, the state of Ohio releases an enormous amount of district- and school-level performance data. Since Ohio has among the largest charter school populations in the nation, the data provide an opportunity to examine performance differences between charters and regular public schools in the state.

    Ohio’s charters are concentrated largely in the urban “Ohio 8” districts (sometimes called the “Big 8”): Akron; Canton; Cincinnati; Cleveland; Columbus; Dayton; Toledo; and Youngstown. Charter coverage varies considerably among the “Ohio 8” districts, but it is, on average, about 20 percent, compared with roughly five percent across the whole state. I will therefore limit my quick analysis to these districts.

    Let’s start with the measure that gets the most attention in the state: Overall “report card grades.” Schools (and districts) can receive one of six possible ratings: Academic emergency; academic watch; continuous improvement; effective; excellent; and excellent with distinction.

    These ratings represent a weighted combination of four measures. Two of them measure performance “growth,” while the other two measure “absolute” performance levels. The growth measures are AYP (yes or no), and value-added (whether schools meet, exceed, or come in below the growth expectations set by the state’s value-added model). The first “absolute” performance measure is the state’s “performance index,” which is calculated based on the percentage of a school’s students who fall into the four NCLB categories of advanced, proficient, basic and below basic. The second is the number of “state standards” that schools meet as a percentage of the number of standards for which they are “eligible.” For example, the state requires 75 percent proficiency in all the grade/subject tests that a given school administers, and schools are “awarded” a “standard met” for each grade/subject in which three-quarters of their students score above the proficiency cutoff (state standards also include targets for attendance and a couple of other non-test outcomes).
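    The “state standards” arithmetic described above can be illustrated with a short sketch. The school and its proficiency shares below are made up for illustration, and the state’s full report card formula involves weighting not shown here; this only demonstrates the standards-met percentage:

```python
# Hypothetical illustration of the "state standards met" calculation:
# a school earns a "standard met" for each grade/subject test in which
# at least 75 percent of its students score proficient or above.

PROFICIENCY_TARGET = 0.75  # the state's required proficiency share

# Made-up proficiency shares for one school, by grade/subject test
proficient_shares = {
    ("grade 3", "reading"): 0.81,
    ("grade 3", "math"): 0.72,
    ("grade 4", "reading"): 0.78,
    ("grade 4", "math"): 0.69,
}

# Count each grade/subject in which the school hit the target
standards_met = sum(
    1 for share in proficient_shares.values() if share >= PROFICIENCY_TARGET
)
eligible = len(proficient_shares)

# The reported measure: standards met as a percentage of those eligible
pct_standards_met = 100 * standards_met / eligible
print(f"{standards_met} of {eligible} standards met ({pct_standards_met:.0f}%)")
# → 2 of 4 standards met (50%)
```

    In practice a school’s “eligible” standards also include attendance and other non-test targets, which this sketch omits.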

    The graph below presents the raw breakdown in report card ratings for charter and regular public schools.

  • What Americans Think About Teachers Versus What They're Hearing

    The recent Gallup/PDK education survey found that 71 percent of surveyed Americans “have trust and confidence in the men and women who are teaching children in public schools.” Although this finding received a fair amount of media attention, it is not at all surprising. Polls have long indicated that teachers are among the most trusted professionals in the U.S., up there with doctors, nurses and firefighters.

    (Side note: The teaching profession also ranks among the most prestigious U.S. occupations – in both analyses of survey data as well as in polls [though see here for an argument that occupational prestige scores are obsolete].)

    Rather surprising, on the other hand, were the Gallup/PDK results for the question about what people are hearing about teachers in the news media. Respondents were asked, “Generally speaking, do you hear more good stories or bad stories about teachers in the news media?”

    Over two-thirds (68 percent) said they heard more bad stories than good ones. A little over a quarter (28 percent) said the opposite.

  • Certainty And Good Policymaking Don't Mix

    Using value-added and other types of growth model estimates in teacher evaluations is probably the most controversial and oft-discussed issue in education policy over the past few years.

    Many people (including a large proportion of teachers) are opposed to using student test scores in their evaluations, as they feel that the measures are not valid or reliable, and that they will incentivize perverse behavior, such as cheating or competition between teachers. Advocates, on the other hand, argue that student performance is a vital part of teachers’ performance evaluations, and that the growth model estimates, while imperfect, represent the best available option.

    I am sympathetic to both views. In fact, in my opinion, there are only two unsupportable positions in this debate: Certainty that using these measures in evaluations will work; and certainty that it won’t. Unfortunately, that’s often how the debate has proceeded – two deeply-entrenched sides convinced of their absolutist positions, and resolved that any nuance in or compromise of their views will only preclude the success of their efforts. You’re with them or against them. The problem is that it’s the nuance – the details – that determines policy effects.

    Let’s be clear about something: I'm not aware of a shred of evidence – not a shred – that the use of growth model estimates in teacher evaluations improves performance of either teachers or students.

  • Our Annual Testing Data Charade

    Every year, around this time, states and districts throughout the nation release their official testing results. Schools are closed and reputations are made or broken by these data. But this annual tradition is, in some places, becoming a charade.

    Most states and districts release two types of assessment data every year (by student subgroup, school and grade): Average scores (“scale scores”); and the percentage of students whose scores place them in each performance category – advanced, proficient, basic and below basic. The latter type – the rates – are of course derived from the scores – that is, they tell us the proportion of students whose scale score was above the minimum necessary to be considered proficient, advanced, etc.

    Both types of data are cross-sectional. They don’t follow individual students over time, but rather give a “snapshot” of aggregate performance among two different groups of students (for example, third graders in 2010 compared with third graders in 2011). Calling the change in these results “progress” or “gains” is inaccurate; they are cohort changes, and might just as well be chalked up to differences in the characteristics of the students (especially when changes are small). Even averaged across an entire school or district, there can be huge differences in the groups compared between years – not only is there often considerable student mobility in and out of schools/districts, but every year, a new cohort enters at the lowest tested grade, while a whole other cohort exits at the highest tested grade (except for those retained).
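    How rates derive from scores, and why a year-to-year change in them is a cohort change rather than measured growth, can be sketched as follows. The cutoff score and all scale scores below are invented for illustration:

```python
# Hypothetical scale scores for two different third-grade cohorts.
# These are different students: 2011's third graders are not 2010's.
PROFICIENCY_CUTOFF = 650  # made-up minimum score to be labeled proficient

cohort_2010 = [620, 640, 648, 655, 660, 700]  # third graders in 2010
cohort_2011 = [630, 649, 652, 658, 665, 690]  # third graders in 2011

def proficiency_rate(scores):
    """Share of students scoring at or above the proficiency cutoff."""
    return sum(s >= PROFICIENCY_CUTOFF for s in scores) / len(scores)

rate_2010 = proficiency_rate(cohort_2010)  # 3 of 6 → 0.50
rate_2011 = proficiency_rate(cohort_2011)  # 4 of 6 → 0.67
```

    The 2011 rate is higher, but because the two lists contain different students, the change by itself says nothing about whether any individual student improved – the difference could reflect nothing more than the composition of the two cohorts.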

    For these reasons, any comparisons between years must be done with extreme caution, but the most common way – simply comparing proficiency rates between years – is in many respects the worst. A closer look at this year’s New York City results illustrates this perfectly.

  • Teachers' Preparation Routes And Policy Views

    In a previous post, I lamented the scarcity of survey data measuring what teachers think of different education policy reforms. A couple of weeks ago, the National Center for Education Information (NCEI) released the results of their teacher survey (conducted every five years), which provides a useful snapshot of teachers’ opinions toward different policies (albeit not at the level of detail that one might wish).

    There are too many interesting results to review in one post, and I encourage you to take a look at the full set yourself. There was, however, one thing about the survey tabulations that I found particularly striking, and that was the high degree to which policy opinions differed between traditionally-certified teachers and those who entered teaching through alternative certification (alt-cert).

    In the figure below, I reproduce data from the NCEI report’s battery of questions about whether teachers think different policies would “improve education.” Respondents are divided by preparation route – traditional and alternative.

  • Test-Based Teacher Evaluations Are The Status Quo

    We talk a lot about the “status quo” in our education debates. For instance, there is a common argument that the failure to use evidence of “student learning” (in practice, usually defined in terms of test scores) in teacher evaluations represents the “status quo” in this (very important) area.

    Now, the implication that “anything is better than the status quo” is a rather massive fallacy in public policy, as it assumes that the benefits of alternatives will outweigh their costs, and that there is no chance the replacement policy will have a negative impact (almost always an unsafe assumption). But, in the case of teacher evaluations, the “status quo” is no longer what people seem to think.

    Not counting Puerto Rico and Hawaii, the ten largest school districts in the U.S. are (in order): New York City; Los Angeles; Chicago; Dade County (FL); Clark County (NV); Broward County (FL); Houston; Hillsborough (FL); Orange County (FL); and Palm Beach County (FL). Together, they serve about eight percent of all K-12 public school students in the U.S., and over one in ten of the nation’s low-income children.

    Although details vary, every single one of them is either currently using test-based measures of effectiveness in its evaluations, or is in the process of designing/implementing these systems (most due to statewide legislation).

  • Merit Pay: The End Of Innocence?

    ** Also posted here on “Valerie Strauss’ Answer Sheet” in the Washington Post

    The current teacher salary scale has come under increasing fire, and for good reason. Systems where people are treated more or less the same suffer from two basic problems. First, there will always be a number of “free riders.” Second, and relatedly, some people may feel their contributions aren’t sufficiently recognized. So, what are good alternatives? I am not sure; but based on decades’ worth of economic and psychological research, measures such as merit pay are not it.

    Although individual pay for performance (or merit pay) is a widespread practice among U.S. businesses, the research on its effectiveness shows it to be of limited utility (see here, here, here, and here), mostly because it’s easy for its benefits to be swamped by unintended consequences. Indeed, psychological research indicates that a focus on financial rewards may serve to (a) reduce intrinsic motivation, (b) heighten stress to the point that it impairs performance, and (c) promote a narrow focus that reduces performance on all dimensions except the one being measured.

    In 1971, a research psychologist named Edward Deci published a paper concluding that, while verbal reinforcement and positive feedback tend to strengthen intrinsic motivation, monetary rewards tend to weaken it. In 1999, Deci and his colleagues published a meta-analysis of 128 studies (see here), again concluding that, when people do things in exchange for external rewards, their intrinsic motivation tends to diminish. That is, once a certain activity is associated with a tangible reward, such as money, people will be less inclined to participate in the task when the reward is not present. Deci concluded that extrinsic rewards make it harder for people to sustain self-motivation.

  • Attracting The "Best Candidates" To Teaching

    ** Also posted here on "Valerie Strauss' Answer Sheet" in the Washington Post

    One of the few issues that all sides in the education debate agree upon is the desirability of attracting “better people” into the teaching profession. While this certainly includes the possibility of using policy to lure career-switchers, most of the focus is on attracting “top” candidates right out of college or graduate school.

    The common metric that is used to identify these “top” candidates is their pre-service (especially college) characteristics and performance. Most commonly, people call for the need to attract teachers from the “top third” of graduating classes, an outcome that is frequently cited as being the case in high-performing nations such as Finland. Now, it bears noting that “attracting better people,” like “improving teacher quality,” is a policy goal, not a concrete policy proposal – it tells us what we want, not how to get it. And how to make teaching more enticing for “top” candidates is still very much an open question (as is the equally important question of how to improve the performance of existing teachers).

    In order to answer that question, we need to have some idea of whom we’re pursuing – who are these “top” candidates, and what do they want? I sometimes worry that our conception of this group – in terms of the “top third” and similar constructions – doesn’t quite square with the evidence, and that this misconception might actually be misguiding rather than focusing our policy discussions.

  • Comparing Teacher Turnover In Charter And Regular Public Schools

    ** Also posted here on “Valerie Strauss’ Answer Sheet” in the Washington Post

    A couple of weeks ago, a new working paper on teacher turnover in Los Angeles got a lot of attention, and for good reason. Teacher turnover, which tends to be alarmingly high in lower-income schools and districts, has been identified as a major impediment to improvements in student achievement.

    Unfortunately, some of the media coverage of this paper has tended to miss the mark. Mostly, we have seen horserace stories focusing on the fact that many charter schools have very high teacher turnover rates, much higher than most regular public schools in LA. The problem is that, as a group, charter school teachers are significantly dissimilar to their regular public school peers. For instance, they tend to be younger and/or less experienced than public school teachers overall; and younger, less experienced teachers tend to exhibit higher levels of turnover across all types of schools. So, if there is more overall churn in charter schools, this may simply be a result of the demographics of the teaching force or other factors, rather than any direct effect of charter schools per se (e.g., more difficult working conditions).

    But the important results in this paper aren’t about the amount of turnover in charters versus regular public schools, which can be measured very easily, but rather the likelihood that similar teachers in these schools will exit.

  • Melodramatic

    At a press conference earlier this week, New York City Mayor Michael Bloomberg announced the city’s 2011 test results. Wall Street Journal reporter Lisa Fleisher, who was on the scene, tweeted Mayor Bloomberg’s remarks. According to Fleisher, the mayor claimed that there was a “dramatic difference” between his city’s testing progress between 2010 and 2011, as compared with the rest of the state.

    Putting aside the fact that the results do not measure “progress” per se, but rather cohort changes – a comparison of cross-sectional data that measures the aggregate performance of two different groups of students – I must say that I was a little astounded by this claim. Fleisher was also kind enough to tweet a photograph that the mayor put on the screen in order to illustrate the “dramatic difference” between the gains of NYC students relative to their non-NYC counterparts across the state. Here it is: