  • A Case For Value-Added In Low-Stakes Contexts

    Most of the controversy surrounding value-added and other test-based models of teacher productivity centers on the high-stakes use of these estimates. This is unfortunate – no matter what you think about these methods in the high-stakes context, they have a great deal of potential to improve instruction.

    When supporters of value-added and other growth models talk about low-stakes applications, they tend to assert that the data will inspire and motivate teachers who are completely unaware that they’re not raising test scores. In other words, confronted with value-added evidence that their performance is subpar (at least insofar as tests are an indication), teachers will rethink their approach. I don’t find this very compelling. Value-added data will not help teachers – even those who believe in its utility – unless they know why their students’ measured performance is comparatively low. It’s rather like telling a baseball player that he’s not getting hits, or a chef that the food is bad, without saying why – it’s not constructive.

    Granted, a big problem is that value-added models are not actually designed to tell us why teachers get different results – i.e., whether certain instructional practices are associated with better student performance. But the data can be made useful in this context; the key is to present the information to teachers in the right way, and rely on their expertise to use it effectively.

  • A Big Open Question: Do Value-Added Estimates Match Up With Teachers' Opinions Of Their Colleagues?

    A recent article about the implementation of new teacher evaluations in Tennessee details some of the complicated issues with which state officials, teachers and administrators are dealing in adapting to the new system. One of these issues is somewhat technical: whether the various components of the evaluations – most notably, principal observations and test-based productivity measures (e.g., value-added) – tend to “match up.” That is, whether teachers who score high on one measure tend to do similarly well on the other (see here for more on this issue).

    In discussing this type of validation exercise, the article notes:

        If they don't match up, the system's usefulness and reliability could come into question, and it could lose credibility among educators.

    Value-added and other test-based measures of teacher productivity may have a credibility problem among many (but definitely not all) teachers, but I don’t think it’s due to – or can be helped much by – whether or not these estimates match up with observations or other measures being incorporated into states’ new systems. I’m all for this type of research (see here and here), but I’ve never seen what I think would be an extremely useful study for addressing the credibility issue among teachers: One that looked at the relationship between value-added estimates and teachers’ opinions of each other.
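
    If such a study existed, its core analysis would be straightforward. Here is a minimal sketch, assuming a hypothetical file (teacher_measures.csv) pairing each teacher's value-added estimate with the average rating given by colleagues; the file and column names are illustrative, not from any actual study:

        import pandas as pd
        from scipy.stats import spearmanr

        # Hypothetical data: one row per teacher, with a value-added estimate
        # and the average rating given by that teacher's colleagues.
        df = pd.read_csv("teacher_measures.csv")  # columns: value_added, peer_rating

        # Rank correlation between the two measures; a strong positive
        # correlation would bear directly on the credibility question.
        rho, p_value = spearmanr(df["value_added"], df["peer_rating"])
        print(rho, p_value)
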
  • A Look Inside Principals' Decisions To Dismiss Teachers

    Despite all the heated talk about how to identify and dismiss low-performing teachers, there’s relatively little research on how administrators actually choose whom to dismiss, on whether various dismissal options might serve to improve performance, and on related questions. A paper by economist Brian Jacob, released as a working paper in 2010 and published late last year in the journal Educational Evaluation and Policy Analysis, helps to fill at least one of these gaps by providing one of the few recent glimpses into administrators’ actual dismissal decisions.

    Jacob exploits a change in Chicago Public Schools (CPS) personnel policy that took effect for the 2004-05 school year, one that strengthened principals’ ability to dismiss probationary teachers by allowing non-renewal for any reason, with minimal documentation. He was able to link these personnel records to student test scores, teacher and school characteristics, and other variables in order to examine which characteristics principals might be weighing, directly or indirectly, in deciding who would and would not be dismissed.
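
    To give a sense of what this kind of analysis looks like, here is a minimal sketch in the same spirit – not Jacob’s actual model – assuming a hypothetical file (teachers.csv) with one row per probationary teacher; all column names are illustrative:

        import pandas as pd
        import statsmodels.formula.api as smf

        # Hypothetical data: one row per probationary teacher, with a
        # dismissal indicator and a few observable characteristics.
        teachers = pd.read_csv("teachers.csv")

        # Model the probability of non-renewal as a function of observables
        # (e.g., value-added, absences, experience, school poverty rate).
        model = smf.logit(
            "dismissed ~ value_added + absences + years_experience + school_poverty",
            data=teachers,
        ).fit()

        # The coefficients indicate which characteristics predict dismissal,
        # directly or as proxies for things principals actually observe.
        print(model.summary())
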

    Jacob’s findings are intriguing, suggesting a more complicated situation than is sometimes acknowledged in the ongoing debate over teacher dismissal policy.

  • Fundamental Flaws In The IFF Report On D.C. Schools

    A new report, commissioned by District of Columbia Mayor Vincent Gray and conducted by the Chicago-based consulting organization IFF, was supposed to provide guidance on how the District might act and invest strategically in school improvement, including optimizing the distribution of students across schools, many of which are either over- or under-enrolled.

    Needless to say, this is a monumental task. Not only does it entail the identification of high- and low-performing schools, but plans for improving them as well. Even the most rigorous efforts to achieve these goals, especially in a large city like D.C., would be to some degree speculative and error-prone.

    This is not a rigorous effort. IFF’s final report is polished and attractive, with lovely maps and color-coded tables presenting a lot of summary statistics. But there’s no emperor underneath those clothes. The report's data and analysis are so deeply flawed that its (rather non-specific) recommendations should not be taken seriously.

  • The Perilous Conflation Of Student And School Performance

    Unlike many of my colleagues and friends, I personally support the use of standardized testing results in education policy – even, with caution and in a limited role, in high-stakes decisions. That said, I also think the focus on test scores has gone way too far, and that their use is being implemented unwisely – in many cases to the point where the policies will not only fail to generate improvement, but may actually do harm.

    In addition, of course, tests have a very productive low-stakes role to play on the ground – for example, when teachers and administrators use the results for diagnosis and to inform instruction.

    Frankly, I would be a lot more comfortable with the role of testing data – whether in policy, on the ground, or in our public discourse – but for the relentless flow of misinterpretation from both supporters and opponents. In my experience (which I acknowledge may not be representative), by far the most common mistake is the conflation of student and school performance, as measured by testing results.

    Consider the following three stylized arguments, which you can hear in some form almost every week:

  • Schools' Effectiveness Varies By What They Do, Not What They Are

    There may be a mini-trend emerging in certain types of charter school analyses, one that seems a bit trivial but has interesting implications that bear on the debate about charter schools in general. It pertains to how charter effects are presented.

    Usually, when researchers estimate the effect of some intervention, the main finding is the overall impact, perhaps accompanied by a breakdown by subgroups and supplemental analyses. In the case of charter schools, this would be the estimated overall difference in performance (usually testing gains) between students attending charters versus their counterparts in comparable regular public schools.
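
    For concreteness, here is a minimal sketch of that standard approach – a stylized version, not any particular report’s model – assuming a hypothetical student-level file (students.csv) with test-score gains, a charter indicator, and a few controls; all names are illustrative:

        import pandas as pd
        import statsmodels.formula.api as smf

        students = pd.read_csv("students.csv")

        # Overall charter effect: regress test-score gains on a charter
        # indicator, controlling for prior score and student characteristics,
        # with standard errors clustered by school.
        overall = smf.ols(
            "gain ~ charter + prior_score + free_lunch + C(grade)",
            data=students,
        ).fit(cov_type="cluster", cov_kwds={"groups": students["school_id"]})
        print(overall.params["charter"])  # the headline estimate

        # Typical supplement: the same regression within subgroups.
        for flag, group in students.groupby("free_lunch"):
            est = smf.ols("gain ~ charter + prior_score", data=group).fit()
            print(flag, est.params["charter"])
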

    Two relatively recent charter school reports, however – both generally well-done given their scope and available data – have taken a somewhat different approach, at least in the “public roll-out” of their results.

  • Getting Ready For The Common Core

    Our guest author today, Susan B. Neuman, is a professor in Educational Studies at the University of Michigan specializing in early literacy development, and a former U.S. Assistant Secretary for Elementary and Secondary Education. She and her colleagues at the University of Michigan have also partnered with the Albert Shanker Institute in sponsoring a summer institute for early childhood educators, focusing specifically on oral language development and the ways it can support and help build strong content knowledge. For more information, see here.

    States are now working intently on developing plans that will make new, common standards a reality. A recent report from Education First and the Editorial Projects in Education Research Center concludes that all but one of the 47 states adopting the Common Core State Standards are now in the implementation phase. Seven states have fully upgraded professional development, curriculum materials, and evaluation systems in preparation for the 2014-2015 school year.

    Nary a word has been spoken, however, about how to prepare teachers to implement common standards appropriately in the early childhood years. Although the emphasis on content-rich instruction that builds knowledge is an important one, standards groups have virtually ignored the early years, when these critical skills first begin to develop.

    Young children are eager to learn about the sciences, arts, and the world around them. And, as many early childhood teachers recognize, we need to provide content-rich instruction that is both developmentally appropriate and highly engaging to support students' learning.

  • The Indiana Model

    Indiana is well on its way to becoming a ‘right-to-work’ state this week, with the state’s Republican-controlled House of Representatives approving new legislation and the Senate poised to follow suit. The legislation weakens union protections by enabling individual workers to refuse to pay their share of union representation costs, even if a majority of their coworkers have voted for union representation and the union is legally obligated to bargain for and protect their rights on the job. Indiana would be the first Midwestern manufacturing state to pass such a bill, though other Republican-dominated state legislatures are considering similar legislation.

    One of the most interesting things about this move is just how unpopular it is. According to the AFL-CIO, only one-third of Indiana voters favor the legislation, and more than 70 percent want the question put to voters in a statewide referendum. So why, in an election year, have Republican politicians decided to push forward?

  • A Dark Day For Educational Measurement In The Sunshine State

    Just this week, Florida announced its new district grading system. These systems have been popping up all over the nation, and given that designing one is a requirement for states applying for No Child Left Behind waivers, we are sure to see more.

    I acknowledge that the designers of these schemes have the difficult job of balancing accessibility and accuracy. Moreover, the latter requirement – accuracy – cannot be directly tested, since we cannot know “true” school quality. As a result, to whatever degree school quality can even be approximated using test scores, disagreements over which specific measures to include and how to include them are inevitable (see these brief analyses of Ohio and California).

    As I’ve discussed before, there are two general types of test-based measures that typically make up these systems: absolute performance and growth. Each has its strengths and weaknesses. Florida’s attempt to balance these components is a near-total failure, and it shows in the results.
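
    To make the distinction concrete, here is a minimal sketch with hypothetical district-level data; the numbers, column names, and the 50/50 composite are purely illustrative, not Florida’s actual formula:

        import pandas as pd

        districts = pd.DataFrame({
            "district": ["A", "B"],
            "pct_proficient_2010": [35.0, 79.0],
            "pct_proficient_2011": [45.0, 80.0],
        })

        # Absolute performance: where students score in a given year.
        districts["absolute"] = districts["pct_proficient_2011"]

        # Growth: how much scores changed relative to the prior year.
        districts["growth"] = (
            districts["pct_proficient_2011"] - districts["pct_proficient_2010"]
        )

        # District A looks weak on absolute performance but strong on growth;
        # any overall grade depends heavily on how the two are weighted.
        districts["composite"] = 0.5 * districts["absolute"] + 0.5 * districts["growth"]
        print(districts)
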

  • Performance And Chance In New York's Competitive District Grant Program

    New York State recently announced a new $75 million competitive grant program, which is part of its Race to the Top plan. In order to receive some of the money, districts must apply, and their applications receive a score between zero and 115. Almost a third of the points (35) are based on proposals for programs geared toward boosting student achievement, 10 points are based on need, and 20 points are awarded for a description of how the proposal fits into the district’s budget.

    The remaining 50 points – almost half of the total – are based on “academic performance” over the prior year. Four measures are used to produce this 0-50 score: one is the year-to-year change (between 2010 and 2011) in the district’s graduation rate, and the other three are changes in the state “performance index” in math, English Language Arts (ELA) and science. The “performance index” in these three subjects is calculated using a simple weighting formula that accounts for the proportions of students scoring at levels 2 (basic), 3 (proficient) and 4 (advanced).
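
    As a rough illustration of how such a weighting formula works – a sketch of the common form that gives partial credit for basic scores and full credit for proficient or above, not necessarily the state’s exact calculation:

        # Hypothetical proportions of a district's students at each level.
        levels = {"level_1": 0.10, "level_2": 0.30, "level_3": 0.40, "level_4": 0.20}

        # Half credit for basic (level 2), full credit for proficient or
        # advanced (levels 3 and 4), scaled to a 0-200 index.
        performance_index = 200 * (
            0.5 * levels["level_2"] + (levels["level_3"] + levels["level_4"])
        )
        print(performance_index)  # 150.0 for this hypothetical district

    Under a formula like this, the 0-50 application score is driven by year-to-year changes in the index, not its level.
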

    The idea of using testing results as a criterion for awarding grants is to reward districts that are performing well. Unfortunately, due to the choice of measures and the way they are used, the 50-point component will be biased and, to no small extent, based on chance.