When It Comes To How We Use Evidence, Is Education Reform The New Welfare Reform?

** Also posted here on “Valerie Strauss’ Answer Sheet” in the Washington Post

In the mid-1990s, after a long and contentious debate, the U.S. Congress passed the Personal Responsibility and Work Opportunity Reconciliation Act of 1996, which President Clinton signed into law. It is usually called the “Welfare Reform Act," as it effectively ended the Aid to Families with Dependent Children (AFDC) program (which is what most people mean when they say “welfare," even though it was [and its successor is] only a tiny part of our welfare state). Established during the New Deal, AFDC was mostly designed to give assistance to needy young children (it was later expanded to include support for their parents/caretakers as well).

In place of AFDC was a new program – Temporary Assistance for Needy Families (TANF). TANF gave block grants to states, which were directed to design their own “welfare” programs. Although the states were given considerable leeway, their new programs were to have two basic features: first, for welfare recipients to receive benefits, they had to be working; and second, there was to be a time limit on benefits, usually 3-5 years over a lifetime, after which individuals were no longer eligible for cash assistance (states could exempt a proportion of their caseload from these requirements). The general idea was that time limits and work requirements would “break the cycle of poverty”; recipients would be motivated (read: forced) to work, and in doing so, would acquire the experience and confidence necessary for a bootstrap-esque transformation.

There are several similarities between the bipartisan welfare reform movement of the 1990s and the general thrust of the education reform movement happening today. For example, there is the reliance on market-based mechanisms to “cure” longstanding problems, and the unusually strong liberal-conservative alliance of the proponents. Nevertheless, while calling education reform “the new welfare reform” might be a good sound bite, it would also take the analogy way too far.

My intention here is not to draw a direct parallel between the two movements in terms of how they approach their respective problems (poverty/unemployment and student achievement), but rather in how we evaluate their success in doing so. In other words, I am concerned that the manner in which we assess the success or failure of education reform in our public debate will proceed using the same flawed and misguided methods that were used by many for welfare reform.

The Ethics of Testing Children Solely To Evaluate Adults

The recent New York Times article, “Tests for Pupils, but the Grades Go to Teachers," alerts us to an emerging paradox in education – the development and use of standardized student testing solely as a means to evaluate teachers, not students. “We are not focusing on teaching and learning anymore; we are focusing on collecting data," says one mother quoted in the article. Now, let’s see: collecting data on minors that is not explicitly for their benefit – does this ring a bell?

In the world of social/behavioral science research, such an enterprise – collecting data on people, especially on minors – would inevitably require approval from an institutional review board (IRB). For those not familiar, an IRB is a committee that oversees research involving human subjects and is responsible for ensuring that studies are designed in an ethical manner. Even to conduct a seemingly harmless interview on political attitudes, or to observe a group studying in a public library, a researcher would almost certainly be required to go through a series of steps to safeguard participants and ensure that the norms governing ethical research are observed.

Very succinctly, an IRB’s mission is to see that (1) the risk-benefit ratio of conducting the research is favorable; (2) any suffering or distress that participants may experience during or after the study is understood, minimized, and addressed; and (3) research participants agree to take part freely and knowingly. Usually, subjects are asked to sign an informed consent form, which includes a description of the study’s risks and benefits, a discussion of how confidentiality will be guaranteed, a statement on the voluntary nature of involvement, and a clarification that refusal or withdrawal at any time will involve no penalty or loss of benefits. When the research involves minors, parental consent and sometimes child assent are needed.

In short, IRB procedures exist to protect people. To my knowledge, student evaluation procedures and standardized testing are exempt from this sort of scrutiny. So the real question is: Should they be? Perhaps not.

Settling Scores

In 2007, when the D.C. City Council passed a law giving the mayor control of public schools, it required that a five-year independent evaluation be conducted to document the law’s effects and suggest changes. The National Research Council (a division of the National Academies) was charged with performing this task. As reported by Bill Turque in the Washington Post, the first report was released a couple of weeks ago.

The primary purpose of this first report was to give “first impressions” and offer advice on how the actual evaluation should proceed. It covered several areas – finance, special programs, organizational structure, etc. – but, given the controversy surrounding Michelle Rhee’s tenure, the section on achievement results got the most attention. The team was only able to analyze preliminary performance data; the same data that are used constantly by Rhee, her supporters, and her detractors to judge her tenure at the helm of DCPS.

It was one of those reports that tells us what we should already know, but too often fail to consider.

K-12 Standardized Testing Craze Hinders Enthusiasm And Creativity For The Long Haul

Our guest author today is Bill Scheuerman, professor of political science at the State University of New York, Oswego and a retired president of the United University Professions. He is also a member of the Shanker Institute board of directors.

A recent study by Richard Arum, Josipa Roksa, and Esther Cho, entitled Improving Undergraduate Learning: Findings and Policy Recommendations from the SSRC-CLA Longitudinal Project, should make us all take a closer look at student learning in higher education. The report finds that students enter college with values at odds with academic achievement. They party more and work less, but this lack of effort has had little or no effect on grade point averages. The study indicates that some 36 percent of current college graduates did not improve their critical thinking, complex reasoning and written communication skills, despite having relatively high GPAs. In other words, more than a third of new graduates lack the ability to understand and critically evaluate the world we live in.

Nobody is arguing that we should go back to the good old days when college access was limited to the elite. Politicians and business leaders are united in the goal of having the United States once again attain the highest percentage of college graduates in the world.

Notably, in the face of rising global competition from China and India, President Obama has called this the "Sputnik moment" for math and science education in the U.S.

The Test-Based Language Of Education

A recent poll on education attitudes from Gallup and Phi Delta Kappan got a lot of attention, including a mention on ABC’s "This Week with Christiane Amanpour," which devoted most of its show to education yesterday. They flashed results for one of the poll’s questions, showing that 72 percent of Americans believe that "each teacher should be paid on the basis of the quality of his or her work," rather than on a "standard-scale basis."

Anyone who knows anything about survey methodology knows that responses to questions can vary dramatically with different wordings (death tax, anyone?). The wording of this Gallup/PDK question, of course, presumes that the "quality of work" among teachers might be measured accurately. The term "teacher quality" is thrown around constantly in education circles, and in practice, it is usually used in the context of teachers’ effects on students’ test scores (as estimated by various classes of "value-added" models).

But suppose the Gallup/PDK poll had instead asked respondents whether "each teacher should be paid on the basis of their estimated effect on their students’ standardized test scores, relative to other teachers." Think the results would be different? Of course. This doesn’t necessarily say anything about the "merit" of the compensation argument, so to speak, nor does it suggest that survey questions should always emphasize perfect accuracy over clarity (which would also create bias of a different sort). But has anyone looked around recently and seen just how many powerful words, such as "quality," are routinely used to refer to standardized test score-related measures? I made a tentative list.

The Cost Of Success In Education

Many are skeptical of the current push to improve our education system by means of test-based “accountability” – hiring, firing, and paying teachers and administrators, as well as closing and retaining schools, based largely on test scores. They say it won’t work. I share their skepticism, because I think it will.

There is a simple logic to this approach: when you control the supply of teachers, leaders, and schools based on their ability to increase test scores, then this attribute will become increasingly common among these individuals and institutions. It is called “selecting on the dependent variable," and it is, given the talent of the people overseeing this process and the money behind it, a decent bet to work in the long run.
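The mechanics of this are easy to demonstrate. Below is a minimal sketch (entirely hypothetical numbers, standard library only) of selecting on the dependent variable: each simulated teacher has a true, unobserved ability to raise scores, but we only see a noisy one-year gain. Retaining only those with above-average observed gains still shifts the retained pool toward genuinely higher-ability teachers, even though each observation is half noise.

```python
import random

random.seed(42)

# Hypothetical: each "teacher" has a true ability to raise test scores,
# drawn from a standard normal distribution.
teachers = [random.gauss(0, 1) for _ in range(10000)]

def observed_gain(ability):
    # The measured yearly gain is true ability plus measurement noise.
    return ability + random.gauss(0, 1)

# Retain only teachers whose observed (noisy) gain is above average.
retained = [t for t in teachers if observed_gain(t) > 0]

mean_all = sum(teachers) / len(teachers)
mean_retained = sum(retained) / len(retained)

print(f"mean true ability, all teachers:      {mean_all:.2f}")
print(f"mean true ability, retained teachers: {mean_retained:.2f}")
```

Running this, the retained group’s average true ability sits well above the overall average – which is the sense in which, given enough time and churn, selecting on scores "works," whatever else it does.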

Now, we all know the arguments about the limitations of test scores. We all know they’re largely true. Some people take them too far, others are too casual in their disregard. The question is not whether test scores provide a comprehensive measure of learning or subject mastery (of course they don’t). The better question is the extent to which teachers (and schools) who increase test scores a great deal are imparting and/or reinforcing the skills and traits that students will need after their K-12 education, relative to teachers who produce smaller gains. And this question remains largely unanswered.

This is dangerous, because if there is an unreliable relationship between teaching essential skills and the boosting of test scores, then success is no longer success. And by selecting teachers and schools based on those scores, we will have deliberately engineered our public education system to fail in spite of success.

It may be only then that we truly realize what we have done.

Data-Driven Decisions, No Data

According to an article in yesterday’s Washington Post, the outcome of the upcoming D.C. mayoral primary may depend in large part on gains in students’ “test scores” since Mayor Adrian Fenty appointed Michelle Rhee to serve as chancellor of the D.C. Public Schools (DCPS).

That struck me as particularly interesting because, as far as I can tell, Michelle Rhee has never released any test scores to the public. Not an average test score for any grade level or for any of the district’s schools or any subgroup of its students. None.

A Below Basic Understanding Of Proficiency

Given our extreme reliance on test scores as measures of educational success and failure, I'm sorry I have to make this point: proficiency rates are not test scores, and changes in proficiency rates do not necessarily tell us much about changes in test scores.

Yet, for example, in the Washington Post editorial about the latest test results from the District of Columbia Public Schools, at no fewer than seven different points (in a 450-word piece) do they refer to proficiency rates (and changes in these rates) as "scores." This is only one example of many.

So, what's the problem?
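One way to see it, with purely made-up numbers: a proficiency rate only counts how many students clear a cut score, so it can jump upward even while the average score is falling.

```python
# Hypothetical: two years of scores for the same five students,
# with "proficient" defined as a score of 40 or above.
year1 = [39, 39, 39, 80, 80]
year2 = [41, 41, 41, 50, 50]

cutoff = 40
rate1 = sum(s >= cutoff for s in year1) / len(year1)  # 2 of 5 proficient
rate2 = sum(s >= cutoff for s in year2) / len(year2)  # 5 of 5 proficient

avg1 = sum(year1) / len(year1)  # average score: 55.4
avg2 = sum(year2) / len(year2)  # average score: 44.6

print(f"proficiency rate: {rate1:.0%} -> {rate2:.0%}")  # rate soars...
print(f"average score:    {avg1:.1f} -> {avg2:.1f}")    # ...while the average falls
```

Three students inching just over the bar, while the top performers slide, produces a "dramatic gain" in proficiency and a decline in actual scores at the same time – which is why treating the two as interchangeable is so misleading.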