Assessing Ourselves To Death

** Reprinted in the Washington Post

I have two points to make. The first is something that I think everyone knows: Educational outcomes, such as graduation and test scores, are signals of or proxies for the traits that lead to success in life, not the cause of that success.

For example, it is well-documented that high school graduates earn more, on average, than non-graduates. Thus, one often hears arguments that increasing graduation rates will drastically improve students’ future prospects, and the performance of the economy overall. Well, not exactly.

The piece of paper, of course, only goes so far. Rather, the benefits of graduation arise because graduates are more likely to possess the skills – including the critical non-cognitive sort – that make people good employees (and, on a highly related note, because employers know that, and use credentials to screen applicants).

We could very easily increase the graduation rate by easing requirements, but this wouldn’t do much to help kids advance in the labor market. They might get a few more calls for interviews, but over the long haul, they’d still be at a tremendous disadvantage if they lacked the required skills and work habits.

Moreover, employers would quickly catch on, and adjust course accordingly. They’d stop relying as much on high school graduation to screen potential workers. This would not only deflate the economic value of a diploma, but high school completion would also become a less useful measure for policymakers and researchers.

This is, of course, one of the well-known risks of a high-stakes focus on metrics such as test scores. Test-based accountability presumes that tests can account for ability. We all know about what is sometimes called “Campbell’s Law” (roughly, the more a quantitative indicator is used for social decision-making, the more it is subject to corruption pressures and the more it distorts the processes it is meant to monitor), and we’ve all heard the warnings and complaints about so-called “teaching to the test.” Some people take these arguments too far, while others are too casually dismissive. In general, though, the public (if not all policymakers) has a sense that test-based accountability can be a good thing so long as it is done correctly and doesn’t go too far.

Now, here’s my second point: I’m afraid we’ve gone too far.

I am not personally opposed to a healthy dose of test-based accountability. I believe that it has a useful role to play, both for measuring performance and for incentivizing improvement (and, of course, the use of testing data for research purposes is critical). I acknowledge that there’s no solid line and I realize that what I'm saying is not at all original, but I’m at the point where I think we need to stop putting more and more faith in instruments that are not really designed to bear that burden.

One can often hear people say that test-based accountability won’t "work." The reality, however, is that it probably will.

If we mold policy such that livelihoods depend on increasing scores, and we select and deselect people and institutions based on their ability to do so, then, over time, scores will most likely go up.

The question is what that will mean. A portion of this increase will reflect a concurrent improvement in useful skills and knowledge. But part of it will not (e.g., various forms of score inflation). To the degree the latter is the case, not only will it not help the students, but we will have more and more trouble knowing where we stand. Researchers will be less able to evaluate policies. We’ll end up celebrating and making decisions based on success that isn't really success, and that's worse than outright failure.

Obviously, this is all a matter of balancing the power of measurement and incentives against the risks. We most certainly should hold schools accountable for their results, and there are, at least at the moment, relatively few feasible alternatives to standardized tests. Furthermore, states have ways to keep track of tests' validity, such as comparing them with the results of low-stakes tests, so we're not quite flying blind here (though, even at this early stage, some of these comparisons are not exactly encouraging, and we sometimes seem unaware of what it means to have to resort to low-stakes tests to justify high-stakes test-based policies).

But think about what’s been happening – the big picture. Tests have been used for decision making for a long time, but, over the past decade or so, U.S. public schools have been held formally accountable for those outcomes. The pressure to boost scores is already very high - I would say too high in some places - but it's now shifting into overdrive. More and more schools are being subjected to closure, restructuring, reconstitution, and other high-stakes consequences based mostly on how their students’ test scores turn out. Several states are awarding grant money and cash bonuses using test results. Schools are receiving grades and ratings, and, just like their students, their futures depend on them.

In many places, the jobs and reputations of superintendents and principals rise and fall with scale scores and proficiency rates. Increases in these measures are a necessary (though hopefully not sufficient) condition for being considered a success. Every year, the release of data makes headlines. Mayors run campaigns on them. Districts hire publicity experts to present results in the most favorable light.

Moreover, in just two or three short years, it has become the norm to evaluate teachers based to varying degrees on their students’ testing outcomes. Non-test measures are often deemed suitable based on their correlation with test measures. Teachers are the core of any education system, and we are increasingly moving toward hiring, paying and firing them using standardized tests (which, by the way, most of them don't particularly trust).

New assessments - additional grades and subjects - are being designed largely for accountability purposes. There is a growing movement to hold teacher preparation programs accountable in part for the test-based productivity of their graduates. Websites and other resources are proliferating, allowing parents to choose schools (and even teachers) using testing data. Districts hire high-priced consultants specifically to boost achievement outcomes. We are even experimenting with test-based incentives for students.

Any one of these developments, or a group of them, might very well be a good thing. As a whole, however, they show how, at every level of our system, we are increasingly allocating resources and picking winners and losers – people and institutions – based in whole or in part on scores. This is a fundamental change in the relationships and structure of U.S. schools.

(And, making things worse, the manner in which the data are used and/or interpreted is often inappropriate.)

Few if any other nations in the world have gone this far. That doesn’t make it wrong, but it does mean that we have little idea how this will turn out.

I suspect that our relentless, expanding focus on high-stakes testing has already eroded the connection between scores and future outcomes. Some of this erosion is inevitable and even tolerable, but the more it occurs, the less able we’ll be to have any sense of what works or where we are. I think that research on this connection and how it is changing over time is among the most important areas in education policy today.

And I’m troubled by the possibility that, if we don't pull back the reins, this research may eventually show that we pushed the pendulum to its ultimate breaking point and structured a huge portion of our education system around measures that were only useful in the first place because we didn’t use them so much.

- Matt Di Carlo

Blog Topics

Education Policy

Education Reform

NCLB

I was once like you - I believed standardized testing appropriately defined and measured what students knew and could do. After many years of teaching, I trust NO data. Students are being taught how to take tests by teachers who excel at reproducing questions that appear on state-mandated tests.

Everything we do hinges on getting kids to excel at testing, sort of like in Asian countries. The only difference is our kids don't care.

This is not education, it is educational perversion.

Love what you said about grad rates, that grad rates are signals and that our education system should be more concerned about what a HS diploma means rather than on how many can be awarded.

One thing I'd like to add on the subject of high-stakes testing is that it's not only the schools, teachers, soon-to-be teachers, principals, and superintendents who are affected; the students, how they view learning and school, and the quality of education they receive have also been greatly affected. Listen to what a group of 4th grade public school students have to say about school, testing, and learning (and then talk to the parents who care for them, and then talk to the HS and college instructors who teach them later), and it will become even clearer how destructive high-stakes testing is on so many levels: psychologically, intellectually, spiritually, physically.

Very well written. Thanks.

Nailed it.

Thought provoking.

Can I attempt to pin you down on something?

You write: "I suspect that our relentless, expanding focus on high-stakes testing has already eroded the connection between scores and future outcomes."

My question...if future studies emerge to show close correlations b/w scores on high stakes tests and future life outcomes, will you walk this one back? Cuz I think they're coming...

MG,

Absolutely. Gladly. But you're not pinning me down, and there's nothing to walk back. I don't want this to happen. It's not an argument; it's a genuine concern (one that's not even remotely original).

Two more things. First, remember that this is all a matter of degree. Even if the connection between scores and future outcomes is eroding, there will still be a connection. Second, even if the current situation is okay, my primary concern is over the long haul. It will be many years before this plays out (and the eventual outcomes will unfold in stages - college enrollment, college completion, employment, early career earnings, and so on). But monitoring along the way is helpful.

Again, though, I'm (uncharacteristically) happy to be wrong about this.

Matt, very thoughtful post. I'm getting close to arriving at your position.

MG - I would not be surprised to continue to see close correlations between high stakes tests and future outcomes in studies for some time, for many reasons. First, most studies will continue to rely on lots of data collected before the high accountability context of the past couple of years. Indeed, didn't the Chetty et al. study rely primarily on low-stakes testing data? This was by definition a necessity, since they were looking at long-term outcomes and backtracked to historical data to predict these outcomes.

Second, test scores may continue to be strong predictors because they are proxies for skills and knowledge that are not what is actually directly assessed (i.e., the subject matter) but that still matter. For instance, a strong work ethic, ability to concentrate on a relatively boring task for a long period of time, problem solving skills, ability to defer gratification, goal orientation, etc.

The danger is that we make false attributions about what skills, knowledge, and dispositions actually matter for long-term success and skew our education programs in response to this. (This is not even getting into the whole cheating issue.)

I'm not so sure that I agree with Matt that everyone knows that: "Educational outcomes, such as graduation and test scores, are signals of or proxies for the traits that lead to success in life, not the cause of that success." Many of the less thoughtful educators and policymakers act as if they don't fully understand this.

I am a music teacher, and I believe that music study is brain-training because of the body's simultaneous engagement of both brain hemispheres during musical experiences. However, I believe it's important to examine reality. Stating that graduation is the reason for success is like saying that studying music simply makes you a better student. While there is research that implies that "music makes you 'smarter'," after a fashion, there are other studies that point to a different reason as to why school music programs have so many successful students--strong students are attracted to these programs. Strong students are willing to work hard to achieve.

MD, thanks.

Maybe I could ask it differently.

How would we know if the connection between scores and future outcomes "erodes?" What numbers would emerge to show they're weaker than "now"?

MG,

For example, put very simply, a decrease in the consistency/degree to which levels and/or increases in scores/graduation predict future outcomes such as college attainment, earnings, etc.

One example (of many): http://bit.ly/RJZiTD

(Note: These relationships are of course tough to disentangle and attribute to causes, especially over time.)

Shanker Blog
