What Kindergartners Might Teach Us About Test-Based Accountability

There is an ongoing debate about widespread administration of standardized tests to kindergartners. This is of course a serious decision. My personal opinion about whether this is a good idea depends on several factors, such as how good the tests will be and, most importantly, how the results will be used (and I cannot say that I am optimistic about the latter).

Although the policy itself must be considered seriously on its merits, there is one side aspect of testing kindergartners that fascinates me: It would demonstrate how absurd it is to judge school performance, as NCLB does, using absolute performance levels – i.e., how highly students score on tests, rather than their progress over time.

Basically, the kindergarten tests would inevitably shake out the same way as those administered in later grades. Schools and districts serving more disadvantaged students would score substantially lower than their counterparts in more affluent areas. If the scores were converted to proficiency rates or similar cut-score measures, they would show extremely low pass rates in urban districts such as Detroit.

A thought experiment: I would be very curious to see whether the same ill-informed advocates who use absolute performance measures to deem some schools and districts “low-performing” or “high-performing” would do the same with the kindergarten results. I’m pretty sure they would not, for obvious reasons: These tests would have been administered to students in their first year of formal K-12 schooling (perhaps even at the beginning of that year). And there would be no plausible way to attribute the testing results to the quality of schools’ educational services, since kindergartners would not yet have had the benefit of receiving these services.

Rather, they would have entered the schooling system at a certain level of performance (at least to the degree that tests could actually measure it). Some of them would be way ahead of others, and these discrepancies would be strongly and systematically associated with students’ background characteristics, such as income and parental education level.

So, no – I don’t think anyone would mistake absolute scores on kindergarten tests for measures of school performance. Because doing so would be ridiculous.

And if I’m correct about this – that everyone would recognize that the kindergarten results were pretty much entirely reflective of differences in non-schooling inputs (e.g., parental resources, access to early childhood education, etc.) prior to entry into the K-12 schooling system – then it’s a very short trip to questioning other, more widely accepted notions.

Consider, for instance, the results for fourth graders on the National Assessment of Educational Progress. Every time these results are released, scores of advocates and commentators portray high scores as representative of “high performing” states or districts, and low scores as evidence of low performance. Yet fourth graders have only been in the K-12 schooling system for a few years.

It may be reasonable to believe that differences in states’ average fourth grade scores – say, between Massachusetts and Mississippi – may partially reflect differences in the quality of those states’ schooling systems, but isn’t it a little absurd to believe that schooling inputs are the primary cause? Isn’t it more plausible to believe that the results for these states’ fourth graders are primarily a reflection of differences at the starting gate, rather than the quality of their schooling over three or four short years? And the same question applies, to a lesser extent, to the results from later grades as well.

If so, then how can we possibly justify the widespread portrayal of high-scoring states as having “high performing” schools? For that matter, how can we justify the fact that NCLB does exactly the same thing (i.e., relies on unadjusted proficiency rates as performance measures), and that the NCLB waivers are in most respects perpetuating this approach?

Similarly, the new tests might also demonstrate how much the average scores of incoming cohorts of kindergartners can fluctuate, especially at the school and district levels. This would illustrate, at least indirectly, that what is commonly (and mistakenly) called "growth", for example, changes between years in average fourth or eighth grade scores on NAEP's Trial Urban District Assessment (TUDA), is often largely attributable to changes in the sample of students taking the test.

Of course, I don’t actually advocate for a kindergarten testing regime based on my own frustrations with education measurement and how test results are portrayed in our debate (if anything, the latter is why I'm skeptical of the former). And there are some places where kindergartners are already tested. That said, I do think some of the participants in the debate, not to mention policymakers, could use a little “shock therapy” to see just how indefensible their assumptions actually are.

- Matt Di Carlo


I agree that absolute proficiency, when used alone, is a poor metric to gauge a school's quality. However, I disagree that it is absurd to use 4th grade NAEP scores to assess a school system. Of course, factors other than school inputs are at play, just as they are in 8th grade and 12th grade. However, these kids have likely been in the public school system for 5 years by the time they take the 4th grade NAEP, and the NAEP is simply assessing whether they have learned what a 4th grader should know.

Florida assesses incoming kindergartners with a very weak test and then uses the results to judge the quality of funded programs for four-year-olds. A provider can be eliminated from the 540-hour program based on a child's scores in the first 30 days of kindergarten. I wish I were making this up, but it's true.