The Uncertain Short-Term Future Of School Growth Models

Over the past 20 years, public schools in the U.S. have come to rely more and more on standardized tests, and the COVID-19 pandemic has halted the flow of these data. This is hardly among the most important disruptions that teachers, parents, and students have endured over the past year or so. But skipping a year (or more) of testing also has implications for estimating growth models, which are statistical approaches for assessing the association between students' testing progress and those students' teachers, schools, or districts.

This type of information, used properly, is always potentially useful, but it may be particularly timely right now, as we seek to understand how the COVID-19 pandemic affected educational outcomes, and, perhaps, how those outcomes varied by different peri-pandemic approaches to schooling. This includes the extent to which there were meaningful differences by student subgroup (e.g., low-income students who may have had more issues with virtual schooling). 

To be clear, the question of when states should resume testing should be evaluated based on what’s best for schools and students, and in my view this decision should not include consideration of any impact on accountability systems (the latest development is that states will not be allowed to cancel testing entirely but may be allowed to curtail it). In either case, though, the fate of growth models over the next couple of years is highly uncertain. The models rely on tracking student test scores over time, and so skipping a year (and maybe even more) is obviously a potential problem. A new working paper takes a first step toward assessing the short-term feasibility of growth estimates (specifically school and district scores). But this analysis also provides a good context for a deeper discussion of how we use (and sometimes misuse) testing data in education policy.
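
To make the mechanics concrete, here is a minimal sketch of one common growth-model setup: regress students' current scores on their prior scores, then average the residuals by school. The column names, the 2019-to-2021 lag (i.e., a skipped testing year in between), and the absence of student covariates or shrinkage are all simplifying assumptions for illustration, not a description of any state's actual model.

```python
# Minimal sketch of a school growth (value-added-style) estimate,
# assuming a hypothetical student-level dataset with columns
# school_id, score_2019, and score_2021 (2020 testing skipped).
import pandas as pd
import statsmodels.formula.api as smf

def school_growth_estimates(df: pd.DataFrame) -> pd.Series:
    """Return each school's mean residual from a prior-score regression."""
    # Predict each student's 2021 score from their 2019 score.
    model = smf.ols("score_2021 ~ score_2019", data=df).fit()
    df = df.assign(residual=model.resid)
    # A school's average residual is a crude growth estimate: how far
    # its students score above or below statistical expectation.
    return df.groupby("school_id")["residual"].mean()
```

Real implementations add student characteristics, multiple prior scores, and empirical Bayes shrinkage, but the basic issue is visible even in this sketch: the longer the gap between the prior and current scores, the noisier the resulting estimates are likely to be, which is the kind of problem the working paper examines.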

For Florida's School Grading System, A Smart Change With Unexpected Effects

Last year, we discussed a small but potentially meaningful change that Florida made to its school grading system, one that might have attenuated a long-standing tendency of its student “gains” measures, by design, to favor schools that serve more advantaged students. Unfortunately, this result doesn’t seem to have been achieved.

Prior to 2014-15, one of the criteria by which Florida students could be counted as having “made gains” was scoring as proficient or better in two consecutive years, without having dropped a level (e.g., from advanced to proficient). Put simply, this meant that students scoring above the proficiency threshold would be counted as making “gains,” even if they in fact made only average or even below average progress, so long as they stayed above the line. As a result of this somewhat crude “growth” measure, schools serving large proportions of students scoring above the proficiency line (i.e., schools in affluent neighborhoods) were virtually guaranteed to receive strong “gains” scores. Such “double counting” in the “gains” measures likely contributed to a very strong relationship between schools’ grades and their students’ socio-economic status (as gauged, albeit roughly, by subsidized lunch eligibility rates).

Florida, to its credit, changed this “double counting” rule effective in 2014-15. Students who score as proficient in two consecutive years are no longer automatically counted as making “gains.” They must also exhibit some score growth in order to receive the designation.
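
As a rough illustration of the logic change, consider the sketch below. The achievement levels, proficiency threshold, and scale scores are hypothetical stand-ins, not FLDOE's actual business rules, and this is only one of several pathways by which a student could be counted as making gains.

```python
# Hypothetical illustration of Florida's "gains" rule change.
# Levels run 1 (lowest) through 5 (advanced); level 3 and up is
# treated as proficient here purely for the sake of the example.
PROFICIENT = 3

def made_gains_old(prior_level: int, current_level: int) -> bool:
    """Pre-2014-15 pathway: proficient both years, no level drop."""
    return prior_level >= PROFICIENT and current_level >= prior_level

def made_gains_new(prior_level: int, current_level: int,
                   prior_score: float, current_score: float) -> bool:
    """Post-change pathway: the old criteria plus actual score growth."""
    return (made_gains_old(prior_level, current_level)
            and current_score > prior_score)

# A student holding steady at level 3 with a flat scale score counted
# as making "gains" under the old rule, but no longer does.
assert made_gains_old(3, 3)
assert not made_gains_new(3, 3, 300.0, 300.0)
```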

Improving Accountability Measurement Under ESSA

Despite the recent repeal of federal guidelines for states’ compliance with the Every Student Succeeds Act (ESSA), states are steadily submitting their proposals, and they are rightfully receiving some attention. The policies in these proposals will have far-reaching consequences for the future of school accountability (among many other types of policies), as well as, of course, for educators and students in U.S. public schools.

There are plenty of positive signs in these proposals, which are indicative of progress in the role of proper measurement in school accountability policy. It is important to recognize this progress, but impossible not to see that ESSA perpetuates long-standing measurement problems that were institutionalized under No Child Left Behind (NCLB). These issues, particularly the ongoing failure to distinguish between student and school performance, continue to dominate accountability policy to this day. Part of the confusion stems from the fact that school and student performance are not independent of each other. For example, a test score, by itself, gauges student performance, but it also reflects, at least in part, school effectiveness (i.e., the score might have been higher or lower had the student attended a different school).

Both student and school performance measures have an important role to play in accountability, but distinguishing between them is crucial. States’ ESSA proposals make the distinction in some respects but not in others. The result may end up being accountability systems that, while better than those under NCLB, are still severely hampered by improper inference and misaligned incentives. Let’s take a look at some of the key areas where we find these issues manifested.

Thinking About Tests While Rethinking Test-Based Accountability

Earlier this week, per the late summer ritual, New York State released its testing results for the 2015-2016 school year. New York City (NYC), always the most closely watched set of results in the state, showed a 7.6 percentage point increase in its ELA proficiency rate, along with a 1.2 percentage point increase in its math rate. These increases were roughly equivalent to the statewide changes.

City officials were quick to pounce on the results, which were called “historic,” and “pure hard evidence” that the city’s new education policies are working. This interpretation, while standard in the U.S. education debate, is, of course, inappropriate for many reasons, all of which we’ve discussed here countless times and will not detail again (see here). Suffice it to say that even under the best of circumstances these changes in proficiency rates are only very tentative evidence that students improved their performance over time, to say nothing of whether that improvement was due to a specific policy or set of policies.

Still, the results represent good news. A larger proportion of NYC students are scoring proficient in math and ELA than did last year. Real improvement is slow and sustained, and this is improvement. In addition, the proficiency rate in NYC is now on par with the statewide rate, which is unprecedented. There are, however, a couple of additional issues with these results that are worth discussing quickly.

A Small But Meaningful Change In Florida's School Grades System

Beginning in the late 1990s, Florida became one of the first states to assign performance ratings to public schools. The purpose of these ratings, which are in the form of A-F grades, is to communicate to the public “how schools are performing relative to state standards.” For elementary and middle schools, the grades are based entirely on standardized testing results.

We have written extensively here about Florida’s school grading system (see here for just one example), and have used it to illustrate features that can be found in most other states’ school ratings. The primary issue is the heavy reliance that states place on how highly students score on tests, which tells you more about the students the schools serve than about how well they serve those students – i.e., it conflates school and student performance. Put simply, some schools exhibit lower absolute testing performance levels than do other schools, largely because their students enter performing at lower levels. As a result, schools in poorer neighborhoods tend to receive lower grades, even though many of these schools are very successful in helping their students make fast progress during their few short years of attendance.

Although virtually every state's school rating system has this same basic structure to varying degrees, Florida's system warrants special attention, as it was one of the first in the nation and has been widely touted and copied (as well as researched -- see our policy brief for a review of this evidence). It is also noteworthy because it contains a couple of interesting features, one of which exacerbates the aforementioned conflation of student and school performance in a largely unnoticed manner. But this feature, discussed below, has just been changed by the Florida Department of Education (FLDOE). This correction merits discussion, as it may be a sign of improvement in how policymakers think about these systems.

Charter Schools And Longer Term Student Outcomes

An important article in the Journal of Policy Analysis and Management presents results from one of the first published analyses to look at the long-term impact of attending charter schools.

The authors, Kevin Booker, Tim Sass, Brian Gill, and Ron Zimmer, replicate part of their earlier analysis of charter schools in Florida and Chicago (Booker et al. 2011), which found that students attending charter high schools had a substantially higher chance of graduation and college enrollment (relative to students who attended charter middle schools but regular public high schools). For this more recent paper, they extend the previous analysis, including the addition of two very important, longer-term outcomes – college persistence and labor market earnings.

The limitations of test scores, the current coin of the realm, are well known; similarly, outcomes such as graduation may fail to capture meaningful skills. This paper is among the first to extend the charter school effects literature, which has long relied almost exclusively on test scores, into the longer term postsecondary and even adulthood realms, representing a huge step forward for this body of evidence. It is a development that is likely to become more and more common, as longitudinal data hopefully become available from other locations. And this particular paper, in addition to its obvious importance for the charter school literature, also carries some implications regarding the use of test-based outcomes in education policy evaluation.

Where Al Shanker Stood: The Importance And Meaning Of NAEP Results

In this New York Times piece, published on July 29, 1990, Al Shanker discusses the results of the National Assessment of Educational Progress (NAEP), and what they suggested about the U.S. education system at the time.

One of the things that has influenced me most strongly to call for radical school reform has been the results of the National Assessment of Educational Progress (NAEP) examinations. These exams have been testing the achievement of our 9-, 13- and 17-year-olds in a number of basic areas over the past 20 years, and the results have been almost uniformly dismal.

According to NAEP results, no 17-year-olds who are still in school are illiterate and innumerate - that is, all of them can read the words you would find on a cereal box or a billboard, and they can do simple arithmetic. But very few achieve what a reasonable person would call competence in reading, writing or computing.

For example, NAEP's 20-year overview, Crossroads in American Education, indicated that only 2.6 percent of 17-year-olds taking the test could write a good letter to a high school principal about why a rule should be changed. And when I say good, I'm talking about a straightforward presentation of a couple of simple points. Only 5 percent could grasp a paragraph as complicated as the kind you would find in a first-year college textbook. And only 6 percent could solve a multi-step math problem like this one: "Christine borrowed $850 for one year from Friendly Finance Company. If she paid 12% simple interest on the loan, what was the total amount she repaid?"
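
For reference, the "multi-step" solution to that item is just simple interest followed by addition, sketched here in LaTeX notation:

```latex
% Worked solution to the quoted NAEP item:
I = Prt = \$850 \times 0.12 \times 1 = \$102,
\qquad \text{total repaid} = P + I = \$850 + \$102 = \$952
```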

Is The Social Side Of Education Touchy Feely?

That's right, measuring social and organizational aspects of schools is just... well, "touchy feely." We all intuitively grasp that social relations are important in our work environments, that having mentors on the job can make a world of difference, that knowing how to work with colleagues matters to the quality of the end product, that innovation and improvement rely on the sharing of ideas, that having a good relationship with supervisors influences both engagement and performance, and so on.

I could go on, but I don't have to; we all just know these things. But is there hard evidence, other than common sense and our personal experiences? Behaviors such as collaboration and interaction or qualities like trust are difficult to quantify. In the end, is it possible that they are just 'soft' and that, even if they’re important (and they are), they just don't belong in policy conversations?

Wrong.

In this post, I review three distinct methodological approaches that researchers have used to understand social-organizational aspects of schools. Specifically, I selected studies that examine the relationship between aspects of teachers' social-organizational environments and their students' achievement growth. I focus both on the methods and on the substantive findings. This is because I think some basic sense of how researchers look at complex constructs like trust or collegiality can deepen our understanding of this work and lead us to embrace its implications for policy and practice more fully.

Charter Schools, Special Education Students, And Test-Based Accountability

Opponents often argue that charter schools tend to serve disproportionately few special education students. And, while there may be exceptions and certainly a great deal of variation, that argument is essentially accurate. Regardless of why this is the case (and there is plenty of contentious debate about that), some charter school supporters have acknowledged that it may be a problem insofar as charters are viewed as a large-scale alternative to regular public schools.

For example, Robin Lake, writing for the Center for Reinventing Public Education, takes issue with her fellow charter supporters who assert that “we cannot expect every school to be all things to every child.” She argues instead that schools, regardless of their governance structures, should never “send the soft message that kids with significant differences are not welcome,” or treat them as if “they are somebody else’s problem.” Rather, Ms. Lake calls upon charter school operators to take up the banner of serving the most vulnerable and challenging students and “work for systemic special education solutions.”

These are, needless to say, noble thoughts, with which many charter opponents and supporters can agree. Still, there is a somewhat more technocratic but perhaps more actionable issue lurking beneath the surface here: Put simply, until test-based accountability systems in the U.S. are redesigned such that they judge schools by their effectiveness in serving students, rather than by which students they serve, there will be a rather strong disincentive for charters to focus aggressively on serving special education students. Moreover, whatever accountability disadvantage may be faced by regular public schools that serve higher proportions of special education students pales in comparison with that faced by all schools, charter and regular public, located in higher-poverty areas. In this sense, then, addressing this problem is something that charter supporters and opponents should be doing together.

The Big Story About Gender Gaps In Test Scores

The OECD recently published a report about differences in test scores between boys and girls on the Programme for International Student Assessment (PISA), a test of 15-year-olds conducted every three years in multiple subjects. The main summary finding is that, in most nations, girls are significantly less likely than boys to score below the "proficient" threshold in all three subjects (math, reading and science). The report also presents results for survey items and other outcomes.

First, it is interesting to me how discussions of these gender gaps differ from those about gaps between income or ethnicity groups. Specifically, when we talk about gender gaps, we interpret them properly – as gaps in measured performance between groups of students. Discussions of gaps between groups defined in terms of income or ethnicity, on the other hand, are almost always framed in terms of school performance.

This is partially because schools in the U.S. are segregated by income and ethnicity, but not really by gender, and also because some folks have a tendency to overestimate the degree to which income- and ethnicity-based achievement gaps stem from systematic variation in schooling inputs, whereas in reality they are more a function of non-school factors (though, of course, schools matter, and differences in school quality reinforce the non-school-based impact). That said, returning to the findings of this report, I was slightly concerned with how, in some cases, they were reported in the media.