The Uncertain Short-Term Future Of School Growth Models

Over the past 20 years, public schools in the U.S. have come to rely more and more on standardized tests, and the COVID-19 pandemic has halted the flow of these data. This is hardly among the most important disruptions that teachers, parents, and students have endured over the past year or so. But one consequence of skipping a year (or more) of testing is its effect on growth models, which are statistical approaches for assessing the association between students' testing progress and those students' teachers, schools, or districts.

This type of information, used properly, is always potentially useful, but it may be particularly timely right now, as we seek to understand how the COVID-19 pandemic affected educational outcomes, and, perhaps, how those outcomes varied by different peri-pandemic approaches to schooling. This includes the extent to which there were meaningful differences by student subgroup (e.g., low-income students who may have had more issues with virtual schooling). 

To be clear, the question of when states should resume testing should be evaluated based on what’s best for schools and students, and in my view this decision should not include consideration of any impact on accountability systems (the latest development is that states will not be allowed to cancel testing entirely but may be allowed to curtail it). In either case, though, the fate of growth models over the next couple of years is highly uncertain. The models rely on tracking student test scores over time, and so skipping a year (and maybe even more) is obviously a potential problem. A new working paper takes a first step toward assessing the short-term feasibility of growth estimates (specifically school and district scores). But this analysis also provides a good context for a deeper discussion of how we use (and sometimes misuse) testing data in education policy.

The False Choice Of Growth Versus Proficiency

Tennessee is considering changing its school accountability system such that schools have the choice of having their test-based performance judged by either status (how highly students score) or growth (how much progress students make over the course of the year). In other words, if schools do poorly on one measure, they are judged by the other (apparently, Texas already has a similar system in place).

As we’ve discussed here many times in the past, status measures, such as proficiency rates, are poor measures of school performance, since some students, particularly those living in poverty, enter their schools far behind their more affluent peers. As a result, schools serving larger proportions of poor students will exhibit lower scores and proficiency rates, even if they are very effective in compelling progress from their students. That is why growth models, which focus on individual student gains over time, are a superior measure of school performance per se.
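To make the distinction concrete, here is a minimal sketch with made-up numbers. Everything in it is hypothetical: the two schools, the scores, the cutoff of 300, and the function names. The simple average gain is only a stand-in for an actual growth model, which would be considerably more elaborate.

```python
from statistics import mean

PROFICIENCY_CUTOFF = 300  # hypothetical cut score, for illustration only


def proficiency_rate(scores):
    """Status measure: share of students at or above the cutoff this year."""
    return sum(s >= PROFICIENCY_CUTOFF for s in scores) / len(scores)


def mean_gain(prior, current):
    """Crude growth measure: average change in each student's own score."""
    return mean([c - p for p, c in zip(prior, current)])


# School A (hypothetical): students enter far behind but make large gains.
school_a_prior = [240, 250, 260, 270]
school_a_current = [270, 285, 290, 305]

# School B (hypothetical): students enter well above the cutoff and gain little.
school_b_prior = [310, 320, 330, 340]
school_b_current = [312, 325, 331, 344]

for name, prior, current in [
    ("A", school_a_prior, school_a_current),
    ("B", school_b_prior, school_b_current),
]:
    print(name, proficiency_rate(current), mean_gain(prior, current))
```

In this toy case School B looks far better on status (every student is proficient, versus one in four at School A), while School A's students gain roughly ten times as many points on average. The two measures answer different questions.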

This so-called “growth versus proficiency” debate has resurfaced several times over the years, and it was particularly prevalent during the time when states were submitting proposals for their accountability systems during reauthorization of the Elementary and Secondary Education Act. The policy that came out of these discussions was generally promising, as many states moved at least somewhat toward weighting growth model estimates more heavily. 

At the same time, however, it is important to mention that the “growth versus proficiency” debate sometimes implies that states must choose between these two types of indicators. This is misleading. And the Tennessee proposal is a very interesting context for discussing this, since it essentially treats these two types of measures as interchangeable. The reality, of course, is that both types of measures transmit valuable but different information, and both have a potentially useful role to play in accountability systems.

Where Do Achievement Gaps Come From?

For almost two decades now, educational accountability policy in the U.S. has included a focus on the performance of student subgroups, such as those defined by race and ethnicity, income, or special education status. The (very sensible) logic behind this focus is the simple fact that aggregate performance measures, whether at the state, district, or school level, often mask large gaps between subgroups.

Yet one of the unintended consequences of this subgroup focus has been confusion among both policymakers and the public as to how to interpret and use subgroup indicators in formal school accountability systems, particularly when those indicators are expressed as simple “achievement gaps” or “gap closing” measures. This is not only because achievement gaps can narrow for undesirable reasons and widen for desirable reasons, but also because many gaps exist prior to entry into the school (or district). If, for instance, a large Hispanic/White achievement gap for a given cohort exists at the start of kindergarten, it is misleading and potentially damaging to hold a school accountable for the persistence of that gap in later grades – particularly in cases where public policy has failed to provide the extra resources and supports that might help lower-performing students make accelerated achievement gains every year. In addition, the coarseness of current educational variables, particularly those usually used as income proxies, limits the detail and utility of some subgroup measures.
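The point about gaps narrowing for undesirable reasons and widening for desirable ones is easy to see with a little arithmetic. The sketch below uses invented average scores for two hypothetical subgroups; none of the numbers come from actual data.

```python
def gap(higher_group_mean, lower_group_mean):
    """Simple achievement gap: difference between two subgroup averages."""
    return higher_group_mean - lower_group_mean


# Hypothetical starting point: subgroup averages of 320 and 290.
baseline = gap(320, 290)

# The gap narrows because the lower-scoring group improves (desirable).
narrowed_by_progress = gap(320, 300)

# The gap narrows by the same amount because the higher-scoring group
# declines (undesirable), yet the gap measure cannot tell the difference.
narrowed_by_decline = gap(310, 290)

# The gap widens even though both groups improved (arguably desirable).
widened_with_progress = gap(340, 300)

print(baseline, narrowed_by_progress, narrowed_by_decline, widened_with_progress)
```

The second and third scenarios produce identical "gap closing," which is one reason a gap measure by itself says little about how well a school is serving either group.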

A helpful and timely little analysis by David Figlio and Krzysztof Karbownik, published by the Brookings Institution, addresses some of these issues, and the findings have clear policy implications.

For Florida's School Grading System, A Smart Change With Unexpected Effects

Last year, we discussed a small but potentially meaningful change that Florida made to its school grading system, one that might have attenuated a long-standing tendency of its student “gains” measures, by design, to favor schools that serve more advantaged students. Unfortunately, this result doesn’t seem to have been achieved.

Prior to 2014-15, one of the criteria by which Florida students could be counted as having “made gains” was scoring as proficient or better in two consecutive years, without having dropped a level (e.g., from advanced to proficient). Put simply, this meant that students scoring above the proficiency threshold would be counted as making “gains,” even if they in fact made only average or even below average progress, so long as they stayed above the line. As a result of this somewhat crude “growth” measure, schools serving large proportions of students scoring above the proficiency line (i.e., schools in affluent neighborhoods) were virtually guaranteed to receive strong “gains” scores. Such “double counting” in the “gains” measures likely contributed to a very strong relationship between schools’ grades and their students’ socio-economic status (as gauged, albeit roughly, by subsidized lunch eligibility rates).

Florida, to its credit, changed this “double counting” rule effective in 2014-15. Students who score as proficient in two consecutive years are no longer automatically counted as making “gains.” They must also exhibit some score growth in order to receive the designation.
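A rough sketch of the difference between the two rules, as described above, might look like the following. It covers only this one criterion (Florida used others as well). The five-level scale with level 3 as the proficiency threshold, the scale scores, the function names, and the reading of “some score growth” as any increase in scale score are all assumptions made for illustration, not the state's actual business rules.

```python
PROFICIENT_LEVEL = 3  # assumption: levels run 1-5, with 3 and up counted as proficient


def made_gains_old(prior_level, current_level):
    """Old criterion: proficient or better in both years, without dropping a level."""
    return (
        prior_level >= PROFICIENT_LEVEL
        and current_level >= PROFICIENT_LEVEL
        and current_level >= prior_level
    )


def made_gains_new(prior_level, current_level, prior_score, current_score):
    """Revised criterion: the old conditions plus some score growth.

    "Some score growth" is modeled here as a higher scale score than the
    prior year, which is an assumption rather than the state's definition.
    """
    return made_gains_old(prior_level, current_level) and current_score > prior_score


# Hypothetical student: level 4 in both years, scale score slips from 350 to 348.
print(made_gains_old(4, 4))
print(made_gains_new(4, 4, 350, 348))
```

Under the old rule this hypothetical student counts as having “made gains” despite a slightly lower score; under the revised rule, as sketched here, the student does not.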

Improving Accountability Measurement Under ESSA

Despite the recent repeal of federal guidelines for states’ compliance with the Every Student Succeeds Act (ESSA), states are steadily submitting their proposals, and they are rightfully receiving some attention. The policies in these proposals will have far-reaching consequences for the future of school accountability (among many other types of policies), as well as, of course, for educators and students in U.S. public schools.

There are plenty of positive signs in these proposals, which are indicative of progress in the role of proper measurement in school accountability policy. It is important to recognize this progress, but impossible not to see that ESSA perpetuates long-standing measurement problems that were institutionalized under No Child Left Behind (NCLB). These issues, particularly the ongoing failure to distinguish between student and school performance, continue to dominate accountability policy to this day. Part of the confusion stems from the fact that school and student performance are not independent of each other. For example, a test score, by itself, gauges student performance, but it also reflects, at least in part, school effectiveness (i.e., the score might have been higher or lower had the student attended a different school).

Both student and school performance measures have an important role to play in accountability, but distinguishing between them is crucial. States’ ESSA proposals make the distinction in some respects but not in others. The result may end up being accountability systems that, while better than those under NCLB, are still severely hampered by improper inference and misaligned incentives. Let’s take a look at some of the key areas where we find these issues manifested.

How Relationships Drive School Improvement—And Actionable Data Foster Strong Relationships

Our guest authors today are Elaine Allensworth, Molly Gordon and Lucinda Fickel. Allensworth is Lewis-Sebring Director of the University of Chicago Consortium on School Research; Gordon is Senior Research Analyst at the University of Chicago Consortium on School Research; and Fickel is Associate Director of Policy at the University of Chicago Urban Education Institute. Elaine Allensworth explores this topic further in Teaching in Context: The Social Side of Education Reform edited by Esther Quintero (Harvard Education Press: 2017). 

As researchers at the UChicago Consortium on School Research, we believe in using data to support school improvement, such as data on students’ performance in school (attendance, grades, behavior, test scores) and surveys of students and teachers on their school experiences. But data does nothing on its own. In the quarter-century that our organization has been conducting research on Chicago Public Schools, one factor has emerged time and time again as both vital for making good use of data and the key element in school improvement: relationships.

Squishy and amorphous as it might initially sound, there is actually solid empirical grounding not only about the importance of relationships for student learning, but also about the organizational factors that foster strong relationships. In 2010, the Consortium published Organizing Schools for Improvement, which drew on a decade of administrative and survey data to examine a framework called the 5Essentials (Bryk et al. 2010). The book details findings that elementary/middle schools strong on the 5Essentials—strong leaders, professional capacity, parent-community ties, instructional guidance, and a student-centered learning climate—were highly likely to improve, while others showed little change or fell behind.

Do Subgroup Accountability Measures Affect School Ratings Systems?

The school accountability provisions of No Child Left Behind (NCLB) institutionalized a focus on the (test-based) performance of student subgroups, such as English language learners, racial and ethnic groups, and students eligible for free- and reduced-price lunch (FRL). The idea was to shine a spotlight on achievement gaps in the U.S., and to hold schools accountable for serving all students.

This was a laudable goal, and disaggregating data by student subgroups is a wise policy, as there is much to learn from such comparisons. Unfortunately, however, NCLB also institutionalized the poor measurement of school performance, and so-called subgroup accountability was not immune. The problem, which we’ve discussed here many times, is that test-based accountability systems in the U.S. tend to interpret how highly students score as a measure of school performance, when it is largely a function of factors out of schools' control, such as student background. In other words, schools (or student subgroups within those schools) may exhibit higher average scores or proficiency rates simply because their students entered the schools at higher levels, regardless of how effective the school may be in raising scores. Although NCLB’s successor, the Every Student Succeeds Act (ESSA), perpetuates many of these misinterpretations, it still represents some limited progress, as it encourages greater reliance on growth-based measures, which look at how quickly students progress while they attend a school, rather than how highly they score in any given year (see here for more on this).

Yet this evolution, slow though it may be, presents a somewhat unique challenge for the inclusion of subgroup-based measures in formal school accountability systems. That is, if we stipulate that growth model estimates are the best available test-based way to measure school (rather than student) performance, how should accountability systems apply these models to traditionally lower scoring student subgroups?

Thinking About Tests While Rethinking Test-Based Accountability

Earlier this week, per the late summer ritual, New York State released its testing results for the 2015-2016 school year. New York City (NYC), always the most closely watched set of results in the state, showed a 7.6 percentage point increase in its ELA proficiency rate, along with a 1.2 percentage point increase in its math rate. These increases were roughly equivalent to the statewide changes.

City officials were quick to pounce on the results, which were called “historic,” and “pure hard evidence” that the city’s new education policies are working. This interpretation, while standard in the U.S. education debate, is, of course, inappropriate for many reasons, all of which we’ve discussed here countless times and will not detail again (see here). Suffice it to say that even under the best of circumstances these changes in proficiency rates are only very tentative evidence that students improved their performance over time, to say nothing of whether that improvement was due to a specific policy or set of policies.

Still, the results represent good news. A larger proportion of NYC students are scoring proficient in math and ELA than did last year. Real improvement is slow and sustained, and this is improvement. In addition, the proficiency rate in NYC is now on par with the statewide rate, which is unprecedented. There are, however, a couple of additional issues with these results that are worth discussing quickly.

A Small But Meaningful Change In Florida's School Grades System

Beginning in the late 1990s, Florida became one of the first states to assign performance ratings to public schools. The purpose of these ratings, which are in the form of A-F grades, is to communicate to the public “how schools are performing relative to state standards.” For elementary and middle schools, the grades are based entirely on standardized testing results.

We have written extensively here about Florida’s school grading system (see here for just one example), and have used it to illustrate features that can be found in most other states’ school ratings. The primary issue is the heavy reliance that states place on how highly students score on tests, which tells you more about the students the schools serve than about how well they serve those students – i.e., it conflates school and student performance. Put simply, some schools exhibit lower absolute testing performance levels than do other schools, largely because their students enter performing at lower levels. As a result, schools in poorer neighborhoods tend to receive lower grades, even though many of these schools are very successful in helping their students make fast progress during their few short years of attendance.

Although virtually every state’s school rating system has this same basic structure to varying degrees, Florida’s system warrants special attention, as it was one of the first in the nation and has been widely touted and copied (as well as researched -- see our policy brief for a review of this evidence). It is also noteworthy because it contains a couple of interesting features, one of which exacerbates the aforementioned conflation of student and school performance in a largely unnoticed manner. But this feature, discussed below, has just been changed by the Florida Department of Education (FLDOE). This correction merits discussion, as it may be a sign of improvement in how policymakers think about these systems.

Caring School Leadership

Our guest authors today are Mark A. Smylie, professor emeritus at the University of Illinois-Chicago, Joseph Murphy, professor at Peabody College, Vanderbilt University, and Karen Seashore Louis, professor at the University of Minnesota.  Their research concerns school organization, leadership, and improvement.  This blog post is based on an article titled “Caring School Leadership: A Multi-Disciplinary, Cross-Occupational Model” which will be published later this year in the American Journal of Education.

From our years of studying school leadership and reform, working with practicing educators, and participating in education policy development, we have come to the conclusion that caring lies at the heart of effective schooling and good school leadership.  In this time of intense academic pressures, accountability policies, and top-down approaches to reform, however, the concept of caring has been neglected, overshadowed by attention to more “objective”, task-oriented aspects of school organization and leadership (Cassidy & Bates, 2005; Richert, 1994 (pp.109-118); Rooney, 2015).  This, we contend, is a serious problem for both students and teachers.

In this blog post, we share some of our recent thinking about what caring school leadership is and why it is important. We draw on empirical and theoretical literatures from education and from disciplines outside education, particularly research on human service occupations such as health care, social services, and the ministry. And we present a model of caring school leadership. Our ideas were developed with principals in mind, but they apply to any educator engaged in school leadership work. We focus on students as the primary beneficiaries of caring. It should be noted that, as we argue for the importance of caring in schools, we do not mean to diminish the importance of academic achievement or the need to care for staff and the community. We consider managing mutually-reinforcing combinations of caring support and academic press a central function of school leadership.