The Equity Projection

A new Mathematica report examines the test-based impact of The Equity Project (TEP), a New York City charter school serving grades 5-8. TEP opened for the 2009-10 school year and received national attention mostly because of one unusual policy: It paid teachers $125,000 per year, regardless of experience and education, plus annual bonuses (of up to $25,000) for returning teachers. TEP largely offsets these unusually high salary costs by minimizing the number of administrators and maintaining larger class sizes.

As is typical of Mathematica, the TEP analysis is thorough and well done. The performance of TEP students is compared with that of similar peers who had a comparable probability of enrolling in TEP, as identified using propensity scores. In general, the study’s results were quite positive. Although there were statistically discernible negative impacts of attendance for TEP’s first cohort of students during their first two years, the cumulative estimated test-based impact after three and four years of attendance was statistically significant, positive and educationally meaningful. As is typical, the estimated effect was stronger in math than in reading (the estimated math effect sizes were very large). The Mathematica researchers also present analyses of student attrition, which did not appear to bias the estimates substantially, and they show that their primary results are robust to alternative specifications (e.g., different matching techniques and score transformations).
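To make the matching idea concrete, here is a minimal Python sketch of one common variant: nearest-neighbor matching on propensity scores, with replacement and a caliper. It assumes the propensity scores have already been estimated (for example, by a logistic regression of enrollment on prior scores and demographics). The `Student` class, the functions `matched_effect` and `effect_size`, the 0.05 caliper, and the choice of the comparison group's standard deviation for the effect size are all hypothetical illustrations, not Mathematica's actual procedure.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Student:
    score: float       # outcome, e.g. a year-end test score
    propensity: float  # estimated probability of enrolling (assumed precomputed)
    treated: bool      # True if the student attended the school being studied


def matched_effect(students: list[Student], caliper: float = 0.05) -> float:
    """Average treated-minus-matched-comparison difference (an ATT-style estimate).

    Each treated student is paired with the comparison student whose propensity
    score is closest (nearest neighbor, with replacement). Treated students with
    no comparison student within `caliper` are dropped from the estimate.
    """
    treated = [s for s in students if s.treated]
    controls = [s for s in students if not s.treated]
    diffs = []
    for t in treated:
        best = min(controls, key=lambda c: abs(c.propensity - t.propensity))
        if abs(best.propensity - t.propensity) <= caliper:
            diffs.append(t.score - best.score)
    if not diffs:
        raise ValueError("no treated student has a match within the caliper")
    return mean(diffs)


def effect_size(effect: float, students: list[Student]) -> float:
    """Express a raw-score effect in standard deviations of the comparison group (an assumption)."""
    return effect / pstdev([s.score for s in students if not s.treated])
```

Matching with replacement lets one comparison student serve as the match for several treated students, which typically improves match quality at some cost in precision; the caliper discards treated students who have no sufficiently similar peer.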

Now we get to the tricky questions about these results: What caused them and what can be learned as a result? That’s the big issue with charter analyses in general (and with research on many other interventions): One can almost never separate the “why” from the “what” with any degree of confidence. And TEP, with its "flagship policy" of high teacher salaries, which might appeal to all "sides" in the education policy debate, provides an interesting example in this respect.

The Superintendent Factor

One of the more visible manifestations of what I have called “informal test-based accountability” (that is, how testing results play out in the media and public discourse) is the phenomenon of superintendents, particularly big-city superintendents, building their reputations on the results achieved during their administrations.

In general, big-city superintendents are expected to promise large testing increases, and their success or failure is judged to no small extent on whether those promises are fulfilled. Several superintendents seem to have built entire careers on a few (misinterpreted) points in proficiency rates or NAEP scale scores. This phenomenon is, in my view, rather curious. For one thing, any district leader will tell you that many of their core duties, such as improving administrative efficiency, communicating with parents and the community, and strengthening the district's financial situation, might have little or no impact on short-term testing gains. In addition, even those policies that do have such an impact often take many years to show up in aggregate results.

In short, judging superintendents based largely on the testing results during their tenures seems misguided. A recent report issued by the Brown Center at Brookings, and written by Matt Chingos, Grover Whitehurst and Katharine Lindquist, adds a little bit of empirical insight to this viewpoint.

Can Early Language Development Promote Children's Psychological Wellbeing?

We know oral language is young children's door into the world of knowledge and ideas, the foundation for reading, and the bedrock of all academic learning. But can language also protect young kids against behavioral problems?

A number of studies have identified a co-occurrence of language delays and behavioral maladjustment, an association that remains after controlling for socio-demographic characteristics and academic achievement (here and here). However, most research on the issue has been cross-sectional and correlational, making it hard to establish whether behavioral issues cause language delays, language delays cause behavioral issues, or some other factor is responsible for both.

A recent paper by Marc Bornstein, Chun-Shin Hahn, and Joan Suwalsky (2013) sheds some light on these questions, concluding that "language competencies in early childhood keep behavioral adjustment problems at bay." This is important given that minority children raised in poverty tend to have smaller-than-average vocabularies and are also overrepresented in pre-K expulsions and suspensions.

Matching Up Teacher Value-Added Between Different Tests

The U.S. Department of Education has released a very short, readable report on the comparability of value-added estimates using two different tests in Indiana – one of them norm-referenced (the Measures of Academic Progress test, or MAP), and the other criterion-referenced (the Indiana Statewide Testing for Educational Progress Plus, or ISTEP+, which is also the state’s official test for NCLB purposes).

The research design here is straightforward: Fourth and fifth grade students in 46 schools across 10 districts in Indiana took both tests, their teachers’ value-added scores were calculated from each test, and the two sets of scores were compared. Since both sets of scores were based on the same students and teachers, this design allows a direct comparison of teachers’ value-added estimates across the two tests. The results are not surprising, and they square with similar prior studies (see here, here, here, for example): The estimates based on the two tests are moderately correlated, with correlations between 0.4 and 0.7, depending on the grade and subject. If you’re not used to interpreting correlation coefficients, consider that only around one-third of teachers were in the same quintile (fifth) on both tests, and another 40 or so percent were one quintile higher or lower. So, most teachers were within one quintile, about a quarter of teachers moved two or more quintiles, and a small percentage moved from top to bottom or vice versa.

Although, as mentioned above, these findings are in line with prior research, it is worth remembering why this “instability” occurs (and what can be done about it).

Opportunity To Churn: Teacher Assignments Within New York City Schools

Virtually all discussions of teacher turnover focus on teachers leaving schools and/or the profession. However, a recent working paper by Allison Atteberry, Susanna Loeb and James Wyckoff, which was presented at this month’s CALDER conference, reaches a very interesting conclusion using data from New York City: There is actually more movement within NYC schools than between them.*

Specifically, the authors show that, during the years for which they had data (1997-2002 and 2004-2010), over 50 percent of teachers in any given year exhibited some form of movement (including leaving the profession or switching schools), but two-thirds of these moves were within schools, i.e., teachers changing grades or subjects. Moreover, they find that these within-school moves, like moves between schools or out of the profession, appear to have a negative effect on testing outcomes, one that is very modest but statistically discernible in both math and reading.
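The distinction between within-school and between-school moves can be illustrated with a small classification function that compares a teacher's assignment in two consecutive years. This is a hypothetical sketch: the `Assignment` record, the four labels, and `share_within` are assumptions for illustration, and the authors' actual coding of moves (for example, how they distinguish leaving NYC public schools from leaving the profession) may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Assignment:
    school: str
    grade: str
    subject: str


def classify_move(before: Assignment, after: Optional[Assignment]) -> str:
    """Label one teacher's change between two consecutive school years (hypothetical labels)."""
    if after is None:
        return "left_system"     # not observed teaching the next year (e.g., left the district or profession)
    if after.school != before.school:
        return "between_school"
    if (after.grade, after.subject) != (before.grade, before.subject):
        return "within_school"   # same school, different grade and/or subject
    return "stayed"


def share_within(moves: list[str]) -> float:
    """Share of all moves (anything other than 'stayed') that happen within a school."""
    movers = [m for m in moves if m != "stayed"]
    return sum(m == "within_school" for m in movers) / len(movers) if movers else 0.0
```

Under this kind of coding, a finding that "two-thirds of moves were within schools" corresponds to `share_within` returning roughly 0.67 across all teacher-years.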

There are a couple of interesting points related to these main findings.

Unionization And Working Poverty

Our guest author today is Ian Robinson, Lecturer in the Department of Sociology and in the Residential College’s interdisciplinary Social Theory and Practice program at the University of Michigan.

Poverty is (by definition) a function of inadequate income relative to family or household size. Low income has two possible proximate causes: insufficient hours of employment and/or insufficient hourly wages. In 2001, there were four times as many poor U.S. households in which someone had a job as there were poor households in which no one did. The same is still true today. In other words, despite levels of unemployment far above post-World War Two norms, low-wage jobs are by far the most important proximate cause of poverty in America today.

Perversely, despite this reality, the academic literature on U.S. poverty pays less attention to such jobs than it does to unemployment. A recent article, published in the journal American Sociological Review, both identifies and makes up for that shortcoming. In the process, its authors arrive at some striking conclusions. In particular, they find that unions are a major force for reducing poverty rates among households with at least one employed person.

ESEA Waivers And The Perpetuation Of Poor Educational Measurement

Some of the best research out there is a product not of sophisticated statistical methods or complex research designs, but rather of painstaking manual data collection. A good example is a recent paper by Morgan Polikoff, Andrew McEachin, Stephani Wrabel and Matthew Duque, which was published in the latest issue of the journal Educational Researcher.

Polikoff and his colleagues performed a task that makes most of the rest of us cringe: They read and coded every one of the more than 40 state applications for ESEA flexibility, or “waivers.” The end product is a simple but highly useful presentation of the measures states are using to identify “priority” (low-performing) schools and “focus” schools (those “contributing to achievement gaps”). The results are disturbing to anyone who believes that strong measurement should guide educational decisions.

There's plenty of great data and discussion in the paper, but consider just one central finding: How states are identifying priority (i.e., lowest-performing) schools at the elementary level (the measures are of course a bit different for secondary schools).

Can Knowledge Level The Learning Field For Children?

Reprinted in the Core Knowledge Blog

How much do preschoolers from disadvantaged and more affluent backgrounds know about the world, and why does that matter? A recent study by Tanya Kaefer (Lakehead University), Susan B. Neuman (New York University) and Ashley M. Pinkham (University of Michigan) provides some answers.

The researchers randomly selected children from preschool classrooms at two sites, one serving kids from disadvantaged backgrounds and the other serving middle-class kids. They then set out to answer three questions:

Incentives And Behavior In DC's Teacher Evaluation System

A new working paper, published by the National Bureau of Economic Research, is the first high-quality assessment of one of the new teacher evaluation systems sweeping across the nation. The study, by Thomas Dee and James Wyckoff, both highly respected economists, focuses on the first three years of IMPACT, the evaluation system put into place in the District of Columbia Public Schools in 2009.

Under IMPACT, each teacher receives a point total based on a combination of test-based and non-test-based measures (the formula differs for teachers who are and are not in tested grades and subjects). Based on these totals, teachers are sorted into one of four categories: highly effective, effective, minimally effective and ineffective. Teachers who receive a highly effective (HE) rating are eligible for salary increases, whereas teachers rated ineffective are dismissed immediately, and those rated minimally effective (ME) for two consecutive years can also be terminated. Put very simply, the study exploits this incentive structure by comparing teachers whose scores fell just above the ME and HE thresholds with those whose scores fell just below them, to see whether the two groups differed in retention and subsequent performance. The basic idea is that teachers on either side of a threshold are very similar in terms of their measured performance, so any differences in outcomes can be (cautiously) attributed to the system’s incentives.
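The threshold comparison described here is a form of regression discontinuity. The minimal Python sketch below fits a separate straight line to an outcome on each side of a threshold, using only teachers within a chosen bandwidth of IMPACT points, and reports the jump between the two lines at the threshold. It is a simplified illustration, not Dee and Wyckoff's specification; the function names, the bandwidth, and the linear form are assumptions.

```python
from statistics import mean


def _value_at_zero(xs, ys):
    """Fit an ordinary least squares line to (xs, ys) and return its value at x = 0."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct scores on each side")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx


def rd_jump(scores, outcomes, cutoff, bandwidth):
    """Sharp regression-discontinuity estimate: outcome just above minus just below `cutoff`.

    Only observations within `bandwidth` points of the cutoff are used, and a separate
    straight line is fitted on each side, with scores centered at the cutoff.
    """
    below = [(s - cutoff, y) for s, y in zip(scores, outcomes) if cutoff - bandwidth <= s < cutoff]
    above = [(s - cutoff, y) for s, y in zip(scores, outcomes) if cutoff <= s <= cutoff + bandwidth]
    if len(below) < 2 or len(above) < 2:
        raise ValueError("need at least two observations on each side of the cutoff")
    left = _value_at_zero(*zip(*below))
    right = _value_at_zero(*zip(*above))
    return right - left
```

In this setting, `scores` could be teachers' IMPACT point totals, `cutoff` the point total separating two rating categories, and `outcomes` a 0/1 indicator of whether each teacher returned the following year; each threshold would be analyzed separately. A narrower bandwidth makes the two groups more comparable but leaves fewer teachers in the estimate.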

The short answer is that there were meaningful differences.

So Many Purposes, So Few Tests

In a new NBER working paper, economist Derek Neal makes an important point, one that many people in education are aware of but that is infrequently reflected in actual policy. The point is that using the same assessment to measure both student and teacher performance often contaminates the results for both purposes.

In fact, as Neal notes, some of the very features required to measure student performance are the same ones that make contamination possible when the tests are used in high-stakes accountability systems. Consider, for example, a situation in which a state or district wants to compare the test scores of one cohort of fourth graders with those of the following year's fourth graders. One common way to make this comparison possible is to administer some of the same questions to both groups (or to a "pilot" sample of students before the tested cohort takes the exam). Otherwise, any difference in scores between the two cohorts might simply reflect differences in the difficulty of the questions, and without a way to check for that, it's tough to make meaningful comparisons.
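The logic of repeating questions can be sketched with a deliberately crude additive model (an assumption: operational test equating typically relies on item response theory or more careful linking methods). Because both cohorts answer the same anchor items, any gap on those items is attributed to the students; whatever remains of the gap on the items that differ between years is attributed to form difficulty and removed. The function names are hypothetical.

```python
from statistics import mean


def form_difficulty_shift(anchor_1, unique_1, anchor_2, unique_2):
    """Estimate how much easier (+) or harder (-) cohort 2's non-shared items were.

    Every argument is a list of per-student proportion-correct scores. The model is
    deliberately simple: score = cohort ability + form difficulty offset.
    """
    ability_gap = mean(anchor_2) - mean(anchor_1)  # same anchor questions: only ability differs
    raw_gap = mean(unique_2) - mean(unique_1)      # ability gap plus any form-difficulty gap
    return raw_gap - ability_gap


def put_on_cohort_1_scale(anchor_1, unique_1, anchor_2, unique_2):
    """Shift cohort 2's non-shared-item scores to remove the estimated difficulty difference."""
    shift = form_difficulty_shift(anchor_1, unique_1, anchor_2, unique_2)
    return [score - shift for score in unique_2]
```

For example, if cohort 2 scores 0.1 higher than cohort 1 on the anchor items but 0.1 lower on its own items, this model attributes a 0.2 gap to a harder form, and cohort 2's adjusted mean ends up 0.1 above cohort 1's. The anchor items are exactly the repeated questions that, as Neal points out, educators can learn about and teach toward.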

But it’s precisely this need to repeat questions that enables one form of so-called “teaching to the test," in which administrators and educators use questions from prior assessments to guide their instruction for the current year.