• The Challenges Of Pre-K Assessment

    In the United States, nearly 1.3 million children attend publicly funded preschool. As enrollment continues to grow, states are under pressure to show that these programs increase school readiness. Thus, figuring out how best to measure preschoolers’ learning outcomes has become a major policy focus.

    First, it should be noted that researchers are almost unanimous in their caution about this subject. There are inherent difficulties in accurately assessing very young children’s learning in the domains of language, cognition, socio-emotional development, and even physical development. Young children’s attention spans tend to be short, and there are wide, natural variations in children’s performance in any given domain and on any given day. Thus, great care is advised in both the design and implementation of such assessments (see here, here, and here for examples). The question of whether and how to use these student assessments to determine program or staff effectiveness is even more difficult and controversial (for instance, here and here). Nevertheless, many states are already using various forms of assessment to oversee their preschool investments.

    It is difficult to react to this (unsurprising) paradox. Sadly, in education, there is often a disconnect between what we know (i.e., research) and what we do (i.e., policy). But, since our general desire for accountability seems to be here to stay, a case can be made that states should, at a minimum, expand what they measure to reflect learning as accurately and broadly as possible.

    So, what types of assessments are better for capturing what a four- or five-year-old knows? How might these assessments be improved?

  • Still In Residence: Arts Education In U.S. Public Schools

    There is a somewhat common argument in education circles that the focus on math and reading tests under No Child Left Behind has had the unintended consequence of deemphasizing other subjects. This includes science and history, of course, but among the most frequently mentioned presumed victims of this trend are art and music.

    A new report by the National Center for Education Statistics (NCES) presents some basic data on the availability of arts instruction in U.S. public schools between 1999 and 2010.

    The results provide only mixed support for the hypothesis that these programs are less available now than they were prior to the implementation of NCLB.

  • Measuring Journalist Quality

    Journalists play an essential role in our society. They are charged with informing the public, a vital function in a representative democracy. Yet, year after year, large pockets of the electorate remain poorly informed on both foreign and domestic affairs. For a long time, commentators have blamed any number of culprits for this problem, including poverty, education, increasing work hours, and the rapid proliferation of entertainment media.

    There is no doubt that these and other factors matter a great deal. Recently, however, evidence has been growing that the factors shaping how well people are informed about current events include not only social and economic conditions, but journalist quality as well. Put simply, better journalists produce better stories, which in turn attract more readers. On the whole, the U.S. journalist community is world class. But there is, as always, a tremendous amount of underlying variation. It’s likely that improving the overall quality of reporters would not only result in higher-quality information, but would also bring in more readers. Both outcomes would contribute to a better-informed, more active electorate.

    We at the Shanker Institute feel that it is time to start a public conversation about this issue. We have requested and received datasets documenting the story-by-story readership of the websites of U.S. newspapers, large and small. We are using these data in statistical models that we call “Readers-Added Models,” or “RAMs.”

  • Pay Equity In Higher Education

    Blatant forms of discrimination against women in academia have diminished since the Equal Pay Act and Title IX became law in 1964 and 1972, respectively. Yet differences in salary, tenure status, and leadership roles persist between men and women in higher education. In particular, wage differences between male and female professors have not been fully explained, even when productivity, teaching experience, institutional size and prestige, disciplinary field, type of appointment, and family-related responsibilities are controlled for statistically (see here).
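
    To make “controlled for statistically” a bit more concrete, here is a minimal sketch - with simulated data, made-up coefficient sizes, and illustrative variable names, not the cited study’s actual model - of how such an adjusted gap is typically estimated: regress log salary on a gender indicator plus the controls, and read the remaining “unexplained” gap off the gender coefficient.

    ```python
    # Hypothetical sketch: simulated data and made-up numbers only.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 500

    female = rng.integers(0, 2, n)        # 1 = female, 0 = male
    experience = rng.normal(15, 6, n)     # years of teaching experience
    pubs = rng.poisson(10, n)             # publications, a productivity proxy

    # Simulate log salaries with an 8% penalty for women built in,
    # beyond what experience and productivity explain.
    log_salary = (11 + 0.01 * experience + 0.005 * pubs
                  - 0.08 * female + rng.normal(0, 0.1, n))

    # Regress log salary on gender plus the controls; the coefficient on
    # `female` is the gap that remains after the controls are accounted for.
    X = sm.add_constant(np.column_stack([female, experience, pubs]))
    result = sm.OLS(log_salary, X).fit()
    print(f"unexplained gender gap: {result.params[1]:+.3f} log points")
    ```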

    Scholars have argued that the “unexplained” gender wage gap is a function of less easily quantifiable (supply-side) factors, such as preferences, career aspirations, professional networks, etc. In fact, there is extensive evidence that both supply-side (e.g., career choices) and demand-side factors (e.g., employer discrimination) are shaped by broadly shared (often implicit) schemas about what men and women can and should do (a.k.a. descriptive and prescriptive gender stereotypes – see here).

    Regardless of the causes, which are clearly complex and multi-faceted, the fact remains that the salary advantage held by male faculty over female faculty exists across institutions and has changed very little over the past twenty-five years (see here). How big is this gap, exactly?

  • Ohio's New School Rating System: Different Results, Same Flawed Methods

    Without question, designing school and district rating systems is a difficult task, and Ohio was somewhat ahead of the curve in attempting to do so (and they're also great about releasing a ton of data every year). As part of its application for ESEA waivers, the state recently announced a redesigned version of its long-standing system, with the changes slated to go into effect in 2014-15. State officials told reporters that the new scheme is a “more accurate reflection of … true [school and district] quality.”

    In reality, however, despite its best intentions, what Ohio has done is perpetuate a troubled system by making less-than-substantive changes that seem to serve the primary purpose of giving lower grades to more schools, so that the results square with preconceptions about the distribution of “true quality.” It’s not a better system in terms of measurement - both the new and old schemes consist of mostly the same inappropriate components, and the ratings differentiate schools based largely on student characteristics rather than school performance.

    So, whether or not the aggregate results seem more plausible is not particularly important, since the manner in which they're calculated is still deeply flawed. And demonstrating this is very easy.

  • An Uncertain Time For One In Five Female Workers

    It’s well known that patterns of occupational sex segregation in the labor market – the degree to which men and women are concentrated in certain occupations – have changed quite a bit over the past few decades, along with the rise of female labor force participation.

    Nevertheless, this phenomenon remains a persistent feature of the U.S. labor market (and those of other nations as well). There are many reasons for this: institutional, cultural, and historical. But it’s interesting to take a quick look at a few specific groups, as there are implications for our current policy environment.

    The simple graph below presents the proportion of all working men and women that fall into three different occupational groups. The data are from the Bureau of Labor Statistics, and they apply to 2011.

  • If Your Evidence Is Changes In Proficiency Rates, You Probably Don't Have Much Evidence

    Education policymaking and debates are under constant threat from an improbable assailant: short-term changes in cross-sectional proficiency rates.

    The use of rate changes is still proliferating rapidly at all levels of our education system. These measures, which play an important role in the provisions of No Child Left Behind, are already prominent components of many states’ core accountability systems (e.g., California), while several others will be using some version of them in their new, high-stakes school/district “grading systems.” New York State is awarding millions in competitive grants, with almost half the criteria based on rate changes. District consultants issue reports recommending widespread school closures and reconstitutions based on these measures. And, most recently, U.S. Secretary of Education Arne Duncan used proficiency rate increases as “preliminary evidence” supporting the School Improvement Grants program.

    Meanwhile, on the public discourse front, district officials and other national leaders use rate changes to “prove” that their preferred reforms are working (or are needed), while their critics argue the opposite. Similarly, entire charter school sectors are judged, up or down, by whether their raw, unadjusted rates increase or decrease.

    So, what’s the problem? In short, it’s that year-to-year changes in proficiency rates are not valid evidence of school or policy effects. These measures cannot do the job we’re asking them to do, even on a limited basis. This really has to stop.
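
    To illustrate the statistical point, here is a minimal simulation (a hypothetical setup with assumed score distributions, not data from any real school): each year’s proficiency rate is computed from a different cohort of students, so rates can move quite a bit even when a school’s effectiveness never changes.

    ```python
    # Hypothetical illustration (no real school data): proficiency rates
    # are recomputed each year on a *different* cohort of students, so the
    # rate can rise or fall even when school effectiveness is constant.
    import random

    random.seed(42)

    CUTOFF = 500         # assumed proficiency cut score
    SCHOOL_EFFECT = 20   # the school's contribution; held fixed across years
    COHORT_SIZE = 60     # students tested per year

    def proficiency_rate():
        """Rate for one randomly drawn cohort taught by the same school."""
        scores = [random.gauss(490, 40) + SCHOOL_EFFECT
                  for _ in range(COHORT_SIZE)]
        return sum(s >= CUTOFF for s in scores) / COHORT_SIZE

    # Ten schools, all equally (and unchangingly) effective.
    for school in range(1, 11):
        change = (proficiency_rate() - proficiency_rate()) * 100
        print(f"school {school:2d}: year-to-year rate change {change:+5.1f} pts")
    ```

    In this setup, identical schools show rate swings of several points in either direction from one year to the next - movement that reflects cohort composition and chance, not any change in policy or performance.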

  • The Uses (And Abuses?) Of Student Data

    Knewton, a technology firm founded in 2008, has developed an “adaptive learning platform” that received significant media attention (also here, here, here and here), as well as funding and recognition early last fall and, again, in February this year (here and here). Although the firm is not alone in the adaptive learning game – e.g., Dreambox, Carnegie Learning – Knewton’s partnership with Pearson puts the company in a whole different league.

    Adaptive learning takes advantage of student-generated information; thus, important questions about data use and ownership need to be brought to the forefront of the technology debate.

    Adaptive learning software adjusts the presentation of educational content to students’ needs, based on students’ prior responses to such content. In the world of research, such ‘prior responses’ would count and be treated as data. To the extent that adaptive learning is a mechanism for collecting information about learners, questions about privacy, confidentiality and ownership should be addressed.
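
    For readers unfamiliar with the mechanics, here is a deliberately generic sketch of the adaptive loop described above - a toy under assumptions, not Knewton’s proprietary algorithm. The point to notice is that every response the student gives is retained, and that stored record is precisely the student-generated information at issue.

    ```python
    # Generic sketch of an adaptive loop - not any vendor's actual system.
    # The key point: every response is retained as data about the learner.
    ITEMS = {d: f"item at difficulty {d}" for d in range(5)}  # 0 = easiest

    def next_level(history, level):
        """Choose the next difficulty level from prior responses."""
        if history:  # prior responses - i.e., student-generated data
            level += 1 if history[-1]["correct"] else -1
            level = max(0, min(4, level))
        return level

    history = []   # the accumulating record of learner information
    level = 2
    for correct in (True, True, False):   # a simulated student session
        level = next_level(history, level)
        history.append({"item": ITEMS[level], "correct": correct})

    print(history)  # this stored record is what raises ownership questions
    ```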

  • Learning From Teach For America

    There is a small but growing body of evidence about the (usually test-based) effectiveness of teachers from Teach for America (TFA), an extremely selective program that trains and places new teachers in mostly higher-needs schools and districts. Rather than review this literature paper-by-paper, which has already been done by others (see here and here), I’ll just give you the super-short summary of the higher-quality analyses, and quickly discuss what I think it means.*

    The evidence on TFA teachers focuses mostly on comparing their effect on test score growth vis-à-vis other groups of teachers who entered the profession via traditional certification (or through other alternative routes). This is no easy task, and the findings do vary quite a bit by study, as well as by the group to which TFA corps members are compared (e.g., new or more experienced teachers). One can quibble endlessly over the methodological details (and I’m all for that), and this area is still underdeveloped, but a fair summary of these papers is that TFA teachers are no more or less effective than comparable peers in terms of reading tests, and sometimes but not always more effective in math (the differences, whether positive or negative, tend to be small and/or only surface after 2-3 years). Overall, the evidence thus far suggests that TFA teachers perform comparably, at least in terms of test-based outcomes.

    Somewhat in contrast with these findings, TFA has been the subject of both intensive criticism and fawning praise. I don’t want to engage this debate directly, except to say that there has to be some middle ground on which a program that brings talented young people into the field of education is not such a divisive issue. I do, however, want to make a wider point specifically about the evidence on TFA teachers – what it might suggest about the current push to “attract the best people” to the profession.

  • Beware Of Anecdotes In The Value-Added Debate

    A recent New York Times “teacher diary” presents the compelling account of a New York City teacher whose value-added rating was in the 6th percentile in 2009 – one of the lowest scores in the city – and in the 96th percentile the following year, one of the highest. Similar articles - for example, about teachers with errors in their rosters or scores that conflict with their colleagues’/principals’ opinions - have been published since the release of the city’s teacher data reports (also see here). These accounts provoke a lot of outrage and disbelief, and that makes sense – they can sound absurd.

    Stories like these can be useful as illustrations of larger trends and issues - in this case, of the unfairness of publishing the NYC scores, most of which are based on samples that are too small to provide meaningful information. But, in the debate over using these estimates in actual policy, we need to be careful not to focus too much on anecdotes. For every one NYC teacher whose value-added rank changed over 90 points between 2009 and 2010, there are almost 100 teachers whose ranks were within 10 points (and percentile ranks overstate the actual size of all these differences). Moreover, even if the models yielded perfect measures of test-based teacher performance, there would still be many implausible fluctuations between years - those unlikely to reflect “real” change - due to nothing more than random error.*
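
    A quick simulation makes the random error point concrete. This is a simplified sketch with assumed error sizes, not the actual NYC model: each teacher’s “true” effect is held fixed across years, yet percentile ranks still swing because each year’s estimate carries noise.

    ```python
    # Simplified simulation (assumed error sizes, not the NYC model): each
    # teacher's true effect is fixed, but each year's estimate adds random
    # error, so percentile ranks can swing even with zero real change.
    import random

    random.seed(7)

    N = 1000
    true_effect = [random.gauss(0, 1) for _ in range(N)]  # never changes

    def percentile_ranks(estimates):
        """Map each teacher's estimate to a 0-99 percentile rank."""
        order = sorted(range(N), key=lambda i: estimates[i])
        ranks = [0] * N
        for position, i in enumerate(order):
            ranks[i] = position * 100 // N
        return ranks

    # Two years of noisy estimates around the same fixed true effects
    # (error assumed comparable in size to the spread of true effects).
    year1 = percentile_ranks([t + random.gauss(0, 1) for t in true_effect])
    year2 = percentile_ranks([t + random.gauss(0, 1) for t in true_effect])

    swings = [abs(a - b) for a, b in zip(year1, year2)]
    print("moved 50+ percentile points:", sum(s >= 50 for s in swings))
    print("moved 10 or fewer points:", sum(s <= 10 for s in swings))
    ```

    Even with identical true performance in both years, some simulated teachers’ ranks move dramatically while most move modestly - a pattern driven entirely by noise in this setup.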

    The reliability of value-added estimates, like that of all performance measures (including classroom observations), is an important issue, and is sometimes dismissed by supporters in a cavalier fashion. There are serious concerns here, and no absolute answers. But none of this can be examined or addressed with anecdotes.