PISA And TIMSS: A Distinction Without A Difference?

Our guest author today is William Schmidt, a University Distinguished Professor and co-director of the Education Policy Center at Michigan State University. He is also a member of the Shanker Institute board of directors.

Every year or two, the mass media is full of stories on the latest iterations of one of the two major international large-scale assessments, the Trends in International Mathematics and Science Study (TIMSS) and the Program for International Student Assessment (PISA). What perplexes many is that the results of these two tests -- both well-established and run by respectable, experienced organizations -- suggest different conclusions about the state of U.S. mathematics education. Generally speaking, U.S. students do better on the TIMSS and poorly on the PISA, relative to their peers in other nations. Depending on their personal preferences, policy advocates can simply choose whichever test result is convenient to press their argument, leaving the general public without clear guidance.

Now, in one sense, the differences between the tests are more apparent than real. One reason why the U.S. ranks better on the TIMSS than the PISA is that the two tests sample students from different sets of countries. The PISA includes many more wealthy countries, whose students tend to do better – hence, the U.S.’s lower ranking. It turns out that when we look only at the countries that participated in both the TIMSS and the PISA, we find similar country rankings. There are also some differences in statistical sampling, but these are fairly minor.
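To make the sampling point concrete, here is a minimal sketch in Python. The country labels and scores are invented purely for illustration (they are not actual TIMSS or PISA results), and `rank_of` and `restrict` are hypothetical helper names. It shows how a country's rank can differ between two tests simply because different countries take part, and how the ranks can line up once only the common participants are compared.

```python
# Hypothetical scores, invented for illustration only; not real TIMSS or PISA data.
timss = {"A": 560, "B": 540, "US": 520, "C": 500, "D": 450, "E": 430}
pisa = {"A": 550, "B": 535, "F": 530, "G": 525, "H": 515, "US": 505, "C": 490}


def rank_of(country, scores):
    """Return the 1-based rank of `country` when scores are sorted high to low."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(country) + 1


def restrict(scores, countries):
    """Keep only the entries whose country is in `countries`."""
    return {c: s for c, s in scores.items() if c in countries}


# Countries that appear in both (hypothetical) tests.
common = timss.keys() & pisa.keys()

# U.S. rank using each test's full set of participants.
full_ranks = (rank_of("US", timss), rank_of("US", pisa))

# U.S. rank using only the countries that took both tests.
common_ranks = (
    rank_of("US", restrict(timss, common)),
    rank_of("US", restrict(pisa, common)),
)

print("Full samples:", full_ranks)
print("Common countries only:", common_ranks)
```

With these made-up numbers, the U.S. ranks 3rd of 6 on the first test and 6th of 7 on the second, but 3rd of 4 on both once the comparison is limited to the countries that took both.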

A Descriptive Analysis Of The 2014 D.C. Charter School Ratings

The District of Columbia Public Charter School Board (PCSB) recently released the 2014 results of its “Performance Management Framework” (PMF), which is the rating system that the PCSB uses for its schools.

Very quick background: This system sorts schools into one of three “tiers,” with Tier 1 being the highest-performing, as measured by the system, and Tier 3 being the lowest. The ratings are based on a weighted combination of four types of factors -- progress, achievement, gateway, and leading -- which are described in detail in the first footnote.* As discussed in a previous post, the PCSB system, in my opinion, is better than many others out there, since growth measures play a fairly prominent role in the ratings, and, as a result, the final scores are only moderately correlated with key student characteristics such as subsidized lunch eligibility.** In addition, the PCSB is quite diligent about making the PMF results accessible to parents and other stakeholders, and, for the record, I have found the staff very open to sharing data and answering questions.

That said, PCSB's big message this year was that schools’ ratings are improving over time, and that, as a result, a substantially larger proportion of DC charter students are attending top-rated schools. This was reported uncritically by several media outlets, including this story in the Washington Post. It is also based on a somewhat questionable use of the data. Let’s take a very simple look at the PMF dataset, first to examine this claim and then, more importantly, to see what we can learn about the PMF and DC charter schools in 2013 and 2014.

Feeling Socially Connected Fuels Intrinsic Motivation And Engagement

Our "social side of education reform" series has emphasized that teaching is a cooperative endeavor, and as such is deeply influenced by the quality of a school's social environment -- i.e., trusting relationships, teamwork and cooperation. But what about learning? To what extent are dispositions such as motivation, persistence and engagement mediated by relationships and the social-relational context?

This is, of course, a very complex question, which can't be addressed comprehensively here. But I would like to discuss three papers that provide some important answers. In terms of our "social side" theme, the studies I will highlight suggest that efforts to improve learning should include and leverage social-relational processes, such as how learners perceive (and relate to) -- how they think they fit into -- their social contexts. Finally, this research, particularly the last paper, suggests that translating this knowledge into policy may be less about top-down, prescriptive regulations and more about what Stanford psychologist Gregory M. Walton has called "wise interventions" -- i.e., small but precise strategies that target recursive processes (more below).

The first paper, by Lucas P. Butler and Gregory M. Walton (2013), describes the results of two experiments testing whether the perceived collaborative nature of an activity that was done individually would cause greater enjoyment of and persistence on that activity among preschoolers.

Rethinking The Use Of Simple Achievement Gap Measures In School Accountability Systems

So-called achievement gaps – the differences in average test performance among student subgroups, usually defined in terms of ethnicity or income – are important measures. They demonstrate persistent inequality of educational outcomes and economic opportunities between different members of our society.

So long as these gaps remain, it means that historically lower-performing subgroups (e.g., low-income students or ethnic minorities) are less likely to gain access to higher education, good jobs, and political voice. We should monitor these gaps; try to identify all the factors that affect them, for good and for ill; and endeavor to narrow them using every appropriate policy lever – both inside and outside of the educational system.

Achievement gaps have also, however, taken on a very different role over the past 10 or so years. The sizes of gaps, and extent of “gap closing,” are routinely used by reporters and advocates to judge the performance of schools, school districts, and states. In addition, gaps and gap trends are employed directly in formal accountability systems (e.g., states’ school grading systems), in which they are conceptualized as performance measures.

Although simple measures of the magnitude of or changes in achievement gaps are potentially very useful in several different contexts, they are poor gauges of school performance, and shouldn’t be the basis for high-stakes rewards and punishments in any accountability system.
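One quick way to see the problem is with a little arithmetic. The sketch below, in Python, uses invented subgroup averages for two hypothetical schools (the names, scores, and the helpers `gap` and `gap_change` are all assumptions for illustration, not data from any real accountability system). It compares the change in the simple gap with what actually happened to each subgroup.

```python
# Hypothetical average scores by subgroup; invented for illustration only.
schools = {
    "School X": {
        "year1": {"higher": 300, "lower": 260},
        "year2": {"higher": 290, "lower": 260},  # higher-scoring group declines
    },
    "School Y": {
        "year1": {"higher": 300, "lower": 260},
        "year2": {"higher": 315, "lower": 270},  # both groups improve
    },
}


def gap(year_scores):
    """Simple gap: higher-scoring subgroup average minus lower-scoring one."""
    return year_scores["higher"] - year_scores["lower"]


def gap_change(school):
    """Change in the simple gap from year 1 to year 2 (negative = narrowing)."""
    return gap(school["year2"]) - gap(school["year1"])


changes = {name: gap_change(data) for name, data in schools.items()}

for name, change in changes.items():
    direction = "narrowed" if change < 0 else "widened"
    print(f"{name}: gap {direction} by {abs(change)} points")
```

In this made-up example, School X "closes" its gap by 10 points only because its higher-scoring subgroup lost ground, while School Y's gap widens by 5 points even though both of its subgroups improved. A system that rewards gap narrowing on its own would favor School X.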

Multiple Measures And Singular Conclusions In A Twin City

A few weeks ago, the Minneapolis Star Tribune published teacher evaluation results for the district’s public school teachers in 2013-14. This decision generated a fair amount of controversy, but it’s worth noting that the Tribune, unlike the Los Angeles Times and New York City newspapers a few years ago, did not publish scores for individual teachers, only totals by school.

The data once again provide an opportunity to take a look at how results vary by student characteristics. This was indeed the focus of the Tribune’s story, which included the following headline: “Minneapolis’ worst teachers are in the poorest schools, data show.” These types of conclusions, which simply take the results of new evaluations at face value, have characterized the discussion since the first new systems came online. Though understandable, they are also frustrating and a potential impediment to the policy process. At this early point, “the city’s teachers with the lowest evaluation ratings” is not the same thing as “the city’s worst teachers.” Actually, as discussed in a previous post, the systematic variation in evaluation results by student characteristics, which the Tribune uses to draw conclusions about the distribution of the city’s “worst teachers,” could just as easily be viewed as one of the many ways that one might assess the properties and even the validity of those results.

So, while there are no clear-cut "right" or "wrong" answers here, let’s take a quick look at the data and what they might tell us.

The Bewildering Arguments Underlying Florida's Fight Over ELL Test Scores

The State of Florida is currently engaged in a policy tussle of sorts with the U.S. Department of Education (USED) over Florida’s accountability system. To make a long story short, last spring, Florida passed a law saying that the test scores of English language learners (ELLs) would only count toward schools’ accountability grades (and teacher evaluations) once the ELL students had been in the system for at least two years. This runs up against federal law, which requires that ELLs’ scores be counted after only one year, and USED has indicated that it’s not willing to budge on this requirement. In response, Florida is considering legal action.

This conflict might seem incredibly inane (unless you’re in one of the affected schools, of course). Beneath the surface, though, this is actually kind of an amazing story.

Put simply, Florida’s argument against USED's policy of counting ELL scores after just one year is a perfect example of the reason why most of the state's core accountability measures (not to mention those of NCLB as a whole) are so inappropriate: Because they judge schools’ performance based largely on where their students’ scores end up without paying any attention to where they start out.

The Virtue Of Boring In Education

The College Board recently released the latest SAT results, for the first time combining this release with that of data from the PSAT and AP exams. The release of these data generated the usual stream of news coverage, much of which misinterpreted the year-to-year changes in SAT scores as a lack of improvement, even though the data are cross-sectional and the test-taking sample has been changing, and/or misinterpreted the percent of test takers who scored above the “college ready” line as a national measure of college readiness, even though the tests are not administered to a representative sample of students.
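The point about a changing test-taking sample is easy to illustrate. The following Python sketch uses invented group sizes and average scores (these are not College Board figures, and the group labels and the `overall_mean` helper are hypothetical). It shows how the overall average can fall from one year to the next even when every group of test takers scores higher, simply because the mix of test takers shifts.

```python
# Hypothetical test-taker groups; counts and averages are invented for illustration.
# Each entry is (number of test takers, average score).
year1 = {"traditional takers": (100, 1100), "newer takers": (20, 900)}
year2 = {"traditional takers": (100, 1110), "newer takers": (60, 910)}


def overall_mean(groups):
    """Average across all test takers, weighting each group by its size."""
    total_takers = sum(n for n, _ in groups.values())
    total_points = sum(n * mean for n, mean in groups.values())
    return total_points / total_takers


overall_y1 = overall_mean(year1)
overall_y2 = overall_mean(year2)

# Within-group change in average score from year 1 to year 2.
group_changes = {name: year2[name][1] - year1[name][1] for name in year1}

print(f"Overall average: {overall_y1:.1f} -> {overall_y2:.1f}")
print("Change within each group:", group_changes)
```

Here both groups gain 10 points, yet the overall average drops because the lower-scoring group makes up a larger share of test takers in the second year. This is why a flat or falling cross-sectional average cannot, by itself, be read as a lack of improvement.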

It is disheartening to watch this annual exercise, in which the most common “take home” headlines (e.g., "no progress in SAT scores" and "more, different students take SAT") are in many important respects contradictory. In past years, much of the blame had to be placed on the College Board’s presentation of the data. This year, to their credit, the roll-out is substantially better (hopefully, this will continue).

But I don’t want to focus on this aspect of the organization's activities (see this post for more); instead, I would like to discuss briefly the College Board’s recent change in mission.

Redesigning Florida's School Report Cards

The Foundation for Excellence in Education, an organization that advocates for education reform in Florida, in particular the set of policies sometimes called the "Florida Formula," recently announced a competition to redesign the “appearance, presentation and usability” of the state’s school report cards. Winners of the competition will share prize money totaling $35,000.

The contest seems like a great idea. Improving the manner in which education data are presented is, of course, a laudable goal, and an open competition could potentially attract a diverse group of talented people. As regular readers of this blog know, however, I am not opposed to sensibly designed test-based accountability policies, but my primary concern about school rating systems is the quality and interpretation of the measures used therein. So, while I support the idea of a competition for improving the design of the report cards, I am hoping that the end result won't just be a very attractive, clever instrument devoted to the misinterpretation of testing data.

In this spirit, I would like to submit four simple graphs that illustrate, as clearly as possible and using the latest data from 2014, what Florida’s school grades are actually telling us. Since the scoring and measures vary a bit between different types of schools, let’s focus on elementary schools.

Attitudes Toward Education And Hard Work In Post-Communist Poland

The following is written by Kinga Wysieńska-Di Carlo and Matthew Di Carlo. Wysieńska-Di Carlo is an Assistant Professor of Sociology in the Institute of Philosophy and Sociology at the Polish Academy of Sciences.

Economic returns to education -- that is, the value of investment in education, principally in terms of better jobs, earnings, etc. -- rightly receive a great deal of attention in the U.S., as well as in other nations. But it is also useful to examine what people believe about the value and importance of education, as these perceptions influence, among other outcomes, individuals’ decisions to pursue additional schooling.

When it comes to beliefs regarding whether education and other factors contribute to success, economic or otherwise, Poland is a particularly interesting nation. Poland underwent a dramatic economic transformation during and after the collapse of Communism (you can read about Al Shanker’s role here). An aggressive program of reform, sometimes described as “shock therapy,” dismantled the planned socialist economy and built a market economy in its place. Needless to say, actual conditions in a nation can influence and reflect attitudes about those conditions (see, for example, Kunovich and Słomczyński 2007 for a cross-national analysis of pro-meritocratic beliefs).

This transition in Poland fundamentally reshaped the relationships between education, employment and material success. In addition, it is likely to have influenced Poles’ perception of these dynamics. Let’s take a look at Polish survey data since the transformation, focusing first on Poles’ perceptions of the importance of education for one’s success.