• Our Not-So-College-Ready Annual Discussion Of SAT Results

    Every year, around this time, the College Board publicizes its SAT results, and hundreds of newspapers, blogs, and television stations run stories suggesting that trends in the aggregate scores are, by themselves, a meaningful indicator of U.S. school quality. They’re not.

    Everyone knows that the vast majority of the students who take the SAT in a given year didn’t take the test the previous year – i.e., the data are cross-sectional. Everyone also knows that participation is voluntary (as is participation in the ACT), that the number of students taking the test has been increasing for many years, and that current test-takers have different measurable characteristics from their predecessors. That means we cannot use the raw results to draw strong conclusions about changes in the performance of the typical student, and certainly not about the effectiveness of schools, whether nationally or in a given state or district. This is common sense.
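    To see why this matters, here is a minimal sketch, using entirely made-up numbers, of how a changing test-taker pool can pull the aggregate average down even when no group of students is actually doing any worse:

        # Illustrative only: invented figures, not actual College Board data.
        # Each entry is (group mean score, number of test-takers).
        year_1 = {"group_a": (550, 800_000), "group_b": (450, 200_000)}
        year_2 = {"group_a": (550, 800_000), "group_b": (450, 500_000)}  # pool expands

        def overall_mean(cohort):
            total_points = sum(mean * n for mean, n in cohort.values())
            total_takers = sum(n for _, n in cohort.values())
            return total_points / total_takers

        print(overall_mean(year_1))  # 530.0
        print(overall_mean(year_2))  # about 511.5 – lower, though neither group declined

    In this toy example, the overall average falls by nearly 20 points purely because the pool expanded – the kind of compositional change that a raw year-over-year SAT trend cannot distinguish from a real decline.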

    Unfortunately, the College Board plays a role in stoking this confusion – or, at least, it could do much more to prevent it. Consider the headline of this year’s press release:

  • New Teacher Evaluations Are A Long-Term Investment, Not Test Score Arbitrage

    One of the most important things to keep an eye on in education policy is the first round of changes to new teacher evaluation systems. Given all the moving parts, and the lack of evidence on how these systems should be designed and what their impact will be, course adjustments along the way are not just inevitable, but absolutely essential.

    Changes might be guided by different types of evidence, such as feedback from teachers and administrators or analysis of ratings data. And, of course, human judgment will play a big role. One thing that states and districts should not be doing, however, is assessing their new systems – or making changes to them – based on whether raw overall test scores go up or down within the first few years.

    Here’s a little reality check: Even the best-designed, best-implemented new evaluations are unlikely to have an immediate measurable impact on aggregate student performance. Evaluations are an investment, not a quick fix. And they are not risk-free. Their effects will depend on the quality of the systems, how current teachers and administrators react to them, and how all of this shapes, and plays out in, the teacher labor market. As I’ve said before, the realistic expectation for overall performance – and this is no guarantee – is that there will be some very small, gradual improvements, unfolding over a period of years and decades.

    States and districts that expect anything more risk making poor decisions during these crucial, early phases.

  • That's Not Teacher-Like

    I’ve been reading Albert Shanker’s “The Power of Ideas: Al In His Own Words,” the American Educator’s compendium of Al’s speeches and columns, published posthumously in 1997. What an enjoyable, witty and informative collection of essays.

    Two columns especially caught my attention: “That’s Very Unprofessional, Mr. Shanker!” and “Does Pavarotti Need to File an Aria Plan?” – in which Al discusses expectations for (and treatment of) teachers. They made me reflect, yet again, on whether perceptions of teacher professionalism might be gendered. In other words, when society thinks of the attributes of a professional teacher, might we unconsciously be thinking of women teachers? And, if so, why might this be important?

    In “That’s Very Unprofessional, Mr. Shanker!" Al writes:

  • Does It Matter How We Measure Schools' Test-Based Performance?

    In education policy debates, we like the “big picture.” We love to say things like “hold schools accountable” and “set high expectations.” Much less frequent are substantive discussions about the details of accountability systems, but it’s these details that make or break policy. The technical specs just aren’t that sexy. But even the best ideas with the sexiest catchphrases won’t improve things a bit unless they’re designed and executed well.

    In this vein, I want to recommend a very interesting CALDER working paper by Mark Ehlert, Cory Koedel, Eric Parsons and Michael Podgursky. The paper takes a quick look at one of these extremely important, yet frequently under-discussed details in school (and teacher) accountability systems: The choice of growth model.

    When value-added or other growth models come up in our debates, they’re usually discussed en masse, as if they’re all the same. They’re not. It's well-known (though perhaps overstated) that different models can, in many cases, lead to different conclusions for the same school or teacher. This paper, which focuses on school-level models but might easily be extended to teacher evaluations as well, helps illustrate this point in a policy-relevant manner.
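    As a rough illustration of the general point – this is a toy simulation, not the models compared in the paper – consider two simple school-level growth measures applied to the same simulated data: average gain scores versus regression-adjusted residuals. They can order the same schools quite differently:

        # Toy simulation (not the CALDER paper's models): two growth measures, same data.
        import numpy as np

        rng = np.random.default_rng(0)
        n_schools, n_students = 50, 100

        # Schools differ both in their students' prior achievement and in their "true" effect.
        school_prior_mean = rng.normal(0, 1, n_schools)
        school_effect = rng.normal(0, 0.2, n_schools)

        prior = school_prior_mean.repeat(n_students) + rng.normal(0, 1, n_schools * n_students)
        current = 0.7 * prior + school_effect.repeat(n_students) + rng.normal(0, 0.5, n_schools * n_students)
        school_id = np.arange(n_schools).repeat(n_students)

        # Measure 1: each school's average gain (current score minus prior score).
        gain = np.array([np.mean((current - prior)[school_id == s]) for s in range(n_schools)])

        # Measure 2: each school's average residual from a regression of current on prior scores.
        X = np.column_stack([np.ones_like(prior), prior])
        beta, *_ = np.linalg.lstsq(X, current, rcond=None)
        adjusted = np.array([np.mean((current - X @ beta)[school_id == s]) for s in range(n_schools)])

        # Rank schools (0 = highest) under each measure and compare.
        rank_gain = np.argsort(np.argsort(-gain))
        rank_adjusted = np.argsort(np.argsort(-adjusted))
        print("rank agreement:", np.corrcoef(rank_gain, rank_adjusted)[0, 1])
        print("largest change in rank:", np.max(np.abs(rank_gain - rank_adjusted)))

    In this toy setup, the gain-score measure penalizes schools whose students start out ahead (their raw gains shrink as scores regress toward the mean), while the adjusted measure largely recovers the simulated school effects – a simplified version of the kinds of design choices the paper examines.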

  • "We Need Teachers, Not Computers"

    I’ve been noticing for a while that a lot of articles about education technology have a similar ring to them: “Must Have Apps for K12 Educators," “What Every Teacher Should Know About Using iPads in the Classroom," “The Best 50 Education Technology Resources for Teachers." You get the drift.

    This type of headline suggests that educators are the ones in need of schooling when it comes to technology, while the articles themselves often portray students as technology natives, naturally gifted at all things digital. One almost gets the impression that, when it comes to technology, students should be teaching their teachers.

    But maybe my perception is skewed. After all, a portion of the education news I read “comes to me” via Zite, Flipboard and other news aggregators. It wouldn’t surprise me to learn that these types of software applications have a bias toward certain types of technology-centered stories, which may not be representative of the broader education technology press.

    So, is it me, or is it true that the media sometimes sees educators as a bunch of technological neophytes, while seeing students as technological whizzes from whom teachers must learn? And, if true, is this particular to the field of education or is something similar seen in regard to professionals in other fields?

  • The Impact Of Race To The Top Is An Open Question (But At Least It's Being Asked)

    You don’t have to look very far to find very strong opinions about Race to the Top (RTTT), the U.S. Department of Education’s (USED) stimulus-funded state-level grant program (which has recently been joined by a district-level spinoff). There are those who think it is a smashing success, while others assert that it is a dismal failure. The truth, of course, is that these claims, particularly the extreme views on either side, are little more than speculation.*

    To win the grants, states were strongly encouraged to make several different types of changes, such as the adoption of new standards, the lifting/raising of charter school caps, the installation of new data systems and the implementation of brand new teacher evaluations. This means that any real evaluation of the program’s impact will take some years and will have to be multifaceted – that is, it is certain that implementation and effects will vary not only across these components, but also between states.

    In other words, the success or failure of RTTT is an empirical question, one that is still almost entirely open. But there is a silver lining here: USED is at least asking that question, in the form of a five-year, $19 million evaluation program, administered through the National Center for Education Evaluation and Regional Assistance, designed to assess the impact and implementation of various RTTT-fueled policy changes, as well as those of the controversial School Improvement Grants (SIGs).

  • Do Top Teachers Produce “A Year And A Half Of Learning”?

    One claim that gets tossed around a lot in education circles is that “the most effective teachers produce a year and a half of learning per year, while the least effective produce a half of a year of learning."

    This talking point is used all the time in advocacy materials and news articles. Its implications are pretty clear: Effective teachers can make all the difference, while ineffective teachers can do permanent damage.

    As with most prepackaged talking points circulated in education debates, the “year and a half of learning” argument, when used without qualification, is both somewhat valid and somewhat misleading. So, seeing as it comes up so often, let’s very quickly identify its origins and what it means.
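    Before getting into the particulars, it may help to see the basic arithmetic behind any “years of learning” conversion. The numbers below are purely illustrative – chosen so the math comes out to 1.5 and 0.5 – but they show how an effect estimated in test score standard deviations gets translated into “years”:

        # Purely illustrative numbers, chosen so the arithmetic yields 1.5 and 0.5 "years."
        typical_annual_gain_sd = 0.25    # assume a typical year of growth equals 0.25 standard deviations
        strong_teacher_bump_sd = 0.125   # assume a very effective teacher adds 0.125 SD on top of that
        weak_teacher_bump_sd = -0.125    # and a very ineffective teacher subtracts the same amount

        years_strong = (typical_annual_gain_sd + strong_teacher_bump_sd) / typical_annual_gain_sd
        years_weak = (typical_annual_gain_sd + weak_teacher_bump_sd) / typical_annual_gain_sd
        print(years_strong, years_weak)  # 1.5 0.5

    The sketch makes the dependence plain: the “year and a half” framing rests entirely on how large a “year of learning” is assumed to be, and that assumption varies considerably by grade, subject and test – one reason the talking point needs qualification.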

  • Who's Afraid of Virginia's Proficiency Targets?

    The accountability provisions in Virginia’s original application for “ESEA flexibility” (or "waiver") have received a great deal of criticism (see here, here, here and here). Most of this criticism focused on the Commonwealth's expectation levels, as described in “annual measurable objectives” (AMOs) – i.e., the statewide proficiency rates that its students are expected to achieve at the completion of each of the next five years, with separate targets established for subgroups such as those defined by race (black, Hispanic, Asian, white), income (subsidized lunch eligibility), limited English proficiency (LEP), and special education.

    Last week, in response to the criticism, Virginia agreed to amend its application, though it’s not yet clear exactly how the new rates will be calculated (only that lower-performing subgroups will be expected to make faster progress).

    In the meantime, I think it’s useful to review a few of the main criticisms that have been made over the past week or two and what they mean. The actual table containing the AMOs is pasted below (for math only; reading AMOs will be released after this year, since there’s a new test).

  • Five Recommendations For Reporting On (Or Just Interpreting) State Test Scores

    In my experience, education reporters are smart, knowledgeable, and attentive to detail. That said, the bulk of the stories about testing data – in big cities and suburbs, this year and in previous years – could be better.

    Listen, I know it’s unreasonable to expect every reporter and editor to address every little detail when they try to write accessible copy about complicated issues, such as test data interpretation. Moreover, I fully acknowledge that some of the errors to which I object – such as calling proficiency rates “scores” – are well within tolerable limits, and that news stories need not interpret data in the same way as researchers. Nevertheless, no matter what you think about the role of test scores in our public discourse, it is in everyone’s interest that the coverage of them be reliable. And there are a few mostly easy suggestions that I think would help a great deal.
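    As one small example of the rates-versus-scores confusion mentioned above, a proficiency rate is just the share of students at or above a cut score, so it can move in the opposite direction from the average score for the very same students. A minimal sketch with invented numbers:

        # Invented numbers: the proficiency rate and the average score can tell different stories.
        import statistics

        cut = 70
        last_year = [68, 69, 69, 75, 80, 85]
        this_year = [71, 71, 71, 71, 72, 73]

        def proficiency_rate(scores):
            return 100 * sum(s >= cut for s in scores) / len(scores)

        print(statistics.mean(last_year), proficiency_rate(last_year))  # about 74.3, 50.0
        print(statistics.mean(this_year), proficiency_rate(this_year))  # 71.5, 100.0

    Here the rate doubles while the average score falls, so a story that treats the rate as a “score” would misstate what happened.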

    Below are five such recommendations. They are of course not meant to be an exhaustive list, but rather a quick compilation of points, all of which I’ve discussed in previous posts, and all of which might also be useful to non-journalists.

  • How Can We Tell If Vouchers "Work"?

    Brookings recently released an evaluation of New York City’s voucher program, called the School Choice Scholarship Foundation Program (SCSF), which was implemented in the late 1990s. Voucher offers were randomized, and the authors looked at the impact of being offered/accepting them on a very important medium-term outcome – college enrollment (they were also able to follow an unusually high proportion of the original voucher recipients to check this outcome).
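    For readers unfamiliar with how lottery-based designs like this are analyzed, here is a hedged sketch with synthetic data – not the SCSF numbers – of the two quantities such studies typically report: the effect of being offered a voucher (intent-to-treat) and the implied effect of actually using one (recovered here with a simple Wald/instrumental-variables ratio):

        # Synthetic data and assumed parameters – a sketch of the logic, not the study's analysis.
        import numpy as np

        rng = np.random.default_rng(1)
        n = 10_000
        offered = rng.integers(0, 2, n)             # randomized voucher offer
        used = offered * (rng.random(n) < 0.7)      # assume ~70% of offered families use the voucher
        true_effect = 0.05                          # assumed boost to enrollment probability from use
        enrolled = (rng.random(n) < 0.40 + true_effect * used).astype(int)

        itt = enrolled[offered == 1].mean() - enrolled[offered == 0].mean()
        take_up = used[offered == 1].mean() - used[offered == 0].mean()
        tot = itt / take_up                         # Wald estimator: effect on those who actually used it

        print(f"ITT (effect of the offer): {itt:.3f}")
        print(f"TOT (effect of using a voucher): {tot:.3f}")

    Because take-up is incomplete, the offer effect is mechanically smaller than the usage effect, which is one reason it matters to note which estimate a headline number refers to.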

    The short version of the story is that, overall, the vouchers didn’t have any statistically discernible impact on college enrollment. But, as is often the case, there was some underlying variation in the results, including positive estimated impacts among African-American students, which certainly merit discussion.*

    Unfortunately, such nuance was not always evident in the coverage of and reaction to the report, with some voucher supporters (strangely, given the results) exclaiming that the program was an unqualified success, and some opponents questioning the affiliations of the researchers. For my part, I’d like to make a quick, not-particularly-original point about voucher studies in general: Even the best of them don’t necessarily tell us much about whether “vouchers work."