Teacher Surveys And Standardized Tests: Different Data, Similar Warning Labels
As a strong believer in paying attention to what teachers think about policy, I always review the results of MetLife’s annual teacher survey. The big theme of this year’s survey, as pushed by the press release and reiterated in most of the media coverage, was that job satisfaction among teachers is at “its lowest level in 25 years.”
It turns out that changes in question wording over the years complicate straight comparisons of responses to the teacher job satisfaction question over time. Even slight changes in wording can affect results, though it seems implausible that this one had a dramatic effect. In any case, it is instructive to take a look at the reactions to this finding. If I may generalize a bit here, one “camp” argued that the decline in teacher satisfaction is due to recent policy changes, such as eroding job protections, new evaluations, and the upcoming implementation of the Common Core. Another “camp” urged caution – they pointed out that not only is job satisfaction still rather high, but also that the decline among teachers can be found among many other groups of workers too, likely a result of the ongoing recession.
Although it is more than plausible that recent reforms are taking a toll on teacher morale, and this possibility merits attention, those urging caution, in my view, are correct. It’s simply not appropriate to draw strong conclusions as to what is causing this (or any other) trend in aggregate teacher attitudes, and it’s even more questionable to chalk it up to a reaction against specific policies, particularly during a time of economic hardship.
I would offer two additional, boilerplate caveats about these results.
The first is that this survey, like most surveys, is prone to various forms of error. This means that smaller differences (or changes between years) should be viewed with caution. For instance, the five-point decline between 2011 and 2012 in the percentage of teachers saying they are "very satisfied" with their jobs is probably large enough to be taken seriously, but may also be due in part to imprecision from sampling error, weighting, etc.*
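To give a rough sense of the scale involved: MetLife does not publish error margins, but a back-of-the-envelope calculation under simple random sampling shows why a few-point change deserves caution. The sample size of 1,000 below is purely a hypothetical assumption for illustration, not a figure from the report.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a survey proportion,
    assuming simple random sampling. Real surveys with weighting
    typically have somewhat larger effective margins."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical: a proportion near 0.40 with an assumed sample of 1,000 teachers
moe = margin_of_error(0.40, 1000)
print(round(100 * moe, 1))  # prints 3.0, i.e., roughly +/- 3 percentage points
```

Under these (assumed) conditions, a single-year swing of a few points sits uncomfortably close to the margin of error, which is why multi-year trends are more trustworthy than year-over-year changes.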
The second, highly related warning is that the data are of course cross-sectional, and thus trends in responses might to some extent reflect changes in the composition of the sample/workforce, rather than actual shifts in teachers’ attitudes. This is particularly salient given that the teacher workforce has been changing rapidly. For example, newer teachers are less likely to express dissatisfaction than their more experienced colleagues. So, if there are more novice teachers now than in the past, this might create the "illusion" of increasing satisfaction (or mitigate "real" declines in satisfaction), particularly when responses are viewed over longer periods of time.
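The compositional point can be made concrete with a toy calculation. The satisfaction rates and workforce shares below are invented for illustration; the only claim is the arithmetic: if subgroups differ in their responses, shifting the mix of respondents moves the aggregate even when no individual group's attitude changes at all.

```python
def aggregate_satisfaction(groups):
    """Weighted average of 'very satisfied' rates across experience groups.
    groups: list of (share_of_workforce, satisfaction_rate) pairs."""
    return sum(share * rate for share, rate in groups)

# Hypothetical rates: novices report 0.50 'very satisfied', veterans 0.40.
# Year 1: 20% novices; Year 2: 40% novices. Neither group's attitudes
# changed, yet the aggregate rises purely because the sample's mix shifted.
year1 = aggregate_satisfaction([(0.20, 0.50), (0.80, 0.40)])
year2 = aggregate_satisfaction([(0.40, 0.50), (0.60, 0.40)])
print(round(year1, 2), round(year2, 2))  # prints 0.42 0.44
```

The same mechanism can also run in reverse, masking a real decline in every subgroup behind a flat or rising aggregate.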
But my primary point is that these warnings – the need for caution in drawing causal inferences, compositional changes, etc. – apply just as strongly to the interpretation of student test scores. Those who jumped at the chance to use the teacher survey trends to argue against policies they oppose are implicitly endorsing a “method” that, when applied to testing data, is often used to support those same policies. And, similarly, if you're urging caution about the interpretation of survey results, you should be just as vigilant about that of testing data.
All of us (myself included, of course) are prone to interpret results in a manner that squares with our pre-existing beliefs, and it’s often very difficult to monitor one’s self in this regard. Nevertheless, I am still naïve enough to believe that we can keep improving, and so I am heartened to see people urge caution, even if we all slip up now and again.
- Matt Di Carlo
* MetLife does not publish error margins for this survey; see page 77 of the report.
Of course some of the objections are true for both sets of data.
But there are particular issues with the use of standardized tests that do not apply here.
1) This might be our only source of data on teacher attitudes. We already have grades and graduation rates to look at student/school success.
2) We know that standardized tests do NOT actually measure the whole construct they claim to measure. I don't mean that they sample from it; I mean that they do not even try to measure some parts of the curriculum or standards to which they claim to be aligned.
3) The kinds of standardized tests we are willing to use are incapable of measuring the higher order thinking/21st century skills/non-cognitive skills/habits of mind that so many (most? all?) people think are the most important lessons of schooling. These things are hardly even in the standards/curriculum -- though CCSS brings some of them in -- so assessment developers aren't even supposed to assess them.
The fact is that we do not treat these kinds of data the same. When looking at the teacher survey we focus on particular questions and dive into what they mean -- some paying more attention to the actual wording than others. But we don't do that for standardized tests. In fact, for test security reasons we often do not know what the questions are. With multiple forms, students don't all get the same questions. etc. etc.
And so, I do not think acceptance or caution of the teacher survey data should match acceptance or caution of testing results as closely as you say (imply?). Yes, there are some statistical concerns they should have in common, but those are merely the simplest concerns.
(Of course, there are concerns about the surveys that do not apply to the tests. The trickiness of wording, something you point to quite a bit. The focus on just one item, when good research surveys build a composite score out of multiple items (which exacerbates the wording issue). Etc.)
Thank you for an important point about testing.
Looking at the actual survey report you are citing (thanks for the link), I don't agree with your statement that the current question wording "precludes straight comparisons of responses to the teacher job satisfaction question with those of the surveys conducted in 2009 or earlier." The chart on p. 45 notes that the current wording was used in the original 1984 survey and in 1986, 1987, 2001, 2011, and 2012. So direct comparisons for those years are possible, and the data show a clear decline in recent years: 13 percentage points since 2001, and 5 points in the last year, to a 25-year low. I think you also make the point that two people can look at the same data and see different things.
NOTE TO READERS:
The original version of this post stated that the change in wording of the teacher satisfaction question "precludes straight comparisons of responses to the teacher job satisfaction question with those of the surveys conducted in 2009 or earlier."
This was incorrect. The current wording was also used in 1984, 1986, 1987, 2001 and 2011. The post has been corrected.
I apologize for the error, and thank you to edwatcher, the astute commenter who noticed it.