Errors in the pre-election estimates could also have stemmed from decisions about how the data were actually collected. All of the primary polls were conducted by telephone, but they varied in the number of call attempts made to each case. Studies have shown that increasing the number of call attempts in a survey can change the partisan composition of the sample (Traugott 1987; Keeter et al. 2000, 2006), and it might also change the proportions of likely voters or the candidate preference distribution.
The committee was severely constrained in its ability to investigate any relationship between the number of calls attempted and the degree of error in the survey estimates. Only CBS News and Gallup provided datasets for their New Hampshire polls with the number of call attempts made on each case. The CBS News New Hampshire Democratic primary estimates are based on a re-contact survey rather than a fresh sample, which limits the generalizability of the results. CBS News conducted a maximum of six calls, and Gallup conducted a maximum of five calls. In both surveys, more than 94% of respondents provided data on the first or second attempt, undoubtedly a consequence of each firm trying to achieve a large enough sample size in a brief field period so that campaign events would not have an appreciable effect on candidate preferences. These distributions are shown in Table 14.
In our assessment of candidate preference by call effort, we found no statistically significant difference in the preference distribution between respondents interviewed on the first call and those interviewed on the second. The small number of cases requiring a third or later call limits the statistical power of this test, but the results are shown in Figure 6. The Gallup survey’s unweighted estimates (dashed lines) indicate that respondents requiring 3 or more calls favored Clinton over Obama (49% to 35%), while those requiring 1 or 2 calls favored Obama (37% to 35%). When weighted estimates (solid lines) are considered, the difference in candidate preference by level of call effort is similar – with the hard-to-reach favoring Clinton and the easy-to-reach favoring Obama – though less dramatic.
The findings from the CBS News survey are more mixed, reflecting the fact that only 17 of the 322 re-contacted respondents required 3 or more calls. The weighted CBS estimates show the same pattern as the Gallup data, with harder-to-reach respondents favoring Clinton by a 2-to-1 margin and the easy-to-reach favoring Obama. The unweighted CBS estimates, however, show the opposite: those requiring three or more calls were slightly more favorable toward Obama than those requiring one or two calls. The callback design of the CBS study and the small sample size in the “higher effort” group make these data less suitable for this type of analysis than the Gallup data, which come from fresh RDD landline and cell phone samples. However, in the weighted estimates from both surveys, the more difficult-to-reach respondents were more likely to favor Clinton.
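To make the comparison concrete, the sketch below runs the kind of test described above as a simple chi-square test of independence between call effort and stated preference. The cell counts are invented assumptions (not the Gallup or CBS News data), chosen only to mimic the situation discussed here: a large easy-to-reach group and a small hard-to-reach group.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical unweighted counts: rows = call effort, columns = stated preference.
# These numbers are illustrative only, not the committee's datasets.
table = np.array([
    #  Obama  Clinton  Other/undecided
    [   340,    322,    258],   # reached on the 1st or 2nd attempt
    [    18,     25,      9],   # required 3 or more attempts
])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")
# With so few hard-to-reach cases, the test has little power to detect
# a difference of the size reported for the Gallup survey.
```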
Analysis of the Gallup data offers some indication that the primary pollsters might have achieved more accurate estimates had they implemented a more rigorous (and expensive) data collection protocol. Had the surveys used, say, an 8- or 10-call maximum rather than a 5- or 6-call maximum, it appears they might have tapped into Clinton support in New Hampshire that was missing from their final estimates. That said, this analysis is limited in statistical power and potentially confounded with real changes in preferences. The CBS News and Gallup surveys were fielded January 5th - 6th and January 4th - 6th, respectively. On average, those requiring more than two calls were most likely interviewed on the last or penultimate day of interviewing, while those requiring fewer calls were interviewed earlier. Changes in candidate preference at the individual level during this January 4th - 6th period might be misclassified in the Figure 6 analysis as differences between groups (i.e., those requiring low- versus high-effort calling). While such individual change may have occurred, we suspect it does not swamp the differences between the low- and high-effort groups. We say this in part because Senator Clinton’s emotional moment at the New Hampshire diner – which is speculated to have influenced some undecideds and tepid Obama supporters – did not occur until January 7th, after both studies had ended data collection. It is also important to keep in mind how the primary calendar influenced data collection. The New Hampshire primary was held just five days after the Iowa caucuses. It is unlikely that any “bounce” Obama received after Iowa would have dissipated by the time interviewing was underway for the final New Hampshire polls.
Summary: We found a modest indication that the primary pollsters might have achieved slightly more accurate estimates in New Hampshire had they implemented a more rigorous (and expensive) data collection protocol. Respondents reached on the third or later call attempt were more likely to support Clinton. In polls with very short field periods, however, the sample tends to be composed of respondents contacted on the first few attempts, which complicates any assessment of the interviews that require greater contact effort. This raises the prospect that sample management during the field period could have affected accuracy, with more prolonged effort producing better estimates. However, these results are based on only two surveys, one of which was unusual because it was a call-back study.
25 Dashed lines denote unweighted estimates; solid lines denote weighted estimates.
Timing of Data Collection
Another design feature often thought to affect the accuracy of pre-election surveys is the length of time between data collection and Election Day. Research on this topic has yielded mixed results (DeSart and Holbrook 2003). Some studies find that polls fielded closer to Election Day are more accurate (Crespi 1988); others find a null or even negative relationship (Lau 1994; Martin et al. 2005). We tested for a relationship between poll timing and accuracy in the Iowa, New Hampshire, South Carolina, and California primaries. The results, presented in Table 15, are based on a larger set of polls than the preceding analyses because field dates are routinely disclosed by polling firms, per AAPOR guidelines. We summarized the relationship between timing and accuracy with simple bivariate correlations.
Table 15
These are the correlations between the absolute values of the accuracy scores (A) and the number of days out from the election. Simply put, if polls taken closer to the election were more accurate, then we would expect to observe a positive correlation: accuracy scores near zero are better than those farther from zero in either a positive or negative direction. For example, in the South Carolina Republican primary, the later polls were more reflective of McCain’s three-point victory than earlier polls, resulting in a positive correlation (.64).
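As a sketch of how these correlations can be computed, the snippet below implements the predictive accuracy measure A of Martin, Traugott, and Kennedy (2005), defined as the natural log of the ratio of a poll's two-candidate odds to the election's two-candidate odds, and correlates |A| with the number of days before the election. The poll figures and vote shares are invented for illustration; only the form of the calculation reflects the analysis described here.

```python
import numpy as np
from scipy.stats import pearsonr

def accuracy_A(poll_a, poll_b, vote_a, vote_b):
    """Martin-Traugott-Kennedy accuracy measure: the log of the ratio of the
    poll's odds (candidate a vs. candidate b) to the official vote odds.
    A = 0 indicates a poll margin identical to the election margin."""
    return np.log((poll_a / poll_b) / (vote_a / vote_b))

# Hypothetical polls: (share for candidate a, share for candidate b, days before election)
polls = [(0.37, 0.30, 2), (0.39, 0.31, 3), (0.36, 0.34, 6), (0.33, 0.35, 10)]
vote_a, vote_b = 0.365, 0.391          # illustrative official shares

abs_A = [abs(accuracy_A(pa, pb, vote_a, vote_b)) for pa, pb, _ in polls]
days_out = [d for _, _, d in polls]
r, p = pearsonr(abs_A, days_out)
print(f"r = {r:.2f}, p = {p:.2f}")     # positive r means later polls were more accurate
```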
The results in Table 15 demonstrate that the relationship between timing and accuracy varied greatly by election. In the problematical New Hampshire Democratic primary, there is a negative correlation (-.30 for the absolute value of A) between accuracy and temporal distance from the election. This quantifies the pattern in Figure 2, where we see that the final polls fared slightly worse on average than those fielded earlier (before the Iowa caucuses). We found a similar though weaker relationship (-.17), computed on the same basis, between poll timing and accuracy in the Iowa Democratic caucuses, and we found essentially no relationship in the Iowa Republican caucuses or the California Democratic and Republican primaries. The only races in which polls conducted later were noticeably more accurate are the New Hampshire Republican primary, the South Carolina Republican and Democratic primaries, and the Wisconsin Democratic and Republican primaries – and most of these events came later in the calendar, when the field of candidates was smaller. However, in the most problematic races, particularly the Iowa Democratic caucuses and the New Hampshire Democratic primary that followed them, the final polls did not seem to improve as Election Day approached.
We attempted to test this further by using the micro-level datasets provided to the committee. Three of the New Hampshire datasets (CBS News, Gallup, and the University of New Hampshire) contained a variable for the interview date.[26] We merged cases containing relevant common variables from these datasets and used a logistic regression model to test whether the timing of the interview relative to the election had a significant relationship with vote preference for Hillary Clinton, while controlling for other factors. Specifically, we estimated a logistic regression with vote preference for Clinton as the dependent variable and number of days until the election, gender, age, education, survey firm, interviewer demographics, and Democratic Party affiliation as the independent variables.
This approach is limited in two important ways. First, the key independent variable has a very narrow range of values because the interviewing dates for these three surveys were between January 4th and 7th. Second, any observed effect from the number of days until the election will be confounded to some extent by other factors such as ease of contact, as reported earlier. The regression analysis suggests that the number of days until the election did not have a significant effect on the likelihood of favoring Clinton after controlling for the other factors. The estimated model parameters are provided in Appendix Table 2. This null finding does not rule out the possibility that vote preferences changed in the days leading up to the New Hampshire primary, but we find no support for a shift toward Clinton during the January 4–7 time period.
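A minimal sketch of a model of this form is shown below. The data are synthetic and the variable names are assumptions for illustration; the committee's actual merged file and codebooks are not reproduced here.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the pooled respondent-level file.
rng = np.random.default_rng(0)
n = 1500
df = pd.DataFrame({
    "clinton_pref": rng.integers(0, 2, n),        # 1 = prefers Clinton
    "days_to_election": rng.integers(1, 5, n),    # interviews conducted Jan 4-7
    "gender": rng.choice(["F", "M"], n),
    "age": rng.integers(18, 90, n),
    "education": rng.choice(["HS", "SomeCollege", "College+"], n),
    "survey_firm": rng.choice(["CBS", "Gallup", "UNH"], n),
    "interviewer_race": rng.choice(["white", "black", "other"], n),
    "dem_party_id": rng.integers(0, 2, n),
})

model = smf.logit(
    "clinton_pref ~ days_to_election + C(gender) + age + C(education)"
    " + C(survey_firm) + C(interviewer_race) + C(dem_party_id)",
    data=df,
).fit(disp=False)
print(model.summary())  # a non-significant days_to_election term would mirror the null finding
```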
Summary: We found that the timing and field periods of the polls may have contributed to estimation errors in the early Democratic events in Iowa and New Hampshire, though we lack the full set of relevant data for a proper evaluation. The timing of the New Hampshire primary, so closely following the Iowa caucuses, appears to have contributed to instability in voter preferences and, in turn, to instability in the poll estimates.
26 The University of New Hampshire dataset also contained the interview date, but it did not contain all of the other predictors in the model. A reduced model was estimated so that cases from all three surveys were included, and the results did not change appreciably. There was still no significant effect from the number of days until the election.
Social Desirability
Some election observers speculated that the New Hampshire polls overestimated support for Obama because some respondents told interviewers that they would vote for him but actually voted for Clinton (Nichols 2008; Robinson 2008). Such intentional misreporting in voter preference polls is attributed to latent bigotry or intolerance in conjunction with an inclination to provide socially desirable responses.[27] However, in the New Hampshire pre-primary polls, the estimation error did not derive from overestimating support for Obama – which could have been driven by latent racism among respondents – but from underestimating support for Clinton. Latent misogyny therefore cannot explain the errors in the New Hampshire polls, because it would have had the opposite observed effect: the polls would have overstated support for Clinton rather than understated it.
Several compelling pieces of evidence suggest that the New Hampshire estimation errors were probably not caused by the “Bradley effect” – the tendency for respondents to report a preference for a black candidate (Obama) but vote instead for a white opponent. A meta-analysis by Hopkins (2008) indicates that while the Bradley effect did undermine some state-level polls in previous decades, there is no evidence for such an effect in recent years. In the 2008 general election, the very accurate final poll estimates of Barack Obama’s fairly decisive victory over John McCain dispelled suspicion that the Bradley effect was at play during the final weeks of the fall contest. There is also a conspicuous lack of evidence for a Bradley effect in the primary contests outside of New Hampshire. Of the 81 polls conducted during the final 30 days of the Iowa, South Carolina, California, and Wisconsin contests, the vast majority (86%) overestimated Clinton’s relative vote share, while just 14% overestimated Obama’s relative vote share. This finding is based on the signed direction of A for each survey.[28]
Furthermore, as reported in Table 3, poll estimates of Obama’s vote share in New Hampshire were quite accurate – it was only Clinton’s share that was consistently underestimated. However, it is still possible that intentional misreporting occurred during the lead-up to the New Hampshire Democratic primary because of the interaction between the race of the interviewer and the race of the respondent. If social desirability influenced respondents’ answers, we would expect to observe more support for the African-American candidate when the interviewer was African American than when the interviewer was not, based upon the assumption that the respondent could correctly infer the race of the interviewer over the telephone. If respondents were answering truthfully, we would expect to find no statistically significant difference between the vote preferences recorded by African-American interviewers and interviewers of other races.
The interviewer effects approach has substantial drawbacks. To the extent that respondents misreport regardless of interviewer race, this test will understate a social desirability effect. Also, interviewing staffs in the United States tend to be made up mostly of Caucasians; consequently, the number of interviews conducted by African Americans is often low, yielding low statistical power for the test. Furthermore, interviewers were not assigned randomly to cases, so race-of-interviewer effects cannot be isolated from potential confounds. Results from race-of-interviewer analyses should be interpreted with these factors in mind.
In October, Gary Langer (2008a) of ABC News reported results from his analysis of the relationship between the race of the interviewer and the race of the respondent in general election polling. He found no evidence of racially motivated misreporting, and other pollsters likewise found no such evidence. As noted above, however, this does not rule out the possibility that racially motivated misreporting occurred during the primaries.
CBS News polling director Kathleen Frankovic (2008) used panel data to test whether racial attitudes affected the New Hampshire polls. She noted that voters who are concerned that their candidate preference may be socially unpopular could contribute to polling error in two ways: they could misreport their true preference, or they could decline to be interviewed altogether. The CBS panel data were used to evaluate the latter hypothesis – were voters opposing Obama less likely to be interviewed in New Hampshire? Frankovic’s analysis suggests that the answer is essentially “No.” The January response rate among those who had supported Obama in November was similar to the rate among those who had supported Clinton in November (74% and 68%, respectively). The difference is in the expected direction, but its magnitude is not large enough to explain the error in the polls. CBS News post-stratified their January sample to account for this difference in response rates.
While informative, this analysis has an important limitation. The test is based on people who already agreed to participate in a survey. This step may have filtered out many of those who would decline a survey request for fear of offending someone with their candidate preference. This limitation could explain the lack of a large difference in the January response rates. The CBS News analysis, therefore, does not rule out the possibility that Obama supporters were more likely to respond than those who did not support him.
The committee was also able to conduct analysis on the topic. Three survey organizations provided data to the committee that could be used to test for a race-of-interviewer effect in the New Hampshire Democratic primary. Gallup, CBS News, and the University of New Hampshire included the race of the interviewer in their survey dataset; but only CBS News included race of the respondent.
Based upon the 2006 U.S. Census estimate of the proportion of the New Hampshire population that is white (95.8%), we assumed in our analysis that all of the survey respondents were white.[29] We combined data from the three surveys to increase statistical power and compared reported vote preference among respondents who spoke with a white interviewer to that among respondents who spoke with an African-American interviewer. The results are displayed in Table 16. In the pooled analysis, Obama led Clinton 36% to 35% when the interviewer was white, and he led 43% to 29% when the interviewer was black. This finding is in the direction of a social desirability effect and is statistically significant. Using just the CBS News dataset, we performed the same analysis looking only at white respondents. We find that black interviewers recorded higher support for Obama than white interviewers did. Although the effect is not quite statistically significant at conventional levels due to small sample size (p=.13), it is quite noticeable and in the expected direction. We also tested for this effect in a multivariate setting. The race of the interviewer was a significant predictor of vote preference for Clinton when controlling for other factors in the logistic regression presented in Appendix Table 2.
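The bivariate comparison can be illustrated with a chi-square test on a preference-by-interviewer-race table. The counts below are illustrative assumptions, scaled from the percentages reported above with assumed base sizes; the actual pooled results appear in Table 16.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Illustrative counts (rows: interviewer race; columns: Obama, Clinton, other),
# scaled from the reported percentages with assumed base sizes.
table = np.array([
    [360, 350, 290],   # white interviewer  (36% / 35% / 29% of an assumed n=1,000)
    [ 43,  29,  28],   # black interviewer  (43% / 29% / 28% of an assumed n=100)
])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```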
Overall, these findings suggest that the misreporting of candidate preference due to racial sensitivity to black interviewers may have contributed to the overstatement of support for Obama relative to Clinton in the New Hampshire Democratic primary polls. It could also be the case that a social desirability effect was at play even when the interviewer was white. Absent individual-level vote data to append to these datasets, we are unable to test that hypothesis rigorously.
We used these same pooled datasets to test whether vote preferences reported in the polls were influenced by the gender of the interviewer. Just as social desirability pressure may have led some Clinton supporters to report that they would vote for Obama, so might some Obama supporters have reported that they would vote for Clinton when interviewed by a woman. We did not find strong evidence that interviewer gender influenced responses. The bivariate findings are presented in Table 17. Among male respondents, Obama's lead was 14 percentage points with male interviewers and 17 percentage points with female interviewers, indicating no meaningful effect from interviewer gender. Among female respondents, Clinton had a 3 percentage point lead when the interviewer was male and a 6 percentage point lead when the interviewer was female. This finding for female respondents is in the direction of a social desirability effect and is statistically significant. Interviewer gender is only marginally significant (p=.079) in the multivariate model. Any misreporting favoring Clinton in New Hampshire does not, however, help to explain why her support was underestimated in the polls.
Both the race-of-interviewer and gender-of-interviewer analyses are limited because they rely solely on survey responses. Ideally, we would be able to compare survey responses about candidate preference with actual voting behavior, and the interviewers would have been randomly assigned to cases rather than in conjunction with their schedules or other factors.
Summary: We found mixed evidence for social desirability effects on polling errors. Social desirability pressures may explain a small proportion of the error but probably no more than that. In a pooled analysis of three New Hampshire surveys, we find that support for Obama is significantly greater when the interviewer is black than when he or she is white. In the same analysis, however, Obama is still favored over Clinton among respondents interviewed by a white interviewer, and the number of interviews taken by black interviewers was too small to affect the overall estimates.
27 There is a dispute about whether and to what extent a “Bradley effect” ever existed. In the original case, the 1982 California gubernatorial election, Tom Bradley received more votes than George Deukmejian among Election Day voters but lost by a larger amount in the absentee balloting; it was the first election in which parties could organize efforts to make absentee ballots available to voters, and the Republicans outmaneuvered the Democrats in this regard. In 1989, however, pre-election polls appeared to over-report support for David Dinkins in the New York mayor’s race, and an exit poll overestimated support for L. Douglas Wilder in the Virginia governor’s race (Traugott and Price, 1992). See also a discussion by Lance Tarrance during the 2008 general election campaign (http://www.realclearpolitics.com/articles/2008/10/the_bradley_effect_selective_m.html).
28 The reader should remember, as shown in Table 3, that the inclusion of undecided voters in the candidate preference distribution implies that individual candidate support levels will be underestimated. In New Hampshire, for example, 22 out of 22 polls in the month leading up to the primary underestimated support for Obama, and 18 out of 22 polls underestimated support for Clinton.
29 Available at http://quickfacts.census.gov/qfd/states/33000.html.
Weighting
In the common application of the technique, pre-election pollsters use weighting to align the demographic characteristics of their sample with known characteristics of the voting population after interviewing is completed, usually based on Census data or information about the characteristics of registered voters.[30]
Constructing a survey weight for primary pre-election polls is complicated by several factors. As with nearly all telephone surveys of the U.S. public, responding samples contain disproportionate numbers of women, seniors, and whites (among other demographic characteristics). Furthermore, the demographic and party identification characteristics of a primary electorate can shift substantially from election to election, making it difficult to identify appropriate parameter estimates for weighting. For example, what proportion of New Hampshire Republican primary voters will be registered Republicans versus registered Independents or others? In effect, the selection of weighting variables and the construction of the weights themselves are akin to building a likely voter model.
In 2008, primary pollsters addressed the weighting issue in a number of ways. Some procedures were implemented at the sampling stage, while others were implemented after data collection was completed. We discuss sampling and post-survey adjustment procedures together because, essentially, they were used as two different tools to accomplish the same task: achieving the appropriate levels of representation in the poll for certain groups that most commonly included women/men, young/old, white/black, Hispanic/non-Hispanic, and registered or self-identified Independents, Democrats, and Republicans, depending upon the state and available data from it.
Table 18 presents a summary of the procedures implemented for each poll studied by the committee. At the sampling stage, polls using registration-based sampling (RBS) made use of the information on the frame. Most of the RBS pollsters drew their samples based on what they knew about each person’s voting history, party registration, and/or demographics at some time in the past. At the post-survey adjustment stage, pollsters used several different procedures that appear related to the mode of administration. Two of the three IVR pollsters who were willing to discuss their methodology described how they deleted cases to make their samples more representative.[31] They randomly deleted a subset of cases from demographic groups, such as older women, that were overrepresented in the responding sample. None of the CATI pollsters used this deletion technique. Instead, the CATI pollsters generally used an iterative proportional fitting technique (e.g., raking) to align the responding sample with population control totals on several relevant dimensions.
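For readers unfamiliar with the adjustment, the sketch below shows a bare-bones version of raking: base weights are repeatedly rescaled so the weighted sample margins match population control totals on each dimension in turn. The variable names, targets, and data are assumptions for illustration; each pollster's actual raking dimensions and software differ.

```python
import pandas as pd

def rake(df, targets, weight_col="base_weight", max_iter=100, tol=1e-6):
    """Iterative proportional fitting: rescale weights until the weighted
    marginal shares of each variable in `targets` match the control totals."""
    w = df[weight_col].astype(float).copy()
    for _ in range(max_iter):
        max_shift = 0.0
        for var, control in targets.items():
            current = w.groupby(df[var]).sum() / w.sum()   # weighted shares now
            factors = pd.Series(control) / current         # per-category adjustment
            w = w * df[var].map(factors)
            max_shift = max(max_shift, float((factors - 1).abs().max()))
        if max_shift < tol:
            break
    return w

# Hypothetical control totals for a primary electorate (assumptions).
targets = {
    "sex":  {"female": 0.55, "male": 0.45},
    "age3": {"18-34": 0.20, "35-54": 0.40, "55+": 0.40},
}

# Tiny synthetic example: an unweighted sample skewed toward women and older respondents.
df = pd.DataFrame({
    "sex":  ["female"] * 70 + ["male"] * 30,
    "age3": ["18-34"] * 10 + ["35-54"] * 40 + ["55+"] * 50,
    "base_weight": [1.0] * 100,
})
df["raked_weight"] = rake(df, targets)
print(df.groupby("sex")["raked_weight"].sum() / df["raked_weight"].sum())
```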
The committee is not in a position to fully evaluate the relative merits of case deletion versus post-stratification weighting. With full disclosure of the specific procedures employed, the application of weights has an advantage in that a secondary analyst can compare the weighted distribution to the unweighted distribution to assess the impact of the weights. An analyst who does not agree with the algorithm or its results could, in principle, apply their own population-based nonresponse adjustment and assess its efficacy relative to the original. Deleting a case can have roughly the same effect on point estimates as assigning it a weight close to zero under certain assumptions, such as a simple random procedure for deletion; but the absence of such cases from the publicly available dataset leaves no mechanism for assessing alternative models for deletion. Table 6 shows that the IVR polls (some of which used deletion) performed at least as well as the CATI polls (which did not). In terms of poll accuracy, we found no discernible difference between these adjustment techniques. But given the potential confounding of mode and adjustment technique, we could not evaluate their relative merits without additional information.
An issue of serious concern, however, is whether or not these procedures are appropriately reflected in the margins of error. If cases are deleted, the margins of error should be increased accordingly. None of the pollsters who used the case deletion approach disclosed the number of cases that were deleted. This information is critical to understanding the magnitude of the adjustment and implications for the variance of the survey estimates. Similarly, for post-stratified surveys, the variance in the weights should be reflected in the margin of error.
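One common way to reflect unequal weights in a reported margin of error is Kish's approximate design effect, deff = 1 + cv(w)^2, which converts the nominal sample size into an effective sample size. The sketch below assumes simple random sampling apart from the weighting itself, and the weight distribution is invented for illustration.

```python
import numpy as np

def weighted_margin_of_error(weights, p=0.5, z=1.96):
    """Inflate the margin of error for unequal weighting using Kish's
    approximate design effect, deff = 1 + (sd(w)/mean(w))**2."""
    w = np.asarray(weights, dtype=float)
    deff = 1.0 + (w.std() / w.mean()) ** 2
    n_eff = len(w) / deff                      # effective sample size
    moe = z * np.sqrt(p * (1 - p) / n_eff)
    return moe, deff, n_eff

# Illustrative weights for 600 respondents (assumed distribution).
rng = np.random.default_rng(8)
moe, deff, n_eff = weighted_margin_of_error(rng.lognormal(0.0, 0.5, size=600))
print(f"deff = {deff:.2f}, effective n = {n_eff:.0f}, MOE = +/- {moe:.1%}")
```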
In addition to presenting this overview of procedures, we sought to evaluate empirically the effect of weighting in the primary polls by comparing weighted trial heat estimates to unweighted estimates where possible. The results are shown in Table 19. In general, we found the unweighted estimates to be very similar to the weighted estimates, and the few differences do not make the weighted estimates consistently more accurate than the unweighted estimates. For example, in the New Hampshire Democratic primary, the unweighted University of New Hampshire estimates were 38% Obama and 33% Clinton, for an A value of 0.21. The weighted estimates were 39% Obama and 30% Clinton, for an A value of 0.33. The reader may recall that higher A values reflect a greater deviation between the survey estimates and the actual election result. The results in Table 19 suggest that the weights applied to several of the primary polls were not particularly effective in reducing errors in the estimates of candidate preference. But these weights were often combined with the likely voter model, which makes it difficult to separate the effects of weighting from the effects of likely voter estimation. This seems to be especially true for Gallup, whose unweighted estimates were the closest to the Election Day outcome – though they still showed a slight (37% to 36%) Obama lead – but whose weighted estimates were furthest from the final result.
Summary: We found strong evidence that faulty weighting techniques explain some (but not all) of the polling errors. In three of the four surveys suitable for examination, the weighted New Hampshire estimates were less accurate than the unweighted estimates. Because all four unweighted estimates still had Obama in the lead, however, weighting cannot explain all of the error in the polls.
30 In some cases, pre-election pollsters are now combining weights to account for sample representation with adjustments made for likelihood of voting. For example, they assign unlikely voters a weight of zero to delete them from the analysis. There is additional discussion of such procedures in the section on Likely Voter Definitions, above.
31 This information was provided in follow-up telephone conversations and not in the pollsters’ original submissions of materials for their polls.
Time of Decision
Two survey organizations polling in New Hampshire – Gallup and CBS News – conducted panel studies in which the same sample of voters was interviewed twice. These types of studies can be highly informative because they allow researchers to evaluate changes in preference at the level of the individual voter, not just aggregate change. The two New Hampshire panel studies were designed quite differently. The Gallup study is a “before and after” panel: the sample was interviewed immediately before the January 8th primary (January 4-6, 2008) and shortly after it (January 11-29, 2008). The CBS News study involved two pre-primary surveys with the same sample, spaced several months apart (November 2-12, 2007 and January 5-6, 2008).
The re-interview rates in both studies were quite high. The original Gallup sample included 1,179 respondents planning to vote in the Democratic primary and 1,180 who said that they planned to vote in the Republican primary. The Gallup post-election survey included interviews with 818 of them (69%) who reported voting in the Democratic primary and 800 (68%) who reported voting in the Republican primary. In the CBS News study, three-quarters (77%) of the likely Democratic primary voters identified in November completed the January re-interview.
The CBS News study measured change in candidate preferences between November 2007 and early January 2008. The results, presented in Table 20, suggest significant shifts in support during that time. Among those supporting Obama just days before the primary, only half had been supporting him in November. Those self-reported late-comers to the Obama campaign included significant numbers of former Clinton supporters.
The composition of Clinton supporters looked quite different. The vast majority (87%) of those supporting Clinton in early January had been planning to vote for her since November. The CBS News study found no evidence that Clinton captured significant numbers of supporters from any of the other major candidates during that time. However, the study did not measure any changes in preferences that took place in the last two or three days of the campaign. Results from a trial heat follow-up question suggest that this gap in measurement may have been important. When asked whether their minds were made up or it was too early to say for sure whom they would vote for, almost half (46%) of John Edwards supporters said that it was too early. This suggests that many Edwards supporters may have anticipated his third-place finish and were still considering voting for one of the top two contenders.
The Gallup study began where the CBS study left off, albeit with a different group of New Hampshire voters. Results from the Gallup post-election interview are presented in Table 21. There is some evidence that Clinton won the vote of many Edwards supporters. Some 6% of Clinton voters said that they supported Edwards in the pre-election poll. Similarly, about 4% of Obama voters had recently come from the Edwards camp. For his part, Edwards looks to have attracted some former Obama and Clinton supporters. Unfortunately, sample sizes in both of these studies are too small to say with confidence whether many of these shifts within the electorate were significant.
The impact of this movement toward Clinton on the self-reported vote distribution is shown in Table 22, in which the data are not weighted. The level of self-reported support for Obama (about 37%) did not change substantially from the pre-election to the post-election survey. The proportion endorsing Clinton, however, increased by 4 percentage points. Taken together, the CBS News and the Gallup panels provide some evidence for a real, sizable shift in preferences to Obama in the final weeks of the campaign and a separate shift to Clinton in the final days. This changing terrain complicated the task of the pollsters and increased their difficulty in producing accurate projections.
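The kind of individual-level change the panels capture can be summarized with a simple transition table: cross-tabulating each respondent's earlier preference against his or her later one. A minimal sketch follows, using synthetic data and illustrative column names rather than the actual CBS News or Gallup files.

```python
import numpy as np
import pandas as pd

# Synthetic panel stand-in; preferences and weights are randomly generated.
rng = np.random.default_rng(1)
cands = ["Obama", "Clinton", "Edwards", "Other/undecided"]
panel = pd.DataFrame({
    "pref_wave1": rng.choice(cands, 300),
    "pref_wave2": rng.choice(cands, 300),
    "weight": rng.uniform(0.5, 2.0, 300),
})

# Of respondents holding each wave-2 preference, what share held each wave-1 preference?
transitions = pd.crosstab(
    panel["pref_wave2"], panel["pref_wave1"],
    values=panel["weight"], aggfunc="sum", normalize="index",
)
print(transitions.round(2))  # e.g., the share of January Obama supporters who backed him earlier
```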
The Gallup re-interview also provides some empirical insights on why some Democratic voters decided during the final days of the campaign to vote for Clinton. The survey explored voter reactions to the January 5th Democratic debate, Clinton’s emotional campaign appearance on January 7th, and get-out-the-vote (GOTV) efforts conducted January 7th and 8th. Each of these three events appears to account for a small increase in Clinton’s support at the close of the campaign.
Among Democratic primary voters, viewership of the final debate was quite broad. About two-thirds (65%) reported watching the debate, and an additional 23% reported hearing or reading news coverage about it. Only a small fraction (4%) of those who watched or saw news coverage said that they changed their candidate preference because of the debate.
Coverage of Clinton’s January 7th campaign appearance was followed closely by voters as well. Eight in ten (82%) Democratic primary voters said that they had seen the video of the campaign appearance, and most viewers said that their reaction was positive or neutral. The Gallup re-interview also asked Clinton voters whether a number of different considerations factored into their decision to vote for her. One of the considerations was the video of that campaign appearance, and 15% reported that this was a factor in their vote.
Another speculated cause of late decisions was a highly publicized campaign event in which Senator Clinton responded in an uncharacteristically emotional way to a voter’s question. The day before the primary election, a woman in Portsmouth asked Clinton how she stays so “put together” on the campaign trail.[32] Up until that point, Clinton had a reputation for being somewhat steely and hard-driving, but the question triggered a reflective, emotional response. This exchange revealed a more human side of Clinton that may have appealed to some Democratic voters, including some tepid Obama supporters. However, because most of the New Hampshire polls ended data collection prior to this event, they would have missed any related last-minute shift to Clinton.
The only information that we have about when people decided on their choice comes from self- reports of that decision in the exit polls. These are not easy data to evaluate because the exit poll respondents may have had difficulty in responding to or interpreting this question. For example, if a New Hampshire voter had been an early supporter of Hillary Clinton and then reconsidered his or her preference after Iowa but ended up voting for her on Election Day, would they have responded that they decided months ago or in the last day or two? On average, the pre-election polls suggested only about 3% of likely voters were undecided, but about 17% of voters told exit poll interviewers that they made their candidate choice on Election Day.
Table 23 shows the proportions of respondents who reported deciding late in the primary elections. Between 17% and 19% of those who voted in the New Hampshire Democratic or Republican primary, respectively, said they made up their mind on the day they voted. This was only slightly more than the proportions who said they made up their minds at the same point in the 2000 events (15% and 14%, respectively). Almost 40% said they made up their minds during the final three days of the 2008 campaign (38% of those voting in the Democratic primary and 39% in the Republican primary). This was about the same as in the 2004 Democratic primary (35%), but higher than the proportion of late deciders in either primary in 2000 (both 26%).
A closer look at the late deciders in the 2008 New Hampshire Democratic primary exit poll does not show enough late movement to Clinton to explain the error in the polls. According to analysis by Gary Langer of ABC News (2008b), the 17% of voters who said they made up their mind on the last day went narrowly for Clinton (39% to 36%) – a margin too small to explain fully the overestimation of support for Obama. Another 21% reported that they decided in the last three days, and they split narrowly for Obama (37% to 34%), again insufficient to explain the estimation problems in the pre-election polls.
Summary: We found that decision timing – in particular, late decisions – may have contributed significantly to the error in the New Hampshire polls, though we lack the data for proper evaluation. The fact that Clinton's “emotional moment” at the diner occurred after nearly all the polls were complete adds fuel to this speculation, as does the high proportion of New Hampshire Democratic primary voters who said they made up their minds during the final three days. It is also true that the percentages of voters reporting they made up their minds on Election Day or in the three preceding days are not substantially different from historical levels.
32 The exact question that prompted the response was “How did you get out the door every day? I mean, as a woman, I know how hard it is to get out of the house and get ready. Who does your hair?"
Participation of Independents Assuming an Obama Victory
Another hypothesis for the error in the New Hampshire polling estimates concerns the relative proportion of self-described Independents in the survey sample and in the electorate.[33] Some have speculated that Independents who liked both Obama and McCain could have been under the impression from the polls that Obama had locked up the Democratic race, and so they decided to participate in the Republican primary and support McCain. This hypothesis suggests that: (1) Independents should comprise a larger segment of the Democratic primary electorate in the pre-election polls than in the exit poll, because at the time they were interviewed they had not yet decided to abandon Obama to support McCain; and (2) Independents should comprise a smaller segment of the Republican primary electorate in the pre-election polls than in the exit poll, for the same reason. We find mixed support at best for this hypothesis in the four pre-election surveys reported in Table 24.
Each of these surveys ended data collection on January 6th, two days prior to the primary election. According to the New Hampshire exit poll, 44% of Democratic primary voters were Independents, compared to 37% of Republican primary voters, so they formed a somewhat larger proportion of the Democratic than the Republican primary electorate. The pre-election surveys estimated that this figure would be between 39% and 45% on the Democratic side, a fairly narrow range that encompasses the exit poll reading. On the Republican side, the pre-election polls estimated that 30% (Gallup) or 34% (University of New Hampshire) of voters would be Independents, lower than the 37% indicated in the exit poll. These estimates for the Republican race suggest that some Independents classified as Obama voters before the election may have ended up as McCain voters at the polls. This effect, however, is not especially large and is not conclusive evidence for an Independent shift.
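Whether a gap of this size (for example, 30% of Republican primary respondents identifying as Independents in a pre-election poll versus 37% in the exit poll) exceeds what sampling error alone could produce can be checked with a two-proportion test, sketched below. The base sizes are assumptions for illustration; only the percentages come from the text.

```python
from statsmodels.stats.proportion import proportions_ztest

# Independents among Republican primary respondents: pre-election poll vs. exit poll.
# Counts are hypothetical (30% of an assumed n=500 vs. 37% of an assumed n=2,000).
count = [150, 740]        # number of Independents in each sample
nobs = [500, 2000]        # assumed sample sizes
z, p = proportions_ztest(count, nobs)
print(f"z = {z:.2f}, p = {p:.3f}")
```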
Summary: We found little compelling information to suggest that Independents, by deciding in disproportionate numbers and at the last minute to vote in the Republican rather than the Democratic primary, contributed to the New Hampshire polling errors. The proportion of Independent voters in the Democratic primary pre-election polls is comparable to the proportion in the exit poll. Also, the differences between the exit poll and the pre-election surveys are not large enough to explain the errors in the polls.
33 New Hampshire has a party registration system whereby citizens must register as a Democrat, a Republican, or Undeclared. Undeclared registrants may opt into either party’s presidential primary. This formal designation is not what is typically measured in the pre-election polls, however; those polls measure self-reported party identification. Andrew Smith of the University of New Hampshire reports that the Secretary of State counted 121,515 Undeclared voters in the Democratic primary (42.1% of all voters) and 75,522 Undeclared voters in the Republican primary (31.3% of all voters).
Allocations of Those Who Remained Undecided
In some pre-election surveys, pollsters allocate respondents who remain “undecided” so that the reported proportions of support for each candidate add to 100%. Pollsters may use a number of different allocation approaches, any of which may have an effect on the accuracy of their estimates. However, only one of the pollsters who collected data in the contests under study, the University of New Hampshire Survey Center, used an allocation method. Without allocation, they showed a 9 percentage point lead for Obama over Clinton, and with allocation, they showed the same lead.[34] Hence allocation cannot explain differences between the final pre-election estimates and the outcome of the elections.
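One simple allocation rule distributes the undecided share across candidates in proportion to their decided support, as sketched below. This is only one of several possible rules and is not necessarily the method the University of New Hampshire Survey Center used; the marginals shown are illustrative, not the survey's actual figures.

```python
def allocate_undecided(shares):
    """Proportionally allocate the 'undecided' share so the distribution sums to 100."""
    shares = dict(shares)
    undecided = shares.pop("undecided", 0.0)
    decided_total = sum(shares.values())
    return {c: round(s + undecided * s / decided_total, 1) for c, s in shares.items()}

# Illustrative marginals only.
print(allocate_undecided({"Obama": 39.0, "Clinton": 30.0, "Edwards": 16.0,
                          "Richardson": 7.0, "undecided": 8.0}))
```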
34 See the University of New Hampshire press release for this result: http://www.unh.edu/survey-center/news/pdf/primary2008_demprim10708.pdf.
Ballot Order Effects
Another possible explanation for differences between the pre-primary poll estimates and the election outcomes concerns measurement in the election itself, rather than in the polls. Political methodologists have documented a small but non-trivial bias in favor of candidates listed first on election ballots (Miller and Krosnick 1998), one that appears to be robust in primary elections (Ho and Imai 2008). This bias is a version of a primacy effect (the opposite of the recency effect discussed in the Trial Heat Question Wording section), which is a cognitive bias leading people to select options presented near the top of a list when the list is presented visually, as on a ballot.
Jon Krosnick of Stanford University wrote an op-ed for the ABC News website (Krosnick 2008) explaining how a ballot order effect may have influenced the results of the New Hampshire Democratic primary and contributed significantly to discrepancies between the pre-election surveys and the election outcome in New Hampshire. Krosnick noted that, unlike previous primaries in the state, the 2008 contest featured the same ordering of candidate names on all ballots: an alphabetical list starting with the randomly drawn letter Z. Consequently, Joe Biden was listed first on every ballot, closely followed by Hillary Clinton, and Barack Obama was listed near the bottom of the 21 candidate names. Krosnick estimates that Clinton received at least a three percentage point boost from her position near the top of the ballot order.
In other early primaries, ballot order does not explain differences between the election outcomes and the pre-election polls. One reason may be that the list of candidates on the New Hampshire ballot was more than twice as long as the lists in other primaries (21 names versus 8 names). Primacy effects are generally thought to increase in size with list length. Ballot order rules, candidate name orderings, and election outcomes are presented in Table 25.
In South Carolina and Wisconsin, Clinton was placed near the top of the list and Obama was listed near the bottom. Had there been a strong ballot order effect, we would expect Clinton to have done better on Election Day in these states than in the polls. In fact, the reverse occurred. Obama’s actual margin of victory in South Carolina and Wisconsin was substantially greater than the margin suggested by the polls. This result does not rule out the possibility that Clinton benefitted from her higher ballot position in these states, but it suggests that any such effect was swamped by other factors. In California, the ordering was randomized and then rotated across districts, so there is no reason to believe that ballot order affected Election Day support for Clinton or Obama in that primary.
The ballot order analysis presented here is purely observational and therefore may be limited in its generalizability. We do not find evidence for a ballot order effect for the top two candidates in the South Carolina, Wisconsin, or California primaries, but this does not rule out the possibility of such an effect in New Hampshire, especially given the much longer list of candidates on the New Hampshire ballot. Unfortunately, we have no way to test directly for a ballot order effect in New Hampshire. We can only infer the consequences, as Krosnick does, from similar elections in which ballot order was varied across precincts. Absent more compelling data, we find no reason to discount Krosnick’s hypothesis.
Summary: Ballot order is one possible explanation for some of the estimation errors, and it may account for part of the error in the New Hampshire Democratic primary polls. Krosnick’s (2008) analysis of recent New Hampshire primaries suggests a 3 percentage point effect from ballot order. Clinton was listed near the top and Obama near the bottom of every ballot, which is consistent with greater support for Clinton in the returns than in the pre-election polls. This conclusion is based upon observational rather than experimental evidence on ballot order effects.
Conclusions
The committee evaluated a series of hypotheses that could be tested empirically, employing information at the level of the state, the poll, and, in limited cases, the respondent. Since the analysis was conducted after data collection, it was not possible to evaluate all of the hypotheses in a way that permitted strong causal inferences. And given the incomplete nature of the data for various measures, it was not possible to pursue every hypothesis about what might have happened, nor was it possible to pursue multivariate analyses that looked simultaneously at multiple explanatory factors. In the end, however, the analysis suggests which factors are potential explanations for the estimation errors and which are unlikely to have had an impact. The research also highlights the need for additional disclosure requirements and for better education by professional associations like AAPOR, the Council of American Survey Research Organizations (CASRO), and the National Council on Public Polls (NCPP).
Polling in primary elections is inherently more difficult than polling in a general election. Usually there are more candidates in a contested primary than in a general election, and this is especially true at the beginning of the presidential selection process. For example, there were a total of 15 candidates entered in the Iowa caucuses and more than 20 names on the New Hampshire primary ballot. Since primaries are within-party events, voters do not have the cue of party identification to rely on in making their choice. Uncertainty among voters can create additional problems for pollsters. Turnout is usually much lower in primaries than in general elections, although it varies widely across events; turnout in the Iowa caucuses tends to be relatively low compared to the New Hampshire primary, for example. So estimating the likely electorate is often more difficult in primaries than in the general election. Furthermore, the rules of eligibility to vote in the primaries vary from state to state and even within a state by party; New Hampshire has an open primary in which Independents can decide at the last minute which party’s primary to vote in. All of these factors can contribute to variations in turnout, which in turn may have an effect on the candidate preference distribution among voters in a primary compared to the general election.
The estimation errors in the polls before the New Hampshire Democratic primary were of about the same magnitude, as measured by the statistic A, as those in the Iowa caucuses. But the misestimation problems in New Hampshire received much more – and more negative – coverage than they did in Iowa. Because every poll included a small proportion of undecided voters, the estimates for each individual candidate were generally lower than the proportion of votes that candidate received, and these underestimates tended to be greater for the first-place finisher than for the second-place finisher. But the majority of the polls before New Hampshire suggested the wrong winner, while only half of those in Iowa did.
All of the committee’s conclusions are summarized briefly in Table 26. Factors that may have influenced the estimation errors in the New Hampshire pre-primary polls include:
- Respondents who required more effort to contact seemed more likely to support Senator Clinton, but most interviews were conducted on the first or second call attempt, and those easier-to-reach respondents favored Senator Obama.
- Patterns of nonresponse, derived from comparing the characteristics of the pre-election samples with the exit poll samples, suggest that some groups that supported Senator Hillary Clinton were underrepresented in the pre-election polls.
- Variations in likely voter models could explain some of the estimation problems in individual polls. Application of the Gallup likely voter model, for example, produced a larger error than their unadjusted data. While the “time of decision” data do not look very different in 2008 compared to recent presidential primaries, about one-fifth of the voters in the 2008 New Hampshire primary said they were voting for the first time. This influx of first-time voters may have had an adverse effect on likely voter models.
- Variations in weighting procedures could explain some of the estimation problems in individual polls. And for some polls, the weighting and likely voter modeling were commingled in a way that makes it impossible to distinguish their separate effects.
- Although no significant social desirability effects were found that systematically produced an overestimate of support for Senator Obama among white respondents or for Senator Clinton among male respondents, an interaction effect between the race of the interviewer and the race of the respondent did seem to produce higher support for Senator Obama in the case of a black interviewer. However, Obama was also preferred over Clinton by those who were interviewed by a white interviewer.
Factors unlikely to have contributed to the estimation errors in New Hampshire include:
- The exclusion of cell phone only (CPO) individuals from the samples did not seem to have an effect. However, the proportion of CPO citizens will continue to change over time, and pollsters should remain attentive to its possible future effects.
- The use of a two-part trial heat question, intended to reduce the level of “undecided” responses, did not produce that desired effect and does not seem to have affected the eventual distributions of candidate preference.
- The use of either computer-assisted telephone interviewing (CATI) techniques or interactive voice response (IVR) techniques made no difference to the accuracy of estimates.
- The use of the trial heat questions was quite variable, especially with regard to question order, but no discernible patterns of effects on candidate preference distributions were noted. While the names of the (main) candidates were frequently randomized, the committee did not receive data that would have permitted an analysis of the impact of order.
- Little compelling information indicates that Independents made a late decision to vote in the New Hampshire Republican primary, thereby increasing estimate errors.
Factors that present intriguing potential explanations for the estimation errors in the New Hampshire polls, but for which the committee lacked adequate empirical information to thoroughly assess include:
- The wide variation in sample frames used to design and implement samples – ranging from random samples of listed telephone numbers, to lists of registered voters with telephone numbers attached, to lists of party members – may have had an effect. Greater disclosure about sample frames and sample designs, including respondent selection techniques, would facilitate future evaluations of poll performance.
- Differences among polls in techniques employed to exclude data collected from some respondents could have affected estimates. Given the lack of detailed disclosure of how this was done, it is not possible to assess the impact of this procedure.
Finally, factors that appeared to be potential explanations for estimation errors, but for which the committee lacked any empirical information to assess include:
- Because of attempts by some states to manipulate the calendar of primaries and caucuses, the Iowa and New Hampshire events were rescheduled to the first half of January, with only five days between the events, truncating the polling field period in New Hampshire following the Iowa caucus.
- Given the calendar, polling before the New Hampshire primary may have ended too early to capture late shifts in the electorate there, measuring momentum as citizens responded to the Obama victory in the Iowa caucus but not to later events in New Hampshire such as the restaurant interview with Senator Hillary Clinton.
- The order of the names on the ballot – randomly assigned but fixed on every ballot - may have contributed to the increased support that Senator Hillary Clinton received in New Hampshire.
All of the information provided to the committee is being deposited in the Roper Center Data Archive, where it will be available to other analysts who wish to check on the work of the committee or to pursue their own independent analysis of the pre-primary polls in the 2008 campaign.
References
Blumberg, Stephen J. and Julian V. Luke. 2008. “Wireless Substitution: Early Release of Estimates from the National Health Interviews Survey, July-December 2007.” Web article posted for the National Center for Health Statistic at http://www.cdc.gov/nchs/data/nhis/earlyrelease/wireless200805.htm.
Crespi, Irving. 1988.
Pre-election Polling: Sources of Accuracy and Error. New York: Russell Sage Foundation.
Curtin, Richard, Stanley Presser, and Eleanor Singer. 2000. "The Effects of Response Rate Changes on the Index of Consumer Sentiment."
Public Opinion Quarterly 64:413-28.
DeSart, Jay and Thomas Holbrook. 2003. “Campaigns, Polls, and the States: Assessing the Accuracy of Statewide Presidential Trial-Heat Polls.”
Political Science Quarterly, 56(4): 431-439.
Erikson, Robert S., Costas Panagopoulos, and Christopher Wlezien. 2004. “Likely (and Unlikely) Voters and the Assessment of Campaign Dynamics.”
Public Opinion Quarterly 68(4):588-601.
Erikson, Robert S. and Christopher Wlezien. 2008. “Likely Voter Screens and the Clinton Surprise in New Hampshire.” Web article posted for Pollster.com at http://www.pollster.com/blogs/likely_voter_screens_and_the_c.php.
Frankovic, Kathleen. 2008. “N.H. Polls: What Went Wrong?” Web article posted for CBS News at http://www.cbsnews.com/stories/2008/01/14/opinion/pollpositions/main3709095.shtml?source=RSS&attr=_3709095.
Groves, Robert M. 2006 “Non-response Rates and Non-response Bias in Household Surveys.”
Public Opinion Quarterly 70:646-675.
Ho, Daniel E. and Kosuke Imai. 2008. “Estimating Causal Effects of Ballot Order from a Randomized Natural Experiment: The California Alphabet Lottery, 1978 – 2002.”
Public Opinion Quarterly, 72:216-240.
Holbrook, Allison, Jon A. Krosnick, David Moore, and Roger Tourangeau. 2007. “Response Order Effects in Dichotomous Categorical Questions Presented Orally: The Impact of Question and Respondent Attributes.”
Public Opinion Quarterly 71:325-348.
Hopkins, Daniel J. 2008. “No More Wilder Effect, Never a Whitman Effect: When and Why Polls Mislead about Black and Female Candidates.” Poster presented at the Meeting of the Society for Political Methodology.
Jones, Jeff. 2008. “Cell Phones in Primary Pre-Election Surveys.” Paper presented at the Annual Conference of the American Association for Public Opinion Research.
Keeter, Scott. 2006. “The Impact of Cell Phone Noncoverage Bias in the 2004 Presidential Election.” Public Opinion Quarterly 70:88-98.
Keeter, Scott. 2008. “The Impact of ‘Cell-Onlys’ on Public Opinion Polling: Ways of Coping with a Growing Population Segment.” Web article posted for the Pew Research Center at http://people-press.org/report/391/.
Keeter, Scott, Michael Dimock, and Leah Christian. 2008. “Cell Phones and the 2008 Vote: An Update.” Web article posted at http://pewresearch.org/pubs/964/.
Keeter, Scott, Carolyn Miller, Andrew Kohut, Robert M. Groves, and Stanley Presser. 2000. “Consequences of Reducing Non-response in a Large National Telephone Survey.” Public Opinion Quarterly 64:125-148.
Keeter, Scott, Courtney Kennedy, Michael Dimock, Jonathan Best, and Peyton Craighill. 2006. “Gauging the Impact of Growing Non-response on Estimates from a National RDD Telephone Survey.” Public Opinion Quarterly 70:759-779.
Kohut, Andrew. 2008. “Getting It Wrong.” The New York Times, January 10. Available at http://www.nytimes.com/2008/01/10/opinion/10kohut.html?ref=opinion.
Krosnick, Jon A. 2008. “Clinton’s Favorable Placement on Ballots May Account for Part of Poll Mistakes.” Web article posted for ABC News at http://abcnews.go.com/PollingUnit/Decision2008/story?id=4107883.
Krosnick, Jon A. and Duane Alwin. 1987. “An Evaluation of a Cognitive Theory of Response-Order Effects in Survey Measurement.” Public Opinion Quarterly 51:210-219.
Langer, Gary. 2008a. “Dissecting the ‘Bradley Effect’.” Web article posted for ABC News at http://blogs.abcnews.com/thenumbers/2008/10/the-bradley-eff.html.
Langer, Gary. 2008b. “A New Hampshire Post-Mortem.” Web article posted for ABC News at http://blogs.abcnews.com/thenumbers/2008/02/a-new-hampshire.html.
Langer, Gary. 2008c. “Cell-Onlies: Report on a Test.” Web article posted for ABC News at http://blogs.abcnews.com/thenumbers/2008/09/cell-onlies-rep.html.
Martin, Elizabeth A., Michael W. Traugott, and Courtney Kennedy. 2005. “A Review and Proposal for a New Measure of Poll Accuracy.” Public Opinion Quarterly 69:342-369.
Merkle, Daniel, and Murray Edelman. 2002. “Non-response in Exit Polls: A Comprehensive Analysis.” In Survey Non-response, ed. Robert M. Groves, Don A. Dillman, John L. Eltinge, and Roderick J. A. Little, pp. 243-258. New York: Wiley.
Mosteller, Frederick. 1948. The Pre-Election Polls of 1948. New York: Social Science Research Council.
Nichols, John. 2008. “Did ‘The Bradley Effect’ Beat Obama in New Hampshire?” Web article posted for The Nation and available at http://www.thenation.com/blogs/state_of_change/268328.
Robinson, Eugene. 2008. “Echoes of Tom Bradley.” The Washington Post, January 11, A17.
Traugott, Michael W., Robert M. Groves, and James M. Lepkowski. 1987. “Using Dual Frame Designs to Reduce Non-response in Telephone Surveys.” Public Opinion Quarterly 51:522-539.
Traugott, Michael W., Brian Krenz, and Colleen McClain. 2008. “Press Coverage of the Polling Surprises in the New Hampshire Primary.” Paper presented at the Annual Conference of the Midwest Association for Public Opinion Research.
Traugott, Michael W. and Vincent Price. 1992. “Exit Polls in the 1989 Virginia Gubernatorial Race: Where Did They Go Wrong?” Public Opinion Quarterly 56:245-253.
Traugott, Michael W. and Clyde Tucker. 1984. “Strategies for Predicting Whether a Citizen Will Vote and Estimation of Electoral Outcomes.” Public Opinion Quarterly 48:330-343.
ZuWallack, Randal, Jeri Piehl, and Keating Holland. 2008. “Supplementing a National Poll with Cell Phone Only Respondents.” Paper presented at the Annual Conference of the American Association for Public Opinion Research.
Appendix A
Members of the AAPOR Special Committee on 2008 Presidential Primary Polling
Glen Bolger, a partner and co-founder of Public Opinion Strategies.
Darren W. Davis, Professor of Political Science at the University of Notre Dame.
Charles Franklin, Professor of Political Science at the University of Wisconsin and co-developer of Pollster.com.
Robert M. Groves, Director, the University of Michigan Survey Research Center, Professor of Sociology at the University of Michigan, Research Professor at its Institute for Social Research, and Research Professor at the Joint Program in Survey Methodology at the University of Maryland.
Paul J. Lavrakas, a methodological research consultant.
Mark S. Mellman, CEO of The Mellman Group.
Philip Meyer, Professor Emeritus in Journalism at the University of North Carolina.
Kristen Olson, Assistant Professor of Survey Research and Methodology and Assistant Professor of Sociology at the University of Nebraska-Lincoln.
J. Ann Selzer, President of Selzer & Company.
Michael W. Traugott (Chair), Professor of Communication Studies and Senior Research Scientist in the Center for Political Studies at the Institute for Social Research at the University of Michigan.
Christopher Wlezien, Professor of Political Science and Faculty Affiliate in the Institute for Public Affairs at Temple University.
Appendix B
Charge to the Committee
- Examine the available information concerning the conduct and analysis of the pre-election, entrance and exit polls related to the 2008 primaries, including, but not limited to, press releases, post-election hypotheses, and evaluations conducted by the respective polling organizations as well as the news media.
- To the extent possible, interview those involved in the New Hampshire primary pre-election and exit polls to gather additional information for the committee.
- Synthesize and report on the findings from the various polling organizations. The report will include a summary of the factors that have been considered to date as well as recommendations and guidelines for future research. The report is tentatively scheduled for release in early April, 2008.
- Present the findings from the report in a public forum, hosted by the Kaiser Family Foundation at its Barbara Jordan Conference Center in Washington, D.C. The forum is tentatively scheduled for Spring, 2008.
- To facilitate research on all of the possible factors that may have contributed to the problems with the New Hampshire polls this year, request all sample, survey, and ancillary data associated with the polls leading up to and following the New Hampshire primary, including the New Hampshire exit poll data. The request will be broad in nature – for example, sample and call-record data will be included so as to inform hypotheses concerning nonresponse.
- The Roper Center has generously offered to serve as the archivist for the data associated with the ad hoc committee. The Roper Center is keenly sensitive to the risks associated with these datafiles and the potential for exposure of confidential information. These data will be archived and maintained separately from the general access archives in a secure environment. Scholars interested in analyzing these restricted datasets will complete an Application for Restricted Data Use. Approval of the application will be made if the research purpose meets the criteria outlined by the ad hoc committee. Further limitations on the researcher will include the destruction of the data files after a designated period of time, as outlined in the application.
- Establish seed-funding for support of additional research on the New Hampshire and other primary pre-election and exit poll data. This may include support of the work of the ad hoc committee to undertake analysis or the work of individual scholars interested in conducting research on the topic.
- Beyond the public forum in Spring, 2008, the findings of the ad hoc committee will be disseminated on the AAPOR web site and as part of a special panel discussion at the AAPOR 63rd Annual Conference.
Letter to Survey Firms from President Mathiowetz
I am writing to you as President of the American Association for Public Opinion Research (AAPOR) with regard to polling you have conducted in [STATE] as part of the 2008 presidential election campaign. As you may be aware, AAPOR has named an Ad Hoc Committee on 2008 Presidential Primary Polling. The task of that committee is to evaluate the methodology of the pre-election primary polls and the way they are being reported by the media and used to characterize the contests in pre-election press coverage. Although originally formed in response to the disparity between the pre-election polls and the outcome of the Democratic contest in New Hampshire, the mission statement of the committee has been expanded to include examination and archiving of primary data conducted in states other than New Hampshire.
The variation between the pre-election polls and the final outcome of the elections in, for example, New Hampshire (Democrats), South Carolina (Democrats), and California (Republicans), has raised questions about the profession as a whole, because the quality of estimates of candidate standing in those contests reflects on all of us. The horserace aspect of polling, albeit only a small part of the work our profession does, offers an immediate and visible validation of survey estimates. In this way, the image of the entire industry is affected by the quality of the estimates that are made at the end of political campaigns. We are a profession that benefits from our collective understanding of the sources of errors that impact our estimates. After the 1948 Presidential election, the pollsters involved in the pre-election polling undertook an examination and analysis of the factors that contributed to the miscalling of that election. It is in that spirit, and because of the collective knowledge that will come from this work, that I ask for your cooperation with the request outlined below.
The AAPOR Code of Professional Ethics and Practices as well as the Principles of Disclosure of the National Council of Public Polls (NCPP) and the Code of Standards and Ethics of the Council of American Survey Research Organizations (CASRO) all call for the disclosure of essential information about how research was conducted when the results are widely disseminated to the public. At a minimum, we ask for disclosure of the essential information outlined by these codes. However, you will see that the request does go beyond the disclosure guidelines outlined in these respective codes. This information will be critical for the AAPOR Committee to pursue its evaluation.
The Ad Hoc Committee will focus on addressing empirically-based hypotheses that can be addressed post hoc – for example, whether differences in likely voter screening, turnout models, differential non-response, the allocation of undecideds, weighting procedures, and other sources of measurement error could have contributed to these estimation errors. To address these issues, the request outlined in the attached document is broad-based, ranging from full questionnaires to individual-level data to documentation of procedures. The committee is interested in obtaining information from every firm or research center that collected data prior to these elections and caucuses.
The Roper Center has offered to serve as the archivist for the data associated with the work of the committee because we expect that when the Ad Hoc Committee issues its report, others will be interested in examining how the committee came to the conclusions it did. The Roper Center is keenly sensitive to the risks associated with these data files and the potential for exposure of confidential information. In the short run, access will be limited to the committee (and research assistants working with the committee members). In the long run, the goal is to provide access to the data for other scholars. The Roper Center will work with individual pollsters to determine which files may eventually be made available to the broader community of scholars (for example, after an embargo period).
Scholars interested in analyzing these restricted datasets will complete an Application for Restricted Data Use; review of these applications will be completed by a joint committee of the Roper Center and a subgroup of the Ad Hoc Committee.
I look forward to working with you in the weeks to come. The issues we face as a profession are challenging; I hope the work of the committee sheds light on issues that may have been unique to the 2008 pre-election primary polls as well as those issues that will inform and improve the methodology of our industry in the years to come. If you have any questions or would like any additional information about AAPOR or the Ad Hoc Committee, please feel free to contact me.
Regards,
Nancy A. Mathiowetz
President, American Association for Public Opinion Research
Appendix C
AAPOR Special Committee on 2008 Presidential Primary Polls Information Request
The request for information and data has been separated into two parts: (1) information that is part of the AAPOR Standards for Minimal Disclosure and (2) information or data that goes beyond the minimal disclosure requirements. The information you provide could take one or more forms: documentation, tables, and individual-level data. Our mission is to be able to examine data that will empirically inform questions concerning the disjuncture between the pre-election primary polls and the election outcomes. Members of the committee are willing to work with you in order to obtain the information of interest, in whatever form is easiest for you to provide.
AAPOR Standards for Minimal Disclosure
As noted in the AAPOR Code of Professional Ethics and Practices, the following items should, at a minimum, be disclosed for all public polls:
1. Who sponsored the survey, and who conducted it.
2. The exact wording of questions asked, including the text of any preceding instruction or explanation to the interviewer or respondents that might reasonably be expected to affect the response.
Here the committee requests that you provide the complete questionnaire – hard copy, executable, screen shots – in whatever format is feasible. This should indicate which question or questions are used for the likely voter screening and which question or questions are used for the trial heat. The questionnaire should also indicate any randomization of questions or response options.
3. A definition of the population under study, and a description of the sampling frame used to identify this population.
The description of the sampling frame should indicate whether or not the frame includes cell phones.
4. A description of the sample design, giving a clear indication of the method by which the respondents were selected by the researcher, or whether the respondents were entirely self-selected.
For those studies that include cell phones in the sample frame, the description should include the sample selection for these numbers.
5. Sample sizes and, where appropriate, eligibility criteria, screening procedures, and response rates computed according to AAPOR Standard Definitions. At a minimum, a summary of the disposition of sample cases should be provided so that response rates can be computed (an illustrative sketch of the response rate and sampling error computations referenced in items 5 and 6 appears after this list).
In addition to sample sizes, demographic information on those screened out would be useful in distinguishing between nonresponse bias and turnout model bias.
6. A discussion of the precision of the findings, including estimates of sampling error, and a description of any weighting or estimating procedures used.
In addition to the standard weighting information, documentation of the weighting procedure used to produce the final turnout estimate would be helpful.
7. Which results are based on parts of the sample, rather than on the total sample, and the size of such parts.
8. Method, location, and dates of data collection.
If interviewers are used for the data collection, please indicate if live interviewers or IVR technology was used.
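As a point of reference for items 5 and 6 above, the following is a minimal, illustrative sketch of how a response rate under AAPOR Standard Definitions and a margin of sampling error might be computed from a summary of case dispositions. The disposition counts, eligibility rate, and design effect below are hypothetical placeholders, not figures from any poll reviewed by the committee.

```python
# Illustrative sketch only: AAPOR response rates (RR1, RR3) from hypothetical
# final sample dispositions, plus a simple margin of sampling error for a
# proportion. All numbers are invented for illustration.
import math

dispositions = {
    "I": 550,   # complete interviews
    "P": 25,    # partial interviews
    "R": 400,   # refusals and break-offs
    "NC": 300,  # non-contacts
    "O": 50,    # other eligible, no interview
    "UH": 900,  # unknown if household / working number
    "UO": 100,  # unknown, other
}
e = 0.35        # assumed eligibility rate among cases of unknown eligibility

def rr1(d):
    """AAPOR Response Rate 1: completes over all eligible plus unknown cases."""
    return d["I"] / (d["I"] + d["P"] + d["R"] + d["NC"] + d["O"] + d["UH"] + d["UO"])

def rr3(d, e):
    """AAPOR Response Rate 3: unknown cases discounted by estimated eligibility e."""
    return d["I"] / (d["I"] + d["P"] + d["R"] + d["NC"] + d["O"] + e * (d["UH"] + d["UO"]))

def margin_of_error(p, n, deff=1.0, z=1.96):
    """95% margin of sampling error for a proportion, inflated by a design effect."""
    return z * math.sqrt(deff * p * (1.0 - p) / n)

if __name__ == "__main__":
    print(f"RR1 = {rr1(dispositions):.3f}")
    print(f"RR3 (e = {e}) = {rr3(dispositions, e):.3f}")
    # e.g., a 39% candidate preference among 550 likely voters with deff = 1.3
    print(f"MOE = +/- {100 * margin_of_error(0.39, 550, deff=1.3):.1f} points")
```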
Beyond Minimum Disclosure
To address the various hypotheses concerning the New Hampshire, South Carolina, California, and Wisconsin pre-election polls, the committee is requesting information and data beyond the minimum disclosure outlined above. For several of the items, the request is for the raw data; if the data are not available or cannot be made available, the committee would benefit from the provision of the analysis tables as described. In cases where no written documentation is available, one or more of the committee members would be willing to discuss the procedures with you or one of your staff members.
I. Data
1. Individual-level data for all individuals contacted and interviewed, including those who failed the likely-voter screening, and including all weights used in the production of the final estimates prior to the election, the date of the interview, and an interviewer identification number.
A. In lieu of the individual-level data, demographic information on those who were screened out, along with crosstabulations between vote preference and likely/unlikely voters as well as registered/unregistered voters
B. Final estimate of the demographic composition of the turnout
C. Share of the voting age population represented by the turnout estimate, within demographic subgroup
D. Tabulations that reflect the combination of likelihood of voting and candidate preference, indicating how the trial heat question might differ by different estimates of turnout
E. If date of interview is not included on the individual level data file, distribution of candidate preferences (the trial heat question) by date of interview
F. In lieu of individual-level data, crosstabulations of voter preference by demographic characteristic, within subgroups. We are especially interested in the distributions of candidate preference by age, race, sex, party identification, political ideology, and religiosity
2. Reinterview data, if a post-election re-interview was conducted
3. To examine hypotheses related to social desirability, it would be beneficial to be able to examine characteristics of interviewers, matched to respondents. Listed in (I.1) above was a request for an interviewer ID as part of the individual file. Ideally, we would like to be able to link individual level records to characteristics of interviewers
4. To examine hypotheses related to nonresponse bias, the committee would need to have access to data for the full sample, including call-record information and the disposition of each sampled number, along with any recontact attempts
II. Documentation
1. Interviewer documentation, including instructions not included in the text of the questionnaire (e.g., instructions for probing “Don’t Know” responses).
2. Allocation rules for Don't Knows and Undecideds (an illustrative sketch of one common allocation approach follows this list)
3. Documentation of the rules, if any, for sample allocation to interviewers
4. Documentation of approach to handling “early voting” in the composition of the final sample
5. The last press release or releases associated with your poll most proximal to the election.
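As a point of reference for item II.2 above, the following is a minimal, illustrative sketch of one common allocation rule: distributing Don't Know/Undecided responses proportionally to the preferences of decided respondents. It is not the rule used by any particular firm, and the candidate shares shown are hypothetical.

```python
# Illustrative sketch only: proportional allocation of undecided respondents.
# Actual allocation rules vary by firm; all percentages below are invented.

def allocate_undecided_proportionally(decided, undecided):
    """Distribute the undecided share across candidates in proportion to decided support."""
    total_decided = sum(decided.values())
    return {
        candidate: share + undecided * (share / total_decided)
        for candidate, share in decided.items()
    }

if __name__ == "__main__":
    # Hypothetical trial-heat percentages with 8% undecided.
    decided = {"Candidate A": 36.0, "Candidate B": 34.0, "Candidate C": 22.0}
    allocated = allocate_undecided_proportionally(decided, undecided=8.0)
    for candidate, share in allocated.items():
        print(f"{candidate}: {share:.1f}%")
```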

Appendix D
Hypotheses for Sources of Error in 2008 Primary Pre-election Surveys
1. Likely voter screening
1a. Likely voter screening questions used in general elections do not work as well in (the 2008) primary elections.
1b. Likely voter screening questions used in primaries do not work as well in unusually high turnout primaries.
2. Turnout models/Turnout surge
2a. Because of the calendar and the proximity of events, voter interest is stimulated from one event to another.
2b. The higher turnout of African Americans was underestimated on the Democratic side.
2c. The higher turnout of women was underestimated on the Democratic side.
3. Inability to capture last-minute deciders or changes in preference
3a. The nature of the contests (in 2008) means that many voters are making up their minds late and deciding to go to the polls (combination of turnout and decisions)
3b. Voters are changing their minds late; turnout estimates are all right but preferences change.
4. Misreporting issues
4a. Voters are misreporting preferences to interviewers (social desirability) because they are unwilling to say they won’t vote for an African American or a female candidate
4b. Respondents are misreporting their intention to vote (staying home or going to the polls).
4c. Misreporting is greater for “non-traditional” (non-white, non-male) candidates than for traditional candidates.
5. Nonresponse bias
5a. The short time between events and the need to take previous results into account in measurement means that response rates are low, and nonrespondents are different from respondents.
5b. Differential nonresponse by key groups in the electorate affects the distribution of candidate preferences (African Americans, Whites, men, women).
6. Question wording effects
6a. Differences in the way that the “trial heat” question is asked account for differences in polls.
6b. Differences in the way that the trial heat question is asked (e.g., explicit or implicit Don’t Know alternatives) produce different levels of Don’t Know or Undecided responses.
6c. The ordering of the candidates in the trial heat question affects the distribution of responses (i.e., primacy or recency effects)
7. Question order effects
7a. The preceding questions affect responses to the “trial heat” question but do not have an impact in the voting booth
8. Allocation of undecideds
8a. Some results are being reported with undecideds included while others are being reported with them excluded
8b. Some results are being reported with the undecideds allocated (by different methods) while others are not
9. Sampling Issues
9a. Some groups are being oversampled, others undersampled.
9b. In states where there are open primaries, Independents or voters who do not identify with a party are being oversampled or overweighted.
9c. Weighting algorithms are not working as well in the primaries as in a general election.
9d. A large proportion of votes are being cast absentee or by other “early” procedures, and these voters either are not being captured in the survey or are weighted disproportionately when combined with those who intend to vote in person on Election Day.
9e. Cell-phone-only voters are being lost differentially by state in certain primaries.
10. External Factors
10a. The rules of voting in the primaries are not adequately captured in the polling methodology (who is eligible to vote in a specific event)
10b. The order of the names on the ballot has an independent effect on the outcome (or an effect in relation to the order of the names in the “trial heat” question)
10c. Senator Clinton’s emotional response swayed voters at the last minute
10d. President Clinton’s sharp criticisms of Obama swayed African American voters at the last minute