Identifying Sources of Survey Error in 2012 Presidential Election Polls:
A Dynamic Factor Analytic Approach∗

Jee-Kwang Park, Assistant Professor
Nazarbayev University, [email protected]

Adam G. Hughes, PhD Candidate
University of Virginia, [email protected]

April 3, 2014
Abstract
Although election polls provide important information to voters, they are often biased and unreliable. Existing measures of poll accuracy fail to assess accuracy in the weeks prior to the election and fail to detect systematic manipulation by polling firms. Using data from the 2012 U.S. presidential election, we propose a new econometric method (dynamic factor analysis) for estimating true electoral opinion in the months before an election. Our estimate improves upon simple averages, smoothing estimates, and Bayesian models of true support by distinguishing between the kinds of survey errors in each poll. We provide empirical validation of the total survey error framework; we estimate that non-sampling error accounts for the majority of survey error in 2012's national presidential polls. By comparing individual poll results with our estimate, we create a measure of accuracy over time and rank eleven major polling firms. We show which polling organizations succeeded throughout the campaign and identify which polling practices were most effective. The results of our analysis suggest that internet polls outperformed telephone polls and that larger sample sizes are correlated with inaccuracy.
∗Prepared for presentation at the Annual Meeting of the Midwest Political Science Association, April 3-6, 2014, Chicago, Illinois.
1 Introduction
How accurate are election polls? The 2012 U.S. presidential election triggered
another heated debate about this old question when final polls from respected polling orga-
nizations proved to be surprisingly inaccurate. Barack Obama won the election by a margin
of 3.8%. However, major polling organizations failed to predict Obama’s easy victory: al-
though ABC/WP and Pew favored Obama by a 3% margin, other pollsters predicted a tied
race, or even victory for Mitt Romney. Indeed, quite a few final polls were further from the
mark than their so-called margin of error would suggest. Moreover, these polls systemat-
ically overestimated public support for Romney by 1% to 5%, which strongly suggests an
industry-wide polling bias in favor of Romney. Despite pollsters’ attempts to draw random
and unbiased samples, the 2012 election polls turned out to be less accurate and more biased
than those in earlier elections.
Polls issued in the final weeks before the 2012 election actually took place were particu-
larly inaccurate, suggesting manipulation and industry-wide herding behavior. In the month
of October 2012, a majority of polls consistently reported that Romney held a substantial
lead or that the race was tied. For example, Gallup reported that Romney would beat
Obama by between 0 and 7 points throughout October. According to these polls, Romney
never trailed Obama throughout the entire month. Gallup wasn’t alone in reporting favor-
able polling results for Romney: Rasmussen’s daily poll consistently reported that Romney
led by 0% to 4%, while ARG reported a 0% to 2% Republican lead. Almost all other polling
organizations (except Rand) described the race in October as neck and neck if not favoring
Romney.
Was Romney ahead in October as much as these polls indicated? Gallup, Rasmussen,
and ARG’s results contrast sharply with betting in the Iowa election market, where election
futures are traded.1 During October 2012, Obama never trailed Romney in the Iowa election
markets: he maintained a substantial lead. The prediction market Intrade also suggested that
Obama would win; in the final month of betting, the chance of Obama’s reelection was always
higher than 50%. At the same time, several October polls asked respondents: “Regardless
of who you might support, who do you think is most likely to win the presidential election?”
The results from this question, a survey item that is consistently better at predicting election
outcomes than personal vote preference questions (Rothschild and Wolfers 2012), reliably
showed Obama winning by a margin ranging from 18% (Pew) to 30% (AP-GfK). Overall,
it seems that these less traditional predictions directly contradicted the results reported by
well known pollsters.
However, Romney’s fortunes changed dramatically in many final election polls. Despite
the fact that there were no major political scandals, events, or new revelations about the
candidates in the final week of the election, several major polling organizations reversed
course and predicted a tied race or even victory for Obama. For example, Gallup consistently
reported that Romney held a 5–7 point lead over Obama until its second-to-last poll. Then
Romney’s lead suddenly dwindled to a single point in the final poll. Gallup’s explanation is
that superstorm Sandy boosted support for the incumbent president (Newport 2012a).
But this explanation should invite skepticism: comparable four-point drops in support for
Romney did not occur among other published polls.2 Some political commentators suspect
that Gallup changed its likely voter weighting methodology so that its results more closely
resembled those of other polls (Silver 2012b, Blumenthal 2013).
1The IEM 2012 U.S. Presidential Election Markets are real-money futures markets where contract payoffs are determined by the popular vote cast in the 2012 U.S. Presidential Election. There are currently two markets: Pres12 VS, based on vote shares won by the two major party candidates in the 2012 U.S. Presidential election, and Pres12 WTA, a winner-take-all market based on the popular vote plurality winner of the 2012 U.S. Presidential election.
2In fact, Hurricane Sandy should have decreased the incumbent president's popularity. Katrina was a blow to Bush's approval rating in 2005, and research points out that natural disasters inflict a significant decrease in support for the incumbent (Achen and Bartels 2012).
The unjustifiable drop in support for Romney does not seem to reflect real changes in
public preferences. Rather, it appears to reflect Gallup’s strategic decision to avoid making
a prediction far from what other sources indicated. In the face of mounting evidence from
expectation-based polls, election prediction markets, and polling experts’ criticism (Blumen-
thal 2012, Silver 2012b), Gallup likely realized that it was overestimating public support for
Romney and then changed its polling methodology to reduce Romney’s lead. Several other
polling organizations likely followed suit. Since final presidential pre-election polls draw close
attention from the mass media and a wrongheaded prediction can damage a polling orga-
nization’s reputation, polling firms have strong incentives to adjust their final poll in ways
that reflect the consensus forecast. This kind of behavior often results in polls moving closer
to each other. And indeed, polls appear to have herded closer together in the final days
of the campaign. Careful observers of the 2012 pre-election polls (Silver 2012b, Linzer
2012) reported on this phenomenon.
These events suggest that the accuracy of pre-election polls may be worse than we might
expect. Since retrospective evaluations of pollster accuracy are often based on manipulated
final polls, overall evaluations based solely on predictive accuracy (e.g. Panagopoulos 2013)
could understate inaccuracy in the weeks before an election. Accordingly, commentators like
Nate Silver emphasize the importance of non-final polls in assessing poll accuracy (Silver
2012a). So the question we seek to answer is how to best measure accuracy across an entire
campaign.
In this paper, we propose a novel way to measure the ‘actual’ survey error of pre-election
polls. To overcome the aforementioned problems, we include not just final polls but also
other polls conducted in the months before an election in our analysis. To measure the
survey errors of non-final polls, we need an estimate of the underlying true support levels for
both candidates before the election, which are not known – unlike actual election results. We
resolve this challenge by adopting an advanced econometric model, dynamic factor analysis.
The dynamic factor model (DFM) estimates latent true values from correlated time series
with Kalman filtering in tandem with MLE or MCMC. Since a poll is composed of both the
true opinion value and survey error, different polling series from polling organizations are
very highly correlated with each other. On the other hand, individual polling series are also
serially correlated. By taking advantage of this double correlation in the pre-election polling
series, dynamic factor analysis models can quite precisely estimate the underlying trend in
support levels from multiple polling series. Although dynamic factor models have not been
applied in political science, they are widely used in economics to study business cycles, price
changes, the housing market and other topics.3
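The "double correlation" that the DFM exploits can be illustrated with a short simulation. The sketch below (pure Python; all numbers are invented for illustration, not estimates from our analysis) generates a serially correlated true-support series and three polling series that each add a constant house bias and independent survey noise; because every series observes the same latent path, the series also co-move strongly with one another.

```python
import random

random.seed(42)

# Hypothetical setup: true support follows a random walk, so each polling
# series is serially correlated; all series track the same latent path, so
# they are also correlated with each other.
T = 60                                    # days of polling
true = [50.0]
for _ in range(T - 1):
    true.append(true[-1] + random.gauss(0, 0.5))

biases = [1.0, -0.5, 0.2]                 # constant house effects (invented)
polls = [[t + b + random.gauss(0, 0.7) for t in true] for b in biases]

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Distinct polling houses co-move because they observe the same factor.
print(round(corr(polls[0], polls[1]), 2))
```

It is exactly this structure, cross-sectional correlation between houses plus serial correlation within each house, that the dynamic factor model inverts to recover the latent support series.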
Our DFM-based estimate of polling accuracy shows that 2012 pre-election polls moved
closer to each other about one week before the election, which suggests that herding occurred
around that time (Figure 1). Thus, it can be misleading to judge a polling organization’s
performance solely by its predictive accuracy. Our results also provide strong empirical
support for the total survey error framework, which describes the error properties of sample
survey statistics as incorporating a variety of error sources beyond sampling error (Groves
and Lyberg 2010; Weisberg 2005). Although the total survey error framework is conceptually
convincing, it suffers from a lack of empirical studies buttressing its theoretical arguments,
especially the claim that non-sampling errors are the largest part of total survey error. This
gap is the result of the practical difficulty of estimating the size of non-sampling error.4 To the best
of our knowledge, there are few empirical studies that attempt to measure total survey error,
and existing research is experimental, based on company-wide polls (Groves and Magilavy
1984; Assael and Keon 1982). We believe that our study is the first empirical research
to estimate the size of total survey error for national polls conducted by major polling
3The DFM can be estimated in both Stata and EViews without installing additional software.
4Several studies measure the size of a specific type of non-sampling error, including nonresponse error (Brehm 1993; Berinsky 2004), interviewer effects (Kish 1962; Groves and Magilavy 1986), and interview mode effects (Beland and St-Pierre 2008; Villanueva 2001).
organizations. Our results confirm the findings of experimental studies in the total survey
error literature: the size of non-sampling error far exceeds that of sampling error. More
concretely, we find that sampling error usually accounts for only 20 to 30% of total survey
error for most 2012 pre-election polls. Furthermore, a large proportion of non-sampling
error is attributable to the bias of survey errors, not to variance, as Groves and Magilavy
(1984) find in an experimental context. Our results show that early findings from the total
survey error tradition remain valid even for national polls. To achieve more accurate poll
estimates, therefore, the most important methodological goals should be to reduce non-
sampling errors, especially bias, instead of increasing sample size in an effort to decrease
sampling error. Gallup’s 3000 person sample size did not help it predict the 2012 election
results more accurately than other firms with much smaller samples.
2 Survey Errors and Pre-Election Polls
Several studies examine the accuracy of pre-election polls, many by comparing final polls
with actual election results (Crespi 1988; Erikson and Sigelman 1995; Moore and Saad
1997; Panagopoulos 2009; Martin, Traugott, and Kennedy 2005; Traugott 2001; Traugott
and Wlezien 2009). However, since this comparison involves only final polls, it fails to
assess the degree to which polling firms manipulated final results. Herding behavior is one
form of manipulation, in which pollsters change poll results to approximate those reported
by other firms. We also suspect that polling organizations engage in result calibration:
the process of adjusting poll results to take into account information from other sources,
including poll aggregators, journalists, and betting markets. Additionally, the comparison
between final polls and election results involves only one poll; it is not a reliable measure
of polling accuracy even in the absence of herding or calibration behavior. With just one
poll, we cannot distinguish between an accurate poll with low survey error and one that was
simply lucky.
For these reasons, scholars have tried to use non-final polls as well as final polls in
accuracy assessments. Some researchers aggregate polls conducted during the last several
months before the election and then compare the firm-wide average of the polls with the
actual results on Election Day (Lau 1994; DeSart and Holbrook 2003; Traugott 2001). We
argue that while this measure does capture more information about each polling firm, it
is conceptually more problematic: a poll conducted a few weeks before the election is not
supposed to represent public support on election day but rather, public support on that
particular polling day. A poll conducted a few weeks before the election should differ from the
actual election results and it ought to be compared with the actual latent public support on
that particular day.
In order to use non-final polls to assess poll accuracy, it is necessary to first estimate the
true level of public support for candidates in the weeks before the election. Several methods
have been proposed to estimate underlying true values. The first group of studies, which
use what is often called a “poll of polls,” adopts a straightforward approach to estimating
underlying true support. Arguing that a much larger sample size will provide a more precise
estimate of the underlying true value, those studies calculate an average level of support
using all polls conducted in the same time period and use the new average as an estimate of
the underlying true values. For example, Lau (1994) classifies the last month of polling in the
1992 presidential election into four sub-groups according to their polling week. Excluding
the poll whose accuracy is being judged, he calculates the average of other polls, weighted by
sample size, in the same week and uses the average as an estimate of the latent true support
levels for the week. Poll accuracy is measured by how close a poll is to the average of other
polls.5 The more distant a poll is from the average, the less accurate it is considered to be.
5The poll whose accuracy is judged is usually excluded. If we include the poll in estimating the average, the poll will be part of the "reality" against which it is judged, which may lead to a dependency problem.
This is how Real Clear Politics calculates its RCP average.
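The leave-one-out averaging scheme described above can be sketched in a few lines. The sketch below is a minimal illustration of Lau's (1994) procedure, not his actual implementation; the firm names, support levels, and sample sizes are invented for the example.

```python
# Minimal sketch of the leave-one-out, sample-size-weighted "poll of polls"
# accuracy measure (after Lau 1994). All poll values below are hypothetical.
polls = [
    ("Firm A", 48.0, 1000),   # (firm, % support for a candidate, sample size)
    ("Firm B", 50.0, 1500),
    ("Firm C", 46.0, 500),
]

def loo_accuracy(polls):
    """For each poll, distance from the sample-size-weighted average of the
    *other* polls in the same week (smaller = judged more accurate)."""
    out = {}
    for i, (firm, support, n) in enumerate(polls):
        others = [p for j, p in enumerate(polls) if j != i]
        wavg = sum(s * m for _, s, m in others) / sum(m for _, _, m in others)
        out[firm] = abs(support - wavg)
    return out

print(loo_accuracy(polls))
```

Note that Firm C, the outlier, is automatically judged least accurate here even if it happens to be closest to the truth, which is exactly the weakness discussed below when the industry as a whole is biased.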
We argue that this method is ineffective. Most importantly, there is no theoretical justi-
fication for why the poll of polls should be considered an unbiased and consistent estimate of
underlying true candidate support. Practically, when individual polls are collectively biased
(as they were in 2012), the poll of polls will be biased. In those cases, a poll that is close to
the true value but distant from many other, biased polls would actually be judged as more
inaccurate. And indeed, there is consistent evidence for polling industry bias: the 1996 pre-
election polls broadly overestimated public support for Bill Clinton (Ladd 1996; Mitofsky
1998; Silver 2012c). In the 2000 election, out of nineteen final pre-election poll estimates,
fourteen predicted victory for Bush, three a tie, and only two victory for Gore (Traugott
2001). Seventeen out of 23 final polls in the 2008 presidential election overestimated Obama's lead
while only three final polls overestimated McCain's support (Panagopoulos 2009). In the 2008
New Hampshire Democratic primary, every poll wrongly predicted Barack Obama's victory
over Hillary Clinton by 1 to 13 percentage points: Hillary Clinton won the actual vote by
3% (Traugott and Wlezien 2009).6
At the same time, the poll of polls method implicitly assumes that survey error can be
reduced to sampling error; accordingly, a larger sample size will provide a more accurate es-
timate of underlying true support. However, there are a variety of important non-sampling
errors that do not decrease as sample sizes increase. If non-sampling error is larger than sam-
pling error, any gains from the increased sample size in an aggregated poll will be marginal.
The total survey error framework emphasizes that non-sampling errors are larger and po-
tentially more problematic than sampling errors, and our analysis shows that sampling error
accounts for just 20 to 30% of total survey error in many 2012 pre-election polls. Empirical
6This wrong prediction prompted AAPOR to appoint a committee to review the performance of the polls. Traugott and Wlezien (2009) show that the problems in New Hampshire were not unique; the pre-election polls as a group generally underestimated the winner's share of the vote for the two leading candidates in the week leading up to each election.
studies of pre-election polls also consistently find that sample size is not significantly related
to predictive accuracy (Pickup et al. 2011, Arzheimer and Evans 2014). The 2012 election
is no exception. For example, Gallup’s sample sizes for pre-election polls in 2012 were two
or three times larger than those of other pollsters, but its performance was the worst in
terms of predictive accuracy (Panagopoulos 2013). Our analysis also shows that in the 2012
pre-election polls, sample size is negatively related to predictive accuracy: the larger the
sample size, the less accurate the poll (Table 3). Thus, it is very unlikely that a poll of polls will
be closer to underlying true levels of support than individual polls.
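A back-of-the-envelope calculation makes this point concrete. The sketch below assumes, purely for illustration, that non-sampling sources contribute a fixed 2-point standard error that no increase in sample size can reduce; the sample sizes are chosen to mirror the Gallup comparison above.

```python
import math

def sampling_se(n, p=0.5):
    """Sampling standard error of a proportion, in percentage points."""
    return 100 * math.sqrt(p * (1 - p) / n)

# Illustrative assumption: non-sampling error adds a fixed 2-point standard
# error regardless of n.
NON_SAMPLING = 2.0

for n in (1000, 3000):
    total = math.sqrt(sampling_se(n) ** 2 + NON_SAMPLING ** 2)
    print(n, round(sampling_se(n), 2), round(total, 2))
```

Under these assumed numbers, tripling the sample cuts the sampling standard error by a factor of √3 (from about 1.58 to 0.91 points) but shrinks the total error by only about 14 percent, which is why aggregating to a huge effective sample helps so little when non-sampling error dominates.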
Smoothing methods provide a similar but more advanced strategy for estimating underlying true opinion from non-final polls. Since a smoothed estimate is a "weighted" average
of adjacent polls, however, this smoothing method suffers from the same problems that we
identified for the poll of polls method. Furthermore, the smoothing method gives less weight
to outliers since it assumes that outlier polls are poorly conducted and that an estimate
would be closer to the underlying true values without them. When there is polling industry
bias (as in the 2012 pre-election polls), however, outliers may be the most correct polls. Thus,
a purported advantage of this method, robustness to outliers, is in fact a critical weakness
in the presence of polling industry bias.
Neither a poll of polls nor poll smoothing is an effective way to estimate underlying
true opinion because each lacks theoretical justification, ignores substantive non-sampling
error, and does not adjust for the prevalence of polling industry bias in pre-election polls.7
The most advanced existing approach to measuring pre-election poll accuracy is to pool
polls over an entire electoral campaign and apply Kalman filtering (Jackman 2005, Pickup
and Johnson 2008). Unlike other smoothing methods, Kalman filtering is theoretically proven
7Polling industry bias is not a phenomenon confined to the US; it is frequently reported in other countries' presidential elections. For example, South Korean polling companies were collectively biased in favor of the liberal candidate in the 2012 presidential election, while Egyptian polls showed bias against the Islamist candidate, Morsi.
to provide an unbiased and consistent estimate of the underlying true value from an observation with measurement error. Since pre-election polls are 'noisy signals' in the sense that
the true values are observed with survey error, the pooled estimate performs well. When
Jackman’s (2005) approach is adopted, house effects are determined ex post, and as a result,
estimated true opinion may be robust to polling industry bias. However, this application
of the Kalman filter requires a priori knowledge of measurement error. In engineering and the
natural sciences, measurement error is already known, either through lab experiments or
predefined theoretical expectations. In polling, however, survey error is always unknown a
priori. For this reason, Jackman (2005) and other scholars replace the size of total survey
error with that of sampling error, which can be calculated from sample size before estimating
the model. As a result, the success of this application of Kalman filtering is predicated on the
degree to which polling accuracy is determined by sampling error, a problematic assumption.
The total survey error framework identifies many possible sources of error which influence
polling accuracy that are unrelated to sampling error. Furthermore, empirical research on
pre-election polls, especially the 2012 polls (Panagopoulos 2013), consistently shows that the
size of survey error is not related to sample size. In his extensive examination of pre-election
polling accuracy, Crespi (1988) found that sample size has trivial effects on polling accuracy:
"Once basic sample size requirements are met, increasing the sample size may make less of a
contribution to poll accuracy than other aspects of poll methodology" (64).
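To see where the a priori measurement-error variance enters, consider a minimal univariate Kalman filter for a local-level model. This is a simplified sketch, not the full pooling model of Jackman (2005): the point is only that the filter cannot run without a value for R, and substituting the sampling variance for R understates total survey error. All numbers are hypothetical.

```python
import random

def kalman_filter(obs, R, Q=0.05, x0=50.0, P0=25.0):
    """Univariate Kalman filter for a local-level model. R, the
    measurement-error variance, must be supplied in advance; in the pooling
    approach it is set to the poll's sampling variance."""
    x, P, path = x0, P0, []
    for y in obs:
        P += Q                  # predict: state uncertainty grows by Q
        K = P / (P + R)         # Kalman gain
        x += K * (y - x)        # update toward the observation
        P *= (1 - K)            # posterior variance shrinks
        path.append(x)
    return path

random.seed(7)
# Hypothetical daily polls scattered around a constant true support of 50%
obs = [50 + random.gauss(0, 1.5) for _ in range(30)]
est = kalman_filter(obs, R=1.5 ** 2)
print(round(est[-1], 1))
```

If the true survey error were, say, twice the sampling standard error, the R passed in above would be four times too small, and the filter would chase noise it should be smoothing away.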
For this reason, we introduce a “multivariate” Kalman filtering method which does not
require a priori knowledge of measurement error: dynamic factor analysis. The dynamic
factor model estimates the measurement error (survey error) from data with MLE or MCMC
rather than assuming that it is given. Although this model is not frequently used in political
science, dynamic factor analysis is an established method that has been widely used in
macroeconomics and finance since the early 1980s. We describe the approach in detail in the
next section.
3 Dynamic Factor Model
The dynamic factor model (DFM) is a multivariate time series approach comparable to
vector autoregression (VAR) or error correction models (ECM). However, not all multivariate time series are suitable for dynamic factor modeling. DFM requires two or more time
series that show co-movement over time. DFM also assumes that the observed co-movement
of the time series is driven by an underlying common factor or factors. However, each
time series is also moved by idiosyncratic disturbances arising from measurement error and
special features specific to each series. The goal of DFM is to estimate the underlying com-
mon factors from multiple correlated time series with disturbances. When the distribution
of idiosyncratic disturbances in the time series behaves reasonably well, we can obtain unbi-
ased and consistent estimates of the underlying factor(s). When idiosyncratic disturbances
approximate a Gaussian distribution, we use Kalman filtering in tandem with MLE to obtain
quite efficient estimates, even though the size of measurement error is not known a priori.
The dynamic factor model can be written as a linear state space model:8
Yt = α + γ(L) ft + et , where et ∼ N(0, σ²e)
ft = Ψ(L) ft−1 + ηt , where ηt ∼ N(0, σ²η)
Where Yt is a vector of observations, α is a vector of deterministic intercepts, ft is a vector
of common factors, and et is a vector of idiosyncratic disturbances. In the application we
develop, a poll estimate Yt is the linear combination of the bias from a polling organization,
α, an underlying true value (factor) at time t, ft, and a random error at time t, et.
Poll biases are assumed to be constants and random errors are assumed to have a Gaussian
distribution. Where there are N polling series, Yt, α, and et are each N×1 vectors.
8This is a parametric representation of the dynamic factor model based on Stock and Watson's notation. Non-parametric and semi-parametric representations, which are often used in principal component analysis, look quite different. Refer to Stock and Watson (2012) for details of different dynamic factor models and estimation procedures.
Where there are q dynamic factors, ft and ηt in the transition equation are q×1
vectors. γ(L) is a vector of factor loadings, L is the lag operator, and the lag polynomial
matrices γ(L) and Ψ(L) are N×q and q×q matrices, respectively. ηt is another idiosyncratic
disturbance term with a Gaussian distribution.9 The idiosyncratic disturbance terms in the
observation equation and in the transition equation are not correlated, even between lagged
or lead terms: E(et, ηt−k) = 0 for all k. Finally, idiosyncratic disturbances in the observation
equation are not correlated with each other: E(eit, ejs) = 0 for all s if i ≠ j.
When MLE is used in tandem with the Kalman filter, MLE estimates the parameters of
the state-space model, and the Kalman filter is used to obtain efficient estimates of latent
factors (the “unobserved state”). Unlike the pooling method, DFM does not pool different
polls nor treat them as a univariate time series. Instead, the DFM treats each poll as a time
series and recovers an estimate of underlying true public opinion from the high correlation
between polling series. The DFM is also different from the pooling method in the sense that
it allows the underlying true values (here, factors) to take the form of an AR(p) process and
to be stationary (among many others, Stock and Watson 1999 and 2001) or non-stationary
(Chang, Miller, and Park 2009).
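The intuition behind this recovery can be sketched without any estimation machinery. The simulation below generates data from the state-space structure above with a single AR(1) factor and unit loadings (simplifying assumptions for illustration; actual DFM estimation uses MLE with the Kalman filter), then recovers the factor crudely by demeaning each series, which strips the constant house bias, and averaging across series, which shrinks the idiosyncratic noise.

```python
import random

random.seed(3)
T, N = 100, 5
phi = 0.9   # AR(1) coefficient of the factor (hypothetical)

# Simulate: y_it = alpha_i + f_t + e_it,  f_t = phi * f_{t-1} + eta_t
f = [0.0]
for _ in range(T - 1):
    f.append(phi * f[-1] + random.gauss(0, 0.5))

alphas = [random.uniform(-1, 1) for _ in range(N)]       # house biases
series = [[a + ft + random.gauss(0, 0.7) for ft in f] for a in alphas]

# Crude factor recovery: demean each series, then average across series.
demeaned = [[y - sum(s) / T for y in s] for s in series]
f_hat = [sum(col) / N for col in zip(*demeaned)]

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

print(round(corr(f, f_hat), 2))
```

The recovered series tracks the true factor closely because the idiosyncratic noise averages out across houses; the DFM formalizes this idea while also estimating the loadings, the AR dynamics, and each series' error variance.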
The DFM has been widely used in macroeconomics and finance. For example, Sargent
and Sims (1977) show that two factors can explain a large fraction of fluctuations in important U.S. quarterly macroeconomic indexes. Marcellino, Stock, and Watson (2003) and
Hamilton (1989) used the DFM to estimate the underlying business cycle from macroeco-
nomic indexes. Ng and Moench (2009) and Stock and Watson (2010) apply the DFM to
analysis of regional or state housing markets to see how much of the price dynamics of those
housing markets can be explained by the national economy, the common factor.
9The assumption that the two disturbance terms have Gaussian distributions is not necessary. However, the assumption eases the burden of estimation. Furthermore, in the context of polling, this assumption makes sense: both sampling and non-sampling error are measurement errors. So, we assume that the two disturbance terms have Gaussian distributions.
We argue that the analysis of a series of polls is a better application of dynamic factor
modeling than are macroeconomic time series. First, each poll has a simple structure, con-
sisting of both true public opinion and survey error. This is an ideal structure for DFM as
it mirrors the model’s structure, including an underlying factor and measurement error. In
most macroeconomic applications, economic variables are rarely a combination of a factor and
measurement error. There are many other specific variables affecting only a certain economic time series, which makes estimation more challenging. At the same time, the number
of factors (underlying economic forces) that drive the business cycle, housing prices, or other
macroeconomic fluctuations is often unknown a priori, although we might expect that those
are strongly affected by some common economic forces. Thus, identifying the number of
common factors is an important problem in macroeconomic scholarship (Bai and Ng 2002).
In our polling application, however, we can safely assume only one common factor since an
opinion poll is composed of true public opinion and survey error.10 Furthermore, in macroe-
conomics, the distribution of the idiosyncratic disturbance term is not known a priori, or
even a posteriori in many cases, and a Gaussian distribution may be inappropriate. In fact,
the idiosyncratic disturbance term et often has a non-Gaussian or even non-standard distri-
bution, which makes estimation very challenging, if not practically impossible. In the case
of polls, the idiosyncratic term should theoretically have a Gaussian distribution because it
represents the random component of survey error.
Although our representation of polls in the state-space form may appear to be similar
to models used in existing scholarship (e.g. Jackman 2005, Pickup and Johnson 2008), in fact, it is
very different. In the previous studies, random errors (idiosyncratic disturbances), et , are
considered to be purely sampling errors:
poll = bias(house effect, systematic error) + true value + random error (sampling error)
10Survey error is composed of systematic error, which is constant (deterministic), and random error. The idiosyncratic disturbance term is equal to the random error.
In contrast, we adopt the total survey error (TSE) framework, in which random error is
distinct from sampling error. In our model, the random error et also contains non-sampling
error.
poll = bias(systematic error) + true value + random error (sampling error + non-sampling error)
By writing the model like this, we implicitly reject the view that house effects make
a poll biased but not necessarily more volatile. As the TSE framework suggests, house
effects have an influence on the variance of survey errors. Thus results from some polling
houses should fluctuate more than others, even if they have the same sample size. Polling
houses use different modes of interview, question wording, and weighting procedures. The
differences between these non-sampling error sources surely influence both the variance of
survey errors and poll biases. For example, Biemer (2010) shows that measurement error
inflates the random error of a survey.11 Thus, the TSE framework assumes that polling
houses will experience different random errors, due to their unique methods for adjusting to
non-sampling errors, even if they have exactly the same sample size. This is one of the reasons
why we object to the pooling method, which assumes that random error is solely determined
by sample size, and favor DFM, which directly estimates the size of random error.
4 Data and Analysis
4.1 Data Sources and Accuracy Measure A
Twenty-nine organizations and teams released final polls that were conducted during
the last week of the 2012 presidential campaign, while more than ninety conducted at least
one pre-election poll for the presidential race. Not all of these polls are the subject of
our analysis, however: we include only polls that meet several criteria. First we analyze
11Biemer (2010) identifies five different non-sampling errors which theoretically affect the variance of survey error.
polls from reputable, well-known polling organizations (Silver 2012a). We focus on polls
conducted during the last two months of the campaign period, when polls are conducted on
a regular basis. We also restrict our analysis to organizations that conducted at least seven
polls in September, October, and November 2012. This standard is somewhat arbitrary, but
we believe that seven polls is the minimum number necessary to capture the distribution of
survey errors for each firm. In total, eleven polling firms meet these standards: ABC/WP,
ARG, DailyKos/SEIU/PPP, Gallup, GWU/Politico Battleground, IBD/TIPP, Ipsos, Rand,
Rasmussen, UPI/CVOTER, and YouGov/Economist.
The simplest way to evaluate the survey errors of final polls is to compare the difference
in poll estimates with that of actual election results. However, the process of comparing per-
centage point differences is problematic; polls include various numbers of undecided voters
(Martin, Traugott, Kennedy 2005). To address that problem, we use Martin, Traugott, and
Kennedy’s predictive accuracy measure, A.
A = ln[(Polled Support for Romney / Polled Support for Obama) ÷ (Actual Support for Romney / Actual Support for Obama)]

This measure first calculates poll odds: the ratio of the two candidates' estimated support in
the final poll. A value of 1 indicates a tie between Romney and Obama, values greater than
1 suggest Romney leading in the poll, and values smaller than 1 indicate a lead for Obama.
Then the poll odds are divided by the actual odds (vote share), generating the odds ratio.
Lastly, the natural logarithm of the odds ratio produces the measure A. Positive values of the
measure A indicate polling bias toward the Republican candidate and negative values show bias
in favor of the Democratic candidate. Although A is designed to assess predictive accuracy by
comparing final polls and actual results, it can also be used for non-final polls when we have
estimates of actual values.12
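As a concrete sketch, the measure A takes only a few lines of Python. The poll figures below are hypothetical, chosen only to illustrate the calculation; the actual vote shares (Obama 51.1%, Romney 47.2%) are the 2012 national popular vote.

```python
import math

def accuracy_A(poll_rep, poll_dem, actual_rep, actual_dem):
    """Martin, Traugott, and Kennedy's accuracy measure A:
    the natural log of (poll odds / actual odds)."""
    poll_odds = poll_rep / poll_dem        # Romney support / Obama support in the poll
    actual_odds = actual_rep / actual_dem  # Romney share / Obama share in the vote
    return math.log(poll_odds / actual_odds)

# Hypothetical final poll (Romney 49, Obama 48) against the actual
# 2012 popular vote shares (Romney 47.2, Obama 51.1)
a = accuracy_A(49, 48, 47.2, 51.1)
print(round(a, 3))  # positive: a pro-Republican lean of about 0.1
```

A poll showing the candidates tied when the vote is also tied yields A = 0, matching the interpretation above.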
12 Arzheimer and Evans (2013) adopt the predictive accuracy measure A and apply it to multi-candidate elections.
Figure 1 shows the poll odds: Romney’s support divided by Obama’s support. The eleven
polls are represented with different line types and the DFM estimate of the underlying actual
ratio appears as a bold line.
Figure 1 about here
The figure shows that Romney’s support level decreased sharply in the aftermath of
the Democratic National Convention, which reconfirms existing evidence of a ‘convention
bounce’ (Gelman and King 1993; Holbrook 1996; Zaller 2002; Silver 2012d). Romney gained
popularity around October 5th, presumably as a result of his performance in the first pres-
idential debate on October 3rd. The majority of the polls, especially phone-based surveys,
indicate that Romney held a lead over Obama at least until late October. However, the
DFM estimate shows that Obama never trailed Romney at any time in October. Thus,
most polling organizations (excluding internet polls) overestimated support for Romney.
4.2 Poll Performance: Does Mode Matter?
With DFM-estimated actual values, we calculate the measure A for non-final polls (Fig-
ure 2) to see how well polls tracked variation in candidate support throughout the campaign
period. Poll performance can be measured in two different ways. The simpler approach
compares the average absolute difference of the measure A for each poll with the estimated
actual values. These values are reported in the fifth column of Table 1. We find that Rand
and YouGov/Economist fared relatively well and Rasmussen and Gallup did very poorly in
tracking public opinion. When bias exists in a poll, however, scholars in the TSE tradition
recommend mean squared error (MSE) as an alternative accuracy measure. MSE consists
of squared bias plus variance. When we use MSE, Gallup performs worst, closely followed
by Rasmussen. On the other hand, Rand and YouGov/Economist are tied for the top spot.
Both measures indicate that the Rand and YouGov/Economist polls are the best performers
overall while Rasmussen and Gallup are the worst.
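The MSE decomposition used above is simple bookkeeping: MSE equals squared bias plus variance. A short sketch using the Gallup and Rasmussen bias and standard-deviation figures reported in Table 1 reproduces the MSE and bias-share columns:

```python
def mse_decomposition(bias, sd):
    """Total survey error bookkeeping: MSE = bias^2 + variance."""
    mse = bias ** 2 + sd ** 2
    return mse, bias ** 2 / mse  # (MSE, share of MSE due to bias)

# Bias and standard deviation of the measure A, from Table 1
mse_gallup, share_gallup = mse_decomposition(0.082, 0.075)
mse_rasmussen, share_rasmussen = mse_decomposition(0.105, 0.029)

print(f"Gallup:    MSE = {mse_gallup:.4f}, bias share = {share_gallup:.0%}")
print(f"Rasmussen: MSE = {mse_rasmussen:.4f}, bias share = {share_rasmussen:.0%}")
# Gallup:    MSE = 0.0123, bias share = 54%
# Rasmussen: MSE = 0.0119, bias share = 93%
```

Note how the two firms reach similar MSEs through different routes: Gallup through a roughly even split of bias and variance, Rasmussen almost entirely through bias.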
Figure 2 about here
On the whole, internet surveys fared very well in the 2012 presidential election: the
three best-performing polls (Rand, YouGov/Economist, and Ipsos/Reuters) all drew online
samples. The average absolute difference of the three internet polls is 0.032, which is much
smaller than 0.074, the average for the six interviewer-assisted phone surveys, and 0.084 for
the two IVR polls. This finding contrasts sharply with Panagopoulos’ (2009) analysis of poll
accuracy in the 2008 presidential election, in which IVR polls fared best and internet polls
worst. In an analysis of the 2004 presidential election, Traugott (2005, p. 645) shows that
Harris Interactive’s online poll was quite inaccurate, especially when compared to a phone
survey from the same polling organization in the same period. Our contradictory results for
2012 should invite further research on internet polls: the question of whether or not they
will continue to outperform phone surveys in the future remains unresolved. The success of
internet polls in 2012 should also renew interest in non-probability polling, which was widely
condemned in the aftermath of the 1936 election’s Literary Digest poll and has fallen into
disfavor in much of the traditional polling world. New research (Wang, Rothschild, Goel, and
Gelman 2013) shows that non-probability polls based on samples drawn from Xbox video
gamers can be as accurate as traditional phone-based surveys with the joint use of multi-level
modeling and careful post-stratification. In the era of extremely high non-response rates and
increasing numbers of young people without landline phones, the probability poll may not
be as random as it should be. However, we refrain from definitive conclusions about the
success of internet polling and its future.
4.3 Determinants of Survey Error
The fact that the final polls in 2012 were biased in favor of Romney has become common
knowledge (Panagopoulos 2013; Silver 2012a; Blumenthal 2013). We find that non-final
polls are also biased in favor of Romney. Table 1 shows that almost all of the 243 polls
in our sample have positive values for A. That is, Romney's popularity was consistently
overestimated for the last two months of the campaign period. We find that non-final polls
were slightly more biased than final ones: the average size of bias for the eleven polling
organizations we examine is 0.064 in terms of A, while Table 2 shows that the average bias
of the twenty-nine final polls is 0.057 and, for our subsample of nine, 0.056.
More importantly, we find that the size of a poll's bias is noticeably larger than its
standard deviation, except for IBD/TIPP and Rand (see Table 1). On average, bias accounts
for 65% of the MSE of the eleven polls. Thus, we confirm the expectation, from the total
survey error approach, that house bias has a larger effect on poll accuracy than sampling
error across nationwide polls.
Table 1 about here
We also believe that even the remaining 35% of MSE, which comes from variance, is not
significantly determined by sampling error, because the observed standard deviation does
not decrease with larger sample sizes (Table 1). Rather, in our sample, sample size has a
positive correlation of 0.58 with the observed standard deviation. That is, a poll with a
large sample size is more likely to have larger variance. Thus, the variance of polls appears
to be dominated largely by non-sampling errors, just as TSE research suggests.
Accordingly, neither polling accuracy nor precision is related to sample size, casting
doubt on approaches that use a poll of polls or the pooling method. Indeed, we observe a
link between larger sample sizes and decreased accuracy. The correlation between a polling
organization's average sample size and its absolute difference in the measure A in Table 1
is 0.5 (N=11). The more respondents a polling organization interviews, the larger its standard
deviation. Even across our 243 individual polls, sample size is positively related to polling
inaccuracy (Table 3).
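The firm-level correlation can be checked directly from the Table 1 columns (average sample size and average absolute difference in A); a minimal sketch, computing the Pearson coefficient from first principles:

```python
def pearson(x, y):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Average sample size and average absolute difference in A
# for the eleven polling firms, as reported in Table 1
size  = [1240, 1150, 1240, 2900, 880, 1400, 1000, 1000, 1500, 1320, 800]
abs_a = [0.071, 0.084, 0.063, 0.087, 0.048, 0.036, 0.071, 0.029, 0.105, 0.080, 0.030]

print(round(pearson(size, abs_a), 2))  # 0.5: larger samples track larger errors
```

The positive coefficient (roughly 0.5, driven in part by Gallup's large samples and large errors) matches the figure reported in the text.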
To put this finding into context, consider evidence from the 2012 CCES. That study,
which was conducted by YouGov (separately from its work with The Economist) from October
31 to November 3, 2012, included 36,472 respondents. In the CCES, 49% of respondents
said that they intended to vote for Obama and 47% for Romney. This result is exactly
the same as that of a YouGov/Economist poll conducted from November 3 to November 5 with just 740
respondents. As there were no major political or economic shocks between October 31 and
November 5, the polls should show the same results, and indeed, they do. Thus, it seems that
very large sample sizes do not dramatically increase polling accuracy: YouGov/Economist
regularly used a sample of about 800 respondents but performed extremely well throughout
the 2012 election.
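A back-of-the-envelope check makes the point. Under an assumption of simple random sampling, the CCES's 36,472 respondents should shrink the pure sampling margin of error roughly sevenfold relative to a 740-person poll, yet the two surveys produced identical topline numbers:

```python
import math

def sampling_moe(n, p=0.5, z=1.96):
    """95% margin of error for a proportion under simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)

print(f"n = 36,472: +/-{sampling_moe(36472):.1%}")  # about +/-0.5 points
print(f"n = 740:    +/-{sampling_moe(740):.1%}")    # about +/-3.6 points
```

If sampling error dominated total error, the seven-fold difference in theoretical precision should be visible in the results; that it is not is consistent with the TSE view that non-sampling error dominates.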
Table 2 about here
Table 3 about here
4.4 Poll Manipulation: Herding and Calibrating
As an election nears, the differences between polls tend to decrease. This phenomenon,
often called “herding,” involves polling organizations adjusting their estimates or methodol-
ogy at the end of the campaign so that their predictions are closer to those made by other
firms (Silver 2012a).13 Linzer (2012) shows that herding appears to have occurred in the
2012 election and that herding around the wrong numbers could lead to polling industry
bias. We find evidence that herding did occur in 2012, especially for polling organizations
with inaccurate pre-election polls. We also suggest that the term ‘calibration’ – rather than
herding – offers a more precise conceptual description of contemporary poll manipulation.
13 Two possible explanations for the improved polling accuracy, or the convergence at the end of the campaign, are increased sample sizes and more stabilized voting intentions at the end of the campaign. Polling organizations often increase their sample sizes at the end of the campaign to provide more precise results. However, no polling organization other than UPI/CVOTER substantially increased its sample size. We have already shown that an increased sample size is unlikely to increase the accuracy of a poll. If convergence among polls occurs because of more stabilized public opinion at the end of the campaign, the convergence should occur more gradually. However, Figure 1 shows that the convergence was sudden and restricted to the last ten days of polling.
Our analysis of eleven polls shows that some organizations engaged in herding behavior
at the end of the 2012 election, but not all firms appear to have adjusted their results. The
average absolute difference between non-final polls and the actual election result is .064, but
the difference between final polls and the election result is .056, just .008 less. Thus, the
extent of industry-wide herding behavior in 2012 appears slight.
However, we suspect that some polling organizations were more likely than others to
manipulate their estimates and herd toward their competitors. ABC/Post’s final poll has
an A value of 0.018, which puts the organization in fifth place among the 29 final polls for
predictive accuracy. However, the average absolute difference of its eleven non-final polls is
0.071, which is about four times larger. Of the eleven polls we examine across the election,
ABC/Post’s performance ranks fifth from the worst overall. The predictive accuracy of
DailyKos/SEIU/PPP is ranked 10th among final polls with a 0.039 difference. However,
the difference for non-final polls conducted by DailyKos/SEIU/PPP is 0.063. Both of these
pollsters appear to have improved dramatically in their final polls.
Gallup also appears to have participated in herding behavior in 2012. On October 10th,
Gallup officially announced a change in its polling method for "theoretical reasons" (Newport
2012b). However, even after the official methodology change, the firm reported a
dramatic 4-point drop in Romney's lead during the last week of the campaign, moving its
estimates much more in line with other polls. We can identify no event that would justify such a
sudden drop in support. Gallup's executives (Newport 2012a) insisted that Hurricane Sandy
boosted support for the incumbent and weakened Romney's lead, but research suggests that
natural disasters should decrease the incumbent candidate's popularity (Achen and Bartels
2012). Indeed, our DFM-based estimate of actual public opinion shows that support for
Obama slipped slightly in the last week, perhaps due to Sandy.
Figure 3 about here
We represent herding behavior among these three polling firms graphically in Figure 3.
The figure also strongly suggests that UPI/CVOTER intentionally manipulated its results,
though probably somewhat earlier than the three polling firms.
We find that herding behavior was only adopted by a subset of major firms: the four
polling organizations that fared very poorly during the full campaign period (Figure 3).
In sharp contrast, none of the six best-performing polling firms show any signs of herd-
ing. Though almost all polling groups overestimated public support for Romney in 2012,
worse performing polls were especially likely to overstate Romney’s support, while better
performing polls were relatively less likely to do the same. If herding occurs only because
of polling firms’ desire to report results that are in line with others, worse performing and
better performing firms should have herded to the average of all estimates. Although the
term “herding” simply connotes group behavior, in this application herding is most likely to
lead pollsters with extreme results to move to the center. That is, more accurate polls should
have biased their estimated support in Romney’s favor, while less accurate polls should have
decreased support for Romney. However, only less accurate polls decreased support for
Romney as the election approached.
If herding occurs only because polling firms do not want to be singled out for making
the wrong prediction or to provide estimates that are in line with others, firms should herd
to the average of all estimates. Although the term herding simply connotes group behavior,
this kind of manipulation is most likely lead pollsters with extreme results to move to the
center. That is, more accurate polls should have biased their estimated support in Romney’s
favor, while less accurate polls should have decreased support for Romney. However, we do
not observe such a pattern in the data.
This suggests that poorly performing pollsters probably knew that they were overesti-
mating support for Romney and that more accurate pollsters were more confident that their
poll estimates were close to actual public opinion. As a result, more accurate pollsters may
have declined to adjust their models. This could not have occurred if polls had taken cues
just from other national polls – which were collectively biased in favor of Romney. For this
reason, we believe that the polling organizations calibrated their results according to cues
from other information sources, including electoral prediction betting markets, poll aggre-
gators, published criticism of Gallup’s likely voter model, state-level polls showing Obama’s
solid lead in most battleground states, and other informal sources of information. In major
online prediction markets, Romney’s chances were much lower than those reported by polls.
In the aftermath of the first presidential debate, Romney first attained majority support in
nationwide polls, but he did not actually take the lead in most statewide polls (Silver 2012a).
Surprisingly, many of the state-level polls were conducted by the same polling organizations
that conducted the national polls, which told a completely different story. Therefore, we
believe that polling firms had strong incentives to resolve these conflicting results by taking
cues from other sources of information.
We prefer to use the term ‘calibration’ to describe this kind of behavior, which is not
simply an automatic process of herding toward the average of other published poll results.
While herding is a form of calibration behavior, our evidence suggests that polling firms
adjusted their results in a more nuanced and strategic way.
5 Conclusion
Pre-election polls are more than simply predictive tools; the numbers they provide shape
political debate, galvanize supporters, and inspire political behavior. But these polls are
subject to survey error and intentional manipulation, each of which threatens to undermine
polls’ democratic functions. In this paper, we show that poll accuracy should be assessed
throughout a campaign, not just at its end. Using dynamic factor analysis, we estimated
true presidential candidate support in the two months before an election and identified
which polls accurately represented that support. We distinguished between sampling error
and other kinds of survey error, demonstrating the limited usefulness of ever-larger samples.
We build upon previous scholarship by providing a better way of estimating true support
and by showing how non-sampling error affects poll results. We also argue that polling firms
are more strategic actors than accounts of herding behavior imply. Instead, a subset of less
successful polling organizations appear to have calibrated their results to conform with a
variety of additional information sources beyond rival polls.
Our findings have implications for both practitioners and political scientists. First, poll
accuracy should not be assessed using only final polls, and assessment strategies like smooth-
ing and aggregating polls by week are potentially misleading approaches. Assessing accuracy
throughout the length of an election campaign is the best way to evaluate the performance
of election forecasters. Second, we provide evidence that non-probability based internet polls
outperformed traditional RDD polls for the first time in recent presidential elections. Future
research should more directly contrast the approaches of internet pollsters; sample matching
and post-stratification could introduce a variety of new kinds of errors beyond those identi-
fied in the total survey error framework. Finally, we showed that increasing sample size is
an ineffective way of improving poll accuracy; indeed, larger samples were associated with
worse performance. As polling firms prepare for upcoming electoral contests, we hope that
they will provide more accurate forecasts than they did in 2012.
References
[1] Achen, Christopher H., and Larry M. Bartels. 2012. "Blind Retrospection: Why Shark Attacks are Bad for Democracy." Vanderbilt University working paper, retrieved from https://my.vanderbilt.edu/larrybartels/files/2011/12/CSDI_WP_05-2013.pdf

[2] Arzheimer, Kai, and Jocelyn Evans. 2014. "A New Multinomial Accuracy Measure for Polling Bias." Political Analysis 22.1: 31-44.

[3] Assael, Henry, and John Keon. 1982. "Nonsampling vs. Sampling Errors in Survey Research." The Journal of Marketing: 114-123.

[4] Bai, Jushan, and Serena Ng. 2002. "Determining the Number of Factors in Approximate Factor Models." Econometrica 70:191-221.

[5] Beland, Yves, and Martin St-Pierre. 2008. "Mode Effects in the Canadian Community Health Survey: A Comparison of CATI and CAPI." In Advances in Telephone Survey Methodology, eds. James Lepkowski, Clyde Tucker, Michael Brick, Edith de Leeuw, Lilli Japec, Paul Lavrakas, Michael Link, and Roberta Sangster. New York: Wiley, 297-314.

[6] Berinsky, Adam J. 2004. Silent Voices: Public Opinion and Political Participation in America. Princeton, NJ: Princeton University Press.

[7] Biemer, Paul. 2010. "Total Survey Error: Design, Implementation, and Evaluation." Public Opinion Quarterly 74(5): 817-848.

[8] Brehm, John. 1993. The Phantom Respondents. Ann Arbor: University of Michigan Press.

[9] Blumenthal, Mark. 2012. "Race Matters: Why Gallup Poll Finds Less Support For President Obama." The Huffington Post, Jun. 7. Retrieved from http://www.huffingtonpost.com/2012/06/17/gallup-poll-race-barack-obama_n_1589937.html

[10] Blumenthal, Mark. 2013. "Gallup Presidential Poll: How Did Brand-Name Firm Blow Election?" The Huffington Post, Mar. 8. Retrieved from http://www.huffingtonpost.com/2013/03/08/gallup-presidential-poll_n_2806361.html

[11] Chang, Yoo Soon, J. Isaac Miller, and Joon Y. Park. 2009. "Extracting a Common Stochastic Trend: Theory with Some Applications." Journal of Econometrics 150.2: 231-247.

[12] Crespi, Irving. 1988. Pre-Election Polling: Sources of Accuracy and Error. New York: Russell Sage Foundation.

[13] DeSart, Jay, and Thomas Holbrook. 2003. "Campaigns, Polls, and the States: Assessing the Accuracy of Statewide Presidential Trial-Heat Polls." Political Science Quarterly 56:431-39.
[14] Erikson, Robert S., and Lee Sigelman. 1995. "Poll-Based Forecasts of Midterm Congressional Election Outcomes: Do the Pollsters Get It Right?" Public Opinion Quarterly 59:589-605.

[15] Erikson, Robert S., and Christopher Wlezien. 2008. "Leading Economic Indicators, the Polls, and the Presidential Vote." PS: Political Science & Politics 41.4: 703-707.

[16] Gelman, Andrew, and Gary King. 1993. "Why Are American Presidential Campaign Polls so Variable When Votes are so Predictable?" British Journal of Political Science 23.4: 409-451.

[17] Groves, Robert M., and Lars Lyberg. 2010. "Total Survey Error: Past, Present, and Future." Public Opinion Quarterly 74.5: 849-879.

[18] Groves, Robert M., and Nancy A. Mathiowetz. 1984. "Computer Assisted Telephone Interviewing: Effects on Interviewers and Respondents." Public Opinion Quarterly 48.1B: 356-369.

[19] Groves, Robert M., and Lou J. Magilavy. 1986. "Measuring and Explaining Interviewer Effects in Centralized Telephone Surveys." Public Opinion Quarterly 50.2: 251-266.

[20] Hamilton, James D. 1989. "A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle." Econometrica 57: 357-384.

[21] Holbrook, Thomas. 1996. Do Campaigns Matter? Thousand Oaks, CA: Sage Publishing.

[22] Jackman, Simon. 2005. "Pooling the Polls Over an Election Campaign." Australian Journal of Political Science 40:499-517.

[23] Kish, Leslie. 1962. "Studies of Interviewer Variance for Attitudinal Variables." Journal of the American Statistical Association 57.297: 92-115.

[24] Ladd, E.C. 1996. "The Election Polls: An American Waterloo." Chronicle of Higher Education, Nov. 22, A52.

[25] Lau, Richard R. 1994. "An Analysis of the Accuracy of 'Trial Heat' Polls During the 1992 Presidential Election." Public Opinion Quarterly 58:2-20.

[26] Linzer, Drew. 2012. "Pollsters May Be Herding." Votamatic, Nov. 5. Retrieved from http://votamatic.org/pollsters-may-be-herding/

[27] Linzer, Drew A. 2013. "Dynamic Bayesian Forecasting of Presidential Elections in the States." Journal of the American Statistical Association 108.501: 124-134.

[28] Marcellino, Massimiliano, James H. Stock, and Mark W. Watson. 2003. "Macroeconomic Forecasting in the Euro Area: Country-specific versus Euro-wide Information." European Economic Review 47.1: 1-18.
[29] Martin, Elizabeth A., Michael W. Traugott, and Courtney Kennedy. 2005. "A Review and Proposal for a New Measure of Poll Accuracy." Public Opinion Quarterly 69:342-369.

[30] Mitofsky, Warren J. 1998. "Review: Was 1996 a Worse Year for Polls Than 1948?" Public Opinion Quarterly 62:230-249.

[31] Moench, Emanuel, and Serena Ng. 2011. "A Hierarchical Factor Analysis of US Housing Market Dynamics." The Econometrics Journal 14.1: C1-C24.

[32] Moore, David W., and Lydia Saad. 1997. "The Generic Ballot in Midterm Congressional Elections: Its Accuracy and Relationship To House Seats." Public Opinion Quarterly 61:603-614.

[33] Mosteller, Frederick, Herbert Hyman, Philip J. McCarthy, Eli S. Marks, and David B. Truman. 1949. The Pre-Election Polls of 1948: Report to the Committee on Analysis of Pre-Election Polls and Forecasts. New York: The Social Science Research Council.

[34] Newport, Frank. 2012a. "Polling, Likely Voters, and the Law of the Commons." Gallup Inc. Retrieved from http://pollingmatters.gallup.com/2012/11/polling-likely-voters-and-law-of-commons.html

[35] Newport, Frank. 2012b. "Survey Methods, Complex and Ever Evolving." Gallup Inc. Retrieved from http://pollingmatters.gallup.com/2012/10/survey-methods-complex-and-ever-evolving.html

[36] Panagopoulos, Costas. 2009. "Polls and Elections: Preelection Poll Accuracy in the 2008 General Elections." Presidential Studies Quarterly 39:896-907.

[37] Panagopoulos, Costas. 2013. "Poll Accuracy in the 2012 Presidential Election: Final Report." Fordham University Department of Political Science. Retrieved from http://www.fordham.edu/images/academics/graduate_schools/gsas/elections_and_campaign_/poll_accuracy_2012_presidential_election_updated_1530pm_110712_2.pdf

[38] Pickup, Mark, and Richard Johnston. 2008. "Campaign Trial Heats as Election Forecasts: Measurement Error and Bias in 2004 Presidential Campaign Polls." International Journal of Forecasting 24: 272-284.

[39] Pickup, Mark, J. Scott Matthews, Will Jennings, Robert Ford, and Stephen D. Fisher. 2011. "Why Did the Polls Overestimate Liberal Democrat Support? Sources of Polling Error in the 2010 British General Election." Journal of Elections, Public Opinion and Parties 21.2: 179-209.

[40] Real Clear Politics. 2012. "RealClearPolitics Poll Averages." Retrieved from http://www.realclearpolitics.com/polls/

[41] Rothschild, David, and Justin Wolfers. 2012. "Forecasting Elections: Voter Intentions versus Expectations." Working Paper, November 1.
[42] Silver, Nate. 2012a. "Which Polls Fared Best (and Worst) in the 2012 Presidential Race." The New York Times' FiveThirtyEight. Retrieved from http://fivethirtyeight.blogs.nytimes.com/2012/11/10/which-polls-fared-best-and-worst-in-the-2012-presidential-race/

[43] Silver, Nate. 2012b. "Gallup vs. the World." The New York Times' FiveThirtyEight. Retrieved from http://fivethirtyeight.blogs.nytimes.com/2012/10/18/gallup-vs-the-world/

[44] Silver, Nate. 2012c. "Last 10 Presidential Elections Show No Consistent Bias in Polls." The New York Times' FiveThirtyEight. Retrieved from http://www.nytimes.com/2012/10/01/us/last-10-presidential-elections-show-no-consistent-bias-in-polls.html?pagewanted=all

[45] Silver, Nate. 2012d. "Measuring a Convention Bounce." The New York Times' FiveThirtyEight. Retrieved from http://fivethirtyeight.blogs.nytimes.com/2012/08/29/measuring-a-convention-bounce/

[46] Stock, James, and Mark Watson. 1988. "Testing for Common Trends." Journal of the American Statistical Association 83(404):1097-1107.

[47] Stock, James, and Mark Watson. 2002. "Macroeconomic Forecasting Using Diffusion Indexes." Journal of Business and Economic Statistics 20:147-162.

[48] Stock, James H., and Mark W. Watson. 2010. "The Evolution of National and Regional Factors in U.S. Housing Construction." In Volatility and Time Series Econometrics: Essays in Honor of Robert F. Engle, eds. Tim Bollerslev, Jeffrey Russell, and Mark Watson. Oxford: Oxford University Press.

[49] Stock, James, and Mark Watson. 2011. "Dynamic Factor Models." In Oxford Handbook of Economic Forecasting, eds. Michael P. Clements and David F. Hendry. Oxford: Oxford University Press.

[50] Traugott, Michael W. 2001. "Assessing Poll Performance in the 2000 Campaign." Public Opinion Quarterly 65:389-419.

[51] Traugott, Michael W. 2005. "The Accuracy of the National Preelection Polls in the 2004 Presidential Election." Public Opinion Quarterly 69:642-654.

[52] Traugott, Michael W., and Christopher Wlezien. 2009. "The Dynamics of Poll Performance during the 2008 Presidential Nomination Contest." Public Opinion Quarterly 73:866-94.

[53] Traugott, Michael W. 2011. "The Accuracy of Opinion Polling and Its Relation to Its Future." In Robert Y. Shapiro and Lawrence R. Jacobs (eds), The Oxford Handbook of American Public Opinion and the Media. New York: Oxford University Press.
[54] Villanueva, Elmer V. 2001. "The Validity of Self-Reported Weight in US Adults: A Population-Based Cross-Sectional Study." BMC Public Health 1.1: 11.

[55] Wang, Wei, David Rothschild, Sharad Goel, and Andrew Gelman. 2013. "Forecasting Elections with Non-Representative Polls." International Journal of Forecasting, forthcoming.

[56] Weisberg, Herbert F. 2005. The Total Survey Error Approach: A Guide to the New Science of Survey Research. Chicago: University of Chicago Press.

[57] Wlezien, Christopher. 2003. "Presidential Election Polls in 2000: A Study in Dynamics." Presidential Studies Quarterly 33:172-186.

[58] Wlezien, Christopher, and Robert S. Erikson. 2002. "The Timeline of Presidential Campaigns." Journal of Politics 64: 969-993.

[59] Wlezien, Christopher, and Robert S. Erikson. 2007. "The Horse Race: What Polls Reveal as the Election Campaign Unfolds." International Journal of Public Opinion Research 19: 74-88.

[60] Zaller, John. 2002. "The Statistical Power of Election Studies to Detect Media Exposure Effects in Political Campaigns." Electoral Studies 21.2: 297-329.
Table 1: The Performance of Major Polling Organizations in 2012

Polling Firm               | Number of Polls | Avg. Sample Size | Bias  | Std. Dev. | Abs. Error in A | MSE    | Bias in MSE | Interview Mode
ABC/Post                   | 11              | 1240             | 0.069 | 0.043     | 0.071           | 0.0066 | 72%         | 1
ARG                        | 8               | 1150             | 0.084 | 0.007     | 0.084           | 0.0071 | 99%         | 1
DailyKos/SEIU/PPP (D)      | 9               | 1240             | 0.063 | 0.014     | 0.063           | 0.0060 | 66%         | 2
Gallup                     | 43              | 2900             | 0.082 | 0.075     | 0.087           | 0.0123 | 54%         | 1
IBD/TIPP                   | 19              | 880              | 0.039 | 0.052     | 0.048           | 0.0042 | 36%         | 1
Ipsos/Reuters              | 43              | 1400             | 0.035 | 0.023     | 0.036           | 0.0018 | 70%         | 3
Politico/GWU/Battleground  | 7               | 1000             | 0.071 | 0.047     | 0.071           | 0.0073 | 70%         | 1
Rand                       | 43              | 1000             | 0.003 | 0.037     | 0.029           | 0.0014 | 1%          | 3
Rasmussen                  | 43              | 1500             | 0.105 | 0.029     | 0.105           | 0.0119 | 93%         | 2
UPI/CVOTER                 | 7               | 1320             | 0.08  | 0.033     | 0.080           | 0.0075 | 85%         | 1
YouGov/Economist           | 10              | 800              | 0.03  | 0.022     | 0.030           | 0.0014 | 65%         | 3

Note: Average Sample Size is rounded to the nearest 10. Interview mode codes: 1 = phone with interviewer, 2 = IVR (robo-call), 3 = internet.
Table 2: Pre-Election Poll Accuracy Across All Final Polls

Polling Firm              | Final Poll Dates | Sample Size | Obama | Romney | O-R | Error | Abs. Error | A      | Abs. A | Rank | Population | Interview Mode | Cell Phone
ABC/Post                  | 11/1-11/4        | 563         | 50    | 47     | 3   | -0.9  | 0.9        | 0.018  | 0.018  | 5    | LV/RV      | Phone          | Y
Angus-Reid                | 11/1-11/3        | 590         | 51    | 48     | 3   | -0.9  | 0.9        | 0.019  | 0.019  | 6    | LV/RV      | Phone          | Y
AP-GfK                    | 10/19-10/23      | 740         | 45    | 47     | -2  | -5.9  | 5.9        | 0.123  | 0.123  | 28   | LV/RV      | Phone          | Y
ARG                       | 11/2-11/4        | 1475        | 49    | 49     | 0   | -3.9  | 3.9        | 0.079  | 0.079  | 17   | LV         | Phone          | Y
CBS/NYT                   | 10/25-10/28      | 4725        | 48    | 47     | 1   | -2.9  | 2.9        | 0.058  | 0.058  | 13   | LV/RV      | Phone          | Y
Clarus                    | 10/4-10/4        | 800         | 46    | 47     | -1  | -4.9  | 4.9        | 0.101  | 0.101  | 27   | LV/RV      | Phone          | Y
CNN                       | 11/2-11/4        | 1000        | 49    | 49     | 0   | -3.9  | 3.9        | 0.079  | 0.079  | 17   | LV/RV      | Phone          | Y
DailyKos/SEIU/PPP (D)     | 11/1-11/4        | 1128        | 50    | 48     | 2   | -1.9  | 1.9        | 0.039  | 0.039  | 10   | LV/RV      | Automated      | N
Democracy Corps (D)       | 11/1-11/4        | 2345        | 49    | 45     | 4   | 0.1   | 0.1        | -0.006 | 0.006  | 1    | LV         | Phone          | Y
FOX                       | 10/28-10/30      | 713         | 46    | 46     | 0   | -3.9  | 3.9        | 0.079  | 0.079  | 16   | LV/RV      | Phone          | Y
Gallup                    | 11/1-11/4        | 1500        | 48    | 49     | -1  | -4.9  | 4.9        | 0.1    | 0.1    | 24   | LV/RV      | Phone          | Y
Gravis Marketing          | 11/3-11/5        | 2709        | 48    | 48     | 0   | -3.9  | 3.9        | 0.079  | 0.079  | 17   | LV         | Automated      | N/A
IBD/TIPP                  | 11/3-11/5        | 1417        | 50    | 49     | 1   | -2.9  | 2.9        | 0.059  | 0.059  | 15   | LV         | Phone          | Y
Ipsos/Reuters             | 11/1-11/5        | 1300        | 48    | 46     | 2   | -1.9  | 1.9        | 0.037  | 0.037  | 8    | RV         | Phone          | N/A
JZ Analytics/Newsmax      | 11/3-11/5        | 1000        | 47    | 47     | 0   | -3.9  | 3.9        | 0.079  | 0.079  | 17   | LV         | Internet       | N/A
Monmouth                  | 11/1-11/4        | 1200        | 48    | 48     | 0   | -3.9  | 3.9        | 0.079  | 0.079  | 17   | LV         | Automated      | Y
National Journal          | 10/25-10/28      | 693         | 50    | 45     | 5   | 0.1   | 0.1        | -0.026 | 0.026  | 7    | LV         | Phone          | Y
NBC/WSJ                   | 11/1-11/3        | 712         | 48    | 47     | 1   | -2.9  | 2.9        | 0.058  | 0.058  | 13   | LV/RV/A    | Phone          | Y
NPR                       | 10/23-10/25      | 3000        | 47    | 48     | -1  | -4.9  | 4.9        | 0.1    | 0.1    | 26   | LV         | Phone          | Y
Pew                       | 10/31-11/3       | 839         | 48    | 45     | 3   | -0.9  | 0.9        | 0.015  | 0.015  | 3    | LV/RV      | Phone          | Y
Politico/GWU/Battleground | 11/4-11/5        | 1000        | 47    | 47     | 0   | -3.9  | 3.9        | 0.079  | 0.079  | 17   | LV         | Phone          | Y
PPP (D)                   | 11/2-11/4        | 2551        | 50    | 48     | 2   | -1.9  | 1.9        | 0.039  | 0.039  | 10   | LV/RV      | Automated      | N
Purple Strategies         | 10/31-11/1       | 872         | 47    | 46     | 1   | -2.9  | 2.9        | 0.058  | 0.058  | 12   | LV         | Automated      | Y
Rand                      | 10/30-11/5       | 1019        | 49.5  | 46.2   | 3.3 | -0.6  | 0.6        | 0.01   | 0.01   | 2    | LV/RV      | Internet       | N/A
Rasmussen                 | 11/3-11/5        | 1023        | 48    | 49     | -1  | -4.9  | 4.9        | 0.1    | 0.1    | 24   | LV         | Automated      | N
UConn/Hartford Courant    | 10/11-10/16      | 1200        | 48    | 45     | 3   | -0.9  | 0.9        | 0.015  | 0.015  | 3    | LV         | Phone          | Y
UPI/CVOTER                | 11/3-11/5        | 1041        | 49    | 48     | 1   | -2.9  | 2.9        | 0.059  | 0.059  | 14   | LV         | Phone          | N/A
Wash. Times/JZ Analytics  | 10/29-10/31      | N/A         | 49    | 49     | 0   | -3.9  | 3.9        | 0.079  | 0.079  | 17   | LV         | Phone          | N/A
YouGov/Economist          | 11/3-11/5        | 1080        | 49    | 47     | 2   | -1.9  | 1.9        | 0.038  | 0.038  | 9    | LV/RV      | Internet       | N/A
Table 3: Absolute Errors and Sample Size, Closeness to Election

                        | Model 1         | Model 2          | Model 3
Sample Size             | .027 (.011)**   | –                | .023 (.011)**
Closeness to Election   | –               | -.459 (.155)***  | .139 (.192)
Constant                | .037 (.014)***  | .077 (.006)***   | .033 (.014)**
Adjusted R2             | 0.05            | 0.03             | .05
N                       | 150             | 243              | 150

Note: Robust regression results. Coefficients for Sample Size and Closeness to Election are multiplied by 1,000 for readability. ** significant at p = .05; *** significant at p = .01.
Figure 1. Poll Dynamics
[Line chart: Romney/Obama vote-share ratio (y-axis, 0.85 to 1.2) from 09/01 to 11/04, 2012, for the eleven polls (Ipsos, Rasmussen, Rand, IBD, ABC, ARG, PPP, Politico, UPI, YouGov, Gallup), with the DFM estimate of actual support shown as a bold line.]
Figure 2. Poll Accuracy
[Line chart: accuracy measure A (y-axis, -0.1 to 0.3) from 09/01 to 11/04, 2012, for Gallup, Ipsos, Rasmussen, Rand, IBD, ABC, ARG, PPP, Politico, UPI, and YouGov.]
Figure 3. Poll Herding
[Line chart: Romney/Obama vote-share ratio (y-axis, 0.85 to 1.2) from 09/01 to 11/04, 2012, for Gallup, ABC, PPP, and UPI, with the DFM estimate of actual support.]