
Identifying Sources of Survey Error in 2012 Presidential Election Polls:

A Dynamic Factor Analytic Approach∗

Jee-Kwang Park
Assistant Professor
Nazarbayev University
[email protected]

Adam G. Hughes
PhD Candidate
University of Virginia
[email protected]

April 3, 2014

Abstract

Although election polls provide important information to voters, they are often biased and unreliable. Existing measures of poll accuracy fail to assess accuracy in the weeks prior to the election and fail to detect systematic manipulation by polling firms. Using data from the 2012 U.S. presidential election, we propose a new econometric method (dynamic factor analysis) for estimating true electoral opinion in the months before an election. Our estimate improves upon simple averages, smoothing estimates, and Bayesian models of true support by distinguishing between the kinds of survey errors in each poll. We provide empirical validation of the total survey error framework; we estimate that non-sampling error accounts for the majority of survey error in 2012’s national presidential polls. By comparing individual poll results with our estimate, we create a measure of accuracy over time and rank eleven major polling firms. We show which polling organizations succeeded throughout the campaign and identify which polling practices were most effective. The results of our analysis suggest that internet polls outperformed telephone polls and that larger sample sizes are correlated with inaccuracy.

∗ Prepared for presentation at the Annual Meeting of the Midwest Political Science Association, April 3-6, 2014, Chicago, Illinois.


1 Introduction

How accurate are election polls? The 2012 U.S. presidential election triggered

another heated debate about this old question when final polls from respected polling orga-

nizations proved to be surprisingly inaccurate. Barack Obama won the election by a margin

of 3.8%. However, major polling organizations failed to predict Obama’s easy victory: al-

though ABC/WP and Pew favored Obama by a 3% margin, other pollsters predicted a tied

race, or even victory for Mitt Romney. Indeed, quite a few final polls were further from the

mark than their so-called margin of error would suggest. Moreover, these polls systemat-

ically overestimated public support for Romney by 1% to 5%, which strongly suggests an

industry-wide polling bias in favor of Romney. Despite pollsters’ attempts to draw random

and unbiased samples, the 2012 election polls turned out to be less accurate and more biased

than those in earlier elections.

Polls issued in the final weeks before the 2012 election actually took place were particu-

larly inaccurate, suggesting manipulation and industry-wide herding behavior. In the month

of October 2012, a majority of polls consistently reported that Romney held a substantial

lead or that the race was tied. For example, Gallup reported that Romney would beat

Obama by between 0 and 7 points throughout October. According to these polls, Romney

never trailed Obama throughout the entire month. Gallup wasn’t alone in reporting favor-

able polling results for Romney: Rasmussen’s daily poll consistently reported that Romney

led by 0% to 4%, while ARG reported a 0% to 2% Republican lead. Almost all other polling

organizations (except Rand) described the race in October as neck and neck if not favoring

Romney.

Was Romney ahead in October as much as these polls indicated? Gallup, Rasmussen,

and ARG’s results contrast sharply with betting in the Iowa election market, where election


futures are traded.1 During October 2012, Obama never trailed Romney in the Iowa election

markets: he maintained a substantial lead. The prediction market Intrade also suggested that

Obama would win; in the final month of betting, the chance of Obama’s reelection was always

higher than 50%. At the same time, several October polls asked respondents: “Regardless

of who you might support, who do you think is most likely to win the presidential election?”

The results from this question, a survey item that is consistently better at predicting election

outcomes than personal vote preference questions (Rothschild and Wolfers 2012), reliably

showed Obama winning by a margin ranging from 18% (Pew) to 30% (AP-GfK). Overall,

it seems that these less traditional predictions directly contradicted the results reported by

well known pollsters.

However, Romney’s fortunes changed dramatically in many final election polls. Despite

the fact that there were no major political scandals, events, or new revelations about the

candidates in the final week of the election, several major polling organizations reversed

course and predicted a tied race or even victory for Obama. For example, Gallup consistently

reported that Romney held a 5 - 7 point lead over Obama until its second-to-last poll. Then

Romney’s lead suddenly dwindled to a single point in the final poll. Gallup’s explanation is

that the super-storm Sandy boosted support for the incumbent president (Newport 2012a).

But this explanation should invite skepticism: comparable four-point drops in support for

Romney did not occur among other published polls.2 Some political commentators suspect

that Gallup changed its likely voter weighting methodology so that its results more closely

resembled those of other polls (Silver 2012b, Blumenthal 2013).

1 The IEM 2012 U.S. Presidential Election Markets are real-money futures markets where contract payoffs are determined by the popular vote cast in the 2012 U.S. Presidential Election. There are currently two markets: Pres12 VS, based on vote shares won by the two major party candidates in the 2012 U.S. Presidential election, and Pres12 WTA, a winner-take-all market based on the popular vote plurality winner of the 2012 U.S. Presidential election.

2 In fact, Hurricane Sandy should have decreased the incumbent president’s popularity. Katrina was a blow to Bush’s approval rating in 2005, and research shows that natural disasters cause a significant decrease in support for the incumbent (Achen and Bartels 2012).

The unjustifiable drop in support for Romney does not seem to reflect real changes in

public preferences. Rather, it appears to reflect Gallup’s strategic decision to avoid making

a prediction far from what other sources indicated. In the face of mounting evidence from

expectation-based polls, election prediction markets, and polling experts’ criticism (Blumen-

thal 2012, Silver 2012b), Gallup likely realized that it was overestimating public support for

Romney and then changed its polling methodology to reduce Romney’s lead. Several other

polling organizations likely followed suit. Since final presidential pre-election polls draw close

attention from the mass media and a wrongheaded prediction can damage a polling orga-

nization’s reputation, polling firms have strong incentives to adjust their final poll in ways

that reflect the consensus forecast. This kind of behavior often results in polls moving closer

to each other. And indeed, polls appear to have herded closer together in the last moments

of campaign period. Careful observers of the 2012 pre-election polls (Silver 2012b, Linzer

2012) reported on this phenomenon.

These events suggest that the accuracy of pre-election polls may be worse than we might

expect. Since retrospective evaluations of pollster accuracy are often based on manipulated

final polls, overall evaluations based solely on predictive accuracy (e.g. Panagopoulos 2013)

could understate inaccuracy in the weeks before an election. Accordingly, commentators like

Nate Silver emphasize the importance of non-final polls in assessing poll accuracy (Silver

2012a). So the question we seek to answer is how to best measure accuracy across an entire

campaign.

In this paper, we propose a novel way to measure the ‘actual’ survey error of pre-election

polls. To overcome the aforementioned problems, we include not just final polls but also

other polls conducted in the months before an election in our analysis. To measure the

survey errors of non-final polls, we need an estimate of the underlying true support levels for

both candidates before the election, which are not known – unlike actual election results. We

resolve this challenge by adopting an advanced econometric model, dynamic factor analysis.


The dynamic factor model (DFM) estimates latent true values from correlated time series

with Kalman filtering in tandem with MLE or MCMC. Since a poll is composed of both the

true opinion value and survey error, different polling series from polling organizations are

very highly correlated with each other. On the other hand, individual polling series are also

serially correlated. By taking advantage of this double correlation in the pre-election polling

series, dynamic factor analysis models can quite precisely estimate the underlying trend in

support levels from multiple polling series. Although dynamic factor models have not been

applied in political science, they are widely used in economics to study business cycles, price

changes, the housing market and other topics.3

Our DFM-based estimate of polling accuracy shows that 2012 pre-election polls moved

closer to each other about one week before the election, which suggests that herding occurred

around that time (Figure 1). Thus, it can be misleading to judge a polling organization’s

performance solely by its predictive accuracy. Our results also provide strong empirical

support for the total survey error framework, which describes the error properties of sample

survey statistics as incorporating a variety of error sources beyond sampling error (Groves

and Lyberg 2010; Weisberg 2005). Although the total survey error framework is conceptually

convincing, it suffers from a lack of empirical studies buttressing its theoretical arguments,

especially the claim that non-sampling errors are the largest part of total survey error. This

gap is the result of the practical difficulty of estimating the size of non-sampling error.4 To the best

of our knowledge, there are few empirical studies that attempt to measure total survey error,

and existing research is experimental, based on company-wide polls (Groves and Magilavy

1984; Assael and Keon 1982). We believe that our study is the first empirical research

to estimate the size of total survey error for national polls conducted by major polling

3 DFM can be used in both Stata and EViews without installing additional software.

4 Several studies measure the size of a specific type of non-sampling error, including nonresponse error (Brehm 1993; Berinsky 2004), interviewer effects (Kish 1962; Groves and Magilavy 1986), and interview mode effects (Beland and St-Pierre 2008; Villanueva 2001).

organizations. Our results confirm the findings of experimental studies in the total survey

error literature: the size of non-sampling error far exceeds that of sampling error. More

concretely, we find that sampling error usually accounts for only 20 to 30% of total survey

error for most 2012 pre-election polls. Furthermore, a large proportion of non-sampling

error is attributable to the bias of survey errors, not to variance, as Groves and Magilavy

(1984) find in an experimental context. Our results show that early findings from the total

survey error tradition remain valid even for national polls. To achieve more accurate poll

estimates, therefore, the most important methodological goals should be to reduce non-

sampling errors, especially bias, instead of increasing sample size in an effort to decrease

sampling error. Gallup’s 3,000-person sample size did not help it predict the 2012 election

results more accurately than other firms with much smaller samples.

2 Survey Errors and Pre-Election Polls

Several studies examine the accuracy of pre-election polls, many by comparing final polls

with actual election results (Crespi 1988; Erikson and Sigelman 1995; Moore and Saad

1997; Panagopoulos 2009; Martin, Traugott, and Kennedy 2005; Traugott 2001; Traugott

and Wlezien 2009). However, since this comparison involves only final polls, it fails to

assess the degree to which polling firms manipulated final results. Herding behavior is one

form of manipulation, in which pollsters change poll results to approximate those reported

by other firms. We also suspect that polling organizations engage in result calibration:

the process of adjusting poll results to take into account information from other sources,

including poll aggregators, journalists, and betting markets. Additionally, the comparison

between final polls and election results involves only one poll; it is not a reliable measure

of polling accuracy even in the absence of herding or calibration behavior. With just one

poll, we cannot distinguish between an accurate poll with low survey error and one that was


simply lucky.

For these reasons, scholars have tried to use non-final polls as well as final polls in

accuracy assessments. Some researchers aggregate polls conducted during the last several

months before the election and then compare the firm-wide average of the polls with the

actual results on Election Day (Lau 1994; DeSart and Holbrook 2003; Traugott 2001). We

argue that while this measure does capture more information about each polling firm, it

is conceptually more problematic: a poll conducted a few weeks before the election is not

supposed to represent public support on election day but rather, public support on that

particular polling day. A poll conducted a few weeks before the election should differ from the

actual election results and it ought to be compared with the actual latent public support on

that particular day.

In order to use non-final polls to assess poll accuracy, it is necessary to first estimate the

true level of public support for candidates in the weeks before the election. Several methods

have been proposed to estimate underlying true values. The first group of studies, which

use what is often called a “poll of polls,” adopts a straightforward approach to estimating

underlying true support. Insisting that a much larger sample size will provide a more precise

estimate of the underlying true value, those studies calculate an average level of support

using all polls conducted in the same time period and use the new average as an estimate of

the underlying true values. For example, Lau (1994) classifies the last month of polling in the

1992 presidential election into four sub-groups according to their polling week. Excluding

the poll whose accuracy is being judged, he calculates the average of other polls, weighted by

sample size, in the same week and uses the average as an estimate of the latent true support

levels for the week. Poll accuracy is measured by how close a poll is to the average of other

polls.5 The more distant a poll is from the average, the less accurate it is considered to be.

5 The poll whose accuracy is judged is usually excluded. If we include the poll in estimating the average, the poll will be part of the “reality” against which that poll is judged, which may lead to a dependency problem.

This is how Real Clear Politics calculates its RCP average.
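
To make the procedure concrete, the following sketch (our illustration, not Lau's original code; the poll values and sample sizes are hypothetical) computes the leave-one-out, sample-size-weighted average of same-week polls that serves as the benchmark for the judged poll:

    import numpy as np

    def poll_of_polls_error(support, sample_sizes, judged_idx):
        # Leave-one-out, sample-size-weighted average of the other polls taken
        # in the same week, used as the benchmark for the poll at judged_idx
        # (a sketch of the Lau (1994)-style procedure).
        support = np.asarray(support, dtype=float)
        weights = np.asarray(sample_sizes, dtype=float)
        keep = np.ones(len(support), dtype=bool)
        keep[judged_idx] = False                      # exclude the judged poll
        benchmark = np.average(support[keep], weights=weights[keep])
        return benchmark, support[judged_idx] - benchmark

    # Four hypothetical same-week polls of Romney support and their sample sizes;
    # judge the third poll against the weighted average of the other three.
    benchmark, error = poll_of_polls_error([0.49, 0.51, 0.47, 0.50],
                                           [1000, 2700, 800, 1200], judged_idx=2)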

We argue that this method is ineffective. Most importantly, there is no theoretical justi-

fication for why the poll of polls should be considered an unbiased and consistent estimate of

underlying true candidate support. Practically, when individual polls are collectively biased

(as they were in 2012), the poll of polls will be biased. In those cases, a poll that is close to

the true value but distant from many other, biased polls would actually be judged as more

inaccurate. And indeed, there is consistent evidence for polling industry bias: the 1996 pre-

election polls broadly overestimated public support for Bill Clinton (Ladd 1996; Mitofsky

1998; Silver 2012c). In the 2000 election, out of nineteen final pre-election poll estimates,

fourteen predicted victory for Bush, three a tie, and only two victory for Gore (Traugott

2001). 17 out of 23 final polls in the 2008 presidential election over-estimated Obama’s lead

while only 3 final polls overestimated McCain’s support (Panagopoulos 2009). In the 2008

New Hampshire Democratic primary, every poll wrongly predicted Barack Obama’s victory

over Hillary Clinton by 1 to 13 percentage points: Hillary Clinton won the actual vote by

3% (Traugott and Wlezien 2009).6

At the same time, the poll of polls method implicitly assumes that survey error can be

reduced to sampling error; accordingly, a larger sample size will provide a more accurate es-

timate of underlying true support. However, there are a variety of important non-sampling

errors that do not decrease as sample sizes increase. If non-sampling error is larger than sam-

pling error, any gains from the increased sample size in an aggregated poll will be marginal.

The total survey error framework emphasizes that non-sampling errors are larger and po-

tentially more problematic than sampling errors, and our analysis shows that sampling error

accounts for just 20 to 30% of total survey error in many 2012 pre-election polls. Empirical

6 This wrong prediction prompted AAPOR to appoint a committee to review the performance of the polls. Traugott and Wlezien (2009) show that the problems in New Hampshire were not unique; the pre-election polls as a group generally underestimated the winner’s share of the vote for the two leading candidates in the week leading up to each election.

studies of pre-election polls also consistently find that sample size is not significantly related

to predictive accuracy (Pickup et al. 2011, Arzheimer and Evans 2014). The 2012 election

is no exception. For example, Gallup’s sample sizes for pre-election polls in 2012 were two

or three times larger than those of other pollsters, but its performance was the worst in

terms of predictive accuracy (Panagopoulos 2013). Our analysis also shows that in the 2012

pre-election polls, sample size is negatively related to predictive accuracy; the larger the sample

size is, the less accurate a poll is (Table 3). Thus, it is very unlikely that a poll of polls will

be closer to underlying true levels of support than individual polls.

Smoothing methods provide a similar but more advanced strategy for estimating under-

lying true opinion from non-final polls. Since a smoothed estimate is a “weighted” average

of adjacent polls, however, this smoothing method suffers from the same problems that we

identified for the poll of polls method. Furthermore, the smoothing method gives less weight

to outliers since it assumes that outlier polls are poorly conducted and that an estimate

would be closer to the underlying true values without them. When there is polling industry

bias (as in the 2012 pre-election polls), however, outliers may be the most correct polls. Thus,

a purported advantage of this method, robustness to outliers, is in fact a critical weakness

in the presence of polling industry bias.

Neither a poll of polls nor poll smoothing is an effective way to estimate underlying

true opinion because each lacks theoretical justification, ignores substantive non-sampling

error, and does not adjust for the prevalence of polling industry bias in pre-election polls.7

The most advanced existing approach to measuring pre-election poll accuracy is to pool

polls over an entire electoral campaign and apply Kalman filtering (Jackman 2005, Pickup

and Johnston 2008). Unlike other smoothing methods, Kalman filtering is theoretically proven

7 Polling industry bias is not a phenomenon confined to the US; it is frequently reported in other countries’ presidential elections. For example, South Korean polling companies were collectively biased in favor of the liberal candidate in the 2012 presidential election, while Egyptian polls showed bias against the Islamist candidate, Morsi.

to provide an unbiased and consistent estimate of the underlying true value from an obser-

vation with measurement error. Since pre-election polls are ‘noisy signals’ in the sense that the

observed values are the true values plus survey error, the pooled estimate performs well. When

Jackman’s (2005) approach is adopted, house effects are determined ex post, and as a result,

estimated true opinion may be robust to polling industry bias. However, this application

of the Kalman filter requires a priori knowledge of measurement error. In engineering and the

natural sciences, measurement error is already known, either through lab experiments or

predefined theoretical expectations. In polling, however, survey error is always unknown a

priori. For this reason, Jackman (2005) and other scholars replace the size of total survey

error with that of sampling error, which can be calculated from sample size before estimating

the model. As a result, the success of this application of Kalman filtering is predicated on the

degree to which polling accuracy is determined by sampling error - a problematic assumption.

The total survey error framework identifies many possible sources of error which influence

polling accuracy that are unrelated to sampling error. Furthermore, empirical research on

pre-election polls, especially the 2012 polls (Panagopoulos 2013), consistently shows that the

size of survey error is not related to sample size. In his extensive examination of pre-election

polling accuracy, Crespi (1988) found that sample size has trivial effects on polling accuracy:

“Once basic sample size requirements are met, increasing the sample size may make less of a

contribution to poll accuracy than other aspects of poll methodology” (64).
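
For reference, the sampling-error variance that such Kalman-filter applications substitute for total measurement error is typically the binomial approximation computed from the sample size alone; the minimal sketch below (ours, with hypothetical numbers) shows the calculation and why it omits non-sampling error entirely:

    import math

    def sampling_variance(p, n):
        # Binomial sampling variance of a reported proportion p from a simple
        # random sample of size n; this is the only error component that can be
        # computed a priori from sample size, and it ignores non-sampling error.
        return p * (1.0 - p) / n

    # A 1,000-person poll reporting 50% support:
    var = sampling_variance(0.50, 1000)     # 0.00025
    moe = 1.96 * math.sqrt(var)             # about 0.031, i.e. roughly a +/-3 point margin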

For this reason, we introduce a “multivariate” Kalman filtering method which does not

require a priori knowledge of measurement error: dynamic factor analysis. The dynamic

factor model estimates the measurement error (survey error) from data with MLE or MCMC

rather than assuming that it is given. Although this model is not frequently used in political

science, dynamic factor analysis is an established method that has been widely used in

macroeconomics and finance since the early 1980s. We describe the approach in detail in the

next section.


3 Dynamic Factor Model

The dynamic factor model (DFM) is a multivariate time series approach comparable to

vector autoregression (VAR) or error correction models (ECM). However, not all multivari-

ate time series are the subject of dynamic factor modeling. DFM requires two or more time

series that show co-movement over time. DFM also assumes that the observed co-movement

of the time series is driven by an underlying common factor or factors. However, each

time series is also moved by idiosyncratic disturbances arising from measurement error and

special features specific to each series. The goal of DFM is to estimate the underlying com-

mon factors from multiple correlated time series with disturbances. When the distribution

of idiosyncratic disturbances in the time series behaves reasonably well, we can obtain unbi-

ased and consistent estimates of the underlying factor(s). When idiosyncratic disturbances

approximate a Gaussian distribution, we use Kalman filtering in tandem with MLE to obtain

quite efficient estimates, even though the size of measurement error is not known a priori.

The dynamic factor model can be written as a linear state space model:8

Yt = α + γ(L) ft + et ,  where et ∼ N(0, σe²)

ft = Ψ(L) ft−1 + ηt ,  where ηt ∼ N(0, ση²)

Where Yt is a vector of observations, α is a vector of deterministic intercepts, ft is a vector

of common factors, and et is a vector of idiosyncratic disturbances. In the application we

develop, a poll estimate Yt is the linear combination of the bias from a polling organization,

α, an underlying true value (factor) at time t, ft, and a random error at time t, et.

Poll biases are assumed to be constants and random errors are assumed to have a Gaussian

distribution. Where there are N polling series, Yt , α , and et are N×1 vectors, respectively.

8 This is a parametric representation of the dynamic factor model based on Stock and Watson’s notation. Non-parametric and semi-parametric representations, which are often used in principal component analysis, are written quite differently. Refer to Stock and Watson (2012) for details of different dynamic factor models and estimation procedures.

Where there are q dynamic factors, ft and ηt in the transition equation are q×1

vectors. γ(L) is a vector of factor loadings, L is the lag operator, and the lag polynomial

matrices γ(L) and Ψ(L) are N×q and q×q matrices, respectively. ηt is another idiosyncratic

disturbance term with a Gaussian distribution.9 The idiosyncratic disturbance terms in the

observation equation and in the transition equation are not correlated, even between lagged

or lead terms, E(et, ηt−k) = 0 for all k. Finally, idiosyncratic disturbances in the observation

equation are not correlated with each other, E(eit, ejs) = 0 for all s if i ≠ j.

When MLE is used in tandem with the Kalman filter, MLE estimates the parameters of

the state-space model, and the Kalman filter is used to obtain efficient estimates of latent

factors (the “unobserved state”). Unlike the pooling method, DFM does not pool different

polls or treat them as a single univariate time series. Instead, the DFM treats each poll as a time

series and recovers an estimate of underlying true public opinion from the high correlation

between polling series. The DFM is also different from the pooling method in the sense that

it allows the underlying true values (here, factors) to take the form of an AR(p) process and

to be stationary (among many others, Stock and Watson 1999 and 2001) or non-stationary

(Chang, Miller, and Park 2009).
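
To illustrate how such a one-factor model can be estimated in practice, the sketch below fits a dynamic factor model with the Kalman filter and MLE using the DynamicFactor class in Python's statsmodels. The polling series are simulated, and the firm names, demeaning step, and other choices are our own illustrative assumptions, not the estimation code behind the results reported in this paper; the state-space machinery accommodates days on which a firm did not poll.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    # Hypothetical input: one column per polling firm, one row per day, each cell
    # the firm's reported Romney/Obama poll odds (NaN on days the firm did not poll).
    rng = np.random.default_rng(0)
    days = pd.date_range("2012-09-01", "2012-11-05", freq="D")
    latent = 0.95 + 0.04 * np.sin(np.linspace(0.0, 3.0, len(days)))   # simulated true odds
    polls = pd.DataFrame(
        {f"firm_{i}": latent + 0.01 * i + rng.normal(0.0, 0.02, len(days))
         for i in range(4)},
        index=days,
    )
    polls = polls.mask(rng.random(polls.shape) > 0.5)     # irregular polling schedules

    # Demean each series so that constant house bias (alpha) is removed before
    # estimation; the column means are crude estimates of each firm's bias.
    house_bias = polls.mean()
    centered = polls - house_bias

    # One common factor with AR(1) dynamics; the Kalman filter handles missing days
    # and MLE estimates the loadings and idiosyncratic (survey-error) variances.
    model = sm.tsa.DynamicFactor(centered, k_factors=1, factor_order=1)
    result = model.fit(disp=False)

    smoothed_factor = result.smoothed_state[0]    # Kalman-smoothed common factor
    print(result.summary())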

The DFM has been widely used in macroeconomics and finance. For example, Sargent

and Sims (1977) show that two factors can explain a large fraction of fluctuations in im-

portant U.S. quarterly macroeconomic indexes. Marcellino, Stock, and Watson (2003) and

Hamilton (1989) used the DFM to estimate the underlying business cycle from macroeco-

nomic indexes. Ng and Moench (2009) and Stock and Watson (2010) apply the DFM to

analysis of regional or state housing markets to see how much of price dynamics of those

housing markets can be explained by the national economy, the common factor.

9 The assumption that the two disturbance terms have Gaussian distributions is not necessary. However, the assumption eases the burden of estimation. Furthermore, in the context of polling, this assumption makes sense: both sampling and non-sampling error are measurement errors. So, we assume that the two disturbance terms have Gaussian distributions.

We argue that the analysis of a series of polls is a better application of dynamic factor

modeling than are macroeconomic time series. First, each poll has a simple structure, con-

sisting of both true public opinion and survey error. This is an ideal structure for DFM as

it mirrors the model’s structure, including an underlying factor and measurement error. In

most macroeconomic applications, economic variables are rarely a combination of a factor and

measurement errors. There are many other specific variables affecting only a certain eco-

nomic time series, which makes estimation more challenging. At the same time, the number

of factors (underlying economic forces) that drive the business cycle, housing prices, or other

macroeconomic fluctuations is often unknown a priori, although we might expect that those

are strongly affected by some common economic forces. Thus, the identification of the number of

common factors is an important problem in macroeconomic scholarship (Bai and Ng 2002).

In our polling application, however, we can safely assume only one common factor since an

opinion poll is composed of true public opinion and survey error.10 Furthermore, in macroe-

conomics, the distribution of the idiosyncratic disturbance term is not known a priori, or

even a posteriori in many cases, and a Gaussian distribution may be inappropriate. In fact,

the idiosyncratic disturbance term et often has a non-Gaussian or even non-standard distri-

bution, which makes estimation very challenging, if not practically impossible. In the case

of polls, theoretically the idiosyncratic term should have a Gaussian distribution because it

represents the random component of survey error.

Although our representation of polls in the state-space form may appear to be similar

to models used in existing scholarship (e.g. Jackman 2005, Pickup and Johnston 2008), in fact, it is

very different. In the previous studies, random errors (idiosyncratic disturbances), et , are

considered to be purely sampling errors:

poll = bias(house effect, systematic error) + true value + random error (sampling error)

10 Survey error is composed of systematic error, which is constant (deterministic), and random error. The idiosyncratic disturbance term is equal to the random error.

In contrast, we adopt the total survey error (TSE) framework, in which random error is

distinct from sampling error. In our model, the random error et also contains non-sampling

error.

poll = bias(systematic error) + true value + random error (sampling error + non-sampling error)

By writing the model like this, we implicitly reject the view that house effects make

a poll biased but not necessarily more volatile. As the TSE framework suggests, house

effects have an influence on the variance of survey errors. Thus results from some polling

houses should fluctuate more than others, even if they have the same sample size. Polling

houses use different modes of interview, question wording, and weighting procedures. The

differences between these non-sampling error sources surely influence both the variance of

survey errors and poll biases. For example, Biemer (2010) shows that measurement error

inflates the random error of a survey.11 Thus, the TSE framework assumes that polling

houses will experience different random errors, due to their unique methods for adjusting to

non-sampling errors – even if they have exactly the same sample size. This is one of the reasons

why we object to the pooling method, which assumes that random error is solely determined

by sample size, and favor the DFM, which directly estimates the size of random error.

4 Data and Analysis

4.1 Data Sources and Accuracy Measure A

Twenty-nine organizations and teams released final polls that were conducted during

the last week of the 2012 presidential campaign, while more than ninety conducted at least

one pre-election poll for the presidential race. Not all of these polls are the subject of

our analysis, however: we include only polls that meet several criteria. First, we analyze

11 Biemer (2010) identifies five different non-sampling errors which theoretically affect the variance of survey error.

polls from reputable, well-known polling organizations (Silver 2012a). We focus on polls

conducted during the last two months of the campaign period, when polls are conducted on

a regular basis. We also restrict our analysis to organizations that conducted at least seven

polls in September, October, and November 2012. This standard is somewhat arbitrary, but

we believe that seven polls is the minimum number necessary to capture the distribution of

survey errors for each firm. In total, eleven polling firms meet these standards: ABC/WP,

ARG, DailyKos/SEIU/PPP, Gallup, GWU/Politico Battleground, IBD/TIPP, Ipsos, Rand,

Rasmussen, UPI/CVOTER, and YouGov/Economist.

The simplest way to evaluate the survey errors of final polls is to compare the difference

in poll estimates with that of actual election results. However, the process of comparing per-

centage point differences is problematic; polls include various numbers of undecided voters

(Martin, Traugott, Kennedy 2005). To address that problem, we use Martin, Traugott, and

Kennedy’s predictive accuracy measure, A.

A = ln [ (Polled Support for Romney / Polled Support for Obama) ÷ (Actual Support for Romney / Actual Support for Obama) ]

This measure first calculates poll odds: the ratio of the two candidates’ estimated support in

the final poll. A value of 1 indicates a tie between Romney and Obama, values greater than

1 suggest Romney leading in the poll, and values smaller than 1 indicate a lead for Obama.

Then the poll odds are divided by the actual odds (vote share), generating the odds ratio.

Lastly, the natural log of the odds ratio is taken to produce the measure A. Positive values of the

measure A indicate polling bias for the Republican candidate and negative values show bias

in favor of the Democratic candidate. Although A is designed to assess predictive accuracy by

comparing final polls and actual results, it can also be used for non-final polls when we have

estimates of actual values.12
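
For clarity, the measure can be computed in a few lines of code; the sketch below (ours, for illustration) evaluates A for a hypothetical tied poll against the approximate 2012 national vote shares:

    import numpy as np

    def accuracy_A(poll_rom, poll_oba, actual_rom, actual_oba):
        # Martin, Traugott, and Kennedy's measure A: the log of the ratio of the
        # poll's Romney/Obama odds to the actual (or estimated) odds. Positive
        # values indicate pro-Republican bias, negative values pro-Democratic bias.
        poll_odds = poll_rom / poll_oba
        actual_odds = actual_rom / actual_oba
        return np.log(poll_odds / actual_odds)

    # Example: a poll showing a tied race (47-47) judged against a result of
    # roughly 51.1% Obama to 47.2% Romney yields a positive (pro-Romney) A.
    A = accuracy_A(47, 47, 47.2, 51.1)   # approximately 0.079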

12 Arzheimer and Evans (2014) adopt the predictive accuracy measure A and apply it to multi-candidate elections.

Figure 1 shows the poll odds: Romney’s support divided by Obama’s support. The eleven

polls are represented with different line types and the DFM estimate of the underlying actual

ratio appears as a bold line.

Figure 1 about here

The figure shows that Romney’s support level decreased sharply in the aftermath of

the Democratic National Convention, which reconfirms existing evidence of a ‘convention

bounce’ (Gelman and King 1993; Holbrook 1996; Zaller 2002; Silver 2012d). Romney gained

popularity around October 5th, presumably as a result of his performance in the first pres-

idential debate on October 3rd. The majority of the polls, especially phone-based surveys,

indicate that Romney held a lead over Obama at least until late October. However, the

DFM estimate shows that Obama never trailed Romney at any time in October. Thus,

most polling organizations (excluding internet polls) overestimated support for Romney.

4.2 Poll Performance: Does Mode Matter?

With DFM-estimated actual values, we calculate the measure A for non-final polls (Fig-

ure 2) to see how well polls tracked variation in candidate support throughout the campaign

period. Poll performance can be measured two different ways. The simpler approach com-

pares the average absolute difference of the measure A for each poll with the estimated

actual values. These values are reported in the fifth column of Table 1. We find that Rand

and YouGov/Economist fared relatively well and Rasmussen and Gallup did very poorly in

tracking public opinion. When bias exists in a poll, however, scholars in the TSE tradition

recommend mean squared error (MSE) as an alternative accuracy measure. MSE consists

of squared bias plus variance. When we use MSE, Gallup performs worst, closely followed

by Rasmussen. On the other hand, Rand and YouGov/Economist are tied for the top spot.

Both measures indicate that the Rand and YouGov/Economist polls are the best performers


overall while Rasmussen and Gallup are the worst.

Figure 2 about here
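
To spell out the decomposition used here, a firm's MSE in the measure A is the square of its mean (its bias) plus its variance; the small sketch below uses toy numbers (not the paper's data) to show the calculation:

    import numpy as np

    def decompose_mse(A_values):
        # Decompose a firm's mean squared error in the measure A into squared
        # bias plus variance, as in the total survey error decomposition.
        A = np.asarray(A_values, dtype=float)
        bias = A.mean()                      # systematic pro-Romney (or pro-Obama) shift
        variance = A.var()                   # fluctuation around the firm's own bias
        mse = bias ** 2 + variance           # identical to np.mean(A ** 2)
        return bias, variance, mse

    # Hypothetical firm that overstates Romney's odds on average:
    bias, var, mse = decompose_mse([0.02, 0.14, 0.03, 0.13, 0.08, 0.01, 0.15])
    share_from_bias = bias ** 2 / mse        # about 0.67 for this toy series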

On the whole, internet surveys fared very well in the 2012 presidential election: the

three best performing polls, Rand, YouGov/Economist, and Ipsos/Reuters each draw online

samples. The average absolute difference of the three internet polls is 0.032, which is much

smaller than 0.074, the average for the six interviewer-assisted phone surveys and 0.084 for

the two IVR polls. This finding contrasts sharply with Panagopoulos’ (2009) analysis of poll

accuracy in the 2008 presidential election, in which IVR polls fared best and internet polls

worst. In an analysis of the 2004 presidential election, Traugott (2005, p. 645) shows that

Harris Interactive’s online poll was quite inaccurate, especially when compared to a phone

survey from the same polling organization in the same period. Our contradictory results for

2012 should invite further research on internet polls: the question of whether or not they

will continue to outperform phone surveys in the future remains unresolved. The success of

internet polls in 2012 should also renew interest in non-probability polling, which was widely

condemned in the aftermath of the 1936 election’s Literary Digest poll and has fallen into

disfavor in much of the traditional polling world. New research (Wang, Rothschild, Goel, and

Gelman 2013) shows that non-probability polls based on samples drawn from X-Box video

gamers can be as accurate as traditional phone-based surveys with the joint use of multi-level

modeling and careful post-stratification. In the era of extremely high non-response rates and

increasing numbers of young people without landline phones, the probability poll may not

be as random as it should be. However, we refrain from definitive conclusions about the

success of internet polling and its future.

4.3 Determinants of Survey Error

The fact that the final polls in 2012 were biased in favor of Romney has become common

knowledge (Panagopoulos 2013; Silver 2012a; Blumenthal 2013). We find that non-final


polls are also biased in favor of Romney. Table 1 shows that almost all of the 243 polls

in our sample have positive values for A. That is, Romney’s popularity was consistently

overestimated for the last two months of the campaign period. We find that non-final polls

were slightly more biased than final ones: the average size of bias for the eleven polling

organizations we examine is 0.064 in terms of A. Table 2 shows that the average size of bias

for the twenty-nine final polls is 0.057 and, for our subsample of nine, 0.056.

More importantly, we find that the size of a poll’s bias is noticeably larger than its

standard deviation, excluding IBD/TIPP and Rand (see Table 1). On average, bias accounts

for 65% of the MSE of the eleven polls. Thus, we confirm the expectation, from the total

survey error approach, that house bias has a larger effect on poll accuracy than sampling

error across nationwide polls.

Table 1 about here

We also believe that even the remaining 35% of MSE which comes from variance is not

significantly determined by sampling error. This is because the actual standard deviation

does not decrease with larger sample sizes (Table 1). Rather, in our example, sample size

has a positive correlation of 0.58 with the actual standard deviation. That is, a poll with a

large sample size is more likely to have larger variance. Thus, the variance of polls seems to

be largely dominated by non-sampling errors, just as TSE research suggests.

Accordingly, polling accuracy or the precision of a poll is not related to sample size, casting

doubt on approaches that use a poll of polls or the pooling method. We observe a link between

larger sample size and decreased accuracy. The correlation between the average sample size

of a polling organization and its average absolute difference in the measure A in Table 1 is 0.5 (N=11).

The more respondents a polling organization has, the larger its standard deviation. Even for

our 243 individual polls, sample size is still positively related to polling inaccuracy (Table

3).

To put this finding into context, consider evidence from the 2012 CCES. That study,


which was conducted by YouGov (separately from its work with The Economist) from Octo-

ber 31 to November 3, 2012, included 36,472 respondents. In the CCES, 49% of respondents

said that they intended to vote for Obama and 47% for Romney. This result is exactly the

same as a YouGov/Economist poll conducted from November 3 to November 5 with just 740

respondents. As there were no major political or economic shocks between October 31 and

November 5, the polls should show the same results - and indeed, they do. Thus, it seems that

very large sample sizes do not dramatically increase polling accuracy: YouGov/Economist

regularly used a sample of about 800 respondents but performed extremely well throughout

the 2012 election.

Table 2 about here

Table 3 about here

4.4 Poll Manipulation: Herding and Calibrating

As an election nears, the differences between polls tend to decrease. This phenomenon,

often called “herding,” involves polling organizations adjusting their estimates or methodol-

ogy at the end of the campaign so that their predictions are closer to those made by other

firms (Silver 2012a).13 Linzer (2012) shows that herding appears to have occurred in the

2012 election and that herding around the wrong numbers could lead to polling industry

bias. We find evidence that herding did occur in 2012, especially for polling organizations

with inaccurate pre-election polls. We also suggest that the term ‘calibration’ – rather than

herding – offers a more precise conceptual description of contemporary poll manipulation.

13 Two possible explanations for the improved polling accuracy or the convergence at the end of the campaign are increased sample sizes and more stable voting intentions at the end of the campaign. Polling organizations often increase their sample size at the end of the campaign to provide more precise results. However, no polling organization other than UPI/CVOTER substantially increased its sample size. We have already shown that an increased sample size is unlikely to increase the accuracy of a poll. If convergence among polls occurs because of more stable public opinion at the end of the campaign, the convergence should occur more gradually. However, Figure 1 shows that the convergence was sudden and restricted to the last ten days of polling.

Our analysis of eleven polls shows that some organizations engaged in herding behavior

at the end of the 2012 election, but not all firms appear to have adjusted their results. The

average absolute difference between non-final polls and the actual election result is .064, but

the difference between final polls and the election result is .056, just .008 less. Thus, the

extent of industry-wide herding behavior in 2012 appears slight.

However, we suspect that some polling organizations were more likely than others to

manipulate their estimates and herd toward their competitors. ABC/Post’s final poll has

an A value of 0.018, which puts the organization in fifth place among the 29 final polls for

predictive accuracy. However, the average absolute difference of its eleven non-final polls is

0.071, which is about four times larger. Of the eleven polls we examine across the election,

ABC/Post’s performance ranks fifth from the worst overall. The predictive accuracy of

DailyKos/SEIU/PPP is ranked 10th among final polls with a 0.039 difference. However,

the difference for non-final polls conducted by DailyKos/SEIU/PPP is 0.063. Both of these

pollsters appear to have improved dramatically in their final polls.

Gallup also appears to have participated in herding behavior in 2012. On October 10th,

Gallup officially announced a change in their polling method for “theoretical reasons” (New-

port 2012b). However, even after the official methodology change, the firm reported a

dramatic 4 point drop in Romney’s lead during the last week of the campaign, moving its es-

timates much closer in line with other polls. We believe there is no event that would justify a

sudden drop in support. Gallup’s executives (Newport 2012a) insisted that Hurricane Sandy

boosted support for the incumbent and weakened Romney’s lead, but research suggests that

natural disasters should decrease the incumbent candidate’s popularity (Achen and Bartels

2012). Indeed, our DFM-based estimate of actual public opinion shows that support for

Obama slightly slipped in the last week, perhaps due to Sandy.

Figure 3 about here

We represent herding behavior among these three polling firms graphically in Figure 3.


The figure also strongly suggests that UPI/CVOTER intentionally manipulated its results,

though probably somewhat earlier than the other three polling firms.

We find that herding behavior was only adopted by a subset of major firms: the four

polling organizations that fared very poorly during the full campaign period (Figure 3).

In sharp contrast, none of the six best-performing polling firms show any signs of herd-

ing. Though almost all polling groups overestimated public support for Romney in 2012,

worse performing polls were especially likely to overstate Romney’s support, while better

performing polls were relatively less likely to do the same. If herding occurs only because

of polling firms’ desire to report results that are in line with others, worse performing and

better performing firms should have herded to the average of all estimates. Although the

term “herding” simply connotes group behavior, in this application herding is most likely to

lead pollsters with extreme results to move to the center. That is, more accurate polls should

have biased their estimated support in Romney’s favor, while less accurate polls should have

decreased support for Romney. However, only less accurate polls decreased support for

Romney as the election approached.


This suggests that poorly performing pollsters probably knew that they were overesti-

mating support for Romney and that more accurate pollsters were more confident that their

poll estimates were close to actual public opinion. As a result, more accurate pollsters may

have declined to adjust their models. This could not have occurred if polls had taken cues


just from other national polls – which were collectively biased in favor of Romney. For this

reason, we believe that the polling organizations calibrated their results according to cues

from other information sources, including electoral prediction betting markets, poll aggre-

gators, published criticism of Gallup’s likely voter model, state-level polls showing Obama’s

solid lead in most battleground states, and other informal sources of information. In major

online prediction markets, Romney’s chances were much lower than those reported by polls.

In the aftermath of the first presidential debate, Romney first attained majority support in

nationwide polls, but he did not actually take the lead in most statewide polls (Silver 2012a).

Surprisingly, many of the state-level polls were conducted by the same polling organizations

which conducted national polls - those that told a completely different story. Therefore, we

believe that polling firms had strong incentives to resolve these conflicting results by taking

cues from other sources of information.

We prefer to use the term ‘calibration’ to describe this kind of behavior, which is not

simply an automatic process of herding toward the average of other published poll results.

While herding is a form of calibration behavior, our evidence suggests that polling firms

adjusted their results in a more nuanced and strategic way.

5 Conclusion

Pre-election polls are more than simply predictive tools; the numbers they provide shape

political debate, galvanize supporters, and inspire political behavior. But these polls are

subject to survey error and intentional manipulation, each of which threatens to undermine

polls’ democratic functions. In this paper, we show that poll accuracy should be assessed

throughout a campaign, not just at its end. Using dynamic factor analysis, we estimated

true presidential candidate support in the two months before an election and identified

which polls accurately represented that support. We distinguished between sampling error


and other kinds of survey error, demonstrating the limited usefulness of ever-larger samples.

We build upon previous scholarship by providing a better way of estimating true support

and by showing how non-sampling error affects poll results. We also argue that polling firms

are more strategic actors than accounts of herding behavior imply. Instead, a subset of less

successful polling organizations appear to have calibrated their results to conform with a

variety of additional information sources beyond rival polls.

Our findings have implications for both practitioners and political scientists. First, poll

accuracy should not be assessed using only final polls, and assessment strategies like smooth-

ing and aggregating polls by week are potentially misleading approaches. Assessing accuracy

throughout the length of an election campaign is the best way to evaluate the performance

of election forecasters. Second, we provide evidence that non-probability based internet polls

outperformed traditional RDD polls for the first time in recent presidential elections. Future

research should more directly contrast the approaches of internet pollsters; sample matching

and post-stratification could introduce a variety of new kinds of errors beyond those identi-

fied in the total survey error framework. Finally, we showed that increasing sample size is

an ineffective way of improving poll accuracy; indeed, larger samples were associated with

worse performance. As polling firms prepare for upcoming electoral contests, we hope that

they will provide more accurate forecasts than they did in 2012.


References

[1] Achen, Christopher H., and Larry M. Bartels. 2012. “Blind Retrospection: Why Shark Attacks are Bad for Democracy.” Vanderbilt University working paper, retrieved from https://my.vanderbilt.edu/larrybartels/files/2011/12/CSDI_WP_05-2013.pdf

[2] Arzheimer, Kai, and Jocelyn Evans. 2014. “A New Multinomial Accuracy Measure for Polling Bias.” Political Analysis 22.1: 31-44.

[3] Assael, Henry, and John Keon. 1982. “Nonsampling vs. Sampling Errors in Survey Research.” The Journal of Marketing: 114-123.

[4] Bai, Jushan, and Serena Ng. 2002. “Determining the Number of Factors in Approximate Factor Models.” Econometrica 70:191-221.

[5] Beland, Yves, and Martin St-Pierre. 2008. “Mode Effects in the Canadian Community Health Survey: A Comparison of CATI and CAPI.” In Advances in Telephone Survey Methodology, eds. James Lepkowski, Clyde Tucker, Michael Brick, Edith de Leeuw, Lilli Japec, Paul Lavrakas, Michael Link, and Roberta Sangster. New York: Wiley, 297-314.

[6] Berinsky, Adam J. 2004. Silent Voices: Public Opinion and Political Participation in America. Princeton, NJ: Princeton University Press.

[7] Biemer, Paul. 2010. “Total Survey Error: Design, Implementation, and Evaluation.” Public Opinion Quarterly 74(5): 817-848.

[8] Brehm, John. 1993. The Phantom Respondents. University of Michigan Press.

[9] Blumenthal, Mark. 2012. “Race Matters: Why Gallup Poll Finds Less Support For President Obama.” The Huffington Post, Jun. 7. Retrieved from http://www.huffingtonpost.com/2012/06/17/gallup-poll-race-barack-obama_n_1589937.html

[10] Blumenthal, Mark. 2013. “Gallup Presidential Poll: How Did Brand-Name Firm Blow Election.” The Huffington Post, Mar. 8. Retrieved from http://www.huffingtonpost.com/2013/03/08/gallup-presidential-poll_n_2806361.html

[11] Chang, Yoosoon, J. Isaac Miller, and Joon Y. Park. 2009. “Extracting a Common Stochastic Trend: Theory with Some Applications.” Journal of Econometrics 150.2: 231-247.

[12] Crespi, Irving. 1988. Pre-Election Polling: Sources of Accuracy and Error. New York: Russell Sage Foundation.

[13] DeSart, Jay and Thomas Holbrook. 2003. “Campaigns, Polls, and the States: Assessing the Accuracy of Statewide Presidential Trial-Heat Polls.” Political Science Quarterly 56:431-39.


[14] Erikson, Robert S., and Lee Sigelman. 1995. “Poll-Based Forecast of Midterm Congressional Election Outcomes: Do the Pollsters Get It Right?” Public Opinion Quarterly 59:589-605.

[15] Erikson, Robert S., and Christopher Wlezien. 2008. “Leading Economic Indicators, the Polls, and the Presidential Vote.” PS: Political Science & Politics 41.4: 703-707.

[16] Gelman, Andrew and Gary King. 1993. “Why Are American Presidential Campaign Polls so Variable When Votes are so Predictable?” British Journal of Political Science 23.4: 409-451.

[17] Groves, Robert M., and Lars Lyberg. 2010. “Total Survey Error: Past, Present, and Future.” Public Opinion Quarterly 74.5: 849-879.

[18] Groves, Robert M., and Nancy A. Mathiowetz. 1984. “Computer Assisted Telephone Interviewing: Effects on Interviewers and Respondents.” Public Opinion Quarterly 48.1B: 356-369.

[19] Groves, Robert M., and Lou J. Magilavy. 1986. “Measuring and Explaining Interviewer Effects in Centralized Telephone Surveys.” Public Opinion Quarterly 50.2: 251-266.

[20] Hamilton, James D. 1989. “A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle.” Econometrica 57: 357-384.

[21] Holbrook, Thomas. 1996. Do Campaigns Matter? Thousand Oaks, CA: Sage Publishing.

[22] Jackman, Simon. 2005. “Pooling the Polls Over an Election Campaign.” Australian Journal of Political Science 40:499-517.

[23] Kish, Leslie. 1962. “Studies of Interviewer Variance for Attitudinal Variables.” Journal of the American Statistical Association 57.297: 92-115.

[24] Ladd, E.C. 1996. “The Election Polls: An American Waterloo.” Chronicle of Higher Education, Nov. 22, A52.

[25] Lau, Richard R. 1994. “An Analysis of The Accuracy of ‘Trial Heat’ Polls During the 1992 Presidential Election.” Public Opinion Quarterly 58:2-20.

[26] Linzer, Drew. 2012. “Pollsters May Be Herding.” Votamatic, Nov 5. Retrieved from http://votamatic.org/pollsters-may-be-herding/

[27] Linzer, Drew A. 2013. “Dynamic Bayesian Forecasting of Presidential Elections in the States.” Journal of the American Statistical Association 108.501: 124-134.

[28] Marcellino, Massimiliano, James H. Stock, and Mark W. Watson. 2003. “Macroeconomic Forecasting in the Euro Area: Country-specific versus Euro-wide Information.” European Economic Review 47.1: 1-18.


[29] Martin, Elizabeth A., Michael W. Traugott, and Courtney Kennedy. 2005. “A Review and Proposal for a New Measure of Poll Accuracy.” Public Opinion Quarterly 69:342-369.

[30] Mitofsky, Warren J. 1998. “Review: Was 1996 a Worse Year for Polls Than 1948?” Public Opinion Quarterly 62:230-249.

[31] Moench, Emanuel, and Serena Ng. 2011. “A Hierarchical Factor Analysis of US Housing Market Dynamics.” The Econometrics Journal 14.1: C1-C24.

[32] Moore, David W. and Lydia Saad. 1997. “The Generic Ballot in Midterm Congressional Elections: Its Accuracy and Relationship To House Seats.” Public Opinion Quarterly 61:603-614.

[33] Mosteller, Frederick, Herbert Hyman, Philip J. McCarthy, Eli S. Marks, and David B. Truman. 1949. The Pre-Election Polls of 1948: Report to the Committee on Analysis of Pre-Election Polls and Forecasts. New York: The Social Science Research Council.

[34] Newport, Frank. 2012a. “Polling, Likely Voters, and the Law of the Commons.” Gallup Inc. Retrieved from http://pollingmatters.gallup.com/2012/11/polling-likely-voters-and-law-of-commons.html

[35] Newport, Frank. 2012b. “Survey Methods, Complex and Ever Evolving.” Gallup Inc. Retrieved from http://pollingmatters.gallup.com/2012/10/survey-methods-complex-and-ever-evolving.html

[36] Panagopoulos, Costas. 2009. “Polls and Elections: Preelection Poll Accuracy in the 2008 General Elections.” Presidential Studies Quarterly 39:896-907.

[37] Panagopoulos, Costas. 2013. “Poll Accuracy in the 2012 Presidential Election Final Report.” Fordham University Department of Political Science. Retrieved from http://www.fordham.edu/images/academics/graduate_schools/gsas/elections_and_campaign_/poll_accuracy_2012_presidential_election_updated_1530pm_110712_2.pdf

[38] Pickup, Mark and Richard Johnston. 2008. “Campaign Trial Heats as Election Forecasts: Measurement Error and Bias in 2004 Presidential Campaign Polls.” International Journal of Forecasting 24: 272-284.

[39] Pickup, Mark, J. Scott Matthews, Will Jennings, Robert Ford, and Stephen D. Fisher. 2011. “Why Did the Polls Overestimate Liberal Democrat Support? Sources of Polling Error in the 2010 British General Election.” Journal of Elections, Public Opinion and Parties 21.2: 179-209.

[40] Real Clear Politics. 2012. “RealClearPolitics Poll Averages.” Retrieved from http://www.realclearpolitics.com/polls/

[41] Rothschild, David and Justin Wolfers. 2012. “Forecasting Elections: Voter Intentions versus Expectations.” Working Paper, November 1.

[42] Silver, Nate. 2012a. "Which Polls Fared Best (and Worst) in the 2012 Presidential Race." The New York Times' FiveThirtyEight. Retrieved from http://fivethirtyeight.blogs.nytimes.com/2012/11/10/which-polls-fared-best-and-worst-in-the-2012-presidential-race/

[43] Silver, Nate. 2012b. "Gallup vs. the World." The New York Times' FiveThirtyEight. Retrieved from http://fivethirtyeight.blogs.nytimes.com/2012/10/18/gallup-vs-the-world/

[44] Silver, Nate. 2012c. "Last 10 Presidential Elections Show No Consistent Bias in Polls." The New York Times' FiveThirtyEight. Retrieved from http://www.nytimes.com/2012/10/01/us/last-10-presidential-elections-show-no-consistent-bias-in-polls.html?pagewanted=all

[45] Silver, Nate. 2012d. "Measuring a Convention Bounce." The New York Times' FiveThirtyEight. Retrieved from http://fivethirtyeight.blogs.nytimes.com/2012/08/29/measuring-a-convention-bounce/

[46] Stock, James and Mark Watson. 1988. "Testing for Common Trends." Journal of the American Statistical Association 83.404: 1097-1107.

[47] Stock, James and Mark Watson. 2002. "Macroeconomic Forecasting Using Diffusion Indexes." Journal of Business and Economic Statistics 20:147-162.

[48] Stock, James H. and Mark W. Watson. 2010. "The Evolution of National and Regional Factors in U.S. Housing Construction." In Volatility and Time Series Econometrics: Essays in Honor of Robert F. Engle, eds. Tim Bollerslev, Jeffrey Russell and Mark Watson. Oxford: Oxford University Press.

[49] Stock, James and Mark Watson. 2011. "Dynamic Factor Models." In Oxford Handbook of Economic Forecasting, Michael P. Clements and David F. Hendry (eds). Oxford: Oxford University Press.

[50] Traugott, Michael W. 2001. "Assessing Poll Performance in the 2000 Campaign." Public Opinion Quarterly 65:389-419.

[51] Traugott, Michael W. 2005. "The Accuracy of the National Preelection Polls in the 2004 Presidential Election." Public Opinion Quarterly 69:642-654.

[52] Traugott, Michael W. and Christopher Wlezien. 2009. "The Dynamics of Poll Performance during the 2008 Presidential Nomination Contest." Public Opinion Quarterly 73:866-94.

[53] Traugott, Michael W. 2011. "The Accuracy of Opinion Polling and Its Relation to Its Future." In Robert Y. Shapiro and Lawrence R. Jacobs (eds), The Oxford Handbook of American Public Opinion and the Media. New York: Oxford University Press.

[54] Villanueva, Elmer V. 2001. "The Validity of Self-Reported Weight in US Adults: A Population Based Cross-Sectional Study." BMC Public Health 1.1: 11.

[55] Wang, Wei, David Rothschild, Sharad Goel, and Andrew Gelman. 2013. "Forecasting Elections with Non-Representative Polls." International Journal of Forecasting, forthcoming.

[56] Weisberg, Herbert F. 2005. The Total Survey Error Approach: A Guide to the New Science of Survey Research. Chicago: University of Chicago Press.

[57] Wlezien, Christopher. 2003. "Presidential Election Polls in 2000: A Study in Dynamics." Presidential Studies Quarterly 33:172-186.

[58] Wlezien, Christopher and Robert S. Erikson. 2002. "The Timeline of Presidential Campaigns." Journal of Politics 64: 969-993.

[59] Wlezien, Christopher and Robert S. Erikson. 2007. "The Horse Race: What Polls Reveal as the Election Campaign Unfolds." International Journal of Public Opinion Research 19: 74-88.

[60] Zaller, John. 2002. "The Statistical Power of Election Studies to Detect Media Exposure Effects in Political Campaigns." Electoral Studies 21.2: 297-329.

Table 1: The Performance of Major Polling Organizations in 2012

Polling Firm | Number of Polls | Average Sample Size | Bias | Std. Dev. | Abs. Error in A | MSE | Bias in MSE | Interview Mode
ABC/Post | 11 | 1240 | 0.069 | 0.043 | 0.071 | 0.0066 | 72% | 1
ARG | 8 | 1150 | 0.084 | 0.007 | 0.084 | 0.0071 | 99% | 1
DailyKos/SEIU/PPP (D) | 9 | 1240 | 0.063 | 0.014 | 0.063 | 0.0060 | 66% | 2
Gallup | 43 | 2900 | 0.082 | 0.075 | 0.087 | 0.0123 | 54% | 1
IBD/TIPP | 19 | 880 | 0.039 | 0.052 | 0.048 | 0.0042 | 36% | 1
Ipsos/Reuters | 43 | 1400 | 0.035 | 0.023 | 0.036 | 0.0018 | 70% | 3
Politico/GWU/Battleground | 7 | 1000 | 0.071 | 0.047 | 0.071 | 0.0073 | 70% | 1
Rand | 43 | 1000 | 0.003 | 0.037 | 0.029 | 0.0014 | 1% | 3
Rasmussen | 43 | 1500 | 0.105 | 0.029 | 0.105 | 0.0119 | 93% | 2
UPI/CVOTER | 7 | 1320 | 0.08 | 0.033 | 0.080 | 0.0075 | 85% | 1
YouGov/Economist | 10 | 800 | 0.03 | 0.022 | 0.030 | 0.0014 | 65% | 3

Note: Average Sample Size is rounded to the nearest 10. Interview Mode codes: 1 = phone with live interviewer, 2 = IVR (robo-call), 3 = internet.
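
For readers checking the table, the MSE and "Bias in MSE" columns appear consistent with the standard decomposition of mean squared error into squared bias plus variance, with the percentage column reporting the squared-bias share. This is our reading of the numbers rather than a definition stated here; the minimal sketch below simply recomputes the ABC/Post and Rand rows under that assumption.

```python
# Minimal check of the apparent decomposition behind Table 1 (an assumption, not the authors' code):
#   MSE ~= bias**2 + std_dev**2, and "Bias in MSE" ~= bias**2 / MSE.
rows = {
    "ABC/Post": (0.069, 0.043),  # (Bias, Std. Dev.)
    "Rand":     (0.003, 0.037),
}
for firm, (bias, sd) in rows.items():
    mse = bias**2 + sd**2
    bias_share = bias**2 / mse
    print(f"{firm}: MSE ~ {mse:.4f}, bias share ~ {bias_share:.0%}")
# Output matches the table within rounding:
# ABC/Post: MSE ~ 0.0066, bias share ~ 72%
# Rand: MSE ~ 0.0014, bias share ~ 1%
```

The same arithmetic reproduces most other rows (for example Gallup: 0.082^2 + 0.075^2 = 0.0123, with a 54% bias share), which is why we describe non-sampling error, rather than sampling variance, as the dominant component for most firms.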

Table 2: Pre-Election Poll Accuracy Across All Final Polls

Polling Firm | Final Poll Dates | Sample Size | Obama | Romney | O-R | Error | Abs. Error | A | Abs. A | Rank | Population | Interview Mode | Cell Phone
ABC/Post | 11/1-11/4 | 563 | 50 | 47 | 3 | -0.9 | 0.9 | 0.018 | 0.018 | 5 | LV/RV | Phone | Y
Angus-Reid | 11/1-11/3 | 590 | 51 | 48 | 3 | -0.9 | 0.9 | 0.019 | 0.019 | 6 | LV/RV | Phone | Y
AP-GfK | 10/19-10/23 | 740 | 45 | 47 | -2 | -5.9 | 5.9 | 0.123 | 0.123 | 28 | LV/RV | Phone | Y
ARG | 11/2-11/4 | 1475 | 49 | 49 | 0 | -3.9 | 3.9 | 0.079 | 0.079 | 17 | LV | Phone | Y
CBS/NYT | 10/25-10/28 | 4725 | 48 | 47 | 1 | -2.9 | 2.9 | 0.058 | 0.058 | 13 | LV/RV | Phone | Y
Clarus | 10/4-10/4 | 800 | 46 | 47 | -1 | -4.9 | 4.9 | 0.101 | 0.101 | 27 | LV/RV | Phone | Y
CNN | 11/2-11/4 | 1000 | 49 | 49 | 0 | -3.9 | 3.9 | 0.079 | 0.079 | 17 | LV/RV | Phone | Y
DailyKos/SEIU/PPP (D) | 11/1-11/4 | 1128 | 50 | 48 | 2 | -1.9 | 1.9 | 0.039 | 0.039 | 10 | LV/RV | Automated | N
Democracy Corps (D) | 11/1-11/4 | 2345 | 49 | 45 | 4 | 0.1 | 0.1 | -0.006 | 0.006 | 1 | LV | Phone | Y
FOX | 10/28-10/30 | 713 | 46 | 46 | 0 | -3.9 | 3.9 | 0.079 | 0.079 | 16 | LV/RV | Phone | Y
Gallup | 11/1-11/4 | 1500 | 48 | 49 | -1 | -4.9 | 4.9 | 0.1 | 0.1 | 24 | LV/RV | Phone | Y
Gravis Marketing | 11/3-11/5 | 2709 | 48 | 48 | 0 | -3.9 | 3.9 | 0.079 | 0.079 | 17 | LV | Automated | N/A
IBD/TIPP | 11/3-11/5 | 1417 | 50 | 49 | 1 | -2.9 | 2.9 | 0.059 | 0.059 | 15 | LV | Phone | Y
Ipsos/Reuters | 11/1-11/5 | 1300 | 48 | 46 | 2 | -1.9 | 1.9 | 0.037 | 0.037 | 8 | RV | Phone | N/A
JZ Analytics/Newsmax | 11/3-11/5 | 1000 | 47 | 47 | 0 | -3.9 | 3.9 | 0.079 | 0.079 | 17 | LV | Internet | N/A
Monmouth | 11/1-11/4 | 1200 | 48 | 48 | 0 | -3.9 | 3.9 | 0.079 | 0.079 | 17 | LV | Automated | Y
National Journal | 10/25-10/28 | 693 | 50 | 45 | 5 | 0.1 | 0.1 | -0.026 | 0.026 | 7 | LV | Phone | Y
NBC/WSJ | 11/1-11/3 | 712 | 48 | 47 | 1 | -2.9 | 2.9 | 0.058 | 0.058 | 13 | LV/RV/A | Phone | Y
NPR | 10/23-10/25 | 3000 | 47 | 48 | -1 | -4.9 | 4.9 | 0.1 | 0.1 | 26 | LV | Phone | Y
Pew | 10/31-11/3 | 839 | 48 | 45 | 3 | -0.9 | 0.9 | 0.015 | 0.015 | 3 | LV/RV | Phone | Y
Politico/GWU/Battleground | 11/4-11/5 | 1000 | 47 | 47 | 0 | -3.9 | 3.9 | 0.079 | 0.079 | 17 | LV | Phone | Y
PPP (D) | 11/2-11/4 | 2551 | 50 | 48 | 2 | -1.9 | 1.9 | 0.039 | 0.039 | 10 | LV/RV | Automated | N
Purple Strategies | 10/31-11/1 | 872 | 47 | 46 | 1 | -2.9 | 2.9 | 0.058 | 0.058 | 12 | LV | Automated | Y
Rand | 10/30-11/5 | 1019 | 49.5 | 46.2 | 3.3 | -0.6 | 0.6 | 0.01 | 0.01 | 2 | LV/RV | Internet | N/A
Rasmussen | 11/3-11/5 | 1023 | 48 | 49 | -1 | -4.9 | 4.9 | 0.1 | 0.1 | 24 | LV | Automated | N
UConn/Hartford Courant | 10/11-10/16 | 1200 | 48 | 45 | 3 | -0.9 | 0.9 | 0.015 | 0.015 | 3 | LV | Phone | Y
UPI/CVOTER | 11/3-11/5 | 1041 | 49 | 48 | 1 | -2.9 | 2.9 | 0.059 | 0.059 | 14 | LV | Phone | N/A
Wash. Times/JZ Analytics | 10/29-10/31 | N/A | 49 | 49 | 0 | -3.9 | 3.9 | 0.079 | 0.079 | 17 | LV | Phone | N/A
YouGov/Economist | 11/3-11/5 | 1080 | 49 | 47 | 2 | -1.9 | 1.9 | 0.038 | 0.038 | 9 | LV/RV | Internet | N/A
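
The Error and A columns of Table 2 are not re-derived in this section, but they can be reproduced row by row. The sketch below is our reconstruction, assuming a final national result of Obama 51.1 to Romney 47.2 and a log-ratio accuracy measure on the Romney/Obama scale (the scale used on the vertical axis of Figures 1 and 3, in the spirit of the Martin, Traugott, and Kennedy measure cited in the references); it should be read as illustrative, not as the authors' code.

```python
import math

# Our reading of Table 2's derived columns (an assumption, checked against two rows):
#   Error = (Obama - Romney) - 3.9, where 3.9 is the assumed final Obama-Romney margin (51.1 - 47.2)
#   A     = ln[(Romney_poll / Obama_poll) / (Romney_actual / Obama_actual)]
ACTUAL_OBAMA, ACTUAL_ROMNEY = 51.1, 47.2

def table2_columns(obama, romney):
    error = (obama - romney) - (ACTUAL_OBAMA - ACTUAL_ROMNEY)
    a = math.log((romney / obama) / (ACTUAL_ROMNEY / ACTUAL_OBAMA))
    return round(error, 1), round(a, 3)

print(table2_columns(50, 47))  # ABC/Post final poll        -> (-0.9, 0.018)
print(table2_columns(49, 45))  # Democracy Corps (D) poll   -> (0.1, -0.006)
```

Under this reading, a positive A indicates that the poll overstated Romney relative to Obama, and the Rank column orders firms by the absolute value of A.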

Table 3: Absolute Errors and Sample Size, Closeness to Election

Variable | Model 1 | Model 2 | Model 3
Sample Size | .027 (.011)** | – | .023 (.011)**
Closeness to Election | – | -.459 (.155)*** | .139 (.192)
Constant | .037 (.014)*** | .077 (.006)*** | .033 (.014)**
Adjusted R2 | 0.05 | 0.03 | 0.05
N | 150 | 243 | 150

Note: Robust regression results. Coefficients for Sample Size and Closeness to Election are multiplied by 1,000 for ease of reading. ** significant at the p = .05 level; *** significant at the p = .01 level.
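
Because the reported coefficients are scaled by 1,000, Model 1 can be read as a predicted increase of about .027 in a poll's absolute error for every additional 1,000 respondents; the positive sign means that, in this specification, larger samples go with larger errors rather than smaller ones. The snippet below is a hypothetical illustration of that reading, using only the intercept and sample-size term from Model 1 and ignoring the other covariates.

```python
# Illustrative reading of Model 1 in Table 3 (a sketch, not the estimation code).
COEF_PER_1000 = 0.027  # change in absolute error per additional 1,000 respondents
INTERCEPT = 0.037

def predicted_abs_error(sample_size):
    # Linear prediction implied by Model 1; for interpretation only.
    return INTERCEPT + COEF_PER_1000 * (sample_size / 1000)

print(predicted_abs_error(800))   # ~0.059 for an 800-person poll
print(predicted_abs_error(2900))  # ~0.115 for a 2,900-person poll
```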

Figure 1. Poll Dynamics
[Figure: poll series from 09/01 to 11/04 for Ipsos, Rasmussen, Rand, IBD, ABC, ARG, PPP, Politico, UPI, YouGov, and Gallup, plotted against Actual Support. Vertical axis: Vote Share: Romney/Obama, ranging from 0.85 to 1.2.]

Figure 2. Poll Accuracy
[Figure: poll series from 09/01 to 11/04 for Gallup, Ipsos, Rasmussen, Rand, IBD, ABC, ARG, PPP, Politico, UPI, and YouGov. Vertical axis: Accuracy measure: A, ranging from -0.1 to 0.3.]

Figure 3. Poll Herding
[Figure: poll series from 09/01 to 11/04 for Gallup, ABC, PPP, and UPI, plotted against Actual Support. Vertical axis: Vote Share: Romney/Obama, ranging from 0.85 to 1.2.]
