forecasting elections using compartmental models of infectionscasting more transparent, we propose a...

19
Forecasting elections using compartmental models of infections Alexandria Volkening a , Daniel F. Linder b , Mason A. Porter c , and Grzegorz A. Rempala d a Mathematical Biosciences Institute, The Ohio State University, Columbus, OH 43210 ([email protected]) b Medical College of Georgia, Division of Biostatistics and Data Science, Augusta University, Augusta, Georgia 30912 ([email protected]) c Department of Mathematics, University of California, Los Angeles, Los Angeles, California 90095 ([email protected]) d Division of Biostatistics, College of Public Health, The Ohio State University, Columbus, OH 43210 ([email protected]) September 9, 2019 Abstract To forecast political elections, popular pollsters gather polls and combine information from them with fundamen- tal data such as historical trends, the national economy, and incumbency. This process is complicated, and it includes many subjective choices (e.g., when identifying likely voters, estimating turnout, and quantifying other sources of uncertainty), leading to forecasts that differ between sources even when they use the same underlying polling data. With the goal of shedding light on election forecasts (using the United States as an example), we develop a framework for forecasting elections from the perspective of dynamical systems. Through a simple approach that borrows ideas from epidemiology, we show how to combine a compartmental model of disease spreading with public polling data to forecast gubernatorial, senatorial, and presidential elections at the state level. Our results for the 2012 and 2016 U.S. races are largely in agreement with those of popular pollsters, and we use our new model to explore how sub- jective choices about uncertainty impact results. Our goals are to open up new avenues for improving how elections are forecast, to increase understanding of the results that are reported by popular news sources, and to illustrate a fascinating example of data-driven forecasting using dynamical systems. We conclude by forecasting the senatorial and gubernatorial races in the 2018 U.S. midterm elections of 6 November. Keywords U.S. elections | mathematical modeling | politics | contagions | forecasting | compartmental models | complex systems Significance Forecasting political elections — a challenging, high-stakes problem — is characterized by uncertainty, incomplete information, subjectivity, and media attention. We introduce a mathematical framework, based on compart- mental models of infections, for forecasting elections in the United States at the level of states. Our model borrows ideas from the field of epidemiology, and we base its parameters for how states influence each other on public polling data. Surprisingly, while this simple model is designed to describe how diseases spread, it agrees well with more complicated algorithms in the popular press. Although the spread of contagions differs from voting dynamics, our work suggests a new approach to study and elucidate how elections in different states are related. It also illustrates how accounting for uncertainty in different ways impacts election forecasts, providing an illuminating example of data-driven forecasting using dynamical systems. Introduction Despite what was largely viewed as an unexpected outcome for the 2016 Unites States presidential election, recent work [44] indicates that national polling data is not becoming less accurate. Election forecasting is a complicated, multi- 1 arXiv:1811.01831v1 [physics.soc-ph] 5 Nov 2018

Upload: others

Post on 04-Dec-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Forecasting elections using compartmental models of infectionscasting more transparent, we propose a mathematical model of the evolution of political opinions in U.S. elections. We

Forecasting elections using compartmental models of infections

Alexandria Volkeninga, Daniel F. Linderb, Mason A. Porterc, and Grzegorz A. Rempalad

aMathematical Biosciences Institute, The Ohio State University, Columbus, OH 43210([email protected])

bMedical College of Georgia, Division of Biostatistics and Data Science, Augusta University, Augusta,Georgia 30912 ([email protected])

cDepartment of Mathematics, University of California, Los Angeles, Los Angeles, California 90095([email protected])

dDivision of Biostatistics, College of Public Health, The Ohio State University, Columbus, OH 43210([email protected])

September 9, 2019

AbstractTo forecast political elections, popular pollsters gather polls and combine information from them with fundamen-

tal data such as historical trends, the national economy, and incumbency. This process is complicated, and it includesmany subjective choices (e.g., when identifying likely voters, estimating turnout, and quantifying other sources ofuncertainty), leading to forecasts that differ between sources even when they use the same underlying polling data.With the goal of shedding light on election forecasts (using the United States as an example), we develop a frameworkfor forecasting elections from the perspective of dynamical systems. Through a simple approach that borrows ideasfrom epidemiology, we show how to combine a compartmental model of disease spreading with public polling datato forecast gubernatorial, senatorial, and presidential elections at the state level. Our results for the 2012 and 2016U.S. races are largely in agreement with those of popular pollsters, and we use our new model to explore how sub-jective choices about uncertainty impact results. Our goals are to open up new avenues for improving how electionsare forecast, to increase understanding of the results that are reported by popular news sources, and to illustrate afascinating example of data-driven forecasting using dynamical systems. We conclude by forecasting the senatorialand gubernatorial races in the 2018 U.S. midterm elections of 6 November.

Keywords U.S. elections | mathematical modeling | politics | contagions | forecasting | compartmental models |complex systems

Significance Forecasting political elections — a challenging, high-stakes problem — is characterized by uncertainty,incomplete information, subjectivity, and media attention. We introduce a mathematical framework, based on compart-mental models of infections, for forecasting elections in the United States at the level of states. Our model borrows ideasfrom the field of epidemiology, and we base its parameters for how states influence each other on public polling data.Surprisingly, while this simple model is designed to describe how diseases spread, it agrees well with more complicatedalgorithms in the popular press. Although the spread of contagions differs from voting dynamics, our work suggests anew approach to study and elucidate how elections in different states are related. It also illustrates how accounting foruncertainty in different ways impacts election forecasts, providing an illuminating example of data-driven forecastingusing dynamical systems.

Introduction

Despite what was largely viewed as an unexpected outcome for the 2016 Unites States presidential election, recentwork [44] indicates that national polling data is not becoming less accurate. Election forecasting is a complicated, multi-

1

arX

iv:1

811.

0183

1v1

[ph

ysic

s.so

c-ph

] 5

Nov

201

8

Page 2: Forecasting elections using compartmental models of infectionscasting more transparent, we propose a mathematical model of the evolution of political opinions in U.S. elections. We

step process, and it often comes across as a blackbox. It involves polling members of the public, identifying likely voters,adjusting poll results to account for demographics, reporting poll outcomes, and accounting for fundamental data (suchas historical trends and the economy). The presence of correlations between states with shared demographics furthercomplicates the picture and adds to the challenge of forecasting elections. The result is a high-stakes, high-interestproblem that is characterized by uncertainty, incomplete information, and subjective choices. In this paper, we developa new forecasting method based on dynamical systems and the mathematical theory of disease spreading, and we use itto attempt to unpack and shed mathematical light on U.S. election forecasting.

There are three primary approaches for forecasting elections: fundamentals-based, poll-based, and blended strate-gies. The academic literature is dominated by algorithms that are built on fundamental data; these statistical and eco-nomic models (e.g., [32, 42, 45, 50]) combine different variables — including state-level economic indicators, approvalratings, and incumbency — to estimate election results. Although some methods [50] blend both polls and fundamentaldata, others [32,42,45] disregard information about voter intentions entirely. The work of Hummel and Rothschild [42]is an example of accurate fundamentals-based election forecasting at the state level that does not use any polling data.

At the other end of the spectrum are popular pollsters, newsletters, and major media websites. These includeNate Silver’s FiveThirtyEight.com [1], The Huffington Post [2], Sabato’s Crystal Ball [3], The Los Angeles Times [4],270toWin.com [5], the Cook Political Report [6], and the New York Times Upshot [7]. These mainstream sources offervarying levels of detail about their techniques. Among the group of popular pollsters who use polling data to forecastelections, there seems to be a distinction between two types: those who aggregate publicly available polls from a rangeof sources, and those who gather their own in-house polls. The USC Dornsife/Los Angeles Times Daybreak poll [4]falls into the latter category and correctly called the 2016 U.S. presidential race for Donald Trump.

FiveThirtyEight.com [1] and The Huffington Post [2] arguably lead the popular field in making their algorithmslargely publicly available. Their methods are built on poll aggregating [8] and involve a combination of poll-based andblended strategies. The backbone of The Huffington Post’s forecast is a Bayesian Kalman-filter model that averagespublic polls; it incorporates fundamental data as model priors and uses it to correlate state outcomes [43]. FiveThir-tyEight.com, by contrast, is known for its pollster ratings: Nate Silver’s algorithm weights polls more heavily fromsources that it judges as more trustworthy and accurate [58]. After adjusting polls to account for factors such as re-cency, poll sample, and convention bounce, FiveThirtyEight.com uses state demographics (e.g., religion, ethnicity, andeducation) to correlate random outcomes, such that similar states are more likely to behave similarly [58].

Accounting for interactions between states is crucial for producing reliable forecasts, and Nate Silver [58] hasstressed the importance of correlating polling errors by state demographics. The Huffington Post [43] relies on statecorrelations based on voting history. Both of these methods, which one can view as indirectly incorporating relationshipsbetween states through noise, have a key characteristic in common: they rely on state similarity, which is inherentlyan undirected quantity. In particular, if Ohio and Pennsylvania are viewed as similar by FiveThirtyEight.com, so arePennsylvania and Ohio. However, it is possible that states influence each other in directional ways. For example, votersin Ohio may more strongly influence the population in Pennsylvania than vice versa. How strongly states influenceeach other can be due to where candidates are campaigning, the people with whom voters in various states interact,what distant states are featured prominently in the news, which states most resonate with local voters, and so on.

With the goal of accounting for directional relationships between states and, more broadly, making election fore-casting more transparent, we propose a mathematical model of the evolution of political opinions in U.S. elections. Weconsider model simplicity a virtue, and our goal is to be as simple as possible and utilize a largely poll-based, poll-aggregating approach. Because we are interested in state relationships, we borrow ideas from the well-established fieldof epidemic modeling. Using a compartmental model of disease spreading, we treat Democrat and Republican votingintentions as contagions that spread between states. Our model is overly simplistic, but it allows us to show how onecan employ tools from mathematical modeling (including dynamical systems, uncertainty quantification, and networkanalysis) to help demystify the process of election forecasting. Quantifying uncertainty is important, and we exploredifferent means of including randomness. We conclude with a forecast of the 6 November 2018 U.S. gubernatorial andsenatorial elections.

2

Page 3: Forecasting elections using compartmental models of infectionscasting more transparent, we propose a mathematical model of the evolution of political opinions in U.S. elections. We

A C

B

D

Nevada

OhioPennsylvania

Undecided/OtherDemocrat Republican

Susceptible Infected�I

DemocratUndecided voterRepublican

Perc

ent D

emoc

rat v

oter

s in

Ohi

o

300 250 200 150 100 50 0Days to 2012 presidential election

38

40

42

44

46

48

50

52

54

Model trajectoryPolling dataPolls averaged by month

Safe Red states (Red super state)Safe Blue states (Blue super state)

Swing states

Figure 1: Overview of our model and approach. (A) A susceptible–infected–susceptible (SIS) compartmental modeltracks the number of susceptible (S(t)) and infected individuals (I(t)) in a community in time; these quantities evolveaccording to two factors: infection and recovery. We repurpose this approach by including two types of infections(Democrat and Republican voting inclinations), interpreting infection as adoption of a voting opinion, and replacingrecovery with turnover of committed voters to undecided. (B) Our compartmental model assumes that individuals in-teract within (black links) and between (thick green links) states. In this cartoon example, Ohio and Pennsylvania caninfluence each other, but voters in Nevada and Pennsylvania do not interact with each other. (C) Example model dy-namics: Democrat opinions in Ohio leading up to the 2012 presidential elections. This shows how our model trajectorycompares with polling data for an example state. Our trajectories are smooth because we average polling data by month(to obtain the 11 purple-asterisk data points that we show). Although we obtain all of our forecasts by simulating theevolution of opinions in the year leading up to each election, for the remainder of this paper, we focus on the result attime t = 0 days to the election (i.e., our final forecast on the eve of the election). (D) We are interested in forecastingat the level of states, but for elections with many state races, we simplify the problem by combining all reliable (safe)Republican and Democrat regions into two super states (Red and Blue, to use traditional party colors). In this schematic,we show the super states for presidential elections.

Background

Because we repurpose our approach to forecasting elections from epidemic modeling (for example, see refs. [35,38,40,46]), we begin with a brief introduction to these techniques before describing our model in the next section.

Compartmental Modeling of Infections

Compartmental models [35, 38, 40, 46–48] are one of the standard mathematical approaches for studying biologicalcontagions. They were developed initially for analyzing the spread of biological diseases such as influenza [37] andZika [60], but contagion models (often combined with network structure to model social connectivity) have also beenused for the study of social contagions, including the dynamics of memes [59] and rumors [61], the movement ofrioting waves across cities [34], and the adaption of scientific techniques [33]. This family of models is built on theidea that one can categorize individuals into a few distinct types (i.e., “compartments”). One then describes contagiondynamics using flux terms between the various compartments. For example, the well-known susceptible–infected–susceptible (SIS) compartmental model breaks a population into two classes. At a given time, an individual can beeither susceptible (S) or infected (I). As we illustrate in Fig. 1A, the fraction of susceptible and infected individuals ina community varies depending on two factors:• transmission: when an infected individual interacts with a susceptible individual, the susceptible person has some

chance of becoming infected; and• recovery: an infected person has some chance of recovering (and becoming susceptible again).

Recovery depends only on the number of people currently infected, but transmission depends on the number of connec-tions between susceptible and infected individuals.

Suppose that S(t) and I(t), respectively, are the fraction of susceptible and infected individuals in a well-mixed

3

Page 4: Forecasting elections using compartmental models of infectionscasting more transparent, we propose a mathematical model of the evolution of political opinions in U.S. elections. We

population at time t. One then describes the spread of infection by the following set of ordinary differential equations:

dS

dt= γI︸︷︷︸

recovery with rate γ

− βSI︸︷︷︸infection with rate β

, (1)

dI

dt= −γI + βSI , (2)

where S(t) + I(t) = 1 and the two parameters, γ and β, correspond to the rates of disease transmission and recovery,respectively. (For details, see Materials and Methods.) The SIS model has been extended to account for more realisticdetails, such as multiple diseases/contagions, communities, and contact structure between individuals or subpopulations[31, 39, 49, 53, 54]. Mathematically, this entails adjusting the system [1–2] to include additional terms and equations.The commonality of a compartmental approach to contagion dynamics, whether one employs a simplistic example oran intricate one, is that one divides a population into categories, and there are mechanisms for individuals to changecategories.

Results

We now construct a model of elections in the form of a compartmental model.

Our Election Model

Our model of election dynamics is a two-pronged SIS compartmental model (see Fig. 1A). First, we reinterpret “sus-ceptible” individuals as undecided (or independent or minor-party) voters. Because most U.S. elections are dominatedby two parties, we then consider two contagions: Democrat and Republican voting intentions. We are interested inforecasting elections at the level of states, so we track these quantities within each state. In particular, let

Si = fraction of undecided voters in the ith state ,

IiD = fraction of Democrat voters in the ith state ,

IiR = fraction of Republican voters in the ith state ,

where IiD + IiR + Si = 1. We account for four behaviors:• Democrat transmission: undecided voters can become Democrat due to interactions with Democrats;• Republican transmission: similarly, undecided voters can become Republican due to interactions with Republi-

cans;• Democrat turnover: an infected person has some chance of changing their mind to undecided (this amounts to

“recovering”); and• Republican turnover: an infected person has some chance of becoming undecided (i.e., recovering).

It is important to note that, while contagion language does not necessarily apply to social dynamics [52], we use suchlanguage to describe voters in this paper. These terms highlight that our model is not a specialized election-forecastingmodel, as part of our goal is to highlight how a simple framework can provide meaningful forecasts of high-dimensionalsystems. We thus expect similar ideas to provide insight into forecasting in many complex systems.

By extending the traditional SIS model [1–2] to account for two contagions and M states or super states (see

4

Page 5: Forecasting elections using compartmental models of infectionscasting more transparent, we propose a mathematical model of the evolution of political opinions in U.S. elections. We

Fig. 1D), we obtain the following system of coupled ordinary differential equations:

dIiDdt

(t) = −γiDIiD︸ ︷︷ ︸Dem. loss

+M∑j=1

βijRN j

NSiIjD︸ ︷︷ ︸

Dem. infection

, (3)

dIiRdt

(t) = −γiRIiR︸ ︷︷ ︸Rep. loss

+M∑j=1

βijRN j

NSiIjR︸ ︷︷ ︸

Rep. infection

, (4)

dSi

dt(t) = γIDI

iD + γiRI

iR −

M∑j=1

βijDN j

NSiIjD −

M∑j=1

βijRN j

NSiIjR , (5)

where N is the total number of likely voters in the U.S.; N j is the number of likely voters in state j; and γiD and γiRdescribe the rates of committed Democrats and Republicans, respectively, converting to the undecided compartment.Similarly, βijD and βijR correspond, respectively, to the transmission (i.e., influence) rates from Democrat and Republicanvoters in state j to undecided individuals in state i. We obtain our parameters by fitting to a year of state polling data(averaged by month to remove small-scale fluctuations; see Fig. 1C) from HuffPost Pollster [9] and RealClearPolitics[10]; see Materials and Methods for details. We obtain measurements of the number of voting age individuals per statefrom the Federal Register [28–30], and we make the simplifying assumption that all states have the same voter turnout,so this parameter cancels out of our model. Our parameters are different for each election, as we use the public pollsand population data specific to each election for fitting.

The β parameters provide a means of modeling directed relationships between states, and we interpret transmissioninteractions broadly. While opinion persuasion (i.e., transmission) can occur through communication between unde-cided and committed voters [55], we expect that it can also occur through candidate campaigning, news coverage, andtelevised debates. We hypothesize that these venues serve as an indirect means for voters in one state to influenceanother. For example, if news coverage of Republican campaigning in Pennsylvania resonates with undecided votersin Ohio, this may provide an indirect route of opinion transmission from Pennsylvania to Ohio and increase βOH,PA

R .Therefore, we consider large βijR to mean that Republicans in state j strongly influence undecided voters in state i, andsuch “strong influence” may be due either to conversations between voters or due to indirect effects like state–stateaffinity influenced (or even activated) by media.

After we fit our parameters to polling data for a given election, we use our model [3–5] to simulate the dailyevolution of political opinions from the preceding January through election day, using the earliest available pollingdata to specify our initial conditions (see Materials and Methods). If any undecided voters remain at the end of oursimulation, we make the naive choice to assume that they vote for minor-party candidates or simply do not vote. Bycontrast, FiveThirtyEight.com [58] assigns most voters who remain undecided on election day to the major parties andHuffington Post [43] incorporates undecided voters into their measurements of uncertainty.

Election Simulations

We now use our model [3–5] to simulate past races for governor, senate, and president positions. Because realisticforecasts should incorporate uncertainty, we follow this exploration of past races with a short study of the impact ofnoise on our 2016 presidential forecast. To do this, we introduce a stochastic differential equation (SDE) version ofour model, and we conclude our election simulations by using this probabilistic model to forecast the gubernatorial andsenatorial midterm races on 6 November 2018.

2012 and 2016 Election Forecasts

By fitting our model parameters to polling data for senatorial, gubernatorial, and presidential races in 2012 and 2016without incorporating the final election results, we can simulate forecasts for these elections as if we made them on theeve of the respective election days. In Fig. 2, we summarize our forecast results for these races.

5

Page 6: Forecasting elections using compartmental models of infectionscasting more transparent, we propose a mathematical model of the evolution of political opinions in U.S. elections. We

2012 Gubernatorial Elections

+50 +30 +10 0 +10 +30 +50

WI

WA

VT

UT

ND

NC

NH

MT

MO

IN

+30 +20 +10 0 +10 +20 +30WIPAOHNCNHNVMOLAINILFLAZ

Safe BlueSafe Red

2016 Senate Elections: 34 Seats Contested2016 Gubernatorial Elections

+60 +40 +20 0 +20 +40 +60WVWAVTUTORNDNCNHMTMOINDE

2016 Presidential Race: Trump (R) vs. Clinton (D)

+30 +20 +10 0 +10 +20 +30WIVAPAOHNCNHNVMNMIIAFLCO

Safe BlueSafe Red

2012 Senate Elections: 33 Seats Contested

+30 +20 +10 0 +10 +20 +30WIVAPAOHNDNVMTMOMAINFLCTAZ

Safe BlueSafe Red

2012 Presidential Race: Romney (R) vs. Obama (D)

+30 +20 +10 0 +10 +20 +30WIVAPAOHNCNHNVMNMIIAFLCO

Safe BlueSafe Red

A B C

D E F

Model (Rep. win)!Model (Dem. win)!Results (Rep. win)!Results (Dem. win)

Figure 2: Simulations of Eqns. [3–5]. We calculate our forecasts for 2012 and 2016 using polling data up until electionday. They do not include any election results, and they should be interpreted as forecasts from the night before anelection day. Comparison of our 2012 and 2016 (A, D) governor, (B, E) senate, and (C, F) president election forecasts(dark red and blue bars) with election results (light red and blue bars). The horizontal axis shows the percentage-pointlead by Democrats (blue) or Republicans (red); shorter bars represent closer elections, and bars that extend to the right(respectively, left) correspond to Republican (respectively, Democrat) leads. We highlight the state results that weforecast incorrectly by italicizing and bolding the corresponding state names. As we mentioned in the main text, “SafeRed” and “Safe Blue” are conglomerate “super states” that are composed, respectively, of reliably Republican-votingand Democrat-voting states. (We assemble the super states based on pollster opinions and historical data; see Materialsand Methods for details.) In panel (B), the Safe Red super state consists of MS, NE, TN, TX, UT, and WY; the SafeBlue super state consists of CA, DE, HI, MD, MI, MN, NJ, NM, NY, RI, WA, and WV. In panel (E), the Safe Red superstate consists of AL, AK, AR, GA, ID, IA, KS, KY, ND, OK, SC, SD, and UT; the Safe Blue super state consists of CO,CT, HI, MD, NY, OR, VT, and WA. In panels (C, F), see Fig. 1D for the super states used for presidential elections.

It is insightful to compare our forecast to the final forecasts from popular pollsters. As we show in Table 1, our modelhas a similar success rate at calling party outcomes at the state level as FiveThirtyEight.com [1] and Sabato’s CrystalBall [3]. Our mean accuracy for presidential elections in 2012 and 2016 is 94.1%, while FiveThirtyEight.com achieves amean success rate of 95.1% and Sabato’s Crystal Ball follows slightly behind at 93.1%. To our knowledge, gubernatorialforecasts and 2012 senatorial forecasts are not currently available from FiveThirtyEight.com, so we compare our modelperformance for these races with Sabato’s Crystal Ball. Across the states for which forecasts are available from bothour model and Sabato, we forecast 92.3% of 65 state senatorial races correctly; Sabato’s Crystal Ball achieves a successrate of 93.8%. Similarly, our forecasts have a mean accuracy of 90.5% for the 21 gubernatorial races in 2012 and 2016,for which Sabato’s predictions average 81.0% success.

Taken together, Fig. 2 and Table 1 highlight two related goals in election forecasting: (1) forecasting the vote shareby state (e.g., the percents of the state vote received by Democrat and Republican candidates), and (2) calling thefinal winning party by state (i.e., which party’s candidate wins the election in a given state). Many popular pollsters,including the Cook Political Report [6] and Sabato’s Crystal Ball [3], focus on the second goal, whereas our model andFiveThirtyEight.com’s algorithm [1] provides a means of pursuing both goals. Media coverage and public attentionare skewed heavily toward the second goal, yet the first aim represents a more challenging problem with delicate, finer

6

Page 7: Forecasting elections using compartmental models of infectionscasting more transparent, we propose a mathematical model of the evolution of political opinions in U.S. elections. We

Election 538 Model Sabato

2016 President 90.2% 88.2% 90.2%2016 Senate 94.1% 88.2% 94.1%2016 Governor NA 91.7% 83.3%2012 President 100% 100% 96.1%2012 Senate NA 96.8% 93.5%2012 Governor NA 88.9% 77.8%

Table 1: Comparison of success rates for our model and popular sources. Note that senatorial and gubernatorialforecasts are not provided or calculated for all states that hold elections in a given year. We report success rates only forstates for which forecasts are available from both Sabato’s Crystal Ball [3] and our model.

details.

Accounting for and Interpreting Uncertainty

Election forecasting involves not only calling a race for a specific party and forecasting vote shares, but also specifyingthe likelihood of different outcomes. This raises a third goal of pollsters that is easy for forecast readers to overlook.This goal entails (3) quantifying uncertainty (e.g., estimating the chance that a specific candidate has of winning anelection). Quantifying and interpreting uncertainty is important, and we suggest that this is one of the key places wheremathematical techniques can contribute most successfully to election forecasting. We explore randomness by reframingour model [3–5] as a system of stochastic differential equations:

dIiD(t) =

−γiDIiD +M∑j=1

βijRN j

NSiIjD

︸ ︷︷ ︸

deterministic dynamics in Eqn. [3]

dt+ σdW iD(t)︸ ︷︷ ︸

uncertainty

, (6)

dIiR(t) =

−γiRIiR +M∑j=1

βijRN j

NSiIjR

dt+ σdW iR(t) , (7)

dSi(t) =

γIDIiD + γiRIiR −

M∑j=1

βijDN j

NSiIjD

(8)

M∑j=1

βijRN j

NSiIjR

dt+ σdW iS(t) ,

where, with a slight abuse of notation, we now consider IiD, IiR, and Si to be stochastic processes and let W i

R andW iS be Wiener processes. We selected the noise strength σ to roughly match our 80% confidence intervals to those of

FiveThirtyEight.com [11].FiveThirtyEight.com [58] has highlighted several sources of uncertainty, but for concreteness and simplicity, we

focus on accounting for errors based on shared demographics. In particular, although U.S. president, senate, andgovernor elections are decided at the level of states, polling errors are correlated in regions with similar populations.Thus, if a pollster is wrong in Minnesota, they may be systematically off in states (such as Wisconsin) with sharedfeatures [58]. This type of error makes it possible for polls on a bloc of similar states to all be wrong together, leadingto an unforeseen upset.

To shed light on how uncertainty effects both election results and forecasts, we explore a few simple approaches forincluding randomness in the form of additive white noise. In particular, we compare the impact of uncorrelated noisewith the effect of additive noise correlated on a few sample demographics; specifically, we consider the fractions ofBlack, Hispanic, and college-educated individuals in a population. (See Materials and Methods for details). We corre-

7

Page 8: Forecasting elections using compartmental models of infectionscasting more transparent, we propose a mathematical model of the evolution of political opinions in U.S. elections. We

late on these demographics simply because these data are readily available; future work should incorporate additionaldemographic data.

We account for uncertainty by tweaking the voter populations in each state at every time step by a small randomnumber; when noise is uncorrelated, we pick these small numbers independently. By contrast, correlating noise impliesthat the numbers are related in similar states; for example, we can randomly tweak a set of states with a similar feature(e.g., high Hispanic population) in the same direction (e.g., up one percentage point for one candidate) at one time.Simulating a large number (e.g., we use 10,000) elections then results in a distribution of possible outcomes and allowsus to quantify our forecast uncertainty. To illustrate the impact of these different means of accounting for uncertainty,we show the distributions that result from these methods for the 2016 U.S. presidential election in Fig. 3.

A B

0 50 100 150 200 250 300 350 400 450 500Trump's electoral votes with correlated noise

0

0.005

0.01

0.015

0.02

0.025

0.03

0.035

0.04

0.045

0.05

Prob

abilit

y

0 50 100 150 200 250 300 350 400 450 500Trump's electoral votes with uncorrelated noise

0

0.005

0.01

0.015

0.02

0.025

0.03

0.035

0.04

0.045

0.05

Prob

abilit

y

0 50 100 150 200 250 300 350 400 450 500Trump's electoral votes

50 100 150 200 250 300 350 400 450 500Trump's electoral votes

Trump wins Chance: 4.56% Trump wins

Chance: 20.69%

Figure 3: Impact of incorporating uncertainty in different ways, as demonstrated by our model simulations for the 2016U.S. presidential election. (A) Uncorrelated additive white noise gives Trump less than a 5% chance of winning theelectoral college, whereas (B) correlating noise based on state demographics increases his chance to about 21%. Notethat these simulations are based on Eqns. [6–8] with σ = 0.0015; and we simulate 10,000 elections in each case. (SeeMaterials and Methods for additional details about our correlated noise.)

Our initial analysis illustrates how accounting for uncertainty in these different ways strongly influences electionforecasts, echoing points raised by Silver and his team [58]. In Fig. 3, we demonstrate that uncorrelated noise, which canmodel uncertainty in a single state or a poll without assuming a larger systematic (e.g., country-wide) polling problem,results in a very low likelihood of a Republican win in 2016. By contrast, correlating outcomes by a few demographics,which can model systematic polling errors (e.g., due to misidentifying likely voters) in similar states, increases DonaldTrump’s chances by a factor of about four. This agrees with Silver’s comment [58] that failing to account for correlatederror tends to result in underestimations of a trailing candidate’s chances.

Note that we are not attempting to measure or account directly for errors in polling data. Instead, we take the simpleapproach of assuming that we can incorporate all sources of uncertainty as an additive noise term in our model [6–8].There has been extensive work on quantifying uncertainty (see [51]), and exploring alternative, more subtle ways ofmeasuring and accounting for uncertainty is an important future direction for research on forecasting complex systems.

2018 Senate and Governor Forecasts

Because our model has similar success as the popular pollsters in its forecasts of the 2012 and 2016 U.S. elections, wenow use it to forecast the upcoming 2018 senatorial and gubernatorial races. For the remainder of the paper, note thatwe are simulating the SDE version of our model [6–8] that accounts for uncertainty by correlating noise on education,ethnicity, and race demographics within states. With one exception (see Figures 5B–C), we base these forecasts on10,000 runs of this model. This allows us to comment on all three of the goals that we mentioned earlier: forecastingstate vote shares (see Fig. 4), calling the winning party for each state (see Figures 5B–C), and providing estimates forthe likelihood of these and alternative outcomes (see Fig. 5).

As we show in Fig. 4, our forecast Republican and Democrat vote shares are reasonably similar to those of FiveThir-tyEight.com [12]. In particular, their mean results fall within the range that includes 80% of our simulated elections.

8

Page 9: Forecasting elections using compartmental models of infectionscasting more transparent, we propose a mathematical model of the evolution of political opinions in U.S. elections. We

2018 Senate Forecast: 35 Seats Contested

+30 +20 +10 0 +10 +20

WI

WV

TX

TN

OH

ND

NJ

NV

MT

MO

MN special

IN

FL

AZ

Safe Blue

Safe Red

Model (Rep. win)!Model (Dem.)!538* (Rep. win)!538* (Dem. win)!80% of results

Forecasted percent margin +30

Figure 4: Our forecasts (dark red and blue) based on polling data [10] through 24 October 2018 compared to those ofFiveThirtyEight.com [12] (light red and blue) on *30 October 2018 for the 2018 senatorial races (the FiveThirtyEightforecasts shown are from their “classic” model and were obtained on 30 October at 1:06 PM Eastern Time). Thisplot shows the forecasted percent lead by the Democrat (bars extending to the left) or Republican (bars extendingto the right) candidate. The yellow bars indicate the region in which 80% or our simulated results fall. “Safe Red”describes the mean forecast behavior of the MS, MS special, NV, UT, and WY races; and “Safe Blue” gives the meanforecast behavior of CT, DE, HI, ME, MD, MA, MI, MN, NM, NY, PA, RI, VT, VA, and WA. (We categorize thesestates as safe based on the forecasts of Sabato’s Crystal Ball [13], 270toWin.com [14], the New York Times [15], andRealClearPolitics.com [10] on 28 August 2018.)

We differ most strikingly in our forecasts for Florida and Tennessee: FiveThirtyEight.com [12] lists Florida as leaningDemocrat, but our model (based on data through 24 October 2018) sees this race as a toss-up with a weak Republicanlean (see Fig. 5B). Moreover, the most recent forecast by our model (based on data through 3 November 2018) placesFlorida just over the line into the “Lean Republican” category. Tennessee, by contrast, is a likely red state according toFiveThirtyEight.com, but we view it as a toss-up state with a slightly higher chance to go blue. The result is that bothFiveThirtyEight.com ’s classic forecast [12] (accessed on 1 November 2018) and our model project the same expectedtotal outcome: a 52/48 Republican/Democrat senatorial breakdown after the midterm elections.

In Figures 6 and 7, we show our forecasts for the gubernatorial elections that will occur on 6 November 2018, andwe compare them with those of several popular pollsters. We highlight that both our model and FiveThirtyEight.com[17] identify a relatively large number of “toss-up” states, with nearly 50–50 chances for Republican or Democratgovernors, despite the fact that both approaches use polling data that extends until just a few days before election day.As we show in Fig. 7, we differ most with the popular pollsters in our categorization of Iowa and Ohio. AlthoughFiveThirtyEight.com [17], Sabato’s Crystal Ball [18], the Cook Political Report [19], and Nathan Gonzales’ InsideElections [20] all report Iowa and Ohio as toss-up states (or “Tilt Democrat”, a weaker tendency than “Lean Democrat”,for Iowa in the case of Inside Elections), our model forecasts that both of these states lean Democrat. This is particularlystriking in Ohio, where we assign the Democrat candidate a 74.5% chance of winning. Of course, these forecast areprobabilistic by nature, and it is important to keep this in mind when interpreting our forecasts and those of popularpollsters.

9

Page 10: Forecasting elections using compartmental models of infectionscasting more transparent, we propose a mathematical model of the evolution of political opinions in U.S. elections. We

Republicans maintain control:!76.96% chance

Democrats !gain control:!

23.04% chance!

80% of simulated elections!fall in this range

Expected outcome!R: 52 seats (+1)!D: 48 seats (−1)

Senate seats by party

45 47 49 51 53 55 57 5955 53 51 49 47 45 43 41

R:!D:

3%

6%

9%

12%

15%

Perc

ent c

hanc

e fo

r thi

s ou

tcom

e

0%61 6339 37

Solid Rep. (≧ 95%)Likely Rep. (≧ 75%)Lean Rep. (≧ 60%)Toss-Up ( 60%)Lean Dem. (≧ 60%)Likely Dem. (≧ 75%)Solid Dem. (≧ 95%)

A

C

Hawaii

*

*

Wisconsin

West Virginia

Texas

Tennessee

Ohio

North Dakota

New Jersey

Nevada

Montana

Missouri

Minnesota special

Indiana

Florida

Arizona

Sabato!updated Oct. 11

538!updated Nov. 4

Model!updated Oct. 24

59.0%! !

66.7%! !

70.0%! !

91.4%! !

63.2%! !

87.6%! !

53.5%! !

92.0%! !

75.5%! !

96.2%! !

80.7%! !

78.0%! !

86.8%!!

97.7%

62.5%! !

57.9%! !

73.5%! !

95.3%! !

58.0%! !

83.3%! !

51.3%! !

75.6%! !

92.0%! !

97.4%! !

55.8%! !

87.8%! !

96.9%!!

96.0%

B Model*!updated Nov. 3

64.0%! !

61.7%! !

76.3%! !

94.6%! !

56.3%! !

82.7%! !

54.8%! !

71.4%! !

87.6%! !

99.1%! !

56.7%! !

87.9%! !

94.1%!!

95.3%

Cook!updated Oct. 26

Figure 5: Forecast for 2018 senatorial races: probability of red (Republican) and blue (Democrat) outcomes. (A)Forecast result for Senate composition based on our model (on 24 October 2018) using noise correlated on educa-tion, ethnicity, and race demographics to account for uncertainty. Our model suggests that the most likely event is aDemocrat win in the Senate, most of our simulations lead to Republican-dominated outcomes, and we forecast a 52/48Republican/Democrat split on average. (B) Categorization of states as solid, likely, lean, or toss-up. It is importantto note that our most recent model forecast for the senate races using polling data [10] through 3 November 2018 —the rightmost column* — is based on 4,000 simulated elections instead of the usual 10,000 and uses a time step forparameter fitting that is five times larger than we use in our other forecasts. (We only categorize states that we have notalready designated as “Safe Red” or “Safe Blue”; see the caption of Fig. 4 for details.) Both our model and the approachof FiveThirtyEight.com [12] provide precise numbers to quantify uncertainty, whereas Sabato’s Crystal Ball [13] andthe Cook Political Report [16] do not report such values. For our model and the approach of FiveThirtyEight.com, thenumber that we show indicates the chance of a Democrat winning the respective state’s senatorial race if it is in a bluebox and the chance of Republican winning if it is in a red box. For toss-up states (gray), the number that we showis red (respectively, blue) if it corresponds to a Republican’s (respectively, Democrat’s) chance of winning. FiveThir-tyEight.com’s values are based on the “classic” version of their model last updated at 1:19 PM Eastern Time on 4November 2018. (C) Map of the state ratings by party as forecasted by our model using data through 3 November (notethat this forecast uses fewer simulated elections than our other model forecasts; see Materials and Methods for details).

Conclusions and Discussion

We developed a simple method for forecasting elections using compartmental models from epidemiology, and weillustrated the promise of such a dynamical-systems approach by applying it to the U.S. races for president, senate,

10

Page 11: Forecasting elections using compartmental models of infectionscasting more transparent, we propose a mathematical model of the evolution of political opinions in U.S. elections. We

2018 Gubernatorial Forecast: Swing States

+20 +10 0 +10 +20

WI

SD

OR

OK

OH

NV

ME

KS

IA

GA

FL

CT

AK

Forecasted percent margin

Model (Rep. win)!Model (Dem.)!538* (Rep. win)!538* (Dem. win)!80% of results

Figure 6: Our model forecasts (dark red and blue) based on polling data collected from RealClearPolitics.com [10]through 3 November 2018 versus those of FiveThirtyEight.com [17] (light red and blue) on 4 November 2018. (Themargins that we reproduced from their website are from the version that was last updated at 4:50 PM Eastern Timeon 4 November 2018.) We agree on the overall party result for each state, except Nevada, which both approaches seeas having a close governor race. The yellow scale bars indicate the region with 80% of our simulated elections. Tocompare this result with FiveThirtyEight.com’s 80% span of likely outcomes, see their website [17]. In this plot, weshow the expected margins for swing states only; our model forecasts that the mean margin in the Safe Red super state(i.e., AL, AZ, AR, ID, MD, MA, NE, NH, SC, TN, TX, VT, and WY) will be +18.5 points for Republican candidatesand that the mean margin in the Safe Blue super state (i.e., CA, CO, HI, IL, MI, MN, NM, NY, PA, and RI) will be+15.6 points for Democrats.

and governor in 2012 and 2016. In our modeling, we persistently chose the simplest, most naive option to avoidincorporating further complexity into our forecasting methodology. Although our model is simple and the spread ofcontagions differs from voting dynamics, we were able to achieve similar success rates as popular pollsters for thesepast elections. This heartening result suggests that our novel approach to forecasting elections is a promising avenuefor future research, and we offered our forecast of the 2018 senatorial and gubernatorial elections as a test of our model.We consider simplicity to be a virtue in this first model, as part of our goal is to help demystify the election forecastingprocess, highlight future research directions on the forecasting of elections (and other complex systems), and motivatecommunity members to engage more actively with pollster interpretations and raise further questions themselves.

There are numerous ways to build further on our basic modeling approach. For example, it will be useful to be morecareful about how to handle undecided and minor-party voters. In our model, we tracked the fraction of the populationthat is undecided, Republican, and Democrat within each state. This raises the question of how to interpret the presenceof undecided (i.e., susceptible) voters on the day of an election. FiveThirtyEight.com [58] assigns a voting opinion(mostly to one of the major parties) to any undecided voters who remain on election day. By contrast, we assumed thatall undecided voters are either minor-party voters or simply do not vote. The Huffington Post factors undecided votersinto the uncertainty in an election year [43]. Using the fraction of undecided voters to inform our choice of how muchuncertainty to include in our model is an interesting and important future direction to pursue.

We made the simplifying assumption that all polls are equally accurate (e.g., we did not consider the time to electionand pollster-reported error), and we did not distinguish between partisan and non-partisan polls or between polls of likelyvoters, registered voters, and adults. By contrast, FiveThirtyEight.com [58] relies on careful measures [21] of polling-firm accuracy to weight polls in different ways, and they adjust polls of registered voters and adults to frame all of theirdata in terms of likely voters. Using FiveThirtyEight.com’s pollster ratings [21] to weight polls in our model wouldallow us to comment more broadly on how various subjective choices made by pollsters effect their final forecasts.Similarly, future work can compare the influence of noise correlated based on demographics with the effects of noise

11

Page 12: Forecasting elections using compartmental models of infectionscasting more transparent, we propose a mathematical model of the evolution of political opinions in U.S. elections. We

2018 Gubernatorial Forecast BA

Alaska

Hawaii

Wisconsin

South Dakota

Oregon

Oklahoma

Ohio

Nevada

Maine

Kansas

Iowa

Georgia

Florida

Connecticut

Alaska

Cook!updated Oct. 26

Sabato!updated Oct. 25

538!updated Nov. 4

Model!updated Nov. 3

89.9%! !

95.5%! !

53.5%! !

52.8%! !

67.9%! !

52.5%! !

82.7%! !

54.0%! !

74.5%! !

55.5%! !

60.2%! !

60.2%! !

68.3%!

Inside Elections!updated Nov. 1

Tilt! !! !Tilt! !Tilt!!Tilt !!!!Tilt !!Tilt !!! ! !!!Tilt! !Tilt !

70.2%! !

78.6%! !

75.7%! !

59.2%! !

52.1%! !

58.2%! !

94.4%! !

55.0%! !

55.2%! !

86.2%! !

81.4%! !

83.1%! !

60.6%!

Solid Rep. (≧ 95%)Likely Rep. (≧ 75%)Lean Rep. (≧ 60%)

Toss-Up ( 60%)Lean Dem. (≧ 60%)Likely Dem. (≧ 75%)Solid Dem. (≧ 95%)

Figure 7: 2018 governor forecasts. (A) Illustration of state gubernatorial ratings based on our model using data fromRealClearPolitics.com [10] through 3 November 2018. (On the map, we show only states that are holding gubernatorialelections.) (B) Comparison of our categorizations of states as solid, likely, lean, or toss-up with those of popularpollsters. (Note that we only categorize states that we have not already assigned to the Safe Red or Safe Blue superstates; see the caption of Fig. 6 for details.) The forecasts that we reproduce from FiveThirtyEight.com [17] are fortheir “classic” version, which was last updated at 1:19 PM Eastern Time on 4 November 2018. We obtained theforecasts that we reproduce from Sabato’s Crystal Ball [18], the Cook Political Report [19], and Nathan Gonzales’Inside Elections [20] from their websites on 4 November. They were last updated online on 26 October, 26 October,and 1 November, respectively. Inside Elections [20] further breaks down their state ratings to include the “Tilt” category,which appears between “Toss-Up” and “Lean”, so we also include these notes in our table.

correlated based on the last 80 years of state voting history (which is the technique used by The Huffington Post [43]).Rather than showing simulated timelines of political opinion evolution in the year leading up to an election, we

focused on presenting eve-of-election-day forecasts. Moreover, when fitting our parameters, we averaged poll results[9, 10] by month, producing a single polling data point per 30 days. This technique throws away daily fluctuations inopinions and smooths out interesting dynamics like “convention bounce”. (As described by FiveThirtyEight.com [58],candidates often receive a short spike in support after their party’s convention.) With a finer view on daily poll results,one can incorporate the effects of conventions or other time-specific events (e.g., a large rally or a new story abouta candidate in the media) into our model. Indeed, the field of dynamical systems provides an appropriate means forexploring the time evolution of political opinions, as well as their interplay with external forces — such as the media,rallies, or conventions — that may impact the system.

Lastly, our compartmental-model approach allowed us to obtain parameters related to the strength of interactionsbetween states (the β parameters in Eqns. [3–5]) and measurements of voter turnover by state (the γ parameters inEqns. [3–5]) for each election year and race. By comparing these parameters across various years and different typesof elections, one can help identify blocks of states that persistently influence each other, analyze which states have themost plastic voter populations, and suggest key differences in the political dynamics in presidential, senatorial, andgubernatorial races. More importantly, this and other future research directions can provide insight into how thesequantities evolve across years, allowing researchers to suggest ways that the U.S. electorate may be changing in timethat can potentially be useful for future forecasts.

12

Page 13: Forecasting elections using compartmental models of infectionscasting more transparent, we propose a mathematical model of the evolution of political opinions in U.S. elections. We

Materials and Methods

Data Sets

We obtained publicly-available polling data for the 2012 and 2016 elections from HuffPost Pollster [9] using theirPollster API v2 [22]. HuffPost Pollster [9] provides both state-level and national-level polls for these elections. Becauseour model tracks the evolution of political opinions within individual states, we chose to use only state-level pollingdata to fit our parameters. State-level polling data for the 2018 midterms is not currently available from HuffPostPollster [9], so we collected data for the 2018 governor and senate elections by hand from RealClearPolitics.com [10].For all simulated elections, we include only polls in the 330 days leading up to each election in our data set. In particular,for elections in November, we include polls between January and November. We use 2012, 2016, and 2017 estimatesof voting-age populations by state from the Federal Register [29, 30, 30] to specify N and N i in Eqns. [3–8]. (We use2017 in place of 2018 because this year’s data is not available yet.) To correlate noise in Eqns. [6–8], we used ethnicityand race demographic data from the U.S. Census Bureau [23] and data on state education levels presented in a study on247WallSt.com [24]. While these data sources are for 2016, we used them for our 2018 simulations as well.

Selecting “Safe Red” and “Safe Blue” Super States

To simplify our model and because polling data is often sparser for states that are viewed as having a reliable leantowards a specific party, we focus on forecasting the elections in swing states and treat all reliably Red and Bluestates together as two ”super state” conglomerates. (Note that we do not specify that these super states actually voteRepublican and Democrat, respectively; rather, that is an output of our model.) This raises the question of how toidentify states as “safe” or “swing”, and we do this differently for different elections. For the 2012 and 2016 presidentialraces, we define our swing states as those identified by FiveThirtyEight.com as “traditional swing states” [25]; these areCO, FL, IA, MI, MN, NV, NH, NC, OH, PA, VA, and WI (see Fig. 1D). Thus, M = 14 in Eqns. [3–5] for the 2012 and2016 presidential elections. Additionally, {S1, I1D, I

1R} and {S2, I2D, I

2R} refer to the voters in the Red and Blue super

states, respectively, whereas {Si, IiD, IiR} with i = 3, 4, ..., 14 are the voter fractions in the 12 swing states.To define our Safe Red and Safe Blue super states in senate and governor races, we average the forecasts of popular

pollsters. For the 2018 senate race, we combine the August 2018 ratings of Sabato’s Crystal Ball [13], 270toWin.com(the consensus version) [14], and the New York Times Upshot [15], leading to 14 swing states. Our 2018 governor racecategorizations are means of the ratings of FiveThirtyEight.com [17], the Cook Political Report [19], Sabato’s CrystalBall [18], and Nathan Gonzales’ Inside Elections [20]. We base our categorizations for senate races in 2012 on theratings of Sabato’s Crystal Ball [26] and the New York Times [27]. We obtain categorizations for our 2016 senate modelby averaging the ratings of 270toWin.com, Sabato’s Crystal Ball, and The Huffington Post. Because there were notmany governor races in 2012 and 2016, we treat every state separately for these elections.

Additional Modeling Details

The dynamics of recovery and transmission in the traditional susceptible–infected–susceptible (SIS) model are givenby the following ordinary differential equations:

dS

dt= γI − β[SI] ,

dI

dt= −γI + β[SI] ,

where S(t) and I(t), respectively, are the mean numbers of susceptible and infected individuals in a population at timet and [SI] is the mean number of connections between infected and susceptible individuals. This system is not closed,so we approximate [49] the number of connections as [SI] ≈ S ·n · I/N , where N is the total number of individuals ina population and n is the mean number of connections per person. If we define β = βn, we obtain the following closed

13

Page 14: Forecasting elections using compartmental models of infectionscasting more transparent, we propose a mathematical model of the evolution of political opinions in U.S. elections. We

system:

dS

dt= γI − βSI/N ,

dI

dt= −γI + βSI/N .

Writing this system in terms of the population fractions S(t) = S(t)/N and I(t) = I(t)/N and dividing by N yieldsEqns. [1–2].

A similar process yields our two-pronged SIS model [3–5] for election forecasting. The key difference is howwe define [SiIjD] and [SiIjR]. We approximate the number of connections between undecided voters in state i andDemocrats and Republicans, respectively, in state j as

[SiIjD] = Si · n ·N j/N · IjD/Nj ,

[SiIjR] = Si · n ·N j/N · IjR/Nj .

We thereby estimate the number of interactions between undecided voters in state i and Democrats in state j, forexample, as the mean number of interactions that involve an undecided voter in state i multiplied by the probabilitythe interaction is with someone in state j multiplied by the probability that someone in state j is a Democrat. Inmaking these approximations, we are assuming that an undecided voter is equally likely to interact with a Republicanor Democrat across the U.S. In particular, we do not assume that interactions are more likely between individuals inthe same state or between those in neighboring states. We assume that the number of interactions between individualsin different states depends only on the size of the states and on the number of Republicans, Democrats, and undecidedvoters currently therein.

For our investigation of correlated polling uncertainty in Figures 3B, 4, 5, and 7, we include correlated noise inEqns. [6–8]. We quantify the similarity of two states for a given population demographic using the Jaccard index:J = min(Di, Dj)/max(Di, Dj), where Di is the population fraction of the given demographic in state i and Dj

is the population fraction for that demographic in state j. The Jaccard index indicates the covariance for our Wienerprocesses W i

R and W iS . As we discussed earlier (see the subsection “Accounting for and Interpreting Uncertainty”), we

consider three sample demographics. In our calculations, JB denotes the Jaccard index calculated using the fractionof Black individuals in the state populations, JE is the Jaccard index for the fractions of individuals without a collegeeducation in the given two states, and JH is the Jaccard index based on the fractions of Hispanic individuals in a state.For our forecasts with correlated noise, we simulate our model [6–8] 10,000 times (with the exception of our 2018senate forecast based on data through 3 November 2018; this forecast is based on 4,000 simulated elections); for eachsimulated election, we select one Jaccard index uniformly at random among JB , JE , and JH to use as the covariancefor our noise.

Parameter Fitting

To fit model parameters for a given election, we first format the public polling data we have collected for the 330 daysleading up to that election. Because some states are polled much more frequently than others, we reduce the number ofpolling data points used to fit the model to 11 per state (or super state). This is done by binning the polls for each statein 30 day increments (extending backward from election day) and then averaging within these bins to arrive at 11 datapoints per state (or super state). If a given state has no polls for one of the 30 day increments, we approximate it bylinear interpolation. In many cases, there are no polls for a state in the early months of the year, and we account for thisby setting all missing early data points for that state to its earliest available polling result. We follow the same approachto arrive at 11 data points each for the Safe Red and Safe Blue super states. Note that we do not weight the polls used tocompute these average data points for super states by state sizes; this represents a place our model could be improved.

When computing the 11 data points for a state, we include all polling data available from HuffPost Pollster [22] (for2012 and 2016) and RealClearPolitics [10] (for 2018). We do not leave out partisan polls or weight better pollsters moreheavily (as does FiveThirtyEight.com [58]). Moreover, we do not treat polls differently in any way if they are polls oflikely voters, registered voters, or adults.

14

Page 15: Forecasting elections using compartmental models of infectionscasting more transparent, we propose a mathematical model of the evolution of political opinions in U.S. elections. We

The result of this process is 11 data points per state (or super state) that provide the percentage of Democrat,Republican, and undecided (or minor-party) votes within that state (e.g., 33 numbers in all). We use these compartment-wise numbers to fit the model in Eqns. [3–5] to the observed time-course polling data. To describe the fitting procedure,consider the vector consisting of the observed compartment-wise polling numbers at time ti, scaled by the state (orsuper state) total size and stacked in the “concentration” vector C(ti). For instance, for the presidential election, thevector C(ti) is the 3 × 14 = 42 dimensional vector holding all of the states’ (or super states’) compartment-specificempirical proportions. Further, consider the corresponding solution, cβ,γ(ti), to the the system in Eqns. [3–5], indexedby the unknown parameters (β,γ), with dependence on the initial condition suppressed. The parameter vector (β,γ)contains the various transmission and recovery parameters. The parameter estimates (β, γ) are found via the least-squares criteria; that is, (β, γ) = argmin

(β,γ)

∑i ‖C(ti) − cβ,γ(ti)‖22 . Consistency and weak convergence to Gaussian

laws of these estimators, given that the data are from a density dependent Markov jump process, has been shown in [57].

Numerical Implementation

We implement parameter fitting using R (version 3.4.2) [56]. Specifically, we use the OPTIM routine to perform con-strained optimization of the least-squares objective function, subject to non-negative rate constraints [36] with timestep ∆t = 0.1 month. This time step is used in all cases except for our 2018 senate forecast based on data through 3November 2018 (for this forecast we use ∆t = 0.5 month). We simulate our differential equations in MATLAB (VER-SION 9.3), THE MATHWORKS, INC., NATICK, MA, USA. We solve Eqns. [3–5] using a forward Euler scheme andEqns. [6–8] with the Euler–Maruyama method [41]. We utilize the fact that Si + IiR + IiD = 1 to reduce our model totwo equations per state (or super state) when solving it numerically. The time step in both cases is ∆t = 0.1 day, andwe simulate from January 1 up until election day. (As a simplification, we assume that each month has 30 days; thus,for example, we simulate the 2018 races for 306 days.) The noise strength that we use in Eqns. [6–8] is σ = 0.0015 forall stochastic simulations (namely, the results in Figures 3, 4, 5, and 7). (We choose this value so that our 80% rangesroughly have a similar length as those provided in presidential forecasts by FiveThirtyEight.com [11].)

Other Notes

Working with election data is occasionally messy, and we comment on a few special cases in our work.

(1) Polling data from HuffPost [9] was not available for the Delaware and West Virginia 2012 governor races, so wedo not provide forecasts for these states.

(2) In some cases, a state race features two candidates from the same party. (For example, California has twoDemocrats running for Senator in 2018.) Because our model assumes a race of a Democrat facing a Republican,we do not use polling data from states with a single-party race.

(3) In cases where an independent is running in place of a Democrat (e.g., the Vermont 2018 senatorial race), wetreat the independent as if they were Democrat for modeling and parameter fitting. (For the 2012 senate race inVermont, however, our simulations in Fig. 2B leave this state out.)

(4) We focus on polling data that compares two candidates. In particular, we do not include polls for which the dataare reported with a third candidate or races in which there are three candidates who each get reasonably largeshares of the vote. The polls that we found from RealClearPolitics.com [10] for New Mexico’s 2018 senate racewere for three candidates, so we do not include New Mexico’s polls in our averaged data points for the Safe Bluesuper state for this election. We also do not forecast the Maine 2012 senate race because it had three popularcandidates.

Acknowledgments The research of A.V. and D.F.L was supported by the Mathematical Biosciences Institute and theNSF under grant no. DMS-1440386.

15

Page 16: Forecasting elections using compartmental models of infectionscasting more transparent, we propose a mathematical model of the evolution of political opinions in U.S. elections. We

Contributions A.V., D.F.L., M.A.P., and G.A.R. designed the project; A.V. collected data; D.F.L. designed and wrotethe the algorithm for fitting parameters; A.V. and D.F.L. ran parameter estimates; A.V. carried out model simulationsand drafted the manuscript; all authors iterated on interpreting the results and refining the model and methodology; allauthors contributed to revisions and gave final approval for publication.

Competing interests The authors declare no conflict of interest.

References

[1] https://fivethirtyeight.com/politics/. FiveThirtyEight: Politics. Accessed: 2018-09-19.

[2] https://www.huffingtonpost.com/topic/election-forecasts. Huffington Post: ElectionForecasts. Accessed: 2018-10-02.

[3] http://crystalball.centerforpolitics.org/crystalball/. Sabato’s Crystal Ball (Universityof Virginia Center for Politics). Accessed: 2018-09-19.

[4] http://graphics.latimes.com/usc-presidential-poll-dashboard/. USC Dornsife/LosAngeles Times Daybreak Poll: Where the presidential race stands today. By Armand Emamadjomeh and DavidLauter. Accessed: 2018-09-19.

[5] https://www.270towin.com. 270toWin. Accessed: 2018-10-31.

[6] https://www.cookpolitical.com. The Cook Political Report. Accessed: 2018-10-31.

[7] https://www.nytimes.com/interactive/2018/upshot/elections-polls.html. The NewYork Times: Polling in Real Time: The 2018 Midterm Elections. Accessed: 2018-11-01.

[8] https://www.cookpolitical.com/resources. The Cook Political Report: Resources. Accessed:2018-09-19.

[9] https://elections.huffingtonpost.com/pollster. HuffPost Pollster. Last accessed: 2018-11-02.

[10] https://www.realclearpolitics.com/epolls/latest_polls/elections/. RealClear-Politcs: Polls. Last accessed: 2018-11-02.

[11] https://projects.fivethirtyeight.com/2016-election-forecast/. FiveThirtyEight:2018 Election Forecast. Who will win the presidency? Models by Nate Silver. Research by Jennifer Kanjana andDhrumil Mehta. Design plus development by Jay Boice, Aaron Bycoffe, Matthew Conlen, Reuben Fischer-Baum,Ritchie King, Ella Koeze, Allison McCann, Andrei Scheinkman, and Gus Wezerek. Last accessed: 2018-11-02.

[12] https://projects.fivethirtyeight.com/2018-midterm-election-forecast/senate/. FiveThirtyEight: Forecasting the race for the Senate. Forecast models by Nate Silver. Designand development by Jay Boice, Emma Brillhart, Aaron Bycoffe, Rachael Dottle, Lauren Eastridge, Ritchie King,Ella Koeze, Andrei Scheinkman, Gus Wezerek, Julia Wolfe. Research by Dustin Dienhart, Andrea Jones-Rooy,Dhrumil Mehta, Mai Nguyen, Nathaniel Rakich, Derek Shan, and Geoffrey Skelley. Accessed: 2018-10-30 and2018-11-04.

[13] http://www.centerforpolitics.org/crystalball/2018-senate/. Sabato’s Crystal Ball(University of Virginia Center for Politics): 2018 Senate. Last accessed: 2018-11-04.

[14] https://www.270towin.com/2018-senate-election/. 270toWin.com: 2018 Senate Election In-teractive Map. Last accessed: 2018-11-03.

16

Page 17: Forecasting elections using compartmental models of infectionscasting more transparent, we propose a mathematical model of the evolution of political opinions in U.S. elections. We

[15] http://www.nytimes.com/newsgraphics/2014/senate-model/index.html. The New YorkTimes Upshot: Senate Forecasts. Who Will Win The Senate? Last accessed: 2018-11-03.

[16] https://www.cookpolitical.com/ratings/senate-race-ratings. The Cook Political Re-port: 2018 Senate Race Ratings. Last accessed: 2018-11-03.

[17] https://projects.fivethirtyeight.com/2018-midterm-election-forecast/governor/. FiveThirtyEight.com: Forecasting the races for governor. Models by Nate Silver. Designplus development by Jay Boice, Aaron Bycoffe, Rachael Dottle, Ritchie King, Ella Koeze, Andrei Scheinkman,Gus Wezerek and Julia Wolfe. Research by Dustin Dienhart, Andrea Jones-Rooy, Dhrumil Mehta, Mai Nguyen,Nathaniel Rakich, Derek Shan and Geoffrey Skelley. Last accessed: 2018-11-04.

[18] http://www.centerforpolitics.org/crystalball/2018-governor/. Sabato’s Crystal Ball(University of Virginia Center for Politics): 2018 Governor. Last accessed: 2018-11-04.

[19] https://www.cookpolitical.com/ratings/governor-race-ratings. The Cook Political Re-port: 2018 Governor Race Ratings. Last accessed: 2018-11-04.

[20] https://insideelections.com/ratings/governor. Inside Elections with Nathan L. Gonzales,Nonpartisan Analysis: Gubernatorial Ratings. Work by Nathan L. Gonzales, Leah Askarinam, Ryan Matsumoto,Robert Yoon, and Stuart Rothenberg. Last accessed: 2018-11-03.

[21] https://projects.fivethirtyeight.com/pollster-ratings/. FiveThirtyEight: FiveThir-tyEight’s Pollster Ratings, Based on the historical accuracy and methodology of each firm’s polls. Model byNate Silver. Research contributed by Andrea Jones-Rooy and Dhrumil Mehta. Accessed: 2018-11-04.

[22] https://elections.huffingtonpost.com/pollster/api/v2. HuffPost Pollster API v2.Data provided under a CC BY-NC-SA 3.0 license (https://creativecommons.org/licenses/by-nc-sa/3.0/deed.en_US). Last accessed: 2018-11-02.

[23] https://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=PEP_2015_PEPSR6H&prodType=table. United States Census Bureau, Population Divi-sion. Annual Estimates of the Resident Population by Sex, Race, and Hispanic Origin for the United States,States, and Counties: April 1, 2010 to July 1, 2016 more information 2016 Population Estimates. Release Date:June 2017.

[24] https://247wallst.com/special-report/2016/09/16/americas-most-and-least-educated-states-a-survey-of-all-50/2/. 247WallSt.com: America’s Most and Least Educated States: A Survey of All 50. By Evan Comen, ThomasC. Frohlich and Michael B. Sauter September 16, 2016. Last accessed: 2018-11-02.

[25] https://fivethirtyeight.com/features/the-odds-of-an-electoral-college-popular-vote-split-are-increasing/.FiveThirtyEight: Politics. The Odds Of An Electoral College-Popular Vote Split Are Increasing. By Nate Silver.October 31, 2016. Last accessed: 2018-11-02.

[26] http://crystalball.centerforpolitics.org/crystalball/articles/projection-obama-will-likely-win-second-term/. Sabato’s Crystal Ball (University ofVirginia Center for Politics): Projection: Obama Will Likely Win Second Term. By Larry J. Sabato, Kyle Kondik,and Geoffrey Skelley. Last accessed: 2018-11-03.

[27] https://www.nytimes.com/elections/2012/results/senate.html. The New York Times:Election 2012. Senate Map. Accessed: 2018-11-03.

[28] Estimates of the Voting Age Population for 2012. A Notice by the Commerce Department on 1/30/2013. Agency:Office of the Secretary, Commerce. Federal Register Notices, 78(20), 2013.

[29] Estimates of the Voting Age Population for 2016. A Notice by the Commerce Department on 1/30/2017. Agency:Office of the Secretary, Commerce. Federal Register Notices, 82(18), 2017.

17

Page 18: Forecasting elections using compartmental models of infectionscasting more transparent, we propose a mathematical model of the evolution of political opinions in U.S. elections. We

[30] Estimates of the Voting Age Population for 2017. A Notice by the Commerce Department on 2/20/2018. Agency:Office of the Secretary, Commerce. Federal Register Notices, 83(34), 2018.

[31] Linda J.S. Allen, Michel Langlais, and Carleton J. Phillips. The dynamics of two viral infections in a single hostpopulation with applications to hantavirus. Mathematical Biosciences, 186(2):191–217, 2003.

[32] Michael J. Berry and Kenneth N. Bickers. Forecasting the 2012 presidential election with state-level economicindicators. PS: Political Science and Politics, 45(4):669–674, 2012.

[33] Luıs M A Bettencourt, Ariel Cintron-Arias, David I Kaiser, and Carlos Castillo-Chavez. The power of a good idea:Quantitative modeling of the spread of ideas from epidemiological models. Physica A: Statistical Mechanics andits Applications, 364:513–536, 2006.

[34] Laurent Bonnasse-Gahot, Henri Berestycki, Marie-Aude Depuiset, Mirta B Gordon, S Roche, Nancy Rodriguez,and Jean-Pierre Nadal. Epidemiological modelling of the 2005 French riots: a spreading wave and the role ofcontagion. Scientific Reports, 8(107), 2018.

[35] F. Brauer and C Castillo-Chavez. Mathematical Models in Population Biology and Epidemiology. Springer Verlag,2nd edition, 2012.

[36] Richard H Byrd, Peihuang Lu, Jorge Nocedal, and Ciyou Zhu. A limited memory algorithm for bound constrainedoptimization. SIAM Journal on Scientific Computing, 16(5):1190–1208, 1995.

[37] Brian J. Coburn, Bradley G. Wagner, and Sally Blower. Modeling influenza epidemics and pandemics: insightsinto the future of swine flu (H1N1). BMC Medicine, 7(1), 2009.

[38] Odo Diekmann and J A P Heesterbeek. Mathematical Epidemiology of Infectious Diseases: Model Building,Analysis and Interpretation. Wiley, New York, 2000.

[39] Porco Travis C Gao, Daozhou and Shigui Ruan. Coinfection dynamics of two diseases in a single host population.Journal of Mathematical Analysis and Applications, 442:171–188, 2016.

[40] Herbert W Hethcote. The mathematics of infectious diseases. SIAM Review, 42(4):599–653, 2000.

[41] Desmond J Higham. An algorithmic introduction to numerical simulation of stochastic differential equations.SIAM Review, 43(3):525–546, 2001.

[42] Patrick Hummel and David Rothschild. Fundamental models for forecasting elections at the state level. ElectoralStudies, 35:123–139, 2014.

[43] Natalie Jackson and Adam Hooper. http://elections.huffingtonpost.com/2016/forecast/president. Huffington Post Election 2016 Forecast: President. Additional design by Alissa Scheller. Lastaccessed: 2018-10-31.

[44] Will Jennings and Christopher Wlezien. Election polling errors across time and space. Nature Human Behaviour,2:276–283, 2018.

[45] Bruno Jerome and Veronique Jerome-Speziari. Forecasting the 2012 US presidential election: Lessons from astate-by-state political economy model. PS: Political Science and Politics, 45(4):663–668, 2012.

[46] William O Kermack and Anderson G McKendrick. A contribution to the mathematical theory of epidemics.Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 115(772):700–721, 1927.

[47] William O Kermack and Anderson G McKendrick. Contributions to the mathematical theory of epidemics. ii. theproblem of endemicity. Proceedings of the Royal Society of London A: Mathematical, Physical and EngineeringSciences, 138(834):55–83, 1932.

18

Page 19: Forecasting elections using compartmental models of infectionscasting more transparent, we propose a mathematical model of the evolution of political opinions in U.S. elections. We

[48] William O Kermack and Anderson G McKendrick. Contributions to the mathematical theory of epidemics. iii.further studies of the problem of endemicity. Proceedings of the Royal Society of London A: Mathematical,Physical and Engineering Sciences, 141(843):94–122, 1933.

[49] Peter L Kiss, Istvan Z and Miller, Joel C and Simon. Mathematics of Epidemics on Networks: From Exact toApproximate Models. Springer-Verlag, Berlin, Germany, 2017.

[50] Carl Klarner. Forecasting the 2008 U.S. house, senate and presidential elections at the district and state level. PS:Political Science and Politics, 41(4):723–728, 2008.

[51] Kody Law, Andrew Stuart, and Konstantinos Zygalakis. Data Assimilation: A Mathematical Introduction, vol-ume 63 of Texts in Applied Mathematics. Springer, Cham, 2015.

[52] Ahn Yong-Yeol Lehmann Sune. Complex Spreading Phenomena in Social Systems. Springer-Verlag, 2018.

[53] Robert K. McCormack and Linda J. S. Allen. Multi-patch deterministic and stochastic models for wildlife diseases.Journal of Biological Dynamics, 1(1):63–85, 2007.

[54] Romualdo Pastor-Satorras, Claudio Castellano, Piet Van Mieghem, and Alessandro Vespignani. Epidemic pro-cesses in complex networks. Reviews of Modern Physics, 87(3):925–979, 2015.

[55] Vincent Pons. Will a five-minute discussion change your mind? A countrywide experiment on voter choice inFrance. American Economic Review, 108(6):1322–1363, 2018.

[56] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing,Vienna, Austria, 2018.

[57] Grzegorz A Rempala. Least squares estimation in stochastic biochemical networks. Bulletin of MathematicalBiology, 74(8):1938–1955, 2012.

[58] Nate Silver. https://fivethirtyeight.com/features/a-users-guide-to-fivethirtyeights-2016-general-election-forecast/.FiveThirtyEight: A User’s Guide To FiveThirtyEight’s 2016 General Election Forecast. Last accessed: 2018-10-31.

[59] Lin Wang and Brendan C Wood. An epidemiological approach to model the viral propagation of memes. AppliedMathematical Modeling, 35:5442–5447, 2011.

[60] Liping Wang, Hongyong Zhao, Sergio Muniz Oliva, and Huaiping Zhu. Modeling the transmission and control ofZika in Brazil. Scientific Reports, 7(7721), 2017.

[61] Laijun Zhao, Hongxin Cui, Xiaoyan Qiu, Xiaoli Wang, and Jiajia Wang. SIR rumor spreading model in the newmedia age. Physica A: Statistical Mechanics and its Applications, 392:995–1003, 2013.

19