topic-factorized ideal point estimation model for legislative voting network

Topic-Factorized Ideal Point Estimation Model for Legislative Voting Network

Yupeng Gu†, Yizhou Sun†, Ning Jiang‡, Bingyu Wang†, Ting Chen†

†Northeastern University‡University of Illinois at Urbana-Champaign

2 /27

Background Federal Legislation

(bill)Law

The HouseSenate

Ronald Paul

Bill 1 Bill 2 ……

Barack Obama

Ronald Paul

liberal conservative

Politician

Republican Democrat

Barack Obama

United States Congress

The House Senate

3 /27

Outline

• Background and Problem Definition• Topic Factorized Ideal Point Model and Learning Algorithm• Experimental Results• Conclusion

4 /27

Legislative Voting Network

5 /27

Problem Definition

Input: Legislative Network

Output:: Ideal Points for Politician : Ideal Points for Bill

’s on different topics

6 /27

Existing Work

• 1 dimensional ideal point model (Poole and Rosenthal, 1985; Gerrish and Blei, 2011) • High-dimensional ideal point model (Poole and Rosenthal, 1997)• Issue-adjusted ideal point model (Gerrish and Blei, 2012)

7 /27

Motivations

Topic 1Topic 2Topic 3Topic 4

• Voters have different attitudes on different topics.

• Traditional matrix factorization method cannot give the meanings for each dimension.

𝑀 𝑈≈ ⋅𝑉 𝑇

latent factor

• Topics of bills can influence politician’s voting, and the voting behavior can better interpret the topics of bills as well.

Topic Model:• Health• Public Transport• …

Voting-guided Topic Model:• Health Service• Health Expenses• Public Transport• …

8 /27

Outline


9 /27

Topic Factorized IPM

𝑢 𝑑𝑤

PoliticiansBills

Terms

Heterogeneous Voting Network

𝑛(𝑑 ,𝑤)𝑣𝑢𝑑

Entities:• Politicians• Bills• Terms

Links:• (, )• ()

10 /27

Text Part

PoliticiansBills

Terms

11 /27

Text Part

• We model the probability of each word in each document as a mixture of multinomial distributions, as in PLSA (Hofmann, 1999) and LDA (Blei et al., 2003)

𝑑 𝑘 𝑤𝜃𝑑𝑘=𝑝 (𝑘∨𝑑) 𝛽𝑘𝑤=𝑝 (𝑤∨𝑘)

Bill Topic Word

12 /27

Voting Part

PoliticiansBills

Terms

13 /27

Voting Part

YEA𝑝 (𝑣𝑢𝑑=1 )=𝜎 (∑𝑘

𝜃𝑑𝑘 𝑥𝑢𝑘𝑎𝑑𝑘+𝑏𝑑)

𝒙𝑢

𝒂𝑑

Topic1

Topic2

…… Topic Topic……

0 1 -1 1 1 1 1 1

0 0 -1 1 1 1 -1 1

1 1 1 1 -1 1 0 0

𝑢1𝑢2

𝑢𝑁𝑈

𝑑1𝑑2 𝑑𝑁 𝐷

……

……

User-Bill voting matrix

�̂�𝑢𝑑=∑𝑘=1

𝐾

𝑥𝑢𝑘𝑎𝑑𝑘

�̂�𝑢𝑑=∑𝑘=1

𝐾

𝜃𝑑𝑘𝑥𝑢𝑘𝑎𝑑𝑘

𝑝 (𝑣𝑢𝑑=−1 )=1−𝜎 (∑𝑘

𝜃𝑑𝑘𝑥𝑢𝑘𝑎𝑑𝑘+𝑏𝑑)NAY

Voter

Bill

𝑝 (𝑽|𝜽 ,𝑿 ,𝑨 ,𝒃 )= ∏(𝑢 ,𝑑 ) :𝑣𝑢𝑑≠ 0

(𝑝 (𝑣𝑢𝑑=1 )1+𝑣𝑢𝑑2 𝑝 (𝑣𝑢𝑑=−1 )

1−𝑣𝑢𝑑2 ¿)¿

𝑥𝑢 1

𝑥𝑢𝑘

𝑥𝑢𝐾

𝑎𝑑1

𝑎𝑑𝑘

𝑎𝑑𝐾

……

……

𝜃𝑑𝑘

𝜃𝑑1

𝜃𝑑𝐾

𝑥𝑢𝑘∈𝑹

𝑎𝑑 𝑘∈𝑹

𝐼 {𝑣𝑢𝑑=1}𝐼 {𝑣𝑢𝑑=−1 }

14 /27

Combining Two Parts Together• The final objective function is a linear combination of the two average

log-likelihood functions over the word links and voting links.

• We also add an regularization term to and to reduce over-fitting.

s.t.and

0≤ 𝜃𝑑𝑘≤1 ,∑𝑘

𝜃𝑑𝑘=10≤ 𝛽𝑘𝑤≤1 ,∑𝑤

𝛽𝑘𝑤=1

𝐽 (𝜽 ,𝜷 ,𝑿 ,𝑨 ,𝒃 )

¿ (1− 𝜆 )∑𝑑 ,𝑤

𝑛 (𝑑 ,𝑤 ) log ¿¿¿𝐽 (𝜽 ,𝜷 ,𝑿 ,𝑨 ,𝒃 )

15 /27

Learning Algorithm

• An iterative algorithm where ideal points related parameters () and topic model related parameters () enhance each other.• Step 1: Update given

• Gradient descent• Step 2: Update given

• Follow the idea of expectation-maximization (EM) algorithm: maximize a lower bound of the objective function in each iteration

16 /27

Learning Algorithm

• Update : A nonlinear constrained optimization problem. Remove the constraints by a logistic function based transformation:

and update using gradient descent.• Update : Since only appears in the topic model part, we use the same updating rule as in PLSA:

where

𝑒𝜇𝑑𝑘

1+∑𝑘′=1

𝐾− 1

𝑒𝜇𝑑𝑘 ′

1

1+∑𝑘′=1

𝐾− 1

𝑒𝜇𝑑𝑘 ′

if

if

𝜃𝑑𝑘=¿

17 /27

Outline


18 /27

Data Description

• Dataset:• U.S. House and Senate roll call data in the years between 1990 and 2013

• 1,540 legislators• 7,162 bills• 2,780,453 votes (80% are “YEA”)

• Keep the latest version of a bill if there are multiple versions.• Randomly select 90% of the votes as training and 10% as testing.

Downloaded from http://thomas.loc.gov/home/rollcallvotes.html

19 /27

Evaluation Measures

• Root mean square error (RMSE) between the predicted vote score and the ground truth

RMSE =

• Accuracy of correctly predicted votes (using 0.5 as a threshold for the predicted accuracy)

Accuracy =

• Average log-likelihood of the voting linkAvelogL =

20 /27

Experimental Results

Training Data set Testing Data set

21 /27

Parameter Study

Parameter study on Parameter study on (regularization coefficient)

¿ (1− 𝜆 )⋅ 𝑎𝑣𝑒𝑙𝑜𝑔𝐿 (𝑡𝑒𝑥𝑡 )+𝜆⋅𝑎𝑣𝑒𝑙𝑜𝑔𝐿(𝑣𝑜𝑡𝑖𝑛𝑔)−1

2𝜎2 ¿𝐽 (𝜽 ,𝜷 ,𝑿 ,𝑨 ,𝒃 )

22 /27

ForeignEducation

Individual Property

MilitaryFinancial Institution

LawHealth Service

Health Expenses

Funds

Public Transportation

Ronald Paul Barack Obama Joe Lieberman

Case Studies

• Ideal points for three famous politicians: (Republican, Democrat)• Ronald Paul (R), Barack Obama (D), Joe Lieberman (D)

23 /27

Case Studies

• Pick three topics: (Republican, Democrat)

24 /27

Case Studies

𝑝 (𝑣𝑢𝑑=1 )=𝜎 (∑𝑘

𝜃𝑑𝑘 𝑥𝑢𝑘𝑎𝑑𝑘+𝑏𝑑)

Bill: H_R_4548 (in 2004)

NayR. Paul H_R_4548

Topic Model

TF-IPM

Experts

𝜽𝑑

𝒙𝑢

𝒂𝑑 ,𝑏𝑑

𝑝 (𝑣𝑢𝑑=1 )

For Unseen Bill :

𝑎𝑑𝑘 𝜃𝑑𝑘

𝑥𝑢𝑘

25 /27

Outline


26 /27

Conclusion

• We estimate the ideal points of legislators and bills on multiple dimensions instead of global ones. • The generation of topics are guided by two types of links in the

heterogeneous network.• We present a unified model that combines voting behavior and topic

modeling, and propose an iterative learning algorithm to learn the parameters.• The experimental results on real-world roll call data show our

advantage over the state-of-the-art methods.

27 /27

Thanks

Q & A

topic-factorized ideal point estimation model for legislative voting network

Documents

ideal point model gerrish

global ideal point

ideal points

topic kbeta

topic distribution

different topics

voting behavior

topics of bills