The Political Blogosphere and the 2004 U.S. Election: “Divided They Blog”
By Lada Adamic, HP Labs, & Natalie Glance, Intelliseek Applied Research Center



Agenda:
1) General background and terms
2) Study goals
3) Methodology: creating 2 data sets
4) Analysis
5) Summing up

US Presidential Election, November 2, 2004

Name: George W. Bush
Party: Republican (“American conservatism”)
Home State: Texas
Electoral Vote: 286

Name: John Kerry
Party: Democratic (“Modern American liberalism”)
Home State: Massachusetts
Electoral Vote: 251


Political blogs
What is a blog?
• ~35 million blogs worldwide by the end of 2006, and ~173 million in 2011.
• 2004: 32 million US citizens read blogs
• 2004: 63 million use the internet to get informed about politics

“Blogosphere” as a social network
Various ways of drawing the blogosphere graph:
• Each blog/post is a vertex, and a directed edge from A to B is added if A contains a link to B.
• Each blog/post is a vertex, and an undirected weighted edge is added between two posts based on their similarity. (Similarity can be calculated in various ways.)
• And more.
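The first construction (a directed edge from A to B whenever A links to B) can be sketched as follows. This is a minimal illustration, not the authors' code: the blog names, the `build_link_graph` and `in_degree` helpers, and the toy data are all hypothetical, and links to anything outside the known set of blogs are simply ignored.

```python
from collections import defaultdict

def build_link_graph(links_by_blog):
    """Return adjacency sets: an edge A -> B exists if A links to B.

    Assumption: only links between blogs in the input are kept;
    self-links and links to other sites are dropped.
    """
    known = set(links_by_blog)
    graph = defaultdict(set)
    for source, targets in links_by_blog.items():
        graph[source]  # touch the key so every blog appears as a vertex
        for target in targets:
            if target in known and target != source:
                graph[source].add(target)
    return dict(graph)

def in_degree(graph):
    """Count incoming edges for every vertex."""
    counts = {blog: 0 for blog in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts

# Hypothetical toy data: blog -> URLs found on its pages.
toy_links = {
    "blog_a": ["blog_b", "blog_c", "news.example.com"],
    "blog_b": ["blog_a"],
    "blog_c": [],
}
graph = build_link_graph(toy_links)
inlinks = in_degree(graph)
```

Adjacency sets collapse repeated links between the same pair of blogs into one edge; a weighted variant would keep counts instead.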

“Blogosphere” as a social network
Links between blogs may appear in two different ways:
1) Post citations
2) Blogroll links


Political blogs: not only in the US….

“The Political Blogosphere and the 2004 US Election: Divided They Blog”
Study goals: Identify differences between sub-communities of political blogs (focusing on conservative vs. liberal blogs), in both linking patterns and discussion topics.
(Why is this interesting? “Cyber-balkanization”)

Dataset #1: Wide Snapshot
• Gather a list of labeled blogs from online blog directories (“BlogCatalog”, “eTalkingHead”, etc.).
• Collect snapshots of the front page of each blog, February 2005.
• Extract links to additional political blogs, keeping only those cited by others at least ~20 times.
• Manually/automatically set labels for the new list.
• Collect snapshots of the new list and join the 2 lists together.
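The expansion step (keep newly discovered blogs only if enough others cite them) might look roughly like the sketch below. The function name `frequently_cited`, the toy data, and the choice to count each citing blog at most once are assumptions made for illustration; the slides only state a threshold of about 20 citations.

```python
from collections import Counter

def frequently_cited(front_page_links, known_blogs, min_citations=20):
    """Return URLs outside known_blogs that are cited often enough.

    Assumption: a URL's citation count is the number of distinct
    blogs linking to it, so repeated links on one page count once.
    """
    citing = Counter()
    for source, targets in front_page_links.items():
        for target in set(targets):
            if target not in known_blogs and target != source:
                citing[target] += 1
    return {url for url, n in citing.items() if n >= min_citations}

# Hypothetical toy data, with a lowered threshold so the effect is visible.
toy_pages = {
    "a": ["x", "x", "y"],
    "b": ["x", "a"],
    "c": ["y", "z"],
}
candidates = frequently_cited(toy_pages, known_blogs={"a", "b", "c"}, min_citations=2)
```

Here "x" and "y" are each cited by two distinct blogs and pass the lowered threshold, while "z" is cited once and is dropped.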

Dataset #1: Wide Snapshot
Final dataset contained:
• 1494 listed blogs in total: 759 liberal, 735 conservative
• Snapshots of the front page were collected for 676 liberal blogs and 659 conservative ones
• No distinction is made between blogroll links and links in specific posts (post citations); all links are referred to as “page links”

Dataset #1: Wide Snapshot
• 91% of links stay within their community
• Conservative blogs show a greater tendency to link: 84% of conservative blogs link to at least one other blog, as opposed to 74% of liberal blogs
• Conservative blogs link to 15.1 other blogs on average, liberal blogs to 13.6 on average
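Statistics of this kind can be derived from a labeled edge list. The sketch below is an illustration under stated assumptions, not the study's procedure: the helper names and toy data are hypothetical, and the average number of linked blogs is taken over all blogs in a community (including those that link to nobody), which the slides do not specify.

```python
def within_community_share(edges, label):
    """Fraction of edges whose endpoints carry the same label."""
    same = sum(1 for src, dst in edges if label[src] == label[dst])
    return same / len(edges)

def linking_stats(edges, label, community):
    """Return (share of blogs with at least one outlink, mean outlinks).

    Assumption: the mean is over every blog in the community.
    """
    members = [blog for blog, lab in label.items() if lab == community]
    out = {blog: set() for blog in members}
    for src, dst in edges:
        if src in out:
            out[src].add(dst)
    linkers = sum(1 for targets in out.values() if targets)
    mean_out = sum(len(targets) for targets in out.values()) / len(members)
    return linkers / len(members), mean_out

# Hypothetical toy graph: two liberal and two conservative blogs.
label = {"l1": "liberal", "l2": "liberal",
         "c1": "conservative", "c2": "conservative"}
edges = [("l1", "l2"), ("c1", "c2"), ("c2", "c1"), ("c1", "l1")]

share = within_community_share(edges, label)
cons_stats = linking_stats(edges, label, "conservative")
lib_stats = linking_stats(edges, label, "liberal")
```

In the toy graph three of the four edges stay within a community, and both conservative blogs link out while only one liberal blog does.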


Dataset #1: Wide Snapshot

“…as common in almost every large subset of sites on web, the distribution of inlinks is highly uneven, with a few blogs of either persuasion having over a hundred incoming links, while hundreds of blogs have just one or two.”

Dataset #2: Corpus of Posts from Selected Blogs
• Take the 100 blogs from each community with the most page links.
• Use “blogPulse” to retrieve the number of post citations pointing at each blog during October and November 2004 (indicating current popularity).
• Choose the top 20 from each list based on post-citation ranking, omitting a few websites with unusual formats or a primary function other than blogging.
• Create a corpus of blog posts from the 40 blogs selected above, covering August 2004 to November 2004 (“blogPulse” provides tools to crawl weblog pages and segment them into individual posts).

Dataset #2: Corpus of Posts from Selected Blogs
• 12,470 posts from left-leaning blogs, 10,414 posts from right-leaning blogs
• Selected blogs, examples: [figure: example blogs in two groups, Liberal and Conservative]

Analyses:
1) Strength of each community
2) Varied conversations
• Using citations
• Using textual similarity
3) Interaction with mainstream media
4) Occurrences of names of political figures

Analysis 1: Strength of each community

                                  Liberal    Conservative
Total number of posts             12,470     10,414
Intra-community citations (1)      1,511      2,110
Cross-community citations (2)        247        312
Links-per-post rate                 0.12       0.20

(1) Liberal blogs citing other liberal blogs, and likewise for conservatives.
(2) Liberal blogs citing conservative blogs, and vice versa.
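As a quick arithmetic check, the reported links-per-post rates are reproduced by dividing the intra-community citation counts by the post counts. That reading is an inference from the numbers, not a definition given on the slide. The cross-community share computed below is an extra derived figure, included only to show how the counts relate.

```python
# Counts copied from the table above.
posts = {"liberal": 12470, "conservative": 10414}
intra = {"liberal": 1511, "conservative": 2110}
cross = {"liberal": 247, "conservative": 312}

# Assumption: rate = intra-community citations / posts.
rates = {side: intra[side] / posts[side] for side in posts}

# Share of all blog-to-blog citations that cross the divide.
cross_share = {side: cross[side] / (intra[side] + cross[side]) for side in posts}

for side in posts:
    print(f"{side}: {rates[side]:.2f} intra-community links per post, "
          f"{cross_share[side]:.1%} of blog citations cross the divide")
```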

Analysis 2: Varied conversations
• The 1st method focused on similarity between blogs based on common links (any URL, not necessarily a blog).
• Cosine similarity: $X_A$ is a binary vector, where entry $i$ is set to 1 or 0 according to whether blog A cited URL $i$ or not.
• Pairwise cosine similarity was computed for all 40 blogs.
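For binary vectors, cosine similarity reduces to the number of shared URLs divided by the geometric mean of the two set sizes, so it can be computed directly on sets. The sketch below illustrates that identity; the function name and the URLs are hypothetical.

```python
import math

def binary_cosine(urls_a, urls_b):
    """Cosine similarity of two binary citation vectors, given as URL sets.

    The dot product of binary vectors is the size of the intersection,
    and each vector's norm is the square root of its set size.
    """
    a, b = set(urls_a), set(urls_b)
    if not a or not b:
        return 0.0  # convention chosen here for blogs with no citations
    return len(a & b) / math.sqrt(len(a) * len(b))

# Hypothetical toy data: two blogs citing four URLs each, two in common.
blog_a = {"u1", "u2", "u3", "u4"}
blog_b = {"u3", "u4", "u5", "u6"}
similarity = binary_cosine(blog_a, blog_b)
```

With two shared URLs out of four cited by each blog, the similarity is 2 / sqrt(4 * 4) = 0.5.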

Analysis 2: Varied conversations
• Average similarity between liberal blogs and conservative blogs: 0.03.
• Average similarity amongst liberal blogs: 0.09.
• Average similarity amongst conservative blogs: 0.11.
• Statistically significant difference: p-value of ~0.004 based on ANOVA.
• When political blogs were removed from the set of URLs, the difference was no longer significant (we already saw that conservative blogs tend to link to one another more actively).

Analysis 2: Varied conversations
• The 2nd method focused on similarity between blogs based on textual content, particularly “informative phrases”.
• A phrase-finding algorithm was used to identify 498 phrases in the 40 blogs.
• Similarity was again based on cosine similarity, this time using the TF*IDF metric.
• TF-IDF stands for “Term Frequency - Inverse Document Frequency”. It is a statistical measure used to evaluate how important a word is to a document in a collection or corpus.

Analysis 2: Varied conversations
• This time $X_A$ is a real-valued (not binary) vector, where the entry corresponding to phrase $p$ is given by $X_A(p) = n_A(p) \cdot \log\left(N / n(p)\right)$.
• $n_A(p)$ is the number of times phrase $p$ appears in blog A.
• $N = 1{,}768{,}887$ is the number of all blogs in the “blogPulse” dataset, found in Oct-Nov 2004.
• $n(p)$ is the number of blogs containing the phrase $p$, out of all $N$ blogs from “blogPulse”.
• Results: the average similarity between blogs of opposite persuasions (0.1) was smaller than that of liberal (0.57) and conservative (0.54) pairs.

Reminder: cosine similarity is defined as $\cos(X_A, X_B) = \frac{X_A \cdot X_B}{\lVert X_A \rVert \, \lVert X_B \rVert}$.
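A phrase-based similarity of this kind can be sketched as below. The weighting used here, phrase count times log(N / number of blogs containing the phrase), is the common TF*IDF form and is an assumption; the study's exact weighting may differ. The function names, phrases, and counts are hypothetical toy values.

```python
import math

def tfidf_vector(phrase_counts, doc_freq, n_blogs):
    """Map each phrase to count * log(n_blogs / doc_freq[phrase])."""
    return {phrase: count * math.log(n_blogs / doc_freq[phrase])
            for phrase, count in phrase_counts.items() if count > 0}

def cosine(x, y):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * y[phrase] for phrase, weight in x.items() if phrase in y)
    norm_x = math.sqrt(sum(w * w for w in x.values()))
    norm_y = math.sqrt(sum(w * w for w in y.values()))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)

# Hypothetical toy corpus statistics.
n_blogs = 1000
doc_freq = {"swing state": 10, "exit poll": 100, "tax cut": 100}

vec_a = tfidf_vector({"swing state": 2, "exit poll": 1}, doc_freq, n_blogs)
vec_b = tfidf_vector({"swing state": 1, "tax cut": 4}, doc_freq, n_blogs)
vec_c = tfidf_vector({"tax cut": 3}, doc_freq, n_blogs)

sim_ab = cosine(vec_a, vec_b)  # share one phrase: strictly between 0 and 1
sim_ac = cosine(vec_a, vec_c)  # share no phrases: exactly 0
```

Rare phrases get larger weights than common ones, so two blogs sharing an unusual phrase score as more similar than two blogs sharing a ubiquitous one.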

Analysis 3: Interaction with mainstream media

Focusing on links to formal news articles, some online news sites (e.g. National Review, Washington Times) were found to receive the majority of their links from conservative blogs, while others (e.g. LA Times, Wall Street Journal) received most of theirs from liberal blogs.

Analysis 3: Interaction with mainstream media
[Figure panels: Dataset #1, Dataset #2]


Analysis 3: Interaction with mainstream media

Mentions of the “CBS forged documents” article, on time series graph:


Analysis 4: Mentioning names of political figures

Overall pattern: Democrats are more often mentioned by right-leaning bloggers, and vice versa...

Summing up
The political blogosphere is, in some ways, divided between liberals and conservatives:
• Links are mostly within each community
• Discussion topics and political figures mentioned differ
• Conservative blogs are more tightly linked

Future research directions: dividing posts by author instead of by blog; how news and ideas spread in both communities; and whether blogs that count as neither “liberal” nor “conservative” form a bridge between the two or rather a separate community.

A peek into later work by one of the authors: