The Political Blogosphere and the 2004 U.S. Election: “Divided They Blog”
2
The Political Blogosphere and the 2004 U.S. Election: “Divided They Blog”
By Lada Adamic, HP Labs, & Natalie Glance, Intelliseek Applied Research Center
3
Agenda:
1) General background and terms
2) Study goals
3) Methodology: creating 2 data sets
4) Analysis
5) Summing up
4
Name: George W. Bush
Party: Republican (“American conservatism”)
Home State: Texas
Electoral Vote: 286

Name: John Kerry
Party: Democratic (“Modern American liberalism”)
Home State: Massachusetts
Electoral Vote: 251
US Presidential Election, November 2, 2004
5
US Presidential Election, November 2, 2004
6
Political blogs
What is a blog?
• ~35 million blogs worldwide by the end of 2006, and ~173 million in 2011.
• 2004: 32 million US citizens read blogs.
• 2004: 63 million used the internet to get informed about politics.
7
“Blogosphere” as a social network
Various ways of drawing the blogosphere graph:
• Each blog/post is a vertex, and a directed edge from A to B is added if A contains a link to B.
• Each blog/post is a vertex, and an undirected weighted edge is added between two posts based on their similarity. (Similarity can be calculated in various ways.)
• And more.
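The first (directed) construction can be sketched in a few lines of Python. This is only an illustration: the function names and the input format (a mapping from each blog to the blogs it links to) are assumptions, not something taken from the study.

```python
from collections import defaultdict


def build_link_graph(links_by_blog):
    """Build a directed graph as adjacency sets: source blog -> linked blogs.

    links_by_blog is a hypothetical input mapping each blog name to the
    list of blogs that its pages link to.
    """
    graph = defaultdict(set)
    for source, targets in links_by_blog.items():
        graph[source]  # make sure the source appears as a vertex
        for target in targets:
            if target != source:  # ignore self-links
                graph[source].add(target)
                graph[target]  # make sure the target appears as a vertex
    return dict(graph)


def in_degree(graph):
    """Count incoming edges for every vertex of the graph."""
    counts = {vertex: 0 for vertex in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts
```

Using sets for the adjacency lists means repeated links between the same pair of blogs collapse into a single edge; a weighted variant would keep a counter instead.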
8
“Blogosphere” as a social network
Links between blogs may appear in two different ways:
1. Post citations
2. Blogroll links
9
Political blogs: not only in the US….
10
“The Political Blogosphere and the 2004 US Election: Divided They Blog”
Study goals:
• Identify differences between sub-communities of political blogs (focusing on conservative vs. liberal blogs), in both linking patterns and discussion topics.
(Why is this interesting? “cyber-balkanization”)
11
Dataset #1: Wide Snapshot
• Gather a list of labeled blogs from online blog directories (“BlogCatalog”, “eTalkingHead”, etc.)
• Collect snapshots of the front page of each blog, February 2005
• Extract links to additional political blogs, and save only those cited by others at least ~20 times
• Manually/automatically set labels for the new list
• Collect snapshots of the new list and join the 2 lists together
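The expansion step (keeping only newly found blogs that are cited often enough) might look like the sketch below. The function name, the input format, and the choice to count distinct citing blogs rather than raw links are all assumptions; the slide only gives the threshold of roughly 20.

```python
from collections import Counter


def frequently_cited(outlinks_by_blog, known_blogs, min_citations=20):
    """Return blogs not yet in the list that enough listed blogs cite.

    outlinks_by_blog: hypothetical mapping of blog -> blogs linked from its front page.
    known_blogs: set of blogs already in the labeled list.
    """
    citing_blogs = Counter()
    for source, targets in outlinks_by_blog.items():
        # set() so that one blog citing a target many times counts once (an assumption)
        for target in set(targets):
            if target != source and target not in known_blogs:
                citing_blogs[target] += 1
    return {blog for blog, n in citing_blogs.items() if n >= min_citations}
```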
12
Dataset #1: Wide Snapshot
Final dataset contained:
• 1494 listed blogs in total: 759 liberal, 735 conservative
• Snapshot of the front page collected for 676 liberal blogs and 659 conservative ones
• No distinction between blogroll links and links in specific posts (post citations); all links are referred to as “page links”
13
Dataset #1: Wide Snapshot
• 91% of links stay within their community
• Conservative blogs show a greater tendency to link: 84% of conservative blogs link to at least one other blog, as opposed to 74% of liberal blogs
• Conservative blogs link to 15.1 other blogs on average, liberal blogs to 13.6 on average
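A figure such as the 91% within-community share is a simple ratio over the labeled link graph. A minimal sketch, assuming the links are available as (source, target) pairs and the labels as a dictionary (both hypothetical formats):

```python
def within_community_share(edges, label):
    """Fraction of links whose two endpoints carry the same label.

    edges: iterable of (source_blog, target_blog) pairs.
    label: mapping of blog -> community label, e.g. "liberal" or "conservative".
    """
    edges = list(edges)
    if not edges:
        return 0.0
    same = sum(1 for source, target in edges if label[source] == label[target])
    return same / len(edges)
```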
14
Dataset #1: Wide Snapshot
“…as common in almost every large subset of sites on web, the distribution of inlinks is highly uneven, with a few blogs of either persuasion having over a hundred incoming links, while hundreds of blogs have just one or two.”
15
Dataset #2: Corpus of Posts from Selected Blogs
• Take the 100 blogs from each community with the most page links
• Use “blogPulse” to retrieve the number of post citations pointing at each blog during October and November 2004 (indicating current popularity)
• Choose the top 20 from each list based on post-citation ranking, omitting a few websites with unusual formats or a primary function other than blogging
• Create a corpus of blog posts from the 40 blogs selected above, in the time frame of August 2004 to November 2004 (“blogPulse” provides tools to crawl weblog pages and segment them into individual posts)
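The two-stage selection for one community can be sketched as follows. The function and parameter names are hypothetical, and the citation counts are assumed to be already available as dictionaries rather than fetched from any real service.

```python
def select_blogs(page_links, post_citations, excluded=frozenset(),
                 first_cut=100, final_cut=20):
    """Pick blogs for the post corpus in two stages.

    page_links: mapping blog -> number of incoming page links (Dataset #1).
    post_citations: mapping blog -> number of post citations in the chosen window.
    excluded: blogs dropped by hand (unusual format, not primarily a blog).
    """
    # Stage 1: the most-linked blogs in the wide snapshot
    top_by_links = sorted(page_links, key=page_links.get, reverse=True)[:first_cut]
    # Stage 2: rank the remaining candidates by recent post citations
    eligible = [blog for blog in top_by_links if blog not in excluded]
    ranked = sorted(eligible, key=lambda blog: post_citations.get(blog, 0), reverse=True)
    return ranked[:final_cut]
```

Running it once per community with the default cut-offs would give the 20 + 20 blogs described above.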
16
Dataset #2: Corpus of Posts from Selected Blogs
• 12,470 posts from left-leaning blogs, 10,414 posts from right-leaning blogs
• Selected blogs, examples:
[Figure: examples of selected blogs, grouped as Liberal and Conservative]
17
Analyses:
1) Strength of each community
2) Varied conversations
• Using citations
• Using textual similarity
3) Interaction with mainstream media
4) Occurrences of names of political figures
18
Analysis 1: Strength of each community

| | Liberal | Conservative |
|---|---|---|
| Total number of posts | 12,470 | 10,414 |
| Inner-community citing (liberal blogs citing other liberal blogs, and the same for conservatives) | 1,511 | 2,110 |
| Cross citing (liberals citing conservative blogs and vice versa) | 247 | 312 |
| Links-per-post rate | 0.12 | 0.20 |
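The slide does not say which citations the links-per-post rate counts. As a quick check, dividing the inner-community citations by the number of posts reproduces the reported 0.12 and 0.2, so that reading is assumed in the sketch below. The share of citations that stay inside the community is a derived figure, not one given on the slide.

```python
# Figures copied from the slide
posts = {"liberal": 12470, "conservative": 10414}
inner = {"liberal": 1511, "conservative": 2110}
cross = {"liberal": 247, "conservative": 312}


def links_per_post(side):
    """Inner-community citations per post (assumed definition of the rate)."""
    return inner[side] / posts[side]


def inner_share(side):
    """Share of a community's blog citations that stay inside the community."""
    return inner[side] / (inner[side] + cross[side])


for side in posts:
    print(f"{side}: {links_per_post(side):.2f} links per post, "
          f"{inner_share(side):.0%} of citations within the community")
```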
19
Analysis 2: Varied conversations
• The 1st method focused on similarity between blogs based on common links (any URL, not necessarily a blog).
• Cosine similarity: $\cos(A, B) = \frac{X_A \cdot X_B}{\lVert X_A \rVert \, \lVert X_B \rVert}$, where $X_A$ is a binary vector whose entry $i$ is set to 1 or 0 according to whether blog A cited URL $i$ or not.
• Pairwise cosine similarity was computed for all 40 blogs.
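For binary vectors the cosine similarity reduces to the number of shared URLs divided by the geometric mean of the two URL counts. A minimal sketch, assuming each blog is represented simply by the set of URLs it cited (the function name is hypothetical):

```python
import math


def cosine_binary(urls_a, urls_b):
    """Cosine similarity of two blogs represented by the sets of URLs they cite.

    With 0/1 vectors the dot product is the size of the intersection and
    each norm is the square root of the set size.
    """
    a, b = set(urls_a), set(urls_b)
    if not a or not b:
        return 0.0  # a blog that cites nothing is treated as similar to no one
    return len(a & b) / math.sqrt(len(a) * len(b))
```

For example, two blogs that each cite two URLs and share one of them get a similarity of 0.5.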
20
Analysis 2: Varied conversations
• Average similarity between liberal blogs and conservative blogs: 0.03.
• Average similarity amongst liberal blogs: 0.09.
• Average similarity amongst conservative blogs: 0.11.
• The difference is statistically significant: p-value of ~0.004 based on ANOVA.
• When links to political blogs were removed from the URLs, the difference was no longer significant (we already saw that conservative blogs tend to link to one another more actively).
21
Analysis 2: Varied conversations
• The 2nd method focused on similarity between blogs based on textual content, particularly “informative phrases”.
• Used a phrase-finding algorithm to identify 498 phrases in the 40 blogs.
• Similarity was based on cosine similarity again, this time using the TF*IDF metric.
• TF-IDF stands for “Term Frequency - Inverse Document Frequency”. It is a statistical measure used to evaluate how important a word is to a document in a collection or corpus.
22
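Analysis 2: Varied conversations
• This time $X_A$ is a weighted (no longer binary) vector, where the entry corresponding to phrase $p$ is its TF*IDF weight, which in the standard form is $X_A(p) = \mathrm{tf}_A(p) \cdot \log\big(N / \mathrm{df}(p)\big)$.
• $\mathrm{tf}_A(p)$ is the number of times phrase $p$ appears in blog A.
• $N = 1{,}768{,}887$ is the number of all blogs in the “blogPulse” dataset, found in Oct-Nov 2004.
• $\mathrm{df}(p)$ is the number of blogs containing phrase $p$, out of all $N$ blogs from “blogPulse”.
• Results: average similarity between blogs of opposite persuasions (0.1) was smaller than that of liberal (0.57) and conservative (0.54) pairs.
• Reminder: cosine similarity is defined as $\cos(A, B) = \frac{X_A \cdot X_B}{\lVert X_A \rVert \, \lVert X_B \rVert}$.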
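The phrase-based comparison can be sketched as below. The weighting uses the common tf × log(N / df) form of TF*IDF, which is an assumption about the exact formula used in the study; the function names and input dictionaries are likewise hypothetical.

```python
import math


def tfidf_vector(phrase_counts, doc_freq, n_docs):
    """Weight each phrase count by log(N / df), a common TF*IDF form (assumed).

    phrase_counts: mapping phrase -> times it appears in one blog.
    doc_freq: mapping phrase -> number of blogs in the background set containing it.
    n_docs: total number of blogs in the background set (N).
    """
    return {
        phrase: count * math.log(n_docs / doc_freq[phrase])
        for phrase, count in phrase_counts.items()
        if count > 0
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(weight * v.get(phrase, 0.0) for phrase, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Phrases that appear in most blogs get a weight near zero, so the similarity is driven by the rarer, more informative phrases that two blogs have in common.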
23
Analysis 3: Interaction with mainstream media
Focusing on links to formal news articles, some online news sites (e.g. National Review, Washington Times) were found to receive the majority of their links from conservative blogs, while others (e.g. LA Times, Wall Street Journal) received most of theirs from liberal blogs.
24
Analysis 3: Interaction with mainstream media
[Figures: Dataset #1 and Dataset #2]
25
Analysis 3: Interaction with mainstream media
Mentions of the “CBS forged documents” article, on a time-series graph:
26
Analysis 4: Mentioning names of political figures
Overall pattern: Democrats are more often mentioned by right-leaning bloggers, and vice versa...
27
Summing up
The political blogosphere is, in some ways, divided between liberals and conservatives:
• Links are mostly within each community
• Discussion topics and political figures mentioned differ
• Conservative blogs are more tightly linked
Future research directions: dividing posts by author instead of by blog; how news and ideas spread in both communities; and whether blogs that count as neither “liberal” nor “conservative” form a bridge between the two or rather a separate community.
28
A peek into later work by one of the authors: