Measuring Self-Focus Bias in Community-Maintained Knowledge Repositories
Brent Hecht and Darren Gergle
Northwestern University
Introduction
• Artificial Intelligence
• Natural Language Processing
• Human-Computer Interaction
• CSCW
• self-focus bias
  • effect of community-held opinions and interests on the world knowledge in Wikipedia
  • if it exists, it has both positive and negative consequences
Introduction: terms and concepts
[Figure: subset of the English Wikipedia Article Graph (WAG), showing the articles “Barack Obama”, “The United States”, and “Joe Biden”]
• “Barack Obama” has 2 inlinks
• “Barack Obama” has an indegree of 2
• indegree → what people are writing about
• indegree → relatedness to the sum of world knowledge in each Wikipedia
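The indegree definition above is easy to state in code. The sketch below is a minimal illustration, not the authors' implementation: it assumes the article graph is available as a list of (source, target) link pairs, and the two edges into “Barack Obama” are hypothetical stand-ins for the figure.

```python
from collections import Counter

# Hypothetical toy edges (source article, target article). The slide only says
# that "Barack Obama" has 2 inlinks; which articles supply them is assumed here.
links = [
    ("The United States", "Barack Obama"),
    ("Joe Biden", "Barack Obama"),
]


def indegree(edges):
    """Count distinct inlinks per target article.

    Repeated links between the same pair of articles are counted once
    (an assumption; the slides do not say how duplicates are handled).
    """
    return Counter(target for _source, target in set(edges))


deg = indegree(links)
print(deg["Barack Obama"])  # 2
print(deg["Joe Biden"])     # 0: Counter returns 0 for articles with no inlinks
```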
Study 1 methods: definition of focus
• focus = indegree in Wikipedia Article Graph (WAG)
• greater indegree = greater focus
• compare across 15 Wikipedias
[Figure: the same concept in two Wikipedias]
• English Wikipedia: “Penn State University”, linked from “Jonathan Frakes”, “Pennsylvania”, and “Interstate 99” → indegree = 3
• French Wikipedia: “Université d'État de Pennsylvanie” (articles shown: “Jonathan Frakes”, “Pennsylvania”) → indegree = 1
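Comparing focus across Wikipedias requires knowing that “Penn State University” and “Université d'État de Pennsylvanie” are the same concept. The sketch below assumes a hypothetical title-to-concept mapping (`CONCEPT_OF`, standing in for interlanguage links) and hypothetical edge lists that merely reproduce the indegrees shown on the slide; it makes no claim about how the authors' software performs this alignment.

```python
# Hypothetical interlanguage mapping: per-language article title -> shared concept ID.
CONCEPT_OF = {
    "en": {"Penn State University": "penn_state"},
    "fr": {"Université d'État de Pennsylvanie": "penn_state"},
}

# Hypothetical edges chosen only to reproduce the slide's indegrees (3 vs. 1).
LINKS = {
    "en": [
        ("Jonathan Frakes", "Penn State University"),
        ("Pennsylvania", "Penn State University"),
        ("Interstate 99", "Penn State University"),
    ],
    "fr": [
        ("Pennsylvanie", "Université d'État de Pennsylvanie"),
    ],
}


def focus_by_concept(lang):
    """Return {concept_id: indegree} for one language edition."""
    mapping = CONCEPT_OF[lang]
    focus = {}
    for _source, target in set(LINKS[lang]):
        concept = mapping.get(target)
        if concept is not None:  # skip targets with no cross-language mapping
            focus[concept] = focus.get(concept, 0) + 1
    return focus


for lang in ("en", "fr"):
    print(lang, focus_by_concept(lang).get("penn_state", 0))
# en 3
# fr 1
```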
Experiment methods
Poutine!
[Image: http://commons.wikimedia.org/wiki/File:Poutine.JPG]
Study 1 methods: definition of focus
[Figure: “Poutine” in two Wikipedias]
• English Wikipedia: “Poutine” (articles shown include “French Fries” and “Cheddar Cheese”) → indegree = 0
• French Wikipedia: “Poutine”, linked from “Chez Ashton”, “French Fries”, and “Cheddar Cheese” → indegree = 3
Study 1 methods: sample and statistic
[Figure: Finnish geographic articles (“Finland”, “Helsinki”, “Rovaniemi”) with surrounding articles “Finno-Ugric Languages”, “Sub-arctic Climate”, “Linus Torvalds”, and “Flying Finn Airline”]
• Finland has an indegree sum = 4
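An indegree sum per country can be sketched as summing the indegrees of every geographic article located in that country. The article-to-country mapping (`COUNTRY_OF`) and the edges below are hypothetical, chosen only so the total matches the slide's value of 4; how the study located articles geographically is not shown here.

```python
from collections import Counter, defaultdict

# Hypothetical: country in which each geographic article is located.
COUNTRY_OF = {"Finland": "Finland", "Helsinki": "Finland", "Rovaniemi": "Finland"}

# Hypothetical edges, chosen so that the total matches the slide's sum of 4.
LINKS = [
    ("Finno-Ugric Languages", "Finland"),
    ("Sub-arctic Climate", "Rovaniemi"),
    ("Linus Torvalds", "Helsinki"),
    ("Flying Finn Airline", "Helsinki"),
]


def indegree_sums(edges, country_of):
    """Sum the indegrees of all geographic articles located in each country."""
    indeg = Counter(target for _source, target in set(edges))
    sums = defaultdict(int)
    for article, country in country_of.items():
        sums[country] += indeg[article]
    return dict(sums)


print(indegree_sums(LINKS, COUNTRY_OF))  # {'Finland': 4}
```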
Study 1 null hypothesis
H0: Indegree sums will have roughly the same distribution in every Wikipedia
All Wikipedias agree on focus distribution
Self-focus bias does not exist
Study 1 self-focus hypothesis
H1: Each language’s Wikipedia will have higher indegree sums in countries where the language is prominent
Each Wikipedia will demonstrate greater focus on its language’s culture hearth
Self-focus bias exists
Study 1 results: German Wikipedia
Country           Indegree sum
Germany                718,668
United States          114,720
France                 110,554
Switzerland            103,387
Austria                 95,986
Italy                   93,116
Study 1 results: Finnish Wikipedia
Country           Indegree sum
Finland                 55,331
United States           25,664
Germany                 11,972
Russia                  10,076
United Kingdom           9,402
Italy                    7,948
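Because the Wikipedias differ greatly in size, raw indegree sums are hard to compare directly when thinking about H0. One simple option, assumed here rather than taken from the talk, is to convert each Wikipedia's sums into shares of its total. The sketch uses only the six rows shown in the German and Finnish tables, so the shares are relative to those rows rather than to all countries.

```python
# Indegree sums for the six countries listed for each Wikipedia (from the tables above).
TABLES = {
    "German": {"Germany": 718_668, "United States": 114_720, "France": 110_554,
               "Switzerland": 103_387, "Austria": 95_986, "Italy": 93_116},
    "Finnish": {"Finland": 55_331, "United States": 25_664, "Germany": 11_972,
                "Russia": 10_076, "United Kingdom": 9_402, "Italy": 7_948},
}


def shares(table):
    """Convert raw indegree sums into fractions of the listed total."""
    total = sum(table.values())
    return {country: value / total for country, value in table.items()}


normalized = {lang: shares(table) for lang, table in TABLES.items()}

# Germany's share in each Wikipedia; a country missing from a table counts as 0 here.
for lang, share in normalized.items():
    print(lang, round(share.get("Germany", 0.0), 3))
```

Under H0 the same country would take roughly the same share everywhere; in these rows Germany's share in the German Wikipedia is several times its share in the Finnish one.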
Study 1 results: Japanese Wikipedia
Country           Indegree sum
Japan                  453,048
Italy                   70,922
United States           60,384
China                   37,208
Germany                 25,276
United Kingdom          20,690
Study 1 results: English Wikipedia
Country           Indegree sum   Language prominent?
United States        1,366,261   Y
United Kingdom         439,582   Y
France                 189,698   N
Germany                151,503   N
Canada                 146,191   Y
Italy                  129,133   N
Study 1 results: English Wikipedia
Self-focus ratio = Num / Den
• Num: computed from the countries marked Y (language prominent)
• Den: computed from the countries marked N
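The Num / Den layout suggests a ratio that sets countries marked Y against countries marked N. The slides do not give the exact formula, so the sketch below uses an assumed simplification: the mean indegree sum of Y countries divided by the mean of N countries, over only the six English Wikipedia rows shown. It yields about 4.15, not the 7.2 reported for English, which is expected if the talk's ratio draws on more countries or uses a different formula.

```python
from statistics import mean

# English Wikipedia rows shown on the slide, with the slide's Y/N marks.
ROWS = [
    ("United States", 1_366_261, True),
    ("United Kingdom", 439_582, True),
    ("France", 189_698, False),
    ("Germany", 151_503, False),
    ("Canada", 146_191, True),
    ("Italy", 129_133, False),
]


def simple_ratio(rows):
    """Mean indegree sum of Y countries divided by that of N countries.

    Illustrative simplification only; not the talk's exact self-focus ratio.
    """
    num = mean(value for _country, value, prominent in rows if prominent)
    den = mean(value for _country, value, prominent in rows if not prominent)
    return num / den


print(round(simple_ratio(ROWS), 2))  # 4.15
```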
Study 1 results: self-focus ratio by language
Language      Self-focus ratio
English       7.2
Japanese      6.4
German        6.3
French        4.2
Italian       3.6
Catalan       2.9
Spanish       2.4
Finnish       2.2
Polish        1.7
Norwegian     1.4
Chinese       1.2
Dutch         0.7
Swedish       0.6
Portuguese    0.3
Study 2 methods: sample and statistic
• sample = geographic articles
• statistic = spatial indegree sums
• revised statistic = spatial PageRank score sums
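The talk does not say which PageRank variant was used for the spatial PageRank score sums. The sketch below assumes the standard power-iteration formulation with a damping factor of 0.85 and uniform redistribution of dangling-node scores, then sums the scores of each country's geographic articles. The graph and the article-to-country mapping are hypothetical toy data.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iters=100):
    """Power-iteration PageRank over a directed edge list.

    Assumptions: damping 0.85, a fixed number of iterations, and the score of
    dangling nodes (no outlinks) spread uniformly over all nodes.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out = {node: [] for node in nodes}
    for source, target in set(edges):
        out[source].append(target)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for source in nodes:
            for target in out[source]:
                new[target] += damping * rank[source] / len(out[source])
        rank = new
    return rank


def pagerank_sums(edges, country_of):
    """Sum PageRank scores of the geographic articles located in each country."""
    rank = pagerank(edges)
    sums = defaultdict(float)
    for article, country in country_of.items():
        sums[country] += rank.get(article, 0.0)
    return dict(sums)


# Hypothetical toy graph loosely based on the Finland example from Study 1.
COUNTRY_OF = {"Finland": "Finland", "Helsinki": "Finland", "Rovaniemi": "Finland"}
LINKS = [
    ("Finno-Ugric Languages", "Finland"),
    ("Sub-arctic Climate", "Rovaniemi"),
    ("Linus Torvalds", "Helsinki"),
    ("Flying Finn Airline", "Helsinki"),
    ("Helsinki", "Finland"),
]
print(pagerank_sums(LINKS, COUNTRY_OF))
```

Unlike a plain indegree sum, this statistic weights each inlink by the score of the article it comes from.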
Discussion: hyperlingual approach
• 15 Wikipedias (22)
• over 8 million articles
• over 270 million links
• English Wikipedia is less than 1/4 of the data
• it was “easy” with the WikAPIdia software
Discussion: hyperlingual approach
• general benefits
  • similarities → more robust findings
  • differences → cultural diversity
• mine cultural diversity
  • “culturally-aware applications”
• used very rarely in the literature
Conclusion: Cliffs Notes
1. self-focus is a systemic bias in Wikipedia
   • people reorient world knowledge around themselves
   • many implications for technologies
2. the hyperlingual approach proved very useful
Acknowledgements
• Nada Petrović
• Colleagues at the Collabolab
• NSF #0705901
• Microsoft Research