disinformation on the web: impact, characteristics and detection of wikipedia hoaxes
TRANSCRIPT
![Page 1: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/1.jpg)
Disinformation on the Web: Impact, Characteristics and Detection of Wikipedia Hoaxes
Srijan Kumar, Univ. of Maryland
Robert West, Stanford Univ.
Jure Leskovec, Stanford Univ.
Originally presented at the 25th International World Wide Web Conference, Montreal, Canada, April 2016
![Page 2: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/2.jpg)
Web: Source of information
62% of adults in the U.S.A. rely on social media for news
28% of 18-24 year olds use social media as their primary news source
![Page 3: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/3.jpg)
Web: Source of false information
![Page 4: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/4.jpg)
Types of false information
Misinformation: honest mistake
Disinformation: deliberate lie to mislead
Hoax: “deliberately fabricated falsehood made to masquerade as truth” (Wikipedia)
![Page 5: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/5.jpg)
Why Wikipedia?
The free encyclopedia that anyone can edit
Easy to add (false) information
• Freely accessible
• Large reach
• Major source of information for many
![Page 6: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/6.jpg)
Hoaxes on Wikipedia
![Page 7: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/7.jpg)
Data: Wikipedia Hoaxes
Hoax article vs hoax facts
![Page 8: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/8.jpg)
Data: Wikipedia Hoaxes
Hoax article vs hoax facts
21,218 hoax articles
Hoax lifecycle:
![Page 9: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/9.jpg)
Wikipedia hoaxes
Impact of hoaxes: Quantify their impact?
Characteristics of hoaxes: What are the hoaxes like?
Detection of hoaxes: Can we find them?
![Page 10: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/10.jpg)
Impact of hoaxes
“The worst hoaxes are those which (a) last for a long time, (b) receive significant traffic, (c) are relied upon by credible news media.” (Jimmy Wales on Quora)
![Page 11: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/11.jpg)
Impact of hoaxes
“The worst hoaxes are those which (a) last for a long time”
[Figure: distribution of the time t between patrolling and flagging, with the levels 0.90 and 0.99 marked]
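A minimal sketch of how a survival-time curve of this kind can be computed, assuming one number per hoax: the hours between patrol and flagging. The function name `fraction_flagged_within` and the sample values are hypothetical and are not the study's data.

```python
from bisect import bisect_right


def fraction_flagged_within(survival_hours, t):
    """Empirical CDF: share of hoaxes flagged at most t hours after patrol."""
    ordered = sorted(survival_hours)
    # bisect_right counts how many values are <= t in the sorted list.
    return bisect_right(ordered, t) / len(ordered)


# Made-up survival times in hours, for illustration only.
example_hours = [0.2, 0.5, 0.8, 1.0, 3.0, 24.0, 200.0, 9000.0]

within_hour = fraction_flagged_within(example_hours, 1.0)
within_day = fraction_flagged_within(example_hours, 24.0)
print(within_hour, within_day)
```

Reading the curve at a given level, such as 0.90, gives the time by which that share of hoaxes has been flagged.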
![Page 12: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/12.jpg)
Impact of hoaxes
“The worst hoaxes are those which (b) receive significant traffic”
[Figure: distribution of the number n of pageviews per day, with axis marks at 10, 100 and 500]
![Page 13: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/13.jpg)
Impact of hoaxes
“The worst hoaxes are those which (c) are relied upon by credible news media”
1.08 active inlinks per hoax article, on average
7% of hoax articles have at least 5 active inlinks
![Page 14: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/14.jpg)
Wikipedia hoaxes
Impact of hoaxes: Most hoaxes are caught soon, but some hoaxes are impactful
Characteristics of hoaxes: What are the hoaxes like?
Detection of hoaxes: Can we find them?
![Page 15: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/15.jpg)
Hoax
Successful hoax: pass patrol, survive for a month, viewed 100+/day
Failed hoax: flagged and deleted during patrol
Non-hoax
Wrongly flagged: temporarily flagged
Legitimate articles: never flagged
![Page 16: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/16.jpg)
Characteristics of hoaxes
Appearance: how the article looks
Link-network: how the article connects
Support: how other articles refer to it
Editor: how the article creator looks
![Page 17: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/17.jpg)
Characteristics of hoaxes
Surprisingly, hoax articles are longer than non-hoax articles!
Features:
○ Plain-text length
Appearance: how the article looks
Link-network: how the article connects
Support: how other articles refer to it
Editor: how the article creator looks
![Page 18: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/18.jpg)
Characteristics of hoaxes
Surprisingly, hoax articles are longer than non-hoax articles! But they mostly have plain text and have fewer web and wiki links.
Appearance: how the article looks
Link-network: how the article connects
Support: how other articles refer to it
Editor: how the article creator looks
Features:
○ Plain-text length
○ Plain-text-to-markup ratio
○ Wiki-link density
○ Web-link density
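The four appearance features listed above could be computed roughly as follows. This is a sketch under stated assumptions, not the authors' implementation: the slides do not define the features precisely, so the regular expressions, the word-based length, the character-based ratio and the "links per 100 words" density are all illustrative choices, and `appearance_features` is a hypothetical name.

```python
import re

# Simplified wikitext patterns (assumption: nested markup is ignored).
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]*))?\]\]")        # [[Target]] or [[Target|label]]
WEB_LINK = re.compile(r"\[(https?://[^\s\]]+)(?:\s+([^\]]*))?\]")  # [http://example.org label]
TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")                           # {{template}}
TAG = re.compile(r"<[^>]+>")                                       # <ref>, </ref>, ...


def appearance_features(wikitext):
    """Return the four appearance features for one article's wikitext."""
    n_wiki = len(WIKI_LINK.findall(wikitext))
    n_web = len(WEB_LINK.findall(wikitext))

    # Strip markup to approximate the plain text a reader sees.
    plain = TEMPLATE.sub(" ", wikitext)
    plain = TAG.sub(" ", plain)
    plain = WIKI_LINK.sub(lambda m: m.group(2) or m.group(1), plain)
    plain = WEB_LINK.sub(lambda m: m.group(2) or "", plain)
    plain = plain.replace("'''", "").replace("''", "")

    words = plain.split()
    n_words = len(words)
    plain_chars = len(" ".join(words))

    return {
        "plain_text_length": n_words,  # assumed unit: words
        "plain_text_to_markup_ratio": plain_chars / len(wikitext) if wikitext else 0.0,
        "wiki_link_density": 100.0 * n_wiki / n_words if n_words else 0.0,  # links per 100 words
        "web_link_density": 100.0 * n_web / n_words if n_words else 0.0,
    }


sample = "'''Foo''' is a [[town]] in [[Bar County|Bar]]. See [http://example.org site]."
features = appearance_features(sample)
print(features)
```

Under these definitions, a long article that is almost all plain text and has few links scores high on the first two features and low on the two densities, which matches the pattern the slide reports for hoaxes.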
![Page 19: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/19.jpg)
Characteristics of hoaxes
Clustering coefficient = 0: incoherent article
Clustering coefficient > 0: coherent article
Legitimate articles are more coherent than successful hoaxes
Appearance: hoaxes mostly have text and few references.
Link-network: how the article connects
Support: how other articles refer to it
Editor: how the article creator looks
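The coherence measure on this slide can be illustrated with a small clustering-coefficient computation. The sketch assumes the coefficient is taken over the articles that an article links to, with links treated as undirected; the slides do not spell out these details, and the function name and toy link graph are hypothetical.

```python
from itertools import combinations


def ego_clustering_coefficient(outlinks, links):
    """Fraction of pairs of linked-to articles that are themselves connected.

    outlinks: articles that the article under study links to.
    links: dict mapping an article to the set of articles it links to.
    """
    neighbors = sorted(set(outlinks))
    if len(neighbors) < 2:
        return 0.0  # no pairs to compare

    def connected(a, b):
        # Assumption: a link in either direction counts as a connection.
        return b in links.get(a, set()) or a in links.get(b, set())

    pairs = list(combinations(neighbors, 2))
    linked_pairs = sum(1 for a, b in pairs if connected(a, b))
    return linked_pairs / len(pairs)


# Toy link graph, for illustration only.
links = {
    "Physics": {"Energy", "Mass"},
    "Energy": {"Mass"},
    "Mass": set(),
}

coherent = ego_clustering_coefficient(["Physics", "Energy", "Mass"], links)
incoherent = ego_clustering_coefficient(["Popcorn", "Opera", "Glacier"], links)
print(coherent, incoherent)
```

An article whose outlinks point to mutually unrelated topics yields a coefficient of 0, the "incoherent" case on the slide.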
![Page 20: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/20.jpg)
Characteristics of hoaxes
Hoaxes have fewer mentions.
Features:
○ Number of prior mentions
Appearance: hoaxes mostly have text and few references.
Link-network: hoaxes have incoherent wikilinks.
Support: how other articles refer to it
Editor: how the article creator looks
![Page 21: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/21.jpg)
Characteristics of hoaxes
Hoaxes have fewer mentions, and those mentions are mostly created by the article creator or anonymously, and are more recent.
Features:
○ Number of prior mentions
○ Creator of first mention
○ Time since first mention
Appearance: hoaxes mostly have text and few references.
Link-network: hoaxes have incoherent wikilinks.
Support: how other articles refer to it
Editor: how the article creator looks
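The three support features could be derived from a list of mentions as sketched below. The record layout (a `time` and an `editor`, with `None` standing for an anonymous edit), the reading of "prior" as "before the article was created", and the name `support_features` are assumptions made for illustration.

```python
from datetime import datetime


def support_features(mentions, article_created, article_creator):
    """Summarize mentions of an article title that predate the article itself."""
    prior = sorted(
        (m for m in mentions if m["time"] < article_created),
        key=lambda m: m["time"],
    )
    if not prior:
        return {
            "num_prior_mentions": 0,
            "first_mention_by_creator_or_anon": None,
            "days_since_first_mention": None,
        }
    first = prior[0]
    return {
        "num_prior_mentions": len(prior),
        # Suspicious if the first mention came from the creator or an anonymous editor.
        "first_mention_by_creator_or_anon": first["editor"] is None
        or first["editor"] == article_creator,
        "days_since_first_mention": (article_created - first["time"]).days,
    }


# Made-up mention records, for illustration only.
created = datetime(2015, 6, 1)
mentions = [
    {"time": datetime(2015, 5, 22), "editor": None},     # anonymous edit
    {"time": datetime(2015, 5, 30), "editor": "alice"},
    {"time": datetime(2015, 7, 1), "editor": "bob"},     # after creation, ignored
]

support = support_features(mentions, created, article_creator="alice")
print(support)
```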
![Page 22: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/22.jpg)
Characteristics of hoaxes
Hoax creators are more recently registered and have less editing experience.
Features:
○ Creator’s time since registration
○ Creator’s experience
Appearance: hoaxes mostly have text and few references.
Link-network: hoaxes have incoherent wikilinks.
Support: hoaxes have few, recent, suspicious mentions.
Editor: how the article creator looks
![Page 23: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/23.jpg)
Wikipedia Hoaxes
Impact of hoaxes: Most hoaxes are caught soon, but some hoaxes are impactful
Characteristics of hoaxes: Hoaxes are different from non-hoaxes in many respects
Detection of hoaxes: Can we find them?
![Page 24: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/24.jpg)
Detection of hoaxes
Will a hoax get past patrol? AUC = 71%, Appearance features
Is an article a hoax? AUC = 98%, Editor and Network features
Is an article flagged as hoax really one? AUC = 86%, Editor and support features
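For readers unfamiliar with the AUC figures above: AUC is the probability that a classifier scores a randomly chosen positive example (here, a hoax) above a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition. It is a generic illustration with made-up scores, not the classifier or data behind the reported numbers.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Made-up classifier scores; label 1 = hoax, 0 = non-hoax.
example_scores = [0.9, 0.8, 0.4, 0.3]
example_labels = [1, 0, 1, 0]

example_auc = auc(example_scores, example_labels)
print(example_auc)
```

An AUC of 50% corresponds to random guessing and 100% to a perfect ranking, which is the scale on which the three results should be read.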
![Page 25: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/25.jpg)
We discovered previously unknown hoaxes!
Flagged by us and deleted by Wikipedia administrators
Steve Moertel: American popcorn entrepreneur
Article survived over 6 years 11 months!
![Page 26: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/26.jpg)
Can readers identify hoaxes?
320 random hoax and non-hoax pairs
10 raters on Amazon Mechanical Turk rated each pair
Results:
Random: 50%
Human: 66%
Classifier: 86%
Casual readers are easily fooled by hoaxes. Accurate detection needs non-appearance features.
![Page 27: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/27.jpg)
What fools humans?
Humans get fooled when an article looks more “genuine” and is therefore assumed to be credible.
Comparing easy- vs hard-to-identify hoaxes
![Page 28: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/28.jpg)
How to identify misinformation on the web?
● Appearance
○ How well referenced is the information source?
○ What is the content of the article?
● Editor
○ Who created the information?
● Network
○ How related is this information to the other information it references?
● Support
○ Is there any evidence of the information, prior to its creation?
![Page 29: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes](https://reader036.vdocuments.us/reader036/viewer/2022070517/58cefb891a28abab738b5a8b/html5/thumbnails/29.jpg)
Wikipedia Hoaxes
Impact of hoaxes: Most hoaxes are caught soon, but some hoaxes are impactful
Characteristics of hoaxes: Hoaxes are different from non-hoaxes in many respects
Detection of hoaxes: Non-appearance features are important to detect hoaxes