The Italian Hate Map: semantic content analytics for social good

The Italian Hate Map: semantic content analytics for social good Cataldo Musto, Giovanni Semeraro, Pasquale Lops, Marco de Gemmis (Università degli Studi di Bari ‘Aldo Moro’, Italy - SWAP Research Group) I-CiTies 2015 2015 CINI Annual Workshop on ICT for Smart Cities and Communities Palermo (Italy) - October 29-30, 2015

Upload: cataldo-musto

Post on 21-Jan-2018


TRANSCRIPT

Page 1: The Italian Hate Map: semantic content analytics for social good

The Italian Hate Map: semantic content analytics for social good

Cataldo Musto, Giovanni Semeraro, Pasquale Lops, Marco de Gemmis (Università degli Studi di Bari ‘Aldo Moro’, Italy - SWAP Research Group)

I-CiTies 2015 - 2015 CINI Annual Workshop on ICT for Smart Cities and Communities

Palermo (Italy) - October 29-30, 2015

Page 2: The Italian Hate Map: semantic content analytics for social good

Cataldo Musto, Giovanni Semeraro, Pasquale Lops, Marco de Gemmis. The Italian Hate Map: semantic content analytics for social good. iCities 2015 Workshop, Palermo (Italy), 30.10.2015

Page 3: The Italian Hate Map: semantic content analytics for social good

The Italian Hate Map

Inspired by the Hate Map built by Humboldt University:
http://users.humboldt.edu/mstephens/hate/hate_map.html

Joint research with a team of psychologists from Rome University and a non-profit agency focused on human rights.

Page 4: The Italian Hate Map: semantic content analytics for social good

The Italian Hate Map

http://users.humboldt.edu/mstephens/hate/hate_map.html

Insight: aggregate raw, people-generated data in order to analyze complex phenomena.

Page 5: The Italian Hate Map: semantic content analytics for social good

The Italian Hate Map

(Not a new idea) Map of cholera in London, 1854

red = cholera cases, blue = water

Page 6: The Italian Hate Map: semantic content analytics for social good

The Italian Hate Map

Research Question: Is it possible to extract and process social media data to detect intolerant content posted on social networks and to identify the most at-risk areas of Italy?

Page 7: The Italian Hate Map: semantic content analytics for social good

CrowdPulse

A framework for real-time Semantic Analysis of Social Streams

Page 8: The Italian Hate Map: semantic content analytics for social good

CrowdPulse: features

• Social Data Extraction
• Semantic Tagging
• Sentiment Analysis
• Processing & Visualization

Page 9: The Italian Hate Map: semantic content analytics for social good

CrowdPulse: workflow

Page 10: The Italian Hate Map: semantic content analytics for social good

CrowdPulse

Step 1: Social Data Extraction

Page 11: The Italian Hate Map: semantic content analytics for social good

CrowdPulse

Step 1: Social Data Extraction

Extraction: Source, Heuristics

Page 12: The Italian Hate Map: semantic content analytics for social good

CrowdPulse

Step 1: Social Data Extraction

Extraction: Source, Heuristics

Page 13: The Italian Hate Map: semantic content analytics for social good

CrowdPulse

Step 1: Social Data Extraction

Extraction: Source, Heuristics

Heuristics: Content, User, Geo, Content+Geo, Page, Group

Examples: #icities2015, #democrats, #traffic, #earthquake, @barack_obama, @comunepalermo

Page 14: The Italian Hate Map: semantic content analytics for social good

CrowdPulse

Step 1: Social Data Extraction

Extraction: Source, Heuristics

Heuristics: Content, User, Geo, Content+Geo, Page, Group

Examples: #www2015, #democrats, #traffic, #earthquake, @barack_obama, @comunefi

We only extract public content.
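The extraction heuristics listed on this slide could be modelled roughly as follows. This is only a sketch under stated assumptions: the class name `Heuristic`, the field names of a post (`text`, `author`, `coords`, `public`) and the bounding box around Palermo are all hypothetical and are not taken from CrowdPulse itself.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical bounding box type: (min_lat, min_lon, max_lat, max_lon).
BBox = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Heuristic:
    """One extraction rule; unset fields are simply not checked."""
    hashtag: Optional[str] = None  # Content heuristic, e.g. "#traffic"
    user: Optional[str] = None     # User heuristic, e.g. "@comunepalermo"
    bbox: Optional[BBox] = None    # Geo heuristic

    def matches(self, post: dict) -> bool:
        # Only public content is ever extracted.
        if not post.get("public", False):
            return False
        if self.hashtag and self.hashtag.lower() not in post["text"].lower():
            return False
        if self.user and post["author"].lower() != self.user.lower():
            return False
        if self.bbox:
            coords = post.get("coords")
            if coords is None:
                return False
            lat, lon = coords
            min_lat, min_lon, max_lat, max_lon = self.bbox
            if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
                return False
        return True


# Content+Geo heuristic: "#traffic" inside an assumed box around Palermo.
traffic_in_palermo = Heuristic(hashtag="#traffic", bbox=(38.0, 13.2, 38.3, 13.5))

post_ok = {"text": "Code in centro #traffic", "author": "@someone",
           "coords": (38.12, 13.36), "public": True}
post_private = dict(post_ok, public=False)
post_no_geo = dict(post_ok, coords=None)
```

Combining the optional fields in one rule keeps Content, User, Geo and Content+Geo as special cases of the same check, which is one possible design and not necessarily the one used in the framework.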

Page 15: The Italian Hate Map: semantic content analytics for social good

Use Case: The Italian Hate Map

CROWDPULSE SETTINGS

Heuristics: Twitter content
- 76 intolerant seed terms, defined by the team of psychologists
- 5 intolerance dimensions: violence (against women), racism, homophobia, disability, anti-Semitism

Page 16: The Italian Hate Map: semantic content analytics for social good

Use Case: The Italian Hate Map

CROWDPULSE SETTINGS

Extracted content (seed term: nano/midget):
- Tweet about an Italian minister
- Tweet about iPod nano
- Tweet about an Italian football player

Page 17: The Italian Hate Map: semantic content analytics for social good

Use Case: The Italian Hate Map

CROWDPULSE SETTINGS

- Tweet about an Italian minister
- Tweet about iPod nano
- Tweet about an Italian football player

Many non-intolerant Tweets are extracted!
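A minimal sketch of why plain seed-term matching over-extracts. The seed list here is a placeholder: the slides do not publish the 76 terms, "nano" is the only one they mention, and the dimension assigned to it below is an assumption. The sample tweets are invented for illustration.

```python
import re

# Hypothetical seed list; only "nano" appears in the slides, and mapping it
# to the "disability" dimension is an assumption made for this example.
SEED_TERMS = {"nano": "disability"}


def match_dimensions(text):
    """Return the intolerance dimensions whose seed terms occur as tokens."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return {dim for term, dim in SEED_TERMS.items() if term in tokens}


tweets = [
    "Ho comprato un iPod nano",      # about a product, not intolerant
    "Sei solo un nano",              # possibly intolerant
    "Che bella giornata a Palermo",  # contains no seed term
]

# Keyword-based extraction keeps every tweet that contains a seed term,
# so the iPod tweet is extracted together with the possibly intolerant one.
extracted = [t for t in tweets if match_dimensions(t)]
```

Both tweets containing "nano" are extracted, although only one of them may be intolerant; keyword matching alone cannot tell them apart.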

Page 18: The Italian Hate Map: semantic content analytics for social good

Use Case: The Italian Hate Map

CROWDPULSE SETTINGS

Sentiment Analysis and Semantic Tagging of the content

Page 19: The Italian Hate Map: semantic content analytics for social good

Semantic Tagging: Motivations

Keyword-based representation introduces a lot of noise in the analysis.

nano: midget, or iPod nano?

Page 20: The Italian Hate Map: semantic content analytics for social good

Semantic Tagging: Motivations

“È inutile, il mio nano non segnerà mai” ("It's no use, my midget will never score")

INTOLERANT or NOT INTOLERANT?

Page 21: The Italian Hate Map: semantic content analytics for social good

CrowdPulse

Step 2: Semantic Tagging

Solution: semantic processing of extracted content

• Entity Linking Algorithms
• Input: textual content
• Output: identification and disambiguation of the entities mentioned in the text

Algorithms:
(1) http://tagme.di.unipi.it
(2) http://spotlight.dbpedia.org
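The idea of using entity linking to discard harmless uses of a seed term can be illustrated with a toy stand-in. The sketch below does not call TagMe or DBpedia Spotlight; it replaces them with a hand-written sense inventory, and every name in it (`CANDIDATES`, `HARMLESS`, `link_entity`, the entity label `iPod_Nano`, the cue words) is an assumption for illustration only.

```python
import re

# Toy sense inventory: candidate entities for an ambiguous term, each with a
# few context cue words. A real entity linker would supply this knowledge.
CANDIDATES = {
    "nano": {
        "iPod_Nano": {"ipod", "apple", "gb"},
    }
}

# Entities that signal a non-intolerant use of the seed term.
HARMLESS = {"iPod_Nano"}


def link_entity(term, text):
    """Return the candidate entity sharing the most cue words with the text,
    or None when no candidate has any supporting context."""
    tokens = set(re.findall(r"\w+", text.lower()))
    best, best_overlap = None, 0
    for entity, cues in CANDIDATES.get(term, {}).items():
        overlap = len(cues & tokens)
        if overlap > best_overlap:
            best, best_overlap = entity, overlap
    return best


def is_candidate_intolerant(term, text):
    """Keep a tweet unless the seed term resolves to a harmless entity."""
    return link_entity(term, text) not in HARMLESS
```

With this stub, a tweet such as "Ho comprato un iPod nano da 8 GB" resolves to the product entity and is dropped, while a tweet with no disambiguating context is kept for the later steps.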

Page 22: The Italian Hate Map: semantic content analytics for social good

Use Case: The Italian Hate Map

CROWDPULSE SETTINGS

Non-intolerant Tweets are detected and filtered out.

Page 23: The Italian Hate Map: semantic content analytics for social good

CrowdPulse

Step 3: Sentiment Analysis

Page 24: The Italian Hate Map: semantic content analytics for social good

Sentiment Analysis: Motivations

Is this content conveying any opinion?

Page 25: The Italian Hate Map: semantic content analytics for social good

Sentiment Analysis: Motivations

Is this content conveying any opinion?

This is a crucial issue if people-based findings are to be generated.

Page 26: The Italian Hate Map: semantic content analytics for social good

Sentiment Analysis: Definition

“It is the field of study that analyzes people’s opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes” (*)

(*) Pang, Bo, and Lillian Lee. "Opinion mining and sentiment analysis." Foundations and Trends in Information Retrieval, 2008.

We concentrated on the polarity detection task.

Page 27: The Italian Hate Map: semantic content analytics for social good

Use Case: The Italian Hate Map

CROWDPULSE SETTINGS

Tweets with positive or neutral sentiment are detected and filtered out.
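The polarity filter can be sketched as below. The slides do not say which classifier CrowdPulse uses, so this example swaps in a tiny lexicon-based scorer purely as a stand-in; the lexicon entries, the function names and the English sample words are all assumptions.

```python
import re

# Hypothetical polarity lexicon (placeholder words and weights).
LEXICON = {"hate": -2, "disgusting": -2, "stupid": -1, "love": 2, "great": 1}


def polarity(text):
    """Classify text as negative, positive or neutral by summing word weights."""
    tokens = re.findall(r"\w+", text.lower())
    score = sum(LEXICON.get(tok, 0) for tok in tokens)
    if score < 0:
        return "negative"
    if score > 0:
        return "positive"
    return "neutral"


def keep_negative(tweets):
    """Drop tweets with positive or neutral sentiment, as in the use case."""
    return [t for t in tweets if polarity(t) == "negative"]
```

Any polarity classifier with the same three output labels could be plugged in instead of `polarity` without changing the filtering step.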

Page 28: The Italian Hate Map: semantic content analytics for social good

Use Case: The Italian Hate Map

CROWDPULSE SETTINGS

Page 29: The Italian Hate Map: semantic content analytics for social good

CrowdPulse

Step 4: Processing

Page 30: The Italian Hate Map: semantic content analytics for social good

Use Case: The Italian Hate Map

CROWDPULSE SETTINGS

We have to build a map, so we only need geotagged content.

Page 31: The Italian Hate Map: semantic content analytics for social good

Use Case: The Italian Hate Map

CROWDPULSE SETTINGS

Definition of heuristics to increase the number of geotagged Tweets
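The slides do not state which heuristics were defined. One plausible example, shown here only as an assumption, is to fall back on the free-text location in the user profile when a tweet carries no coordinates. The gazetteer, the field names (`coords`, `user_location`) and the approximate city coordinates are hypothetical.

```python
# Tiny hypothetical gazetteer: city name -> approximate (lat, lon).
GAZETTEER = {
    "palermo": (38.1157, 13.3615),
    "bari": (41.1171, 16.8719),
    "roma": (41.9028, 12.4964),
}


def locate(tweet):
    """Return (lat, lon) for a tweet, or None if no location can be inferred.

    Explicit coordinates win; otherwise the first comma-separated part of the
    profile location is looked up in the gazetteer.
    """
    coords = tweet.get("coords")
    if coords is not None:
        return coords
    profile = tweet.get("user_location") or ""
    city = profile.split(",")[0].strip().lower()
    return GAZETTEER.get(city)
```

A profile location is a weaker signal than a geotag, so a real deployment would probably want to keep track of which tweets were located by fallback.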

Page 32: The Italian Hate Map: semantic content analytics for social good

Use Case: The Italian Hate Map

Dimension        #Tweets      #Geo     %Geo
Homophobia       110,774      8,501    7.66%
Racism           154,170      1,940    1.24%
Violence         1,102,494    28,886   2.62%
Disability       479,654      3,410    0.75%
Anti-Semitism    6,000        1,150    18.03%
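The totals and the share of geotagged Tweets can be recomputed from the two count columns as a quick consistency check. The counts below are copied from the table; the variable and function names are just illustrative. Shares recomputed this way need not coincide exactly with the printed %Geo column (for instance if some printed counts are rounded), and the slide values are kept as given.

```python
# (number of Tweets, number of geotagged Tweets) per dimension, from the table.
ROWS = {
    "Homophobia": (110_774, 8_501),
    "Racism": (154_170, 1_940),
    "Violence": (1_102_494, 28_886),
    "Disability": (479_654, 3_410),
    "Anti-Semitism": (6_000, 1_150),
}


def geo_share(tweets, geo):
    """Percentage of Tweets that carry a geotag."""
    return 100.0 * geo / tweets


total_tweets = sum(tweets for tweets, _ in ROWS.values())
total_geo = sum(geo for _, geo in ROWS.values())

for name, (tweets, geo) in ROWS.items():
    print(f"{name:<14} {tweets:>10,} {geo:>7,} {geo_share(tweets, geo):6.2f}%")
print(f"{'Total':<14} {total_tweets:>10,} {total_geo:>7,}")
```

The total of 1,853,092 Tweets is in line with the "almost 2,000,000" figure quoted in the conclusions.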

Page 33: The Italian Hate Map: semantic content analytics for social good

CrowdPulse

Step 4: Data Visualization

Page 34: The Italian Hate Map: semantic content analytics for social good

Use Case: The Italian Hate Map

CROWDPULSE OUTPUT

Maps (based on OpenStreetMap): Violence against women, Disability

Page 35: The Italian Hate Map: semantic content analytics for social good

Use Case: The Italian Hate Map

CROWDPULSE OUTPUT

Maps (based on OpenStreetMap): Racism, Homophobia
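Before points are drawn on a map, geotagged Tweets are typically aggregated by area. The slides do not describe how CrowdPulse does this; the sketch below assumes a simple fixed-size latitude/longitude grid, and the function names and the 0.5 degree cell size are arbitrary choices for illustration.

```python
import math
from collections import Counter


def grid_cell(lat, lon, size_deg=0.5):
    """Map a coordinate to the index of the grid cell that contains it."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def heat_counts(points, size_deg=0.5):
    """Count how many (lat, lon) points fall into each grid cell."""
    return Counter(grid_cell(lat, lon, size_deg) for lat, lon in points)


# Two points near Palermo and one near Bari (approximate coordinates).
points = [(38.12, 13.36), (38.20, 13.30), (41.12, 16.87)]
counts = heat_counts(points)
```

The per-cell counts can then be passed to any map layer that colors cells or markers by intensity.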

Page 36: The Italian Hate Map: semantic content analytics for social good

Conclusions

The Italian Hate Map

Crowdsourcing-based approach

1. Social content containing the seed terms is extracted and processed in real-time
2. Semantic Processing exploited to delete non-intolerant Tweets
3. Sentiment Analysis used to filter out Tweets with irony
4. Analytics Console used to build real-time hate maps

Almost 2,000,000 pieces of social content extracted and analyzed.

Page 37: The Italian Hate Map: semantic content analytics for social good

Lessons Learned

Page 38: The Italian Hate Map: semantic content analytics for social good

Lessons Learned: The Italian Hate Map

Given the maps and the output of the linguistic analysis of intolerant Tweets (co-occurrences between terms, time lapse, etc.), the team of psychologists defined guidelines to tackle and prevent intolerant behaviors.

These guidelines were freely distributed to public administrations in early 2015.
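Term co-occurrence, one of the linguistic analyses mentioned on this slide, can be counted with a few lines of code. This is a generic sketch, not the procedure used by the authors; the tokenization and the function name are assumptions.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrences(tweets):
    """Count, for each unordered pair of distinct terms, the number of
    tweets in which both terms appear."""
    counts = Counter()
    for text in tweets:
        # Sorting the unique tokens makes each pair appear in one fixed order.
        tokens = sorted(set(re.findall(r"\w+", text.lower())))
        counts.update(combinations(tokens, 2))
    return counts


pairs = cooccurrences(["a b c", "b a"])
```

In practice stop words would be removed first, so that frequent function words do not dominate the pair counts.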

Page 39: The Italian Hate Map: semantic content analytics for social good

Lessons Learned

DEFINITION OF A FRAMEWORK FOR REAL-TIME SEMANTIC CONTENT ANALYSIS

Pipeline of state-of-the-art techniques: Semantic Processing, Sentiment Analysis, Machine Learning, Data Visualization

Use Case: The Italian Hate Map

Thanks to the huge availability of textual data, very complex phenomena can be analyzed in a totally new way.

Page 40: The Italian Hate Map: semantic content analytics for social good

Questions?

Cataldo Musto, PhD

[email protected] @cataldomusto

http://www.di.uniba.it/~swap