crisis informatics (november 2013)

Crisis informatics: Finding relevant and credible information on social media during disasters

Upload: carlos-castillo

Post on 09-May-2015

DESCRIPTION

Talk at Microsoft Research, New York City, November 2013.

TRANSCRIPT

Page 1: Crisis Informatics (November 2013)

Crisis informatics: Finding relevant and credible information on social media during disasters

Page 2: Crisis Informatics (November 2013)

January 2010

How/when did it start for me?

Page 3: Crisis Informatics (November 2013)

Carlos Castillo – [email protected] – http://www.chato.cl/research/

Fertile grounds for applied research

✔ Problems of global significance
✔ Solved with labor-intensive methods
✔ Better solution provides a public good
✔ Large and noisy data sets available
✔ Engage volunteer communities

Page 5: Crisis Informatics (November 2013)

Publication titles

Page 6: Crisis Informatics (November 2013)
Page 7: Crisis Informatics (November 2013)

Fertile grounds for applied research

✔ Problem of global significance
✔ Solved with labor-intensive methods
✔ Better solution provides a public good
✔ Large and noisy data sets available
✔ Engage volunteer communities
• Relevance to practitioners?

Page 8: Crisis Informatics (November 2013)

Patrick Meier, Social Innovation Director @ QCRI – http://irevolution.net/

Page 9: Crisis Informatics (November 2013)

Patrick Meier, Social Innovation Director @ QCRI – http://irevolution.net/

“What can speed humanitarian response to tsunami-ravaged coasts? Expose human rights atrocities? Launch helicopters to rescue earthquake victims? Outwit corrupt regimes? A map.”

Page 10: Crisis Informatics (November 2013)

Collaborators

• Muhammad Imran – QCRI
• Hemant Purohit – Wright State Univ.
• Alexandra Olteanu – EPFL
• Jakob Rogstadius – Univ. of Madeira
• Ioanna Lykourentzou – INRIA
• Shady Elbassuoni – American Univ. of Beirut
• Lalana Kagal et al. – CSAIL MIT
• Fernando Diaz – Microsoft

Page 11: Crisis Informatics (November 2013)

Outline

• Motivation
• Handling crisis tweets
• Crowdsourced verification
• Ongoing work
  – Automatic classification
  – Resource matchmaking

Page 12: Crisis Informatics (November 2013)

Crisis Mapping

Hemant Purohit, Carlos Castillo, Patrick Meier and Amit Sheth: Crisis Mapping, Citizen Sensing and Social Media Analytics. Tutorial at ICWSM, May 2013.

Page 13: Crisis Informatics (November 2013)

I don't have time for social networks!

• We all have spare capacity
  – Television, TV series, Internet sites
• We overestimate ourselves in general
  – Don't underestimate social media users; it is a bad starting point

Page 14: Crisis Informatics (November 2013)
Page 15: Crisis Informatics (November 2013)
Page 16: Crisis Informatics (November 2013)
Page 17: Crisis Informatics (November 2013)
Page 18: Crisis Informatics (November 2013)

An earthquake hits a Twitter user

• When an earthquake strikes, the first tweets are posted 20-30 seconds later

• Damaging seismic waves travel at 3-5 km/s, while network communications travel at close to light speed over fiber/copper, plus latency

• After ~100km seismic waves may be overtaken by tweets about them

http://xkcd.com/723/
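
A back-of-the-envelope check of the ~100 km figure, as a minimal sketch. It assumes the tweet's network transit time is negligible next to the posting delay, so a tweet is readable everywhere at the moment it is posted, while the wave reaches distance d at time d / speed. The function name and parameters are illustrative and not from the talk.

```python
def overtake_distance_km(delay_s: float, wave_speed_km_s: float) -> float:
    """Distance beyond which a tweet posted delay_s after the quake
    arrives before a seismic wave travelling at wave_speed_km_s.

    Assumption: network transit time is negligible, so the tweet is
    readable everywhere at t = delay_s; the wave reaches distance d at
    t = d / wave_speed_km_s. The two cross where d = wave_speed * delay.
    """
    return wave_speed_km_s * delay_s


# Range implied by the slide's figures (20-30 s delay, 3-5 km/s waves).
nearest = overtake_distance_km(20, 3)   # fastest tweet, slowest wave
farthest = overtake_distance_km(30, 5)  # slowest tweet, fastest wave
print(f"tweets overtake the waves between {nearest:.0f} and {farthest:.0f} km")
```

With these inputs the crossover lies between 60 and 150 km, which brackets the slide's ~100 km.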

Page 19: Crisis Informatics (November 2013)
Page 20: Crisis Informatics (November 2013)
Page 21: Crisis Informatics (November 2013)
Page 22: Crisis Informatics (November 2013)
Page 23: Crisis Informatics (November 2013)
Page 24: Crisis Informatics (November 2013)
Page 25: Crisis Informatics (November 2013)
Page 26: Crisis Informatics (November 2013)

Crisis Mappers Conference 2013: Next week!

Page 27: Crisis Informatics (November 2013)

Classifying and extracting information from tweets

Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier: Practical Extraction of Disaster-Relevant Information from Social Media. In SWDM. Rio de Janeiro, Brazil, 2013.

Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier: Extracting Information Nuggets from Disaster-Related Messages in Social Media. In ISCRAM. Baden-Baden, Germany, 2013. Best paper award.

Page 28: Crisis Informatics (November 2013)

Our approach

1. Filtering

2. Classification

3. Extraction

Page 29: Crisis Informatics (November 2013)

1. Filtering

Is disaster-related? → Yes / No

Contributes to situational awareness? → Yes / No
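
The two yes/no questions can be read as a cascade, sketched here under assumptions: the transcript does not confirm that the second question is asked only of tweets that pass the first, and the two predicates are hypothetical keyword rules standing in for trained classifiers.

```python
from typing import Callable, Iterable, List

Predicate = Callable[[str], bool]


def filter_tweets(tweets: Iterable[str],
                  is_disaster_related: Predicate,
                  is_informative: Predicate) -> List[str]:
    """Keep tweets that pass both yes/no questions, asked in order."""
    kept = []
    for text in tweets:
        if not is_disaster_related(text):
            continue  # "No" on the first question: drop
        if not is_informative(text):
            continue  # "No" on the second question: drop
        kept.append(text)
    return kept


# Hypothetical stand-ins for trained classifiers (keyword rules only).
def toy_related(text: str) -> bool:
    return "#sandy" in text.lower()


def toy_informative(text: str) -> bool:
    return any(w in text.lower() for w in ("closing", "shelter", "mph"))


sample = [
    "Bridges closing by 7pm #Sandy",
    "So bored at home #Sandy",
    "Great pizza tonight",
]
print(filter_tweets(sample, toy_related, toy_informative))
```

Passing the predicates as arguments keeps the cascade independent of how each question is answered, so the keyword rules could be swapped for learned models.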

Page 30: Crisis Informatics (November 2013)

Labeling task

Classify the following tweet from Hurricane Sandy as:
● Personal: only of interest to author and immediate circle of friends
● Informative: interesting to other people
● Off-topic: not related to Hurricane Sandy
● Other/can't judge

Page 31: Crisis Informatics (November 2013)

Advice on labeling

• Your instructions will never be correct the first time you try
  – e.g. personal / eyewitness
  – Instructions must be re-written reactively
  – Perform small-scale labeling first
• Instructions must be concrete and brief
  – If you can't do it, the task has to be divided

Page 32: Crisis Informatics (November 2013)

2. Classification

Filtered tweets are classified into:
• Caution & Advice
• Information Sources
• Damage & Casualties
• Donations
• Health
• Shelter
• Food
• Water
• Logistics
• ...

Page 33: Crisis Informatics (November 2013)

Distribution of tweet types

[Pie chart: Joplin Tornado (2011). Legend: Caution/Advice, Info Source, Donations, Casualties/Damage, Unknown. Slice labels: 50%, 18%, 16%, 10%, 6%.]

Page 34: Crisis Informatics (November 2013)

Classification results

Class AUC

Caution and advice 0.91

Information source 0.76

Donations 0.89

Casualties/damage 0.87
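
For readers unfamiliar with the metric, a minimal sketch of how an AUC value can be computed for one class against the rest. The labels and scores are made up for illustration, and this pairwise formulation is a standard definition rather than the evaluation code behind the numbers above.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one
    (ties count as half). O(n^2), fine for small evaluation sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up scores for a one-vs-rest "caution and advice" classifier.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc(y_true, y_score))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect one, which is the scale on which the table's values should be read.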

Page 35: Crisis Informatics (November 2013)

3. Extraction

Classified tweets → ...

@JimFreund: Apparently we have no choice. There is a tornado watch in effect tonight.

Page 36: Crisis Informatics (November 2013)

Extraction

• #hashtags, @user mentions, URLs, etc.
  – Regular expressions
  – Text library from Twitter
• Temporal expressions
  – Part-of-speech tagger + heuristics
  – Natty library
• Supervised learning
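
The regular-expression route for hashtags, mentions and URLs can be sketched as follows. The three patterns are deliberately simplified assumptions; Twitter's own text library handles many more edge cases and is not reproduced here.

```python
import re

# Simplified, illustrative patterns (not Twitter's official ones).
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")


def extract_entities(text: str) -> dict:
    """Pull hashtags, @mentions and URLs out of a tweet with regexes."""
    urls = URL.findall(text)
    # Remove URLs first so a '#fragment' or '@' inside a link is not
    # mistaken for a hashtag or mention.
    without_urls = URL.sub(" ", text)
    return {
        "hashtags": HASHTAG.findall(without_urls),
        "mentions": MENTION.findall(without_urls),
        "urls": urls,
    }


tweet = ("RT @twc_hurricane: Wind gusts over 60 mph are being reported at "
         "Central Park and JFK airport in #NYC this hour. #Sandy")
print(extract_entities(tweet))
```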

Page 37: Crisis Informatics (November 2013)

Labels for extraction

• Type-dependent instruction
• Ask evaluators to copy-paste a word/phrase from each tweet

Page 38: Crisis Informatics (November 2013)

Learning: Conditional Random Fields

• Used extensively in NLP for part-of-speech tagging and information extraction

• Representation of observations is important (capitalization, position, etc.)

[Figure: graphical models of an HMM and a linear-chain CRF, each with a hidden layer and an observed layer]
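
To make "representation of observations" concrete, a minimal sketch of per-token features of the kind a linear-chain CRF consumes. The feature set and the whitespace tokenizer are assumptions chosen for illustration; they are not the features used by the tool in the talk.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Observation features for token i: surface form, capitalization,
    position in the tweet, and the neighboring tokens."""
    tok = tokens[i]
    last = len(tokens) - 1
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "starts_hashtag": tok.startswith("#"),
        "starts_mention": tok.startswith("@"),
        "position": i,
        "is_first": i == 0,
        "is_last": i == last,
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<s>",
        "next_lower": tokens[i + 1].lower() if i < last else "</s>",
    }


# Whitespace tokenization is a simplification for this example.
tokens = "There is a tornado watch in effect tonight .".split()
features = [token_features(tokens, i) for i in range(len(tokens))]
print(features[3])
```

Each token becomes one feature dictionary, and a sequence of such dictionaries paired with a sequence of labels is the usual training input for CRF toolkits.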

Page 39: Crisis Informatics (November 2013)

Tool

• CMU ARK Twitter NLP
  – Tokenization
  – Feature extraction
  – CRF learning
• Very easy to use: simply replace the labels in the training set (part-of-speech tags) with any other tag set, and re-train

Page 40: Crisis Informatics (November 2013)

Output examples

RT @weatherchannel: .@NYGovCuomo orders closing of NYC bridges. Only Staten Island bridges unaffected at this time. Bridges must close by 7pm. #Sandy #NYC

Wow what a mess #Sandy has made. Be sure to check on the elderly and homeless please! Thoughts and prayers to all affected

RT @twc_hurricane: Wind gusts over 60 mph are being reported at Central Park and JFK airport in #NYC this hour. #Sandy

RT @mitchellreports: Red Cross tells us grateful for Romney donation but prefer people send money or donate blood dont collect goods NOT best way to help #Sandy

Page 41: Crisis Informatics (November 2013)

Extractor evaluation

Setting                                     Recall   Precision
Train 2/3 Joplin, Test 1/3 Joplin           78%      90%
Train 2/3 Sandy, Test 1/3 Sandy             41%      79%
Train Joplin, Test Sandy                    11%      78%
Train Joplin + 10% Sandy, Test 90% Sandy    21%      81%

• Precision: an extraction counts as correct if it has one word or more in common with what humans extracted
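
The one-word-in-common criterion can be sketched as below. The slide defines only the matching rule; treating recall as matched extractions over tweets with a human extraction is an assumption made here, and the phrase pairs are invented.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]  # (predicted, gold); None = nothing


def overlaps(predicted: str, gold: str) -> bool:
    """True if the two phrases share at least one word (case-insensitive)."""
    return bool(set(predicted.lower().split()) & set(gold.lower().split()))


def precision_recall(pairs: List[Pair]) -> Tuple[float, float]:
    """Precision and recall under the lenient one-word-overlap rule."""
    n_predicted = sum(1 for p, _ in pairs if p is not None)
    n_gold = sum(1 for _, g in pairs if g is not None)
    hits = sum(1 for p, g in pairs
               if p is not None and g is not None and overlaps(p, g))
    precision = hits / n_predicted if n_predicted else 0.0
    recall = hits / n_gold if n_gold else 0.0
    return precision, recall


# Made-up (predicted, gold) phrases for four tweets.
pairs = [
    ("tornado watch", "a tornado watch in effect"),  # hit
    ("Central Park", "wind gusts over 60 mph"),      # miss
    (None, "bridges must close by 7pm"),             # nothing predicted
    ("donate blood", "send money or donate blood"),  # hit
]
print(precision_recall(pairs))
```

Because a single shared word is enough for a hit, this measure is lenient and the precision figures should be read with that in mind.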

Page 42: Crisis Informatics (November 2013)

Donations matching

• Identify and match requests/offers for donations
  – Money, clothing, food, shelter, volunteers, blood

Average precision = 0.21 (0.16 if only text similarity is used)
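
A minimal sketch of the text-similarity-only baseline mentioned above: offers are ranked against a request by cosine similarity of bag-of-words vectors. The tokenization, the weighting (raw counts) and the example messages are assumptions; the talk does not specify how its similarity was computed.

```python
import math
from collections import Counter
from typing import List, Tuple


def cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_offers(request: str, offers: List[str]) -> List[Tuple[float, str]]:
    """Offers sorted from most to least similar to the request."""
    return sorted(((cosine(request, o), o) for o in offers), reverse=True)


# Invented messages for illustration.
request = "we need warm clothing for families in the shelter"
offers = [
    "offering warm clothing and blankets",
    "can donate blood this weekend",
    "volunteers available to cook food",
]
for score, offer in rank_offers(request, offers):
    print(f"{score:.2f}  {offer}")
```

Pure word overlap misses matches phrased differently (for example "coats" versus "clothing"), which is one plausible reason a richer matcher can beat the text-only baseline.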

Page 43: Crisis Informatics (November 2013)

Crowdsourced stream processing systems

Muhammad Imran, Ioanna Lykourentzou and Carlos Castillo: Engineering Crowdsourced Stream Processing Systems (Submitted for publication)

Page 44: Crisis Informatics (November 2013)

Page 45: Crisis Informatics (November 2013)

Design objectives and principles

• Low latency (example metric: end-to-end time)
  – Automatic components: keep items moving
  – Crowdsourced components: trivial tasks
• High throughput (example metric: output items per unit of time)
  – Automatic components: high-performance processing
  – Crowdsourced components: task automation
• Load adaptability (example metric: rate response function)
  – Automatic components: load shedding, load queueing
  – Crowdsourced components: task prioritization
• Cost effectiveness (example metric: cost vs. quality, throughput, etc.)
  – Automatic components: N/A
  – Crowdsourced components: task frugality
• High quality (example metric: application-dependent)
  – Design principles: redundancy, aggregation and quality control
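
As an illustration of load queueing and load shedding for the automatic components, a minimal sketch of a bounded buffer. Dropping the oldest item when full is an assumption made here; the talk does not say which items are shed, and the class name is hypothetical.

```python
from collections import deque


class SheddingQueue:
    """Bounded FIFO buffer: queues items while there is room and sheds
    the oldest ones when input arrives faster than it is consumed."""

    def __init__(self, capacity: int):
        # A deque with maxlen silently drops from the left when full.
        self.items = deque(maxlen=capacity)
        self.shed = 0  # number of items discarded so far

    def put(self, item) -> None:
        if len(self.items) == self.items.maxlen:
            self.shed += 1  # the oldest item is about to be discarded
        self.items.append(item)

    def get(self):
        """Return the oldest queued item, or None if the queue is empty."""
        return self.items.popleft() if self.items else None


queue = SheddingQueue(capacity=3)
for tweet_id in range(5):  # burst of 5 items into a buffer of 3
    queue.put(tweet_id)
print(list(queue.items), "shed:", queue.shed)
```

Counting shed items makes the trade-off visible: the buffer keeps latency bounded during a burst at the price of a measurable loss of input.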

Page 46: Crisis Informatics (November 2013)

Design patterns

● QA loop

● Task assignment

● Process/verify

● Supervised learning

● Crowdwork sub-task chaining

● Humans are not a bottleneck

● Humans review every output element

Page 47: Crisis Informatics (November 2013)

http://aidr.qcri.org/

Page 48: Crisis Informatics (November 2013)

Self-service for crisis-related classification

[Diagram components:]
• Unstructured text reports
• Structured information
• Report Classifier
• Model Builder
• Crowdsourced active learning
• Library of training data
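
One common way to realize crowdsourced active learning is uncertainty sampling, sketched here: the items the current model is least sure about are the ones sent to human labelers. Whether the system in the talk uses this exact strategy is not stated, and the scores below are hypothetical.

```python
from typing import Callable, List, Sequence


def pick_for_labeling(items: Sequence[str],
                      predict_proba: Callable[[str], float],
                      budget: int) -> List[str]:
    """Uncertainty sampling: choose the items whose predicted probability
    of the positive class is closest to 0.5 and send those to the crowd."""
    by_uncertainty = sorted(items, key=lambda x: abs(predict_proba(x) - 0.5))
    return list(by_uncertainty[:budget])


# Hypothetical model scores standing in for a trained classifier.
scores = {"tweet A": 0.97, "tweet B": 0.52, "tweet C": 0.10, "tweet D": 0.40}
print(pick_for_labeling(list(scores), scores.__getitem__, budget=2))
```

Spending the labeling budget on borderline items tends to improve a classifier faster than labeling items it already classifies confidently.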

Page 49: Crisis Informatics (November 2013)

Page 50: Crisis Informatics (November 2013)

Preliminary results: efficiency

Maximum documented input load during a natural disaster = 270 tweets/sec.

Page 51: Crisis Informatics (November 2013)

Preliminary results: effectiveness

Task: Informative vs. {Personal, Other}

Page 52: Crisis Informatics (November 2013)

Free software

• AIDR is free software
• The official launch date is November 20th during the Crisis Mappers conference in Nairobi, Kenya

Page 53: Crisis Informatics (November 2013)

Mobile applications

Fuming Shih, Oshani Seneviratne, Daniela Miao, Ilaria Liccardi, Lalana Kagal, Evan Patton, Patrick Meier, Carlos Castillo: Democratizing Mobile App Development for Disaster Management. To be presented at the IJCAI Workshop on Semantic Cities. Beijing, China, 2013.

Page 54: Crisis Informatics (November 2013)

Mobile components (AppInventor)

• Components useful for DIY emergency response apps
  – e.g. off-line tolerant photo uploads
• Aggregating/federating linked open data

Page 55: Crisis Informatics (November 2013)

Helping developers query linked data

Page 56: Crisis Informatics (November 2013)
Page 57: Crisis Informatics (November 2013)

Resource matching

Page 58: Crisis Informatics (November 2013)

Crowdsourced verification

Page 59: Crisis Informatics (November 2013)
Page 60: Crisis Informatics (November 2013)

Page 61: Crisis Informatics (November 2013)

Crowdsourced verification for crisis information

• Veri.ly
• Joint project between MASDAR and QCRI
• Iyad Rahwan, Abdulfatai Popoola, Dmytro Krasnoshtan, Attila Toth (MASDAR), Victor Naroditskiy (Univ. Southampton) + QCRI

Page 62: Crisis Informatics (November 2013)
Page 63: Crisis Informatics (November 2013)
Page 64: Crisis Informatics (November 2013)

Closing remarks

Page 65: Crisis Informatics (November 2013)

Computationally feasible

Supported by data

Useful

Good projects in this space

Page 66: Crisis Informatics (November 2013)

Computationally feasible

Supported by data

Useful

Good projects in this space

Temptation! Danger!

Poorly planned projects :-(

AI-complete problems

Page 67: Crisis Informatics (November 2013)

Some venues

• ISCRAM – International Conference on Information Systems for Crisis Response and Management

• SWDM – Workshop on Social Web for Disaster Management

• SMERST – Social Media and Semantic Technologies in Emergency Response

+ the usual suspects, depending on your area ;-)

Page 68: Crisis Informatics (November 2013)

Possibility of large impact by using computer science to support humanitarian work

= Applied computing at its best

Page 69: Crisis Informatics (November 2013)

Thank you!

Carlos Castillo · [email protected]

http://www.chato.cl/research/

With thanks to Patrick Meier for several slides