classifying microblogs for disasters
DESCRIPTION
Paper presentation at ADCS 2013, Brisbane. Images used in the presentation are taken from various websites. Credit goes to their creators.
TRANSCRIPT
Classifying Microblogs for Disasters
Sarvnaz Karimi, Jessie Yin, Cecile Paris
Social media plays an important role during disasters
CSIRO: positive impact | Classifying Microblogs for Disasters | Sarvnaz Karimi
• Realtime, popular, free
• Accessible
• Available
During disasters people share useful information
• lyttelton tunnel had reopened last night #eqnz
Or ask for help or information
• Kindercare in Fendalton, Christchurch - all okay? We are trying to get through with no luck. #eqnz
• Need help. Any donors of medicines for diarrhea cases in Baganga, Davao Oriental pls? #reliefPH #PabloPH pls tweet @KarloPuerto
Or even offer help
• I hv final yr medstudents in parade rd addington! They cn help. Bruce n boys #eqnz
And sometimes not so useful
• Someone just wondered aloud if the #eqnz was just another sign from God that he doesn't want The Hobbit to get made. #maybe?
Challenges of Working with Twitter Data
• In fact, tweets are often useless babble
• Tweets are really short (140 characters)
• People often use informal language
• Even serious messages are often abbreviated to fit the length limit
Finding useful content can become looking for a needle in a haystack!
I hv final yr medstudents in parade rd addington! They cn help. Bruce n boys #eqnz
How can we filter massive amounts of Twitter messages to identify high-value tweets related to natural or man-made disasters, or even to specific types of disaster?
Keyword search to find disaster-related tweets
• Many false positives, because keywords such as “fire”, or even “earthquake”, are ambiguous or have multiple senses
She’s a natural disaster: a tsunami in her eyes an earthquake in her chest a hurricane flooding her mind she’s a traveling catastrophe
In a pool of over 5700 tweets retrieved using keyword search, we had over 50% false positives.
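The false-positive problem can be illustrated with a minimal sketch of a whole-word keyword filter. The keyword set is the one listed on the Twitter Data slide; the function name `matched_keywords` and the tokenisation are illustrative assumptions, not the authors' retrieval code.

```python
import re

# Keywords listed in the talk for data collection.
KEYWORDS = {"fire", "flooding", "storm", "tornado", "hurricane",
            "cyclone", "earthquake", "accident"}

def matched_keywords(tweet):
    """Return the keywords that occur as whole words in the tweet (hypothetical helper)."""
    words = set(re.findall(r"[a-z]+", tweet.lower()))
    return KEYWORDS & words

# The poem-like tweet quoted above is not about any disaster,
# yet a keyword filter retrieves it.
poem = ("She's a natural disaster: a tsunami in her eyes an earthquake in "
        "her chest a hurricane flooding her mind she's a traveling catastrophe")
hits = matched_keywords(poem)
print(sorted(hits))
```

The tweet matches three keywords even though no event is being reported, which is the kind of ambiguity a trained classifier is meant to resolve.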
Our work: Classify Twitter Stream for Disasters
• Classify tweets as Disaster or Non-disaster: binary classification
• Classify tweets into disaster types: multi-class classification problem
– Earthquake
– Storm (hurricane, tornado, cyclone)
– Fire
– Flooding
– Other (e.g. civil disorder, traffic accident)
Related Studies
• Tweet classification:
o Papers that used classifiers for categories such as news and junk, or opinion, and private messages.
o Papers that relied heavily on hashtags.
o Adding context to short tweets by aggregating those that share the same hashtags, or by adding URL contents.
• Twitter during disasters:
o Qualitative analysis of tweets published during a specific event to study microblogger behaviour.
o One of the most cited works is by Sakaki et al. (2010), who built an earthquake classifier to alert people. Their classifier was based on tweet length, the position of the query term (earthquake or shaking) in the tweet, n-grams, and the context of the query terms.
We do not focus on specific incidents, and do not assume the hashtags are known.
We study different types of disasters, not just one.
Twitter Data
• Sampled a total of 6,500 tweets published over two years, from December 2010 to November 2012
• Data was gathered using keyword search (fire, flooding, storm, tornado, hurricane, cyclone, earthquake, and accident).
• No retweets
• A number of disasters were included, among others: the earthquake in Christchurch, New Zealand, 2011; Cyclone Yasi, QLD, 2011; the QLD floods, 2010-2011; bushfires in VIC, 2011; and Hurricane Sandy, US, 2012.
Annotations
• Two stage annotations
• Crowd-sourced the annotations using Crowdflower.
• Annotators were asked:
1. Is this tweet talking about a disaster? (Yes or No)
2. What type of disaster is it talking about? (multiple choice)
• Each tweet was annotated by three annotators
• 5,747 had full agreement
• 2,850 tweets were identified as disaster-related and 2,897 as non-disaster
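The agreement filter can be sketched as follows, assuming that only tweets on which all three annotators agreed were kept as gold labels (the class counts above sum to 5,747, which suggests this). The tweet ids, labels, and the helper `unanimous_label` are hypothetical.

```python
from collections import Counter

def unanimous_label(labels):
    """Return the shared label if every annotator agrees, otherwise None."""
    return labels[0] if len(set(labels)) == 1 else None

# Hypothetical annotations: tweet id -> labels from three annotators.
annotations = {
    "t1": ["disaster", "disaster", "disaster"],
    "t2": ["disaster", "non-disaster", "disaster"],
    "t3": ["non-disaster", "non-disaster", "non-disaster"],
}

# Keep only tweets with full agreement.
gold = {}
for tweet_id, labels in annotations.items():
    label = unanimous_label(labels)
    if label is not None:
        gold[tweet_id] = label

class_counts = Counter(gold.values())
print(gold, class_counts)
```

Dropping disputed tweets trades data volume for cleaner labels; a majority vote would be the usual alternative when more data is needed.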
Classifiers
• SVM Classifier
• Multinomial Naive Bayes Classifier
• We report only SVM results; Naive Bayes performed consistently worse in all experiments.
C. Chang and C. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology
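For readers unfamiliar with the second classifier, here is a minimal sketch of multinomial Naive Bayes with add-one smoothing over word counts. It is a textbook formulation, not the implementation used in the talk; the class name, training tweets, and labels are hypothetical.

```python
import math
from collections import Counter, defaultdict

class MultinomialNB:
    """Textbook multinomial Naive Bayes over token lists (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for w in tokens:
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[w] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical tokenised training tweets.
docs = [["earthquake", "shaking", "help"], ["flood", "water", "rising"],
        ["love", "this", "song"], ["great", "coffee", "today"]]
labels = ["disaster", "disaster", "non-disaster", "non-disaster"]
model = MultinomialNB().fit(docs, labels)
prediction = model.predict(["earthquake", "help", "needed"])
print(prediction)
```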
Classification Features
Specific Features:
• N-grams
• Hashtags
• Mentions
Generic Features:
• Mention count
• Hashtag count
• Links
• Tweet length
What is the effect of using incident-specific features, compared with generic features, on classification accuracy? What are the best features to use for disaster classifiers?
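The two feature groups can be sketched as a single extraction function. This is an illustrative reading of the slide, not the authors' code: it assumes tweet length is measured in characters, that links are counted rather than flagged, and that n-grams are limited to unigrams. The function name and feature keys are hypothetical.

```python
def extract_features(tweet, use_specific=True):
    """Build a feature dictionary for one tweet (hypothetical helper)."""
    # Whitespace tokenisation with trailing punctuation stripped.
    tokens = [t.strip(".,!?:;") for t in tweet.lower().split()]
    tokens = [t for t in tokens if t]
    hashtags = [t for t in tokens if t.startswith("#") and len(t) > 1]
    mentions = [t for t in tokens if t.startswith("@") and len(t) > 1]
    links = [t for t in tokens if t.startswith(("http://", "https://"))]

    # Generic features: independent of any particular event.
    features = {
        "mention_count": len(mentions),
        "hashtag_count": len(hashtags),
        "link_count": len(links),
        "tweet_length": len(tweet),  # assumed to be in characters
    }
    if use_specific:
        # Specific features: the actual words, hashtags, and mentions.
        special = set(hashtags) | set(mentions) | set(links)
        for word in tokens:
            if word not in special:
                features["unigram=" + word] = 1
        for tag in hashtags:
            features["hashtag=" + tag] = 1
        for name in mentions:
            features["mention=" + name] = 1
    return features

tweet = ("I hv final yr medstudents in parade rd addington! "
         "They cn help. Bruce n boys #eqnz")
generic = extract_features(tweet, use_specific=False)
specific = extract_features(tweet)
print(generic)
```

Keeping the generic features in their own switchable group makes it straightforward to compare the two settings on the same data.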
Evaluation: Cross-validation vs. Time-Split
• K-fold cross-validation (e.g., 10-fold) is used in most similar studies (Sriram et al., 2010; Takemura and Tajima, 2012; Vosecky et al., 2012)
Problem:
• It overlooks the time dependency in microblog data and uses future evidence, such as hashtags and disaster names
Alternative:
• Time-split evaluation: sort the data by time, use the latest chunk for testing and the rest for training.
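A time-split can be sketched in a few lines. The size of the test chunk is not stated in the talk, so `test_fraction` is an assumed parameter, and the records below are hypothetical.

```python
from datetime import date

def time_split(records, test_fraction=0.1):
    """Sort records by time; the latest chunk becomes the test set."""
    ordered = sorted(records, key=lambda r: r["time"])
    n_test = max(1, round(len(ordered) * test_fraction))
    return ordered[:-n_test], ordered[-n_test:]

# Hypothetical records with timestamps inside the collection period.
records = [
    {"time": date(2011, 2, 22), "text": "hypothetical tweet A"},
    {"time": date(2012, 10, 29), "text": "hypothetical tweet B"},
    {"time": date(2010, 12, 28), "text": "hypothetical tweet C"},
    {"time": date(2011, 2, 3), "text": "hypothetical tweet D"},
]
train, test = time_split(records, test_fraction=0.25)
print(len(train), len(test))
```

Unlike a shuffled k-fold split, every training tweet here predates every test tweet, so hashtags or event names coined later cannot leak into training.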
Disaster or Non-Disaster
Disaster-Type Classification
What features worked
• When training data is small, counts were better features.
– Disaster-related tweets had 1.2 hashtags on average, versus 0.4 for non-disaster tweets
• When our knowledge of an event is limited, hashtags or mentions are not so useful.
• In our experiments, classification accuracy using bigram features was worse than with unigram features.
Generic Features vs. Event-specific Features
• We need to learn the patterns that imply a type of natural or man-made disaster:
A massive cloud of smoke can be seen in south-west Lake Macquarie from the Wyee bushfire #nswfires #wyeefire @NewcastleHerald
Same location, no disaster:
Lake Macquarie is big & beautiful http://lockerz.com/s/257143427
Can we cross-train for disaster types?
Application:
- Compromise for disaster types with little training data.
- Reduce ambiguity
Training Testing
Cross-Disaster Classification
[Chart legend: generic feature, specific feature]
How well can our classifiers generalise to identify previously unseen disaster types?
• We used under-sampling to create training and testing data
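Under-sampling can be sketched as randomly discarding examples from the larger classes until every class matches the smallest one. The talk does not describe the exact procedure, so the random selection, the fixed seed, and the example data are assumptions.

```python
import random
from collections import Counter

def undersample(examples, seed=0):
    """Balance classes by randomly dropping examples from the larger ones."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append((text, label))
    smallest = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced

# Hypothetical imbalanced data: five fire tweets, two non-disaster tweets.
examples = [("fire tweet %d" % i, "fire") for i in range(5)]
examples += [("other tweet %d" % i, "non-disaster") for i in range(2)]
balanced = undersample(examples)
balanced_counts = Counter(label for _, label in balanced)
print(balanced_counts)
```

Balancing both the training and the testing sets keeps accuracy comparable across disaster types that differ widely in how many tweets they have.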
Can we cross-train for disaster types?
• Yes! Our results showed promise, especially for fire.
• “Language of disaster”
• Using generic features was more effective.
What’s Next
Events are often associated with a location
1. Better classifiers: we can use the existence of location information as a feature to strengthen our classifiers.
2. Help in acting on the information: once we know a tweet is talking about a disaster, we can extract information on locations. This could help emergency responders with resource allocation.
• We have already established that traditional Named Entity Recognisers are able to identify locations in tweets with high accuracy*. Now we need to pinpoint them on the map!
* J. Lingad, S. Karimi, and J. Yin. Location Extraction From Disaster-Related Microblogs. Proceedings of the 22nd International Conference on World Wide Web Companion, 2013