geographic context analysis of volunteered information

46
Geographic Context Analysis of Volunteered Information Frank Ostermann Laura Spinsanti European Commission – Joint Research Centre Institute for Environment and Sustainability Digital Earth and Reference Data Unit

Upload: frank-ostermann

Post on 07-May-2015

108 views

Category:

Social Media


0 download

TRANSCRIPT

Page 1: Geographic Context Analysis of Volunteered Information

Geographic Context Analysis

of

Volunteered Information

Frank OstermannLaura Spinsanti

European Commission – Joint Research CentreInstitute for Environment and Sustainability

Digital Earth and Reference Data Unit

Page 2: Geographic Context Analysis of Volunteered Information

2April 11, 2023

Outline

1. Some background on social media in crisis contexts

2. Geographic context for credibility and relevance

3. Twitter and Flickr for fighting forest fires

4. Some examples and results

Page 3: Geographic Context Analysis of Volunteered Information

3April 11, 2023

Opportunities for crisis response

California wildfires 2007

• Near real time

• Heterogeneous input

• More centralized information

processing (basically one

journalist)

Page 4: Geographic Context Analysis of Volunteered Information

4April 11, 2023

+ = !

Why not treat information from the citizens

as another type of sensor data?

Page 5: Geographic Context Analysis of Volunteered Information

5April 11, 2023

VGI during Crisis Events

Page 6: Geographic Context Analysis of Volunteered Information

6April 11, 2023

What social media offer… What crisis management needs…

rich up-to-date information up-to-date information

new paths of communication redundant paths of communication

noise and uncertain lineage and

accuracy of the information high-quality and reliable information

Page 7: Geographic Context Analysis of Volunteered Information

7April 11, 2023

Big Data – Big ChallengesHuman volunteered curation faces limits of

• Sustainability

• Scalability

Proposed Solution:

Automated filtering and quality assessment

Page 8: Geographic Context Analysis of Volunteered Information

8April 11, 2023

Outline

1. Some background on social media in crisis contexts

2. Geographic context for credibility and relevance

3. Twitter and Flickr for fighting forest fires

4. Some examples and results

Page 9: Geographic Context Analysis of Volunteered Information

9April 11, 2023

The two main challenges:

1. Flood of information -> What is relevant?

2. Quality of information -> What is credible and valid?

“Back at hotel. Fire skirted

round village. Little

evidence of significant

damage. Helicopters still

overhead damping scrub.

Beer unaffected”

(Canada BCGovFireInfo):

“Important notice from the Reg

Dist of Bulkley-Nechako

regarding evacuations due to

wildfires in the area

http://ow.ly/2sBxH”“Are you a fireman?

Cause you’re always there to extinguish the fire inside my heart.”

Page 10: Geographic Context Analysis of Volunteered Information

10April 11, 2023

Source

Credibility

Relevance

Context

Content

Location

Elements of quality assessment

Page 11: Geographic Context Analysis of Volunteered Information

11April 11, 2023

Relevance• Main criterion: Match with information needs of a user

• Three perspectives (Saracevic 1975, Hjorland 2010):

• User (What s/he thinks is relevant)

• System (Results matching query)

• Subject knowledge (goal-, task-oriented)

• Geographic relevance vs. geographic aspect of relevance

• Aspects include

• Topicality (Content)

• Location/Origin (Location, Context)

• Novelty (Content)

Page 12: Geographic Context Analysis of Volunteered Information

12April 11, 2023

Credibility

• Two main characteristics:

• Trustworthiness (affecting source credibility) (Source, Context)

• Expertise (affecting information credibility, or accuracy) (Location, Source)

• Two main heuristics (Metzger et al 2010)

• Social confirmation (what do others do or say?) (Source, Context)

• Expectancies (what do I already know?) (Location, Context)

Page 13: Geographic Context Analysis of Volunteered Information

13April 11, 2023

Context, Context, Context

Tackle the context deficit:

• Source characteristics: high number of followers could indicate trustworthiness and/or expertise (social confirmation)

• Context characteristics: land cover, population density, other UGC, could indicate trustworthiness and accuracy (expectancy)

Our focus: Geographic context

Page 14: Geographic Context Analysis of Volunteered Information

14April 11, 2023

Outline

1. Some background on social media in crisis contexts

2. Geographic context for credibility and relevance

3. Twitter and Flickr for fighting forest fires

4. Some examples and results

2

Page 15: Geographic Context Analysis of Volunteered Information

15April 11, 2023

Project Overview• Opportunistic sensing approach

• Geographic focus on Europe

• Collaboration with:

• European Forest Fire Information System (effis.jrc.ec.europa.eu/)

• European Media Monitor (emm.newsbrief.eu/overview.html)

• Google

• Universitat Jaume I de Castellón (Institute of New Imaging

Technologies: www.init.uji.es Geographic Information research

group: www.geoinfo.uji.es)

Page 16: Geographic Context Analysis of Volunteered Information

16April 11, 2023

Research Objectives

1. Deploy a system for using UGC in crisis decision support on

forest fires• Integration

• Quality control

• Communication of risk/uncertainty, alerts

2. Assess the added value of using UGC for forest fire response.

Page 17: Geographic Context Analysis of Volunteered Information

17April 11, 2023

• Dynamics require near real-time

processing

• Less signals since often in

sparsely populated areas

• Predictability and recurrence

facilitate sensor and model

calibration

Forest fire characteristics

Page 18: Geographic Context Analysis of Volunteered Information

18April 11, 2023

Page 19: Geographic Context Analysis of Volunteered Information

19April 11, 2023

Example pieces of information: Text Messages

User: “Back at hotel. Fire skirted round village. Little evidence of significant

damage. Helicopters still overhead damping scrub. Beer unaffected” User: “The forest fire is sooo close! We are now on evacuation notice. SOO

nervous!” News: “Okanagan forest fire forces resort evacuation: About 60 people fled in

boats from a forest fire in B.C.'s Okanagan…http://bit.ly/9cstp7” Governative (Canada BCGovFireInfo): “Important notice from the Reg Dist of

Bulkley-Nechako regarding evacuations due to wildfires in the area

http://ow.ly/2sBxH”

Microblogging, e.g. Twitter

Page 20: Geographic Context Analysis of Volunteered Information

20April 11, 2023

Example pieces of information: Visual and Annotations

Page 21: Geographic Context Analysis of Volunteered Information

21April 11, 2023

Scoring VGI

• Sum of weighted scores: QS(VGIj) = ∑Ni=1wisji

• with w being weight for criterion i, and s being the score for the VGI object j

• Topicality: keyword-based

• Proximity: next concurrent reported hotspot

• Land cover: Forest, no-Forest, Built-up

• Population Density: Risk factor

• Information clusters: Similar messages or lone signal?

Page 22: Geographic Context Analysis of Volunteered Information

22April 11, 2023

Workflow overview

Page 23: Geographic Context Analysis of Volunteered Information

23April 11, 2023

Initial context info, tasks and data sources

• Spatio-temporal location -> Geocoding

• Topicality -> Natural Language Processing

• Landcover -> Corine Landcover Classification

• Known fires/hotspots -> MODIS hotspot data

• Population density -> Official LAU2 statistics

Page 24: Geographic Context Analysis of Volunteered Information

24April 11, 2023

Detailed implemented 2011 CONAVI workflow

1.1 RetrievalScheduled Java code

accessing APIs

2.1 TopicalityScheduled PLSQL job

2.2 Geo-Codinga) Scheduled PLSQL jobb) Scheduled Java code

2.3 Geographic contextScheduled PLSQL job

3.1 Spatio-temporal clustering

Scheduled Python script calling SatScan job

2.4 Quality AssessmentScheduled PLSQL job

1.2 StorageScheduled Java code writing to DBMS

Oracle DBMS

3.2 Quality Re-AssessmentScheduled PLSQL job

TwitterStream-ing API

FlickrSearch API

DisseminationSMS, WFS, WMS, RSS, SES

EFFISHotspot Data

European Media MonitorGeo-coding API

Page 25: Geographic Context Analysis of Volunteered Information

25April 11, 2023

Detailed implemented 2012 CONAVI workflow

1.1 RetrievalScheduled Java code

accessing APIs

2.1 TopicalityScheduled PLSQL job

2.2 Geo-Codinga) Scheduled PLSQL jobb) Scheduled Java code

2.3 Geographic contextScheduled PLSQL job

3.1 Spatio-temporal clustering

Scheduled Python script calling SatScan job

2.4 Quality AssessmentScheduled PLSQL job

1.2 StorageScheduled Java code writing to DBMS

Oracle DBMS

3.2 Quality Re-AssessmentScheduled PLSQL job

TwitterStream-ing API

FlickrSearch API

DisseminationSMS, WFS, WMS, RSS, SES

EFFISHotspot Data

European Media MonitorGeo-coding API

Page 26: Geographic Context Analysis of Volunteered Information

26April 11, 2023

Scoring VGI TopicalityFour cases:

A: Probably about a forest fireB: Possibly about a forest fireC: Possibly not about a forest fireD: Probably not about a forest fire

FOR EACH UGCCASE = DIF feu OR feux THEN

CASE = CIF hectares THEN CASE = AIF foret OR forets OR fôret OR fôrets THEN CASE = ANEXT

IF foret OR forets OR fôret OR fôrets THEN CASE = CNEXT

IF hectares THEN CASE = CNEXT

IF incendie THEN CASE = BIF hectares THEN CASE = AIF foret OR forets OR fôret OR fôrets THEN CASE = A

Only CASE = {A,B} gets send to GISCO

Page 27: Geographic Context Analysis of Volunteered Information

27April 11, 2023

Geocoding VGI

Geocoders used: • GISCO/LAU2 brute string matching• European Media Monitor algorithms• Yahoo! Placemaker (2010)Aim: Exploit triplets of Location information (Source, Content, Message)

TWITTER FLICKRAugust 2010 August 2011 August 2010 August 2011

Number of retrieved VGI 2,904,065 7,996,228 7,991 17,850

Percentage with toponym 35% 27% 53% 50%

Percentage with geocode 1.1% 0.92% 20% 21%

Page 28: Geographic Context Analysis of Volunteered Information

28April 11, 2023

Geographic Context of VGI• All context information aggregated on LAU2 GISCO level (because

Geo-coding is done at this level)

• Each VGI is checked for information and scored on

 

 

Page 29: Geographic Context Analysis of Volunteered Information

29April 11, 2023

Clustering VGI• SatScan external software

• Scheduled Python script

1. Reads new VGI from database

2. Converts it to SatScan input format

3. Calls SatScan from the command line with appropriate

parameters

4. Waits for SatScan to complete analysis

5. Reads SatScan output

6. Stores relevant information in database

Page 30: Geographic Context Analysis of Volunteered Information

30April 11, 2023

Clustering parameters• Type of clustering algorithm

• Spatial location of clusters based on grid/locations or not

• Type of spatial overlap of clusters

• Maximum spatial cluster size

• Maximum temporal cluster size

Used in 2011: Discrete Poisson adjusting for population, no

grid, no overlap, max radius 50 km, max temporal extent 10%

of study period (9 days)

Page 31: Geographic Context Analysis of Volunteered Information

31April 11, 2023

Outline

1. Some background on social media in crisis contexts

2. Geographic context for credibility and relevance

3. Twitter and Flickr for fighting forest fires

4. Some examples and results

2

Page 32: Geographic Context Analysis of Volunteered Information

32April 11, 2023

Case Studies:

2010 and 2011 French forest fires

and their social media echo

Page 33: Geographic Context Analysis of Volunteered Information

33April 11, 2023

2010 processing steps and corresponding VGI volume

Processing steps applied Data volume

(0) Keyword filtered retrieval from API8 million Tweets 700 thousand Flickr images

(1) Filtering for French keywords611,274 Tweets 61,697 Flickr images

(2) Filtering for incendie keyword6,754 Tweets458 Flickr images

(3) Successfully geocoded VGI1,123 Tweets293 Flickr images

(4) Filtering for location in France 437 Tweets243 Flickr images

Page 34: Geographic Context Analysis of Volunteered Information

34April 11, 2023

Geocoding UGC

Geocoders used: • GISCO/LAU2 brute string matching• European Media Monitor algorithms• Yahoo! Placemaker• Exploit triplets of Location information (Source, Content, Message)

TWITTER FLICKRAugust 2010 August 2011 August 2010 August 2011

Number of retrieved UGGC 2,904,065 7,996,228 7,991 17,850

Percentage with toponym 35% 27% 53% 50%

Percentage with geocode 1.1% 0.92% 20% 21%

Page 35: Geographic Context Analysis of Volunteered Information

35April 11, 2023

WHEN

At least 7 Tweets, 35 fires

96% during first three days80% during first two days

Tweeting on Forest Fires

Page 36: Geographic Context Analysis of Volunteered Information

36April 11, 2023

Spatio-temporal clustering for event detection

CaseTotal

number of Clusters

True Positives False Positives

Undetected Fires

1 (default, all locations) 6 3 3 3

2 (default, known fire locations) 4 3 1 3

3 (modified, all locations) 74 7# 67 0

4 (modified, known fire locations) 8 6 2 0

Default: Space-Time Permutation, no spatial overlap, max spatial size unrestricted, max temporal size 50%Modified: spatial overlap except sharing centers

Page 37: Geographic Context Analysis of Volunteered Information

37April 11, 2023

Case study on French forest fires: spatio-temporal clustering (Case 4)

Page 38: Geographic Context Analysis of Volunteered Information

38April 11, 2023

Space-time cubeTIM

E D

IMEN

SIO

N

(showing un-clustered VGI as grey dots)

Page 39: Geographic Context Analysis of Volunteered Information

39April 11, 2023

2011 processing steps and corresponding VGI volumeProcessing steps applied Data volume

(0) Keyword filtered retrieval from API21.9 million Tweets 54,000 Flickr images

(1) Filtering for French keywords659,676 Tweets 39,016 Flickr images

(2) Calculating topicality and filtering high scores 25,684 VGI items

(3) Successfully enriched VGI 5,770 VGI items

(4) Spatio-Temporal Clustering 129 clusters containing 2,682 VGI

(5) Excluding smaller clusters (<6 items) 75 clusters containing 2,565 VGI

(6) Filtering for keywords in clusters 11 clusters containing 469 VGI

Page 40: Geographic Context Analysis of Volunteered Information

40April 11, 2023

2011 French geo-located VGI, MODIS hotspots, and forest cover

Page 41: Geographic Context Analysis of Volunteered Information

41April 11, 2023

(2) Machine-learned relevance filter: 25,684 items left

(3) Geocoded and context enriched:5,770 items left

(4) Clustered in space and time: 129 clusters with2,682 items

(5) Second relevance filter: 11 clusters left with 469 items

(1) Containing French keywords:659,676 Tweets and39,016 Flickr images

Geographic Context Analysis of Volunteered Information (GeoCONAVI)

Page 42: Geographic Context Analysis of Volunteered Information

42April 11, 2023

2011 French forest fires, VGI clusters, and forest cover

Page 43: Geographic Context Analysis of Volunteered Information

43April 11, 2023

EFFIS

Page 44: Geographic Context Analysis of Volunteered Information

44April 11, 2023

Results online:

http://s-jrciprap258p/vgi/vgitwitter/

http://geocommons.com/maps/183605

Page 45: Geographic Context Analysis of Volunteered Information

45April 11, 2023

Some preliminary results: • Simple keyword queries suffice

• Additional Geo-coding indispensable

• Topicality and context filtering plus spatio-temporal clustering

crucial

• Able to detect fires from Tweets and Flickr images by spatio-

temporal clustering

• Relevance, credibility and overall quality vary greatly, thus more

rules and human assessment needed

Page 46: Geographic Context Analysis of Volunteered Information

46April 11, 2023

Thank you for your attention!

{Frank.Ostermann, Laura.Spinsanti}@jrc.ec.europa.eu