web mining & open source intelligence · • internet revolution – 1.5 billion users, 2×10^20...

1 AAAS Meeting Feb 16 2008 Web Mining & Open Source Intelligence C. H. Best European Commission Joint Research Centre Institute for Protection and Security of the Citizen http://www.jrc.ec.europa.eu http://ses.jrc.it


Page 1

AAAS Meeting, Feb 16 2008

Web Mining & Open Source Intelligence

C. H. Best

European Commission, Joint Research Centre

Institute for Protection and Security of the Citizen

http://www.jrc.ec.europa.eu

http://ses.jrc.it

Page 2

Motivation

• Internet revolution – 1.5 billion users, 2×10^20 bytes
• New Open Source (OS) Intelligence Applications:
  – Live Media Monitoring – EU: 23 Languages
  – Situation Monitoring (UN, AU, US, EC)
    – Conflict early warning, Crisis response, Natural Disasters
  – Disease Control
    – Early Warning of disease outbreaks
  – Counter Terrorism, Law Enforcement
    – Propaganda, Radicalisation, Recruitment, Fraud
  – Business Intelligence
    – Markets, Competitors

Page 3

RNS – Rapid News Service

EMM – European Media Monitor

News Explorer – Entities

News Tracking – Timelines

All Europe's news · Breaking News · 25,000,000 articles processed since May 2002

EMM in active use by the EC:
• 4,000 email alerts/day
• 50,000 active web users
• 500 SMS/day to VIPs
• 10,000 keywords in real time
• 35,000 articles/day
• 35 languages
• 600 topic alerts in real time
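The slide gives only volumes for EMM's real-time keyword and topic alerts, not how they are evaluated. A minimal sketch of how such alerts could be routed, assuming each topic is defined by a plain keyword list matched case-insensitively as whole words or phrases; the topic names and keywords are hypothetical, not EMM's actual alert definitions (which may well use weights or Boolean combinations):

```python
import re
from collections import defaultdict

# Hypothetical topic alerts: topic name -> keywords or phrases.
ALERTS = {
    "avian_flu": ["H5N1", "bird flu", "avian influenza"],
    "earthquake": ["earthquake", "seismic"],
}


def compile_alerts(alerts):
    """Build one case-insensitive, whole-word regex per topic."""
    compiled = {}
    for topic, keywords in alerts.items():
        alternatives = "|".join(re.escape(k) for k in keywords)
        compiled[topic] = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return compiled


def match_topics(text, compiled):
    """Return the sorted names of all topics whose keywords occur in the text."""
    return sorted(topic for topic, rx in compiled.items() if rx.search(text))


def route_articles(articles, compiled):
    """Group article ids by matching topic, as an alert dispatcher might."""
    hits = defaultdict(list)
    for article_id, text in articles:
        for topic in match_topics(text, compiled):
            hits[topic].append(article_id)
    return dict(hits)


compiled = compile_alerts(ALERTS)
print(route_articles([(1, "Bird flu spreads"), (2, "Earthquake hits")], compiled))
# {'avian_flu': [1], 'earthquake': [2]}
```

Compiling one regex per topic keeps the per-article cost to one scan per topic, which matters when thousands of keywords are checked against every incoming article.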


Monitors all world countries; derives statistical indicators and time trends
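The slide does not say how EMM derives its statistical indicators. One common approach, used here purely as an assumption, compares today's article count for a topic with its average over a preceding window and flags topics whose ratio exceeds a threshold; the 14-day window and the threshold of 3.0 are hypothetical parameters:

```python
from statistics import mean


def alert_ratio(daily_counts, window=14):
    """Ratio of the latest day's count to the mean of the `window` days before it."""
    if len(daily_counts) < window + 1:
        raise ValueError("need at least window + 1 days of counts")
    baseline = mean(daily_counts[-window - 1:-1])
    today = daily_counts[-1]
    if baseline == 0:
        # No history: any article today is treated as an unbounded rise.
        return float("inf") if today > 0 else 0.0
    return today / baseline


def rising_topics(counts_by_topic, threshold=3.0, window=14):
    """Topics whose latest count is at least `threshold` times their recent baseline."""
    return sorted(
        topic
        for topic, counts in counts_by_topic.items()
        if alert_ratio(counts, window) >= threshold
    )


print(alert_ratio([10] * 14 + [40]))  # 4.0
```

A ratio against a per-topic baseline, rather than a raw count, lets rarely reported topics trigger on modest absolute increases while heavily covered topics need a proportionally larger jump.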

Live news · Newsletters · Push SMS alerts · Press reviews

Europe Media Monitor Services

Page 4

Long Term News Tracking

Page 5

Automatic Person News Tracking in multiple languages

Automatic contact networks
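Tracking a person across languages requires recognising that differently spelt mentions refer to the same name. A minimal sketch, assuming variants differ mainly in diacritics and small spelling changes: it strips combining accents with `unicodedata` and compares the results with `difflib.SequenceMatcher`. The 0.85 threshold and the greedy grouping are hypothetical, and this is not a description of the JRC's actual variant-matching method:

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and strip combining accents (e.g. 'é' -> 'e')."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def same_person(a, b, threshold=0.85):
    """Treat two mentions as variants if their normalized forms are similar enough."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def group_variants(names, threshold=0.85):
    """Greedily group mentions: each name joins the first group whose first member it matches."""
    groups = []
    for name in names:
        for group in groups:
            if same_person(name, group[0], threshold):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


print(group_variants(["Lech Wałęsa", "Lech Walesa", "Vladimir Putin"]))
```

Note that some letters, such as Polish "ł", have no Unicode decomposition, so accent stripping alone does not equate them with "l"; in the example above the two spellings are grouped only because the remaining single-letter difference stays above the similarity threshold.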

Page 6

A Polish example: http://press.jrc.it/NewsExplorer/entities/pl/20240.html

Page 7

NewsExplorer – cross-lingual cluster linking
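How NewsExplorer links clusters across languages is not detailed on the slide. The sketch below assumes each cluster has already been reduced to a set of language-independent entity IDs (so that name variants in different languages share one ID) and links clusters by the Jaccard overlap of those sets; the cluster names, entity IDs and the 0.3 cutoff are hypothetical, and a production system would likely combine several similarity signals:

```python
def jaccard(a, b):
    """Jaccard overlap of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_clusters(clusters_a, clusters_b, min_sim=0.3):
    """Map each cluster in language A to its best-matching cluster in language B."""
    links = {}
    for id_a, entities_a in clusters_a.items():
        best_id, best_sim = None, 0.0
        for id_b, entities_b in clusters_b.items():
            sim = jaccard(entities_a, entities_b)
            if sim > best_sim:
                best_id, best_sim = id_b, sim
        if best_id is not None and best_sim >= min_sim:
            links[id_a] = (best_id, best_sim)
    return links


# Hypothetical clusters, each reduced to a set of language-independent entity IDs.
english = {"en-1": {101, 102, 103}, "en-2": {207}}
french = {"fr-1": {101, 102, 104}, "fr-2": {309}}
print(link_clusters(english, french))  # {'en-1': ('fr-1', 0.5)}
```

Comparing entity IDs instead of words sidesteps translation entirely, which is why a shared multilingual entity database is a useful prerequisite for cross-lingual linking.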

Page 8

Social Networks – Relation Extraction

• Two entities are related, or "linked", through a phrase
• Machine learning of relations:
  – "contacts" (met, phoned, discussed with, emailed, etc.)
  – "support" (backed, applauded, welcomed, concurred, etc.)
  – "criticise" (slammed, rejected, criticised, accused, etc.)
  – "family" (wife, son, daughter, lover, mistress, etc.)

• Mine 3 years of news reports
• Log dates and related topics
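The slide says relations are learned by machine learning; the sketch below swaps in a much simpler, hand-written trigger lexicon seeded with the slide's example phrases, purely to show the shape of the task: find two entity mentions in a sentence and label the link by the trigger phrase between them. The lexicon, the example sentences and the entity names are hypothetical:

```python
import re

# Hypothetical trigger lexicon, seeded with the example phrases from the slide.
TRIGGERS = {
    "contact": ["met", "phoned", "discussed with", "emailed"],
    "support": ["backed", "applauded", "welcomed", "concurred"],
    "criticise": ["slammed", "rejected", "criticised", "accused"],
    "family": ["wife", "son", "daughter"],
}


def classify_relation(sentence, e1, e2):
    """Return (e1, relation, e2) if a trigger phrase occurs between the two mentions."""
    i, j = sentence.find(e1), sentence.find(e2)
    if i < 0 or j < 0 or i + len(e1) > j:
        return None  # both entities must occur, e1 before e2, without overlapping
    between = sentence[i + len(e1):j].lower()
    for relation, phrases in TRIGGERS.items():
        for phrase in phrases:
            # Whole-word match so that e.g. "son" does not fire inside "season".
            if re.search(r"\b" + re.escape(phrase) + r"\b", between):
                return (e1, relation, e2)
    return None


print(classify_relation("Alice met Bob in Brussels", "Alice", "Bob"))
# ('Alice', 'contact', 'Bob')
```

A learned model would replace the fixed lexicon with patterns induced from annotated examples, but its output, a typed link between two entities, has the same form.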

Page 9

Social Network of Contacts During Lebanon Conflict

Page 10

EMM: Mining Social Networks

Key: contact, family, support, criticise, generic
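Once relations have been extracted, a network like the one on this slide can be assembled by counting typed links between people and logging when each was reported. A minimal sketch, assuming each extracted relation arrives as a hypothetical (source, relation, target, date) record using the relation types from the slide's key; the names and dates are made up:

```python
from collections import Counter, defaultdict


def build_network(relations):
    """Aggregate (source, relation, target, date) records into weighted, typed edges."""
    weights = Counter()
    dates = defaultdict(list)
    for source, relation, target, date in relations:
        edge = (source, relation, target)
        weights[edge] += 1          # how often this typed link was reported
        dates[edge].append(date)    # when it was reported
    return weights, dates


def neighbours(weights, person, relation=None):
    """People linked to `person` in either direction, optionally for one relation type."""
    linked = set()
    for source, rel, target in weights:
        if relation is not None and rel != relation:
            continue
        if source == person:
            linked.add(target)
        elif target == person:
            linked.add(source)
    return sorted(linked)


weights, dates = build_network([
    ("Alice", "contact", "Bob", "2006-07-12"),
    ("Alice", "contact", "Bob", "2006-07-20"),
    ("Carol", "criticise", "Alice", "2006-07-15"),
])
print(neighbours(weights, "Alice"))  # ['Bob', 'Carol']
```

Keeping the edge type in the key means the same pair of people can carry separate "contact" and "criticise" links, each with its own weight and date history.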

Page 11

Event Extraction and Knowledge Bases

• Automatically determine “who did what to whom where and when” from unstructured text.

• A machine learning technique was applied to news clusters from 2005–2007.

Page 12

Violent Event Processing Chain

News Cluster → News Cluster Selection → Take the title and the first sentence → Pattern & Keyword Matching (Pattern Library, Keyword Library) → Event Aggregation → Event Description

Event Description:
• Date
• Place
• Event type
• Number killed
• Number wounded
• Number kidnapped
• Perpetrators
• Description of the victims
• Weapons
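The chain above can be sketched as follows, under heavy assumptions: the keyword and pattern libraries are tiny hypothetical stand-ins for the real libraries, only the event type and the killed/wounded counts of the Event Description are filled, news cluster selection is assumed to have happened already, and cluster-level aggregation takes the majority event type and the median of reported counts (the slide does not give the actual aggregation rule):

```python
import re
from collections import Counter
from dataclasses import dataclass
from statistics import median
from typing import Optional

# Hypothetical keyword library: event type -> trigger words.
KEYWORDS = {
    "bombing": ["bomb", "bombing", "explosion", "blast"],
    "shooting": ["shooting", "gunmen", "gunfire"],
}

# Hypothetical pattern library: field -> regexes whose group 1 is the victim count.
PATTERNS = {
    "killed": [
        re.compile(r"(\d+)\s+(?:\w+\s+)?(?:were\s+)?killed", re.I),
        re.compile(r"killed\s+(?:at\s+least\s+)?(\d+)", re.I),
    ],
    "wounded": [
        re.compile(r"(\d+)\s+(?:\w+\s+)?(?:were\s+)?(?:wounded|injured)", re.I),
        re.compile(r"(?:wounded|injured)\s+(?:at\s+least\s+)?(\d+)", re.I),
    ],
}


@dataclass
class EventDescription:
    event_type: Optional[str] = None
    killed: Optional[int] = None
    wounded: Optional[int] = None


def extract_from_article(title, first_sentence):
    """Pattern & keyword matching on the title plus the first sentence only."""
    text = f"{title}. {first_sentence}"
    event = EventDescription()
    for event_type, words in KEYWORDS.items():
        if any(re.search(r"\b" + re.escape(w) + r"\b", text, re.I) for w in words):
            event.event_type = event_type
            break
    for field_name, regexes in PATTERNS.items():
        for rx in regexes:  # first matching pattern wins
            match = rx.search(text)
            if match:
                setattr(event, field_name, int(match.group(1)))
                break
    return event


def aggregate(events):
    """Cluster-level aggregation: majority event type, median of the reported counts."""
    def med(values):
        known = [v for v in values if v is not None]
        return int(median(known)) if known else None

    types = Counter(e.event_type for e in events if e.event_type)
    return EventDescription(
        event_type=types.most_common(1)[0][0] if types else None,
        killed=med(e.killed for e in events),
        wounded=med(e.wounded for e in events),
    )
```

Aggregating over all articles in a cluster lets the system smooth out a single report with an outdated or mistyped casualty figure, which is one motivation for working on clusters rather than individual articles.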

Page 13

Real Time Violent Event Detection

Real time clustering every 10 minutes

geocoding

Live-updated map display

event extraction
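The slide states only that clustering runs every 10 minutes. As an illustrative stand-in, here is single-pass (leader) clustering: each article joins the most similar existing cluster if the cosine similarity of bag-of-words vectors reaches a threshold, otherwise it starts a new cluster. The tokenisation, the 0.5 threshold and the choice of algorithm are assumptions; no stopword removal or term weighting is applied, which a real system would need:

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_batch(texts, threshold=0.5):
    """Single-pass clustering; returns lists of text indices, one list per cluster."""
    centroids, members = [], []
    for idx, text in enumerate(texts):
        vec = vectorize(text)
        best, best_sim = -1, 0.0
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best >= 0 and best_sim >= threshold:
            centroids[best].update(vec)  # summed counts; cosine ignores scale
            members[best].append(idx)
        else:
            centroids.append(Counter(vec))
            members.append([idx])
    return members


print(cluster_batch(["earthquake hits city", "strong earthquake hits city center",
                     "election results announced"]))  # [[0, 1], [2]]
```

Keeping each centroid as summed term counts is equivalent to the mean for cosine similarity, since cosine depends only on vector direction. Re-running such a batch over a sliding window of recent articles would be one way to refresh clusters on a fixed schedule.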

Page 14

EMM-Labs: Detection and visualisation of violent events

Page 15

Knowledge Extraction

• Extraction – Violent Event Extraction: EMM Clusters → Extraction Patterns → Extracted Events
• Modelling – Ontology Modelling and Knowledgebase Population: Extracted Events → Ontology Instances
• Browsing – Querying, Browsing and Visualization of the Knowledgebase: Visualize
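The slide does not name the ontology language or schema. As a hedged sketch only, extracted events can be turned into ontology instances represented as (subject, predicate, object) triples and then queried by property; the `ex:` namespace, the property names and the sample events are hypothetical, and a real knowledge base would more likely sit in an RDF/OWL store:

```python
from dataclasses import dataclass


@dataclass
class Event:
    event_id: str
    event_type: str
    place: str
    date: str
    killed: int


def to_triples(event, ns="ex:"):
    """Turn one extracted event into instance triples under a hypothetical namespace."""
    subject = f"{ns}{event.event_id}"
    return [
        (subject, "rdf:type", f"{ns}{event.event_type}"),
        (subject, f"{ns}place", event.place),
        (subject, f"{ns}date", event.date),
        (subject, f"{ns}killed", event.killed),
    ]


def query(triples, predicate, value):
    """Subjects having the given predicate/value pair."""
    return sorted({s for s, p, o in triples if p == predicate and o == value})


kb = (to_triples(Event("e1", "Bombing", "Baghdad", "2007-05-01", 12))
      + to_triples(Event("e2", "Shooting", "Kabul", "2007-05-02", 3)))
print(query(kb, "rdf:type", "ex:Bombing"))  # ['ex:e1']
```

Once events are stored as typed instances rather than rows of extracted text, browsing and visualisation tools can filter them by class, place or date without re-running extraction.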

Page 16

MediSys: Monitoring Health Threats

Page 17

Desktop Web Mining Tool

Normal Google Advanced Search

Extracting full texts from all web pages
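The slide pairs a normal Google advanced search with extracting the full text of every result page, but gives no implementation. A minimal sketch of the text-extraction step only, using Python's standard `html.parser` and assuming the pages have already been downloaded; it keeps visible text and skips `script`, `style` and `noscript` content, with no attempt at boilerplate (menu, advert) removal:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text chunks, ignoring script/style/noscript content."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the visible text of an HTML document as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = "<html><body><h1>Title</h1><p>Hello <b>world</b></p><script>var x=1;</script></body></html>"
print(extract_text(page))  # Title Hello world
```

A standard-library parser is tolerant of the malformed markup common on real pages; the fetching step (for example with `urllib.request`) is deliberately left out.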

Page 18

Page 19

Summary

• Web mining applications in operational use for:
  – Situation/crisis monitoring
  – Law enforcement and counter-terrorism
  – Medical intelligence

• Future technical challenges:
  – Knowledge extraction
  – Audio-visual monitoring
  – Small signal detection
  – Multi-linguality

Page 20

Thank You