Content Analytics Solutions, September 2010 (© 2010 IBM Corporation)

Upload: eric-cupps

Post on 16-Dec-2015

TRANSCRIPT

Page 1:

Content Analytics Solutions

September, 2010

© 2010 IBM Corporation

Page 2:

Content Analytics – An Increasingly Important Solution Component

Social Network Analysis: showing relationships between people, organisations, phone numbers, etc.

Event Timeline Analysis: plotting specific events against a timeline.

Social Network and Event Timeline Analysis are just two examples of content analytics; there are many more.

Page 3:

An Open Information Centric Architecture

SOURCE        EXTRACT                  CORRELATED
Structured    URN 12345678, Born 1970  URN 12345678, Born 1970
File System   Luke, Born 22/01/1970    URN 12345678, Born 22/01/1970
Web Articles  J S Luke, Age mid-30s    URN 12345678, Age mid-30s

FUSED: URN 12345678, Name J S Luke, Born 22/01/1970, Age 34
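The extract, correlate and fuse flow on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the IBM implementation: the record layout, the helper names (`same_entity`, `correlate`, `fuse`), the matching rule (shared surname or shared birth year) and the fusion rule (the longest value wins) are all invented for the example.

```python
# Hypothetical extracted records, one per source shown on the slide.
extracted = [
    {"source": "Structured", "urn": "12345678", "born": "1970"},
    {"source": "File System", "name": "Luke", "born": "22/01/1970"},
    {"source": "Web Articles", "name": "J S Luke", "age": "mid-30s"},
]


def surname(rec):
    """Last word of the name, or None when the record has no name."""
    name = rec.get("name")
    return name.split()[-1] if name else None


def birth_year(rec):
    """Last four characters of the birth value, or None when absent."""
    born = rec.get("born")
    return born[-4:] if born else None


def same_entity(a, b):
    """Assumed matching rule: shared surname or shared birth year."""
    if surname(a) and surname(a) == surname(b):
        return True
    if birth_year(a) and birth_year(a) == birth_year(b):
        return True
    return False


def correlate(records, anchor):
    """Give the anchor's URN to every record linked to it, directly or via another record."""
    linked = [anchor]
    remaining = [r for r in records if r is not anchor]
    changed = True
    while changed:
        changed = False
        for rec in list(remaining):
            if any(same_entity(rec, other) for other in linked):
                linked.append(rec)
                remaining.remove(rec)
                changed = True
    return [{**rec, "urn": anchor["urn"]} for rec in linked]


def fuse(correlated):
    """Merge records, keeping the longest (assumed most specific) value per attribute."""
    fused = {}
    for rec in correlated:
        for key, value in rec.items():
            if key == "source":
                continue
            if key not in fused or len(value) > len(fused[key]):
                fused[key] = value
    return fused


correlated = correlate(extracted, extracted[0])
fused = fuse(correlated)
print(fused)
```

The sketch keeps the age as "mid-30s". The slide's fused record shows Age 34, which presumably comes from combining the date of birth with a reference date; that step is not attempted here.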

Page 4:

An Open Information Centric Architecture

Stages: SOURCE, EXTRACT, STORE

Sources: Structured, Web Articles, File System

Extracted tables:

DATE      NAME  LOCATION
19/04/03  Luke  UK
…..       …..   …..
24/07/02  Bent  USA

NAME    DATE        FLIGHT
Biddle  29/08/2004  BA 256
…..     …..         …..
Coates  21/07/2001  QA 725

NAME  DOB
Luke  22/01/1970
…..   …..
Bent  25/12/0000

CORRELATION & FUSION

TOOLS

Visualisation: i2
GIS: ESRI, Tenet
Search & Discovery Engines: OmniFind, IBM Content Analytics
Data Fusion & Mining: SPSS, EAS
…..

Page 5:

An IBM Content Analytics Solution In Law Enforcement

Now, this police department can:

• Check for errors & inconsistencies with existing databases

• Provide management with actionable information

• Have improved search capabilities

• Perform identity resolution and relationship mining

Source text (arrest note):

PC 143 (Hunter) 15 June 2006 23:47 Suspect identified himself as John Setsuko. Matched description given by night club doorman (IC1, Male, Ag 22-24 yrs, blue Everton shirt). Stopped whilst driving White Ford Mondeo, W563 WDL. Address given as 22 East Dene Ridge, Copdock, Ipswich. Searched at scene and found in possession of 1oz Cannabis Resin and lockable pocket knife.

Extracted fields:

Arresting_Officer       PC 143
Arrest_Date_Time        15/06/2006 : 23:47
Suspect_Forename        John
Suspect_Surname         Setsuko
Suspect_VRN             W563WDL
Suspect_Vehicle_Colour  White
Suspect_Vehicle_Make    Ford Mondeo
Suspect_Addr_Street     22 East Dene Ridge
Suspect_Addr_Town       Ipswich
Evidence_1_Description  1 oz Cannabis Resin
Evidence_2_Description  Lockable pocket knife
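As a rough illustration of how free text like this arrest note could be turned into the named fields shown on the slide, here is a small rule-based sketch in Python. The field names come from the slide; the regular expressions and the `extract_fields` helper are assumptions written for this one note, and say nothing about how IBM Content Analytics defines its annotators.

```python
import re
from datetime import datetime

# The arrest note from the slide, with spaces restored between its parts.
NOTE = (
    "PC 143 (Hunter) 15 June 2006 23:47 Suspect identified himself as "
    "John Setsuko. Matched description given by night club doorman "
    "(IC1, Male, Ag 22-24 yrs, blue Everton shirt). Stopped whilst driving "
    "White Ford Mondeo, W563 WDL. Address given as 22 East Dene Ridge, "
    "Copdock, Ipswich. Searched at scene and found in possession of "
    "1oz Cannabis Resin and lockable pocket knife."
)


def extract_fields(note):
    """Apply hand-written (hypothetical) patterns and return the fields that matched."""
    fields = {}

    m = re.match(r"(PC \d+)", note)
    if m:
        fields["Arresting_Officer"] = m.group(1)

    m = re.search(r"(\d{1,2} [A-Z][a-z]+ \d{4} \d{2}:\d{2})", note)
    if m:
        when = datetime.strptime(m.group(1), "%d %B %Y %H:%M")
        fields["Arrest_Date_Time"] = when.strftime("%d/%m/%Y : %H:%M")

    m = re.search(r"identified himself as (\w+) (\w+)", note)
    if m:
        fields["Suspect_Forename"], fields["Suspect_Surname"] = m.groups()

    # Colour, then make and model, then the registration after the comma.
    m = re.search(r"driving (\w+) ([\w ]+), ([A-Z]\d{3} ?[A-Z]{3})", note)
    if m:
        fields["Suspect_Vehicle_Colour"] = m.group(1)
        fields["Suspect_Vehicle_Make"] = m.group(2)
        fields["Suspect_VRN"] = m.group(3).replace(" ", "")

    # Street first, an optional locality, then the town before the full stop.
    m = re.search(r"Address given as ([^,]+), (?:[^,]+, )?(\w+)\.", note)
    if m:
        fields["Suspect_Addr_Street"] = m.group(1)
        fields["Suspect_Addr_Town"] = m.group(2)

    m = re.search(r"found in possession of (.+?) and (.+?)\.", note)
    if m:
        fields["Evidence_1_Description"] = m.group(1)
        fields["Evidence_2_Description"] = m.group(2)

    return fields


fields = extract_fields(NOTE)
for name, value in fields.items():
    print(f"{name}: {value}")
```

The evidence descriptions are returned exactly as written in the note ("1oz Cannabis Resin", "lockable pocket knife"); the normalised spelling shown on the slide would need a further clean-up step that this sketch leaves out.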

Page 6:

Page 7:

Open Information Architecture

Stages: SOURCE, EXTRACT, STORE

Sources: Structured, Web Articles, File System

Extracted tables:

DATE      NAME  LOCATION
19/04/03  Luke  UK
…..       …..   …..
24/07/02  Bent  USA

NAME    DATE        FLIGHT
Biddle  29/08/2004  BA 256
…..     …..         …..
Coates  21/07/2001  QA 725

NAME  DOB
Luke  22/01/1970
…..   …..
Bent  25/12/0000

CORRELATION & FUSION

TOOLS

Visualisation: i2
GIS: ESRI, Tenet
Search & Discovery Engines: OmniFind, IBM Content Analytics
Data Fusion & Mining: SPSS, EAS
…..

Page 8:

IBM Visual Search For A Government Agency

The Goal: Reducing analysts' time in locating relevant information.

The Problem: Keyword search technologies do not allow the definition of complex searches. For example, “find every person mentioned in a document describing drug smuggling associated with another person mentioned in a document describing organised crime.”

The Solution: Deployment of a graphical search interface enabling the definition of complex patterns.
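To make the example query concrete, the sketch below expresses it over a tiny in-memory data set. Everything here is hypothetical: the documents, the person names, the `associations` set and the helper functions are placeholders, and the slide describes a graphical interface rather than code. The point is only that the query joins two document topics through a person-to-person association, which a single keyword search cannot express.

```python
# Hypothetical annotated documents: each has topics and the people it mentions.
documents = [
    {"id": "doc1", "topics": {"drug smuggling"}, "people": {"Person A", "Person B"}},
    {"id": "doc2", "topics": {"organised crime"}, "people": {"Person C"}},
    {"id": "doc3", "topics": {"fraud"}, "people": {"Person D"}},
]

# Hypothetical known associations between people (treated as undirected).
associations = {("Person A", "Person C"), ("Person B", "Person D")}


def people_in(topic):
    """Everyone mentioned in at least one document tagged with the topic."""
    found = set()
    for doc in documents:
        if topic in doc["topics"]:
            found |= doc["people"]
    return found


def associated(a, b):
    """True when the pair appears in the association set in either order."""
    return (a, b) in associations or (b, a) in associations


def find_pattern(topic_a, topic_b):
    """Pairs (a, b): a mentioned under topic_a, b under topic_b, a associated with b."""
    return sorted(
        (a, b)
        for a in people_in(topic_a)
        for b in people_in(topic_b)
        if a != b and associated(a, b)
    )


matches = find_pattern("drug smuggling", "organised crime")
print(matches)
```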

Page 9:

Find groups of 3 people who are linked together and are associated with the same organization
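One possible reading of this query, sketched in Python: treat "linked together" as meaning every pair in the group is directly linked, and require at least one organisation common to all three. That interpretation, the sample links, the membership data and the `linked_trios` function are assumptions for illustration only.

```python
from itertools import combinations

# Hypothetical person-to-person links (undirected) and organisation memberships.
links = {frozenset(pair) for pair in [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]}
member_of = {
    "A": {"Org X"},
    "B": {"Org X", "Org Y"},
    "C": {"Org X"},
    "D": {"Org Y"},
}


def linked_trios(links, member_of):
    """Return (trio, shared organisations) for every fully linked group of three."""
    results = []
    for trio in combinations(sorted(member_of), 3):
        fully_linked = all(frozenset(pair) in links for pair in combinations(trio, 2))
        shared = set.intersection(*(member_of[person] for person in trio))
        if fully_linked and shared:
            results.append((trio, sorted(shared)))
    return results


trios = linked_trios(links, member_of)
print(trios)
```

Checking every combination of three people is fine for a handful of entities; a real deployment over a large entity graph would need an indexed or graph-database approach.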

Page 10:

IBM Success at a Government Agency

The Goal: Identify the re-occurrence of phone numbers within historical documents.

The Problem: Using keyword search technologies had historically resulted in large numbers of false hits for credit card, visa and other reference numbers. The tedious nature of the task also resulted in oversights and errors.

The Solution: Deployment of an automated software solution to analyze documents and identify recurring phone numbers.

• Semantic rules were used to ensure a high degree of accuracy

• All extracted phone numbers were compared against other documents with the results visualized through a carefully designed User Interface.

The automated solution saved each analyst over 6 hours per day, improving the quality and consistency of analysis.
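The approach described above can be illustrated with a small Python sketch. It stands in for the "semantic rules" with simple context cues: a digit run counts as a phone number only when a phone-related word precedes it and no card, visa or reference cue does. The cue lists, the number pattern, the window size and the sample documents are all assumptions made for this example (the numbers use a range reserved for fictional use), not the rules of the deployed solution.

```python
import re
from collections import defaultdict

# Candidate digit runs: 9 to 16 characters of digits, spaces or hyphens.
CANDIDATE = re.compile(r"\+?\d[\d\s-]{7,14}\d")
# Hypothetical context cues, checked in the text just before the candidate.
PHONE_CUE = re.compile(r"\b(tel|phone|call|mobile)\b")
OTHER_CUE = re.compile(r"\b(card|visa|ref|account)\b")


def extract_phone_numbers(text, window=25):
    """Return normalised phone numbers whose preceding context looks phone-like."""
    found = set()
    for match in CANDIDATE.finditer(text):
        context = text[max(0, match.start() - window):match.start()].lower()
        if OTHER_CUE.search(context):
            continue  # likely a card, visa or reference number
        if PHONE_CUE.search(context):
            found.add(re.sub(r"\D", "", match.group()))
    return found


def recurring_numbers(docs):
    """Map each number seen in more than one document to the documents containing it."""
    seen = defaultdict(set)
    for doc_id, text in docs.items():
        for number in extract_phone_numbers(text):
            seen[number].add(doc_id)
    return {n: sorted(ids) for n, ids in seen.items() if len(ids) > 1}


docs = {
    "report1": "Witness asked us to call 01632 960123 after six.",
    "report2": "Payment made with card 1234 5678 9012 3456 at the hotel.",
    "report3": "Suspect's mobile 01632-960123 was found at the scene.",
    "report4": "Visa ref 0163 2960 123 stamped at the border.",
}

recurring = recurring_numbers(docs)
print(recurring)
```

Here report4 contains the same digits as the phone number, but its context marks it as a visa reference, so it is not counted. This is the kind of false hit that the slide attributes to plain keyword search.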

Page 11:

What Are The Inhibitors?

• What’s the business case?

• How good is the text analytics?

• How do we know how good the text analytics is?

• How do we respond to changes in the content and, of course, the business environment?

• Are we creating, rather than solving, a problem when we invest in text analytics?

Page 12:

New Architectural Models For Text Analytics

Page 13:

New Architectural Models For Text Analytics

Real-time Analysis

Index Driven Annotation Engine

Node 1

Interactive Rule Development & Manual Annotation

Enterprise Services

Geospatial Analysis

Network Analysis

Semantic Search

Node 1

Node 2

Node n

• Large scale development / training / test corpus

• Near real-time feedback on impact

• Analytics as opposed to speculation (mining instead of prospecting)

Page 14:

IBM Text Analytics Solution

November, 2009