1 © 2010 IBM Corporation
Content Analytics Solutions
September, 2010
Content Analytics: An Increasingly Important Solution Component
Social Network Analysis
Showing relationships between people, organisations, phone numbers, etc.
Event Timeline Analysis
Plotting specific events against a timeline.
Social Network and Event Timeline Analysis are just two examples of content analytics; there are many more.
An Open Information Centric Architecture
SOURCE: Structured; File System; Web Articles
EXTRACT: URN 12345678, Born 1970; Luke, Born 22/01/1970; J S Luke, Age mid-30s
CORRELATED: URN 12345678, Born 1970; URN 12345678, Born 22/01/1970; URN 12345678, Age mid-30s
FUSED: URN 12345678, Name J S Luke, Born 22/01/1970, Age 34
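The fusion step on this slide can be sketched roughly as follows. This is only an illustration, not the method used by any IBM product: the record layout, the `fuse` function, the "longest value is most specific" rule, and the reference date used to compute the age are all assumptions chosen so that the output resembles the FUSED record.

```python
from datetime import date, datetime

# Hypothetical correlated records: each extract is already linked to one URN.
correlated = [
    {"urn": "12345678", "born": "1970"},
    {"urn": "12345678", "name": "Luke", "born": "22/01/1970"},
    {"urn": "12345678", "name": "J S Luke", "age": "mid-30s"},
]

def fuse(records, as_of):
    """Merge records sharing a URN, preferring the most specific value per field."""
    fused = {"urn": records[0]["urn"]}
    for field in ("name", "born"):
        values = [r[field] for r in records if field in r]
        if values:
            # Assumption: the longest string is the most specific
            # ("22/01/1970" beats "1970", "J S Luke" beats "Luke").
            fused[field] = max(values, key=len)
    born = fused.get("born", "")
    if len(born) == 10:  # a full dd/mm/yyyy date is available
        dob = datetime.strptime(born, "%d/%m/%Y").date()
        had_birthday = (as_of.month, as_of.day) >= (dob.month, dob.day)
        fused["age"] = as_of.year - dob.year - (0 if had_birthday else 1)
    return fused

# The reference date is an arbitrary assumption for this example.
result = fuse(correlated, as_of=date(2004, 6, 1))
print(result)
```

A real fusion engine would also need to handle conflicting values and track which source each value came from; the sketch ignores both.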
An Open Information Centric Architecture

Stages: SOURCE, EXTRACT, STORE, CORRELATION & FUSION, TOOLS

Sources: Structured, Web Articles, File System

Example tables:

DATE      NAME  LOCATION
19/04/03  Luke  UK
…..       …..   …..
24/07/02  Bent  USA

NAME    DATE        FLIGHT
Biddle  29/08/2004  BA 256
…..     …..         …..
Coates  21/07/2001  QA 725

NAME  DOB
Luke  22/01/1970
…..   …..
Bent  25/12/0000

Tools:
Visualisation: i2
GIS: ESRI Tenet
Search & Discovery Engines: OmniFind IBM Content Analytics
Data Fusion & Mining: SPSS EAS
…..
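As a rough illustration of the correlation step, the example tables on this slide can be grouped by the name they mention. The table names (`locations`, `flights`, `births`) and the `correlate` function are hypothetical labels introduced here, and matching on name alone is a deliberate simplification.

```python
# Rows copied from the slide's example tables; elided rows ("…..") are left out.
locations = [
    {"date": "19/04/03", "name": "Luke", "location": "UK"},
    {"date": "24/07/02", "name": "Bent", "location": "USA"},
]
flights = [
    {"name": "Biddle", "date": "29/08/2004", "flight": "BA 256"},
    {"name": "Coates", "date": "21/07/2001", "flight": "QA 725"},
]
births = [
    {"name": "Luke", "dob": "22/01/1970"},
    {"name": "Bent", "dob": "25/12/0000"},
]

def correlate(**tables):
    """Group rows from every table under the name they mention."""
    by_name = {}
    for table_name, rows in tables.items():
        for row in rows:
            by_name.setdefault(row["name"], {}).setdefault(table_name, []).append(row)
    return by_name

profiles = correlate(locations=locations, flights=flights, births=births)

# Names that appear in more than one table are candidates for fusion.
linked = sorted(name for name, found in profiles.items() if len(found) > 1)
print(linked)
```

Exact name matching would treat "Luke" and "J S Luke" as different people, which is why practical identity resolution relies on more than one attribute.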
An IBM Content Analytics Solution in Law Enforcement
Now, this police department can:
• Check for errors & inconsistencies with existing databases
• Provide management with actionable information
• Have improved search capabilities
• Perform identity resolution and relationship mining

Structured fields:
Arresting_Officer       PC 143
Arrest_Date_Time        15/06/2006 : 23:47
Suspect_Forename        John
Suspect_Surname         Setsuko
Suspect_VRN             W563WDL
Suspect_Vehicle_Colour  White
Suspect_Vehicle_Make    Ford Mondeo
Suspect_Addr_Street     22 East Dene Ridge
Suspect_Addr_Town       Ipswich
Evidence_1_Description  1 oz Cannabis Resin
Evidence_2_Description  Lockable pocket knife

Officer's notes:
PC 143 (Hunter), 15 June 2006 23:47
Suspect identified himself as John Setsuko. Matched description given by night club doorman (IC1, Male, Age 22-24 yrs, blue Everton shirt). Stopped whilst driving White Ford Mondeo, W563 WDL. Address given as 22 East Dene Ridge, Copdock, Ipswich. Searched at scene and found in possession of 1oz Cannabis Resin and lockable pocket knife.
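To make the "check for errors & inconsistencies" idea concrete, here is a minimal sketch that pulls a few fields out of the officer's notes and compares them with the structured fields. The regular expressions, the `extract` function, and the abridged note text are assumptions made for illustration; a production system would use far more robust annotators than hand-written patterns.

```python
import re
from datetime import datetime

# Abridged version of the officer's notes from the slide.
note = (
    "PC 143 (Hunter) 15 June 2006 23:47 Suspect identified himself as John Setsuko. "
    "Stopped whilst driving White Ford Mondeo, W563 WDL. "
    "Address given as 22 East Dene Ridge, Copdock, Ipswich."
)

# A subset of the structured fields shown on the slide.
record = {
    "Arresting_Officer": "PC 143",
    "Arrest_Date_Time": "15/06/2006 : 23:47",
    "Suspect_VRN": "W563WDL",
}

def extract(text):
    """Pull a few fields out of free text with hand-written patterns."""
    found = {}
    officer = re.search(r"\bPC \d+\b", text)
    if officer:
        found["Arresting_Officer"] = officer.group()
    when = re.search(r"\b(\d{1,2} [A-Z][a-z]+ \d{4}) (\d{2}:\d{2})\b", text)
    if when:
        parsed = datetime.strptime(" ".join(when.groups()), "%d %B %Y %H:%M")
        found["Arrest_Date_Time"] = parsed.strftime("%d/%m/%Y : %H:%M")
    # Assumption: a plate of one letter, 1-3 digits, then three letters.
    vrn = re.search(r"\b([A-Z]\d{1,3}) ?([A-Z]{3})\b", text)
    if vrn:
        found["Suspect_VRN"] = "".join(vrn.groups())  # normalise away the space
    return found

extracted = extract(note)

# Any field where the notes and the database disagree is flagged for review.
mismatches = {k: (v, record.get(k)) for k, v in extracted.items() if record.get(k) != v}
print(extracted, mismatches)
```

Normalising values before comparison matters here: the notes give the registration as "W563 WDL" while the database stores "W563WDL".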
Open Information Architecture
[Diagram repeated from the earlier architecture slide: SOURCE (Structured, Web Articles, File System), EXTRACT, STORE, CORRELATION & FUSION, TOOLS]
IBM Visual Search For A Government Agency
The Goal: Reducing analysts' time in locating relevant information.
The Problem: Keyword search technologies do not allow the definition of complex searches. For example, “find every person mentioned in a document describing drug smuggling associated with another person mentioned in a document describing organised crime.”
The Solution: Deployment of a graphical search interface enabling the definition of complex patterns.
Find groups of 3 people who are linked together and are associated with the same organization
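A pattern like this one can be expressed as a small graph query. The sketch below is not the IBM Visual Search implementation: the names, the `links` and `member_of` structures, and the reading of "linked together" as "every pair is directly linked" are all assumptions.

```python
from itertools import combinations

# Hypothetical extracted relationships; all names are placeholders.
links = {("Ann", "Bob"), ("Bob", "Cat"), ("Ann", "Cat"), ("Cat", "Dan")}
member_of = {
    "Ann": {"Org X"},
    "Bob": {"Org X", "Org Y"},
    "Cat": {"Org X"},
    "Dan": {"Org Y"},
}

def linked(a, b):
    """True if a person-to-person link exists in either direction."""
    return (a, b) in links or (b, a) in links

def linked_trios(people):
    """Yield (trio, shared organisations) for fully linked trios sharing an organisation."""
    for trio in combinations(sorted(people), 3):
        if all(linked(a, b) for a, b in combinations(trio, 2)):
            shared = set.intersection(*(member_of[p] for p in trio))
            if shared:
                yield trio, shared

matches = list(linked_trios(member_of))
print(matches)
```

Checking every combination of three people is fine for a toy example but scales poorly; a large corpus would need an indexed graph store.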
IBM Success at a Government Agency
The Goal: Identify the re-occurrence of phone numbers within historical documents.
The Problem: Using keyword search technologies had historically resulted in large numbers of false hits for credit card, visa and other reference numbers. The tedious nature of the task also resulted in oversights and errors.
The Solution: Deployment of an automated software solution to analyze documents and identify recurring phone numbers.
• Semantic rules were used to ensure a high degree of accuracy.
• All extracted phone numbers were compared against other documents, with the results visualized through a carefully designed user interface.
The automated solution saved each analyst over 6 hours per day, improving the quality and consistency of analysis.
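The idea of using context rules to separate phone numbers from other digit strings can be sketched as follows. The documents, numbers, cue words, and the 20-character context window are all made up for illustration; the actual semantic rules used in the deployment are not described in the slides.

```python
import re
from collections import defaultdict

# Hypothetical documents; every number here is invented.
documents = {
    "doc1": "Call him on 01632 960123 after six. Visa no. 4929 123456 was refused.",
    "doc2": "Mobile 01632 960123 seen again. Card 4539 1488 0343 6467 declined.",
    "doc3": "Tel: 01632 960999. Ref 01632 960123 quoted on the invoice.",
}

NUMBER = re.compile(r"\b\d[\d ]{8,}\d\b")            # any long run of digits and spaces
PHONE_CUES = re.compile(r"\b(call|tel|phone|mobile)\b", re.I)
OTHER_CUES = re.compile(r"\b(card|visa|ref)\b", re.I)

def phone_numbers(text, window=20):
    """Return digit strings whose preceding context suggests a phone number."""
    found = []
    for match in NUMBER.finditer(text):
        context = text[max(0, match.start() - window):match.start()]
        if PHONE_CUES.search(context) and not OTHER_CUES.search(context):
            found.append(match.group().replace(" ", ""))
    return found

# Record which documents each accepted number appears in.
seen_in = defaultdict(set)
for doc_id, text in documents.items():
    for number in phone_numbers(text):
        seen_in[number].add(doc_id)

recurring = {n: sorted(docs) for n, docs in seen_in.items() if len(docs) > 1}
print(recurring)
```

A plain keyword search for the digits would also hit the card and reference numbers; the cue-word check is what filters those out in this sketch.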
What Are The Inhibitors?
• What’s the business case?
• How good is the text analytics?
• How do we know how good the text analytics is?
• How do we respond to changes in the content and, of course, the business environment?
• Are we creating, rather than solving, a problem when we invest in text analytics?
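One common way to approach the question of how good the text analytics is, though not one the slides themselves prescribe, is to compare extracted annotations with a manually annotated set and report precision and recall. The `precision_recall` helper and the sample annotations below are hypothetical.

```python
def precision_recall(predicted, gold):
    """Compare extracted annotations with a manually annotated gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical (document, entity) pairs.
gold = {
    ("doc1", "John Setsuko"),
    ("doc1", "W563WDL"),
    ("doc2", "Ipswich"),
    ("doc2", "PC 143"),
}
predicted = {
    ("doc1", "John Setsuko"),
    ("doc1", "W563WDL"),
    ("doc2", "Copdock"),
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Re-running such a comparison whenever rules or content change gives a simple way to track whether accuracy is improving or degrading.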
New Architectural Models For Text Analytics
New Architectural Models For Text Analytics
Real-time Analysis
[Diagram components: Index Driven Annotation Engine (Node 1, …); Interactive Rule Development & Manual Annotation; Enterprise Services (Geospatial Analysis, Network Analysis, Semantic Search, …); Node 1, Node 2, …, Node n]
• Large scale development / training / test corpus
• Near real-time feedback on impact
• Analytics as opposed to speculation (mining instead of prospecting)
IBM Text Analytics Solution
November, 2009