vigiles overview june 2010
Copyright © gillc S.A. 2009. All rights reserved
Flexible and Intelligent Access to Information
ColumboDiscovery™ and ColumboForensics™
June 2010
Contents
Context
Information Challenges
Our Approach to Information Discovery
The Technologies we Use
Columbo® Information Discovery Platform
Automatic Entity Extraction
Themes and Links
Forensics Case Study
Summary and benefits
Context
Economic pressures versus increasing demand
Increasing technical sophistication of criminals and terrorists
Criminal investigations versus digital investigations
Information technology – failure of expectations
Shortcomings of software – 8 years on from Soham
Law enforcement agencies (LEAs) working together?
Information Challenges
Massive data volumes
Petabytes+ of data (1 petabyte = 1,000 terabytes, approx. 3,000 million documents)
Forensic analysis of hundreds of devices on large cases
Diverse sources
The Internet – www, blogs, Twitter, social networks, virtual worlds, chat-rooms
Internal – mail, office systems, intelligence databases, operational systems
Third-party databases such as ISPs and Telcos
Computers, storage devices, mobile phones, cameras, sat-navs, Wi
Intelligence from other law enforcement agencies
Integration of data in multiple formats
Structured, unstructured (text), multi-media (image, voice, video)
Deleted and hidden
Languages / alphabets
Dangers and shortcomings of search
Search engine issues – ranking, relevance, etc.
Terminology, expert knowledge of subject…
Can distort investigative approach
Spellings / misspellings
Some spelling challenges
Mohammed, Muhammad, Mohammad, Muhammed, Mohamed, Mohamad, Mahammed, Mohammod, Mahamed, Muhammod, Muhamad, Mohmmed, Mohamud, Mohammud
hydrogen peroxide
hydrogen peroxode
hydrogen peoxide
hydrogen perioxide
hydrogen peroxcide
hydrogen peroxyde
hydrogen proxide
hydrogen pyroxide
hydrogenperoxide
hydrogen-peroxide
hydrogen peroxide.
hydrogen peroxide)
hydrogen peroxide-
hydrogen peroxide,
hydrogen peroxides
Hussein, Hussain, Husain, Husayn, Husein, Husen, Huseyin, Hussayn
112 different combinations of the two names (14 spellings of Mohammed × 8 spellings of Hussein)!
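A common way to implement a 'spelt like' match is edit (Levenshtein) distance: a variant counts as a match when it is within a few single-character insertions, deletions or substitutions of the query. The sketch below is a generic illustration, not Columbo's algorithm; the 2-edit threshold and the stripping of trailing punctuation are assumptions, chosen so that every 'hydrogen peroxide' variant listed above matches.

```python
# Generic "spelt like" matching via Levenshtein edit distance.
# Assumptions (not from the source): a 2-edit threshold, and stripping of
# trailing punctuation such as ".", ",", ")" and "-".


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b (dynamic programming)."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def spelt_like(query: str, candidate: str, max_edits: int = 2) -> bool:
    """True if the candidate, lower-cased and with trailing punctuation
    removed, is within max_edits edits of the query."""
    cleaned = candidate.lower().rstrip(".,)-")
    return levenshtein(query.lower(), cleaned) <= max_edits


VARIANTS = [
    "hydrogen peroxode", "hydrogen peoxide", "hydrogen perioxide",
    "hydrogen peroxcide", "hydrogen peroxyde", "hydrogen proxide",
    "hydrogen pyroxide", "hydrogenperoxide", "hydrogen-peroxide",
    "hydrogen peroxide.", "hydrogen peroxide)", "hydrogen peroxide-",
    "hydrogen peroxide,", "hydrogen peroxides",
]

if __name__ == "__main__":
    for v in VARIANTS:
        print(f"{v!r:24} matches: {spelt_like('hydrogen peroxide', v)}")
    print(spelt_like("hydrogen peroxide", "hydrogen cyanide"))  # False
```

Edit distance handles typing errors well, but transliterated names such as the Mohammed and Hussein spellings differ in ways that a phonetic key captures better.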
Our Approach to Information Discovery
Help the user to understand and explore the content
We identify entities, themes (subjects), links – in most cases automatically
People, Places, Objects, Account Numbers, Telephone Numbers, etc.
Themes, Concepts, Sentiment
Hard and soft (weak) links between Entities and Themes
We present this in ways that help users understand and explore (discover) the data
Entity / Theme Extractions
Summaries
Timelines and Graphs
Connection and Relationship Diagrams
Geo-location Maps
Intelligent search
Prompted
Sounds like / spelt like
Semantic (find similar content to this)
Automate processes including reports, where possible
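A classic example of a 'sounds like' key is American Soundex, which reduces a name to its first letter plus three digits for its consonant sounds. It is shown here only as a well-known stand-in; the source does not say which phonetic technique Columbo uses. Applied to the spellings under 'Some spelling challenges', all 14 Mohammed variants collapse to the key M530 and all 8 Hussein variants to H250.

```python
# American Soundex, used here as an illustrative "sounds like" key.

# Consonants that sound alike share a digit; vowels, h, w and y get none.
SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(name: str) -> str:
    """Return the four-character Soundex key, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        code = SOUNDEX_CODES.get(c, "")
        if code and code != prev:
            key += code
        # Vowels reset 'prev', so a repeated sound after a vowel is coded again;
        # 'h' and 'w' do not, so identical sounds either side of them merge.
        if c not in "hw":
            prev = code
    return (key + "000")[:4]


if __name__ == "__main__":
    for name in ["Mohammed", "Muhammad", "Mohamud", "Mahamed",
                 "Hussein", "Husayn", "Huseyin", "Hussayn"]:
        print(name, soundex(name))
```

Phonetic keys over-match as well as under-match: 'Mount' also codes to M530, which is one reason such keys are usually combined with spelling-based and contextual checks rather than used alone.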
Some of the Technologies we use
We use advanced analysis techniques that result in much better conceptual understanding and forensic performance
These techniques include using semantic indexing and linking and more novel proprietary ‘digital fingerprint’ techniques (CSI – Columbo® Semantic Indexing)
Our platform is scalable and our techniques are geared to indexing and comparing massive amounts of information – many ‘discovery’ requirements are a numbers and speed game
Our platform can be trained to recognise certain patterns where appropriate (both text and image based), and can run autonomously and covertly if required
A key difference is that our solutions ‘turn search upside down’
We get the data to tell us what is there, rather than just looking for something specific
We don’t search for the needle hidden in the haystack – we remove the hay and find the needle, together with whatever else might be there
Gillc has a number of products and applications, the main one of which is ColumboDiscovery™, our integrated information discovery platform
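The 'digital fingerprint' techniques behind CSI are proprietary and are not described in this overview. A widely used public technique for comparing large numbers of documents is word shingling with hashed fingerprints and Jaccard similarity; the sketch below shows that generic technique only, as an assumption-labelled illustration rather than Columbo's method. The shingle length k = 3 and the choice of BLAKE2b hashes are illustrative.

```python
import hashlib
import re

# Generic near-duplicate detection with word shingles (not Columbo's CSI).


def shingles(text: str, k: int = 3) -> set:
    """All runs of k consecutive lower-cased words in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def fingerprint(text: str, k: int = 3) -> set:
    """Hash each shingle to a 64-bit integer so documents can be stored
    and compared as compact sets of numbers."""
    return {
        int.from_bytes(hashlib.blake2b(s.encode("utf-8"), digest_size=8).digest(), "big")
        for s in shingles(text, k)
    }


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of two fingerprints: shared hashes / all hashes."""
    fa, fb = fingerprint(a, k), fingerprint(b, k)
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0


if __name__ == "__main__":
    msg1 = "Meet at the warehouse on Friday at noon"
    msg2 = "meet at the warehouse on friday at ten"
    # 5 of the 7 distinct shingles are shared
    print(round(similarity(msg1, msg2), 3))
```

At scale, systems of this kind usually compare a fixed-size MinHash sample of each fingerprint instead of the full set, so that comparison cost does not grow with document length; that refinement is omitted here.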
Columbo® Information Discovery Platform
Diagram: ColumboCORE™ (Columbo Object Resource Enhancement) and ColumboDiscovery™ (Intelligence Operations & Analysis Techniques), with components for Entities & Themes, Relationships, Timelines & Events, Geo-Location, and Reports & Comparisons
Automatic Entity Extraction
All structured and unstructured information resources can be automatically processed for entity extraction, including:
Documents – including web pages, social media, office applications, email, databases
Digital devices – cameras, phones, SIM cards, storage devices
The entity types shown (left) are a selection of those already coded into Columbo® software. Others could include for example:
Airports and airlines
Known street gangs
Additional types can be added by Gillc, or added as custom types by the end user
Metadata from applications, image files and digital devices is also extracted as entity information. For example:
Device type and ID – for phones, cameras, computers, etc.
Author and creation date – for enterprise documents, etc.
Entity classification is customisable, and includes various identification and matching techniques, for example:
Detect entities where slang, codes or ‘street names’ are used
Detect entities where there are multiple spellings
Detect complex / variable formats – e.g. phone numbers, dates
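Detecting a variable-format entity usually means two steps: a loose pattern finds candidates, then a normaliser validates each one and maps it to a single canonical form, so that differently written copies of the same number are recognised as one entity. The sketch below assumes UK numbers only (ten national digits, canonical form +44 followed by those digits); the pattern, the function names and the canonical form are all illustrative assumptions, not Columbo's rules. The example number 07771 123456 is the one used in the benchmark table of this deck.

```python
import re
from typing import List, Optional

# Illustrative only: UK numbers written in varying formats are reduced to one
# canonical form (+44 followed by ten digits). The patterns are assumptions.

# Loose candidate pattern: optional "+" or "(", a digit, then 8 to 18
# digits/spaces/brackets/dots/hyphens, ending in a digit.
CANDIDATE = re.compile(r"[+(]?\d[\d\s().-]{8,18}\d")


def normalise_uk_number(raw: str) -> Optional[str]:
    """Map '07771 123456', '+44 (0)7771 123456', '0044 7771 123456', etc.
    to '+447771123456'; return None if ten national digits do not remain."""
    digits = re.sub(r"\D", "", raw)
    if digits.startswith("0044"):
        digits = digits[4:]
    elif digits.startswith("44") and len(digits) >= 12:
        digits = digits[2:]
    if digits.startswith("0"):  # national trunk prefix, or the "(0)" after +44
        digits = digits[1:]
    return "+44" + digits if len(digits) == 10 else None


def extract_uk_numbers(text: str) -> List[str]:
    """Return distinct canonical numbers in order of first appearance."""
    found: List[str] = []
    for match in CANDIDATE.finditer(text):
        number = normalise_uk_number(match.group())
        if number and number not in found:
            found.append(number)
    return found


if __name__ == "__main__":
    text = "Call 07771 123456 or +44 (0)7771 123456 on 12/03/2010."
    print(extract_uk_numbers(text))
```

Keeping the candidate pattern loose and the validation strict makes it easy to add further formats without rewriting the pattern; dates and other digit runs that reach the normaliser are simply rejected.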
Themes and Links
Themes and Classification
Themes and sub-themes are automatically identified from textual resource information
Various techniques are used for theme deduction
Various techniques are used for image classification / identification
Links
Hard and soft links can be identified or uncovered by interacting with the information within Columbo®
Hard links show direct links between entities, entities and themes, and themes
Soft links (or weak links) can be identified by:
– Analysing the presence/popularity of entities and themes in different resources/devices
– Using Columbo® Semantic Indexing (CSI) to identify varying levels of link strength
– CSI is also used for linking / categorising images
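One simple way to turn 'presence in different resources/devices' into a graded soft-link strength is to score each pair of entities by how much their sets of devices overlap. The sketch below uses Jaccard overlap (devices holding both entities divided by devices holding either); that measure, the function name and the data are assumptions for illustration, and CSI itself is not reproduced.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple

# Illustrative "soft link" scoring: two entities are weakly linked when they
# turn up on the same resources/devices. Jaccard overlap is an assumed measure.


def soft_links(device_entities: Dict[str, Set[str]],
               min_strength: float = 0.0) -> Dict[Tuple[str, str], float]:
    """Score every pair of entities by |devices with both| / |devices with either|."""
    seen_on: Dict[str, Set[str]] = defaultdict(set)   # entity -> devices it appears on
    for device, entities in device_entities.items():
        for entity in entities:
            seen_on[entity].add(device)

    links: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(sorted(seen_on), 2):
        shared = seen_on[a] & seen_on[b]
        if not shared:
            continue                                  # never co-present: no link
        strength = len(shared) / len(seen_on[a] | seen_on[b])
        if strength >= min_strength:
            links[(a, b)] = strength
    return links


if __name__ == "__main__":
    devices = {   # hypothetical extraction results
        "phone_A": {"07771 123456", "Dave", "paint"},
        "phone_B": {"07771 123456", "Dave"},
        "laptop_C": {"paint", "brush"},
    }
    for pair, strength in sorted(soft_links(devices).items(), key=lambda kv: -kv[1]):
        print(pair, round(strength, 2))
```

A hard link (for example a contact entry pairing a name with a number) would be recorded directly at extraction time; scores like these only suggest where an investigator might look next.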
ColumboForensics™ – case study
Diagram: connections between Suspects 1 to 7, with each connection annotated by a count (ranging from X 2 to X 10)
Forensics Process
Diagram: forensic images (E01) from each suspect's devices (Suspect One, Suspect Two, ...) go through image processing, then Gathering, Indexing and Analysis, then Pro-active Comparison Between Suspects. Total time: 3 days
(7 suspects, 22 phones, 37 computers. The existing search-driven approach requires each device to be analysed separately, with an estimated 55 to 75 days for the same case.)
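The 'Pro-active Comparison Between Suspects' step can be pictured as a single pass over the entities already extracted from every device: build an index from each entity to the suspects whose devices contain it, keep the entities held by two or more suspects, and count shared entities per pair of suspects. The sketch below is a hypothetical illustration of that idea under those assumptions; the function names, identifiers and data are invented, and the source does not describe how ColumboForensics™ performs the comparison.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

# Illustrative cross-suspect comparison over already-extracted entities.
# Names and data are hypothetical.


def shared_entities(devices: Iterable[Tuple[str, str, Set[str]]]) -> Dict[str, List[str]]:
    """devices: (suspect, device_id, extracted_entities) triples.
    Returns entity -> sorted suspects, for entities seen for 2+ suspects."""
    holders: Dict[str, Set[str]] = defaultdict(set)
    for suspect, _device_id, entities in devices:
        for entity in entities:
            holders[entity].add(suspect)
    return {e: sorted(s) for e, s in holders.items() if len(s) > 1}


def link_counts(shared: Dict[str, List[str]]) -> Counter:
    """Number of shared entities for each pair of suspects."""
    counts: Counter = Counter()
    for suspects in shared.values():
        for pair in combinations(suspects, 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    devices = [
        ("Suspect 1", "phone-01", {"07771 123456", "paint"}),
        ("Suspect 1", "pc-01", {"brush"}),
        ("Suspect 2", "phone-02", {"07771 123456", "brush"}),
        ("Suspect 3", "phone-03", {"paint", "07771 123456"}),
    ]
    shared = shared_entities(devices)
    print(shared)
    print(link_counts(shared))
```

Because every device contributes to one shared index, adding a suspect or a device extends the comparison rather than requiring a fresh round of per-device searches.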
ColumboForensics™ – case study benchmarks
Task | FTK | ColumboForensics™
List all the documents containing "paint" and "brush" | 10 s | 10 s
Which people are mentioned in documents containing "paint" and "brush" | not possible | < 10 s
Bookmark and extract all relevant content of all documents containing "paint" | approx. 1 day | 20 s
Extract all the sentences mentioning "paint" | approx. 1 day | 20 s
Which telephone numbers are associated with 07771 123456 | not possible | < 10 s
Which names are associated with 07771 123456 | not possible | < 10 s
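The table does not say how the ColumboForensics™ queries are answered. One plausible way to make entity questions such as 'which names are associated with 07771 123456' cheap is to extract entities once and keep an inverted index from each entity to the documents containing it, so that each query becomes a few set operations. The sketch below assumes 'associated' simply means 'appears in the same document', which is a simplification; all names and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Entity = Tuple[str, str]   # (entity type, value), e.g. ("phone", "07771 123456")


def build_index(docs: Dict[str, Set[Entity]]) -> Dict[Entity, Set[str]]:
    """Inverted index: entity -> ids of documents in which it was extracted."""
    index: Dict[Entity, Set[str]] = defaultdict(set)
    for doc_id, entities in docs.items():
        for entity in entities:
            index[entity].add(doc_id)
    return index


def docs_with_all(index: Dict[Entity, Set[str]], *entities: Entity) -> Set[str]:
    """Documents containing every given entity (boolean AND)."""
    postings = [index.get(e, set()) for e in entities]
    return set.intersection(*postings) if postings else set()


def associated(index: Dict[Entity, Set[str]], docs: Dict[str, Set[Entity]],
               target: Entity, wanted_type: str) -> Set[str]:
    """Values of type wanted_type that co-occur in a document with target."""
    found: Set[str] = set()
    for doc_id in index.get(target, set()):
        for etype, value in docs[doc_id]:
            if etype == wanted_type and (etype, value) != target:
                found.add(value)
    return found


if __name__ == "__main__":
    docs = {   # hypothetical extraction output
        "doc1": {("phone", "07771 123456"), ("phone", "07700 900123"),
                 ("person", "Dave"), ("word", "paint")},
        "doc2": {("phone", "07771 123456"), ("person", "Sam")},
        "doc3": {("word", "paint"), ("word", "brush"), ("person", "Alex")},
    }
    index = build_index(docs)
    target = ("phone", "07771 123456")
    print(associated(index, docs, target, "phone"))
    print(associated(index, docs, target, "person"))
    print(docs_with_all(index, ("word", "paint"), ("word", "brush")))
```

The cost moves to indexing time: once the index exists, each question touches only the documents that contain the target entity rather than the whole collection.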
Some other Law Enforcement considerations
All necessary security features including:
Multiple protection levels
Security at document, entity and word level – extensive audit trail options
Can build case / suspect ‘databases’ allowing:
Intra-case analysis
Cross-case analysis
Suspect consolidation whilst retaining case integrity
Secure links between agencies could allow controlled comparison of content
Performant
Quick response times and turnaround offer a real opportunity to change processes
Potential for comprehensive but rapid triage
Summary and Benefits
The Columbo® group of products are powerful, next-generation information discovery applications
Columbo® applications are tailored towards ‘discovery’, as opposed to ‘search’
Search implies that the user already knows what to look for
Discovery allows the data to identify what may be relevant, and allows the user to interact with it in order to find the information contained within it
The software delivers significant efficiency savings, by both rapidly finding relevant data and automating much of the process including reporting
The software enhances effectiveness: it automatically compares content and incrementally builds an intelligence repository
Columbo® is “implementation-lite” and has the capacity to readily link diverse agencies together, sharing and collaborating on critical data as appropriate