Value Mining: How Entity Extraction Informs Analysis June 2012 | Andrew Strite

DESCRIPTION

Learn how to create understanding from big data, and how entity extraction and open analytics create understanding from the deep web.

TRANSCRIPT

Page 1

Value Mining: How Entity Extraction Informs Analysis

June 2012 | Andrew Strite

Page 2

Agenda

• Big Data and Document Analysis
• Case Study: Federal Agency
  – Problem Definition
  – Open Analytics & Entity Extraction
  – Reporting and Visualization
  – Results Assessment
• Questions

Page 3

The Big Data Problem

Data is becoming the new raw material of business: an economic input almost on par with capital and labor.

“Every day I wake up and ask, ‘how can I flow data better, manage data better, analyze data better?’”

Rollin Ford, the CIO of Wal-Mart

Page 4

Solution: Document Analysis

“Document Analysis refers to computer-assisted analysis of large numbers of documents in order to answer questions about the content of a document set.”

Source: http://www.text-tech.com/docanalysis/definition.html

Page 5

Document Analysis

• The goal is to:
  – Extract Entities (people, places, things)
  – Create Associations between entities (in the form of noun-verb-noun), e.g.:
    • John Doe lives in Washington, D.C.
    • John Doe is married to Jane Doe
    • John Doe is a Virgo
    • John Doe traveled to Mexico on July 6th, 2011
• And…
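The noun-verb-noun associations above can be sketched as simple triples. This is a minimal illustration, not the presenter's implementation: the `Association` type, its field names, and the `related_to` helper are hypothetical names chosen for this example.

```python
from typing import List, NamedTuple, Tuple


class Association(NamedTuple):
    """One noun-verb-noun association between two extracted entities."""
    subject: str    # first noun
    verb: str       # relationship linking the two nouns
    obj: str        # second noun
    when: str = ""  # optional date attached to the association


# The four examples from the slide, written as triples.
associations = [
    Association("John Doe", "lives in", "Washington, D.C."),
    Association("John Doe", "is married to", "Jane Doe"),
    Association("John Doe", "is a", "Virgo"),
    Association("John Doe", "traveled to", "Mexico", when="2011-07-06"),
]


def related_to(entity: str, assocs: List[Association]) -> List[Tuple[str, str]]:
    """Return (verb, other entity) for every association that mentions `entity`."""
    pairs = []
    for a in assocs:
        if a.subject == entity:
            pairs.append((a.verb, a.obj))
        elif a.obj == entity:
            pairs.append((a.verb, a.subject))
    return pairs


print(related_to("Jane Doe", associations))
```

Storing the association in both directions of lookup means a query for either entity finds the link, which is the property an analyst needs when starting from any one name.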

Page 6

Document Analysis

• Turn Who, What, When and Where into a unified data structure that supports data analytics and visualization.

Who: people, organizations, facilities, company

What: events, summaries, facts, themes

When: past, present, future dates

Where: city, state, country, coordinate
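One way to picture a unified Who/What/When/Where structure is a single record type per document. The sketch below is an assumption made for illustration: the `DocumentRecord` class and its `missing_dimensions` method are hypothetical and are not taken from Infinit.e.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DocumentRecord:
    """Hypothetical per-document record with the four dimensions from the slide."""
    who: List[str] = field(default_factory=list)    # people, organizations, facilities
    what: List[str] = field(default_factory=list)   # events, summaries, facts, themes
    when: List[str] = field(default_factory=list)   # past, present, future dates
    where: List[str] = field(default_factory=list)  # city, state, country, coordinate

    def missing_dimensions(self) -> List[str]:
        """Names of the dimensions for which nothing was extracted."""
        return [name for name in ("who", "what", "when", "where")
                if not getattr(self, name)]


record = DocumentRecord(
    who=["John Doe"],
    what=["traveled to Mexico"],
    when=["2011-07-06"],
    where=["Mexico"],
)
print(record.missing_dimensions())
```

Because every document ends up in the same shape, downstream analytics and visualization can filter or group on any of the four dimensions without knowing the original file type.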

Page 7

Document Analysis

Case Study: Federal Agency

Page 8

Overview

A Federal client produced reports for other DoD components and wanted to know:

“Did our reports meet customer needs?”

First step: assess historical reporting

“What were teams writing about and when?”

Page 9

Problem: Unstructured Data

• Plenty of raw data, but no way to get at it
  – 6K+ unstructured documents
  – 15+ file types
  – No standard formats
    • Teams (Who)
    • Dates (When)
    • Topics (What)
  – Some content not relevant

Page 10

Early Attempts

• Initial client attempts to solve the problem mostly involved manual review
  – High document volume = labor intensive
  – Assessing relevance = skilled labor
• Total process tied up skilled analysts for hundreds of man-hours.
• Manual review prone to error
  – Incomplete attempts corrupted data

Page 11

Solution: Open Analytics

• Process to design and implement analytical solutions

• Joins open tools and agile engineering techniques

• Goal is to enable organizations to quickly deliver smart analysis and enable top line growth

Page 12

Mechanism: Infinit.e

Infinit.e is a scalable framework for Collecting, Storing, Enriching, Retrieving, Analyzing, and Visualizing unstructured documents & structured records.

Page 13

Infinit.e Concept

80% Unstructured
• Documents
• Presentations
• Spreadsheets
• Meeting notes
• Email
• IM chats
• Reports
• Social

20% Structured
• Log files
• Databases
• Apps

Unstructured and Structured Data
• Entities
• Events
• Facts
• Sentiment
• Geospatial
• Temporal
• Themes

Page 14

Infinit.e Data Model

Tablet ownership levels hit 18% in China, the UK and US versus 3% in November 2010

Bernanke, 57 said in his testimony price increases “have begun to moderate” after a jump in oil costs earlier this year

Duke and Progress announced merger plans in January 2012

<Incident>
  <uid>20101043423</uid>
  <subject>1 person killed in armed attack by suspected Boko Haram in Maiduguri, Borno, Nigeria</subject>
  <multipleDays>No</multipleDays>
  <eventDate>06/04/2011</eventDate>
</Incident>

Who: people, organizations, facilities, company

What: events, summaries, facts, themes

When: past, present, future dates

Where: city, state, country, coordinate
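As a rough illustration of how a structured record such as the Incident XML above could be mapped onto the Who/What/When/Where model, the sketch below uses Python's standard `xml.etree.ElementTree`. The `incident_to_record` function and the output keys are hypothetical, and the `MM/DD/YYYY` date format is an assumption; the slide does not state how `eventDate` is encoded.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# The Incident record shown on the slide.
INCIDENT_XML = """<Incident>
  <uid>20101043423</uid>
  <subject>1 person killed in armed attack by suspected Boko Haram in Maiduguri, Borno, Nigeria</subject>
  <multipleDays>No</multipleDays>
  <eventDate>06/04/2011</eventDate>
</Incident>"""


def incident_to_record(xml_text: str) -> dict:
    """Map one Incident element onto who/what/when/where keys."""
    root = ET.fromstring(xml_text)
    raw_date = root.findtext("eventDate", default="")
    # Assumption: dates are MM/DD/YYYY. Adjust the format if they are DD/MM/YYYY.
    when = (datetime.strptime(raw_date, "%m/%d/%Y").date().isoformat()
            if raw_date else None)
    return {
        "id": root.findtext("uid"),
        "what": root.findtext("subject", default=""),
        "when": when,
        # Who and Where live inside the free-text subject; filling them
        # would need an entity extractor, so they are left empty here.
        "who": [],
        "where": [],
    }


record = incident_to_record(INCIDENT_XML)
print(record["id"], record["when"])
```

The structured fields map directly, while the free-text `subject` still needs entity extraction, which is why structured and unstructured sources end up sharing one model.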

Page 15

Applying Infinit.e

Open Analytic and Agile Intelligence architecture

“What were teams writing about and when?”

Page 16

Harvested Entities

Page 17

Reporting and Visualization

• Queries performed on the data, providing breakouts by team, topic, and dates

• Flexible visualization
  – Built-in visualization framework
  – Multiple export options
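A breakout by team, topic, and date can be pictured as a grouped count over the extracted metadata. The sketch below is illustrative only: the sample documents, the field names, and the choice to bucket dates by month are all assumptions, not details from the case study.

```python
from collections import Counter

# Hypothetical extracted metadata, one dict per document.
documents = [
    {"team": "Team A", "topic": "TOPIC1", "date": "2011-03-14"},
    {"team": "Team A", "topic": "TOPIC2", "date": "2011-03-29"},
    {"team": "Team B", "topic": "TOPIC1", "date": "2011-04-02"},
    {"team": "Team A", "topic": "TOPIC1", "date": "2011-04-18"},
]


def breakout(docs):
    """Count documents per (team, topic, month)."""
    counts = Counter()
    for d in docs:
        month = d["date"][:7]  # "YYYY-MM" prefix of an ISO date string
        counts[(d["team"], d["topic"], month)] += 1
    return counts


counts = breakout(documents)
for key, n in sorted(counts.items()):
    print(key, n)
```

A table like this answers "What were teams writing about and when?" directly and can be exported or fed to a charting tool.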

Page 18

Finding Value

• Over the course of 2.5 weeks, we applied the entity-based data model to our client’s document analysis problem

• Major advantages to this approach were:
  – Agility
  – Precision
  – Relevance

Page 19

Agility

• Automation reduced processing time:
  – Manual processing time: ~480 hours
  – Automated processing time: 2-3 hours
• Speed enabled iterative development
  – Extraction adapted alongside analysts’ understanding of data
  – Positive feedback loop
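The two processing times above imply a speedup of roughly 160x to 240x. A quick check of that arithmetic, using only the figures on the slide (the variable names are just for this example):

```python
manual_hours = 480               # approximate manual processing time
automated_hours_range = (2, 3)   # automated processing time, low and high

# Speedup factor at each end of the automated range.
speedups = [manual_hours / h for h in automated_hours_range]
# Analyst hours freed up at each end of the range.
hours_saved = [manual_hours - h for h in automated_hours_range]

print(speedups)     # [240.0, 160.0]
print(hours_saved)  # [478, 477]
```

A full pass measured in hours rather than weeks is what makes it practical to rerun extraction each time the entity definitions change.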

Page 20

Precision

• Entity definitions created from original data
  – Definitions improved based on feedback

• Automation ensures uniform application across data set

[Figure: entity1, entity2, and entity3 mapped to TOPIC1 and TOPIC2]
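Uniform application of entity definitions can be sketched as one shared table of patterns that every document passes through. Everything here is hypothetical: the `ENTITY_DEFINITIONS` table, the regular expressions, and the particular entity-to-topic mapping are placeholders, since the slide does not show the real definitions.

```python
import re

# Hypothetical definitions: each topic is recognized by one or more entity patterns.
ENTITY_DEFINITIONS = {
    "TOPIC1": [r"\bentity1\b", r"\bentity3\b"],
    "TOPIC2": [r"\bentity2\b"],
}


def tag_topics(text, definitions=ENTITY_DEFINITIONS):
    """Return the topics whose entity patterns appear in `text`."""
    topics = []
    for topic, patterns in definitions.items():
        if any(re.search(p, text, flags=re.IGNORECASE) for p in patterns):
            topics.append(topic)
    return topics


print(tag_topics("Report mentions Entity3 and entity2."))
```

Because the definitions live in one place, feedback from analysts becomes an edit to the table, and rerunning the tagger applies the improved definition to every document in the same way.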

Page 21

Relevance

• Entity extraction informs quality control
  – Duplicates identified based on similar entities
  – Exclude documents based on missing entities
  – Minimizes risk of data corruption
  – Reduced need for analyst review

[Figure labels: Duplicates, Missing Meta-Data]
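The two quality-control checks above can be sketched with set operations over each document's extracted entities. This is an assumption-laden illustration: Jaccard similarity, the 0.8 threshold, the required `team` and `date` fields, and the sample file names are choices made for this example, and the deck does not say which similarity measure was actually used.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two entity sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def quality_control(docs, required=("team", "date"), threshold=0.8):
    """Return (excluded ids, duplicate id pairs) for a dict of documents."""
    # Exclude documents where a required entity was not extracted.
    excluded = [doc_id for doc_id, d in docs.items()
                if any(not d.get(k) for k in required)]
    kept = [doc_id for doc_id in docs if doc_id not in excluded]
    # Flag pairs of remaining documents whose entity sets are nearly identical.
    duplicates = [(x, y) for x, y in combinations(kept, 2)
                  if jaccard(docs[x]["entities"], docs[y]["entities"]) >= threshold]
    return excluded, duplicates


docs = {
    "a.doc": {"team": "Team A", "date": "2011-03-14",
              "entities": {"entity1", "entity2", "entity3"}},
    "a_copy.pdf": {"team": "Team A", "date": "2011-03-14",
                   "entities": {"entity1", "entity2", "entity3"}},
    "b.txt": {"team": "Team B", "date": "2011-04-02", "entities": {"entity1"}},
    "c.msg": {"team": None, "date": "2011-04-02",
              "entities": {"entity2", "entity3"}},
}

excluded, duplicates = quality_control(docs)
print(excluded, duplicates)
```

Comparing entity sets rather than raw bytes lets the same report saved in two file formats be caught as a duplicate.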

Page 22

The Results

• Extracted entities became key meta-data

6K+ unstructured documents became…

…3.5K documents with value to the study

Page 23

The Results

• Our client was able to complete the research shortly after final extraction

• Confidence in methodology and results bolstered the value of recommendations

• The client is considering similar approaches for future projects

Page 24

Bottom Line

Using document analysis significantly…

… reduces the time to ingest data.

… cuts right to relevant information.

… builds a framework for future analysis.

Page 25

Thank You!

Andrew Strite

www.ikanow.com

301.513.1384