Content Analytics for Legacy Data Retention
The Dayhuff Group
The Dayhuff Group has a long history of providing enterprise content management solutions in a wide variety of industries.
• In business since 1997
• IBM Premier Business Partner
• Software ValueNet Partner
• Over 180 projects at 80 companies
• 96% customer satisfaction rating
The information lifecycle governance problem
• 98%: Companies that cite defensible disposal as key result of governance programs
• 22%: Companies that can defensibly dispose today
• $3M: Average cost to collect, cull and review information per legal case [1]
• 70%: Portion of information unnecessarily retained [2]
• 17%: Amount of IT budget spent on storage [3]
• 44x: Projected information growth, 2009-2020 [4]
Watson and IBM ECM Today
• Natural Language Processing (NLP) is the cornerstone for translating interactions between computers and human (natural) languages
– Watson uses IBM Content Analytics to perform critical NLP functions
• Unstructured Information Management Architecture (UIMA) is an open framework for processing text and building analytic solutions
– Several IBM ECM products leverage UIMA text analytics processing:
  • IBM Content Analytics
  • OmniFind Enterprise Edition
  • IBM Classification Module
  • IBM eDiscovery Analyzer
Going from raw information to rapid insight
Aggregate and extract from multiple sources
… to form large text-based collections from multiple internal and external sources (and types), including ECM repositories, structured data, social media and more.
Organize, analyze and visualize
… enterprise content (and data) by identifying trends, patterns, correlations, anomalies and business context from collections.
Search and explore to derive insight
… from collections to confirm what is suspected or uncover something new without being forced to build models or deploy complex systems.
Uncover business insight through unique visual-based approach
Why Content Analytics?
• #1 problem all accounts have: “don’t know what content they have”
• #2 problem: “Uncontrollable Storage Cost”
• Want to discover the value that may exist in their existing information / content resources
• Want to know who, where, when & how to leverage their information / content assets
• Believe they too can demonstrate a three month return on investment
• Non-threatening to IT
• Low Cost investment
– Content Analytics “Try & Buy”
Dynamically analyze to know what you have
Aggregate, correlate, visualize and explore your enterprise information to make rapid decisions about business value, relevance and disposition.
Decommission what’s unnecessary
Cut costs and reduce risk by eliminating obsolete, over-retained, duplicate, and irrelevant content – and the infrastructure that supports it.
Preserve and exploit the content that matters
Collect valued content to manage, trust and govern throughout its lifespan
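Exact duplicates are the easiest of the "unnecessary" categories above to illustrate. The sketch below groups documents by a SHA-256 hash of their bytes. It is an illustration only, not how IBM Content Analytics detects duplicates; the function name `find_duplicates` and the sample data are assumptions.

```python
import hashlib
from collections import defaultdict


def find_duplicates(documents):
    """Group document names by the SHA-256 digest of their content.

    `documents` maps a name to its raw bytes. Only groups with more than
    one name are returned, i.e. exact byte-level duplicates.
    """
    groups = defaultdict(list)
    for name, content in documents.items():
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical in-memory sample standing in for a crawled file share.
sample = {
    "q1_report.doc": b"Quarterly results",
    "copy_of_q1_report.doc": b"Quarterly results",
    "lunch_menu.txt": b"Soup of the day",
}
duplicates = find_duplicates(sample)
```

Hashing only catches byte-identical copies; near-duplicates (for example, the same report saved in two formats) would need content-level comparison.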
[Figure labels: Content Analytics; Content In The Wild; Necessary Information; Unnecessary Information]
LEVEL 5 (Transformational): … policies that use rule-based metadata, advanced contextual classification, and advanced content analytics
How to decommission and preserve content
1. Identify Content Sources to be assessed
2. IT Initial Assessment to decommission irrelevant content
3. LOB Specific Assessments to decommission over-retained and obsolete content … and to collect and classify valued and obligated content.
4. System and Application decommissioning by IT
5. Periodic audits by IT and LOB keep content environments optimized.
[Figure labels: (1) Identify Content Sources; (2) Initial Assessment; (3) Specific Assessments; (4) System & Application Decommissioning; (5) Periodic Audit; Content Collection]
Content Analytics and RIM
Step 1: Identify content sources
Step 2: Exploration
– Examine sources and analyze content
– Records manager uses interface to explore & identify value-based content categories
– Define policies expressing required actions (delete, move, copy, ...) based on categories
Step 3: Archival and Management of content
– IT manager encodes policies in the content collection mechanism
– Content identified by exploration process is collected
– Content collection is executed on an ongoing basis, as prescribed by policies
* Supports Selective Content Decommissioning:
– Operate on a subset of content in the original source
– Identify & extract records across the enterprise
[Figure labels: Legacy ECM; File; SharePoint; File; Content Exploration; Policies; Content Collection w/ Content Classification; Trusted ECM]
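The idea of policies that express required actions based on categories can be sketched as a simple lookup. This is a minimal illustration, not the policy format of any IBM product: the action names (delete, move, copy) come from the slide, while the category names, the `Document` type, the `plan_actions` helper and the "review" fallback are assumptions.

```python
from dataclasses import dataclass

# Hypothetical category-to-action policies. The actions are the ones named
# on the slide; the categories are made up for illustration.
POLICIES = {
    "financial_record": "move",
    "duplicate": "delete",
    "reference_copy": "copy",
}


@dataclass
class Document:
    path: str
    category: str


def plan_actions(documents, policies, default="review"):
    """Return (path, action) pairs; unknown categories fall back to `default`."""
    return [(doc.path, policies.get(doc.category, default)) for doc in documents]


docs = [
    Document("/share/invoice_001.pdf", "financial_record"),
    Document("/share/invoice_001_copy.pdf", "duplicate"),
    Document("/share/notes.txt", "uncategorized"),
]
plan = plan_actions(docs, POLICIES)
```

Sending uncategorized content to manual review rather than deleting it is a design choice made here to keep the sketch conservative.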
Content Analytics and Dynamic ESI Collection
Step 1: Identify content sources in the wild
Step 2: Exploration
– Examines sources and analyzes content
– IT or Legal user explores & identifies content relevant to case
– User determines content to be collected into a case set and invokes collection process
– Collection tool (embedded ICC) copies identified content into ECM evidence repository
Step 3: eDiscovery
– Cull, hold, audit-track, export ESI
– Analytics-driven Early Case Assessment across all relevant evidentiary ESI
[Figure labels: Content Analytics; Content Collection; eDiscovery Tools; Multi-case Evidentiary ECM based; Dynamic on demand reactive collection requests; Policy based collections]
Unlock valuable insight from content
What our clients are doing with Content Analytics
Basic questions to consider regarding content
• Do we know what we have and can we find it?
• Is it properly managed and can we trust it?
• What does it all mean and how can we benefit from it?
Know your content
• Accelerate time to knowledge by providing greater accuracy and more complete business context with enterprise search
• Dynamically analyze what you have, decommission the unnecessary, and preserve the content that matters with content assessment
Trust your content
• Manage and govern content in trusted repositories, not in suspect environments, enabling confidence in your content
• Create and manage 360 degree trusted content views to enrich master data by connecting to enterprise content
Leverage and Exploit your content
• Interactively discover content to derive unexpected business insights and take action with content analytics
• Exploit content analytics insights by enriching BI and predictive analytics as well as tailoring for industry and customer specific scenarios
Content Analytics for Legacy Data Retention
• Addresses Two Objectives:
– Securely Retain Records Requiring Retention
– Defensibly Decommission duplicate and non-business information, and information which has satisfied its retention requirements
• And Delivers One LARGE ROI:
– Storage Cost Savings
– ROI in less than 3 months
– $44m in Storage Savings for one client
How CA for Legacy Data Retention Delivers
... I need to dynamically collect electronically stored information (ESI) by knowing what I have, sorting out the case-relevant information, declaring records and bringing them under hold management, or decommissioning as necessary ...
• Content Assessment enables content-based decision making for:
– Decommissioning for cost savings – selected content or entire sources
– Dynamic collection Records Management for eDiscovery
– Ongoing proactive information governance
– Improving metadata & content organization
– Reduce Storage Cost
Content Analytics Admin
Content Sources
• File Systems
• Content Repositories
• Databases
• Email
• Collaboration
• Web Content
• Web Pages
• Portals
• Content Integration (custom crawlers)
• ...
Content Analytics Admin
Parse and Index Content
• Linguistic understanding of your content
• Industry and business specific dictionaries
• Understanding of Named Entities
– People, Places, Companies
• Integration with Classification Module
– Deep concept analysis
• Annotators specific to industry, business and specific uses
– Record Types
– Industry and company specific concepts
– Business specific concepts such as Employee Names, Products, etc.
– . . .
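A dictionary-based annotator is the simplest of the parsing ideas listed above. The sketch below marks spans of text that match entries in a small business dictionary. It is plain Python for illustration and does not use the UIMA or IBM Content Analytics APIs; the dictionary entries, entity type names and the `annotate` function are assumptions.

```python
import re

# Hypothetical business dictionary: surface form -> entity type.
DICTIONARY = {
    "accounts payable": "RecordType",
    "invoice": "RecordType",
    "dayhuff group": "Company",
}


def annotate(text, dictionary):
    """Return (start, end, surface, entity_type) tuples sorted by position."""
    annotations = []
    for term, entity_type in dictionary.items():
        # Whole-word, case-insensitive match of the dictionary term.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append(
                (match.start(), match.end(), match.group(0), entity_type)
            )
    return sorted(annotations)


text = "Invoice routed to Accounts Payable by the Dayhuff Group."
annotations = annotate(text, DICTIONARY)
```

Each annotation keeps its character offsets, so later steps (faceting, classification) can refer back to the exact span in the source document.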
Content Analytics for Legacy Data Retention - How it works
[Figure: Source Information, internal (ECM, Files, DBMS, etc.) and external (Social, News, etc.), becomes Analyzed Content (and Data), with Automatic Visualization for Interactive Exploration and Assessment]
Example text: “stocks rose Monday on comments from ...”
Parts of speech shown: Adjective, Verb, Noun, Prep Phrase
Meanings shown: Trade, Action, Day, Reason
Extracted Concept: Financial Record to be Declared
Content Analytics for Defensible Decommissioning – How it works
[Figure: Source Information, internal (ECM, Files, DBMS, etc.) and external (Social, News, etc.), becomes Analyzed Content (and Data), with Automatic Visualization for Interactive Exploration and Assessment]
Example text: “stocks on the plants require trimming ...”
Parts of speech shown: Noun, Verb, Noun, Noun
Meanings shown: Element of a plant, Action, Vegetation, Reason
Extracted Concept: Non-record, Decommissionable
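The two "stocks" examples show the same word leading to opposite outcomes depending on context. A toy version of that disambiguation can be sketched with context cue words. This is a deliberately crude stand-in for the linguistic analysis the slides describe, not the product's method; the cue lists, the function name and the "Needs review" fallback are assumptions.

```python
# Hypothetical context cues; a real annotator would rely on linguistic
# analysis rather than fixed word lists.
FINANCIAL_CUES = {"rose", "fell", "shares", "market", "trading"}
BOTANICAL_CUES = {"plants", "trimming", "garden", "leaves", "soil"}


def classify_stocks_sentence(sentence):
    """Classify a sentence mentioning 'stocks' by counting context cues."""
    words = {word.strip(".,").lower() for word in sentence.split()}
    financial = len(words & FINANCIAL_CUES)
    botanical = len(words & BOTANICAL_CUES)
    if financial > botanical:
        return "Financial Record to be Declared"
    if botanical > financial:
        return "Non-record, Decommissionable"
    return "Needs review"  # no cues, or a tie between the two senses


financial_result = classify_stocks_sentence("stocks rose Monday on comments from ...")
botanical_result = classify_stocks_sentence("stocks on the plants require trimming ...")
```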
Classification Process for Legacy Data Retention
Analyze: Collect information & context needed to make an informed decision (declare vs. decommission)
Decide: Assess the collected information and select a category (declare vs. decommission), accurately & repeatably
Take Action: Use the selected category to determine & initiate an appropriate response (declare vs. decommission)
Enforce: Ensure actions are taken consistently & correctly, creating a defensible process
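The four stages (Analyze, Decide, Take Action, Enforce) can be sketched as four small functions with an audit log tying them together. This is an illustrative outline under assumed data shapes, not the solution's implementation; the document dictionaries, field names and function names are hypothetical.

```python
def analyze(doc):
    """Analyze: collect the information and context needed for a decision."""
    return {"id": doc["id"], "is_record": doc["record_type"] is not None}


def decide(context):
    """Decide: select a category (declare vs. decommission) from the context."""
    return "declare" if context["is_record"] else "decommission"


def take_action(doc, category, audit_log):
    """Take Action: initiate the response and log it for later verification."""
    audit_log.append({"id": doc["id"], "action": category})


def enforce(docs, audit_log):
    """Enforce: confirm every document received exactly one logged action."""
    logged_ids = sorted(entry["id"] for entry in audit_log)
    return logged_ids == sorted(doc["id"] for doc in docs)


# Hypothetical input: one document with a recognized record type, one without.
docs = [
    {"id": "A1", "record_type": "tax"},
    {"id": "B2", "record_type": None},
]
audit_log = []
for doc in docs:
    take_action(doc, decide(analyze(doc)), audit_log)
process_ok = enforce(docs, audit_log)
```

Keeping the audit log separate from the decision logic is what allows the process to be checked after the fact, which is the point of the Enforce stage.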
Content Analytics for Legacy Data Retention
Solution Overview
[Figure labels: IBM ECM; Content Analytics; Content Collectors; Classification Module; Enterprise Records; Electronic Discovery]
Demonstration of Content Analytics for Legacy Data Retention
Content Analytics for Legacy Data Retention is a solution to inventory and locate legacy data requiring retention or disposition
The Dayhuff Group has created this solution using IBM Content Analytics tools to perform the heavy lifting of mining legacy information
It allows data retention analysts to view and analyze their source content based on familiar concepts, such as how the content fits into record series within their file plan
Facets – Record Types
This view exposes facets which categorize the content based on the business’s record types within record series
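A facet view of this kind amounts to counting documents per record type and drilling down into one value. The sketch below shows that idea with plain Python; it does not reproduce the Content Analytics interface, and the document list and the `drill_down` helper are assumptions (the record type names are taken from the file plan shown later in the deck).

```python
from collections import Counter

# Hypothetical analyzed documents, each tagged with a record type.
documents = [
    {"name": "inv_001.pdf", "record_type": "accounts payable"},
    {"name": "inv_002.pdf", "record_type": "accounts payable"},
    {"name": "w2_2009.pdf", "record_type": "payroll"},
]

# Facet counts: how many documents fall under each record type.
facet_counts = Counter(doc["record_type"] for doc in documents)


def drill_down(documents, record_type):
    """Return the names of documents under one facet value."""
    return [doc["name"] for doc in documents if doc["record_type"] == record_type]


payroll_docs = drill_down(documents, "payroll")
```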
Select documents to Decommission or Declare
The analyst can graphically see content that is past its retention period and available for decommissioning
Flag as Past Retention period
Documents are flagged based on their retention requirements identified using Content Analytics and the Record Type facets, then exported to Content Collector
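Flagging by retention requirement can be sketched as a date comparison against a retention schedule keyed by record type. The retention periods below are invented for illustration and are not legal or regulatory guidance; the schedule, the function name and the choice to leave unknown record types unflagged are all assumptions.

```python
from datetime import date

# Hypothetical retention periods in years per record type (not real guidance).
RETENTION_YEARS = {"payroll": 7, "accounts payable": 6}


def is_past_retention(record_type, created, today, schedule):
    """Return True when `today` is after the end of the retention period."""
    years = schedule.get(record_type)
    if years is None:
        return False  # unknown record types are never flagged automatically
    try:
        expiry = created.replace(year=created.year + years)
    except ValueError:
        # Created on Feb 29 and the expiry year is not a leap year.
        expiry = created.replace(year=created.year + years, day=28)
    return today > expiry


old_payroll = is_past_retention("payroll", date(2000, 1, 15), date(2010, 1, 1), RETENTION_YEARS)
recent_payroll = is_past_retention("payroll", date(2005, 1, 15), date(2010, 1, 1), RETENTION_YEARS)
```

Documents flagged this way would then be candidates for export to the collection step; documents with no known record type stay out of the automatic path.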
Collect to IER
Documents are decommissioned based on analysis results in Content Analytics or archived to Enterprise Records
[Figure labels: Content Collector; Decommission Unnecessary Content; Records Retention Programs & Policies; ECM; IBM Enterprise Records]
Finance - 2000: accounts payable, treasury, budget & forecast, payroll, cash management, financial reporting, tax, accounts receivable, customs, general accounting, insurance, cost accounting
Sales & Marketing - 7000: sales, marketing communication, dealer support, product management, market research
Service - 4000: warranty administration, customer service, quality assurance
Legal - 5000
Documents are declared as records in Enterprise Records matching Record Type Facets analyzed in Content Analytics
Content Analytics for Legacy Data Retention
Benefits:
• Reduced risk
• Increased productivity
• Quantifiable and measurable results
• Reduced cost
• Defensible process for evaluation
• Reduction of information through disposition
• Reduction of duplicated and old information