Draining the Swamp How to Plan and Practice Defensible Disposition Richard Medina, Doculabs January Greater Chattanooga Area Chapter ARMA Meeting January 13, 2015


Page 1

Draining the Swamp: How to Plan and Practice Defensible Disposition

Richard Medina, Doculabs
January Greater Chattanooga Area Chapter ARMA Meeting
January 13, 2015

Page 2: Escape Now While You Can

© Doculabs, Inc. 2014

How to Plan and Practice Defensible Disposition

• This session explains how to tackle the monster problem of over-retention of electronic information. Most organizations hoard their piles of files and fail to dispose of them in a legally defensible manner, even when business and law allow it. The session shows how to develop and execute the four most important steps in defensible disposition: the Defensible Disposition Policy, Assessment Plan, Technology Plan, and Disposition Plan. It also outlines business case development and tool selection.

Takeaways:
1. Learn how to develop and execute the four steps in defensible disposition: the Defensible Disposition Policy, Assessment Plan, Technology Plan, and Disposition Plan.
2. Learn which types of tools and technologies to use to analyze, sort, retain, and defensibly dispose of your information.
3. Learn how to develop a rigorous business case for defensible disposition.

Page 3: Doculabs

Doculabs is a strategy consulting firm. Our clients rely on us to help them improve the way they manage information. We provide services such as developing strategic roadmaps and business cases, program management, and content migration assistance. Our consultants are experts in helping clients manage content such as Office documents, web content, email, customer communications, and records to improve operations, lower costs, increase revenue, and reduce risk.

Differentiators
• 20+ years of information management experience
• Objective recommendations
• Empirical data from over 1,200 engagements
• More than 550 customers in financial services, insurance, energy, manufacturing, and life sciences

Page 4: Richard Medina

• Co-Founder and a Principal Consultant at Doculabs.
• In my 20+ years with Doculabs, I’ve consulted for organizations in a wide range of industries, including financial services, insurance, communications, utilities, and government.
• 312-953-9983
• [email protected]
• Blog: richardmedinadoculabs.com
• LinkedIn, Twitter
• www.doculabs.com

Page 5: Issues

1. The problem – The sky is falling again
2. Break it into two problems – Day-forward versus historical content
3. How to address historical content – A defensible disposition methodology
4. Analysis and classification technology – Should you use it? Does it work?
5. Content assessment and disposition process – Approaches and results

Page 6: Map of the Territory

ENTERPRISE INFORMATION MANAGEMENT (EIM): How the organization uses all information assets to achieve business goals

• ENTERPRISE CONTENT MANAGEMENT (ECM): How the organization uses its unstructured content (including documents and collaborative/social content) to achieve its business goals
• ENTERPRISE DATA MANAGEMENT: How the organization uses structured data (in databases) to achieve its business goals

INFORMATION GOVERNANCE
• RECORDS MANAGEMENT: How the organization manages its information to ensure compliance with recordkeeping laws and regulations
• E-DISCOVERY: How the organization finds, preserves, and produces information as needed in response to litigation, investigations, or other discovery requests
• INFORMATION SECURITY AND PROTECTION: How the organization manages its information to ensure compliance with privacy and security laws and regulations and to protect against loss or misuse

Page 7: What’s the Scope of Information Governance?

• Information governance is the control of information to meet your legal, regulatory, and business requirements. (Robert Smallwood)
  – A great start, because it's accurate and simple: it avoids the trap of being a laundry list written in legalese.
• Information governance is the control of information to meet your legal, regulatory, and business risk requirements.
  – IG doesn't address all your business demands. Its primary focus is on "defensive" business requirements as opposed to "offensive" business requirements.
  – IG’s primary focus should be on controlling the risks and costs (primarily risk-related costs) of your information.

Page 8: Three Big IG Challenges for 2015

1. The digital landfill problem.
   – 50, 100, or 1K TBs (or 10K PBs) of files all over the place in your various systems
   – How do you sort through it and appropriately retain or dispose of it within your budget constraints?
2. The “systems of engagement” fragmentation problem.
   – How do you do IG on your dynamic, sometimes chaotic “systems of engagement” (SOE)? They use social media, mobile devices, and the cloud.
   – Your problem has three parts:
     1. How do you meet your IG demands in your internal use of SOE for collaboration, interactive community building, etc.?
     2. How do you meet your IG demands in your use of external SOE beyond the firewall, with customers, vendors, and the public?
     3. How do you meet your IG demands as you integrate your evolving SOE into your more mature systems of record, which help run your core line-of-business processes?
3. The discovery problem.
   – How do you prepare for and respond to regulatory audits, litigation, and other discovery, given #1 and #2 above?

Page 9: Issues (agenda repeated from page 5)

Page 10: The Problem is Over-Retention

Organizations have been over-retaining electronic information and failing to dispose of it in a legally defensible manner when business and law allow.

• Retaining everything forever
• Disposing of everything immediately
• Having employees make classification decisions
• Having technology make classification decisions
• Hybrid with technology and people

Page 11: Why Over-Retention is the Problem

• Organizations keep non-required electronic content forever because:
  1. Classifying content (to determine what to keep and what to purge) is manual and expensive
  2. Content worth preserving is mixed with content that should be purged
  3. Legal (and others) are afraid of wrongfully deleting materials (spoliation)
  4. Additional storage is inexpensive, which makes it easy for corporations to buy more storage and defer addressing the problem

Page 12: Issues (agenda repeated from page 5)

Page 13: Recommendations for Day-Forward

• Day-forward information lifecycle management (ILM) is much easier to address than historical content
  – Even though addressing it messes with employees’ day-to-day business activities
• Day-forward: Initiate ILM practices on a “day-forward” basis first, so that any new content created or saved is assigned a disposition period
  – Disposition horizons should begin to influence where content is stored (as users discover that materials saved in the “wrong” system will be purged)
• Guidance: Provide employees with explicit guidance on the acceptable use of the available tools for dynamic content and their associated retention periods
  – For example, retain non-records for 3 years, and retain official records per the retention schedule
• Historical: For historical content, analyze the feasibility of content analytics and autoclassification
  – Recognize that cleaning up TBs of content can take years. So conduct the analysis in 2014, begin the cleanup effort in earnest by 2015, and eliminate a large portion of dated content by 2018

Page 14: Guidance Example for Day-Forward

System/Repository: Recommended Retention Period

Personal Network Drives (“P” drives)
• Provide each user with personal drive space of a limited size for their storage, for as long as the user is employed

Shared Network Drives (“G” drives)
• Make them read-only (which means no network storage for collaboration; content will have to go into an ECM system)
• Exceptions include applications or systems that need to use network storage

ECM System
1. Default for non-records: retained for 3 years
2. Default for non-records that have long-term value: retained for 7 years
3. Official records: retained per the retention schedule

Social Community Sites
• No documents stored in communities (only links to documents in the ECM system)
• Consider retention periods for non-document content (e.g., 3 years)
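The ECM defaults above amount to a small rule table. The following is a minimal Python sketch built on two assumptions. First, the retention clock starts on the date content is saved; the slide doesn't say what triggers it. Second, the caller supplies the retention-schedule period for official records. `ContentClass`, `ECM_DEFAULT_YEARS`, and `disposition_date` are hypothetical names.

```python
from datetime import date
from enum import Enum
from typing import Optional


class ContentClass(Enum):
    NON_RECORD = "non-record"
    NON_RECORD_LONG_TERM = "non-record with long-term value"
    OFFICIAL_RECORD = "official record"


# Default ECM retention in years, from the guidance example.
# Official records have no default: they follow the retention schedule.
ECM_DEFAULT_YEARS = {
    ContentClass.NON_RECORD: 3,
    ContentClass.NON_RECORD_LONG_TERM: 7,
}


def add_years(d: date, years: int) -> date:
    """Add whole years, moving Feb 29 to Feb 28 when the target year is not a leap year."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def disposition_date(saved_on: date, cls: ContentClass,
                     schedule_years: Optional[int] = None) -> date:
    """Assumption: the retention clock starts on the day the content is saved."""
    if cls is ContentClass.OFFICIAL_RECORD:
        if schedule_years is None:
            raise ValueError("official records need a period from the retention schedule")
        return add_years(saved_on, schedule_years)
    return add_years(saved_on, ECM_DEFAULT_YEARS[cls])


if __name__ == "__main__":
    print(disposition_date(date(2015, 1, 13), ContentClass.NON_RECORD))  # 2018-01-13
```

The periods live in a table rather than in branching code, so a default can be changed in one place as the DD Policy is refined.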

Page 15: Issues (agenda repeated from page 5)

Page 16: The DD Methodology in a Nutshell

• You must satisfy 4 demands:
  1. Regulatory retention requirements
  2. Hold retention requirements
  3. Business retention requirements
  4. Cost impact of anything you do
• What you do has impact:
  1. What you do
  2. Effects of what you do
• You can do 2 things:
  1. Sort
  2. Dispose
• Your mission, stated two ways:
  – Your mission is to satisfy your retention demands (1-3) while minimizing bad cost impact to yourself (4)
  – Your mission is to maximize good cost impact (4) while satisfying your retention requirements (1-3)

Page 17: It’s Based on Reasonableness

• To determine what “satisfy your retention demands” really means for you, use the Principle of Reasonableness and act in good faith.
  – “Courts do not ask, expect or necessarily reward organizations for perfection. Courts do expect, however, that whatever information management tactics an organization undertakes are appropriate to how that particular entity is situated (size, financial resources, regulatory and litigation profile, etc.).” (Jim McGann and Julie Colgan, “Implement a defensible deletion strategy to manage risk and control costs”, Inside Counsel)

Page 18: Your DD Methodology Has 4 Parts

1. Defensible Disposition Policy
   – It’s your design specification, your business rules for DD, your decision tree
   – It specifies very clearly the objectives that your methodology will fulfill. It states clearly what you mean by your retention requirements, and what you mean by reasonable costs when you are trying to fulfill them.
2. Technology Approach
   – For sorting and disposing
   – You must use technology; it’s not an option
3. Assessment (Sorting) Plan
   – What information and systems you’re assessing
   – Your processing rules (decision plan)
   – It will be flexible
4. Disposition Plan
   – Evaluate your assessment results using your DD Policy
   – Dispose (which ranges from keeping forever to deleting right now, with many options in between)
   – Refine your DD Policy (1) and continue as needed

Page 19: Sidebar: A Simple Set of Rules

Page 20: A Simple Set of Rules

Page 21: A Simple Set of Rules

Page 22: But Even Simple Rules Need Clarification

1. What’s a Legal Hold?
2. What are Records versus Non-Records?
3. Which Non-Records are still important for business purposes?
4. What about Non-Records that are not business-related?
5. Where do documents under Legal Hold fit? Are they Records, Non-Records, or what?

Page 23: But Even Simple Rules Need Clarification

[Diagram: Risk vs. Manageability, showing Likely Discoverable Information, Declared Records, and Other Business-related Information (OBRI)]

• Think about your ESI (electronically stored information) in terms of its Risk, Value, and Manageability.
• For simplicity, let’s just use Risk and Manageability.

Page 24: What is the Scope of Records Management?

[Diagram: Risk vs. Manageability, showing Electronically Stored Information (ESI)]

• For simplicity, let’s just use Risk and Manageability.

Page 25: What is the Scope of Records Management?

[Diagram: Risk vs. Manageability, showing Electronically Stored Information (ESI) and Likely Discoverable Information]

• One major source of risk for ESI is its “Likely Discoverability”.
• While all ESI is perhaps “discoverable”, we can prioritize the ESI that is more likely to be discovered and more harmful if it is.

Page 26: What is the Scope of Records Management?

[Diagram: Risk vs. Manageability, showing Electronically Stored Information (ESI), Likely Discoverable Information (LDI), and Declared Records]

• Your RM program probably declares only a subset of your LDI and ESI as records. These are your most valuable, risky, and manageable electronic documents.

Page 27: What is the Scope of Records Management?

[Diagram: Risk vs. Manageability, showing Physical Documents and Electronically Stored Information (ESI), Likely Discoverable Information, Declared Records, Non-business-related Information (NBRI), and Other Business-related Information (OBRI)]

1. But most of your content and documents are non-records, and they range from very low to very high risk and value.
2. Most of the ESI on your shared drives, hard drives, and in email is OBRI.
3. Some is NBRI.
4. It’s a mess.

Page 28: Two Extreme Approaches to RM

[Two diagrams of Risk vs. Manageability, each showing Electronically Stored Information (ESI), Likely Discoverable Information, Declared Records, Other Business-related Information (OBRI), and Non-business Information (NBI). First panel: “Too Narrow”. Second panel: “Too Wide”.]

Page 29: Use a Tiered Approach

• A much more effective approach is to divide your ESI into three “Tiers”.
• Tier 1 denotes your declared records, specified by a Records Retention Schedule.
• Tier 2 denotes the OBRI that is important to retain for business reasons.
• Tier 3 denotes the OBRI that is not important to retain for business reasons. It also denotes NBRI, which, by definition, is not important to retain for business reasons.

[Diagram: Risk vs. Manageability, showing ESI, Likely Discoverable Information, Declared Records, OBRI, and Non-business Information (NBI), divided into Tier 1, Tier 2, and Tier 3]

Page 30: “Treat them Differently”

• Tiered Approach
  – Different types of physical documents and ESI are handled differently:
    1. Keep as records
    2. Keep as non-records, but move to a rigorous ECM/RIM system
    3. Keep on (better managed) shared drives
    4. Don't worry about them; they aren't worth it. Keep or dispose of them according to general rules

[Diagram: the tiered Risk vs. Manageability diagram (Tier 1, Tier 2, Tier 3), with the handling options 1 to 4 marked on it]
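The tier definitions can be read as a short decision procedure. This sketch is illustrative only and rests on two assumptions. The first is the mapping of tiers onto handling options 1, 2, and 4; the slide doesn't say whether Tier 2 goes to option 2 or option 3. The second is the rule that a legal hold overrides every tier. `Item`, `assign_tier`, and `handling` are hypothetical names.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    TIER_1 = 1  # declared records, governed by the Records Retention Schedule
    TIER_2 = 2  # OBRI that is important to retain for business reasons
    TIER_3 = 3  # remaining OBRI, plus all NBRI


@dataclass
class Item:
    declared_record: bool
    business_related: bool
    business_value: bool  # important to retain for business reasons?
    on_legal_hold: bool = False


def assign_tier(item: Item) -> Tier:
    if item.declared_record:
        return Tier.TIER_1
    if item.business_related and item.business_value:
        return Tier.TIER_2
    # OBRI without business value, or NBRI (by definition not worth retaining)
    return Tier.TIER_3


# Assumed tier-to-handling mapping; an organization could send Tier 2
# to option 3 (better-managed shared drives) instead.
HANDLING = {
    Tier.TIER_1: "1. Keep as records",
    Tier.TIER_2: "2. Keep as non-records in ECM/RIM system",
    Tier.TIER_3: "4. Keep or dispose according to general rules",
}


def handling(item: Item) -> str:
    # Assumption: a legal hold suspends disposition whatever the tier.
    if item.on_legal_hold:
        return "Preserve under legal hold"
    return HANDLING[assign_tier(item)]


if __name__ == "__main__":
    # OBRI with no business value falls to Tier 3
    print(handling(Item(declared_record=False, business_related=True, business_value=False)))
```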

Page 31: Now This Tree Makes Sense

Page 32: Issues (agenda repeated from page 5)

Page 33: There’s an Awesome Business Case

• 50 TB = ~200 million documents (average of 250 KB per document)
• The following comparison illustrates the time and effort required to classify 200 million documents:

Manual classification
   Classification rate: 10 seconds per document
   Pricing: $35/hr
   Total cost to classify: $20 million
Auto-classification (95% machine-classified and 5% human-classified, via offshore labor)
   Classification rate: Less than 1 second per document
   Pricing: $0.005 per document for machine processing and $5/hr for documents that require manual classification
   Total cost to classify: $2 million

• … if the technology works
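The comparison's arithmetic can be checked with a few lines of Python. The sketch makes three assumptions: decimal units (1 TB = 10^12 bytes), a machine pass over every document, and 10 seconds per document for the 5% routed to people, the same rate as the manual row. The function names are hypothetical.

```python
def document_count(corpus_bytes: int, avg_doc_bytes: int) -> int:
    return corpus_bytes // avg_doc_bytes


def manual_cost(docs: int, seconds_per_doc: float, rate_per_hour: float) -> float:
    """Labor cost when people classify `docs` documents."""
    return docs * seconds_per_doc / 3600 * rate_per_hour


def auto_cost(docs: int, machine_price_per_doc: float, human_share: float,
              seconds_per_manual_doc: float, rate_per_hour: float) -> float:
    """Assumes the machine processes every document and people re-handle `human_share` of them."""
    machine = docs * machine_price_per_doc
    human = manual_cost(round(docs * human_share), seconds_per_manual_doc, rate_per_hour)
    return machine + human


if __name__ == "__main__":
    docs = document_count(50 * 10**12, 250 * 10**3)  # 50 TB at 250 KB per document
    print(f"documents:   {docs:,}")
    print(f"manual cost: ${manual_cost(docs, 10, 35):,.0f}")
    print(f"auto cost:   ${auto_cost(docs, 0.005, 0.05, 10, 5):,.0f}")
```

Under these assumptions the manual row comes to about $19.4 million, which matches the slide's rounded $20 million. The automated components listed come to only about $1.1 million. The slide doesn't itemize its $2 million figure, so it presumably includes rounding or costs the comparison doesn't show.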

Page 34: Analysis and Classification Technologies

• Many different kinds of technology vendors are addressing analysis, classification, and disposition
  – File analytics, content analytics, content classification, ECM, e-discovery, search, capture, DLP, storage management
  – Products, hosted solutions, service providers
  – Nuix, IBM/StoredIQ, HP/Autonomy, EMC/Kazeon, SAS, Kofax, Equivio, Rational Enterprise, Recommind, Index Engines, and others
• Most have a sweet spot where they will succeed (and deliver ROI)
  – But it’s highly dependent on 8 or so factors
  – E.g., your business purposes, your ECM environment, your “information architecture”, your document types and their complexity and volume, the value and risk of the documents, your success criteria, etc.

Page 35: Sidebar: How They Work

Before: <server XXX, drive G:> Forecast summary_121008.doc
After:
   Record = no
   Age = 2.5 years
   Document type = departmental forecast
   Keywords = forecast, 2008, draft
   Status = delete
   Confidence = 9.2 (out of 10)

1. Analyze the content and review the retention schedule
2. Establish classification rules and train the systems with examples
3. Crawlers and recognition engines evaluate the content and generate a classification
4. For content where a high machine confidence factor exists, content is automatically tagged and then staged for migration to the appropriate system for retention or disposition
5. For content with low confidence factors, documents are routed to clerical staff (onshore or offshore) for manual classification
6. The results of the manual identification are fed back into the automated algorithms to “teach” the systems better classification

Client validation: Throughout the process, results and samples are routed to records management and legal professionals within the firm for validation and confirmation.
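Steps 4 to 6 boil down to routing on a confidence score. The following is a minimal sketch. The 8.0 threshold and the class and method names are hypothetical, and the sketch leaves out the classifier that produces the scores (steps 2 and 3).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Classification:
    doc_id: str
    label: str         # e.g. "departmental forecast"
    confidence: float  # 0-10 scale, as in the slide's example


@dataclass
class Router:
    threshold: float = 8.0  # hypothetical cut-off, to be tuned during client validation
    staged: List[Classification] = field(default_factory=list)
    manual_queue: List[Classification] = field(default_factory=list)
    training_examples: List[Tuple[str, str]] = field(default_factory=list)

    def route(self, result: Classification) -> str:
        """Steps 4 and 5: auto-tag confident results, queue the rest for people."""
        if result.confidence >= self.threshold:
            self.staged.append(result)
            return "staged"
        self.manual_queue.append(result)
        return "manual"

    def record_manual_label(self, doc_id: str, label: str) -> None:
        """Step 6: keep human decisions as training examples for the classifier."""
        self.training_examples.append((doc_id, label))


if __name__ == "__main__":
    router = Router()
    print(router.route(Classification("Forecast summary_121008.doc", "departmental forecast", 9.2)))  # staged
    print(router.route(Classification("notes.doc", "unknown", 4.0)))  # manual
```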

Page 36: Issues (agenda repeated from page 5)

Page 37: Content Assessment Approaches

• There are three categories of attributes that can be used to determine what a file is:
  1. Environmental attributes around the file (e.g., file location, ownership)
  2. File attributes about the file (e.g., file type, age, author)
  3. Content attributes within the file (e.g., keywords, character strings, word proximity, word density)
• Various techniques and technologies, along with business rules, can be used to determine what a file is and whether it is eligible for disposition
  – E.g., a DOC file created over 5 years ago and not accessed for a year may be purged
  – This type of purging could be done after giving users adequate notice (“move it or lose it”, or “hold” for 90 days, then delete)
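The example rule and the notice period translate directly into code. The following is a minimal sketch. It assumes a file analytics crawl has already supplied each file's creation and last-access dates, and it approximates 5 years as 5 × 365 days. `FileInfo` and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from pathlib import PureWindowsPath

NOTICE_PERIOD = timedelta(days=90)


@dataclass
class FileInfo:
    path: str
    created: date
    last_accessed: date


def eligible_for_purge(f: FileInfo, today: date) -> bool:
    """Example rule: a .doc file created over 5 years ago and not accessed for a year."""
    is_doc = PureWindowsPath(f.path).suffix.lower() == ".doc"
    old = f.created < today - timedelta(days=5 * 365)  # ignores leap days
    idle = f.last_accessed < today - timedelta(days=365)
    return is_doc and old and idle


def earliest_deletion(notice_sent: date) -> date:
    """'Move it or lose it': delete no sooner than 90 days after users are notified."""
    return notice_sent + NOTICE_PERIOD


if __name__ == "__main__":
    f = FileInfo("G:/accounting/report.doc", date(2008, 3, 1), date(2013, 6, 1))
    today = date(2015, 1, 13)
    if eligible_for_purge(f, today):
        print("notify owner; delete on or after", earliest_deletion(today))
```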

Page 38: Content Assessment: Environmental Attributes

Environmental Attributes (around a file)
Attribute / Evaluation Technique (Tool(s) Used) / Examples / How Used

1. Ownership
   – Access controls (Content Analytics, Data Loss Prevention, Storage Management). Example: permissions within LDAP list people and infer department or function. How used: large collections of files can be assessed en masse based on access controls
2. Location
   – File path (Content Analytics, Data Loss Prevention, Storage Management). Example: G:/accounting/july2004/temp. How used: stranded and orphaned locations are often easily eliminated
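Location and ownership checks can run over file paths and owner names alone, before any file is opened. The sketch makes two assumptions. In practice the set of active accounts would come from your directory (LDAP); here it is passed in as a plain collection. The folder names treated as "stranded" are hypothetical examples.

```python
from pathlib import PureWindowsPath
from typing import Dict, Iterable, List

# Hypothetical folder names treated as signs of a stranded location.
STRANDED_FOLDER_NAMES = {"temp", "tmp"}


def is_stranded(path: str) -> bool:
    """Flag paths that pass through a folder such as 'temp' (e.g. G:/accounting/july2004/temp)."""
    p = PureWindowsPath(path)
    parts = p.parts[1:] if p.anchor else p.parts  # skip the drive root, e.g. 'G:\\'
    return any(part.lower() in STRANDED_FOLDER_NAMES for part in parts)


def orphaned_paths(owner_by_path: Dict[str, str], active_accounts: Iterable[str]) -> List[str]:
    """Paths whose owner is not an active account, e.g. someone who has left."""
    active = {a.lower() for a in active_accounts}
    return sorted(p for p, owner in owner_by_path.items() if owner.lower() not in active)


if __name__ == "__main__":
    print(is_stranded("G:/accounting/july2004/temp"))  # True
```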

Page 39: Content Assessment: File Attributes

File Attributes (about a file)
Attribute / Evaluation Technique (Tool(s) Used) / Examples / How Used

3. Duplicate
   – Hash algorithm (Content Analytics). Example: exact duplicates. How used: exact duplicates can be easily eliminated
   – Block read (Content Analytics). Example: near duplicates. How used: near duplicates must be assessed in the context of other attributes
4. File Type
   – Extension or MIME type (Content Analytics). Examples: .TMP, .MP3. How used: to identify file types that should not exist in a corporate setting
5. Metadata
   – Properties (Content Analytics). Examples: age, author, security profile (Confidential). How used: to identify old materials and materials authored by individuals who have left the organization. Typically, these attributes must be combined with other attributes via a rule to take action
6. File Name
   – Character strings (Content Analytics). How used: use filename properties to determine type
     – GL-USDIST31_093098.xls: determine whether a file was system-generated or human-generated
     – FORMUB92_SMITH: documents that are based on a specific form number can easily be identified
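Exact-duplicate detection by hashing and file-type screening by extension need only the Python standard library. The sketch makes two assumptions. It uses SHA-256 as the hash, where the slide just says "hash algorithm". It treats the slide's .TMP and .MP3 as the only unwanted types. Near-duplicate detection (block reads) is a harder problem, and the sketch doesn't attempt it.

```python
import hashlib
import sys
from collections import defaultdict
from pathlib import PureWindowsPath
from typing import Dict, List

# Example unwanted types from the slide; extend according to your DD Policy.
UNWANTED_EXTENSIONS = {".tmp", ".mp3"}


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def exact_duplicate_groups(digests: Dict[str, str]) -> List[List[str]]:
    """Given path -> digest, return each set of two or more identical files."""
    by_digest: Dict[str, List[str]] = defaultdict(list)
    for path, digest in digests.items():
        by_digest[digest].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


def is_unwanted_type(path: str) -> bool:
    return PureWindowsPath(path).suffix.lower() in UNWANTED_EXTENSIONS


if __name__ == "__main__":
    digests = {p: sha256_of_file(p) for p in sys.argv[1:]}
    for group in exact_duplicate_groups(digests):
        print("exact duplicates:", group)
    for p in sys.argv[1:]:
        if is_unwanted_type(p):
            print("unwanted file type:", p)
```

Identical content always produces identical digests, so grouping by digest finds exact duplicates without comparing files pairwise.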

Page 40: Content Assessment: Content Attributes

Content Attributes (within a file)
Attribute / Evaluation Technique (Tool(s) Used) / Examples / How Used

7. Key Word
   – Character strings (Content Analytics; Classification Module). Examples: “Enron”, “Guarantee”. How used: to determine if a document is on hold, via a word list per the hold request
8. Character or Word Patterns
   – “Classification” (pattern matching) (Classification Module). Example: word proximity. How used: to determine the category in which a document may fit
   – Classification Module. Example: word frequency
   – Content Analytics; Classification Module. Example: “Privileged”. How used: identification of PII
   – Content Analytics; DLP. Examples: SS#, credit card #. How used: Regular Expression (RegEx) lists; determined entities for hold, security, IP, PHI, PII, DLP
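Hold keyword lists and regular-expression entity lists can be sketched the same way. The patterns below are simplified illustrations, not the rule sets a DLP product ships with:

- U.S. SSNs in ddd-dd-dddd form.
- Card numbers of 13 to 16 digits, filtered with the Luhn checksum.

`scan` and the other function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 16 digits, optionally separated by single spaces or hyphens
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_ok(number: str) -> bool:
    """Luhn checksum: double every second digit, counting from the right."""
    digits = [int(c) for c in number if c.isdigit()]
    parity = len(digits) % 2
    total = 0
    for i, d in enumerate(digits):
        if i % 2 == parity:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def hold_terms_found(text: str, hold_terms: Iterable[str]) -> List[str]:
    """Terms from a hold request's word list that appear as whole words (case-insensitive)."""
    return [t for t in hold_terms
            if re.search(r"\b" + re.escape(t) + r"\b", text, re.IGNORECASE)]


def scan(text: str, hold_terms: Iterable[str]) -> Dict[str, List[str]]:
    return {
        "hold_terms": hold_terms_found(text, hold_terms),
        "ssn": SSN_RE.findall(text),
        "card_numbers": [m for m in CARD_RE.findall(text) if luhn_ok(m)],
    }


if __name__ == "__main__":
    sample = "Per the Enron guarantee, card 4111 1111 1111 1111 and SSN 123-45-6789."
    print(scan(sample, ["Enron", "Guarantee"]))
```

The Luhn filter discards most random digit runs that happen to match the card pattern, which keeps false positives down.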

Page 41: Content Analytics: Assessment Results

Preservation findings
• Unnecessary file types (executables, non-business pictures, movies, etc.): 13 to 15%
• Duplicates: 15 to 20%
• Near duplicates: 9 to 30%

Risk findings
• Files with PII: 10 to 16%
• Files with sample keywords: 3 to 5%

Operational findings
• Files 10 years or older: 7 to 11%
• Files accessed within the last 18 months: 25 to 35%

Findings are not mutually exclusive (e.g., a duplicate file could also be aged).

Page 42: Summary

Findings (Technique / Status / % of Total / Total)
• Analytics / Unnecessary / 20% / 500 TB (0.5 PB)
• Classification / Record / 8% / 200 TB (0.2 PB)
• Classification / Non-Record, Business Reference / 28% / 700 TB (0.7 PB)
• Classification / Evaluated, Staged for Disposition (2018) / 44% / 1,100 TB (1.1 PB)
• Total / 100% / 2,500 TB (2.5 PB)

Enterprise Impact
• Total that could be disposed of: 20% of 2.5 PB
• Enterprise implications: 0.5 PB removed @ $5M per PB
• Savings: $2.5M per year in storage expense
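The summary figures follow from the percentages and the per-petabyte cost. A quick check in Python makes two assumptions: 1 PB = 1,000 TB, as the slide's own conversions imply, and "$5M per PB" is an annual storage cost. The names are illustrative.

```python
TOTAL_PB = 2.5
COST_PER_PB_PER_YEAR = 5_000_000  # the slide's "$5M per PB", read as an annual storage cost

SHARES = {
    "Unnecessary (analytics)": 0.20,
    "Record": 0.08,
    "Non-record, business reference": 0.28,
    "Evaluated, staged for disposition (2018)": 0.44,
}


def volumes_tb(total_pb: float, shares: dict) -> dict:
    """Convert each share of the total into whole terabytes (1 PB = 1,000 TB)."""
    return {status: round(total_pb * 1000 * share) for status, share in shares.items()}


def annual_savings(disposed_pb: float, cost_per_pb: float) -> float:
    return disposed_pb * cost_per_pb


if __name__ == "__main__":
    for status, tb in volumes_tb(TOTAL_PB, SHARES).items():
        print(f"{status}: {tb:,} TB")
    disposed = TOTAL_PB * SHARES["Unnecessary (analytics)"]
    print(f"annual savings: ${annual_savings(disposed, COST_PER_PB_PER_YEAR):,.0f}")
```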

Page 43: Implications

• Given the results, $2.5 million in storage expense could be saved annually through disposition of historical content, for a total of $12.5 million over 5 years
• Going forward, if similar techniques are applied to newly created content, the savings grow to $34.8 million over 5 years
  – The current cost projections are based on the historical content growth rate of 30% per year
  – The expected cost projections are based on a content growth rate of 26% per year

Projection @ $5,000,000 per PB
                          2014    2015    2016    2017    2018*   Total
Current storage (PB)      2.5     3.25    4.23    5.49    7.14
Current cost ($M)         12.5    16.3    21.1    27.5    35.7    113.0
Expected storage (PB)     2       2.52    3.18    4.00    3.94
Expected cost ($M)        10      12.6    15.9    20.0    19.7    78.2
Total savings ($M)        2.5     3.65    5.25    7.46    16.00   34.8

*In 2018, the 1.1 PB (44%) of content from the 2014 historical content assessment can be disposed of.
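The projection table can be reproduced from three inputs: the two growth rates, the starting volumes, and the 1.1 PB disposed of in 2018. The sketch assumes the following:

- 2014 is the base year, with no growth applied.
- Expected storage starts at 2.0 PB, which is 2.5 PB less the 0.5 PB of unnecessary content.
- Each disposal is subtracted after that year's growth.

`project_storage` is a hypothetical name.

```python
from typing import Dict, List, Optional

COST_PER_PB = 5.0  # $ millions per PB per year, from the slide


def project_storage(start_pb: float, growth: float, years: int,
                    disposals: Optional[Dict[int, float]] = None) -> List[float]:
    """Storage (PB) per year; year 0 is the base year, later years grow by `growth`."""
    disposals = disposals or {}
    storage = []
    pb = start_pb
    for year_index in range(years):
        if year_index > 0:
            pb *= 1 + growth
        pb -= disposals.get(year_index, 0.0)
        storage.append(pb)
    return storage


if __name__ == "__main__":
    years = [2014, 2015, 2016, 2017, 2018]
    current = project_storage(2.5, 0.30, len(years))
    expected = project_storage(2.0, 0.26, len(years), {4: 1.1})  # 1.1 PB disposed of in 2018
    for y, c, e in zip(years, current, expected):
        print(f"{y}: current {c:.2f} PB (${c * COST_PER_PB:.1f}M), "
              f"expected {e:.2f} PB (${e * COST_PER_PB:.1f}M), "
              f"savings ${(c - e) * COST_PER_PB:.2f}M")
    total_savings = (sum(current) - sum(expected)) * COST_PER_PB
    print(f"5-year savings: ${total_savings:.2f}M")
```

With these inputs, the five-year totals come to about $113.0 million current and $78.2 million expected. The savings come to about $34.85 million; the slide's $34.8 million is the difference of the rounded totals.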

Page 44: Conclusions

1. The business case for disposition is strong
   – Costs, risks, and benefits
2. Address information governance in phases
   – Starting today, the program will take years to mature
   – Set expectations accordingly
3. Probably address day-forward ILM before tackling historical content
4. Manual classification (alone) is not an option
5. The technologies are immature and varied, but you can be successful by matching the techniques and technologies to the kinds of files you want to target

Page 45: Thank You

Doculabs, Inc.
(312) 433-7793

Page 46: Richard Medina

• Co-Founder and a Principal Consultant at Doculabs.
• In my 20+ years with Doculabs, I’ve consulted for organizations in a wide range of industries, including financial services, insurance, communications, utilities, and government.
• 312-953-9983
• [email protected]
• Blog: http://www.richardmedinadoculabs.com
• http://www.linkedin.com/in/richmedinadoculabs
• Twitter: @richarddoculabs
• www.doculabs.com