FreeEed presentation
1+Hadoop-based Open Source eDiscovery: FreeEed
(Easy as popcorn)
2+Business (legal) use case
• Duty to disclose information – FRCP Rule 26
• Preserve relevant information
• Produce information on request
• Keep the information for X years
• Sanctions for obstruction
• Sanctions for non-compliance
3+Before the 1930s
• The courtroom was full of surprises
4+Civil discovery changes this
5+Discovery basics
• Obligations of the parties
• At the start of a lawsuit, or when litigation becomes possible, preserve relevant data
• Produce data on request, within timelines
• Review the data before production
• Can request eDiscovery from opponents
• Store and archive
6+Interesting facts about eDiscovery
• Most of these are proprietary or under NDA
• Representative case size: 5GB to 500GB
• Cost per GB of processing: $5-200, typically ~$100
• Takes 25-50% of litigation budget
• Days to process and months to review
• Preservation: 3-7 years
• 500 providers, with 10 majors
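The case sizes and per-GB prices above imply a very wide cost range. A rough back-of-the-envelope sketch, assuming cost scales linearly with data size (an assumption, not something the slide states; the function name `processing_cost` is hypothetical):

```python
def processing_cost(size_gb, rate_per_gb):
    """Estimated processing cost in dollars, assuming simple linear pricing."""
    return size_gb * rate_per_gb

# Figures from the slide: cases of 5GB to 500GB, $5 to $200 per GB, ~$100 typical.
LOW_RATE, TYPICAL_RATE, HIGH_RATE = 5, 100, 200

for size_gb in (5, 500):
    low = processing_cost(size_gb, LOW_RATE)
    typical = processing_cost(size_gb, TYPICAL_RATE)
    high = processing_cost(size_gb, HIGH_RATE)
    print(f"{size_gb} GB: ${low:,} to ${high:,}, about ${typical:,} at the typical rate")
```

Under these assumptions, a 500GB case at the typical rate costs about $50,000 for processing alone, before any review.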
7+Challenges of eDiscovery
• Data sizes in the terabytes
• Seasonal loads, tight deadlines
• Hundreds of file formats
• Heavy read/write load in review
• Text analytics is of paramount importance
• Huge price tags obstruct justice
8+FreeEed main features
• Open source Hadoop-based eDiscovery:
• As scalable as Hadoop
• Fast review with NoSQL
• Scales with the lawsuit, in both time and volume
• Data preservation and archiving with VMs
• Only possible with an open source license
9+Design goals
• Built on open source components
• Big Data scalable
• Preservation, chain of custody, archiving
• Scalable both technically and as a business
• Stable (don’t laugh, people get different results on different runs)
• Closed-source compatible (MS + Azure too)
10+Packaging architecture
• Comes as VMs
• Grab as few or as many as you want
• No mixing of matters
• No ethical problems
• Preserve for as many years as you want
• 1 VM = 1 corn, FreeEed = free popcorn
11+FreeEed makes lawyers happy
12+FreeEed: Architecture
13+FreeEed popcorn is very popular with lawyers, legal techs, IT, etc.
14+FreeEed popcorn
• Deploy on laptops, servers or cloud
• One-node or any number of nodes
• Scalable storage
• Different cooking recipes
• No mixing of matters
• Easy archiving
• Easy deletion
15+Processing architecture
• Based on golden-image VM
• Controlled cluster start in any environment
• Index / cull on the fly or later
• Immediately searchable
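Culling "on the fly" can be pictured as a filter applied while documents stream through processing. The sketch below is illustrative only and is not FreeEed's code: the `cull` function, the document fields (`id`, `date`, `text`) and the keyword-plus-date-range criteria are all assumptions.

```python
from datetime import date

def cull(documents, keywords, start, end):
    """Yield only documents inside the date range that mention any keyword."""
    wanted = [k.lower() for k in keywords]
    for doc in documents:
        text = doc["text"].lower()
        if start <= doc["date"] <= end and any(k in text for k in wanted):
            yield doc

# Hypothetical sample documents.
docs = [
    {"id": "a1", "date": date(2011, 3, 1), "text": "Quarterly merger update"},
    {"id": "a2", "date": date(2009, 6, 9), "text": "Merger rumors"},
    {"id": "a3", "date": date(2011, 5, 2), "text": "Lunch on Friday?"},
]

kept = list(cull(docs, ["merger"], date(2011, 1, 1), date(2011, 12, 31)))
print([d["id"] for d in kept])
```

Because `cull` is a generator, each document is kept or dropped as it arrives, so the same filter could run during processing or later over stored results.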
16+Cluster start-up on EC2
17+Cloud integration
• Downloadable VMs
• Same VMs on Amazon AWS
• Amazon VMs are very convenient
• Immediate deployment
• Any hardware configuration you need
• Control lots of power from a limited-power laptop
• Azure – working with Microsoft
18+Review architecture
• Lucene
• Solr
• HBase
• Lucene indexes created in reducers and combined in Solr
• For small matters, write directly to Solr
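The "index in reducers, then combine" idea can be illustrated with a toy inverted index. This is a pure-Python stand-in and does not use Lucene, Solr or Hadoop: each call to the hypothetical `build_partial_index` plays the role of one reducer, and the hypothetical `merge_indexes` plays the role of the combining step.

```python
from collections import defaultdict

def build_partial_index(documents):
    """One 'reducer': build an inverted index (term -> doc ids) for its share."""
    index = defaultdict(set)
    for doc_id, text in documents:
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def merge_indexes(partials):
    """Combine the partial indexes into one searchable index."""
    merged = defaultdict(set)
    for partial in partials:
        for term, doc_ids in partial.items():
            merged[term] |= doc_ids
    return merged

# Hypothetical shards, one per reducer.
shard_1 = [("d1", "merger agreement draft"), ("d2", "lunch menu")]
shard_2 = [("d3", "signed merger agreement")]

index = merge_indexes([build_partial_index(shard_1), build_partial_index(shard_2)])
print(sorted(index["merger"]))
```

For a small matter, the merge step is unnecessary: a single index built in one pass is the analogue of writing directly to Solr.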
19+Review screen
20+Review capabilities
• Search
• Cull down
• View text and metadata
• Tag documents
• Export as images or as native files
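A reviewer's search-then-tag loop might look like the following minimal sketch. The `ReviewSet` class and its methods are hypothetical and keep everything in memory; the slides do not describe how FreeEed stores tags.

```python
from collections import defaultdict

class ReviewSet:
    """Toy in-memory review set: search documents, then tag the hits."""

    def __init__(self, documents):
        self.documents = dict(documents)  # doc id -> text
        self.tags = defaultdict(set)      # tag -> doc ids

    def search(self, term):
        """Return ids of documents whose text contains the term."""
        term = term.lower()
        return sorted(d for d, text in self.documents.items() if term in text.lower())

    def tag(self, doc_ids, tag):
        """Attach a tag to each of the given documents."""
        self.tags[tag].update(doc_ids)

    def tagged(self, tag):
        """Return ids of documents carrying the tag, e.g. to select an export set."""
        return sorted(self.tags[tag])

review = ReviewSet([
    ("d1", "Merger agreement draft"),
    ("d2", "Lunch menu"),
    ("d3", "Signed merger agreement"),
])
review.tag(review.search("merger"), "responsive")
print(review.tagged("responsive"))
```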
21+Eagle’s-eye view - EDRM (Electronic Discovery Reference Model)
22+Left of EDRM – Legal Hold
• FreeEedCollect
• Architecture: https://github.com/markkerzner/FreeEedCollect
• ZooKeeper/MapReduce/Flume/HDFS
23+Right of EDRM – Org. charts
Partnership with Sintelix
24+Analytics – network of actors
Partnership with Sintelix
25+FreeEed and data governance
• Virtualization for data preservation
• Scalable processing
• Archiving
• Document groups are not mixed
• Data format stored together with software that understands it
26+Hadoop & Big Data applications
• Other related applications
• Financial – text analytics
• Energy – document and procedure analytics
• Actual on-going projects
27+FreeEed as a learning tool
• Hundreds of downloads
• Dozens of active users
• Real-world Hadoop application
• Many developers download to learn
• Complex, real, but manageable
28+FreeEed adoption – who is trying our “popcorn”?
• Large law firms
• Small law firms and solos
• Government agencies
• Universities
• Enterprises
• Developers learn Big Data
29+Looking forward
• Add
• Collection
• Analytics
• Community
• Integrations
• Implementations
30+How you can use FreeEed
• For its intended purpose
• Large law firms
• Small firms and solos
• Pro se litigants
• Integrate into legal IT
• Start a similar document management project
32+Q&A
• Thank you!
• People usually ask:
• How can I put my data in the cloud?
• Is it safe?
• Do you do OCR, PST, OST, etc…?