Big Data Cloud June 3rd Meetup - Presentation by Mark Davis
Unlocking Big Data through Analytics and Search
Big Data Cloud Meetup
Big Data & Cloud Computing - Help, Educate & Demystify.
June 3rd 2011
Mark Davis, CTO, Kitenga
Unlocking Big Data through Analytics and Search
Big Data
Enormous transactional data
Enormous unstructured information
Too big for databases
New tools are needed
Unstructured data explosion
Multimedia Content
Text
Imagery
Audio
Video
Sensor Streams
Biometric data
3D
Text
Email
Documents
Web pages
Tweets
Posts
Structured Enterprise Data
Data warehouse
CDRs
Financial records
Access logs
<5%
Big Data
Trillions of user interactions/transactions == Big Data
<1M <10M >100M
Open source
MySQL
PHP
Data warehousing
Parallel SQL
Big hardware
NoSQL
Hadoop/MapReduce
HBase/Hive
Traditional (DBMS-based) solutions
Emerging technologies
The Structured/Unstructured Chasm
SQL
RDBMS
Transactional Data
BI Tools
Search
Documents
Text Classification
Taxonomies
Ontologies
Unstructured Analytics: Surfacing Metadata
Information Extraction
Parts-of-Speech Tagging
Tokenization
Lemmatization
Finite State Transducer
Finite State Transducer
Finite State Transducer
Machine-Learning
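The stages named on this slide can be illustrated with a minimal sketch. It is a toy, not Kitenga's implementation: the tokenizer is a plain regular expression, the lemmatizer is a hypothetical lookup table (`LEMMAS`), and the finite state transducer has only two states and tags runs of capitalized tokens as entities. Parts-of-speech tagging and machine learning are left out, and all names below are assumptions made for illustration.

```python
import re

def tokenize(text):
    """Split text into alphanumeric tokens and single punctuation marks."""
    return re.findall(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]", text)

# Hypothetical lookup table; a real lemmatizer would use a full lexicon.
LEMMAS = {"feeds": "feed", "analysts": "analyst", "searched": "search"}

def lemmatize(token):
    """Return the base form of a token, falling back to its lowercase form."""
    return LEMMAS.get(token.lower(), token.lower())

def classify(token):
    """Map a token to the input symbol the transducer reads."""
    return "CAP" if token[:1].isupper() else "OTHER"

# Transducer table: (state, input symbol) -> (next state, output tag).
TRANSITIONS = {
    ("OUT", "CAP"): ("IN", "B-ENT"),
    ("OUT", "OTHER"): ("OUT", "O"),
    ("IN", "CAP"): ("IN", "I-ENT"),
    ("IN", "OTHER"): ("OUT", "O"),
}

def transduce(tokens):
    """Run the transducer over the tokens and emit one tag per token."""
    state, tags = "OUT", []
    for tok in tokens:
        state, tag = TRANSITIONS[(state, classify(tok))]
        tags.append(tag)
    return tags

def extract_entities(text):
    """Group tagged tokens into entity strings."""
    tokens = tokenize(text)
    entities, current = [], []
    for tok, tag in zip(tokens, transduce(tokens)):
        if tag in ("B-ENT", "I-ENT"):
            current.append(tok)
        elif current:
            entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities

print(extract_entities("the talk by Mark Davis was held in Santa Clara."))
```

Because the only signal is capitalization, a sentence-initial word would also be tagged as an entity; a practical extractor would combine several transducers with lexicons or learned models.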
Search + Analytics
Query Language
Metadata Extraction
Indexing
Facet Browsing
Facet Charting
Resource Integration
Autosuggest
Spellcheck
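How indexing, facet browsing, and autosuggest fit together can be sketched in a few lines. This is an in-memory toy, assuming whitespace tokenization and exact-match terms; the class `TinyIndex` and its methods are hypothetical names and do not reflect the Solr or Lucene APIs.

```python
from collections import Counter, defaultdict

class TinyIndex:
    """Toy inverted index with metadata facets and prefix suggestions."""

    def __init__(self):
        self.docs = {}                     # doc id -> metadata dict
        self.postings = defaultdict(set)   # term -> set of doc ids

    def add(self, doc_id, text, **metadata):
        """Index a document's terms and store its extracted metadata."""
        self.docs[doc_id] = metadata
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

    def facets(self, doc_ids, field):
        """Count the values of one metadata field across a result set."""
        return Counter(self.docs[d][field] for d in doc_ids if field in self.docs[d])

    def suggest(self, prefix, limit=5):
        """Return indexed terms that start with the given prefix."""
        prefix = prefix.lower()
        return sorted(t for t in self.postings if t.startswith(prefix))[:limit]

index = TinyIndex()
index.add(1, "gene sequence patent", source="Patents")
index.add(2, "gene expression article", source="Journal Articles")
index.add(3, "protein sequence article", source="Journal Articles")

hits = index.search("gene")
print(sorted(hits))
print(index.facets(hits, "source"))
print(index.suggest("se"))
```

The facet counts are computed over the search result rather than the whole collection, which is what lets a facet chart narrow as the query narrows.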
Defense Intelligence
Analyst support staff needs to convert raw data into actionable intelligence
Situation Reports
Geo-tagged Imagery
Named Entity Extraction
Image tagging
Video analytics
Linkage Analysis
Network Visualization
Search
Hadoop/MapReduce, GPUs, HDFS, HBase, Solr
Improve Force Effectiveness
US Army
Navy
DHS
NSA
CASE STUDY: US ARMY
Analysis Bottlenecks
200 data feeds
Unacceptable response time
Analysts avoid complete searches
Basic entity extraction
Slow analysis cycles
Distribution by PowerPoint
Enabling Technologies: Oracle and custom thick clients
The Solution
>200 data feeds
<0.5s queries
Fast analysis cycles
Machine Learning
Analytics
Biometrics
Linkage Analysis
Face recognition
Video tagging
Collaborative systems
Enabling Technologies: GPU clouds, Hadoop/MapReduce, Katta, Lucene, NoSQL, HBase
Pharma Bioinformatics
Increase speed of drug discovery
Patents
Genetic Sequence Data
Journal Articles
ZettaVox
Biological Named Entity Extraction
Author Name Extraction and Normalization
Linkage Analysis
Timelines
Faceted Search
Hadoop/MapReduce, HDFS, HBase, GPUs, Solr
Faster Discovery
Pharma Treemap
Demo
Summary
• Big Data spans unstructured and structured data
• Effective tools for managing both require understanding their differences and similarities
• Bridging the chasm between them means merging search and analytics together
Questions?
Contact Info
[email protected]
www.kitenga.com
Kitenga, Inc.
2953 Bunker Hill Lane, Suite 400
Santa Clara, CA 95054
1-(408)-462-KITE
1-(253)-541-6799 (FAX)