Big Data Cloud June 3rd Meetup - Presentation by Mark Davis

Big Data Cloud Meetup: Big Data & Cloud Computing - Help, Educate & Demystify. June 3rd 2011

Upload: dj-das

Post on 13-Jan-2015


DESCRIPTION

Big Data Cloud June 3rd Meetup - Presentation by Mark Davis: Unlocking Big Data through Analytics and Search

TRANSCRIPT

Page 1: Big Data Cloud June 3rd Meetup - Presentation by Mark Davis

Big Data Cloud Meetup

Big Data & Cloud Computing - Help, Educate & Demystify.

June 3rd 2011

Page 2

Mark Davis, CTO, Kitenga

Unlocking Big Data through Analytics and Search

June 3rd 2011 Meetup

Page 3

Big Data

Enormous transactional data

Enormous unstructured information

Too big for databases

New tools are needed

Page 4

Unstructured data explosion

Multimedia Content

Text

Imagery

Audio

Video

Sensor Streams

Biometric data

3D

Text

Email

Documents

Web pages

Tweets

Posts

Structured Enterprise Data

Data warehouse

CDRs

Financial records

Access logs

<5%

Page 5

Big Data

Trillions of user interactions/transactions == Big Data

<1M: Open source, MySQL, PHP

<10M: Data warehousing, Parallel SQL, Big hardware

>100M: NoSQL, Hadoop/MapReduce, HBase/Hive

Traditional (DBMS-based) solutions

Emerging technologies
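
The slide names Hadoop/MapReduce as the tool for the largest tier. As a rough illustration of the programming model only (not Hadoop's API), the sketch below runs the map, shuffle, and reduce phases in a single Python process. The record format `user,action` and all function names are assumptions made for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit one (user, 1) pair per interaction record (hypothetical 'user,action' format)."""
    user, _action = record.split(",", 1)
    yield user, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in-process over an iterable of records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Made-up sample data, for illustration only.
interactions = ["alice,click", "bob,view", "alice,purchase"]
print(run_job(interactions))  # {'alice': 2, 'bob': 1}
```

On a real cluster the map and reduce functions run on many machines and the shuffle moves data across the network; the in-process version only shows how the three phases fit together.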

Page 6

The Structured/Unstructured Chasm

SQL

RDBMS

Transactional Data

BI Tools

Search

Documents

Text Classification

Taxonomies

Ontologies

Page 7

Unstructured Analytics: Surfacing Metadata

Page 8

Information Extraction

Parts-of-Speech Tagging

Tokenization

Lemmatization

Finite State Transducer

Finite State Transducer

Finite State Transducer

Machine-Learning
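
The stages listed on this slide can be sketched as a toy pipeline: tokenization, a crude lemmatizer, a capitalization-based stand-in for part-of-speech tagging, and a two-state finite state transducer that turns the tag sequence into B/I/O labels for candidate named entities. Every rule and name below is an assumption made for illustration; it is not the pipeline shown in the talk, and a capitalized sentence-initial word would be a false positive here.

```python
import re

TOKEN_RE = re.compile(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]")


def tokenize(text):
    """Split text into alphabetic words, digit runs, and single punctuation marks."""
    return TOKEN_RE.findall(text)


def lemmatize(token):
    """Toy suffix stripping; a real lemmatizer uses a dictionary and POS information."""
    word = token.lower()
    for suffix, repl in (("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + repl
    return word


def tag(token):
    """Stand-in for POS tagging: capitalized tokens are treated as proper nouns."""
    return "NNP" if token[0].isupper() else "OTHER"


# Transducer table: (state, input tag) -> (next state, output label).
TRANSITIONS = {
    ("OUT", "NNP"): ("IN", "B"),
    ("OUT", "OTHER"): ("OUT", "O"),
    ("IN", "NNP"): ("IN", "I"),
    ("IN", "OTHER"): ("OUT", "O"),
}


def transduce(tags):
    """Map a tag sequence to B/I/O labels by walking the transition table."""
    state, labels = "OUT", []
    for t in tags:
        state, label = TRANSITIONS[(state, t)]
        labels.append(label)
    return labels


def extract_entities(text):
    """Return maximal runs of capitalized tokens as candidate entities."""
    tokens = tokenize(text)
    labels = transduce([tag(t) for t in tokens])
    entities, current = [], []
    for token, label in zip(tokens, labels):
        if label == "O":
            if current:
                entities.append(" ".join(current))
                current = []
        else:  # "B" starts a run, "I" continues it
            current.append(token)
    if current:
        entities.append(" ".join(current))
    return entities


sentence = "The talk by Mark Davis in Santa Clara covered indexing feeds."
print(extract_entities(sentence))
print([lemmatize(t) for t in ("indexing", "feeds")])  # ['index', 'feed']
```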

Page 9

Search + Analytics

Query Language

Metadata Extraction

Indexing

Facet Browsing

Facet Charting

Resource Integration

Autosuggest

Spellcheck
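
To make the link between indexing, extracted metadata, and facet browsing concrete, here is a minimal in-memory sketch: an inverted index for AND queries plus facet counts over the metadata of the matching documents. The class `TinyIndex`, its methods, and the sample documents are hypothetical; a production system would use an engine such as Solr or Lucene rather than this code.

```python
from collections import Counter, defaultdict


class TinyIndex:
    """Hypothetical in-memory index: term postings plus per-document metadata."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids
        self.metadata = {}                # doc id -> extracted metadata fields

    def add(self, doc_id, text, **fields):
        """Index the whitespace-separated terms of text and store the metadata."""
        self.metadata[doc_id] = fields
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents that contain every query term (AND semantics)."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

    def facets(self, doc_ids, field):
        """Count the values of one metadata field across a result set."""
        return Counter(
            self.metadata[d][field] for d in doc_ids if field in self.metadata[d]
        )


# Made-up documents, for illustration only.
index = TinyIndex()
index.add(1, "quarterly sales report", doc_type="report", year=2010)
index.add(2, "sales email thread", doc_type="email", year=2011)
index.add(3, "annual sales report", doc_type="report", year=2011)

hits = index.search("sales report")
print(sorted(hits))                                     # [1, 3]
print(sorted(index.facets(hits, "year").items()))       # [(2010, 1), (2011, 1)]
print(sorted(index.facets(index.search("sales"), "doc_type").items()))
```

Facet charting would then plot these counts, and clicking a facet value amounts to intersecting the result set with the documents that carry that value.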

Page 10

Defense Intelligence

Analyst support staff needs to convert raw data into actionable intelligence

Situation Reports

Geo-tagged Imagery

Named Entity Extraction

Image tagging

Video analytics

Linkage Analysis

Network Visualization

Search

Hadoop/MapReduce, GPUs, HDFS, HBase, Solr

Improve Force Effectiveness

US Army, Navy, DHS, NSA
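
Linkage analysis, as listed on this slide, is often built on entity co-occurrence: two entities are linked when they are extracted from the same document, and the link weight is the number of shared documents. The sketch below assumes entity extraction has already produced one list of names per document; the function names and sample data are made up for illustration and do not describe the system used in the talk.

```python
from collections import Counter
from itertools import combinations


def build_links(docs_entities):
    """Count how many documents each unordered pair of entities shares."""
    links = Counter()
    for entities in docs_entities:
        # sorted(set(...)) gives each pair one canonical (a, b) ordering per document
        for a, b in combinations(sorted(set(entities)), 2):
            links[(a, b)] += 1
    return links


def neighbors(links, entity):
    """Return {linked entity: weight} for one node of the co-occurrence graph."""
    result = {}
    for (a, b), weight in links.items():
        if a == entity:
            result[b] = weight
        elif b == entity:
            result[a] = weight
    return result


# One entity list per document (hypothetical extraction output).
docs = [
    ["Alice", "Bob", "Acme"],
    ["Alice", "Acme"],
    ["Bob", "Carol"],
]
links = build_links(docs)
print(links[("Acme", "Alice")])          # 2
print(sorted(neighbors(links, "Bob").items()))
# [('Acme', 1), ('Alice', 1), ('Carol', 1)]
```

The resulting weighted edge list is the kind of input a network visualization would draw.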

Page 11

CASE STUDY: US ARMY

Analysis Bottlenecks

200 data feeds

Unacceptable response time

Analysts avoid complete searches

Basic entity extraction

Slow analysis cycles

Distribution by PowerPoint

Enabling Technologies: Oracle and custom thick clients

The Solution

>200 data feeds

<0.5s queries

Fast analysis cycles

Machine Learning

Analytics

Biometrics

Linkage Analysis

Face recognition

Video tagging

Collaborative systems

Enabling technologies: GPU clouds, Hadoop/MapReduce, Katta, Lucene, NoSQL, HBase

Page 12

Pharma Bioinformatics

Increase speed of drug discovery

Patents

Genetic Sequence Data

Journal Articles

ZettaVox

Biological Named Entity Extraction

Author Name Extraction and Normalization

Linkage Analysis

Timelines

Faceted Search

Hadoop/MapReduce, HDFS, HBase, GPUs, Solr

Faster Discovery
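
Author name normalization, listed on this slide, usually means mapping the different spellings of one author to a single key so that articles and patents can be grouped. The sketch below shows one simple, assumed scheme (lowercased family name plus given-name initials); the function name and the rule are illustrative and are not taken from the talk. Distinct authors who share a family name and initials collide under this key, so a real system would add further evidence such as affiliation or co-authors.

```python
import re


def normalize_author(name):
    """Map 'Family, Given' or 'Given Family' to the key 'family initials'."""
    name = name.strip()
    if not name:
        return ""
    if "," in name:
        family, given = (part.strip() for part in name.split(",", 1))
    else:
        parts = name.split()
        family, given = parts[-1], " ".join(parts[:-1])
    # Split the given names on spaces, periods, and hyphens, then keep first letters.
    initials = "".join(w[0] for w in re.split(r"[\s.\-]+", given) if w)
    return f"{family} {initials}".strip().lower()


variants = ["Davis, Mark", "Mark Davis", "M. Davis", "davis, m."]
print({normalize_author(v) for v in variants})  # {'davis m'}
```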

Page 13

Pharma Treemap

Page 14

Page 15
Page 16

Demo

Page 17
Page 18

Summary

• Big Data spans unstructured and structured data

• Effective tools for managing both depend on understanding their differences and similarities

• Bridging the chasm between them means merging search and analytics

Page 19

Questions?

Page 20

Contact Info

[email protected]

http://www.kitenga.com

Kitenga, Inc.
2953 Bunker Hill Lane, Suite 400
Santa Clara, CA 95054

1-(408)-462-KITE
1-(253)-541-6799 (FAX)