ZettaVox: Content Mining and Analysis across Heterogeneous Compute Clouds (Hadoop Summit 2010)

ZettaVox: Content Mining and Analysis across Heterogeneous Compute Clouds. Mark Davis, Kitenga, Inc.

Upload: yahoo-developer-network

Post on 03-Jul-2015

DESCRIPTION

Hadoop Summit 2010, Application Track. ZettaVox: Content Mining and Analysis Across Heterogeneous Compute Clouds. Mark Davis, Kitenga.

TRANSCRIPT

Page 1: ZettaVox: Content Mining and Analysis Across Heterogeneous Compute Clouds__HadoopSummit2010

ZettaVox: Content Mining and Analysis across Heterogeneous Compute Clouds

Mark Davis, Kitenga, Inc.

Page 2:

Session Agenda

› The Company

› The Problem

› The Solution

› Demo

Page 3:

Kitenga

Solutions for Information Overload

Kitenga [1][2]: (Maori) A view or perception

› 2004-present

› CTO: Mark Davis, InXight Software (Business Objects/SAP), Microsoft, Defense R&D

› CEO: Anil Uberoi, Lucid Imagination, Amdocs, Sun

2953 Bunker Hill Lane, Santa Clara, CA

[1] Also a region in Uganda

[2] Also a bed-and-breakfast in Clevedon, Auckland

Page 4:

Support

PredictionLogic, Inc.

Page 5:

The Never-Ending Problem

Multimedia Data

› Video

› Imagery

› Audio

› Sensor Streams

› Biometric data

› 3D

Text

› Email

› Web pages

› Tweets

› Posts

Enterprise Data

› Enterprise data

› CDRs

› Financial records

› Access logs

Page 6:

Solving the Problem is Hard

Content mining analysts

Machine learning specialists

Information retrieval specialists

Software Engineers

Expensive and hard to find

Parallel Supercomputers

Racked clusters

Systems management

Enterprise storage solutions

Gigabit switches

Power management

Text analytics

Ontologies

Database reporting tools

ETL tools

Business intelligence

Open source components

Page 7:

Convert raw data into actionable intelligence

Defense Intelligence

Situation Reports

Geotagged Imagery

ZettaVox

Named Entity Extraction

Image tagging

Video analytics

Linkage Analysis

Network Visualization

Search

Hadoop, GPUs, HDFS, HBase, Solr

Improve Force Effectiveness
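
The slide names named entity extraction followed by linkage analysis but shows no implementation. As a minimal sketch of how those two steps can fit a map/reduce shape, the example below counts how often two entities appear in the same document. Everything in it is an assumption for illustration: the regex is a toy stand-in for a real entity extractor, the function names are hypothetical, and the sample reports are invented. It is not ZettaVox's method.

```python
import re
from collections import Counter
from itertools import combinations

# Toy stand-in for a named entity extractor (an assumption, not a real model):
# any run of capitalized words is treated as one entity.
ENTITY_PATTERN = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b")


def map_document(text):
    """Map step: emit ((entity_a, entity_b), 1) for each pair seen in one document."""
    entities = sorted(set(ENTITY_PATTERN.findall(text)))
    for pair in combinations(entities, 2):
        yield pair, 1


def reduce_links(mapped):
    """Reduce step: sum the counts for each entity pair."""
    links = Counter()
    for pair, count in mapped:
        links[pair] += count
    return links


def link_entities(documents):
    """Run the map step over every document, then reduce the emitted pairs."""
    mapped = (item for doc in documents for item in map_document(doc))
    return reduce_links(mapped)


# Invented sample input; sentences start in lowercase so that the toy regex
# does not mistake the first word for an entity.
reports = [
    "convoy left Lisbon and reached Porto",
    "courier from Porto returned to Lisbon",
    "shipment arrived in Madrid from Lisbon",
]
links = link_entities(reports)
for (a, b), weight in links.most_common():
    print(f"{a} <-> {b}: {weight}")
```

The pair counts form a weighted edge list, which is the kind of input a network visualization could consume. On a real cluster the map and reduce functions would run as separate distributed stages rather than in one process.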

Page 8:

Increase speed of drug discovery

Pharmaceutical R&D

Patents

Genetic Sequence Data

Journal Articles

ZettaVox

Biological Named Entity Extraction

Author Name Extraction and Normalization

Linkage Analysis

Timelines

Faceted Search

Hadoop, HDFS, HBase, GPUs, Solr

Faster Discovery
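
Author name normalization, listed on this slide, can be illustrated with a small sketch. The rule used here (lowercased surname plus first initial) and the function names are assumptions chosen for brevity, not the approach the talk describes. A rule this coarse will also merge distinct authors who share a surname and initial.

```python
from collections import defaultdict


def normalize_author(name):
    """Return a 'surname, initial' key for 'First Last' or 'Last, First' input."""
    # Drop periods and collapse whitespace so "M. W. Davis" becomes "M W Davis".
    name = " ".join(name.replace(".", " ").split())
    if "," in name:
        last, _, given = name.partition(",")
    else:
        parts = name.split(" ")
        last, given = parts[-1], " ".join(parts[:-1])
    last = last.strip().lower()
    given = given.strip().lower()
    return f"{last}, {given[0]}" if given else last


def group_authors(names):
    """Group raw name strings under their normalized key."""
    groups = defaultdict(list)
    for raw in names:
        groups[normalize_author(raw)].append(raw)
    return dict(groups)


variants = ["Mark Davis", "Davis, M.", "DAVIS, Mark W.", "M. W. Davis", "Anil Uberoi"]
grouped = group_authors(variants)
for key, raw_names in grouped.items():
    print(key, "<-", raw_names)
```

Grouping by a normalized key is what lets linkage analysis treat several spellings found in journal articles and patents as one author node.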

Page 9:

ZettaVox

Compose analysis workflows using out-of-the-box components

Interact with HDFS/Hadoop through Rich Internet Application

Monitor system progress

Visualize and analyze results

Batch mode via XML and JSON

Heterogeneous compute resources
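
The slide says workflows are composed from components and can be run in batch mode via XML and JSON, but the transcript does not show the file format. The sketch below therefore uses a hypothetical JSON layout (the "steps", "id", "component", and "after" fields and all component names are invented) to show one way a batch runner could derive an execution order from such a description. It relies on graphlib from the Python standard library, available in Python 3.9 and later.

```python
import json
from graphlib import TopologicalSorter

# Hypothetical workflow description; this is NOT the ZettaVox schema.
WORKFLOW_JSON = """
{
  "name": "author-analysis",
  "steps": [
    {"id": "load",    "component": "hdfs_reader",      "after": []},
    {"id": "extract", "component": "entity_extractor", "after": ["load"]},
    {"id": "link",    "component": "linkage_analysis", "after": ["extract"]},
    {"id": "index",   "component": "search_indexer",   "after": ["extract"]}
  ]
}
"""


def execution_order(workflow_text):
    """Return step ids in an order that respects every 'after' dependency."""
    workflow = json.loads(workflow_text)
    # Map each step id to the set of step ids it must wait for.
    graph = {step["id"]: set(step["after"]) for step in workflow["steps"]}
    unknown = {dep for deps in graph.values() for dep in deps} - set(graph)
    if unknown:
        raise ValueError(f"steps reference unknown ids: {sorted(unknown)}")
    # static_order() raises graphlib.CycleError if the steps form a cycle.
    return list(TopologicalSorter(graph).static_order())


order = execution_order(WORKFLOW_JSON)
print(" -> ".join(order))
```

In this layout "link" and "index" both depend only on "extract", so a scheduler would be free to run them in parallel on different compute resources.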

Page 10:

Heterogeneous Compute Clouds

42 U ≈ 84-168 cores

2 PCIe slots

15 multiprocessors

480 cores

$0.13-$0.35/Gflop

Amazon AWS

Rackspace Mosso

Private Cloud
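
The figures on this slide can be related with some simple arithmetic. The sketch below derives the per-unit and per-multiprocessor ratios that the slide's numbers imply. The rack-level GPU total rests on an assumption the slide does not state: that each rack unit holds one server with a GPU in both PCIe slots. GPU cores and CPU cores are not directly comparable, so the totals indicate scale only.

```python
# Figures taken from the slide.
RACK_UNITS = 42
CPU_CORES_PER_RACK = (84, 168)      # low and high estimate
GPU_MULTIPROCESSORS = 15
GPU_CORES = 480
USD_PER_GFLOP = (0.13, 0.35)        # low and high estimate

# Ratios implied by the slide's own numbers.
cpu_cores_per_unit = tuple(cores // RACK_UNITS for cores in CPU_CORES_PER_RACK)
cores_per_multiprocessor = GPU_CORES // GPU_MULTIPROCESSORS

# ASSUMPTION (not on the slide): one server per rack unit, one GPU per PCIe slot.
GPUS_PER_UNIT = 2
gpu_cores_per_rack = RACK_UNITS * GPUS_PER_UNIT * GPU_CORES


def cost_range_usd(gflops):
    """Low and high cost in USD for a given Gflop figure at the slide's rates."""
    return tuple(round(gflops * rate, 2) for rate in USD_PER_GFLOP)


print("CPU cores per rack unit:", cpu_cores_per_unit)
print("GPU cores per multiprocessor:", cores_per_multiprocessor)
print("GPU cores per fully populated rack (assumed layout):", gpu_cores_per_rack)
print("Cost range for 1000 Gflop (USD):", cost_range_usd(1000))
```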

Page 11:

Author Analysis Solutions

Page 12:

Interact with HDFS

Page 13:

Monitor Analysis Jobs

Page 14:

Use and Visualize Results

Page 15:

ZettaVox

Current Approach

› Slow analytics

› Methods don’t scale

› Expensive hardware

› Expensive software

› Capital investment

› Expertise investment

ZettaVox

› Internet-scale cloud and cluster-based content mining

› Hadoop with GPU support

› Scalable

› SaaS

› Out-of-the-box expertise

› Rich user experience

Page 16:

Questions?

Mark Davis [email protected]