Boston Hadoop Meetup, April 26, 2012

Post on 07-May-2015

DESCRIPTION

Daniel Abadi presentation at the Boston Hadoop Meetup held on April 26, 2012.

TRANSCRIPT

The Proliferation of Database Systems and the Data Silo Problem

@daniel_abadi

Yale University / Hadapt

April 26th, 2012

In The Old Days …

Database

In The Old Days …

Data Warehouse

Database

Database

Database

External Data

ETL Tools

Data Integration Tools

MDM Tools

Data Governance Tools

One Size Does Not Fit All

Transactional Databases
– Single digit millisecond latencies, and high throughput
– Store data in rows
– Heavy on flash and main memory
– Indexing is very important
– High availability extremely important

One Size Does Not Fit All

Analytical Databases
– Single digit second latencies (and higher)
– Store data in columns
– Scale out commodity hardware
– Still need magnetic disk
– Indexing less important
– High availability less important
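The row versus column distinction on the two slides above can be illustrated with a small sketch. It is not taken from the talk: the sample table, its column names, and the helper functions are hypothetical, and in-memory Python structures only stand in for on-disk layouts.

```python
# Hypothetical sample table; the column names and values are illustrative only.
rows = [
    {"order_id": 1, "customer": "a", "amount": 10.0},
    {"order_id": 2, "customer": "b", "amount": 25.5},
    {"order_id": 3, "customer": "a", "amount": 4.5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def lookup_row(records, order_id):
    """Point lookup: the row layout hands back the whole record at once."""
    for record in records:
        if record["order_id"] == order_id:
            return record
    return None


def total_amount(cols):
    """Aggregate: the column layout touches only the one column it needs."""
    return sum(cols["amount"])


columns = to_columns(rows)
```

A transactional request such as `lookup_row(rows, 2)` wants every field of one record, which favors rows. An analytical query such as `total_amount(columns)` scans one attribute across all records, which favors columns.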

One Size Does Not Fit All

Streaming Databases
– Continuous queries
– Data flows through the system
– Network latencies are paramount
– Drop data to deal with load
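As a rough illustration of "continuous queries" and "drop data to deal with load", the sketch below keeps a running average over a bounded buffer and sheds the oldest values once the buffer is full. The class name, the capacity, and the drop-oldest policy are assumptions made for this example. They do not describe the policy of any particular streaming system.

```python
from collections import deque


class SheddingWindow:
    """Continuous average over a bounded buffer that sheds the oldest values."""

    def __init__(self, capacity):
        # A deque with maxlen discards the oldest item when a new one arrives.
        self.buffer = deque(maxlen=capacity)
        self.dropped = 0

    def push(self, value):
        if len(self.buffer) == self.buffer.maxlen:
            self.dropped += 1  # count the value that is about to be shed
        self.buffer.append(value)

    def average(self):
        """Answer the continuous query over whatever is currently buffered."""
        if not self.buffer:
            return None
        return sum(self.buffer) / len(self.buffer)


window = SheddingWindow(capacity=3)
for reading in [1, 2, 3, 4, 5]:
    window.push(reading)
```

The query result is always available from `window.average()`, while `window.dropped` records how much data was sacrificed to keep up with the input.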

Therefore, in my PhD years alone …

Aurora and Borealis projects became Streambase

C-Store project became Vertica

H-Store project became VoltDB

Right Tool for the Job

What We Have Now …

Transactional DBMS

Transactional DBMS

Web DBMS (like MySQL)

Web Logs

Reporting and Dashboarding Data Warehouse

Analytical Datamart

High Performance Column-Store

Analytical DBMS

NoSQL NewSQL

Streaming DBMS

Hadoop

OLAP Database

What This Leads To …

Very little data provenance

Data silos

Non-identical data copies

Not even close to a single version of the truth

A Potential Way Towards a Solution

Hadoop

Data Streaming (HStreaming, Flume)

NoSQL & Simple Xacts & Short Request Processing (HBase, Brisk)

Data Analysis DBMS (Hive, Hadapt)

What This Has Potential to Enable

Fewer data silos

Increased data provenance

Reduced systems management overhead

Better resource utilization and management

But We Still Need

Hadoop-based data integration tools

MDM and data governance tools for Hadoop

Data provenance tracking across Hadoop projects
