Experience with HiBench: From Micro-Benchmarks toward End-to-End Pipelines. WBDB 2013 Workshop Presentation. Lan Yi (lan.yi@intel.com), Senior Software Engineer, Intel China Software Center. 2013.07.16


Page 1: Experience with HiBench: From Micro-Benchmarks toward End-to-End Pipelines

WBDB 2013 Workshop Presentation

Lan Yi, lan.yi@intel.com

Senior Software Engineer, Intel China Software Center

2013.07.16

Page 2: HiBench

• Micro Benchmarks
– Sort
– WordCount
– TeraSort

• Web Search
– Nutch Indexing
– Page Rank

• Machine Learning
– Bayesian Classification
– K-Means Clustering

• HDFS
– Enhanced DFSIO

See our paper “The HiBench Suite: Characterization of the MapReduce-Based Data Analysis” in ICDE’10 workshops (WISS’10)

1. Different from GridMix, SWIM?
2. Micro Benchmark?
3. Isolated components?
4. End-2-end Benchmark?
5. We need ETL-Recommendation Pipeline

Page 3: ETL-Recommendation (hammer)

[Figure: pipeline diagram; only its labels survive in the transcript]

• Stages: ETL, Pref, CF, Test
• Platform labels: HIVE-Hadoop Cluster (Data Warehouse), TPC-DS, Mahout
• Data labels: Sales tables, Sales updates, log table (h1, h2, h24; ip, agent, Retcode, cookies, WP), Cookies updates, Sales preferences, Browsing preferences, User-item preferences, Item-item similarity matrix, Test data, Statistics & Measurements
• Task labels: ETL-sales, ETL-logs, Pref-sales, Pref-logs, Pref-comb, Item based Collaborative Filtering, Offline test

Page 4: ETL-Recommendation (hammer)

Task Dependences

[Figure: task dependency graph; only its node labels survive in the transcript]

• ETL-sales
• ETL-logs
• Pref-sales
• Pref-logs
• Pref-comb
• Item based Collaborative Filtering
• Offline test
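The arrows of the task dependency graph are not preserved in this transcript. The sketch below shows one way such dependences could be turned into an execution order. The edges are an assumption inferred from the task names and from the ETL, Pref, CF, Test stage labels on the pipeline slide; they are not taken from the HiBench etl-recomm branch. The function name `execution_waves` is hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# ASSUMED edges: inferred from task names, not read from the original figure.
# Each key maps a task to the set of tasks it depends on.
DEPENDS_ON = {
    "ETL-sales": set(),
    "ETL-logs": set(),
    "Pref-sales": {"ETL-sales"},
    "Pref-logs": {"ETL-logs"},
    "Pref-comb": {"Pref-sales", "Pref-logs"},
    "Item based Collaborative Filtering": {"Pref-comb"},
    "Offline test": {"Item based Collaborative Filtering"},
}


def execution_waves(depends_on):
    """Group tasks into waves; every task in a wave has all dependencies done."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(DEPENDS_ON), start=1):
        print(number, wave)
```

Under these assumed edges, the two ETL tasks could run side by side, as could the two Pref tasks, while Pref-comb, collaborative filtering, and the offline test would run one after another.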

Page 5: Empirical Data (hammer)

Intel Xeon E5-2600 @ 2.2 GHz, Sandy Bridge

2 x 8 x HT = 32 cores

192 GB memory, WD 7200 disks, 0.3 x 12 x 4 = 14.4 TB

1000M network, 300M~400M/s

4-node cluster, RHEL 6.2, CDH 4.1.2

HiBench etl-recomm branch, HiTune-0.9

Sales ~14 GB (TPC-DS scale 100), logs ~105 GB
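The shorthand on this slide can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the reading of each factor (2 sockets, 8 cores per socket, 2 hardware threads per core with Hyper-Threading, 0.3 TB per disk, 12 disks per node, 4 nodes) is an interpretation of the slide, and all constant and function names are hypothetical.

```python
# ASSUMED meaning of each factor in "2 x 8 x HT = 32 cores".
SOCKETS = 2
CORES_PER_SOCKET = 8
THREADS_PER_CORE = 2  # Hyper-Threading (HT) exposes two threads per core

# ASSUMED meaning of each factor in "0.3x12x4=14.4T".
DISK_SIZE_GB = 300    # 0.3 TB per disk, kept in GB to stay in integers
DISKS_PER_NODE = 12
NODES = 4

# Data sizes as stated on the slide, in GB.
SALES_GB = 14
LOGS_GB = 105


def logical_cores():
    """Logical cores implied by sockets x cores x hardware threads."""
    return SOCKETS * CORES_PER_SOCKET * THREADS_PER_CORE


def raw_disk_capacity_tb():
    """Raw disk capacity across the cluster, in TB."""
    return DISK_SIZE_GB * DISKS_PER_NODE * NODES / 1000


def logs_to_sales_ratio():
    """How many times larger the log data is than the sales data."""
    return LOGS_GB / SALES_GB


if __name__ == "__main__":
    print("logical cores:", logical_cores())
    print("raw disk capacity (TB):", raw_disk_capacity_tb())
    print("logs / sales:", logs_to_sales_ratio())
```

Both products agree with the totals on the slide (32 and 14.4), and the log data is 7.5 times the size of the sales data.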

Page 6: Empirical Data (hammer)

Page 7: Empirical Data (hammer)

Page 8: LinkBench

• Benchmark for Social Graph Service

• Originally Developed by Facebook on Top of MySQL
– Simulates social graph workloads similar to Facebook’s online service
– Key workload properties match Facebook’s real production workload

• Different from Analytical Workloads

• Our Work
– Port LinkBench to HBase
– On top of Phoenix (SQL support over HBase)
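To illustrate why a social graph service differs from an analytical workload, here is a toy in-memory link store. It is a sketch only: it is not LinkBench's code, schema, or API, and it does not use MySQL, HBase, or Phoenix. The class `ToyLinkStore` and all of its method names are hypothetical. It shows the general shape of such a workload: many small point writes and short, ordered reads keyed by a source node, rather than full scans over large tables.

```python
class ToyLinkStore:
    """Hypothetical in-memory store of typed, timestamped edges."""

    def __init__(self):
        # (id1, link_type) -> {id2: timestamp}
        self._links = {}

    def add_link(self, id1, link_type, id2, timestamp):
        """Point write: insert or update one edge."""
        self._links.setdefault((id1, link_type), {})[id2] = timestamp

    def delete_link(self, id1, link_type, id2):
        """Point write: remove one edge; returns True if it existed."""
        edges = self._links.get((id1, link_type), {})
        return edges.pop(id2, None) is not None

    def count_links(self, id1, link_type):
        """Point read: number of edges of one type leaving id1."""
        return len(self._links.get((id1, link_type), {}))

    def get_link_list(self, id1, link_type, limit=10):
        """Short range read: newest edges of one type leaving id1."""
        edges = self._links.get((id1, link_type), {})
        newest_first = sorted(edges.items(), key=lambda kv: kv[1], reverse=True)
        return [id2 for id2, _ in newest_first[:limit]]


if __name__ == "__main__":
    store = ToyLinkStore()
    store.add_link(1, "friend", 2, timestamp=100)
    store.add_link(1, "friend", 3, timestamp=200)
    print(store.count_links(1, "friend"))
    print(store.get_link_list(1, "friend"))
```

Each call touches the edges of a single node, which is the access pattern a low-latency store has to serve, in contrast to the batch jobs in the HiBench workloads.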

Page 9: Resources

• HiBench
– https://github.com/intel-hadoop/HiBench

• HiBench ETL-Recomm Branch
– https://github.com/intel-hadoop/HiBench/tree/etl-recomm

• LinkBench
– https://github.com/intel-hadoop/linkbench

• HiTune
– https://github.com/intel-hadoop/HiTune

• Phoenix
– https://github.com/intel-hadoop/phoenix