![Page 1: Experience with HiBench From Micro-Benchmarks toward End-to-End Pipelines WBDB 2013 Workshop Presentation Lan Yi lan.yi@intel.com Senior Software Engineer](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649e9e5503460f94ba0891/html5/thumbnails/1.jpg)
Experience with HiBench
From Micro-Benchmarks toward End-to-End Pipelines
WBDB 2013 Workshop Presentation
Senior Software Engineer, Intel China Software Center
2013.07.16
![Page 2: Experience with HiBench From Micro-Benchmarks toward End-to-End Pipelines WBDB 2013 Workshop Presentation Lan Yi lan.yi@intel.com Senior Software Engineer](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649e9e5503460f94ba0891/html5/thumbnails/2.jpg)
HiBench
Micro Benchmarks
– Sort
– WordCount
– TeraSort
Web Search
– Nutch Indexing
– Page Rank
Machine Learning
– Bayesian Classification
– K-Means Clustering
HDFS
– Enhanced DFSIO
See our paper “The HiBench Suite: Characterization of the MapReduce-Based Data Analysis” in ICDE’10 workshops (WISS’10)
1. Different from GridMix, SWIM?
2. Micro Benchmark?
3. Isolated components?
4. End-2-end Benchmark?
5. We need ETL-Recommendation Pipeline
![Page 3: Experience with HiBench From Micro-Benchmarks toward End-to-End Pipelines WBDB 2013 Workshop Presentation Lan Yi lan.yi@intel.com Senior Software Engineer](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649e9e5503460f94ba0891/html5/thumbnails/3.jpg)
ETL-Recommendation (hammer)
[Figure: ETL-Recommendation pipeline on a HIVE-Hadoop Cluster (Data Warehouse). Labels recoverable from the diagram:]
– Inputs: Sales tables with Sales updates (TPC-DS); log table (ip, agent, Retcode, cookies, WP) with updates h1, h2, …, h24 and Cookies updates
– ETL: ETL-sales, ETL-logs
– Pref: Pref-sales (Sales preferences), Pref-logs (Browsing preferences), Pref-comb (User-item preferences)
– Item based Collaborative Filtering (Mahout): Item-item similarity matrix
– TestCF: Offline test with Test data; Statistics & Measurements
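To make the Pref-comb and Item based Collaborative Filtering stages more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the merge weights, the function names, and the choice of cosine similarity are hypothetical and do not come from the slides, which run these stages on Hive and Mahout.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical weights for merging the two preference sources (not from the slides).
SALES_WEIGHT = 1.0
BROWSING_WEIGHT = 0.5


def combine_preferences(sales_prefs, browsing_prefs):
    """Merge two (user, item) -> score maps into one user-item preference map."""
    combined = defaultdict(float)
    for key, score in sales_prefs.items():
        combined[key] += SALES_WEIGHT * score
    for key, score in browsing_prefs.items():
        combined[key] += BROWSING_WEIGHT * score
    return dict(combined)


def item_item_similarity(prefs):
    """Cosine similarity between items, computed over co-rating users.

    Cosine is an assumed similarity measure chosen for illustration.
    """
    by_user = defaultdict(dict)
    for (user, item), score in prefs.items():
        by_user[user][item] = score

    dot = defaultdict(float)
    norm_sq = defaultdict(float)
    for items in by_user.values():
        for a, score_a in items.items():
            norm_sq[a] += score_a * score_a
            for b, score_b in items.items():
                if a < b:  # store each unordered pair once
                    dot[(a, b)] += score_a * score_b

    return {
        (a, b): value / (sqrt(norm_sq[a]) * sqrt(norm_sq[b]))
        for (a, b), value in dot.items()
        if norm_sq[a] > 0 and norm_sq[b] > 0
    }


# Toy data standing in for the Pref-sales and Pref-logs outputs.
sales = {("u1", "book"): 1.0, ("u2", "book"): 1.0, ("u2", "lamp"): 1.0}
browsing = {("u1", "lamp"): 2.0, ("u3", "lamp"): 2.0}
similarity = item_item_similarity(combine_preferences(sales, browsing))
```

The sketch keeps the similarity matrix sparse by storing only item pairs that share at least one user, which is the usual reason item-based approaches scale to large catalogs.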
![Page 4: Experience with HiBench From Micro-Benchmarks toward End-to-End Pipelines WBDB 2013 Workshop Presentation Lan Yi lan.yi@intel.com Senior Software Engineer](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649e9e5503460f94ba0891/html5/thumbnails/4.jpg)
ETL-Recommendation (hammer)
Task Dependences
[Figure: task dependency graph over ETL-sales, ETL-logs, Pref-sales, Pref-logs, Pref-comb, Item based Collaborative Filtering, and Offline test]
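The task names on this slide can be arranged into a dependency graph and scheduled in waves. The sketch below is an assumption-laden illustration: the edges are inferred from the task names and the pipeline diagram, not read from the slide, and `execution_waves` is a hypothetical helper built on Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Assumed edges (task -> tasks it waits for); the slide's arrows are not
# recoverable from the transcript, so this mapping is a guess.
DEPENDS_ON = {
    "ETL-sales": set(),
    "ETL-logs": set(),
    "Pref-sales": {"ETL-sales"},
    "Pref-logs": {"ETL-logs"},
    "Pref-comb": {"Pref-sales", "Pref-logs"},
    "Item based Collaborative Filtering": {"Pref-comb"},
    "Offline test": {"Item based Collaborative Filtering"},
}


def execution_waves(depends_on):
    """Group tasks into waves; every task in a wave has all dependencies done."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(DEPENDS_ON)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Under these assumed edges, the two ETL tasks can run concurrently, as can the two Pref tasks, while the remaining stages are strictly sequential.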
![Page 5: Experience with HiBench From Micro-Benchmarks toward End-to-End Pipelines WBDB 2013 Workshop Presentation Lan Yi lan.yi@intel.com Senior Software Engineer](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649e9e5503460f94ba0891/html5/thumbnails/5.jpg)
Empirical Data (hammer)
Intel Xeon E5-2600 @ 2.2 GHz, Sandy Bridge
2 sockets x 8 cores x HT = 32 logical cores
192 GB memory, WD 7200 rpm disks, 0.3 x 12 x 4 = 14.4 TB
1000M net, 300M~400M/s
4-node cluster, RHEL 6.2, CDH 4.1.2
HiBench etl-recomm branch, HiTune-0.9
Sales ~14 GB (TPC-DS scale 100), logs ~105 GB
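The core and disk figures above are simple products, which the following Python sketch spells out. The reading of each factor is an assumption, not something the slide states: 2 sockets times 8 cores times 2 hardware threads per node, and 0.3 TB per disk times 12 disks per node times 4 nodes.

```python
# Assumed interpretation of "2 x 8 x HT = 32 cores" (per node).
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 8
THREADS_PER_CORE = 2  # Hyper-Threading

# Assumed interpretation of "0.3x12x4=14.4T".
DISK_SIZE_GB = 300  # 0.3 TB per disk, kept in GB to avoid float rounding
DISKS_PER_NODE = 12
NODES = 4

# Input sizes quoted on the slide (approximate).
SALES_GB = 14
LOGS_GB = 105

logical_cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET * THREADS_PER_CORE
raw_disk_tb = DISK_SIZE_GB * DISKS_PER_NODE * NODES / 1000
input_gb = SALES_GB + LOGS_GB

print(f"logical cores per node: {logical_cores_per_node}")
print(f"raw disk capacity: {raw_disk_tb} TB")
print(f"total input data: ~{input_gb} GB")
```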
![Page 6: Experience with HiBench From Micro-Benchmarks toward End-to-End Pipelines WBDB 2013 Workshop Presentation Lan Yi lan.yi@intel.com Senior Software Engineer](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649e9e5503460f94ba0891/html5/thumbnails/6.jpg)
Empirical Data (hammer)
![Page 7: Experience with HiBench From Micro-Benchmarks toward End-to-End Pipelines WBDB 2013 Workshop Presentation Lan Yi lan.yi@intel.com Senior Software Engineer](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649e9e5503460f94ba0891/html5/thumbnails/7.jpg)
Empirical Data (hammer)
![Page 8: Experience with HiBench From Micro-Benchmarks toward End-to-End Pipelines WBDB 2013 Workshop Presentation Lan Yi lan.yi@intel.com Senior Software Engineer](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649e9e5503460f94ba0891/html5/thumbnails/8.jpg)
LinkBench
• Benchmark for Social Graph Service
• Originally Developed by Facebook on Top of MySQL
– Simulate social graph workloads similar to Facebook’s online service
– Key workload properties match Facebook’s real production workload
• Different from Analytical Workloads
• Our Work
– Port LinkBench to HBase
– On top of Phoenix (SQL support over HBase)
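To illustrate how a social graph serving workload differs from an analytical one, here is a small in-memory Python sketch of the kind of point operations such a benchmark issues: add a link, count links, and list the most recent links of a node. Everything here is hypothetical: `TinyGraphStore` and its methods are illustrative names, not the LinkBench Java API and not the HBase or Phoenix schema used in the port.

```python
class TinyGraphStore:
    """Toy link store keyed by (source id, link type). Illustration only."""

    def __init__(self):
        # (id1, link_type) -> {id2: (timestamp, payload)}
        self._links = {}

    def add_link(self, id1, link_type, id2, timestamp, payload=""):
        """Insert or overwrite the link id1 -> id2 of the given type."""
        self._links.setdefault((id1, link_type), {})[id2] = (timestamp, payload)

    def delete_link(self, id1, link_type, id2):
        """Remove a link; return True if it existed."""
        return self._links.get((id1, link_type), {}).pop(id2, None) is not None

    def count_links(self, id1, link_type):
        """Number of outgoing links of one type from id1."""
        return len(self._links.get((id1, link_type), {}))

    def get_link_list(self, id1, link_type, limit=10):
        """Target ids of the most recent links, newest first."""
        links = self._links.get((id1, link_type), {})
        newest_first = sorted(links.items(), key=lambda kv: kv[1][0], reverse=True)
        return [id2 for id2, _ in newest_first[:limit]]


store = TinyGraphStore()
store.add_link(1, "friend", 2, timestamp=100)
store.add_link(1, "friend", 3, timestamp=200)
store.add_link(1, "like", 9, timestamp=150)

friend_count = store.count_links(1, "friend")
recent_friends = store.get_link_list(1, "friend")
```

Each call touches a handful of rows for one node, in contrast to the full-table scans and joins that dominate the ETL and recommendation stages.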
![Page 9: Experience with HiBench From Micro-Benchmarks toward End-to-End Pipelines WBDB 2013 Workshop Presentation Lan Yi lan.yi@intel.com Senior Software Engineer](https://reader036.vdocuments.us/reader036/viewer/2022062423/56649e9e5503460f94ba0891/html5/thumbnails/9.jpg)
Resources
• HiBench
– https://github.com/intel-hadoop/HiBench
• HiBench ETL-Recomm Branch
– https://github.com/intel-hadoop/HiBench/tree/etl-recomm
• LinkBench
– https://github.com/intel-hadoop/linkbench
• HiTune
– https://github.com/intel-hadoop/HiTune
• Phoenix
– https://github.com/intel-hadoop/phoenix