Institute of Computing Technology
How to Use BigDataBench 4.0
Jianfeng Zhan, Chen Zheng, and Wanling Gao
http://prof.ict.ac.cn
ICT, Chinese Academy of Sciences
ASPLOS 2018, Williamsburg, VA, USA
BigDataBench ASPLOS2018
General Steps to Use BigDataBench
• Current release
    • Version 4.0 on http://prof.ict.ac.cn
• General steps to run the benchmarks
    • Prepare the BigDataBench package
    • Prepare the environment of the selected software stack
    • Generate data sets as needed
        • Each benchmark directory contains a genData* or prepare* shell script
    • Run the scripts or commands (see the User Manual)
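The data-generation step can be scripted. The sketch below is a minimal Python example, not part of BigDataBench: it lists the genData* and prepare* shell scripts found in one benchmark directory. The function name `find_data_scripts` and the directory passed to it are hypothetical.

```python
from pathlib import Path


def find_data_scripts(benchmark_dir):
    """Return sorted names of genData* and prepare* files in benchmark_dir."""
    root = Path(benchmark_dir)
    names = set()
    # The slide names two prefixes: genData* and prepare*.
    for pattern in ("genData*", "prepare*"):
        for path in root.glob(pattern):
            if path.is_file():
                names.add(path.name)
    return sorted(names)


if __name__ == "__main__":
    # Hypothetical location; replace "." with a real benchmark directory.
    for name in find_data_scripts("."):
        print(name)
```

Run it from, or point it at, the directory of the benchmark you intend to execute.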
Directory Structure
Root directory
    • MicroBenchmark
    • ComponentBenchmark
    • Data Generator (BDGS)

Workload categories and software stacks:
    • AI: TensorFlow, Caffe2
    • Offline analytics: Hadoop, Spark, Flink, MPI
    • Graph analytics: Hadoop, Spark, Flink, GraphLab, MPI
    • NoSQL: HBase, MongoDB
    • Online service: Xapian
    • Data warehouse: Hive, SparkSQL, Impala
    • Streaming: Spark Streaming, JStorm
BDGS - Text
• Text_datagen
    • Wikipedia generator: 3 trained models
        • lda_wiki1w, wiki_1w5, wiki_noSW_90_Sampling
    • Amazon movie review generator: 2 models
        • amazonMR1, AMR1_noSW_95_Sampling
    • Use “gen_text_data.sh”
        • Wiki example arguments: lda_wiki1w, 10, 100, 10000
        • Amazon example arguments: amazonMR1, 10, 100, 10000
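A small wrapper can guard against misspelled model names before the generator is launched. The sketch below is a hypothetical Python helper, not part of BDGS. It assumes that gen_text_data.sh is run with `sh` and takes the model name followed by the three numeric values in the order the slide shows them. It does not assign meanings to those values or cover any further arguments the script may need, so check the user manual for the exact usage.

```python
# Model names as listed on the slide.
WIKI_MODELS = ("lda_wiki1w", "wiki_1w5", "wiki_noSW_90_Sampling")
AMAZON_MODELS = ("amazonMR1", "AMR1_noSW_95_Sampling")


def text_gen_command(model, values=(10, 100, 10000)):
    """Build an argument list for gen_text_data.sh (argument order assumed)."""
    if model not in WIKI_MODELS + AMAZON_MODELS:
        raise ValueError(f"unknown model: {model}")
    return ["sh", "gen_text_data.sh", model] + [str(v) for v in values]


if __name__ == "__main__":
    print(" ".join(text_gen_command("lda_wiki1w")))
    print(" ".join(text_gen_command("amazonMR1")))
```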
BDGS - Graph
• Graph_datagen
    • Kronecker model
        • Weighted graph
        • Un-weighted graph
    • Example Kronecker model parameter: vertices = 2^16
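To make the vertex parameter concrete: a Kronecker generator that starts from a 2x2 initiator matrix and applies k Kronecker steps yields a graph with 2^k vertices, so 2^16 vertices corresponds to k = 16. The Python sketch below samples edges in that style, choosing one quadrant per step (R-MAT-style sampling, a common fast approximation). The initiator probabilities, the edge count, and the function name are illustrative assumptions, not the values or code that BDGS uses.

```python
import random


def kronecker_edges(k, num_edges, initiator=(0.57, 0.19, 0.19, 0.05), seed=0):
    """Sample directed edges of a Kronecker-style graph with 2**k vertices.

    initiator holds the probabilities of the four quadrants
    (top-left, top-right, bottom-left, bottom-right) and should sum to 1.
    """
    rng = random.Random(seed)
    quadrants = [(0, 0), (0, 1), (1, 0), (1, 1)]
    edges = []
    for _ in range(num_edges):
        src = dst = 0
        for _ in range(k):
            # Each step picks one quadrant and appends one bit to each endpoint.
            row, col = rng.choices(quadrants, weights=initiator)[0]
            src = (src << 1) | row
            dst = (dst << 1) | col
        edges.append((src, dst))
    return edges


if __name__ == "__main__":
    edges = kronecker_edges(k=16, num_edges=5)
    print(len(edges), "edges sampled; vertex ids lie in [0,", 2 ** 16, ")")
```

A weighted variant could attach a random weight to each sampled edge; this sketch produces the un-weighted form only.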
BDGS - Table
• Table_datagen
    • E-commerce data generation
        • PDGF: uses XML configuration files for data description and distribution
    • Personal resume generation
Micro Benchmark
• Offline analytics & Graph analytics
• Streaming
Offline Analytics - RandSample
• Target: run the RandSample micro benchmark
• General steps:
    • Prepare the Hadoop environment
    • Prepare input data
        • Use the Wikipedia text data generator
    • ./run_RandSample.sh
        • hadoop jar RandSample.jar RandSample <input> <output> <sample_ratio>
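The sample ratio can be read as the probability with which each input record is kept. The sketch below illustrates that idea locally in plain Python, using Bernoulli sampling per line. It is an assumption about the semantics of <sample_ratio>, not the benchmark's MapReduce implementation, and the function name is hypothetical.

```python
import random


def rand_sample(lines, sample_ratio, seed=None):
    """Keep each line independently with probability sample_ratio."""
    if not 0.0 <= sample_ratio <= 1.0:
        raise ValueError("sample_ratio must be between 0 and 1")
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < sample_ratio]


if __name__ == "__main__":
    data = [f"record {i}" for i in range(1000)]
    sample = rand_sample(data, 0.1, seed=42)
    print(f"kept {len(sample)} of {len(data)} records")
```

Because each record is sampled independently, the output size varies around ratio times input size rather than matching it exactly.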
Offline Analytics – FFT example
• Target: run the “FFT” micro benchmark using Hadoop
• General steps:
    • Prepare the Hadoop environment
    • Prepare matrix data
        • cd /BigDataBench_V4.0_Hadoop/MicroBenchmark/OfflineAnalytics/FFT
        • sh genData_FFT.sh
          sh generate-matrix <mat_row> <mat_col> <sparsity>
    • Run FFT:
        • sh run_FFT.sh
          hadoop jar fft.jar org.fft.fft <inputfile> <outputfile1> <outputfile2> <log2_col>
          <log2_col>: auto-generated by run_FFT.sh
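The <log2_col> argument is the base-2 logarithm of the matrix column count, which run_FFT.sh derives automatically. As an illustration of that calculation only (the script's own logic is not shown on the slide), the hypothetical helper below computes it in Python. It rejects column counts that are not powers of two, a restriction that radix-2 FFT implementations commonly impose and that is assumed here.

```python
def log2_col(mat_col):
    """Return log2(mat_col), requiring mat_col to be a positive power of two."""
    if mat_col <= 0 or mat_col & (mat_col - 1):
        raise ValueError("column count must be a positive power of two")
    return mat_col.bit_length() - 1


if __name__ == "__main__":
    for cols in (8, 1024):
        print(cols, "->", log2_col(cols))
```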
Streaming – Grep example
• Target: run the grep benchmark using Spark Streaming
• General steps:
    • Prepare the Spark Streaming environment
    • cd /BigDataBench_V4.0_Streaming/MicroBenchmark/Streaming/Grep
    • ./run-sparkstreaming-grep.sh
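Conceptually, a streaming grep applies a pattern filter to each micro-batch of incoming lines. The plain-Python sketch below mimics that behaviour without Spark. The batch contents and names are hypothetical; the benchmark itself runs on Spark Streaming through run-sparkstreaming-grep.sh.

```python
import re


def grep_batches(batches, pattern):
    """Yield, for each batch of lines, the lines that match pattern."""
    regex = re.compile(pattern)
    for batch in batches:
        yield [line for line in batch if regex.search(line)]


if __name__ == "__main__":
    stream = [["error: disk full", "ok"], ["ok", "warn", "error: timeout"]]
    for matched in grep_batches(stream, r"error"):
        print(matched)
```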
Micro Benchmark
• AI
AI – Conv2d example
• Target: run the conv2d micro benchmark using TensorFlow
• General steps:
    • Prepare the TensorFlow environment
    • Prepare image data
    • Configure the image directory in conv2d.py
    • python conv2d.py
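For reference, the operation this micro benchmark exercises can be written out directly. The sketch below is a pure-Python, single-channel 2-D convolution with stride 1 and no padding, in the cross-correlation form that deep-learning frameworks use. It is illustrative only and is not the code in conv2d.py.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 2-D cross-correlation of two nested lists."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Sum of element-wise products over the kernel window.
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 0], [0, -1]]
    print(conv2d(image, kernel))  # prints [[-4, -4], [-4, -4]]
```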
Micro Benchmark
• NoSQL
NoSQL – Write example
• Target: run “write” operations using HBase
• General steps:
    • Prepare HBase according to the official guide
        • sh /hbase-0.94.5/bin/hbase shell
        • create 'usertable','f1','f2','f3'
    • Prepare YCSB as the workload generator
        • YCSB is in the directory BasicDatastoreOperations/ycsb-0.1.4
    • Run YCSB commands like this:
        • sh bin/ycsb load hbase -P workloads/workloadc -p threads=<thread-numbers> -p columnfamily=<family> -p recordcount=<recordcount-value> -p hosts=<hostip> -s > load.dat
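When the load command is launched from a script, assembling it from parameters helps avoid quoting mistakes. The Python sketch below builds the argument list using only the options shown in the command above. The function name and the example values are hypothetical, and redirecting the output to load.dat is left to the caller.

```python
def ycsb_load_command(threads, column_family, record_count, hosts,
                      workload="workloads/workloadc"):
    """Build the YCSB load command for HBase as an argument list."""
    properties = {
        "threads": threads,
        "columnfamily": column_family,
        "recordcount": record_count,
        "hosts": hosts,
    }
    cmd = ["sh", "bin/ycsb", "load", "hbase", "-P", workload]
    for key, value in properties.items():
        cmd += ["-p", f"{key}={value}"]
    cmd.append("-s")  # the -s flag, as in the command on the slide
    return cmd


if __name__ == "__main__":
    # Example values are placeholders, not recommended settings.
    print(" ".join(ycsb_load_command(4, "f1", 1000, "127.0.0.1")))
```

The column family passed here should be one of those created in the HBase shell step ('f1', 'f2', or 'f3').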
Component Benchmark
• AI
AI – Alexnet example
• Target: run the “Alexnet” component benchmark using TensorFlow
• General steps:
    • Prepare the TensorFlow environment
    • Run Alexnet:
        • cd /BigDataBench_V4.0_Tensorflow/ComponentBenchmark/AI/Alexnet
        • python alexnet_cifar10.py
        • Choose the CPU or GPU environment
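The slide does not say how the CPU or GPU environment is selected. One common approach with CUDA-based TensorFlow builds, shown here as an assumption rather than the benchmark's documented method, is to hide GPUs from the process through the CUDA_VISIBLE_DEVICES environment variable before launching the script. The helper name and the default GPU id are hypothetical.

```python
import os


def alexnet_env(use_gpu, gpu_ids="0", base_env=None):
    """Return an environment dict for launching alexnet_cifar10.py."""
    env = dict(os.environ if base_env is None else base_env)
    # An empty CUDA_VISIBLE_DEVICES hides all GPUs, so execution falls back to CPU.
    env["CUDA_VISIBLE_DEVICES"] = gpu_ids if use_gpu else ""
    return env


if __name__ == "__main__":
    env = alexnet_env(use_gpu=False)
    print("CUDA_VISIBLE_DEVICES =", repr(env["CUDA_VISIBLE_DEVICES"]))
    # To launch: subprocess.run(["python", "alexnet_cifar10.py"], env=env)
```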
Component Benchmark
• Offline analytics & Graph analytics
• Streaming
Offline Analytics – SIFT example
• Target: run the “SIFT” component benchmark using Hadoop
• General steps:
    • Prepare the Hadoop environment
    • Prepare SIFT data
        • cd /BigDataBench_V4.0_Hadoop/ComponentBenchmark/OfflineAnalytics/SIFT
        • Put the image data under the SIFT directory
        • sh genData_SIFT.sh
          hadoop jar $jarFile/hibImport.jar -h /testimage/out.hib
    • Run SIFT:
        • sh run_SIFT.sh
          hadoop jar sift.jar <out.hib> <outsif>
          <out.hib>: data generated by genData_SIFT.sh
          <outsif>: path where the result is saved
Streaming – Kmeans example
• Target: run the kmeans benchmark using Spark Streaming
• General steps:
    • Prepare the Spark Streaming environment
    • cd /BigDataBench_V4.0_Streaming/ComponentBenchmark/Streaming/Kmeans
    • ./run-sparkstreaming-kmeans.sh
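As a reminder of what the workload computes, the sketch below performs one k-means update on a single batch of 2-D points in plain Python: assign each point to its nearest centre, then move each centre to the mean of its points. It is a simplified illustration. The benchmark's Spark Streaming implementation, its data, and its parameters are not shown on the slide, and all names here are hypothetical.

```python
def kmeans_step(points, centers):
    """Run one assignment-and-update step of k-means on 2-D points."""
    sums = [[0.0, 0.0, 0] for _ in centers]  # x-sum, y-sum, count per centre
    for x, y in points:
        # Index of the nearest centre by squared Euclidean distance.
        nearest = min(
            range(len(centers)),
            key=lambda i: (x - centers[i][0]) ** 2 + (y - centers[i][1]) ** 2,
        )
        sums[nearest][0] += x
        sums[nearest][1] += y
        sums[nearest][2] += 1
    # Centres with no assigned points stay where they are.
    return [
        (sx / n, sy / n) if n else center
        for (sx, sy, n), center in zip(sums, centers)
    ]


if __name__ == "__main__":
    batch = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (12.0, 10.0)]
    print(kmeans_step(batch, [(1.0, 1.0), (9.0, 9.0)]))
    # prints [(0.0, 1.0), (11.0, 10.0)]
```

In a streaming setting the same step would be applied to each arriving batch, carrying the centres forward from one batch to the next.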
Graph Analytics – PageRank
• Target: run the “PageRank” component benchmark using Hadoop
• General steps:
    • Prepare the Hadoop environment
    • Run the data generation script
        • cd /BigDataBench_V4.0_Hadoop/ComponentBenchmark/GraphAnalytics/PageRank
        • sh genData_PageRank.sh
    • Run PageRank:
        • sh run_PageRank.sh
          hadoop jar pegasus.PagerankNaive <inputfile> pr_tempmv pr_output <Internation> <reducers> <1024> <makesym> <new>
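For orientation, naive PageRank is a power iteration over the edge list. The sketch below is a small in-memory Python version with a damping factor of 0.85 (an assumed, conventional value) and a fixed iteration count. It illustrates the algorithm only and is not the PEGASUS Hadoop implementation that run_PageRank.sh invokes.

```python
def pagerank(edges, num_nodes, iterations=20, damping=0.85):
    """Naive PageRank by power iteration over a list of (src, dst) edges."""
    out_degree = [0] * num_nodes
    for src, _ in edges:
        out_degree[src] += 1
    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(iterations):
        new_rank = [(1.0 - damping) / num_nodes] * num_nodes
        # Rank held by nodes without out-links is spread over all nodes.
        dangling = sum(r for r, d in zip(rank, out_degree) if d == 0)
        for i in range(num_nodes):
            new_rank[i] += damping * dangling / num_nodes
        for src, dst in edges:
            new_rank[dst] += damping * rank[src] / out_degree[src]
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = [(0, 1), (1, 2), (2, 0), (2, 1)]
    print([round(r, 3) for r in pagerank(graph, 3)])
```

The ranks sum to 1 after every iteration, which is a convenient sanity check when adapting the sketch.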
Online Service – Xapian
• Target: run searching using Xapian
• General steps:
    • 1) Install Xapian according to the user manual
        • ./build.sh to install harness (gcc version > 4.8)
        • xapian/build.sh to install Xapian
    • 2) Configuration
        • vim xapian/run_networked.sh
    • 3) Online searching
        • Run xapian/run_networked.sh
Component Benchmark
• Data warehouse
Data Warehouse – Select example
• Target: run the “Select” benchmark using Hive on Hadoop
• General steps:
    • Prepare the Hadoop and Hive environment
    • Run the data generation script
        • cd /BigDataBench_V4.0_Hadoop/ComponentBenchmark/Datawarehouse/Select/
        • sh genData_Select.sh
    • Run Select like this:
        • sh run_Select.sh
Conclusion
• Website: http://prof.ict.ac.cn
• Please refer to the user manual for more details.