IS-4011: Accelerating Analytics on HADOOP using OpenCL, by Zubin Dowlaty and Krishnaraj Gharpure
DESCRIPTION
Presentation IS-4011, Accelerating Analytics on HADOOP using OpenCL, by Zubin Dowlaty and Krishnaraj Gharpure at the AMD Developer Summit (APU13), November 11-13, 2013.
TRANSCRIPT
ACCELERATING ANALYTICS ON HADOOP USING OPENCL
ZUBIN DOWLATY
KRISHNARAJ GHARPURE
ROHIT SAREWAR
PRASANNA LALINGKAR
AUSTIN CV
MU SIGMA INC.
November 2013
| Accelerating Analytics on HADOOP using Open CL | NOVEMBER 19, 2013 | CONFIDENTIAL
AGENDA
High Level Trends in the Applied Analytics Arena
‒ Computational Enablement
‒ Usability & Visualization
‒ Intelligent Systems
Computational Enablement
‒ Big Data & Hadoop
‒ Reference Diagram
‒ OpenCL and Hadoop Research
THE INFORMATION EXPLOSION IS LEADING TO AN UNPRECEDENTED ABILITY TO STORE, MANAGE AND ANALYZE DATA; WE NEED TO RUN AHEAD..
Data Source Explosion
Expanding Applications
Technology Evolution
Advanced Analytical Techniques
Digital Data in 2011: ~ 1750 Exabytes
Facebook: 100+ million users per day
Mobile Phone Subscription: 4.6 billion
15 Million Bing recommendation records
RFID Tags: 40 billion
INFORMATION
CLOUD
Click-stream data
Purchase intent data
Social Media, Blogs
Email Archives
RFID & Geospatial Information
Events
Opinion Mining
Social Graph Targeting
Influencer Marketing
Offer Optimization
Clinical Data Analysis
Fraud Detection
SKU Rationalization
Massively large databases
Search Technologies
Complex Event Processing
Cloud Computing
Software as a Service
Interactive Visualization
In Memory Analytics
CART
Adaptive Regression Splines
Machine Learning
Radial Basis Functions
Support Vector Machines
Neural Networks Geospatial Predictive Modeling
Intra / Inter Firewall
THREE KEY MACRO ANALYTICS TRENDS TARGETING CONSUMPTION
Macro Trends: Agenda in All Three
Analytics Trends
Computational Enablement
– Scalable Analytics, Big Data & NoSQL technologies, GP-GPU, In-Memory & High Performance Computing
– Master Data Management, Cloud, and Federated Data – the end of the central EDW?
– Analytics as a Service and the Cloud
– Open Source Maturity
– Rise of Agile Self Service
– Data Model Disruption: ETL vs. ELT
Usability and Visualization
– Dashboards will Evolve into Cockpits
– Improved Interactive Data Visualization & Aesthetics
– Augmented / Virtual Reality / Natural User Interfaces
– Graph Analytics sparked by Social Data
– Discovery with Computational Geometry
– Mobile BI and the Integration of Consumer & Enterprise Devices
– Gamification: Do you want to play a game?
Intelligent Systems (“Anticipation Denotes Intelligence”)
‒ Operational Smart Systems are Back propagating – Pervasive Analytics
‒ Real Time Analytics and Event Streams – Beyond Batch..
‒ Artificial Intelligence (AI) is not what we expected
‒ Don’t forget the 4th tier, it’s Juicy
‒ Metaheuristics and the Ants
‒ Operational Analytics, You are the Exception
‒ Experiments in the Enterprise and the Quasi
‒ Just Say No to OLS
‒ Energy Informatics is Radiating
‒ Intelligent Search
“BIG DATA”, THE TERM, IS ABOUT APPLYING NEW TECHNOLOGIES TO MEET UNFULFILLED NEEDS: ANALYTICS AT SCALE…
What is Big Data?
Big Computation Map Reduce
BIG Computation & Big Data Scaling Economically
Intensive Computational Analysis not Supported by the Traditional Data Warehouse Stack and Architecture
BIG DATA’S RECENT ENTERPRISE POPULARITY CAN BE ATTRIBUTED TO TWO KEY FACTORS
Big Data Technology Evolution
2003 2004 2005 2006 2007 2008 2009 2010 2011
Apache: Hadoop project
Yahoo: 10K core cluster
IBM Acquires
Early Research Open source dev momentum
Initial success stories
Commercialization & Adoption
Cloudera named most promising startup funded in 2009
elastic map-reduce
Teradata buys Aster (map-reduce DB)
Google Trigger: Releases “Map Reduce” & “GFS” paper
Google: Bigtable paper
Lucene subproject
HP Acquires Vertica
EMC Acquires Greenplum
2012
Oracle, Microsoft
SAS
Technology Trigger
‒ Robust Open Source Technology for Parallel Computation on Commodity Hardware (Hadoop / NoSQL)
Frustration Trigger..
‒ Tyranny of the data model; cost and complexity of data integration; agility, impact, bureaucracy; difficulty and cost of scaling existing EDW technology
A MINDSET FOR ENABLING DECISION SCIENCES AT SCALE
DATASET -> TOOLSET -> SKILLSET -> MINDSET
Big Data – What is it?
Skillset
‒ Computer Scientist
‒ Map-Reduce, HDFS, HIVE, R, PIG, NoSQL, Unstructured Data
‒ Parallel computation & Stream Processing
‒ Data & Visual Explorer
‒ Math / Stats / Data Mining / Machine Learning
‒ Math & Technology & Business
‒ Data “Artisans” / Design Thinking
‒ Systems Theory, Generalizations
Mindset
‒ Automation, Scale, Agility
‒ Higher utilization of machines for decisions
‒ Learning vs Knowing
‒ Consumption Focused
‒ Algorithmic and Heuristic (Man & Machine)
‒ Intrapreneur, Curious, Persuasive, Soft Skills
THE BIG DATA ECO-SYSTEM IS A CHALLENGE TO KEEP UP WITH..
MINDSET: KNOWING VS. LEARNING..
The Big Data Ecosystem
”Formulating a right question is always hard, but with big data, it is an order of magnitude harder, because you are blazing the trail (not grazing on the green field).” source: Gartner
Source: The Big Data Group llc
TREND: DASHBOARDS WILL EVOLVE TO BE MORE ACTIONABLE AND FORWARD-LOOKING
Dashboredom?
http://www.information-management.com/issues/20_7/dashboards_data_management_quality_business_intelligence-10019101-1.html
“As we move from the information age into the intelligence age”..
Usability and Visualization: Evolution
OPEN SOURCE: D3 (DATA-DRIVEN DOCUMENTS) HTTP://D3JS.ORG/
1. Helps to bring data to life using HTML5, SVG and CSS.
2. It is an extension of “Protovis”, a very popular JavaScript library.
3. With minimal overhead, D3 is extremely fast, supporting large datasets and dynamic behavior for interaction and animation.
4. It provides many built-in reusable functions and function factories, such as graphical primitives for area, line and pie charts.
http://d3js.org/
Usability and Visualization
OPEN SOURCE:
R is the premier language for data scientists
‒ Strength in data visualizations..
‒ http://www.r-project.org/
Usability and Visualization
GRAPH NETWORK VISUALIZATION APPLIED TO METRIC DATA: CORRELATION NETWORKS
Usability and Visualization
Yippee ! 5 % discount ..
Yeah.. I think I need these headphones too
THE CURRENT BUSINESS SCENARIO DEMANDS AUTOMATING AND INJECTING INTELLIGENCE INTO VARIOUS ENTERPRISE TOUCH POINTS
Webpage
Add to Cart
Checkout
Yearly Bonus Let me buy a new laptop
Taxonomy, Associations, Nearest Neighbor Popup
Optimization, Need State Personalization
Collaborative Filtering
Recommender Systems
Offer Optimization
Behind the Scenes
Marketing
Mix
CRM
Revenue
Management
Hey ! I have some good deals being provided here
Wow ! Great offers.. Since I have some more money, let me have a look
Customer Segmentation
Propensity Score
Attribution Methods to Optimize Media Spend
Pricing Elasticity
Add Intelligent Analytics Everywhere You Can to Unlock The Value In Information
Search Buy laptop
WE ARE IN THE ‘INFORMATION AGE’ AND IT’S TIME THAT DATA AND HUMAN ANALYTICS CAPABILITIES ARE AUGMENTED WITH MACHINE INTELLIGENCE
Intelligent Systems: Next Wave of Innovation and Change (Iron Man Suit)
Data Mining Finally Meets the Operation.. Powered by Big Data Scale & Value
Convergence of the global industrial system with the power of advanced computing, analytics, low-cost sensing and new levels of connectivity permitted by the Internet
* Source: GE (Industrial Internet – Pushing the Boundaries of Minds and Machines)
Intelligent Machines
Advanced Analytics
People At Work
CURRENT NEWS ON INTELLIGENT SYSTEMS
CEP Vendors Acquired June 2013: Streambase (Tibco) and Progress Apama (Software AG)
USA Today, September 16, 2012
– http://www.usatoday.com/money/business/story/2012/09/16/review-automate-this-probes-math-in-global-trading/57782600/1
Two Second Advantage: Published 2011
– Wayne Gretzky’s edge: he predicted, a little better than anyone else, where the hockey puck would be
– Enterprise 2.0: Every transaction became a bit of digital data
– Enterprise 3.0: Every EVENT can become a bit of digital data
– There will always be value in making projections (days, months, years out) with predictive analytics, data mining, and analytics systems built on historical data: batch systems
– But in today’s world, enterprises need something more: an instantaneous, proactive, predictive capability, i.e. real-time systems
GE Minds and Machines: November 2012
» Industrial Internet, Intelligent Systems, and Intelligent Machines
» http://www.ge.com/docs/chapters/Industrial_Internet.pdf
» http://www.youtube.com/watch?v=SvI3Pmv-DhE
AGENDA
High Level Trends in the Applied Analytics Arena
‒ Computational Enablement
‒ Usability & Visualization
‒ Intelligent Systems
Computational Enablement
‒ Big Data & Hadoop
‒ OpenCL and Hadoop Research
THE DIVERSE WORLD OF HADOOP
Self healing, distributed storage on commodity hardware: inexpensive storage
+
Fault tolerant distributed processing: an abstraction for parallel computing
Manage complex data – relational and non relational – in a single repository
“Data Reservoir/Lake”
Store massive data sets, scale horizontally with inexpensive commodity nodes
Process at source – eliminate data movement
Complements your existing relational database technologies that excel at structured data manipulation
**NEWS FLASH**
Hadoop as the data “OS”: Hadoop 2.0 and YARN supporting other parallel frameworks such as MPI (SAS HPC announcement)
High Speed SQL (STRATA 2013): Impala, Dremel, SHARK, Hadapt
ETL & EDW Disruption. GigaOM: Think Bigger.. Schema on Read, ELT
Maintenance: Zookeeper, Puppet
Data management and analytics: HBASE, Pig, Hive, Rhipe, Rhadoop, Mahout
Monitoring: Nagios, Ganglia, Splunk
Computational Enablement
THE HPC LANDSCAPE IS VAST AND CONTINUOUS RESEARCH TO UNDERSTAND THE STRENGTHS AND WEAKNESSES OF DIFFERENT COMPONENTS IS THE KEY
Techniques for writing parallel programs
Already in use by Mu Sigma
Should be explored
Legend
Not necessary in current context
Techniques supporting parallelization on the infrastructure
• Size indicates our comfort level with the technique/tool
Computation Driven
Data Driven
Shared Memory
HAMA MPI
GPGPU
Map-Reduce
STORM
Distributed Memory
JumboMem
SPARK
OpenMP
Agent, Actor Model
Intel TBBs
HPCC
HADOOP 2.0: “THE OS FOR DATA”
Source: HortonWorks
MU SIGMA IMPLEMENTED A BIG DATA SOLUTION THAT REDUCED SCORING TIME FROM 18 HOURS TO 2 HOURS..
Marketer: Process to score a list was taking 18 hours…
Case study: pro CRM client
Data sourcing -> Data management -> Data processing
Load scored customer data in the specified format back to EDW
INGEST
Big Data Infrastructure
Score all customers based on Prioritization logic & produce output in specified format
Data Consumption
Data Scientists adding propensity scores,segmentation codes, LTV, etc..
GLM on Map-Reduce, Attribution modeling on clickstream data
IP Creative 1 Creative 2 Creative 3 Purchase (1/0)
E-Commerce sites are constantly collecting clickstream data, of the order of terabytes, on various attributes such as
‒ users,
‒ the creatives they’ve viewed,
‒ their transactions, etc.
Frequently, when an online customer makes a purchase, the sale is attributed to the most recent creative
However, this inferred causality is not necessarily always correct
A previous creative or sequence of creatives might actually have triggered the customer’s decision to make a purchase
Background
Statistical techniques such as logistic regression can help
‒ better understand the contributions of different creatives towards purchases and
‒ identify key purchase-driving creatives
Hadoop can store and process large volumes of data and R is a powerful statistical programming language
Leveraging R to perform statistical analysis on large clickstream data stored in Hadoop using map-reduce is the natural next step
Least-squares regression, a crucial component of iterative methods used to solve logistic regression problems, has been implemented here in R, using map-reduce
Solution
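Since the deck describes least-squares as a crucial component of the iterative methods used for logistic regression, a minimal single-machine sketch of one such method, IRLS (iteratively reweighted least squares), may help. The class name, data, and iteration count below are illustrative and not from the deck; in the deck's setting, each weighted least-squares step would run as a map-reduce job in R.

```java
public class IrlsLogistic {
    // One predictor plus intercept; each IRLS step solves a 2x2 weighted
    // least-squares system: beta += (X^T W X)^{-1} X^T (y - p).
    public static double[] fit(double[] x, double[] y, int iters) {
        double b0 = 0, b1 = 0;
        for (int it = 0; it < iters; it++) {
            double a00 = 0, a01 = 0, a11 = 0, g0 = 0, g1 = 0;
            for (int i = 0; i < x.length; i++) {
                double p = 1.0 / (1.0 + Math.exp(-(b0 + b1 * x[i])));
                double w = p * (1 - p);          // IRLS weight p(1-p)
                a00 += w; a01 += w * x[i]; a11 += w * x[i] * x[i];
                g0 += y[i] - p; g1 += (y[i] - p) * x[i];
            }
            double det = a00 * a11 - a01 * a01;  // invert the 2x2 system
            b0 += ( a11 * g0 - a01 * g1) / det;
            b1 += (-a01 * g0 + a00 * g1) / det;
        }
        return new double[]{b0, b1};
    }

    public static double predict(double[] beta, double x) {
        return 1.0 / (1.0 + Math.exp(-(beta[0] + beta[1] * x)));
    }
}
```

On a small non-separable sample the iteration converges in a handful of steps; the XᵀWX and Xᵀ(y − p) accumulations are exactly the block-wise sums that map well onto mappers and a summing reducer.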
w.x.y.z   3:15 AM   -         3:11 AM   1
a.b.c.d   -         9:54 PM   5:12 PM   0
e.f.g.h   5:49 AM   6:16 AM   -         1
Application: Machine Logs
SPEEDING UP HADOOP WITH THE GPU
Hadoop jobs are designed to run in batch, with data being read off disks and streamed into ‘mappers’ and ‘reducers’
‒ Execution times of Hadoop jobs can be viewed in terms of a fixed overhead, a variable overhead (dependent mainly on data size) and the processing time, which depends on the complexity of computations and the size of data
‒ For a given size of data, the most computationally trivial job in Hadoop (an identity job) is usually measurable in tens of seconds
The testing objective was to measure the speedups achievable by using GPUs within MapReduce for relatively complex data processing
DOES THE GPU HELP?
[Diagram: job execution time decomposed into a fixed overhead, a variable overhead that grows with data size, and a processing component that grows with processing complexity]
SETTING IT UP AND MAKING IT WORK
Server used, with the details below
Since mappers/reducers run in parallel, an ideal Hadoop cluster with a GPU per node was simulated by running mappers/reducers on one machine with one GPU
It was found that
‒ Hadoop doesn’t recognize the GPU card
‒ Needs to access GUI services provided by the X-server to be able to do so*
‒ A few modifications to the user’s shell configurations were required to get around this
CONFIG AND TEETHING TROUBLES
* AMD is currently working on getting OpenCL working without the X-server
Details
CPU: 4-core Intel(R) Xeon(R) CPU E5-1603 0 @ 2.80GHz
RAM: 16 GB
GPU: AMD/ATI Radeon HD 7970 with 3GB GDDR5 Memory
Base software: Ubuntu 12.04.3 LTS, Java 1.6.0_31
Hadoop + ecosystem: Hadoop 2.0.0-cdh4.4.0, MapReduce 0.20
OpenCL + ecosystem: OpenCL 1.2, JOCL
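The deck does not spell out the shell modifications it mentions; on AMD's Linux driver stack of that era the usual workaround was to point the OpenCL runtime at the running X server, along these lines (an assumption, environment-specific):

```shell
# Hypothetical ~/.bashrc additions for the Hadoop user, so that OpenCL can
# reach the X-server-backed GPU when jobs run without a desktop session.
export DISPLAY=:0      # point at the X server that owns the GPU
export COMPUTE=:0      # fglrx-era hint used by AMD's OpenCL runtime
```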
EXPERIMENTAL DETAILS: THE USE-CASE

Linear regression was written in Java MapReduce with and without a GPU
‒ Uses N rows and (M + 2)* columns of data to fit the expected value of the dependent variable as a linear combination of a set of M predictor variables, such that for the i-th row
  E[y_i] = β0 + Σ_{j=1}^{M} β_j x_{i,j}
‒ Regression coefficients are computed by solving the Normal Equations
  β = (XᵀX)⁻¹(Xᵀy), where X is a matrix of dimension N × (M + 1) and y is a column vector of length N

Data in Hadoop is distributed across multiple nodes into B blocks that are mutually exclusive and collectively exhaustive (notwithstanding Hadoop’s replication factor)
‒ The X_bkᵀX_bk and X_bkᵀy_bk (**) terms are computed and appended in mappers for each input block
‒ The augmented terms output from each of the mappers are summed, and the inversion and multiplication of terms are performed in the reducer

Using GPUs, each 2D input*** to mappers is flattened by row and passed as a 1D array to a GPU kernel function
‒ Mappers compute dot products of all sub-arrays formed by elements e_h such that h mod (M + 2) = l, for l from 0 to M + 1

* The dependent variable, the M predictor variables and a column of 1s used to compute the intercept coefficient, β0
** b_k for k from 1 to B, indicating the blocks of data that constitute the input data
*** 1D in case data is processed 1 row at a time
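The block-wise normal-equations scheme above can be sketched in plain Java (no Hadoop dependency; class and method names are illustrative): each simulated mapper emits its block's XᵀX and Xᵀy, and the reducer sums them and solves for β.

```java
// Illustrative sketch: block-wise normal equations for OLS, mirroring the
// mapper/reducer split described above. Each "mapper" computes X_bk^T X_bk
// and X_bk^T y_bk for its block; the "reducer" sums them and solves for beta.
public class BlockwiseOls {

    // One mapper's contribution: partial X^T X of size (M+1)x(M+1).
    static double[][] partialXtX(double[][] block, int m) {
        double[][] xtx = new double[m + 1][m + 1];
        for (double[] row : block)                      // row = [1, x1..xM, y]
            for (int i = 0; i <= m; i++)
                for (int j = 0; j <= m; j++)
                    xtx[i][j] += row[i] * row[j];
        return xtx;
    }

    static double[] partialXty(double[][] block, int m) {
        double[] xty = new double[m + 1];
        for (double[] row : block)
            for (int i = 0; i <= m; i++)
                xty[i] += row[i] * row[m + 1];          // last column is y
        return xty;
    }

    // "Reducer": solve (X^T X) beta = X^T y by Gaussian elimination.
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        for (int p = 0; p < n; p++) {
            int max = p;                                // partial pivoting
            for (int r = p + 1; r < n; r++)
                if (Math.abs(a[r][p]) > Math.abs(a[max][p])) max = r;
            double[] t = a[p]; a[p] = a[max]; a[max] = t;
            double tb = b[p]; b[p] = b[max]; b[max] = tb;
            for (int r = p + 1; r < n; r++) {
                double f = a[r][p] / a[p][p];
                b[r] -= f * b[p];
                for (int c = p; c < n; c++) a[r][c] -= f * a[p][c];
            }
        }
        double[] beta = new double[n];
        for (int r = n - 1; r >= 0; r--) {              // back substitution
            double s = b[r];
            for (int c = r + 1; c < n; c++) s -= a[r][c] * beta[c];
            beta[r] = s / a[r][r];
        }
        return beta;
    }

    public static double[] fit(double[][][] blocks, int m) {
        double[][] xtx = new double[m + 1][m + 1];
        double[] xty = new double[m + 1];
        for (double[][] block : blocks) {               // sum mapper outputs
            double[][] p = partialXtX(block, m);
            double[] q = partialXty(block, m);
            for (int i = 0; i <= m; i++) {
                xty[i] += q[i];
                for (int j = 0; j <= m; j++) xtx[i][j] += p[i][j];
            }
        }
        return solve(xtx, xty);
    }
}
```

Fitting noise-free data generated from y = 1 + 2·x1 − 0.5·x2, split across two blocks, recovers β exactly regardless of how the rows are partitioned, which is why the block sums can be computed independently in mappers.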
USING THE GPU FROM MAPPERS
DATA MOVEMENT AND COMPUTATIONS
Computing the XᵀX term, for X with rows [1 1 2] and [1 3 4]:

Xᵀ          X            XᵀX
1 1         1 1 2        2  4  6
1 3    ×    1 3 4   =    4 10 14
2 4                      6 14 20
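The flattened-array indexing described above can be sketched as follows (plain Java, illustrative names): the 2D block is flattened row by row, and the (l1, l2) entry of XᵀX is the dot product of the sub-arrays of elements e_h with h mod numCols = l1 and h mod numCols = l2. Running it on X = [[1, 1, 2], [1, 3, 4]] reproduces the XᵀX shown above.

```java
// Illustrative sketch of the flattened 1D layout used by the GPU kernel:
// a 2D block is flattened row-by-row, and X^T X entries are accumulated as
// per-column dot products identified by h mod numCols.
public class FlattenedXtX {
    public static double[][] xtx(double[] flat, int numCols) {
        double[][] out = new double[numCols][numCols];
        for (int h = 0; h < flat.length; h++) {
            int col = h % numCols;              // column index of element e_h
            int rowStart = h - col;             // first element of e_h's row
            for (int c = 0; c < numCols; c++)
                out[col][c] += flat[h] * flat[rowStart + c];
        }
        return out;
    }
}
```

The double loop realizes the per-column dot products without ever reshaping the flattened buffer, which is what makes this layout convenient for a kernel working on a 1D array.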
MAPREDUCE ON CPU
Mapper class uses methods
‒ setup(): Called once at the beginning of the map task
‒ map(): Called for each (K,V) pair; creates a list of all the values in the input split
‒ cleanup(): Called once at the end of the map task; the OLS code is implemented in this method
Reducer class uses methods
‒ setup(): Called once at the beginning of the reduce task
‒ reduce(): Called for each key; the sum of intermediate matrices is computed in this method
‒ cleanup(): Called once at the end of the reduce task; matrix transpose and β values are computed here
Mapper Reducer
CPU OLS CODE LOGIC
Calculate XᵀX, XᵀY for input blocks
Mapper class uses methods
‒ setup(): Called once at the beginning of the map task
‒ map(): Called for each (K,V) pair; creates a list of all the values in the input split
‒ cleanup(): Called once at the end of the map task; the OLS code is implemented in this method using the JOCL library
Reducer class uses methods
‒ setup(): Called once at the beginning of the reduce task
‒ reduce(): Called for each key; the sum of intermediate matrices is computed in this method
‒ cleanup(): Called once at the end of the reduce task; matrix transpose and β values are computed here
MAPREDUCE ON GPU + CPU
Mapper Reducer
GPU OLS CODE LOGIC
EXPERIMENTAL OUTCOMES
Processing data one row at a time is not efficient; runtimes were therefore measured for ‘chunks’ of data of increasing size
Performing linear regression using GPUs within MapReduce consistently outperformed the same exercise using ‘vanilla’ MapReduce code
KEY RESULTS
[Chart: Multiple data vs row-by-row processing* — job execution time (s), GPU vs CPU, for 1 row at a time vs 1000 rows at a time]
[Chart: Increasing the size of data chunks** — job execution time (s) vs number of rows (1,000 to 100,000), GPU and CPU]
[Chart: Increasing the size of data chunks** (GPU only) — job execution time (s) vs number of rows (1,000 to 100,000)]
* Processing times for 1000 rows and 256 columns, 1 row at a time and all together
** Processing times for 100,000 rows and 256 columns, processed in chunks of ‘Number of rows’
EXPERIMENTAL OUTCOMES
CPU execution times show a sharp rise on data beyond 16 columns and 100,000 rows
GPU execution times do not show a significant rise for rows from 1,000 to 1,000,000 and columns from 4 to 256
Beyond 1,000,000 rows, GPU execution times grow, though far more slowly than CPU execution times
* Processing time for data sets where number of rows range from 1K to 10 M and columns range from 4 to 256
[Surface chart: GPU execution time* — job execution time (s) vs number of rows (1K to 10000K) and number of columns (4 to 256)]
[Surface chart: CPU execution time* # — job execution time (s) vs number of rows (1K to 10000K) and number of columns (4 to 256)]
#1 CPU execution times for a few data sets exceed 600 seconds and are not clearly visible in the graph above:
Rows Columns Time (s)
1M 256 804
10M 128 1863
10M 256 7701
CPU AND GPU EXECUTION TIME
#2 The 2 bounds of the sharp increase in CPU execution times are 256 columns * 100 K rows (= 51.2 MB) and 16 columns * 10 M rows (= 320 MB)
EXPERIMENTAL OUTCOMES
* For data sets where number of rows range from 1K to 10 M and columns range from 4 to 256
[Surface chart: Ratio of CPU to GPU execution time* vs number of rows (1K to 10000K) and number of columns (4 to 256); the ratio rises to 17-19 at the largest data sizes]
COMPARING CPU AND GPU
For very small data sizes, CPU and GPU have comparable runtimes
As the number of columns increases, CPU execution time increases gradually compared to GPU
As the number of rows increases, CPU execution time increases drastically compared to GPU
For larger data sizes, the ratio of CPU to GPU execution time shows a steep increase
TAKEAWAYS
RECOMMENDATIONS AND NEXT STEPS
GPUs can be effectively used within MapReduce to speed up relatively computationally heavy tasks like matrix operations
‒ Since copying data to/from a GPU is expensive, a large amount of data should be copied and processed ‘in one go’
‒ Computation time asymptotically decreases with increasing data chunk size
Multiple mappers/reducers running on a node can share the GPU
The size of data chunks and memory allocation must be chosen carefully, when one GPU is shared across mappers/reducers, to ensure all processes run successfully and faster than they otherwise would
‒ Using a GPU is not necessarily more ‘efficient’ for data with a small number of columns
‒ Smaller blocks increases the degree of MapReduce parallelism but also leads to more mappers using the GPU simultaneously
‒ Improper memory allocation in Hadoop and on the GPU can lead to either unused memory, or more memory required than available
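The trade-off between chunk size and per-transfer overhead can be illustrated with a toy cost model (our assumption, with made-up constants, not measurements from the deck): every transfer to the GPU pays a fixed overhead, so fewer, larger transfers amortize it.

```java
// Illustrative cost model, not measured in the deck: processing N rows in
// chunks of size c costs roughly ceil(N / c) * transferOverhead + N * perRow.
public class ChunkCostModel {
    public static double jobSeconds(long rows, long chunk,
                                    double transferOverheadSec, double perRowSec) {
        long transfers = (rows + chunk - 1) / chunk;   // ceil(rows / chunk)
        return transfers * transferOverheadSec + rows * perRowSec;
    }
}
```

With, say, 50 ms per transfer and 10 µs per row, single-row transfers on 100,000 rows cost thousands of seconds of pure overhead, while one 100,000-row transfer costs about a second, mirroring the measured benefit of larger chunks.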
Future exploration includes
‒ Measuring changes in job execution time, with the GPU also being used in the reducer, for OLS
‒ Empirically determining optimal values for data size and memory allocation when one or more jobs run multiple mappers/reducers on a node sharing a GPU
APPENDIX
EXPERIMENTAL RESULTS
RUNTIMES OF CPU AND GPU
Data size for different numbers of rows and columns:

Cols \ Rows    1K        10K        100K       1M        10M
4              8 KB      80 KB      800 KB     8 MB      80 MB
8              16 KB     160 KB     1.6 MB     16 MB     160 MB
16             32 KB     320 KB     3.2 MB     32 MB     320 MB
32             64 KB     640 KB     6.4 MB     64 MB     640 MB
64             128 KB    1.28 MB    12.8 MB    128 MB    1.28 GB
128            256 KB    2.56 MB    25.6 MB    256 MB    2.56 GB
256            512 KB    5.12 MB    51.2 MB    512 MB    5.12 GB

Execution time for CPU OLS code in seconds:

Cols \ Rows    1K     10K    100K    1M     10M
4              10     9      10      14     33
8              11     10     11      14     54
16             10     10     12      20     62
32             10     14     13      44     466
64             12     12     20      120    1073
128            12     12     46      232    1863
256            14     17     157     804    7701

Execution time for GPU OLS code in seconds:

Cols \ Rows    1K     10K    100K    1M     10M
4              12     13     14      12     41
8              11     13     12      16     49
16             12     13     12      18     51
32             12     11     13      22     73
64             12     13     13      36     119
128            11     13     16      39     202
256            13     12     18      57     423

Ratio of CPU to GPU execution time:

Cols \ Rows    1K      10K     100K    1M      10M
4              0.83    0.69    0.71    1.17    0.80
8              1.00    0.77    0.92    0.88    1.10
16             0.83    0.77    1.00    1.11    1.22
32             0.83    1.27    1.00    2.00    6.38
64             1.00    0.92    1.54    3.33    9.02
128            1.09    0.92    2.88    5.95    9.22
256            1.08    1.42    8.72    14.11   18.21
[email protected] @crunchdata