© ALTOROS Systems | CONFIDENTIAL
“The norm for data analytics is now to run them on commodity clusters with MapReduce-like abstractions. One only needs to read the popular blogs to see the evidence of this. We believe that we could now say that ‘nobody ever got fired for using Hadoop on a cluster’!”
Breaking News
IBM Keynote at JavaOne 2013: Java Flies in Blue Skies and Open Clouds
Java and GPUs open up a world of new opportunities for GPU accelerators and Java programmers alike.
Breaking News
Duimovich showed an example of GPU-accelerated sorting using standard NVIDIA CUDA libraries that are already available!
The speedups are phenomenal, ranging from 2x to 48x faster!
Breaking News?
Breaking Hadoop
10 000x faster
Hadoop vs GPU
Hadoop & GPU
Hadoop + GPU
HPC
Big Data
GPGPU in Java
Heterogeneous systems
Horizontal and vertical scalability
Hadoop horizontal scalability

[Diagram: three input files (file01–file03) are split into ten blocks (01–10) and distributed across Node 1–Node 3, giving per-node block counts of 3, 4, and 3. Adding Node 4–Node 6 rebalances the blocks so that each node holds only 1–2, roughly halving the per-node load.]
Use GPU to scale vertically

[Diagram: the same six-node cluster; equipping a node with a GPU cuts its effective processing load, e.g. from 2 blocks' worth of work down to 0.5.]
Profit estimation
“Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU” by Intel
“OpenCL: the advantages of heterogeneous approach” by Intel
NVIDIA GTX 280 vs. Intel Core i7-960
How to use OpenCL?
Hadoop streaming
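Hadoop Streaming runs any executable as a mapper or reducer, exchanging records over stdin/stdout, which is one way to hand data to a native OpenCL binary without JNI. A minimal sketch of a streaming-style mapper in Java (the word-count splitting logic is a hypothetical stand-in for a real GPU-backed computation):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

// Minimal Hadoop Streaming mapper: reads lines from stdin,
// emits tab-separated key/value pairs on stdout.
public class StreamingMapper {

    // Pure mapping logic, kept separate so it is easy to test:
    // turns one input line into "word<TAB>1" records.
    static String mapLine(String line) {
        StringBuilder out = new StringBuilder();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.append(word).append('\t').append(1).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.print(mapLine(line));
        }
    }
}
```

The job would then be launched with the standard hadoop-streaming jar, passing this program, or equally a compiled OpenCL-backed executable, as the -mapper; the reducer side works the same way over stdin/stdout.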
Aparapi
Expands Java's “Write Once, Run Anywhere” to include APU and GPU devices by expressing data-parallel algorithms through extending the Kernel base class.

[Flowchart: MyKernel.class → Does the platform support OpenCL? → Can the bytecode be converted to OpenCL? If yes, convert it and execute the OpenCL kernel on the device; otherwise, execute using a Java thread pool.]
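When OpenCL is unavailable or the bytecode cannot be converted, Aparapi falls back to running the kernel body on a Java thread pool. A plain-Java sketch of that fallback path for a vector-add kernel (the names and work-splitting scheme are illustrative, not Aparapi's actual internals):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of the "execute using Java thread pool" fallback:
// the kernel body runs once per global id, split across threads.
public class ThreadPoolFallback {

    interface KernelBody { void run(int globalId); }

    static void execute(KernelBody kernel, int range, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int chunk = (range + threads - 1) / threads; // ceil(range / threads)
        for (int t = 0; t < threads; t++) {
            final int start = t * chunk;
            final int end = Math.min(start + chunk, range);
            pool.execute(() -> {
                for (int i = start; i < end; i++) kernel.run(i);
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        final float[] a = {1, 2, 3, 4}, b = {10, 20, 30, 40};
        final float[] result = new float[a.length];
        // Kernel body: one independent iteration per global id.
        execute(i -> result[i] = a[i] + b[i], a.length, 2);
        System.out.println(java.util.Arrays.toString(result));
    }
}
```

The important property is that each iteration touches only its own index, so the same body can be run by a thread pool or translated to an OpenCL kernel without synchronization.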
Aparapi
lambda
HSA
Aparapi
Characteristics of ideal data parallel workload
Code which iterates over large arrays of primitives
- 32/64-bit data types preferred
- the order of iterations is not critical (avoid data dependencies between iterations)
- each iteration contains sequential code (few branches)
Balance between data size (low) and compute (high)
- data transfer to/from the GPU can be costly
- trivial compute is not worth the transfer cost
- may still benefit by freeing up the CPU for other work (?)
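A loop ticking all of the boxes above, sketched in plain Java: primitives only, every iteration independent of the others, and a branch-free body (the polynomial is an arbitrary stand-in for real compute):

```java
// Ideal data-parallel shape: a large primitive array, each iteration
// independent of the others, straight-line arithmetic with no branches.
public class IdealKernel {

    static void evalPoly(float[] x, float[] out) {
        for (int i = 0; i < x.length; i++) {         // iteration order irrelevant
            float v = x[i];
            out[i] = 2.0f * v * v + 3.0f * v + 1.0f; // branch-free body
        }
    }
}
```

A body like this maps directly onto one GPU work-item per index; a loop that reads out[i - 1], or branches differently per element, would not.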
HadoopCL
Rice University, AMD
2 six-core Intel X5660 (48 GB mem)
2 NVIDIA Tesla M2050 (2 × 2.5 GB mem)
AMD A10-5800K APU (16 GB mem)
WHY?
Back to OpenCL, Aparapi and heterogeneous computing
[Chart: relative throughput of GPU cache, GPU GDDR5 memory, CPU cache, SATA 3.0 (SSD), SATA 2.0 (HDD), and a 1 Gbit network.]
Formula in terms of time, GPU offload pays off when:
(CPU calc1) + disk read + disk write >
(CPU calc2 + GPU calc + GPU write + GPU read) + disk read + disk write
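Since the disk read/write terms appear on both sides, the inequality reduces to comparing CPU-only compute time against the GPU path's residual CPU work plus kernel time plus transfer overhead. A small sketch with hypothetical measured timings:

```java
// Sketch of the offload decision from the formula above: disk read/write
// cancel out, so only compute and GPU transfer times matter.
public class OffloadEstimate {

    // Returns true when offloading to the GPU is expected to be faster.
    static boolean gpuPays(double cpuCalc,         // CPU-only compute time
                           double cpuCalcResidual, // CPU work left in the GPU variant
                           double gpuCalc,         // kernel time on the GPU
                           double gpuWrite,        // host -> device transfer
                           double gpuRead) {       // device -> host transfer
        return cpuCalc > cpuCalcResidual + gpuCalc + gpuWrite + gpuRead;
    }

    public static void main(String[] args) {
        // Hypothetical numbers, in seconds per block of work:
        System.out.println(gpuPays(10.0, 1.0, 0.5, 0.3, 0.3));   // compute-heavy: offload wins
        System.out.println(gpuPays(0.2, 0.05, 0.02, 0.3, 0.3));  // transfer-bound: stay on CPU
    }
}
```

This is exactly the data-size vs. compute balance from the Aparapi checklist: when transfer cost dominates the kernel time, the GPU path loses even if the kernel itself is much faster.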
OpenCL future
http://streamcomputing.eu/
Questions?
Big Data Experts FB group