GLOpenCL: OpenCL Support on Hardware- and Software-Managed Cache Multicores
Department of Computer and Communications Engineering
University of Thessaly, Volos, Greece
Konstantis Daloukas
Christos D. Antonopoulos
Nikolaos Bellas
23/03/2011 HiPEAC 2011 2
Introduction
• OpenCL: a unified programming standard and framework
• Targets:
– Homogeneous and heterogeneous multicores
– Accelerator-based systems
• Aims at being platform-agnostic
• Enhances portability
Motivation
• Vendor-specific OpenCL implementations target either hardware-controlled or software-controlled cache multicores, not both
– AMD Stream SDK – x86
– IBM OpenCL SDK – Cell B.E.
• Lack of a unified framework for architectures with diverse characteristics
Contribution
• GLOpenCL: a unified OpenCL framework
– Compilation infrastructure
– Run-time support
• Enables native execution of OpenCL applications on:
– Hardware-controlled cache multicores
– Software-controlled cache multicores
• Achieves performance comparable to architecture-specific implementations
Outline
• Introduction
• The OpenCL Programming Model
• Compilation Infrastructure
• Run-Time Support
• Experimental Evaluation
• Conclusions
OpenCL Platform Model
From: http://www.viznet.ac.uk
OpenCL Execution Model
• An OpenCL application consists of two parts:
– Main program that executes on the host
– A number of kernels that execute on the compute devices
• Main constructs of the OpenCL execution model:
– Kernels
– Memory Buffers
– Command Queues
OpenCL Kernel Execution “Geometry”
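The figure for this slide did not survive extraction. As a rough reminder of the indexing it refers to, the following C sketch shows the standard 1-D relation between a work-item's global ID, its work-group ID and its local ID, assuming no global offset. The type `item_pos` and both function names are hypothetical helpers, not part of the slides or of the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: position of one work-item in a 1-D index
 * space with no global offset. */
typedef struct {
    size_t group_id;  /* which work-group the item belongs to */
    size_t local_id;  /* position of the item inside that work-group */
} item_pos;

/* Split a global ID into (group, local) coordinates. */
static item_pos split_global_id(size_t global_id, size_t local_size)
{
    assert(local_size > 0);
    item_pos p = { global_id / local_size, global_id % local_size };
    return p;
}

/* Inverse mapping: global_id = group_id * local_size + local_id. */
static size_t join_global_id(item_pos p, size_t local_size)
{
    assert(p.local_id < local_size);
    return p.group_id * local_size + p.local_id;
}
```

With a work-group size of 64, global ID 70 falls in work-group 1 at local position 6.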
Granularity Management
[Figure label: work-group]
Serialization of Work Items
OpenCL code:
__kernel void Vadd(…) {
    int index = get_global_id(0);
    c[index] = a[index] + b[index];
}

C code:
#define get_item_gid(N) \
    (__global_id[N] + ((N) == 0 ? __k : ((N) == 1 ? __j : __i)))

__kernel void Vadd(…) {
    int index;
    for (__i = 0; __i < get_local_size(2); __i++)
        for (__j = 0; __j < get_local_size(1); __j++)
            for (__k = 0; __k < get_local_size(0); __k++) {
                index = get_item_gid(0);
                c[index] = a[index] + b[index];
            }
}
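The serialization shown on this slide can be sketched as ordinary, compilable C for the 1-D case. This is only an illustration under stated assumptions: the function names, the `group_base` parameter (standing in for `__global_id[0]`) and the outer loop over work-groups are hypothetical and are not taken from GLOpenCL itself.

```c
#include <assert.h>
#include <stddef.h>

/* Run one 1-D work-group of a vector-add kernel as a plain loop.
 * group_base plays the role of __global_id[0] on the slide: the global
 * ID of the first work-item in the group. */
static void vadd_work_group(const float *a, const float *b, float *c,
                            size_t group_base, size_t local_size)
{
    for (size_t k = 0; k < local_size; k++) {
        size_t index = group_base + k;   /* what get_item_gid(0) yields */
        c[index] = a[index] + b[index];
    }
}

/* Execute the whole index space, one work-group after another. */
static void vadd_all(const float *a, const float *b, float *c,
                     size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    for (size_t base = 0; base < global_size; base += local_size)
        vadd_work_group(a, b, c, base, local_size);
}
```

Each call to `vadd_work_group` is one coarse unit of work, so the per-work-item scheduling cost is paid once per work-group instead of once per element.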
Elimination of Synchronization Operations
OpenCL code:
Statements_block1
barrier();
Statements_block2

C code:
triple_nested_loop {
    Statements_block1
    barrier();
    Statements_block2
}
Elimination of Synchronization Operations
C code (single loop, barrier still inside):
triple_nested_loop {
    Statements_block1
    barrier();
    Statements_block2
}

C code (loop split at the barrier):
triple_nested_loop {
    Statements_block1
}
//barrier();
triple_nested_loop {
    Statements_block2
}
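To see why the work-item loop is split at the barrier, consider a hypothetical kernel (not from the slides) in which each work-item stores a value into work-group-local memory and, after the barrier, reads the slot written by a different work-item. The sketch below assumes a 1-D work-group of `GROUP_SIZE` items and models local memory with an ordinary array.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP_SIZE 4  /* assumed work-group size for this sketch */

/* Serialized work-group for the hypothetical kernel
 *   tmp[lid] = in[lid]; barrier(); out[lid] = tmp[GROUP_SIZE - 1 - lid];
 * with the work-item loop split at the barrier. */
static void reverse_group_fissioned(const int *in, int *out)
{
    int tmp[GROUP_SIZE];  /* stands in for __local memory */

    assert(in != NULL && out != NULL);
    for (size_t lid = 0; lid < GROUP_SIZE; lid++)   /* Statements_block1 */
        tmp[lid] = in[lid];
    /* The barrier is implicit: block 1 has finished for every item. */
    for (size_t lid = 0; lid < GROUP_SIZE; lid++)   /* Statements_block2 */
        out[lid] = tmp[GROUP_SIZE - 1 - lid];
}

/* Same kernel with the barrier dropped inside a single loop: block 2 of
 * an early item reads tmp slots that later items have not written yet. */
static void reverse_group_fused(const int *in, int *out)
{
    int tmp[GROUP_SIZE] = {0};

    assert(in != NULL && out != NULL);
    for (size_t lid = 0; lid < GROUP_SIZE; lid++) {
        tmp[lid] = in[lid];
        out[lid] = tmp[GROUP_SIZE - 1 - lid];
    }
}
```

The split version reverses the input correctly, while the single-loop version returns stale zeros for the first half of the group.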
Variable Privatization
C code (x shared by all work-items):
triple_nested_loop {
    Statements_block1
    x = …;
}
triple_nested_loop {
    Statements_block2
    … = … x …;
}

C code (x privatized per work-item):
triple_nested_loop {
    Statements_block1
    x[i][j][k] = …;
}
triple_nested_loop {
    Statements_block2
    … = … x[i][j][k] …;
}
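A minimal 1-D sketch of privatization, assuming a hypothetical kernel in which a private variable `x` is written before the barrier and read after it. Once the work-item loop has been split, a single scalar `x` would only retain the value of the last work-item, so it is expanded into one slot per work-item. The slide indexes by all three loop variables; a single index is used here for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP_SIZE 4  /* assumed work-group size for this sketch */

/* Hypothetical kernel: int x = 2 * in[lid]; barrier(); out[lid] = x + 1; */

/* x expanded to one element per work-item, so each value survives from
 * the first loop to the second. */
static void group_privatized(const int *in, int *out)
{
    int x[GROUP_SIZE];

    assert(in != NULL && out != NULL);
    for (size_t lid = 0; lid < GROUP_SIZE; lid++)
        x[lid] = 2 * in[lid];
    for (size_t lid = 0; lid < GROUP_SIZE; lid++)
        out[lid] = x[lid] + 1;
}

/* Without privatization the second loop only sees the last item's x. */
static void group_shared_scalar(const int *in, int *out)
{
    int x = 0;

    assert(in != NULL && out != NULL);
    for (size_t lid = 0; lid < GROUP_SIZE; lid++)
        x = 2 * in[lid];
    for (size_t lid = 0; lid < GROUP_SIZE; lid++)
        out[lid] = x + 1;
}
```

Only variables that are live across the split point need this treatment; a variable used entirely within one block can stay a scalar.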
Run-Time System Library
• Provide run-time support for:
– Work management
– Manipulation of memory buffers
• The run-time system provides a unified design for both architectures
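One plausible reading of "work management" is that a kernel command is broken into tasks that are placed on per-worker work queues. The C sketch below illustrates that idea only. The task granularity (one work-group per task), the round-robin placement, the fixed queue capacity and all type and function names are assumptions made for illustration; the slides do not specify the actual policy.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS 64   /* capacity of each queue in this sketch */

/* One task = one work-group of a kernel command (assumed granularity). */
typedef struct { size_t group_id; } task;

typedef struct {
    task   items[MAX_TASKS];
    size_t count;
} work_queue;

/* Split a command of num_groups work-groups into tasks and deal them
 * round-robin to the workers' queues. Returns the number of tasks placed. */
static size_t distribute_command(work_queue *queues, size_t num_workers,
                                 size_t num_groups)
{
    assert(queues != NULL && num_workers > 0);
    size_t placed = 0;
    for (size_t g = 0; g < num_groups; g++) {
        work_queue *q = &queues[g % num_workers];
        if (q->count == MAX_TASKS)
            break;                  /* queue full: stop early in this sketch */
        q->items[q->count++].group_id = g;
        placed++;
    }
    return placed;
}
```

A real run-time would add synchronization around the queues and let worker threads drain them concurrently; both are left out here to keep the sketch single-threaded.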
Architecture for Hardware-Controlled Cache Multicores
[Figure, built up step by step over nine slides. Components shown: Main Thread; Command Queue holding commands C1, C2, C3; Ready Command Queue; Worker Thread 1 … Worker Thread N; Work Queue 1 … Work Queue N holding tasks T1 … TN; Helper Thread; Async. Copy Queue.]
Architecture for Software-Controlled Cache Multicores
[Figure: the design is divided into a Host Side and an Accelerator Side. Components shown: Main Thread; Command Queue; Ready Command Queue; Helper Thread; Worker Thread 1 … Worker Thread N; Work Queue 1 … Work Queue N. Two kinds of messages are shown: work-related requests/replies and SW cache requests/replies.]
Experimental Evaluation
• Evaluate the framework on two representative architectures:
– Homogeneous, hardware-controlled cache memory processor: Intel’s E5520 i7 (Nehalem) processor
– Heterogeneous, software-controlled cache memory processor: Cell B.E. processor
• Compare with vendor-provided, platform-customized implementations:
– AMD Stream SDK for x86 architectures
– IBM OpenCL SDK for the Cell B.E.
Vector Add – x86
[Figure: Vector Add (1-D part.). Execution time (sec), axis from 0 to 6, for x-axis values from 64 to 2K within each of the groups 16M, 32M, 64M and 96M. Series: GLOpenCL, AMD OpenCL.]
Vector Add – Cell
[Figure: Vector Add (1-D part.). Execution time (sec), axis from 0.0 to 0.7, for x-axis values from 64 to 2K within each of the groups 2M, 4M, 8M and 32M. Series: GLOpenCL, IBM OpenCL.]
AES Encryption – x86
[Figure: AES Encryption (2-D part.). Execution time (sec), axis from 0 to 6, for x-axis values 8x8 and 16x16 within each of the groups HD and UHD. Series: GLOpenCL, AMD OpenCL.]
AES Encryption – Cell
[Figure: AES Encryption (2-D part.). Execution time (sec), axis from 0.0 to 4.0, for x-axis values 8x8 and 16x16 within each of the groups HD and UHD. Series: GLOpenCL, IBM OpenCL.]
BlackScholes – x86
[Figure: BlackScholes (1-D part.). Execution time (sec), axis from 0.0 to 5.0, for x-axis values from 64 to 2K within each of the groups 16M, 32M, 64M and 96M. Series: GLOpenCL, GLOpenCL_async, AMD OpenCL, AMD OpenCL_async.]
BlackScholes – Cell
[Figure: BlackScholes (1-D part.). Execution time (sec), axis from 0.0 to 4.0, for x-axis values from 64 to 2K within each of the groups 1M, 2M and 4M. Series: GLOpenCL, GLOpenCL_async, IBM OpenCL, IBM OpenCL_async.]
Conclusions
• GLOpenCL achieves performance comparable to, or better than, architecture-specific implementations
• The OpenCL standard leaves enough room for platform-specific optimizations
– May yield significant performance improvements
• Future work: reduce the effect of shared data structures
– Memory access pattern analysis
Acknowledgements
This project is partially supported by the EC Marie Curie International Reintegration Grant (IRG) 223819