7th Workshop on Fusion Data Processing Validation and Analysis
Integration of GPU Technologies in EPICs for Real Time Data Preprocessing Applications
J. Nieto¹, D. Sanz¹, G. de Arcas¹, R. Castro², J.M. López¹, J. Vega²
¹ Universidad Politécnica de Madrid (UPM), Spain
² Asociación EURATOM/CIEMAT para Fusión, Spain
Index
- Scope of the project
- Project goals
- Sample algorithm
- Test system
- Subtask 1: GPU benchmarking
- Subtask 2: EPICS integration (DPD)
- Results
- Conclusions
FPSC Project
Objective: to develop an FPSC prototype focused on data acquisition for ITER IO
The “functional requirements” of the FPSC prototype:
- To provide high-rate data acquisition, pre-processing, archiving and efficient data distribution among the different FPSC software modules
- To interface with CODAC and to provide archiving
- FPSC software compatible with RHEL and EPICS
- To use COTS solutions
FPSC HW architecture
[Figure: FPSC hardware architecture. Labels: Real Time Controller (19" 1U chassis, NI 8353 RT); FPSC system controller CPU; PXI Clk10; PXIe-PCIe8372; NI-8370; NI PXI-7952R; NI PXI-6682; NI PXI-6653; PXIe-PCIe; data archiving servers; MiniCODAC; PC desktops; development host; GPUs; Ethernet links; IP addresses 172.17.152.11, 172.17.152.13, 172.17.152.33, 172.17.152.34 and 172.17.152.40]
GPU subtasks
Goals:
- To provide benchmarking of Fermi GPUs (subtask 1)
  - Analyze the GPU development cycle (methodology)
  - Compare execution times on GPU and CPU for a similar development effort
- To provide a methodology to integrate GPU processing units into EPICS (subtask 2)
Requisites:
- Use an algorithm representative of the type of operations that would be needed in plasma pre-processing
GPU Test System
[Figure: software stack on Linux RedHat Enterprise v5.5 (64 bits) with CODAC Core System 2.0. Labels: EPICS IOC, DPD Subsystem, CPU Asyn, GPU Asyn, IPP v7.0, CULA R11, CUBLAS v3.2]
Host processor software:
- OS: RedHat Enterprise Linux 5.5
- Middleware: EPICS 3.14.12 and asynDriver 4.16
- Compilers: gcc 4.1.2 20080704 and nvcc V0.2.1221
- CPU libraries: MKL 10.3 Update 9 and IPP 7.0
- GPU libraries: NVIDIA SDK 3.2, NVIDIA CUBLAS 3.2, EMPHOTONICS CULA R11
Hardware:
- GPU: NVIDIA GTX580
- CPU: Xeon X5550 quad-core
Sample algorithm
Best-fit code for detecting the position and amplitude of a spectrum composed of a set of Gaussians, based on the Levenberg-Marquardt method
Inputs:
- Initial coeffs C (N=10)
- Input data X (M points)
Loop until convergence:
- Compute first guess (Xfit)
- Compute Jacobian matrix (J, M×N)
- Compute update coeffs: $c = c + (J' \cdot J)^{-1} \cdot J' \cdot (x' - x'_{fit})$
- Update fit (xfit)
- Compute error
Outputs:
- Fitted coeffs (position & amplitude)
- Fitted data
[Plot: input data and fitted data]
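The loop on this slide can be illustrated with a minimal sketch. It is not the authors' code: it fits a single Gaussian of fixed width (N=2 coefficients, amplitude and position, instead of the N=10 on the slide), uses only the Python standard library, and adds a simple damping factor `lam` in the Levenberg-Marquardt style. The names `gaussian`, `fit_gaussian` and `lam`, and the synthetic data, are assumptions made for illustration.

```python
import math


def gaussian(t, amp, pos, width=1.0):
    """Value at t of a Gaussian with the given amplitude, position and width."""
    return amp * math.exp(-((t - pos) ** 2) / (2.0 * width ** 2))


def fit_gaussian(ts, xs, amp, pos, width=1.0, max_iter=100, tol=1e-20):
    """Fit amplitude and position of one Gaussian (illustrative sketch only)."""

    def sq_error(a, p):
        return sum((x - gaussian(t, a, p, width)) ** 2 for t, x in zip(ts, xs))

    lam = 1e-3  # damping factor (assumed starting value)
    err = sq_error(amp, pos)
    for _ in range(max_iter):
        if err < tol:
            break
        # Accumulate J'J (2x2, symmetric) and J'(x - xfit) over all M points.
        jtj_aa = jtj_ap = jtj_pp = jtr_a = jtr_p = 0.0
        for t, x in zip(ts, xs):
            e = math.exp(-((t - pos) ** 2) / (2.0 * width ** 2))
            d_amp = e                                  # d(model)/d(amp)
            d_pos = amp * e * (t - pos) / width ** 2   # d(model)/d(pos)
            r = x - amp * e                            # residual x - xfit
            jtj_aa += d_amp * d_amp
            jtj_ap += d_amp * d_pos
            jtj_pp += d_pos * d_pos
            jtr_a += d_amp * r
            jtr_p += d_pos * r
        # Solve the damped 2x2 normal equations for the coefficient update.
        a11 = jtj_aa * (1.0 + lam)
        a22 = jtj_pp * (1.0 + lam)
        det = a11 * a22 - jtj_ap * jtj_ap
        step_amp = (a22 * jtr_a - jtj_ap * jtr_p) / det
        step_pos = (a11 * jtr_p - jtj_ap * jtr_a) / det
        new_err = sq_error(amp + step_amp, pos + step_pos)
        if new_err < err:
            # Accept the update and trust the linear model more next time.
            amp, pos, err = amp + step_amp, pos + step_pos, new_err
            lam /= 10.0
        else:
            # Reject the update and increase the damping.
            lam *= 10.0
    return amp, pos, err


# Synthetic, noise-free input data (hypothetical values).
ts = [-5.0 + 0.1 * i for i in range(101)]
xs = [gaussian(t, 2.0, 0.5) for t in ts]
amp_fit, pos_fit, err_fit = fit_gaussian(ts, xs, amp=1.5, pos=0.2)
print(amp_fit, pos_fit, err_fit)
```

With `lam` set to zero the update reduces to the expression shown on the slide; the damping only makes the sketch tolerant of a poor first guess.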
Subtask 1
Goal: benchmarking of a Fermi GPU
Standard GPU programming methodology:
- The GPU is operated from the host as a coprocessor
- Host threads sequence GPU operations and are responsible for moving data (Host↔Device)
- Operations are coded by:
  - Programming kernels: CUDA
  - Using library primitives: CULA, CUBLAS…
[Figure: flow chart of the sample algorithm, repeated from the Sample algorithm slide]
Results S1 (I)
Execution time in ms, throughput in MB/s, improvement ratio of GPU over CPU:
Block Size  GPU time  CPU time  GPU throughput  CPU throughput  Improv. Ratio
256  2,86  2,75  0,7  0,7  1,0
512  2,85  4,83  1,4  0,8  1,7
1024  3,6  8,69  2,3  0,9  2,4
2048  5,21  16,07  3,1  1,0  3,1
4096  16,42  28,55  2,0  1,1  1,7
8192  42,85  55,26  1,5  1,2  1,3
16384  85,4  107,5  1,5  1,2  1,3
32768  168,65  210,99  1,6  1,2  1,3
65536  334,77  425,96  1,6  1,2  1,3
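The derived columns of this table can be recomputed from the execution times. The short sketch below assumes 8 bytes per sample and 1 MB = 10^6 bytes; neither is stated on the slide, but both are consistent with the published figures. The improvement ratio is taken as CPU time divided by GPU time. Function and constant names are hypothetical.

```python
BYTES_PER_SAMPLE = 8  # assumption: double-precision samples (not stated on the slide)


def throughput_mb_s(block_size, time_ms):
    """Throughput in MB/s (1 MB = 1e6 bytes) for one block processed in time_ms."""
    return block_size * BYTES_PER_SAMPLE / (time_ms * 1e-3) / 1e6


def improvement_ratio(gpu_ms, cpu_ms):
    """How many times faster the GPU is than the CPU."""
    return cpu_ms / gpu_ms


# (block size, GPU time in ms, CPU time in ms) taken from the table.
rows = [
    (256, 2.86, 2.75),
    (2048, 5.21, 16.07),
    (65536, 334.77, 425.96),
]

for size, gpu_ms, cpu_ms in rows:
    print(
        size,
        round(throughput_mb_s(size, gpu_ms), 1),
        round(throughput_mb_s(size, cpu_ms), 1),
        round(improvement_ratio(gpu_ms, cpu_ms), 1),
    )
```

Under these assumptions the best ratio appears at a block size of 2048, after which the GPU advantage settles at about 1,3.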
Results S1 (II)
Subtask 2
Goal: to provide EPICS support for GPU processing
[Figure: comparison of the single-process approach and the DPD approach. Labels: EPICS IOC, Asyn Layer, DPD, Data Generation, Acquisition & Processing, processing units (CPU, FPGA, GPU), others (archiving…)]
Proposed methodology
The core of the FPSC software is the DPD. It allows for:
- Moving data with very good performance
- Integrating all the functional elements (EPICS monitoring, data processing, data acquisition, remote archiving, etc.)
- Having code based entirely on the standard asynDriver
- Full compatibility with any type of required data
[Figure: FPSC software architecture. Labels: EPICS IOC, State Machine, CODAC, Configuration, Hardware Monitoring, DPD (Data Processing and Distribution) Subsystem, Timing (TCN/1588), FPGA, GPU Proc., CPU Proc., SDN, Archiving, Monitoring, Hardware/Cubicle Signals, Asyn Layer]
DPD features (I)
- DPD allows configuring both the functional elements of the FPSC (FPGA acquisition, GPU processing, SDN, EPICS monitoring, data processing, data archiving) and the connections (links) between them.
- Functional elements allow:
  - reading data blocks from inputs
  - processing received data
  - generating new signals
  - routing data blocks to output links
- DPD allows new types of functional elements to be integrated to extend the FPSC functionality. This requires creating the corresponding asynDrivers, which can be done in a simple way.
- DPD allows very easy integration of any existing asynDriver.
[Figure: functional element in the EPICS IOC with input links and output links]
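The idea of functional elements joined by links can be sketched in a few lines. This is only a conceptual illustration: the real DPD is built on asynDriver inside an EPICS IOC, and the class `FunctionalElement`, its methods and the element names below are hypothetical, not part of any DPD or EPICS API.

```python
import queue


class FunctionalElement:
    """Hypothetical stand-in for a DPD functional element."""

    def __init__(self, name, process):
        self.name = name
        self.process = process            # function applied to each data block
        self.input_link = queue.Queue()   # blocks waiting to be processed
        self.output_links = []            # input links of downstream elements

    def connect(self, other):
        """Route this element's output to another element's input link."""
        self.output_links.append(other.input_link)

    def step(self):
        """Read one block, process it and route the result to every output link."""
        block = self.input_link.get()
        result = self.process(block)
        for link in self.output_links:
            link.put(result)
        return result


# One processing element feeding two consumers (monitoring and archiving).
scale = FunctionalElement("scale", lambda block: [2 * v for v in block])
monitor = FunctionalElement("monitor", lambda block: max(block))
archive = FunctionalElement("archive", lambda block: list(block))
scale.connect(monitor)
scale.connect(archive)

scale.input_link.put([1, 2, 3])
scale.step()
monitor_result = monitor.step()
archive_result = archive.step()
print(monitor_result, archive_result)
```

Because the routing lives in `output_links` rather than in the processing function, a connection can be added or changed without touching the element's code, which is the property the slide attributes to configurable links.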
DPD features (II)
- DPD allows the data routing to be configured at configuration time or even at run time (to implement fault-tolerant solutions)
- DPD provides a common set of EPICS PVs for the functional elements and their respective links
- DPD provides on-line measurements of both throughput and buffer occupancy in the links
- DPD implements an optional multi-level buffering (memory, disk) backup solution for any link of the system
[Figure: backup block link with buffering levels 0, 1 and 2]
Test scenario
[Figure: timing diagram with time stamps T0 to T4. Labels: Data Generation, DPD (Data Processing and Distribution) Subsystem, GPU Proc., Host → Dev, Processing, Dev → Host]
Measured times:
- Processing Time (TP) = T3 - T2
- Module Service Time (TMS) = T4 - T1
- Total Service Time (TTS) = T4 - T0
- Internal Process Time (TP0)
Timing (II)
[Figure: sequence between TiCameraDataGenerator and the DataFit module. Labels: received data block, DataFit processing, DataFit result packing and routing, TP, TMS, TTS]
- T0: New data block is generated
- T1: Data block is received in the module
- T2: Data block is ready to be processed
- T3: DataFit processing is finished
- T4: New DataFit processed data is packed and sent
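As a small worked example, the three intervals defined on the Test scenario slide (TP = T3 - T2, TMS = T4 - T1, TTS = T4 - T0) can be derived from the five time stamps. The class name and the time stamp values below are hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class BlockTimestamps:
    """Time stamps of one data block, in microseconds (illustrative values only)."""
    t0: int  # new data block is generated
    t1: int  # data block is received in the module
    t2: int  # data block is ready to be processed
    t3: int  # DataFit processing is finished
    t4: int  # processed data is packed and sent

    @property
    def tp(self):
        """Processing Time: T3 - T2."""
        return self.t3 - self.t2

    @property
    def tms(self):
        """Module Service Time: T4 - T1."""
        return self.t4 - self.t1

    @property
    def tts(self):
        """Total Service Time: T4 - T0."""
        return self.t4 - self.t0


sample = BlockTimestamps(t0=0, t1=120, t2=150, t3=14350, t4=14400)
print(sample.tp, sample.tms, sample.tts)
```

The differences TMS - TP and TTS - TMS then give the time spent inside the module outside the fit itself and the time spent delivering the block to the module.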
Test scenario 1
[Figure: block diagram. Labels: TiCameraDataGenerator; GPU processing: TiCameraFit; Monitoring; EPICS waveform]
Test scenario 2
[Figure: block diagram. Labels: TiCameraDataGenerator; GPU#0 processing: TiCameraFit (two modules); Monitoring; EPICS waveform]
Test scenario 3
[Figure: block diagram. Labels: TiCameraDataGenerator; GPU#0 processing: TiCameraFit; GPU#1 processing: TiCameraFit; Monitoring; EPICS waveform]
Results S2
1. To determine the DPD overhead with respect to the “hard coded” approach
2. To test DPD scalability (multi-module, multiple-hw support)
Block Size SP App 1M/1GPU 2M/1GPU 2M/2GPU
4096 13,2 14,1 29,7 14,2
8192 43,1 44,8 89,9 45,4
16384 85,6 86,6 172,3 87,0
- Using the third configuration (2M/2GPU), we have been able to process 3 MB/s running 2 modules on 2 different GPUs
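The first goal, the DPD overhead relative to the “hard coded” single-process application (SP App), can be read off the table as a relative difference. The sketch below assumes the table entries are times in the same unit for every column (presumably ms, as in Results S1, although the slide does not say so); the function name is hypothetical.

```python
# Block size -> (SP App, 1M/1GPU, 2M/1GPU, 2M/2GPU), values copied from the table.
results = {
    4096: (13.2, 14.1, 29.7, 14.2),
    8192: (43.1, 44.8, 89.9, 45.4),
    16384: (85.6, 86.6, 172.3, 87.0),
}


def overhead_percent(sp_app, dpd):
    """Relative extra time of a DPD configuration over the SP App, in percent."""
    return 100.0 * (dpd - sp_app) / sp_app


for size, (sp_app, one_module, _, two_gpus) in results.items():
    print(
        size,
        round(overhead_percent(sp_app, one_module), 1),
        round(overhead_percent(sp_app, two_gpus), 1),
    )
```

Read this way, the overhead of one DPD module on one GPU falls from about 6,8 % at a block size of 4096 to about 1,2 % at 16384, and two modules on two GPUs stay close to the single-module figures.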
Conclusions
- The development methodology for using GPUs is being standardized, providing increasing levels of abstraction from hardware implementation details
- “Hard coded” implementations seriously compromise scalability and maintainability, without guaranteeing a relevant increase in performance
- Specific frameworks are being developed for different scenarios (Thrust, DPD…):
  - To simplify development
  - To promote reusability
  - To provide scalability and maintainability
  - To include first-level parallelism (internal load balancing based on multithreading)