computing infrastructure for online monitoring and control...

20
www.kit.edu KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association Computing Infrastructure for Online Monitoring and Control of High-throughput DAQ Electronics S. Chilingaryan, M. Caselle, T. Dritschler, T. Farago, A. Kopmann, U. Stevanovic, M. Vogelgesang Hardware, Software, and Network Organization Picosecond Sampling Electronics for Terahertz Synchrotron Radiation Prototype of Streaming PCIe Camera Developed in house

Upload: others

Post on 30-May-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Computing Infrastructure for Online Monitoring and Control ...accelconf.web.cern.ch/AccelConf/PCaPAC2014/talks/wco201_talk.pdf · 2 S. Chilingaryan et. all Institute for Data Processing

www.kit.eduKIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association

Computing Infrastructure for Online Monitoring and Control of High-throughput DAQ Electronics

S. Chilingaryan, M. Caselle, T. Dritschler, T. Farago,A. Kopmann, U. Stevanovic, M. Vogelgesang

Hardware, Software, and Network Organization

Picosecond Sampling Electronics for Terahertz Synchrotron Radiation

Prototype of Streaming PCIe Camera Developed in house

Page 2: Computing Infrastructure for Online Monitoring and Control ...accelconf.web.cern.ch/AccelConf/PCaPAC2014/talks/wco201_talk.pdf · 2 S. Chilingaryan et. all Institute for Data Processing

S. Chilingaryan et. all2 Institute for Data Processing and ElectronicsKarlsruhe Institute of Technology

Requirements• Handling of Sensors with data rates up to ~ 8 GB/s (8-12 bit)• Real-time control loop based on 2D Images + Online compression

– In-flow 8 GB/s, unpacked up to 16 GB/s (16 bit)• Slow control loop based on 3D Tomographic Images

– In-flow 4 GB/s, unpacked 32 GB/s (single-precision floating-point)• Raw data storage at full speed, i.e. 4 GB/s• Long-term storage at 1 GB/s• Integration with Tango Control System• Low administrative effort

Camera

up to8 GB/s

4 GB/s 1 GB/s 0,25 GB/sTransferrates

UFO Control System

Camera PC

Real-time control loopOuter control loop

Ar c hi vi n

g

Real-TimeStorage

Long-termStorage

4 GB/s

PCIe + GPUDirect Infiniband iSER GlusterFS NFS

Page 3: Computing Infrastructure for Online Monitoring and Control ...accelconf.web.cern.ch/AccelConf/PCaPAC2014/talks/wco201_talk.pdf · 2 S. Chilingaryan et. all Institute for Data Processing

S. Chilingaryan et. all3 Institute for Data Processing and ElectronicsKarlsruhe Institute of Technology

Concepts

• Programmable DAQ electronics with PCI-express interface• Distributed control system based on Infiniband interconnects• GPU-based computing• Multiple levels of scalability• Cheap off-the-shelf components

Poster FPI02

2007 2008 2010 2012 201410

100

1000

10000

Xeon/SP Xeon/DP Tesla/SP Tesla/DPGeForce/SP GeForce/DP

GF

lops

Historical trends of CPU and GPU performance

Prototype of Streaming PCIe Camera

Developed in house

Easily scalableQuad-SLI35 Tflops

for ~ 5000 EUR

Page 4: Computing Infrastructure for Online Monitoring and Control ...accelconf.web.cern.ch/AccelConf/PCaPAC2014/talks/wco201_talk.pdf · 2 S. Chilingaryan et. all Institute for Data Processing

S. Chilingaryan et. all4 Institute for Data Processing and ElectronicsKarlsruhe Institute of Technology

UFO Control Network

CameraStation

Ar c hi ve

Lar ge

Sc a

l e D

at a

Fa

c ility

Beam-line

OpenCLNode

OpenCLNode

StorageNode

StorageNode

IB router

Control Room

ComputeCenter

Scalable Control Network

Infiniband(Optical)

FCO202WCO202

Master Server(lots of memory)

PCIe x8 gen3

Infiniband(Electrical)

Ethernet

10 Gig Ethernet

Talks on Data Analysis Lab:

Page 5: Computing Infrastructure for Online Monitoring and Control ...accelconf.web.cern.ch/AccelConf/PCaPAC2014/talks/wco201_talk.pdf · 2 S. Chilingaryan et. all Institute for Data Processing

S. Chilingaryan et. all5 Institute for Data Processing and ElectronicsKarlsruhe Institute of Technology

Master Server

LSDFLarge Scale Data Facility

FDR Infiniband

56 Gbit/s

Ethernet

10 Gb/sInternalPCO.edgePCO.dimax….

SuperMicro 7047GR-TRF (Intel C602 Chipset)CPU: 2 x Xeon E5-2680v2 ( total 20 cores at 2.8 Ghz)GPUs: 7 x NVIDIA GTX TitanMemory: 256 GB (512GB max)Network: Intel 82598EB (10 Gb/s)Infiniband: 2 x Mellanox ConnectX-3 VPIStorage: Areca ARC-1880-ix-12 SAS Raid 8 x Samsung 840 Pro 510 (Raid0)

Cameras Storage

High amount of memoryFast SSD-based Raid for overflow data

Page 6: Computing Infrastructure for Online Monitoring and Control ...accelconf.web.cern.ch/AccelConf/PCaPAC2014/talks/wco201_talk.pdf · 2 S. Chilingaryan et. all Institute for Data Processing

S. Chilingaryan et. all6 Institute for Data Processing and ElectronicsKarlsruhe Institute of Technology

Master Server

LSDFLarge Scale Data Facility

FDR Infiniband

56 Gbit/s

External PCIe x16 (16 GB/s)

Ethernet

10 Gb/sInternalPCO.edgePCO.dimax….

SuperMicro 7047GR-TRF (Intel C602 Chipset)CPU: 2 x Xeon E5-2680v2 ( total 20 cores at 2.8 Ghz)GPUs: 7 x NVIDIA GTX TitanMemory: 256 GB (512GB max)Network: Intel 82598EB (10 Gb/s)Infiniband: 2 x Mellanox ConnectX-3 VPIStorage: Areca ARC-1880-ix-12 SAS Raid 8 x Samsung 840 Pro 510 (Raid0) 16 x Hitachi A7K200 (Raid6)

SFF8088 (2.4 GB/s)

Cameras Storage

High amount of memoryFast SSD-based Raid for overflow dataEasy scalability with external PCI express and SAS

Page 7: Computing Infrastructure for Online Monitoring and Control ...accelconf.web.cern.ch/AccelConf/PCaPAC2014/talks/wco201_talk.pdf · 2 S. Chilingaryan et. all Institute for Data Processing

S. Chilingaryan et. all7 Institute for Data Processing and ElectronicsKarlsruhe Institute of Technology

Caching large data sets

A7K2000 (1)

A7K2000 (16)

Intel X25E (1)

Samsung 840 (8)

RamdDisk

1 10 100 1000 10000

28.71

29.74

140

1575.85

2859.88

MB/s

Using SSD drives may significantly increase random access performance to the data sets which are not fitting in memory completely. The big arrays of magnetic hard drives will not help unless multiple readers involved.

Page 8: Computing Infrastructure for Online Monitoring and Control ...accelconf.web.cern.ch/AccelConf/PCaPAC2014/talks/wco201_talk.pdf · 2 S. Chilingaryan et. all Institute for Data Processing

S. Chilingaryan et. all8 Institute for Data Processing and ElectronicsKarlsruhe Institute of Technology

Camera Station

High-speed 4-channel memoryIPMI-based remote controlOptional fast SSD-based storage4x high speed PCI express slots

Camera Link

850 MB/s

PCI express (x8)

Up to 8 GB/s

Infiniband

56 Gbit/s

Asus Z9PA-U8 (Intel C602 Chipset)CPU: Xeon E5-1620v2 ( total 4 cores at 3.7 Ghz)GPUs: NVIDIA GTX TitanMemory: 32 GB (128GB max)Infiniband: Mellanox ConnectX-3 VPI

Page 9: Computing Infrastructure for Online Monitoring and Control ...accelconf.web.cern.ch/AccelConf/PCaPAC2014/talks/wco201_talk.pdf · 2 S. Chilingaryan et. all Institute for Data Processing

S. Chilingaryan et. all9 Institute for Data Processing and ElectronicsKarlsruhe Institute of Technology

NVIDIA GPUDirect

1 core

all cores

0 5 10 15 20 25 30 35

Core i7 950Core i7-980XCore i7 38202 x E5-2640

GB/s

Direct communication between GPUs, Network, and other devices on PCI express bus

memcpyperformance

MVAPICH2 1.9b throughput

H2H

D2D/GPUDirect

D2D

Page 10: Computing Infrastructure for Online Monitoring and Control ...accelconf.web.cern.ch/AccelConf/PCaPAC2014/talks/wco201_talk.pdf · 2 S. Chilingaryan et. all Institute for Data Processing

S. Chilingaryan et. all10 Institute for Data Processing and ElectronicsKarlsruhe Institute of Technology

Cluster Node

4-Way SLILow Price

Asus Z87-WS (Intel Z87 PCH Chipset)CPU: Core i5-4670 ( total 4 cores at 3.4 Ghz)GPUs: 3 x NVIDIA GTX TitanInfiniband: Mellanox ConnectX-3 VPIMemory: 16 GB (32GB max)

NVIDIA GTX TitanMemory: 6 GB at 288 GB/sSingle-precision Gflops: 4500Double-precision Gflops: 1500

Page 11: Computing Infrastructure for Online Monitoring and Control ...accelconf.web.cern.ch/AccelConf/PCaPAC2014/talks/wco201_talk.pdf · 2 S. Chilingaryan et. all Institute for Data Processing

S. Chilingaryan et. all11 Institute for Data Processing and ElectronicsKarlsruhe Institute of Technology

Storage Protocols

Raw

iSER

XFS/Local

XFS/iSER

OCFS2/Local

OCFS2/iSER

FhGFS (over XFS)

Gluster (over XFS)

0 500 1000 1500 2000 2500 3000 3500 4000 4500Write Read MB/s

Network FSNFSSambaSSHFS

Slow

Cluster FSLustre (patched kernel)GlusterFhGFS (close-sourced)

Slow if few nodes

Network DevicesiSCSI (slow)iSER

OCFS2

Page 12: Computing Infrastructure for Online Monitoring and Control ...accelconf.web.cern.ch/AccelConf/PCaPAC2014/talks/wco201_talk.pdf · 2 S. Chilingaryan et. all Institute for Data Processing

S. Chilingaryan et. all13 Institute for Data Processing and ElectronicsKarlsruhe Institute of Technology

Handling High-speed Storage

Buffer Cache

Storage

Data

Default data flow in Linux

Buffer cache significantly limits maximal write performance

Kernel AIO may be used to program IO scheduler to issue read requests without delays

Optimizing I/O for maximum streaming performance using a single data source/receiver

Read

Write

0 500 1000 1500 2000 2500 3000 3500 4000 4500

Buffered Direct AIO MB/s

Page 13: Computing Infrastructure for Online Monitoring and Control ...accelconf.web.cern.ch/AccelConf/PCaPAC2014/talks/wco201_talk.pdf · 2 S. Chilingaryan et. all Institute for Data Processing

S. Chilingaryan et. all14 Institute for Data Processing and ElectronicsKarlsruhe Institute of Technology

UFO Storage Subsystem

20 TB8 TB

Storage Node 1Raid6: 16 Hitachi 7K300, 28TB

20 TB8 TB

Storage Node 2Raid6: 16 Hitachi 7K300, 28TB

8 TB 8 TB

16 TB, XFS

iSer

iSer

SoftRaid Level 0

Master Server

High-speed Streaming storageRead: 2.3 GB/sWrite: 2.5 GB/s

GlusterFS

ComputeNode 2

ComputeNode 3

ComputeNode 1

Client 1 Client 2

NFS

Page 14: Computing Infrastructure for Online Monitoring and Control ...accelconf.web.cern.ch/AccelConf/PCaPAC2014/talks/wco201_talk.pdf · 2 S. Chilingaryan et. all Institute for Data Processing

S. Chilingaryan et. all15 Institute for Data Processing and ElectronicsKarlsruhe Institute of Technology

Software Stack

ALPSAdvanced Linux

PCI ServicesLibUCAUnified Camera

Access

UFOGPU Image

Processing Framework

FastWriterStreaming Library

Tango Control

DeviceMotors

DeviceOther slow device

ConcertControl System

LibPCOPCO Drivers

Camera Station

OpenCL

MPI

UFO

UFO Master Server

Computing Cluster

XFS

iSER

StorageCluster

SoftRaid

PythonGobject-Introspection

KIRO

Page 15: Computing Infrastructure for Online Monitoring and Control ...accelconf.web.cern.ch/AccelConf/PCaPAC2014/talks/wco201_talk.pdf · 2 S. Chilingaryan et. all Institute for Data Processing

S. Chilingaryan et. all16 Institute for Data Processing and ElectronicsKarlsruhe Institute of Technology

KIRO: Integration with Tango

Poster FPI01Using InfiniBand for High-Throughput

Data Acquisition in a TANGO Environment

KIRO Architecture

Tango/Standard

Tango/KIRO

0 500 1000 1500 2000 2500 3000 3500 4000 4500MB/s

Tango over Corba over TCP over Infiniband is slow

Page 16: Computing Infrastructure for Online Monitoring and Control ...accelconf.web.cern.ch/AccelConf/PCaPAC2014/talks/wco201_talk.pdf · 2 S. Chilingaryan et. all Institute for Data Processing

S. Chilingaryan et. all17 Institute for Data Processing and ElectronicsKarlsruhe Institute of Technology

Summary

Poster FPI02Picosecond Sampling Electronics for

Terahertz Synchrotron Radiation

Poster FPI01Using InfiniBand for High-Throughput

Data Acquisition in a TANGO Environment

WCO202Data Management at the

Synchrotron Radiation Facility ANKA

FCO202OpenGL-Based Data Analysis in

Virtualized Self-Service Environments

• Only easy to get off-the-shelf components are used• Our architecture can be easily scaled from a single PC to the cluster with

performance in hundreds of teraflops.• The reliable storage for data streaming with rates over 3 GB/s can be easily build

based on 1 – 2 low-end servers.• The electronic components can be distributed over large area and connected with

high-speed using Optical Infiniband links.• The provided software stack allows easily integrate new devices and processing

algorithms.• The Tango control system is extended to support high-speed communication over

Infiniband

Check related talks and posters:

Page 17: Computing Infrastructure for Online Monitoring and Control ...accelconf.web.cern.ch/AccelConf/PCaPAC2014/talks/wco201_talk.pdf · 2 S. Chilingaryan et. all Institute for Data Processing

S. Chilingaryan et. all18 Institute for Data Processing and ElectronicsKarlsruhe Institute of Technology

Extra Slides

Page 18: Computing Infrastructure for Online Monitoring and Control ...accelconf.web.cern.ch/AccelConf/PCaPAC2014/talks/wco201_talk.pdf · 2 S. Chilingaryan et. all Institute for Data Processing

S. Chilingaryan et. all19 Institute for Data Processing and ElectronicsKarlsruhe Institute of Technology

Ultra Fast X-ray Imaging of Scientific Processes with On-Line Assessment and Data-Driven Process Control

ANKA beam line

Optics and sample manipulators

Smart high-speed camera

Online monitoring and evaluation

Offline storage

UFO

GoalsHigh speed tomographyIncrease sample throughputTomography of temporal processesAllow interactive quality assessment

Enable data driven controlAuto-tunning optical systemTracking dynamic processesFinding area of interest

Page 19: Computing Infrastructure for Online Monitoring and Control ...accelconf.web.cern.ch/AccelConf/PCaPAC2014/talks/wco201_talk.pdf · 2 S. Chilingaryan et. all Institute for Data Processing

S. Chilingaryan et. all20 Institute for Data Processing and ElectronicsKarlsruhe Institute of Technology

UFO Image Processing Framework

acquisitionflat field

correction

. . . . . .

noise reduction

PreprocessingExecuted on CPUs

sinogramgeneration FFT

. . .. . .

filter

ReconstructionExecuted on GPUs

iFFT backprojection

Storage

. . .. . .

Segmentation /meshing

storage of raw data

OpenSourcehttp://ufo.kit.edu

Features➢Easy Algorithm Exchange➢Camera Abstraction➢Pipelined Processing➢Glib/GObject, scripting language support with introspection➢OpenCL + automated management of OpenCL buffers

FiltersTomography & Laminography

FBP, DFI, Algebraic MethodsNon Local Means for Noise ReductionOptical Flow

Page 20: Computing Infrastructure for Online Monitoring and Control ...accelconf.web.cern.ch/AccelConf/PCaPAC2014/talks/wco201_talk.pdf · 2 S. Chilingaryan et. all Institute for Data Processing

S. Chilingaryan et. all21 Institute for Data Processing and ElectronicsKarlsruhe Institute of Technology

ALPS

DMA MemoryMapping

IRQHandling

Driver Access Layer Small & Easy to Maintain

Event EngineIPE Camera

DMA EngineNorthwest Logic DMA

XML Register List + Dynamic Registration API Register Access

Plain and FIFOPCI Memory Access Access Serialization,

Software Registers

pcitoolCommand-line tool

GUIPython/GTK User Interface

Web ServiceRemote Programming API

ScriptingBash, Perl, Ruby, Python

LabVIEWControl System Integration

PCI / PCI Express Board(Variety of FPGA Boards: KIT Camera, etc)