ALICE O2 project. B. von Haller on behalf of the O2 project, 19.05.2015, CERN



Page 1

ALICE O2 project

B. von Haller on behalf of the O2 project

19.05.2015, CERN

Page 2

Overview

▶ O2 Project
  ▶ Upgrade for the Offline and Online computing
  ▶ Members of HLT, DAQ, Offline
  ▶ Build a unified computing system for after LS2

▶ Guided tour of the O2 TDR (submitted to the LHCC on April 20, 2015)
  ▶ Rationales
  ▶ General idea and architecture
  ▶ Computing needs

B. von Haller | O2 Project | 19.05.2015 2

Page 3

Rationales

1. After LS2, the LHC will deliver minimum-bias Pb–Pb collisions at 50 kHz
   ▶ ~100 x higher rate than now

2. Running scenarios
   ▶ Goal: 13 nb−1 for Pb–Pb collisions (minimum bias)

3. Physics topics addressed by the ALICE upgrade
   ▶ Very small signal-to-noise ratio and large background
   ▶ Requires very large statistics
   ▶ Triggering techniques very inefficient if not impossible

Too much data to be stored
Compress data intelligently by processing it online

B. von Haller | TDR EC | 11.03.2015 3

Page 4

Readout

Read-out parameters

Detector          Max read-out rate   Data rate for Pb-Pb     Average data size
                  (kHz)               at 50 kHz (GB/s)        per interaction (MB)
TPC               50                  1012 (92.5%)            20.7
ITS               100                 40 (3.6%)               0.8
TRD               90.9                20 (1.8%)               0.5
MFT               100                 10 (0.9%)               0.2
Other detectors   -                   11.269 (1.2%)           0.25
Total                                 1093                    22.4

Detector links and read-out boards

Number of links              Number of boards
DDL1    DDL2    GBT          CRORC    CRU
15      40      7998         13       463

TPC: continuous read-out to cope with the 50 kHz interaction rate
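
The totals in the read-out parameters table can be cross-checked with a few lines of Python. This is only a sketch: the rates are copied from the table, the function names are illustrative, and the convention 1 GB = 1024 MB is an assumption (the slide does not state it; it appears to reproduce the quoted rates to within rounding).

```python
# Data rates in GB/s for Pb-Pb at 50 kHz, copied from the read-out table.
DATA_RATE_GB_S = {
    "TPC": 1012.0,
    "ITS": 40.0,
    "TRD": 20.0,
    "MFT": 10.0,
    "Other detectors": 11.269,
}


def total_rate(rates):
    """Sum of the per-detector data rates in GB/s."""
    return sum(rates.values())


def share(rates, detector):
    """Fraction of the total data rate contributed by one detector."""
    return rates[detector] / total_rate(rates)


def rate_from_event_size(size_mb, interaction_rate_hz, mb_per_gb=1024):
    """Data rate in GB/s from the average data size per interaction.

    mb_per_gb=1024 is an assumption; the slide does not state the convention.
    """
    return size_mb * interaction_rate_hz / mb_per_gb


if __name__ == "__main__":
    print(f"total rate: {total_rate(DATA_RATE_GB_S):.1f} GB/s")
    print(f"TPC share: {share(DATA_RATE_GB_S, 'TPC'):.1%}")
    print(f"rate from 22.4 MB at 50 kHz: {rate_from_event_size(22.4, 50_000):.1f} GB/s")
```

The TPC dominates the input, which is why the data reduction slides later in the deck focus on it.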

Page 5

O2 architecture (1)

[Diagram: synchronous data flow]

Detectors electronics (TPC, TRD, ITS, …) with trigger and clock
▶ Raw data input: detector data samples interleaved with synchronized heartbeat triggers

FLPs, O(100): local processing
▶ Local aggregation
▶ Time slicing
▶ Buffering
▶ Data Reduction 0, e.g. clustering
▶ Calibration 0 on local data, i.e. partial detector
▶ Tagging
▶ QC
▶ Sub-timeframes, sent on as partially compressed sub-timeframes

Frame dispatch

EPNs, O(1000): global processing
▶ Timeframe building (full timeframe)
▶ Detector reconstruction, e.g. track finding
▶ Calibration 1 on full detectors, e.g. space charge distortion
▶ Data Reduction 1
▶ QC
▶ Compressed timeframes

Storage (O2/T0/T1): CTF, AOD
Archive (T0/T1)

Page 6

O2 architecture (2)

[Diagram: synchronous and asynchronous data flow. Elements shown:]

▶ EPNs (synchronous): compressed timeframes
▶ Storage (O2/T0/T1): CTF, AOD
▶ Archive (T0/T1)
▶ Condition & Calibration Database: CCDB objects
▶ Quality Control: QC data from sub-timeframes, timeframes, compressed timeframes and AOD
▶ Asynchronous processing at O2/T0/T1, O(1): compressed timeframes; ESD, AOD

Page 7

O2 architecture (3)

[Diagram: asynchronous data flow. Elements shown:]

▶ O2/T0/T1, O(1): reconstruction passes and event extraction on compressed timeframes (global reconstruction, Calibration 2, event extraction, tagging, AOD extraction, QC); ESD, AOD
▶ T0/T1: archive
▶ T2, O(10): simulation (reconstruction, event building, AOD extraction, QC); CTF, AOD
▶ Analysis Facilities, O(1): analysis of AOD; histograms, trees
▶ Storage: CTF, AOD

ESD: Event Summary Data
AOD: Analysis Object Data

Page 8

Computing Model

[Diagram: processing chain per facility type, with the number of instances]

▶ O2 (1): RAW -> CTF -> ESD -> AOD
▶ T0/T1 (1..n): CTF -> ESD -> AOD
▶ T2/HPC (1..n): MC -> CTF -> ESD -> AOD
▶ AF (1..3): AOD -> HISTO, TREE

Data exchanged between the facilities: CTF, AOD

Page 9

O2 software design

▶ Message-based multi-processing
  ▶ Ease of development
  ▶ Easy to scale horizontally
  ▶ Possibility to extend with different hardware
  ▶ Multi-threading within processes possible

▶ ALFA: ALICE-FAIR concurrency framework
  ▶ Provides the data transport layer
  ▶ ZeroMQ
  ▶ Arbitrary payload

[Diagram: software layers. Libraries and tools; FairRoot; ALFA; experiment software: Cbm, Panda, ALICE O2, ...]

Page 10

Physics software design: processing workflow

[Diagram: processing steps from raw data to AOD. All FLPs, then EPN: synchronous, asynchronous]

▶ Step 0: raw data, local processing, e.g. clusterization, calibration
▶ Step 1: detector reconstruction, e.g. TPC & ITS track finding
▶ Step 2: inter-detector matching procedures
▶ Step 3: final calibration, 2nd matching
▶ Step 4: final matching, PID, event extraction

Data products: CTF, AOD

Page 11

Technology survey (1): Comparison GPU and CPU for the Fast Cluster Finder

[Plot] Performance of the FPGA-based FastClusterFinder algorithm for DDL1 and DDL2 compared to the software implementation on a recent server PC. The FPGA is the selected platform in this case.

Page 12

Technology survey (2): Comparison CPU vs GPU for the HLT TPC CA Tracker

[Plot] Tracking time of the HLT TPC Cellular Automata tracker on a Nehalem CPU (6 cores) and an NVIDIA Fermi GPU. The GPU is the selected platform in this case.

Page 13

Demonstrators – TPC CA Tracker

[Plot] Verified linear rise of the processing time of TPC track finding for data samples corresponding to a timeframe of 1 ms.

Page 14

Computing requirements for processing

[Table; annotations on its rows:]
▶ Goes together; merging and fitting can run on GPUs too
▶ Being ported to GPU, conversion factor unknown
▶ Theoretically could run on GPU

Computing requirements, total: ~100000 CPU cores, 5000 GPU chips

Page 15

Data reduction – TPC

[Table] A data reduction factor of 20 for the TPC is feasible.

Page 16

Data reduction – Global

[Table] Data rates for input to the O2 system and output to permanent storage for routine data taking with Pb–Pb at 50 kHz interaction rate.

Page 17

Data types characteristics

▶ TF size: duration of the time window (tTF)
  ▶ Data lost at the edges: 0.1/tTF(ms)
  ▶ For calibration and reconstruction: 20 ms - 100 ms
  ▶ Shorter is better for buffering and distribution
  ▶ 20 ms (1000 interactions in Pb-Pb at 50 kHz)
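
The timeframe trade-off above reduces to two small formulas, sketched here in Python. The function names are illustrative, and reading 0.1/tTF(ms) as the fraction of data lost at the timeframe edges is an assumption about the slide's shorthand.

```python
def interactions_per_timeframe(interaction_rate_hz, t_tf_ms):
    """Average number of interactions in a timeframe of t_tf_ms milliseconds."""
    return interaction_rate_hz * t_tf_ms / 1000.0


def edge_loss_fraction(t_tf_ms):
    """Fraction of data lost at the timeframe edges, taken as 0.1 / tTF(ms).

    Interpreting the slide's expression as a fraction is an assumption.
    """
    if t_tf_ms <= 0:
        raise ValueError("timeframe duration must be positive")
    return 0.1 / t_tf_ms


if __name__ == "__main__":
    for t_tf_ms in (20, 50, 100):
        n = interactions_per_timeframe(50_000, t_tf_ms)
        loss = edge_loss_fraction(t_tf_ms)
        print(f"tTF = {t_tf_ms} ms: {n:.0f} interactions, edge loss {loss:.2%}")
```

A longer timeframe loses less at the edges but holds more interactions, which is the buffering and distribution cost the slide refers to.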

Data type      Size            Tape copy
TF (Pb-Pb)     10 GB           No
CTF (Pb-Pb)    1.6 GB          Yes
ESD            15% of CTF      No
AOD            10% of CTF      Yes
MC             100% of CTF     No
MCAOD          30% of ESD      Yes
HISTO          1% of ESD       No
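
As a rough illustration, the relative sizes in the table can be turned into absolute numbers. The sketch below assumes that every percentage refers to the 1.6 GB Pb-Pb CTF (directly or through the ESD); the dictionary layout and names are hypothetical.

```python
TF_GB = 10.0   # uncompressed timeframe (Pb-Pb)
CTF_GB = 1.6   # compressed timeframe (Pb-Pb)


def derived_sizes_gb(ctf_gb=CTF_GB):
    """Sizes in GB of the data types defined relative to the CTF or the ESD."""
    esd = 0.15 * ctf_gb
    return {
        "ESD": esd,
        "AOD": 0.10 * ctf_gb,
        "MC": 1.00 * ctf_gb,
        "MCAOD": 0.30 * esd,
        "HISTO": 0.01 * esd,
    }


def tf_to_ctf_factor(tf_gb=TF_GB, ctf_gb=CTF_GB):
    """Size ratio between a timeframe and its compressed form."""
    return tf_gb / ctf_gb


if __name__ == "__main__":
    for name, size in derived_sizes_gb().items():
        print(f"{name}: {size:.4f} GB")
    print(f"TF / CTF: {tf_to_ctf_factor():.2f}")
```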

Page 18

Data storage requirements

[Table] Number of reconstructed collisions and storage requirements for the running scenarios.

[Table] Number of simulated events and storage requirements.

~55 PB

Page 19

O2 facility design (1)

[Diagram: data flow through the O2 facility]

▶ Detectors: 8100 read-out links, 1.2 TB/s
▶ 250 FLPs, 2 CRUs per FLP
▶ Switching network: 250 input ports, 1500 output ports, 500 GB/s
▶ 1500 EPNs, 2 GPUs per EPN
▶ Storage: 1500 x 60 MB/s, 90 GB/s
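
The throughput figures on this slide can be checked against each other. The sketch below assumes that 1.2 TB/s is the detector input, 500 GB/s the traffic through the switching network and 1500 x 60 MB/s the output to storage, using decimal units; the constant and function names are illustrative.

```python
DETECTOR_INPUT_GB_S = 1200.0  # 1.2 TB/s, assumed to be the detector input
NETWORK_GB_S = 500.0          # assumed FLP-to-EPN traffic
N_EPNS = 1500
EPN_OUTPUT_MB_S = 60.0


def storage_rate_gb_s(n_epns=N_EPNS, per_epn_mb_s=EPN_OUTPUT_MB_S):
    """Aggregate rate to storage in GB/s (1 GB = 1000 MB assumed)."""
    return n_epns * per_epn_mb_s / 1000.0


def reduction_factors():
    """Data reduction per stage and overall, from the slide's rounded rates."""
    storage = storage_rate_gb_s()
    return {
        "FLP stage": DETECTOR_INPUT_GB_S / NETWORK_GB_S,
        "EPN stage": NETWORK_GB_S / storage,
        "overall": DETECTOR_INPUT_GB_S / storage,
    }


if __name__ == "__main__":
    print(f"storage rate: {storage_rate_gb_s():.0f} GB/s")
    for stage, factor in reduction_factors().items():
        print(f"{stage}: factor {factor:.1f}")
```

With these rounded inputs the overall factor comes out slightly above 13, in the same range as the synchronous compression factor of 14 quoted in the conclusion.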

Page 20

O2 facility design (2)

Network layout 2: 4 independent EPN subfarms

[Diagram] FLP 1 to FLP 256, each with 4 x 10 Gb/s; EPNs with 10 Gb/s each.

▶ Network Sub-Farm 1: EPN 1 to EPN 375
▶ Network Sub-Farm 2: EPN 376 to EPN 750
▶ Network Sub-Farm 3: EPN 751 to EPN 1125
▶ Network Sub-Farm 4: EPN 1126 to EPN 1500
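
The EPN numbering of layout 2 splits 1500 EPNs into four equal blocks of 375. A minimal sketch of that mapping follows; the function names are hypothetical and only the ranges come from the slide.

```python
N_EPNS = 1500
N_SUBFARMS = 4
EPNS_PER_SUBFARM = N_EPNS // N_SUBFARMS  # 375


def subfarm_of(epn_id):
    """Sub-farm number (1..4) of an EPN numbered 1..1500."""
    if not 1 <= epn_id <= N_EPNS:
        raise ValueError(f"EPN id out of range: {epn_id}")
    return (epn_id - 1) // EPNS_PER_SUBFARM + 1


def epn_range(subfarm):
    """First and last EPN id (inclusive) of a sub-farm."""
    if not 1 <= subfarm <= N_SUBFARMS:
        raise ValueError(f"sub-farm out of range: {subfarm}")
    first = (subfarm - 1) * EPNS_PER_SUBFARM + 1
    return first, first + EPNS_PER_SUBFARM - 1


if __name__ == "__main__":
    for sf in range(1, N_SUBFARMS + 1):
        print(f"sub-farm {sf}: EPN {epn_range(sf)[0]} to {epn_range(sf)[1]}")
```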

Page 21

O2 facility design (3)

Network layout 3: Super-EPNs

[Diagram] 250 FLPs in groups of 25 (FLP 1 to 25, ..., FLP 226 to 250), 50 Super-EPNs (SEPN 1 to SEPN 50) and 1500 EPNs in groups of 30 (EPN 1 to 30, ..., EPN 1471 to 1500). Link labels in the figure: 40/56 Gb/s, 10 x 40/56 Gb/s, 2 x 40/56 Gb/s and 10 Gb/s.

Page 22

O2 facility design (4): Simulation – Link speed

[Plots]
Left: Network layout 2. Link speed on the FLPs and EPNs for a network layout with 4 EPN subfarms, for 100 parallel transfers from the FLPs.
Right: Network layout 3. Link speed on the FLPs and Super-EPNs (configuration based on an InfiniBand network at 56 Gb/s).

Page 23

O2 facility design (6): Simulation – System scalability

[Plots] Latency of the timeframes for different interaction rates using layout 2 (left) and layout 3 (right).

Layout 2 is cheaper but scales only up to 90 kHz.

Page 24

O2 facility – Power and cooling

Page 25

Schedule

[Timeline 2015 to 2020; today: 6/15]

Detectors milestones
▶ ITS half-layer test, 1/17
▶ TPC read-out test, 4/17
▶ ITS surface test, 9/18
▶ TPC RCUs installation CR11/19
▶ Data taking, cosmics with core detectors, 7/19
▶ TPC pre-commissioning on surface, 7/19
▶ TPC commissioning in cavern, 1/20
▶ End of commissioning, 6/20

O2 milestones
▶ O2 system v1 (1 CRU, 1 FLP, basic data processing, control, logging, QC, monitoring), 1/17
▶ O2 system v2 (1 detector, e.g. ITS, with full read-out capability), 4/18
▶ 10% of data processing and storage HW installation, 9/18
▶ 90% of data processing and storage HW installation, 11/19
▶ Full system ready, 2/20

Page 26

Conclusion

▶ O2 is a new project with very ambitious requirements
  ▶ > 1 TB/s detector input, ~100x more than today
  ▶ Online synchronous compression factor of 14

▶ Major paradigm change with combined offline and online computing
  ▶ 1 framework
  ▶ 1 facility

▶ Challenging schedule
  ▶ TDR submitted

Page 27

▶ TDR draft available here: https://cds.cern.ch/record/2011297

▶ Thank you for your attention

Page 28

Introduction (Chapter 1)

1. After LS2, the LHC will deliver minimum-bias Pb-Pb collisions at 50 kHz
   ▶ 100 x more data than today

2. Physics topics addressed by the ALICE upgrade
   ▶ Very small signal-to-noise ratio and large background
   ▶ Triggering techniques very inefficient if not impossible
   ▶ Needs large statistics

3. Running scenarios
   ▶ Goal: 13 nb−1 for Pb–Pb collisions (minimum bias)

Too much data to be stored
Compress data intelligently by processing it online

[Diagram: functional flow]
▶ Detectors electronics: continuous and triggered streams of raw data
▶ Readout, data aggregation, local data processing: compressed sub-timeframes
▶ Data aggregation, synchronous global data processing: compressed timeframes
▶ Data storage and archival: compressed timeframes
▶ Asynchronous data processing, event extraction: reconstructed events

Page 29

O2 software design (3): Chapter 7 – Data Format

[Diagram: sub-timeframe layout in the memory buffer of FLP #125]

▶ Data link view: on each link (Link #1 to Link #N), header and payload blocks follow each other in time, as triggered or continuous data. Trigger heartbeat events (HB #376453, HB #376454) delimit the time window (frame length); other triggers fall in between.
▶ Memory view: the blocks sit in the FLP memory buffer at addresses such as 0x2AE06A0, 0x2FC03E0, 0x34AED30, 0x39CC120 and 0x3D21EF0.
▶ Sub Time Frame descriptor: Multiple Data Headers (MDH) referring to the blocks, for single events and for correlated events.
▶ Block identifiers have the form FLPid_DDLid_counter, e.g. 125_1_0, 125_1_1, 125_1_2, 125_1_17, 125_1_20, 125_2_0, 125_2_1, 125_2_2.
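
The FLPid_DDLid_counter labels in the figure are simple enough to parse mechanically. The following Python sketch is purely illustrative: the `BlockId` type and both helper functions are hypothetical and not part of the O2 data format; only the three-field underscore-separated layout comes from the slide.

```python
from typing import NamedTuple


class BlockId(NamedTuple):
    """Hypothetical container for an FLPid_DDLid_counter label."""
    flp_id: int
    ddl_id: int
    counter: int


def parse_block_id(text):
    """Parse a label such as '125_1_17' into its three integer fields."""
    parts = text.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected FLPid_DDLid_counter, got {text!r}")
    return BlockId(*(int(p) for p in parts))


def format_block_id(block):
    """Inverse of parse_block_id."""
    return f"{block.flp_id}_{block.ddl_id}_{block.counter}"


if __name__ == "__main__":
    labels = ["125_2_0", "125_1_17", "125_1_2", "125_1_0"]
    # Sorting the parsed tuples orders blocks by FLP, then link, then counter.
    for block in sorted(parse_block_id(label) for label in labels):
        print(format_block_id(block))
```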

Page 30

O2 software design (2): Chapter 7 – Control, Configuration and Monitoring

▶ Facility control, configuration and monitoring (CCM)
  ▶ CCM will combine control of data taking and of asynchronous data processing
  ▶ 140000 commands to 70000 processes (peak)
  ▶ 600 kHz monitoring data

[Diagram] Control, Configuration and Monitoring exchanges commands/configuration data and status/monitoring data with the O2 processes, and status, commands and grid jobs with LHC, Trigger, DCS and Grid.

Page 31

O2 software design (4): Chapter 7 – DCS

▶ Dedicated FLP for DCS
▶ An O2 process retrieves conditions data and inserts them into DCS data frames
▶ The required DCS data are embedded in the data
▶ They are available for reconstruction and calibration after the frame building

Page 32

Physics programme and data taking scenarios (Chapter 2)

ALICE running scenarios (Lint in pb-1 for pp, in nb-1 for Pb-Pb and p-Pb):

Year   System   √sNN (TeV)   Lint    Ncollisions
2020   pp       14           0.4     2.7 · 10^10
       Pb-Pb    5.5          2.85    2.3 · 10^10
2021   pp       14           0.4     2.7 · 10^10
       Pb-Pb    5.5          2.85    2.3 · 10^10
2022   pp       14           0.4     2.7 · 10^10
       pp       5.5          6       4 · 10^11
2025   pp       14           0.4     2.7 · 10^10
       Pb-Pb    5.5          2.85    2.3 · 10^10
2026   pp       14           0.4     2.7 · 10^10
       Pb-Pb    5.5          1.4     1.1 · 10^10
       p-Pb     8.8          50      10^11
2027   pp       14           0.4     2.7 · 10^10
       Pb-Pb    5.5          2.85    2.3 · 10^10
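
The yearly Pb-Pb luminosities in this table can be summed and compared with the 13 nb−1 goal stated in the rationales. The sketch below copies the rows from the table and assumes that the Pb-Pb values are in nb-1 and the pp values in pb-1; the tuple layout and function name are illustrative.

```python
# (year, system, sqrt(s_NN) in TeV, integrated luminosity) from the table.
# Assumed units: pb^-1 for pp, nb^-1 for Pb-Pb and p-Pb.
SCENARIOS = [
    (2020, "pp", 14.0, 0.4), (2020, "Pb-Pb", 5.5, 2.85),
    (2021, "pp", 14.0, 0.4), (2021, "Pb-Pb", 5.5, 2.85),
    (2022, "pp", 14.0, 0.4), (2022, "pp", 5.5, 6.0),
    (2025, "pp", 14.0, 0.4), (2025, "Pb-Pb", 5.5, 2.85),
    (2026, "pp", 14.0, 0.4), (2026, "Pb-Pb", 5.5, 1.4),
    (2026, "p-Pb", 8.8, 50.0),
    (2027, "pp", 14.0, 0.4), (2027, "Pb-Pb", 5.5, 2.85),
]


def integrated_luminosity(scenarios, system, energy_tev=None):
    """Sum the luminosity of all rows for one system, optionally one energy."""
    return sum(
        lumi
        for _year, row_system, row_energy, lumi in scenarios
        if row_system == system and (energy_tev is None or row_energy == energy_tev)
    )


if __name__ == "__main__":
    total_pbpb = integrated_luminosity(SCENARIOS, "Pb-Pb")
    print(f"Pb-Pb total: {total_pbpb:.2f} nb^-1 (goal: 13 nb^-1)")
```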

Page 33

Requirements (1): Input rates

Detector   Max read-out rate   Data rate for Pb-Pb     Average data size
           (kHz)               at 50 kHz (GB/s)        per interaction (MB)
ACO        100                 0.014                   0.00028
CTP        200                 0.02                    0.0004
EMC        50                  4                       0.08
FIT        50                  0.115                   0.023
HMP        2.5                 0.06                    0.024
ITS        100                 40 (3.6%)               0.8
MCH        100                 2.2                     0.04
MFT        100                 10 (0.9%)               0.2
MID        100                 0.3                     0.006
PHS        50                  2                       0.04
TOF        200                 2.5                     0.05
TPC        50                  1012 (92.5%)            20.7
TRD        90.9                20 (1.8%)               0.5
ZDC        100                 0.06                    0.0012
Total                          1093                    22.4

Page 34

Requirements (2): Read-out

Link types: DDL1, DDL2, GBT. Read-out board types: CRORC, CRU.

Detector   Number of links   Number of read-out boards
ACO        1                 1
EMC        20                4
FIT        2                 1
HMP        14                3
ITS        495               23
MCH        480               20
MFT
MID        2                 2
PHS        16                3
TOF        72                3
TPC        5904              360
TRD        1044              54
ZDC        1                 1
CTP        2                 1

Total links: 15 DDL1, 40 DDL2, 7998 GBT
Total read-out boards: 13 CRORC, 463 CRU

Page 35

Project organisation (1): Chapter 11

Tasks, institutes and human resources (FTE):

▶ Architecture: CERN, FIAS, GSI, IRI. 2 FTE
▶ Tools, procedure and software process: CERN, IPNO, JU, LIPI, WRCP. 2 FTE
▶ Data flow, detector read-out: CALTECH, CERN, FESB, FIAS, IRI, LIPI, WRCP. 12 FTE
▶ Computing platforms: CERN, FIAS, IRI, JU, KISTI, KMUTT, KU, ORNL. 12 FTE
▶ Software framework and data model: CERN, IPNO, GSI, LBNL. 14 FTE
▶ Calibration: JU, WSU. 16 FTE
▶ Reconstruction: CERN, FESB, GSI, IPHC, LIPI, LPC, SUBATECH, UH, WSU. 16 FTE
▶ Physics simulation: CERN, CU, IPHC, IPNO, LBNL, ORNL, UH, UTK. 14 FTE
▶ Data Quality monitoring and visualization: CERN, ISS, JU, WUT. 6 FTE
▶ Control, configuration, monitoring and logging: ASCR, CALTECH, CERN, CU, KMUTT, IRI. 10 FTE
▶ O2 facility hardware procurement, installation: CERN, FIAS, IRI, GSI. 8 FTE
▶ O2 facility and grid/cloud operations: CERN, KISTI. M&O

Total: 112 FTE for the period 2015-19
Compatible with the 120 FTEs from institutes

Page 36

O2 facility design (7): Chapter 10

▶ Demonstrators, e.g.
  ▶ Existing HLT TPC algorithms interfaced to the new ALFA framework
  ▶ HLT development cluster infrastructure with ~40 nodes, 30 nodes with GPU hardware
  ▶ FLP and EPN data distribution and transport devices

▶ Verified in the prototype
  ▶ TPC reconstruction topology using 2011 PbPb data
  ▶ FLP-EPN data transport network with 36 FLPs and 28 EPNs
  ▶ Reproduced the performance of HLT TPC processing in ALFA
  ▶ Verified linear rise of processing time of TPC track finding for data samples corresponding to a timeframe of 1 ms

▶ Ongoing work

Page 37

Project organisation (3): Chapter 11

Milestones relative to the framework and the facility at P2 (marked A, B, C on the timeline)
• Q1 2017: Version 1 (A), 1 CRU + QC (e.g. ITS half-layer test)
• Q2 2018: Version 2 (B), 1 detector full read-out (e.g. ITS or TPC surface test)
• Q4 2019: P2 installation and commissioning (C), all FLPs, 10% EPNs
• Q2 2020: Production, full deployment

Page 38

TDR Schedule

[Timeline February to June 2015]

▶ 5/2/2015 - 18/2/2015: Comments on the TDR by the O2 project members
▶ 19/2/2015 - 1/3/2015: TDR editing
▶ 23/2/2015 - 27/2/2015: Proof-reading (Frank)
▶ 2/3/2015 - 15/3/2015: Comments on the TDR by the whole ALICE Collaboration
▶ 17/3/2015: ALICE internal review
▶ 18/3/2015 - 5/4/2015: Modification by the authors
▶ 6/4/2015 - 19/4/2015: Final editing of the TDR before submission
▶ 20/4/2015: Submission of the TDR to the LHCC
▶ 20/4/2015 - 31/5/2015: LHCC review
▶ 1/6/2015 - 4/6/2015: LHC Committee
▶ 2/6/2015: Presentation of the TDR to the LHCC

• Prof. Borut Paul Kersevan, ATLAS (former computing coordinator)
• Tonko Ljubicic, BNL, STAR (Online project leader)
• Niko Neufeld, CERN, LHCb (Online)

Page 39

O2 facility design (5): Simulation – Bisection data traffic

[Plots] Bisection data traffic in the system for one of the 4 EPN subfarms of layout 2 (left) and for the whole of layout 3 (right).
