LECC2003 Amsterdam
Matthias Müller

A RobIn Prototype for a PCI-Bus based Atlas Readout-System

B. Gorini, M. Joos, J. Petersen (CERN, Geneva)
A. Kugel, R. Männer, M. Müller, M. Yu (University of Mannheim)
B. Green (Royal Holloway University London)
G. Kieft (NIKHEF, Amsterdam)




Outline

• Overview
• The Atlas Readout Sub-System (ROS)
• PCI based Atlas ROS
• The RobIn Prototype
• Measurements
• Conclusions


Overview

• The PCI-based ROS is one of the two implementation options for the Atlas ROS.
   - Uses a custom PCI board, the RobIn, for receiving and buffering data
   - Host is a PC with multiple PCI buses
   - Gigabit Ethernet connection to LVL2 and EF
   - PC runs multithreaded software and a Master-DMA based PCI messaging scheme

• Data request rates of 170 kHz at 1 kB fragment size measured

• Full scale system achieves an LVL1 rate of 130 kHz at 1 kB fragment size (with GE Net I/O)


Atlas Readout Subsystem Overview

• Buffers detector data while LVL2 computes the trigger decision

• 1600 links from the detector

• Up to 160 MB/s input bandwidth, 100 kHz input rate

• 2 kHz output to LVL2 on request via Gigabit Ethernet

• Output to Event Filter on event accept (~3 kHz)
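As a rough cross-check of the input figures above, the following sketch computes the data rate per link and summed over all links. It assumes one 1 kB fragment per link per LVL1 accept (a fragment size taken from the measurement slides, not from this slide), decimal units, and that the 160 MB/s figure is per link; the function name is illustrative only.

```python
def link_input_rate_mb_s(lvl1_rate_hz, fragment_bytes):
    """Average input data rate on one link in MB/s (decimal megabytes)."""
    return lvl1_rate_hz * fragment_bytes / 1e6


# Assumption: 1 kB fragment per link for every LVL1 accept.
per_link_mb_s = link_input_rate_mb_s(100_000, 1000)

# Sum over the 1600 links, in GB/s.
aggregate_gb_s = per_link_mb_s * 1600 / 1000

# The average rate must stay below the 160 MB/s link bandwidth.
fits_link_bandwidth = per_link_mb_s <= 160

print(per_link_mb_s, aggregate_gb_s, fits_link_bandwidth)
```

Under these assumptions the average load per link stays below the link bandwidth, which leaves headroom for larger fragments or rate bursts.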

[Figure: block diagram of the ATLAS Detector, the ROS, LVL2 and EF, with the data paths between them and the decision path from LVL2.]


Atlas Readout Subsystem

[Figure: layout of the Atlas Readout Subsystem. Recoverable labels:]

• In USA15 (underground): 90 ROD crates (~40 racks), each with RODs and a ROD Crate Processor (RCP) on a VME bus
• 1600 ROLs (HOLA S-link, 160 MByte/s per link) carry the data
• In SDX15 (at surface): 144 4U PCs (~15 racks), each with RobIns and a NIC on the PCI bus
• Gigabit Ethernet links to the LVL2 & Event Builder networks
• Also shown: Config & Control, Event sampling & Calibration data, alternative data paths


PCI based Atlas ROS: Hardware

• Available: 2 GHz, 2.4 GHz and 3 GHz Xeon PCs
• OS: Linux CERN RedHat 7.3, 2.4.18 kernel (patched)

[Figure: block diagram of the ROS PC. The CPU (2.4 GHz) and DDR RAM (~2 GB/s) connect to four 64-bit/66 MHz PCI buses (532 MB/s each). Slots 1 to 6 hold RobIns and a Gigabit Ethernet card (GEth); SCSI and 2x FE/GE are also attached.]


PCI based Atlas ROS: Software

• ROS software is multi-threaded
• Fragment Manager interface for RobIn hardware abstraction

[Figure: structure of the ROS software. Requests (L2, EB, Delete) enter a Request Queue served by the Request Handlers, which obtain fragments from the RobIns over the PCI bus through the Fragment Manager. Trigger and control/error paths are also shown; the legend distinguishes threads from processes.]
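The request queue and handler structure described on this slide can be sketched as a small thread pool. This is a minimal illustration, not the ROS code: `FakeFragmentManager`, `request_handler` and `serve` are hypothetical names, and a dictionary stands in for the RobIn hardware behind the Fragment Manager interface.

```python
import queue
import threading


class FakeFragmentManager:
    """Hypothetical stand-in for the Fragment Manager; a dict replaces the RobIn hardware."""

    def __init__(self, fragments):
        self._lock = threading.Lock()
        self._fragments = dict(fragments)

    def get_fragment(self, event_id):
        with self._lock:
            return self._fragments.get(event_id)

    def delete(self, event_id):
        with self._lock:
            self._fragments.pop(event_id, None)


def request_handler(requests, manager, responses):
    """One Request Handler thread: serve requests until a None sentinel arrives."""
    while True:
        request = requests.get()
        if request is None:
            break
        kind, event_id = request
        if kind == "Delete":
            manager.delete(event_id)
        else:  # "L2" or "EB" data request
            responses.put((kind, event_id, manager.get_fragment(event_id)))


def serve(request_list, manager, n_handlers=4):
    """Push all requests through a pool of handler threads and collect the responses."""
    requests, responses = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=request_handler, args=(requests, manager, responses))
        for _ in range(n_handlers)
    ]
    for thread in threads:
        thread.start()
    for request in request_list:
        requests.put(request)
    for _ in threads:
        requests.put(None)  # one sentinel per handler
    for thread in threads:
        thread.join()
    collected = []
    while not responses.empty():
        collected.append(responses.get())
    # Handlers finish in arbitrary order, so sort for a stable result.
    return sorted(collected, key=lambda r: r[:2])
```

The number of handler threads is a parameter because the measurements vary it; the sketch makes no claim about which value is best.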


The RobIn Prototype

[Figure: the RobIn Prototype board, with the following components labelled:]

• 2 optical HOLA S-Link input channels
• Xilinx XC2V1500 FPGA
• 128 MB SDRAM buffer
• 4 MB SRAM management RAM
• IBM PowerPC 405CR processor
• PLX 9656 PCI bridge
• Gigabit Ethernet interface


The RobIn Prototype (2)

[Figure: message flow between the ROS software and the RobIn over the PCI bus. ROS software side: Fragment Manager and DMA buffer. RobIn side: PowerPC, PLX9656, FPGA, DMA engine, request FIFO, DMA FIFO and event data buffer. Messages: Data Request, Clear Request, Data Response, Event Data.]

• Requests to the RobIn are sent
   - by PCI single cycles (data requests)
   - by PLX Bus Master DMA (clear requests)

• Event data from the RobIn:
   - The FPGA sends the fragment without its first word
   - The first word is transmitted last to signal end-of-transfer
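The end-of-transfer signalling above can be illustrated with a small simulation. It is a sketch under assumptions that are not stated on the slide: the host clears the first word of its DMA buffer before issuing a request, a valid first word is never zero, and the host detects completion by polling that word. The names `EMPTY`, `send_fragment` and `receive_fragment` are hypothetical.

```python
EMPTY = 0  # assumption: the host clears word 0, and a valid first word is non-zero


def send_fragment(buffer, fragment):
    """Model the sender: each step writes one word, and word 0 is written last."""
    for i in range(1, len(fragment)):
        buffer[i] = fragment[i]
        yield
    buffer[0] = fragment[0]  # the first word arrives last and marks completion
    yield


def receive_fragment(buffer, sender, n_words):
    """Model the host: clear word 0, then poll it while the sender makes progress."""
    buffer[0] = EMPTY
    polls = 0
    for _ in sender:  # each iteration lets the sender write one more word
        polls += 1
        if buffer[0] != EMPTY:
            break
    return buffer[:n_words], polls


fragment = [0xB0F00000, 3, 0x11, 0x22, 0x33]  # made-up header, length and payload
dma_buffer = [EMPTY] * 8
received, polls = receive_fragment(
    dma_buffer, send_fragment(dma_buffer, fragment), len(fragment)
)
print(received == fragment, polls)
```

Because the first word is written only after all other words, a non-zero first word implies that the rest of the fragment is already in the buffer, so no separate completion message is needed.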


Measurements

• Initially RobIn Prototype not available

• All presented measurements (except one) with alternative RobIn hardware MPRACE1

• MPRACE1: general-purpose PCI based FPGA co-processor
   - FPGA and PCI bridge identical to the RobIn Prototype
   - FPGA-only board, no PowerPC processor available
   - Implements the same PCI messaging as the RobIn Prototype

• Measurements on three different PCs: a 2GHz Xeon, a 2.4GHz Xeon and a 3GHz Xeon


Measurements: Multi-threading

• Bare data request performance with 1 RobIn, no I/O to Gigabit Ethernet

• Variation of Request Handler threads shows maximum at 14

[Figure: Variation of Request Handlers. 2.4 GHz PC, 1 MPRACE RobIn, no Net I/O, 1 kB fragments. x-axis: # of Request Handlers (0 to 35); y-axis: Request Rate [kHz] (0 to 180).]


Measurements: Fragment Size Dependency

• MPRACE: up to 512 bytes, the fixed request overheads overlap with the fragment data transfers returning from the RobIn, so the request rate depends only weakly on fragment size.

• RobIn Prototype: the comparison with MPRACE seems to be valid; up to 1 kB there is no fragment size dependency.

[Figure: Fragment Size Dependency. 2.4 GHz PC, no Network I/O, 1 RobIn. x-axis: Fragment Size [bytes] (0 to 1800); y-axis: Request Rate [kHz] (0 to 250). Series: MPRACE, RobIn Prototype.]


Measurements: Influence of DC I/O

• 4 ROLs per RobIn (MPRACE) emulated

• Network I/O to LVL2 and EF reduces performance by a factor of 3.

[Figure: Effect of Network I/O. x-axis: Fraction of LVL1 events accepted by LVL2 trigger (%) (0 to 12); y-axis: Maximum sustainable LVL1 rate (kHz) (0 to 600). Series: With Net I/O, Without Net I/O. Conditions: rate of LVL2 data requests per ROS = 16% of LVL1 rate; LVL2 mean data size = 1.4 KByte.]


Measurements: DC I/O and CPU scalability

• 4 ROLs per RobIn (MPRACE) emulated

• Moving from a 2 GHz to a 3 GHz PC improves performance by ~25%.

[Figure: CPU scalability. x-axis: Fraction of LVL1 events accepted by LVL2 trigger (%) (0 to 12); y-axis: Maximum sustainable LVL1 rate (kHz) (0 to 200). Series: ROS PC @ 3 GHz, ROS PC @ 2 GHz. Conditions: rate of LVL2 data requests per ROS = 12% of LVL1 rate; LVL2 mean data size = 1 KByte.]


Conclusions

• Max. request performance per RobIn is 170 kHz (1kB fragment size).

• “Standalone” ROS can handle 12 ROLs on 3 RobIns at 300 kHz LVL1 input rate.

• Full scale ROS System (3GHz Xeon PC) handles 130 kHz LVL1 input rate (> Atlas requirements)

• First measurements with RobIn Prototype confirm the results obtained with an earlier prototype (MPRACE).


RobIn (MPRACE1)

[Figure: the MPRACE1 board, with the following components labelled:]

• PLX9656 (PCI connection)
• Xilinx VirtexII FPGA
• Control PLD
• 2 expansion connectors
• 4 x ZBT SRAM 2 MB
• SDRAM socket
• Local bus 32 bit/66 MHz
• PCI bus 64 bit/66 MHz

• Parts common to the RobIn Prototype: PLX PCI bridge, local bus, FPGA

• Firmware implements the RobIn Prototype message passing protocol

• On-board “local” bus limited to 266 MB/s (half of max. PCI throughput)
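The factor of two between the local bus and the PCI bus follows directly from the bus widths at equal clock frequency. A minimal sketch of the arithmetic, assuming one transfer per clock cycle and decimal megabytes (the function name is illustrative):

```python
def bus_bandwidth_mb_s(width_bits, clock_mhz):
    """Peak bandwidth in MB/s, assuming one transfer of width_bits per clock cycle."""
    return width_bits / 8 * clock_mhz


local_bus = bus_bandwidth_mb_s(32, 66)  # on-board local bus, 32 bit / 66 MHz
pci_bus = bus_bandwidth_mb_s(64, 66)    # PCI bus, 64 bit / 66 MHz

print(local_bus, pci_bus, local_bus / pci_bus)
```

With a clock of exactly 66 MHz this gives 264 MB/s and 528 MB/s; the 266 MB/s and 532 MB/s quoted on the slides correspond to the slightly higher nominal PCI clock. The ratio of one half is independent of that rounding.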


Measurements: Influence of DC I/O

• 4 ROLs per RobIn (MPRACE) emulated

• Network I/O to LVL2 and EF reduces performance to 1/3

• Large EB fractions: performance limited by GE line speed

• Small EB fractions: performance limited by the PC’s computing power
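The line-speed limit at large EB fractions can be estimated with a short calculation. This is a simplified sketch, not the authors' model: it assumes that every built event ships one 1 kB fragment from each of the 12 ROLs over a single Gigabit Ethernet link, and it ignores LVL2 traffic and protocol overhead. The constant and function names are illustrative.

```python
GE_LINE_SPEED_BYTES_S = 125_000_000  # 1 Gbit/s expressed in bytes/s, no protocol overhead


def max_lvl1_rate_khz(eb_fraction, n_rols=12, fragment_bytes=1000,
                      line_speed=GE_LINE_SPEED_BYTES_S):
    """LVL1 rate (kHz) at which Event Builder output alone saturates the link."""
    bytes_per_built_event = n_rols * fragment_bytes
    return line_speed / (eb_fraction * bytes_per_built_event) / 1000


for fraction in (0.03, 0.05, 0.10):
    print(f"EB fraction {fraction:.0%}: {max_lvl1_rate_khz(fraction):.0f} kHz")
```

Under these assumptions the bound falls inversely with the EB fraction, to roughly 100 kHz at an EB fraction of 10%. At small EB fractions the bound is far above the measured rates, consistent with the statement that the PC's computing power is then the limit.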

[Figure: Max. LVL1 Rate for 12 ROLs on 3 RobIns. 2% LVL2 rate, 1 kB fragments, DC I/O uses UDP. x-axis: EB Fraction (%) (0 to 12); y-axis: Max L1 rate (kHz) (0 to 300). Series: No DC I/O, DC I/O 2.4 GHz PC, DC I/O 3 GHz PC. Reference markers: 100 kHz, Gigabit Ethernet Line Speed, 3 kHz Atlas Baseline.]


Measurements: Multiple PCI Buses

• Request rate decreases even though the PCI bus is not saturated. Low parallelism in software?

[Figure: Request Rate Depending on the # of Boards. 2.4 GHz PC, MPRACE on different buses, 8 Request Handlers, 1 kB fragments. x-axis: # of Boards (0 to 5); left y-axis: Request Rate [kHz] (0 to 180); right y-axis: Data Volume [MB/s] (0 to 180).]