Summary of the “one day data challenge” for the heavy ion run

Alberto Pace, for the IT-DSS group

Data & Storage Services, CERN IT Department, CH-1211 Genève 23, Switzerland (www.cern.ch/it)



Page 1: Summary of the “one day data challenge” for the heavy ion run

Alberto Pace, for the IT-DSS group

Page 2: Summary of the  “one day data challenge”  for the heavy ion run


Current situation

• The ALICE and CMS disk pools have been extended so that all the data can be kept on disk for the entire duration of the run.

– In addition to the tape copy, this gives multiple independent copies at CERN, since Tier-1 replication may be delayed.

• Current pool sizes (TB):

– ALICE: 2,412 (T0) + 2,542 (ALICEDISK)

– CMS: 1,205 (T0) + 1,005 (T0STREAMER) + 1,727 (CMSCAF) + 1,618 in various additional pools

– ATLAS: 4,603 in various pools

– LHCb: 1,301 in various pools
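The per-experiment totals are not stated on the slide, but they follow directly from the listed pool sizes; a quick sketch of the sums (all figures in TB, taken from the slide):

```python
# Per-experiment totals of the CASTOR disk pool sizes listed above (TB).
# Pool names and sizes come from the slide; only the totals are computed here.
pools = {
    "ALICE": {"T0": 2412, "ALICEDISK": 2542},
    "CMS":   {"T0": 1205, "T0STREAMER": 1005, "CMSCAF": 1727, "other": 1618},
    "ATLAS": {"various": 4603},
    "LHCb":  {"various": 1301},
}

for experiment, sizes in pools.items():
    print(f"{experiment}: {sum(sizes.values()):,} TB total")
```

With the slide's numbers this gives 4,954 TB for ALICE and 5,555 TB for CMS.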

Page 3: Summary of the  “one day data challenge”  for the heavy ion run


The HI test

• Planned for Nov 1-4 but brought forward to Oct 22 for (only) 24 hours. CASTOR was upgraded for the HI run just in time (on Oct 19). The test took place during the CHEP conference.

• ALICE:

– Test lasted 24 hours as expected (from 15:30 on 21/10 until 16:00 on 22/10)

– Sustained data rate of 2.5-3 GB/s, with peaks at 7 GB/s (intra-pool replications)

– Average file size of 3 GB; efficient use of tape drives

• CMS:

– Test started late and, due to intra-pool replication delays, data arrived on the T0 pool only from 4:00 on 22/10 and lasted only 6-7 hours

– Data rate of 1.2 GB/s, with peaks at 2.6 GB/s

– Average file size of 30 GB; efficient use of tape drives

• CASTOR handled all the data successfully, with no indication of bottlenecks or scalability issues
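The quoted rates and durations imply rough total volumes; a back-of-envelope sketch (the durations and the ALICE midpoint are approximations read off the slide, not figures it states):

```python
# Rough data volumes implied by the sustained rates quoted above.
# The rates come from the slide; the durations are approximate, and the
# ALICE rate is taken as the midpoint of the quoted 2.5-3 GB/s range.
def volume_tb(rate_gb_s: float, hours: float) -> float:
    """Total volume in TB for a sustained rate held over a duration."""
    return rate_gb_s * hours * 3600 / 1000

print(f"ALICE: ~{volume_tb(2.75, 24.5):.0f} TB")  # ~24.5 h at ~2.75 GB/s
print(f"CMS:   ~{volume_tb(1.2, 6.5):.0f} TB")    # ~6.5 h at 1.2 GB/s
```

These estimates (roughly 240 TB for ALICE and under 30 TB for CMS) are consistent with the 250 TB single-day total reported later in the deck.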

Page 4: Summary of the “one day data challenge” for the heavy ion run

[Same slide text as page 3, overlaid with a plot of the ALICE data rates]

Page 5: Summary of the “one day data challenge” for the heavy ion run

[Same slide text as page 3, overlaid with plots of the ALICE and CMS data rates]

Page 6: Summary of the  “one day data challenge”  for the heavy ion run


Seen from the TAPE side

[Plots of the ALICE and CMS tape write rates: only 2 hours of overlap between the two tests]

Page 7: Summary of the  “one day data challenge”  for the heavy ion run


The TAPE subsystem

• The aggregated throughput was well absorbed by the tape subsystem, at a sustained rate of 4 GB/s (adding the red and blue lines in the plot).

• We used fewer than 50 drives, meaning an average performance exceeding 80 MB/s per drive, with peaks at 110-120 MB/s.
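The per-drive figure follows directly from the aggregate rate; a quick check of the arithmetic (using 1 GB = 1000 MB, as the slide's round numbers imply):

```python
# Sanity check of the per-drive figure quoted above: 4 GB/s aggregate
# throughput over fewer than 50 drives.
aggregate_gb_s = 4.0   # sustained aggregate tape throughput (GB/s)
max_drives = 50        # upper bound on the number of drives used

# With exactly 50 drives this gives 80 MB/s; with fewer, the average is higher.
per_drive_mb_s = aggregate_gb_s * 1000 / max_drives
print(f"average per-drive rate >= {per_drive_mb_s:.0f} MB/s")
```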

Page 8: Summary of the  “one day data challenge”  for the heavy ion run


Validated Castor 2.1.9 features

• The number of READ mounts did not interfere with production, as we now have ceilings for concurrent reads.

• The plot below shows the read mounts (in blue), limited to 40 concurrent since the CMS upgrade on the 19th.

• The write mounts related to the test are shown in grey.

[Plot: concurrent read ceiling in effect since 19/10]

Page 9: Summary of the  “one day data challenge”  for the heavy ion run


A new record for the tape subsystem

• 250 TB of raw data written in a single day!

Page 10: Summary of the  “one day data challenge”  for the heavy ion run


Conclusion

• From the Tier 0 perspective, the test was successful.

• However, the full validation is not complete:

– The two tests overlapped for only 2 hours.

– The data rate from CMS was less than requested. Is this what we should expect from CMS, or will it increase during production?

– ATLAS was not sending data.

– The data we received was not white noise, and we therefore achieved a higher compression factor on tape.

• On 22/10, we wrote 153 tapes for 250 TB
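The compression point can be checked against the cartridge count; a sketch, assuming 1 TB native cartridges (the slide does not state the tape model, so the native capacity here is an assumption):

```python
# Implied per-cartridge volume and compression factor from the slide's
# figures: 250 TB written on 153 tapes on 22/10.
# NOTE: the 1 TB native cartridge capacity is an assumption; the slide
# does not say which tape model was used.
data_tb = 250
tapes = 153
native_tb = 1.0  # assumed native capacity per cartridge

per_tape = data_tb / tapes
print(f"{per_tape:.2f} TB written per cartridge")
print(f"implied compression factor ~{per_tape / native_tb:.2f}x")
```

Roughly 1.63 TB per cartridge, which is why the easily-compressible test data overstates the compression to expect from real physics data.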