ALICE Operations short summary, ALICE Offline week, June 15, 2012


Page 1: ALICE Operations short summary

ALICE Offline week, June 15, 2012

Page 2: Data taking in 2012

• Stable operation, steady data taking

[Plot: accumulation of RAW data since the beginning of the 2012 run – 450 TB of physics data in total]

Page 3: RAW Data processing

• RAW data is subject to the CPass0/CPass1 schema
• See the session on Thursday morning
• Most of the RAW data this year has been reconstructed 'on demand'
• Replication follows the standard schema, no issues
• The largest RAW production was LHC11h (PbPb) Pass2
• Processing of 2012 data will start soon…

Page 4: MC productions in 2012

• So far… 62 production cycles, p+p and Pb+Pb
• Various generators, signals, detectors
• More realistic – use of RAW OCDB and anchor runs for all productions
• Presently running large-scale LHC11h productions with various signals for QM'2012
• This will take another month
• MC productions are more complex, but still rather routine

Page 5: In general

• The central productions (RAW and MC) are stable and well-behaved
• Despite the (large) complexity…
• Fortunately, most of the above is automatic

• Or we would need an army of people to do it

Page 6: Grid power

• 2012 so far:
• 25.8K jobs running on average
• 61.6 million CPU hours ≈ 7,000 CPU years… in 6 months (see the check below)
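A back-of-the-envelope conversion of that figure (my own arithmetic check, not from the slides):

```python
# Convert the quoted 61.6 million CPU hours into CPU years.
cpu_hours = 61.6e6
hours_per_year = 24 * 365          # ignoring leap days for a rough estimate

cpu_years = cpu_hours / hours_per_year
print(f"{cpu_years:,.0f} CPU years")   # ~7,000 CPU years, i.e. about 7 CPU millennia
```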

Page 7: Job distribution

[Plot: job distribution vs. number of users]

Page 8: Non-production users

• On average, organized and chaotic analysis use 39% of the Grid

Page 9: And if we don't have production

• The user jobs would fill the Grid

Production jobs (2200) are 8% of the total
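For context, a quick back-of-the-envelope check (my arithmetic, not on the slide) of the total that this 8% implies:

```python
# If 2,200 production jobs make up 8% of all running jobs, the implied total is:
production_jobs = 2200
production_fraction = 0.08

total_jobs = production_jobs / production_fraction
print(f"{total_jobs:,.0f} concurrent jobs")   # 27,500, in line with the ~25.8K average quoted earlier
```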

Page 10: Chaotic and organized analysis

• July and August will be 'hot' months
• QM'2012 is at the end of August

• March – 10K jobs on average, 7.9 GB/sec read from the SEs
• Last month – 11K jobs (+10%), 9.8 GB/sec from the SEs (+20%)

Page 11: Jobs use not only CPU…

• Average read rate: 10 GB/sec from 57 SEs
• In one month = 25 PB of data read, so approximately all storage is read ~twice (see the quick check below)
• ALICE total disk capacity = 15 PB
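A short sanity check of that monthly read volume (my own arithmetic, not from the slides):

```python
# Sustained 10 GB/s over one month, expressed in PB (1 PB = 1e6 GB).
read_rate_gb_per_s = 10
seconds_per_month = 30 * 24 * 3600

petabytes_per_month = read_rate_gb_per_s * seconds_per_month / 1e6
print(f"{petabytes_per_month:.0f} PB read per month")   # ~26 PB, matching the quoted ~25 PB
```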

Remember the daily cyclic structure…

Page 12: Efficiencies

• Efficiency definition: CPU/Wall (see the sketch below)
• Simplistic, and as such a very appealing metric
• By this measure, we are not doing great
• The 2012 (all centres) average efficiency is:

60%
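A minimal sketch of how that metric is computed per job (the function and the example numbers are illustrative, not taken from any specific monitoring tool):

```python
def job_efficiency(cpu_seconds: float, wall_seconds: float) -> float:
    """CPU/Wall efficiency of a single job, as a fraction."""
    return cpu_seconds / wall_seconds if wall_seconds > 0 else 0.0

# Illustrative job: 4.3 h of CPU consumed over 7.2 h of wall-clock time
print(f"{job_efficiency(4.3 * 3600, 7.2 * 3600):.0%}")   # -> 60%
```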

Page 13: Efficiency (2)

• The CPU/Wall ratio depends on many factors
• I/O rate of the jobs
• Swap rate
• …
• And (IMHO) it is not necessarily the best metric to assess the productivity of the jobs or the computing centres
• What about the usage of the storage and the network?
• In the end, what counts is that the job gets done

• That said, we must work on increasing the CPU/Wall ratio

Page 14: Factorization – production job efficiencies

• MC (aliprod), RAW (alidaq), QA and AOD filtering
• Averages: aliprod 90%, alidaq 75%, overall average 82%

[Efficiency plots: LHC11h Pass1 and LHC11h Pass2]

Page 15: Enter the user analysis

Note the daily cycle, remember the SE load structure…

[Plot annotations: 24 hours without production, weekends, Ascension holiday]

Page 16: Day/night effect

• Nighttime – production – 83%
• Daytime – production and analysis – 62%

Page 17: Users and trains

• Clearly the chaotic user jobs require a lot of I/O
• Little CPU – mostly histogram filling
• This simple fact has been known for a long time
• A (partial) solution to this is:
• Analyze a smaller set of input data (ESD ► AOD)
• Use organized analysis – the train
• See Andrei's presentation from the analysis session
• And the subsequent PWG talks – quite happy with the system's performance

Page 18: Users and trains (2)

• The chaotic analysis will not go away, but it will become less prevalent

• Tuning of cuts, tests of tasks before joining the trains

• The smaller input set and the trains also help to use fewer resources: much more analysis for the same CPU and I/O (independently of the efficiency)

Page 19: What can we do

• Establish realistic expectations wrt I/O

• Lego train tests: measure the processing rate
• E.g. CF_PbPb (4 wagons, 1 CPU intensive)
• Train #120 running on AOD095
• Local efficiency 99.52%
• AOD event size: 0.66 MB/ev
• Processing rate: 370.95 ms/ev (2.69 ev/sec)
• The train can "burn" 2.69 * 0.66 = 1.78 MB/sec

• This was a good example…
• Average ~100 ms/ev, equivalent to 6.5 MB/sec
• Best student found: DQ_PbPb: 1723 ms/ev, can "live" with 380 kBytes/sec

• This number is really relevant
• It is NOT the number of wagons that really matters, but the rate at which they consume data
• This is the number we have to measure and improve, both in local tests and on the Grid (see the sketch below)
• We have to measure the instantaneous transfer rate per site, to correlate it with other conditions

• On ESD it is 3-4 times worse
• Same processing rate, but the event size is bigger…

• A train processing at < 100 ms/ev will have < 50% efficiency on the Grid, depending on where it is running and under which conditions

Borrowed without permission from A. Gheata
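The "burn rate" quoted above follows directly from the per-event processing time and the event size; a minimal sketch of that calculation (the function name is mine, the numbers are the examples from this slide):

```python
def train_read_rate_mb_per_s(ms_per_event: float, mb_per_event: float) -> float:
    """Data rate a train consumes: (events per second) * (MB per event)."""
    events_per_sec = 1000.0 / ms_per_event
    return events_per_sec * mb_per_event

# CF_PbPb example: 370.95 ms/ev on 0.66 MB/ev AODs -> ~1.78 MB/sec
print(f"CF_PbPb: {train_read_rate_mb_per_s(370.95, 0.66):.2f} MB/sec")

# DQ_PbPb example: 1723 ms/ev -> can "live" with ~0.38 MB/sec
print(f"DQ_PbPb: {train_read_rate_mb_per_s(1723, 0.66):.2f} MB/sec")
```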

Page 20: WN to storage throughput

• Could be estimated using the 'standard' centre fabric:
• Type of WNs (number of cores, NIC)
• Switches (ports/throughput)
• SE types
• … but the picture would be incomplete and too generic

• Thus we will not do it

Page 21: WN to storage throughput (2)

• Better to measure the real thing
• A set of benchmarking jobs with a known input set, measuring the time to complete (a sketch is given below)
• Do that at all centres during normal load
• Get a 'HEP I/O' rating of the centre WNs
• We will do that very soon
• Using the benchmark, every train can be rated easily for its expected efficiency
• The centres could use this measurement to optimize the fabric, if practical
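A minimal sketch of what such a benchmarking job might look like, assuming a fixed input set staged on the worker node and using sequential read throughput as the rating (the file names, chunk size and the rating formula are illustrative assumptions, not the actual ALICE benchmark):

```python
import time

def hep_io_rating(input_files):
    """Read a known input set and report the average throughput in MB/s."""
    total_bytes = 0
    start = time.monotonic()
    for path in input_files:
        with open(path, "rb") as f:
            # Stream in 1 MB chunks to mimic analysis-style sequential reads
            while chunk := f.read(1024 * 1024):
                total_bytes += len(chunk)
    elapsed = time.monotonic() - start
    return total_bytes / 1e6 / elapsed

if __name__ == "__main__":
    # Hypothetical fixed benchmark set staged on the WN
    files = ["bench/AliAOD_000.root", "bench/AliAOD_001.root"]
    print(f"WN I/O rating: {hep_io_rating(files):.1f} MB/s")
```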

Page 22: More…

• SE monitoring and control… see Harsh's presentation
• Clear correlation between efficiency and server load
• Code optimization
• Memory footprint – use of swap is also an efficiency killer

Page 23: And more…

• Execute trains in different environments and compare the results
• GSI has kindly volunteered to help
• A programme of tests is being discussed

• The ultimate goal is to bring the efficiency of organized analysis to the level of production jobs

• The PWGs are relentlessly pushing their members to migrate to organized analysis

• By mid-2013 we should complete this task

Page 24: Conclusions

• 2012 is so far a standard year for data taking, production and analysis
• Not mentioned in the talk (no need to discuss a working system) – the stability of the Grid has been outstanding
• Thanks to the mature site support and the AliEn and LCG software
• And thus it fulfills its function of delivering Offline computational resources to the collaboration
• Our current programme is to:
• Deliver and support the next version of AliEn
• Improve the SE operation in collaboration with the xrootd development team
• Improve the support for analysis and its efficiency