The German Tier 1: LHCC Review, 19/20-Nov-2007, Stream B, Part 2
1 LHCC Review, November 19-20, 2007
Forschungszentrum Karlsruhe in der Helmholtz-Gemeinschaft
The German Tier 1
LHCC Review, 19/20-Nov-2007, stream B, part 2
Holger Marten
Forschungszentrum Karlsruhe GmbH, Institute for Scientific Computing (IWR)
Postfach 3640, D-76021 Karlsruhe
2 LHCC Review, November 19-20, 2007
0. Content
1. GridKa location & organization - skipped - but included in the slides
2. Resources and networks
3. Mass storage & SRM
4. Grid Services
5. Reliability & 24x7 operations
6. Plans for 2008
7 LHCC Review, November 19-20, 2007
2. Resources and networks
8 LHCC Review, November 19-20, 2007
Current Resources in Production

             LCG          non-LCG HEP   others
CPU [kSI2k]  1864 (55%)   1270 (37%)    264 (8%)
Disk [TB]    878          443           60
Tape [TB]    1007         585           120
October 2007 accounting (example):
• CPUs provided through fair share
• 1.6 million hours wall time by 300k jobs on 2514 CPU cores
• 55% LCG, 45% non-LCG HEP (a quick consistency check follows below)
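As a quick plausibility check on these accounting figures, the sketch below recomputes the farm utilisation and the mean job length from the numbers quoted above. The assumption that all 2514 cores were in production for the whole 31-day month is mine and is not stated on the slide.

    # Back-of-the-envelope check of the October 2007 accounting figures.
    # Assumption (not from the slide): all 2514 cores were available for
    # the full 31-day month.
    wall_hours = 1.6e6        # delivered wall-clock hours
    jobs = 300_000            # number of jobs
    cores = 2514
    capacity_hours = cores * 31 * 24

    print(f"available core-hours: {capacity_hours:,.0f}")             # ~1.87 million
    print(f"farm utilisation    : {wall_hours / capacity_hours:.0%}")  # ~86%
    print(f"mean job wall time  : {wall_hours / jobs:.1f} h")          # ~5.3 h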
9 LHCC Review, November 19-20, 2007
Installation of MoU Resources 2007 (from WLCG accounting spreadsheets)
[Chart: installed resources compared with the WLCG milestones]
10 LHCC Review, November 19-20, 2007
GridKa WAN connections
[Diagram: GridKa WAN connections and internal network]
11 LHCC Review, November 19-20, 2007
GridKa WAN connections
[Diagram: GridKa WAN connections and internal network, highlighting the redundant link to CERN]
12 LHCC Review, November 19-20, 2007
The benefit of network redundancy
April 26, 2007: failure of the DFN router of the CERN-GridKa OPN.
Automatic (!) re-routing through our backup link via CNAF; this was not a test!
13 LHCC Review, November 19-20, 2007
Summary of GridKa networks
• LAN
- full internal redundancy (of one router)
- additional layer-3 BelWue backup link (to be realized in 2008)
• WAN
- multiple 10 Gbps links available to CERN, Tier-1s and Tier-2s
- SARA/NIKHEF: will be in production (end of Q4/2007)
- additional CERN-independent transatlantic Tier-1 link(s) would be highly desirable
14 LHCC Review, November 19-20, 2007
3. Mass storage & SRM
15 LHCC Review, November 19-20, 2007
dCache & MSS at GridKa
• Long-standing instabilities with the SRM and gridFTP implementation
- reduced availability because SAM critical tests fail; many patches since
• Dual effort for complex and labour-intensive software (data management)
- running an unstable dCache SRM in production
- running the next SRM 2.2 release in pre-production
- in the end SRM 2.2 was tested formally with F. Donno’s S2 test suite, but only to a very limited extent by the experiments
• Read-only disk storage (T0D1) is an administrative difficulty (see the storage-class sketch below)
- full disks imply stopping the experiments’ work => experiments ask for “temporary ad-hoc” conversions into T1D1
- no failover or maintenance (reboot) is possible, otherwise jobs will crash
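For readers unfamiliar with the storage-class notation used above, the sketch below spells out the usual WLCG TxDy convention (x copies on tape, y copies on disk) and why an ad-hoc conversion from T0D1 to T1D1 relieves a full pool. It is an explanatory sketch only, not GridKa or dCache code.

    # WLCG storage classes as (tape copies, disk copies): TxDy = x tape, y disk.
    STORAGE_CLASSES = {
        "T0D1": (0, 1),  # disk-only: the single disk copy is the only copy
        "T1D0": (1, 0),  # tape-backed: disk is just a transient cache
        "T1D1": (1, 1),  # copy on tape plus a pinned copy on disk
    }

    def disk_copy_can_be_released(storage_class):
        """A disk copy may be released (full pool, reboot, maintenance)
        only if the data also exists on tape."""
        tape_copies, _ = STORAGE_CLASSES[storage_class]
        return tape_copies > 0

    # T0D1: a full pool blocks the experiment and a reboot hides the only copy.
    # After a "temporary ad-hoc" conversion to T1D1 a tape copy exists, so
    # disk space can be reclaimed and pool nodes can be drained safely.
    assert not disk_copy_can_be_released("T0D1")
    assert disk_copy_can_be_released("T1D1")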
16 LHCC Review, November 19-20, 2007
dCache & MSS at GridKa
• Migrated to dCache 1.8 with SRM 2.2 on Nov 6/7
- very fruitful collaboration with dCache/SRM developers in situ
- bug fix for globus-url-copy in combination with space reservation, made “on the fly” during the migration process
=> many thanks to Timur Perelmutov and Tigran Mkrtchyan for their support
• Stability has to be verified during the coming months.
• Connection to tape (MSS) is fully functional and scalable for writes
- read tests by the experiments have only started recently
- difficult to estimate the tape resources needed to reach the required read throughput
- a workgroup with local experiment representatives will provide access patterns, tape classes and recall-optimisation proposals (one possible optimisation is sketched below)
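One kind of recall optimisation such a workgroup might propose is sketched below: pending read requests are grouped per cartridge and sorted by their position on tape, so that each tape is mounted only once and read roughly sequentially. The data structures and the mount-order heuristic are illustrative assumptions, not the actual GridKa or dCache logic.

    from collections import defaultdict

    def order_recalls(requests):
        """Order tape recalls so every cartridge is mounted once and files
        are read in on-tape order. requests: list of (file, tape, position)."""
        by_tape = defaultdict(list)
        for name, tape, pos in requests:
            by_tape[tape].append((pos, name))

        schedule = []
        # Heuristic: mount the cartridges with the most pending files first.
        for tape, files in sorted(by_tape.items(), key=lambda kv: -len(kv[1])):
            for pos, name in sorted(files):
                schedule.append((tape, name))
        return schedule

    demo = [("f1", "T0042", 118), ("f2", "T0007", 3),
            ("f3", "T0042", 5), ("f4", "T0042", 77)]
    for tape, name in order_recalls(demo):
        print(tape, name)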
17 LHCC Review, November 19-20, 2007
4. Grid Services
18 LHCC Review, November 19-20, 2007
Installed WLCG middleware services*
#      Service              Remarks
3      Top-level BDII       round robin; supports EGEE region DECH
2      Resource Broker      lcg-flavour; gLite WMS to be installed
1      Proxy Server
8      UI                   4x VO-Box, 2x login, 1x gm, 1x admin
4      VO-Boxes             also front-ends for experiment admins
5      3D HEP DBs           2x ATLAS, 2x LHCb (Conditions DB etc.), 1x CMS Squid
1      Site BDII (GIIS)
1      Mon Box              accounting
1      LFC                  MySQL, migrated to 3-node Oracle
3      FTS                  3 DNS load-balanced front-ends; 3-node clustered Oracle back-end
3(+1)  Compute Elements     4th CE currently being set up
2      Storage Elements
2      SRM                  v1.2 and v2.2
       dCache pools         1 head node; pool nodes with gridFTP doors
900    Worker Nodes         2500 cores, SL4; gLite 3.0.x to be migrated to 3.1.x

* In a wide sense, i.e. incl. physics DBs and dCache pools with gridFTP; only production services listed.
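As an illustration of how clients discover the services listed above, the snippet below queries a top-level BDII over LDAP for published service endpoints. The host name is a placeholder, and the GLUE 1.x attributes used (GlueServiceType, GlueServiceEndpoint) reflect the common schema of that time; treat the details as assumptions rather than the GridKa configuration.

    import ldap  # python-ldap

    BDII_URL = "ldap://bdii.example.org:2170"   # placeholder top-level BDII
    BASE_DN = "o=grid"                          # conventional base of the GLUE tree

    con = ldap.initialize(BDII_URL)
    con.simple_bind_s()  # BDIIs allow anonymous reads

    # List all published services with their type and endpoint.
    for dn, attrs in con.search_s(BASE_DN, ldap.SCOPE_SUBTREE,
                                  "(objectClass=GlueService)",
                                  ["GlueServiceType", "GlueServiceEndpoint"]):
        stype = attrs.get("GlueServiceType", [b"?"])[0].decode()
        endpoint = attrs.get("GlueServiceEndpoint", [b"?"])[0].decode()
        print(f"{stype:25s} {endpoint}")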
FTS 2.0 deployment example
19 LHCC Review, November 19-20, 2007
FTS 2.0 [+LFC] deployment at GridKa
Setup to ensure high availability:
• Three nodes host the web services. VO and channel agents are distributed over the three nodes. The nodes sit in two different cabinets, so at least one node keeps working in case of a cabinet power failure or a network switch failure.
• 3-node RAC on Oracle 10.2.0.3, 64-bit; the RAC is shared with the LFC database. Two nodes are preferred for FTS, one node for LFC. Distributed over several cabinets; mirrored disks in the SAN. (A sketch of a failover-aware client connection follows below.)
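To show how a service front-end can exploit such a RAC, the sketch below opens a database connection with an Oracle Net descriptor that load-balances over, and fails over between, the three nodes; losing one node then only affects the sessions pinned to it. Host names, credentials and the service name are placeholders, and the descriptor syntax is standard Oracle Net rather than the actual GridKa configuration.

    import cx_Oracle  # Oracle client bindings for Python

    # Descriptor listing all three RAC nodes; the client load-balances new
    # connections and fails over if a node disappears. All names are placeholders.
    DSN = (
        "(DESCRIPTION="
        "(ADDRESS_LIST=(LOAD_BALANCE=yes)(FAILOVER=on)"
        "(ADDRESS=(PROTOCOL=TCP)(HOST=rac-node1.example.org)(PORT=1521))"
        "(ADDRESS=(PROTOCOL=TCP)(HOST=rac-node2.example.org)(PORT=1521))"
        "(ADDRESS=(PROTOCOL=TCP)(HOST=rac-node3.example.org)(PORT=1521)))"
        "(CONNECT_DATA=(SERVICE_NAME=fts_service)"
        "(FAILOVER_MODE=(TYPE=select)(METHOD=basic))))"
    )

    conn = cx_Oracle.connect("fts_user", "secret", DSN)
    cur = conn.cursor()
    cur.execute("SELECT sysdate FROM dual")  # trivial query to verify the connection
    print(cur.fetchone()[0])
    conn.close()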
20 LHCC Review, November 19-20, 2007
FTS/LFC DB: one 3-node cluster on Oracle 10.2.0.3, 64-bit
[Diagram: FTS/LFC database architecture, a 3-node Oracle RAC. Each node has a public, a virtual and two private IPs (eth1/eth2); the private interconnect (10.x.x.x) runs over redundant switches, and the public VLANs (192.168.52/.53) connect to the internal and external networks (192.x.x.x). The SAN holds mirrored RAID-1 volumes of 142 GB each for FTSDATA1, FTSREC1, LFCDATA1 and LFCREC1, plus the ASM spfile, voting disk and OCR. Nodes 1 and 2 preferentially run FTS (LFC as fallback), node 3 preferentially runs LFC (FTS as fallback).]
21 LHCC Review, November 19-20, 2007
Tested FTS channels GridKa ⇔ Tier-0 / 1 / 2 (likely incomplete list)
• Tier-0 ⇔ FZK: CERN - FZK
• Tier-1 ⇔ FZK: IN2P3 - FZK, PIC - FZK, RAL - FZK, SARA - FZK, TAIWAN - FZK, TRIUMF - FZK, BNL - FZK, FNAL - FZK, INFNT1 - FZK, NDGFT1 - FZK
• FZK ⇔ Tier-2: FZK - CSCS, FZK - CYFRONET, FZK - DESY, FZK - DESYZN, FZK - FZU, FZK - GSI, FZK - ITEP, FZK - IHEP, FZK - JINR, FZK - PNPI, FZK - POZNAN, FZK - PRAGUE, FZK - RRCKI, FZK - RWTHAACHEN, FZK - SINP, FZK - SPBSU
• FZK ⇔ Tier-2 (cont.): FZK - TROITSKINR, FZK - UNIFREIBURG, FZK - UNIWUPPERTAL, FZK - WARSAW
22 LHCC Review, November 19-20, 2007
FTS 2.0 deployment experience
ToDo’s @ GridKa after the experience with FTS 1.5
• migrate FTS to 3 new redundant servers => buy, install LAN, OS, … in advance
• set up a new Oracle RAC (new version) on 64 bit
• migrate the DB to redundant disks => new SAN configurations required
• set up and test all existing transfer channels (by all experiments)

And the migration experience
• learning curve for the new 64-bit Oracle version
• fighting especially with changes in behaviour with two networks (internal + external)
• setting up and testing channels needs people, sometimes on both ends (vacation time, workshops, local admins communicating with 3 experiments – sometimes with different views – in parallel)

WLCG milestone – as a member of the MB I accepted it.
For sites, upgrading also means time-consuming service hardening and optimization, and is not just “pushing the update button.”
24 LHCC Review, November 19-20, 2007
5. Reliability & 24x7 operations
25 LHCC Review, November 19-20, 2007
SAM reliability (from WLCG report)
26 LHCC Review, November 19-20, 2007
SAM reliability
Some examples with zero severity for the experiments
• configuration changes of local or central services that result in failures for the OPS VO only
- missing rpm ‘lcg-version’ in the new WN distribution (see the sketch below)
- SAM tests CA certificates that had already officially become obsolete

More severe examples
• purely local hardware / software failures (redundancy required…)
• scalability of services after resource upgrades or during heavy load
• stability of “MSS-related” software pieces (SRM, gridFTP)

Overall a very complex hierarchy of dependencies
• especially transient scalability and stability issues are difficult to analyse
• but this is necessary: analyse + fix instead of reboot! (sometimes at the expense of availability though)
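A trivial pre-deployment check can catch the first class of problem (a required RPM missing from a new worker-node release) before SAM does. The sketch below simply asks rpm whether the expected packages are installed; apart from lcg-version, the package names are illustrative placeholders.

    import subprocess

    # Packages the SAM tests expect on every worker node. 'lcg-version' is the
    # one mentioned above; any further names are illustrative placeholders.
    REQUIRED_RPMS = ["lcg-version", "lcg-utils"]

    missing = []
    for pkg in REQUIRED_RPMS:
        # 'rpm -q <pkg>' exits non-zero if the package is not installed.
        rc = subprocess.run(["rpm", "-q", pkg],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL).returncode
        if rc != 0:
            missing.append(pkg)

    if missing:
        raise SystemExit("missing packages: " + ", ".join(missing))
    print("all required packages installed")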
27 LHCC Review, November 19-20, 2007
Site availability – OPS vs. CMS view
To be further analysed: do we have the correct (customers’) view?
28 LHCC Review, November 19-20, 2007
To be further analysed…
29 LHCC Review, November 19-20, 2007
Preparations for 24x7 support
Currently
• site admins (experts) during normal working hours
• experiment admins with special admin rights for VO-specific services
• operators (not always “experts”) watch the system and intervene during weekends and public holidays on a voluntary basis

Needed, and permanently being worked on
• redundancy, redundancy, redundancy
- multiple experts on site 24h x 7d x 52w is out of the question
• hardening / optimization of services
- the more scalability tests in production, the better (even if it hurts)
- but we depend on robust software
• documentation of service components and procedures for operators
• service dashboard for operators
30 LHCC Review, November 19-20, 2007
GridKa service dashboard for operators
See A. Heiss et al., CHEP 2007
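The dashboard itself is described in the CHEP 2007 contribution cited above. As a rough idea of what such an operator view aggregates, the toy sketch below rolls individual probe results up into one status per service; every service name, probe and result shown here is invented for illustration.

    # Toy roll-up of probe results into a per-service status for operators.
    SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}

    probe_results = {  # invented examples
        "FTS":    [("web service responds", "OK"), ("Oracle backend reachable", "OK")],
        "dCache": [("SRM ping", "WARNING"), ("gridFTP door", "OK")],
        "CE":     [("job submission", "CRITICAL"), ("info system", "OK")],
    }

    def service_status(results):
        """The worst probe result defines the overall service status."""
        return max((state for _, state in results), key=SEVERITY.get)

    for service, results in sorted(probe_results.items()):
        failed = [name for name, state in results if state != "OK"]
        note = f" ({', '.join(failed)})" if failed else ""
        print(f"{service:8s} {service_status(results)}{note}")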
31 LHCC Review, November 19-20, 2007
6. Plans for 2008
32 LHCC Review, November 19-20, 2007
C-RRB on 23-Oct-2007: LCG status report
Concern: are sites aware of the ramp-up (incl. power & cooling)?
33 LHCC Review, November 19-20, 2007
Electricity and cooling at GridKa
Planning & upgrades done during the last 3 years
• second (redundant) main power line available since 2007
• 3 (+1 for redundancy) x 600 kW new chillers available
• 1 MW of (water-) cooling capacity ready for 2008

Capacity is not an issue, but we are concerned about running costs
• started benchmarking compute performance and electrical power in 2002
• efficiency (ratio of SPECint to power consumption) has entered our calls for tender since 2004 (“penalty” of 4 €/W at selection; a worked example follows below)
• many discussions with providers (Intel, AMD, IBM, …)
• contributing to the HEPiX benchmarking group and publishing the results
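To make the 4 €/W selection penalty concrete, the short calculation below compares two hypothetical offers: the measured power draw is converted into a cost penalty that is added to the purchase price before offers are ranked. All prices, wattages and SPEC figures are invented; only the 4 €/W factor comes from the slide.

    # Hypothetical tender evaluation with the 4 EUR/W efficiency penalty.
    PENALTY_PER_WATT = 4.0  # EUR per watt of measured power draw (from the slide)

    offers = [
        # (name, purchase price in EUR, measured power per node in W, SPECint_rate per node)
        ("vendor A", 2400.0, 350.0, 45.0),
        ("vendor B", 2600.0, 250.0, 44.0),
    ]

    for name, price, watts, spec in offers:
        effective_cost = price + PENALTY_PER_WATT * watts
        print(f"{name}: effective cost {effective_cost:.0f} EUR, "
              f"efficiency {spec / watts:.3f} SPECint_rate/W")

    # vendor A: 2400 + 4*350 = 3800 EUR; vendor B: 2600 + 4*250 = 3600 EUR.
    # Despite the higher list price, vendor B wins once power is accounted for.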
34 LHCC Review, November 19-20, 2007
[Chart: efficiency in SPECint_rate_base2000 per watt (scale 0 – 0.45) for CPUs benchmarked at GridKa: Intel Xeon 3.06 / 2.66 / 2.20 GHz, Intel Pentium III 1.26 GHz, Intel Xeon E5345, Intel Xeon 5160, Intel Pentium M 760, AMD Opteron 270, AMD Opteron 246 (a, b). 2001-2004: very alarming; 2005-2007: much more promising. Based on own benchmarks and measurements with GridKa hardware.]
35 LHCC Review, November 19-20, 2007
Extensions for 04/2008: everything is bought!
Oct ’07
• 40 new cabinets delivered and installed
• 1/3 of the CPUs (~130 machines) delivered

Nov ’07: arrival and base installation of
• all new networking components (incl. cabling)
• the remaining 2/3 of the CPUs
• tape cartridges & drives

Nov/Dec ’07
• arrival of 2.3 PB of disks (incl. non-LHC) + servers

Jan-Mar ’08: installations, tests, acceptance, bug fixes, …
36 LHCC Review, November 19-20, 2007
Summary
• GridKa contributes its full MoU 2007 resources
- we are ready for the April ’08 ramp-up
• Good collaboration with
- sites, developers and experiments (e.g. local / remote VO admins)
• Much effort spent on
- service hardening (redundancy …)
- tools and procedures for operations
- scalability and stability analysis
- access-performance optimization (e.g. tape reads)
• This is still a necessity, which requires
- time of admins
- patience and understanding by customers
- …sometimes at the expense of reliability measures