1
Large Scale Computing at PDSF
Iwona Sakrejda, NERSC User Services Group ([email protected])
February ??, 2006
2
Outline
• Role of PDSF in HENP computing
• Integration with other NERSC computational and storage systems
• User management and user-oriented services at NERSC
• PDSF layout
• Workload management (batch systems)
• File system implications of data-intensive computing
• Operating system selection with CHOS
• Grid use at PDSF (Grid3, OSG, ITB)
• Conclusions
3
PDSF Mission
PDSF (Parallel Distributed Systems Facility) is a networked distributed computing environment used to meet the detector simulation and data analysis requirements of large-scale High Energy Physics (HEP) and Nuclear Science (NS) experiments.
4
PDSF Principle of Operation
• Multiple groups pool their resources together.
• Need for resources varies through the year – conferences and data-taking periods fall at different times (Quark Matter vs. PANIC, for example).
• Peak resource availability is enhanced.
• Idle cycles are minimized by letting groups with small resource shares scavenge cycles.
• Software installation and license sharing (TotalView, IDL, PGI).
5
PDSF at NERSC
[Diagram: NERSC systems interconnected by jumbo-frame 10 Gigabit Ethernet, a global file system, and a storage fabric (FC disk, STK robots, SGI HPSS nodes, testbeds and servers).]

• HPSS: IBM AIX servers, 50 TB of cache disk, 8 STK robots with 44,000 tape slots, maximum capacity 9 PB
• PDSF: ~700 processors, ~1.5 TF, 0.7 TB of memory, ~300 TB of shared disk
• Opteron cluster – Jacquard: 640 processors (Opteron/Infiniband 4X/12X), peak 3.1 Tflop/s, SSP 0.41 Tflop/s, 1.2 TB memory, 30 TB disk
• IBM POWER5 – Bassi: 888 processors, peak 6.7 Tflop/s, SSP 0.8 Tflop/s, 2 TB memory, 70 TB disk
• IBM POWER3 – Seaborg: 6,080 processors, peak 9.1 Tflop/s, SSP 1.35 Tflop/s, 7.8 TB memory, 55 TB of shared disk
• Analytics server – DaVinci: 32 processors, 192 GB memory, 25 TB disk
6
User Management and Support at NERSC
• With >500 users and >10 projects, a database-backed management system is needed.
  – Active user management (disabling, password expiration, …)
  – Allocation management (especially mass storage accounting)
• PIs are partly responsible for managing users within their own projects:
  – Adding users
  – Assigning users to groups
  – Removing users
• Users manage their own info, groups, certificates, …
• Account support
• User support and the trouble ticket system:
  – Call center
  – Trouble ticket system
7
Overview of PDSF Layout
8
PDSF Layout
[Diagram: interactive nodes (pdsf.nersc.gov) and Grid gatekeepers front a batch pool of several generations of Intel and AMD processors (~1200 1 GHz-equivalent CPUs), backed by a pool of disk vaults, GPFS file systems, and HPSS.]
9
Workload Management (Batch)
• Effective resource sharing via batch workload management.
• The fair-share principle links batch shares to groups' financial contributions.
  – Fairness applies both across groups and within groups.
  – This concept is at the heart of the PDSF design.
• Unused resources are split among running users.
• Group-level sharing places additional requirements on batch systems.
• Impact of the batch system:
  – LSF: good scalability, performance, and documentation; met requirements; costly.
  – Condor: the concept of a group share was not yet implemented when the transition was considered (2 years ago).
  – SGE: met requirements, scales reasonably; documentation lacking at times (see the submission sketch below).
• Changes are minimized by SUMS (STAR).
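To make the fair-share mechanics concrete, here is a minimal sketch of an SGE job script as a PDSF group might submit it; the project name, limits, and payload are hypothetical, not PDSF's actual configuration.

```bash
#!/bin/bash
# Hypothetical SGE job script. The -P flag charges the job to a project,
# which is how the scheduler applies that group's fair-share weight.
#$ -N star-reco                    # job name
#$ -P star                         # project (group) whose share is charged
#$ -l h_cpu=08:00:00               # hard CPU-time limit for the task
#$ -o logs/$JOB_NAME.$JOB_ID.out   # stdout location

root4star -q -b bfc.C              # example payload: STAR reconstruction
```

Submitted with `qsub star-reco.sh`, the job competes for slots according to its group's share, roughly as illustrated on the next slide.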
10
Shares System at Work
STAR’s 70% share “pushes out” KamLAND (9% share).
SNO (1% share, light blue) and Majorana (no contribution) get time when the big share owners do not use it.
11
File system implications of data-intensive computing – NFS
• NFS – a cost-effective solution, but:
  – scales poorly
  – data corruption during heavy use
  – data safety (a RAID set helps, but is not 100%)
• Disk vaults are cheap, IDE-based centralized storage.
  – dvio: a batch-level “resource” integrated with the batch system, defined to limit the number of simultaneous read/write access streams; the load is hard to assess a priori.
• Ganglia facilitates load monitoring and the dvio requirement assessment – available to the users (see the sketch below).
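The dvio limit can be thought of as a consumable resource in SGE terms; the sketch below assumes such a setup, with the resource and vault names invented for illustration.

```bash
# Request one dvio slot for disk vault "dv18" so the scheduler caps the
# number of jobs streaming to that vault at once (names are illustrative).
qsub -l dv18io=1 analysis_job.sh

# Inspect how much of the consumable is currently available per host:
qhost -F dv18io
```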
12
Usage per discipline
I/O and data use are dominated by Nuclear Physics.
13
File system implications of data-intensive computing – local storage
• Local storage on batch nodes:
  – Cheap storage (large and cheap hard drives)
  – Very good I/O performance
  – Limited to jobs running on the node
  – The diversity of the user population does not facilitate batch-node sharing
    • users wary of Xrootd daemons
  – No redundancy; a drive failure causes data loss
  – A file catalog aids in job submission – SUMS does the rest (see the sketch below)
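When a job lands on a different node than its input, the standard xrootd client can pull the file from the serving node's daemon; a minimal sketch, with the host name and paths invented:

```bash
# Copy one input file from the Xrootd daemon on a batch node into the
# job's local scratch area before processing (host and paths hypothetical).
xrdcp root://pdsf-batch042.nersc.gov//data/star/st_physics_1234.MuDst.root \
      $TMPDIR/st_physics_1234.MuDst.root
```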
14
File system implications of data-intensive computing – GPFS
• NERSC purchased GPFS software licenses for PDSF.
  – Reliable (RAID underneath)
  – Good performance (striping)
  – Self-repairing:
    • even after disengaging under load, it comes back on-line
    • compare with NFS “stale file handles” (which had to be fixed by an admin or a cron job)
  – Expensive
• PDSF will host several GPFS file systems – 7 already in place.
  – ~15 TB per file system – not enough experience yet with GPFS on Linux (see the check below)
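From the user's side, the GPFS file systems appear as ordinary mounts on every node; a quick capacity check might look like the following, with the mount point name invented for illustration.

```bash
# Check the capacity and usage of one of the ~15 TB GPFS file systems
# (the mount point is illustrative).
df -h /pdsf/gpfs1
```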
15
File system implications of data-intensive computing – beta testing
• File system testing (open-software version):
  – The file system performed reasonably well under high load.
  – Support and maintenance are manpower-intensive.
• Storage units from commercial vendors made available for beta testing:
  – Support provided by the vendors
  – Users get cutting-edge, highly capable storage appliances to use for extensive periods of time
  – Staff are obliged to produce reports – an additional (light) workload
  – Units too expensive to purchase – work related to data uploading
  – Affordable units from new companies – uncertainty about support continuity
16
Role of mass storage in data management
• Data-intensive experiments require “smart backup”:
  – Only $HOME, system, and application areas are automatically backed up.
  – PDSF storage media are reliable – but not disaster-proof.
  – Groups have allocations in mass storage to selectively store their data.
  – Users have individual accounts in mass storage to back up their work (see the sketch below).
• Network bandwidth (10 Gigabit link to HPSS):
  – A large HPSS cache and a large number of tape movers facilitate quick access to stored data.
  – The number of tape drives is still an issue.
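As an example of “smart backup” in practice, a user might push results into their individual mass storage account with HSI, the HPSS command-line client available at NERSC; all paths below are illustrative.

```bash
# Bundle analysis output and store it under the user's HPSS account.
tar cf results_2006-02.tar results/
hsi "put results_2006-02.tar : analysis/results_2006-02.tar"
```

pftp works similarly for single-file transfers and, as noted later in the talk, has a grid-enabled variant at garchive.nersc.gov.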
17
Physical Sciences Dominate Storage Use
[Bar chart: percent of HPSS allocation by science area (accelerator physics, astrophysics, chemistry, climate + environment, CS + math, fusion energy, geo + engineering, HEP, QCD, life sciences, materials science, nuclear physics) for the years 2002–2006; the vertical axis runs from 0 to 50 percent.]
18
Operating system selection with CHOS
• PDSF is a secondary computing facility for most of its user groups:
  – not free to independently select an operating system
  – tied to the Tier0 selection
• PDSF projects originated at various times (in the past, or still to come):
  – Tier0s embraced different operating systems, which then evolved.
• PDSF accommodates the needs of diverse groups with CHOS:
  – a framework for concurrently running multiple Linux environments (distributions) on a single node
  – accomplished through a combination of the chroot system call, a Linux kernel module, and some additional utilities
  – can be configured so that users are transparently presented with their selected distribution on login (see the sketch below)
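In practice, selecting an environment is a one-line operation; the sketch below follows the CHOS FAQ, with the short environment names being site-defined and shown here only for illustration.

```bash
# Set the personal default Linux environment; CHOS reads ~/.chos at login
# (here, a site-defined name for Scientific Linux 3.0.2).
echo "sl302" > ~/.chos

# Start a shell under a different environment for the current session only:
CHOS=redhat8 chos
```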
19
Operating system selection with CHOS (cont)
• Support for operating systems based on the same kernel version:
  – RH 7.2
  – RH 8
  – RH 9
  – SL 3.0.2
• Base system – SL 3.0.3:
  – provides security
• More info about CHOS is available at: http://www.nersc.gov/nusers/resources/PDSF/chos/faq.php

CHOS protected PDSF from fragmentation of resources – a unique approach to multi-group support. Sharing is possible even when diverse operating systems are required.
20
Who Has Used the Grid at NERSC
• PDSF pioneered the introduction of Grid services at NERSC.
• Participation in the Grid3 project.
• Mostly PDSF (Parallel Distributed Systems Facility) users, who analyze detector data and simulations:
  – STAR detector simulations and data analysis
    • studies the quark-gluon plasma and proton-proton collisions
    • 631 collaborators from 52 project institutions
    • 265 users at NERSC …
  – Simulations for the ALICE experiment at CERN
    • studies ion-ion collisions
    • 19 NERSC users from 11 institutions
  – Simulations for the ATLAS experiment at CERN
    • studies fundamental particle processes
    • 56 NERSC users from 17 institutions

[Photo: the STAR experiment detector]
21
Caveats - Grid usage thoughts …
• Most NERSC users are not using the Grid.
• The Office of Science “Massively Parallel Processing” (MPP) user communities have not embraced the grid.
• Even on PDSF, only a few “production managers” use the grid; most users do not.
• Site-policy side effects:
  – ATLAS and CMS stopped using the grid at NERSC due to the lack of support for group accounts.
  – It is difficult/tedious/confusing to get a Grid certificate.
  – Lack of support at NERSC for Virtual Organizations.
• One grid user’s opinion: instead of writing the middleware and troubleshooting, just use a piece of paper to keep track of jobs and pftp for file transfers.
• However, several STAR users have been testing the Grid for user analysis jobs, so interest may be growing.
22
STAR Grid Computing at NERSC
Grid computing benefits to STAR:
1. Bulk data transfer RCF -> NERSC with Storage Resource Management (SRM) technologies
   – SRM automates end-to-end transfers: increased throughput and reliability; less monitoring effort by data managers.
   – Source/destination can be files on disk or in the HPSS mass storage system.
   – 60 TB transferred in CY05, with automatic cataloging.
   – Typical transfers are ~10k files, 5 days in duration, 1 TB.
   – Doubles STAR processing power, since all data are available at both sites. (The per-file transfer step is sketched after the diagram below.)
[Diagram: the DataMover (command-line interface) gets a list of files from a directory and issues SRM-COPY requests for thousands of files. At BNL, an HRM performs reads and stages files from HPSS into a disk cache; SRM-GET retrieves one file at a time, and a GridFTP GET (pull mode) transfers it over the network to a disk cache at LBNL, where an HRM performs writes and archives the files into HPSS.]
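Underneath the SRM layer, the per-file network transfer in the diagram is a GridFTP pull; a hand-driven equivalent for a single file might look like this, with host names and paths invented for illustration.

```bash
# Authenticate with a grid certificate, then pull one file over GridFTP
# (hosts and paths are hypothetical; -p 4 asks for 4 parallel streams).
grid-proxy-init
globus-url-copy -p 4 \
    gsiftp://stargrid.rcf.bnl.gov/data/star/P05ic/st_physics_1234.MuDst.root \
    file:///pdsf/cache/star/st_physics_1234.MuDst.root
```

SRM adds what this one-liner lacks: queuing, retries, staging from HPSS, and bookkeeping across thousands of such files.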
23
STAR Grid Computing at NERSC (cont.)
Grid computing benefits to STAR:
2. Grid-based job submission with the STAR scheduler (SUMS)
   • Production grid jobs run daily from RCF to PDSF:
     – SUMS XML job description ->
     – Condor-G grid job submission ->
     – SGE submission to the PDSF batch system
   • Uses SRMs for input and output file transfers.
   • Handles catalog queries, job definitions, grid/local job submission, etc.
   • Underlying technologies are largely hidden from the user (see the sketch below).
24
STAR Grid Computing at NERSC (cont.)
• Goal: use SUMS to run STAR user analysis and data mining jobs on OSG sites. The issues are:
  – Transparent packaging and distribution of STAR software on non-STAR-dedicated OSG sites
  – SRM services need to be deployed consistently at OSG sites (preferred) or deployed along with the jobs (how?)
  – Inconsistencies of inbound/outbound site policies
  – SUMS’s generic interface is adaptable to other VOs running on OSG – an offer of community support
25
NERSC Contributions to the Grid
• myproxy.nersc.gov
  – Users don’t have to scp their certificates to different sites.
  – Safely stores credentials; uses SSL.
  – Anyone can use it from anywhere:
    • myproxy-init -s myproxy.nersc.gov
    • myproxy-get-delegation
  – Part of the VDT and OSG software distributions (see the sketch below).
• Management of grid-map files
  – NERSC users put their certificates into our NERSC Information Management system.
  – They automatically get propagated to all NERSC resources.
• garchive.nersc.gov
  – GSI authentication added to the HPSS pftp client and server.
  – Users can log in to HPSS using their grid certificates.
  – The software was contributed to the HPSS consortium.
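The MyProxy workflow from the slide, spelled out as a sketch (the placeholder username is illustrative):

```bash
# On the machine holding your grid certificate: store a delegated
# credential on the NERSC MyProxy server (protected by a passphrase).
myproxy-init -s myproxy.nersc.gov

# Later, from any machine, retrieve a short-lived proxy without ever
# copying the certificate itself around:
myproxy-get-delegation -s myproxy.nersc.gov -l your_username
```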
26
Online Certification Services (in development)
• Would allow users to use grid services without having to get a grid certificate.
• myproxy-logon -s myproxy.nersc.gov
• Generates a proxy certificate on the fly.
• Built on top of PAM and MyProxy.
• Will use a RADIUS server to authenticate users.
• RADIUS is a protocol for securely sending authentication and auditing information between sites.
• Can authenticate with LDAP, a One Time Password, or a Grid certificate (see the sketch below).
• Could be used to federate sites.
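A sketch of what the user-visible step would reduce to, per the slide; authentication happens server-side (PAM/RADIUS), so no locally installed certificate is needed.

```bash
# Obtain a proxy certificate on the fly; the server authenticates the
# user (e.g. LDAP or a one-time password) and returns a short-lived proxy.
myproxy-logon -s myproxy.nersc.gov -l your_username
```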
27
Audit Trail for Group Accounts (proposed development)
• NERSC needs to trace sessions and commands back to individual users.
• Some projects need to set up a production environment managed by multiple users (who can then jointly manage the production jobs and data).
• Build an environment that accepts multiple certificates, or multiple username/password pairs, for a single account.
• Keep logs that can associate PIDs/UIDs with the actual user.
• Provide an audit trail that reconstructs the original authentication associated with the PID/UID.
28
Conclusions
• NERSC/PDSF is a fully resource-sharing facility.
  – Several storage solutions have been evaluated; there are lots of choices and some emerging trends (distributed file systems, I/O-balanced systems, …).
  – CPU is shared based on financial contributions.
  – Fully opportunistic (if not used, resources can be taken by others).
  – NERSC will base its deployment decisions on science- and user-driven requirements.
• A lot of ongoing research in distributed computing technologies.
• NERSC can contribute to STAR/OSG efforts:
  – Auditing and login-tracing tools
  – Online certification services (integrating LDAP, One Time Passwords, and Grid certificates)
  – A testbed for OSG software on HPC architectures
  – User support