INFSO-RI-508833
Enabling Grids for E-sciencE
www.eu-egee.org
Data Management
Ron Trompert
SARA
Grid Tutorial, 18-19 September 2006
Grid Tutorial, RC RUG, 18-19 September 2006
Outline
• Storage Infrastructures
• SRM
• Storage Elements in gLite
• Low Level Data Management
• LCG File Catalog (LFC)
• Data management CLIs and APIs
• Examples
• FTS
Storage Infrastructures
• Disk-only
• Hierarchical storage management (HSM)
– Policy-based management of file backup and archiving that uses storage devices economically, without the user needing to be aware of when files are retrieved from or stored on backup storage media.
– The hierarchy represents different types of storage media, such as disk systems, optical storage, or tape, each type representing a different level of cost and speed of retrieval when access is needed. For example, as a file ages in an archive, it can be automatically moved to a slower but less expensive form of storage.
– HSM software: TSM, DMF, CASTOR, Enstore, HPSS, …
Storage Infrastructures
• HSM example at SARA
SRM
• SRM standard
– SRM implementations provide uniform access to heterogeneous storage resources on the Grid
• Storage Resource Managers
– SRM is a control protocol for:
    Space reservation
    File management
    • Pinning
    • Lifetime management
    Replication
    Protocol negotiation
SRM
• SRM implementation
– The SRM interface is implemented as a web service
– Implementations:
    dCache (disk/HSM)
    DPM (disk)
    CASTOR (HSM)
    SRB (disk/HSM)
    ….
• SRM Examples
– srmRm
– srmLs
– srmPrepareToPut
– srmBringOnline
– srmCopy
– srmGetTransferProtocols
– ….
Storage Elements in gLite
• Classic SE
– No SRM
– To be deprecated in the autumn of this year (2006)
– Transfer protocols: gridftp
– Storage type: disk
• DPM
– SRM
– Transfer protocols: gridftp, secure rfio
– Storage type: disk
• dCache
– SRM
– Transfer protocols: gridftp, gsidcap
– Storage type: disk, HSM
Low Level Data Management
• GridFTP (all SEs)
– globus-url-copy file:///home/ron/file \
    gsiftp://srm.grid.sara.nl/pnfs/grid.sara.nl/data/dteam/file
– Third party transfer
    globus-url-copy gsiftp://hostA/pathA gsiftp://hostB/pathB
– Also edg-gridftp-ls, edg-gridftp-rm, edg-gridftp-mkdir etc.
– Uberftp
    Interactive gridftp client
    ftp commands
    GSI authentication
Low Level Data Management
• Gsidcap (dCache SEs)
– dccp -p 20000:25000 /tmp/file \
    gsidcap://srm.grid.sara.nl:22128/pnfs/grid.sara.nl/data/dteam/file
– 20000:25000 is derived from the GLOBUS_TCP_PORT_RANGE environment variable
• Secure rfio
– rfcp /path/myfile \
    t2se01.physics.ox.ac.uk:/dpm/physics.ox.ac.uk/home/dteam/file
• Srmcp (not for Classic SEs)
– srmcp file:////tmp/file \
    srm://srm.grid.sara.nl:8443//pnfs/grid.sara.nl/data/dteam/file
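The port range passed to dccp comes from GLOBUS_TCP_PORT_RANGE. A minimal Python sketch of that derivation follows. It assumes the variable holds the usual Globus "min,max" form (a space instead of the comma is tolerated); the helper name dccp_port_arg is made up for illustration.

```python
import os
import re


def dccp_port_arg(env=None):
    """Return the value for dccp's -p option, e.g. '20000:25000'.

    Assumption: GLOBUS_TCP_PORT_RANGE looks like '20000,25000'.
    Returns None when the variable is unset or empty.
    """
    env = os.environ if env is None else env
    raw = env.get("GLOBUS_TCP_PORT_RANGE", "").strip()
    if not raw:
        return None
    # Accept a comma or whitespace between the two port numbers.
    parts = re.split(r"[,\s]+", raw)
    if len(parts) != 2:
        raise ValueError(f"unexpected port range: {raw!r}")
    low, high = (int(p) for p in parts)
    if low > high:
        raise ValueError(f"port range is reversed: {raw!r}")
    return f"{low}:{high}"
```

Calling dccp_port_arg() with no argument reads the real environment; passing a dict makes the function easy to try without touching the shell.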
Information system
• LDAP-based
– LDAP servers running on service nodes (GRIS/BDII)
– LDAP servers collecting the information for a site (site BDII)
– LDAP servers collecting the information for all sites (BDII)
• Need to set environment variable LCG_GFAL_INFOSYS
– Needs to be set to a BDII
• lcg-infosites
– Example: finding an SE:
> lcg-infosites --vo tutor se
Avail Space(Kb)  Used Space(Kb)  Type  SEs
----------------------------------------------------------
214632           1901097784      n.a   tbn15.nikhef.nl
626880000        1163120000      n.a   tbn18.nikhef.nl
488106596        368854044       n.a   mu2.matrix.sara.nl
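A listing like this is easy to post-process, for instance to pick the SE with the most free space. The Python sketch below parses the sample output shown on this slide. It assumes the four-column layout seen here; other versions of lcg-infosites may print different columns, and the function names are illustrative only.

```python
SAMPLE = """\
Avail Space(Kb) Used Space(Kb) Type SEs
----------------------------------------------------------
214632 1901097784 n.a tbn15.nikhef.nl
626880000 1163120000 n.a tbn18.nikhef.nl
488106596 368854044 n.a mu2.matrix.sara.nl
"""


def parse_se_listing(text):
    """Parse 'lcg-infosites --vo <vo> se' style output into a list of dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have four columns and start with two integers;
        # this skips the header line and the dashed separator.
        if len(fields) == 4 and fields[0].isdigit() and fields[1].isdigit():
            rows.append({
                "avail_kb": int(fields[0]),
                "used_kb": int(fields[1]),
                "type": fields[2],
                "se": fields[3],
            })
    return rows


def se_with_most_space(text):
    """Return the hostname of the SE reporting the most available space."""
    rows = parse_se_listing(text)
    if not rows:
        raise ValueError("no SE rows found")
    return max(rows, key=lambda r: r["avail_kb"])["se"]
```

Rows whose space columns are not plain integers are skipped rather than guessed at.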
Information system
• lcg-info
– For more advanced searches
    For example, finding out where to put your files:
> lcg-info --list-se --query 'SE=mu2.matrix.sara.nl' --attrs Path
- SE: mu2.matrix.sara.nl
  - Path /flatfiles/SE00/tutor
• ldapsearch
– For the real troopers among us
LFC
• LFC stands for LCG File Catalog
– LCG stands for LHC Computing Grid
– LHC stands for Large Hadron Collider
• Users and programs produce and require data
– The Resource Broker can send (small amounts of) data to/from jobs: Input and Output Sandbox. Not recommended for large amounts of data
• Data is stored on the Grid
– Located in Storage Elements
– Several replicas of one file at different sites
– Accessible by Grid users and applications from “anywhere”
– Locatable by the WMS/RB (data requirements in JDL)
• Also…
– Data may be copied between local filesystems (WNs, UIs) and the Grid, or opened remotely on the SE (GFAL, gsidcap, rfio).
LFC
• LFC
– Keeps track of the location of copies (replicas) of files on the Grid
Naming conventions
• Logical File Name (LFN)
– An alias created by a user to refer to some item of data, e.g. “lfn:/grid/tutor/mydir/myfile”
• Globally Unique Identifier (GUID)
– A non-human-readable unique identifier for an item of data, e.g. “guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6”
• Site URL (SURL) (or Physical File Name (PFN) or Site FN)
– The location of an actual piece of data on a storage system, e.g.
    “srm://pcrd24.cern.ch/flatfiles/cms/output10_1” (SRM)
    “sfn://lxshare0209.cern.ch/data/alice/ntuples.dat” (Classic SE)
• Transport URL (TURL)
– Locator of a replica plus access protocol, understood by an SE, e.g. “rfio://lxshare0209.cern.ch//data/alice/ntuples.dat”
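The four kinds of name can be told apart by their prefix. The Python sketch below classifies the example names from this slide. The SURL prefixes come straight from the examples; the set of TURL protocols is an assumption based on the transfer protocols named elsewhere in this tutorial, and the function names are hypothetical.

```python
from urllib.parse import urlsplit

# "srm" and "sfn" are the SURL prefixes shown above.
SURL_SCHEMES = {"srm", "sfn"}
# Assumed TURL protocols, taken from those mentioned in this tutorial.
TURL_SCHEMES = {"rfio", "gsiftp", "dcap", "gsidcap"}


def classify(name):
    """Return 'LFN', 'GUID', 'SURL' or 'TURL' for a Grid file name."""
    scheme = name.split(":", 1)[0].lower()
    if scheme == "lfn":
        return "LFN"
    if scheme == "guid":
        return "GUID"
    if scheme in SURL_SCHEMES:
        return "SURL"
    if scheme in TURL_SCHEMES:
        return "TURL"
    raise ValueError(f"unrecognised name: {name!r}")


def storage_host(url):
    """Return the host part of a SURL or TURL."""
    if classify(url) not in ("SURL", "TURL"):
        raise ValueError("only SURLs and TURLs carry a host")
    return urlsplit(url).hostname
```

For example, storage_host applied to the rfio TURL above yields lxshare0209.cern.ch, while an LFN or GUID has no host to extract and raises an error.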
Naming conventions
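• How do they fit together?
– LFC holds the mapping LFN-GUID-SURL
[Diagram: inside the LFC, LFN 1 … LFN i point to one GUID, which points to SURL 1 … SURL j; each SURL resolves to its own TURLs (SURL 1 to TURL 11 … TURL 1k, SURL j to TURL j1 … TURL jl).]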
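The LFN-GUID-SURL mapping can be pictured as a small data structure: many LFNs and many SURLs hang off a single GUID. The Python sketch below is an in-memory toy, not the real LFC or its API; every class and method name in it is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One file as the catalog sees it: one GUID, many LFNs, many SURLs."""
    guid: str
    lfns: set = field(default_factory=set)
    surls: set = field(default_factory=set)


class ToyCatalog:
    """In-memory stand-in for the LFN-GUID-SURL mapping (not the real LFC)."""

    def __init__(self):
        self._by_guid = {}
        self._guid_of_lfn = {}

    def register(self, guid, lfn, surl):
        """Record a file with its first LFN and first replica."""
        entry = self._by_guid.setdefault(guid, CatalogEntry(guid))
        entry.lfns.add(lfn)
        entry.surls.add(surl)
        self._guid_of_lfn[lfn] = guid

    def add_alias(self, guid, lfn):
        """Attach an additional LFN to an existing GUID."""
        self._by_guid[guid].lfns.add(lfn)
        self._guid_of_lfn[lfn] = guid

    def add_replica(self, guid, surl):
        """Attach an additional SURL (replica) to an existing GUID."""
        self._by_guid[guid].surls.add(surl)

    def replicas(self, lfn):
        """Resolve LFN -> GUID -> sorted list of SURLs."""
        return sorted(self._by_guid[self._guid_of_lfn[lfn]].surls)
```

Looking up any alias returns the same replica list, because all LFNs of a file resolve through the same GUID. TURLs are left out because, in the diagram, they sit outside the LFC.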
LFC
• LFN acts as main key in the database. It has:
– Symbolic links to it (additional LFNs)
– Unique Identifier (GUID)
– System metadata
– Information on replicas
– One field of user metadata
LFC
• Two kinds of LFC
– Central LFC
    For each VO, one site on the grid will publish a global catalog. This will record entries (file replicas or dataset entities) across the whole of the grid.
– Local LFC
    Local catalogs record the file replicas stored at that site's SEs only.
LFC
• Provides:
– User-exposed transaction C/C++ API (+ auto rollback on failure)
    Python wrapper provided (python module lfc)
– Command line tools with administrative functionality
– Hierarchical unix-like namespace and namespace operations for LFNs
    lfn:/grid/<vo name>/mydir/myfile
    lfc-mkdir, lfc-chmod
– Integrated GSI Authentication + Authorization
– Access Control Lists (Unix Permissions and POSIX ACLs)
– Checksums
– Sessions (multiple operations inside a single transaction)
– Bulk operations (inside transactions)
LFC
Summary of the LFC Catalog commands
lfc-chmod       Change access mode of the LFC file/directory
lfc-chown       Change owner and group of the LFC file/directory
lfc-delcomment  Delete the comment associated with the file/directory
lfc-getacl      Get file/directory access control lists
lfc-ln          Make a symbolic link to a file/directory
lfc-ls          List file/directory entries in a directory
lfc-mkdir       Create a directory
lfc-rename      Rename a file/directory
lfc-rm          Remove a file/directory
lfc-setacl      Set file/directory access control lists
lfc-setcomment  Add/replace a comment
LFC
C/C++ API: Low level methods (many POSIX-like):
lfc_access       lfc_aborttrans   lfc_addreplica   lfc_apiinit
lfc_chclass      lfc_chdir        lfc_chmod        lfc_chown
lfc_closedir     lfc_creat        lfc_delcomment   lfc_delete
lfc_deleteclass  lfc_delreplica   lfc_endtrans     lfc_enterclass
lfc_errmsg       lfc_getacl       lfc_getcomment   lfc_getcwd
lfc_getpath      lfc_lchown       lfc_listclass    lfc_listlinks
lfc_listreplica  lfc_lstat        lfc_mkdir        lfc_modifyclass
lfc_opendir      lfc_queryclass   lfc_readdir      lfc_readlink
lfc_rename       lfc_rewind       lfc_rmdir        lfc_selectsrvr
lfc_setacl       lfc_setatime     lfc_setcomment   lfc_seterrbuf
lfc_setfsize     lfc_starttrans   lfc_stat         lfc_symlink
lfc_umask        lfc_undelete     lfc_unlink       lfc_utime
send2lfc
LFC Interfaces
• Integration with GFAL and lcg_utils APIs
    lcg-utils/GFAL access the catalog in a transparent way
• Integration with the WMS
– The RB can locate Grid files: allows for data-based match-making
– JDL file:
    InputData = "lfn:/grid/tutor/MyFile";
Data Management CLIs & APIs
• lcg_utils: lcg-* commands + lcg_* API calls
– Provide (all) the functionality needed by the LCG user
– Transparent interaction with file catalogs and storage interfaces when needed
– Abstraction from technology of specific implementations
• Grid File Access Library (GFAL): API
– Adds file I/O and explicit catalog interaction functionality
– Still provides the abstraction and transparency of lcg_utils
Data Management CLIs & APIs
lcg-utils commands: Replica Management
lcg-cp   Copies a grid file to a local destination
lcg-cr   Copies a file to an SE and registers the file in the catalog
lcg-del  Deletes one file
lcg-rep  Replicates a file between SEs and registers the replica
lcg-gt   Gets the TURL for a given SURL and transfer protocol
lcg-sd   Sets file status to “Done” for a given SURL in an SRM request
lcg-utils commands: File Catalog Interaction
lcg-aa   Adds an alias in LFC for a given GUID
lcg-ra   Removes an alias in LFC for a given GUID
lcg-rf   Registers in LFC a file placed in an SE
lcg-uf   Unregisters in LFC a file placed in an SE
lcg-la   Lists the aliases for a given SURL, GUID or LFN
lcg-lg   Gets the GUID for a given LFN or SURL
lcg-lr   Lists the replicas for a given GUID, SURL or LFN
Data Management CLIs & APIs
lcg-utils C/C++ API:
lcg_cp   lcg_lr
lcg_cr   lcg_ra
lcg_del  lcg_rf
lcg_rep  lcg_uf
lcg_sd   lcg_la
lcg_aa   lcg_lg
lcg_gt
Data Management CLIs & APIs
• GFAL
– Grid storage interactions today require using some existing software components:
    The file catalog services, to locate valid replicas of files in order to:
    • Download them to the user's local machine
    • Move them from one SE to another
    • Let jobs running on a worker node access and manage files stored on a remote storage element
    The SRM software, to ensure:
    • File existence on disk or in a disk pool (files are recalled from mass storage if necessary)
    • Space allocation on disk for new files (they may be migrated to mass storage later)
Data Management CLIs & APIs
• The GFAL Features
– Hides interactions with the SRM from the end user
– Provides a POSIX-like interface for file I/O operations
    POSIX calls prefixed with gfal_
– Based on shared libraries (both threaded and unthreaded versions)
– Needs only one header file (gfal_api.h) to write C applications
– Supports the following protocols:
    file for local access, also lfn/guid
    dcap, gsidcap and kdcap for the dCache access protocol
    rfio for the CASTOR access protocol
    SRM
– Access to SRMs in secure mode, i.e. using a valid Grid proxy obtained with the voms-proxy-init command.
Examples
• Using lcg-utils and lfc commands:
– Define the server hostname
    The LFC server must be published in the BDII ($LCG_GFAL_INFOSYS)
    Use the environment variable LFC_HOST: export LFC_HOST=<LFC_server_hostname>
    $LFC_HOST must be set
Examples
Listing the entries of an LFC directory
lfc-ls [-cdiLlRTu] [--class] [--comment] [--deleted] [--display_side] [--ds] path…
where path specifies the LFN pathname (mandatory)
– Remember that LFC has a directory tree structure
– /grid/<VO_name>/<you create it>
    /grid/<VO_name> is the LFC namespace; the rest is defined by the user
– All members of a VO have read-write permissions under their directory
– You can set LFC_HOME to use relative paths
> lfc-ls /grid/tutor/me
> export LFC_HOME=/grid/tutor
> lfc-ls -l me
> lfc-ls -l -R /grid
-l : long listing
-R : list the contents of directories recursively. Don’t use it!
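The effect of LFC_HOME on relative paths can be sketched in a few lines of Python. This is only a model of the rule stated above (relative paths are taken to be under LFC_HOME), not code from the LFC client; resolve_lfc_path is a hypothetical helper.

```python
import posixpath


def resolve_lfc_path(path, lfc_home=None):
    """Turn a possibly relative LFC path into an absolute one.

    Absolute paths are returned normalised; relative paths are joined
    onto lfc_home, mirroring what setting LFC_HOME achieves.
    """
    if path.startswith("/"):
        return posixpath.normpath(path)
    if not lfc_home:
        raise ValueError("relative path given but LFC_HOME is not set")
    return posixpath.normpath(posixpath.join(lfc_home, path))
```

With LFC_HOME set to /grid/tutor, the relative path me resolves to /grid/tutor/me, which is why the first and third lfc-ls commands above list the same directory.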
Examples
Creating directories in the LFC
lfc-mkdir [-m mode] [-p] path...
• Where path specifies the LFC pathname
• Remember that while registering a new file (using lcg-cr, for example) the corresponding destination directory must be created in the catalog beforehand.
• Examples:
> lfc-mkdir /grid/tutor/me
You can just check the directory with:
> lfc-ls -l /grid/tutor/me
drwxr-xrwx 0 19122 1077 0 Jun 14 11:36 demo
Examples
Let us copy and register a file using lcg-utils
> lcg-cr --vo tutor -l me/test -d mu2.matrix.sara.nl file:`pwd`/test
guid:7b4efaef-bb0f-42a3-bb6f-bbe35080d105
> lcg-lr --vo tutor lfn:me/test
sfn://mu2.matrix.sara.nl/flatfiles/SE00/tutor/generated/2006-09-18/file378fc829-351f-4558-8679-9d2ce530cbb4
> lfc-ls -l me
-rw-rw-r-- 1 30010 2024 114 Sep 18 10:33 test
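When the copy-and-register step is driven from a script, it helps to build the argument list explicitly. The Python sketch below assembles the same lcg-cr call as above, using only the options that appear on this slide (--vo, -l, -d and a file: source). The function names are illustrative, and actually running the command (for example with subprocess.run) is left to the caller.

```python
import os
import shlex


def lcg_cr_command(vo, lfn, dest_se, local_path):
    """Build the argv for a copy-and-register call like the one above."""
    # file:`pwd`/test in the shell example is an absolute path;
    # abspath does the same for a relative local_path.
    source = "file:" + os.path.abspath(local_path)
    return ["lcg-cr", "--vo", vo, "-l", lfn, "-d", dest_se, source]


def as_shell(argv):
    """Render an argv list as a copy-pasteable shell command line."""
    return shlex.join(argv)
```

Keeping the command as a list avoids quoting problems when file names contain spaces.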
Examples
Creating a symbolic link
lfc-ln -s file linkname
lfc-ln -s directory linkname
Create a link to the specified file or directory with linkname
– Examples:
> lfc-ln -s /grid/tutor/me/test /grid/tutor/aLink
Let’s check the link using lfc-ls with long listing (-l):
> lfc-ls -l
lrwxrwxrwx 1 30010 2024 0 Sep 18 10:38 aLink -> /grid/tutor/me/test
(aLink is the symbolic link; /grid/tutor/me/test is the original file)
Examples
Adding/deleting metadata information
lfc-setcomment path comment
lfc-delcomment path
lfc-setcomment adds/replaces a comment associated with a file/directory in the LFC Catalog
lfc-delcomment deletes a comment previously added
• This is the only metadata (one field) supported by the catalog
• Examples:
> lfc-setcomment me/test "nice file"
• Let’s see what happened:
> lfc-ls --comment /grid/tutor/me/test
/grid/tutor/me/test nice file
Examples
Deleting the file
lfc-rm
lfc-rm removes file/link/directory only from the catalog
lcg-del
lcg-del removes the file from the SEs and the LFNs/links from the catalog
• Example, delete all replicas:
> lcg-del -a --vo tutor guid:8e413879-7cb3-4260-af9f-6964392da7e8
• Example, delete only one replica:
> lcg-del --vo tutor -s mu2.matrix.sara.nl guid:8e413879-7cb3-4260-af9f-6964392da7e8
File Transfer Service
• A batch system for submitting data transfer jobs
• For data-intensive sciences
– Currently in use in the LCG project
FTS
• Allows for
– Managed transfers by means of channels to sites
    Channels are between sites, e.g. CERN-SARA.
    Site admins can adapt the configuration of incoming channels to their site, can switch their channel off, etc.
    Set priorities for different VOs.
– Optimisation of network tuning parameters per channel
FTS
• Command line interface
– glite-transfer-cancel
    Cancels a file transfer job
– glite-transfer-list
    Lists ongoing data transfer jobs
– glite-transfer-status
    Displays the status of an ongoing data transfer job
– glite-transfer-submit
    Submits a new data transfer job