INFSO-RI-508833

Enabling Grids for E-sciencE

www.eu-egee.org

Data Management

Ron Trompert

SARA

Grid Tutorial, 18-19 September 2006

Grid Tutorial, RC RUG, 18-19 September 2006

Outline

• Storage Infrastructures
• SRM
• Storage Elements in gLite
• Low Level Data Management
• LCG File Catalog (LFC)
• Data Management CLIs and APIs
• Examples
• FTS

Storage Infrastructures

• Disk-only
• Hierarchical storage management (HSM)
  – Policy-based management of file backup and archiving that uses storage devices economically, without the user needing to be aware of when files are retrieved from or stored on backup storage media.
  – The hierarchy represents different types of storage media, such as disk systems, optical storage, or tape, each with a different cost and speed of retrieval. For example, as a file ages in an archive, it can be moved automatically to a slower but less expensive form of storage.
  – HSM software: TSM, DMF, CASTOR, Enstore, HPSS, …
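
The age-based migration described above can be pictured as a small policy function. The Python sketch below is illustrative only: the tier names and the 30-day and 365-day thresholds are assumptions made for this example, not the policy of TSM, DMF, CASTOR, Enstore or HPSS.

```python
from datetime import datetime, timedelta

# Hypothetical tiers, ordered from fast/expensive to slow/cheap.
# Each entry is (tier name, minimum idle time before a file qualifies for it).
TIERS = [
    ("disk", timedelta(days=0)),
    ("optical", timedelta(days=30)),
    ("tape", timedelta(days=365)),
]


def choose_tier(last_access: datetime, now: datetime) -> str:
    """Return the slowest tier whose idle-time threshold the file has reached."""
    idle = now - last_access
    chosen = TIERS[0][0]
    for name, min_idle in TIERS:
        if idle >= min_idle:
            chosen = name
    return chosen


now = datetime(2006, 9, 18)
print(choose_tier(datetime(2006, 9, 17), now))  # idle for one day
print(choose_tier(datetime(2006, 6, 1), now))   # idle for a few months
print(choose_tier(datetime(2004, 1, 1), now))   # idle for more than two years
```

A real HSM applies such rules transparently, so the user keeps seeing one file name regardless of the tier that holds the data.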

Storage Infrastructures

• HSM example at SARA

SRM

• SRM standard
  – SRM implementations provide uniform access to heterogeneous storage resources on the Grid
• Storage Resource Managers
  – SRM is a control protocol for:
    - Space reservation
    - File management
      • Pinning
      • Lifetime management
    - Replication
    - Protocol negotiation

SRM

• SRM implementation
  – SRM I/F is implemented as a web service
  – Implementations:
    - dCache (disk/HSM)
    - DPM (disk)
    - CASTOR (HSM)
    - SRB (disk/HSM)
    - …
• SRM examples
  – srmRm
  – srmLs
  – srmPrepareToPut
  – srmBringOnline
  – srmCopy
  – srmGetTransferProtocols
  – …

Storage Elements in gLite

• Classic SE
  – No SRM
  – Will be deprecated in the autumn of this year (2006)
  – Transfer protocols: gridftp
  – Storage type: disk
• DPM
  – SRM
  – Transfer protocols: gridftp, secure rfio
  – Storage type: disk
• dCache
  – SRM
  – Transfer protocols: gridftp, gsidcap
  – Storage type: disk, HSM

Low Level Data Management

• GridFTP (all SEs)
  – globus-url-copy file:///home/ron/file \
      gsiftp://srm.grid.sara.nl/pnfs/grid.sara.nl/data/dteam/file
  – Third-party transfer:
      globus-url-copy gsiftp://hostA/pathA gsiftp://hostB/pathB
  – Also edg-gridftp-ls, edg-gridftp-rm, edg-gridftp-mkdir, etc.
  – UberFTP
    - Interactive GridFTP client
    - FTP commands
    - GSI authentication

Low Level Data Management

• gsidcap (dCache SEs)
  – dccp -p 20000:25000 /tmp/file \
      gsidcap://srm.grid.sara.nl:22128/pnfs/grid.sara.nl/data/dteam/file
  – 20000:25000 is derived from the GLOBUS_TCP_PORT_RANGE environment variable
• Secure rfio
  – rfcp /path/myfile \
      t2se01.physics.ox.ac.uk:/dpm/physics.ox.ac.uk/home/dteam/file
• srmcp (not for Classic SEs)
  – srmcp file:////tmp/file \
      srm://srm.grid.sara.nl:8443//pnfs/grid.sara.nl/data/dteam/file
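
The dccp port range above is said to be derived from GLOBUS_TCP_PORT_RANGE. A minimal Python sketch of that conversion follows, assuming the variable holds two ports separated by a comma or whitespace (for example "20000,25000"). The helper names dccp_port_option and dccp_command are hypothetical, and the command line is only built, not executed.

```python
import re
from typing import List


def dccp_port_option(globus_range: str) -> str:
    """Convert a 'min,max' (or 'min max') port range into the 'min:max' form."""
    parts = [p for p in re.split(r"[,\s]+", globus_range.strip()) if p]
    if len(parts) != 2:
        raise ValueError(f"expected two ports, got {globus_range!r}")
    low, high = (int(p) for p in parts)
    if not (0 < low <= high <= 65535):
        raise ValueError(f"invalid port range {low}-{high}")
    return f"{low}:{high}"


def dccp_command(globus_range: str, source: str, destination: str) -> List[str]:
    """Build (but do not run) a dccp argument list like the one on the slide."""
    return ["dccp", "-p", dccp_port_option(globus_range), source, destination]


cmd = dccp_command(
    "20000,25000",
    "/tmp/file",
    "gsidcap://srm.grid.sara.nl:22128/pnfs/grid.sara.nl/data/dteam/file",
)
print(" ".join(cmd))
```

In practice the range would be read from the environment, for example with os.environ["GLOBUS_TCP_PORT_RANGE"], before being passed to the helper.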

Information system

• LDAP-based
  – LDAP servers running on service nodes (GRIS/BDII)
  – LDAP servers collecting the information for a site (site BDII)
  – LDAP servers collecting the information for all sites (BDII)
• Need to set environment variable LCG_GFAL_INFOSYS
  – Needs to be set to a BDII
• lcg-infosites
  – Example: finding an SE:
    > lcg-infosites --vo tutor se

    Avail Space(Kb)  Used Space(Kb)  Type  SEs
    ----------------------------------------------------------
    214632           1901097784      n.a   tbn15.nikhef.nl
    626880000        1163120000      n.a   tbn18.nikhef.nl
    488106596        368854044       n.a   mu2.matrix.sara.nl
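
The listing above is easy to post-process when choosing an SE. The Python sketch below parses text in the four-column layout shown on the slide and picks the SE with the most available space. The layout is assumed from this one example, the real tool may print other values (such as n.a) in the space columns, and the function names are hypothetical.

```python
SAMPLE = """\
Avail Space(Kb) Used Space(Kb) Type SEs
----------------------------------------------------------
214632 1901097784 n.a tbn15.nikhef.nl
626880000 1163120000 n.a tbn18.nikhef.nl
488106596 368854044 n.a mu2.matrix.sara.nl
"""


def parse_se_listing(text):
    """Return a list of (hostname, available_kb, used_kb) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have four fields and start with two integers;
        # this skips the header and the dashed separator.
        if len(fields) == 4 and fields[0].isdigit() and fields[1].isdigit():
            rows.append((fields[3], int(fields[0]), int(fields[1])))
    return rows


def most_free_space(rows):
    """Pick the row of the SE with the most available space."""
    return max(rows, key=lambda row: row[1])


rows = parse_se_listing(SAMPLE)
print(most_free_space(rows)[0])
```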

Information system

• lcg-info
  – For more advanced searches, for example finding out where to put your files:
    > lcg-info --list-se --query 'SE=mu2.matrix.sara.nl' --attrs Path
    - SE: mu2.matrix.sara.nl
      - Path /flatfiles/SE00/tutor
• ldapsearch
  – For the real troopers among us

LFC

• LFC stands for LCG File Catalog
  – LCG stands for LHC Computing Grid
  – LHC stands for Large Hadron Collider
• Users and programs produce and require data
  – The Resource Broker can send (small amounts of) data to/from jobs: Input and Output Sandbox
    - Not recommended for large amounts of data
• Data is stored on the Grid
  – Located in Storage Elements
  – Several replicas of one file at different sites
  – Accessible by Grid users and applications from “anywhere”
  – Locatable by the WMS/RB (data requirements in JDL)
• Also…
  – Data may be copied between local filesystems (WNs, UIs) and the Grid, or opened remotely on the SE (GFAL, gsidcap, rfio).

LFC

• LFC
  – Keeps track of the location of copies (replicas) of files on the Grid

Naming conventions

• Logical File Name (LFN)
  – An alias created by a user to refer to some item of data, e.g. “lfn:/grid/tutor/mydir/myfile”
• Globally Unique Identifier (GUID)
  – A non-human-readable unique identifier for an item of data, e.g. “guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6”
• Site URL (SURL) (or Physical File Name (PFN) or Site FN)
  – The location of an actual piece of data on a storage system, e.g.
    “srm://pcrd24.cern.ch/flatfiles/cms/output10_1” (SRM)
    “sfn://lxshare0209.cern.ch/data/alice/ntuples.dat” (Classic SE)
• Transport URL (TURL)
  – Locator of a replica plus an access protocol, understood by an SE, e.g. “rfio://lxshare0209.cern.ch//data/alice/ntuples.dat”
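
The four kinds of name can be told apart by their prefix. The Python sketch below classifies a name using only the prefixes that appear in the examples above. Treating every other scheme as a TURL is an assumption of this sketch, and classify_grid_name is a hypothetical helper, not part of any grid library.

```python
def classify_grid_name(name: str) -> str:
    """Classify a grid file name by its prefix, following the slide's examples."""
    if name.startswith("lfn:"):
        return "LFN"
    if name.startswith("guid:"):
        return "GUID"
    scheme, sep, _rest = name.partition("://")
    if not sep:
        return "unknown"
    if scheme in ("srm", "sfn"):
        return "SURL"
    # Assumption: any other scheme names an access protocol, so it is a TURL.
    return "TURL"


for example in (
    "lfn:/grid/tutor/mydir/myfile",
    "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
    "srm://pcrd24.cern.ch/flatfiles/cms/output10_1",
    "sfn://lxshare0209.cern.ch/data/alice/ntuples.dat",
    "rfio://lxshare0209.cern.ch//data/alice/ntuples.dat",
):
    print(classify_grid_name(example), example)
```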

Naming conventions

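• How do they fit together?
  – LFC holds the mapping LFN-GUID-SURL

[Diagram: in the LFC, LFN 1 … LFN i map to a single GUID, which maps to SURL 1 … SURL j. Each SURL corresponds to its own TURLs: SURL 1 to TURL 11 … TURL 1k, and SURL j to TURL j1 … TURL jl.]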
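
The LFN-GUID-SURL mapping can be modelled with two dictionaries. The Python sketch below is a toy in-memory stand-in written for illustration. It is not the LFC API, the class and method names are hypothetical, and the replica paths in the usage lines are made up.

```python
import uuid


class ToyCatalog:
    """In-memory stand-in for the LFN -> GUID -> SURL mapping (not the LFC API)."""

    def __init__(self):
        self.guid_of_lfn = {}    # several LFNs (aliases) may point to one GUID
        self.surls_of_guid = {}  # one GUID may have several replicas (SURLs)

    def register(self, lfn, surl):
        """Register a new file: create a GUID, attach the LFN and first replica."""
        guid = "guid:" + str(uuid.uuid4())
        self.guid_of_lfn[lfn] = guid
        self.surls_of_guid[guid] = [surl]
        return guid

    def add_alias(self, guid, lfn):
        """Attach an additional LFN to an existing GUID."""
        self.guid_of_lfn[lfn] = guid

    def add_replica(self, guid, surl):
        """Record another physical copy of the file."""
        self.surls_of_guid[guid].append(surl)

    def replicas(self, lfn):
        """Resolve an LFN to the SURLs of all its replicas."""
        return list(self.surls_of_guid[self.guid_of_lfn[lfn]])


catalog = ToyCatalog()
guid = catalog.register("lfn:/grid/tutor/me/test",
                        "sfn://mu2.matrix.sara.nl/example/path/test")
catalog.add_alias(guid, "lfn:/grid/tutor/me/alias")
catalog.add_replica(guid, "srm://tbn18.nikhef.nl/example/path/test")
print(catalog.replicas("lfn:/grid/tutor/me/alias"))
```

Both LFNs resolve to the same two replicas because they share one GUID, which is the point of the diagram.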

LFC

• LFN acts as main key in the database. It has:
  – Symbolic links to it (additional LFNs)
  – Unique Identifier (GUID)
  – System metadata
  – Information on replicas
  – One field of user metadata

LFC

• Two kinds of LFC
  – Central LFC
    - For each VO, one site on the grid will publish a global catalog. This will record entries (file replicas or dataset entities) across the whole of the grid.
  – Local LFC
    - Local catalogs record the file replicas stored at that site's SEs only.

LFC

• Provides:
  – User-exposed transaction C/C++ API (with automatic rollback on failure)
    - Python wrapper provided (Python module lfc)
  – Command-line tools with administrative functionality
  – Hierarchical Unix-like namespace and namespace operations for LFNs
    - lfn:/grid/<vo name>/mydir/myfile
    - lfc-mkdir, lfc-chmod
  – Integrated GSI authentication and authorization
  – Access Control Lists (Unix permissions and POSIX ACLs)
  – Checksums
  – Sessions (multiple operations inside a single transaction)
  – Bulk operations (inside transactions)

LFC

Summary of the LFC catalog commands

lfc-chmod        Change access mode of the LFC file/directory
lfc-chown        Change owner and group of the LFC file/directory
lfc-delcomment   Delete the comment associated with the file/directory
lfc-getacl       Get file/directory access control lists
lfc-ln           Make a symbolic link to a file/directory
lfc-ls           List file/directory entries in a directory
lfc-mkdir        Create a directory
lfc-rename       Rename a file/directory
lfc-rm           Remove a file/directory
lfc-setacl       Set file/directory access control lists
lfc-setcomment   Add/replace a comment

LFC

C/C++ API: low-level methods (many POSIX-like):

lfc_access        lfc_aborttrans    lfc_addreplica    lfc_apiinit
lfc_chclass       lfc_chdir         lfc_chmod         lfc_chown
lfc_closedir      lfc_creat         lfc_delcomment    lfc_delete
lfc_deleteclass   lfc_delreplica    lfc_endtrans      lfc_enterclass
lfc_errmsg        lfc_getacl        lfc_getcomment    lfc_getcwd
lfc_getpath       lfc_lchown        lfc_listclass     lfc_listlinks
lfc_listreplica   lfc_lstat         lfc_mkdir         lfc_modifyclass
lfc_opendir       lfc_queryclass    lfc_readdir       lfc_readlink
lfc_rename        lfc_rewind        lfc_rmdir         lfc_selectsrvr
lfc_setacl        lfc_setatime      lfc_setcomment    lfc_seterrbuf
lfc_setfsize      lfc_starttrans    lfc_stat          lfc_symlink
lfc_umask         lfc_undelete      lfc_unlink        lfc_utime
send2lfc

LFC Interfaces

• Integration with GFAL and lcg_utils APIs
  – lcg_utils/GFAL access the catalog in a transparent way
• Integration with the WMS
  – The RB can locate Grid files: allows for data-based matchmaking
  – JDL file:
      InputData = "lfn:/grid/tutor/MyFile";

Data Management CLIs & APIs

• lcg_utils: lcg-* commands + lcg_* API calls
  – Provide (all) the functionality needed by the LCG user
  – Transparent interaction with file catalogs and storage interfaces when needed
  – Abstraction from technology of specific implementations
• Grid File Access Library (GFAL): API
  – Adds file I/O and explicit catalog interaction functionality
  – Still provides the abstraction and transparency of lcg_utils

Data Management CLIs & APIs

lcg-utils commands: Replica Management

lcg-cp    Copies a grid file to a local destination
lcg-cr    Copies a file to an SE and registers the file in the catalog
lcg-del   Deletes one file
lcg-rep   Replicates a file between SEs and registers the replica
lcg-gt    Gets the TURL for a given SURL and transfer protocol
lcg-sd    Sets file status to “Done” for a given SURL in an SRM request

lcg-utils commands: File Catalog Interaction

lcg-aa    Adds an alias in the LFC for a given GUID
lcg-ra    Removes an alias in the LFC for a given GUID
lcg-rf    Registers in the LFC a file placed on an SE
lcg-uf    Unregisters from the LFC a file placed on an SE
lcg-la    Lists the aliases for a given SURL, GUID or LFN
lcg-lg    Gets the GUID for a given LFN or SURL
lcg-lr    Lists the replicas for a given GUID, SURL or LFN

Data Management CLIs & APIs

lcg_utils C/C++ API:

lcg_cp    lcg_lr
lcg_cr    lcg_ra
lcg_del   lcg_rf
lcg_rep   lcg_uf
lcg_sd    lcg_la
lcg_aa    lcg_lg
lcg_gt

Data Management CLIs & APIs

• GFAL
  – Grid storage interactions today require using several existing software components:
    - The file catalog services, to locate valid replicas of files in order to:
      • Download them to the user's local machine
      • Move them from one SE to another
      • Enable jobs running on a worker node to access and manage files stored on a remote storage element
    - The SRM software, to ensure:
      • File existence on disk or in a disk pool (files are recalled from mass storage if necessary)
      • Space allocation on disk for new files (which may be migrated to mass storage later)

Data Management CLIs & APIs

• GFAL features
  – Hides interactions with the SRM from the end user
  – Provides a POSIX-like interface for file I/O operations
    - POSIX calls prefixed with gfal_
  – Based on shared libraries (both threaded and unthreaded versions)
  – Needs only one header file (gfal_api.h) to write C applications
  – Supports the following protocols:
    - file for local access, also lfn/guid
    - dcap, gsidcap and kdcap (dCache access protocols)
    - rfio (CASTOR access protocol)
    - SRM
  – Access to SRMs in secure mode, i.e. using a valid Grid proxy obtained with the voms-proxy-init command

Examples

• Using the lcg_utils and lfc commands:
  – Define the LFC server hostname
    - The LFC server must be published in the BDII ($LCG_GFAL_INFOSYS)
    - Use the environment variable LFC_HOST: export LFC_HOST=<LFC_server_hostname>
    - $LFC_HOST must be set
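
Before running any catalog command it can help to confirm that both variables are set. The Python sketch below reports which of the two variables named in this tutorial are missing. The helper name missing_settings is hypothetical and the BDII address in the usage lines is a placeholder.

```python
import os

# The two variables this tutorial says must be defined.
REQUIRED = ("LCG_GFAL_INFOSYS", "LFC_HOST")


def missing_settings(environ=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {"LCG_GFAL_INFOSYS": "bdii.example.org:2170"}  # placeholder value
print(missing_settings(example_env))
```

Calling missing_settings() without arguments checks the environment of the current process.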

Examples

Listing the entries of an LFC directory

lfc-ls [-cdiLlRTu] [--class] [--comment] [--deleted] [--display_side] [--ds] path…

where path specifies the LFN pathname (mandatory)

– Remember that the LFC has a directory tree structure
– /grid/<VO_name>/<you create it>
  (/grid/<VO_name>: LFC namespace; <you create it>: defined by the user)
– All members of a VO have read-write permissions under their directory
– You can set LFC_HOME to use relative paths

> lfc-ls /grid/tutor/me
> export LFC_HOME=/grid/tutor
> lfc-ls -l me
> lfc-ls -l -R /grid

-l : long listing
-R : list the contents of directories recursively: Don’t use it!
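
The effect of LFC_HOME on relative paths can be sketched in a few lines of Python. This shows the idea only and is not the client's actual implementation. resolve_lfn is a hypothetical helper, and it assumes that absolute paths are used as given while relative paths are joined onto LFC_HOME.

```python
import posixpath


def resolve_lfn(path: str, lfc_home: str = "") -> str:
    """Return the absolute catalog path, using lfc_home for relative paths."""
    if path.startswith("/"):
        return posixpath.normpath(path)
    if not lfc_home:
        raise ValueError("relative path given but LFC_HOME is not set")
    return posixpath.normpath(posixpath.join(lfc_home, path))


print(resolve_lfn("me", "/grid/tutor"))
print(resolve_lfn("/grid/tutor/me", "/grid/tutor"))
```

Under this assumption both calls name the same directory, which matches the two lfc-ls invocations above.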

Examples

Creating directories in the LFC

lfc-mkdir [-m mode] [-p] path...

• Where path specifies the LFC pathname
• Remember that when registering a new file (using lcg-cr, for example), the corresponding destination directory must already exist in the catalog.
• Example:
  > lfc-mkdir /grid/tutor/me

You can check the directory with:
  > lfc-ls -l /grid/tutor/me
  drwxr-xrwx 0 19122 1077 0 Jun 14 11:36 demo

Examples

Let us copy and register a file using lcg-utils

> lcg-cr --vo tutor -l me/test -d mu2.matrix.sara.nl file:`pwd`/test
guid:7b4efaef-bb0f-42a3-bb6f-bbe35080d105

> lcg-lr --vo tutor lfn:me/test
sfn://mu2.matrix.sara.nl/flatfiles/SE00/tutor/generated/2006-09-18/file378fc829-351f-4558-8679-9d2ce530cbb4

> lfc-ls -l me
-rw-rw-r-- 1 30010 2024 114 Sep 18 10:33 test

Examples

Creating a symbolic link

lfc-ln -s file linkname
lfc-ln -s directory linkname

Creates a link named linkname to the specified file or directory

– Example (the first argument is the original file, the second the symbolic link):
  > lfc-ln -s /grid/tutor/me/test /grid/tutor/aLink

Let’s check the link using lfc-ls with long listing (-l):
  > lfc-ls -l
  lrwxrwxrwx 1 30010 2024 0 Sep 18 10:38 aLink -> /grid/tutor/me/test

Examples

Adding/deleting metadata information

lfc-setcomment path comment
lfc-delcomment path

lfc-setcomment adds or replaces a comment associated with a file/directory in the LFC
lfc-delcomment deletes a previously added comment

• This is the only metadata (one field) supported by the catalog
• Example:
  > lfc-setcomment me/test "nice file"
• Let’s see what happened:
  > lfc-ls --comment /grid/tutor/me/test
  /grid/tutor/me/test nice file

Examples

Deleting the file

lfc-rm
  Removes a file, link or directory from the catalog only

lcg-del
  Removes the file from the SEs and the LFNs/links from the catalog

• Example, delete all replicas:
  > lcg-del -a --vo tutor guid:8e413879-7cb3-4260-af9f-6964392da7e8
• Example, delete only one replica:
  > lcg-del --vo tutor -s mu2.matrix.sara.nl guid:8e413879-7cb3-4260-af9f-6964392da7e8

File Transfer Service

• A batch system for submitting data transfer jobs
• For data-intensive sciences
  – Currently in use in the LCG project

FTS

• Allows for
  – Managed transfers by means of channels to sites
    - Channels run between sites, e.g. CERN-SARA
    - Site admins can adapt the configuration of incoming channels to their site, switch their channel off, etc.
    - Priorities can be set for different VOs
  – Optimisation of network tuning parameters per channel

FTS

• Command line interface
  – glite-transfer-cancel
    - Cancels a file transfer job
  – glite-transfer-list
    - Lists ongoing data transfer jobs
  – glite-transfer-status
    - Displays the status of an ongoing data transfer job
  – glite-transfer-submit
    - Submits a new data transfer job