www.epikh.eu
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How)
AMGA metadata catalogue
Andrea Cortellese
Latin America 3 2010 - Joint GISELA/EPIKH
School for Application Porting
29th November - 10th December 2010
Algiers, Joint EPIKH/EUMEDGRID-Support Site Admin Tutorial, 27.06.2010
Metadata
• Metadata is data about other data.
• AMGA (ARDA Metadata Grid Application) is the 'official' Grid
metadata service in gLite (gLite v3.1)
• Since 'data' in gLite means files, AMGA was originally designed to
manage metadata on Grid files, but it is not limited to that
• Example
– Grid files of movie trailers stored on the Grid
– Each movie file has different metadata associated with it:
Title
Duration
Genre (Action, Animation, Comic, Drama etc.)
Cast (List of actors)
– User can ‘query’ on metadata in order to get back the movie file
Get trailer movie files having: Duration greater than 10 minutes
Get trailer movie files having: ‘Nicole Kidman’ in the Cast
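The two example queries above can be sketched in plain Python. This is only an illustration of the idea of querying on metadata to get files back, not of AMGA itself; the `trailers` list, the file names and all sample values are invented for the example.

```python
# Illustrative only: metadata kept in plain dictionaries, not in AMGA.
# File names, durations and cast lists below are invented sample data.
trailers = [
    {"file": "trailer_a.avi", "Title": "Trailer A", "Duration": 12,
     "Genre": "Animation", "Cast": ["Actor One", "Actor Two"]},
    {"file": "trailer_b.avi", "Title": "Trailer B", "Duration": 8,
     "Genre": "Drama", "Cast": ["Nicole Kidman", "Actor Three"]},
]


def query(entries, predicate):
    """Return the file names of all entries whose metadata satisfy predicate."""
    return [e["file"] for e in entries if predicate(e)]


# Get trailer movie files having: Duration greater than 10 minutes
long_trailers = query(trailers, lambda e: e["Duration"] > 10)

# Get trailer movie files having: 'Nicole Kidman' in the Cast
kidman_trailers = query(trailers, lambda e: "Nicole Kidman" in e["Cast"])

print(long_trailers)
print(kidman_trailers)
```

The point of the sketch is that the query runs on the metadata only; the movie files themselves are looked up afterwards by name.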
Simplest metadata Scenario
[Figure. Labels: Some SEs and an LFC on the Grid; AMGA Server; QUERY: All trailers having 'Animation' as Genre; List of LFNs; Selected Movie Files]
LFC and AMGA
• By design, LFC and AMGA servers are closely related, so that
metadata can be associated with files
• Metadata can therefore be organized hierarchically, like a file system (FS)
LFC:
…/trailers/
    moviefile_1.avi
    moviefile_m.avi
    italian/
        ita_movie_1.avi
        …
    spanish/
        es_movie_1.avi
        …
AMGA:
…/trailers/
    moviefile_1.avi
    moviefile_m.avi
    italian/
        ita_movie_1.avi
        …
    spanish/
        es_movie_1.avi
        …
List of attributes (AMGA):
    Name: Madagascar
    Genre: Animation
    Duration: 90
    Cast: ….
Other metadata Scenario
[Figure. Labels: User; Jobs; WMS; CE (three); WNs (three); AMGA; QUERY: Job Ids related to my 'Done' jobs; List of JobIds]
AMGA – Metadata Terminology
• Entries
– List of entities having metadata associated
• Attribute
– key name, key type pair (types: Integer, Char, Date, …)
• Schema
– Set of attributes
• Collection
– A set of entries associated with a schema
• Metadata
– List of attributes (including their values) associated with entries
Entries     Attribute 1            Attribute 2            …    Attribute n
Entry 01    E01 Attrib. 1 value    E01 Attrib. 2 value    …    E01 Attrib. n value
Entry 02    E02 Attrib. 1 value    E02 Attrib. 2 value    …    E02 Attrib. n value
…           …                      …                      …    …
FS Analogy:
collection_1/
    entry_1
    entry_2
    …
collection_2/
    entry_1
    entry_2
    …
Metadata Example
AMGA collection:
/gilda/demo/trailers/
Collection attributes:
>> Title
>> varchar
>> Duration
>> int
>> Genre
>> varchar
>> Cast
>> varchar
Collection entries:
/gilda/demo/trailers/
    madagascar.avi
    moulinrouge.avi
    …
Attribute values:
>> madagascar.avi
>> madagascar
>> 15
>> animation
>> Ben Stiller;Chris Rock;David Schwimmer;Jada Pinkett …
RDBMS View:
Entry Name/RowId   Title          Duration   Genre       Cast
madagascar.avi     Madagascar     12         Animation   Ben Stiller, …
moulinrouge.avi    Moulin Rouge   14         Musical     Nicole Kidman, …
…                  …              …          …           …
! Schemas/Attributes may be changed ANYTIME
! It is possible to define: SEQUENCES, INDEXES and CONSTRAINTS
Sub-Collections
• AMGA Collections may contain sub-collections (directory-like, FS analogy)
• AMGA Sub-collections may or may not inherit parent attributes
/gilda/demo/trailers/
    madagascar.avi
    moulinrouge.avi
/gilda/demo/trailers/italian
    madagascar_ita.avi
    moulinrouge_ita.avi
    …
/gilda/demo/trailers/user_remarks
    remark_0001
    remark_0002
    …
Attributes of the AMGA trailers' sub-collections:
>> Title
>> varchar
>> Duration
>> int
>> Genre
>> varchar
>> Cast
>> varchar
>> DubbedCast
>> varchar
>> Title
>> varchar
>> User
>> varchar
>> Remark
>> varchar
AMGA as DB solution
• Although AMGA has been designed to serve as a Grid file
metadata service, it can also be used as a DB
– Collection <-> DB Table
– Schema <-> Table Schema
– Attribute <-> Schema Column
– Entry <-> Table row/record
• Tables may be organized in a single directory (RDBM) or hierarchically (OODBM).
Collection/Table:
Entry Name/RowId   Attr_1/Col_1   Attr_2/Col_2   …   Attr_n/Col_n
GUID_1             RecVal(1,1)    RecVal(1,2)    …   RecVal(1,n)
GUID_2             RecVal(2,1)    RecVal(2,2)    …   RecVal(2,n)
…                  …              …              …   …
GUID_m             RecVal(m,1)    RecVal(m,2)    …   RecVal(m,n)
Attribute Data Types
AMGA          PostgreSQL            MySQL                 Oracle        SQLite        Python
int           integer               int                   number(38)    int           int
float         double precision      double precision      float         float         float
varchar(n)    character varying(n)  character varying(n)  varchar2(n)   varchar(n)    string
timestamp     timestamp w/o TZ      datetime              timestamp(6)  unsupported   time (unsupported)
text          text                  text                  long          text          string
numeric(p,s)  numeric(p,s)          numeric(p,s)          numeric(p,s)  numeric(p,s)  float
• Using the above datatypes you can be sure that your metadata can easily be moved to all
supported AMGA back-ends (DB Migration)
• If you do not care about DB portability, you can use, in principle, any datatype
supported by the back-end, even the more specific ones (e.g. the PostgreSQL Network
Address or Geometric types).
• Oracle, MySQL and PostgreSQL binary types (BLOBs) are excluded
A tested workaround is to use uuencode/uudecode (sharutils) to convert
binaries into Base64 text format.
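As a rough Python counterpart of that conversion step, the sketch below uses the standard `base64` module instead of uuencode/uudecode. The function names and the sample bytes are made up for illustration; the idea is simply that a binary value is turned into text before being stored in a text attribute and decoded again after retrieval.

```python
import base64


def binary_to_text(data: bytes) -> str:
    """Encode raw bytes as Base64 text that fits in a text/varchar attribute."""
    return base64.b64encode(data).decode("ascii")


def text_to_binary(text: str) -> bytes:
    """Decode Base64 text read back from the attribute into the original bytes."""
    return base64.b64decode(text.encode("ascii"))


# Arbitrary sample payload standing in for a small binary object.
payload = bytes([0, 255, 16, 32, 128])
encoded = binary_to_text(payload)

# The round trip must give back exactly the original bytes.
assert text_to_binary(encoded) == payload
print(encoded)
```

Base64 text is roughly one third larger than the binary it encodes, which is worth keeping in mind when choosing between varchar(n) and text for the attribute.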
Interacting with AMGA
• Users may interact with AMGA through two different front ends
– Streaming front end (TCP) / amgad
CLI interactive session: mdclient, mdjavaclient
CLI single command: mdcli
APIs (C++, Java, Python, Perl, PHP)
– SOAP front end (WSDL) / mdsoapserver
mdcli/mdclient
• A configuration template file available at
– /opt/glite/etc/mdclient.config
• Template can be copied into
– $PWD/mdclient.config
– $HOME/.mdclient.config
• mdclient starts an interactive session
– Query>
• mdcli executes a single AMGA command
– It saves a session file storing the current session status in /tmp (e.g. md_18968_amga.eela.ufrj.br_8822_0)
[brunor@genius ~]$ mdcli 'whoami'
prod.vo.eu-eela.eu
[brunor@genius ~]$ mdclient
Connecting to amga.eela.ufrj.br:8822...
ARDA Metadata Server 1.9.0
Query>
mdcli/mdclient help
• It is possible to get help on mdcli/mdclient commands by typing
– help <command> or <topic>
• Possible topics
– help metadata metadata-optional directory replication
constraints entry group acl index schema sequence user view
site replicas ticket capabilities admin commands
[brunor@glite-tutor ~]$ mdclient
Connecting to amga.ct.infn.it:8822...
ARDA Metadata Server
Query> help
>> help [topic]
>> Displays help on a command or a topic.
>> Valid topics are: help metadata metadata-optional directory replication constraints entry group acl
index schema sequence user view site replicas ticket capabilities admin commands
Query> help metadata
>> setattr entry attribute value [attribute value]...
>> Sets given attributes to specified values for all entries matching entry.
>> addattr dir attribute type
>> Adds a new attribute to a directory
…
Query>
Simple metadata commands
• Create a collection
– createdir <path>/<collection_name> [inherits]
• Associate a schema to the collection
– addattr <path>/<collection_name>
<attr_name> <attr_type> [<attr_name> <attr_type>] …
• List Attributes
– listattr <path>/<collection_name>
• Remove Attributes
– removeattr <path>/<collection_name> <attr_name>
• Rename Attributes
– renameattr <path>/<collection_name> <attr_name>
• Add entries and attribute values
– addentry <path>/<entry_name> <attr_name> <attr_value>
[<attr_name> <attr_value>] …
• Set an attribute value
– setattr <path>/<entry_name> <attr_name> <attr_value>
[<attr_name> <attr_value>] …
• List entries
– listentries <path>/<collection_name>
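A minimal sketch of how these commands fit together, written as Python helpers that only format command strings following the syntax listed above. Nothing is sent to an AMGA server, the helper names are hypothetical, and values are passed through without quoting, so values containing spaces would need extra care.

```python
# Hypothetical helpers: they only format command strings following the
# syntax on this slide; nothing is sent to an AMGA server.

def createdir(collection):
    return f"createdir {collection}"


def addattr(collection, schema):
    """schema: dict mapping attribute name -> attribute type."""
    pairs = " ".join(f"{name} {atype}" for name, atype in schema.items())
    return f"addattr {collection} {pairs}"


def addentry(entry, values):
    """values: dict mapping attribute name -> value (no quoting is applied)."""
    pairs = " ".join(f"{name} {value}" for name, value in values.items())
    return f"addentry {entry} {pairs}"


collection = "/gilda/demo/trailers"
commands = [
    createdir(collection),
    addattr(collection, {"Title": "varchar", "Duration": "int"}),
    addentry(f"{collection}/madagascar.avi",
             {"Title": "madagascar", "Duration": 15}),
    f"listentries {collection}",
]
for command in commands:
    print(command)
```

The order matters: the collection is created first, the schema is attached to it, and only then can entries carry values for those attributes.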
Getting metadata
• Three commands: getattr, find and selectattr
– getattr pattern attribute1 attribute2 …
– find pattern 'query'
• It is possible to make complex queries through the use of boolean
operators or join queries among different collections
Query> getattr *.avi Title Duration Genre
>> madagascar.avi
>> madagascar
>> 15
>> animation
>> moulinrouge.avi
>> moulin rouge!
>> 12
>> Drama;Musical;Romance
Query> find *.avi 'Duration > 10'
>> madagascar.avi
>> moulinrouge.avi
Query> find *.avi 'Title=italian:Title'
>> madagascar.avi
>> madagascar.avi
Getting metadata
• selectattr returns the attribute values of the entries matching a given
query
• selectattr <attrib> … 'query'
Query> selectattr trailers:Title trailers/italian:DubbedCast
'trailers:Title=trailers/italian:Title'
>> madagascar
>> Alessandro Besentini;Francesco Villa;Fabio De Luigi: Melman la
giraffa;Michelle Hunziker;Chiara Colizzi;Oreste Baldini;Roberto
Draghetti;Massimiliano Alto;Luigi Ferraro;Massimo Bitossi;Elena Magoia;Franco
Mannella;Gerolamo Alchieri;Pasquale Anselmo;Roberto Pedicini;Marco
Mete;Stefano De Sando;Emanuela Rossi
Query> selectattr trailers:Title trailers:Duration
'like(trailers:Cast,"%Kidman%")'
>> moulin rouge!
>> 12
SQL Support
• It is possible to issue SQL queries in AMGA
• Recognized SQL statements
– SELECT, INSERT, UPDATE, DELETE (uppercase)
• INSERT statement automatically generates a unique ID
as entry name
Query> SELECT Title FROM trailers WHERE trailers.Duration > 10
>> trailers.Title
>> madagascar
>> moulin rouge!
>> madagascar
Query> SELECT trailers:Title FROM trailers, trailers/italian WHERE trailers:Title=trailers/italian.Title;
>> trailers.Title
>> madagascar
>> madagascar
Query>
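To see what such a selection and join compute, the same kind of query can be tried on an ordinary SQL engine. The sketch below uses Python's `sqlite3` with in-memory tables; it is not how AMGA executes SQL. The table name `trailers_italian` stands in for the `trailers/italian` sub-collection, and the rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trailers (entry TEXT, Title TEXT, Duration INTEGER);
    CREATE TABLE trailers_italian (entry TEXT, Title TEXT, DubbedCast TEXT);
    INSERT INTO trailers VALUES ('madagascar.avi', 'madagascar', 15);
    INSERT INTO trailers VALUES ('moulinrouge.avi', 'moulin rouge!', 12);
    INSERT INTO trailers VALUES ('short.avi', 'short clip', 5);
    INSERT INTO trailers_italian VALUES
        ('madagascar_ita.avi', 'madagascar', 'sample dubbed cast');
""")

# Simple selection on one collection/table.
long_titles = [row[0] for row in conn.execute(
    "SELECT Title FROM trailers WHERE Duration > 10 ORDER BY Title")]

# Join between the parent collection and its sub-collection on Title.
joined = conn.execute(
    "SELECT t.Title, i.DubbedCast "
    "FROM trailers AS t JOIN trailers_italian AS i ON t.Title = i.Title"
).fetchall()

print(long_titles)
print(joined)
conn.close()
```

Only titles present in both tables survive the join, which is why a trailer without an Italian version does not appear in the second result.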
Users and Groups
• AMGA maps users to configured AMGA users and
groups according to
– LOGIN name
– X509/GridProxy DN
– VOMS Groups and Roles
• Main user is: root
• Users and groups are shown and managed POSIX-like
• d rwx rwx (user, group) user ownership
Query> ls -l
>> drwxr-x gilda /gilda/demo/trailers
Query> ls -l trailers
>> drwxr-x gilda /gilda/trailers/italian
>> drwxr-x gilda /gilda/demo/trailers/remark
>> -rwxr-x gilda madagascar.avi
>> -rwxr-x gilda moulinrouge.avi
>> -rwxr-x gilda madagascar.avi
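The mode strings in the listing can be read mechanically. The following Python sketch assumes the layout described on this slide (one type character, then three characters for the user and three for the group); the function name is hypothetical.

```python
def parse_permissions(mode):
    """Split a listing mode such as 'drwxr-x' into its parts.

    Assumed layout (from the slide): one type character, then three
    characters for the owner and three for the group.
    """
    if len(mode) != 7:
        raise ValueError(f"expected 7 characters, got {mode!r}")
    return {
        "is_collection": mode[0] == "d",
        "owner": {flag for flag in mode[1:4] if flag != "-"},
        "group": {flag for flag in mode[4:7] if flag != "-"},
    }


info = parse_permissions("drwxr-x")
print(info["is_collection"], sorted(info["owner"]), sorted(info["group"]))
```

For `drwxr-x` this reads as a collection whose owner may read, write and execute, while the group may only read and execute.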
ACLs
• AMGA allows users to define ACLs for
– Collections
– Entries (MySQL5 and PostgreSQL collection created with -acl)
• Use acl_show or stat <collection|entry>
• Since AMGA v2.0, the sudo command allows the root user to
become any user
Query> acl_show trailers
>> gilda rwx
>> gilda:users rwx
>> system:anyuser rx
Query> stat madagascar.avi
>> /gilda/demo/trailers/madagascar.avi
>> entry
>> rwx
>> r-x
>> gilda
AMGA Replication
• AMGA provides replication/federation mechanisms
• Motivation
– Scalability – Support hundreds/thousands of concurrent users
– Geographical distribution – Hide network latency
– Reliability – No single point of failure
– DB Independent replication – Heterogeneous DB systems
– Disconnected computing – Off-line access (laptops)
• Architecture
– Asynchronous replication
– Master-Slave: writes are only allowed on the master
• Application level replication
– Replicate Metadata with AMGA commands (dump)
• Partial replication
– Supports replication of only sub-trees of the metadata hierarchy
AMGA Replication types
[Figure. Panels: Full Replication; Partial Replication; Federation; Proxy. Label: Commands are redirected]
AMGA DB Import
• Each AMGA server relies on a dedicated DB back-end
– Oracle, MySQL, PostgreSQL, mSQL, other (UnixODBC)
• Database Import: two possibilities
– Import tables from the DB into an AMGA DB back-end
– Import the AMGA DB back-end into the DB hosting the tables
• As root, use the import command to "mount" your table into the
AMGA collection hierarchy
Query> whoami
>> root
Query> createdir world
Query> cd world
Query> import world.City world/City
Query> import world.Country world/Country
Query> import world.CountryLanguage world/CountryLanguage
Query> acl_add /world/ gilda:users rx
Query> acl_show /world
>> root rwx
>> gilda:users rx
>> system:anyuser rx
DB Access and Replication
[Figure: four AMGA masters, each shown with a back-end (MySQL DB: Movie Metadata; PostgreSQL DB: User Comments; Oracle DB: Actors; PostgreSQL DB: Storage), and one AMGA slave whose root (/) holds the collections below]
/movie: /movie/info, /movie/title, /movie/aka_title
/storage: /storage/LFN, /storage/SEs
/actors: /actors/name, /actors/info
/comments: /comments/info, /comments/users
Federation and DB Import
• With the Federation and DB Import features it is possible to
create huge federated metadata structures
Jobs with AMGA
• Since AMGA supports Grid Proxies, jobs can access any AMGA server
(mdclient.config)
• Normally the job pilot script uses the mdcli client application to get/set
metadata
EXAMPLE
• A grid job that selects movies according to a given actor
• A pilot script queries the AMGA server, taking the actor name as a
parameter, and identifies the LFN
• The file pointed to by the LFN is copied to the WN
• In the JDL, an mdclient.config file has to be specified in the
InputSandbox
Jobs with AMGA
#!/bin/bash
# amgajobdemo.sh: pilot script; $1 is the actor name to search for
echo "Looking for Actor: '$1'"
# Title of the trailer whose Cast attribute contains the actor name
MOVIE=$(mdcli "selectattr /gilda/demo/trailers:Title 'like(/gilda/demo/trailers:Cast,\"%${1}%\")'")
echo "Selected Movie Title: '$MOVIE'"
# Entry (avi file) whose Title matches the selected movie
MOVIEFILE=$(mdcli "find /gilda/demo/trailers/*.avi 'Title = \"${MOVIE}\"'")
echo "Selected Trailer avi file: '$MOVIEFILE'"
MOVIESCD=$(mdcli "pwd")
echo "Uploading LFN file '$MOVIESCD$MOVIEFILE'"
lcg-cp lfn:$MOVIESCD$MOVIEFILE file:$PWD/movie.avi
...
Pilot script
# mdclient.config
Host = amga.ct.infn.it
Port = 8822
Login=NULL
PermissionMask = rwx
GroupMask = r-x
Home = /home/gilda
UseSSL = require
AuthenticateWithCertificate = 1
UseGridProxy = 1
VerifyServerCert = 0
TrustedCertDir = /etc/grid-security/certificates
RequireDataEncryption = 1
mdclient.config
# amgajobdemo.jdl
Type = "Job";
JobType = "Normal";
Executable = "amgajobdemo.sh";
StdOutput = "amgajobdemo.out";
StdError = "amgajobdemo.err";
InputSandbox = {"mdclient.config", "amgajobdemo.sh"};
OutputSandbox = {"amgajobdemo.out","amgajobdemo.err"};
Arguments = "Kidman";
JDL file
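The first step of the pilot script could also be written with the Python standard library, as sketched below. This is an assumption-laden illustration rather than the AMGA Python API: `actor_query`, `run_mdcli` and `find_title` are hypothetical names, and `run_mdcli` simply shells out to the `mdcli` client in the same way the bash script does, so it needs `mdcli` and an `mdclient.config` to be available on the WN.

```python
import subprocess


def actor_query(actor, collection="/gilda/demo/trailers"):
    """Build the selectattr command used by the pilot script for one actor."""
    pattern = f'"%{actor}%"'
    return (f"selectattr {collection}:Title "
            f"'like({collection}:Cast,{pattern})'")


def run_mdcli(command):
    """Run one AMGA command through mdcli (requires mdcli and mdclient.config)."""
    result = subprocess.run(["mdcli", command], capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()


def find_title(actor, runner=run_mdcli):
    """Return the title reported for the actor; runner can be replaced in tests."""
    return runner(actor_query(actor))


if __name__ == "__main__":
    # Dry run with a stand-in runner, so no AMGA server is contacted.
    print(find_title("Kidman", runner=lambda command: f"(would run) {command}"))
```

Keeping the command construction separate from the call to `mdcli` makes the query string easy to check before the job is submitted to the Grid.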
Simple usage scenario
Grid Movie On Demand
gMOD: grid Movie On Demand
Latin America 3 2010 - Joint GISELA/EPIKH School for Application Porting 30.11.2010
• gMOD provides a Video-On-Demand service
• The user chooses from a list of videos and the chosen
one is streamed in real time to the video client on the
user's workstation
• For each movie many details are stored, and users
can search for a particular movie by querying on one or
more attributes (Title, Runtime, Country, Release
Date, Genre, Director, Cast, Plot Outline)
• Two kinds of users can interact with gMOD: TrailersManagers, who can administer the DB of
movies, and GILDA VO users (guests), who can
browse, search and choose a movie to be streamed.
gMOD under the hood
• Built on top of gLite services:
• Storage Elements, located at different sites, physically
contain the movie files
• LFC, the File Catalogue, keeps track of the Storage
Element on which a particular movie is located
• AMGA is the repository of the detailed information for
each movie, and makes queries on it possible
• The Virtual Organization Membership Service (VOMS) is
used to assign the right role to the different users
• The Workload Management System (WMS) is responsible
for retrieving the chosen movie from the right Storage
Element and streaming it over the network down to the
user's desktop or laptop
• GENIUS allows users to interact with the above Grid services
gMOD interactions
[Figure. Labels: User; GENIUS Portal; get Role; VOMS; AMGA; LFC; SEs; Job Request; WMS; CE; WNs]
gMOD screenshot
Usage scenarios summary
• Grid File metadata (LFC)
• Gridified DB solution (Platform Independent DB)
• Job/Infrastructure Monitoring System (GANGA/MonAMI)
• Handle complex job workflows
• Producer/Consumer Job models
• Trivial parallelization management
• Partial/Full Output retrieval (Watchdog)
• I/O Sharing of data among different Users and Jobs
• Share data among Grid users securely (sensitive data)
• Easy backend to develop Digital Libraries (gLibrary)
Conclusion
• AMGA – Metadata Service of gLite
– Part of gLite 3.1
– Can be used with other middleware platforms
– Useful to realize simple relational schemas or add metadata
information to Grid files
– Fully integrated with the Grid environment (Security)
• Features:
– Replication/Federation (root)
– Importing existing databases (root)
– SQL support
– Security (SSL, X509, Grid Proxies, VOMS, users/groups, ACLs)
– APIs / client applications
– SOAP
• Tests have shown good performance/scalability
References
• AMGA Web Site: http://cern.ch/amga
• AMGA Manual v2.0: http://amga.web.cern.ch/amga/downloads/2.0/amga-manual_2_0_0.pdf
• AMGA API Javadoc: http://amga.web.cern.ch/amga/javadoc/index.html
• AMGA Basic Tutorial: https://grid.ct.infn.it/twiki/bin/view/GILDA/AMGAHandsOn
• More information on existing DB access:
– http://amga.web.cern.ch/amga/importing.html
– https://grid.ct.infn.it/twiki/bin/view/GILDA/AMGADBaccess
Questions …