TRANSCRIPT
CE: compute element
TP: CE & WN (Compute Element & Worker Node)
Installation configuration
CE presentation
The Computing Element is the central service of a site.
• Its main functionalities are:
– manage the jobs (job submission, job control)
– update the status of the jobs to the WMS
– publish all site information (site, queues, number of total/free CPUs)
• It can run several kinds of batch system:
– Torque + MAUI
– LSF
– Condor
TORQUE server presentation
• The Torque server is composed of a:
– pbs_server, which provides the basic batch services such as
receiving/creating a batch job.
• The Torque client is composed of a:
– pbs_mom, which places the job into execution. It is also responsible for
returning the job’s output to the user.
• The MAUI system is composed of a:
– job_scheduler, which contains the site's policy to decide which job must be
executed.
CE: site-info.def variables (1)
Main variables of the site configuration file for the CE:
CE_HOST=ce1.$MY_DOMAIN
# Jobmanager specific settings
JOB_MANAGER=lcgpbs
CE_BATCH_SYS=torque
BATCH_BIN_DIR=/usr/bin
BATCH_VERSION=torque-1.0.1b
BATCH_LOG_DIR=/var/spool/pbs/server_priv/accounting
# Architecture and environment specific settings
CE_CPU_MODEL=PIV
CE_CPU_VENDOR=intel
CE_CPU_SPEED=1001
CE_OS="Scientific Linux SL"
CE_OS_RELEASE="SL"
CE_OS_VERSION=3.0.5
CE_MINPHYSMEM=1024
CE: site-info.def variables (2)
CE_MINVIRTMEM=2048
CE_SMPSIZE=1
CE_SI00=381
CE_SF00=0
CE_OUTBOUNDIP=TRUE
CE_INBOUNDIP=FALSE
CE_RUNTIMEENV=" LCG-2 LCG-2_1_0 … GLITE-3_0_0 R-GMA "
# TORQUE - Change this if your torque server is not on the CE
TORQUE_SERVER=$CE_HOST
Worker Node list defined for the site “private.gridprototype”:
WN_LIST=/opt/glite/yaim/travail/wn-list.conf
ce1.private.gridprototype
se1.private.gridprototype
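Since site-info.def is written in plain shell syntax, the variables above can be sanity-checked by evaluating them directly before running YAIM. A minimal sketch, assuming MY_DOMAIN is set to this site's domain:

```shell
#!/bin/sh
# Sketch: site-info.def entries are ordinary shell assignments, so they can
# be evaluated to check how $MY_DOMAIN expands before YAIM consumes them.
MY_DOMAIN=private.gridprototype    # assumed value for this site
CE_HOST=ce1.$MY_DOMAIN
TORQUE_SERVER=$CE_HOST             # the Torque server runs on the CE here
echo "CE host:       $CE_HOST"
echo "Torque server: $TORQUE_SERVER"
```

This confirms that TORQUE_SERVER resolves to ce1.private.gridprototype, which only needs changing when the Torque server is not the CE itself.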
WN: worker node & Torque client presentation
The Torque client is composed of a:
pbs_mom, which places the job into execution. It is also responsible for returning the job’s output to the user.
The Worker Node is the service where the jobs run. Its main functionalities are:
execute the jobs
update the status of the jobs to the Computing Element
It can run several kinds of client batch system:
Torque
LSF
CE certification:
cd /etc/grid-security/
ln -s ce1.private.gridprototype.crt hostcert.pem
ln -s ce1.private.gridprototype.key hostkey.pem
chmod 644 hostcert.pem
chmod 400 hostkey.pem
For the CE1 machine, the certificates are files named:
ce1.private.gridprototype.crt
ce1.private.gridprototype.key
Certificates installation in /etc/grid-security directory on CE
Get certificates from the BEINGRID CA Certification Authority:
http://voms.beingrid.fr.cgg.com/ca/
Back up the certificate as a <host>.p12 file and extract the public and private keys:
openssl pkcs12 -nokeys -in ce1.p12 -out ce1….cert
openssl pkcs12 -nocerts -in ce1.p12 -out ce1….key
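Before installing the extracted files, it is worth verifying that the certificate and key actually belong together. A sketch of that check, demonstrated on a throwaway self-signed pair so it runs anywhere (on the CE, run the same two modulus commands on hostcert.pem and hostkey.pem from the BEINGRID CA instead):

```shell
#!/bin/sh
# Sketch: a certificate and key match when their public-key moduli are equal.
# A throwaway self-signed pair stands in for the real CA-issued files.
dir=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes \
    -subj "/CN=ce1.private.gridprototype" \
    -keyout "$dir/hostkey.pem" -out "$dir/hostcert.pem" -days 1 2>/dev/null

# Show which host the certificate names (format varies by openssl version).
openssl x509 -in "$dir/hostcert.pem" -noout -subject

cmod=$(openssl x509 -noout -modulus -in "$dir/hostcert.pem")
kmod=$(openssl rsa  -noout -modulus -in "$dir/hostkey.pem" 2>/dev/null)
[ "$cmod" = "$kmod" ] && echo "key matches certificate"
rm -rf "$dir"
```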
List of mandatory configuration files:
the WN list defined for the site “private.gridprototype”:
WN_LIST=/opt/glite/yaim/travail/wn-list.conf
the mapped-users list defined for the site “private.gridprototype”:
/opt/glite/yaim/travail/users.conf
the mapped-groups list defined for the site “private.gridprototype”:
/opt/glite/yaim/travail/groups.conf
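For illustration, YAIM's users.conf maps pool accounts to a VO with one colon-separated line per account (UID:LOGIN:GID:GROUP:VO:FLAG:). The UIDs, GIDs and account names below are hypothetical examples for the egeode VO, not values taken from the slides:

```
50001:egeode001:5000:egeode:egeode::
50002:egeode002:5000:egeode:egeode::
50005:egeode005:5000:egeode:egeode::
```

groups.conf similarly lists, one per line, the VOMS groups/roles that YAIM should map, but its exact entries depend on the VO's VOMS configuration.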
CE installation and configuration
gLite-yaim generic command:
install_node site-info.def lcg-CE_torque glite-WN
The CE is a certified machine: install its certificates in the
/etc/grid-security/ directory.
configure_node site-info.def CE_torque WN_torque BDII_site
CE publication test
The CE should publish information to the BDII:
lcg-infosites --vo egeode ce
valor del bdii: rb1.private.gridprototype:2170
#CPU    Free    Total Jobs    Running    Waiting    ComputingElement
----    ----    ----------    -------    -------    ----------------
2       2       0             0          0          ce1.private.gridprototype:2119/jobmanager-lcgpbs-egeode
The CE should also publish the status of the job queues. Running pbsnodes locally as the egeode005 user, the node list should match the WN list defined in /opt/glite/…/wn-list.conf:
pbsnodes -a
se1.private.gridprototype
     state = free
     np = 1
     properties = lcgpro
     ntype = cluster
     etc…
ce1.private.gridprototype
     state = free
     np = 1
     properties = lcgpro
     ntype = cluster
     etc…
Local job submission on the CE
To be able to submit jobs locally the user must be mapped egeode005 user on
the new installed CE machine.
cat test.sh
#!/bin/sh
/bin/hostname
/bin/sleep 300

qsub -q egeode test.sh
35.ce1.private.gridprototype
qstat -a
ce1.private.gridprototype:
                                                                   Req'd  Req'd
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Elap
--------------- -------- -------- ---------- ------ --- --- ------ ----- - ----
35.ce1.private. egeode00 egeode   test.sh     11239  --  --     -- 48:00 R
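The last column of the qstat -a job line is the job state flag. A small sketch that pulls it out of a captured sample line, so it runs without a live Torque server (the sample text mirrors the output above):

```shell
#!/bin/sh
# Sketch: extract the state flag from a captured 'qstat -a' job line.
# Common Torque states: R = running, Q = queued, E = exiting, C = completed.
sample='35.ce1.private. egeode00 egeode   test.sh    11239  --  --     -- 48:00 R'
state=$(echo "$sample" | awk '{print $NF}')
echo "job 35 state: $state"
# → job 35 state: R
```

The same awk expression works on live output, e.g. `qstat -a | awk 'NR>5 {print $1, $NF}'` to list job IDs with their states.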
UI/GUI JAVA graphical interface
commands: edg-wl-ui-jobmonitor.sh edg-wl-ui-jdleditor.sh …
CE Torque/Maui documentation
TORQUE ADMIN GUIDE http://www.clusterresources.com/wiki/doku.php?id=torque:torque_wiki
MAUI ADMIN GUIDE http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml
Questions on the CE ?