int.eu.grid: a grid infrastructure for interactive applications

Download int.eu.grid: A grid infrastructure for interactive applications

If you can't read please download the document

Upload: elpida

Post on 09-Jan-2016

37 views

Category:

Documents


0 download

DESCRIPTION

int.eu.grid: A grid infrastructure for interactive applications. Gonçalo Borges LIP on behalf of Int.EU.Grid Collaboration. INGRID’08, Italy, April 2008. Do you really need to be motivated?. What is interactivity? Feedback channel between you and your application - PowerPoint PPT Presentation

TRANSCRIPT

  • int.eu.grid:A grid infrastructure for interactive applicationsGonalo BorgesLIPon behalf of Int.EU.Grid CollaborationINGRID08, Italy, April 2008

  • Do you really need to be motivated?What is interactivity?Feedback channel between you and your applicationInteractivity is strongly connected to visualization Why interactivity?Users want answers in seconds and not hoursGrid debug, check application evolution, ...

    Is it possible to have it on the grid?Yes it is!!!Although the several middleware layers and the crossover of multiple administrative domains.

    Benifit for you as a grid user?Power of the grid available at your fingertips!

  • Interactive Job ExecutionREMOTE SITEInternetREMOTE SITE Fast start-up Execution in high-occupancy situations

  • Interactivity on the GridVOs and users are anxious to get itNeeded by a wide set of applications in different scientific domains

    Interactivity has been neglected by larger Grid projectsStandard grids are aimed to sequential jobsHow to properly set Matchmaking and Brokering for interactive tasks? How to start the application immediately? Even in scenarios in which all computing resources might be running batch jobs?And what about online application input/output?

    Int.EU.Grid Provide an advanced grid empowered infrastructure for scientific computing targeted to support demanding interactive (and parallel) applications.

  • Interactivity and MPI on the GridVOs and users are anxious to get itNeeded by a wide set of applications in different scientific domains

    MPI has been neglected by larger Grid projectsAimed to sequential jobsHow to properly set Matchmaking and Brokering for parallel tasks on a Grid Environment? How to manage/control local cluster MPI support?How to set central MPI support?

    Int.EU.Grid Provide an advanced grid empowered infrastructure for scientific computing targeted to support demanding interactive and parallel applications.

  • Int.EU.Grid grid infrastructure12 sites, 7 counties9 in production4 in development

    ~ 900 COREsXeonOpteronPentium

    ~ 45 TB of storage space

    Interconnection by Geant

  • Int.EU.Grid grid infrastructureTwo sets of sitesProduction 9 sitesDevelopment 4 sites

  • Int.EU.Grid servicesDistributed servicesTaking advantage of the partners expertiseRedundancyBetter use of resourcesProduction Core Services CrossBroker RAS BDII VOMS LFC MyProxyProduction Core Services CrossBroker RAS BDII VOMS LFC MyProxy APEL accounting GridICE R-GMADevelopment Core Services CrossBroker RAS BDII VOMS LFC MyProxy Pure gLite WMS Autobuild Repository R-GMA for development Helpdesk SAM Network monitoring Security coordination

  • Int.EU.Grid Virtual OrganizationsApplicationsifusionienvmodiusctibrainihepiplanckiwien2kicompchemimrteuforiaihidraicesgaimain, imon, itut, itest

  • int.eu.grid Archint.eu.grid is a gLite based infrastructure but...enhancing several services (lcg-CE, WN) and deploying new components (CB, RAS, UI) towards interactivity and MPILocal Services

  • Migrating Desktop and RAS (I)Migrating Desktop (MD): User Friendly Grid AccessJava based GUI; Hides the details of the grid Allows to log-in in the GRID independently from where you are (laptop, desktop, everywhere ...) what kind of Computer/OS you are using

    Roaming Access Server (RAS): Gateway for Grid AccessPerforms actions on the grid on behalf of the MD

  • Migrating Desktop and RAS (II)Work together in a client/server fashionThe RAS is a core service installed at a central location with direct network connectivityMD can run from private workstations as long as it connects to the appropriate RAS ports

    The RAS offers a well defined set of web services Represents the integration level of interfaces for different middlewareJava 1.6 supportJob submission, job monitoring, data management operations, job channel forwarding,

    MD/RAS supports interactivity and real time visualisationA plugin has to be developed for each application Acts as an interface enabling bidirectional communication between a job and the local GUI.

  • (i2)glogin ::== grid interactive tool Interactivity is wrapped around (i2)gloginCrossGrid tool extended in int.eu.grid to execute with GT4Supports GSS-based encryptionEnables online communication providing shell functionality for access to the Grid nodes Sort of (grid) ssh using certificates

    MD/RAS integration with i2gloginA local instance of i2glogin is started in the RAS Acting as the serverThe client part of i2glogin is submitted to the grid through a JDLThe connection point (server:port) is specified as an argumentWhen the grid job connects, i2glogin connects back to the local instance Creates a secure, low latency, bidirectional connection.

  • I/O streaming$ i2g-job-submit interactive.jdl$ i2glogin-p 24599:158.109.65.149VirtualOrganisation = "imain";JobType = Normal";Interactive = TRUE;InteractiveAgent = i2glogin;InteractiveAgentArguments = -r p 24599:158.109.65.149 -c;Executable = /bin/sh";InputSandbox = {i2glogin"};

  • I/O streamingCrossBrokerWNUser applicationi2glogin$ i2glogin-p 24599:158.109.65.149Job

  • I/O streamingWNi2gloginCrossBrokerUser application$ i2glogin-p 24599:158.109.65.149

  • I/O streamingWNi2gloginCrossBrokerUser application$ i2glogin-p 24599:158.109.65.149sh-2.05b$ hostnameaow5grid.uab.essh-2.05b$ exitexitConnection closed by foreign host$

  • GVID ::== Grid Video streamingVisualisation capabilities are based on the compression of the OpenGL graphics generated by the applicationGVID performs the encoding/decoding on both sidesVideo encoding saves bandwidth Communication over glogin

    A GVID display client java implementation was develloped to follow requirements of MD/RAS application pluginsA generic java package is ready to be used by all grid devellopers trying to built a MD pluginUser inputs (such as mouse clicks) generated through the application specific plugin are sent via RAS to the applicationThe application has to be adapted to react to these input events

  • Integration of the Interactivity & Visualization tools (I)Aplication LayerInt.eu.grid middleware extension for visualisationInt.eu.grid lower middleware extension for interactivity

  • Integration of the Interactivity & Visualization toolsJob Submission ServicesCrossBrokerLogging&BookkeepingRoaming Access ServerLRMSGatekeeperWorkerNodegLoginvtk AppJob Submission ServicesCrossBrokerLogging&BookkeepingLRMSGatekeeperJDLJDL,,WorkerNodegLoginWorkerNodevtk AppMP4 EncoderEvent DecoderTCPTransportSocketTransportFileTransportPipeTransportremoteGlutMP4 EncoderEvent DecoderTCPTransportSocketTransportFileTransportPipeTransportremoteGlutApplicationGVidMigrating DesktopMigrating DesktopSimulation +visualisationJava Video PlayerMP4 DecoderEvent EncoderTCPTransportSocketTransportFileTransportPipeTransportJava Video PlayerMP4 DecoderEvent EncoderTCPTransportSocketTransportFileTransportPipeTransport

  • CrossBroker (I)CrossBroker: Int.EU.Grid meta-schedulerOffers the same functionalities as the EGEE Resource Broker, plus:Support for Interactive Applications Interactive agent injectionScheduling priorities; Time SharingFull support for Parallel ApplicationsPACX-MPI and OpenMPIFlexible MPI job startup based on MPI-START

  • CrossBroker (II)The user can decide which interactive agent to use through special JDL requirementsThe CrossBroker can inject it transparently to the user

    If the job is recognized as interactive...The CrossBroker treats it with higher priority

    There is in place a mechanism to use bandwidth measurements in the matchmaking processGreat for application needing visualisationBut not really implemented...

    If there are no available resourcesUse a time sharing mechanism

  • Time Sharing: Glide-in mechanismMain idea:Wrap every batch job with an agent (glide-in) Agent will get control of the remote machine independently of its local resource manager.

    Glide-in benefits on the interactivity framework Agents enable simple multiprogramming between interactive and batch jobs. Interactive jobs may run even when no free resources are available.Agents can also be used as a fast start-up mechanism.Agent can control the amount of CPU that an interactive job gets according to QoS requirements expressed by the user in the JDL.

  • Time SharingSchedulingAgentCondor-GCrossBrokerGrid ResourceLRMSBATCHJOB

  • Time SharingSchedulingAgentCondor-GCrossBrokerApplicationLauncherGrid ResourceLRMSBATCHJOB

  • Time SharingSchedulingAgentCondor-GCrossBrokerApplicationLauncherGrid ResourceLRMSAgentVM1VM2BATCHJOB

  • Time SharingSchedulingAgentCondor-GCrossBrokerApplicationLauncherGrid ResourceLRMSAgentVM1VM2BATCHJOB

  • Time SharingSchedulingAgentCondor-GCrossBrokerApplicationLauncherGrid ResourceLRMSAgentVM1VM2BATCHJOBBatch Job running

  • Time SharingSchedulingAgentCondor-GCrossBrokerApplicationLauncherGrid ResourceLRMSAgentVM1VM2BATCHINT.JOB

  • Time SharingSchedulingAgentCondor-GCrossBrokerApplicationLauncherGrid ResourceLRMSAgentVM1VM2BATCHINT.JOBStartup-time reductionOnly one layer involved

  • Response TimeCrossBrokerCE + WN

    MechanismResourceSearchingResourceSelectionSubmissionCampus Grid Remote SiteFree machine submission3s0.5s17.2s22.3sGlidein submission to free machine3s0.5s29.3s33.25sVirtual Machine submission0.5s6.79s8.12s

  • Local configurationsAddress the possibility of starting up the application immediately

    Local Resource Management System levelSome sites have configured their queues so that(Interactive) jobs do not remain in queueOr there is immediate resources to execute them or they are resubmitted elsewhere

    LCG-CE JobManager (JM) levelSome site have decided to move from the lcgpbs JM to the pbs oneIn EGEE, a direct submission to a site completes in ~25s if the sites uses pbs JM against > 2m for the lcgpbs JMpbs JM requires sharing homes between CE and WN (via NFS)However, lcgpbs have been coded for scalability instead speadingLess CPU consumption when the CE is handling hundred of jobs in parallel

  • EGEE short deadtime jobs (sdj) WG sdj definition:short execution time; unexpected and urgentcannot be dealt on a best effort basis in a full production regime.

    Use case & RequirementsVisualisation, Inspection, InteractionImmediate access and small latenciesDinamic attach a sheel to a running processDebugging process

    Bypassing middleware delaysAn alternative solution is agent scheduling or overlay systems which provide user task management

    int.eu.grid relies on gLite with additional enhancements

    Opportunity window to test interoperability with EGEE

  • Scheme: Int.EU.Grid/EGEE shared WN Int.EU.Grid InfrastructureEGEE InfrastructureI2G UIBatch ServerMonBoxSite_BDIILCG-CEI2G CE softwareLCG-CEUIMonBoxSite-BDIISECore ServicesLocal ServicesEGEE NodeEGEE Nodes with I2G softwareI2G NodeLegend:JM env varsMPI, VisualizationInt.EU.Grid & EGEE VOs

  • Scheme: Enhancing an EGEE site with Int.EU.Grid features gLite WNInt.EU.Grid InfrastructureEGEE InfrastructureCore ServicesBatch ServerLCG-CEgLite WNI2G WN software gLite WNI2G WN softwareMonBoxSite-BDIISEUILocal ServicesI2G UILCG-CEI2G CE softwareMonBoxSite-BDIIMPI, VisualizationI2G UI softwareI2G CE softwareI2G WN softwareUIgLite WNgLite WNgLite WNLCG-CEMPI, Visualization

  • Open Issues and DrawbacksInformation System is still a major problemDelay reporting the site change of availability to Core Services

    Still missing a generalized priority schema for grid users and jobs Things are more easy if infrastructure is simple, butinstitutions will share the same local infrastructure between several Grids

    Some solutions (under investigation): Information system: CrossViewFair share with different penalty factorsBatch jobs worsen the priority according to the resources used Interactive jobs worsen the priority faster than batch onesInteractive CEAutomatic injection of glidein wrappers for all jobs submitted from other sources different of CrossBroker

  • Int.EU.Grid added valueint.eu.grid offersInteractivity and VisualizationOn the Fly interaction, On the Fly response; Graphical VO applicationsInter and intra cluster parallel tasksPACXMPI & OpenMPIUser friendly access to resourcesThe Migrating Desktop

    int.eu.grid features go in favour of fulfilling most of the requirements of users from other major gridsVOs can take advantage of the interoperability solutions developed within int.eu.gridAdd/configure VO resources without losing their normal (gLite) capabilities.

    These slides are an schema of what goes on with the glidein.It is not of much interest for the users since this is done transparently and without any notice for them.I think they are quite easy to follow: the mpi job request arrives at the broker, it will select one resource.Starting the application launcher (in pacx-mpi this includes the startupserver), and using CondorG, the CB will submit a job to the resource.This jobs is in fact an agent (glidein) that divides the WN into two virtual machines that can be controlled from the CBOne of the VM is used for the MPI Job, that will wait suspended until all the other parts of the job are ready. In the meantime, the CB can do backfilling and use the other VM to execute other jobs.When all the MPI tasks are ready, then the MPI jobs is awaken and continues its execution.

    Same thing as the MPI, but with interactive and batch jobs. Just the final part of the scenario. The batch job can be suspended or lower its priority and this is controlled by the broker