The Gateway Computational Web Portal
Marlon Pierce, Choonhan Youn, Geoffrey Fox
ERDC, August 16 2001
Tutorial Overview
• Demo
• Grid and Gateway Overviews
• HTML Forms
• Java QuickStart Guide
• JavaServer Pages Overview
• Gateway JSP Tools
• WebFlow Module Development
• Installation and Security Issues
Computational Grids Survey
A brief introduction to computational grid projects and goals.
What Is a Computational Grid?
• Grids link distributed scientific resources.
  – Resources can be geographically, politically distributed
• Goal: provide means for sharing resources between organizations.
• Example “high-end” resources:
  – Supercomputers and clusters
  – Mass storage
  – Advanced visualization (CAVES) and collaboration (Access Grid).
  – Particle colliders, telescopes, earthquake detectors
• www.globus.org/research/papers/anatomy.pdf
What Does a Grid Need?
• Multi-institutional security
  – PKI or Kerberos
• Information services
  – Manage, store, deliver information about resources.
  – Use information to make decisions
• Scheduling and Queuing
  – Advance reservation
  – Meta-queuing
• Remote execution, file transfer, monitoring
Example of a Grid Problem: CERN’s Large Hadron Collider
• Goes on-line in 2005
• Will generate petabytes of raw, distributed data, terabytes of event summary data.
• Computing resources for data analysis will be distributed between CERN and regional centers spread all over the world
• 1500-2000 people will collaborate on experiments.
Grid Projects
• Grid Infrastructure
  – Condor: www.cs.wisc.edu
  – Globus: www.globus.org
  – Legion: www.cs.virginia.edu/~legion
• Grid Applications
  – Netsolve: www.cs.utk.edu/netsolve
  – Ninf: www.etl.go.jp
• Global Grid Forum: www.gridforum.org
Examples of Deployed Grids
• NASA’s Information Power Grid
  – Links NASA’s Ames, Glenn, and Langley Centers.
  – LaunchPad currently available
  – www.ipg.nasa.gov
• DOE’s ASCI Distributed Resource Management
  – Links classified computing resources at Lawrence Livermore, Los Alamos, and Sandia National Labs.
  – Full deployment scheduled by Nov 2001.
Latest Grid News
• NSF will spend $53 million on the Distributed Terascale Facility (DTF)
  – 13.6 teraflops, 600 terabytes, 40 Gigabit/sec
  – DTF sites: NCSA, SDSC, Argonne, CalTech
  – Industry partners: IBM, Intel, Qwest
• See www.ncsa.uiuc.edu/News/Access/Releases for more information (August 9).
Example: Globus
• Run applications remotely:
  – globus-job-run: interactive.
  – globus-job-submit: batch for PBS, LSF.
  – globusrun: most general version (RSL).
• Split jobs between hosts.
• Send and retrieve data securely (PKI).
• Monitor jobs remotely.
• Monitor hosts remotely.
What’s the Problem?
• Globus client must be installed on desktop
  – Difficult installation
  – No ubiquitous access (PDAs, your grandmother’s PC)
• Typical solution is to support Globus at particular sites and have users remotely log in.
  – Problems arise because many users are not Unix-savvy.
• Lots of new commands to learn.
Computational Portals
• Computational portals are designed to simplify access to grid technologies.
• They also provide a coarse-grained grid approach that ties together grid and non-grid resources.
  – Not everyone uses Technology X.
  – Not everything at a TechX supporting site will use TechX.
  – Different TechX sites may remain separate.
Gateway Architecture
• Gateway is implemented in a three-tiered architecture.
• Browser Front End
  – JSP dynamically generates HTML pages.
• Component Middle Tier
  – JavaBeans on the web server.
  – Distributed WebFlow servers.
• HPC back end
  – Link to grid and non-grid services with rsh, ssh.
  – More sophisticated interfaces can be built.
[Figure: Gateway architecture. Labels: Web Browser and Client Applications (JVM); HTTP(S); Web Server and Servlet Engine; JavaBean Service Proxy; JavaBean Local Service; SECIOP; WebFlow Master Server; WebFlow Child Servers; RSH, SSH; Data Storage; Condor Flock; HPC+PBS; HPC+LSF; Globus Grid.]
Gateway Design Goals
• Build a working portal for users.
• Produce a tool chest for portal developers.
• Targeted Services:
  – File Transfer
  – Problem organization and session archiving
  – Batch script generation
  – Job submission
  – Job monitoring
  – Shared visualization
  – Security
Levels of Use
• Users and Admins can do everything through web.
• Portal developers may want to edit pages, use our components.
• Advanced developers can write modules.
[Figure: Levels of use along a "Sophistication" axis: Portal Users and Administrators, Portal Developers, Module Developers.]
Gateway Descriptors
How to add your codes and your hosts to the portal.
Gateway Descriptors
• Form the base of portal for any particular field.
• Collect static info about applications, hosts in an XML data record.
• Application Descriptors describe how to run codes.
• Host descriptors describe HPC systems.
• Users are described by another mechanism.
Sample Application Descriptor
<XSIL Name="ANSYS" Type="csm.parseXMLDesc">
<Param Name="NumberOfInParams">0</Param>
<Param Name="NumberOfInFiles">1</Param>
<Param Name="NumberOfOutParams">0</Param>
<Param Name="NumberOfOutFiles">1</Param>
<Param Name="IOStyle">StandardIO</Param>
Sample Host Descriptor
<XSIL Name="Modi4" Type="csm.parseXMLHost">
<Param Name="HostName">modi4</Param>
<Param Name="QueueType">LSF</Param>
<Param Name="ExecPath">/usr/bin/ansys57</Param>
<Param Name="WorkDir">/scratch</Param>
<Param Name="QsubPath">/usr/bin/bsub</Param>
Adding Your Application
• We store application and host data in a single file.
  – Applications “contain” hosts.
• You can create and edit this by hand, or
• You can use the administrator interface to edit the data record.
• The admin interface also lets you verify data.
  – Did I give the right executable path?
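As a rough illustration of reading such a data record, the sketch below parses a descriptor with the JDK's standard DOM parser. The closing tags and the nesting of the host record inside the application record are assumptions (the slides show only the opening lines), and the class and method names are hypothetical, not Gateway's parseXMLBean.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DescriptorSketch {
    // Assumed layout: the host record is nested inside the application record.
    static final String SAMPLE =
          "<XSIL Name=\"ANSYS\" Type=\"csm.parseXMLDesc\">"
        + "<Param Name=\"NumberOfInFiles\">1</Param>"
        + "<Param Name=\"IOStyle\">StandardIO</Param>"
        + "<XSIL Name=\"Modi4\" Type=\"csm.parseXMLHost\">"
        + "<Param Name=\"HostName\">modi4</Param>"
        + "<Param Name=\"QueueType\">LSF</Param>"
        + "</XSIL>"
        + "</XSIL>";

    // Parses the XML text and returns its root element.
    static Element parse(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes)).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse descriptor", e);
        }
    }

    // Collects the <Param> elements that are direct children of one record.
    static Map<String, String> params(Element record) {
        Map<String, String> out = new LinkedHashMap<>();
        NodeList kids = record.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n instanceof Element && "Param".equals(n.getNodeName())) {
                Element p = (Element) n;
                out.put(p.getAttribute("Name"), p.getTextContent().trim());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Element app = parse(SAMPLE);
        System.out.println(app.getAttribute("Name") + " " + params(app));
        Element host = (Element) app.getElementsByTagName("XSIL").item(0);
        System.out.println(host.getAttribute("Name") + " " + params(host));
    }
}
```

A check like "did I give the right executable path" could then be a simple lookup of the ExecPath parameter followed by a file-existence test on the target host.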
Java Quick Start Guide
A quick and dirty overview of the Java programming language.
Basic Elements
• The Java language resembles C/C++:
  – Primitive types: int, float, double, char, boolean
  – Strings are actually classes (more later on this)
  – Standard control structures like for and while loops, if statements, case/switch statements, try/catch blocks.
• Some important differences from C/C++:
  – No pointers
  – Method arguments are always passed by value.
  – No preprocessors or macros.
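"Passed by value" is easy to misread for objects, so here is a small sketch. For an object argument, the value that gets copied is the reference: a method can change the object it points to, but cannot make the caller's variable point somewhere else. The class and method names are made up for illustration.

```java
class PassByValueSketch {
    // Changing a primitive parameter changes only the local copy.
    static void increment(int n) {
        n = n + 1;
    }

    // Reassigning the parameter rebinds the local copy of the reference only.
    static void reassign(StringBuffer sb) {
        sb = new StringBuffer("replaced");
    }

    // Calling a method through the reference modifies the shared object.
    static void mutate(StringBuffer sb) {
        sb.append(" world");
    }

    static String demo() {
        int count = 1;
        increment(count);                 // count is still 1
        StringBuffer text = new StringBuffer("hello");
        reassign(text);                   // text still refers to "hello"
        mutate(text);                     // text now holds "hello world"
        return count + " " + text;
    }

    public static void main(String[] args) {
        System.out.println(demo());       // prints "1 hello world"
    }
}
```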
If/Else Statement Format
if (condition1) {
    // conditionally executed code
} else if (condition2) {
    // conditionally executed code
} else {
    // conditionally executed code
}
For Loops
• Syntax:
for (int i = 0; i < MAX; i++) {
    // executed code
}
• MAX is a variable defined elsewhere.
Java Classes
• Java is object-oriented
  – Classes encapsulate data and methods (functions) within a single entity.
  – Objects are instances of classes.
• Analogy: the declaration “int i” creates an instance of an integer.
• The Java SDK comes with an extensive library of pre-defined classes for you to use.
• See the online API:
  – http://java.sun.com/j2se/1.3/docs/api/
Example Class: Hashtable
• Hashtable allows you to store name/value pairs.
• To create a new hashtable object:
Hashtable myhash = new Hashtable();
• You can now use Hashtable methods:
myhash.put("MyName", "Marlon");
String name = (String) myhash.get("MyName");
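For completeness, a self-contained version of the same idea is sketched below. It uses generics, which arrived in Java releases later than the 1.3 API linked above, so the cast on get() is no longer needed; the wrapper class name and the second entry are illustrative only.

```java
import java.util.Hashtable;

class HashtableSketch {
    // Builds a small table of name/value pairs.
    static Hashtable<String, String> build() {
        Hashtable<String, String> myhash = new Hashtable<>();
        myhash.put("MyName", "Marlon");
        myhash.put("Site", "ERDC");
        return myhash;
    }

    public static void main(String[] args) {
        Hashtable<String, String> myhash = build();
        String name = myhash.get("MyName");          // no cast needed with generics
        System.out.println(name);
        System.out.println(myhash.get("NoSuchKey")); // a missing key yields null
        for (String key : myhash.keySet()) {         // iteration order is unspecified
            System.out.println(key + "=" + myhash.get(key));
        }
    }
}
```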
References
• The Java web site has API documentation and tutorials:
  – http://java.sun.com/j2se/1.3/docs/
• Excellent reference text:
  – “Core Java” Volumes I and II by Cay Horstmann and Gary Cornell (Prentice Hall)
• O’Reilly publishes the API:
  – “Java in a Nutshell” by David Flanagan
Interactive HTML
Using HTML forms to tie widgets to server actions.
The <form> Tag
• The <form> … </form> tag pair surrounds all HTML input types.
• Format:
<form name="myform" method="Post" action="/GOW/servlet/someAction">
  … <!-- Input tags go here -->
</form>
• The “action” attribute specifies what happens when an input button is pressed.
  – Can be CGI, a servlet, or a JSP page
The <input> Tag
• Input tags define text fields, submit buttons, radio buttons, menus, …
• Several can be combined within a single <form>
• Format:
<form method="GET" action="servlet/myServlet">
  <input type="text" name="text" value="Sample">
  <input type="submit">
</form>
Putting It All Together
<html>
<body>
<form action="someaction" method="Post">
Please type your name:
<input type="text" name="myname" value="Marlon">
<input type="submit">
</form>
</body>
</html>
test.html
What Happens When I Click the Button?
• The CGI Script/Servlet/JSP/… specified in the action receives all name/value fields of <input>.
• These are sent in the HTTP request to the server.
• Usually you should use “Post” instead of “Get”
  – No size limit on requests with Post
  – Request parameters are not shown in the browser location field.
• The server-side code usually returns output to the browser.
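To make the name/value idea concrete, the sketch below decodes the kind of URL-encoded body a browser sends for a form posted with the default encoding. A real servlet container does this for you; the class and method names here are hypothetical and only the JDK's URLDecoder is used.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormBodySketch {
    // Decodes one URL-encoded token ("+" becomes a space, %XX becomes a byte).
    static String dec(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Splits "name1=value1&name2=value2" into a map of decoded fields.
    static Map<String, String> decode(String body) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (body.isEmpty()) {
            return fields;
        }
        for (String pair : body.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            fields.put(dec(name), dec(value));
        }
        return fields;
    }

    public static void main(String[] args) {
        // The body the test.html form would send if the user typed "Marlon Pierce".
        System.out.println(decode("myname=Marlon+Pierce"));
    }
}
```

With "Get", the same encoded string is appended to the URL after a "?", which is why it shows up in the browser location field.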
JavaServer Pages (JSP)
Putting Java into HTML to build dynamic web pages.
What Are JavaServer Pages?
• JSPs let you embed Java code into your HTML web pages.
  – Use the .jsp extension
• When a browser requests the page, the server translates the JSP into a servlet and executes it, and you see the output.
• To run this, you need a special server
  – Apache’s Tomcat, IBM’s WebSphere, …
Embedding Scriptlets
<%@ page import="java.util.Date" %>
<%@ page import="java.util.Hashtable" %>
<html>
<body>
Hello, Marlon
<%
Hashtable Myhash = new Hashtable();
Date now = new Date();
Myhash.put("date", now);
%>
The time is <%= now.toString() %>
<!-- More HTML and scriptlets to follow. -->
</body>
</html>
What Does It Mean?
• The “import” statements at the top name the Java classes (by package) that the page uses.
• Everything between <% and %> is interpreted as Java.
  – These sections are called scriptlets.
• Java can be in-lined with HTML using the <%= %> tags.
  – These are referred to as expressions.
Ex: For Loop of Radio Buttons
<% for (int i = 0; i < 5; i++) { %>
Radio<%= i %> <input type="radio">
<% } %>
Using JavaBeans in JSP
• As presented so far, JSP still requires extensive knowledge of Java API.
  – O’Reilly’s Java API “Nutshell” is 600 pages.
• JavaBeans are custom components that encapsulate specific sets of functions.
  – Develop a small set of classes for area-specific tasks.
• It is good design to separate display from control code so that each is reusable.
  – You don’t want sprawling JSP pages.
Separation of Responsibility
[Figure: Separation of responsibility. Portal users define functionality and look and feel (L&F); web developers work on L&F; Java programmers develop beans.]
JavaBeans in JSP
• Create an instance “gem” of the gemBean class:
<jsp:useBean id="gem" class="gem.gemBean" scope="session"/>
• You can now use “gem” like any other object:
gem.loadData();
gem.runSim();
• By setting the scope, pages can share beans.
• You can also use this tag to initialize once.
• Other HTML-like tags exist for accessing data.
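What the useBean tag does with a session scope can be approximated in plain Java: look for an object stored under the given id, and create and store one only if none is found. The sketch below uses a Map as a stand-in for the session and a hypothetical SimBean class; it is not the gemBean class from the portal.

```java
import java.util.HashMap;
import java.util.Map;

// A hypothetical bean: public no-argument constructor plus get/set methods.
class SimBean {
    private String problemName = "";

    public SimBean() {
    }

    public String getProblemName() {
        return problemName;
    }

    public void setProblemName(String problemName) {
        this.problemName = problemName;
    }
}

class UseBeanSketch {
    // Mimics <jsp:useBean id="..." scope="session"/>: reuse the stored bean if present.
    static SimBean useBean(Map<String, Object> session, String id) {
        Object found = session.get(id);
        if (found == null) {
            found = new SimBean();   // created once per session
            session.put(id, found);
        }
        return (SimBean) found;
    }

    public static void main(String[] args) {
        Map<String, Object> session = new HashMap<>();   // stand-in for the session
        useBean(session, "gem").setProblemName("WingDesign");        // "page one"
        System.out.println(useBean(session, "gem").getProblemName()); // "page two"
    }
}
```

Because the second lookup returns the same object, data set on one page is visible on the next, which is how pages share beans.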
Overview of Gateway JSPs
• Welcome Page
  – Sets up most beans
  – Buttons are included in TrackNavigator.jsp
CodeSelect.jsp
• Codes are read in from the Application Descriptor file.
• The page is generated automatically from the descriptor.
• Problem name is mapped to user context directory, where session data will be stored.
JobSubmit.jsp
• Based on selected code, forms are generated automatically.
• Application Descriptor file specifies the number of input files, parameters, etc.
Submitted.jsp
• Shows the generated queue script, based on user requests.
• The user has one last chance to edit.
• The “Submit” button can be tied to an action to run the script.
ReturnPage.jsp
• Job has been submitted.
• The track navigator is again included at the bottom of the page.
Gateway Bean Classes
An overview of the Bean classes that can be used to build portals.
Gateway Architecture
• We have developed a number of service beans for computational portals.
• Some accomplish specific tasks on server.
• Others act as proxies to WebFlow modules (next section).
Context Data
• Gateway organizes user sessions into “problems” and “sessions.” A problem contains one or more sessions.
• All of this is called Context data. It maps to a directory on the server.
• All information gathered from the user is stored as name/value pairs in the appropriate subdirectory.
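One way to picture context data is a directory per problem, a subdirectory per session, and a file of name/value pairs inside it. The sketch below does this with java.util.Properties; the file name, the directory layout, and the class itself are assumptions for illustration, not the actual Gateway storage format.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

class ContextStoreSketch {
    private final Path root;

    ContextStoreSketch(Path root) {
        this.root = root;
    }

    // Creates a store under a fresh temporary directory.
    static ContextStoreSketch inTempDir() {
        try {
            return new ContextStoreSketch(Files.createTempDirectory("gateway-context"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assumed layout: <root>/<problem>/<session>/context.properties
    private Path file(String problem, String session) {
        return root.resolve(problem).resolve(session).resolve("context.properties");
    }

    private static Properties load(Path f) throws IOException {
        Properties p = new Properties();
        if (Files.exists(f)) {
            try (InputStream in = Files.newInputStream(f)) {
                p.load(in);
            }
        }
        return p;
    }

    // Adds or updates one name/value pair in the session's file.
    void put(String problem, String session, String name, String value) {
        try {
            Path f = file(problem, session);
            Files.createDirectories(f.getParent());
            Properties p = load(f);
            p.setProperty(name, value);
            try (OutputStream out = Files.newOutputStream(f)) {
                p.store(out, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the stored value, or null if the session or name is unknown.
    String get(String problem, String session, String name) {
        try {
            return load(file(problem, session)).getProperty(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```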
ContextManagerBean
• Contains convenience methods for finding old problems and sessions, creating new ones, deleting old ones, etc.
• Common Methods: too many to list. Come to the lab or see the documentation
  – www.gatewayportal.org/DOC/index.html
moduleServerBean
• Hides the messy details of connecting to WebFlow and getting an instance of the module you want.
• Creates instances of all WebFlow modules, provides accessor methods for them.
• So to get the submitJob module, I just use
submitJob sj = modserver.getSubmitJob();
in my JSP page.
parseXMLBean
• Parses the Application Descriptor data record.
• Provides specific getters for hosts, applications.
• Provides general getters for other parameters:
  – getCodeTagValue("ANSYS", "IOStyle");
createScript
• This is an abstract superclass of script generators.
  – Extend it with createPBS.java, createLSF.java, createCSH.java, etc.
• Actual class created at runtime with scriptFactory.java.
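The superclass-plus-factory arrangement can be sketched as follows. The class names and method signatures are hypothetical rather than those of createScript or scriptFactory; only the "#PBS" and "#BSUB" directive prefixes come from the queuing systems themselves, and the job name, queue, and command are made-up inputs.

```java
// Shared behavior lives in the abstract superclass.
abstract class ScriptGenerator {
    // Subclasses supply the scheduler-specific directive lines.
    protected abstract String directives(String jobName, String queue);

    // Shell header, then the directives, then the command to run.
    String create(String jobName, String queue, String command) {
        return "#!/bin/sh\n" + directives(jobName, queue) + command + "\n";
    }
}

class PbsScriptGenerator extends ScriptGenerator {
    protected String directives(String jobName, String queue) {
        return "#PBS -N " + jobName + "\n#PBS -q " + queue + "\n";
    }
}

class LsfScriptGenerator extends ScriptGenerator {
    protected String directives(String jobName, String queue) {
        return "#BSUB -J " + jobName + "\n#BSUB -q " + queue + "\n";
    }
}

class ScriptFactorySketch {
    // Picks the concrete generator at runtime, e.g. from a host's QueueType.
    static ScriptGenerator forQueueType(String queueType) {
        switch (queueType) {
            case "PBS":
                return new PbsScriptGenerator();
            case "LSF":
                return new LsfScriptGenerator();
            default:
                throw new IllegalArgumentException("Unsupported queue type: " + queueType);
        }
    }

    public static void main(String[] args) {
        ScriptGenerator gen = forQueueType("LSF");
        System.out.print(gen.create("ansys_run", "normal", "/usr/bin/ansys57"));
    }
}
```

The page that asks for a script never names a concrete class, so supporting another queuing system means adding one subclass and one factory case.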
setPropBean
• JSPs communicate by sending HTTP requests to each other.
• We have many name/value pairs to write to the Context data directory.
• setPropBean provides automating methods to remove drudgery and cut out page bulk.
• Other JSPs can recover data using ContextManager.
[Figure: setPropBean data flow. Labels: JSP; HTTP Requests; setPropBean; ContextData; ContextManager.]
Miscellaneous Beans
• jobInfoBean: convenient wrapper around a Hashtable for storing name/value strings.
• nameEncodeBean: inserts/removes underscores in problem names. Used to create unix directory names.
• GetFileBean: reads/writes script files to disk, filters out control characters.
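The name-encoding idea might look like the sketch below, assuming that underscores stand in for spaces so that a problem name becomes a single Unix directory name. That mapping is a guess at the behavior, and the class is illustrative rather than the real nameEncodeBean.

```java
class NameEncodeSketch {
    // "Wing Design Study" -> "Wing_Design_Study" (assumed mapping: space to underscore)
    static String encode(String problemName) {
        return problemName.trim().replace(' ', '_');
    }

    // "Wing_Design_Study" -> "Wing Design Study"
    static String decode(String directoryName) {
        return directoryName.replace('_', ' ');
    }

    public static void main(String[] args) {
        String dir = encode("Wing Design Study");
        System.out.println(dir);
        System.out.println(decode(dir));
    }
}
```

Note that a round trip is lossy if the original problem name already contains underscores.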
Page Control
• Page flow is controlled by the servlet GOWAdminServlet.java.
  – Pages call this servlet, which invokes the next page.
• The servlet receives the request from page A, looks up the next page to display, and shows it.
Commands
• Commands are classes that implement a simple “Command” interface.
  – Must implement the execute() method.
• ForwardCommand: Simplest case. Just forwards control to the specified page.
• SubmitCommand: Assembles and executes a remote command to run a job before displaying the next page.
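The command pattern described here can be sketched in a few lines. The execute() signature, the controller class, and the page names used in main are assumptions for illustration; the real Command interface and GOWAdminServlet may differ.

```java
import java.util.HashMap;
import java.util.Map;

interface Command {
    // Does any required work, then returns the name of the page to show next.
    String execute(Map<String, String> request);
}

// Simplest case: no work, just name the next page.
class ForwardCommand implements Command {
    private final String nextPage;

    ForwardCommand(String nextPage) {
        this.nextPage = nextPage;
    }

    public String execute(Map<String, String> request) {
        return nextPage;
    }
}

// Stand-in for the controlling servlet: maps the calling page to a command.
class PageControllerSketch {
    private final Map<String, Command> commands = new HashMap<>();

    void register(String fromPage, Command command) {
        commands.put(fromPage, command);
    }

    String dispatch(String fromPage, Map<String, String> request) {
        Command command = commands.get(fromPage);
        if (command == null) {
            throw new IllegalArgumentException("No command registered for " + fromPage);
        }
        return command.execute(request);
    }

    public static void main(String[] args) {
        PageControllerSketch controller = new PageControllerSketch();
        controller.register("CodeSelect.jsp", new ForwardCommand("JobSubmit.jsp"));
        System.out.println(controller.dispatch("CodeSelect.jsp", new HashMap<>()));
    }
}
```

A command in the style of SubmitCommand would do its work (for example, running the generated script) inside execute() before returning the next page, so the controller itself never changes when new actions are added.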
WebFlow Modules
An overview of how to use existing modules and how to write your own.
The Role of WebFlow
• WebFlow servers can distribute portal services over many hosts.
• WebFlow can do this because it is hierarchical:
  – Single parent acts as gatekeeper for child servers.
  – Ex: Run main server at FSU, child server at NCSA to provide access to remote file system.
WebFlow Design
• WebFlow is a custom-built component system.
  – Implements JavaBeans spec using CORBA
• Servers contain “contexts” (abstract containers) and “modules”.
  – Contexts are organizational, can be remote (i.e. child servers).
  – Modules are CORBA implementation files.
Configuring WebFlow Servers
• WebFlow servers are configured with text files.
• Header:
  – Name of server
  – File to write IOR if it is a master server
  – Parent
  – URL of IOR file (if child).
• List of provided modules follows:
  – Name
  – Location of interface (IDL or XML)
  – Java package name of module
Some Standard Modules
• submitJob: executes external local and remote commands (rsh, ssh), moves files to and from remote systems (rcp, scp).
• remotefile: moves files between client and server machines.
• ContextManager: can manage remote contexts. Uses two helper modules.
• Charon: http security module.
Using Modules
• Modules are just Java classes.
  – API on the web at www.gatewayportal.org/DOC/index.html.
• Get instance in JSP page using moduleServerBean.
  – You can now invoke the object’s methods on the remote server as if they were local.
Developing Modules
• Develop IDL interface (list of methods)
• Must compile IDL with Orbacus’s jidl. Generates CORBA stubs, skeletons.
• Write a Java implementation file
  – Defines methods of the interface.
• Compile it all.
• Add it to the appropriate server’s configuration file.
• Modify moduleServerBean to make it available to the JSP pages.
IDL Boilerplate
#ifndef _WEBFLOW_
#include "../BC.idl"
#endif
module WebFlow {
  module myModule {
    interface myModule : BeanContextChild {
      // Insert your methods here.
      void test();
      string execCommand(in string command);
      …
    };
  };
};
Implementation File Boilerplate
package WebFlow.myModule;

public class myModuleImpl extends WebFlow.BeanContextChildSupport
        implements myModuleOperations {
    String msg_;
    org.omg.CORBA.Object peer;

    public myModuleImpl(org.omg.CORBA.Object peer, String msg)
            throws WebFlow.NullPointerException {
        super(peer);
        this.peer = peer;
        // Assign the field; redeclaring "String msg_" here would only set a local variable.
        msg_ = msg;
    }

    // Your method definitions go here.
}
Web Portal Security
A review of some security issues and some minimal recommendations.
Multi-tiered Security
• Multiple tiers require security between, within each tier.
• Security issues:
  – Authentication
  – Authorization
  – Privacy
• Implementing these end-to-end is a challenge.
Some Minimal Security Suggestions
• Use SSL-enabled Apache web server.
• Disable remote access to Tomcat.
• Use multiple authentication methods
  – HTTP Authentication
  – Client certificates
• Use ssh or kerberized rsh, not plain rsh.
• Put on test bed first, log all usage.
Next Steps
• Add meta-job descriptors to provide better links between HPC and visualization.
• Improved 3D graphics for remote visualization.
• Component interfaces to Condor and Globus.
  – Globus CoG kits are available for Java.
  – GPDK provides Bean bridge to CoG.
Some Resources
• Gateway web site: www.gatewayportal.org.
  – All materials and software can be downloaded from here.
• Grid Computing Environments: www.computingportals.org.
• My contact info:
  – Email: [email protected]
  – Phone: (937)904-5140
Coda: Topics for Lab Session
Hands-on activities for Thursday’s lab.
Lab Topics
• Installing and configuring Tomcat.
• Installing and configuring Apache with SSL.
• Downloading, configuring, and running WebFlow.
• Modifying GEM sample portal JSP pages.
[Figure: Charon security. Labels: Desktop Client (Browser, Charon Client); Remote Server (Web Server and Servlet Engine, Charon Module, WebFlow Server); HTTP(S); SECIOP; HTTP.]
[Figure: Page control. Labels: JavaServer Pages A and B; Request; Response; Administrative Servlet; Command Interface; SubmitCommand; ForwardCommand.]
[Figure: Script generation. Labels: ScriptFactory; Script Generator Superclass; PBS, GRD, and LSF Script Generators; PBS, LSF, and GRD Scripts.]
[Figure: WebFlow hierarchy. Labels: WebFlow Parent Server; Child Server A; Child Server B; Proxy Images; Modules; User Contexts.]