
A Java-based Science Portal for Neutron Scattering Experiments

Sudharshan S. Vazhkudai∗ James A. Kohl∗ Jens Schwidder∗

ABSTRACT
The Spallation Neutron Source (SNS) is a state-of-the-art neutron scattering facility recently commissioned by the US Department of Energy (DOE). The neutron beam produced at SNS will have an intensity that is an order of magnitude higher than existing facilities worldwide, enabling a significantly better understanding of and exploration into the structure of matter. The SNS is a billion-and-a-half dollar investment supporting research that impacts diverse science domains such as materials, chemistry, engineering, polymers, structural biology, and superconductivity. Thousands of scientists from around the world will annually perform experiments at SNS, ultimately producing petabytes of raw data that must be reduced, curated, analyzed and visualized. The SNS facility is developing a Java-based one-stop shopping web portal with access to the broad spectrum of data and computing services that will facilitate scientific discovery by enabling geographically dispersed users to seamlessly access and utilize the SNS facility resources. In this article, we describe the design and implementation of the SNS portal, focusing on several key architectural components, highlighting the diverse usage of Java in a production environment, ranging from enterprise level software composition to remote interactive visualization and integration with high performance distributed computing.

Categories and Subject Descriptors
D.2.11 [Software Engineering]: Software Architectures

General Terms
Design

Keywords
Neutron Science Portal, Java Web Portal, Service Oriented Architecture

∗Computer Science and Mathematics Division, Oak Ridge National Laboratory, {vazhkudaiss,kohlja,schwidderj}@ornl.gov

(c) 2007 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the U.S. Government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
PPPJ 2007, September 5–7, 2007, Lisbon, Portugal.
Copyright 2007 ACM 978-1-59593-672-1/07/0009 ...$5.00.

1. INTRODUCTION
Located at Oak Ridge National Laboratory (ORNL), the Spallation Neutron Source (SNS) [14] is a large-scale leading-edge neutron scattering facility that hopes to fundamentally revolutionize the analysis and characterization of fine-grained atomic/material structure. These advanced new experimental capabilities will drive scientific discovery across a spectrum of domains, including materials, chemistry, engineering, polymers, structural biology, magnetism and superconductivity. Yet there are many challenges to performing state-of-the-art neutron science at this scale, toward which SNS must blaze a trail and break new ground for the neutron science community at large.

The SNS was recently commissioned by the US Department of Energy (DOE), in collaboration with six national laboratories, and will have at least eighteen distinct instruments, each facilitating a certain aspect of neutron scattering experimentation or research. These instruments will produce an order of magnitude more data than existing facilities. Three SNS instruments are already online and in testing, and are beginning to attract users, while the remaining and additional instruments will continue to come online incrementally over the course of the next several years.

Traditionally, neutron scientists have conducted their experiments/raw data collection in situ at various neutron facilities, and then departed with their datasets in hand for analysis back at their home institutions. This was feasible due to the relatively small size of these datasets. However, at SNS, individual data file sizes will range from hundreds of megabytes to several gigabytes and beyond, making the relocation and storage of these large suites of experimental data somewhat costly. Further, because most science teams are collaborative in nature, across multiple geographically distributed institutions, it becomes increasingly difficult for each user to store and analyze a given experiment's data locally at their home site. The sharing of data and results can be prohibitively slow. In conjunction with these practical concerns, ad hoc data management solutions also imply a lack of reliable metadata and provenance information about the processing of an experiment's raw data and subsequent analysis results. A single unified and centralized data storage and management infrastructure is imperative for the full production scale of experiments that will be performed at SNS, to ensure flexible and efficient accessibility and the proper integrity of metadata management and tracking.

Another concern relates to the breadth and diversity of analysis and reduction software for neutron science. Each new instrument brings its own associated set of data analysis and reduction processes, and requires specially catered and customized software tools, although these tools all tend to build on the same common functional and mathematical foundations. Until recently in the neutron science community, there have been no widely accepted standards for analysis, reduction and visualization software, leaving many users to fend for themselves in terms of actually processing the data. The result has been several disparate, custom or ad hoc solutions, with much redundancy and overlap in the existing community tools. This begs the need for an overarching software infrastructure to provide unified access to and integration of these many software tools.

Even given a new cohesive and integrated software analysis infrastructure, the majority of users, especially at academic institutions, would often be confined to running analyses on local desktop computers. Only a handful of user groups can afford the high-performance computing (HPC) infrastructure truly required for timely processing of these massive new datasets. Access to computational resources, such as clusters with large in-core memory and high-throughput mass storage archives, is required to properly and competitively conduct the more sophisticated data analyses, or any associated simulation/modeling of experimental samples.

Based on a series of end-user interviews, in conjunction with community workshops of the Neutron Science Software Initiative (NeSSI) conducted by SNS [14], an extensive set of user requirements has been collected. Neutron science users fall under three main categories: novice users, who desire push-button interfaces; seasoned users, who will tweak the tools provided and use them in novel ways; and expert users, who desire direct shell access to their data workspace, with a rich environment to write their own tools and compose custom workflows. All users desire that the facility maintain, annotate and archive data, with a rich set of readily available Graphical User Interfaces (GUIs) to sift and search through data and browse their workspaces. The users don't care how or where the data is stored, as long as they have the ability to easily and quickly access it, for running any community- or facility-provided analysis tools, including composing custom workflows around them. Users also need the ability to perform complex interactive visualizations of raw and reduced data, with seamless access to an HPC environment for efficient data analysis, reduction and simulation.

SNS is taking an evolutionary step for the neutron science community by providing one-stop shopping through the SNS facility web site, with a diverse set of these key required services, including analysis/visualization software, data hosting/sharing and computing/storage infrastructure. To address the aforementioned requirements, SNS is building a comprehensive software solution, with a Java-based web portal and a set of crucial back-end services. This portal approach is beneficial because it provides a means for abstracting and managing the complexity of the system, through a single point of entry, and a common framework for defining the various services and front-end user interfaces. Most users, regardless of their sophistication with computers, have some experience browsing the Internet in a web browser, so this general presentation layer is amenable to many. Java itself lends a well-explored collaborative approach to software development, with a variety of available development environments, tools and libraries. SNS can also capitalize on an existing investment in several Java-based neutron science tools, and on prior experience with Java development.

SNS users require that the diverse and sophisticated set of neutron science services be readily available "any time, anywhere", through the convenience of a standard web browser front-end. Because the user base covers a wide breadth of skills and experience, especially with respect to computational environments, it is crucial that the cost of entry be minimal for the resulting system, with a shallow learning curve and little or no required software installations on the client machines, yet with flexibility/extensibility for more advanced users.

Java is a natural technology on which to base this infrastructure, due to its portability and flexibility in inheriting from and extending various existing standard classes. Java is being used to implement many of the SNS software subsystems and services, as well as the encompassing portal presentation layer. Numerous existing web portals and enterprise software solutions commonly use Java or Java-based frameworks, due to their flexibility, potential for reuse and the ease with which web-based applications can be developed and deployed. However, applications in science and engineering pose new challenges beyond merely serving static or interactive web content.

The obstacles which SNS must surpass include:

• Gigabytes to terabytes of data from instrument data acquisition systems must be archived in a scalable storage system that is conducive to both long-term storage and efficient dynamic web-based access. These tens of thousands of datasets must be annotated and cataloged for scalable searching and querying.

• As dataset sizes grow, so does the complexity of remote visualization, including conducting interactive 1-D, 2-D and 3-D views driven remotely from large archived data through a web portal. This opens numerous issues in terms of both web client and server-side caching of data and geometry/image delivery to provide practical levels of interactivity.

• Seamless and transparent access must be provided to HPC cyberinfrastructure (massive supercomputers, high-speed optical networks, parallel and networked file systems, and archival mass storage) to enable users to remotely conduct and monitor long-running data analyses in progress.

• Remote execution of monolithic community-provided analysis and reduction tools, replete with their own independent GUI constructs, must be integrated with the internally developed software infrastructure, and displayed through the common web portal.

• A secure authentication and authorization layer must be provided to control access to these services and the proprietary experiment data.

In the remainder of this paper, we present details of the design and implementation of SNS software components that address the aforementioned challenges, and discuss some of the many research issues involved in their realization. We illustrate scenarios where the use of Java has been apt in this endeavor, and situations with a genuine need to interface with native operating system services, e.g. for simultaneous secure access to back-end Java services by multiple distinct users. We discuss our modular software build process and the use of Maven to bring together a number of diverse developmental efforts: code from our tightly-coupled internal development team, friendly developer contributions, and open-source third-party and legacy software. Our solutions highlight the diverse usage of Java, ranging from enterprise software composition to integration with high-performance distributed computing.

Figure 1: Overall SNS Software Architecture (front-ends: web browser, desktop client, user application; an access and authorization control layer; Control, Data and Analysis interfaces; SNS validated software, legacy code, commercial packages and new user code; data, metadata, analysis results, documentation and publications stores behind database and flat file access protocols; acquisition, data management and computing layers)

2. SNS SOFTWARE ARCHITECTURE
To set the context for the remainder of the paper, this section briefly overviews the overall software architecture at SNS, as shown in Figure 1. The many functional blocks and systems are organized in a modular layered fashion, allowing a separation of concerns and extensibility/pluggability of various subsystems. This organization is intended to be very flexible, to support the wide range of use cases that are expected, for users with varying degrees of sophistication as well as programmatic access by other external software environments. At the top level, access and use of the SNS software services are made available from remote web clients (the most common usage, and the main focus of this paper), local desktop tools (for in situ use during actual experimentation) and other large-scale software frameworks (for post-mortem data acquisition/independent analysis approaches).

All of these access modes must first pass through an Access and Authorization security layer, to validate the identity and permissions of a given end-user to use the underlying software services and access specific experimental data suites. It is crucial to exclude unauthorized access or malicious external attacks, especially for the highly critical experimental control interfaces, but also to protect the high value of the proprietary information contained in as-yet-unpublished experimental results. As such, it is not sufficient merely to identify a specific user as having general access to the SNS software facilities, but the access to specific proposals/experiments and results data must be controlled on a per-user basis (see Section 3). All requests/accesses to the internal software services must be properly authenticated/authorized before these operations can be executed, and external access is only allowed at appropriately exported interfaces within the internal service infrastructure.

There are three broad high-level functional interfaces exported through the top-level security layer: Controls, Data and Analysis. The Control interface is primarily intended for internal use, for directly monitoring and controlling various aspects of in-progress neutron scattering experiments and samples. The main task of this interface is management of the data acquisition system, to oversee details of the live sample environment, including temperature, position, and the starting/stopping of data collection. The Data interface is a set of simple streamlined mechanisms for searching and directly accessing data files from the permanent data management system archives, including downloading, annotating and uploading/submitting new user-generated files. This interface is independent of the more specific and sophisticated data reduction and analysis functions that are provided as part of the Analysis interface, which can likewise access data, in whole or in part, and download/upload various data files during data processing.

Most of the complexity in the SNS software architecture resides in the Analysis interface, where a variety of numerical algorithms can be applied in custom workflows to systematically process (analyze, reduce and visualize) the experimental data. These algorithms operate on data at several levels of complexity, from raw neutron event data to histograms to reduced and energy/frequency domain representations. Similarly, these algorithms and analysis/reduction functions originate from a variety of sources, including internally developed and validated software, legacy/community and commercial monolithic analysis tools, and custom user-provided analysis codes. These different types of analysis functions, across a broad spectrum of software paradigms and runtime environments, must be seamlessly integrated into a unified analysis framework. This allows interchangeable cooperation among functions, in arbitrary custom pipeline combinations, for processing of experimental data. Some of these software libraries and programs can be decomposed or repackaged into individual software components that can be directly integrated and composed into contiguous multi-function analysis applications, potentially on-the-fly. Other codes remain independent monolithic programs that must be individually staged with proper input files, with subsequent results collected for the next stage in the workflow pipeline. Details of this complex "Application Management" task are briefly explained in Section 6.

Underneath the higher-level Data and Analysis interfaces, there is a generic Java-based Data Management layer for accessing various archival and scratch data files, and a Computing layer for low-level handling and monitoring of available computational resources, for executing the requested data analysis pipelines. The base-layer database archives are used to organize and coordinate access to all experimental data and analysis results, with the associated meta-data and provenance information, as well as to provide user documentation, publications, experimental proposal details and user preferences.

The focus of this paper is primarily on the presentation of these various SNS facility services through a generic web browser "portal" client. This "SNS Portal" environment provides a unified and flexible user environment for coordinating access to these services, as described in Section 3. The majority of the portal environment software (both front-end client and back-end services) is written in Java.

Java is generally a logical choice, as it enjoys a wide acceptance in science communities as a portable and standardized development language, and a commonly supported delivery mechanism for web-based content. It has been used in the development of other science portals such as the Earth System Grid (ESG) [9]. In addition, many Grid-related portals are based on Java technologies, and projects like Gridsphere [10] have done much to support the Java-based development of portals for Grid communities.

3. PORTAL INFRASTRUCTURE
The "SNS Science Portal" is the front-end of the multi-tiered SNS software architecture. The portal is intended to support thousands of users per year as the facility matures. It provides users with a rich interface to address their software, computing and data needs. In its current form, the SNS Portal is a Java applet and a suite of web services hosted in a Tomcat servlet container as a web application. Tomcat itself runs behind an Apache web server. In addition to Tomcat, several other internal software components and services are hosted in a JBoss application server, running on a separate database machine, to distribute the load and shield external access. The web services expose the functionality of SNS back-end software components through straightforward programmatic interfaces. In the following, key aspects of the front-end portal software are highlighted.

User Interface: As a science portal, the goal of SNS is to provide its user base with a broad set of interactive functionality and controls, for a variety of data processing and analysis operations. The user interface enables workspace browsing, metadata/database searching, analysis/simulation tool setup, visualization of large datasets, and composition of data processing workflows. A Java Swing-based UI was chosen for the primary portal interface, given the breadth of available widgets and classes, and the need for rapid development and easy/portable deployment. Swing expedites implementation of detailed graphical interfaces, with a great degree of flexibility in UI design and manipulation options.

The portal front-end is written as a Java applet, which must be trusted using digital signatures to allow for loading/saving of files at front-end client computers. Yet SNS is committed to supporting both "lightweight" portlet or forms-based and "thick" applet-scale client interfaces. For instance, to be inclusive of code contributions from the neutron science and other user communities, third-party tools are sometimes written as lightweight portlets, adhering to the JSR 168 standard to promote sharing (e.g., with OGCE [13]). Applets have been prototyped to coexist with portlets within a JSR-compliant portal framework (such as Gridsphere or Jetspeed). The SNS portal also includes traditional web content serving, for basic facility information, online instrument status, etc. This information is best served as RSS feeds within a portlet.

Workspace Browsing: Much of an SNS user's online experience is confined to activities within their workspace, and the experimental/analysis data therein. A user can manage numerous files within the workspace, from experiments conducted across many facilities, all viewed via the single unified interface. The SNS portal presents a "file explorer" view of the workspace, as shown in Figure 2. A user might also desire to utilize additional views, such as one organized by instrument usage proposals. The portal remains agnostic as to how data is stored in the SNS archives; the current implementation is as a filesystem. However, the back-end Workspace interface has been designed to abstract the low-level data management implementation from the portal. This abstract workspace service interface implements a basic set of file operations and data extraction capabilities.

Figure 2: Workspace browsing in the SNS portal

Within the workspace browsing GUI, the viewing of multiple file formats is supported. Each file format is associated with a specific front-end viewer and a cooperative back-end web service for appropriately serving data file contents. For example, a "NeXus" file viewer displays a hierarchical view of the common SNS data file format (discussed in Section 4), while Text and XML viewers simply display the plain text contents of files. A "Properties" panel displays relevant standard file information, including any additional format-specific file metadata.

Security Infrastructure: The SNS security infrastructure comprises an authentication layer followed by an authorization layer. SNS has the rather restrictive requirement that authentication be little more than simple user name and text password-based. This is to minimize complexity and ensure the comfort zone of our user base, as well as provide a scalable solution for the large number of distinct transient users annually (e.g. it is not practical to issue secure electronic ID cards or one-time passwords just for a week or two of SNS portal use). To this end, the portal utilizes a user name/password-based authentication system called XCAMS, as generally supported by ORNL for informal external users. XCAMS is connected to an Apache module that provides the portal with information about the authenticated user in the HTTP header of all service requests. The portal security stub pulls this information to instantiate a user principal object that the back-end services can use to identify access permissions. Once authenticated to the portal, users must be authorized to use specific resources, including data, computing infrastructure, databases and, in the future, the instruments themselves for external control.
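The security stub described above can be pictured as a servlet filter that turns the header set by the Apache/XCAMS module into a principal object for downstream services. The following is a minimal illustrative sketch only; the header name "X-Remote-User" and the class names are assumptions, not the actual SNS implementation.

    // Hypothetical sketch of the portal security stub: a servlet filter that reads
    // the user name injected into the HTTP headers by the Apache/XCAMS module and
    // exposes it to back-end services as a principal object. The header name and
    // class names are illustrative assumptions.
    import java.io.IOException;
    import javax.servlet.*;
    import javax.servlet.http.HttpServletRequest;

    public class XcamsPrincipalFilter implements Filter {
        public void init(FilterConfig config) { }
        public void destroy() { }

        public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
                throws IOException, ServletException {
            HttpServletRequest httpReq = (HttpServletRequest) req;
            String userName = httpReq.getHeader("X-Remote-User"); // set by Apache after XCAMS login
            if (userName == null) {
                throw new ServletException("Request did not pass XCAMS authentication");
            }
            // Attach a principal that downstream services use for authorization checks.
            httpReq.setAttribute("sns.user.principal", new UserPrincipal(userName));
            chain.doFilter(req, resp);
        }
    }

    // Minimal principal carrying the authenticated identity.
    class UserPrincipal implements java.security.Principal {
        private final String name;
        UserPrincipal(String name) { this.name = name; }
        public String getName() { return name; }
    }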

At the inception of the SNS software architecture, the decision was made to use UNIX user accounts and standard filesystem access control lists (ACLs) to meter authorized access to data and compute resources. (This decision was motivated by the need for interactive shell access by advanced users, in addition to portal-based usage.) The portal must therefore map the authenticated user principal to back-end user accounts to reconcile access to resources. Because the portal server runs inside a Java Virtual Machine (JVM), which itself runs as user tomcat on the back-end web server machine, the JVM is oblivious to the real UNIX user accounts and cannot negotiate access. Specifically, there is no inherent JVM mechanism for assuming any given user's identity (i.e. "setuid"), let alone for multiple simultaneous users in different threads. Any back-end action on behalf of a portal user must be performed as that user to adhere to file/group permissions access control. For instance, users from one proposal group cannot access data from another proposal, and data analyses must be executed as "themselves" in the workspace (this also enables the facility to monitor and track resource usage).

A special native authorization layer interfaces the portal servlets with underlying operating system services, e.g. to negotiate with Data Management to access data or with Application Management to secure access to computing resources (see Section 6). In addition to UNIX accounts, user credential and certificate mechanisms are required by many HPC resources and distributed computing environments. A portal "Simulation" prototype provides authorization through X.509 certificates to execute jobs on clusters (e.g., the "TeraGrid," a national cyberinfrastructure testbed [15]). The following sections further discuss the design and implementation of additional SNS back-end infrastructure.
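The internals of this native layer are not detailed here. As an illustration only, the sketch below shows one common way a back-end service running as the tomcat user could carry out an operation as the mapped UNIX account, by delegating to a privileged helper (sudo in this example); the class and approach are hypothetical and not necessarily how the SNS layer is built.

    // Illustrative sketch only: running a command as the mapped UNIX account by
    // delegating to sudo. The actual SNS native authorization layer is not
    // necessarily implemented this way.
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class AsUserExecutor {
        /** Runs a command as the given UNIX account and returns its standard output. */
        public static String runAs(String unixUser, String... command)
                throws IOException, InterruptedException {
            List<String> cmd = new ArrayList<String>();
            cmd.addAll(Arrays.asList("sudo", "-u", unixUser));
            cmd.addAll(Arrays.asList(command));
            ProcessBuilder pb = new ProcessBuilder(cmd);
            pb.redirectErrorStream(true);
            Process p = pb.start();
            StringBuilder out = new StringBuilder();
            BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
            for (String line; (line = r.readLine()) != null; ) {
                out.append(line).append('\n');
            }
            if (p.waitFor() != 0) {
                throw new IOException("Command failed for user " + unixUser);
            }
            return out.toString();
        }
    }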

4. DATA AND METADATA MANAGEMENT
By 2008, data generated at SNS will be on the order of one Terabyte per day, contained in over 20,000 files. By 2011, the cumulative data store is expected to reach over one Petabyte. Thousands of users will access the datasets, including downloading, data mining, and conducting real-time analysis, visualization, and sophisticated metadata searches. Data will be accessed through a mix of portal-, web- and shell-based tools and services. The SNS data management architecture must be flexible enough to address this diverse usage, and scale as SNS ramps up its data production.

At the core of this data management architecture is the SNS data repository, which is simply a networked file system. Several alternatives were considered for the data storage organization, such as SRB (Storage Resource Broker) [20] and SRM (Storage Resource Manager) [21], which are used in other science communities. However, the SNS requirement to support both shell and web-portal users precludes the use of these (and other) more sophisticated solutions. Many of these data management solutions cannot be directly mounted as file systems for shell access.

Data is organized by experimental runs, which are stored within project/proposal directories that are nested within each instrument folder. Datasets are stored with basic UNIX file access privileges so that only users belonging to the proposal team can access them. Ultimately, all data files are owned by SNS, and are eventually made world-readable via a directory link in the "public" folder when the proposal becomes public or its results are published.

SNS users also have file workspaces at the facility, where their proposal and public data is linked in from the main data repository. In addition, users can create scratch analysis data and other personal data in their workspaces, enabling access through the portal from anywhere. The data management system also provides user workspace management services, such as the creation of user accounts, populating user workspaces with links to accessible data, etc. Many other user workspace administration activities are initiated in response to SNS proposal submission/creation.

Data on spin storage is archived in HPSS online archival storage [19]. In the long run, not all data can be maintained on spin storage, so SNS is developing a database catalog system atop both the file system and archival storage. An active "window" of recent datasets will stay on disk file systems, while older data is accessible only from archival storage. These database catalogs and the general data management system provide Java interfaces to portal clients, using remote method invocation and web services. The data management system currently supports transfer protocols such as ftp, GridFTP [17], and hsi (Hierarchical Storage Interface, used to access HPSS) [19].

In addition to storing and serving data to clients, the data management system is also responsible for capturing metadata about SNS data. The system applied for this purpose is called ICAT, which uses a common database schema to define neutron science data (adopted by both SNS and ISIS, a neutron facility in the United Kingdom (UK)). The schema comprises attributes covering proposal information, experiment samples, instrument parameters, etc. The ICAT software is written using Enterprise JavaBeans (EJB) that interface with an Oracle database at the back-end and present a web service and remote method interface to the portal front-end. The portal UI presents a guided search interface to the user, based on the services provided by the metadata management component. Users are initially presented with all the data they are allowed to see. They can then trim this set of results by adding further constraints (such as selecting data from a specific principal investigator or experiment, or with specific keywords in the title, etc.).
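The shape of this guided search can be sketched as an interface that starts from everything the user may see and then narrows the result set one constraint at a time. The interface, class and field names below are purely illustrative and are not the ICAT API.

    // Hypothetical sketch of the guided-search flow; not the actual ICAT interface.
    import java.util.List;

    public interface MetadataSearchService {
        /** Everything the authenticated user is allowed to see (the starting set). */
        List<DatasetInfo> visibleDatasets(String userName);

        /** Narrow a previous result set with one additional constraint. */
        List<DatasetInfo> refine(List<DatasetInfo> current, SearchConstraint constraint);
    }

    // Constraints mirroring the narrowing options described above.
    class SearchConstraint {
        enum Field { PRINCIPAL_INVESTIGATOR, INSTRUMENT, TITLE_KEYWORD }
        final Field field;
        final String value;
        SearchConstraint(Field field, String value) { this.field = field; this.value = value; }
    }

    // A few of the schema attributes (proposal, instrument, title, ...).
    class DatasetInfo {
        String proposalId;
        String instrument;
        String title;
    }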

A note on the fundamental dataset building block is in order here: SNS datasets are NeXus [12] files. NeXus is a neutron-community hierarchical data format standard that describes the dataset in terms of targets, detector counts, pixel banks and x-y offsets. The data management system provides specialized NeXus file format services, e.g. serving file contents as XML to portal clients. To this end, we have adapted an existing Java NeXus [12] library implementation.

Java has proven itself an excellent choice for the data management system, by incorporating existing software from the neutron science community, such as JNexus and ICAT. These Java-based tools integrated easily within the SNS infrastructure, along with standard data transfer tools such as ftp and GridFTP, and provided a service layer atop the underlying storage fabric (file and archival system) for web clients. The flexibility and convenience of composing the disparate Java interfaces expedited the development process.

5. REMOTE VISUALIZATION
A variety of existing neutron science visualization and analysis tools are available for use at SNS, including ISAW [11], DAVE [1], IDL [3], Matlab and many others. However, many of these tools are commercial products or contain proprietary interfaces and formats. Because the ISAW system was available as open source and was already written in Java (with the additional benefit of a friendly working relationship between SNS and the ISAW developers), it was a clear technology choice for the integration of some basic neutron science "views" into the SNS Portal. A custom "subset" of the ISAW classes was extracted and packaged by the ISAW developers for inclusion in the SNS software infrastructure.

The challenge to using these handy ISAW view classes was in wrangling and extending the associated internal data management. Like most monolithic tools, ISAW assumes that it can read data arrays into memory as needed from available input files on the local filesystem. Clearly, when running the ISAW viewer classes as part of a remote front-end portal applet for the SNS, this is not logistically possible! Fundamentally, there must be a separation between the local client data processing and the actual back-end data access, which is implemented by replacing the core Java data object classes with remote-capable counterparts. Data must be loaded into the applet's memory by invoking a specialized "data extraction" servlet, via URL from within the Java applet. The servlet runs back on the portal web server machine, which has local (mounted) filesystem access and so can open up the desired data files. The servlet extracts the requested data array (or the desired subregion/elements of it), and then transmits these data back to the front-end applet as the results/output of the servlet invocation.

For the purposes of the SNS Portal ISAW viewers, the size of the multidimensional numerical data arrays that need to be sent to the front-end applet can be quite large, on the order of several hundred megabytes each. Therefore, an efficient custom data delivery protocol was created to directly transmit the numerical values, without the overheads (or benefits) of full Java object serialization. Every byte has an impact in terms of network bandwidth, so eliminating the additional Java state in a full numerical object class can help reduce the latency in delivering these raw data. In a rather primitive but ultimately concise approach, the arrays of numerical values are output in simple ASCII format (rather than the most efficient binary format, to guarantee integrity across machines with heterogeneous data formats) and then reassembled with minimal meta-data back into new numerical Java arrays at the client-side applet. This stream could be further optimized by compression; however, the current implementation does not yet do this. Data arrays are organized into appropriate new applet-side data classes, which are then able to implement the interfaces required for driving the ISAW viewer classes. These special applet variations of the ISAW data classes are extended with additional "servlet-friendly" methods that provide meta-data on the origins of the data, and allow a variety of new operations relating to updating the currently loaded array subregions (see below).
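A rough sketch of the applet side of this exchange is given below: the applet requests a subregion from the extraction servlet by URL and rebuilds the values into a Java array. The servlet path, query parameters and the assumption that only raw values are returned are simplifications; the real protocol also carries a small amount of meta-data.

    // Sketch of the applet fetching an array subregion from the data-extraction
    // servlet; URL layout and parameters are illustrative assumptions.
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.util.StringTokenizer;

    public class DataExtractionClient {
        public static float[] fetchRegion(String baseUrl, String file, String array,
                                          int offset, int count) throws IOException {
            URL url = new URL(baseUrl + "/extract?file=" + file + "&array=" + array
                    + "&offset=" + offset + "&count=" + count);
            BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
            float[] values = new float[count];
            int i = 0;
            try {
                for (String line; (line = in.readLine()) != null && i < count; ) {
                    StringTokenizer tok = new StringTokenizer(line);
                    while (tok.hasMoreTokens() && i < count) {
                        // Plain ASCII numbers, no Java object serialization overhead.
                        values[i++] = Float.parseFloat(tok.nextToken());
                    }
                }
            } finally {
                in.close();
            }
            return values;
        }
    }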

Unfortunately, due to the minimalistic client operating environment expected for the diverse spectrum of potential SNS users, this servlet-based data linkage cannot assume a significant amount of available/allocated memory in the Java Virtual Machine (JVM) plug-in being run by the client's web browser. So, aside from having to deal with the significant level of indirection or "distance" (latency) between the data and the applet, this portal infrastructure must also deal with serious remote visualization and data caching issues. Entire data arrays cannot be fully loaded all at one time into the front-end client applet, but must be incrementally or interactively loaded as needed to drive the various viewer classes. This creates problems both in terms of viewer application interfaces and local client data management.

The basic assumption/expectation of the ISAW viewer classes, to have fully in-core data residing in their internal data classes, is no longer valid. This creates many complications in trying to emulate an in-core data array at the front-end data classes, by encapsulating a complex front-end data caching scheme. In the worst-case scenarios, based on strict data usage and access patterns, some viewers required additional "callback" hooks to allow automatic pre-fetching of specific data elements prior to their use by the viewer. (Fortunately, these special hooks could be added to the main ISAW source tree without significant perturbation.)

In other more moderate cases, incremental data access patterns allowed specific data subregions to be retrieved using a straightforward data caching algorithm, from inside the front-end (impostor) data class methods. In all cases, there is a small latency penalty to be paid while the required data elements are downloaded on a "cache miss," which manifests as an occasional "pause" during what would otherwise be smooth scrolling or animation. However, by choosing an appropriately small "cache block size" this latency can be minimized, in part depending on the available network bandwidth between the client and the back-end server.
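The caching idea reduces to a few lines, as in the sketch below, which keeps a single fixed-size block in the applet's memory and refetches on a miss, reusing the hypothetical fetch helper sketched earlier. The real impostor data classes are more elaborate; this is only meant to show where the cache-miss pause comes from.

    // Minimal single-block cache illustrating the miss/refetch behavior described
    // above; names and the one-block replacement policy are simplifying assumptions.
    import java.io.IOException;

    public class CachedRemoteArray {
        private final String baseUrl, file, array;
        private final int blockSize;
        private float[] cachedBlock;        // the only block held in the limited JVM memory
        private int cachedBlockIndex = -1;

        public CachedRemoteArray(String baseUrl, String file, String array, int blockSize) {
            this.baseUrl = baseUrl; this.file = file; this.array = array; this.blockSize = blockSize;
        }

        /** Returns element i, downloading the enclosing block on a cache miss. */
        public float get(int i) throws IOException {
            int block = i / blockSize;
            if (block != cachedBlockIndex) {
                // Cache miss: replace the in-core block (the occasional "pause" during scrolling).
                cachedBlock = DataExtractionClient.fetchRegion(baseUrl, file, array,
                        block * blockSize, blockSize);
                cachedBlockIndex = block;
            }
            return cachedBlock[i % blockSize];
        }
    }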

These cache miss latencies could be further attenuated by applying some level of "local disk caching" at the front-end client machine. Given the restrictive default JVM memory limit on the client, relative to the many hundreds of megabytes required to store typical SNS data arrays, a great deal of network bandwidth and latency could be avoided by caching even a moderate amount of data subregions locally on a client filesystem. With the current algorithm, every cache miss triggers a full data replacement in the limited JVM in-core memory space. By storing/retrieving some data locally, many data requests could be handled more efficiently, with minimal impact on the user's available disk space. The only additional requirement for this local client caching would be that the portal applet must be signed; however, this is already required for other tangential logistical reasons.

Once the above fundamental data delivery capabilities were woven into the original ISAW viewer classes, and these classes integrated into the SNS Portal infrastructure, several additional functional extensions were made to these Java Swing viewer GUIs. In many cases, the original viewers were custom designed for specific types of neutron science data, and as such assumed particular data array dimensionality and axis arrangements. Yet for the SNS Portal's extended application of these data viewers, it is beneficial to be able to select the desired data dimensions to view in a 1-D, 2-D or even 3-D viewer, by extracting specific lower-dimensional slabs or slices from data arrays of a higher dimensionality. Specific axes of a data array can be swapped, to provide various orthogonal "probes" or "slices" through a higher-dimensional array.

To support these orthogonal slicing and rotational operations, with appropriate navigation and zooming through the source data arrays, the GUIs for each ISAW viewer were extended by retrieving and extending the "built-in" ISAW viewer control classes, thanks to some handy existing accessor methods provided by ISAW. Additional Java sliders and selector elements were applied to "scroll" the selected data region across a higher-dimensional array, and to choose the data axis, by label, to be applied for each given viewer plotting axis. For example, when viewing a 3-D data array in the 2-D image viewer, the slider chooses which 2-D slice of the larger volume is displayed, as can be seen in Figure 3. Swapping the 2 plotting axes among the 3 available data axes allows all 6 variations/orientations of the image slice to be selected. These successful GUI extensions to the original ISAW viewers warrant integration back into the original ISAW source tree, as part of a future collaboration.

Figure 3: SNS Portal's Extensions to the ISAW 2-D Image Viewer
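The slab extraction behind the slider can be illustrated with a small stand-alone routine that pulls a 2-D slice out of a 3-D array, given which two axes map to the image axes and the index along the remaining axis. This sketch assumes an in-core array for simplicity (the portal classes work against the cached remote data described earlier) and is not ISAW code.

    // Stand-alone sketch of selecting a 2-D slice of a 3-D array; swapping rowAxis
    // and colAxis yields the transposed orientations mentioned above.
    public class SliceExtractor {
        /**
         * data is indexed as data[i0][i1][i2]; rowAxis and colAxis (distinct values
         * from 0..2) become the image axes, and sliceIndex fixes the remaining axis.
         */
        public static float[][] slice(float[][][] data, int rowAxis, int colAxis, int sliceIndex) {
            int[] dims = { data.length, data[0].length, data[0][0].length };
            int fixedAxis = 3 - rowAxis - colAxis;   // the axis scrolled by the slider
            float[][] out = new float[dims[rowAxis]][dims[colAxis]];
            int[] idx = new int[3];
            idx[fixedAxis] = sliceIndex;
            for (int r = 0; r < dims[rowAxis]; r++) {
                for (int c = 0; c < dims[colAxis]; c++) {
                    idx[rowAxis] = r;
                    idx[colAxis] = c;
                    out[r][c] = data[idx[0]][idx[1]][idx[2]];
                }
            }
            return out;
        }
    }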

One other significant challenge was encountered in integrating the animated 3-D Volume Viewer into the SNS portal. This viewer uses a Java wrapper library onto the OpenGL standard native graphics library, called "JOGL". Though this native JOGL solution works reasonably well in a local non-applet execution of ISAW, it creates a number of complications for remote applet-based execution. Whereas a simple ISAW install script can place the appropriate JOGL libraries in accessible filesystem locations for local ISAW use, there is not a good standardized solution for on-the-fly JOGL installation or usage within a standard Java applet in a client web browser.

One solution is to manually install the JOGL libraries on each client machine, which is error prone, especially for the non-computer-expert user (and violates a fundamental "user friendliness" requirement for the SNS). Other common JOGL solutions involve the use of specialized applet loader classes, which can automatically detect the native client operating environment and install the appropriate JOGL shared object libraries. However, to utilize such a solution would require that the entire SNS Portal applet be loaded through this custom applet loader class. This was not deemed a wise decision from a software reliability and maintenance perspective. It is hoped that upcoming new Java standards will integrate native OpenGL support, and that ultimately this "temporary" snafu will be alleviated.

6. APPLICATION MANAGEMENT
The Application Manager (AM) is responsible for orchestrating the execution of data analysis applications and instrument simulations that are submitted via the SNS Portal. These various computational jobs are constructed in the form of data pipelines that include both approved facility and open community or commercial software packages, each of which can be executed either interactively or in batch mode. The overall AM design is illustrated in Figure 4, which shows the design layout of the internal functional blocks and their relation to other elements of the overall software architecture (as defined in Section 2). Only a brief overview of the many complexities of the AM design will be covered here.

Figure 4: Application Manager

To specify and submit pipelined analysis or simulation jobs, users interact with the Pipeline Construction User Interface (PCUI) to indicate the sequence of tools, each with their own input/output data files, configuration files and control parameters, to describe the desired overall pipeline invocation. A custom tool parameter GUI is automatically generated for each tool in the pipeline, based on meta-data information obtained from the Tool Information Service (TIS) database. An example of such an interface is provided in Figure 5.

The TIS runs as a portal service and provides an XML-based description of all the input parameters for each given tool. This rich meta-data describes the names, command-line invocation syntax, suggested widget elements for the GUI, and tooltip and description information. There can even be complex relationships described among tool parameters, such as required versus optional inputs, and even conditional inclusion or exclusion of sets of parameters based on another dependent parameter's settings. The Java applet parses the TIS XML using the Commons Digester library and produces a "parameter tree" that contains all the parameter meta-data, including the organization of parameters into high-level functional "groups." This tree is traversed to construct the tool GUI, and to verify the set of inputs before submission for execution.
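As an illustration of this step, the sketch below uses Commons Digester to map a simplified TIS-style description onto tool/group/parameter objects. The XML element and attribute names (tool, group, param, name, label, widget, defaultValue) are assumptions; the real TIS schema is richer, carrying tooltips, dependencies and conditional parameters.

    // Sketch of building a parameter tree from TIS-style XML with Commons Digester;
    // the schema shown here is a simplified assumption.
    import java.io.InputStream;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.commons.digester.Digester;

    public class ParameterTreeParser {
        public static class Tool {
            private String name;
            private final List<Group> groups = new ArrayList<Group>();
            public void setName(String name) { this.name = name; }
            public void addGroup(Group g) { groups.add(g); }
            public List<Group> getGroups() { return groups; }
        }
        public static class Group {
            private String label;
            private final List<Param> params = new ArrayList<Param>();
            public void setLabel(String label) { this.label = label; }
            public void addParam(Param p) { params.add(p); }
            public List<Param> getParams() { return params; }
        }
        public static class Param {
            private String name, widget, defaultValue;
            public void setName(String name) { this.name = name; }
            public void setWidget(String widget) { this.widget = widget; }
            public void setDefaultValue(String defaultValue) { this.defaultValue = defaultValue; }
        }

        public static Tool parse(InputStream tisXml) throws Exception {
            Digester d = new Digester();
            d.addObjectCreate("tool", Tool.class);
            d.addSetProperties("tool");                      // name="..." -> setName(...)
            d.addObjectCreate("tool/group", Group.class);
            d.addSetProperties("tool/group");
            d.addSetNext("tool/group", "addGroup");
            d.addObjectCreate("tool/group/param", Param.class);
            d.addSetProperties("tool/group/param");          // name, widget, defaultValue attributes
            d.addSetNext("tool/group/param", "addParam");
            return (Tool) d.parse(tisXml);
        }
    }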

The Input Translator (IT), another back-end portal service, is presented with the full pipeline specification, consisting of an ordered list of tools with their associated pipeline connections and input parameter sets. The IT also obtains tool meta-data information from the TIS, on the actual location of these tools on various computational resources, including their invocation syntax and preferred execution environment, and resolves specific details for the potential execution targets. The Job Script Generator (JSG) massages this information into a set of executable command-line scripts and identifies the data dependencies among them. The scripts and dependencies are passed on to the Application Instance Manager (AIM), which analyzes the data dependencies and appropriately submits scripts with satisfied ("ready to execute") data dependencies for scheduling and execution.

Figure 5: Automatically Generated Tool GUI

The Execution Environment Selector (EES) is primarily a scheduler and takes each "ready" script, along with any preferred execution locations and other data locality requirements, and identifies the best resource for its execution. In its decision-making process, the EES makes queries to the Computing Resource Information Service (CRIS) to obtain dynamic information on current resource loading and behavior. The CRIS is implemented as a stand-alone persistent Java database service that collects information from multiple execution environments using the Ganglia cluster monitoring tool [2] and presents it as XML summaries. The CRIS provides its services to the EES via Java RMI.
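The CRIS contract itself is not reproduced here; the following is a sketch of the style of RMI interface through which the EES could query it, with illustrative method names.

    // Illustrative RMI interface for querying resource information; not the actual
    // CRIS contract.
    import java.rmi.Remote;
    import java.rmi.RemoteException;
    import java.util.List;

    public interface ResourceInfoService extends Remote {
        /** Execution environments currently known to the monitoring database. */
        List<String> listResources() throws RemoteException;

        /** XML summary (as gathered from Ganglia) for one resource. */
        String resourceSummaryXml(String resourceName) throws RemoteException;

        /** A simple scalar load figure the scheduler can compare across resources. */
        double currentLoad(String resourceName) throws RemoteException;
    }

    // Example client-side use (registry location hypothetical):
    //   ResourceInfoService cris =
    //       (ResourceInfoService) java.rmi.Naming.lookup("rmi://cris-host/CRIS");
    //   double load = cris.currentLoad("analysis-cluster-1");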

The Job Manager (JM) then obtains user authentication information for the given job script on the target resource (from queries to the Authorization and Authentication Service discussed in Section 3) and actually schedules the job. The JM handles the mechanics of negotiating access to the resource location, staging the input data files and execution script, and submitting the job to its target resource's scheduling queue. Data files for processing are obtained via interaction with the Data Management Service (see Section 4). Output data is stored back into the data management system, in a scratch area of the user's workspace, for visualization or other exploration of the results. Jobs are monitored using the Job Information Service (JIS), with a job-id and other job-related information that can be used to query the status of the job.

The target locations for running SNS analysis and simulation jobs can be any of the available instrument-dedicated analysis computers, internal ORNL institutional clusters, or the national cyberinfrastructure testbed, the TeraGrid [15]. In the current implementation, several execution modes are provided through back-end Java portal services. First is a simple remote shell-based execution through SSH. Second, an execution interface has been implemented around Slurm [4], which is a job execution tool for cluster management that does its own localized scheduling and rudimentary load balancing. Mechanisms have also been developed to allow computation-intensive simulation jobs to be launched from the SNS Portal on the TeraGrid using Globus [18]. If jobs are run internally on ORNL machines, then the output data can be easily captured onto local filesystems, and subsequently accessed on the portal server machine via standard file system mounts. However, if jobs are run on external locations such as the TeraGrid, then GridFTP is used to move the resulting data files back into the SNS data management system (or other local scratch space).
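The first of these execution modes can be pictured with the small sketch below, which submits a generated job script on a remote analysis machine by invoking the ssh client through ProcessBuilder. The Slurm and Globus/TeraGrid paths go through their own client tooling and are not shown; the user, host and script path are placeholders.

    // Minimal sketch of the SSH-based execution mode; error handling and key
    // management are omitted, and all arguments are placeholders.
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;

    public class SshExecutionMode {
        /** Runs a generated job script on a remote machine as the given user. */
        public static int run(String user, String host, String remoteScriptPath)
                throws IOException, InterruptedException {
            ProcessBuilder pb = new ProcessBuilder(
                    "ssh", user + "@" + host, "/bin/sh", remoteScriptPath);
            pb.redirectErrorStream(true);
            Process p = pb.start();
            BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
            for (String line; (line = r.readLine()) != null; ) {
                System.out.println(line);   // in the portal this output would feed job monitoring
            }
            return p.waitFor();
        }
    }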

6.1 SNS Remote Display Conduit
Many of the SNS analysis and reduction functions, and most instrument simulation jobs, are basic command-line tools or libraries that do not have extensive interactive GUIs. However, some community-standard neutron analysis tools and other commercial tools provide (or require) the use of an independent custom GUI panel to operate. Often such tools cannot be separated out and directly integrated into the SNS software infrastructure, as the ISAW viewers have been (see Section 5).

For these more challenging cases, the most convenient and desirable scenario is that these individual tool GUIs be made available directly to the end-user through the portal. When the user submits a pipeline that contains one of these GUI tools, the appropriate user interface should "magically" appear at the right moment, to let the user control the tool and complete the desired analysis step. This requires much additional infrastructure to coordinate the staging and interactive remote display of the tool GUI to the end-user.

The SNS Remote Display Conduit currently applies a specialized extension of the "WeirdX" system by JCraft, Inc. [5], to allow remote display of X Windows programs through a client's web browser. The SNS extension to this Java-based X server is called "WeeerdX", which integrates directly into the SNS software infrastructure to allow a seamless display of back-end analysis tool GUIs on the user's client machine.

This solution is superior to other remote desktop clients, such as the various incarnations of VNC, because WeeerdX allows the full flexibility of stand-alone "rootless" windows and subwindows on the client desktop. Most existing remote desktop clients do not support rootless windows, or if so, do not support rootless popups or subwindows. WeeerdX provides a more natural "look and feel" versus these encapsulated remote desktop clients, improving the seamless usability for the end-user. WeeerdX also communicates with the back-end using the high-level X event stream, rather than channeling raw pixels from the back-end desktop. This provides the potential for a logical and practical compression of the X event stream to help improve the responsiveness of the GUIs.

However, great complexity exists in the WeirdX/WeeerdX solution, in trying to cover and maintain support for the immense collection of potential X server protocols, features and extensions. While the current WeeerdX implementation is moderately complete and functional, there are many unimplemented features that must be added before this system will be considered fully functional.

7. MESSAGING AND LOGGING
Messaging: Many SNS software components send and receive notifications for events. One example is the completion of an experiment run in the data acquisition system (DAS), which triggers the archiving of a dataset in the data management system. These events need to be distributed in a scalable and extensible fashion so that future components can produce and receive messages without changes to the existing systems. The Java Message Service (JMS) API provides a solution for the above requirements. It specifies an API for common messaging semantics like publish/subscribe (topics) and point-to-point communication (queues). The JMS API has been implemented by a number of different providers.

The SNS messaging system uses JBossMQ, the JMS implementation in JBoss 4.x. JBossMQ serves as the message broker that client components connect to in order to send or receive messages. It was a convenient choice, since we were already using JBoss; however, other JMS implementations, like ActiveMQ [6], could also have been used. The SNS software components take advantage of the different messaging semantics JMS supports. Most messages are published using topics, so that multiple clients can subscribe to events. Some client systems that need to send messages, like the DAS system for instruments, are not written in Java. For non-Java clients, we created a simple web service that allows injecting messages into the JMS-based system. The client can use simple HTTP POST requests to submit messages. The message body uses a custom XML format that is used to generate Java SNS message objects. So far, we have not faced non-Java clients that need to receive messages; however, we anticipate this need in the future.
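Publishing such an event from a Java component boils down to a standard JMS topic publish against the JBossMQ broker, roughly as sketched below. The JNDI names and the topic name are assumptions; the XML payload is the custom SNS message format mentioned above.

    // Sketch of publishing a facility event (e.g., "run complete, dataset ready for
    // archiving") to a JMS topic; JNDI and topic names are illustrative.
    import javax.jms.*;
    import javax.naming.InitialContext;

    public class EventPublisher {
        public static void publish(String eventXml) throws Exception {
            InitialContext jndi = new InitialContext();   // configured to point at the JBoss server
            ConnectionFactory factory = (ConnectionFactory) jndi.lookup("ConnectionFactory");
            Topic topic = (Topic) jndi.lookup("topic/snsEvents");

            Connection connection = factory.createConnection();
            try {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(topic);
                TextMessage message = session.createTextMessage(eventXml);
                producer.send(message);   // every subscriber to the topic receives the event
            } finally {
                connection.close();
            }
        }
    }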

Logging: The SNS software requires logging for a variety of purposes (debug logs, access and usage logs to meet auditing requirements, etc.). The logging mechanism is also used for runtime monitoring. The SNS software is distributed across many machines, which necessitates distributed logging. The Java components of the SNS Portal use the Log4j API to generate log events. Log4j is a flexible Java API that is extensible and can be configured in various ways. Using Log4j, we are able to direct log events to various targets, from local log files to databases. Log information that needs to be retained for auditing, such as access logs, is sent to an SQL database. Information can also be sent through JMS, which allows any number of client components to monitor log events within the portal software. Critical log events in some components are also sent out via email to administrators. Basic debug information and log events are written to local files as a backup mechanism in case the distributed logging fails. Further, all of the above can be configured without any changes to the component producing the log events. A central Log Manager, under development, will provide access to logs collected in databases and allow browsing, filtering, and searching through a web-based interface.
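From a component's point of view this routing is invisible: the code only obtains a Log4j logger and emits events, while the file, JDBC, JMS and e-mail appenders are wired up in the external Log4j configuration. A small sketch follows; the category name, class name and messages are illustrative.

    // Sketch of a portal component using Log4j; which appenders receive these
    // events (local file, SQL audit database, JMS, e-mail) is decided entirely by
    // the external Log4j configuration, not by this code.
    import org.apache.log4j.Logger;

    public class WorkspaceDownloadService {
        private static final Logger log = Logger.getLogger("gov.ornl.sns.portal.workspace");

        public void download(String user, String path) {
            log.info("download requested: user=" + user + " file=" + path); // audit-style event
            try {
                // ... serve the file contents to the client ...
            } catch (RuntimeException e) {
                log.error("download failed for " + path, e); // critical events may also reach e-mail
                throw e;
            }
        }
    }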

8. PROJECT MANAGEMENT

The SNS portal development effort consists of an in-house core development team. In addition, there are other external collaborators from universities and non-profit organizations. Since the team is distributed across different sites, it was important for us to find ways in which different aspects of the portal could be developed independently of each other, and in parallel. This is accomplished using a combination of Subversion, Trac [16], Maven [7] and Continuum [8].

We use a Subversion (SVN) repository in combination with a ticket management system, Trac. We have set up several SVN and Trac combinations for the different aspects of the portal to help streamline the notification messages and maintain individual roadmaps for different portal components. To manage and build our Java projects, we decided to use Maven 2. Maven is a project management tool from the Apache Software Foundation based on the concept of a project object model (POM). Our motivation for using Maven was the easy sharing of build libraries through internal Maven repositories. Projects can elegantly define dependencies on each other through the POM. We created Maven repositories for the following purposes: snapshots, releases, and to deploy a number of third-party Java libraries that were not available through other Maven repositories. Source code inherited from other Java projects was converted into Maven projects and added to SVN. For all other external dependencies, we included the binary versions in our internal repositories to make them easily available for our development. For each external dependency, we created a POM that documented the source of the dependency and why it was included. Another desirable side effect of using Maven is profile management. For instance, during the development cycle, we are required to build and deploy the portal software on different machine configurations. Maven allows us to configure different profiles with settings specific to our development, test, or production setups. One example is the use of different authentication tools based on the machine the software is deployed on.

We needed a Continuous Integration (CI) system that would allow us to automate basic processes and centralize common operations. This can significantly reduce the learning curve for a developer setting up the project environment on a local development machine. We chose Continuum as our CI system due to its close integration with Maven. Each Maven project is added to Continuum for automatic builds every hour. In addition to the basic builds, additional goals have been configured that allow developers to deploy snapshots of artifacts to the internal repository and make them available to the rest of the team. The use of Continuum ensures that the deployed libraries correspond to a specific revision in SVN. The main projects are also configured with goals that allow push-button deployments to development, test, and production machines. Further, the use of Continuum for build and deployment obviates the need for every developer to have direct write access to Maven repositories.

9. CONCLUSION

SNS is a state-of-the-art neutron scattering facility that will revolutionize the way neutron science is conducted. Such a facility requires robust software catering to diverse user requirements, including data access, data analysis, visualization and secure remote access. In this article, we have presented the SNS software architecture and the portal development. The SNS portal is evolving into a one-stop shop for all of the above-mentioned user needs. We have discussed the design rationale and the Java-based implementation of the internals of the SNS portal. We have shown how Java has served as an excellent candidate for the portal web application, its front-end interface and the back-end abstractions that mask disparate implementations. Further, we have discussed in detail the design and implementation of SNS software components such as data management, application management, remote visualization, messaging, logging, authentication and authorization. A sizable portion of the code in all of the above has been written in Java, dealing with the conjoined use of databases and file systems for data management, high-performance distributed computing and access to clusters and national testbeds for application management, different authentication modes, and communication between these components. Our experience building the science portal and the back-end software components, and catering to a large user base, supports our choice of Java-based tools for a diverse set of tasks, ranging from software composition to visualization caching to distributed computing.

10. ACKNOWLEDGMENT

This work was supported by the U.S. Department of Energy, Office of Science, under contract No. DE-AC05-00OR22725 with UT-Battelle, LLC.

11. REFERENCES

[1] DAVE - Data Analysis and Visualization Environment. http://www.ncnr.nist.gov/dave/.
[2] Ganglia Monitoring System. http://ganglia.sourceforge.net/.
[3] IDL: The Data Visualization and Analysis Platform. http://www.ittvis.com/idl/.
[4] SLURM: A Highly Scalable Resource Manager. http://www.llnl.gov/linux/slurm/.
[5] WeirdX - Pure Java X Window System Server. http://www.jcraft.com/weirdx/.
[6] Apache ActiveMQ. http://activemq.apache.org, 2007.
[7] Apache Maven. http://maven.apache.org/, 2007.
[8] Apache Maven Continuum. http://maven.apache.org/continuum/, 2007.
[9] ESG: Earth System Grid. http://www.earthsystemgrid.org/, 2007.
[10] GridSphere Portal Framework. http://www.gridsphere.org/gridsphere/gridsphere, 2007.
[11] ISAW. http://www.pns.anl.gov/computing/isaw/, 2007.
[12] NeXus. http://www.nexus.anl.gov/, 2007.
[13] OGCE. http://www.ogce.org/index.php, 2007.
[14] Spallation Neutron Source. http://www.sns.gov, 2007.
[15] TeraGrid. http://www.teragrid.org, 2007.
[16] Trac. http://trac.edgewall.org/, 2007.
[17] J. Bester, I. Foster, C. Kesselman, J. Tedesco, and S. Tuecke. GASS: A data movement and access service for wide area computing systems. In Proceedings of the Sixth Workshop on I/O in Parallel and Distributed Systems, 1999.
[18] I. Foster and C. Kesselman. The Globus project: A status report. In Proceedings of the IPPS/SPDP '98 Heterogeneous Computing Workshop, 1998.
[19] M. Gleicher. HSI: Hierarchical storage interface for HPSS. http://www.hpss-collaboration.org/hpss/HSI/.
[20] A. Rajasekar, M. Wan, and R. Moore. MySRB & SRB - Components of a data grid. In Proceedings of the 11th International Symposium on High Performance Distributed Computing, 2002.
[21] A. Shoshani, A. Sim, and J. Gu. Storage resource managers: Essential components for the grid. In J. Nabrzyski, J. Schopf, and J. Weglarz, editors, Grid Resource Management: State of the Art and Future Trends, 2003.