10.13 network-basedvisualization usingthedistributed...
TRANSCRIPT
-
10.13
1 . INTRODUCTION
Network-based Visualization Using the Distributed Image SpreadSheet (DISS)
Geophysical digital libraries are now approachingpetabytes in size with data accumulating from the newgeneration of Earth Observing System (EOS) high datarate satellite sensors [1] . Visualization tools that candirectly access data from digital libraries or other networkstorage devices without necessarily trying to create a"DAAC (Distributed Active Archive Center) on a desktop"will facilitate more thorough and timely use of data inremote sensing archives . The growth of large databases,hyperimaging satellites, multisource observations, andcomplex coupled earth-ocean-atom sphere modelscombined with the limited number of scientists thatanalyze this large volume of data, provides a strong needto develop new tools that combine visualization andanalysis to improve and leverage scientific productivity .These tools must be capable of manipulating gigabyte-sized scientific datasets now and terabyte-sized datasets inthe near term .
The DISS extends the Interactive Image SpreadSheet(IISS) software tool for accessing large distributed remotesensing digital libraries [2][3] . The DISS provides highspeed network access using http and ftp methods directly,combined with novel data compression schemes, andnetwork-based data caching for low latency and efficientbandwidth utilization . Both the DISS and IISS extend thepopular business spreadsheet paradigm from scalarquantities to multidimensional image and data arrays . TheDISS software tool uses a multidimensional spreadsheetarrangement of cells and frames for manipulatinggigabytes of multichannel image and model data, andsupports image and grid analysis algorithms . An exampleof 3D assimilated (HDF) data visualization is shown inFigure 1 . A unique DISS capability is to rapidly andsmoothly zoom, roam, animate and execute functions insynchrony for a large set of 2-D or 3-D data cells (e.g . Fig .1) . Large dataset visualization is facilitated usingmultiresolution tiling methods [4] . Other softwaresystems are beginning to adopt the multi-cell organizationof displays [5] that was pioneered in the IISS environment.
The DISS extends the IISS capability for handlingextremely large Earth science hyperdatasets on local filesystems, to take advantage of the new ultrahighperformance Internet network systems . Combined withBulk Data Service (BDS), a software extension to the
K. Palaniappan, A. F . Haslert J . Fraser, and M. ManyinfiDepartment of Computer Engineering and Computer Science
University of Missouri-Columbia, MO 65211t Laboratory for Atmospheres
NASA Goddard Space Flight Center, Greenbelt MD 20771
commonly used NFS (Network File System) protocol, theDISS allows for fast transfers of large sequentially readdata files . SGI developed BDS to allow NFS devices tofully exploit high performance networking technologies,such as Fibre Channel, HIPPI, and ATM. Native NFS,developed in an era when (10BaseT) Ethernet was state-of-the-art, is well suited for small or randomly accessed filesbut not optimized for large data transfers . For sequentialdata transfers, BDS delivers two to four times thebandwidth available with native NFS across channels .
Due to the wide range of performance capabilities andrequirements, many visualization and analysis tools havebeen developed [6] . The DISS and IISS tools enhancehuman perceptual abilities for the visualization andanalysis (visanalysis) of extremely large multidimensionalgeophysical datasets - image, observation or model"hyper-datasets" . We emphasize the fact that bothvisualization and analysis are equally importantcomponents, in the scientific study of complex phenomenausing multiple data sources and large data volumes bysynthesizing the term visanalysis in 1992 . The term hyper-dataset is used to refer to the fact that modern remotesensing datasets have increased resolution in severaldimensions such as temporal, spatial, spectral orradiometric quantization . Visanalysis tools offer a flexibleapproach to the exploratory analysis, understanding andqualitative assimilation of complex datasets .
The DISS uses high performance networks to enablescientists to study and compare the large volumes of dataoften in a near realtime mode. Highly interactive visualbrowsing tools combined with 2-D and 3-D graphicsrendering in each frame of the DISS have beendemonstrated to be highly successful for quicklyinspecting thousands of separate datasets . The DISS hasbeen successfully tested in a geographically distributedheterogeneous environment using high performancenetworks such as the NASA NREN, NSF vBNS, Internet2 .The rest of the paper describes the DISS methods,protocols, performance and tradeoffs for remote dataaccess .
2. DISS ENVIRONMENT FOR REMOTE ACCESS
The DISS Environment is currently being used toanalyze remote sensing multispectral data sets fromNOAA/AVHRR, GOES/VISSR, Meteosat, GMS, FY-2,
17TH CONFERENCE ON IIPS
399
Reprint Seventeenth Int. Conf. on Interactive Information and Processing Systems (IIPS) for Meteorology, Oceanography and Hydrology, 81st AMS, Jan. 14-19, 2001, Albuquerque, NM, American Meteorological Society, 2001, pp. 399-403.
-
ERS-1/ATSR, Nimbus-7/TOMS, Landsat TM,DMSP/SSMI, TRMM/TMI, TRMM/PR Earth remotesensing satellite instruments [2][3] . Products that havebeen derived from NOAH/AVHRR and Landsat TM data,using the DISS, include color composites, vegetationindices, perspective and stereo views [3]. The products canbe combined, frames selected within a cell, framesanimated interactively and image formulas evaluated . Thespreadsheet-based visual interface (see Figure 1) has beenfound to be extremely intuitive and highly productive sinceit reduces the need for a user to explicitly deal withinput/output based programming . The resulting DISSEnvironment provides the scientist with an effective andpowerful visualization tool for concentrating on algorithmdevelopment and data exploration . The DISS has beenused to successfully prototype and evaluate theperformance and utility of using high speed wide areanetworks such as NREN and vBNS for visualization oflarge remote sensing datasets (Figure 4).
The DISS Environment uses a 3-D arrangement ofelements to accommodate organizing large datasets in thetime or channel dimension . The object oriented view isshown in Figure 2 . Note that in the current version of theDISS the collaborative or remote access component is onlyhandled at the Frame level and there is no Book object(several separate sheets comprise a Book) ; these are newfeatures that are being added . Cells contain a stack offrames so each page or layer of the spreadsheet canmanipulate an independent group of data with the addedflexibility that relationships between spreadsheet cells andlayers can be directly specified . Each frame of the imagespreadsheet contains one or more multidimensional datasets to be visualized as shown in Figure 3 . Visualizationdata sets include raw and processed satellite imagery,graphical (vector) data, surface and terrain models, andthree-dimensional volumes . The DISS supports distributednetwork access to datasets and computation capabilities .
DISS Application
BookSheet (Collaboration)
FrameData
Fig . 2 Hierarchy of spreadsheet objects and terminology .
2.1 Remote and Compressed File Access
DISS uses the common Internet protocols FTP (filetransfer protocol) and HTTP (hypertext-transfer protocol)for remote data access . Virtually any web-accessiblemachine running an FTP or HTTP server can act as a dataserver for DISS since these are widely supported Internetprotocols . Remote data is accessed within DISS using thefamiliar universal resource locator (URL) syntax within aformula or read panel in place ofa file name . For example,
400
AMERICAN METEOROLOGICAL SOCIETY
Read[filename->" ftp ://rsd .gsfc .nasa.gov/pub/Weather/GOES-8/jpg/vis/4km/01atest.jpg"]
is a generic read formula where fi lename is a URL.The Read[] operator installs the data into a frame exactlyas though the data were local, but instead the data isretrieved transparently via FTP from a remote machine .HTTP' can be used identical to FTP as in the example . FTPis currently restricted to retrieving data from serversallowing anonymous FTP . Enabling anonymous accessavoids the inconveniences associated with userauthentication . For instance, if a user were simultaneouslyaccessing data from 10 distinct FTP sites, and each siterequired authentication, the user would have to supply 10username/password pairs . Security can still be tightlymaintained using TCP wrappers to monitor RPC requestsand restricting p o r t m a p services using the/etc/hosts . allow file to control client access .
Remote access to data within the DISS frameworkbenefits the user by enabling convenient sharing of resultsand visualizations for collaboration, and by allowingtransparent access to the latest near-realtime data . Userscan generate a DISS header file in which all datareferences are remote and specified by URLs (FTP orHTTP) . Sharing these results with other users, particularlyusers not local to the data, is easily done by sharing thetext-only DISS header file . When all data references in theheader are remote, then data is exchanged only betweenthe user and the remote sensing archive since the retrievaluses the embedded URLs. Similarly, a DISS header can beconstructed to use the latest near-realtime data providedthe site serving the data has made provisions for providingthe latest data through a single named source . Forinstance, in the Read[] example the file named Olatestjpgwould probably exist as a named pointer to the most recentdata . Thus, any formulas referencing this named pointerwould always use the latest data available at that Web site .
2.2 Data Compression, File or Data Caching, andFormulas
In addition to providing network data access viaHTTP and anonymous FTP, DISS provides automaticdecoding of gzip or Unix compressed file formats [7] . Thedecoding is transparent to the user allowing direct accessto compressed data within any read formula operator.Here a digital orthophoto quarter quadrangle (DOQQ) isread from the Missouri ICREST site :Read band[filename->" http ://davis .geog .missouri .edu/icrest/data/dogq/o3709354nws .bip .gz",header-size->24752, src xdim->6188,src-ydim-> 7563, flip->TRUE]Both gzip and Unix compress formats are useful becausethey provide lossless compression of any data type . Gzipsupport is provided through the zlib library [7] . Bothremote access and compressed file reading can be used in
Reprint Seventeenth Int. Conf. on Interactive Information and Processing Systems (IIPS) for Meteorology, Oceanography and Hydrology, 81st AMS, Jan. 14-19, 2001, Albuquerque, NM, American Meteorological Society, 2001, pp. 399-403.
-
conjunction with remote access for efficient networktransport . Data compression decreases data size thusdecreasing access time, while maintaining data fidelity .Scalability to extremely large datasets such as the USLandsat mosaic require data reorganization, progressivetransmission, error resilience in addition to compression .
In a typical DISS session, a single file will usually beaccessed many times . For instance, a multispectral datasetsuch as a multiband Landsat product may be accessedseveral times by different operators . To prevent retrievingor decompressing the same file for each frame access, aper-session cache is kept of all remote and compressed fileaccesses . Internally DISS maintains a mapping of URLsand compressed filenames to local files in temporarystorage that may be on a network attached storage deviceas shown in Figure 4 . After a file has been retrieved ordecompressed once, each subsequent access is made fromthe cache . Pseudo code for the caching of remote orcompressed files is shown below :
cache_file( filename ) {if( is-remote(filename) OR
is compressed(filename) )if( in cache(filename) )
return( cached_filenameelse f
if( is remote(filename)retrieve fileput file incache(cached_filename)
if( is compressed(filename) )decompress fileput file in cache
return( cached_filenameelse
return( filename ) )
2.3 Disk Cache vs. Memory Pool
Currently, remote and compressed file reading is supportedfor the following operators : Read[], Read_HDF[],Read_HDFL2[], Read_multiband[], Read__band[],Read MPEG[], Read_VSLCCA[], Volume[] . The cachedfile may reside on a local filesystem or on a networkattached storage device close to the end user for fasternetwork access (Fig . 4) . All of these operators implementthe cached remote and compressed access using themethod described above . Using this single point ofentrance for caching provides an extensible framework foradding other data types that require caching . Shouldanother form of data ingest require caching support,merely changing the above caching function would addthat functionality to all the operators currently supportingcaching . Through these operators any (compressed) filecan be read remotely including raw format, MPEG,VSLCCA, Vis5D, PNM, SGI, JPEG, GIF, and TIFF.
While all remote and compressed accesses are cachedto disk, two operations cache to memory :
Read_VSLCCA[] and Read_MPEG[] .
Whenever aVSLCCA or MPEG movie sequence is read, it is placedinto an in-memory pool (similar to the colormap pool) .Any references to a frame in that sequence actually pointsinto the pool . Any additional references to the movie orvideo file resolves to the frames already in memory . Onlywhen no references to a video sequence remain, is thememory freed and removed from the pool .
When a movie sequence is retrieved remotely orcompressed, it is both cached to disk and cached inmemory . First the sequence file is retrieved ordecompressed and saved locally to disk . Then thesequence is read into memory and it is placed in the moviepool . At some point when there are no longer anyreferences to the sequence, it is removed from the memorypool, but the local disk copy still remains . Any furtherreferences to the sequence will cause it to be read from thelocal disk cache and placed into the memory pool . Thissupports better interactive access to movie sequences thanvideo streaming (frames are discarded after viewing) but atthe expense of additional storage requirement close to user.
3. EXPERIMENTS
The DISS was selected, as a testbed application todemonstrate the potential of the US Next GenerationInternet (NGI), the NASA Research and EducationNetwork (NREN), the National Science Foundation Veryhigh speed Backbone Network Service (vBNS), andInternet 2 . Early NGI prototyped applications exploitingthe capability of a 1000 times faster Internet than the onein place today . The DISS environment was developed toparticipate in these prototyping efforts [8],(http://apps . internet2 .edu/demos98/diss .htm) .
Early demonstrations of the DISS were completed inJune 1997 at the National Center for AtmosphericResearch (NCAR) in Boulder, Colorado, as part of theGlobal Observation Information Network (GOIN)workshop, a US-Japan bilateral demonstration ofinformation networks . The user client workstation was aremote visualization terminal at NCAR to access dataremotely at GSFC, about 1500 miles away, from a highperformance storage system via a high performance WAN.The WAN connectivity included a NASA NRENconnection at OC-3 (155 mbs) rates with connectivity toNCAR via a vBNS connection at San DiegoSupercomputing Center at OC-12 (622 mbs) rates andFiber Distributed Data Interface (FDDI) network interfacesfor the server and client . The slowest link was a DS-3 linethrough ESNet . In September 1997 at NASA AmesResearch Center in Moffett Field, California, DISS wasdemonstrated using Asynchronous Transfer Mode (ATM)end-to-end for the first time . An OC-3 connection usingNREN's network via the Sprint ATM cloud wasestablished between GSFC and Ames [8] Applicationperformance of the DISS dramatically improved by afactor of ten using ATM end-to-end but only a fraction of
17TH CONFERENCE ON ZIPS
401
Reprint Seventeenth Int. Conf. on Interactive Information and Processing Systems (IIPS) for Meteorology, Oceanography and Hydrology, 81st AMS, Jan. 14-19, 2001, Albuquerque, NM, American Meteorological Society, 2001, pp. 399-403.
-
the total available OC-3 bandwidth was harnessed .Significant increase in bandwidth utilization (64 to 100mbs) was achieved in August 2000 and August 1999 at thelatest NREN workshops and included using SGI'sVizserver for remote framebuffer display, Bulk DataService (efficient NFS) for remote file access,multithreaded image spreadsheet for improved i/operformance, and tuning TCP/UDP window sizes [8] .
For realtime direct broadcast, the data rate for GOES-8 and GOES-9 receivers is 2 GB/hour or 4 Mb/s . TheTerra EOS direct broadcast (X-band, 8.2125 GHz) datarate for MODIS will be 5.9 GB/hour or 13.125 Mb/s. Theprocessed data rate will typically be 16 times higher or 384Mb/s for GOES (4 new floating point 32-bit parameterscalculated for each 8-bit pixel) . Both the raw andprocessed data streams may need to be distributed toseveral sites for processing and visualization and becontinuously available . Jitter and bulk data transfer ratesare the key issues and are relevant only if the availablebandwidth is sufficient . Using network storage devices thedata streams could be buffered for several hours to reducethe bandwidth requirement . The DISS is used to visualizedirect broadcast data in near realtime .
The collaborative component of the DISS for access togeospatial datasets several modes are feasible : (i) the user,the data, or both are remote from the server, (ii) thevisualization will be precomputed ("canned") or observedin real time ("interactive"), (iii) the portions of the totalvisualization workload are partitioned between the clientand server systems . In each mode the primary dataexchange will be a combination of video, distributedgraphics language (DGL), or distributed files . This affectsnetwork QoS requirements (both the total data volumetransferred and the transfer rate required) and client systemvisualization and processing requirements . In the videomode, the high performance server does the renderingprocessing, generating the video display, and sending it tothe client over the network as video . A rate of 5 frames/secfor is needed for interactivity . Using a 1920 x 1035 wide24" monitor display with 48 bits per pixel requires avisualization video bandwidth of 480 Mb/s . Imagecompression reduces bandwidth requirements (i .e . the SGIVizserver) . For mission critical applications such as AirForce weather simulations the acceptable jitter would beless than 5 %. Latency needs to be less than 0.5 secondsfor interactivity and update of user mouse and keyboardinputs . In the DGL or vector mode, the client sends lowerbandwidth geometry or polygon information along withuser interactions to the server, and the server interpretsthese user commands and processes the datasets intogeometry or vector data which are sent over the network tothe client using geometry compression for efficientbandwidth use . Performance evaluation is continuing .
402
AMERICAN METEOROLOGICAL SOCIETY
4. SUMMARY
The utility of the DISS as an interactive scientificvisualization tool has been demonstrated for efficientquality control of large datasets, organizing a large volumeof multidimensional time varying multisource data,understanding interrelationships between complexdatasets, rapid prototyping of algorithms constructed byalgebraic operations, or comparing model data withobserved data . The DISS has been successfully tested forlarge data set visualization over high performancenetworks using FTP and HTTP servers and incorporatinglossless data compression . The DISS has also been usedfor collaborative visualization of realtime data fromseveral geographically dispersed sites . Network-based diskand memory caching significantly improves performanceand interactivity for visualization .
ACKNOWLEDGEMENTS This work was supported inpart by NASA/GSFC Award NAG-5-3900, NASA MTPEAward NAG-5-6968, NAG-5-6283, NAG-13-99014, andthe NSF vBNS Award 9720668 . Andy Germain (NASAGSFC) and Dick DesJardins (NASA Ames) wereinstrumental in the NREN demonstrations . John Harwellassisted with remote access, and Shelley Zhuang addedMPEG and VSLCCA video support .
REFERENCES1 . Dodge, J ., "The Earth Observing System Direct broadcast and
receiving stations", Earth Observation Magazine, Apr 1999 .2. K. Palaniappan, A.F . Hasler, M . Manyin, "Exploratory
analysis of satellite data using the Interactive ImageSpreadsheet (IISS) environment", Ninth Intl. AMS ConfInteractive Information and Processing Systems, AmericanMeteorological Society, 1993, pp. 145-152.
3. Hasler, A . F ., K . Palaniappan, M. Manyin, and J . Dodge, 1994 :A High Performance Interactive Image Spreadsheet (IISS),Computers in Physics, Vol . 8, No . 3, 1994. pp . 325-342
4. Palaniappan, K ., J . Fraser, 2001 : Multiresolution tiling forinteractive viewing of large datasets, 17`" Int. Conf. onInteractive Information and Processing Systems (LIPS),Albuquerque, NM, Amer. Meteor . Soc .
5. Chi, E . H ., J . Riedl, P . Barry, J. Konstan, 1998 : "Principles forinformation visualization spreadsheets", IEEE ComputerGraphics & Applications, pp . 30-38.
6 . Szuszczewicz, E.P . and J.H . Bredekamp, VisualizationTechniques in Space and Atmospheric Sciences, NASA SP-519, Washington, DC ., 1995 .
7. Gzip and zip libraries . ftp ://ftp .cdrom.com/pub/infozip/zlib/zlib.tar .gz, http://artpacks.acid.org/pub/infozip/zlib/
8. "Distributed Image Spreadsheet for Earth and Space ScienceData", NREN HPCC Workshop V Gigabit Networking, Aug . 14-16, 2000, Workshop IV: Bridging the Gap, Aug . 10 -11, 1999, andWorkshop II. Tomorrow's Networking Applications Today, Sept .15-17, 1997, NASA Ames Research, Mountain View, CA .hnp ://www.nren.nasa.gov/eos distribution .html,hM://www .nren . nasa . gov/workshop4. html
Reprint Seventeenth Int. Conf. on Interactive Information and Processing Systems (IIPS) for Meteorology, Oceanography and Hydrology, 81st AMS, Jan. 14-19, 2001, Albuquerque, NM, American Meteorological Society, 2001, pp. 399-403.
-
diskcache
memorycache
Network Storage
User Workstation
uGOES/Imager
e
Realtime DBData Server
Powerbook
Terra/MODIS
Figure 1 Visualization of DAO GEOS-1 assimilated data using theextended DISS environment with Vis5D capabilities for studying theonset and extent of the 1988 Indian Monsoon . Cell Al and B1 showselected variables using isosurface and trajectory visualization .
Figure 3 Organization of cell data includes a list of frames contained inthe cell (circular doubly-linked list for framestack), pointer to currentframe and each frame containing original and display data information.
Figure 4 Distributed archives of large remote sensing imageryaccessed via the Internet using wireline and wireless computers.Storage nodes may be attached to servers or directly on network .
MobileMultimediaDevia
Realtime DBData Server
User Workstation
Network Storage
ArchivalData Serva
NASA DAAC
AVHRR TOMS Landsat TRMM
17TH CONFERENCE ON LIPS
403
Reprint Seventeenth Int. Conf. on Interactive Information and Processing Systems (IIPS) for Meteorology, Oceanography and Hydrology, 81st AMS, Jan. 14-19, 2001, Albuquerque, NM, American Meteorological Society, 2001, pp. 399-403.
-
Figure 1 Visualization of DAO GEOS-1 assimilated data using theextended DISS environment with Vis5D capabilities for studying theonset and extent of the 1988 Indian Monsoon. Cell A1 and B1 showselected variables using isosurface and trajectory visualization.
Figure 3 Organization of cell data includes a list of frames contained inthe cell (circular doubly-linked list for framestack), pointer to currentframe and each frame containing original and display data information.
Figure 4 Distributed archives of large remote sensing imageryaccessed via the Internet using wireline and wireless computers.Storage nodes may be attached to servers or directly on network.
Reprint Seventeenth Int. Conf. on Interactive Information and Processing Systems (IIPS) for Meteorology, Oceanography and Hydrology, 81st AMS, Jan. 14-19, 2001, Albuquerque, NM, American Meteorological Society, 2001, pp. 399-403.
Access the Conferences17th International Conference on Interactive Information and Processing Systems (IIPS) for Meteorlogy, Oceanography, and HydHelp