earth observing data systems in the internet era · 2018-03-31 · highlight earth observing data...



HIGHLIGHT

Earth Observing Data Systems in the Internet Era

Abstract There has been an unprecedented increase of capabilities in distributed data systems in the last few years, enabling different science and applications areas. The Internet and the World Wide Web afford users access to data at diverse distributed sites that would have been very difficult to reach in the past and were available only to specialists. In remote sensing itself, global Earth observing missions and operational satellites produce, and will continue to produce, large volumes of public domain data. The Internet and the World Wide Web allow these data to be accessed by a variety of scientists, applications experts, and the general public. Yet the unprecedented data volumes of such missions present a challenge to wide user access without either the higher bandwidths of future Internet systems or more focused, user-centered data production. The latter is best achieved in federated data systems. Both will be needed for future data systems. Here we explore different functionalities afforded by distributed data systems in Earth observations and associated interoperability options. As an example, we examine options in the Earth Science Information Partners Federation. Results of several tests of interoperability options applicable to federated systems are presented.

Introduction The existence of the Internet and the World Wide Web (WWW) has ushered in an era of wide access to information and data that was impossible even a few years ago. General-purpose search engines are still limited when it comes to providing, in a few steps, very specific, useful information to users; several searches are typically required to yield the desired results. It is clear, however, that users can now access data sets and information that before 1995 (when the National Science Foundation's NSFnet went public, opening the door for today's Internet) were reserved for specialists at government labs and a small number of academic institutions. Scientists, specialists, graduate and undergraduate students, high school students, and even the general public can now access, order, or even download data to their own systems for their own use. Along with the vast information contained in the WWW, the current Internet is being stretched by precisely this volume of information and usage, mostly of a commercial or private nature. This limits effective access to large data volumes by the specialists who benefited in the past. Scientists, researchers, and applications users need not just access to information and data, but efficient access. If a user requires data sets for a specific application that amount to, say, hundreds of megabytes or even gigabytes, general kilobit-per-second online access rates are inadequate. The user will have to order the data sets on hard media, and the advantage of fast, online access is lost.

These are some of the reasons why the clogging of the conventional Internet highways has led to the need for a faster, more sophisticated network, which is being created in much the same way as the original Internet was created, that is, through focused funding. Internet2 (I2) and the Next Generation Internet (NGI), two related initiatives, will allow transfer of high-priority information at a high rate while passing lower-priority information (for example, conventional e-mail) at lower frequencies (Finley, 1998). These initiatives will prove instrumental for distributed Earth observing data systems. Whereas Internet2 is the university community's response to the need to return to dedicated bandwidth for academic/research needs, the Next Generation Internet (NGI, 1998) is a government multi-agency R&D initiative.

Both I2 and NGI are benefiting from the existence of the very-high-performance Backbone Network Service (vBNS), a joint project between the NSF and MCI Telecommunications. vBNS is a nationwide network connecting many of the 120 I2 institutions. The present vBNS backbone runs at 622 Mbps (OC12), and most I2 members participating in the vBNS are connected at OC3 rates (155 Mbps), while some are connected at T3 and others at up to OC12 (Finley, 1998). It is expected that the existence of vBNS will accelerate and assist the implementation of both I2 and NGI. For example, both I2 and NGI are exploring ways to share in the establishment of GigaPoPs (gigabit-capacity points of presence), the main links in the future high-bandwidth Internet systems.
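As a rough illustration of the bandwidth argument, the sketch below estimates ideal transfer times for data sets of the sizes mentioned in the Introduction over links at the rates quoted above. The 56 kbps dial-up rate, the standard T3 rate of 44.736 Mbps, the decimal byte units, and the function name are assumptions added for illustration; protocol overhead and congestion are ignored, so real transfers would be slower.

```python
# Ideal (overhead-free) transfer times. The 56 kbps modem rate is an
# illustrative assumption; the OC3 and OC12 rates are those quoted in the text.
LINK_RATES_BPS = {
    "dial-up modem (assumed 56 kbps)": 56e3,
    "T3 (44.736 Mbps)": 44.736e6,
    "OC3 (155 Mbps)": 155e6,
    "OC12 (622 Mbps)": 622e6,
}


def transfer_seconds(size_bytes: float, rate_bps: float) -> float:
    """Time to move size_bytes over a link of rate_bps bits per second."""
    return size_bytes * 8 / rate_bps


if __name__ == "__main__":
    for label, size in [("100 MB", 100e6), ("1 GB", 1e9)]:
        for link, rate in LINK_RATES_BPS.items():
            seconds = transfer_seconds(size, rate)
            print(f"{label:>6} over {link}: {seconds:,.1f} s")
```

Under these assumptions a 1 GB data set needs roughly 40 hours over the modem but about 13 seconds over an OC12 link.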

In addition to the network challenge, we focus our attention on global Earth observing systems, the associated Earth science data sets, and the challenge they present to both distribution and "mining" of the information they contain. Although each science field may not face problems identical to those of Earth science, many of the conclusions we will draw apply to any distributed science data system with large data holdings.

540 May 1999 PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING


Distributed Earth Observing Data Systems - An Example: The ESIP Federation Distributed Earth observing data systems are benefiting from the existence of the Internet. As stated above, though, the very success of data distribution and relatively easy access parallels the clogging of the Internet highways. Because remote sensing satellites, both commercial and government-sponsored Earth observing systems, are producing ever higher data rates and data volumes at centers, the problem will become more acute. For example, commercial hyperspectral remote sensing (HSRS) satellites will be utilizing hundreds of channels with resolutions of tens of meters, producing hundreds of gigabytes of data for even small regional coverage. Earth observing satellites with global coverage, such as NASA's Earth Observing System Terra satellite and Landsat 7 (both to be launched in 1999), will be producing data at rates approaching 1 terabyte per day. Even existing "smaller" missions, like the joint NASA/NASDA Tropical Rainfall Measuring Mission (TRMM), are producing data at rates of tens of gigabytes per day. Current data holdings at NASA's and NOAA's data centers exceed many terabytes. The future era of advanced or high-technology remote sensing systems will produce a wealth of data rich in information. The challenge is to harvest a non-negligible proportion of these data holdings for the benefit of science and society at both the national and global levels.
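To put the quoted data rates in network terms, the following sketch converts a daily data volume into the sustained bit rate needed to move it as fast as it is produced. Decimal units (1 terabyte = 10^12 bytes) and the function name are assumptions for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_mbps(bytes_per_day: float) -> float:
    """Average rate in Mbps needed to ship a daily volume within one day."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e6


if __name__ == "__main__":
    print(f"1 TB/day  -> {sustained_mbps(1e12):.1f} Mbps")
    print(f"10 GB/day -> {sustained_mbps(10e9):.2f} Mbps")
```

At about 93 Mbps, a single copy of a 1 terabyte per day stream would occupy more than half of an OC3 link (155 Mbps) around the clock.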

NASA's Earth System Enterprise (ESE) is building a data system to serve its present and future data holdings. Termed the Earth Observing System Data and Information System (EOSDIS), it relies on a central information management system (IMS) distributed to a variety of Distributed Active Archive Centers (DAACs), which produce, store, and distribute data sets according to specific focused science areas (Asrar and Greenstone, 1995). Although EOSDIS contains both distributed and centralized aspects, it was designed at a time when a centralized approach was deemed to be more effective. The advent of the Internet has changed the overall situation. In response to the 1995 recommendations of the National Research Council (NRC), NASA has augmented the existing EOSDIS with a federation of information and data providers, called the Earth Science Information Partners (ESIPs). Specifically, the 1995 NRC study of the U.S. Global Change Research Program (USGCRP) and NASA's Mission to Planet Earth (the present ESE) recommended that NASA augment its current EOSDIS, which relies on a core architecture, with a federation of Earth science information partners (BSD, 1995). The NRC recommended that "the responsibility for product generation and publication and for user services should be transferred to a federation of partners selected through a competitive process open to all." NASA responded by funding the ESIP federation (NASA, 1997), consisting of 12 Earth science-based ESIPs (also known as ESIP-2's) and 12 value-added, or extended communities, ESIPs (also known as ESIP-3's). Subsequently, the current NASA DAACs, which are responsible for producing, archiving, and distributing NASA ESE data products, also joined the federation (as ESIP-1's). In the context of the ESIP federation (herein termed the Federation), the DAACs continue to provide baseline services of low-level data production, archiving, and distribution. The selected ESIPs are drawn from academia, government, and the private sector. They are charged with distributing and archiving baseline data and information (ESIP-1's); creating specialized scientific products for the Earth science and global change research communities (ESIP-2's); and developing innovative, practical applications of Earth science data for the broader community by producing value-added products (ESIP-3's). Federation members are expected to use information technology that is advanced, scalable, and evolutionary. The description of the individual ESIPs and the overall federation can be found at http://www.ceosr.gmu.edu/-esipfed.

One may examine how the Internet will facilitate the Federation. Its members, the individual ESIP systems, are expected to serve individual user communities with diverse needs and required data and information products. A centralized approach cannot be designed to serve diverse and ever-changing needs. Only a federated approach that relies on specialized data systems that respond to, and are in close contact with, their user communities can hope to succeed. Moreover, a distributed federation consisting of at least 33 members (it is expected that the membership of the Federation will only increase with time), serving hundreds, and more likely thousands, of diverse users cannot rely on a specialized, dedicated network. Because of their specialized communities' needs, individual ESIPs and clusters of ESIPs need to be designed to address the complex queries emanating from scientific and applications data exploration, analysis, and data mining tasks. Moreover, due to the distributed nature of the Federation and the requirement for its members to interoperate, support for data, query, and function exchanges is important. Although the basic Internet can serve some basic data access and query functions, the more advanced data exchange and interoperable functionalities require quality of service (QoS) and higher bandwidths that will likely exist in both I2 and NGI. This brings up many implementation and performance issues and trade-offs.

The Federation is following a distributed approach within its own boundaries, forming a "federation of federations," wherein clusters of ESIPs and other related partners are coming together for more efficient and rapid responses to diverse user communities' needs. The issues stated above will become even more acute as these clusters or "mini-federations" form.

Federation Interoperability Options The interoperability options for Federation partners, as specified by the original NASA cooperative agreement notice (CAN) under which the ESIPs were funded, are shown in Figure 1. These options include both catalogue search protocols and data search and access solutions.

To enable these options, a variety of interoperability modes, which themselves imply different implementations, are clearly available.

It is clear from the above discussion that the Federation has many options open. At the lowest end, Federation partners could choose the minimum interoperability afforded by data advertisement or directory services such as the Global Change Master Directory (GCMD), moving up to catalogue services as provided by the Oak Ridge National Laboratory DAAC's MERCURY system and the CEOS Interoperability Protocol (CIP). The latter has more extended services as well. If data access and exchanges are needed, the Distributed Oceanographic Data System (DODS) or part of the Seasonal to Interannual ESIP (see below) could be utilized. If full functional transparency is needed, tightly interoperable solutions will need to be found. As clusters form, it is clear that different cluster requirements (driven by their combined user communities) will drive the interoperability designs. These will also determine whether the existing Internet can serve these needs (likely if only advertisement services and rudimentary catalogue services are needed) or whether I2 will be required (see below). To a large extent, which options are adopted depends on what the Federation and its clusters are attempting to accomplish. It is likely that "tighter" interoperability options will be pursued by individual clusters of ESIPs that are naturally working together, whereas minimal solutions such as GCMD directory and advertisement search options are likely to be adopted by the Federation as a whole.


Figure 1. Interoperability in the WP-Federation (from the CAN). Legible figure text: Federal Geographic Data Committee (FGDC) Clearinghouse, an activity for distributed search and retrieval of digital geospatial data from multiple sites using a common search vocabulary.

Figure 2. Spectrum of Degree of Interoperability for ESIPs. Modes of ESIP Interoperability (figure text, partly illegible):

* Directory (GCMD) - [illegible] to become compliant with FGDC requirements. It would make ESIP data much easier to find.

* Common front - There could be an HTML page that is the "front" to all ESIPs. It would explain the federation and direct users to which ESIP to visit. [remainder illegible]

* Transparency - Each ESIP would mesh smoothly with each other, so the user [illegible] is not aware of any distinction between ESIPs.


A Specific Distributed System in the Federation The Seasonal to Interannual Earth Science Information Partner (SIESIP) is one of the 12 ESIP-2 systems. SIESIP is described in detail elsewhere (Kafatos et al., 1998; Kafatos, 1999). Here we concentrate on some interoperability and query aspects of this ESIP that might have general relevance for the Federation.

The SIESIP project focuses on serving the data and information product needs of seasonal to interannual (S-I) scientists and associated process studies specialists and interdisciplinary scientists. It can also serve as an appropriate prototype for the Federation due to its distributed nature and representative functionality. A distributed architecture of three sites forms the mini-federation, namely George Mason University (GMU), the Center for Ocean, Land, Atmosphere Studies (COLA), and the NASA Goddard DAAC. The SIESIP architecture hides the distributed implementation details from the user and yet supports many functionalities, including online data search, data analysis, and data orders. In addition, SIESIP supports many complex queries for diverse user communities, including content-based queries through a multiresolution representation of data using statistical summaries of important geophysical parameters (Li et al., 1998). An extended range of operations triggers different types of data transfers among the three sites, which use different hardware and software resources for their local implementations. This provides for autonomy and enhancement of local capabilities, constituting a mini-federation that could serve as a prototype of the general Federation. Queries under SIESIP can range from simple data orders to data mining tasks that may require on-the-fly integration of physically dispersed data.

The relationships between the three main partners (Figure 3) extend from metadata exchanges (thin arrows) and data exchanges (medium arrows) to full functional exchanges (thick arrow). The interoperable operations and current SIESIP consortium capabilities (such as working prototypes at GMU; see Kafatos et al., 1997) are to be enhanced to serve the specific Earth science S-I community and to provide an innovative information technology query engine and implementation of a working distributed system.

Figure 3. SIESIP as a Prototype for the Earth Science Federation

At the heart of the query engine and system analysis is the Grid Analysis and Display System (GrADS) (Doty et al., 1997). A variety of data products are distributed among the members of the mini-federation and accessible via the WWW (Kyle et al., 1998; UDel data). One particularly useful data collection is the Climatology Interdisciplinary Data Collection, available on four CDs, which can be ordered/requested from the Goddard DAAC or from GMU (see also http://siesip.gmu.edu/data.html).

Images of associated parameters for each plot (either multiple GIF displays or animations) can be supplied on demand via the Internet. Moreover, correlation coefficients, means, standard deviations, and other statistically derived parameters obtained from content-based browsing can form a set of new metadata for the system (Li et al., 1998; Kafatos et al., 1998).

The SIESIP functionality and queries allow online data search, analysis, and order. Searches can be performed via the WWW based on regular metadata or on data contents. Since SIESIP supports data pyramids of different resolutions (Li et al., 1998), a specific resolution can be selected for browsing. With multiple-resolution data, users can start from low-resolution images and drill down to higher-resolution ones. In this way, users can browse data covering a large spatial and temporal range and then focus on a small range of interest based on the previous browsing result. Because users work with data of increasing resolution only as they get closer to the target data, the data volume does not increase rapidly.


Therefore, data mining on a large volume of data can be performed at a reasonable speed. This functional support allows quick queries of potentially large data volumes using the current Internet. To support these diverse types of functions, SIESIP search is designed around three types/phases of queries that focus on catalogue search, analysis and content-based search, and data ordering, as follows.

Phase 1: Using the metadata and browse images provided by the SIESIP system, the user browses the data holdings.

Phase 2: The user gets a quick estimate of the type and quality of data found in phase 1. Analytical tools are applied, including statistical functions and visualization algorithms available via the WWW through SIESIP. The SIESIP interface incorporates a spectrum of statistical data mining algorithms. We have also begun to implement tools for finding positive correlations, providing a realistic, human-aided data mining capability. The use of analysis tools such as GrADS to aid the search is also incorporated in this phase.

Phase 3: The user has located the data sets of interest and is ready to order. If the data are available through SIESIP, it will handle the data order; otherwise, an order will be issued to the appropriate data provider on behalf of the user, or the necessary information will be forwarded to the user for this task.

The three-phase data search and order functions are an integral part of the SIESIP consortium. Each node (GMU, COLA, and GDAAC) performs one or more of these search and order functions. The SIESIP consortium is, therefore, an IT implementation of the needed functionalities of the distributed system to serve our communities (http://www.siesip.gmu.edu).
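The three phases can be summarized in a small sketch. The class and function names, the data set names, and the returned messages are hypothetical and serve only to illustrate the phase 3 rule described above; they do not reproduce the SIESIP implementation. Treating "no known provider" as the case in which information is forwarded to the user is also an assumption.

```python
from enum import Enum


class Phase(Enum):
    CATALOGUE_SEARCH = 1  # browse metadata and browse images
    ANALYSIS = 2          # statistics, visualization, content-based search
    DATA_ORDER = 3        # order the data sets that were located


def handle_order(dataset_id: str, local_holdings: set, provider_of: dict) -> str:
    """Phase 3 rule: serve locally if possible, otherwise pass the order on."""
    if dataset_id in local_holdings:
        return f"order for {dataset_id} handled locally"
    provider = provider_of.get(dataset_id)
    if provider is not None:
        return f"order for {dataset_id} issued to {provider} on the user's behalf"
    # Assumed fallback: no provider is known, so the user receives the details.
    return f"ordering information for {dataset_id} forwarded to the user"


if __name__ == "__main__":
    holdings = {"precip_monthly"}                      # hypothetical names
    providers = {"sst_weekly": "external provider"}
    for name in ("precip_monthly", "sst_weekly", "unknown_set"):
        print(f"{Phase.DATA_ORDER.name}: {handle_order(name, holdings, providers)}")
```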

The architecture of the SIESIP mini-federation is designed to support the queries in all three phases in a modular fashion. Specifically, there exist three types of servers, each serving queries in one phase (see Figure 4). In this architecture diagram, the disk associated with a server is located in the same physical location; that is, the communication between the server and its disk does not go through the Internet. A "Metadata Server" is for phase 1 (or metadata) queries, a "GrADS Server" is for phase 2 (or analysis) queries, and a "Data Order Server" is for phase 3 queries (or data set requests). There is also an experimental "Data Pyramid Server", which will be responsible for data mining queries. The Data Pyramid stores low-resolution data as well as some pre-computed statistics for fast processing of data mining (content-based) queries. An implementation of the SIESIP architecture can have many distributed servers, as well as user interface drivers, located at different physical locations, while a single physical system may host a number of different servers and interface drivers. As such, SIESIP covers a number of different federated services.
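A data pyramid of the kind described here can be sketched as repeated block averaging. This is a minimal illustration, assuming a 2x2 mean as the reduction and a square grid whose side is a power of two; the actual SIESIP pyramid construction and statistics (Li et al., 1998) may differ, and all names are hypothetical.

```python
def coarsen(grid):
    """Halve the resolution by averaging each 2x2 block of a square grid."""
    n = len(grid) // 2
    return [
        [
            (grid[2 * i][2 * j] + grid[2 * i][2 * j + 1]
             + grid[2 * i + 1][2 * j] + grid[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def build_pyramid(grid):
    """Return all levels, from full resolution down to a single cell."""
    levels = [grid]
    while len(levels[-1]) > 1:
        levels.append(coarsen(levels[-1]))
    return levels


def drill_down(levels, level, i, j):
    """Return the 2x2 block at the next finer level beneath cell (i, j)."""
    finer = levels[level - 1]
    return [row[2 * j:2 * j + 2] for row in finer[2 * i:2 * i + 2]]


if __name__ == "__main__":
    data = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    pyramid = build_pyramid(data)
    print([len(level) for level in pyramid])  # grid side at each level
    print(pyramid[-1])                        # coarsest summary
    print(drill_down(pyramid, 2, 0, 0))       # one step finer
```

Each drill-down step in this sketch touches only four cells, which mirrors why browsing from coarse to fine keeps the transferred data volume small.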

Figure 4. Conceptual Architecture of a SIESIP Site (legible labels: Metadata, Data Sets, Data Pyramid)

The system supports all functionalities through the same interface. A GUI applet based on the Java Swing package appears in a web page. Using the window-like GUI with menu lists, folders, and more, users can search metadata, make spatial and temporal selections, and submit analysis or order queries.

Implementation Options and Results In this section we examine some of the technologies available for implementing an interoperability layer for a single ESIP consisting of more than one site (such as SIESIP) or a cluster of ESIPs in the overall Federation. In doing so, the growing power of the WWW becomes most important. As the WWW continues to grow, it is asserting its role as an essential, easy-to-use mechanism for delivering information of different modalities in a geographically distributed system. In addition to its ease of use and wide availability, the Web lends itself to the client-server architectural model.

To make data sets available online for a cluster of ESIPs or a mini-federation, data sets and related metadata should be transferred to the system from the original data sources or data producers. For example, metadata ingest at the data product level could be achieved through HTML forms and CGI programs. Data providers could then provide a minimum amount of information regarding the data products by filling in appropriate Web forms. The system operator would then run another program to insert the information captured by the CGI program into the DBMS. The increasing popularity of XML might provide an easy metadata capture protocol that would [word illegible] anything the data provider (often a scientist) would have to do. To ingest metadata, one should consider whether the exchanges are asynchronous, which is often necessary to access data stored in nearline tape libraries, or whether the data are online, in which case the metadata will include the Universal Resource Locators and a simple FTP (or HTTP) transfer to the client can be utilized.
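The form-based ingest path described above can be sketched as follows. This is an illustrative stand-in, not the SIESIP code: it parses a URL-encoded form submission, as a CGI program would receive it, and inserts the record into a database. SQLite stands in for the unspecified DBMS, and the field names, table layout, and example URL are hypothetical.

```python
import sqlite3
from urllib.parse import parse_qs

REQUIRED_FIELDS = ("title", "provider", "url")  # hypothetical minimum set


def parse_submission(form_body: str) -> dict:
    """Decode a URL-encoded form body and keep the required fields."""
    fields = parse_qs(form_body)
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return {name: fields[name][0] for name in REQUIRED_FIELDS}


def insert_metadata(conn: sqlite3.Connection, record: dict) -> None:
    """Store one metadata record; the table is created on first use."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata (title TEXT, provider TEXT, url TEXT)"
    )
    conn.execute(
        "INSERT INTO metadata (title, provider, url) "
        "VALUES (:title, :provider, :url)",
        record,
    )
    conn.commit()


if __name__ == "__main__":
    body = ("title=Monthly+precipitation&provider=Example+Lab"
            "&url=ftp%3A%2F%2Fexample.org%2Fprecip")
    conn = sqlite3.connect(":memory:")
    insert_metadata(conn, parse_submission(body))
    print(conn.execute("SELECT title, provider, url FROM metadata").fetchall())
```

Keeping the parsing step separate from the database insert matches the two-step flow in the text, where the operator runs a second program to load what the form handler captured.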

Recent developments in information technology have resulted in a number of distributed object architectures that provide the framework required for building and using client/server applications that use distributed objects. The framework also supports a large number of servers and applications running concurrently. Many such frameworks provide natural mechanisms for interoperability, for example, the Common Object Request Broker Architecture (CORBA) and Remote Method Invocation (RMI). CORBA is a product of an industry consortium called the Object Management Group (OMG). It is a set of specifications for providing interoperability and portability to distributed object-oriented applications. CORBA-compliant applications can communicate with each other regardless of location, implementation language, underlying operating system, and hardware. The RMI specification is a new API that lets one create objects whose methods can be invoked from a different Java Virtual Machine (JVM). The JVM may be running on the same physical machine or on a remote server. Thus, RMI basically provides the capability for calling methods on remote objects.
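The idea of calling a method on a remote object can be illustrated without Java. The sketch below uses XML-RPC from the Python standard library as a stand-in; it is neither RMI nor CORBA, and the service class, method name, and returned values are hypothetical. The server runs in a background thread on a loopback port chosen by the operating system.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class CatalogueService:
    """Hypothetical remote object exposing one catalogue method."""

    def count_granules(self, parameter: str) -> int:
        holdings = {"precipitation": 12, "sea_surface_temperature": 7}
        return holdings.get(parameter, 0)


def start_server() -> SimpleXMLRPCServer:
    """Start the service on 127.0.0.1 with an OS-assigned port."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_instance(CatalogueService())
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def remote_count(port: int, parameter: str) -> int:
    """Client side: the call reads like a local one but runs in the server."""
    with ServerProxy(f"http://127.0.0.1:{port}") as proxy:
        return proxy.count_granules(parameter)


if __name__ == "__main__":
    server = start_server()
    try:
        print(remote_count(server.server_address[1], "precipitation"))
    finally:
        server.shutdown()
        server.server_close()
```

The client never sees how the request is marshalled or where the object lives, which is the property that RMI and CORBA provide for distributed objects.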

We have chosen to examine the above two options in interoperability tests (although performed for our own ESIP, their applicability is not limited to it and has Federation-wide importance). We have also chosen to consider more basic techniques such as ftp and sockets. This selection was based on the potential for creating low-overhead protocols that may be suitable for a simple baseline class of Federation applications. These can also be used to implement more sophisticated interoperability standards such as Z39.50 and its CIP profile (see above). In order to assess the impact of using one technique versus another, we have experimentally studied the performance of CORBA and RMI, as well as lightweight but more primitive solutions such as sockets and ftp. These technologies were tested over 10 Mbps LANs as well as over the Internet. The tested scenarios considered up to 16 clients and two servers. The transferred object sizes ranged from 2 KB to 10 MB. Table 1 summarizes the testing conditions and the measurements that were collected.
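A measurement of this kind can be approximated with a small timing harness. The sketch below is not the authors' test code: it times raw socket transfers of several payload sizes over the loopback interface of one machine, so absolute numbers will not resemble LAN or Internet results. The tiny request protocol (the client sends the requested size as a text line) and all names are assumptions.

```python
import socket
import threading
import time


def serve_once(listener: socket.socket) -> None:
    """Accept one client, read the requested size, send that many bytes."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        while not request.endswith(b"\n"):
            chunk = conn.recv(1)
            if not chunk:
                break
            request += chunk
        conn.sendall(b"\0" * int(request.decode().strip()))


def timed_fetch(size: int) -> tuple:
    """Return (bytes received, elapsed seconds) for one transfer."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port chosen by the operating system
    listener.listen(1)
    worker = threading.Thread(target=serve_once, args=(listener,), daemon=True)
    worker.start()
    start = time.perf_counter()
    received = 0
    with socket.create_connection(listener.getsockname()) as client:
        client.sendall(f"{size}\n".encode())
        while True:
            chunk = client.recv(65536)
            if not chunk:  # server closed the connection: transfer complete
                break
            received += len(chunk)
    elapsed = time.perf_counter() - start
    worker.join()
    listener.close()
    return received, elapsed


if __name__ == "__main__":
    for size in (2 * 1024, 64 * 1024, 2 * 1024**2, 10 * 1024**2):
        got, seconds = timed_fetch(size)
        print(f"{size:>10} bytes: {got} received in {seconds * 1000:.2f} ms")
```

Repeating each size several times and averaging would give steadier numbers; a single run per size is used here only to keep the sketch short.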

Figures 5 and 6 show some of the characteristics of these different techniques. All of the compared methods seem to behave similarly up to 256 KB messages. Beyond that point, the overhead associated with CORBA and RMI becomes quite clear. Table 2 summarizes the results and indicates a number of important facts. CORBA seems to be four times slower than RMI, 10 times slower than sockets, and 40 times slower than ftp in a LAN environment when the requested objects are of significant size (2 MB in this experiment). This might be a typical size for a subset of a remote sensing (e.g., MODIS) data set. Due to the large overhead over the Internet, the performance gap between these technologies becomes smaller, less than one order of magnitude. However, the performance difference between these techniques remains high. In these experiments CORBA was not tested over the Internet. However, as understood from the LAN experiment, the performance of CORBA is much lower than that of RMI. It is possible to infer that CORBA's performance is perhaps three to four times worse than RMI's. This is only an estimate based on the fact that RMI was five times better in the LAN experiment and that the common overhead of the Internet environment is likely to bring the gap closer. FTP seems to be optimized to a system's parameters and therefore was able to perform better than sockets, which are not optimized by the operating system developers. Furthermore, managing a large number of sockets carries both a performance penalty and scalability limitations.
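The pairwise ratios implied by the LAN figures quoted above can be worked out directly. The sketch below takes the stated factors (CORBA four times slower than RMI, 10 times slower than sockets, 40 times slower than ftp) as given and derives the remaining comparisons; it uses only those three numbers, not the measured response times, and the names are illustrative.

```python
from fractions import Fraction

# Slowdown of CORBA relative to each technique, as quoted for 2 MB objects.
CORBA_SLOWDOWN = {
    "RMI": Fraction(4),
    "SOCKET": Fraction(10),
    "FTP": Fraction(40),
}


def relative_slowdown(slower: str, faster: str) -> Fraction:
    """How many times slower `slower` is than `faster`, using CORBA as reference."""
    return CORBA_SLOWDOWN[faster] / CORBA_SLOWDOWN[slower]


if __name__ == "__main__":
    for slower, faster in [("RMI", "SOCKET"), ("RMI", "FTP"), ("SOCKET", "FTP")]:
        factor = float(relative_slowdown(slower, faster))
        print(f"{slower} vs {faster}: {factor}x slower")
```

Under these figures RMI would be 2.5 times slower than sockets and 10 times slower than ftp, and sockets four times slower than ftp.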

Conclusions The above results indicate that in the emerging era of I2 and NGI, data transfers of a single or a few remote sensing images of tens to hundreds of megabytes (such as a MODIS Terra "granule" or a hyperspectral image) will require dedicated networks and high bandwidths, as in I2 and NGI. Even in supposedly high-bandwidth systems, Quality of Service (QoS) will be most important, where QoS refers to the realized communication bandwidth between two distributed sites. In today's Internet,

CONTINUED ON PAGE 562

Table 1.
Implementations: CORBA, RMI, SOCKET, FTP
Data Object Sizes: 2K, 64K, 2M, 10M
No. of Clients / No. of Servers: 1-16 / 2
Networks: LAN (10 Mbps) and Internet

Figure 5. Response Time Over the LAN [plot: System Response Time (LAN); plotted values not recoverable]

Figure 6. Response Time Over the Internet [plot: System Response Time (Internet), single user; series: RMI, SOCKET, FTP; x-axis: File Size (2K, 64K, 256K, 2M, 10M); plotted values not recoverable]

Table 2.
