A Distributed Approach to Computational Earthquake Science: Opportunities and Challenges



Advances in understanding earthquakes require the integration of models and multiple distributed data products. Increasingly, data are acquired through large investments, and utilizing their full potential requires a coordinated effort by many users, independent researchers, and groups who are often distributed both geographically and by expertise.

Andrea Donnellan, Jay Parker, Margaret Glasscoe, and Eric De Jong, Jet Propulsion Laboratory, NASA; Marlon Pierce and Geoffrey Fox, Indiana University; Dennis McLeod, University of Southern California; John Rundle, University of California, Davis; Lisa Grant Ludwig, University of California, Irvine

Earthquake science depends greatly on numerous data types, spanning spatial scales from microscopic to global and timescales from fractions of a second to millions of years. Integrating these many components creates a rich environment for applying cyberinfrastructure to study earthquake physics. The inadequate preparation and response to recent major earthquakes in Haiti, Chile, and Japan have shown that the field is ripe for transformation, and formerly isolated groups must work together more effectively. Data providers must better understand how geophysicists consume their data and fuse it with other data sources downstream. Geophysicists must understand how to convey their work to emergency planners and responders. Experts focused on the processes of particular geographic areas must find ways to translate their knowledge to other regions and research teams. All efforts must be focused on identifying and tackling grand challenges that span areas of expertise. Collaboration alone isn't enough: the field needs a common framework designed to foster the desired connections. This is especially imperative as datasets and sources grow and as new spaceborne missions and ground-based networks contribute a wealth of new data.

Numerous and growing online seismic, geologic, and geodetic data sources from government agencies and other resources around the world provide an exceptional opportunity to integrate varied data sources to improve our understanding of earthquakes. The data support comprehensive efforts in data mining, analysis, simulation, and forecasting. However, the uncoordinated but improving state of current data collections, the uneven robustness of data repositories, and the lack of formal modeling tools capable of ingesting multiple data types hamper earthquake research activities. A growing number of research groups and communities are recognizing the need to integrate heterogeneous data and models. The data used or produced by these groups fall at different points on the continuum from raw data to science models and decision-support products.



Such groups include but are not limited to QuakeSim, the Asia Pacific Economic Cooperation (APEC) Cooperation for Earthquake Simulation (ACES), Earthquake Data Enhanced Cyber-Infrastructure for Disaster Evaluation and Response (E-Decider), the Alaska Satellite Facility (ASF), Computational Infrastructure for Geodynamics (CIG), EarthScope, the Southern California Earthquake Center (SCEC), the Scripps Orbit and Permanent Array Center (SOPAC), and the University Navstar Consortium (UNAVCO). For more background information on each group, see the sidebar, "Roles of Various Research Groups."

Integrating geophysical modeling and earthquake forecasting tools with remote sensing data is useful for science analysis and for pre- and post-earthquake risk and/or damage evaluations.

Sidebar: Roles of Various Research Groups

The research groups that we mention in this article add value in different ways. They also focus on different parts of the end-to-end data flow. Data come in as raw products, and processing centers turn these into higher-level data products. Other organizations model the data and/or turn them into standardized output or interpretations.

The Scripps Orbit and Permanent Array Center (SOPAC) is located at the University of California, San Diego; the University Navstar Consortium (UNAVCO) is a university-governed consortium headquartered in Boulder, Colorado. SOPAC and UNAVCO archive geodetic data and data products, and both have map-based interfaces for selecting data products and viewing products summarized as velocity vectors and single-station time series. SOPAC is a major partner in the GPS Explorer Web portal (http://geoapp.ucsd.edu), which provides map-based access to GPS networks and stations, together with derived strain maps and extracted time-series parameters such as seasonal terms and earthquake-induced position jumps. UNAVCO serves GPS data products from the Plate Boundary Observatory and EarthScope, as well as satellite Interferometric Synthetic Aperture Radar (InSAR) raw data and lidar landscape images. SOPAC and UNAVCO focus on processing and delivering geodetic data products, not on integrating them with physics-based models.

The Alaska Satellite Facility (ASF) downlinks, archives, and distributes satellite data. Its mission is to promote, facilitate, and participate in the advancement of remote sensing to support national and international Earth science research, field operations, and commercial remote-sensing applications that benefit society. ASF is the designated Distributed Active Archive Center (DAAC) for several imaging radar data products, including data from NASA’s Uninhabited Aerial Vehicle Synthetic Aperture Radar (UAVSAR). ASF recognizes the need to provide high-quality data and services in a timely manner, which is a key requirement for groups such as QuakeSim.

NASA's QuakeSim project focuses on the computational infrastructure for modeling and mining remotely sensed and other earthquake-related data. QuakeSim is unique in that its Web environment integrates modeling applications with a fault database and with geodetic data products (GPS and InSAR). The observational data products are produced elsewhere (such as SOPAC or the ASF). The fault data are from published studies and catalogs, and the database is integrated with the model construction tools. The user environment assists in interpreting the observed deformation in terms of geophysical processes, such as the elastic crustal response to fault motion. Forward models and inversions are integrated with the geodetic data types, so that a model of deformation produces a synthetic interferogram with the same type of display as a real interferogram. QuakeSim development has resulted in many successes but has also identified a number of key challenges related to data, the computational infrastructure, and infrastructure for modeling, analysis, and visualization.

Data analysis and model outputs are ingested by the Earthquake Data Enhanced Cyber-Infrastructure for Disaster Evaluation and Response (E-Decider). E-Decider is developing a decision-support platform that uses remote sensing data and NASA modeling software to help emergency managers respond more effectively to large earthquake disasters. E-Decider builds on the QuakeSim project to develop tools and services that target the disaster planning and emergency response communities.

Other organizations, such as the APEC Cooperation for Earthquake Simulation (ACES) and the Computational Infrastructure for Geodynamics (CIG), focus more on the modeling than on the data aspects of earthquakes. ACES is a coordinated international effort linking complementary nationally based programs, centers, and research teams focused on earthquake research. ACES aims to develop realistic supercomputer simulation models for the complete earthquake-generation process, thus providing a "virtual laboratory" to probe earthquake behavior. The focus of CIG is on developing computational geophysics applications, ranging from seismic wave propagation to viscoelastic crustal deformation. It supports science gateways that let users run preinstalled software on remote supercomputers. It doesn't attempt to integrate observational data products or to provide map-based tools or Web service interfaces, because its applications are generally designed to run from a command line.


Delivered as Open Geospatial Consortium (OGC) standards-compliant data products through a Web portal/Web services infrastructure, these capabilities allow easy access and use by scientists and decision makers. Ensuring that the system is readily supportable and extensible in the future provides a powerful means to study the earthquake cycle and respond to earthquakes. A complete system offers a new opportunity to gain an understanding of the entire earthquake cycle, including earthquake nucleation and precursory phenomena. The complexity of phenomena and the range of scales, from microscopic to global, involved in the earthquake-generation process make creating such a system and understanding earthquakes a grand challenge.

In looking at what we need to create such a system, we draw on our experiences from interactions among QuakeSim, SOPAC, ACES, E-Decider, and ASF, which illustrate how an improved computational infrastructure facilitates data and model integration and also highlights new challenges. In particular, we draw on experience with QuakeSim and E-Decider, which has helped us identify key challenges in integrating models and data from different sources and in developing applications for improved data ingestion, science analysis, and support to emergency responders. Such challenges include the need for

• more open processes,
• greater integration and accountability among different groups,
• managing data and interfaces to overcome heterogeneity and bandwidth limitations,
• tracking data product provenance,
• system robustness, and
• user engagement.

The solutions to these challenges are partially technical and partially sociological.

Foundations of an Integrated Earthquake Modeling System

We focus here primarily on the interseismic part of the earthquake cycle that is addressed by remotely sensed crustal deformation data. However, the challenges and opportunities extend to other data types and parts of the earthquake cycle. The primary data product types we use are earthquake fault data, seismicity, GPS data, and Interferometric Synthetic Aperture Radar (InSAR) (see Figure 1).

Fault data include geometry and slip parameters. The seismicity includes the time, magnitude, depth, and possibly the mechanism of earthquakes. GPS provides precise positions of survey-grade monuments at networks of stations. Daily solutions are typically produced that provide time series of the solutions at each station. These time series are also converted to velocity and jump estimates. Subdaily results are also now produced routinely. InSAR repeat-pass interferometry can be collected from spaceborne or airborne platforms. Interferograms represent the line-of-sight ground-range phase change between a position on the ground and the instrument. If the instrument has an exact repeat pass and errors (such as from the atmosphere) are removed, the interferogram reflects the component of ground surface motion toward or away from the instrument. Uninhabited Aerial Vehicle Synthetic Aperture Radar (UAVSAR) is an airborne L-band radar platform.
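As a minimal illustration of the relationship just described, the sketch below converts unwrapped interferogram phase to line-of-sight displacement. The wavelength value is the approximate L-band wavelength used by UAVSAR; the sign convention (motion toward versus away from the instrument) varies among processors, so treat this as a sketch rather than a production conversion.

```python
import numpy as np

# Sketch: convert unwrapped repeat-pass interferogram phase to line-of-sight
# (LOS) displacement. Assumes an L-band instrument such as UAVSAR
# (wavelength ~0.238 m); check the processor's sign convention.
WAVELENGTH_M = 0.238  # approximate L-band wavelength

def phase_to_los_displacement(unwrapped_phase_rad):
    """Two-way travel: each 2*pi of phase corresponds to half a wavelength
    of motion along the radar line of sight."""
    return unwrapped_phase_rad * WAVELENGTH_M / (4.0 * np.pi)

# One full fringe (2*pi radians) is about 11.9 cm of LOS motion at L-band.
print(phase_to_los_displacement(2.0 * np.pi))  # ~0.119 (meters)
```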

A robust system for ingesting data and modeling earthquakes is modular and extensible, supporting science and disaster-response user communities. The goal is to have robust automated tools in place so that when an event such as an earthquake does occur, response can take place in a routine manner and delays are mitigated. Earthquakes from around the world, many of them large enough to be potentially detectable using remote sensing techniques but otherwise insignificant, provide an opportunity to establish and test routine automated processes.

Figure 1. Example data sources include earthquake fault data, seismicity, GPS data, and Interferometric Synthetic Aperture Radar (InSAR) data. These provide complementary information in both time and space for both forward and inverse modeling. GPS and UAVSAR provide observations of crustal deformation. Fault location and geometry, and the location and magnitude of earthquakes, provide additional constraints on crustal deformation models.

Technologies such as cloud computing, workflow and Web services, and semantic "markup" provide an opportunity to address the outlined challenges and require a comprehensive set of activities. These include

• developing robust bridging services in a service-oriented architecture to integrate data from multiple sources (see Figure 1 and the sketch after this list);
• developing a fundamental framework for model optimization by integrating multiple data types;
• developing cyberinfrastructure within science gateways to handle the optimization framework's computing requirements, including the need to access large datasets;
• addressing data-handling issues of model contribution, provenance, version tracking, commenting, rating, and so on; and
• developing capabilities for using output in downstream applications.
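As an example of what such a bridging service might look like, the following sketch fetches a provider-specific GPS time-series product and normalizes it to a common structure that downstream tools could consume. The endpoint URL and column layout are hypothetical, not an actual SOPAC or UNAVCO interface.

```python
import json
import urllib.request

# Sketch of a "bridging" service helper, assuming a hypothetical provider
# endpoint that returns daily GPS position solutions as CSV lines of
# "epoch,east,north,up". The URL and column layout are illustrative only.
PROVIDER_URL = "https://example.org/gps/timeseries?station={station}"

def fetch_station_timeseries(station):
    """Fetch a provider-specific product and normalize it to a common JSON
    structure that downstream modeling tools could consume."""
    with urllib.request.urlopen(PROVIDER_URL.format(station=station)) as resp:
        lines = resp.read().decode("utf-8").splitlines()
    records = []
    for line in lines:
        epoch, east, north, up = line.split(",")
        records.append({"epoch": epoch,
                        "east_m": float(east),
                        "north_m": float(north),
                        "up_m": float(up)})
    return json.dumps({"station": station, "solutions": records})
```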

Science and End-User Workflow as a Driver

Basic and applied science tasks illustrate diverse needs for coupling tools and data. Carrying out science problems within a computational environment helps to identify issues and is effective for further developing tools and functionality. There must be an adequate number of easily accessible tools available to make this appealing to scientists and end users, such as emergency responders. End users have a low tolerance for tools that don't immediately address their needs and might be less inclined to use them if they don't work well at the outset. The ideal scenario is to develop tools with friendly users and then further develop documentation and expand the user base as the tools become more functional.

Science use cases come with a set of challenges that can differ greatly from end users' needs. Emergency responders, for example, need a well-developed set of tools that can be used routinely, while science studies often deviate from routine tool usage. Part of the scientific process involves exploring data or models in new ways. As a result, keeping up with new tools to satisfy ever-changing scientific approaches is challenging. Scientists usually need toolboxes to develop new approaches rather than standardized workflows, and the infrastructure and personnel needed to develop toolboxes that allow for flexible data analysis are quite extensive.

Scientists don't have the time or necessarily the desire to develop tools for other users, making the creation of such tools a challenge. It's important to identify routine tasks that scientists typically carry out during data or model exploration and create tools that minimize the duplication of effort to add efficiency to the scientific process. Creating an infrastructure to broaden user bases is important for realizing the benefits derived from large investments in data collection from ground-based, airborne, or spaceborne projects or missions.

Use cases grouped under three modes of science (understanding, forecasting, and response) illustrate different user needs and the potential interaction with computational infrastructure and tool developers.

Use Case 1: Scientific Understanding

A scientist identifies regions of active crustal deformation from GPS and InSAR/UAVSAR data products. GPS products can be in the form of position time series or station velocities. Scientists can scan the velocity data plotted in vector form on a map in different reference frames to help them identify where active crustal deformation is occurring. They can also plot different cross sections through ground-range change interferograms to understand details of crustal deformation (see Figure 2). Further, scientists can invert crustal deformation data for fault motions constrained by paleoseismic fault data and then develop simulations based on fault locations and behavior. Finally, scientists can search GPS time series for transient anomalies that indicate previously unknown characteristics of crustal behavior. The possibilities are numerous, and scientists often want to explore the data in new ways. Many steps are routine, however, and scientific users affiliated with computational tool developers can assess which tasks are carried out frequently enough to warrant new tool development.
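The inversion step mentioned above can be illustrated with a minimal damped least-squares sketch. The matrix relating slip on fault patches to surface observations is filled with random numbers purely for illustration; in practice its entries would come from an elastic dislocation (Green's function) calculation.

```python
import numpy as np

# Sketch: invert surface deformation observations for slip on fault patches
# with damped linear least squares. G maps patch slip to observed
# displacements; here it is random purely for illustration.
rng = np.random.default_rng(0)
n_obs, n_patches = 60, 8
G = rng.normal(size=(n_obs, n_patches))
true_slip = rng.uniform(0.0, 1.0, size=n_patches)
d = G @ true_slip + rng.normal(scale=0.01, size=n_obs)  # noisy observations

damping = 0.1  # Tikhonov regularization to stabilize the inversion
A = np.vstack([G, damping * np.eye(n_patches)])
b = np.concatenate([d, np.zeros(n_patches)])
slip_est, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.round(slip_est - true_slip, 3))  # residuals should be near zero
```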

Use Case 2: Forecasting

Scientists identify active faults from multiple data sources such as GPS, UAVSAR, InSAR, paleoseismic fault data, and seismicity. This is likely to be an outgrowth of Use Case 1's scientific understanding and exploration. Once techniques are developed, scientists carry out pattern analysis to search for anomalous features in GPS time series and seismicity data. They can conduct simulations of interacting faults and statistical analysis of these interactions. Earthquake probabilities are evaluated for timescales ranging from the short term to decades. This goal has been considerably advanced in recent years as a result of SCEC's Regional Earthquake Likelihood Models (RELM) experiment.1,2 In addition, the US Geological Survey intends to bring a short-term forecast capability online in the near future.3 Ultimately, these probabilities are integrated into the Working Group on California Earthquake Probabilities (WGCEP) Uniform California Earthquake Rupture Forecast (UCERF).4


These probabilities are the basis of risk estimation and California's earthquake insurance rates. Studies in this series have been conducted at approximately four-year intervals since 1988. The analysis techniques must be well understood and well defined or standardized to incorporate the probabilities into UCERF.
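As a toy illustration of the anomaly-search step mentioned in this use case, the sketch below detrends a synthetic daily GPS position series and flags points that deviate strongly from a rolling baseline. This is not the pattern-analysis machinery behind RELM or QuakeSim forecasts; it only illustrates the kind of screening such tools automate.

```python
import numpy as np

# Toy sketch: flag transient-like anomalies in a daily GPS position series by
# removing the secular (linear) trend and applying a rolling z-score.
def flag_transients(t_days, position_mm, window=30, threshold=3.0):
    slope, intercept = np.polyfit(t_days, position_mm, 1)
    resid = position_mm - (slope * t_days + intercept)
    flags = np.zeros(resid.size, dtype=bool)
    for i in range(window, resid.size):
        ref = resid[i - window:i]
        sigma = ref.std()
        if sigma > 0 and abs(resid[i] - ref.mean()) > threshold * sigma:
            flags[i] = True
    return flags

# Synthetic series: steady motion plus a 5-mm offset halfway through.
t = np.arange(365.0)
series = 0.05 * t + np.random.default_rng(1).normal(0.0, 0.5, t.size)
series[180:] += 5.0
print(np.where(flag_transients(t, series))[0][:5])  # flags start near day 180
```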

Use Case 3: Response

When an event occurs, scientists can estimate deformation from models that use available seismic information. Initially, that information is the event's location, depth, and magnitude; scientists use these data to make assumptions about the possible mechanism. Where fault data are available, the scientist can constrain the likely mechanism to known faults. In time, an earthquake mechanism is produced, providing two orthogonal fault geometries.

Of particular interest is the danger posed by aftershocks. Structures weakened by the main shock can collapse when subjected to strong aftershocks. Many examples of this occurred as a result of the 4 September 2010 M7.1 Canterbury, New Zealand, earthquake and the 22 February 2011 M6.3 aftershock.5 Scientists can use the calculated deformation to estimate the envelope of maximum displacement, and hence the most likely region of damage. They can also use this envelope to guide the acquisition of UAVSAR and GPS data for both emergency and science responses. They can then assess possible locations of future aftershocks as the fault models are refined.

It's possible to define the event's damage zone as a polygon and format it for ingestion into loss-estimation tools. As new information becomes available, damage and potential aftershock assessments can be refined. Further, scientists can use the computational infrastructure to compute deformation gradients for tilt-and-slope change maps. Scientists can also use radar images to carry out optical and radar change detection and edge detection. They can then make the products available to emergency responders, but the products must be easily accessible and intuitively interpretable.
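A minimal sketch of the polygon-export step follows, writing a damage-zone estimate as GeoJSON, a format widely ingested by GIS and loss-estimation tools. The coordinates and attribute names are hypothetical; actual E-Decider product formats may differ.

```python
import json

# Sketch: write an estimated damage-zone polygon as GeoJSON so that
# downstream GIS or loss-estimation tools can ingest it. Coordinates and
# attribute names are hypothetical placeholders.
damage_zone = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[   # exterior ring, lon/lat pairs, closed
            [-116.30, 33.90], [-116.10, 33.90],
            [-116.10, 34.10], [-116.30, 34.10],
            [-116.30, 33.90],
        ]],
    },
    "properties": {
        "event_id": "example_event",
        "max_displacement_m": 0.45,   # from the modeled deformation envelope
        "source": "forward model from preliminary moment tensor",
    },
}

with open("damage_zone.geojson", "w") as f:
    json.dump({"type": "FeatureCollection", "features": [damage_zone]}, f, indent=2)
```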

Figure 2. Tool to plot line-of-sight deformation (unwrapped interferogram values) along a user-selected profile from a map display of an Uninhabited Aerial Vehicle Synthetic Aperture Radar (UAVSAR) interferogram. InSAR = interferometric synthetic aperture radar.


Data Storage and Access

Numerous processes result in deformation of the Earth's surface, and models require an increasing number of data types to guide understanding of them. Integrating and modeling the ever-growing and increasingly multisource geodetic GPS and InSAR data volumes is necessary to improve crustal deformation models. The data are of many different forms and sizes. Fault data, for example, yield information about fault geometry, slip rates, and earthquake recurrence. At the other end of the spectrum, interferometric radar data tend to be in large binary image files on the order of 1 Gbyte per image. QuakeSim applications use data from many sources, including fault data, GPS time series and velocities, seismicity data, seismic moment tensor information, and InSAR images.

Efficiently analyzing, integrating, and modeling geodetic and geologic data requires digital storage of the data, including the fault specifications, and automated access to the data through network services (see Figure 3). As the data sources, volumes, and regions of interest grow, it's necessary for applications, not just people, to access the data for remote automated processing. The data are distributed and under the cognizance of a wide array of agencies and institutions. Developing standards through formal and informal collaborations and partnerships is a challenge, but such approaches are necessary for maximizing the use of solid Earth science data.

Remotely sensed data provide estimates of crustal deformation that are key to improving fault models. The crustal deformation data provide a means of recording and understanding the aseismic part of the earthquake cycle in which strain accumulates or is released. GPS data provide long-term estimates of crustal deformation at a network of California and global sites. The time series of daily changes in position of these sites provide detailed information about temporal crustal changes. Current InSAR data products provide detailed images and spatial distributions of crustal deformation sparsely sampled in time. A spaceborne InSAR mission dedicated to studying surface deformation would provide routine high-resolution interferograms of ground motions, significantly increasing the temporal and spatial resolution of InSAR data products. Even so, airborne UAVSAR data products are large and complex enough that it's best to maximize their utility by developing tools that take into account access bandwidth and different user groups.

QuakeSim applications use data products rather than raw data, making QuakeSim reliant on data product suppliers for products such as UAVSAR repeat-pass interferometry, InSAR, and GPS position time series and velocities.

Figure 3. The architecture of the QuakeTables database system consists of data acquisition from external resources, data processing and storage for multiple data types, and the delivery of data and metadata through the Web interface and via Web Services.

[Figure 3 block diagram: external resources (fault database; radar data repository; CGS and UCERF2; earthquake simulation data; InSAR; UAVSAR) feed the QuakeTables application (feed aggregator, external data parser, metadata processor, metadata index, resource manager, ontology manager), which delivers data through the Web interface, the QuakeSim formatter, and portal applications such as Simplex dislocation and GeoFest.]


This reliance requires the supplier (such as the ASF Distributed Active Archive Center, or DAAC, in UAVSAR's case) and the user (such as QuakeSim) to account for the necessary interfaces between the data products and applications. System robustness is important because system components rely on remote sources. An outage at the data provider results in tools not working unless adequate redundancy and other safeguards are in place. System robustness is particularly important if the system is to be used for earthquake emergency response.

Understanding the origin and processing of the data products is important for assessing their quality for ingestion into models. Data products often change with time as new processing techniques or new interpretations become available. One key challenge is keeping up with and documenting improved data products from newer processing techniques as they become available. Ideally, there's a feedback loop from modelers to data product providers that enables modelers to identify issues with the data and request reprocessing. This separation of labor produces better results than the common bottleneck that occurs when tectonic modelers must also be data production experts. Both data processing techniques and model development are so complicated that they can take careers to develop, and as a result teams of people with varied roles, rather than individuals, must contribute to the final analysis.

Data products, even for the same data type, aren't standardized and are often not adapted for machine interfaces. This requires manual input or often (at best) scraping webpages for information. Although this isn't the right approach, it's often the only available approach. Standardized service interfaces are needed for interfacing data with modeling and visualization tools. Data formats should be standardized through community use cases. Data product needs for earthquake science are as follows:

• All data products should be coupled with self-describing network service interfaces; a great deal of useful data and metadata about earthquakes, for example, are bound in human-readable webpages instead of machine-readable formats (such as ontologies). A brief example of machine-readable access appears after this list.
• Services should be documented, published, and discoverable.
• Services for analyzing lower-level data products should also be designed with the same approach. These services generate products that can be consumed downstream.
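To illustrate the kind of machine-readable access called for above, the sketch below queries a GeoJSON earthquake-catalog feed and extracts basic event parameters. The USGS summary feed is used here only as a convenient example of a documented, self-describing service; any standards-compliant endpoint could be substituted.

```python
import json
import urllib.request

# Illustrative query of a machine-readable earthquake catalog (USGS GeoJSON
# summary feed used as an example of a documented, self-describing service).
FEED = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson"

def recent_events(min_magnitude=4.0):
    with urllib.request.urlopen(FEED) as resp:
        catalog = json.load(resp)
    events = []
    for feature in catalog["features"]:
        props = feature["properties"]
        lon, lat, depth_km = feature["geometry"]["coordinates"]
        if props["mag"] is not None and props["mag"] >= min_magnitude:
            events.append({"time_ms": props["time"], "magnitude": props["mag"],
                           "lon": lon, "lat": lat, "depth_km": depth_km,
                           "place": props["place"]})
    return events

if __name__ == "__main__":
    for event in recent_events():
        print(event["magnitude"], event["place"])
```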

Data presented in a map view that can be browsed eases data selection (see Figure 4).

Figure 4. The QuakeTables map interface shows data (here, UAVSAR interferograms) geographically, along with metadata describing the interferograms; users can select, explore, and utilize data products of interest from this view.


Information about data is often encoded into long file names, and often the locations over which the data are collected are encoded as station names or flight paths. When the user is unfamiliar with the identification scheme, it's often difficult to locate data of interest. A map view of the data makes it easier for users to efficiently scan for data over regions of interest. Problems arise when data are collected over different time spans and over other data in the same region, or when multiple data interpretations exist. Pop-up lists, menus, or time-slider bars can alleviate some of these issues. The data must be human browseable, but also accessible through APIs, which connect data directly to various applications.

Computational Infrastructure

A user-friendly computational infrastructure is necessary for identifying and pulling in data from numerous sources; simplifying or automating data assimilation, mining, and modeling workflow; and providing feeds and interfaces for generalized data users. The scaling of compute power should occur on the back end and be transparent to the user.

Geophysical analysis typically requires a user to do the following, either in an automated manner or with user intervention (a minimal scripted sketch of this workflow follows the list):

• select data in terms of types, time, and space;
• choose a subset of the data relevant to the focus of interest;
• move data for mining, modeling, or visualization;
• analyze data by modeling, inverting, or mining it;
• visualize data and results; and
• track data and models.
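The following sketch strings these steps together as placeholder functions to show the intended control flow; the function names and data structures are hypothetical stand-ins for calls to remote services and modeling tools, not part of any existing QuakeSim interface.

```python
# Conceptual sketch of the workflow steps listed above, with each step as a
# placeholder function (hypothetical names, not an existing API).
def select_data(data_type, region, time_span):
    """Query catalogs or archives for matching products (stub)."""
    return [{"type": data_type, "region": region, "span": time_span}]

def subset(products, focus_region):
    """Keep only products covering the region of interest."""
    return [p for p in products if p["region"] == focus_region]

def analyze(products):
    """Model, invert, or mine the selected products (stub)."""
    return {"model": "placeholder", "inputs": len(products)}

def visualize(result):
    print("would render:", result)

provenance_log = []  # track which inputs produced which outputs

def track(products, result):
    provenance_log.append({"inputs": products, "outputs": result})

selected = select_data("GPS", "southern California", "2010-2012")
focused = subset(selected, "southern California")
result = analyze(focused)
visualize(result)
track(focused, result)
```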

For small datasets or regions of interest, a user can do these steps manually, and in fact such investigations provide excellent examples for developing workflow for larger and more complicated cases. Current data volumes, in particular those for existing or planned InSAR missions, motivate the need for an end-to-end architecture in which data can be systematically analyzed, modeled, and interpreted. Automation requires interfaces between the widely distributed datasets, data products, and applications. Without such a system in place, data from large projects and missions will be unutilized or underutilized.

In an end-to-end computational infrastructure, users should be able to evaluate data, develop science models, produce improved earthquake forecasts, and respond to disasters in intuitive map-based interfaces (see Figure 5). Fault models can be constrained and improved not just by geology, but also by feature identification from InSAR/UAVSAR and inversions of both GPS and InSAR crustal deformation data.6,7 Forecasting is improved by developing better interacting fault models, pattern analysis, and fusion of both seismicity and crustal deformation data. Intuitive computational infrastructure can enable new observations by providing tools to conduct simulation experiments and produce new information products for use in a wide variety of fields, ranging from earthquake research to earthquake response. Timely and affordable delivery of information to users in the form of high-level products is necessary for earthquake forecasting and emergency response. It's also necessary for exploiting crustal deformation to enable new discoveries and uses.

Figure 5. The QuakeSim fault map browser used by the crustal deformation model tool lets users interactively select fault models from different earthquake catalogs via QuakeTables and use them as input for forward models.

Computing requirements can be minimal, for a few users working on small cases, or large, for either many users (such as after an event) or large problems (such as inverting interferograms). Initial needs will be modest because early users might invert small time and space problems, such as one or two radar strips and the overlapping GPS stations, with the radar downsampled by as much as 2,500:1. With experience, users need models with increasing area, resolution, stations, and time spans. This will require spawning dozens of analyses or inversion jobs with different initial conditions. For example, 25 initial conditions, 20 radar strips spread over both time and space, 20,000 interesting data points per strip, and 5,000 iterations, each requiring an evaluation of 100 fault/volcano/aquifer model fault patches with a forward model of about 500 operations per pixel per inversion, amounts to 2.5 × 10^15 floating-point operations, or roughly 29 days at a sustained 1 Gflop/s. The data products wouldn't necessarily be local to the computation. Distributed data and services that allow concept exploration and the fitting of models to multisource data are increasingly necessary. Collaboration by groups and careful design would build in efficiencies.
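The back-of-the-envelope figure above can be reproduced directly from the quantities given in the text:

```python
# Reproduce the back-of-the-envelope cost estimate from the text.
initial_conditions = 25
radar_strips = 20
points_per_strip = 20_000
iterations = 5_000
fault_patches = 100
ops_per_pixel = 500          # forward-model cost per data point

total_flops = (initial_conditions * radar_strips * points_per_strip *
               iterations * fault_patches * ops_per_pixel)
days_at_1_gflops = total_flops / 1e9 / 86_400
print(f"{total_flops:.1e} flops, ~{days_at_1_gflops:.0f} days at 1 Gflop/s")
# -> 2.5e+15 flops, ~29 days at 1 Gflop/s
```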

There are numerous practical issues for establishing an effective computational infrastructure. Tools must be intuitive and easily accessible. Stand-alone tools can be public and reside outside any required login. This mode of operation is often preferred by users because it avoids the need to remember another login and password combination and allows for greater privacy. However, there are also limitations. Chief among these is that project tracking is impossible. The user would be required to maintain projects locally, which is reasonably easy with simple input and output files, but this becomes rapidly complicated when project components are coupled to various applications at the back end. For example, a user could set up and run a model, which is then coupled to various output-format and map views. Linking project components is easier if it's done at the back end within a logged-in environment. It is also more efficient in an environment where large datasets are accessed and/or displayed, and it can facilitate project sharing among users.

Large jobs, which are necessary to model complex fault systems or crustal deformation, are currently run on supercomputers that reside at high-performance computing facilities. These resources are often oversubscribed, and users' models can spend a long time in a queue before the job runs. Local machines, however, might not be adequate for running large models. Investment must be made in more high-performance computers and facilities or in an elastic cloud-computing infrastructure that has high communication bandwidth between nodes. One such solution is Iterative MapReduce,8–11 which interoperates between high-performance computers and cloud environments and can be deployed to handle large datasets and model runs.
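The iterative MapReduce pattern that runtimes such as Twister implement can be sketched in a few lines: map over data partitions, combine the partial results, update the shared state, and repeat until convergence. The sketch below is plain Python illustrating the control flow only; it is not the Twister API cited above.

```python
from functools import reduce

# Conceptual sketch of the iterative MapReduce pattern: map over partitions,
# combine the partial results, update shared state, and iterate to convergence.
def iterative_mapreduce(partitions, map_fn, reduce_fn, update_fn, state,
                        max_iter=1000, tol=1e-9):
    for _ in range(max_iter):
        partials = [map_fn(part, state) for part in partitions]  # distributable
        combined = reduce(reduce_fn, partials)
        new_state = update_fn(state, combined)
        if abs(new_state - state) < tol:
            return new_state
        state = new_state
    return state

# Toy example: find the mean of partitioned data by gradient descent on the
# sum of squared residuals; the map step computes partial gradients.
parts = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
n = sum(len(p) for p in parts)
mean = iterative_mapreduce(
    parts,
    map_fn=lambda part, m: sum(m - x for x in part),   # partial gradient
    reduce_fn=lambda a, b: a + b,
    update_fn=lambda m, grad: m - (1.0 / n) * grad,    # step size 1/n solves exactly
    state=0.0,
)
print(round(mean, 6))  # 3.5
```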

Large model runs that push large datasets to applications only work when a high-bandwidth infrastructure is available. Subsetting the data, or keeping storage and compute nodes close together or connected by a high-bandwidth link, can provide greater access to users who aren't connected to a high-bandwidth infrastructure. Because pushing around large amounts of data only works when network connections are fast, maximizing the system's utility requires careful consideration of when to subset data and when to otherwise keep applications and data in close proximity.

A data-intensive computing infrastructure provides a modeling and visualization environment to a broad geophysical community, supporting multiple data types without the need to download large datasets. Access to GPS, InSAR, fault models, and seismicity is just starting to be coordinated. Currently, large amounts of data must move to an investigator's computer, and integration into models is ad hoc. Modeling of interacting fault simulations largely occurs under local efforts at the research group level, with comparisons occurring primarily at infrequent workshops. Web-service-based interfaces allow public, independent verification and comparison of simulators and statistical forecast methods, feeding directly into regional hazard models.

Visualization

Effective visualization is particularly necessary for interpreting complicated data or models. The computational infrastructure must support distributed scientists in visualizing simulation and situational data using both geographic information system displays and traditional 3D physical simulations, where both are time dependent. Challenges exist both in visualizing complex data and in producing visualizations that are properly constrained by data. Visualization tools should be flexible so that users can view and compare the observational data and model output in different ways. For example, GPS velocity vectors, when plotted relative to different stations, illuminate different features responsible for the deformation. When GPS vectors are plotted relative to the San Gabriel Mountains, compression to the west in the Ventura basin becomes apparent (see Figure 6). Shear zones on either side of the San Gabriel Mountains are also apparent. Similar plots, but relative to stations in the Mojave Desert, highlight the eastern California shear zone.
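Replotting velocities relative to a chosen station, as described above, amounts to subtracting that station's velocity from every vector before drawing them. The sketch below does this with synthetic station values (the names, coordinates, and velocities are made up) using matplotlib's quiver plot.

```python
import matplotlib.pyplot as plt

# Sketch: plot GPS velocity vectors relative to a chosen reference station by
# subtracting that station's velocity from all vectors. Station names,
# coordinates, and velocities are synthetic placeholders.
stations = {
    # name: (lon, lat, east velocity mm/yr, north velocity mm/yr)
    "REF1": (-118.0, 34.3, -12.0, 8.0),
    "STA2": (-117.5, 34.0, -20.0, 14.0),
    "STA3": (-118.8, 34.6, -5.0, 3.0),
    "STA4": (-116.9, 34.9, -25.0, 18.0),
}

def plot_relative(reference, station_dict):
    ref_ve, ref_vn = station_dict[reference][2:]
    lons, lats, ves, vns = [], [], [], []
    for lon, lat, ve, vn in station_dict.values():
        lons.append(lon); lats.append(lat)
        ves.append(ve - ref_ve); vns.append(vn - ref_vn)
    plt.quiver(lons, lats, ves, vns, angles="xy")
    plt.title(f"Velocities relative to {reference}")
    plt.xlabel("Longitude"); plt.ylabel("Latitude")
    plt.show()

plot_relative("REF1", stations)
```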


Science visualization "movies" driven by GPS or InSAR data display both transient and secular crustal deformation. Movies combine images, maps, multispectral data, terrain elevation models, and crustal deformation representations to create 3D and stereoscopic time-varying representations of scientific data. Movies let scientists view and compare observational data and models at a variety of temporal and spatial scales and change their viewpoints in space and time.

We can view changes that occur over decades, centuries, millennia, and geologic epochs in a few minutes on screen. Movies can be sped up, slowed down, edited, repeated, and looped to focus on particular events. Individual movies focus on global, regional, and local aspects of observational data and models. We can edit and combine these aspects, beginning with a global context and zooming in to provide regional and local views. We can exaggerate vertical and horizontal scales to enhance small changes. Infrared and radar observations can be mapped into the visible spectrum. We can modify the contrast, intensity, color hue, and saturation to highlight specific data features. Data can be interpolated and extrapolated to create additional movie frames to track changes.

However, we must ensure that the use of these techniques doesn't create artifacts that confuse the viewer. Challenges arise when developing procedures that use data to drive the automated production of movies. GPS stations are sparsely located (see Figure 6), requiring interpolation between the stations. GPS time series don't exist for the same time frame for all stations. Stations have been added to the network over a period longer than a decade. This introduces meshing complexity or adds artifacts to the visualization from station outages. GPS time series must be properly interpolated both spatially and temporally to provide the most physically accurate animation. InSAR data are also sparse and typically cover short time frames, but they can further guide accurate mapping of crustal deformation. UAVSAR observations in southern California have identified numerous localized zones of shear that couldn't be identified with the spatial sampling provided by GPS.
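The spatial part of that interpolation, gridding sparse station values for a single movie frame, can be sketched as follows; the station positions and displacements are synthetic, and a production pipeline would also interpolate in time and mask areas far from any station to avoid the artifacts mentioned above.

```python
import numpy as np
from scipy.interpolate import griddata

# Sketch: grid sparse GPS displacements onto a regular mesh for one movie
# frame. Station positions and displacements are synthetic placeholders.
rng = np.random.default_rng(2)
station_lonlat = rng.uniform([-119.0, 33.5], [-116.5, 35.0], size=(40, 2))
displacement_mm = rng.normal(0.0, 5.0, size=40)       # one epoch, one component

grid_lon, grid_lat = np.meshgrid(np.linspace(-119.0, -116.5, 200),
                                 np.linspace(33.5, 35.0, 120))
frame = griddata(station_lonlat, displacement_mm,
                 (grid_lon, grid_lat), method="cubic")  # NaN outside convex hull

print(frame.shape, np.nanmin(frame), np.nanmax(frame))
```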

Acquisition, processing, organization, and storage of remotely sensed data represent a significant investment, and the benefits of the data won't be realized without additional investment in a computational infrastructure. It's neither likely nor desirable to design, develop, and install a single monolithic computational infrastructure to access, analyze, and model the geophysical and geodetic data. Data are acquired and organized by teams of experts, while other experts analyze data products or develop models or simulations. The various experts are typically distributed geographically and are often grouped in various centers of expertise providing datasets, data products, models, or other information, for which the flow of data, products, or information can be upstream or downstream from any given component.

Successful modeling and analysis requires connecting the components. For example, fault modeling connects observational datasets to downstream simulation and forecasting techniques. Data access, analysis, and modeling centers must develop modular distributed data systems with a common computational framework and infrastructure that connects upstream and downstream data, data products, applications, and analysis tools. A modular distributed data system design framework and infrastructure doesn't yet exist to support solid Earth research, but steps are being taken in this direction as groups realize the need to interface their components with those of others.

Figure 6. GPS vectors plotted for southern California relative to station CHIL. CHIL is the four-letter designator for Chilao Flats, located in the San Gabriel Mountains. Fault traces are from the Uniform California Earthquake Rupture Forecast, version 2.4 (UCERF-2).4

Increases in data volumes, data-processing algorithm complexity, and model sophistication require more compute power, communication bandwidth, and human- and machine-friendly interfaces. As large datasets are accessed, it's necessary either to keep the data close to the models or to provide high-bandwidth connections between the data sources and the models. Data and products should be organized for interactive human and machine searches and connected with self-describing network service interfaces. Map views of all data products should include pop-up lists, menus, overlays, and time and spectral slider bars. Services should be discoverable, documented, and published in machine-readable formats. The ideal remotely sensed geodetic data environment should support both science users and emergency/disaster response community users.

Acknowledgments

We carried out this work at the Jet Propulsion Laboratory, California Institute of Technology, Indiana University, the University of Southern California, and the University of California's Davis and Irvine campuses under contract with NASA. NASA's Advanced Information Technologies and Applied Sciences Programs sponsored the work. We thank UNAVCO, SOPAC, the ASF DAAC, and the University of Nevada, Reno, for productive ongoing discussions and collaboration.

References

1. J.R. Holliday et al., "A RELM Earthquake Forecast Based on Pattern Informatics," Seismological Research Letters, vol. 78, no. 1, 2007, pp. 87–93.
2. Y.-T. Lee et al., "Results of the Regional Earthquake Likelihood Models (RELM) Test of Earthquake Forecasts in California," Proc. Nat'l Academy of Sciences USA, vol. 108, 2011, pp. 16533–16538; doi:10.1073/pnas.1113481108.
3. T.H. Jordan and L. Jones, "Operational Earthquake Forecasting: Some Thoughts on Why and How," Seismological Research Letters, vol. 81, no. 4, 2010, pp. 571–574.
4. E.H. Field et al., The Uniform California Earthquake Rupture Forecast, Version 2 (UCERF-2), USGS Open File Report 2007-1437, CGS Special Report 203, SCEC Contribution 1138, 2007 Working Group on California Earthquake Probabilities, 2008; http://pubs.usgs.gov/of/2007/1437.
5. H. Iizuka, Y. Sakai, and K. Koketsu, "Strong Ground Motions and Damage Conditions Associated with Seismic Stations in the February 2011 Christchurch, New Zealand Earthquake," Seismological Research Letters, vol. 82, no. 6, 2011, pp. 875–881.
6. A. Donnellan, J.W. Parker, and G. Peltzer, "Combined GPS and InSAR Models of Postseismic Deformation from the Northridge Earthquake," Pure and Applied Geophysics, vol. 159, no. 10, 2002, pp. 2261–2270.
7. M. Wei, D. Sandwell, and B. Smith-Konter, "Optimal Combination of InSAR and GPS for Measuring Interseismic Crustal Deformation," J. Advances in Space Research, vol. 46, no. 2, 2010, pp. 236–249; doi:10.1016/j.asr.2010.03.013.
8. Service Aggregated Linked Sequential Activities (SALSA) Group, Twister Iterative MapReduce, 2010; www.iterativemapreduce.org.
9. J. Ekanayake et al., "Twister: A Runtime for Iterative MapReduce," Proc. ACM Int'l Symp. High-Performance Parallel and Distributed Computing, ACM, 2010; http://grids.ucs.indiana.edu/ptliupages/publications/hpdc-camera-ready-submission.pdf.
10. B. Zhang et al., "Applying Twister to Scientific Applications," Proc. IEEE 2nd Int'l Conf. Cloud Computing Technology and Science, IEEE CS, 2010, pp. 23–52; http://grids.ucs.indiana.edu/ptliupages/publications/PID1510523.pdf.
11. T. Gunarathne, J. Qiu, and G. Fox, "Iterative MapReduce for Azure Cloud," Proc. Cloud Computing and Its Applications, ACM, 2011; http://grids.ucs.indiana.edu/ptliupages/publications/cca_v8.pdf.

Andrea Donnellan is a principal research scientist at the Jet Propulsion Laboratory, NASA, and a research professor at the University of Southern California. Her research focuses on NASA's QuakeSim project (she's the principal investigator) and other earthquake-related projects. Donnellan has a PhD in geophysics from the California Institute of Technology's Seismological Laboratory. Contact her at [email protected].

Jay Parker is a scientific applications software engineer at the Jet Propulsion Laboratory, NASA. His research focuses on applying fast and accurate numerical models to geophysical remote sensing. Parker has a PhD in electrical engineering from the University of Illinois, Urbana-Champaign. Contact him at [email protected].

Margaret Glasscoe is the principal investigator for the E-Decider project at the Jet Propulsion Labora-tory, NASA. Her research interests include disas-ter response, modeling deformation of the Earth’s crust to study postseismic response to large earth-quakes, numerical models of the rheological behav-ior of the lower crust, and simulations of interacting fault systems. Glasscoe has an MS in geology from the University of California, Davis. Contact her at [email protected].

Eric De Jong is the chief scientist for the Instrument Software and Science Data Systems Section, and the research director for the Visualization and Image Processing (VIP) Center, Image Processing Laboratory (IPL), Digital Image Animation Laboratory (DIAL), and Cartographic Analysis Laboratory (CAL) at the Jet Propulsion Laboratory, NASA. His research interests include scientific visualization of the Sun; planetary surfaces, atmospheres, and magnetospheres; and the evolution and dynamics of stars, galaxies, and planetary systems. De Jong has a PhD in interdisciplinary science from the University of California, Santa Barbara. Contact him at [email protected].

Marlon Pierce is the assistant director for the Science Gateways Group in Research Technologies Applications at Indiana University. His current research and development work focuses on computational sciences with an emphasis on grid computing and computational Web portals. Pierce has a PhD in physics from Florida State University. Contact him at [email protected].

Geoffrey Fox is the director of the Digital Science Center and distinguished professor and associate dean for research and graduate studies at the School of Informatics and Computing at Indiana University. His research interests include cyberinfrastructure and e-Science, high-performance computing, and parallel and distributed computing and their applications. Fox has a PhD in theoretical physics from Cambridge University. He's a fellow of both ACM and the American Physical Society. Contact him at [email protected].

Dennis McLeod is a computer science professor and director of the Semantic Information Research Laboratory at the University of Southern California. His research interests include data and knowledge base systems, federated databases, database models and design, ontologies, knowledge discovery, scientific data management, information trust and privacy, and multimedia information management. McLeod has a PhD in computer science and electrical engineering from MIT. Contact him at [email protected].

John Rundle is a distinguished professor of physics and geology at the University of California, Davis. His research is focused on understanding the dynamics of earthquakes through numerical simulations, pattern analysis of complex systems, dynamics of driven nonlinear Earth systems, and adaptation in general complex systems. Rundle has a PhD in geophysics and space physics from the University of California, Los Angeles. He's a fellow of the American Physical Society and the American Geophysical Union. Contact him at [email protected].

Lisa Grant Ludwig is an associate professor in public health at the University of California, Irvine. Her research interests include natural hazards, paleoseismology, active faults, the San Andreas fault, southern California faults, seismic hazard, environmental health, and geology. Ludwig has a PhD in geology with geophysics from the California Institute of Technology. Contact her at [email protected].
