
Earthquake Clusters over Multi-dimensional Space, Visualization of

DAVID A. YUEN 1, WITOLD DZWINEL 2, YEHUDA BEN-ZION 3, BEN KADLEC 4

1 Dept. of Geology and Geophysics, University of Minnesota, Minneapolis, USA

2 Dept. of Computer Science, AGH University of Sci. and Technol., Kraków, Poland

3 Department of Earth Sciences, University of Southern California, Los Angeles, USA

4 Department of Computer Science, University of Colorado, Boulder, USA

Article Outline

Glossary
Definition of the Subject
Introduction
Earthquakes Clustering
Multidimensional Feature Space
The Detection and Visualization of Clusters in Multi-Dimensional Feature Space
Description of the Data
Earthquake Visualization by Using Clustering in Feature Space
Remote Problem Solving Environment (PSE) for Analyzing Earthquake Clusters
Future Directions
Acknowledgments
Bibliography

Glossary

Grid Virtual metacomputer, which uses a network of geographically distributed local networks, computers and computational resources and services. Grid Computing focuses on distributed computing technologies, which are not in the traditional dedicated clusters. Data Grids represent controlled sharing and management of large amounts of distributed data.

Problem solving environment (PSE) Specialized computer software for solving one class of problems. PSEs use the language of the respective field and often employ modern graphical user interfaces. The goal is to make the software easy to use for specialists in fields other than computer science. PSEs are available for generic problems like data visualization or large systems of equations and for narrow fields of science or engineering.

Global seismographic network (GSN) The goal of the GSN is to deploy permanent seismic recording stations uniformly over the earth’s surface. The GSN stations continuously record seismic data from very broadband seismometers at 20 samples per second, and the network provides for high-frequency (40 sps) and strong-motion (1 and 100 sps) sensors where scientifically warranted. It is also the goal of the GSN to provide real-time access to its data via Internet or satellite. Over 75% of the over 128 GSN stations meet this goal as of 2003.

WEB-IS A software tool that allows remote, interactive visualization and analysis of large-scale 3-D earthquake clusters over the Internet through the interaction between client and server.

Scientific visualization A branch of computer graphics and user interface design that deals with presenting data to users by means of patterns and images. The goal of scientific visualization is to improve understanding of the data being presented.

Interactive visualization A branch of graphic visualization that studies how humans interact with computers to create graphic illustrations of information and how this process can be made more efficient. Remote visualization refers to the tools for interactive visualization of high-resolution images on a remote client machine, rendered and preprocessed on the server.


OpenGL A standard specification defining a cross-language, cross-platform API for writing applications that produce 2D and 3D computer graphics.

Sumatra-Andaman earthquake An undersea earthquake that occurred at 00:58:53 UTC (07:58:53 local time) December 26, 2004, with an epicenter off the west coast of Sumatra, Indonesia. The earthquake triggered a series of devastating tsunamis along the coasts of most landmasses bordering the Indian Ocean, killing large numbers of people and inundating coastal communities across South and Southeast Asia, including parts of Indonesia, Sri Lanka, India, and Thailand.

Earthquake catalog Data set consisting of earthquake hypocenters, origin times, and magnitudes. Additional information may include phase and amplitude readings, as well as first-motion mechanisms and moment tensors.

Pattern recognition The methods, algorithms and tools to analyze data based on either statistical information or on a priori knowledge extracted from the patterns. The patterns for classification are groups of observations, measurements, objects, defining feature vectors in an appropriate multidimensional feature space.

Data mining Algorithms, tools, methods and systems used in extraction of knowledge hidden in a large amount of data.

Features denoted $f_i$ or $F_j$ ($i$, $j$ are feature indices) – a set of variables which carry discriminating and characterizing information about the objects under consideration. The features can represent raw measurements (data) $f_i$ or can be generated in a non-linear way from the data (features $F_j$).

Feature space The multidimensional space in which the $\mathbf{F}_k$ vectors are defined. Data and feature vectors represent vectors in respective spaces.

Feature vector A collection of features ordered in some meaningful way into a multi-dimensional feature vector $\mathbf{F}_l$ ($l$ is the feature vector index) that represents the signature of the object to be identified, expressed through the generated features.

Feature extraction The procedure of mapping the source feature space into an output feature space of lower dimensionality, while keeping the value of an error cost function minimal.

Multidimensional scaling The nonlinear procedure of feature extraction, which minimizes the value of the “stress”, a function of the differences between all the distances between feature vectors in the source space and the corresponding distances in the resulting space of lower dimensionality.

Data space The multi-dimensional space in which the data vectors $\mathbf{f}_k$ exist.

Data vector A collection of features ordered in some meaningful way into multi-dimensional vectors $\mathbf{f}_k$ ($k$ is the data vector index), with $\mathbf{f}_k = [m_k, z_k, \mathbf{x}_k, t_k]$, where $m_k$ is the magnitude and $\mathbf{x}_k$, $z_k$, $t_k$ are its epicentral coordinates, depth and the time of occurrence, respectively.

Cluster Isolated set of feature (or data) vectors in data and feature spaces.

Clustering The computational procedure extracting clusters in multidimensional feature spaces.

Agglomerative (hierarchical) clustering algorithm The clustering algorithm in which at the start the feature vectors represent separate clusters and the larger clusters are built up in a hierarchical way. The procedure repeats the process of gluing together the closest clusters until a desired number of clusters is reached.

k-Means clustering Non-hierarchical clustering algorithm in which the randomly generated centers of clusters are improved iteratively.

Multi-resolutional clustering analysis Clustering can produce a hierarchy of clusters. The analysis of the results of clustering at various resolution levels allows for extraction of knowledge hidden in both local (small clusters) and global (large clusters) similarity of multidimensional feature vectors.

N-body solver The algorithm exploiting the concept of time evolution of an ensemble of mutually interacting particles.

Non-hierarchical clustering algorithm The clustering algorithm in which the clusters are searched for by using global optimization algorithms. The most representative algorithm of this type is the k-means procedure.
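The agglomerative procedure defined in the glossary can be illustrated with a minimal sketch. It is not the implementation used by the authors: the choice of single linkage (distance between the closest members) as the meaning of "closest clusters", the Euclidean metric, and the six 2-D sample points are all assumptions made for illustration.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cluster_distance(c1, c2, points):
    """Single linkage (an assumption): distance between the closest members."""
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)

def agglomerative(points, n_clusters):
    """Start with one cluster per vector and merge the closest pair
    until the desired number of clusters remains."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # glue the two clusters together
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical 2-D vectors: two tight groups and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
result = agglomerative(points, 3)
print(result)
```

Stopping the merging at different values of `n_clusters` gives the different resolution levels referred to in the multi-resolutional clustering entry. The brute-force search over all cluster pairs is only practical for small samples.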

Definition of the Subject

Earthquakes have a direct societal relevance because of their tremendous impact on human communities [59]. The genesis of earthquakes is an unsolved problem in the earth sciences, because the underlying physical mechanisms are still unknown. Unlike the weather, which can be predicted several days in advance by numerically integrating non-linear partial differential equations on massively parallel systems, earthquake forecasting remains an elusive goal, because of the lack of direct observations and the fact that the governing equations are still unknown. Instead one must employ statistical approaches (e.g., [61,72,82]) and data-assimilation techniques (e.g., [6,53,81]). The nature of the spatio-temporal evolution of earthquakes has to be assessed from the observed seismicity and geodetic measurements. Problems of this nature can be analyzed by recognizing non-linear patterns hidden in the vast amount of seemingly unrelated information. With the proliferation of large-scale computations, data mining [77], which is a time-honored and well-understood process, has come into its own for extracting useful patterns from large incoherent data sets found in diverse fields, such as astronomy, medical imaging, combinatorial chemistry, bio-informatics, seismology, remote sensing and stock markets [75]. Recent advances in information technology, high performance computing, and satellite imagery have led to the availability of extremely large data sets, exceeding terabytes at each turn, that are coming regularly to physical scientists who need to analyze them quickly. These data sets are non-trivial to analyze without the use of new computer science algorithms that find solutions with minimal computational complexity. With the imminent arrival of petascale computing by 2011 in the USA, we can expect some breakthrough results from clustering analysis. Indeed, clustering has become a widely successful approach for revealing features and patterns in the data-mining process. We describe the method of using clustering as a tool for analyzing complex seismic data sets and the visualization techniques necessary for interpreting the results. Petascale computing will also spur visualization techniques, which are sorely needed to understand the vast amounts of data compressed in many different kinds of spaces, with spatial, temporal and other types of dimensions [78]. Examples of clusters abound in nature and include stars in galaxies, hubs in airline routes and centers of various human relationships [5]. Clustering comes from multi-scale, nonlinear interactions due to the rock rheology and earthquakes.

Introduction

Earthquake clustering is automatically implied by the classical Gutenberg–Richter relationship [40], which specifies the frequency of earthquakes between some small magnitude cutoff and a certain large magnitude around 8 [39]. This empirical finding with a broad magnitude range implies that the largest seismic events are surrounded by a large number of smaller events. This clustering may have both spatial and temporal dependences. One of the goals of earthquake clustering studies is to find these special points in a high-dimensional space related to the nature of the dimensional space associated with earthquake dynamics [24,25]. One major goal of this chapter is to introduce the reader to the notion of searching for clustering points in dimensional spaces higher than the 3D physical space we are used to. This concept is crucial to our understanding of the clustering points of earthquakes in these higher-dimensional spaces, which may enable progress in forecasting earthquakes. Information in seismicity data sets can be both relevant and irrelevant from the point of view of deterministic earthquake dynamics. It can also be “entangled” and impossible to interpret with normal human perception. The role of data mining is to provide a mathematically rigorous algorithm for extracting relevant information from this deluge of data and making it understandable. Clustering techniques, which are commonly used today in many fields, ranging from biology (e.g., [26]) to astrophysics, allow us to produce specially crafted data models that can be employed for predicting the nature of future events. In more complex cases, these special data models can work in concert with formal mathematical and physical paradigms to give us deeper physical insight.

The concept of clustering has been used for many years in pattern recognition [2,50,78]. The clustering can use more (e.g., [54]) or less mathematically rigorous principles (e.g., [33]). Nowadays clustering and other feature extraction algorithms are recognized as important tools for revealing coherent features in the earth sciences [32,65,66,67], bioinformatics [51] and in data mining [37,43,44,57]. Depending on the data structures and goals of classification, different clustering schemes must be applied [36,55].

In this chapter we emphasize the role of clustering in the understanding of earthquake dynamics and the way to visualize and interpret the computed results from clustering. All the seismic events occurring over a certain region during a given time period can be viewed as a single cluster of correlated events. The strength of mutual correlations between events, such as correlations in spatial and time positions along with magnitude, causes this single cluster to have a very complex internal structure. The correlations – the measures of similarity between events – divide the global cluster into a variety of small clusters of multi-scale nature, i.e., small clusters may consist of a cascade of smaller ones. Coming down the scale we record clusters of more and more tightly correlated events. Exploring the nature of events belonging to a single cluster, we can extract the common features they possess. Having more information about events belonging to the same cluster, we can derive hidden dependences between them. Moreover, we can anticipate the type of an unknown event belonging to a certain cluster from the character of the other events of this cluster.

In the following sections we describe the idea of clustering and the new idea of higher dimensions associated with data sets. We also demonstrate the results of clustering analysis of both synthetic and real data. Long synthetic data were derived by using a model for a segmented strike-slip fault zone in a 3D elastic half-space [7]. The real data represent short-time (5-year interval) seismic activities of the Changbaishan volcano (the north-east frontier of the North China craton) and the Japanese Archipelago. Lastly, we also highlight the role of visualization of clusters as an important tool for understanding this type of new data arrangement, and we describe the role played by a remote visualization environment specially devised for visualization of earthquake clusters.

Earthquakes Clustering

Statistical Laws as Elementary Building Bricks of Earthquake Models

The earthquake prediction problem is of fundamental importance to society and also to the geosciences. Progress in this field is hampered mainly because many important dynamic variables – such as stress – are not accessible for direct observations. Moreover, instrumental observations of seismicity are possible only for a fraction of a single large earthquake cycle. Overcoming these difficulties will require combining analyses of model and observed data by using knowledge extraction instruments. The fundamental process of knowledge extraction is finding dependences between data and/or between model parameters. They can be revealed as patterns (clusters) in time, spatial and feature (parameter) space domains. The most elementary dependences can be expressed in the form of semi-empirical functional laws.

There are a few basic statistical laws which represent the basis for the development of earthquake models. The frequency-size statistics of regular tectonic earthquakes (excluding swarms and deep focus earthquakes) follow the Gutenberg–Richter relation [39,80,84]:

$$\log N(M) = a - bM \qquad (1)$$

where $N$ is the number of events with magnitude larger than $M$ and $a$, $b$ are constants giving, respectively, the overall seismicity rate and relative rates of events in different magnitude ranges. Observed $b$-values of regional seismicity typically fall in the range 0.7–1.3.
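To make the Gutenberg–Richter relation concrete, the following minimal sketch evaluates the expected cumulative count for given constants and recovers the b-value from a sample of magnitudes. The estimator used here, the standard Aki-type maximum-likelihood formula b = log10(e) / (mean(M) − M_min), is a swapped-in technique that the text itself does not describe, and the magnitudes are synthetic, drawn under the assumption that the relation holds exactly above a hypothetical cutoff of 3.0.

```python
import math
import random

def gr_cumulative_count(m, a, b):
    """Expected number of events with magnitude larger than m: N = 10**(a - b*m)."""
    return 10.0 ** (a - b * m)

def estimate_b_value(magnitudes, m_min):
    """Aki-type maximum-likelihood estimate (an assumption, not from the text):
    b = log10(e) / (mean(M) - m_min), using events with M >= m_min."""
    sample = [m for m in magnitudes if m >= m_min]
    mean_excess = sum(sample) / len(sample) - m_min
    return math.log10(math.e) / mean_excess

def synthetic_magnitudes(n, b, m_min, seed=0):
    """Synthetic catalog: the excess over m_min is exponential with rate b*ln(10),
    which is equivalent to log10 N(M) decreasing linearly with slope -b."""
    rng = random.Random(seed)
    beta = b * math.log(10.0)
    return [m_min + rng.expovariate(beta) for _ in range(n)]

mags = synthetic_magnitudes(20000, b=1.0, m_min=3.0)
b_hat = estimate_b_value(mags, m_min=3.0)
print("estimated b-value:", round(b_hat, 3))
print("count ratio N(3)/N(4) for b = 1:",
      gr_cumulative_count(3.0, 5.0, 1.0) / gr_cumulative_count(4.0, 5.0, 1.0))
```

For b = 1 each unit decrease in magnitude corresponds to a tenfold increase in the number of events, which is the sense in which large events are surrounded by many smaller ones.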

Aftershock decay rates are usually described by the Omori–Utsu law [71,79]:

$$\Delta N / \Delta t = K (t + c)^{-p} \qquad (2)$$

where $N$ is the cumulative number of events, $t$ is the time after the mainshock, and $K$, $c$, and $p$ are empirical constants. The epidemic-type aftershock-sequences (ETAS) model combines the Omori–Utsu law with the Gutenberg–Richter frequency-magnitude relation for a history-dependent occurrence rate of a point process in the form (e.g., [61])

$$\lambda(t \mid H_t) = \mu + \sum_{t_i < t} \frac{K_0 \exp[\alpha (M_i - M_c)]}{(t - t_i + c)^{p}} \qquad (3)$$

where $\mu$ is a constant background rate, $M_i$ is the magnitude of the earthquake at time $t_i$, $M_c$ is a lower magnitude cut-off, $H_t$ denotes the history, and the productivity factor $K_0 \exp[\alpha (M_i - M_c)]$ gives the number of events triggered by a parent earthquake with magnitude $M_i$. The ETAS model is used widely in analysis of seismic data, owing to its built-in clustering associated with the incorporation of the Gutenberg–Richter and Omori–Utsu laws. Examples of recent applications can be found in [45,62,68].
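The two rate laws above translate directly into code. The sketch below is only an illustration of Eqs. (2) and (3) under stated assumptions: the function names, the representation of the history as a list of (time, magnitude) pairs, and all numerical parameter values are hypothetical and are not taken from the cited studies.

```python
import math

def omori_utsu_rate(t, K, c, p):
    """Aftershock rate at time t after the mainshock: K * (t + c)**(-p)."""
    return K * (t + c) ** (-p)

def etas_rate(t, history, mu, K0, alpha, c, p, m_c):
    """Conditional occurrence rate at time t given past events.

    history is a list of (t_i, M_i) pairs; only events with t_i < t contribute.
    mu is the constant background rate and m_c the lower magnitude cut-off.
    """
    rate = mu
    for t_i, m_i in history:
        if t_i < t:
            productivity = K0 * math.exp(alpha * (m_i - m_c))
            rate += productivity * (t - t_i + c) ** (-p)
    return rate

# Hypothetical history: a magnitude 6 parent followed by two smaller events.
history = [(0.0, 6.0), (0.5, 4.0), (2.0, 3.5)]
params = dict(mu=0.2, K0=0.05, alpha=1.0, c=0.01, p=1.1, m_c=3.0)
for t in (0.1, 1.0, 10.0, 100.0):
    print(t, round(etas_rate(t, history, **params), 4))
```

With these illustrative values the rate is dominated by the large parent event shortly after it occurs and relaxes toward the background rate `mu` at long times, which is the built-in clustering mentioned in the text.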

These results can be used to derive additional properties such as average recurrence times (e.g., [4,18,19,20,68,89]). The average recurrence time is usually defined as the number of years between occurrences of an earthquake of a given magnitude in a particular area. For example, the probability of a devastating earthquake striking the greater San Francisco Bay Region over the following 25 years (2007–2031) is 0.62 [68]. Corral [18,19,20] proposed the existence of a universal scaling law for the probability density function $H(\tau)$ of recurrence times (or interevent times) $\tau$ between earthquakes in a given region:

$$H(\tau) \cong \lambda f(\lambda \tau) . \qquad (4)$$

The function $f(x)$ appears to be similar for many different seismic regions, which suggests some universal properties. The average rate $\lambda$ represents the region-specific constant, whose reciprocal is the only relevant characteristic time for the recurrence times. Molchan [58] showed that under general conditions, the only universal distribution of inter-event times in a stationary point process is exponential. Hainzl et al. [42] and Saichev and Sornette [68] discussed relations between statistics of interevent times, the ETAS model of triggered seismicity, and the Corral [18,19] distribution of Eq. (4).
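The rescaling behind Eq. (4) can be tried on toy data. The sketch below multiplies interevent times by the mean event rate of each catalog and compares two synthetic catalogs with very different activity levels. The catalogs are stationary Poisson processes generated here purely for illustration, so the rescaled distribution is exponential by construction; real catalogs with aftershock clustering would not be expected to give the same curve. All function names are hypothetical.

```python
import math
import random

def interevent_times(origin_times):
    """Waiting times between consecutive events of a catalog."""
    ts = sorted(origin_times)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]

def rescale_by_mean_rate(taus):
    """Return (rate, rescaled) with rate = 1 / mean(tau) and rescaled = rate * tau."""
    rate = len(taus) / sum(taus)
    return rate, [rate * tau for tau in taus]

def synthetic_poisson_catalog(n_events, rate, seed):
    """Origin times of a stationary Poisson process (hypothetical test data)."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n_events):
        t += rng.expovariate(rate)
        times.append(t)
    return times

def fraction_exceeding(values, threshold):
    """Empirical probability that a value is larger than threshold."""
    return sum(1 for v in values if v > threshold) / len(values)

quiet = synthetic_poisson_catalog(20000, rate=0.5, seed=1)
active = synthetic_poisson_catalog(20000, rate=20.0, seed=2)
for name, catalog in (("quiet", quiet), ("active", active)):
    rate, scaled = rescale_by_mean_rate(interevent_times(catalog))
    print(name, round(rate, 2), round(fraction_exceeding(scaled, 1.0), 3))
print("exponential reference:", round(math.exp(-1.0), 3))
```

After rescaling, both catalogs give nearly the same fraction of waiting times longer than the mean, even though their raw rates differ by a factor of forty.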

In the context of earthquake prediction it is important to analyze earthquake cycles with repeating sequences of events such as foreshocks, mainshocks and aftershocks (e.g., [9,74,80]). Apart from qualitative tendencies reflected by statistical laws, the earthquakes exhibit various types of more subtle spatio-temporal clustering, i.e., grouping of events of the same type both in time and in spatial coordinates. The recognition of these patterns followed by the analysis of the reasons for their appearance may lead to the development of improved prediction algorithms.


In the following section we take a closer look at clustering as a knowledge extraction technique and a possible way of applying it to earthquake data analysis.

Basic Concepts of Clustering

Clustering analysis is a mathematical concept whose main useful role is to extract the most similar (or dissimilar) separated sets of objects according to a given similarity (or dissimilarity) measure [2]. Clustering is one of the most fundamental processes generated by nature. For example, people gathering in groups, tribes, demonstrations, parties and cities produce clusters. Similarly, towns and cities are clusters of buildings, while galaxies are clusters of stars. Local computer networks and bacterial colonies are also clusters. The objects forming clusters can be the clusters of smaller objects, which in turn are clusters of even smaller building bricks. The complexity of cluster structure reflects the complexity of the real world. The clusters of various shapes, densities and sizes, with additional attributes such as color and transparency, build up patterns, which are the fingerprints of all multi-scale processes and phenomena. The clusters are the primitives of the patterns.

The same notion of clustering concerns geographical locations and other properties of earthquakes. In Fig. 1a we present a spatial distribution of earthquake epicenters in the western hemisphere of the Earth (data from http://quake.geo.berkeley.edu/cnss/maps/cnss-map.html). One can see with the naked eye that their distribution is far from uniform. We observe both elongated and oblate structures – the earthquake clusters – separated at this resolution by large holes of seismically quiescent area.

Figure 1. Multiscale character of the earthquake clusters. The epicenters of earthquakes of various depth and magnitude are displayed. The data come from the CNSS Earthquake Catalog (http://quake.geo.berkeley.edu/cnss/maps/cnss-map.html). a the western hemisphere, b the US western coast, c California and Nevada

Properties of the clusters result from properties of the generating processes. The shape and structure of clusters are a visual representation of information on these processes. Therefore, detection of clusters and their analysis is the first step for knowledge extraction from this information. For example, as shown in Fig. 1a, the earthquake clusters on Earth are located in geologically active regions, mainly on the edges of colliding tectonic plates. The distribution and shape of the earthquake clusters follow the borders between the plates. In Fig. 1b we show the large earthquake cluster from Fig. 1a located at the US western coast. One can distinguish here many smaller clusters of different density separated by geologically inactive area. A similar pattern (see Fig. 1c) is observed by zooming in on one of the denser clusters from Fig. 1b. This multi-resolutional and self-similar system is characteristic of many critical phenomena (see Jerky Motion in Slowly Driven Magnetic and Earthquake Fault Systems, Physics of) [3,9,16,74]. The worldwide fault network has a fractal (or multifractal) structure [22,27,79]. Wavelet-based multi-fractal analysis [27] shows clearly several distinct scaling domains in earthquake catalogs, revealing rich self-similar multi-scale structure. However, the spatial structure of earthquake clusters alone is inadequate to formulate plausible hypotheses about earthquake dynamics. More information is required.

Figure 2. Seismic activity of the Changbaishan volcano (the north-east frontier of the North China craton) during the 5-year time span from 07.1999 to 05.2004 [47]. The plates represent the seismic events in 3-D feature space attributed by eruption time (blue axis), magnitude (red axis) and distance to the epicenter (green axis) coordinates. The clusters are rendered using the wrap point technique (the Amira visualization package www.amiravis.com). Two different positions of coordinates are shown. The large cluster representing the earthquake swarm is preceded by the small precursory cluster of seismic activity and quiescent time period

As shown in Fig. 1, besides the geographical location, earthquakes have additional features such as the time and depth of occurrence and the amount of energy released (proportional to $10^{\alpha m}$ with $\alpha \approx 1.5$ and $m$ the magnitude). These attributes can be used as additional coordinates of the so-called feature space (e.g., [78]). In Fig. 2 we display the earthquake clusters representing the seismic activity near the Changbaishan volcano in an abstract 3-D feature space. Apart from geographical location – represented by the distance from the epicenter – other coordinates (features) are employed: the time of occurrence and the magnitude of the earthquake. As shown in Fig. 2, the large cluster of seismic activity is preceded by the small precursory cluster and a low activity region. The larger cluster is characterized by seismic events from a broader interval of magnitudes and with satellite earthquakes more distant from the epicenter than in the preceding smaller cluster.
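As a short worked example of the energy scaling quoted above, taking the approximate exponent of 1.5 from the text at face value, a difference of one magnitude unit corresponds to roughly a thirty-fold difference in released energy, and a difference of two units to about a thousand-fold difference:

$$\frac{E(m+1)}{E(m)} = \frac{10^{\alpha (m+1)}}{10^{\alpha m}} = 10^{\alpha} \approx 10^{1.5} \approx 31.6, \qquad \frac{E(m+2)}{E(m)} = 10^{2\alpha} \approx 10^{3} = 1000 .$$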

The dynamics of the volcanic earthquakes covers only a period of 5 years. This time is too short to draw conclusions about the long-time earthquake dynamics. To obtain data covering a much longer time period we used synthetic data generated by numerical simulations of seismicity on a heterogeneous fault governed by 3-D elastic dislocation theory, power-law creep and boundary conditions corresponding to the central San Andreas Fault [7,28,29]. In Fig. 3 we represent seismic activity during 150 years. This period contains $M_f \approx 1.3 \times 10^4$ events (represented in Fig. 3 by colored dots) in the magnitude interval [3.3–6.8]. Unlike in the Changbaishan case, the seismic events have one more feature – the earthquake depth. Thus the feature space now has four dimensions. In Fig. 3 we display the data distribution in time-depth-position 3-D space. The fourth dimension – the magnitude – is displayed in Fig. 3 by the size of the dot. To make the situation clearer, only the large earthquakes with magnitudes $m > 6$ (large dots) and the smallest ones with $m < 4$ (small dots) are distinguished in Fig. 3. As shown in Fig. 3 and in [24], the synthetic seismic events with magnitudes $m < 4$ produce stripe-like clusters in the data space. They precede large earthquakes ($m > 6$) and are separated in time by regions of mixed types of events (i.e., with $4 < m < 5$).

Another system of earthquake clusters is shown in Fig. 4. The synthetic data ($M_f \approx 10^5$ events) corresponding to the seismic activity during 1500 years were generated by the same model [7] for similar geological and boundary conditions. Only medium-size events with $4.5 < m < 6$ were taken for clustering. In addition to the local stripe-like clusters of smaller events ($m < 4$) detected for the 150-year data, one can observe in Fig. 4 a distinct spatio-temporal patchwork structure of clusters of medium-sized events ($4.5 < m < 6$). These clusters follow spatio-temporal changes in strength-stress properties of the fault in the region simulated.

In summary, we can highlight a very fundamental property of earthquakes: multi-resolutional clusters are built up by the earthquake epicenters. The clustering is a dynamical process involving many spatio-temporal scales. The dynamic nature of earthquake clusters over a very long time horizon is obvious, because one can expect that tectonic plates will dynamically change the geo-mechanical properties of the Earth's crust. Over a long time period covering thousands of years, the patterns from Fig. 1 will evolve following the changes in the fault network. More mysterious is the character of earthquake dynamics on the spatio-temporal scales that allow for making realistic predictions. We show that in the medium-time period lasting more than a hundred years the seismic events may produce a periodic system of clusters in approximately equal time intervals with increasing and decreasing seismic activity. The large earthquakes appear after quiescent time periods. The short-time dynamics reveal additionally that the earthquake swarms are signaled by a smaller precursory cluster of seismic activity. The earthquake attributes, such as the magnitude and the epicenter depth, allow for better interpretation of emerging clusters and exploration of the hypothesis space. Therefore, for studying various aspects of earthquake dynamics, including their prediction, we have to analyze the cluster structures in multi-dimensional feature space, to be sure that no important information will be lost or neglected.

Figure 3. The plot reconstructing seismic activity during 150 years from synthetic data [7] (horizontal distance – X, depth – z; visualized by using the Amira visualization package [1]). Large events (with magnitude m > 6) are shown as distinctly larger dots on the background of the lowest magnitude events (m < 4). Patches of low-magnitude events preceding larger events are visible [24]. The separate clusters are marked in colors

Figure 4. The plot reconstructing seismic activity during 1500 years from synthetic data [7]. The largest clusters obtained for events with magnitudes 4.5 < m < 6. Large events (m > 6) are shown as distinctly larger plates. The separate clusters are marked in colors

Multidimensional Feature Space

In Fig. 1 every point i, representing one of $M_f$ earthquakes, has two dimensions: the geographical coordinates $x_i = [x_1, x_2]$ of the epicenter. The point can be treated as a 2-D vector $f_i$ in the feature space, where $f_i = x_i$. Assuming additional coordinates, at the highest level of resolution a single seismic event i can be represented as a five-dimensional data vector $f_i = [m_i, z_i, x_i, t_i]$, where $m_i$ is the magnitude and $x_i$, $z_i$, $t_i$ are its epicentral coordinates, depth and time of occurrence, respectively. The spatio-temporal clusters can be extracted by 3-D visualization similar to that of Figs. 3, 4, distinguishing the extra dimensions by the size and color of the dots. Only clusters in the three spatially visualized dimensions can be extracted, while the other attributes associated with the earthquake characteristics are used for discriminating among the different types of clusters.

As shown in Figs. 3, 4 and in [24], at the lowest resolution level we can analyze the data locally by looking for clusters of similar events. However, treating a single event in a given area as a feature vector [78] is not a good approach from the point of view of generalization. The number of events is usually large. There are many noisy background events, which destroy the relevant clusters or produce artificial ones. Moreover, the clustering of raw data neglects important statistical information that concerns the entire inspected area. In an alternative approach, the entire seismic area is described by a multidimensional feature vector evolving in time. In the following, these features are descriptors $a_k$ (seismicity parameters) corresponding to different statistical properties of all the events measured in a given time interval. The number of descriptors N defines the dimensionality of the feature vector $F_i = [a_1, a_2, \ldots, a_N]$, $i = 1, 2, \ldots, M$. The vector represents not a single seismic event but the seismic situation over the whole monitored area in the time interval indexed by i. The number of feature vectors M is equal to the number of time intervals in which the descriptors are computed. The index i is a discrete equivalent of time. We expect that the feature vectors representing different moments of time also tend to form clusters in the abstract N-dimensional feature space. Monitoring changes of these time series in the abstract N-dimensional space may be used as a proxy for the evolution of stress and a large earthquake cycle on a heterogeneous fault [9].

To explain this approach better, let us assume that we have to analyze the behavior of clients in a hypermarket. We can watch every client separately, assuming that each can be described by a feature vector consisting of only two coordinates: the time the client entered the shop and the money spent. Then we can try to find clusters emerging with time during a shopping day. This is not easy, owing both to the large number of feature vectors (clients) producing statistical noise and to the lack of correlations between them. Another approach consists in treating as a feature vector not a single client but every subsequent time interval $t_i = i \cdot T$ ($i = 0, 1, 2, \ldots, M_F$; $t_i < t_e$; $M_F = (t_e - t_b)/T$, where $t_b$ is the beginning of the working day and $t_e$ is the closing time). Let the coordinates of each feature vector be the following descriptors averaged over the time interval: the number of people inside the shop (crowding), the flow, items bought, and money spent per person. We note that the number of feature vectors is now substantially smaller than in the previous approach, but the dimensionality of the feature space is larger. Let us assume that as a result of clustering we extract two distinct clusters. The first one consists of feature vectors (time intervals) from 10.00–11.00 and 13.00–14.00. This cluster is characterized by very small values of the first three descriptors (crowding, flow, number of items sold) and relatively large expenses. The second cluster consists of time intervals from 8.00–9.00 and 16.00–18.00 with all descriptors large. We could conclude that the first cluster consists of shopping hours favored by wealthy retired people from the rich village in the neighborhood, while the second cluster is associated with the shopping rush just before the beginning and after the end of working hours.

In the same way, the clusters of feature vectors (time intervals) consisting of seismicity parameters should reflect the similarity between seismic activities in various time intervals. As we showed earlier, large seismic events are preceded by precursory events, reflected by abnormal seismic activity in the whole area. We suppose that these moments of time are similar in the context of the selected set of seismicity parameters. Thus the feature vectors corresponding to precursory events should belong to the same cluster. The idea of a predictive system based on clustering consists in detecting the clusters of former foreshocks and signaling whether or not the current feature vector, which represents the current seismic situation over the area, is a member of these clusters.
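The membership test described above can be illustrated with a very small sketch. It is not the authors' predictive system: it assumes that clusters of past feature vectors are already available and uses a nearest-centroid rule as a stand-in for "is a member of the cluster". The function names `centroid` and `is_precursory` are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def is_precursory(current, precursor_cluster, other_clusters):
    """Assumed rule: the current feature vector is flagged when it lies closer
    to the centroid of the precursory cluster than to any other centroid."""
    d_pre = math.dist(current, centroid(precursor_cluster))
    return all(d_pre < math.dist(current, centroid(c)) for c in other_clusters)
```

In practice the feature vectors would first be normalized, and a density- or distance-threshold rule might be preferable to a plain nearest-centroid comparison.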

The seismicity parameters are computed as time and space averages over given time and space intervals within a sliding time window of length T and time step dt. Each $a_k$ represents one of the following seismicity parameters: NS, NL, CD, SR, AZ, TI, MR. The value of dt was taken to be equal to the average time difference between two consecutive recorded events, while T is equal to about 1/10 of the average time between two successive large events (m > 6 or m > 5). Increasing dt and T yields smoother time series owing to better statistics, but poorer prediction characteristics can then be expected. We define the seismicity parameters as shown in Table 1 [28,29].

Earthquake Clusters over Multi-dimensional Space, Visualization of, Table 1. Definition of seismicity parameters

NS    Degree of spatial non-randomness at short distances: the difference between the distribution of event distances and that of distances between randomly distributed points.

NL    Degree of spatial non-randomness at long distances.

CD    Spatial correlation dimension calculated on the basis of correlation integrals and of interevent distances.

SR    Degree of spatial repetitiveness; represents the tendency of events with similar magnitudes to have nearly the same locations of hypocenters.

AZ    Average depth of the earthquake occurrence.

TI    Inverse of seismicity rate: the time interval in which a given (constant) number of events occurs.

MR    Ratio of the numbers of events falling into two different magnitude ranges, $M_f(m \ge M_0)/M_f(m < M_0)$.

Earthquake Clusters over Multi-dimensional Space, Visualization of, Figure 5. An example set of seismicity parameters {M, NS, NL, CD, SR, AZ, TI, MR} in time (i is the consecutive number of the feature vector) for a file from the 1500-year synthetic data catalog (from [25]). The green and red strips show the time moments belonging to the two different clusters. The green cluster corresponds to the time intervals of lower, and the red cluster to those of higher, seismic activity. The time series represent about $M_F = 10^3$ feature vectors $F_i$

The seismicity parameters produce seven time series and create the abstract 7-dimensional feature space of time events $F_i = (NS_i, NL_i, CD_i, SR_i, AZ_i, TI_i, MR_i)$, where i indexes the discretized values of time $t = t_b + iT$. In Fig. 5 we display an example set of seismicity parameters (with average magnitude M) for synthetic data [25]. The precise location of the clusters and the visualization of the clustering results are significant challenges in clustering over multi-dimensional space. In the following section we briefly present the basics of clustering and the algorithms needed in this venture.
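As an illustration of how such time series of descriptors can be built, the following minimal sketch slides a window of length T with step dt over a toy catalog. It is an assumption-laden simplification, not the procedure of [28,29]: only the easily computed descriptors are shown (mean depth AZ, magnitude ratio MR, maximum magnitude), the event rate is used as a stand-in related to TI, and the function name `sliding_window_features` and the tuple layout are hypothetical.

```python
def sliding_window_features(events, T, dt, m0):
    """events: list of (time, depth, magnitude) tuples sorted by time.
    Returns one tuple (t_start, AZ, rate, MR, M_max) per non-empty window
    [t_start, t_start + T), with windows shifted by dt."""
    if not events:
        return []
    t_begin, t_end = events[0][0], events[-1][0]
    features = []
    i = 0
    while t_begin + i * dt + T <= t_end:
        t0 = t_begin + i * dt
        window = [e for e in events if t0 <= e[0] < t0 + T]
        if window:
            az = sum(e[1] for e in window) / len(window)    # mean depth (AZ)
            rate = len(window) / T                          # events per unit time
            large = sum(1 for e in window if e[2] >= m0)    # events with m >= M0
            small = len(window) - large                     # events with m < M0
            mr = large / small if small else float("inf")   # magnitude ratio (MR)
            m_max = max(e[2] for e in window)               # largest event in window
            features.append((t0, az, rate, mr, m_max))
        i += 1
    return features
```

Each returned tuple plays the role of one feature vector $F_i$; the remaining descriptors of Table 1 (NS, NL, CD, SR) require interevent-distance statistics and are omitted here.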

The Detection and Visualization of Clusters in Multi-Dimensional Feature Space

Our main challenge is to devise a clustering scheme which can divide the M feature vectors $x_i$, $i = 1, 2, \ldots, M$, into k separate groups (clusters). More formally, assuming that $X = \{x_i\}_{i=1,\ldots,M}$ and $x_i \in R^N$, $x_i = \{x_{i1}, x_{i2}, \ldots, x_{iN}\}$, we define a k-clustering of X as a partition of X into k clusters $C_1, \ldots, C_k$, provided three conditions are met:

• $C_i \neq \emptyset$, $i = 1, \ldots, k$: the clusters are non-empty sets,

• $\bigcup_{i=1,\ldots,k} C_i = X$: the clusters together contain all the feature vectors,

• $C_i \cap C_j = \emptyset$, $i \neq j$; $i, j = 1, \ldots, k$: each feature vector belongs to only one cluster.
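The three conditions can be checked mechanically. The sketch below assumes that each cluster is given as a collection of indices into X; the function name `is_k_clustering` is hypothetical.

```python
def is_k_clustering(num_vectors, clusters):
    """Check the three partition conditions for clusters given as lists of
    indices 0..num_vectors-1 into the set of feature vectors X."""
    # Condition 1: every cluster is non-empty.
    if any(len(c) == 0 for c in clusters):
        return False
    seen = set()
    for c in clusters:
        for idx in c:
            # Condition 3: no feature vector belongs to two clusters.
            if idx in seen:
                return False
            seen.add(idx)
    # Condition 2: the clusters together cover all of X.
    return seen == set(range(num_vectors))
```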

The computational problem with clustering is that the number of possible clusterings of M vectors into k groups is given by the Stirling numbers of the second kind, which are very large:

$$ S(M, k) = \frac{1}{k!} \sum_{i=1}^{k} (-1)^{k-i} \binom{k}{i}\, i^{M} . \tag{5} $$
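Equation (5) can be evaluated exactly with integer arithmetic, which gives a feeling for how fast the number of clusterings grows. The sketch below uses the standard alternating sign $(-1)^{k-i}$ of the Stirling numbers of the second kind; the function name `stirling2` is hypothetical.

```python
from math import comb, factorial

def stirling2(M, k):
    """Number of ways to partition M labeled items into k non-empty groups,
    computed from the alternating-sum formula with exact integers."""
    total = sum((-1) ** (k - i) * comb(k, i) * i ** M for i in range(1, k + 1))
    return total // factorial(k)   # the sum is always divisible by k!

print(stirling2(15, 3))   # 2375101
print(stirling2(20, 4))   # 45232115901
```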

Some values of $S(M, k)$ are: $S(15, 3) \approx 2 \times 10^{6}$; $S(20, 4) \approx 45 \times 10^{9}$; $S(25, 8) \approx 7 \times 10^{17}$; $S(100, 5) \approx 7 \times 10^{67}$. Knowing that the value of M in typical clustering problems can be $10^2$ to $10^9$ and more, we see that the clustering problem is intrinsically hard, and exhaustive search, i.e., looking through all possible clusterings, cannot be considered. Special clustering schemes based on proximity measures between feature vectors have to be exploited. The basic steps that must be followed in order to develop a clustering task are the following:

1. Feature selection – Features must be properly selected to encode as much information as possible. Parsimony and minimum redundancy among the features is a major goal.

2. Proximity measure – This is a measure of how “similar” (or “dissimilar”) two feature vectors are.

3. Clustering criterion – This depends on the interpretation of the term “sensible”, which in turn depends on the type of clusters expected in the data set, e.g., oblate, elongated, “bridged”, circular, etc.

4. Clustering algorithm – Choose a specific algorithmic scheme that unravels the clustering structure of the data set.

5. Validation and interpretation of results – These are the final processes of clustering.

There are two principal types of clustering algorithms:non-hierarchical and agglomerative schemes [2,50,78].

Clustering Techniques

The non-hierarchical clustering algorithms are used mainly for extracting compact clusters by using global knowledge about the data structure. The well-known k-means based schemes [78] consist in finding the global minimum of the following goal function:

$$ J(w, z) = \sum_{j} \sum_{i \in C_j} \left| x_i - z_j \right|^2 , \tag{6} $$

where $z_j$ is the position of the center of mass of cluster j, while $x_i$ are the feature vectors closest to $z_j$. To find a global minimum of the function $J(w, z)$, one repeats the clustering procedure many times for different initial conditions [48]. Each new initial configuration is constructed in a special way from the previous results by using the methods from [48,87]. The cluster structure with the lowest minimum of $J(w, z)$ is selected.
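The repeated-restart idea can be sketched as follows. This is a plain Lloyd-type k-means with purely random restarts, which is a simplification: the special construction of new initial configurations from previous results [48,87] is not reproduced here. All function names and default parameters are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_once(points, k, rng, iters=100):
    """One k-means run from a random initial choice of k centers."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        # Assign every vector to its closest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break                                   # assignments are stable
        labels = new_labels
        # Move every center to the center of mass of its cluster.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    J = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return J, labels, centers

def kmeans(points, k, restarts=20, seed=0):
    """Repeat k-means from different initial conditions; keep the lowest J."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda r: r[0])
```

The returned triple holds the value of the goal function of Eq. (6), the cluster label of every vector, and the final centers.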

Agglomerative clustering schemes consist in the successive merging of smaller clusters into larger ones, based on proximity and clustering criteria. Depending on the definition of these criteria, there exist many agglomerative schemes such as: average link, complete link, centroid, median, minimum variance and nearest neighbor algorithm. The hierarchical schemes are very fast for extracting localized clusters with non-spherical shapes. The proper choice of proximity and clustering criteria depends on many aspects, such as the dimensionality of the data. For example, a smart clustering criterion based on a linked-list scheme for finding neighbors, used for clustering molecules, is completely worthless for clustering N-dimensional data, for which it has extremely high computational complexity. All agglomerative algorithms suffer from the problem of not having properly defined control parameters, which can be matched to the data of interest and hence can be regarded as invariants for other similar data structures.
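A minimal sketch of one agglomerative scheme, the nearest neighbor (single link) algorithm, is given below. It is a naive implementation intended only to show the merging loop, not an efficient one; the stopping rule (a prescribed number of clusters k) and the function name `single_link` are assumptions.

```python
import math

def single_link(points, k):
    """Merge the two closest clusters repeatedly until k clusters remain.
    The distance between clusters is the smallest point-to-point distance.
    Returns clusters as lists of indices into points."""
    clusters = [[i] for i in range(len(points))]

    def link(a, b):
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        a, b = min(pairs, key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]))
        clusters[a].extend(clusters[b])   # merge cluster b into cluster a
        del clusters[b]                   # b > a, so index a stays valid
    return clusters
```

Because only the closest pair of points matters, this criterion follows elongated, chain-like clusters that k-means would split.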

The majority of the classical clustering algorithms require knowledge of the number of clusters. However, this number is usually unknown a priori. Furthermore, these methods do not perform well in the presence of heavy noise or outliers. Recently, new methods have been proposed that can deal with noisy data, discover non-spherical clusters and allow for automatic assessment of the number of clusters. Some important examples are the Chameleon [55], DBSCAN [70] and CURE [38] algorithms. Unfortunately, these methods are suited only for low-dimensional data and are rather inefficient, which limits their use for data mining of large-scale sets. For clustering of large sets of multidimensional data, other approaches are in great demand. In the innovative work by Frey and Dueck [33] the authors use the concept of “affinity propagation,” which takes as input measures of similarity between pairs of data points. Real-valued messages are exchanged between data points until a high-quality set of exemplars and corresponding clusters gradually emerges. Affinity propagation promises to find clusters with much lower error than other methods, and it can do this in less than one-hundredth the amount of time.

Clustering schemes do not produce univocal results. For low-dimensional (2-D or 3-D) spaces the human eye can decide whether the clustering result is optimal or not. However, this becomes hopeless for higher dimensions. There exist many techniques for visualizing multidimensional clusters. One of them is multi-dimensional scaling (MDS) (see the overview of mapping techniques in [73]), the most powerful non-linear mapping technique. This method allows for visualization of multidimensional data in 2-D or 3-D and for interactive extraction of clusters.

Multidimensional Scaling

Multi-dimensional scaling, or MDS, is mathematically a non-linear transformation of N-dimensional data onto n-dimensional space, where $n \ll N$ [23,73,78]. The MDS algorithm is based on the “stress function” criterion. The goal is to maintain all the distances between points $R_i \in \omega \subset \Re^N$ in the Euclidean 3-D (or 2-D) space with a minimum error. The “stress function” can be written as follows:

$$ E(\omega, \omega') = \sum_{j<i} D_{i,j}^{\,w \cdot m_i} \left( D_{i,j} - r_{i,j}^2 \right)^{m_i} = \min , \tag{7} $$

where $r_{i,j}^2 = (r_i - r_j) \cdot (r_i - r_j)$, $i, j = 1, \ldots, M$; $D_{i,j}$ is the squared distance between points $R_i, R_j \in \omega \subset \Re^N$; and $r_i, r_j \in \omega' \subset E^3$ are the coordinates of the respective points in 3-D Euclidean space. The values of w and $m_i$ are the parameters of the transformation.

The result of the mapping depends on the quality of the minimum obtained for the “stress function”. Usually the dimensionality of the “stress function” domain is very high and is equal to $N \cdot M$, i.e., thousands in the smallest and billions in large problems. For more than $M = 10^3$ feature vectors, the high dimensionality of the source space and the data complexity may cause the resulting low-dimensional patterns to be completely illegible. The application of standard numerical algorithms for finding the global minimum of this multimodal, non-linear and complex criterion becomes hopeless. Therefore, for visualization of $M > 10^3$ multidimensional data samples, more reliable minimization techniques for extracting the global minimum of the “stress function” are required. In [23] we proposed, as a heuristic, an N-body solver based on ordinary differential equations (ODEs). The algorithm is as follows:

1. The initial configuration of M interacting “particles” is generated in $E^3$,

2. Every “particle” corresponds to the respective N-dimensional point from $\Re^N$,

3. The “particles” interact with each other via the particle-particle potential $V_{i,j}$:

$$ V_{i,j} = \frac{1}{4}\, k \left( r_{i,j}^2 - a_{i,j}^2 \right)^2 \tag{8} $$

(k is the stiffness factor), and the energy produced is dissipated by a friction force proportional to the velocity of the particles.

4. The system of particles evolves according to the Newtonian equations of motion.

Earthquake Clusters over Multi-dimensional Space, Visualization of, Figure 6. a The conceptual diagram of MDS transformation, b The clusters from Fig. 5 mapped by using multidimensional scaling into 3D space for synthetic seismic data catalog A covering 1500 years

In this way the interactions between each pair of particles are described by various spring-like potentials, dependent on the separation distance $r_{ij}$ between the particles and the distance $D_{ij}$ between the respective multidimensional points in $\Re^N$. If the distance between particles i and j in the output 2-D (or 3-D) space is smaller than the distance between the respective feature vectors i and j in the source N-D space, these points repel one another. Otherwise, i.e., when the distance is larger, the particles attract one another. By using the leap-frog numerical scheme for time integration [41], the following formulas for the velocities and positions of the “particles” can be derived from the Newtonian equations:

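$$
\begin{aligned}
v_i^{n+1/2} &= \frac{1-\varphi}{1+\varphi}\, v_i^{n-1/2} + \frac{\alpha\, \Delta t}{1+\varphi} \left\{ \sum_{j=1}^{K} \left( (r_{i,j}^{n})^2 - a_{i,j}^2 \right) r_{i,j}^{n} + \frac{g}{\alpha}\, i_z \right\}, \\
r_i^{n+1} &= r_i^{n} + v_i^{n+1/2}\, \Delta t, \\
\alpha &= \frac{k}{m}, \qquad \varphi = \frac{\lambda\, \Delta t}{2m},
\end{aligned}
\tag{9}
$$

where $v_i^n$ and $r_i^n$ are the velocity and position of particle i, n is the time-step number, $m = 1$ is the particle mass and $\lambda$ is the friction coefficient. For the force to agree in sign with the potential of Eq. (8), the vector $r_{i,j}^n$ points from particle i to particle j.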


As is common in molecular dynamics [41], the system of “particles” evolves in time until the global (or close to the global) minimum of the total potential energy of the particle system (the sum of the pair potentials of Eq. (8)) is reached. Two free parameters, the friction coefficient $\lambda$ and the stiffness k, have to be fitted to obtain a stable state, in which the final positions of the frozen “particles” reflect the result of the N-D to 3-D mapping. The conceptual scheme of MDS exploiting the N-body solver is shown in Fig. 6a. In Fig. 6b we present the feature vectors shown in Fig. 5, transformed from the 7-dimensional feature space by the MDS procedure and mapped onto 3-D space. The movies (Movie 1 and 2 in Supplementary Materials) show how rotation in the 3-D space can help in cluster recognition.
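The particle scheme can be sketched in a few dozen lines. The code below is a minimal illustration under stated assumptions, not the solver of [23]: it uses the pair potential of Eq. (8) with $a_{i,j}^2$ taken as the squared distance in the source space, a leap-frog update with velocity-proportional friction, no gravity-like term, and all pairs interacting. The function names, the friction symbol `lam` and the default parameter values are hypothetical, and the time step must be small relative to the scale of the (normalized) data.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two coordinate lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def target_sq_distances(X):
    """Squared distances a_ij^2 between the N-dimensional source points."""
    M = len(X)
    return [[sq_dist(X[i], X[j]) for j in range(M)] for i in range(M)]

def initial_configuration(M, n, seed=0):
    """Random initial positions of M particles in n-dimensional space."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n)] for _ in range(M)]

def potential(Y, A2, k=1.0):
    """Total potential energy: sum over pairs of (k/4)(r_ij^2 - a_ij^2)^2."""
    return sum(0.25 * k * (sq_dist(Y[i], Y[j]) - A2[i][j]) ** 2
               for i in range(len(Y)) for j in range(i))

def nbody_mds(A2, Y0, k=1.0, lam=2.0, m=1.0, dt=0.02, steps=3000):
    """Damped particle dynamics; returns the final low-dimensional positions."""
    Y = [list(p) for p in Y0]              # work on a copy of the start positions
    n = len(Y[0])
    V = [[0.0] * n for _ in Y]             # particles start at rest
    phi = lam * dt / (2.0 * m)
    for _ in range(steps):
        forces = []
        for i, yi in enumerate(Y):
            f = [0.0] * n
            for j, yj in enumerate(Y):
                if i != j:
                    # Force from the pair potential: repels if r_ij < a_ij.
                    c = -k * (sq_dist(yi, yj) - A2[i][j])
                    for d in range(n):
                        f[d] += c * (yi[d] - yj[d])
            forces.append(f)
        for i in range(len(Y)):
            for d in range(n):
                V[i][d] = ((1.0 - phi) * V[i][d] + dt * forces[i][d] / m) / (1.0 + phi)
                Y[i][d] += V[i][d] * dt
    return Y
```

A typical call computes `A2 = target_sq_distances(X)`, draws `Y0 = initial_configuration(len(X), 3)` and compares `potential(Y0, A2)` with `potential(nbody_mds(A2, Y0), A2)`. The cost per step grows as $M^2$, so this direct form is practical only for small M.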

Description of the Data

Natural Datasets

We analyze the observed and synthetic earthquake catalogs for three time intervals of 5, 150 and 1500 years, respectively. The observed data (Fig. 7) represent seismic activities of the Japanese islands collected by the Japan Meteorological Agency (JMA).

Earthquake Clusters over Multi-dimensional Space, Visualization of, Figure 7. Seismic activities around the Japanese Archipelago over a time period of 5 years. We use the hypocentral data provided by the Japan Meteorological Agency (JMA). The magnitudes of the earthquakes (JMA magnitude) and their depths are represented by differences in the radius of the circle and by colors, respectively. The red stars symbolize large events such as: Chi-Chi Taiwan earthquake (21/9/1999, M7.6, latitude 23.8, longitude 121.1), Swarm at Miyakejima (7/2000-8/2000, latitude 34.0, longitude 139.0), Western Tottori earthquake (6/10/2000, M7.3, latitude 35.3, longitude 133.4) (from [25])

The JMA Catalog consists of 915,829 events detected in the Japan Islands between 1923 and January 31, 2003. The original catalog also includes events with magnitudes less than 1.0. The lowest magnitudes were determined by using a detection level estimated from the Gutenberg–Richter frequency-size distribution. We have assumed that the cutoff magnitude of earthquakes is equal to 3 (m > 3). We do not use any cutoff depth for hypocenters. The seismic events shown in Fig. 7 were recorded during the 5-year time interval from October 1, 1997 to January 31, 2003. The data set processed consists of $M = 42\,370$ seismic events with magnitudes m, position in space (latitude X, longitude Y, depth z) and occurrence time t.

To analyze the seismic activity in longer time periods,we use data from synthetic catalogs generated by numeri-cal earthquake models [7].

Physical Model of Earthquake Dynamics

The synthetic catalogs are generated by the model of Ben-Zion [7] for a segmented strike-slip fault zone in a 3D elastic half-space, based on earlier developments of Ben-Zion and Rice [10,11]. The model attempts to account for statistical properties of earthquake ruptures on long and narrow fault zones with bends, offsets, etc. (Fig. 8a), represented by a cellular structure in a 2D plane with discrete cells and spatial variations of frictional parameters (Fig. 8b). The model contains a computational grid (region II of Fig. 8b) where evolving stress and seismicity are generated in response to ongoing loading imposed as slip boundary conditions on the other fault regions. Regions III and V creep at a constant plate velocity of 35 mm/yr, while regions I and IV follow staircase slip histories with recurrence times of 150 yr. The stress transfer due to the imposed boundary conditions and failing grid cells is calculated by using a discretized form of a boundary integral equation and employing the static solution for dislocations in a 3D elastic half-space [10,63].

Deformation at each computational cell is the sum of slip contributions from brittle and creep processes. The brittle process (Fig. 8c) is governed by distributions of static friction $\tau_s$, dynamic friction $\tau_d$, and arrest stress $\tau_a$. The static friction characterizes the brittle strength of a cell until its initial failure in a given model earthquake. When the stress $\tau$ at a cell reaches the static friction, the strength drops to the dynamic friction for the remaining duration of the event. The stress at a failing cell drops to the arrest level $\tau_a$, which may be lower than $\tau_d$ to accommodate dynamic overshoot, producing local slip governed by dislocation theory [17,63]. The static friction, dynamic friction, and arrest stress are connected via a dynamic overshoot coefficient $D = (\tau_s - \tau_a)/(\tau_s - \tau_d)$. If the stress transfer from failing regions increases the stress at other cells to their static or dynamic strength thresholds, as appropriate, these cells fail and the event grows. When the stress at all cells is below the brittle failure thresholds, the model earthquake ends and the strength at all failing cells recovers back to $\tau_s$. The creep process is governed by a power-law dependence of creep velocity on the local stress and space-dependent coefficients that increase exponentially with depth and with distance from the southern edge of the computational grid. The chosen parameters produce an overall “pine-tree” stress-depth profile with a “brittle-ductile” transition at a depth of about 12.5 km, and variable stress-along-strike profiles with a gradual “brittle-creep” transition near the boundary between regions II and III (see Ben-Zion [7] for additional details). The model generates many realistic features of seismicity compatible with observations, including frequency-size and temporal event statistics, hypocenter distribution with depth and along strike, intermittent criticality, accelerated seismic release, scaling of source time functions and more (e.g., [9,29,56,88]).

Earthquake Clusters over Multi-dimensional Space, Visualization of, Figure 8. The schematics of the model of Ben-Zion [7] for a segmented strike-slip fault zone in a 3D elastic half-space

Synthetic Catalogs

Synthetic data generated by computational models can comprise many events covering large spatial areas and extremely long time spans, which ensures the statistical reliability of the results. The data are free of the measurement errors that occur in estimating earthquake magnitudes and hypocentral locations, and do not suffer from the incomplete recording of small events that affects natural catalogs. These are significant advantages for our study, which attempts to illustrate clearly the performance of clustering analysis and visualization techniques.

In Sect. “Description of the Data” we analyze synthetic catalogs generated by two model realizations (A and M) of Ben-Zion [7]. The catalogs contain the time, location and magnitude of earthquakes calculated by the model for 150 and 1500 years. Extensive numerical simulations with several different classes of models, summarized by Ben-Zion [8] and Zöller et al. [9], suggest that the degree of disorder in fault heterogeneities is a tuning parameter of the earthquake dynamics. Catalog A is generated by a model realization tailored to the Parkfield section of the San Andreas fault. Catalog M is generated by a realization of a more disordered system, like the San Jacinto fault or the Eastern California Shear Zone in Southern California. In both data sets the time interval covers all events ($M \approx 1\text{–}3 \times 10^{4}$) that have occurred in the last 150 years of simulated fault activity. These simulations were repeated for a ten times larger time scale, i.e., a 1500-year interval (the number of events $M \approx 10^{5}$), covering hundreds of large earthquakes (m > 6), with a correspondingly wider time window.

The seismicity parameters were obtained by averaging the data using a sliding time window of constant width T and shift dt. We employ T = 10 days and dt = 2 days for the Japanese data, T = 10 months and dt = 2 months for the 150-year synthetic data, and T = 30 months and dt = 6 months for the data covering the 1500-year time period. Each parameter in the clustering was normalized with respect to the standard deviation.
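The normalization step can be illustrated as follows. The text does not say whether the mean is also removed, so this sketch makes the assumption that each parameter series is simply divided by its (population) standard deviation; the function name `normalize_by_std` is hypothetical.

```python
import statistics

def normalize_by_std(series):
    """Scale one seismicity-parameter time series to unit standard deviation.
    A constant series is returned unchanged to avoid division by zero."""
    s = statistics.pstdev(series)
    return [x / s for x in series] if s > 0 else list(series)
```

After this scaling all descriptors contribute on comparable scales to the distances used by the clustering and MDS procedures.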

Earthquake Clusters over Multi-dimensional Space, Visualization of, Figure 9. Real seismic data [49] analyzed by using clustering in both the data (a,b) and the feature (c,d) spaces. In panels a and b one can see the results of clustering in the data space (from two different perspectives, X-Time and Depth-Time) for small (3 < m < 4) and medium magnitude (4 < m < 6) events, respectively, represented by small dots. The different colors of the dots denote different clusters. Large events are visualized by the larger spheres. Their colors show the difference in magnitudes m (red for the largest, green for the smallest). The clusters in panels a,b encircled in red display the places with the largest seismic activity, while those in white represent the clusters of small precursory events. The red, white and green stripes in panels c and d, representing 4 (out of 7) seismic parameters and the maximum magnitude M, show the clusters of similar time events for situations corresponding to panels a and b, respectively. The Amira visualization package was used (http://www.amiravis.com)

Earthquake Visualization by Using Clustering in Feature Space

Short-Time Period

Results of clustering of the observed Japanese seismic catalogs (see Fig. 7), both in raw data and in feature spaces, are shown in Fig. 9. At the data resolution level a single seismic event i can be represented as a multi-dimensional data vector $f_i = [m_i, z_i, X_i, Y_i, t_i]$, where $m_i$ is the magnitude, $X_i$ the latitude, $Y_i$ the longitude, and $z_i$ and $t_i$ the depth and the time of occurrence, respectively. The seismic events are visualized with the Amira package in Fig. 9a,b as irregular clouds of colored dots with $(z, x, t)$ coordinates.

In accordance with the Gutenberg–Richter relationship, we find that the number of events from various ranges of magnitudes differs considerably, and we divide the entire set of data into three subsets. The first one comprises the small events, the second the medium events, and the last one represents the largest earthquakes, displayed in Fig. 9a,b as big dots. The deepest earthquakes (z > 150 km) are not displayed in Fig. 9. The various shades represent the magnitudes of earthquakes from m = 6 (green) to m = 7 (red). In Fig. 9 we present the clustering results in both the data $f_i$ and the feature $F_i$ spaces. We look for clusters of similar seismic events (data space) and time events (feature space). The dots (data vectors) belonging to the same cluster have the same color. Fig. 9a,b is very rich in cluster-like forms, some of them hard to interpret. Correspondence of the cluster structure of the data $f_j$ ($j = 1, \ldots, M_f$) with the clusters of averaged events $F_i$ ($i = 1, \ldots, M_F$) in the feature space can reveal interesting information. As one can see from panel c, only three clusters are obtained in the feature space consisting of small data events (3 < m < 4). The green cluster corresponds to two relatively large time intervals of small events preceding the Miyakejima earthquake and many smaller post-shock periods. The time events $F_i$ from this cluster represent averaged data events $f_j$ that are mainly shallow (AZ), with a high degree of spatial repetitiveness (SR) and small diversity of magnitudes (MR). The red cluster consists of deeper events of smaller repetitiveness and more diversified in magnitudes. The larger time interval of this type of behavior is recognized just after the Miyakejima shock. The white cluster is not interesting at this scale of small events and includes all other events (including the earthquake swarm).

Earthquake Clusters over Multi-dimensional Space, Visualization of, Figure 10. The clusters from feature space mapped into 3D space for: a realistic short-time-interval seismic data. The small blue cluster at the bottom represents the events at the end of the time interval, which are averaged within a shrinking time window. b The synthetic seismic data catalog A covering 150 years

In panel d we display the seismicity parameters, which form three clusters of time events obtained for seismic events of larger magnitude, 4 < m < 6. Clusters of these events have a different structure than in the previous case. They are parallel to the X-depth plane. The borders between clusters roughly correspond to the borders of successive showers of earthquakes. The red cluster comprises only the earthquakes corresponding to the Miyakejima swarm, encircled in red in Fig. 9b. As we can see from the MDS visualization displayed in Fig. 10a, this cluster is made up of a needle of time events sprouting away from the two remaining oval clusters. The green cluster in Fig. 9d represents the deep events, diversified in magnitude, of high repetitiveness and a rather high degree of spatial non-randomness at short distances (NS). As shown in Fig. 9, these time events represent mainly the post-swarm series of shocks. The white cluster, as before, includes all the other events.

Time Period of 150 Years

In Fig. 11 we display the time series of seismicity parameters computed for the complete synthetic data catalog A. These time series follow the situation from Fig. 3, where dots represent separate data events. The green, red and white strips in Fig. 11 separate 3 clusters of similar time events represented by 7-dimensional feature vectors. In Fig. 10b these clusters are visualized by means of the MDS transformation of the 7-dimensional feature space into 3-D space. In Fig. 10b each dot represents a 7-dimensional feature vector mapped into 3-D space by the MDS transformation.


Earthquake Clusters over Multi-dimensional Space, Visualization of, Figure 11. The seismicity parameters {M, NS, NL, CD, SR, AZ, TI, MR} in time for synthetic data catalog A (from [25])

From the top panel of Fig. 11, displaying the largest events M in the sliding time window, we may conclude that the white (blue in Fig. 10b) and red clusters from Figs. 10b, 11 comprise time events which correspond to the aftershock effects. The white cluster represents the net aftershock events, while the red one includes the earthquake effects averaged in the sliding time window. Conversely, the green cluster (yellow in Fig. 10b) contains the time events preceding the earthquakes.

The selectivity in time of the seismicity parameters depends on the width T and shift dt of the sliding time window. Due to space and time averaging, it is impossible to correlate precisely the appearance of an earthquake with the rest of the seismicity parameters when two earthquakes are too close to each other. Therefore, the sequence of green-red-white cluster events can be broken (Fig. 11) in time domains with many large earthquakes. As shown in Fig. 11, the occurrence of the largest events correlates well with the minima of the NS, CD, SR and TI parameters and the maxima of the AZ and MR parameters. This means that the occurrence of large earthquakes is preceded by increasing spatial diffusion of events and increasing seismicity rate. Moreover, the results confirm some findings from the real data on a shorter time-scale:

1. The events preceding large earthquakes are shallow and have small magnitudes. They also have a higher degree of spatial repetitiveness than events from different clusters.

2. The earthquakes accompanying and following the mainshock are rather large in magnitude, deep, have high seismicity rates and low spatial correlation dimension (this drops off rapidly at the onset of large events).

The analysis of synthetic data shows clearly that the clusters in the feature space reflect well the periodicity of increasing and decreasing seismic activity in a given area. At this scale, however, the fine-scale characteristics of precursory and aftershock effects become fuzzy.
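The role of the width T and shift dt of the sliding time window discussed in this section can be illustrated with a short Python sketch. For each window it computes the largest magnitude (M) and the average depth (AZ), plus a plain event rate. The function name, the half-open window convention and the toy catalog are assumptions made for illustration; this is not the procedure used to generate Fig. 11.

```python
import numpy as np


def sliding_window_parameters(times, magnitudes, depths, width, shift):
    """Return one row per window: (window end, max magnitude, mean depth, rate).

    Windows are half-open intervals [start, start + width) and successive
    windows are offset by `shift`. Empty windows are skipped.
    """
    times = np.asarray(times, dtype=float)
    magnitudes = np.asarray(magnitudes, dtype=float)
    depths = np.asarray(depths, dtype=float)
    order = np.argsort(times)  # make sure the catalog is time-ordered
    times, magnitudes, depths = times[order], magnitudes[order], depths[order]

    rows = []
    k = 0
    while times[0] + k * shift + width <= times[-1]:
        start = times[0] + k * shift
        lo = np.searchsorted(times, start, side="left")
        hi = np.searchsorted(times, start + width, side="left")
        if hi > lo:
            rows.append((
                start + width,            # time stamp of the "time event"
                magnitudes[lo:hi].max(),  # M: largest event in the window
                depths[lo:hi].mean(),     # AZ: average depth in the window
                (hi - lo) / width,        # events per unit time
            ))
        k += 1
    return np.array(rows)


# Toy catalog (hypothetical values): one event per time unit.
catalog_times = np.arange(10.0)
catalog_mags = np.array([1, 2, 3, 4, 5, 4, 3, 2, 1, 0], dtype=float)
catalog_depths = 2.0 * catalog_times
params = sliding_window_parameters(
    catalog_times, catalog_mags, catalog_depths, width=4.0, shift=2.0
)
```

With a wide window relative to the spacing of large events, a single large magnitude dominates M in several consecutive windows, which is one way to see why closely spaced earthquakes cannot be separated in the parameter time series.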

Time Period of 1500 Years

In Figs. 4, 5, 6b and Figs. 12, 13 we visualize the feature vectors for data covering a 1500-year period for two models: the A model with a Parkfield-type asperity and the M model with multi-size heterogeneities. In Fig. 4 and Fig. 12 one can recognize two types of clusters with different sizes. The larger cluster comprises feature vectors forming approximately 150-year-long periodic time intervals, which are represented by red strips in Fig. 4 and by green strips in Fig. 12. The second cluster consists of feature vectors from periodic gaps colored in green in Fig. 4 and in white in Fig. 12. This anomalous cluster corresponds to the periodic changes in the character of seismic activity. The third cluster (see Fig. 13), marked in red for the M type of data in Fig. 12, consists of short periodic time intervals representing rapid bursts of seismic activity within every 150-year interval.


Figure 12. The seismicity parameters in time for synthetic seismic data (catalog M) representing a time interval of 1500 years. The red and green strips depict the events belonging to the red and green clusters from Fig. 7b, respectively

In both the A and M models the gaps between 150-year-long intervals are correlated with a decrease of the correlation dimension (CD), the degree of spatial repetitiveness (SR) and the seismicity rate. These gaps are preceded by large earthquakes. The simulations used for generating the datasets incorporate imposed large earthquakes on regions (I) and (IV) of Fig. 8b that bound the computational grid (region II), as staircase boundary conditions with a step every 150 years. The analysis detected the effects of these boundary conditions on the seismicity that is calculated in region II.

There are also evident differences between the A and M data in the time intervals belonging to the second cluster. For the A environment the gaps between 150-year intervals are greater and the secondary periodicity within them is less clear. Moreover, within time intervals from the second cluster, the degree of spatial non-randomness decreases at long distances (NL) for the M model, while for the A data it decreases at short distances (NS). In addition, the average depth of earthquakes (AZ) is then clearly larger for the A model, while for the M data it remains at the average level.

In sum, by analyzing earthquake clusters in feature space over long time-scales, we can investigate important characteristics of seismic activity such as:

1. The occurrence of hierarchical time-periodicity in seismic activity caused by the increase of short-time correlations and their destruction, respectively. Correlations can be broken both due to short-wave and long-wave resonances of the Poincaré type (e.g., during the largest earthquakes) [Sornette, 2004].

2. The dependence of seismic activity on the ambient rheological and geological properties of the environment, which strongly modify the cluster structure of the feature vectors.

Figure 13. The clusters from Fig. 12 mapped by multidimensional scaling from a 7-dimensional feature space into 3-D space for synthetic seismic data catalog M covering 1500 years

Remote Problem Solving Environment (PSE) for Analyzing Earthquake Clusters

Need for Remote Visualization and Analysis

We need fast access to large databases in order to forecast earthquakes by observing similarities among thousands to millions of seismic events through visualization of earthquake clusters. The largest earthquake catalogs comprise terabytes of data. Taking into account also the data from tsunami earthquakes and micro-earthquakes in mines, the total amount of data collected by seismic centers spread all over the world is enormous. Moreover, knowledge extraction of earthquake precursors may demand exploration of cross-correlation relationships among many different catalogs. Therefore, both fast communication between data centers and large disk spaces are sorely needed.

Figure 14. Worldwide distribution of earthquake seismographic stations (© USGS)

As shown in Fig. 14, earthquake seismograph stations, which collect earthquake data from regions with high seismic activity, are distributed worldwide. Therefore, the unprocessed data needs to be stored and then transferred to a dedicated remote server for data processing. After processing, the results must be returned to data acquisition centers and/or other clients. Broadband access to remote facilities dedicated specifically to pattern recognition and visualization allows for scrutinizing local data catalogs by using peer-to-peer connections of data acquisition centers to data preprocessing servers. Clients in the network can automatically compare various types of earthquake catalogs, data measured in distant geological regions, and the results from various theoretical and computational models. By comparing data accessible in the network we have a chance to eliminate the environmental factors and to extract the resultant earthquake precursory effects.

Integration of a variety of hardware and operating systems, and their proper configuration, results in many communication problems between data centers. Efficient, reliable, and secure integration of distributed data and software resources, such as pattern recognition and visualization packages, is possible only within the GRID paradigm of computing [13,31]. The GRID mode of computing has flourished rapidly in recent years and has facilitated collaboration and accessibility to many types of resources, such as large data sets, visualization servers and computing engines. Scientific teams have developed easy-to-use, flexible, generic and modular middleware, enabling today’s applications to make innovative use of global computing resources. Remote access tools were also produced to visualize huge datasets and monitor performance and data analysis properties, effectively steering the data processing procedures interactively [21]. The TeraGrid project (http://www.teragrid.org) is a successful high-performance implementation of such a GRID infrastructure and is being used as an integrated, persistent computational resource at universities and laboratories across the USA. The TeraGrid development also impacts earthquake science. The National Science Foundation has awarded the Southern California Earthquake Center (SCEC) 15 million service units of computer processing time on supercomputers nationwide [Grid Today, August 2007]. These computational resources will be used for simulating thousands of possible earthquake scenarios in Southern California, including the largest breaks on the San Andreas fault (www.scec.org/cybershake). SCEC will be able to simulate the most disastrous earthquakes (M > 7), such as events that could produce Katrina-scale disasters.


Figure 15. Data acquisition, storage, processing, and remote problem solving environments

We discuss the idea of an integrated problem-solving environment (PSE) intended for the analysis of earthquake clusters for the prediction of earthquakes. A simplified scheme for data acquisition and visualization of earthquake clusters is displayed in Fig. 15. This system promotes portability, dynamic results on demand, and collaboration among researchers separated by long distances by using a client-server paradigm. This is provided through a lightweight front-end interface for users to run locally while a remote server takes care of intensive processing tasks on large databases, off-screen rendering, and data visualization.

Grid Environment

In general, large datasets and high-performance computing resources are distributed across the world. When collaboration and sharing of resources are required, a computational GRID infrastructure needs to be in place to connect these servers (see, e.g., [15]). Protocols must be available to allow clients to tap into these resources and harness their power. The computational grid can be seen as a distributed system of “clients”, which consists of either “users” or “resources” and proxies. A GRID can be implemented using an event brokering system designed to run on a large network of brokering nodes. Individually, these brokering nodes are competent servers, but when connected to the brokering system, they are able to share the weight of client requests in a powerful and efficient manner. Examples of this include GRID Resource Brokering [30] and NaradaBrokering.

These GRID architectures are well suited to the functionality of a PSE for earthquake cluster analysis and as an integrated computational environment for data exchange and common ventures. From the networking point of view, the seismic data centers represent a complex hierarchical cluster structure. They are located geographically in the regions of high seismic activity within heavily populated areas of economic importance. Therefore, the seismic data centers create distant superclusters of various “density” of computational resources corresponding to the size and importance of the regions. These superclusters are sparse in the sense of computational resources devoted to earthquake detection and data acquisition. However, these same structures contain important computational, scientific and visualization facilities with strong interest in the analysis of earthquake data and earthquake modeling. The efficient interconnection of these sites is of principal interest. Due to the “small world network” structure of GRID architectures it is possible to select the most efficient routing schemes, considerably shortening the average communication path length between brokers. GRID architectures are appropriate to link the clients, both users and resources, together. Construction of efficient and user-friendly Problem Solving Environments requires integration of data analysis and visualization software within the GRID environment, in such a way that it can be easily accessed via the Internet. We created an integrated data interrogation toolkit to act as a PSE for visualization and clustering of seismic data, which we call WEB-IS.

Example of Remote PSE

WEB-IS is a software tool that allows remote, interactive visualization and analysis of large-scale 3-D earthquake clusters over the Internet [85] through the interaction between client and server. WEB-IS acts as a PSE through a web portal used to solve problems by visualizing and analyzing geophysical datasets, without requiring a full understanding of the underlying details in software, hardware and communication [34,52]. As shown in Fig. 16, the primary goal of WEB-IS in the geosciences is to provide middleware that sits between the modeling and data analysis tools and the display systems that local or remote users access. In the case of large and physically distributed datasets, it is necessary to perform some preprocessing and then transmit a subset of the data to one or more processes or visualization servers to display. The details of where and how the data migrates should be transparent to the user. WEB-IS gives end users the capability of interactively exploring their data, even though they may not have the necessary resources such as sufficient software, hardware or datasets at their local sites. This method of visualization allows users to navigate through their rendered 3-D data and analyze it for statistics or apply earthquake cluster analysis. To the client, the process of accessing and manipulating the data appears simple and robust, while the middleware takes care of the network communication, security and data preparation.

Figure 16. WEB-IS is an example of a remote earthquake clustering PSE

Complete realization of an earthquake clustering PSE consists of:

1. Data analysis tools to implement earthquake clustering techniques;

2. High performance visualization techniques using OpenGL or Amira;

3. The Grid environment;

4. Integration toolkit, such as WEB-IS.


These components exist and can work both independently and coupled in a single special-purpose system. This system can be developed to create the backbone of a sophisticated computational data acquisition environment, which can be devised specifically for earthquake clustering or for the general needs of the geophysical community. Equipped with only PDAs or laptops, and working on location in hard-to-reach desert terrain with remote data acquisition centers or perhaps just analyzing data in one of the many computation facilities located around the globe, geophysicists will have unlimited access to data resources spread all over the world.

We see the principal goal of our work in contributing to the construction of a global warning system, which can be used for prediction of catastrophes such as various types of earthquakes along the circum-Pacific belt, where there is a great concentration of people. For example, similar methodology can be used for tsunami earthquake alerting. Theoretical models of faulting and seismic wave propagation used for the computation of radiated seismic energy from broad-band records at teleseismic distances [14] can be adapted to the real-time situation when neither the depth nor the focal geometry of the source is known accurately. The distance-dependent approximation was used in [60]. For regular earthquakes, values of singular geophysical parameters such as the energy-to-moment ratio H [60] obtained from the theoretical models agree well with values computed from available source parameters (e.g., as published by the National Earthquake Information Center). It appears, however, that the so-called “tsunami earthquakes”, characterized by a significant deficiency of moment release at high frequencies, yield values of H considerably different from those of regular earthquakes. Thus the H value can be used as a suitable criterion for discriminating various types of earthquakes in a short time, such as an hour. However, this hypothesis holds only for a few cases. For the so-called “tsunamigenic earthquakes” this difference is not so clear. Moreover, the value of the moment computed on the basis of long-period seismic waves can be underestimated. For example, analysis of the longest-period normal modes of the Earth, ₀S₂ and ₀S₃, excited by the December 26, 2004 Sumatra earthquake [76], yields an earthquake moment of 1.3 × 10^30 dyn-cm, approximately three times larger than the 4 × 10^29 dyn-cm measured from long-period surface waves. Therefore, instead of a single-value discrimination we recommend using more parameters (dimensions) for detecting tsunami earthquakes. As shown in [64] and [83], one could employ other T-phase characteristics such as its duration, seismic moment, and spectral strength or even similar features associated with the S-phase. We believe that the lack of success in predicting earthquakes still comes from the lack of communication between researchers and difficulties in free and fast access to the various types of data. Therefore, we hope that globalization of computation, data acquisition and visualization resources, together with fast access through a scale-free network, will provide a solution to this problem.
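The two moment estimates quoted above for the 2004 Sumatra earthquake can be compared with a few lines of Python. The sketch below computes their ratio and converts each to a moment magnitude with the standard Hanks-Kanamori relation for moments in dyn-cm; the conversion and the variable names are additions for illustration and do not appear in the text above.

```python
import math


def moment_magnitude(m0_dyn_cm):
    """Moment magnitude Mw from seismic moment in dyn-cm (Hanks-Kanamori)."""
    return (2.0 / 3.0) * math.log10(m0_dyn_cm) - 10.7


m0_normal_modes = 1.3e30   # dyn-cm, from the longest-period normal modes
m0_surface_waves = 4.0e29  # dyn-cm, from long-period surface waves

ratio = m0_normal_modes / m0_surface_waves
mw_modes = moment_magnitude(m0_normal_modes)
mw_surface = moment_magnitude(m0_surface_waves)

print(f"moment ratio: {ratio:.2f}")
print(f"Mw (normal modes): {mw_modes:.2f}")
print(f"Mw (surface waves): {mw_surface:.2f}")
```

Because the magnitude depends on the logarithm of the moment, a factor of about three in moment corresponds to a difference of only about a third of a magnitude unit, which suggests why a single scalar threshold can be a fragile discriminant.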

Future Directions

In this chapter we endeavor to bring across the basic concept of clustering and its role in earthquake forecasting. Indeed we find that the clustering of seismic activities reflects both the similarity among them and their correlation properties. As discussed in, e.g., Ben-Zion et al. [9], Saichev and Sornette [68] and Zöller et al. (see the article “Seismicity, Critical States of: From Models to Practical Seismic Hazard Estimates Space”), there exists an evolutionary process or memory between successive earthquakes, which impacts the distribution of the inter-event times. We believe that by means of earthquake clustering we can capture the essence of this predictive information [27]. Therefore, in order to carry out real-time earthquake forecasting for short time-scales, it is necessary to derive a thorough understanding of all families of earthquake clusters produced over an earthquake-prone region.

We stress here that in obtaining this type of information one must first be able to detect the precise location of the significant clusters, by simultaneously filtering out the noise and the outliers. While the existence of spatial-temporal clusters is important, they do not reveal the subtle information hidden behind the relations among the data events, such as spatial-temporal correlation dimensions, correspondence between the numbers of small and large magnitude events, degree of spatial randomness, repetitiveness at different distances and other factors. The features (“descriptors” or seismicity parameters) constructed from the empirical knowledge of the researcher should be largely independent and should aptly represent distinctive features, which are useful for the purpose of pattern recognition. Unlike single events described only by spatio-temporal features (and magnitude), the N-dimensional feature vectors can better represent the dynamics of the seismically active area at different moments of time. By following the basic rules of learning theory, we may be able to arrive at the number N and quality of features that can assure the generalization power of the data and allow us to construct reliable data models or classifiers.

We have shown that clustering, as a well-honed tool in data mining and pattern recognition, represents the classifier without the teacher, which means that the nature of the clustering is unknown and its exact background must be guessed at from expert knowledge and analysis of the cluster properties. Clustering is a process based on a priori knowledge extraction for constructing the hypothesis space needed for reliable classifiers that can be taught and used for forecasting [25]. However, the quality of these data models depends strongly on the quality of the hypothesis space constructed. Consequently, it depends on the quality of cluster extraction. The major problem comes from the lack of a universal clustering scheme, which makes the clustering process somewhat subjective. In this case we must visualize the multidimensional feature space. Visual confirmation gives one confidence in the validity of the clusters, and we can then adjust for the optimal clustering procedures by removing the noise and outliers. Among the major goals of earthquake clustering, we can include the following salient points:

• classification of the chaotic properties of seismicity patterns [35], for example to recognize the three main groups of shocks (foreshocks, mainshocks and aftershocks) or to remove the temporary clustering to estimate the background seismicity;

• understanding the correlations between observed properties of earthquakes in different domains (e.g., space, time, number, size);

• understanding the relations between various physical parameters of the models and properties of the generated earthquakes;

• investigating the multi-scale nature of the cluster structure and reconstructing the important and hidden information associated with the stress characteristics.

Classification of the type of shocks seems to be an unresolved problem because there are no observable differences between foreshocks, main shocks and aftershocks [68]. Each earthquake is capable of triggering other earthquakes according to the basic laws from [46,69]. Despite this difficulty, as shown in [9], it is possible to construct some sort of stochastic classifiers based on theoretical footing. The method proposed here is closely related to the epidemic-type aftershock sequence (ETAS) model [61]. It is important that the principal characteristics of ETAS-based models correspond to experimental verifications, i.e., they treat all earthquakes on the same footing and there is no distinction between foreshocks, main shocks and aftershocks. The key points of the method are the probabilities of one event being triggered by a previous event (e.g., [82]). Making use of these probabilities, we can reconstruct the functions associated with the characteristics of earthquake clusters to test a number of plausible hypotheses about the earthquake clustering phenomena.
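The probabilities of one event being triggered by a previous event can be sketched in Python under a simplifying assumption: a purely temporal ETAS-type rate, in which each earlier event i contributes K·exp(α(m_i − m_ref))/(t − t_i + c)^p on top of a constant background rate μ. The function name and all parameter values below are hypothetical and chosen only for illustration; this is not the classifier of [9] or the exact formulation of [61] or [82].

```python
import numpy as np


def triggering_probabilities(times, magnitudes, mu, k, alpha, c, p, m_ref):
    """Return (trig, background) for a temporal ETAS-type model.

    trig[i, j] is the probability that event j was triggered by the
    earlier event i; background[j] is the probability that event j
    belongs to the background seismicity.
    """
    times = np.asarray(times, dtype=float)
    mags = np.asarray(magnitudes, dtype=float)
    dt = times[None, :] - times[:, None]  # dt[i, j] = t_j - t_i
    earlier = dt > 0.0                    # only earlier events can trigger
    productivity = k * np.exp(alpha * (mags - m_ref))
    safe_dt = np.where(earlier, dt, 1.0)  # avoid invalid powers for dt <= 0
    rate = np.where(
        earlier, productivity[:, None] / (safe_dt + c) ** p, 0.0
    )
    total = mu + rate.sum(axis=0)         # total rate at each event time
    return rate / total, mu / total


# Hypothetical mini-catalog: a large event followed by smaller ones.
event_times = [0.0, 1.0, 1.5, 10.0]
event_mags = [6.0, 4.0, 4.5, 3.0]
trig, background = triggering_probabilities(
    event_times, event_mags,
    mu=0.1, k=0.02, alpha=1.5, c=0.01, p=1.1, m_ref=3.0,
)
```

For every event the triggering probabilities and the background probability sum to one, so the columns of `trig` together with `background` can be used to assign events to clusters or to the background in a probabilistic way.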

As shown above by our results on seismicity clustering for the three different time epochs, clustering can be truly regarded as a coarse-graining procedure. We can see that details from the smaller scales are erased, thereby exposing the general trends associated with the long correlation length. For large databases covering long time intervals we can unveil the shorter time-scale characteristics by removing the background events, using successive clustering. Eventually, we can build up strong classifiers. In the case where long-time data catalogs are missing, we can employ the stochastic classifiers advocated by Ben-Zion et al. [9] for prior thresholding of the background data, or what is sometimes called “fuzzification” [86]. By this procedure we can construct the hypothesis space for data models by clustering (or fuzzy clustering) procedures.

The results discussed in this paper contribute to the development of improved software infrastructure for analysis of seismicity. A combined clustering analysis of observed and synthetic data, aided by state-of-the-art visualization of multidimensional clusters, will undoubtedly lead to improved earthquake forecasting algorithms with shorter time windows of increased probability of large seismic events.

Acknowledgments

This research was supported by NSF ITR and Math-Geo grants. WD acknowledges support from the Polish Committee for Scientific Research (KBN) Grant No. 3T11C05926. YBZ acknowledges support from the NSF, USGS and SCEC.

Bibliography

Primary Literature

1. Amira visualization package. http://www.amiravis.com

2. Anderberg MR (1973) Cluster analysis for applications. Academic Press, New York

3. Bak P, Tang C (1989) Earthquakes as a self-organized critical phenomenon. J Geophys Res 94(B11):15635–15637

4. Bak P, Christensen K, Danon L, Scanlon T (2002) Unified scaling law for earthquakes. Phys Rev Lett 88:178501

5. Barabasi AL, Jeong H, Neda Z, Ravasz E, Schubert A, Vicsek T (2002) Evolution of the social network of scientific collaborations. Physica A 311(3–4):590–614

6. Bennett AF (1992) Inverse methods in physical oceanography. Cambridge University Press, Cambridge, pp 346

7. Ben-Zion Y (1996) Stress, slip and earthquakes in models of complex single-fault systems incorporating brittle and creep deformations. J Geophys Res 101:5677–5706

8. Ben-Zion Y (2001) Dynamic rupture in recent models of earthquake faults. J Mech Phys Solids 49:2209–2244


9. Ben-Zion Y (2003) Appendix 2, key formulas in earthquake seismology. In: Lee WHK, Kanamori H, Jennings PC, Kisslinger C (eds) International handbook of earthquake and engineering seismology, Part B. Academic Press, pp 1857–1875

10. Ben-Zion Y, Rice JR (1993) Earthquake failure sequences along a cellular fault zone in a three-dimensional elastic solid containing asperity and nonasperity regions. J Geophys Res 98:14109–14131

11. Ben-Zion Y, Rice JR (1995) Slip patterns and earthquake populations along different classes of faults in elastic solids. J Geophys Res 100:12959–12983

12. Ben-Zion Y, Eneva M, Liu Y (2003) Large earthquake cycles and intermittent criticality on heterogeneous faults due to evolving stress and seismicity. J Geophys Res 108(B6):2307–27

13. Berman F, Fox GC, Hey AJG (2003) Grid computing – making the global infrastructure a reality. Wiley Series in Communications Networking and Distributed Systems, pp 1007

14. Boatwright J, Choy GL (1986) Teleseismic estimates of the energy radiated by shallow earthquakes. J Geophys Res 91:2095–2112

15. Bollig EF, Jensen PA, Lyness MD, Nacar MA, da Silveira PR, Erlebacher G, Pierce M, Yuen DA (2007) VLAB: Web services, portlets, and workflows for enabling cyber infrastructure in computational mineral physics. Phys Earth Planet Inter 163:333–346

16. Chen C-C, Rundle JB, Li H-C, Holliday JR, Turcotte DL, Tiampo KF (2006) Critical point theory of earthquakes: Observations of correlated and cooperative behavior on earthquake fault systems. Geophys Res Lett L18302

17. Chinnery M (1963) The stress changes that accompany strike-slip faulting. Bull Seismol Soc Am 53:921–932

18. Corral A (2005) Mixing of rescaled data and Bayesian inference for earthquake recurrence times. Nonlinear Process Geophys 12:89–100

19. Corral A (2005) Renormalization-group transformations and correlations of seismicity. Phys Rev Lett 95:028501

20. Corral A, Christensen K (2006) Comment on earthquakes descaled: On waiting time distributions and scaling laws. Phys Rev Lett 96:109801

21. da Silva CRS, da Silveira PRC, Karki B, Wentzcovitch RM, Jensen PA, Bollig EF, Pierce M, Erlebacher G, Yuen DA (2007) Virtual laboratory for planetary materials: System service architecture overview. Phys Earth Planet Inter 163:323–332

22. Davy P, Sornette A, Sornette D (1990) Some consequences of a proposed fractal nature of continental faulting. Nature 348:56–58

23. Dzwinel W, Blasiak J (1999) Method of particles in visual clustering of multi-dimensional and large data sets. Future Gener Comput Syst 15:365–379

24. Dzwinel W, Yuen DA, Kaneko Y, Boryczko K, Ben-Zion Y (2003) Multi-resolution clustering analysis and 3-D visualization of multitudinous synthetic earthquakes. Vis Geosci 8:12–25

25. Dzwinel W, Yuen DA, Boryczko K, Ben-Zion Y, Yoshioka S, Ito T (2005) Nonlinear multidimensional scaling and visualization of earthquake clusters over space, time and feature space. Nonlinear Process Geophys 12:117–128

26. Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95:14863–14868

27. Enescu B, Ito K, Struzik ZR (2006) Wavelet-based multiscale analysis of real and simulated time-series of earthquakes. Geophys J Int 164:63–74

28. Eneva M, Ben-Zion Y (1997) Techniques and parameters to analyze seismicity patterns associated with large earthquakes. J Geophys Res 102(B8):785–795

29. Eneva M, Ben-Zion Y (1997) Application of pattern recognition techniques to earthquake catalogs generated by models of segmented fault systems in three-dimensional elastic solids. J Geophys Res 102:24513–24528

30. Ferreira L (2002) Introduction to grid computing with Globus. IBM Redbook series. IBM Corporation. http://ibm.com/redbooks

31. Foster I, Kesselman C (eds) (1998) Building a computational grid: state-of-the art and future directions in high-performance distributed computing. Morgan-Kaufmann, San Francisco

32. Freed AM, Lin J (2001) Delayed triggering of the 1999 Hector Mine earthquake by viscoelastic stress transfer. Nature 411:180–183

33. Frey BJ, Dueck D (2007) Clustering by passing messages between data points. Science 315(5814):972–976

34. Garbow ZA, Erlebacher G, Yuen DA, Sevre EO, Nagle AR, Kaneko Y (2002) Web-based interrogation of large-scale geophysical datasets and clustering analysis of many earthquake events from desktop and handheld devices. American Geophysical Union Fall Meeting, Abstract

35. Goltz C (1997) Fractal and chaotic properties of earthquakes. In: Goltz C (ed) Lecture notes in earth sciences, vol 77. Springer, Berlin, pp 3–164

36. Gowda CK, Krishna G (1978) Agglomerative clustering using the concept of nearest neighborhood. Pattern Recognit 10:105

37. Grossman RL, Karnath Ch, Kegelmeyer P, Kumar V, Namburu RR (2001) Data mining for scientific and engineering applications. Kluwer, Dordrecht

38. Guha S, Rastogi R, Shim K (1998) CURE: An efficient algorithm for large databases. In: Proceedings of SIGMOD ’98, Seattle, June 1998. pp 73–84

39. Gutenberg B (1942) Earthquake magnitude, intensity, energy and acceleration. Bull Seismol Soc Am 32:163–191

40. Gutenberg B, Richter CF (1954) Seismicity of the earth and associated phenomena. Princeton University Press, Princeton

41. Haile PM (1992) Molecular Dynamics Simulation. Wiley, New York

42. Hainzl S, Scherbaum F, Beauval C (2006) Estimating background activity based on interevent-time distribution. Bull Seismol Soc Am 96:313–320. doi:10.1785/0120050053

43. Hand D, Mannila H, Smyth P (2001) Principles of data mining. MIT Press, Cambridge

44. Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning: Data mining, inference and prediction. Springer, New York, pp 533

45. Helmstetter A, Sornette D, Grasso J-R (2003) Mainshocks are aftershocks of conditional foreshocks: How do foreshock statistical properties emerge from aftershock laws. J Geophys Res 108:2046

46. Helmstetter A, Kagan Y, Jackson D (2005) Importance of small earthquakes for stress transfers and earthquake triggering. J Geophys Res 110:B05S08


47. Hong H, Kadlec DJ, Yuen DA, Zheng Y, Zhang H, Liu G, Dzwinel W (2004) Fast timescale phenomena at Changbaishan volcano as inferred from recent seismic activity. Eos Trans AGU Fall Meet. 85(47) http://www.agu.org

48. Ismail MA, Kamel MS (1989) Multi-dimensional data clustering utilizing hybrid search strategies. Pattern Recognit 22(1):77–89

49. Ito T, Yoshioka S (2002) A dike intrusion model in and around Miyakejima, Niijima and Kozushima. Tectonophysics 359:171–187

50. Jajuga K, Sokolowski A, Hermann H (eds) (2002) Classification, clustering and data analysis. Springer, Berlin, pp 497

51. Jones NC, Pevzner P (2004) An introduction to bioinformatics algorithms. MIT Press, Cambridge

52. Kadlec BJ, Yang XL, Wang Y, Bollig EF, Garbow ZA, Yuen DA, Erlebacher G (2003) WEB-IS (Integrated System): An overall view. Eos Trans AGU 84(46), Fall Meet. Suppl., Abstract NG11A-0163

53. Kalnay E (2003) Atmospheric modeling, data assimilation and predictability. Cambridge University Press, Cambridge, pp 341

54. Karypis G, Han E, Kumar V (1999) Chameleon: A hierarchical clustering algorithms using dynamic modeling. IEEE Computer 32(8):68–75

55. Karypis G, Aggarwal R, Kumar V, Shekhar S (1999) Multilevel hypergraph partitioning: applications in VLSI domain. IEEE Trans on Very Large Scale Systems Integration (VLSI) 7(1):69–79

56. Mehta AP, Dahmen KA, Ben-Zion Y (2006) Universal mean moment rate profiles of earthquake ruptures. Phys Rev E 73:056104

57. Mitra S, Acharya T (2003) Data mining: multimedia, soft computing and bioinformatics. Wiley, New Jersey

58. Molchan GM (2005) Interevent time distribution of seismicity: A theoretical approach. Pure Appl Geophys 162:1135–1150

59. National Research Council (2003) Living on an active earth, perspectives on earthquake sciences. The National Academies Press, Washington DC

60. Newman AV, Okal EA (1998) Teleseismic estimates of radiated seismic energy: the E/M0 discriminant for tsunami earthquakes. J Geophys Res 103(B11):23885–23898

61. Ogata Y (1999) Seismicity analysis through point-process modeling: A review. Pure Appl Geophys 155:471–507

62. Ogata Y, Zhuang J (2006) Space-time ETAS models and an improved extension. Tectonophysics 413:13–23

63. Okada Y (1992) Internal deformation due to shear and tensile faults in a half space. Bull Seismol Soc Am 82:1018–1040

64. Okal EA, Alasset P-J, Hyvernaud O, Schindele F (2003) The deficient T waves of tsunami earthquakes. Geophys J Int 152:416–432

65. Rundle JB, Gross S, Klein W, Ferguson C, Turcotte DL (1997) The statistical mechanics of earthquakes. Tectonophysics 277:147–164

66. Rundle JB, Klein W, Tiampo K, Gross S (2000) Linear pattern dynamics in nonlinear threshold systems. Phys Rev E 61(3):2418–2431

67. Rundle JB, Turcotte DL, Klein W (eds) (2000) GeoComplexity and the physics of earthquakes. American Geophysical Union, Washington, pp 284

68. Saichev A, Sornette D (2007) Theory of earthquake recurrence times. J Geophys Res 112(B4):1–26

69. Saichev A, Helmstetter A, Sornette D (2005) Power law distributions of offspring and generation numbers in branching models of earthquake triggering. Pure Appl Geophys 162:1113–1134

70. Sander J, Ester M, Krieger H (1998) Density based clustering in spatial databases, The algorithm DBSCAN and its applications. Data Min Knowl Discov 2(2):169–194

71. Shcherbakov R, Turcotte DL (2004) A modified form of Bath’s law. Bull Seismol Soc Am 94:1968–1975

72. Shcherbakov R, Turcotte DL, Rundle JB (2005) Aftershock statistics. Pure Appl Geophys 162(6–7):1051–1076

73. Siedlecki W, Siedlecka K, Sklanski J (1988) An overview of mapping for exploratory pattern analysis. Pattern Recognit 21(5):411–430

74. Sornette D (2006) Critical phenomena in natural sciences. Springer Series in Synergetics, Berlin, pp 528

75. Sornette D, Johansen A, Bauchaud J-P (1996) Stock market crashes, precursors and replicas. J Phys Int Finance 5:167–175

76. Stein S, Okal E (2004) Ultra-long period seismic moment of the great December 26, 2004 Sumatra earthquake and implications for the slip process. http://www.earth.nwu.edu/people/emile/research/Sumatra.pdf

77. Tan P-N, Steinbach M, Kumar V (2005) Introduction to data mining. Addison Wesley, Boston, pp 769

78. Theodoris S, Koutroumbas K (1998) Pattern recognition. Academic Press, San Diego

79. Turcotte DL (1997) Fractals and chaos in geology and geophysics, 2nd Edn. Cambridge University Press, New York

80. Utsu T (2002) Statistical Features of Seismicity. In: Lee WHK, Kanamori H, Jennings PC, Kisslinger C (eds) International Handbook of Earthquake and Engineering Seismology, Part A. Academic Press, pp 719–732

81. Van Aalsburg J, Grant LB, Yakovlev G, Rundle PB, Rundle JB, Turcotte DL, Donnellan A (2007) A feasibility study of data assimilation in numerical simulations of earthquake fault systems. Phys Earth Planet Inter 163:149–162

82. Vere-Jones D (1976) A branching model for crack propagation. Pure Appl Geophys 114(4):711–726

83. Walker DA, Mc Creery CS, Hiyoshi Y (1992) T-phase spectra, seismic moment and tsunamigenesis. Bull Seismol Soc Am 82:1275–1305

84. Wesnousky SG (1994) The Gutenberg-Richter or characteristic earthquake distribution, which is it? Bull Seismol Soc Amer 84:1940–1959

85. Yuen DA, Garbow ZA, Erlebacher G (2004) Remote data analysis, Visualization and Problem Solving Environment (PSE) based on wavelet analysis in the geosciences. Vis Geosci 8:83–92. doi:10.1007/x10069-003-0012-z

86. Zadeh LA (1996) Fuzzy sets, fuzzy logic, and fuzzy systems: selected papers by Lotfi A Zadeh, World Scientific Series In Advances In Fuzzy Systems. World Scientific Publishing, River Edge, pp 826

87. Zhang Q, Boyle R (1991) A new clustering algorithm with multiple runs of iterative procedures. Pattern Recognit 24(9):835–848

88. Zöller G, Hainzl S, Ben-Zion Y, Holschneider M (2006) Earthquake activity related to seismic cycles in a model for a heterogeneous strike-slip fault. Tectonophysics 423:137–145. doi:10.1016/j.tecto.2006.03.007

89. Zöller G, Ben-Zion Y, Holschneider M (2007) Estimating recurrence times and seismic hazard of large earthquakes on an individual fault. Geophys J Int 170:1300–1310. doi:10.1111/j.1365-246X.2007.03480.x

Books and Reviews

Ertoz L, Steinbach M, Kumar V (2003) Finding clusters of different size, shapes and densities in noisy, high-dimensional data. Army High Performance Center, technical report, April 2003

Yuen DA, Kadlec BJ, Bollig EF, Dzwinel W, Garbow ZA, da Silva C (2005) Clustering and visualization of earthquake data in a grid environment, vol 10/1. Vis Geosci http://www.springerlink.com/content/n60423820556/

Earthquake Damage: Detection and Early Warning in Man-Made Structures

MARIA I. TODOROVSKA

Department of Civil Engineering, University of Southern California, Los Angeles, USA

Article Outline

Glossary
Definition of the Subject
Introduction
Literature Review
Damage and Damage-Sensitive Features
Structural Models and Identification
Examples
Future Directions
Bibliography

Glossary

Structural health monitoring Is the process of determining and tracking the structural integrity and assessing the nature of damage in a structure. It is often used interchangeably with structural damage detection.

Inter-story drift Is the ratio of the relative horizontal displacement between two levels of the structure to the distance between them. It is important to distinguish between drift resulting from deformation of the structure, which is directly related to damage, and drift resulting from the deformation of the soil and rocking of the structure as a rigid body. It is also important to estimate reliably the drift due to permanent displacement (its “DC” component), which cannot be done reliably using data from vibrational sensors unless six degrees of freedom of motion (three translations and three rotations) are recorded.

Soil-structure interaction (SSI) Is a process occurring during vibration of structures founded on flexible soil, in which the structure and soil interact, and their motions are modified. Kinematic interaction refers to the effects of scattering and diffraction of the incident seismic waves from the soil excavation for the foundation. Dynamic interaction refers to the effects caused by the inertia forces of the structure and foundation, which lead to deformation of the soil and result in modification of the resonant frequencies and damping of the response of the structure, foundation and soil acting as a system.

Resonant frequencies of vibration Of a structure on flexible soil are those of the soil-structure system, and the energy of the vibrational response is concentrated around these frequencies. They depend on the stiffness of the building and that of the soil. Fixed-base frequencies of vibration are the resonant frequencies of the structure on rigid soil, and depend only on the stiffness of the structure. Loss of stiffness of the structure due to damage results in reduction of the fixed-base frequencies, and indirectly of the system frequencies. Monitoring changes in the fixed-base frequencies is most reliable because it eliminates the effects of the soil, which can exhibit (recoverable) nonlinear behavior during strong shaking.
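As a rough numerical illustration of two glossary entries, the Python sketch below computes an inter-story drift ratio and the frequency drop that accompanies a loss of stiffness. The function names and sample values are hypothetical. The frequency relation assumes an idealized single-degree-of-freedom oscillator with constant mass, for which frequency scales with the square root of stiffness; this is a simplifying assumption of the sketch, not a model taken from this article.

```python
import math


def inter_story_drift(u_lower, u_upper, level_separation):
    """Drift ratio: relative horizontal displacement between two levels
    divided by the vertical distance between them (same length units)."""
    if level_separation <= 0:
        raise ValueError("level_separation must be positive")
    return (u_upper - u_lower) / level_separation


def sdof_frequency_hz(stiffness, mass):
    """Natural frequency of an idealized single-degree-of-freedom
    oscillator, f = sqrt(k/m) / (2*pi). Assumption of this sketch only."""
    return math.sqrt(stiffness / mass) / (2.0 * math.pi)


def frequency_ratio_after_stiffness_loss(loss_fraction):
    """Ratio of damaged to undamaged frequency when stiffness drops by
    loss_fraction (0 <= loss_fraction < 1) and mass is unchanged."""
    if not 0.0 <= loss_fraction < 1.0:
        raise ValueError("loss_fraction must be in [0, 1)")
    return math.sqrt(1.0 - loss_fraction)


if __name__ == "__main__":
    # Hypothetical values: 10 mm and 25 mm displacements, levels 3 m apart.
    print(inter_story_drift(0.010, 0.025, 3.0))
    # Hypothetical 19% stiffness loss.
    print(frequency_ratio_after_stiffness_loss(0.19))
```

Under this idealization a 19% loss of stiffness lowers the frequency by about 10%, which suggests why a modest frequency change can correspond to a noticeably larger stiffness change.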

Definition of the Subject

Structural health monitoring and structural damage detection refer to the process of determining and tracking the structural integrity and assessing the nature of damage in a structure. Being able to detect the principal components of damage in structures as they occur during an earthquake or soon after it, or to confirm the absence of damage, before physical inspection is possible, is an important and challenging problem. Considering the challenges faced and the potential benefits for safety and for minimizing disruption of productivity, structural health monitoring has the elements of a grand challenge problem in civil engineering [12].

Structural damage can be described by the following five attributes: existence, location, type, extent, and prognosis for the remaining useful life. Structural damage is a complex state, which can occur on different time scales: suddenly during some catastrophic event such as an earthquake or explosion, or gradually over the life of the structure, due to deterioration of the structural materials by aging, service, and exposure to environmental influences. This article is concerned primarily with identification of the most significant components in the space of complex