
RESEARCH ARTICLE

Application of hierarchical clustering technique for numerical tectonic regionalization of the Zagros region (Iran)

Seyed Naser Hashemi & Rezvan Mehdizadeh

Received: 23 September 2013 / Accepted: 22 May 2014
© Springer-Verlag Berlin Heidelberg 2014

Abstract Numerical agglomerative hierarchical classification is fundamentally an unsupervised method of grouping individuals on which there are multivariate data, so as to identify natural groups in them, and perhaps in the populations from which they are drawn, where no prior classification exists or is assumed. We have used the technique to make a tectonic regionalization of the Zagros region and to see whether it can increase our understanding of the regional tectonics. We first identified 137 sub-areas as units, for each of which we had recorded 18 quantitative variables; these formed our data, which we held in a data matrix of n = 137 rows and p = 18 columns. After data standardization, we computed the relationships among all pairs of sub-areas as Euclidean distances and then grouped them hierarchically using Ward’s method to form a dendrogram. Cutting the dendrogram at several levels of dissimilarity provided a series of tectonic zoning maps which matched the trends in tectonic evolution of the region. This sequence, obtained automatically, agrees well with our general understanding of the geology. However, the present study also yielded some new findings about the tectonic nature of the region. For example, the role of the Kazerun-Qatar and Oman lines as two major structural features has been clearly demonstrated. In addition, a striking difference between the Minab zone and the other parts of the Zagros region has been observed. This study illustrates the necessity and usefulness of hierarchical cluster analysis, as an appropriate statistical pattern recognition technique, for increasing the objectivity of regionalization research in the Earth sciences.

Keywords Automatic classification · Hierarchical clustering · Multivariate analysis · Numerical tectonic regionalization · Zagros

Introduction

Numerical classification and pattern recognition can be divided into two main kinds, namely unsupervised and supervised. In the first, individual units are grouped according to their inherent characteristics without regard to any other information regarding their interrelations or predefined structure. In the second, a prior classification is assumed and the task becomes one of reallocating the units to improve the classification or of identifying the class to which any new unit is best allocated. In both cases the aim might be to condense large bodies of observations on numerous individual units into more manageable forms in a few classes.

One popular way of creating unsupervised classifications is by hierarchical grouping of units based on numerous recorded characters. It is an efficient algorithmic technique devised in the 1960s when computers were small and slow. It was intended to match the evolutionary relations among biological entities such as plants and animals, which biological taxonomists had arranged hierarchically for many years, and in some way simulated the evolutionary process. See, for example, Sneath and Sokal (1973) for an account. It was also thought to be more objective than traditional classifications in that the method of grouping the units could be completely specified and therefore be repeated exactly, given the data. In the event, the technique has revealed structures in data that were not previously evident or expected. It has proved to be a tool of discovery.

Geology, except in the fossilized remains of plants and animals, does not have any evolutionary hierarchy to be explored or revealed. Rather, the purpose of classification is to create groups of rocks, strata, outcrops or small regions in such a way that the variation within the groups is smaller than that in the population as a whole and the groups are distinct from one another. The more homogeneous the groups are internally, and the more distinct they are from one another, the better is the classification.

Communicated by: H. A. Babaie

S. N. Hashemi (*)
School of Earth Sciences, Damghan University, Damghan, Iran
e-mail: [email protected]

S. N. Hashemi
e-mail: [email protected]

R. Mehdizadeh
Department of Mining Engineering, Isfahan University of Technology, Isfahan, Iran

Earth Sci Inform
DOI 10.1007/s12145-014-0163-5


One particular form of classification, and one that concerns us in this paper, is to divide a region into homogeneous zones on the basis of specified characteristics, and to measure the degree of difference or similarity between zones. It is the form described by Harff and Davis (1990) more than 20 years ago, though the technique had been outlined previously by many authors (e.g. Collyer and Merriam 1973; Anyadike 1987; Puvaneswaran 1990; Ahmed 1997).

The hierarchical technique for classification is an appropriate tool for meaningful data reduction and interpretation, and it reveals many hidden patterns in geology (e.g. Parks 1966; Davis 2002; Harff and Davis 1990); it has been used subsequently (Zamani and Hashemi 2004; Hargrove and Hoffman 2005; Hashemi 2013). The region in this instance is the Zagros Belt in southwest Iran (Fig. 1). We hope that the maps of zones created in this way will help users of the information to visualize and understand similarities among the complex multivariate geological entities there.

Given a data set containing measurements on individuals, in some cases we want to see if some natural groups or classes of individuals exist, and in other cases we want to classify the individuals according to a set of existing groups. Cluster analysis, as a multivariate statistical technique, develops tools and methods concerning the former case; that is, given a data matrix containing multivariate measurements on a large number of individuals (or objects), the objective is to build some natural subgroups or clusters of individuals. This is done by grouping individuals that are “similar” according to some appropriate criterion. Cluster analysis is applied in many fields such as the natural sciences, the medical sciences, economics, marketing, etc.

Fig. 1 Structural map showing the main tectonic units of the Zagros orogenic belt (modified and adapted from Ghasemi and Talbot 2005 and Hashemi 2013). The approximate location of the examined region is shown

There are essentially two types of clustering methods: hierarchical algorithms and partitioning algorithms. The hierarchical algorithms can be divided into agglomerative and splitting procedures. The first type of hierarchical clustering starts from the finest partition possible (each observation forms a cluster) and groups them. The second type starts with the coarsest partition possible: one cluster contains all of the observations. It proceeds by splitting the single cluster up into smaller clusters. The partitioning algorithms start from a given group definition and proceed by exchanging elements between groups until a certain score is optimized. The main difference between the two clustering techniques is that in hierarchical clustering, once groups are found and elements are assigned to the groups, this assignment cannot be changed. In partitioning techniques, on the other hand, the assignment of objects to groups may change during the application of the algorithm (Härdle and Simar 2012).

Cluster analysis is essentially a data-driven approach that attempts to discover structure within the data itself, grouping together the feature vectors in clusters of data (Marques de Sa 2001). There are a large number of clustering algorithms, depending on the type of data, on the rules for pattern amalgamation in a cluster and on the approach followed for applying such rules. Detailed descriptions of the principles of the method are given in many textbooks (Anderberg 1973; Davis 2002; Everitt et al. 2011; Hair et al. 1998; Kaufman and Rousseeuw 2005; Romesburg 1984; Swan and Sandilands 1995; Theodoridis and Koutroumbas 2006).

Our aim in this research has been to subdivide the Zagros region by grouping the numerous much smaller regions on which we have multivariate tectonically relevant data into larger ones. In this way we hope to gain insight into the tectonic development of the region. Figure 1 shows the general structural features of the Zagros region and the other parts of the Iranian plateau. The Zagros region in the southwest of Iran is one of the most tectonically active regions in the world. It is a modern collision zone, and the study of this region is important for this reason. Indeed, there has been a great deal of research on it for decades. Nevertheless, we believe there is still much to learn about the region, and the research we describe below using hierarchical clustering contributes to our understanding of it. In recent years, the application of cluster analysis and other multivariate statistical techniques to the study of the seismicity and tectonics of Iran has been the subject of many articles (e.g. Ansari et al. 2009; Zamani et al. 2012). These studies show that clustering techniques can reliably be applied for identifying homogeneous natural zones and increasing the objectivity of regionalization projects in the Earth sciences.

Methods

Hierarchical agglomerative clustering, the technique we have used, is well described in several textbooks (e.g. Sneath and Sokal 1973; Anderberg 1973; Webster 1977; Davis 2002; Swan and Sandilands 1995; Hair et al. 1998; Everitt et al. 2011; Kaufman and Rousseeuw 2005). It is attractive in the earth sciences, probably because it enables one to create not only a single classification but several classifications at various levels of similarity. The data are not partitioned into a particular number of classes at a single step. Instead, the hierarchical classification consists of a series of partitions, which may run from a single cluster containing all objects to as many groups as there are individuals (Everitt et al. 2011).

The technique begins with a set of n individuals, ‘units’ in the jargon, on each of which p quantitative variables have been observed or measured and recorded. These are arranged in a data matrix, X, of dimensions n × p, thus

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}$$

Each row represents one unit, and each column represents one variable.

From this matrix a measure of similarity or dissimilarity is calculated between each and every pair of units. Numerous measures have been proposed for computing similarities from multivariate data (Gower 1971; Anderberg 1973; Gordon 1996, 1999; Jain et al. 1999; Everitt et al. 2011; Duda et al. 2001). Of these, Euclidean distance is simple, intuitively appealing and popular, and it is the one we used. It is computed as

$$d_{jk} = \sqrt{\sum_{i=1}^{p} \left(x_{ji} - x_{ki}\right)^{2}} \qquad (1)$$

where $d_{jk}$ is the ‘distance’ between units j and k in the p-dimensional space defined by the p variables, and $x_{ji}$ is the value of variable i for unit j (row j, column i of X). The larger it is, the more dissimilar are j and k, and vice versa. The result is a symmetric matrix, D, of dimension n × n. Typically, different variables are measured on different scales, and clustering solutions may depend drastically on the metric used. The solutions obtained may therefore vary with the measurement scale, and it is necessary to standardize (normalize) the data before hierarchical clustering (Marques de Sa 2001). Therefore, in order to give all variables equal weight, the columns of X are standardized before the distances are computed.
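The standardization and distance steps described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the toy matrix `X_toy` is invented, the function names are hypothetical, and the use of the sample standard deviation (`ddof=1`) is a choice made here, since the text does not say which estimator was used.

```python
import numpy as np

def standardize(X):
    """Convert each column of X to z-scores (mean 0, standard deviation 1)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)  # sample standard deviation (an assumption)
    return (X - mean) / std

def euclidean_distance_matrix(Z):
    """Return the symmetric n x n matrix D whose entry [j, k] is d_jk of Eq. 1."""
    diff = Z[:, None, :] - Z[None, :, :]   # shape (n, n, p): row j minus row k
    return np.sqrt((diff ** 2).sum(axis=2))

# Invented toy data: n = 4 units (rows), p = 3 variables (columns)
# recorded on very different scales.
X_toy = np.array([
    [12.0, 0.5, 1500.0],
    [15.0, 0.7, 1800.0],
    [40.0, 2.1,  300.0],
    [38.0, 1.9,  250.0],
])
Z_toy = standardize(X_toy)
D_toy = euclidean_distance_matrix(Z_toy)
```

Without the standardization step the third column, whose values are in the hundreds and thousands, would dominate every distance; after it, each variable contributes on the same footing.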

The next step is to compute a step-wise grouping of the units from D. Again, several methods, essentially algorithms, for doing this have been proposed. We chose Ward’s method (see Webster 1977 for its description and comparison with other methods). It aims to minimize the ‘error sum of squares’, that is, the sum of the squares of the distances between units and their group centroids, ∑d².

It begins by searching D for the smallest element $d_{jk}$. It joins units j and k into a group and places it at the centroid of the two units in the multidimensional space. Its next grouping is the fusion of units or groups that makes the smallest increase in ∑d². The procedure continues until just two groups remain.
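A naive version of this fusion procedure can be written directly from the description. The sketch below is not the SPSS routine used in the study; it relies on the standard identity that fusing groups a and b with sizes $n_a$, $n_b$ and centroids $c_a$, $c_b$ raises the error sum of squares by $\frac{n_a n_b}{n_a + n_b}\lVert c_a - c_b\rVert^2$. For two single units this cost is $d_{jk}^2/2$, so the first fusion is the pair with the smallest $d_{jk}$, as stated above. The function names and the toy points are hypothetical.

```python
import numpy as np

def ward_merge_cost(size_a, centroid_a, size_b, centroid_b):
    """Increase in the error sum of squares caused by fusing groups a and b."""
    gap = centroid_a - centroid_b
    return size_a * size_b / (size_a + size_b) * float(gap @ gap)

def ward_agglomerate(Z):
    """Naive Ward clustering; returns a list of (members_a, members_b, cost).

    The loop runs down to a single group so that the whole hierarchy is
    recorded; stopping at two groups, as in the text, drops the last entry.
    """
    Z = np.asarray(Z, dtype=float)
    groups = {i: [i] for i in range(len(Z))}  # group id -> member row indices
    merges = []
    next_id = len(Z)
    while len(groups) > 1:
        best = None
        ids = sorted(groups)
        # Examine every pair of current groups and keep the cheapest fusion.
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                cost = ward_merge_cost(
                    len(groups[a]), Z[groups[a]].mean(axis=0),
                    len(groups[b]), Z[groups[b]].mean(axis=0),
                )
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        merges.append((groups[a], groups[b], cost))
        groups[next_id] = groups.pop(a) + groups.pop(b)
        next_id += 1
    return merges

# Invented example: two tight pairs of units far apart from each other.
toy_points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
fusions = ward_agglomerate(toy_points)
```

A useful check on any such implementation is that the fusion costs add up to the total sum of squares of the data about the overall centroid, because each fusion adds exactly its own increase to ∑d².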

A flowchart showing the steps of a general hierarchical clustering procedure for geological regionalization studies is presented in Fig. 2. As shown in this chart, cluster analysis can be divided into two fundamental steps (Härdle and Simar 2012):

1. Choice of a proximity measure: One checks each pair of observations (objects) for the similarity of their values. A similarity (proximity) measure is defined to measure the “closeness” of the objects. The “closer” they are, the more homogeneous they are.

2. Choice of group-building algorithm: On the basis of the proximity measures, the objects are assigned to groups so that differences between groups become large and observations in a group become as close as possible.

The result of the procedure described above is an ordered series of linkages between the specified units or groups, each at a specific dissimilarity. This information is conventionally displayed as a dendrogram, which, as its name suggests, is a branching tree-like construction. The dendrogram is essentially one-dimensional, with a scaled horizontal axis for the dissimilarity coefficient. The dendrogram displays the hierarchy created from the data, but is not in itself necessarily useful as a classification. Therefore, we usually need to cut it at one or more values of dissimilarity to form classes. We do so by drawing vertical lines, the phenon lines, at specific values of d. A phenon line cuts the branches of the dendrogram and thereby creates classes. The units within these classes will then be more similar to one another than those in the set of units as a whole.
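Cutting a hierarchy at several levels can be illustrated with SciPy, which is used here as a stand-in for the SPSS software of the study. The sketch assumes SciPy is installed; the data are synthetic (three well-separated blobs drawn with a fixed seed) and the helper name `cut_into_classes` is hypothetical. `linkage` with `method="ward"` builds the tree, and `fcluster` with `criterion="maxclust"` plays the role of a phenon line placed so that at most the requested number of classes results.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Synthetic stand-in for a standardized data matrix:
# three blobs of 10 units each, 4 variables per unit.
centers = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [10.0, 10.0, 10.0, 10.0],
    [-10.0, 10.0, -10.0, 10.0],
])
Z_units = np.vstack([c + rng.normal(scale=0.5, size=(10, 4)) for c in centers])

# Ward's method on Euclidean distances between the rows of Z_units.
tree = linkage(Z_units, method="ward")

def cut_into_classes(tree, n_classes):
    """Cut the hierarchy so that at most n_classes classes are formed."""
    return fcluster(tree, t=n_classes, criterion="maxclust")

# One partition per cut level, analogous to one zoning map per phenon line.
partitions = {g: cut_into_classes(tree, g) for g in (2, 3)}
```

Because all partitions come from the same tree they are nested: every class of the three-class solution lies wholly inside one class of the two-class solution. `scipy.cluster.hierarchy.dendrogram` can draw the tree itself if a plot is wanted.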

Study area and data analysis

Among the presently active zones, the Zagros mountain belt, in the southwest of Iran, is one of the most seismically active intra-continental fold-and-thrust belts on Earth (Fig. 1). It results from the collision of the Arabian plate with the continental blocks of Central Iran, beginning in Miocene time (Stöcklin 1968) and continuing today with a convergence rate of 22 ± 2 mm yr⁻¹ (Vernant et al. 2004). The main feature of the Zagros region is a linear, asymmetrical folded belt, which forms a 200–300 km wide series of ranges extending for 800–1,200 km from the northern tip of the Arabian plate through Iraq and the southwestern part of Iran as far as the Strait of Hormoz (Blanc et al. 2003).

The tectonic zoning of the Zagros region has been the subject of much research. Several authors believe that this region shows quite different tectonics from the other parts of Iran (e.g. Takin 1972; Zamani and Hashemi 2004; Hashemi 2004; Agard et al. 2011). Falcon (1974) recognized three structural zones for the region. From northeast towards southwest these are: the Complex structural zone with metamorphic rocks; the Thrust (imbricated) zone of the inner Zagros; and the Simply folded zone, including the ‘foothills’ belt. Stöcklin (1968) divided Iran into smaller zones based on structural style, age and intensity of deformation, and the age and nature of magmatic and metamorphic activity. According to his subdivision, the Zagros region itself is divided into four zones: Arvand–Shatt-al-Arab zone, Zagros Folded belt, Zagros Thrust zone, and Sanandaj–Sirjan zone. Berberian (1976) divided the Zagros region into two structural-geological units based on the regional differences in structural-geological characteristics. These two zones are the High Zagros (Imbricate or Thrust Zone) and the Foothills (Simply Folded Belt). Alavi (1980) introduced three distinctive parallel and linear tectonic zones in the Zagros region: Urmia-Dokhtar belt, Sanandaj-Sirjan zone and Simply Folded zone. Blanc et al. (2003) subdivided the region into two main zones, divided by major faults or abrupt changes in geomorphology or both. To the northeast, the High Zagros Thrust Belt contains highly imbricated slices of the Arabian margin and fragments of Cretaceous ophiolites (Alavi 1994; Berberian 1995). The High Zagros Thrust Belt overthrusts the second main structural zone, the Simply Folded Zone, to the southwest along the High Zagros Fault. Sepehr and Cosgrove (2004) divided the mountain belt into northwest–southeast trending structural zones parallel to the plate margin and separated by major fault zones. Despite the achievements in the study of the tectonics of the Zagros region, especially in the zoning of this region, it is likely to be advantageous to use quantitative and repeatable methods based on numerical criteria to provide stable results.

Fig. 2 Flowchart showing steps of a general hierarchical clustering procedure performed for geological regionalization studies

This paper introduces a method for tectonic zoning of the Zagros region using the hierarchical method of grouping described above. Our starting point was the identification of numerous small regions based on geological (e.g. lithology and structure), geomorphological (e.g. topography), and geophysical (e.g. gravity anomalies and seismicity) information. In contrast to our previous work on this subject (e.g. Zamani and Hashemi 2004; Hashemi 2013), we decided to subdivide the whole region on the basis of real and natural borders, as far as possible. Such a procedure is expected to give more realistic results. So, based on the data presently available, our units are 137 sub-areas, each separated from its neighbors by a major natural linear feature (Fig. 3). We assumed that these small regions can be reliably considered as tectonically homogeneous and that, with present knowledge and at a regional scale, they are the smallest homogeneous regions we can envisage. For each unit we had 18 quantitative variables describing geological and geophysical characteristics. Table 1 lists the variables with their descriptions and symbols. The data were recorded and computed from various sources: geological, topographic, and structural maps; earthquake data catalogues; and geophysical data sources. The variables are all ones that can reliably represent the intensity and degree of contrast of tectonic characteristics in the region.

Fig. 3 Subdivided base map of the Zagros region showing the location of 137 sub-areas separated from each other by major natural geographical, geological, and geophysical boundaries

We assembled the data into a matrix of dimensions 137 × 18. The attributes may be measured in different units and may differ in value by orders of magnitude. The variables are therefore standardized in order to give equal weight to each attribute. The most common form of standardization is the conversion of each datum to a standard score (Z-score) by subtracting the mean and dividing by the standard deviation of each variable. The process converts each raw variable into a standardized variable with a mean of 0 and a standard deviation of 1. We created a dissimilarity matrix of dimension 137 × 137, and from it formed a dendrogram (tree diagram) by Ward’s method (Fig. 4). All the computations were done in SPSS software (SPSS 2006).

Recognition of tectonic zones

To produce maps of tectonic zones we had to decide where to place the phenon lines, i.e. where to cut the dendrogram into branches. We wanted the branches to coincide with distinctive groupings (Fig. 4). Our choice was based on visual inspection that reveals the natural breaks between tectonic zones. The number of tectonic maps we create depends on the dissimilarities at which we cut the dendrogram. The several phenon lines produce maps ranging from many zones to few zones. Maps with many tectonic classes provide detail that might accord with fine underlying structures, whereas maps with few tectonic zones are simpler and show the major structures. By producing maps at several levels of detail we hoped to gain insight into the region’s geology and geophysics. Accordingly, we can cut the dendrogram by phenon lines at different values of dissimilarity, and in this way produce maps with different numbers of classes. In cluster analysis, we may classify objects at any desired level of similarity. However, as the number of clusters increases, the distance between groups decreases and the resulting zoning maps may at some point become uninterpretable.

Due to the geological complexity and heterogeneity of the tectonic setting of the Zagros region, and in order to obtain a classification that seeks and separates major structures, we wanted to select an optimum zoning map among the automatically produced tectonic zoning maps.

The problem of deciding how many clusters are present in the data is one common to all clustering methods. Clearly, this is equivalent to identifying the number of clusters that best fits the data. Several criteria have been suggested for finding the optimum number of groups in a hierarchy (Milligan and Cooper 1985; Jung et al. 2003). If we regard the number of clusters, g, as a model parameter, then cluster validity indices may be used to compare the value of the index function for different values of g. When the validity index is plotted against g, there may be a value of g at which a significant local change in the index occurs. When applied to hierarchical schemes, these procedures for determining the number of clusters g are sometimes referred to as stopping rules (Webb and Copsey 2011).
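One simple stopping rule of this kind looks for large jumps between successive fusion levels of the hierarchy: a wide gap means a phenon line can be placed there without cutting through closely related branches. The sketch below is a generic illustration, not the criterion adopted in this paper; the function name and the list of fusion heights are invented for the example.

```python
import numpy as np

def gap_per_cluster_count(heights):
    """Map each number of clusters g to the gap between the fusion levels
    that a cut producing g clusters would fall between.

    heights: the n - 1 fusion levels of a hierarchy of n units.
    """
    h = np.sort(np.asarray(heights, dtype=float))
    n = h.size + 1
    gaps = np.diff(h)  # gaps[m - 1] = h[m] - h[m - 1]
    # After the first m fusions there are n - m clusters, so a cut placed
    # between h[m - 1] and h[m] yields g = n - m clusters.
    return {n - m: float(gaps[m - 1]) for m in range(1, n - 1)}

# Invented fusion levels for n = 6 units.
example_heights = [1.0, 1.5, 2.0, 6.0, 7.0]
gaps = gap_per_cluster_count(example_heights)
best_g = max(gaps, key=gaps.get)
```

In this invented example the widest gap lies between the third and fourth fusions, so the rule favors three clusters.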

In this research, based on the average distance measured between classes, we proposed a new validity index, the agglomeration distance coefficient (G), for finding the optimum number of classes. This criterion can be considered a modified version of some previously proposed methods such as the “Elbow method” (Thorndike 1953), the “gap statistic rule” (Tibshirani et al. 2001; Jung et al. 2003), and the “knee point method” (Zhao et al. 2008). It is expected that this criterion can be used simply and more realistically for finding the optimum number of clusters, especially in the clustering of natural data.

It is

$$G = \text{average distance between classes}, \qquad (2)$$

where small values of G indicate that the classes are unstable and liable to merge.
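Equation 2 leaves open how the distance between two classes is measured. As one possible reading, the sketch below takes G to be the mean Euclidean distance between all pairs of class centroids. This interpretation, the function name and the toy data are assumptions made for illustration, not the authors' stated definition.

```python
import numpy as np

def agglomeration_distance(Z, labels):
    """Mean Euclidean distance between the centroids of all pairs of classes.

    Z: standardized data matrix (units in rows); labels: class label per unit.
    Measuring class-to-class distance between centroids is an assumption here.
    Requires at least two classes.
    """
    Z = np.asarray(Z, dtype=float)
    labels = np.asarray(labels)
    centroids = np.array([Z[labels == c].mean(axis=0) for c in np.unique(labels)])
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    upper = np.triu_indices(len(centroids), k=1)  # each pair of classes once
    return float(dist[upper].mean())

# Invented toy data: two classes with centroids at (0, 0) and (3, 4).
Z_demo = np.array([[-1.0, 0.0], [1.0, 0.0], [3.0, 3.0], [3.0, 5.0]])
labels_demo = np.array([1, 1, 2, 2])
G_demo = agglomeration_distance(Z_demo, labels_demo)
```

Evaluating such a function on the partitions obtained for g = 2, 3, ... and plotting the result against g gives a curve of the kind used to compare candidate numbers of classes.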

Table 1 Quantitative variables used for tectonic pattern recognition of the Zagros region, measured within 137 sub-areas

| Variable description | Unit | Acronym |
|---|---|---|
| Number of earthquakes with mb ≥ 3.0 per unit area (1,000 km²) | # | NE3 |
| Maximum magnitude of earthquakes | Richter | MME |
| Average magnitude of earthquakes | Richter | AME |
| Logarithm of the total seismic energy released over unit area (1,000 km²) | erg km⁻² | LSE |
| Average focal depth of earthquakes | km | AFD |
| Seismic deformation index (De Rubeis et al. 1992) | erg km⁻² yr⁻¹ | SDI |
| Average gravitational acceleration | Gal | AGA |
| Gravitational acceleration relief | Gal | GAR |
| Average Bouguer anomaly | mGal | ABA |
| Bouguer anomaly relief | mGal | BAR |
| Average elevation | m | AEL |
| Topographic relief | m | TOR |
| Maximum elevation | m | MXE |
| Number of folds per unit area (1,000 km²) | # | NFO |
| Relative area of salt domes | % | RAS |
| Relative area of ophiolites | % | RAO |
| Number of faults per unit area (1,000 km²) | # | NFA |
| Fault length density (1,000 km²) | km⁻¹ | FLD |

Variables were recorded in completely different units, so the data were standardized before clustering


Fig. 4 Dendrogram showing hierarchical clustering of the multivariate data matrix for the tectonic zoning of the Zagros region using Ward’s method. The dendrogram provides a means for revealing latent patterns of tectonic zoning. Vertical broken lines (phenon lines) are drawn at six levels of similarity or cluster hierarchy, depicting 2-, 3-, 4-, 5-, 6-, and 13-cluster solutions of tectonic zoning, respectively


The values of G for 2 to 15 classes have been computed and are presented in Fig. 5. Among the solutions, based on the amount of dissimilarity from one phenon line to the next, the 2-, 3-, 4-, 5-, 6-, and 13-cluster solution maps have been selected.

Results and discussion

The maps resulting from our analyses for 2, 3, 4, 5, 6, and 13 classes are presented in Fig. 6a–f. To compare the geological and geophysical attributes of the classes so created, we computed the average values of the 18 variables for each zone differentiated in the clustering process for the 2-, 3-, 4-, 5- and 6-zone maps. These averages are listed in Table 2.

In the two-class subdivision (Fig. 6a), the separation of the Zagros zone (zone 1) from the central Iranian part of the region (zone 2) is clear. This subdivision means that these two zones are quite different from each other. According to the data listed in Table 2, and also the stability of zone 2 in the higher subdivisions, zone 2 in Fig. 6a seems to be a part of Central Iran rather than the Zagros region (zone 1), which has larger average values of the geophysical variables. In other words, the Zagros zone shows greater tectonic activity. This finding is fully consistent with the studies of previous researchers.

The three-partition division (Fig. 6b) distinctly isolates the Khuzestan plain (zone 3) from the other parts of the Zagros region (zone 2). This zone extends with a northwest–southeast trend along parts of the border between Iran and Iraq to the Oman line. Based on the data listed in Table 2, zone 3 is characterized by very great seismic activity and moderate geological and geophysical anomalies. Zone 2 is stable at this stage, and the average values of the variables for this zone have not changed. Zone 1, like the rest of the region, shows little to moderate seismic activity and very large geological and geophysical anomalies. As a result, among these zones, zone 3 can be considered a highly active zone.

The four-zone division (Fig. 6c) separates the Kermanshah–Fars tectonic zone (zone 2) from the Zagros region. The Kermanshah–Fars tectonic zone, trending from northwest to southeast, covers most of the study area. The data listed in Table 2 show the general characteristics of the zones. Zone 2 (Fig. 6c) is characterized by moderate seismicity, a moderate number of faults per unit area and fault length density, very high topography, and small to moderate geophysical anomalies. Zone 1 shows moderate seismicity, numerous faults per unit area and a large fault-length density, moderate topography, very large geophysical anomalies, and large diapirism. Zones 3 and 4 are not affected by the grouping process, and the average values of the variables for these zones have not changed.

In the five-cluster division map (Fig. 6d), tectonic zones 1, 2, and 3 have remained stable, and zone 4 in Fig. 6c has been divided into zones 4 and 5 (Fig. 6d). The Kazerun–Oman zone (zone 5) lies in the east of the Zagros region. It is highly active seismically and has high topography, large geophysical anomalies, high diapirism activity, and moderate faulting and high folding density. It can therefore be considered a highly active zone. Zone 4 occupies the southwestern part of the region adjacent to the Arabian plate. It is moderately to strongly seismic and has very low topography and moderate faulting and folding density. In brief, zone 5 shows the greatest tectonic activity.

In the six-cluster division (Fig. 6e), the Nahavand–Fars tectonic zone is isolated from the Kermanshah–Fars zone. This zone forms a northwest–southeast trending belt and occupies the central part of the study region. Based on the average values of the variables (Table 2), this zone is characterized by weak to moderate seismicity, very high topographic relief, and large folding and faulting density. Zone 2, which occupies areas in the northern and central parts of the region, is characterized by moderate to strong seismicity, high topography, moderate to large geophysical anomalies, and moderate faulting and folding activity. Zones 1, 3, 5, and 6 (Fig. 6e) remained stable. Moreover, zone 4 and zone 2 show strong and moderate tectonic activity, respectively. Zone 6 has the strongest tectonic activity.

Fig. 5 Graph showing the values of average distance between groups for different numbers of classes

Fig. 6 Tectonic zoning maps of the Zagros region, provided at different levels of similarity: a 2-class map; b 3-class map; c 4-class map; d 5-class map (optimum zoning map); e 6-class map; f 13-class map. Zones are numbered according to their hierarchical order

As a rule, one can prepare different maps showing different numbers of classes or zones. Nevertheless, we want maps that we can interpret, and for this reason we prefer maps with few classes or zones. According to Fig. 4, the next classification that shows large dissimilarity is the 13-class solution. In this subdivision, the previous large zones are separated into numerous small zones (Fig. 6f). Furthermore, this map is associated with more detail that reveals structures not revealed by the other classifications. Among these valid and acceptable maps, and considering the interpretability of the results as well as the agglomeration distance coefficient criterion proposed in this research for finding the optimum number of clusters, the 5-class map appears to be the best map in this hierarchy. We therefore selected this map as the optimum tectonic zoning map of the region. The zones of this map can be defined and named as follows:

Zone 1: Minab zone
Zone 2: Kermanshah-Fars zone
Zone 3: Northeastern part zone
Zone 4: Khuzestan zone
Zone 5: Kazerun-Oman zone
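The choice of a cut level can be illustrated with a short Python sketch. It uses one common heuristic, namely preferring the class count whose solution persists over the widest range of dissimilarity; this is an assumption made for illustration and may differ in detail from the agglomeration distance coefficient criterion used in this study. The merge heights below are hypothetical values, not those of Fig. 5.

```python
def persistence(merge_height):
    """Range of dissimilarity over which each k-class solution survives.

    merge_height[k] is the dissimilarity at which the k-class solution is
    merged into k - 1 classes (a hypothetical input format).
    """
    return {k: merge_height[k] - merge_height[k + 1]
            for k in sorted(merge_height) if k + 1 in merge_height}

# Hypothetical agglomeration coefficients for k = 2..7 classes.
merge_height = {2: 30.0, 3: 24.0, 4: 21.0, 5: 19.5, 6: 13.0, 7: 12.0}

gaps = persistence(merge_height)
best_k = max(gaps, key=gaps.get)
print(gaps)
print(best_k)
```

A numerical rule of this kind only narrows the candidates; the interpretability of the resulting map still has to be judged against the geology.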

Accordingly, Table 3 presents the values of the Euclidean distances between the five zones differentiated in Fig. 6d. As can be seen, the largest dissimilarity exists between zones 5 and 3. This is reasonable because these two zones are very different from each other: zone 5 shows strong tectonic activity, whereas zone 3 can be considered tectonically passive. This finding therefore accords with our understanding of the tectonic characteristics of the region. On the other hand, the smallest dissimilarity exists between zones 2 and 5, indicating that these zones are similar to each other. Since these two zones appear on the map as adjacent zones, this finding is also reasonable.
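The two extremes quoted above can be read directly from the lower triangle of Table 3. A minimal Python sketch follows; the dictionary layout and the variable names are illustrative choices, while the distances are those listed in Table 3.

```python
# Lower triangle of Table 3: Euclidean distances between the five zones of Fig. 6d,
# keyed by (row zone, column zone).
table3 = {
    (2, 1): 5.219,
    (3, 1): 6.869, (3, 2): 5.494,
    (4, 1): 5.978, (4, 2): 5.185, (4, 3): 7.065,
    (5, 1): 6.005, (5, 2): 4.311, (5, 3): 7.471, (5, 4): 5.691,
}

most_different = max(table3, key=table3.get)  # pair with the largest distance
most_similar = min(table3, key=table3.get)    # pair with the smallest distance

print(most_different, table3[most_different])  # (5, 3) 7.471
print(most_similar, table3[most_similar])      # (5, 2) 4.311
```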


Table 2 The average values of 18 quantitative variables (Table 1) computed for the tectonic zones differentiated in 2- through 6-class maps (Fig. 6a–e)

2-cluster
Variable  Zone 1     Zone 2
NE3       7.404      0.179
MME       5.187      3.000
AME       4.471      3.000
LSE       18.251     12.520
AFD       37.606     18.571
SDI       7.926      4.207
AGA       979.040    979.014
GAR       0.744      0.142
ABA       −114.305   −161.134
BAR       44.046     53.614
AEL       1376.19    2131.68
TOR       1615       1405
MXE       2223       2818
NFO       1.512      0.456
RAS       0.005      0.000
RAO       0.009      0.000
NFA       7.044      8.641
FLD       107.080    109.882

3-cluster
Variable  Zone 1     Zone 2     Zone 3
NE3       3.979      0.179      13.136
MME       5.053      3.000      5.411
AME       4.489      3.000      4.441
LSE       17.738     12.520     18.941
AFD       38.460     18.571     36.177
SDI       8.008      4.207      7.788
AGA       979.017    979.014    979.080
GAR       1.100      0.142      0.147
ABA       −128.876   −161.134   −89.914
BAR       44.944     53.614     42.543
AEL       1780.43    2131.68    699.53
TOR       1854       1405       1216
MXE       2732       2818       1371
NFO       1.536      0.456      1.473
RAS       0.001      0.000      0.011
RAO       0.001      0.000      0.000
NFA       9.300      8.641      3.267
FLD       146.631    109.882    40.875

4-cluster
Variable  Zone 1     Zone 2     Zone 3     Zone 4
NE3       4.040      3.966      0.179      13.136
MME       4.950      5.076      3.000      5.411
AME       4.456      4.497      3.000      4.441
LSE       17.854     17.834     12.520     18.941
AFD       32.479     39.789     18.571     36.177
SDI       8.534      7.892      4.207      7.788
AGA       979.015    979.017    979.014    979.080
GAR       5.182      0.193      0.142      0.147
ABA       −62.463    −143.190   −161.134   −89.914
BAR       52.333     43.302     53.614     42.543
AEL       1105.98    1930.31    2131.68    699.53
TOR       1446       1944       1405       1216
MXE       1830       2932       2818       1371
NFO       1.177      1.616      0.456      1.473
RAS       0.002      0.001      0.000      0.011
RAO       0.007      0.000      0.000      0.000
NFA       15.688     7.880      8.641      3.267
FLD       346.017    102.323    109.882    40.875

5-cluster
Variable  Zone 1     Zone 2     Zone 3     Zone 4     Zone 5
NE3       4.040      3.966      0.179      10.039     18.149
MME       4.950      5.076      3.000      5.145      5.865
AME       4.456      4.497      3.000      4.390      4.528
LSE       17.854     17.834     12.520     18.442     19.792
AFD       32.479     39.789     18.571     36.250     36.044
SDI       8.534      7.892      4.207      7.506      8.270
AGA       979.015    979.017    979.014    979.169    978.926
GAR       5.182      0.193      0.142      0.138      0.163
ABA       −62.463    −143.190   −161.134   −82.382    −102.761
BAR       52.333     43.302     53.614     35.114     55.218
AEL       1105.98    1930.31    2131.68    459.99     1108.16
TOR       1446       1944       1405       892        1769
MXE       1830       2932       2818       946        2097
NFO       1.177      1.616      0.456      0.986      2.304
RAS       0.002      0.001      0.000      0.000      0.030
RAO       0.007      0.000      0.000      0.000      0.000
NFA       15.688     7.880      8.641      2.543      4.502
FLD       346.017    102.323    109.882    36.903     47.651

6-cluster
Variable  Zone 1     Zone 2     Zone 3     Zone 4     Zone 5     Zone 6
NE3       4.040      5.182      0.179      3.399      10.039     18.149
MME       4.950      5.205      3.000      5.016      5.145      5.865
AME       4.456      4.519      3.000      4.486      4.390      4.528
LSE       17.854     18.174     12.520     17.676     18.442     19.792
AFD       32.479     46.435     18.571     39.489     36.250     36.044
SDI       8.534      7.192      4.207      8.217      7.506      8.270
AGA       979.015    979.233    979.014    979.914    979.169    978.926
GAR       5.182      0.339      0.142      0.125      0.138      0.163
ABA       −62.463    −132.301   −161.134   −148.255   −82.382    −102.761
BAR       52.333     59.460     53.614     35.786     35.114     55.218
AEL       1105.98    1616.95    2131.68    2076.05    459.99     1108.16
TOR       1446       2667       1405       1608       892        1769
MXE       1830       2934       2818       2932       946        2097
NFO       1.177      1.862      0.456      1.501      0.986      2.304
RAS       0.002      0.000      0.000      0.001      0.000      0.030
RAO       0.007      0.000      0.000      0.000      0.000      0.000
NFA       15.688     8.709      8.641      7.495      2.543      4.502
FLD       346.017    95.902     109.882    105.310    36.903     47.651


Conclusion

We have shown how we applied a hierarchical grouping algorithm to classify small regions, on which we have geological and geophysical measurements, into larger and tectonically meaningful ones. The stages of the grouping process appear to accord with the complex tectonic evolution of the region. As Fig. 6a shows, the first stage of the zoning divides the region into two main zones, 1 and 2, which are very different from each other. This finding can be explained by the fact that these two zones lie on different tectonic plates: zone 1 (the Zagros zone) is a part of the Arabian plate, whereas zone 2 (the central Iranian part) is a part of the Eurasian plate. In all, the study of the evolutionary tectonic path shown in Fig. 6a–f demonstrates that this path accords fairly well with previous research (Stöcklin 1968; Falcon 1974; Nowroozi 1976; Berberian 1976; Alavi 1980; Blanc et al. 2003; Sepehr and Cosgrove 2004; Mouthereau et al. 2007; Hassanzadeh et al. 2008; Agard et al. 2011). A comparison between the maps shown successively in Fig. 6a–f and the previous works indicates the following:

(1) The boundaries of the tectonic zones have been delineated mainly by major faults, which appear to be inherited from old geological times. The fact that these mainly N-S trending major faults play important roles in the tectonic evolution of the region is confirmed by the findings of other researchers (e.g. Agard et al. 2011).

(2) Although there are some similarities between the maps of the higher-level classifications and the previous conventional maps (e.g. Stöcklin 1968; Falcon 1974), our new maps reveal features that are not evident on the current tectonic maps (the Kazerun-Qatar and Oman lines).

(3) The maps in Fig. 6c–e reveal a striking difference between the Minab zone (zone 1) and the rest of the region. This zone remains stable in the grouping process, with a unique combination of characteristics that is distinctly different from those of the other zones.

(4) Similar to the previous conventional maps (e.g. Sepehr and Cosgrove 2004), the map in Fig. 6e distinguishes between the three structural zones of Lorestan, Dezful Embayment, and Fars, and it also shows the tectonic importance of the Hormoz salt formation in these zones (McQuarrie 2004).

(5) Unlike qualitative zoning methods, the quantitative method we have used enables anyone to repeat the analysis and arrive at the same classification.
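The repeatability noted in point (5) follows from the fact that every step of the procedure is deterministic. The Python sketch below illustrates this under stated assumptions: z-score standardization, squared Euclidean distances, and Ward's criterion applied through the Lance-Williams update. It is an illustrative reimplementation and not the statistical package used for this study; the function names and the toy data are hypothetical.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Rescale each column to zero mean and unit standard deviation (z-scores)."""
    cols = list(zip(*rows))
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) for c in cols]  # assumes no column is constant
    return [[(x - m) / s for x, m, s in zip(row, mu, sd)] for row in rows]

def ward_zones(rows, k):
    """Merge sub-areas with Ward's criterion until k zones remain."""
    z = standardize(rows)
    members = {i: [i] for i in range(len(z))}  # cluster id -> sub-area indices
    # Squared Euclidean distances, keyed by (smaller id, larger id).
    d = {(i, j): sum((a - b) ** 2 for a, b in zip(z[i], z[j]))
         for j in range(len(z)) for i in range(j)}
    while len(members) > k:
        i, j = min(d, key=d.get)               # closest pair of clusters
        dij = d.pop((i, j))
        ni, nj = len(members[i]), len(members[j])
        for m in members:
            if m == i or m == j:
                continue
            nm = len(members[m])
            dmi = d.pop((min(m, i), max(m, i)))
            dmj = d.pop((min(m, j), max(m, j)))
            # Lance-Williams update for Ward's method; the merged cluster keeps id i.
            d[(min(m, i), max(m, i))] = (
                (ni + nm) * dmi + (nj + nm) * dmj - nm * dij) / (ni + nj + nm)
        members[i].extend(members.pop(j))
    # Number the zones 1..k by the smallest sub-area index they contain.
    labels = [0] * len(z)
    for zone, group in enumerate(sorted(members.values(), key=min), start=1):
        for idx in group:
            labels[idx] = zone
    return labels

# Hypothetical toy data: 6 sub-areas, 2 variables, two well-separated groups.
rows = [[1.0, 10.0], [1.2, 11.0], [0.9, 9.5],
        [8.0, 30.0], [8.3, 31.0], [7.9, 29.0]]
labels = ward_zones(rows, 2)
print(labels)
```

Running `ward_zones` again on the same data matrix returns the same labels, since nothing in the procedure depends on random initialization or on the analyst's judgment.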

The good agreement between our results and the previous works on the structural and tectonic zoning of the region, together with the high similarities between adjacent differentiated zones in the clustering procedure, can be considered evidence of the validity of the model. Finally, the results indicate that hierarchical cluster analysis, as an automated approach to pattern recognition, can be applied reliably to increase the objectivity of regionalization studies in the Earth sciences. To obtain more reliable results, the number of sub-areas (cases), the number of variables, and the accuracy and precision of the input data must be increased. It is expected that entering new tectonic-related geological and geophysical variables into the cluster analysis model will yield more reliable results.

Acknowledgments This research has been partially supported by the Damghan University Research Council. We thank Dr. Richard Webster (Rothamsted Research, Harpenden, UK) for his valuable help and guidance, which resulted in an improvement of the earlier version of this manuscript. We also thank the reviewers for constructive suggestions that improved the paper.

References

Agard P, Omrani J, Jolivet L, Whitechurch H, Vrielynck B, Spakman W, Monie P, Meyer B, Wortel R (2011) Zagros orogeny: a subduction-dominated process. In: Lacombe O, Grasemann B, Simpson G (eds) Geodynamic evolution of the Zagros. Geological Magazine, 692–725

Ahmed BYM (1997) Climatic classification of Saudi Arabia: an application of factor-cluster analysis. J Geol 41:69–84

Alavi M (1980) Tectonostratigraphic evolution of the Zagrosides of Iran. Geology 8:144–149

Alavi M (1994) Tectonics of the Zagros orogenic belt of Iran: new data and interpretations. Tectonophysics 229:211–238

Anderberg MR (1973) Cluster analysis for applications. Academic, New York

Ansari A, Noorzad A, Zafarani H (2009) Clustering analysis of the seismic catalog of Iran. Comput Geosci 35:475–486

Anyadike RNC (1987) A multivariate classification and regionalization of West African climates. Int J Climatol 7:157–164

Berberian M (1976) Contribution to the seismotectonics of Iran (Part II). Geological Survey of Iran, Tehran

Berberian M (1995) Master “blind” thrust fault hidden under the Zagros folds: active basement tectonics and surface morphotectonics. Tectonophysics 241:193–224

Blanc EJP, Allen MB, Inger S, Hassani H (2003) Structural styles in the Zagros Simple Folded Zone, Iran. J Geol Soc Lond 160:401–412

Collyer PL, Merriam DF (1973) An application of cluster analysis in mineral exploration. Math Geol 5:213–223

Table 3 The Euclidean distance matrix showing the distance measures computed among the 5 tectonic zones differentiated in Fig. 6d

        Zone 1  Zone 2  Zone 3  Zone 4  Zone 5
Zone 1  0.000
Zone 2  5.219   0.000
Zone 3  6.869   5.494   0.000
Zone 4  5.978   5.185   7.065   0.000
Zone 5  6.005   4.311   7.471   5.691   0.000


Davis JC (2002) Statistics and data analysis in geology, 3rd edn. John Wiley & Sons, New York

De Rubeis V, Casparini C, Solipaca A, Tosi P (1992) Seismotectonic characterization of Italy, using statistical analysis of geophysical variables. J Geodyn 16:103–122

Duda RO, Hart PE, Stork DG (2001) Pattern classification. Wiley, New York

Everitt BS, Landau S, Leese M, Stahl D (2011) Cluster analysis, 5th edn. John Wiley & Sons Ltd, West Sussex

Falcon NL (1974) Southern Iran: Zagros Mountains. In: Spenser A (ed) Mesozoic–Cenozoic Orogenic belts. Geological Society London Special Publication 4:199–211

Ghasemi A, Talbot CJ (2005) A new tectonic scenario for the Sanandaj–Sirjan Zone (Iran). J Asian Earth Sci 26:683–693

Gordon AD (1996) Hierarchical classification. In: Arabie P, Hubert LJ, De Soete G (eds) Clustering and classification. World Scientific Publishers, River Edge, pp 65–121

Gordon AD (1999) Classification, 2nd edn. Chapman and Hall/CRC Publication, London

Gower JC (1971) A general coefficient of similarity and some of its properties. Biometrics 27:857–874

Hair JF Jr, Anderson RE, Tatham RL, Black WC (1998) Multivariate data analysis. Prentice Hall, Englewood Cliffs

Härdle WK, Simar L (2012) Applied multivariate statistical analysis, 3rd edn. Springer, Berlin

Harff J, Davis JC (1990) Regionalization in geology by multivariate classification. Math Geol 22:573–588

Hargrove WW, Hoffman FM (2005) Potential of multivariate methods for delineation and visualization of ecoregions. Environ Manag 34:39–60

Hashemi SN (2004) Tectonic regionalization of Iran using multivariate geostatistical methods. Unpublished doctoral dissertation, Shiraz University

Hashemi SN (2013) Seismicity characterization of Iran: a multivariate statistical approach. Math Geosci 45:705–725

Hassanzadeh J, Stockli DF, Horton BK, Axen GJ, Stockli LD, Grove M, Schmitt AK, Walker JD (2008) U-Pb zircon geochronology of late Neoproterozoic–Early Cambrian granitoids in Iran: implications for paleogeography, magmatism, and exhumation history of Iranian basement. Tectonophysics 451:71–96

Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31:264–323

Jung Y, Park H, Du D, Drake BL (2003) A decision criterion for the optimal number of clusters in hierarchical clustering. J Glob Optim 25:91–111

Kaufman L, Rousseeuw PJ (2005) Finding groups in data, an introduction to cluster analysis. Wiley, New York

Marques de Sa JP (2001) Pattern recognition: concepts, methods, and applications. Springer, New York

McQuarrie N (2004) Crustal scale geometry of the Zagros fold-thrust belt, Iran. J Struct Geol 26:519–535

Milligan GW, Cooper MC (1985) An examination of procedures for determining the number of clusters in a data set. Psychometrika 50:159–179

Mouthereau F, Tensi J, Bellahsen N, Lacombe O, De Boisgrollier T, Kargar S (2007) Tertiary sequence of deformation in a thin-skinned/thick-skinned collision belt: The Zagros Folded Belt (Fars, Iran). Tectonics 26:TC5006. doi:10.1029/2007TC002098, 28 p

Nowroozi AA (1976) Seismotectonic Provinces of Iran. Bull Seismol Soc Am 66:1249–1276

Parks JM (1966) Cluster analysis applied to multivariate geologic problems. J Geol 74:703–715

Puvaneswaran M (1990) Climatic classification for Queensland using multivariate statistical techniques. Int J Climatol 10:591–608

Romesburg HC (1984) Cluster analysis for researchers. Lifetime Learning Publications, Belmont

Sepehr M, Cosgrove JM (2004) Structural framework of the Zagros Fold-Thrust belt, Iran. Mar Pet Geol 21:829–843

Sneath PHA, Sokal RR (1973) Numerical taxonomy. W.H. Freeman, San Francisco

SPSS (2006) Statistical package for the social sciences, SPSS version 15.0. SPSS Inc, Chicago

Stöcklin J (1968) Structural history and tectonics of Iran: a review. Bull Am Assoc Pet Geol 52:1229–1258

Swan ARH, Sandilands M (1995) Introduction to geological data analysis. Blackwell Science, Oxford

Takin M (1972) Iranian geology and continental drift in the Middle East. Nature 235:147–150

Theodoridis S, Koutroumbas K (2006) Pattern recognition, 3rd edn. Academic Press

Thorndike RL (1953) Who belongs in the family? Psychometrika 18:267–276

Tibshirani R, Walther G, Hastie T (2001) Estimating the number of clusters in a data set via the gap statistic. JRSS-B 63:411–423

Vernant P, Nilforoushan F, Hatzfeld D, Abbassi MR, Vigny C, Masson F, Nankali H, Martinod J, Ashtiani A, Bayer R, Tavakoli F, Chery J (2004) Present-day crustal deformation and plate kinematics in the Middle East constrained by GPS measurements in Iran and northern Oman. Geophys J Int 157:381–398

Webb AR, Copsey KD (2011) Statistical pattern recognition, 3rd edn. John Wiley & Sons, Ltd, Sussex

Webster R (1977) Quantitative and numerical methods in soil classification and survey. Oxford University Press, Oxford

Zamani A, Hashemi N (2004) Computer-based self organized tectonic zoning: a tentative pattern recognition for Iran. Comput Geosci 30:705–718

Zamani A, Sami A, Khalili M (2012) Multivariate rule-based seismicity map of Iran: a data-driven model. Bull Earthq Eng 10:1667–1683

Zhao Q, Hautamaki V, Franti P (2008) Knee Point Detection in BIC for Detecting the Number of Clusters. ACIVS 2008, LNCS 5259, 664–673
