Assessment of debris flow hazards using a Bayesian network

Wan-jie Liang, Da-fang Zhuang, Dong Jiang, Jian-jun Pan, Hong-yan Ren

PII: S0169-555X(12)00227-9

DOI: 10.1016/j.geomorph.2012.05.008

Reference: GEOMOR 3997

To appear in: Geomorphology

Received date: 11 February 2011

Revised date: 16 March 2012

Accepted date: 8 May 2012

Please cite this article as: Liang, Wan-jie, Zhuang, Da-fang, Jiang, Dong, Pan, Jian-jun, Ren, Hong-yan, Assessment of debris flow hazards using a Bayesian network, Geomorphology (2012), doi: 10.1016/j.geomorph.2012.05.008

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

http://dx.doi.org/10.1016/j.geomorph.2012.05.008

ACCEPTED MANUSCRIPT

Assessment of debris flow hazards using a Bayesian network

Wan-jie Liang$^{a,b}$, Da-fang Zhuang$^{a,b}$, Dong Jiang$^{a,b,*}$, Jian-jun Pan$^{a}$ and Hong-yan Ren$^{b}$

a College of Resources and Environmental Sciences, Nanjing Agricultural University,

Nanjing 210095, China

b State Key Laboratory of Resources and Environmental Information Systems,

Institute of Geographical Sciences and Natural Resources Research, Chinese

Academy of Sciences, Beijing 100101, China

*Corresponding author. Tel: +86-10-64889433, Fax: +86-10-64855049

E-mail address: [email protected] (D. Jiang)

Abstract

Comprehensive assessment of debris flow hazard risk is challenging due to the

complexity and uncertainties of various related factors. A reasonable and reliable

assessment should be based on sufficient data and realistic approaches. This study

presents a novel approach for assessing debris flow hazard risk using BN (Bayesian

Network) and domain knowledge. Based on the records of debris flow hazards and

geomorphological/environmental data for the Chinese mainland, approaches based on

BN, SVM (Support Vector Machine) and ANN (Artificial Neural Network) were

compared. BN provided the highest values of hazard detection probability, precision,

and AUC (area under the receiver operating characteristic curve). The BN model is

useful for mapping and assessing debris flow hazard risk on a national scale.

Keywords: Debris flow hazard; Bayesian network; Hazard assessment; Chinese

mainland

1. Introduction

A debris flow is a common geological hazard. It often begins with a landslide, and

the potential energy of the generated sliding mass can rapidly convert into kinetic

energy. Debris flows can induce a series of disasters that may pose a serious threat to

lives, properties, and economic development. Many countries suffer from serious

debris flow hazards. For example, large-scale debris flows occurred in Uganda on

March 1, 2010, resulting in disastrous casualties, with 94 deaths, 320 people missing,

and three buried villages.

China is one of the debris-flow prone countries. Debris flows occur in regions that

correspond to 45% ($10^{6}$ km$^{2}$) of the Chinese mainland (Kang et al., 2004). For

example, a debris flow hazard occurred in Zhouqu in Gansu Province on August 8,

2010, resulting in 1467 deaths, 298 missing people and direct economic losses of

425,000 RMB. Similar debris flow hazards also occurred in Chuxiong and Gongshan

of Yunnan Province on the same day. Therefore, the assessment of regional debris

flow hazards is of great significance for the sustainable development of China.

Various factors including topography, geology and climate influence geological

hazards, and the measurement of these factors may involve large uncertainties

(Kondratyev et al., 2006). Over the past few decades, numerous hazard analyses have

employed qualitative and quantitative methods including artificial intelligence (AI).

Qualitative approaches were widely used in the 1970s to 1990s and were based on the

knowledge and opinions of experts (Carmassi et al., 1992; Carrara and Merenda, 1976;

Hearn, 1995; Pachauri et al., 1998; Rupke et al., 1988). They have the following

shortcomings: (i) evaluation tends to be subjective and assessment results from

different experts are not comparable; (ii) updating assessment using new data is

difficult; and (iii) required field experiments and investigations are expensive and

time-consuming. In quantitative approaches, statistical analyses are adopted to solve

the problem of subjectivity (Baeza and Corominas, 2001; Carrara, 2008). For example,

Ayalew and Yamagishi (2005) adopted logistic regression for assessing landslide

susceptibility; Guzzetti et al. (2005) introduced a probability approach to assess

landslide hazard risk on a basin scale; and Calvo and Savi (2009) conducted Monte

Carlo simulations for assessing debris flow risks. However, nonlinear relationships

between the variables used cannot be solved by these approaches.

With the recent development of geographical information science, data mining and

AI have been adopted in assessing geological hazards (Jiang and Eastman, 2000; Li et

al., 2005). The techniques include ANN (Artificial Neural Network; Chang and Chao,

2006a,b; Chang, 2007; Chen et al., 2008; Gomez and Kavzoglu, 2005; Liu et al., 2005;

Lu et al., 2007), SVM (Support Vector Machine; Wan and Lei, 2009; Yao et al., 2008),

GA (genetic algorithms; Chang and Chien, 2007; Chang et al., 2009), and decision

tree models (Saito et al., 2009; Wan, 2009; Wan and Lei, 2009). However, existing AI

methods have three shortcomings: (i) limited use of prior knowledge makes it difficult

to interpret assessment results; (ii) multiple sources of information cannot be

integrated into a consistent system for assessment; and (iii) they are not good at

dealing with the uncertainty of assessment.

BN (Bayesian Network) is an effective tool for knowledge representation and

reasoning under uncertainty (Pearl, 1988; Reckhow, 1999). Because

BN can represent uncertain interdependencies among the random variables that are used

to describe real-world domains, it has great potential for natural hazard assessment.

Compared with other assessment methods, BN has several merits: (i) domain

knowledge and multi-source information can be integrated into a consistent system; (ii) many

flexible learning algorithms are available for searching optimal solutions; (iii) additional

information can be included flexibly; and (iv) decisions can be supported using function and

decision nodes. In this study, a novel method for assessing debris flow hazard risk based on

BN and domain knowledge is proposed. Three debris flow hazard maps of the

Chinese mainland from BN, ANN and SVM were produced and compared.

2. Data and methods

2.1. Assessment method based on Bayesian network

A BN model can be expressed by $(N, A, \theta)$, where $(N, A)$ is a directed acyclic graph

(DAG) and $\theta$ denotes the node parameters. Each node $n \in N$ represents a domain

variable (often corresponding to an attribute in the database), and each arc $a \in A$

between nodes represents a probabilistic dependency between the associated nodes.

Each node in $N$ is associated with a conditional probability distribution,

collectively represented by $\{\theta_i\}$, which quantifies how strongly a node depends

on its parent nodes (Pearl, 1988).

Compared with other AI methods such as ANN, a major advantage of BN is that it

represents knowledge in a semantic way; and individual components such as specific

nodes, arcs, or even values in the conditional probability tables have some meaning

and can be understood independently (Greiner et al., 2001). This allows us to

construct and interpret a network relatively easily.

A naïve BN model is a simple probabilistic classifier based on Bayes' theorem with

a strong independence assumption, where all the attributes $A_i$ are conditionally

independent given the value of a class variable $C$. By independence, we mean

probabilistic independence, that is, $A$ is independent of $B$ given $C$ whenever

$\Pr(A \mid B, C) = \Pr(A \mid C)$ and $\Pr(C) > 0$, where $\Pr$ denotes probability (Nir et al., 1997).
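
The conditional independence assumption means that the class posterior factorizes as $\Pr(C \mid A_1, \ldots, A_n) \propto \Pr(C) \prod_i \Pr(A_i \mid C)$. The following Python sketch illustrates that computation only; the function name `naive_bayes_posterior` and the probability tables in the usage note are hypothetical and are not taken from this study.

```python
def naive_bayes_posterior(prior, likelihoods, observation):
    """Posterior over classes for one observation of discrete attributes.

    prior:        {class: P(C = class)}
    likelihoods:  one table per attribute, {class: {value: P(A_i = value | C = class)}}
    observation:  one observed value per attribute, in the same order
    """
    unnormalized = {}
    for c, p_c in prior.items():
        p = p_c
        # Multiply by P(A_i | C) for each attribute (conditional independence).
        for table, value in zip(likelihoods, observation):
            p *= table[c][value]
        unnormalized[c] = p
    z = sum(unnormalized.values())  # normalizing constant
    return {c: p / z for c, p in unnormalized.items()}
```

For instance, with two classes and two binary attributes, `naive_bayes_posterior({1: 0.5, 2: 0.5}, tables, (0, 1))` returns a dictionary of class probabilities that sum to one.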

Our debris flow hazard assessment using a BN model has the following six steps:

1) Selecting relevant parameters and spatial units;

2) Constructing training sample datasets for the model;

3) Learning and constructing the structure of the model;

4) Learning and determining the parameters for each node of the model;

5) Evaluating the performance and accuracy of the model; and

6) Using the model for assessment.

2.1.1. Learning structure of the BN model

To construct a BN model, the network that best matches a given training set needs

to be found. The learning algorithms may be divided into two types: dependency

analysis, and a scoring function with a search algorithm. The algorithm of the latter

can be subdivided into two types: constraint-based and heuristic. The K2 algorithm

(Gregory and Edward, 1992) is typically constraint-based, and searches

according to a given node order with a limited maximum number of parent nodes.

The main drawback of the K2 algorithm is that only the optimal structure within a

limited search space can be found. The greedy hill-climbing algorithm (Lim et al.,

2006) is heuristic and belongs to the local search family. It tends to become trapped in local

optima; to avoid this problem, the random mutation hill-climbing algorithm has

been put forward (David, 1994). There are many other heuristic search algorithms

such as the simulated annealing algorithm and GA (Renner and Ekart, 2003). In our

method, an initial BN structure is obtained using the K2 search. The structure is then

refined using domain knowledge to obtain a hazard assessment model.
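
The K2 search used for the initial structure can be sketched as follows. This is a minimal illustration in Python, assuming discrete data stored as tuples and the K2 scoring metric (Gregory and Edward, 1992) evaluated in log form. The names `k2_log_score` and `k2_search` and the default `max_parents=2` are hypothetical; they are not the interface of the MATLAB toolbox used in this study.

```python
from collections import Counter
from math import lgamma


def k2_log_score(data, child, parents, arity):
    """Log K2 score of `child` given a list of parent column indices."""
    r = arity[child]
    n_ij = Counter()   # counts per parent configuration
    n_ijk = Counter()  # counts per (parent configuration, child value)
    for row in data:
        cfg = tuple(row[p] for p in parents)
        n_ij[cfg] += 1
        n_ijk[(cfg, row[child])] += 1
    score = 0.0
    for n in n_ij.values():
        score += lgamma(r) - lgamma(n + r)  # log[(r-1)! / (N_ij + r - 1)!]
    for n in n_ijk.values():
        score += lgamma(n + 1)              # log[N_ijk!]
    return score


def k2_search(data, order, arity, max_parents=2):
    """Greedy K2 search: each node may only take parents that precede it in `order`."""
    parents = {node: [] for node in order}
    for idx, node in enumerate(order):
        best = k2_log_score(data, node, parents[node], arity)
        candidates = list(order[:idx])
        while candidates and len(parents[node]) < max_parents:
            scored = [(k2_log_score(data, node, parents[node] + [c], arity), c)
                      for c in candidates]
            new_best, choice = max(scored)
            if new_best <= best:  # stop when no candidate improves the score
                break
            best = new_best
            parents[node].append(choice)
            candidates.remove(choice)
    return parents
```

The returned dictionary maps each node to its learned parent list. Arcs could then be added or removed by hand to reflect domain knowledge, which is the refinement step described above.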

2.1.2. Learning parameters of the BN-based model

Once a BN structure is constructed, the parameters of the CPT (conditional probability

table) for each node can be obtained with two general approaches: using domain

knowledge and learning from sample datasets. If sufficient

knowledge regarding the mechanism of debris flow hazards is available, the CPT

parameters can be determined by an expert. If enough training data are given, the

parameters can also be derived from them. The two methods can be combined. The

parameter learning algorithms include maximum likelihood estimation and Bayesian

estimation. In this study, Bayesian estimation was utilized.
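
As an illustration of Bayesian estimation of a CPT, the sketch below computes the posterior-mean parameters under a symmetric Dirichlet prior, $\theta_{ijk} = (N_{ijk} + \alpha) / (N_{ij} + r\alpha)$, where $r$ is the number of child states. The prior strength `alpha`, the function name `estimate_cpt`, and the row format (dictionaries keyed by variable name) are assumptions made for this example; the prior actually used in this study is not specified. Setting `alpha=0` gives the maximum likelihood estimate.

```python
from collections import Counter


def estimate_cpt(data, child, parents, child_values, alpha=1.0):
    """Estimate P(child | parents) from discrete records.

    data:          list of dicts mapping variable name -> discrete value
    child_values:  all states the child variable can take
    alpha:         pseudo-count of the symmetric Dirichlet prior (assumed)
    """
    joint = Counter()     # N_ijk: counts of (parent configuration, child value)
    marginal = Counter()  # N_ij: counts of each parent configuration
    for row in data:
        cfg = tuple(row[p] for p in parents)
        joint[(cfg, row[child])] += 1
        marginal[cfg] += 1
    r = len(child_values)
    cpt = {}
    for cfg, n in marginal.items():
        cpt[cfg] = {v: (joint[(cfg, v)] + alpha) / (n + alpha * r)
                    for v in child_values}
    return cpt
```

Only parent configurations that occur in the data receive a row in this sketch; unseen configurations would fall back to the prior (a uniform distribution here).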

2.2. Factors for debris flow hazard assessment

Debris flow occurrence is affected by complex factors such as climate, geology,

topography, and hydrology. Seven environmental factors were selected in this study to

construct the assessment model of debris flow hazards for the Chinese mainland: X1 –

annual maximum cumulative rainfall of three consecutive days; X2 – annual number

of days with daily rainfall above 25 mm; X3 – vegetation coverage index; X4 – fault

length; X5 – area percentage of slope land with >25° inclination (APL25); X6 –

maximum elevation difference of the basin; and X7 – Gravelius index.

Rainfall is the main triggering factor of debris flow hazards, and debris flow

occurrence is related to both current and antecedent rainfalls. Therefore, effective

cumulative rainfall is useful for debris flow hazard assessment (Hsieh and Chen, 1993)

although its calculation is difficult. It could be represented by the annual maximum

cumulative rainfall of three consecutive days, and the annual number of days with

daily rainfall above 25 mm could also indicate the rainfall intensity and concentration.

Slope is an essential and important factor of debris flow occurrence (Johnson and

Rodine, 1984; Wang, 1994). According to Liu et al. (2005) and our field

reconnaissance, most debris flows in China have initiated on slopes steeper than 25°.

Some land use/cover types, especially vegetation with strong and large root systems,

increase slope stability (Dai and Lee, 2002). Franks (1999) indicated that sparsely

vegetated slopes are the most susceptible to failure. Nilaweera and Nutalaya (1999)

stated that vegetation provides hydrological and mechanical effects of slope

stabilization. To incorporate the effects of land use/cover, we used the following

vegetation coverage index, I:

$$I = a \sum_{i=1}^{5} \left( W_i \sum_{j=1}^{n} \frac{SW_j \, S_j}{S} \right) \qquad (1)$$

where $a$ is the normalization coefficient; $W_i$ is the weight of the first class of land

use (Table 1); $SW_j$ is the weight of the subclass of land use (Table 1); $S_j$ is the area

of the subclass in an assessment unit; and $S$ is the total area of the unit.
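
A small numerical sketch of Eq. (1) is given below. It assumes that the inner sum runs over the subclasses belonging to each first-class land use type and that $a = 1$; both are interpretations made for this example. The function name `vegetation_coverage_index` and the subclass weights and areas in the usage note are hypothetical.

```python
def vegetation_coverage_index(classes, unit_area, a=1.0):
    """Evaluate Eq. (1) for one assessment unit.

    classes:    list of (W_i, [(SW_j, S_j), ...]) pairs, one per first-class
                land use type, with subclass weights SW_j and areas S_j
    unit_area:  total area S of the unit (same units as S_j)
    a:          normalization coefficient (assumed to be 1 here)
    """
    total = 0.0
    for w_i, subclasses in classes:
        # Area-weighted contribution of the subclasses of this land use type.
        total += w_i * sum(sw_j * s_j / unit_area for sw_j, s_j in subclasses)
    return a * total
```

For example, a unit of 100 km$^2$ with 60 km$^2$ of forest land (weight 0.38 in Table 1) and 40 km$^2$ of farmland (weight 0.19), each treated as a single subclass with a hypothetical subclass weight of 1, gives `vegetation_coverage_index([(0.38, [(1.0, 60.0)]), (0.19, [(1.0, 40.0)])], 100.0)`, approximately 0.304.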

Fault zone development may provide weaker rocks and facilitate slope failure and

debris production. Therefore, we measured the total length of faults in a basin.

The Gravelius index (Casali et al., 2008), Kg, can be another factor influencing

debris flows:

$$K_g = \frac{P}{2\sqrt{\pi A_b}} \approx 0.28\,\frac{P}{\sqrt{A_b}} \qquad (2)$$

where $P$ is the basin perimeter (m) and $A_b$ is the basin area (m$^2$). $K_g$ represents the ratio of

the basin perimeter to the perimeter of a circle with the same area. Circular basins

tend to have larger peak flow rates. Therefore, a basin with a Gravelius index close to

one may often cause debris flows. Concerning drainage basin form, the maximum

elevation difference of a basin was also chosen as a factor of debris flow hazard

because it reflects potential energy.
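
The Gravelius index of Eq. (2) is straightforward to compute from basin perimeter and area. The sketch below is illustrative only; the function name `gravelius_index` is hypothetical.

```python
import math


def gravelius_index(perimeter_m, area_m2):
    """Ratio of the basin perimeter to the perimeter of a circle of equal area."""
    return perimeter_m / (2.0 * math.sqrt(math.pi * area_m2))
```

A perfectly circular basin gives a value of 1, and the value grows as the basin becomes more elongated or irregular. A square basin with 1,000 m sides, for instance, gives $2/\sqrt{\pi} \approx 1.13$.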

There are some other important factors associated with debris flow hazard

assessment, such as soil and bedrock types. Nandi and Shakoor (2010) selected soil

type and the liquidity index as factors of landslides. However, soil and bedrock types

are categorical and difficult to include in the BN model. Therefore, we did not use

them for assessment.

The assessment units used in our study are drainage basins. Using the method of Xu

et al. (2004), basins in the Chinese mainland were automatically extracted. The area

of the basins varies from $1.30 \times 10^{-1}$ to $1.18 \times 10^{5}$ km$^{2}$.

2.3. Data preprocessing

The types and sources of data used in this study are shown in Table 2. The main

tools for processing the datasets include GIS (Geographic Information Systems)

software and the JAVA programming language. Based on the historical data of debris

flow hazards, the assessment units were classified into two types: whether a debris

flow hazard occurred or not. Class values of 1 and 2 were given for these types. Wan

and Lei (2009) indicated that in China, if local conditions are favorable for debris

flows, debris flows occur frequently. Therefore, an assessment unit is considered as

having a high debris flow probability if debris flows occurred in the past; otherwise, it

has a low probability.

The fault length in a basin was calculated from a 1:500,000 geological map using

the ArcGIS 9.3 intersect function. The maximum elevation difference of the basin and

APL25 were derived from a 90-m DEM (Table 2). Kg for each basin was calculated

using Eq. (2). The two rainfall parameters were derived from daily rainfall data (Table

2) through Kriging interpolation. I was calculated using Eq. (1) and land use data

(Table 2).

It is worth noting that the seven factors mentioned above have different units and

are measured at different scales. They were normalized by:

$$X_i' = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}} \quad (i = 1, 2, \ldots, n) \qquad (3)$$

where $X_i'$ is the normalized value between 0 and 1; $X_i$ is the value of the factor;

$X_{\min}$ is the minimum value; $X_{\max}$ is the maximum value; and $n$ is the number of

data.

BN handles discrete variables well, but is weak at handling

continuous variables. Therefore, the normalized values need to be converted to

discrete values. For simple data processing, the normalized values were multiplied by

10 and converted into integer values between 0 and 10. Some of the discretized results

are listed in Table 3, where $\{X_1, X_2, \ldots, X_7\}$ are the assessment indicators defined in

Section 2.2 and $C$ is the target variable (debris flow presence/absence).
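
The normalization of Eq. (3) and the subsequent discretization can be sketched as follows. The conversion to integers is assumed here to be truncation, since the rounding rule is not stated; the function names `normalize` and `discretize` are hypothetical.

```python
def normalize(values):
    """Min-max normalization of one factor, Eq. (3)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant factor carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(normalized, levels=10):
    """Multiply by `levels` and truncate to an integer in 0..levels (assumed rule)."""
    return [min(levels, int(x * levels)) for x in normalized]
```

For example, `discretize(normalize([2.0, 3.0, 6.0]))` maps the three raw values to the normalized values 0, 0.25 and 1, and then to the integer levels 0, 2 and 10.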

2.4. Construction of sample sets

To train and test the performance of the model, two datasets were established.

Dataset 1 has 4,146 assessment units, which are almost evenly distributed within the

Chinese mainland (Fig. 1). Among the units, 716 have experienced debris flow

hazards (= high risk) whereas 3,440 have not (= low risk). Because of spatial

autocorrelation, units surrounding a hazardous unit may have a relatively high

probability of debris flow hazards. Therefore, these units will interfere with the training of

the BN model. To avoid this, non-hazard units surrounding each hazardous unit were

deleted to create dataset 2. The dataset includes 716 units of high risk and 1,310 units

of low risk (Fig. 2).
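
The construction of dataset 2 can be illustrated with a short sketch. It assumes that class value 1 marks units with recorded debris flow hazards and 2 marks units without, and that "surrounding" means sharing a boundary, represented here by a hypothetical adjacency dictionary. The function name `drop_neighbors_of_hazard_units` is also hypothetical.

```python
def drop_neighbors_of_hazard_units(labels, neighbors):
    """Remove non-hazard units that are adjacent to at least one hazard unit.

    labels:     {unit_id: 1 (hazard recorded) or 2 (no hazard recorded)}
    neighbors:  {unit_id: set of adjacent unit ids} (assumed adjacency structure)
    """
    hazard = {u for u, c in labels.items() if c == 1}
    excluded = set()
    for u in hazard:
        # Only non-hazard neighbors are dropped; hazard units are always kept.
        excluded |= {v for v in neighbors.get(u, ()) if v not in hazard}
    return {u: c for u, c in labels.items() if u not in excluded}
```

With `labels = {"a": 1, "b": 2, "c": 2, "d": 1}` and `neighbors = {"a": {"b", "d"}, "d": {"a"}}`, unit `b` is dropped because it borders hazard unit `a`, while `a`, `c` and `d` are retained.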

2.5. Construction of the BN-based network structure

The Bayesian network structure, which can qualitatively describe the dependency

among variables, is the basis of the assessment model. The structure-learning

algorithm was introduced in Section 2.1.1. The Bayesian Net Toolbox of MATLAB

written by Kevin Murphy was used in this research. The initial structure of the BN

model was obtained using K2 search strategies with dataset 2. Then the structure was

fine-tuned based on the domain knowledge. The BN machine learning structure is

shown in Fig. 3. The variables included in Fig. 3 are described in Sections 2.2 and 2.3.

The structure-learning algorithm can be used to obtain the key relationships among

the indicators. I is affected by rainfall; the maximum elevation difference of the basin

is determined by APL25 to some extent; and a debris flow is induced by the combined

action of the seven selected indicators. The initial structure of the BN is consistent with

domain knowledge. According to Nilaweera and Nutalaya (1999), the

occurrence of debris flow has a direct relationship with vegetation coverage. In this

study, therefore, the relationship between the target variable and I was added, and the

relationship between the annual number of days with daily rainfall above 25 mm and I

was removed. The fine-tuned BN structure is shown in Fig. 4. The TPR (true positive

rate), FPR (false positive rate), precision and AUC of the three BN structures (Naïve

network, machine learning structure and fine-tuned one) are listed in Table 4. The

fine-tuned BN structure has a higher TPR value, a better precision,

a larger AUC value and a lower FPR value, and was found to be the most suitable for

assessment.

3. Assessment results

3.1. Performance comparison

We compared our BN model with the SVM and ANN models using the repeated

sub-sampling method. The sample set was randomly split into training and testing

datasets.
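
Repeated sub-sampling can be sketched as a series of independent random splits. The number of repetitions, the test fraction and the seed below are assumptions for illustration, as the values used in this study are not stated; the function name `repeated_subsampling` is hypothetical.

```python
import random


def repeated_subsampling(n_units, n_repeats=10, test_fraction=0.3, seed=0):
    """Yield (train_indices, test_indices) for repeated random splits."""
    rng = random.Random(seed)
    indices = list(range(n_units))
    n_test = int(round(n_units * test_fraction))
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)  # a fresh random permutation for every repetition
        yield shuffled[n_test:], shuffled[:n_test]
```

Each model would be trained on the training indices and scored on the testing indices, and the performance measures averaged over the repetitions.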

Tables 5 and 6 list the four performance scalar measures, including TPR, FPR,

precision and AUC for the results from the three methods on the two datasets. The BN

and ANN models have better performances than the SVM model (Table 6). Our

method and the ANN have almost the same precision and AUC for dataset 2, and the

former has a higher TPR value than the ANN (85.66 for BN and 81.63 for ANN). The

SVM and ANN have higher precisions than our method, while ours has a larger AUC

value (Table 5). More importantly, our method has the highest TPR value (76.99 for

BN; 22.32 for SVM; and 34.60 for ANN).
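
For reference, the three threshold-based measures can be computed from predicted and observed classes as sketched below. The standard definitions are assumed (TPR = TP/(TP+FN), FPR = FP/(FP+TN), precision = TP/(TP+FP)), with class 1 taken as the positive (hazard) class; the function name `confusion_rates` is hypothetical.

```python
def confusion_rates(y_true, y_pred, positive=1):
    """Return TPR, FPR and precision for a binary classification result."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "TPR": tp / (tp + fn) if tp + fn else 0.0,
        "FPR": fp / (fp + tn) if fp + tn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }
```

A high TPR means that few hazardous units are missed, which matters most for hazard mapping, whereas precision and FPR describe how many non-hazard units are wrongly flagged.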

Figs. 5 and 6 show the ROC (receiver operating characteristic) curves of BN,

ANN and SVM applied to datasets 1 and 2. One evaluation criterion for the ROC curve

is that the nearer a curve is to the upper left of the plot, the better the model. The

figures show that the curves for the BN model are the best.
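
The AUC summarizes an ROC curve as a single number: it equals the probability that a randomly chosen hazard unit receives a higher predicted score than a randomly chosen non-hazard unit, with ties counted as one half. The sketch below computes it directly from that pairwise definition; the function name `auc_from_scores` is hypothetical, and the quadratic loop is meant for illustration rather than efficiency.

```python
def auc_from_scores(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison of predicted scores."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0   # positive unit ranked above negative unit
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(scores_pos) * len(scores_neg))
```

A value of 0.5 corresponds to random ranking and 1.0 to a curve passing through the upper left corner of the plot.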

3.2. Debris flow hazard maps for the Chinese mainland

Figs. 7–9 show the debris flow hazard zonal maps for the Chinese mainland

evaluated by the BN, ANN and SVM models, respectively. The maps are basically

consistent with the actual hazard distribution. Hazards are mainly distributed in

Yunnan, Xizang, Sichuan, Qinghai, Guizhou and Gansu Provinces. There are almost

no debris flow hazards in northernmost areas such as the northeast Chinese plain, the

north Chinese plain, the Guanzhong plain, the Inner Mongolian grassland and the

Gobi desert. Fig. 10 shows that representative large debris flow events in the Chinese

mainland in 2010 were all located in the high-risk zone assessed by our method.

4. Discussion and conclusions

The uncertainties of debris flow hazard assessment consist of two components:

cognitive uncertainty and occasional uncertainty. The former mainly arises from

limited knowledge of debris flow mechanisms, influencing factors, and critical

conditions, whereas the latter is associated with randomness, which can be estimated

but not eliminated. The BN model estimates debris flow hazard risks and

uncertainties better than other models such as ANN and SVM, which only account for expected

values. Probabilistic knowledge representation and reasoning can effectively

avoid the problem of overconfidence, unlike ANN models (Walczak and Cerpa,

1999).

Our criteria for selecting the factors in our hazard risk assessment are: 1) estimated

impacts on hazard risk; 2) data accessibility; and 3) easy calculation. We selected the

seven factors (Section 2.2) and most of them are continuous variables. The BN model

can, however, deal with continuous variables in only a limited manner, making the

application of BN challenging. Using discretized values allows us to capture nonlinear

relationships between the categorical variables. However, if the number of categories

is large, abundant data are required to correctly find dependencies (Myllymaki et al.,

2002). In such cases the structure of the BN model also becomes complex, and the

efficiency and performance of the model may decrease. In this study, we selected

eight to 10 categories for each variable. The results shown in Section 3 indicate that

this choice is reasonable.

An important characteristic of the BN model is its use of prior information, which

reflects knowledge obtained before the research is conducted. The BN model can also

easily incorporate knowledge of different accuracies and from different sources. The

performance of our BN model (Table 4) after fine tuning shows that TPR and

precision improved while FPR decreased. In spite of this relatively limited

improvement, the model structure is more in line with expert knowledge, pointing to

the increased credibility of the model.

Tables 5 and 6 illustrate that our method is robust in terms of TPR, precision, and

the ROC area. A comparative analysis of the assessment results (Figs. 7 to 9) has

demonstrated that they agree with the actual distribution of debris flow hazards based

on field survey data including large debris flow hazards in 2010 (Fig. 10).

In this study, we used the Bayesian Net Toolbox for MATLAB as the basis of

assessment. MATLAB is a programming environment that can solve technical

computing problems faster than traditional programming languages. MATLAB also

provides hybrid programming interfaces with other languages; therefore, it can be

combined with C# and ArcGIS for software development. Our work is an example of

such combinations for a GIS-based hazard risk assessment.

Although our method is applicable to debris flow hazard assessments on a national

scale, it still requires improvement. For example, the following two issues need to be

examined in future studies:

1) Collect detailed information concerning debris flow hazards on a local scale, and

conduct modeling and its performance assessment.

2) Expand the BN model by incorporating better algorithms including those for

continuous variables.

Acknowledgements

This research was supported and funded by the Ministry of Science and Technology

of China (Grants 2008BAK50B01 and 2009AA122003) and the National Natural

Science Foundation of China (Grant 40830637).

References

Ayalew, L., Yamagishi, H., 2005. The application of GIS-based logistic regression for

landslide susceptibility mapping in the Kakuda-Yahiko Mountains, Central

Japan. Geomorphology 65, 15-31.

Baeza, C., Corominas, J., 2001. Assessment of shallow landslide susceptibility by

means of multivariate statistical techniques. Earth Surface Processes and

Landforms 26, 1251-1263.

Calvo, B., Savi, F., 2009. A real-world application of Monte Carlo procedure for

debris flow risk assessment. Computers & Geosciences 35, 967-977.

Carmassi, F., Liberati, G., Ricciardi, C., Sciotti, M., 1992. Stability evaluation for

unified power-plant siting in geothermal areas. Proceedings of the 6th

International Symposium on Landslides, pp. 893-898.

Carrara, A., 2008. Comparing models of debris-flow susceptibility in the alpine

environment. Geomorphology 94, 353-378.

Carrara, A., Merenda, L., 1976. Landslide inventory in Northern Calabria, Southern

Italy. Geological Society of America Bulletin 87, 1153-1162.

Casali, J., Gastesi, R., Alvarezmozos, J., Desantisteban, L., Lersundi, J., Gimenez, R.,

Larranaga, A., Goni, M., Agirre, U., Campo, M., 2008. Runoff, erosion, and

water quality of agricultural watersheds in central Navarre (Spain).

Agricultural Water Management 95, 1111-1128.

Chang, T.C., 2007. Risk degree of debris flow applying neural networks. Natural

Hazards 42, 209-224.

Chang, T., Chao, R., 2006a. Application of back-propagation networks in debris flow

prediction. Engineering Geology 85, 270-280.

Chang, T.C., Chao, R.J., 2006b. Application of back-propagation networks in debris

flow prediction. Engineering Geology 85, 270-280.

Chang, T.C., Chien, Y.H., 2007. The application of genetic algorithm in debris flows

prediction. Environmental Geology 53, 339-347.

Chang, T.C., Wang, Z.Y., Chien, Y.H., 2009. Hazard assessment model for debris flow

prediction. Environmental Earth Sciences 60, 1619-1630.

Chen, C.H., Ke, C.C., Wang, C.L., 2008. A back-propagation network for the

assessment of susceptibility to rock slope failure in the eastern portion of the

Southern Cross-Island Highway in Taiwan. Environmental Geology 57,

723-733.

Dai, F.C., Lee, C.F., 2002. Landslide characteristics and slope instability modeling

using GIS, Lantau Island, Hong Kong. Geomorphology 42, 213-228.

David, B.S., 1994. Prototype and feature selection by sampling and random mutation

hill climbing algorithms. Proc. of the Eleventh International Conference on

Machine Learning, pp. 293-301.

Franks, C.A.M., 1999. Characteristics of some rainfall-induced landslides on natural

slopes, Lantau Island, Hong Kong. Quarterly Journal of Engineering Geology

32, 247-259.

Gomez, H., Kavzoglu, T., 2005. Assessment of shallow landslide susceptibility using

artificial neural networks in Jabonosa River Basin, Venezuela. Engineering

Geology 78, 11-27.

Gregory, F.C., Edward, H., 1992. A Bayesian method for the induction of probabilistic

networks from data. Machine Learning 9, 309-347.

Greiner, R., Darken, C., Santoso, N.I., 2001. Efficient reasoning. Computing Surveys

33, 1-30.

Guzzetti, F., Reichenbach, P., Cardinali, M., Galli, M., Ardizzone, F., 2005.

Probabilistic landslide hazard assessment at the basin scale. Geomorphology

72, 272-299.

Hearn, G.J., 1995. Landslide and erosion hazard mapping at Ok-Tedi Copper Mine,

Papua-New-Guinea. Quarterly Journal of Engineering Geology 28, 47-60.

Hsieh, Chen, L.E., 1993. Debris flow warning system II. Project of Council of

Agriculture, Executive Yuan, Republic of China (in Chinese).

Jiang, H., Eastman, J.R., 2000. Application of fuzzy measures in multi-criteria

evaluation in GIS. International Journal of Geographical Information Science

14, 173-184.

Johnson, A.M., Rodine, J.R., 1984. Debris flow. In: Brunsden, D., Prior, D.B. (Eds.),

Slope Instability. Wiley, New York, pp. 257-361.

Kang, Z.C., Li, Z.F., Ma, A.N., Luo, J.T., 2004. Debris Flow Research in China.

Science Press, Beijing, 252 p. (in Chinese).

Kondratyev, K.Y., Krapivin, V.F., Varotsos, C.A., 2006. Natural Disasters as

Interactive Components of Global Ecodynamics. Springer/Praxis, Chichester,

625 p.

Li, L., Wang, J., Wang, C., 2005. Typhoon insurance pricing with spatial decision

support tools. International Journal of Geographical Information Science 19,

363-384.

Lim, A., Rodrigues, B., Zhang, X., 2006. A simulated annealing and hill-climbing

algorithm for the traveling tournament problem. European Journal of

Operational Research 174, 1459-1478.

Liu, Y., Guo, H.C., Zou, R., Wang, L.J., 2005. Neural network modeling for regional

hazard assessment of debris flow in Lake Qionghai Watershed, China.

Environmental Geology 49, 968-976.

Lu, G.Y., Chiu, L.S., Wong, D.W., 2007. Vulnerability assessment of rainfall-induced

debris flows in Taiwan. Natural Hazards 43, 223-244.

Myllymaki, P., Silander, T., Tirri, H., Uronen, P., 2002. B-course: a web-based tool for

Bayesian and causal data analysis. International Journal on Artificial

Intelligence Tools 11, 369-387.

Nandi, A., Shakoor, A., 2010. A GIS-based landslide susceptibility evaluation using

bivariate and multivariate statistical analyses. Engineering Geology 110,

11-20.

Nilaweera, N.S., Nutalaya, P., 1999. Role of tree roots in slope stabilisation. Bulletin

of Engineering Geology and the Environment 57, 337-342.

Nir, F., Dan, G., Moises, G., 1997. Bayesian network classifiers. Machine Learning 29,

131-163.

Pachauri, A.K., Gupta, P.V., Chander, R., 1998. Landslide zoning in a part of the

Garhwal Himalayas. Environmental Geology 36, 325-334.

Pearl, J., 1988. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann

Publishers, San Mateo, California.

Reckhow, K.H., 1999. Water quality prediction and probability network models.

Canadian Journal of Fisheries and Aquatic Sciences 56, 1150-1158.

Renner, G., Ekart, A., 2003. Genetic algorithms in computer aided design.

Computer-Aided Design 35, 709-726.

Rupke, J., Cammeraat, E., Seijmonsbergen, A.C., Vanwesten, C.J., 1988. Engineering

geomorphology of the Widentobel Catchment, Appenzell and Sankt-Gallen,

Switzerland - a geomorphological inventory system applied to

geotechnical appraisal of slope stability. Engineering Geology 26, 33-68.

Saito, H., Nakayama, D., Matsuyama, H., 2009. Comparison of landslide

susceptibility based on a decision-tree model and actual landslide occurrence:

The Akaishi Mountains, Japan. Geomorphology 109, 108-121.

Walczak, S., Cerpa, N., 1999. Heuristic principles for the design of artificial neural

networks. Information and Software Technology 41, 107-117.

Wan, S., 2009. A spatial decision support system for extracting the core factors and

thresholds for landslide susceptibility map. Engineering Geology 108,

237-251.

Wan, S., Lei, T.C., 2009. A knowledge-based decision support system to analyze the

debris-flow problems at Chen-Yu-Lan River, Taiwan. Knowledge-Based

Systems 22, 580-588.

Wang, D., 1994. Study of mechanism of debris flow occurrence. Thesis in Department

of Civil Engineering, National Taiwan University (in Chinese).

Xu, X.L., Zhuang, D.F., Jia, S.F., Hu, Y.F., 2004. Automated extraction of drainages in

China based on DEM in GIS environment. Resources and Environment in the

Yangtze Basin 13, 343-348 (in Chinese).

Yao, X., Tham, L.G., Dai, F.C., 2008. Landslide susceptibility mapping based on

support vector machine: a case study on natural slopes of Hong Kong, China.

Geomorphology 101, 572-582.

Fig. 1. Spatial distribution of debris-flow hazards in dataset 1, with 4,146 assessment

units. Among the units, 716 have experienced debris flow hazards (= high risk)

whereas 3,440 have not (= low risk).

Fig. 2. Spatial distribution of debris-flow hazards in dataset 2, with 2,026 assessment

units. Among the units, 716 have experienced debris flow hazards (= high risk)

whereas 1,310 have not (= low risk).

Fig. 3. Structure of BN (Bayesian Network) for machine learning.

Fig. 4. Structure of fine-tuned BN (Bayesian Network).

Fig. 5. ROC curves of BN (Bayesian Network), ANN (Artificial Neural Network) and

SVM (Support Vector Machine) for dataset 1.

Fig. 6. ROC curves of BN (Bayesian Network), ANN (Artificial Neural Network) and

SVM (Support Vector Machine) for dataset 2.

Fig. 7. Debris flow hazard zonal map for the Chinese mainland based on BN

(Bayesian Network). The blue box shows the area of Fig. 10.

Fig. 8. Debris flow hazard zonal map for the Chinese mainland based on ANN

(Artificial Neural Network).

Fig. 9. Debris flow hazard zonal map for the Chinese mainland based on SVM

(Support Vector Machine).

Fig. 10. Locations of representative large debris flow events in the Chinese mainland

in 2010. The location of the area is shown in Fig. 7.

Table 1. Weights used to calculate the vegetation coverage index.

Land cover type         Weight   Land cover subtype   Weight
Forest land             0.38     Forest               0.6
                                 Shrub                0.25
                                 Other                0.15
Rangeland               0.34     High coverage        0.6
                                 Middle coverage      0.3
                                 Low coverage         0.1
Farmland                0.19     Paddy                0.7
                                 Dry                  0.3
Land for construction   0.07     Urban                0.3
                                 Rural                0.4
                                 Other                0.3
Unused land             0.02     Sand                 0.2
                                 Saline               0.3
                                 Bare                 0.3
                                 Barren               0.2
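
As an illustration of how weights such as those in Table 1 could be combined, the following Python sketch assumes the index is an area-weighted sum of (type weight × subtype weight) over the land cover present in an assessment unit. The function name, the dictionary keys, the example areas, and the omission of any normalization coefficient are assumptions made for illustration only; the formula actually used is the one defined in the main text.

```python
# Weights transcribed from Table 1: type -> (type weight, {subtype: subtype weight}).
WEIGHTS = {
    "forest_land": (0.38, {"forest": 0.6, "shrub": 0.25, "other": 0.15}),
    "rangeland": (0.34, {"high": 0.6, "middle": 0.3, "low": 0.1}),
    "farmland": (0.19, {"paddy": 0.7, "dry": 0.3}),
    "construction": (0.07, {"urban": 0.3, "rural": 0.4, "other": 0.3}),
    "unused": (0.02, {"sand": 0.2, "saline": 0.3, "bare": 0.3, "barren": 0.2}),
}


def vegetation_coverage_index(areas):
    """Area-weighted score for one assessment unit (assumed form, not the paper's exact formula).

    areas maps (land cover type, subtype) to an area in any consistent unit.
    """
    total = sum(areas.values())
    if total <= 0:
        raise ValueError("total area must be positive")
    score = 0.0
    for (land_type, subtype), area in areas.items():
        type_weight, subtype_weights = WEIGHTS[land_type]
        score += type_weight * subtype_weights[subtype] * area
    return score / total


# Hypothetical unit: 50% forest, 30% paddy farmland, 20% bare unused land.
unit = {
    ("forest_land", "forest"): 50.0,
    ("farmland", "paddy"): 30.0,
    ("unused", "bare"): 20.0,
}
index = vegetation_coverage_index(unit)
print(f"vegetation coverage index: {index:.4f}")
```

Under this assumed form, a unit covered entirely by forest scores highest (0.38 × 0.6) and a unit of sand or barren land scores lowest, which matches the ordering of the weights in the table.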

Table 2. Data used to obtain the factors of debris flow hazards.

Data type         Source                                   Data description   Publisher
Historical data   Historical data of debris flow hazards   Point data         Institute of Mountain Hazards and Environment, CAS
Geology           Distribution of fault zone               1:500,000          Institute of Geographic Sciences and Natural Resources Research, CAS
Topography        Digital elevation model (DEM)            90×90 m            USGS/NASA
Climate           Daily rainfall data, 2000                755 sites          China Meteorological Data Sharing Service System
Vegetation        Land use/cover                           100×100 m          Institute of Geographic Sciences and Natural Resources Research, CAS

Table 3. Part of the dispersed sample data.

X1 X2 X3 X4 X5 X6 X7 C

3 5 5 1 4 1 7 2

3 5 3 1 1 3 6 2

3 6 2 1 1 1 6 1

3 6 5 1 1 1 7 1

Table 4. Performance of the three BN structures.

Local structure              TPR (%)   FPR (%)   Precision (%)   AUC
Naïve Bayesian network       83.13     8.72      88.41           0.94
Machine learning structure   84.16     8.61      88.84           0.95
Fine-tuned structure         85.66     8.23      89.63           0.95

Table 5. Performance of assessment models for dataset 1.

Model type         TPR (%)   FPR (%)   Precision (%)   AUC
Bayesian network   76.99     23.57     76.53           0.84
ANN                34.60     4.00      85.41           0.77
SVM                22.32     2.53      84.55           0.36

Table 6. Performance of assessment models for dataset 2.

Model type         TPR (%)   FPR (%)   Precision (%)   AUC
Bayesian network   85.66     8.23      89.63           0.95
ANN                81.63     3.48      91.27           0.94
SVM                73.44     8.17      85.31           0.92
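
To make the rate columns concrete, the following Python sketch recovers approximate confusion-matrix counts for dataset 2 from the tabulated TPR and FPR and the class sizes given in the Fig. 2 caption (716 high-risk and 1,310 low-risk units). It then computes overall accuracy, (TP + TN) / N, and the positive predictive value, TP / (TP + FP). The function names are illustrative assumptions. The recovered counts are fractional, because the tabulated rates are rounded and may be averages over cross-validation folds.

```python
def confusion_counts(n_pos, n_neg, tpr, fpr):
    """Recover (possibly fractional) TP, FN, FP, TN from class sizes and rates."""
    tp = tpr * n_pos
    fp = fpr * n_neg
    return tp, n_pos - tp, fp, n_neg - fp


def accuracy(tp, fn, fp, tn):
    """Share of all units classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


def positive_predictive_value(tp, fp):
    """Share of units predicted high risk that really are high risk."""
    return tp / (tp + fp)


# Dataset 2 class sizes, taken from the Fig. 2 caption.
N_POS, N_NEG = 716, 1310

# (TPR, FPR, reported "Precision") in percent, transcribed from Table 6.
TABLE_6 = {
    "Bayesian network": (85.66, 8.23, 89.63),
    "ANN": (81.63, 3.48, 91.27),
    "SVM": (73.44, 8.17, 85.31),
}

results = {}
for model, (tpr, fpr, reported) in TABLE_6.items():
    tp, fn, fp, tn = confusion_counts(N_POS, N_NEG, tpr / 100, fpr / 100)
    acc = 100 * accuracy(tp, fn, fp, tn)
    ppv = 100 * positive_predictive_value(tp, fp)
    results[model] = (acc, ppv)
    print(f"{model}: accuracy={acc:.2f}%  PPV={ppv:.2f}%  reported={reported:.2f}%")
```

Run on the Table 6 values, the recomputed accuracy falls within about 0.1 percentage points of the reported "Precision" for all three models, whereas the positive predictive value does not. This suggests the column may report overall accuracy; that reading is an inference from the tabulated numbers, not something stated in the table itself.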

Fig. 1

Fig. 2

Fig. 3

Fig. 4

Fig. 5

Fig. 6

Fig. 7

Fig. 8

Fig. 9

Fig. 10

Highlights

1. BN-based model for assessment of debris flow hazard.

2. The model was cross-validated and performed better than SVM and ANN.

3. The model can be applied to assess debris flow hazards at the national scale.
