Assessment of debris flow hazards using a Bayesian network
Wan-jie Liang, Da-fang Zhuang, Dong Jiang, Jian-jun Pan, Hong-yan Ren
PII: S0169-555X(12)00227-9
DOI: 10.1016/j.geomorph.2012.05.008
Reference: GEOMOR 3997
To appear in: Geomorphology
Received date: 11 February 2011
Revised date: 16 March 2012
Accepted date: 8 May 2012
Please cite this article as: Liang, Wan-jie, Zhuang, Da-fang, Jiang, Dong, Pan, Jian-jun, Ren, Hong-yan, Assessment of debris flow hazards using a Bayesian network, Geomorphology (2012), doi: 10.1016/j.geomorph.2012.05.008
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
Assessment of debris flow hazards using a Bayesian network
Wan-jie Liang a,b, Da-fang Zhuang a,b, Dong Jiang a,b,*, Jian-jun Pan a and Hong-yan Ren b
a College of Resources and Environmental Sciences, Nanjing Agricultural University,
Nanjing 210095, China
b State Key Laboratory of Resources and Environmental Information Systems,
Institute of Geographical Sciences and Natural Resources Research, Chinese
Academy of Sciences, Beijing 100101, China
*Corresponding author. Tel: +86-10-64889433, Fax: +86-10-64855049
E-mail address: [email protected] (D. Jiang)
Abstract
Comprehensive assessment of debris flow hazard risk is challenging due to the
complexity and uncertainties of various related factors. A reasonable and reliable
assessment should be based on sufficient data and realistic approaches. This study
presents a novel approach for assessing debris flow hazard risk using BN (Bayesian
Network) and domain knowledge. Based on the records of debris flow hazards and
geomorphological/environmental data for the Chinese mainland, approaches based on
BN, SVM (Support Vector Machine) and ANN (Artificial Neural Network) were
compared. BN provided the highest values of hazard detection probability, precision,
and AUC (area under the receiver operating characteristic curve). The BN model is
useful for mapping and assessing debris flow hazard risk on a national scale.
Keywords: Debris flow hazard; Bayesian network; Hazard assessment; Chinese
mainland
1. Introduction
A debris flow is a common geological hazard. It often begins with a landslide, and
the potential energy of the generated sliding mass can rapidly convert into kinetic
energy. Debris flows can induce a series of disasters that may pose a serious threat to
lives, properties, and economic development. Many countries suffer from serious
debris flow hazards. For example, large-scale debris flows occurred in Uganda on
March 1, 2010, resulting in disastrous casualties, with 94 deaths, 320 people missing,
and three buried villages.
China is one of the countries most prone to debris flows. Debris flows occur in regions that
correspond to 45% (10^6 km^2) of the Chinese mainland (Kang et al., 2004). For
example, a debris flow hazard occurred in Zhouqu in Gansu Province on August 8,
2010, resulting in 1467 deaths, 298 missing people and direct economic losses of
425,000 RMB. Similar debris flow hazards also occurred in Chuxiong and Gongshan
of Yunnan Province on the same day. Therefore, the assessment of regional debris
flow hazards is of great significance for the sustainable development of China.
Various factors including topography, geology and climate influence geological
hazards, and the measurement of these factors may involve large uncertainties
(Kondratyev et al., 2006). Over the past few decades, numerous hazard analyses have
employed qualitative and quantitative methods including artificial intelligence (AI).
Qualitative approaches were widely used in the 1970s to 1990s and were based on the
knowledge and opinions of experts (Carmassi et al., 1992; Carrara and Merenda, 1976;
Hearn, 1995; Pachauri et al., 1998; Rupke et al., 1988). They have the following
shortcomings: (i) evaluation tends to be subjective and assessment results from
different experts are not comparable; (ii) updating assessment using new data is
difficult; and (iii) required field experiments and investigations are expensive and
time-consuming. In quantitative approaches, statistical analyses are adopted to solve
the problem of subjectivity (Baeza and Corominas, 2001; Carrara, 2008). For example,
Ayalew and Yamagishi (2005) adopted logistic regression for assessing landslide
susceptibility; Guzzetti et al. (2005) introduced a probability approach to assess
landslide hazard risk on a basin scale; and Calvo and Savi (2009) conducted Monte
Carlo simulations for assessing debris flow risks. However, these approaches cannot
capture nonlinear relationships among the variables used.
With the recent development of geographical information science, data mining and
AI have been adopted in assessing geological hazards (Jiang and Eastman, 2000; Li et
al., 2005). The techniques include ANN (Artificial Neural Network; Chang and Chao,
2006a,b; Chang, 2007; Chen et al., 2008; Gomez and Kavzoglu, 2005; Liu et al., 2005;
Lu et al., 2007), SVM (Support Vector Machine; Wan and Lei, 2009; Yao et al., 2008),
GA (genetic algorithms; Chang and Chien, 2007; Chang et al., 2009), and decision
tree models (Saito et al., 2009; Wan, 2009; Wan and Lei, 2009). However, existing AI
methods have three shortcomings: (i) limited use of prior knowledge makes it difficult
to interpret assessment results; (ii) multiple sources of information cannot be
integrated into a consistent system for assessment; and (iii) they are not good at
dealing with the uncertainty of assessment.
BN (Bayesian Network) is an effective tool for knowledge representation and
reasoning under uncertainty (Pearl, 1988; Reckhow, 1999). Because
BN can represent uncertain interdependencies among the random variables that
describe real-world domains, it has great potential for natural hazard assessment.
Compared with other assessment methods, BN has several merits: (i) domain
knowledge and multi-source information can be integrated into a consistent system; (ii) many
flexible learning algorithms are available for searching optimal solutions; (iii) additional
information can be included flexibly; and (iv) decision support is available through function and
decision nodes. In this study, a novel method for assessing debris flow hazard risk based on
BN and domain knowledge is proposed. Three debris flow hazard maps of the
Chinese mainland from BN, ANN and SVM were produced and compared.
2. Data and methods
2.1. Assessment method based on Bayesian network
A BN model can be expressed by (N, A, θ), where (N, A) is a directed acyclic graph
(DAG) and θ is the set of node parameters. Each node $n \in N$ represents a domain
variable (often corresponding to an attribute in the database), and each arc $a \in A$
between nodes represents a probabilistic dependency between the associated nodes.
Each node in N is associated with a conditional probability distribution,
collectively represented by $\theta = \{\theta_i\}$, which quantifies how strongly a node depends
on its parent nodes (Pearl, 1988).
Compared with other AI methods such as ANN, a major advantage of BN is that it
represents knowledge in a semantic way; individual components such as specific
nodes, arcs, or even values in the conditional probability tables have some meaning
and can be understood independently (Greiner et al., 2001). This allows us to
construct and interpret a network relatively easily.
A naïve BN model is a simple probabilistic classifier based on Bayes' theorem with
a strong independence assumption, where all the attributes $A_i$ are conditionally
independent given the value of the class variable C. By independence, we mean
probabilistic independence, that is, A is independent of B given C whenever
$\Pr(A \mid B, C) = \Pr(A \mid C)$ and $\Pr(C) > 0$, where Pr denotes probability (Nir et al., 1997).
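Under this independence assumption the joint distribution factorizes, so the posterior of the class follows directly from Bayes' theorem. The short worked form below uses the symbols defined above; n (the number of attributes) and c (a running class value) are introduced here only for illustration.

```latex
\Pr(C \mid A_1, \ldots, A_n)
  = \frac{\Pr(C)\,\prod_{i=1}^{n} \Pr(A_i \mid C)}
         {\sum_{c} \Pr(C = c)\,\prod_{i=1}^{n} \Pr(A_i \mid C = c)}
```

Only the prior Pr(C) and one table Pr(A_i | C) per attribute have to be estimated, which is why the naïve structure serves as a simple baseline.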
Our debris flow hazard assessment using a BN model has the following six steps:
1) Selecting relevant parameters and spatial units;
2) Constructing training sample datasets for the model;
3) Learning and constructing the structure of the model;
4) Learning and determining the parameters for each node of the model;
5) Evaluating the performance and accuracy of the model; and
6) Using the model for assessment.
2.1.1. Learning structure of the BN model
To construct a BN model, the network that best matches a given training set needs
to be found. The learning algorithms may be divided into two types: dependency
analysis, and a scoring function with a search algorithm. The algorithm of the latter
can be subdivided into two types: constraint-based and heuristic. The K2 algorithm
(Gregory and Edward, 1992) is typically constraint-based, and conducts search
according to the given node order with the limited maximum number of parent nodes.
The main drawback of the K2 algorithm is that only the optimal structure within a
limited search space can be found. The greedy hill-climbing algorithm (Lim et al.,
2006) is heuristic and belongs to the local search family. It tends to become trapped in local
optima; to avoid this problem, the random mutation hill-climbing algorithm has
been put forward (David, 1994). There are many other heuristic search algorithms
such as the simulated annealing algorithm and GA (Renner and Ekart, 2003). In our
method, an initial BN structure is obtained using the K2 search. The structure is then
refined using domain knowledge to obtain a hazard assessment model.
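The K2 search described above can be sketched as follows. This is a minimal illustration, not the toolbox code used in the study: the function names, the column-index representation of nodes, the single shared number of states `r`, and the toy data are all assumptions made for the example. The score is the logarithm of the Cooper-Herskovits metric, and each node may only take parents that precede it in the given order.

```python
from collections import defaultdict
from math import lgamma


def k2_log_score(data, child, parents, r):
    """Log K2 score of `child` given `parents`; r = number of states per variable."""
    # counts[parent_configuration][child_value] = N_ijk
    counts = defaultdict(lambda: defaultdict(int))
    for row in data:
        config = tuple(row[p] for p in parents)
        counts[config][row[child]] += 1
    score = 0.0
    for child_counts in counts.values():
        n_ij = sum(child_counts.values())
        score += lgamma(r) - lgamma(n_ij + r)
        score += sum(lgamma(n + 1) for n in child_counts.values())
    return score


def k2_search(data, order, r, max_parents=2):
    """Greedy K2: add the best-scoring preceding node as a parent while the score improves."""
    parents = {node: [] for node in order}
    for pos, node in enumerate(order):
        best = k2_log_score(data, node, parents[node], r)
        candidates = list(order[:pos])
        while candidates and len(parents[node]) < max_parents:
            scored = [(k2_log_score(data, node, parents[node] + [c], r), c)
                      for c in candidates]
            new_best, chosen = max(scored)
            if new_best <= best:
                break  # no candidate improves the score
            best = new_best
            parents[node].append(chosen)
            candidates.remove(chosen)
    return parents


# Toy data (hypothetical): column 1 copies column 0, column 2 is unrelated.
toy = [(a, a, c) for a in (0, 1) for c in (0, 1)] * 5
print(k2_search(toy, order=[0, 1, 2], r=2))
```

Because the result depends on the node order and the parent limit, a structure found this way is only optimal within that restricted search space, which is why a subsequent refinement with domain knowledge is useful.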
2.1.2. Learning parameters of the BN-based model
Once a BN structure is constructed, parameters of CPT (conditional probability
table) for each node can be obtained with two general approaches: using domain
knowledge and using parameters learned from sample datasets. If sufficient
knowledge regarding the mechanism of debris flow hazards is obtained, the CPT
parameters can be determined by an expert. If enough training data are given, the
parameters can also be derived from them. The two methods can be combined. The
parameter learning algorithms include maximum likelihood estimation and Bayesian
estimation. In this study, Bayesian estimation was utilized.
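The difference between the two estimators can be illustrated for a single CPT column, that is, for one node under one fixed configuration of its parents. The sketch below assumes a symmetric Dirichlet prior with hyperparameter `alpha`; the prior actually used in the study is not stated, so `alpha`, the function name, and the counts are hypothetical.

```python
def estimate_cpt_column(counts, alpha=0.0):
    """Estimate P(X = k | parent configuration) from state counts N_k.

    alpha = 0 gives the maximum likelihood estimate; alpha > 0 gives the
    posterior mean under a symmetric Dirichlet(alpha) prior (assumed here).
    """
    total = sum(counts) + alpha * len(counts)
    return [(n + alpha) / total for n in counts]


counts = [0, 3, 7]  # hypothetical counts for a three-state node
print(estimate_cpt_column(counts))             # maximum likelihood
print(estimate_cpt_column(counts, alpha=1.0))  # Bayesian estimate
```

With the Bayesian estimate, a state that never occurs in the training data still receives a small non-zero probability, which matters when some parent configurations are rarely observed.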
2.2. Factors for debris flow hazard assessment
Debris flow occurrence is affected by complex factors such as climate, geology,
topography, and hydrology. Seven environmental factors were selected in this study to
construct the assessment model of debris flow hazards for the Chinese mainland: X1 –
annual maximum cumulative rainfall of three consecutive days; X2 – annual number
of days with daily rainfall above 25 mm; X3 – vegetation coverage index; X4 – fault
length; X5 – area percentage of slope land with >25° inclination (APL25); X6 –
maximum elevation difference of the basin; and X7 – Gravelius index.
Rainfall is the main triggering factor of debris flow hazards, and debris flow
occurrence is related to both current and antecedent rainfalls. Therefore, effective
cumulative rainfall is useful for debris flow hazard assessment (Hsieh and Chen, 1993)
although its calculation is difficult. It could be represented by the annual maximum
cumulative rainfall of three consecutive days, and the annual number of days with
daily rainfall above 25 mm could also indicate the rainfall intensity and concentration.
Slope is an essential and important factor of debris flow occurrence (Johnson and
Rodine, 1984; Wang, 1994). According to Liu et al. (2005) and our field
reconnaissance, most debris flows in China have initiated on slopes steeper than 25°.
Some land use/cover types, especially vegetation with strong and large root systems,
increase slope stability (Dai and Lee, 2002). Franks (1999) indicated that sparsely
vegetated slopes are the most susceptible to failure. Nilaweera and Nutalaya (1999)
stated that vegetation provides hydrological and mechanical effects of slope
stabilization. To incorporate the effects of land use/cover, we used the following
vegetation coverage index, I:
$$I = a \sum_{i=1}^{5} \left( W_i \sum_{j=1}^{n} \frac{SW_j \, S_j}{S} \right) \quad (1)$$
where a is the normalization coefficient; $W_i$ is the weight of the first-level class of land
use (Table 1); $SW_j$ is the weight of the subclass of land use (Table 1); $S_j$ is the area
of the subclass in an assessment unit; and S is the total area of the unit.
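As an illustration, the index can be computed for one assessment unit by reading Eq. (1) as an area-weighted sum, I = a Σ_i W_i Σ_j (SW_j S_j / S). The weights below are copied from Table 1; the grouping of subclasses under first-level classes, the value a = 1, the function name, and the example areas are assumptions.

```python
# First-level class weights W_i (Table 1).
CLASS_WEIGHTS = {
    "forest land": 0.38,
    "rangeland": 0.34,
    "farmland": 0.19,
    "land for construction": 0.07,
    "unused land": 0.02,
}

# Subclass weights SW_j (Table 1), grouped by first-level class.
SUBCLASS_WEIGHTS = {
    "forest land": {"forest": 0.6, "shrub": 0.25, "other": 0.15},
    "rangeland": {"high coverage": 0.6, "middle coverage": 0.3, "low coverage": 0.1},
    "farmland": {"paddy": 0.7, "dry": 0.3},
    "land for construction": {"urban": 0.3, "rural": 0.4, "other": 0.3},
    "unused land": {"sand": 0.2, "saline": 0.3, "bare": 0.3, "barren": 0.2},
}


def vegetation_coverage_index(subclass_areas, total_area, a=1.0):
    """Area-weighted vegetation coverage index; a = 1.0 is a placeholder."""
    index = 0.0
    for land_class, areas in subclass_areas.items():
        inner = sum(SUBCLASS_WEIGHTS[land_class][sub] * area / total_area
                    for sub, area in areas.items())
        index += CLASS_WEIGHTS[land_class] * inner
    return a * index


# Hypothetical unit of 100 km^2: 60 km^2 of forest and 40 km^2 of paddy fields.
unit = {"forest land": {"forest": 60.0}, "farmland": {"paddy": 40.0}}
print(vegetation_coverage_index(unit, total_area=100.0))
```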
Fault zone development may provide weaker rocks and facilitate slope failure and
debris production. Therefore, we measured the total length of faults in a basin.
The Gravelius index (Casali et al., 2008), Kg, can be another factor influencing
debris flows:
$$K_g = \frac{P}{2\sqrt{\pi A_b}} \approx 0.28\,\frac{P}{\sqrt{A_b}} \quad (2)$$
where P is the basin perimeter (m) and $A_b$ is the basin area (m^2). $K_g$ represents the ratio of
the basin perimeter to the perimeter of a circle with the same area. Circular basins
tend to have larger peak flow rates. Therefore, a basin with a Gravelius index close to
one may often cause debris flows. Concerning drainage basin form, the maximum
elevation difference of a basin was also chosen as a factor of debris flow hazard
because it reflects potential energy.
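The Gravelius index of Eq. (2), K_g = P / (2 sqrt(π A_b)), can be checked on two simple shapes. In the sketch below the function name and the two test basins are hypothetical; a circular basin gives exactly one, and any less compact shape gives a larger value.

```python
import math


def gravelius_index(perimeter_m, area_m2):
    """Ratio of the basin perimeter to the perimeter of a circle of equal area."""
    return perimeter_m / (2.0 * math.sqrt(math.pi * area_m2))


# Circular basin with a radius of 1000 m.
k_circle = gravelius_index(2.0 * math.pi * 1000.0, math.pi * 1000.0 ** 2)
# Square basin with sides of 1000 m.
k_square = gravelius_index(4.0 * 1000.0, 1000.0 ** 2)
print(k_circle, k_square)
```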
There are some other important factors associated with debris flow hazard
assessment, such as soil and bedrock types. Nandi and Shakoor (2010) selected soil
type and the liquidity index as factors of landslides. However, soil and bedrock types
are categorical and difficult to include in the BN model. Therefore, we did not use
them for assessment.
The assessment units used in our study are drainage basins. Using the method of Xu
et al. (2004), basins in the Chinese mainland were automatically extracted. The area
of the basins varies from 1.30×10^-1 to 1.18×10^5 km^2.
2.3. Data preprocessing
The types and sources of data used in this study are shown in Table 2. The main
tools for processing the datasets include GIS (Geographic Information Systems)
software and the JAVA programming language. Based on the historical data of debris
flow hazards, the assessment units were classified into two types: whether a debris
flow hazard occurred or not. Class values of 1 and 2 were given for these types. Wan
and Lei (2009) indicated that in China, if local conditions are favorable for debris
flows, debris flows occur frequently. Therefore, an assessment unit is considered as
having a high debris flow probability if debris flows occurred in the past; otherwise, it
has a low probability.
The fault length in a basin was calculated from a 1:500,000 geological map using
the ArcGIS 9.3 intersect function. The maximum elevation difference of the basin and
APL25 were derived from a 90-m DEM (Table 2). Kg for each basin was calculated
using Eq. (2). The two rainfall parameters were derived from daily rainfall data (Table
2) through Kriging interpolation. I was calculated using Eq. (1) and land use data
(Table 2).
It is worth noting that the seven factors mentioned above have different units and
are measured at different scales. They were normalized by:
$$X_i' = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}} \quad (i = 1, 2, \ldots, n) \quad (3)$$
where $X_i'$ is the normalized value between 0 and 1; $X_i$ is the value of the factor;
$X_{\min}$ is the minimum value; $X_{\max}$ is the maximum value; and n is the number of
data.
BN has a strong capacity for processing discrete variables, but is weak at handling
continuous variables. Therefore, the normalized values need to be converted to
discrete values. For simple data processing, the normalized values were multiplied by
10 and converted into integer values between 0 and 10. Some of the discretized results
are listed in Table 3, where $\{X_1, X_2, \ldots, X_7\}$ are the assessment indicators defined in
Section 2.2 and C is the target variable (debris flow hazard presence/absence).
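The normalization and discretization steps can be sketched as below. The text does not say whether the scaled values were rounded or truncated, so rounding to the nearest integer is an assumption, as are the function names and the sample rainfall values.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1] as (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant factor: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


def discretize(normalized, levels=10):
    """Multiply by `levels` and convert to integers in 0..levels (rounding assumed)."""
    return [int(round(x * levels)) for x in normalized]


rainfall_mm = [20.0, 56.0, 110.0, 200.0]  # hypothetical values of one factor
classes = discretize(min_max_normalize(rainfall_mm))
print(classes)
```

Note that the extremes of each factor always map to 0 and 10, so a single outlier can compress the remaining units into a few classes.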
2.4. Construction of sample sets
To train and test the performance of the model, two datasets were established.
Dataset 1 has 4,146 assessment units, which are almost evenly distributed within the
Chinese mainland (Fig. 1). Among the units, 716 have experienced debris flow
hazards (= high risk) whereas 3,440 have not (= low risk). Because of spatial
autocorrelation, units surrounding a hazardous unit may have a relatively high
probability of debris flow hazards. Therefore, these units will interfere with the training of
the BN model. To avoid this, non-hazard units surrounding each hazardous unit were
deleted to create dataset 2. Dataset 2 includes 716 units of high risk and 1,310 units
of low risk (Fig. 2).
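The construction of dataset 2 can be sketched as a filter over an adjacency table. The text does not define "surrounding", so the sketch assumes it means basins that share a boundary with a hazardous basin; the function name, the data layout, and the four-basin example are hypothetical.

```python
def drop_neighbours_of_hazard_units(hazard, neighbours):
    """Remove non-hazard units adjacent to a hazard unit.

    hazard:     dict mapping unit id -> True if a debris flow was recorded
    neighbours: dict mapping unit id -> iterable of adjacent unit ids
    """
    to_drop = set()
    for unit, is_hazard in hazard.items():
        if is_hazard:
            to_drop.update(n for n in neighbours.get(unit, ())
                           if not hazard.get(n, False))
    return {u: h for u, h in hazard.items() if u not in to_drop}


# Four basins in a row, A-B-C-D, with a recorded hazard only in A.
hazard = {"A": True, "B": False, "C": False, "D": False}
neighbours = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(drop_neighbours_of_hazard_units(hazard, neighbours))
```

Hazard units are never removed, which matches the unchanged count of 716 high-risk units in both datasets.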
2.5. Construction of the BN-based network structure
The Bayesian network structure, which can qualitatively describe the dependency
among variables, is the basis of the assessment model. The structure-learning
algorithm was introduced in Section 2.1.1. The Bayesian Net Toolbox of MATLAB
written by Kevin Murphy was used in this research. The initial structure of the BN
model was obtained using K2 search strategies with dataset 2. Then the structure was
fine-tuned based on the domain knowledge. The BN machine learning structure is
shown in Fig. 3. The variables included in Fig. 3 are described in Sections 2.2 and 2.3.
The structure-learning algorithm can be used to obtain the key relationships among
the indicators. I is affected by rainfall; the maximum elevation difference of the basin
is determined by APL25 to some extent; and a debris flow is induced by the combined
action of the seven selected indicators. The initial structure of the BN is largely in line with
domain knowledge. According to Nilaweera and Nutalaya (1999), the
occurrence of debris flow has a direct relationship with vegetation coverage. In this
study, therefore, the relationship between the target variable and I was added, and the
relationship between the annual number of days with daily rainfall above 25 mm and I
was removed. The fine-tuned BN structure is shown in Fig. 4. The TPR (true positive
rate), FPR (false positive rate), precision and AUC of the three BN structures (naïve
network, machine-learning structure and fine-tuned structure) are listed in Table 4. The
fine-tuned BN structure has the highest TPR (85.66%) and precision (89.63%) and the
lowest FPR (8.23%), with an AUC (0.95) equal to that of the machine-learning structure and
above that of the naïve network (0.94); it was therefore considered the most suitable for
assessment.
3. Assessment results
3.1. Performance comparison
We compared our BN model with the SVM and ANN models using the repeated
sub-sampling method. The sample set was randomly split into training and testing
datasets.
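Repeated random sub-sampling can be sketched as follows. The number of repetitions and the train/test proportion are not reported in the text, so `n_repeats`, `test_fraction`, the seed, and the function name are assumptions chosen only to make the example concrete.

```python
import random


def repeated_subsampling(n_samples, n_repeats=10, test_fraction=0.3, seed=0):
    """Yield (train_indices, test_indices) for independent random splits."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    n_test = max(1, int(round(n_samples * test_fraction)))
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[n_test:], shuffled[:n_test]


# Dataset 2 contains 2,026 assessment units.
for train_idx, test_idx in repeated_subsampling(2026, n_repeats=3):
    print(len(train_idx), len(test_idx))
```

Each model would be fitted on the training indices and scored on the test indices, and the reported measures would then be averaged over the repetitions.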
Tables 5 and 6 list four scalar performance measures (TPR, FPR, precision and AUC)
for the results from the three methods on the two datasets. For dataset 2 (Table 6), the BN
and ANN models perform better than the SVM model. Our
method and the ANN have almost the same precision and AUC, and the
former has a higher TPR than the latter (85.66% for BN and 81.63% for ANN). For dataset 1
(Table 5), the SVM and ANN have higher precision than our method, while ours has a larger AUC.
More importantly, our method has by far the highest TPR (76.99% for
BN, 22.32% for SVM and 34.60% for ANN), although its FPR (23.57%) is also the highest.
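For reference, the three threshold-based measures follow from the confusion matrix of a test set. The sketch below uses the standard definitions; the function name and the counts are hypothetical and are not taken from Tables 5 or 6.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return TPR, FPR and precision from confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),        # share of hazard units that are detected
        "FPR": fp / (fp + tn),        # share of non-hazard units that are flagged
        "precision": tp / (tp + fp),  # share of flagged units that are hazard units
    }


metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
print(metrics)
```

A model can trade these measures against each other: flagging more units raises TPR but usually raises FPR and lowers precision as well, so the measures are best read together.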
Figs. 5 and 6 show the ROC (receiver operating characteristic) curves of BN,
ANN and SVM applied to datasets 1 and 2. One evaluation criterion for ROC curves
is that the nearer a curve lies to the upper left corner of the plot, the better the model. The
figures show that the curves for the BN model are the best.
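The area under the ROC curve can be computed without drawing the curve, because it equals the probability that a randomly chosen hazard unit receives a higher score than a randomly chosen non-hazard unit (ties counted as one half). The sketch below uses this pairwise form; the function name, the 1/0 label coding, and the scores are hypothetical.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # predicted hazard probabilities
labels = [1, 1, 0, 1, 0, 0]              # 1 = hazard recorded, 0 = none
print(auc(scores, labels))
```

An AUC of 1 corresponds to a curve passing through the upper left corner, and 0.5 corresponds to the diagonal of a random classifier.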
3.2. Debris flow hazard maps for the Chinese mainland
Figs. 7–9 show the debris flow hazard zonal maps for the Chinese mainland
evaluated by the BN, ANN and SVM models, respectively. The maps are basically
consistent with the actual hazard distribution. Hazards are mainly distributed in
Yunnan, Xizang, Sichuan, Qinghai, Guizhou and Gansu Provinces. There are almost
no debris flow hazards in northernmost areas such as the northeast Chinese plain, the
north Chinese plain, the Guanzhong plain, the Inner Mongolian grassland and the
Gobi desert. Fig. 10 shows that the representative serious debris flow events in the Chinese
mainland in 2010 are all located in the high-risk zone assessed by our method.
4. Discussion and conclusions
The uncertainties of debris flow hazard assessment consist of two components:
cognitive (epistemic) uncertainty and random (aleatory) uncertainty. The former mainly arises from
limited knowledge of debris flow mechanisms, influencing factors, and critical
conditions, whereas the latter is associated with randomness, which can be estimated
but not eliminated. The BN model estimates debris flow hazard risks and
uncertainties better than other models such as ANN and SVM, which only account for expected
values. Probabilistic knowledge representation and reasoning can effectively
avoid the problem of overconfidence, unlike ANN models (Walczak and Cerpa,
1999).
Our criteria for selecting the factors in our hazard risk assessment are: 1) estimated
impacts on hazard risk; 2) data accessibility; and 3) easy calculation. We selected the
seven factors (Section 2.2) and most of them are continuous variables. The BN model
can, however, deal with continuous variables in only a limited manner, making the
application of BN challenging. Using discretized values allows us to capture nonlinear
relationships between the categorical variables. However, if the number of categories
is large, abundant data are required to correctly find dependencies (Myllymaki et al.,
2002). In such cases the structure of the BN model also becomes complex, and the
efficiency and performance of the model may decrease. In this study, we selected
eight to 10 categories for each variable. The results shown in Section 3 indicate that
this choice is reasonable.
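The data requirement mentioned above can be made concrete by counting the free parameters in one conditional probability table, which grow with the product of the parents' category counts. The sketch below is illustrative only: the function name and the parent configurations are hypothetical and do not describe the structure in Fig. 4.

```python
from math import prod


def cpt_free_parameters(n_states, parent_states):
    """Free parameters of a CPT: (states - 1) per combination of parent states."""
    return (n_states - 1) * prod(parent_states)


# Binary target with three parents of 11 categories each (values 0 to 10).
print(cpt_free_parameters(2, [11, 11, 11]))
# The same target with seven such parents.
print(cpt_free_parameters(2, [11] * 7))
```

With seven 11-category parents the table would need far more entries than the 2,026 units in dataset 2, which is one reason to limit both the number of categories and the number of parents per node.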
An important characteristic of the BN model is its use of prior information, which
reflects knowledge obtained before the research is conducted. The BN model can also
easily incorporate knowledge of different accuracies and from different sources. The
performance of our BN model (Table 4) after fine tuning shows that TPR and
precision improved while FPR decreased. In spite of this relatively limited
improvement, the model structure is more in line with expert knowledge, pointing to
the increased credibility of the model.
Tables 5 and 6 illustrate that our method is robust in terms of TPR, precision, and
the ROC area. A comparative analysis of the assessment results (Figs. 7 to 9) has
demonstrated that they agree with the actual distribution of debris flow hazards based
on field survey data including large debris flow hazards in 2010 (Fig. 10).
In this study, we used the Bayesian Net Toolbox for MATLAB as the basis of
assessment. MATLAB is a programming environment that can solve technical
computing problems faster than traditional programming languages. MATLAB also
provides hybrid programming interfaces with other languages; therefore, it can be
combined with C# and ArcGIS for software development. Our work is an example of
such combinations for a GIS-based hazard risk assessment.
Although our method is applicable to debris flow hazard assessments on a national
scale, it still requires improvement. For example, the following two issues need to be
examined in future studies:
1) Collect detailed information concerning debris flow hazards on a local scale, and
conduct modeling and its performance assessment.
2) Expand the BN model by incorporating better algorithms including those for
continuous variables.
Acknowledgements
This research was supported and funded by the Ministry of Science and Technology
of China (Grants 2008BAK50B01 and 2009AA122003) and the National Natural
Science Foundation of China (Grant 40830637).
References
Ayalew, L., Yamagishi, H., 2005. The application of GIS-based logistic regression for
landslide susceptibility mapping in the Kakuda-Yahiko Mountains, Central
Japan. Geomorphology 65, 15-31.
Baeza, C., Corominas, J., 2001. Assessment of shallow landslide susceptibility by
means of multivariate statistical techniques. Earth Surface Processes and
Landforms 26, 1251-1263.
Calvo, B., Savi, F., 2009. A real-world application of Monte Carlo procedure for
debris flow risk assessment. Computers & Geosciences 35, 967-977.
Carmassi, F., Liberati, G., Ricciardi, C., Sciotti, M., 1992. Stability evaluation for
unified power-plant siting in geothermal areas. Proceedings of the 6th
International Symposium on Landslides, pp. 893-898.
Carrara, A., 2008. Comparing models of debris-flow susceptibility in the alpine
environment. Geomorphology 94, 353-378.
Carrara, A., Merenda, L., 1976. Landslide inventory in Northern Calabria, Southern
Italy. Geological Society of America Bulletin 87, 1153-1162.
Casali, J., Gastesi, R., Alvarezmozos, J., Desantisteban, L., Lersundi, J., Gimenez, R.,
Larranaga, A., Goni, M., Agirre, U., Campo, M., 2008. Runoff, erosion, and
water quality of agricultural watersheds in central Navarre (Spain).
Agricultural Water Management 95, 1111-1128.
Chang, T.C., 2007. Risk degree of debris flow applying neural networks. Natural
Hazards 42, 209-224.
Chang, T., Chao, R., 2006a. Application of back-propagation networks in debris flow
prediction. Engineering Geology 85, 270-280.
Chang, T.C., Chao, R.J., 2006b. Application of back-propagation networks in debris
flow prediction. Engineering Geology 85, 270-280.
Chang, T.C., Chien, Y.H., 2007. The application of genetic algorithm in debris flows
prediction. Environmental Geology 53, 339-347.
Chang, T.C., Wang, Z.Y., Chien, Y.H., 2009. Hazard assessment model for debris flow
prediction. Environmental Earth Sciences 60, 1619-1630.
Chen, C.H., Ke, C.C., Wang, C.L., 2008. A back-propagation network for the
assessment of susceptibility to rock slope failure in the eastern portion of the
Southern Cross-Island Highway in Taiwan. Environmental Geology 57,
723-733.
Dai, F.C., Lee, C.F., 2002. Landslide characteristics and slope instability modeling
using GIS, Lantau Island, Hong Kong. Geomorphology 42, 213-228.
David, B.S., 1994. Prototype and feature selection by sampling and random mutation
hill climbing algorithms. Proc. of the Eleventh International Conference on
Machine Learning, pp. 293-301.
Franks, C.A.M., 1999. Characteristics of some rainfall-induced landslides on natural
slopes, Lantau Island, Hong Kong. Quarterly Journal of Engineering Geology
32, 247-259.
Gomez, H., Kavzoglu, T., 2005. Assessment of shallow landslide susceptibility using
artificial neural networks in Jabonosa River Basin, Venezuela. Engineering
Geology 78, 11-27.
Gregory, F.C., Edward, H., 1992. A Bayesian method for the induction of probabilistic
networks from data. Machine Learning 9, 309-347.
Greiner, R., Darken, C., Santoso, N.I., 2001. Efficient reasoning. Computing Surveys
33, 1-30.
Guzzetti, F., Reichenbach, P., Cardinali, M., Galli, M., Ardizzone, F., 2005.
Probabilistic landslide hazard assessment at the basin scale. Geomorphology
72, 272-299.
Hearn, G.J., 1995. Landslide and erosion hazard mapping at Ok-Tedi Copper Mine,
Papua-New-Guinea. Quarterly Journal of Engineering Geology 28, 47-60.
Hsieh, Chen, L.E., 1993. Debris flow warning system II. Project of Council of
Agriculture, Executive Yuan, Republic of China (in Chinese).
Jiang, H., Eastman, J.R., 2000. Application of fuzzy measures in multi-criteria
evaluation in GIS. International Journal of Geographical Information Science
14, 173-184.
Johnson, A.M., Rodine, J.R., 1984. Debris flow. In: Brunsden, D., Prior, D.B. (Eds.),
Slope Instability. Wiley, New York, pp. 257-361.
Kang, Z.C., Li, Z.F., Ma, A.N., Luo, J.T., 2004. Debris Flow Research in China.
Science Press, Beijing, 252 p. (in Chinese).
Kondratyev, K.Y., Krapivin, V.F., Varotsos, C.A., 2006. Natural Disasters as
Interactive Components of Global Ecodynamics. Springer/Praxis, Chichester,
625 p.
Li, L., Wang, J., Wang, C., 2005. Typhoon insurance pricing with spatial decision
support tools. International Journal of Geographical Information Science 19,
363-384.
Lim, A., Rodrigues, B., Zhang, X., 2006. A simulated annealing and hill-climbing
algorithm for the traveling tournament problem. European Journal of
Operational Research 174, 1459-1478.
Liu, Y., Guo, H.C., Zou, R., Wang, L.J., 2005. Neural network modeling for regional
hazard assessment of debris flow in Lake Qionghai Watershed, China.
Environmental Geology 49, 968-976.
Lu, G.Y., Chiu, L.S., Wong, D.W., 2007. Vulnerability assessment of rainfall-induced
debris flows in Taiwan. Natural Hazards 43, 223-244.
Myllymaki, P., Silander, T., Tirri, H., Uronen, P., 2002. B-course: a web-based tool for
Bayesian and causal data analysis. International Journal on Artificial
Intelligence Tools 11, 369-387.
Nandi, A., Shakoor, A., 2010. A GIS-based landslide susceptibility evaluation using
bivariate and multivariate statistical analyses. Engineering Geology 110,
11-20.
Nilaweera, N.S., Nutalaya, P., 1999. Role of tree roots in slope stabilisation. Bulletin
of Engineering Geology and the Environment 57, 337-342.
Nir, F., Dan, G., Moises, G., 1997. Bayesian network classifiers. Machine Learning 29,
131-163.
Pachauri, A.K., Gupta, P.V., Chander, R., 1998. Landslide zoning in a part of the
Garhwal Himalayas. Environmental Geology 36, 325-334.
Pearl, J., 1988. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann
Publishers, San Mateo, California.
Reckhow, K.H., 1999. Water quality prediction and probability network models.
Canadian Journal of Fisheries and Aquatic Sciences 56, 1150-1158.
Renner, G., Ekart, A., 2003. Genetic algorithms in computer aided design.
Computer-Aided Design 35, 709-726.
Rupke, J., Cammeraat, E., Seijmonsbergen, A.C., Vanwesten, C.J., 1988. Engineering
geomorphology of the Widentobel Catchment, Appenzell and Sankt-Gallen,
Switzerland - a geomorphological inventory system applied to
geotechnical appraisal of slope stability. Engineering Geology 26, 33-68.
Saito, H., Nakayama, D., Matsuyama, H., 2009. Comparison of landslide
susceptibility based on a decision-tree model and actual landslide occurrence:
The Akaishi Mountains, Japan. Geomorphology 109, 108-121.
Walczak, S., Cerpa, N., 1999. Heuristic principles for the design of artificial neural
networks. Information and Software Technology 41, 107-117.
Wan, S., 2009. A spatial decision support system for extracting the core factors and
thresholds for landslide susceptibility map. Engineering Geology 108,
237-251.
Wan, S., Lei, T.C., 2009. A knowledge-based decision support system to analyze the
debris-flow problems at Chen-Yu-Lan River, Taiwan. Knowledge-Based
Systems 22, 580-588.
Wang, D., 1994. Study of mechanism of debris flow occurrence. Thesis in Department
of Civil Engineering, National Taiwan University (in Chinese).
Xu, X.L., Zhuang, D.F., Jia, S.F., Hu, Y.F., 2004. Automated extraction of drainages in
China based on DEM in GIS environment. Resources and Environment in the
Yangtze Basin 13, 343-348 (in Chinese).
Yao, X., Tham, L.G., Dai, F.C., 2008. Landslide susceptibility mapping based on
support vector machine: a case study on natural slopes of Hong Kong, China.
Geomorphology 101, 572-582.
Fig. 1. Spatial distribution of debris-flow hazards in dataset 1, with 4,146 assessment
units. Among the units, 716 have experienced debris flow hazards (= high risk)
whereas 3,440 have not (= low risk).
Fig. 2. Spatial distribution of debris-flow hazards in dataset 2, with 2,026 assessment
units. Among the units, 716 have experienced debris flow hazards (= high risk)
whereas 1,310 have not (=low risk).
Fig. 3. Structure of BN (Bayesian Network) for machine learning.
Fig. 4. Structure of fine-tuned BN (Bayesian Network).
Fig. 5. ROC curves of BN (Bayesian Network), ANN (Artificial Neural Network) and
SVM (Support Vector Machine) for dataset 1.
Fig. 6. ROC curves of BN (Bayesian Network), ANN (Artificial Neural Network) and
SVM (Support Vector Machine) for dataset 2.
Fig. 7. Debris flow hazard zonal map for the Chinese mainland based on BN
(Bayesian Network). The blue box shows the area of Fig. 10.
Fig. 8. Debris flow hazard zonal map for the Chinese mainland based on ANN
(Artificial Neural Network).
Fig. 9. Debris flow hazard zonal map for the Chinese mainland based on SVM
(Support Vector Machine).
Fig. 10. Locations of representative large debris flow events in the Chinese mainland
in 2010. The location of the area is shown in Fig. 7.
Table 1. Weights used to calculate the vegetation coverage index.
Land cover type          Weight   Land cover subtype   Weight
Forest land              0.38     Forest               0.6
                                  Shrub                0.25
                                  Other                0.15
Rangeland                0.34     High coverage        0.6
                                  Middle coverage      0.3
                                  Low coverage         0.1
Farmland                 0.19     Paddy                0.7
                                  Dry                  0.3
Land for construction    0.07     Urban                0.3
                                  Rural                0.4
                                  Other                0.3
Unused land              0.02     Sand                 0.2
                                  Saline               0.3
                                  Bare                 0.3
                                  Barren               0.2
Table 2. Data used to obtain the factors of debris flow hazards.
Data type         Source                                   Data description   Publisher
Historical data   Historical data of debris flow hazards   Point data         Institute of Mountain Hazards and Environment, CAS
Geology           Distribution of fault zone               1:500,000          Institute of Geographic Sciences and Natural Resources Research, CAS
Topography        Digital Elevation Model (DEM)            90×90 m            USGS/NASA
Climate           Daily rainfall data, 2000                755 sites          China Meteorological Data Sharing Service System
Vegetation        Land use/cover                           100×100 m          Institute of Geographic Sciences and Natural Resources Research, CAS
Table 3. Part of the discretized sample data.
X1 X2 X3 X4 X5 X6 X7 C
3 5 5 1 4 1 7 2
3 5 3 1 1 3 6 2
3 6 2 1 1 1 6 1
3 6 5 1 1 1 7 1
Table 4. Performance of the three BN structures.
Local structure TPR (%) FPR (%) Precision (%) AUC
Naïve Bayesian network 83.13 8.72 88.41 0.94
Machine learning structure 84.16 8.61 88.84 0.95
Fine-tuned structure 85.66 8.23 89.63 0.95
Table 5. Performance of assessment models for dataset 1.
Model type TPR (%) FPR (%) Precision (%) AUC
Bayesian network 76.99 23.57 76.53 0.84
ANN 34.60 4.00 85.41 0.77
SVM 22.32 2.53 84.55 0.36
Table 6. Performance of assessment models for dataset 2.
Model type TPR (%) FPR (%) Precision (%) AUC
Bayesian network 85.66 8.23 89.63 0.95
ANN 81.63 3.48 91.27 0.94
SVM 73.44 8.17 85.31 0.92
Highlights
1. BN-based model for assessment of debris flow hazard.
2. The model was cross-validated and performed better than SVM and ANNs.
3. The model can be applied for assessment of debris flow hazard at national scale.