S1. Supplementary Methods
Rising floodwaters: mapping impacts and perceptions of flooding in Indonesian Borneo
Jessie A. Wells, Kerrie A. Wilson, Nicola K. Abram, Malcolm Nunn, David L.A. Gaveau, Rebecca K. Runting, Nina
Tarniati, Kerrie L. Mengersen, Erik Meijaard
1.1. River networks and Watersheds
The DEM used for delineating river networks consisted of tiles from the void-filled CGIAR-CSI SRTM
dataset v4.1 [1], which we mosaicked and projected (to WGS 1984 UTM 49N), giving a DEM with a cell
size of 93.054 m. We generated a hydrologically correct DEM from the CGIAR-CSI v4.1 DEM, by using
ArcHydro 2.0 tools [2] in ArcGIS 10.2 [3] to perform sink identification, sink filling, and burning-in of
major water bodies (OpenStreetMap Planet.osm, 01 March 2013). We then used this hydrologically correct
DEM to calculate flow direction and flow accumulation (i.e. number of DEM cells from which water flows
to a given cell).
River networks: We delineated ‘Major Rivers’ by tracing all cells with a flow accumulation of >24,000 cells,
corresponding to a minimum drainage area of 200 km2 at the channel head. ‘All Rivers’ approximate the
network of permanent streams (i.e. non-ephemeral water flows), and were delineated by tracing cells with
flow accumulations >2,376 upcells, meaning the finest headwater streams each drain a minimum area of 20
km2. The threshold for permanent streams was estimated as the finest networks of surface water that are
visible in high resolution Quickbird and Ikonos imagery (Google Earth v.7.0, 1 March 2014).
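As a rough check on the thresholds above, a flow-accumulation count can be converted to a drainage area by multiplying it by the area of one DEM cell. The sketch below assumes square cells of 93.054 m (the cell size stated earlier in this section); the function name is illustrative and not part of the original workflow.

```python
CELL_SIZE_M = 93.054  # DEM cell size stated in section 1.1

def drainage_area_km2(n_cells, cell_size_m=CELL_SIZE_M):
    """Area drained by n_cells square DEM cells, in km2."""
    return n_cells * cell_size_m ** 2 / 1e6

major = drainage_area_km2(24_000)      # 'Major Rivers' threshold
all_rivers = drainage_area_km2(2_376)  # 'All Rivers' threshold
print(round(major, 1), round(all_rivers, 1))
```

Under this assumption the two thresholds correspond to roughly 208 km2 and 20.6 km2, i.e. slightly above the nominal minimum drainage areas of 200 km2 and 20 km2.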
Watershed definitions: We delineated primary watersheds as river basins that drain to the sea, based on the
‘major rivers’ stream network. Each primary watershed is therefore an area of land from which water drains
and converges to a single outlet point at the coast. Coastal catchments delineated by this process often
encompass multiple, adjacent finer-scale catchments, draining directly to the sea but without forming ‘major
rivers’ that reach the flow accumulation threshold of >24,000. To ensure the spatial predictors primarily
reflect the upstream area of any given focal point (and not distant areas along a coastline), we split any
coastal catchments larger than 600 km2 into catchments delineated with the ‘all rivers’ flow accumulation
threshold of 2,379. This gave a final set of 895 primary watersheds across Borneo, 564 of them in
Kalimantan (Indonesian Borneo).
We delineated subwatersheds within the primary watersheds, as the areas that drain to each stream segment
of the major rivers (2416 subwatersheds across Borneo, 1780 of them in Kalimantan). Each subwatershed
has its outlet at the junction of two major rivers, or the ocean. A ‘relative watershed’ defines the watershed
for any given location of interest, and consists of the subwatershed that contains the focal location, along
with any other subwatersheds that lie upstream (i.e. contribute to flow into the focal subwatershed). Relative
watersheds thus follow the nested structure of the river network.
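The definition of a 'relative watershed' can be illustrated as a traversal of the river network. The sketch below is a minimal illustration only: it assumes each subwatershed is identified by a label and that a table records which subwatershed each one drains into. The data structure, names and toy network are hypothetical and do not represent the ArcHydro workflow used in the study.

```python
from collections import defaultdict

def relative_watershed(focal, downstream_of):
    """Return the focal subwatershed plus every subwatershed upstream of it.

    downstream_of maps each subwatershed id to the id it drains into
    (None for a subwatershed whose outlet is the ocean).
    """
    # Invert the links: which subwatersheds drain directly into each one?
    upstream = defaultdict(list)
    for sub, down in downstream_of.items():
        if down is not None:
            upstream[down].append(sub)

    # Walk upstream from the focal subwatershed.
    members, stack = set(), [focal]
    while stack:
        sub = stack.pop()
        if sub not in members:
            members.add(sub)
            stack.extend(upstream[sub])
    return members

# Hypothetical toy network: A and B join to form C; C and D join to form E,
# which drains to the sea.
toy = {"A": "C", "B": "C", "C": "E", "D": "E", "E": None}
print(sorted(relative_watershed("C", toy)))
```

For a focal location in C, the relative watershed contains A, B and C but not D or E, which reflects the nested structure described above.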
Riverine focus: Our focus is on riverine flooding (possibly incorporating some flash flood events), rather
than coastal storm or tidal flooding. Therefore, we restricted the analyses to mainland watersheds (excluding
estuaries and deltas), and only consider settlements > 400 m from estuaries, deltas or the ocean. Some of the
major rivers show tidal influences for tens of kilometres inland, so it is possible that higher tides may have
contributed to the height of some of the reported flood events. However, this possibility concerns a minority
of flood events, and is not likely to strongly affect our analyses of village flooding frequencies and
presence/absence of newspaper-reported floods.
1.2. Village Interview datasets:
Survey methods, quality assessment, and coding of responses
This study analysed data on flood frequency and trends from interviews with the village head (or other
official) in 364 villages in Kalimantan (Indonesian Borneo). These interviews were conducted as part of a
larger survey of villagers’ perceptions of forests and wildlife in Kalimantan and Sabah.
The larger survey is described in detail by Meijaard et al. [4, 5], including interview methods, selection of
villages and respondents, ethics approvals, local government permissions, and protocols to ensure prior and
informed consent was given by each participant. Villages were sampled in a stratified random design to
enable simultaneous studies of ecosystem services and wildlife conservation, in areas close to forests (either
within forests or less than 10 km from forests), and within the geographic range of the orangutan (Pongo
pygmaeus). Sampling was therefore random with respect to past or present flooding. Interviews were
conducted in bahasa Indonesia by trained interviewers from local NGOs.
The larger survey involved two sets of interviews. Firstly, a village-level interview was conducted with the
village head (or village government official), asking about the village history, demographics, livelihoods, and
natural disasters including floods. Secondly, interviews on villagers’ individual perceptions of forests and
wildlife were conducted with 7–12 respondents per village.
In this study, we focused on Kalimantan, and analysed village-level information on flood frequency and trends
based on the interviews with village heads (or village government officials), which have not been previously
published.
In contrast, other studies of villagers’ perceptions of wildlife or ecosystem services [5–7] were based on the
interviews with individual villagers. Perceptions of flooding were not asked directly during the individual
interviews. However, villagers often volunteered the view that forests are important for flood regulation, in
response to an open question on why forests were important to the health of respondents and their families.
These volunteered perceptions were analysed in [6], and are briefly summarised in Supplementary Results
2.1.
The village-level interviews collected information on the history of the village (year of establishment); total
population size; number of men and women; percentage of villagers who are Muslim, Christian or adhere to
other religions; number of schools; presence of customary forest land; main sources of village livelihoods;
presence of industrial land uses (timber, plantations, mining); and history of fires and floods (flooding
frequency over the past 5 years, and any trends in frequency over the past 30 years).
We conducted quality assessments of the survey datasets based on patterns of responses recorded from each
village, interview team and NGO, including lengths of the ‘open’ question responses. We excluded
interviews from any teams which recorded less detailed information (indicated by open responses with text
lengths consistently below c.100 characters), and any examples where text responses were not unique. This
process gave a ‘highest reliability’ dataset containing interviews from 512 villages in Kalimantan.
For the present study on flooding, we analysed only the village-level interviews where responses to the
specific questions on flooding were recorded as full sentences containing quantitative information on one or
more aspects of flooding (frequency, event years and/or trends).
This gave a final dataset of 364 villages, out of the 512 villages in Kalimantan. These interviews were
conducted between April and October 2009 (341 villages) or April and October 2012 (23 villages). There were no
detectable differences between responses recorded in 2009 vs 2012, nor among months April to October in
2009.
The 364 villages in this study had an average of 353 families per village, or an estimated total of 108,100
families. The mean year of establishment was 1957, with some as early as the 1700s, and the majority from
the 1940s – 1970s. Fourteen of the villages analysed for present-day flood frequency (not trends) were
established in the 1980s or 1990s, either as recent settling by previously semi-nomadic indigenous groups, or
as part of the government’s transmigration programs. These were excluded from analysis of 30 year trends.
Questions and coding of responses for this study:
Our study selected the village-level interviews from 364 villages within Kalimantan, for which the
interviews with the village head (or other official) gave detailed responses to questions on flooding – i.e.
responses were recorded as full sentences containing quantitative information. Open text responses were then
coded as detailed below.
In all cases, a ‘flood’ was defined as either a riverine flood or flash flood, in which floodwaters covered the
village’s main road or path at the centre of the village. This simple definition was selected to maximise
consistency across villages, and through time for a given village (being less sensitive to changes in village
size than definitions based on flood extents or flooding of houses, and facilitating more consistent recall over
the 30 year period).
If no response was recorded, we treated the value as unknown. We did not assume, for example, that
the absence of a response meant absence of flooding.
(1) Frequency of flooding over the past 5 years. The respondents were asked how frequently they
experienced floods in the 5 years prior to the survey. Frequency was coded on a power-function scale, as f=
approximately N floods per year: NA – no response; 0 – No floods reported (f=0 per year); 1 – Floods rare,
irregular or intervals >2 yr (f=approx. 0.1 per year); 2 – Floods intermediate in frequency (f=approx. 0.5 per
year); 3 – Annual flooding (f=approx. 1); or 4 – More than one flood per year (f=approx. 2).
(2) Trend in flooding frequency over the past 30 years. Responses to the question “has the frequency of
floods declined, stayed the same; or increased over the past 30 years?” were coded as: Decline in frequency
(-1); No change (0); or Increased frequency (1). No response, or no floods reported in 30 years, were coded
as missing data (NA). In open answers, the severity of flooding was either reported to have increased along with
frequency, or no change in severity was mentioned.
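The two coding schemes above can be summarised as simple lookups. This is a sketch for illustration only: the dictionary names and the text labels used for trend answers are hypothetical, and the original coding was applied to open text responses rather than fixed labels. Missing responses stay missing and are never converted to zero.

```python
# Approximate floods per year for each frequency code (item 1).
FREQ_PER_YEAR = {0: 0.0, 1: 0.1, 2: 0.5, 3: 1.0, 4: 2.0}

# Trend codes (item 2); the text labels are hypothetical placeholders.
TREND_CODES = {"decline": -1, "no change": 0, "increase": 1}

def floods_per_year(code):
    """Map a frequency code to approximate floods per year; None stays missing."""
    return None if code is None else FREQ_PER_YEAR[code]

def code_trend(answer):
    """Map a trend answer to -1/0/1; no response or no floods in 30 years gives None."""
    return TREND_CODES.get(answer)

print(floods_per_year(2), floods_per_year(None), code_trend("increase"))
```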
Sample sizes: From the total of 364 village-head interviews, responses were recorded for flood frequency
over the past 5 years in 302 villages, and trends in flooding over the past 30 years for 260 villages (256 of
these 260 villages also reported on recent frequency). These villages are shown in Figure 1 (main article),
along with 2010 landcover, and are distributed widely across the island, in 19 districts in four of
Kalimantan’s five provinces (West, Central, East and North Kalimantan).
1.3. Newspaper reported flood events and estimation of impacts
We obtained flooding reports from the online archives of six news publishers in Kalimantan (Tribun Post,
Kalimantan News, Detik News, Equator, and Radar), covering 16 local or regional newspapers, using the
search keyword 'banjir' (flood), over the 3 yr period 20 April 2010 – 29 April 2013. We georeferenced
settlements affected by newspaper-reported floods based on named localities using Google Earth,
Wikimapia, an online database for geographical names (http://www.geographic.org/geographic_names/), and
named village administrative boundaries for the 2010 Indonesian Census [8]. We assigned spatial co-
ordinates to each record as the centre of the main street of any named village, or, in the case of records only
referenced to subdistrict level, a location within a settlement close to the nearest major river. Therefore, these
co-ordinates are approximate, and likely to be accurate to within hundreds of metres for most records, or up
to 1 km for less-specific named locations, for example within the cities of Samarinda or Banjarmasin. Of the
total of 966 settlements reported flooded, 380 could be georeferenced (Figure S1). Many of the settlements
that could not be georeferenced were from a single flood event in April 2010, affecting 430 villages in South
Kalimantan.
For each flood event, we recorded the number of city areas affected, and either the specific number of
villages (if this was reported), or alternatively, the number of subdistricts if exact village numbers were not
known. Each report gave between one and four numerical estimates of flooding impacts, most often as
numbers of households flooded. If numeric flood impacts were reported directly, we used them in all
calculations. In 12 cases where a word was used rather than a number, this was translated conservatively as
dozens = 50, hundreds = 200, several hundred = 300, thousands = 2000. If numbers were not reported
directly, then estimated values were obtained either from other data within the same record (e.g. N people
affected was estimated from N houses flooded, using multipliers based on average household size for each
Province in 2010 [9], specifically 4.3 West, 3.9 Central, 3.7 South and 4.1 East Kalimantan), or by applying
median and high and low numbers of houses and people affected per event. Specifically, if the number of
houses per village, subdistrict or city was not reported, we applied low and high estimates based on the
distribution of N houses per settlement for each type, taking the median, and the 10th and 80th percentiles of
the distribution of reported values per settlement per event (Table S1). The 80th percentile was selected,
rather than the 90th, to give more conservative ‘high’ estimates, less influenced by the tail of mainly urban
events.
Note that these estimates of people affected were based specifically on flooding of houses, and do not
include people affected via flooding of fields, workplaces or public facilities.
Table S1. Number of houses flooded per settlement, as the median and 10th and 80th percentiles from the
distribution of values reported in newspaper articles on 138 distinct flood events in Kalimantan.
Settlement     Median N houses   10th percentile   80th percentile
Village        120.5             23                300
Subdistrict    200               64.2              396
City           437               40                5000
Figure S1 (separate PDF file) shows the 380 flooded settlements as derived from the newspaper dataset,
along with the set of 380 randomly sampled absences (for sampling of absences, see section 1.5 Newspaper-
reported floods: Presence/Absence modelling).
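The estimation rules in this section can be sketched as follows. The household sizes, word translations and Table S1 values are taken from the text; the function names are illustrative, and scaling the per-settlement values by the number of settlements is an assumption about how unreported house counts were filled in.

```python
# Average household size by province in 2010, as given in the text.
HOUSEHOLD_SIZE = {"West": 4.3, "Central": 3.9, "South": 3.7, "East": 4.1}

# Conservative translations of number words, as given in the text.
WORD_TO_NUMBER = {"dozens": 50, "hundreds": 200,
                  "several hundred": 300, "thousands": 2000}

# Table S1: houses per settlement as (10th percentile, median, 80th percentile).
HOUSES_PER_SETTLEMENT = {
    "village": (23, 120.5, 300),
    "subdistrict": (64.2, 200, 396),
    "city": (40, 437, 5000),
}

def houses_flooded(reported, settlement_type, n_settlements=1):
    """Return (low, median, high) houses flooded for one record."""
    if isinstance(reported, str):          # a word was used instead of a number
        reported = WORD_TO_NUMBER[reported]
    if reported is not None:               # reported values are used directly
        return (reported, reported, reported)
    # Not reported: assumed to scale Table S1 values by the settlement count.
    low, med, high = HOUSES_PER_SETTLEMENT[settlement_type]
    return (low * n_settlements, med * n_settlements, high * n_settlements)

def people_affected(houses, province):
    """Convert (low, median, high) houses to people using household size."""
    return tuple(h * HOUSEHOLD_SIZE[province] for h in houses)

print(people_affected(houses_flooded(None, "village", 2), "South"))
```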
1.4. Boosted Regression Tree Modelling
We developed Boosted Regression Tree (BRT) models separately for each of the three flooding datasets:
(1) flooding frequency from village interviews (i.e. coded frequency of floods over the past five years),
(2) flooding trends from village interviews (presence/absence of an increase in frequency over 30 years), and
(3) newspaper reports of flood events (presence/absence). BRT methods combine many regression trees to
form an ensemble model. Specifically, individual regression trees (each relating a response to predictors
using a ‘tree’ of recursive binary splits) are generated using an adaptive method to iteratively improve model
performance, via a stochastic gradient boosting algorithm [10]. This tree structure naturally allows for
modelling of interactions among predictors.
Each response variable was modelled using either a Gaussian distribution (for flood frequency data from the
village interviews) or Bernoulli distribution (for binary presence/absence data). Flooding trends were
recoded as presence/absence data, because only 0.8% of villages reported a decline in frequency. This gave a
final dataset with values of 0 (‘no change’ in flood frequency over the past 30 years) or 1 (increased
frequency). For the analysis of news reports, 0 denotes absence of a reported flood event, and 1 denotes
presence (see section 1.5 below).
We developed and evaluated all BRTs using five-fold cross validation fitted in R version 3.1.0 [11], with the
function gbm.step (version 2.9) from the 'dismo' package [12]. We performed the cross validation using
five equal subsets of the data, and assessed the optimal number of trees as the number that minimised the
holdout residual deviance (as the optimal compromise between minimising bias and variance). We set tree
depth to 3 or 4 (allowing 3-way or 4-way interactions), and found the results were almost identical for depths
of 3–5. The learning rate was 0.002 (giving a low weight to the contribution of individual trees in each
boosting step), and bag fraction 0.5 (i.e. 50% of the observations were randomly selected for each boosting
step).
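To make the roles of the learning rate and bag fraction concrete, the following is a minimal pure-Python sketch of stochastic gradient boosting for a Gaussian (squared-error) response. It is not the gbm.step implementation used in the study: it uses a single predictor, one-split trees (depth 1, rather than the depths of 3 or 4 reported above), a fixed number of trees instead of cross-validated selection, and hypothetical toy data.

```python
import random

def fit_stump(x, resid):
    """Find the single split on x that minimises squared error of resid."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = x[order[k - 1]], x[order[k]]
        if lo == hi:
            continue  # tied values cannot be separated
        left = [resid[i] for i in order[:k]]
        right = [resid[i] for i in order[k:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, (lo + hi) / 2, lm, rm)
    if best is None:  # every x identical: predict the mean on both sides
        m = sum(resid) / len(resid)
        return float("inf"), m, m
    return best[1:]

def boost(x, y, n_trees, learning_rate=0.002, bag_fraction=0.5, seed=1):
    """Stochastic gradient boosting for squared-error loss with one-split trees."""
    rng = random.Random(seed)
    base = sum(y) / len(y)
    pred = [base] * len(y)
    n_bag = max(2, int(bag_fraction * len(y)))
    stumps = []
    for _ in range(n_trees):
        bag = rng.sample(range(len(y)), n_bag)   # random subset for this step
        resid = [y[i] - pred[i] for i in bag]    # negative gradient of squared error
        thr, lm, rm = fit_stump([x[i] for i in bag], resid)
        stumps.append((thr, lm, rm))
        # Each tree contributes only a small, learning-rate-weighted update.
        pred = [p + learning_rate * (lm if xi <= thr else rm)
                for p, xi in zip(pred, x)]
    return base, stumps, pred

def mse(y, pred):
    return sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y)

# Hypothetical toy data: the response steps from 0 to 1 at x = 20.
x = list(range(40))
y = [0.0 if v < 20 else 1.0 for v in x]
base, stumps, fitted = boost(x, y, n_trees=1000)
print(mse(y, [base] * len(y)), mse(y, fitted))
```

With a learning rate this low, many trees are needed before the fit improves substantially, which is why the number of trees is normally chosen by cross validation rather than fixed in advance.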
For each analysis, we initially fitted models using the full set of spatial predictor variables (detailed in
section 1.6, below). The final models dropped any predictor variables that contributed < 1% of explained
variance. We used log10 transformations for two predictor variables that had extremely skewed distributions
(size of watershed, population density).
We assessed the performance of the models for flood frequency by the correlation between observed and
predicted values (for the training and testing datasets) and cross-validation statistics (across the five subsets).
Performance of the presence/absence models (i.e. models of flooding trends and news-reported floods) was
assessed using confusion matrices (assessing classification error against observed data, using an optimized
threshold of the predicted probability to define presence/absence), and Receiver Operating Characteristic (ROC) curve statistics
[12].
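The evaluation statistics for the presence/absence models can be illustrated with a short sketch. The confusion matrix and the area under the ROC curve are standard quantities; the criterion used here to optimise the threshold (maximising sensitivity plus specificity) is an assumption, since the text does not state which criterion was applied, and the toy data are hypothetical.

```python
def confusion_matrix(observed, prob, threshold):
    """Counts of (true pos, false pos, true neg, false neg) at a threshold."""
    tp = fp = tn = fn = 0
    for obs, p in zip(observed, prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and obs == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif obs == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def best_threshold(observed, prob):
    """Threshold maximising sensitivity + specificity (an assumed criterion)."""
    def score(t):
        tp, fp, tn, fn = confusion_matrix(observed, prob, t)
        return tp / (tp + fn) + tn / (tn + fp)
    return max(sorted(set(prob)), key=score)

def auc(observed, prob):
    """Area under the ROC curve: the chance that a random presence receives a
    higher predicted probability than a random absence (ties count half)."""
    pos = [p for o, p in zip(observed, prob) if o == 1]
    neg = [p for o, p in zip(observed, prob) if o == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical observations and predicted probabilities.
observed = [0, 0, 1, 1, 0, 1]
prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
print(confusion_matrix(observed, prob, 0.5), auc(observed, prob))
```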
Finally, we mapped predictions from each BRT model across all populated areas of Kalimantan, as areas
with an estimated population density of ≥1.2 per km2 in the LandScanTM 2011 dataset. For the village-based
analyses, we omitted predictions for dense urban areas, since the surveys did not cover these areas and we
restrict the scope of prediction to smaller settlements.
Populated areas – identifying locations for sampling and display of BRT predictions:
We identified possibly populated areas as 1 km grid cells with population density >1.2 people per km2
calculated from the LandScanTM 2011 population dataset [13]. Many of Borneo's villages are located in areas
that have an estimated population density of only 1.2 – 4 people per km2, and we selected this minimum
density by comparison with two accurate village datasets in East and West Kalimantan, to minimise
exclusion of small villages, while not claiming to predict perceptions or experiences of flooding for areas
that are not inhabited. For details of the LandScanTM dataset and comparisons with village locations, see
Population density data, in section 1.6, below.
We used ‘possibly populated areas’ for two purposes: i) To generate random samples of settlements for
modelling newspaper-reported flooding; ii) To generate and display model predictions across all populated
areas of Kalimantan. Specifically, we generated BRT predictions for the centre of each of the 1 km cells with
population >1.2 per km2, and these point predictions were then converted to raster maps. (As mentioned
above, mapped predictions from the village-based analyses cover all ‘possibly populated areas’ except dense
urban areas, which are outside the range of prediction from the village interview datasets.)
Map resolution is described as ‘1 km’ resolution for brevity, to represent cells of exactly 930.54 x 930.54 m,
equal to 10 x 10 DEM cells. This does not affect the population density values, which were calculated per
km2, not per cell.
1.5. Newspaper-reported floods: Presence/Absence modelling
Spatial modelling of flood occurrence from the newspaper data requires the reports of 'flood presence' to be
analysed in relation to a set of 'absence' points for flooding over the same period, because newspapers report
flood events, and do not directly report 'non-floods'. This is in contrast to the village interviews, where
information on both the presence and absence of floods was given directly (and further specified by
frequency). For the newspaper analyses, ‘presence’ data consisted of 380 reported floods of settlements over
the period 20 April 2010 – 29 April 2013. The ‘absence’ data consisted of a spatially random sample of 380
settlements in which no floods were reported, within the geographic range of newspaper coverage. This
approach of using an equal number of estimated absence points within the environmental and geographic
range of observed presences, follows recent guidelines for the selection method and number of pseudo-
absences for BRT modelling [14].
Specifically, we randomly selected 380 ‘absence’ points from all populated areas (locations with population
density ≥ 1.2 per km2 based on the LandScanTM 2011 dataset) within the 41 districts covered by these
newspapers, restricted to the report dataset’s ranges of elevation (0-200 m), distance from rivers (0 – 4.4
km), and distance from the coastline (0.4 – 260 km). We further restricted ‘absences’ to exclude points
within 1 km from a reported flood, and to exclude points adjacent to rivers if within 3 km immediately
upstream or 18 km downstream from a reported flood (these distances were based on the distance-decay of
similarity between flood frequencies in the village interviews, see Results section below). These ‘absence’
points represent the absence of a newspaper-reported event over the specified time period. In addition to the
analysis of ‘absence’ points, we performed an alternative analysis using ‘background’ points, i.e. a random
sample of 1200 points, with the same ranges of elevation and distances to rivers and the coast as above, but
without any consideration of the locations of reported flood events (i.e. as a random sampling of
‘background points’, without attempting to identify ‘absences’).
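The selection of 'absence' points can be sketched as a filter followed by a random draw. This illustration covers only the range restrictions and the 1 km exclusion around reported floods; it omits the district restriction and the upstream/downstream exclusion along rivers, which would require the river network. The field names, planar coordinates in km and toy records are hypothetical.

```python
import math
import random

def within_ranges(pt):
    """Range restrictions stated in the text for candidate absence points."""
    return (pt["pop_density"] >= 1.2
            and 0 <= pt["elevation_m"] <= 200
            and 0 <= pt["dist_river_km"] <= 4.4
            and 0.4 <= pt["dist_coast_km"] <= 260)

def sample_absences(candidates, floods, n, min_dist_km=1.0, seed=0):
    """Randomly draw up to n eligible points at least min_dist_km from any flood."""
    def far_from_floods(pt):
        return all(math.hypot(pt["x_km"] - fx, pt["y_km"] - fy) >= min_dist_km
                   for fx, fy in floods)
    eligible = [pt for pt in candidates if within_ranges(pt) and far_from_floods(pt)]
    return random.Random(seed).sample(eligible, min(n, len(eligible)))

def make_point(pid, x, y, pop=5.0, elev=50.0, river=1.0, coast=10.0):
    return {"id": pid, "x_km": x, "y_km": y, "pop_density": pop,
            "elevation_m": elev, "dist_river_km": river, "dist_coast_km": coast}

# Hypothetical candidates and one reported flood at (5, 5).
candidates = [
    make_point("A", 0.0, 0.0),             # eligible
    make_point("B", 5.5, 5.0),             # too close to the reported flood
    make_point("C", 10.0, 10.0, pop=0.5),  # below the population threshold
    make_point("D", 20.0, 20.0, elev=500), # outside the elevation range
    make_point("E", 30.0, 30.0),           # eligible
]
floods = [(5.0, 5.0)]
absences = sample_absences(candidates, floods, n=2)
print(sorted(pt["id"] for pt in absences))
```

A 'background' sample in the sense used above would apply the same range restrictions but skip the distance check against reported floods.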
Newspaper reporting may be biased towards events in more accessible or highly populated areas, since
regional newspapers and the majority of their readers are based in coastal cities. This could result in relative
over-reporting of urban flood events and under-reporting of flood events in remote or low density
settlements. (It is also possible that frequent flooding, as experienced by many of the villages in the interview
surveys, is not reported as news unless an event is unusually large or damaging). Newspapers rarely reported
floods in the most remote areas (>300 km from a city). However, the majority of all reports still came from
small villages or towns (28% from areas with population density <20 per km2, 71% <1000 per km2), often located
>100 km from cities or the coast, indicating broad coverage both geographically and in relation to population
density.
To minimise the possible effects from reporting rather than flood occurrence, we restricted the ‘absence’ and
‘background’ points for these analyses to populated areas of known newspaper coverage, and to the same
ranges of elevation and distances to rivers and the coast, as described above. Furthermore, we found similar
results for analyses using alternative ‘absence’ datasets, or using alternative subsets of the newspaper data, or
when excluding reports from high population densities (see S2 Supplementary Results, Sensitivity to
methods for News-reported flood models).
1.6. Predictor variables for modelling villager perceptions and reported flooding
We calculated 23 land use and land cover variables and 12 other spatial predictor variables for each sampled
settlement, for the purpose of estimating BRT models. We also calculated these variables for populated areas
across Kalimantan, as the area of interest for displaying BRT model predictions. We calculated the predictor
variables from spatial data layers developed using ArcGIS 10.2 [3] and projected using the World Geodetic
System 1984 Universal Transverse Mercator Zone 49 North (EPSG 32649). Spatial data layers covered all of
Kalimantan, and small areas of Sarawak and Sabah, where watersheds extended across borders.
For each settlement or populated area, predictors consisted of:
Watershed-based variables, estimated for the watershed area upstream from each settlement (i.e. for
the ‘relative watershed’ specific to each settlement). These variables consist of the percentage area
covered by each LULC class; mean climate, soil and topographic variables; population density
(persons per km2); and line-density of rivers and roads (length per unit watershed area, km per km2).
Distance variables, giving the Euclidean distance to the coastline and to the nearest instance of
rivers, peatlands, or LULC classes.
Social variables, assigned to each settlement from district-level data (representation of major
religions, main ethnic group).
Land Use and Land Cover (LULC) variables
LULC variables consisted of the area (proportion or absolute area within the focal watershed) and distance
(from each focal point) to each of 20 LULC types.
We developed the LULC map for all of Borneo for the year 2010, using as our primary source (1) the
‘SarVision 2010’ LULC classification by SarVision from 2010 ALOS PALSAR satellite imagery [a
refinement of methods reported by 15]; and incorporating spatial information from five further sources: (2)
Open mining sites identified from Landsat GLS imagery; (3) wetland agriculture identified in SarVision’s
2007 LULC classification; (4) A map of impervious surface cover [16] used in the identification of urban
areas; (5) Maps of oil palm plantations and industrial timber plantations [17]; and (6) Digitized logging roads
to distinguish forest areas as intact, logged, or severely degraded [17].
Details of datasets used to generate the LULC map for Borneo for 2010:
(1) A 2010 LULC classification (50 m resolution) developed by SarVision from 2010 ALOS PALSAR satellite
imagery, using methods similar to those applied to 2007 data by Hoekman et al. [15]. Our final landcover layer either used
the SarVision classes directly (e.g. Mangrove), or grouped classes to form a more general class (e.g. Bare or Sparse created
from: Wet Bare Sparse, Grass Regrowth, Bare Recently Cleared), or new classes were based on reclassification using the
datasets below.
(2) Open mining sites (coal, gold) were manually digitized by visual inspection of >52 Landsat GLS images acquired in
~2010 (http://earthexplorer.usgs.gov/), by David Gaveau and Elis Molidena. Open mining sites were readily identified as
large clear-cut areas with distinctive homogeneous spectral signatures characteristic of bare soil areas, within mining
concessions. Smaller mining areas, predominantly gold mining near rivers, were extracted from the Indonesian Ministry of
Forestry maps of 2009 landcover for Kalimantan (http://webgis.dephut.go.id/, manual digitisation from Landsat 2009
imagery). Forty of these areas were checked against high resolution Ikonos and Quickbird imagery using the QGIS
OpenLayers plugin, and all were visible, with a maximum 200 m displacement between the mapped class and the edges of
mining scars visible in the high resolution imagery.
(3) Wetland agriculture areas identified by SarVision in their classification of 2007 ALOS PALSAR imagery at 50 m
resolution (this class was well identified in 2007, but was not separated from other wetlands in the 2010 classification).
(4) A map of Impervious surface cover for South East Asia (% per 1km2 pixel) was used in concert with the landcover
layers to define ‘urban’ areas. The Impervious cover dataset, developed by Sutton et al. [16], predicts the impervious
surface percentage for a given pixel using a simple two variable multi-variate regression model, where the predictors were
night-time lights (from DMSP-OLS satellite imagery) and a population count from Landscan 2010. We then overlaid the
impervious cover map over high resolution imagery (Google Earth) for ten cities and towns, to decide on a threshold of
minimum 4% impervious cover per 1 km pixel to define areas of broad urban coverage. Within these areas, forest classes
from SarVision 2010 LULC were retained, and the remaining open or sparsely vegetated areas were assigned to a new
‘Urban’ class. (Specifically, two rules were applied: Within areas of impervious surface cover >4%, reclassify as 'Urban
Cover' all examples of Sarvision2010 grasslands 5, shrubland 8, wet bare sparse 9, shrub regrowth 10, grass regrowth 15,
and bare recently cleared 16. Within areas of impervious cover >10%, reclassify as 'Urban Cover' all examples of
Sarvision2010 Woodland 2 and Open forest 13.)
(5) Maps of 2010 coverage of oil palm plantations and industrial timber plantations (mainly Acacia) were used to identify
further areas of these landcovers, in addition to the smaller areas identified in the original SarVision classification. The
maps were developed through onscreen digitising (using ArcGIS 10) of 150 Landsat images from 1990-, 2000-, and 2010-
eras, downloaded from the Global Land Survey database (http://earthexplorer.usgs.gov/) [17].
(6) A vector map of logging roads was used to distinguish intact, logged and degraded forests. Gaveau et al. [17] identified
logging roads (indicating mechanized logging) by manually digitising logging roads visible in Landsat images from 1990-,
2000-, and 2010-eras, downloaded from the Global Land Survey database (http://earthexplorer.usgs.gov/; manual onscreen
digitization performed in ArcGIS 10). The extent of likely logging impacts was estimated by buffering the logging roads by
a distance of 700 m, based on analysis of changes in tree cover with distance from roads using MODIS imagery [17], and
then incorporating any small areas of <100 ha enclosed by the buffer. This extent was then used to split forest areas into
logged and unlogged areas: 1. to split woodland-open forest areas into ‘Severely degraded logged forest’ (if within the
roads network buffer), or ‘Agroforest or regrowth’ (if outside the buffer); 2. to split the ‘closed forest’ class into ‘intact’
and ‘logged forest’, and 3. to split ‘closed peat forest’ into its intact and logged classes.
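The two urban reclassification rules in item (4) above can be written as a small lookup. The class codes and thresholds are those given in the text; the function and constant names are illustrative, and the sketch operates on a single pixel rather than on raster layers.

```python
# SarVision 2010 class codes reclassified where impervious cover exceeds 4%:
# grasslands 5, shrubland 8, wet bare sparse 9, shrub regrowth 10,
# grass regrowth 15, bare recently cleared 16.
URBAN_IF_IMPERVIOUS_GT_4 = {5, 8, 9, 10, 15, 16}

# Class codes reclassified only where impervious cover exceeds 10%:
# woodland 2, open forest 13.
URBAN_IF_IMPERVIOUS_GT_10 = {2, 13}

URBAN = "Urban Cover"

def reclassify_urban(sarvision_class, impervious_pct):
    """Return 'Urban Cover' if either rule applies, else the original class."""
    if impervious_pct > 4 and sarvision_class in URBAN_IF_IMPERVIOUS_GT_4:
        return URBAN
    if impervious_pct > 10 and sarvision_class in URBAN_IF_IMPERVIOUS_GT_10:
        return URBAN
    return sarvision_class

print(reclassify_urban(8, 6.0), reclassify_urban(13, 6.0), reclassify_urban(13, 12.0))
```

Classes not named in either rule, such as closed forest, are retained regardless of impervious cover, which matches the statement that forest classes were kept within urban areas.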
A further three LULC variables consisted of area and distance to protected areas [18], impervious surface
area (% coverage in each 1 km2 cell) from 2010 satellite data [16], and aboveground carbon (Mg ha-1)
estimated from LiDAR remote sensing [19].
We calculated LULC values for each focal point (i.e., each village or 1 x 1 km cell across Kalimantan), as
1) Watershed-based metrics, calculated as the percentage cover of each LULC (or the mean for impervious
cover or aboveground carbon) in the watershed area upstream from a focal point, and 2) Distance metrics, as
the distance from each focal point to the nearest instance of each LULC class (to avoid possible influence of
isolated pixels of a given LULC class, an instance was defined as any patch ≥ 4.04 ha, or 16 pixels, where
patches connected pixels on the diagonal as well as adjacent).
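The patch definition and the percentage-cover metric can be illustrated on a small grid. This is a sketch under simplifying assumptions: the grid stands in for the pixels of one relative watershed, class codes are arbitrary integers, and the toy example uses a minimum patch size of 3 pixels instead of 16 to keep it readable.

```python
def patches_8connected(grid, lulc_class, min_pixels=16):
    """Return the (row, col) pixels of lulc_class that belong to patches of at
    least min_pixels, where diagonal neighbours count as connected."""
    rows, cols = len(grid), len(grid[0])
    seen, keep = set(), set()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != lulc_class or (r, c) in seen:
                continue
            # Flood-fill one patch from this starting pixel.
            patch, stack = [], [(r, c)]
            seen.add((r, c))
            while stack:
                pr, pc = stack.pop()
                patch.append((pr, pc))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = pr + dr, pc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and (nr, nc) not in seen
                                and grid[nr][nc] == lulc_class):
                            seen.add((nr, nc))
                            stack.append((nr, nc))
            if len(patch) >= min_pixels:
                keep.update(patch)
    return keep

def percent_cover(grid, lulc_class):
    """Percentage of grid pixels belonging to lulc_class."""
    cells = [v for row in grid for v in row]
    return 100.0 * cells.count(lulc_class) / len(cells)

# Hypothetical 4 x 4 grid: three pixels of class 1 linked diagonally,
# plus one isolated pixel of class 1.
grid = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
]
print(sorted(patches_8connected(grid, 1, min_pixels=3)), percent_cover(grid, 1))
```

The isolated pixel is dropped from the patch set, so it would not count as the nearest instance of the class in a distance calculation, although it still contributes to percentage cover.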
We derived four topographic variables. We extracted elevation from the DEM at 3 arc-second resolution (i.e.
using the original CGIAR-CSI v4.1 DEM, which was void-filled but not hydrologically corrected). We
calculated river distances as the Euclidean distance from each settlement or cell centre to the nearest instance
of a river and a major river, giving the two variables: ‘Rivers – distance to nearest river’ (based on the ‘All
Rivers’ network of streams with minimum drainage areas of 20 km2), and ‘Rivers - distance to nearest Major
River’ (based on the ‘Major Rivers’ network of rivers with minimum drainage areas of 200 km2). Similarly,
distance to coast gives the Euclidean distance to the nearest point on the coastline.
We estimated road density using a line density function for a roads network that combines government base
maps of public and logging roads from 2003 [20], with primary logging roads digitised from 1973 to 2010
Landsat imagery [17], and additional public roads digitized from 2009 Landsat imagery.
We selected two climate variables with minimal correlations: long term means of precipitation seasonality
and precipitation of the wettest month. We extracted these climate variables from
the WorldClim ver. 1.4 dataset (http://www.worldclim.org/) at 30 arc-second resolution, based on
observations over the period 1950 - 2000. The two variables were only weakly correlated (r=0.28), with the
correlation driven by a small number of extreme values.
Rainfall varies from 1520 – 4820 mm per year across the island, while monthly rainfall varies between 80–
310 mm for the driest month and 160–740 mm for the wettest month each year (averages from monthly data
1950-2000, Hijmans et al. 2005). Especially in eastern Borneo, rainfall shows pronounced seasonality, and
much of the precipitation falls within the wet season, often between November and February. Dry months
with <100 mm rainfall are generally rare, though drought conditions can occur, especially during strong El
Niño events.
We estimated two soil variables: 1. the presence of peat soils, and 2. the “change in soil saturated water
content” (satTheta mm/m) for current vs. undisturbed conditions. Soil maps were based on Indonesian
landsystem maps at 1:250,000 scale, reclassified to 7 FAO soil orders [20]. Peatlands were identified as the
order Histosols, and enlarged to include peatlands identified by Wetlands International [22] (this affected
only a small proportion of areas originally mapped as Entisols). The ‘change in soil saturated water content’
was calculated as the difference in saturated water content between the present condition and undisturbed
condition. Values for saturated water content were estimated for each soil class (7 soil orders) and each of
five levels of soil disturbance (primary vegetation, logged, agriculture, plantations, and degraded, inferred
from the 2010 landcover maps), based on data for Kapuas Hulu, West Kalimantan, in the GenRiver database
[23], for all soil orders except Histosols. For Histosols, estimates were based on a review of literature
values for Kalimantan and Sarawak peatlands [24, 25].
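The ‘change in soil saturated water content’ amounts to a table lookup followed by a subtraction, which the following minimal sketch illustrates. The satTheta values in the table are hypothetical placeholders and are not the values used in the study. Treating ‘primary vegetation’ as the undisturbed condition is also an assumption of the sketch.

```python
# Hypothetical satTheta values (mm/m) keyed by (soil order, disturbance level).
# These numbers are placeholders for illustration only.
SAT_THETA = {
    ("Ultisols", "primary vegetation"): 520.0,
    ("Ultisols", "logged"): 500.0,
    ("Ultisols", "agriculture"): 470.0,
    ("Histosols", "primary vegetation"): 900.0,
    ("Histosols", "plantations"): 820.0,
}

# Assumption: the undisturbed reference is the 'primary vegetation' level.
UNDISTURBED = "primary vegetation"


def change_in_sat_theta(soil_order, disturbance, table=SAT_THETA):
    """Present-condition satTheta minus undisturbed satTheta for a soil order."""
    return table[(soil_order, disturbance)] - table[(soil_order, UNDISTURBED)]


print(change_in_sat_theta("Ultisols", "agriculture"))
```

Negative values indicate a loss of saturated water content relative to the undisturbed condition, and a cell still under primary vegetation returns zero.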
We estimated population density as the number of people per km2 based on LandScan™ 2011 population
counts [13]; values range from 0 to 79,294 people per km2 across Kalimantan. We resampled this dataset to
the 3-sec DEM and then projected it to calculate the local population density (within a 465 m radius from
each focal point), the maximum population density per watershed, and the percentage area of the watershed
formed by ‘possibly populated’ areas with densities ≥1.2 people per km2, and ‘populated areas’ with
densities ≥10 people per km2.
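The population metrics can be sketched as follows, assuming the density raster has been reduced to lists of cell values. Averaging the cells whose centres fall within the radius is one plausible reading of ‘local population density’ and is an assumption here, as are the function names and sample values. The 465 m radius and the two density thresholds come from the text above.

```python
from math import hypot


def local_density(cells, focal_xy, radius_m=465.0):
    """Mean density of cells whose centres lie within radius_m of the focal
    point. `cells` is a list of (x_m, y_m, density) tuples."""
    fx, fy = focal_xy
    near = [d for x, y, d in cells if hypot(x - fx, y - fy) <= radius_m]
    return sum(near) / len(near) if near else None


def watershed_population_metrics(densities, possibly=1.2, populated=10.0):
    """Maximum density and percentage of watershed cells above each threshold."""
    n = len(densities)
    return {
        "max_density": max(densities),
        "pct_possibly_populated": 100.0 * sum(d >= possibly for d in densities) / n,
        "pct_populated": 100.0 * sum(d >= populated for d in densities) / n,
    }


# Hypothetical densities (people per km2) for the cells of one watershed.
metrics = watershed_population_metrics([0, 1.2, 5, 10, 50])
```

Because the percentages are computed on equal-area cells, the share of cells above a threshold equals the share of watershed area above it.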
Because comprehensive and accurate datasets on settlement locations currently exist only for small areas of
Kalimantan, we used the LandScan™ 2011 dataset as the most accurate source of spatial population data that
covers all of Kalimantan and gives non-zero densities for most of the known locations of small villages. This
dataset estimates the ‘ambient’ population distribution as an average over a 24-hour period (i.e. integrating
diurnal movements, in contrast to estimates based on residence locations alone). LandScan™ uses
dasymetric mapping to distribute population counts within census areas, based on relationships with multiple
spatial predictors including roads and land cover. The prediction of higher populations in areas with low
vegetation cover, however, leads to some artifacts in areas where the vegetation is naturally sparse. To
reduce this effect, we applied zero values to areas of karst limestone mountains and ultra-basic mountains
mapped in the RePPProT landsystems dataset [26].
Other datasets that cover Kalimantan misplace or entirely omit thousands of smaller towns and villages,
often showing only larger, easily identifiable settlements. For example, WorldPop2010 estimates zero
population counts for all areas of Kalimantan outside the district capitals and other major urban areas [27].
The Indonesian 2010 national census gives population counts at village level [8], but the only spatial
information provided is the set of village administration boundaries (mean area 16 x 16 km), which may contain
multiple settlements, are sometimes inaccurate, do not indicate any variation in population within that area,
and do not align with watersheds.
We used detailed settlement datasets from two districts for comparison with the LandScan™ 2011 data, and
to estimate threshold values for identifying ‘possibly populated’ areas throughout Kalimantan, balancing
omission and commission errors: 1) Kapuas Hulu District, West Kalimantan [28]; 2) Berau District, East
Kalimantan: The Nature Conservancy (Berau office) reconciled village point locations from the agencies
BAPPEDA (Regional Planning) and BPN (National Land Agency), updated from field surveys in 2005.
We selected a threshold population density ≥1.2 per km2 for ‘possibly populated areas’, as the minimum able
to capture 90% of known village locations, without selecting extensive areas of forests or wetlands with no
known settlements. These ‘possibly’ populated cells cover 52% of Kalimantan’s land area. More densely
populated areas were identified by a threshold of ≥10 per km2, and cover 15.2% of Kalimantan’s land area.
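The trade-off behind the threshold choice can be explored with a short sketch that reports, for each candidate threshold, the fraction of known villages captured (omission error) and the fraction of all cells selected (commission error). The selection rule shown here, taking the highest candidate that still captures the target share of villages, is one possible formalisation and an assumption of the sketch. The sample densities are invented for illustration.

```python
def evaluate_threshold(threshold, village_densities, cell_densities):
    """Return (fraction of villages captured, fraction of cells selected)."""
    captured = sum(d >= threshold for d in village_densities) / len(village_densities)
    selected = sum(d >= threshold for d in cell_densities) / len(cell_densities)
    return captured, selected


def pick_threshold(candidates, village_densities, cell_densities, min_capture=0.90):
    """Highest candidate threshold that still captures at least `min_capture`
    of the known villages; None if no candidate qualifies."""
    ok = [t for t in candidates
          if evaluate_threshold(t, village_densities, cell_densities)[0] >= min_capture]
    return max(ok) if ok else None


# Hypothetical densities (people per km2) at known villages and across all cells.
villages = [1.5, 2, 3, 5, 8, 12, 20, 40, 0.5, 1.3]
all_cells = [0, 0, 0, 0.2, 1.5, 3, 12, 0, 0.1, 30]
best = pick_threshold([0.5, 1.2, 2.0, 10.0], villages, all_cells)
```

Raising the threshold always selects less area but captures fewer villages, so the highest threshold meeting the capture target is also the one that selects the least unpopulated land.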
To represent ethnic groups, we digitized a map of Borneo showing the main ethnic group in each location
[29], where the broad groups consist of: Central-Northern groups; Dusun and North-Eastern groups; Iban and
Ibanic groups; Kayan and Kenyah groups; Land Dayak and western groups; Malay groups; Ngaju and Barito
groups; Nomadic groups; and an unknown category.
Representation of major religions was included because religion was a strong predictor of forest use and
perceptions held by individuals in a concurrent study based on the individual interviews in the wider
interview survey [5, 6]. The percentages of the population registered as Christian and as Muslim were
obtained for each district in Kalimantan from Government statistical agencies, either from online sources
(for Central Kalimantan, http://kalteng.bps.go.id/GIS.html) or published documents dated 2009 – 2011 [30–
32].
Correlations among predictor variables
We assessed correlations among predictor variables using values from the 195,739 populated points across
Kalimantan. The predictors showed generally low correlations with one another, except for moderate
correlations between cover of a given land use at the subwatershed scale versus the relative watershed scale
(range r=0.38 for industrial timber to r=0.67 for intact forest cover, for points where the relative watershed
was larger than the local subwatershed). We included both scales in the initial models, and retained one or
the other if significant.
For all other variables, Pearson correlation coefficients were low (r< 0.32), or were moderate (r<0.64) but
still showed large ranges of variation in each variable, at any given value of the other. For example, elevation
is weakly correlated with slope and distance from the coast (because elevation and slopes are low closer to
the coast), but varies over its full range 0 – 2000 m for points >30 km from the coast. Elevation is also
weakly correlated with aboveground woody carbon (r=0.37), because low carbon values only occur at low
elevations, but carbon values span their full range at any other elevation.
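A pairwise screen of this kind can be sketched as below, computing Pearson's r for every pair of predictors and listing the pairs whose absolute correlation reaches a cutoff. The predictor names, sample values and the choice of 0.64 as a flagging cutoff are assumptions for illustration. The original analysis was presumably run in statistical software rather than with this code.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_correlated(predictors, cutoff=0.64):
    """List (name_a, name_b, r) for predictor pairs with |r| >= cutoff."""
    flagged = []
    for a, b in combinations(sorted(predictors), 2):
        r = pearson_r(predictors[a], predictors[b])
        if abs(r) >= cutoff:
            flagged.append((a, b, round(r, 2)))
    return flagged


# Hypothetical predictor values at four focal points.
sample = {
    "elevation": [1, 2, 3, 4],
    "slope": [2, 4, 6, 8],
    "peat": [1, -1, -1, 1],
}
pairs = flag_correlated(sample)
```

A coefficient alone does not show whether one variable still spans its full range at a given value of the other, so a screen like this would need to be paired with scatterplots to support the kind of reasoning given above for elevation and carbon.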
Among LULC classes, we observed moderate positive correlations among the open landcovers of wet and
dry agriculture, grasslands and shrublands (range of correlations for distance or cover r=0.36 to 0.64), and
weak negative correlations between open classes and forests (logged or intact, r=-0.22 to -0.52). Oil palm
area was weakly negatively correlated with logged or intact forests (r=-0.30 to -0.42). Mean carbon
quantities showed the expected negative correlations with the cover of open LULC classes, ranging in
strength from oil palm (r=-0.52) to shrub and grasslands (-0.69, -0.70), and % area populated (-0.67), and
positive correlations with logged (0.66) and intact (0.80) forests.
Impervious cover showed a positive correlation with % urban cover (r=0.88, due to association of highest
values), but appeared to be a more sensitive metric, since watersheds with 0% urban cover could have up to
3% impervious cover. A few extremely high values drove correlations between impervious cover and mean
population density (0.62), and between mean population density and urban cover (0.77).
Distance to all rivers sets the minimum possible value for the distance to major rivers; however, there
remained large variation in both variables above this logical floor. Similarly, population density metrics showed only
weak correlations beyond the logical constraint that maximum density (in the watershed) cannot be lower
than local density.
The two climate variables (precipitation seasonality and precipitation of the wettest month) were weakly
correlated (r=0.28), driven by a few extremely high values, and showed only weak correlations with other
predictors (a slight tendency for watersheds with higher seasonality to have higher population densities and
agricultural covers, r <0.54).
1.7. Review of event records and hazard or risk assessments by the Indonesian Government
We searched government documents, academic and grey literature, and online databases and portals in
English and Indonesian, seeking information on flooding (events, monitoring, and response), river
monitoring, and assessments of hazards or risks for Kalimantan. The publicly accessible online Disaster Loss
Database contains data on disasters for a range of time periods and at varying levels of detail across Provinces
(DiBi – Data dan informasi Bencana Indonesia, http://dibi.bnpb.go.id/DesInventar/simple_data.jsp, managed
by the Indonesian National Disaster Management Agency, BNPB). For Kalimantan, flood event records
cover the period from 16 April 1998 to the present. The DiBi database was queried on 9 June 2015, for event
records over the same period and extent as our newspaper search (for the period 20 April, 2010 – 29 April,
2013, in West Kalimantan, East and North Kalimantan, South Kalimantan, and 3 districts in Central
Kalimantan).
S1 References
[1] Jarvis A, Reuter HI, Nelson A and Guevara E 2008 Hole-filled seamless SRTM data V4
[2] Esri Water Resources Team 2013 Arc Hydro Tools Overview - for ArcGIS 10.x (New York: ESRI)
[3] ESRI 2013 ArcGIS Desktop: Release 10.2
[4] Meijaard E, Mengersen K, Buchori D, Nurcahyo A, Ancrenaz M, Wich S, Atmoko SSU, Tjiu A,
Prasetyo D, Nardiyono et al 2011 Why don’t we ask? A complementary method for assessing the
status of great apes PLoS One 6 e18008. doi:10.1371/journal.pone.0018008
[5] Meijaard E, Abram NK, Wells JA, Pellier A-S, Ancrenaz M, Gaveau DLA, Runting RK and
Mengersen K 2013 People’s perceptions on the importance of forests on Borneo PLoS One 8 e73008
[6] Abram NK, Meijaard E, Ancrenaz M, Runting RK, Wells JA, Gaveau DLA, Pellier A-S and
Mengersen K 2014 Spatially explicit perceptions of ecosystem services and land cover change in
forested regions of Borneo Ecosyst. Serv. 7 116–127
[7] Abram NK, Meijaard E, Wells JA, Ancrenaz M, Pellier A-S, Runting RK, Gaveau D, Wich S,
Nardiyono, Tjiu A et al 2015 Mapping perceptions of species’ threats and population trends to inform
conservation efforts: the Bornean orangutan case study Divers. Distrib. 21 487–499
[8] BPS 2011 Sensus Penduduk 2010 (Jakarta: Badan Pusat Statistik BPS – Statistics Indonesia)
[9] BPS 2011 Trends of the Selected Socio-Economic Indicators of Indonesia – Perkembangan Beberapa
Indikator Utama Sosial-Ekonomi Indonesia, Nov 2011 Katalog BPS: 3101015 (Jakarta, Indonesia:
Badan Pusat Statistik BPS – Statistics Indonesia)
[10] Elith J, Leathwick JR and Hastie T 2008 A working guide to boosted regression trees J. Anim. Ecol.
77 802–813
[11] R Core Team 2014 R: A Language and Environment for Statistical Computing (Vienna, Austria: R
Foundation for Statistical Computing)
[12] Hijmans RJ, Phillips S, Leathwick JR and Elith J 2013 Dismo v 08-11 (CRAN - Comprehensive R
Archive Network)
[13] Bright EA, Coleman PR, Rose AN and Urban ML 2012 LandScan 2011 (Oak Ridge, TN: Oak Ridge
National Laboratory SE)
[14] Barbet-Massin M, Jiguet F, Albert CH and Thuiller W 2012 Selecting pseudo-absences for species
distribution models: how, where and how many? Methods Ecol. Evol. 3 327–338
[15] Hoekman DH, Vissers MAM and Wielaard N 2010 PALSAR Wide-Area Mapping of Borneo:
Methodology and Map Validation IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 3 605–617
[16] Sutton P, Elvidge C, Tuttle B, Ziskin D, Baugh K and Ghosh T 2010 Impervious Surface Area of
South East Asia (Boulder, Colorado USA: National Geophysical Data Centre, National Oceanic and
Atmospheric Administration)
[17] Gaveau D, Sloan S, Molidena E, Husnayaen H, Sheil D, Abram N, Ancrenaz M, Nasi R, Wielaard N
and Meijaard E 2014 Four decades of forest persistence, clearance and logging on Borneo PLoS One
9 e101654. doi:10.1371/journal.pone.0101654
[18] Wich SA, Gaveau D, Abram N, Ancrenaz M, Baccini A, Brend S, Curran LM, Delgado RA, Erman
A, Fredriksson GM et al 2012 Understanding the Impacts of Land-Use Policies on a Threatened
Species: Is There a Future for the Bornean Orang-utan? PLoS One 7 e49142.
doi:10.1371/journal.pone.0049142
[19] Baccini A, Goetz SJ, Walker WS, Laporte NT, Sun M, Sulla-Menashe D, Hackler J, Beck PSA,
Dubayah R, Friedl MA et al 2012 Estimated carbon dioxide emissions from tropical deforestation
improved by carbon-density maps Nat. Clim. Chang. 2 182–185
[20] Gingold B, Rosenbarger A, Muliastra YIKD, Stolle F, Sudana IM, Manessa MDM, Murdimanto A,
Tiangga SB, Madusari CC and Douard P 2012 How to Identify Degraded Land for Sustainable Palm
Oil in Indonesia. Working Paper (Washington DC: World Resources Institute and Sekala)
[21] Hijmans RJ, Cameron SE, Parra JL, Jones PG and Jarvis A 2005 Very high resolution interpolated
climate surfaces for global land areas Int. J. Climatol. 25 1965–1978
[22] WRI and Sekala 2012 Peatland Depth in Kalimantan, Spatial Dataset Digitized from Maps at
1:250,000 Scale by Wetlands International (World Resources Institute and Sekala)
[23] van Noordwijk M, Widodo RH, Farida A, Suyamto D, Lusiana B, Tanika L and Khasanah N 2011
GenRiver and FlowPer: Generic River Flow Persistence Models. User Manual Version 2.0 (Bogor,
Indonesia: World Agroforestry Centre (ICRAF) Southeast Asia Regional Program)
[24] Anshari GZ, Afifudin M, Nuriman M, Gusmayanti E, Arianie L, Susana R, Nusantara RW,
Sugardjito J and Rafiastanto A 2010 Drainage and land use impacts on changes in selected peat
properties and peat degradation in West Kalimantan Province, Indonesia Biogeosciences 7 3403–
3419
[25] Shimada S, Takahashi H, Haraguchi A and Kaneko M 2001 The carbon content characteristics of
tropical peats in Central Kalimantan, Indonesia: Estimating their spatial variability and density
Biogeochemistry 53 249–267
[26] HCV Consortium for Indonesia 2009 Guidelines for the Identification of High Conservation Values
in Indonesia. English Version
[27] Gaughan AE, Stevens FR, Linard C, Jia P and Tatem AJ 2013 High resolution population distribution
maps for Southeast Asia in 2010 and 2015. PLoS One 8 e55882
[28] Liswanti N 2013 Engaging multiple stakeholders in collaborative land use planning and ecosystem
based management: The use of foresighting approach 6th Annu. Int. Ecosyst. Serv. Partnersh. Conf.
[29] Sellato B 1989 Naga Dan Burung Enggang. Hornbill and Dragon. Kalimantan, Sarawak, Sabah,
Brunei (Jakarta: Elf Aquitaine)
[30] BPS-KalSel 2009 Kalimantan Selatan Dalam Angka. Kalimantan Selatan in Figures 2009
(Banjarmasin, Indonesia: Badan Pusat Statistik Provinsi Kalimantan Selatan)
[31] BPS-KalBar 2011 Kalimantan Barat Dalam Angka. Kalimantan Barat in Figures 2011 (Pontianak,
Indonesia: Badan Pusat Statistik Provinsi Kalimantan Barat)
[32] BPS-KalTim 2011 Kalimantan Timur Dalam Angka. Kalimantan Timur in Figures 2011 (Samarinda,
Indonesia: Badan Pusat Statistik Provinsi Kalimantan Timur)