10980_2017_520_moesm1_esm.docx (Springer, 10.1007...)
Appendix 1.
Brunei Model Description
The ten most important variables in the calibrated random forest Brunei model, based on Model
Improvement Ratio, were Aggregation Index at 10km radius, Edge Density at 10km radius,
Proportion of peatswamp forest at 40km radius, Proportion of water at 50km radius, Proportion
of plantation or regrowth at 1km radius, Proportion of lowland forest at 40km radius, Shannon
Diversity at 10km radius, Topographical Roughness at 40km radius, Elevation, and Proportion of
lowland mosaic at 40km radius (Figure A1).
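Model Improvement Ratio (MIR) rescales each variable's raw random forest importance by the largest importance in the model, so the top-ranked variable scores 1.0 and every other variable is expressed relative to it. The sketch below assumes that common definition (raw importance divided by the maximum). The variable names and importance values are hypothetical and are not the values behind Figure A1.

```python
def model_improvement_ratio(importance: dict[str, float]) -> dict[str, float]:
    """Scale raw importances by the maximum so the top variable scores 1.0."""
    top = max(importance.values())
    if top <= 0:
        raise ValueError("maximum importance must be positive")
    return {name: value / top for name, value in importance.items()}


def rank_by_mir(importance: dict[str, float]) -> list[tuple[str, float]]:
    """Return (variable, MIR) pairs sorted from most to least important."""
    mir = model_improvement_ratio(importance)
    return sorted(mir.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw importances (e.g. mean decrease in accuracy); illustrative only.
raw = {"var_a": 40.0, "var_b": 10.0, "var_c": 25.0}
for name, score in rank_by_mir(raw):
    print(f"{name}: {score:.3f}")
```

Because MIR is a simple rescaling, it changes neither the ranking nor the relative spacing of the variables. It only makes importance plots comparable across models.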
Of the ten most influential variables in the prediction of forest loss in Brunei between 2000 and
2010, as judged by Model Improvement Ratio, five had positive monotonic relationships, with the
probability of forest loss increasing as the value of the variable increased (Edge Density at
10km radius, Proportion of peatswamp forest at 30km radius, Proportion of water at 50km radius,
Proportion of plantation or regrowth at 1km radius, Shannon Diversity Index at 10km radius;
Figure A2). Two variables, including the most influential variable (Aggregation Index at 10km
radius), had negative monotonic relationships with deforestation risk, such that deforestation
risk decreases as the value of these variables increases. Two variables had unimodal
relationships, with the frequency of deforestation highest at intermediate values (Proportion of
lower montane forest at 40km radius and Distance to Population Centre).
We produced a visualization of the pattern of predicted forest loss probability across Brunei
(Figure A3), with zoomed-in views of two areas showing the pattern of observed loss in relation
to the predicted probability of loss (Figure A4). These maps show close agreement between the
predicted probability of loss and the observed pattern of forest loss and persistence in Brunei.
The ten most important variables in the calibrated logistic regression Brunei model, based on
variable p-value, were Proportion of plantation or regrowth at 1km radius, Proportion of water
at 50km radius, Proportion of peatswamp forest at 30km radius, Proportion of lowland forest at
40km radius, Proportion of upper montane forest at 50km radius, Edge Density at 10km radius,
Proportion of lowland mosaic at 40km radius, Focal Mean Population Density at 100km radius,
Topographical Roughness at 40km radius, and Aggregation Index at 10km radius (Table A1).
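The p-values used to rank the logistic regression terms have the form of Wald tests. The statistic is z = coefficient / standard error, and the two-sided p-value comes from the standard normal distribution: p = 2(1 - Φ(|z|)) = erfc(|z|/√2). Ranking by p-value is therefore the same as ranking by |z|. The sketch below assumes a standard maximum-likelihood fit, which is what the "z value" and "< 2e-16" formatting of Table A1 suggests. It recomputes p from three of the z values reported in Table A1.

```python
import math


def wald_z(coefficient: float, std_error: float) -> float:
    """Wald statistic: estimate divided by its standard error."""
    return coefficient / std_error


def wald_p_value(z: float) -> float:
    """Two-sided p-value for a Wald z statistic: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


# z values as reported in Table A1 (Brunei logistic regression).
table_a1_z = {
    "PLAND_WATER_50k": 3.402,
    "PLAND_PEATSWAMP_FOREST_30k": -3.122,
    "PLAND_UPPERMONTANE_FOREST_50k": 1.991,
}

# Rank terms from smallest to largest p-value (equivalently, largest |z| first).
ranked = sorted(table_a1_z, key=lambda name: wald_p_value(table_a1_z[name]))
for name in ranked:
    print(f"{name}: p = {wald_p_value(table_a1_z[name]):.6f}")
```

The recomputed p-values agree with the P-Value column of Table A1 to the precision of the rounded z values. Recomputing z from the rounded coefficients and standard errors (e.g. `wald_z(8.84e-2, 2.60e-2)`) reproduces the reported z only approximately.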
Maps of the probability of forest loss across Brunei predicted by random forest and by logistic
regression are shown in Figure A5. The two models differ in that the random forest model has
much sharper spatial transitions from areas of high to low predicted forest loss probability,
and a more complex, fine-scale pattern of predicted probability, than the smoother prediction
produced by logistic regression.
Malaysian Borneo Model Description
The ten most important variables, based on Model Improvement Ratio, in the calibrated
Malaysian Borneo random forest model were Proportion of lowland mosaic at 20km radius,
Proportion of plantation or regrowth at 30km radius, Edge Density at 20km radius, Elevation,
Proportion of montane mosaic at 50km radius, Patch Density at 30km radius, Proportion of
lower montane forest at 50km radius, Shannon Diversity Index at 40km radius, Focal Mean
Population Density at 100km radius and Topographical Roughness at 40km radius (Figure A6).
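Several of these predictors are class proportions ("Proportion of ...", PLAND) and Shannon diversity values summarised within a window of the stated radius around each cell. The sketch below uses the standard FRAGSTATS-style definitions, assuming the land-cover labels have already been extracted from the window:

- PLAND for class i = 100 × (cells of class i) / (cells in window)
- SHDI = -Σ p_i ln p_i over the classes present

The class labels and window contents are hypothetical.

```python
import math
from collections import Counter


def pland(labels: list[str]) -> dict[str, float]:
    """Percentage of the window occupied by each land-cover class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: 100.0 * n / total for cls, n in counts.items()}


def shannon_diversity(labels: list[str]) -> float:
    """Shannon's diversity index: -sum(p_i * ln p_i) over classes present."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Hypothetical window contents (land-cover class of each cell).
window = ["lowland_forest"] * 6 + ["plantation_regrowth"] * 3 + ["water"]
print(pland(window))
print(round(shannon_diversity(window), 4))
```

SHDI is 0 for a single-class window and increases as more classes are present in more even proportions. This is why it can track landscape heterogeneity near active forest frontiers.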
Of the ten most influential variables in the calibrated random forest model predicting
deforested vs. not-deforested cells in Malaysian Borneo between 2000 and 2010, six had
positive monotonic relationships, in which the frequency of forest loss increased as the value of
the variable increased (Proportion of lowland mosaic at 20km radius, Proportion of plantation or
regrowth at 30km radius, Edge Density at 20km radius, Patch Density at 30km radius, Shannon
Diversity at 40km radius, and Focal Mean Population Density at 100km radius). All of these
showed strongly nonlinear relationships, with a rapid initial rise followed by asymptotic
flattening (Figure A7). The other four of the ten most influential variables showed negative
monotonic relationships, in which the frequency of forest loss between 2000 and 2010 declined
as the value of the variable increased (Elevation, Proportion of montane mosaic at 50km radius,
Proportion of lower montane forest at 50km radius, and Topographical Roughness at 40km
radius; Figure A7). Most of these appear to follow negative exponential shapes, with a rapid
decline at low values of the predictor followed by flattening as the predictor increases.
We produced a visualization of the pattern of predicted forest loss probability across Malaysian
Borneo (Figure A8), with zoomed-in views of four areas showing the pattern of observed loss in
relation to the predicted probability of loss (Figure A9). These maps show close agreement
between the predicted probability of loss and the observed pattern of forest loss and
persistence in Malaysian Borneo.
The ten most important variables in the calibrated logistic regression Malaysian Borneo model,
based on variable p-value, were Edge Density at a 20km focal radius, Elevation, Proportion of
lowland open at a 50km focal radius, Proportion of water at a 40km focal radius, Distance to
Population Centre, Proportion of montane mosaic at a 50km focal radius, Proportion of lower
montane forest at a 50km focal radius, Proportion of lowland forest at a 40km focal radius,
Topographical Roughness at a 40km focal radius, and Proportion of plantation or regrowth at a
30km focal radius (Table A2).
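The "focal radius" terms refer to values summarised within a moving window centred on each cell, as in Focal Mean Population Density. Below is a minimal sketch of a circular focal mean. It assumes square cells of known size and a window that includes every cell whose centre lies within the radius. It also assumes that cells beyond the grid edge are simply ignored, because the edge handling of the original GIS workflow is not described here. The grid values are hypothetical.

```python
def focal_mean(grid: list[list[float]], cell_size: float, radius: float) -> list[list[float]]:
    """Mean of all cells whose centres lie within `radius` of each cell's centre.

    `cell_size` and `radius` share the same unit (e.g. km). Cells beyond the
    grid edge are ignored, so windows near the border average fewer cells.
    """
    rows, cols = len(grid), len(grid[0])
    reach = int(radius // cell_size)  # largest row/column offset to examine
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total, count = 0.0, 0
            for dr in range(-reach, reach + 1):
                for dc in range(-reach, reach + 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    # Keep offsets whose centre-to-centre distance is within the radius.
                    if inside and (dr * dr + dc * dc) * cell_size ** 2 <= radius ** 2:
                        total += grid[rr][cc]
                        count += 1
            out[r][c] = total / count
    return out


# Hypothetical 3 x 3 population-density grid with 1 km cells and a 1 km radius.
density = [[0.0, 0.0, 0.0],
           [0.0, 9.0, 0.0],
           [0.0, 0.0, 0.0]]
smoothed = focal_mean(density, cell_size=1.0, radius=1.0)
```

This brute-force loop is O(cells × window size). For radii of tens of kilometres on a regional raster, a convolution-based or GIS focal-statistics tool would be the practical choice.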
Maps of the probability of forest loss across Malaysian Borneo predicted by random forest and
by logistic regression are shown in Figure A10. The two models differ in that the random forest
model has much sharper spatial transitions from areas of high to low predicted forest loss
probability, and a more complex, fine-scale pattern of predicted probability, than the smoother
prediction produced by logistic regression (Figure A10).
Kalimantan Model Description
The ten most important variables, based on Model Improvement Ratio, for Kalimantan were
Elevation, Patch Density at 40km radius, Proportion of lowland mosaic at 50km radius,
Proportion of lower montane forest at 20km radius, Proportion of plantation or regrowth at
50km radius, Edge Density at 10km radius, Focal Mean Population Density at 100km radius,
Topographical Roughness at 50km radius, Proportion of lowland open at 50km radius and
Shannon’s Diversity Index at 40km radius (Figure A11).
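Edge Density and Patch Density are landscape metrics. In FRAGSTATS, Edge Density is total edge length per hectare (m/ha) and Patch Density is the number of patches per 100 ha. The sketch below computes both on a small categorical grid under three assumptions:

- cells are squares `cell_size_m` metres on a side;
- edge is counted only between adjacent cells of different classes, so the landscape boundary is not counted;
- patches are defined by the 4-cell neighbour rule.

FRAGSTATS offers other boundary and neighbour options, and the settings used in this study are not stated here. The landscape is hypothetical.

```python
def edge_density(grid: list[list[str]], cell_size_m: float) -> float:
    """Edge length (m) between unlike neighbouring cells per hectare of landscape."""
    rows, cols = len(grid), len(grid[0])
    edges = 0
    for r in range(rows):
        for c in range(cols):
            # Count each shared cell edge once by looking right and down only.
            if c + 1 < cols and grid[r][c] != grid[r][c + 1]:
                edges += 1
            if r + 1 < rows and grid[r][c] != grid[r + 1][c]:
                edges += 1
    area_m2 = rows * cols * cell_size_m ** 2
    return edges * cell_size_m / area_m2 * 10_000


def patch_density(grid: list[list[str]], cell_size_m: float) -> float:
    """Number of 4-connected same-class patches per 100 ha of landscape."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    patches = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            patches += 1
            stack = [(r, c)]  # flood-fill the patch containing (r, c)
            seen[r][c] = True
            while stack:
                pr, pc = stack.pop()
                for nr, nc in ((pr + 1, pc), (pr - 1, pc), (pr, pc + 1), (pr, pc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols and not seen[nr][nc]
                            and grid[nr][nc] == grid[pr][pc]):
                        seen[nr][nc] = True
                        stack.append((nr, nc))
    area_ha = rows * cols * cell_size_m ** 2 / 10_000
    return patches / area_ha * 100


# Hypothetical 3 x 3 landscape of 100 m cells (F = forest, P = plantation).
landscape = [["F", "F", "P"],
             ["F", "P", "P"],
             ["F", "F", "F"]]
print(edge_density(landscape, cell_size_m=100.0))
print(patch_density(landscape, cell_size_m=100.0))
```

Both metrics rise as a landscape becomes more fragmented. In a focal-radius version, each would be computed within the window around every cell rather than over the whole grid.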
Of the ten most important variables in the calibrated Kalimantan random forest model predicting
deforested vs. not-deforested cells between 2000 and 2010, seven had positive monotonic
relationships (Patch Density at 40km radius, Proportion of lowland mosaic at 50km radius,
Proportion of plantation or regrowth at 50km radius, Edge Density at 10km radius, Focal Mean
Population Density at 100km radius, Proportion of lowland open at 50km radius, and Shannon
Diversity at 40km radius; Figure A12). As in the Malaysian Borneo model, many of these were
strongly nonlinear. The remaining three top variables all had negative monotonic relationships
(Elevation, Proportion of lower montane forest at 20km radius, and Topographical Roughness at
50km radius). As in the Malaysian Borneo model, these were strongly nonlinear, with most
showing a negative exponential shape (Figure A12).
We produced a visualization of the pattern of predicted forest loss probability across Kalimantan
(Figure A13), with zoomed-in views of four areas showing the pattern of observed loss in relation
to the predicted probability of loss (Figure A14). These maps show close agreement between the
predicted probability of loss and the observed pattern of forest loss and persistence in
Kalimantan.
The ten most important variables in the calibrated logistic regression Kalimantan model, based
on variable p-value, were Edge Density at a 10km radius, Elevation, Patch Density at a 40km
radius, Proportion of peatswamp forest at a 40km radius, Shannon Diversity Index (SHDI) at a
40km radius, Slope Position at a 40km radius, Proportion of upper montane forest at a 40km
radius, Proportion of lowland forest at a 1km radius, Topographical Roughness at a 50km radius,
and Proportion of plantation or regrowth at a 50km radius (Table A3).
Maps of the probability of forest loss across Kalimantan predicted by random forest and by
logistic regression are shown in Figure A15. The two models differ in that the random forest
model has much sharper spatial transitions from areas of high to low predicted forest loss
probability, and a more complex, fine-scale pattern of predicted probability, than the smoother
prediction produced by logistic regression.
Model Comparisons
Random Forest vs. Logistic Regression
We produced maps of several of the inset areas from Figures A8 and A13 for Malaysian Borneo
and Kalimantan to display visually the differences between the random forest and logistic
regression models in the pattern of predicted forest loss risk relative to actual forest loss
between 2000 and 2010. In general, the random forest models appear to distinguish more
precisely in space between areas where forest loss did and did not occur, whereas the logistic
regression models tend to make smoother, less precise predictions (Figures A16-A19).
Random Forest with Landscape Metrics vs. Random Forest without Landscape Metrics
We produced maps of several of the inset areas from figures A8 and A13 for Malaysian Borneo
and Kalimantan visually displaying differences in predicted probability between random forest
models including landscape metrics and random forest models excluding landscape metrics
(Figures A20-A23). In general, these figures show that including landscape metrics improved
prediction of forest loss risk in a number of locations across the study area, with a higher
congruence of observed patterns of forest loss and predicted risk of forest loss.
Table A1. Coefficients and p-values for the logistic regression predicting forest loss with the same predictor variables and same training data set as the calibrated Brunei random forest model.
Variable                       Coefficient  Standard Error   z value   P-value
PLAND_PLANTATION/REGROWTH_1k      5.58E-02        6.34E-03     8.805   < 2e-16
PLAND_WATER_50k                   8.84E-02        2.60E-02     3.402   0.000669
PLAND_PEATSWAMP_FOREST_30k       -4.92E-01        1.58E-01    -3.122   0.001797
PLAND_LOWLAND_FOREST_40k         -1.77E-01        6.44E-02    -2.739   0.006155
PLAND_UPPERMONTANE_FOREST_50k     3.90E-01        1.96E-01     1.991   0.046525
EDGE_DENSITY_10k                  5.78E-01        3.47E-01     1.668   0.095411
PLAND_LOWLAND_MOSAIC_40k         -1.70E-01        1.17E-01    -1.458   0.144886
(Intercept)                      -1.52E+01        1.36E+01    -1.119   0.263077
FOCAL_MEAN_POP_DENSITY_100k      -2.71E-02        2.51E-02     -1.08   0.280033
ROUGHNESS_40k                    -1.10E-01        1.16E-01    -0.948   0.343267
AGGREGATION_INDEX_10k             1.32E-01        1.40E-01     0.943   0.345471
DISTANCE_POP_CENTRE               1.81E-05        2.00E-05     0.902   0.366956
PLAND_LOWERMONTANE_FOREST_50k    -6.84E-01        8.18E-01    -0.837   0.402671
SHANNON_10k                      -9.09E-01        1.14E+00    -0.797   0.425579
ELEVATION                        -1.53E-03        2.65E-03    -0.575   0.565207
Table A2. Coefficients and p-values for the logistic regression predicting forest loss with the
same predictor variables and same training data set as the calibrated Malaysian Borneo random
forest model.
Variable                       Coefficient  Standard Error   z value   P-value
(Intercept)                      -3.38E+00        4.01E-01     -8.42   < 2e-16
EDGE_DENSITY_20k                  2.48E-01        1.39E-02    17.807   < 2e-16
ELEVATION                        -2.18E-03        1.08E-04   -20.092   < 2e-16
PLAND_LOWLAND_OPEN_50k           -3.41E-01        4.14E-02    -8.249   < 2e-16
PLAND_WATER_40k                   3.40E-02        3.78E-03     8.998   < 2e-16
DISTANCE_POP_CENTRE              -2.75E-06        3.41E-07    -8.053   8.10E-16
PLAND_MONTANE_MOSAIC_50k          4.22E-01        6.07E-02     6.954   3.55E-12
PLAND_LOWERMONTANE_FOREST_50k     2.43E-02        4.25E-03     5.702   1.19E-08
PLAND_LOWLAND_FOREST_40k          1.89E-02        3.58E-03     5.269   1.37E-07
ROUGHNESS_40K                    -3.74E-02        7.20E-03    -5.194   2.06E-07
PLAND_PLANTATION/REGROWTH_30k     1.76E-02        3.89E-03     4.521   6.15E-06
PLAND_PEATSWAMP_FOREST_40k        1.76E-02        4.33E-03     4.067   4.76E-05
PLAND_LOWLAND_MOSAIC_20k          1.55E-02        4.98E-03       3.1   0.00194
SHANNON_40k                      -2.52E-01        1.64E-01    -1.538   0.12401
FOCAL_MEAN_POP_DENSITY_100k       3.47E-03        2.32E-03     1.492   0.1356
PLAND_MONTANE_OPEN_50k            8.69E-03        1.16E-02     0.749   0.45367
PATCH_DENSITY_30k                 2.29E-01        7.90E-01     0.289   0.77236
Table A3. Coefficients and p-values for the logistic regression predicting forest loss with the
same predictor variables and same training data set as the calibrated Kalimantan random forest
model.
Variable                       Coefficient  Standard Error   z value   P-value
(Intercept)                      -2.13E+00        2.07E-01   -10.251   < 2e-16
EDGE_DENSITY_10k                  1.97E-01        7.85E-03    25.126   < 2e-16
ELEVATION                        -6.00E-03        3.89E-04    -15.44   < 2e-16
PATCH_DENSITY_40k                -7.48E+00        8.24E-01    -9.078   < 2e-16
PLAND_PEATSWAMP_FOREST_40k       -3.81E-02        2.64E-03   -14.426   < 2e-16
SHANNON_40k                       1.04E+00        1.20E-01     8.724   < 2e-16
SLOPE_40                          3.33E-03        4.66E-04     7.156   8.28E-13
PLAND_UPPERMONTANE_FOREST_40k     3.36E-01        4.79E-02      7.02   2.22E-12
PLAND_LOWLAND_FOREST_1k          -5.49E-03        7.95E-04    -6.914   4.73E-12
ROUGHNESS_50                      6.45E-02        1.10E-02     5.857   4.72E-09
PLAND_PLANTATION/REGROWTH_50k     1.27E-02        2.23E-03      5.72   1.07E-08
PLAND_LOWERMONTANE_FOREST_20k    -2.70E-02        6.40E-03    -4.222   2.42E-05
DISTANCE_POP_CENTRE              -8.92E-07        3.42E-07    -2.609   0.00909
PLAND_LOWLAND_MOSAIC_50k          1.23E-02        5.06E-03     2.436   0.01484
PLAND_WATER_30k                  -7.75E-03        3.26E-03    -2.381   0.01727
PLAND_LOWLAND_OPEN_50k            1.39E-02        6.59E-03     2.107   0.03511
FOCAL_MEAN_POP_DENSITY_100k      -6.34E-05        1.82E-03    -0.035   0.97212
[Figure A1 chart: model improvement ratio (scale 0 to 1.2) of the retained variables, in plotted order: Aggregation Index 10km, Edge Density 10km, Peatswamp Forest 40km, Water 50km, Plantation/Regrowth 1km, Lowland Forest 40km, Shannon Diversity 10km, Topographical Roughness 40km, Elevation, Lowland Mosaic 40km, Distance to Population Centre, Lower Montane Forest 50km, Population Density 100km, Upper Montane Forest 50km.]
Figure A1. Model improvement ratio of variable importance for retained variables for the
calibrated random forest Brunei model.
Figure A2. LOWESS splines showing the response curve for forest loss (1.0 on y-axis) and forest
persisting (0.0 on y-axis) for the ten variables with the highest model improvement ratio for
Brunei.
Figure A3. Map of calibrated predicted probability of forest loss between 2000 and 2010 for
Brunei. Black cells are those that were deforested before 2000. The predicted probability of
forest loss increases as a linear color ramp from 0 in dark blue to 0.881 in dark red. The two
inset white boxes (A and B) are areas where the pattern of actual deforested and not-deforested
points is displayed, overlain on the probability map, in the next figure.
Figure A4. Locations of pixels observed to have been deforested between 2000 and 2010 (green
dots) in Brunei, in the two insets A and B shown in the previous figure, overlain on the
predicted probability of forest loss surface. Black pixels are areas that were deforested before
2000. The color ramp indicates the predicted probability of forest loss between 2000 and 2010,
ranging from 0 in dark blue to 0.8811 in dark red.
Figure A5. Comparison of predicted probability of forest loss between 2000 and 2010 for Brunei
produced by (A) the calibrated random forest model, and (B) logistic regression with the same
input variables and training data set.
[Figure A6 chart: model improvement ratio (scale 0 to 1.2) of the retained variables, in plotted order: Lowland Mosaic 20km, Plantation/Regrowth 30km, Edge Density 20km, Elevation, Montane Mosaic 50km, Patch Density 30km, Lower Montane Forest 50km, Shannon Diversity 40km, Population Density 100km, Topographical Roughness 40km, Lowland Forest 40km, Lowland Open 50km, Distance to Population Centre, Montane Open 50km, Water 40km, Peatswamp Forest 40km.]
Figure A6. Model improvement ratio of variable importance for retained variables for the
calibrated random forest Malaysian Borneo model.
Figure A7. LOWESS splines showing the response curve for forest loss (1.0 on y-axis) and forest
persisting (0.0 on y-axis) for the ten variables with the highest model improvement ratio for
Malaysian Borneo.
Figure A8. Map of calibrated predicted probability of forest loss between 2000 and 2010 for
Malaysian Borneo. Black cells are those that were deforested before 2000. The predicted
probability of forest loss increases as a color-ramp linearly from 0 in dark blue to 1.0 in dark
red. The four inset white boxes (A to D) are areas where the pattern of actual deforested and
not-deforested points is displayed, overlain on the probability map, in the next figure.
Figure A9. Locations of pixels observed to have been deforested between 2000 and 2010 (green
dots) in Malaysian Borneo, in the four insets A, B, C and D shown in the previous figure,
overlain on the predicted probability of forest loss surface. Black pixels are areas that were
deforested before 2000. The color ramp indicates the predicted probability of forest loss
between 2000 and 2010, ranging from 0 in dark blue to 1.0 in dark red.
Figure A10. Comparison of predicted probability of forest loss between 2000 and 2010 for
Malaysian Borneo produced by (A) the calibrated random forest model, and (B) logistic
regression with the same input variables and training data set.
[Figure A11 chart: model improvement ratio (scale 0 to 1.2) of the retained variables, in plotted order: Elevation, Patch Density 40km, Lowland Mosaic 50km, Lower Montane Forest 20km, Plantation/Regrowth 50km, Edge Density 10km, Population Density 100km, Topographical Roughness 50km, Lowland Open 50km, Shannon Diversity 40km, Lowland Forest 1km, Water 30km, Peatswamp Forest 40km, Distance to Population Centre, Upper Montane Forest 40km, Slope Position 40km.]
Figure A11. Model improvement ratio of variable importance for retained variables for the
calibrated random forest Kalimantan model.
Figure A12. LOWESS splines showing the response curve for forest loss (1.0 on y-axis) and forest
persisting (0.0 on y-axis) for the ten variables with the highest model improvement ratio for
Kalimantan.
Figure A13. Map of calibrated predicted probability of forest loss between 2000 and 2010 for
Kalimantan. Black cells are those that were deforested before 2000. The predicted probability
of forest loss increases as a linear color ramp from 0 in dark blue to 1.0 in dark red. The four
inset white boxes (A to D) are areas where the pattern of actual deforested and not-deforested
points is displayed, overlain on the probability map, in the next figure.
Figure A14. Locations of pixels observed to have been deforested between 2000 and 2010 (green
dots) in Kalimantan, in the four insets A, B, C and D shown in the previous figure, overlain on
the predicted probability of forest loss surface. Black pixels are areas that were deforested
before 2000. The color ramp indicates the predicted probability of forest loss between 2000 and
2010, ranging from 0 in dark blue to 1.0 in dark red.
Figure A15. Comparison of predicted probability of forest loss in the 2000-2010 time period for
Kalimantan produced by (A) the calibrated random forest model and (B) logistic regression with
the same input variables and training data set.
Figure A16. Maps showing differences in predicted probability between the calibrated random
forest model including landscape metrics (A) and the logistic regression model including
landscape metrics (B) for inset A in figure x of Malaysian Borneo. The predicted probability of
forest loss between 2000 and 2010 is shown as a color ramp from 0 in dark blue to 1 in dark red.
Cells in black were deforested before 2000. Green dots indicate pixels that were deforested
between 2000 and 2010. The white ovals mark areas where the two models differ substantially in
predicted forest loss probability relative to the observed pattern of loss: the logistic
regression model predicts high forest loss risk in places where no forest loss occurred,
whereas the random forest model predicts lower probability there.
Figure A17. Maps showing differences in predicted probability between the calibrated random
forest model including landscape metrics (A) and the logistic regression model including
landscape metrics (B) for inset B in figure x of Malaysian Borneo. The predicted probability of
forest loss between 2000 and 2010 is shown as a color ramp from 0 in dark blue to 1 in dark red.
Cells in black were deforested before 2000. Green dots indicate pixels that were deforested
between 2000 and 2010. The white ovals mark areas where the two models differ substantially in
predicted forest loss probability relative to the observed pattern of loss: the logistic
regression model predicts high forest loss risk in places where no forest loss occurred,
whereas the random forest model predicts lower probability there.
Figure A18. Maps showing differences in predicted probability between the calibrated random
forest model including landscape metrics (A) and the logistic regression model including
landscape metrics (B) for inset A in figure x of Kalimantan. The predicted probability of forest
loss between 2000 and 2010 is shown as a color ramp from 0 in dark blue to 1 in dark red. Cells
in black were deforested before 2000. Green dots indicate pixels that were deforested between
2000 and 2010. The white ovals mark areas where the two models differ substantially in
predicted forest loss probability relative to the observed pattern of loss: the logistic
regression model predicts high forest loss risk in places where no forest loss occurred,
whereas the random forest model predicts lower probability there.
Figure A19. Maps showing differences in predicted probability between the calibrated random
forest model including landscape metrics (A) and the logistic regression model including
landscape metrics (B) for inset B in figure x of Kalimantan. The predicted probability of forest
loss between 2000 and 2010 is shown as a color ramp from 0 in dark blue to 1 in dark red. Cells
in black were deforested before 2000. Green dots indicate pixels that were deforested between
2000 and 2010. The white ovals mark areas where the two models differ substantially in
predicted forest loss probability relative to the observed pattern of loss: the logistic
regression model predicts high forest loss risk in places where no forest loss occurred,
whereas the random forest model predicts lower probability there.
Figure A20. Maps showing differences in predicted probability between calibrated random forest
models (A) including and (B) excluding landscape metrics for inset A in figure x of Malaysian
Borneo. The predicted probability of forest loss between 2000 and 2010 is shown as a color ramp
from 0 in dark blue to 1 in dark red. Cells in black were deforested before 2000. Green dots
indicate pixels that were deforested between 2000 and 2010. The white ovals mark areas where
the two models differ substantially in predicted forest loss probability and where forest loss
actually occurred; the model including landscape metrics (A) reflects the higher forest loss
risk in these areas more accurately than the model excluding landscape metrics (B).
Figure A21. Maps showing differences in predicted probability between calibrated random forest
models (A) including and (B) excluding landscape metrics for inset B in figure x of Malaysian
Borneo. The predicted probability of forest loss between 2000 and 2010 is shown as a color ramp
from 0 in dark blue to 1 in dark red. Cells in black were deforested before 2000. Green dots
indicate pixels that were deforested between 2000 and 2010. The white ovals mark areas where
the two models differ substantially in predicted forest loss probability and where forest loss
actually occurred; the model including landscape metrics (A) reflects the higher forest loss
risk in these areas more accurately than the model excluding landscape metrics (B).
Figure A22. Maps showing differences in predicted probability between calibrated random forest
models (A) including and (B) excluding landscape metrics for inset A in figure x of Kalimantan.
The predicted probability of forest loss between 2000 and 2010 is shown as a color ramp from 0
in dark blue to 1 in dark red. Cells in black were deforested before 2000. Green dots indicate
pixels that were deforested between 2000 and 2010. The white ovals mark areas where the two
models differ substantially in predicted forest loss probability and where forest loss actually
occurred; the model including landscape metrics (A) reflects the higher forest loss risk in
these areas more accurately than the model excluding landscape metrics (B).
Figure A23. Maps showing differences in predicted probability between calibrated random forest
models (A) including and (B) excluding landscape metrics for inset B in figure x of Kalimantan.
The predicted probability of forest loss between 2000 and 2010 is shown as a color ramp from 0
in dark blue to 1 in dark red. Cells in black were deforested before 2000. Green dots indicate
pixels that were deforested between 2000 and 2010. The white ovals mark areas where the two
models differ substantially in predicted forest loss probability and where forest loss actually
occurred; the model including landscape metrics (A) reflects the higher forest loss risk in
these areas more accurately than the model excluding landscape metrics (B).
Figure A24. Landsat RGB imagery of four inset landscapes, two in Malaysian Borneo (A, B) and
two in Kalimantan (C, D), showing an extensive network of logging roads in the Malaysian
landscapes and the lack of forest roads penetrating the unlogged forest in Kalimantan.