Sydney Harbour: innovative environmental data science in Australia's most iconic waterway
TRANSCRIPT
-
INNOVATIVE ENVIRONMENTAL DATA SCIENCE IN AUSTRALIA'S MOST ICONIC WATERWAY
LUKE HEDGE
THE UNIVERSITY OF NEW SOUTH WALES
DRLUKEHEDGE
SYDNEY HARBOUR
-
50% | WORLD'S COASTLINE ALTERED
Shutterstock | pokki1
-
POPULATION | 4.4 M
image | Rodney Campbell
-
A SYSTEMATIC SCIENTIFIC REVIEW
>20,000 journal articles searched
310 publications found
four universities
two government agencies
one national museum
15 scientist authors
image | Rodney Campbell
-
PURE RESEARCH
APPLIED RESEARCH
200 Publications
110 Publications
Kingsley Griffin
Deposit Photos
image | Rodney Campbell
-
ROCKY REEF: 92 papers | SEAFLOOR: 91 papers | SEAGRASS: 7 papers | ROCKY SHORES: 28 papers
OPEN WATER: 32 papers | MANGROVE: 26 papers | BEACHES: 4 papers | FRESHWATER: 1 paper
Legend: APPLIED RESEARCH | BASIC RESEARCH
image | Rodney Campbell
-
[Chart: publications by discipline (axis 0 to 150): Ecology, Chemistry, Biology, Management, Oceanography, Geology, Fisheries]
image | Rodney Campbell
-
image | Rodney Campbell
-
WHERE ARE THE HABITATS? WHERE ARE THE IMPACTS? WHERE DO THEY OVERLAP?
CREDIBLE | INTERPRETABLE
ACTIONABLE
-
image | Vitaly Korovin
-
CREDIBLE | INTERPRETABLE
ACTIONABLE
-
Species distribution models
A model that relates environmental predictors to known species locations across a landscape.
Elith et al. (2009) Annu. Rev. Ecol. Evol. Syst.
To provide understanding or prediction
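As a minimal illustration of "relating environmental predictors to known species locations", here is a sketch of the simplest kind of SDM: a logistic regression fitted to presence/absence survey records by Newton-Raphson. This assumes presence/absence data and a logistic link; the function names and the two simulated predictors are hypothetical, and the MaxEnt, GAM and boosted-tree models mentioned later in the talk are more flexible than this.

```python
import numpy as np

def fit_logistic_sdm(X, y, n_iter=25):
    """Fit a logistic-regression species distribution model by Newton-Raphson.

    X : (n_sites, k) environmental predictors at surveyed sites
    y : (n_sites,) 1 = species recorded present, 0 = absent
    Returns coefficients [intercept, beta_1, ..., beta_k].
    """
    Xd = np.column_stack([np.ones(len(X)), X])    # add intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))       # predicted probability of presence
        W = p * (1.0 - p)                           # Bernoulli variance weights
        grad = Xd.T @ (y - p)                       # score vector
        hess = Xd.T @ (Xd * W[:, None])             # Fisher information matrix
        beta = beta + np.linalg.solve(hess, grad)   # Newton step
    return beta

def predict_sdm(beta, X):
    """Predicted probability of presence for new predictor values (e.g. a map grid)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xd @ beta))

# Simulated example with two hypothetical standardized predictors
rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 2))
true_beta = np.array([-0.5, 1.2, -0.8])
p_true = 1.0 / (1.0 + np.exp(-(true_beta[0] + X @ true_beta[1:])))
y = rng.binomial(1, p_true)
beta_hat = fit_logistic_sdm(X, y)
```

Predicting over a gridded map of the same predictors with `predict_sdm` gives the "model prediction" layer in the occurrence points, predictor, prediction pipeline.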
-
2 D. I. WARTON AND L. C. SHEPHERD
Fig. 1. (a) Example presence-only data: atlas records of where the tree species Angophora costata has been reported to be present, west of Sydney, Australia. The study region is shaded. (b) A map of minimum temperature (°C) over the study region. Variables such as this are used to model how intensity of A. costata presence relates to the environment. (c) A species distribution model, modeling the association between A. costata and a suite of environmental variables. This is the fitted intensity function for A. costata records per km², modeled as a quadratic function of four environmental variables using a point process model as in Section 4.
An example is given in Figure 1(a). This figure gives all locations where a particular tree species (Angophora costata) has been reported by park rangers since 1972, within 100 km of the Greater Blue Mountains World Heritage Area, near Sydney, Australia. Note that this does not consist of all locations where an Angophora costata tree is found; rather, it is the locations where the species has been reported to be found. We would like to use these presence points, together with maps of explanatory variables describing the environment (often referred to in ecology as environmental variables), to predict the location of A. costata and how it varies as a function of explanatory variables (Figure 1).
Presence-only data are used extensively in ecology to model species distributions; while the term presence-only data was rarely used before the 1990s, ISI Web of Science reports that it was used in 343 publications from 2005 to 2008. The use of presence-only data in modeling is a relatively recent development, presumably aided by the movement toward electronic record keeping and recent advances in Geographic Information Systems. One reason for the current widespread usage of presence-only data is that often this is the best available information concerning the distribution of a species, as there is often little or no information on species distribution being available from systematic surveys [Elith and Leathwick (2007)].
Species distribution models, sometimes referred to as habitat models or habitat classification models [Zarnetske, Edwards and Moisen (2007)], are
perform well in characterizing the natural distributions of species (within their current range)
Occurrence points Enviro predictor Model prediction
Warton and Shepherd (2010) Ann. Appl. Stat.
Elith et al. (2009) Annu. Rev. Ecol. Evol. Syst.
Warton and Aarts (2013) J. Anim. Ecol.
useful ecological insight and strong predictive capability
-
Ecological Applications, 24(1), 2014, pp. 71-83 | © 2014 by the Ecological Society of America
Prediction of fishing effort distributions using boosted regression trees
CANDAN U. SOYKAN,1,2,3 TOMOHARU EGUCHI,1 SUZANNE KOHIN,2 AND HEIDI DEWAR2
1Marine Mammal and Turtle Division, Southwest Fisheries Science Center, National Marine Fisheries Service,National Oceanic and Atmospheric Administration, 8901 La Jolla Shores Drive, La Jolla, California 92037 USA
2Fisheries Resources Division, Southwest Fisheries Science Center, National Marine Fisheries Service,National Oceanic and Atmospheric Administration, 8901 La Jolla Shores Drive, La Jolla, California 92037 USA
Abstract. Concerns about bycatch of protected species have become a dominant factor shaping fisheries management. However, efforts to mitigate bycatch are often hindered by a lack of data on the distributions of fishing effort and protected species. One approach to overcoming this problem has been to overlay the distribution of past fishing effort with known locations of protected species, often obtained through satellite telemetry and occurrence data, to identify potential bycatch hotspots. This approach, however, generates static bycatch risk maps, calling into question their ability to forecast into the future, particularly when dealing with spatiotemporally dynamic fisheries and highly migratory bycatch species. In this study, we use boosted regression trees to model the spatiotemporal distribution of fishing effort for two distinct fisheries in the North Pacific Ocean, the albacore (Thunnus alalunga) troll fishery and the California drift gillnet fishery that targets swordfish (Xiphias gladius). Our results suggest that it is possible to accurately predict fishing effort using <10 readily available predictor variables (cross-validated correlations between model predictions and observed data ~0.6). Although the two fisheries are quite different in their gears and fishing areas, their respective models had high predictive ability, even when input data sets were restricted to a fraction of the full time series. The implications for conservation and management are encouraging: Across a range of target species, fishing methods, and spatial scales, even a relatively short time series of fisheries data may suffice to accurately predict the location of fishing effort into the future. In combination with species distribution modeling of bycatch species, this approach holds promise as a mitigation tool when observer data are limited.
Even in data-rich regions, modeling fishing effort and bycatch may provide more accurate estimates of bycatch risk than partial observer coverage for fisheries and bycatch species that are heavily influenced by dynamic oceanographic conditions.
Key words: albacore; bycatch mitigation; dynamic oceanographic conditions; fisheries management; marine spatial planning; species distribution modeling; swordfish.
INTRODUCTION
Fisheries bycatch, the unintentional capture of non-target
species during fishing operations, threatens the
survival of a number of vulnerable marine species
(Lewison et al. 2004, Zydelis et al. 2009) and has
become a key factor in shaping management decisions.
Although bycatch research has increased dramatically
over the past few decades, the limited data available for
most fisheries still hinders management and conservation
efforts (Soykan et al. 2008). Specifically, high
quality data on fisheries bycatch collected by trained
observers are unavailable or limited for most international
and many U.S. fisheries. The prospects for future
expansion of data collection efforts are likewise modest
or negligible given the costs and logistics associated with
such efforts. Although such obstacles impede direct
assessment of bycatch rates, obtaining estimates of
bycatch rates and predictions of potential interactions is
critical for efforts to reduce bycatch and determine the
impact of fisheries on many marine protected species.
Researchers have thus begun to explore indirect
methods for estimating fisheries bycatch. One approach
involves port-based interviews to gather baseline data on
fishing effort and bycatch in artisanal fleets (Moore et al.
2010). This approach has the advantage of being rapid
and cost effective, but relies on honest, accurate
responses by the fishers. A second approach, which is
increasing in popularity, involves the estimation of
bycatch species distributions and their overlap with
fishing effort to assess threats, identify hotspots, and
guide decision-making (Cuthbert et al. 2005, Goldsworthy
and Page 2007, Hamel et al. 2008, McClellan et
al. 2009, Zhou et al. 2009).
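The overlap approach described above can be sketched as combining two gridded layers, fishing effort and predicted bycatch-species occurrence, into a simple risk index and flagging the highest-risk cells as hotspots. This is a minimal sketch assuming both layers are already on a common grid; the multiplicative index, the rescaling of effort, the `top_fraction` threshold and the function name are illustrative assumptions, not the method of any of the cited studies.

```python
import numpy as np

def overlap_hotspots(effort, species_prob, top_fraction=0.1):
    """Flag grid cells where fishing effort and predicted species occurrence coincide.

    effort       : 2-D array of fishing effort per cell (e.g. sets or days fished)
    species_prob : 2-D array, same shape, predicted probability of species occurrence
    top_fraction : share of cells to flag as hotspots (hypothetical choice)
    Returns (risk, hotspot_mask).
    """
    effort = np.asarray(effort, dtype=float)
    species_prob = np.asarray(species_prob, dtype=float)
    if effort.shape != species_prob.shape:
        raise ValueError("effort and species_prob must be on the same grid")
    # Rescale effort to 0..1 so both layers are on comparable scales
    rel_effort = effort / effort.max() if effort.max() > 0 else effort
    risk = rel_effort * species_prob              # simple multiplicative overlap index
    threshold = np.quantile(risk, 1.0 - top_fraction)
    hotspots = (risk >= threshold) & (risk > 0)   # top cells with non-zero overlap
    return risk, hotspots

effort = np.array([[0, 5, 10], [2, 8, 0]])
species_prob = np.array([[0.9, 0.1, 0.8], [0.5, 0.7, 0.9]])
risk, hotspots = overlap_hotspots(effort, species_prob, top_fraction=0.34)
```

Because the index is recomputed from whatever effort and occurrence layers are supplied, the same function works on static maps or on forecast layers for a given month.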
The study by Cuthbert et al. (2005) provides an
illustrative example of the overlap approach. First, the
Manuscript received 21 May 2012; revised 9 April 2013; accepted 17 April 2013; final version received 8 May 2013. Corresponding Editor: S. S. Heppell.
3 Present address: National Audubon Society, 220 Montgomery Street, Suite 1000, San Francisco, California 94104 USA.
data due to cloud cover or gaps in satellite coverage.
Missing data could skew a comparison of temporally
static vs. dynamic predictor variables. We assessed the
effects of missing data by examining the relationship
between VI scores and missing data. For each of the
satellite-derived oceanographic variables we correlated
its VI score (based on yearly BRT models built for this
analysis) with the percentage of non-zero fishing effort
records from that year that had a value for the variable.
We used non-zero fishing effort records because records
with zero fishing effort contributed less to model
development with these data sets, which were dominated
by zero fishing effort records (good BRT models can be
developed with presence-only information [Elith et al.
2008]).
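The missing-data check described above (correlating each satellite variable's yearly VI score with the percentage of non-zero-effort records in that year that have a value for the variable) can be sketched as follows. This is a minimal sketch assuming missing values are coded as NaN and a Pearson correlation is used; the function and argument names are hypothetical.

```python
import numpy as np

def vi_missingness_correlation(vi_by_year, var_by_year, effort_by_year):
    """Correlate a variable's yearly importance (VI) score with its data coverage.

    vi_by_year     : VI scores, one per yearly model
    var_by_year    : list of 1-D arrays, the satellite variable per record (NaN = missing)
    effort_by_year : list of 1-D arrays, fishing effort per record (same lengths)
    Coverage = % of non-zero-effort records in that year with a non-missing value.
    """
    coverage = []
    for var, effort in zip(var_by_year, effort_by_year):
        var, effort = np.asarray(var, float), np.asarray(effort, float)
        nonzero = effort > 0                                   # keep only fished records
        coverage.append(100.0 * np.mean(~np.isnan(var[nonzero])))
    coverage = np.array(coverage)
    r = np.corrcoef(np.asarray(vi_by_year, float), coverage)[0, 1]
    return coverage, r

nan = np.nan
coverage, r = vi_missingness_correlation(
    vi_by_year=[10, 5, 15],
    var_by_year=[[1, nan, 2, 3], [nan, nan, 1, 2], [1, 2, 3, 4]],
    effort_by_year=[[1, 1, 0, 1], [1, 1, 1, 0], [1, 1, 1, 1]],
)
```

A strong positive correlation in such a check would suggest that a variable's apparent importance tracks data availability rather than its ecological signal.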
RESULTS
The DGN fishing effort data set spanned 11.66° of latitude and 7° of longitude, comprising 547 10′ × 10′ grid cells (Fig. 1A). Fishing effort in each of these cells was recorded monthly for 20 years, resulting in a total of 131 280 records, of which 12 577 (~9.6%) had non-zero fishing effort. The AT fishing effort data set spanned 22° of latitude and 44° of longitude, comprising 911 1° × 1° grid cells (Fig. 1B). Fishing effort in each of these cells was recorded monthly for 20 years, resulting in a total of 218 640 records, of which 12 346 (~5.6%) had non-zero fishing effort.
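The record totals follow directly from the grid description (cells × 12 months × 20 years), and the non-zero shares from dividing by those totals. A quick arithmetic check of the reported figures:

```python
def record_summary(n_cells, n_nonzero, months=12, years=20):
    """Total monthly records for a grid and the percentage with non-zero effort."""
    total = n_cells * months * years
    return total, 100.0 * n_nonzero / total

dgn_total, dgn_pct = record_summary(547, 12_577)   # drift gillnet (DGN)
at_total, at_pct = record_summary(911, 12_346)     # albacore troll (AT)
print(dgn_total, round(dgn_pct, 1))  # 131280 9.6
print(at_total, round(at_pct, 1))    # 218640 5.6
```

Both data sets are therefore dominated by zero-effort records, which is the context for the earlier remark about non-zero records contributing most to model development.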
For the DGN fishery, SSHV was involved in four of the 10 strongest interactions between predictor variables, while latitude, month, and year each were involved in three (Appendix: Table A1). For the AT fishery, DistCoast was involved in four of the 10 strongest interactions between predictor variables, while depth, latitude, and SST each were involved in three (Table A2). For the majority of pairs of the predictor variables, collinearity among them was low for both data sets. For the DGN fishery, four pairs had correlation coefficients with absolute values >0.5, where the largest correlation was found between latitude and longitude (0.93). For the AT fishery, seven pairs had correlation coefficients with absolute values >0.5, where the largest correlations were found between EKE and UGEO (0.78) and between latitude and SST (0.78; Appendix: Tables A3 and A4).
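The collinearity screen reported above (counting predictor pairs with |r| > 0.5 and identifying the largest) can be sketched with a pairwise Pearson correlation matrix. This is a minimal sketch assuming Pearson correlation on the raw predictor columns; the function name and the toy data are hypothetical.

```python
import numpy as np
from itertools import combinations

def collinear_pairs(X, names, threshold=0.5):
    """Return predictor pairs whose |Pearson r| exceeds threshold, strongest first."""
    R = np.corrcoef(np.asarray(X, float), rowvar=False)   # predictors are columns
    pairs = [(names[i], names[j], R[i, j])
             for i, j in combinations(range(len(names)), 2)
             if abs(R[i, j]) > threshold]
    return sorted(pairs, key=lambda t: -abs(t[2]))

# Toy data: "lon" tracks "lat" closely, "sst" is unrelated to both
lat = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
lon = np.array([0.0, 2.0, 4.0, 6.0, 8.5])
sst = np.array([1.0, -1.0, 1.0, -1.0, 1.0])
pairs = collinear_pairs(np.column_stack([lat, lon, sst]), ["lat", "lon", "sst"])
```

Tree-based models such as BRTs are fairly tolerant of correlated predictors, so a screen like this is mainly a guide to interpreting variable importance rather than a reason to drop variables.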
BRT model performance
Using stringent across-year cross-validation, the
boosted regression tree models effectively predicted
fishing effort for the DGN and AT fisheries (Table 2).
The DGN model explained 58.7% of the deviance in the data, had a mean correlation between predicted and
observed data of 0.589, and had low false positive and
false negative error rates (11.4% and 1.7%, respectively). The AT model explained 65.6% of the deviance in the data, had a mean correlation between predicted and
observed data of 0.579, had a false positive error rate of
8.0%, and a false negative error rate of 2.1%.
An examination of the relationships between environmental variables and fishing effort showed a range of
patterns for the DGN model (Fig. 2). The model showed
a peak in effort early in the time series, followed by an
initial rapid decline that was then followed by a more
gradual decline in effort over the years. Seasonally,
FIG. 1. Maps of cumulative fishing effort: (A) West Coast drift gillnet (DGN; measured as number of gear sets) and (B) North Pacific albacore troll (AT; measured as number of days fished) fisheries. Individual grid cells are 10′ × 10′ for the drift gillnet fishery and 1° × 1° for the albacore troll fishery. The drift gillnet fishery data cover the period 1981-2001, and the albacore troll fishery data cover the period 1991-2010. Grid cells with fewer than three total sets or days fished have been censored for confidentiality.
January 2014 | 75 | PREDICTING FISHING EFFORT DISTRIBUTIONS
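The "stringent across-year cross-validation" used to evaluate the BRT models can be sketched as leave-one-year-out splitting: hold out each year in turn, fit on the remaining years, and average the correlation between predicted and observed values over the held-out years. This is a minimal sketch; to stay self-contained it plugs in an ordinary least-squares model as a stand-in, which is NOT a boosted regression tree, and all function names are hypothetical.

```python
import numpy as np

def across_year_cv(X, y, years, fit, predict):
    """Leave-one-year-out cross-validation.

    Each year is held out in turn, the model is fitted on all other years, and the
    Pearson correlation between predicted and observed values in the held-out year
    is recorded. Returns (mean correlation, {year: correlation}).
    """
    X, y, years = np.asarray(X, float), np.asarray(y, float), np.asarray(years)
    scores = {}
    for yr in np.unique(years):
        test = years == yr
        model = fit(X[~test], y[~test])
        pred = predict(model, X[test])
        scores[yr] = np.corrcoef(pred, y[test])[0, 1]
    return float(np.mean(list(scores.values()))), scores

# Stand-in model: ordinary least squares (swapped in for BRT to keep the sketch short)
def fit_ols(X, y):
    Xd = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(Xd, y, rcond=None)[0]

def predict_ols(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

rng = np.random.default_rng(0)
years = np.repeat(np.arange(2001, 2006), 200)       # hypothetical: 5 years x 200 records
X = rng.normal(size=(len(years), 2))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=len(years))
mean_r, per_year = across_year_cv(X, y, years, fit_ols, predict_ols)
```

Any model exposing a fit/predict pair can be passed in, so the splitting logic stays the same if the stand-in is replaced with a boosted-tree implementation.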
-
City of Sydney
Rose Bay
Lane Cove River
Manly
Sydney Institute of Marine Science
-
Sydney Harbour
573 surveys
6 months
12,000 events
15 personnel
-
[Figure panels: predictors (scale 0 to 4) | occurrences | prediction]
converge to the point process slope estimates (Section 3). These two key results have important ramifications for species distribution modeling in ecology (Section 5); in particular, we provide a solution to the problem of how to select pseudo-absences. We illustrate our results for the A. costata data of Figure 1(a) (Section 4).
2. Poisson point process models for presence-only data. Presence-only data are a set $\mathbf{y} = \{y_1, \ldots, y_n\}$ of point locations in a two-dimensional region $A$, where the locations where presences are recorded (the $y_i$) are out of the control of the researcher, as is the total number of presence points $n$. We also observe a map of values over the entire region $A$ for each of $k$ explanatory variables, and we denote the values of these variables at $y_i$ as $(x_{i1}, \ldots, x_{ik})$.
We propose analyzing $\mathbf{y} = \{y_1, \ldots, y_n\}$ as a point process; hence, we jointly model the number of presence points $n$ and their locations ($y_i$). This has not previously been proposed for the analysis of presence-only data, despite the extensive literature on the analysis of presence-only data. We consider inhomogeneous Poisson point process models [Cressie (1993); Diggle (2003)], which make the following two assumptions:
1. The locations of the $n$ point events $(y_1, \ldots, y_n)$ are independent.
2. The intensity at point $y_i$ [$\lambda(y_i)$, denoted as $\lambda_i$ for convenience], the limiting expected number of presences per unit area [Cressie (1993)], can be modeled as a function of the $k$ explanatory variables. We assume a log-linear specification:
$$\log(\lambda_i) = \beta_0 + \sum_{j=1}^{k} x_{ij}\beta_j, \qquad (2.1)$$
although note that the linearity assumption can be relaxed in the usual way (e.g., using quadratic terms or splines). The parameters of the model for the $\lambda_i$ are stored in the vector $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_k)$.
Note that the process being modeled here is locations where an organism has been reported rather than locations where individuals of the organism occur. Hence, the independence assumption would only be violated by interactions between records of sightings rather than by interactions between individual organisms per se. The atlas data of Figure 1 consist of 721 A. costata records accumulated over a period of 35 years in a region of 86,000 km², so independence of records seems a reasonable assumption in this case, given the rarity of event reporting. Nevertheless, the methods we review here can be generalized to handle dependence between point events [Baddeley and Turner (2005)].
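One common way to fit the log-linear intensity model (2.1) in practice is to discretize region $A$ into small grid cells, count the presence records in each cell, and fit a Poisson regression with log(cell area) as an offset, so the coefficients stay on the per-unit-area intensity scale. This is a minimal sketch of that standard approximation, not necessarily the quadrature scheme used in the excerpted paper; the function name, grid size and simulated predictors are hypothetical.

```python
import numpy as np

def fit_ppm_grid(counts, X, cell_area, n_iter=50, tol=1e-10):
    """Approximate a Poisson point process model by Poisson regression on grid counts.

    counts    : (n_cells,) number of presence records in each grid cell
    X         : (n_cells, k) explanatory variables at each cell centre
    cell_area : area of each cell (scalar, e.g. km^2)
    Model: log(lambda_c) = beta_0 + sum_j x_cj * beta_j, E[counts_c] = lambda_c * cell_area.
    Returns beta = (beta_0, ..., beta_k) on the intensity (per unit area) scale.
    """
    counts = np.asarray(counts, float)
    Xd = np.column_stack([np.ones(len(X)), X])
    offset = np.log(cell_area)                             # turns intensity into expected count
    beta = np.zeros(Xd.shape[1])
    beta[0] = np.log(counts.mean() + 1e-12) - offset       # start from homogeneous intensity
    for _ in range(n_iter):
        mu = np.exp(offset + Xd @ beta)                    # expected count per cell
        grad = Xd.T @ (counts - mu)                        # Poisson score
        hess = Xd.T @ (Xd * mu[:, None])                   # Fisher information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

rng = np.random.default_rng(42)
n_cells, cell_area = 10_000, 0.25                # hypothetical 100 x 100 grid of 0.25 km^2 cells
X = rng.normal(size=(n_cells, 2))                # hypothetical standardized predictors
true_beta = np.array([-1.0, 0.8, -0.5])
lam = np.exp(true_beta[0] + X @ true_beta[1:])   # true intensity per km^2
counts = rng.poisson(lam * cell_area)
beta_hat = fit_ppm_grid(counts, X, cell_area)
```

As the cells shrink, the grid-count likelihood approaches the point process likelihood, which is why the fitted coefficients estimate the $\beta_j$ of (2.1) directly.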
[Chart: model comparison by explanation and correlation: MaxEnt, Boosting, GLM, GAM, Random Forest]
-
[Map: Fishers (per km²), scale 5 to 75; mean weather conditions]
-
[Figure: Boat based fishery; Recreational Intensity (link scale) on Weekday vs Weekend, by time of day: Morning, Midday, Afternoon]
-
[Map: Recreational Activities (per m²), scale 0 to 0.000015; mean weather conditions]
-
[Map: Recreational Activities, non boat related (per m²), scale 0 to 0.000020; mean weather conditions]
-
CREDIBLE | INTERPRETABLE
ACTIONABLE
-
Jessica Merrett
-
Jessica Merrett
-
Jessica Merrett
-
CREDIBLE | INTERPRETABLE
ACTIONABLE
-
[Figure: Metal concentration by Location (Ba, Cl, NH, QB, RB) for Al, As, Cr, Cu, Fe, Mn, Pb, Sn and Zn; reference lines labelled PDL and Ref]
Luke Hedge
-
[Figure: Shannon Diversity (H), scale 1.0 to 2.5, at Clontarf, North Harbour, Quarantine Bay and Rose Bay; Boating Infrastructure vs Reference]
Luke Hedge
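For reference, the Shannon diversity index plotted on this slide is usually computed as $H = -\sum_i p_i \ln p_i$, where $p_i$ is the proportion of individuals belonging to taxon $i$. This is a minimal sketch assuming the natural-log form (the slide does not state which logarithm base was used); the function name is hypothetical.

```python
import math

def shannon_diversity(abundances):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero abundance."""
    total = sum(abundances)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in abundances if n > 0)

print(round(shannon_diversity([10, 10, 10, 10]), 4))  # 1.3863, i.e. ln(4)
```

H is highest when individuals are spread evenly across many taxa and falls to zero when a single taxon dominates completely, so lower values near boating infrastructure would indicate a less even, less diverse assemblage.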
-
[Figure: Abundance of Crustacea, Mollusca, Polychaete and Other at Clontarf, North Harbour, Quarantine Bay and Rose Bay; Boating Infrastructure vs Reference]
Luke Hedge
-
[Figure: Metal concentration (mg/kg) of Cu, Zn and Pb at Clontarf, North Harbour and Balmoral]
Luke Hedge
-
[Figure: Grain size (µm), scale 140 to 190, at Balmoral]
Luke Hedge
-
[Figure: Cu (mg/kg) at Quarantine Bay (scale 2.40 to 2.60) and Rose Bay (scale 7.1 to 7.6)]
Luke Hedge
-
[Figure: Spatial Autocorrelation (Variograms): semivariance vs Distance (0 to 60) for Al, As, Cr, Cu, Fe, Mn, Pb, Sn and Zn; Moorings vs Control]
Luke Hedge
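The variograms on this slide summarize how dissimilar sediment samples become as the distance between them grows. A standard way to compute an empirical semivariogram is the Matheron estimator, $\gamma(h) = \frac{1}{2N(h)} \sum (z_i - z_j)^2$ over all sample pairs whose separation falls in distance bin $h$. This is a minimal sketch assuming that estimator and Euclidean distances in metres; the slide does not say which estimator was used, and the function name is hypothetical.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Matheron estimator of the semivariogram.

    coords    : (n, 2) sample locations, e.g. metres
    values    : (n,) measured values, e.g. a metal concentration
    bin_edges : increasing distance-bin edges
    Returns (bin_centres, gamma, n_pairs); gamma is NaN for empty bins.
    """
    coords, values = np.asarray(coords, float), np.asarray(values, float)
    i, j = np.triu_indices(len(values), k=1)         # every pair counted once
    d = np.hypot(*(coords[i] - coords[j]).T)         # pairwise Euclidean distances
    sq = (values[i] - values[j]) ** 2                # squared differences
    idx = np.digitize(d, bin_edges) - 1              # distance-bin index of each pair
    nb = len(bin_edges) - 1
    gamma = np.full(nb, np.nan)
    n_pairs = np.zeros(nb, dtype=int)
    for b in range(nb):
        in_bin = idx == b
        n_pairs[b] = in_bin.sum()
        if n_pairs[b]:
            gamma[b] = sq[in_bin].mean() / 2.0
    centres = (np.asarray(bin_edges[:-1], float) + np.asarray(bin_edges[1:], float)) / 2.0
    return centres, gamma, n_pairs

centres, gamma, n_pairs = empirical_semivariogram(
    [[0, 0], [1, 0], [2, 0], [3, 0]], [0, 1, 2, 3], [0.5, 1.5, 2.5, 3.5])
```

Comparing curves computed separately for mooring and control samples shows whether contamination is spatially structured (semivariance rising with distance) near moorings but not elsewhere.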
-
[Figure, for Clontarf, Balmoral and North Harbour: (a) sample data, Grain size (D10) vs Nearest Mooring (m); (b) Metal concentration (mg/kg) for Cu, Pb and Zn]
Luke Hedge
-
[Figure: Abundance vs Distance from Moorings (m), 0 to 20, at Clontarf and North Harbour for Scalibregmidae sp., Orbiniidae sp., Ostracod sp. 1, Ostracod sp. 3, Maldanidae sp., Ostracod sp., Nematode sp., Bivalve sp. 1, Syllidae sp., Amphipoda sp. 1, Nereididae sp. and Nemertean sp.]
Luke Hedge
-
[Figure: Abundance vs Distance from Moorings (m), 0 to 30, at Clontarf, Hunters Bay, Manly and North Harbour for georgianus, tricuspidata, australis, trachylepis, testacea, auratus, ciliata and subfasciatus; Sampling Period 1 and 2]
Brendan Lanham
-
INTERPRETABLE | ACTIONABLE
CREDIBLE
Deposit Photos | 263Ben
-
DRLUKEHEDGE