Model Selection for Selectivity in Fisheries Stock Assessments
André Punt, Felipe Hurtado-Ferro, Athol Whitten
13 March 2013; CAPAM Selectivity workshop
Overview
• What is the problem we want to solve?
• Can selectivity be estimated anyway?
• Fleets and how we choose them
• Example assessments
• Alternative methods:
  – fit diagnostics
  – model selection and model weighting
• What do simulation studies tell us?
• Final thoughts
Definitions of Selectivity
Selectivity: the relative probability of being captured by a fleet (as a function of age / length). Depends on how “fleet” is defined.

Selectivity is NOT:
• Gear selectivity
• Availability
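For concreteness, one standard way selectivity enters an age-structured model (a sketch using the Baranov catch equation with fleet-specific selectivity $S_{a,f}$):

$$ C_{t,a,f} \;=\; \frac{S_{a,f}\,F_{t,f}}{Z_{t,a}}\left(1-e^{-Z_{t,a}}\right)N_{t,a}, \qquad Z_{t,a} \;=\; M_a + \sum_f S_{a,f}\,F_{t,f} $$

The same numbers-at-age $N_{t,a}$ can therefore produce very different catch compositions depending on how the fleets, and hence the $S_{a,f}$, are defined.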
Some of the key questions-I
Should there be multiple fleets and, if so, how do we choose them?
• More fleets (may) make the assumption of time-invariant selectivity more valid.
• More fleets lead to more parameters (and potentially model instability).
Some of the key questions-II
Given a fleet structure:
• What functional form to assume?
• Should selectivity change with time?
• Parametric or non-parametric?
[Figure: an example selectivity-at-age curve (selectivity 0–1 against age 0–20), and a schematic selectivity surface over age and time]
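A minimal sketch of two common answers to the “what functional form” question: an asymptotic (logistic) curve and a dome-shaped curve built from two half-normals. All parameter values are illustrative, not from any assessment.

```python
import numpy as np

def logistic_selectivity(age, a50, a95):
    """Asymptotic selectivity: 50% selected at a50, 95% at a95."""
    return 1.0 / (1.0 + np.exp(-np.log(19.0) * (age - a50) / (a95 - a50)))

def dome_selectivity(age, peak, sd_asc, sd_desc):
    """Dome-shaped selectivity: two half-normals joined at the peak age."""
    sd = np.where(age <= peak, sd_asc, sd_desc)
    return np.exp(-0.5 * ((age - peak) / sd) ** 2)

ages = np.arange(0, 21)
print(np.round(logistic_selectivity(ages, a50=5.0, a95=9.0), 2))
print(np.round(dome_selectivity(ages, peak=8.0, sd_asc=2.5, sd_desc=4.0), 2))
```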
Some of the key questions-III
Given time-varying selectivity:
• Blocked or unblocked?
• Which parameters of the selectivity function (or all) should change?
[Figure: schematic selectivity surfaces over age and time, with the age-at-50% selectivity varying annually vs in five-year blocks]
Caveat – Can selectivity be estimated anyway-I?
Selectivity is confounded with:
• Trends in recruitment (with time)
• Trends in natural mortality (with age / time)
Caveat – Can selectivity be estimated anyway-II?
[Figure: two schematic age compositions. Left: low recruitment? low selectivity? Right (high F): declining recruitment? declining selectivity?]
Caveat – Can selectivity be estimated anyway-III?
Fit of various selectivity-related models to a theoretical age-composition.
[Figure: (a) catch and (b) numbers against year, with curves for a selectivity trend with age, an M trend with age, and a recruitment trend with time]
Caveat – Can selectivity be estimated anyway-IV?
The Solution: MAKE ASSUMPTIONS:
• Natural mortality is time- and age-invariant
• Selectivity follows a functional form
• Selectivity is non-parametric, but there are penalties on changes in selectivity with age / length
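A minimal sketch of the penalty idea in the third bullet: freely estimated selectivity-at-age with a penalty on second differences, so the curve changes smoothly with age. The penalty weight `lam` is an illustrative assumption, not a recommended value.

```python
import numpy as np

def smoothness_penalty(sel, lam=10.0):
    second_diff = np.diff(sel, n=2)        # curvature at each interior age
    return lam * np.sum(second_diff ** 2)  # added to the negative log-likelihood

sel = np.array([0.05, 0.20, 0.50, 0.90, 1.00, 0.95, 0.90, 0.85])
print(smoothness_penalty(sel))
```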
Example Stocks
• Pink ling
• Pacific sardine
Example Stocks (fleet structure)
[Figure: maps of the west coast of North America showing Pacific sardine fleet definitions — (a) 2011 assessment: MexCal and PNW fleets; (b) 2010 assessment: ENS, SCA, CCA and PNW fleets]
Example Stocks (fleet structure)
[Figure: map of southern Australia showing pink ling fishing zones (10–80)]
Pink Ling: one fleet or many?
Fleets:
• Trawl vs Non-trawl
• Zones 10, 20, 30
• Onboard vs port samples
Sensitivity to Assumptions
[Figure: estimated spawning biomass trajectories. (a) Pacific sardine SSB ('000 t), 1995–2010: base case; no time-varying selectivity; MexCal fleets with the same selex; all fleets with the same selex. (b) Pink ling SSB (t), 1970–2010: base case; all trawl mirrored; all mirrored; spatially aggregated; all asymptotic]
Largest impacts:
• Is selectivity time-varying or static?
• Number of fleets / treatment of spatial structure
• Is selectivity asymptotic or dome-shaped?
Selection of Fleets
Definition:
• Ideally – a group of vessels fishing in the same spatio-temporal stratum, using the same gear, and with the same targeting practices
• In practice – depends on data availability, computational resources, model stability, and trends in monitored data
Fleets as areas-I
It is common to represent “space” by “fleets” (e.g. pink ling):
• what does this assume?
• does it work?

Key Assumptions:
• The population is fully mixed over its range
• Differences in age / length compositions are due to differences in selectivity
Fleets as areas-II (does it work?)
In theory “no” – in practice “perhaps”!
Cope and Punt (2011) Fish Res. 107: 22-38
Clearly, the differences in length and age structure among regions are due to differences in population structure, not selectivity! Self-evidently, then, the approach is wrong.

Simulations suggest that treating fleets as areas can reduce bias (Hurtado-Ferro et al.), but that spatial models may perform better (if the data exist – and perhaps not). But M probably isn’t age- and time-invariant either!
The State of the Art (as I see it)
• Disaggregate data when including them in any assessment (it is easy to aggregate the data when fitting the model).
• Test for fleet structure early in the model development process.
• Apply clustering-type methods to combine areas / gear types (not statistical tests, which will lead to 100s of fleets); see the sketch below.
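A minimal sketch of the clustering idea: summarize each candidate stratum (area × gear) by its mean length composition and merge similar strata into fleets. The compositions and the distance cutoff are illustrative only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

comps = np.array([
    [0.10, 0.30, 0.40, 0.20],   # e.g. trawl, zone 10
    [0.12, 0.28, 0.41, 0.19],   # e.g. trawl, zone 20 (similar -> same fleet)
    [0.40, 0.35, 0.20, 0.05],   # e.g. non-trawl, zone 10 (different)
])

tree = linkage(pdist(comps), method="average")        # hierarchical clustering
fleets = fcluster(tree, t=0.1, criterion="distance")  # cut the tree at a distance
print(fleets)   # strata sharing a label are candidates for a single fleet
```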
Residual Analysis
In principle this is easy:
• Plot the data
• Compute some statistics
• Compare alternative assumptions…

[Figure: EBS Tanner crab example]
• We know how to do this for index data (well)
• It gets trickier for compositional data (and hence selecting functional forms for selectivity)
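A minimal sketch of one such statistic: Pearson residuals for one year of multinomial composition data. Values well outside [-2, 2], or long runs of one sign across ages, point to a mis-specified selectivity form. The observed / expected proportions and sample size below are illustrative.

```python
import numpy as np

def pearson_residuals(obs, exp, n):
    return (obs - exp) / np.sqrt(exp * (1.0 - exp) / n)

obs = np.array([0.10, 0.25, 0.35, 0.20, 0.10])
exp = np.array([0.15, 0.25, 0.30, 0.20, 0.10])
print(np.round(pearson_residuals(obs, exp, n=100), 2))
```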
[Figure: fits to length data for pink ling, aggregated across time by fleet (sexes combined, retained), when selectivity is assumed to be independent of zone. Panels show proportion against length (cm) for the Trawl and NonTrawl fleets by zone and the Kapala survey, each annotated with the input (N) and effective (effN) sample sizes; effN is generally far below N]

[Figure: a second set of fits under an alternative configuration, in which the (reweighted) input and effective sample sizes are in much closer agreement]
BUT!
• Evaluating mis-specification for compositional data is usually not this easy:
  – The fit may be correct “on average” but there are clear problems.
  – It may not be clear whether the model is mis-specified.
Is this acceptable? And this?
BUT!
• Evaluating mis-specification for compositional data is usually not this easy:
  – The fit may be correct “on average” but there are clear problems.
  – It may not be clear whether the model is mis-specified.
• Comparing time-varying and static selectivity can be even more challenging because it depends on how much selectivity can vary [Maunder and Harley identify an approach based on cross-validation to help with this; a toy illustration follows].
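A toy, self-contained illustration of the cross-validation idea (not Maunder and Harley's actual implementation): leave out one year of composition data at a time, predict it from the remaining years, and score the prediction by its multinomial log density (constants dropped). The data are simulated and the "static" predictor is a stand-in for a fitted model's expected compositions.

```python
import numpy as np

rng = np.random.default_rng(1)
true_p = np.array([0.2, 0.3, 0.3, 0.2])
comps = rng.multinomial(100, true_p, size=10) / 100.0   # 10 years of comps

def log_pred_density(p_hat, obs, n=100):
    counts = np.round(obs * n)
    return np.sum(counts * np.log(np.clip(p_hat, 1e-8, 1.0)))

score = 0.0
for year in range(comps.shape[0]):              # leave-one-year-out
    train = np.delete(comps, year, axis=0)
    p_hat = train.mean(axis=0)                  # "static selectivity" stand-in
    score += log_pred_density(p_hat, comps[year])
print(score)   # repeat for a time-varying model and compare the scores
```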
Using profiles to identify mis-specification
[Figure: profiles of the difference in log-likelihood against logR0, by fleet — (a) spatially-disaggregated, (b) spatially-aggregated]
Plot the negative log-likelihood [compositional data only] for each fleet to identify fleets whose compositional data are “unduly” informative.

Note fleets 2 and 13 (left) and 2 and 5 (right): fleet 13 (a) and fleet 5 (b) are the same fleet and have only two length-frequencies… Should we learn this much?
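A minimal sketch of the bookkeeping behind such a plot. In practice each row of `profile_nll` comes from refitting the assessment with logR0 fixed at a grid value and recording each fleet's compositional negative log-likelihood; the random array below is a stand-in, and the 2.0-unit flag is an illustrative threshold.

```python
import numpy as np

logR0_grid = np.linspace(0.5, 1.5, 9)
rng = np.random.default_rng(2)
profile_nll = np.cumsum(rng.normal(0.0, 1.0, size=(9, 5)), axis=0)  # stand-in

rel = profile_nll - profile_nll.min(axis=0)   # each fleet relative to its minimum
influential = rel.max(axis=0) > 2.0           # fleets that move the profile a lot
print(np.where(influential)[0])               # candidate fleets to re-examine
```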
Automatic Residual Analysis
Punt & Kinzey: NPFMC crab modelling workshop

Two-sample Kolmogorov-Smirnov test applied to artificial data sets.
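A minimal sketch of the test: compare the distribution of residuals from the fit to the real data against residuals from fits to data sets simulated under the model. Both residual vectors below are random stand-ins.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
resid_real = rng.normal(0.3, 1.2, size=200)    # stand-in: fit to real data
resid_sim = rng.normal(0.0, 1.0, size=2000)    # stand-in: fits to simulated data

stat, pvalue = ks_2samp(resid_real, resid_sim)
print(stat, pvalue)   # a small p-value flags possible mis-specification
```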
The State of the Art (as I see it)-I
• Always:
  – examine plots of residuals
  – compare expected effective sample sizes with input values
• But:
  – Viewing plots of residuals can be difficult
  – How to define / test for time-varying selectivity is tough
  – Residual patterns in fits to compositions need not be due to choices related to selectivity
  – There is no automatic approach for evaluating residual plots for compositional data
  – No testing of methods based on residual plots has occurred (yet?)
The State of the Art (as I see it)-II
[Figures: aggregated compositions; observed vs expected compositions]
Model Selection

No one would say that model selection (and model averaging) are not part of the tool box of analysts, BUT do we know how well they work for stock assessment models?

Model selection methods used:
• Maximum likelihood: F-tests / likelihood ratio tests; AIC, BIC, AICc
• Bayesian: DIC
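A minimal sketch of the criteria above, computed from a maximized log-likelihood lnL, parameter count k, and sample size n. For compositional data the appropriate n is itself the effective-sample-size question raised later; the numbers are made up (e.g. static vs time-varying selectivity).

```python
import numpy as np

def aic(lnL, k):
    return 2 * k - 2 * lnL

def aicc(lnL, k, n):
    return aic(lnL, k) + 2 * k * (k + 1) / (n - k - 1)

def bic(lnL, k, n):
    return k * np.log(n) - 2 * lnL

static = (-1250.0, 35, 400)   # (lnL, k, n), static selectivity (made up)
tv     = (-1230.0, 60, 400)   # time-varying selectivity (made up)
for (lnL, k, n), label in [(static, "static"), (tv, "time-varying")]:
    print(label, aic(lnL, k), aicc(lnL, k, n), bic(lnL, k, n))
```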
Examples of Model Selection
• AIC:
  – Butterworth et al. [2003]: is selectivity for southern bluefin tuna time-varying?
  – Butterworth & Rademeyer [2008]: is selectivity for Gulf of Maine cod dome-shaped or asymptotic?
• DIC:
  – Bogards et al. [2009]: is selectivity for the North Sea spatially-varying or not?
Examples of Model Selection (Issues)
• AIC, BIC and DIC are too subtle:
  – Often fits for two models are negligibly different “by eye”, but highly “statistically significant” (ΔAIC > 200).
• All these metrics depend on getting the likelihood “right”, in particular the effective sample sizes for the compositional data.
Model Selection and weights
[Figure: scatterplot of example data with candidate model fits]

So which model fits the data best? And if we accidentally copied the data file twice?
Effective Sample Sizes-I
Many assessments:
• Pre-specify effective sample sizes (EffNs)
• Use the “McAllister-Ianelli” approach

But residuals are seldom independent.

An alternative is Chris Francis’ approach, but that may fail when there is time-varying selectivity.
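A minimal sketch of the "McAllister-Ianelli" effective sample size for one year of composition data: the ratio of the variance expected under multinomial sampling to the observed squared residuals. Inputs are illustrative.

```python
import numpy as np

def mcallister_ianelli_effn(obs, exp):
    return np.sum(exp * (1.0 - exp)) / np.sum((obs - exp) ** 2)

obs = np.array([0.10, 0.25, 0.35, 0.20, 0.10])
exp = np.array([0.15, 0.25, 0.30, 0.20, 0.10])
print(mcallister_ianelli_effn(obs, exp))   # compare against the input N
```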
Effective Sample Sizes-II
• Maunder [2011] compared various likelihood formulations including:
  – Multinomial
  – Fournier et al. (with observed rather than expected proportions)
  – Punt-Kennedy (with observed proportions)*
  – Dirichlet
  – Iterative (essentially the “McAllister-Ianelli” method)
  – Multivariate normal
* Punt-Kennedy form:

$$ \ln L(\text{Data}\,|\,\theta) \;=\; -\sum_{t,a}\left[\ln \sigma_{t,a} \;+\; \frac{\left(\ln P_{t,a} - \ln \hat{P}_{t,a}\right)^2}{2\sigma_{t,a}^2}\right], \qquad \sigma_{t,a}^2 \propto 1/P_{t,a} $$

(the variance parameter is estimable, yielding an estimated effective sample size)
AIC, BIC and Random Effects
Most (almost all) assessments use an “errors in variables” formulation of the likelihood function:

$$ \hat{\theta}, \hat{\eta} \;=\; \underset{\theta,\eta}{\operatorname{argmax}}\; L(D\,|\,\theta,\eta)\,P(\eta\,|\,\theta) $$

rather than the correct (marginal) likelihood:

$$ \hat{\theta} \;=\; \underset{\theta}{\operatorname{argmax}} \int L(D\,|\,\theta,\eta)\,P(\eta\,|\,\theta)\,d\eta $$

How this impacts the performance of model selection methods is unknown.
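For reference, one standard route to that marginal likelihood for random effects is the Laplace approximation (used, e.g., in ADMB's random-effects module); a sketch:

$$ \ln \int L(D\,|\,\theta,\eta)\,P(\eta\,|\,\theta)\,d\eta \;\approx\; \ln L(D\,|\,\theta,\hat{\eta}) + \ln P(\hat{\eta}\,|\,\theta) + \tfrac{1}{2}\ln\det\left(2\pi H^{-1}\right) $$

where $\hat{\eta}$ maximizes the joint log-likelihood for fixed $\theta$, and $H$ is the negative Hessian of the joint log-likelihood with respect to $\eta$ at $\hat{\eta}$.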
The State of the Art (as I see it)

• AIC, BIC, and DIC are commonly used.
• But:
  – Do we need an analogue to the “1% rule”, as is the case for CPUE standardization?
  – We need to get the effective sample sizes right! Using a likelihood function for which the effective sample size can be estimated is a good start!
  – Performance also depends on the treatment of random effects (recruitment, selectivity).
• What is the value of looking at retrospective patterns? Can we identify when the cause of a retrospective pattern is definitely selectivity?
Simulation Testing
[Diagram: operating models generate data; assessment methods 1…n are applied, with and without model selection, and performance measures are computed for each]
Simulation Testing
• Caveats before we start:
  – Simulations are only as good as the operating model:
    • Most simulation studies assume that the likelihood function is known (as is M)
    • Few simulation studies allow for over-dispersion
    • No simulation studies simulate the “meta” aspects of stock assessments (such as how fleets are selected)
  – Avoid too many generalizations – most properties of estimators will be case-specific
Overdispersion?

How often do the data generated in simulation studies look like this?
How much does it matter?
Overview of Broad Results
• Getting selectivity assumptions wrong matters! HOWEVER, other factors (data quality, contrast, M) may be MORE important.
• Estimating time-varying selectivity when selectivity is static is safer than ignoring it when selectivity is time-varying.
• Model selection methods can discriminate among selectivity functions very well (do I really believe this – why then does it seem so hard in reality?)
The State of the Art (as I see it)
• The structure of most (perhaps all) operating models is too simple and leads to simulated data sets looking “too good”.
  – André’s suggestion: if you show someone 99 simulated data sets and the real data set, could they pick it out?
• Future simulation studies should:
  – Include model and fleet selection.
  – Focus on length-structured models.
  – Examine whether selectivity is length- or age-based.
Final Thoughts
• Methods development
  – Non-additive models?
  – State-space models?
• Residuals and model selection
  – Weighting philosophy
• Simulation studies
  – Standards for what constitutes a “decent” operating model?
  – Compare methods for implementing time-varying selectivity (blocked vs annual)
  – Consider length-structured models
Final Thoughts
• Ignore “space” at your peril!
• What about model mis-specification in general?
Final Points to Ponder!
• Should guidelines be developed for when to:
  – downweight compositional data rather than modelling time-varying selectivity
  – fix selectivity and not estimate it!
  – use retrospective patterns in model selection / bootstrapping
  – conduct model selection when the selectivity pattern is “non-parametric”
  – apply time-varying selectivity
  – trump AIC, BIC and DIC using “by eye” residual patterns
• Model selection
• Fixing / estimating sigma
Questions?
Support for this paper was provided by NOAA:
• The West Coast Groundfish project
• Development of ADMB libraries
• Simulation testing of assessment models