
For Peer Review

Abundance estimation in the presence of zero inflation and detection error using single visit data

Journal: Environmetrics

Manuscript ID: Draft

Wiley - Manuscript type: Research Article

Date Submitted by the Author:

n/a

Complete List of Authors: Solymos, Peter; University of Alberta, Department of Biological Sciences

Lele, S. Bayne, Erin

Keywords: Open populations, Conditional likelihood, Ecological Monitoring, Mixture models, Pseudo-likelihood

John Wiley & Sons

Environmetrics


Abundance estimation in the presence of zero inflation and detection error using single visit data

Péter Sólymos¹, Subhash Lele² and Erin Bayne³

¹Alberta Biodiversity Monitoring Institute, Department of Biological Sciences, University of Alberta, e-mail: solymos@ualberta.ca

²Department of Mathematical and Statistical Sciences, University of Alberta, e-mail: [email protected]

³Department of Biological Sciences, University of Alberta, e-mail: [email protected]

Running title: Abundance estimation using single visit data

Word count in the abstract: 142

Word count in the manuscript as a whole: 7035

Word count in the main text: 4821 (from Introduction to Acknowledgements)

Number of references: 30

Number of figures and tables: 3 figures, 2 tables

Address of correspondence: Péter Sólymos, Alberta Biodiversity Monitoring Institute, Department of Biological Sciences, CW 405, Biological Sciences Bldg., University of Alberta, Edmonton, Alberta, T6G 2E9, Canada, Phone: 780-492-8534, Fax: 780-492-7635, e-mail: [email protected]


Abstract

It is well established that population surveys are subject to detection error. Current methods to correct for detection error using mixture models require multiple visits to survey locations, and assume closed populations that do not change during the full survey period. We show that, contrary to popular belief, multiple visits are not necessary to correct for detection error. The parameters of the Binomial-zero inflated Poisson mixture model can be estimated using single visit data. The use of conditional likelihood leads to estimators that are more stable than the full likelihood based estimators used in multiple visit survey approaches. Our single visit method has several advantages: 1) it does not require the hard to satisfy closed population assumption; 2) it is cost effective, enabling ecologists to cover a larger geographical region than possible with multiple visit methods; and 3) the resultant estimators are statistically efficient.

Keywords: Closed populations, Conditional likelihood, Ecological Monitoring, Mixture models, Open populations, Pseudo-likelihood.


Introduction

Ecologists are fundamentally interested in understanding the environmental factors that influence variation in the size of populations. To understand variation in population size requires information on how the occurrence or abundance of species changes in time and space. Many ecologists rely on relative differences in occurrence or counts of the number of individuals observed to draw inferences about factors influencing populations (Krebs 1985). However, models that predict naïve estimates of occurrence (e.g. logistic regression) or abundance (e.g. Poisson regression) are known to underestimate true occurrence and abundance because of detection error. Detection error is the probability that a species (occurrence) or individual of a species (counts) is present during the period of observation but is not detected. The probability of detecting all individuals present in a survey area is rarely one (Yoccoz et al. 2001, Gu and Swihart 2004). Environmental factors that influence population change also affect probability of detection. Thus the issue of imperfect detection needs to be addressed if ecologists are to draw correct conclusions about factors influencing population change (MacKenzie et al. 2002, Tyre et al. 2003).

The last decade has seen an enormous growth in the statistical methodology to deal with the problem of detection error (MacKenzie et al. 2006). One approach that has been widely adopted is that of multiple visit surveys that use an N-mixture approach to estimate detection error from count data (Royle 2004). In the N-mixture approach, true abundance has typically been modeled using a Poisson or a Negative Binomial (NB) distribution, while detection error has been modeled as a Binomial observation process. True abundance rates in the Poisson or Negative Binomial model and detection probabilities of individuals in the Binomial model are commonly modeled as a function of habitat and survey-specific characteristics. By accounting for detection error in the observed counts, N-mixture models differentiate between two kinds of zeros: “false” zeros due to detection error where true abundance is greater than 0 but observed count is 0; and “true” zeros due to the state process where the true abundance is 0 and the observed count is also 0.

In many situations, a third type of zero can exist. When surveys take place on larger geographic scales, “true” zeros arise not only as zeros due to the Poisson or NB distribution but as a result of true zero-inflation (Martin et al. 2005). True zero-inflation can happen when a species’ range is only partly covered by the extent of the area sampled, the species is quite rare, or the distribution of individuals is highly aggregated. Joseph et al. (2009) proposed zero-inflated Poisson (ZIP) and zero-inflated NB (ZINB) mixture models to account for this third type of true zeros. They used Binomial-ZIP and Binomial-ZINB models with a multiple visit sampling approach to account for detection error in overdispersed counts.

The goal of multiple visit methodologies is to provide a more accurate estimator of true abundance, by considering detection error, than the naïve estimator that ignores detection error. However, there is growing evidence that multiple survey models overestimate true abundance in many situations (Joseph et al. 2009, Moreno and Lele 2010, Bayne et al. under review). For an N-mixture estimator to be an accurate estimator of the true abundance of a species, it is necessary that the population size does not change during the total duration of the repeated visits (closure assumption). Violations of closure can happen for non-sessile organisms due to dispersal or even daily movement (Joseph et al. 2009, Bayne et al. under review). One way to ensure closure is to decrease the time elapsed between successive visits; however, this can lead to the violation of the assumption of independent visits, which is also required for multiple visit approaches to provide accurate estimates of abundance.

Given the challenges inherent in meeting the assumptions of multiple visit N-mixture models for some common ecological situations, a new approach to dealing with detection error in count data is needed. In this paper, we show that detection error in abundance surveys can be corrected using only a single visit to a site, hence avoiding the assumption of closure. Our approach requires that covariates that affect abundance or detectability are available. Such covariates are commonly available in most biological studies. We show that the parameters of the Binomial-ZIP N-mixture model, which account for all three kinds of zeros, can be consistently and efficiently estimated based on a single visit to sites. We also show that abundance estimators based on a single visit are robust and ecologically sensible.

The Binomial-ZIP model

We consider the zero-inflated Poisson (ZIP) model for the true state. A hierarchical representation of the ZIP model is $(N_i \mid \lambda_i, A_i) \sim \mathrm{Poisson}(\lambda_i A_i)$, $(A_i \mid \phi) \sim \mathrm{Bernoulli}(1 - \phi)$, where $N_i$ is the population abundance at location $i$ ($i = 1, 2, \ldots, n$; $n$ is the total number of sites) and $\lambda_i$ is the rate parameter of the Poisson distribution when the species is present at location $i$. The probability that $A_i = 0$ is $\phi$; consequently, the probability that at least one individual is present is $(1-\phi)(1-e^{-\lambda_i})$. The $\phi = 0$ case corresponds to a Poisson model for the true state. The Poisson rate parameter can be modelled as a function of covariates using the log link function: $\log(\lambda_i) = X_i^T \beta$, where $\beta$ is a vector of regression coefficients including the intercept ($\beta_0$), and $X_i$ is the $i$th row of the covariate matrix, which has $n$ rows and as many columns as the number of variables in the model. Links other than the log-link for the Poisson model can be used.
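The hierarchical state model can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the function names, the seed, and the values of the rate and the zero-inflation probability are made up for the example, and a single common rate is used instead of site-specific covariates. It draws abundances from the ZIP model and compares the simulated share of occupied sites with the probability that at least one individual is present.

```python
import math
import random

def rpois(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method
    (adequate for the small rates used here)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_zip_state(lam, phi, n, seed=1):
    """Simulate N_i ~ Poisson(lam * A_i) with A_i ~ Bernoulli(1 - phi)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        a = 1 if rng.random() < 1.0 - phi else 0  # A_i = 0 with probability phi
        counts.append(rpois(lam, rng) if a == 1 else 0)
    return counts

def prob_present(lam, phi):
    """P(N_i > 0) = (1 - phi) * (1 - exp(-lam))."""
    return (1.0 - phi) * (1.0 - math.exp(-lam))

lam, phi = 2.13, 0.25  # illustrative values only
N = simulate_zip_state(lam, phi, n=20000)
empirical = sum(1 for x in N if x > 0) / len(N)
print(round(prob_present(lam, phi), 3), round(empirical, 3))
```

With a large number of simulated sites the two printed numbers should be close, which is a quick check on the occurrence formula.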


The observation process is modeled using the Binomial distribution as $(Y_i \mid N_i) \sim \mathrm{Binomial}(N_i, p_i)$, where $Y_i$ is the observed count at site $i$, and $p_i$ is the probability of detecting an individual given that the true abundance $N_i$ is greater than 0. The probability of detection can be modeled as a function of covariates using the logistic link function: $\mathrm{logit}(p_i) = Z_i^T \theta$, where $\theta$ is a vector of regression coefficients including the intercept ($\theta_0$), and $Z_i$ is a covariate vector defined similarly to $X_i$. One can use links other than the logistic link in the Binomial model. The covariate vectors $X_i$ and $Z_i$ can have common covariates, but need to include at least one covariate that is unique to either the abundance or the detection covariate vector.
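As a minimal sketch of the two link functions, the snippet below computes a Poisson rate from the log link and a detection probability from the logistic link for one site. The coefficient values, covariate values and function names are hypothetical and chosen only for illustration.

```python
import math

def poisson_rate(x, beta):
    """lambda_i = exp(x_i' beta); x includes a leading 1 for the intercept."""
    return math.exp(sum(xj * bj for xj, bj in zip(x, beta)))

def detection_prob(z, theta):
    """p_i = 1 / (1 + exp(-z_i' theta)); z includes a leading 1 for the intercept."""
    eta = sum(zj * tj for zj, tj in zip(z, theta))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients and covariates for a single site
beta = [0.8, 0.5]     # intercept and effect of an abundance covariate
theta = [-0.4, 1.0]   # intercept and effect of a detection covariate
site_x = [1.0, 0.3]
site_z = [1.0, -0.2]

lam_i = poisson_rate(site_x, beta)
p_i = detection_prob(site_z, theta)
# Expected observed count at an occupied site under Binomial thinning
print(lam_i, p_i, lam_i * p_i)
```

The product of the rate and the detection probability is the expected observed count at a site where the species is present, which is why the two sets of coefficients can be hard to separate without suitable covariates.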

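Parameter estimation

The likelihood function corresponding to the Binomial-ZIP mixture based on a single visit is:

$$L(\beta, \theta, \phi; y) = \prod_{i=1}^{n} \left[ \phi\, I(y_i = 0) + (1 - \phi) \sum_{N_i = y_i}^{\infty} \binom{N_i}{y_i} p_i^{y_i} (1 - p_i)^{N_i - y_i} \frac{e^{-\lambda_i} \lambda_i^{N_i}}{N_i!} \right],$$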

where $I(\cdot)$ is an indicator function. Because $N_i$ is unknown, the likelihood involves summation over all possible values of $N_i$. Direct maximization of this function leads to substantial confounding between the parameters. We have observed that the parameter $\phi$ and the intercept parameter $\theta_0$ in the detection model are especially confounded. To reduce this confounding, we divide the problem in two parts. In the first part, we condition on a sufficient statistic for the parameter $\phi$ and use the conditional distribution of the data given the sufficient statistics to form a conditional likelihood function (Anderson, 1970) for the parameters $(\beta, \theta)$. The conditional likelihood estimators are known to be consistent and asymptotically normal under fairly general conditions. To estimate $\phi$, we construct a new random variable $W_i = I(Y_i > 0)$. Then, we write the likelihood function for $(\beta, \theta, \phi)$ based on the distribution of $W_i$. This likelihood function does not involve infinite summation and hence is easy to maximise. Further, it is a concave function of $\phi$ and hence has a unique solution. Based on the idea of pseudo-likelihood described in Gong and Samaniego (1981), we fix the values of $(\beta, \theta)$ at their conditional likelihood based estimates and maximize the likelihood with respect to $\phi$ to obtain its estimate. The results in Gong and Samaniego (1981) show that this pseudo-likelihood estimator is consistent and asymptotically normal. The derivation of the conditional and pseudo-likelihood functions is described in the Appendix. We use the bootstrap procedure (Efron and Tibshirani 1994) to calculate confidence intervals for the estimated parameters. The software implementation is available in the statistical package ‘occupy’ (Sólymos and Moreno 2010) written in the free statistical software R (R Development Core Team 2009). To compare single visit conditional likelihood estimators with the multiple visits Binomial-ZIP, we follow Joseph et al. (2009) and maximize the full likelihood function.
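To make the single-visit likelihood more concrete, the following sketch evaluates the contribution of one site in two ways: by summing over the unknown abundance up to a large bound, and by using the standard Poisson thinning identity, under which the observed count at an occupied site is Poisson with rate equal to the Poisson rate times the detection probability. The closed form is a textbook result and is not taken from the Appendix, which is not reproduced here; the function names, the truncation bound and the parameter values are assumptions made for illustration.

```python
import math

def pois_pmf(k, rate):
    """Poisson probability mass, computed on the log scale for stability."""
    return math.exp(-rate + k * math.log(rate) - math.lgamma(k + 1))

def binom_pmf(y, n, p):
    """Binomial probability of y detections out of n individuals."""
    return math.comb(n, y) * p**y * (1.0 - p)**(n - y)

def site_lik_sum(y, lam, p, phi, n_max=200):
    """phi*I(y=0) + (1-phi) * sum over N of Binomial(y|N,p) * Poisson(N|lam),
    with the infinite sum truncated at n_max."""
    total = sum(binom_pmf(y, n, p) * pois_pmf(n, lam) for n in range(y, n_max + 1))
    return phi * (1.0 if y == 0 else 0.0) + (1.0 - phi) * total

def site_lik_closed(y, lam, p, phi):
    """Same quantity via Poisson thinning: Y | A=1 ~ Poisson(lam * p)."""
    return phi * (1.0 if y == 0 else 0.0) + (1.0 - phi) * pois_pmf(y, lam * p)

def prob_nonzero(lam, p, phi):
    """P(W = 1) = P(Y > 0) = (1 - phi) * (1 - exp(-lam * p))."""
    return (1.0 - phi) * (1.0 - math.exp(-lam * p))

lam, p, phi = 2.13, 0.25, 0.25  # illustrative values only
for y in range(4):
    print(y, site_lik_sum(y, lam, p, phi), site_lik_closed(y, lam, p, phi))
print(prob_nonzero(lam, p, phi))
```

The last function gives the success probability of the binary variable built from non-zero counts, which involves no summation at all; this is in line with the statement that the likelihood based on that variable is easy to maximise.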

We use probability plots to evaluate model fit under the Binomial-Poisson and Binomial-ZIP models. The model fit is adequate if the values of the empirical and fitted cumulative distribution function (CDF) fall along a line with intercept 0 and slope 1.

Simulation study

To study the properties of the estimation procedure described in the previous section, we performed several simulations. We considered the situations where the covariates that affect detection and abundance are distinct from each other and where some of the covariates are common, that is, we had covariates that affected both detection and abundance. Furthermore, we considered eight different scenarios corresponding to combinations of low ($\bar{\lambda} = 2.13$) vs. high abundance ($\bar{\lambda} = 5.25$), zero-inflated (φ = 0.25) vs. non zero-inflated data (φ = 0), and low ($\bar{p} = 0.25$) vs. high ($\bar{p} = 0.65$) detection probability (for more details, see Appendix S1 in Supporting Information).

We fitted the Binomial-ZIP mixture model to each simulated data set using 100, 300, 500, 700 and 1000 sites. Altogether we used 160 different cases (4 settings × 8 scenarios × 5 sample sizes) and ran 100 simulations for each. The average of the true abundances varied between 1.6 and 5.2, while the average of the observed counts varied between 0.4 and 3.4 depending on the parameter settings and the covariates used in the simulations. These settings represented a wide range of ecologically plausible situations.
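The size of the simulation design and the reported ranges can be checked with simple arithmetic. The sketch below reads 2.13 and 5.25 as mean Poisson rates and 0.25 and 0.65 as mean detection probabilities, and it multiplies means together, which ignores variation among sites caused by covariates. Both points are assumptions, so the result is only a rough consistency check.

```python
from itertools import product

settings = 4                                   # covariate settings
lam_bar = {"low": 2.13, "high": 5.25}          # assumed mean Poisson rates
phi = {"zero-inflated": 0.25, "not zero-inflated": 0.0}
p_bar = {"low": 0.25, "high": 0.65}            # assumed mean detection probabilities
sample_sizes = [100, 300, 500, 700, 1000]

scenarios = list(product(lam_bar, phi, p_bar))  # 2 x 2 x 2 combinations
n_cases = settings * len(scenarios) * len(sample_sizes)

rows = []
for a, z, d in scenarios:
    mean_true = (1.0 - phi[z]) * lam_bar[a]     # approximate mean true abundance
    mean_obs = mean_true * p_bar[d]             # approximate mean observed count
    rows.append((a, z, d, mean_true, mean_obs))

print(n_cases)
print(min(r[3] for r in rows), max(r[3] for r in rows))
print(min(r[4] for r in rows), max(r[4] for r in rows))
```

Under these assumptions the approximate extremes are close to the reported ranges of 1.6 to 5.2 for true abundance and 0.4 to 3.4 for observed counts.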

To compare the single visit results with those obtained from multiple surveys, assuming the closed population assumption is satisfied, we considered our worst case setup (common discrete covariate under all eight scenarios of combinations of abundance levels, zero inflation and detection probabilities). We assumed four independent visits to each location. We then compared the single visit n = 500 results with the 2 visits n = 250 results, and the single visit n = 1000 case with the 2 visits n = 500 and the 4 visits n = 250 results.

Abundance parameters (β) were consistently estimated as the sample size increased, and reliable estimates were obtained with n = 100 in most situations. Detection parameters (θ) were also consistently estimated as sample size increased. The zero inflation parameter φ was well
estimated even at small sample sizes. Predicted λ values were somewhat overestimated for n = 100; for larger sample sizes they were consistent with the true values. Predicted p values were consistent for all sample sizes. The correlations between the true and predicted λ and p values were high, ranging from 0.8 to 1 for sample sizes n = 300 and above. Even when the
data were simulated under no zero-inflation (φ = 0), the parameter φ was well estimated. Figure 1 represents the worst case scenario with a common discrete covariate for the abundance and detection models, and the low abundance – zero-inflated data – low detectability scenario. Even in this difficult situation, it is clear that the conditional likelihood method works well. A complete summary of the results obtained for the 160 cases is available in Appendix S1 in Supporting Information.

Simulation results for estimation based on multiple visits indicate that parameter estimates (especially intercepts β0, θ0 and φ) were quite biased and confounded when the worst case scenario was used (common discrete covariate, low abundance – zero-inflated data – low detectability). This resulted in biased estimates of mean abundance and detection probability. Comparatively, the single survey conditional likelihood estimator worked very well in this situation (Fig. 2). Under different simulation settings (i.e. high detectability), multiple visits estimates were not as biased as in this case (see Appendix S1 in Supporting Information). But even in these best case scenarios, single survey conditional likelihood estimators were more efficient than the often biased and inefficient estimators based on multiple visits (cf. Figs. 1 and 2). Sometimes, single survey estimators at a smaller sample size (n = 100) were better than even the multiple survey estimators based on larger sample sizes (n = 250 or n = 500). We believe this gain in small sample efficiency is because of the separation of the parameters (β, θ) and φ by the use of the conditional and pseudo-likelihood method.

Analysis of the Ovenbird data

For data analysis, we used two examples where a zero-inflation component was suspected. We used observed counts of Ovenbirds (Seiurus aurocapilla) to illustrate the estimation of the parameters for the Binomial-ZIP model. Data were collected in 1999 using Breeding Bird Survey (BBS) Protocols (Downes and Collins 2003) in the boreal plains eco-region of Saskatchewan. The goal of the study was to determine whether the occupancy of this species was influenced by the amount of forest around each survey point. Data were collected along 36 BBS routes each consisting of 50 survey locations, with survey locations separated by 800 meters. To increase independence of observations we used every second survey point along each route in our analysis (n = 891 survey locations). Attributes about the forest type and amount of forest remaining within a 400 meter radius were estimated from the Saskatchewan Digital Land Cover Project (MacTavish 1995). The same data set was used in Lele et al. (2010, manuscript) for studying single survey based estimation of site occupancy of the species.

The habitat requirements of the Ovenbird are well understood in the boreal forest (Hobson and Bayne 2002) and we expected that Ovenbird abundance would be positively influenced by the amount of forest, or deciduous forest, remaining and negatively by the amount of agricultural land. The zero-inflation component is likely to be present because of the marked difference in habitat suitability for the species along the agricultural area gradient. We also included latitude-longitude, as the study covered an east-west gradient over 1000 kilometers and a north-south gradient 400 km in length, although a priori we were not sure what effect this would have on abundance.

We expected four factors to influence detection probability: observer, time of day, time of year, and amount of forest. Observers differ in their ability to hear birds in part due to skill but also due to fundamental differences in the distance over which they hear things. In general, male songbirds sing very regularly early in the breeding season, making it easy to detect individuals that are present. As the breeding season progresses, however, males spend less time singing as they focus on other activities. This often results in lower detectability later in the breeding season. We included Julian date as a variable influencing detection error. Male songbirds also have a tendency to sing earlier in the day, shortly after sunrise, and then later in the morning focus on guarding the mate or foraging. To account for this, we included time of the day as a factor influencing detectability. Detectability can also be influenced by habitat attributes. In more open environments where forest loss has occurred it is plausible that birds can be heard from long distances, increasing the likelihood an individual is detected (Schieck 1997). Alternatively, in areas with more forest the chance of multiple males singing may be higher, increasing detection probability relative to areas with less forest where only one individual may exist.

Proportional covariates (ranging from 0-1) were logit transformed and all covariates were scaled to unit variance and centered. We performed backward stepwise model selection starting with the full model including all abundance and detection covariates, and dropped insignificant terms until all remaining terms were significant. Then we compared the models based on Akaike’s Information Criterion (AIC) to select the final model. We calculated 90% confidence limits based on 100 bootstrap samples.
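The bootstrap step can be sketched as follows. The text does not say which kind of bootstrap interval was used, so the percentile interval below is an assumption, and the data, the estimator (a simple mean) and the function names are placeholders for illustration; in the actual analysis the estimator would be the model fit to each resampled data set.

```python
import random

def percentile(sorted_vals, q):
    """Linear-interpolation percentile for q in [0, 1] on a pre-sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac

def bootstrap_ci(data, estimator, n_boot=100, level=0.90, seed=1):
    """Percentile bootstrap limits: resample sites with replacement,
    re-estimate, and take the tail quantiles of the estimates."""
    rng = random.Random(seed)
    n = len(data)
    stats = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        stats.append(estimator(resample))
    stats.sort()
    alpha = (1.0 - level) / 2.0
    return percentile(stats, alpha), percentile(stats, 1.0 - alpha)

def mean(xs):
    return sum(xs) / len(xs)

# Made-up counts standing in for site-level data
counts = [0, 0, 0, 1, 0, 2, 0, 0, 3, 1, 0, 0, 4, 0, 1, 0, 2, 0, 0, 1]
low, high = bootstrap_ci(counts, mean)
print(low, high)
```

With only 100 resamples the limits are fairly noisy, so a larger number of resamples may be preferable when computing time allows.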

We fitted the Binomial-ZIP mixture model to the single visit Ovenbird data set. We started with the full model including habitat characteristics and geographic coordinates for the abundance model, and observer, Julian day and time of day for the detection model. Proportion of forest area was used in both the abundance and detection model, because it was a priori assumed to influence both processes (model 1; Table 1). We started by simplifying the detection model first. We dropped the time of day, because that term was not significant based on a Wald test (model 2). All remaining terms in the detection model were significant. Then we started dropping terms from the abundance model. After eliminating non-significant variables (proportion of deciduous forest and agricultural area, which were correlated with proportion of forest area), we found the best fit model (model 4), which could not be further simplified without an increase in the AIC value. The AIC value corresponding to the Binomial-Poisson mixture with the same covariates as model 4 was 1456.7. This is much higher than the AIC value of 860.7 for the Binomial-ZIP model. Aside from the better AIC value, the probability plot clearly shows that the Binomial-ZIP model fit is better than that of the Binomial-Poisson model (Fig. 3B).

Proportion of forest area had a significant positive effect on Ovenbird abundance. Geographic coordinates were not significant predictors of abundance as the confidence intervals overlapped zero. However, their inclusion based on AIC suggested there was a spatial pattern that explained some of the variation in Ovenbird abundance. Ovenbird abundance increases as one goes further north and east in the study area. Observer effects were pronounced. Julian date had a significant negative effect on detectability of individuals, probably because of decreased singing activity later in the season. Time of day had a negative relationship (in the full model) with detectability. This is concordant with the singing behaviour of the males but the effect was not significant. Proportion of forest area had a significant negative effect on detectability. This indicates that individuals are more detectable in open habitats, in spite of lower abundances in such habitats. The zero-inflation component was 0.41, and the average probability of Poisson zeros ($P(N = 0) = \mathrm{mean}\{(1-\phi) e^{-\lambda_i}\}$) was 0.15 (Table 1, Fig. 3A). The probability of occurrence ($\mathrm{mean}\{(1-\phi)(1-e^{-\lambda_i})\}$) was 0.44 and predicted mean abundance for the entire study area was $(1-\phi)\bar{\lambda} = 1.54$. This translates into 11.36 birds per point count station at point count stations where the entire area was forested (100% forest cover). Mean probability of detection of individual Ovenbirds was 0.51.
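The three reported probabilities can be cross-checked against each other, because under the ZIP state model every site is either a zero-inflation zero, a Poisson zero, or occupied. The short sketch below takes the Poisson-zero probability to be the mean of (1 − φ)e^{−λi} and the mean abundance to be (1 − φ) times the mean Poisson rate; both readings, and the variable names, are assumptions made for this check.

```python
phi = 0.41              # zero-inflation probability
p_poisson_zero = 0.15   # mean of (1 - phi) * exp(-lambda_i)
p_occurrence = 0.44     # mean of (1 - phi) * (1 - exp(-lambda_i))

# The three state probabilities should sum to one
total = phi + p_poisson_zero + p_occurrence

# Mean Poisson rate implied by the reported mean abundance of 1.54
mean_abundance = 1.54
implied_lam_bar = mean_abundance / (1.0 - phi)

print(round(total, 2), round(implied_lam_bar, 2))
```

The probabilities add up to one, and the implied mean rate at sites outside the zero-inflated part is about 2.6 birds per station.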

Given that the Breeding Bird Survey uses an unlimited sampling distance to count birds, absolute density cannot be directly estimated from the Ovenbird example. However, Rosenberg and Blancher (2004), as part of the Partners in Flight planning process, estimated that the maximum distance over which Ovenbirds could be heard was 200 metres. Using this as the area sampled by BBS counts, our mean count when a point count station has 100% forest cover converted to a density of 0.904 male Ovenbirds per hectare. This is very close to the density estimate of 0.99 (95% confidence limits (CL): 0.85-1.12) found by Bayne (2000), who mapped the territories of color-banded male Ovenbirds and determined absolute density in the same region.
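The conversion from a count per station to a density per hectare is a short calculation. The sketch below assumes the sampled area is a circle with a 200 metre radius around the station; the variable names are chosen for illustration.

```python
import math

radius_m = 200.0           # maximum hearing distance for Ovenbirds
birds_per_station = 11.36  # predicted count at 100% forest cover

# Area of the sampled circle in hectares (1 ha = 10,000 square metres)
area_ha = math.pi * radius_m**2 / 10_000.0
density_per_ha = birds_per_station / area_ha

print(round(area_ha, 2), round(density_per_ha, 3))
```

The circle covers about 12.57 hectares, which gives the 0.904 birds per hectare quoted in the text.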

Lele et al. (2010, manuscript) used the same Ovenbird data set but treated it as detection/non-detection data to estimate site occupancy and detectability based on a single visit. They found that proportion of forest positively influenced the probability of detecting a species. This finding is in contrast with our results but is easy to resolve. The probability of detecting a species (at least one individual) is highest where population density is higher (i.e. in forest). But the probability of detecting an individual can be higher in open areas where sound can travel greater distances than in more forested landscapes. Lele et al. (2010, manuscript) found the average detection probability to be 0.49 (90% CL: 0.38-0.61). This coincides well with our mean detectability estimate of 0.51. Lele et al. (2010, manuscript) found the mean probability of occurrence to be 0.5 (90% CL: 0.41-0.64). Based on the abundance data, the average probability of occupancy was 0.44, which is within the confidence limits of their estimates. By using the abundance data instead of the detection/non-detection transformation of it, we were able to differentiate between zero-inflation, Poisson and non-detection zeros. Using occupancy, only the distinction between non-detection (false) and true zeros is possible, which for many species does not provide a complete picture.
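The difference between detecting a species and detecting an individual can be shown with a small calculation. If individuals are detected independently with the same probability, as the Binomial observation model implies, the probability of detecting at least one of them grows with abundance. The numbers below are hypothetical and only illustrate the direction of the effect.

```python
def prob_detect_species(p, n):
    """P(at least one of n individuals is detected) = 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p) ** n

# Hypothetical sites: higher individual detectability but one bird in the open,
# lower individual detectability but five birds in forest
open_site = prob_detect_species(p=0.6, n=1)
forest_site = prob_detect_species(p=0.4, n=5)

print(open_site, round(forest_site, 5))
```

In this example the forest site has the lower individual detection probability but the higher species detection probability (about 0.92 against 0.6), which is the pattern that reconciles the two sets of results.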

Analysis of the Mallard data

To compare multiple and single visit based estimates, we used the data set reported in Kéry et al. (2005) for Mallards (Anas platyrhynchos) from the Swiss monitoring program for common breeding bird species. This species is easy to detect, has a narrow local distribution and low abundance in wetland habitats in Switzerland. The dataset contained n = 235 sites with 1-3 visits to the sites (2 sites had only one visit, 42 sites had 2 visits and 191 sites had 3 visits). Sites represented 1 km² quadrats distributed in a grid across Switzerland. Territory mapping was carried out along quadrat specific routes by experienced observers. The data set included route length, elevation and forest cover for the sites, and date and survey effort for each visit to the sites. All variables were scaled to unit variance and centered in the original data set.

We fitted a multiple-visit Binomial-ZIP model to the Mallard data set, and also fitted single-visit Binomial-ZIP models to each visit separately. We also fitted the naïve ZIP model (without detection error) using the ‘pscl’ R package (Zeileis et al. 2008), based on the maximum counts per site over all 3 visits. We calculated 90% confidence limits based on 100 bootstrap samples.
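As a rough illustration of how bootstrap confidence limits of this kind can be obtained, the following Python sketch resamples sites with replacement and takes percentile limits. The text does not state which bootstrap variant was used, so the percentile method, the function names and the toy counts are all assumptions made for illustration; a simple mean stands in for the model-based estimator.

```python
import random
import statistics

def percentile(sorted_values, q):
    """Linear-interpolation percentile of an already sorted list (0 <= q <= 1)."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight

def bootstrap_limits(site_counts, estimator, n_boot=100, level=0.90, seed=1):
    """Percentile bootstrap limits: resample sites with replacement, re-estimate."""
    rng = random.Random(seed)
    n = len(site_counts)
    estimates = sorted(
        estimator([site_counts[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    tail = (1 - level) / 2
    return percentile(estimates, tail), percentile(estimates, 1 - tail)

# Hypothetical counts, not the Mallard data: mostly zeros with a few birds.
counts = [0] * 80 + [1] * 12 + [2] * 5 + [3] * 3
low, high = bootstrap_limits(counts, statistics.mean)
```

In an actual analysis the `estimator` argument would refit the model to each resampled data set and return the parameter of interest.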

Kéry et al. (2005) used these data to fit Binomial-Poisson and Binomial-NB N-mixture models. Out of the 235 sites, only 39 had non-zero counts for at least one visit, and only 15 of these had maximum counts larger than 1. This indicated the possibility of zero inflation prior to any analysis. The AIC value (509.2) corresponding to the Binomial-Poisson mixture (Kéry et al. 2005) was substantially higher than the AIC value (-393.9) for the multiple-visit Binomial-ZIP model (Table 2).

As the Binomial-ZIP model fits the data better than the Binomial-Poisson model, we compared the naïve ZIP and the Binomial-ZIP models based on single versus multiple visits. The naïve estimate of Mallard mean abundance ($(1-\varphi)\bar{\lambda}$) using the ZIP model (without detection error) was 0.36 per km², and the probability of occurrence ($\mathrm{mean}\{(1-\varphi)(1-e^{-\lambda_i})\}$) was 0.21 (Table 2). Based on the multiple-visit Binomial-ZIP model, mean abundance was 4.32 per km² with probability of occurrence 0.17. There were no Poisson zeros due to the high $\bar{\lambda}$ value. We then fitted the single-visit Binomial-ZIP model to each visit separately. The mean abundances were 2.57, 0.58 and 0.39 per km², respectively, indicating a negative trend in abundance over the 3 visits. This was accompanied by only a slight change in the probability of occurrence ($\mathrm{mean}\{(1-\varphi)(1-e^{-\lambda_i})\}$; 0.24, 0.14, 0.17 for the three visits), but the probability of Poisson zeros increased substantially at the third visit (from 0.03 to 0.14), indicating a drop in abundance. Given the difference between predicted abundance values based on the multiple-visit and single-visit approaches, and given the trend in the single-visit estimates, we suspect that the closed population assumption is violated for this data set.
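The summary quantities discussed above can be computed from φ and the site-level rates λ_i as in the following Python sketch. The function name and the toy rates are hypothetical. The Poisson-zero share is taken here as mean{(1-φ)e^{-λ_i}}, which is an assumption inferred from the reported zero-inflation, Poisson-zero and occurrence values summing to approximately one.

```python
import math

def zip_summaries(phi, lambdas):
    """Site-averaged summaries of a zero-inflated Poisson abundance model."""
    n = len(lambdas)
    mean_lambda = sum(lambdas) / n
    # Assumed definition: share of sites that are suitable but hold no individuals.
    poisson_zero = sum((1 - phi) * math.exp(-lam) for lam in lambdas) / n
    return {
        "mean_abundance": (1 - phi) * mean_lambda,
        "occurrence": sum((1 - phi) * (1 - math.exp(-lam)) for lam in lambdas) / n,
        "poisson_zero": poisson_zero,
        "zero_inflation": phi,
    }

# Hypothetical site-level rates, chosen only for illustration.
site_lambdas = [0.5, 1.0, 2.0, 4.0]
summaries = zip_summaries(phi=0.4, lambdas=site_lambdas)
```

As a check against the reported multiple-visit values, (1 - 0.830) × 25.364 is about 4.31, which agrees with the reported mean abundance of 4.32 up to rounding of φ.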

Kéry et al. (2005), who originally analyzed the Mallard data using the multiple-visit approach, estimated mean abundance under the Binomial-NB model to be 0.43 per km². They argued that this was an inaccurate estimate, given that the observed average density (with no detection error) was 0.41, and interpreted the small difference as indicating that three visits were insufficient for detecting all territories. Based on the comparison of Binomial-Poisson and Binomial-ZIP AIC values, the alternative explanation is that the data are zero inflated, with a very high probability of zero-inflation zeros (0.69-0.83). Given that the zero-inflation model represents the data better, and that the wetlands this species prefers are inherently patchily distributed, the non-zero-inflated Poisson or NB model is hard to interpret ecologically without other habitat characteristics. Using the multiple-visit Binomial-ZIP likelihood estimates, average abundance was an order of magnitude higher (4.3) than the estimate of Kéry et al. (2005). The multiple-visit method estimated a very low average probability of detection (0.016), which resulted in the high abundance estimate. The single-visit Binomial-ZIP estimates were lower and showed a decrease in mean abundance (2.6, 0.6, 0.4 for the three visits) over the 3-month interval of the visits (15 April to 15 July). Given the 3-month time span of the repeated visits, and that Mallards are not strongly territorial, the closed population assumption is likely violated. The decreasing trend during the breeding season is somewhat surprising, but might be explained by movement out of sites to non-breeding wetlands prior to migration.

Discussion

The N-mixture models that account for detection error in wildlife studies represent an important class of models. However, it is widely believed that correcting for detection error requires temporal replication (e.g. Royle et al. 2005). Our results show that this is not true. Under fairly general conditions, detection error can be corrected with single-visit survey data for occupancy and abundance studies, and the resulting estimates are similar to those from multiple-visit approaches when the assumption of closure is met. The single-survey methodology, however, does not require the closed population assumption to correctly estimate population abundance. Thus, single-visit approaches can save money, time and effort for field ecologists while accounting for detection error.

For zero-inflated models, when the probability of detection is low, we found that maximizing the likelihood function based on multiple visits leads to unstable inference. This is because the different sources of zeros are confounded. As the use of N-mixture models is growing and new variants are appearing in the literature, we feel that parameter identifiability issues are often neglected (Lele, 2010, manuscript). For example, Royle (2004) carefully used extensive numerical simulations to judge the validity of the estimators under multiple visits when all assumptions are satisfied. On the other hand, Joseph et al. (2009) provided no such evidence for the ZIP and ZINB extensions of the N-mixture models. We acknowledge their efforts in raising the important problem of zero inflation, but our simulation results suggest that there is substantial confounding of the parameters. The conditional likelihood proposed in our paper separates the parameter space and hence reduces the extent of confounding. According to our simulations, even when the closed population assumption was satisfied, the single-visit, conditional likelihood based estimators (n = 100, one visit) outperformed the multiple-visit, likelihood based estimators (n = 250, 4 visits; n = 500, 2 visits) under many scenarios. As a consequence, even when the assumption of closure is met, multiple-visit results should be viewed cautiously.

N-mixture models based on multiple visits should be highly suspect when the assumption of closure is thought to be violated. When closure is violated, Bayne et al. (under review) found that multiple-visit N-mixture models overestimated density by several hundred percent. This was likely the case in the Mallard example. In most practical situations, the closed population assumption is likely to be violated for simple ecological reasons such as within-territory movement, dispersal, etc. This is a widely acknowledged fact among wildlife biologists. A number of papers are appearing that try to deal with the lack of closure by redefining the time or space interval over which multiple surveys are done (e.g. Kendall and White 2009). However, the results in Lele et al. (2010, manuscript) and this paper suggest that, when covariates are available, multiple surveys are not necessary and hence the closed population assumption is irrelevant. In addition to robustness against violation of the closed population assumption, our results indicate that, for most practical situations, conditional likelihood based estimation from single-survey data requires a smaller sample size and provides more efficient estimators than multiple-survey approaches.


Many sampling methods and statistical analyses have been developed to estimate species abundance. Even when it is possible to measure density accurately, the economics of doing so can be prohibitive for large-scale applications. As a result, collecting presence-absence data at a series of locations to get coarse measures of species abundance has become a preferred method of evaluating ecological status and trends because of the simplicity of data collection. Comparison of the simulations in Lele et al. (2010, manuscript) with the simulations in this paper suggests that one may need substantially larger samples for occupancy data than for abundance data. When abundance data are available, the estimators are stable and efficient, even at much smaller sample sizes. Furthermore, using the zero-inflated Poisson model for the true abundance, one can differentiate between zero-inflation and Poisson zeros. This is not possible when using detected/not-detected data to model site occupancy. Hence we encourage ecologists to collect count data whenever possible. Single-survey methods increase the cost effectiveness of monitoring studies without sacrificing statistical validity and efficiency of the estimates.

Acknowledgements

We would like to thank Stan Boutin, Steve Cumming, Monica Moreno, Jim Schieck, Fiona Schmiegelow, Samantha Song, and the Boreal Avian Modeling Project Technical Committee for helpful discussions on the issue of detection error. Funding for this research was provided by the Alberta Biodiversity Monitoring Institute, Environment Canada, North American Migratory Bird Conservation Act, and Natural Sciences and Engineering Research Council.

References

Anderson, E. B. (1970) Asymptotic properties of conditional maximum likelihood estimators. J. Royal Stat. Soc. B 32: 283-301.


Bayne, E. M. (2000) Effects of forest fragmentation on the demography of ovenbirds (Seiurus aurocapillus) in the boreal forest. University of Saskatchewan, Saskatoon, Canada. PhD Thesis.

Bayne, E., Lele, S. R. & Sólymos, P. (2010) Bias in the estimation of bird density and relative abundance when the closure assumption of multiple survey approaches is violated: a simulation study. The Auk, under review.

Casella, G. & Berger, R. L. (2002) Statistical inference. 2nd edn. Australia, Pacific Grove, CA. Thomson Learning. 660 p.

Downes, C. M. & Collins, B. T. (2003) The Canadian breeding bird survey, 1967-2000. Canadian Wildlife Service, Progress Notes No. 219. National Wildlife Research Centre, Ottawa, ON.

Efron, B. & Tibshirani, R. (1994) An introduction to the bootstrap. Chapman & Hall/CRC. 436 p.

Gong, G. & Samaniego, F. J. (1981) Pseudo-likelihood estimation: theory and applications. Annals of Statistics 9: 861-869.

Gu, W. & Swihart, R. K. (2004) Absent or undetected? Effects of non-detection of species occurrence on wildlife-habitat models. Biol. Conserv. 116: 195-203.

Hobson, K. A. & Bayne, E. M. (2002) Breeding bird communities in boreal forest of Western Canada: Consequences of “unmixing” the mixed woods. Condor 102: 759-769.

Joseph, L. N., Elkin, C., Martin, T. G. & Possingham, H. P. (2009) Modeling abundance using N-mixture models: the importance of considering ecological mechanisms. Ecol. Appl. 19: 631-42.


Kendall, W. L. & White, G. C. (2009) A cautionary note on substituting spatial subunits for repeated temporal sampling in studies of site occupancy. J. Appl. Ecol. 46: 1182-1188.

Kéry, M., Royle, J. A. & Schmid, H. (2005) Modeling avian abundance from replicated counts using binomial mixture models. Ecol. Appl. 15: 1450-1461.

Krebs, C. J. (1985) Ecology: The experimental analysis of distribution and abundance. 3rd edn. Harper and Row, New York, USA.

Lele, S. R. (2010) Model complexity and information in the data: could it be a house built on sand? Ecology, in press.

Lele, S. R., Moreno, M. & Bayne, E. (2010) Dealing with detection error in site occupancy surveys: What can we do with a single survey? (Manuscript)

MacKenzie, D. I., Nichols, J. D., Lachman, G. B., Droege, S., Royle, J. A. & Langtimm, C. A. (2002) Estimating site occupancy rates when detection probabilities are less than one. Ecology 83: 2248-2255.

MacKenzie, D. I., Nichols, J. D., Royle, A. J., Pollock, K. H., Bailey, L. L. & Hines, J. E. (2006) Occupancy estimation and modeling: inferring patterns and dynamics of species occurrence. Elsevier, Amsterdam, Netherlands. 324 pp.

MacTavish, P. (1995) Saskatchewan digital landcover mapping project. Report I-4900-15-B-95. Saskatchewan Research Council, Saskatoon, SK.

Martin, T. G., Wintle, B. A., Rhodes, J. R., Kuhnert, P. M., Field, S. A., Low-Choy, S. J., Tyre, A. J. & Possingham, H. P. (2005) Zero tolerance ecology: improving ecological inference by modeling the source of zero observations. Ecol. Lett. 8: 1235-1246.

Moreno, M. & Lele, S. R. (2010) Improved estimation of site occupancy using penalized likelihood. Ecology 91: 341-346.


R Development Core Team (2009) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

Rosenberg, K. V. & Blancher, P. J. (2005) Setting numerical population objectives for priority landbird species. In: Bird Conservation and Implementation in the Americas: Proceedings of the Third International Partners in Flight Conference (eds. Ralph, C. J. & Rich, T. D.). U.S. Department of Agriculture, Forest Service, General Technical Report PSW-GTR-191. Vol. 1, pp. 57-67.

Royle, J. A. (2004) N-mixture models for estimating population size from spatially replicated counts. Biometrics 60: 108-115.

Royle, J. A., Nichols, J. D. & Kéry, M. (2005) Modelling occurrence and abundance of species when detection is imperfect. Oikos 110: 353-359.

Schieck, J. (1997) Biased detection of bird vocalizations affects comparisons of bird abundance among forested habitats. The Condor 99: 179-190.

Sólymos, P. & Moreno, M. (2010) ‘occupy’: analyzing single visit data with detection error. R package version 1.0-0. URL: http://cran.r-project.org/package=occupy

Tyre, A. J., Tenhumberg, B., Field, S. A., Niejalke, D., Parris, K. & Possingham, H. P. (2003) Improving precision and reducing bias in biological surveys: estimating false negative error rates. Ecol. Appl. 13: 1790-1801.

Yoccoz, N. G., Nichols, J. D. & Boulinier, T. (2001) Monitoring of biological diversity in space and time. Trends in Ecol. Evol. 16: 446-453.

Zeileis, A., Kleiber, C. & Jackman, S. (2008) Regression models for count data in R. J. Stat. Soft., 27(8). URL http://www.jstatsoft.org/v27/i08/.


Appendix: Conditional and pseudo-likelihood estimation for the Binomial-Zero Inflated Poisson mixture model

Let $Y_i \mid N_i \sim \mathrm{Binomial}(N_i, p_i)$, where $p_i = p(Z_i, \theta)$ is a function of detection covariates $Z_i$. Let $N_i \mid A_i \sim \mathrm{Poisson}(A_i \lambda_i)$, where $\lambda_i = \lambda(X_i, \beta)$ is a function of abundance covariates $X_i$. Further, let $A_i \sim \mathrm{Bernoulli}(1 - \varphi)$. Then the random variable $Y_i$ is said to follow a Binomial-Zero Inflated Poisson distribution. We first derive some elementary mathematical statistics results related to this distribution.
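The three-level hierarchy defined above can be simulated directly, which makes the three sources of zeros explicit. The following Python sketch is an illustration only: it uses constant φ, λ and p (no covariates), and the function names, parameter values and seed are hypothetical.

```python
import math
import random

def draw_poisson(rng, rate):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_site(rng, phi, lam, p):
    """One site: A ~ Bernoulli(1 - phi), N | A ~ Poisson(A * lam), Y | N ~ Binomial(N, p)."""
    suitable = rng.random() >= phi          # A = 1 with probability 1 - phi
    n_true = draw_poisson(rng, lam) if suitable else 0
    y_obs = sum(rng.random() < p for _ in range(n_true))
    return suitable, n_true, y_obs

def zero_shares(n_sites, phi, lam, p, seed=42):
    """Shares of sites with each kind of zero, plus the share of all observed zeros."""
    rng = random.Random(seed)
    tally = {"zero_inflation": 0, "poisson": 0, "non_detection": 0}
    for _ in range(n_sites):
        suitable, n_true, y_obs = simulate_site(rng, phi, lam, p)
        if not suitable:
            tally["zero_inflation"] += 1
        elif n_true == 0:
            tally["poisson"] += 1
        elif y_obs == 0:
            tally["non_detection"] += 1
    shares = {kind: count / n_sites for kind, count in tally.items()}
    shares["observed_zero"] = sum(shares.values())
    return shares

# Hypothetical parameter values with no covariates, for illustration only.
shares = zero_shares(n_sites=20000, phi=0.3, lam=2.0, p=0.5)
```

With these values the share of observed zeros should be close to φ + (1 - φ)e^{-λp}, the expression derived for P(Y_i = 0) in this appendix.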

Result 1: Consider the conditional distribution

$$P(Y_i = y_i \mid Y_i > 0) = \frac{P(Y_i = y_i)}{1 - P(Y_i = 0)} \quad \text{for } y_i = 1, 2, 3, \ldots$$

The probability mass function for this conditional distribution is given by:

$$P(Y_i = y_i \mid Y_i > 0) = \frac{\sum_{N_i = y_i}^{\infty} \binom{N_i}{y_i} p_i^{y_i} (1 - p_i)^{N_i - y_i} e^{-\lambda_i} \lambda_i^{N_i} / N_i!}{1 - e^{-\lambda_i p_i}} \quad \text{for } y_i = 1, 2, 3, \ldots$$

Notice that this conditional distribution does not depend on the parameter φ.

Proof: This proof follows elementary probability theory (e.g. Casella and Berger, 2002).

$$P(Y_i = y_i \mid Y_i > 0) = \frac{P(Y_i = y_i)}{1 - P(Y_i = 0)} = \frac{(1 - \varphi) \sum_{N_i = y_i}^{\infty} \binom{N_i}{y_i} p_i^{y_i} (1 - p_i)^{N_i - y_i} e^{-\lambda_i} \lambda_i^{N_i} / N_i!}{1 - P(Y_i = 0)} \qquad (1)$$

Further,

$$\begin{aligned}
P(Y_i = 0) &= \varphi + (1 - \varphi) \sum_{N_i = 0}^{\infty} \binom{N_i}{0} p_i^{0} (1 - p_i)^{N_i - 0} e^{-\lambda_i} \lambda_i^{N_i} / N_i! \\
&= \varphi + (1 - \varphi) e^{-\lambda_i} \sum_{N_i = 0}^{\infty} \left[ (1 - p_i) \lambda_i \right]^{N_i} / N_i! \\
&= \varphi + (1 - \varphi) e^{-\lambda_i} e^{(1 - p_i) \lambda_i} \\
&= \varphi + (1 - \varphi) e^{-\lambda_i p_i}
\end{aligned}$$

Hence, we can write

$$1 - P(Y_i = 0) = (1 - \varphi) \left( 1 - e^{-\lambda_i p_i} \right) \qquad (2)$$

Combining equations (1) and (2), it follows that:

$$P(Y_i = y_i \mid Y_i > 0) = \frac{\sum_{N_i = y_i}^{\infty} \binom{N_i}{y_i} p_i^{y_i} (1 - p_i)^{N_i - y_i} e^{-\lambda_i} \lambda_i^{N_i} / N_i!}{1 - e^{-\lambda_i p_i}}.$$
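The infinite sum in the numerator of Result 1 can be checked numerically. By the standard Poisson thinning identity (a textbook fact that is not stated in this appendix), the sum equals $e^{-\lambda_i p_i} (\lambda_i p_i)^{y_i} / y_i!$, so the conditional distribution is a zero-truncated Poisson with rate $\lambda_i p_i$. The Python sketch below compares a truncated version of the series with that closed form; the function names, the truncation point and the parameter values are illustrative assumptions.

```python
import math

def conditional_pmf_series(y, lam, p, n_max=200):
    """P(Y = y | Y > 0) from Result 1, with the infinite sum truncated at n_max."""
    total = 0.0
    for n in range(y, n_max + 1):
        log_binomial = (
            math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
            + y * math.log(p) + (n - y) * math.log(1 - p)
        )
        log_poisson = -lam + n * math.log(lam) - math.lgamma(n + 1)
        total += math.exp(log_binomial + log_poisson)
    return total / (1 - math.exp(-lam * p))

def conditional_pmf_closed(y, lam, p):
    """Zero-truncated Poisson pmf with rate lam * p (Poisson thinning)."""
    rate = lam * p
    log_pmf = -rate + y * math.log(rate) - math.lgamma(y + 1)
    return math.exp(log_pmf) / (1 - math.exp(-rate))

# Hypothetical values with 0 < p < 1 and lam > 0.
lam, p = 2.5, 0.4
gaps = [abs(conditional_pmf_series(y, lam, p) - conditional_pmf_closed(y, lam, p))
        for y in range(1, 8)]
```

One consequence worth noting is that λ_i and p_i enter this distribution only through their product, so separating them relies on the covariate structure of the two components.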

Result 2: The binary random variable defined by $W_i = I(Y_i > 0)$ has the following distribution:

$$P(W_i = 0) = \varphi + (1 - \varphi) e^{-\lambda_i p_i}$$

$$P(W_i = 1) = (1 - \varphi) \left( 1 - e^{-\lambda_i p_i} \right).$$

Proof: Follows from equation (2) in the proof of the previous result.

Conditional likelihood estimation of (β,θ):

To estimate the parameters (β,θ), we use the likelihood based only on those sites that have at least one individual observed. This is called the conditional likelihood function (Anderson 1970). The conditional likelihood is given by

$$CL(\beta, \theta) = \prod_{y_i > 0} P(Y_i = y_i \mid Y_i > 0),$$

where the product is taken only over those sites where $y_i > 0$. We maximize this function to obtain the estimates of the parameters (β,θ). The conditional likelihood estimators are known to be consistent (Anderson 1970) as the number of sites that have at least one individual observed increases.
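A minimal Python sketch of evaluating the conditional log-likelihood is given below. It is not the authors' implementation. It assumes a log link for λ_i and a logit link for p_i with one covariate each, and it replaces the infinite sum in Result 1 by its closed form $e^{-\lambda_i p_i} (\lambda_i p_i)^{y_i} / y_i!$ (the standard Poisson thinning identity). All names and the toy data are hypothetical.

```python
import math

def conditional_loglik(beta, theta, counts, x_cov, z_cov):
    """log CL(beta, theta): sum over sites with y_i > 0 of log P(Y_i = y_i | Y_i > 0)."""
    total = 0.0
    for y, x, z in zip(counts, x_cov, z_cov):
        if y == 0:
            continue                                         # zero counts do not enter CL
        lam = math.exp(beta[0] + beta[1] * x)                # assumed log link
        p = 1 / (1 + math.exp(-(theta[0] + theta[1] * z)))   # assumed logit link
        rate = lam * p
        total += -rate + y * math.log(rate) - math.lgamma(y + 1)
        total -= math.log(1 - math.exp(-rate))               # condition on Y_i > 0
    return total

# Hypothetical toy data: counts with one abundance and one detection covariate.
counts = [0, 2, 1, 0, 3, 1]
x_cov = [-1.0, 0.5, 0.0, -0.5, 1.0, 0.2]
z_cov = [0.3, -0.2, 1.0, 0.0, -1.0, 0.5]
value = conditional_loglik((0.5, 0.8), (0.0, -0.5), counts, x_cov, z_cov)
```

In practice this function would be passed to a numerical optimizer over (β,θ); the sketch only evaluates it at one hypothetical parameter value.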

Pseudo-likelihood estimation of φ:

To estimate the parameter φ, we consider the likelihood based on the random variables $W_i$, where the parameters (β,θ) are fixed at their conditional likelihood estimates $(\hat{\beta}, \hat{\theta})$. Gong and Samaniego (1981) call such a likelihood a ‘pseudo-likelihood’:

$$PL(\varphi; W, \hat{\beta}, \hat{\theta}) = \prod_{i=1}^{n} \left\{ (1 - \varphi) \left( 1 - e^{-\hat{\lambda}_i \hat{p}_i} \right) \right\}^{W_i} \left\{ \varphi + (1 - \varphi) e^{-\hat{\lambda}_i \hat{p}_i} \right\}^{1 - W_i}$$

Because the conditional likelihood estimates $(\hat{\beta}, \hat{\theta})$ are consistent, the pseudo-likelihood estimator of φ obtained by maximizing the pseudo-likelihood is also consistent (Gong and Samaniego 1981).
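The pseudo-likelihood step amounts to a one-dimensional maximization over φ once the products $\hat{\lambda}_i \hat{p}_i$ are plugged in. The Python sketch below uses a golden-section search, which is a choice made here for illustration and not necessarily the authors' optimizer; it relies on the log pseudo-likelihood being concave in φ. The function names and the toy inputs are hypothetical.

```python
import math

def pseudo_loglik(phi, detected, rates):
    """log PL(phi): detected[i] is W_i, rates[i] is the plug-in lambda_hat_i * p_hat_i."""
    total = 0.0
    for w, rate in zip(detected, rates):
        miss = math.exp(-rate)                     # P(Y_i = 0) at a suitable site
        if w:
            total += math.log((1 - phi) * (1 - miss))
        else:
            total += math.log(phi + (1 - phi) * miss)
    return total

def estimate_phi(detected, rates, tol=1e-8):
    """Maximize log PL over phi in (0, 1) by golden-section search."""
    ratio = (math.sqrt(5) - 1) / 2
    low, high = 1e-9, 1 - 1e-9
    while high - low > tol:
        left = high - ratio * (high - low)
        right = low + ratio * (high - low)
        if pseudo_loglik(left, detected, rates) < pseudo_loglik(right, detected, rates):
            low = left
        else:
            high = right
    return (low + high) / 2

# Hypothetical inputs: 30 of 100 sites with detections and a constant plug-in rate.
detected = [1] * 30 + [0] * 70
rates = [1.2] * 100
phi_hat = estimate_phi(detected, rates)
```

When the plug-in rate is the same at every site, as in this toy input, the maximizer has the closed form 1 - mean(W) / (1 - e^{-rate}), which gives a convenient check on the search.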

SUPPORTING INFORMATION

The following Supporting Information is available for this article:

Appendix S1 Simulation results

Table 1. Model selection results for the Ovenbird data set based on the Binomial-ZIP mixture. Model terms that were not significant (based on Wald tests) were dropped by backward elimination until only significant terms remained (Model 4). Bootstrap-based 90% confidence intervals are given in parentheses for the best-fit Model 4.

| Term | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|
| Abundance | | | | |
| Intercept | 0.300 | 0.211 | 0.161 | 0.508 (-0.071, 0.578) |
| Proportion of forest area | 0.820 | 0.977 | 0.999 | 0.825 (0.697, 1.138) |
| Proportion of deciduous area | 0.044 | 0.005 | | |
| Proportion of agricultural area | -0.058 | 0.067 | 0.029 | |
| Latitude | 0.300 | 0.282 | 0.205 | 0.236 (-0.011, 0.379) |
| Longitude | 0.214 | 0.201 | 0.137 | 0.195 (-0.037, 0.330) |
| Detection | | | | |
| Intercept | -0.607 | -0.107 | -0.916 | -1.719 (-3.275, -0.315) |
| Observer (DW) | -1.037 | -0.360 | -0.510 | -0.453 (-1.889, -0.886) |
| Observer (RDW) | 0.818 | 0.737 | 1.813 | 1.753 (-6.379, 2.770) |
| Observer (SVW) | 1.436 | 1.415 | 2.444 | 2.329 (1.099, 4.045) |
| Proportion of forest area | -1.155 | -1.355 | -1.444 | -1.019 (1.700, 4.428) |
| Julian day | -0.512 | -0.542 | -0.501 | -0.470 (-0.689, -0.357) |
| Time of day | -0.079 | | | |
| φ | 0.366 | 0.389 | 0.376 | 0.410 (0.311, 0.451) |
| P(N = 0) | 0.200 | 0.206 | 0.218 | 0.150 (0.136, 0.276) |
| $\bar{\lambda}$ | 2.387 | 2.214 | 2.142 | 2.604 (1.858, 2.717) |
| $(1-\varphi)\bar{\lambda}$ | 1.513 | 1.352 | 1.337 | 1.537 (1.171, 1.595) |
| $\bar{p}$ | 0.559 | 0.641 | 0.646 | 0.513 (0.514, 0.720) |
| AIC | 977.7 | 1146.1 | 1093.4 | 860.7 |


Table 2. Naïve (detection probability is 1), multiple-visit and single-visit N-mixture estimates for the Mallard data set based on the zero-inflated Poisson distribution for abundances. Naïve estimates are based on maximum counts from the visits to the sites, multiple-visit estimates use all three visits, and the single-visit N-mixture model was fitted to each visit separately. Bootstrap-based 90% confidence limits are in parentheses.

| Term | Naïve (max counts) | Multiple visits | Visit 1 | Visit 2 | Visit 3 |
|---|---|---|---|---|---|
| Abundance | | | | | |
| Intercept | -1.264 (-2.224, -0.833) | 3.232 (2.909, 3.280) | 1.253 (-0.626, 2.094) | 0.736 (-3.641, 2.910) | -0.430 (-8.126, 1.039) |
| Route length | -0.453 (-0.964, 0.164) | -0.043 (-0.084, -0.011) | -0.744 (-1.216, 0.117) | -0.839 (-1.988, 0.572) | -1.134 (-2.099, 0.941) |
| Elevation | -1.163 (-1.923, -0.625) | -0.055 (-0.136, 0.018) | 0.595 (-2.307, 1.261) | -0.459 (-4.198, 1.620) | -0.908 (-6.883, 1.551) |
| Forest (%) | -0.600 (-1.185, -0.262) | 0.007 (-0.050, 0.065) | -0.132 (-0.837, 0.344) | 0.465 (-1.217, 2.283) | 0.036 (-4.481, 1.278) |
| Detection | | | | | |
| Intercept | | -4.307 (-4.688, -3.910) | -8.437 (-29.378, -3.321) | 1.285 (-4.181, 35.904) | 13.169 (2.768, 47.476) |
| Effort | | 0.326 (-0.589, 0.870) | 0.681 (-0.720, 5.796) | 0.794 (-3.847, 20.326) | 0.939 (-2.756, 12.341) |
| Date | | -0.448 (-0.538, -0.287) | -5.069 (-20.432, -0.923) | 1.667 (-11.770, 33.068) | -10.650 (-33.749, -4.707) |
| φ | 0.418 (0.131, 0.619) | 0.830 (0.790, 0.877) | 0.729 (0.520, 0.817) | 0.824 (0.373, 0.879) | 0.688 (0.226, 0.797) |
| P(N = 0) | 0.370 (0.239, 0.686) | 0.000 (0.000, 0.000) | 0.033 (0.007, 0.223) | 0.034 (0.000, 0.399) | 0.141 (0.053, 0.654) |
| $\bar{\lambda}$ | 0.540 (0.303, 0.833) | 25.364 (18.395, 26.653) | 9.468 (1.102, 82.829) | 3.295 (0.796, 1188.788) | 1.261 (0.397, 13.532) |
| $(1-\varphi)\bar{\lambda}$ | 0.360 (0.243, 0.492) | 4.318 (2.785, 4.932) | 2.565 (0.367, 25.047) | 0.581 (0.307, 286.227) | 0.394 (0.242, 4.491) |
| $\bar{p}$ | | 0.016 (0.012, 0.027) | 0.502 (0.100, 0.624) | 0.672 (0.092, 0.825) | 0.563 (0.323, 0.692) |


Figure captions

Figure 1. Simulation results with a common discrete covariate used both for the abundance ($\beta_2$) and the detection ($\theta_2$) model. Each box and whiskers corresponds to 100 simulations; horizontal axes give the sample size (n) used for estimation. As n increases, the medians (thick black lines) get closer to the true parameter values (thick grey lines), and the estimates become less variable (inter-quartile boxes and range whiskers get narrower). The low abundance, zero-inflated data, low detectability scenario was used. β, θ, and φ are model parameters (see text), $\bar{\lambda}$ is the mean of the predicted rate parameter of the Poisson distribution, and $\bar{p}$ is the mean of the detection probabilities. Correlations between true and predicted λ and p values are shown in the lowest row. The bottom right inset shows the count distribution for an example data set out of the 100 simulated ones; black bars are true counts, grey bars are observed counts.

Figure 2. Comparison of single-visit (1x) and multiple-visit (2x, 4x) estimates based on 100 simulations (settings are the same as for Fig. 1). Sample sizes (visits x number of sites) are 500 (1 and 2 visits) and 1000 (1, 2, and 4 visits). Single-visit estimation was based on the conditional approach (see text), whereas likelihood based estimation was used for the multiple-survey analysis. Results show that the intercept ($\beta_0$, $\theta_0$) and φ estimates based on multiple visits are biased, and that the variability of the estimates is greater than for the single-visit estimator.

Figure 3. Count distribution for the Ovenbird data set (A) and probability plot (B) for the N-mixture model fitted to the data set. Ovenbird abundances are actual counts from 891 locations (grey bars). The estimated proportions of zero-inflation zeros (black) and Poisson zeros (white) are shown beside the zero point mass bar; the difference between the observed and predicted zero point mass is due to non-detection zeros. The probability plot shows the values of the empirical and fitted cumulative distribution functions (CDF) based on the Binomial-Poisson (open circles) and the Binomial-ZIP (filled circles) mixtures. The reference line has slope 1; values closer to this line indicate better fit.


Figure 1


Figure 2


Figure 3
