
RMark: An R Interface to Capture-Recapture Analysis with MARK

Mark Workshop Notes, Ft. Collins, CO

Bret A. Collier & Jeffrey L. Laake

Institute of Renewable Natural Resources

Texas A&M University, College Station, TX 77845

Email: [email protected]

&

National Marine Mammal Laboratory

Alaska Fisheries Science Center

7600 Sand Point Way NE, Seattle, WA 98115

Email: [email protected]

August 2013

Contents

Introduction
Background
Where to find help?
Advantages/Disadvantages
Getting Started
First Steps
RMark Internals
Importing and Manipulating Data
Processing Data
Design Data
Model Formula & Design Matrix
Fitting Models
Plotting Results
Other stuff

INTRODUCTION

The most comprehensive software package for analysis of capture-recapture data is the program MARK (White and Burnham 1999). While it is unparalleled in the range of models, quality of the user documentation (http://www.phidot.org/software/mark/docs/book/), and active base of user-driven support (http://www.phidot.org/forum/index), the interface for building models can be limiting for large data sets and complex models. While there is some capability for automatic model creation in MARK, most models are built manually with a graphical user interface to specify the parameter structures and design matrices. Manual model creation can be useful during the learning process, but eventually it becomes a time-consuming and sometimes frustrating exercise that may add an unnecessary source of error to the analysis. Finally, for those who analyze data from ongoing monitoring programs, there is no way to extend the capture history in MARK, which necessitates manual re-creation of all models as data from future sampling occasions are collected.

RMark is an R package that provides a formula-based interface for MARK. RMark has been available since 2005 and is on the Comprehensive R Archive Network (CRAN) (http://cran.r-project.org). RMark contains functionality to build models for MARK from formulas, run the models with MARK, extract the output, and summarize and display the results with automatic labeling. RMark also has functions for model averaging, prediction, variance components, and exporting models back to the MARK interface. In addition, all of the tools in R are available, which enables a completely scripted analysis from data to results and inclusion in a document with Sweave (Leisch 2002) and LaTeX to create a reproducible manuscript such as this one. The report that serves as the appropriate citation (effective 2013) for RMark can be found at http://www.afsc.noaa.gov/Publications/ProcRpt/PR2013-01.pdf and is included in the workshop notes as well. I have not included the

Here we provide an overview of the RMark package and how it can be used to benefit MARK users. For more detailed documentation, refer to the online documentation at http://www.phidot.org/software/mark/rmark/ and the help within the RMark package. And, just to be fair, a significant portion of these course notes came from various documents Jeff created while explaining or documenting RMark for teaching purposes and, to a lesser extent, from some notes I have put together for students at A&M.

Background

RMark does not fit models to data; rather, RMark is an R package that was designed to provide an alternative user interface to MARK and its GUI. RMark uses the R language to construct models and create the input file (.inp), then calls MARK, which fits the model(s) to the data. RMark then extracts the results from the output file created by MARK and allows the user to manipulate (via R or some other program) the resultant model output. Thus, RMark is an R interface to MARK, not a standalone capture-recapture modeling environment. That said, if the results you got using MARK do not match the results you got using RMark, then you have made a mistake in one or the other.

Where to find help?

Currently, or at least as best we can tell, MARK supports ≥ 140 different modeling options. At present, RMark does not replicate every option available in MARK. Although new models are added to RMark fairly regularly, not every model in MARK is available in RMark, and some things you can do in MARK, such as data bootstrapping or computing median c-hat values, are not available through the RMark interface. For a list of models available in RMark, you can use something like system.file("MarkModels.pdf", package="RMark"), which returns the file path to the PDF containing the list of MARK models available in RMark, along with the appropriate code, parameter, and help file names (or, if you have a specific R_LIBS path where your R packages are installed locally, just go there and look in the RMark folder for the file MarkModels.pdf).

First, it is important to remember that RMark needs MARK, so without an understanding of MARK, you will be limited in your ability to use RMark. So, your first stop should always be the "MARKBOOK", authored/edited by Evan Cooch and Gary White, with contributions from a wide variety of others. The MARKBOOK is freely available (all 900+ pages of it) at http://www.phidot.org/software/mark/docs/book/. Unequivocally, this is the primary desk reference for capture-recapture modeling approaches supported by MARK (although you should never cite it in a manuscript; see the MARK FAQ at http://www.phidot.org/forum/index.php). Details on RMark are found in Appendix C.

Additionally, there is a very active community of ecologists who use MARK regularly and are willing to provide expertise across a wide variety of capture-recapture modeling techniques, and an online forum (managed by Evan Cooch) is available at http://www.phidot.org/forum/index.php. The users of the phidot.org forum are typically extremely helpful, provided you have read the MARKBOOK and have searched the archives. If you are not already a member, sign up. Finally, RMark operates just like any other R package: if you need the help/reference file for a particular function within RMark, you can access it using “?” followed by the name of the function you are interested in (e.g., ?mark).
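As a minimal sketch of the two lookups just described, the following assumes RMark may or may not be installed and guards for both cases; the variable name models_pdf is ours and purely illustrative.

```r
# Locate the PDF listing the MARK models supported by RMark.
# system.file() returns "" when the file or the package cannot be found.
models_pdf <- system.file("MarkModels.pdf", package = "RMark")

if (nzchar(models_pdf)) {
  message("Model list is at: ", models_pdf)
} else {
  message("MarkModels.pdf not found; is RMark installed?")
}

# Open the help page for mark() only in an interactive session
# and only if the package is available (equivalent to typing ?mark).
if (interactive() && requireNamespace("RMark", quietly = TRUE)) {
  print(help("mark", package = "RMark"))
}
```

The guards are there only so the snippet runs cleanly on a machine without RMark; in a normal session, typing ?mark after library(RMark) is all that is needed.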

Advantages/Disadvantages

So, why would one want to use RMark as an interface to MARK rather than MARK's GUI? Reasons abound: some are valid, some are not, and many come down to individual point of view or project-specific needs. We think that there are some convincing reasons to use a scripted approach for your MARK analysis, but in the end it is a personal choice (one that, I think it is obvious, Jeff and I have already made). A few of the primary reasons we like to use RMark are (but are not limited to):

1. RMark provides the user with the ability to automate analysis of monitoring data sets even as monitoring occasions are added. This is a significant benefit that RMark brings to MARK users: because the parameter index matrices (PIMs) and design matrix (DM) are generated by the script, you create the script once and, as monitoring data are collected, typically no changes to the script are needed. You just re-run the script with the new data file.

2. Design matrix creation. RMark uses a formula-based approach, which is faster and typically less error-prone (although not entirely error-free). Thus, there is less need to manually create the PIMs or DM. However, an understanding of what the DM should look like is still necessary.

3. PIM simplification. RMark automatically creates the simplest PIM structure for each model, as opposed to MARK, which uses the full DM even when reduced models are created. This will speed up model evaluation.

4. Collaborative development. MARK and RMark play well together, so you can move analyses back and forth fairly cleanly using functions such as export.MARK() and convert.inp().

5. Entire analyses can be scripted. Although this is related to item 1 above, scripting analyses and being able to use the functionality that comes with R for additional computational support and publication-quality graphing, among other things, is quite beneficial.

6. Reproducible analysis and documentation. Nearly all MARK analyses are reproducible so long as one keeps the .inp/.dbf/.fpt files and documents what was done. One thing that RMark excels at is that the documentation support capabilities of R are widely applicable to MARK analyses. Thus, complete data sets and analyses, with metadata and detailed documentation, can be developed as R packages, or data/code can be seamlessly integrated into LaTeX-style manuscripts and documents (although Evan does a pretty good job with the MARKBOOK). We find it really useful that the entirety of a dataset and analysis can be documented cleanly in one place (see ?dipper for an example). Obviously, good data management protocols for reproducible analyses using only MARK are equally good, so this is more of a personal preference.

Getting Started

OK, enough of this introductory material, let's get rolling with RMark! First, I am working under the assumption that you managed to get the latest version of R installed on your computer (from CRAN: http://cran.r-project.org/), either by downloading the executable or compiling from source (if not, holler at me via email: [email protected]). The basic R environment consists of a command window (Console) or graphical user interface (GUI). Interacting with R can be accomplished in a variety of ways; most folks tend to use a script editor to write their code and then send the code to R (see the list at http://www.sciviews.org/_rgui/projects/Editors.html or http://www.sciviews.org/_rgui/). Assuming most of you are Windows users and want a command line interface, I like Eclipse (http://www.walware.de/goto/statet), although there are many IDE script editors out there to choose from. If you are on a Mac, then probably TextMate (see the earlier URL), and if you are on Linux then you should be using ESS-Emacs (http://ess.r-project.org/) or alternatively Eclipse (http://www.walware.de/goto/statet). Note: when you are using one of these editors, you can save your code using File -> Save As “filename.R” just as you would with any other file type. Note the ’.R’ extension; I tend to use .R as a matter of habit, but you can use ’.txt’ as well. For simplicity, I will use the base R command line GUI for the short course, but you are welcome to use any editor you choose.

First Steps

OK, so let's jump in with a quick example. As with most R packages, to access the functionality in RMark you type library(RMark), and R will respond with the package version number and other relevant information (I load it in .Rprofile on my system, so no output will be shown below when I do it). For a quick example, we will use the ubiquitous European dipper (Cinclus cinclus) capture-recapture data used in many examples in the MARKBOOK and a variety of manuscripts (it is included as a dataset in RMark). If we look at the structure of the dipper dataset, we can see that it is a dataframe with 2 fields. The first field is the encounter history, which has a required column name of ’ch’ and must be a character (chr) variable. The field ch is required for all MARK analyses; typically a field identifying the number of individuals with that specific encounter history (denoted ’freq’) is also included, but it and any additional fields are optional. In this example, the field “sex” specifies group structure (i.e., whether an individual is male or female) and is a factor variable (Factor) with values 1=Female and 2=Male. The level ordering is alphabetical and ignores the ordering of the columns in the dipper.inp file, which we can see using levels(). Finally, we can run a simple Cormack-Jolly-Seber (CJS) analysis using the default of constant survival and constant recapture probabilities for the dipper data using the simple code mark(dipper).

> library(RMark)
> data(dipper)
> str(dipper)

'data.frame': 294 obs. of 2 variables:
 $ ch : chr "0000001" "0000001" "0000001" "0000001" ...
 $ sex: Factor w/ 2 levels "Female","Male": 1 1 1 1 1 1 1 1 1 1 ...

> levels(dipper$sex)

[1] "Female" "Male"

> ex=mark(dipper)

Output summary for CJS model
Name : Phi(~1)p(~1)

Npar : 2
-2lnL: 666.8377
AICc : 670.866

Beta
                 estimate        se       lcl       ucl
Phi:(Intercept) 0.2421484 0.1020127 0.0422035 0.4420934
p:(Intercept)   2.2262660 0.3251094 1.5890516 2.8634804

Real Parameter Phi

         1        2        3        4        5        6
1 0.560243 0.560243 0.560243 0.560243 0.560243 0.560243
2          0.560243 0.560243 0.560243 0.560243 0.560243
3                   0.560243 0.560243 0.560243 0.560243
4                            0.560243 0.560243 0.560243
5                                     0.560243 0.560243
6                                              0.560243

Real Parameter p

          2         3         4         5         6         7
1 0.9025835 0.9025835 0.9025835 0.9025835 0.9025835 0.9025835
2           0.9025835 0.9025835 0.9025835 0.9025835 0.9025835
3                     0.9025835 0.9025835 0.9025835 0.9025835
4                               0.9025835 0.9025835 0.9025835
5                                         0.9025835 0.9025835
6                                                   0.9025835

OK, so, yay, we can use RMark to do capture-recapture analysis; first hurdle leapt.
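It may help to see how the Beta estimates in the summary relate to the real parameters. The following base R sketch assumes the logit link (our understanding of the default for Phi and p in a CJS model) and simply back-transforms the two intercepts copied from the output above; the object names beta and real are ours.

```r
# Beta estimates copied from the mark(dipper) summary (logit scale).
beta <- c(Phi = 0.2421484, p = 2.2262660)

# Inverse logit: real = exp(beta) / (1 + exp(beta)); plogis() computes this.
real <- plogis(beta)

# These should agree with the real parameter values printed in the summary.
round(real, 6)
```

Because the model is Phi(~1)p(~1), every cell of each real parameter matrix holds the same back-transformed value.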


RMark Internals

Importing and Manipulating Data

Now that we have RMark up and running (and we know that it works), the first thing we all want to do is load our data and do some analysis! RMark has several ways to create or load data for analysis in MARK. As most are familiar with the .inp file structure used by MARK, let's start with the approach that converts an encounter history .inp file to a dataframe for use in RMark. For this demonstration, we will use the dipper.inp file, which on my 64-bit system is located in “C:\Program Files (x86)\MARK\Examples”, and the RMark function convert.inp(). Conversion of a .inp file to a dataframe using convert.inp() requires that we specify the input file location and name, the group and optional covariate names, and, if the .inp file has commented areas (/* and */ in MARK parlance), that we let RMark know. So you don’t have to go look (or you can look above), the structure of dipper.inp is pretty straightforward: the encounter history has 7 encounter occasions, it includes the freq columns giving the number of individuals with each specific encounter history, and it has 2 groups (columns) representing either Male or Female (1 or 0). Because males are in the first column and females are in the second column, that is the order we use when we define group.df=. So, converting the dipper.inp data would work as follows:

> dipper.convert=convert.inp("C:/Program Files (x86)/MARK/Examples/dipper.inp",

+ group.df= data.frame(sex=c("Male", "Female")))

When we look at the structure of the newly created object dipper.convert, we see that it is now an R dataframe with 3 fields. The first field is the capture history (ch), which is a character value; the second field is the frequency variable (freq), or the number of individuals with that unique encounter history (a numeric value); and the third field is the grouping variable sex, which is a factor variable with 2 levels that can be shown using levels().

> str(dipper.convert)

'data.frame': 294 obs. of 3 variables:
 $ ch  : chr "1111110" "1111000" "1100000" "1100000" ...
 $ freq: num 1 1 1 1 1 1 1 1 1 1 ...
 $ sex : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...

> levels(dipper.convert$sex)

[1] "Female" "Male"
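The reason Male records carry the code 2 even though males come first in the .inp file is ordinary base R factor behavior. A small sketch, using a hypothetical vector group_labels that mimics the order passed to group.df:

```r
# Labels in the column order of the .inp file (males first).
group_labels <- c("Male", "Female")

# factor() sorts the unique values, so the levels come out alphabetically.
sex <- factor(group_labels)

levels(sex)      # level order is alphabetical, not input order
as.integer(sex)  # integer codes follow the level order
```

This is plain R rather than anything specific to RMark, but it is worth keeping in mind whenever group codes are compared against the column order of an .inp file.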

Once your data are in R as a dataframe, there are some handy options for manipulating the data using standard R functions. A simple example is to add a numeric column representing some covariate (weight is typically used) to the newly created dipper.convert dataframe. Because the weights here are simulated with rnorm(), the values will differ from run to run.

> dipper.convert$weight=rnorm(nrow(dipper.convert), mean=11, sd=3)
> summary(dipper.convert$weight)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  2.493   8.900  11.000  10.940  12.830  20.380

Processing Data

Many of you will be familiar with the MARK model specification window, as it is where you identify the dataset you want to use for analysis, choose the model type for your analysis, and provide details on the various descriptors for your dataset such as the number of encounter occasions and the names and number of groups and individual covariates. RMark (read: Jeff when he wrote it) takes care of some of these specifications, such as number of occasions, group labels, and individual covariate names (drawn from the input file column names), by setting them for you. However, other options such as titles, number of mixtures, and time intervals, among others, are arguments to the function process.data(), which takes the place of the model specification window from MARK. process.data() does exactly what it sounds like: it processes the specified input dataframe and creates an R list that includes the original dataframe, all the required attribute data, and the model with which the dataset should be analyzed:

> dipper.proc=process.data(dipper.convert, model="CJS", groups="sex", begin.time=1980)
> str(dipper.proc)

List of 15
 $ data            :'data.frame': 294 obs. of 5 variables:
  ..$ ch    : chr [1:294] "1111110" "1111000" "1100000" "1100000" ...
  ..$ freq  : num [1:294] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ sex   : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
  ..$ weight: num [1:294] 13.14 9.25 14.04 13.66 5.87 ...
  ..$ group : Factor w/ 2 levels "1","2": 2 2 2 2 2 2 2 2 2 2 ...
 $ model           : chr "CJS"
 $ mixtures        : num 1
 $ freq            :'data.frame': 294 obs. of 2 variables:
  ..$ sexFemale: num [1:294] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ sexMale  : num [1:294] 1 1 1 1 1 1 1 1 1 1 ...
 $ nocc            : num 7
 $ nocc.secondary  : NULL
 $ time.intervals  : num [1:6] 1 1 1 1 1 1
 $ begin.time      : num 1980
 $ age.unit        : num 1
 $ initial.ages    : num [1:2] 0 0
 $ group.covariates:'data.frame': 2 obs. of 1 variable:
  ..$ sex: Factor w/ 2 levels "Female","Male": 1 2
 $ nstrata         : num 1
 $ strata.labels   : NULL
 $ counts          : NULL
 $ reverse         : logi FALSE

So, we can see above that the processed data now consist of a list whose elements include the capture-recapture data as a dataframe, the model we are using, how many encounter occasions there are, the year of the first encounter occasion (begin.time) and the intervals between occasions (time.intervals), and a host of other information that is either 1) not required for the particular model (e.g., strata.labels) or 2) added to the data by RMark (e.g., age.unit). We can look at specific values of the processed dataset; for instance, we can look at the first 10 records of the dataset or determine how many encounter occasions we have:

> dipper.proc$data[1:10,]

          ch freq  sex    weight group
1:1  1111110    1 Male 13.138956     2
1:3  1111000    1 Male  9.249987     2
1:6  1100000    1 Male 14.041952     2
1:7  1100000    1 Male 13.662031     2
1:8  1100000    1 Male  5.874328     2
1:9  1100000    1 Male  8.460792     2
1:12 1010000    1 Male 11.035386     2
1:14 1000000    1 Male 11.202669     2
1:15 1000000    1 Male  4.326237     2
1:16 1000000    1 Male  8.564734     2

> dipper.proc$nocc

[1] 7
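To make the connection between begin.time, time.intervals, and nocc concrete, here is a base R sketch of how the occasion times can be derived from the values stored in the processed data. This illustrates the arithmetic only and is not RMark's internal code; occasion.times is our own name.

```r
# Values taken from str(dipper.proc) above.
begin.time     <- 1980
time.intervals <- rep(1, 6)   # one interval between each pair of occasions

# Each occasion time is the start time plus the cumulative interval lengths.
occasion.times <- begin.time + c(0, cumsum(time.intervals))
occasion.times

# The number of occasions is one more than the number of intervals.
nocc <- length(time.intervals) + 1
```

With unequal sampling intervals, only the time.intervals vector would change; the same cumulative sum yields the occasion times.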

In general, once the data have been processed, not much else needs to be done with them. The primary exceptions are changes to the original dataframe, such as the addition of new individuals or a new encounter history, or strata that are added or grouped differently; in those cases the data need to be processed again.

Design Data

Design data are likely an unfamiliar concept for users of MARK. However, design data underlie how data are associated with the various parameters that are estimated by MARK. Thus, understanding the intricacies of how design data are created by RMark, how they equate to values in MARK, and how to create or manipulate design data is one of the most important aspects of using RMark. We are therefore going to spend a good bit of time detailing design data and how they transfer to MARK in the form of a PIM, so that the link between the two will be clearer.

Underlying design data creation in RMark are several R functions, the primary one being make.design.data; other important ones that will be used regularly are add.design.data and merge_design.covariates. In addition, basic R data manipulation methods can be used to add, remove, create, or manipulate design data in RMark. We will get to all of this shortly, but first we need to outline what design data are and how they link to parameters in MARK.

First, let's turn the RMark processed data list dipper.proc into design data using make.design.data. make.design.data creates an R list whose element names are the specific parameters (Phi, p, Psi, etc.) used in MARK by the model chosen in process.data. Note that we have adopted Jeff’s naming convention here and named the new object dipper.ddl, where .ddl stands for design data list, as the object is design data and is a list. For our dipper example, the top-level names of the resultant list are shown using names() and, as expected, include Phi and p (the 2 parameters used by a CJS model in MARK).

> dipper.ddl=make.design.data(dipper.proc)
> names(dipper.ddl)

[1] "Phi"      "p"        "pimtypes"

However, there is another list element, pimtypes, that is likely unexpected. pimtypes defines what type of PIM is to be used as the base: all (all PIM values different), time (values in each PIM are specific to columns), or constant (all PIM values the same). For instance, when the default of ’all’ is used in make.design.data, the PIM values for the parameters Phi and p are all different. So, using a little code trick so that we can see the PIMs without actually running MARK (run=FALSE), we can use the PIMS function to see what the all-different PIMs look like for our dipper example:

> PIMS(mark(dipper.proc, dipper.ddl, invisible=FALSE, run=FALSE), "Phi", simplified=F)

group = sexFemale
     1980 1981 1982 1983 1984 1985
1980    1    2    3    4    5    6
1981         7    8    9   10   11
1982             12   13   14   15
1983                  16   17   18
1984                       19   20
1985                            21
group = sexMale
     1980 1981 1982 1983 1984 1985
1980   22   23   24   25   26   27
1981        28   29   30   31   32
1982             33   34   35   36
1983                  37   38   39
1984                       40   41
1985                            42

Thus, we can see that for this case we have the ’all different’ PIM structure, defined by the grouping (factor) variable sex, for the parameter of interest, Phi (although the same could be done for the p parameter as well). Note that the row and column labels are based on the begin.time we provided in process.data. The rows represent the cohort, which is the year the animals were first captured, while the columns give the date at the beginning of each survival interval. Thus, parameter 14 is survival for the period 1984-1985 of female dippers captured for the first time in 1982.

When make.design.data is used on processed data, RMark by default creates, for those parameters with triangular PIMs, design data representing time, age, and cohort, in addition to any grouping (factor) variables that were defined in process.data. So, sticking with our example PIM from the dipper dataset, let's look at the structure of the Phi portion of the R list dipper.ddl using:

> str(dipper.ddl$Phi)

'data.frame': 42 obs. of 11 variables:
 $ par.index  : int 1 2 3 4 5 6 7 8 9 10 ...
 $ model.index: num 1 2 3 4 5 6 7 8 9 10 ...
 $ group      : Factor w/ 2 levels "Female","Male": 1 1 1 1 1 1 1 1 1 1 ...
 $ cohort     : Factor w/ 6 levels "1980","1981",..: 1 1 1 1 1 1 2 2 2 2 ...
 $ age        : Factor w/ 6 levels "0","1","2","3",..: 1 2 3 4 5 6 1 2 3 4 ...
 $ time       : Factor w/ 6 levels "1980","1981",..: 1 2 3 4 5 6 2 3 4 5 ...
 $ occ.cohort : num 1 1 1 1 1 1 2 2 2 2 ...
 $ Cohort     : num 0 0 0 0 0 0 1 1 1 1 ...
 $ Age        : num 0 1 2 3 4 5 0 1 2 3 ...
 $ Time       : num 0 1 2 3 4 5 1 2 3 4 ...
 $ sex        : Factor w/ 2 levels "Female","Male": 1 1 1 1 1 1 1 1 1 1 ...

First, you can see that the portion we extracted using dipper.ddl$Phi is a data frame with 42 observations of 11 variables. For CJS, the dataframe for Phi contains 42 records: 21 for females and 21 for males. There are five factor variables (group, sex, cohort, age, and time) and six numeric variables: Cohort, Time, and Age, which are continuous versions of the respective lower-case factor variables; model.index, which is a unique index across all parameters in the model; par.index, which is an integer index of the values within the parameter; and occ.cohort, which is a numeric version of cohort with a start value of 1. The group variable is a composite of the values of all factor variables used to define the groups (sex in this example), but when there is a single factor variable, as in this case, it is redundant. So, let's have a look at the design data for Phi (note that I removed model.index and the variable sex, which I used as a check to make sure everything lined up but which is not needed, as group represents the same value in this example). Looking at the design data for the female group of the dipper data, we can see that the par.index values match up exactly with the PIM values for the all-different PIM structure we looked at earlier (if we dig down into the MARK output a little bit), providing a simple link that shows us the relationship between the PIM and the design data for the first group (female) that we specified in our process.data call.

> dipper.ddl$Phi[(1:21),-c(2, 11)]

   par.index  group cohort age time occ.cohort Cohort Age Time
1          1 Female   1980   0 1980          1      0   0    0
2          2 Female   1980   1 1981          1      0   1    1
3          3 Female   1980   2 1982          1      0   2    2
4          4 Female   1980   3 1983          1      0   3    3
5          5 Female   1980   4 1984          1      0   4    4
6          6 Female   1980   5 1985          1      0   5    5
7          7 Female   1981   0 1981          2      1   0    1
8          8 Female   1981   1 1982          2      1   1    2
9          9 Female   1981   2 1983          2      1   2    3
10        10 Female   1981   3 1984          2      1   3    4
11        11 Female   1981   4 1985          2      1   4    5
12        12 Female   1982   0 1982          3      2   0    2
13        13 Female   1982   1 1983          3      2   1    3
14        14 Female   1982   2 1984          3      2   2    4
15        15 Female   1982   3 1985          3      2   3    5
16        16 Female   1983   0 1983          4      3   0    3
17        17 Female   1983   1 1984          4      3   1    4
18        18 Female   1983   2 1985          4      3   2    5
19        19 Female   1984   0 1984          5      4   0    4
20        20 Female   1984   1 1985          5      4   1    5
21        21 Female   1985   0 1985          6      5   0    5

> out=mark(dipper.proc, dipper.ddl, invisible=FALSE, output=FALSE)
> out$pims$Phi[[1]]

$pim
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    3    4    5    6
[2,]    0    7    8    9   10   11
[3,]    0    0   12   13   14   15
[4,]    0    0    0   16   17   18
[5,]    0    0    0    0   19   20
[6,]    0    0    0    0    0   21

$group
[1] 1
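The link between the triangular PIM and the design data can be reproduced in a few lines of base R. The sketch below is not RMark code; it rebuilds the all-different PIM for one group and a stripped-down version of its design data (par.index, cohort, age, time) from nocc and begin.time alone. The object names pim and ddl are ours.

```r
nocc       <- 7          # encounter occasions in the dipper data
begin.time <- 1980
n          <- nocc - 1   # survival intervals = rows and columns of the PIM

# Fill the upper triangle row by row with 1, 2, ..., n(n+1)/2.
# lower.tri() fills column by column, so fill it and then transpose.
pim <- matrix(0L, n, n)
pim[lower.tri(pim, diag = TRUE)] <- seq_len(n * (n + 1) / 2)
pim <- t(pim)

# One design-data row per PIM entry:
# row -> cohort, column -> time, distance from the diagonal -> age.
ddl <- do.call(rbind, lapply(seq_len(n), function(r) {
  cols <- r:n
  data.frame(par.index = pim[r, cols],
             cohort    = begin.time + r - 1,
             age       = cols - r,
             time      = begin.time + cols - 1)
}))

pim
ddl[ddl$par.index == 14, ]   # cohort 1982, age 2, time 1984
```

Reading the last line against the PIM shows the correspondence: index 14 sits in row 3 (cohort 1982) and column 5 (interval beginning 1984), two steps off the diagonal (age 2).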

Thus, cohort represents the rows, time the columns, age is constant along each diagonal, and group is specified by the collection of parameters in a PIM. Based on that knowledge, we can easily relate the design data to the PIM values. For example, knowing that par.index indexes the values in the PIM, consider the design data columns ’cohort’ and ’Cohort’. For cohort, the first 6 rows of the design data represent the 1980 release cohort, or all those individuals who were released in 1980 and recaptured on subsequent capture occasions. Cohort (capital C) provides the same information via a numeric, as opposed to factor, representation. For the release cohort of 1985, because recapture could only occur in 1986, we can only estimate the survival parameter for the period 1985-1986, so there is a single value for cohort and Cohort, both of which relate back to the single value for par.index (21). For those who are interested, the above was just shown as an example; you can use the PIMS function to look at the output in a prettier format (which is preferable to some) that has the appropriate row and column labels and is labeled by group, as we saw earlier.

> PIMS(mark(dipper.proc, dipper.ddl, invisible=FALSE, run=FALSE), "Phi", simplified=F)

group = sexFemale
     1980 1981 1982 1983 1984 1985
1980    1    2    3    4    5    6
1981         7    8    9   10   11
1982             12   13   14   15
1983                  16   17   18
1984                       19   20
1985                            21
group = sexMale
     1980 1981 1982 1983 1984 1985
1980   22   23   24   25   26   27
1981        28   29   30   31   32
1982             33   34   35   36
1983                  37   38   39
1984                       40   41
1985                            42

Obviously, if we wanted to see the PIM structure for the recapture parameter p, it is equally simple. Just to relate it to the above, you can see that the recapture parameters are +1 sampling occasion from the survival parameters, exactly as they should be. Important note: the time and age values for the interval parameters (Phi) are labeled based on the time or age at the beginning of the interval, while those for the occasion parameters (p) are labeled by the occasion time or age. So, there will be a little bit of difference in the design data between Phi and p in our example.

> PIMS(mark(dipper.proc, dipper.ddl, invisible=FALSE, run=FALSE), "p", simplified=F)

group = sexFemale
     1981 1982 1983 1984 1985 1986
1980   43   44   45   46   47   48
1981        49   50   51   52   53
1982             54   55   56   57
1983                  58   59   60
1984                       61   62
1985                            63
group = sexMale
     1981 1982 1983 1984 1985 1986
1980   64   65   66   67   68   69
1981        70   71   72   73   74
1982             75   76   77   78
1983                  79   80   81
1984                       82   83
1985                            84
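The index ranges and column labels in the Phi and p PIMs follow from simple arithmetic, sketched here in base R. This only illustrates the numbering seen in the output above and is not RMark code; all object names are ours.

```r
n.groups <- 2                 # Female, Male
n        <- 6                 # intervals (7 occasions - 1)
per.pim  <- n * (n + 1) / 2   # 21 entries in each triangular PIM

# Phi indices come first; p indices continue where Phi stops.
phi.index <- seq_len(n.groups * per.pim)
p.index   <- max(phi.index) + seq_len(n.groups * per.pim)

# Column labels: Phi uses the start of each interval,
# p uses the recapture occasion, one occasion later.
phi.cols <- 1980 + 0:(n - 1)
p.cols   <- phi.cols + 1

range(phi.index)
range(p.index)
```

The ranges correspond to the first and last entries printed in the all-different PIMs for Phi and p, respectively.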

Given the default structure of the design data, one can specify a significant number of explanatory models. However, we are not restricted to just using the default design data for model development and analysis. RMark, via the R interface, allows the user to add any number of fields to the design data, such as bins based on age, time, or cohort, that the user can use to constrain parameters to be the same within a specific bin. New variables can be defined, and new data can be merged into the design data easily and efficiently. There is significantly more detail on manipulating design data in Appendix C of the MARKBOOK, but for brevity we will just show a couple of quick examples here. For instance, let's use the ubiquitous dipper example of adding flood years to the design data for the 2 flood periods (the 1981-1982 and 1982-1983 intervals) for the apparent survival parameter Phi.

> dipper.ddl$Phi$Flood=0
> dipper.ddl$Phi$Flood[dipper.ddl$Phi$time==1981 | dipper.ddl$Phi$time==1982]=1
> dipper.ddl$Phi[(1:21),-c(2, 11)]

   par.index  group cohort age time occ.cohort Cohort Age Time Flood
1          1 Female   1980   0 1980          1      0   0    0     0
2          2 Female   1980   1 1981          1      0   1    1     1
3          3 Female   1980   2 1982          1      0   2    2     1
4          4 Female   1980   3 1983          1      0   3    3     0
5          5 Female   1980   4 1984          1      0   4    4     0
6          6 Female   1980   5 1985          1      0   5    5     0
7          7 Female   1981   0 1981          2      1   0    1     1
8          8 Female   1981   1 1982          2      1   1    2     1
9          9 Female   1981   2 1983          2      1   2    3     0
10        10 Female   1981   3 1984          2      1   3    4     0
11        11 Female   1981   4 1985          2      1   4    5     0
12        12 Female   1982   0 1982          3      2   0    2     1
13        13 Female   1982   1 1983          3      2   1    3     0
14        14 Female   1982   2 1984          3      2   2    4     0
15        15 Female   1982   3 1985          3      2   3    5     0
16        16 Female   1983   0 1983          4      3   0    3     0
17        17 Female   1983   1 1984          4      3   1    4     0
18        18 Female   1983   2 1985          4      3   2    5     0
19        19 Female   1984   0 1984          5      4   0    4     0
20        20 Female   1984   1 1985          5      4   1    5     0
21        21 Female   1985   0 1985          6      5   0    5     0

In addition, we can also check what the PIM looks like for the model that uses our newly created Flood covariate by defining a new formula (more on that soon) and looking at the PIMs:


> PIMS(mark(dipper.proc, dipper.ddl, model.parameters=list(p=list(formula=~time),
+ Phi=list(formula=~Flood)), invisible=FALSE, run=FALSE), "Phi", simplified=TRUE)

group = sexFemale
     1980 1981 1982 1983 1984 1985
1980    1    2    2    1    1    1
1981         2    2    1    1    1
1982              2    1    1    1
1983                   1    1    1
1984                        1    1
1985                             1
group = sexMale
     1980 1981 1982 1983 1984 1985
1980    1    2    2    1    1    1
1981         2    2    1    1    1
1982              2    1    1    1
1983                   1    1    1
1984                        1    1
1985                             1

We can also use some handy RMark functions to manipulate and create new design data.
For instance, let's say that we want to create age intervals for survival (young, sub-adult,
adult). We can use add.design.data to create the new design data. Also, note the use of
right=FALSE, which defines which endpoint is included in each interval (here the bins are
closed on the left, e.g. [1,3)), and replace=TRUE, which overwrites any existing field with
the same name. And, just for consistency, we will output the PIM structure as well.


> dipper.ddl=add.design.data(dipper.proc, dipper.ddl, parameter="Phi",
+ type="age", bins=c(0,1,3,6), right=FALSE, replace=TRUE, name="newages")
> dipper.ddl$Phi[(1:21), -c(2, 11)]

   par.index  group cohort age time occ.cohort Cohort Age Time Flood newages
 1         1 Female   1980   0 1980          1      0   0    0     0   [0,1)
 2         2 Female   1980   1 1981          1      0   1    1     1   [1,3)
 3         3 Female   1980   2 1982          1      0   2    2     1   [1,3)
 4         4 Female   1980   3 1983          1      0   3    3     0   [3,6]
 5         5 Female   1980   4 1984          1      0   4    4     0   [3,6]
 6         6 Female   1980   5 1985          1      0   5    5     0   [3,6]
 7         7 Female   1981   0 1981          2      1   0    1     1   [0,1)
 8         8 Female   1981   1 1982          2      1   1    2     1   [1,3)
 9         9 Female   1981   2 1983          2      1   2    3     0   [1,3)
10        10 Female   1981   3 1984          2      1   3    4     0   [3,6]
11        11 Female   1981   4 1985          2      1   4    5     0   [3,6]
12        12 Female   1982   0 1982          3      2   0    2     1   [0,1)
13        13 Female   1982   1 1983          3      2   1    3     0   [1,3)
14        14 Female   1982   2 1984          3      2   2    4     0   [1,3)
15        15 Female   1982   3 1985          3      2   3    5     0   [3,6]
16        16 Female   1983   0 1983          4      3   0    3     0   [0,1)
17        17 Female   1983   1 1984          4      3   1    4     0   [1,3)
18        18 Female   1983   2 1985          4      3   2    5     0   [1,3)
19        19 Female   1984   0 1984          5      4   0    4     0   [0,1)
20        20 Female   1984   1 1985          5      4   1    5     0   [1,3)
21        21 Female   1985   0 1985          6      5   0    5     0   [0,1)

> PIMS(mark(dipper.proc, dipper.ddl, model.parameters=list(p=list(formula=~time),
+ Phi=list(formula=~newages)), invisible=FALSE, run=FALSE), "Phi", simplified=TRUE)

group = sexFemale
     1980 1981 1982 1983 1984 1985
1980    1    2    2    3    3    3
1981         1    2    2    3    3
1982              1    2    2    3
1983                   1    2    2
1984                        1    2
1985                             1
group = sexMale
     1980 1981 1982 1983 1984 1985
1980    1    2    2    3    3    3
1981         1    2    2    3    3
1982              1    2    2    3
1983                   1    2    2
1984                        1    2
1985                             1
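To see what right=FALSE does to the bin boundaries, here is a minimal base-R sketch using cut() on a made-up age vector. That add.design.data relies on something like cut() internally is an assumption on my part, not something these notes confirm; the point is only that the same breaks give different interval labels depending on which side is closed.

```r
# Toy ages 0..5 (illustrative only, standing in for the age field of the design data)
age <- 0:5

# Left-closed bins, matching the newages labels shown above: [0,1) [1,3) [3,6]
left_closed <- cut(age, breaks = c(0, 1, 3, 6), right = FALSE, include.lowest = TRUE)

# Right-closed bins from the same breaks: [0,1] (1,3] (3,6]
right_closed <- cut(age, breaks = c(0, 1, 3, 6), right = TRUE, include.lowest = TRUE)

print(data.frame(age, left_closed, right_closed))
```

With left-closed bins age 1 falls in the second bin, while with right-closed bins it stays in the first, so the choice changes which ages share a survival parameter.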

As an alternative option, we can use merge_design.covariates with the .ddl and a
user-defined dataframe to add data to the .ddl for temporal covariates or for a group-specific
variable (see ?merge_design.covariates for additional details).


> df=data.frame(time=c(1980:1986), covar=c(4,5,6,7,8,9,10))
> dipper.ddl$Phi=merge_design.covariates(dipper.ddl$Phi, df, bytime=TRUE)
> dipper.ddl$Phi[(1:21),-c(1:2, 11)]

   model.index  group cohort age occ.cohort Cohort Age Time Flood newages covar
 1           1 Female   1980   0          1      0   0    0     0   [0,1)     4
 2           2 Female   1980   1          1      0   1    1     1   [1,3)     5
 3           3 Female   1980   2          1      0   2    2     1   [1,3)     6
 4           4 Female   1980   3          1      0   3    3     0   [3,6]     7
 5           5 Female   1980   4          1      0   4    4     0   [3,6]     8
 6           6 Female   1980   5          1      0   5    5     0   [3,6]     9
 7           7 Female   1981   0          2      1   0    1     1   [0,1)     5
 8           8 Female   1981   1          2      1   1    2     1   [1,3)     6
 9           9 Female   1981   2          2      1   2    3     0   [1,3)     7
10          10 Female   1981   3          2      1   3    4     0   [3,6]     8
11          11 Female   1981   4          2      1   4    5     0   [3,6]     9
12          12 Female   1982   0          3      2   0    2     1   [0,1)     6
13          13 Female   1982   1          3      2   1    3     0   [1,3)     7
14          14 Female   1982   2          3      2   2    4     0   [1,3)     8
15          15 Female   1982   3          3      2   3    5     0   [3,6]     9
16          16 Female   1983   0          4      3   0    3     0   [0,1)     7
17          17 Female   1983   1          4      3   1    4     0   [1,3)     8
18          18 Female   1983   2          4      3   2    5     0   [1,3)     9
19          19 Female   1984   0          5      4   0    4     0   [0,1)     8
20          20 Female   1984   1          5      4   1    5     0   [1,3)     9
21          21 Female   1985   0          6      5   0    5     0   [0,1)     9
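Conceptually, merging by time is just a lookup of each design data row's time in the covariate table. The sketch below does that lookup with base R's match() on a small made-up stand-in for dipper.ddl$Phi (the toy_phi data frame is hypothetical); it is not how merge_design.covariates is implemented, only an illustration of the result one should expect.

```r
# Hypothetical miniature of the Phi design data: a few cohort/time rows
toy_phi <- data.frame(cohort = c(1980, 1980, 1980, 1981, 1981, 1982),
                      time   = c(1980, 1981, 1982, 1981, 1982, 1982))

# Time-specific covariate table, same values as df in the notes
df <- data.frame(time = 1980:1986, covar = c(4, 5, 6, 7, 8, 9, 10))

# Look up each row's time in df and copy the matching covariate value across
toy_phi$covar <- df$covar[match(toy_phi$time, df$time)]
print(toy_phi)
```

Every row with the same time gets the same covariate value regardless of cohort, which is the pattern visible in the covar column above.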


Model Formula and Design Matrix

RMark relies on a formula-based approach to developing the design matrix structure
that is passed to MARK. For those of you who use R, or most any other command line interface,
for inference this will make sense, but for the strict MARK user this may be a new experience.
Depending on your point of view, the formula-driven interface for creating model structure
and data in RMark is either the best thing since sliced bread, or a bastardisation of all
things MARK (you can see regular snarky comments to the second point by some faceless
person whose user name rhymes with 'hooch' that likes to troll around on
http://www.phidot.org/forum/index.php).

For what it's worth, the approach of manual model creation used by MARK is a fantastic
learning tool, but it can be frustrating and extremely time-consuming as model complexity
increases, or as additional data are gathered. Both Jeff and I are obvious proponents of
starting with the basics when learning how model structures are designed, but Jeff
developed RMark 1) to have a formula-based interface to model creation, 2) to simplify
his own work, as many of his projects were monitoring focused, so new data were added
regularly, requiring him to re-develop the MARK structure each year, and 3) to make his
work more reliable and reproducible. Obviously RMark does not eliminate errors (I have
surely made them when using it), but it does simplify some aspects of model development.

Most of what we have seen so far has shown the relationship between the design data list
and the PIM structure. However, the underlying approach that RMark uses is to create a
design matrix that is then passed to MARK for evaluation. The crux of this is the R function
model.matrix. As an example, let's consider our dipper data and assume we want a model
that has time-varying survival. If we had read the MARKBOOK (Chapter 6 to be exact) we
would know that you can specify an effect of time using diagonal values of 1 in the design
matrix. Not surprisingly, we can do this in RMark as well (only the first 21 rows are shown
here, otherwise this might get a bit tedious page-wise):


> dm=model.matrix(~time, dipper.ddl$Phi)
> head(dm, 21)

   (Intercept) time1981 time1982 time1983 time1984 time1985
 1           1        0        0        0        0        0
 2           1        1        0        0        0        0
 3           1        0        1        0        0        0
 4           1        0        0        1        0        0
 5           1        0        0        0        1        0
 6           1        0        0        0        0        1
 7           1        1        0        0        0        0
 8           1        0        1        0        0        0
 9           1        0        0        1        0        0
10           1        0        0        0        1        0
11           1        0        0        0        0        1
12           1        0        1        0        0        0
13           1        0        0        1        0        0
14           1        0        0        0        1        0
15           1        0        0        0        0        1
16           1        0        0        1        0        0
17           1        0        0        0        1        0
18           1        0        0        0        0        1
19           1        0        0        0        1        0
20           1        0        0        0        0        1
21           1        0        0        0        0        1
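Note that ~time gives an intercept plus offsets for each later year rather than a literal diagonal of 1s. As a small sketch on a toy six-level time factor (a stand-in for the design data, not the real dipper.ddl), dropping the intercept with ~ -1 + time gives the identity-style coding, and the two codings span the same set of models.

```r
# Toy design data: one row per survival interval (illustrative stand-in)
toy <- data.frame(time = factor(1980:1985))

# Default R coding: intercept (1980 as reference) plus an offset column per later year
dm_treatment <- model.matrix(~ time, data = toy)

# Removing the intercept gives one indicator column per year, i.e. an identity matrix
dm_identity <- model.matrix(~ -1 + time, data = toy)

print(dm_treatment)
print(dm_identity)
```

Under the intercept coding the betas for 1981 onward are differences from 1980 on the link scale, which is why the beta estimates later in these notes are labelled Phi:(Intercept), Phi:time1981, and so on.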

We can use any formula to create the design matrix from the design data fields, assuming

they are correct. For instance, we can create and show the design matrix for a model having

effects of sex and Flood,

> dm=model.matrix(~sex + Flood, dipper.ddl$Phi)
> head(dm, 6)

  (Intercept) sexMale Flood
1           1       0     0
2           1       0     1
3           1       0     1
4           1       0     0
5           1       0     0
6           1       0     0

or perhaps models with a sex*Flood effect,

> dm=model.matrix(~sex*Flood, dipper.ddl$Phi)
> head(dm, 6)

  (Intercept) sexMale Flood sexMale:Flood
1           1       0     0             0
2           1       0     1             0
3           1       0     1             0
4           1       0     0             0
5           1       0     0             0
6           1       0     0             0
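The first six rows above are all females, so the sexMale and sexMale:Flood columns are zero throughout and the interaction is hard to see. A small sketch on a made-up four-row data frame (one row per sex and flood combination, not the real design data) shows all columns in action.

```r
# Toy data: every combination of sex and Flood (illustrative only)
toy <- expand.grid(Flood = c(0, 1), sex = factor(c("Female", "Male")))

# sex*Flood expands to sex + Flood + sex:Flood
dm <- model.matrix(~ sex * Flood, data = toy)
print(cbind(toy, dm))
```

The interaction column is simply the product of the sexMale and Flood columns, so it is 1 only for males in flood years.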

So, what is happening here? Well, RMark takes the formulas (~time or ~sex + Flood),
uses model.matrix to create a design matrix for each parameter we define a model
structure for, and then pastes the resultant matrices together to create a single design
matrix that works for MARK. For our simple example, it would look like this:

    Phi design matrix          0
            0           p design matrix
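The block layout can be sketched in a few lines of base R. The helper block_diag below is hypothetical (it is not an RMark function), and the tiny three-row design data frames are made up; the sketch only illustrates how two per-parameter matrices end up in one matrix with zeros off the diagonal blocks.

```r
# Hypothetical helper (not part of RMark): place two matrices on a block diagonal
block_diag <- function(a, b) {
  out <- matrix(0, nrow = nrow(a) + nrow(b), ncol = ncol(a) + ncol(b))
  out[seq_len(nrow(a)), seq_len(ncol(a))] <- a
  out[nrow(a) + seq_len(nrow(b)), ncol(a) + seq_len(ncol(b))] <- b
  # Prefix column names with the parameter, mimicking the beta labels in the output
  colnames(out) <- c(paste0("Phi:", colnames(a)), paste0("p:", colnames(b)))
  out
}

# Toy design data for each parameter (illustrative stand-ins)
phi_data <- data.frame(time = factor(1980:1982))
p_data   <- data.frame(time = factor(1981:1983))

phi_dm <- model.matrix(~ time, phi_data)  # Phi(~time): 3 columns
p_dm   <- model.matrix(~ 1, p_data)       # p(~1): intercept only

full_dm <- block_diag(phi_dm, p_dm)
print(full_dm)
```

Each column of the combined matrix corresponds to one beta, so Phi(~time)p(~1) on this toy example has four betas.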

Formulas in RMark are not limited to only those fields in the design data. As you may
have noticed, individual covariate data are not included in the design data list (which you
can see using str(dipper.ddl)). Rather, individual covariate data are housed in the
processed data dataframe. Because MARK uses the names of the individual covariates in the
design matrix, which model.matrix cannot handle, RMark inserts dummy data into the design
data for any individual covariates used in the formula, then calls model.matrix and inserts
the covariate names as needed to construct the full design matrix for MARK.

Fitting Models

Now that we have a processed data frame and the appropriate design data list set
up, we can proceed with specifying the model we want to run by defining the parameter
specifications we want to use and then calling MARK to fit the model. These next steps tend
to work in concert with each other, so they are detailed together here; however, that does
not mean that they cannot be separated. The workflow pattern is simple: create the
parameter specification lists and insert those into the mark call with whatever optional
arguments are needed at both steps (detailed in ?mark and hence not discussed here) and
poof, MARK and RMark make some magic.

Parameter specifications are pretty straightforward for folks used to CLI approaches to

modeling. In a nutshell, the parameter specification is where you define what variable you

want to affect what model parameter. For instance (and perhaps easiest), we can create an

object for each parameter specification we want to use, giving each object a specific (unique)

name (although it is fine to use the same specification in multiple models), following:

> Phi.1=list(formula=~time)
> Phi.2=list(formula=~Time)
> Phi.3=list(formula=~sex*Flood)
> p.1=list(formula=~1)
> p.2=list(formula=~time)


The values used in the parameter specification are similar to those used in R in some
cases: a constant model (intercept only, so all Phi are equal) is ~1, while ~time is
equivalent to the t model in MARK and ~Time is a linear trend over time. Once your
parameter specifications are set up, you can fit MARK models by including the object names
in the model.parameters argument of mark. There are several ways to do this, but to keep
it simple, first off we will run a model using the Phi.1 and p.1 parameter specifications
defined above (the call and its output follow this paragraph). So, what do we see? First,
the output summary tells us that we used a CJS model, which is what we specified in
process.data(), so that's good. We also get the name of the model, the number of parameters
(Npar), the model fit criteria, and the model's resultant beta and real estimates. A couple
of things to notice: the beta parameters are labeled specifically to the time period of
interest, and the real parameter estimates are laid out in an upper-triangular (PIM-style)
matrix, labeled by the grouping variable (sex), and exported as part of the RMark output.
Now, if you just want to see the MARK output file (the marknnn.out file one would typically
see by retrieving the model results in MARK), you can type the model name (model.1) into R
and it will open the .txt file in whatever editor you have selected.


> model.1=mark(dipper.proc, dipper.ddl, model.parameters=list(Phi=Phi.1, p=p.1))

Output summary for CJS model
Name : Phi(~time)p(~1)

Npar :  7
-2lnL:  659.7301
AICc :  673.998

Beta
                  estimate        se        lcl       ucl
Phi:(Intercept)  0.5143907 0.4767803 -0.4200988 1.4488802
Phi:time1981    -0.6981405 0.5537199 -1.7834315 0.3871506
Phi:time1982    -0.6009358 0.5300996 -1.6399310 0.4380593
Phi:time1983    -0.0061058 0.5334610 -1.0516895 1.0394778
Phi:time1984    -0.0757114 0.5276503 -1.1099061 0.9584832
Phi:time1985    -0.1780631 0.5265653 -1.2101310 0.8540049
p:(Intercept)    2.2203956 0.3288850  1.5757810 2.8650102

Real Parameter Phi
Group:sexFemale
          1980      1981      1982      1983      1984      1985
1980 0.6258352 0.4541914 0.4783772 0.6244043 0.6079443 0.5832982
1981           0.4541914 0.4783772 0.6244043 0.6079443 0.5832982
1982                     0.4783772 0.6244043 0.6079443 0.5832982
1983                               0.6244043 0.6079443 0.5832982
1984                                         0.6079443 0.5832982
1985                                                   0.5832982

Group:sexMale
          1980      1981      1982      1983      1984      1985
1980 0.6258352 0.4541914 0.4783772 0.6244043 0.6079443 0.5832982
1981           0.4541914 0.4783772 0.6244043 0.6079443 0.5832982
1982                     0.4783772 0.6244043 0.6079443 0.5832982
1983                               0.6244043 0.6079443 0.5832982
1984                                         0.6079443 0.5832982
1985                                                   0.5832982

Real Parameter p
Group:sexFemale
          1981      1982      1983      1984      1985      1986
1980 0.9020661 0.9020661 0.9020661 0.9020661 0.9020661 0.9020661
1981           0.9020661 0.9020661 0.9020661 0.9020661 0.9020661
1982                     0.9020661 0.9020661 0.9020661 0.9020661
1983                               0.9020661 0.9020661 0.9020661
1984                                         0.9020661 0.9020661
1985                                                   0.9020661

Group:sexMale
          1981      1982      1983      1984      1985      1986
1980 0.9020661 0.9020661 0.9020661 0.9020661 0.9020661 0.9020661
1981           0.9020661 0.9020661 0.9020661 0.9020661 0.9020661
1982                     0.9020661 0.9020661 0.9020661 0.9020661
1983                               0.9020661 0.9020661 0.9020661
1984                                         0.9020661 0.9020661
1985                                                   0.9020661
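As a quick check on how the summary numbers hang together, the reported AICc can be reproduced from -2lnL and Npar with the usual small-sample formula AICc = -2lnL + 2K + 2K(K+1)/(n-K-1). The effective sample size n = 426 used here is taken from the str() output shown later in these notes for another model fitted to the same processed data; that it applies to this model is my assumption, although it does reproduce the printed value.

```r
# Values typed in from the output summary for Phi(~time)p(~1)
neg2lnl <- 659.7301  # -2lnL
k       <- 7         # Npar
n       <- 426       # effective sample size (assumed, see lead-in)

# AIC plus the small-sample correction term
aicc <- neg2lnl + 2 * k + (2 * k * (k + 1)) / (n - k - 1)
print(round(aicc, 3))
```

The correction term is small here (about 0.27) because n is large relative to the number of parameters.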


One of the real benefits of RMark is the ability to script analyses, especially for those folks
who are involved in continual, mark-recapture based monitoring programs. Jeff developed
RMark specifically because it simplified re-analysis of monitoring data: he no longer had
to re-create the MARK analysis each year as one more column of data was added. The ability
to easily script analyses of monitoring data is one of the best reasons to learn RMark. As
an example, let's assume, using our dipper dataset, that we have a set of models which we
run after each year's data collection.

> dipper.monitoring=function()
+ {
+   Phi.1=list(formula=~1)
+   p.1=list(formula=~1)
+   cml=create.model.list("CJS")
+   mark.wrapper(cml, data=dipper.proc, ddl=dipper.ddl, output=FALSE)
+ }
> dipper.res=dipper.monitoring()
> dipper.res$model.table

  Phi  p        model npar    AICc DeltaAICc weight Deviance
1  ~1 ~1 Phi(~1)p(~1)    2 670.866         0      1 84.36055

OK, so let's add a year's worth of information typical of a monitoring program (for
simplicity we will just assume none of the critters were captured during this new encounter
occasion and that we are only using constant models for Phi and p) and then re-initiate
the modeling process. First, let's load the original data, append the additional encounter
occasion to each capture history, and have a look at both versions.


> dipper.nextyr=convert.inp("C:/Program Files (x86)/MARK/Examples/dipper.inp",
+ group.df= data.frame(sex=c("Male", "Female")))
> head(dipper.nextyr)

         ch freq  sex
1:1 1111110    1 Male
1:3 1111000    1 Male
1:6 1100000    1 Male
1:7 1100000    1 Male
1:8 1100000    1 Male
1:9 1100000    1 Male

> nextyr=rep(0, nrow(dipper.nextyr))
> dipper.nextyr$ch=paste(dipper.nextyr$ch, nextyr, sep="")
> head(dipper.nextyr)

          ch freq  sex
1:1 11111100    1 Male
1:3 11110000    1 Male
1:6 11000000    1 Male
1:7 11000000    1 Male
1:8 11000000    1 Male
1:9 11000000    1 Male

Next, let's process the new capture history data and re-run the analysis in full.

> dipper.next=process.data(dipper.nextyr, model="CJS", groups="sex", begin.time=1980)
> dipper.ddlnext=make.design.data(dipper.next)
> dipper.nextmonitoring=function()
+ {
+   Phi.1=list(formula=~1)
+   p.1=list(formula=~1)
+   cmlnext=create.model.list("CJS")
+   mark.wrapper(cmlnext, data=dipper.next, ddl=dipper.ddlnext, output=FALSE)
+ }
> dipper.resnext=dipper.nextmonitoring()
> dipper.resnext$model.table

  Phi  p        model npar     AICc DeltaAICc weight Deviance
1  ~1 ~1 Phi(~1)p(~1)    2 789.6593         0      1  203.159


And quickly look at the resultant real estimates for each model.

> dipper.res[[1]]$results$real

                               estimate        se       lcl       ucl fixed note
Phi gFemale c1980 c1 a0 t1980 0.5602430 0.0251330 0.5105493 0.6087577
p gFemale c1980 c1 a1 t1981   0.9025835 0.0285857 0.8304826 0.9460113

> dipper.resnext[[1]]$results$real

                               estimate        se       lcl       ucl fixed note
Phi gFemale c1980 c1 a0 t1980 0.4759493 0.0244834 0.4283289 0.5240111
p gFemale c1980 c1 a1 t1981   0.8722790 0.0360719 0.7835798 0.9279669

As you can see, this is one of the primary strengths of RMark for MARK-style analyses: the
simplicity of using a script to re-evaluate updated datasets. Additionally, this is the first
place we have actually extracted the information that folks typically want from an analysis,
the real estimates. So, to see where stuff is in a MARK object, look at the structure of the
results list for model.extract below. What you can see is that the results element of the
MARK model object is a list with 16 objects, which include the model fit criteria (lnl = -2
times the log likelihood, npar = number of parameters, etc.), as well as the beta and real
estimates and their associated measures of precision (SE, CL). In addition, there are a few
values (e.g., derived, covariate.values) that are empty or NULL in this example, primarily
because they are not used with model="CJS". Looking at the structure of a mark() object is
one of the quickest ways to find where the parameter estimates from the model are stored.


> model.extract=mark(dipper.proc, dipper.ddl, model.parameters=list(p=list(formula=~time),
+ Phi=list(formula=~time)), invisible=FALSE, output=FALSE)

Note: only 11 parameters counted of 12 specified parameters
AICc and parameter count have been adjusted upward

> str(model.extract$results)

List of 16
 $ lnl             : num 657
 $ deviance        : num 74.5
 $ deviance.df     : num 30
 $ npar            : int 12
 $ npar.unadjusted : num 11
 $ n               : int 426
 $ AICc            : num 682
 $ AICc.unadjusted : num 680
 $ beta            :'data.frame': 12 obs. of 4 variables:
  ..$ estimate: num [1:12] 0.935 -1.198 -1.023 -0.42 -0.536 ...
  ..$ se      : num [1:12] 0.769 0.871 0.805 0.809 0.803 ...
  ..$ lcl     : num [1:12] -0.571 -2.905 -2.6 -2.006 -2.11 ...
  ..$ ucl     : num [1:12] 2.442 0.508 0.555 1.166 1.038 ...
 $ real            :'data.frame': 12 obs. of 6 variables:
  ..$ estimate: num [1:12] 0.718 0.435 0.478 0.626 0.599 ...
  ..$ se      : num [1:12] 0.1555 0.0688 0.0597 0.0593 0.0561 ...
  ..$ lcl     : num [1:12] 0.361 0.308 0.364 0.505 0.486 ...
  ..$ ucl     : num [1:12] 0.92 0.571 0.594 0.733 0.702 ...
  ..$ fixed   : Factor w/ 1 level " ": 1 1 1 1 1 1 1 1 1 1 ...
  ..$ note    : Factor w/ 1 level " ": 1 1 1 1 1 1 1 1 1 1 ...
 $ beta.vcv        : num [1:12, 1:12] 0.591 -0.635 -0.591 -0.591 -0.591 ...
 $ derived         :'data.frame': 0 obs. of 0 variables
 $ derived.vcv     : NULL
 $ covariate.values: NULL
 $ singular        : int 6
 $ real.vcv        : NULL

So, if we want to see the real estimates for this model, simple enough to do:

> model.extract$results$real[-c(5:6)]

                               estimate         se           lcl       ucl
Phi gFemale c1980 c1 a0 t1980 0.7181814  0.1555470  3.610406e-01 0.9199573
Phi gFemale c1980 c1 a1 t1981 0.4346708  0.0688290  3.075048e-01 0.5710588
Phi gFemale c1980 c1 a2 t1982 0.4781705  0.0597091  3.643839e-01 0.5942685
Phi gFemale c1980 c1 a3 t1983 0.6261177  0.0592656  5.048461e-01 0.7333741
Phi gFemale c1980 c1 a4 t1984 0.5985334  0.0560517  4.855434e-01 0.7019412
Phi gFemale c1980 c1 a5 t1985 0.7655931 71.1786310 3.220256e-304 1.0000000
p gFemale c1980 c1 a1 t1981   0.6962022  0.1657638  3.302966e-01 0.9141505
p gFemale c1980 c1 a2 t1982   0.9230770  0.0728777  6.161498e-01 0.9889758
p gFemale c1980 c1 a3 t1983   0.9130436  0.0581757  7.140651e-01 0.9778505
p gFemale c1980 c1 a4 t1984   0.9007892  0.0538330  7.360175e-01 0.9672855
p gFemale c1980 c1 a5 t1985   0.9324138  0.0458025  7.684927e-01 0.9828579
p gFemale c1980 c1 a6 t1986   0.6930735 64.4363440 3.231497e-258 1.0000000


> model.extract$results$beta[-c(5:6)]

                  estimate          se          lcl         ucl
Phi:(Intercept)  0.9354584   0.7685249   -0.5708503   2.4417672
Phi:time1981    -1.1982777   0.8706725   -2.9047958   0.5082404
Phi:time1982    -1.0228321   0.8049170   -2.6004694   0.5548053
Phi:time1983    -0.4198612   0.8091512   -2.0057975   1.1660751
Phi:time1984    -0.5361003   0.8031457   -2.1102660   1.0380653
Phi:time1985     0.2481340 396.6238100 -777.1345600 777.6308300
p:(Intercept)    0.8292778   0.7837356   -0.7068440   2.3653996
p:time1982       1.6556297   1.2913792   -0.8754736   4.1867330
p:time1983       1.5220984   1.0729143   -0.5808136   3.6250104
p:time1984       1.3767462   0.9884820   -0.5606785   3.3141708
p:time1985       1.7950952   1.0688767   -0.2999032   3.8900937
p:time1986      -0.0147500 302.9104900 -593.7193300 593.6898300
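The lcl and ucl columns look like plain Wald intervals on the link scale, estimate plus or minus 1.96 standard errors. The sketch below checks this for Phi:(Intercept) using the printed values; the 1.96 multiplier is inferred from the numbers rather than stated anywhere in these notes. Back-transforming the beta limits with plogis() (R's inverse logit) also recovers the limits reported for the first real Phi estimate.

```r
# Phi:(Intercept) typed in from the beta table above
est <- 0.9354584
se  <- 0.7685249

# Wald limits on the logit scale (1.96 multiplier is an inference from the output)
lcl <- est - 1.96 * se
ucl <- est + 1.96 * se
print(c(lcl = lcl, ucl = ucl))

# Back-transform the limits to the probability scale
real_ci <- plogis(c(lcl, ucl))
print(round(real_ci, 7))
```

Intervals built this way on the logit scale always stay inside 0 and 1 once back-transformed, which is why the real limits are asymmetric around the estimate.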

> model.extract$results$beta$estimate[1]

[1] 0.9354584

Just so it makes sense how the beta and real estimates are linked, we can reproduce the
apparent survival probability for the 1981 period (0.4346708) by extracting the
appropriate beta estimates and using the logistic distribution function plogis() to
transform the betas back to the real scale.

> round(plogis(model.extract$results$beta$estimate[1] +
+ model.extract$results$beta$estimate[2]*1), 7)

[1] 0.4346708
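The same back-transformation can be done for every Phi at once by multiplying a design matrix by the beta vector. This is a minimal sketch with the Phi beta estimates typed in from the output above and a six-row toy data frame standing in for the distinct time levels of dipper.ddl$Phi (that stand-in is an assumption made so the example runs without MARK).

```r
# Phi betas from the beta table: intercept, then time1981..time1985
beta_phi <- c(0.9354584, -1.1982777, -1.0228321, -0.4198612, -0.5361003, 0.2481340)

# One row per time level; ~time gives intercept + offset columns
X <- model.matrix(~ time, data.frame(time = factor(1980:1985)))

# Linear predictor on the logit scale, then inverse logit to get real estimates
eta <- drop(X %*% beta_phi)
phi <- plogis(eta)
print(round(unname(phi), 7))
```

Each real estimate is the inverse logit of the intercept plus that year's offset, so the second element corresponds to the 1981 value computed by hand above.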

Plotting Results

One of the benefits of RMark, probably nearly as useful as model scripting, is that once
we have run a set of models, we can use all the tools available in R to further manipulate
the results from the fitted model(s). There are quite a few ways you can manipulate the
RMark output in R, but the one of primary interest will likely be how to build plots from
your results. Sticking with the dipper example, I am going to recreate from scratch, using
the dipper data housed in RMark, a sequence of models and use covariate.predictions() to
build what has become the standard example plot for showing how RMark works. So,
using the standard example:


> data(dipper)
> dipper$weight=rnorm(nrow(dipper), mean=10, sd=2)
> dipper.proc=process.data(dipper, model="CJS", groups="sex", begin.time=1980)
> dipper.ddl=make.design.data(dipper.proc)
> dipper.ddl$Phi$flood=ifelse(dipper.ddl$Phi$time%in%1982:1983,1,0)
> dipper.analysis=function()
+ {
+   Phi.1=list(formula=~time)
+   Phi.2=list(formula=~-1+time,link="sin")
+   Phi.3=list(formula=~sex+weight)
+   Phi.4=list(formula=~flood)
+   p.1=list(formula=~1)
+   p.2=list(formula=~time)
+   p.3=list(formula=~Time)
+   p.4=list(formula=~sex)
+   cml=create.model.list("CJS")
+   mark.wrapper(cml,data=dipper.proc,ddl=dipper.ddl,output=FALSE)
+ }
> dipper.results=dipper.analysis()
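With four Phi specifications and four p specifications, the wrapper has 16 model combinations to run. The base-R sketch below enumerates them with expand.grid() purely as an illustration; whether create.model.list orders the combinations exactly this way is an assumption, so check names(dipper.results) before relying on a position such as [[11]].

```r
# Names of the parameter specifications defined inside dipper.analysis()
phi_specs <- paste0("Phi.", 1:4)
p_specs   <- paste0("p.", 1:4)

# All Phi x p combinations; the first column varies fastest
combos <- expand.grid(Phi = phi_specs, p = p_specs, stringsAsFactors = FALSE)
print(nrow(combos))
print(combos[11, ])
```

Under this ordering, row 11 pairs Phi.3 (~sex+weight) with p.3 (~Time), which would be consistent with model 11 being used for the weight predictions that follow.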

Here, we use the model for sex and weight detailed above as an example of how to get
predictions of survival over a range of weight values for each sex (image found at end of
document).

> minmass=min(dipper$weight)
> maxmass=max(dipper$weight)
> mass.values=minmass+(0:30)*(maxmass-minmass)/30
> PIMS(dipper.results[[11]],"Phi",simplified=FALSE)

group = sexFemale
     1980 1981 1982 1983 1984 1985
1980    1    2    3    4    5    6
1981         7    8    9   10   11
1982             12   13   14   15
1983                  16   17   18
1984                       19   20
1985                            21
group = sexMale
     1980 1981 1982 1983 1984 1985
1980   22   23   24   25   26   27
1981        28   29   30   31   32
1982             33   34   35   36
1983                  37   38   39
1984                       40   41
1985                            42


> pdf("weight_plots.pdf")
> par(mfrow=c(2,1))
> Phibymass=covariate.predictions(dipper.results[[11]],
+ data=data.frame(weight=mass.values), indices=c(1))
> plot(Phibymass$estimates$covdata, Phibymass$estimates$estimate,
+ type="l",lwd=2,xlab="Mass(g)",ylab="Female Survival",
+ ylim=c(0,1),las=1)
> lines(Phibymass$estimates$covdata, Phibymass$estimates$lcl,lty=2)
> lines(Phibymass$estimates$covdata, Phibymass$estimates$ucl,lty=2)
> # Compute and plot survival values for males
> Phibymass=covariate.predictions(dipper.results[[11]],
+ data=data.frame(weight=mass.values),indices=c(22))
> plot(Phibymass$estimates$covdata, Phibymass$estimates$estimate,
+ type="l",lwd=2,xlab="Mass(g)",ylab="Male Survival",
+ ylim=c(0,1),las=1)
> lines(Phibymass$estimates$covdata, Phibymass$estimates$lcl,lty=2)
> lines(Phibymass$estimates$covdata, Phibymass$estimates$ucl,lty=2)
> dd=dev.off()

Other Stuff

Not sure how far we will get today, but I wanted to at least bullet out a few additional
topics we can/may look at or discuss individually/as a group post the introductory material
for those of you that are more advanced. A couple I would like to at least show an example
of are listed below, but if there are any you would like to discuss let me know and we will
work some time in on them.

• Archiving MARK analysis/code/metadata with R and RMark

• Exporting RMark-created files back to MARK using bfdeeR example

• Time-dependence & sampling covariates in robust design approaches

• Creating model/parameters combinations

In class, I did a couple of quick examples using the occupancy data provided by Larissa
and discussed on the first day to show how we could use RMark to get real parameter
estimates specific to a particular group (occurrence of Barred Owls in Larissa's example).
I had worked up a couple of options for how to create those estimates, shown here. The
first example shows how to create the appropriate estimate using an individual covariate
approach, similar to what Larissa did in MARK, while the second example shows how to
create an R factor variable and treat Barred Owl occurrence as a grouping variable in RMark
that then provides estimates for both levels of 'extra owl occurrence'.


Figure 1: Predicted female and male dipper survival at various initial values of mass (g). [Two panels, Female Survival and Male Survival, each plotted on a 0.0 to 1.0 axis against Mass(g).]


> x=convert.inp(
+ "F:/BretResearch/Workshops/AdvancedMarkworkshop/Advanced Mark workshop/NSO_SSoccupancy.inp",
+ covariates=c("BO", "s1", "s2", "s3", "s4", "s5", "s6"), use.comments=TRUE)
> x.proc=process.data(x, model="Occupancy")
> x.ddl=make.design.data(x.proc)
> mod=mark(x.proc, x.ddl, model.parameters=list(Psi=list(formula=~BO),
+ p=list(formula=~BO)), output=FALSE)
> fc=find.covariates(mod, x)
> fc$value[fc$var=="BO"]=1
> design=fill.covariates(mod,fc)
> compute.real(mod, design=design)

   estimate         se       lcl       ucl fixed
1 0.3514709 0.04935413 0.2617359 0.4530898
2 0.6661287 0.10196915 0.4482072 0.8305276

> xx=convert.inp(
+ "F:/NSO_SSoccupancy.inp", covariates=c("BO","s1", "s2", "s3","s4", "s5", "s6"),
+ use.comments=FALSE)
> xx$BO=factor(xx$BO,labels=c("N","Y"))
> xx.proc=process.data(xx, model="Occupancy", groups="BO")
> xx.ddl=make.design.data(xx.proc)
> mod=mark(xx.proc, xx.ddl, model.parameters=list(Psi=list(formula=~group),
+ p=list(formula=~group)))

Output summary for Occupancy model
Name : p(~group)Psi(~group)

Npar :  4
-2lnL:  792.2784
AICc :  800.5382

Beta
                  estimate        se        lcl        ucl
p:(Intercept)    0.9046457 0.1205681  0.6683322  1.1409592
p:groupY        -1.5172257 0.2478285 -2.0029696 -1.0314818
Psi:(Intercept)  0.3494579 0.1856169 -0.0143513  0.7132671
Psi:groupY       0.3412686 0.4946385 -0.6282229  1.3107602

Real Parameter p
                  1         2         3         4         5         6
Group:BON 0.7119033 0.7119033 0.7119033 0.7119033 0.7119033 0.7119033
Group:BOY 0.3514709 0.3514709 0.3514709 0.3514709 0.3514709 0.3514709

Real Parameter Psi
                  1
Group:BON 0.5864861
Group:BOY 0.6661285
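To see that the two approaches agree, the group-specific real estimates can be rebuilt from the four betas in the output above. This is only a sketch with the printed estimates typed in by hand; it assumes the logit link, which is consistent with the numbers shown.

```r
# Beta estimates typed in from the Occupancy output summary
p_int   <- 0.9046457;  p_groupY   <- -1.5172257
psi_int <- 0.3494579;  psi_groupY <-  0.3412686

# Group N is the reference level; group Y adds the groupY offset on the logit scale
p_real   <- plogis(c(BON = p_int,   BOY = p_int + p_groupY))
psi_real <- plogis(c(BON = psi_int, BOY = psi_int + psi_groupY))

print(round(p_real, 7))
print(round(psi_real, 7))
```

The BOY values (about 0.351 for p and 0.666 for Psi) line up with what compute.real returned for BO=1 in the individual covariate version.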

Finally, a couple of you asked about how to create data files that RMark could read.

There are a host of ways to do this, but using a simple example of made up data, one could

do this:


> mat=matrix(rbinom(100, 1, .5), ncol=10)
> x=data.frame(mat)
> ch=paste(x$X1, x$X2,x$X3, x$X4, x$X5, sep="")
> my.data=data.frame(ch = paste(x$X1, x$X2,x$X3, x$X4, x$X5, sep=""), freq=1)
> my.data$ch=as.character(my.data$ch)
> str(my.data)

'data.frame': 10 obs. of 2 variables:
 $ ch  : chr "11011" "10001" "11000" "10100" ...
 $ freq: num 1 1 1 1 1 1 1 1 1 1

> my.data

      ch freq
1  11011    1
2  10001    1
3  11000    1
4  10100    1
5  11100    1
6  10100    1
7  00010    1
8  00010    1
9  00011    1
10 10100    1

Or I wrote this little function (based on pasteCols in plotrix package) which does the
merge for you based on the columns you select out of your data (as either a matrix or a
data.frame):

> mat=matrix(rbinom(100, 1, .5), ncol=10)
> x=data.frame(mat)
> ch.merge=function(x){
+   # use the columns of the argument x rather than the global object mat
+   ch=paste("list(", paste("x", "[,", 1:dim(x)[2],"]", sep = "", collapse = ","), ")", sep = "")
+   return(data.frame(ch=do.call(paste, c(eval(parse(text = ch)), sep = "")),
+   stringsAsFactors=FALSE))
+ }
> xx=ch.merge(x)
> xx

           ch
1  1011010111
2  1001110110
3  1001110100
4  0111110101
5  1110101110
6  0111111001
7  0111000011
8  0100111010
9  0100011010
10 1000111100

> str(xx)

'data.frame': 10 obs. of 1 variable:
 $ ch: chr "1011010111" "1001110110" "1001110100" "0111110101" ...

Or, as Jeff Laake just informed me, he has a function called collapseCH in his marked
package which does (roughly) the same as I show above:


> library(marked)
> mat=matrix(rbinom(100, 1, .5), ncol=10)
> xy=data.frame(ch=collapseCH(mat,prefix="X"))
> xy

           ch
1  1110010100
2  0100101001
3  1111000011
4  1001111010
5  1001001100
6  0101011100
7  1001100001
8  1010001101
9  0001101111
10 1101101011

> str(xy)

'data.frame': 10 obs. of 1 variable:
 $ ch: Factor w/ 10 levels "0001101111","0100101001",..: 9 2 10 6 4 3 5 7 1 8
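Notice that ch came back as a factor in that last str() call, whereas the earlier examples made a point of keeping ch as character. A base-R alternative that needs no extra package is sketched below; it pastes each row of a made-up 0/1 matrix together with apply() and stores the result as a character column (set.seed is only there to make the toy data repeatable).

```r
# Made-up encounter matrix: 10 animals by 10 occasions (illustrative only)
set.seed(1)
mat <- matrix(rbinom(100, 1, 0.5), ncol = 10)

# Collapse each row into a single capture-history string
ch <- apply(mat, 1, paste, collapse = "")

# Keep ch as character rather than factor
my.data <- data.frame(ch = ch, freq = 1, stringsAsFactors = FALSE)
str(my.data)
```

If you do end up with a factor, as.character() on the ch column converts it back, as was done for my.data earlier.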
