Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys
Designs and Models for Aquatic Resource Surveys (DAMARS). R82-9096-01. Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys. Breda Munoz, Virginia Lesser.
![Page 1: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/1.jpg)
Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys
Breda Munoz
Virginia Lesser
R82-9096-01
![Page 2: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/2.jpg)
This presentation was supported under STAR Research Assistance Agreement No. CR82-9096-01 awarded by the U.S. Environmental Protection Agency to Oregon State University. It has not been formally reviewed by EPA. The views expressed in this presentation are solely those of the authors, and EPA does not endorse any products or commercial services mentioned in this presentation.
![Page 3: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/3.jpg)
Outline
Missing data in environmental surveys
Nonignorable missing data mechanism
Model-based approach for nonignorable missing data
Design-based estimation and nonignorable missing data
Illustration
Summary
![Page 4: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/4.jpg)
Missing Data in Environmental Surveys
Researchers in environmental studies must obtain access to selected sites to gather field data.
Denial of access is a common problem in environmental surveys; the resulting unit nonresponse affects the results of data analysis.
![Page 5: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/5.jpg)
Response Disposition, 1995/1996 EMAP North Dakota Prairie Wetlands Studies (Lesser, 2001)

| Result | 1995 | 1996 |
|---|---|---|
| Private landowners | | |
| Agreed to access | 43% | 40% |
| Refused access | 36% | 37% |
| Undeliverable | 2% | 2% |
| Not returned/no contact | 16% | 14% |
| Public land | 3% | 7% |
| Total | 100% | 100% |
![Page 6: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/6.jpg)
Introduction
The 1995-1997 Maryland Biological Stream Survey results show an overall access denial rate of 10% (Boward et al., 1999).
ODFW habitat surveys, overall rate of access denial (Flitcroft et al., 2002): 1998: 10.0%; 1999: 6.0%; 2000: 12.5%.
![Page 7: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/7.jpg)
Assumptions
A probability sampling design is used to collect outcomes of a spatial random process $Y$.
$\{s_1, \dots, s_n\}$ is a collection of sampling sites selected using the probability sampling design.
At each site $s$: response $Y(s)$ and auxiliary variables $X(s)$.
$$R(s) = \begin{cases} 1 & \text{if access was granted for site } s \\ 0 & \text{otherwise} \end{cases}$$
![Page 8: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/8.jpg)
Missing Mechanism: Missing Completely at Random (MCAR)
[Diagram relating X1, X2, Y and R]
$$P(R(s_i) \mid Y(s_i), X(s_i)) = P(R(s_i))$$
Smith, Skinner and Clark (1999), Little and Rubin (2002)
![Page 9: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/9.jpg)
Missing Mechanism: Missing at Random (MAR)
[Diagram relating X1, X2, Y and R]
$$P(R(s_i) \mid Y(s_i), X(s_i)) = P(R(s_i) \mid X(s_i))$$
Smith, Skinner and Clark (1999), Little and Rubin (2002)
![Page 10: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/10.jpg)
Missing Mechanism: Nonignorable
[Diagram relating X1, X2, Y and R]
$P(R(s_i) \mid Y(s_i), X(s_i))$ depends on $Y(s_i)$, which is unobserved when access is denied.
Smith, Skinner and Clark (1999), Little and Rubin (2002)
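The practical difference between the three mechanisms can be seen in a small simulation. The sketch below is illustrative only: the data-generating model, the coefficients and the function name `simulate_mechanisms` are assumptions, not part of the presentation. It draws an auxiliary variable and a correlated response, applies an MCAR, a MAR and a nonignorable response rule, and reports how far the respondent mean drifts from the full-sample mean.

```python
import numpy as np

def simulate_mechanisms(n=20000, seed=0):
    """Bias of the respondent mean of y under three assumed response rules."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                    # auxiliary variable X(s)
    y = 1.0 + 0.8 * x + rng.normal(size=n)    # response Y(s), correlated with X

    def expit(t):
        return 1.0 / (1.0 + np.exp(-t))

    # Probability that access is granted (R = 1) under each mechanism.
    response_prob = {
        "MCAR": np.full(n, 0.7),              # depends on nothing
        "MAR": expit(1.0 + 1.0 * x),          # depends on X only
        "nonignorable": expit(1.0 - 1.0 * y), # depends on Y itself
    }
    bias = {}
    for name, p in response_prob.items():
        r = rng.random(n) < p                 # R(s) ~ Bernoulli(p)
        bias[name] = float(y[r].mean() - y.mean())
    return bias

if __name__ == "__main__":
    for name, b in simulate_mechanisms().items():
        print(f"{name:>13}: bias of respondent mean = {b:+.3f}")
```

Under MAR the respondent mean is also biased, but that bias can be removed using $X$ alone; under the nonignorable rule it cannot, which is what motivates the adjustments that follow.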
![Page 11: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/11.jpg)
Model-based Approach
Under a nonignorable mechanism, we model the joint probability of the data and the missing-mechanism indicator (the "response" indicator):
$$f(Y, R \mid \text{covariates}) = \underbrace{f(Y \mid \text{covariates})}_{\text{data model}} \; \underbrace{f(R \mid Y, \text{covariates})}_{\text{missing mechanism model}}$$
$$R(s_i) \sim \text{Bernoulli}(p_i), \qquad \operatorname{logit}(p_i) = \beta_0 + \beta_1 Y(s_i) + X\beta$$
where $X$ denotes the covariates.
![Page 12: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/12.jpg)
Model-assisted estimation and nonignorable missing data
Assume the parameter of interest is the total of the response $Y$ over the region $R$:
$$T_y = \int_{R} y(s)\, ds$$
![Page 13: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/13.jpg)
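Model-assisted estimation and nonignorable missing data
Continuous form of the Horvitz-Thompson estimator for the total (Cordy, 1993), with $\pi(s_i)$ the inclusion density at site $s_i$:
$$\hat T_y = \sum_{i=1}^{n} \frac{y(s_i)}{\pi(s_i)}$$
Let $\{Q_1, \dots, Q_k\}$ be a collection of fixed values. Then
$$\hat T_y = \sum_{i=1}^{n} \frac{y(s_i)\, I(y(s_i) \le Q_1)}{\pi(s_i)} + \sum_{j=2}^{k}\sum_{i=1}^{n} \frac{y(s_i)\, I(Q_{j-1} < y(s_i) \le Q_j)}{\pi(s_i)}$$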
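As a quick check that splitting the Horvitz-Thompson sum by classes of $y$ does not change the estimate, here is a minimal sketch. The function names and the toy numbers are illustrative assumptions; the cut points are taken to be increasing with the last one at or above the largest $y$, and each class is taken to be closed on the right.

```python
import numpy as np

def ht_total(y, pi):
    """Horvitz-Thompson total: sum of y(s_i) / pi(s_i)."""
    y = np.asarray(y, dtype=float)
    pi = np.asarray(pi, dtype=float)
    return float(np.sum(y / pi))

def ht_total_by_class(y, pi, q):
    """Contribution of each class (Q_{j-1}, Q_j] to the HT total."""
    y = np.asarray(y, dtype=float)
    pi = np.asarray(pi, dtype=float)
    q = np.asarray(q, dtype=float)
    if y.max() > q[-1]:
        raise ValueError("last cut point must be >= max(y)")
    # side="left" gives index j with q[j-1] < y <= q[j]; y <= q[0] maps to 0.
    cls = np.searchsorted(q, y, side="left")
    weighted = y / pi
    return np.array([weighted[cls == j].sum() for j in range(len(q))])

if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0]
    pi = [0.5, 0.5, 0.5, 0.5]
    parts = ht_total_by_class(y, pi, q=[2.0, 4.0])
    print(parts, parts.sum(), ht_total(y, pi))
```

The class contributions add up to the plain estimator, so the decomposition only matters once some of the terms are missing.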
![Page 14: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/14.jpg)
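Model-assisted estimation (cont.)
Sample size $n$: $n^*$ observed, $n - n^*$ missing (nonignorable).
$$\hat T_y = \sum_{i=1}^{n^*} \frac{y(s_i)\, I(y(s_i) \le Q_1)}{\pi(s_i)} + \sum_{j=2}^{k}\sum_{i=1}^{n^*} \frac{y(s_i)\, I(Q_{j-1} < y(s_i) \le Q_j)}{\pi(s_i)} + \underbrace{\sum_{i=n^*+1}^{n} \frac{y(s_i)\, I(y(s_i) \le Q_1)}{\pi(s_i)} + \sum_{j=2}^{k}\sum_{i=n^*+1}^{n} \frac{y(s_i)\, I(Q_{j-1} < y(s_i) \le Q_j)}{\pi(s_i)}}_{\text{missing}}$$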
![Page 15: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/15.jpg)
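Model-assisted estimation (cont.)

| Class | Observed $Q_1^*$ | Observed $Q_2^*$ | … | Observed $Q_k^*$ | Observed total | Missing $Q_1^*$ | Missing $Q_2^*$ | … | Missing $Q_k^*$ | Missing total | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | $n^*_{11}$ | $n^*_{12}$ | … | $n^*_{1k}$ | $n^*_{1}$ | $m_{11}$ | $m_{12}$ | … | $m_{1k}$ | $m_{1}$ | $n^*_{1} + m_{1}$ |
| 2 | $n^*_{21}$ | $n^*_{22}$ | … | $n^*_{2k}$ | $n^*_{2}$ | $m_{21}$ | $m_{22}$ | … | $m_{2k}$ | $m_{2}$ | $n^*_{2} + m_{2}$ |
| ⋮ | | | | | | | | | | | |
| c | $n^*_{c1}$ | $n^*_{c2}$ | … | $n^*_{ck}$ | $n^*_{c}$ | $m_{c1}$ | $m_{c2}$ | … | $m_{ck}$ | $m_{c}$ | $n^*_{c} + m_{c}$ |

$Q_j^*$ denotes the class $\{y(s_i) : Q_{j-1} < y(s_i) \le Q_j\}$, $i = 1, \dots, n$, $j = 2, \dots, k$.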
![Page 16: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/16.jpg)
Model-assisted estimation (cont.)
Likelihood:
$$L = \prod_{i=1}^{c}\prod_{j=1}^{k} P(R = 1 \mid Q_j, \text{Class } i)^{\,n^*_{ij}} \; \prod_{i=1}^{c}\prod_{j=1}^{k} P(R = 0 \mid Q_j, \text{Class } i)^{\,m_{ij}}$$
![Page 17: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/17.jpg)
Model-assisted estimation (cont.)
Reparameterize model parameters (Baker and Laird, 1988):
$$P(Q_j \mid \text{Class } i) = \frac{N_{ij} + M_{ij}}{N_i + M_i}$$
$$P(R(s) = 0 \mid Q_j, \text{Class } i) = \frac{M_{ij}}{N_{ij} + M_{ij}}$$
where $N_{ij}$ and $M_{ij}$ are expected cell counts.
![Page 18: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/18.jpg)
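Model-assisted estimation (cont.)
Use the EM algorithm to estimate the expected counts of the missing cells, $M_{ij}$.
E-step, with $n_{ij}$ the observed count and $m_{ij}$ the unobserved count of missing sites in cell $(i, j)$, and $m_i$ the number of missing sites in class $i$:
$$E(m_{ij}) = m_i \, \frac{P(R = 0 \mid \text{Class } i, Q_j)\, P(Q_j \mid \text{Class } i)}{\sum_{j=1}^{k} P(R = 0 \mid \text{Class } i, Q_j)\, P(Q_j \mid \text{Class } i)}$$
$$E(n_{ij}) = n_{ij}$$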
![Page 19: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/19.jpg)
Model-assisted estimation (cont.)
M-step: iterative proportional fitting (IPF) (Bishop et al., 1975), an algorithm based on fitting the marginal totals.
The EM algorithm always converges to a solution when IPF is used in the M-step (Baker and Laird, 1988).
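The E-step and M-step can be sketched for one specific model. The code below is a minimal sketch under a stated assumption that is not confirmed by the slides: nonresponse depends on the response class $Q_j$ but not on the auxiliary class $i$. For that model the M-step fit has a closed form, so the sketch uses it in place of a general IPF routine. The function name `em_missing_counts` and the toy counts are hypothetical.

```python
import numpy as np

def em_missing_counts(n_obs, m_row, n_iter=500, m_init=None):
    """EM estimate of the c x k missing counts m_ij.

    n_obs : c x k array of observed counts n_ij (all columns non-empty)
    m_row : length-c array, number of missing sites in each class
    Assumed model: P(R = 0) depends on the column Q_j only.
    """
    n_obs = np.asarray(n_obs, dtype=float)
    m_row = np.asarray(m_row, dtype=float)
    c, k = n_obs.shape
    if m_init is None:
        # Start by spreading each row's missing sites evenly over columns.
        m = np.repeat(m_row[:, None] / k, k, axis=1)
    else:
        m = np.asarray(m_init, dtype=float).copy()

    for _ in range(n_iter):
        # M-step: fitted missing counts M_ij = (n_ij + m_ij) * P(R=0 | Q_j).
        total = n_obs + m
        miss_rate = m.sum(axis=0) / total.sum(axis=0)
        M = total * miss_rate
        # E-step: E(m_ij) = m_i * M_ij / sum_j M_ij.
        row_sum = M.sum(axis=1, keepdims=True)
        share = np.divide(M, row_sum, out=np.full_like(M, 1.0 / k),
                          where=row_sum > 0)
        m = m_row[:, None] * share
    return m

if __name__ == "__main__":
    n = np.array([[36.0, 30.0], [63.0, 15.0]])   # toy observed counts
    print(em_missing_counts(n, m_row=[34.0, 22.0]))
```

Each iteration keeps the row totals of missing sites fixed and only redistributes them across the response classes. Convergence can be slow, and estimates may sit on the boundary of the parameter space, so the number of iterations here is arbitrary.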
![Page 20: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/20.jpg)
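Model-assisted estimation (cont.)
Possible estimators for the total of $Y$:
$$\hat T_y^{(\ell)} = \sum_{i=1}^{c}\sum_{j=1}^{k} I(R_{ij}(s) = 1)\, y_{ij}(s)\, w_{ij}^{(\ell)}(s)$$
where $w_{ij}^{(\ell)}(s)$ is the adjustment weight.
Cell adjustment, $\hat T_y^{(1)}$ (Little and Rubin, 2002), with $n_{ij}$ the observed and $m_{ij}$ the estimated missing count in cell $(i, j)$:
$$w_{ij}^{(1)}(s) = \frac{N}{\pi_{ij}(s)\,\dfrac{n_{ij}}{m_{ij}+n_{ij}}\;\displaystyle\sum_{i=1}^{c}\sum_{j=1}^{k}\dfrac{1}{\pi_{ij}(s)\,\dfrac{n_{ij}}{m_{ij}+n_{ij}}}}$$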
![Page 21: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/21.jpg)
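Model-assisted estimation (cont.)
Column adjustment, $\hat T_y^{(2)}$, with $n_{\cdot j}$ and $m_{\cdot j}$ the observed and missing totals of column $j$:
$$w_{ij}^{(2)}(s) = \frac{N}{\pi_{ij}(s)\,\dfrac{n_{\cdot j}}{m_{\cdot j}+n_{\cdot j}}\;\displaystyle\sum_{i=1}^{c}\sum_{j=1}^{k}\dfrac{1}{\pi_{ij}(s)\,\dfrac{n_{\cdot j}}{m_{\cdot j}+n_{\cdot j}}}}$$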
![Page 22: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/22.jpg)
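Model-assisted estimation (cont.)
Row adjustment, $\hat T_y^{(3)}$, with $n_{i\cdot}$ and $m_{i\cdot}$ the observed and missing totals of row $i$:
$$w_{ij}^{(3)}(s) = \frac{N}{\pi_{ij}(s)\,\dfrac{n_{i\cdot}}{m_{i\cdot}+n_{i\cdot}}\;\displaystyle\sum_{i=1}^{c}\sum_{j=1}^{k}\dfrac{1}{\pi_{ij}(s)\,\dfrac{n_{i\cdot}}{m_{i\cdot}+n_{i\cdot}}}}$$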
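The cell, column and row adjustments differ only in the level at which the response rate is computed. The sketch below computes just the inverse response-rate factor $(m + n)/n$ at each level from a table of observed counts and a table of estimated missing counts; the inclusion density and the normalization to $N$ are deliberately left out. The function name and the toy counts are illustrative assumptions.

```python
import numpy as np

def adjustment_factors(n_obs, m_hat):
    """Inverse response-rate factors at cell, column and row level.

    n_obs : c x k observed counts n_ij
    m_hat : c x k estimated missing counts m_ij (e.g. from an EM fit)
    Returns three c x k arrays so each cell can look up its factor.
    """
    n = np.asarray(n_obs, dtype=float)
    m = np.asarray(m_hat, dtype=float)
    cell = (m + n) / n
    col = (m.sum(axis=0) + n.sum(axis=0)) / n.sum(axis=0)   # one per column j
    row = (m.sum(axis=1) + n.sum(axis=1)) / n.sum(axis=1)   # one per row i
    return {
        "cell": cell,
        "column": np.broadcast_to(col, n.shape).copy(),
        "row": np.broadcast_to(row[:, None], n.shape).copy(),
    }

if __name__ == "__main__":
    n = [[36.0, 30.0], [63.0, 15.0]]
    m = [[4.0, 30.0], [7.0, 15.0]]
    for name, f in adjustment_factors(n, m).items():
        print(name, f.round(3).tolist())
```

In this toy table the missing rate depends only on the column, so the cell and column factors coincide while the row factors average over both columns.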
![Page 23: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/23.jpg)
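Model-assisted estimation (cont.)
Variance estimators are obtained using the bootstrap (Efron, 1994). The bootstrap produces asymptotically valid variance estimates.
$$\operatorname{var}\big(\hat T_y^{(\ell)}\big) = \frac{1}{M}\sum_{i=1}^{M}\Big(\hat T_y^{(\ell)(i)} - \bar T_y^{(\ell)}\Big)^2$$
where $\hat T_y^{(\ell)(i)}$ is the estimate from the $i$-th bootstrap sample and $\bar T_y^{(\ell)}$ is the mean of the $M$ bootstrap estimates.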
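The bootstrap variance is simply the mean squared deviation of the replicate estimates around their own mean. A minimal sketch, assuming the replicates have already been computed and using the $1/M$ divisor; the function name is illustrative.

```python
import numpy as np

def bootstrap_variance(replicates):
    """Variance of bootstrap replicate estimates, with divisor M."""
    t = np.asarray(replicates, dtype=float)
    return float(np.mean((t - t.mean()) ** 2))

if __name__ == "__main__":
    # Four made-up replicate totals, only to show the arithmetic.
    print(bootstrap_variance([1.0, 2.0, 3.0, 4.0]))
```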
![Page 24: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/24.jpg)
Illustration
We simulate a continuous multivariate normal spatial random process for $y$.
Population: John Day Middle Fork stream reaches
143 stream reaches divided into survey segments (~1 mile)
6536 survey segments
Area of 785 mi²
![Page 25: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/25.jpg)
Illustration
The population of stream reaches was stratified into 6 strata based on the number of survey segments:
"<10", "10-20", "20-30", "30-50", "50-100", ">100"
Nonignorable missing data were generated as:
$$R(s) = \begin{cases} 0 & \text{if } y(s) \text{ crosses the cutoff } z \\ 1 & \text{otherwise} \end{cases}$$
Missing rates of 15%, 30% and 50% were created.
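One way to generate missingness at a target rate is to set the cutoff at an empirical quantile of $y$. The sketch below assumes, purely for illustration, that values above the cutoff are the ones that go missing; the direction of the rule, the quantile-based cutoff and the function name are assumptions rather than details taken from the presentation.

```python
import numpy as np

def threshold_missingness(y, missing_rate):
    """Response indicator R: 0 where y lies above a quantile cutoff z."""
    y = np.asarray(y, dtype=float)
    z = float(np.quantile(y, 1.0 - missing_rate))  # cutoff for the target rate
    r = np.where(y > z, 0, 1)                      # R = 0 means missing
    return r, z

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.normal(loc=1.7, scale=2.0, size=6536)  # made-up population values
    for rate in (0.15, 0.30, 0.50):
        r, z = threshold_missingness(y, rate)
        print(f"target {rate:.0%}: cutoff {z:.2f}, missing {1 - r.mean():.1%}")
```

Because missingness is a deterministic function of $y$, no model that uses only auxiliary variables can reproduce it, which is the defining feature of a nonignorable mechanism.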
![Page 26: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/26.jpg)
John Day Middle Fork stream network
![Page 27: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/27.jpg)
Population Summary

| | Strata1 | Strata2 | Strata3 | Strata4 | Strata5 | Strata6 |
|---|---|---|---|---|---|---|
| Size | 246 | 433 | 269 | 1059 | 1208 | 3321 |
| Class 1 | 64.23% | 65.13% | 64.31% | 65.44% | 65.48% | 61.70% |
| Class 2 | 35.77% | 34.87% | 35.69% | 34.56% | 34.52% | 38.30% |
| Minimum | -2.07 | -2.99 | -3.96 | -2.18 | -2.37 | -5.47 |
| Mean | 1.63 | 1.68 | 1.66 | 1.70 | 1.73 | 1.80 |
| Max | 7.01 | 7.95 | 8.04 | 6.15 | 8.65 | 9.87 |
![Page 28: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/28.jpg)
Illustration
Sample size n = 100
Allocation proportional to the number of survey segments in each stratum
Q1 = first sample quantile
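Proportional allocation of $n = 100$ across the six strata can be worked out from the stratum sizes in the Population Summary (246, 433, 269, 1059, 1208, 3321). The sketch below uses largest-remainder rounding so the allocations add up to exactly 100; the rounding rule and the function name are assumptions, since the presentation does not say how fractional allocations were handled.

```python
def proportional_allocation(sizes, n):
    """Allocate n sample units to strata in proportion to their sizes."""
    total = sum(sizes)
    exact = [n * s / total for s in sizes]        # fractional allocations
    alloc = [int(e) for e in exact]               # round down first
    leftover = n - sum(alloc)
    # Hand the remaining units to the strata with the largest remainders.
    order = sorted(range(len(sizes)),
                   key=lambda i: exact[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc

if __name__ == "__main__":
    sizes = [246, 433, 269, 1059, 1208, 3321]
    print(proportional_allocation(sizes, 100))
```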
![Page 29: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/29.jpg)
John Day Middle Fork stream network and sample points
![Page 30: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/30.jpg)
Modified Bootstrap
We draw 1000 random samples of size 100 from the observed sample:
Independently across strata
Maintain proportional allocation
Maintain the row totals by the auxiliary variable
For each of the 1000 samples, we estimate $\hat T_y^{(1)}, \hat T_y^{(2)}, \hat T_y^{(3)}, \hat T_{yHT}$
We obtain a standard error and MSE for each estimate
We repeat this process 1000 times
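The resampling constraints can be met by drawing with replacement inside each group and keeping every group at its original size. The sketch below is one possible reading, not the authors' code: a group label could be a stratum, or a stratum combined with the auxiliary class if row totals must also be preserved. The function name and labels are hypothetical.

```python
import numpy as np

def stratified_bootstrap_indices(groups, rng):
    """Resample indices with replacement within each group.

    groups : array of group labels, one per sampled site
    rng    : numpy random Generator
    Every group keeps its original number of sites.
    """
    groups = np.asarray(groups)
    positions = np.arange(groups.size)
    picks = []
    for g in np.unique(groups):
        members = positions[groups == g]
        picks.append(rng.choice(members, size=members.size, replace=True))
    return np.concatenate(picks)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    groups = np.array(["s1", "s1", "s2", "s2", "s2"])
    idx = stratified_bootstrap_indices(groups, rng)
    print(idx, groups[idx])
```

Repeating the draw and recomputing each estimator on the resampled sites yields the replicate estimates from which standard errors and MSE are obtained.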
![Page 31: Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys](https://reader035.vdocuments.us/reader035/viewer/2022062422/56813f71550346895daa5351/html5/thumbnails/31.jpg)
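Summary

| Estimator | Estimate (15% missing) | MSE/100,000 (15%) | Coverage 95% CI (15%) | Estimate (30% missing) | MSE/100,000 (30%) | Coverage 95% CI (30%) | Estimate (50% missing) | MSE/100,000 (50%) | Coverage 95% CI (50%) |
|---|---|---|---|---|---|---|---|---|---|
| $\hat T_{yHT}$ | 10,624.20 | 17.24 | 73.5% | 8,299.13 | 109.70 | 20.0% | 7,646.15 | 149.81 | 0.1% |
| $\hat T_y^{(1)}$ | 11,266.93 | 12.78 | 94.3% | 10,929.37 | 23.12 | 91.2% | 14,788.26 | 130.94 | 7.2% |
| $\hat T_y^{(2)}$ | 11,183.85 | 13.77 | 93.5% | 10,860.73 | 23.47 | 90.6% | 14,790.36 | 131.02 | 7.1% |
| $\hat T_y^{(3)}$ | 12,401.27 | 23.60 | 80.2% | 11,741.42 | 22.09 | 94.1% | 14,380.28 | 105.22 | 14.1% |

$T_y$ = 11,445.13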