![Page 1: A Theoretical Framework for Adaptive Collection Designs](https://reader036.vdocuments.us/reader036/viewer/2022062315/56815a25550346895dc76779/html5/thumbnails/1.jpg)
A Theoretical Framework for Adaptive Collection Designs
Jean-François Beaumont, Statistics Canada
David Haziza, Université de Montréal
International Total Survey Error Workshop
Québec, June 19-22, 2011
Overview
Selected literature review
Framework
• Definition of the problem
• Choice of quality indicator and cost function
• Mathematical formulation of the problem
Solution and discussion
Conclusion
Literature review: Groves & Heeringa (2006, JRSS, Series A)
Responsive designs: Use paradata to guide changes in the features of data collection in order to achieve higher quality estimates per unit cost
• Paradata: Data about data collection process
• Examples of features: mode of data collection, use of incentives, …
• Need to define quality and determine quality indicators
• Two main concepts: phase and phase capacity
Literature review: Groves & Heeringa (2006, JRSS, Series A)
Phase: Period of data collection during which the same set of methods is used
• Phase 1: gather information about design features
• Phases 2+: alter features (e.g., subsampling of nonrespondents, larger incentives, …)
A phase is continued until its phase capacity is reached
• Judged by the stability of an indicator as the phase matures
Literature review: Schouten, Cobben & Bethlehem (2009, SM)
Goal: Determine an indicator of nonresponse bias as an alternative to response rates
Proposed a quality indicator, called the R-indicator:
$$R(\rho) = 1 - 2\,\text{Pop.Std.Dev.}(\rho_i,\ i \in U), \qquad 0 \le R(\rho) \le 1$$
• Population standard deviation must be estimated
• Response probabilities, $\rho_i$, must be estimated using some model
An issue: The indicator depends on the proper choice of model (choice of auxiliary variables)
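As a rough illustration of the R-indicator, the sketch below computes one minus twice the standard deviation of a list of estimated response probabilities. It is a simplification under stated assumptions: the probabilities are hypothetical, the function name `r_indicator` is invented for this example, and the standard deviation is unweighted, whereas a survey application would use design weights and model-based estimates of $\rho_i$.

```python
from statistics import pstdev


def r_indicator(rho_hat):
    """Return 1 - 2 * SD of the estimated response probabilities.

    Simplification: an unweighted standard deviation over the listed units;
    a design-weighted estimate would be needed for a real survey sample.
    """
    return 1.0 - 2.0 * pstdev(rho_hat)


# Hypothetical estimated response probabilities for six units.
uniform_response = [0.6, 0.6, 0.6, 0.6, 0.6, 0.6]
uneven_response = [0.2, 0.2, 0.2, 1.0, 1.0, 1.0]

print(r_indicator(uniform_response))  # no variability, so R equals 1
print(r_indicator(uneven_response))   # more variability, so a lower R
```

Note that the result changes whenever the response model changes, which is the model-dependence issue raised on the slide.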
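Literature review: Schouten, Cobben & Bethlehem (2009, SM)
Another issue: The indicator does not depend on the variables of interest, but nonresponse bias does
Maximal bias of $\hat{\theta}_{NA}$:
$$\frac{[1 - R(\rho)]\, S(y)}{2\bar{\rho}}$$
where $S(y)$ is the population standard deviation of $y$ and $\bar{\rho}$ is the average response probability
$\hat{\theta}_{NA}$ is the unadjusted estimator of the population mean:
$$\hat{\theta}_{NA} = \frac{\sum_{i \in s_r} w_i y_i}{\sum_{i \in s_r} w_i}$$
Two limitations of maximal bias (and R-indicator):
• The unadjusted estimator is rarely used in practice
• It depends on proper specification of $\rho_i$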
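To make the two quantities on this slide concrete, here is a minimal sketch of the unadjusted weighted mean over respondents and of the maximal-bias bound $[1 - R(\rho)]\,S(y) / (2\bar{\rho})$. All inputs are hypothetical, the function names are invented for illustration, and the standard deviation of $y$ is passed in as a given value because in practice it would itself have to be estimated.

```python
from statistics import mean, pstdev


def unadjusted_mean(weights, y):
    """Weighted mean of y over respondents only, with no nonresponse adjustment."""
    return sum(w * v for w, v in zip(weights, y)) / sum(weights)


def maximal_bias(rho_hat, s_y):
    """Bound (1 - R) * S(y) / (2 * mean rho), with R = 1 - 2 * SD(rho)."""
    r = 1.0 - 2.0 * pstdev(rho_hat)
    return (1.0 - r) * s_y / (2.0 * mean(rho_hat))


# Hypothetical respondent weights and values.
print(unadjusted_mean([1.0, 2.0, 1.0], [10.0, 20.0, 30.0]))
# Hypothetical response probabilities and standard deviation of y.
print(maximal_bias([0.2, 0.2, 0.2, 1.0, 1.0, 1.0], s_y=10.0))
```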
Literature review: Peytchev, Riley, Rosen, Murphy & Lindblad (2010, SRM)
Goal: Reduce nonresponse bias through case prioritization
Suggest targeting individuals with lower estimated response probabilities
• For instance, give them larger incentives or give interviewer incentives
• Their approach is basically equivalent to trying to increase the R-indicator (or achieving a more balanced sample)
Recommend using auxiliary variables that are associated with the variables of interest
Literature review: Laflamme & Karaganis (2010, ECQ)
Development and implementation of responsive designs for CATI surveys at Statistics Canada
Planning phase:
• before data collection starts (determination of strategies, analyses of previous data, …)
Initial collection phase:
• evaluate different indicators to determine when the next phase should start
Two Responsive Design (RD) phases
Literature review: Laflamme & Karaganis (2010, ECQ)
RD phase 1:
• prioritize cases (based on paradata or other information) with the objective of improving response rates
• increase the number of respondents (desirable)
RD phase 2:
• prioritize cases with the objective of reducing the variability of response rates between domains of interest (increasing R-indicator)
• likely reduce the variability of weight adjustments (desirable)
Literature review: Schouten, Calinescu & Luiten (2011, Stat. Netherlands)
First paper to propose a theoretical framework for adaptive survey designs
Suggest:
• Maximizing quality for a given cost; or
• Minimizing cost for a given quality
Requires a quality indicator (e.g., overall response rate, R-indicator, Maximal bias, …)
• Which one to use?
Definition of the problem
Adaptive collection design: Any procedure for call prioritization or resource allocation that is dynamic as data collection progresses
• Uses paradata (or other information) to adapt to what is observed during data collection
• Focus on call prioritization
Our objective: Maximize quality for a given cost
Context: CATI surveys
Choice of quality indicator
Focus of the literature: Find collection designs that reduce nonresponse bias (or maximize R-indicator) of an unadjusted estimator
We think the focus should not be on nonresponse bias. Why?
• Any bias that can be removed at the collection stage can also be removed at the estimation stage
We suggest reducing nonresponse variance of an estimator adjusted for nonresponse
Quality indicator
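Suppose we want to estimate the total: $\theta = \sum_{i \in U} y_i$
Assuming that nonresponse is uniform within cells, an asymptotically unbiased estimator is:
$$\hat{\theta}_A = \sum_{g=1}^{G} \sum_{i \in s_{rg}} \frac{w_{gi}}{\hat{\rho}_g}\, y_{gi}, \qquad \text{with } \hat{\rho}_g = \frac{n_{rg}}{n_g}$$
Quality indicator: The nonresponse variance
$$\mathrm{var}_q(\hat{\theta}_A \mid s) \approx \sum_{g=1}^{G} n_g \left(\rho_g^{-1} - 1\right) S_{wy,g}^2$$
$$\rho_g = E_q(\hat{\rho}_g \mid s) = E_q(n_{rg} \mid s) / n_g$$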
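The estimator and the quality indicator can be sketched numerically as follows. This is an illustration under assumptions, not the authors' implementation: the cell layout (a dict with the cell sample size `n` and a list of respondent `(weight, y)` pairs), the function names, and all numbers are hypothetical, and the within-cell variances of $w_{gi} y_{gi}$ are supplied as given inputs.

```python
def adjusted_total(cells):
    """Nonresponse-adjusted total: respondent sums inflated by 1 / (n_rg / n_g)."""
    total = 0.0
    for cell in cells:
        n_rg = len(cell["resp"])
        rho_hat = n_rg / cell["n"]  # observed response rate in the cell
        total += sum(w * y for w, y in cell["resp"]) / rho_hat
    return total


def nonresponse_variance(n, rho, s2_wy):
    """Sum over cells of n_g * (1 / rho_g - 1) * S2_wy_g."""
    return sum(n_g * (1.0 / r_g - 1.0) * s2 for n_g, r_g, s2 in zip(n, rho, s2_wy))


# Hypothetical data: two cells, each with a 50% response rate.
cells = [
    {"n": 4, "resp": [(10.0, 1.0), (10.0, 3.0)]},
    {"n": 2, "resp": [(5.0, 2.0)]},
]
print(adjusted_total(cells))
print(nonresponse_variance([4, 2], [0.5, 0.5], [100.0, 0.0]))
```

The variance term shows why effort should go where $S_{wy,g}^2$ is large: raising $\rho_g$ in a homogeneous cell buys almost no reduction.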
Overall cost
Overall cost:
$$C_{TOT,g} = \sum_{i \in s_{rg}} \left\{ (m_{gi} - 1)\, C_{NR,g} + C_{R,g} \right\} + \sum_{i \in s_g - s_{rg}} m_{gi}\, C_{NR,g}$$
$$C_{TOT} = \sum_{g=1}^{G} C_{TOT,g}$$
• $m_{gi}$: total number of attempts for unit $i$
• $C_{NR,g}$: cost of an unsuccessful attempt
• $C_{R,g}$: cost of an interview
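The cost of one cell can be tallied directly from attempt counts. In the sketch below a respondent with $m$ attempts costs $m - 1$ unsuccessful attempts plus one interview, and a nonrespondent costs $m$ unsuccessful attempts. The function name, argument names, and figures are hypothetical.

```python
def cell_cost(attempts_resp, attempts_nonresp, c_nr, c_r):
    """Realized cost of one cell.

    attempts_resp: attempt counts for respondents (last attempt is the interview)
    attempts_nonresp: attempt counts for nonrespondents (all unsuccessful)
    c_nr: cost of an unsuccessful attempt; c_r: cost of an interview
    """
    cost_resp = sum((m - 1) * c_nr + c_r for m in attempts_resp)
    cost_nonresp = sum(m * c_nr for m in attempts_nonresp)
    return cost_resp + cost_nonresp


# Hypothetical cell: two respondents (1 and 3 attempts), one nonrespondent (5 attempts).
print(cell_cost([1, 3], [5], c_nr=2.0, c_r=20.0))
```

The overall cost is then the sum of `cell_cost` over all cells.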
Expected overall cost
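Expected overall cost:
$$\bar{C}_{TOT,g} = (C_{R,g} - C_{NR,g})\, n_g \rho_g + C_{NR,g} \sum_{i \in s_g} \bar{m}_{gi}$$
$$\bar{C}_{TOT} = E_q(C_{TOT} \mid s) = \sum_{g=1}^{G} \bar{C}_{TOT,g}$$
$$\bar{m}_{gi} = E_q(m_{gi} \mid s) = \bar{m}_{gi}(p_{gi}, M)$$
Assumption: $\bar{m}_{gi}$ does not depend on $\rho_g$
Under this assumption, the expected overall cost is linear in the $\rho_g$:
$$\bar{C}_{TOT} = \beta_0 + \sum_{g=1}^{G} \beta_{1g}\, n_g \rho_g$$
with $\beta_0 = \sum_{g=1}^{G} C_{NR,g} \sum_{i \in s_g} \bar{m}_{gi}$ and $\beta_{1g} = C_{R,g} - C_{NR,g}$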
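The step from the realized cost to its expectation can be spelled out as a short derivation. It assumes only the cell cost as the sum of attempt costs defined for respondents and nonrespondents, and the relation $E_q(n_{rg} \mid s) = n_g \rho_g$; regrouping the realized cost by attempts and by completed interviews gives:

```latex
\begin{aligned}
C_{TOT,g} &= C_{NR,g} \sum_{i \in s_g} m_{gi} + (C_{R,g} - C_{NR,g})\, n_{rg}, \\
E_q(C_{TOT,g} \mid s) &= C_{NR,g} \sum_{i \in s_g} \bar{m}_{gi} + (C_{R,g} - C_{NR,g})\, n_g \rho_g .
\end{aligned}
```

The first line holds because every unit pays $C_{NR,g}$ per attempt and each respondent pays an extra $C_{R,g} - C_{NR,g}$ for the attempt that became an interview.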
Mathematical formulation
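Objective: Find $\rho_g$, $g = 1, \ldots, G$, that minimizes the nonresponse variance $\mathrm{var}_q(\hat{\theta}_A \mid s)$ subject to a fixed expected overall cost, $\bar{C}_{TOT} = K$
Solution:
$$\rho_g = \frac{(K - \beta_0)\, \beta_{1g}^{-1/2}\, S_{wy,g}}{\sum_{g'=1}^{G} n_{g'}\, \beta_{1g'}^{1/2}\, S_{wy,g'}}$$
where $\beta_0 = \sum_{g=1}^{G} C_{NR,g} \sum_{i \in s_g} \bar{m}_{gi}$ and $\beta_{1g} = C_{R,g} - C_{NR,g}$ are the intercept and cell-level slopes of the expected overall cost
Note: Equivalent to maximizing the R-indicator only in a very special scenario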
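A minimal numerical sketch of this allocation follows. It assumes the expected cost is linear in the target response probabilities, written here as `beta0 + sum(beta1[g] * n[g] * rho[g])`, and applies the stationarity condition of the Lagrangian, which makes $\rho_g$ proportional to $S_{wy,g} / \sqrt{\beta_{1g}}$. The names `beta0` and `beta1`, the function names, and all figures are hypothetical, and the sketch does not cap targets at 1, which a practical version would need to handle.

```python
from math import sqrt


def optimal_targets(n, s_wy, beta1, beta0, budget):
    """Target response probabilities minimizing sum n_g * (1/rho_g - 1) * S_g^2
    subject to beta0 + sum(beta1_g * n_g * rho_g) == budget (no cap at 1)."""
    denom = sum(n_g * sqrt(b_g) * s_g for n_g, b_g, s_g in zip(n, beta1, s_wy))
    scale = (budget - beta0) / denom
    return [scale * s_g / sqrt(b_g) for s_g, b_g in zip(s_wy, beta1)]


def expected_cost(n, rho, beta1, beta0):
    """Linear expected cost used as the budget constraint."""
    return beta0 + sum(b_g * n_g * r_g for b_g, n_g, r_g in zip(beta1, n, rho))


# Hypothetical two-cell example: the second cell is twice as variable.
n = [100, 100]
rho = optimal_targets(n, s_wy=[1.0, 2.0], beta1=[4.0, 4.0], beta0=100.0, budget=400.0)
print(rho)
print(expected_cost(n, rho, beta1=[4.0, 4.0], beta0=100.0))
```

With equal unit costs, the more variable cell receives the higher target, which is why the result generally differs from equalizing response rates across cells.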
Implementation
Find the effort (number of attempts) $e_{gi}$ necessary to achieve the target response probability $\rho_g$:
$$e_{gi} = \frac{\ln(1 - \rho_g)}{\ln(1 - p_{gi})}$$
Procedure: Select cases to be interviewed with probability proportional to the effort $e_{gi}$
Issues:
1) Avoid small estimated $p_{gi}$ to avoid an unduly large effort $e_{gi}$
2) Might want to ensure that a certain time has elapsed between two consecutive calls
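The effort formula and the selection rule can be sketched as below. The formula follows from requiring $1 - (1 - p_{gi})^{e_{gi}} = \rho_g$, reading $p_{gi}$ as a per-attempt response probability. The floor `p_min` is one hypothetical way to address the first issue, its default value is arbitrary, and the function names are invented for this example.

```python
import math
import random


def effort(rho_g, p_gi, p_min=0.05):
    """Attempts needed to reach rho_g, requires rho_g in [0, 1) and p_gi in (0, 1).

    p_min is an assumed floor that keeps very small estimated probabilities
    from producing an unduly large effort.
    """
    p = max(p_gi, p_min)
    return math.log(1.0 - rho_g) / math.log(1.0 - p)


def select_case(efforts, rng=random):
    """Draw one case index with probability proportional to its effort."""
    return rng.choices(range(len(efforts)), weights=efforts, k=1)[0]


# Hypothetical cell target of 0.75 and three cases with different p_gi.
efforts = [effort(0.75, p) for p in (0.5, 0.25, 0.01)]
print(efforts)
print(select_case(efforts, random.Random(0)))
```

Cases that are harder to reach get larger efforts and are therefore drawn more often, which is the prioritization the procedure describes.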
Graph of variance vs cost
[Figure: minimum nonresponse variance plotted against expected overall cost]
Revised solution
Solution of the optimization problem is found before data collection starts
May be a good idea to revise the solution periodically (e.g., daily)
• Some parameters might need to be modified
• Update remaining budget and expected overall cost
• The revised optimization problem is similar to the initial one
Revised solution
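Solution (same as before):
$$\rho_g = \frac{(K - \beta_0)\, \beta_{1g}^{-1/2}\, S_{wy,g}}{\sum_{g'=1}^{G} n_{g'}\, \beta_{1g'}^{1/2}\, S_{wy,g'}}$$
with $\beta_0$ and $\beta_{1g}$ the intercept and cell-level slopes of the expected overall cost
Revised target response probability:
$$\tilde{\rho}_g = \frac{n_g \rho_g - n_{rg}}{n_g - n_{rg}}$$
• Could be negative (when the cell already has more respondents than targeted, $n_{rg} > n_g \rho_g$)
Effort:
$$e_{gi} = \frac{\ln(1 - \tilde{\rho}_g)}{\ln(1 - p_{gi})}$$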
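The periodic revision can be sketched as follows: the cell still needs $n_g \rho_g - n_{rg}$ respondents, to be obtained from the $n_g - n_{rg}$ units that have not yet responded. Truncating a negative revised target at zero is an assumption made for this example, since the slide only notes that the value could be negative; function names and numbers are hypothetical.

```python
import math


def revised_target(n_g, rho_g, n_rg):
    """Target response probability among units that have not yet responded."""
    remaining = n_g - n_rg
    if remaining == 0:
        return 0.0  # nobody left to call in this cell
    target = (n_g * rho_g - n_rg) / remaining
    return max(0.0, target)  # assumption: a negative target means no further effort


def revised_effort(rho_tilde, p_gi):
    """Attempts needed to reach the revised target, for rho_tilde in [0, 1)."""
    return math.log(1.0 - rho_tilde) / math.log(1.0 - p_gi)


# Hypothetical cell: 100 units, overall target 0.6, 20 respondents so far.
rho_tilde = revised_target(100, 0.6, 20)
print(rho_tilde)
print(revised_effort(rho_tilde, p_gi=0.5))
```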
Conclusion
Next steps:
• Simulation study
• Adapt the theory for practical applications
• Test in a real production environment
Which quality indicator? Nonresponse variance? Others?
Reduction of nonresponse bias: subsampling of nonrespondents
• Our approach could be used within the subsample
Thanks - Merci
For more information, please contact:
Jean-François Beaumont ([email protected])
David Haziza ([email protected])