

References

1. McCormick, T.H., Salganik, M.J., Zheng, T.: How many people do you know?: Efficiently estimating personal network size. Journal of the American Statistical Association 105(489), 59–70 (2010)
2. Stan Development Team: Stan modeling language: User’s guide and reference manual (2016). Version 2.14.0, http://mc-stan.org/
3. Zheng, T., Salganik, M.J., Gelman, A.: How many people do you know in prison? Using overdispersion in count data to estimate social structure in networks. Journal of the American Statistical Association 101(474), 409–423 (2006)

Swupnil Sahai; Timothy Jones; Sarah Cowan; Tian Zheng
Department of Statistics, Columbia University

Correspondence should be sent to

Tian Zheng
Department of Statistics
Columbia University
Email: [email protected]

Estimating Personal Network Size with Non-Random Mixing via Latent Kernels

Estimation of the non-random mixing matrix.

Simulation results show instability of the estimates.

Background

The McCormick et al. non-random mixing model.

$y_{ik}$: the number of people person $i$ knows in subpopulation $k$.

$G_k$: denotes subpopulation $k$.

Mixing probability: the probability that alter $j$ is in age group $a_j$ and gender $g_j$ given that ego $i$ is in group $a_i$ and $g_i$ and that $i$ knows $j$.

Subpopulation count: the number of people in $G_k$ and in age group $a_j$ and gender $g_j$.

$d_i$: the number of people person $i$ knows.
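To show how these quantities fit together, the expected count in the model of McCormick et al. (reference 1) can be sketched as follows. The notation is a reconstruction, not the poster's own: $m(\cdot,\cdot)$ for the mixing probability, $N_{(a,g),k}$ for the number of people in subpopulation $k$ with age group $a$ and gender $g$, and $N_{(a,g)}$ for the total population in that age and gender group are assumed names, and $N_{(a,g)}$ is not defined in the list above.

```latex
% Expected number of people ego i knows in subpopulation k (sketch, assumed notation)
\mu_{ik} \;=\; d_i \sum_{(a,\,g)} m\big((a_i, g_i),\,(a, g)\big)\,
               \frac{N_{(a,g),k}}{N_{(a,g)}}
```

Read term by term: the ego's degree $d_i$ is split across alter groups according to the mixing probabilities, and within each alter group the share belonging to subpopulation $k$ is $N_{(a,g),k} / N_{(a,g)}$.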

McCormick Model

Contact Prevalence Study

Data - Our data came from a nationally representative cross-sectional survey of American adults aged 18 and older with 1,190 respondents.

Results - We used twelve subpopulations defined by twelve first names.

Kernel estimates for male and female egos of age 21, 38, and 70.

The model captures the variability in network composition by gender.

Posterior draws of name-based kernel bandwidth splines for male and female egos.

Name-based degree estimates for both genders across three age groups.

What did we propose? – A continuous, structured framework for representing social mixing patterns due to age and gender.

Findings - Results from real studies support the advantages of the proposed method over the McCormick et al. method.

Future work - The Gaussian kernel provides room for flexibility. A mixture of Gaussians could provide insights into inter-generational mixing patterns. A spline framework applied to race could be used to understand social mixing rates between races.

Conclusion

Motivation: Estimating the number of people an individual knows (degree) in a large network.

Data: Aggregated relational data (ARD) via survey questions “How many X’s do you know?”

Issues with conventional methods: barrier effects (some individuals systematically know more (or fewer) members of a specific subpopulation than would be expected under random mixing).

The McCormick et al. non-random mixing model corrected for barrier effects but suffered from instability.

Our proposed model replaces the discrete mixing matrix by a continuous, structured kernel-based framework for social mixing patterns.
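For orientation, here is a minimal sketch of one commonly cited baseline for ARD, the basic scale-up estimator, on the assumption that this is the kind of conventional method meant above. It assumes random mixing, which is exactly what barrier effects violate. The function name and all numbers are hypothetical and are not taken from the survey.

```python
def scale_up_degree(counts, subpop_sizes, population_size):
    """Basic scale-up degree estimate under random mixing.

    counts[k]        -- respondent's answer to "How many X's do you know?"
    subpop_sizes[k]  -- known size of subpopulation k
    population_size  -- size of the whole population
    """
    total_size = sum(subpop_sizes)
    if total_size <= 0:
        raise ValueError("subpopulation sizes must sum to a positive number")
    # Fraction of the probed subpopulations that the respondent knows,
    # scaled up to the whole population.
    return population_size * sum(counts) / total_size


# Hypothetical respondent and hypothetical subpopulation sizes.
counts = [2, 1, 0, 3]
subpop_sizes = [1_000_000, 500_000, 250_000, 1_250_000]
population_size = 300_000_000

degree = scale_up_degree(counts, subpop_sizes, population_size)
print(degree)
```

If a respondent is socially far from the probed subpopulations (for example, names concentrated in another age group), the counts shrink and this estimate is biased downward, which is the barrier effect that the non-random mixing models target.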

Introduction

Proposed Model

Non-Random Mixing via Latent Kernels. (use non-random mixing in age and gender)

Model Assumptions:

- Gender mixing does not depend on an ego’s age.

- People most likely know others of the same age.

- The bandwidth depends on the ego and alter genders.

To retain flexibility without drastically increasing model complexity, we parametrize the bandwidth as a function of age using a fourth-order spline.
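The three assumptions and the age-dependent bandwidth can be illustrated with a small sketch. This is not the poster's fitted model: the gender shares, knot locations, and bandwidth values below are hypothetical, the age grid is an assumption, and piecewise-linear interpolation stands in for the fourth-order spline to keep the example dependency-free.

```python
import math

AGES = list(range(0, 101))  # assumed grid of alter ages

# Hypothetical P(alter gender | ego gender); constant in ego age (assumption 1).
GENDER_SHARE = {("F", "F"): 0.60, ("F", "M"): 0.40,
                ("M", "F"): 0.45, ("M", "M"): 0.55}

# Hypothetical bandwidths at a few ego ages, one curve per
# (ego gender, alter gender) pair (assumption 3).
KNOTS = [18, 40, 65, 90]
BANDWIDTH_AT_KNOTS = {("F", "F"): [5.0, 8.0, 10.0, 12.0],
                      ("F", "M"): [6.0, 9.0, 11.0, 13.0],
                      ("M", "F"): [6.0, 9.0, 12.0, 14.0],
                      ("M", "M"): [5.0, 7.0, 9.0, 11.0]}


def bandwidth(ego_age, ego_gender, alter_gender):
    """Linear interpolation between knots; a stand-in for the spline."""
    values = BANDWIDTH_AT_KNOTS[(ego_gender, alter_gender)]
    if ego_age <= KNOTS[0]:
        return values[0]
    if ego_age >= KNOTS[-1]:
        return values[-1]
    for i in range(len(KNOTS) - 1):
        x0, x1 = KNOTS[i], KNOTS[i + 1]
        if x0 <= ego_age <= x1:
            t = (ego_age - x0) / (x1 - x0)
            return values[i] + t * (values[i + 1] - values[i])


def mixing_row(ego_age, ego_gender):
    """P(alter age, alter gender | ego age, ego gender, ego knows alter)."""
    row = {}
    for alter_gender in ("F", "M"):
        bw = bandwidth(ego_age, ego_gender, alter_gender)
        # Gaussian kernel centred on the ego's own age (assumption 2),
        # normalised over the age grid.
        weights = [math.exp(-0.5 * ((a - ego_age) / bw) ** 2) for a in AGES]
        total = sum(weights)
        share = GENDER_SHARE[(ego_gender, alter_gender)]
        for a, w in zip(AGES, weights):
            row[(a, alter_gender)] = share * w / total
    return row


row = mixing_row(38, "F")
peak_age = max(AGES, key=lambda a: row[(a, "F")])
print(peak_age, round(sum(row.values()), 6))
```

Each ego gets a smooth row of mixing probabilities governed by a handful of bandwidth parameters, rather than one free entry per cell of a discrete mixing matrix.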

Model and Inference

Full Posterior