Modeling Big Count Data: An IRLS Framework for COM-Poisson Regression and GAM
Suneel Chatla, Galit Shmueli
November 12, 2016
Institute of Service Science, National Tsing Hua University, Taiwan (R.O.C.)
Table of contents
1. Speed Dating Experiment: Count data models
2. Motivation
3. An IRLS framework
4. Simulation Study: Comparison of IRLS with MLE
5. A CMP Generalized Additive Model
6. Results & Conclusions
Speed Dating Experiment: Count data models
Speed dating experiment
Fisman et al. (2006) conducted a speed dating experiment to evaluate gender differences in mate selection [1].

• Total sessions: 14
• Decision: 1 or 0
• Attractiveness: 1-10
• Intelligence: 1-10
• Ambition: 1-10
• ......
• Control variables

[1] https://www.kaggle.com/annavictoria/speed-dating-experiment
Outcome/Count variables
• Matches: when both persons decide Yes
• Tot.Yes: total number of Yes decisions for each subject in a particular session
Summary Statistics
| Statistic | N | Mean | St. Dev. | Min | Max |
|---|---|---|---|---|---|
| matches | 531 | 2.524 | 2.304 | 0 | 14 |
| Tot.Yes | 531 | 6.433 | 4.361 | 0 | 21 |
| Tot.partner | 531 | 15.311 | 4.967 | 5 | 22 |
| age | 531 | 26.303 | 3.735 | 18 | 55 |
| perc.samerace | 531 | 0.391 | 0.242 | 0.000 | 0.833 |
| avg.intcor | 531 | 0.190 | 0.167 | −0.298 | 0.569 |
| attr | 531 | 6.195 | 1.122 | 1.818 | 10.000 |
| sinc | 531 | 7.205 | 1.108 | 2.773 | 10.000 |
| intel | 531 | 7.381 | 0.988 | 3.409 | 10.000 |
| func | 531 | 6.438 | 1.103 | 2.682 | 10.000 |
| amb | 531 | 6.812 | 1.133 | 3.091 | 10.000 |
| shar | 531 | 5.511 | 1.333 | 1.409 | 10.000 |
| like | 531 | 6.157 | 1.072 | 1.682 | 10.000 |
| prob | 531 | 5.234 | 1.525 | 0.778 | 10.000 |
| mean.agep | 531 | 26.314 | 1.674 | 20.444 | 31.667 |
| attr_o | 531 | 6.200 | 1.186 | 2.333 | 8.688 |
| sinc_o | 531 | 7.224 | 0.690 | 4.167 | 9.000 |
| intel_o | 531 | 7.410 | 0.614 | 4.875 | 9.150 |
| fun_o | 531 | 6.438 | 1.015 | 2.625 | 8.615 |
| amb_o | 531 | 6.827 | 0.756 | 4.600 | 8.842 |
| shar_o | 531 | 5.498 | 0.942 | 1.375 | 7.700 |
| like_o | 531 | 6.161 | 0.873 | 2.333 | 8.300 |
| prob_o | 531 | 5.256 | 0.736 | 3.200 | 7.200 |
| Tot.part.Yes | 531 | 6.420 | 4.128 | 0 | 20 |
Tools:
• Poisson Regression
• Negative Binomial Regression
• Conway-Maxwell Poisson (CMP) Regression
The CMP distribution
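From Shmueli et al. (2005), $Y \sim \mathrm{CMP}(\lambda, \nu)$ implies

$$P(Y = y) = \frac{\lambda^{y}}{(y!)^{\nu}\, Z(\lambda, \nu)}, \qquad y = 0, 1, 2, \ldots$$

$$Z(\lambda, \nu) = \sum_{s=0}^{\infty} \frac{\lambda^{s}}{(s!)^{\nu}}$$

for $\lambda > 0$, $\nu \geq 0$.

The CMP distribution includes three well-known distributions as special cases:

• Poisson ($\nu = 1$),
• Geometric ($\nu = 0$, $\lambda < 1$),
• Bernoulli ($\nu \to \infty$, with probability $\frac{\lambda}{1+\lambda}$).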
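The probability function and its special cases can be checked numerically. The sketch below is illustrative only and is not taken from the authors' code: the function names `cmp_log_terms` and `cmp_pmf`, the truncation tolerance, and the crude stopping rule (stop once a term is negligible relative to the largest term) are all assumptions made for this example.

```python
import math

def cmp_log_terms(lam, nu, tol=1e-10, max_terms=10_000):
    """Logs of the series terms lam^s / (s!)^nu for s = 0, 1, ...,
    truncated once a term drops below tol times the largest term so far."""
    log_lam = math.log(lam)
    terms, peak = [], -math.inf
    for s in range(max_terms):
        t = s * log_lam - nu * math.lgamma(s + 1)
        terms.append(t)
        peak = max(peak, t)
        if t < peak + math.log(tol):
            break
    return terms

def cmp_pmf(y, lam, nu):
    """P(Y = y) for Y ~ CMP(lam, nu), with Z(lam, nu) from the truncated series."""
    terms = cmp_log_terms(lam, nu)
    peak = max(terms)
    # log-sum-exp keeps the normalizing constant stable for large lam
    log_z = peak + math.log(sum(math.exp(t - peak) for t in terms))
    return math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - log_z)

poisson = math.exp(-2.0) * 2.0 ** 3 / math.factorial(3)
print(cmp_pmf(3, 2.0, 1.0), poisson)   # nu = 1: should agree with Poisson(2)
print(cmp_pmf(0, 0.5, 0.0))            # nu = 0: geometric, P(Y = 0) = 1 - lam
print(cmp_pmf(1, 2.0, 50.0))           # large nu: close to lam / (1 + lam)
```

Working on the log scale is a design choice for this sketch: it avoids overflow in $\lambda^{s}$ and $(s!)^{\nu}$ when the distribution is strongly over-dispersed.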
CMP distribution for different (λ, ν) combinations
[Figure: twelve panels showing the CMP probability mass function (y-axis: Density) for λ = 2, 8, 15 and ν = 0.5, 0.75, 1, 3.]
CMP Regression
CMP regression models can be formulated as follows:
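$$\log(\lambda) = X\beta \tag{1}$$

$$\log(\nu) = Z\gamma \tag{2}$$

Maximizing the log-likelihood with respect to the parameters β and γ yields the following normal equations (Sellers and Shmueli, 2010):

$$U = \frac{\partial \log L}{\partial \beta} = X^{T}(y - E(y)) \tag{3}$$

$$V = \frac{\partial \log L}{\partial \gamma} = \nu Z^{T}(-\log(y!) + E(\log(y!))) \tag{4}$$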
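As a rough illustration of equations (3) and (4), the sketch below evaluates U and V on a small made-up data set. The helper names (`cmp_expectations`, `loglik_and_scores`), the toy data, and the series truncation rule are assumptions for this example, not part of the original slides. The expectations E(y) and E(log(y!)) are approximated by truncated sums.

```python
import math

def cmp_expectations(lam, nu, tol=1e-12, max_terms=5000):
    """Return (log Z, E[y], E[log y!]) for CMP(lam, nu) from a truncated series."""
    log_lam = math.log(lam)
    terms, peak = [], -math.inf
    for s in range(max_terms):
        t = s * log_lam - nu * math.lgamma(s + 1)
        terms.append(t)
        peak = max(peak, t)
        if t < peak + math.log(tol):
            break
    weights = [math.exp(t - peak) for t in terms]
    total = sum(weights)
    mean_y = sum(s * w for s, w in enumerate(weights)) / total
    mean_logfact = sum(math.lgamma(s + 1) * w for s, w in enumerate(weights)) / total
    return peak + math.log(total), mean_y, mean_logfact

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def loglik_and_scores(X, Z, y, beta, gamma):
    """Log-likelihood plus the score vectors U (eq. 3) and V (eq. 4)."""
    loglik, U, V = 0.0, [0.0] * len(beta), [0.0] * len(gamma)
    for xi, zi, yi in zip(X, Z, y):
        eta, nu = dot(xi, beta), math.exp(dot(zi, gamma))
        log_z, mean_y, mean_logfact = cmp_expectations(math.exp(eta), nu)
        logfact = math.lgamma(yi + 1)
        loglik += yi * eta - nu * logfact - log_z
        for j, xij in enumerate(xi):
            U[j] += xij * (yi - mean_y)                   # X^T (y - E(y))
        for j, zij in enumerate(zi):
            V[j] += nu * zij * (mean_logfact - logfact)   # nu Z^T (-log(y!) + E(log(y!)))
    return loglik, U, V

# Toy data, made up for illustration: intercept plus one covariate, constant nu.
X = [[1.0, 0.0], [1.0, 0.5], [1.0, 1.0], [1.0, 1.5]]
Z = [[1.0], [1.0], [1.0], [1.0]]
y = [0, 2, 1, 4]
loglik, U, V = loglik_and_scores(X, Z, y, beta=[0.2, 0.3], gamma=[-0.1])
print(loglik, U, V)
```

One way to sanity-check such a sketch is to compare U and V against central finite differences of the log-likelihood.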
Motivation
Exploration of Speed Dating data
[Figure: scatterplots of Tot.Yes (log) against four ratings: Sincerity (Others), Intelligence (Others), Sincerity, and Fun seeking.]
More flexibility?
Generalized Additive Models
• Smoothing Splines
• Penalized Splines

Both implementations depend on the Iteratively Reweighted Least Squares (IRLS) estimation framework.

At present, no IRLS framework is available for CMP!
An IRLS framework
Update for each iteration
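$$I \begin{bmatrix} \beta \\ \gamma \end{bmatrix}^{(m)} = I \begin{bmatrix} \beta \\ \gamma \end{bmatrix}^{(m-1)} + \begin{bmatrix} U \\ V \end{bmatrix}$$

which implies the following equations

$$X^{T}\Sigma_{y}X\beta^{(m)} - X^{T}\Sigma_{y,\log(y!)}\nu Z\gamma^{(m)} = X^{T}\Sigma_{y}X\beta^{(m-1)} - X^{T}\Sigma_{y,\log(y!)}\nu Z\gamma^{(m-1)} + X^{T}(y - E(y))$$

and

$$-\nu Z^{T}\Sigma_{y,\log(y!)}X\beta^{(m)} + \nu^{2}Z^{T}\Sigma_{\log(y!)}Z\gamma^{(m)} = -\nu Z^{T}\Sigma_{y,\log(y!)}X\beta^{(m-1)} + \nu^{2}Z^{T}\Sigma_{\log(y!)}Z\gamma^{(m-1)} + \nu Z^{T}(-\log(y!) + E(\log(y!)))$$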
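Updating β with γ held fixed, and γ with β held fixed, removes the cross terms and reduces the equations to

$$X^{T}\Sigma_{y}X\beta^{(m)} = X^{T}\Sigma_{y}X\beta^{(m-1)} + X^{T}(y - E(y)) \tag{5}$$

$$\nu^{2}Z^{T}\Sigma_{\log(y!)}Z\gamma^{(m)} = \nu^{2}Z^{T}\Sigma_{\log(y!)}Z\gamma^{(m-1)} + \nu Z^{T}(-\log(y!) + E(\log(y!))) \tag{6}$$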
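To make equations (5) and (6) concrete, here is a minimal sketch of one update pass. It is not the authors' implementation (see the linked paper and repository for that): the function names, the truncated-sum approximation of the cumulants, the small dense solver, and the toy data are all assumptions for illustration, and no step-size control is included.

```python
import math

def cmp_cumulants(lam, nu, tol=1e-10, max_terms=5000):
    """E[y], Var[y], E[log y!], Var[log y!] under CMP(lam, nu), by truncated sums."""
    log_lam = math.log(lam)
    logs, peak = [], -math.inf
    for s in range(max_terms):
        t = s * log_lam - nu * math.lgamma(s + 1)
        logs.append(t)
        peak = max(peak, t)
        if t < peak + math.log(tol):
            break
    w = [math.exp(t - peak) for t in logs]
    total = sum(w)
    p = [v / total for v in w]
    lf = [math.lgamma(s + 1) for s in range(len(p))]
    ey = sum(s * ps for s, ps in enumerate(p))
    vy = sum((s - ey) ** 2 * ps for s, ps in enumerate(p))
    elf = sum(l * ps for l, ps in zip(lf, p))
    vlf = sum((l - elf) ** 2 * ps for l, ps in zip(lf, p))
    return ey, vy, elf, vlf

def solve(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def irls_update(X, Z, y, beta, gamma):
    """One pass of updates (5) and (6); returns (new_beta, new_gamma)."""
    p, q = len(beta), len(gamma)
    A = [[0.0] * p for _ in range(p)]   # X^T Sigma_y X
    B = [[0.0] * q for _ in range(q)]   # nu^2 Z^T Sigma_log(y!) Z
    U, V = [0.0] * p, [0.0] * q
    for xi, zi, yi in zip(X, Z, y):
        lam = math.exp(sum(a * b for a, b in zip(xi, beta)))
        nu = math.exp(sum(a * b for a, b in zip(zi, gamma)))
        ey, vy, elf, vlf = cmp_cumulants(lam, nu)
        for j in range(p):
            U[j] += xi[j] * (yi - ey)
            for k in range(p):
                A[j][k] += xi[j] * vy * xi[k]
        for j in range(q):
            V[j] += nu * zi[j] * (elf - math.lgamma(yi + 1))
            for k in range(q):
                B[j][k] += nu * nu * zi[j] * vlf * zi[k]
    # (5): A beta_new = A beta_old + U, and (6): B gamma_new = B gamma_old + V
    new_beta = [b + d for b, d in zip(beta, solve(A, U))]
    new_gamma = [g + d for g, d in zip(gamma, solve(B, V))]
    return new_beta, new_gamma

# Toy data, made up for illustration.
X = [[1.0, 0.0], [1.0, 0.5], [1.0, 1.0], [1.0, 1.5]]
Z = [[1.0], [1.0], [1.0], [1.0]]
print(irls_update(X, Z, [0, 2, 1, 4], beta=[0.2, 0.3], gamma=[-0.1]))
```

With ν = 1 and an intercept-only model, the β update reduces to the usual Poisson IRLS step, so starting at the log of the sample mean leaves β unchanged; that gives a simple consistency check for the sketch.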
Algorithm
https://arxiv.org/abs/1610.08244
Practical issues
Initial Values
• For λ: $(y + 0.1)^{\nu}$
• For ν: 0.2

Calculation of Cumulants

• Bounding error $10^{-8}$ or $10^{-10}$
• Asymptotic expressions

Stopping Criterion

• Based on $-2\sum_{i} l(y_i; \hat{\lambda}_i, \hat{\nu}_i)$
Step size
• Step halving
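The practical points above can be sketched generically. The code below is a hedged illustration, not the authors' implementation: `initial_lambda`, `step_halving`, and `converged` are hypothetical helper names, the relative tolerance and the maximum number of halvings are assumed values, and a toy quadratic stands in for the $-2\sum l$ criterion.

```python
def initial_lambda(y, nu0=0.2):
    """Starting values from the slide: nu = 0.2 and lambda_i = (y_i + 0.1)^nu."""
    return [(yi + 0.1) ** nu0 for yi in y]

def step_halving(objective, theta, delta, max_halvings=20):
    """Try theta + delta, halving the step until the objective decreases.

    `objective` plays the role of -2 * (sum of log-likelihood contributions).
    If no step improves it, the current parameters are returned unchanged.
    """
    current = objective(theta)
    step = 1.0
    for _ in range(max_halvings):
        candidate = [t + step * d for t, d in zip(theta, delta)]
        value = objective(candidate)
        if value < current:          # a NaN value fails this test and triggers halving
            return candidate, value
        step /= 2.0
    return list(theta), current

def converged(previous, current, rel_tol=1e-8):
    """Stopping rule on the relative change in -2 log-likelihood (rel_tol is assumed)."""
    return abs(previous - current) <= rel_tol * (abs(previous) + rel_tol)

# Toy objective standing in for -2 log-likelihood: the full step overshoots.
theta, value = step_halving(lambda t: (t[0] - 1.0) ** 2, [0.0], [4.0])
print(theta, value)
print(initial_lambda([0, 3, 7]))
```

In an IRLS loop, `delta` would be the difference between the proposed and current parameter vectors, and `converged` would be checked after each accepted step.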
Simulation Study: Comparison of IRLS with MLE
Study design
We compare our IRLS algorithm with the existing implementation, which maximizes the likelihood function directly (through optim in R).

(a) Set the sample size n = 100.
(b) Generate x1 ∼ U(0, 1) and x2 ∼ N(0, 1).
(c) Calculate x3 = 0.2x1 + U(0, 0.3) and x4 = 0.3x2 + N(0, 0.1) (to create correlated variables).
(d) Generate y ∼ CMP(λ, ν) with log(λ) = 0.05 + 0.5x1 − 0.5x2 + 0.25x3 − 0.25x4 and ν ∈ {0.5, 2, 5}.
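Steps (a) to (d) could be reproduced along the following lines. This is a sketch under stated assumptions rather than the study's actual code: the sampler `cmp_draw` inverts a truncated CDF, the second argument of N(0, 0.1) is treated as a standard deviation (the slide does not say whether it is a variance), and the seed is arbitrary.

```python
import math
import random

def cmp_draw(lam, nu, rng, tol=1e-10, max_terms=100_000):
    """Draw one CMP(lam, nu) value by inverting the (truncated) CDF."""
    log_lam = math.log(lam)
    terms, peak = [], -math.inf
    for s in range(max_terms):
        t = s * log_lam - nu * math.lgamma(s + 1)
        terms.append(t)
        peak = max(peak, t)
        if t < peak + math.log(tol):
            break
    weights = [math.exp(t - peak) for t in terms]   # unnormalized probabilities
    target = rng.random() * sum(weights)
    running = 0.0
    for s, w in enumerate(weights):
        running += w
        if target <= running:
            return s
    return len(weights) - 1

def simulate(nu, n=100, seed=0):
    """Steps (a)-(d): returns a list of (x1, x2, x3, x4, y) tuples."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.uniform(0.0, 1.0)
        x2 = rng.gauss(0.0, 1.0)
        x3 = 0.2 * x1 + rng.uniform(0.0, 0.3)
        x4 = 0.3 * x2 + rng.gauss(0.0, 0.1)   # assumption: 0.1 is a standard deviation
        log_lam = 0.05 + 0.5 * x1 - 0.5 * x2 + 0.25 * x3 - 0.25 * x4
        rows.append((x1, x2, x3, x4, cmp_draw(math.exp(log_lam), nu, rng)))
    return rows

for nu in (0.5, 2, 5):
    data = simulate(nu)
    print(nu, sum(row[4] for row in data) / len(data))   # sample mean of y
```

Smaller ν gives heavier over-dispersion and larger counts, so the sample mean of y should shrink as ν grows.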
Results
[Figure: boxplots comparing IRLS (IR) and MLE estimates for x1, x2, x3, x4, and log(ν), each shown for ν = 0.5, ν = 2, and ν = 5.]
A CMP Generalized Additive Model
Additive Model
$$\log(\lambda) = \alpha + \sum_{j=1}^{p} f_j(X_j)$$

$$\log(\nu) = Z\gamma$$

where $f_j$ ($j = 1, 2, \ldots, p$) are the smooth functions for the $p$ variables.
Backfitting
Based on Hastie and Tibshirani (1990) and Wood (2006), the algorithm is as follows:

1. Initialize: $f_j = f_j^{(0)}$, $j = 1, \ldots, p$
2. Cycle: $j = 1, \ldots, p, 1, \ldots, p, \ldots$

$$f_j = S_j\Big(y - \sum_{k \neq j} f_k \,\Big|\, x_j\Big)$$

3. Continue (2) until the individual functions don't change.

One more nested loop inside the IRLS framework!
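The backfitting cycle can be sketched in a few lines. The version below is an illustration under assumptions, not the CMP GAM implementation: a least-squares line stands in for the smoother $S_j$ (a spline smoother would be used in practice), an intercept α is estimated by the mean of y and each $f_j$ is centered for identifiability, and the response is an ordinary numeric vector rather than the working response that would arise inside IRLS.

```python
def linear_smoother(x, r):
    """Stand-in for S_j: least-squares line of r on x, evaluated at x."""
    n = len(x)
    mx, mr = sum(x) / n, sum(r) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - mr) for a, b in zip(x, r)) / sxx
    return [mr + slope * (a - mx) for a in x]

def backfit(columns, y, smoother=linear_smoother, tol=1e-10, max_cycles=500):
    """Backfitting: returns the intercept and the fitted functions f_j at the data."""
    n, p = len(y), len(columns)
    alpha = sum(y) / n
    f = [[0.0] * n for _ in range(p)]                 # step 1: initialize f_j = 0
    for _ in range(max_cycles):                       # step 2: cycle over j
        change = 0.0
        for j in range(p):
            partial = [y[i] - alpha - sum(f[k][i] for k in range(p) if k != j)
                       for i in range(n)]
            new = smoother(columns[j], partial)
            mean_new = sum(new) / n
            new = [v - mean_new for v in new]         # center f_j
            change = max(change, max(abs(a - b) for a, b in zip(new, f[j])))
            f[j] = new
        if change < tol:                              # step 3: stop when f_j settle
            break
    return alpha, f

# Toy data, made up for illustration: y is exactly additive in x1 and x2.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0, 0.0]
y = [1.0 + 2.0 * a - 3.0 * b for a, b in zip(x1, x2)]
alpha, f = backfit([x1, x2], y)
print(alpha, f[0], f[1])
```

Because the toy response is exactly linear in both covariates, the fitted values α + f1 + f2 should reproduce y; swapping in a different `smoother` changes only the inner fitting step.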
Results & Conclusions
Comparison of Regression models on Tot.Yes
| | Poisson | Negative Binomial | CMP |
|---|---|---|---|
| (Intercept) | 0.49 (0.43) | 0.59 (0.55) | 0.14 (0.33) |
| GenderMale | 0.05 (0.04) | 0.05 (0.06) | 0.03 (0.03) |
| age | −0.01 (0.01) | −0.01 (0.01) | −0.004 (0.004) |
| Tot.partner | 0.07∗∗∗ (0.00) | 0.07∗∗∗ (0.01) | 0.04∗∗∗ (0.003) |
| avg.intcor | −0.04 (0.11) | −0.04 (0.15) | −0.02 (0.09) |
| attr | 0.19∗∗∗ (0.03) | 0.18∗∗∗ (0.04) | 0.11∗∗∗ (0.02) |
| sinc | −0.06 (0.03) | −0.05 (0.04) | −0.04 (0.02) |
| intel | 0.05 (0.04) | 0.06 (0.05) | 0.03 (0.03) |
| func | 0.03 (0.04) | 0.04 (0.05) | 0.02 (0.03) |
| amb | −0.12∗∗∗ (0.03) | −0.13∗∗ (0.04) | −0.07∗∗ (0.02) |
| shar | 0.10∗∗∗ (0.02) | 0.10∗∗∗ (0.03) | 0.06∗∗∗ (0.02) |
| mean.agep | −0.01 (0.01) | −0.01 (0.02) | −0.007 (0.009) |
| attr_o | −0.10∗∗∗ (0.02) | −0.10∗∗∗ (0.03) | −0.06∗∗∗ (0.02) |
| sinc_o | 0.02 (0.04) | 0.02 (0.05) | 0.01 (0.03) |
| intel_o | 0.08 (0.05) | 0.08 (0.07) | 0.05 (0.04) |
| fun_o | −0.01 (0.03) | −0.01 (0.04) | −0.003 (0.02) |
| amb_o | −0.00 (0.04) | −0.01 (0.05) | 0.0005 (0.03) |
| shar_o | 0.02 (0.03) | 0.03 (0.04) | 0.01 (0.02) |
| ν | | | 0.53∗∗∗ |
| AIC | 2844.92 | 2777.24 | 2751.7 |
| BIC | 3011.64 | 2948.23 | 2922.66 |
| Log Likelihood | -1383.46 | -1348.62 | -1335.33 |
| Deviance | 970.04 | 637.25 | |
| Num. obs. | 531 | 531 | 531 |

∗∗∗p < 0.001, ∗∗p < 0.01, ∗p < 0.05
Comparison of Additive Models on Tot.Yes
Dependent variable: Tot.Yes

| | CMP (Chi.Sq) | Poisson (Chi.Sq) |
|---|---|---|
| s(sinc) | 7.16 | 11.53∗∗ |
| s(func) | 7.51 | 11.40∗∗ |
| s(sinc_o) | 13.96∗∗ | 29.30∗∗∗ |
| s(intel_o) | 14.06∗∗ | 13.26∗∗∗ |
| ν | 0.56 | |
| AIC | 2737.03 | 2804.77 |

Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
It is more the behavior of the other person that guides us to select her or him.
Summary
• The IRLS framework is far more efficient than the existing likelihood-based method and provides more flexibility.
• Since CMP is computationally heavier than other GLMs, some matrix computations could be parallelized in order to increase speed.
• The IRLS framework allows CMP to support other modeling extensions, such as LASSO.

The full paper is available from https://arxiv.org/abs/1610.08244 and the source code is available from https://github.com/SuneelChatla/cmp
Suggestions and Questions?
References
Fisman, R., Iyengar, S. S., Kamenica, E., and Simonson, I. (2006). Gender differences in mate selection: Evidence from a speed dating experiment. The Quarterly Journal of Economics, pages 673–697.

Hastie, T. J. and Tibshirani, R. J. (1990). Generalized additive models, volume 43. CRC Press.

Sellers, K. F. and Shmueli, G. (2010). A flexible regression model for count data. Annals of Applied Statistics, 4(2):943–961.

Shmueli, G., Minka, T. P., Kadane, J. B., Borle, S., and Boatwright, P. (2005). A useful distribution for fitting discrete data: revival of the Conway–Maxwell–Poisson distribution. Journal of the Royal Statistical Society: Series C (Applied Statistics), 54(1):127–142.

Wood, S. (2006). Generalized additive models: an introduction with R. CRC Press.