generalized - mathcs.duq.edumathcs.duq.edu/~kern/jen/presentation.pdf · generalized p oisson...
TRANSCRIPT
Thesis Presentation: April 15, 2004 1
A Pie ewise Linear GeneralizedPoisson Regression Approa h toModeling Longitudinal Freqen yDataJennifer Jo BorgesiAdvisor: John C. Kern IIDepartment of Mathemati s and ComputerS ien eDuquesne UniversityApril 15, 2004
Thesis Presentation: April 15, 2004 2
Outline1. Introdu tion2. Generalized Poisson Distribution3. Generalized Poisson Appli ation4. Hot Flush Appli ation5. Dis ussion6. Limitations7. Future Work
Thesis Presentation: April 15, 2004 3
Introdu tionProblem: Inadequate modeling te hniques fordata from experiments that produ e longitudinalfrequen y data.Current Status: Di�erent models have been �tted tosu h data (i.e. the Negative Binomial Distribution).These models are inadequate be ause they onlyallow for overdispersion.Proposed Methodology: Use the Generalized PoissonDistributionto model data.
Thesis Presentation: April 15, 2004 4
Generalized Poisson DistributionThe genralized Poisson distribution (see Consul andJain 1973) is de�ned by the mass fun tionp(xj�; �) = �(� + x�)x�1e�(�+x�) 1x! ; x = 0; 1; 2; : : :for � > 0, and j�j < 1, su h thatp(xj�; �) = 0 for x � m when � < 0 ;m is the largest positive integer for whi h� +m� � 0.
Thesis Presentation: April 15, 2004 5Generalized Poisson Distribution
It an be shown that if X is a random variablewith this generalized Poisson distribution then theexpe ted value � of X is� = �1� �with varian e �2 = �(1� �)3 :An alternative, more onvenient prarameterizationof the generalized Poisson distribution isX � GP (�; k);where � > 0 is the expe ted value of the generalizedPoisson random variable and k is the dispersionparameter (Famoye and Wang 1997).
Thesis Presentation: April 15, 2004 6
Generalized Poisson DistributionThe GP density f(xj�; k) in terms of the parameters� and k is thenf(xj�; k) = �1 + k�!x (1 + kx)x�1x!�exp ��(1 + kx)1 + k� ! ; x = 0; 1; 2; : : :
Thesis Presentation: April 15, 2004 7
Generalized Poisson DistributionThe varian e of X, denoted by V X, is given byV X = �(1 + k�)2 :� for k > 0 the varian e of X ex eeds its expe tedvalue (overdispersion)� for �2� < k < 0 the expe ted value � ex eeds thevarian e of X (underdispersion)� for k = 0, � = V X, and the generalized Poissondistribution redu es to a standard Poissondistribution (equidispersion)
Thesis Presentation: April 15, 2004 8Generalized Poisson Appli ation
Univariate Parameter Estimation - begin byupdating � and k as follows:� Choose urrent (initial) values of the parameters, all these values �(0) and k(0).� Propose a value for �� from a uniformdistribution entered at �(0) the urrent value.�� � Unif(�(0) � a; �(0) + a);where a > 0 is a tuning parameter in theproposal distribution for ��. We have let theproposal distribution for �� be uniform whi hgivesq(��j�) = 1min(2a; �+ a; 1�2k) :
Thesis Presentation: April 15, 2004 9
Generalized Poisson Appli ation� For �xed k, the proposed value �� is thena epted with the following probability:
� = min 1; L(��; k)�(��; k)q(�(0)j��)L(�(0); k)�(�(0); k)q(��j�(0))! :The value that is a epted for �| with probability�| is then used to update k in analogous fashion.The values �(1) and k(1) represent the updated valuesof these parameters. These values are then treatedas the new \ urrent" parameter values, and theentire updating pro ess is repeated.
Thesis Presentation: April 15, 2004 10Generalized Poisson Appli ation
The Equidispersed Data Set:
OBSERVATION
DATA
VAL
UE
0 50 100 150 200
24
68
1012
1416
For the omputer simulated equidispersed data setof n = 200 realizations:� �x = 7:03� s2 = 7:074472� k̂ = 0:0004492
Thesis Presentation: April 15, 2004 11Generalized Poisson Appli ation
The Equidispersed Data Set:
ITERATION
MU
0 1000 2000 3000 4000
6.57.0
7.5
ITERATION
k
0 1000 2000 3000 4000
-0.02
-0.01
0.00.0
10.0
20.0
3
Thesis Presentation: April 15, 2004 12Generalized Poisson Appli ation
The Underdispersed Data Set:
OBSERVATION
DATA
VAL
UE
0 50 100 150 200
46
810
12
For the omputer simulated underdispersed data setof n = 200 realizations:� �x = 7:84� s2 = 3:622513� k̂ = �0:0408486677
Thesis Presentation: April 15, 2004 13Generalized Poisson Appli ation
The Underdispersed Data Set:
ITERATION
MU
0 1000 2000 3000 4000
7.47.6
7.88.0
8.2
ITERATION
k
0 1000 2000 3000 4000
-0.06
-0.05
-0.04
-0.03
Thesis Presentation: April 15, 2004 14Generalized Poisson Appli ation
The Overdispersed Data Set:
OBSERVATION
DATA
VAL
UE
0 50 100 150 200
05
1015
2025
For the omputer simulated overdispersed data setof n = 200 realizations:� �x = 10:05� s2 = 17:50503� k̂ = 0:03181794
Thesis Presentation: April 15, 2004 15Generalized Poisson Appli ation
The Overdispersed Data Set:
ITERATION
MU
0 1000 2000 3000 4000
9.09.5
10.0
10.5
11.0
ITERATION
k
0 1000 2000 3000 4000
0.01
0.02
0.03
0.04
0.05
0.06
Thesis Presentation: April 15, 2004 16Hot Flush Appli ation
Data: A lini al trial investigating a upun ture asan alternative treatment to alleviate symptoms ofmenopause for breast an er survivors.� Treatment group� Pla ebo group� Edu ation group
DAY
FREQ
UENC
Y
0 20 40 60 80
1012
1416
1820
22
Figure 1: The HFF pro�le of an individual from thetreatment group.
Thesis Presentation: April 15, 2004 17
Hot Flush Appli ationData Model:We model longitudinal frequen y responses frommultiple individuals using a GP distribution whosemean � is a fun tion of time.The log-likelihood fun tion lt(�t; k) based on the ntobservations olle ted at time t:lt(�t; k) = log ntYj=1 �t1 + k�t!ytj (1 + kytj)ytj�1ytj!exp ��t(1 + kytj)1 + k�t ! :
Thesis Presentation: April 15, 2004 18Hot Flush Appli ation
Flexibility in allowing �t to vary with time isobtained by modeling �t as a pie ewise-linearfun tion of time.We de�ne a ve tor of knot lo ationsK = fK1; K2; : : : ; Kmgand a ve tor of orresponding heights� = f�1; �2; : : : ; �mg(Ki; �i) represents the lo ation at whi h two linesegments meet (referred to as a node).�t = f(�i; Ki; t) = �i+1 � �iKi+1 �Ki! (t�Ki) + �i;for i = 1; :::;m.
Thesis Presentation: April 15, 2004 19Hot Flush Appli ation
Five nodes were sele ted to formulate the pie ewiselinear fun tion.� three �xed knot lo ations f0:5; 7:5; 91:5g� two additional knot lo ations randomly hosenin the time interval (7:5; 91:5)K = fK1; K2; : : : ; K5g� = f�1; �2; : : : ; �5gK1 < K2 < K3 < K4 < K5 and �i > 0.The omplete list of parameters that de�nes thismodel is fK;�; kg, where k is the dispersion term.
Thesis Presentation: April 15, 2004 20Hot Flush Appli ation: Parameter EstimationLet f�(i)1 ; : : : ; �(i)5 ; K(i)3 ; K(i)4 ; k(i)grepresent the \ urrent" value of the parameters.� To update �1 let the proposal distribution beuniform over an interval entered around the urrent value �(i)1 . Then��1 � Unif(�(i)1 � a; �(i)1 + a)where a is a tuning parameter. For �xed valuesof the other 7 parameters, a ept ��1 as the next urrent value with probability � given by
� = min�1; L(��1; �(i)2 ; : : : ; �(i)5 ;K(i); k(i))�(��1)q(�(i)1 j��1)L(�(i)1 ; �(i)2 ; : : : ; �(i)5 ;K(i); k(i))�(�(i)1 )q(��1 j�(i)1 )� :The other seven parameters are then updatedusing the \ urrent" value of �1.
Thesis Presentation: April 15, 2004 21Hot Flush Appli ation: Parameter Estimation� The remaining node heights, �2; : : : ; �5, areupdated in a manner analogous to updating �1.� K3 follows the same pro ess as updating �1, butwith the following ex eption: the proposed valueof K�3 must be dis rete uniform.� K4 is updated in a manner that is analogous toupdating K3� The dispersion parameter, k, is updated in thesame way as �1.After this y le of updating the 8 parameters in turnis omplete, it is repeated. This pro ess is ontinueduntil the values stored for f�(i+1);K(i+1); k(i+1)ghave onverged to the posterior distribution forf�;K; kg.
Thesis Presentation: April 15, 2004 22Hot Flush Appli ation: Results
ITERATION
k
0 500 1000 1500 2000
0.24
0.26
0.28
0.30
0.32
Tra e plot of the marginal posterior draws for k in the treatment group.
ITERATION
k
0 500 1000 1500 2000
0.14
0.15
0.16
0.17
0.18
Tra e plot of the marginal posterior draws for k in the pla ebo group.
ITERATION
k
0 500 1000 1500 2000
0.08
0.10
0.12
0.14
Tra e plot of the marginal posterior draws for k in the edu ation group.
Thesis Presentation: April 15, 2004 23Hot Flush Appli ation: Results
DAY
HOT F
LUSH
FREQ
UENC
Y
0 20 40 60 80
46
810
•
• •
•
•
•
•
• •• •
•• •
•
• ••
• • •
•
•
•
•• •
•
•
•••••
•
•••
• • • •
• ••
•
•• • •
•
• • • ••• •
•• • • •
•
• •• • • •
• •• • •
• •••
••••• • •
••• • •Treatment Group
DAY
HOT F
LUSH
FREQ
UENC
Y
0 20 40 60 80
46
810
••••
•
•••
•
•
•
•
••
••
•
••••
•
•
•
•
•
•
•• • •
•
•
•
•••
•• •
• •• • • •
• •
•
•
••
••
•
•
• •
•
•••• •
•
• •
• ••
• •• •
••
••
•••
•
•
•••
• •
•
• •
Pla ebo Group
DAY
HOT F
LUSH
FREQ
UENC
Y
0 20 40 60 80
46
810
•
••
••
••
•
•
•
•
••
••
• ••
•
•
•
•
•
•
•
•
•
• •
•
•
•
•
• •
•
•
•
•
•• •
•
•
•
•••
•
••
•
• •
•
•
• •
•
•
•
•
•
••• •
• •
•••
•
•
•
•••••• •
• ••
••• •
•
•
Edu ation Group
Thesis Presentation: April 15, 2004 24
Dis ussionUsefulness of the GP distribution:� Re ognizes and treats the dis rete nature of thedata.� Not only allows for, but dete ts equidispersed,underdispersed, and overdispersed data.Usefulness of the pie ewise-linear model:� Ability to make omparisons a ross experimentalgroups.� Allows for time orrelation through the pie ewiselinear fun tion for the mean.� Useful in other appli ations involvinglongitudinal frequen y data.
Thesis Presentation: April 15, 2004 25LimitationsThe value hosen for k, determines the range ofvalues that � an take:k = �� , � = �1�� , and �1 < � < 0:By substiution, �1 < k� < 0;0 < � < �k:� = �1�k� over the interval for � gives0 < � < 1�2kand �12� < k < 0:
Thesis Presentation: April 15, 2004 26LimitationsExample: Daily hot ush frequen ies experien ed bya subje t in the pla ebo group.
DAY
FREQ
UENC
Y
0 20 40 60 80
1112
1314
1516
17
� �x = 13:73626� s2 = 1:640781� k̂ = �0:0478Upper bound on �:1�2k = 10:0956 = 10:46025:
Thesis Presentation: April 15, 2004 27
Future Work� Assign a mixture prior distribution for k(i.e. allow �(k = 0) � 0).� Compare with alternative longitudinal frequen ymodels (i.e. a model that expli itly in orporatesthe dependen e stru ture of the data).