Page 1: Implementation of Threshold Regression: Programs for SAS ... › ... › files › TR-software-20061224.pdf · Implementation of Threshold Regression: Programs for SAS, R-code and

Version 2006-Dec. 23

Implementation of Threshold Regression:

Programs for SAS, R-code and STATA

This version of the threshold regression program is implemented by Qing Hu, Department of Mathematical Sciences – Applied Statistics, Worcester Polytechnic Institute, Worcester, MA

Introduction and Acknowledgements

Threshold regression refers to regression structures in first hitting time (FHT) models. In this technical report, the next section gives a brief overview of the theoretical foundations of threshold regression. The subsequent sections then describe simple programs that may be used to implement this type of regression analysis in SAS, R and Stata.

This document draws heavily on the published work of Mei-Ling Ting Lee and G. A. Whitmore on the topic of threshold regression. In particular, the Stata program presented here was modeled on a version provided by Lee and Whitmore and used by them in earlier research. For an overview of threshold regression, the reader is referred to

Lee M-LT, Whitmore GA (2007). Threshold regression for survival analysis: Modeling event times by a stochastic process, Statistical Science. (in press).

The Basics of Threshold Regression

A FHT model has two basic components: (1) a parent stochastic process $\{X(t),\ t \in T,\ x \in \mathcal{X}\}$ with initial value $X(0) = x_0$, where $T$ is the time space and $\mathcal{X}$ is the state space of the process, and (2) a boundary set $B$, where $B \subset \mathcal{X}$. The initial value $x_0$ is assumed to lie outside of set $B$. The word “threshold” refers to the fact that the FHT is triggered by the parent stochastic process reaching a threshold state within the boundary set for the first time. In other words, the first hitting time of $B$ is the random variable $S$ defined as follows:

$$S = \inf\{t : X(t) \in B\}$$

The parent stochastic process may take many forms. In this technical report, we assume the parent stochastic process is a Wiener process $\{X(t),\ t \ge 0\}$ with $\mu$ as its mean parameter, $\sigma^2$ as its variance parameter, and initial value $X(0) = x_0 > 0$. The boundary set is taken as the zero level of the process. Then the first hitting time $S$ of the boundary $B$ has an inverse Gaussian distribution if the process mean parameter $\mu$ is zero or if it is negative, so that the process tends to drift toward the zero level (the boundary). The inverse Gaussian distribution depends on the mean and variance parameters of the underlying Wiener process ($\mu$ and $\sigma^2$) and the initial value $x_0$. Let $f(t \mid \mu, \sigma^2, x_0)$ and $F(t \mid \mu, \sigma^2, x_0)$ denote the probability density function (p.d.f.) and cumulative distribution function (c.d.f.) of the FHT distribution. These two functions can be written as

$$f(t \mid \mu, \sigma^2, x_0) = \frac{x_0}{\sqrt{2\pi\sigma^2 t^3}} \exp\left[-\frac{(x_0 + \mu t)^2}{2\sigma^2 t}\right], \quad \text{for } -\infty < \mu < \infty,\ \sigma^2 > 0,\ x_0 > 0,$$

and

$$F(t \mid \mu, \sigma^2, x_0) = \Phi\left[-\frac{\mu t + x_0}{\sqrt{\sigma^2 t}}\right] + \exp\left(-\frac{2 x_0 \mu}{\sigma^2}\right) \Phi\left[\frac{\mu t - x_0}{\sqrt{\sigma^2 t}}\right],$$

respectively, where $\Phi(\cdot)$ is the c.d.f. of the standard normal distribution. If $\mu > 0$, the FHT is not certain to occur and the p.d.f. is improper. In this case,

$$P(S = \infty) = 1 - \exp\left(-\frac{2 x_0 \mu}{\sigma^2}\right).$$

When the parent process is latent (unobserved), one parameter may be fixed. For instance, the variance parameter $\sigma^2$ may be set to unity, which we choose to do in this technical report.
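As a numerical companion to the FHT density and distribution function described above, the following Python sketch evaluates both for a Wiener process that starts at $x_0 > 0$ and is absorbed at zero. It is not part of the original programs: the function names (`norm_cdf`, `fht_pdf`, `fht_cdf`, `prob_never_hit`) are illustrative assumptions, and only the Python standard library is used.

```python
import math

def norm_cdf(z):
    """Standard normal c.d.f., written with erfc to stay accurate in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def fht_pdf(t, mu, x0, sigma2=1.0):
    """p.d.f. of the first hitting time of zero (mu = drift, x0 = start value)."""
    scale = x0 / math.sqrt(2.0 * math.pi * sigma2 * t ** 3)
    return scale * math.exp(-(x0 + mu * t) ** 2 / (2.0 * sigma2 * t))

def fht_cdf(t, mu, x0, sigma2=1.0):
    """c.d.f. of the first hitting time of zero."""
    s = math.sqrt(sigma2 * t)
    return (norm_cdf(-(mu * t + x0) / s)
            + math.exp(-2.0 * x0 * mu / sigma2) * norm_cdf((mu * t - x0) / s))

def prob_never_hit(mu, x0, sigma2=1.0):
    """P(S = infinity): zero unless the drift mu is positive."""
    if mu <= 0.0:
        return 0.0
    return 1.0 - math.exp(-2.0 * x0 * mu / sigma2)

# Illustrative values only: drift toward the boundary, unit variance.
print(fht_pdf(3.0, mu=-1.0, x0=2.0), fht_cdf(3.0, mu=-1.0, x0=2.0))
print(prob_never_hit(mu=0.5, x0=2.0))
```

A quick sanity check on any such implementation is that the density integrates to the distribution function, and that for positive drift the distribution function levels off at $\exp(-2 x_0 \mu / \sigma^2)$ rather than at one.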

Now we introduce a regression structure by expressing parameters $\mu$ and $x_0$ as regression functions of covariates. Assume that there are $k$ covariates $z_1, \ldots, z_k$ that might be related to $\mu$ and $x_0$. Let an identity function link parameter $\mu$ to the covariates as follows:

$$\mu = \mathbf{z}\boldsymbol{\beta} = \beta_0 + \beta_1 z_1 + \cdots + \beta_k z_k.$$

Similarly, let a logarithmic function link parameter $x_0$ to the covariates as follows:

$$\ln(x_0) = \mathbf{z}\boldsymbol{\gamma} = \gamma_0 + \gamma_1 z_1 + \cdots + \gamma_k z_k.$$

Here $\mathbf{z} = (1, z_1, \ldots, z_k)$, $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_k)'$, and $\boldsymbol{\gamma} = (\gamma_0, \gamma_1, \ldots, \gamma_k)'$. The unit element in the covariate vector allows for a regression intercept term. Other link functions can be chosen. The general criterion is that the link function should map the parameter into the whole real line.
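To make the two link functions concrete, the Python sketch below maps one covariate vector to $\mu$ (identity link) and $x_0$ (log link). The helper names are assumptions introduced for illustration. The coefficient values are the SAS parameter estimates reported in the output later in this report; note that the SAS program labels the $\ln(x_0)$ coefficients b0 to b3 and the $\mu$ coefficients g0 to g3, so they are assigned here to $\boldsymbol{\gamma}$ and $\boldsymbol{\beta}$ respectively. The covariates are those of patient 1 in the myeloma data (age 53, male, dose level 2).

```python
import math

def linear_predictor(coefs, covariates):
    """Return coefs[0] + coefs[1]*z1 + ... + coefs[k]*zk."""
    z = [1.0] + list(covariates)  # the unit element supplies the intercept
    if len(coefs) != len(z):
        raise ValueError("need one coefficient per covariate plus an intercept")
    return sum(c * zj for c, zj in zip(coefs, z))

def tr_parameters(beta, gamma, covariates):
    """Map covariates to (mu, x0) using the identity and log links."""
    mu = linear_predictor(beta, covariates)             # identity link
    x0 = math.exp(linear_predictor(gamma, covariates))  # log link keeps x0 > 0
    return mu, x0

# Reported SAS estimates: g0..g3 drive mu, b0..b3 drive ln(x0).
beta_hat = [-8.872168, 0.052843, 1.715020, 0.717975]
gamma_hat = [4.204297, -0.021501, -0.326445, -0.172050]

mu, x0 = tr_parameters(beta_hat, gamma_hat, (53, 1, 2))  # age, gender, treat
print(mu, x0)
```

Because the log link exponentiates the linear predictor, any real-valued coefficient vector yields a valid positive starting level, which is the point of requiring a link that maps the parameter onto the whole real line.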

Suppose survival data of the form $(t_i, \mathbf{z}_i, \delta_i)$, $i = 1, \ldots, n$, are collected. Here $\delta_i$ is an indicator variable ($\delta_i = 0$ if the $i$th item is censored; $\delta_i = 1$ if the $i$th item failed), and $t_i$ is the corresponding failure or censoring time. As before, $\mathbf{z}_i$ is a vector of covariate values, in this case the covariate vector for item $i$. Likewise, we let $\mu_i$ and $x_{0i}$ denote the values of these parameters for item $i$. When we apply threshold regression in survival data analysis, the state of the underlying process represents the strength of an item, and the item fails when the process reaches an adverse threshold state for the first time (assumed to be zero in this report). Thus the sample log-likelihood function can be written as

$$\ln L(\boldsymbol{\beta}, \boldsymbol{\gamma}) = \sum_{i=1}^{n} \left[\delta_i \ln f(t_i \mid \mu_i, x_{0i}) + (1 - \delta_i) \ln \bar{F}(t_i \mid \mu_i, x_{0i})\right],$$

where $\bar{F} = 1 - F$ is the survival function, so that a censored item contributes the probability that its first hitting time exceeds $t_i$.

Gradient algorithms, such as the Newton-Raphson algorithm, are efficient numerical methods for maximizing the log-likelihood function to find the estimates of the regression parameter vectors $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$.
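The sample log-likelihood can be sketched in a few lines of Python, with the variance parameter fixed at one as in this report. This is an illustrative sketch, not the SAS implementation: the function names are assumptions, the two records are taken from the myeloma data listed in the SAS section, and the coefficient values are arbitrary starting points rather than fitted estimates. A failed item contributes the log density; a censored item contributes the log of the survival function $1 - F$.

```python
import math

def norm_cdf(z):
    """Standard normal c.d.f., written with erfc to stay accurate in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def loglik_term(t, fail, mu, x0):
    """Log-likelihood contribution of one item, with sigma^2 fixed at 1."""
    if fail:
        # ln f(t | mu, x0): the process reached the zero boundary at time t.
        return (math.log(x0) - 0.5 * math.log(2.0 * math.pi * t ** 3)
                - (x0 + mu * t) ** 2 / (2.0 * t))
    # ln(1 - F(t | mu, x0)): the process is still above zero at time t.
    root_t = math.sqrt(t)
    surv = (norm_cdf((x0 + mu * t) / root_t)
            - math.exp(-2.0 * x0 * mu) * norm_cdf((mu * t - x0) / root_t))
    return math.log(surv)

def log_likelihood(records, beta, gamma):
    """Sum the contributions over records of the form (t, fail, covariates)."""
    total = 0.0
    for t, fail, covariates in records:
        z = [1.0] + list(covariates)
        mu = sum(b * zj for b, zj in zip(beta, z))             # identity link
        x0 = math.exp(sum(g * zj for g, zj in zip(gamma, z)))  # log link
        total += loglik_term(t, fail, mu, x0)
    return total

# Patients 1 and 2 of the myeloma data: (time, fail, (age, gender, treat)).
records = [(3.657, 1, (53, 1, 2)), (3.175, 0, (47, 0, 0))]
beta = [-1.0, 0.0, 0.0, 0.0]   # arbitrary: mu = -1 for every item
gamma = [1.0, 0.0, 0.0, 0.0]   # arbitrary: ln(x0) = 1 for every item
print(log_likelihood(records, beta, gamma))
```

The SAS program later in this report writes the same two terms through the reparameterization d = -mu/x0 and v = 1/x0^2; the two forms are algebraically equivalent, so either can serve as the objective handed to a Newton-Raphson or other gradient-based optimizer.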

Implementing Threshold Regression in SAS

Here is sample code for implementing threshold regression with SAS:

**********************************************************************;

* MANUAL DATA INPUT *;

* We can input data manually or by pasting the data from another data source

(such as a text file). Consider the following hypothetical case illustration.

A study has 49 patients diagnosed with myeloma. They are administered a drug

at one of three dose levels, 0, 1 and 2, with the dose level being randomly

assigned. Zero indicates placebo. The time from the point of randomization

to either death or censoring has been tracked. The first line of the following

input code gives the data set name, 'myeloma'. The second line is an input

statement. The variables are listed after ‘input’ in the order they appear

in each line of the data record. The data follows the command 'datalines'

and the data set ends with a semicolon like other statements. Variable ‘id’

refers to the patient’s identification number. The 'time' variable gives

the survival or censoring time in years. Variable ‘age’ is the patient’s

age in years at enrolment into the study. Variable 'gender' is an indicator

variable that is coded 1 for male patients and 0 for female patients. Variable

'treat' indicates the assigned treatment dose. Variable 'fail' has a value

of 1 for patients who died and 0 for those who were censored.

*;

**********************************************************************;

data myeloma;

input id time age gender treat fail;

datalines;

1 3.657 53 1 2 1

2 3.175 47 0 0 0

3 3.09 50 1 1 1

4 3.288 55 1 2 1

5 2.579 31 1 1 0

6 3.52 63 1 0 1

7 2.912 62 0 0 1

8 3.458 45 0 0 1

9 3.175 58 1 1 1

10 4.224 59 1 1 1

11 4.23 62 1 2 1

12 3.626 52 1 0 1

13 3.78 51 0 1 1

14 4.053 48 0 0 1

15 3.77 59 0 2 1

16 3.515 40 0 2 1

17 4.224 56 1 2 1

18 3.458 67 0 2 1

19 3.349 66 1 0 1

20 4.45 50 1 1 1

21 3.486 59 0 0 1

22 3.288 60 1 1 1

23 3.827 64 1 1 1

24 2.608 56 0 2 1

25 2.069 64 0 2 1

26 4.269 47 1 2 0

27 3.402 52 0 2 1

28 4.195 46 0 0 1

29 3.628 51 1 0 1

30 2.919 61 1 1 1

31 3.855 43 1 1 1

32 3.628 50 0 0 1

33 4.053 30 1 1 1

34 2.962 68 0 0 1

35 2.948 53 1 0 1

36 2.834 58 1 0 1

37 2.948 51 1 1 0

38 3.061 37 1 1 0

39 3.203 59 1 2 1

40 4.28 49 1 1 0

41 2.551 64 0 4 1

42 3.175 39 1 0 0

43 3.288 65 0 1 1

44 3.674 54 1 0 1

45 4.082 37 0 1 0

46 3.061 53 1 0 1

47 3.033 57 0 0 1

48 3.203 55 1 1 1

49 3.379 59 1 2 1

;

run;

**********************************************************************;
* IMPORTING A DATA FILE                                              *;
*                                                                    *;
* If the raw data set is large and is already stored in an external
  file (for example, in “C:\my file\myeloma data.txt”), then we do not
  have to bring that file into the data stream. Use an INFILE statement
  to specify the file containing the raw data:                       *;
*                                                                    *;
*data myeloma;                                                       *;
*infile 'C:\my file\myeloma data.txt';                               *;
*input id time age gender treat fail;                                *;
*run;                                                                *;
*proc print;                                                         *;
*run;                                                                *;
*                                                                    *;
* We can also use the 'proc import' statement to import an external
  file. For instance, if the raw data is stored in an Excel workbook
  (the same path as above), then we can use the following code to
  create a new SAS data set. This new SAS data set is named 'myeloma'
  and the output displays the first 10 observations of the data set. *;
*                                                                    *;
* proc import datafile="C:\my file\myeloma data.xls"                 *;
*      out=myeloma;                                                  *;
*      getnames=no;                                                  *;
* run;                                                               *;
* proc print data=myeloma(obs=10);                                   *;
* run;                                                               *;
**********************************************************************;

**********************************************************************;

* IMPLEMENTATION OF THRESHOLD REGRESSION *;

* Now we use the nonlinear procedure ('proc nlp') to maximize the sample

log-likelihood function, assuming that we adopt a Wiener process and

inverse Gaussian first-hitting-time model. Here there are three

covariates: 'age', 'gender', and 'treat'. Thus, we need to estimate four

regression coefficients for the initial value ‘ln(x0)’ of the Wiener process

(corresponding to 'b0', 'b1', 'b2', and 'b3') and four coefficients for the

mean parameter ‘mu’ (corresponding to 'g0', 'g1', 'g2', and 'g3'). Since

the nonlinear procedure will use an iterative algorithm, initial values

for estimates may be specified before executing 'proc nlp'.*;

**********************************************************************;

data par1(type=est);

keep _type_ b0 b1 b2 b3 g0 g1 g2 g3;

_type_='parms'; b0 = 1; b1 = 0; b2 = 0;

b3 = 0; g0 = 1; g1 = 0; g2 = 0;

g3 = 0; output;

run;

**********************************************************************;

* The following is the nonlinear procedure 'proc nlp'. 'tech= NEWRAP' means

that we choose the Newton-Raphson method to maximize the log-likelihood

function. 'inest=par1' indicates that the initial estimates come from data

set 'par1'. 'cov=2' means that we use the Hessian matrix to compute an

approximate covariance matrix and obtain the standard errors for the

parameter estimates. 'parms' indicates estimated regression coefficients.

'pcov' and 'phes' display the covariance matrix and Hessian matrix,

respectively. 'pshort' restricts the amount of default output. *;

**********************************************************************;

ods rtf file='myeloma.rtf';

ods html file='myeloma.html'

headtext='<link rel=alternate media=print href="myeloma.rtf">';

proc nlp data=myeloma tech=NEWRAP inest=par1 outest=opar1

outmodel=model cov=2 pcov phes pshort;

max logf;

parms b0 b1 b2 b3 g0 g1 g2 g3;

lnx0=b1*age+b2*gender+b3*treat+b0;

mu=g1*age+g2*gender+g3*treat+g0;

d=-mu/exp(lnx0);

v=exp(-2*lnx0);

PI=constant('pi');

s =

fail*(-.5*(log(2*PI*v*(time**3))+(d*time-1)**2/(v*time)))+

(1-fail)*log(probnorm((1-d*time)/sqrt(v*time))-exp(2*d/v)*probnorm(-(1+

d*time)/sqrt(v*time)));

logf = s;

run;

ods _all_ close;

**********************************************************************;

* OUTPUT *;

* The first three and the last lines of the above program use the SAS Output

Delivery System (ODS) to produce HTML and RTF format output. If we delete

those lines and run the program, we get traditional SAS output (listing

output). *;

**********************************************************************;

Listing output:

PROC NLP: Nonlinear Maximization

Gradient is computed using analytic formulas.

Hessian is computed using analytic formulas.

Hessian Matrix

b0 b1 b2 b3

b0 -292.8043153 -16342.29103 -167.9522894 -290.9081576

b1 -16342.29103 -929252.9377 -9280.123714 -16579.89413

b2 -167.9522894 -9280.123714 -167.9522894 -151.1981067

b3 -290.9081576 -16579.89413 -151.1981067 -566.6219864

g0 -112.0912575 -6216.849199 -65.71325746 -109.3091441

g1 -6216.849199 -351606.83 -3619.36394 -6184.568773

g2 -65.71325746 -3619.36394 -65.71325746 -60.29045687

g3 -109.3091441 -6184.568773 -60.29045687 -207.3500053

Hessian Matrix

g0 g1 g2 g3

b0 -112.0912575 -6216.849199 -65.71325746 -109.3091441

b1 -6216.849199 -351606.83 -3619.36394 -6184.568773

b2 -65.71325746 -3619.36394 -65.71325746 -60.29045687

b3 -109.3091441 -6184.568773 -60.29045687 -207.3500053

g0 -141.4080762 -7781.386905 -85.09663323 -135.4596092

g1 -7781.386905 -437101.6457 -4670.577749 -7599.138766

g2 -85.09663323 -4670.577749 -85.09663323 -80.46164261

g3 -135.4596092 -7599.138766 -80.46164261 -247.8485808

Determinant = 1.0977379E20

Matrix has Only Negative Eigenvalues

Newton-Raphson Optimization with Line Search

Without Parameter Scaling

Parameter Estimates 8

Functions (Observations) 49

Optimization Start

Active Constraints 0 Objective Function -298.9373042

Max Abs Gradient Element 13942.816778

Iter  Restarts  Function Calls  Active Constraints  Objective Function  Objective Function Change  Max Abs Gradient Element  Step Size  Slope of Search Direction

1 0 2 0 -70.93339 228.0 2523.6 1.000 -458.2

2* 0 4 0 -59.55906 11.3743 1726.1 0.100 -118.8

3* 0 5 0 -51.15298 8.4061 814.4 0.400 -24.780

4* 0 6 0 -45.44170 5.7113 1110.5 1.000 -8.166

5* 0 7 0 -43.54292 1.8988 194.3 1.000 -2.920

6* 0 8 0 -43.10946 0.4335 90.2206 1.000 -0.615

7* 0 9 0 -42.51736 0.5921 26.5363 1.000 -0.635

8* 0 10 0 -38.76730 3.7501 537.9 1.000 -4.768

9* 0 11 0 -33.64198 5.1253 665.8 1.000 -6.345

10* 0 12 0 -31.60764 2.0343 674.4 1.000 -3.431

11* 0 13 0 -31.37806 0.2296 27.3533 1.000 -0.333

12* 0 16 0 -30.78358 0.5945 466.1 4.703 -0.170

13* 0 17 0 -29.73203 1.0515 2942.2 1.000 -2.642

14* 0 18 0 -28.22719 1.5048 117.6 1.000 -2.705

15* 0 19 0 -28.05135 0.1758 153.1 1.000 -0.300

16* 0 20 0 -28.03813 0.0132 2.5290 1.000 -0.0245

17* 0 21 0 -28.03792 0.000207 0.2292 1.000 -0.0004

18* 0 22 0 -28.03792 8.237E-7 0.000804 1.000 -16E-7

19* 0 23 0 -28.03792 8.98E-10 9.695E-7 1.000 -18E-10

Optimization Results

Iterations 19 Function Calls 24

Hessian Calls 20 Active Constraints 0

Objective Function -28.0379201 Max Abs Gradient Element 9.6954136E-7

Slope of Search Direction -1.766049E-9 Ridge 0.0025047222

GCONV convergence criterion satisfied.

Optimization Results

Parameter Estimates

N  Parameter  Estimate  Approx Std Err  t Value  Approx Pr > |t|  Gradient Objective Function

1 b0 4.204297 0.513946 8.180421 3.751631E-10 2.3626066E-8

2 b1 -0.021501 0.008875 -2.422569 0.019915 -0.000000970

3 b2 -0.326445 0.161388 -2.022737 0.049655 -9.534265E-9

4 b3 -0.172050 0.068415 -2.514810 0.015926 -1.705446E-8

5 g0 -8.872168 2.804854 -3.163148 0.002937 -0.000000272

6 g1 0.052843 0.045101 1.171675 0.248092 -0.000000176

7 g2 1.715020 0.796717 2.152610 0.037285 3.0433854E-8

8 g3 0.717975 0.319585 2.246585 0.030115 -1.580613E-9

Value of Objective Function = -28.0379201

Hessian Matrix

b0 b1 b2 b3

b0 -3080.039749 -161500.1879 -1313.87271 -2205.434811

b1 -161500.1879 -8702302.606 -68676.00514 -116379.5812

b2 -1313.87271 -68676.00514 -1313.87271 -960.3437895

b3 -2205.434811 -116379.5812 -960.3437895 -3882.787199

g0 -649.7078481 -34668.42972 -335.4503895 -543.9908595

g1 -34668.42972 -1896834.921 -17903.96217 -29362.67332

g2 -335.4503895 -17903.96217 -335.4503895 -288.3116265

g3 -543.9908595 -29362.67332 -288.3116265 -958.1327324

Hessian Matrix

g0 g1 g2 g3

b0 -649.7078481 -34668.42972 -335.4503895 -543.9908595

b1 -34668.42972 -1896834.921 -17903.96217 -29362.67332

b2 -335.4503895 -17903.96217 -335.4503895 -288.3116265

b3 -543.9908595 -29362.67332 -288.3116265 -958.1327324

g0 -151.3731966 -8234.231002 -92.41125304 -148.3763404

g1 -8234.231002 -457925.7652 -5023.688094 -8191.124714

g2 -92.41125304 -5023.688094 -92.41125304 -90.89439622

g3 -148.3763404 -8191.124714 -90.89439622 -267.1056159

Determinant = 1.0092775E20

Matrix has Only Negative Eigenvalues

Covariance Matrix 2: H = (NOBS/d) inv(G)

b0 b1 b2 b3

b0 0.2641407491 -0.004313671 -0.020848402 0.0020665212

b1 -0.004313671 0.0000787744 0.0000988431 -0.000123362

b2 -0.020848402 0.0000988431 0.0260459462 0.0004133177

b3 0.0020665212 -0.000123362 0.0004133177 0.0046805924

g0 -1.382558845 0.0214494189 0.117743015 -0.01128899

g1 0.0218647751 -0.000382897 -0.000598503 0.0006333133

g2 0.1383698768 -0.000744467 -0.120547164 -0.00323911

g3 0.0017463597 0.0005417751 -0.004032575 -0.019931527

Covariance Matrix 2: H = (NOBS/d) inv(G)

g0 g1 g2 g3

b0 -1.382558845 0.0218647751 0.1383698768 0.0017463597

b1 0.0214494189 -0.000382897 -0.000744467 0.0005417751

b2 0.117743015 -0.000598503 -0.120547164 -0.004032575

b3 -0.01128899 0.0006333133 -0.00323911 -0.019931527

g0 7.8672081962 -0.118948021 -0.878685632 -0.052283148

g1 -0.118948021 0.0020340797 0.005625891 -0.002281071

g2 -0.878685632 0.005625891 0.6347575724 0.04134739

g3 -0.052283148 -0.002281071 0.04134739 0.1021345586

Factor sigm = 1.1951219512

Determinant = 4.123701E-20

Matrix has 8 Positive Eigenvalue(s)

Approximate Correlation Matrix of Parameter Estimates

b0 b1 b2 b3

b0 1 -0.945663895 -0.251353536 0.0587721741

b1 -0.945663895 1 0.0690054946 -0.203160897

b2 -0.251353536 0.0690054946 1 0.0374337702

b3 0.0587721741 -0.203160897 0.0374337702 1

g0 -0.959081692 0.8616139604 0.2601086298 -0.058829391

g1 0.9432858771 -0.956545591 -0.082226574 0.2052505081

g2 0.3379246763 -0.105280779 -0.93752527 -0.0594253

g3 0.0106323591 0.1910029953 -0.078185457 -0.911598911

Approximate Correlation Matrix of Parameter Estimates

g0 g1 g2 g3

b0 -0.959081692 0.9432858771 0.3379246763 0.0106323591

b1 0.8616139604 -0.956545591 -0.105280779 0.1910029953

b2 0.2601086298 -0.082226574 -0.93752527 -0.078185457

b3 -0.058829391 0.2052505081 -0.0594253 -0.911598911

g0 1 -0.940292385 -0.393205205 -0.058326384

g1 -0.940292385 1 0.1565681395 -0.158259024

g2 -0.393205205 0.1565681395 1 0.1623894446

g3 -0.058326384 -0.158259024 0.1623894446 1

Determinant = 1.5669555E-8

Matrix has 8 Positive Eigenvalue(s)
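The pieces of the listing output fit together in a way that is easy to check by hand. The Python sketch below, which is an added illustration and not part of the SAS program, recomputes the approximate standard errors as square roots of the diagonal of Covariance Matrix 2, the t values as estimate divided by standard error, and one correlation from the corresponding covariance. All numbers are copied from the listing above. Reading `Factor sigm` as NOBS/(NOBS - number of parameters) is an inference from the printed value 1.1951219512 = 49/41, not something the report states.

```python
import math

n_obs, n_par = 49, 8  # observations and estimated parameters in the listing

# Parameter estimates and the diagonal of "Covariance Matrix 2" from the listing.
estimate = {"b0": 4.204297, "b1": -0.021501, "b2": -0.326445, "b3": -0.172050,
            "g0": -8.872168, "g1": 0.052843, "g2": 1.715020, "g3": 0.717975}
cov_diag = {"b0": 0.2641407491, "b1": 0.0000787744, "b2": 0.0260459462,
            "b3": 0.0046805924, "g0": 7.8672081962, "g1": 0.0020340797,
            "g2": 0.6347575724, "g3": 0.1021345586}
cov_b0_b1 = -0.004313671  # off-diagonal covariance between b0 and b1

# Assumed reading of "Factor sigm": NOBS / (NOBS - number of parameters).
sigm = n_obs / (n_obs - n_par)

# Standard error = square root of the variance; t value = estimate / std err.
std_err = {name: math.sqrt(var) for name, var in cov_diag.items()}
t_value = {name: estimate[name] / std_err[name] for name in estimate}

# Correlation = covariance scaled by the two standard errors.
corr_b0_b1 = cov_b0_b1 / (std_err["b0"] * std_err["b1"])

for name in estimate:
    print(f"{name}: std err {std_err[name]:.6f}, t value {t_value[name]:.3f}")
print(f"sigm {sigm:.10f}, corr(b0, b1) {corr_b0_b1:.6f}")
```

The recomputed values agree with the Approx Std Err and t Value columns and with the corresponding entry of the correlation matrix up to rounding of the printed estimates.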

HTML output viewed with Microsoft Internet Explorer:

[HTML rendering of the PROC NLP output under the title "The SAS System". It contains the same tables, with the same values, as the listing output above: Hessian Matrix at the starting values (Determinant = 1.0977379E20), Optimization Start, the iteration history (19 iterations), Optimization Results, Parameter Estimates, Hessian Matrix at the optimum (Determinant = 1.0092775E20), Covariance Matrix 2: H = (NOBS/d) inv(G), and Approximate Correlation Matrix of Parameter Estimates.]

RTF output viewed with Microsoft Word:

19

Page 20: Implementation of Threshold Regression: Programs for SAS ... › ... › files › TR-software-20061224.pdf · Implementation of Threshold Regression: Programs for SAS, R-code and

Hessian Matrix

b0 b2 b3 g0 g1b1 g2 g3

b0 -292.8043153 - -112.0912575 -6216.849199 -65. 325746 -109.309144116342.29103 -167.9522894 -290.9081576 71

b1 -16342.29103 -929252.9377 -9280.123714 -16579.89413 -6216.849199 -351606.83 -3619.36394 -6184.568773

b2 -167.9522894 -9280.123714 -167.9522894 -151.1981067 -65.713257 1 -65.713257 0.2904568746 -36 9.36394 46 -6

b3 -290.9 6 -15 67 9864 91441 4. -6 07.3500053081576 -1 579.89413 1.19810 -566.621 -109.30 -618 568773 0.29045687 -2

g0 -112.0912575 6216.8491 -65.71 46 -109.3 1441 0762 38 3323 -135.4596092 - 99 3257 09 -141.408 -7781. 6905 -85.0966

g1 -6216.849199 -351606. -3619. 394 -6184. 8773 6905 101. 7749 599.13876683 36 56 -7781.38 -437 6457 -4670.57 -7

g2 -65.71325746 -3619.363 -65.71 46 -60.29 5687 3323 577 33 0.4616426194 3257 04 -85.0966 -4670. 749 -85.0966 23 -8

g3 -109.3091441 -6184.5687 -60.29 87 -207.3 0053 6092 13 4261 -247.848580873 0456 50 -135.459 -7599. 8766 -80.4616

Determinant = 1.0977379E20

Matrix has Only Nega s

Newton-Raphson Optimization with Line Search

Without Parame

tive Eigenvalue

ter Scaling

Parameter Estimates 8

Functions (Observations) 49

Optimization Start

Active Constraints 0 Objective Function -298.9373042

Max Abs Gradient Element 13942.816778

Columns: Iteration | Restarts | Function Calls | Active Constraints | Objective Function | Objective Function Change | Max Abs Gradient Element | Step Size | Slope of Search Direction

1 0 2 0 -70.93339 228.0 2523.6 1.000 -458.2
2 * 0 4 0 -59.55906 11.3743 1726.1 0.100 -118.8
3 * 0 5 0 -51.15298 8.4061 814.4 0.400 -24.780
4 * 0 6 0 -45.44170 5.7113 1110.5 1.000 -8.166
5 * 0 7 0 -43.54292 1.8988 194.3 1.000 -2.920
6 * 0 8 0 -43.10946 0.4335 90.2206 1.000 -0.615
7 * 0 9 0 -42.51736 0.5921 26.5363 1.000 -0.635
8 * 0 10 0 -38.76730 3.7501 537.9 1.000 -4.768
9 * 0 11 0 -33.64198 5.1253 665.8 1.000 -6.345
10 * 0 12 0 -31.60764 2.0343 674.4 1.000 -3.431
11 * 0 13 0 -31.37806 0.2296 27.3533 1.000 -0.333
12 * 0 16 0 -30.78358 0.5945 466.1 4.703 -0.170
13 * 0 17 0 -29.73203 1.0515 2942.2 1.000 -2.642
14 * 0 18 0 -28.22719 1.5048 117.6 1.000 -2.705
15 * 0 19 0 -28.05135 0.1758 153.1 1.000 -0.300
16 * 0 20 0 -28.03813 0.0132 2.5290 1.000 -0.0245
17 * 0 21 0 -28.03792 0.000207 0.2292 1.000 -0.0004
18 * 0 22 0 -28.03792 8.237E-7 0.000804 1.000 -16E-7
19 * 0 23 0 -28.03792 8.98E-10 9.695E-7 1.000 -18E-10

Optimization Results

Iterations 19 Function Calls 24

Hessian Calls 20 Active Constraints 0

Objective Function -28.0379201 Max Abs Gradient Element 9.6954136E-7

Slope of Search Direction -1.766049E-9 Ridge 0.0025047222


GCONV convergence criterion satisfied.

Optimization Results

Parameter Estimates

Columns: N | Parameter | Estimate | Approx Std Err | t Value | Approx Pr > |t| | Gradient Objective Function

1 b0 4.204297 0.513946 8.180421 3.751631E-10 2.3626066E-8
2 b1 -0.021501 0.008875 -2.422569 0.019915 -0.000000970
3 b2 -0.326445 0.161388 -2.022737 0.049655 -9.534265E-9
4 b3 -0.172050 0.068415 -2.514810 0.015926 -1.705446E-8
5 g0 -8.872168 2.804854 -3.163148 0.002937 -0.000000272
6 g1 0.052843 0.045101 1.171675 0.248092 -0.000000176
7 g2 1.715020 0.796717 2.152610 0.037285 3.0433854E-8
8 g3 0.717975 0.319585 2.246585 0.030115 -1.580613E-9

Value of Objective Function = -28.0379201
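The t values in the table above are simply each estimate divided by its approximate standard error. The short R sketch below recomputes two of them from the printed figures. The degrees of freedom used for the p-values (observations minus parameters, 49 - 8 = 41) are an assumption made here for illustration; the listing itself does not state them.

```r
# Values copied from the SAS "Parameter Estimates" table (rows b0 and g3)
estimate <- c(b0 = 4.204297, g3 = 0.717975)
std_err  <- c(b0 = 0.513946, g3 = 0.319585)

# t value = estimate / approximate standard error
t_value <- estimate / std_err

# ASSUMPTION: degrees of freedom = observations - parameters = 49 - 8
df <- 49 - 8
p_value <- 2 * pt(-abs(t_value), df)

print(round(t_value, 4))
print(signif(p_value, 4))
```

Because the inputs are rounded to six decimals, the recomputed t values agree with the table only to about five significant digits.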

Hessian Matrix

b0 b1 b2 b3 g0 g1 g2 g3

b0 -3080.039749 -161500.1879 -1313.87271 -2205.434811 -649.7078481 -34668.42972 -335.4503895 -543.9908595
b1 -161500.1879 -8702302.606 -68676.00514 -116379.5812 -34668.42972 -1896834.921 -17903.96217 -29362.67332
b2 -1313.87271 -68676.00514 -1313.87271 -960.3437895 -335.4503895 -17903.96217 -335.4503895 -288.3116265
b3 -2205.434811 -116379.5812 -960.3437895 -3882.787199 -543.9908595 -29362.67332 -288.3116265 -958.1327324
g0 -649.7078481 -34668.42972 -335.4503895 -543.9908595 -151.3731966 -8234.231002 -92.41125304 -148.3763404
g1 -34668.42972 -1896834.921 -17903.96217 -29362.67332 -8234.231002 -457925.7652 -5023.688094 -8191.124714
g2 -335.4503895 -17903.96217 -335.4503895 -288.3116265 -92.41125304 -5023.688094 -92.41125304 -90.89439622
g3 -543.9908595 -29362.67332 -288.3116265 -958.1327324 -148.3763404 -8191.124714 -90.89439622 -267.1056159

Determinant = 1.0092775E20

Matrix has Only Negative Eigenvalues


Covariance Matrix 2: H = (NOBS/d) inv(G)

b0 b1 b2 b3 g0 g1 g2 g3

b0 0.2641407491 -0.004313671 -0.020848402 0.0020665212 -1.382558845 0.0218647751 0.1383698768 0.0017463597
b1 -0.004313671 0.0000787744 0.0000988431 -0.000123362 0.0214494189 -0.000382897 -0.000744467 0.0005417751
b2 -0.020848402 0.0000988431 0.0260459462 0.0004133177 0.117743015 -0.000598503 -0.120547164 -0.004032575
b3 0.0020665212 -0.000123362 0.0004133177 0.0046805924 -0.01128899 0.0006333133 -0.00323911 -0.019931527
g0 -1.382558845 0.0214494189 0.117743015 -0.01128899 7.8672081962 -0.118948021 -0.878685632 -0.052283148
g1 0.0218647751 -0.000382897 -0.000598503 0.0006333133 -0.118948021 0.0020340797 0.005625891 -0.002281071
g2 0.1383698768 -0.000744467 -0.120547164 -0.00323911 -0.878685632 0.005625891 0.6347575724 0.04134739
g3 0.0017463597 0.0005417751 -0.004032575 -0.019931527 -0.052283148 -0.002281071 0.04134739 0.1021345586

Factor sigm = 1.1951219512

Determinant = 4.123701E-20

Matrix has 8 Positive Eigenvalue(s)
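The printed factor 1.1951219512 equals 49/41, which is consistent with d being the number of observations minus the number of parameters (49 - 8); that reading of d is an assumption here, not something the listing states. The R sketch below uses the b0 variance from the matrix above to show how the factor links the SAS standard error to the unscaled value reported in the Stata output later in this document (0.4701233 for the lnx0 intercept).

```r
nobs <- 49   # Functions (Observations) in the SAS listing
npar <- 8    # Parameter Estimates in the SAS listing

# ASSUMPTION: d = nobs - npar, which reproduces "Factor sigm"
factor_sigm <- nobs / (nobs - npar)

# Variance of b0 copied from "Covariance Matrix 2"
var_b0_sas <- 0.2641407491

se_b0_sas      <- sqrt(var_b0_sas)                # SAS approx std err of b0
se_b0_unscaled <- sqrt(var_b0_sas / factor_sigm)  # without the NOBS/d factor

print(factor_sigm)
print(c(sas = se_b0_sas, unscaled = se_b0_unscaled))
```

The same scaling explains why every SAS standard error in this example is about 9.3% larger than its Stata counterpart, while the point estimates agree.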

Approximate Correlation Matrix of Parameter Estimates

b0 b1 b2 b3 g0 g1 g2 g3

b0 1 -0.945663895 -0.251353536 0.0587721741 -0.959081692 0.9432858771 0.3379246763 0.0106323591
b1 -0.945663895 1 0.0690054946 -0.203160897 0.8616139604 -0.956545591 -0.105280779 0.1910029953
b2 -0.251353536 0.0690054946 1 0.0374337702 0.2601086298 -0.082226574 -0.93752527 -0.078185457
b3 0.0587721741 -0.203160897 0.0374337702 1 -0.058829391 0.2052505081 -0.0594253 -0.911598911
g0 -0.959081692 0.8616139604 0.2601086298 -0.058829391 1 -0.940292385 -0.393205205 -0.058326384
g1 0.9432858771 -0.956545591 -0.082226574 0.2052505081 -0.940292385 1 0.1565681395 -0.158259024
g2 0.3379246763 -0.105280779 -0.93752527 -0.0594253 -0.393205205 0.1565681395 1 0.1623894446
g3 0.0106323591 0.1910029953 -0.078185457 -0.911598911 -0.058326384 -0.158259024 0.1623894446 1

Determinant = 1.5669555E-8

Matrix has 8 Positive Eigenvalue(s)
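Each correlation is the corresponding covariance divided by the product of the two standard deviations. As a minimal illustration, the R sketch below rebuilds the b0/b1 entry from three figures copied out of the covariance matrix; the 2 x 2 matrix V is only a hypothetical extract used to show the base R function cov2cor.

```r
# Entries copied from "Covariance Matrix 2" for parameters b0 and b1
var_b0    <- 0.2641407491
var_b1    <- 0.0000787744
cov_b0_b1 <- -0.004313671

# Correlation by hand
corr_b0_b1 <- cov_b0_b1 / sqrt(var_b0 * var_b1)

# The same result from cov2cor() on the 2 x 2 sub-matrix
V <- matrix(c(var_b0, cov_b0_b1,
              cov_b0_b1, var_b1),
            nrow = 2, dimnames = list(c("b0", "b1"), c("b0", "b1")))
C <- cov2cor(V)

print(corr_b0_b1)
print(C)
```

Applied to the full 8 x 8 covariance matrix, cov2cor would return the whole correlation matrix shown above, up to rounding of the printed entries.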


Implementing Threshold Regression in R

Here is sample code for implementing threshold regression in R. Two R functions for nonlinear optimization (‘nlm’ and ‘optim’) are used, and their output is displayed. The sample uses a case illustration and data set that are described in the opening paragraph of the preceding section on SAS implementation.

# DATA INPUT

# reading data into R from an external text file by using 'read.table'

myeloma<-read.table("C:/my file/myeloma data.txt")

# assigning the variable names to corresponding columns

time<-myeloma [,2]
age<-myeloma [,3]
gender<-myeloma [,4]
treat<-myeloma [,5]
fail<-myeloma [,6]

# defining lnx0 and mu
# the first four components of vector 'par'(parameter) correspond to the 4 regression
# coefficients for lnx0, respectively, and the last four correspond to those
# for mu

lnx0<-function(par) {par[1]* age+par[2]*gender+par[3]*treat+par[4]}
mu<-function(par) {par[5]*age+par[6]*gender+par[7]*treat+par[8]}

# transformation into functions 'd' and 'v'

d<-function(par) {-mu(par)/exp(lnx0(par))}
v<-function(par) {exp(-2*lnx0(par))}

# defining minus log-likelihood since the optimization function we choose
# carries out a minimization of 'logf'

logf<-function(par) {-sum(fail*(-.5*(log(2*pi*v(par)*(time^3))+
(d(par)*time-1)^2/(v(par)*time))))-
sum((1-fail)*log(pnorm((1-d(par)*time)
/sqrt(v(par)*time))-exp(2*d(par)/v(par))*pnorm(-(1+d(par)*time)/sqrt(v(
par)*time))))}

# using 'nlm' (nonlinear minimization) to obtain the estimates 'par'
# the second argument 'c(0, 0, 0, 4, 0, 0, 0, 1)' is the initial vector of
# par
# be careful: for our case example, 'nlm' (a Newton-type algorithm) is
# very sensitive to the initial values.
# 'iterlim' specifies the maximum number of iterations

est=nlm(logf, c(0, 0, 0, 4, 0, 0, 0, 1), iterlim=200, hessian = TRUE)

# defining the standard error

stderr=sqrt(diag(solve(est$hessian)))

# OUTPUT

est#for a summary
stderr

# we can also use 'optim' to get the estimates. For example, we choose the
# 'BFGS' method, which is a quasi-Newton method. The first argument is the
# initial vector of 'par'.

est=optim(c(0, 0, 0, 1, 0, 0, 0, 1), logf, method = "BFGS", hessian = TRUE)

est

# list estimates and standard errors
# note: 'stderr' is still the vector computed from the 'nlm' Hessian above

cbind(est$p, stderr)
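The two pieces of 'logf' are the log-density of the first hitting time (used when fail = 1) and the log of its survival function (used when fail = 0), both written in terms of d and v. As a sanity check on that pairing, the sketch below confirms numerically that the density integrates to one minus the survival function. The function names and the parameter values d = 0.5, v = 0.25 and t = 3 are hypothetical and chosen only for illustration; they are not taken from the myeloma data.

```r
# Log-density of the first hitting time in the (d, v) parameterization
# used by 'logf' in the sample code
fht_logdens <- function(t, d, v) {
  -0.5 * (log(2 * pi * v * t^3) + (d * t - 1)^2 / (v * t))
}

# Survival function in the same parameterization
fht_surv <- function(t, d, v) {
  pnorm((1 - d * t) / sqrt(v * t)) -
    exp(2 * d / v) * pnorm(-(1 + d * t) / sqrt(v * t))
}

# Hypothetical values, for illustration only
d <- 0.5
v <- 0.25
t_end <- 3

# P(S <= t_end) obtained two ways
cdf_by_integration <- integrate(function(t) exp(fht_logdens(t, d, v)),
                                lower = 0, upper = t_end)$value
cdf_by_formula <- 1 - fht_surv(t_end, d, v)

print(c(integration = cdf_by_integration, formula = cdf_by_formula))
```

A check of this kind can be useful before optimizing, since a sign error in either term of the likelihood would show up as a mismatch between the two numbers.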


R output:

> # DATA INPUT
> # reading data into R from an external text file by using 'read.table'
>
> myelomatosis<-read.table("C:/my file/myelomatosis.txt")
>
> # assigning the variable names to corresponding columns
>
> time<-myelomatosis[,2]
> age<-myelomatosis[,3]
> gender<-myelomatosis[,4]
> treat<-myelomatosis[,5]
> fail<-myelomatosis[,6]
>
> # defining lnx0 and mu
> # the first four components of vector 'par'(parameter) correspond to the 4 regression
> # coefficients for lnx0, respectively, and the last four correspond to those for mu
>
> lnx0<-function(par) {par[1]* age+par[2]*gender+par[3]*treat+par[4]}
> mu<-function(par) {par[5]*age+par[6]*gender+par[7]*treat+par[8]}
>
> # transformation into functions 'd' and 'v'
>
> d<-function(par) {-mu(par)/exp(lnx0(par))}
> v<-function(par) {exp(-2*lnx0(par))}
>
> # defining minus loglikelihood since the optimization function we choose
> # carries out a minimization of 'logf'
>
> logf<-function(par) {-sum(fail*(-.5*(log(2*pi*v(par)*(time^3))+
+ (d(par)*time-1)^2/(v(par)*time))))- sum((1-fail)*log(pnorm((1-d(par)*time)
+ /sqrt(v(par)*time))-exp(2*d(par)/v(par))*pnorm(-(1+d(par)*time)/sqrt(v(par)*time))))}
>
> # using 'nlm' (nonlinear minimization) to obtain the estimates 'par'
> # the second argument 'c(0, 0, 0, 4, 0, 0, 0, 1)' is the initial vector of par
> # be careful that for our myelomatosis example, 'nlm' (Newton-type algorithm) is very
> # sensitive to the initial values.
> # 'iterlim'specifies the maximum number of iterations
>
> est=nlm(logf, c(0, 0, 0, 4, 0, 0, 0, 1), iterlim=200, hessian = TRUE)
Warning messages:
1: NA/Inf replaced by maximum positive value
2: NA/Inf replaced by maximum positive value
3: NA/Inf replaced by maximum positive value
4: NA/Inf replaced by maximum positive value
5: NA/Inf replaced by maximum positive value
6: NA/Inf replaced by maximum positive value
>
> # defining the standard error
>
> stderr=sqrt(diag(solve(est$hessian)))
>
> # OUTPUT
>
> est#for a summary
$minimum
[1] 28.03808

$estimate
[1] -0.02164592 -0.32661780 -0.17182341 4.21219438 0.05354565 1.71633949
[7] 0.71697326 -8.91144522

$gradient
[1] -7.282431e-04 -3.883750e-05 1.966909e-04 -2.124487e-05 4.821368e-05
[6] -8.818985e-06 4.241704e-05 -4.393308e-07

$hessian
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 8835881.64 69208.8438 117358.5039 162828.0286 1900978.101 17944.77651
[2,] 69208.84 1315.0375 961.8332 1315.5942 17903.397 335.53301
[3,] 117358.50 961.8332 3888.3610 2209.6520 29367.550 288.46243
[4,] 162828.03 1315.5942 2209.6520 3085.5563 34672.158 335.57311
[5,] 1900978.10 17903.3970 29367.5501 34672.1578 457859.509 5023.35160
[6,] 17944.78 335.5330 288.4624 335.5731 5023.352 92.41347
[7,] 29436.27 288.4569 958.5019 544.2803 8190.571 90.90154
[8,] 34746.34 335.5337 544.2368 649.9415 8233.396 92.41363
[,7] [,8]
[1,] 29436.26640 34746.34151
[2,] 288.45686 335.53367
[3,] 958.50190 544.23685
[4,] 544.28031 649.94152
[5,] 8190.57053 8233.39563
[6,] 90.90154 92.41363
[7,] 267.11352 148.37780
[8,] 148.37780 151.36806

$code
[1] 3

$iterations
[1] 150

> stderr
[1] 0.01087984 0.15151974 0.06177786 0.73354293 0.06057708 0.81799590 0.28657324
[8] 4.23026946
>
> # we can also use 'optim' to get the estimates. For example, we choose 'BFGS' method,
> # which is a quasi-Newton method. The first argument is the initial vector of 'par'.
>
> est=optim(c(0, 0, 0, 1, 0, 0, 0, 1), logf, method = "BFGS", hessian = TRUE)
> est
$par
[1] -0.03021783 -0.30351978 -0.14957256 4.64111087 0.09361074
[6] 1.60330486 0.61021962 -10.94116789

$value
[1] 28.80981

$counts
function gradient
204 38

$convergence
[1] 0

$message
NULL

$hessian
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 8489727.98 70373.9964 119959.8567 160319.5295 1862008.075 18018.27678
[2,] 70374.00 1380.7370 1056.8486 1380.7370 18007.774 341.13454
[3,] 119959.86 1056.8486 4049.8995 2338.8182 29524.640 299.32735
[4,] 160319.53 1380.7370 2338.8182 3124.3080 34302.920 341.13454
[5,] 1862008.07 18007.7742 29524.6401 34302.9200 457695.076 5034.05744
[6,] 18018.28 341.1345 299.3274 341.1345 5034.057 92.59602
[7,] 29530.50 299.3274 969.8646 553.1733 8195.211 91.35890
[8,] 34326.56 341.1345 553.1735 649.2431 8223.862 92.59602
[,7] [,8]
[1,] 29530.4980 34326.56170
[2,] 299.3274 341.13454
[3,] 969.8646 553.17346
[4,] 553.1733 649.24314
[5,] 8195.2109 8223.86171
[6,] 91.3589 92.59602
[7,] 267.4703 148.33323
[8,] 148.3332 151.00833

>
> # list estimate and Standard error
>
> cbind(est$p, stderr)
stderr
[1,] -0.03021783 0.01087984
[2,] -0.30351978 0.15151974
[3,] -0.14957256 0.06177786
[4,] 4.64111087 0.73354293
[5,] 0.09361074 0.06057708
[6,] 1.60330486 0.81799590
[7,] 0.61021962 0.28657324
[8,] -10.94116789 4.23026946


Implementing Threshold Regression in Stata

Implementation of threshold regression in Stata requires two files, in addition to the data file. One file is a ‘do’ file that controls the main execution. The ‘do’ file calls in the data set and, a little later, an ‘ado’ file. The ‘ado’ file contains the computational subroutine. We illustrate the Stata implementation using the myeloma case illustration. Refer to the opening paragraph of the preceding section on SAS implementation for a description of the case illustration and data set.

Do file

*The routine begins by clearing the data set and setting memory requirements.

*The Stata version is set to version 7.0.

clear

set mem 200m

set mat 200

set more off

version 7.0

*The data are entered from a text file called melanoma.txt

infile input id time age gender treat fail using "melanoma.txt"

*The failure indicator and time variables are set to variable names used

*in the subroutine

global f "fail"

global t "time"

*Statement ‘ml model’ is a Stata maximum likelihood routine. The model fitting

*method (lf) and subroutine (treg.ado) are specified. The covariates are then

*listed for each parameter as shown. The parameters are ‘lnx0’ and ‘m’. The

*covariates are ‘age’, ‘gender’ and ‘treat’. Method 'lf' is a Stata gradient

*(hill climbing) method. File treg.ado contains the likelihood subroutine

*function. Statement ‘ml init’ allows initial values to be specified. Initial

*values must include those for the regression coefficients (0 is chosen in

*each case here) and one value for each intercept (1 and -1 are chosen for

*lnx0 and m here). Statement ‘ml maximize’ starts the optimization.

#delimit ;


ml model lf treg

(lnx0: age gender treat)

(m: age gender treat);

ml init

0 0 0 1

0 0 0 -1, copy;

#delimit cr

ml maximize

*In addition to tabular output of regression results, the following optional

*commands output the vector of regression coefficient estimates and their

*variance-covariance matrix and correlation matrix.

matrix list e(b)

matrix list e(V)

matrix C=corr(e(V))

matrix list C

Ado file

*The following sequence of commands defines the contribution of each data

*point to the sample log-likelihood function.

program define treg

version 7.0

args lnf lnx0 m

tempvar d v t

quietly gen double `d'=-(`m')/exp(`lnx0')

quietly gen double `v'=exp(-2*(`lnx0'))

quietly gen double `t'= $t

quietly replace `lnf'= /*

*/ $f*(-.5*(ln(2*_pi*`v'*(`t'^3))+(`d'*`t'-1)^2/(`v'*`t'))) /*

*/ +(1-$f)*ln(norm((1-`d'*`t')/sqrt(`v'*`t'))- /*

*/ exp(2*`d'/`v')*norm(-(1+`d'*`t')/sqrt(`v'*`t')))

end

Regression output including optional matrices


initial: log likelihood = -87.302643

rescale: log likelihood = -45.752893

rescale eq: log likelihood = -45.752893

Iteration 0: log likelihood = -45.752893 (not concave)

Iteration 1: log likelihood = -41.248496 (not concave)

Iteration 2: log likelihood = -39.431809 (not concave)

Iteration 3: log likelihood = -38.493665 (not concave)

Iteration 4: log likelihood = -37.590725 (not concave)

Iteration 5: log likelihood = -34.068905 (not concave)

Iteration 6: log likelihood = -33.777801 (not concave)

Iteration 7: log likelihood = -33.593659 (not concave)

Iteration 8: log likelihood = -33.402762 (not concave)

Iteration 9: log likelihood = -32.483353

Iteration 10: log likelihood = -31.584248

Iteration 11: log likelihood = -29.243274

Iteration 12: log likelihood = -28.196583

Iteration 13: log likelihood = -28.048789

Iteration 14: log likelihood = -28.037928

Iteration 15: log likelihood = -28.03792

Iteration 16: log likelihood = -28.03792

Number of obs = 49

Wald chi2(3) = 21.74

Log likelihood = -28.03792 Prob > chi2 = 0.0001

------------------------------------------------------------------------------

| Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

lnx0 |

age | -.0215015 .0081187 -2.65 0.008 -.0374139 -.0055891

gender | -.3264446 .1476263 -2.21 0.027 -.6157868 -.0371024

treat | -.1720504 .0625812 -2.75 0.006 -.2947074 -.0493934

_cons | 4.204297 .4701233 8.94 0.000 3.282872 5.125722

-------------+----------------------------------------------------------------

m |

age | .0528434 .0412551 1.28 0.200 -.0280152 .133702

gender | 1.71502 .7287818 2.35 0.019 .2866343 3.143406

treat | .7179748 .2923346 2.46 0.014 .1450096 1.29094

_cons | -8.87217 2.565691 -3.46 0.001 -13.90083 -3.843508

------------------------------------------------------------------------------
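The z, P>|z| and confidence interval columns in the table above follow from the coefficient and its standard error under a normal approximation. As a small illustration in R, the sketch below reproduces the row for age in the lnx0 equation from the two printed figures; the variable names are hypothetical.

```r
# Coefficient and standard error copied from the Stata table (lnx0: age)
coef_age <- -0.0215015
se_age   <- 0.0081187

# Wald z statistic and two-sided p-value from the standard normal
z_value <- coef_age / se_age
p_value <- 2 * pnorm(-abs(z_value))

# 95% confidence interval: coefficient +/- z(0.975) * standard error
ci <- coef_age + c(-1, 1) * qnorm(0.975) * se_age

print(round(z_value, 2))
print(round(p_value, 3))
print(round(ci, 7))
```

Note that the SAS listing earlier reports t-based p-values with larger standard errors, so its p-values for the same coefficients are somewhat larger than those shown by Stata.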


. matrix list e(b)

e(b)[1,8]

lnx0: lnx0: lnx0: lnx0: m: m:

age gender treat _cons age gender

y1 -.0215015 -.32644459 -.17205037 4.2042972 .05284345 1.7150203

m: m:

treat _cons

y1 .71797482 -8.8721697

. matrix list e(V)

symmetric e(V)[8,8]

lnx0: lnx0: lnx0: lnx0: m:

age gender treat _cons age

lnx0:age .00006591

lnx0:gender .00008271 .02179353

lnx0:treat -.00010322 .00034584 .00391641

lnx0:_cons -.0036094 -.01744453 .00172913 .22101592

m:age -.00032038 -.00050079 .00052991 .01829504 .00170199

m:gender -.00062292 -.10086586 -.00271029 .11577865 .00470737

m:treat .00045332 -.00337419 -.01667739 .00146124 -.00190865

m:_cons .0179475 .09851934 -.00944586 -1.1568359 -.09952805

m: m: m:

gender treat _cons

m:gender .53112288

m:treat .03459676 .0854595

m:_cons -.73522511 -.0437471 6.5827706

. matrix C=corr(e(V))

. matrix list C


symmetric C[8,8]

lnx0: lnx0: lnx0: lnx0: m:

age gender treat _cons age

lnx0:age 1

lnx0:gender .0690052 1

lnx0:treat -.20316074 .03743401 1

lnx0:_cons -.94566404 -.25135286 .05877202 1

m:age -.95654565 -.0822261 .2052503 .94328597 1

m:gender -.10528063 -.93752522 -.05942563 .33792411 .15656779

m:treat .19100283 -.07818537 -.91159895 .01063234 -.15825889

m:_cons .86161426 .26010781 -.0588292 -.95908174 -.94029253

m: m: m:

gender treat _cons

m:gender 1

m:treat .16238943 1

m:_cons -.39320448 -.05832634 1