
  • Festschrift

    In Honor of

    George and Frances Ball Distinguished Professor of Statistics

    Mir Masoom Ali

    On

    The Occasion of his Retirement

    Muncie, Indiana, USA May 18-19, 2007

  • George and Frances Ball Distinguished Professor Mir Masoom Ali

  • Photograph with Sir R. A. Fisher, November 18, 1954, Dhaka University Statistics Department

    Professor Mir Masoom Ali seated on chair first from the right, Sir R. A. Fisher seated on chair sixth from right and Professor Qazi Motahar Husain seated on chair seventh from the right

Preface

    INTRODUCTION

    This Festschrift, consisting of forty papers contributed by eighty-one authors and co-authors, honors Mir Masoom Ali, George and Frances Ball Distinguished Professor of Statistics and Professor of Mathematical Sciences, on the occasion of his retirement from Ball State University. Many of these papers were presented at the Conference marking Dr. Ali's retirement, May 18-19, 2007, on the Ball State campus. This outpouring from colleagues is a fitting tribute to Dr. Ali's record of research, his dedication to his students, and his contributions to the University and the profession.

    MIR MASOOM ALI

    Mir Masoom Ali joined the Department of Mathematical Sciences at Ball State University in Muncie, Indiana, in 1969 after completing his doctoral work at the University of Toronto. He had obtained his B.Sc. (Honours) degree in 1956 and M.Sc. degree in 1957, both in Statistics, from the University of Dhaka. He came to the University of Toronto, Canada, on leave of absence from the Government of Pakistan in 1966 for graduate studies and obtained a second Master's degree in 1967 and Ph.D. degree in 1969, both in Mathematical Statistics. Prior to coming to the United States in 1969, he worked for a brief period in 1957 for the Socio-Economic Survey Board of the University of Dhaka and then served in various statistical positions with the Government of Pakistan.

    Professor Ali has been the director of the graduate program in statistics at Ball State University since 1971, the year in which he founded the program. His students have gone on to doctoral programs at many universities, including Bowling Green, Brown, Colorado State, Indiana-Bloomington, North Carolina-Chapel Hill, North Carolina State, Ohio State, Oregon State, Pittsburgh, Purdue, and Southern Methodist.

    In recognition of his excellence in teaching, research and professional service, Dr. Ali received the 1992-93 Outstanding Faculty Award and the 1985 Outstanding Researcher Award from Ball State University. In 1990, the Bangladesh Statistical Association awarded him the first Qazi Motahar Husain Gold Medal for his outstanding contributions in the field of statistics. He was also awarded Meritorious Service Awards in 1987, 1997, and 2002 by the Midwest Biopharmaceutical Statistics Workshop, which is co-sponsored by the American Statistical Association, for his role as a co-founder, program co-chair, and local arrangements chair.

    The thrust of Professor Ali's research has been in the areas of finite sampling, statistical inference, and order statistics. He has published extensively in leading statistical journals. Dr. Ali is an elected Fellow of the American Statistical Association, the Royal Statistical Society, the Institute of Statisticians, and the Bangladesh Academy of Sciences. He is also an elected member of the International Statistical Institute. He is the founding president of the North America Bangladesh Statistical Association. Professor Ali has served or is serving as an editor/associate editor of several international statistical journals published in Bangladesh, India, Pakistan, and South Korea. He has served on several committees of the American Statistical Association, including two terms as President of the Central Indiana Chapter. He has also held visiting appointments at a number of universities and statistical institutes in Canada, the United States, Bangladesh, India, Korea, and Japan.

    The Journal of Statistical Studies and the Pakistan Journal of Statistics published Special Volumes in Professor Ali's honor in 2002 and 2004, respectively. Currently, the International Journal of Statistical Sciences is in the process of publishing a Special Volume in Dr. Ali's honor on the occasion of his 70th birthday. On October 24, 2002, Indiana Governor Frank O'Bannon named Dr. Ali a Sagamore of the Wabash, the State of Indiana's highest award, in recognition of Dr. Ali's tremendous contributions to Ball State University, to the statistics profession and, especially, to higher education in the State of Indiana for over three decades. In 2005, Professor Ali was awarded the "Our Pride Award" by the Bangladeshi-American Foundation, Inc. in Washington, DC, for his distinguished achievements in the field of statistics. Later that same year he was awarded a Gold Medal by the Islamic Society of Statistical Sciences (ISOSS) for his outstanding contributions in the field of statistics and ISOSS affairs. With his retirement in June 2007, Dr. Ali will have completed fifty years of service as a statistician, including thirty-eight years at Ball State University.

    FROM THE ORGANIZERS

    This Conference was organized by Dr. Dale E. Umbach, Professor of Statistics and former Chairman of the Department of Mathematical Sciences at Ball State University, as a personal tribute to Mir Ali as friend, colleague, and collaborator. In the later stages of preparation, Dale was assisted by Drs. Ralph J. Bremigan and John W. Emert of the Department of Mathematical Sciences.

    We thank Mr. M. Mahbubul Majumder, a graduate student in statistics and research assistant in our Department, for his many hours of hard work in formatting the soft copies of the papers and putting them together in printable form for the Festschrift. Without his expertise and generous gift of time, it would have been impossible to meet the printing deadline. Our efforts relied on the constant assistance of our Department's outstanding Administrative Coordinator, Mrs. Susan Bourne. We deeply appreciate her cheerful and knowledgeable support in working through the logistics of hosting the Conference. We gratefully acknowledge financial support from the College of Sciences and Humanities of Ball State University, its Department of Mathematical Sciences, and the generous private donors to the Ball State University Foundation.

    Our profound gratitude and appreciation go to the many colleagues of Mir who spoke at the Conference or who contributed a manuscript to the Festschrift. Obviously, their work represents the scholarly substance of the Conference and Festschrift. It is our hope that those in attendance at the Conference received some measure of return on their generosity, through new and renewed friendships, stimulating discussion, and enjoyment of our Department's hospitality. To have worked through the years with such an inspiring and kind colleague as Mir Masoom Ali has been an honor, and to honor him through this Conference and Festschrift has been a joy.

    D.E.U.
    R.J.B.
    J.W.E.

CONTENTS

    Let's Get Together and Have a Conference ……… CHARLES B. SAMPSON 1-2
    Flexible Univariate and Multivariate Models Based on Hidden Truncation ……… BARRY C. ARNOLD 3-13
    Using Control Information to Design Type I Censored Treatment Versus Control Clinical Trials ……… P. L. GRAHAM, S. N. MACEACHERN AND D. A. WOLFE 14-22
    A Comparison of Graphical Methods for Assessing the Proportional Hazards Assumption in the Cox Model ……… INGER PERSSON, HARRY KHAMIS 23-43
    A Multistage Model for Analyzing Repeated Observations on Depression In Elderly ……… M. ATAHARUL ISLAM, RAFIQUL I. CHOWDHURY, SHAHARIAR HUDA 44-54
    The Stochastic Analysis of Minimum2x for Various Clumping Models ……… M. I. A. AGEEL 55-60
    Two Nonlinear Models for Time Series ……… DAVID A. DICKEY AND SANGIL HWANG 61-70
    On A- and D-Rotatability of Two-Dimensional Third-Order Designs ……… S. HUDA, L. BENKHEROUF AND F. ALQALLAF 71-77
    Weibull-Based Approaches to Survival Analysis: An Application to a Breast Cancer Data Set ……… ABDUS S. WAHED, THE MINH LUONG AND JONG-HYEON JEONG 78-95
    Optimum Designs for Estimation in Accelerated Life Testing Problems ……… MANISHA PAL AND NRIPES KUMAR MANDAL 96-102
    Some Variations of Blackwell Martingale Inequality ……… RASUL A. KHAN 103-105
    A Note on the Modified Box-Cox Transformation ……… MEZBAHUR RAHMAN AND LARRY M. PEARSON 106-115
    Inference About a Common Mean in One Way Random Effects Model ……… GUIDO KNAPP, PRANAB K. MITRA, BIMAL K. SINHA 116-140
    LM-Stationary Processes to Analyse Time Series Data With Linearly Compressing Periodic Behavior ……… MD. JOBAYER HOSSAIN, WAYNE A. WOODWARD AND HENRY L. GRAY 141-151
    A Permutation Test for the Stimuli Effect in the Spatio-Spectral Profile of Brain Signals ……… ZHEWEN FAN AND HERNANDO OMBAO 152-162
    Inference on Limiting Availability of a One-Unit Repairable System ……… FANG LI AND JYOTIRMOY SARKAR 163-174
    On The UMVUE of Reliability in Some Discrete Distributions ……… JUNGSOO WOO 175-178
    A Stochastic Representation of Matrix Variate Skew Normal Models ……… JOHN T. CHEN 179-184
    Prior Statistics Coursework and Student Expectations of a Graduate Statistics Class ……… HOLMES FINCH, MOLLY M. JAMESON 185-189
    Price Dispersion and Border Effect: A Survey Paper ……… JULIUS HORVATH, BALINT HERCZEG 190-196
    Identities Based on Probability Mass Functions ……… M. FORHAD HOSSAIN, ANWAR H. JOARDER 197-200
    Mahalanobis Moments of Bivariate Distributions ……… ANWAR H. JOARDER 201-206
    Minimax Estimation of the Parameter of the Rayleigh Distribution ……… M. KAMRUJ JAMAN BHUIYAN, MANINDRA KUMAR ROY, M. FAROUQ IMAM 207-212
    Detection of Outliers in Non-linear Time Series: A Review ……… MURUGASON KALLIANNA GOUNDER, MAHENDRAN SHITAN AND A.H.M. RAHMATULLAH IMON 213-224
    An Alternative Method for Selecting the Number of Components for Smooth Lack of Fit Tests ……… P.P.B. EGGERMONT AND V.N. LARICCIA 225-234
    On the Asymptotic Variance of the Estimator of Attributable Risk and Testing Association in a 2×2 Contingency Table ……… TANWEER J. SHAPLA 235-241
    Thoughts on Actuarial Science, Demography, Probability, Statistics and Stochastic Processes ……… JOHN A. BEEKMAN 242-255
    Estimating a Population Median with a Small Sample ……… BORIS SHULKIN AND SHLOMO SAWILOWSKY 256-267
    Sum of Hypergeometric Series Functions Using Probability Distributions ……… MUNIR AHMAD AND AYESHA ROOHI 268-273
    Fuzzy Set Representation of a Prior Distribution ……… GLEN MEEDEN 274-275
    The Impact of Different Math Prerequisites on the Performance of Business Statistics Students ……… JEFFREY J. GREEN, COURTENAY C. STONE, ABERA ZEGEYE AND THOMAS A. CHARLES 276-285
    Training Environmental Statisticians – Tomorrow's Problem Solvers ……… WILLIAM F. HUNT, JR., KIMBERLY WEEMS, WILLIAM SWALLOW, ANDREW MOORE, NAGAMBAL SHAH AND MONICA STEPHENS 286-295
    Joint Distribution of Some Random Variables in Crossing of Two-Stage Erlang Processes ……… MIR GHULAM HYDER TALPUR AND IFFAT ZAMEER 296-304
    Sampling Plans Excluding Certain Neighboring Units ……… KYOUNGAH SEE AND A. JOHN BAILER 305-312
    Difference Based Variance Estimators for Partially Linear Models ……… KARON KLIPPLE AND R. L. EUBANK 313-323
    Bootstrap Bias Reduction and Estimator Decomposition ……… KRISTOFER JENNINGS 324-327
    Bayesian Prediction for the Linear Model With Equi-correlated Responses: An Application of the Generalized Multivariate Modified Bessel Distribution ……… LEHANA THABANE AND B. M. GOLAM KIBRIA 328-335
    Middle Censoring for Circular Data ……… S. RAO JAMMALAMADAKA AND VASUDEVAN MANGALAM 336-339
    Highly Efficient Nonlinear Regression Based on the Wilcoxon Norm ……… ASHEBER ABEBE AND JOSEPH W. MCKEAN 340-357
    On Tail Probabilities of the t-Distribution for Samples from the Uniform Distribution ……… R. AYESHA ALI AND M. M. ALI 358-363


    Festschrift in honor of Distinguished Professor Mir Masoom Ali On the occasion of his retirement May 18-19, 2007, pages 1-2

    “Let’s get together and have a conference”

    (Some personal reflections on Professor Mir Ali and the story of the MBSW)

    Charles B. Sampson

During the spring of 1976, I was sitting in my office at Eli Lilly and Company, minding my own business, trying to get some work done, when a coworker came by and asked me to do a favor for him. He had committed Lilly to participate in some sort of statistics conference at Ball State. He told me he could not do it and asked if I would help out. Well, I said, I am not exactly in the business of proving theorems, and that is probably what was desired at such a conference. But I agreed to call Dr. Mir Ali, whom I had never met, to negotiate a topic which I could handle. I subsequently agreed to participate. Mir can sell, when he wants to, and he wanted to sell for his conference.

    It was a fine conference, with a balance among mathematical statistics topics, applied statistics, career possibilities in statistics and so forth. The conference was appropriately sized for interaction between all parties should they so choose, and indeed the interaction was very good. I was impressed. Mir Ali had done a wonderful job with his "statistics days" conference. I asked Mir if he wanted to do a "statistics conference" again with a different theme. Mir said we should talk about it.

    I told Mir that I had been watching the Princeton Conference, held sometimes at Princeton, and would like to have a similar, but smaller, conference in the Midwest emphasizing health statistics and the Pharmaceutical Industry. I said we had lots of Pharmaceutical Companies in the Midwest (which was true in 1976) and I thought there was a desire to have a forum to discuss problems relevant to the Pharmaceutical Industry and the FDA. We decided to seek the approval, participation, and financial backing of the leaders of the Midwest Pharmaceutical Companies' statistics groups. Those who attended the first meeting (7/14/1977) to discuss the feasibility of what is now known as the Midwest Biopharmaceutical Statistics Workshop (MBSW) were Wen-Dar Chang (Dow),

Ken Falter (Searle), Saul Gitomer (Marion), Bernie McDonagh (Riker), Tony Orlando (Mead Johnson), Lyman Ott (Merrill-National), Ron Platt (Miles), Alan Sampson (Abbott), Charles Sampson (Lilly), Roy Sanford (Baxter), John Schultz (Upjohn), Ron Schwartz (Arnar-Stone), Mir Ali (Ball State University) and Tom Spradlin (Lilly). Please note that Dow, Searle, Marion, Miles, Mead Johnson, Merrill-National, Upjohn, Arnar-Stone and 3-M Pharmaceuticals no longer exist, as they were merged or bought out by other companies. Mergers were a potential threat to the continued existence of the MBSW. The important home bases of a number of companies were to disappear -- from Chicago, Kalamazoo, Ann Arbor, Cincinnati, Evansville and Indianapolis.

    Those in attendance at the organization meeting of the first MBSW were unanimous in not wanting to replicate the format and style of the already popular national statistical meetings of the ASA. In addition to an emphasis on practical problems, it seemed that the format of the meeting could be varied so as to enhance interest and to promote participation by the younger members of the statistical profession. A planning meeting was then held in Chicago on August 15, 1977, and the program content for the first MBSW was determined. One additional meeting was held in October of 1977, and it appeared very likely that the first conference would be held in May of 1978. The statistical heads of the Midwest Pharmaceutical Companies agreed to hold the conference once, for a test, and each provided $500 for the kitty in case we bombed out completely and lost money. It was agreed that the MBSW was to include roundtable discussions, pedagogical lectures, workshops, poster sessions, analysis of data sets, panel discussions, methodology sessions, and discussion groups. The first meeting of the MBSW was held May 23-24, 1978 and was a great success.


The location of the Workshop was a heated discussion topic during its formative years. The urge for an urban setting was strong among some of the early charter members, but the hospitality of Ball State University and the Muncie community has served the workshop well for the last 30 years. Keeping the workshop in Muncie was made easier by low fees and lots of "bang for the buck". For example, the first registration fee was $40 and included two luncheons as well as a banquet at the Morris Bryant Restaurant which featured all the lobster one could eat. The motel rooms ranged from $13 to $25. Attendees came from as far away as Palo Alto and Europe. We joked that the operating definition of the "Midwest" was everywhere in the world but New Jersey, Manhattan, and perhaps Philadelphia. However, as one can observe with this year's workshop, most of our organizing committee is from the East. Mergers, mergers, and more mergers. Those of us associated with the MBSW over the years have witnessed the effect of numerous mergers in our registration documents.

    The 30th MBSW will be held this year (2007). Mir Ali should be proud of this contribution to the statistical community. Without those statistics days held in 1976 there would have been no MBSW, I am sure.

    I wish to offer some observations and comments regarding Professor Ali's professional life. Professor Ali is an outstanding teacher. You ask how would I know this? I have interviewed (for employment) over 150 students from Ball State University over the years. These students were computer scientists, mathematicians, statisticians, and information scientists. As part of my interview, and as a warmup exercise, I would ask these applicants to name the 2-3 best teachers they had had so far in their post-secondary education. Professor Ali was almost always mentioned, conditional upon the fact that the students had had him. I found this tremendously impressive. This helps validate yet another positive dimension in Professor Ali's career, that of being a very fine teacher.

    Now I would like to discuss Mir Ali as my friend. These are my observations only, after 30 years of friendship and working on the MBSW together and after having similar health (heart) experiences and so on. Over the years my wife and I have been to parties in his house, met his kids, and had him to our house. We enjoyed their wedding anniversary celebration in Indianapolis arranged by his children.

Mir Ali is a gentleman, he is tenacious, and I believe he can be a task master. But he is also kind and patient, at least with me. He is a wonderful friend and we have shared many thoughts about our personal lives, something I do not do easily. In early 2002, I went in for what I thought was going to be a routine heart exam, and 3 days later I was coming out of an anesthetic fog after open heart surgery; one of the first, maybe the first, persons I recognized was Mir Ali and his wife, Leena. Mir and Leena were standing there looking at me and I was beginning to wonder where I might be. Maybe I was in some sort of afterlife and Mir and Leena were already there. I asked my wife if I was in heaven or in Muncie. She said I was in St. Vincent's Hospital in Indianapolis. I really appreciated the attention he gave me, and the counsel he offered, having had his experiences as a heart patient.

    Mir Ali was, and still is, a "buttoned down" gentleman. I worked with him on the conference for 25 years before I saw him without a tie. I once asked him if he slept in a suit, and he looked at me with that incredulous but controlled, concerned look, not knowing if he should reply to such a statement.

    Here is one experience I will never forget. Some years ago, Mir and I were walking through some local arrangement issues at Ball State, talking intently. We were heading out of the Student Union on the west side and going down the stairs when Mir stopped cold to stare at a handsome young boy coming up the stairs into the union. Yes, it was his youngest, a son, Ishti, who was in school a short distance from the Union. Apparently this was not the appropriate time for the son to be coming to the union and he was perhaps playing hooky. The boy looked up and froze. No words were exchanged but there was a great deal of intense eyeball-to-eyeball communication. The boy turned around and headed back to school. He must have gotten back to school, since Ishti is now a successful cardiologist. Everything worked out fine, it seems. What I learned from that experience was to try my hardest never to displease Mir.

    I am very pleased to have known Mir Masoom Ali for all these years. My wife and I value Mir's and Leena's friendship greatly and we wish Mir's retirement to be exciting and fulfilling.

Festschrift in honor of Distinguished Professor Mir Masoom Ali On the occasion of his retirement May 18-19, 2007, pages 3-13

Flexible Univariate and Multivariate Models Based on Hidden Truncation

    Barry C. Arnold

    Department of Statistics, University of California, Riverside, USA.

    Abstract

A broad spectrum of flexible univariate and multivariate models can be constructed by using a hidden truncation paradigm. Such models can be viewed as being characterized by a basic marginal density, a family of conditional densities and a specified hidden truncation point, or points. The resulting class of distributions includes the basic marginal density as a special case (or as a limiting case), but also includes an array of models that may unexpectedly include many well known densities. Most of the well known skew normal models (developed from the seed distribution popularized by Azzalini (1985)) can be viewed as being products of such a hidden truncation construction. However, the many hidden truncation models with non-normal component densities undoubtedly deserve further attention.

Key Words: Skew-normal distribution, conditional specification, weighted distribution, multivariate normal, normal conditionals, exponential conditionals, Pareto distribution.

    1 Introduction

The skew-normal model, popularized and studied by Azzalini (1985) and his coworkers, is a one-parameter family of densities of the form

    $$f(x;\lambda)=2\phi(x)\Phi(\lambda x),\qquad -\infty<x<\infty. \qquad (1)$$

Two-parameter and location-scale extensions of (1) include

    $$f(x;\lambda_0,\lambda_1)=\frac{\phi(x)\,\Phi(\lambda_0+\lambda_1 x)}{\Phi\!\left(\lambda_0/\sqrt{1+\lambda_1^2}\right)} \qquad (3)$$

    and

    $$f(x;\mu,\sigma,\lambda_0,\lambda_1)=\frac{\frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)\Phi\!\left(\lambda_0+\lambda_1\frac{x-\mu}{\sigma}\right)}{\Phi\!\left(\lambda_0/\sqrt{1+\lambda_1^2}\right)}. \qquad (4)$$

Many extensions of these models have been proposed. For example, we could select an arbitrary density ψ1(x) to play the role of φ(x) in (3) and an equally arbitrary distribution function Ψ2(x) to play the role of Φ(x). The resulting family of models, before introducing location and scale parameters, is of the form

    $$f(x;\lambda_0,\lambda_1)\propto\psi_1(x)\,\Psi_2(\lambda_0+\lambda_1 x). \qquad (5)$$

    Computation of the required normalizing constant in (5) may be troublesome, and indeed it will frequently be necessary to determine the constant by numerical integration.

There are multivariate extensions of the model (3) which may be viewed as having begun with a (k+m)-dimensional random vector (X, Y) (here X is of dimension k and Y of dimension m), with X observed only if Y > y0, where y0 is a pre-specified vector in ℝ^m. Most skew models of this genre begin by assuming a classical (k+m)-dimensional normal distribution for (X, Y). An extensive survey of such models may be found in Azzalini (2006). In the present paper we will focus on general hidden truncation models (which of course include skew-normal models), beginning with a completely general distribution for (X, Y) (or for (X, Y) in higher dimensional settings). This general hidden truncation paradigm will be shown to yield a remarkably rich vein of models which may profitably be used to fit univariate and multivariate data sets. Naturally, it would be desirable to identify a stochastic mechanism involving hidden truncation which can plausibly be argued to have played a role in generating the data set that is fitted by such a model. However, absent such identification, the hidden truncation model, provided that it fits well, may still be useful for prediction purposes.

Returning to the simple hidden truncation model in which we observe X only if Y > y0, it is evident that the observed X's will have a (conditional) distribution of the form

    $$F_{X|Y>y_0}(x_0)=P(X\le x_0\mid Y>y_0)=\frac{\int_{-\infty}^{x_0}\int_{y_0}^{\infty}f_{X,Y}(x,y)\,dy\,dx}{\int_{y_0}^{\infty}f_Y(y)\,dy}. \qquad (6)$$

Assuming the existence of densities (as will be done throughout most of this paper) we can write the corresponding conditional density as

    $$f_{X|Y>y_0}(x)=\frac{\int_{y_0}^{\infty}f_{X|Y}(x|y)\,f_Y(y)\,dy}{\bar F_Y(y_0)}. \qquad (7)$$

In this formulation, the marginal density of Y and the conditional density of X given Y determine the resulting hidden truncation model. In a sense, the model is parameterized by y0 ∈ ℝ and f_Y(y) (or, more generally, a parametric family of densities f_Y(y; θ)) and by f_{X|Y}(x|y) (or, more generally, by a parametric family of densities f_{X|Y}(x|y; τ)). Clearly this represents an enormously flexible family of models. For example, we could take f_Y(y) to be a normal density and take f_{X|Y}(x|y) to be normal with linear regression and constant conditional variance. Inexorably we are led to the skew-normal model (4). But we could get a richer family by allowing f_{X|Y}(x|y) to have a more general regression function and perhaps a non-constant conditional variance function. This approach merits further investigation. However, it is not the approach that will be followed in the rest of the present paper.

A joint density for (X, Y) can be written as the product f_Y(y) f_{X|Y}(x|y), but equally well it can be written as f_X(x) f_{Y|X}(y|x). Using this expression for the joint density, it is readily verified that

    $$f_{X|Y>y_0}(x)=f_X(x)\,\frac{P(Y>y_0\mid X=x)}{P(Y>y_0)}. \qquad (8)$$

    In this formulation, the skewed distribution obtained by hidden truncation is clearly shown to be a weighted version of the original density for X. The weight function, P(Y > y0 | X = x), depends on y0 and on the conditional density of Y given X. The representation of the hidden truncation density in the form (8) may be found in Arellano-Valle et al. (2002) (their equation (5.1)) in the case in which y0 = 0 but, as they remark, it is likely that it had appeared elsewhere at some time previous to 2002.
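The weighted-density identity (8) is easy to verify numerically. The following Python sketch (ours, not the paper's; the correlation ρ = 0.7 and threshold y0 = 0.5 are arbitrary choices) simulates hidden truncation for a bivariate normal pair and compares the retained sample with the right-hand side of (8):

```python
import numpy as np
from scipy.stats import norm

# Illustrative check of (8), not from the paper: bivariate normal (X, Y)
# with correlation rho; X is observed only when Y > y0.
rng = np.random.default_rng(0)
rho, y0, n = 0.7, 0.5, 1_000_000

x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)
x_kept = x[y > y0]                    # hidden truncation on the latent Y

# Right-hand side of (8): f_X(x) * P(Y > y0 | X = x) / P(Y > y0),
# using Y | X = x ~ N(rho * x, 1 - rho**2).
grid = np.linspace(-4, 4, 9)
weight = norm.sf((y0 - rho * grid) / np.sqrt(1 - rho**2))
density = norm.pdf(grid) * weight / norm.sf(y0)

# Histogram estimate of the truncated sample should match the formula.
hist, edges = np.histogram(x_kept, bins=200, range=(-4, 4), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(np.interp(grid, centers, hist) - density)))  # ~ Monte Carlo error
```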

In subsequent sections, we will investigate hidden truncation models of the form (8) (truncation from below) as well as other truncation paradigms. In all cases, the basic components of the models will be a given density for X (or X in higher dimensions) and a given conditional density for Y given X (or for Y given X in higher dimensions).

    2 Basic hidden truncation models

Begin with a two dimensional absolutely continuous random vector (X, Y). We might focus on the conditional distribution of X given Y ∈ C, where C is a Borel set in ℝ. Indeed we could write

    $$f_{X|Y\in C}(x)=f_X(x)\,\frac{P(Y\in C\mid X=x)}{P(Y\in C)} \qquad (9)$$

    (see Arellano-Valle, Branco and Genton (2006), where such general models are introduced). However, we will concentrate on hidden truncation of one of three forms only:

    1. Lower truncation, where C = (y0,∞).

    2. Upper truncation, where C = (−∞, y0].

3. Two sided truncation, where C = (a, b].

For upper truncation at y0, in which observations are only available for X's whose concomitant variable Y is less than y0, equation (9) becomes

    $$f_{y_0-}(x)=f_X(x)\,\frac{P(Y\le y_0\mid X=x)}{P(Y\le y_0)}. \qquad (10)$$

    Models of this type are thus characterized by

    1. fX(x), the density assumed for X.

2. The conditional density of Y given X, f_{Y|X}(y|x).

    3. The specific truncation point, y0.

Note that the distribution function corresponding to (10) is of the form

    $$F_{y_0-}(x)=P(X\le x\mid Y\le y_0). \qquad (11)$$

    Consequently, a convenient way to generate models of this type is to begin with a joint distribution for (X, Y) for which P(X ≤ x | Y ≤ y) is available in a simple form (discussion of such bivariate distributions may be found in Arnold, Castillo and Sarabia (1999) and Arnold (1995)).

Models involving lower truncation will be of the form

    $$f_{y_0+}(x)=f_X(x)\,\frac{P(Y>y_0\mid X=x)}{P(Y>y_0)} \qquad (12)$$

with corresponding survival function

    $$\bar F_{y_0+}(x)=P(X>x\mid Y>y_0). \qquad (13)$$

    Technically, models of the form (12) could be viewed as equivalent to those given by (10). One merely needs to replace the concomitant variable Y by −Y (or, for non-negative variables, by 1/Y) to go from one to the other. In practice, such a transformation may not seem natural, and the concepts of upper and lower truncation are best dealt with separately.

Two sided truncation models are of the form

    $$f_{a,b}(x)=f_X(x)\,\frac{P(a<Y\le b\mid X=x)}{P(a<Y\le b)}. \qquad (14)$$

    Such models are determined by the choice of the basic marginal density f_X(x), the choice of conditional density f_{Y|X}(y|x) and the truncation points, a and b.

It will be observed that the upper and lower truncation models can be obtained as limiting cases of two sided truncation models, so in a sense we need only deal with two sided truncation models. Typically the one sided models are simpler in structure and they sometimes can be obtained directly more easily, without first considering a two sided model. Note that, in order for any of these models to assume a tractable form, it is necessary that the conditional distribution of Y given X have an analytic expression for its distribution function, or at least that the conditional distribution can be evaluated by reference to available tables.

When using the formulations (10), (12) and (14) to construct flexible families of densities it will, as remarked earlier, typically be the case that the density of X is assumed to be a member of some parametric family of densities f_X(x; θ) and that the conditional density of Y given X is a member of another parametric family of densities f_{Y|X}(y|x; η). We will consider some examples in which the family of marginal densities for X and the family of conditional densities for Y given X are of the same form (e.g., they might both be normal), but we can gain flexibility by mixing and matching (e.g., one family might be Weibull and the other gamma).

Before embarking on an investigation of some of the many parametric families of models that can be generated by such hidden truncation constructions, it is appropriate to remark that, beginning with a given choice of density function for X, say f0(x), it is possible to generate, via hidden truncation, an extremely broad class of densities by judicious choice of the conditional density of Y given X. Just about any density with the same support as f0(x) and lighter tails than f0(x) can be generated in this fashion. For example, suppose that we wish to generate the density f1(x) by applying hidden truncation to f0(x). In order to achieve this in a simple fashion, we need to assume that there exists c > 0 such that f1(x)/f0(x) ≤ c for all x. If such a c exists, we can choose a family of conditional densities of Y given X in such a fashion that

    $$P(Y\le 0\mid X=x)=\frac{1}{c}\,\frac{f_1(x)}{f_0(x)}. \qquad (15)$$

    With this choice of conditional distributions of Y given X and by setting y0 = 0, we may verify that hidden truncation above at 0, applied to f0(x), will yield, via equation (10), the desired density f1(x).
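As a quick worked instance of this construction (our example, not from the paper): take f0 = φ and, as target, the Azzalini density f1(x) = 2φ(x)Φ(λx) from (1). Then

    $$\frac{f_1(x)}{f_0(x)}=2\Phi(\lambda x)\le 2=c,\qquad P(Y\le 0\mid X=x)=\frac{1}{c}\,\frac{f_1(x)}{f_0(x)}=\Phi(\lambda x),$$

    and the required conditional law is achieved by taking, say, Y | X = x ∼ N(−λx, 1), for which P(Y ≤ 0 | X = x) = Φ(λx). Upper hidden truncation of Y at 0, applied to φ, then recovers (1) exactly via (10).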

    3 Hidden truncation using normal component densities

We begin by considering hidden truncation applied to classical bivariate normal data. In this case two sided hidden truncation will be considered (from which results for upper and lower truncation can be readily derived). Thus we begin with X ∼ N(µ, σ²) and we will assume the linear regression and constant conditional variance that are associated with the classical bivariate normal distribution for (X, Y). Thus we assume that Y | X = x ∼ N(α + βx, τ²). Referring to (14), for hidden truncation points a and b, we have

    $$f_{a,b}(x)=f_X(x)\,\frac{P(a<Y\le b\mid X=x)}{P(a<Y\le b)}\propto\frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)\left[\Phi\!\left(\frac{b-\alpha-\beta x}{\tau}\right)-\Phi\!\left(\frac{a-\alpha-\beta x}{\tau}\right)\right]. \qquad (16)$$

In this expression, µ, α, β ∈ ℝ, σ, τ ∈ ℝ⁺ and a < b. It is convenient to introduce new parameters δ1, δ2 and λ1, where −∞ < δ1 < δ2 < ∞ and λ1 ∈ ℝ, allowing us to rewrite the model (16) as

$$f_{a,b}(x)=\frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)\frac{\Phi\!\left(\delta_2+\lambda_1\frac{x-\mu}{\sigma}\right)-\Phi\!\left(\delta_1+\lambda_1\frac{x-\mu}{\sigma}\right)}{\Phi\!\left(\frac{\delta_2}{\sqrt{1+\lambda_1^2}}\right)-\Phi\!\left(\frac{\delta_1}{\sqrt{1+\lambda_1^2}}\right)}. \qquad (17)$$

This is the model involving two sided hidden truncation that was discussed, for example, in Arnold, Beaver et al. (1993). If we consider upper truncation (letting δ1 → −∞ in (17)) we obtain the Henze-Arnold-Beaver skew-normal model (4) (where δ2 is replaced by λ0).

Instead of using a conditional distribution for Y given X that is normal with a linear regression function and a constant conditional variance function, we could consider a normal distribution with more general regression and conditional variance functions. Thus, if we assume that Y | X = x ∼ N(µ(x), τ²(x)), our two sided hidden truncation model becomes

    $$f_{a,b}(x)\propto\frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)\left[\Phi\!\left(\frac{b-\mu(x)}{\tau(x)}\right)-\Phi\!\left(\frac{a-\mu(x)}{\tau(x)}\right)\right]. \qquad (18)$$

The model (18) includes (as limiting cases) densities of the form

    $$f(x;\lambda)\propto\phi(x)\,\Phi\!\left(\frac{\lambda_{00}+\lambda_{10}x}{\sqrt{1+(\lambda_{01}+\lambda_{11}x)^2}}\right). \qquad (19)$$

    Such densities have been studied earlier in the literature. They are identifiable as marginal densities of the following class of bivariate distributions with conditionals in the skew normal family (4):

    $$f(x,y;\lambda)\propto\phi(x)\,\phi(y)\,\Phi(\lambda_{00}+\lambda_{10}x+\lambda_{01}y+\lambda_{11}xy) \qquad (20)$$

    (see, for example, Arnold, Castillo and Sarabia (2002)). Such models can also be obtained as mixtures of univariate skew-normal densities (see, for example, Arellano-Valle et al. (2004)).

In fact, model (18) is, in a sense, completely general. Any weighted version of the normal (µ, σ²) density can be represented in the form (18). Suppose that we wish to have

    $$f_{a,b}(x)\propto w(x)\,\frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right) \qquad (21)$$

    for some specified weight function w(x). We can choose a = −∞, b = 0 and τ(x) = 1. The choice of µ(x) which will then enable us to identify (21) as a special case of (18) will be such that Φ(−µ(x)) = w(x), i.e., we should choose µ(x) = −Φ⁻¹(w(x)).
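A minimal numerical sketch of this weight representation (our illustration; the logistic weight w is an arbitrary choice satisfying 0 < w(x) < 1, and we take µ = 0, σ = 1):

```python
import numpy as np
from scipy.stats import norm

# Our example, not from the paper: represent w(x) * phi(x) in the form (18)
# with a = -inf, b = 0, tau(x) = 1, by choosing mu(x) = -Phi^{-1}(w(x)).
w = lambda x: 1.0 / (1.0 + np.exp(-x))        # any weight with 0 < w(x) < 1
mu = lambda x: -norm.ppf(w(x))

lhs = lambda x: norm.pdf(x) * norm.cdf(0.0 - mu(x))   # form (18), up to normalization
rhs = lambda x: w(x) * norm.pdf(x)
xs = np.linspace(-5, 5, 11)
print(np.max(np.abs(lhs(xs) - rhs(xs))))      # ~0: the two expressions coincide
```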

4 Hidden truncation applied to normal conditionals distributions

Following early work by Bhattacharyya (1943), Arnold, Castillo and Sarabia (1999) provided detailed discussion of the class of bivariate densities, f_{X,Y}(x,y), which have all of their conditional densities (of X given Y and of Y given X) of the normal form. Such bivariate densities are necessarily of the form

    $$f_{X,Y}(x,y)=\exp\left\{-(1,\;x,\;x^2)\begin{pmatrix}m_{00}&m_{01}&m_{02}\\ m_{10}&m_{11}&m_{12}\\ m_{20}&m_{21}&m_{22}\end{pmatrix}\begin{pmatrix}1\\ y\\ y^2\end{pmatrix}\right\}, \qquad (22)$$

    where the m_{ij}'s satisfy certain constraints to ensure integrability. For our hidden truncation constructions, we need expressions for the corresponding marginal f_X(x) and for the conditional densities f_{Y|X}(y|x). It is not difficult to verify that, if (X, Y) has density (22), then

    $$f_X(x)\propto\frac{\exp\left\{-\frac{1}{2}\left(2(m_{20}x^2+m_{10}x+m_{00})-\frac{(m_{21}x^2+m_{11}x+m_{01})^2}{2(m_{22}x^2+m_{12}x+m_{02})}\right)\right\}}{\sqrt{2(m_{22}x^2+m_{12}x+m_{02})}} \qquad (23)$$

while Y | X = x ∼ N(µ(x), σ²(x)), in which

    $$\mu(x)=-\frac{m_{21}x^2+m_{11}x+m_{01}}{2(m_{22}x^2+m_{12}x+m_{02})} \qquad (24)$$

    and

    $$\sigma^2(x)=\frac{1}{2(m_{22}x^2+m_{12}x+m_{02})}. \qquad (25)$$

The corresponding two sided hidden truncation model will be

    $$f_{a,b}(x)\propto f_X(x)\left[\Phi\!\left(\frac{b-\mu(x)}{\sigma(x)}\right)-\Phi\!\left(\frac{a-\mu(x)}{\sigma(x)}\right)\right], \qquad (26)$$

    where f_X(x), µ(x) and σ(x) are as defined in (23), (24) and (25) respectively.

    The centered normal conditionals model is considerably simpler. For it, we set m01 = m10 = m11 = m12 = m21 = 0 in (22). This leaves us with a 3 parameter bivariate density for which

$$f_X(x)\propto\frac{e^{-m_{20}x^2}}{\sqrt{2(m_{22}x^2+m_{02})}} \qquad (27)$$

    and

    $$Y\mid X=x\sim N\!\left(0,\;\frac{1}{2(m_{22}x^2+m_{02})}\right), \qquad (28)$$

    so that

    $$f_{a,b}(x)\propto\frac{e^{-m_{20}x^2}}{\sqrt{2(m_{22}x^2+m_{02})}}\left[\Phi\!\left(b\sqrt{2(m_{22}x^2+m_{02})}\right)-\Phi\!\left(a\sqrt{2(m_{22}x^2+m_{02})}\right)\right]. \qquad (29)$$

    5 Hidden truncation with exponential component densities

Suppose now that X has an exponential distribution, i.e.,

    $$P(X>x)=e^{-\alpha x},\quad x>0. \qquad (30)$$

    Now assume that, for each x > 0, the conditional density of Y given X = x is also an exponential density with a constant failure rate which depends linearly on x. Thus

    $$P(Y>y\mid X=x)=e^{-(\beta+\gamma x)y},\quad y>0. \qquad (31)$$

    The resulting joint density is of the form

    $$f(x,y)=\alpha(\beta+\gamma x)\exp(-[\alpha x+\beta y+\gamma xy]),\quad x>0,\ y>0. \qquad (32)$$

    The corresponding two sided hidden truncation model will then be

    $$f_{a,b}(x)\propto\alpha e^{-\alpha x}\left[e^{-(\beta+\gamma x)a}-e^{-(\beta+\gamma x)b}\right],\quad x>0, \qquad (33)$$

    a linear combination of two exponential densities. The lower hidden truncation model is obtained from (33) by setting b = ∞ and a = y0; in this manner we find

$$f_{y_0+}(x)=(\alpha+\gamma y_0)\,e^{-(\alpha+\gamma y_0)x},\quad x>0, \qquad (34)$$

    i.e., again an exponential density. Thus, in this situation, lower hidden truncation does not lead to an enrichment of the class of distributions for X. A similar phenomenon is observable if we begin with (X, Y) having an exponential conditionals distribution (see Arnold and Strauss (1988)). The corresponding joint density is of the form

$$f(x,y)\propto\exp\{-(\alpha x+\beta y+\gamma xy)\},\quad x>0,\ y>0. \qquad (35)$$

    In this case the marginal density for X is

    $$f_X(x)\propto(\beta+\gamma x)^{-1}e^{-\alpha x},\quad x>0, \qquad (36)$$

    and the conditional survival function of Y given X = x is of the form

    $$P(Y>y\mid X=x)=e^{-(\beta+\gamma x)y},\quad x>0 \qquad (37)$$

    (the same as (31)). The corresponding lower hidden truncation model will be

    $$f_{y_0+}(x)\propto(\beta+\gamma x)^{-1}e^{-(\alpha+\gamma y_0)x},\quad x>0. \qquad (38)$$

    Observe that (38) is obtainable from (36) by a simple change of one of the parameters, and the family of lower hidden truncation models coincides with the original family of densities for X.

It becomes evident that the use of lower hidden truncation with a conditional distribution given by (31) will be ineffective in enriching the class of densities assumed for X whenever f_X(x) includes a factor of the form e^{−g(θ)x}, where g(θ) > 0. Thus, for example, if we begin with an assumption that X ∼ N(µ, σ²) where µ < 0 and assume that Y | X = x has a distribution satisfying (31) (i.e., an exponential conditional distribution with a constant failure rate that is a linear function of x), then the resulting lower hidden truncation models will again be normal with negative means.

If we allow the failure rate for the conditional distribution of Y given X = x to depend on x in a non-linear fashion, we can, of course, get new densities by using the lower hidden truncation paradigm, as the following examples show.

    Let us begin with X ∼ exp(α), i.e., P(X > x) = e^{−αx}. Now assume that P(Y > y | X = x) = e^{−γ(x)y} for some positive function γ(x) defined on ℝ⁺. It follows that

    $$f_{y_0+}(x)\propto e^{-(\alpha x+\gamma(x)y_0)},\quad x>0. \qquad (39)$$

    As a special case, consider γ(x) = γx² for some γ > 0. In this case, we find that

    $$f_{y_0+}(x)\propto e^{-(\alpha x+\gamma y_0x^2)},\quad x>0. \qquad (40)$$

It is perhaps surprising that a truncated normal density such as (40) can arise via hidden truncation applied to a model with an exponential marginal distribution for X and exponential conditionals for Y given X.

However, it is true that a very broad class of densities can be obtained by hidden truncation, even when we restrict attention to pre-truncated models involving an exponential marginal and exponential conditionals. Suppose that we wish to obtain a specific target density g(x)I(x > 0) in this manner. To do this, we must select y0, m, α and γ(x) such that, for x > 0,

    $$g(x)=\exp(m-\alpha x-y_0\gamma(x)). \qquad (41)$$

    So γ(x) must satisfy

    $$\gamma(x)=\frac{m-\alpha x-\log g(x)}{y_0}. \qquad (42)$$

    For certain choices of g(x), the corresponding function γ(x) given by (42) will not be always positive, so that some mild conditions must be imposed on the form of g(x) in order for it to be obtainable via hidden truncation using exponential model components. Nevertheless, an extremely broad class of densities g(x) on ℝ⁺ can be so constructed.
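A small simulation (ours, with arbitrary choices of α, γ and y0) makes the point of (39)-(40) concrete: an exponential marginal combined with the conditional failure rate γ(x) = γx², truncated from below on the latent Y, produces a sample matching the truncated-normal-type density in (40):

```python
import numpy as np
from scipy.integrate import quad

# Illustration of (39)-(40), not from the paper: X ~ exp(a), and
# Y | X = x ~ exp(rate g * x**2); observe X only when Y > y0.
rng = np.random.default_rng(1)
a, g, y0, n = 1.0, 0.5, 2.0, 2_000_000

x = rng.exponential(1.0 / a, size=n)
y = rng.exponential(1.0 / (g * x**2))       # scale = 1 / rate, elementwise
x_kept = x[y > y0]                          # lower hidden truncation

# Target density (40), normalized by quadrature.
target = lambda t: np.exp(-(a * t + g * y0 * t**2))
const, _ = quad(target, 0, np.inf)

hist, edges = np.histogram(x_kept, bins=100, range=(0, 4), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - target(centers) / const)))  # small Monte Carlo error
```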

6 Hidden truncation with Pareto component densities

A bivariate Pareto conditionals density is of the form (Arnold, Castillo and Sarabia (1999))

    $$f(x,y)=(\alpha+\beta x+\gamma y+\delta xy)^{-(\tau+1)}, \qquad (43)$$

    where α, β, γ, δ, τ > 0. The corresponding marginal and conditional densities are

    $$f_X(x)\propto[(\alpha+\beta x)^{\tau}(\gamma+\delta x)]^{-1},\quad x>0, \qquad (44)$$

    and, for x > 0,

    $$f_{Y|X}(y|x)\propto\left[1+\frac{\gamma+\delta x}{\alpha+\beta x}\,y\right]^{-(\tau+1)},\quad y>0 \qquad (45)$$

    (i.e., Y | X = x ∼ Pareto((γ+δx)/(α+βx), τ)). The two sided hidden truncation distribution derived from this joint density will then be given by

$$\begin{aligned} f_{a,b}(x)&\propto f_X(x)\,P(a<Y\le b\mid X=x)\\ &\propto[(\alpha+\beta x)^{\tau}(\gamma+\delta x)]^{-1}\left\{\left[1+\frac{\gamma+\delta x}{\alpha+\beta x}\,a\right]^{-\tau}-\left[1+\frac{\gamma+\delta x}{\alpha+\beta x}\,b\right]^{-\tau}\right\}\\ &\propto\frac{1}{\gamma+\delta x}\left\{\frac{1}{(\alpha+\beta x+\gamma a+\delta ax)^{\tau}}-\frac{1}{(\alpha+\beta x+\gamma b+\delta bx)^{\tau}}\right\} \end{aligned} \qquad (46)$$

for x > 0. Recalling that α, β, γ, δ > 0 and 0 ≤ a < b, we may write this as

    $$f_{a,b}(x)\propto\frac{1}{\gamma+\delta x}\left[\frac{1}{(\alpha_1+\beta_1x)^{\tau}}-\frac{1}{(\alpha_2+\beta_2x)^{\tau}}\right],\quad x>0, \qquad (47)$$

    a linear combination of two densities of the same form as the original marginal density of X (as in (44)).

To obtain f_{y_0-}(x) we just set a = 0 and b = y0 in (46), and we obtain a density, also of the form (47), with α1 = α and β1 = β.

The lower hidden truncation model is simpler. To get f_{y_0+}(x) we must set a = y0 and b = ∞ in (46) to obtain

    $$f_{y_0+}(x)\propto\frac{1}{\gamma+\delta x}\,\frac{1}{(\alpha+\beta x+\gamma y_0+\delta y_0x)^{\tau}},\quad x>0, \qquad (48)$$

    or equivalently

    $$f_{y_0+}(x)\propto\frac{1}{\gamma+\delta x}\,\frac{1}{(\alpha_2+\beta_2x)^{\tau}},\quad x>0. \qquad (49)$$

So in this case we again observe the phenomenon in which lower hidden truncation fails to augment the class of models already assumed for X.

If we begin with a Pareto distribution for X and Pareto conditional distributions for Y given X, as in (45), then lower hidden truncation will lead to an enriched family of densities. We will have

    $$f_X(x)\propto(\alpha+\beta x)^{-(\eta+\tau+1)},\quad x>0, \qquad (50)$$

    and

    $$P(Y>y_0\mid X=x)=\left[1+\frac{\gamma+\delta x}{\alpha+\beta x}\,y_0\right]^{-\tau}, \qquad (51)$$

    so that, after reparameterization, we have

    $$f_{y_0+}(x)\propto(\alpha+\beta x)^{-(\eta+1)}(\alpha'+\beta'x)^{-\tau}, \qquad (52)$$

    where α ≤ α′ and β ≤ β′.

7 Multivariate cases

In the development thus far, both X and Y have been scalar variables. Of course, analogous arguments can be advanced when the variables are of higher dimensions. Thus one may consider a (k+m)-dimensional random vector (X, Y) where X is of dimension k and Y is of dimension m. We will consider the distribution of X subject to hidden truncation on Y of the form Y ≤ y0. We can write this conditional density (in a form analogous to (10)) as

    $$f_{y_0-}(x)=f_X(x)\,\frac{P(Y\le y_0\mid X=x)}{P(Y\le y_0)}. \qquad (53)$$

    Hidden truncation models of this type will be determined by the marginal density f_X(x), the conditional density of Y given X, f_{Y|X}(y|x), and the truncation point y0.

In this section, we will restrict attention to an illustrative case in which the component densities (f_X and f_{Y|X}) are multivariate normal although, of course, the ideas discussed can be extended readily to deal with other examples, perhaps with component densities of different (non-normal) types.

We consider such hidden truncation in a setting in which (X, Y) has a classical (k+m)-dimensional normal distribution. Thus we begin with

    $$\begin{pmatrix}X\\ Y\end{pmatrix}\sim N^{(k+m)}\!\left(\begin{pmatrix}\mu\\ \nu\end{pmatrix},\;\begin{pmatrix}\Sigma_{11}&\Sigma_{12}\\ \Sigma_{21}&\Sigma_{22}\end{pmatrix}\right). \qquad (54)$$

In this case X ∼ N^{(k)}(µ, Σ11) and the conditional distribution of Y given X = x is of the form

    $$Y\mid X=x\sim N^{(m)}\!\left(\nu+\Sigma_{21}\Sigma_{11}^{-1}(x-\mu),\;\Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\right). \qquad (55)$$

    We will introduce notation as follows:

    $$\phi^{(k)}(x)=\prod_{i=1}^{k}\phi(x_i) \qquad (56)$$

    and

    $$\Phi^{(m)}(y;\delta,\Lambda)=P(Y\le y), \qquad (57)$$

    where Y ∼ N^{(m)}(δ, Λ). With this notation, referring to (53), we will have

$$f_{y_0-}(x)=|\Sigma_{11}|^{-1/2}\,\phi^{(k)}\!\left(\Sigma_{11}^{-1/2}(x-\mu)\right)\frac{\Phi^{(m)}\!\left(y_0-\nu-\Sigma_{21}\Sigma_{11}^{-1}(x-\mu);\;0,\;\Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\right)}{\Phi^{(m)}(y_0-\nu;\;0,\;\Sigma_{22})}. \qquad (58)$$

At this point it is convenient to make a change of variables, defining Z = Σ11^{−1/2}(X − µ) so that Z ∼ N^{(k)}(0, I). If X has density (58), then Z will have a density of the following form

    $$f_{y_0-}(z)=\phi^{(k)}(z)\,\frac{\Phi^{(m)}(\lambda_0+\Lambda z;\;0,\Delta)}{\Phi^{(m)}(\lambda_0;\;0,\Delta+\Lambda\Lambda^{T})} \qquad (59)$$

    for suitably defined λ0, Δ and Λ (which will depend on the choice of y0). The model (59) is known in the literature under a variety of names, with variations in the labeling of the parameters. For example, Gonzalez-Farias et al. (2004) call it the closed skew normal family, while Arellano-Valle and Genton (2005) refer to it as the fundamental skew-normal distribution. See Azzalini (2005) for further discussion of these and other aliases.

It is not difficult to deal with analogous lower and two sided hidden truncation models. It is convenient to use the notation $\bar\Phi^{(m)}(y;\delta,\Lambda)$ to denote P(Y > y) where Y ∼ N^{(m)}(δ, Λ). The resulting models obtained by hidden truncation on Y applied to the density of Z are:

$$f_{y_0+}(z)=\phi^{(k)}(z)\,\frac{\bar\Phi^{(m)}(\lambda_0+\Lambda z;\;0,\Delta)}{\bar\Phi^{(m)}(\lambda_0;\;0,\Delta+\Lambda\Lambda^{T})} \qquad (60)$$

    and

    $$f_{a,b}(z)=\phi^{(k)}(z)\,\frac{\Phi^{(m)}(\delta_2+\Lambda z;\;0,\Delta)-\Phi^{(m)}(\delta_1+\Lambda z;\;0,\Delta)}{\Phi^{(m)}(\delta_2;\;0,\Delta+\Lambda\Lambda^{T})-\Phi^{(m)}(\delta_1;\;0,\Delta+\Lambda\Lambda^{T})}. \qquad (61)$$

There is a considerable literature devoted to the discussion of the distribution of X = µ + Σ11^{1/2}Z where Z has a hidden truncation density of the form (59). Of course, (60) can be viewed as a special case of (59) in which Y has been replaced by −Y and y0 by −y0. Densities of the form (61) have received less attention, even though such two sided hidden truncation can be expected to be encountered in many real world data configurations.
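Sampling from (59) is straightforward via the hidden truncation mechanism itself. The Python sketch below is our construction, not the paper's: Λ, Δ and λ0 are arbitrary choices, and we use the Δ + ΛΛᵀ normalization shown above. Z ∼ N_k(0, I) is retained exactly when a latent U ∼ N_m(0, Δ) satisfies U − ΛZ ≤ λ0 componentwise, so that the retained Z's have density proportional to φ^(k)(z) Φ^(m)(λ0 + Λz; 0, Δ):

```python
import numpy as np

# Rejection-style sampler for the closed/fundamental skew-normal (59),
# built directly from the hidden truncation mechanism (our sketch).
rng = np.random.default_rng(2)
k, m, n = 2, 2, 500_000
Lam = np.array([[1.0, 0.5], [0.0, 1.5]])      # m x k skewing matrix (our choice)
Delta = np.eye(m)                             # conditional covariance (our choice)
lam0 = np.array([0.3, -0.2])

Z = rng.standard_normal((n, k))
U = rng.multivariate_normal(np.zeros(m), Delta, size=n)
keep = np.all(U - Z @ Lam.T <= lam0, axis=1)  # hidden truncation on the latent Y
Z_skew = Z[keep]
print(keep.mean(), Z_skew.mean(axis=0))       # acceptance rate and skewed means
```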

    8 Envoi

Generally speaking, hidden truncation models will be difficult to deal with analytically unless the joint density of (X, Y) (or of (X, Y) in higher dimensions) is a member of some tractable family of multivariate distributions. Even in such cases, an awkward normalizing constant may be associated with the hidden truncation distribution. Techniques for dealing with inference problems, even for hidden truncation models as simple as the basic Azzalini model (1), still require refinement. Much work remains to be done before the more complicated hidden truncation models can be expected to enter into the applied statistician's toolkit.

9 References

    [1] Arellano-Valle, R.B., Branco, M.D. and Genton, M.G. (2006), A unified view on skewed distributions arising from selections. Canadian J. of Statistics, 34, XX–XX.

    [2] Arellano-Valle, R.B. and Genton, M.G. (2005), On fundamental skew distributions. J. Mult. Anal., 96, 93–116.

    [3] Arellano-Valle, R.B., Gómez, H.W. and Quintana, F.A. (2004), A new class of skew-normal distributions. Communications in Statistics, Theory and Methods, 33, 1465–1480.

    [4] Arellano-Valle, R.B., del Pino, G. and San Martin, E. (2002), Definition and probabilistic properties of skew-distributions. Stat. and Prob. Letters, 58, 111–121.

    [5] Arnold, B.C. (1995), Conditional survival models. In Balakrishnan, N., ed., Recent Advances in Life-Testing and Reliability, a Volume in Honor of Alonzo Clifford Cohen, Jr. CRC Press, Boca Raton, FL, 589–601.

    [6] Arnold, B.C. and Beaver, R.J. (2000), Hidden truncation models. Sankhya, A62, 22–35.

    [7] Arnold, B.C. and Beaver, R.J. (2002), Skewed multivariate models related to hidden truncation and/or selective reporting. Test, 11, 7–54.

    [8] Arnold, B.C., Beaver, R.J., Groeneveld, R.A. and Meeker, W.Q. (1993), The nontruncated marginal of a truncated bivariate normal distribution. Psychometrika, 58, 471–478.

    [9] Arnold, B.C., Castillo, E. and Sarabia, J.M. (1999), Conditional Specification of Statistical Models. Springer Verlag, New York.

    [10] Arnold, B.C., Castillo, E. and Sarabia, J.M. (2002), Conditionally specified multivariate skewed distributions. Sankhya, 64, 1–21.

    [11] Arnold, B.C. and Strauss, D. (1988), Bivariate distributions with exponential conditionals. J. Amer. Stat. Assoc., 83, 522–527.

    [12] Azzalini, A. (1985), A class of distributions which includes the normal ones. Scand. J. Statist., 12, 171–178.

    [13] Azzalini, A. (2005), The skew-normal distribution and related multivariate families. Scand. J. Statist., 32, 159–188.

    [14] Bhattacharyya, A. (1943), On some sets of sufficient conditions leading to the normal bivariate distribution. Sankhya, 6, 1–21.

    [15] Genton, M.G., ed. (2004), Skew-elliptical Distributions and their Applications: a Journey beyond Normality. Chapman and Hall/CRC, London.

    [16] Gonzalez-Farias, G., Dominguez-Molina, J.A. and Gupta, A.K. (2004), The closed skew-normal distribution. In Genton, M.G., ed., Skew-elliptical Distributions and their Applications. Chapman and Hall/CRC, London, 25–42.

    [17] Henze, N. (1986), A probabilistic representation of the "skew-normal" distribution. Scand. J. Statist., 13, 399–406.

    [18] Kumbhakar, S.C. and Knox Lovell, C.A. (2000), Stochastic Frontier Analysis. Cambridge University Press, Cambridge.


    Festschrift in honor of Distinguished Professor Mir Masoom Ali On the occasion of his retirement May 18-19, 2007, pages 14-22

Using Control Information to Design Type I Censored Treatment versus Control Clinical Trials

    P. L. Graham1, S. N. MacEachern2, and D. A. Wolfe2*

    1 CSIRO Mathematical and Information Sciences, North Ryde, NSW 2113, Australia

    2 Department of Statistics, Ohio State University, Columbus, OH 43210 USA

    * Corresponding author: Email: [email protected]

Summary

    Constraints on time and finances are a continuing problem for researchers conducting clinical trials in which a new treatment is being compared to a standard (control) treatment. Information is often available, however, on the performance of the control from previous studies. We propose a way to utilize this previous knowledge about the control to provide a mechanism for completing a study earlier than might otherwise be possible. This approach can lead to significant savings in time and expense, while still retaining good power for detecting a treatment effect.

    Keywords: Designed censoring percentage; Early stopping; Termination time; Wilcoxon rank sum test.

    1 Introduction and Notation

    Controlled clinical trials are expensive and time consuming, and as such are generally only conducted if there is some evidence of potential benefit to patients. In addition, for treatments that are found to be effective (particularly if they prevent death and increase survival time) there is an ethical question of whether it is appropriate to wait until the end of the trials before offering the treatment to all patients who could benefit from it. For many clinical trials, we are faced with evaluation of a new treatment relative to a standard current (control) treatment about which we have considerable knowledge from previous studies. When the clinical trials are to be conducted within a fixed time frame (i.e., subject to Type I censoring), this information from previous studies about the control can be used to provide for early completion of the study while still maintaining adequate power for hypothesis tests. This can result in significant savings in both time and expense and lead to earlier availability of an effective new treatment.

    Consider a clinical trial designed to compare a control (i.e., the standard current treatment), C, with a new treatment, V. We have N subjects available for the study and n of them are randomly assigned to the control C, with the remaining N − n assigned to the new treatment V. The measurement of interest is the time to event (for example, relapse or death) for the subjects. However, the clinical trials will be subject to Type I censoring in that at the end of a pre-specified period of time, T, the study will be terminated and any subjects remaining in the study at that time (treatment or control) will be assigned censored measurement values T. Thus, for these censored subjects all that we know is that their true time-to-event values are greater than the censoring time T. Now, let C1, …, Cn denote the measurements for the n control subjects and V1, …, VN−n denote the measurements for the N − n treatment subjects. Note that these measurements are true time-to-event values for those subjects who experienced an event prior to termination of the study at time T, and they are equal to T for those subjects who had not yet experienced an event by time T.

    In this paper, we propose an approach for determining an optimal termination time for studies with Type I censoring. For illustration, we demonstrate how to implement this approach with the Mann-Whitney-Wilcoxon rank sum test procedure in the two-sample setting. We note, however, that the same technique can be used with other two-sample test procedures (such as the two-sample t-test) and could be extended easily to the multiple treatments versus control setting.

    In Section 2 we discuss the model and hypotheses of interest in the two-sample setting. In Section 3 we describe the application of the Mann-Whitney-Wilcoxon rank sum test procedure for these Type I censored data. The necessary details of the null distribution and critical values are provided in Section 4. Section 5 is devoted to simulation results related to the optimal choice of termination time T. An example is presented in Section 6 and we conclude with a short discussion in Section 7. 2 Model and Hypotheses We consider a general continuous model for this treatment versus control setting, with our interest being in making inferences about the relative effectiveness of the treatment and control. We assume that the control observations C1, …, Cn are a random sample of size n from a continuous distribution with c.d.f. F(x|θc) and the treatments observations V1, …, VN-n are a random sample of size N – n from the distribution with c.d.f. F(x|θv), where θc and θv are unknown, real-valued parameters. Thus the control and treatment populations are both members of a family of distributions, with the difference between the two distributions captured in the values of the parameters θc and θv. We presume that the family of distributions is strictly stochastically ordered in the parameter θ. We note that location families, scale families of positively-valued random variables, and many other common families satisfy this assumption. We also assume that the observations are mutually independent, but we take a nonparametric modeling approach and make no additional assumptions (other than continuity) about the form of the common F(.). Let µc = E[g(C)] and µv = E[g(V)] be the expected values of some increasing function g(.) for which both expectations exist and for which there is a 1-1 map between θ and µ. There is little loss in thinking of µc and µv as the means for the C and V distributions, respectively, and we will refer to them in this way throughout the paper. (These means are, of course, functions of the underlying parameters θc and θv; stochastic ordering of the distributions in θ results in the same ordering of the distributions in µ.) We are interested in making inferences about the difference in the means ∆ = µv - µc. In particular, in this paper we will concentrate on testing the null hypothesis H0: ∆ = 0, which

    corresponds to θc = θv and C=d

    V , against the

    alternative H1: ∆ > 0, corresponding to the treatment mean being larger than the control mean. We will also discuss how to test for a negative treatment effect, H2: ∆ < 0, or a general two-sided alternative, H3: ∆ ≠ 0. 3 Tests based on the Mann-Whitney-Wilcoxon Statistic Since we have not made any assumption about the form of F(.), other than the distribution being continuous, we will utilize the Wilcoxon form of the Mann-Whitney-Wilcoxon statistic to construct distribution-free procedures for testing H0: ∆ = 0. For this purpose, we jointly rank the n control and N – n treatment time-to-event observations from least to greatest. Thus, the rank 1 is assigned to the first subject (treatment or control) to experience an event, 2 is assigned to the second subject to experience an event, and so on until M is assigned to the last subject to actually experience an event prior to the censoring time T. Finally, each of the N – M subjects who has not yet experienced an event by time T is assigned the average, A, of the unassigned ranks given by

A = [∑(i=1 to N) i − ∑(j=1 to M) j] / (N − M) = [N(N+1)/2 − M(M+1)/2] / (N − M).   (1)

(For example, with N = 8 and M = 5, A = (36 − 15)/3 = 7, the average of the unassigned ranks 6, 7, and 8.)

Thus, for the N combined control and treatment observations (C1, …, Cn, V1, …, VN−n) we obtain a joint rank vector R = (Q1, …, Qn, R1, …, RN−n), where Qi is the joint rank of Ci, for i = 1, …, n, and Rj is the joint rank of Vj, for j = 1, …, N − n. For a given M (the number of subjects to experience an event prior to time T), R is some permutation of the vector (1, …, M, A, …, A). Note that the support of M is m ∈ {0, 1, …, N}.

For our test statistic, we consider the Wilcoxon form of the Mann-Whitney-Wilcoxon statistic, G, given by the sum of the joint ranks for the sample (control or treatment) with the smaller number of observations. Thus, if n ≤ N − n, we have G = ∑(i=1 to n) Qi (the sum of the ranks for the control sample), while if n > N − n we take G = ∑(j=1 to N−n) Rj (the sum of the ranks for the treatment sample).

We first describe the test procedures based on G for the setting where n > N − n, so that G = ∑(j=1 to N−n) Rj. In this situation, the test statistic G is the sum of the treatment sample ranks, and large (small) values of G are indicative of ∆ > (<) 0. The associated level α test procedures are:

Reject H0: ∆ = 0 in favor of H1: ∆ > 0 if and only if G ≥ gα,n   (2)

and

Reject H0: ∆ = 0 in favor of H2: ∆ < 0 if and only if G < g1−α,n,   (3)

where gα,n is the upper αth percentile of the null (H0) distribution of G. The corresponding two-sided test is:

Reject H0: ∆ = 0 in favor of H3: ∆ ≠ 0 if and only if G ≥ gα2,n or G < g1−α1,n,   (4)

where α1 and α2 are chosen to satisfy α1 + α2 = α. Unless there are compelling reasons not to, we recommend choosing both α1 and α2 as close as possible to α/2.
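To make the ranking scheme concrete, the following minimal R sketch (ours, not the authors' program; names are illustrative) assigns the censored joint ranks described above and computes G for two samples and a termination time T:

    censored_G <- function(control, treatment, T) {
      # Combined sample: n controls followed by N - n treatment observations
      x   <- c(control, treatment)
      grp <- rep(c("C", "V"), c(length(control), length(treatment)))
      N   <- length(x)
      event <- x < T                 # TRUE if the event occurred before time T
      M <- sum(event)
      r <- numeric(N)
      r[event] <- rank(x[event])     # joint ranks 1, ..., M (no ties, by continuity)
      if (M < N)                     # all censored subjects get the average rank A of (1)
        r[!event] <- (N * (N + 1) / 2 - M * (M + 1) / 2) / (N - M)
      # G is the rank sum for the sample with the smaller number of observations
      if (length(control) <= length(treatment)) sum(r[grp == "C"]) else sum(r[grp == "V"])
    }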

For the setting where n ≤ N − n, the test statistic G is the sum of the ranks for the control sample, and small (large) values of G are indicative of ∆ > (<) 0. The associated level α test procedures are:

Reject H0: ∆ = 0 in favor of H1: ∆ > 0 if and only if G < g1−α,n   (5)

and

Reject H0: ∆ = 0 in favor of H2: ∆ < 0 if and only if G ≥ gα,n.   (6)

The level α two-sided test has the same form as in equation (4), except that here G is the sum of the control sample ranks.

Now, to conduct any of these test procedures we need to know the null (H0) distribution of the test statistic G. This, of course, depends on the censoring time T and the common underlying distribution under H0. In the next section we derive an expression for the null distribution of G as a function of the termination time T and show how we can use information from previous studies on the control to design our clinical trial to provide for early termination when the treatment is substantially more effective than the control.

4 Null Distribution of the Test Statistic G

As defined in Section 3, let R = (Q1, …, Qn, R1, …, RN−n) be the joint rank vector for the combined control and treatment observations (C1, …, Cn, V1, …, VN−n) for a given termination time T. Then the marginal probability distribution of R can be expressed as

P(R = r) = ∑(m=0 to N) P(R = r | M = m) P(M = m),   (7)

    where, as before, M is the number of subjects (treatment or control) who experience an event prior to the termination time T. On the other hand, the marginal distribution of M is determined by the termination time T and the lifetime distributions for the treatment and control populations. Let

pc = P(subject on control C has an event before time T)   (8)

and

pv = P(subject on treatment V has an event before time T).   (9)

Then, it follows that

P(M = m) = ∑∑(u1+u2 = m) C(n, u1) pc^u1 (1 − pc)^(n−u1) C(N−n, u2) pv^u2 (1 − pv)^(N−n−u2),   (10)

where the double sum runs over all u1 ∈ {0, 1, …, n} and u2 ∈ {0, 1, …, N − n} with u1 + u2 = m, and C(a, b) denotes the binomial coefficient "a choose b".

Now, when the null hypothesis H0: ∆ = 0 is true, the C's and V's are i.i.d. random variables, so that

P0(R = r | M = m) = (N − m)! / N!,   r ∈ Hm,   (11)

    where Hm is the set of all distinct permutations of the vector (1, …, m, A, …, A). Also, when the null hypothesis H0: ∆ = 0 is true, we have pc = pv and equation (10) simplifies to

P(M = m) = ∑∑(u1+u2 = m) C(n, u1) C(N−n, u2) pc^(u1+u2) (1 − pc)^(N−u1−u2)

         = pc^m (1 − pc)^(N−m) ∑∑(u1+u2 = m) C(n, u1) C(N−n, u2)

         = C(N, m) pc^m (1 − pc)^(N−m),   (12)

where the last equality follows from the Vandermonde identity ∑∑(u1+u2 = m) C(n, u1) C(N−n, u2) = C(N, m).

Thus (not surprisingly), the marginal distribution of M is binomial with parameters N and pc under H0: ∆ = 0, and it depends on the termination time T only through pc. Combining the expressions in equations (7), (11), and (12), it follows that the null marginal distribution of the rank vector R has p.m.f.

P0(R = r) = pc^m (1 − pc)^(N−m) / m!,   r ∈ Hm, m = 0, 1, …, N,   (13)


where, for m = 0, 1, …, N, Hm is once again the set of all distinct permutations of the vector (1, …, m, A, …, A); that is, the null distribution of R is uniform over each Hm, and it depends on the termination time T only through pc. The null distribution of our test statistic G is then obtained immediately from the expression in (13) by noting that

P0(G = g) = ∑(m=0 to N) [pc^m (1 − pc)^(N−m) / m!] Qm(g),   g ∈ S,   (14)

where Qm(g) is the number of rank vectors r ∈ Hm for which G = g, and S is the set of possible values for G over all combinations of m ∈ {0, …, N} and r ∈ Hm. Null distribution tables for G were calculated using an R program (R Development Core Team, 2006) that is available from the authors on request. Some examples of these null distribution tables can be found in Appendix A, and a more extensive set of tables for N ≤ 20 is given in Graham (2000) and Graham et al. (2001). In the event that N exceeds 20, a Monte Carlo simulation program, also available from the authors on request, can be used to calculate an approximate p-value for an observed value of G.
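Equation (13) has a simple two-stage structure: draw M from Binomial(N, pc), then pick a permutation of (1, …, M, A, …, A) uniformly at random. The following R sketch (ours, not the authors' program) uses this structure to approximate the null distribution of G and an upper-tail p-value:

    null_G <- function(N, n, pc, reps = 100000) {
      # Simulate the H0 distribution of G via (13): M ~ Binomial(N, pc), then a
      # uniformly random permutation of the rank vector (1, ..., M, A, ..., A)
      replicate(reps, {
        M <- rbinom(1, N, pc)
        A <- if (M < N) (N * (N + 1) / 2 - M * (M + 1) / 2) / (N - M) else 0
        ranks <- sample(c(seq_len(M), rep(A, N - M)))
        # Under H0 the smaller sample occupies n exchangeable positions,
        # so its rank sum is the sum of the first n entries
        sum(ranks[1:n])
      })
    }

    # Approximate upper-tail p-value for an observed value g_obs:
    # mean(null_G(N = 22, n = 10, pc = 0.8) >= g_obs)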

5 Choice of Termination Time T—Simulation Studies

As noted previously, properties of the tests based on G depend on the termination time T only through the probabilities of event occurrence, pc and pv, for the control and treatment populations, respectively. Clearly, smaller choices of T lead to shorter and less expensive clinical trials. On the other hand, for typical distributions the powers of the test procedures based on G are increasing functions of T. We shall see that prior information about the control can be used to select a termination time that dramatically shortens the study length while still maintaining effective power properties for the test. The goal in our setting is to select as small a pc value (with associated early termination time) as possible without sacrificing too much power for the test procedure.

To investigate this feature of the proposed test we conducted a Monte Carlo simulation power study as follows. For a given replicate, two random samples of survival times, each of size ten, are generated, one from the control distribution and the second from the treatment distribution. We consider a variety of different control proportions pc (and associated termination

times T) and treatment proportions pv, corresponding to different alternative values of ∆. For each replicate the censored rank sum statistic G is calculated as described in Section 3, and we record whether it leads to rejection of H0: ∆ = 0 in favor of the appropriate alternative at the available significance level closest to 0.045. (The level is chosen to be 0.045 so that we can make direct comparisons with the usual uncensored Mann-Whitney-Wilcoxon rank sum test for these sample sizes.) This process is repeated for 100,000 replicates for each (pc, pv) combination, and the power of the test is estimated by the observed percentage of rejections. These simulations were carried out for underlying normal, exponential, and Weibull distributions. The results are similar for all three distributions, although the observed improvement in power is a bit greater for the normal distribution when more subjects are censored in the control than in the treatment, and the time savings from early termination are a bit smaller for the Weibull distribution when pc and pv do not differ by much. Thus, for brevity, in this paper we present details of the simulation results only for the underlying exponential distribution with c.d.f.

F(x | λ) = 1 − e^(−λx),   λ > 0.

For this exponential setting, we take λ = 1 for the control population and use it to set the termination times T corresponding to a variety of values for pc. (At first glance, the choice of λ = 1 for our simulations appears to be somewhat arbitrary. However, its role is simply to serve as a baseline against which to compare treatment alternative λ values and associated pv percentages. Starting with a different baseline, say λ = 2, would certainly lead to different termination times T, but the observed power against comparable alternative λ and pv values would be similar to what we found for λ = 1. It is solely the difference between the pc and pv percentages that controls the power.) Figure 1 shows how the power of the test varies with different combinations of pc and pv. As expected, the power is a monotonically increasing function of the absolute difference |pv − pc|.
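Under this exponential model the design maps directly to code: pc determines T through pc = 1 − e^(−T) for the λ = 1 control, and a target pv determines the treatment rate through pv = 1 − e^(−λv T). Below is a compact R sketch of one cell of the power study, reusing censored_G and null_G from the earlier sketches; it is ours, not the authors' simulation program, and the attained level only approximates 0.045 because of the discreteness of G:

    power_G <- function(pc, pv, n = 10, reps = 10000, alpha = 0.045) {
      T     <- -log(1 - pc)         # termination time for the lambda = 1 control
      lam_v <- -log(1 - pv) / T     # treatment rate with event probability pv by time T
      # Equal sample sizes: G is the control rank sum, and small values of G
      # favor Delta > 0, so the rejection region is the lower tail (equation (5))
      gcrit <- quantile(null_G(N = 2 * n, n = n, pc = pc), probs = alpha)
      mean(replicate(reps, {
        C <- rexp(n, rate = 1)        # control survival times
        V <- rexp(n, rate = lam_v)    # treatment survival times
        censored_G(C, V, T) <= gcrit  # reject H0 in favor of Delta > 0
      }))
    }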

Our proposed rank sum procedure for censored data will typically have lower power than the corresponding Mann-Whitney-Wilcoxon rank sum procedure based on full and complete measurements. To provide some insight into how much power is lost by using this early termination approach, we examine the difference between the power of our G test and that of the uncensored rank sum test. These simulation comparisons are presented in Figure 2.


    Figure 1. Power of G test for various combinations of pc and pv.

    Figure 2. The difference in power between the usual Mann-Whitney-Wilcoxon rank sum test and the test based on G for various combinations of pc and pv.

The biggest differences in power between the uncensored rank sum test and our G test occur when pv ≤ 0.3. In this region, the power of the uncensored rank sum test is frequently more than 0.15 higher than that of the G test, and the difference reaches nearly 0.5 when pv is as low as 0.1. Of course, this is not really surprising, since a great deal of treatment information is lost when more than 70% of the treatment observations are censored. Also as expected, when both pc and pv are large (so there is very little censoring in either sample), the two tests are very similar in power.

    We also investigated the time (and resultant expense) saved by using the early termination approach described in this paper. For each replicate of a given simulation (combination of pc and pv), we recorded the following information:

1. Length of time to complete the uncensored experiment, in which every subject experiences the event of interest. This is just the largest observation (time to event) in that replicate.

2. Length of time to complete the censored experiment. This is most often just the early termination time for the study. However, in some replicates all of the observations (both treatment and control) were less than the termination value (that is, all the subjects experienced events prior to the termination time); for such replicates, this measure is the largest observation.

These two pieces of information were then averaged over the 100,000 replicates in the simulation to obtain the summary statistics "Average time of uncensored experiment" and "Average time of censored experiment". We also computed the average of all of the actual values for observations that would have been censored, across the entire 100,000 replicates in the simulation. This is reported as the "Conservative average of uncensored times", since it represents a more conservative estimate of the "length" of an uncensored experiment. We include this calculation as well because outliers among the maximum observations could heavily influence the "Average time of uncensored experiment". For each (pc, pv) combination in our simulation study, we estimate the typical time saved by the early termination approach by the observed difference

Time saved = ["Average time of uncensored experiment" − "Average time of censored experiment"].   (15)

To take into account the possibly misleading effect of outliers among the maximum observed values, we also compute a conservative estimate of the time saved,

Conservative time saved = ["Conservative average of uncensored times" − "Average time of censored experiment"].   (16)
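As a concrete rendering of (15) and (16), the following R sketch (our illustration, with hypothetical names) computes both summaries for one (pc, pv) cell from a list of simulated replicate samples:

    time_saved <- function(times_list, T) {
      # times_list: one vector of (fully observed) event times per replicate
      uncens <- sapply(times_list, max)                         # uncensored experiment length
      cens   <- sapply(times_list, function(x) min(max(x), T))  # censored experiment length
      # Actual values of the observations that would have been censored,
      # pooled across replicates (assumes at least one such observation)
      would_censor <- unlist(lapply(times_list, function(x) x[x >= T]))
      c(time_saved              = mean(uncens) - mean(cens),        # equation (15)
        conservative_time_saved = mean(would_censor) - mean(cens))  # equation (16)
    }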

In Figures 3 and 4 we graphically display both of these "time saved" measures for pc = 0.8 (Figure 3) and pc = 0.2 (Figure 4), with pv values ranging from 0.1 to 0.9. Figure 3 shows that when pc is large (0.8) and pv is small, the uncensored experiment is, on average, 5 to 15 times longer than the censored experiment, even when we use the conservative average of uncensored times. For larger pv values the conservative time saved is less pronounced, although the typical uncensored experiment, by the conservative measure, is still 50% longer than the typical censored experiment. The savings are substantially greater when we compare against the average of the uncensored maximum time measurements.

When pc is small, as in Figure 4 (pc = 0.2), the differences between the lengths of time for censored and uncensored experiments (using either the conservative or the maximum measure) are less striking than when pc = 0.8 for small pv values. However, a substantial time saving is maintained over all studied values of pv. Similar results were obtained for other values of pc ranging from 0.9 down to 0.1 and are presented in Figures 5-11 in Appendix B.

    Figure 3. Time saved with use of G test when pc = 0.8 for various values of pv.

    Figure 4. Time saved with use of G test when pc = 0.2 for various values of pv.

6 Example

The data used in this example are adapted from a subset of the University of Massachusetts AIDS Research Unit (UMARU) IMPACT study by McCusker et al. (1997), described in Hosmer and Lemeshow (1999). In that study, patients were randomized to one of two drug treatment programs of differing lengths. For the purposes of our example we use the 360 patients from treatment site A who returned to drug use within 500 days; as such, there is no censoring in the basic data set. The aim of the study was to determine which of the two treatment programs leads to a longer typical time until return to drug use. For our example we consider the short program to be the control. For our analysis the data have been slightly modified so that there are no ties in the time-until-relapse values. This example is typical of two-sample survival studies in which a control and a treatment are compared; a common feature of survival analysis data is censoring. As noted previously, Type I censoring occurs when the censoring of an event is dictated by a predetermined censoring time T. If an event has not occurred by time T, the observation is censored and assigned an "observed" value of T.

This study was the first of its kind, so previous data about the control are not available. Thus we randomly divided the 360 patients into a "prior data" set of 60 observations, to facilitate the application of our procedure, and a test set consisting of the remaining 300 observations. The test set is used to examine the effect of various termination (censoring) times on the proposed hypothesis tests. Using subjects on the short program as controls, we begin by looking at the "prior" data and determining times that correspond to different censoring proportions for the control group. The first column of Table 1 presents eleven potential termination times, T, for the study. The numbers of controls in the prior data set that would have been censored at these termination times are given in the second column of the table, and the corresponding proportions, pc, of controls with observed times to event (not censored) are given in the third column. Then, using each of these termination times to analyze the test data set, the p-values associated with tests of the null hypothesis of no difference between the two treatments, against the alternative that the long treatment program has a greater average time until relapse, are given in the final column of the table.
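In R, these candidate termination times are just empirical pc-quantiles of the prior control times. A short sketch, where prior_control is a hypothetical name for the vector of 60 "prior data" relapse times in days:

    # pc values from Table 1; since pc = P(control event before T), each candidate
    # T is the empirical pc-quantile of the prior control times
    pc_grid <- c(0.99, seq(0.95, 0.50, by = -0.05))
    T_grid  <- quantile(prior_control, probs = pc_grid)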

Table 1: Setting Termination Time Using the Prior Data and its Effect on the P-value for the Test Data.

Termination     Prior Data         pc      Test Data
Time T (days)   Number Censored            P-value
499.00          0                  0.99    .00287
471.01          1                  0.95    .00280
306.14          3                  0.90    .00263
292.03          4                  0.85    .00270
136.11          6                  0.80    .00580
130.98          7                  0.75    .00576
126.98          8                  0.70    .00753
111.08          10                 0.65    .00723
95.02           11                 0.60    .09998
90.05           11                 0.55    .15168
85.98           14                 0.50    .17829
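For any one row of Table 1, the test-data p-value can be reproduced in outline with the earlier sketches. The vector names below (short_prog for the control times, long_prog for the long-program times) and the equal 150/150 split of the test set are our illustrative assumptions, not details reported above:

    # Hypothetical test-set vectors: short_prog (controls), long_prog (treatments)
    g_obs <- censored_G(short_prog, long_prog, T = 292.03)  # the pc = 0.85 row
    # With equal arms G is the control rank sum and small G favors the long
    # program, so the p-value is a lower-tail probability (equation (5))
    p_val <- mean(null_G(N = 300, n = 150, pc = 0.85) <= g_obs)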

    The results in Table 1 show that it would have been possible to conduct the clinical trial on the second set of data with pc as low as 0.65 and still obtain a significant p-value in support of the conclusion that the long treatment program leads to


a greater average period of time before return to drug use. The "previous pilot" study ran for 500 days, but using these prior data to determine earlier stopping times we could have chosen to terminate the "new" clinical trial after only 115 days, for example, and still have reached the conclusion that the longer treatment program is preferred. Since our power studies indicated that larger values of pc lead to tests with power comparable to that of the usual Mann-Whitney-Wilcoxon rank sum test, we might want to be more conservative and take pc = 0.85, for example. Even with this more conservative pc value, however, the termination time of 293 days is still more than 40% shorter than the full 500 days, and the conclusion is the same. This would have permitted the investigators an opportunity to recognize that the longer treatment is more effective at least 200 days sooner than in the reported study.

7 Discussion

In this paper we proposed a Mann-Whitney-Wilcoxon type test procedure that allows us to use previously available data on a control to design treatment versus control clinical trials with early stopping times while still maintaining reasonable power for detecting effective treatments. We found that the procedure has good power when the control and treatment groups are sufficiently far apart. Moreover, when pc < pv or when pc and pv are both large, the power is very close to that of the uncensored Mann-Whitney-Wilcoxon procedure even with early stopping of the study. The example showed that it is possible to decrease the length of the trial by as much as 60% and still retain sufficient power. Of course, the effectiveness of our procedure will vary from one setting to another. However, we feel that it provides another useful tool for reducing the length of many clinical trials when prior information is available on the baseline control. It is evident that the associated savings in trial time and expense can be considerable.

Acknowledgments

This work was supported in part by the National Science Foundation under Award Numbers DMS-0072526 and DMS-9802358. The authors thank Tim Keighley for his assistance with the R functions.

References

Graham, P. L. (2000). Using design characteristics of the control in type I censored experiments. Master's Thesis, Department of Statistics, Ohio State University.

Graham, P. L., MacEachern, S. N., and Wolfe, D. A. (2001). "Using design characteristics of the control in type I censored experiments: Tables". Technical Report Number 569, Department of Statistics, Ohio State University.

Hosmer, D. W. and Lemeshow, S. (1999). Applied Survival Analysis: Regression Modeling of Time to Event Data. John Wiley and Sons, Inc., New York.

McCusker, J., Bigelow, C., Vickers-Lahti, M., Spotts, D., Garfield, F., and Frost, R. (1997). "Planned duration of residential drug abuse treatment: efficacy versus effectiveness". Addiction 92(11), 1467-1478.

R Development Core Team (2006). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. URL http://www.R-project.org.

Appendices

A Null Distribution Tables for G

For the following tables, N is the total sample size (control plus treatment) and n is the smaller of the two sample sizes. The proportion of the control population that is designated to be fully ranked is denoted by pc. Tabulated entries are the upper-tail critical values for the null distribution of G that are closest to 0.01 and 0.05. More comprehensive tables, containing roughly the entire upper and lower fifteen percent of the null distribution of G, are available in Graham (2000) and Graham et al. (2001).

Table 2: Upper Tail Critical Values for the G Statistic, N = 8, n = 4

pc = 0.1              pc = 0.6
G      P(G ≥ g)       G      P(G ≥ g)
20     0.2367         23.5   0.0554
20.5   0.0402         24     0.0421
22     0.0349         26     0.0118
22.5   0.0030

pc = 0.2              pc = 0.7
G      P(G ≥ g)       G      P(G ≥ g)
22     0.0809         23.5   0.0555
22.5   0.0174         24     0.0477
24     0.0116         26     0.0135
25     0.0010

pc = 0.3              pc = 0.8
G      P(G ≥ g)       G      P(G ≥ g)
22     0.1083         24     0.0531
22.5   0.0416         25     0.0276
24     0.0227         26     0.0141
25     0.0036

pc = 0.4              pc = 0.9
G      P(G ≥ g)       G      P(G ≥ g)
23     0.0533         24     0.0565
23.5   0.0447         25     0.0285
24     0.0314         26     0.0143
25     0.0083

pc = 0.5              pc = 1.0
G      P(G ≥ g)       G      P(G ≥ g)
23     0.0699         24     0.0571
23.5   0.0528         25     0.0286
25     0.0143         26     0.0143
26     0.0091

Table 3: Upper Tail Critical Values for the G Statistic, N = 14, n = 7

pc = 0.1              pc = 0.6
G      P(G ≥ g)       G      P(G ≥ g)
59.5   0.0771         65     0.0537
60     0.0177         65.5   0.0429
63     0.0130         69.5   0.0108
64     0.0018         70     0.0097

pc = 0.2              pc = 0.7
G      P(G ≥ g)       G      P(G ≥ g)
61     0.0574         65     0.0580
61.5   0.0445         65.5   0.0471
65     0.0110         70     0.0113
66     0.0085         70.5   0.0083

pc = 0.3              pc = 0.8
G      P(G ≥ g)       G      P(G ≥ g)
63     0.0601         65     0.0609
63.5   0.0349         65.5   0.0490
66.5   0.0162         70     0.0124
67     0.0080         70.5   0.0087

pc = 0.4              pc = 0.9
G      P(G ≥ g)       G      P(G ≥ g)
64     0.0503         65     0.0629
64.5   0.0386         65.5   0.0492
68     0.0120         70     0.0129
68.5   0.0087         70.5   0.0088

pc = 0.5              pc = 1.0
G      P(G ≥ g)       G      P(G ≥ g)
64     0.0627         65     0.0641
64.5   0.0490         66     0.0487
69     0.0114         70     0.0131
69.5   0.0085         71     0.0087

Table 4: Upper Tail Critical Values for the G Statistic, N = 20, n = 10

pc = 0.1              pc = 0.6
G      P(G ≥ g)       G      P(G ≥ g)
115    0.1098         126    0.0526
115.5  0.0418         126.5  0.0465
120    0.0279         134    0.0111
121    0.0070         134.5  0.0094

pc = 0.2              pc = 0.7
G      P(G ≥ g)       G      P(G ≥ g)
120    0.0653         126.5  0.0503
120.5  0.0376         127    0.0483
125    0.0201         135    0.0102
125.5  0.0099         135.5  0.0086

pc = 0.3              pc = 0.8
G      P(G ≥ g)       G      P(G ≥ g)
122    0.0577         127    0.0504
122.5  0.0487         127.5  0.0445
130    0.0109         135    0.0110
130.5  0.0067         135.5  0.0092

pc = 0.4              pc = 0.9
G      P(G ≥ g)       G      P(G ≥ g)
124    0.0538         127    0.0517
124.5  0.0481         127.5  0.0450
131.5  0.0104         135    0.0114
132    0.0098         135.5  0.0093

pc = 0.5              pc = 1.0
G      P(G ≥ g)       G      P(G ≥ g)
125    0.0552         127    0.0526
125.5  0.0482         128    0.0446
133    0.0112         135    0.0116
133.5  0.0093         136    0.0093


B Simulation Results—Time Savings

The following seven figures display additional simulation results demonstrating the time savings for various combinations of pc and pv.

Figure 5. Time saved with use of G test when pc = 0.9 for various values of pv.

    Figure 6. Time saved with use of G test when pc = 0.7 for various values of pv.

    Figure 7. Time saved with use of G test when pc = 0.6 for various values of pv.

    Figure 8. Time saved with use of G test when pc = 0.5 for various values of pv.

    Figure 9. Time saved with use of G test when pc = 0.4 for various values of pv.

    Figure 10. Time saved with use of G test when pc = 0.3 for various values of pv.

    Figure 11. Time saved with use of G test when pc = 0.1 for various values of pv.


Festschrift in honor of Distinguished Professor Mir Masoom Ali, on the occasion of his retirement, May 18-19, 2007, pages 23-43

A Comparison of Graphical Methods for Assessing the Proportional Hazards Assumption in the Cox Model

    Inger Persson, AstraZeneca R&D, Södertälje, SWEDEN

Harry Khamis, Statistical Consulting Center, Wright State University, Dayton, Ohio 45435, USA

ABSTRACT

Six graphical procedures for checking the assumption of proportional hazards in the Cox model are described and compared. A new way of comparing the graphical procedures, using a Kolmogorov-Smirnov-like maximum deviation criterion for rejection, is derived for each procedure. The procedures are evaluated in a simulation study under proportional hazards and five different forms of nonproportional hazards: (1) increasing hazards, (2) decreasing hazards, (3) crossing hazards, (4) diverging hazards, and (5) nonmonotonic hazards. The procedures are compared in the two-sample case, corresponding to two groups with different hazard functions. None of the procedures under consideration requires partitioning of the survival time axis. Results indicate that the Arjas plot, a plot of estimated cumulative hazard versus number of failures, is superior to the other procedures under almost every form of nonproportional hazards, especially crossing and nonmonotonic hazards. For increasing hazards, the smoothed plot of the ratio of log cumulative baseline hazard rates versus time or the smoothed plot of scaled Schoenfeld residuals versus time performs best. The Andersen plot performs very poorly for increasing, decreasing, and diverging hazards.

1. INTRODUCTION

The relation between the distribution of event times and time-invariant covariates or risk factors z (z is a p × 1 vector) can be described in terms of the model of Cox (1972), in which the hazard rate at time t for an individual is

λ(t, z) = λ0(t) e^(β′z),   (1.1)

where λ0(t) is the baseline hazard rate, an unknown (arbitrary) function giving the hazard rate for the standard set of conditions z = 0, and β is a p × 1 vector of unknown parameters. The factor e^(β′z) describes the hazard for an individual with covariates z relative to the hazard at the standard conditions z = 0. The ratio of the hazard functions for two individuals with covariate values z and z* is

λ(t | z) / λ(t | z*) = e^(β′(z − z*)),

an expression that does not depend on t.
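For orientation, one of the procedures compared below, the smoothed plot of scaled Schoenfeld residuals versus time, is implemented in the R survival package. A minimal sketch on the package's lung data (an illustrative data set, not data from this paper):

    library(survival)

    # Two-sample Cox model: group indicator (sex) as the only covariate
    fit <- coxph(Surv(time, status) ~ sex, data = lung)

    # Scaled Schoenfeld residuals versus time with a smooth; a trend away
    # from a horizontal line suggests nonproportional hazards
    zp <- cox.zph(fit)
    print(zp)   # accompanying score test
    plot(zp)    # the smoothed residual plot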

Thus, the Cox model in (1.1) is valid only for data consistent with the assumption of proportional hazards. Since the validity of a Cox regression analysis based on the model in (1.1) relies on the assumption of proportionality of the hazard rates of individuals with distinct covariate values, it is important to be able to determine reliably whether the assumption is plausible. This can be done graphically or numerically. A partial review of the numerous graphical and analytical methods for checking the adequacy of Cox models was given by Lin and Wei (1991). Some authors recommend using numerical tests for such determinations (e.g., Hosmer and Lemeshow, 1999, p. 207). However, others recommend graphical procedures, arguing that the proportional hazards assumption only approximates the correct model for a covariate, and that any formal statistical test, based on a large enough sample size, will reject the null hypothesis of proportionality (Klein and Moeschberger, 1997, p. 354). A comprehensive comparative study of numerical procedures is given elsewhere (Persson, 2002). This paper focuses on the effectiveness of graphical procedures. In section 2, six graphical methods for determining the plausibility of the proportional hazards assumption are described. In section 3, the results of a comparative simulation study are presented. A discussion is given in section 4, an example is presented in section 5, and conclusions are given in section 6.

2. GRAPHICAL METHODS COMPARED

Hess (1995) describes eight graphical methods for detecting violations of the proportional hazards

    individuals with distinct covariate values, it is important to be able to reliably determine if the assumption is plausible. This can be done graphically or numerically. A partial review of numerous graphical and analytical methods for checking the adequacy of Cox models was given by Lin and Wei (1991). Some authors recommend using numerical tests for such determinations (e.g., Hosmer and Lemeshow, 1999, p. 207). However, others recommend graphical procedures arguing that the proportional hazards assumption only approximates the correct model for a covariate, and that any formal statistical test, based on a large enough sample size, will reject the null hypothesis of proportionality (Klein and Moeschberger, 1997, p. 354). A comprehensive comparative study of numerical procedures is given elsewhere (Persson, 2002). This paper focuses on the effectiveness of graphical procedures. In section 2, six graphical methods for determining the plausibility of the proportional hazards assumption are described. In section 3, the results of a comparative simulation study are presented. A discussion is given in section 4, an example is presented in section 5, and conclusions are given in section 6. 2. GRAPHICAL METHODS COMPARED Hess (1995) describes eight graphical methods for detecting violations of the proportional hazards


assumption and demonstrates each on three authentic data sets. Five of those methods are described in sections 2.1 - 2.5 below. The methods not included in this paper are (1) methods that require a partitioning of the time axis, which introduces a certain degree of arbitrariness into the procedure, leading to different conclusions depending on the partition used, or (2) methods that do