statistics in clinical research for residents

Uploaded by: iyanasiana

Posted on 14-Apr-2018


  • 7/27/2019 Statistics in Clinical Research for Residents

    1/74

    Statistical Methods in Clinical Research


    Overview

    Data types

    Summarizing data using descriptive statistics

    Standard error

    Confidence intervals


    Overview

    P values

    One- vs two-tailed tests

    Alpha and beta errors

    Sample size considerations and power analysis

    Statistics for comparing 2 or more groups with continuous data

    Non-parametric tests


    Overview

    Regression and Correlation

    Risk Ratios and Odds Ratios

    Survival Analysis

    Cox Regression


    Types of Data

    Discrete data: limited number of choices

    Binary: two choices (yes/no), e.g. dead or alive; disease-free or not

    Categorical (nominal): more than two choices, not ordered, e.g. race

    Ordinal: more than two choices, ordered, e.g. age group; stages of a cancer; Likert-scale responses (strongly agree, agree, neither agree nor disagree, etc.)


    Types of Data: Continuous Data

    Theoretically infinite possible values (within physiologic limits), including fractional values, e.g. height, age, weight

    Can be interval: the interval between measures has meaning, but the ratio of two interval data points does not (e.g. temperature in Celsius, day of the year)

    Can be ratio: the ratio of the measures has meaning (e.g. weight, height)


    Histogram: Continuous Data

    No segmentation of data into groups


    Frequency Distribution

    Segmentation of data into groups

    Discrete or continuous data


    Box and Whiskers Plots


    Box and Whisker Plots

    Popular in Epidemiologic Studies

    Useful for presenting comparative data graphically


    Numeric Descriptive Statistics

    Measures of central tendency of data

    Mean

    Median

    Mode

    Measures of variability of data

    Standard Deviation

    Interquartile range
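    A minimal sketch of these measures using Python's standard `statistics` module. The blood-pressure values are hypothetical. The quartiles use the module's `method="inclusive"` option, which is one of several percentile conventions, so other software may report a slightly different interquartile range.

```python
import statistics

# Hypothetical sample of systolic blood pressures (mmHg), for illustration only
values = [118, 122, 125, 125, 130, 134, 141, 150]

mean = statistics.mean(values)        # sum of the values divided by the number of values
median = statistics.median(values)    # middle of the ordered list (mean of the two middle values here)
mode = statistics.mode(values)        # most common value
sd = statistics.stdev(values)         # sample standard deviation (n - 1 denominator)

# quantiles(n=4) returns the 25th, 50th and 75th percentiles; "inclusive" is one of several conventions
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

print(f"mean={mean:.2f}, median={median}, mode={mode}, SD={sd:.1f}, IQR {q1} to {q3}")
```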


    Numeric Descriptive Statistics


    Sample Mean

    Most commonly used measure of central tendency

    Best applied to normally distributed continuous data

    Not applicable to categorical data

    Definition: the sum of all the values in a sample, divided by the number of values


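    Sample Median

    Used to indicate the central value ("average") of skewed data

    Often reported with the mean. If the mean and the median are approximately equal, the distribution is roughly symmetric; this is consistent with, but does not prove, a normal distribution.

    It is the middle value of an ordered listing of the values:

    If there is an odd number of values, it is the middle value

    If there is an even number of values, it is the average of the two middle values

    It is the 50th percentile, so it lies within the interquartile range (the line inside the box of a box-and-whiskers plot)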
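    The median rule above can be sketched directly. The function name `median` and the sample values are illustrative only. The last lines show why the median suits skewed data: one extreme value pulls the mean but not the median.

```python
def median(values):
    """Middle value of the ordered list (odd n) or mean of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                         # odd number of values
    return (ordered[mid - 1] + ordered[mid]) / 2    # even number of values

print(median([7, 1, 3]))         # odd n: middle of 1, 3, 7
print(median([7, 1, 3, 10]))     # even n: average of 3 and 7

skewed = [1, 2, 3, 4, 100]       # one extreme value
print(sum(skewed) / len(skewed), median(skewed))  # mean 22.0 vs median 3
```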


    Sample Mode

    The most common value

    Infrequently reported as a value in studies

    More frequently used to describe the shape of a distribution: unimodal, bimodal, etc.


    Interquartile Range

    The range of the data from the 25th percentile to the 75th percentile

    A common component of a box-and-whiskers plot: the interquartile range is the box, and the line across the box is the median (middle value)

    Occasionally, the mean is also displayed.


    Standard Error

    A fundamental goal of statistical analysis is to estimate a parameter of a population based on a sample.

    The values of a specific variable in a sample are an estimate of the values in the entire population of individuals who might have been eligible for the study.

    The standard error is a measure of the precision with which a sample statistic estimates the population parameter.


    Standard Error

    Standard error of the mean: $SE_{\bar{x}} = s/\sqrt{n}$, the standard deviation divided by the square root of the sample size

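    (This formula holds at any sample size; with small samples, confidence intervals built from it should use a t multiplier rather than 1.96.)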

    Standard error of a proportion: $SE_p = \sqrt{p(1-p)/n}$

    Important: the standard error depends on sample size. The larger the sample, the smaller the standard error.
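    The two standard error formulas can be written as small Python functions. `se_mean` and `se_proportion` are hypothetical names, and the SD of 10 and proportion of 0.3 are made-up inputs. The loop shows the square-root dependence on sample size: quadrupling n halves the standard error.

```python
import math

def se_mean(sd, n):
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)

def se_proportion(p, n):
    """Standard error of a proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical SD of 10 and proportion of 0.3 at increasing sample sizes
for n in (25, 100, 400):
    print(n, se_mean(10, n), round(se_proportion(0.3, n), 4))
```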


    Clarification

    Standard deviation measures the variability, or spread, of the data in an individual sample.

    Standard error measures the precision of the estimate of a population parameter provided by the sample mean or proportion.


    Standard Error: Significance

    It is the basis of confidence intervals.

    A 95% confidence interval is defined by: sample mean (or proportion) ± 1.96 × standard error

    Because the standard error is inversely related to the square root of the sample size, the larger the study (sample size), the narrower the confidence intervals and the greater the precision of the estimate.
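    Here is a sketch of the 95% confidence interval rule, using the normal-approximation multiplier 1.96 from the slide. `ci95` is a hypothetical helper, and the mean of 130 with SD 15 is invented. For small samples, a t multiplier would be used instead of 1.96.

```python
import math

def ci95(estimate, se):
    """95% confidence interval: estimate ± 1.96 × standard error (normal approximation)."""
    return estimate - 1.96 * se, estimate + 1.96 * se

# Hypothetical mean of 130 with SD 15: the interval narrows as the sample grows
for n in (25, 100, 400):
    se = 15 / math.sqrt(n)
    low, high = ci95(130, se)
    print(f"n={n}: 95% CI {low:.2f} to {high:.2f}")
```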


    Confidence Intervals

    May be used to assess a single point estimate, such as a mean or proportion.

    Most commonly used in assessing the estimate of the difference between two groups.


    Confidence Intervals

    Commonly reported in studies to provide an estimate of the precision of the mean.


    Confidence Intervals


    P Values

    The probability of obtaining a result at least as extreme as the one observed, by chance alone, assuming that the null hypothesis is true.

    Typically, an estimate with a P value of 0.05 or less is considered statistically significant, i.e. unlikely to have occurred by chance alone.

    The 0.05 cutoff is an arbitrary value.

    P value of 0.05 equals a 1 in 20 chance

    P value of 0.01 equals a 1 in 100 chance

    P value of 0.001 equals a 1 in 1000 chance


    P Values and Confidence Intervals

    P values provide less information than confidence intervals. A P value gives only the probability that a result at least this extreme would arise by chance if the null hypothesis were true.

    A result can be statistically significant but of limited clinical significance.

    A very large study might find that a difference of 0.1 on a 0-to-10 VAS scale is statistically significant, yet it may be of no clinical significance.

    A large study might find many significant findings during multivariable analyses.

    "A large study dooms you to statistical significance."

    Anonymous statistician


    P Values and Confidence Intervals

    Confidence intervals provide a range of plausible values for the population parameter (e.g. the population mean).

    For differences, if the 95% confidence interval includes 0, the result is not statistically significant at the 0.05 level.

    For ratios, if the confidence interval includes 1, the result is not statistically significant.

    If a study were repeated many times, 95% of the 95% confidence intervals calculated would contain the true population value.

    If a confidence interval is very wide, the plausible values might range from very low to very high.

    Example: a relative risk of 4 might have a confidence interval of 1.05 to 9. Although the point estimate is a fourfold risk (a 300% increase), anything from a 5% to an 800% increased risk is plausible.


    Errors: Type I Error

    Claiming a difference between two samples when in fact there is none.

    Remember that there is variability among samples: they might seem to come from different populations when they do not.

    Also called the α (alpha) error. Typically α = 0.05 is used.


    Errors: Type II Error

    Claiming there is no difference between two samples when in fact there is one.

    Also called the β (beta) error. The probability of not making a Type II error is 1 − β, which is called the power of the test.

    A hidden error, because it can't be detected without a proper power analysis.


    Errors

                             Test result: H0 (null)    Test result: H1 (alternative)
    Truth: H0 (null)         No error                  Type I error
    Truth: H1 (alternative)  Type II error             No error


    Sample Size Calculation

    Also called power analysis.

    When designing a study, one needs to determine how large a study is needed.

    Power is the ability of a study to avoid a Type II error.

    Sample size calculation yields the number of study subjects needed, given a desired power to detect a difference and the P value that will be considered significant.

    Many studies are completed without a proper estimate of the appropriate study size.

    This may lead to an erroneously negative study outcome (a Type II error).


    Sample Size Calculation

    Depends on:

    Level of Type I error: 0.05 typical

    Level of Type II error: 0.20 typical

    One-sided vs two-sided: nearly always two-sided

    Inherent variability of the population, usually estimated from preliminary data

    The difference that would be meaningful between the two assessment arms


    One-sided vs. Two-sided

    Most tests should be framed as two-sided tests.

    When comparing two samples, we usually cannot be sure which one is going to be better; you never know which direction study results will go.

    For routine medical research, use only two-sided tests.
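    To see why the choice matters, the sketch below converts a z statistic into one- and two-sided P values with Python's `statistics.NormalDist`. `p_values` is a hypothetical helper, and z = 1.80 is an arbitrary example. The same result is significant at 0.05 one-sided but not two-sided, because the two-sided P value counts both tails.

```python
from statistics import NormalDist

def p_values(z):
    """One- and two-sided P values for a z statistic under a standard normal null distribution."""
    upper_tail = 1 - NormalDist().cdf(abs(z))     # tail beyond the observed value, in its direction
    return {"one_sided": upper_tail, "two_sided": 2 * upper_tail}

p = p_values(1.80)
print(f"one-sided P = {p['one_sided']:.4f}, two-sided P = {p['two_sided']:.4f}")
```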


    Sample Size for Proportions

    Stata input: proportion 1 = 0.2, proportion 2 = 0.3, α = 0.05, power (1 − β) = 0.8.
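    The calculation behind the proportion example can be approximated with the usual normal-approximation formula for two independent proportions, sketched below. `n_per_group_proportions` is a hypothetical helper. Stata and other packages may apply a continuity correction or other refinements, so their answer can be somewhat larger than this sketch's.

```python
import math
from statistics import NormalDist

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two proportions (two-sided, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for power = 0.80
    p_bar = (p1 + p2) / 2                           # pooled proportion under the null hypothesis
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)    # round up to whole patients

print(n_per_group_proportions(0.2, 0.3))
```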


    Sample Size for Continuous Data

    Stata input: mean 1 = 20, mean 2 = 30, α = 0.05, power (1 − β) = 0.8, standard deviation = 10.
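    Similarly, here is a normal-approximation sketch for two means with a common standard deviation, $n = 2(z_{1-\alpha/2} + z_{1-\beta})^2\sigma^2/\Delta^2$ per group. `n_per_group_means` is a hypothetical helper. Methods based on the t distribution, which some software uses, typically return a slightly larger number.

```python
import math
from statistics import NormalDist

def n_per_group_means(mean1, mean2, sd, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two means (two-sided, normal approximation, equal SDs)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    delta = mean2 - mean1                            # the difference considered meaningful
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)

print(n_per_group_means(20, 30, sd=10))
```

Halving the meaningful difference roughly quadruples the required sample size, because the difference enters the formula squared.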


    Statistical Tests

    Parametric tests: continuous data, normally distributed

    Non-parametric tests: continuous data not normally distributed; categorical or ordinal data



    Choosing a test for comparing the averages of 2 or more samples of scores in experiments with one treatment factor

    2 samples
    Data       Between subjects (independent samples)   Within subjects (related samples)
    Interval   Independent t-test                       Paired t-test
    Ordinal    Wilcoxon-Mann-Whitney test               Wilcoxon signed-rank test, sign test
    Nominal    Chi-square test                          McNemar test

    > 2 samples
    Data       Between subjects (independent samples)   Within subjects (related samples)
    Interval   One-way ANOVA                            Repeated-measures ANOVA
    Ordinal    Kruskal-Wallis test                      Friedman test
    Nominal    Chi-square test                          Cochran's Q test (dichotomous data only)


    Scheme for choosing a one-sample test

    Data       Situation         Test
    Nominal    2 categories      Binomial test
    Nominal    > 2 categories    Chi-square test
    Ordinal    Randomness        Runs test
    Ordinal    Distribution      Kolmogorov-Smirnov test
    Interval   Mean              t-test
    Interval   Distribution      Kolmogorov-Smirnov test


    Comparison of 2 Sample Means: Student's t Test

    Assumes normally distributed continuous data.

    $t = \dfrac{\bar{x}_1 - \bar{x}_2}{SE(\bar{x}_1 - \bar{x}_2)}$ (the difference between means divided by the standard error of the difference)

    The t value is then looked up in a table to determine significance.
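    Here is a minimal sketch of the two-sample Student's t statistic with a pooled variance, using only Python's standard library. `student_t` and the data are illustrative. The standard library has no t distribution, so the P value would still come from a table or from a library such as SciPy (`scipy.stats.ttest_ind`).

```python
import math
import statistics

def student_t(sample1, sample2):
    """Two-sample Student's t statistic with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = statistics.mean(sample1), statistics.mean(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)   # sample variances
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled * (1 / n1 + 1 / n2))   # standard error of the difference
    return (m1 - m2) / se_diff, n1 + n2 - 2

t, df = student_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(t, df)
```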


    Paired t Tests

    Uses the change before and after an intervention in a single individual

    Reduces the degree of variability between the groups

    Given the same number of patients, has greater power to detect a difference between groups
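    The paired version works on the within-patient differences, which is where the reduction in variability comes from. `paired_t` and the before/after values are hypothetical.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic: mean within-patient change divided by its standard error."""
    if len(before) != len(after):
        raise ValueError("each patient needs a before and an after value")
    diffs = [a - b for b, a in zip(before, after)]   # change in each individual
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1

# Hypothetical measurements in the same 3 patients before and after an intervention
t, df = paired_t([10, 12, 14], [11, 14, 17])
print(t, df)
```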


    Analysis of Variance

    Used to test whether two or more samples come from the same population (the null hypothesis).

    With two samples, it is equivalent to the t test; it is usually used for 3 or more samples.

    If it appears the samples are not from the same population, ANOVA alone can't tell which sample is different; pairwise tests are needed.
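    Here is a sketch of the one-way ANOVA F statistic (between-group mean square divided by within-group mean square). `one_way_anova_f` and the data are illustrative. With two groups, F equals the square of the pooled-variance t statistic, which is the sense in which ANOVA reduces to the t test.

```python
import statistics

def one_way_anova_f(*groups):
    """One-way ANOVA: returns the F statistic and its (between, within) degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    # variability of the group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # variability of individual values around their own group mean
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, (k - 1, n - k)

f, df = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f, df)

# Two groups: F equals the square of the pooled-variance t statistic (here t = -2, so F = 4)
f2, _ = one_way_anova_f([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
```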


    Non-parametric Tests

    Testing proportions: (Pearson's) chi-squared (χ²) test; Fisher's exact test

    Testing ordinal variables: Mann-Whitney U test; Kruskal-Wallis one-way ANOVA

    Testing ordinal paired variables: sign test; Wilcoxon signed-rank test


    Use of Non-parametric Tests

    Use for categorical, ordinal, or non-normally distributed continuous data

    May run both parametric and non-parametric tests to check that they agree

    Most non-parametric tests are based on ranks or other methods that do not use the actual values

    Interpretation: Is the P value significant?


    (Pearson's) Chi-Squared (χ²) Test

    Used to compare the observed proportions of an event with the expected proportions.

    Used with nominal data (better/worse; dead/alive).

    If there is a substantial difference between observed and expected, the null hypothesis is likely to be rejected.

    Often presented as a 2 × 2 table.


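    Chi-Squared (χ²) Test

    Chi-squared formula: $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O is the observed count and E the expected count in each cell

    Not applicable in small samples: if any cell has an expected count below 5, use Fisher's exact test.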
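    Here is a sketch of the χ² calculation for a 2 × 2 table, with expected counts E = row total × column total / N. `chi_square_2x2` and the counts are hypothetical. The function also returns the smallest expected count so the small-sample rule can be checked. With 1 degree of freedom, a statistic above about 3.84 corresponds to P < 0.05. In practice, SciPy's `scipy.stats.chi2_contingency` (which by default applies Yates' continuity correction to 2 × 2 tables) and `scipy.stats.fisher_exact` handle the full tests.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2 x 2 table of counts.

    Also returns the smallest expected count, to check the expected-count-below-5 rule.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # E = row total x column total / N
            min_expected = min(min_expected, expected)
            chi2 += (observed - expected) ** 2 / expected
    return chi2, min_expected

# Hypothetical trial: rows = treatment/control, columns = improved/not improved
chi2, min_expected = chi_square_2x2([[10, 20], [20, 10]])
print(f"chi-square = {chi2:.2f}, smallest expected count = {min_expected}")
```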


    Correlation

    Assesses the linear relationship between two variables, e.g. height and weight

    The strength of the association is described by a correlation coefficient, r (ranges below refer to the magnitude of r):

    r = 0 to 0.2: low, probably meaningless

    r = 0.2 to 0.4: low, possible importance

    r = 0.4 to 0.6: moderate correlation

    r = 0.6 to 0.8: high correlation

    r = 0.8 to 1: very high correlation

    Can be positive or negative

    Pearson's and Spearman's correlation coefficients

    Tells nothing about causation
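    Here is a sketch of Pearson's r from its definition. `pearson_r` and the height/weight values are made up for illustration. Spearman's coefficient is, in the absence of ties, the same calculation applied to the ranks of the values.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient: co-variation of x and y scaled by their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights (cm) and weights (kg)
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 70, 74, 86]
r = pearson_r(heights, weights)
print(round(r, 3))
```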


    Correlation

    Source: Harris and Taylor. Medical Statistics Made Easy


    Correlation

    Perfect Correlation

    Source: Altman. Practical Statistics for Medical Research


    Correlation

    Source: Altman. Practical Statistics for Medical Research

    (Figure panels: correlation coefficient 0; correlation coefficient 0.3)


    Correlation

    Source: Altman. Practical Statistics for Medical Research

    (Figure panels: correlation coefficient -0.5; correlation coefficient 0.7)


    Regression

    Based on fitting a line to the data. Provides a regression coefficient, which is the slope of the line:

    Y = aX + b

    Used to predict the value of a dependent variable based on the value of an independent variable.

    Very helpful: in an analysis of height and weight, one can predict weight for a known height.

    Much more useful than correlation: allows prediction of values of Y rather than just showing whether there is a relationship between two variables.
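    Here is a least-squares sketch of Y = aX + b, reusing made-up height and weight values. `fit_line` is a hypothetical helper. The slope `a` is the regression coefficient, and the last line predicts weight for a known height.

```python
def fit_line(x, y):
    """Least-squares fit of Y = aX + b; returns slope a (the regression coefficient) and intercept b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    b = my - a * mx          # the fitted line passes through the point of means
    return a, b

# Hypothetical heights (cm) and weights (kg)
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 70, 74, 86]
a, b = fit_line(heights, weights)
predicted = a * 175 + b      # predicted weight for a hypothetical 175 cm patient
print(f"weight = {a:.2f} x height + {b:.1f}; predicted at 175 cm: {predicted:.1f} kg")
```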


    Regression

    Types of regression

    Linear: uses continuous data to predict a continuous outcome

    Logistic: uses continuous (or categorical) predictors to predict the probability of a dichotomous outcome

    Poisson: models counts or rates of events, often rare events

    Cox proportional hazards regression: survival analysis


    Multiple Regression Models

    Determine the association between two variables while controlling for the values of others.

    Example: uterine fibroids. Both age and race affect the incidence of fibroids.

    Multiple regression allows one to test the effect of age on incidence while controlling for race (and the other factors included in the model).


    Multiple Regression Models

    In published papers, the multivariable models are more powerful than univariable models and take precedence.

    We therefore discount the univariable model, as it does not control for confounding variables.

    E.g., coronary disease is potentially affected by age, HTN, smoking status, gender, and many other factors.

    If assessing whether height is a factor: if it is significant on univariable analysis but not on multivariable analysis, the association was likely confounded by these other factors.


    Risk Ratios

    Risk is the probability that an event will happen: the number of events divided by the number of people at risk.

    Risks are compared by creating a ratio. Example: the risk of colon cancer in those exposed to a factor vs those unexposed.

    Risk ratio: the risk of colon cancer in the exposed divided by the risk in the unexposed.
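    Here is a sketch of the risk and risk ratio calculation. The counts (30 of 1000 exposed and 10 of 1000 unexposed people developing colon cancer) are invented for illustration, and the function names are hypothetical.

```python
def risk(events, at_risk):
    """Risk: number of events divided by the number of people at risk."""
    return events / at_risk

def risk_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Risk in the exposed divided by risk in the unexposed."""
    return risk(events_exposed, n_exposed) / risk(events_unexposed, n_unexposed)

# Hypothetical cohort: 30 of 1000 exposed and 10 of 1000 unexposed develop colon cancer
rr = risk_ratio(30, 1000, 10, 1000)
print(f"risk ratio = {rr:.1f}")
```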


    Risk Ratios

    Typically used in cohort studies: prospective observational studies comparing groups with various exposures.

    Allow exploration of the probability that certain factors are associated with outcomes of interest, for example the association of smoking with lung cancer.

    Usually require large, long-term studies to determine risks and risk ratios.


    Interpreting Risk Ratios

    A risk ratio of 1 indicates no increased risk

    A risk ratio greater than 1 indicates increased risk

    A risk ratio less than 1 indicates decreased risk

    95% confidence intervals are usually presented; the interval must not include 1 for the estimate to be statistically significant.

    Example: a risk ratio of 3.1 (95% CI 0.97 to 9.41) includes 1, and thus would not be statistically significant.


    Odds Ratios

    Odds ratio: the odds of an event in one group divided by the odds of the event in another group.

    Odds are calculated as the number of times an event happens divided by the number of times it does not happen.

    The odds of heads in a fair coin toss are 1:1, or 1.


    Odds Ratios

    Are calculated from case-control studies

    Case-control: patients with a condition (often rare) are compared with a group of selected controls for exposure to one or more potential etiologic factors.

    Risk cannot be calculated from these studies, as that requires observing the natural occurrence of an event over time in exposed and unexposed patients (prospective cohort study).

    Instead, we can calculate the odds for each group.
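    For a case-control table, here is a sketch of the odds ratio: the odds of exposure among cases divided by the odds of exposure among controls. This equals the cross-product ad/bc of the 2 × 2 table. The counts and `odds_ratio` are hypothetical.

```python
def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds ratio from a case-control 2 x 2 table: odds of exposure in cases / odds in controls."""
    odds_cases = cases_exposed / cases_unexposed
    odds_controls = controls_exposed / controls_unexposed
    return odds_cases / odds_controls

# Hypothetical case-control study: 40 of 100 cases and 20 of 100 controls were exposed
or_estimate = odds_ratio(40, 60, 20, 80)
print(f"odds ratio = {or_estimate:.2f}")   # same as (40 x 80) / (60 x 20)
```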


    Comparing Risk and Odds Ratios

    For rare events, odds and risk are very similar

    If 5 of 100 people have a complication: the odds are 5/95, or 0.0526; the risk is 5/100, or 0.05.

    For more common events, they begin to differ

    If 30 of 100 people have a complication: the odds are 30/70, or 0.43; the risk is 30/100, or 0.30.

    For very common events, they are very different

    Male versus female births: the odds are 0.5/0.5, or 1; the risk is 0.5/1, or 0.5.
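    The numbers above can be reproduced with a short sketch. `risk_and_odds` is a hypothetical helper, and 50 of 100 stands in for the male/female births example.

```python
def risk_and_odds(events, total):
    """Risk = events / total; odds = events / non-events."""
    return events / total, events / (total - events)

# 50 of 100 stands in for the male/female births example
for events in (5, 30, 50):
    risk, odds = risk_and_odds(events, 100)
    print(f"{events} of 100: risk {risk:.2f}, odds {odds:.4f}")
```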


    Risk reduction

    Absolute risk reduction (ARR): the amount by which the risk is reduced.

    Relative risk reduction (RRR): the proportional (percentage) reduction.

    Example:

    Death rate without treatment: 10 per 1000

    Death rate with treatment: 5 per 1000

    ARR = 5 per 1000

    RRR = 50%
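    Here is a sketch of the ARR and RRR arithmetic using the death rates from the example. `risk_reductions` is a hypothetical helper.

```python
def risk_reductions(risk_control, risk_treated):
    """Absolute risk reduction and relative risk reduction from two risks."""
    arr = risk_control - risk_treated    # absolute difference in risk
    rrr = arr / risk_control             # reduction as a proportion of the control risk
    return arr, rrr

arr, rrr = risk_reductions(10 / 1000, 5 / 1000)
print(f"ARR = {arr * 1000:.0f} per 1000, RRR = {rrr:.0%}")
```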


    Survival Analysis

    Evaluation of time to an event (death, recurrence, recovery).

    Provides a means of handling censored data: patients who do not reach the event by the end of the study or who are lost to follow-up.

    The most common type is Kaplan-Meier analysis, with curves presented as stepwise changes from baseline.

    There are no fixed intervals of follow-up; the survival proportion is recalculated after each event.
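    Here is a minimal sketch of the Kaplan-Meier (product-limit) estimate. It shows how censored patients leave the risk set without changing the survival proportion, and how survival is recalculated only at event times. `kaplan_meier` and the five-patient data set are hypothetical; dedicated survival packages provide complete implementations with confidence intervals.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) survival estimate.

    times  -- follow-up time for each patient
    events -- 1 if the event occurred at that time, 0 if the patient was censored
    Returns (time, survival proportion) pairs, one per distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        # all patients whose follow-up ends at time t (events and censorings)
        ending = [e for tt, e in data if tt == t]
        deaths = sum(ending)
        if deaths:
            # survival is recalculated only when an event occurs
            survival *= (at_risk - deaths) / at_risk
            steps.append((t, survival))
        # censored patients leave the risk set without changing survival
        at_risk -= len(ending)
        i += len(ending)
    return steps

# Hypothetical follow-up (months) for 5 patients; 0 marks censoring
steps = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
for t, s in steps:
    print(f"month {t}: survival {s:.3f}")
```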


    Survival Analysis

    Source: Altman. Practical Statistics for Medical Research


    Kaplan-Meier Curve

    Source: Wikipedia


    Kaplan-Meier Analysis

    Provides a graphical means of comparing the outcomes of two groups that differ by intervention or another factor.

    Survival rates can be read directly from the curve.

    The difference between curves can be tested for statistical significance.


    Cox Regression Model

    AKA: proportional hazards survival model.

    Used to investigate the relationship between an event (death, recurrence) occurring over time and possible explanatory factors.

    Reported result: hazard ratio (HR), the hazard in one group divided by the hazard in another.

    Interpreted the same way as risk ratios and odds ratios:

    HR = 1: no effect

    HR > 1: increased risk

    HR < 1: decreased risk


    What do you mean??


    Summary

    Understanding basic statistical concepts is central to understanding the medical literature.

    It is not essential to understand the derivation of the tests or the underlying math.

    What you need to know is when a test should be used and how to interpret its results.
