8/12/2019 8[1].Basic Stat Inference
1/41
Basics of
Statistical Inference
V. Sreenivas
-
Basics of Statistical Inference
Importance
Primary entities in statistical inference
Types of statistical inference
Estimation considerations
Measure of accuracy of estimates
General framework of testing
Types of errors in statistical testing
Form of a Statistical test
Interpretation of a test result
-
Basics of Statistical Inference
In all clinical/epidemiological studies, the information collected represents only a sample from the target population
Drawing conclusions about the population depends on statistical analysis of the data
So the basis of statistical inference is important to understand & interpret the results from epidemiological studies
-
Basics of Statistical Inference
3 primary entities
The target population
Set of characteristics or variables
Probability distribution of the characteristics
-
Basics of Statistical Inference
Population
Collection of units of observation that are of
interest & are the target of the investigation
Eg. In studying the prevalence of
osteoporosis in women of a city, all the
women in that city would form the target
population
Essential to identify the population clearly &
precisely
-
Basics of Statistical Inference
Variables
Once the population is identified, clearly define what characteristics of the units of this population are to be studied
In the above example, we need to define:
Osteoporosis (reliable & valid method of diagnosis: DEXA / Ultrasound, normal values of BMD etc.)
Clear & precise methods of measuring these characteristics are essential for the success of the study
-
Basics of Statistical Inference
Variables
Qualitative: Take a few possible values (Eg. Sex, Disease status)
Quantitative: Can theoretically take any value within a specified range (Eg. Blood sugar, Syst. BP)
Type of analysis depends on the type of the
variable
-
Basics of Statistical Inference
Probability distribution
Most crucial link between population & its
characteristics
Allows us to draw inferences on the population,
based on a sample
Tells what different values a variable can take
and how frequently each value can occur in the
population
-
Basics of Statistical Inference
Probability distribution
Common distributions in health research are Binomial, Poisson & Normal
Eg. Incidence of a relatively common disease may be approximated by a Binomial distribution
Incidence of a rare disease can be considered to have a Poisson distribution
Continuous variables are often considered to be Normally distributed
-
Probability distribution
A Prob. distribution is characterized by certain quantities called parameters
These quantities allow us to calculate the probabilities of various events concerning the variable
Eg. The Binomial dist. has 2 parameters, n and p.
This distribution occurs when a fixed number (n) of subjects is observed, the characteristic is dichotomous in nature and each subject has the same probability (p) of having one value and (1-p) of having the other value
The statistical inference then involves finding out the value of p in the population, based on a carefully selected sample
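As a rough illustration of how the parameters n and p determine the probabilities of events, the sketch below evaluates the standard Binomial formula P(X = k) = C(n, k) p^k (1-p)^(n-k). The function name `binomial_pmf` is chosen here for illustration only; n = 10 and p = 0.5 are example values.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k subjects having the characteristic among n,
    when each subject independently has probability p of having it."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Example values (illustrative): n = 10 subjects, p = 0.5
n, p = 10, 0.5
probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(probabilities):
    print(f"P(X = {k:2d}) = {prob:.4f}")
```

The probabilities over all possible values 0..n add up to 1, which is what makes this a probability distribution.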
-
Binomial distribution for n = 10 & p = 0.5
[Figure: bar chart of probability (vertical axis, 0.00 to 0.25) against number of successes (horizontal axis, 0 to 10)]
-
Probability distribution
Eg. The Normal distribution is a mathematical curve represented by two quantities (μ, σ): the mean and the standard deviation respectively.
Many quantitative characteristics approximately follow this
Symmetric, bell-shaped curve
One half is the mirror image of the other
Mean, median & mode are the same and are at the center
Mean ± 1 SD covers about 68% of the data, ± 2 SD about 95%, ± 3 SD about 99.7%
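The coverage figures for a Normal curve can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `coverage_within` is an illustrative choice, not part of the original slides.

```python
from statistics import NormalDist

def coverage_within(z: float) -> float:
    """Fraction of a Normal distribution lying within z standard
    deviations on either side of the mean."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(z) - standard_normal.cdf(-z)

for z in (1.0, 1.96, 2.0, 2.58, 3.0):
    print(f"Mean ± {z:.2f} SD covers {100 * coverage_within(z):.1f}% of values")
```

The multipliers 1.96 and 2.58 are the ones that give 95% and 99% coverage, which is why they appear in confidence intervals.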
-
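Empirical properties of a Normal Deviate
[Figure: Normal curve centered at the mean of X (Z = 0), with three shaded ranges]
X ± 1 SD (Z = ±1): about 68.3% of the area
X ± 1.96 SD (Z = ±1.96): 95.0% of the area
X ± 2.58 SD (Z = ±2.58): 99.0% of the area under the curve
X: Variable in original units; Z: Standardized variable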
-
Statistical inference
Estimation
We estimate some characteristic of the
population, based
on a sample
Testing
We test some
hypotheses about
the population
parameters
-
Descriptive Studies
In these, generally the objective is:
To estimate the values of the parameters of the Prob. dist., based on the sampled observations
Best guess of the value in the population and a measure of accuracy of this estimate are obtained
-
Estimation
Best guesses
Population mean: Sample mean
Population proportion: Sample proportion
Considerations:
Consistency: As the sample size increases, the estimates approach their target values
Unbiasedness: The average value of the estimate over a large number of repeated samples of the same size will be equal to the population value
Maximum likelihood: That value of the parameter which maximizes the probability of observing the sample that has actually been observed
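To make the maximum likelihood idea concrete, the sketch below searches over candidate values of a Binomial proportion p and picks the one under which the observed sample is most probable. The data (30 of 100 subjects) and the grid of candidate values are hypothetical, chosen only for illustration.

```python
from math import comb

def binomial_likelihood(p: float, successes: int, n: int) -> float:
    """Probability of observing `successes` out of n if the true proportion is p."""
    return comb(n, successes) * p**successes * (1 - p) ** (n - successes)

# Hypothetical sample: 30 of 100 subjects have the characteristic
successes, n = 30, 100

# Candidate values of p from 0.001 to 0.999 in steps of 0.001
candidate_values = [i / 1000 for i in range(1, 1000)]
mle = max(candidate_values, key=lambda p: binomial_likelihood(p, successes, n))

print(f"Value of p maximizing the likelihood: {mle:.3f}")
print(f"Sample proportion:                    {successes / n:.3f}")
```

The grid search lands on the sample proportion, which is why the sample proportion is the "best guess" for the population proportion.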
-
Accuracy of estimates
When an estimate (E) of a parameter is obtained, we need to know how this value (E) would change if another sample is studied
The distribution of values of E over different repeated samples (under identical conditions) is known as the sampling distribution of E
This sampling distribution can be determined empirically or purely based on sampling theory
The standard deviation of the sampling distribution of E is called the Standard Error (SE)
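The empirical route to a sampling distribution can be sketched by simulation: draw many samples from a known population, compute the mean of each, and look at how those means spread. The population values, sample size and number of repeats below are hypothetical, and a Normal population is assumed.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population and design values (assumptions for illustration)
POP_MEAN, POP_SD = 0.70, 0.12
SAMPLE_SIZE = 150
REPEATS = 2000

# Repeat the "study" many times and keep the estimate (sample mean) each time
sample_means = []
for _ in range(REPEATS):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    sample_means.append(mean(sample))

empirical_se = stdev(sample_means)            # spread of the estimates
theoretical_se = POP_SD / sqrt(SAMPLE_SIZE)   # sampling theory: SD / sqrt(n)

print(f"Empirical SE of the mean:   {empirical_se:.4f}")
print(f"Theoretical SE (SD/sqrt n): {theoretical_se:.4f}")
```

The two values should agree closely, which is the sense in which the SE is the standard deviation of the sampling distribution.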
-
Accuracy of Estimates
Once the sampling distribution of the estimate is known, we can answer:
How close is my estimate likely to be to the true value of the parameter?
We can state with a certain confidence that the true value will be within a certain interval (Confidence Interval)
-
Confidence Interval
The more confidence required, the wider the interval, for a given sample size
Intuitively, the more information we have (larger sample), the smaller the width of the interval (the more certain we are about the result)
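Both effects on the width can be seen in a small calculation. The sketch below assumes a Normal-based interval for a mean (estimate ± z × SD/√n); the SD of 0.12 and the sample sizes are hypothetical values, and `ci_width` is an illustrative helper name.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(sd: float, n: int, confidence: float) -> float:
    """Full width of a Normal-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # e.g. 1.96 for 95%
    return 2 * z * sd / sqrt(n)

SD = 0.12  # hypothetical standard deviation
for confidence in (0.90, 0.95, 0.99):
    for n in (50, 150, 600):
        width = ci_width(SD, n, confidence)
        print(f"confidence {confidence:.0%}, n = {n:3d}: width = {width:.4f}")
```

Under this formula, quadrupling the sample size halves the width of the interval.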
-
Estimation of parameters from a Normal population - Example
The average Bone Mineral Density (BMD) of 150 elderly women (60+ years) is 0.678 gm/cm2 with a SD of 0.12 gm/cm2. What is the 95% C.I. of the mean BMD?
Sample size (n) = 150 Mean = 0.678 SD = 0.12
It has been shown that the sample Mean will have a Normal distribution, with the population mean as its mean and the Standard Error $SE = \sigma/\sqrt{n}$
$SE = 0.12/\sqrt{150} \approx 0.0098$
-
Confidence Interval for
BMD in elderly women
We have the Mean = 0.678, $SE = 0.12/\sqrt{150} \approx 0.0098$; and we also know that the Mean follows a Normal Distribution
Using the Normal distribution properties, we know that Mean ± 1.96 SE covers 95% of values
0.678 ± (1.96*0.0098) = (0.659, 0.697); intervals built this way cover the true mean in about 95% of repeated studies
This interval is called the 95% CI for mean BMD
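The BMD calculation can be reproduced in a few lines. This is a minimal sketch that takes the Standard Error as SD/√n (the SD of 0.12, not the mean, divided by √150) and uses the Normal multiplier for 95% confidence; the function name `mean_confidence_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(mean: float, sd: float, n: int,
                             confidence: float = 0.95):
    """Normal-based confidence interval for a mean.
    Returns (lower limit, upper limit, standard error)."""
    se = sd / sqrt(n)                                    # SE = SD / sqrt(n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    return mean - z * se, mean + z * se, se

# Values from the BMD example: n = 150, mean = 0.678, SD = 0.12
lower, upper, se = mean_confidence_interval(mean=0.678, sd=0.12, n=150)
print(f"SE = {se:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```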
-
Interpretation of CI
Mean BMD = 0.678 and 95% CI: (0.659, 0.697), using $SE = 0.12/\sqrt{150}$
If we repeat the study 100 times and compute the interval each time, about 95 of those intervals will cover the true mean BMD
Another interpretation is:
We can be 95% confident that these two limits cover the true, unknown, but fixed value of mean BMD in the elderly women
That is, we are 95% confident that the truth is somewhere in this interval
Do not get the impression that the truth varies from 0.66 to 0.70
-
Interpretation of CI
Mean BMD = 0.678 and 95% CI: (0.659, 0.697), using $SE = 0.12/\sqrt{150}$
The narrower the interval, the more confident we are of the result
Alternatively, the wider this interval, the less certain we are about the result
-
Analytical Studies
Involve testing of hypotheses
The study will have formulated research questions (hypotheses)
Eg. Is treatment A superior to treatment B?
Based on the observations from the sample, we need to draw conclusions
Inference is a 2 step process:
- Estimate the parameters
- Test the hypotheses involving these parameters
-
Statistical Tests of Hypotheses
Step 1: Identify the Null Hypothesis (H0)
- No additional effect of the new treatment;
- No difference in prevalence rates;
- Relative risk is one etc.
It should be testable
- Possible to identify which parameters
need to be estimated and their sampling
distribution, given the study design
-
Statistical tests of Hypotheses
Null hypothesis: cure rate p1 = p2
Alternative hypotheses to the null hypothesis:
The cure rates are different (p1 ≠ p2: two-tailed alternative)
The cure rate with the new method is higher (p2 > p1: one-tailed alternative)
-
Step 2: Determine the levels of errors that can be acceptable
Decision      Truth in the population
              H0 is true          H0 is false
Accept H0     No error            Type II error (β)
Reject H0     Type I error (α)    No error
Analogous with a laboratory test
α: False Positivity
β: False Negativity
1-β: Sensitivity (Power of a test)
-
Step 2: Determine the levels of errors that can be acceptable
Impossible to reduce both the errors simultaneously for a fixed sample size
One decreases when the other increases
Design the study with a desired level of α and minimize β
The choice of α & β is made after determining the consequences of each of the errors and is made at the design stage itself
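The trade-off between the two errors can be illustrated numerically. The sketch below assumes a two-sided Z test and computes the Type II error (β) for several choices of α, holding the true difference and its Standard Error fixed; both of those values are hypothetical, as is the helper name `type_ii_error`.

```python
from statistics import NormalDist

def type_ii_error(alpha: float, true_difference: float, se: float) -> float:
    """Probability of accepting H0 (beta) in a two-sided Z test when the
    true difference is `true_difference` and its standard error is `se`."""
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    shift = true_difference / se  # where the test statistic is centered when H0 is false
    # H0 is accepted when the test statistic falls inside (-z_crit, +z_crit)
    return normal.cdf(z_crit - shift) - normal.cdf(-z_crit - shift)

# Hypothetical values, fixed sample size (hence fixed SE)
TRUE_DIFFERENCE, SE = 0.03, 0.012
ALPHAS = (0.10, 0.05, 0.01)
betas = [type_ii_error(alpha, TRUE_DIFFERENCE, SE) for alpha in ALPHAS]

for alpha, beta in zip(ALPHAS, betas):
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}, power = {1 - beta:.3f}")
```

With the sample size held fixed, a stricter α gives a larger β, which matches the statement that one error decreases when the other increases.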
-
Step 3: Determine the best Statistical test for
the stated Null hypothesis
Depends on:
Study design (Cross over or Independent
groups, Paired or Unpaired observations etc.)
Type of variable (Qualitative / Quantitative)
The properties of the study variable
(Binomial/Normal distribution, Standard
Error of the estimate etc.)
-
Step 3: Determining the best test
Common tests of significance
t test
Chi-square (χ2) test
Z test
Non-parametric tests
Involves calculating a critical ratio that helps to make a decision
-
Tests of significance
Critical Ratio (Test Statistic) = (Estimate of the parameter) / (SE of that estimate)
If we are comparing two proportions:
Critical Ratio = Z = (Difference between the two proportions) / (SE of the difference between the two proportions)
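A critical ratio for two proportions might be computed as in the sketch below. It assumes the common pooled-proportion form of the Standard Error under the null hypothesis p1 = p2, which the slides do not specify; the cure counts are hypothetical and `two_proportion_z` is an illustrative name.

```python
from math import sqrt

def two_proportion_z(successes_1: int, n_1: int,
                     successes_2: int, n_2: int) -> float:
    """Critical ratio Z = (p2 - p1) / SE of the difference."""
    p1 = successes_1 / n_1
    p2 = successes_2 / n_2
    # Assumption: pooled proportion, valid under the null hypothesis p1 = p2
    pooled = (successes_1 + successes_2) / (n_1 + n_2)
    se_difference = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    return (p2 - p1) / se_difference

# Hypothetical data: 60/100 cured on the old method, 75/100 on the new one
z = two_proportion_z(60, 100, 75, 100)
print(f"Z = {z:.3f}")
```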
-
Step 4: Perform the Statistical Test
- Calculate the test statistic (Z / χ2 / t etc.)
- Using the properties of the distribution of the test statistic, obtain the probability of observing such a value of the statistic, or a more extreme one
- This probability (the p value) is the probability of getting the observed value of the test statistic, or a more extreme one, if the Null hypothesis is true
- If this is small, the Null hypothesis is an unlikely explanation for the results: reject the Null hypothesis (Significant result). If not, do not reject it
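For a Z statistic, the step from test statistic to probability to decision can be sketched as follows. The sketch assumes a two-sided test and a significance level of 0.05 by default; the function names are illustrative.

```python
from statistics import NormalDist

def two_sided_p_value(z: float) -> float:
    """Probability, if the Null hypothesis is true, of a Z value at least
    as extreme as the observed one in either direction."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(z: float, alpha: float = 0.05) -> str:
    """Compare the p value with the chosen significance level alpha."""
    return "reject H0" if two_sided_p_value(z) < alpha else "do not reject H0"

for z in (0.5, 1.96, 2.58, 3.5):
    print(f"Z = {z:.2f}: p = {two_sided_p_value(z):.4f} -> {decide(z)}")
```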
-
Acceptance & Rejection regions for a paired t test
[Figure: t distribution centered at 0, with critical values -t(n-1, 1-α/2) and +t(n-1, 1-α/2) marked]
Acceptance region: |t| < t(n-1, 1-α/2)
Rejection regions: t < -t(n-1, 1-α/2) or t > t(n-1, 1-α/2)
-
Step 5: If the Null hypothesis is not rejected at the given level of significance, the statistical power of the test (1-β) should be computed
Recall that β is the probability of accepting H0 when it is false. So 1-β will be the prob. of rejecting H0 when it is false. If this quantity is low, we recommend that the study be repeated with a larger sample
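How power grows with sample size can be sketched for a comparison of two means. The sketch assumes a two-sided Z test, equal group sizes and a common SD; the difference of 0.03 and SD of 0.12 are hypothetical values, and `power_two_means` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def power_two_means(difference: float, sd: float, n_per_group: int,
                    alpha: float = 0.05) -> float:
    """Approximate power (1 - beta) of a two-sided Z test comparing two
    means, assuming equal group sizes and a common SD."""
    normal = NormalDist()
    se = sd * sqrt(2 / n_per_group)      # SE of the difference in means
    z_crit = normal.inv_cdf(1 - alpha / 2)
    shift = abs(difference) / se
    # Probability that the test statistic lands in either rejection region
    return (1 - normal.cdf(z_crit - shift)) + normal.cdf(-z_crit - shift)

# Hypothetical true difference of 0.03 with SD 0.12 in each group
for n in (50, 150, 600):
    print(f"n per group = {n:3d}: power = {power_two_means(0.03, 0.12, n):.3f}")
```

A low value at the sample size actually used suggests that a non-significant result may simply reflect too small a study.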
-
Statistical test of Hypothesis: An example
We wish to compare the BMD of Indian elderly
women with Caucasian elderly women
We hypothesize that Indian women will have lower
BMD
Our Null hypothesis: both groups will have equal
BMD level
Our alternative hypothesis is: the BMD levels of the two groups will be unequal
We collected data on 150 Indian women; data on Caucasian women are available from the literature
-
Statistical test of Hypothesis: An example
Indian data: $n_1 = 150$, $\bar{x}_1 = 0.678$, $S_1 = 0.12$
Caucasian data: $n_2 = 300$, $\bar{x}_2 = 0.854$, $S_2 = 0.11$
Since the sample sizes are large, we can apply a test called the Z test, and the Z statistic is calculated as:
$Z = \dfrac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$
Calculations give us a Z value:
0.176/0.0117 = 15.04
-
Statistical test of Hypothesis: An example
From the data, we have Z = 15.04
We know that, if the Null hypothesis is true, Z follows a standard Normal distribution
Using the properties of the Normal distribution we realize that the probability of observing a value of Z this large, or more extreme in either direction, is < 0.000001, i.e. less than one in a million
In other words, if our Null hypothesis is correct, our chance of finding a Z = 15.04 is extremely small
We therefore doubt the Null hypothesis, reject it, and conclude that the two groups have statistically significantly different BMD levels
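The worked example can be checked with a short calculation. This sketch uses the summary figures quoted for the two groups (150 Indian women: mean 0.678, SD 0.12; 300 Caucasian women: mean 0.854, SD 0.11) and the large-sample Z statistic; the function name `two_sample_z` is illustrative. Without rounding the SE to 0.0117, Z comes out slightly above the slide's 15.04.

```python
from math import erfc, sqrt

def two_sample_z(mean_1: float, sd_1: float, n_1: int,
                 mean_2: float, sd_2: float, n_2: int):
    """Large-sample Z statistic for the difference between two means.
    Returns (Z, SE of the difference)."""
    se_difference = sqrt(sd_1**2 / n_1 + sd_2**2 / n_2)
    return abs(mean_1 - mean_2) / se_difference, se_difference

# Summary data from the example
z, se = two_sample_z(0.678, 0.12, 150, 0.854, 0.11, 300)

# Two-sided tail area of the standard Normal beyond |Z|;
# erfc keeps precision for very small probabilities
p_value = erfc(z / sqrt(2))

print(f"SE of difference = {se:.4f}")
print(f"Z = {z:.2f}")
print(f"Two-sided p value = {p_value:.3g}")
```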
-
Summary
3 entities viz. Population, Variables & Probability distribution of the variables are important in Statistical Inference
Estimation & Testing are 2 components of Statistical Inference
Descriptive studies generally deal with estimation & Analytical studies deal with testing of hypotheses
-
Summary contd.
Estimation is followed by a measure of accuracy: the Confidence Interval
2 types of errors can be committed in statistical testing
The Type I error rate (α, False Positivity) is the significance level against which the usual p value is compared
The complement of the Type II error (False Negativity) is called the Power of the test (Sensitivity)
The test statistic is generally of the form of a ratio of 2 quantities
-
Summary contd.
The calculated Ratio under given
circumstances follows a known pattern called
its distribution
Using this distribution, we can know the probability of observing a Ratio of the
magnitude that is observed, by chance alone
If this chance probability is low, chance is
unlikely to explain the observed result and we
Reject the null hypothesis
-
Summary contd.
If the Null hypothesis is rejected, we attribute the
observed difference to the exposure under
consideration
If the null hypothesis is not rejected (accepted), we
should make sure that our data are sensitive enough for us to
believe the negative result (statistical power
should be calculated)