Post on 22-Dec-2015
Basic Statistical Concepts
Donald E. Mercante, Ph.D.
Biostatistics
School of Public Health
L S U - H S C
Two Broad Areas of Statistics
Descriptive Statistics
- Numerical descriptors
- Graphical devices
- Tabular displays
Inferential Statistics
- Hypothesis testing
- Confidence intervals
- Model building/selection
Descriptive Statistics
When computed for a population of values, numerical descriptors are called
Parameters
When computed for a sample of values, numerical descriptors are called
Statistics
Descriptive Statistics
Two important aspects of any population
Magnitude of the responses
Spread among population members
Descriptive Statistics
Measures of Central Tendency (magnitude)
Mean - most widely used
- uses all the data
- best statistical properties
- susceptible to outliers
Median - does not use all the data
- resistant to outliers
Descriptive Statistics
Measures of Spread (variability)
range - simple to compute
- does not use all the data
variance - uses all the data
- best statistical properties
- measures average squared distance of values from a reference point (the mean)
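The contrast between these measures can be sketched numerically. The data below are hypothetical (not from the slides); the last value of the second sample is a deliberate outlier, which shows how the mean and range react while the median barely moves.

```python
import statistics

# Hypothetical sample; values are illustrative only.
values = [118, 120, 122, 124, 126]
# Same sample with one deliberate outlier appended.
with_outlier = values + [190]


def describe(data):
    """Return the descriptive statistics named on the slides."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": max(data) - min(data),
        # Sample variance: squared deviations from the mean, divided by n - 1.
        "variance": statistics.variance(data),
    }


clean = describe(values)
contaminated = describe(with_outlier)
print(clean)
print(contaminated)
```

The variance here uses the n - 1 divisor, which is the usual sample variance; that choice is an assumption, since the slides do not state the divisor.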
Properties of Statistics
• Unbiasedness - On target
• Minimum variance - Most reliable
• If an estimator possesses both properties then it is a MINVUE = MINimum Variance Unbiased Estimator
• Sample Mean and Variance are UMVUE = Uniformly MINimum Variance Unbiased Estimator
Inferential Statistics
- Hypothesis Testing
- Interval Estimation
Hypothesis Testing
Specifying hypotheses:
H0: “null” or no effect hypothesis
H1: research or alternative hypothesis
Note: Only H0 (null) is tested.
Errors in Hypothesis Testing
                      Reality
Decision              H0 True             H0 False
Fail to Reject H0     Correct decision    Type II error
Reject H0             Type I error        Correct decision
Hypothesis Testing
In parametric tests, actual parameter values are specified for H0 and H1.
H0: µ ≤ 120
H1: µ > 120
Hypothesis Testing
Another example of explicitly specifying H0 and H1.
H0: θ = 0
H1: θ ≠ 0
Hypothesis Testing
General framework:
• Specify null & alternative hypotheses
• Specify test statistic
• State rejection rule (RR)
• Compute test statistic and compare to RR
• State conclusion
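The five steps above can be sketched for a two-sided one-sample t-test. The readings and the null value of 130 are hypothetical; the critical value 2.201 is the two-sided 5% point of the t distribution with 11 degrees of freedom, entered by hand rather than computed.

```python
import math
import statistics

# Hypothetical systolic blood pressure readings (n = 12); not from the slides.
sample = [128, 135, 131, 142, 138, 126, 133, 140, 129, 137, 134, 131]

# Step 1: specify hypotheses. H0: mu = 130 versus H1: mu != 130.
mu0 = 130.0  # hypothetical null value

# Step 2: specify the test statistic, t = (xbar - mu0) / (s / sqrt(n)).
n = len(sample)
xbar = statistics.mean(sample)
s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)

# Step 3: state the rejection rule. Reject H0 if |t| > t_crit.
t_crit = 2.201  # two-sided 5% critical value for 11 degrees of freedom

# Step 4: compute the test statistic and compare it to the rejection rule.
t_stat = (xbar - mu0) / (s / math.sqrt(n))
reject = abs(t_stat) > t_crit

# Step 5: state the conclusion.
conclusion = "Reject H0" if reject else "Fail to reject H0"
print(f"t = {t_stat:.3f}, critical value = {t_crit}: {conclusion}")
```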
Common Statistical Tests
Test Name                Purpose
One-sample (z) t-test    Test value of a mean
Two-sample (z) t-test    Compare two means
Paired t-test            Compare difference in means (compare related means)
ANOVA                    Test for differences in 2 or more means
Common Statistical Tests (cont.)
Test                                  Purpose
Test on binomial proportion(s)        Test whether binomial proportions = 0, or equal each other
Test on correlation coefficient(s)    Test whether correlation coefficients = 0, or equal each other
Regression                            Test whether slope = 0
RxC contingency table analysis        Test whether two categorical variables are related
Advanced Topics
Test                                Purpose
Multivariate Tests, e.g., MANOVA    Test value of several parameters simultaneously
Repeated Measures / Crossovers      Test means when subjects are repeatedly measured
Survival Analysis                   Estimate and compare survival probabilities for one or more groups
Nonparametric Tests                 Many analogous to standard parametric tests
P-Values
p = Probability of obtaining a result at least this extreme given the null is true.
P-values are probabilities
0 ≤ p ≤ 1
Computed from distribution of the test statistic
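As a minimal sketch of how a p-value comes from the distribution of the test statistic, the snippet below assumes a test statistic that is standard normal under H0 (a z-test) and a two-sided alternative. The observed value 1.96 and the function name are illustrative, not from the slides.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p-value for a test statistic that is N(0, 1) under H0."""
    # P(|Z| >= |z|) equals erfc(|z| / sqrt(2)) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2))


z_observed = 1.96  # hypothetical observed test statistic
p_value = two_sided_p_from_z(z_observed)
print(f"z = {z_observed}, two-sided p = {p_value:.4f}")
```

For other test statistics (t, chi-square, F) the same idea applies, but the tail probability comes from that statistic's own distribution.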
Epidemiological Concepts
Rate: a proportion, specifically a fraction
$$\text{rate} = \frac{c}{c+d}$$
where the numerator, c, is included in the denominator.
- Useful for comparing groups of unequal size
Example:
$$\text{neonatal mortality rate} = \frac{\#\ \text{deaths} < 28\ \text{days old}}{\text{total}\ \#\ \text{live births}}$$
Measures of Morbidity:
Incidence Rate: # new cases occurring during a given time interval divided by population at risk at the beginning of that period.
Prevalence Rate: total # cases at a given time divided by population at risk at that time.
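The two morbidity measures can be sketched as simple ratios. All counts below are hypothetical and the function names are illustrative; the point is only which numerator goes over which denominator.

```python
def incidence_rate(new_cases, at_risk_at_start):
    """New cases during the interval / population at risk at its start."""
    return new_cases / at_risk_at_start


def prevalence_rate(total_cases, at_risk_now):
    """All existing cases at a given time / population at risk at that time."""
    return total_cases / at_risk_now


# Hypothetical counts for a population of 10,000.
incidence = incidence_rate(new_cases=30, at_risk_at_start=10_000)
prevalence = prevalence_rate(total_cases=250, at_risk_now=10_000)
print(f"incidence = {incidence:.4f}, prevalence = {prevalence:.4f}")
```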
Epidemiological Concepts
Most people think in terms of probability (p) of an event as a natural way to quantify the chance an event will occur => 0<=p<=1
0 = event will certainly not occur
1 = event certain to occur
But there are other ways of quantifying the chances that an event will occur….
Epidemiological Concepts
Odds and Odds Ratio:
For example, O = 4 means we expect 4 times as many occurrences as non-occurrences of an event.
In gambling, we say, the odds are 5 to 2. This corresponds to the single number 5/2 = Odds.
Epidemiological Concepts
$$O = \text{Odds of an event} = \frac{\text{expected \# times an event will occur}}{\text{expected \# times the event will not occur}}$$
The relationship between probability & odds
Epidemiological Concepts
$$O = \frac{p}{1-p} = \frac{\text{prob of event}}{\text{prob of no event}}$$
$$p = \frac{O}{1+O}$$
Epidemiological Concepts
Probability    Odds
.1             .11
.2             .25
.3             .43
.4             .67
.5             1.00
.6             1.50
.7             2.33
.8             4.00
.9             9.00
Odds < 1 correspond to probabilities < 0.5
0 < Odds < ∞
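The probability-to-odds table can be reproduced from the relationship O = p/(1-p), with p = O/(1+O) as its inverse. The function names below are illustrative.

```python
def prob_to_odds(p):
    """Convert a probability (0 <= p < 1) to odds: O = p / (1 - p)."""
    return p / (1.0 - p)


def odds_to_prob(odds):
    """Convert odds (O >= 0) back to a probability: p = O / (1 + O)."""
    return odds / (1.0 + odds)


# Print the same grid of probabilities used in the table.
print("Probability  Odds")
for tenth in range(1, 10):
    p = tenth / 10
    print(f"{p:.1f}          {prob_to_odds(p):.2f}")
```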
Death sentence by race of defendant in 147 trials
         Blacks    Nonblacks    Total
Death    28        22           50
Life     45        52           97
Total    73        74           147
Example 1: Odds Ratio
Odds of death sentence = 50/97 = 0.52
For Blacks: O = 28/45 = 0.62
For Nonblacks: O = 22/52 = 0.42
Ratio of Black Odds to Nonblack Odds = 1.47
This is called the Odds Ratio
Example 2: Odds Ratio
$$OR = \frac{28/45}{22/52} = \frac{28 \times 52}{22 \times 45} = \frac{1456}{990} = 1.47$$
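The odds ratio for the death sentence table can be checked both ways, as a ratio of two odds and as a cross-product of the four cells. The counts are those given in the table; the variable and function names are illustrative.

```python
# Cell counts from the death sentence table.
death_black, life_black = 28, 45
death_nonblack, life_nonblack = 22, 52


def odds(events, non_events):
    """Odds = number of occurrences / number of non-occurrences."""
    return events / non_events


odds_black = odds(death_black, life_black)
odds_nonblack = odds(death_nonblack, life_nonblack)

# Odds ratio as a ratio of the two group odds.
odds_ratio = odds_black / odds_nonblack

# Equivalent cross-product form: (28 * 52) / (22 * 45).
cross_product = (death_black * life_nonblack) / (death_nonblack * life_black)

print(f"odds (Blacks) = {odds_black:.2f}")
print(f"odds (Nonblacks) = {odds_nonblack:.2f}")
print(f"odds ratio = {odds_ratio:.2f}")
```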
Odds ratios are directly related to the parameters of the logit (logistic regression) model.
Logistic Regression is a statistical method that models binary (e.g., Yes/No; T/F; Success/Failure) data as a function of one or more explanatory variables.
We would like a model that predicts the probability of a success, i.e., P(Y=1), using a linear function.
Logistic Regression
Problem: Probabilities are bounded by 0 and 1.
But linear functions are inherently unbounded.
Solution: Transform P(Y=1) = p to an odds, p/(1-p), which removes the upper bound. If we take the log of the odds the lower bound is also removed.
Setting this result equal to a linear function of the explanatory variables gives us the logit model.
Logistic Regression
Logit or Logistic Regression Model
$$\log\left(\frac{p_i}{1-p_i}\right) = \alpha + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik}$$
where $p_i$ is the probability that $y_i = 1$.
The expression on the left is called the logit or log odds.
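Logistic Regression
Probability of success:
$$p_i = P(Y_i = 1) = \frac{1}{1 + e^{-(\alpha + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik})}}$$
Odds Ratio for Each Explanatory Variable:
$$OR \text{ for } X_j = e^{\beta_j}, \quad j = 1, \ldots, k$$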
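A minimal sketch of these relationships, assuming the coefficients are already known. The intercept, slopes, and covariate values below are hypothetical, not fitted to any data, and the function names are illustrative.

```python
import math


def logit(p):
    """Log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def predicted_probability(alpha, betas, xs):
    """P(Y = 1) from the logit model: 1 / (1 + exp(-(alpha + sum(beta * x))))."""
    linear = alpha + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-linear))


def odds_ratios(betas):
    """Odds ratio for a one-unit increase in each explanatory variable."""
    return [math.exp(b) for b in betas]


# Hypothetical coefficients and covariate values for one subject.
alpha = -1.0
betas = [0.5, -0.25]
xs = [2.0, 4.0]

p = predicted_probability(alpha, betas, xs)
print(f"P(Y=1) = {p:.4f}, log odds = {logit(p):.4f}")
print("odds ratios:", [round(o, 3) for o in odds_ratios(betas)])
```

Applying `logit` to the predicted probability recovers the linear predictor, which is the sense in which the model is linear on the log odds scale.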
Suppose a new screening test for herpes virus has been developed and the following summary for 1000 individuals has been compiled:
                     Has Herpes    Does Not Have Herpes
Screened Positive    45            10
Screened Negative    5             940
Screening Tests
How do we evaluate the usefulness of such a test?
Diagnostics:
sensitivity
specificity
False positive rate
False negative rate
predictive value positive
predictive value negative
Screening Tests
Generic Screening Test Table
                     With Disease    Without Disease    Total
Screened Positive    a               b                  a+b
Screened Negative    c               d                  c+d
Total                a+c             b+d                N
Screening Tests
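$$\text{Sensitivity} = \frac{a}{a+c}$$
$$\text{Specificity} = \frac{d}{b+d}$$
$$\text{False positive rate} = \frac{b}{b+d}$$
$$\text{False negative rate} = \frac{c}{a+c}$$
$$\text{Yield or predictive value positive} = \frac{a}{a+b}$$
$$\text{prevalence} = \frac{a+c}{N}$$
$$\text{Yield or predictive value negative} = \frac{d}{c+d}$$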
Screening Tests
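$$\text{Sensitivity} = \frac{45}{50} = 90\%$$
$$\text{Specificity} = \frac{940}{950} = 98.95\%$$
$$\text{False positive rate} = \frac{10}{950} = 1.05\%$$
$$\text{False negative rate} = \frac{5}{50} = 10\%$$
$$\text{Yield or predictive value positive} = \frac{45}{55} = 81.82\%$$
$$\text{prevalence} = \frac{50}{1000} = 5\%$$
$$\text{Yield or predictive value negative} = \frac{940}{945} = 99.47\%$$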
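The diagnostics for the herpes screening example can be recomputed from the four cell counts, using the a, b, c, d layout of the generic screening table. The function name and dictionary keys are illustrative.

```python
def screening_metrics(a, b, c, d):
    """Screening diagnostics from a 2x2 table.

    a = screened positive, with disease     b = screened positive, without disease
    c = screened negative, with disease     d = screened negative, without disease
    """
    n = a + b + c + d
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "false positive rate": b / (b + d),
        "false negative rate": c / (a + c),
        "predictive value positive": a / (a + b),
        "predictive value negative": d / (c + d),
        "prevalence": (a + c) / n,
    }


# Counts from the herpes screening table (1000 individuals).
metrics = screening_metrics(a=45, b=10, c=5, d=940)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```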
Interval Estimation
Statistics such as the sample mean, median, variance, etc., are called point estimates
- vary from sample to sample
- do not incorporate precision
Interval Estimation
Take as an example the sample mean:
$\bar{X}$ --estimates--> $\mu$ (popn mean)
Or the sample variance:
$S^2$ --estimates--> $\sigma^2$ (popn variance)
Interval Estimation
Recall Example 1, a one-sample t-test on the population mean. The test statistic was
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
This can be rewritten to yield:
Interval Estimation
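$$P\left(-t_{1-\alpha/2,\,n-1} \le \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \le t_{1-\alpha/2,\,n-1}\right) = 1 - \alpha$$
Which can be rearranged to give a (1-α)100% Confidence Interval for µ:
$$\bar{x} \pm t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$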
Form: Estimate ± Multiple of Std Error of the Est.
Interval Estimation
Example 1: Standing SBP
Mean = 140.8, s.d. = 9.5, N = 12
95% CI for µ: 140.8 ± 2.201 (9.5/sqrt(12))
140.8 ± 6.036 = (134.8, 146.8)
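The interval for the standing SBP example follows the form Estimate ± Multiple of Std Error. In this sketch the t quantile 2.201 (11 degrees of freedom) is supplied by hand, as on the slide, rather than computed; the function name is illustrative.

```python
import math


def t_confidence_interval(mean, sd, n, t_quantile):
    """Return (lower, upper) for mean +/- t_quantile * sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    margin = t_quantile * standard_error
    return mean - margin, mean + margin


# Summary statistics from the standing SBP example.
lower, upper = t_confidence_interval(mean=140.8, sd=9.5, n=12, t_quantile=2.201)
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
```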