Introduction to Statistics
German Demidov
Center for Genomic Regulation, Barcelona, Spain
July 28, 2017
German Demidov (CRG) STATS July 28, 2017 1 / 14
The main point
Population – the entire group of objects of interest
Sample – part of population available for observation
Statistic – function of sample
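The three terms above can be illustrated with a small simulation. This is only a sketch: the population is simulated, and the names `population`, `sample`, `sample_mean`, the seed, and the sizes are assumptions made for the example.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed, chosen arbitrarily so the run is repeatable

# Hypothetical population: 10,000 simulated measurements.
population = [rng.gauss(10.0, 2.0) for _ in range(10_000)]

# Sample: the part of the population that is actually observed.
sample = rng.sample(population, 50)

# Statistic: any function of the sample, here the sample mean.
sample_mean = statistics.mean(sample)
population_mean = statistics.mean(population)

print(f"population mean = {population_mean:.3f}")
print(f"sample mean     = {sample_mean:.3f}")
```

In practice the population mean is usually not observable; it is computed here only because the population is simulated.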
Data types
Genotype (AA, Aa, aa) – nominal
Pain Scale (e.g., from 1 to 5) – ordinal
Number of reads aligned to a particular genomic region – discrete numeric
Gene Expression Intensity – continuous numeric
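As a rough illustration, each data type suggests a different kind of summary. The values below are invented for the example, and the choice of summary per type is a common convention rather than a rule stated on the slide.

```python
from collections import Counter
import statistics

# All values are made up for illustration.
genotypes = ["AA", "Aa", "aa", "Aa", "AA", "Aa"]   # nominal
pain_scores = [1, 3, 2, 5, 3, 4, 3]                 # ordinal
read_counts = [12, 0, 7, 30, 7]                     # discrete numeric
expression = [5.21, 7.80, 6.05, 6.90]               # continuous numeric

# Nominal: categories can only be counted.
genotype_counts = Counter(genotypes)
most_common_genotype = statistics.mode(genotypes)

# Ordinal: order is meaningful, so a median is defined.
# median_low always returns a value that occurs in the data.
median_pain = statistics.median_low(pain_scores)

# Discrete numeric: integer counts, arithmetic is meaningful.
total_reads = sum(read_counts)

# Continuous numeric: means and variances are meaningful.
mean_expression = statistics.mean(expression)

print(genotype_counts, most_common_genotype)
print(median_pain, total_reads, round(mean_expression, 2))
```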
Measure of Central Tendency
mean
median
mode
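A minimal sketch of the three measures using Python's standard `statistics` module. The data are hypothetical and include one outlier to show how it pulls the mean but not the median.

```python
import statistics

# Hypothetical read counts with a single large outlier (46).
read_counts = [2, 3, 3, 4, 5, 7, 46]

mean_value = statistics.mean(read_counts)      # sum 70 over 7 values
median_value = statistics.median(read_counts)  # middle value of the sorted data
mode_value = statistics.mode(read_counts)      # most frequent value

print(f"mean = {mean_value}, median = {median_value}, mode = {mode_value}")
```

Here the mean is 10, while the median (4) and the mode (3) stay close to the bulk of the data.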
Measure of Spread
IQR
variance $\sigma^2$ and standard deviation $\sigma$
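The measures of spread can be computed as in the sketch below. The data are hypothetical. Note that quartile conventions differ between tools; `statistics.quantiles` uses its default "exclusive" method here, so other software may report a slightly different IQR for the same data.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7]  # hypothetical measurements

# Quartiles with the default "exclusive" method of statistics.quantiles.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Sample variance and standard deviation (n - 1 in the denominator).
sample_variance = statistics.variance(data)
sample_sd = statistics.stdev(data)

# Population variance (n in the denominator), for comparison.
population_variance = statistics.pvariance(data)

print(f"IQR = {iqr}")
print(f"sample variance = {sample_variance:.3f}, sample SD = {sample_sd:.3f}")
print(f"population variance = {population_variance}")
```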
Law of Large Numbers
Theorem (Law of Large Numbers)
The average of the results obtained from a large number of trials should
be close to the expected value, and will tend to become closer as more
trials are performed.
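The statement can be checked empirically with a small simulation. This is a sketch under assumed settings: a fair six-sided die (expected value 3.5), an arbitrary seed, and arbitrary trial counts. The helper `mean_of_rolls` is introduced only for this example.

```python
import random

EXPECTED_VALUE = 3.5  # expected value of one roll of a fair six-sided die


def mean_of_rolls(n_trials, rng):
    """Average of n_trials simulated rolls of a fair die."""
    total = sum(rng.randint(1, 6) for _ in range(n_trials))
    return total / n_trials


rng = random.Random(42)  # fixed seed so the run is repeatable
means_by_n = {n: mean_of_rolls(n, rng) for n in (10, 100, 1_000, 100_000)}

for n, avg in means_by_n.items():
    gap = abs(avg - EXPECTED_VALUE)
    print(f"n = {n:>7}: mean = {avg:.4f}, distance from 3.5 = {gap:.4f}")
```

The distance from 3.5 does not have to shrink at every step, but it tends to be small for large n.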
Central Limit Theorem
Theorem (Central Limit Theorem)
Given certain conditions, the arithmetic mean of a sufficiently
large number of independent random variables, each with a
well-defined expected value and well-defined variance, will be
approximately normally distributed, regardless of the underlying
distribution.
Figure: x ∈ (−3, 3)
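A simulation can make the theorem more concrete. The sketch below assumes an exponential distribution with rate 1 (mean 1, variance 1) as the skewed underlying distribution; the sample size, number of replicates, and seed are arbitrary choices for the example.

```python
import math
import random
import statistics

SAMPLE_SIZE = 50      # observations averaged in each replicate
N_REPLICATES = 2000   # number of sample means to simulate

rng = random.Random(7)  # fixed seed so the run is repeatable


def sample_mean(n, rng):
    """Mean of n draws from an exponential distribution with rate 1."""
    return sum(rng.expovariate(1.0) for _ in range(n)) / n


means = [sample_mean(SAMPLE_SIZE, rng) for _ in range(N_REPLICATES)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
theoretical_sd = 1.0 / math.sqrt(SAMPLE_SIZE)  # sigma / sqrt(n) with sigma = 1

# For a normal distribution roughly 68% of values lie within one SD of the mean.
within_one_sd = sum(abs(m - 1.0) <= theoretical_sd for m in means) / N_REPLICATES

print(f"mean of sample means = {mean_of_means:.3f} (population mean is 1)")
print(f"SD of sample means   = {sd_of_means:.3f} (theory: {theoretical_sd:.3f})")
print(f"share within one SD  = {within_one_sd:.3f}")
```

Although single exponential draws are strongly skewed, the sample means cluster symmetrically around 1 with a spread close to σ/√n.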
Standard Error
If σ is known
$\mathrm{SD}_{\bar{x}} = \sigma / \sqrt{n}$
If σ is unknown
$\mathrm{SE}_{\bar{x}} = s / \sqrt{n}$ (where $s^2$ is an unbiased estimator for $\sigma^2$) is biased for several
reasons which are not important
Intuitive Understanding
Put simply, the standard error of the sample mean is an estimate of how far the sample mean is likely to be from the population mean, whereas the standard deviation of the sample is the degree to which individuals within the sample differ from the sample mean.
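The difference between the two quantities can be sketched numerically. The population parameters (mean 100, SD 15), the sample size, and the seed below are assumptions for illustration only; `draw_sample` is a helper introduced for this example.

```python
import math
import random
import statistics

POP_MEAN, POP_SD = 100.0, 15.0  # hypothetical population parameters
N = 25                          # sample size

rng = random.Random(3)  # fixed seed so the run is repeatable


def draw_sample(rng):
    """One sample of N observations from the hypothetical population."""
    return [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]


one_sample = draw_sample(rng)
s = statistics.stdev(one_sample)          # sample SD (n - 1 in the denominator)
se_estimate = s / math.sqrt(N)            # standard error when sigma is unknown
sd_of_mean_known = POP_SD / math.sqrt(N)  # SD of the mean when sigma is known

# Empirical check: the SD of many sample means should be near sigma / sqrt(n).
sample_means = [statistics.fmean(draw_sample(rng)) for _ in range(2000)]
empirical_sd_of_means = statistics.stdev(sample_means)

print(f"sample SD s                = {s:.2f}")
print(f"SE estimate s / sqrt(n)    = {se_estimate:.2f}")
print(f"sigma / sqrt(n)            = {sd_of_mean_known:.2f}")
print(f"SD of simulated means      = {empirical_sd_of_means:.2f}")
```

The sample SD stays near 15 no matter how large the sample is, while the standard error shrinks as n grows.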
Hypothesis Testing: Life Example
Hypothesis Testing
Confusion Matrix
We do not prove or disprove the null hypothesis; we either believe in it or not!
Table: Decision rule
                     H0 is True            H0 is False
Fail to reject H0    Correct (1 − α)       Type II error (β)
Reject H0            Type I error (α)      Correct (1 − β)
α – False Positive Rate, β – False Negative Rate, Power = 1 − β.
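The rates in the table can be estimated by simulation. The sketch below assumes a two-sided z-test of H0: mean = 0 with known σ = 1, a sample size of 25, and a true mean of 0.5 under the alternative. All of these settings, and the helper names, are choices made for the example and do not come from the slides.

```python
import math
import random
from statistics import NormalDist

ALPHA = 0.05          # chosen significance level
SIGMA = 1.0           # population SD, assumed known
N = 25                # sample size
N_SIMULATIONS = 4000  # simulated experiments per scenario

# Two-sided critical value of the standard normal distribution.
z_critical = NormalDist().inv_cdf(1 - ALPHA / 2)


def rejects_h0(true_mean, rng):
    """Simulate one experiment and test H0: mean = 0 with a z-test."""
    sample = [rng.gauss(true_mean, SIGMA) for _ in range(N)]
    z = (sum(sample) / N) / (SIGMA / math.sqrt(N))
    return abs(z) > z_critical


def rejection_rate(true_mean, rng):
    """Share of simulated experiments in which H0 is rejected."""
    rejections = sum(rejects_h0(true_mean, rng) for _ in range(N_SIMULATIONS))
    return rejections / N_SIMULATIONS


rng = random.Random(1)  # fixed seed so the run is repeatable

type_1_rate = rejection_rate(0.0, rng)  # H0 is true: rejections are false positives
power = rejection_rate(0.5, rng)        # H0 is false: rejections are correct
type_2_rate = 1 - power                 # H0 is false but not rejected

print(f"estimated Type I error rate (alpha) = {type_1_rate:.3f}")
print(f"estimated Type II error rate (beta) = {type_2_rate:.3f}")
print(f"estimated power (1 - beta)          = {power:.3f}")
```

With these settings the estimated Type I error rate should be close to the chosen α, and the power depends on the effect size, σ, and n.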
The End of the Theoretical Part