Introduction to Statistics

German Demidov

Center for Genomic Regulation, Barcelona, Spain

german.demidov@crg.eu

July 28, 2017

German Demidov (CRG) STATS July 28, 2017 1 / 14

The main point

Population – the entire group of objects of interest

Sample – part of population available for observation

Statistic – function of sample

Data types

Genotype (AA, Aa, aa) – nominal

Pain Scale (e.g., from 1 to 5) – ordinal

Number of reads aligned to a particular genomic region – discrete numeric

Gene Expression Intensity – continuous numeric

Measure of Central Tendency

mean

median

mode
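
The three measures can be compared on a small data set. The sketch below uses Python's standard `statistics` module; the read counts are made-up values chosen to include one outlier, not data from the lecture.

```python
import statistics

# Hypothetical read counts for one genomic region across nine samples.
# The last value is a deliberate outlier.
read_counts = [12, 15, 15, 18, 20, 22, 15, 30, 95]

mean_value = statistics.mean(read_counts)      # arithmetic average
median_value = statistics.median(read_counts)  # middle value after sorting
mode_value = statistics.mode(read_counts)      # most frequent value

print("mean:  ", mean_value)
print("median:", median_value)
print("mode:  ", mode_value)
```

The single outlier pulls the mean well above the median, which is why the median is often preferred for skewed data.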

Measure of Spread

IQR

variance σ² and standard deviation σ
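
As a minimal sketch of these measures, the standard `statistics` module can compute all of them. The expression values are invented for illustration, and `statistics.quantiles` is used with its default "exclusive" method, which is only one of several common quartile conventions.

```python
import statistics

# Hypothetical gene expression intensities (illustrative values only).
expression = [2.1, 2.4, 2.4, 2.8, 3.0, 3.3, 3.9, 4.6]

# Quartiles split the sorted data into four parts; IQR = Q3 - Q1.
q1, q2, q3 = statistics.quantiles(expression, n=4)
iqr = q3 - q1

# Population variance divides by n; sample variance divides by n - 1.
population_variance = statistics.pvariance(expression)
sample_variance = statistics.variance(expression)
sample_sd = statistics.stdev(expression)  # square root of the sample variance

print("IQR:", iqr)
print("population variance:", population_variance)
print("sample variance:", sample_variance)
print("sample standard deviation:", sample_sd)
```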

Law of Large Numbers

Theorem (Law of Large Numbers)

The average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.
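
A small simulation can make the statement concrete. The sketch below assumes a fair six-sided die (expected value 3.5) as the trial; the seed and the numbers of rolls are arbitrary choices.

```python
import random

def mean_of_rolls(n_rolls, rng):
    """Average of n_rolls throws of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n_rolls)) / n_rolls

rng = random.Random(0)  # fixed seed so the run is reproducible
expected_value = 3.5    # (1 + 2 + 3 + 4 + 5 + 6) / 6

for n_rolls in (10, 1_000, 100_000):
    average = mean_of_rolls(n_rolls, rng)
    print(n_rolls, "rolls: mean =", average,
          "distance from 3.5 =", abs(average - expected_value))
```

With more rolls the printed distance from 3.5 tends to shrink, although any individual run can fluctuate.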

Central Limit Theorem

Theorem (Central Limit Theorem)

Given certain conditions, the arithmetic mean of a sufficiently large number of independent random variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed, regardless of the underlying distribution.

Figure: x ∈ (−3, 3)
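
The theorem can be checked empirically. The sketch below assumes an exponential distribution with rate 1 (mean 1, standard deviation 1) as a deliberately skewed example; the sample size, number of samples, and seed are arbitrary. If the sample means are roughly normal, about 68% of the standardized means should fall in (−1, 1) and nearly all of them in (−3, 3).

```python
import math
import random
import statistics

rng = random.Random(1)
sample_size = 50    # observations averaged in each sample
n_samples = 5_000   # number of sample means to collect

# Each sample mean averages `sample_size` draws from a skewed distribution.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
    for _ in range(n_samples)
]

# Standardize using the known mean (1) and the standard deviation of the mean (1 / sqrt(n)).
sd_of_mean = 1.0 / math.sqrt(sample_size)
z_scores = [(m - 1.0) / sd_of_mean for m in sample_means]

within_1 = sum(abs(z) < 1 for z in z_scores) / n_samples
within_3 = sum(abs(z) < 3 for z in z_scores) / n_samples

print("mean of sample means:", statistics.fmean(sample_means))
print("fraction with |z| < 1:", within_1)
print("fraction with |z| < 3:", within_3)
```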

Standard Error

If σ is known

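$\mathrm{SD}_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

If σ is unknown

$\mathrm{SE}_{\bar{x}} = \dfrac{s}{\sqrt{n}}$

where s² is an unbiased estimator of σ². This estimate is itself slightly biased, for reasons that are not important here.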

Intuitive Understanding

Put simply, the standard error of the sample mean is an estimate of how far the sample mean is likely to be from the population mean, whereas the standard deviation of the sample is the degree to which individuals within the sample differ from the sample mean.
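
The difference between the two quantities can be sketched numerically. The population parameters below (mean 10, standard deviation 2) and the sample size are hypothetical values chosen for illustration; in practice σ is usually unknown and only the estimate based on s is available.

```python
import math
import random
import statistics

rng = random.Random(2)
population_mean, population_sd = 10.0, 2.0  # hypothetical population parameters
n = 25

sample = [rng.gauss(population_mean, population_sd) for _ in range(n)]

# sigma known: standard deviation of the sample mean.
sd_of_mean_known = population_sd / math.sqrt(n)

# sigma unknown: plug in the sample standard deviation s (computed with n - 1).
s = statistics.stdev(sample)
se_estimated = s / math.sqrt(n)

print("sample standard deviation s:", s)
print("SD of the mean (sigma known):", sd_of_mean_known)
print("SE of the mean (sigma unknown):", se_estimated)
```

The standard error is smaller than the sample standard deviation by a factor of √n, so quadrupling the sample size halves it.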

Hypothesis Testing: Life Example

Hypothesis Testing

Confusion Matrix

We do not prove or disprove the null hypothesis; we either believe in it or not!

Table: Decision rule

                    H0 is True          H0 is False
Fail to reject H0   Correct (1−α)       Type II error (β)
Reject H0           Type I error (α)    Correct (1−β)

α – False Positive Rate, β – False Negative Rate, Power = 1 − β.
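
The two error rates in the table can be estimated by simulation. The sketch below assumes a two-sided one-sample z-test with known σ, which is only one possible test and is not necessarily the one used in the lecture; the helper names, effect size (0.5), sample size, and number of trials are all hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

def rejects_h0(sample, mu0, sigma, alpha):
    """Two-sided z-test of H0: mean == mu0, assuming sigma is known."""
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical

def rejection_rate(true_mean, rng, n=30, trials=4_000,
                   mu0=0.0, sigma=1.0, alpha=0.05):
    """Fraction of simulated experiments in which H0 is rejected."""
    rejections = sum(
        rejects_h0([rng.gauss(true_mean, sigma) for _ in range(n)],
                   mu0, sigma, alpha)
        for _ in range(trials)
    )
    return rejections / trials

rng = random.Random(3)

# H0 is true: every rejection is a Type I error, so this estimates alpha.
false_positive_rate = rejection_rate(true_mean=0.0, rng=rng)

# H0 is false: every rejection is correct, so this estimates power = 1 - beta.
power = rejection_rate(true_mean=0.5, rng=rng)

print("estimated alpha:", false_positive_rate)
print("estimated power:", power)
print("estimated beta: ", 1 - power)
```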

The End of Theoretical Part
