Introduction to Statistics · bioinformaticsinstitute.ru/sites/default/files/introtostatlecture.pdf



Introduction to Statistics

German Demidov

Center for Genomic Regulation, Barcelona, Spain

[email protected]

July 28, 2017

German Demidov (CRG) STATS July 28, 2017 1 / 14


The main point

Population – the entire group of objects of interest

Sample – part of population available for observation

Statistic – function of sample
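As a minimal sketch of these three terms, the Python snippet below builds a made-up population, draws a sample from it, and computes one statistic (the sample mean). The numbers, the seed, and the variable names are illustrative assumptions, not part of the lecture.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 simulated measurements (assumed values).
population = [random.gauss(100.0, 15.0) for _ in range(10_000)]

# Sample: the part of the population that is available for observation.
sample = random.sample(population, k=50)

# Statistic: any function of the sample, here the sample mean.
sample_mean = statistics.fmean(sample)

# Usually unknown in practice; computed here only because the population is simulated.
population_mean = statistics.fmean(population)

print(f"sample mean = {sample_mean:.2f}, population mean = {population_mean:.2f}")
```

The sample mean will usually differ a little from the population mean; how much is the subject of the standard error slide.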


Data types

Genotype (AA, Aa, aa) – nominal

Pain Scale (e.g., from 1 to 5) – ordinal

Number of reads aligned to a particular genomic region – discrete numeric

Gene Expression Intensity – continuous numeric


Measure of Central Tendency

mean

median

mode
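A short sketch of the three measures, using Python's standard `statistics` module on made-up read counts (the data are an assumption chosen to include one outlier):

```python
import statistics

# Hypothetical read counts for eight genomic regions (made-up numbers).
read_counts = [12, 15, 15, 18, 20, 22, 15, 95]

mean_value = statistics.mean(read_counts)      # arithmetic average
median_value = statistics.median(read_counts)  # middle of the sorted values
mode_value = statistics.mode(read_counts)      # most frequent value

print(mean_value, median_value, mode_value)  # 26.5 16.5 15
```

The single large value (95) pulls the mean well above the median, which is why the median is often preferred for skewed data.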


Measure of Spread

IQR

variance $\sigma^2$ and standard deviation $\sigma$
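The measures of spread named above can be sketched with Python's standard `statistics` module. The data are made up, and the quartiles use the module's default method; other software may use a different quartile convention and report a slightly different IQR.

```python
import statistics

# Hypothetical measurements (made-up numbers with mean 5).
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1  # interquartile range

# Population versions divide by n; sample versions divide by n - 1.
population_variance = statistics.pvariance(values)
population_sd = statistics.pstdev(values)
sample_variance = statistics.variance(values)
sample_sd = statistics.stdev(values)

print(f"IQR = {iqr}")
print(f"population variance = {population_variance}, SD = {population_sd}")
print(f"sample variance = {sample_variance:.3f}, SD = {sample_sd:.3f}")
```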


Law of Large Numbers

Theorem (Law of Large Numbers)

The average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.
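One way to see the theorem at work is a small simulation. The sketch below assumes a fair six-sided die (expected value 3.5) and arbitrary trial counts; it is an illustration, not a proof.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

EXPECTED_VALUE = 3.5  # expected value of a fair six-sided die


def mean_of_die_rolls(n_trials):
    """Return the average of n_trials simulated fair die rolls."""
    total = sum(random.randint(1, 6) for _ in range(n_trials))
    return total / n_trials


# Average over increasingly many trials.
averages = {n: mean_of_die_rolls(n) for n in (10, 100, 10_000, 100_000)}

for n, avg in averages.items():
    print(f"n = {n:>7}: mean = {avg:.4f}, |mean - 3.5| = {abs(avg - EXPECTED_VALUE):.4f}")
```

The deviation from 3.5 tends to shrink as the number of trials grows, although it need not shrink at every single step.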


Central Limit Theorem

Theorem (Central Limit Theorem)

Given certain conditions, the arithmetic mean of a sufficiently large number of independent random variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed, regardless of the underlying distribution.

Figure: x ∈ (−3, 3)
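The theorem can be checked empirically. The sketch below assumes an exponential distribution with mean 1 and variance 1 (strongly skewed, so clearly not normal) and arbitrary sample sizes; it draws many samples and looks at the distribution of their means.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

SAMPLE_SIZE = 50   # observations per sample (assumed)
N_SAMPLES = 5_000  # number of repeated samples (assumed)


def sample_mean():
    """Mean of one sample drawn from a skewed exponential distribution."""
    draws = [random.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    return statistics.fmean(draws)


means = [sample_mean() for _ in range(N_SAMPLES)]
center = statistics.fmean(means)
spread = statistics.stdev(means)

# For a normal distribution, about 68% of values lie within one SD of the mean.
within_one_sd = sum(abs(m - center) <= spread for m in means) / N_SAMPLES

print(f"mean of sample means = {center:.3f} (theory: 1)")
print(f"SD of sample means   = {spread:.3f} (theory: {1 / math.sqrt(SAMPLE_SIZE):.3f})")
print(f"share within one SD  = {within_one_sd:.3f} (normal: about 0.68)")
```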


Standard Error

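If σ is known

$\mathrm{SD}_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

If σ is unknown

$\mathrm{SE}_{\bar{x}} = \frac{s}{\sqrt{n}}$ (where $s^2$ is an unbiased estimator for $\sigma^2$), which is a biased estimate of $\mathrm{SD}_{\bar{x}}$ for several reasons which are not important here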

Intuitive Understanding

Put simply, the standard error of the sample mean is an estimate of how far the sample mean is likely to be from the population mean, whereas the standard deviation of the sample is the degree to which individuals within the sample differ from the sample mean.
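The distinction between the two quantities can be sketched in a few lines of Python. The expression intensities are made-up values, and the variable names are illustrative.

```python
import math
import statistics

# Hypothetical gene expression intensities (made-up numbers).
sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 4.9]

n = len(sample)
s = statistics.stdev(sample)  # sample standard deviation (n - 1 in the denominator)
se = s / math.sqrt(n)         # standard error of the sample mean

print(f"sample mean    = {statistics.fmean(sample):.3f}")
print(f"sample SD (s)  = {s:.3f}  (spread of individual values)")
print(f"standard error = {se:.3f}  (uncertainty of the sample mean)")
```

Because the standard error divides by the square root of n, it shrinks as the sample grows, while the sample standard deviation does not.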


Hypothesis Testing: Life Example


Hypothesis Testing


Confusion Matrix

We do not prove or disprove the null hypothesis; we either believe in it or not!

Table: Decision rule

                     H0 is True          H0 is False
Fail to reject H0    Correct (1-α)       Type II error (β)
Reject H0            Type I error (α)    Correct (1-β)

α – False Positive Rate, β – False Negative Rate, Power = 1 − β.
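The rates in the table can be estimated by simulation. The sketch below is one possible illustration, not the lecture's own example: it assumes a two-sided z-test of H0: mean = 0 with known σ = 1, n = 25 observations per experiment, and a true mean of 0.5 when H0 is false. All of these settings are assumptions.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(2)  # fixed seed so the run is reproducible

ALPHA = 0.05           # chosen significance level
SIGMA = 1.0            # population SD, assumed known
N = 25                 # observations per experiment (assumed)
N_EXPERIMENTS = 4_000  # simulated experiments per scenario (assumed)
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-sided critical value


def rejects_h0(true_mean):
    """Simulate one experiment and test H0: mean = 0 with a z-test."""
    sample = [random.gauss(true_mean, SIGMA) for _ in range(N)]
    z = fmean(sample) / (SIGMA / math.sqrt(N))
    return abs(z) > Z_CRIT


# H0 is true (mean = 0): every rejection is a Type I error.
type_i_rate = sum(rejects_h0(0.0) for _ in range(N_EXPERIMENTS)) / N_EXPERIMENTS

# H0 is false (mean = 0.5): every rejection is a correct decision.
power = sum(rejects_h0(0.5) for _ in range(N_EXPERIMENTS)) / N_EXPERIMENTS
type_ii_rate = 1 - power

print(f"estimated alpha = {type_i_rate:.3f} (nominal {ALPHA})")
print(f"estimated power = {power:.3f}, estimated beta = {type_ii_rate:.3f}")
```

The estimated α should land near the nominal 0.05, while β depends on the effect size, σ, and n chosen for the alternative.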


The End of the Theoretical Part
