introductory statistics for laboratorians dealing with high throughput data sets
![Page 1: Introductory Statistics for Laboratorians dealing with High Throughput Data sets](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681671f550346895ddba1b8/html5/thumbnails/1.jpg)
Introductory Statistics for Laboratorians Dealing with High Throughput Data Sets
Centers for Disease Control
![Page 2: Introductory Statistics for Laboratorians dealing with High Throughput Data sets](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681671f550346895ddba1b8/html5/thumbnails/2.jpg)
Statistics and Order
• Random vs. Accidental
• Snowflakes
• Quincunx
– http://www.stattucino.com/berrie/dsl/Galton.html
– http://www.jcu.edu/math/ISEP/Quincunx/Quincunx.html
• Statistical Determinism
![Page 3: Introductory Statistics for Laboratorians dealing with High Throughput Data sets](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681671f550346895ddba1b8/html5/thumbnails/3.jpg)
Hypothesis Testing
• Statistical methods are used to test scientific hypotheses.
• You already understand this logic; you use it every day.
• Statistical methods simply provide a way to put numbers to your logic.
• That is, they let you compute the chances that you are wrong or right.
![Page 4: Introductory Statistics for Laboratorians dealing with High Throughput Data sets](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681671f550346895ddba1b8/html5/thumbnails/4.jpg)
Harry and Sue
• This is the story of Harry Heartthrob and his girlfriend Sue Sweetheart. Harry would like to know that everything is fine between him and Sue.
• Begin by stating two hypotheses:
– H0: Things are fine in Harry’s love life.
– Ha: Harry has romantic problems.
• We assume that H0 is true, that things are fine.
• This leads us to expect to see certain things when we observe nature/reality.
![Page 5: Introductory Statistics for Laboratorians dealing with High Throughput Data sets](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681671f550346895ddba1b8/html5/thumbnails/5.jpg)
Hypotheses
• H0: is the “Null Hypothesis.” It states that nothing is going on.
• Ha: is the “Alternative Hypothesis.” It states that something is going on.
• The null and alternative hypotheses must be mutually exclusive and exhaustive.
• That is: they can’t both be true and they can’t both be false.
![Page 6: Introductory Statistics for Laboratorians dealing with High Throughput Data sets](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681671f550346895ddba1b8/html5/thumbnails/6.jpg)
Harry and Sue
• Believing that H0: is true (as we assume) leads to certain expectations about what will be observed in reality.
• Specifically, Harry will expect there to be “no signs of men” in Sue’s apartment.
• If Harry observes things in her apartment that differ from his expectations, he will begin to doubt the truth of H0.
![Page 7: Introductory Statistics for Laboratorians dealing with High Throughput Data sets](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681671f550346895ddba1b8/html5/thumbnails/7.jpg)
Harry’s Love Life
• H0: Assume all is OK. Expect no signs of men in her apartment.
• Ha: Harry has romantic problems.
![Page 8: Introductory Statistics for Laboratorians dealing with High Throughput Data sets](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681671f550346895ddba1b8/html5/thumbnails/8.jpg)
Proof
• Can you ever prove that H0 is true? NO!!!
• What is the strongest evidence that H0 is true?
– No indication of men in the apartment!
• Could this happen if H0 is false? YES!!
• The strongest evidence for H0 is still weak.
• We “assume” H0 is true if the evidence that it is false is weak.
![Page 9: Introductory Statistics for Laboratorians dealing with High Throughput Data sets](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681671f550346895ddba1b8/html5/thumbnails/9.jpg)
Proof
• Can Harry prove that H0 is false? YES!!!
• There can be things that are so improbable if H0 is true that when you observe those things, (XXX) for example, you know H0 is false.
• There can be strong evidence that H0 is false.
• When we see that strong evidence, we reject H0.
• Then we say “we conclude Ha is true” or “we have proved Ha.”
![Page 10: Introductory Statistics for Laboratorians dealing with High Throughput Data sets](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681671f550346895ddba1b8/html5/thumbnails/10.jpg)
Statistically Speaking
• H0: Assume all is OK. Expect no signs of men in her apartment.
• Ha: Harry has romantic problems.
• [Figure: scale of probabilities of observed events, assuming H0 is true: 98%, 20%, 2%, .05%]
• Reject H0 if the probability of the observed event is small enough when H0 is assumed to be true.
![Page 11: Introductory Statistics for Laboratorians dealing with High Throughput Data sets](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681671f550346895ddba1b8/html5/thumbnails/11.jpg)
Statistical Error
• H0: Assume all is OK. Expect no signs of men in her apartment.
• Ha: Harry has romantic problems.
• [Figure: scale of probabilities of observed events, assuming H0 is true: 98%, 20%, 2%, .05%]
Two Types of Error
• Type I Error: Reject H0 when it is true (too jealous).
• Type II Error: Continue to believe H0 when it is false (too trusting).
![Page 12: Introductory Statistics for Laboratorians dealing with High Throughput Data sets](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681671f550346895ddba1b8/html5/thumbnails/12.jpg)
Statistical Error

| Harry’s decision based on observed data | Truth about Sue Sweetheart: H0 true (she is actually not cheating) | Truth about Sue Sweetheart: H0 false (she is cheating her A off) |
|---|---|---|
| Reject H0: concludes she is cheating | Type I Error (Alpha): too jealous | Correct decision |
| Fail to reject H0: concludes everything is OK | Correct decision | Type II Error (Beta): too trusting |
![Page 13: Introductory Statistics for Laboratorians dealing with High Throughput Data sets](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681671f550346895ddba1b8/html5/thumbnails/13.jpg)
Quality Control: Have the samples been watered down?
• There is a severe shortage of flu vaccine in the USA this season. However, Canada has a large surplus and they are willing to sell it to us.
• We are a little paranoid and wonder if the reason they have extra is because they watered it down.
• We make a surprise visit to their warehouse and request a sample of the vaccine for evaluation purposes before we commit to purchase the lot.
• They allow us to select 70 vials at random from the whole lot for testing.
![Page 14: Introductory Statistics for Laboratorians dealing with High Throughput Data sets](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681671f550346895ddba1b8/html5/thumbnails/14.jpg)
Quality Control Example
• All flu vaccine is made to standard specifications. It is all supposed to be 16 m/dl with a standard deviation of 0.4. (That’s what it’s supposed to be if it is not watered down).
• We measure the 70 vials from Canada and get a mean of 15.8 m/dl.
• Is the Canadian surplus watered down?
![Page 15: Introductory Statistics for Laboratorians dealing with High Throughput Data sets](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681671f550346895ddba1b8/html5/thumbnails/15.jpg)
Step 1: State H0 and Ha
• H0: This sample of 70 vials (with a mean of 15.8) comes from a population with a mean of 16 and a standard deviation of 0.4.
– (Everything is fine.)
• Ha: This sample of 70 vials could not have been drawn from a population with a mean of 16 and a standard deviation of 0.4.
– (There is a problem.)
![Page 16: Introductory Statistics for Laboratorians dealing with High Throughput Data sets](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681671f550346895ddba1b8/html5/thumbnails/16.jpg)
Step 2: Select a Region of Rejection
• If the probability of observing a sample mean this far below 16, assuming the null hypothesis is true, is less than 5 chances in 100 (.05), we will reject the null hypothesis.
• Alpha = .05
![Page 17: Introductory Statistics for Laboratorians dealing with High Throughput Data sets](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681671f550346895ddba1b8/html5/thumbnails/17.jpg)
Step 3: Make Observations
• Conduct the experiment – make surprise trip to the Canadian warehouse, select vials at random, test each vial.
• Compute the mean for the 70 vials.
– mean = 15.8
![Page 18: Introductory Statistics for Laboratorians dealing with High Throughput Data sets](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681671f550346895ddba1b8/html5/thumbnails/18.jpg)
Step 4: Test the Null Hypothesis
• What does the Central Limit Theorem tell us about the distribution of means of samples of size N = 70 from a population with a mean of 16 and a standard deviation of 0.4?
• The Central Limit Theorem says:
– The mean of the means of all possible samples should be 16.
– The Standard Error (Standard Deviation) of the means is 0.4/sqrt(70) = .048.
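The standard error arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative and not part of the original slides.

```python
import math

population_mean = 16.0  # specification mean from the slides
population_sd = 0.4     # specification standard deviation from the slides
n = 70                  # number of vials sampled

# Standard error of the mean: population SD divided by the square root of N.
standard_error = population_sd / math.sqrt(n)
print(round(standard_error, 3))
```

Rounded to three decimals this agrees with the .048 used on the slide.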
![Page 19: Introductory Statistics for Laboratorians dealing with High Throughput Data sets](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681671f550346895ddba1b8/html5/thumbnails/19.jpg)
Step 4: Test the Null Hypothesis
• Use http://davidmlane.com/hyperstat/z_table.html to compute the probability that a mean would be 15.8 or less if the Sampling Distribution of the Mean has a mean of 16 and a standard deviation of .048.
• 15.8 is 4.17 standard errors (standard deviations) below the mean of the population (16).
• Z = -4.17
![Page 20: Introductory Statistics for Laboratorians dealing with High Throughput Data sets](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681671f550346895ddba1b8/html5/thumbnails/20.jpg)
Step 4: Test the Null Hypothesis
• The probability of the mean of a sample of size 70 being 15.8 or less is .000015 (15 chances in 1,000,000).
• This is in the region of rejection
• Reject H0 -- There is a problem. This stuff has been watered down.
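As an alternative to the online z table, the same one-sided test can be sketched with Python's standard library. The function name `one_sided_z_test` is hypothetical, chosen for illustration. Because the code uses the unrounded standard error, it gives z of about -4.18 rather than the -4.17 obtained from the rounded .048; the conclusion is the same.

```python
import math
from statistics import NormalDist

def one_sided_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Lower-tail z test: is the sample mean improbably low if H0 is true?"""
    se = sigma / math.sqrt(n)         # standard error of the mean
    z = (sample_mean - mu0) / se      # distance from mu0 in standard errors
    p = NormalDist().cdf(z)           # P(mean this low or lower | H0 true)
    return z, p, p < alpha

z, p_value, reject = one_sided_z_test(15.8, 16.0, 0.4, 70)
print(f"z = {z:.2f}, p = {p_value:.6f}, reject H0: {reject}")
```

The p-value is far below alpha = .05, so the sketch reaches the same decision as the slide: reject H0.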
![Page 21: Introductory Statistics for Laboratorians dealing with High Throughput Data sets](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681671f550346895ddba1b8/html5/thumbnails/21.jpg)
Region of Rejection for a Sample of size N = 70 from a Population with mean 16 and standard deviation of 0.4
• The region of rejection is anything below 15.921.
• 15.921 cuts off the lowest .05 of the distribution.
• If H0 is true, there are only 5 chances in 100 of a mean being less than 15.921.
• Our mean was 15.8
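The 15.921 cutoff can be reproduced by asking for the 5th percentile of the sampling distribution of the mean. This is a small sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Sampling distribution of the mean under H0: mean 16, SD 0.4/sqrt(70).
sampling_dist = NormalDist(mu=16.0, sigma=0.4 / math.sqrt(70))

# The value with 5% of the distribution below it marks the region of rejection.
critical_mean = sampling_dist.inv_cdf(0.05)
observed_mean = 15.8

print(round(critical_mean, 3))
print(observed_mean < critical_mean)  # True means the observed mean is in the region of rejection
```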
![Page 22: Introductory Statistics for Laboratorians dealing with High Throughput Data sets](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681671f550346895ddba1b8/html5/thumbnails/22.jpg)
Types of Error

| Decision based on sample | Truth about population from which sample came: H0 true | Truth about population from which sample came: H0 false |
|---|---|---|
| Reject H0 | Type I Error (Alpha) | Correct decision |
| Fail to reject H0 | Correct decision | Type II Error (Beta) |
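One way to see what alpha means is to simulate many samples from a population where H0 really is true and count how often the test rejects anyway. The sketch below reuses the vaccine numbers from the earlier slides (mean 16, standard deviation 0.4, N = 70) and assumes the measurements are normally distributed; the function name, trial count, and seed are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

MU0, SIGMA, N, ALPHA = 16.0, 0.4, 70, 0.05

# Sample means below this cutoff fall in the region of rejection.
cutoff = NormalDist(MU0, SIGMA / math.sqrt(N)).inv_cdf(ALPHA)

def type_i_error_rate(trials, seed=0):
    """Fraction of samples drawn with H0 true that are (wrongly) rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(MU0, SIGMA) for _ in range(N)]
        if fmean(sample) < cutoff:
            rejections += 1
    return rejections / trials

rate = type_i_error_rate(2000)
print(f"observed Type I error rate: {rate:.3f}")
```

With enough trials the observed rate should settle near alpha, since every rejection here is by construction a Type I error.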
![Page 23: Introductory Statistics for Laboratorians dealing with High Throughput Data sets](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681671f550346895ddba1b8/html5/thumbnails/23.jpg)
Error in Diagnostic Testing

| What the diagnostic test tells us | Truth about person: H0 true (really don’t have disease) | Truth about person: H0 false (really do have disease) |
|---|---|---|
| Positive (says they have disease) | Type I Error (Alpha): False Positive | True Positive |
| Negative (says they don’t have disease) | True Negative | Type II Error (Beta): False Negative |
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{TN + FP}$$
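Sensitivity and specificity follow directly from the four cells of the diagnostic table. The sketch below shows the arithmetic; the counts are hypothetical, made up purely for illustration, and do not come from the slides.

```python
def sensitivity(tp, fn):
    """Proportion of people who have the disease that the test flags as positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of people without the disease that the test reports as negative."""
    return tn / (tn + fp)

# Hypothetical counts (assumed for this example only).
tp, fn = 90, 10    # 100 people who really have the disease
tn, fp = 950, 50   # 1000 people who really do not

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(sens, spec)
```

In this example the false negative rate (the Type II error rate, beta) is 1 minus sensitivity, and the false positive rate (the Type I error rate, alpha) is 1 minus specificity.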