BioE 439/539: Applied Statistics for Biotechnology and Bioengineering
Lecture 1: Matlab Refresher and Probability
Instructor: Dave Zhang
http://nablab.rice.edu/bioe439
Course Grading Structure:
Lectures: 0%. Attendance not mandatory, come only if you want to. (But bring a Matlab-installed laptop if you do come.)
Problem Sets: 0% or 20%. 10 problem sets in total; they either all count or they all don’t.
Midterm: 40% or 32%. Take-home exam. Will include Matlab programming questions.
Final: 60% or 48%. Take-home exam. Will include Matlab programming questions.
Exams should take roughly 3 hours if you’ve mastered the materials, but you can take as long as you like. Open resources (e.g. Internet), but not open people.
BioE 539: Additional problems on both problem sets and exams.
Course Details:
TA: Guanyi Xie ([email protected])
If you added the class late, please email Dave and TA to make sure you’re on the class email list.
Dave’s email address: [email protected]
Email the TA first with questions about course material; email Dave only if Guanyi can’t answer them.
Statistics... Why should you care?
“There are three kinds of lies: lies, damned lies, and statistics.” - Samuel Clemens (a.k.a. Mark Twain)
Interpretation 1: Use statistics to lie without legal consequences.
Interpretation 2: Understand statistics to see through mis-truths.
Statistic = a number summary of a pile of data
Matlab Refresher
Vectors (a.k.a. lists) - Creating them
• a = [1 50 100]
• b = 1:100
• c = 1:5:100
• d = 20 * ones(1,100)
• e = linspace(1,100, 12)
• f = [1:20, 40:60]
• g = [e, f]
Problem 1: a = [4 7 10 13 16 19 21 22 23 24 25 26]
A1: a = [4:3:19, 21:26]
Problem 2: a = [20 18 16 14 12 100 80 60 40 20 0]
A2: a = [20:-2:12, 100:-20:0]
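The two answers above can be checked mechanically. The following is a minimal sketch; the variable names (target1, cand1, ok1, and so on) are made up for this illustration and are not from the slides.

```matlab
% Target vectors copied from Problems 1 and 2.
target1 = [4 7 10 13 16 19 21 22 23 24 25 26];
target2 = [20 18 16 14 12 100 80 60 40 20 0];

% Candidate answers built from colon ranges (start:step:stop).
cand1 = [4:3:19, 21:26];
cand2 = [20:-2:12, 100:-20:0];

% isequal is true only if the sizes and all values match.
ok1 = isequal(cand1, target1)
ok2 = isequal(cand2, target2)
```

A negative step counts downward, which is why 20:-2:12 and 100:-20:0 work for Problem 2.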
Vectors (a.k.a. lists) - Manipulating them
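a2 = 10*(1:10)      % [10 20 30 ... 100]
a2(7)               % 7th element: 70
a2(10) = 0; a2      % overwrite the last element, then display: [10 20 ... 90 0]
a2 = a2 / 50        % divide every element: [0.2 0.4 ... 1.8 0]
a2 = floor(a2)      % round down: [0 0 0 0 1 1 1 1 1 0]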
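a3 = [1 2 3] + [1 50 100]          % element-wise sum: [2 52 103]
b3 = [1 2 3].*[1 50 100]           % element-wise product: [1 100 300]
c3 = (a3 > 50)                     % logical mask: [0 1 1]
d3 = a3(a3 > 50)                   % keep elements greater than 50: [52 103]
e3 = a3(a3 ~= 50)                  % keep elements not equal to 50: [2 52 103]
f3 = max(a3, b3)                   % element-wise maximum: [2 100 300]
g3 = fliplr(a3)                    % reverse the order: [103 52 2]
h3 = sort([1 20 10], 'descend')    % sort high to low: [20 10 1]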
Problem 3: a = [2 6 12 20 30 42 56 72 90 110]
A3: a = [1:10] .* [2:11]
Problem 4: ref = [3 1 4 1 5 9 2 6 5 3 5 8 9 7]; a = [3 4 5 9 6 5 3 5 8 9 7] (1’s and 2’s removed)
A4: a = ref(ref > 2)
Problem 5: ref = [3 1 4 1 5 9 2 6 5 3 5 8 9 7]; a = [3 0 4 0 5 9 0 6 5 3 5 8 9 7] (1’s and 2’s replaced by 0’s)
A5: a = (ref > 2).*ref
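Answers A3 to A5 can be verified the same way. The sketch below also shows one alternative for Problem 5 that assigns through a logical mask instead of multiplying by it; the names p3, p4, p5 and p5_alt are hypothetical, chosen only for this example.

```matlab
ref = [3 1 4 1 5 9 2 6 5 3 5 8 9 7];

% Problem 3: products of consecutive integers, n*(n+1) for n = 1..10.
p3 = (1:10) .* (2:11);
ok3 = isequal(p3, [2 6 12 20 30 42 56 72 90 110])

% Problem 4: logical indexing keeps only the entries where the mask is true.
p4 = ref(ref > 2);
ok4 = isequal(p4, [3 4 5 9 6 5 3 5 8 9 7])

% Problem 5, as in A5: multiply by the 0/1 mask.
p5 = (ref > 2) .* ref;

% Problem 5, alternative: copy ref, then write 0 wherever ref <= 2.
p5_alt = ref;
p5_alt(ref <= 2) = 0;
ok5 = isequal(p5, p5_alt, [3 0 4 0 5 9 0 6 5 3 5 8 9 7])
```

The mask-multiplication trick only works when the replacement value is 0; the assignment form works for any replacement value.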
Probability
Probability describes the uncertainty of an outcome
Rolling a die, there are 6 possible results.
Taking a card from a poker deck, 52 possible outcomes.
The probabilities of complex outcomes can be broken down into probabilities of simple outcomes
Pr(rolled die is odd) = Pr(X=1 OR X=3 OR X=5)
= Pr(X=1) + Pr(X=3) + Pr(X=5)
Assuming it’s a fair die, Pr(X=1) = 1/6
Pr(2 dice sum to 7) = Pr(X=1,Y=6) + Pr(X=2,Y=5) + Pr(X=3,Y=4)
+ Pr(X=4,Y=3) + Pr(X=5,Y=2) + Pr(X=6,Y=1)
Pr(become rich and famous) = Pr(become rich) * Pr(become famous) (assuming they’re independent)
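The dice breakdowns above can be checked by listing every outcome and counting. This is a minimal sketch that assumes fair dice, so all 36 pairs are equally likely; the names pOdd and pSum7 are made up for the illustration.

```matlab
% All 36 equally likely outcomes of two fair dice (X = die 1, Y = die 2).
[X, Y] = meshgrid(1:6, 1:6);

% Pr(rolled die is odd): fraction of the six faces that are odd.
pOdd = mean(mod(1:6, 2) == 1)

% Pr(2 dice sum to 7): fraction of the 36 pairs whose sum is 7.
pSum7 = mean(X(:) + Y(:) == 7)
```

Taking the mean of a logical vector gives the fraction of true entries, which equals the probability when every outcome is equally likely. Here that is 3/6 for the odd roll and 6/36 for the sum of 7.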
Probability... comes in 3 flavors
Flavor 1: Discrete
Rolling a (fair) 6-sided die once and getting a 1.
Flavor 2: Continuous
Probability that a random potato chip weighs between 0.2 and 0.4 ounces.
Flavor 3: Pseudo-continuous
Grabbing a random person in the world, and having his/her height be between 5’7” and 5’9”.
Statistics generally deals with pseudo-continuous probability
Intersections and Unions
Throwing Darts
Event 1 = win a teddy bear
Event 2 = win another chance to play
(1 ∩ 2): Intersection, get both!
(1 ∪ 2): Union, got something.
[Figure: dartboard with regions Event 1 and Event 2]
More complex dartboards
[Figure: dartboard with regions Event 1 and Event 2 and their overlap (1 ∩ 2)]
Intersections and unions are calculated the same way.
Principle of Inclusion-Exclusion
Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A ∩ B)
Example 1: Taking a card from a poker deck and getting a King or a Spade
Pr(K ∪ S) = Pr(K) + Pr(S) - Pr(K ∩ S) = 4/52 + 13/52 - 1/52 = 16/52
Example 2: Calling on a random student and getting a junior or a girl
Pr(junior ∪ female) = Pr(junior) + Pr(female) - Pr(junior ∩ female)
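Example 1 can be confirmed by building the deck and counting. In this sketch the numeric encoding of the cards (rank 13 = King, suit 4 = Spade) and all variable names are assumptions made for illustration.

```matlab
% 52-card deck: ranks 1..13 crossed with suits 1..4.
% Assumed encoding for this sketch: rank 13 = King, suit 4 = Spade.
[cardRank, cardSuit] = meshgrid(1:13, 1:4);
cardRank = cardRank(:);
cardSuit = cardSuit(:);

isKing  = (cardRank == 13);
isSpade = (cardSuit == 4);

pK  = mean(isKing);              % 4/52
pS  = mean(isSpade);             % 13/52
pKS = mean(isKing & isSpade);    % 1/52, the King of Spades

% Union counted directly versus inclusion-exclusion.
pUnionDirect  = mean(isKing | isSpade)
pUnionFormula = pK + pS - pKS
```

Both values come out to 16/52. Subtracting Pr(K ∩ S) removes the King of Spades, which would otherwise be counted twice.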
Events can be mutually exclusive or independent
Event 1: Rolling die #1 and getting a 1
Event 2a: Rolling die #1 and getting a 2
Event 2b: Rolling die #2 and getting a 2
The probability of two mutually exclusive events both occurring (∩) is zero.
P(rolling one die and getting a 5 AND a 6) = P( (D1=5) ∩ (D1=6) ) = 0
The probability of two independent events both occurring (∩) is their product.
P(rolling two fair dice and getting 6, 6) = P( (D1=6) ∩ (D2=6) ) = P(D1=6) * P(D2=6) = 1/36
[Figure: two dartboards with regions Event 1 and Event 2, one labeled "mutually exclusive" and one labeled "independent"]
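Both rules can be seen by enumerating the 36 outcomes of two dice. This is a sketch assuming fair dice; the variable names are hypothetical.

```matlab
% Every (D1, D2) pair for two fair dice, flattened to column vectors.
[D1, D2] = meshgrid(1:6, 1:6);
D1 = D1(:);
D2 = D2(:);

% Mutually exclusive: a single die cannot show 5 and 6 at once.
pExclusive = mean((D1 == 5) & (D1 == 6))

% Independent: the joint probability equals the product of the marginals.
pJoint   = mean((D1 == 6) & (D2 == 6))
pProduct = mean(D1 == 6) * mean(D2 == 6)
```

pExclusive is 0, while pJoint and pProduct both equal 1/36.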
A conditional probability P(A | B) is the probability that event A occurs, given that event B occurs.
P(picking a girl | picking a random student in BioE 439)
P(A | B) = P (A ∩ B) / P (B)
A man shows you two coins, one is normal and the other has two heads. He shuffles the coins out of sight, picks one and flips it, and a head shows. What’s the probability the other side of the coin is also a head?
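One way to work the coin puzzle is to apply P(A | B) = P(A ∩ B) / P(B) directly, with A = "the two-headed coin was picked" and B = "a head shows". The sketch below assumes each coin is picked with probability 1/2 and that the normal coin is fair; the variable names are made up for the illustration.

```matlab
% Assumptions: each coin is equally likely to be picked; the normal coin is fair.
pCoin      = [0.5 0.5];   % Pr(coin picked): [normal, two-headed]
pHeadGiven = [0.5 1.0];   % Pr(head shows | coin)

% Pr(B): a head shows, summed over both coins.
pHead = sum(pCoin .* pHeadGiven);

% Pr(A | B) = Pr(A and B) / Pr(B). The other side is a head exactly when
% the two-headed coin was picked.
pOtherSideHead = pCoin(2) * pHeadGiven(2) / pHead
```

Under these assumptions the result is 2/3 rather than the intuitive 1/2: of the three head faces that could be showing, two belong to the two-headed coin.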
[Figure: dartboard with regions Event 1 and Event 2]
You get a prize for hitting Event 1... but you’re cheating. You stuck a super-strong magnet behind Event 2, guaranteeing that you hit Event 2.
Probability Practice Problems:
One finds that in a population of 100,000 females, 75% can expect to live to age 60, 63% can expect to live to age 80, and 28% can expect to live to age 100.
1. Given that a woman is currently 60, what is the probability that she lives to age 80?
A: Pr (age ≥ 80 | age ≥ 60) = 0.63 / 0.75 = 0.84
2. Given that a woman died before age 100, what is the probability that she died before age 60?
A: Pr (age < 60 | age < 100) = (1-0.75) / (1-0.28) = 0.347
3. A pair of female twins are currently 80 years old. What is the probability that exactly one of them survives to 100?
A: Pr (age ≥ 100 | age ≥ 80) = 0.28 / 0.63 = 0.444 for each twin. Assuming the twins’ survival is independent:
Pr(A ≥ 100 ∩ B < 100 | A ≥ 80 ∩ B ≥ 80) + Pr(A < 100 ∩ B ≥ 100 | A ≥ 80 ∩ B ≥ 80) = 0.444 * 0.556 + 0.556 * 0.444 = 0.494
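The three answers can be reproduced in a few lines. This is a minimal sketch; the variable names are hypothetical, and Problem 3 assumes the two twins survive independently of each other.

```matlab
% Survival fractions from the problem statement.
s60 = 0.75;  s80 = 0.63;  s100 = 0.28;

% 1. Pr(age >= 80 | age >= 60)
p1 = s80 / s60

% 2. Pr(age < 60 | age < 100)
p2 = (1 - s60) / (1 - s100)

% 3. Each twin reaches 100 with probability q = Pr(age >= 100 | age >= 80).
%    Exactly one of two survives: q*(1-q) + (1-q)*q.
%    Assumes the twins' survival is independent.
q  = s100 / s80;
p3 = 2 * q * (1 - q)
```

Rounded to three decimals, these are 0.84, 0.347 and 0.494.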