Introduction to Bioinformatics: Probability Calculations in Bioinformatics


Upload: kaylyn-perkins

Post on 14-Dec-2015


TRANSCRIPT

  • Slide 1

Introduction to Bioinformatics: Probability Calculations in Bioinformatics (http://www.people.vcu.edu/~elhaij/bnfo301-12/)

If you're in the middle of bioinformatics, you are undoubtedly surrounded by a large number of things: nucleotides, genes, metabolites. Large numbers mean that your usual idiocy-checking faculties often don't work, and without them, large numbers often lead to large, embarrassing mistakes. Probability calculations are often your best defense against foolishness.

Slide 2. Probability Calculations in Bioinformatics. Topics: utility of probability calculations in bioinformatics; the Rule of Multiplication; the Rule of Addition; the Rule of Subtraction; the Rule of Everything; final thoughts.

Slide 3. Utility of Probability Calculations. Such calculations are useful in a wide variety of circumstances. Here are a few, chosen to illustrate certain tools useful in calculation.

Slide 4. How frequently would a DNA sequence appear by chance? This question arises in many disguises. I'll consider some you're familiar with.

Slide 5. You'll recall this question from Problem Set 1. I emphasized it because similar questions arise so commonly when considering sequences.

Slide 6. Given the proposed binding site, how frequently would you expect RNA polymerase to bind to random DNA? Here's another instance of the same sort of question. You have some idea of what a certain protein is looking for when it binds DNA. If you're right, then you might expect the probability of random binding to be relatively low.

Slide 7. How specific does the DNA binding site need to be to prevent unwanted repression? The same question, approached from the opposite end. A probability calculation tells you how specific you should expect a biologically relevant binding site to be.

Slide 8. How much overlap is required to ensure a meaningful sequence assembly? (Example reads: GAATATGAGCCTCTTCCTGA and GAAGTTTTCGCATAAAT.) In sequence assembly, a simple probability calculation helps you judge whether an overlap is worth your attention.

Slide 9. What's the probability that an arginine encoded by AGA will mutate to a hydrophilic amino acid? This kind of question has some theoretical importance: is the genetic code arranged in such a way that the potential for harmful mutations is minimized?

Slide 10. Oversampling and completeness: how many nucleotides will be missing if a genome sequencing project is taken to 6x coverage? Sequencing a large genome can be expensive! How can you calculate whether the amount of sequencing is enough to produce a reasonably complete genome?

Slide 11. Tools of Probability Calculations. How do you calculate these probabilities? Probability calculations can be hideously complex, but fortunately, most of the calculations you'll run across in bioinformatics are of the simple variety, requiring only a few simple tools.

Slide 12. The tools: Rule of Multiplication (intersection), Rule of Addition (union), Rule of Subtraction (complementation), and the Rule of Everything. Probability calculations often boil down to creative counting: how many ways are there that satisfy your criteria? However, I won't go into that much in this presentation. Instead, I'll go through three rules, each considered first in the context of a simple calculation and then in one of bioinformatic relevance.

Slide 13. What's the probability that Coin#1 AND Coin#2 come up tails? Rule of Multiplication (intersection of possibilities). A seemingly simple question.

Slide 14. If you're certain that the four possible outcomes are all equally likely, then you can just count: 1 desired outcome in 4 possible, or 1/4.

Slide 15. Or you can calculate the probability of two simple events both occurring. The probability that the first coin lands tails should be 1/2...

Slide 16. ...and the probability that the second lands tails should be the same. How do you get from the two individual probabilities to the probability that both occur?

Slide 17. P(TT) = 1/2 x 1/2 = 1/4. The probability that both occur is the product of the two individual probabilities. Why?

Slide 18. Well, in the universe of possibilities, half the time the first coin lands tails, and in half of those possibilities, the second coin lands tails. Half of a half: 1/2 x 1/2.
Slide 19. When can you resort to multiplying the probabilities of events to get the joint probability of both events occurring?

Slide 20. The Rule of Multiplication may apply if you're looking for the intersection of two possibilities: the first coin lands tails AND the second coin lands tails. (Rule of Multiplication: intersection.)

Slide 21. But it is also necessary that the two events be independent of one another. What is meant by independent? (Rule of Multiplication: intersection, independent events.)

Slide 22. What does independent mean? To illustrate: I wanted to find how likely it is for there to be a series of nice days in a row here in Richmond. So I went on the web...

Slide 23. ...and found that historically, one out of three days in February had some amount of rain. That's an average, of course.

Slide 24. ...but when I went to the weather prediction for the week, I found that there were seven consecutive days for which no rain was predicted. Is that credible? Are we looking at a remarkable occurrence, perhaps an effect of global warming?

Slide 25. What's the probability of 7 non-rainy days in a row? P(no rain on days 1 through 7) = ?

Slide 26. I know the probability of no rain on the first day. So long as the historical average is pertinent, P(no rain on day 1) = 2/3.

Slide 27. ...and similarly for days 2 through 7: P(no rain on day 2) = 2/3, and so on up to P(no rain on day 7) = 2/3. The Rule of Multiplication tells me how I might combine these probabilities. How?

Slide 28. I certainly want the intersection of all seven events; that is, I'm asking for the joint probability of all seven events occurring. According to the rule, I should therefore be able to multiply the individual probabilities:
P(no rain on day 1 AND day 2 AND ... AND day 7) = P(no rain on day 1) x P(no rain on day 2) x ... x P(no rain on day 7) = (2/3)^7

Slide 29. You can reach for your calculator (or computer), but before you do this calculation (indeed, before you do any calculation) you should have an estimate in mind of what you expect the answer to be. Otherwise you are selling your soul to the machine.

Slide 30. Imagine your estimate one day at a time. Where on the number line would you place the probability of just one non-rainy day?
[Figure: a number line from 0.0 to 1.0, with an arrow marking the running estimate after each day.]

Slide 31. Yes, 2/3, about 0.67. What about two non-rainy days? What's 2/3 of 2/3? Mentally divide the interval between 0 and 0.67 into thirds, and move the arrow down to the 2/3-of-2/3 mark.

Slides 32 and 33. That's about 0.45. Now divide the new interval into thirds as before to reach 2/3 of 2/3 of 2/3, and move the arrow down again. Notice that to calculate the amount to move down, all you have to do is divide the current number by 3.

Slides 34 and 35. Down to 0.30, the calculated probability of three non-rainy days in a row. Again, divide the new interval into thirds, and move the arrow down.

Slides 36 and 37. Down to about 0.20 for four non-rainy days in a row. Again...

Slides 38 and 39. Maybe 0.13 for five. Again, for six non-rainy days in a row...

Slides 40 and 41. To about 0.09. One last time, for the seventh day...

Slide 42. Close to 0.06. That should be what the calculator or computer gives us for (2/3)^7, the calculated probability of seven non-rainy days in a row. Actually, no need for the machine.

Slide 43. (2/3)^7 = ~0.06. That doesn't sound very likely! Are we in an unusual stretch of weather?

Slide 44. Or say today is Thursday, and suppose it indeed rains. What's the probability that it will rain tomorrow? The historical record seems to say there's a one in three chance...

Slide 45. ...but of course that's absurd! Knowing that it rained today makes it much more likely that it will rain tomorrow. The two events are not independent.

Slide 46. Whenever the outcome of one event biases the outcome of another, those two events are not independent. If two events are not independent, then the Rule of Multiplication cannot be applied to obtain a joint probability.

Slide 47. Back to the coins: P(TT) = 1/2 x 1/2 = 1/4. Are the results of two coin flips independent of one another?

Slide 48. Probably, but maybe not. Maybe the coins are magnets and subtly influence each other's flight.

Slide 49. Back to DNA, and the sequence GTATAC. Is this question related to the coin-flip question? Can you reword the question so that it is?
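The seven-dry-days calculation (slides 28 through 43) and the independence caveat (slides 44 through 46) can both be checked with exact fractions. The sketch below reproduces the "move down by one third" mental estimate, then contrasts (2/3)^7 with a hypothetical "sticky weather" model: a two-state Markov chain whose transition probabilities (dry stays dry 4/5 of the time, a rainy day is followed by a dry one 2/5 of the time) are invented for illustration, chosen only so that 1 day in 3 is still rainy in the long run. They are not weather data.

```python
from fractions import Fraction

P_DRY = Fraction(2, 3)  # historical: 1 day in 3 has some rain

# Independent days: Rule of Multiplication.
p_independent = P_DRY ** 7

# The mental estimate: each extra dry day moves the running value
# down by one third of itself.
estimate = Fraction(1)
for day in range(1, 8):
    estimate -= estimate / 3
    print(f"{day} dry day(s) in a row: {float(estimate):.2f}")

# Hypothetical correlated weather (assumed numbers, not data).
P_DRY_GIVEN_DRY = Fraction(4, 5)
P_DRY_GIVEN_RAIN = Fraction(2, 5)

# Long-run fraction of dry days for this chain; 2/3 by construction.
p_dry_stationary = P_DRY_GIVEN_RAIN / (P_DRY_GIVEN_RAIN + (1 - P_DRY_GIVEN_DRY))

# Dry on day 1, then "dry after dry" six more times.
p_correlated = p_dry_stationary * P_DRY_GIVEN_DRY ** 6

# Rain tomorrow given rain today: 3/5 in this model, not 1/3.
p_rain_after_rain = 1 - P_DRY_GIVEN_RAIN

print(f"independent days: {float(p_independent):.3f}")
print(f"correlated days:  {float(p_correlated):.3f}")
```

Both models agree that a third of days are rainy, yet under the assumed sticky model seven dry days come out roughly three times as likely (about 0.17 versus about 0.06). Once today's weather biases tomorrow's, multiplying 2/3 by itself no longer gives the joint probability.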
Slide 50. Maybe: what's the probability that nucleotide #1 is G AND nucleotide #2 is T AND nucleotide #3 is A AND nucleotide #4 is T AND nucleotide #5 is A AND nucleotide #6 is C?

Slide 51. ...but the question asks about a random piece of DNA, not specific nucleotides.

Slide 52. OK, back to coins for a moment. How do the following two questions differ from each other? How often would you expect to find a given face (pictured) in a coin flip? How often would you expect to find it in a random series of coin flips?

Slide 53. The answer to the first is surely 50%. The answer to the second is the same, no?

Slide 54. What about these two questions? How often would you expect to find a given sequence of three outcomes (pictured) in 3 coin flips? How often would you expect to find it in a random series of coin flips?

Slide 55. Again, the answer is the same for both. (And while we're in the area, what is the answer, presuming a fair coin?)
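Slides 52 through 55 claim that asking about one fixed window of flips and asking about a long random series give the same per-position frequency. A brute-force check, assuming a fair coin: `pattern_probability` applies the Rule of Multiplication to a pattern, and `frequency_per_position` enumerates every equally likely sequence of a given length and counts matches at every starting position. Both function names are hypothetical, and the pattern HTH is an illustrative stand-in for the three-flip pattern pictured on the slide, which is not recoverable.

```python
from fractions import Fraction
from itertools import product


def pattern_probability(pattern, symbol_probs):
    """P(len(pattern) independent draws spell out `pattern`): a product of per-symbol probabilities."""
    p = Fraction(1)
    for symbol in pattern:
        p *= symbol_probs[symbol]
    return p


def frequency_per_position(pattern, alphabet, length):
    """Exact average frequency of `pattern` per starting position, over all
    equally likely sequences of `length` symbols (brute-force enumeration)."""
    k = len(pattern)
    positions = length - k + 1
    hits = total = 0
    for letters in product(alphabet, repeat=length):
        seq = "".join(letters)
        hits += sum(1 for i in range(positions) if seq[i:i + k] == pattern)
        total += positions
    return Fraction(hits, total)


fair_coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

print(pattern_probability("HTH", fair_coin))       # 1/8
print(frequency_per_position("HTH", "HT", 3))      # 1/8: one window of 3 flips
print(frequency_per_position("HTH", "HT", 12))     # 1/8: a longer random series
```

Every starting position in a random series is itself a window of three independent flips, so the per-position frequency does not depend on how long the series is. Neighboring windows overlap and are not independent of each other, which matters only if you ask about spacing between hits rather than frequency.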
Slide 56. And in the context of DNA: how do these two questions differ? How often would you expect to find GTATAC in 6 random nucleotides? How often would you expect to find GTATAC in a random piece of DNA?

Slide 57. Answer: they don't. So long as the question calls for a frequency, you can answer either form.

Slide 58. So choose what seems to be the simpler form: how often would you expect to find GTATAC in 6 random nucleotides?

Slide 59. Now how to proceed? If you're ever stuck on a problem, simplify it until you reach a problem that you can answer. For example...

Slide 60. What's the probability that one random nucleotide is a G? P(G at position 1) = p1. By now you realize that this question depends on the organism. The answer might be 25%, but more likely it is significantly higher or lower. For now, just call the answer p1.

Slide 61. And the same is true for the nucleotides in the other 5 positions. Each probability is some number, usually easy to obtain: P(G at position 1) = p1, P(T at position 2) = p2, P(A at position 3) = p3, P(T at position 4) = p4, P(A at position 5) = p5, P(C at position 6) = p6. But how do we combine these six numbers into a single expected frequency for GTATAC? Are we looking for an intersection of events? Are the events independent?

Slide 62. If so, then you can use the Rule of Multiplication: P(GTATAC) = p1 x p2 x p3 x p4 x p5 x p6.

Slide 63. Tools of Probability Calculations: Rule of Multiplication (intersection), Rule of Addition (union), Rule of Subtraction (complementation), Rule of Everything. On to another tool, introduced through another simple calculation and a problem of bioinformatic relevance.

Slide 64. What's the probability that Coin#1 OR Coin#2 comes up tails? Rule of Addition (union of possibilities). Another seemingly simple question (note that OR always includes the possibility that both events occur).

Slide 65. In this problem, it's easy to count the number of (equally likely?) outcomes to get the answer, 3/4.

Slide 66. We can also calculate the result, noting the probability of each desired outcome. Gets T from the 1st coin but not the 2nd: 1/4.

Slide 67. You'll notice that each new outcome increases the likelihood. OR gets T from the 2nd but not the 1st: 1/4 + 1/4.

Slide 68. But the likelihood can never be greater than 100%. OR gets T from both: 1/4 + 1/4 + 1/4.

Slide 69. The probability that one of the outcomes occurs is 3/4, the sum of the individual probabilities: P(at least one T) = 1/4 + 1/4 + 1/4 = 3/4.

Slide 70. When can you resort to summing the probabilities of outcomes to get the probability of at least one of the outcomes occurring?

Slide 71. The Rule of Addition may apply if you're looking for the union of multiple possibilities: either TH OR HT OR TT has occurred. (Rule of Addition: OR, union.)

Slide 72. But it is also necessary that the outcomes be mutually exclusive of one another. What is meant by mutually exclusive? (Rule of Addition: OR, union, mutually exclusive outcomes.)

Slide 73. What does mutually exclusive mean? To illustrate, consider the probability that it rains Thursday OR Friday OR Saturday. The plants in my garden are drying up. If it doesn't rain on one of the next three days, they'll die, unless I shake myself out of my lethargy and go outside and water them. I'm not quite prepared to do that; I'd rather calculate how likely it is to rain.

Slide 74. No problem. I recall from 51 slides ago that the historical frequency of rain is 1 in 3, and I presume that's true for all three days: P(rain Thursday) = P(rain Friday) = P(rain Saturday) = 1/3.

Slide 75. Add up the three possibilities, and... Hey! No need to water! P(rain Thursday OR Friday OR Saturday) = 1/3 + 1/3 + 1/3 = 3/3 = 100%.

Slide 76. What's wrong with this scenario? (Absolutely nothing, in my opinion, but I'm talking about the Rule of Addition.)

Slide 77. Well, using the Rule of Addition implies that I can add outcomes like slices of a pie. The slices can't overlap.
[Figure: a pie labeled "It will rain", cut into three slices, Thu, Fri, and Sat, each 1/3.]

Slide 78. But that's ridiculous! It's not true that raining on Thursday makes raining on Friday or Saturday impossible! I could fix this...

Slide 79. Now the slices separate things that should be separated: raining all day excludes sunning all day.
[Figure: a pie labeled "The weather on Thursday will be", cut into three slices: rain all day, sun all day, and some rain, some sun.]

Slide 80. Or I could list mutually exclusive outcomes, e.g., on Thu-Fri-Sat it rained-rained-rained, or it rained-rained-sunned, etc.

Slide 81. The problem with my earlier addition of probabilities was that the events were not mutually exclusive, so the rule doesn't apply.

Slide 82. What is the probability that it will rain Thu, Fri, or Sat? You can count, but we'll soon consider a better strategy.

Slide 83. Rule of Addition (union of possibilities). What's the probability that an arginine encoded by AGA will mutate to a hydrophilic amino acid? Here's a molecular biology problem I could solve using the Rule of Addition, presuming that all nine possible single-nucleotide mutations of AGA are equally likely:
P(AGA → Gly) = 1/9, P(AGA → Lys) = 1/9, P(AGA → Ser) = 2/9, P(AGA → Thr) = 1/9. Sum?
But is this a valid use of the rule?

Slide 84. It is, so long as the events are mutually exclusive. Does the mutation of arginine to glycine exclude the possibility of that amino acid mutating to lysine? Certainly! It can't become two different things!

Slide 85. Tools of Probability Calculations: Rule of Multiplication (intersection), Rule of Addition (union), Rule of Subtraction (complementation), Rule of Everything. On to another tool, introduced through another simple calculation.

Slide 86. What's the probability that at least one coin comes up tails? Rule of Subtraction (complementation of possibilities). We solved this problem before, but can we do so using only the probabilities of the individual events, i.e. P(T) = 1/2?
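The codon calculation of slides 83 and 84 can be checked by enumerating all nine single-nucleotide substitutions of AGA. This sketch assumes the standard genetic code, encoded as the usual 64-letter amino-acid string with codons in TCAG order, treats as "hydrophilic" exactly the four amino acids listed on the slide (Gly, Lys, Ser, Thr), and, as on the slide, weights every substitution equally. The names `CODE` and `point_mutants` are illustrative.

```python
from collections import Counter
from fractions import Fraction

BASES = "TCAG"
# Standard genetic code, one-letter amino acids ('*' = stop), codons in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def point_mutants(codon):
    """Yield every codon differing from `codon` at exactly one position."""
    for pos, original in enumerate(codon):
        for base in BASES:
            if base != original:
                yield codon[:pos] + base + codon[pos + 1:]


mutants = list(point_mutants("AGA"))            # 9 equally likely substitutions
outcomes = Counter(CODE[m] for m in mutants)    # amino acid -> number of mutants

# Each substitution yields exactly one amino acid, so these outcomes are
# mutually exclusive and the Rule of Addition applies.
TARGETS = {"G", "K", "S", "T"}  # Gly, Lys, Ser, Thr, as listed on the slide
p_target = sum(Fraction(outcomes[aa], len(mutants)) for aa in TARGETS)

print(dict(outcomes))
print(p_target)  # 5/9
```

The enumeration also shows the rest of the nine outcomes: two substitutions (CGA, AGG) leave arginine unchanged, one (TGA) creates a stop codon, and one (ATA) gives isoleucine.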
Slide 87. How about P(at least 1 T) = P(T1) x P(T2) = 1/2 x 1/2?

Slide 88. That doesn't work. The events are independent, as required by the Rule of Multiplication, but we're not looking for 1st coin tails AND 2nd coin tails.

Slide 89. How about P(at least 1 T) = P(T1) + P(T2) = 1/2 + 1/2?

Slide 90. This is somewhat better, since we are looking for either the first coin falling tails OR the second doing so, but...

Slide 91. ...the events are not mutually exclusive, as required by the Rule of Addition. It's possible that both Coin#1 and Coin#2 land tails.

Slide 92. The problem becomes simpler if I change the wording...

Slide 93. ...to the probability that it is not true that both coins come up heads.

Slide 94. This says the same thing, but it's easier to calculate, at least the both-heads part. We've seen that before, and solved it with the Rule of Multiplication: P(HH) = P(H) x P(H) = 1/2 x 1/2 = 1/4.

Slide 95. But we don't want the probability of both heads. Rather, we want the probability of NOT both-heads. What would that be? P(NOT HH) = ???

Slide 96. We can make use of the fact that P(HH) + P(NOT HH) = 1, i.e. there's a 100% chance that two heads either occur or do not occur. So P(NOT HH) = 1 - P(HH).

Slide 97. ...and from this we can solve the problem: P(NOT HH) = 1 - 1/4 = 3/4 = P(at least one T).

Slide 98. This trick of simplifying the question worked because the simplification went from a statement to its complement: if one is true, the other must be false. (Rule of Subtraction: NOT. Go from yin to yang; the probabilities add to 1.)

Slide 99. Tools of Probability Calculations: Rule of Multiplication (intersection), Rule of Addition (union), Rule of Subtraction (complementation), Rule of Everything. Now on to the main event.

Slide 100. Rule of Everything (Do the right thing). Don't apply rules mindlessly. There is no rule so good and so general that it can't be mangled and abused. Rules cannot replace thought. Visualize what you're trying to calculate. Estimate what the final number ought to be. Maintain control over the proceedings. Don't rely on a dumb rule to lead you to success.

Slide 101. Oversampling and completeness: how many nucleotides will be missing if a genome sequencing project is taken to 6x coverage? Here we are. You're sequencing the 120 Mb Drosophila genome. Some damn fool budgeted for only 6x coverage. Is that enough? At 6x coverage, what fraction of the genome will be sequenced? How many nucleotides will be missed?

Slide 102. This isn't an easy question to answer, but it's essentially identical to another question you've seen, one that might be a bit easier to think about...

Slide 103