![Page 1: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/1.jpg)
Bio5488
Genomics
Spring, 2016
Lectures: Mon, Wed 10:00-11:30 am
Lab: Fri 10:00-11:30 am
4th floor classroom
4515 MSRB
![Page 2: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/2.jpg)
Outline of the day
• Outline of the course
• What is genomics?
• A little history
• The simple principles of genomics
• Being quantitative
• From a student to an investigator
![Page 3: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/3.jpg)
A few TA administrivia...
• If you didn’t receive an email from [email protected] this week, please email [email protected] or talk to a TA after class
• If you’re taking the lab:– Please fill out the anonymous survey:
https://goo.gl/YfGnEp
– Read assignment 1 (link on the syllabus section in Blackboard)
– Attempt to install the required software (instructions in the resources section of Blackboard)
– Bring your laptop to class on Friday
![Page 4: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/4.jpg)
Course Web Site
• http://www.genetics.wustl.edu/bio5488/home.html
• Blackboard: bb.wustl.edu
• Linux Primer
• Python Primer
• Lecture notes
• Schedule
• Weekly Assignments and Answers
• Weekly Readings
![Page 5: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/5.jpg)
Grading
4 credit
• ¼ midterm
• ¼ final
• ½ weekly assignments
3 credit
• ½ midterm
• ½ final
What is the key to your success?
![Page 6: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/6.jpg)
![Page 7: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/7.jpg)
![Page 8: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/8.jpg)
http://www.genome.gov/Director/
![Page 9: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/9.jpg)
History of Bio5488
1998
Bio5495
Eddy
2003
Bio5488
Cohen
Mitra
2006
Bio5495
Brent
2012
Bio5488
Wang
Conrad
HGP
Computational
Biology
Functional
Genomics
Microarray
Systems/Synt
hetic
Biology
Next-gen
Sequencing
ENCODE
GWAS
Roadmap
etc
![Page 10: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/10.jpg)
History of Genomics and Epigenomics
1865 Gregor Mendel: founding of genetics
1953 Watson and Crick: double helix model for DNA
1955 Sanger: first protein sequence, bovine insulin
1970 Needleman-Wunsch algorithm for sequence alignment
1977 Sanger: DNA sequencing
1978 The term “bioinformatics” appeared for the first time
1980 The first complete gene sequence (Bacteriophage FX174), 5386 bp
1981 Smith-Waterman algorithm for sequence alignment
1981 IBM: first Personal Computer
1983 Kary Mullis: PCR
1986 The term "Genomics" appeared for the first time: name of a journal
1986 The SWISS-PROT database is
1987 Perl (Practical Extraction Report Language) is released by Larry Wall.
1990 BLAST is published
1995 The Haemophilus influenzea genome (1.8 Mb) is sequenced
1996 Affymetrix produces the first commercial DNA chips
2001 A draft of the human genome (3,000 Mbp) is published
![Page 11: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/11.jpg)
History of Genomics and EpigenomicsHGP
Computational
Biology
Machine Learning
Data mining
Structural informatics
Drug Design
Statistical Modeling
Database
Gene
expression
Microarray
Omics
Systems/Synt
hetic
Biology
Next-gen
Sequencing
ENCODE
GWAS
Roadmap etc
Sequence analysis
Hidden Markov Model
Gene finding
BLAT
Genome Browser
Motif finding
Assembly
90’s
00’s
10’s
Comparative
Genomics
Evolution
![Page 12: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/12.jpg)
Genome, genetics, and genomics
• What is a genome?– The genetic material of an organism.
– A genome contains genes, regulatory elements, and other
mysterious stuff.
• What is genetics?– The study of genes and their roles in inheritance.
• What is genomics– The study of all of a person's genes (the genome), including
interactions of those genes with each other and with the
person's environment
![Page 13: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/13.jpg)
What about genomes• Characterize the genome
– How big
– How many genes
– How are they organized
• Annotate the genome– What, where, and how
• Modern genomics: “ChIPer” vs “Mapper”– Direct measurement
– Inference
– Comparison
– Evolution
• From genome to molecular mechanisms to diseases– Genomes/epigenomes of diseased cells
• What do you want to learn from this class?– Being quantitative
– Concept/philosophy
– Techniques/problem solving skills
– Do not forget genetics!!!
![Page 14: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/14.jpg)
Motivation slides
![Page 15: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/15.jpg)
The sequence explosion
Genomes sequencedhttp://www.genomesonline.org/, (as of November 2015)
49559 Bacteria, 1136 Archaea, 11122 Eukarya, 4473 viruses
18385 metagenome samples
2298 Synthetic genomes
![Page 16: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/16.jpg)
The Human Genome Project
June 26, 2000 President Clinton, with
Craig Venter and
Francis Collins,
announces completion
of "the first survey of
the entire human
genome."February 15, 2001
April 1, 2010
The human genome completed!
– 2001
The human genome completed,
again! – 2010
10 years after draft sequence:
What have we learned?• Genome sequencing
• Functional elements
• Evolution of genome
• Basis of diseases
• Human history
• Computational biology
![Page 17: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/17.jpg)
Francis CollinsNIH Director
“ If I was a senior in college or a
first-year graduate student trying to
figure out what area to work in, I
would be a computational
biologist.
-- During AAAS Meeting Jan 2010
”
COLLINS: .... Computational biologists
are having a really good time and it’s going to get better.
ROSE: Their day is coming?
COLLINS: Their day is here, but it’s
going to be even more here in a few years.
COLLINS: They’re going to be the breakthrough artists ....
-- Mar 15, 2010, Charlie Rose Show
![Page 18: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/18.jpg)
![Page 19: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/19.jpg)
The Human genome:
the “blueprint” of our body
1013 different cells in
an adult human
The cell is the basic
unit of life
DNA = linear
molecule inside the
cell that carries
instructions needed
throughout the
cell’s life ~ long
string(s) over a
small alphabet
Alphabet of four
(nucleotides/bases)
{A,C,G,T}
GTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCT
CCGAACAATAAAGATTCTACAATACTAGCTTTTATGGTTATG
AAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAA
ATTAACGAATCAAATTAACAACCATAGGATGATAATGCGATT
AGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCG
ATGATTTTTGATCTATTAACAGATATATAAATGGAAAAGCTG
CATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTA
TTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAATT
James Watson
Francis Crick
Fred Sanger
![Page 20: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/20.jpg)
DNA, Chromosome, and Genome
![Page 21: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/21.jpg)
Building an Organism
cellDN
AEvery cell has the same sequence of DNA
Subsets of the DNA
sequence determine the
identity and function of
different cells
![Page 22: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/22.jpg)
What makes us different?
Differences between individuals?
Differences between species?
![Page 23: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/23.jpg)
One genome, thousands of epigenomes
Embryonic
stem cells
Fetal
tissues
Adult cells
and tissues
![Page 24: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/24.jpg)
Understanding the genome
• View from 2000
– Protein-coding genes 35,000 - 120,000
– Regulatory sequence Less than protein-
coding information
– Transposons Junk DNA
• We now know ...
All these are WRONG!
![Page 25: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/25.jpg)
How many genes do we have?
Science 2005
fly
worm
human
weed
fish
rice
What we used to think
Gene numbers do not correlate with organism complexity.
Many gene families are surprisingly old.
![Page 26: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/26.jpg)
Complexity, Genome Size and the C-value Paradox
www.genomesize.comOrganism Genome Size (MB)
Amoeba 670,000
Fern 160,000
Salamander 81,300
Onion 18,000
Paramecium 8,600
Toad 6,900
Barley 5,000
Chimp 3,600
Gorilla 3,500
Human 3,500
Mouse 3,400
Dog 3,300
Pig 3,100
Rat 3,000
Boa Constrictor 2,100
Zebrafish 1,900
Chicken 1,200
Fruit fly 180
C. elegans 100
Plasmodium falciparum 25
Yeast, Fission 14
Yeast, Baker's 12Escherichia coli 4.6
Bacillus subtilis 4.2
H. influenzae 1.8
Mycoplasma genitalium 0.60
C-value: the amount of
DNA contained within a
haploid nucleus (e.g. a
gamete) or one half the
amount in a diploid
somatic cell of a
eukaryotic organism,
expressed in picograms
(1pg = 10−12 g).
![Page 27: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/27.jpg)
Complexity and Organism Specific Genes
• Only 14 out of 731 genes on mouse chromosome 16 have no human homolog (Celera)
• Many genes specific to the mouse are olfactory receptors, also some differences in immunity and reproduction
![Page 28: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/28.jpg)
Most functional information is non-coding
!5% highly conserved, but only 1.5% encodes proteins
21
chr2 (q31.1) 21 p14 2p12 13 31.1 q34 q35
chr2:
DLX1
DLX2
Vertebrate Cons
ChimpRhesus
BushbabyTree_shrew
MouseRat
Guinea_PigShrew
HedgehogDogCat
HorseCow
ArmadilloElephantTenrec
OpossumPlatypusLizard
ChickenZebrafishTetraodon
FuguSticklebackMedaka
172660000 172665000 172670000 172675000UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative Genomics
Vertebrate Multiz Alignment & PhastCons Conservation (28 Species)
DLX1
GapsHumanChimpRhesus
BushbabyTree_shrew
MouseRat
Guinea_PigShrew
HedgehogDogCat
HorseCow
ArmadilloElephantTenrec
OpossumPlatypusLizard
ChickenZebrafishTetraodon
FuguSticklebackMedaka
UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative Genomics
Vertebrate Multiz Alignment & PhastCons Conservation (28 Species)
K P R T I Y S S L Q L Q A L N
1A A A C CC A GGA CGA T T T A T T CC A GT T T GC A GT T GC A GGC T T T GA A CA A A C CC A GGA CGA T T T A T T CC A GT T T GC A GT T GC A GGC T T T GA A CA A A C CC A GGA CGA T T T A T T CC A GC T T GC A GT T GC A GGC T T T GA A CA A A C CC A GGA CGA T T T A T T CC A GT T T GC A GT T GC A GGC T T T GA A TA A A C CC A GGA CGA T T T A T T CC A GT T T GC A GT T GC A GGC T T T GA A CA A A C CC A GGA C A A T T T A T T CC A GT T T GC A GT T GC A GGC T T T GA A CA A A C CC A GGA C A A T T T A T T CC A GT T T GC A GT T GC A GGC T T T GA A CCCCC C T A GGA C A A T T T A T T CC A GT T T GC A GC T GGA CGC T T T GA A TA A A C CC A GGA CGA T T T A T T CC A GT T T GC A GT T GC A GGC T T T GA A CA A GC CC A GGA C A A T C T A T T CC A GT T T GC A GT T GC A GGC T T T GA A CA A A C CC A GGA CGA T T T A C T CC A GT T T GC A GT T GC A GGC T T T GA A CA A A C CC A GGA CGA T T T A T T CC A GT T T GC A GT T GC A GGC T T T GA A CA A A C CC A GGA CGA T T T A T T CC A GT T T GC A GT T GC A GGC T T T GA A CA A A C CC A GGA CGA T T T A T T CC A GT T T GC A GT T GC A GGC T T T GA A CA A A C CC A GGA CGA T T T A T T CC A GT T T GC A GT T GC A GGC T T T GA A CA A A C CC A GGA C A A T T T A T T CC A GT T T GC A GT T GC A GGC T T T GA A CA A A C C T A GGA CGA T T T A T T CC A GT T T GC A GC T GC A GGC T T T GA A TA A A C CC A GGA C T A T T T A T T CC A GT C T GC A GT T GC A GGC T T T GA A CA A A C CC A GGA C T A T A T A T T CC A GT T T GC A GT T GC A GGC A T T GA A CA A GC CGC GC A CC A T C T A C T CC A GC C T C C A GC T CC A GGC C T T GA A CA A A C CC A GGA C T A T T T A T T CC A GT T T GC A GC T GC A GGC T C T GA A CA A GC CCC GGA CC A T A T A C T CC A GT C T C C A GC T GC A GGC T C T GA A CA A A C CC A GGA C T A T C T A T T CC A GT T T A C A GC T CC A GGC CC T GA A CA A A C C A A GGA C T A T C T A T T C A A GT T T A C A A C T CC A A GC CC T GA A CA A A C C A A GGA C T A T C T A T T CC A GT T T A C A A C T T C A A GC T C T A A A CA A A C C A A GGA C T A T A T A T T CC A GT T T A C A GC T T C A GGC T C T GA A C
What do they do?
Most functional information is non-coding
![Page 29: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/29.jpg)
Ultra conserved elements
ultra conserved
e.d 12.5
![Page 30: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/30.jpg)
HARs: Human accelerated regions
• 118 bp segment with 18 changes between
the human and chimp sequences
• Expect less than 1
![Page 31: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/31.jpg)
Human HAR1F differs from the ancestral
RNA structure
![Page 32: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/32.jpg)
Main components in the Human genome
Only 1.5% of the human genome are protein-coding regions
Transposable elements make up almost half of the human genome
Barbara McClintock
![Page 33: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/33.jpg)
Transposable Elements (TEs)
Evolutionary pattern of LTR10B1 genomic copies
Wang et al. PNAS 2007
The LTR10 and MER61 families are
particularly enriched for copies with a p53
site. These ERV families are primate-specific
and transposed actively near the time when
the New World and Old World monkey
lineages split.
TEs can shape transcriptional network
![Page 34: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/34.jpg)
Understanding Human Disease
NHGRI GWA Catalog www.genome.gov/GWAStudies www.ebi.ac.uk/fgpt/gwas/
Published Genome-Wide Associations through 12/2013 Published GWA at p≤5X10-8 for 17 trait categories
![Page 35: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/35.jpg)
ENCODE Project
Proposed in Nov 2009
HapMap 1000 Genomes
Epigenome
TCGA
![Page 36: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/36.jpg)
Thinking Quantitatively
![Page 37: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/37.jpg)
Biology Is A Quantitative Science!!!
1) Mendel’s Laws
2) Chargaff’s Rules
Gregor Mendel
1823-1884
Erwin Chargaff
1929-1992
![Page 38: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/38.jpg)
Thinking Quantitatively• Space
– Be comprehensive
• Signal to Noise Ratio– Sensitivity, specificity, dynamic range
– What is my background control?
• Distributions– Normal, Gaussian, Poisson, negative binomial, extreme value,
hypergeometric, etc.
– Discrete vs continuous
• The P value
• Statistics, Probability, Computation, and Informatics
• Don’t forget genetics!!!
![Page 39: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/39.jpg)
Spaces: Be comprehensive
• Conditions: spatial, temporal, treatment – think about controlling for multiple variables
• Think globally – interaction between local features and global features (Placenta histone example)
• Be comprehensive about what assumptions are made – some we know, some we don’t (genome assembly example)
![Page 40: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/40.jpg)
Signal to Noise
m/z
rela
tive a
bundance
400 600 800 1000
100
75
50
25
input
ChIP
Different sources of noise
![Page 41: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/41.jpg)
Sensitivity, Specificity, and Dynamic Range
• Sensitivity– What is the smallest signal that can reliably be detected
(signal to noise)?
– True positive rate
• Specificity– How well can we discriminate between similar signals?
– True negative rate
• Dynamic Range– What is the linear range of detection?
– What is the range of natural variation?
![Page 42: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/42.jpg)
Distributions
Normal (Gaussian)
Poisson
Extreme value
Negative binomial
![Page 43: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/43.jpg)
The P value*for discreet variables
# o
f o
ccurr
ences
0 1 2 3 4 5 6 7 8 9 10
5
10
15
20
25What is the chance of getting exactly
seven?
10.0
139172317931
9
#
#
P
P
trialsoftotal
sevenwerethattrialsofP
# o
f occurr
ences
0 1 2 3 4 5 6 7 8 9 10
5
10
15
20
25
16.0
139172317931
139
#
#
P
P
trialsoftotal
betterorsevenwerethattrialsofP
What is the chance of getting seven or
better?
![Page 44: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/44.jpg)
“There Are Three Kinds of Lies:
Lies, Damned Lies, and Statistics”
• Comparing averages
• Comparing variation
-Benjamin Disraeli (1804-1881)
Statistics: drawing inferences from the results
of an experiment
0
0.2
0.4
0.6
0.8
1
1 7
13
19
25
31
37
43
49
55
61
67
73
79
85
91
97
![Page 45: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/45.jpg)
Gaussian (Normal) Distributions I
• Mean (x) =
• Standard Deviation (s) =
0
0.2
0.4
0.6
0.8
1
1 7
13
19
25
31
37
43
49
55
61
67
73
79
85
91
97
height
# o
f people
n
i
ixn 1
1
1
)(1
2
n
xxn
i
i
![Page 46: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/46.jpg)
Gaussian (Normal) Distributions II
• Mean (x or )
• Standard Deviation (s or )
• Compare means (t-tests)
• Compare standard deviations (f-tests)
• Calculate P-values for particular results (z-tests)
0
0.2
0.4
0.6
0.8
1
1 7
13
19
25
31
37
43
49
55
61
67
73
79
85
91
97
height
# o
f people
![Page 47: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/47.jpg)
“The most important questions of life are
…really only problems of probability.”
-Pierre Simon, Marquis de Laplace (1749-
1827)
Probability: Computing the chance of a particular
outcome of an experiment
E
EC
S
P(E)=E/S
P(Ec)=1-P(E)
![Page 48: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/48.jpg)
Independent Events/Mutually Exclusive Events
What is the probability of A and B both occurring?
P(AB) = P(A)*P(B) (Independent)
P(A|B) = P(A) (Independent)
P(A|B)*P(B) = P(B|A)*P(A) (Bayes rule)
P(AB) = P(A|B) = 0 (Mutually Exclusive)
What is the probability of A or B occurring (independent)?
P(A) * P(B) + P(A) * 1-P(B) + 1-P(A) * P(B)
both + A only + B only
What is the probability of A or B occurring (mutually exclusive)?
P(A) + P(B)
![Page 49: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/49.jpg)
An example
Ala (A) 7.81 Gln (Q) 3.94 Leu (L) 9.62 Ser (S) 6.88
Arg (R) 5.32 Glu (E) 6.60 Lys (K) 5.93 Thr (T) 5.45
Asn (N) 4.20 Gly (G) 6.93 Met (M) 2.37 Trp (W) 1.15
Asp (D) 5.30 His (H) 2.28 Phe (F) 4.01 Tyr (Y) 3.07
Cys (C) 1.56 Ile (I) 5.91 Pro (P) 4.84 Val (V) 6.71
What is the probability that a peptide of length 25 contains at least one SP motif?
P(SP) = P(S)P(P) = .0688*.0484 =.00329
P(SPc) = 1 - P(SP) = .9966
P(no SP anywhere in a 25 mer) = P(SPc)24 = 0.92
P(at least one SP in a 25 mer) = 1- P(SPc)24 =0.08
What is the probability that the last residue of a protein is either K or R?
P(K)+P(R) = .0593 + .0532= .11
Amino acid percentages of Swisprot
![Page 50: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/50.jpg)
counting permutations
Permutations:
Question: How many different 5-mer
sequences can I make using each of the
amino acids S, T, A, G, P once and only
once?
Answer:
5! = 5*4*3*2*1 = 120
![Page 51: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/51.jpg)
counting combinations
Combinations:
Question: How different 5-mer sequences
can I make using three Ser’s and two
Pro’s
Answer:
101*2*1*2*3
1*2*3*4*5
!2!*3
!5
![Page 52: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/52.jpg)
Example
A particular cluster has 25
coexpressed genes in it. 15 of these
genes are annotated as being
involved in rRNA transcription.
Is 15/25 significant?
![Page 53: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/53.jpg)
Hypergeometric Probability Distributionthe “overlap problem” or sampling without replacement
• N = number of genes in the genome (6000 for yeast)
• n = number of genes in the cluster (25)
• m = number of rRNA transcription genes (109 from MIPS)
• s = number of rRNA transcription genes in the cluster (15)
N
n s m
![Page 54: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/54.jpg)
Hypergeometric Probability Distribution
N
n s m
)!(!
!)(
bab
a
b
awhere
n
N
in
mN
i
m
sXPn
si
![Page 55: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/55.jpg)
Hypergeometric Probability Distribution
6000
25 15 109
1625
15
101
25
6000
25
1096000109
)15(
i
iiXP
![Page 56: Bio5488 Genomics Spring, 2012genetics.wustl.edu/bio5488/files/2016/01/Bio5488_Intro_2016.pdfBio5488 Genomics Spring, 2016 Lectures: Mon, Wed 10:00-11:30 am Lab: Fri 10:00-11:30 am](https://reader034.vdocuments.us/reader034/viewer/2022050404/5f8155d3a0dea836d3226553/html5/thumbnails/56.jpg)
Therefore in our example…
15/25 rRNA transcription genes in the cluster is significant.
BUT…
1) 10 out of 25 genes in the cluster are not rRNA transcription genes.
2) 94 rRNA transcription genes are not in the cluster.
3) What about the other genes in the cluster?
6000
25 15 109