![Page 1: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/1.jpg)
Introduction to genomics
CM226: Machine Learning for Bioinformatics
Fall 2017 Sriram Sankararaman
![Page 2: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/2.jpg)
What is this course about? Bioinformatics: Answering biological questions using tools from computer science, statistics and mathematics.
Machine Learning: Learning from data
![Page 3: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/3.jpg)
Personalized medicine
A biological question
![Page 4: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/4.jpg)
Cystic fibrosis
Genetic disease
Caused by mutations in the CFTR gene
![Page 5: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/5.jpg)
Cystic fibrosis
Ivacaftor is a drug approved for patients with a mutation that changes G to D at position 551 in the protein encoded by CFTR
Ramsey et al. NEJM 2011
![Page 6: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/6.jpg)
Relapse in ALL
Acute Lymphoblastic Leukemia in children
Five-year survival rates ~80%
Poorer survival rates among African-Americans and Hispanics
Higher risk of relapse among children with Native American ancestry
Yang et al. Nature Genetics 2011
[Figure; axis label: Risk of relapse]
![Page 7: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/7.jpg)
Path to personalized medicine
Basic biology
Understand the architecture of a disease
Is the disease genetic or environmental?
What are the genetic and environmental factors?
Can we develop drugs to target defective genes?
Disease prediction
Predict disease risk for an individual from genetic data
![Page 8: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/8.jpg)
Path to personalized medicine
Basic biology
Understand the architecture of a disease
Is the disease genetic or environmental?
What are the genetic and environmental factors?
Disease prediction
Predict disease risk for an individual from genetic data
Problems of statistical learning
![Page 9: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/9.jpg)
What is this course about? Bioinformatics: Answering biological questions using tools from computer science, statistics and mathematics.
Machine Learning: Learning from data
![Page 10: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/10.jpg)
What is this course about?
Genomic revolution in biology
2001 Human genome project
2010 1000 genomes project
![Page 11: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/11.jpg)
What is this course about?
Genomic revolution in biology
Stephens et al. PLoS Biology 2015
[Figure; axis labels: Number of human genomes, Number of bases]
![Page 12: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/12.jpg)
What is this course about?
Personal genomics
![Page 13: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/13.jpg)
Statistics and computing important in obvious and subtle ways
What does this mean for us?
![Page 14: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/14.jpg)
Administrivia
![Page 15: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/15.jpg)
Course goals
Identify biological questions.
Translate these questions into a statistical model.
Perform inference within the model.
Apply the model to data.
Interpret the results.
Statistically sound? Biologically meaningful?
![Page 16: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/16.jpg)
Course goals
For CS/Stats/EE students, identify important quantitative problems in Bioinformatics.
For Bioinformatics/Human Genetics students, learn a new set of tools and understand their strengths and limitations.
![Page 17: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/17.jpg)
Enrolment/PTEs
Course over-subscribed.
Students on the waitlist will be enrolled in case someone drops.
I am not giving out PTEs until after problem set 0 submission date.
Expect several students will drop the course.
Last year, several students dropped late in the quarter.
Does require mathematical maturity (more later).
![Page 18: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/18.jpg)
Enrolment/PTEs
Hope that all qualified students will be able to enroll.
Problem set 0 intended to mitigate late drops.
This is a course requirement.
Graded to assess your background but not part of final grade.
Be honest/realistic with yourself about your background.
![Page 19: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/19.jpg)
Prerequisites
Some programming experience (R strongly encouraged)
![Page 20: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/20.jpg)
Prerequisites
Machine learning builds upon
Probability and Statistics
Linear algebra
Optimization
![Page 21: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/21.jpg)
Math background review
Problem set 0 intended to evaluate if you have the necessary background.
1. Minimum background section
2. Moderate background section
If you pass (2), you are in good shape.
If you pass (1) but not (2), you should expect to fill in the background needed (we will also cover this material in class).
If you find (1) difficult, you should consider other classes to obtain the necessary background.
![Page 22: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/22.jpg)
Course format
Introduce important problems in Bioinformatics.
Use these problems to motivate the study of learning algorithms
Will study different aspects of learning algorithms
Statistical
Computational
Issues that arise in practice
Lectures will switch between studying the methodology and the applications
![Page 23: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/23.jpg)
Course format
Problem sets (worth 50% of the final grade in total).
Will include programming and data analyses.
Homeworks are due at 11:59pm on due date.
Late submissions not accepted.
Will use gradescope to manage submissions and grading. Code for the class: M6PPWB.
All solutions must be clearly written or typed. Unreadable answers will not be graded. We encourage using LaTeX to typeset answers.
For some assignments, it might be the case that only a subset of the problems will be graded in detail. Students may or may not be told which problems will be graded in advance.
Solutions will be graded on both correctness and clarity.
You are free to discuss homework problems. However, you must write up your own solutions. You must also acknowledge all collaborators.
![Page 24: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/24.jpg)
Course format
Exams
In-class midterm (20%) Date: November 7
In-class final (30%) Date: December 7
Exams are in class, closed-book and closed-notes and will cover material from the lectures and the problem sets.
No alternate or make-up exams will be administered, except for disability/medical reasons documented and communicated to the instructor prior to the exam date. In particular, exam dates and times cannot be changed to accommodate scheduling conflicts with other classes.
Problem set 0 will be graded but not count towards final grade. We will not grade any other homeworks/exams unless you submit problem set 0.
![Page 25: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/25.jpg)
Forums
Piazza
You should have already received an email.
Otherwise sign up at: https://piazza.com/ucla/fall2017/cm226/home
Strongly encourage students to post here rather than email course staff directly (you will get a faster response this way)
If you do need to contact the staff privately, Piazza allows you to do this.
![Page 26: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/26.jpg)
Forums
Gradescope
Manage problem sets and exam submissions.
For problem sets, you will need to upload pdfs of your submission to gradescope.
Regrade requests must be made through gradescope within one week after the graded homeworks have been handed out, regardless of your attendance on that day and regardless of any intervening holidays such as Memorial Day.
We reserve the right to regrade all problems for a given regrade request.
![Page 27: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/27.jpg)
Tentative class syllabus
See class website:
http://web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/html/index.html
![Page 28: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/28.jpg)
Office hours
My office hours: Wednesday 11am-noon or by appointment, at Boelter 4531D
TA: Gleb Kichaev
Office hours: TBD
![Page 29: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/29.jpg)
Questions?
![Page 30: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/30.jpg)
Goals for this class and the next
Introduction to genomics
Introduction to statistics
![Page 31: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/31.jpg)
Crash-course in genomics
Molecular biology: How does the genome code for function?
Genetics: How is the genome passed on from parent to child?
Genetic variation: How does the genome change when it is passed on?
Population and evolutionary genetics: How does the genome vary across populations and species?
Genome sequencing: How do we read the genome?
![Page 32: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/32.jpg)
Outline
Molecular biology: How does the genome code for function?
Genetics: How is the genome passed on from parent to child?
Genetic variation: How does the genome change when it is passed on?
What can we learn from genetic variation?
Genome sequencing: How do we read the genome?
![Page 33: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/33.jpg)
Traits/Phenotype
Trait/phenotype: Any observable that is inherited
Height, eye color, disease status, cellular measurements, IQ
Instructions that modulate traits found in the genome
Galton et al. 1877
![Page 34: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/34.jpg)
How does the genome code for function?
![Page 35: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/35.jpg)
Outline
Molecular biology: How does the genome code for function?
Genetics: How is the genome passed on from parent to child?
Genetic variation: How does the genome change when it is passed on?
What can we learn from genetic variation?
Genome sequencing: How do we read the genome?
![Page 36: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/36.jpg)
Correlation of traits across relatives
Galton et al. 1877
![Page 37: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/37.jpg)
Typical human cell has 46 chromosomes
22 pairs of homologous chromosomes (autosomes)
1 pair of sex chromosomes
Genetics and inheritance
The chromosome painting collective
![Page 38: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/38.jpg)
Genetics and inheritance
One member of each pair of homologous chromosomes comes from the father (paternal) and the other from the mother (maternal)
In males, Y from father and X from mother
The chromosome painting collective
![Page 39: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/39.jpg)
Outline
Molecular biology: How does the genome code for function?
Genetics: How is the genome passed on from parent to child?
Genetic variation: How does the genome change when it is passed on?
What can we learn from genetic variation?
Genome sequencing: How do we read the genome?
![Page 40: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/40.jpg)
Causes of genetic variation
DNA not always inherited accurately
Mutations: changes in DNA
Changes at a single base (single nucleotide)
Can have more complex changes
![Page 41: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/41.jpg)
More definitions
Locus: position along the chromosome (could be a single base or longer).
Allele: one of the variant forms found at a locus
Genotype: sequence of alleles along the loci of an individual
Individual 1: (1,CT),(2,GG)
Individual 2: (1,TT), (2,GA)
Individual 1, maternal: A T C C T T A G G A
Individual 1, paternal: A T C T T T C A G A
Individual 2, maternal: A T C T T T C A G A
Individual 2, paternal: A T C T T T C A A A
Locus 1 is the 4th base and locus 2 is the 9th base.
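The definitions above can be made concrete with a small sketch that reads the genotype at each locus from an individual's two chromosome copies. The sequences are copied from the slide; the locus positions (4th and 9th base) and the maternal/paternal ordering are assumptions inferred from the genotypes listed, and the names `LOCI` and `genotype` are hypothetical.

```python
# Haplotypes copied from the slide; which row is maternal vs. paternal is assumed.
haplotypes = {
    "Individual 1": ("ATCCTTAGGA", "ATCTTTCAGA"),
    "Individual 2": ("ATCTTTCAGA", "ATCTTTCAAA"),
}

# Assumed 0-based positions of the two loci (the 4th and 9th bases).
LOCI = {1: 3, 2: 8}


def genotype(hap_pair, loci=LOCI):
    """Return {locus: two-letter genotype} for one individual's pair of haplotypes."""
    first, second = hap_pair
    return {locus: first[pos] + second[pos] for locus, pos in loci.items()}


genotypes = {name: genotype(pair) for name, pair in haplotypes.items()}
for name, g in genotypes.items():
    print(name, g)
```

Under these assumptions the output agrees with the genotypes written on the slide: (1,CT),(2,GG) for Individual 1 and (1,TT),(2,GA) for Individual 2.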
![Page 42: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/42.jpg)
Single Nucleotide Polymorphism (SNP)
Form the basis of most genetic analyses
Easy to study in high-throughput (millions at a time)
Common (80 million SNPs discovered in 2500 individuals)
Two human chromosomes have a SNP every ~1000 bases
Individual 1, maternal: A T C C T T A G G A
Individual 1, paternal: A T C T T T C A G A
Individual 2, maternal: A T C T T T C A G A
Individual 2, paternal: A T C T T T C A A A
Locus 1 is the 4th base and locus 2 is the 9th base.
![Page 43: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/43.jpg)
Genetic inheritance
Segregation (Mendel’s first law)
Generation 0: AA X aa
Gametes: A from AA, p(A) = 1; a from aa, p(A) = 0
Generation 1: Aa
Gametes: A or a, p(A) = 0.5
![Page 44: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/44.jpg)
Genetic inheritance
Segregation (Mendel’s first law)
Generation 0: AA X aa. Generation 1: Aa (1.0)
Generation 0: AA X Aa. Generation 1: AA (0.5), Aa (0.5)
Generation 0: Aa X Aa. Generation 1: AA (0.25), Aa (0.50), aa (0.25)
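The offspring proportions for these crosses follow from each parent passing on one of its two alleles with probability 1/2. A minimal sketch of that calculation, where the helper name `cross` is hypothetical and genotypes are written as two-letter strings:

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1, parent2):
    """Offspring genotype probabilities at one locus under Mendel's first law.

    Each parent passes on one of its two alleles with probability 1/2.
    """
    counts = Counter()
    for a1, a2 in product(parent1, parent2):
        # Sort so that "aA" and "Aa" count as the same genotype.
        counts["".join(sorted(a1 + a2))] += 1
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}


for parents in [("AA", "aa"), ("AA", "Aa"), ("Aa", "Aa")]:
    print(parents, cross(*parents))
```

Exact fractions are used so that the results can be compared directly with the 1.0, 0.5 and 0.25 values on the slide.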
![Page 45: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/45.jpg)
Genetic inheritance
Assortment (Mendel’s second law)
Generation 0: AaBb (A/a at locus 1, B/b at locus 2)
Gametes: AB (0.25), Ab (0.25), aB (0.25), ab (0.25)
![Page 46: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/46.jpg)
Genetic inheritance
Assortment (Mendel’s second law)
Not quite
Generation 0: AaBb
Gametes: AB (0.45), Ab (0.05), aB (0.05), ab (0.45)
![Page 47: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/47.jpg)
Genetic inheritance
Assortment (Mendel’s second law)
Not quite. Recombination
Parental gametes: AB, ab
Recombinant gametes: aB, Ab
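One way to connect the 0.25 and the 0.45/0.05 gamete frequencies is through a recombination fraction r between the two loci: each parental gamete has frequency (1 - r)/2 and each recombinant gamete has frequency r/2. The sketch below assumes the AaBb parent carries the haplotypes AB and ab, matching the parental gametes named above; the function name and the parameter `r` are introduced here for illustration and do not appear in the slides.

```python
def gamete_frequencies(r):
    """Gamete frequencies from an AaBb parent whose haplotypes are AB and ab.

    r is the recombination fraction between the two loci (0 <= r <= 0.5).
    """
    if not 0 <= r <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    parental = (1 - r) / 2     # AB and ab are passed on unchanged
    recombinant = r / 2        # Ab and aB arise only through recombination
    return {"AB": parental, "ab": parental, "Ab": recombinant, "aB": recombinant}


print(gamete_frequencies(0.5))  # independent assortment
print(gamete_frequencies(0.1))  # tightly linked loci
```

With r = 0.5 all four gametes are equally likely, as Mendel's second law states; with r = 0.1 the parental gametes dominate, as in the "Not quite" slide.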
![Page 48: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/48.jpg)
Back to genetic inheritance
Assortment (Mendel’s second law)
Linkage: Nearby positions tend to be inherited together.
The chromosome painting collective
![Page 49: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/49.jpg)
Back to genetic inheritance
Mutation and recombination (among other forces that we will learn about later) produce genetic variation
Mutation produces differences
Recombination shuffles these differences
![Page 50: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/50.jpg)
Outline
Molecular biology: How does the genome code for function?
Genetics: How is the genome passed on from parent to child?
Genetic variation: How does the genome change when it is passed on?
What can we learn from genetic variation?
Genome sequencing: How do we read the genome?
![Page 51: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/51.jpg)
What can we learn from genetic variation?
Evolution and history
Biological function and disease
![Page 52: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/52.jpg)
What can we learn from genetic variation?
History of human populations learned from genetics
https://blog.23andme.com/23andme-and-you/genetics-101/a-beautiful-ancestry-painting/
![Page 53: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/53.jpg)
What can we learn from genetic variation?
Evolution and history
Biological function and disease
![Page 54: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/54.jpg)
Genome-wide Association Studies (GWAS)
![Page 55: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/55.jpg)
Outline
Molecular biology: How does the genome code for function?
Genetics: How is the genome passed on from parent to child?
Genetic variation: How does the genome change when it is passed on?
What can we learn from genetic variation?
Genome sequencing: How do we read the genome?
![Page 56: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/56.jpg)
Reading our genomes
Goal: Determine the sequence of bases along each chromosome
Details depend on the technology
Computationally hard
![Page 57: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/57.jpg)
The human genome project
Goals
Sequence an accurate reference human genome
Draft published in 2001
High-quality version completed in 2003
Cost: ~$3 billion.
Time: ~13 years.
Two competing groups (public and private)
![Page 58: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/58.jpg)
The human genome project
Major findings
Fewer genes than previously thought (~20K)
Pertea and Salzberg, Genome Biology 2010
![Page 59: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/59.jpg)
The human genome project
Other outcomes
International collaborations
Power of computing
![Page 60: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/60.jpg)
Reading the genome
Human genome provides a reference
Humans share most of their genome (~99.9%)
Can focus on reading the differences
Simpler problem
![Page 61: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/61.jpg)
Reading the genome
High-throughput genotyping
Hybridization of DNA molecules
Nucleotides bind to their complementary bases
A=T, C=G
Can be used to get the genotype at a chosen set of SNPs
A T C C T T A G G A
A T C T T T C A G A
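The base-pairing rule (A with T, C with G) determines which strand will hybridize to a given sequence. A minimal sketch, using the first sequence from the slide as a stand-in for a probe; the function names are hypothetical. Because the two strands of a duplex run in opposite directions, the binding partner is usually written as the reverse complement.

```python
# Watson-Crick pairing: A binds T, C binds G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(seq):
    """Base-by-base complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in seq)


def reverse_complement(seq):
    """Sequence of the strand that hybridizes to seq, read in its own 5'-to-3' direction."""
    return complement(seq)[::-1]


probe = "ATCCTTAGGA"
print(complement(probe))          # TAGGAATCCT
print(reverse_complement(probe))  # TCCTAAGGAT
```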
![Page 62: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/62.jpg)
Maps of genetic variation
Goals: Describe common patterns of genetic variation in human populations
Phase 1: Genotyped ~1 million SNPs from 270 individuals in 4 populations. Captured most SNPs with a frequency of >5%.
Phase 3: 7 additional populations included
All data publicly available.
International HapMap Project
![Page 63: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/63.jpg)
Reading the genome
Limitations of genotyping
Can only read SNPs that are on the chip
Biased by how these SNPs are chosen (e.g. common SNPs)
![Page 64: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/64.jpg)
Reading the genome
Technologies: Illumina, IonTorrent, PacBio
Can read only small pieces of the genome (~100bp)
But can read hundreds of thousands of fragments in parallel
Use the reference human genome to find the locations of the reads (and to infer mutations)
High-throughput (or next-gen) sequencing
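A minimal sketch of the idea of placing short reads on a reference and reading off mismatches as candidate mutations. It assumes a brute-force scan over every position, a toy reference string, and a hypothetical function name `map_read`. Real read mappers use indexed data structures and handle insertions, deletions, and sequencing errors, none of which is modeled here.

```python
def map_read(reference: str, read: str, max_mismatches: int = 1):
    """Return (position, mismatches) for the best placement of `read`, or None.

    `mismatches` is a list of (reference_position, reference_base, read_base).
    Ties are broken in favor of the leftmost position.
    """
    best = None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = [
            (pos + i, ref_base, read_base)
            for i, (ref_base, read_base) in enumerate(zip(window, read))
            if ref_base != read_base
        ]
        if len(mismatches) <= max_mismatches and (
            best is None or len(mismatches) < len(best[1])
        ):
            best = (pos, mismatches)
    return best


# Toy reference and reads (made up for illustration).
reference = "ACGTTAGCCATGGATCCTTAGGA"

print(map_read(reference, "TAGCCATG"))  # (4, [])
print(map_read(reference, "GGATCTTT"))  # (11, [(16, 'C', 'T')])
```

The second read maps with one mismatch: the reference has C at position 16 (0-based) where the read has T, which is the kind of difference that would be reported as a candidate variant once enough reads support it.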
![Page 65: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/65.jpg)
Reading the genome
Cost of genome sequencing
![Page 66: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/66.jpg)
Maps of genetic variation
2500 individuals from 26 populations
Discovered ~90 million SNPs
Includes >99% of SNPs with frequency >1%
All data publicly available
1000 Genomes Project
![Page 67: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/67.jpg)
Maps of genetic variation
Many more such efforts underway
Example: Simons Genome Diversity Project: 260 genomes from 127 populations
Also publicly available
![Page 68: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/68.jpg)
Other interesting data
ExAC data: Exomes from ~60,000 individuals
Also publicly available
UK Biobank: 500,000 individuals with 200 phenotypes
Not publicly available
![Page 69: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/69.jpg)
Computational and statistical problems
Recurring theme:
How to better model a biological problem?
![Page 70: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/70.jpg)
Computational and statistical problems
Recurring theme:
How to scale up our favorite algorithms to massive datasets?
Hundreds of thousands of individuals at hundreds of millions of SNPs measured for thousands of phenotypes
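A rough back-of-the-envelope calculation gives a feel for these sizes. The specific counts below (500,000 individuals, 100 million SNPs) are assumed round numbers chosen to match the orders of magnitude above, not figures for any particular dataset. The 2-bit encoding assumes each genotype takes one of four states (0, 1, or 2 copies of an allele, or missing).

```python
# Assumed round numbers, for illustration only.
n_individuals = 500_000
n_snps = 100_000_000

n_genotypes = n_individuals * n_snps  # one entry per individual per SNP

# Naive storage: one 8-byte float per genotype.
bytes_float64 = n_genotypes * 8

# Compact storage: 2 bits per genotype (four possible states).
bytes_2bit = n_genotypes * 2 // 8

print(f"genotypes: {n_genotypes:.1e}")                  # genotypes: 5.0e+13
print(f"float64:   {bytes_float64 / 1e12:.0f} TB")      # float64:   400 TB
print(f"2-bit:     {bytes_2bit / 1e12:.1f} TB")         # 2-bit:     12.5 TB
```

Even the compact encoding does not fit in the memory of a typical machine, so algorithms that need several passes over a dense in-memory matrix become impractical at this scale.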
![Page 71: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/71.jpg)
Computational and statistical problems
Recurring theme:
Tradeoffs
Computational efficiency
Statistical accuracy
Constraints from technology
Constraints from data generation, e.g. privacy
![Page 72: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/72.jpg)
Computational and statistical problems
Learning the genotype-phenotype map
Genome-wide association studies
Inferring genetic architecture
Correcting for confounding
Multiple hypothesis testing
Risk prediction
Causality
Population genomics
Inference of human history
Admixture inference
![Page 73: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/73.jpg)
This class
Overview of the biological questions
Basic concepts in genetics
Genomes are inherited according to well-known rules (Mendel’s laws)
Genomes change
Genetic variation forms the starting point for inference.
Possible inferences: history, disease risk and many more
Advances in technology are allowing us to read many more genomes
![Page 74: Introduction to genomics CM226: Machine Learning for ...web.cs.ucla.edu/~sriram/courses/cm226.fall-2017/slides/intro.pdf · Is the disease genetic or environmental ? ... next Introduction](https://reader034.vdocuments.us/reader034/viewer/2022051405/5a9551687f8b9a451b8c5a72/html5/thumbnails/74.jpg)
Next class
Basic concepts in statistics
Given data, how do we perform inference?
What is the set of inferential tasks?
How do we derive procedures for these tasks?
How do we evaluate these procedures?