algorithms in computational biology (236522) fall 2004-5 lecture #1 lecturer: shlomo moran, taub...
![Page 1: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/1.jpg)
Algorithms in Computational Biology (236522) Fall 2004-5
Lecture #1
Lecturer: Shlomo Moran, Taub 639, tel 4363
Office hours: Thursday 16:30-17:30
TA: Sivan Yogev, Taub 224, tel 5617
Office hours: Monday 10:30-11:30
Lecture: Tuesday 12:30-14:30, Taub 6
Tutorial: Thursday 10:30-11:30, Taub 6
1st tutorial: Sunday 24.10, 16:30, Taub 6
This class has been initially edited from Nir Friedman’s lecture at the Hebrew University. Changes made by Dan Geiger, then by Shlomo Moran.
![Page 2: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/2.jpg)
Course Information
Requirements & Grades:
• 15-25% homework, in five assignments, each to be submitted within two weeks. Homework is obligatory.
• 75-85% test. The test grade must exceed 55 for the homework grade to count.
• Exam date: 3.2.05.
![Page 3: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/3.jpg)
Bibliography
• Biological Sequence Analysis, R. Durbin et al., Cambridge University Press, 1998
• Introduction to Computational Molecular Biology, J. Setubal, J. Meidanis, PWS Publishing Company, 1997
• Phylogenetics, C. Semple, M. Steel, Oxford University Press, 2003
• url: webcourse.cs.technion.ac.il/~cs236522
![Page 4: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/4.jpg)
Course Prerequisites
Computer Science and Probability Background
• Data Structures 1 (cs234218)
• Algorithms 1 (cs234247)
• Probability (any course)
Some Biology Background
Formally: None, to allow CS students to take this course.
Recommended: Molecular Biology 1 (especially for those in the Bioinformatics track), or a similar Biology course, and/or a serious desire to complement your knowledge in Biology by reading the appropriate material (see the course web site).
![Page 5: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/5.jpg)
Biological Background
First homework assignment: Read the first chapter (pages 1-30) of Setubal & Meidanis, 1997 (copies are available in the Taub building library and in the central library). Answer the questions of the first assignment on the course site.
Due: tutorial class of 2.11.04 (2 weeks from today).
![Page 6: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/6.jpg)
Computational Biology
Computational biology is the application of computational tools and techniques to (primarily) molecular biology. It enables new ways of studying the life sciences, allowing analytic and predictive methodologies that support and enhance laboratory work. It is a multidisciplinary area of study that combines Biology, Computer Science, and Statistics.
Computational biology is also called Bioinformatics, although many practitioners define Bioinformatics somewhat more narrowly, restricting the field to molecular Biology only.
![Page 7: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/7.jpg)
Examples of Areas of Interest
• Building evolutionary trees from molecular (and other) data
• Efficiently constructing genomes of various organisms
• Understanding the structure of genomes (SNP, SSR, Genes)
• Understanding function of genes in the cell cycle and disease
• Deciphering structure and function of proteins
_____________________
SNP: Single Nucleotide Polymorphism
SSR: Simple Sequence Repeat
![Page 8: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/8.jpg)
Exponential growth of biological information: growth of sequences, structures, and literature.
![Page 9: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/9.jpg)
Course Goals
• Learn about computational tools for (primarily) molecular biology.
• Cover computational tasks that are posed by modern molecular biology
• Discuss the biological motivation and setup for these tasks
• Understand the kinds of solutions that exist and what principles justify them
![Page 10: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/10.jpg)
Topics I
Dealing with DNA/Protein sequences:
• Informal biological background.
• Finding similar sequences
• Models of sequences: Hidden Markov Models
• Gene finding
![Page 11: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/11.jpg)
Topics II
Models of genetic changes:
• Long term: evolutionary changes among species
• Reconstructing evolutionary trees from sequences
• Short term: genetic variations in a population
• Finding genes by linkage and association
![Page 12: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/12.jpg)
Human Genome
Most human cells contain 46 chromosomes:
• 2 sex chromosomes (X, Y):
– XY in males.
– XX in females.
• 22 pairs of chromosomes named autosomes.
![Page 13: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/13.jpg)
DNA Organization
Source: Alberts et al.
![Page 14: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/14.jpg)
The Double Helix
Source: Alberts et al.
![Page 15: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/15.jpg)
DNA Components
Four nucleotide types:
• Adenine
• Guanine
• Cytosine
• Thymine
Hydrogen bonds (electrostatic connection):
• A-T
• C-G
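The pairing rule above can be illustrated with a small sketch. It assumes a DNA strand is stored as a string over the letters A, C, G, T; the function names are illustrative and not part of the course material. It also relies on the standard fact (not stated on the slide) that the two strands run in opposite directions, which is why the partner strand is usually written reversed.

```python
# Watson-Crick pairing as listed on the slide: A-T and C-G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own direction (reversed)."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("ATGC"))
    print(reverse_complement("ATGC"))
```

Applying `reverse_complement` twice returns the original strand, which is a quick sanity check on the pairing table.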
![Page 16: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/16.jpg)
Genome Sizes
• E. coli (bacterium): 4.6 × 10^6 bases
• Yeast (simple fungus): 15 × 10^6 bases
• Smallest human chromosome: 50 × 10^6 bases
• Entire human genome: 3 × 10^9 bases
![Page 17: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/17.jpg)
Genetic Information
• Genome – the collection of genetic information.
• Chromosomes – storage units of genes.
• Gene – basic unit of genetic information. Genes determine the inherited characters.
![Page 18: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/18.jpg)
Genes
The DNA strings include:
• Coding regions (“genes”)
– E. coli has ~4,000 genes
– Yeast has ~6,000 genes
– C. elegans has ~13,000 genes
– Humans have ~32,000 genes
• Control regions
– These are typically adjacent to the genes
– They determine when a gene should be “expressed”
• “Junk” DNA (unknown function; ~90% of the DNA in human chromosomes)
![Page 19: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/19.jpg)
The Cell
All cells of an organism contain the same DNA content (and the same genes) yet there is a variety of cell types.
![Page 20: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/20.jpg)
Example: Tissues in Stomach
How is this variety encoded and expressed?
![Page 21: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/21.jpg)
Central Dogma
Gene → (transcription) → mRNA → (translation) → Protein
Cells express different subsets of the genes in different tissues and under different conditions.
![Page 22: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/22.jpg)
Transcription
• Coding sequences can be transcribed to RNA
• RNA
– Similar to DNA, with a slightly different backbone
– Uracil (U) instead of Thymine (T)
Source: Mathews & van Holde
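As a minimal sketch of the T-to-U substitution described above: the function below assumes the input is the coding strand of a gene, written as a string over A, C, G, T, so that the RNA has the same sequence with U in place of T. The name `transcribe` is illustrative, and the model ignores everything else about transcription (promoters, strand choice, termination).

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA sequence for a DNA coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


if __name__ == "__main__":
    print(transcribe("ATGGCT"))
```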
![Page 23: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/23.jpg)
Transcription: RNA Editing
Exons hold information; they are more stable during evolution. This process takes place in the nucleus. The mRNA molecules then pass through the nuclear membrane into the cytoplasm.
1. Transcribe to RNA
2. Eliminate introns
3. Splice (connect) exons
* Alternative splicing exists
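The intron-removal and exon-joining steps can be sketched on strings. This is a toy model under stated assumptions: the exon boundaries are given as half-open index intervals (in a real setting, finding them is the hard part), and the sequence and coordinates below are made up for illustration.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon intervals (start, end), end exclusive, in the given order.

    Everything outside the listed intervals is treated as intron and dropped.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


if __name__ == "__main__":
    # Hypothetical transcript: exon (0-6), intron (6-14), exon (14-20).
    pre_mrna = "AUGGCC" + "GUAAGUAG" + "UUUUAA"
    print(splice(pre_mrna, [(0, 6), (14, 20)]))  # both exons kept
    print(splice(pre_mrna, [(0, 6)]))            # an alternative choice of exons
```

Passing a different subset of exon intervals models alternative splicing: the same transcript yields different mature mRNAs.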
![Page 24: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/24.jpg)
RNA roles
• Messenger RNA (mRNA)
– Encodes protein sequences. Each triplet of nucleotides (a codon) translates to one amino acid (the protein building block).
• Transfer RNA (tRNA)
– Decodes the mRNA molecules to amino acids. It connects to the mRNA on one side and holds the appropriate amino acid on its other side.
• Ribosomal RNA (rRNA)
– Part of the ribosome, a machine for translating mRNA to proteins. It catalyzes (like enzymes) the reaction that attaches the hanging amino acid from the tRNA to the amino acid chain being created.
• ...
![Page 25: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/25.jpg)
Translation
• Translation is mediated by the ribosome
• The ribosome is a complex of protein & rRNA molecules
• The ribosome attaches to the mRNA at a translation initiation site
• The ribosome then moves along the mRNA sequence and in the process constructs a sequence of amino acids (polypeptide), which is released and folds into a protein.
![Page 26: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/26.jpg)
Genetic Code
There are 20 amino acids from which proteins are built.
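A short sketch of why the code is redundant: with 4 RNA bases and 3 positions per codon there are 4^3 = 64 codons for only 20 amino acids, so several codons map to the same amino acid. The table below is deliberately partial (a handful of entries from the standard genetic code, chosen for illustration); the function name and the "?" placeholder for codons missing from the table are assumptions of this sketch.

```python
from itertools import product

BASES = "UCAG"
# 4 bases in each of 3 positions give 64 possible codons.
ALL_CODONS = ["".join(triple) for triple in product(BASES, repeat=3)]

# Partial excerpt of the standard genetic code (not the full 64-entry table).
CODON_TABLE = {
    "AUG": "Met",
    "UUU": "Phe", "UUC": "Phe",
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "GCU": "Ala",
    "AAA": "Lys",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def translate(mrna: str) -> list[str]:
    """Read the mRNA codon by codon from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE.get(mrna[i:i + 3], "?")  # "?" = not in excerpt
        if residue == "Stop":
            break
        protein.append(residue)
    return protein


if __name__ == "__main__":
    print(len(ALL_CODONS))
    print(translate("AUGUUUGGCAAAUAA"))
```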
![Page 27: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/27.jpg)
Protein Structure
• Proteins are polypeptides of 70-3000 amino acids
• A protein's three-dimensional structure is (mostly) determined by the sequence of amino acids that make up the protein
![Page 28: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/28.jpg)
Protein Structure
![Page 29: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/29.jpg)
Evolution
• Related organisms have similar DNA
– Similarity in sequences of proteins
– Similarity in organization of genes along the chromosomes
• Evolution plays a major role in biology
– Many mechanisms are shared across a wide range of organisms
– During the course of evolution existing components are adapted for new functions
![Page 30: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/30.jpg)
Evolution
Evolution of new organisms is driven by
• Diversity
– Different individuals carry different variants of the same basic blueprint
• Mutations
– The DNA sequence can be changed due to single base changes, deletion/insertion of DNA segments, etc.
• Selection bias
![Page 31: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/31.jpg)
The Tree of Life
Source: Alberts et al.
![Page 32: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/32.jpg)
Example of a graph-theoretic problem related to evolutionary trees: the perfect phylogeny problem
![Page 33: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/33.jpg)
Characters in Species
• A (discrete) character is a property which distinguishes between species (e.g. dental structure, a certain gene)
• A character state is a value of the character (e.g. human dental structure).
• Problem: Given a set of species, specified by their characters, reconstruct their evolutionary tree.
![Page 34: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/34.jpg)
Species ≡ Vertices
Characters ≡ Colorings
States ≡ Colors
Evolutionary tree ≡ A tree with many colorings, containing the given vertices
[Figure: a tree containing species A, B, C, D, each colored by the state "teeth" or "no teeth"]
![Page 35: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/35.jpg)
Another tree
Which tree is more reasonable?
[Figure: a second tree on species A, B, C, D, with the same legend: "teeth" / "no teeth"]
![Page 36: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/36.jpg)
Evolutionary trees should avoid reversal transitions
• A species regains a state its direct ancestor has lost.
• Famous (and rare) examples:
– Teeth in birds.
– Legs in snakes.
![Page 37: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/37.jpg)
Evolutionary trees should avoid convergence transitions
• Two species possess the same state while their least common ancestor possesses a different state.
• Famous example: The marsupials.
![Page 38: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/38.jpg)
![Page 39: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/39.jpg)
Common Assumption: Characters with reversal or convergence transitions are highly unlikely in the evolutionary tree.
A character that exhibits neither reversals nor convergence is called homoplasy free.
![Page 40: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/40.jpg)
A character is Homoplasy Free
↕ The corresponding coloring is convex
(each color induces a connected subtree)
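The convexity condition can be checked directly from its definition. The sketch below assumes the tree is given as an adjacency dictionary and the coloring assigns a color to every vertex (a total coloring); the representation and the name `is_convex` are choices made for this example, not part of the lecture.

```python
from collections import deque


def is_convex(tree: dict, coloring: dict) -> bool:
    """Return True if every color class induces a connected subtree."""
    by_color = {}
    for vertex, color in coloring.items():
        by_color.setdefault(color, set()).add(vertex)

    for members in by_color.values():
        # Search from one vertex of the class, moving only inside the class.
        start = next(iter(members))
        seen = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in tree[u]:
                if w in members and w not in seen:
                    seen.add(w)
                    queue.append(w)
        if seen != members:  # some vertex of this color was unreachable
            return False
    return True


if __name__ == "__main__":
    # A path a - b - c - d.
    path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    print(is_convex(path, {"a": "R", "b": "R", "c": "B", "d": "B"}))
    print(is_convex(path, {"a": "R", "b": "B", "c": "R", "d": "B"}))
```

In the second coloring the color R is lost at b and regained at c, so the R class is disconnected; this is the kind of reversal the slides rule out.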
![Page 41: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/41.jpg)
A partial coloring is convex if it can be completed to a (total) convex coloring
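One way to test a partial coloring is sketched below. It relies on a standard characterization that is not stated on the slide and is assumed here: a partial coloring can be completed to a convex coloring exactly when the minimal subtrees spanning each color (called carriers in this sketch) are pairwise vertex-disjoint. The tree is an adjacency dictionary, assumed connected, and all names are illustrative.

```python
from collections import deque


def carrier(tree: dict, members: set) -> set:
    """Return the vertices of the minimal subtree containing all members."""
    root = next(iter(members))
    parent = {root: None}
    queue = deque([root])
    while queue:  # breadth-first search to orient the tree toward root
        u = queue.popleft()
        for w in tree[u]:
            if w not in parent:
                parent[w] = u
                queue.append(w)
    span = set()
    for v in members:  # union of the paths from each member up to the root
        while v is not None and v not in span:
            span.add(v)
            v = parent[v]
    return span


def is_partial_convex(tree: dict, partial: dict) -> bool:
    """Return True if the carriers of distinct colors do not intersect."""
    by_color = {}
    for vertex, color in partial.items():
        by_color.setdefault(color, set()).add(vertex)
    claimed = set()
    for members in by_color.values():
        span = carrier(tree, members)
        if span & claimed:
            return False
        claimed |= span
    return True


if __name__ == "__main__":
    # A star: center x with leaves a, b, c, d.
    star = {"x": ["a", "b", "c", "d"],
            "a": ["x"], "b": ["x"], "c": ["x"], "d": ["x"]}
    print(is_partial_convex(star, {"a": "R", "b": "R", "c": "B"}))
    print(is_partial_convex(star, {"a": "R", "b": "R", "c": "B", "d": "B"}))
```

In the second example both colors need the center x to connect their leaves, so no completion is convex.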
![Page 42: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/42.jpg)
The Perfect Phylogeny Problem
• Input: a set of species, and many characters, each assigning states (colors) to the species.
• Question: is there a tree T containing the species as vertices, in which all the characters (colorings) are convex?
![Page 43: Algorithms in Computational Biology (236522) Fall 2004-5 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Thursday 1630-1730 TA: Sivan](https://reader036.vdocuments.us/reader036/viewer/2022070306/5518b393550346a61f8b5006/html5/thumbnails/43.jpg)
The Perfect Phylogeny Problem (combinatorial setting)
Input: Some colorings (C1,…,Ck) of a set of vertices (in the example: 3 colorings: left, center, right, each by (the same) two colors).
Problem: Is there a tree T which includes these vertices, s.t. (T,Ci) is convex for i=1,…,k?
[Figure: three colorings of the same vertex set, each using the two colors R and B]
NP-hard in general; in P for some special cases.
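The slide does not say which special cases are tractable. One well-known example, used here as an assumption, is the case where every character has only two states: a perfect phylogeny then exists exactly when no pair of characters shows all four state combinations across the species (often called the four-gamete test). The sketch below implements that pairwise check; the input layout (one row of states per species) and the data in the example are made up for illustration.

```python
from itertools import combinations


def admits_perfect_phylogeny_binary(matrix: list) -> bool:
    """Pairwise compatibility test for two-state characters.

    matrix[s][c] is the state of character c in species s; each character
    is assumed to take at most two distinct values.
    """
    if not matrix:
        return True
    n_chars = len(matrix[0])
    for i, j in combinations(range(n_chars), 2):
        # Collect the state pairs that characters i and j show together.
        pairs = {(row[i], row[j]) for row in matrix}
        if len(pairs) == 4:  # all four combinations occur: incompatible
            return False
    return True


if __name__ == "__main__":
    # Hypothetical data: rows are species, columns are two-color characters.
    print(admits_perfect_phylogeny_binary(["RRR", "BRR", "RBR", "RRB"]))
    print(admits_perfect_phylogeny_binary(["RR", "RB", "BR", "BB"]))
```

The check runs in time quadratic in the number of characters, in contrast to the general multi-state problem.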