


cbio course, spring 2005, Hebrew University

Computational Methods In Molecular Biology

CS-67693, Spring 2005

School of Computer Science & Engineering, Hebrew University, Jerusalem


Class 1: Introduction


Introduction

What is Comp. Bio.? Why is it great?
What are the aims and basic concepts of this course?
High-level biological review: basic bio background and motivation for the tasks handled in the course
Administration…

The Cell


Example: Tissues in Stomach


DNA Components

Four nucleotide types:
• Adenine (A)
• Guanine (G)
• Cytosine (C)
• Thymine (T)

Hydrogen bonds:
• A-T
• C-G
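The pairing rule above (A with T, C with G) is what lets one strand determine the other. A minimal Python sketch, assuming strands are plain strings over A, C, G, T; the function names are illustrative and not from the course material:

```python
# Watson-Crick pairing from the slide: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the complementary strand read in its own 5' to 3' direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("ATGC"))          # TACG
    print(reverse_complement("ATGC"))  # GCAT
```

Taking the reverse complement twice returns the original strand, which is a quick sanity check on the pairing table.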


The Double Helix

Source: Alberts et al.

DNA Organization

Source: Alberts et al.

Genome Sizes

E. coli (bacteria): 4.6 x 10^6 bases
Yeast (simple fungi): 15 x 10^6 bases
Smallest human chromosome: 50 x 10^6 bases
Entire human genome: 3 x 10^9 bases

Related Computational Tasks

Need a way to reconstruct a DNA sequence from fragments: a major contribution of comp. bio.!
Related: sequence comparison, sequence alignment
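To make the fragment-reconstruction task concrete, here is a small Python sketch of one naive approach: repeatedly merge the two fragments with the longest suffix/prefix overlap. This greedy heuristic is an assumption chosen for illustration; it is not the method taught in the course, and it ignores sequencing errors, repeats, and reverse-complement reads. The names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str]) -> str:
    """Greedily merge the pair with the largest overlap until one string remains."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Drop fragments that are fully contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Glue the two fragments together, writing the shared part only once.
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags[0] if frags else ""


if __name__ == "__main__":
    print(greedy_assemble(["ATGGAC", "GACAAA", "AAATTAG"]))  # ATGGACAAATTAG
```

The greedy choice can be wrong when a genome contains repeats, which is one reason real assembly is a hard computational problem.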


DNA Duplication

Source: Mathews & van Holde

Genes

The DNA strings include:
Coding regions (“genes”)
• E. coli has ~4,000 genes
• Yeast has ~6,000 genes
• C. elegans has ~13,000 genes
• Humans have ~32,000 genes
Control regions
• These are typically adjacent to the genes
• They determine when a gene should be expressed
“Junk” DNA (unknown function)

The Tree of Life

Source: Alberts et al.

Evolution

Related organisms have similar DNA
• Similarity in sequences of proteins
• Similarity in organization of genes along the chromosomes
Evolution plays a major role in biology
• Many mechanisms are shared across a wide range of organisms (e.g. orthologs)
• During the course of evolution, existing components are adapted for new functions (e.g. paralogs)

Evolution

Evolution of new organisms is driven by
Diversity
• Different individuals carry different variants of the same basic blueprint
Mutations
• The DNA sequence can be changed by single base changes, deletion/insertion of DNA segments, etc.
Selection bias

Related Computational Tasks

Phylogeny (not just theory!):
• Rebuild the tree of life…
• Infer relations between genes/pathways etc. across species
• Learn models for changes and development
• Major benefit: exploit the information we do have/observe to infer about the systems on which we have very little knowledge and few observations

How Do Genes Code for Proteins?

DNA → (Transcription) → RNA → (Translation) → Protein

Transcription

Coding sequences can be transcribed to RNA

RNA nucleotides:
• Similar to DNA, slightly different backbone
• Uracil (U) instead of Thymine (T)

Source: Mathews & van Holde
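At the sequence level, the T-to-U substitution can be sketched in a line of Python. This assumes the input is the coding (sense) strand written 5' to 3', so the mRNA has the same letters with U in place of T; in the cell, RNA polymerase actually reads the complementary template strand. The function name `transcribe` is illustrative.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching a DNA coding strand: same sequence, U instead of T."""
    return coding_strand.upper().replace("T", "U")


if __name__ == "__main__":
    print(transcribe("ATGGACAAATTAG"))  # AUGGACAAAUUAG
```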


RNA Editing


Translation

The ribosome attaches to the mRNA at a translation initiation site
Then the ribosome moves along the mRNA sequence and in the process constructs a poly-peptide
When the ribosome encounters a stop signal, it releases the mRNA. The constructed poly-peptide is released, and folds into a protein.

Translation is mediated by the ribosome

The ribosome is a complex of protein & rRNA molecules
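The three steps above (start at an initiation site, read along the mRNA, stop at a stop signal) can be mimicked on strings. The sketch below is a simplification under stated assumptions: the initiation site is taken to be the first AUG, codons are read in non-overlapping triplets, and the standard genetic code is used. The names `CODON_TABLE` and `translate` are illustrative.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G codon order.
# '*' marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first in-frame stop."""
    mrna = mrna.upper()
    start = mrna.find("AUG")  # simplified initiation site
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop signal: release the poly-peptide
            break
        peptide.append(amino_acid)
    return "".join(peptide)


if __name__ == "__main__":
    print(translate("GGAUGGCCUAAUU"))  # MA
```

One-letter amino-acid codes are used for the output, so `MA` stands for methionine followed by alanine.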


Translation

Source: Alberts et al.

Translation

Source: Alberts et al.

Translation

Source: Alberts et al.

Translation

Source: Alberts et al.

Translation

Source: Alberts et al.

Genetic Code


The Central Dogma

DNA → (Transcription) → RNA → (Translation) → Protein

[Figure labels: Genes, Experiments]

Eukaryotic Transcription Regulation

[Figure labels: TFs, basal promoter, gene (5’ to 3’), transcription start site, RNA polymerase II, mRNA]

“Classical Model”
Composition of the promoter region determines the rate of transcription initiation
Combinations of TFs control the transcription of gene sets under specific conditions

From Data to Model

>YKL112W Chr 11 ATGGACAAATTAGTCGTGAATTATTATGAATACAAGCACCCTATAATTAATAAAGACCTGGCCATTGGAGCCCATGGAGGCAAAAAATTTCCCACCTTGGGTGCTTGGTATGATGTAATTAATGAGTACGAATTTCAGACGCGTTGCCCTATTATTTTAAAGAATTCGCATAGGAACAAACATTTTACATTTGCCTGTCATTTGAAAAACTGTCCATTTAAAGTCTTGCTAAGCTATGCTGGCAATGCTGCATCCTCAGAAACCTCATCTCCTTCTGCAAATAATAATACCAACCCTCCGGGTACTCCTGATCATATTCATCATCATAGCAACAACATGAACAACGAGGACAATGATAATAACAATGGCAGTAATAATAAGGTTAGCAATGACAGTAAACTTGACTTCGTTACTGATGATCTTGAATACCATCTGGCGAACACTCATCCGGACGACACCAATGACAAAGTGGAGTCGAGAAGCAATGAGGTGAATGGGAACAATGACGATGATGCTGATGCCAACAACATTTTTAAACAGCAAGGTGTTACTATCAAGAACGACACTGAAGATGATTCGATAAATAAGGCCTCTAT


Many Related Computational Tasks…

Information is in the code book →:
• How and where is alternative splicing determined?
• Build models for regulation of genes at different levels of complexity
• Relate genotype and phenotype: What are the expression patterns of some disease? How do they relate to sequence? What model can explain the observations? Can we predict phenomena based on our models?

Who came first?

Chicken or egg?
• Egg
DNA or Protein?
• RNA…
Thomas Cech & Sidney Altman (1980s!):
• RNA as an “independent” molecule
• Probably closer to the ancient “source”

RNA roles

Messenger RNA (mRNA)
• Encodes protein sequences
Transfer RNA (tRNA)
• Adaptor between mRNA molecules and amino-acids (protein building blocks)
Ribosomal RNA (rRNA)
• Part of the ribosome, a machine for translating mRNA to proteins
...

Transfer RNA

Anticodon: matches a codon (triplet of mRNA nucleotides)

Attachment site: matches a specific amino-acid

Related Computational Tasks

RNA secondary structure prediction:
• based on CFG (context-free grammars) and CM (covariance models)
RNA coding area prediction…
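As a rough illustration of what secondary structure prediction computes, the Python sketch below counts the maximum number of nested base pairs with dynamic programming. Note that this is a Nussinov-style base-pair maximization, a simpler technique than the grammar-based (CFG, CM) methods named above, and it is swapped in here only because it fits in a few lines. The pairing set (including G-U wobble pairs), the minimum loop length of 3, and the name `max_base_pairs` are assumptions.

```python
def max_base_pairs(rna: str, min_loop: int = 3) -> int:
    """Maximum number of non-crossing base pairs (Nussinov-style DP sketch)."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(rna)
    # best[i][j] = max number of pairs within rna[i..j]; short spans stay 0.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j is unpaired
            # case 2: j pairs with some k, leaving at least min_loop bases between
            for k in range(i, j - min_loop):
                if (rna[k], rna[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1] if n else 0


if __name__ == "__main__":
    print(max_base_pairs("GGGAAACCC"))  # 3 (a hairpin with a three-base loop)
```

Counting pairs is only a crude proxy for stability; probabilistic grammars replace the count with learned scores.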


RNA Editing

Source: Mathews & van Holde

Translation

How do Proteins Perform their Roles?

Proteins interact in various ways
Change conformations, conformations → function
Major Issues:
• Their “active”/functional areas which interact
• Their 3D structure

Protein Structure

Proteins are poly-peptides of 70-3000 amino-acids

This structure is (mostly) determined by the sequence of amino-acids that make up the protein


Related Computational Tasks

Protein 2D, 3D structure prediction
Identify sequence motifs/domains in proteins
• Sequence similarity vs. functional similarity

Course Goals

Review current tasks posed by modern molecular biology
Review and experiment with some of the tools/solutions currently found (e.g. BLAST, ClustalW)
Gain some tools to handle such problems:
• Dynamic programming
• Probabilistic graphical models:
  ♦ MM, HMM, CM, Trees
  ♦ Representation, what principles justify them, Learning, Inference
• Statistical tools: how to measure our confidence in our results?

Course Goals

Computational tools in molecular biology:

We will cover computational tasks that are posed by modern molecular biology
We will discuss the biological motivation and setup for these tasks
We will understand the kinds of solutions that exist and what principles justify them

Course’s Main Point

Learn to do: Define the problem → Find comp. solution
Four Aspects:
Biological
• What is the task?
Algorithmic
• How to perform the task at hand efficiently?
Learning
• How to adapt parameters of the task from examples
Statistics
• How to differentiate true phenomena from artifacts

Example: Sequence Comparison

Biological
• Evolution preserves sequences, thus similar genes might have similar function
Algorithmic
• Consider all ways to “align” one sequence against another
Learning
• How do we define “similar” sequences? Use examples to define similarity
Statistics
• When we compare to ~10^6 sequences, what is a random match and what is a true one?
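The algorithmic aspect, considering all ways to align one sequence against another, is usually handled with dynamic programming rather than by enumerating alignments. Below is a minimal Python sketch of a global alignment score in the Needleman-Wunsch style. The scoring values (match +1, mismatch -1, gap -1) are placeholder assumptions; as the learning aspect suggests, real scores would be estimated from examples. The function name is illustrative.

```python
def global_alignment_score(s: str, t: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Best score over all global alignments of s and t (linear gap penalty)."""
    # prev[j] holds the best score for aligning the processed prefix of s with t[:j].
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * gap]  # aligning s[:i] against the empty prefix: all gaps
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            up = prev[j] + gap        # s[i-1] aligned to a gap
            left = curr[j - 1] + gap  # t[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))  # 2 (three matches, one gap)
```

The table has (len(s)+1) x (len(t)+1) cells, so the running time is proportional to the product of the two lengths, even though the number of possible alignments grows exponentially.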


Topics I

Dealing with DNA/Protein sequences:
• Genome projects and how sequences are found
• Finding similar sequences
• Models of sequences: Hidden Markov Models
• Transcription regulation
• Protein Families
• Gene finding

Topics II

Gene Expression:
• Genome-wide expression patterns
• Data organization: clustering
• Reconstructing transcription regulation
• Recognizing and classifying cancers

Topics III

Models of genetic change:
• Long term: evolutionary changes among species
• Reconstructing evolutionary trees from current day sequences
• Short term: genetic variations in a population
• Finding genes by linkage and association

Topics IV

Protein World:
• How proteins fold: secondary & tertiary structure
• How to predict protein folds from sequence data alone
• How to analyze protein changes from raw experimental measurements (MassSpec)
• 2D gels

Class Structure

2 weekly meetings
• Mondays 16-18 (Levin 8), Wednesdays 10-12 (Kaplan)
Grade:
• Homework assignments: ~50% of the final grade. There will be up to seven homework assignments. These assignments will include theoretical problems, using bioinformatics tools, and programming.
• Final home assignment: ~20% of the final grade.
• Final test: ~30% of the grade.
• Class participation: a 5% bonus grade for students who actively participate in discussions during classes.
• Possible: oral presentation of any exercise to define grade!

Exercises & Handouts

Check regularly

http://www.cs.huji.ac.il/~cbio