
Introduction to Bioinformatics 234525-236523

Lecturer: Dr. Yael Mandel-Gutfreund

Teaching Assistants:

Martin Akerman

Sivan Bercovici

Course web site: http://webcourse.cs.technion.ac.il/234525


What is Bioinformatics?


Course Objectives

• To introduce the bioinformatics discipline

• To familiarize students with the major biological questions that can be addressed with bioinformatics tools

• To introduce the major tools used for sequence and structure analysis and explain, in general terms, how they work (including their limitations)


Course Structure and Requirements

1. Class structure
   1. 2-hour lecture
   2. 1-hour tutorial

2. Homework
   • Homework projects will be given every second week.
   • Homework will be done in pairs.
   • All 5 homework projects must be submitted (5/5).

3. A final project will be carried out and submitted in pairs.


Grading

• 30% homework assignments

• 70% final project


Literature list

• Gibas, C., Jambeck, P. Developing Bioinformatics Computer Skills. O'Reilly, 2001.

• Lesk, A. M. Introduction to Bioinformatics. Oxford University Press, 2002.

• Mount, D. W. Bioinformatics: Sequence and Genome Analysis. 2nd ed., Cold Spring Harbor Laboratory Press, 2004.

Advanced Reading

• Jones, N. C., Pevzner, P. A. An Introduction to Bioinformatics Algorithms. MIT Press, 2004.


What is Bioinformatics?


“The field of science in which biology, computer science, and information technology merge to form a single discipline”

Ultimate goal: to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.



Bioinformatics: Bio = Informatics

From a purely lab-based science to an information science


Central Paradigm in Molecular Biology

Gene (DNA) → mRNA → Protein

21st century: Genome → Transcriptome → Proteome
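To make the DNA → mRNA → protein flow concrete, here is a minimal Python sketch. It assumes the standard genetic code and a coding-strand DNA string that begins at the start codon; `transcribe` and `translate` are illustrative helper names, not a library API. The test sequence is the first 27 coding bases of the beta-globin (HBB) mRNA used in the sickle cell example.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering of the
# 64 codons (first base varies slowest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Coding-strand DNA -> mRNA: the same sequence with T replaced by U."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first base until a stop codon (or the end)."""
    dna = mrna.upper().replace("U", "T")  # the table is keyed on DNA letters
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


gene = "ATGGTGCATCTGACTCCTGAGGAGAAG"  # first 27 coding bases of HBB
mrna = transcribe(gene)
print(mrna)             # AUGGUGCAUCUGACUCCUGAGGAGAAG
print(translate(mrna))  # MVHLTPEEK
```

Real transcription reads the template strand and real mRNAs carry untranslated regions and, in eukaryotes, are spliced; the sketch skips all of that.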


Genome

• Chromosomal DNA of an organism

• Coding and non-coding DNA

• Genome size and number of genes do not necessarily determine organism complexity


Transcriptome

• Complete collection of all possible mRNAs (including splice variants) of an organism.

• Regions of an organism’s genome that are transcribed into messenger RNA.

• The transcriptome can be extended to include all transcribed elements, including non-coding RNAs used for structural and regulatory purposes.


Proteome

• The complete collection of proteins that can be produced by an organism.

• Can be studied either as a static entity (the sum of all possible proteins) or as a dynamic entity (all proteins present at a specific time point)


From DNA to Genome

Timeline figure (1955 to 1985): Watson and Crick DNA model; first protein sequence; first protein structure


Timeline figure (1990 to 2000): first bacterial genome (Haemophilus influenzae); yeast genome; first human genome draft (2000)


Complete Genomes

              2008   2007
Archaea         50     29
Bacteria       578    383
Eukaryotes      78     43
Total          706    456


How similar are humans and chimps?

Comparison of the full draft genomes of human and chimpanzee revealed that they differ by only 1.23%.

Perhaps not surprising!


The “post-genomics” era

Goal: to understand the living cell

What’s next? Annotation, comparative genomics, structural genomics, functional genomics


CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG

CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA

CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC

AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA

AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA

TAT GGA CAA TTG GTT TCT TCT CTG AAT ......

.............. TGAAAAACGTA

Annotation


Annotation

Identify the genes within a given sequence of DNA

Identify the sites that regulate the gene

Predict the function


CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG

CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA

CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC

AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA

AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA

TAT GGA CAA TTG GTT TCT TCT CTG

AAT .................................

.............. TGAAAAACGTA

Figure labels: TF binding site, promoter, transcription start site, ribosome binding site, ORF = Open Reading Frame, CDS = Coding Sequence
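Scanning for open reading frames is one simple first step toward identifying genes in a DNA sequence. The sketch below assumes ATG as the only start codon, the standard stop codons TAA, TAG and TGA, and only the three forward reading frames; `find_orfs` and its `min_codons` threshold are hypothetical names chosen for illustration. Real gene finders add statistical models, the reverse strand and, in eukaryotes, splicing.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 1) -> list[tuple[int, int, str]]:
    """Return (start, end, orf) for ATG...stop ORFs on the forward strand.

    `start` is the 0-based index of the ATG and `end` is the index just past
    the stop codon. Only the three forward frames are scanned; the reverse
    complement strand is ignored in this sketch.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop codon.
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, dna[i:j + 3]))
                        i = j  # resume scanning after the stop codon
                        break
                else:
                    break  # no stop codon left in this frame
            i += 3
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14, 'ATGAAATTTTAG')]
```

An ATG nested inside an ORF is not reported separately, so each ORF starts at the first ATG of its frame; real tools usually also require a much larger `min_codons`, often on the order of 100.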


Comparative genomics

Human ATAGCGGGGGGATGCGGGCCCTATACCC
Chimp ATAGGGG--GGATGCGGGCCCTATACCC
Mouse ATAGCG---GGATGCGGCGC-TATACCA
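A common first summary of an alignment like this one is percent identity. This sketch counts columns where both sequences carry the same base and divides by the full alignment length; dividing by the number of gap-free columns is an equally common convention, so treat that choice as an assumption. `percent_identity` is an illustrative name, and the short alignment is the toy example from the slide, not the genome-wide comparison behind the 1.23% figure.

```python
# The three-way alignment from the slide; '-' marks a gap.
ALIGNMENT = {
    "Human": "ATAGCGGGGGGATGCGGGCCCTATACCC",
    "Chimp": "ATAGGGG--GGATGCGGGCCCTATACCC",
    "Mouse": "ATAGCG---GGATGCGGCGC-TATACCA",
}


def percent_identity(a: str, b: str) -> float:
    """Identical, gap-free columns divided by the alignment length."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


for other in ("Chimp", "Mouse"):
    pid = percent_identity(ALIGNMENT["Human"], ALIGNMENT[other])
    print(f"Human vs {other}: {pid:.1f}% identical")
# Human vs Chimp: 89.3% identical
# Human vs Mouse: 75.0% identical
```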


Researchers have learned a great deal about the function of human genes by examining their counterparts in simpler model organisms such as the mouse.

Conservation of IGFALS (insulin-like growth factor binding protein, acid labile subunit) between human and mouse.


Functional genomics


Understanding the function of genes and other parts of the genome


A network of interactions can be built for all proteins in an organism.

A large network of 8184 interactions among 4140 S. cerevisiae proteins
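An interaction network is naturally stored as an undirected graph. The sketch below builds an adjacency list from a hypothetical edge list (P1 to P4 are placeholder names, not yeast proteins), finds the most connected protein, and uses the slide's counts to compute the average number of partners per protein, 2 × 8184 / 4140 ≈ 3.95, since every interaction is counted once for each of its two proteins.

```python
from collections import defaultdict

# Hypothetical interaction list; the protein names are placeholders.
interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]

# Undirected graph stored as an adjacency list: protein -> set of partners.
network: dict[str, set[str]] = defaultdict(set)
for a, b in interactions:
    network[a].add(b)
    network[b].add(a)

degree = {protein: len(partners) for protein, partners in network.items()}
hub = max(degree, key=degree.get)  # the most connected protein
print(degree)  # {'P1': 2, 'P2': 2, 'P3': 3, 'P4': 1}
print(hub)     # P3


def average_degree(n_edges: int, n_nodes: int) -> float:
    """Each undirected edge adds one to the degree of both of its nodes."""
    return 2 * n_edges / n_nodes


print(round(average_degree(8184, 4140), 2))  # 3.95 for the yeast network
```

Using sets for the partners means a duplicated interaction in the input is counted only once.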


Structural genomics


Assigning the structures of all proteins

Figure labels: fold, evolutionary relationship, shape and electrostatics, functional sites, active sites, protein-ligand complexes, protein complexes, biological processes


Resources and Databases

The different types of data are collected in databases:

– Sequence databases
– Structural databases
– Databases of experimental results

All the databases are interconnected.


Sequence databases

• Gene database

• Genome database

• SNPs database

• Disease related mutation database


Gene database

• Provide information on gene function

• Alternative splicing of genes
– Alternative patterns of included exons create different gene products

• ESTs (expressed sequence tags)
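Alternative splicing can be illustrated by enumerating which exons a transcript includes. This sketch models only exon skipping: the first and last exons are always kept and any subset of the internal exons may be included, in genomic order. The four exon sequences and the name `exon_skipping_isoforms` are hypothetical; alternative splice sites, intron retention and mutually exclusive exons are ignored.

```python
from itertools import combinations

# Hypothetical gene with four exons; the sequences are invented for illustration.
EXONS = ["ATGGCC", "GATTAC", "CCGGTA", "TTGTAA"]


def exon_skipping_isoforms(exons: list[str]) -> list[str]:
    """Build every transcript that keeps the first and last exon and includes
    any subset of the internal exons. Needs at least two exons.
    """
    first, *internal, last = exons
    isoforms = []
    for k in range(len(internal) + 1):
        # combinations() preserves input order, so exon order is kept.
        for included in combinations(internal, k):
            isoforms.append(first + "".join(included) + last)
    return isoforms


for isoform in exon_skipping_isoforms(EXONS):
    print(isoform)
```

With n internal exons this yields 2^n candidate transcripts, which is one reason a single gene can give rise to several gene products.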


Genome Databases

• Data organized by species

• Clones assembled into contiguous pieces (‘contigs’) or whole chromosomes

• Information on non-coding regions

• Relativity
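The assembly of overlapping clones into contigs can be illustrated with a greedy overlap merge. This is a teaching simplification, not the method any particular genome database uses. The reads and the helper names `overlap` and `greedy_assemble` are hypothetical; the sketch assumes error-free reads from one strand and a minimum overlap of 3 bases.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if < min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the ordered pair of fragments with the largest overlap.

    Stops when no pair overlaps and returns the remaining contigs. Sequencing
    errors, repeats and reverse-complement reads are ignored.
    """
    contigs = list(reads)
    while True:
        best = (0, -1, -1)  # (overlap length, left index, right index)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)


reads = ["ATTAGACCTG", "CCTGCCGGAA", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging can join fragments wrongly when the genome contains repeats longer than the overlaps, which is why real assemblers use more elaborate graph-based methods.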


Genome Browsers

• Annotation adds value to sequence

• Easy “walk” through the genome

• Comparative genomics


Genome Browsers

• UCSC Genome Browser: http://genome.ucsc.edu/

• Ensembl Genome Browser: http://www.ensembl.org

• WormBase: http://www.wormbase.org/

• AceDB: http://www.acedb.org/

• Comprehensive Microbial Resource: http://www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl

• FlyBase: http://flybase.bio.indiana.edu/


SNP database

Single Nucleotide Polymorphisms (SNPs)

• A single-base difference at one position between two individuals of the same species

• Play an important role in variation between individuals and in disease
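Once two sequences are aligned, spotting single-base differences is a position-by-position comparison. The sketch below assumes two already aligned, gap-free sequences of equal length; `snp_positions` and the two sequences are hypothetical. Real variant calling must also handle sequencing errors, alignment to a reference genome and insertions or deletions.

```python
def snp_positions(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in a, base in b) wherever the sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()), start=1)
        if a != b
    ]


individual_1 = "ACGTTGCA"  # hypothetical sequences
individual_2 = "ACGTCGCA"
print(snp_positions(individual_1, individual_2))  # [(5, 'T', 'C')]
```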


Sickle Cell Anemia

• Caused by a single-base substitution (A to T) in the beta-globin gene that changes the codon GAG to GTG, so that valine is inserted instead of glutamic acid at position 6 of the hemoglobin beta chain

Image source: http://www.cc.nih.gov/ccc/ccnews/nov99/


Healthy individual

>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA

ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA

GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCACCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCACTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACTGGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC

>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]

MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG

AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH


Diseased individual

>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA

ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGT

GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCACCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCACTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACTGGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC

>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]

MVHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG

AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
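Comparing the healthy and diseased sequences codon by codon shows how a single base change becomes an amino acid change. This sketch uses two 27-base excerpts starting at the ATG: the healthy HBB coding start from the slide, and the same stretch carrying the A-to-T change that yields the valine shown in the diseased protein. It assumes the standard genetic code and equal-length, in-frame coding sequences; `describe_substitutions` is a hypothetical helper.

```python
from itertools import product

# Standard genetic code in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# First 27 coding bases of HBB: healthy, and with the sickle cell A->T change.
healthy = "ATGGTGCATCTGACTCCTGAGGAGAAG"
sickle = "ATGGTGCATCTGACTCCTGTGGAGAAG"


def describe_substitutions(ref: str, alt: str) -> list[str]:
    """Report each base change in a coding sequence and its effect on the protein.

    Assumes equal-length, in-frame sequences starting at the first base of
    the start codon.
    """
    changes = []
    for i, (r, a) in enumerate(zip(ref, alt)):
        if r == a:
            continue
        codon_start = i - i % 3
        ref_codon = ref[codon_start:codon_start + 3]
        alt_codon = alt[codon_start:codon_start + 3]
        codon_number = codon_start // 3 + 1  # the initial Met counts as codon 1
        changes.append(
            f"base {i + 1} {r}>{a}: codon {codon_number} {ref_codon}>{alt_codon}, "
            f"{CODON_TABLE[ref_codon]}>{CODON_TABLE[alt_codon]}"
        )
    return changes


print(describe_substitutions(healthy, sickle))
# ['base 20 A>T: codon 7 GAG>GTG, E>V']
```

Codon 7 here counts the initial methionine; in the conventional numbering of the mature beta chain, where that methionine is not counted, this is the glutamic acid to valine change at position 6.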


Disease Databases

• Genes are involved in disease

• Many diseases are well studied

• Descriptions of diseases and of what is known about them are stored


Structure Databases

• Three-dimensional structures of proteins, nucleic acids, molecular complexes, etc.

• 3-D data are available thanks to techniques such as NMR and X-ray crystallography


Databases of Experimental Results

• Experimental data such as microarray images and expression data

• Proteomic data

• Metabolic pathways, protein-protein interaction data, regulatory networks

• etc.


Literature Databases

PubMed

• MEDLINE publication database
– Over 17,000 journals
– 15 million citations since 1950

Service of the National Library of Medicine

http://www.ncbi.nlm.nih.gov/PubMed


Putting it all Together

• Each database contains specific information

• Like biological systems, these databases are interrelated


GENOMIC DATA: GenBank, DDBJ, EMBL

ASSEMBLED GENOMES: GoldenPath, WormBase, TIGR

PROTEIN: PIR, SWISS-PROT

STRUCTURE: PDB, MMDB, SCOP

LITERATURE: PubMed

PATHWAY: KEGG, COG

DISEASE: LocusLink, OMIM, OMIA

GENES: RefSeq, AllGenes, GDB

SNPs: dbSNP

ESTs: dbEST, unigene

MOTIFS: BLOCKS, Pfam, Prosite

GENE EXPRESSION: Stanford MGDB, NetAffx, ArrayExpress