Page 1: Research in Computational Genomics Mar Albà

Research in Computational Genomics

Mar Albà

Evolutionary Genomics Group

Research Unit on Biomedical Informatics

Universitat Pompeu Fabra

UPC, April 1 2005

Page 2: Research in Computational Genomics Mar Albà

1. The genetic information

2. The human genome project

3. Genomics: techniques and research

Page 3: Research in Computational Genomics Mar Albà

1. The genetic information

Page 4: Research in Computational Genomics Mar Albà

1865 – Mendel

The genetic information: inheritance

Page 5: Research in Computational Genomics Mar Albà

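1928 – Griffith: transforming principle

Infection of mice with pneumonia bacteria:

deadly bacteria: mice die
non-deadly bacteria: mice live
boiled deadly bacteria: mice live
boiled deadly bacteria + non-deadly bacteria: mice die

1944 - Avery, MacLeod, McCarty: DNA is the transforming principle

boiled deadly bacteria + non-deadly bacteria + DNase: mice live
boiled deadly bacteria + non-deadly bacteria + protease: mice die

DNA is the hereditary material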

Page 6: Research in Computational Genomics Mar Albà

DNA structure

1953 – Watson and Crick discover the structure of DNA

1953 – Rosalind Franklin: X-ray diffraction image of DNA

Page 7: Research in Computational Genomics Mar Albà

DNA structure: antiparallel double helix

nucleotides:

A: adenine
G: guanine
C: cytosine
T: thymine

C-G
A-T
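The pairing rules on this slide (C-G, A-T) and the antiparallel orientation of the two strands can be illustrated with a minimal Python sketch. The function name `reverse_complement` and the example sequence are chosen here for illustration and are not part of the original slides.

```python
# Base-pairing rules from the slide: C pairs with G, A pairs with T.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, read in its own 5'->3' direction.

    Because the two strands are antiparallel, the partner strand is the
    complement of the input read backwards.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


if __name__ == "__main__":
    print(reverse_complement("ATGGCACAACCA"))
```

Applying the function twice returns the original strand, which is a quick sanity check on the pairing table.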

Page 8: Research in Computational Genomics Mar Albà

RNA:

- single strand

- uracil instead of thymine

- contains ribose instead of deoxyribose

A-U
C-G
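At the sequence level, the difference between DNA and RNA listed above comes down to replacing thymine with uracil. A minimal Python sketch follows, assuming the input is the coding (sense) strand of the DNA; the function name `transcribe` is hypothetical.

```python
def transcribe(coding_dna: str) -> str:
    """Write the RNA equivalent of a coding-strand DNA sequence.

    Assumption: the input is the coding (sense) strand, so the RNA has
    the same sequence with uracil (U) in place of thymine (T).
    """
    return coding_dna.upper().replace("T", "U")


if __name__ == "__main__":
    print(transcribe("ATGGCACAACCA"))
```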

Page 9: Research in Computational Genomics Mar Albà

Proteins

QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQMMCMNNSFNVATLPAEKMKILELPFASGDLSMLVLLPDEVSDLERIEKTINFEKLTEWTNPNTMEKRRVKVYLPQMKIEEKYNLTSVLMALGMTDLFIPSANLTGISSAESLKISQAVHGAFMELSEDGIEMAGSTGVIEDIKHSPESEQFRADHPFLFLIKHNPTNTIVYFGRYWSP

Page 10: Research in Computational Genomics Mar Albà

Proteins are made of amino acids

amino acid

Page 11: Research in Computational Genomics Mar Albà

20 amino acids

Page 12: Research in Computational Genomics Mar Albà

Peptide bond

Proteins: amino acid chain

Page 13: Research in Computational Genomics Mar Albà
Page 14: Research in Computational Genomics Mar Albà

DNA replication

Page 15: Research in Computational Genomics Mar Albà

Transcription

The transcription of a gene may be off or on, depending on the cell type and conditions.

Page 16: Research in Computational Genomics Mar Albà

Translation

Page 17: Research in Computational Genomics Mar Albà

Translation

Page 18: Research in Computational Genomics Mar Albà

Genetic code

nucleotides (coding DNA): ATGGCACAACCA…

amino acids (protein): MetAlaGlnPro…

nucleotides 1 2 3 code for amino acid 1 (AA 1), nucleotides 4 5 6 for amino acid 2 (AA 2)
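The codon-by-codon reading shown on this slide can be sketched in Python. The table below is the standard genetic code written in its usual compact form (bases ordered T, C, A, G); the function name `translate` is hypothetical, and the output uses one-letter amino acid codes rather than the three-letter codes on the slide.

```python
from itertools import product

# Standard genetic code: codons enumerated with bases in the order T, C, A, G,
# the third position varying fastest. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna: str) -> str:
    """Translate coding DNA three nucleotides at a time, stopping at a stop codon."""
    seq = coding_dna.upper()
    protein = []
    # Ignore a trailing incomplete codon, if any.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # The slide's example: ATG GCA CAA CCA -> Met Ala Gln Pro
    print(translate("ATGGCACAACCA"))
```

For the slide's example the result is MAQP, that is Met, Ala, Gln, Pro.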

Page 19: Research in Computational Genomics Mar Albà

DNA cloning

DNA fragments + vectors (replicating DNA)

+ DNA ligase

vector with insert

transformation of bacteria

amplification

extraction

Page 20: Research in Computational Genomics Mar Albà

DNA sequencing

DNA polymerase

DNA synthesis

resulting partial labelled fragments

Page 21: Research in Computational Genomics Mar Albà

DNA sequencing

Page 22: Research in Computational Genomics Mar Albà

2. The human genome project

Page 23: Research in Computational Genomics Mar Albà

The human genome project

1953 - Discovery of the DNA double helix by Watson and Crick

1995 - Haemophilus influenzae genome

2001 - The first draft of the human genome is published, covering approximately 94% of the genome (Public Consortium + Celera)

2003 – Human genome sequence complete

Page 24: Research in Computational Genomics Mar Albà

2001 – Draft of the human genome

15 February 2001

Page 25: Research in Computational Genomics Mar Albà

Josep Abril and Roderic Guigó

IMIM (Institut Municipal d’Investigacions Mèdiques, Barcelona) participates in the annotation of the human genome

Page 26: Research in Computational Genomics Mar Albà

Human genome: 3,000,000,000 nucleotides

Page 27: Research in Computational Genomics Mar Albà

Human chromosomes

Page 28: Research in Computational Genomics Mar Albà

What’s in the human genome?

gene non-coding part

gene coding part (2%)

“parasitic” repetitive elements

microsatellites

DNA long repeats

Page 29: Research in Computational Genomics Mar Albà

EXONS

INTRONS

‘UPSTREAM’ REGULATORY ELEMENT

‘DOWNSTREAM’ REGULATORY ELEMENT

PROMOTER

PROTEIN

Gene structure

Page 30: Research in Computational Genomics Mar Albà

Comparison with other genomes

Organism                              Genome size (bases)   Estimated genes
Human (Homo sapiens)                  3 billion             30,000
Laboratory mouse (M. musculus)        2.6 billion           30,000
Mustard weed (A. thaliana)            100 million           25,000
Roundworm (C. elegans)                97 million            19,000
Fruit fly (D. melanogaster)           137 million           13,000
Yeast (S. cerevisiae)                 12.1 million          6,000
Bacterium (E. coli)                   4.6 million           3,200
Human immunodeficiency virus (HIV)    9,700                 9
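One way to read the comparison table is to divide genome size by the estimated number of genes. The Python sketch below does that arithmetic using only the figures from the table; the names `GENOMES` and `bases_per_gene` are hypothetical.

```python
# (genome size in bases, estimated genes), copied from the comparison table.
GENOMES = {
    "Human (Homo sapiens)": (3_000_000_000, 30_000),
    "Laboratory mouse (M. musculus)": (2_600_000_000, 30_000),
    "Mustard weed (A. thaliana)": (100_000_000, 25_000),
    "Roundworm (C. elegans)": (97_000_000, 19_000),
    "Fruit fly (D. melanogaster)": (137_000_000, 13_000),
    "Yeast (S. cerevisiae)": (12_100_000, 6_000),
    "Bacterium (E. coli)": (4_600_000, 3_200),
    "Human immunodeficiency virus (HIV)": (9_700, 9),
}


def bases_per_gene(genome_size: int, genes: int) -> float:
    """Average number of bases of genome per estimated gene."""
    return genome_size / genes


if __name__ == "__main__":
    for organism, (size, genes) in GENOMES.items():
        print(f"{organism}: {bases_per_gene(size, genes):,.0f} bases per gene")
```

With these figures the human genome has about 100,000 bases per gene, while E. coli has fewer than 1,500, which reflects how much of the larger genomes is non-coding.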

Page 31: Research in Computational Genomics Mar Albà

Gene catalogue

~30,000 genes

~10,000 already known (cDNA)

- Gene prediction programmes

- Homology to other species

- ESTs (expressed sequence tags)

- The functions of approximately half of the genes are not known!

Page 32: Research in Computational Genomics Mar Albà

“Parasitic” repetitive elements

Nature, Feb. 15, 2001

Page 33: Research in Computational Genomics Mar Albà

“Parasitic” repetitive elements

Retrotransposition

genome

LINE

RNA

transcription (pol II)

translation

translocation of the complex

LINE copy

cytoplasm

Page 34: Research in Computational Genomics Mar Albà

3. Genomics: techniques and research

Page 35: Research in Computational Genomics Mar Albà

- bioinformatics

- genome sequencing and annotation

- functional genomics

- systems biology

Genomics

Page 36: Research in Computational Genomics Mar Albà
Page 37: Research in Computational Genomics Mar Albà

Genome sequencing and annotation

Page 38: Research in Computational Genomics Mar Albà

Exponential growth of DNA sequences

Page 39: Research in Computational Genomics Mar Albà

How many genomes?

[Figure: Genome Sequencing Projects on GOLD ©. Number of incomplete and complete projects (axis 0 to 1200), quarterly from Dec-97 to Mar-04.]

Page 40: Research in Computational Genomics Mar Albà

Recently sequenced eukaryotic genomes

T. rubripes

C. intestinalis

A. gossypii

A. mellifera

R. norvegicus

A. gambiae

Page 41: Research in Computational Genomics Mar Albà

How long does it take to sequence a genome?

bacteria: 1 day

fungus: 1 week

insect: 1-2 months

mammal: 1-2 years

Page 42: Research in Computational Genomics Mar Albà

Gene prediction

- DNA coding for protein sequences (exons) only accounts for 2% of the human genome

- Information we can use:

  - splice site signals
  - statistics of coding sequences

EXONS

PROTEIN

gene
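As a rough illustration of using splice site signals, the Python sketch below scans a sequence for the canonical dinucleotides that mark most intron boundaries: GT at the start (donor site) and AG at the end (acceptor site). This is a deliberately naive sketch, not how gene prediction programmes work in practice; the function names and the `min_length` parameter are hypothetical, and real predictors combine such signals with coding statistics.

```python
def candidate_splice_sites(dna: str) -> tuple[list[int], list[int]]:
    """Return start positions (0-based) of GT donor and AG acceptor dinucleotides."""
    seq = dna.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors


def candidate_introns(dna: str, min_length: int = 4) -> list[tuple[int, int]]:
    """Pair every GT with every downstream AG to list possible introns.

    Each result is a (start, end) slice such that dna[start:end] begins
    with GT and ends with AG. min_length is an illustrative threshold only;
    real introns are far longer.
    """
    donors, acceptors = candidate_splice_sites(dna)
    return [
        (d, a + 2)
        for d in donors
        for a in acceptors
        if a + 2 - d >= min_length
    ]


if __name__ == "__main__":
    example = "ATGGTAAGTCCAGGCA"  # made-up sequence for illustration
    for start, end in candidate_introns(example):
        print(start, end, example[start:end])
```

Even this short made-up sequence yields several candidates, which shows why splice signals alone are too weak and must be combined with other evidence.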

Page 43: Research in Computational Genomics Mar Albà

Sequence similarity

- To predict genes we can also use sequence similarity searches against known proteins

alignment of protein sequences
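The idea of scoring the similarity between two protein sequences can be sketched with a small global alignment by dynamic programming (Needleman-Wunsch). This is a sketch under simplifying assumptions: the scoring values (match +1, mismatch -1, gap -1) are chosen only for illustration, whereas real similarity searches use substitution matrices and faster heuristics. The function name is hypothetical.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Best global alignment score of sequences a and b (Needleman-Wunsch).

    Only the previous row of the dynamic programming table is kept,
    since the score (not the alignment itself) is returned.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Short made-up fragments: the second lacks one residue of the first.
    print(global_alignment_score("QIKDLL", "QIKLL"))
```

A high score relative to the sequence length suggests that a predicted coding region resembles a known protein.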

Page 44: Research in Computational Genomics Mar Albà

Microbial Genomes at NCBI

http://www.ncbi.nlm.nih.gov/genomes/MICROBES/Complete.html

National Center for Biotechnology Information, National Institutes of Health

Page 45: Research in Computational Genomics Mar Albà

Functional annotation of all genes in a genome

Page 46: Research in Computational Genomics Mar Albà

Ensembl Genome Browser

http://www.ensembl.org

European Bioinformatics Institute

Page 47: Research in Computational Genomics Mar Albà

Ensembl Genome Browser

Page 48: Research in Computational Genomics Mar Albà
Page 49: Research in Computational Genomics Mar Albà

ENCODE (NIH)

Encyclopedia of DNA Elements

- exhaustive analysis of 1% of the human genome

- identification of functional elements

- development and comparison of different computational methods

http://www.genome.gov/Pages/Research/ENCODE/

2003-

Page 50: Research in Computational Genomics Mar Albà

HapMap (Haplotype Map)

http://www.hapmap.org/

2002-

Variability map (single nucleotide polymorphisms, SNPs) in populations from Africa, Asia and the USA.

It will help identify genes involved in complex disease, by association with particular haplotypes.

haplotype variants

SNPs
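The relationship between SNPs and haplotype variants can be illustrated with a short Python sketch: given the same region sequenced in several chromosomes, a SNP is a position where the sequences differ, and each chromosome's haplotype can be summarised by its alleles at those positions. The sequences and function names below are made up for illustration and assume the sequences are already aligned and of equal length.

```python
def snp_positions(sequences: list[str]) -> list[int]:
    """Return 0-based positions where the aligned sequences are not all identical."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    return [
        i for i, column in enumerate(zip(*sequences))
        if len(set(column)) > 1
    ]


def haplotype_variants(sequences: list[str]) -> list[str]:
    """Summarise each sequence by its alleles at the SNP positions."""
    positions = snp_positions(sequences)
    return ["".join(seq[i] for i in positions) for seq in sequences]


if __name__ == "__main__":
    # Three made-up copies of the same short region.
    region = ["ACGTTGCA", "ACGCTGCA", "ACGTTGAA"]
    print(snp_positions(region))
    print(haplotype_variants(region))
```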

Page 51: Research in Computational Genomics Mar Albà

Environmental Genome Shotgun Sequencing of the Sargasso Sea

J. Craig Venter et al., Science, Vol. 304, Issue 5667, pp. 66-74, 2 April 2004

1.045 billion base pairs

1800 genomic species

148 previously unknown bacterial phylotypes

Page 52: Research in Computational Genomics Mar Albà

Functional genomics

Page 53: Research in Computational Genomics Mar Albà

DNA microarrays: high-throughput analysis of gene transcription

Page 54: Research in Computational Genomics Mar Albà

ChIP-chip: analysis of protein-binding DNA fragments

cross-link protein and DNA

immunoprecipitation

eliminate protein

hybridize with DNA

Page 55: Research in Computational Genomics Mar Albà

Protein-protein interactions: yeast two-hybrid

Page 56: Research in Computational Genomics Mar Albà

Protein interaction networks

Page 57: Research in Computational Genomics Mar Albà
Page 58: Research in Computational Genomics Mar Albà

Systems biology

- Development of mathematical methods to model the behaviour of biological systems, including all elements in the system and their interactions.
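As a toy illustration of what such a mathematical model can look like, the Python sketch below simulates the amount of one mRNA species with constant production and first-order degradation, dm/dt = production - degradation * m, using simple Euler steps. The model, the parameter values and the function name are assumptions made for illustration and do not come from the slides; real systems biology models involve many interacting elements.

```python
def simulate_mrna(production: float = 2.0,
                  degradation: float = 0.1,
                  m0: float = 0.0,
                  dt: float = 0.01,
                  steps: int = 10_000) -> float:
    """Integrate dm/dt = production - degradation * m with Euler steps.

    Returns the mRNA level after steps * dt time units. All parameter
    values are illustrative, in arbitrary units.
    """
    m = m0
    for _ in range(steps):
        m += dt * (production - degradation * m)
    return m


if __name__ == "__main__":
    # The level approaches the steady state production / degradation = 20.
    print(simulate_mrna())
```

In this model the level settles at production / degradation regardless of the starting amount, a simple example of a system property that follows from the interactions rather than from any single element.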

Page 59: Research in Computational Genomics Mar Albà

Founded in 2000 by Leroy Hood, Seattle

Masaru Tomita, Keio University, Japan

Page 60: Research in Computational Genomics Mar Albà

National Center for Biotechnology Information (USA):

http://www.ncbi.nlm.nih.gov

European Bioinformatics Institute (UK):

http://www.ebi.ac.uk

Page 61: Research in Computational Genomics Mar Albà

Acknowledgements:

Grup de Recerca en Informàtica Biomèdica – Ferran Sanz

Grup de Genòmica Computacional – Roderic Guigó

Universitat Pompeu Fabra

www.imim.es/grib
