
Research in Computational Genomics

Mar Albà

Evolutionary Genomics Group, Research Unit on Biomedical Informatics

Universitat Pompeu Fabra

UPC, April 1 2005

1. The genetic information

2. The human genome project

3. Genomics: techniques and research

1. The genetic information

1865 – Mendel

The genetic information: inheritance

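1928 – Griffith: the transforming principle

Infection of mice with pneumonia bacteria:

- deadly bacteria: the mice die

- non-deadly bacteria: the mice live

- boiled deadly bacteria: the mice live

- boiled deadly bacteria + non-deadly bacteria: the mice die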

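1944 - Avery, MacLeod, McCarty: DNA is the transforming principle

- boiled deadly bacteria + non-deadly bacteria + DNase: the mice live

- boiled deadly bacteria + non-deadly bacteria + protease: the mice die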

DNA is the hereditary material

DNA structure

1953 – Watson and Crick discover the structure of DNA

1953 – Rosalind Franklin: X-ray diffraction image of DNA

DNA structure: antiparallel double helix

Nucleotides: A: adenine, G: guanine, C: cytosine, T: thymine

Base pairs: C-G, A-T
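The two strands pair A-T and C-G and run antiparallel, so either strand fully determines the other. To get the partner strand, complement each base and reverse the order. The minimal Python sketch below assumes sequences are uppercase strings of A, C, G, T written 5'->3'. The function name `reverse_complement` is our own, hypothetical choice.

```python
# Watson-Crick pairing for DNA bases
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand of a DNA double helix, written 5'->3'.

    Each base is replaced by its pair (A-T, C-G). The strands are
    antiparallel, so the partner is read in the opposite direction,
    hence the reversal.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))


print(reverse_complement("ATGGCACAACCA"))  # TGGTTGTGCCAT
```

Applying the function twice returns the original strand. This is a quick sanity check that the pairing table is symmetric.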

RNA:

- single strand

- uracil instead of thymine

- contains ribose instead of deoxyribose

Base pairs: A-U, C-G
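RNA is synthesized against a DNA template strand, with uracil (not thymine) pairing with adenine. This is a hedged sketch that assumes the template strand is given 5'->3' as an uppercase string. The `transcribe` helper and its pairing table are illustrative and do not come from the slides. The RNA is the template, reversed and complemented with RNA bases.

```python
# Template DNA base -> RNA base it pairs with (A pairs with U in RNA)
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template: str) -> str:
    """Return the RNA made from a DNA template strand given 5'->3'.

    The RNA runs antiparallel to its template, so the template is read
    in reverse while each base is paired.
    """
    return "".join(RNA_PAIR[base] for base in reversed(template))


print(transcribe("TGGTTGTGCCAT"))  # AUGGCACAACCA
```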

Proteins

QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQMMCMNNSFNVATLPAEKMKILELPFASGDLSMLVLLPDEVSDLERIEKTINFEKLTEWTNPNTMEKRRVKVYLPQMKIEEKYNLTSVLMALGMTDLFIPSANLTGISSAESLKISQAVHGAFMELSEDGIEMAGSTGVIEDIKHSPESEQFRADHPFLFLIKHNPTNTIVYFGRYWSP

Proteins are made of amino acids

There are 20 amino acids, joined by peptide bonds.

Proteins: amino acid chains
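Because there are 20 amino acids, a protein sequence such as the one shown above can be written as a string over a 20-letter alphabet of one-letter codes. The small Python sketch below validates a sequence and counts each amino acid. It assumes only the 20 standard one-letter codes are allowed, and the helper name `composition` is hypothetical.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def composition(protein: str) -> Counter:
    """Count each amino acid, rejecting letters outside the 20 standard codes."""
    unknown = set(protein) - STANDARD_AMINO_ACIDS
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return Counter(protein)


fragment = "QIKDLLVSSSTDLDTT"  # first 16 residues of the sequence shown above
counts = composition(fragment)
print(counts["L"], len(fragment))  # 3 16
```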

DNA replication

Transcription

The transcription of a gene may be off or on, depending on the cell type and conditions.

Translation


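Genetic code

Each group of three nucleotides (a codon) in the coding DNA specifies one amino acid. Nucleotides 1-3 encode amino acid 1, nucleotides 4-6 encode amino acid 2, and so on along the protein.

coding DNA: ATGGCACAACCA…

protein: MetAlaGlnPro…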
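Translation reads the coding DNA one codon at a time. This minimal sketch assumes a *partial* codon table that covers only the example codons above plus the three stop codons. A real implementation needs all 64 codons of the standard genetic code. `translate` is a hypothetical helper, and it raises `KeyError` on any codon outside this table.

```python
# Partial codon table: only the example codons plus the three stop codons
CODON_TABLE = {
    "ATG": "Met", "GCA": "Ala", "CAA": "Gln", "CCA": "Pro",
    "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
}


def translate(coding_dna: str) -> list[str]:
    """Translate coding DNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


print("".join(translate("ATGGCACAACCA")))  # MetAlaGlnPro
```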

DNA cloning

- DNA fragments + vectors (replicating DNA) + DNA ligase

- vector with insert

- transformation of bacteria

- amplification and extraction

DNA sequencing

- DNA synthesis by DNA polymerase

- resulting partial, labelled fragments
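The sequencing figure appears to depict chain-termination (Sanger) sequencing, and this toy model is built on that assumption. Each partial fragment stops at a labelled terminating base, and ordering the fragments by length reads off the synthesized strand one base at a time. The function names are hypothetical, and the model ignores real-world details such as read length limits and signal noise.

```python
import random


def chain_termination_fragments(synthesized: str) -> list[tuple[int, str]]:
    """Return (length, labelled terminal base) for every partial fragment.

    A fragment of length i stops where a labelled terminator was
    incorporated, so its last base is synthesized[i - 1].
    """
    return [(i, synthesized[i - 1]) for i in range(1, len(synthesized) + 1)]


def read_sequence(fragments: list[tuple[int, str]]) -> str:
    """Order fragments by size (as gel separation does) and read each label."""
    return "".join(base for _, base in sorted(fragments))


frags = chain_termination_fragments("GATTACA")
random.shuffle(frags)  # fragments come out of the reaction unordered
print(read_sequence(frags))  # GATTACA
```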

2. The human genome project

The human genome project

1953 - Discovery of the DNA double helix by Watson and Crick

1995 - Haemophilus influenzae genome

2001 - The first draft of the human genome is published, covering approximately 94% of the genome (Public Consortium + Celera)

2003 – Human genome sequence complete

2001 – Draft of the human genome

15 February 2001

Josep Abril and Roderic Guigó

IMIM (Institut Municipal d’Investigacions Mèdiques, Barcelona) participates in the annotation of the human genome

Human genome: 3,000,000,000 nucleotides

Human chromosomes

What’s in the human genome?

- gene non-coding part

- gene coding part (2%)

- “parasitic” repetitive elements

- microsatellites and long DNA repeats

Gene structure

‘Upstream’ regulatory element, promoter, exons and introns, ‘downstream’ regulatory element; the exons encode the protein.
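During RNA processing, the introns are removed and the exons are joined (splicing). This hedged sketch assumes the exon coordinates are already known, as 0-based, half-open intervals. The gene below is made up, with its intron in lowercase, and its exons join into the coding sequence of the genetic-code example. The helper `splice` is hypothetical.

```python
def splice(gene: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, half-open) in order, dropping the introns."""
    return "".join(gene[start:end] for start, end in sorted(exons))


# Hypothetical gene: exon 1 (0-6), intron in lowercase (6-16), exon 2 (16-22)
gene = "ATGGCAgtaagtttagCAACCA"
print(splice(gene, [(0, 6), (16, 22)]))  # ATGGCACAACCA
```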

Organism Genome Size (Bases) Estimated Genes

Human (Homo sapiens) 3 billion 30,000

Laboratory mouse (M. musculus) 2.6 billion 30,000

Mustard weed (A. thaliana) 100 million 25,000

Roundworm (C. elegans) 97 million 19,000

Fruit fly (D. melanogaster) 137 million 13,000

Yeast (S. cerevisiae) 12.1 million 6,000

Bacterium (E. coli) 4.6 million 3,200

Human immunodeficiency virus (HIV) 9,700 9

Comparison with other genomes

Gene catalogue

~30,000 genes

~10,000 already known (cDNA)

Other genes are identified with:

- gene prediction programs

- homology to other species

- ESTs (expressed sequence tags)

The functions of approximately half of the genes are not known!

“Parasitic” repetitive elements

Nature, Feb. 15, 2001

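“Parasitic” repetitive elements: retrotransposition

- A LINE in the genome is transcribed into RNA by RNA polymerase II.

- In the cytoplasm, the RNA is translated, and the resulting RNA-protein complex is translocated back to the nucleus.

- A new LINE copy is inserted into the genome.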

3. Genomics: techniques and research

- bioinformatics

- genome sequencing and annotation

- functional genomics

- systems biology

Genomics

Genome sequencing and annotation

Exponential growth of DNA sequences

How many genomes?

[Figure: Genome Sequencing Projects on GOLD, number of complete and incomplete projects (0 to 1,200), December 1997 to March 2004]

Recently sequenced eukaryotic genomes

T. rubripes

C. intestinalis

A. gossypii

A. mellifera

R. norvegicus

A. gambiae

How long does it take to sequence a genome?

bacteria: 1 day

fungus: 1 week

insect: 1-2 months

mammal: 1-2 years

Gene prediction

- DNA coding for protein sequences (exons) only accounts for 2% of the human genome

- Information we can use: splice site signals and statistics of coding sequences

[Figure: a gene, its exons, and the encoded protein]
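Splice site signals are short consensus motifs: most introns start with GT (the donor site) and end with AG (the acceptor site). This deliberately naive Python sketch uses a hypothetical helper, `candidate_splice_sites`, to scan for these dinucleotides. It shows why signals alone are not enough. In the made-up gene below, only donor 6 and acceptor 14 bound the real intron, and the other hits are false candidates. This is why gene prediction also relies on statistics of coding sequences.

```python
def candidate_splice_sites(seq: str) -> tuple[list[int], list[int]]:
    """Return 0-based start positions of GT (donor) and AG (acceptor) dinucleotides."""
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors


# Hypothetical gene; the true intron is GTAAGTTTAG (positions 6-15)
donors, acceptors = candidate_splice_sites("ATGGCAGTAAGTTTAGCAACCA")
print(donors, acceptors)  # [6, 10] [5, 9, 14]
```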

Sequence similarity

- To predict genes, we can also use sequence similarity searches against known proteins

alignment of protein sequences
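Similarity searches work by scoring alignments between sequences. Real searches against protein databases typically use heuristic tools and amino acid substitution matrices. As a simpler, swapped-in illustration, the sketch below computes the Needleman-Wunsch global alignment score. The scores are assumptions: match +1, mismatch -1, gap -1.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    # prev[j] = best score aligning the first i-1 letters of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # best of: align a[i-1] with b[j-1], gap in b, gap in a
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


print(global_alignment_score("MKTV", "MKV"))  # 2: three matches, one gap
```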

Microbial Genomes at NCBI

http://www.ncbi.nlm.nih.gov/genomes/MICROBES/Complete.html

National Center for Biotechnology Information, National Institutes of Health

Functional annotation of all genes in a genome

Ensembl Genome Browser

http://www.ensembl.org, European Bioinformatics Institute


ENCODE (NIH): Encyclopedia of DNA Elements

- exhaustive analysis of 1% of the human genome

- identification of functional elements

- development and comparison of different computational methods

http://www.genome.gov/Pages/Research/ENCODE/ (since 2003)

HapMap (Haplotype Map)

http://www.hapmap.org/ (since 2002)

Variability map (single nucleotide polymorphisms, SNPs) of populations from Africa, Asia and the USA.

It will help identify genes involved in complex diseases, by association with particular haplotypes.

[Figure: SNPs and haplotype variants]
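A common way to test association with a haplotype is to compare its frequency in patients (cases) and in healthy individuals (controls). This hedged sketch assumes a simple 2x2 Pearson chi-square test without continuity correction, applied to hypothetical counts of chromosomes with and without the haplotype. With one degree of freedom, a statistic above 3.84 corresponds to p < 0.05.

```python
def allelic_chi_square(case_with: int, case_without: int,
                       control_with: int, control_without: int) -> float:
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 table."""
    a, b, c, d = case_with, case_without, control_with, control_without
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts: 30 of 100 case chromosomes vs 10 of 100 control
# chromosomes carry the haplotype
stat = allelic_chi_square(30, 70, 10, 90)
print(stat, stat > 3.84)  # 12.5 True
```

A real genome-wide study tests many SNPs or haplotypes, so the 3.84 threshold would need a correction for multiple testing.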

Environmental Genome Shotgun Sequencing of the Sargasso Sea

J. Craig Venter et al., Science, vol. 304, issue 5667, pp. 66-74, 2 April 2004

1.045 billion base pairs

1800 genomic species

148 previously unknown bacterial phylotypes

Functional genomics

DNA microarrays: high-throughput analysis of gene transcription
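On two-colour microarrays, each gene's transcription is often summarised as the log2 ratio of its signal in the condition of interest to a reference. The slides do not specify the array type, so the two-colour design is an assumption. This minimal sketch uses hypothetical spot intensities and a 2-fold cutoff (|log2 ratio| >= 1).

```python
import math


def log2_ratios(condition: dict[str, float], reference: dict[str, float]) -> dict[str, float]:
    """Return log2(condition / reference) intensity ratio for each gene."""
    return {gene: math.log2(condition[gene] / reference[gene]) for gene in condition}


# Hypothetical spot intensities for three genes
condition = {"geneA": 800.0, "geneB": 100.0, "geneC": 410.0}
reference = {"geneA": 200.0, "geneB": 400.0, "geneC": 400.0}
ratios = log2_ratios(condition, reference)
changed = {g: r for g, r in ratios.items() if abs(r) >= 1.0}  # at least 2-fold
print(changed)  # {'geneA': 2.0, 'geneB': -2.0}
```

Using a log scale makes up- and down-regulation symmetric: a 4-fold increase is +2 and a 4-fold decrease is -2.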

ChIP-chip: analysis of DNA fragments bound by proteins

- cross-link protein and DNA

- immunoprecipitation

- eliminate protein

- hybridize the DNA to a microarray

Protein-protein interactions: yeast two-hybrid

Protein interaction networks
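Pairs of interacting proteins, such as those detected in a yeast two-hybrid screen, can be assembled into a network: a graph with proteins as nodes and interactions as edges. This sketch uses hypothetical protein names `A` to `E`. It computes each protein's number of interaction partners (its degree) and finds the most connected protein.

```python
from collections import defaultdict


def build_network(interactions: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Build undirected adjacency sets from (protein, protein) pairs."""
    graph: dict[str, set[str]] = defaultdict(set)
    for p, q in interactions:
        graph[p].add(q)
        graph[q].add(p)
    return graph


# Hypothetical interactions, e.g. from a yeast two-hybrid screen
pairs = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
network = build_network(pairs)
degrees = {protein: len(partners) for protein, partners in network.items()}
hub = max(degrees, key=degrees.get)
print(hub, degrees[hub])  # A 3
```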

Systems biology

- Development of mathematical methods to model the behaviour of biological systems, including all elements in the system and their interactions.

Institute for Systems Biology, founded in 2000 by Leroy Hood, Seattle

Masaru Tomita, Keio University, Japan
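The systems biology definition above calls for mathematical models of biological behaviour. A minimal example is one mRNA species with constant synthesis rate k and first-order degradation rate d: dm/dt = k - d·m, which settles at the steady state m = k/d. This sketch integrates the equation with the simple Euler method. The rates and step size are hypothetical.

```python
def simulate_mrna(k: float, d: float, m0: float = 0.0,
                  dt: float = 0.01, steps: int = 5000) -> float:
    """Euler integration of dm/dt = k - d*m; returns m after steps*dt time units."""
    m = m0
    for _ in range(steps):
        m += dt * (k - d * m)
    return m


# Hypothetical rates: synthesis k = 2 molecules/min, degradation d = 0.5 per min
m_final = simulate_mrna(k=2.0, d=0.5)
print(round(m_final, 3), 2.0 / 0.5)  # simulated level approaches k/d
```

Real systems biology models couple many such equations, one per molecular species, through their interactions.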

National Center for Biotechnology Information (USA):

http://www.ncbi.nlm.nih.gov

European Bioinformatics Institute (UK):

http://www.ebi.ac.uk

Acknowledgements :

Grup de Recerca en Informàtica Biomèdica – Ferran Sanz

Grup de Genòmica Computacional – Roderic Guigó

Universitat Pompeu Fabra

www.imim.es/grib

Genòmica Computacional, GRIB
