
Genomics: Genome Sequencing

Dr. Inga Prokopenko

The Human Genome Project

Before HGP

• 1975 – method for DNA sequencing introduced

Frederick Sanger

http://www.scq.ubc.ca/wp-content/uploads/2006/08/sequencing2.gif

The Human Genome Project

• 1996 – large-scale human genome sequencing attempt

• 1998 – “Celera Genomics”

o new approach

o public project competitor

o human genome in 3 years for $300 million

o led by C. Venter

http://kidblog.files.wordpress.com/2007/06/craig-venter-scientist-and-businessman.jpg


Genome Sequencing Project

Sequencing: determining the order of nucleotides in a stretch of genomic DNA

DNA: …GTGACGTCGTCGTCG…

Engineering Society meets BCCB

The Human Genome Project

HGP

• 2003 – completion of the HGP

Session 04 ~ 18/11/07

http://www.sanger.ac.uk/Info/Press/gfx/030414_hgp_300.jpg


Complete Genome Sequencing

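1. Copy the DNA sequence,

2. randomly cut it into fragments (individual sequencing reads cover up to about 600 bp),

3. insert them into cloning vectors (the HGP used BACs for large inserts, which were then subcloned into short fragments),

4. sequence the fragments, and

5. re-assemble the fragments.

• Genome assembly is like solving a jigsaw puzzle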
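The cut, sequence and re-assemble steps above can be sketched with a toy greedy assembler. This is only an illustration under strong assumptions: error-free fragments, no repeats, and a made-up 21-base "genome". The function names `overlap` and `greedy_assemble` are hypothetical, and real assemblers such as GigAssembler work very differently in detail.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Drop fragments that are fully contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no overlaps left: the remaining pieces are separate contigs
        merged = frags[i] + frags[j][k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (i, j)] + [merged]
    return max(frags, key=len)


# Three overlapping pieces of a made-up sequence, given in shuffled order.
pieces = ["TTAGCCTAGGA", "ATGGCGTACG", "GTACGTTAGC"]
print(greedy_assemble(pieces))
```

With repeated sequence (as in the GTCGTCGTCG example earlier) several merges can look equally good, which is where the jigsaw puzzle becomes genuinely hard.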


The Human Genome

No bioinformatics, no human genome

James Kent

• developed GigAssembler to assemble the public human genome fragments

• awarded the 2003 Benjamin Franklin Award

Gene Myers

• developed whole-genome shotgun sequencing

• awarded the 2004 Max-Planck Prize

Sequencing Methods

DNA Sequencing Methods

• Maxam-Gilbert method

• Sanger method

• Whole genome sequencing strategies

o dye-terminator sequencing

o automated high-throughput analyzer

o shotgun sequencing and chromosome walking

• New DNA sequencing methods

Maxam-Gilbert Method

http://homepages.strath.ac.uk/~dfs99109/BB211/MGSeq.html

• Technically complex

• Requires radioactive labeling

• Extensive use of hazardous chemicals

• Difficult to scale up: it does not use primed DNA synthesis, so it is limited to sequences adjacent to restriction sites

Sanger Method

http://www.scq.ubc.ca/wp-content/uploads/2006/08/sequencing2.gif

• Uses DNA polymerase to synthesize a new strand of DNA

• Requires a DNA primer

• Amplification of ssDNA in the presence of labeled ddNTPs

• Separation according to fragment size by gel electrophoresis

Sanger Method

• Amplicon is a target sequence

• Primer extension is done by DNA polymerase

• Repetition doubles the amount of amplicon at each step

• ddNTPs are incorporated at random positions => the strand cannot be extended any further
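The chain-termination idea can be sketched as a small simulation: every growing strand has some chance of picking up a labeled ddNTP at each position and stopping there, and sorting the stopped strands by length spells out the sequence. This is a simplified sketch, not a model of the real chemistry: the primer is ignored, and the termination probability `p_ddntp`, the molecule count and the template are made-up values.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sanger_fragments(template, n_molecules=5000, p_ddntp=0.1, seed=0):
    """Simulate chain termination; return {fragment length: labeled terminal base}."""
    rng = random.Random(seed)
    # The new strand is the reverse complement of the template.
    new_strand = "".join(COMPLEMENT[b] for b in reversed(template))
    terminal_base_by_length = {}
    for _ in range(n_molecules):
        for length, base in enumerate(new_strand, start=1):
            if rng.random() < p_ddntp:  # a ddNTP was incorporated: the chain stops
                terminal_base_by_length[length] = base
                break
    return terminal_base_by_length


def read_ladder(terminal_base_by_length):
    """Read terminal bases from the shortest to the longest fragment, as on a gel."""
    return "".join(terminal_base_by_length[n] for n in sorted(terminal_base_by_length))


template = "ATGCCGTAAGCT"
ladder = sanger_fragments(template)
print(read_ladder(ladder))  # the newly synthesized strand, read base by base
```

Longer fragments are rarer because a strand has to escape termination at every earlier position, which is one reason read length is limited.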

Sanger Method

Example of early sequencing of a bacteriophage. From F. Sanger, 1977

Sanger Method Extension

• In 1986, L. Hood and L. Smith described a method using base-specific fluorescent sequence tags (fluorescent dyes)

• Automation of DNA sequencing

• Applied Biosystems added capillary-based fragment separation to this step (replacing flat sheet gels)

• 1998 – ~1 Mb could be sequenced in 1 day

Dye-terminator Sequencing

1. DNA Preparation

2. Sequencing Reaction

3. Termination

4. Capillary Electrophoresis

5. Computer Analysis

DNA Preparation

Break open cells to access DNA

Purify DNA from cell debris

http://www.wiley.com/college/pratt/0471393878/student/animations/dna_sequencing/index.html

Sequencing Reaction

Strand Separation

Primer Annealing

Elongation

Termination

Figure labels: primer, DNA polymerase

Termination

Standard nucleotides

Dye-labeled dideoxynucleotides

ddNTP incorporation leads to chain growth termination

Capillary Electrophoresis

Figure labels: laser, photocell, capillary tube

Computer Analysis

Screening for the BRCA1 185delAG mutation associated with breast cancer susceptibility

Mass spectrometry DNA sequencing

• High throughput

• More data per sample => use of mixtures of samples

• High specificity and sensitivity

• In the figure: DNA strands of a heterozygote

Applications of mass spectrometry to DNA sequencing:

• Measurement of allele frequencies in a population and detection of alleles in individuals by identifying SNPs (with sample pooling, to ~3% accuracy)

• Characterisation of individual genotypes (diagnostics and pharmacogenomics)

• Measurement of individual haplotypes (on a single DNA molecule, not on two chromosomes)

• Non-invasive prenatal diagnostics on the small amount of fetal DNA that leaks into maternal blood (even with 95-99% maternal DNA as background, the SRY gene can be detected, showing that the fetus is male)

Whole Genome Sequencing Strategies

• Top-down approach

• Shotgun sequencing

• Chromosome walking

Top-down approach

• DNA library generation

• Gene mapping by genetic markers

• Sequencing reaction

• Electrophoresis

• Analysis

http://www.scq.ubc.ca/?p=392

Shotgun Sequencing

• The "whole-genome shotgun" method involves breaking the genome up into small pieces, sequencing the pieces, and reassembling the pieces into the full genome sequence.

• DNA library generation

• Multiple sequencing events

• Sequence alignment

o jigsaw puzzle analogy

o uncertainty!
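One source of the uncertainty is that random reads do not cover the genome evenly, which is why many sequencing events are needed. The sketch below places reads at random positions on a made-up 10,000-base genome and counts how often each base is covered. The genome length, read length and read count are illustrative assumptions, not figures from any real project.

```python
import random


def simulate_coverage(genome_len, read_len, n_reads, seed=0):
    """Drop n_reads reads of read_len bases at random starts; return per-base depth."""
    rng = random.Random(seed)
    depth = [0] * genome_len
    for _ in range(n_reads):
        start = rng.randint(0, genome_len - read_len)  # both bounds inclusive
        for pos in range(start, start + read_len):
            depth[pos] += 1
    return depth


depth = simulate_coverage(genome_len=10_000, read_len=100, n_reads=500)
mean_depth = sum(depth) / len(depth)         # 500 * 100 / 10,000 = 5x on average
uncovered = sum(1 for d in depth if d == 0)  # bases that no read happened to cover
print(f"mean depth {mean_depth:.1f}x, uncovered bases: {uncovered}")
```

Even at several-fold average coverage some positions may be missed purely by chance, leaving gaps between assembled pieces.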

Shotgun Sequencing

http://www.scq.ubc.ca/?p=392

Chromosome Walking

Fig. 8-24, Lodish et al. (4th edition)

New DNA Sequencing Methods

• Pyrosequencing

• Massively parallel sequencing

• In vitro clonal amplification

• Sequencing by hybridization

• Sequencing by ligation

• Many more…

Pyrosequencing

http://student.ccbcmd.edu/courses/bio141/lecguide/unit6/metabolism/energy/images/atp.gif

http://www.pyrosequencing.com/DynPage.aspx?id=7454

the movie ...

New DNA Sequencing Methods

• Why do we need them?

o To lower the cost of sequencing

o To decrease time for sequencing

o Parallelization of sequencing

o To increase efficiency and accuracy

Next-generation DNA sequencing methods

Solexa Gene Analyzer


DNA sample processing

Data generation

Data analysis

Flow cell imaging by microscopy

Data processing: Solexa screening

Solexa: Sequencing by synthesis (SBS)

Cluster of ~1000 single-stranded DNA fragments

Experimental design

Data analysis

The genome jigsaw puzzle

• 1.5 Gbp of sequence read as 36 bp reads amounts to a jigsaw puzzle

• with 42 million pieces… some of which will not be placed correctly
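The 42 million figure follows directly from dividing the sequence length by the read length. The short calculation below assumes single coverage with non-overlapping reads, which is the minimum; a real run needs more reads than this.

```python
sequence_bp = 1_500_000_000  # 1.5 Gbp of sequence
read_length_bp = 36          # length of one read

# Ceiling division: the last, partially filled read still counts as a piece.
reads_needed = -(-sequence_bp // read_length_bp)

print(f"{reads_needed:,} reads, about {reads_needed / 1e6:.0f} million pieces")
```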

DNA sequencing and resequencing

• The Genome Analyzer quickly and economically produces a large amount of high-quality data

• Possible uses:

– identification and confirmation of SNPs

– identification of chromosomal rearrangements, including copy number variations (CNVs)

– mapping of breakpoints

– identification of rare polymorphisms
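SNP identification by resequencing can be sketched as placing a short read on a reference sequence and reporting the positions where the two differ. This is a toy under stated assumptions: the reference and read are made up, the read is placed by brute-force Hamming distance with no gaps allowed, and the helper names are hypothetical. Production pipelines use indexed aligners and statistical variant callers instead.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_offset(reference, read):
    """Offset in the reference where the read fits with the fewest mismatches."""
    offsets = range(len(reference) - len(read) + 1)
    return min(offsets, key=lambda o: hamming(reference[o:o + len(read)], read))


def call_mismatches(reference, read):
    """Return (position, reference base, read base) for each mismatch of the placed read."""
    offset = best_offset(reference, read)
    return [
        (offset + i, reference[offset + i], base)
        for i, base in enumerate(read)
        if reference[offset + i] != base
    ]


reference = "ATGGCGTACGTTAGCCTAGGA"  # made-up reference sequence
read = "GTACATTAGC"                  # made-up read carrying one substitution
print(call_mismatches(reference, read))  # [(9, 'G', 'A')]
```

A single mismatching read could equally be a sequencing error, so a candidate SNP is normally confirmed by several independent reads covering the same position.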

Advanced features of the Genome Analyzer

• High accuracy

• High efficiency: gigabases of data per run from only 100 ng of DNA, at the lowest price per sequenced base

• Simple processing: one operator processes 1 run in 4 hours

Sequence Statistics

How big is it?

• The human genome contains 3164.7 million chemical nucleotide bases (A, C, T, and G).

• The average gene consists of 3000 bases.

• The largest known human gene codes for dystrophin.

• The total number of genes is estimated at 30,000.

• Functions are unknown for over 50% of discovered genes.

How big is it?

• An analogy to the human genome is that of a book that is:

• Over one billion words long!

• Bound into 5000 volumes of 300 pages each

How big is it really?

• This sequence fits into a cell nucleus the size of a pinpoint
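The book analogy can be checked with a little arithmetic on the figures above. The one assumption that is not from the slides is `bases_per_word = 3`, chosen here only because it makes 3164.7 million bases come out at just over one billion "words".

```python
genome_bases = 3_164_700_000  # 3164.7 million bases
volumes = 5000
pages_per_volume = 300
bases_per_word = 3            # assumption: one "word" is three bases

pages = volumes * pages_per_volume     # 1,500,000 pages in total
bases_per_page = genome_bases / pages  # about 2110 letters per page
words = genome_bases / bases_per_word  # about 1.05 billion words

print(f"{pages:,} pages, {bases_per_page:.0f} bases per page, {words / 1e9:.2f} billion words")
```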

What do you do with this information?

• Find which regions of the genome are actually used by the body.

• These are the so-called “coding regions”

• There are also “exons” and “introns”

• And repeats

How much do we use?

• Less than 2% of the genome codes for proteins.

• Repeated sequences that do not code for proteins ("junk DNA")

o 50% of the human genome.

• Repetitive sequences

o no known direct functions,

o relevant to chromosome structure and dynamics.
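These percentages translate into base counts as follows. The calculation simply combines the round numbers quoted on the slides (genome size, 30,000 genes of 3000 bases on average, under 2% coding, 50% repeats), so the results are order-of-magnitude estimates rather than measurements.

```python
genome_bases = 3_164_700_000
coding_fraction_max = 0.02  # "less than 2%" codes for proteins
repeat_fraction = 0.50      # repeated, non-coding sequence
n_genes = 30_000            # estimated number of genes
avg_gene_bases = 3_000      # average gene length

coding_bases_max = genome_bases * coding_fraction_max  # upper bound, about 63 Mb
repeat_bases = genome_bases * repeat_fraction          # about 1.6 Gb
gene_span = n_genes * avg_gene_bases                   # 90 Mb
gene_span_fraction = gene_span / genome_bases          # about 2.8% of the genome

print(f"coding: < {coding_bases_max / 1e6:.0f} Mb")
print(f"repeats: about {repeat_bases / 1e9:.2f} Gb")
print(f"genes x average length: {gene_span / 1e6:.0f} Mb ({gene_span_fraction:.1%})")
```

The two gene-related estimates land in the same range of a few percent, so the large majority of the genome lies outside protein-coding sequence.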

What is next?

http://www.youtube.com/watch?v=XuUpnAz5y1g

http://www.youtube.com/watch?v=gkQJ26DAxfs

Ethical

http://www.youtube.com/watch?v=QorIzoDgIPY

http://www.youtube.com/watch?v=PXeCDnfh0GA

What can the HGP be useful for?

• epigenetics

• genetic regulation

• investigating genetic diseases

• personalized medicine

• genetic engineering

• genomes as personal ID -> forensics

• genomics, proteomics, interactomics, other-omics ...

Other genomes – comparative genomics

• Many more genomes were sequenced:

o Human

o Mouse (M. musculus)

o Chimp (P. troglodytes)

o human pathogens

o many more ...

               Completed   Ongoing
Prokaryotic    554         1380
Eukaryotic     76          878
Archaea        49          57

Epigenetics: Chromatin structure and modifications

• The structure of chromatin (closed – heterochromatin, or open – euchromatin) can influence the expression of genes

• The histone modification code

• Regulation of gene expression

http://www.youtube.com/watch?v=lUESmHDrN40

Epigenetics: Histone code

• Chromatin is activated or switched off by a combination of histone tail modifications

Epigenetics: DNA methylation

• CpG methylation of human DNA

o 5mC as an additional letter of the genetic code

o control of gene expression and chromatin structure

o part of the epigenetic regulation

5-methylcytosine
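To make the "additional letter" idea concrete, the sketch below finds CpG sites (a C immediately followed by a G) in a sequence and rewrites selected cytosines as a fifth symbol. The letter `M` for 5mC, the helper names, the example sequence and the choice of methylated positions are all illustrative assumptions.

```python
def cpg_sites(seq):
    """Return the 0-based positions of each C that is directly followed by a G."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def mark_methylated(seq, methylated_positions):
    """Rewrite the cytosines at the given positions as 'M' (hypothetical 5mC letter)."""
    chars = list(seq.upper())
    for i in methylated_positions:
        if chars[i] != "C":
            raise ValueError(f"position {i} is {chars[i]!r}, not a cytosine")
        chars[i] = "M"
    return "".join(chars)


seq = "ACGTCGCGA"                     # made-up sequence
sites = cpg_sites(seq)
print(sites)                          # [1, 4, 6]
print(mark_methylated(seq, [1, 6]))   # AMGTCGMGA
```

The same four-letter sequence can therefore carry different methylation patterns, which is what makes methylation part of epigenetic rather than genetic information.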

OMIM database

• Online Mendelian Inheritance in Man

o Catalogues human genetic diseases

• A step towards personalized medicine

• So far there are ~18250 known genetic disorders

DNA forensics

• DNA can be used as a personal ID

o Used in forensics

o Paternity test

o Detection of potential biohazards

o Identification of crime victims

Besides genomics – other “omics”

• There are other interesting projects similar to HGP

o Other “omics”

• proteomics

• interactomics

• transcriptomics

• more ...

http://omics.org/

Genetic engineering

• One day we will be able to get rid of genetic diseases simply by exchanging the bad part of the DNA with a good one

o Project in UK

o Ethical issues

o Tissue engineering


Costs and Ethics

http://campus.queens.edu/faculty/jannr/Genetics/images/dnatech/machines4sequencing.jpg

• HGP – $2.7 billion

o US – DOE and NIH

o UK, Canada, Japan, Sweden, Germany and others

• Sequencing – $500 million and 5 years

• Nowadays – $1 million and 100 days

• ~3% budget for ethical, social issues
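The figures above can be put side by side with a few lines of arithmetic. Everything here comes from the slides except the per-base cost, which additionally assumes one pass over the 3164.7 million bases quoted earlier; "nowadays" refers to the time of the lecture.

```python
hgp_total_usd = 2_700_000_000
ethics_fraction = 0.03                            # ~3% for ethical and social issues
then_cost_usd, then_days = 500_000_000, 5 * 365   # $500 million and 5 years
now_cost_usd, now_days = 1_000_000, 100           # $1 million and 100 days
genome_bases = 3_164_700_000                      # assumption: one pass over the genome

ethics_budget = hgp_total_usd * ethics_fraction   # about $81 million
cost_ratio = then_cost_usd / now_cost_usd         # 500 times cheaper
time_ratio = then_days / now_days                 # about 18 times faster
then_per_base = then_cost_usd / genome_bases      # about $0.16 per base
now_per_base = now_cost_usd / genome_bases        # about $0.0003 per base

print(f"ethics budget: ${ethics_budget / 1e6:.0f} million")
print(f"{cost_ratio:.0f}x cheaper, {time_ratio:.1f}x faster")
print(f"per base: ${then_per_base:.3f} then, ${now_per_base:.5f} now")
```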


How much?

Ethical issues

• Amount of money for the project

• Discrimination

o Insurance companies

o Employers

• Newborn genetic screening

• Genetic engineering (playing God)

Huge price

• Why HGP?

• Why waste money on non-coding DNA?

• Big science vs. small science

Playing God

• Genetic engineering

• Therapeutic vs. enhancement

• Non-heritable vs. heritable


Databases

Databases store biological data in various forms

➨ Sequences

o DNA, RNA (nucleic acids)

o Proteins

➨ Structural data

o X-ray crystallography data

➨ Expression data

o Transcription of all genes

➨ Interactions data

o Protein-protein interactions

o Receptor binding data

o Substrate binding data

➨ Metabolic pathways data

➨ Literature database: PubMed


The Explosion of Data

With annotations, this adds up to about 300 terabytes


Disease Fighting

Escherichia coli is a bacterium that lives in the human intestines

Genome of K12

Genome of O157:H7

Strain K12 not pathogenic

Strain O157:H7 pathogenic
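One way to look for what separates a pathogenic strain from a harmless one is to compare the two genomes and list the sequence found in only one of them. The sketch below does this with sets of short subsequences (k-mers) on two made-up toy strings. The strings are not real E. coli sequence, and `kmers` and `strain_specific_kmers` are hypothetical helper names.

```python
def kmers(seq, k):
    """Set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def strain_specific_kmers(genome_a, genome_b, k=4):
    """Return (k-mers only in genome_a, k-mers only in genome_b)."""
    ka, kb = kmers(genome_a, k), kmers(genome_b, k)
    return ka - kb, kb - ka


harmless_like = "ATGGCGTACGTTAGC"          # toy stand-in for a non-pathogenic strain
pathogenic_like = "ATGGCGTACGGGATCCTTAGC"  # same sequence with an inserted stretch

only_harmless, only_pathogenic = strain_specific_kmers(harmless_like, pathogenic_like)
print(sorted(only_harmless))    # k-mers spanning the point where the insertion sits
print(sorted(only_pathogenic))  # k-mers covering the inserted stretch
```

On real genomes the strain-specific regions would then be examined for genes that could explain the difference in pathogenicity.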


Drug Discovery

Protein-protein docking

Protein-ligand docking

Given two biological molecules, determine whether they interact.

I.e., do the molecules fit together in any energetically favorable way?
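"Energetically favorable" can be illustrated with a toy scoring function: treat each molecule as a set of points, add up a Lennard-Jones pair energy over all point pairs, and try a few rigid shifts of the ligand to see which placement gives the lowest energy. This is a deliberately crude sketch. Real docking programs use far richer scoring and search, and the parameters `epsilon` and `sigma`, the coordinates and the candidate shifts are made-up values.

```python
import math


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Pair energy: strongly repulsive at short range, weakly attractive further out."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def interaction_energy(mol_a, mol_b):
    """Sum of pair energies between every point of mol_a and every point of mol_b."""
    return sum(lennard_jones(math.dist(a, b)) for a in mol_a for b in mol_b)


def best_pose(receptor, ligand, shifts):
    """Try each rigid translation of the ligand; return (lowest energy, shift)."""
    best = None
    for shift in shifts:
        moved = [tuple(c + s for c, s in zip(atom, shift)) for atom in ligand]
        energy = interaction_energy(receptor, moved)
        if best is None or energy < best[0]:
            best = (energy, shift)
    return best


receptor = [(0.0, 0.0, 0.0)]
ligand = [(0.0, 0.0, 0.0)]
shifts = [(0.8, 0.0, 0.0), (1.12, 0.0, 0.0), (3.0, 0.0, 0.0)]  # too close, near optimum, far
energy, shift = best_pose(receptor, ligand, shifts)
print(shift, round(energy, 3))
```

A negative total energy at the best placement suggests the two shapes can fit together; a shift that puts two points at the same location must be avoided, since the pair energy is undefined at zero distance.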