From the Human Genome Project to today: the evolution of sequencing techniques, genomic and proteomic analysis, and future prospects!
David Horner, Dipartimento di Bioscienze, Università degli Studi di Milano


Post on 25-Sep-2020


Page 1

From the Human Genome Project to today: the evolution of sequencing techniques, genomic and proteomic analysis, and future prospects!

David Horner, Dipartimento di Bioscienze, Università degli Studi di Milano

Page 2

How is DNA sequenced?

• Sanger sequencing (1978 – today):
  – Relatively high costs
  – Sample preparation takes a long time
  – Produces few, LONG reads (1000 nt)
  – Few sequencing errors

Page 3

Sanger sequencing (1978)

Page 4

Sanger sequencing (1978)

Page 5

Genome

1) Fragment the genome “randomly” and clone the fragments into plasmids

2) Sequence one fragment (chosen at random)

3) Find an overlapping clone, sequence it, and build a longer fragment

4) Go back to step 2 (until the end!)
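
The loop in steps 2 to 4 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not on the slide: clones are plain strings, the walk starts from the left-most clone and extends only to the right, and an overlap is an exact suffix/prefix match of at least `min_len` bases. The names `overlap` and `walk` and the toy clones are hypothetical.

```python
def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`.

    Returns 0 when the best exact match is shorter than `min_len`.
    """
    for n in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def walk(clones: list[str], min_len: int = 3) -> str:
    """Extend a growing sequence one overlapping clone at a time."""
    remaining = list(clones)
    contig = remaining.pop(0)          # step 2: sequence a first fragment
    while remaining:                   # step 4: repeat until nothing is left
        # step 3: find the clone that overlaps the current end best
        best = max(remaining, key=lambda clone: overlap(contig, clone, min_len))
        n = overlap(contig, best, min_len)
        if n == 0:
            break                      # no overlapping clone: the walk stops here
        contig += best[n:]             # build a longer fragment
        remaining.remove(best)
    return contig


# Toy clones (hypothetical); the first one is assumed to be the left-most.
clones = ["ACGATTACA", "ATAGGTTCCGA", "TTACAATAGG"]
print(walk(clones))
```

A real walk would also extend to the left and tolerate sequencing errors in the overlaps; both are left out to keep the loop visible.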

Page 6

Genomes: how big are they?

[Figure: genome size ranges on a logarithmic axis running from 10^4 to 10^11, for viruses, plasmids, bacteria, fungi, plants, algae, insects, mollusks, bony fish, amphibians, reptiles, birds and mammals]

Page 7

Page 8

Sanger sequencing (1990s)

96 reactions in parallel, 1000 nt per reaction

Page 9

Robot!

Page 10

1981 • Sinclair ZX-81

Page 11

Computer

Page 12

Whole Genome Shotgun Approach

Page 13

Assembly by overlap

Page 14

Repeated sequences

Unique sequences

Repeated sequences

If the repeated sequences are shorter than the sequencing reads, there is no problem

A   B   C

Page 15

A   B   C

Repeated sequences

If they are longer than the reads, WE CANNOT ASSEMBLE!

A   B   C ?

A   C   B ?
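
A small Python sketch makes the ambiguity concrete. It assumes a toy layout that is not taken from the slide: unique segments separated by identical copies of one 8-base repeat, with a fourth flanking segment `D` added so that both orders share exactly the same repeat junctions. All sequences and the helper name `reads_of_length` are hypothetical. The sketch compares the complete set of error-free reads of a given length that each arrangement would produce.

```python
from collections import Counter


def reads_of_length(genome: str, k: int) -> Counter:
    """Multiset of every error-free read of length k that the genome can yield."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


REPEAT = "T" * 8                                   # one 8-base repeat, three copies
A, B, C, D = "CCGA", "CAGAC", "GACAG", "ACGG"      # unique segments (toy data)

genome_abc = A + REPEAT + B + REPEAT + C + REPEAT + D
genome_acb = A + REPEAT + C + REPEAT + B + REPEAT + D

for k in (6, 9, 10, 14):
    same = reads_of_length(genome_abc, k) == reads_of_length(genome_acb, k)
    print(k, "indistinguishable" if same else "distinguishable")
```

Reads that cannot span a whole repeat plus one base on each side (here, reads of 9 bases or fewer) are identical for the two orders, so no assembler could choose between them; reads of 10 bases or more cross a repeat and resolve the order.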

Page 16

Steps to Assemble a Genome

1. Find overlapping reads

2. Merge some “good” pairs of reads into longer contigs

3. Link contigs to form supercontigs

4. Derive consensus sequence ..ACGATTACAATAGGTT..

Some Terminology

read: a word 500-900 nucleotides long that comes out of the sequencer

mate pair: a pair of reads from the two ends of the same insert fragment

contig: a contiguous sequence formed by several overlapping reads with no gaps

supercontig (scaffold): an ordered and oriented set of contigs, usually linked by mate pairs

consensus: the sequence derived from the multiple sequence alignment of the reads in a contig
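
Steps 1 and 2 can be illustrated with a greedy merge, sketched below in Python. It is a simplified stand-in, not the method of any particular assembler: reads are assumed error-free, all on the same strand, and overlaps are exact suffix/prefix matches of at least `min_len` bases. Scaffolding with mate pairs (step 3) and consensus calling from a multiple alignment (step 4) are not covered. The function names and the three toy reads, cut from the example sequence on the slide, are hypothetical.

```python
from itertools import permutations


def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Longest suffix of `left` matching a prefix of `right` (0 if below min_len)."""
    for n in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))      # drop exact duplicates, keep order
    while True:
        best_len, best_pair = 0, None
        for left, right in permutations(contigs, 2):   # step 1: find overlaps
            n = overlap(left, right, min_len)
            if n > best_len:
                best_len, best_pair = n, (left, right)
        if best_pair is None:
            return contigs                    # nothing overlaps any more
        left, right = best_pair               # step 2: merge a "good" pair
        contigs.remove(left)
        contigs.remove(right)
        contigs.append(left + right[best_len:])


reads = ["CAATAGGTT", "ACGATTAC", "ATTACAATAG"]
print(greedy_assemble(reads))
```

When a repeat is longer than the reads, the longest overlap can join the wrong pair, which is why greedy merging alone fails on repeat-rich genomes.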

Page 17

Contigs and scaffolds

Page 18

Shotgun Sequencing

1. Genome fragmentation

2. Library

3. Sequences

4. Genome assembly by overlap

Page 19

Timeline

Page 20

Meet Your Genome

(The wheat genome (16.9 Gbp) is more than 5 times bigger than the human genome, and 80% of it consists of repetitive sequences)

The Human Genome

Page 21

How COMPLEX is the genome?

(The WHEAT genome (16.9 Gbp) is more than 5 TIMES larger than the human one. 80% of it consists of repeated elements)

The human genome: c. 3 Gb

Page 22

Physical Mapping

Page 23

Top down sequencing

1. Genome fragmentation

2. Physical map

3. Subclone library

4. Sequence clones by walking or by SHOTGUN strategy

Page 24

Human Genome Project 16/02/2001

Page 25

OK, we have sequenced the genome ... Now what?

Where are the genes? Sequence cDNA (mRNA) and align it to the genome

Page 26

But which genes/alleles are responsible for the phenotypes of interest?

We have to compare the genomes of many different individuals and use statistics to understand complex phenotypes ... That is, we have to sequence MANY individuals of the same species and associate phenotypes with genotypes: Genome Wide Association Studies (GWAS)

Page 27

“GWAS” + “Human” in the literature:
Before 2004 (60 articles)
From 2004 onwards (>14000 articles)
More than 10000 human genomes have been sequenced since 2004.

How was this done?

Page 28

Revolutionary techniques in molecular genetics

Molecular cloning
Sanger sequencing
PCR
Gel Electrophoresis
Blotting (Southern/Northern/Western etc)
Expression cloning
(microarrays)

Next Generation Sequencing

Page 29

Next Generation Sequencing

• (Massively Parallel / Second Generation)
• HIGH throughput (lots of data)
• Relatively low cost
• Applicable across a broad range of fields

Page 30

Read Length is Not As Important For Resequencing

[Figure: % of paired k-mers with uniquely assignable location (y-axis, 0% to 100%) versus length of k-mer reads in bp (x-axis, 8 to 20), with one curve for E. coli and one for human]
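
The quantity on the y-axis can be approximated for any sequence with a few lines of Python. The sketch below is a simplification of what the figure plots: it counts single k-mers rather than paired ones, and the usage example runs on a uniformly random sequence generated in place, which is an assumed stand-in and not real E. coli or human data. The function name `unique_fraction` is hypothetical.

```python
import random
from collections import Counter


def unique_fraction(genome: str, k: int) -> float:
    """Fraction of k-mer positions whose k-mer occurs exactly once in the genome."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    total = sum(counts.values())
    unique = sum(1 for n in counts.values() if n == 1)
    return unique / total


# Stand-in "genome": 20,000 random bases (an assumption, not real data).
random.seed(0)
toy_genome = "".join(random.choice("ACGT") for _ in range(20_000))

for k in range(4, 21, 2):
    print(f"k={k:2d}  unique: {100 * unique_fraction(toy_genome, k):5.1f}%")
```

On a random sequence the fraction climbs quickly once 4^k exceeds the genome length; real genomes contain repeats, so their curves rise more slowly, and a larger genome needs a longer k to reach the same level.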

Page 31

Cost per megabase of DNA sequence

Page 32

Next-Generation Sequencing

Illumina / Solexa Genetic Analyzer HiSeq 2000 (150x2 bp, 600 Gb / run)

Applied Biosystems SOLiD 4 System™ (100x2 bp, 400 Gb / run)

Roche / 454 Genome Sequencer FLX Titanium (800 bp, 800 Mb / run)

Ion Proton

PacBio

A number of platforms using different strategies and chemistries, and with different throughput, are entering the market.

Page 33

When has a genome been fully sequenced?

Fold coverage    % sequenced
0.25             22
0.5              39
0.75             53
1                63
2                87.5
3                95
4                98.2
5                99.4
6                99.75
7                99.91
8                99.97
9                99.99
10               99.995
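
The slide does not say where these percentages come from. They are close to what a simple Poisson model predicts, in which reads land uniformly at random and a base is missed with probability e^(-c) at fold coverage c. The Python sketch below computes that prediction; the model and the function name `fraction_sequenced` are assumptions, not something stated in the deck.

```python
import math


def fraction_sequenced(fold_coverage: float) -> float:
    """Expected fraction of bases covered at least once under a Poisson model."""
    return 1.0 - math.exp(-fold_coverage)


for c in (0.25, 0.5, 0.75, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10):
    print(f"{c:>5}x  {100 * fraction_sequenced(c):8.3f}%")
```

The model agrees with the table to within about one percentage point; the largest gap is at 2x, where the formula gives about 86.5%. Real coverage is less even than the model assumes, so these figures are best read as an upper bound.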

Page 34

Illumina

• Bridge PCR

• Sequencing by synthesis using fluorescent reversible terminators

Page 35

Technology Overview: Solexa/Illumina Sequencing

http://www.illumina.com/

Page 36

Immobilize DNA to Surface

Source: www.illumina.com

Page 37

Technology Overview: Solexa Sequencing

Page 38

Bridge PCR

• DNA fragments are flanked with adaptors.
• A flat surface is coated with two types of primers, corresponding to the adaptors.
• Amplification proceeds in cycles, with one end of each bridge tethered to the surface.
• Used by Solexa.

Page 39

Sequence Colonies

The bases are “reversible terminators”: only one base can be added per cycle. They are then modified so that the next round of extension can occur.

Page 40

Page 41

Sequence Colonies

Each base carries a different fluorophore (color). The fluorophore is excited by a laser and the color is read.

Page 42

Illumina sequencers: sequencing-by-synthesis coupled with bridge amplification

Available versions:
§ HiSeq 2000 (up to 600 Gb, 250x2 bp reads)
§ HiSeq 1000 (up to 300 Gb, 250x2 bp reads)
§ Genome Analyzer (up to 95 Gb, 150x2 bp reads)
§ MiSeq platform (up to 6 Gb, 250x2 bp reads)

Page 43

Since 2008

Page 44

Page 45

SNP calling
• The basic principle is simple!
• This looks like a homozygous SNP

ACTTTTGCCCTGTGTCTAAAATGCGTCGTAGCATGT  reference
ACTTTTGCCCTGTGACTAAAATG               read1
    TTGCCCTGTGACTAAAATGCGT            read2
     TGCCCTGTGACTAAAATGCGTA           read3
      GCCCTGTGACTAAAATGCGTAG          read4
      GCCCTGTGACTAAAATGCGTAG          read5
        CCTGTGACTAAAATGCGTAGTAG       read6

Page 46

SNP calling
• And this one looks heterozygous

ACTTTTGCCCTGTGTCTAAAATGCGTCGTAGCATGT  reference
ACTTTTGCCCTGTGACTAAAATG               read1
    TTGCCCTGTGTCTAAAATGCGT            read2
     TGCCCTGTGACTAAAATGCGTA           read3
      GCCCTGTGTCTAAAATGCGTAG          read4
      GCCCTGTGACTAAAATGCGTAG          read5
        CCTGTGTCTAAAATGCGTAGTAG       read6
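
The “simple principle” is to stack the aligned reads and count the bases in one column. The Python sketch below does this for the two examples above. Several things in it are assumptions rather than part of the slides: the 0-based read offsets were inferred by lining each read up against the reference, only the column at position 14 (where some reads carry A against the reference T) is genotyped, and the 20% allele threshold is arbitrary. Real variant callers also weigh base and mapping qualities. The names `pileup` and `call_genotype` are hypothetical.

```python
from collections import Counter

REFERENCE = "ACTTTTGCCCTGTGTCTAAAATGCGTCGTAGCATGT"
SNP_POS = 14  # 0-based column examined in this sketch

# (offset on the reference, read sequence); offsets are inferred, not given.
HOMOZYGOUS_READS = [
    (0, "ACTTTTGCCCTGTGACTAAAATG"), (4, "TTGCCCTGTGACTAAAATGCGT"),
    (5, "TGCCCTGTGACTAAAATGCGTA"), (6, "GCCCTGTGACTAAAATGCGTAG"),
    (6, "GCCCTGTGACTAAAATGCGTAG"), (8, "CCTGTGACTAAAATGCGTAGTAG"),
]
HETEROZYGOUS_READS = [
    (0, "ACTTTTGCCCTGTGACTAAAATG"), (4, "TTGCCCTGTGTCTAAAATGCGT"),
    (5, "TGCCCTGTGACTAAAATGCGTA"), (6, "GCCCTGTGTCTAAAATGCGTAG"),
    (6, "GCCCTGTGACTAAAATGCGTAG"), (8, "CCTGTGTCTAAAATGCGTAGTAG"),
]


def pileup(reads, pos: int) -> Counter:
    """Count the bases that the reads place at reference position `pos`."""
    return Counter(seq[pos - off] for off, seq in reads
                   if off <= pos < off + len(seq))


def call_genotype(reference: str, reads, pos: int, min_fraction: float = 0.2) -> str:
    """Naive genotype call from allele fractions in a single column."""
    counts = pileup(reads, pos)
    depth = sum(counts.values())
    if depth == 0:
        return "no coverage"
    # Keep alleles supported by at least `min_fraction` of the reads.
    alleles = sorted(b for b, n in counts.items() if n / depth >= min_fraction)
    ref_base = reference[pos]
    if alleles == [ref_base]:
        return "homozygous reference"
    if len(alleles) == 1:
        return f"homozygous SNP {ref_base}>{alleles[0]}"
    return "heterozygous " + "/".join(alleles)


print(call_genotype(REFERENCE, HOMOZYGOUS_READS, SNP_POS))
print(call_genotype(REFERENCE, HETEROZYGOUS_READS, SNP_POS))
```

With only six reads, a single sequencing error could change the call, which is one reason deeper coverage is needed for reliable genotyping.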

Page 47

On average, we expect to find a SNP (Single Nucleotide Polymorphism) between 2 human individuals about every 2000 bases.

99.95% identity

maybe 1,500,000 differences!
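
The arithmetic behind these figures is short enough to write out. The Python sketch below assumes a genome of about 3 Gb (the size quoted on an earlier slide) and exactly one SNP per 2000 bases; the constant names are hypothetical.

```python
GENOME_SIZE = 3_000_000_000   # bases; approximate human genome size (assumed)
SNP_SPACING = 2_000           # one SNP about every 2000 bases

expected_differences = GENOME_SIZE // SNP_SPACING
identity = 1 - 1 / SNP_SPACING

print(f"expected SNP differences: {expected_differences:,}")
print(f"per-base identity: {100 * identity:.2f}%")
```

This counts single-nucleotide differences only; structural variants are not included.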

Page 48

Structural Variation (SV)
• Any DNA sequence alteration other than a single nucleotide substitution
• copy number variations (CNV)
• transposon movement
• expansion of trinucleotide and other simple repeats
• insertions-deletions (indels)
• translocations
• inversions
• the vast majority of SV events are small indels

• Human genomes differ more as a consequence of structural variation than of single-base-pair differences*
  – causal events in hereditary diseases
  – somatic SV
  – markers for GWAS / mapping studies

Page 49

Copy Number Variation (CNVs)

So... how representative is the reference genome?

Page 50

Page 51

Page 52

Applications of NGS platforms

• DNA sequencing
  - genome resequencing (SNPs, CNV, GWAS)
  - de novo sequencing
  - identification of genome structural variants (cancer genome)
  - 3D chromatin interactions
  - epigenomics (chromatin state and genome methylation)
  - metagenomics (taxonomic analysis of environmental samples)

• RNA sequencing
  - qualitative and quantitative analysis of the transcriptome
  - identification and characterization of miRNAs and other ncRNAs
  - RNA editing
  - metatranscriptomics (functional analysis of environmental samples)