Introduction to NGS - Ana Conesa - Massive sequencing data analysis workshop - Granada 2011
Introduction to NGS
Ana Conesa
Head of Genomics of Gene Expression Lab
Centro de Investigaciones Príncipe Felipe
http://bioinfo.cipf.es/aconesa
Next Generation Sequencing
NGS has not only brought high speed to genome sequencing and personal medicine, but has also changed the way we do genome research:
Got a question on genome organization?
SEQUENCE IT!!!!
NGS technologies
Cost-effective
Fast
Ultra throughput
Cloning-free
Short reads
Roche 454 pyrosequencing
Roche 454
GS Junior, benchtop
Solexa
Solexa-HiSeq
200 Gb/run in 8 days
2x100 bp fragments
2 billion reads per run
Helicos
SOLiD
* Sequencing output in “color space”
* Needs reference genome to translate to base space.
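A minimal sketch of what "color space" means in practice. It assumes the commonly described SOLiD two-base encoding, in which each color (0-3) encodes the transition between two adjacent bases; that mapping is not given on the slides. The function names are hypothetical. The second call shows why a reference helps: a single wrong color changes every base decoded after it.

```python
BASES = "ACGT"  # assumed 2-bit codes: A=0, C=1, G=2, T=3


def encode_color_space(first_base, sequence):
    """Encode a base sequence as colors, starting from a known leading base."""
    previous = BASES.index(first_base)
    colors = []
    for base in sequence:
        current = BASES.index(base)
        colors.append(str(previous ^ current))  # color = XOR of adjacent base codes
        previous = current
    return "".join(colors)


def decode_color_space(first_base, colors):
    """Decode a color string back to bases, given the known leading base."""
    code = BASES.index(first_base)
    decoded = []
    for color in colors:
        code ^= int(color)  # each color tells us how to move from the previous base
        decoded.append(BASES[code])
    return "".join(decoded)


print(decode_color_space("T", "0123"))  # TGAT
print(decode_color_space("T", "0023"))  # TTCG: one color error alters all later bases
```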
SOLiD 5500
* Fifth 3-based encoded primer
* Sequencing output in base space
* No reference needed
5500 xl-u SOLiD
180 Gb/run (microbeads)
300 Gb/run (nanobeads)
35-75 bp fragments
2.8 - 4.8 billion reads/run
2x6 lanes/run
96 bar-codes
99.99% accuracy
Pacific Biosciences
Real time DNA synthesis
Up to 12000 nt??
50 bases/second??
Ion Torrent
• $50,000
• $500/sample
• 1 hour/run
• > 200 nt lengths
• Reads H+ released by DNA polymerase
Comparison
Roche 454
• Long fragments
• Errors: poly nts
• Low throughput
• Expensive
• De novo sequencing: Amplicon sequencing
Solexa
• Short fragments
• Errors: hexamer bias
• High throughput
• Cheap
• Resequencing: ChipSeq, RNASeq, MethylSeq
SOLiD
• Short fragments
• Color-space
• High throughput
• Cheap
• Resequencing: ChipSeq, RNASeq, MethylSeq
Applications
De novo sequencing
Resequencing
Exome Sequencing
RNA-seq
Genome annotation
Chip-seq
Methyl-seq
…
Basic steps NGS data processing
QC and read cleaning
Mapping
Feature identification
SNVs
Indels
Rearrangements
RPKM
Splicing
DNA binding site
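As an illustration of the RPKM feature mentioned in this step, here is a small sketch using the usual definition (reads per kilobase of transcript per million mapped reads). The function name and the example numbers are made up for illustration and are not taken from the slides.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = 10^9 * C / (N * L), with C reads mapped to the gene,
    N total mapped reads in the library and L the gene length in bp.
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on a 2 kb transcript, library of 25 million mapped reads.
print(rpkm(500, 2000, 25_000_000))  # 10.0
```

Dividing by gene length and library size is what makes values comparable between long and short genes and between lanes of different depth.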
RNA-seq
Elucidate gene models
Quantify gene expression
RNA-seq protocol*
total RNA purification
mRNA preparation (oligodT / RiboZ)
fragmentation
1st strand synthesis
2nd strand synthesis
RNA → DNA
*Solexa Paired-End
RNA-seq protocol (II)
adenylation 3’ ends
ligate adapters
amplification
SEQUENCING!
library
[Gel figure: 100 bp ladder; 400-200 bp]
Strand-specific RNA-seq
File formats
fastq: sequence data and qualities
SAM/BAM: mapping data and qualities
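To make "sequence data and qualities" concrete, the sketch below reads four-line fastq records and converts the quality string to Phred scores. The quality offset is an assumption: 33 is used here as the default, while some Solexa/Illumina pipelines of this period used 64, so it is left as a parameter. The helper names and the example record are hypothetical.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a four-line-per-record fastq file."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (this simple sketch also stops at a blank line)
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality


def phred_scores(quality_string, offset=33):
    """Convert ASCII-encoded qualities to integer Phred scores (offset is an assumption)."""
    return [ord(char) - offset for char in quality_string]


def error_probability(phred):
    """Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-phred / 10)


example = StringIO("@read1\nACGT\n+\nII5!\n")
for name, sequence, quality in read_fastq(example):
    print(name, sequence, phred_scores(quality))  # read1 ACGT [40, 40, 20, 0]
```

A Phred score of 40 corresponds to one expected error in 10,000 base calls, that is, 99.99% per-base accuracy.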
Some Figures
1 Solexa run = 8 lanes
25 M reads/lane = 2 x 4 GB fastq/lane (PE)
32 GB disk space
Mapping on a 12-core processor (48 GB RAM, 4 TB disk): 24 hours
SAM (ASCII) / BAM (binary) output: 36 GB / 9 GB
How much does it “cost” (computationally) to sequence a human transcriptome?
One human transcriptome: 100 Million reads
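One way to read the figures above so that they agree with each other: a 100-million-read transcriptome at 25 M reads/lane needs 4 lanes, and 4 lanes at 2 x 4 GB of fastq each gives the 32 GB quoted. This interpretation is an assumption, as are the constant and function names in the sketch.

```python
import math

READS_PER_LANE = 25_000_000   # from the slide: 25 M reads/lane
FASTQ_GB_PER_LANE = 2 * 4     # from the slide: 2 x 4 GB fastq per lane (paired-end)


def lanes_needed(total_reads, reads_per_lane=READS_PER_LANE):
    """Number of whole lanes required to reach the target read count."""
    return math.ceil(total_reads / reads_per_lane)


def fastq_disk_gb(total_reads):
    """Approximate fastq disk space for the target read count."""
    return lanes_needed(total_reads) * FASTQ_GB_PER_LANE


transcriptome_reads = 100_000_000
print(lanes_needed(transcriptome_reads))   # 4
print(fastq_disk_gb(transcriptome_reads))  # 32
```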
Applications of RNAseq
Qualitative:
* Alternative splicing
* Antisense expression
* Extragenic expression
* Alternative 5’ and 3’ usage
* Detection of fusion transcripts
….
Tools: Tophat/Cufflinks, Scripture, Alexa
Quantitative:
* Differential expression
* Dynamic range of gene expression
….
Tools: edgeR, DESeq, baySeq, NOISeq
Advantages of RNAseq?
RNAseq:
* Non-targeted transcript detection
* No need for a reference genome
* Strand specificity
* Finds novel splicing sites
* Larger dynamic range
* Detects expression and SNVs
* Detects rare transcripts
….
Microarrays:
* Restricted to probes on array
* Needs genome knowledge
* Normally not strand specific
* Exon arrays difficult to use
* Smaller dynamic range
* Does not provide sequence info
* Rare transcripts difficult to detect
….
and… are there any disadvantages?
Resequencing
Exome Sequencing
DNA (patient)
Gene A, Gene B
1. Produce shotgun library
2. Capture exon sequences
3. Wash & Sequence
4. Map against reference genome
5. Determine variants, filter, compare patients
candidate genes
Exome capture
The principle: comparison of patients
Patient 1
Patient 2
Patient 3
Patient 4
Patient 5
Patient 6
mutation
candidate gene (mutation shared by all patients)
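The comparison principle can be sketched as a set intersection: keep only the genes that carry a filtered variant in every patient. The data layout, gene names, and function name below are hypothetical, and real analyses apply further filters (known polymorphisms, predicted effect, inheritance model) that are not shown.

```python
def candidate_genes(variants_by_patient):
    """Return the genes that carry at least one variant in every patient.

    variants_by_patient maps a patient id to a list of (gene, variant) tuples.
    """
    gene_sets = [
        {gene for gene, _variant in variants}
        for variants in variants_by_patient.values()
    ]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


# Hypothetical filtered variant calls for three patients.
calls = {
    "patient1": [("GENE_A", "c.10A>G"), ("GENE_B", "c.55C>T")],
    "patient2": [("GENE_B", "c.55C>T"), ("GENE_C", "c.7delT")],
    "patient3": [("GENE_B", "c.91G>A"), ("GENE_A", "c.10A>G")],
}
print(sorted(candidate_genes(calls)))  # ['GENE_B']
```

Intersecting on gene rather than on the exact variant lets different patients carry different mutations in the same gene.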
ChipSeq
MethylSeq
MIDseq
Census NGS methods
Successful Stories
Miller syndrome
Species composition of metagenomic DNA extracted from mammoth hair.
Conclusions
NGS is revolutionizing how we do genome research
But it will also revolutionize our lives….
If we manage to process and analyze the data
YOUR SUCCESSFUL STORY???
Have a great MDA course!