introduction to ngs analyses - forth-imbb · introduction to ngs analyses giorgio l papadopoulos...
TRANSCRIPT
Introduction to NGS analyses
Giorgio L Papadopoulos
Institute of Molecular Biology and Biotechnology
Bioinformatics Support Group
04/12/2015
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 1 / 28
Overview
1 IntroductionDNA sequencingApplications
2 Primary AnalysisChIPseqRNAseq
3 InterpretationVisualizationAnalysis SoftwareDownstreamAnalysis
4 Online ResourcesENCODEGEOENA
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 2 / 28
Overview
1 IntroductionDNA sequencingApplications
2 Primary AnalysisChIPseqRNAseq
3 InterpretationVisualizationAnalysis SoftwareDownstreamAnalysis
4 Online ResourcesENCODEGEOENA
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 2 / 28
Overview
1 IntroductionDNA sequencingApplications
2 Primary AnalysisChIPseqRNAseq
3 InterpretationVisualizationAnalysis SoftwareDownstreamAnalysis
4 Online ResourcesENCODEGEOENA
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 2 / 28
Overview
1 IntroductionDNA sequencingApplications
2 Primary AnalysisChIPseqRNAseq
3 InterpretationVisualizationAnalysis SoftwareDownstreamAnalysis
4 Online ResourcesENCODEGEOENA
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 2 / 28
Introduction DNA sequencing
Fred SangerLate 1970s
... Several million times ...Late 2000s
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 3 / 28
Introduction DNA sequencing
Fred SangerLate 1970s
... Several million times ...Late 2000s
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 3 / 28
Introduction DNA sequencing
Fred SangerLate 1970s
... Several million times ...Late 2000s
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 3 / 28
Introduction DNA sequencing
Fred SangerLate 1970s
... Several million times ...Late 2000s
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 3 / 28
Introduction Applications
Reuter et al, Mol Cell, 2015
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 4 / 28
Introduction Applications
Reuter et al, Mol Cell, 2015
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 4 / 28
Introduction ChIPseq
Genome Wide Occupancy Profiling
Overlapping reads come from different cells!!!
NGS is a cell population readout ( 50 million cells)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 5 / 28
Introduction ChIPseq
Genome Wide Occupancy Profiling
Overlapping reads come from different cells!!!
NGS is a cell population readout ( 50 million cells)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 5 / 28
Introduction ChIPseq
Genome Wide Occupancy Profiling
Overlapping reads come from different cells!!!
NGS is a cell population readout ( 50 million cells)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 5 / 28
Introduction ChIPseq
Genome Wide Occupancy Profiling
Overlapping reads come from different cells!!!
NGS is a cell population readout ( 50 million cells)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 5 / 28
Introduction ChIPseq
Genome Wide Occupancy Profiling
Overlapping reads come from different cells!!!
NGS is a cell population readout ( 50 million cells)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 5 / 28
Introduction ChIPseq
Genome Wide Occupancy Profiling
Overlapping reads come from different cells!!!
NGS is a cell population readout ( 50 million cells)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 5 / 28
Introduction ChIPseq
Genome Wide Occupancy Profiling
Overlapping reads come from different cells!!!
NGS is a cell population readout ( 50 million cells)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 5 / 28
Introduction ChIPseq
Primary Sequencing Data
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 6 / 28
Introduction ChIPseq
Primary Sequencing Data
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 6 / 28
Introduction ChIPseq
Genome wide, highly accurate representation of genomic statesFairly unbiased
Complementary and overlapping information
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 7 / 28
Introduction ChIPseq
Genome wide, highly accurate representation of genomic statesFairly unbiased
Complementary and overlapping information
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 7 / 28
Primary Analysis ChIPseq
Raw Data
Aligned Data
Enriched
RegionsRead Density
VisualizationTarget Genes
Functional
Analysis
Distribution
Motifs
Overlaps
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 8 / 28
Primary Analysis ChIPseq
Raw Data
Aligned Data
Enriched
RegionsRead Density
VisualizationTarget Genes
Functional
Analysis
Distribution
Motifs
Overlaps
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 8 / 28
Primary Analysis ChIPseq
Raw Data
Aligned Data
Enriched
RegionsRead Density
VisualizationTarget Genes
Functional
Analysis
Distribution
Motifs
Overlaps
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 8 / 28
Primary Analysis ChIPseq
Raw Data
Aligned Data
Enriched
RegionsRead Density
Visualization
Target Genes
Functional
Analysis
Distribution
Motifs
Overlaps
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 8 / 28
Primary Analysis ChIPseq
Raw Data
Aligned Data
Enriched
RegionsRead Density
VisualizationTarget Genes
Functional
Analysis
Distribution
Motifs
Overlaps
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 8 / 28
Primary Analysis ChIPseq
Raw Data
Aligned Data
Enriched
RegionsRead Density
VisualizationTarget Genes
Functional
Analysis
Distribution
Motifs
Overlaps
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 8 / 28
Primary Analysis ChIPseq
Raw Data
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 9 / 28
Primary Analysis ChIPseq
Raw Data
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 9 / 28
Primary Analysis ChIPseq
Raw Data
Operations:DeMultiplexingQuality Control
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 9 / 28
Primary Analysis ChIPseq
Raw Data
Operations:DeMultiplexingQuality Control
FastQC
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 9 / 28
Primary Analysis ChIPseq
Raw Data
Operations:DeMultiplexingQuality Control
FastQC
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 9 / 28
Primary Analysis ChIPseq
Raw Data
Operations:DeMultiplexingQuality Control
FastQC
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 9 / 28
Primary Analysis ChIPseq
Raw Data
Operations:DeMultiplexingQuality Control
FastQC
Align Data...
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 9 / 28
Primary Analysis ChIPseq
Align Data...
Aligner (Mapper)
Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)
...(iGenomes, UCSC, NCBI, EBI...)
Different mappers will usedifferent Index Formats...
bowtie.2
% of alignment...
Low percentage?Contaminations,
adapter/barcode trimming,wrong reference...
Bad Library...
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 10 / 28
Primary Analysis ChIPseq
Align Data...
Aligner (Mapper)
Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)
...(iGenomes, UCSC, NCBI, EBI...)
Different mappers will usedifferent Index Formats...
bowtie.2
% of alignment...
Low percentage?Contaminations,
adapter/barcode trimming,wrong reference...
Bad Library...
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 10 / 28
Primary Analysis ChIPseq
Align Data...
Aligner (Mapper)
http://www.ebi.ac.uk/~nf/hts_mappers/
Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)
...(iGenomes, UCSC, NCBI, EBI...)
Different mappers will use different Index Formats...
bowtie.2
% of alignment...
Low percentage?Contaminations,
adapter/barcode trimming,wrong reference...
Bad Library...
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 10 / 28
Primary Analysis ChIPseq
Align Data...
Aligner (Mapper)
Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)
...(iGenomes, UCSC, NCBI, EBI...)
Different mappers will usedifferent Index Formats...
bowtie.2
% of alignment...
Low percentage?Contaminations,
adapter/barcode trimming,wrong reference...
Bad Library...
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 10 / 28
Primary Analysis ChIPseq
Align Data...
Aligner (Mapper)
Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)
...(iGenomes, UCSC, NCBI, EBI...)
Different mappers will usedifferent Index Formats...
bowtie.2
% of alignment...
Low percentage?Contaminations,
adapter/barcode trimming,wrong reference...
Bad Library...
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 10 / 28
Primary Analysis ChIPseq
Align Data...
Aligner (Mapper)
Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)
...(iGenomes, UCSC, NCBI, EBI...)
Different mappers will usedifferent Index Formats...
bowtie.2
Fast...Versatile...
Documented and supported...
% of alignment...
Low percentage?Contaminations,
adapter/barcode trimming,wrong reference...
Bad Library...
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 10 / 28
Primary Analysis ChIPseq
Align Data...
Aligner (Mapper)
Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)
...(iGenomes, UCSC, NCBI, EBI...)
Different mappers will usedifferent Index Formats...
bowtie.2
% of alignment...
Low percentage?Contaminations,
adapter/barcode trimming,wrong reference...
Bad Library...
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 10 / 28
Primary Analysis ChIPseq
Align Data...
Aligner (Mapper)
Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)
...(iGenomes, UCSC, NCBI, EBI...)
Different mappers will usedifferent Index Formats...
bowtie.2
% of alignment...
Low percentage?Contaminations,
adapter/barcode trimming,wrong reference...
Bad Library...
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 10 / 28
Primary Analysis ChIPseq
Align Data...
Aligner (Mapper)
Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)
...(iGenomes, UCSC, NCBI, EBI...)
Different mappers will usedifferent Index Formats...
bowtie.2
% of alignment...
Low percentage?Contaminations,
adapter/barcode trimming,wrong reference...
Bad Library...
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 10 / 28
Primary Analysis ChIPseq
Aligned Data
SAM ⇔ BAM ⇒ BED⇓
Enriched
Regions
⇓
⇓Chr Start Stop ID Score
SET ofGENOMIC COORDINATES
(PEAKS...)
How Many???
Where???
With Who???
Function???
HOW???
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 11 / 28
Primary Analysis ChIPseq
Aligned Data
SAM ⇔ BAM ⇒ BED
⇓Enriched
Regions
⇓
⇓Chr Start Stop ID Score
SET ofGENOMIC COORDINATES
(PEAKS...)
How Many???
Where???
With Who???
Function???
HOW???
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 11 / 28
Primary Analysis ChIPseq
Aligned Data
SAM ⇔ BAM ⇒ BED⇓
Enriched
Regions
⇓
⇓Chr Start Stop ID Score
SET ofGENOMIC COORDINATES
(PEAKS...)
How Many???
Where???
With Who???
Function???
HOW???
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 11 / 28
Primary Analysis ChIPseq
Aligned Data
SAM ⇔ BAM ⇒ BED⇓
Enriched
Regions
⇓
⇓Chr Start Stop ID Score
SET ofGENOMIC COORDINATES
(PEAKS...)
How Many???
Where???
With Who???
Function???
HOW???
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 11 / 28
Primary Analysis ChIPseq
Aligned Data
SAM ⇔ BAM ⇒ BED⇓
Enriched
Regions
⇓
⇓Chr Start Stop ID Score
SET ofGENOMIC COORDINATES
(PEAKS...)
How Many???
Where???
With Who???
Function???
HOW???
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 11 / 28
Primary Analysis ChIPseq
Aligned Data
SAM ⇔ BAM ⇒ BED⇓
Enriched
Regions
⇓
⇓Chr Start Stop ID Score
SET ofGENOMIC COORDINATES
(PEAKS...)
How Many???
Where???
With Who???
Function???
HOW???
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 11 / 28
Primary Analysis ChIPseq
Aligned Data
SAM ⇔ BAM ⇒ BED⇓
Enriched
Regions
⇓
⇓Chr Start Stop ID Score
SET ofGENOMIC COORDINATES
(PEAKS...)
How Many???
Where???
With Who???
Function???
HOW???
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 11 / 28
Primary Analysis ChIPseq
Aligned Data
SAM ⇔ BAM ⇒ BED⇓
Enriched
Regions
⇓
⇓Chr Start Stop ID Score
SET ofGENOMIC COORDINATES
(PEAKS...)
How Many???
Where???
With Who???
Function???
HOW???
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 11 / 28
Primary Analysis ChIPseq
Aligned Data
SAM ⇔ BAM ⇒ BED⇓
Enriched
Regions
⇓
⇓Chr Start Stop ID Score
SET ofGENOMIC COORDINATES
(PEAKS...)
How Many???
Where???
With Who???
Function???
HOW???
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 11 / 28
Primary Analysis ChIPseq
Aligned Data
SAM ⇔ BAM ⇒ BED⇓
Enriched
Regions
⇓
⇓Chr Start Stop ID Score
SET ofGENOMIC COORDINATES
(PEAKS...)
How Many???
Where???
With Who???
Function???
HOW???
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 11 / 28
Primary Analysis Peak Calling
“Peak calling is a computational method used to identify areas in agenome that have been enriched with aligned reads“
Challenge: Not one single ’peak shape’...
Number of peaksPeak distribution
Peak heightRange of measurements
Experimental variation (IP...)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 12 / 28
Primary Analysis Peak Calling
“Peak calling is a computational method used to identify areas in agenome that have been enriched with aligned reads“
Challenge: Not one single ’peak shape’...
Number of peaksPeak distribution
Peak heightRange of measurements
Experimental variation (IP...)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 12 / 28
Primary Analysis Peak Calling
“Peak calling is a computational method used to identify areas in agenome that have been enriched with aligned reads“
Challenge: Not one single ’peak shape’...
Number of peaksPeak distribution
Peak heightRange of measurements
Experimental variation (IP...)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 12 / 28
Primary Analysis Peak Calling
“Peak calling is a computational method used to identify areas in agenome that have been enriched with aligned reads“
Challenge: Not one single ’peak shape’...
Number of peaksPeak distribution
Peak heightRange of measurements
Experimental variation (IP...)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 12 / 28
Primary Analysis Peak Calling
Solutions:Different Peak CallersOptimal Parameters
Correct Experimental DesignGood ChIP...
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 13 / 28
Primary Analysis Peak Calling
Solutions:Different Peak CallersOptimal Parameters
Correct Experimental DesignGood ChIP...
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 13 / 28
Primary Analysis Peak Calling
Solutions:Different Peak CallersOptimal Parameters
Correct Experimental DesignGood ChIP...
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 13 / 28
Primary Analysis Peak Calling
Solutions:Different Peak CallersOptimal Parameters
Correct Experimental DesignGood ChIP...
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 13 / 28
Primary Analysis Peak Calling
Solutions:Different Peak CallersOptimal Parameters
Correct Experimental DesignGood ChIP...
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 13 / 28
Primary Analysis Peak Calling
Solutions:Different Peak CallersOptimal Parameters
Correct Experimental DesignGood ChIP...
Peaks represent only a ’summary’ of a ChIPseq experiment...
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 13 / 28
Primary Analysis Peak Calling
Solutions:Different Peak CallersOptimal Parameters
Correct Experimental DesignGood ChIP...
Peaks represent only a ’summary’ of a ChIPseq experiment...
Useful metrics:FRiP: Fraction of Reads in Peaks (>1%)
IDR: Irreproducible Discovery RateSignificance: pValue, ChIPtoINPUT ratio, # of reads...
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 13 / 28
Primary Analysis Peak Calling
Solutions:Different Peak CallersOptimal Parameters
Correct Experimental DesignGood ChIP...
Peaks represent only a ’summary’ of a ChIPseq experiment...
Empirical:Known Motif: Enriched in peaks, Central Position
Distribution: Near TSS, Within GeneBody, EnhancersTarget Genes: Gene expression changes, Functional Annotation
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 13 / 28
Primary Analysis Peak Calling
Solutions:Different Peak CallersOptimal Parameters
Correct Experimental DesignGood ChIP...
Peaks represent only a ’summary’ of a ChIPseq experiment...
Empirical:Known Motif: Enriched in peaks, Central Position
Distribution: Near TSS, Within GeneBody, EnhancersTarget Genes: Gene expression changes, Functional Annotation
Browser inspection and previously known sites
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 13 / 28
Primary Analysis Peak Calling
TFs: MACSHMs: SICER
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 14 / 28
Primary Analysis Peak Calling
TFs: MACSHMs: SICER
Running MACS: macs14 -t ChIP -c Input -g mm -n Name
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 14 / 28
Primary Analysis Peak Calling
TFs: MACSHMs: SICER
Running MACS: macs14 -t ChIP -c Input -g mm -n Name
−600 −400 −200 0 200 400 600
0.0
0.1
0.2
0.3
0.4
Peak Model
Distance to the middle
Per
cent
age
forward tagsreverse tagsshifted tags
d=119
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 14 / 28
Primary Analysis Peak Calling
TFs: MACSHMs: SICER
Running MACS: macs14 -t ChIP -c Input -g mm -n Name
−600 −400 −200 0 200 400 600
0.0
0.1
0.2
0.3
0.4
Peak Model
Distance to the middle
Per
cent
age
forward tagsreverse tagsshifted tags
d=119
−600 −400 −200 0 200 400 600
0.04
0.05
0.06
0.07
0.08
0.09
0.10
Peak Model
Distance to the middle
Per
cent
age
forward tagsreverse tagsshifted tags
d=50
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 14 / 28
Primary Analysis Peak Calling
TFs: MACSHMs: SICER
Running MACS: macs14 -t ChIP -c Input -g mm -n Name
−600 −400 −200 0 200 400 600
0.0
0.1
0.2
0.3
0.4
Peak Model
Distance to the middle
Per
cent
age
forward tagsreverse tagsshifted tags
d=119
−600 −400 −200 0 200 400 600
0.04
0.05
0.06
0.07
0.08
0.09
0.10
Peak Model
Distance to the middle
Per
cent
age
forward tagsreverse tagsshifted tags
d=50
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 14 / 28
Primary Analysis Peak Calling
TFs: MACSHMs: SICER
Running MACS: macs14 -t ChIP -c Input -g mm -n Name
−600 −400 −200 0 200 400 600
0.0
0.1
0.2
0.3
0.4
Peak Model
Distance to the middle
Per
cent
age
forward tagsreverse tagsshifted tags
d=119
−600 −400 −200 0 200 400 600
0.04
0.05
0.06
0.07
0.08
0.09
0.10
Peak Model
Distance to the middle
Per
cent
age
forward tagsreverse tagsshifted tags
d=50
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 14 / 28
Primary Analysis Peak Calling
TFs: MACSHMs: SICER
Running MACS: macs14 -t ChIP -c Input -g mm -n Name
−600 −400 −200 0 200 400 600
0.0
0.1
0.2
0.3
0.4
Peak Model
Distance to the middle
Per
cent
age
forward tagsreverse tagsshifted tags
d=119
−600 −400 −200 0 200 400 600
0.04
0.05
0.06
0.07
0.08
0.09
0.10
Peak Model
Distance to the middle
Per
cent
age
forward tagsreverse tagsshifted tags
d=50
WIG file: Normalized Read Density text format⇓
bigWig file: Binary Read Density format
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 14 / 28
Primary Analysis Peak Calling
TFs: MACSHMs: SICER
Running MACS: macs14 -t ChIP -c Input -g mm -n Name
−600 −400 −200 0 200 400 600
0.0
0.1
0.2
0.3
0.4
Peak Model
Distance to the middle
Per
cent
age
forward tagsreverse tagsshifted tags
d=119
−600 −400 −200 0 200 400 600
0.04
0.05
0.06
0.07
0.08
0.09
0.10
Peak Model
Distance to the middle
Per
cent
age
forward tagsreverse tagsshifted tags
d=50
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 14 / 28
Primary Analysis ChIPseq Summary
General Guidelines:
Biological Replicates (2x)
INPUT (noIP sample)
20-25 mln reads per sample
Optimize ChIP conditions before library preparation
Minimize experimental variation
Look at the data
’ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia’Landt et al, Genome Res, 2012
’Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data’Bailey et al, PLoS Comp Biol, 2013
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 15 / 28
Primary Analysis ChIPseq Summary
General Guidelines:
Biological Replicates (2x)
INPUT (noIP sample)
20-25 mln reads per sample
Optimize ChIP conditions before library preparation
Minimize experimental variation
Look at the data
’ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia’Landt et al, Genome Res, 2012
’Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data’Bailey et al, PLoS Comp Biol, 2013
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 15 / 28
Primary Analysis RNAseq
Raw Data
Aligned Data
Read Counts
Condition 1
Read Counts
Condition 2
Differential
Expression
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 16 / 28
Primary Analysis RNAseq
Raw Data
Aligned Data
Read Counts
Condition 1
Read Counts
Condition 2
Differential
Expression
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 16 / 28
Primary Analysis RNAseq
Raw Data
Aligned Data
Read Counts
Condition 1
Read Counts
Condition 2
Differential
Expression
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 16 / 28
Primary Analysis RNAseq
Raw Data
Aligned Data
Read Counts
Condition 1
Read Counts
Condition 2
Differential
Expression
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 16 / 28
Primary Analysis RNAseq
Aligner (Mapper)
Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)
...Spliced alignment
Build reference based onknown transcripts
STAR
VERY Fast...(45 million paired readsper hour per processor)
TopHat2
If you want to use CuffLinks...
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 17 / 28
Primary Analysis RNAseq
Aligner (Mapper)
Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)
...Spliced alignment
Build reference based onknown transcripts
STAR
VERY Fast...(45 million paired readsper hour per processor)
TopHat2
If you want to use CuffLinks...
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 17 / 28
Primary Analysis RNAseq
Aligner (Mapper)
Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)
...Spliced alignment
Build reference based onknown transcripts
STAR
VERY Fast...(45 million paired readsper hour per processor)
TopHat2
If you want to use CuffLinks...
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 17 / 28
Primary Analysis RNAseq
Aligner (Mapper)
Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)
...Spliced alignment
Build reference based onknown transcripts
STAR
VERY Fast...(45 million paired readsper hour per processor)
TopHat2
If you want to use CuffLinks...
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 17 / 28
Primary Analysis RNAseq
Aligner (Mapper)
Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)
...Spliced alignment
Build reference based onknown transcripts
STAR
VERY Fast...(45 million paired readsper hour per processor)
TopHat2
If you want to use CuffLinks...
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 17 / 28
Primary Analysis RNAseq
Aligner (Mapper)
Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)
...Spliced alignment
Build reference based onknown transcripts
STAR
VERY Fast...(45 million paired readsper hour per processor)
TopHat2
If you want to use CuffLinks...
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 17 / 28
Primary Analysis RNAseq
Raw Data
Aligned Data
Read Counts
Condition 1
Read Counts
Condition 2
htseqCount
Differential
ExpressionDESeqedgeR
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 18 / 28
Primary Analysis RNAseq
Raw Data
Aligned Data
Read Counts
Condition 1
Read Counts
Condition 2
htseqCount
Differential
ExpressionDESeqedgeR
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 18 / 28
Primary Analysis RNAseq
Raw Data
Aligned Data
Read Counts
Condition 1
Read Counts
Condition 2
htseqCount
Differential
ExpressionDESeqedgeR
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 18 / 28
Primary Analysis RNAseq Design
Raw Data
Adapted from: Kostadyma M, EBI Workshop
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 19 / 28
Primary Analysis RNAseq Design
Raw Data
Adapted from: Kostadyma M, EBI Workshop
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 19 / 28
Primary Analysis RNAseq Design
Raw Data
Adapted from: Kostadyma M, EBI Workshop
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 19 / 28
Primary Analysis RNAseq Design
Raw Data
Adapted from: Kostadyma M, EBI WorkshopPapadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 19 / 28
Interpretation Visualization
GenomeGraphs (R package) Ariadne (GeneViewer)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 20 / 28
Interpretation Visualization
Local
GenomeGraphs (R package)Ariadne (GeneViewer)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 20 / 28
Interpretation Visualization
Local
GenomeGraphs (R package)
Online
Ariadne (GeneViewer)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 20 / 28
Interpretation Visualization
Local
IGV (Integrative Genomics Viewer)
GenomeGraphs (R package)
Online
Ariadne (GeneViewer)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 20 / 28
Interpretation Visualization
Local
IGV (Integrative Genomics Viewer)
GenomeGraphs (R package)
Online
UCSC Genome Browser
Ariadne (GeneViewer)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 20 / 28
Interpretation Visualization
Local
IGV (Integrative Genomics Viewer)
Cross Platform (Win, Mac, Linux)Different file formats
Fast
GenomeGraphs (R package)
Online
UCSC Genome Browser
Ariadne (GeneViewer)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 20 / 28
Interpretation Visualization
Local
IGV (Integrative Genomics Viewer)
Cross Platform (Win, Mac, Linux)Different file formats
Fast
GenomeGraphs (R package)
Online
UCSC Genome Browser
Web basedSlow
Comprehensive database
Ariadne (GeneViewer)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 20 / 28
Interpretation Visualization
Local
IGV (Integrative Genomics Viewer)
Cross Platform (Win, Mac, Linux)Different file formats
Fast
GenomeGraphs (R package)
Online
UCSC Genome Browser
Web basedSlow
Comprehensive database
Scalechr11:
STS Markers
RatHuman
OrangutanDog
HorseOpossum
ChickenStickleback
RepeatMasker
SNPs (128)
50 kb mm918,780,000 18,790,000 18,800,000 18,810,000 18,820,000 18,830,000 18,840,000 18,850,000 18,860,000 18,870,000 18,880,000 18,890,000 18,900,000 18,910,000 18,920,000
STS Markers on Genetic and Radiation Hybrid Maps
UCSC Genes (RefSeq, GenBank, tRNAs & Comparative Genomics)
G1E DNaseI HS Peaks Rep 2 from ENCODE/PSU
G1E-ER4 Estradiol 24hr DNaseI HS DNase-seq Peaks Rep 2 from ENCODE/PSU
G1E DNaseI HS Signal Rep 2 from ENCODE/PSU
G1E-ER4 Estradiol 24hr DNaseI HS DNase-seq Signal Rep 2 from ENCODE/PSU
CH12 CTCF TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale
CH12 CTCF TFBS ChIP-seq Signal from ENCODE/Stanford/Yale
CH12 p300 (SC-584) TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale
CH12 p300 (SC-584) TFBS ChIP-seq Signal from ENCODE/Stanford/Yale
CH12 Pol2 TFBS ChIP-seq Peaks from ENCODE/Stanford/YaleCH12 Pol2 TFBS ChIP-seq Signal from ENCODE/Stanford/Yale
CH12 c-Jun TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale
CH12 c-Jun TFBS ChIP-seq Signal from ENCODE/Stanford/Yale
MEL CTCF TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale
MEL CTCF TFBS ChIP-seq Signal from ENCODE/Stanford/Yale
MEL GATA-1 TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale
MEL GATA-1 TFBS ChIP-seq Signal from ENCODE/Stanford/Yale
MEL p300 (SC-584) TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale
MEL p300 (SC-584) TFBS ChIP-seq Signal from ENCODE/Stanford/Yale
MEL Pol2 TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale
MEL Pol2 TFBS ChIP-seq Signal from ENCODE/Stanford/Yale
Placental Mammal Basewise Conservation by PhyloP
Multiz Alignments of 30 Vertebrates
Repeating Elements by RepeatMasker
Simple Nucleotide Polymorphisms (dbSNP build 128)
BC050140AK018427
Meis1Meis1
Meis1Gm16140
Meis1AK144295
chr11.2463chr11.2464
chr11.2467 chr11.2474 chr11.2477 chr11.2485chr11.2488
chr11.4792chr11.4794
chr11.4817 chr11.4841chr11.4846
chr11.4858chr11.4862
chr11.4866chr11.4869
chr11.4871
G1E 2
0.15 _
0 _
G1E-ER4 E2 24 2
0.15 _
0 _
CH12 CTCF
CH12 p300 SC-584
CH12 Pol2
CH12 c-Jun
MEL CTCF
MEL GATA-1
MEL p300 SC-584
MEL Pol2
Mammal Cons
2.1 _
-3.3 _
0 -
Ariadne (GeneViewer)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 20 / 28
Interpretation Visualization
Local
GenomeGraphs (R package)
Online
UCSC Genome Browser
Web basedSlow
Comprehensive database
Scalechr11:
STS Markers
RatHuman
OrangutanDog
HorseOpossum
ChickenStickleback
RepeatMasker
SNPs (128)
50 kb mm918,780,000 18,790,000 18,800,000 18,810,000 18,820,000 18,830,000 18,840,000 18,850,000 18,860,000 18,870,000 18,880,000 18,890,000 18,900,000 18,910,000 18,920,000
STS Markers on Genetic and Radiation Hybrid Maps
UCSC Genes (RefSeq, GenBank, tRNAs & Comparative Genomics)
G1E DNaseI HS Peaks Rep 2 from ENCODE/PSU
G1E-ER4 Estradiol 24hr DNaseI HS DNase-seq Peaks Rep 2 from ENCODE/PSU
G1E DNaseI HS Signal Rep 2 from ENCODE/PSU
G1E-ER4 Estradiol 24hr DNaseI HS DNase-seq Signal Rep 2 from ENCODE/PSU
CH12 CTCF TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale
CH12 CTCF TFBS ChIP-seq Signal from ENCODE/Stanford/Yale
CH12 p300 (SC-584) TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale
CH12 p300 (SC-584) TFBS ChIP-seq Signal from ENCODE/Stanford/Yale
CH12 Pol2 TFBS ChIP-seq Peaks from ENCODE/Stanford/YaleCH12 Pol2 TFBS ChIP-seq Signal from ENCODE/Stanford/Yale
CH12 c-Jun TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale
CH12 c-Jun TFBS ChIP-seq Signal from ENCODE/Stanford/Yale
MEL CTCF TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale
MEL CTCF TFBS ChIP-seq Signal from ENCODE/Stanford/Yale
MEL GATA-1 TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale
MEL GATA-1 TFBS ChIP-seq Signal from ENCODE/Stanford/Yale
MEL p300 (SC-584) TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale
MEL p300 (SC-584) TFBS ChIP-seq Signal from ENCODE/Stanford/Yale
MEL Pol2 TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale
MEL Pol2 TFBS ChIP-seq Signal from ENCODE/Stanford/Yale
Placental Mammal Basewise Conservation by PhyloP
Multiz Alignments of 30 Vertebrates
Repeating Elements by RepeatMasker
Simple Nucleotide Polymorphisms (dbSNP build 128)
BC050140AK018427
Meis1Meis1
Meis1Gm16140
Meis1AK144295
chr11.2463chr11.2464
chr11.2467 chr11.2474 chr11.2477 chr11.2485chr11.2488
chr11.4792chr11.4794
chr11.4817 chr11.4841chr11.4846
chr11.4858chr11.4862
chr11.4866chr11.4869
chr11.4871
G1E 2
0.15 _
0 _
G1E-ER4 E2 24 2
0.15 _
0 _
CH12 CTCF
CH12 p300 SC-584
CH12 Pol2
CH12 c-Jun
MEL CTCF
MEL GATA-1
MEL p300 SC-584
MEL Pol2
Mammal Cons
2.1 _
-3.3 _
0 -
Ariadne (GeneViewer)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 20 / 28
Interpretation Visualization
Local
GenomeGraphs (R package)
Online
Ariadne (GeneViewer)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 20 / 28
Interpretation Analysis Software
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 21 / 28
Interpretation Analysis Software
GALAXY server
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 21 / 28
Interpretation Analysis Software
GALAXY server
Online collection of widely usedbioinformatics tools
(e.g. FastQC, bowtie.2, CuffLinks,MACS, SICER ...)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 21 / 28
Interpretation Analysis Software
GALAXY server
Online collection of widely usedbioinformatics tools
(e.g. FastQC, bowtie.2, CuffLinks,MACS, SICER ...)
Learn Galaxy...
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 21 / 28
Interpretation Analysis Software
GALAXY server
Online collection of widely usedbioinformatics tools
(e.g. FastQC, bowtie.2, CuffLinks,MACS, SICER ...)
Learn Galaxy...
https://galaxyproject.org/
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 21 / 28
Interpretation Analysis Software
GALAXY server
Online collection of widely usedbioinformatics tools
(e.g. FastQC, bowtie.2, CuffLinks,MACS, SICER ...)
Learn Galaxy...
https://galaxyproject.org/
Chipster
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 21 / 28
Interpretation Analysis Software
GALAXY server
Online collection of widely usedbioinformatics tools
(e.g. FastQC, bowtie.2, CuffLinks,MACS, SICER ...)
Learn Galaxy...
https://galaxyproject.org/
Chipster
User-friendly analysis software forhigh-throughput data (GUI)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 21 / 28
Interpretation Downstream Analysis
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 22 / 28
Interpretation Downstream Analysis
Genomic Regions Enrichment ofAnnotations Tool (GREAT)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 22 / 28
Interpretation Downstream Analysis
Genomic Regions Enrichment ofAnnotations Tool (GREAT)
Functional annotation ofcis-regulatory regions
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 22 / 28
Interpretation Downstream Analysis
Genomic Regions Enrichment ofAnnotations Tool (GREAT)
Functional annotation ofcis-regulatory regions
Input: BED file
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 22 / 28
Interpretation Downstream Analysis
Genomic Regions Enrichment ofAnnotations Tool (GREAT)
Functional annotation ofcis-regulatory regions
Input: BED file
Output:List of potential target genes
Peak distribution plotsGeneset enrichment analysis
(GO, Pathways, Phenotypes ...)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 22 / 28
Interpretation Downstream Analysis
Genomic Regions Enrichment ofAnnotations Tool (GREAT)
Functional annotation ofcis-regulatory regions
Input: BED file
Output:List of potential target genes
Peak distribution plotsGeneset enrichment analysis
(GO, Pathways, Phenotypes ...)
http:
//bejerano.stanford.edu/great/public/html/index.php
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 22 / 28
Interpretation Downstream Analysis
Genomic Regions Enrichment ofAnnotations Tool (GREAT)
Functional annotation ofcis-regulatory regions
Input: BED file
Output:List of potential target genes
Peak distribution plotsGeneset enrichment analysis
(GO, Pathways, Phenotypes ...)
http:
//bejerano.stanford.edu/great/public/html/index.php
MEME-ChIP
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 22 / 28
Interpretation Downstream Analysis
Genomic Regions Enrichment ofAnnotations Tool (GREAT)
Functional annotation ofcis-regulatory regions
Input: BED file
Output:List of potential target genes
Peak distribution plotsGeneset enrichment analysis
(GO, Pathways, Phenotypes ...)
http:
//bejerano.stanford.edu/great/public/html/index.php
MEME-ChIP
Perform motif discovery, motifenrichment analysis and clustering
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 22 / 28
Interpretation Downstream Analysis
Genomic Regions Enrichment ofAnnotations Tool (GREAT)
Functional annotation ofcis-regulatory regions
Input: BED file
Output:List of potential target genes
Peak distribution plotsGeneset enrichment analysis
(GO, Pathways, Phenotypes ...)
http:
//bejerano.stanford.edu/great/public/html/index.php
MEME-ChIP
Perform motif discovery, motifenrichment analysis and clustering
Input: FASTA file
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 22 / 28
Interpretation Downstream Analysis
Genomic Regions Enrichment ofAnnotations Tool (GREAT)
Functional annotation ofcis-regulatory regions
Input: BED file
Output:List of potential target genes
Peak distribution plotsGeneset enrichment analysis
(GO, Pathways, Phenotypes ...)
http:
//bejerano.stanford.edu/great/public/html/index.php
MEME-ChIP
Perform motif discovery, motifenrichment analysis and clustering
Input: FASTA file
Output:Enriched motifs
Motif distribution plots
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 22 / 28
Interpretation Downstream Analysis
Genomic Regions Enrichment ofAnnotations Tool (GREAT)
Functional annotation ofcis-regulatory regions
Input: BED file
Output:List of potential target genes
Peak distribution plotsGeneset enrichment analysis
(GO, Pathways, Phenotypes ...)
http:
//bejerano.stanford.edu/great/public/html/index.php
MEME-ChIP
Perform motif discovery, motifenrichment analysis and clustering
Input: FASTA file
Output:Enriched motifs
Motif distribution plots
http://meme-suite.org/tools/meme-chip
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 22 / 28
Online Resources ENCODE
Encyclopedia of DNA Elements
Aim: build a comprehensive parts list offunctional elements in the human genome
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 23 / 28
Online Resources ENCODE
Encyclopedia of DNA Elements
Aim: build a comprehensive parts list offunctional elements in the human genome
New website is extremely easy to use!
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 23 / 28
Online Resources ENCODE
Encyclopedia of DNA Elements
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 24 / 28
Online Resources ENCODE
Encyclopedia of DNA Elements
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 24 / 28
Online Resources ENCODE
Encyclopedia of DNA Elements
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 24 / 28
Online Resources ENCODE
Encyclopedia of DNA Elements
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 24 / 28
Online Resources ENCODE
Encyclopedia of DNA Elements
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 24 / 28
Online Resources ENCODE
Encyclopedia of DNA Elements
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 24 / 28
Online Resources ENCODE
Encyclopedia of DNA Elements
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 24 / 28
Online Resources ENCODE
Encyclopedia of DNA Elements
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 24 / 28
Online Resources GEO
Gene Expression Omnibus
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 25 / 28
Online Resources GEO
Gene Expression Omnibus
GEO is a public functional genomics data repository
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 25 / 28
Online Resources GEO
Gene Expression Omnibus
GEO is a public functional genomics data repository
All NGS datasets have to be deposited to GEO prior to publication
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 25 / 28
Online Resources GEO
Gene Expression Omnibus
GEO is a public functional genomics data repository
All NGS datasets have to be deposited to GEO prior to publication
All datasets included in GEO are publicly available for academic use
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 25 / 28
Online Resources GEO
Gene Expression Omnibus
GEO is a public functional genomics data repository
All NGS datasets have to be deposited to GEO prior to publication
All datasets included in GEO are publicly available for academic use
The search function is pretty close to awful
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 25 / 28
Online Resources GEO
Gene Expression Omnibus
GEO is a public functional genomics data repository
All NGS datasets have to be deposited to GEO prior to publication
All datasets included in GEO are publicly available for academic use
The search function is pretty close to awful
Best way to query the GEO database is by Accession Number(declared in the manuscript, GSE30142, GSM746581)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 25 / 28
Online Resources GEO
Gene Expression Omnibus
GEO is a public functional genomics data repository
All NGS datasets have to be deposited to GEO prior to publication
All datasets included in GEO are publicly available for academic use
The search function is pretty close to awful
Best way to query the GEO database is by Accession Number(declared in the manuscript, GSE30142, GSM746581)
FTP access to SRA experiment (Sequence Read Archive)
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 25 / 28
Online Resources ENA
European Nucleotide Archive
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 26 / 28
Online Resources ENA
European Nucleotide Archive
ENA provides a comprehensive record of theworld’s nucleotide sequencing information
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 26 / 28
Online Resources ENA
European Nucleotide Archive
ENA provides a comprehensive record of theworld’s nucleotide sequencing information
It covers raw sequencing data, sequence assembly informationand functional annotation
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 26 / 28
Online Resources ENA
European Nucleotide Archive
ENA provides a comprehensive record of theworld’s nucleotide sequencing information
It covers raw sequencing data, sequence assembly informationand functional annotation
The search function is pretty close to awful
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 26 / 28
Online Resources ENA
European Nucleotide Archive
ENA provides a comprehensive record of theworld’s nucleotide sequencing information
It covers raw sequencing data, sequence assembly informationand functional annotation
The search function is pretty close to awful
Large overlap with GEO database
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 26 / 28
Comprehensive database of genomic features
Systemic understandingof biological processes
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 27 / 28
Thank you...
Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 28 / 28