
Introduction to NGS analyses

Giorgio L Papadopoulos

Institute of Molecular Biology and Biotechnology

Bioinformatics Support Group

04/12/2015

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 1 / 28

Overview

1 Introduction: DNA sequencing, Applications

2 Primary Analysis: ChIPseq, RNAseq

3 Interpretation: Visualization, Analysis Software, Downstream Analysis

4 Online Resources: ENCODE, GEO, ENA

Introduction DNA sequencing

Fred Sanger (late 1970s)

... Several million times ... (late 2000s)

Introduction Applications

Reuter et al, Mol Cell, 2015

Introduction ChIPseq

Genome Wide Occupancy Profiling

Overlapping reads come from different cells!!!

NGS is a cell population readout (~50 million cells)

Introduction ChIPseq

Primary Sequencing Data

Introduction ChIPseq

Genome wide, highly accurate representation of genomic states

Fairly unbiased

Complementary and overlapping information

Primary Analysis ChIPseq

Raw Data

Aligned Data

Enriched Regions

Read Density

Visualization

Target Genes

Functional Analysis

Distribution

Motifs

Overlaps

Primary Analysis ChIPseq

Raw Data

Operations: DeMultiplexing, Quality Control

FastQC
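A rough idea of what a quality-control pass looks at can be sketched in Python. This is not FastQC itself: the sketch assumes standard four-line FASTQ records with Phred+33 quality encoding, and the function names (`parse_fastq`, `mean_phred`, `filter_reads`) and the cutoff of 20 are illustrative assumptions.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality_string) from FASTQ text lines.

    Assumes exactly four lines per record; an incomplete trailing
    record is silently dropped.
    """
    stripped = (line.rstrip("\n") for line in lines)
    for header, seq, _plus, qual in zip(*[stripped] * 4):
        yield header[1:], seq, qual  # drop the leading '@'


def mean_phred(quality, offset=33):
    """Mean Phred score of one read (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(lines, min_mean_quality=20):
    """Keep reads whose mean quality reaches the (assumed) cutoff."""
    return [
        (read_id, seq)
        for read_id, seq, qual in parse_fastq(lines)
        if mean_phred(qual) >= min_mean_quality
    ]
```

With a real file, `filter_reads(open("sample.fastq"))` would return the reads that pass; the file name is a placeholder.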

Align Data...

Primary Analysis ChIPseq

Align Data...

Aligner (Mapper)

http://www.ebi.ac.uk/~nf/hts_mappers/

Reference Genome: Human (hg18, hg19), Mouse (mm9, mm10)

... (iGenomes, UCSC, NCBI, EBI...)

Different mappers will use different Index Formats...

Bowtie 2

Fast... Versatile...

Documented and supported...

% of alignment...

Low percentage? Contaminations, adapter/barcode trimming, wrong reference...

Bad Library...
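The alignment percentage can also be recomputed from the aligner's SAM output. This minimal sketch assumes plain-text SAM input and relies on the SAM FLAG field, where bit 0x4 marks an unmapped read and bits 0x100 and 0x800 mark secondary and supplementary records; the function name `alignment_rate` is hypothetical. Bowtie 2 already prints an overall alignment rate, so this only illustrates the calculation.

```python
def alignment_rate(sam_lines):
    """Return the percentage of reads with a primary alignment.

    Header lines (starting with '@') are skipped, and secondary or
    supplementary records are ignored so each read is counted once.
    """
    total = 0
    mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        if not flag & 0x4:  # bit 0x4 set means "read unmapped"
            mapped += 1
    return 100.0 * mapped / total if total else 0.0
```

A low value returned here would point to the causes listed above: contamination, untrimmed adapters or barcodes, a wrong reference, or a bad library.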

Primary Analysis ChIPseq

Aligned Data

SAM ⇔ BAM ⇒ BED

⇓

Enriched Regions

⇓

Chr Start Stop ID Score

SET of GENOMIC COORDINATES

(PEAKS...)

How Many???

Where???

With Who???

Function???

HOW???
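The first two questions ("How many?" and "Where?") can be answered straight from the five-column table. The sketch below assumes a tab-separated BED-like file with the columns Chr, Start, Stop, ID and Score shown above, and BED's convention of 0-based, half-open intervals; the function names `read_bed` and `summarize` are illustrative.

```python
from collections import Counter


def read_bed(lines):
    """Parse tab-separated Chr/Start/Stop/ID/Score lines into tuples."""
    peaks = []
    for line in lines:
        line = line.strip()
        # Skip blanks, comments and UCSC track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, stop, name, score = line.split("\t")[:5]
        peaks.append((chrom, int(start), int(stop), name, float(score)))
    return peaks


def summarize(peaks):
    """Count peaks overall and per chromosome, and report mean width."""
    widths = [stop - start for _, start, stop, _, _ in peaks]
    return {
        "n_peaks": len(peaks),
        "per_chrom": dict(Counter(chrom for chrom, *_ in peaks)),
        "mean_width": sum(widths) / len(widths) if widths else 0.0,
    }
```

The remaining questions (overlap with other factors, function, mechanism) need additional data sets and are not covered by this sketch.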

Primary Analysis Peak Calling

“Peak calling is a computational method used to identify areas in a genome that have been enriched with aligned reads”

Challenge: Not one single ‘peak shape’...

Number of peaks

Peak distribution

Peak height

Range of measurements

Experimental variation (IP...)
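To make the definition concrete, here is a deliberately naive toy example, not the algorithm of any real peak caller. It assumes a single chromosome, fixed-size windows and a uniform Poisson background estimated from the same sample; real tools also model local background and a control sample. The names `poisson_sf` and `call_enriched_windows`, the 200 bp window and the p-value cutoff are all assumptions.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed from the exact CDF."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def call_enriched_windows(read_starts, genome_length, window=200, p_cutoff=1e-5):
    """Return (start, stop, count) for windows enriched in reads.

    Background rate = mean reads per window over the whole sequence.
    """
    n_windows = (genome_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for pos in read_starts:
        counts[pos // window] += 1
    lam = len(read_starts) / n_windows
    return [
        (i * window, min((i + 1) * window, genome_length), count)
        for i, count in enumerate(counts)
        if poisson_sf(count, lam) < p_cutoff
    ]
```

Even in this toy the result depends on the window size and cutoff, which is one reason different peak shapes call for different callers and parameters.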

Primary Analysis Peak Calling

Solutions:

Different Peak Callers

Optimal Parameters

Correct Experimental Design

Good ChIP...

Peaks represent only a ‘summary’ of a ChIPseq experiment...

Useful metrics:

FRiP: Fraction of Reads in Peaks (>1%)

IDR: Irreproducible Discovery Rate

Significance: p-value, ChIP-to-Input ratio, # of reads...

Empirical:

Known Motif: Enriched in peaks, Central Position

Distribution: Near TSS, Within GeneBody, Enhancers

Target Genes: Gene expression changes, Functional Annotation

Browser inspection and previously known sites
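Of the metrics listed for this slide, FRiP is the simplest to compute by hand. The sketch below assumes each read is reduced to a single (chromosome, position) pair and each peak to a half-open (chromosome, start, stop) interval; the function name `frip` is illustrative, and the linear scan over peaks is chosen for clarity, not speed.

```python
def frip(reads, peaks):
    """Fraction of Reads in Peaks.

    reads: iterable of (chrom, position)
    peaks: iterable of (chrom, start, stop), half-open intervals
    """
    reads = list(reads)
    by_chrom = {}
    for chrom, start, stop in peaks:
        by_chrom.setdefault(chrom, []).append((start, stop))
    in_peaks = sum(
        1
        for chrom, pos in reads
        if any(start <= pos < stop for start, stop in by_chrom.get(chrom, []))
    )
    return in_peaks / len(reads) if reads else 0.0
```

A returned value above 0.01 would meet the >1% guideline quoted for FRiP.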

Primary Analysis Peak Calling

Transcription factors (TFs): MACS

Histone modifications (HMs): SICER

Running MACS: macs14 -t ChIP -c Input -g mm -n Name
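MACS builds a peak model from forward- and reverse-strand tags, estimates the distance d between them, and shifts tags by d/2 toward their 3' ends before looking for enrichment. The following is only a simplified illustration of that shifting step, not MACS code; it assumes each tag is given as the genomic coordinate of its 5' end plus a strand, and the function name `shift_tags` is hypothetical.

```python
def shift_tags(tags, d):
    """Shift each tag by d/2 toward its 3' end.

    tags: iterable of (five_prime_position, strand), strand '+' or '-'
    d:    estimated distance between forward and reverse tag modes
    """
    shift = d // 2
    shifted = []
    for pos, strand in tags:
        if strand == "+":
            shifted.append(pos + shift)  # forward tags move downstream
        else:
            shifted.append(pos - shift)  # reverse tags move upstream
    return shifted
```

After shifting, forward and reverse tags flanking the same binding site pile up at a common position, which corresponds to the "shifted tags" curve in the MACS peak model plot.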

[Figure: two MACS “Peak Model” plots. x-axis: Distance to the middle (−600 to 600); y-axis: Percentage; series: forward tags, reverse tags, shifted tags. First panel: d=119; second panel: d=50.]

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 14 / 28

Page 72: Introduction to NGS analyses - FORTH-IMBB · Introduction to NGS analyses Giorgio L Papadopoulos Institute of Molecular Biology and Biotechnology Bioinformatics Support Group 04/12/2015

Primary Analysis Peak Calling

TFs: MACSHMs: SICER

Running MACS: macs14 -t ChIP -c Input -g mm -n Name

−600 −400 −200 0 200 400 600

0.0

0.1

0.2

0.3

0.4

Peak Model

Distance to the middle

Per

cent

age

forward tagsreverse tagsshifted tags

d=119

−600 −400 −200 0 200 400 600

0.04

0.05

0.06

0.07

0.08

0.09

0.10

Peak Model

Distance to the middle

Per

cent

age

forward tagsreverse tagsshifted tags

d=50

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 14 / 28

Page 73: Introduction to NGS analyses - FORTH-IMBB · Introduction to NGS analyses Giorgio L Papadopoulos Institute of Molecular Biology and Biotechnology Bioinformatics Support Group 04/12/2015

Primary Analysis Peak Calling

TFs: MACSHMs: SICER

Running MACS: macs14 -t ChIP -c Input -g mm -n Name

−600 −400 −200 0 200 400 600

0.0

0.1

0.2

0.3

0.4

Peak Model

Distance to the middle

Per

cent

age

forward tagsreverse tagsshifted tags

d=119

−600 −400 −200 0 200 400 600

0.04

0.05

0.06

0.07

0.08

0.09

0.10

Peak Model

Distance to the middle

Per

cent

age

forward tagsreverse tagsshifted tags

d=50

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 14 / 28

Page 74: Introduction to NGS analyses - FORTH-IMBB · Introduction to NGS analyses Giorgio L Papadopoulos Institute of Molecular Biology and Biotechnology Bioinformatics Support Group 04/12/2015

Primary Analysis Peak Calling

Transcription factors (TFs): MACS

Histone modifications (HMs): SICER

Running MACS: macs14 -t ChIP -c Input -g mm -n Name
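
The command above can also be assembled from a script. The following is a sketch only: the file names and the helper function are hypothetical, and it assumes MACS 1.4 is installed as `macs14`, as on the slide.

```python
import shlex

def build_macs14_command(treatment, control, genome="mm", name="Name"):
    """Assemble the macs14 argument list shown on the slide.

    -t: ChIP (treatment) alignment file
    -c: Input (control) alignment file
    -g: effective genome size; 'mm' is the MACS shortcut for mouse
    -n: prefix used for the output file names
    """
    return ["macs14", "-t", treatment, "-c", control, "-g", genome, "-n", name]

# Hypothetical file names, for illustration only.
cmd = build_macs14_command("ChIP.bam", "Input.bam", name="MyTF")
print(shlex.join(cmd))
# The list could then be run with subprocess.run(cmd, check=True)
# on a machine where MACS 1.4 is installed.
```

Keeping the arguments in a list rather than one string avoids quoting problems when sample names contain spaces.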

[Figure: two MACS "Peak Model" plots showing forward tags, reverse tags and shifted tags. x-axis: Distance to the middle (−600 to 600); y-axis: Percentage. First plot: d=119; second plot: d=50.]
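
The value d in the peak model is the estimated distance between the forward-strand and reverse-strand tag distributions; shifting each tag by d/2 toward its 3' end brings both strands onto the binding site. The sketch below illustrates the idea on synthetic positions. It uses the most frequent position on each strand, which is a simplification and not the actual MACS implementation.

```python
from collections import Counter

def estimate_d(forward_pos, reverse_pos):
    """Estimate d as the gap between the most frequent forward- and
    reverse-strand tag positions (a simplification of the peak model)."""
    fwd_mode = Counter(forward_pos).most_common(1)[0][0]
    rev_mode = Counter(reverse_pos).most_common(1)[0][0]
    return rev_mode - fwd_mode

def shift_tags(forward_pos, reverse_pos, d):
    """Shift every tag by d/2 toward its 3' end."""
    half = d // 2
    return [p + half for p in forward_pos] + [p - half for p in reverse_pos]

# Synthetic positions relative to the peak centre (illustration only).
forward = [-62, -60, -60, -60, -58]
reverse = [58, 60, 60, 60, 62]
d = estimate_d(forward, reverse)
shifted = shift_tags(forward, reverse, d)
print(d, sorted(shifted))
```

After shifting, tags from both strands pile up around position 0, which is what the "shifted tags" curve in the plots represents.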

WIG file: Normalized Read Density text format

⇓

bigWig file: Binary Read Density format
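
As an illustration of what a normalized WIG track contains, the sketch below writes binned read counts in fixedStep WIG format, scaled to reads per million. The scaling choice, the function name and the example counts are assumptions made for this example; the slide does not state which normalization is used.

```python
def to_fixedstep_wig(chrom, start, step, counts, total_reads):
    """Return fixedStep WIG lines with counts scaled to reads per million.

    start is 1-based, as required by the WIG format.
    """
    scale = 1_000_000 / total_reads
    lines = [f"fixedStep chrom={chrom} start={start} step={step} span={step}"]
    lines += [f"{c * scale:.2f}" for c in counts]
    return lines

# Hypothetical bin counts from a library of 20 million reads.
wig = to_fixedstep_wig("chr11", 18780001, 10, [0, 4, 12, 40], 20_000_000)
print("\n".join(wig))
```

The text file can then be converted to the binary bigWig format, for example with the UCSC `wigToBigWig` utility, which takes the WIG file, a chromosome sizes file and the output path.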

Primary Analysis ChIPseq Summary

General Guidelines:

Biological Replicates (2x)

Input control (no-IP sample)

20-25 million reads per sample

Optimize ChIP conditions before library preparation

Minimize experimental variation

Look at the data

'ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia', Landt et al, Genome Res, 2012

'Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data', Bailey et al, PLoS Comp Biol, 2013

Primary Analysis RNAseq

Raw Data

⇓

Aligned Data

⇓

Read Counts (Condition 1), Read Counts (Condition 2)

⇓

Differential Expression

Primary Analysis RNAseq

Aligner (Mapper)

Reference Genome: Human (hg18, hg19), Mouse (mm9, mm10), ...

Spliced alignment

Build reference based on known transcripts

STAR

Very fast (45 million paired reads per hour per processor)

TopHat2

If you want to use Cufflinks...
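
Spliced alignment is needed because an RNAseq read comes from a processed transcript and may span an exon-exon junction, so on the genome it maps as two or more blocks separated by an intron. The sketch below shows how a read placed on a transcript built from known exons translates back to genomic blocks. The exon coordinates and the function are hypothetical, and real aligners such as STAR or TopHat2 work differently internally.

```python
def transcript_to_genome(exons, read_start, read_len):
    """Map a read aligned to a spliced transcript back to genomic blocks.

    exons: ascending list of (start, end) genomic intervals, 0-based half-open,
           for a plus-strand transcript.
    read_start: 0-based offset of the read within the transcript.
    Returns a list of (start, end) genomic blocks; more than one block
    means the read spans an exon-exon junction.
    """
    blocks = []
    offset = 0            # transcript coordinate of the current exon start
    remaining = read_len
    pos = read_start
    for ex_start, ex_end in exons:
        ex_len = ex_end - ex_start
        if remaining > 0 and pos < offset + ex_len:
            g_start = ex_start + (pos - offset)
            g_end = min(ex_end, g_start + remaining)
            blocks.append((g_start, g_end))
            remaining -= g_end - g_start
            pos = offset + ex_len   # continue at the start of the next exon
        offset += ex_len
    return blocks

# Hypothetical two-exon transcript with a 300 bp intron.
exons = [(100, 200), (500, 600)]
junction_read = transcript_to_genome(exons, read_start=80, read_len=50)
exonic_read = transcript_to_genome(exons, read_start=10, read_len=50)
print(junction_read, exonic_read)
```

The first read is split across the intron, which an unspliced genomic aligner would fail to place as a single contiguous match.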

Primary Analysis RNAseq

Raw Data

⇓

Aligned Data

⇓ htseq-count

Read Counts (Condition 1), Read Counts (Condition 2)

⇓ DESeq, edgeR

Differential Expression
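
The final step compares per-gene read counts between the two conditions. The sketch below shows only the basic arithmetic: library-size scaling to counts per million and a log2 fold change with a pseudocount. It is not the statistical model used by DESeq or edgeR, which also estimate dispersion from replicates and report significance; the gene names and counts are made up.

```python
import math

def counts_per_million(counts):
    """Scale raw per-gene counts by library size (counts per million)."""
    total = sum(counts.values())
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}

def log2_fold_change(cond1, cond2, pseudocount=1.0):
    """log2(cond2 / cond1) on CPM values, with a pseudocount to avoid log(0)."""
    cpm1, cpm2 = counts_per_million(cond1), counts_per_million(cond2)
    return {g: math.log2((cpm2[g] + pseudocount) / (cpm1[g] + pseudocount))
            for g in cpm1}

# Hypothetical read counts per gene, e.g. as produced by a counting tool.
cond1 = {"geneA": 100, "geneB": 400, "geneC": 500}
cond2 = {"geneA": 400, "geneB": 400, "geneC": 200}
lfc = log2_fold_change(cond1, cond2)
for gene, value in lfc.items():
    print(gene, round(value, 2))
```

A fold change alone says nothing about reproducibility, which is why the dedicated packages and biological replicates are still required.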

Primary Analysis RNAseq Design

[Figure: RNAseq design, starting from Raw Data. Adapted from: Kostadyma M, EBI Workshop]

Interpretation Visualization

Local

IGV (Integrative Genomics Viewer)

Cross Platform (Win, Mac, Linux)

Different file formats

Fast

GenomeGraphs (R package)

Online

UCSC Genome Browser

Web based

Slow

Comprehensive database

[Figure: UCSC Genome Browser view of mouse (mm9) chr11, approximately 18.78-18.92 Mb, around Meis1. Tracks: STS Markers, UCSC Genes, ENCODE/PSU DNaseI HS peaks and signal (G1E, G1E-ER4 Estradiol 24hr), ENCODE/Stanford/Yale TFBS ChIP-seq peaks and signal (CH12: CTCF, p300, Pol2, c-Jun; MEL: CTCF, GATA-1, p300, Pol2), Placental Mammal Conservation by PhyloP, Multiz Alignments of 30 Vertebrates, RepeatMasker, SNPs (dbSNP build 128).]

Ariadne (GeneViewer)

Interpretation Analysis Software

GALAXY server

Online collection of widely used bioinformatics tools

(e.g. FastQC, Bowtie2, Cufflinks, MACS, SICER ...)

Learn Galaxy...

https://galaxyproject.org/

Chipster

User-friendly analysis software for high-throughput data (GUI)

Interpretation Downstream Analysis

Genomic Regions Enrichment of Annotations Tool (GREAT)

Functional annotation of cis-regulatory regions

Input: BED file

Output:

List of potential target genes

Peak distribution plots

Geneset enrichment analysis (GO, Pathways, Phenotypes ...)

http://bejerano.stanford.edu/great/public/html/index.php

MEME-ChIP

Perform motif discovery, motif enrichment analysis and clustering

Input: FASTA file

Output:

Enriched motifs

Motif distribution plots

http://meme-suite.org/tools/meme-chip
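
Both tools start from peak coordinates: GREAT takes a BED file directly, and the FASTA input for MEME-ChIP is usually extracted from the same intervals. The sketch below writes fixed-width windows around peak summits as BED lines. The window size, peak names and coordinates are arbitrary assumptions for illustration.

```python
def summits_to_bed(summits, half_width=50):
    """Convert (chrom, summit, name) tuples into BED lines.

    BED uses 0-based, half-open intervals; starts are clipped at 0.
    """
    lines = []
    for chrom, summit, name in summits:
        start = max(summit - half_width, 0)
        end = summit + half_width
        lines.append(f"{chrom}\t{start}\t{end}\t{name}")
    return lines

# Hypothetical peak summits, for illustration only.
peaks = [("chr11", 18800000, "peak_1"), ("chr1", 20, "peak_2")]
bed = summits_to_bed(peaks)
print("\n".join(bed))
```

Sequences for the resulting intervals can then be pulled from the reference genome, for example with `bedtools getfasta`, to produce the FASTA file for motif analysis.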

Online Resources ENCODE

Encyclopedia of DNA Elements

Aim: build a comprehensive parts list of functional elements in the human genome

New website is extremely easy to use!

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 23 / 28

Page 125: Introduction to NGS analyses - FORTH-IMBB · Introduction to NGS analyses Giorgio L Papadopoulos Institute of Molecular Biology and Biotechnology Bioinformatics Support Group 04/12/2015

Online Resources ENCODE

Encyclopedia of DNA Elements

Online Resources GEO

Gene Expression Omnibus

GEO is a public functional genomics data repository

All NGS datasets have to be deposited in GEO prior to publication

All datasets included in GEO are publicly available for academic use

The search function is pretty close to awful

The best way to query the GEO database is by accession number (declared in the manuscript, e.g. GSE30142, GSM746581)

FTP access to the corresponding SRA (Sequence Read Archive) experiment
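
Querying by accession number can be scripted. The sketch below is a minimal illustration, not an official client: the function names are hypothetical, and the URL patterns (the `acc.cgi` record page and the `nnn`-style directory layout on the NCBI FTP server) are assumptions about how GEO is organised that should be checked against the current NCBI documentation. It builds only GEO locations; raw reads are held in SRA and are linked from the GEO record.

```python
import re

# Assumed URL patterns; verify against the NCBI GEO documentation.
GEO_QUERY_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={acc}"
GEO_FTP_ROOT = "ftp://ftp.ncbi.nlm.nih.gov/geo"

# Accession prefix -> top-level FTP directory (series and samples only).
_KINDS = {"GSE": "series", "GSM": "samples"}
_ACCESSION = re.compile(r"^(GSE|GSM)(\d+)$")


def parse_accession(acc):
    """Split an accession such as 'GSE30142' into ('GSE', '30142')."""
    match = _ACCESSION.match(acc.strip().upper())
    if match is None:
        raise ValueError(f"not a GSE/GSM accession: {acc!r}")
    return match.group(1), match.group(2)


def geo_record_url(acc):
    """Return the web page for a GEO series or sample record."""
    prefix, digits = parse_accession(acc)
    return GEO_QUERY_URL.format(acc=prefix + digits)


def geo_ftp_dir(acc):
    """Return the FTP directory assumed to hold the record's files.

    Records are grouped in blocks of 1000: the last three digits of the
    accession are replaced by 'nnn' (GSE30142 -> GSE30nnn, GSE1 -> GSEnnn).
    """
    prefix, digits = parse_accession(acc)
    stub = digits[:-3] + "nnn"
    return f"{GEO_FTP_ROOT}/{_KINDS[prefix]}/{prefix}{stub}/{prefix}{digits}/"


if __name__ == "__main__":
    for accession in ("GSE30142", "GSM746581"):
        print(accession, geo_record_url(accession), geo_ftp_dir(accession))
```

The accession is validated before any URL is built, so a typo in a manuscript-derived identifier fails early instead of producing a dead link.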

Online Resources ENA

European Nucleotide Archive

ENA provides a comprehensive record of the world’s nucleotide sequencing information

It covers raw sequencing data, sequence assembly information and functional annotation

The search function is pretty close to awful

Large overlap with GEO database

Comprehensive database of genomic features

Systemic understanding of biological processes

Thank you...
