Introduction to Next Generation Sequencing (NGS) Data Analysis and Pathway Analysis Jenny Wu

Post on 12-May-2019


Page 1: Introduction to Next Generation Sequencing (NGS) … •Introduction to NGS data analysis in Cancer Genomics –NGS applications in cancer research –Typical NGS workflows and pipeline

Introduction to Next Generation Sequencing (NGS) Data Analysis and Pathway Analysis

Jenny Wu

Page 2

Outline

• Introduction to NGS data analysis in Cancer Genomics

– NGS applications in cancer research

– Typical NGS workflows and pipeline

– Open source software with GUI

• Pathway Analysis and Software

– Pathway Analysis goals and concepts

– Commercial and open source pathway analysis software

• Data analysis resources

• Summary

Page 3

Next Generation Sequencing

Massively Parallel Sequencing: One can generate hundreds of millions of short sequences (up to 250 bp) in a single run, in a short period of time and at a low per-base cost.

• Illumina/Solexa GA II, HiSeq 2500, 3000, X

• Roche/454 FLX, Titanium

• Life Technologies/Applied Biosystems SOLiD

Reviews:

Michael Metzker (2010) Nature Reviews Genetics 11:31

Quail et al. (2012) BMC Genomics Jul 24;13:341

Page 4

NGS in Cancer Genomics

Shyr et al. 2013

Page 5

Data Analysis is the bottleneck

(wall.hms.harvard.edu)

Informatics

Page 6

Basic NGS Workflow

Olson et al.

Steps labeled in the figure:

• Isolation of material

• PCR amplification

• End repair, size selection

• Library QC

• Cluster generation

• Instrument operation

• QC and pipeline analysis

• Data interpretation

Page 7

High Throughput Data Analysis Overview

Olson et al.

Page 8

Typical Data Analysis Pipelines

Many Analysis Pipelines Start with Read Mapping

Genotyping (GATK): http://www.broadinstitute.org/gsa/wiki/images/7/7a/Overall_flow.jpg http://www.broadinstitute.org/gatk/guide/topic?name=intro

RNA-seq (Tuxedo): http://www.nature.com/nprot/journal/v7/n3/full/nprot.2012.016.html

Page 9

Cancer NGS Data Analysis Pipeline: Software

(Columns in the figure: Data, Task, Software)

Raw reads

  ↓ FASTQC, FASTX-toolkit, Trimmomatic

Analysis-ready reads

  ↓ BWA, STAR, ……

Mapped reads

  → Visualization (IGV, IGB, UCSC GB, ……)

……
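The first stages of this pipeline (raw reads, QC, mapping) can be sketched as a short Python helper that only assembles the command lines. This is an illustration under stated assumptions, not the pipeline used in the talk: the function name, sample name, file names and output directory are hypothetical, the trimming step (FASTX-toolkit or Trimmomatic) is left out, and the reference is assumed to have been indexed with `bwa index` beforehand.

```python
from pathlib import Path


def build_pipeline_commands(sample, fastq_r1, fastq_r2, reference,
                            outdir="results", threads=4):
    """Return the commands for QC, mapping, sorting and indexing.

    Nothing is executed here; the caller decides how to run each command.
    All names and paths are illustrative assumptions.
    """
    out = Path(outdir)
    sam = out / f"{sample}.sam"
    bam = out / f"{sample}.sorted.bam"
    return [
        # Raw reads: per-file quality report written to the output directory
        ["fastqc", "-o", str(out), fastq_r1, fastq_r2],
        # Reads -> mapped reads: paired-end alignment with BWA-MEM, SAM written via -o
        ["bwa", "mem", "-t", str(threads), "-o", str(sam),
         reference, fastq_r1, fastq_r2],
        # Coordinate-sort the alignments into BAM
        ["samtools", "sort", "-@", str(threads), "-o", str(bam), str(sam)],
        # Index the sorted BAM so that viewers such as IGV can load it
        ["samtools", "index", str(bam)],
    ]


if __name__ == "__main__":
    for cmd in build_pipeline_commands("tumor", "tumor_R1.fastq.gz",
                                       "tumor_R2.fastq.gz", "ref.fa"):
        print(" ".join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed and the output directory exists; keeping command construction separate from execution makes the steps easy to inspect before running them.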

Page 10

Cancer NGS Application Specific Software

Mapped reads

Cufflinks, MISO, DESeq2, GATK

MACS2, SISSRs

Bismark, BS Seeker

SomaticSniper, VarScan2, MuTect, FreeBayes, Pindel, CNVnator

……

Page 11

Open Source Software with GUI

Galaxy: Web-based platform for analysis of large datasets

http://hpc-galaxy.oit.uci.edu/root

https://main.g2.bx.psu.edu/

https://usegalaxy.org/

GENE-E: Java-based matrix visualization and analysis platform; includes heatmaps, clustering, filtering, etc.

http://www.broadinstitute.org/cancer/software/GENE-E

Page 12

Commercial software for NGS analysis

• Easy to use, no command line skills required

• Usually platform independent

• Little to no learning curve

o Limited flexibility

o Harder to publish

Page 13

Outline

• Introduction to NGS data analysis in Cancer Genomics

– NGS applications in cancer research

– Typical NGS workflows and pipeline

– Open source software with GUI

• Pathway Analysis and Software

– Pathway Analysis goals and concepts

– Commercial and open source pathway analysis software

• Data analysis resources

• Summary

Page 14

Why Pathway Analysis

• Logical next step in any high-throughput experiment

• Goal: to characterize the biological meaning of joint changes in gene expression

• Why? Often, groups of genes with related functions change together

Page 15

Pathway and Network Analysis

Pathway Analysis Methods:

• Functional category over-representation: discrete test for significance (BiNGO, DAVID, IPA, etc.)

• Continuous test (GSEA, PAGE)

• Signaling Pathway Impact Analysis (iPathwayGuide)
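To make the "continuous test" idea concrete, here is a minimal Python sketch of a GSEA-style running-sum enrichment score. It is a simplified illustration and not the GSEA software itself: the function name and toy inputs are hypothetical, the genes are assumed to be already ranked by a score such as fold change, and the permutation step that GSEA uses to assess significance is omitted.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted running-sum enrichment score over a ranked gene list.

    ranked_genes: gene names, ordered from most up- to most down-regulated
    scores:       ranking metric for each gene, in the same order
    gene_set:     set of gene names belonging to the pathway
    p:            weight exponent applied to |score| for genes in the set
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering it")

    # Genes in the set step the sum up in proportion to |score|^p
    weights = [abs(s) ** p if h else 0.0 for s, h in zip(scores, hits)]
    norm = sum(weights)
    if norm == 0:
        raise ValueError("all genes in the set have a zero score")

    running, best = 0.0, 0.0
    for w, h in zip(weights, hits):
        # Genes outside the set step the sum down by a constant amount
        running += w / norm if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the largest deviation from zero
    return best


if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4"]
    metric = [4.0, 3.0, 2.0, 1.0]
    print(enrichment_score(genes, metric, {"g1", "g2"}))
```

Unlike the discrete over-representation test, no cutoff is needed to define a gene list: every gene contributes through its position in the ranking.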

Network Analysis: (WGCNA, Cytoscape etc)

Page 16

Functional Category Enrichment

• Discrete tests: enrichment for groups in gene lists

– Select gene list at some predefined cutoff

– For each gene list and functional category, cross-tabulate to get a 2×2 contingency table

– Test for significance using Fisher's exact test

– FDR correction for multiple hypothesis testing

                      Differentially   Not differentially   Total
                      expressed        expressed
In the pathway        a                b                    a+b
Not in the pathway    c                d                    c+d
Total                 a+c              b+d                  n
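The test on the 2×2 table above can be sketched in a few lines of standard-library Python. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the test is taken to be one-sided (over-representation only), and Benjamini-Hochberg is assumed as the FDR procedure, since the slide does not name one.

```python
from math import comb


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact test for over-representation.

    a, b, c, d follow the table: a = DE and in the pathway, b = not DE and in
    the pathway, c = DE and not in the pathway, d = neither.
    Returns P(X >= a) under the hypergeometric null.
    """
    in_pathway = a + b
    de = a + c
    n = a + b + c + d
    max_a = min(in_pathway, de)
    # Sum the probabilities of all tables at least as extreme as the observed one
    tail = sum(comb(in_pathway, k) * comb(n - in_pathway, de - k)
               for k in range(a, max_a + 1))
    return tail / comb(n, de)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # One hypothetical 2x2 table per functional category
    tables = {"pathway_A": (12, 38, 88, 1862), "pathway_B": (3, 47, 97, 1853)}
    pvals = [fisher_enrichment_p(*t) for t in tables.values()]
    for name, p, q in zip(tables, pvals, benjamini_hochberg(pvals)):
        print(name, p, q)
```

In practice one table is built per functional category, so the correction runs over as many p-values as there are categories tested.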

Page 17

Functional Categories in Pathway Analysis

• Gene Ontology

– Biological Process

– Molecular Function

– Cellular Component

• Pathway Databases

– KEGG

– BioCarta

– Broad Institute (MSigDB)

– Commercial knowledge bases such as IPA

• Other

– Transcription factor targets

– Protein complexes

– Self-defined

Page 18

Commercial and Open Source Pathway Analysis Software

Page 19

Ingenuity Pathway Analysis Tool

Page 20

IPA Input file

Page 21

IPA results page

Page 22

Resources in NGS data analysis

Public forums:

Computational resources available at UCI:

• HPC: open source software

• CLCbio, IPA, JMP Genomics…

Page 23

Summary

• NGS technologies are transforming cancer research

• Data analysis is a crucial part of NGS applications

• Pathway analysis concepts and software

• Data analysis resources

Thank you!