day2 goeman2 testing groups 2012

Upload: anonymous-pke8zox

Post on 03-Apr-2018

216 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    1/33

    Introduction Aggregation Methods Conclusion

    Testing groups of genes

    Jelle Goeman

    Department of Medical Statistics

    Leiden University Medical Center

    RNAseq course

    25 October 2012

    Testing groups of genes Jelle Goeman

    http://find/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    2/33

    Introduction Aggregation Methods Conclusion

    Outline

    1 Introduction

    2 Aggregation

    3 Methods

    4 Conclusion

    Testing groups of genes Jelle Goeman

    http://find/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    3/33

    Introduction Aggregation Methods Conclusion

    Beyond differentially expressed genes

    edgeR, bayseq, BBseq

    Return a list of differentially expressed genes

    Biological theory is not about isolated genes

    Typical biological research questions and hypothesesAbout pathways

    About biological processes

    About areas of the genome

    About sets of related genes

    Testing groups of genes Jelle Goeman

    http://find/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    4/33

    Introduction Aggregation Methods Conclusion

    Beyond differentially expressed genes

    edgeR, bayseq, BBseq

    Return a list of differentially expressed genes

    Biological theory is not about isolated genes

    Typical biological research questions and hypothesesAbout pathways

    About biological processes

    About areas of the genome

    About sets of related genes

    Question

    How to analyze RNAseq data from a gene set perspective?

    Testing groups of genes Jelle Goeman

    http://find/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    5/33

    Introduction Aggregation Methods Conclusion

    Gene set testing: advantages

    More directly interpretable resultsStudy relevant processes directly, not via single genes

    Testing groups of genes Jelle Goeman

    http://find/http://goback/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    6/33

    Introduction Aggregation Methods Conclusion

    Gene set testing: advantages

    More directly interpretable resultsStudy relevant processes directly, not via single genes

    Reduce multiple testing load

    Fewer gene sets than genes

    Testing groups of genes Jelle Goeman

    I d i A i M h d C l i

    http://find/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    7/33

    Introduction Aggregation Methods Conclusion

    Gene set testing: advantages

    More directly interpretable resultsStudy relevant processes directly, not via single genes

    Reduce multiple testing load

    Fewer gene sets than genes

    Introduce biological knowledge in the analysis

    Make use of knowledge in pathway databases to improve power

    Testing groups of genes Jelle Goeman

    I t d ti A ti M th d C l i

    http://find/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    8/33

    Introduction Aggregation Methods Conclusion

    Gene set testing: advantages

    More directly interpretable resultsStudy relevant processes directly, not via single genes

    Reduce multiple testing load

    Fewer gene sets than genes

    Introduce biological knowledge in the analysis

    Make use of knowledge in pathway databases to improve power

    Find gene sets with consistent but small effectsOf which no single gene shows up in a single gene analysis

    Testing groups of genes Jelle Goeman

    Introduction Aggregation Methods Conclusion

    http://find/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    9/33

    Introduction Aggregation Methods Conclusion

    Gene set testing: advantages

    More directly interpretable resultsStudy relevant processes directly, not via single genes

    Reduce multiple testing load

    Fewer gene sets than genes

    Introduce biological knowledge in the analysis

    Make use of knowledge in pathway databases to improve power

    Find gene sets with consistent but small effectsOf which no single gene shows up in a single gene analysis

    More reproducible results

    Also across platforms

    Testing groups of genes Jelle Goeman

    Introduction Aggregation Methods Conclusion

    http://find/http://goback/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    10/33

    Introduction Aggregation Methods Conclusion

    This talk

    Methodology still under development

    Not many methods and packages available

    Focus on principles

    How should gene set testing be done?

    Parallels and differences with microarray gene set testing

    Many parallels, a few differences

    Testing groups of genes Jelle Goeman

    Introduction Aggregation Methods Conclusion

    http://find/http://goback/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    11/33

    Introduction Aggregation Methods Conclusion

    Outline

    1 Introduction

    2 Aggregation

    3 Methods

    4 Conclusion

    Testing groups of genes Jelle Goeman

    Introduction Aggregation Methods Conclusion

    http://find/http://goback/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    12/33

    Introduction Aggregation Methods Conclusion

    Testing at aggregated levels

    In abstract sense

    Gene set testing = testing at higher levels of aggregation

    Different ways of aggregatingAggregating data

    Aggregating evidence

    Aggregating evidenceWith or without taking direction of effect into account

    Testing groups of genes Jelle Goeman

    Introduction Aggregation Methods Conclusion

    http://find/http://goback/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    13/33

    gg g

    Data aggregation

    Subj. 1 Subj. 2 . . . Subj. n p-value

    exon 1 34 18 26

    exon 2 44 32 8...

    ......

    ...

    exon k 16 5 9

    whole gene 236 187 74 0.01

    Data aggregation

    Simplest way of aggregationAssumption: all exons have same differential expression

    Ignores between exon-variation

    Result dominated by exons with high coverage

    Testing groups of genes Jelle Goeman

    Introduction Aggregation Methods Conclusion

    http://find/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    14/33

    gg g

    Evidence aggregation

    Subj. 1 Subj. 2 . . . Subj. n p-value

    gene 1 34 18 26 0.13

    gene 2 44 32 8 0.20...

    ......

    ... ...

    gene k 16 5 9 0.04

    whole pathway 0.01

    Evidence aggregation

    More sophisticated way of aggregationGenes differential expression may be very different

    Direction of differential expression is lost

    In result all genes equally important

    Testing groups of genes Jelle Goeman

    Introduction Aggregation Methods Conclusion

    http://find/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    15/33

    What gene sets?

    Any externally defined set that has something in common

    Pathways

    KEGG, Biocarta

    Gene Ontology termsBiological process, Molecular function, Cellular component

    Chromosomal regions

    Chromosome arms, cytobands, linkage peaks, genes

    Published gene sets

    Predictive signatures, gene lists

    Testing groups of genes Jelle Goeman

    Introduction Aggregation Methods Conclusion

    http://find/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    16/33

    Outline

    1 Introduction

    2 Aggregation

    3 Methods

    4 Conclusion

    Testing groups of genes Jelle Goeman

    Introduction Aggregation Methods Conclusion

    http://find/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    17/33

    Two types of gene set testing methods

    Enrichment type methods

    Compare set of genes to other genes

    Keep subjects/samples fixed

    H0: this gene set behaves just like the rest of the genes

    Aggregate score methods

    Compare subjects to other (hypothetical) subjects

    Keep set of genes fixed

    H0: this gene set behaves like one that is not diff. expressed

    Testing groups of genes Jelle Goeman

    Introduction Aggregation Methods Conclusion

    http://find/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    18/33

    Enrichment: general idea

    Primary analysis

    First do a single gene analysis

    Gather the results (gene list, p-values, fold change. . . )

    Testing groups of genes Jelle Goeman

    Introduction Aggregation Methods Conclusion

    http://find/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    19/33

    Enrichment: general idea

    Primary analysis

    First do a single gene analysis

    Gather the results (gene list, p-values, fold change. . . )

    Secondary (post hoc) analysis

    Input: the results of the primary analysis

    Do genes in the gene set tend to have low p-values?

    Do the low p-values tend to belong to the same gene set?

    Testing groups of genes Jelle Goeman

    Introduction Aggregation Methods Conclusion

    http://find/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    20/33

    Enrichment: general idea

    Primary analysis

    First do a single gene analysis

    Gather the results (gene list, p-values, fold change. . . )

    Secondary (post hoc) analysis

    Input: the results of the primary analysis

    Do genes in the gene set tend to have low p-values?

    Do the low p-values tend to belong to the same gene set?

    Compare

    Searching for themes in the gene list

    Testing groups of genes Jelle Goeman

    Introduction Aggregation Methods Conclusion

    http://find/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    21/33

    Competitive method

    Null hypothesis

    Genes in the gene set are as often differentially expressed as genes

    outside

    Alternative hypothesis

    Genes in the gene set are more often differentially expressed as

    genes outside

    Competitive

    Null and alternative compare to genes outside gene set

    Testing groups of genes Jelle Goeman

    Introduction Aggregation Methods Conclusion

    http://find/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    22/33

    Fishers exact test: practicalities

    A 2 2 table

    Make a 2 2 table:diff. expr. gene non-diff. expr. gene

    in gene set

    not in gene set

    Test row/column association with Fishers exact test

    Equivalent variants

    2 test; hypergeometric test; binomial z-test

    Overview

    Khatri and Dhragici, Bioinformatics, 2005

    Testing groups of genes Jelle Goeman

    Introduction Aggregation Methods Conclusion

    http://find/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    23/33

    Urn model: randomly draw genes from an urn

    Not diff. expressed

    Diff. expressed

    Testing groups of genes Jelle Goeman

    Introduction Aggregation Methods Conclusion

    http://find/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    24/33

    Assumptions of enrichment

    Assumption: equal probabilities of being called diff. expressed

    Violation: more power for long genes (coverage)

    Consequence: gene sets with long genes called enriched

    Assumption: equal probabilities of being called diff. expressed

    Violation: more power highly expressed genes

    Consequence: gene sets high expression called enriched

    Assumption: independence of genes in gene setsViolation: correlation of expression within pathway likely

    Consequence: gene sets with dependence called enriched

    Testing groups of genes Jelle Goeman

    Introduction Aggregation Methods Conclusion

    http://find/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    25/33

    The goseq package

    goseq: Young, Wakefield, Smyth

    Adjust for length bias in enrichment (only)

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    qq

    q

    q

    qq

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    qq

    qq

    qq

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    qq

    q

    q

    q

    q

    q

    q

    qq

    q

    q

    q

    q

    qq

    q

    q

    q

    q

    qq

    q

    q q

    q

    q

    q

    q

    q

    q

    0 2000 4000 6000 8000 10000 12000 140

    0.0

    0

    0.0

    5

    0.1

    0

    0.1

    5

    0.2

    0

    0.2

    5

    Biased Datain200gene bins.

    ProportionDE

    Testing groups of genes Jelle Goeman

    Introduction Aggregation Methods Conclusion

    http://find/http://find/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    26/33

    Remedy the problems of the urn model

    A permutationRandomly reassign group labels to array data

    Permutation p-value

    Permute many timesCalculate enrichment score for each permutation

    Permutation P-value

    = percentage of permuted scores better than true score

    Why?

    Under permutations no reads are differentially expressed

    But correlations between reads are retained

    Testing groups of genes Jelle Goeman

    Introduction Aggregation Methods Conclusion

    http://find/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    27/33

    Permutation testing

    True data

    0 0 0 1 1 1

    x1,1 x1,2 x1,3 x1,4 x1,5 x1,6...

    ......

    ......

    ...

    xp,1 xp,2 xp,3 xp,4 xp,5 xp,6

    Random permutation of sample labels

    1 0 1 1 0 0

    x1,1 x1,2 x1,3 x1,4 x1,5 x1,6...

    ......

    ......

    ...

    xp,1 xp,2 xp,3 xp,4 xp,5 xp,6

    Testing groups of genes Jelle Goeman

    Introduction Aggregation Methods Conclusion

    http://find/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    28/33

    Permutation testing

    Advantages

    Almost assumption-free

    Always correct p-values

    No problems taking into account gene correlations

    Disadvantages

    Not possible if experimental design too complex

    Computationally very intensive

    Often there is no alternative

    If you want reliable p-values

    Testing groups of genes Jelle Goeman

    Introduction Aggregation Methods Conclusion

    http://find/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    29/33

    Aggregate score methods

    Null hypothesis

    There is no differential expression in the reads of the gene set

    Alternative hypothesis

    There is some differential expression in the reads of the gene set

    Self-contained

    Null and alternative do not involve genes outside gene set

    Testing groups of genes Jelle Goeman

    Introduction Aggregation Methods Conclusion

    http://find/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    30/33

    Analyzing RNA-seq data as a microarray

    Microarray methodsReduce read counts to single score (counts per million)

    Variance stabilize: voom

    Use methods designed for microarray data analysis

    Many R packages available for microarray data

    globaltest

    GlobalAncova

    . . .

    Disadvantage

    Ignores differences in coverage ( noise) between subjects/genes

    Testing groups of genes Jelle Goeman

    Introduction Aggregation Methods Conclusion

    http://find/http://goback/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    31/33

    Multiple testing

    Testing more than one gene set

    Multiple testing comes back into play

    use FDR (Benjamini and Hochberg)or Permutation-based: Westfall & Young

    Be selective if you dare

    The more gene sets tested, the less power per gene set

    Testing groups of genes Jelle Goeman

    Introduction Aggregation Methods Conclusion

    http://find/http://goback/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    32/33

    Outline

    1 Introduction

    2 Aggregation

    3 Methods

    4 Conclusion

    Testing groups of genes Jelle Goeman

    Introduction Aggregation Methods Conclusion

    http://find/
  • 7/28/2019 Day2 Goeman2 Testing Groups 2012

    33/33

    Work in progress

    Field under development

    Few methods available that are specific for RNA-seq

    Choice of methodology

    Lessons from microarray data analysis remain valid

    Prefer an aggregate score method

    Rely on permutations if you can afford them

    Testing groups of genes Jelle Goeman

    http://find/