A Combinatorial Approach to the Analysis of Differential Gene Expression Data: The Use of Graph Algorithms for Disease Prediction and Screening

Upload: gage-macias

Post on 30-Dec-2015



Page 1: A Combinatorial Approach to the Analysis of Differential Gene Expression Data

A Combinatorial Approach to the Analysis of Differential Gene Expression Data

The Use of Graph Algorithms for Disease Prediction and Screening

Page 2: A Combinatorial Approach to the Analysis of Differential Gene Expression Data

The Goal

• To classify patients based on expression profiles

– Presence of cancer

– Type of cancer

– Response to treatment

• To identify the genes required for accurate classification

– Too many = unnecessary noise

– Too few = insufficient information

Page 3: A Combinatorial Approach to the Analysis of Differential Gene Expression Data

Classic Clustering Problem

• Current techniques:

– Hierarchical Clustering

– K-Means Clustering

– Self-Organizing Maps

– Others

• Drawbacks:

– Determining cluster boundaries is difficult with diffuse data

– Objects can only belong to one group

Page 4: A Combinatorial Approach to the Analysis of Differential Gene Expression Data

[Flowchart, "Algorithmic Training". Boxes: Raw Data; Eliminate Poorly Discriminating Genes; Eliminate Poorly Covering Genes; Calculate Sample Similarities; Apply Threshold; Set of Discriminatory Genes; Gene Scores; Verify by Classification. Technique labels: Gene Scoring; Dominating Set; Maximal Cliques.]

Page 5: A Combinatorial Approach to the Analysis of Differential Gene Expression Data

[Flowchart, "Algorithmic Training". Boxes: Raw Data; Eliminate Poorly Discriminating Genes.]

Page 6: A Combinatorial Approach to the Analysis of Differential Gene Expression Data

The Gene Scoring Function: Identifying Discriminators

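[Figure: two example plots compared ("vs."), with x-axes running 0 to 10 and 0 to 8. The accompanying formula for score(gene_i) is written in terms of m_classA and m_classB plus further per-class terms for classA and classB.]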
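The scoring formula is not legible in this transcript, so the following is only a minimal sketch under an assumed form: the gap between the two class means minus the sum of the two class standard deviations. The function name `gene_score` and this exact formula are assumptions for illustration, not taken from the slides.

```python
from statistics import mean, pstdev


def gene_score(class_a, class_b):
    """ASSUMED score: gap between class means minus the summed class spreads.

    A large positive value means the two classes are far apart relative
    to how much each class varies internally.
    """
    gap = abs(mean(class_a) - mean(class_b))
    spread = pstdev(class_a) + pstdev(class_b)
    return gap - spread


# A gene whose expression separates the classes cleanly ...
well_separated = gene_score([1.0, 1.2, 0.8], [6.0, 6.2, 5.8])
# ... versus a gene whose class distributions overlap completely.
overlapping = gene_score([1.0, 5.0, 9.0], [2.0, 5.0, 8.0])
```

Under this assumed form, a discriminating gene scores well above zero, while a gene with overlapping class distributions scores below zero.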

Page 9: A Combinatorial Approach to the Analysis of Differential Gene Expression Data

[Flowchart, "Algorithmic Training". Boxes: Raw Data; Eliminate Poorly Discriminating Genes; Eliminate Poorly Covering Genes.]

Page 10: A Combinatorial Approach to the Analysis of Differential Gene Expression Data

Eliminate Poorly Covering Genes

[Figure: diagram relating Samples and Genes, with Class 1 and Class 2 marked.]

Page 11: A Combinatorial Approach to the Analysis of Differential Gene Expression Data

[Flowchart, "Algorithmic Training". Boxes: Raw Data; Eliminate Poorly Discriminating Genes; Eliminate Poorly Covering Genes; Calculate Sample Similarities; Apply Threshold.]

Page 12: A Combinatorial Approach to the Analysis of Differential Gene Expression Data

Create Unweighted Graph

• Complete, edge-weighted graph

– Vertices = samples

– Edge weight = similarity metric

• Remove edge weights

– If edge weight < threshold, remove edge from graph

– Otherwise, keep edge, ignore weight

• Result: incomplete unweighted graph
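The thresholding step can be sketched as follows. This is a minimal illustration, assuming the similarities are held in a symmetric matrix; the function name `threshold_graph` and the example weights are hypothetical.

```python
def threshold_graph(weights, threshold):
    """Turn a symmetric weight matrix into an unweighted adjacency map.

    weights[j][k] is the similarity of samples j and k. An edge is removed
    when its weight is below `threshold`; otherwise the edge is kept and
    its weight is ignored from then on.
    """
    n = len(weights)
    adjacency = {j: set() for j in range(n)}
    for j in range(n):
        for k in range(j + 1, n):
            if weights[j][k] >= threshold:
                adjacency[j].add(k)
                adjacency[k].add(j)
    return adjacency


# Hypothetical similarities among four samples.
weights = [
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.7, 0.2],
    [0.8, 0.7, 0.0, 0.3],
    [0.1, 0.2, 0.3, 0.0],
]
graph = threshold_graph(weights, threshold=0.5)
```

With a threshold of 0.5, samples 0, 1, and 2 stay mutually connected and sample 3 becomes isolated.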

Page 13: A Combinatorial Approach to the Analysis of Differential Gene Expression Data

The Edge Weight Function

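$$w_{jk} = \sum_{i} \mathrm{score}(\mathrm{gene}_i)\,\bigl(1 - \lvert \mathrm{expression\_value}_{ij} - \mathrm{expression\_value}_{ik} \rvert\bigr)$$

where expression_value_{ij} = expression value of gene_i for sample_j, and w_{jk} is the weight of the edge between sample_j and sample_k.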
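A minimal sketch of this edge weight, assuming it is the sum over genes of score(gene_i) times (1 minus the absolute difference of the two samples' expression values), with expression values normalized to [0, 1]. The function name `edge_weight` and the example data are hypothetical.

```python
def edge_weight(scores, expression, j, k):
    """Score-weighted similarity of samples j and k (assumed form).

    scores[i] is the score of gene i; expression[i][j] is the expression
    value of gene i for sample j, assumed to be normalized to [0, 1].
    """
    return sum(
        score * (1.0 - abs(row[j] - row[k]))
        for score, row in zip(scores, expression)
    )


scores = [2.0, 1.0]
expression = [
    [0.1, 0.2, 0.9],  # gene 0 across samples 0..2
    [0.5, 0.5, 0.0],  # gene 1 across samples 0..2
]
w01 = edge_weight(scores, expression, 0, 1)  # similar samples
w02 = edge_weight(scores, expression, 0, 2)  # dissimilar samples
```

Samples with close expression values on high-scoring genes receive the larger weight, so they are the pairs most likely to keep their edge after thresholding.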

Page 17: A Combinatorial Approach to the Analysis of Differential Gene Expression Data

[Flowchart, "Algorithmic Training". Boxes: Raw Data; Eliminate Poorly Discriminating Genes; Eliminate Poorly Covering Genes; Calculate Sample Similarities; Apply Threshold; Set of Discriminatory Genes; Gene Scores; Verify by Classification.]

Page 18: A Combinatorial Approach to the Analysis of Differential Gene Expression Data

What is a Clique?

• A completely connected subset of vertices in a graph

• Maximal clique = local optimization

• Finding a maximum clique is NP-hard (the decision version is NP-complete)
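To make the definition concrete, here is a minimal sketch that enumerates maximal cliques with the basic Bron-Kerbosch recursion. The choice of Bron-Kerbosch is an assumption for illustration; the slides do not say which clique algorithm the group used, and the example graph is hypothetical.

```python
def maximal_cliques(adjacency):
    """Enumerate all maximal cliques with the basic Bron-Kerbosch recursion.

    adjacency maps each vertex to the set of its neighbours.
    """
    cliques = []

    def expand(current, candidates, excluded):
        # No vertex can extend `current`, and none was excluded earlier:
        # the clique is maximal.
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for v in list(candidates):
            expand(current | {v},
                   candidates & adjacency[v],
                   excluded & adjacency[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adjacency), set())
    return sorted(cliques)


# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cliques = maximal_cliques(graph)
```

Here the triangle {0, 1, 2} and the edge {2, 3} are both maximal, since neither can be extended, but only the triangle is a maximum clique.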

Page 19: A Combinatorial Approach to the Analysis of Differential Gene Expression Data

Classification Using Clique

[Figure: sample graph with groups labeled Class 1, Class 2, and Class 3.]

Page 20: A Combinatorial Approach to the Analysis of Differential Gene Expression Data

A Selection of Discriminators

Symbol | Name | Function
ADH1B | alcohol dehydrogenase IB | alcohol dehydrogenase activity
FHL1 | four and a half LIM domains 1 | cell growth, cell differentiation
HBB | hemoglobin, beta | oxygen transport
CYP4B1 | cytochrome P450 4B1 | electron transport
TNA | tetranectin | plasminogen binding protein
TGFBR2 | transforming growth factor, beta receptor II | transmembrane receptor protein serine/threonine kinase signaling pathway

Page 21: A Combinatorial Approach to the Analysis of Differential Gene Expression Data

The Algorithm - Unsupervised

[Flowchart. Boxes: Raw Data; Set of Discriminatory Genes, Scores; Calculate Sample Similarities; Apply Threshold; Classify Unknown Samples.]

Page 25: A Combinatorial Approach to the Analysis of Differential Gene Expression Data

Summary

• Intersection of clique and dominating set techniques improves results

• Combined orthogonal scoring identifies limited number of discriminatory genes

• Clique offers means of validating obtained scores and weights

• Our technique identifies a different set of discriminatory genes than the original paper

• Clique-based classification a viable complement to present clustering methods

Page 26: A Combinatorial Approach to the Analysis of Differential Gene Expression Data

Ongoing and Future Research

• Reverse Training

• Train to distinguish among types of cancer

• Experiment with different weight functions (e.g., Pearson’s coefficient)

• Investigate using less stringent techniques

– Near-cliques

– Neighborhood search

– K-dense subgraphs

• Port codes to SGI Altix supercomputer
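As an illustration of the alternative weight function mentioned above, the sketch below computes Pearson's correlation coefficient between two samples' expression vectors. How it would be plugged into the pipeline is not stated in the slides; the function name `pearson` and the sample values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical expression values of three genes in three samples.
sample_j = [0.1, 0.4, 0.9]
sample_k = [0.2, 0.5, 1.0]  # same profile as sample_j, shifted upward
sample_l = [0.9, 0.4, 0.1]  # roughly the reverse profile

r_jk = pearson(sample_j, sample_k)
r_jl = pearson(sample_j, sample_l)
```

Unlike an absolute-difference measure, this coefficient ignores a constant offset between two samples and ranges from -1 to 1, so any threshold would need to be chosen on that scale.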

Page 27: A Combinatorial Approach to the Analysis of Differential Gene Expression Data

Our Research Group

Mike Langston, Ph.D.

Lan Lin

Chris Symons

Xinxia Peng

Bing Zhang, Ph.D.
