pattern recognition in clinical data

Post on 10-May-2015

INTRODUCTION SIGNIFICANT MUTATIONS VIRAL GENOME DETECTION REPRODUCIBILITY CONCLUSIONS

Pattern Recognition In Clinical Data

Saket Choudhary

Dual Degree Project

Guide: Prof. Santosh Noronha

C G C A T C G A G C T

C G C G T C G A G C T

October 30, 2013

INTRODUCTION
- Objective

SIGNIFICANT MUTATIONS
- Next Generation Sequencing
- Motivation
- Computational Methods for Driver Detection

VIRAL GENOME DETECTION
- Workflow

REPRODUCIBILITY
- Reproducibility

CONCLUSIONS
- Wrapping up

OBJECTIVE

Next Generation Sequencing & Cancer Research

- Tools for Biologists, Reproducible analysis
  - Driver & Passenger Mutation Detection: Galaxy Tools
  - Viral Genome Integration: Galaxy Workflow
- Better/New Algorithms
  - Benchmarking Alignment tools: BWA v/s BWA-PSSM
  - Driver & Passenger Mutation Detection: Ensembl Method, Lit. Survey

NEXT GENERATION SEQUENCING

C G C G T C G A G C T A G C A

G C G T∗ C G A G∗ C T A G∗

A G C G C G T C G A G C T A G C A C A

NGS: WHY?

- Molecular Approach: Study of variations at the 'base' level
- Low Cost: $1000 genome
- Faster: Quicker than traditional sequencing techniques

NGS: WHERE?

- Study variations, genotype-phenotype association
- Look for 'markers of diseases'
- Prognosis

NGS: MUTATIONS

- 3 × 10^9 base pairs
- We are all 99.9% similar at the DNA level
- More than 2 million SNPs
- No particular pattern of SNPs
- If a mutation causes a change in an amino acid, it is referred to as non-synonymous (nsSNV)
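
The non-synonymous definition can be made concrete with a minimal sketch. It assumes the standard genetic code and a variant given as a reference codon, a 0-based position within that codon, and the alternate base. The function name `classify_snv` is hypothetical, and a change to a stop codon is simply reported as non-synonymous here.

```python
# Standard genetic code, with codons enumerated in base order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_snv(ref_codon, position, alt_base):
    """Classify a single-base change inside one codon.

    position is 0-based within the codon. Returns 'synonymous' when the
    encoded amino acid is unchanged, otherwise 'non-synonymous'.
    """
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:position] + alt_base.upper() + ref_codon[position + 1:]
    same_residue = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same_residue else "non-synonymous"


# GAA and GAG both encode glutamate; GTA encodes valine.
print(classify_snv("GAA", 2, "G"))
print(classify_snv("GAA", 1, "T"))
```

In practice the codon would be derived from a gene annotation and the variant's genomic position, which this sketch leaves out.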

DRIVERS AND PASSENGERS I

Cancer is known to arise due to mutations. Not all mutations are equally important!

Somatic Mutations: Set of mutations acquired after zygote formation, over and above the germline mutations

Driver Mutations: Mutations that confer growth advantages to the cell, being selected positively in the tumor tissue

DRIVERS AND PASSENGERS

Drivers are NOT simply loss-of-function mutations, but more than that:

- Loss of function: Inactivate tumor suppressor proteins
- Gain of function: Activate normal genes, transforming them into oncogenes
- Drug Resistance Mutations: Mutations that have evolved to overcome the inhibitory effect of drugs

DRIVER MUTATIONS: WHY?

Identify driver mutations → better therapeutic targets
But how does one zero in on the exact set? → Experiments are too costly, probably infeasible for 2 million+ SNPs → Leverage computational analysis

- The low cost of NGS comes with a heavier roadblock of data analysis
- Searching among 2 million+ SNPs is a non-trivial and computationally intensive problem
- Software tools have a low consensus ratio among themselves ↔ Defining a driver computationally is non-trivial
- However, there is no tool that allows one to visualise the results on an input across the cohort of tools

MACHINE LEARNING I

Two datasets:

- Training: Labeled dataset, containing a table of features with mutations labelled as "drivers/passengers"
- Test: 'Learning' from the training dataset, test the prediction model

Table: Training Dataset

Chromosome  Position  Ref  Alt  Type
1           27822     A    G    Driver
1           27832     T    G    Driver
2           47842     G    C    Passenger
...         ...       ...  ...  ...

MACHINE LEARNING II

Table: Test Dataset

Chromosome  Position  Ref  Alt  Type
1           27824     A    G    ?
1           47832     T    G    ?

MACHINE LEARNING: FEATURE SELECTION I

Machine Learning relies on a set of features for training. Redundant features should be avoided. CHASM [1] makes use of information-theoretic measures for this, defined as follows.

$p(X_i)$ represents the probability of occurrence of an event $X_i$. Considering a series of events $X_1, X_2, X_3, \ldots, X_n$, analogous to a 'series of packets' in communication theory, the information received at each step can be quantified on a log scale by:

$$\log_2 \frac{1}{p(X_i)} = -\log_2 p(X_i) \tag{1}$$

The expected value of information from a series of events is called the Shannon entropy, $H(X)$:

$$H(X) = -\sum_i p(X_i) \log_2 p(X_i) \tag{2}$$
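
As a small illustration of Equation (2), the sketch below estimates the Shannon entropy of a list of observed events, using relative frequencies as the probabilities. The function name `shannon_entropy` and the example inputs are illustrative choices, not part of the original slides.

```python
import math
from collections import Counter


def shannon_entropy(events):
    """Estimate H(X) in bits from a list of observed events.

    Probabilities are taken to be the relative frequencies of each
    distinct event in the list.
    """
    counts = Counter(events)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Two equally likely outcomes carry one bit; four carry two bits.
print(shannon_entropy(["A", "G"]))
print(shannon_entropy(["A", "C", "G", "T"]))
```

A list containing only one distinct event has zero entropy, since observing it conveys no information.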

MACHINE LEARNING: FEATURE SELECTION II

Mutual Information between two random variables $X$, $Y$ is defined as the amount of information gained about random variable $X$ due to additional information gained from the second, $Y$:

$$I(X,Y) = H(X) - H(X|Y) \tag{3}$$

Here:
$X$: Class Label [Driver/Passenger]
$Y$: Predictive Feature
and hence $I(X,Y)$ represents how much information was gained about the class label $X$ from knowledge of a feature $Y$.

Simplifying:

$$I(X,Y) = \sum_{x,y} p(x,y) \log_2 \frac{p(x,y)}{p(x)\,p(y)} \tag{4}$$
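
Equation (4) can be evaluated directly from paired observations. The sketch below assumes a discrete feature and estimates all probabilities from counts. The function name `mutual_information` and the toy labels and feature values are hypothetical.

```python
import math
from collections import Counter


def mutual_information(labels, feature):
    """Estimate I(X, Y) in bits from paired observations.

    labels  : class label per mutation (X), e.g. 'Driver' / 'Passenger'
    feature : discrete feature value per mutation (Y)
    """
    n = len(labels)
    joint = Counter(zip(labels, feature))
    p_x = Counter(labels)
    p_y = Counter(feature)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((p_x[x] / n) * (p_y[y] / n)))
    return mi


# Hypothetical toy data: the first feature determines the label, the second does not.
labels = ["Driver", "Driver", "Passenger", "Passenger"]
informative = ["high", "high", "low", "low"]
uninformative = ["a", "b", "a", "b"]
print(mutual_information(labels, informative))
print(mutual_information(labels, uninformative))
```

A feature with mutual information near zero tells the classifier little about the class label, so ranking features by this score is one way to select among them.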

FUNCTIONAL IMPACT I

- If a mutation confers an advantage to the cell in terms of replication rate, it is likely to be selected, while mutations that reduce fitness have a higher chance of being eliminated from the population.
- Certain residues in a multiple sequence alignment (MSA) of homologous sequences are more conserved than others. Mutating a highly conserved residue is likely to be costly, since what had 'evolved' is disturbed!
- Scores can be assigned based on this "conservation" parameter.

FUNCTIONAL IMPACT II

Figure: SIFT algorithm

Some of the common tools/algorithms used for driver mutation prediction:

- SIFT
- Polyphen
- Mutation Assessor
- TransFIC
- Condel

FRAMEWORK FOR COMPARING VARIOUS TOOLS I

- Different tools use different formats and give different outputs for similar input
- Running analysis on multiple tools → keep shifting data formats
- Concordance?

Polyphen2 Input

chr1:888659 T/C
chr1:1120431 G/A
chr1:1387764 G/A
chr1:1421991 G/A
chr1:1599812 C/T
chr1:1888193 C/A
chr1:1900186 T/C

FRAMEWORK FOR COMPARING VARIOUS TOOLS II

SIFT Input

1,888659,T,C
1,1120431,G,A
1,1387764,G,A
1,1421991,G,A
1,1599812,C,T
1,1888193,C,A
1,1900186,T,C
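
The two input formats shown carry the same four fields, so converting between them is mechanical. The sketch below is based only on the example lines in these slides. The function names are hypothetical, and real tool inputs may accept or require extra fields that are not handled here.

```python
def polyphen2_to_sift(line):
    """Convert 'chr1:888659 T/C' into '1,888659,T,C'."""
    location, alleles = line.split()
    chrom, pos = location.split(":")
    ref, alt = alleles.split("/")
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return ",".join([chrom, pos, ref, alt])


def sift_to_polyphen2(line):
    """Convert '1,888659,T,C' into 'chr1:888659 T/C'."""
    chrom, pos, ref, alt = line.strip().split(",")
    return "chr{}:{} {}/{}".format(chrom, pos, ref, alt)


print(polyphen2_to_sift("chr1:888659 T/C"))
print(sift_to_polyphen2("1,1120431,G,A"))
```

Keeping one canonical representation and converting at the boundary of each tool avoids hand-editing files every time a new tool is added to the comparison.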

DRIVER MUTATIONS: TOOLS DON'T AGREE

Figure: X axis: Condel score; Y axis: MA score
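
Disagreement between two tools can be quantified once each tool's scores are reduced to a yes/no call. The sketch below is illustrative only: the scores and the cutoffs (0.5 and 2.0) are made up, and are neither the tools' recommended thresholds nor results from this project.

```python
def binarize(scores, threshold):
    """Turn per-variant scores into True/False 'predicted damaging' calls."""
    return {variant: score >= threshold for variant, score in scores.items()}


def concordance(calls_a, calls_b):
    """Fraction of shared variants on which two tools make the same call."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        return 0.0
    agree = sum(1 for variant in shared if calls_a[variant] == calls_b[variant])
    return agree / len(shared)


# Hypothetical scores for three variants; the values are invented for illustration.
condel_scores = {"chr1:888659": 0.9, "chr1:1120431": 0.2, "chr1:1387764": 0.7}
ma_scores = {"chr1:888659": 3.5, "chr1:1120431": 2.5, "chr1:1387764": 0.5}

condel_calls = binarize(condel_scores, threshold=0.5)  # assumed cutoff
ma_calls = binarize(ma_scores, threshold=2.0)          # assumed cutoff
print(concordance(condel_calls, ma_calls))
```

The result depends strongly on the chosen cutoffs, which is one reason a computational definition of a driver is hard to pin down.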

Solution? Galaxy, an open source web-based platform for bioinformatics, makes it possible to represent the entire data analysis pipeline in an intuitive graphical interface

Figure: Galaxy workflow for the Polyphen2 algorithm

Run all tools in one go:

Figure: Run all tools

Compare all tools:

Figure: Compare all tools

VIRAL GENOME DETECTION

Cervical cancers have been proven to be associated with Human Papillomavirus (HPV). Cervical cancer datasets from Indian women were put through an analysis to detect:

1. Any possible HPV integration
2. Sites of HPV integration

Who Cares?

- Whole genome sequencing can be replaced by targeted sequencing at the sites where these viruses have been detected in a cohort of samples, thus speeding up the whole process.

1. Raw Data
2. Align data to human genome reference
3. Extract unmapped regions
4. Align unmapped regions to Virus genome
5. Extract mapped regions
6. BLAST
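
The two "extract" steps of this workflow amount to filtering alignment records by mapping status. The sketch below assumes alignments in SAM text format, where bit 0x4 of the FLAG field marks an unmapped read. The aligner itself (for example BWA, which is named earlier in the slides) is treated as an external step and is not invoked. The read names, the reference name `HPV16` and the function name `split_by_mapping` are hypothetical.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def split_by_mapping(sam_lines):
    """Return (mapped_read_names, unmapped_read_names) from SAM text lines."""
    mapped, unmapped = [], []
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        read_name, flag = fields[0], int(fields[1])
        if flag & UNMAPPED_FLAG:
            unmapped.append(read_name)
        else:
            mapped.append(read_name)
    return mapped, unmapped


# Toy alignment against the human reference: read2 did not map.
human_sam = [
    "@HD\tVN:1.6",
    "read1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGCA\tIIIII",
]
_, unmapped_in_human = split_by_mapping(human_sam)

# Toy result of re-aligning the unmapped reads against a viral reference.
viral_sam = ["read2\t0\tHPV16\t2000\t60\t5M\t*\t0\t0\tTTGCA\tIIIII"]
candidate_viral_reads, _ = split_by_mapping(viral_sam)
print(unmapped_in_human, candidate_viral_reads)
```

Reads that fail to map to the human genome but map to the viral genome are the candidates passed on to BLAST for confirmation.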

GALAXY WORKFLOW

Figure: Aligned HPV genomes

REPRODUCIBILITY

- In pursuit of novel 'discovery', standardizing the data analysis pipeline is often ignored, leading to dubious conclusions
- Analysis should be reproducible and, above all, correct
- Parameter values can change the results by a big factor; they need to be documented/logged
- Garbage in, Garbage out
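
Logging parameters can be as simple as writing one structured record per analysis step. The sketch below is independent of Galaxy and only illustrates the idea. The record fields, the function names and the example tool name and parameters are all hypothetical.

```python
import hashlib
import json


def provenance_record(tool, version, params, input_bytes):
    """Describe one analysis step: which tool, which parameters, which input."""
    return {
        "tool": tool,
        "version": version,
        "params": dict(sorted(params.items())),
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
    }


def to_log_line(record):
    """Serialise a record deterministically so that identical runs log identically."""
    return json.dumps(record, sort_keys=True)


# Hypothetical step: the tool name, version and parameters are placeholders.
record = provenance_record(
    tool="example_aligner",
    version="0.0.0",
    params={"seed_length": 19, "threads": 4},
    input_bytes=b"ACGTACGT",
)
print(to_log_line(record))
```

Hashing the input ties each logged parameter set to the exact data it was run on, so a later rerun can confirm that nothing upstream has changed.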

CONCLUSIONS

With the Galaxy toolbox for identification of significant mutations in place, and the science behind the methods studied, the next steps would be to:

- Open source the toolbox to the community: A tool makes little sense if it is not in a usable form. Community feedback will be used to add more tools and improve the existing ones
- A new method for driver mutation prediction: All the existing methods have a low level of concordance. A new method that takes into account the available data at all levels (mutations, transcriptome and microarray data) is possible. With the Galaxy toolbox in place, it would be possible to integrate information at various levels

FUTURE WORK

- Develop an algorithm that integrates the machine learning approach with the functional approach by zeroing in on only those attributes that are known to have an impact
- The algorithm would also account for information at other levels: RNA expression, clinical data
- Integrating information at all levels would provide a deeper insight
- The developed Galaxy toolbox will be used as the basic framework for integrating information

REFERENCES

[1] Hannah Carter, Sining Chen, Leyla Isik, Svitlana Tyekucheva, Victor E Velculescu, Kenneth W Kinzler, Bert Vogelstein, and Rachel Karchin. Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations. Cancer Research, 69(16):6660–6667, 2009.
