TimesVector: A vectorized clustering approach to the analysis of time series transcriptome data from multiple phenotypes
Inuk Jung, Hongryul Ahn, Kyuri Jo, Hyejin Kang, Youngjae Yu and Sun Kim
Inuk Jung, Bio and Health Informatics Lab, Seoul National University


Page 1: TimesVector: A vectorized clustering approach to the ...admis.fudan.edu.cn/giw2016/slides/session-09/1-GIW2016_TimesVe… · expression patterns (SEP) Triclustering tool that is able


Page 2

Goal of this study: Identify biologically meaningful gene clusters (triclusters) that have significantly similar or differential expression patterns in 3-dimensional time-series data (Gene-Time-Condition)

Page 3

Goal of this study: example

Organism: Mouse (18,117 genes)
Time points: day 0, day 3, day 7, day 14
Conditions: malaria-infected intact female, gonadectomized* (gdx) female, intact male, gdx male

289,872 expression values (G×T×C)

[Figure: example clusters of 100, 80 and 200 genes showing differentially expressed patterns (DEP) and similarly expressed patterns (SEP)]

*Removal of the ovaries or testes
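The number of expression values quoted above is the product of the three dimensions of the gene × time × condition data:

```latex
|G| \times |T| \times |C| = 18117 \times 4 \times 4 = 289872
```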

Page 4

Two technical problem statements

1. High clustering complexity due to the number of dimensions

2. Technical difficulty in capturing differential expression patterns between two or more conditions (what are DEGs in time-series data?)

Page 5

P1. High clustering complexity due to the number of dimensions

• DEG analysis used for time-series analysis [1] (2000)
• Biclustering algorithm developed for time-series data [2] (2000)
• Does not take into account the sequential nature of time-series expression data
• Biclustering is NP-hard and is bound to 2-dimensional clustering (either gene-time or gene-condition)
• First triclustering algorithm, TriCluster [3] (2005): only able to identify triclusters with similar expression patterns (SEP)
• Triclustering tool that is able to identify DEPs, OPTricluster [4] (2012): the DEP identification process is based on similarity measures, which results in poor performance

[Figure labels: one dimension (C); two dimensions (GT or GC); three dimensions (GCT); three dimensions (GCT)]

[1] Alizadeh et al., Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling, Nature 2000
[2] Cheng et al., Biclustering of expression data, ISMB 2000
[3] Zhao et al., The Tricluster algorithm, ACM SIGMOD 2005
[4] Tchagang et al., The OPTricluster algorithm, BMC Bioinformatics 2012

Page 6

P2. Capturing differential expression patterns between two or more conditions

• OPTricluster performs two-group comparisons (one condition versus the rest) for detecting divergent expression pattern clusters
• In the case of four conditions A, B, C, D: A vs BCD, B vs ACD, C vs ABD, D vs ABC
• Hence A ≠ B ≠ C ≠ D is not supported
• Divergent pattern recognition, where the expression pattern differs between all conditions, is not available
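The limitation above can be made concrete with a small sketch (the function names are hypothetical and are not OPTricluster's API): each of the four one-versus-rest contrasts pools three conditions into a single "rest" group, so none of them says whether the pooled conditions differ from one another, whereas an all-distinct claim would involve all six condition pairs.

```python
from itertools import combinations


def one_vs_rest_contrasts(conditions):
    """Each contrast compares one condition against the pooled remaining conditions."""
    return [(c, tuple(o for o in conditions if o != c)) for c in conditions]


def pairwise_contrasts(conditions):
    """Every unordered pair of conditions."""
    return list(combinations(conditions, 2))


conditions = ["A", "B", "C", "D"]
for target, rest in one_vs_rest_contrasts(conditions):
    print(f"{target} vs {''.join(rest)}")  # A vs BCD, B vs ACD, C vs ABD, D vs ABC
print(len(pairwise_contrasts(conditions)), "condition pairs")  # 6 condition pairs
```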

Page 7

TimesVector Framework

Clustering

Detecting patterns

Page 8

Clustering – Dimension reduction
• Dimension reduction by stripping away the sample (condition) dimension and concatenating it to the time dimension
• Takes the burden off the clustering and post-processing procedures
• No information is lost

[Figure: a 3-dimensional G×C×T matrix (genes i × time j, one layer per sample s1, s2, …, sk) is converted into a 2-dimensional G×CT matrix (genes i × time j · conditions k) by concatenating the samples along the time axis.]

G×CT matrix    s1            s2                 sk
               t1  t2  t3    t1  t2  t3    …    t1  t2  t3
g1             15  20  10    15  10   5    …    25  23  22
g2             39  52  31    35  22  12    …    55  52  48
g3              8  16   6     7   3   1    …    20  18  17
…              …   …   …     …   …   …          …   …   …
gi             25  23  25    14  15  13    …    17  16  16
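A minimal sketch of the concatenation step in plain Python, assuming the data are held as a nested list indexed gene, then condition (sample), then time point. The layout and the name `flatten_gct` are illustrative assumptions, not the TimesVector code; the values are those of g1 to g3 under samples s1 and s2 in the figure.

```python
def flatten_gct(tensor):
    """Turn a G x C x T nested list into a G x (C*T) matrix.

    tensor[i][k][j] is the expression of gene i under condition k at time j.
    Each output row holds condition 1's time course, then condition 2's, and so
    on, so every value is kept (no information is lost).
    """
    return [[value for condition in gene for value in condition] for gene in tensor]


gct = [
    [[15, 20, 10], [15, 10, 5]],   # g1: s1 (t1..t3), s2 (t1..t3)
    [[39, 52, 31], [35, 22, 12]],  # g2
    [[8, 16, 6], [7, 3, 1]],       # g3
]
matrix = flatten_gct(gct)
print(matrix[0])  # [15, 20, 10, 15, 10, 5]
```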

Page 9

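Spherical K-means clustering
• Spherical K-means (skmeans) is used for clustering the vectors
• A K-means clustering algorithm that uses cosine similarity as its similarity measure (1 − cosine similarity serves as the distance)
• Vectors are normalized to unit vectors, which projects them onto a sphere

• Minimize the cosine dissimilarity within all clusters:

$$\min \sum_{i=1}^{N} \sum_{c=1}^{K} r_{ic} \, \bigl(1 - \cos(\mathbf{x}_i, \boldsymbol{\mu}_c)\bigr)$$

$r_{ic}$: indicator of gene $i$ having membership to cluster $c$

$\boldsymbol{\mu}_c$: the centroid of cluster $c$

$\mathbf{x}_i$: expression level vector of gene $i$

$N$: total number of genes; $K$: total number of clusters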
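A minimal pure-Python sketch of spherical K-means (Lloyd-style iterations on unit vectors), assuming randomly chosen data points as initial centroids. It illustrates the technique only and is not the skmeans implementation referred to on the slide; `total_dissimilarity` computes the cosine-dissimilarity sum that the method minimizes.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(vectors, k, max_iter=100, seed=0):
    """Cluster vectors by cosine similarity; returns (labels, unit-length centroids)."""
    rng = random.Random(seed)
    units = [normalize(v) for v in vectors]  # project each gene vector onto the unit sphere
    centroids = [list(c) for c in rng.sample(units, k)]  # k data points as initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: highest cosine similarity == lowest cosine dissimilarity
        new_labels = [max(range(k), key=lambda c: dot(u, centroids[c])) for u in units]
        if new_labels == labels:
            break  # no gene changed cluster: converged
        labels = new_labels
        # Update step: each centroid becomes the normalized mean of its members
        for c in range(k):
            members = [u for u, lab in zip(units, labels) if lab == c]
            if members:
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return labels, centroids


def total_dissimilarity(vectors, labels, centroids):
    """Objective: sum over genes of 1 - cos(x_i, centroid of the gene's cluster)."""
    return sum(1 - dot(normalize(v), centroids[lab]) for v, lab in zip(vectors, labels))


genes = [[1, 0.1], [2, 0.3], [0.1, 1], [0.2, 3]]  # toy 2-D expression vectors
labels, centroids = spherical_kmeans(genes, k=2)
print(labels, round(total_dissimilarity(genes, labels, centroids), 4))
```

Because all vectors are normalized first, genes with the same shape but different magnitudes (such as the first two toy vectors) land in the same cluster.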

Page 10

Selecting K by silhouette score

• Using four microarray and RNA-seq time-series datasets, the K with the highest silhouette score was chosen for each dataset

[Figure: optimal K (y-axis, 0 to 700) plotted against the number of condition × time points, C×T (x-axis, 0 to 30)]

Data               C   T   C×T   Optimal K
GSE74465 (Rice)    2   3     6         100
GSE11651 (Yeast)   5   3    15         200
GSE4324 (Mouse)    4   4    16         500
GSE39429 (Rice)    4   6    24         600
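A sketch of the selection rule in plain Python. It assumes cosine distance inside the silhouette (consistent with the spherical clustering, but not stated on the slide); `select_k` and `cluster_fn` are hypothetical names, and any clustering function, for example a spherical K-means, could be passed in. The toy clusterings below stand in for real clustering runs.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def silhouette_score(vectors, labels):
    """Mean silhouette coefficient with cosine distance; singleton clusters score 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")
    scores = []
    for i, v in enumerate(vectors):
        # Mean distance from vector i to every cluster (excluding i itself)
        mean_dist = {}
        for c in clusters:
            d = [cosine_distance(v, w) for j, w in enumerate(vectors) if labels[j] == c and j != i]
            if d:
                mean_dist[c] = sum(d) / len(d)
        own = labels[i]
        if own not in mean_dist:  # i is alone in its cluster
            scores.append(0.0)
            continue
        a = mean_dist[own]                                          # cohesion
        b = min(dist for c, dist in mean_dist.items() if c != own)  # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def select_k(vectors, candidate_ks, cluster_fn):
    """Return the K whose clustering has the highest silhouette score, plus all scores."""
    scores = {k: silhouette_score(vectors, cluster_fn(vectors, k)) for k in candidate_ks}
    return max(scores, key=scores.get), scores


X = [[1, 0], [1, 0.05], [0, 1], [0.05, 1]]
toy = {2: [0, 0, 1, 1], 3: [0, 1, 2, 2]}  # hypothetical clustering results per K
best_k, scores = select_k(X, [2, 3], lambda vecs, k: toy[k])
print(best_k)  # 2
```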

Page 11

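Detecting clusters with distinct expression patterns
• Re-introduce the condition dimension by splitting the vectors by condition
• The bZIP gene vector is dissected into as many segments as there are conditions (four conditions, A to D, each with time points 0h, 1h and 6h)

v(bZIP) = <1, 1, 1, 3, 3, 3, 3.5, 2.5, 3, 3.7, 2.2, 3>

[Figure: the dissected segments A(0h, 1h, 6h), B(0h, 1h, 6h), C(0h, 1h, 6h) and D(0h, 1h, 6h) plotted by condition together with the cluster centroid]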
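Assuming the conditions were concatenated in the order A, B, C, D with the time points 0h, 1h, 6h inside each block (the order shown in the figure), dissecting the bZIP vector is a plain slicing operation. `split_by_condition` is a hypothetical helper name.

```python
def split_by_condition(vector, conditions):
    """Cut a concatenated (condition-major) vector into one time course per condition."""
    if len(vector) % len(conditions) != 0:
        raise ValueError("vector length must be a multiple of the number of conditions")
    t = len(vector) // len(conditions)  # number of time points per condition
    return {c: vector[i * t:(i + 1) * t] for i, c in enumerate(conditions)}


v_bzip = [1, 1, 1, 3, 3, 3, 3.5, 2.5, 3, 3.7, 2.2, 3]
parts = split_by_condition(v_bzip, ["A", "B", "C", "D"])
for cond, course in parts.items():
    print(cond, course)  # values at 0h, 1h, 6h
```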

Page 12

Three types of patterns are defined
• DEP (Differentially Expressed Pattern)
• All samples in a cluster have different expression patterns

• ODEP (One Differentially Expressed Pattern)
• One sample in a cluster has an expression pattern that differs from the others

• SEP (Similarly Expressed Pattern)
• All samples in a cluster have a similar expression pattern

Page 13

Method – DEP pattern recognition
• Objective: Test whether the expression patterns of conditions A, B, C satisfy A ≠ B ≠ C
• Build a centroid for each condition within each cluster
• Select the outermost centroid as the base centroid

[Figure: clusters C1, C2 and C3, each with A, B and C condition centroids over time points 0h, 1h and 6h]
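A sketch of the two bullet points above in plain Python. The slide does not define "outermost"; this sketch assumes it means the condition centroid that lies farthest, in cosine distance, from the centroid of all dissected vectors in the cluster. `base_centroid` and the toy cluster (conditions A and B rising, C falling) are hypothetical.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def centroid(vectors):
    """Mean direction of a set of vectors (unit-normalized before and after averaging)."""
    units = [normalize(v) for v in vectors]
    return normalize([sum(col) / len(units) for col in zip(*units)])


def cosine_distance(a, b):
    return 1 - sum(x * y for x, y in zip(normalize(a), normalize(b)))


def base_centroid(dissected):
    """dissected: condition -> list of per-gene time courses within one cluster.

    ASSUMPTION: 'outermost' = the condition centroid farthest from the centroid
    of all dissected vectors in the cluster.
    """
    per_condition = {c: centroid(vs) for c, vs in dissected.items()}
    overall = centroid([v for vs in dissected.values() for v in vs])
    base = max(per_condition, key=lambda c: cosine_distance(per_condition[c], overall))
    return base, per_condition


cluster = {
    "A": [[1, 2, 3], [1, 2, 2.9]],
    "B": [[1, 2, 3.2], [1.1, 2, 3]],
    "C": [[3, 2, 1], [3, 2.1, 1]],
}
base, cents = base_centroid(cluster)
print(base)  # C
```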

Page 14

Method – DEP pattern recognition

1. Compute the cosine distance from each dissected vector to the base centroid of its cluster
2. Rank the dissected vectors by cosine distance
3. Measure the mutual information (MI) with X as the (discretized) rank of the distance to the base centroid and Y as the condition
4. Measure the significance of the MI by 1000 random permutation tests

[Figure: cluster C2 with the A, B and C condition centroids and the base centroid over time points 0h, 1h and 6h]

Phenotype             A     A     A     A     B     B     B     B     C     C     C     C
Cluster            G1_A  G2_A  G3_A  G4_A  G1_B  G2_B  G3_B  G4_B  G1_C  G2_C  G3_C  G4_C
C2 (cos. distance) 0.90  0.87  0.96  0.99  0.10  0.05  0.20  0.18  0.50  0.60  0.57  0.61
Rank                 10     9    11    12     2     1     4     3     5     7     6     8
Discretized rank      3     3     3     3     1     1     1     1     2     2     2     2

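MI = log2(3) ≈ 1.58 (each discretized rank corresponds to exactly one phenotype, the maximum possible for three conditions)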
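The ranking, discretization, MI and permutation steps can be sketched in plain Python using the cluster C2 distances from the table. Assumptions not taken from the slides: ranks are cut into as many equal-size bins as there are conditions, MI is measured in bits, and the p-value uses the add-one correction (hits + 1) / (n_perm + 1). With three equal-sized conditions perfectly separated by rank, standard Shannon MI reaches its maximum, log2(3) ≈ 1.58 bits.

```python
import math
import random
from collections import Counter


def discretize_ranks(distances, n_bins):
    """Rank by ascending distance, then cut the ranks into n_bins equal-size bins (1..n_bins)."""
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    bins = [0] * len(distances)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(distances) + 1
    return bins


def mutual_information(x, y):
    """Shannon mutual information (bits) between two discrete label sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def permutation_pvalue(x, y, n_perm=1000, seed=0):
    """Observed MI and the fraction of label shuffles reaching it (add-one corrected)."""
    rng = random.Random(seed)
    observed = mutual_information(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mutual_information(x, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


distances = [0.9, 0.87, 0.96, 0.99, 0.1, 0.05, 0.2, 0.18, 0.5, 0.6, 0.57, 0.61]
phenotype = list("AAAABBBBCCCC")
bins = discretize_ranks(distances, n_bins=3)  # [3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 2, 2]
mi, p = permutation_pvalue(bins, phenotype)
print(bins, round(mi, 3), p)
```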

Page 15

Method – ODEP pattern recognition
Objective: Test whether the expression pattern of one condition among A, B, C differs from the other two: A ≠ BC (B = C), B ≠ AC (A = C) or C ≠ AB (A = B)

1. Compute a base centroid of the compared conditions (BC, AC, AB)
2. Compute the cosine distance of the dissected vectors to the centroid for each combination
3. Perform ANOVA on the cosine distances computed for the combinations

[Figure: clusters C1, C2 and C3, each with A, B and C condition centroids over time points 0h, 1h and 6h]
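One possible reading of the three ODEP steps, sketched in plain Python. Assumptions not stated on the slide: the base centroid for a candidate outlier is the normalized mean of the other conditions' dissected vectors, the ANOVA groups are the conditions, and only the F statistic is computed (a p-value would need the F distribution, e.g. `scipy.stats.f_oneway`). The candidate with the largest F is the most likely single outlier. `odep_scores` and the toy cluster are hypothetical.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def centroid(vectors):
    units = [normalize(v) for v in vectors]
    return normalize([sum(col) / len(units) for col in zip(*units)])


def cosine_distance(a, b):
    return 1 - sum(x * y for x, y in zip(normalize(a), normalize(b)))


def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of numeric groups."""
    values = [x for g in groups for x in g]
    grand = sum(values) / len(values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    if ss_within == 0:
        return math.inf
    return (ss_between / df_between) / (ss_within / df_within)


def odep_scores(dissected):
    """For each candidate outlier, build the base centroid from the OTHER conditions,
    measure every dissected vector's cosine distance to it, and run a one-way ANOVA
    with conditions as groups. Returns condition -> F statistic."""
    scores = {}
    for candidate in dissected:
        others = [v for c, vs in dissected.items() if c != candidate for v in vs]
        base = centroid(others)
        groups = [[cosine_distance(v, base) for v in vs] for vs in dissected.values()]
        scores[candidate] = one_way_anova_f(groups)
    return scores


cluster = {
    "A": [[1, 2, 3], [1, 2, 2.9], [1.1, 2, 3]],
    "B": [[1, 2, 3.1], [0.9, 2, 3], [1, 2.1, 3]],
    "C": [[3, 2, 1], [3, 2.1, 1], [2.9, 2, 1]],
}
scores = odep_scores(cluster)
print(max(scores, key=scores.get))  # C
```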

Page 16

Method – SEP pattern recognition
Objective: Test whether the expression of conditions A, B, C satisfies A = B = C

1. Compute a base centroid of all conditions within a cluster
2. Compute the cosine distance of the dissected vectors to the base centroid
3. Tightness threshold: the lower bound of the 99% confidence interval of tightness over all clusters
4. Clusters with tightness below the lower bound of the 99% CI are SEP clusters

[Figure: clusters C1, C2 and C3, each with A, B and C condition centroids over time points 0h, 1h and 6h]
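A sketch of the SEP steps in plain Python. It assumes tightness is the mean cosine distance of a cluster's dissected vectors to its base centroid (matching the "average within cosine distance" wording in the comparison results) and reads the 99% cutoff as the lower bound of a normal-approximation confidence interval for the mean tightness over all clusters; the slide does not say how the interval is computed. The toy clusters are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def centroid(vectors):
    units = [normalize(v) for v in vectors]
    return normalize([sum(col) / len(units) for col in zip(*units)])


def cosine_distance(a, b):
    return 1 - sum(x * y for x, y in zip(normalize(a), normalize(b)))


def tightness(dissected_vectors):
    """Average cosine distance of a cluster's dissected vectors to its base centroid."""
    base = centroid(dissected_vectors)
    return mean(cosine_distance(v, base) for v in dissected_vectors)


def sep_clusters(clusters, confidence=0.99):
    """clusters: cluster id -> dissected vectors of all conditions pooled.

    ASSUMPTION: cutoff = lower bound of a normal-approximation CI for the mean
    tightness across clusters.
    """
    tight = {cid: tightness(vs) for cid, vs in clusters.items()}
    values = list(tight.values())
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 2.576 for 99%
    lower = mean(values) - z * stdev(values) / math.sqrt(len(values))
    return [cid for cid, t in tight.items() if t < lower], lower


clusters = {"c0": [[1, 2, 3], [1, 2, 3], [1, 2, 3]]}  # identical patterns: very tight
clusters.update({f"c{i}": [[1, 0, 0], [0, 1, 0], [0, 0, 1]] for i in range(1, 6)})
sep, cutoff = sep_clusters(clusters)
print(sep, round(cutoff, 3))
```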

Page 17

Results

• Data

• Biologically significant clusters detected

• Performance compared with TriCluster and OPTricluster

Malaria-infected / gonadectomized male and female mice

Rice plants treated with 4 phytohormones

Dehydration-stress-treated rice plants

Fermentation of five yeast strains

Page 18

Results – Cluster patterns

[Figure panels: C=4, T=4 and C=4, T=6]

Page 19

Results – Malaria-infected mouse data

(a) DEP cluster 51

(b) ODEP cluster 20

(c) SEP cluster 357

Page 20

Results – Phytohormone-treated rice plants

• 5 clusters were found that responded to the ABA (abscisic acid) phytohormone
• Genes were gradually induced over time
• Enriched GO terms in these clusters were related to 'response to abscisic acid'

Page 21

Results – Comparison with other tools

Average number of genes per cluster

Tightness (average within-cluster cosine distance)

Weighted silhouette score

Page 22

Conclusion

• TimesVector is able to detect gene clusters in 3D time-series data that exhibit distinct expression patterns

• In particular, it is able to detect clusters with distinctly different expression patterns across conditions

• It showed significantly improved clustering quality compared to recent triclustering tools

Page 23

Funding
• The Cooperative Research Program for Agriculture Science & Technology Development (Project No. PJ01121102), Rural Development Administration (RDA), Republic of Korea

• The Bio & Medical Technology Development Program of the National Research Foundation (NRF), funded by the Ministry of Science, ICT & Future Planning (2012M3A9D1054622)

• The Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (HI15C3224)

Page 24

Thank you for your attention