Enriching Workflow Tools With TERMite

Post on 27-May-2022

Data workflow/pipelining tools such as Pipeline Pilot [http://accelrys.com/products/collaborative-science/biovia-pipeline-pilot/] and KNIME [https://www.knime.org/] enjoy a strong user community within the life science industry. Workflow tools enable data-savvy scientists to perform complex analysis without the need to learn a complex programming language. They also aid scientific reproducibility, providing a mechanism to repeat in silico experiments using exactly the same conditions. Naturally, there is a strong use case for connecting TERMite with these tools via a simple, easy-to-use process. This document will review TERMite’s support for these tools and some of the use cases to which they have been applied.

Pipeline Pilot Support

Out of the box, TERMite ships with a Pipeline Pilot collection that makes using the software in Pipeline Pilot very easy. Feed text into the TERMite Annotator component, and that’s it. The component will take the text from any source (Medline shown here) and provide a rich annotation layer that can be used for any text-mining project.

Of course, more complex workflows are possible, such as the example below, which searches for drug-gene relationships (in phrases such as: “The GTPase, RhoB, was synergistically up-regulated in cells treated with ixabepilone and sunitinib”).

This protocol can be broken down into the following stages:

1. Collect articles from the Medline database mentioning a particular drug (Ixabepilone).
2. Annotate the corpus using the SciBite VOCabs.
3. Filter papers to remove any that don’t focus on Ixabepilone using SciBite’s relevancy algorithm, which identifies the most important topics within any article.
4. Using the TExpress module, identify specific semantic patterns within a sentence, such as Gene-Verb-Drug, and extract these into a table.

Once the protocol is built and tested, it is a relatively simple process to repeat it for other variables or datasets at the click of a button. A collection of protocols can be created to explore the various nuances of a particular topic, extracting valuable insight from text and serving the needs of a wider team without requiring them to be experts in text mining or programming.

Support For KNIME

KNIME has gained a lot of support due to its open, Java-based framework that’s enriched by a thriving community of scientists and developers. Many of SciBite’s customers are KNIME users and, as such, it was important that we serve this community to a similar level.

Regardless of the workflow software used, it’s still a simple process to bring text annotation into your protocols. Here, the protocol:

1. Reads a file of text.
2. Passes it to the CallTERMite node to execute on the server.
3. Sends the result to ParseTERMiteJson to transform the results into an extensive data table.

The CallTERMite and ParseTERMiteJson nodes are separated as users may wish to customise the “parse” component to filter the comprehensive set of results from the CallTERMite node (which terms were used, where they were found, with what confidence etc.).
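To illustrate why a separate “parse” step is useful, here is a minimal Python sketch of flattening an annotation response into table rows while filtering on confidence and entity type. The JSON layout, the field names (`hits`, `term`, `type`, `start`, `end`, `confidence`) and the function `parse_annotations` are all hypothetical and invented for this illustration; they are not TERMite’s actual output format or API.

```python
import json

# Hypothetical response layout, invented for illustration only;
# the real server output will differ.
SAMPLE_RESPONSE = json.dumps({
    "hits": [
        {"term": "BRCA2", "type": "GENE", "start": 0, "end": 5, "confidence": 0.98},
        {"term": "BRCA1", "type": "GENE", "start": 24, "end": 29, "confidence": 0.95},
        {"term": "repair", "type": "PROCESS", "start": 40, "end": 46, "confidence": 0.40},
    ]
})


def parse_annotations(raw_json, min_confidence=0.0, entity_types=None):
    """Flatten the (hypothetical) annotation JSON into a list of table rows."""
    rows = []
    for hit in json.loads(raw_json).get("hits", []):
        # Drop hits below the requested confidence threshold.
        if hit["confidence"] < min_confidence:
            continue
        # Optionally keep only the requested entity types.
        if entity_types is not None and hit["type"] not in entity_types:
            continue
        rows.append((hit["term"], hit["type"], hit["start"], hit["end"], hit["confidence"]))
    return rows


# The "call" step returns everything; the "parse" step decides what to keep.
all_rows = parse_annotations(SAMPLE_RESPONSE)
confident_genes = parse_annotations(
    SAMPLE_RESPONSE, min_confidence=0.9, entity_types={"GENE"}
)
```

Keeping the filter in the parse step means the same raw response can feed several downstream tables without calling the server again.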

As with Pipeline Pilot, it is easy to hook this into a database such as Medline. Here we’ve taken the articles on the protein BRCA2 and asked which are the most frequent co-occurring proteins in the literature. Of course, BRCA1 is there at the top, but you can start to see the landscape of the different players in BRCA2 biology.
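The co-occurrence question can be sketched in a few lines of Python once each article has been reduced to its list of annotated proteins. This is only an outline of the counting idea, not the KNIME workflow itself; the function `co_occurring_proteins` is a hypothetical helper and the article lists below are made-up toy data, not results from Medline.

```python
from collections import Counter


def co_occurring_proteins(articles, target):
    """Count, per article, every other protein mentioned alongside `target`."""
    counts = Counter()
    for proteins in articles:
        unique = set(proteins)  # count each protein at most once per article
        if target in unique:
            counts.update(unique - {target})
    return counts


# Made-up annotation results: one list of protein mentions per article.
ARTICLES = [
    ["BRCA2", "BRCA1", "RAD51"],
    ["BRCA2", "BRCA1", "BRCA1"],
    ["BRCA2", "PALB2", "BRCA1"],
    ["TP53", "BRCA1"],  # no BRCA2 mention, so this article is ignored
    ["BRCA2", "RAD51"],
]

# Proteins ranked by the number of articles they share with the target.
ranking = co_occurring_proteins(ARTICLES, "BRCA2").most_common()
```

Counting once per article, rather than once per mention, keeps a single paper that repeats a name many times from dominating the ranking.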

These simple examples are just the tip of an iceberg, but hopefully they demonstrate how easy it is to connect TERMite to the two most popular workflow tools in life science today.

Use Cases

While the examples above are deliberately simple, our customers are performing a variety of deep data mining activities using TERMite in combination with workflow tools. Current projects concern topics such as:

• Competitive Intelligence - Target-specific reports and alerting
• Data integration - Aligning different commercial data to a common ontology
• Pharmacovigilance - Mining electronic health records for drug-adverse event relationships
• Annotation as a service - Central research teams serving therapeutic teams using collections of protocols to analyse any form of scientific text
• Creating protein-protein interaction networks
• Gene expression workflows, enriching with data from the literature

If you’d like to know more about using TERMite in your workflows or want to discuss a particular use case in more detail, get in touch!
