Enriching Workflow Tools With TERMite

Data workflow/pipelining tools such as Pipeline Pilot [http://accelrys.com/products/collaborative-science/biovia-pipeline-pilot/] and KNIME [https://www.knime.org/] enjoy a strong user community within the life science industry. Workflow tools enable data-savvy scientists to perform complex analysis without having to learn a programming language, and they also aid scientific reproducibility by providing a mechanism to repeat in silico experiments under exactly the same conditions. Naturally, there is a strong use case for connecting TERMite with these tools via a simple, easy-to-use process. This document reviews TERMite’s support for these tools and some of the use cases to which they have been applied.

Pipeline Pilot Support

Out of the box, TERMite ships with a Pipeline Pilot collection that makes using the software in Pipeline Pilot very easy. Feed text into the TERMite Annotator component, and that’s it. The component will take the text from any source (Medline shown here) and provide a rich annotation layer that can be used for any text-mining project.

Of course, more complex workflows are possible, such as the example below, which searches for drug-gene relationships (in phrases such as: “The GTPase, RhoB, was synergistically up-regulated in cells treated with ixabepilone and sunitinib”).
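To make the idea of an “annotation layer” concrete, here is a minimal sketch of what one can look like for the RhoB/ixabepilone/sunitinib sentence quoted above. It is not TERMite’s matching algorithm: the mini-vocabularies (`VOCABS`), the `Hit` record and the `annotate` function are hypothetical, and a plain case-insensitive whole-word match stands in for TERMite’s vocabulary matching. Each hit records an entity type, an identifier and character offsets, which is the kind of information later workflow steps build on.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical mini-vocabularies standing in for SciBite VOCabs:
# entity type -> {synonym: identifier}.
VOCABS = {
    "GENE": {"RhoB": "RHOB"},
    "DRUG": {"ixabepilone": "IXABEPILONE", "sunitinib": "SUNITINIB"},
}


@dataclass
class Hit:
    entity_type: str
    entity_id: str
    text: str
    start: int  # character offset where the match begins
    end: int    # character offset just past the match


def annotate(text: str) -> list[Hit]:
    """Return every vocabulary match in `text`, ordered by position."""
    hits = []
    for entity_type, synonyms in VOCABS.items():
        for synonym, entity_id in synonyms.items():
            # Case-insensitive, whole-word match on each synonym.
            pattern = rf"\b{re.escape(synonym)}\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append(Hit(entity_type, entity_id, match.group(0),
                                match.start(), match.end()))
    return sorted(hits, key=lambda hit: hit.start)


sentence = ("The GTPase, RhoB, was synergistically up-regulated in cells "
            "treated with ixabepilone and sunitinib")
for hit in annotate(sentence):
    print(hit.entity_type, hit.entity_id, hit.text, hit.start, hit.end)
```

A production vocabulary would also handle synonyms, ambiguity and overlapping matches, which this sketch deliberately ignores.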


Post on 27-May-2022



This protocol can be broken down into the following stages:

1. Collect articles from the Medline database mentioning a particular drug (Ixabepilone).
2. Annotate the corpus using the SciBite VOCabs.
3. Filter papers to remove any that don’t focus on Ixabepilone, using SciBite’s relevancy algorithm, which identifies the most important topics within any article.
4. Using the TExpress module, identify specific semantic patterns within a sentence, such as Gene-Verb-Drug, and extract these into a table.

Once the protocol is built and tested, it is a relatively simple process to repeat it for other variables or datasets with the click of a button. A collection of protocols can be created to explore the various nuances of a particular topic, extracting valuable insight from text and serving the needs of a wider team without requiring them to be experts in text mining or programming.

Support For KNIME

KNIME has gained a lot of support due to its open, Java-based framework, which is enriched by a thriving community of scientists and developers. Many of SciBite’s customers are KNIME users and, as such, it was important that we serve this community to a similar level.
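Stage 4 of the Pipeline Pilot protocol above extracts Gene-Verb-Drug patterns into a table. The TExpress pattern syntax is not described in this document, so the following is only an illustration of the idea, assuming hits already produced by an annotation step: it emits a row whenever a gene mention precedes a trigger verb that in turn precedes a drug mention in the same sentence. The verb list, the hit dictionaries and the `gene_verb_drug` and `make_hit` functions are hypothetical.

```python
import re

# Hypothetical trigger verbs; a real TExpress pattern would be defined in TERMite itself.
VERBS = ["up-regulated", "down-regulated", "inhibited", "activated", "induced"]
VERB_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, VERBS)) + r")\b", re.IGNORECASE)


def gene_verb_drug(sentence, hits):
    """Return (gene_id, verb, drug_id) rows where gene < verb < drug in reading order."""
    genes = [h for h in hits if h["type"] == "GENE"]
    drugs = [h for h in hits if h["type"] == "DRUG"]
    rows = []
    for verb in VERB_RE.finditer(sentence):
        for gene in genes:
            if gene["end"] > verb.start():
                continue  # the gene must end before the verb starts
            for drug in drugs:
                if drug["start"] >= verb.end():  # the drug must start after the verb
                    rows.append((gene["id"], verb.group(0).lower(), drug["id"]))
    return rows


def make_hit(sentence, entity_type, entity_id, term):
    """Build a hypothetical annotation hit from the first occurrence of `term`."""
    start = sentence.index(term)
    return {"type": entity_type, "id": entity_id, "start": start, "end": start + len(term)}


sentence = ("The GTPase, RhoB, was synergistically up-regulated in cells "
            "treated with ixabepilone and sunitinib")
hits = [
    make_hit(sentence, "GENE", "RHOB", "RhoB"),
    make_hit(sentence, "DRUG", "IXABEPILONE", "ixabepilone"),
    make_hit(sentence, "DRUG", "SUNITINIB", "sunitinib"),
]

print("gene\tverb\tdrug")
for row in gene_verb_drug(sentence, hits):
    print("\t".join(row))
```

Order-based matching like this is crude (it ignores negation and clause boundaries), which is exactly why a dedicated pattern module is useful in practice.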

Regardless of the workflow software used, it’s still a simple process to bring text annotation into your protocols. Here, the protocol:

1. Reads a file of text.
2. Passes it to the Call TERMite node to execute on the server.
3. Sends the result to the Parse TERMite Json node, which transforms the results into an extensive data table.
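The parse step amounts to flattening a nested JSON response into one table row per hit. The real TERMite response schema is not reproduced in this document, so this sketch assumes a simplified, hypothetical shape (a top-level `hits` list whose entries carry `type`, `id`, `name`, `start`, `end` and `confidence`). Those field names and the `parse_termite_json` and `rows_to_csv` functions are illustrative only and would need mapping onto the actual schema from the TERMite documentation.

```python
import csv
import io
import json

# Hypothetical, simplified response; NOT the real TERMite JSON schema.
RAW_RESPONSE = """{
  "hits": [
    {"type": "GENE", "id": "BRCA2", "name": "BRCA2", "start": 0, "end": 5, "confidence": 0.9},
    {"type": "GENE", "id": "BRCA1", "name": "BRCA1", "start": 40, "end": 45}
  ]
}"""

COLUMNS = ["type", "id", "name", "start", "end", "confidence"]


def parse_termite_json(raw):
    """Flatten each hit into a dict with a fixed set of columns (missing fields -> None)."""
    payload = json.loads(raw)
    return [{col: hit.get(col) for col in COLUMNS} for hit in payload.get("hits", [])]


def rows_to_csv(rows):
    """Render rows as CSV text, e.g. for a downstream table or spreadsheet step."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)  # None values are written as empty cells
    return buffer.getvalue()


print(rows_to_csv(parse_termite_json(RAW_RESPONSE)))
```

Fixing the column list up front keeps every row the same width even when individual hits omit a field, which is what a table-oriented workflow node expects.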

The Call TERMite and Parse TERMite Json nodes are kept separate because users may wish to customise the “parse” component to filter the comprehensive set of results from the Call TERMite node (which terms were used, where they were found, with what confidence, etc.).
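A customised parse step might, for example, keep only certain entity types above a confidence threshold. This sketch assumes each parsed hit is a dict with a `type` key and an optional `confidence` key; that row shape, the `filter_hits` function and its parameters are hypothetical. Hits with no confidence value are treated as 0.0, a design choice that is easy to reverse if missing scores should pass through instead.

```python
def filter_hits(rows, entity_types=None, min_confidence=0.0):
    """Keep rows whose type is in `entity_types` (all types if None) and whose
    confidence is at least `min_confidence`; a missing confidence counts as 0.0."""
    kept = []
    for row in rows:
        if entity_types is not None and row["type"] not in entity_types:
            continue
        confidence = row.get("confidence")
        if (0.0 if confidence is None else confidence) >= min_confidence:
            kept.append(row)
    return kept


# Hypothetical parsed hits, for illustration only.
rows = [
    {"type": "GENE", "id": "BRCA1", "confidence": 0.95},
    {"type": "GENE", "id": "RAD51", "confidence": 0.40},
    {"type": "DRUG", "id": "OLAPARIB", "confidence": 0.99},
    {"type": "GENE", "id": "TP53"},
]

for row in filter_hits(rows, entity_types={"GENE"}, min_confidence=0.5):
    print(row["id"])
```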

As with Pipeline Pilot, it is easy to hook this into a database such as Medline. Here we’ve taken the articles on the protein BRCA2 and asked which proteins co-occur with it most frequently in the literature. Unsurprisingly, BRCA1 is at the top, but you can start to see the landscape of the different players in BRCA2 biology.
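The BRCA2 question above boils down to a document-level co-occurrence count. A minimal sketch, assuming each article has already been annotated into a list of hits with `type` and `id` keys (a hypothetical shape), counts in how many articles each other gene appears alongside the focus gene. This is a plain frequency count and not necessarily how the KNIME protocol computes its ranking; the `co_occurring` and `gene` helpers are hypothetical, and the toy articles are invented for illustration rather than taken from Medline.

```python
from collections import Counter


def co_occurring(articles, focus, entity_type="GENE", top_n=10):
    """Rank entities by the number of articles in which they co-occur with `focus`."""
    counts = Counter()
    for hits in articles:
        # A set, so repeated mentions within one article are counted once.
        ids = {hit["id"] for hit in hits if hit["type"] == entity_type}
        if focus in ids:
            counts.update(ids - {focus})
    return counts.most_common(top_n)


def gene(entity_id):
    """Build a hypothetical gene hit."""
    return {"type": "GENE", "id": entity_id}


# Invented toy input; each inner list is one annotated article.
articles = [
    [gene("BRCA2"), gene("BRCA1"), gene("RAD51"), gene("BRCA2")],
    [gene("BRCA2"), gene("BRCA1")],
    [gene("BRCA1"), gene("TP53")],
]

for entity_id, n_articles in co_occurring(articles, "BRCA2"):
    print(entity_id, n_articles)
```

Counting articles rather than raw mentions stops a single paper that repeats a name many times from dominating the ranking.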

These simple examples are just the tip of the iceberg, but hopefully they demonstrate how easy it is to connect TERMite to the two most popular workflow tools in life science today.

Use Cases

While the examples above are deliberately simple, our customers are performing a variety of deep data-mining activities using TERMite in combination with workflow tools. Current projects concern topics such as:

• Competitive Intelligence: target-specific reports and alerting
• Data integration: aligning different commercial data to a common ontology
• Pharmacovigilance: mining electronic health records for drug-adverse event relationships
• Annotation as a service: central research teams serving therapeutic teams, using collections of protocols to analyse any form of scientific text
• Creating protein-protein interaction networks
• Gene expression workflows, enriching with data from the literature

If you’d like to know more about using TERMite in your workflows, or want to discuss a particular use case in more detail, get in touch!