PACIFIC SYMPOSIUM ON BIOCOMPUTING 2019 ABSTRACT BOOK

Poster Presenters: Poster space is assigned by abstract page number. Please find the page that your abstract is on and put your poster on the poster board with the corresponding number (e.g., if your abstract is on page 50, put your poster on board #50).

Proceedings papers with oral presentations #2-29 are not assigned poster space.

Abstracts are organized first by session, then the last name of the first author. Presenting authors’ names are in bold text.

TABLE OF CONTENTS

PROCEEDINGS PAPERS WITH ORAL PRESENTATION

PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK ..... 1

THE EFFECTIVENESS OF MULTITASK LEARNING FOR PHENOTYPING WITH ELECTRONIC HEALTH RECORDS DATA ..... 2
Daisy Yi Ding, Chloe Simpson, Stephen Pfohl, Dave C. Kale, Kenneth Jung, Nigam H. Shah

ODAL: A ONE-SHOT DISTRIBUTED ALGORITHM TO PERFORM LOGISTIC REGRESSIONS ON ELECTRONIC HEALTH RECORDS DATA FROM MULTIPLE CLINICAL SITES ..... 3
Rui Duan, Mary Regina Boland, Jason H. Moore, Yong Chen

PVC DETECTION USING A CONVOLUTIONAL AUTOENCODER AND RANDOM FOREST CLASSIFIER ..... 4
Max Gordon, Cranos Williams

PLATYPUS: A MULTIPLE-VIEW LEARNING PREDICTIVE FRAMEWORK FOR CANCER DRUG SENSITIVITY PREDICTION ..... 5
Kiley Graim, Verena Friedl, Kathleen E. Houlahan, Joshua M. Stuart

DEEPDOM: PREDICTING PROTEIN DOMAIN BOUNDARY FROM SEQUENCE ALONE USING STACKED BIDIRECTIONAL LSTM ..... 6
Yuexu Jiang, Duolin Wang, Dong Xu

IMPLEMENTING AND EVALUATING A GAUSSIAN MIXTURE FRAMEWORK FOR IDENTIFYING GENE FUNCTION FROM TNSEQ DATA ..... 7
Kevin Li, Rachel Chen, William Lindsey, Aaron Best, Matthew DeJongh, Christopher Henry, Nathan Tintle

RES2S2AM: DEEP RESIDUAL NETWORK-BASED MODEL FOR IDENTIFYING FUNCTIONAL NONCODING SNPS IN TRAIT-ASSOCIATED REGIONS ..... 8
Zheng Liu, Yao Yao, Qi Wei, Benjamin Weeder, Stephen A. Ramsey

BI-DIRECTIONAL RECURRENT NEURAL NETWORK MODELS FOR GEOGRAPHIC LOCATION EXTRACTION IN BIOMEDICAL LITERATURE ..... 9
Arjun Magge, Davy Weissenbacher, Abeed Sarker, Matthew Scotch, Graciela Gonzalez-Hernandez

COMPUTATIONAL KIR COPY NUMBER DISCOVERY REVEALS INTERACTION BETWEEN INHIBITORY RECEPTOR BURDEN AND SURVIVAL ..... 10
Rachel M. Pyke, Raphael Genolet, Alexandre Harari, George Coukos, David Gfeller, Hannah Carter

SEMANTIC WORKFLOWS FOR BENCHMARK CHALLENGES: ENHANCING COMPARABILITY, REUSABILITY AND REPRODUCIBILITY ..... 11
Arunima Srivastava, Ravali Adusumilli, Hunter Boyce, Daniel Garijo, Varun Ratnakar, Rajiv Mayani, Thomas Yu, Raghu Machiraju, Yolanda Gil, Parag Mallick

REMOVING CONFOUNDING FACTORS ASSOCIATED WEIGHTS IN DEEP NEURAL NETWORKS IMPROVES THE PREDICTION ACCURACY FOR HEALTHCARE APPLICATIONS ..... 12
Haohan Wang, Zhenglin Wu, Eric P. Xing

PRECISION MEDICINE: IMPROVING HEALTH THROUGH HIGH-RESOLUTION ANALYSIS OF PERSONAL DATA ..... 13

AN OPTIMAL POLICY FOR PATIENT LABORATORY TESTS IN INTENSIVE CARE UNITS ..... 14
Li-Fang Cheng, Niranjani Prasad, Barbara E. Engelhardt

CROWDVARIANT: A CROWDSOURCING APPROACH TO CLASSIFY COPY NUMBER VARIANTS ..... 15
Peyton Greenside, Justin Zook, Marc Salit, Ryan Poplin, Madeleine Cule, Mark DePristo

A REPOSITORY OF MICROBIAL MARKER GENES RELATED TO HUMAN HEALTH AND DISEASES FOR HOST PHENOTYPE PREDICTION USING MICROBIOME DATA ..... 16
Wontack Han, Yuzhen Ye

AICM: A GENUINE FRAMEWORK FOR CORRECTING INCONSISTENCY BETWEEN LARGE PHARMACOGENOMICS DATASETS ..... 17
Zhiyue Tom Hu, Yuting Ye, Patrick A. Newbury, Haiyan Huang, Bin Chen

INTEGRATING RNA EXPRESSION AND VISUAL FEATURES FOR IMMUNE INFILTRATE PREDICTION ..... 18
Derek Reiman, Lingdao Sha, Irvin Ho, Timothy Tan, Denise Lau, Aly A. Khan

OUTGROUP MACHINE LEARNING APPROACH IDENTIFIES SINGLE NUCLEOTIDE VARIANTS IN NONCODING DNA ASSOCIATED WITH AUTISM SPECTRUM DISORDER ..... 19
Maya Varma, Kelley Marie Paskov, Jae-Yoon Jung, Brianna Sierra Chrisman, Nate Tyler Stockham, Peter Yigitcan Washington, Dennis Paul Wall

PRECISION DRUG REPURPOSING VIA CONVERGENT EQTL-BASED MOLECULES AND PATHWAY TARGETING INDEPENDENT DISEASE-ASSOCIATED POLYMORPHISMS ..... 20
Francesca Vitali, Joanne Berghout, Jungwei Fan, Jianrong Li, Qike Li, Haiquan Li, Yves A. Lussier

DETECTING POTENTIAL PLEIOTROPY ACROSS CARDIOVASCULAR AND NEUROLOGICAL DISEASES USING UNIVARIATE, BIVARIATE, AND MULTIVARIATE METHODS ON 43,870 INDIVIDUALS FROM THE EMERGE NETWORK ..... 21
Xinyuan Zhang, Yogasudha Veturi, Shefali S. Verma, William Bone, Anurag Verma, Anastasia M. Lucas, Scott Hebbring, Joshua C. Denny, Ian Stanaway, Gail P. Jarvik, David Crosslin, Eric B. Larson, Laura Rasmussen-Torvik, Sarah A. Pendergrass, Jordan W. Smoller, Hakon Hakonarson, Patrick Sleiman, Chunhua Weng, David Fasel, Wei-Qi Wei, Iftikhar Kullo, Daniel Schaid, Wendy K. Chung, Marylyn D. Ritchie

SINGLE CELL ANALYSIS – WHAT IS THE FUTURE? ..... 22

LISA: ACCURATE RECONSTRUCTION OF CELL TRAJECTORY AND PSEUDO-TIME FOR MASSIVE SINGLE CELL RNA-SEQ DATA ..... 23
Yang Chen, Yuping Zhang, Zhengqing Ouyang

PARAMETER TUNING IS A KEY PART OF DIMENSIONALITY REDUCTION VIA DEEP VARIATIONAL AUTOENCODERS FOR SINGLE CELL RNA TRANSCRIPTOMICS ..... 24
Qiwen Hu, Casey S. Greene

TOPOLOGICAL METHODS FOR VISUALIZATION AND ANALYSIS OF HIGH DIMENSIONAL SINGLE-CELL RNA SEQUENCING DATA ..... 25
Tongxin Wang, Travis Johnson, Jie Zhang, Kun Huang

WHEN BIOLOGY GETS PERSONAL: HIDDEN CHALLENGES OF PRIVACY AND ETHICS IN BIOLOGICAL BIG DATA ..... 26

LEVERAGING SUMMARY STATISTICS TO MAKE INFERENCES ABOUT COMPLEX PHENOTYPES IN LARGE BIOBANKS ..... 27
Angela Gasdaska, Derek Friend, Rachel Chen, Jason Westra, Matthew Zawistowski, William Lindsey, Nathan Tintle

EVALUATION OF PATIENT RE-IDENTIFICATION USING LABORATORY TEST ORDERS AND MITIGATION VIA LATENT SPACE VARIABLES ..... 28
Kipp W. Johnson, Jessica K. De Freitas, Benjamin S. Glicksberg, Jason R. Bobe, Joel T. Dudley

PROTECTING GENOMIC DATA PRIVACY WITH PROBABILISTIC MODELING ..... 29
Sean Simmons, Bonnie Berger, Cenk Sahinalp

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK ..... 30

SNPS2CHIP: LATENT FACTORS OF CHIP-SEQ TO INFER FUNCTIONS OF NON-CODING SNPS ..... 31
Shankara Anand, Laurynas Kalesinskas, Craig Smail, Yosuke Tanigawa

DNA STEGANALYSIS USING DEEP RECURRENT NEURAL NETWORKS ..... 32
Ho Bae, Byunghan Lee, Sunyoung Kwon, Sungroh Yoon

LEARNING CONTEXTUAL HIERARCHICAL STRUCTURE OF MEDICAL CONCEPTS WITH POINCAIRÉ EMBEDDINGS TO CLARIFY PHENOTYPES ..... 33
Brett K. Beaulieu-Jones, Isaac S. Kohane, Andrew L. Beam

EXPLORING MICRORNA REGULATION OF CANCER WITH CONTEXT-AWARE DEEP CANCER CLASSIFIER ..... 34
Blake Pyman, Alireza Sedghi, Shekoofeh Azizi, Kathrin Tyryshkin, Neil Renwick, Parvin Mousavi

ESTIMATING CLASSIFICATION ACCURACY IN POSITIVE-UNLABELED LEARNING: CHARACTERIZATION AND CORRECTION STRATEGIES ..... 35
Rashika Ramola, Shantanu Jain, Predrag Radivojac

EXTRACTING ALLELIC READ COUNTS FROM 250,000 HUMAN SEQUENCING RUNS IN SEQUENCE READ ARCHIVE ..... 36
Brian Tsui, Michelle Dow, Dylan Skola, Hannah Carter

AUTOMATIC HUMAN-LIKE MINING AND CONSTRUCTING RELIABLE GENETIC ASSOCIATION DATABASE WITH DEEP REINFORCEMENT LEARNING ..... 37
Haohan Wang, Xiang Liu, Yifeng Tao, Wenting Ye, Qiao Jin, William W. Cohen, Eric P. Xing

PRECISION MEDICINE: IMPROVING HEALTH THROUGH HIGH-RESOLUTION ANALYSIS OF PERSONAL DATA ..... 38

INFLUENCE OF TISSUE CONTEXT ON GENE PRIORITIZATION FOR PREDICTED TRANSCRIPTOME-WIDE ASSOCIATION STUDIES ..... 39
Binglan Li, Yogasudha Veturi, Yuki Bradford, Shefali S. Verma, Anurag Verma, Anastasia M. Lucas, David W. Haas, Marylyn D. Ritchie

SINGLE CELL ANALYSIS – WHAT IS THE FUTURE? ..... 40

SHALLOW SPARSELY-CONNECTED AUTOENCODERS FOR GENE SET PROJECTION ..... 41
Maxwell P. Gold, Alexander LeNail, Ernest Fraenkel

WHEN BIOLOGY GETS PERSONAL: HIDDEN CHALLENGES OF PRIVACY AND ETHICS IN BIOLOGICAL BIG DATA ..... 42

IMPLEMENTING A UNIVERSAL INFORMED CONSENT PROCESS FOR THE ALL OF US RESEARCH PROGRAM ..... 43
Megan Doerr, Shira Grayson, Sarah Moore, Christine Suver, John Wilbanks, Jennifer Wagner

POSTER PRESENTATIONS

GENERAL ..... 44

A CONVOLUTIONAL NEURAL NET PREDICTS BINDING PROPERTIES OF AN ANTIBODY LIBRARY ..... 45
Rishi Bedi, Rachel Hovde, Jacob Glanville

CNVAR: A SOFTWARE TOOL FOR GENOTYPING CYP2D6 USING SHORT READ NEXT GENERATION SEQUENCING TECHNOLOGY ..... 46
John Logan Black III MD, Hugues Sicotte PhD, Sandra E. Peterson, Kimberley J. Harris, Liewei Wang MD PhD, Steven Scherer PhD, Eric Boerwinkle PhD, Richard A. Gibbs PhD, Suzette J. Bielinski PhD, Richard Weinshilboum MD

NETWORK ANALYSIS OF DISTINCT COHORTS ALLOWS FOR THE COMPARISON OF KEY BIOLOGICAL FUNCTIONS RELATED TO TB PATHOGENESIS ..... 47
Carly Bobak, Meghan E. Muse, Alexander J. Titus, Brock C. Christensen, A. James O'Malley, Jane E. Hill

VARIATION IN OPIOID PRESCRIBING PATTERNS IN SURGICAL POPULATIONS ..... 48
Soline M. Boussard, Marylyn D. Ritchie, Michelle Whirl-Carrillo, Tina Hernandez-Boussard, Teri E. Klein

REGIONAL HETEROGENEITY IN GENE EXPRESSION, REGULATION AND COHERENCE IN HIPPOCAMPUS AND DORSOLATERAL PREFRONTAL CORTEX ACROSS DEVELOPMENT AND SCHIZOPHRENIA ..... 49
Leonardo Collado-Torres, Emily E. Burke, Amy Peterson, Joo Heon Shin, Richard E. Straub, Anandita Rajpurohit, Stephen A. Semick, William S. Ulrich, BrainSeq Consortium, Cristian Valencia, Ran Tao, Amy Deep-Soboslay, Thomas M. Hyde, Joel E. Kleinman, Daniel R. Weinberger, Andrew E. Jaffe

FULL-LENGTH SEQUENCE ASSEMBLY AND CHARACTERIZATION OF HIGHLY PURIFIED CIRCRNA ISOFORMS ..... 50
Supriyo De, Amaresh C. Panda, Myriam Gorospe

A COMPREHENSIVE REVIEW AND ASSESSMENT OF EXISTING PATHWAY ANALYSIS APPROACHES ..... 51
Tuan-Minh Nguyen, Adib Shafi, Tin Nguyen, Sorin Draghici

A NEW PHYLOGENETIC SAMPLING METHOD USING GENERALIZED-ENSEMBLE ALGORITHM ..... 52
Tetsu Furukawa, Hiroyuki Toh

CONVERGENT MECHANISMS PERTURBED BY SCATTERED SNPS SUSCEPTIBLE TO ALZHEIMER'S DISEASE ..... 53
Jiali Han, Edwin Baldwin, Jin Zhou, Fei Yin, Haiquan Li

IDENTIFICATION AND EVALUATION OF CO-EXPRESSION GENE NETWORKS FOR PACLITAXEL-INDUCED PERIPHERAL NEUROPATHY IN BREAST CANCER SURVIVORS ..... 54
Kord M. Kober, Jon D. Levine, Judy Mastick, Bruce Cooper, Steven Paul, Christine Miaskowski

VARIFI - WEB-BASED AUTOMATIC VARIANT IDENTIFICATION, FILTERING AND ANNOTATION OF AMPLICON SEQUENCING DATA ..... 55
Milica Krunic, Peter Venhuizen, Leonhard Müllauer, Bettina Kaserer, Arndt von Haeseler

STATISTICAL INFERENCE RELIEF (STIR) FEATURE SELECTION ..... 56
Trang T. Le, Ryan J. Urbanowicz, Jason H. Moore, Brett A. McKinney

DEEP LEARNING-BASED LONGITUDINAL HETEROGENEOUS DATA INTEGRATION FRAMEWORK FOR AD-RELEVANT FEATURE EXTRACTION ..... 57
Garam Lee, Kwangsik Nho, Byungkon Kang, Kyung-Ah Sohn, Dokyoon Kim

MICROBIOME ANALYSIS OF UNEXPLAINED CASES OF PNEUMONIA IN SOUTH KOREA ..... 58
Sooyeon Lim, Jae Kyung Lee, Ji Yun Noh, Woo Joo Kim

POTRA: PATHWAY ANALYSIS OF CANCER GENOMICS DATA IN THE CLOUD ..... 59
Margaret Linan, Junwen Wang, Valentin Dinu

EVALUATING CELL LINES AS MODELS FOR METASTATIC CANCER THROUGH INTEGRATIVE ANALYSIS OF OPEN GENOMIC DATA ..... 60
Ke Liu, Patrick A. Newbury, Benjamin S. Glicksberg, William Zeng, Eran R. Andrechek, Bin Chen

PATHWAY ANALYSIS OF EHR AND NON-EHR-BASED GWAS CONNECTS LIPID METABOLISM TO THE IMMUNE RESPONSE ..... 61
Jason E. Miller, Thomas J. Hoffmann, Elizabeth Theusch, Carlos Iribarren, Marisa W. Medina, Neil Risch, Ronald M. Krauss, Marylyn D. Ritchie

META-ANALYSIS OF HETEROGENEITY AND BATCH EFFECTS IN THE A549 CELL LINE ..... 62
Abigail Moore, John Castorino

HYPERPARAMETER TUNING FOR CHIP-SEQ PEAK CALLING SOFTWARE TOOLS USING PARALLELIZED BAYESIAN OPTIMIZATION ..... 63
Dongpin Oh, Jinhee Lee, Seonghyeon Kim, Dohyeon Lee, Dongwon Choo, Giltae Song

CROSS-STUDY META-ANALYSIS IDENTIFIES ALTERED BACTERIAL STRAINS SEPARATING RESPONDER AND NON-RESPONDER POPULATIONS ACROSS MULTIPLE CHECKPOINT-INHIBITOR THERAPY DATASETS ..... 64
Jayamary Divya Ravichandar, Erica Rutherford, Yonggan Wu, Thomas Weinmaier, Cheryl-Emiliane Chow, Shoko Iwai, Helena Kiefel, Kareem Graham, Karim Dabbagh, Todd DeSantis

A HYPOTHESIS OF THE STABILIZING ROLE OF ALU EXPANSION VIA HOMOLOGY DIRECTED REPAIR OF SPONTANEOUS DNA DOUBLE STRANDED BREAKS ..... 65
Tanmoy Roychowdhury, Alexej Abyzov

STATISTICAL LEARNING WITH HIGH-DIMENSIONAL MASS CYTOMETRY DATA ..... 66
Pratyaydipta Rudra, Elena Hsieh, Debashis Ghosh

HARDWARE ACCELERATION OF APPROXIMATE STRING MATCHING FOR BOTH SHORT AND LONG READ MAPPING ..... 67
Damla Senol Cali, Lavanya Subramanian, Zülal Bingöl, Jeremie S. Kim, Rachata Ausavarungnirun, Anant V. Nori, Gurpreet S. Kalsi, Sreenivas Subramoney, Saugata Ghose, Can Alkan, Onur Mutlu

TRANSITION OF REGULATORY FORCE TOWARD THE GENE EXPRESSIONS DURING OSTEOBLAST CELL DIFFERENTIATION ..... 68
Yoichi Takenaka

METHYLATION PROFILES OF MELANOMA TO PREDICT TILS ..... 69
Yihsuan Tsai, Nana Nikolaishvili Feinberg, Kathleen Conway, Sharon N. Edmiston, Nancy E. Thomas, Joel S. Parker

HIGH-THROUGHPUT GENE TO KNOWLEDGE MAPPING THROUGH MASSIVE INTEGRATION OF PUBLIC SEQUENCING DATA ..... 70
Brian Tsui, Hannah Carter

MANTA-RAE, PREDICTING THE IMPACT OF GENOME VARIANTS ON THE TRANSCRIPTION FACTOR BINDING POTENTIAL OF REGULATORY ELEMENTS ..... 71
Robin van der Lee, Phillip A. Richmond, Oriol Fornes, Wyeth W. Wasserman

USING QUANTITATIVE PHOSPHOPROTEOMICS TO UNDERSTAND FUNCTIONAL SELECTIVITY OF RECEPTOR TYROSINE KINASES ..... 72
J. Watson, C. Francavilla, J. M. Schwartz

ANERIS APPLIED: SPARK-ENABLED ANALYTICS FOR FULL-SCALE AND REPRODUCIBLE ANNOTATION-BASED GENOMIC STUDIES ..... 73
Nicholas Wheeler, Jeremy Fondran, Penny Benchek, Jonathan Haines, William S. Bush

PUTTING RELICANTHUS IN ITS PLACE: IMPACT OF MIXTURE MODEL CHOICE ON PHYLOGENETIC RECONSTRUCTION ..... 74
Madelyne Xiao, Mercer R. Brugler, Estefania Rodriguez

RATIONAL DESIGN OF NOVEL SKP2 INHIBITORS USING DEEP NEURAL NETWORKS ..... 75
Shuxing Zhang, Beibei Huang, Lon W. Fong

PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK ..... 76

ODAL: A ONE-SHOT DISTRIBUTED ALGORITHM TO PERFORM LOGISTIC REGRESSIONS ON ELECTRONIC HEALTH RECORDS DATA FROM MULTIPLE CLINICAL SITES ..... 77
Rui Duan, Mary Regina Boland, Jason H. Moore, Yong Chen

PLATYPUS: A MULTIPLE-VIEW LEARNING PREDICTIVE FRAMEWORK FOR CANCER DRUG SENSITIVITY PREDICTION ..... 78
Kiley Graim, Verena Friedl, Kathleen E. Houlahan, Joshua M. Stuart

A SOFTWARE PIPELINE FOR DETERMINING FINE-SCALE TEMPORAL GENOME VARIATION PATTERNS IN EVOLVING POPULATIONS USING A NON-PARAMETRIC STATISTICAL TEST ..... 79
Minjung Kwak, Seokwoo Kang, Dongwon Choo, Dohyeon Lee, Jinhee Lee, Seonghyeon Kim, Giltae Song

A DEEP LEARNING APPROACH TO IDENTIFYING THE CELLULAR COMPOSITION OF SOLID TISSUE WITH DNA METHYLATION DATA ..... 80
Meghan E. Muse, Curtis L. Petersen, Carmen J. Marsit, Diane Gilbert-Diamond, Brock C. Christensen

DIRECTLY MEASURING THE RATE AND DYNAMICS HUMAN MUTATION BY SEQUENCING LARGE, MULTI-GENERATIONAL PEDIGREES ..... 81
Thomas A. Sasani, Brent S. Pedersen, Mark Leppert, Ray White, Lisa Baird, Aaron R. Quinlan, Lynn B. Jorde

AVAILABLE PROTEIN 3D STRUCTURES DO NOT REFLECT HUMAN GENETIC AND FUNCTIONAL DIVERSITY ..... 82
Gregory Sliwoski, Neel Patel, R. Michael Sivley, Charles R. Sanders, Jens Meiler, William S. Bush, John A. Capra

SEMANTIC WORKFLOWS FOR BENCHMARK CHALLENGES: ENHANCING COMPARABILITY, REUSABILITY AND REPRODUCIBILITY ..... 83
Arunima Srivastava, Ravali Adusumilli, Hunter Boyce, Daniel Garijo, Varun Ratnakar, Rajiv Mayani, Thomas Yu, Raghu Machiraju, Yolanda Gil, Parag Mallick

PRECISION MEDICINE: IMPROVING HEALTH THROUGH HIGH-RESOLUTION ANALYSIS OF PERSONAL DATA ..... 84

CLASS PRIOR ESTIMATION AND QUANTIFICATION OF THE LOSS AND GAIN OF RESIDUE FUNCTION UPON MUTATION ..... 85
Shantanu Jain, Jose Lugo-Martinez, Martha White, Michael W. Trosset, Predrag Radivojac

PREDICTION OF TIME TO INSULIN USING CLINICAL AND GENETIC BIOMARKERS IN TYPE 2 DIABETES PATIENTS ..... 86
Rikke Linnemann Nielsen, Louise Donnelly, Agnes Martine Nielsen, Konstantinos Tsirigos, Kaixin Zhou, Bjarne Ersboell, Line Clemmensen, Ewan Pearson, Ramneek Gupta

PATHOGENICITY AND FUNCTIONAL IMPACT OF INSERTION/DELETION AND STOP GAIN VARIATION IN THE HUMAN GENOME ..... 87
Kymberleigh A. Pagel, Danny Antaki, Matthew Mort, David N. Cooper, Jonathan Sebat, Lilia M. Iakoucheva, Sean D. Mooney, Predrag Radivojac

DETECTING POTENTIAL PLEIOTROPY ACROSS CARDIOVASCULAR AND NEUROLOGICAL DISEASES USING UNIVARIATE, BIVARIATE, AND MULTIVARIATE METHODS ON 43,870 INDIVIDUALS FROM THE EMERGE NETWORK ..... 88
Xinyuan Zhang, Yogasudha Veturi, Shefali S. Verma, William Bone, Anurag Verma, Anastasia M. Lucas, Scott Hebbring, Joshua C. Denny, Ian Stanaway, Gail P. Jarvik, David Crosslin, Eric B. Larson, Laura Rasmussen-Torvik, Sarah A. Pendergrass, Jordan W. Smoller, Hakon Hakonarson, Patrick Sleiman, Chunhua Weng, David Fasel, Wei-Qi Wei, Iftikhar Kullo, Daniel Schaid, Wendy K. Chung, Marylyn D. Ritchie ..... 88

PHARMGKB: THE API AND INFOBUTTONS ..... 89
Michelle Whirl-Carrillo, Ryan M. Whaley, Mark Woon, Russ B. Altman, Teri E. Klein ..... 89

SINGLE CELL ANALYSIS – WHAT IS IN THE FUTURE? ..... 90

INTRATUMOR HETEROGENEITY (ITH) METRIC OF CIRCULATING TUMOR CELL (CTC)-DERIVED XENOGRAFT MODELS IN SMALL CELL LUNG CANCER ..... 91
Yuanxin Xi, C. Allison Stewart, Carl M. Gay, Hai Tran, Bonnie Glisson, John V. Heymach, Paul Robson, Lauren A. Byers, Jing Wang ..... 91

WHEN BIOLOGY GETS PERSONAL: HIDDEN CHALLENGES OF PRIVACY AND ETHICS IN BIOLOGICAL BIG DATA ..... 92

QUANTIFYING THE IDENTIFIABILITY OF INDIVIDUALS USING A SPARSE SET OF SNPS ..... 93
Prashant S. Emani, Gamze Gursoy, Mark B. Gerstein ..... 93

TRANSCRIPTOMIC SUMMARY SPLICING DATA MAY LEAK PERSONAL PRIVATE INFORMATION BY COMPUTATIONAL LINKAGE TO THE GENOMIC VARIANTS ..... 94
Zhiqiang Hu, Mark B. Gerstein, Steven E. Brenner ..... 94

WORKSHOPS

MERGING HETEROGENEOUS DATA TO ENABLE KNOWLEDGE DISCOVERY ..... 95

TO SEARCH A HETNET... HOW ARE TWO NODES CONNECTED? ..... 96
Daniel Himmelstein, Michael Zietz, Kyle Kloster, Michael Nagle, Blair Sullivan, Casey S. Greene ..... 96

TEXT MINING AND MACHINE LEARNING FOR PRECISION MEDICINE ..... 97

LITVAR: MINING GENOMIC VARIANTS FROM BIOMEDICAL LITERATURE FOR DATABASE CURATION AND PRECISION MEDICINE ..... 98
Alexis Allot, Yifan Peng, Chih-Hsuan Wei, Kyubum Lee, Lon Phan, Zhiyong Lu ..... 98

AUTHOR INDEX ..... 99

1

PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

2

THE EFFECTIVENESS OF MULTITASK LEARNING FOR PHENOTYPING WITH ELECTRONIC HEALTH RECORDS DATA

Daisy Yi Ding¹, Chloe Simpson¹, Stephen Pfohl¹, Dave C. Kale², Kenneth Jung¹, Nigam H. Shah¹

¹Stanford University, ²University of Southern California

Ding, Daisy Yi

Electronic phenotyping is the task of ascertaining whether an individual has a medical condition of interest by analyzing their medical record and is foundational in clinical informatics. Increasingly, electronic phenotyping is performed via supervised learning. We investigate the effectiveness of multitask learning for phenotyping using electronic health records (EHR) data. Multitask learning aims to improve model performance on a target task by jointly learning additional auxiliary tasks and has been used in disparate areas of machine learning. However, its utility when applied to EHR data has not been established, and prior work suggests that its benefits are inconsistent. We present experiments that elucidate when multitask learning with neural nets improves performance for phenotyping using EHR data relative to neural nets trained for a single phenotype and to well-tuned baselines. We find that multitask neural nets consistently outperform single-task neural nets for rare phenotypes but underperform for relatively more common phenotypes. The effect size increases as more auxiliary tasks are added. Moreover, multitask learning reduces the sensitivity of neural nets to hyperparameter settings for rare phenotypes. Last, we quantify phenotype complexity and find that neural nets trained with or without multitask learning do not improve on simple baselines unless the phenotypes are sufficiently complex.

3

ODAL: A ONE-SHOT DISTRIBUTED ALGORITHM TO PERFORM LOGISTIC REGRESSIONS ON ELECTRONIC HEALTH RECORDS DATA FROM MULTIPLE CLINICAL SITES

Rui Duan, Mary Regina Boland, Jason H. Moore, Yong Chen

Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania

Chen, Yong

Electronic Health Records (EHR) contain extensive information on various health outcomes and risk factors, and therefore have been broadly used in healthcare research. Integrating EHR data from multiple clinical sites can accelerate knowledge discovery and risk prediction by providing a larger sample size in a more general population, which potentially reduces clinical bias and improves estimation and prediction accuracy. To overcome the barrier of patient-level data sharing, distributed algorithms are developed to conduct statistical analyses across multiple sites through sharing only aggregated information. Current distributed algorithms often require iterative information evaluation and transfer across sites, which can potentially lead to a high communication cost in practical settings. In this study, we propose a privacy-preserving and communication-efficient distributed algorithm for logistic regression without requiring iterative communications across sites. Our simulation study showed that our algorithm reached accuracy comparable to that of the oracle estimator, where data are pooled together. We applied our algorithm to EHR data from the University of Pennsylvania health system to evaluate the risks of fetal loss due to various medication exposures.

4

PVC DETECTION USING A CONVOLUTIONAL AUTOENCODER AND RANDOM FOREST CLASSIFIER

Max Gordon, Cranos Williams

North Carolina State University

Gordon, Max

The accurate detection of premature ventricular contractions (PVCs) is an important task in cardiac care for some patients. In some cases, the usefulness to physicians of detecting PVCs stems from their long-term correlations with dangerous heart conditions. In other cases, their potential as a precursor to serious cardiac events may make their detection a useful early warning mechanism. In many of these applications, the long-term nature of the monitoring required and the infrequency of PVCs make manual observation for PVCs impractical. Existing methods of automated PVC detection suffer from drawbacks such as the need to use difficult-to-extract morphological features, domain-specific features, or large numbers of estimated parameters. In particular, systems using large numbers of trained parameters have the potential to require large amounts of training data and computation and may have issues generalizing due to their potential to overfit. To address some of these drawbacks, we developed a novel PVC detection algorithm based around a convolutional autoencoder and validated our method using the MIT-BIH arrhythmia database.

5

PLATYPUS: A MULTIPLE-VIEW LEARNING PREDICTIVE FRAMEWORK FOR CANCER DRUG SENSITIVITY PREDICTION

Kiley Graim, Verena Friedl, Kathleen E. Houlahan, Joshua M. Stuart

Dept. of Biomolecular Engineering, University of California Santa Cruz, Flatiron Institute and Princeton University, Ontario Institute of Cancer Research and University of Toronto

Graim, Kiley

Cancer is a complex collection of diseases that are to some degree unique to each patient. Precision oncology aims to identify the best drug treatment regime using molecular data on tumor samples. While omics-level data is becoming more widely available for tumor specimens, the datasets upon which computational learning methods can be trained vary in coverage from sample to sample and from data type to data type. Methods that can ‘connect the dots’ to leverage more of the information provided by these studies could offer major advantages for maximizing predictive potential. We introduce a multi-view machine-learning strategy called PLATYPUS that builds ‘views’ from multiple data sources that are all used as features for predicting patient outcomes. We show that a learning strategy that finds agreement across the views on unlabeled data increases the performance of the learning methods over any single view. We illustrate the power of the approach by deriving signatures for drug sensitivity in a large cancer cell line database. Code and additional information are available from the PLATYPUS website https://sysbiowiki.soe.ucsc.edu/platypus.

6

DEEPDOM: PREDICTING PROTEIN DOMAIN BOUNDARY FROM SEQUENCE ALONE USING STACKED BIDIRECTIONAL LSTM

Yuexu Jiang, Duolin Wang, Dong Xu

Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211, USA
Email: [email protected]

Jiang, Yuexu

Protein domain boundary prediction is usually an early step in understanding protein function and structure. Most current computational domain boundary prediction methods suffer from low accuracy and limitations in handling multi-domain types, or cannot be applied at all to certain targets such as proteins with discontinuous domains. We developed an ab-initio protein domain predictor using a stacked bidirectional LSTM model in deep learning. Our model is trained on a large number of protein sequences without using feature engineering such as sequence profiles. Hence, prediction using our method is much faster than with others, and the trained model can be applied to any type of target protein without constraint. We evaluated DeepDom by 10-fold cross validation and also by applying it to targets in different categories from CASP8 and CASP9. The comparison with other methods has shown that DeepDom outperforms most current ab-initio methods and even achieves better results than the top-level template-based method in certain cases. The code of DeepDom and the test data we used in CASP8 and CASP9 can be accessed through GitHub at https://github.com/yuexujiang/DeepDom.

7

IMPLEMENTING AND EVALUATING A GAUSSIAN MIXTURE FRAMEWORK FOR IDENTIFYING GENE FUNCTION FROM TNSEQ DATA

Kevin Li¹, Rachel Chen², William Lindsey³, Aaron Best⁴, Matthew DeJongh⁴, Christopher Henry⁵, Nathan Tintle³

¹Columbia University, ²North Carolina State University, ³Dordt College, ⁴Hope College, ⁵Argonne Laboratory

Li, Kevin

The rapid acceleration of microbial genome sequencing increases opportunities to understand bacterial gene function. Unfortunately, only a small proportion of genes have been studied. Recently, TnSeq has been proposed as a cost-effective, highly reliable approach to predict gene functions as a response to changes in a cell’s fitness before and after genomic changes. However, major questions remain about how best to determine whether an observed quantitative change in fitness represents a meaningful change. To address this limitation, we develop a Gaussian mixture model framework for classifying gene function from TnSeq experiments. In order to implement the mixture model, we present the Expectation-Maximization algorithm and a hierarchical Bayesian model sampled using Stan’s Hamiltonian Monte-Carlo sampler. We compare these implementations against the frequentist method used in current TnSeq literature. From simulations and real data produced by E. coli TnSeq experiments, we show that the Bayesian implementation of the Gaussian mixture framework provides the most consistent classification results.
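
The Expectation-Maximization route mentioned in this abstract can be illustrated with a minimal sketch. It assumes a one-dimensional, two-component Gaussian mixture fitted to per-gene fitness changes; the function names, the initialisation, and the toy values are hypothetical and are not taken from the authors' implementation, which may differ in the number of components and in many other details.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_component_gmm(values, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture with plain EM.

    Returns (weights, means, sds, resp), where resp[i] is the posterior
    probability that values[i] belongs to the second component.
    """
    lo, hi = min(values), max(values)
    weights = [0.5, 0.5]
    means = [lo, hi]                      # crude start: one mean at each extreme
    spread = (hi - lo) / 4.0 or 1.0       # fall back to 1.0 if all values are equal
    sds = [spread, spread]
    resp = [0.5] * len(values)
    for _ in range(n_iter):
        # E-step: posterior membership of each value in component 1.
        resp = []
        for x in values:
            p0 = weights[0] * normal_pdf(x, means[0], sds[0])
            p1 = weights[1] * normal_pdf(x, means[1], sds[1])
            total = p0 + p1
            resp.append(p1 / total if total > 0.0 else 0.5)
        # M-step: re-estimate weights, means and standard deviations.
        n1 = sum(resp)
        n0 = len(values) - n1
        if n0 < 1e-9 or n1 < 1e-9:
            break                         # one component has collapsed
        means = [
            sum((1.0 - r) * x for r, x in zip(resp, values)) / n0,
            sum(r * x for r, x in zip(resp, values)) / n1,
        ]
        var0 = sum((1.0 - r) * (x - means[0]) ** 2 for r, x in zip(resp, values)) / n0
        var1 = sum(r * (x - means[1]) ** 2 for r, x in zip(resp, values)) / n1
        sds = [max(math.sqrt(var0), 1e-3), max(math.sqrt(var1), 1e-3)]
        weights = [n0 / len(values), n1 / len(values)]
    return weights, means, sds, resp


# Toy fitness changes (invented): five genes with a clear fitness defect
# and six genes with essentially no change.
fitness_change = [-2.1, -1.9, -2.0, -2.2, -1.8, 0.1, -0.1, 0.0, 0.2, -0.2, 0.05]
weights, means, sds, resp = fit_two_component_gmm(fitness_change)
print(means)
```

A gene would then be assigned to the component with the larger posterior probability. A hierarchical Bayesian treatment such as the one described in the abstract would replace these point estimates with posterior samples.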

8

RES2S2AM: DEEP RESIDUAL NETWORK-BASED MODEL FOR IDENTIFYING FUNCTIONAL NONCODING SNPS IN TRAIT-ASSOCIATED REGIONS

Zheng Liu, Yao Yao, Qi Wei, Benjamin Weeder, Stephen A. Ramsey

Oregon State University

Liu, Zheng

Noncoding single nucleotide polymorphisms (SNPs) and their target genes are important components of the heritability of diseases and other polygenic traits. Identifying these SNPs and target genes could potentially reveal new molecular mechanisms and advance precision medicine. For polygenic traits, genome-wide association studies (GWAS) are preferred tools for identifying trait-associated regions. However, identifying causal noncoding SNPs within such regions is a difficult problem in computational biology. The DNA sequence context of a noncoding SNP is well-established as an important source of information that is beneficial for discriminating functional from nonfunctional noncoding SNPs. We describe the use of a deep residual network (ResNet)-based model, entitled Res2s2aM, that fuses flanking DNA sequence information with additional SNP annotation information to discriminate functional from nonfunctional noncoding SNPs. On a ground-truth set of disease-associated SNPs compiled from the Genome-wide Repository of Associations between SNPs and Phenotypes (GRASP) database, Res2s2aM improves the prediction accuracy of functional SNPs significantly in comparison to models based only on sequence information as well as a leading tool for post-GWAS noncoding SNP prioritization (RegulomeDB).

9

BI-DIRECTIONAL RECURRENT NEURAL NETWORK MODELS FOR GEOGRAPHIC LOCATION EXTRACTION IN BIOMEDICAL LITERATURE

Arjun Magge¹, Davy Weissenbacher², Abeed Sarker², Matthew Scotch¹, Graciela Gonzalez-Hernandez²

¹Arizona State University, ²University of Pennsylvania

Magge, Arjun

Phylogeography research involving virus spread and tree reconstruction relies on accurate geographic locations of infected hosts. Insufficient level of geographic information in nucleotide sequence repositories such as GenBank motivates the use of natural language processing methods for extracting geographic location names (toponyms) in the scientific article associated with the sequence, and disambiguating the locations to their co-ordinates. In this paper, we present an extensive study of multiple recurrent neural network architectures for the task of extracting geographic locations and their effective contribution to the disambiguation task using population heuristics. The methods presented in this paper achieve a strict detection F-1 score of 0.94, disambiguation accuracy of 91% and an overall resolution F-1 score of 0.88 that are significantly higher than previously developed methods, improving our capability to find the location of infected hosts and enrich metadata information.
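
For readers unfamiliar with the metric, a strict detection F-1 score can be sketched as follows. The sketch assumes the common convention that a predicted toponym counts as correct only when its character span matches a gold span exactly; the abstract does not spell out its matching rule, and the function name and toy spans below are hypothetical.

```python
def strict_f1(gold_spans, predicted_spans):
    """Precision, recall and F-1 where only exact (start, end) matches count."""
    gold = set(gold_spans)
    predicted = set(predicted_spans)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy character spans (invented): the second prediction is off by one
# character, so under strict matching it is both a miss and a false alarm.
gold = [(0, 6), (20, 28), (40, 45)]
predicted = [(0, 6), (20, 27), (40, 45)]
precision, recall, f1 = strict_f1(gold, predicted)
print(precision, recall, f1)
```

Under a more lenient overlap-based rule the off-by-one prediction would count as a hit, which is why strict scores are usually the lower of the two.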

10

COMPUTATIONAL KIR COPY NUMBER DISCOVERY REVEALS INTERACTION BETWEEN INHIBITORY RECEPTOR BURDEN AND SURVIVAL

Rachel M. Pyke¹, Raphael Genolet², Alexandre Harari², George Coukos², David Gfeller², Hannah Carter¹

¹University of California - San Diego, ²Ludwig Institute for Cancer Research - University of Lausanne

Pyke, Rachel M.

Natural killer (NK) cells have increasingly become a target of interest for immunotherapies¹. NK cells express killer immunoglobulin-like receptors (KIRs), which play a vital role in immune response to tumors by detecting cellular abnormalities. The genomic region encoding the 16 KIR genes displays high polymorphic variability in human populations, making it difficult to resolve individual genotypes based on next generation sequencing data. As a result, the impact of polymorphic KIR variation on cancer phenotypes has been understudied. Currently, labor-intensive, experimental techniques are used to determine an individual’s KIR gene copy number profile. Here, we develop an algorithm to determine the germline copy number of KIR genes from whole exome sequencing data and apply it to a cohort of nearly 5000 cancer patients. We use a k-mer based approach to capture sequences unique to specific genes, count their occurrences in the set of reads derived from an individual and compare the individual’s k-mer distribution to that of the population. Copy number results demonstrate high concordance with population copy number expectations. Our method reveals that the burden of inhibitory KIR genes is associated with survival in two tumor types, highlighting the potential importance of KIR variation in understanding tumor development and response to immunotherapy.
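
The first two steps of the k-mer approach described above (finding k-mers unique to one gene, then counting them in an individual's reads) can be sketched as below. The gene names, sequences, reads, and the choice of k are invented for illustration and are far shorter than real KIR sequences; the comparison against the population distribution is not shown.

```python
from collections import Counter


def kmers(seq, k):
    """All overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def gene_unique_kmers(gene_seqs, k):
    """Map each gene to the k-mers that occur in that gene and in no other."""
    owners = {}
    for gene, seq in gene_seqs.items():
        for kmer in set(kmers(seq, k)):
            owners.setdefault(kmer, set()).add(gene)
    unique = {gene: set() for gene in gene_seqs}
    for kmer, genes in owners.items():
        if len(genes) == 1:
            unique[next(iter(genes))].add(kmer)
    return unique


def count_unique_kmer_hits(reads, unique, k):
    """Count, per gene, how often its unique k-mers occur in the reads."""
    lookup = {kmer: gene for gene, kmer_set in unique.items() for kmer in kmer_set}
    hits = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            gene = lookup.get(kmer)
            if gene is not None:
                hits[gene] += 1
    return hits


# Hypothetical toy genes that share the k-mer "ACGT" but differ elsewhere.
genes = {"geneA": "ACGTACGGA", "geneB": "ACGTTTGCA"}
reads = ["GTACGG", "TTTGCA", "ACGTAC"]
unique = gene_unique_kmers(genes, k=4)
hits = count_unique_kmer_hits(reads, unique, k=4)
print(dict(hits))
```

In a copy-number setting, per-gene hit counts would typically be normalised for sequencing depth before being compared across individuals; that normalisation is an assumption about practice and is not stated in the abstract.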

11

SEMANTIC WORKFLOWS FOR BENCHMARK CHALLENGES: ENHANCING COMPARABILITY, REUSABILITY AND REPRODUCIBILITY

Arunima Srivastava¹, Ravali Adusumilli², Hunter Boyce², Daniel Garijo³, Varun Ratnakar³, Rajiv Mayani³, Thomas Yu⁴, Raghu Machiraju¹, Yolanda Gil³, Parag Mallick²

¹The Ohio State University, ²Stanford University, ³University of Southern California, ⁴Sage Bionetworks

Srivastava, Arunima

Benchmark challenges, such as the Critical Assessment of Structure Prediction (CASP) and Dialogue for Reverse Engineering Assessments and Methods (DREAM) have been instrumental in driving the development of bioinformatics methods. Typically, challenges are posted, and then competitors perform a prediction based upon blinded test data. Challengers then submit their answers to a central server where they are scored. Recent efforts to automate these challenges have been enabled by systems in which challengers submit Docker containers, a unit of software that packages up code and all of its dependencies, to be run on the cloud. Despite their incredible value for providing an unbiased test-bed for the bioinformatics community, there remain opportunities to further enhance the potential impact of benchmark challenges. Specifically, current approaches only evaluate end-to-end performance; it is nearly impossible to directly compare methodologies or parameters. Furthermore, the scientific community cannot easily reuse challengers’ approaches, due to lack of specifics, ambiguity in tools and parameters as well as problems in sharing and maintenance. Lastly, the intuition behind why particular steps are used is not captured, as the proposed workflows are not explicitly defined, making it cumbersome to understand the flow and utilization of data. Here we introduce an approach to overcome these limitations based upon the WINGS semantic workflow system. Specifically, WINGS enables researchers to submit complete semantic workflows as challenge submissions. By submitting entries as workflows, it then becomes possible to compare not just the results and performance of a challenger, but also the methodology employed. This is particularly important when dozens of challenge entries may use nearly identical tools, but with only subtle changes in parameters (and radical differences in results). WINGS uses a component driven workflow design and offers intelligent parameter and data selection by reasoning about data characteristics. This proves to be especially critical in bioinformatics workflows where using default or incorrect parameter values is prone to drastically altering results. Different challenge entries may be readily compared through the use of abstract workflows, which also facilitate reuse. WINGS is housed on a cloud based setup, which stores data, dependencies and workflows for easy sharing and utility. It also has the ability to scale workflow executions using distributed computing through the Pegasus workflow execution system. We demonstrate the application of this architecture to the DREAM proteogenomic challenge.

12

REMOVING CONFOUNDING FACTORS ASSOCIATED WEIGHTS IN DEEP NEURAL NETWORKS IMPROVES THE PREDICTION ACCURACY FOR HEALTHCARE APPLICATIONS

Haohan Wang¹, Zhenglin Wu², Eric P. Xing³

¹Carnegie Mellon University, ²University of Illinois Urbana-Champaign, ³Carnegie Mellon University

Wang, Haohan

The proliferation of healthcare data has brought opportunities to apply data-driven approaches, such as machine learning methods, to assist diagnosis. Recently, many deep learning methods have shown impressive success in predicting disease status from raw input data. However, the "black-box" nature of deep learning and the high-reliability requirement of biomedical applications have created new challenges regarding the existence of confounding factors. In this paper, with a brief argument that inappropriate handling of confounding factors will lead to models' sub-optimal performance in real-world applications, we present an efficient method that can remove the influences of confounding factors such as age or gender to improve the across-cohort prediction accuracy of neural networks. One distinct advantage of our method is that it only requires minimal changes to the baseline model's architecture so that it can be plugged into most existing neural networks. We conduct experiments across CT-scan, MRA, and EEG brain wave data with convolutional neural networks and LSTM to verify the efficiency of our method.

13

PRECISION MEDICINE: IMPROVING HEALTH THROUGH HIGH-RESOLUTION ANALYSIS OF PERSONAL DATA

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

14

AN OPTIMAL POLICY FOR PATIENT LABORATORY TESTS IN INTENSIVE CARE UNITS

Li-Fang Cheng, Niranjani Prasad, Barbara E. Engelhardt

Princeton University

Prasad, Niranjani

Laboratory testing is an integral tool in the management of patient care in hospitals, particularly in intensive care units (ICUs). There exists an inherent trade-off in the selection and timing of lab tests between considerations of the expected utility in clinical decision-making of a given test at a specific time, and the associated cost or risk it poses to the patient. In this work, we introduce a framework that learns policies for ordering lab tests which optimizes for this trade-off. Our approach uses batch off-policy reinforcement learning with a composite reward function based on clinical imperatives, applied to data that include examples of clinicians ordering labs for patients. To this end, we develop and extend principles of Pareto optimality to improve the selection of actions based on multiple reward function components while respecting typical procedural considerations and prioritization of clinical goals in the ICU. Our experiments show that we can estimate a policy that reduces the frequency of lab tests and optimizes timing to minimize information redundancy. We also find that the estimated policies typically suggest ordering lab tests well ahead of critical onsets (such as mechanical ventilation or dialysis) that depend on the lab results. We evaluate our approach by quantifying how these policies may initiate earlier onset of treatment.
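
The basic notion of Pareto optimality over multiple reward components can be sketched as follows. This shows only plain dominance filtering over candidate actions; it is not the authors' extension of the principle, and the action names and the two reward components (information gain, negative cost) are hypothetical.

```python
def dominates(a, b):
    """True if reward vector a is at least as good as b in every component
    and strictly better in at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(action_rewards):
    """Return the actions whose reward vectors no other action dominates."""
    front = []
    for name, rewards in action_rewards.items():
        dominated = any(
            dominates(other_rewards, rewards)
            for other_name, other_rewards in action_rewards.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Toy reward vectors (invented): (information gain, negative cost to the patient).
candidate_actions = {
    "order_now": (0.9, -1.0),
    "wait": (0.2, 0.0),
    "order_late": (0.5, -1.0),
    "order_partial": (0.6, -0.5),
}
print(pareto_front(candidate_actions))
```

Here `order_late` drops out because `order_now` offers more information at the same cost. A policy would then choose among the remaining actions, for example according to a prioritisation of clinical goals.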

15

CROWDVARIANT: A CROWDSOURCING APPROACH TO CLASSIFY COPY NUMBER VARIANTS

Peyton Greenside¹, Justin Zook², Marc Salit³, Ryan Poplin⁴, Madeleine Cule⁵, Mark DePristo⁴

¹Stanford University, ²National Institute of Standards and Technologies (NIST), ³National Institute of Standards and Technologies (NIST) / Joint Initiative for Metrology in Biology (JIMB), ⁴Google Inc. / Verily Life Sciences, ⁵Calico / Verily Life Sciences

Greenside, Peyton

Copy number variants (CNVs) are an important type of genetic variation that play a causal role in many diseases. The ability to identify high quality CNVs is of substantial clinical relevance. However, CNVs are notoriously difficult to identify accurately from array-based methods and next-generation sequencing (NGS) data, particularly for small (<10 kbp) CNVs. Manual curation by experts widely remains the gold standard but cannot scale with the pace of sequencing, particularly in fast-growing clinical applications. We present the first proof-of-principle study demonstrating high throughput manual curation of putative CNVs by non-experts. We developed a crowdsourcing framework, called CrowdVariant, that leverages Google's high-throughput crowdsourcing platform to create a high confidence set of deletions for NA24385 (NIST HG002/RM 8391), an Ashkenazim reference sample developed in partnership with the Genome In A Bottle (GIAB) Consortium. We show that non-experts tend to agree both with each other and with experts on putative CNVs. We show that crowdsourced non-expert classifications can be used to accurately assign copy number status to putative CNV calls and identify 1,781 high confidence deletions in a reference sample. Multiple lines of evidence suggest these calls are a substantial improvement over existing CNV callsets and can also be useful in benchmarking and improving CNV calling algorithms. Our crowdsourcing methodology takes the first step toward showing the clinical potential for manual curation of CNVs at scale and can further guide other crowdsourcing genomics applications.

16

A REPOSITORY OF MICROBIAL MARKER GENES RELATED TO HUMAN HEALTH AND DISEASES FOR HOST PHENOTYPE PREDICTION USING MICROBIOME DATA

Wontack Han, Yuzhen Ye

Indiana University

Han, Wontack

Microbiome research is going through an evolutionary transition from focusing on the characterization of reference microbiomes associated with different environments/hosts to translational applications, including using the microbiome for disease diagnosis, improving the efficacy of cancer treatments, and prevention of diseases (e.g., using probiotics). Microbial markers have been identified from microbiome data derived from cohorts of patients with different diseases, treatment responsiveness, etc., and often predictors based on these markers were built for predicting host phenotype given a microbiome dataset (e.g., to predict if a person has type 2 diabetes given his or her microbiome data). Unfortunately, these microbial markers and predictors are often not published and so are not reusable by others. In this paper, we report the curation of a repository of microbial marker genes and predictors built from these markers for microbiome-based prediction of host phenotype, and a computational pipeline called Mi2P (from Microbiome to Phenotype) for using the repository. As an initial effort, we focus on microbial marker genes related to two diseases, type 2 diabetes and liver cirrhosis, and immunotherapy efficacy for two types of cancer, non-small-cell lung cancer (NSCLC) and renal cell carcinoma (RCC). We characterized the marker genes from metagenomic data using our recently developed subtractive assembly approach. We showed that predictors built from these microbial marker genes can provide fast and reasonably accurate prediction of host phenotype given microbiome data. As understanding and making use of microbiome data (our second genome) is becoming vital as we move forward in this age of precision health and precision medicine, we believe that such a repository will be useful for enabling translational applications of microbiome data.

17

AICM: A GENUINE FRAMEWORK FOR CORRECTING INCONSISTENCY BETWEEN LARGE PHARMACOGENOMICS DATASETS

Zhiyue Tom Hu¹, Yuting Ye¹, Patrick A. Newbury², Haiyan Huang²,³,⁴, Bin Chen⁵

¹University of California Berkeley, Department of Biostatistics; ²University of California Berkeley, Department of Pediatrics and Human Development; ³Michigan State University, Department of Statistics, ⁴University of California Berkeley, Department of Pharmacology and Toxicology; ⁵Michigan State University

Hu, Zhiyue

The inconsistency of open pharmacogenomics datasets produced by different studies limits the usage of such datasets in many tasks, such as biomarker discovery. Investigation of multiple pharmacogenomics datasets confirmed that the pairwise sensitivity data correlation between drugs, or rows, across different studies (drug-wise) is relatively low, while the pairwise sensitivity data correlation between cell-lines, or columns, across different studies (cell-wise) is considerably strong. This common interesting observation across multiple pharmacogenomics datasets suggests the existence of subtle consistency among the different studies (i.e., strong cell-wise correlation). However, significant noise is also present (i.e., weak drug-wise correlation) and has prevented researchers from comfortably using the data directly. Motivated by this observation, we propose a novel framework for addressing the inconsistency between large-scale pharmacogenomics datasets. Our method can significantly boost the drug-wise correlation and can be easily applied to re-summarized and normalized datasets proposed by others. We also investigate our algorithm based on many different criteria to demonstrate that the corrected datasets are not only consistent, but also biologically meaningful. Eventually, we propose to extend our main algorithm into a framework, so that in the future when more datasets become publicly available, our framework can hopefully offer a "ground-truth" guidance for references.
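
One way to read the drug-wise and cell-wise comparison is sketched below. It assumes each study is a matrix with drugs as rows and cell lines as columns, and that agreement is measured by the Pearson correlation of matching rows (or matching columns) between two studies. This reading, the function names, and the toy matrices are assumptions for illustration; the abstract does not define the computation precisely, and the AICM correction itself is not shown.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if one is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def row_wise_correlations(study_a, study_b):
    """Correlate each row of study_a with the same row of study_b."""
    return [pearson(row_a, row_b) for row_a, row_b in zip(study_a, study_b)]


def column_wise_correlations(study_a, study_b):
    """Correlate each column of study_a with the same column of study_b."""
    return row_wise_correlations(list(zip(*study_a)), list(zip(*study_b)))


# Toy sensitivity matrices (invented): 2 drugs (rows) x 3 cell lines (columns).
study_a = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
study_b = [[1.0, 2.0, 3.0], [6.0, 4.0, 2.0]]
print(row_wise_correlations(study_a, study_b))
print(column_wise_correlations(study_a, study_b))
```

With real data the two lists would be summarised (for example by their medians) to contrast how well matching rows and matching columns agree between studies.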

18

INTEGRATING RNA EXPRESSION AND VISUAL FEATURES FOR IMMUNE INFILTRATE PREDICTION

Derek Reiman¹, Lingdao Sha¹, Irvin Ho¹, Timothy Tan², Denise Lau¹, Aly A. Khan³

¹Tempus Labs, ²Northwestern University, ³Toyota Technological Institute at Chicago

Khan, Aly

Patient responses to cancer immunotherapy are shaped by their unique genomic landscape and tumor microenvironment. Clinical advances in immunotherapy are changing the treatment landscape by enhancing a patient's immune response to eliminate cancer cells. While this provides potentially beneficial treatment options for many patients, only a minority of these patients respond to immunotherapy. In this work, we examined RNA-seq data and digital pathology images from individual patient tumors to more accurately characterize the tumor-immune microenvironment. Several studies implicate an inflamed microenvironment and increased percentage of tumor infiltrating immune cells with better response to specific immunotherapies in certain cancer types. We developed NEXT (Neural-based models for integrating gene EXpression and visual Texture features) to more accurately model immune infiltration in solid tumors. To demonstrate the utility of the NEXT framework, we predicted immune infiltrates across four different cancer types and evaluated our predictions against expert pathology review. Our analyses demonstrate that integration of imaging features improves prediction of the immune infiltrate. Of note, this effect was preferentially observed for B cells and CD8 T cells. In sum, our work effectively integrates both RNA-seq and imaging data in a clinical setting and provides a more reliable and accurate prediction of the immune composition in individual patient tumors.

19

OUTGROUP MACHINE LEARNING APPROACH IDENTIFIES SINGLE NUCLEOTIDE VARIANTS IN NONCODING DNA ASSOCIATED WITH AUTISM SPECTRUM DISORDER

Maya Varma, Kelley Marie Paskov, Jae-Yoon Jung, Brianna Sierra Chrisman, Nate Tyler Stockham, Peter Yigitcan Washington, Dennis Paul Wall

Stanford University

Varma, Maya

Autism spectrum disorder (ASD) is a heritable neurodevelopmental disorder affecting 1 in 59 children. While noncoding genetic variation has been shown to play a major role in many complex disorders, the contribution of these regions to ASD susceptibility remains unclear. Genetic analyses of ASD typically use unaffected family members as controls; however, we hypothesize that this method does not effectively elevate variant signal in the noncoding region due to family members having subclinical phenotypes arising from common genetic mechanisms. In this study, we use a separate, unrelated outgroup of individuals with progressive supranuclear palsy (PSP), a neurodegenerative condition with no known etiological overlap with ASD, as a control population. We use whole genome sequencing data from a large cohort of 2182 children with ASD and 379 controls with PSP, sequenced at the same facility with the same machines and variant calling pipeline, in order to investigate the role of noncoding variation in the ASD phenotype. We analyze seven major types of noncoding variants: microRNAs, human accelerated regions, hypersensitive sites, transcription factor binding sites, DNA repeat sequences, simple repeat sequences, and CpG islands. After identifying and removing batch effects between the two groups, we trained an l1-regularized logistic regression classifier to predict ASD status from each set of variants. The classifier trained on simple repeat sequences performed well on a held-out test set (AUC-ROC = 0.960); this classifier was also able to differentiate ASD cases from controls when applied to a completely independent dataset (AUC-ROC = 0.960). This suggests that variation in simple repeat regions is predictive of the ASD phenotype and may contribute to ASD risk. Our results show the importance of the noncoding region and the utility of independent control groups in effectively linking genetic variation to disease phenotype for complex disorders.
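
The AUC-ROC figures quoted above summarise how well a classifier's scores rank cases above controls. A minimal sketch of the metric follows, using its rank interpretation (the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half). The labels and scores are invented and unrelated to the study's data, and the l1-regularized classifier itself is not reproduced.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (case, control) pairs ranked correctly.

    labels: 1 for case, 0 for control. Ties in score count as half a win.
    """
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy predictions (invented): three cases and two controls.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
print(auc_roc(labels, scores))
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; library implementations typically use a sort-based computation instead.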

20

PRECISION DRUG REPURPOSING VIA CONVERGENT EQTL-BASED MOLECULES AND PATHWAY TARGETING INDEPENDENT DISEASE-ASSOCIATED POLYMORPHISMS

Francesca Vitali1,2, Joanne Berghout1,2,3, Jungwei Fan1,2, Jianrong Li1, Qike Li1, Haiquan Li1,2,4, Yves A. Lussier1,2,3,5

1Center for Biomedical Informatics and Biostatistics (CB2) of The University of Arizona, 2Department of Medicine COM-T of The University of Arizona, 3The Center for Applied Genetics and Genomics in Medicine of The University of Arizona, 4Department of Biosystems Engineering of The University of Arizona, 5UA Cancer Center UA Health Science (UAHS) of The University of Arizona

Vitali, Francesca

Repurposing existing drugs for new therapeutic indications can improve success rates and streamline development. Use of large-scale biomedical data repositories, including eQTL regulatory relationships and genome-wide disease risk associations, offers opportunities to propose novel indications for drugs targeting common or convergent molecular candidates associated with two or more diseases. This proposed novel computational approach scales across 262 complex diseases, building a multi-partite hierarchical network integrating (i) GWAS-derived SNP-to-disease associations, (ii) eQTL-derived SNP-to-eGene associations incorporating both cis- and trans-relationships from 19 tissues, (iii) protein target-to-drug, and (iv) drug-to-disease indications with (v) Gene Ontology-based information theoretic semantic (ITS) similarity calculated between protein target functions. Our hypothesis is that if two diseases are associated with a common or functionally similar eGene, and a drug targeting that eGene/protein in one disease exists, the second disease becomes a potential repurposing indication. To explore this, all possible pairs of independently segregating GWAS-derived SNPs were generated, and a statistical network of similarity within each SNP-SNP pair was calculated according to scale-free overrepresentation of convergent biological processes activity in regulated eGenes (ITS_eGENE-eGENE) and scale-free overrepresentation of common eGene targets between the two SNPs (ITS_SNP-SNP). Significance of ITS_SNP-SNP was conservatively estimated using empirical scale-free permutation resampling keeping the node-degree constant for each molecule in each permutation. We identified 26 new drug repurposing indication candidates spanning 89 GWAS diseases, including a potential repurposing of the calcium-channel blocker Verapamil from coronary disease to gout. Predictions from our approach are compared to known drug indications using DrugBank as a gold standard (odds ratio = 13.1, p-value = 2.49 x 10^-8). Because of specific disease-SNP associations to candidate drug targets, the proposed method provides evidence for future precision drug repositioning to a patient’s specific polymorphisms.

21

DETECTING POTENTIAL PLEIOTROPY ACROSS CARDIOVASCULAR AND NEUROLOGICAL DISEASES USING UNIVARIATE, BIVARIATE, AND MULTIVARIATE METHODS ON 43,870 INDIVIDUALS FROM THE EMERGE NETWORK

Xinyuan Zhang1, Yogasudha Veturi1, Shefali S. Verma1, William Bone1, Anurag Verma1, Anastasia M. Lucas1, Scott Hebbring2, Joshua C. Denny3, Ian Stanaway4, Gail P. Jarvik4, David Crosslin4, Eric B. Larson5, Laura Rasmussen-Torvik6, Sarah A. Pendergrass7, Jordan W. Smoller8, Hakon Hakonarson9, Patrick Sleiman9, Chunhua Weng10, David Fasel10, Wei-Qi Wei3, Iftikhar Kullo11, Daniel Schaid11, Wendy K. Chung10, Marylyn D. Ritchie1

1University of Pennsylvania, 2Marshfield Clinic, 3Vanderbilt University, 4University of Washington, 5Kaiser Permanente Washington Health Research Institute, 6Northwestern University, 7Geisinger Health System, 8Massachusetts General Hospital, 9Children's Hospital of Philadelphia, 10Columbia University, 11Mayo Clinic

Zhang, Xinyuan

The link between cardiovascular diseases and neurological disorders has been widely observed in the aging population. Disease prevention and treatment rely on understanding the potential genetic nexus of multiple diseases in these categories. In this study, we were interested in detecting pleiotropy, or the phenomenon in which a genetic variant influences more than one phenotype. Marker-phenotype association approaches can be grouped into univariate, bivariate, and multivariate categories based on the number of phenotypes considered at one time. Here we applied one statistical method per category followed by an eQTL colocalization analysis to identify potential pleiotropic variants that contribute to the link between cardiovascular and neurological diseases. We performed our analyses on ~530,000 common SNPs coupled with 65 electronic health record (EHR)-based phenotypes in 43,870 unrelated European adults from the Electronic Medical Records and Genomics (eMERGE) network. There were 31 variants identified by all three methods that showed significant associations across late onset cardiac- and neurologic-diseases. We further investigated functional implications of gene expression on the detected “lead SNPs” via colocalization analysis, providing a deeper understanding of the discovered associations. In summary, we present the framework and landscape for detecting potential pleiotropy using univariate, bivariate, multivariate, and colocalization methods. Further exploration of these potentially pleiotropic genetic variants will work toward understanding disease causing mechanisms across cardiovascular and neurological diseases and may assist in considering disease prevention as well as drug repositioning in future research.

22

SINGLE CELL ANALYSIS – WHAT IS THE FUTURE?

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

23

LISA: ACCURATE RECONSTRUCTION OF CELL TRAJECTORY AND PSEUDO-TIME FOR MASSIVE SINGLE CELL RNA-SEQ DATA

Yang Chen1, Yuping Zhang2, Zhengqing Ouyang1

1The Jackson Laboratory for Genomic Medicine, 2University of Connecticut

Ouyang, Zhengqing

Cell trajectory reconstruction based on single cell RNA sequencing is important for obtaining the landscape of different cell types and discovering cell fate transitions. Despite intense effort, analyzing massive single cell RNA-seq datasets is still challenging. We propose a new method named Landmark Isomap for Single-cell Analysis (LISA). LISA is an unsupervised approach to build cell trajectory and compute pseudo-time in the isometric embedding based on geodesic distances. The advantages of LISA include: (1) It utilizes k-nearest-neighbor graph and hierarchical clustering to identify cell clusters, peaks and valleys in low-dimension representation of the data; (2) Based on Landmark Isomap, it constructs the main geometric structure of cell lineages; (3) It projects cells to the edges of the main cell trajectory to generate the global pseudo-time. Assessments on simulated and real datasets demonstrate the advantages of LISA on cell trajectory and pseudo-time reconstruction compared to Monocle2 and TSCAN. LISA is accurate, fast, and requires less memory usage, allowing its applications to massive single cell datasets generated from current experimental platforms.
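
The idea of measuring distance along a k-nearest-neighbor graph rather than in a straight line can be illustrated with a small sketch. This is not the LISA implementation: the function names `knn_graph` and `geodesic_from`, the toy coordinates, and the choice of cell 0 as the root are all assumptions made for illustration, and the landmark selection, clustering, and projection steps named in the abstract are omitted.

```python
import heapq
import math


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbor graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        neighbors = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in neighbors[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic_from(graph, root):
    """Shortest-path (geodesic) distance from root to every node, via Dijkstra."""
    dist = {node: math.inf for node in graph}
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical cells lying along an L-shaped path in a 2-D embedding.
cells = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (2.0, 2.0)]
graph = knn_graph(cells, k=2)
pseudotime = geodesic_from(graph, root=0)  # graph distance used as a toy pseudo-time
```

In this toy example the last cell is about 2.83 units from the root in a straight line but 4 units away along the graph, which is the sense in which geodesic distance follows the shape of the data.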

24

PARAMETER TUNING IS A KEY PART OF DIMENSIONALITY REDUCTION VIA DEEP VARIATIONAL AUTOENCODERS FOR SINGLE CELL RNA TRANSCRIPTOMICS

Qiwen Hu, Casey S. Greene

University of Pennsylvania

Hu, Qiwen

Single-cell RNA sequencing (scRNA-seq) is a powerful tool to profile the transcriptomes of a large number of individual cells at a high resolution. These data usually contain measurements of gene expression for many genes in thousands or tens of thousands of cells, though some datasets now reach the million-cell mark. Projecting high-dimensional scRNA-seq data into a low dimensional space aids downstream analysis and data visualization. Many recent preprints accomplish this using variational autoencoders (VAE), generative models that learn underlying structure of data by compressing it into a constrained, low dimensional space. The low dimensional spaces generated by VAEs have revealed complex patterns and novel biological signals from large-scale gene expression data and drug response predictions. Here, we evaluate a simple VAE approach for gene expression data, Tybalt, by training and measuring its performance on sets of simulated scRNA-seq data. We find a number of counter-intuitive performance features: e.g., deeper neural networks can struggle when datasets contain more observations under some parameter configurations. We show that these methods are highly sensitive to parameter tuning: when tuned, the Tybalt model, which was not optimized for scRNA-seq data, outperforms other popular dimension reduction approaches: PCA, ZIFA, UMAP and t-SNE. On the other hand, without tuning, performance can also be remarkably poor on the same data. Our results should discourage authors and reviewers from relying on self-reported performance comparisons to evaluate the relative value of contributions in this area at this time. Instead, we recommend that attempts to compare or benchmark autoencoder methods for scRNA-seq data be performed by disinterested third parties or by methods developers only on unseen benchmark data that are provided to all participants simultaneously, because the potential for performance differences due to unequal parameter tuning is so high.

25

TOPOLOGICAL METHODS FOR VISUALIZATION AND ANALYSIS OF HIGH DIMENSIONAL SINGLE-CELL RNA SEQUENCING DATA

Tongxin Wang1, Travis Johnson2, Jie Zhang3, Kun Huang4,5

1Department of Computer Science, Indiana University Bloomington; 2Department of Biomedical Informatics, Ohio State University; 3Department of Medical and Molecular Genetics, Indiana University School of Medicine; 4Department of Medicine, Indiana University School of Medicine; 5Regenstrief Institute

Wang, Tongxin

Single-cell RNA sequencing (scRNA-seq) techniques have been very powerful in analyzing heterogeneous cell populations and identifying cell types. Visualizing scRNA-seq data can help researchers effectively extract meaningful biological information and make new discoveries. While commonly used scRNA-seq visualization methods, such as t-SNE, are useful in detecting cell clusters, they often tear apart the intrinsic continuous structure in gene expression profiles. Topological Data Analysis (TDA) approaches like Mapper capture the shape of data by representing data as topological networks. TDA approaches are robust to noise and different platforms, while preserving the locality and data continuity. Moreover, instead of analyzing the whole dataset, Mapper allows researchers to explore biological meanings of specific pathways and genes by using different filter functions. In this paper, we applied Mapper to visualize scRNA-seq data. Our method can not only capture the clustering structure of cells, but also preserve the continuous gene expression topologies of cells. We demonstrated that by combining with gene co-expression network analysis, our method can reveal differential expression patterns of gene co-expression modules along the Mapper visualization.

26

WHEN BIOLOGY GETS PERSONAL: HIDDEN CHALLENGES OF PRIVACY AND ETHICS IN BIOLOGICAL BIG DATA

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

27

LEVERAGING SUMMARY STATISTICS TO MAKE INFERENCES ABOUT COMPLEX PHENOTYPES IN LARGE BIOBANKS

Angela Gasdaska1, Derek Friend2, Rachel Chen3, Jason Westra4, Matthew Zawistowski5, William Lindsey4, Nathan Tintle4

1Emory University, 2University of Nevada Reno, 3North Carolina State University, 4Dordt College, 5University of Michigan Ann Arbor

Tintle, Nathan

As genetic sequencing becomes less expensive and datasets linking genetic data and medical records (e.g., Biobanks) become larger and more common, issues of data privacy and computational challenges become more necessary to address in order to realize the benefits of these datasets. One possibility for alleviating these issues is through the use of already-computed summary statistics (e.g., slopes and standard errors from a regression model of a phenotype on a genotype). If groups share summary statistics from their analyses of biobanks, many of the privacy issues and computational challenges concerning the access of these data could be bypassed. In this paper we explore the possibility of using summary statistics from simple linear models of phenotype on genotype in order to make inferences about more complex phenotypes (those that are derived from two or more simple phenotypes). We provide exact formulas for the slope, intercept, and standard error of the slope for linear regressions when combining phenotypes. Derived equations are validated via simulation and tested on a real dataset exploring the genetics of fatty acids.
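
One case of combining phenotypes can be checked directly: because ordinary least squares is linear in the response, the slope and intercept for a weighted sum of two phenotypes equal the same weighted sum of the per-phenotype slopes and intercepts. The sketch below illustrates only that general property and does not reproduce the paper's formulas. The data, the weights `w_a` and `w_b`, and the helper `ols` are made up for illustration, and the standard error of the combined slope is not attempted here.

```python
def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical genotype dosages and two simple phenotypes.
genotype = [0, 1, 2, 0, 1, 2, 1, 0]
pheno_a = [1.0, 1.4, 2.1, 0.8, 1.6, 1.9, 1.5, 1.1]
pheno_b = [3.2, 2.9, 2.5, 3.0, 2.7, 2.6, 3.1, 3.3]

a_int, a_slope = ols(genotype, pheno_a)
b_int, b_slope = ols(genotype, pheno_b)

# Route 1: build the combined phenotype from individual-level data and refit.
w_a, w_b = 1.0, -0.5
combined = [w_a * ya + w_b * yb for ya, yb in zip(pheno_a, pheno_b)]
direct_int, direct_slope = ols(genotype, combined)

# Route 2: use only the per-phenotype summary statistics.
summary_slope = w_a * a_slope + w_b * b_slope
summary_int = w_a * a_int + w_b * b_int
```

Both routes give the same slope and intercept up to floating-point error, so the second route needs no access to individual-level records.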

28

EVALUATION OF PATIENT RE-IDENTIFICATION USING LABORATORY TEST ORDERS AND MITIGATION VIA LATENT SPACE VARIABLES

Kipp W. Johnson1, Jessica K. De Freitas1, Benjamin S. Glicksberg1, Jason R. Bobe1, Joel T. Dudley2

1Institute for Next Generation Healthcare, Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, 2Bakar Computational Health Sciences Institute, The University of California San Francisco

De Freitas, Jessica

A variety of clinical data abstracted and anonymized from electronic health records (EHR) are often used for research purposes. One consistent concern with this type of research is the risk for re-identification of patients from their anonymized data. Here, we use the EHR of 731,850 patients to demonstrate that the average patient is unique from all others 98.4% of the time simply by examining what laboratory tests have been ordered for them. By the time a patient has visited the hospital on two separate days, they are unique in 74.2% of cases. We further present a computational study to identify how accurately the records from a single day of care can be used to re-identify patients from a set of 99 other patients. We show that, given a single visit’s laboratory orders for a patient, we can re-identify the patient at least 25% of the time. Furthermore, we can place this patient among the top 10 most similar patients 47% of the time. Finally, we present a proof-of-concept technique using a variational autoencoder to encode laboratory results into a lower-dimensional latent space. We demonstrate that releasing latent-space encoded laboratory orders significantly improves privacy compared to releasing raw laboratory orders (<5% re-identification), while preserving information contained within the laboratory orders (AUC of >0.9 for recreating encoded values). Our findings potentially have consequences for the public release of anonymized laboratory tests to the biomedical research community. We wish to note that our findings do not imply that laboratory tests alone are personally identifiable, but would require a threat actor having an external source of laboratory values which are linked to personal identifiers to begin with.
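
What it means for a patient to be unique by the tests ordered for them can be made concrete with a small sketch. It assumes, for illustration only, that a patient's record is reduced to the unordered set of test names. The patient IDs, test names, and the function `fraction_unique` are hypothetical and are not taken from the study.

```python
from collections import Counter


def fraction_unique(orders_by_patient):
    """Fraction of patients whose set of ordered tests matches no other patient."""
    signatures = {
        pid: frozenset(tests) for pid, tests in orders_by_patient.items()
    }
    counts = Counter(signatures.values())
    unique = sum(1 for sig in signatures.values() if counts[sig] == 1)
    return unique / len(signatures)


# Hypothetical anonymized records: patient ID -> laboratory tests ordered.
orders = {
    "p1": ["CBC", "BMP"],
    "p2": ["BMP", "CBC"],    # same set as p1, so neither is unique
    "p3": ["CBC", "HbA1c"],
    "p4": ["TSH"],
}
share = fraction_unique(orders)
```

With realistic test catalogues the number of possible sets grows very quickly, which is one reason a high share of patients can end up with a signature no one else has.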

29

PROTECTING GENOMIC DATA PRIVACY WITH PROBABILISTIC MODELING

Sean Simmons1, Bonnie Berger2, Cenk Sahinalp3

1Broad Institute, 2MIT, 3Indiana University

Simmons, Sean

The proliferation of sequencing technologies in biomedical research has raised many new privacy concerns. These include concerns over the publication of aggregate data at a genomic scale (e.g. minor allele frequencies, regression coefficients). Methods such as differential privacy can overcome these concerns by providing strong privacy guarantees, but come at the cost of greatly perturbing the results of the analysis of interest. Here we investigate an alternative approach for achieving privacy-preserving aggregate genomic data sharing without the high cost to accuracy of differentially private methods. In particular, we demonstrate how other ideas from the statistical disclosure control literature (in particular, the idea of disclosure risk) can be applied to aggregate data to help ensure privacy. This is achieved by combining minimal amounts of perturbation with Bayesian statistics and Markov Chain Monte Carlo techniques. We test our technique on a GWAS dataset to demonstrate its utility in practice. An implementation is available at https://github.com/seanken/PrivMCMC.

30

PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

31

SNPS2CHIP: LATENT FACTORS OF CHIP-SEQ TO INFER FUNCTIONS OF NON-CODING SNPS

Shankara Anand, Laurynas Kalesinskas, Craig Smail, Yosuke Tanigawa

Stanford University

Tanigawa, Yosuke

Genetic variations of the human genome are linked to many disease phenotypes. While whole-genome sequencing and genome-wide association studies (GWAS) have uncovered a number of genotype-phenotype associations, their functional interpretation remains challenging given most single nucleotide polymorphisms (SNPs) fall into the non-coding region of the genome. Advances in chromatin immunoprecipitation sequencing (ChIP-seq) have made large-scale repositories of epigenetic data available, allowing investigation of coordinated mechanisms of epigenetic markers and transcriptional regulation and their influence on biological function. To address this, we propose SNPs2ChIP, a method to infer biological functions of non-coding variants through unsupervised statistical learning methods applied to publicly-available epigenetic datasets. We systematically characterized latent factors by applying singular value decomposition to ChIP-seq tracks of lymphoblastoid cell lines, and annotated the biological function of each latent factor using the genomic region enrichment analysis tool. Using these annotated latent factors as reference, we developed SNPs2ChIP, a pipeline that takes genomic region(s) as an input, identifies the relevant latent factors with quantitative scores, and returns them along with their inferred functions. As a case study, we focused on systemic lupus erythematosus and demonstrated our method's ability to infer relevant biological function. We systematically applied SNPs2ChIP on publicly available datasets, including known GWAS associations from the GWAS catalogue and ChIP-seq peaks from a previously published study. Our approach to leverage latent patterns across genome-wide epigenetic datasets to infer the biological function will advance understanding of the genetics of human diseases by accelerating the interpretation of non-coding genomes.

32

DNA STEGANALYSIS USING DEEP RECURRENT NEURAL NETWORKS

Ho Bae1, Byunghan Lee2,3, Sunyoung Kwon2,4, Sungroh Yoon1,2,5

1Interdisciplinary Program in Bioinformatics, Seoul National University; 2Electrical and Computer Engineering, Seoul National University; 3Electronic and IT Media Engineering, Seoul National University of Science and Technology; 4Clova AI Research, NAVER Corp; 5ASRI and INMC, Seoul National University

Bae, Ho

Recent advances in next-generation sequencing technologies have facilitated the use of deoxyribonucleic acid (DNA) as a novel covert channel in steganography. Various methods exist in other domains to detect hidden messages in conventional covert channels. However, they have not been applied to DNA steganography. The current most common detection approaches, namely frequency analysis-based methods, often overlook important signals when directly applied to DNA steganography because those methods depend on the distribution of the number of sequence characters. To address this limitation, we propose a general sequence learning-based DNA steganalysis framework. The proposed approach learns the intrinsic distribution of coding and non-coding sequences and detects hidden messages by exploiting distribution variations after hiding these messages. Using deep recurrent neural networks (RNNs), our framework identifies the distribution variations by using the classification score to predict whether a sequence is a coding or non-coding sequence. We compare our proposed method to various existing methods and biological sequence analysis methods implemented on top of our framework. According to our experimental results, our approach delivers a robust detection performance compared to other tools.

33

LEARNING CONTEXTUAL HIERARCHICAL STRUCTURE OF MEDICAL CONCEPTS WITH POINCARÉ EMBEDDINGS TO CLARIFY PHENOTYPES

Brett K. Beaulieu-Jones, Isaac S. Kohane, Andrew L. Beam

Harvard Medical School

Beaulieu-Jones, Brett

Biomedical association studies are increasingly done using clinical concepts, and in particular diagnostic codes from clinical data repositories, as phenotypes. Clinical concepts can be represented in a meaningful vector space using word embedding models. These embeddings allow for comparison between clinical concepts or for straightforward input to machine learning models. Using traditional approaches, good representations require high dimensionality, making downstream tasks such as visualization more difficult. We applied Poincaré embeddings in a 2-dimensional hyperbolic space to a large-scale administrative claims database and show performance comparable to 100-dimensional embeddings in a Euclidean space. We then examine disease relationships under different disease contexts to better understand potential phenotypes.
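
Part of why two hyperbolic dimensions can stand in for many Euclidean ones is the standard distance function of the Poincaré ball model, in which distances grow rapidly toward the boundary of the unit disk. The sketch below computes that textbook distance for hand-picked points. It is not the authors' training code, and the points and the function name `poincare_distance` are illustrative assumptions.

```python
import math


def poincare_distance(u, v):
    """Distance between two points strictly inside the unit ball (Poincaré model)."""
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


origin = (0.0, 0.0)
near = (0.5, 0.0)
far = (0.9, 0.0)

d_near = poincare_distance(origin, near)
d_far = poincare_distance(origin, far)
```

Moving from radius 0.5 to radius 0.9 less than doubles the Euclidean distance from the origin but nearly triples the hyperbolic one, which leaves a lot of room near the boundary for the leaves of a hierarchy.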

34

EXPLORING MICRORNA REGULATION OF CANCER WITH CONTEXT-AWARE DEEP CANCER CLASSIFIER

Blake Pyman, Alireza Sedghi, Shekoofeh Azizi, Kathrin Tyryshkin, Neil Renwick, Parvin Mousavi

Queen's University

Pyman, Blake

Background: MicroRNAs (miRNAs) are small, non-coding RNAs that regulate gene expression through post-transcriptional silencing. Differential expression observed in miRNAs, combined with advancements in deep learning (DL), has the potential to improve cancer classification by modelling non-linear miRNA-phenotype associations. We propose a novel miRNA-based deep cancer classifier (DCC) incorporating genomic and hierarchical tissue annotation, capable of accurately predicting the presence of cancer in a wide range of human tissues. Methods: miRNA expression profiles were analyzed for 1746 neoplastic and 3871 normal samples, across 26 types of cancer involving six organ sub-structures and 68 cell types. miRNAs were ranked and filtered using a specificity score representing their information content in relation to neoplasticity, incorporating 3 levels of hierarchical biological annotation. A DL architecture composed of stacked autoencoders (AE) and a multi-layer perceptron (MLP) was trained to predict neoplasticity using 497 abundant and informative miRNAs. Additional DCCs were trained using expression of miRNA cistrons and sequence families, and combined as a diagnostic ensemble. Important miRNAs were identified using backpropagation, and analyzed in Cytoscape using iCTNet and BiNGO. Results: Nested four-fold cross-validation was used to assess the performance of the DL model. The model achieved an accuracy, AUC/ROC, sensitivity, and specificity of 94.73%, 98.6%, 95.1%, and 94.3%, respectively. Conclusion: Deep autoencoder networks are a powerful tool for modelling complex miRNA-phenotype associations in cancer. The proposed DCC improves classification accuracy by learning from the biological context of both samples and miRNAs, using anatomical and genomic annotation. Analyzing the deep structure of DCCs with backpropagation can also facilitate biological discovery, by performing gene ontology searches on the most highly significant features.

35

ESTIMATING CLASSIFICATION ACCURACY IN POSITIVE-UNLABELED LEARNING: CHARACTERIZATION AND CORRECTION STRATEGIES

Rashika Ramola, Shantanu Jain, Predrag Radivojac

Northeastern University

Ramola, Rashika

Accurately estimating performance accuracy of machine learning classifiers is of fundamental importance in biomedical research with potentially societal consequences upon the deployment of best-performing tools in everyday life. Although classification has been extensively studied over the past decades, there remain understudied problems when the training data violate the main statistical assumptions relied upon for accurate learning and model characterization. This particularly holds true in the open world setting where observations of a phenomenon generally guarantee its presence but the absence of such evidence cannot be interpreted as the evidence of its absence. Learning from such data is often referred to as positive-unlabeled learning, a form of semi-supervised learning where all labeled data belong to one (say, positive) class. To improve the best practices in the field, we here study the quality of estimated performance in positive-unlabeled learning in the biomedical domain. We provide evidence that such estimates can be wildly inaccurate, depending on the fraction of positive examples in the unlabeled data and the fraction of negative examples mislabeled as positives in the labeled data. We then present correction methods for four such measures and demonstrate that the knowledge or accurate estimates of class priors in the unlabeled data and noise in the labeled data are sufficient for the recovery of true classification performance. We provide theoretical support as well as empirical evidence for the efficacy of the new performance estimation methods.
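
A simplified special case shows why the class prior in the unlabeled data matters. Suppose the labeled positives are clean, so there is no label noise, and the unlabeled set is a mixture with a known fraction `alpha` of positives. When unlabeled examples are scored as if they were negatives, the apparent false positive rate is `alpha * TPR + (1 - alpha) * FPR`, which can be inverted to recover the true rate. The sketch below covers only this noise-free case and is not the paper's full set of corrections. The function name and the numbers are illustrative assumptions.

```python
def corrected_fpr(tpr, fpr_pu, alpha):
    """Recover the true false positive rate from a positive-unlabeled estimate.

    tpr    : true positive rate measured on clean labeled positives
    fpr_pu : fraction of unlabeled examples predicted positive
    alpha  : assumed fraction of positives hidden in the unlabeled data
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    return (fpr_pu - alpha * tpr) / (1.0 - alpha)


# Simulate the bias: a classifier with TPR 0.8 and true FPR 0.1,
# evaluated against unlabeled data that is 30% positive.
tpr, true_fpr, alpha = 0.8, 0.1, 0.3
fpr_pu = alpha * tpr + (1.0 - alpha) * true_fpr   # apparent FPR, about 0.31
recovered = corrected_fpr(tpr, fpr_pu, alpha)     # back to about 0.1
```

In this example the uncorrected estimate roughly triples the apparent false positive rate, which illustrates how far such estimates can drift when the prior is ignored.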

36

EXTRACTING ALLELIC READ COUNTS FROM 250,000 HUMAN SEQUENCING RUNS IN SEQUENCE READ ARCHIVE

Brian Tsui, Michelle Dow, Dylan Skola, Hannah Carter

Department of Medicine, University of California San Diego, 9500 Gilman Drive, San Diego, California 92093, USA

Tsui, Brian Y

The Sequence Read Archive (SRA) contains over one million publicly available sequencing runs from various studies using a variety of sequencing library strategies. These data inherently contain information about underlying genomic sequence variants which we exploit to extract allelic read counts on an unprecedented scale. We reprocessed over 250,000 human sequencing runs (>1000 TB of raw sequence data) into a single unified dataset of allelic read counts for nearly 300,000 variants of biomedical relevance curated by NCBI dbSNP, where germline variants were detected in a median of 912 sequencing runs, and somatic variants were detected in a median of 4,876 sequencing runs, suggesting that this dataset facilitates identification of sequencing runs that harbor variants of interest. Allelic read counts obtained using a targeted alignment were very similar to read counts obtained from whole-genome alignment. Analyzing allelic read count data for matched DNA and RNA samples from tumors, we find that RNA-seq can also recover variants identified by Whole Exome Sequencing (WXS), suggesting that reprocessed allelic read counts can support variant detection across different library strategies in SRA. This study provides a rich database of known human variants across SRA samples that can support future meta-analyses of human sequence variation.

37

AUTOMATIC HUMAN-LIKE MINING AND CONSTRUCTING RELIABLE GENETIC ASSOCIATION DATABASE WITH DEEP REINFORCEMENT LEARNING

Haohan Wang1, Xiang Liu2, Yifeng Tao1, Wenting Ye1, Qiao Jin3, William W. Cohen4, Eric P. Xing5

1Carnegie Mellon University, 2Chinese University of Hong Kong, 3Tsinghua University, 4Google AI, 5Petuum Inc

Wang, Haohan

The increasing amount of scientific literature in biological and biomedical science research has created a challenge in the continuous and reliable curation of the latest knowledge discovered, and automatic biomedical text-mining has been one of the answers to this challenge. In this paper, we aim to further improve the reliability of biomedical text-mining by training the system to directly simulate human behaviors such as querying PubMed, selecting articles from queried results, and reading selected articles for knowledge. We combine the efficiency of biomedical text-mining, the flexibility of deep reinforcement learning, and the massive amount of knowledge collected in UMLS into an integrative artificial intelligence reader that can automatically identify the authentic articles and effectively acquire the knowledge conveyed in the articles. We construct a system whose current primary task is to build the genetic association database between genes and complex human traits. Our contributions in this paper are three-fold: 1) We propose to improve the reliability of text-mining by building a system that can directly simulate the behavior of a researcher, and we develop corresponding methods, such as Bi-directional LSTM for text mining and Deep Q-Network for organizing behaviors. 2) We demonstrate the effectiveness of our system with an example in constructing a genetic association database. 3) We release our implementation as a generic framework for researchers in the community to conveniently construct other databases.

38

PRECISION MEDICINE: IMPROVING HEALTH THROUGH HIGH-RESOLUTION ANALYSIS OF PERSONAL DATA

PROCEEDINGS PAPER WITH POSTER PRESENTATION

39

INFLUENCE OF TISSUE CONTEXT ON GENE PRIORITIZATION FOR PREDICTED TRANSCRIPTOME-WIDE ASSOCIATION STUDIES

Binglan Li1, Yogasudha Veturi1, Yuki Bradford1, Shefali S. Verma1, Anurag Verma1, Anastasia M. Lucas1, David W. Haas2, Marylyn D. Ritchie1

1University of Pennsylvania, 2Vanderbilt University

Ritchie, Marylyn

Transcriptome-wide association studies (TWAS) have recently gained great attention due to their ability to prioritize complex trait-associated genes and promote potential therapeutics development for complex human diseases. TWAS integrates genotypic data with expression quantitative trait loci (eQTLs) to predict genetically regulated gene expression components and associates predictions with a trait of interest. As such, TWAS can prioritize genes whose differential expressions contribute to the trait of interest and provide mechanistic explanation of complex trait(s). Tissue-specific eQTL information grants TWAS the ability to perform association analysis on tissues whose gene expression profiles are otherwise hard to obtain, such as liver and heart. However, as eQTLs are tissue context-dependent, whether and how the tissue-specificity of eQTLs influences TWAS gene prioritization has not been fully investigated. In this study, we addressed this question by adopting two distinct TWAS methods, PrediXcan and UTMOST, which assume single tissue and integrative tissue effects of eQTLs, respectively. Thirty-eight baseline laboratory traits in 4,360 antiretroviral treatment-naïve individuals from the AIDS Clinical Trials Group (ACTG) studies comprised the input dataset for TWAS. We performed TWAS in a tissue-specific manner and obtained a total of 430 significant gene-trait associations (q-value < 0.05) across multiple tissues. Single tissue-based analysis by PrediXcan contributed 116 of the 430 associations including 64 unique gene-trait pairs in 28 tissues. Integrative tissue-based analysis by UTMOST found the other 314 significant associations that include 50 unique gene-trait pairs across all 44 tissues. Both analyses were able to replicate some associations identified in past variant-based genome-wide association studies (GWAS), such as high-density lipoprotein (HDL) and CETP (PrediXcan, q-value = 3.2e-16). Both analyses also identified novel associations. Moreover, single tissue-based and integrative tissue-based analysis shared 11 of 103 unique gene-trait pairs, for example, PSRC1-low-density lipoprotein (PrediXcan’s lowest q-value = 8.5e-06; UTMOST’s lowest q-value = 1.8e-05). This study suggests that single tissue-based analysis may have performed better at discovering gene-trait associations when combining results from all tissues. Integrative tissue-based analysis was better at prioritizing genes in multiple tissues and in trait-related tissue. Additional exploration is needed to confirm this conclusion. Finally, although single tissue-based and integrative tissue-based analysis shared significant novel discoveries, tissue context-dependency of eQTLs impacted TWAS gene prioritization. This study provides preliminary data to support continued work on tissue context-dependency of eQTL studies and TWAS.

40

SINGLE CELL ANALYSIS – WHAT IS THE FUTURE?

PROCEEDINGS PAPER WITH POSTER PRESENTATION

41

SHALLOW SPARSELY-CONNECTED AUTOENCODERS FOR GENE SET PROJECTION

Maxwell P. Gold, Alexander LeNail, Ernest Fraenkel

Massachusetts Institute of Technology

Gold, Maxwell

When analyzing biological data, it can be helpful to consider gene sets, or predefined groups of biologically related genes. Methods exist for identifying gene sets that are differential between conditions, but large public datasets from consortium projects and single-cell RNA-Sequencing have opened the door for gene set analysis using more sophisticated machine learning techniques, such as autoencoders and variational autoencoders. We present shallow sparsely-connected autoencoders (SSCAs) and variational autoencoders (SSCVAs) as tools for projecting gene-level data onto gene sets. We tested these approaches on single-cell RNA-Sequencing data from blood cells and on RNA-Sequencing data from breast cancer patients. Both SSCA and SSCVA can recover known biological features from these datasets and the SSCVA method often outperforms SSCA (and six existing gene set scoring algorithms) on classification and prediction tasks.

42

WHEN BIOLOGY GETS PERSONAL: HIDDEN CHALLENGES OF PRIVACY AND ETHICS IN BIOLOGICAL BIG DATA

PROCEEDINGS PAPER WITH POSTER PRESENTATION

43

IMPLEMENTING A UNIVERSAL INFORMED CONSENT PROCESS FOR THE ALL OF US RESEARCH PROGRAM

Megan Doerr1, Shira Grayson1, Sarah Moore1, Christine Suver1, John Wilbanks1, Jennifer Wagner2

1Sage Bionetworks, 2Center for Translational Bioethics & Health Care Policy, Geisinger

Doerr, Megan

The United States’ All of Us Research Program is a longitudinal research initiative with ambitious national recruitment goals, including of populations traditionally underrepresented in biomedical research, many of whom have high geographic mobility. The program has a distributed infrastructure, with key programmatic resources spread across the US. Given its planned duration and geographic reach both in terms of recruitment and programmatic resources, a diversity of state and territory laws might apply to the program over time as well as to the determination of participants’ rights. Here we present a listing and discussion of state and territory guidance and regulation of specific relevance to the program, and our approach to their incorporation within the program’s informed consent processes.

44

GENERAL

POSTER PRESENTATIONS

45

A CONVOLUTIONAL NEURAL NET PREDICTS BINDING PROPERTIES OF AN ANTIBODY LIBRARY

Rishi Bedi, Rachel Hovde, Jacob Glanville

Distributed Bio

Hovde, Rachel

Research by Glanville et al. described a method that enabled TCRs of the adaptive immune system to be clustered into specificity groups and allowed de novo design of TCRs with a particular specificity. In this study, we apply deep learning methods to perform characterization and engineering of antibodies. To generate enough data to address this question with machine learning methods, we created a computationally-optimized antibody library capable of generating thousands of high affinity hits against any antigen. By robotically panning 11 antigens in replicate against the library, we generated, sequenced, and validated a dataset of over 55,000 unique high affinity binders. To characterize the functional properties of this library, we train a convolutional neural network to predict the binding specificity of each clone. Our model outperforms alternative approaches and successfully predicts binding specificity in held-out, increasingly dissimilar test sets. Using the trained model to perform optimization on the input sequence, we generate characteristic class examples, as well as "fooling sequences" that represent the boundaries between pairs of binding specificities. We use the real-valued output of the convolutional and linear layers of the network as an embedding and demonstrate physically-meaningful clustering. These techniques let us assess the contribution of particular motifs to the lock-and-key interaction with the target antigen, and enable virtual "epitope binning" to distinguish antibodies in our library that bind similar epitopes. This enables future work in virtual mutagenesis, where we leverage these insights to generate antibodies that exhibit desirable binding properties.

46

CNVAR: A SOFTWARE TOOL FOR GENOTYPING CYP2D6 USING SHORT READ NEXT GENERATION SEQUENCING TECHNOLOGY

John Logan Black III MD1, Hugues Sicotte PhD1, Sandra E. Peterson1, Kimberley J. Harris1, Liewei Wang MD PhD1, Steven Scherer PhD2, Eric Boerwinkle PhD2, Richard A. Gibbs PhD2, Suzette J. Bielinski PhD1, Richard Weinshilboum MD1

1Mayo Clinic, 2Baylor College of Medicine

Black, John

Introduction: CYP2D6 is an important pharmacogene involved in the metabolism of many medications. CYP2D6 is known to have numerous copy number variations (CNV) including gene duplications/multiplications, gene deletion, and hybrid genes involving the pseudogene, CYP2D7. Software that enables the genotyping of CYP2D6 from short read next generation sequencing (NGS) is urgently needed to cost-effectively and accurately determine clinical CYP2D6 phenotypes.

Methods: Modelling of expected ratios for specific gene regions with and without CNV was done based upon the known configurations of the CYP2D locus. This data was used to generate the CNVAR software which analyzes vcf and bam files to determine variant allelic ratios and read depth for all exons and the promoters of the CYP2D6 and CYP2D7 genes after NGS. The software uses statistical methods to detect the CNVs and employs multiple quality metrics to determine the best fit for possible genotype solutions. It also detects named haplotypes plus any novel variants. CNVAR was previously validated against 500 samples with known genotypes determined by targeted genotyping and Sanger sequencing. Samples sequenced as part of the Mayo Clinic Center for Individualized Medicine's RIGHT 10K Study for Pharmacogenomics are now being analyzed. Sequencing was done at Baylor College of Medicine's Human Genome Sequencing Center using the reagent called PGx-seq and analysis of the CYP2D6 sequence results is being performed in the Personalized Genomics Laboratory at Mayo Clinic.

Results: 6921 samples have been analyzed using the CNVAR software to derive CYP2D6 diplotypes. 968 (14%) samples had quality flags indicating unexpected allele frequencies or CNV ratios, detection of a novel variant, or several diplotype solutions that fit the findings equally well. 102 (1.5%) samples were determined to have novel variants or novel hybrid genes. All of the remaining samples, except 55 (0.79%), could be resolved by visual inspection of CNVAR outputs. These 55 remaining samples were referred for additional Sanger sequencing to determine the actual diplotype and quantitative rtPCR to determine actual copy number.

Conclusions: CNVAR is a software tool which can detect CYP2D6 diplotypes, CNVs and hybrid genes from NGS short read technology. The software identifies samples that cannot be genotyped with certainty so that additional evaluation can be performed to derive the actual genotype. Novel variants and hybrid alleles were also identified so that variant curation and classification could be done.

This work was supported by Mayo Clinic Center for Individualized Medicine and the Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, National Institutes of Health grants U19GM61388 (The Pharmacogenomics Research Network), R01GM28157, U01HG005137, R01GM125633, R01AG034676 (The Rochester Epidemiology Project), and U01HG06379 and U01HG06379 Supplement (The Electronic Medical Record and Genomics (eMERGE) Network).
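The general idea of inferring copy number from read depth can be shown with a very small example. This is a generic sketch and not CNVAR's statistical model, which the abstract does not describe in detail: it assumes a hypothetical two-copy reference region, and the function names and the tolerance value are illustrative only.

```python
def raw_copy_estimate(target_depth, reference_depth, reference_copies=2):
    """Scale the target/reference depth ratio by the reference copy number."""
    if reference_depth <= 0:
        raise ValueError("reference_depth must be positive")
    return target_depth / reference_depth * reference_copies


def estimate_copy_number(target_depth, reference_depth, reference_copies=2):
    """Nearest integer copy number implied by the depth ratio."""
    return round(raw_copy_estimate(target_depth, reference_depth, reference_copies))


def is_ambiguous(target_depth, reference_depth, tolerance=0.25, reference_copies=2):
    """Flag estimates that sit far from any integer copy number.

    Such samples would be candidates for manual review or follow-up assays.
    """
    raw = raw_copy_estimate(target_depth, reference_depth, reference_copies)
    return abs(raw - round(raw)) > tolerance
```

A region with mean depth 150 against a two-copy reference at depth 100 gives an estimate of three copies, while a depth of 120 lands between two and three copies and is flagged as ambiguous at the default tolerance.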

47

NETWORK ANALYSIS OF DISTINCT COHORTS ALLOWS FOR THE COMPARISON OF KEY BIOLOGICAL FUNCTIONS RELATED TO TB PATHOGENESIS

Carly Bobak, Meghan E. Muse, Alexander J. Titus, Brock C. Christensen, A. James O'Malley, Jane E. Hill

Dartmouth College

Bobak, Carly

Challenges with reproducibility of microarray datasets can limit the ability to analyze and interpret integrated gene expression datasets. One approach to tackle reproducibility across microarray datasets builds a multi-cohort framework using publicly available data to better mirror diverse populations seen in clinics. An alternative way of increasing the reproducibility of results is emphasizing underlying pathway or network level analyses. While differential expression of genes may vary between datasets and data analysis techniques, the biological processes underlying gene expression are more robust. The results from these analyses can drive hypotheses regarding the biological mechanisms behind diseases. We propose using a multi-cohort design and a pathway-level gene expression analysis to identify key biological processes in active Tuberculosis (TB) disease. A multi-cohort approach is particularly important when analyzing TB because phenotypic presentation of the disease differs among patients, especially those who are co-infected with human immunodeficiency virus (HIV), or among children. As such, these subgroups are often excluded from studies examining human gene expression array data. However, in 2016, 10% of incident TB cases were people living with HIV, and 10% were children, and despite the difficulty of studying these populations alongside adults, they make up a substantial proportion of both currently TB infected and the overall TB susceptible population. To visualize differences across cohorts containing these TB subgroups, we use an approach called an Enrichment Map which allows us to represent each distinct dataset in one network. We selected three representative publicly available datasets (n = 1148) and used Differential Expression and Gene Set Enrichment Analysis. Gene sets which were significantly enriched became the nodes of the network, with edges representative of the overlap between these gene sets. The results of these combined analyses were used as an input to Enrichment Map, to cluster and annotate important biological functions. The Enrichment Map network identified many processes expected based on current TB knowledge, such as interferon-gamma activity (6 gene sets). As well, some other processes which represent potentially novel insights to the disease are identified. We examine one cluster of nodes related to DNA methylation (6 gene sets) in depth. The DNA methylation gene set within this cluster was strongly enriched in the dataset with no HIV+ patients (FDR = 0.004) and appears to be enriched, although insignificantly so, in the two datasets including HIV+ patients (FDR = 0.518, 0.879). Further unsupervised analysis of DNA methylation genes within these sets reveals clear clustering of active TB patients from those with latent TB infection, irrespective of HIV+. Thus, we theorize that while conventional methods would not implicate DNA methylation as playing a role in active TB infection, by comparing enrichments across datasets at the network level we can observe patterns in gene expression with a finer degree of granularity.
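The network construction described here, with enriched gene sets as nodes and edges reflecting the overlap between them, can be sketched in a few lines. The example assumes a Jaccard overlap score and a cutoff of 0.25; the abstract does not state which overlap measure or threshold the authors used, so both are illustrative choices, as are the function names.

```python
from itertools import combinations


def jaccard(set_a, set_b):
    """Shared genes divided by all genes present in either set."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def build_edges(gene_sets, threshold=0.25):
    """Connect every pair of gene sets whose overlap reaches the threshold.

    gene_sets maps a gene set name to a set of gene identifiers.
    Returns a list of (name_1, name_2, overlap_score) tuples.
    """
    edges = []
    for (name_1, genes_1), (name_2, genes_2) in combinations(gene_sets.items(), 2):
        score = jaccard(genes_1, genes_2)
        if score >= threshold:
            edges.append((name_1, name_2, score))
    return edges
```

Gene sets that share most of their members end up densely connected, which is what lets a cluster of related sets, such as the six DNA methylation sets mentioned above, stand out as one group in the network.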

48

VARIATION IN OPIOID PRESCRIBING PATTERNS IN SURGICAL POPULATIONS

Soline M. Boussard1, Marylyn D. Ritchie2, Michelle Whirl-Carrillo3, Tina Hernandez-Boussard3, Teri E. Klein3

1Castilleja School, 2University of Pennsylvania, 3Stanford University

Boussard, Soline

Introduction: In clinical settings, patients' response to opioids can vary by as much as 40-fold. Common opioids require metabolism by liver enzyme CYP-2D6 and considerable variation exists in the amount of CYP-2D6 produced by individuals. Therefore, pharmacogenomics may shed light as to how to address different responses by finding the most effective medication and dosage for each patient. As a first step at identifying opportunities for personalized pain management, we analyzed postoperative pain and opioid prescribing patterns across four common surgeries known for high postoperative pain.

Methods: We used EHRs to identify patients undergoing 4 surgeries (total knee replacement (TKA), thoracotomy, distal radius fracture, and mastectomy). The main outcomes were discharge pain medications and postoperative pain scores. This research was possible through the use of structured EHR data and the mapping of medications to ontologies. Patients were identified using procedural codes; pain scores (ranging from 0 to 10, with 10 being the most severe) were identified from flowsheets within the EHR, and discharge medications were mapped to RxNorm. Data were aggregated to the patient level. Pain scores were averaged across different time points. RStudio was used for statistical computing and graphics. Chi-square, t-tests and analysis of variance were used for statistical testing.

Results: A total of 63,500 patients were included. The mean age was 61.31 (SD: 14.3), 65.3% were female, 62.1% were white and 13.3% were Hispanic/Latino ethnicity. On average, pain scores were lower at 30 days follow-up compared to pre-operative and patients received 4.1 different types of opioids during their inpatient stay, with a majority of patients switching between hydrocodone and oxycodone. Total knee replacement represented 61.6% followed by 20.0% thoracotomy, 16.4% mastectomy and 2.0% distal radius fracture. At discharge, the majority of patients received oxycodone (69.15%) and hydrocodone (15.29%). In mastectomy, 47.89% received hydrocodone and 44.05% received oxycodone. For TKA, 78.20% received oxycodone, followed by 8.90% receiving tramadol. Follow-up pain was similar across the 4 surgeries; however, follow-up pain differed by opioid received, with patients on oxymorphone having the highest follow-up pain (6.24) and patients on propoxyphene having the lowest pain (1.29, p<.0001).

Discussion: In this study that examines post-operative outcomes and prescriptions in a real-world setting, opioid prescribing patterns varied significantly across surgery type. Our data suggest codeine was associated with lower follow-up pain in TKA compared to other opioids. This data from real-world evidence suggests that we can use such methodology to identify a cohort of patients that may be targeted for genotyping for personalized medicine. Targeting patients with poor pain relief from opioids that require CYP-2D6 for activation could identify patients with gene variations that affect opioid metabolism. Future studies could look at which variants could affect patients' metabolism of codeine.

49

REGIONAL HETEROGENEITY IN GENE EXPRESSION, REGULATION AND COHERENCE IN HIPPOCAMPUS AND DORSOLATERAL PREFRONTAL CORTEX ACROSS DEVELOPMENT AND SCHIZOPHRENIA

Leonardo Collado-Torres1, Emily E. Burke1, Amy Peterson1, Joo Heon Shin1, Richard E. Straub1, Anandita Rajpurohit1, Stephen A. Semick1, William S. Ulrich1, BrainSeq Consortium, Cristian Valencia1, Ran Tao1, Amy Deep-Soboslay1, Thomas M. Hyde1, Joel E. Kleinman1, Daniel R Weinberger1,+, Andrew E. Jaffe1,+

1Lieber Institute for Brain Development, Baltimore, MD, USA

Background: We previously identified widespread genetic, developmental, and schizophrenia-associated (SCZD) changes in polyadenylated RNAs in the dorsolateral prefrontal cortex (DLPFC), but the landscape of hippocampal (HIPPO) expression using RNA sequencing is less well-explored.

Methods: We performed RNA-seq using RiboZero on 900 RNA-seq samples across 551 individuals (SCZD N=286) in DLPFC (N=453) and HIPPO (N=447). We quantified expression of multiple feature summarizations of the Gencode v25 reference transcriptome, including genes, exons and splice junctions. Within and across brain regions, we modeled age-related changes in controls using linear splines, integrated genetic data to perform expression quantitative trait loci (eQTL) analyses, and performed differential expression analyses controlling for observed and latent confounders.

Results: We identified widespread developmental regulation between the DLPFC and HIPPO over aging with 10,839 genes differentially expressed (Bonferroni<0.01) and replicating in BrainSpan (n=79 tissue samples, DLPFC=40, HIPPO=39). Of these genes, 5,982 (55%) contain differentially expressed exons and splice junctions that replicated in BrainSpan. By extending quality surrogate variable analysis (qSVA) to multiple brain regions, we identified 48 and 245 differentially expressed genes (DEG) by SCZD diagnosis (FDR<5%) in HIPPO and DLPFC, respectively, with surprisingly minimal overlap in DEG between the two brain regions. We further identified 205,618 brain region-dependent eQTLs (FDR<1%) and found that 124 GWAS risk loci contain eQTLs in at least one of the regions. We also identify potential molecular correlates of in vivo evidence of altered prefrontal-hippocampal functional coherence in schizophrenia. Through our eQTL browser resource http://eqtl.brainseq.org/ we have made all eQTL sets available for further exploration.

Discussion: We show extensive regional specificity of developmental and genetic regulation, and SCZD-associated expression differences between HIPPO and DLPFC. These results underscore the complexity and regional heterogeneity of the transcriptional correlates of schizophrenia, and suggest future schizophrenia therapeutics may need to target molecular pathologies localized to specific brain regions.

50

FULL-LENGTH SEQUENCE ASSEMBLY AND CHARACTERIZATION OF HIGHLY PURIFIED CIRCRNA ISOFORMS

Supriyo De, Amaresh C. Panda, Myriam Gorospe

Laboratory of Genetics and Genomics, National Institute on Aging IRP, NIH

De, Supriyo

Circular RNAs are a large heterogenous class of highly stable noncoding RNAs but they are poorly characterized. Many software tools exist for identifying circular RNAs by finding their circularizing junctions, but very little is known about the sequence of their full length or their isoforms/alternately spliced forms. The assembly and characterization of isoforms is also limited by the lack of methodologies to extract highly pure circRNAs. While exoribonuclease (RNase R) treatment is widely used to degrade linear RNAs and enrich circRNAs from total RNA, it does not efficiently eliminate all linear RNAs. This limitation complicates the assembly process to get full-length circRNAs. Here we describe a novel method for isolating highly pure circRNA populations involving RNase R treatment followed by Polyadenylation and poly(A)+ RNA Depletion (RPAD), which removes linear RNA to near completion. Once the RNA population is highly enriched, sequence assembly algorithms such as Cufflinks can be used to identify the body of the circRNA, while the circularizing/back-spliced junctions can be found using many different software tools such as Circexplorer, CIRI etc. High-throughput sequencing of RNA prepared using RPAD from human cervical carcinoma HeLa cells and mouse C2C12 myoblasts followed by this novel analysis pipeline led to identification of many circRNA isoforms with an identical back-splice sequence (circularizing junction) but with different body sizes and sequences. As one of the main functions of circRNAs is sponging regulatory RNAs and proteins, full-length characterization of circRNA isoforms will be critical for enabling the functional characterization of circRNAs.

Acknowledgement: This research was supported by Intramural Research Program of the National Institute on Aging, NIH.

51

A COMPREHENSIVE REVIEW AND ASSESSMENT OF EXISTING PATHWAY ANALYSIS APPROACHES

Tuan-Minh Nguyen1, Adib Shafi1, Tin Nguyen2, Sorin Draghici1

1Dept of Computer Science, Wayne State University; 2Dept of Computer Science, University of Nevada

Draghici, Sorin

In many high-throughput experiments, it is crucial to understand the biological mechanisms of genes and their products from expression data. Pathway analysis is a crucial step in any phenotype comparison because it allows us to gain insights into the underlying biological phenomena. Because of the importance of this type of analysis, more than 35 pathway analysis methods have been proposed so far. These can be categorized into two main categories: non-pathway topology based (non-TB) and topology-based (TB) approaches. Non-TB methods consider pathways as simple gene sets and ignore the position and role of the genes, as well as the direction and type of signals described by the pathway while TB methods include this additional information in the analysis. Although there are some review papers discussing this topic, there has been no study that systematically assesses the performances of the methods using an unbiased and large number of datasets available. Furthermore, the majority of the pathway analysis approaches rely on the assumption of uniformity of p-values under the null hypothesis, which is not always true. None of these existing reviews take the performances of the studied methods under the null into account in their comparisons. In order to provide an accurate and objective assessment so that researchers and biologists can choose a method suitable for their purpose, we provide an extensive analysis of 11 widely used pathway analysis methods from both non-TB and TB groups using 2601 samples from 75 human disease datasets and 8 methods using 121 samples from 11 knock-out mouse datasets. In addition, we investigate the extent to which each method is biased under the null hypothesis. Overall, the result shows TB methods perform better than non-TB methods since they take into consideration the topology information and signal propagation. Via permutation and bootstrap, we discover another critical conclusion that most if not all listed approaches are biased and produce very skewed results under the null.
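The uniformity assumption mentioned above can be checked empirically: p-values produced for a pathway on permuted (null) data should be spread evenly over [0, 1]. The sketch below measures departure from uniformity with the Kolmogorov-Smirnov distance. It is one possible diagnostic chosen for illustration, not necessarily the one the authors used, and the function name is hypothetical.

```python
def ks_uniform_distance(pvalues):
    """Largest gap between the empirical CDF of the p-values and the
    Uniform(0, 1) CDF. Values near 0 indicate approximately uniform
    p-values; values near 1 indicate strong skew."""
    xs = sorted(pvalues)
    n = len(xs)
    if n == 0:
        raise ValueError("at least one p-value is required")
    distance = 0.0
    for i, x in enumerate(xs, start=1):
        # Compare the uniform CDF (x itself) with the empirical CDF
        # just after (i / n) and just before ((i - 1) / n) this point.
        distance = max(distance, i / n - x, x - (i - 1) / n)
    return distance
```

A method whose null p-values pile up near zero would report that pathway as significant far more often than the nominal error rate suggests, and would show a distance close to 1 here.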

52

A NEW PHYLOGENETIC SAMPLING METHOD USING GENERALIZED-ENSEMBLE ALGORITHM

Tetsu Furukawa, Hiroyuki Toh

Department of Biomedical Chemistry, School of Science and Technology, Kwansei-Gakuin University, Sanda, Hyogo, Japan 669-1337

Furukawa, Tetsu

Bayesian inference has been widely utilized for the evolutionary analysis including phylogenetic tree reconstruction, where Monte Carlo sampling such as Markov chain Monte Carlo (MCMC) or Metropolis-coupled MCMC (MC3) generates a posterior distribution. Monte Carlo sampling is also utilized for molecular simulation of biopolymers like proteins and DNA. One of the representative methods is the replica exchange algorithm, which is equivalent to MC3 in the molecular phylogeny. Besides the replica exchange algorithm, several different sampling methods have been developed for molecular simulation, which are collectively termed as the generalized ensemble algorithm. In this study, we examined the possibility to apply the other algorithms belonging to the generalized ensemble algorithm to the tree reconstruction, in order to develop more efficient sampling method for the molecular phylogeny. The program implemented with the other generalized ensemble algorithm was developed based on the source code of BEAST version 2.5.1. To evaluate the performance, artificial alignments were generated, so that the posterior distributions of the corresponding trees are difficult to be regenerated by sampling, i.e. the distribution with multiple peaks. We applied our program and existing tools to the artificial data. Then, we compared the results such as the times required for the convergence and the degree of regeneration of the posterior distributions. The benefit and pitfalls of our program will be discussed based on the comparison.
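The correspondence between replica exchange and MC3 rests on the same swap step: two chains running at different "temperatures" propose to exchange their current states. The sketch below gives the standard acceptance probability for that swap, assuming chain k samples the posterior raised to the power beta_k. It illustrates the baseline method only, not the new generalized-ensemble sampler, which the abstract does not specify; the function name is hypothetical.

```python
import math


def swap_acceptance(log_post_i, log_post_j, beta_i, beta_j):
    """Probability of accepting a state swap between chains i and j.

    log_post_i and log_post_j are the log posterior densities of the
    current states of chains i and j; beta_i and beta_j are the chains'
    inverse temperatures (1.0 for the cold chain, smaller when heated).
    """
    log_ratio = (beta_i - beta_j) * (log_post_j - log_post_i)
    # Metropolis rule: always accept when the ratio is at least 1.
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)
```

When a heated chain has wandered into a state with higher posterior density than the cold chain's state, the swap is always accepted, which is how heated chains help the cold chain move between separate peaks of a multimodal posterior.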

53

CONVERGENT MECHANISMS PERTURBED BY SCATTERED SNPS SUSCEPTIBLE TO ALZHEIMER'S DISEASE

Jiali Han1,2, Edwin Baldwin1, Jin Zhou3, Fei Yin4,5, Haiquan Li1,6

1University of Arizona, Department of Biosystems Engineering; 2University of Arizona, Department of Systems and Industrial Engineering; 3University of Arizona, Department of Public Health; 4University of Arizona, Department of Pharmacology; 5University of Arizona Center for Innovation in Brain Science; 6University of Arizona Center for Biomedical Informatics and Biostatistics

Han, Jiali

Alzheimer's Disease (AD) is the most prevalent neurodegenerative disorder affecting approximately 50 million people worldwide. Genome-wide association studies (GWAS) have identified hundreds of single nucleotide polymorphisms (SNPs) associated with AD, while the effect size of each individual SNP is largely modest. The molecular mechanisms underlying these associations are yet to be understood. Our recent genomic analysis focused on unveiling common downstream biological effectors of intergenic SNPs associated with AD, aiming to understand the interactive and synergetic effects that the genetic variants across non-coding and intergenic regions are playing in the pathogenesis of AD. In this study, data from GWAS and expression quantitative trait locus (eQTL) studies by GTEx project are integrated, and downstream functional similarity between two SNPs is imputed using an enhanced multiscale information theoretic distance model [1]. The significance levels are determined through extensive permutations of the eQTL-derived multiscale network for mRNA overlap, functional similarity and shared biological processes [2]. Convergent molecular mechanisms based on gene ontology are prioritized at FDR<0.05.

The prioritized mechanistic network for AD renders several functional modules perturbed by either cis-eQTL or trans-eQTL elements, corresponding to multiple common mechanisms downstream of distinct eQTLs with some of them being cross-chromosome. For instance, SNPs on chromosomes six and one are both associated with antigen processing and presentation via regulating multiple human leukocyte antigen genes (e.g., HLA-DRB1 and HLA-DQA1) and cytokine genes, suggesting the genetic involvement of the immune systems and neuroinflammation in the pathogenesis of AD. SNPs on chromosome 17 and chromosome 19 co-regulate genes involved in synaptic transmission, which is essential for neurons communication and its dysfunction is known in AD leading to memory loss. Other than cross-chromosome SNPs, independent intergenic SNPs on the same chromosome also provide insights to AD genetic risks. A pair of SNPs on chromosome 17 is prioritized by our method through their convergent association with the MAPT gene, which encodes tau protein, regulates axon extension, and is known as a risk factor of a variety of neurodegenerative disorders including not only AD but also Frontotemporal dementia and Parkinson disease. Another pair of SNPs on chromosome 19 is prioritized by their common association with the ABCA7 gene, which regulates lipid metabolism across cellular membranes and is suggested to be susceptible loci for the late-onset AD.

This study suggests a new strategy connecting scattered AD-susceptible genetic variants with risk genes and convergent downstream mechanisms implicated in AD pathogenesis. The results will help to understand how genetic variants and underlying functional modules work interactively and systematically toward AD onset and could thus identify genetics-specific molecular targets and inspire new personalized therapeutic strategies.

[1] Li, H., et al. npj Genomic Medicine 1:16006, 2016.
[2] Han, J., et al. PSB, 2018, pp. 524-535.

54

IDENTIFICATION AND EVALUATION OF CO-EXPRESSION GENE NETWORKS FOR PACLITAXEL-INDUCED PERIPHERAL NEUROPATHY IN BREAST CANCER SURVIVORS

Kord M. Kober1, Jon D. Levine2, Judy Mastick1, Bruce Cooper1, Steven Paul1, Christine Miaskowski1

1UCSF School of Nursing, 2UCSF School of Medicine

Kober, Kord

Chronic chemotherapy-induced peripheral neuropathy (CIPN) is the most common and severe adverse drug reaction associated with neurotoxic chemotherapy (CTX) with prevalence rates that range from 30% to 70% in cancer survivors. No pharmacologic interventions are available to prevent CIPN. Lack of knowledge of the fundamental mechanisms that underlie CIPN thwarts our efforts to develop interventions to prevent or treat it. Increased knowledge of CIPN's molecular mechanisms could identify therapeutic targets for this condition. Findings from animal studies suggest that a number of diverse mechanisms are involved in the development of chronic PIPN including damage to DRG cell bodies; microtubule associated toxicity; inflammation; distal axonal injury; damage to the peripheral vasculature; modulation of ion channels; and mitochondrial dysfunction. Taxol is a common CTX drug that is associated with the development of CIPN. Paclitaxel-induced peripheral neuropathy (PIPN) is the dose limiting toxicity of this CTX drug. The purpose of this pilot study was to evaluate for coordinated expression variations of genes in RNA extracted from peripheral blood from breast cancer survivors, and from these modules identify co-expressed genes that are associated with chronic PIPN. Gene expression in peripheral blood was assayed using RNA-seq in a sample of breast cancer (BC) survivors who did (n=25) and did not (n=25) develop PIPN. BC survivors with PIPN were significantly older; more likely to be unemployed; reported lower alcohol use; had a higher BMI and a poorer functional status; and had a higher number of lower extremity sites with loss of light touch, cold, and pain sensations, and higher vibration thresholds. No between group differences were found in the cumulative dose of paclitaxel received or in the percentage of patients who had a dose reduction or delay due to PIPN. Co-expression network analysis was performed to identify modules of genes with highly correlated expression using the top 5000 most variant genes. Thirteen color-coded modules were detected ranging in size from 36 to 1653 genes. The eigengenes of the "black" module (n=1653 genes) were significantly correlated with the CIPN phenotype (Pearson R2=0.224, p=0.02). GO enrichment was found in inflammation-related terms (e.g., C-C chemokine receptor activity, Chemokine-mediated signaling pathway, T cell co-stimulation). Functional protein association network analysis identified an enrichment of protein-protein interactions (p<0.0002) including highly connected genes that have previously been identified to be related to CIPN (i.e., G protein-coupled receptor 55, GPR55, and C-X-C Motif Chemokine Receptor 5, CXCR5). To our knowledge, this is the first study to apply systems biology approaches using circulating blood RNA-seq data in a sample of breast cancer survivors with and without chronic PIPN. We revealed networks and candidate genes associated with chronic PIPN related to inflammation, and suggest genes for validation and as potential therapeutic targets.

55

VARIFI - WEB-BASED AUTOMATIC VARIANT IDENTIFICATION, FILTERING AND ANNOTATION OF AMPLICON SEQUENCING DATA

Milica Krunic1, Peter Venhuizen2, Leonhard Müllauer3, Bettina Kaserer3, Arndt von Haeseler1,4

1Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna, Medical University of Vienna, Dr. Bohrgasse 9, 1030 Vienna, Austria; 2Department of Applied Genetics and Cell Biology, University of Natural Resources and Life Sciences, Muthgasse 18, 1190 Vienna, Austria; 3Institute of Pathology, Medical University Vienna, Währinger Gürtel 18-20, 1090 Vienna, Austria; 4Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Vienna, Austria

Krunic, Milica

Fast and affordable benchtop sequencers are becoming more important in improving personalized medical treatment. Still, distinguishing genetic variants between healthy and diseased individuals from sequencing errors remains a challenge. Here we present VARIFI, a pipeline for finding reliable genetic variants (SNPs and INDELs). We optimized parameters in VARIFI by analyzing more than 170 amplicon sequenced cancer samples produced on the Personal Genome Machine (PGM). In contrast to existing pipelines, VARIFI combines different analysis methods and, based on their concordance, assigns a confidence score to every identified variant. Furthermore, VARIFI applies variant filters for biases associated with the sequencing technologies (e.g. incorrectly called homopolymer-associated indels with Ion Torrent). VARIFI automatically extracts variant information from publicly available databases and incorporates methods for variant effect prediction. VARIFI requires only little computational experience and no in-house compute power since the analyses are done on our server. VARIFI is a web-based tool available at varifi.cibiv.univie.ac.at.

56

STATISTICAL INFERENCE RELIEF (STIR) FEATURE SELECTION

Trang T. Le1, Ryan J. Urbanowicz1, Jason H. Moore1, Brett A. McKinney2

1Institute of Biomedical Informatics, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA; 2Tandy School of Computer Science, Department of Mathematics, University of Tulsa, Tulsa, OK

Le, Trang

Motivation: Identifying relevant features in high-dimensional data can be challenging when their effect on an outcome may be obscured by a complex interaction architecture. Using nearest neighbors, Relief-based algorithms account for statistical interactions when selecting features. However, Relief-based estimators are non-parametric in the statistical sense that they do not have a parameterized model with an underlying probability distribution for the estimator, making it difficult to determine the statistical significance of Relief-based attribute estimates. Thus, a statistical inferential formalism is needed to avoid imposing arbitrary thresholds to select the most important features.

Method: We reconceptualize the Relief-based feature selection algorithm to create a new family of STatistical Inference Relief (STIR) estimators that retains the ability to identify interactions while incorporating sample variance of the nearest neighbor distances into the attribute importance estimation. This variance permits the calculation of statistical significance of features and adjustment for multiple testing of Relief-based scores. Specifically, we develop a pseudo t-test version of Relief-based algorithms for case-control data.

Results: We demonstrate the statistical power and control of type I error of the STIR family of feature selection methods on a panel of simulated data that exhibits properties reflected in real gene expression data, including main effects and network interaction effects. We showed that the statistical performance using STIR p-values is the same as using permutation p-values but much more computationally efficient. We compare the performance of STIR when the adaptive radius method is used as the nearest neighbor constructor with STIR when the fixed-k nearest neighbor constructor is used. Applying STIR to real RNA-Seq data from a study of major depressive disorder, we found that 32 significant STIR genes include all 8 significant genes from standard t-test. STIR genes outside of the intersection with t-test may be good candidates for interaction effects.

Conclusion: STIR is the first method to use a theoretical distribution to calculate the statistical significance of Relief attribute scores without the computational expense of permutation. This validates the STIR pseudo t-test and means one can use it instead of costly permutation testing. STIR formalism generalizes to all Relief-based neighbor finding algorithms, including MultiSURF. k=m/6 offers a better default than the pervasive use of k=10, which is an arbitrary choice in the early literature. Extensions of STIR will involve multi-class data, quantitative trait data (regression) and correction for covariates. Similarly, we envision regression-STIR to follow a linear model formalism. Future studies will apply STIR to GWAS as well as eQTL and other high dimensional data to identify interaction effects.
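The pseudo t-test idea can be illustrated for a single attribute. Relief-style scoring compares how much an attribute differs between a sample and its nearest neighbors from the other class (misses) against its nearest neighbors from the same class (hits). The sketch below assumes the statistic takes the familiar pooled-variance two-sample form; the abstract does not give the exact formula, so this is an illustration of the concept rather than the authors' implementation, and the function name is hypothetical.

```python
import math
from statistics import mean, variance


def stir_t_statistic(hit_diffs, miss_diffs):
    """Pooled-variance t-statistic comparing miss and hit differences.

    hit_diffs:  attribute differences between each sample and its
                same-class nearest neighbors.
    miss_diffs: attribute differences between each sample and its
                other-class nearest neighbors.
    A large positive value suggests the attribute separates the classes.
    """
    n_hit, n_miss = len(hit_diffs), len(miss_diffs)
    if n_hit < 2 or n_miss < 2:
        raise ValueError("need at least two hit and two miss differences")
    pooled_var = ((n_miss - 1) * variance(miss_diffs)
                  + (n_hit - 1) * variance(hit_diffs)) / (n_miss + n_hit - 2)
    if pooled_var == 0:
        raise ValueError("differences have zero variance")
    standard_error = math.sqrt(pooled_var * (1 / n_miss + 1 / n_hit))
    return (mean(miss_diffs) - mean(hit_diffs)) / standard_error
```

Because the statistic has a reference distribution, a p-value can be read from a t distribution instead of re-running the neighbor search on many permuted labels, which is the computational saving the abstract describes.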

57

DEEP LEARNING-BASED LONGITUDINAL HETEROGENEOUS DATA INTEGRATION FRAMEWORK FOR AD-RELEVANT FEATURE EXTRACTION

Garam Lee1, Kwangsik Nho2, Byungkon Kang1, Kyung-Ah Sohn1, Dokyoon Kim3

1Ajou University, 2Indiana University School of Medicine, 3Geisinger

Kim, Dokyoon

Alzheimer's disease (AD) is a progressive neurodegenerative condition marked by a decline in cognitive functions with no validated disease modifying treatment. It is critical for timely treatment to detect AD in its earlier stage before clinical manifestation. Mild cognitive impairment (MCI) is an intermediate stage between cognitively normal older adults and AD. To predict conversion from MCI to probable AD, we applied a deep learning approach, a multimodal recurrent neural network. We developed an integrative framework that combines not only cross-sectional neuroimaging biomarkers at baseline but also longitudinal cerebrospinal fluid (CSF) and cognitive performance biomarkers obtained from the Alzheimer's Disease Neuroimaging Initiative cohort (ADNI). The proposed framework integrated longitudinal multi-domain data with missing values. The Python package LIFAD (Deep learning-based Longitudinal heterogeneous data Integration Framework for AD-relevant feature extraction) provides a pre-constructed deep learning architecture for a classification task. Our results showed that 1) our prediction model for MCI conversion to AD yielded up to 75% accuracy (area under the curve (AUC) = 0.83) when using only a single modality of data separately; and 2) our prediction model achieved the best performance with 80% accuracy (AUC = 0.86) when incorporating longitudinal multi-domain data. A multi-modal deep learning approach has potential to identify persons at risk of developing AD who might benefit most from a clinical trial or as a stratification approach within clinical trials.

58

MICROBIOME ANALYSIS OF UNEXPLAINED CASES OF PNEUMONIA IN SOUTH KOREA

Sooyeon Lim, Jae Kyung Lee, Ji Yun Noh, Woo Joo Kim

Department of Internal Medicine, Guro Hospital, Korea University

Lim, Sooyeon

Nasal swab samples were obtained from patients with symptoms of pneumonia through the tertiary hospital-based influenza surveillance system in South Korea during 2011-2017. Although the symptoms were suspected to indicate viral pneumonia, the collected samples were confirmed negative, using the respiratory virus panel, for 16 common respiratory pathogens, in addition to the following five viruses: Enterovirus D68, WU polyomavirus, KI polyomavirus, Parechovirus type 1, 3, 6, and Pteropine orthoreovirus. Therefore, 16S rRNA screening was performed to study the microbiome community of the patients. V3 and V4 sequences of 16S rRNA were obtained using the Nextera XT DNA library preparation kit and MiSeq Reagent Kit v3 (Illumina). Microbiome profiles of 92 patient samples were obtained through Illumina MiSeq. The total taxonomic composition of the samples consisted of 99 bacterial genera whose sequences were detected in more than 1% of the samples. Common bacterial pathogens were present either as a single pathogen or in combination with other organisms in the patient samples. Although the samples differed in conditions such as age, gender, location, and season, common dominant genera of bacteria known as pathogens were revealed. The most dominant genera of bacteria were the following: Streptococcus, Corynebacterium, Haemophilus, Rhizobium. Comparative analysis showed that genus compositions were broadly similar but that microbial composition differed between age groups. We attempted to isolate dominant colonies by media culture for whole genome sequencing; single colonies were isolated and 8 species were identified using Sanger sequencing. After isolating more single colonies, we will focus on whole genome sequencing to investigate the cause of the pneumonia symptoms in detail.

59

POTRA: PATHWAY ANALYSIS OF CANCER GENOMICS DATA IN THE CLOUD

Margaret Linan1,2, Junwen Wang1,2, Valentin Dinu1,2

1Department of Biomedical Informatics, Arizona State University, Scottsdale, Arizona, USA; 2Department of Health Sciences, Mayo Clinic, Scottsdale, Arizona, USA

Dinu, Valentin

We have recently developed PoTRA (Pathways of Topological Rank Analysis), a novel algorithm that uses the Google Search PageRank algorithm to identify biological pathways involved in cancer. The analytical approach is motivated by the observation that loss of connectivity is a common topological trait of gene regulatory networks in cancer. We leveraged the Cancer Genomics Cloud environment and applied PoTRA to analyze The Cancer Genome Atlas (TCGA) genomic data, a high-quality publicly available dataset of tumor and matched normal samples. The top most influential pathways and most dysregulated pathways in 17 TCGA projects were found, using the KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway database. Overall, pathways in cancer is the most common dysregulated pathway, and the MAPK signaling pathway is the most influential, while the purine metabolism pathway is the most significantly dysregulated metabolic pathway. Additionally, genomic analysis workflows were created using docker and rabix for the detection of mRNA mediated dysregulated pathways in the open access TCGA repository with the PoTRA tool in the CGC platform. Our approach illustrates the advantages of employing powerful computational methods to analyze large genomic datasets with the aim of improving our understanding of cancer and identifying better diagnoses and treatments.
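
To make the PageRank idea concrete, here is a minimal power-iteration sketch on a toy directed gene network. It is not the PoTRA implementation; the three-gene graph, the function name `pagerank`, and the damping factor of 0.85 are assumptions chosen for illustration. Every node must appear as a key of the adjacency dictionary.

```python
def pagerank(adjacency, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict mapping node -> list of targets."""
    nodes = list(adjacency)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the uniform "teleport" share first.
        updated = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = adjacency[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    updated[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    updated[target] += share
        rank = updated
    return rank


if __name__ == "__main__":
    # Hypothetical regulatory edges: A -> B, A -> C, B -> C, C -> A.
    toy_network = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for gene, score in sorted(pagerank(toy_network).items()):
        print(gene, round(score, 3))
```

Comparing such scores between a network built from normal samples and one built from tumor samples is one plausible way to expose genes whose connectivity, and hence rank, has changed.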

60

EVALUATING CELL LINES AS MODELS FOR METASTATIC CANCER THROUGH INTEGRATIVE ANALYSIS OF OPEN GENOMIC DATA

Ke Liu1, Patrick A. Newbury1, Benjamin S. Glicksberg2, William Zeng2, Eran R. Andrechek3, Bin Chen1

1Department of Pediatrics and Human Development, College of Human Medicine, Michigan State University, Grand Rapids, MI, USA; 2Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; 3Department of Physiology, Michigan State University, East Lansing, MI, USA

Chen, Bin

Metastasis is the most common cause of cancer-related death and, as such, there is an urgent need to discover new therapies to treat metastasized cancers. Cancer cell lines are widely used models to study cancer biology and test drug candidates. However, it is still unknown to what extent they adequately resemble the disease in patients. The recent accumulation of large-scale genomic data in cell lines, mouse models, and patient tissue samples provides an unprecedented opportunity to evaluate the suitability of cell lines for metastatic cancer research. In this work, we used breast cancer as a case study. The comprehensive comparison of the genetic profiles of 57 breast cancer cell lines with those of metastatic breast cancer samples revealed substantial genetic differences. In addition, we identified cell lines that more closely resemble different subtypes of metastatic breast cancer. Surprisingly, a combined analysis of mutation, copy number variation and gene expression data suggested that MDA-MB-231, the most commonly used triple negative cell line for metastatic breast cancer research, had little genomic similarity with Basal-like metastatic breast cancer samples. We further compared cell lines with organoids, a new type of preclinical model which has become more popular in recent years. We found that organoids outperformed cell lines in resembling the transcriptome of metastatic breast cancer samples. However, additional differential expression analysis suggested that neither type of model could mimic the effects of the tumor microenvironment, and each had its own bias towards modeling specific biological processes. Our work provides a guide for cell line selection in metastasis-related studies and sheds light on the potential of organoids in translational research.

61

PATHWAY ANALYSIS OF EHR AND NON-EHR-BASED GWAS CONNECTS LIPID METABOLISM TO THE IMMUNE RESPONSE

Jason E. Miller1, Thomas J. Hoffmann2,3, Elizabeth Theusch4, Carlos Iribarren5, Marisa W. Medina4, Neil Risch2,3,5, Ronald M. Krauss4, Marylyn D. Ritchie1

1Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA; 2Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA; 3Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA, USA; 4Children’s Hospital Oakland Research Institute, Oakland, CA, USA; 5Division of Research, Kaiser Permanente, Northern California, Oakland, CA, USA

Miller, Jason

Pathway-analysis is a commonly used method to interpret genome-wide association study (GWAS) results. Recently it has been illustrated that electronic-health-record (EHR) data from a single-cohort can be used to perform GWAS. However, it is unclear how this new study design might affect replication of pathway-level results when compared to a non-EHR-based GWAS. It is also unclear how an EHR-based study will affect downstream analyses such as the identification of genes that are associated with said pathways. We propose evaluating the pathway-level similarities from analyses of two separate GWAS studies that used different methodologies to investigate the same traits. Here, we employ the software PARIS (Pathway Analysis by Randomization Incorporating Structure) to compare summary-level results across studies, thus making it more generalizable. PARIS generates randomized collections of features which mimic pathways to calculate empirical p-values. This process reduces type I error and the multiple testing burden. We compared EHR to non-EHR-based GWAS results using four different lipid traits: low-density lipoprotein (LDL), high-density lipoprotein (HDL), triglycerides (TG), and total cholesterol (TC). The data came from two GWAS, the Genetic Epidemiology Resource on Adult Health and Aging (GERA), a single-cohort EHR-based GWAS, and the Global Lipids Genetics Consortium (GLGC), which used a meta-analysis study design. KEGG pathways expected to explain variation in lipid values such as "cholesterol metabolism" and "PPAR signaling pathway" were identified from both studies. Moreover, there was a significant overlap between the pathways identified between studies for the same traits (p<1x10^-14). Thus, specific pathways can be replicated across distinct cohorts and study designs. Several pathways made up of genes whose proteins are important for an immune response were identified in both datasets and across multiple lipid traits. To see if lipid modifying therapy affects the same pathways of interest, we performed pathway analysis of CAP RNA-seq expression from Theusch, E., et al., 2016, which measured expression in immortalized cells pre and post statin exposure. Among the pathways represented in both the PARIS results (p<0.01) and LCL RNA-seq gene set enrichment results (FDR<25%) are "cholesterol metabolism" (CM) and "Hepatitis C" (HC) pathways. Hepatitis C virus (HCV) infection can cause chronic liver disease and is associated with a host of lipid and lipoprotein metabolic disorders. PARIS can also identify genes that are statistically significant within each pathway. Interestingly, in both the LDL and TC GWAS, the gene that was significantly associated with both the CM and HC pathways was low density lipoprotein receptor or LDLR, a gene that affects both lipid metabolism and HC viral activity. Statins, in combination with other therapies, can increase efficacy of antiviral therapy by blocking viral replication. Our results highlight the need for further investigation into how genetic variation affects outcomes from the treatment of HCV with statins, particularly with respect to loci associated with lipid traits. In conclusion, pathway-level analysis of GWAS summary-level results can be used to characterize similarities across EHR and non-EHR-based studies and improve biological interpretation.

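The randomization step described in the abstract above can be illustrated with a generic empirical p-value calculation. This sketch is not PARIS itself: it draws random feature sets of the same size as the pathway and ignores the genomic structure that PARIS is designed to preserve. The function names, the additive pathway score, and the toy feature scores are all hypothetical.

```python
import random


def empirical_p(observed, null_scores):
    """Fraction of null scores at least as large as the observed score.

    The +1 terms count the observed value itself, so p is never exactly 0.
    """
    exceed = sum(1 for score in null_scores if score >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


def randomized_pathway_p(feature_scores, pathway, n_random=1000, seed=0):
    """Compare a pathway's summed score with same-sized random feature sets."""
    rng = random.Random(seed)
    features = list(feature_scores)
    observed = sum(feature_scores[f] for f in pathway)
    null_scores = [
        sum(feature_scores[f] for f in rng.sample(features, len(pathway)))
        for _ in range(n_random)
    ]
    return empirical_p(observed, null_scores)


if __name__ == "__main__":
    # Hypothetical per-feature association scores; higher means stronger.
    scores = {f"g{i}": float(i) for i in range(20)}
    print(randomized_pathway_p(scores, ["g19", "g18", "g17"], n_random=200))
```

Because each pathway is tested once against its own null collection, the number of tests scales with the number of pathways rather than the number of variants, which is consistent with the reduced multiple-testing burden mentioned above.
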
62

META-ANALYSIS OF HETEROGENEITY AND BATCH EFFECTS IN THE A549 CELL LINE

Abigail Moore, John Castorino

School of Natural Sciences, Hampshire College, Amherst, MA

Moore, Abigail

Meta-analysis of RNA-seq data offers the opportunity to increase reproducibility by integrating data from multiple studies. Such analyses are challenged by heterogeneous cell culture and RNA-seq techniques, which may confound or hide true biological findings. Thus, we sought to identify batch characteristics that most significantly affect gene expression in a cell line common to lung cancer and viral studies. We queried the NCBI GEO for RNA-seq data from the A549 cell line and filtered the results for paired-end data obtained via total RNA extraction. Across eight studies, we downloaded raw RNA-seq data for 23 untreated samples and collected corresponding metadata. Differential expression analysis with Salmon, tximport and edgeR identified 3,802 differentially expressed genes (at least two-fold change, FDR < 0.05). Principal variance component analysis revealed that media choice alone explains 54% of expression variation within 139 differentially expressed lung cancer prognostic genes. Our findings highlight the impact of specific batch effects on biologically significant genes. In future work, we hope to extend this analysis to consider single nucleotide variants.

63

HYPERPARAMETER TUNING FOR CHIP-SEQ PEAK CALLING SOFTWARE TOOLS USING PARALLELIZED BAYESIAN OPTIMIZATION.

Dongpin Oh, Jinhee Lee, Seonghyeon Kim, Dohyeon Lee, Dongwon Choo, Giltae Song

School of Computer Science and Engineering, Pusan National University

Lee, Jinhee

ChIP-seq is widely used to understand protein-DNA interaction and gene regulation. In ChIP-seq data analysis, identifying peak signals is one of the core computational steps, but most existing software tools still suffer from a large portion of false positive calls owing to sequencing errors and bias, in part caused by copy number variations. ChIP-seq analysis tools require hyperparameters set by users depending on sequencing quality and copy number variation rate. However, it is hard for users to know the valid values of the hyperparameters before running the software tools. In addition, we would have more false positive peak calls for given ChIP-seq data if the hyperparameters of peak calling tools are less than optimal. In this study, we develop a software pipeline for identifying the optimal values of the hyperparameters in major ChIP-seq peak calling tools. First, we collect ChIP-seq data whose peak signals are labeled manually by experts. These data are used as training data in our hyperparameter tuning. Second, we define an objective function to measure the accuracy of peak calling results. Then we learn optimal hyperparameters using these training data and objective function based on Bayesian optimization. We use the Matern 5/2 kernel function for the optimization and Monte Carlo Markov Chain for parallel processing. We validate our approach using our collection of ChIP-seq data labeled for around 2,000 genomic segments including peaks or no peaks. We apply our software pipeline to major ChIP-seq peak calling tools such as MACS, SICER, HOMER, and PeakSeq.
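
For readers unfamiliar with the kernel named above, the standard Matern 5/2 covariance can be written in a few lines. This is only the textbook kernel, not the authors' pipeline; the function name `matern52` and the default length scale and variance are illustrative assumptions. In Bayesian optimization this kernel defines how similar the objective is expected to be at two hyperparameter settings.

```python
import math


def matern52(x1, x2, length_scale=1.0, variance=1.0):
    """Matern 5/2 covariance between two points given as equal-length sequences.

    k(r) = variance * (1 + s + s^2 / 3) * exp(-s), with s = sqrt(5) * r / length_scale
    and r the Euclidean distance between x1 and x2.
    """
    r = math.dist(x1, x2)
    s = math.sqrt(5.0) * r / length_scale
    return variance * (1.0 + s + s * s / 3.0) * math.exp(-s)


if __name__ == "__main__":
    # Two hypothetical hyperparameter settings of a peak caller, already scaled.
    print(matern52((0.2, 0.5), (0.2, 0.5)))  # identical settings
    print(matern52((0.2, 0.5), (0.8, 0.1)))  # distant settings
```

The covariance equals the variance for identical inputs and decays smoothly with distance, so settings close to a well-scoring one are predicted to score similarly.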

64

CROSS-STUDY META-ANALYSIS IDENTIFIES ALTERED BACTERIAL STRAINS SEPARATING RESPONDER AND NON-RESPONDER POPULATIONS ACROSS MULTIPLE CHECKPOINT-INHIBITOR THERAPY DATASETS

Jayamary Divya Ravichandar, Erica Rutherford, Yonggan Wu, Thomas Weinmaier, Cheryl-Emiliane Chow, Shoko Iwai, Helena Kiefel, Kareem Graham, Karim Dabbagh, Todd DeSantis

Second Genome

Ravichandar, Jayamary Divya

The gut microbiota has emerged as an important modulator in cancer progression and a growing body of evidence supports the influence of gut microbiota on response to cancer therapy, especially in the context of checkpoint inhibitor therapy. While several studies present insight into the landscape of microbial shifts modulating response to checkpoint inhibitors, they may be unduly influenced by cohort, sequencing-technology, and data analysis methods. Further, individual studies are often under-powered to detect microbes differentially abundant in responder and non-responder populations, which can limit therapeutic development. Key to microbiome-based drug discovery is the identification of proteins with therapeutic potential that are efficacious across cohorts. Herein, existing published datasets in the checkpoint-inhibitor space were mined and integrated via a cross-study meta-analysis to identify bacterial strains separating responder and non-responder populations. We compared the baseline gut microbiota associated with stool samples collected from five discrete cancer patient cohorts undergoing checkpoint-inhibitor therapy. Samples were sequenced on one or more technologies (Illumina 16S NGS, 454 16S NGS, and Illumina shotgun metagenomics) and a total of seven publicly-available datasets were analyzed herein. Leveraging our multi-faceted bioinformatics platform, which enables appropriate method-specific quality filtering and statistical testing to identify differentially abundant bacteria at the strain-level, we were able to successfully integrate analysis results across multiple microbiome-profiling technologies. We performed a random effects model based meta-analysis and identified strains that were concordantly enriched in responder populations across datasets. In a separate analysis we also applied natural language processing to the text of cancer checkpoint inhibitor studies (available in Pubmed) in order to obtain additional insights about the microbiome and strains of interest from publications with no raw data available. The strains identified herein present opportunities for mining proteins with potential to improve response to checkpoint inhibitors. This cross study meta-analysis demonstrates the power of Second Genome's bioinformatics pipeline to leverage publicly available datasets and systematically integrate microbial shifts not only across samples from multiple cohorts but also across samples sequenced on different technologies. Our in-house strain database that enables taxonomic annotation down to the strain-level allowed for comparison of fine-grained bacterial identities across datasets, resolving a key challenge with microbiome meta-analysis. This systematic and statistically-driven integration of datasets enabled identification of strains associated with response across multiple responder populations that were not previously reported in the independent analysis of these datasets.
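
The abstract does not say which random effects estimator was used. As one common possibility, the sketch below pools per-dataset effect sizes for a single strain with the DerSimonian-Laird estimator of between-study variance. The function name `random_effects_pool` and the example numbers are hypothetical and are not taken from the study.

```python
from math import sqrt


def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling of per-study effect sizes.

    effects:   per-study effect estimates (e.g. log fold change of one strain).
    variances: their within-study sampling variances.
    Returns (pooled_effect, standard_error, tau_squared).
    """
    k = len(effects)
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / total_weight
    # Cochran's Q measures heterogeneity around the fixed-effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = total_weight - sum(w * w for w in weights) / total_weight
    tau_squared = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add the between-study variance to each study.
    re_weights = [1.0 / (v + tau_squared) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    return pooled, sqrt(1.0 / sum(re_weights)), tau_squared


if __name__ == "__main__":
    # Hypothetical enrichment of one strain in responders across three cohorts.
    print(random_effects_pool([0.9, 0.4, 0.7], [0.04, 0.09, 0.05]))
```

When the studies disagree more than their sampling variances explain, the between-study variance grows and the pooled estimate is weighted more evenly across cohorts, which suits data from different sequencing technologies.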

65

A HYPOTHESIS OF THE STABILIZING ROLE OF ALU EXPANSION VIA HOMOLOGY DIRECTED REPAIR OF SPONTANEOUS DNA DOUBLE STRANDED BREAKS

Tanmoy Roychowdhury, Alexej Abyzov

Mayo Clinic

Abyzov, Alexej

Structural variations (SVs) in the human genome originate from different mechanisms related to DNA repair, replication errors, and retrotransposition. Our analyses of 26,927 SVs from the 1000 Genomes Project revealed differential distributions and consequences of SVs of different origin, e.g., deletions from non-allelic homologous recombination (NAHR) are more prone to disrupt chromatin organization while processed pseudogenes can create accessible chromatin. Spontaneous double stranded breaks (DSBs) are the best predictor of enrichment of NAHR deletions in open chromatin. This evidence, along with strong physical interaction of NAHR breakpoints belonging to the same deletion suggests that majority of NAHR deletions are non-meiotic i.e., originate from errors during homology directed repair (HDR) of spontaneous DSBs. In turn, the origin of the spontaneous DSBs is associated with transcription factor binding in accessible chromatin revealing the vulnerability of functional, open chromatin. The chromatin itself is enriched with repeats, particularly Alu elements that provide the homology required to maintain stability via HDR. Additionally, we observed a striking difference between distributions of fixed and variable Alus across genome compartments. Through co-localization of fixed Alus and NAHR deletions in open chromatin we hypothesize that old Alu expansion in hominid lineage had a stabilizing role on the human genome.

66

STATISTICAL LEARNING WITH HIGH-DIMENSIONAL MASS CYTOMETRY DATA

Pratyaydipta Rudra1, Elena Hsieh2, Debashis Ghosh2

1Oklahoma State University, 2University of Colorado Denver

Rudra, Pratyaydipta

Recent developments in single-cell based technologies, such as mass cytometry (CyTOF), have led to the need for computational and analytic approaches that can accommodate the high dimensionality and single-cell granularity. The analysis of CyTOF data can elucidate novel disease biomarkers and mechanisms of the underlying immunopathology, leading to improved treatments and prognostic measures. The use of single-cell technologies allows for consideration of expression from both a spatial and temporal framework. In spite of the promising nature of these platforms, much work remains in order to be able to meaningfully interpret the data in the context of biological questions. While end-to-end reproducible methods exist for fluorescence flow cytometry data analysis, they do not scale well for CyTOF data, which have much higher dimensionality. The data are often clustered into cell sub-populations first, which can then be used to answer scientific questions regarding the abundance of cell types and expressions of specific parameters (e.g. surface markers, signaling proteins, cytokines) across groups, such as disease and control groups, or stimulation regimes. The statistical questions about the tree-structured cell population data can be visualized in two layers. First, it is clinically interesting to know if the abundance of the cell subpopulations is different across two or more groups and/or conditions. Given the proportion of cell types for each sample, the next question is whether there is any differential expression of signaling proteins or cytokines (functional measurements of the cell populations studied). Modeling data with multiple layers of correlation using a classical parametric model often becomes a challenging task. The classical parametric models also have limiting distributional assumptions such as normality, which may not be true for cytometry data. In order to tackle this, we developed a new statistical learning methodology based on the kernel distance covariance framework to compare the cell type composition across different disease groups and stimulation conditions. High-dimensional statistical learning using a kernel machine regression is also developed to test the difference in cytokine expression levels across different cell-types and different conditions. The methods are applied to a high-dimensional dataset we collected containing different subgroups of populations including Systemic Lupus Erythematosus patients and healthy control subjects. The samples from the peripheral blood of the subjects were treated using three different stimulation methods. Preliminary analysis of the data revealed clinically relevant patterns such as differential cell type abundance between the disease and the control group, and also differential expression of several cytokines. For example, the expression of the cytokines MCP1, Mip1b and IL-1RA was found to be different among CD14 high monocytes across the two groups. An extensive simulation study to compare our statistical method with the existing approaches is currently being conducted.

67

HARDWARE ACCELERATION OF APPROXIMATE STRING MATCHING FOR BOTH SHORT AND LONG READ MAPPING

Damla Senol Cali1, Lavanya Subramanian2, Zülal Bingöl3, Jeremie S. Kim1,4, Rachata Ausavarungnirun1, Anant V. Nori2, Gurpreet S. Kalsi2, Sreenivas Subramoney2, Saugata Ghose1, Can Alkan3, Onur Mutlu1,4

1Carnegie Mellon University, 2Intel Labs, 3Bilkent University, 4ETH Zurich

Kim, Jeremie

High throughput sequencing (HTS) technology enables fast and inexpensive generation of billions of DNA sequences (i.e., reads) from a genome [1,2]. To quickly and accurately process the plethora of reads, we need new computational techniques. Analyzing HTS data requires finding the original locations of each read via an approximate string matching process against a long reference genome. Approximate string matching is typically performed with an expensive dynamic programming algorithm, which consumes over 90% of the first step's execution time. Many prior studies [3,4] have identified this bottleneck in mapping and have proposed numerous methods for accelerating this expensive step on a wide array of computational platforms. Our goal in this work is to provide a fast and efficient implementation of approximate string matching towards enabling faster read mapping. We choose to accelerate Bitap [6,7] due to its ability to perform approximate string matching with fast and simple bitwise operations, which can be highly parallelized for high throughput. We modified the algorithm to enable searching longer patterns, to remove the data dependency between the iterations, and to provide parallelism for the large number of iterations. Unfortunately, in our study of Bitap on existing systems, we find that CPUs and GPUs alone are both limited by their respective architectures and thus cannot fully utilize the available hardware for maximal efficiency. Specifically, we find that the CPU implementation of Bitap is bottlenecked by computation, since the working set fits within the L1 cache and the limited number of cores prevents further parallel speedup. The GPU implementation of Bitap is bottlenecked by the limited amount of private memory and destructive interference of threads while accessing the shared memory. In order to overcome the imbalance in each of the above systems, we propose a custom accelerator for Bitap with characteristics that fall between the CPU and GPU. This achieves a finer balance in compute resources and memory for higher performance in approximate string matching. We also explore the design space of various accelerators, including processing in memory.

REFERENCES
[1] Alkan, Can, et al. "Limitations of Next-generation Genome Sequence Assembly," Nature Methods, 2011.
[2] Van Dijk, Erwin L., et al. "Ten Years of Next-generation Sequencing Technology," Trends in Genetics, 2014.
[3] Alser, Mohammed, et al. "GateKeeper: A New Hardware Architecture for Accelerating Pre-alignment in DNA Short Read Mapping," Bioinformatics, 2017.
[4] Kim, Jeremie S., et al. "GRIM-Filter: Fast Seed Location Filtering in DNA Read Mapping using Processing-in-Memory Technologies," BMC Genomics, 2018.
[5] Baeza-Yates, Ricardo, et al. "A New Approach to Text Searching," Communications of the ACM, 1992.
[6] Wu, Sun, et al. "Fast Text Search Allowing Errors," Communications of the ACM, 1992.
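
As background for the abstract above, the following is a minimal software sketch of the classic bit-parallel (Bitap, in the Wu-Manber style) search with up to k edit errors. It is not the authors' modified algorithm or their accelerator design; the function name `bitap_search` is an assumption. Python integers are arbitrary precision, so the pattern length is not limited by a machine word here, which is a convenience of the language rather than a reflection of the hardware approach.

```python
def bitap_search(text, pattern, max_errors=0):
    """Return end indices in text where pattern matches with <= max_errors edits.

    state[d] is a bit vector: bit i is set when pattern[:i + 1] matches some
    suffix of the text read so far with at most d insertions, deletions, or
    substitutions.
    """
    m = len(pattern)
    if m == 0:
        return []
    # Per-character masks: bit i is set where pattern[i] equals the character.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    accept = 1 << (m - 1)
    # With d errors allowed, the first d pattern characters may be deleted.
    state = [(1 << d) - 1 for d in range(max_errors + 1)]
    matches = []
    for pos, ch in enumerate(text):
        mask = masks.get(ch, 0)
        old_lower = state[0]
        state[0] = ((state[0] << 1) | 1) & mask
        for d in range(1, max_errors + 1):
            old_current = state[d]
            state[d] = (
                (((old_current << 1) | 1) & mask)  # exact character match
                | old_lower                        # extra character in text
                | ((old_lower << 1) | 1)           # substitution
                | ((state[d - 1] << 1) | 1)        # character missing from text
            )
            old_lower = old_current
        if state[max_errors] & accept:
            matches.append(pos)
    return matches


if __name__ == "__main__":
    print(bitap_search("the quick brown fox", "quick"))        # exact search
    print(bitap_search("the quick brown fox", "quack", 1))     # one error allowed
```

Each text character updates every error level with only shifts, ANDs, and ORs, which is the property that makes the algorithm attractive for hardware; the dependency of level d on level d-1 within an iteration is the kind of data dependency the abstract mentions removing.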

68

TRANSITION OF REGULATORY FORCE TOWARD THE GENE EXPRESSIONS DURING OSTEOBLAST CELL DIFFERENTIATION

Yoichi Takenaka

Kansai University

Takenaka, Yoichi

Understanding the dynamics of the cell differentiation system is one of the big issues in biology and medicine. It helps to acquire cells of diseased organs from pluripotent stem cells such as ES cells or iPS cells. To analyze the dynamics, time-series gene expression profiles of cell lines from various organisms have been measured. These profiles have made clear how gene expression changes over time. However, the dynamics of the system, such as the gene regulation mechanism, are not yet well understood.

In the poster, the author shows the dynamics of gene regulations during the osteoblast cell differentiation process from mesenchymal stem cells. Many genes and gene regulations are known to be active during the process. However, neither the activity time of each regulation nor the strength of the activated regulations has been reported. The author proposes a method to elucidate the transitions between the activation and inactivation of gene regulations at the temporal resolution of single time points. The method measures the strength of the gene regulations at each time point in a leave-one-time-point-out manner. It then decomposes the time series of the gene expression data into partial series using an information criterion. Finally, it determines whether each gene regulation in each partial time series is activated or inactivated.

The gene expression profile of the osteoblast cell differentiation process includes 65 time points ranging from minus 6 hours to 192 hours, where 0 hours is the time the cell differentiation process starts. The profile was downloaded from the Genome Network Platform of the National Institute of Genetics, Japan. The gene regulatory network that is activated at one or more time points during the differentiation was composed from three reviewed papers. It includes 19 genes and 22 regulations, where Runx2, the key transcription factor associated with osteoblast differentiation, is located at the center of the network. Osx (transcription factor Sp7), which serves as a marker for osteoblast differentiation, is downstream of Runx2.

The result shows that there are four distinct periods during osteoblast cell differentiation, and each period indicates when the expressions of genes are strongly controlled.

Before the cell differentiation process starts, Osx, BMP2, DLX5 and HDAC3 are the most strongly controlled among all the 65 time points. Next, EP300 is strongly controlled in the first period. Then, Creb, HDAC3, HDAC4, HDAC5 and Osx are strongly controlled. In the final period, Runx2, Bglap, DLX5, HDAC7 and SMAD6 are strongly controlled. The analysis gives a hint as to how the cell differentiation process could be controlled.

69

METHYLATION PROFILES OF MELANOMA TO PREDICT TILS

Yihsuan Tsai1, Nana Nikolaishvili Feinberg1, Kathleen Conway2, Sharon N. Edmiston1, Nancy E. Thomas3, Joel S. Parker4

1Lineberger Comprehensive Cancer Center (LCCC), University of North Carolina at Chapel Hill; 2Department of Epidemiology, School of Public Health, Department of Dermatology, School of Medicine, Lineberger Comprehensive Cancer Center (LCCC), University of North Carolina at Chapel Hill; 3Department of Dermatology, School of Medicine, Lineberger Comprehensive Cancer Center (LCCC), University of North Carolina at Chapel Hill; 4Lineberger Comprehensive Cancer Center (LCCC), Department of Genetics, School of Medicine

Tsai, Yihsuan

Correlations between tumor infiltrating lymphocytes (TILs) and prolonged survival have been reported in many cancers including melanoma. However, current TIL assessment by pathologists reviewing the slide sections is not always ideal. Inter-observer agreement between pathologists may be low if the assessment is quantitative. To achieve a higher agreement, the estimates may be translated to categories. Here we propose to train an epigenomics model to estimate the T-cell populations in melanoma samples using immunofluorescence (IF) images of CD3 and CD8 T-cells, which provide a more objective estimation of TILs.

In previous work, we generated methylation profiles for 89 melanoma and 78 nevi samples. To have a gold standard of TIL estimate, 80 out of the 89 melanoma samples were stained with IF to image CD3, CD8, S100 (melanoma marker) and a nuclear counterstain. We defined the fraction of CD3 and/or CD8 positive cells as the T-cell fraction and found that its estimate from the IF image has the most significant association with patient survival. Therefore, an elastic net model was built using features from the methylation dataset with T-cell fraction estimates from the IF image as response. Monte-Carlo cross validation was performed on 2/3 of the samples to tune the parameters. We identified 121 CpGs in the final model to estimate T-cell fraction, which gave us the highest correlation, with Pearson r=0.87 in validation and r=0.91 in all samples. We also compared this method with two other methods. In a naïve method, we identified CpGs with high methylation level in external lymphocyte samples and low in our nevi samples. These probes represent a lymphocyte methylation signature on an unmethylated nevi background. Therefore, we calculated the mode of the kernel-smoothed DNA methylation distribution at these sites for each sample as a surrogate for lymphocyte fraction for that sample. This method gave a correlation relative to the gold standard of R=0.64. Another method uses reference-based cell deconvolution algorithms, where a pre-built methylation reference was used to compute the fractions of each cell type via three different algorithms. While all three algorithms gave similar results, Robust Partial Correlations (RPC) provides the highest correlation with the gold standard (R=0.58).

We then applied our final model (121 methylation markers) to an external dataset, TCGA-SKCM, to estimate the T-cell fractions. Since there is no gold standard for the TCGA-SKCM dataset, we used survival as a surrogate. We found the T-cell fraction estimate from our model had a strong survival association (Cox p-value=3.85e-05). We will look at the correlation of our estimation with expression of T-cell gene modules next.

In summary, the predicted T-cell fraction from our methylation markers has very high correlation with the estimates from IF images and is also highly correlated with patient survival.

70

HIGH-THROUGHPUT GENE TO KNOWLEDGE MAPPING THROUGH MASSIVE INTEGRATION OF PUBLIC SEQUENCING DATA

Brian Tsui, Hannah Carter

Department of Medicine, University of California, San Diego

Tsui, Brian Y.

The Sequence Read Archive contains more than one million runs of publicly available sequencing data. However, the lack of consistently preprocessed summary and molecular quantification data (for example, gene expression quantification for RNAseq) for each sequencing run hinders efficient Big Data interpolation. Here, we introduce Skymap, a standalone database that offers a single, multi-species data matrix incorporating all public sequencing studies. The data matrix contains several omic layers, including expression quantification, allelic read counts, microbe read counts, and ChIP-seq. We reprocessed petabytes of sequencing data to generate the data matrix for each data type. We also offer a reprocessed biological metadata file that describes the relationships between the sequencing runs and the associated keywords, extracted from over 3 million free text annotations using natural language processing. The processed data can fit into a single hard drive (<500GB). In https://github.com/brianyiktaktsui/Skymap, we showcase how one can (1) retrieve and analyze the SNPs and expression of a genetic variant across >250k runs in less than a minute and (2) increase the temporal resolution for tracking gene expression in mouse developmental hierarchy.

71

MANTA-RAE, PREDICTING THE IMPACT OF GENOME VARIANTS ON THE TRANSCRIPTION FACTOR BINDING POTENTIAL OF REGULATORY ELEMENTS

Robin van der Lee, Phillip A. Richmond, Oriol Fornes, Wyeth W. Wasserman

Centre for Molecular Medicine and Therapeutics - Department of Medical Genetics - BC Children’s Hospital Research Institute - University of British Columbia - Vancouver, Canada

van der Lee, Robin

Interpreting the functional impact and pathogenicity of noncoding variants remains challenging. Increasing evidence suggests an important role for alterations that impact cis-regulatory elements and transcription factor (TF) binding sites (TFBSs). We are developing MANTA-RAE, a tool for Mutational ANalysis of Tfbs Alterations by Reconstruction of Altered regulatory Elements. MANTA-RAE will predict the effects of variants on TFBSs in regulatory elements in a three-step approach: (i) reconstructing reference and alternative genotypes based on user-supplied sets of genomic variants and regulatory elements, (ii) predicting TFBSs through sequence scanning with curated TF binding models from JASPAR, and (iii) delta regulatory capacity analysis by comparing the TFBS potential of the reference and alternative sequences. MANTA-RAE will have the capacity to evaluate (i) both losses and gains of TFBSs and (ii) changes beyond single nucleotide variants, including small insertions, deletions, and larger copy number changes. Envisioned applications include prioritization of variants from rare disease and cancer genomes. These features should contribute to richer detection of regulation-altering noncoding variants that may contribute to disease.
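The three-step idea (rebuild the alternative sequence, scan both sequences with a binding model, compare scores) can be illustrated with a toy position weight matrix. Everything in this sketch is an assumption made for illustration: the count matrix is invented and is not a JASPAR profile, only the forward strand is scanned, only a single-nucleotide substitution is handled, and the function names are hypothetical rather than part of MANTA-RAE.

```python
from math import log2

# Hypothetical 4-position count matrix (invented; not a JASPAR profile).
COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [0, 9, 0, 1],
    "G": [1, 0, 10, 0],
    "T": [1, 1, 0, 8],
}

def log_odds(counts, pseudo=0.5, background=0.25):
    """Convert counts to a log-odds matrix against a uniform background."""
    width = len(counts["A"])
    pwm = []
    for pos in range(width):
        total = sum(counts[b][pos] for b in "ACGT") + 4 * pseudo
        pwm.append({b: log2(((counts[b][pos] + pseudo) / total) / background)
                    for b in "ACGT"})
    return pwm

def apply_snv(ref_seq, offset, alt_base):
    """Step (i): reconstruct the alternative sequence for one substitution."""
    return ref_seq[:offset] + alt_base + ref_seq[offset + 1:]

def best_score(seq, pwm):
    """Step (ii): best forward-strand match score over all windows."""
    width = len(pwm)
    return max(sum(pwm[i][seq[start + i]] for i in range(width))
               for start in range(len(seq) - width + 1))

pwm = log_odds(COUNTS)
ref = "TTACGTCC"                 # contains the consensus site ACGT
alt = apply_snv(ref, 3, "T")     # C>T inside the site: ACGT becomes ATGT
# Step (iii): change in binding potential between alternative and reference.
delta = best_score(alt, pwm) - best_score(ref, pwm)
```

A negative `delta` suggests a weakened site and a positive one a strengthened or newly created site; a fuller treatment would also scan the reverse strand and handle insertions and deletions.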

72

USING QUANTITATIVE PHOSPHOPROTEOMICS TO UNDERSTAND FUNCTIONAL SELECTIVITY OF RECEPTOR TYROSINE KINASES

J. Watson, C. Francavilla, J. M. Schwartz

Faculty of Biology, Medicine and Health, University of Manchester

Watson, Joanne

Cell signalling is the process of translating extracellular messages, or signals, to the inside of the cell in order to coordinate cellular activity. Cells receive signals from the external environment in a myriad of ways, including by the binding of extracellular proteins, called ligands, to receptors on the surface of the cell. Upon ligand binding, the signal is transmitted across the cell surface by the receptor and the signal propagates through the cell, primarily by the post-translational modification of proteins. For the receptor tyrosine kinase (RTK) family, this process is mediated by phosphorylation, a modification which is added to serine, threonine or tyrosine residues of proteins by the activity of kinases and removed by phosphatases. The addition of phosphoryl groups is associated with activation of protein function. Ligand binding induces RTK dimerization and activation of kinase activity, allowing full activation of the receptor. This initiates a sequential cascade of protein phosphorylation, ultimately regulating transcription factor activity to modulate cellular behavior. An unanswered question in the field is how different ligands binding to the same receptor induce distinct signaling cascades, defined by changes in phosphorylation dynamics and consequent cellular behavior, a concept known as functional selectivity. This is demonstrated by fibroblast growth factor (FGF)-receptor 2b; when stimulated by either FGF7 or FGF10, an increase in proliferation or migration, respectively, is observed. Quantitative phosphoproteomics is a powerful method for comparing on a global scale the signaling cascades inducing these different behaviors. This comparison will allow us to define patterns of phosphorylation associated with signalling by different ligands, and use this to identify key phosphorylation sites associated with particular cell behaviors. We have developed a workflow to interrogate temporal phosphoproteomics datasets to directly compare the phosphorylation dynamics of cells stimulated by different ligands. As proteins may have multiple phosphorylation sites which can have independent effects and regulation, our approach considers data on the level of both the phosphorylation site and the associated protein. Initial clustering of phosphorylated sites with similar dynamics over time is followed by protein-level analysis of functional similarity, using connectivity in graph databases, enrichment for ontological terms, and roles in well-studied signalling pathways (extracted from KEGG). Subsequent steps in the workflow aim to move the analysis from the protein to the phosphorylated sites. By integrating network-based analyses with phosphoproteomics data, we will develop novel methods for understanding and visualizing the role of phosphorylation in functional selectivity.

73

ANERIS APPLIED: SPARK-ENABLED ANALYTICS FOR FULL-SCALE AND REPRODUCIBLE ANNOTATION-BASED GENOMIC STUDIES

Nicholas Wheeler, Jeremy Fondran, Penny Benchek, Jonathan Haines, William S. Bush

Case Western Reserve University

Wheeler, Nicholas

Modern genomic studies are rapidly growing in scale, and the analytical approaches used to analyze genomic data are increasing in complexity. Genomic data management poses logistic and computational challenges, and analyses are increasingly reliant on genomic annotation resources that create their own data management and versioning issues. As a result, genomic datasets are increasingly handled in ways that limit the rigor and reproducibility of many analyses. In this work, we describe an analysis framework based on Spark infrastructure that provides management, rapid access, and flexible analysis of genomic data. By storing large-scale genomic and variant annotation resources alongside genomic data in a distributed system, we provide efficient methods for testing a variety of biologically-driven hypotheses for rare variants. Using the well-established Spark framework and analyses designed using Jupyter notebooks, we provide tools that improve processing speed, reduce user-driven data partitioning, and enhance the reproducibility of large-scale genomic studies.

74

PUTTING RELICANTHUS IN ITS PLACE: IMPACT OF MIXTURE MODEL CHOICE ON PHYLOGENETIC RECONSTRUCTION

Madelyne Xiao1, Mercer R. Brugler2, Estefania Rodriguez1

1Department of Invertebrate Zoology, American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024; 2Biological Sciences Department, NYC College of Technology (CUNY), 285 Jay Street, Brooklyn, NY 11201

Xiao, Madelyne

First described in 2006, Relicanthus daphneae is a deep-sea anthozoan that lives on the ocean floor near hydrothermal vents in the East Pacific. It was originally classified as an anemone until a phylogenetic analysis in 2014 called this classification into question. The tree resulting from a maximum likelihood analysis for the Order Actiniaria (anemones) placed Relicanthus outside of Actiniaria; a recent analysis of Relicanthus' mitochondrial gene order, however, suggests its membership among the anemones. An ongoing study seeks to relate the choice of mixture model (e.g., maximum likelihood, maximum parsimony, Bayesian inference) to the resulting phylogenetic tree, taking into account the robustness of the dataset in question (number of genes, specimens, etc.). In particular, we are interested in the impact of mixture model choice on the placement of Relicanthus with respect to the actiniarians.

75

RATIONAL DESIGN OF NOVEL SKP2 INHIBITORS USING DEEP NEURAL NETWORKS

Shuxing Zhang, Beibei Huang, Lon W. Fong

Intelligent Molecular Discovery Laboratory, Department of Experimental Therapeutics, MD Anderson Cancer Center, Houston, TX 77054

Zhang, Shuxing

Deep learning techniques have recently gained increasing attention and show significant promise in generating predictive models for pharmaceutical research. In the present study, we attempt to develop a deep neural network method to design novel therapeutic agents for triple-negative breast cancer (TNBC) by targeting a crucial E3 ligase, Skp2. TNBC represents about 20% of breast cancer cases. It is highly aggressive with poor clinical outcome, and no targeted agents have been shown to be clinically effective in treating TNBC. Skp2 is an F-box protein, constituting one of the four subunits of the Skp1-Cullin-1 (Cul-1)-F-Box (SCF) ubiquitin E3 ligase complex. Earlier studies showed that Skp2 regulates cell cycle progression and proliferation by targeting ubiquitination and degradation of its substrates, such as the cell cycle inhibitor p27. Our in-house data also revealed that Skp2 was overexpressed in TNBC and correlated with poor prognosis. In addition, we revealed that genetic Skp2 inactivation also triggered a massive cellular senescence and/or apoptosis response in a p19Arf/p53-independent, but p27-dependent manner. Taken together, our results suggest that targeting Skp2 may represent a general "pro-senescence/apoptosis" and "anti-glycolysis" approach and is a promising therapeutic strategy for TNBC development and metastasis. Herein we developed a novel deep neural network (DNN) method to predict TNBC cell responses to drugs based solely on their chemical features. In particular, a cost function was employed to suppress overfitting. We also adopted an "early stopping" strategy to further reduce overfitting and improve the accuracy of our models. Currently the software has been integrated with a genetic algorithm-based variable selection approach and implemented as part of our DL4DR package. We observed that DL4DR could handle big datasets efficiently, significantly outperforming other methods in model building and prediction and obtaining better results in big data analysis. When employed to predict drug responses of several highly aggressive TNBC cell lines, DL4DR produced robust and accurate predictions. Therefore, we applied these TNBC models to rationally design new small molecule inhibitors by targeting Skp2. After screening millions of chemical compounds and designing novel structures based on our lead compound ZL25, we conducted a series of biochemical and cellular studies. These experimental examinations demonstrate that the top-ranked molecules indeed inhibit Skp2 E3 ubiquitination functions significantly and kill TNBC cells effectively. Hence it has been used for our lead optimization of Skp2 inhibitors, and we anticipate that DL4DR can be employed as a general tool for hit identification and rational lead design for cancer therapeutics development.

76

PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK

POSTER PRESENTATIONS

77

ODAL: A ONE-SHOT DISTRIBUTED ALGORITHM TO PERFORM LOGISTIC REGRESSIONS ON ELECTRONIC HEALTH RECORDS DATA FROM MULTIPLE CLINICAL SITES

Rui Duan, Mary Regina Boland, Jason H. Moore, Yong Chen

Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania

Chen, Yong

Electronic Health Records (EHR) contain extensive information on various health outcomes and risk factors, and therefore have been broadly used in healthcare research. Integrating EHR data from multiple clinical sites can accelerate knowledge discovery and risk prediction by providing a larger sample size in a more general population, which potentially reduces clinical bias and improves estimation and prediction accuracy. To overcome the barrier of patient-level data sharing, distributed algorithms are developed to conduct statistical analyses across multiple sites through sharing only aggregated information. Current distributed algorithms often require iterative information evaluation and transfer across sites, which can potentially lead to a high communication cost in practical settings. In this study, we propose a privacy-preserving and communication-efficient distributed algorithm for logistic regression without requiring iterative communications across sites. Our simulation study showed that our algorithm reached accuracy comparable to that of the oracle estimator, where data are pooled together. We applied our algorithm to EHR data from the University of Pennsylvania health system to evaluate the risks of fetal loss due to various medication exposures.
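One way to obtain a logistic regression estimate from a single round of aggregate sharing is a surrogate-likelihood construction: a lead site fits a local model, every site returns only the gradient of its log-likelihood at that local estimate, and the lead site then maximizes its own likelihood plus a linear gradient correction. The sketch below illustrates that general idea on simulated data. It is an assumption that this resembles the algorithm in the abstract, which does not spell out its construction; the single-coefficient model without intercept, the site sizes, and all function names are chosen only for brevity.

```python
import random
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def grad_hess(beta, data):
    """Mean gradient and second derivative of the logistic log-likelihood."""
    g = h = 0.0
    for x, y in data:
        p = sigmoid(beta * x)
        g += (y - p) * x
        h -= p * (1.0 - p) * x * x
    n = len(data)
    return g / n, h / n

def newton(data, shift=0.0, start=0.0, steps=25):
    """Maximize mean log-likelihood(beta) + shift * beta by Newton's method."""
    beta = start
    for _ in range(steps):
        g, h = grad_hess(beta, data)
        beta -= (g + shift) / h
    return beta

def simulate_site(n, true_beta, rng):
    """Simulated patient-level rows (x, y) that never leave the site."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < sigmoid(true_beta * x) else 0
        rows.append((x, y))
    return rows

rng = random.Random(42)
sites = [simulate_site(1000, 1.0, rng) for _ in range(3)]

# Step 1: the lead site (site 0) fits a purely local model.
local = newton(sites[0])
# Step 2: each site shares one number, its mean gradient at the local estimate.
site_grads = [grad_hess(local, rows)[0] for rows in sites]
total_n = sum(len(rows) for rows in sites)
global_grad = sum(g * len(rows) for g, rows in zip(site_grads, sites)) / total_n
# Step 3: the lead site maximizes its likelihood plus the gradient correction.
one_shot = newton(sites[0], shift=global_grad - site_grads[0], start=local)

# Oracle for comparison only: pooling all rows, which real sites could not do.
pooled = newton([row for rows in sites for row in rows])
```

The only quantities that cross site boundaries here are the local estimate and one gradient value per site, which is what keeps the communication to a single round.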

78

PLATYPUS: A MULTIPLE-VIEW LEARNING PREDICTIVE FRAMEWORK FOR CANCER DRUG SENSITIVITY PREDICTION

Kiley Graim1, Verena Friedl2, Kathleen E. Houlahan3, Joshua M. Stuart3

1Flatiron Institute & Princeton University, 2University of California Santa Cruz, 3Ontario Institute of Cancer Research

Friedl, Verena

Cancer is a complex collection of diseases that are to some degree unique to each patient. Precision oncology aims to identify the best drug treatment regime using molecular data on tumor samples. While omics-level data is becoming more widely available for tumor specimens, the datasets upon which computational learning methods can be trained vary in coverage from sample to sample and from data type to data type. Methods that can "connect the dots" to leverage more of the information provided by these studies could offer major advantages for maximizing predictive potential. We introduce a multi-view machine-learning strategy called PLATYPUS that builds "views" from multiple data sources that are all used as features for predicting patient outcomes. We show that a learning strategy that finds agreement across the views on unlabeled data increases the performance of the learning methods over any single view. We illustrate the power of the approach by deriving signatures for drug sensitivity in a large cancer cell line database. Code and additional information are available from the PLATYPUS website https://sysbiowiki.soe.ucsc.edu/platypus.

79

A SOFTWARE PIPELINE FOR DETERMINING FINE-SCALE TEMPORAL GENOME VARIATION PATTERNS IN EVOLVING POPULATIONS USING A NON-PARAMETRIC STATISTICAL TEST

Minjung Kwak1, Seokwoo Kang2, Dongwon Choo2, Dohyeon Lee2, Jinhee Lee2, Seonghyeon Kim2, Giltae Song2

1Yeungnam University, 2Pusan National University

Song, Giltae

Abnormal variations are frequent in the clonal genome evolution of cancers. Such aberrant variations often function as a driver of cancer cell growth. The fundamental evolutionary dynamics underlying these variations in tumor metastasis are still understudied owing to their genetic complexity. Recently, whole genome sequencing has made it possible to determine genome variations in the short-term evolution of cell populations. This approach has been applied to evolving populations of model organisms, including yeast. Examining sequence changes at such fine-scale resolution is substantial progress in evolutionary genomics. However, existing statistical tests for analyzing temporal changes in variation across multiple time points are limited in their ability to identify the full spectrum of intermediate changes. We designed a new statistical approach based on the Kolmogorov-Smirnov test and integrated it into a software tool for determining variation patterns at fine-scale temporal resolution in experimental evolution studies. We validated our method using simulation data that mimic the evolution of fruit fly populations. We compared the results of our method with those of existing methods such as the Cochran-Mantel-Haenszel (CMH) test and the beta-binomial Gaussian process (BBGP) method. We analyzed yeast (Saccharomyces cerevisiae) W303 strain genomes from 40 populations at 12 time points using our software pipeline. Our toolset can also be applied to identify abnormal variation changes in other evolving populations.
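For readers unfamiliar with the Kolmogorov-Smirnov test named above, its two-sample statistic is the largest gap between two empirical cumulative distribution functions. The sketch below computes only that plain statistic; how the pipeline adapts it to multiple time points is not described in the abstract, and the allele-frequency values are invented for illustration.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max distance between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The maximum gap can only occur at an observed value.
    for v in a + b:
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Hypothetical allele frequencies of one variant across replicate populations
# at an early and a late time point.
early = [0.10, 0.12, 0.15, 0.11]
late = [0.40, 0.45, 0.38, 0.50]
shift = ks_statistic(early, late)
```

Because the statistic depends only on the ordering of the observations, it makes no assumption about the shape of the frequency distribution, which is what makes the test non-parametric.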

80

A DEEP LEARNING APPROACH TO IDENTIFYING THE CELLULAR COMPOSITION OF SOLID TISSUE WITH DNA METHYLATION DATA

Meghan E. Muse1, Curtis L. Petersen1, Carmen J. Marsit2, Diane Gilbert-Diamond1, Brock C. Christensen1

1Dartmouth College, 2Emory University

Muse, Meghan

DNA methylation is involved in the establishment of cellular identity, and measured profiles of DNA methylation can be leveraged to deconvolute the underlying cellular composition of a tissue sample. Currently, both reference-based and reference-free methods exist to estimate the relative proportion of inferred cell types in solid tissue using DNA methylation data. However, establishing DNA methylation libraries for reference-based deconvolution in solid tissues is challenging, and use of reference-free approaches to estimate putative cell type proportions is computationally intensive, particularly as sample size increases. As observed patterns in DNA methylation can be most strongly explained by the relative proportion of cell types in a tissue sample, we investigated the utility of implementing an unsupervised variational autoencoder (VAE) approach to learn a defined number of latent dimensions in DNA methylation data and tested their relationship with inferred cell type proportions from a reference-free approach. We implement the Tybalt model developed by Way et al. to learn latent representations of DNA methylation data measured on the Illumina 450K array in 334 placental samples. We compare the results of this method to those from a well-established reference-free method for inferring the relative proportions of putative cell types. We considered models that learned 10 to 100 latent dimensions and selected the model in which the greatest number of putative cell types identified by the reference-free method had moderate correlation (r^2 > 0.5) with at least one latent dimension. This resulted in the selection of a model learning 10 latent dimensions. In this model, learned latent dimensions had moderate correlation with 5 of the 9 putative placental cell types identified by the reference-free method and strong correlation (r^2 > 0.7) with 2 putative placental cell types. To better understand the underlying biology represented by these latent dimensions, we assess the CpG loci most strongly correlated with the activations of these 5 latent dimensions as a means of identifying genes that are representative of cellular identity.

81

DIRECTLY MEASURING THE RATE AND DYNAMICS OF HUMAN MUTATION BY SEQUENCING LARGE, MULTI-GENERATIONAL PEDIGREES

Thomas A. Sasani, Brent S. Pedersen, Mark Leppert, Ray White, Lisa Baird, Aaron R. Quinlan, Lynn B. Jorde

Department of Human Genetics, University of Utah

Quinlan, Aaron

Developing an accurate estimate of the human germline mutation rate is critical to our understanding of evolution, demography, and genetic disease. Early phylogenetic analyses inferred mutation rates from the observed sequence divergence between humans and related primate species at particular genes and pseudogenes. However, as whole genome sequencing has become ubiquitous, these estimates have been refined using pedigree-based approaches. By identifying mutations present in offspring that are absent from their parents (de novo mutations), it is possible to more accurately approximate the human germline mutation rate. To obtain a precise, unbiased estimate of the mutation rate in humans, we performed deep whole-genome sequencing on blood-derived DNA from 34 of the original three-generation CEPH families from Utah, comprising a total of 604 individuals. These families, which each contain grandparents (P0 generation), parents (F1), and their children (F2), are considerably larger than any used in prior estimates of the human mutation rate, and offer unique power to detect and validate de novo mutation. With a median of 8 F2 individuals per pedigree, we were able to biologically validate putative de novo mutations in the F1 generation by assessing their transmission to a third generation. Using this dataset, we have generated a high-confidence estimate of the human mutation rate (1.31 x 10^-8 /bp/generation), observe a significant parental age effect on the rate of de novo mutation, and identify wide variability in family-specific age effects across CEPH pedigrees. To our knowledge, this study represents the first example of a longitudinal analysis of the effect of parental age within individual families. Additionally, we have identified recurrent de novo variants present in multiple F2 offspring, which are likely the result of mosaicism in the parental germline. Finally, we have trained a classification model on the high-quality, transmitted de novo variants in our dataset, and used this model to identify de novo mutations in a large cohort of children from the Simons Foundation for Autism Research Initiative (SFARI). Combining the de novo mutations observed in 34 Utah families with the SFARI callset, we have generated a dense genomic map of spontaneous human mutation. We observe regional enrichment of de novo variation in the human genome, and explore the role of sequence context, as well as molecular processes like recombination and gene conversion, on the rate of human mutation.
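To put the reported rate in context, a per-base, per-generation rate converts to an expected number of de novo mutations per child by multiplying by the number of base pairs inspected across both inherited genome copies, and the conversion can be inverted to estimate a rate from counts. In the sketch below, only the rate comes from the abstract; the callable haploid genome size of 2.7 billion bp and the example counts are assumptions made for the arithmetic.

```python
RATE_PER_BP_PER_GENERATION = 1.31e-8   # reported in the abstract
CALLABLE_HAPLOID_BP = 2.7e9            # assumption, not from the abstract

def expected_de_novo(rate, haploid_bp):
    """Expected de novo mutations per child over both genome copies."""
    return rate * 2 * haploid_bp

def rate_from_counts(total_dnms, n_offspring, haploid_bp):
    """Per-bp, per-generation rate implied by observed de novo counts."""
    return total_dnms / (n_offspring * 2 * haploid_bp)

per_child = expected_de_novo(RATE_PER_BP_PER_GENERATION, CALLABLE_HAPLOID_BP)
# Round trip with hypothetical counts: 7074 mutations observed in 100 children.
implied_rate = rate_from_counts(7074, 100, CALLABLE_HAPLOID_BP)
```

Under these assumptions the rate corresponds to roughly 70 de novo mutations per child; a different callable genome size would scale that figure proportionally.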

82

AVAILABLE PROTEIN 3D STRUCTURES DO NOT REFLECT HUMAN GENETIC AND FUNCTIONAL DIVERSITY

Gregory Sliwoski, Neel Patel, R. Michael Sivley, Charles R. Sanders, Jens Meiler, William S. Bush, John A. Capra

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA, Center for Structural Biology, Vanderbilt University, Nashville, TN, USA; Institute for Computational Biology, Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA; Department of Biochemistry, Vanderbilt University, Nashville, TN, USA; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Chemistry, Vanderbilt University, Nashville, TN, USA; Institute for Computational Biology, Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA; Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA

Bush, William

Genomic databases and clinical trials are substantially biased towards European ancestry populations, and this bias significantly contributes to health disparities. Structural biology has an essential role in investigating protein function and clinical variant interpretation, providing powerful tools for investigating the impact of genetic variants on protein structure and function. However, studies that analyze the 3D structure of proteins typically consider a single canonical amino acid sequence as representative of the protein. Here, we evaluate the potential for this simplification to bias results toward different populations by evaluating how well 66,971 experimentally characterized human protein 3D structures represent the sequence diversity of the proteins they model. Thousands of protein structures have unrepresented alternative sequences commonly found in human populations, and African ancestry individuals' sequences are the least likely to be represented by available structures. Because sequence variability is often limited to a few positions within a protein, we evaluate the likelihood of these small changes to influence protein function. Combining existing annotations and computational modeling, we identify thousands of proteins for which use of a single structure as representative of "wild type" may bias results against certain populations or individuals. Variants segregating in human populations, but unrepresented in structures, are observed across functional sites involved in stability (134 disulfide bond cysteines), regulation (94 phosphorylation sites), DNA binding (322 residues), small molecule binding (1,463 residues with 362 within drug binding sites), and protein-protein interfaces (6,144 residues). We computationally model more than 700 unrepresented variants' effects on protein stability and protein-protein interaction. Changes in predicted protein stability are found for 28% (156) of the 556 variants, with stabilizing (41) and destabilizing (115) effects predicted. Of 161 protein-interface variants modeled, 25% (41) are predicted to impact protein-protein binding. These variants in human populations have potential to impact the study of their protein's structure and function. With the widespread use of protein structures in basic science and clinical variant interpretation, human protein sequence and structural diversity must be considered to enable accurate and reproducible conclusions from structural analyses.

83

SEMANTIC WORKFLOWS FOR BENCHMARK CHALLENGES: ENHANCING COMPARABILITY, REUSABILITY AND REPRODUCIBILITY

Arunima Srivastava1, Ravali Adusumilli2, Hunter Boyce2, Daniel Garijo3, Varun Ratnakar3, Rajiv Mayani3, Thomas Yu4, Raghu Machiraju1, Yolanda Gil3, Parag Mallick2

1The Ohio State University, 2Stanford University, 3University of Southern California, 4Sage Bionetworks

Srivastava, Arunima

Benchmark challenges, such as the Critical Assessment of Structure Prediction (CASP) and Dialogue for Reverse Engineering Assessments and Methods (DREAM), have been instrumental in driving the development of bioinformatics methods. Typically, challenges are posted, and then competitors perform a prediction based upon blinded test data. Challengers then submit their answers to a central server where they are scored. Recent efforts to automate these challenges have been enabled by systems in which challengers submit Docker containers, a unit of software that packages up code and all of its dependencies, to be run on the cloud. Despite their incredible value for providing an unbiased test-bed for the bioinformatics community, there remain opportunities to further enhance the potential impact of benchmark challenges. Specifically, current approaches only evaluate end-to-end performance; it is nearly impossible to directly compare methodologies or parameters. Furthermore, the scientific community cannot easily reuse challengers' approaches, due to lack of specifics, ambiguity in tools and parameters, as well as problems in sharing and maintenance. Lastly, the intuition behind why particular steps are used is not captured, as the proposed workflows are not explicitly defined, making it cumbersome to understand the flow and utilization of data. Here we introduce an approach to overcome these limitations based upon the WINGS semantic workflow system. Specifically, WINGS enables researchers to submit complete semantic workflows as challenge submissions. By submitting entries as workflows, it then becomes possible to compare not just the results and performance of a challenger, but also the methodology employed. This is particularly important when dozens of challenge entries may use nearly identical tools, but with only subtle changes in parameters (and radical differences in results). WINGS uses a component-driven workflow design and offers intelligent parameter and data selection by reasoning about data characteristics. This proves to be especially critical in bioinformatics workflows, where using default or incorrect parameter values is prone to drastically altering results. Different challenge entries may be readily compared through the use of abstract workflows, which also facilitate reuse. WINGS is housed on a cloud-based setup, which stores data, dependencies and workflows for easy sharing and utility. It also has the ability to scale workflow executions using distributed computing through the Pegasus workflow execution system. We demonstrate the application of this architecture to the DREAM proteogenomic challenge.

84

PRECISION MEDICINE: IMPROVING HEALTH THROUGH HIGH-RESOLUTION ANALYSIS OF PERSONAL DATA

POSTER PRESENTATIONS

85

CLASS PRIOR ESTIMATION AND QUANTIFICATION OF THE LOSS AND GAIN OF RESIDUE FUNCTION UPON MUTATION

Shantanu Jain1, Jose Lugo-Martinez2, Martha White3, Michael W. Trosset4, Predrag Radivojac1

1Northeastern University, 2Carnegie-Mellon University, 3University of Alberta, 4Indiana University

Jain, Shantanu

Standard algorithms for binary classification assume access to labeled data from both the positive and the negative class. However, in many biological problems, labeled examples from one of the classes (say, negatives) are not available. In this scenario, a positive-unlabeled learner, which relies on positive and unlabeled examples only, is used. Surprisingly, this strategy leads to an optimal score function. However, picking an optimal threshold to construct the final classifier requires knowledge of the class priors, the proportions of positives and negatives in the unlabeled data. I will 1) present a nonparametric algorithm for estimation of the class priors based on a mixture model formulation, 2) elucidate the assumptions necessary for the algorithm, and 3) derive a class-prior-preserving univariate transform for dimensionality reduction and thereby obtain a practical algorithm for multivariate data. Moreover, I will also demonstrate how the posterior can be estimated using the estimate of the class priors. I will further extend these results to a more general setting where some of the examples labeled as positive are in fact negative. I will present experimental results demonstrating the efficacy of our algorithm, comparing it with state-of-the-art methods and other baseline methods on many real and synthetic datasets. Lastly, I will present a biological application of this work to establish the loss and gain of residue function as a common mechanism for inherited diseases.
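The role of the class prior in turning a positive-versus-unlabeled score into a posterior can be made concrete with a standard relation from positive-unlabeled learning. If tau(x) is the probability that an example is labeled rather than unlabeled, c is the labeled fraction of the training set, and alpha is the proportion of positives among unlabeled examples, then p(y = 1 | x) = alpha * ((1 - c) / c) * tau(x) / (1 - tau(x)). The sketch below applies that relation; it is offered as an illustration of why alpha is needed, not as the estimator presented in the abstract, and the function and parameter names are hypothetical.

```python
def pu_posterior(tau, labeled_frac, class_prior):
    """Posterior probability of the positive class for an unlabeled example.

    tau          -- classifier output P(labeled | x), labeled vs. unlabeled
    labeled_frac -- fraction c of training examples that were labeled positives
    class_prior  -- proportion alpha of positives in the unlabeled data
    """
    if not 0.0 < labeled_frac < 1.0:
        raise ValueError("labeled_frac must lie strictly between 0 and 1")
    if tau >= 1.0:
        return 1.0
    odds = tau / (1.0 - tau)
    posterior = class_prior * (1.0 - labeled_frac) / labeled_frac * odds
    # Estimated scores can push the product outside [0, 1]; clip it.
    return min(1.0, max(0.0, posterior))

# With balanced labeled and unlabeled sets and an uninformative score of 0.5,
# the posterior falls back to the class prior itself.
baseline = pu_posterior(0.5, 0.5, 0.3)
confident = pu_posterior(0.9, 0.5, 0.3)
```

The same score therefore maps to different posteriors under different class priors, which is why a decision threshold cannot be chosen from the positive-unlabeled score alone.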

86

PREDICTION OF TIME TO INSULIN USING CLINICAL AND GENETIC BIOMARKERS IN TYPE 2 DIABETES PATIENTS

Rikke Linnemann Nielsen1, Louise Donnelly2, Agnes Martine Nielsen3, Konstantinos Tsirigos1, Kaixin Zhou2, Bjarne Ersboell3, Line Clemmensen3, Ewan Pearson2, Ramneek Gupta1

1Department of Bio and Health Informatics, Technical University of Denmark; 2Medical Research Institute, University of Dundee, United Kingdom; 3Department of Applied Mathematics and Computer Science, Technical University of Denmark

Nielsen, Rikke Linnemann

Type II diabetes (T2D) is a complex metabolic disorder where the risk of a fast or slow disease progression is highly dependent on each individual. Therefore, it is useful to identify predictive biomarkers for diabetes progression and relevant patient subgroup characteristics that may assist clinical decisions in T2D treatment management. In this study, we obtained electronic medical records from a cohort-based population in Tayside, UK, registered from December 1994 to September 2015. Using life-style data, anthropometry, biochemical data, drug-prescription data and genetic features from electronic medical records on 6871 T2D patients, artificial neural network models (ANN) were trained with two-layer cross-validation to classify T2D patients’ progression given as patients’ time to insulin (TTI). TTI was defined as the first day of insulin treatment or as the clinical need for insulin (HbA1c > 8.5% treated with two or more non-insulin diabetes therapies). Prediction targets were TTI within year 1, 3 or 5 from the time of diagnosis. Genetic variants were selected by prior knowledge on T2D and glycemic trait predisposition SNPs from ~80M imputed SNPs. Prediction models were investigated for understanding which biomarkers were most predictive of progression. ANNs with all data except genetic variants predicted TTI for year 1 (0.92 ± 0.02, 0.83 ± 0.04, 0.86 ± 0.04 for AUC, sensitivity and specificity, respectively), year 3 (0.82 ± 0.03, 0.71 ± 0.05, 0.78 ± 0.04) and year 5 (0.78 ± 0.02, 0.66 ± 0.02, 0.76 ± 0.02). The most important features included HbA1c, GAD antibody concentration and the type of diabetes therapy patients were receiving at the time of confirmed diagnosis. Integration of genetic variants, using a forward selection strategy, resulted in a slightly improved performance in all three models: year 1 (0.94 ± 0.01, 0.83 ± 0.03, 0.90 ± 0.01), year 3 (0.85 ± 0.02, 0.72 ± 0.05, 0.80 ± 0.02), and year 5 (0.80 ± 0.03, 0.68 ± 0.04, 0.78 ± 0.02). We are currently examining the robustness of the selected SNPs by building an ensemble of multiple models with different features and investigating whether the genetic features are relevant to specific patient subgroups, as well as carrying out further longitudinal work with the phenotype to include more information about a given patient using longitudinal patient information across irregularly sampled time points.

87

PATHOGENICITY AND FUNCTIONAL IMPACT OF INSERTION/DELETION AND STOP GAIN VARIATION IN THE HUMAN GENOME

Kymberleigh A. Pagel1, Danny Antaki2, Matthew Mort3, David N. Cooper3, Jonathan Sebat2, Lilia M. Iakoucheva2, Sean D. Mooney4, Predrag Radivojac5

1Indiana University, 2University of California San Diego, 3Cardiff University, 4University of Washington, 5Northeastern University

Radivojac, Predrag

An individual human exome may contain hundreds of protein-coding insertion/deletions (indels) and dozens of protein truncating variants. Accurate differentiation between phenotypically neutral and disease-causing genetic variation remains an open problem, particularly among the excess of indel variants brought about by recent developments in sequencing technologies. Indel and protein truncating variants exhibit diverse impact on protein sequence, from a single residue to the deletion of entire functional domains. We present machine learning methods to predict the pathogenicity and the types of functional residues impacted by loss-of-function and indel variation. The models show good predictive performance and the potential to identify effects upon residues predicted to affect structural and functional features, including secondary structure, intrinsic disorder, metal and macromolecular binding, post-translational modifications, and catalytic residues. We identify structural and functional mechanisms that are impacted preferentially by germline variation from the Human Gene Mutation Database, recurrent somatic variation in COSMIC, and de novo variation from individuals with neurodevelopmental disorders. Collectively, the pathogenicity prediction and predicted functional effects provide a framework to facilitate the interrogation of indel and protein truncating variants.

88

DETECTING POTENTIAL PLEIOTROPY ACROSS CARDIOVASCULAR AND NEUROLOGICAL DISEASES USING UNIVARIATE, BIVARIATE, AND MULTIVARIATE METHODS ON 43,870 INDIVIDUALS FROM THE EMERGE NETWORK

Xinyuan Zhang1, Yogasudha Veturi1, Shefali S. Verma1, William Bone1, Anurag Verma1, Anastasia M. Lucas1, Scott Hebbring2, Joshua C. Denny3, Ian Stanaway4, Gail P. Jarvik4, David Crosslin4, Eric B. Larson5, Laura Rasmussen-Torvik6, Sarah A. Pendergrass7, Jordan W. Smoller8, Hakon Hakonarson9, Patrick Sleiman9, Chunhua Weng10, David Fasel10, Wei-Qi Wei3, Iftikhar Kullo11, Daniel Schaid11, Wendy K. Chung10, Marylyn D. Ritchie1

1University of Pennsylvania, 2Marshfield Clinic, 3Vanderbilt University, 4University of Washington, 5Kaiser Permanente Washington Health Research Institute, 6Northwestern University, 7Geisinger Health System, 8Massachusetts General Hospital, 9Children's Hospital of Philadelphia, 10Columbia University, 11Mayo Clinic

Zhang, Xinyuan

The link between cardiovascular diseases and neurological disorders has been widely observed in the aging population. Disease prevention and treatment rely on understanding the potential genetic nexus of multiple diseases in these categories. In this study, we were interested in detecting pleiotropy, or the phenomenon in which a genetic variant influences more than one phenotype. Marker-phenotype association approaches can be grouped into univariate, bivariate, and multivariate categories based on the number of phenotypes considered at one time. Here we applied one statistical method per category followed by an eQTL colocalization analysis to identify potential pleiotropic variants that contribute to the link between cardiovascular and neurological diseases. We performed our analyses on ~530,000 common SNPs coupled with 65 electronic health record (EHR)-based phenotypes in 43,870 unrelated European adults from the Electronic Medical Records and Genomics (eMERGE) network. There were 31 variants identified by all three methods that showed significant associations across late-onset cardiac and neurologic diseases. We further investigated functional implications of gene expression on the detected "lead SNPs" via colocalization analysis, providing a deeper understanding of the discovered associations. In summary, we present the framework and landscape for detecting potential pleiotropy using univariate, bivariate, multivariate, and colocalization methods. Further exploration of these potentially pleiotropic genetic variants will work toward understanding disease-causing mechanisms across cardiovascular and neurological diseases and may assist in considering disease prevention as well as drug repositioning in future research.

89

PHARMGKB: THE API AND INFOBUTTONS

Michelle Whirl-Carrillo1, Ryan M. Whaley1, Mark Woon1, Russ B. Altman2, Teri E. Klein3

1Department of Biomedical Data Science, Stanford University; 2Department of Bioengineering, Medicine and Genetics, Stanford University; 3Department of Biomedical Data Science and Medicine, Stanford University

Alena, Orlenko

PharmGKB is the largest publicly available resource for pharmacogenomics (PGx) discovery and implementation. Its mission is to collect, curate, integrate and disseminate knowledge about how human genetic variation influences drug response. PharmGKB knowledge is defined by a data model, stored in a database, and accessed through the Application Programming Interface (API). The API supplies data to the www.pharmgkb.org website, which is the most common way for people to query and view the knowledge content of PharmGKB. Additionally, the PharmGKB API supports the InfoButton specification, which is used on the ClinGen website as well as by others in their EHR systems. The Infobutton Implementation Guide provides a standard mechanism for EHR systems to submit knowledge requests to knowledge resources over the HTTP protocol for point-of-care decision support. PharmGKB provides this as part of its standard API, using RXCUIs (RxNorm concept unique identifiers) and normalization of drug names, and returns HTML, with plans to support JSON and XML (https://api.pharmgkb.org/infobutton.html). For InfoButtons, the EHR displays a button for the user to click that will query PharmGKB and display information directly in the EHR application. Inside the application, a list of drug identifiers (RXCUIs) is created and then submitted to the InfoButton service’s URL. The URL then returns a report in HTML that is displayed to the EHR user directly in the interface. The PharmGKB InfoButton implementation displays dosing guideline annotations, drug label annotations, and top-level clinical annotations that are relevant to the drug identifiers provided by the user. We monitor the API request logs to assess usage.
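The request flow described above (collect RXCUIs, then submit them to the service URL) can be sketched roughly as follows. This is an illustration only: the base URL and the function name are placeholders, the example identifiers are arbitrary, and the parameter names follow the general HL7 URL-based Infobutton convention as an assumption; they should be checked against the PharmGKB documentation before use.

```python
from urllib.parse import urlencode

# Assumption: placeholder endpoint, NOT the documented PharmGKB service URL.
INFOBUTTON_BASE_URL = "https://example.org/infobutton"
# OID that identifies RxNorm as the code system in HL7 messages.
RXNORM_CODE_SYSTEM = "2.16.840.1.113883.6.88"


def build_infobutton_url(rxcuis, base_url=INFOBUTTON_BASE_URL):
    """Return a GET URL requesting knowledge about the given RXCUIs."""
    if not rxcuis:
        raise ValueError("at least one RXCUI is required")
    params = []
    for index, rxcui in enumerate(rxcuis):
        # Assumption: repeated search criteria are distinguished by a
        # numeric suffix (none for the first, then 1, 2, ...).
        suffix = "" if index == 0 else str(index)
        params.append((f"mainSearchCriteria.v.c{suffix}", str(rxcui)))
        params.append((f"mainSearchCriteria.v.cs{suffix}", RXNORM_CODE_SYSTEM))
    return f"{base_url}?{urlencode(params)}"
```

An EHR-side caller would pass the RXCUIs for the drugs on screen, for example `build_infobutton_url(["11289", "32968"])`, and render the HTML returned from that URL.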

90

SINGLE CELL ANALYSIS – WHAT IS IN THE FUTURE?

POSTER PRESENTATIONS

91

INTRATUMOR HETEROGENEITY (ITH) METRIC OF CIRCULATING TUMOR CELL (CTC)-DERIVED XENOGRAFT MODELS IN SMALL CELL LUNG CANCER.

Yuanxin Xi1, C. Allison Stewart2, Carl M. Gay2, Hai Tran2, Bonnie Glisson2, John V. Heymach2, Paul Robson3, Lauren A. Byers2, Jing Wang1

1Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; 2Department of Thoracic/Head & Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; 3The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA

Xi, Yuanxin

Small cell lung cancer (SCLC) is an aggressive malignancy characterized by rapid onset of platinum-resistance. Once considered a homogeneous disease, recent analyses of SCLC have shown intra-tumoral heterogeneity (ITH) associated with treatment-resistance. To further investigate the contribution of ITH to clinical outcomes in SCLC, we profiled single-cell RNAseq expression of circulating tumor cell (CTC)-derived xenograft (CDX) models from SCLC patients that recapitulate patient tumor genomics and response to platinum chemotherapy. Characterizing the heterogeneity of tumor cell subpopulations remains a bioinformatics challenge in analyzing single-cell RNAseq data for CDX models, mostly due to the lack of an accurate method to quantify the complexity of tumor cell expression patterns at single cell resolution and discover the correlations with different tumor development or treatment response mechanisms. In this study, we developed a variance-based metric to measure the overall heterogeneity of tumor cell populations based on single cell RNAseq expression profiles. We applied this metric to the Chromium 10x single cell RNAseq data of 4 SCLC CDX models that have different platinum treatment responses, identified a global increase of intra-tumor heterogeneity in platinum-resistant models compared with platinum-sensitive models, and defined variable gene expression as a reliable hallmark of increasing therapeutic resistance in SCLC. Further gene set enrichment analysis (GSEA) of the treatment-naïve and relapsed samples revealed that the increased ITH metric was associated with multiple concurrent resistance mechanisms, suggesting that resistance to molecularly targeted therapies does not follow a predictable, reproducible pathway within the same CDX model. These results showed that the variance-based ITH metric successfully characterized the resistance-associated heterogeneity increases in SCLC tumor cells, and more broadly, it provides a general-purpose quantitative measurement of tumor cell subpopulation heterogeneity in single cell analysis.
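The abstract does not give the formula behind the variance-based metric. As a hedged illustration of what such a score could look like, the sketch below averages the per-gene variance of log-transformed expression across cells. The function name, the log1p transform, and the use of an unweighted mean are all assumptions made for the example, not the authors' definition.

```python
import math
from statistics import pvariance


def ith_score(expression):
    """Mean per-gene variance of log1p expression across cells.

    `expression` is a list of cells; each cell is a list of non-negative
    expression values with genes in the same order. Hypothetical metric,
    for illustration only.
    """
    if not expression or not expression[0]:
        raise ValueError("need at least one cell and one gene")
    n_genes = len(expression[0])
    per_gene_variance = []
    for gene in range(n_genes):
        # Log-transform so highly expressed genes do not dominate the score.
        values = [math.log1p(cell[gene]) for cell in expression]
        per_gene_variance.append(pvariance(values))
    return sum(per_gene_variance) / n_genes
```

Under this definition a population of identical cells scores zero, and the score grows as cells diverge in expression, which is the qualitative behaviour the abstract attributes to the platinum-resistant models.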

92

WHEN BIOLOGY GETS PERSONAL: HIDDEN CHALLENGES OF PRIVACY AND ETHICS IN BIOLOGICAL BIG DATA

POSTER PRESENTATIONS

93

QUANTIFYING THE IDENTIFIABILITY OF INDIVIDUALS USING A SPARSE SET OF SNPS

Prashant S. Emani, Gamze Gursoy, Mark B. Gerstein

Department of Molecular Biophysics and Biochemistry, Yale University

Emani, Prashant

The recent revolution in high-throughput genomics has led to the proliferation of publicly available datasets and databases enabling queries on individual genotypes, whether in the form of reference genotypes, single nucleotide polymorphism (SNP) "beacons" or functional genomics data with significant identifying-information leakage. It is therefore of interest to quantify the power of a sparse set of SNPs to reveal the identity of an individual, as this would help determine the privacy risks of making particular datasets accessible to the research community. Such an evaluation would enable a principled cost-benefit analysis to determine the right balance of public and private data accessibility. We present a tool for such quantification based on well-established Hidden Markov Models (HMMs) of chromosomal recombination (Li and Stephens, 2003): the central idea is to explore the state space of reference haplotypes from a database, and find the trajectory through this space that best describes observed genotypes. The tool enables simple SNP-based kinship analysis by the identification of queried individuals as a "mosaic", or piecewise combination, of the input reference haplotypes, while allowing for genotyping error and de novo mutation. The output includes the best-fit reference haplotype trajectories, which for a small set of input SNPs, could result in several equal-probability possibilities. However, even in this case, inferences could be made on the membership of an individual in certain haplotype communities based on their enrichment within the best-fit trajectories. This approach parallels linkage disequilibrium- (LD-) based methods, but avoids any assumptions of population homogeneity as it does not require explicit calculation of allele frequencies or SNP correlations. It is, of course, dependent on the availability of a sufficiently rich database to ensure that the queried individual is at least related. This limitation is fast becoming a non-issue, however, with the constant expansion of population-level genotype databases. The results of representative simulations using the 1000 Genomes reference dataset with randomly chosen common SNPs (allele frequency > 0.05) from a single chromosome are: searching for a genotyped individual among 100 phased genotypes (= 200 reference haplotypes) yielded accurate discovery with as few as 12 SNPs; including a mutation rate of 0.1–0.2 increased the number of SNPs required for reliable identification to ~25; simulations of mosaic samples composed of two reference individuals, each contributing half of the SNPs, suggested that ~30 SNPs could be sufficient to identify the two constituent individuals. These numbers would likely be improved upon when all chromosomes are combined. In summary, we provide a tool that can serve to identify observed genotypes either known to be members of a database, or related to individuals within the database, under varying conditions of mutation and recombination rates with no assumptions about the population-specific allele frequencies of SNPs.
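The "best trajectory through the space of reference haplotypes" idea can be illustrated with a small Viterbi decoder in the spirit of the Li and Stephens copying model. This is a simplified sketch, not the authors' tool: it treats the query as a single haploid sequence of 0/1 alleles rather than an unphased genotype, uses one constant switch probability in place of a recombination map, and the function and parameter names are invented for the example.

```python
import math


def best_mosaic(observed, references, switch_prob=0.01, mismatch_prob=0.01):
    """Return the most likely sequence of reference indices copied at each site."""
    n_refs, n_sites = len(references), len(observed)
    if n_refs == 0 or n_sites == 0:
        raise ValueError("need at least one reference and one site")
    # Log transition probabilities: keep copying the same haplotype, or jump.
    stay = math.log(1 - switch_prob + switch_prob / n_refs)
    jump = math.log(switch_prob / n_refs)
    # Log emission probabilities: mismatches model genotyping error or mutation.
    match, miss = math.log(1 - mismatch_prob), math.log(mismatch_prob)

    def emit(ref, site):
        return match if references[ref][site] == observed[site] else miss

    score = [-math.log(n_refs) + emit(r, 0) for r in range(n_refs)]
    back = []
    for site in range(1, n_sites):
        # All jumps cost the same, so the best jump always starts from the
        # best-scoring previous state; this keeps each step linear in n_refs.
        best_prev = max(range(n_refs), key=lambda r: score[r])
        new_score, pointers = [], []
        for r in range(n_refs):
            from_same = score[r] + stay
            from_best = score[best_prev] + jump
            if from_same >= from_best:
                new_score.append(from_same + emit(r, site))
                pointers.append(r)
            else:
                new_score.append(from_best + emit(r, site))
                pointers.append(best_prev)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_refs), key=lambda r: score[r])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

For a query such as `"00001111"` against references `"00000000"` and `"11111111"`, the decoded path switches from the first reference to the second halfway along, which is the two-individual mosaic case described in the simulations.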

94

TRANSCRIPTOMIC SUMMARY SPLICING DATA MAY LEAK PERSONAL PRIVATE INFORMATION BY COMPUTATIONAL LINKAGE TO THE GENOMIC VARIANTS

Zhiqiang Hu1, Mark B. Gerstein2, Steven E. Brenner1

1University of California, Berkeley, 2Yale University

Brenner, Steven

Sharing genomes without personal identifiers has been common practice in biological and medical research. However, recent studies revealed the risk of re-identifying people from their genomes, or attached quasi-identifiers, such as sex, birthdate, and zip code. Moreover, consumer databases now contain genetic data for millions of individuals; a recent study suggested that most Americans have detectable family relationships in these databases, allowing their identification using demographic identifiers. The additional availability of an individual’s RNA-seq data has implications for privacy, as it may be linked to the genome, potentially allowing the person’s privacy to be breached. For example, sex and ethnicity information may be inferred directly from a genome, and the study may provide a zip code. This genome could be linked to RNA-seq data from a diabetes study with attached birthdates and income. These combined quasi-identifiers may uniquely identify the person, and the study reveals the person’s diabetes disease status.

RNA-seq reads contain genetic variants, and thus can be directly linked to the genome. To avoid this risk, some researchers now release gene expression, isoform expression and exon read count data instead of the raw sequencing reads.

However, gene expression can also be linked to the genome based on expression QTLs (eQTLs). Using a Bayesian framework, we found that it is feasible to predict genomic variants from summarized splicing data. Based on GTEx splicing QTLs (sQTL) data, using relative isoform expression from 15 genes, we could identify the target genome within a pool containing hundreds of individuals with >90% accuracy. We could also link RNA-seq data from a certain tissue or cell type to the genome using parameters trained from a similar tissue, indicating parameters trained on major tissues may enable the linkage of RNA-seq from all types of human samples to the genome. By quantitatively measuring the information leakage from each sQTL, we found that it is possible to identify the target genome of an RNA-seq dataset from millions of individuals using more sQTLs. Researchers have proposed to eliminate the risk of eQTL-based linking attacks by adding noise to the gene expressions, based on the observation that only a few genes enable linkage. However, our framework suggested that there are now many more such genes than previously reported. We find that expression data enables the re-identification of a target genome from a pool containing billions of genomes. Our result implies that mitigation of the linking risk by adding noise would severely abrogate biological entity of the data, since the data will no longer be biologically meaningful when over half of gene expressions are modified. Our study also implies that other kinds of “omic” data, including DNA modification and protein metabolite levels, may also leak genome privacy.
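The abstract does not spell out its Bayesian framework. One simple way such a linkage could be scored is sketched below, under loudly stated assumptions: each sQTL has a Gaussian distribution of relative isoform expression per genotype (0, 1 or 2 alternate alleles), sQTLs are independent, and every candidate genome has the same prior. All names and parameters are hypothetical.

```python
import math


def gaussian_log_pdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def link_to_genome(isoform_ratios, sqtl_models, candidate_genotypes):
    """Return (best candidate id, posterior per candidate) under a uniform prior.

    isoform_ratios:      {sqtl_id: observed relative isoform expression}
    sqtl_models:         {sqtl_id: {genotype: (mean, sd)}}  (assumed pre-trained)
    candidate_genotypes: {candidate_id: {sqtl_id: genotype}}
    """
    log_scores = {}
    for candidate, genotypes in candidate_genotypes.items():
        total = 0.0
        for sqtl, ratio in isoform_ratios.items():
            mean, sd = sqtl_models[sqtl][genotypes[sqtl]]
            total += gaussian_log_pdf(ratio, mean, sd)
        log_scores[candidate] = total
    # Normalise in log space to avoid underflow when many sQTLs are used.
    top = max(log_scores.values())
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    norm = sum(weights.values())
    posterior = {c: w / norm for c, w in weights.items()}
    return max(posterior, key=posterior.get), posterior
```

Each additional informative sQTL adds a term to the log score, which is why the posterior sharpens as more sQTLs are included.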

95

WORKSHOP: MERGING HETEROGENEOUS DATA TO ENABLE KNOWLEDGE DISCOVERY

POSTER PRESENTATION

96

TO SEARCH A HETNET... HOW ARE TWO NODES CONNECTED?

Daniel Himmelstein1, Michael Zietz1, Kyle Kloster2, Michael Nagle3, Blair Sullivan2, Casey S. Greene1

1University of Pennsylvania, 2North Carolina State University, 3Pfizer Inc.

Himmelstein, Daniel

Networks with multiple node and relationship types, called hetnets, provide an ideal data structure to integrate biomedical knowledge. One example, Hetionet, has 47 thousand nodes of 11 types and 2.25 million relationships of 24 types covering diseases, small molecule drugs, and the entities in between, which range from molecular (e.g. genes & pathways) to organismal (e.g. side effects & symptoms). We are building a search engine for hetnet connectivity on the Hetionet network. We want to provide users with an immediate answer to the question, "how are these two nodes connected?" We approach this problem by identifying types of paths where a source and target node are connected more than expected by chance (i.e. based on their degrees alone). While still a work in progress on GitHub (https://github.com/greenelab/hetmech), the project is nearing a prototype web application. Reaching this stage required several methodological advances. First, we implemented efficient path counting algorithms in Python based on matrix multiplication. A new HetMat data structure provides efficient on-disk storage of hetnets, optimized for matrix operations and caching. We designed a novel gamma-hurdle method for assessing the null distribution of a degree-weighted path count (DWPC) for a given pair of source-target node degrees. Using these techniques, we computed measures of connectivity between all node-pairs for the 2,205 types of paths (metapaths) with length ≤ 3 in Hetionet v1.0 (available at https://doi.org/cww7). Now, we aim to expose the hidden information these measures capture: namely, how are two nodes related in terms of metapaths, individual paths, and intermediate nodes. Stop by our poster to learn more and discuss how this search engine can help you peruse biomedical knowledge or interpret your computational predictions.
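Path counting by matrix multiplication, and its degree-weighted variant, can be illustrated with a small pure-Python sketch. It is not the hetmech implementation: the function names are invented, the damping exponent of 0.5 is an assumed default, and the sketch is only valid for metapaths whose node types are all distinct, since plain matrix products count walks and would otherwise include paths that revisit a node.

```python
from functools import reduce


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def degree_weight(adjacency, damping):
    """Scale each edge by (source degree * target degree) ** -damping."""
    row_degree = [sum(row) for row in adjacency]
    col_degree = [sum(col) for col in zip(*adjacency)]
    return [
        [
            value * (row_degree[i] * col_degree[j]) ** -damping if value else 0.0
            for j, value in enumerate(row)
        ]
        for i, row in enumerate(adjacency)
    ]


def path_count(adjacencies):
    """Number of paths per source-target pair along a chain of edge types."""
    return reduce(matmul, adjacencies)


def dwpc(adjacencies, damping=0.5):
    """Degree-weighted path count: paths through high-degree nodes count less."""
    return reduce(matmul, [degree_weight(a, damping) for a in adjacencies])
```

For a two-step metapath such as compound to gene to disease, `adjacencies` would hold the compound-by-gene and gene-by-disease 0/1 matrices, and entry `[i][j]` of the result relates compound `i` to disease `j`.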

97

WORKSHOP: TEXT MINING AND MACHINE LEARNING FOR PRECISION MEDICINE

POSTER PRESENTATION

98

LITVAR: MINING GENOMIC VARIANTS FROM BIOMEDICAL LITERATURE FOR DATABASE CURATION AND PRECISION MEDICINE

Alexis Allot, Yifan Peng, Chih-Hsuan Wei, Kyubum Lee, Lon Phan, Zhiyong Lu

National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894

Lu, Zhiyong

The identification and interpretation of genomic variants play a key role in the diagnosis of genetic diseases and related research in the era of precision medicine. To stay up to date, researchers must process an ever-increasing amount of new publications. This task is complicated by two factors. First, authors use multiple abbreviations to refer to the same variant. For example, "A146T", "c.436G>A", and "Ala146Thr" all refer to the same variant rs121913527. Second, the same abbreviation (e.g., p.Ala94Thr) can refer to different variants in different genes. A simple search on PubMed would thus return only a subset of all relevant articles for the variant of interest, while returning many articles that are irrelevant.
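The first complication, several spellings for one variant, can be made concrete with a small sketch that rewrites three-letter protein changes into the one-letter form. This is only a surface-level normalization written for illustration; it is not LitVar's method, the function name is invented, and it neither maps mentions to rsIDs nor resolves which gene a mention belongs to.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Optional "p." prefix, reference residue, position, alternate residue.
PROTEIN_CHANGE = re.compile(
    r"^(?:p\.)?([A-Z][a-z]{2}|[A-Z])(\d+)([A-Z][a-z]{2}|[A-Z])$"
)


def normalize_protein_change(mention):
    """Return a one-letter form such as 'A146T', or None if not recognised."""
    match = PROTEIN_CHANGE.match(mention.strip())
    if match is None:
        return None
    ref, position, alt = match.groups()
    ref = THREE_TO_ONE.get(ref, ref)
    alt = THREE_TO_ONE.get(alt, alt)
    if len(ref) != 1 or len(alt) != 1:
        return None  # unknown three-letter code
    return f"{ref}{position}{alt}"
```

With this, "Ala146Thr" and "A146T" collapse to the same string, while a DNA-level mention such as "c.436G>A" is left unrecognised, which shows why string rewriting alone cannot replace the entity recognition and disambiguation the abstract describes.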

To help scientists, healthcare professionals, and database curators find the most up-to-date published variant research, we have developed LitVar, a novel web server for linking genomic variant data in the literature with an intuitive UI [1]. Specifically, it employs a suite of state-of-the-art entity recognition tools as its backend processing method. LitVar combines robust and advanced text mining with data integrations from PubMed (>28 million abstracts) and PubMed Central Subset (>2.7 million full-length articles) to improve both sensitivity and specificity. As of May 2018, there are more than 2 million unique variants in our system, associated with hundreds of thousands of publications from PubMed and PMC Open Access Subset. Compared with PubMed, LitVar achieved an increase in sensitivity and specificity. For example, with a search of "rs113488022", no results can be found in PubMed, but over 6,000 articles are returned by LitVar. On the other hand, a search for "H199R" on PubMed will return articles where this variant is present in both the gene LIN28B (PMID: 22964795) and CFTR (PMID: 15084222), while the disambiguation process of LitVar will allow the user to select precisely the variant (and gene) of interest.

To further assist users, LitVar allows matching publications to be filtered by journal, type, date or part of publication. Moreover, publications' popularity in time can be visualised as a zoomable histogram. In addition to the website, LitVar provides REST APIs to allow users to disambiguate a textual query into a list of top matching variants, or perform large-scale analysis, by retrieving publications linked to hundreds of supplied dbSNP identifiers in one query.

LitVar is now integrated in dbSNP. The newly added link allows users not only to view more publications than with the link to PubMed, but also to assess the context (sentence and related diseases, chemicals and other variants) in which the variant appears in each publication.

LitVar is publicly available at https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/LitVar.

[1] Allot, A., Peng, Y., Wei, C.H., Lee, K., Phan, L. and Lu, Z. (2018) LitVar: a semantic search engine for linking genomic variant data in PubMed and PMC. Nucleic Acids Res.

99

AUTHOR INDEX

A

Abyzov, Alexej · 65
Adusumilli, Ravali · 11, 83
Alkan, Can · 67
Allot, Alexis · 98
Altman, Russ B. · 89
Anand, Shankara · 31
Andrechek, Eran R. · 60
Antaki, Danny · 87
Ausavarungnirun, Rachata · 67
Azizi, Shekoofeh · 34

B

Bae, Ho · 32
Baird, Lisa · 81
Baldwin, Edwin · 53
Beam, Andrew L. · 33
Beaulieu-Jones, Brett K. · 33
Bedi, Rishi · 45
Benchek, Penny · 73
Berger, Bonnie · 29
Berghout, Joanne · 20
Best, Aaron · 7
Bielinski, Suzette J. · 46
Bingöl, Zülal · 67
Black III, John Logan · 46
Bobak, Carly · 47
Bobe, Jason R. · 28
Boerwinkle, Eric · 46
Boland, Mary Regina · 3, 77
Bone, William · 21, 88
Boussard, Soline M. · 48
Boyce, Hunter · 11, 83
Bradford, Yuki · 39
BrainSeq Consortium · 49
Brenner, Steven E. · 94
Brugler, Mercer R. · 74
Burke, Emily E. · 49
Bush, William S. · 73, 82
Byers, Lauren A. · 91

C

Capra, John A. · 82
Carter, Hannah · 10, 36, 70
Castorino, John · 62
Chen, Bin · 17, 60
Chen, Rachel · 7, 27
Chen, Yang · 23
Chen, Yong · 3, 77
Cheng, Li-Fang · 14
Choo, Dongwon · 63, 79
Chow, Cheryl-Emiliane · 64
Chrisman, Brianna Sierra · 19
Christensen, Brock C. · 47, 80
Chung, Wendy K. · 21, 88
Clemmensen, Line · 86
Cohen, William W. · 37
Collado-Torres, Leonardo · 49
Conway, Kathleen · 69
Cooper, Bruce · 54
Cooper, David N. · 87
Coukos, George · 10
Crosslin, David · 21, 88
Cule, Madeleine · 15

D

Dabbagh, Karim · 64
De Freitas, Jessica K. · 28
De, Supriyo · 50
Deep-Soboslay, Amy · 49
DeJongh, Matthew · 7
Denny, Joshua C. · 21, 88
DePristo, Mark · 15
DeSantis, Todd · 64
Ding, Daisy Yi · 2
Dinu, Valentin · 59
Doerr, Megan · 43
Donnelly, Louise · 86
Dow, Michelle · 36
Draghici, Sorin · 51
Duan, Rui · 3, 77
Dudley, Joel T. · 28

100

E

Edmiston, Sharon N. · 69
Emani, Prashant S. · 93
Engelhardt, Barbara E. · 14
Ersboell, Bjarne · 86

F

Fan, Jungwei · 20
Fasel, David · 21, 88
Feinberg, Nana Nikolaishvili · 69
Fondran, Jeremy · 73
Fong, Lon W. · 75
Fornes, Oriol · 71
Fraenkel, Ernest · 41
Francavilla, C. · 72
Friedl, Verena · 5, 78
Friend, Derek · 27
Furukawa, Tetsu · 52

G

Garijo, Daniel · 11, 83
Gasdaska, Angela · 27
Gay, Carl M. · 91
Genolet, Raphael · 10
Gerstein, Mark B. · 93, 94
Gfeller, David · 10
Ghose, Saugata · 67
Ghosh, Debashis · 66
Gibbs, Richard A. · 46
Gil, Yolanda · 11, 83
Gilbert-Diamond, Diane · 80
Glanville, Jacob · 45
Glicksberg, Benjamin S. · 28, 60
Glisson, Bonnie · 91
Gold, Maxwell P. · 41
Gonzalez-Hernandez, Graciela · 9
Gordon, Max · 4
Gorospe, Myriam · 50
Graham, Kareem · 64
Graim, Kiley · 5, 78
Grayson, Shira · 43
Greene, Casey S. · 24, 96
Greenside, Peyton · 15
Gupta, Ramneek · 86
Gursoy, Gamze · 93

H

Haas, David W. · 39
Haines, Jonathan · 73
Hakonarson, Hakon · 21, 88
Han, Jiali · 53
Han, Wontack · 16
Harari, Alexandre · 10
Harris, Kimberley J. · 46
Hebbring, Scott · 21, 88
Henry, Christopher · 7
Hernandez-Boussard, Tina · 48
Heymach, John V. · 91
Hill, Jane E. · 47
Himmelstein, Daniel · 96
Ho, Irvin · 18
Hoffmann, Thomas J. · 61
Houlahan, Kathleen E. · 5, 78
Hovde, Rachel · 45
Hsieh, Elena · 66
Hu, Qiwen · 24
Hu, Zhiqiang · 94
Hu, Zhiyue Tom · 17
Huang, Beibei · 75
Huang, Haiyan · 17
Huang, Kun · 25
Hyde, Thomas M. · 49

I

Iakoucheva, Lilia M. · 87
Iribarren, Carlos · 61
Iwai, Shoko · 64

J

Jaffe, Andrew E. · 49
Jain, Shantanu · 35, 85
Jarvik, Gail P. · 21, 88
Jiang, Yuexu · 6
Jin, Qiao · 37
Johnson, Kipp W. · 28
Johnson, Travis · 25
Jorde, Lynn B. · 81
Jung, Jae-Yoon · 19
Jung, Kenneth · 2

101

K

Kale, Dave C. · 2
Kalesinskas, Laurynas · 31
Kalsi, Gurpreet S. · 67
Kang, Byungkon · 57
Kang, Seokwoo · 79
Kaserer, Bettina · 55
Khan, Aly A. · 18
Kiefel, Helena · 64
Kim, Dokyoon · 57
Kim, Jeremie S. · 67
Kim, Seonghyeon · 63, 79
Kim, Woo Joo · 58
Klein, Teri E. · 48, 89
Kleinman, Joel E. · 49
Kloster, Kyle · 96
Kober, Kord M. · 54
Kohane, Isaac S. · 33
Krauss, Ronald M. · 61
Krunic, Milica · 55
Kullo, Iftikhar · 21, 88
Kwak, Minjung · 79
Kwon, Sunyoung · 32

L

Larson, Eric B. · 21, 88
Lau, Denise · 18
Le, Trang T. · 56
Lee, Byunghan · 32
Lee, Dohyeon · 63, 79
Lee, Garam · 57
Lee, Jae Kyung · 58
Lee, Jinhee · 63, 79
Lee, Kyubum · 98
LeNail, Alexander · 41
Leppert, Mark · 81
Levine, Jon D. · 54
Li, Binglan · 39
Li, Haiquan · 20, 53
Li, Jianrong · 20
Li, Kevin · 7
Li, Qike · 20
Lim, Sooyeon · 58
Linan, Margaret · 59
Lindsey, William · 7, 27
Liu, Zheng · 8
Liu, Ke · 60

L continued

Liu, Xiang · 37
Lu, Zhiyong · 98
Lucas, Anastasia M. · 21, 39, 88
Lugo-Martinez, Jose · 85
Lussier, Yves A. · 20

M

Machiraju, Raghu · 11, 83
Magge, Arjun · 9
Mallick, Parag · 11, 83
Marsit, Carmen J. · 80
Mastick, Judy · 54
Mayani, Rajiv · 11, 83
McKinney, Brett A. · 56
Medina, Marisa W. · 61
Meiler, Jens · 82
Miaskowski, Christine · 54
Miller, Jason E. · 61
Mooney, Sean D. · 87
Moore, Abigail · 62
Moore, Jason H. · 3, 56, 77
Moore, Sarah · 43
Mort, Matthew · 87
Mousavi, Parvin · 34
Müllauer, Leonhard · 55
Muse, Meghan E. · 47, 80
Mutlu, Onur · 67

N

Nagle, Michael · 96
Newbury, Patrick A. · 17, 60
Nguyen, Tin · 51
Nguyen, Tuan-Minh · 51
Nho, Kwangsik · 57
Nielsen, Agnes Martine · 86
Nielsen, Rikke Linnemann · 86
Noh, Ji Yun · 58
Nori, Anant V. · 67

O

O'Malley, A. James · 47
Oh, Dongpin · 63
Ouyang, Zhengqing · 23

102

P

Pagel, Kymberleigh A. · 87
Panda, Amaresh C. · 50
Parker, Joel S. · 69
Paskov, Kelley Marie · 19
Patel, Neel · 82
Paul, Steven · 54
Pearson, Ewan · 86
Pedersen, Brent S. · 81
Pendergrass, Sarah A. · 21, 88
Peng, Yifan · 98
Petersen, Curtis L. · 80
Peterson, Amy · 49
Peterson, Sandra E. · 46
Pfohl, Stephen · 2
Phan, Lon · 98
Poplin, Ryan · 15
Prasad, Niranjani · 14
Pyke, Rachel M. · 10
Pyman, Blake · 34

Q

Quinlan, Aaron R. · 81

R

Radivojac, Predrag · 35, 85, 87
Rajpurohit, Anandita · 49
Ramola, Rashika · 35
Ramsey, Stephen A. · 8
Rasmussen-Torvik, Laura · 21, 88
Ratnakar, Varun · 11, 83
Ravichandar, Jayamary Divya · 64
Reiman, Derek · 18
Renwick, Neil · 34
Richmond, Phillip A. · 71
Risch, Neil · 61
Ritchie, Marylyn D. · 21, 39, 48, 61, 88
Robson, Paul · 91
Rodriguez, Estefania · 74
Roychowdhury, Tanmoy · 65
Rudra, Pratyaydipta · 66
Rutherford, Erica · 64

S

Sahinalp, Cenk · 29
Salit, Marc · 15
Sanders, Charles R. · 82
Sarker, Abeed · 9
Sasani, Thomas A. · 81
Schaid, Daniel · 21, 88
Scherer, Steven · 46
Schwartz, J. M. · 72
Scotch, Matthew · 9
Sebat, Jonathan · 87
Sedghi, Alireza · 34
Semick, Stephen A. · 49
Senol Cali, Damla · 67
Sha, Lingdao · 18
Shafi, Adib · 51
Shah, Nigam H. · 2
Shin, Joo Heon · 49
Sicotte, Hugues · 46
Simmons, Sean · 29
Simpson, Chloe · 2
Sivley, R. Michael · 82
Skola, Dylan · 36
Sleiman, Patrick · 21, 88
Sliwoski, Gregory · 82
Smail, Craig · 31
Smoller, Jordan W. · 21, 88
Sohn, Kyung-Ah · 57
Song, Giltae · 63, 79
Srivastava, Arunima · 11, 83
Stanaway, Ian · 21, 88
Stewart, C. Allison · 91
Stockham, Nate Tyler · 19
Straub, Richard E. · 49
Stuart, Joshua M. · 5, 78
Subramanian, Lavanya · 67
Subramoney, Sreenivas · 67
Sullivan, Blair · 96
Suver, Christine · 43

T

Takenaka, Yoichi · 68
Tan, Timothy · 18
Tanigawa, Yosuke · 31
Tao, Ran · 49
Tao, Yifeng · 37

103

T continued

Theusch, Elizabeth · 61
Thomas, Nancy E. · 69
Tintle, Nathan · 7, 27
Titus, Alexander J. · 47
Toh, Hiroyuki · 52
Tran, Hai · 91
Trosset, Michael W. · 85
Tsai, Yihsuan · 69
Tsirigos, Konstantinos · 86
Tsui, Brian · 36, 70
Tyryshkin, Kathrin · 34

U

Ulrich, William S. · 49
Urbanowicz, Ryan J. · 56

V

Valencia, Cristian · 49
van der Lee, Robin · 71
Varma, Maya · 19
Venhuizen, Peter · 55
Verma, Anurag · 21, 39, 88
Verma, Shefali S. · 21, 39, 88
Veturi, Yogasudha · 21, 39, 88
Vitali, Francesca · 20
von Haeseler, Arndt · 55

W

Wagner, Jennifer · 43
Wall, Dennis Paul · 19
Wang, Duolin · 6
Wang, Haohan · 12, 37
Wang, Jing · 91
Wang, Junwen · 59
Wang, Liewei · 46
Wang, Tongxin · 25
Washington, Peter Yigitcan · 19
Wasserman, Wyeth W. · 71
Watson, J. · 72
Weeder, Benjamin · 8
Wei, Chih-Hsuan · 98
Wei, Qi · 8
Wei, Wei-Qi · 21, 88

W continued

Weinberger, Daniel R. · 49
Weinmaier, Thomas · 64
Weinshilboum, Richard · 46
Weissenbacher, Davy · 9
Weng, Chunhua · 21, 88
Westra, Jason · 27
Whaley, Ryan M. · 89
Wheeler, Nicholas · 73
Whirl-Carrillo, Michelle · 48, 89
White, Martha · 85
White, Ray · 81
Wilbanks, John · 43
Williams, Cranos · 4
Woon, Mark · 89
Wu, Yonggan · 64
Wu, Zhenglin · 12

X

Xi, Yuanxin · 91
Xiao, Madelyne · 74
Xing, Eric P. · 12, 37
Xu, Dong · 6

Y

Yao, Yao · 8
Ye, Wenting · 37
Ye, Yuting · 17
Ye, Yuzhen · 16
Yin, Fei · 53
Yoon, Sungroh · 32
Yu, Thomas · 11, 83

Z

Zawistowski, Matthew · 27
Zeng, William · 60
Zhang, Jie · 25
Zhang, Shuxing · 75
Zhang, Xinyuan · 21, 88
Zhang, Yuping · 23
Zhou, Jin · 53
Zhou, Kaixin · 86
Zietz, Michael · 96
Zook, Justin · 15