Advancing Discovery Science

Predictive, Evidential and Meta Analytical Methods

Michel Dumontier, Associate Professor of Medicine

Stanford Center for Biomedical Informatics Research, Stanford University

AAAI Fall Symposium, Accelerating Science: A Grand Challenge for AI

November 18, 2016



New discoveries are being made by “research parasites” using other people’s data

A common rejection module (CRM) for acute rejection across multiple organs identifies novel therapeutics for organ transplantation.
Khatri et al. JEM. 210 (11): 2205. DOI: 10.1084/jem.20122709

Towards automated knowledge discovery

Dumontier:Maastricht

Development of an intelligent system for scientific inquiry using the totality of web-accessible data and services.

Challenge

Efficient and uniform access to distributed, versioned and self-describing data and services for reproducible analyses

A fundamental inability to easily query and mine structured knowledge

Linked Open Data uses the web as a framework to share and link semantically annotated data

Descriptions of the data (metadata) are often messy, incomplete or missing

Vast numbers of schemas and terminologies make for a confusing set of choices

@micheldumontier::StanfordAI:11-05-2016

metadatacenter.org

NIH COMMONS

Making it Easier, Possibly Even Pleasant, to Author Interoperable Experimental Metadata

smartAPI: semantic and self-describing REST APIs

@micheldumontier::CEDAR:14-04-2016

SWAGGER

Field auto-suggestion w/ conformance

Value auto-suggestion calls the smartAPI field-specific suggestion service

Unify API data with Linked Open Data

FAIR: Findable, Accessible, Interoperable, Re-usable

Applies to all digital resources and their metadata

Challenge

Data will always be described using different schemas and vocabularies.

Does that still matter? Can we automate the integration of data?

Schema and vocabulary heterogeneity is a challenge for data retrieval and data mining

New Methods for Data Integration

• Many elegant solutions for entity or concept mappings, but these only offer an incomplete solution when combined with schemas

• Need to learn robust transformation patterns
– Subsumption, Similarity, Analogy, ML, Probability

• Evaluate these in the context of use cases
– Query answering
– Data mining
– Prediction
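
To make the "Similarity" pattern above concrete, here is a minimal sketch of mapping field names from one schema onto another by string similarity. It is only an illustration under stated assumptions: the function names (`normalize`, `best_match`, `map_schema`), the example field names, and the 0.6 threshold are hypothetical and do not come from the talk, and plain string similarity stands in for the richer learned transformation patterns the slide calls for.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def normalize(name: str) -> str:
    """Lower-case a field name and treat '_' and '-' as spaces."""
    return name.lower().replace("_", " ").replace("-", " ").strip()


def best_match(field: str, candidates: List[str], threshold: float = 0.6) -> Optional[str]:
    """Return the most similar candidate, or None if nothing reaches the threshold."""
    if not candidates:
        return None
    scored = [
        (SequenceMatcher(None, normalize(field), normalize(c)).ratio(), c)
        for c in candidates
    ]
    score, candidate = max(scored)
    return candidate if score >= threshold else None


def map_schema(source_fields: List[str], target_fields: List[str],
               threshold: float = 0.6) -> Dict[str, Optional[str]]:
    """Map every source field to its best target field (hypothetical helper)."""
    return {f: best_match(f, target_fields, threshold) for f in source_fields}


if __name__ == "__main__":
    # Hypothetical field names, chosen only for illustration.
    print(map_schema(["gene_symbol", "qqq"], ["Gene Symbol", "organism"]))
```

Fields that find no sufficiently similar counterpart are mapped to `None`, which is where subsumption, analogy, or learned mappings would have to take over.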


Challenge

Can we scale the validation of interesting findings?

Most published research findings are false - John Ioannidis, Stanford University

Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8): e124.

The Problem of Reproducibility in Scientific Research

• Non-reproducibility rates of 65–89% in pharmacological studies and 64% in psychological studies.

• Problem of multiple testing in high-dimensional experiments. For gene expression analyses, 26 of 36 (72%) genomic associations initially reported as significant were found to be over-estimates of the true effect when tested in other datasets.

• Analytic focus has been on significance (P) values rather than effect size or independent verification.
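
The multiple-testing problem mentioned above can be illustrated with a short sketch. The code compares a naive per-test threshold with two standard corrections, Bonferroni and Benjamini-Hochberg. The p-values are made up for illustration and are not data from the talk or from the cited studies; the talk does not say which correction, if any, it recommends.

```python
from typing import List


def naive(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Call each test significant on its own, ignoring how many tests were run."""
    return [p <= alpha for p in p_values]


def bonferroni(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Control the family-wise error rate by dividing alpha by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Control the false discovery rate with the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Made-up p-values from ten hypothetical tests.
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print("naive:", sum(naive(p)))
    print("Bonferroni:", sum(bonferroni(p)))
    print("Benjamini-Hochberg:", sum(benjamini_hochberg(p)))
```

On these made-up values the naive threshold accepts more findings than either correction, which is the pattern behind initially "significant" associations that shrink when tested in other datasets.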

Support and Gap Analysis using Open Data and Open Services

• HyQue is a platform for knowledge discovery that uses data retrieval coupled with contradiction-based automated reasoning to validate scientific hypotheses

• Leverages semantic technologies to provide access to linked data, ontologies, and semantic web services

• Uses positive and negative findings, captures provenance

• Weighs evidence according to context

• Used to find aging genes in worm, assess cardiotoxicity of tyrosine kinase inhibitors

HyQue: evaluating hypotheses using Semantic Web technologies. J Biomed Semantics. 2011 May 17;2 Suppl 2:S3.

Evaluating scientific hypotheses using the SPARQL Inferencing Notation. Extended Semantic Web Conference (ESWC 2012). Heraklion, Crete. May 27-31, 2012.
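
The idea of weighing positive and negative findings while keeping their provenance can be sketched as follows. This is not HyQue's actual scoring scheme, which the slides do not spell out; the `Evidence` class, the `score_hypothesis` function, the source labels, and the weights are all hypothetical and serve only to show the general shape of such a calculation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Evidence:
    source: str      # provenance: where the finding came from (hypothetical labels)
    supports: bool   # True for a positive finding, False for a contradicting one
    weight: float    # context-dependent importance, assumed to be non-negative


def score_hypothesis(evidence: List[Evidence]) -> float:
    """Return a score in [-1, 1]: +1 is fully supported, -1 is fully contradicted."""
    total = sum(e.weight for e in evidence)
    if total == 0:
        return 0.0  # no usable evidence either way
    signed = sum(e.weight if e.supports else -e.weight for e in evidence)
    return signed / total


if __name__ == "__main__":
    # Hypothetical findings for one hypothesis.
    findings = [
        Evidence("text co-mention", True, 1.0),
        Evidence("gene ontology annotation", True, 2.0),
        Evidence("differential gene expression", False, 1.0),
    ]
    print(score_hypothesis(findings))
    for e in findings:
        print(e.source, "supports" if e.supports else "contradicts")
```

Keeping each finding as a separate record, rather than only the final number, is what lets the result be traced back to its sources.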

• Text co-mentions
• Gene ontology annotations
• Differential gene expression
• …

• Access to open data
• Linked to ontologies
• Represented with a universal language
• Queried using a portable language
• Results stored with their provenance

Scaling Validation

• Automated experimentation (Adam & Eve)

• Crowdsourcing
– As a simple task
– As an open problem

• Automated discovery of viable methods

• Automated implementation of viable methods

Key Research Challenges

• Scalable, shared, fault-tolerant, and readily re-deployable frameworks for archiving and providing versioned and maximally FAIR biomedical (meta)data

• Scalable methods for the prospective and retrospective authoring, assessment, and repair of metadata.

• Scalable methods to learn equivalent representational patterns

• Scalable frameworks for open, transparent, reproducible and recurrent analysis and meta-analysis of FAIR research data.

• Methods to identify investigative biases and knowledge gaps

• Scalable and reliable methods for the prioritization of scientific hypotheses using evidence gathered across scales and sources

• Scalable methods for validation of research findings.