Helping Scientists do Science Confessions of an Applied Computer Scientist Professor Carole Goble CBE FREng FBCS The University of Manchester, UK [email protected] and the myGrid Team http://www.mygrid.org.uk ACM womENcourage Europe 01 March 2014, Manchester, UK Examples from

Page 1

Helping Scientists do Science

Confessions of an Applied Computer Scientist

Professor Carole Goble CBE FREng FBCS
The University of Manchester, UK
[email protected]

and the myGrid Team
http://www.mygrid.org.uk

ACM womENcourage Europe 01 March 2014, Manchester, UK

Examples from

Page 2

e-Science, Computational Science, Scientific Computing

• Support global scientific collaboration, enable large scale resource, tools and results sharing, assist scientific processing, avoid unnecessary repeated work.

• Accelerate scientific discovery, improving scientific productivity, stimulate technological innovation.

• Cope with scales and speed of scientific innovation and data.

http://research.microsoft.com/en-us/collaboration/fourthparadigm/

Page 3

Models of Human Physiology

VPH-Share

Next Generation Genome Sequencing based Patient Diagnostics

Eagle Genomics

Astronomy & HelioPhysics analytical pipelines

HELIO, Wf4ever

Document Preservation, Digitisation

SCAPE

Systems Biology of Micro-Organisms data & model management

SysMO

Drug discovery, small molecules, targets, compounds OpenPHACTS

Ecological Niche and Population Modelling

BioVeL

Computational, data intensive problems

managed worlds / in the wild

Metagenomics

Ocean Sampling Day

Page 4

Distributed Computing

Linking up different codes, resources, platforms & e-infrastructure.

Social Computing

Sharing different science stuff. Collaborations between different scientists.

Knowledge Computing

Describing, finding and linking up different data, models, methods, science stuff…

Page 5

Computer Science

Software Engineering

Scientific Informatics

Computational Science

THEORY PRACTICE APPLICATION

fundamental applied

PRODUCT(Open Source)

PRINCIPLE

Science

“USE CASE”

Page 6

Biodiversity marine monitoring and health assessment

ecological niche modelling

Data Intensive Science

Collaborative Science

Pilumnus hirtellus

Enclosed sea problem (Ready et al., 2010)

Sarah Bourlat

Page 7

Page 8

http://www.catalogueoflife.org/

Lots of different resources

Page 9

Lots of different software

Including other researchers' software

Page 10

Zeeya Merali, Nature 467, 775-777 (2010) | doi:10.1038/467775a

Computational science: ...Error…why scientific programming does not compute.

Page 11

Aleksandra Pawlik

Devasena Inupakutika

Ghaithaa Manla

Page 12

Page 13

Data discovery

Data assembly, cleaning, and refinement

Ecological Niche Modeling

Statistical analysis

Analytical cycle

Data collection

Insights

Scholarly Communication & Reporting

Page 14

• Volume
• Variety
– Integrative Multi-*
– Multi-step, repetitive process
• Volatile and Velocity
– evolving, reanalysis
• Variant
– Comparable: sweep across data & parameters
– different experiments.
• Valid
– Reporting & Replication

Data discovery

Data assembly, cleaning, and refinement

Ecological Niche Modeling

Statistical analysis

Analytical cycle

Data collection

Insights

Scholarly Communication & Reporting

Page 15

data, parameters, configurations

E.Science laboris

Scientific Workflow Management Systems
• Coordinate execution of services and codes.
• Dataflow at scale
• Reusable variants
• Comparable repetitions
• Import own data / codes + public libraries/datasets
• Honour hosted codes
• Shield operational complexity
• Auto-document provenance
• Package up dependencies
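A minimal sketch of three of the ideas listed above (coordinated execution of steps, comparable repetitions across a parameter sweep, and auto-documented provenance) might look like the following. It is not how Taverna or any other workflow system is implemented; the `Step`, `Workflow` and `sweep` names, the two toy steps and the `factor` parameter are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Any, Callable


@dataclass
class Step:
    """One named processing step (hypothetical, for illustration only)."""
    name: str
    func: Callable[..., Any]


@dataclass
class Workflow:
    """Runs steps in order and records what each step consumed and produced."""
    steps: list
    provenance: list = field(default_factory=list)

    def run(self, data, **params):
        value = data
        for step in self.steps:
            result = step.func(value, **params)
            # Auto-document provenance: step, input, parameters, output.
            self.provenance.append(
                {"step": step.name, "input": value,
                 "params": dict(params), "output": result}
            )
            value = result
        return value


def clean(records, **_):
    # Toy cleaning step: drop missing values.
    return [r for r in records if r is not None]


def scale(records, factor=1, **_):
    # Toy analysis step controlled by one parameter.
    return [r * factor for r in records]


def sweep(workflow, data, grid):
    """Re-run the same workflow for every parameter combination in the grid."""
    names = sorted(grid)
    results = {}
    for combo in product(*(grid[n] for n in names)):
        results[combo] = workflow.run(data, **dict(zip(names, combo)))
    return results


wf = Workflow([Step("clean", clean), Step("scale", scale)])
results = sweep(wf, [1, None, 3], {"factor": [1, 10]})
print(results)
print(len(wf.provenance), "provenance records")
```

Because every run goes through the same `Workflow.run`, the repetitions stay comparable and the provenance log is a side effect of execution rather than something the scientist has to remember to write down.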

Page 16

data, parameters, configurations

E.Science laboris

Scientific Workflow Management Systems

Page 17

• Visual Programming
• Computational Lambda Calculus
• Process mining
• Adaptive & parallel computing
• Cloud computing
• SOA, Semantic Web Services
• Automated wrapping of codes
• Data integration, knowledge modelling
• Reporting & tracking
• …..

E.Science laboris

Tools

Standards

Services

Page 18

Design tools and practices for mortals

Shielding vs Obfuscation

Auto assembly. Guided assembly

Fragility: changes in infrastructures & resources, automated adaption

Reproducible executions… Packaging, preservation & portability

Workflows as commodities

Security

Provenance

Page 19

Page 20

[Figure: two provenance traces. (i) Trace A: d1, S0, d2, S1, w, S2, y, S4, df. (ii) Trace B: d1', S0, d2, S1, z, w, S'2, y', S4, df'.]

• How, What, Where, When, Why, Who

• Trace lineage, Process history, Accountability

• The link between computation and results

• Transparency

[Woodman et al, 2011]
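One way to make "trace lineage" concrete is to compare the two traces from the slide node by node. The sketch below uses only the node labels that appear in Trace A and Trace B; the edges between nodes are not recoverable from this transcript, so the comparison is deliberately limited to which data items and services differ. It is an illustration, not the method of Woodman et al.

```python
# Node labels as they appear in the two traces on the slide.
trace_a = ["d1", "S0", "d2", "S1", "w", "S2", "y", "S4", "df"]
trace_b = ["d1'", "S0", "d2", "S1", "z", "w", "S'2", "y'", "S4", "df'"]


def diff_traces(a, b):
    """Report which nodes are unique to each trace and which are shared."""
    set_a, set_b = set(a), set(b)
    return {
        "only_in_a": [n for n in a if n not in set_b],
        "only_in_b": [n for n in b if n not in set_a],
        "shared": [n for n in a if n in set_b],
    }


delta = diff_traces(trace_a, trace_b)
for key, nodes in delta.items():
    print(key, nodes)
```

Even this crude diff answers part of the "what changed?" question: the second run used a different input (d1'), an extra item (z) and a changed service (S'2), and produced different outputs.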

Page 21

Social

[Cheney, 2012]

Provenance Week June 9-13, 2014 , Cologne http://provenanceweek.dlr.de

Page 22

Mind the Provenance Gap

Summarisation, Labelling, Distillation

Fine grain
Big
A White box

One System
Special tools
Collection
A Big Graph

What do I cite?
What did I do?
N Black boxes

Many Systems
My Lab Book
Analytics
Smart in situ Presentation

Sarah Cohen-Boulakia
Pinar Alper

Juliana Freire
Susan Davidson

Page 23

Primacy of Method (a la Code)

What code was run? – which executable?

Where can I get hold of the code / script / workflow?

How does it work? What are its assumptions?

How do I version it? What’s its licence?

How fragile is it? How do we repair it?

Who authored it? How do I cite it? How do I get credit for it?

Which options did you set? What was the input data?
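Several of these questions (which code was run, which options were set, what the input data was) can be answered mechanically at execution time. The following is a minimal sketch, assuming the script source, options and input are available in memory; the `run_record` function and its field names are hypothetical and only use the Python standard library.

```python
import hashlib
import json
import platform
import sys


def sha256_hex(payload: bytes) -> str:
    """Content hash, so 'which code?' and 'which data?' have exact answers."""
    return hashlib.sha256(payload).hexdigest()


def run_record(script_source: str, options: dict, input_data: bytes) -> dict:
    """Capture what was run, how it was configured, and on what input."""
    return {
        "code_sha256": sha256_hex(script_source.encode("utf-8")),
        "interpreter": sys.version.split()[0],
        "platform": platform.system(),
        "options": options,
        "input_sha256": sha256_hex(input_data),
    }


record = run_record("print('hello')", {"threshold": 0.5}, b"a,b\n1,2\n")
print(json.dumps(record, indent=2, sort_keys=True))
```

A record like this does not address licensing, authorship or credit, which remain social questions, but it does pin down the executable, the options and the input.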

Page 24

Primacy of Methods

Page 25

Systems Biology

Sharing and interlinking Methods, Models, Data…

Data

Model

Article

External Databases

Metadata

Page 26

experimentalists, modellers, X-informaticians, computational Xs, software engineers, computer scientists, systems administrators, resource providers, tool builders, social scientists, librarians, curators

Social Computation

Storing, Sharing and Reusing data, methods, models, between collaborating and competing scientists

e-Laboratories, collaboratories, VREs, repositories

An ego-system

Page 27

“Startup-Like” Balance Innovation with Usefulness

Page 28

[Josh Sommer]

Knowledge Turns amongst Scientists

Page 29

E.Science Sociam

• HCI, Human Factors
• Security
• Data and Knowledge management
• Distributed Computing
• Digital Preservation
• Social Machines
• Information Systems
• Social Science

Platforms

Standards

Services

Policies/Practices

Page 30

Scientists Share Strategically and Sparingly

Data Hugging

Sharing

Creep

Data Flirting

Data Voyeurism

Collaborating to Compete

Reward, Cost, Risk

Tools

Page 31

Computer Scientist

Software Engineer

Social Engineer

Page 32

credit is like ♥ not £$€¥

• Universal identity
• Inter-platform tracking
• Auto-tracking
• Credit recommendation
• Credit recognition
• Standards
• Tools
• Socio-Technical development
• Credit for Developers!!

Page 33

credit is like ♥ not £$€¥

Liz Lyon

Kaitlin Thaney

Heather Piwowar

Katy Borner

Victoria Stodden

Christine Borgman

Anita De Waard

Rebecca Lawrence

• Universal identity
• Inter-platform tracking
• Auto-tracking
• Credit recommendation
• Credit recognition
• Standards
• Tools
• Socio-Technical development
• Credit for Developers!!

Page 34

Describing X well enough to share it, find it, understand it, reuse it, combine it with Y & Z

X, Y, Z = data, models, methods, workflows, services, codes, *

Page 35

Knowledge Computation
• Accurate, intelligible and comparable descriptions
• Data interoperability
• Machine readable metadata

Semantic technologies, Ontologies, Linked Data, Data schema

Page 36

Semantic Description

Describing and linking data in terms of shared concepts, relationships and identifiers

Ontology

[Figure: ontology diagram. Classes: Person, Organization, Place, State, City, Event. Properties: name, birthdate, bornIn, worksFor, state, phone, livesIn, ceo, location, organizer, nearby, startDate, endDate, title, isPartOf, postalCode. Legend: object property, data property, subClassOf.]

Data

Column 1         Column 2  Column 3   Column 4      Column 5
Bill Gates       Oct 1955  Microsoft  Seattle       WA
Mark Zuckerberg  May 1984  Facebook   White Plains  NY
Larry Page       Mar 1973  Google     East Lansing  MI

[Taheriyan et al., adapted]
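To illustrate what "describing data in terms of shared concepts, relationships and identifiers" could mean for the table on this slide, here is a minimal sketch that turns each row into subject-predicate-object triples. The class and property names come from the ontology on the slide, but the assignment of columns to properties (Column 2 as birthdate, Column 4 as the bornIn city, and so on) and the `person:` / `org:` / `city:` / `state:` identifier scheme are assumptions made for this example, not something the slide states.

```python
rows = [
    ("Bill Gates", "Oct 1955", "Microsoft", "Seattle", "WA"),
    ("Mark Zuckerberg", "May 1984", "Facebook", "White Plains", "NY"),
    ("Larry Page", "Mar 1973", "Google", "East Lansing", "MI"),
]


def slug(text):
    """Make a simple identifier fragment from a display name."""
    return text.lower().replace(" ", "_")


def row_to_triples(row):
    """Map one table row to triples, under the assumed column semantics."""
    person_name, birthdate, org_name, city_name, state_name = row
    person = f"person:{slug(person_name)}"
    org = f"org:{slug(org_name)}"
    city = f"city:{slug(city_name)}"
    state = f"state:{slug(state_name)}"
    return [
        (person, "rdf:type", "Person"),
        (person, "name", person_name),
        (person, "birthdate", birthdate),
        (person, "worksFor", org),    # assumed meaning of Column 3
        (person, "bornIn", city),     # assumed meaning of Column 4
        (org, "rdf:type", "Organization"),
        (org, "name", org_name),
        (city, "rdf:type", "City"),
        (city, "name", city_name),
        (city, "state", state),       # assumed meaning of Column 5
        (state, "rdf:type", "State"),
        (state, "name", state_name),
    ]


triples = [t for row in rows for t in row_to_triples(row)]
for triple in triples[:5]:
    print(triple)
```

Once the anonymous "Column 1" to "Column 5" headers are replaced by ontology terms, the same rows can be linked to any other dataset that uses the same terms and identifiers.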

Page 37

Environment Ontology: a shared, controlled, structured vocabulary for biomes, environmental features, and environmental materials.

Common source of names and synonyms for matching, linking, searching, indexing, structuring data
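As a rough illustration of using a controlled vocabulary as a common source of names and synonyms for matching, the sketch below normalises free-text values against a tiny lookup table. The vocabulary entries, their synonyms and the `EX:` identifiers are invented placeholders, not real Environment Ontology terms or IDs.

```python
# Hypothetical vocabulary: placeholder IDs and terms, not real ontology content.
VOCAB = {
    "EX:0001": {"label": "marine biome", "synonyms": ["ocean biome", "sea biome"]},
    "EX:0002": {"label": "sea water", "synonyms": ["seawater", "ocean water"]},
}


def build_index(vocab):
    """Index every label and synonym (case-insensitively) by term identifier."""
    index = {}
    for term_id, entry in vocab.items():
        for name in [entry["label"], *entry["synonyms"]]:
            index[name.casefold()] = term_id
    return index


def annotate(values, index):
    """Map each free-text value to a term identifier, or None if unmatched."""
    return {v: index.get(v.strip().casefold()) for v in values}


index = build_index(VOCAB)
annotated = annotate(["Seawater", "ocean water ", "lake"], index)
print(annotated)
```

Two differently spelled values end up on the same identifier, which is what makes later linking, searching and indexing possible; unmatched values are left as None for a curator to resolve.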

Page 38

Web Ontology Language OWL

Page 39

E.Science Semantii

• Database theory
• Query Answering
• Description Logics
• Reasoners
• Artificial Intelligence
• Automated annotation
• Data integration & Search
• Crowd sourcing knowledge
• Knowledge elicitation

Tools

Standards

Resources

Page 40

Scalability

Changes in data & metadata

Crowd sourced Annotation

Rich knowledge representation and reasoning

Pay as you Go Integration

Adding Semantics to Data

Capturing metadata

security

Semantic ETL pipelines

Smart Search

Page 41

Curation Knowledge Ramps

Populous

http://www.rightfield.org.uk

Katy Wolstencroft

Page 42

http://www.economist.com/printedition/2013-10-19

Page 43

Lemberger T Mol Syst Biol 2014;10:715

©2014 by European Molecular Biology Organization

Born Reproducible | Exchangeable | Reusable

Rich descriptions

Open & Available

Transparent Method

Re-executable

Page 44

Research Objects
• Bundle and relate multi-hosted digital resources of a scientific experiment or investigation using standard mechanisms
• Exchange, Releasing paradigm for publishing

http://www.researchobject.org/
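The "bundle and relate" idea can be sketched as a small manifest that aggregates resources hosted in different places and records how they relate. This is not the Research Object model or its vocabulary; the `Resource` and `ResearchObject` classes, the `used` relation and every `example.org` URI below are hypothetical, chosen only to show the shape of such a bundle.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Resource:
    uri: str   # where the resource is hosted
    kind: str  # e.g. "data", "workflow", "article"


@dataclass
class ResearchObject:
    identifier: str
    title: str
    resources: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def aggregate(self, uri, kind):
        """Add a (possibly remotely hosted) resource to the bundle."""
        self.resources.append(Resource(uri, kind))

    def relate(self, subject, predicate, target):
        """Record a relation between two already aggregated resources."""
        known = {r.uri for r in self.resources}
        if subject not in known or target not in known:
            raise ValueError("both ends of a relation must be aggregated first")
        self.relations.append(
            {"subject": subject, "predicate": predicate, "object": target}
        )

    def manifest(self):
        """Serialise the bundle so it can be exchanged or released."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


ro = ResearchObject("https://example.org/ro/1", "Niche modelling run")
ro.aggregate("https://example.org/data/occurrences.csv", "data")
ro.aggregate("https://example.org/workflows/enm.t2flow", "workflow")
ro.aggregate("https://example.org/papers/draft.pdf", "article")
ro.relate(
    "https://example.org/workflows/enm.t2flow",
    "used",
    "https://example.org/data/occurrences.csv",
)
print(ro.manifest())
```

The manifest is the unit that gets exchanged or released: the resources stay where they are hosted, and the bundle carries the identifiers and relations needed to find and interpret them together.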

Jun Zhao

Page 45

Research is like software. Release research

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Page 46

"To be a proper professional you need to think about the context and motivation and justifications of what you're doing... once you see how important computing is for life you can't just leave it as a blank box and assume that somebody reasonably competent and relatively benign will do something right with it."

Karen Spärck Jones

IEEE Spectrum, Computer Science, A Woman's Work May 2007

Page 47

• myGrid – http://www.mygrid.org.uk
• Taverna – http://www.taverna.org.uk
• myExperiment – http://www.myexperiment.org
• BioCatalogue – http://www.biocatalogue.org
• Biodiversity Catalogue – http://www.biodiversitycatalogue.org
• Seek – http://www.seek4science.org
• Rightfield – http://www.rightfield.org.uk
• Open PHACTS – http://www.openphacts.org
• Wf4ever – http://www.wf4ever-project.org
• Software Sustainability Institute – http://www.software.ac.uk
• BioVeL – http://www.biovel.eu
• Force11 – http://www.force11.org