bosc2012 goble
DESCRIPTION
Keynote for BOSC (Bioinformatics Open Source Conference) 2012 at Long Beach, CA, USA, 14 July 2012 by Carole Goble
TRANSCRIPT
If we build it will they come?
Prof Carole Goble FREng FBCS
BOSC, Long Beach, July 14 2012
http://www.mygrid.org.uk
Improving Knowledge Turning, Enabling Reuse and Reproducibility
[Josh Sommer]
Est. 2001
Keep the vision, modify the plan
Computational Methods
Scientific workflows. Distributed web/grid/cloud services. Third party, independent service reuse. Data pipelines and analytics.
Volunteerist Human Computation e-Laboratories - social collaboration and sharing environments for scientific artefacts. Libraries and Catalogues. Asset safe havens, sharing, reuse.
Knowledge Acquisition Tools
Semantic technology, semantic applications, research objects, executable papers. Data/Metadata curation & reuse
OWL
POPULOUS SKOSEdit
LGPL
BSD
Various
The Taverna Suite of Tools
Client User Interfaces
GUI Workbench
Workflow Repository
Service Catalogue Third Party Tools
Web Portals
Activity and Service Plug-in Manager
Provenance Store
Workflow Server
Open Provenance Model
Secure Service Access
Workflow Engine
Virtual Machine
Programming and APIs
Command Line
5820 members, 304 groups, 2415 workflows, 604 files and 229 packs (research objects)
Community Haven
Sharing Resource
Social Collaboration
http://www.myexperiment.org
http://wiki.myexperiment.org/index.php/Galaxy
Contribute, Find and understand Web Services
Curate, review and comment
Learning resource
Monitor Services Cloud Registry
BioCatalogue: crowd curation of web services
2295 REST and SOAP services, 169 service providers. 674 members, 27 countries
Find, exchange and interlink, preserve, publish data, models, publications, SOPs & analyses.
ISA Compliant
Launch and validate models and analyses: JWS Online
Find experts, colleagues and peers.
Gateway to public tools and resources, e.g. BioModels
SysMO: 16 consortia, 110 institutes, 1600+ assets, 350+ members
livSYSiPS
GerontoSys
Public SEEK
http://www.seek4science.org
Sharing Platform & Trusted Service
Standards & Content
Governance & Policy
Preservation & Publication Platforms
Gateway
Software & Tools
Open source
Knowledge Network Skills & Community Building
Comp Sci Research Platform
Laissez-faire Philosophy
• Bottom Up
– Emergent & scruffy (to a degree…)
• Reliant on third party contributions
– Non-prescriptive, non-interfering and flexible
– We make no content ourselves….
• Part of a wider ecosystem
– Other services, data, tools, platforms, people…
• Inspired by social environments
• Scarred by top-down, dictated, tech-driven and unused monoliths
Never underestimate how scruffy third party stuff can be
How often metadata is missing and messy if left to its own devices…
Liberty through Limitations
People say they want flexibility. They prefer the simplicity of order and will adapt to adopt.
http://www.flickr.com/photos/hellaoakland/3137360455/
Who is they?
• Jobbing Bioinformatician?
• Expert Bioinformatician?
• Sys admin?
• Service provider?
• Application developer?
• Tool developer?
• Biologist?
Pharmacogenomics
GWAS
Trypanosomiasis in African Cattle
Systems Biology of Micro-Organisms
Drug Toxicity
(OpenTox Project)
Metagenomics
Physiopathology of the human body Medical Imaging
Genetic differences between breeds of cattle
The Virtual Liver
Who is THEY?
Distributed Groups & Independent Lone rangers
Long tail, Disconnected from data providers and each other, emergent,
Organised, Planned, Strong connections with resource providers and each other.
Independents….Bovine
Trypanosomiasis Consortium
Consortia
Individuals
Research Groups
Specialise or Diversify?
• Flexibility and extensibility -> customised Software and Services, Cookie cutter
• Widen adoption
• Spread risk, extend resourcing streams
• Cross development alignment and coordination
• More communities to build, nurture, support and sustain
• Core Drift and Bashing
Helio-Physics
Document Preservation
BioDiversity
Astronomy
Social Science
Engineering: JPL, NASA
FLOSS
BioDiversity Virtual e-Laboratory
Biodiversity Services
[Figure: BioVeL architecture diagram. Labels: WebDaV Data Management; BLAST, Hmmer, MrBayes, PAML, EMBOSS, …; R; Synonyms; Execution environment; Catalogues / Repositories; BioSTIF; Google Refine; CSW; openModeller; WPS / WCPS; Authentication / Authorisation; OpenSearch; Provenance; Taverna Workbench; Taverna Workflow Engine and Server; Grid, Cloud, etc.; Phylogenetic; Taxonomic; Visualisation; Modelling/GeoProcessing; Platforms]
http://www.biovel.eu
Who is We? The ego-system
biologists, bioinformaticians, biodiversity informaticians, astro-informaticians, social scientists, modellers, software engineers, computer scientists, systems administrators, resource providers
Methods & Practice
CS Research
Production
My World
Science
Research Objects Reproducibility, Integrated Publishing,
Carriers of Research Context
• Citation
• Aggregation
• Annotation
• Provenance
• Lifecycle
• Preservation
• Decay
• Sharing
• Stereotypical Profiles
• Services and APIs
• myExperiment 2.0
Encodings: Semantic Web: LOD, VoID, OAI-ORE, AO/OAC, SIOC, OPM/PROV, Memento….
http://www.wf4ever-project.org
Production
Research
Applications
Training
Publishing
Community
So if we build it will they come?
Be useful for something: immediately, continuously, responsively
Be usable by somebody: user experience, worth the effort, adoption path
Some of the time: as part of a big picture
Under promise and over deliver
Acquire Critical Mass
Four things that drive adoption of software or service
1. Added value
– Do something you couldn’t do before, or now do it faster, gain competitive advantage, improve productivity, scale up
2. New asset
– Get or retain access to something important (data, method, technique, skills, knowledge)
3. Keep up with the field. A Community.
– Future-proof my practice, new skills and capacity, there is a vibe about it and I’ll be left out
4. Because there is no choice
– Business depends on it, it’s mandated, it’s de facto mandated
Seven things that hinder adoption of software or service
1. Not enough added value
• It doesn’t solve a problem, or not as well or as cheaply as something else; no content or not the right content
2. Not fit for take-on. It doesn’t work!
• No: help, guides, documentation, manuals, examples, content, templates, portability, migration / legacy support, easy installation, virtual machines, testing, stability, version control, release cycle, roadmap, sustainability prospect, way of introducing my favourite component/data/environment.
3. No Time or Capacity to take on
• To learn, migrate personal legacy code/data/applications, no pathway/ramp to adoption
• Training and special system needs
It Sucks
Software practices
“As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software”
Zeeya Merali, “Computational science: ...Error… why scientific programming does not compute”, Nature 467, 775-777 (2010), doi:10.1038/467775a
Software Stewardship
Software sustainability
Software practices
Software deposition
Long term access to software
Credit for software
Licensing advice
Open licenses
Reproducible Research Standard, Victoria Stodden, Intl J Comm Law & Policy, 13 2009
“Better Science through Superior Software” – C Titus Brown
Seven things that hinder adoption of software or service
4. Cost
– Of disruption, of long-term ownership
5. Exposure to Risk
– First to take-up, support and sustainability dependencies, fear of scrutiny, misrepresentation or being scooped
6. No Community
– Support and comfort
7. Changes to work practices
– Obligations, unclear or unenforced reciprocity protocols.
It’s too costly
• It sucks but it’s the only thing around
• It’s ace but it’s one of many, too late in the game and not enough to switch
• Tipping point is likely not technical
Betamax vs VHS
Bonus Hinder: Never heard of it.
We’ve built it but we haven’t told anyone.
• Make noise…physically and virtually
• Customer and Contributor Relationship Building
• Self-supporting communities, multi-level marketing
• Highly Resource Intensive
Adoption Intentions
Be careful what you wish for
• Incidental
– “I built it for myself, and stuck it out there”
• Familial
– “I built it for people just like me”
• Fundamental
– “I built it for others, many who are not like me”
Open Innovation: Development and Content
You are not alone. You can’t do it all alone.
Motivate & enable others to fill gaps “App Store Style”: software, services, content, examples….
• Really Interoperate. Don’t tweak.
• Be Simple and Standard.
• Be Helpful. Be Set up. Be reusable. Be Smart.
Galaxy+Taverna/myExperiment
• Others will develop on top of you. But don’t assume they will re-contribute or tell you.
• It’s much harder than you think.
• It’s unequal.
Family, Friends, Acquaintances, Strangers
Moore's technology adoption curve
Ladder Model of OSS Adoption (adapted from Carbone P., Value Derived from Open Source is a Function of Maturity Levels)
[FLOSS@Syracuse]
"it's better, initially, to make a small number of users really love you than a large number kind of like you" Paul Buchheit
paulbuchheit.blogspot.com
PALS: Building Friendships
Intelligence, Guidance, Advocacy, Evangelism, Market Research
• What’s in it for the PAL?
– Long tail: Money, kudos, special support, special resources, skills, reputation building, influence, stuff they can’t do alone, CV building
– Consortia: co-funded
• Who is a PAL?
– Post-docs, Post-grads, Administrators, Developers
– PI: protector/champion
• PAL handlers
– Customer Relationship Manager, Nanny and Mediator, Scientist
Do not under-estimate…
The power of the sprint / *-athon / fest / drinking
The power of a whizzy interface. Even for plumbing.
The importance of supporting and propagating best practice
Participatory, Embedded
Design-Build-Run-Manage is Good
Act Local
Think Global
Reality Check
The Bigger Picture
Eat your own Dog Food
Participatory Design
Work Together on a Real Problem
Project PIs: Data control. Own databases. Just enough exchange. Visibility limitations. Project dependence.
PALs: Spreadsheets. Yellow Pages. SOPs. Understanding standards. Curating. Examples. Safe Haven. Project independence.
3 years later, 15/16 consortia abandoned their own systems and went with the SEEK system.
Funders
Data sharing
Data standards
A database
Long term preservation
If you build it will they come and contribute?
Controlled
Closed
Access
Participation
Lone scholars
Private Groups
Trusted Collaborators
Public scientists
Citizens
[based on an idea by Liz Lyon]
Open
Cooperation? Coordination? Collaboration? Integration? Evolution and entropy models
[Andrew Su]
Critical mass spiral: 90:9:1
Driven by needs of and benefits to the scientist, rather than top down policies.
Content tipping point
Trust, Fame and Blame: Reciprocity, Competition, Contribution and Use
Victoria Stodden, The Scientific Method in Practice: Reproducibility in the Computational Sciences Feb 9, 2010 MIT Sloan Research Paper No. 4773-10, http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1550193
Nature 461, 145 (10 September 2009)
• Scooping, Scrutiny and Misinterpretation
• Curation Cost
• Poor quality
• Reputation / Asset Economics
• Public Peer Pressure
Reciprocity Sucks
• Flirting
• Hugging
• Controlled Sharing
• Voyeurism
• Poor feedback / credit
Carrots
Harness Competitiveness
Pride
• Reputation: Cult, Credit & Attribution for all
Protection
• Just enough Sharing, Licensing & Liability
• Quality, Peer review, Metadata
Preservation
• Safe havens and Sunsets (project churn)
Publishing / Release
• Citability, Supporting Exchange
Productivity
• Availability of assets, help, capability, ramps
http://www.rightfield.org.uk
Adoption Ramps
Instrument familiar, widely-used tools
Spreadsheets and Email
Adoption Stealth
• Data at home promise with automated harvesting
• Sharing creep, Incremental metadata, Low obligations
• URL upload in BioCatalogue
• Web Service “come as you are” take-on in Taverna
• Metadata prompting, Right tools, right time, right place
• Service collections & Packaged services
Be vigilant
• PAL burn-out and over familiarity
• Unadjusted over-user accommodation
• Drifting apart and not keeping it fresh
• Step back, observe and adapt/intervene!
• So relieved to get a community….
• Instrument adoption and observation
Participatory Development is a mutual long term relationship
Not flirty speed dating, One night stand, Crush, Me Me Me
Urgent-Important
• Technical bog down, operational burn-out
• Little things that are important but don’t seem that urgent…
• Dominant projects
• Not-software content
• It all takes way longer than you think
• Simplicity drift
Beware Version 2 Syndrome!
The Jam-based Adoption Model
aka Added Value
Value Proposition Return On Investment
http://delicious-cooks.com/photos/raspberry-jam/04/
What is the Special Jam? What is your Jam Value Chain and for Who?
What:
SysMO: safe haven, spreadsheet tooling, linking SOPs, models and data, examples
Taverna: power, adaptability and myExperiment
Who:
Focused on contributors and experts
Provider-consumer balance
Functionality-Simplicity Syndrome
Changing Who - Challenging baked-ins
Jam today and more, better Jam tomorrow
Just Enough Jam, Just in Time not Just in Case
* Feature Creep Conundrum * Big Picture Paradox
* Core vs Specifics Syndrome * Content Decay Dilemma
* Working to working Stability Stress
Customised Specific Jam beats Generic
* Flexibility/Functionality – Simplicity Conundrum
* Diversification Dilemma
Where is my Jam? Jam for All
• What are WE (platform providers, Software builders, Community builders and Service providers) getting out of it?
• Need credit and interest too.
• Altmetrics
http://james.howison.name/pubs/HowisonHerbsleb2011SciSoftIncentives.pdf
Howison and Herbsleb, Scientific Software Production: Incentives and Collaboration, CSCW 2011, March 19–23, 2011, Hangzhou, China
http://www.gettyimages.co.uk/detail/photo/empty-jam-jar-royalty-free-image/136976198
Jam forever
They came. Have the evidence. Have a plan. Did you wish for this? Do you want it?
Fragile Flux
• Content, services, bits, communities
Funding Plan
• Novelty over sustainability
• Research-Production Falsehoods
• Wave invention, Political lobbying
Securing the community
• Leadership & Foundations
Business model???
Software is Free like Puppies Are Free
Jam not forever
• Acquire
• Retain
• Widen – More/Different
• Reposition – Different/New Stage
• Changing Community is Challenging… [Daron Green]
The Social and the Technical are Inseparable
Adoption is a Merry-Go-Round
You know they came when…
… you were useful and usable to someone some of the time, but they might not tell you
… people ask you to join their consortia or use it
… they gave up their own home grown stuff for yours
… someone you don’t know uses it and tells you all about your own stuff.
… someone publishes papers about it. Without citing you.
… someone else claims credit.
… people you don’t know start bitching about it.
… it’s just expected to be there and you are kind of expected to be there too.
… your Head of School complains you don’t do enough CS research because you are doing too much Software Engineering and Support.
James Howison Heather Piwowar
Christine Borgman Nosh Contractor
Victoria Stodden Janet Vertesi
Jay Liebowitz Robert Kraut
Acknowledgements (1)
Acknowledgements (2)
• The myGrid family, friends and contributors
• But especially: Katy Wolstencroft, David Withers, Marco Roos, Alan Williams, Jits Bhagat, Stuart Owen, Stian Soiland-Reyes, Shoab Sufi, Robert Stevens, Paul Fisher, Peter Li, Ian Dunlop, Finn Bacall, Mannie Tags, Niall Beard, Rob Haines, Christian Brenninkmeijer, Alasdair Gray, Tim Clark, Pinar Alper, Paolo Missier, Khalid Belhajjame, Duncan Hull, Sean Bechhofer, David De Roure, Don Cruickshank, Wolfgang Mueller, Olga Krebs, Franco Du Preez, Quyen Nguyen, Jacky Snoep.
• The members of Wf4ever, SysMO, BioVeL, HELIO, SCAPE, OMII, SSI, NeiSS, Obesity e-Lab and anyone else I forgot
Further Information
• myGrid
– http://www.mygrid.org.uk
• Taverna
– http://www.taverna.org.uk
• myExperiment
– http://www.myexperiment.org
• BioCatalogue
– http://www.biocatalogue.org
• SysMO-SEEK
– http://www.sysmo-db.org
• MethodBox
– http://www.methodbox.org.uk
• Rightfield
– http://www.rightfield.org.uk
• Wf4ever
– http://www.wf4ever-project.org
• BioVeL
– http://www.biovel.eu
• Software Sustainability Institute
– http://www.software.ac.uk
• Software Carpentry
– http://software-carpentry.org/
Keep your Friends Close
Embed
Favours will Favour you
Know your Users
Anticipate Change
Skeptic
Champions
Coalface users
Patrons
End Users
Developers
Service Providers
System Administrators
Keep Sight of the Bigger Picture
Friends and Family
Fit in
Jam Today
Jam Tomorrow
Just Enough
Just in Time
Act Local
Think Global
Enable Users to Add Value
Design for Network Effects
SUMMARY (De Roure and Goble, IEEE Software 2009)