Prof. Barend Mons, Biosemantics Group at Leiden University Medical Center and head of node of...
Posted on 25 May 2015
by Barend Mons
Brought to you in V parts
A Plea for Professional Data Publishing
Bringing Data to Broadway
The Cast
FAIR Play for Research Data and Other Research Objects
Findable, Accessible, Interoperable, Reusable
Part I
Moaning and Lamenting
Singers and Dancers
The Curse of Multidisciplinarity
I cannot keep my data experts!!!
2005: Text mining? Why bury it first and then mine it again!
Part II
The Explicitome and the Elusive Part (our own fault)
The Explicitome: everything we already asserted
The Elusive Explicitome Phenomenon (example from Yepes & Verspoor, 2013)
[Figure: number of assertions (SNP-phenotype associations) found in the abstract, the narrative, tables/figures and supplementary data; values shown: 2%, 4%, 50%*; 5, 500*, 1000, 50K-1M+]
The Elusive Explicitome: what escapes us (95%)
Hurdle 1: Paywalls
Hurdle 2: ‘TIF’walls
Hurdle 3: The Wall of Broken Links
Data loss is real and significant, while data growth is staggering
Nature News, 19 December 2013:
• Computer speed and storage capacity are doubling every 18 months, and this rate is steady
• DNA sequence data has been doubling every 6-8 months over the last 3 years, and this looks set to continue for this decade
‘Oops, that link was the laptop of my PhD student’
The trends in e-Science:
• Computer analytics (takes charge)
• Enormity of datasets (beyond narrative)
• Collaborative intelligence (calls for a million minds)
• Irreversible movement (towards OA)
FAIR Data Publishing & Stewardship?
Professionalise Data Stewardship
Educate, Reward and Keep Data Experts
FAIR
Part 3 Unavoidable: some science of ‘our own’, but… as examples, sorry
Part III
INTERMEZZO: Some Research…
…Sorry for the LS (life sciences) examples…
Simplified eScience: ROs (the Explicitome) + workflows + user → new dataset, new insights
Ridiculogram
Thanks to Peter Wittenburg
AERIAL SURVEY: pattern recognition in ridiculograms
HUMAN EXCAVATION: rationalisation and ‘confirmational reading’
‘Why would I believe this association???’
FAIR for computers, FAIR for people
For knowledge discovery (KD) we need each association only once
Cardinal assertion (<10¹¹ in total):
n identical assertions with ‘n’ different provenances
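The cardinal-assertion idea can be sketched in a few lines, assuming each assertion is a (subject, predicate, object) triple plus a provenance label. The names `Assertion` and `zip_assertions`, the field names and the example triples are hypothetical illustrations, not part of the BioSemantics pipeline.

```python
from collections import defaultdict
from typing import NamedTuple


class Assertion(NamedTuple):
    subject: str
    predicate: str
    obj: str
    provenance: str  # hypothetical field: where this statement was asserted


def zip_assertions(assertions):
    """Collapse identical (subject, predicate, object) assertions into one
    cardinal assertion that keeps the set of all its provenances."""
    cardinal = defaultdict(set)
    for a in assertions:
        cardinal[(a.subject, a.predicate, a.obj)].add(a.provenance)
    return dict(cardinal)


if __name__ == "__main__":
    raw = [
        Assertion("geneA", "associated_with", "diseaseX", "paper-1"),
        Assertion("geneA", "associated_with", "diseaseX", "paper-2"),
        Assertion("geneA", "associated_with", "diseaseX", "database-3"),
        Assertion("geneB", "interacts_with", "geneC", "paper-1"),
    ]
    for triple, sources in zip_assertions(raw).items():
        # n = number of distinct provenances backing this cardinal assertion
        print(triple, "n =", len(sources), sorted(sources))
```

Because provenances are stored in a set, n here counts distinct sources; a source that repeats the same statement is counted once.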
Everything we publish concerns fewer than a million LS concepts!
10⁶ concept clusters (Knowlets)
BioSemantics Knowledge Discovery Pipeline (LUMC - LIACS, www.biosemantics.org):
data sources → ‘coordinated’ data → nanopub cache → cardinal assertion store → semantic data modelling, indexing → reasoning algorithms → trends, phase transitions, ‘new’ data, differentials, alerts, funding priorities
Semantic query: • gene • disease
© Phortos Consultants
44,000 hypotheses (protein-protein interactions, PPI)
What about the other 43,999?
Part IV
Towards Solutions: Bigger Is Not Better
Zipping the Explicitome
Electronic health databases
Value-added databases
The Rescued Explicitome: narrative, tables/figures, supplementary data, abstract
Total Explicitome: an estimated 10¹⁴ asserted associations in 2,500 data sources
PROVENANCE; ETL to FAIR; FAIR to read
[Figure: 10¹⁴ assertions zipped to 10¹¹ cardinal assertions over about 10⁶ concepts; shares shown: 80%, 20%]
Semantic MedLine + U + C + CT + EG + GO = 36 M (cardinal)
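A rough worked figure, using only the orders of magnitude quoted on these slides (about 10¹⁴ asserted associations and fewer than 10¹¹ cardinal assertions), shows what zipping buys: on average each cardinal assertion stands for at least roughly a thousand individual assertions. The symbols below are introduced here for illustration only.

```latex
\bar{n} \;=\; \frac{N_{\text{assertions}}}{N_{\text{cardinal}}}
\;\gtrsim\; \frac{10^{14}}{10^{11}} \;=\; 10^{3}
```

Here \bar{n} is the average of the n in ‘n identical assertions with n different provenances’; since the cardinal count is an upper bound, the ratio is a lower estimate.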
Part V
(FAIR) data should take CENTER STAGE
PID: DOI, ARK, Handles, UUID, URIs?
A simplified diagram of a Digital (data) Object, irrespective of technological choices and naming:
• PID
• Metadata (intrinsic)
• 'provenance' (user defined)
• Data (elements)
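The four-part structure of a Digital Object can be written down as a small record type. This is a sketch only: the class name `DigitalObject`, its field names and the example identifier are hypothetical, since the slide deliberately avoids any particular technology or naming scheme.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DigitalObject:
    """Hypothetical model of the simplified Digital (data) Object."""
    pid: str  # persistent identifier (e.g. a DOI, Handle or ARK)
    # Intrinsic metadata: properties of the object itself (format, size, ...)
    intrinsic_metadata: Dict[str, Any] = field(default_factory=dict)
    # User-defined provenance: who made it, from what, how
    provenance: Dict[str, Any] = field(default_factory=dict)
    # The data elements themselves
    data_elements: List[Any] = field(default_factory=list)


if __name__ == "__main__":
    obj = DigitalObject(
        pid="hdl:00000/example-1",  # made-up identifier for illustration
        intrinsic_metadata={"format": "text/csv", "rows": 2},
        provenance={"creator": "example lab", "derived_from": []},
        data_elements=[("geneA", "diseaseX"), ("geneB", "diseaseY")],
    )
    print(obj.pid, obj.intrinsic_metadata["format"], len(obj.data_elements))
```

Using `default_factory` gives every object its own empty metadata, provenance and data containers rather than sharing one mutable default.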
Nanopublications are Research Objects; some Research Objects are Digital Objects (Digital Object Architecture)
Data as increasingly FAIR Digital Objects (each stage shown on the same PID / metadata / provenance / data diagram):
• Totally UNFAIR
• Findable, usable for humans
• FAIR metadata
• FAIR data, restricted access
• FAIR data, Open Access
• FAIR data, Open Access / functionally linked
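The progression above reads like an ordered ladder, so it can be sketched as an ordered enumeration plus a classifier. The yes/no criteria in `ObjectProfile` are illustrative assumptions chosen to separate the six stages; they are not an official FAIR metric, and all names are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class FairLevel(IntEnum):
    """The slide's stages, ordered from least to most FAIR."""
    TOTALLY_UNFAIR = 0
    FINDABLE_FOR_HUMANS = 1
    FAIR_METADATA = 2
    FAIR_DATA_RESTRICTED = 3
    FAIR_DATA_OPEN = 4
    FAIR_DATA_OPEN_LINKED = 5


@dataclass
class ObjectProfile:
    # Hypothetical properties of a digital object (assumed criteria).
    has_pid: bool
    human_readable_metadata: bool
    machine_readable_metadata: bool
    machine_readable_data: bool
    open_access: bool
    functionally_linked: bool


def fair_level(p: ObjectProfile) -> FairLevel:
    """Return the highest stage whose assumed prerequisites are all met."""
    if not (p.has_pid and p.human_readable_metadata):
        return FairLevel.TOTALLY_UNFAIR
    if not p.machine_readable_metadata:
        return FairLevel.FINDABLE_FOR_HUMANS
    if not p.machine_readable_data:
        return FairLevel.FAIR_METADATA
    if not p.open_access:
        return FairLevel.FAIR_DATA_RESTRICTED
    if not p.functionally_linked:
        return FairLevel.FAIR_DATA_OPEN
    return FairLevel.FAIR_DATA_OPEN_LINKED


if __name__ == "__main__":
    profile = ObjectProfile(True, True, True, True, False, False)
    print(fair_level(profile).name)  # FAIR_DATA_RESTRICTED
```

Note that in this reading restricted-access data can still be FAIR: openness is one rung on the ladder, not a precondition for it.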
The Data Stewardship Cycle
[Figure: data owners, (supp)data, repositories and databases; ELIXIR FAIR Data Search Index, ELIXIR semantic data repository, ELIXIR Data FAIR Port and ELIXIR federated data at FAIR levels L1-L4; end-users who search for datasets and download data (sub)sets in many formats (XML, RDF, JSON, etc.); tools & applications (ASPs, in-house IT, bioinformatics, etc.); national nodes ElixirFin., ElixirEsp., ElixirNor., ElixirUK, ElixirSWE, ElixirNL; value shown: 5%]
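Offering the same data (sub)set "in many formats" mostly means serialising one set of records several ways. A minimal sketch using only the Python standard library, assuming each record is a subject/predicate/object dictionary; the function names and the `http://example.org/` namespace are hypothetical, and the N-Triples output assumes values are already safe to use inside IRIs.

```python
import json
import xml.etree.ElementTree as ET

BASE = "http://example.org/"  # hypothetical namespace for illustration


def to_json(records):
    """Serialise the records as a JSON array."""
    return json.dumps(records, sort_keys=True)


def to_xml(records):
    """Serialise the records as <dataset><record>...</record></dataset>."""
    root = ET.Element("dataset")
    for r in records:
        rec = ET.SubElement(root, "record")
        for key, value in r.items():
            ET.SubElement(rec, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def to_ntriples(records):
    """Serialise each record as one RDF N-Triples statement with IRIs under BASE."""
    return "\n".join(
        f"<{BASE}{r['subject']}> <{BASE}{r['predicate']}> <{BASE}{r['object']}> ."
        for r in records
    )


if __name__ == "__main__":
    subset = [{"subject": "geneA", "predicate": "associated_with", "object": "diseaseX"}]
    for serialise in (to_json, to_xml, to_ntriples):
        print(serialise(subset))
```

A real repository would more likely use a dedicated RDF library and proper literal typing; the point here is only that one record set can feed several download formats.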
FAIRport proof of concept
www.nanopubmed.org
Parties needed (typical candidates; NL example):
1. Trusted party: usually public sector, with a 'data stewardship' mandate
2. Executive party / coordinator: usually public or private sector, with expert knowledge on project and relation management
3, 4. Technology providers: PID/ARTA stewards
5. DOA architecture/IMS: CNRI + EURETOS
6. Publishing pipeline: EURETOS
7. Repository software
8. eInfrastructure
NL example: DTL/ELIXIR-nl, others
Malpractices…
• Journal Impact Factor
• Ignore altmetrics
• No data stewardship plan
• Obstruct tenure of data experts
• ‘Supplementary data’
Knowledge sharing impaired
4/10/14
EUDAT, DATAVERSE, BD2K, ELIXIR, NIH Commons, H2020, DRYAD, RDA, FigShare, Nanopub, Biosharing, Elsevier, Nature, Science, SageBio, NITRD, FORCE11, ORCID, VIVO, HVP, DataCite, EGA, Research Objects, Nebulus, Embassy, SADI, EURETOS, YARCdata, IMI, DANS, interoperability, ISA, Open PHACTS, Data Fabric
Good practices (apart from collaborating)
• RO Impact Factor
• Award altmetrics
• 5% for data stewardship plan
• Train & tenure data experts
• ‘Professional data publishing’
FAIR play
THE END
Thank you!
COMMENT (until October 1st); ENDORSE (after October 1st):
1. FAIR guiding principles with public discussion forum: https://www.force11.org/group/fairgroup/fairprinciples
2. Notes and Annexes: https://www.force11.org/node/6062/
3. Group home page https://www.force11.org/group/fairgroup
Endorsed by 82 organisations and [y] individuals