Big data, small data, data papers - short statement for "BDebate on Biomedicine 2014"

What is Big Data in Biomedicine? Data Types to be considered Susanna-Assunta Sansone, PhD @biosharing @isatools @scientificdata B-DEBATE: Big Data in Biomedicine. Challenges and Opportunities, 11 Nov, 2014 Data Consultant, Honorary Academic Editor Associate Director, Principal Investigator

Upload: susanna-assunta-sansone

Post on 02-Jul-2015

Category: Data & Analytics



DESCRIPTION

My short statement on the (closed) debate on Big Data: http://www.bdebate.org/en/forum/big-data-biomedicine-challenges-and-opportunities

TRANSCRIPT

Page 1: Big data, small data, data papers - short statement for "BDebate on Biomedicine 2014"

What is Big Data in Biomedicine?
Data Types to be considered

Susanna-Assunta Sansone, PhD

@biosharing @isatools @scientificdata

B-DEBATE: Big Data in Biomedicine. Challenges and Opportunities, 11 Nov, 2014

Data Consultant, Honorary Academic Editor

Associate Director, Principal Investigator

Page 2

•  Big science efforts represent only a small proportion
   o  often featuring homogeneous and well-organized data

•  There is a large proportion of small independent research efforts
   o  a rich variety of specialty data sets

Let’s not forget the long tail of research data

Page 3

•  Small independent research efforts fall in the long tail of the distribution
   o  Most of this (such as siloed databases, null findings) is unpublished
   o  These dark data hold a potential wealth of knowledge

Let’s not forget the long tail of research data

Page 4

•  Over 50% of completed studies in biomedicine do not appear in the published literature

•  Instead, they reside in file drawers and on personal hard drives

•  Often because results do not conform to the authors' hypotheses

“Only half the health-related studies funded by the European Union between 1998 and 2006 - an expenditure of €6 billion - led to identifiable reports”

Plagued by selective reporting of data and methods

Page 5

Role of data papers and data journals

•  Incentive, credit for sharing
   o  Big and small data
   o  Unpublished data
   o  Long tail of data
   o  Curated aggregation

•  Peer review focus
•  Value of data vs. analysis
•  Discoverability and reusability
   o  Complementing community databases

•  Narrative/context

Page 6

•  The power of “small data” lies in their aggregation and integration with other datasets

•  There is value in all well-curated, validated and reusable data – big and small

Role of data papers and data journals

Page 7

Research articles

Data records

Data Descriptors

Adding value to research articles and data records

Page 8

Research articles

Data records

Data Descriptors

Adding value to research articles and data records

Credit for sharing your data

Focused on reuse and reproducibility

Peer reviewed, curated

Promoting community data and code repositories

Open Access

Page 9

[Figure] ~156, ~70, ~334. Source: BioPortal

Databases implementing standards

Standards named in the figure: MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE…, REMARK, CONSORT, MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML…, GELML, ISA-Tab, CML, MITAB, AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO…, TEDDY, PRO, XAO, DO, VO

Progressively refine guidance to authors and reviewers

Page 10

Mapping the landscape of standards and databases

Page 11

Mapping the landscape of standards and databases

Page 12

Researchers, developers and curators lack support and guidance on how to best navigate and select content standards, understand their maturity, or find databases that implement them.

Funders, journals and librarians do not have enough information to make informed decisions on which content standards or databases to recommend in policies, fund or implement.

Help stakeholders to make informed decisions

Page 13

Summarizing

•  Selective reporting of data and methods is still an issue

•  Let’s not forget the potential value of the long tail of data

•  Data papers and journals can provide incentive and credit to share more data - big and small

•  Content standards do help - but the current wealth of options is an obstacle