David Lynn The Wellcome Trust Data Matters UK Research Data Service conference 26 February 2009


Page 1: David Lynn The Wellcome Trust Data Matters UK Research Data Service conference 26 February 2009

David Lynn

The Wellcome Trust

Data Matters UK Research Data Service conference

26 February 2009

Page 2

The Wellcome Trust

• independent biomedical research charity

• established in 1936

• current spend of over £600m pa

• supports over 3,000 researchers in more than 50 countries, across six continents

• works to engage the public in research and to explore its societal implications

Page 3

Mission and strategic aims

Our mission is to foster and promote research with the aim of improving human and animal health

We have six strategic aims

• advancing knowledge

• using knowledge

• engaging society

• developing people

• facilitating research

• developing our organisation

Page 4

Facilitating research

• we strive to foster a research environment in which biomedical science can flourish

• we partner with others to develop key data resources: Human Genome Project, Structural Genomics Consortium, UK Biobank

• we also fund key databases via: Wellcome Trust Sanger Institute, European Bioinformatics Institute, additional grant funding

• we work to maximise access to research outputs (publications, data and collections)

Page 5

Research is generating rapidly increasing volumes of data…

DNA sequencing: total gigabases by week (80 gigabases per week is about 130,000 bases per second)

Structural Biology: Structures deposited in the Protein Data Bank*

*graph extracted from the Interim Report of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access (Dec 2008)

Courtesy of Julian Parkhill
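The conversion quoted on the sequencing chart can be checked with simple arithmetic. The following is a minimal sketch, assuming continuous output over a 7-day week; the function name is illustrative and does not come from the slides.

```python
# Convert a weekly sequencing output (in gigabases) to bases per second.
# Assumption: output is spread evenly over a full 7-day, 24-hour week.
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def bases_per_second(gigabases_per_week: float) -> float:
    """Return the average number of bases produced per second."""
    return gigabases_per_week * 1e9 / SECONDS_PER_WEEK


rate = bases_per_second(80)
print(f"{rate:,.0f} bases per second")  # prints "132,275 bases per second"
```

The result rounds to the 130,000 bases per second given on the slide.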

Page 6

…in a diverse range of formats

Field descriptions:

1.
submitted name | description | MPD_name | MPD_description | intervention | units | age
sex | females only | | | | |
Pedigree Number | | mouse_id | pedigree number | | n |
Body Weight | Total body weight in gm | bw | body weight | high-fat diet 14wks | g | 20wks
Liver Weight | Weight of the liver in gm | liver_wt | liver weight | high-fat diet 14wks | g | 20wks
Week 14 HDL | HDL cholesterol after W14 on Atherogenic diet (mg/dl) | HDLC | HDL cholesterol (plasma) | high-fat diet 14wks | mg/dL | 20wks
Week 14 Non HDL Cho | Non HDL cholesterol after W14 on Atherogenic diet (mg/dl) | nonHDLC | non-HDL cholesterol (plasma) | high-fat diet 14wks | mg/dL | 20wks
Week 14 Total Cho | Total cholesterol after W14 on Atherogenic diet (mg/dl) | chol | total cholesterol (plasma) | high-fat diet 14wks | mg/dL | 20wks
Week 14 TG | Triglycerides cholesterol after W14 on Atherogenic diet (mg/dl) | TG | triglycerides (plasma) | high-fat diet 14wks | mg/dL | 20wks
BMD | Bone mineral density (g/cm²) | BMD | bone mineral density | high-fat diet 14wks | g/cm² | 20wks
%Fat | Percent body fat (%) | pct_fat | percent fat | high-fat diet 14wks | % | 20wks
BMI | Body Mass Index (BMI = weight(gm)/(length(cm))²) | BMI | body mass index (BMI) | high-fat diet 14wks | g/cm² | 20wks
Atherosclerotic Lesion Size | Atherosclerotic Lesion Size (μm²) | aortic_lesion | fatty streak aortic lesion size | high-fat diet 14wks | μm² | 20wks

2.
submitted name | description | MPD_name | MPD_description | intervention | units | age
Sex | sex (0=female, 1=male) | sex | | | |
Mouse-num | Mouse ID number | mouse_id | | | |
Pedigree | paternal grandmother [0 = (AxB)x(AxB) or (BxA)x(AxB); 1 = (AxB)x(BxA) or (BxA)x(BxA)] | pgm | paternal grandmother [0 = (AxB)x(AxB) or (BxA)x(AxB); 1 = (AxB)x(BxA) or (BxA)x(BxA)] | | |
BW | Body Weight [gm] | bw | body weight | high-fat diet 14wks | g | 20wks
HDL | HDL cholesterol [mg/dL] | HDLC | HDL cholesterol (plasma) | high-fat diet 14wks | mg/dL | 20wks
logHDL | log of HDL cholesterol | HDLC_log | HDL cholesterol (plasma) (log) | high-fat diet 14wks | mg/dL | 20wks
LogNon-HDL | log nonHDL cholesterol | nonHDL_log | non-HDL cholesterol (plasma) | high-fat diet 14wks | mg/dL | 20wks
TG | Triglycerides cholesterol [mg/dl] | TG | triglycerides (plasma) | high-fat diet 14wks | mg/dL | 20wks
Pnt-Fat | Percent body fat (%) | pct_fat | percent of body weight that is fat | high-fat diet 14wks | % | 20wks
LogBMI | log of Body Mass Index | BMI_log | body mass index (BMI) (log) | high-fat diet 14wks | g/cm² | 20wks
lesion-binary | Aortic Lesions (binary coding) | aortic_lesion_presence | fatty streak aortic lesions (0=absent, 1=present) | high-fat diet 14wks | score | 20wks
lesion-lt0 | Aortic Lesions (no lesion = 0, lesion = 1) | | | | |
lesion | Atherosclerotic Lesion Size (μm²) | aortic_lesion | fatty streak aortic lesion size | high-fat diet 14wks | μm² | 20wks
LogLesion-plus1 | Log of the Lesion Size | aortic_lesion_log | fatty streak aortic lesion size (log) | high-fat diet 14wks | μm² | 20wks
Total-BMD | Bone mineral density [g/cm²] | BMD_total | bone mineral density (whole body) | high-fat diet 14wks | g/cm² | 20wks
std-tBMD | Whole body areal Bone mineral density by DXA (PIXI) | BMD_areal | areal bone mineral density (whole body) | high-fat diet 14wks | g/cm² | 20wks
Vertebral-BMD | Vertebral Bone Mineral Density [g/cm²] | BMD_vertebral | vertebral bone mineral density | high-fat diet 14wks | g/cm² | 20wks
std-vBMD | Spinal Areal Bone mineral density by DXA (PIXI) | BMD_spinal | spinal areal bone mineral density | high-fat diet 14wks | g/cm² | 20wks
Prin1 | Principal Component 1 (not defined) | PC1 | Principal Component 1 | high-fat diet 14wks | score | 20wks
Prin2 | Principal Component 2 (not defined) | PC2 | Principal Component 2 | high-fat diet 14wks | score | 20wks
PTH | Parathyroid hormone [pg/ml] | PTH | parathyroid hormone | high-fat diet 14wks | pg/mL | 20wks
Log PTH | log Parathyroid hormone | PTH_log | parathyroid hormone (log) | high-fat diet 14wks | pg/mL | 20wks

Data for this project were submitted in two data files, with some measurements duplicated. Data are collated in this data file; duplications have been purged.

Page 7

Through sharing data we can increase its power

Genome-wide association studies are revealing the genetic basis of common diseases through combining data across large patient cohorts…

Figure: disease-associated genetic loci identified by year, 2000 to 2007 (e.g. PPAR, IBD5, NOD2, CTLA4, KCNJ11, PTPN22, and many further loci in 2007). Diseases shown: high cholesterol, obesity, myocardial infarct, arrhythmias, type 2 diabetes, prostate cancer, breast cancer, colon cancer, age-related macular degeneration, Crohn's disease, type 1 diabetes, systemic lupus erythematosus, asthma, restless leg syndrome, gallstone disease, multiple sclerosis, rheumatoid arthritis, glaucoma

… and databases such as DECIPHER at the Wellcome Trust Sanger Institute enable researchers to share data to gain new insights

DECIPHER: Overview map of consortium members

Courtesy of Leena Peltonen

Page 8

Researchers are integrating large datasets to gain new insights into complex systems

Malaria Atlas Project, University of Oxford

Heart modelling - CardioViz3D*

Linking cholera outbreaks to sea temperature in Bangladesh

*Toussaint et al (2008) An Integrated Platform for Dynamic Cardiac Simulation and Image Processing: Application to … In Proc. Eurographics Workshop on Visual Computing for Biomedicine (VCBM)

Page 9

And there is immense potential to link research papers and data…

Links to related datasets

Link to UKPMC from PubMed citation

OA licence – potential for text mining

Page 10

A vast number of research users are accessing key data resources

• For the year to August 07, the EBI website served an average of 340k unique hosts per month, with over 2m requests per day*

• The Wellcome Trust Sanger Institute website regularly received 15m hits per week during 2007/08 (a rise of 25% compared with the previous year)

*Source: European Bioinformatics Institute, Annual Scientific Report 2007

Wellcome Trust Sanger Institute: Total number of web pages requested

Page 11

Meeting the challenges 1: infrastructure

• rising volumes and complexity of data pose immense challenges for storage and curation, e.g. WT Sanger Institute data storage capacity increased from 300 TB in 2005 to 1,500 TB in 2008

• key data resources need coordinated and long-term sustainable funding

• ELIXIR is aiming to build a sustainable infrastructure for biological information across Europe
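To put the Sanger Institute storage figures quoted above in perspective, the implied yearly growth can be estimated. This is a rough sketch only: it assumes steady compound growth between the two quoted figures, which the slide does not state, and the function name is illustrative.

```python
# Estimate the average annual growth rate implied by two capacity figures.
# Assumption: growth compounds at a constant rate between the two dates.
def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.5 = 50%)."""
    return (end / start) ** (1 / years) - 1


# 300 TB in 2005 to 1,500 TB in 2008, as quoted on the slide
rate = annual_growth_rate(300, 1500, 2008 - 2005)
print(f"{rate:.0%} per year")  # prints "71% per year"
```

On that assumption, a fivefold increase over three years corresponds to capacity growing by roughly 71% each year.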

Page 12

Meeting the challenges 2: technical and cultural issues

• coordination and advocacy from key communities (e.g. funders, institutions, publishers)

• provision of information and guidance for researchers

• appropriate incentives and recognition for researchers

• development of key technical standards, metadata, etc

• nurturing skills in data management – career support and training

Page 13

Meeting the challenges 3: data security

• research involving personal data must have appropriate safeguards to protect participants

• recent high-profile incidents have sharpened concerns around privacy, confidentiality and responsibilities of researchers

• the issues have been addressed in several recent reports: Academy of Medical Sciences, Council for Science and Technology, Thomas/Walport data sharing review, US Institute of Medicine

• management & governance of data is a key concern

Page 14

The Wellcome Trust’s approach

• long track record of promoting access to research outputs: Bermuda principles (1996); Fort Lauderdale principles (2003); tailored data policies for major initiatives; strong advocate of open access publishing

• data management and sharing policy published in Jan 2007: researchers should maximise access to research data with as few restrictions as possible; data management plans (DMPs) required for projects generating resources or large datasets that could be shared for added value

• will meet costs for data sharing activities outlined in DMPs

• increasing convergence of DMP approach amongst funders

Page 15

The UKRDS in context: ongoing Trust activity

• At UK level… ongoing interactions with RCUK, JISC, HEFCE, RIN, UKDA and others; several other multi-funder initiatives – e.g. NCRI Informatics, UK Data Forum; UK PubMed Central development; active discussions around research uses of electronic patient records

• At European level… ESFRI (ELIXIR); proposals for EU PMC resource

• At international level… developing a code of conduct for public health and epidemiological data; Fort Lauderdale follow-up meeting (led by Genome Canada) in May 09

Page 16

UKRDS – what is needed

• a coordinated approach to preserve key research data and ensure its long-term value is maximised

• a “service” which meets the needs of researchers and funders

• but the devil is in the detail – must ensure that the project is developed in a way that truly adds value

• will still depend upon sustainable long-term funding for key data resources

Page 17

UKRDS – critical success factors

• must build upon and link effectively with existing UK activity, and develop effective international links

• will need full buy-in from all major funders (Research and Funding Councils & charities) and other key stakeholders

• will need to accommodate the differences between funders’ approaches, and between disciplines

• must be appropriately resourced to meet its goals

Page 18

UKRDS – pathfinder study

• clarification on the role that it is envisaged research funders will play

• a more detailed project specification will need to be developed with: clearly stated expectations for partners; full justification for the anticipated costs

• the study will need full buy-in from funders, institutions and the wider research community

• the resource implications of the study will need to be assessed carefully