data cycle health

DATA LIFE CYCLE HEALTH: CONSTRAINTS, IMMEDIATE AND LONG TERM. 28TH OCTOBER, EMBL-ABR

Upload: jyotikhadake

Post on 14-Apr-2017


TRANSCRIPT

Page 1: Data cycle health

DATA LIFE CYCLE HEALTH: CONSTRAINTS, IMMEDIATE AND LONG TERM

28TH OCTOBER, EMBL-ABR

Page 2: Data cycle health

Challenges for data

• Management *

• Annotation

• Analysis

• Storage

• Sharing *

Page 3: Data cycle health

Long term planning and maintenance

• Project Funding and continuity of data availability and management

• Funders/ Institutional requirements : data from public resources must be in public domain.

• Availability of storage facilities

• Availability of analysis facilities

DMP (data management plan)

Page 4: Data cycle health

Data sharing: going beyond what is required

• Granting authorities

• Journal requirements

• Improved health care

Facilitation by:

• Institutional sharing

• Availability of repositories/archives: GitHub, EBI repositories, NCBI repositories, institutional and national data archives

• Analysis, annotation and advertisement of the resource

• Data publication

Page 5: Data cycle health

Analysis workflow metagenomics*

UniRule

InterPro2GO

Page 6: Data cycle health

Resource list

• Submission: GEO, SRA, ArrayExpress, ENA/GenBank/DDBJ; PRIDE, MetaboLights

• Annotation: UniProt, HPO, GO, InterPro, Reactome, PDB, Interactome …

• Visualisation: Ensembl, Networks, Structures …

• External annotations

Page 7: Data cycle health

Importance of meta-data

• Data valuation by addition of meta-data

• Incorrect/inadequate meta-data affects analysis and rediscovery

• No meta-data makes a data set impossible to find, and of no value. Tagging helps.

• Student exercise – Male Breast Cancer – before they start a submission

• If you use resources, enrich them for use by yourself or others through submissions and annotation
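The point about inadequate meta-data can be made concrete with a small pre-submission check. The following is a minimal sketch only: the field names in REQUIRED_FIELDS and the example record are illustrative assumptions, not the checklist of any particular repository, which publishes its own requirements.

```python
# Hypothetical list of required fields; a real repository defines its own checklist.
REQUIRED_FIELDS = ("organism", "sample_type", "disease", "sex", "contact")


def missing_metadata(record):
    """Return the required fields that are absent or blank in a record."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Illustrative record: "sex" is blank and "contact" is absent.
record = {
    "organism": "Homo sapiens",
    "sample_type": "tumour tissue",
    "disease": "breast cancer",
    "sex": "",
}

print(missing_metadata(record))
```

A record that leaves such a field blank cannot be found later by anyone who searches on that field, which is the practical meaning of "impossible to find".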

Page 8: Data cycle health

Ontologies and standards

Interoperability, searching and reasoning

• HPO – Phenotype Ontology

• EFO – experimental meta-data, more terms may be needed

• OBIB ontology – biobanking

• SNOMED-CT

• dm+d

• ICD

Page 9: Data cycle health

Resource catalogue

BioSamples – deposit and reference study details for ‘Omics expts

OMICs expts –access using OMICS Discovery Index (http://www.omicsdi.org)

Page 10: Data cycle health

Annotation transfer

Limited biochemical resources, limited number of manual curators to transfer data into databases (UniProtKB/Swiss-Prot, GO)

Annotation transfer – Gene Ontology

• InterPro2GO

• EC2GO

• UniProt-keywords2GO

• Ensembl Compara

• UniProt-subcellular locations2GO

• HAMAP2GO

• UniPathway2GO

Annotation transfer – TrEMBL annotation

• UniProt UniRules

All based on InterPro family/domain matches
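The idea behind mapping-based annotation transfer can be sketched in a few lines: a protein that matches a family/domain signature inherits the terms mapped to that signature. This is a toy illustration, not the InterPro2GO or UniRule pipeline itself. All identifiers below (SIG_A, GO_TERM_1, PROT_1 and so on) are made-up placeholders, and the function name is hypothetical.

```python
# Toy signature-to-term mapping; every identifier here is a made-up placeholder.
SIGNATURE_TO_GO = {
    "SIG_A": {"GO_TERM_1", "GO_TERM_2"},
    "SIG_B": {"GO_TERM_3"},
}


def transfer_annotations(protein_matches, mapping):
    """Give each protein the union of terms mapped from its signature matches."""
    annotations = {}
    for protein, signatures in protein_matches.items():
        terms = set()
        for signature in signatures:
            # Signatures without a mapping contribute nothing.
            terms |= mapping.get(signature, set())
        annotations[protein] = sorted(terms)
    return annotations


# Illustrative input: PROT_2 matches only an unmapped signature.
matches = {
    "PROT_1": ["SIG_A", "SIG_B"],
    "PROT_2": ["SIG_C"],
}

result = transfer_annotations(matches, SIGNATURE_TO_GO)
print(result)
```

Proteins whose matches have no mapping stay unannotated rather than receiving a guess, so coverage depends on how complete the mapping table is.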

Page 11: Data cycle health

ICO: code of practice

• Processed lawfully, fairly and in a transparent manner

• Collected for a specified purpose

• Adequate, relevant and limited to what is necessary

• Accuracy is maintained

• Data subject will not be identifiable for longer than necessary

• Processed in a secure manner and protected against unexpected loss or destruction

• Rights of the individual will be protected

Page 12: Data cycle health

ICO code of practice

This is handled in four ways:

• Ethical approval for study

• Explicit consent from individual

• Follow security guidelines from ICO

• Develop strong governance around this data

Page 13: Data cycle health

Proprietary data

• Any data generation funded by a commercial entity may have data restrictions associated with it.

• Any data generation involving proprietary organisms/environs may have data restrictions on them.

• Data withdrawal – obsolete vs destroy

Page 14: Data cycle health

Software

• Software graveyard and compute

• Costing for IT and sustainability of software resources

• Software publication

• Recognition and peer review

• Reproducibility in omics research

Page 15: Data cycle health

Data life-cycle

Sequence/assembly/Annotation/RNA seq

Public domain data deposition

Update annotation

In-house data resource

Page 16: Data cycle health

Summary

• Identify potential issues early on in the project life-cycle – spending time identifying issues and planning how to address them

• Prepare to share data as early as possible – what information would you like to see if you were the data user?

• Think beyond the lifetime of the grant: what are your long-term plans for the sustainability of the data?

• If you have issues with access, do feed back. If yours is not the primary submission, this should be sorted out. Add bug reports as well.