Post on 16-Apr-2018

Overview of EMBL-European Bioinformatics Institute and Interactions with CDISC

Dominic Clark

Industry Programme Manager

clark@ebi.ac.uk

www.ebi.ac.uk/industry

Key topics

• EMBL-EBI Background, Services and Standards activities

• EMBL-EBI working with Industry

• The Genomic Standards Consortium

• Challenges ahead

Our mission

To contribute to the advancement of biology through basic investigator-driven research in bioinformatics

What is EMBL-EBI?

• Part of the European Molecular Biology Laboratory

• International, non-profit scientific institute

• Europe’s hub for biological data services

Where is EMBL-EBI?

© John Freebury

• We share a campus with the Wellcome Trust Sanger Institute

• Near Cambridge, UK

EMBL-EBI Hinxton data centre (most services run from data centres in London)

14/11/2013 9

EMBL member states

Austria, Belgium, Croatia, Denmark, Finland, France, Germany, Greece, Iceland, Ireland, Israel, Italy, Luxembourg, the Netherlands, Norway, Portugal, Spain, Sweden, Switzerland and the United Kingdom

Associate member state: Australia

Our funders

EMBL member states: Austria, Belgium, Croatia, Denmark, Finland, France, Germany, Greece, Iceland, Ireland, Israel, Italy, Luxembourg, the Netherlands, Norway, Portugal, Spain, Sweden, Switzerland, United Kingdom.

Associate member state: Australia

Other major funders: the European Commission, UK Research Councils, the US National Institutes of Health and the Wellcome Trust

EMBL-EBI users: a snapshot

The new EBI building & ELIXIR Technical hub.

Who we are

~500 members of staff

~53 nationalities

~400 in services & support

~100 focus on basic research

EMBL-EBI works collaboratively

Hinxton Cambridge UK

Global Europe

EMBL-EBI research collaborations

We share funding and author publications with partner institutes throughout the world:

• 327 publications in 2011 (90% in collaboration with other institutes)

• 843 grants shared with other institutes in 2011

Data and tools for molecular life science

Services

www.ebi.ac.uk/services

Atlas

what happens where

From molecules to medicine

Biology is changing:

• Data explosion

• New types of data

• Emphasis on systems

• Growth of applied biology:

  • molecular medicine

  • agriculture

  • food

  • environmental sciences

Big and bigger data

Key principles about our services

• Freely available

• A comprehensive collection of molecular databases

• Globally coordinated data collection and dissemination

• Produced in collaboration with other world leaders, e.g.:

• NCBI (United States)

• Wellcome Trust Sanger Institute (United Kingdom)

• National Institute of Genetics (Japan)

• SIB Swiss Institute of Bioinformatics (Switzerland)

Data resources at EMBL-EBI

• Genes, genomes & variation

• RNA, protein & metabolite expression

• Protein sequences, families & motifs

• Molecular & cellular structures

• Reactions, interactions & pathways

• Chemical biology

• Ontologies & biological samples

• Scientific literature

Data resources at EMBL-EBI

Genomes & variation

• Ensembl

• Ensembl Genomes

• Genome-phenome archive

• Metagenomics

Nucleotide sequences

• European Nucleotide Archive (ENA)

Expression

• ArrayExpress

• Expression Atlas

• PRIDE

• R-Workbench

Proteins

• The Universal Protein Resource (UniProt)

• InterPro

Chemical biology

• ChEMBL

• ChEBI

Literature & ontology

• Europe PubMed Central

• Gene Ontology

Molecular structures

• Protein Data Bank in Europe

• PDBsum

• ProFunc

Pathways

• IntAct

• Reactome

• MetaboLights

Systems

• BioModels

• Enzyme Portal

• BioSamples

Patent sequences

• Non-redundant patent sequence dbs

• Patent compounds

Standards development – international collaborations

• Genomes: www.geneontology.org, gensc.org

• Functional genomics: www.fged.org

• Protein sequence: www.uniprot.org

• Proteomics: www.psidev.info/

• Protein structure: www.wwpdb.org

• Cheminformatics: www.ebi.ac.uk/chebi

• Pathways: www.reactome.org, www.biopax.org

• Systems modeling: www.sbml.org, www.sbgn.org

• Metabolomics: www.metabolomicssociety.org

• Literature and text mining: www.pistoiaalliance.org/

• Nucleotide sequence: www.insdc.org, www.barcodeoflife.org/

Database collaborations: we collaborate on standards and data sharing in global data sharing agreements for all our major databases.

2005: The Genomic Standards Consortium

• A vast and rich body of information has grown up as a result of the world’s enthusiasm for ’omics technologies. Finding ways to describe and make available this information that maximise its usefulness has become a major effort across the ’omics world. At the heart of this effort is the Genomic Standards Consortium (GSC), an open-membership organization that drives community-based standardization activities.

• The GSC calls for the scientific community to join forces to improve the quality and quantity of contextual information about our public collections of genomes, metagenomes, and marker gene sequences.

The GSC’s Mission

• the implementation of new genomic standards

• methods of capturing and exchanging metadata

• harmonization of metadata collection and analysis efforts across the wider genomics community

Community-driven solutions

The path:

• Identify the problem

• Define a community to address it

• Define scope of the solution

• Implement solution

• Gain adoption of solution

Data standardization at ENA

Petra ten Hoopen

European Nucleotide Archive

European Nucleotide Archive

http://www.ebi.ac.uk/ena/home

Permanent and comprehensive repository for public domain nucleotide sequences and associated information

• Archiving

• Helpdesk

• Training

• Standards development

• Technology development

• Community building

ENA data model

Data = raw reads and nucleotide sequence assemblies

Metadata = information associated with sequences. It includes:

• provenance of the biological sample (sample)

• the sequencing experiment (experiment) and its scope (study)

• analysis and annotation of sequences (analysis)

• files of raw data (run)

Objects in the model: Study, Experiment, Analysis, Sample, Run, Data
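The object types named in the ENA data model can be pictured as a small set of linked records. The following is only a minimal sketch under stated assumptions: the class fields, the links between objects (for example, one sample per experiment) and the helper `data_files` are hypothetical illustrations, not the actual ENA schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sample:
    """Provenance of the biological sample."""
    alias: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Run:
    """Files of raw data produced by a sequencing experiment."""
    alias: str
    files: List[str] = field(default_factory=list)


@dataclass
class Experiment:
    """A sequencing experiment; one sample per experiment is an assumption."""
    alias: str
    sample: Sample
    runs: List[Run] = field(default_factory=list)


@dataclass
class Analysis:
    """Analysis and annotation of sequences, e.g. an assembly."""
    alias: str
    files: List[str] = field(default_factory=list)


@dataclass
class Study:
    """The scope that groups experiments and analyses."""
    alias: str
    experiments: List[Experiment] = field(default_factory=list)
    analyses: List[Analysis] = field(default_factory=list)


def data_files(study: Study) -> List[str]:
    """Collect every data file reachable from a study: raw reads first, then analyses."""
    raw = [f for exp in study.experiments for run in exp.runs for f in run.files]
    derived = [f for analysis in study.analyses for f in analysis.files]
    return raw + derived


# Hypothetical example records.
sample = Sample("soil-01", {"environment": "soil"})
run = Run("run-01", ["reads_1.fastq.gz", "reads_2.fastq.gz"])
study = Study(
    "demo-study",
    experiments=[Experiment("exp-01", sample, [run])],
    analyses=[Analysis("assembly-01", ["contigs.fasta"])],
)
print(data_files(study))
```

Keeping metadata objects separate from the data files they describe is the point of the sketch: the same sample record can be validated and searched without touching the sequence files.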

ENA data standardization

Standardized reporting requirements for all metadata and data objects (Study, Experiment, Analysis, Sample, Run, Data)

agreed by:

• INSDC

• Consortia of scientific domain-specific experts (e.g. GSC, MicroB3, RNACentral)

implemented with:

• community-agreed checklists and controlled vocabularies

• data-type-specific file formats

ENA checklists

30 checklists for assembled and annotated sequences in the WEBIN submission system

Large scale

• WGS unannotated

• WGS annotated

• EST

• GSS

• STS

• TSA unannotated

• TSA annotated

Community Standards

• Barcode COI

• MIMARKS 16S

• MIMARKS soil sample 16S

RNA

• Single CDS mRNA

• Single viral CDS genomic RNA

• ssRNA viral polyprotein

• ssRNA viral cRNA

DNA

• Single CDS genomic DNA

• MHC gene 1-exon

• MHC gene 2-exons

• Gene intron

• ITS region

• ETS region

• IGS

• Phylogenetic marker

• COI gene

• D-loop

• trnK-matK locus

• Satellite DNA

• Betasatellite

• rRNA gene

• 16-23S ISR

• Gene promoter

Power of ENA checklists

ENA-implemented checklists support and improve:

• consistent reporting

• user-friendly data submission

• data validation

• data retrieval

• data discovery

• data interoperability

• data usability

ENA-implemented checklists:

1. help to achieve the objectives of data standardization efforts

2. assist both data submitters and data users
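How a checklist supports consistent reporting and data validation can be sketched in a few lines. This is an illustration under assumptions, not the WEBIN implementation: the `Field` type, the `validate` function, and the field names and vocabulary in `HYPOTHETICAL_CHECKLIST` are all invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Field:
    """One checklist entry: a field name, whether it is mandatory,
    and an optional controlled vocabulary of allowed values."""
    name: str
    mandatory: bool = False
    allowed: Optional[FrozenSet[str]] = None


def validate(record: Dict[str, str], checklist: List[Field]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the record passes."""
    errors = []
    for f in checklist:
        value = record.get(f.name, "").strip()
        if not value:
            if f.mandatory:
                errors.append(f"missing mandatory field: {f.name}")
            continue
        if f.allowed is not None and value not in f.allowed:
            errors.append(f"{f.name}: '{value}' is not in the controlled vocabulary")
    known = {f.name for f in checklist}
    for name in record:
        if name not in known:
            errors.append(f"unknown field: {name}")
    return errors


# Hypothetical checklist; the field names and vocabulary are illustrative only.
HYPOTHETICAL_CHECKLIST = [
    Field("organism", mandatory=True),
    Field("environment", mandatory=True,
          allowed=frozenset({"soil", "water", "host-associated"})),
    Field("collection_date"),
]

print(validate({"organism": "uncultured bacterium", "environment": "soil"},
               HYPOTHETICAL_CHECKLIST))
print(validate({"environment": "lava"}, HYPOTHETICAL_CHECKLIST))
```

Because the checklist is data rather than code, the same definition can drive the submission form, the validator and the search fields, which is one way the benefits listed above could follow from a single agreed standard.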

The EMBL-EBI Industry Programme

We support larger companies through the “Industry Programme”

• For the past 17 years the Industry Programme has been an integral part of EMBL-EBI, providing on-going and regular contact with key stakeholder groups.

• Established in 1996, the programme is now well established as a subscription-funded service for larger companies.

• Through the Industry Programme, EMBL-EBI provides specialist workshops, standards-based activities and pre-competitive research and development opportunities of particular relevance to the Industry Programme members.

www.ebi.ac.uk/industry

Industry Programme members

The EMBL-EBI Industry Programme

• Relationship between industry members, EMBL-EBI and our collaborators

• Enabling industrial uptake of innovations in bioinformatics

• Knowledge Exchange workshops with world leaders/KOLs

• Neutral ground for members to explore strategic developments and concepts

• Input into services development

• Pre-competitive collaboration

• Standards development

• Technical development

Early success: development of the MIAME standard

• MIAME describes the Minimum Information About a Microarray Experiment that is needed to enable the interpretation of the results of the experiment unambiguously and potentially to reproduce the experiment. [Brazma et al., Nature Genetics]

• The public repositories ArrayExpress at the EBI (UK), GEO at NCBI (US) and CIBEX at DDBJ (Japan) are designed to accept, hold and distribute MIAME-compliant microarray data.

The six most critical elements contributing towards MIAME are:

• The raw data for each hybridisation (e.g., CEL or GPR files)

• The final processed (normalised) data for the set of hybridisations in the experiment (study) (e.g., the gene expression data matrix used to draw the conclusions from the study)

• The essential sample annotation including experimental factors and their values (e.g., compound and dose in a dose response experiment)

• The experimental design including sample data relationships (e.g., which raw data file relates to which sample, which hybridisations are technical, which are biological replicates)

• Sufficient annotation of the array (e.g., gene identifiers, genomic coordinates, probe oligonucleotide sequences or reference commercial array catalog number)

• The essential laboratory and data processing protocols (e.g., what normalisation method has been used to obtain the final processed data)
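The six MIAME elements lend themselves to a simple completeness check before a dataset is submitted. The sketch below is an assumption-laden illustration: the dictionary keys and the function `missing_miame_elements` are hypothetical names, and no repository is claimed to work this way.

```python
from typing import Dict, List

# The six MIAME elements listed above, keyed by hypothetical short names.
MIAME_ELEMENTS: Dict[str, str] = {
    "raw_data": "raw data for each hybridisation",
    "processed_data": "final processed (normalised) data",
    "sample_annotation": "sample annotation with experimental factors and values",
    "experimental_design": "experimental design and sample-data relationships",
    "array_annotation": "annotation of the array",
    "protocols": "laboratory and data processing protocols",
}


def missing_miame_elements(submission: Dict[str, object]) -> List[str]:
    """Return the descriptions of MIAME elements that are absent or empty."""
    return [
        description
        for key, description in MIAME_ELEMENTS.items()
        if not submission.get(key)
    ]


# Hypothetical submission: file names and values are illustrative only.
submission = {
    "raw_data": ["hyb1.CEL", "hyb2.CEL"],
    "processed_data": "expression_matrix.txt",
    "sample_annotation": {"compound": "drugA", "dose": "10 uM"},
    "experimental_design": {"hyb1.CEL": "sample1", "hyb2.CEL": "sample2"},
    "array_annotation": "",  # left empty, so it is reported as missing
}

for item in missing_miame_elements(submission):
    print("Missing:", item)
```

A check of this kind only confirms that each element is present; judging whether the annotation is sufficient to interpret or reproduce the experiment still needs a curator.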

MIBBI - Minimum Information for Biological and Biomedical Investigations

• The MIBBI project promotes extant efforts developing minimum information guidelines for the reporting of biological and biomedical science to the wider community. Background and history of the MIBBI project can be found here. We work to progressively move the information to this new site that is also set to provide additional search and link functionality to connect guidelines with terminologies and exchange formats, as used by the community.

• There are 38 MIBBI records in BioSharing: http://www.biosharing.org/standards/mibbi

Knowledge Exchange Workshops

• The Industry Programme organises high quality workshops and symposia, providing expert level presentations and strategic discussion opportunities for members and other key opinion leaders.

• Workshops:

  • Prioritised by the IP members based on proposals

  • Organised through a planning team

  • Include key opinion leaders as speakers

  • Include appropriate stakeholders

  • By individual/collective invitation only

  • Facilitated

  • Take a significant amount of planning

Member-driven workshops

• Computational systems biology

• Data integration

Workshops in 2012

Workshop Title | Date
Using electronic health records (EHRs) for translational bioinformatics | Feb 2012
Chemogenomics | Mar 2012
1000 Genomes Project | Apr 2012
R & Bioconductor training workshop | May 2012
Metabolomics | May 2012
Antibody Informatics | June 2012
Systems Biology for Toxicology Pathways | Sept 2012
Secure Hosted Services | Oct 2012
1000 Genomes and NGS data analysis (Novartis site, Cambridge, MA) | Nov 2012
Pre-clinical Safety Data (EMBL, Heidelberg, DE) | Nov 2012

Industry Programme Workshops for 2013

Workshop Title | Date
Oncogenomics | 13-14 Mar 2013
Overview of Biomedical Ontologies | 17-18 Apr 2013
Biomarkers | 23-24 Apr 2013
Encode and Epigenomics | 19-20 Jun 2013
Data Integration and its application | 18-19 Sep 2013
Translational informatics | 23-24 Oct 2013
Oncogenomics (Pfizer, Pearl River, NY) | 14-15 Nov 2013
Computational tools for chemical biology, phenotypic screening & target de-convolution | 21-22 Nov 2013
RNA-seq data analysis | 11-12 Dec 2013

Dates for 2014

Workshop Title | Date
Rare Diseases and drug repositioning | 24-25 Mar 2014
Encode Workshop, Cambridge, MA, USA | 15-16 Apr 2014
EBI/EuroDISH/NuGO workshop on Nutrition Information, Ontologies and Nutrigenomics | 29-30 Apr 2014
Systems Pharmacology | 7-8 May 2014
Biologics | 21-22 May 2014
Shared Data, Shared Cost | 18-19 June 2014

What happens after workshops?

• Presentations are made available on the Industry members’ website

• Short report

• Where appropriate, EMBL-EBI will act as a coordinator or broker in establishing pre-competitive collaborations/initiatives between Industry Programme members (and third parties: academic groups, funding organisations, other commercial companies)

• Publication: examples from 2011

  • MIABE paper in NRDD

  • Tox ontology roadmap papers

Published Sept. 2011, PMID: 21878981

Major challenges remain

Variation: EGA and GWAS

• Explore datasets from Genome-Wide Association Studies (GWAS)

• All types of sequence and genotype experiments:

  • Case control

  • Population

  • Family studies

• SNP and CNV genotypes from array-based methods

• Genotyping done with re-sequencing methods

European Genome-phenome Archive: www.ebi.ac.uk/ega

A Global Alliance for sharing genomic and clinical data

• EMBL and EMBL-EBI have joined the Global Alliance, a large-scale, international effort to enable the secure sharing of genomic and clinical data. The Global Alliance invites commercial and not-for-profit organisations to join forces with other leading data, health care, research, and disease advocacy organisations to establish an evidence base for genomic research and medicine that adheres to the highest standards of ethics and privacy.

• A White Paper circulated in early 2013 has the support of nearly 70 organisations in Asia, Australia, Africa, Europe, North America and South America who are committed to creating a common framework that supports data analysis and protects the autonomy and privacy of participating individuals. Signatories of an accompanying Letter of Intent to create a not-for-profit, inclusive, public–private, international, non-governmental organisation include healthcare providers, research institutions, disease advocacy groups, life science and information technology companies. Many more are expected to join.

Summary

• EMBL-EBI is one of the global leaders in the storage, annotation, interrogation and dissemination of large datasets of relevance to the bio-industries.

• Standards are an important part of international data exchange and effective utilisation of information.

• We work closely with industry in developing new standards.

• Major challenges remain.

Acknowledgements

• Peter Sterk (U. Oxford), secretary of the GSC

• Petra ten Hoopen (EMBL-EBI)