julio peironcely @ iccs 2011

June 9th 2011, ICCS 2011. Julio E. Peironcely (@peyron), PhD student at Leiden University and TNO. Improving Metabolite Identification with Chemoinformatics

Upload: julio-e-peironcely

Post on 13-Jan-2015

DESCRIPTION

Talk given at the 9th International Conference on Chemical Structures, June 9th. How metabolomics and metabolite identification can be improved by chemoinformatics.

TRANSCRIPT

June 9th 2011 ICCS 2011

Julio E. Peironcely @peyron

PhD student at Leiden University and TNO

Improving Metabolite Identification with Chemoinformatics

Metabolome: all low molecular weight molecules (metabolites) in cells, body fluids, tissues, etc.

Metabolomics: the quantitative and qualitative analysis of all metabolites in samples of cells, body fluids, tissues, etc.

Metabolomics

Genome → Transcripts → Proteins → Metabolites → Phenotype

Metabolomics

Workflow: Biological question, Experimental design, Sampling, Sample preparation, Data acquisition, Data pre-processing, Data analysis, Biological interpretation

Products: Protocol, Samples, Raw data, List of peaks/biomolecules, Relevant biomolecules/connectivities & Models, Metabolites

LC-MS and Metabolite Identification

peak = m/z@rt = metabolite

RPLC-microTOF (Bruker)

Kloet et al, Metabolomics 2011

[Figure panels: LC, MS]

When all you have is a mass and a “window”

MS: Mass + Window

Measured Mass + Mass Window = Multiple Elemental Compositions

List of Elemental Compositions

Databases: KEGG, HMDB, PubChem, ChemSpider

List of Molecules
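
The step from a measured mass and a mass window to a list of elemental compositions can be sketched as follows. This is a minimal illustration, not the tool used in the talk: it assumes a neutral monoisotopic mass, only the elements C, H, N and O, a window expressed in ppm, and a crude hydrogen-count limit. The function names `candidate_formulas` and `formula_string` are hypothetical.

```python
# Monoisotopic masses (u) of the most abundant isotopes of C, H, N, O.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def formula_string(c, h, n, o):
    """Write counts as a formula such as C2H5NO2 (zero counts are omitted)."""
    parts = []
    for element, count in (("C", c), ("H", h), ("N", n), ("O", o)):
        if count:
            parts.append(element + (str(count) if count > 1 else ""))
    return "".join(parts)


def candidate_formulas(neutral_mass, ppm=5.0):
    """Return CHNO formulas whose monoisotopic mass lies inside the window."""
    tol = neutral_mass * ppm * 1e-6  # half-width of the mass window in u
    hits = []
    for c in range(1, int((neutral_mass + tol) // MONO["C"]) + 1):
        rest_c = neutral_mass - c * MONO["C"]
        for n in range(0, int(max(rest_c + tol, 0) // MONO["N"]) + 1):
            rest_n = rest_c - n * MONO["N"]
            for o in range(0, int(max(rest_n + tol, 0) // MONO["O"]) + 1):
                rest = rest_n - o * MONO["O"]
                # The remaining mass must be made up of hydrogens.
                h = round(rest / MONO["H"])
                # Assumed plausibility limit: no more H than a saturated molecule.
                if h < 0 or h > 2 * c + n + 2:
                    continue
                mass = (c * MONO["C"] + h * MONO["H"]
                        + n * MONO["N"] + o * MONO["O"])
                if abs(mass - neutral_mass) <= tol:
                    hits.append(formula_string(c, h, n, o))
    return hits


if __name__ == "__main__":
    # Glycine (C2H5NO2) has a monoisotopic mass of about 75.03203 u.
    print(candidate_formulas(75.03203, ppm=5))
    print(candidate_formulas(75.03203, ppm=500))
```

Widening the window lets more elemental compositions through, which is the situation the slide describes: one mass, several candidate formulas, and many molecules per formula in the databases.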

When you have MSn

MSn: Mass + Window, fragments

Use the expert system

Measured Mass + Mass Window + Fragments = Single Elemental Composition

1 EC

Databases: KEGG, HMDB, PubChem, ChemSpider

List of Molecules
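
One way fragments can narrow several candidate elemental compositions down to one is a sub-formula check: a fragment cannot contain more atoms of any element than its parent. The sketch below shows only that idea. It is not the expert system mentioned on the slide, whose rules are not described in the talk, and the candidate and fragment formulas are hypothetical illustration values.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Turn a formula string such as 'C2H5NO2' into element counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def is_subformula(fragment, parent):
    """True if the parent has at least as many atoms of every fragment element."""
    return all(parent[element] >= n for element, n in fragment.items())


def filter_by_fragments(candidates, fragment_formulas):
    """Keep the candidate formulas that can contain every fragment formula."""
    fragments = [parse_formula(f) for f in fragment_formulas]
    return [
        candidate
        for candidate in candidates
        if all(is_subformula(f, parse_formula(candidate)) for f in fragments)
    ]


if __name__ == "__main__":
    # Hypothetical candidates for one measured mass and two hypothetical fragments.
    candidates = ["C2H5NO2", "CH5N3O", "C3H7O2"]
    print(filter_by_fragments(candidates, ["CH4N", "CHO2"]))
```

The check ignores adduct hydrogens and rearrangements, so a real pipeline would need a more tolerant comparison.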

The expert system

Dr. Albert Tas, Dr. Ronnie van Doorn

Bottlenecks in Metabolite Identification

Many metabolites detected by LC-MS are not identified

High-resolution MS can yield 1 EC, but 1 EC = many structures

No tools for automatic identification

It takes the expert a long time to identify metabolites

Challenges

Analytical methods

Software

Databases

Our approach

Pipeline: Experimental Data, Processing Data Trees, Find Similar Trees, Generate Structures, Filter Structures

Find Similar Trees: MSn database (similar tree present / absent), Identity assignment

Generate Structures: Structure generator (Elemental Formula + Fragments)

Filter Structures: Metabolite-likeness

[Figure: MSn trees with nodes 1; 1.1, 1.2; 1.1.1, 1.2.1, 1.2.2]

From data to information

MEF: spectral to fragmentation tree

Tools and formats shown: XCMS, MEF, CDK, CML

Rojas-Cherto et al. Bioinformatics (submitted)

MEF: spectral to fragmentation tree

Spectral Tree

MS levels: full scan, MS2, MS3

Nodes: 1; 1.1, 1.2, 1.3; 1.1.1, 1.2.1, 1.2.2, 1.2.3, 1.3.1, 1.3.2

Legend: Peak, Noise

Fragmentation Tree

Legend: Parent Ion, Fragments, Reaction, Neutral loss

Node annotations: Mass, EC, …

Rojas-Cherto et al. Bioinformatics (submitted)
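
A fragmentation tree as named on the slide (parent ion, fragments, reactions, neutral losses) can be held in a small recursive data structure. The sketch below is an assumption about one possible representation, not the MEF implementation. The class `FragmentNode`, the helper names, and the example formulas are hypothetical; node labels follow the 1, 1.1, 1.2 numbering shown in the figure.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


def parse_formula(formula):
    """Turn a formula string such as 'C2H6NO2' into element counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def format_formula(counts):
    """Write counts as a string with C and H first, then the rest alphabetically."""
    order = [e for e in ("C", "H") if counts.get(e)]
    order += sorted(e for e in counts if e not in ("C", "H") and counts[e])
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


def neutral_loss(parent_formula, fragment_formula):
    """Elemental composition lost when the parent ion yields the fragment."""
    parent = parse_formula(parent_formula)
    fragment = parse_formula(fragment_formula)
    if any(parent[e] < n for e, n in fragment.items()):
        raise ValueError("fragment is not contained in the parent ion")
    return format_formula(parent - fragment)


@dataclass
class FragmentNode:
    label: str                      # position in the tree, e.g. "1.2.1"
    formula: str                    # elemental composition (EC) of the ion
    children: list = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child


def reactions(node):
    """Yield (parent label, fragment label, neutral loss) for every edge."""
    for child in node.children:
        yield node.label, child.label, neutral_loss(node.formula, child.formula)
        yield from reactions(child)


if __name__ == "__main__":
    # Hypothetical tree: parent ion C2H6NO2 with two MS2 and one MS3 fragment.
    root = FragmentNode("1", "C2H6NO2")
    root.add(FragmentNode("1.1", "CH4N"))
    branch = root.add(FragmentNode("1.2", "C2H4NO"))
    branch.add(FragmentNode("1.2.1", "CH4N"))
    for edge in reactions(root):
        print(edge)
```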

Extracting more information from your data

Fragmentation tree fingerprints

[Figure: fragmentation trees with nodes 1; 1.1, 1.2, 1.3; 1.1.1, 1.2.1, 1.2.2, 1.2.3, 1.3.1, 1.3.2]

Miguel Rojas-Cherto, Leiden University

Poster 59

Fragmentation tree fingerprints results

[Figure columns: “unknown”, Most similar trees, MCS]
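
Finding the most similar trees for an unknown can be sketched as a fingerprint comparison. The talk does not define the fingerprint, so the sketch assumes one: each tree is reduced to a set of feature strings (fragment formulas and neutral losses), and trees are ranked by Tanimoto similarity of those sets. The names `tanimoto` and `most_similar` and all feature values are hypothetical.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) similarity of two feature sets, between 0 and 1."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(query, database, top=3):
    """Rank database trees by similarity to the query tree's features."""
    scored = [(tanimoto(query, features), name)
              for name, features in database.items()]
    # Highest similarity first; ties are broken by name for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top]


if __name__ == "__main__":
    # Hypothetical fingerprints: "frag:" is a fragment EC, "loss:" a neutral loss.
    unknown = {"frag:CH4N", "loss:H2O", "loss:CO"}
    database = {
        "tree_A": {"frag:CH4N", "loss:H2O", "loss:CO", "loss:NH3"},
        "tree_B": {"loss:H2O"},
        "tree_C": {"frag:C6H5"},
    }
    for score, name in most_similar(unknown, database, top=2):
        print(name, round(score, 2))
```

The structures behind the top-ranked trees could then be compared, for instance through their maximum common substructure, to suggest a partial structure for the unknown.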

De-novo identification

Input: Elemental Formula, Fragments

Structure Generator (CDK, Nauty): Generate, augmentation, Keep molecules if canonical

Output: All non-duplicated molecules

Poster 38

In collaboration with Jean-Loup Faulon, Evry University
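
The "generate, keep if canonical, output non-duplicated molecules" idea can be illustrated on very small formulas. The sketch below is not the canonical augmentation algorithm of the talk and uses neither CDK nor Nauty. It swaps in a plain brute-force enumeration of bond orders between heavy atoms, with duplicates removed by a brute-force canonical form (the smallest bond tuple over all element-preserving relabelings). Hydrogens are implicit, fragments are not used as constraints, and the valence table and function names are assumptions.

```python
from itertools import combinations, permutations, product

# Assumed standard valences; hydrogens fill whatever valence is left over.
VALENCE = {"C": 4, "N": 3, "O": 2}


def is_connected(n_atoms, bonds):
    """Check that every heavy atom is reachable from atom 0 through bonds."""
    seen, stack = {0}, [0]
    while stack:
        atom = stack.pop()
        for (i, j), order in bonds.items():
            if order and atom in (i, j):
                other = j if atom == i else i
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
    return len(seen) == n_atoms


def canonical_form(atoms, bonds):
    """Smallest bond-order tuple over all relabelings that keep the elements."""
    n = len(atoms)
    pairs = list(combinations(range(n), 2))
    keys = []
    for perm in permutations(range(n)):
        if all(atoms[perm[i]] == atoms[i] for i in range(n)):
            keys.append(tuple(bonds[tuple(sorted((perm[i], perm[j])))]
                              for i, j in pairs))
    return min(keys)


def generate(atoms, n_hydrogens):
    """Return canonical forms of all connected structures for the formula."""
    n = len(atoms)
    pairs = list(combinations(range(n), 2))
    structures = set()
    for orders in product(range(4), repeat=len(pairs)):  # bond order 0 to 3
        bonds = dict(zip(pairs, orders))
        used = [0] * n
        for (i, j), order in bonds.items():
            used[i] += order
            used[j] += order
        if any(used[i] > VALENCE[atoms[i]] for i in range(n)):
            continue
        # Free valences must match the hydrogen count of the formula.
        if sum(VALENCE[a] for a in atoms) - sum(used) != n_hydrogens:
            continue
        if is_connected(n, bonds):
            structures.add(canonical_form(atoms, bonds))
    return structures


if __name__ == "__main__":
    print(len(generate(["C", "C", "O"], 6)))       # C2H6O
    print(len(generate(["C", "C", "C", "O"], 8)))  # C3H8O
```

This approach is only practical for a handful of heavy atoms, since both the enumeration and the canonical form grow combinatorially. That cost is the motivation for orderly schemes such as canonical augmentation.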

Structure Generator Results

# Output Molecules

Compound          Elemental Composition  EC only       1 Fragment
Glycine           C2H5NO2                84            6
Phenylalanine     C9H11NO2               277,810,163   4,037,499
Malic acid        C4H6O5                 8,070         1,601
D-Cysteine        C3H7NO2S               3,838         100
p-Cresol sulfate  C7H8O3S                10,203,389    19,940

2 Fragments and 3 Fragments (column assignment not recoverable from the transcript): 93,137; 948; 584; 278

MOLGEN: same # of molecules

Lots of candidate structures

Metabolite-likeness

Data sets: HMDB 8K, ZINC 21M

Preparation: Standardization, Diversity Selection

Training Set: 532 + 532

Test Set: 6.4K + 6.4K

Descriptors: Atom Counts, Physicochemical desc., MDL Public Keys, FCFP_4, ECFP_4

Classifiers: SVM, RF, BC (5-fold CV)

Output: Metabolite likeness

In collaboration with Andreas Bender, Cambridge Univ.
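
The 5-fold CV step on a balanced training set can be sketched with the standard library alone. This is an illustration of stratified k-fold splitting in general, not the protocol actually used for the metabolite-likeness models. The function names are hypothetical, and the assumption that label 1 means metabolite and label 0 means non-metabolite is mine.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Deal the indices of each class round-robin into k folds."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def cross_validation_splits(labels, k=5, seed=0):
    """Yield (train indices, test indices), holding out one fold at a time."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(index
                       for number, fold in enumerate(folds) if number != held_out
                       for index in fold)
        yield train, test


if __name__ == "__main__":
    # Same size as the training set on the slide: 532 + 532 molecules.
    labels = [1] * 532 + [0] * 532
    for train, test in cross_validation_splits(labels):
        print(len(train), len(test), sum(labels[i] for i in test))
```

Each classifier would be trained on the train indices and scored on the held-out fold, and the five scores averaged.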

Metabolite-likeness

Model                    Sensitivity  Specificity  AUC
1st RF – MDLPublicKeys   99.84%       87.52%       99.20%
2nd RF – ECFP_4          99.77%       86.36%       99%
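
For reference, sensitivity and specificity as reported above can be computed from true and predicted labels as sketched here. The sketch assumes that metabolites are the positive class (label 1) and non-metabolites the negative class (label 0). The function name and the toy labels are hypothetical and are not data from the talk.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # fraction of true metabolites recognised
    specificity = tn / (tn + fp)  # fraction of non-metabolites rejected
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy example: 4 metabolites followed by 4 non-metabolites.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
    print(sensitivity_specificity(y_true, y_pred))
```

Read this way, a model with high sensitivity and lower specificity keeps nearly all true metabolites but also lets some non-metabolites through the filter.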

Metabolite-likeness, external validation

External validation set: HMDB, ChEMBL, DrugBank

Preparation: Standardization, Random Selection

Models: RF – MDLPublicKeys, RF – ECFP_4

Output: Metabolite likeness

In collaboration with Andreas Bender, Cambridge Univ.

From information to knowledge

www.MetiTree.nl

Group Database

Upload and organize your own trees

Visualize trees

Find similar trees

(to be added: Struct. Gen, Met-likeness)

Conclusions

Chemoinformatics plays a crucial role in the metabolite identification pipeline

Now is the time to challenge this pipeline with real cases

The expert is still needed

Acknowledgements

TNO Quality of Life: Leon Coulier, Albert Tas

Evry University: Jean-Loup Faulon, Davide Fichera

HMP, University of Alberta: David Wishart, Ying (Edison) Dong

Wageningen University/PRI: Egon Willighagen, Justin van der Hooft, Ric de Vos, Jacques Vervoort

Leiden University: Miguel Rojas-Cherto, Piotr Kasper, Michael van Vliet, Theo Reijmers, Rob Vreeken, Ronnie van Doorn, Thomas Hankemeier

University of Cambridge: Andreas Bender