metabolomics - agilent ms connect you begin introduction 6 of performing metabolomics data analysis...

168
Metabolomics Workflow Discovery Workflow Guide

Upload: vumien

Post on 13-May-2018

245 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow

Discovery Workflow Guide

Page 2: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

2

Notices© Agilent Technologies, Inc. 2011

No part of this manual may be reproduced in any form or by any means (including elec-tronic storage and retrieval or translation into a foreign language) without prior agreement and written consent from Agilent Technolo-gies, Inc. as governed by United States and international copyright laws.

Manual Part Number

5990-7067EN

Edition

Revision A, August 2011

Printed in USA

Agilent Technologies, Inc. 5301 Stevens Creek Blvd. Santa Clara, CA 95051

Acknowledgements

Microsoft, Vista, and Windows are U.S. regis-tered trademarks of Microsoft Corporation.

Warranty

The material contained in this document is provided “as is,” and is subject to being changed, without notice, in future editions. Further, to the maximum extent permitted by applicable law, Agilent disclaims all warran-ties, either express or implied, with regard to this manual and any information contained herein, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Agilent shall not be liable for errors or for incidental or con-sequential damages in connection with the furnishing, use, or performance of this docu-ment or of any information contained herein. Should Agilent and the user have a separate written agreement with warranty terms cov-ering the material in this document that con-flict with these terms, the warranty terms in the separate agreement shall control.

Technology Licenses

The hardware and/or software described in this document are furnished under a license and may be used or copied only in accor-dance with the terms of such license.

Restricted Rights

If software is for use in the performance of a U.S. Government prime contract or subcon-tract, Software is delivered and licensed as “Commercial computer software” as defined in DFAR 252.227-7014 (June 1995), or as a “commercial item” as defined in FAR 2.101(a) or as “Restricted computer software” as defined in FAR 52.227-19 (June 1987) or any equivalent agency regulation or contract clause. Use, duplication or disclosure of Soft-ware is subject to Agilent Technologies’ stan-dard commercial license terms, and non-DOD Departments and Agencies of the U.S. Gov-ernment will receive no greater than Restricted Rights as defined in FAR 52.227-19(c)(1-2) (June 1987). U.S. Government users will receive no greater than Limited Rights as defined in FAR 52.227-14 (June 1987) or DFAR 252.227-7015 (b)(2) (November 1995), as applicable in any technical data.

Safety Notices

CAUTION

A CAUTION notice denotes a hazard. It calls attention to an operating procedure, practice, or the like that, if not correctly performed or adhered to, could result in damage to the product or loss of important data. Do not proceed beyond a CAUTION notice until the indicated conditions are fully understood and met.

WARNING

A WARNING notice denotes a hazard. It calls attention to an operating procedure, practice, or the like that, if not correctly performed or adhered to, could result in personal injury or death. Do not proceed beyond a WARNING notice until the indicated conditions are fully understood and met.

Page 3: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

3

Contents

2 Before You Begin 4Introduction 5Overview of the workflow 7Required items 8Compliance 11

3 Metabolomics Overview 12What is metabolomics? 13Capabilities of the metabolomics workflow 17Definition of an experiment 19System suitability 27Improving data quality 31

4 Metabolomics Workflow 35Introduction 36Features of example data sets 37Finding with MassHunter Qualitative Analysis 38Feature selection for recursion - experiment setup 51Feature selection for recursion - guided workflow 60Recursive finding with MassHunter Qualitative Analysis 75Analysis with Mass Profiler Professional - experiment setup 84Analysis with Mass Profiler Professional - guided workflow 86Analysis with Mass Profiler Professional - advanced workflow 87Analysis with Mass Profiler Professional - analysis 93Analysis with Mass Profiler Professional - additional features 124

5 Process Summary 151What is metabolomics? 152

6 Reference 156Definitions 157References 166

Page 4: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Before You Begin

Make sure you read and understand the information in this chapter and have the necessary computer equipment, software, experiment design, and data before you start the analysis.

Introduction 5Overview of the workflow 7Required items 8Compliance 11

Page 5: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Before You Begin Introduction

Introduction Metabolomics is an emerging field of 'omics' research that is concerned with the characterization and identification of metabolites, the end products of cellular metabolism. Metabolomics research leads to complex data sets involving hundreds to thousands of metabolites. Comprehensive analysis of metabolomics data requires an analytical approach and data analysis strategy that are often unique and require specialized data analysis software that enables cheminformatics analysis, bioinfor-matics, and statistics. Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional together enable metabolomics data analysis.

The first step in the metabolomics workflow, before performing a statistical analysis, is to “find” the signals or peaks in the data that are related. These signals are called features. Feature finding is accomplished using Agilent MassHunter Qualitative Analysis. Agilent Mass Profiler Professional is used to identify the most significant features and perform statistical analyses and interpretation.

MassHunter Qualitative Analysis software allows you to automatically find and extract all spectral and chromatographic information from a sample, even when the components are not fully resolved. Powerful data navigation capabilities per-mit you to browse through compound-specific information in a single sample and compare chromatograms and spectra among multiple samples. The software also includes a customizable user interface and the capability to save, export or copy results into other applications, such as Mass Profiler Professional.

Mass Profiler Professional is a powerful data reduction, analysis, and visualiza-tion tool that allows you to identify statistically significant answers to simple questions presented to complex data sets; data sets such as the chemical signa-tures from biological matrices encountered in metabolomics.

Metabolomic studies involve the process of identification and quantification of the endogenous components that form a chemical fingerprint of an organism, or situa-tion under study, and may involve the process of identifying correlations related to changes in the fingerprint as affected by external parameters (metabonomics). Mass Profiler Professional may be used in the study of metabolomics and metabonomics for small molecule studies, proteomics for protein biomarker studies, and general differential analysis. Regardless of the specific study and molecular class, the pro-cess is referred to as “metabolomics” throughout this workflow.

To increase your confidence in obtaining reliable and statistically significant results, the metabolomics analysis must include a carefully thought out experimental design that includes the collection of replicate samples. Replicate data collection and proper experimental definition establish the quality and significance of the Mass Profiler Professional analysis.

More information The metabolomics discovery workflow is part of the collection of Agilent manuals, help, application notes, and training videos. The current collection of manuals and help are valuable to users who understand the metabolomics workflow and who may require familiarization with the Agilent software tools. The collection of training videos provide a step-by-step guide to using the software tools to reduce example CG/MS and LC/MS data but require a significant time investment and ability to extrapolate the example processes. This workflow provides a step-by-step overview

5

Page 6: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Before You Begin Introduction

of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional.

The following selection of publications provides materials related to metabolomics and Agilent Mass Profiler Professional:

• Application: An LC/MS Metabolomics Discovery Workflow for Malaria-Infected Red Blood Cells Using Mass Profiler Professional Software and LC-Triple Quadrupole MRM Confirmation (Agilent publication 5990-6790EN, November 19, 2010)

• Brochure: Emerging Insights: Metabolomics at Agilent (Agilent publication 5990-6048EN, September 10, 2010)

• Brochure: Integrated Biology from Agilent: The Future is Emerging (Agilent publication 5990-6047EN, September 1, 2010)

• Primer: Proteomics: Biomarker Discovery and Validation (Agilent publication 5990-5357EN, February 11, 2010)

• Brochure: Agilent Mass Profiler Professional Software (Agilent publication 5990-4164EN, December 22, 2009)

• Primer: Metabolomics: Approaches Using Mass Spectrometry (Agilent publication 5990-4314EN, October 27, 2009)

• Application: Profiling Approach for Biomarker Discovery using an Agilent HPLC-Chip Coupled with an Accurate-Mass Q-TOF LC/MS (Agilent publication 5990-4404EN, October 20, 2009)

A complete list of references may be found in “References” on page 166.

NOTE

This manual gives links to most references. If you have an electronic copy of this manual, you can easily download the documents from the Agilent literature library. Look for and click the blue hypertext; for example, you can click the “Agilent literature library” link in the previous sentence.

If you have a printed copy, go to the Agilent literature library at www.agilent.com/chem/library and type the publication num-ber in the Keywords or Part Number box. Then click Search. (Note: If you type the publication number into the Keywords box, you find the publication number and additional publica-tions that reference the publication number.)

“Definitions” on page 157 contains a complete list of refer-ences.

6

Page 7: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Before You Begin Overview of the workflow

Overview of the workflow

The metabolomics workflow describes the basics of metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Profes-sional to formulate statistically significant answers to simple questions presented to complex data sets.

The five (5) specific goals of the metabolomics workflow are:1 Present general definitions of metabolomics and the metabolomics workflow

to highlight the analytical applicability of MassHunter Qualitative Analysis and Mass Profiler Professional to a wide variety of samples: such as metabolomics, proteomics, food safety, environmental, forensics, toxicology, petrochemical, and biofuels.

2 Introduce the concept of formulating a hypothesis, a question that proposes a possible correlation observed in the data, that differentiation identified within the data answers.

3 Relate the question to the sample collection and preparation so that a reason-able expectation is established for the outcome.

4 Present an effective step-by-step process for metabolomics discovery using MassHunter Qualitative Analysis and Mass Profiler Professional to serve as a guide for metabolomics discovery using any data set.

5 Present an effective step-by-step process for employing targeted analyses using quantitative data imported into Mass Profiler Professional.

This workflow describes how to use the following programs together: Agilent Mass-Hunter Qualitative Analysis software and Agilent Mass Profiler Professional soft-ware.

You can use this workflow as a road map for any analysis that requires the identifi-cation of statistically significant answers to simple questions presented to complex data sets.

Advantages of this workflow

This workflow automates many parts of metabolomics data analysis and provides power visual tools for statistically answering the question proposed to the experi-ment.

What you cannot do with the workflow

Statistically meaningful results require that the experiment is performed with a clear understanding of the variables and that the data is collected with sufficient repli-cates.

Safety notes This workflow does not involve the collection of data or operation of any instrument.

7

Page 8: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Before You Begin Required items

Required items The Metabolomics Workflow performs best when using the hardware and software described in the “required” sections below. The required hardware and software is used to perform the data analysis tasks shown in Figure 1.

Figure 1 Agilent hardware and software used in performing metabolomics.

Required hardware • PC running Windows• Minimum: XP SP3 (32-bit) or Windows 7 (32-bit or 64-bit) with 4 GB of RAM• Recommended: Windows 7 (64-bit) with 8 GB or more of RAM

• At least 50 GB of free space on the C Partition of the hard drive• Data from an Agilent GC/MS, LC/MS, CE/MS and/or ICP-MS system or data that

may be imported from another instrument.

Required software • Agilent Mass Profiler Professional Software B.02.01 or later

Agilent Mass Profiler Professional software is a chemometrics software package designed to exploit the high information content of mass spectrometry data. Researchers can easily import, analyze and visualize GC/MS, LC/MS, CE/MS and ICP-MS data from large sample sets and complex MS data sets.

Mass Profiler Professional integrates smoothly with Agilent MassHunter Work-station and ChemStation software, and is ideal for any MS-based application where you need to determine relationships among sample group and variables, including metabolomics, proteomics, food safety, environmental, forensics and toxicology.

For metabolomics and proteomics studies, the optional Agilent Pathway Architect software helps you evaluate MS data in biological context.

8

Page 9: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Before You Begin Required items

• Agilent MassHunter Qualitative Analysis software, Version B.03.01 or B.04.00 or later

The Agilent MassHunter Qualitative Analysis software allows you to automati-cally find and extract all spectral and chromatographic information from a sample, even when the components are not fully resolved. Powerful data navigation capa-bilities permit you to browse through compound-specific information in a single sample and compare chromatograms and spectra among multiple samples. The software also includes a customizable user interface and the capability to save, export or copy results into other applications.

• Agilent MassHunter Data Acquisition software, Version B.03.02 or B.04.00 or later

The DA Reprocessor program is a utility that is shipped with the Agilent Mass-Hunter Data Acquisition software. It is included on the Data Acquisition Utilities disk. See the Data Acquisition Installation Guide for information on installing this program. The version B.0X of the MassHunter Data Acquisition software must match the version of MassHunter Qualitative Analysis software (for example, B.03.02 MassHunter Data Acquisition must be used with B.03.01 MassHunter Qualitative Analysis and B.04.00 MassHunter Data Acquisition must be used with B.04.00 MassHunter Qualitative Analysis).

• Agilent MassHunter Quantitative Analysis software, Version B.03.02 or later

The MassHunter Quantitative Analysis software supports simple and efficient review of large multi-compound quantitation batches. A graphical “Batch-at-a-Glance” interface allows you to navigate results by compound or sample, or switch between the two approaches. A sophisticated quantitation engine allows you to set up over 20 different outlier criteria, and a parameter-less integrator facilitates reliable unsupervised quantitation. The ability to filter results and focus on outliers or questionable peak integrations significantly reduces the data review time for large multi-compound batches. A method task editor and “Curve-Fit Assistant” provide for simple method and multi-level calibration setup.

Optional software • Agilent ChemStation software

Agilent ChemStation handles a wide variety of separation techniques such as GC, LC, LC/MS, CE and CE/MS. It is a scalable data system ideally suited for applica-tions in all industries ranging from early product development to quality control. Extensive customization capabilities as well as configurable regulatory compli-ance provide the flexibility to support different workflows. Sophisticated level-5 control and monitoring of LAN-based instruments ensures fast and flexible data acquisition, which is complemented by advanced data analysis and reporting capabilities for highest productivity.

• AMDIS

AMDIS is an acronym for the automated mass spectral deconvolution and identi-fication system developed by NIST. (http://www.amdis.net) AMDIS helps analyze GC-MS data of complex mixtures, even data with strong background ions and

9

Page 10: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Before You Begin Required items

coeluting peaks. AMDIS is not for use with data collected in SIM mode. AMDIS automatically extracts pure (background free) component mass spectra from highly complex GC-MS data files and uses these purified spectra when searching a mass spectral library.

• MassHunter ID Browser B.03.01 or later

ID Browser, which is built into Mass Profiler Professional and Qualitative Analy-sis, allows compound identification using:• LC/MS Personal Compound Database (METLIN, pesticides, forensics)• GC/MS libraries (NIST and Fiehn library)• Empirical Formula Calculation using Agilent's Molecular Formula Generator

(MFG) algorithm

Compounds may be quickly and easily identified within the Mass Profiler Profes-sional environment. ID Browser automatically annotates the entity list and puts the compound names onto any of the various visualization and pathway analysis tools.

• PCDL METLIN Personal Metabolite Database

METLIN is a metabolite database for metabolomics containing over 20,000 com-pounds; it also represents a data management system designed to assist in a broad array of metabolite research and metabolite identification by providing pub-lic access to its repository of current and comprehensive mass spectral metabo-lite data. (http://metlin.scripps.edu/)

• Agilent Fiehn Metabolomics GC/MS RTL Library

The Agilent Fiehn GC/MS Metabolomics RTL Library, developed in cooperation with Dr. Oliver Fiehn, contains searchable EI spectra and retention indexes for hundreds of common metabolites. The Fiehn library integrates with Agilent's other software tools for GC/MS metabolomics to deliver metabolite identities faster and expand knowledge of metabolomic samples.

10

Page 11: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Before You Begin Compliance

Compliance 21 CFR Part 11 is a result of the efforts of the US Food and Drug Administration (FDA) and members of the pharmaceutical industry to establish a uniform and enforceable standard by which the FDA considers electronic records equivalent to paper records and electronic signatures equivalent to traditional handwritten signa-tures. For more information, see

http://www.fda.gov/RegulatoryInformation/Guidances/ucm125067.htm

MassHunter Data Acquisition Compliance Software includes the following features which support 21 CFR Part 11 compliance:• Hash Signature for data files allow you to check the integrity of files during a

compliance audit• Roles that restrict actions to certain users• Method Audit Trail Viewer

MassHunter Quantitative Analysis Compliance Software includes the following fea-tures which support 21 CFR Part 11 compliance:• Security measures ensuring the integrity of acquired data, analysis, and report

results• Comprehensive audit-trail features for quantitative analysis, using a flexible and

configurable audit-trail map• Customizable user roles and groups that allow an administrator to individualize

user access to processing tasks

Before you begin creating methods and submitting studies, you may decide to install MassHunter Data Acquisition Compliance Software and MassHunter Quantitative Analysis Compliance Software.

The Quantitative Analysis Compliance program is installed separately from the Quantitative Analysis program. See Agilent MassHunter Quantitative Analysis Com-pliance Software Quick Start Guide (Agilent publication G3335-90099, Revision A, February 2011) for instructions on installing the Compliance program.

The Data Acquisition Compliance program is installed automatically with the Mass-Hunter Data Acquisition software. See Agilent MassHunter Data Acquisition Com-pliance Software Quick Start Guide (Agilent publication G3335-90098, Revision A, February 2011) for instructions on enabling and using the MassHunter Compliance Software.

Roles When Compliance is enabled, only certain users can perform certain actions. For example, the user that logs on to the system to submit a study needs to have certain Quantitative Analysis privileges to automatically build the quantitative analysis method.

11

Page 12: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Overview

Agilent MassHunter Qualitative Analysis and Mass Profiler Professional are powerful data reduction, analysis, and visualization tools that allow you to iden-tify statistically significant answers to questions from complex data sets such as the chemical signatures from biological matrices.

What is metabolomics? 13Capabilities of the metabolomics workflow 17Definition of an experiment 19System suitability 27Improving data quality 31

Page 13: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Overview What is metabolomics?

What is metabolomics? Metabolomics is the process of identifying and quantifying of all metabolites of an organism in a specified biological state. The metabolites of an organism represent a chemical “fingerprint” of the organism at a well defined state where the state is defined by a set of specific circumstances.

When one or more of the attributes of the state of the organism are manipulated, those attributes are referred to as independent variables. The biological response to the change in the attributes may manifest in a change in the metabolic profile. Each metabolite that undergoes a change in expressed concentration is referred to as a dependent variable. Metabolites that do not show any change with respect to the independent variable may be valuable as control or reference signals.

The metabolites in a sample may be individually referred to as a compound, feature, element, or entity during the various steps of the metabolomic data analysis. When hundreds to thousands of dependant variables (e.g., metabolites) are available, chemometric data analyses may be employed to reveal accurate and statistically meaningful correlations between the attributes (independent variables) and the met-abolic profile (dependant variables). Meaningful information learned from the metab-olite responses can subsequently be used for clinical diagnostics, for understanding the onset and progression of human diseases, and for treatment assessment. There-fore, metabolomic analyses are poised to answer questions related to causality and relationship as applied to chemically complex systems, such as organisms.

You can use the Agilent Metabolomics Workflow as a road map for any analysis that requires the identification of statistically significant answers to simple questions presented to complex data sets. The metabolomics workflow may be used to per-form the following analyses:

• Compare two or more biological groups• Find and identify potential biomarkers• Look for biomarkers of toxicology• Understand biological pathways• Discover new metabolites• Develop data mining and data processing procedures that produce character-

istic markers for a set of samples• Construct statistical models for sample classification.

Small molecule tuning (LC/MS)

Metabolomics involves the analysis of small molecules, molecules nominally with a molecular weight from 50 to 6000 amu. Since the best results for metabolomics studies involve the identification of the exact mass of the molecular ion, LC/MS instrument tunes should be adjusted to (1) improve the sensitivity for intact molecu-lar ions and (2) improve the overall sensitivity for small, low-mass, compounds. A typical automated instrument tune optimizes the instrument sensitivity across the entire mass range and may result in lower sensitivity for small molecules combined with higher fragmentation than desired for metabolomics.

13

Page 14: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Overview What is metabolomics?

Examples

Example 1 A typical Agilent metabolomics workflow is illustrated in Figure 2 starting with data acquisition through to analysis involving both un-targeted (discovery) LC/MS and targeted (confirmation) LC/MS/MS analyses. Molecular feature extraction (MFE) and Find by Formula (FbF) are two different algorithms used by MassHunter Qualita-tive Analysis for finding compounds. All results files generated by Agilent analytical platforms can be imported into Mass Profiler Professional for quality control, statis-tical analysis and visualization, and interpretation.

Figure 2 An Agilent metabolomics workflow from separation to pathway analysis typically involving either or both GC/MS and LC/MS analyses.

Example 2 A principal component analysis (PCA) of metabolite data from biological replicate samples shown in Figure 3 highlights variability in metabolite abundance profiles between Infected versus Control samples extracted at pH 7. A significant amount of the variability between the infected versus the control samples at pH 7 is contained in the z component. However, the z component of this PCA does not comprise the significant variability between the infected versus the control samples at pH 2 or 9.

14

Page 15: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Overview What is metabolomics?

Figure 3 Principal component analysis of metabolite data from replicate biologi-cal samples highlights variability in metabolite abundance profiles between Infected versus Control samples extracted at pH 7.

Example 3 PCA may be combined with common statistical treatments to find improved discrim-ination among the independent variables and find the effect of the independent vari-ables on the species under evaluation. Figure 4 shows how the effects of an experiment involving two attributes (Species and Treatment) are able to be statisti-cally and visually discerned when processed using PCA and ANOVA. PCA analysis only allowed differentiation between the species. When PCA was combined with ANOVA, differentiation was obtained not only for both species but also for the affected status regardless of the species. The statistical results generated by using Mass Profiler Professional can subsequently be used to generate a data model that automates future identification of affected versus unaffected samples from the tar-get species and thereby provide an indication for the application of the appropriate treatment.

15

Page 16: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Overview What is metabolomics?

Figure 4 Principle component analysis without prefiltering of the data and com-bined with 1-way analysis of variance (ANOVA).

Summary When you have hundreds, thousands, and more dependant variables (e.g., a meta-bolic profile), conventional target data analysis and correlation becomes difficult and resource prohibitive. Chemometric data analyses using Agilent Mass Profiler Profes-sional allows the analyst to obtain accurate and statistically meaningful information correlating changes within the metabolic profile to established independent vari-ables.

16

Page 17: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Overview Capabilities of the metabolomics workflow

Capabilities of the metabolomics workflow

Mass spectrometry has been utilized in metabolomics research due to its wide dynamic range, reproducible quantitative analysis, and the ability to analyze very complex biological matrices. Due to the complex nature of these samples, separa-tion (gas chromatography, liquid chromatography or capillary electrophoresis) is often performed before mass analysis to facilitate the detection of as many metabo-lites as possible. The most compatible separation and analysis techniques for com-mon classes of compounds are shown in Figure 5.

Figure 5 Classes of compounds and the analytical techniques with which they are most compatible.

Discovery metabolomics using Mass Profiler Professional involves the comparison of metabolomes (the full metabolite complement of an organism) between control and test groups to find differences in their profiles. Usually, several steps are involved in discovery metabolomics analyses: profiling, identification, and interpreta-tion.

Profiling (also known as differential expression analysis) involves finding the inter-esting metabolites with statistically significant variations in abundance within a set of experimental and control samples. Profiling combines the targeted profiling of known metabolites with comprehensive feature extraction to find unexpected metabolites. A metabolite is represented by its molecular features, which are defined by the combination of retention time, mass or mass spectra and abundance.

The four steps involved in profiling are

1. Analysis: Analysis of samples by GC/MS or LC/MS is required to separate and detect all of metabolites. Because metabolomes are large and exhibit significant natural variation, even in normal organisms, real differences can only be seen by analyzing large numbers of samples containing large numbers of compounds. Analytical instruments that have low error rates and facilitate high throughput are necessary.

2. Feature Finding: Feature finding specifically identifies all of the metabolites by mass and retention time. An undetected metabolite is a lost opportunity; it is essential to find as many of the metabolites in a sample as possible. This goes beyond simple chromatographic peak finding. Even with the best separation, a peak can contain multiple components. You can find multiple components in the

17

Page 18: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Overview Capabilities of the metabolomics workflow

same peak using the MassHunter Qualitative Analysis software as shown in Figure 6.

Figure 6 Deconvolution using MassHunter Qualitative Analysis finds metabolites that are chromatographically unresolved or poorly resolved. Deconvolution gener-ates an extracted ion chromatogram and a reconstructed, single component spec-trum for each metabolite.

3. Data Normalization: Normalization allows data collected over a period of time to be corrected for changes in retention time and/or response so that a single feature common to several samples is not treated as a unique feature being sepa-rately sought in each sample.

4. Statistical Analysis: Statistical data analysis is used to discover significant dif-ference between the sample sets.

Identification is the determination of the chemical structure of these metabolites after profiling. EI spectra from GC/MS are well suited to spectral library searching. LC/MS spectral data are evaluated using searches performed with a database of metabolite information to help narrow the list of possible candidates. Accurate mass data makes this database searching more effective by narrowing the mass window that needs to be searched and thus reducing the number of possible identities.

Interpretation, the last step in the workflow, makes connections between the metabolites discovered and the biological processes or conditions. Once the metab-olites are identified, it is necessary to understand their relation to biological path-ways in metabolism by interpreting the results of the experiment. Pathway analysis makes connections between the metabolite markers discovered and the biological processes or conditions being studied, and helps to elucidate the biological rele-vance of metabolomics data in a systems context. Doing pathway analysis typically requires integration of metabolomics data with genomics and proteomics data.

18

Page 19: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Overview Definition of an experiment

Definition of an experiment

Agilent Mass Profiler Professional enables the process by which the vast data gen-erated in metabolomic studies is reduced to the significant information from which cause and effect correlations may be made. Following a systematic process medical professionals and research scientists may identify correlations in organism meta-bolic profiles as a result of disease or chemical exposure. Chemical exposure for most purposes may be classified as desired, such as in therapeutic treatment, or undesired, such as from a harmful environment. Whether the organism is exposed to disease or chemical influence, either case causes a change in the organism's state and subsequent metabolic profile. Analysis of the metabolic profile and corre-lation of the changes in the metabolic profile to the organism's current state as a result of a probable cause is made possible using both Agilent Mass Hunter Qualita-tive Analysis and Agilent Mass Profiler Professional.

For a living organism a complete metabolic understanding of the organism leads to the identification and quantitation of all of the metabolites of the organism's cellular makeup in a given state at a given point in time. Commensurately, the complete met-abolic understanding includes an understanding of the metabolic response, the change in the metabolic composition from the normal function, in altered function(s) as a result of specific disease or external influences.

Metabolic pathways are complex and interconnected (see Figure 7) making exact correlation of any change in a single metabolite, due to the organism response to disease or other influence, nearly impossible. However, due to the nature of the complex and interconnected pathways, study of the change in the metabolic profile does allow correlation to the organism's response to disease and other influences.

Figure 7 Metabolic pathways are complex and interconnected reducing the effectivity of correlating an organism's response using single metabolites compared to correlation of the organism's response using the metabolic profile.

Agilent's suite of tools correlate changes in the metabolic profile to empirical sam-ples taken across a set of conditions well defined by the medical and research per-sonnel. Replicate research and diagnostic samples following well defined processes

19

Page 20: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Overview Definition of an experiment

involving controls, treatments, and exposures (independent variables) means that the acquired metabolic profile data has a significant potential to provide answers to correlations between the independent and dependant variables. Agilent's tools facil-itate the data extraction, data processing, and statistical analysis that lead to answers regarding critical questions regarding cause and effect correlations pre-sented to complex data sets.

The hypothesis - formulating the question

The first and most important step in the metabolomics workflow is to formulate the question (hypothesis) of correlation that is answered by the analysis. This question is a statement that proposes a possible correlation, for example a cause and effect, between a set of defined attributes (independent variables) and the resulting meta-bolic profile (dependent variables). The process described in this metabolomics workflow is used to prove or disprove the hypothesis.

Key elements in the sample specimens

Formulation of the hypothesis requires several key elements to be present in the samples that are analyzed. The presence of these elements is a reasonable indicator that data acquired from the samples is suitable for metabolomic analysis to prove or disprove a hypothesis. The samples are taken from a specimen, an individual organ-ism (e.g., a person, animal, plant, or other organism) of a class or group that is used in representation of a whole class or group. For clarification on the terms used throughout the workflow, see “Definitions” on page 157.

Many chemical constituents: Samples taken from a specimen should contain many chemical constituents (metabolites) that are naturally expressed by the specimen or are a product of the specimen during its interaction with controllable externalities.

Complex relationship among the chemical constituents: The relative composition of the chemical constituents within each sample is the result of complex relation-ships such as the biological functions within the specimen.

Subject specimens: The organisms selected for evaluation represent a random set of specimens selected from a larger population. The specimens in-turn are subject to either (a) specific and known perturbations in their externalities (independent variables) and are thus known as subject specimens, or are subject to (b) no pertur-bation by which the specimens are known as control specimens.

Control specimens: A sub-group existing within the sample specimen population that represents the “normal” attributes of the specimens under natural, not per-turbed, externalities. The inclusion of normal sample specimens reduces the occur-rence of false positive and false negative correlations and provides a point of reference with respect to assign a polarity to any observed effect.

Independent variables: The specific and known perturbations that may be made to the subject, or externalities that affect the subject, and which are expected to induce a measurable change in the subject. The relative concentrations of the chem-ical constituents of the subject may be influenced by attributes, or parameters, that are independently controlled through either, (a) informed selection of the subject

20

Page 21: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Overview Definition of an experiment

(controlled sampling), or (b) application of a set of controlled externalities or attri-butes applied to the subject (experiments).

Dependent variables: The measurable response of the subject to the specific and known perturbations made through variation of the independent variables. The sub-ject response is manifest in the relative concentration of the chemical constituents (metabolic profile) with respect to each varied attribute. The data is collected using hyphenated mass spectrometry techniques. If the response of the metabolic profile to the independent variables is known and finite, then traditional quantitation tech-niques may be applicable (for example, GC/MS may be used to study how the abun-dance of marker metabolites change when subject organisms are treated versus not treated for a condition). Otherwise, if the response is unknown or complex, and spans a large number of chemical constituents, the data is suitable for analysis using the metabolomic workflow.

Hypothetical example - Introduction

A hypothetical example involving grape wine is described to help illustrate applica-tion of the key elements to the formulation of a hypothesis.

Grape wines are known to have a very complex chemical makeup. Proper character-ization of a wine and its geographical origin is important to quality control and reve-nue taxation across political boundaries where there may be financial gain through mislabeling the country of origin to realize increased profit. Thus, a simple question related to revenue taxation may be “Can grape wine be correlated to the country of origin?” This question leaves out any concern with regard to the wine variety. The question is directed to find a correlation simply between wine and country of origin that is independent of wine variety for the single purposes of political taxation. The presence of a complex relationship among many chemical constituents, derived from grapes and grape fermentation, and the ability to apply specific and known per-turbations to the sample specimens through controlled sampling leads to a question suitable for answering using the metabolomics workflow. A more precise question stated as a hypothesis is “Does the chemical makeup of grape wine provide a signa-ture (metabolic profile) that correlates to the country of origin?”

The potential independent variables described in this hypothetical example are the county of origin of the grape wine production and wine variety. The dependent vari-ables are manifest in the several thousand chemical constituents that are produced by the wine making process.

Traditional correlative analysis of the question would require the identification and quantification of thousands of chemical constituents and studying how the expres-sion of each constituent is affected by the change in geographical location, while ruling out false effects that may be manifest as a result of the various grape species and wine varieties that are also involved in the study. Without going into any calcu-lations, such a traditional correlative analysis of this hypothetical example should be readily apparent as a costly undertaking. Metabolomics renders such unwieldy anal-yses manageable.

Identification of the independent variables

The object of metabolomics is to reduce the dimensions of the data (the relation-ships among the independent and dependant variables) while retaining those dimen-

21

Page 22: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Overview Definition of an experiment

sions that contribute most to the characteristic variability of the data (retaining the dimensionality of the data that best shows correlation between the samples and the independent variables). One way to identify the independent variables is to evaluate the hypothesis and various restatements of the hypothesis. The dependent variable is always the same, the myriad of chemical constituents expressed by the sample specimen that may or may not change in relation to variations selected as indepen-dent variables.

A hypothesis may be derived through the proposal of a number of more or less spe-cific questions that are important to the research. By evaluating these questions and evaluating their relationship with the available analytical approach to solve the prob-lem, you may identify the key variables, the independent variables and the depen-dent variables.

Hypothetical example - Variables

Returning to the grape wine hypothetical example, the question presented refers to two key variables that may affect the chemical makeup of the wine: (1) country of origin of the grape and wine production and (2) wine variety. Variations in either of these variables are expected to produce a change in the chemical makeup of the wine. The question however is, regardless of the wine variety, can the country of ori-gin be determined. While wine variety is not necessarily germane to the question, if this information is known and entered with the sample data, it may provide a benefi-cial statistical parameter to the metabolomic analysis and may even provide for the future analysis of another hypothesis with minimal additional effort.

Since the correlation of the independent and dependant variables for the hypotheti-cal wine example involves the analysis of thousands of chemicals and potentially hundreds of samples, the question “Can grape wine be correlated to the country of origin?” is an ideal question to address using the metabolomics workflow.

Natural variability Before any statistical analysis is begun, it is important to understand how any one sample from a specimen represents the population as a whole and how increasing the sample size improves the accuracy of the sample set in describing characteris-tics of the population.

Under identical conditions, all life systems produce a range of results. Specimens taken from the population may show one of the following:

• (1) characteristics reflected by the mean of the population (i.e., characteristics shown by the majority of the population)

• (2) characteristics greatly different from those shown by the majority of the population

• (3) characteristics anywhere in between deviating in the positive or negative from the population mean

In many biological and biochemical systems a natural variation to a characteristic exists that is found to show a probability of variation referred to as a normal distribu-tion. Figure 8 on page 23, for instance, shows a natural variation of a population to a characteristic such that 68% of the sampled population would be shown to have the mean characteristic plus or minus one standard deviation. This natural variation of the population response to identical conditions is referred to as natural variability.

22

Page 23: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Overview Definition of an experiment

Thus, any single sample specimen taken from a population is not guaranteed to reflect the mean characteristics of the population.

Figure 8 Natural variability shown as a Gaussian (normal) distribution. Depend-ing on the predefined requirement for significance, if the mean of a sample set is beyond ±2s from the natural variation there may be a significant effect. Similarly, if a particular observation routinely falls beyond ±2s from the natural variability of the data the change producing the effect may be considered significant.

Natural variability is found to occur from inherent randomness and unpredictability in the natural world. Natural variability is found in all life and natural sciences and in all forms of engineering. For example, a population of plants grown under identical conditions of illumination, precipitation and nutrient availability shows a range in growth mass. This range of variable growth mass may be expressed as a mean whereby 95% of the population is expected to show a natural range of variability within two standard deviations of the mean (see Figure 8). In other words, for a set of fixed attributes (independent variables), a representative set of samples taken from the population of plants shows a natural variability in the dependant variable, such as growth mass. When an experiment is undertaken where plants from the same population are subject to variations in the fixed attributes, they show both a change in growth mass response and a natural variability in the change in growth mass. Thus if the entire population is sampled, we see two adjacent normal distribu-tions with means reflecting the plant growth mass under the two conditions (see Figure 9).

Figure 9 Natural variability of populations subject to two different experimental conditions where the means of the data sets fall outside of either mean ±3s.

23

Page 24: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Overview Definition of an experiment

Likewise, an animal population subject to any controlled exposure shows a naturally varying effect of that exposure as expressed through the chemical makeup of serum samples taken from the population. Such unpredictability in the measurable variabil-ity of any biochemical expression must not be mistakenly correlated to deliberate variations of an independent variable.

The natural variability of the data representing a population during the experiment must be understood in order to confidently express any experimental correlation. The investment of time and resources in performing any statistical analysis requires that the natural variability of the subjects be known or reasonably estimated so that the results of the analysis may be conclusively shown to be either within the natural variability (no correlation) or outside of the natural variability and therefore provide for a degree of correlation to the independent variable(s). Experimental data collec-tion that does not incorporate consideration of the natural variability of the data does not yield meaningful results. Thus, crucial to the metabolomics workflow, as with all statistical data treatments, is an understanding and well planned collection of the data; without that, the results follow the adage “garbage in, garbage out.”

Replicate data Consideration of the natural variability of data can only be achieved through repli-cate sampling and measurement of the population. You do not have a guarantee that a single sample specimen from a population represents the mean of the population. Any single sample from a population with a natural variability shown in Figure 8 on page 23 likely lies anywhere within ±4s (s = standard deviation) of the mean of the true population, but in fact that single sample may on a rare occasion fall even fur-ther from the population mean. However, if ten (10) samples are taken from the pop-ulation, the mean of these samples produces a statistically more accurate approximation of the true mean of the population. The accuracy of the approxima-tion of the true population mean proportionally improves with more samples. The true value of the population is achieved only if the entire population is sampled. However, sampling the entire population is not typically feasible due to constraints imposed by time, resources, and finances. On the other hand, the fewer samples that are evaluated the greater the increase that false negative and false positive cor-relations are concluded. Figure 10 on page 25 shows that if too few samples are evaluated and if these samples just happen to be samples lying far from the mean due to natural variability, an incorrect conclusion may be drawn that the change in the independent variable produced no significant change in the population response. The estimate of the standard deviation of the sample mean estimate of a population mean (standard error) is equal to the standard deviation of the samples divided by the square root of the number of samples (Equation 1).

where SE = standard error of the population; s = standard deviation; and N = num-ber of samples.

Thus, as the sample size increases, the likelihood that the data approximates the true response of the population increases. The standard deviation of the sample may

Equation 1 SE σ N( )⁄=

24

Page 25: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Overview Definition of an experiment

become smaller, and the likelihood of making a correct correlation between cause and effect is improved (Equation 2).

where s = standard deviation; N = number of samples; xi = the value of an individual

sample; and = mean (or average) of all N samples.

Figure 10 Replicate data are necessary to distinguish whether the represented populations actually show significant differences. The three data points in the region of overlap of the natural variability may, if too few replicates are selected, lead to a result suggesting a less significant difference in the populations.

Successful application of metabolomics depends on the availability of sufficient rep-licate samples. Coupled with an understanding of the systems under study and a well planned collection of the samples and concomitant data, the statistical data treatment of the replicate samples is the backbone of the metabolomics workflow. A sufficient set of replicate data, ten (10) or more replicates, may provide a significant answer to the hypothesis and prevent a total loss of time and resources invested in performing the described statistical analyses.

Hypothetical example - sampling

Using what has been discussed in regard to natural variability and using replicate data, an experiment design to answer the hypothesis may be proposed. The experi-ment design should not only outline the complete experiment but should also pro-pose an orderly sequence for sampling so that the hypothesis may be evaluated several times, but at least twice, during the course of the metabolomics workflow. By evaluating a set of replicate samples covering the most varied of the independent variables, the experimentalist gains the following:

• Familiarization with the metabolomics workflow• Familiarization with the Agilent tools• Partial answers to the hypothesis

Equation 2 σ 1N---- xi X–( )

i 1=

N

=

X

25

Page 26: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Overview Definition of an experiment

The experience from early evaluations provides feedback to the experimentalist and allows optimal use of time, resources, and funds. By collecting and processing initial data that spans the range of independent variables, timely feedback regarding the quality and direction of the results is gained. If either the quality or the results are not meeting expectations, the remainder of the experiment and/or sampling may be adjusted.

Summary The metabolomics workflow involves the formulation of a hypothesis that may be answered by analyzing the potential correlation of independent variables on the resulting expression of large number of potentially interrelated chemical constitu-ents, dependent variables, through the analysis of replicate samples whose results are significant beyond natural variability. Metabolomics answers questions related to causality and relationships applied to chemically complex systems, such as organisms, where the application of traditional quantitation techniques are resource prohibitive to employ due the vast amount of data.

The metabolomics workflow involves: 1. Formulation of a hypothesis that may be answered by analyzing the data for

potential correlations among the variables. 2. Independent variables, attributes that are deliberately controlled in an experi-

ment. 3. Dependent variables, where the effect of the independent variable on the

organism is only understood through the analysis of the expression of large number of potentially interrelated chemical constituents.

4. Replicate samples so that the results are statistically significant beyond the natural variability of the samples.

26

Page 27: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Overview System suitability

System suitability Data analysis on a large number of samples is required to form an answer to the hypothesis. Data acquisition from a large set of samples may in turn require several days, months, or even longer depending on the experiment definition, sampling, and instrument availability. In order for the data to be meaningfully, statistically pro-cessed, the spectral data acquired over the time scale of the experiment must have a consistent methodology and means to adjust for experimental variability - referred to as system suitability. The data must be acquired using a suitable system that allows for data acquired over a wide range of time and with slight variations to be correlated in such a fashion that the chemical constituents within any data set may be aligned among all of the data sets. Regular instrument tuning and maintenance can help reduce system drift.

Typical GC/MS and LC/MS provide three dimensional data: chromatographic reten-tion time, ion mass to charge ratio, and ion signal intensity (Figure 11). Individual data collected at various periods of time over days, weeks, and months may show variations in any or all three of these dimensions. Such variations are referred to as system drift and may occur even when the same method and instrument is used for all of the data collection.

Figure 11 GC/MS and LC/MS data provide three dimensions of information: retention time, m/z, and signal intensity.

27

Page 28: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Overview System suitability

Regular instrument tuning and maintenance can reduce system drift. Additionally, the Agilent metabolomics workflow provides a means to minimize and remove instrumental variations that may affect each of the data dimensions (retention time, m/z, and signal intensity) through quality processes including retention time align-ment, intensity normalization, mass variation, and baselining. The quality processes of retention time alignment, intensity normalization and baselining are performed within Mass Profiler Professional. The quality process of mass variation is per-formed in MassHunter Qualitative Analysis.

Data quality Quality data that is the result of a well planned experimental design has stable dimensionality and internal markers, such as internal standards, that allow the data to be compared and correlated to data collected over different periods of time, col-lected by different operators, and collected from different instruments. The pro-cesses of retention time alignment, mass variation, intensity normalization, and standards provide both a quality check to evaluate system drift and a means to adjust for instrumental variations to assure meaningful results.

Retention time alignment: Retention time alignment is used to adjust the chromatographic retention time of the eluting components based on the elution of specific components that are (1) naturally present in each sample or (2) deliberately added to the sample through spiking the sample with a known compound or set of compounds that do not inter-fere with the sample. Alignment of retention time removes temporal variations of the chromatographic separation improving the correlation of identical features among data sets collected using different separation columns or using identical sep-aration columns but with differing usage and conditioning (for example, a shorter separation column and a well-used column where the immobile phase has been deactivated result in shorter retention times).

Unidentified compounds from different samples are aligned if they are within a spec-ified tolerance retention time window and provided that the compounds meet crite-ria for mass spectral similarity. GC/MS data alignment is performed as unidentified compounds.

Identified compounds from different samples are aligned based on the similarities of the assigned attributes of compound name and ionization mode. LC/MS data align-ment may be performed as identified compounds, unidentified compounds, or both (referred to as combined in the Mass Profiler Professional software).

More information regarding the specific details of alignment may be found in section 2.1.5 Experiment Creation > Getting Started > Alignment of the Mass Profiler Profes-sional User Manual.

Intensity normalization: Normalization of feature intensities is used to adjust the intensity value from that based on the absolute value of the extracted ion chromatogram (EIC) signal mea-sured at the detector to a relative intensity based on the signal provided by either an (1) internal standard, (2) an external scalar, or (3) both an internal standard and an external scalar.

28

Page 29: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Overview System suitability

The highest quality data is achieved using standards in the data collection. Stan-dards provide a means to align retention time and evaluate and compensation for variations in the m/z and ion intensity of the analytical system. The standard may be (1) one or more known, identified constituents internal to the sample matrix, (2) fab-ricated from one or more chemical compounds specifically acquired and added (spiked) to each sample in a precise amount, or (3) may be a pooled sample from a representative biological matrix that is added to each sample. The use of pooled samples as a standard provides good value from existing materials and provides an ideal match to the rest of the analytical conditions.

An internal standard is selected from each sample data set as a means to minimize the system variability due to instrumentation or sample preparation. After perform-ing an internal standard normalization, the individual samples are normalized to a value that reflects the differences in the signal intensity of each compound across all the samples with respect to the internal standard and free of much system vari-ability.

More information regarding the specific details of normalization may be found in section 2.1.6 Experiment Creation > Getting Started > Normalization of the Mass Profiler Professional User Manual.

Mass variation The adjustment of the mass to charge (m/z) resolution from unit m/z in 1,000 m/z (one part per thousand, ppt) to 0.001 m/z in 1,000 m/z (one part per million, ppm) allows for the unique identification of compounds that may have nearly identical or identical chromatographic behavior. Since some methods most amenable to the experiment may not separate all compounds, or the time involved for a complete separation may not necessarily be practical, the added dimension provided by adjusting the mass variation for extracting ion chromatograms allows co-eluting compounds to be uniquely identified. The technique of using mass variation to iden-tify co-eluting compounds is referred to as deconvolution.

The chromatographic separation thus need not separate all of the compounds as viewed in the total ion chromatogram (TIC) but does need to separate all compounds that have a unique molecular formula.

More information regarding the specific details of mass variation may be found in the MassHunter Qualitative Analysis on-line Help under Finding Compounds.

Baselining Baselining is a means of changing the feature signal intensities (referred to as an abundance unit in Mass Profiler Professional) found in each sample data set from a value that is based on the original instrumental parameters to a new “abundance unit” that relates the feature strength with respect to the feature strength among all of the sample data. For example a feature that shows several orders of magnitude of variation (i.e., variation from 10,000 to 10,000,000 with a mean value of 150,000) across the sample data sets can be converted to variation that spans positive and negative values with respect to the mean. By adjusting the feature signal strength through baselining the feature strength may be statistically weighted in a manner more consistent to biological significance rather than the features amenability for ionization and detection by the instrumental method.

More information regarding the specific details of standards may be found in section 2.1.7 Experiment Creation > Getting Started > Baselining of the Mass Profiler Pro-

29

Page 30: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Overview System suitability

fessional User Manual and by reading statistical literature on baselining approaches. Mass Profiler Professional employs three options for baselining: Z-transform, median across samples, and none.

Z-transform Z-transform works best with large data sets that in general do not have missing fea-tures among the data sets. Quantitative data is a good candidate for processing with this baselining approach.

Median Median across samples acts to treat the data sets with more even weight by sub-stantially reducing the influence of features that have particularly low or very large signals intensities. In this approach the abundance of each feature is adjusted by subtracting the median of the feature across all of the data sets: this results in a value of zero (0) abundance representing a feature that has a strength equal to the median of the feature as present on all of the data sets. A negative or positive abun-dance is a feature present in abundance less than or greater than the median, respectively.

None None simply treats feature compounds with greater abundances as more significant than features with smaller abundances.

Summary System suitability involves collecting data that provides a means for evaluating sys-tem drift and adjusting for instrumental variations to assure quality results. Four approaches are employed together in order to produce the highest quality results:

1. Retention time alignment 2. Intensity normalization by using internal standards in the data collection

experiment 3. Chromatographic deconvolution using m/z variation 4. Baselining of the features across the data sets

30

Page 31: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Overview Improving data quality

Improving data quality Obtaining analytical results from a limited number of samples to conclude a hypoth-esis about a much larger population requires a large number of samples and involves skills that resemble art and in addition to science. The best results of a metabolomics statistical study involve using more than one method of sample col-lection. Using more than one method of sample collection inherently leads to the collection of more samples and, as discussed previously, more samples has a posi-tive impact to the significance of the results. Four main methods of sample collec-tion are census, survey, experiment, and study.

Methodologies The purpose of each of the sampling methodologies (census, survey, experiment and observation) is to provide data that spans the desired range of the independent vari-ables (variable states) which may include collecting data at various points of time (time points) during an experiment. Regardless of which methodology is used to col-lect the data, replicate data samples are critical to the successful metabolomics analysis.

Census A census collects samples from every member of a population. In most experimental designs a census is not practical because of the cost and resource commitment required.

Survey A survey is an experiment that obtains samples from a subset of a population, in order to estimate the population attributes. In a survey, samples are taken from ran-dom members of the population. All, or a portion, of the survey samples may be used to provide an estimate of the natural variability of the population as the baseline to the experimental study. The same, or the remaining, survey samples may be subject to experiments and thus used to estimate the response of the population to the experimental conditions.

31

Page 32: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Overview Improving data quality

Experiment All of the samples studied in metabolomics form part of an experiment designed to answer a hypothesis regarding cause and effect correlations. Experiment-based sampling may be in one of two forms:

(1) Experimental sampling: taking random samples from a population that meets experimentally defined criteria

(2) Experimental exposure: subjecting random samples taken from a total popula-tion to experimentally defined conditions

When the independent variables of an experiment exist as a natural part of the pop-ulation, such as the geographical region a grape is grown, the experiment involves experimental sampling. When the hypothesis is in relation to the effect shown by an organism to controlled conditions, such as a response to proposed treatments, then a random sample of the population is subject to independent variables specified by the experimental exposure. In either case, the hypothesis is related to causality of one or more independent variables to the metabolic profile which may relate to observable effects. The experiment is controlled in the sense that (1) sample sub-jects are selected from defined populations and (2) known treatments are applied to each group of samples.

In the experiment analysis, results are evaluated by comparing the dependent vari-able (the metabolic profile) against the known conditions of the independent vari-ables. A conclusion regarding the causality of the treatment on the organism's response is then made against the hypothesis.

Observation Data acquired in an attempt to understand causality where no ability exists to (1) control how subjects are sampled and/or (2) control the exposure each sample group receives is considered an observational study. The data obtained from an observational study may be useful as a control sample or, if sufficient minimal infor-mation about the sample exists, may be useful for quality control against an experi-mental correlation derived from the hypothesis.

Replicate data Using more than one method of sample collection inherently leads to the collection of more samples and has a positive impact on the significance of the results by including more samples that provide consideration of the natural variability. The accuracy of the approximation of the true population mean proportionally improves with more samples. The true value of the population is only achieved if the entire population is sampled (census sampling).

It is recommended to use a sample set of at least ten (10) replicates for each attri-bute within an independent variable or within each permutation of attributes when more than one independent variable exists. Having too few samples leads to an increase in false negative and false positive correlations.

Hypothetical example - improving data quality

In the grape wine example, data quality involves a survey based on experimental sampling to address the question of “Can grape wine be correlated to the country of origin?” The definition of the hypothesis involves two independent variables: (1) county of origin and (2) wine variety. In the Agilent LC/MS example data set

32

Page 33: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Overview Improving data quality

related to identifying red wines, two independent variables exist with the indepen-dent variable “Variety” having three parameters and the independent variable “Country of Origin” having twelve parameters. This data set is shown in Figure 12 on page 33. An ideal metabolomics analysis of the data involves 360 samples (3 variet-ies multiplied by 12 countries of origin multiplied by 10 replicates per permutation). However, practical limitations resulted in this example data set being limited to 97 samples. Thirty (30) samples would be expected for a particular country of origin to be properly represented, ten samples of each wine variety from each country. In the data table shown in Figure 12 on page 33, France has the potential for the best sta-tistical representation. A further reflection on the data set shows that the sampling is more ideally suited for an initial analysis of the question “Can grape wine be cor-related to variety?” If the answer to this question appears favorable at this early stage of sampling it may then be used in support of continuing the sampling to answer both “Can grape wine be correlated to the country of origin?” and “Can grape wine be correlated to variety?” with the appropriate sample size of at least 360 samples.

Figure 12 Red wine samples and the resulting PCA analysis where the number of molecular features was reduced from 20,506 to 26 using metabolomics statistical treatments provided by Mass Profiler Professional.

33

Page 34: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Overview Improving data quality

Summary Improved data quality for metabolomics analysis comes from matching the sampling methodology to the experimental design so that replicate data is collected that spans the variable states of the parameters and/or covers an adequate number of time points. If the number of samples that are selected appropriate to the population under study is larger, then the statistical answer made to the hypothesis is better. An understanding of the methodologies used in sampling and using more than one method of sample collection have a positive impact to the significance of the results.

34

Page 35: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow

Metabolomics is the process of identifying and quantifying of all metab-olites of an organism in a specified biological state. Metabolomic analy-ses can answer questions related to causality and relationships.

Introduction 36Features of example data sets 37Finding with MassHunter Qualitative Analysis 38Feature selection for recursion - experiment setup 51Feature selection for recursion - guided workflow 60Recursive finding with MassHunter Qualitative Analysis 75Analysis with Mass Profiler Professional - experiment setup 84Analysis with Mass Profiler Professional - guided workflow 86Analysis with Mass Profiler Professional - advanced workflow 87Analysis with Mass Profiler Professional - analysis 93Analysis with Mass Profiler Professional - additional features 124

Page 36: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Introduction

Introduction Metabolomics is the comparative analysis of endogenous metabolites found in bio-logical samples and involves the comparison of two or more groups, finding of bio-markers, identification of biomarkers, understanding of biological pathways, and discovery of new metabolites.

The Agilent metabolomics workflow consists of four major components: 1. Find Compounds by Molecular Feature (MassHunter Qualitative Analysis).

Molecular features are extracted from the data based on their mass spectral and chromatographic characteristics. The process is referred to as Molecular Feature Extraction (MFE).

2. Feature Selection (Mass Profiler Professional). An initial differential expres-sion identifying the most statistically significant features from among all of the features previously found using Find Compounds by Molecular Feature.

3. Find Compounds by Formula (MassHunter Qualitative Analysis). Importing the most statistically significant features back into MassHunter Qualitative Analy-sis as targeted features results is an improvement in finding of the features from the samples. This repeated feature analysis is referred to as recursion. An improved reliability in finding leads to a concomitant improvement in the accuracy of the analysis.

4. Analysis (Mass Profiler Professional). Recursive finding of the most statisti-cally significant features are processed by Mass Profiler Professional into a final statistical analysis and interpretation. The results from the final interpre-tation may be used to prove or disprove the hypothesis.

36

Page 37: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Features of example data sets

Features of example data sets

The metabolomics workflow is illustrated using a pair of data sets that were acquired to determine if a metabolic correlation exists to variations in either one or two independent variables. The results of both data sets are presented.

The One-Variable experiment presents an analysis of a metabolomic response to changes in a single independent variable using four parameters. For a single variable with four parameters, an ideal experiment involves at least ten (10) replicates for each of the parameters leading to a sample size of at least forty (40) samples. In this example the minimum sampling conditions are met. The parameters are a single control that represents the organism without perturbation and three variations where the organism is subject to three conditions established by the experiment design. In summary, the One-Variable Experiment contains a single independent vari-able with four parameters and ten replicates for each parameter meeting the mini-mum sampling recommendation.

The Two-Variable experiment presents an analysis of a metabolomic response to changes in two independent variables, each with two parameters. For two-variables, each with two parameters, an ideal experiment involves at least ten (10) replicates for each permutation of the parameters (See Chapter 2; Improving Data Quality; Rep-licate Data) leading to a sample size that is calculated by multiplying 2 parameters multiplied by 2 parameters multiplied by 10 replicates for an ideal minimum sample size of forty (40) samples. In this example the minimum sampling conditions are not met; four replicates exist for each permutation for a total of sixteen (16) samples. The parameters of the first variable are represented by a control that represents the organism without perturbation and a variation where the organism has been subject to and is suffering a known perturbation. The parameters of the second variable are represented by a pair of metabolite extraction techniques where the first parameter represents the current state of the art extraction process and the second parameter represent the addition of a step that is designed to improve metabolite extraction. In summary, the two-variable experiment contains two independent variables with a total of four permutations and four replicates for each permutation. While the sam-pling falls short of the minimum sampling recommendation, the strong correlation of cause and effect in this experiment overcomes the sampling deficiency and provides support for further investment in the metabolomics question being studied.

37

Page 38: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Finding with MassHunter Qualitative Analysis

Finding with MassHunter Qualitative Analysis

This is the first of four components of the metabolomics workflow:

The Agilent metabolomics workflow consists of four major components: 1. Find Compounds by Molecular Feature (MassHunter Qualitative Analysis).

Molecular features are extracted from the data based on their mass spectral and chromatographic characteristics. The process is referred to as Molecular Feature Extraction (MFE).

2. Feature Selection (Mass Profiler Professional). An initial differential expres-sion identifying the most statistically significant features from among all of the features previously found using Find Compounds by Molecular Feature.

3. Find Compounds by Formula (MassHunter Qualitative Analysis). Importing the most statistically significant features back into MassHunter Qualitative Analy-sis as targeted features results is an improvement in finding of the features from the samples. This repeated feature analysis is referred to as recursion. An improved reliability in finding leads to a concomitant improvement in the accuracy of the analysis.

4. Analysis (Mass Profiler Professional). Recursive finding of the most statisti-cally significant features are processed by Mass Profiler Professional into a final statistical analysis and interpretation. The results from the final interpre-tation may be used to prove or disprove the hypothesis.

Start Agilent MassHunter Qualitative Analysis

1. Start MassHunter Qualita-tive Analysis software.

MassHunter Qualitative Analysis is the software tool used to perform the function of finding molecular features in the raw and CEF data files. The molecular features are then imported into Mass Profiler Professional for analysis. This is an essential pre-requisite to Mass Profiler Professional.

Note: On-line help is available at anytime within MassHunter Qualitative Analysis by pressing the F1 key on the keyboard. When the F1 key is pressed, the information presented specifically relates to the software fields and options available in the active display.

a Double-click the Qualitative Analysis icon located on the desktop.

or (for Qualitative Analysis version B.04.00 or later)

Click Start > Programs > Agilent > MassHunter Workstation > Qualitative Analysis B.04.00

38

Page 39: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Finding with MassHunter Qualitative Analysis

or (for Qualitative Analysis version B.03.01)

Click Start > Programs > Agilent > MassHunter Workstation > Qualitative Analysis

b Select the data file or data files to open.

An individual data file may be selected by using a single click.

An inclusive range of data files may be selected by pressing the Shift key when a second data file is selected.

A number of data files may be selected and added to any previously selected data files by pressing the Ctrl key when a data file is selected.

c Click Open to start MassHunter Qualitative Analysis with the selected data files

or click Cancel to start MassHunter Qualitative Analysis without opening any data files. Data files may opened later by clicking File > Open Data File.

2. Enable advanced parameters.

Advanced parameters must be enabled in MassHunter Qualitative Analysis in order to show tabs labeled Advanced in the method editor and to enable compound importing for recursive finding of molecular features.

a Click the Configuration > User Interface Configuration command.

b Mark the Show advanced parameters check box under Other. See Figure 13.

If the files intended to be processed include GC/MS data, you mark the GC check box under the Separation types heading and mark the Unit Mass (Q, QQQ) check box under the Mass accuracy heading.

Figure 13 MassHunter Qualitative Analysis user interface configuration

c Click the OK button.

d Check to make sure that File > Import Compound is an available command. See Figure 14. This command is necessary to review CEF files before importing into Mass Profiler Professional.

39

Page 40: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Finding with MassHunter Qualitative Analysis

Figure 14 MassHunter Qualitative Analysis import compounds

Enter the parameters for Find Compounds by Molecular Feature

1. Find compounds using Find Compounds by Molecular Feature.

Molecular feature extraction (MFE) involves chromatographic deconvolution as described in “Metabolomics Overview” on page 12 (see Figure 6 on page 18). MFE automatically finds related co-eluting ions, sum the related ion signals into single values, create compound spectra, and report results for each molecular feature.

Co-eluting ions: MFE finds co-eluting ions related to the same compound and create an extracted compound chromatogram (ECC) including isotopes (13C, 15N, 2H, 18O), adducts (most commonly H+, Na+, K+ for positive ions and H- for negative ions), and dimers such as (2M + H)+.

Compound spectra: After creating extracted compound chromatograms, MFE gener-ates individual compound spectra for each molecular feature.

Volume: The area of the ECC. The ECC is formed from the sum of the individual ion abundances within the compound spectrum at each retention time in the specified time window. The compound volume generated by MFE is used by Mass Profiler Professional to make quantitative comparisons.

Composite spectrum: The compound spectrum generated to represent the molecu-lar feature and used by Mass Profiler Professional for recursive analysis and ID Browser.

Results: Each molecular feature is uniquely identified by a retention time, neutral mass, volume, and composite spectrum. An example of the relationships between some of the molecular features and the TIC is shown in Figure 15 on page 41.

40

Page 41: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Finding with MassHunter Qualitative Analysis

Figure 15 Relationship between the TIC, ECC, composite spectrum, and neutral mass

a In the Method Explorer window, click Find Compounds > Find Compounds by Molecular Feature.

• All of the parameters involved in molecular feature extraction are accessed in the tabs presented in the Method Editor section, Find Compounds by Molecu-lar Feature. After the parameters are entered, you run MFE by clicking the Find > Find Compounds by Molecular Feature command or by clicking the Run button in the Method Editor.

• See the Agilent MassHunter Workstation Software Qualitative Analysis Famil-iarization Guide (Agilent Publication G3336-90007, Revision B, February 2011) for more information.

Note: Use the Extraction, Ion Species, and Charge State tabs to enter parameters that control compound finding. Use the remaining tabs to enter parameters to filter the results and display the graphics. See Figure 16 on page 42.

Note: After the first time MFE is run, any subsequent change made only to the tabs that do not affect compound finding reprocess the data much more quickly because the find features algorithm is not repeated. The improved speed for reprocessing the molecular features allows you to review several combinations of parameters to find the results that best suit the experiment.

41

Page 42: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Finding with MassHunter Qualitative Analysis

Figure 16 Overview of the Find Compounds by Molecular Feature tabs

Tabs that control compound finding.

b Edit the parameters on the Extraction tab.

The parameters in this tab allow you to specify features of the source data that allow the MFE algorithm to perform more efficiently. 1. Click the Extraction tab. 2. Select Small molecules (chromatographic) in the Target data type box for

working with metabolomic data.

Note: The data must be collected in profile mode for the Small molecules (infu-sion) target data type. For the other target data type selections, the data must be collected in centroid mode or both mode.

MFE starts the data reduction process by creating a copy of the data file using the centroid of all of the ions. If you collect data in profile mode, it saves processing time if you also collect the data with centroid mode turned on. For all other data collection methods, it is recommended to save the data with centroid data. 3. Click the Use peaks with height button and type a value for the counts that

represents a value at and above which actual ion signals are observed. A typi-cal value is 300 for counts if the background noise is approximately 100 counts.

Note: The target is to set the counts at three times the electronic noise, signal measured by the detector that is not due to actual ions. The electronic noise is found by viewing the background signal level at the higher m/z range (around 1,000 m/z) of a single mass spectrum in the data set. Do not use an averaged or background subtracted mass spectrum. If this value is set too low, MFE takes a very long time to run and find features that are very small. If this value is set too high, then actual features may not be found.

42

Page 43: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Finding with MassHunter Qualitative Analysis

Marking the options under Input data range is not necessary. Using Restrict retention time to and Restrict m/z to limits the location where the MFE searches for features. It is recommended to allow MFE to find all of the features and then to use the filter parameters available in the later tabs to remove unwanted fea-tures.

c Edit the parameters on the Ion Species tab.

The parameters in this tab allow you to specify the ion adducts that the MFE algo-rithm should consider during the process of identifying molecular features. 1. Click the Ion Species tab. 2. Select +H, +Na, and +K for positive ions and -H for negative ions.

Note: Removal of possible ion adducts from the Allowed ion species increases the MFE efficiency. Thus, it is recommended to change solvent and sample deliv-ery bottles to Teflon ® in place of glass to reduce the background sodium levels. Glass bottles leach sodium into the liquid introduction system.

Note: MFE cannot resolve a molecular feature that does involve at least an addi-tion or loss of a proton. 3. If the Salt dominated positive ions (M+H may be weak or missing) check

box is marked, the MFE algorithm reduces the emphasis on identifying proto-nated ions (M+H) because they may be weak or missing. If this check box is cleared, the MFE algorithm places even weight across among the allowed ion species in calculating the molecular weight for each feature. For example, you might mark this box when detecting sugars that are sodium adducted.

4. Mark common Neutral losses if you believe your mass spectrometer system is very energetic and thereby potentially inducing known neutral losses from the molecular ion.

d Edit the parameters on the Charge State tab.

The parameters on this tab allow you to set limits on the allowable ion charge states and to control how isotopes are identified and assigned to groups associ-ated with each feature. 1. Click the Charge State tab. 2. Type default values into Peak spacing tolerance under the Isotope grouping

group box. Type 0.0025 for m/z and 7.0 for ppm. These values are the tol-erance the MFE algorithm uses to find isotope ions associated with each fea-ture.

3. Select Common organic molecules for the Isotope model under the Isotope grouping group box. For most metabolomics analysis, the Common organic molecules isotope model properly groups ions into the appropriate isotope clusters. Proper isotope clustering leads to an accurate assignment of charge state and mass for the molecular feature. You chose Unbiased if metal con-taining molecules are expected.

Note: Selection of Unbiased slows the MFE calculations considerably because all isotope models are considered. 4. Mark the Limit assigned charge states to a maximum of check box and type

1. You only type 2 if you have a very specific reason; otherwise, 1 is used for metabolomics. Increasing the value increases the chance of unwanted isotope grouping.

5. Clear the Treat ions with unassigned charge as singly-charged check box. When this check box is cleared, ions that cannot be assigned a charge state by the MFE algorithm are ignored.

43

Page 44: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Finding with MassHunter Qualitative Analysis

Tabs that filter the results and display the graphics.

e Edit the parameters on the Compound Filters tab. 1. Click the Compound Filters tab. 2. Select Absolute height and type in a value of 5000 counts under the height

heading.

Note: In the final compound spectrum there must be at least one ion that is greater than or equal to the counts specified. The value typed is determined by empirically reviewing the data. Absolute height is different from the volume used to quantify the compound as a feature.

Note: It is not recommended to select Relative height or to mark Limit to the largest when performing metabolomics. 3. Clear the Restrict retention times to check box under the Compound location

group box. If you have a known void volume, solvent peak, or other region con-taining unwanted peaks in the data set, only then you mark this parameter and type in the time range.

4. Clear the Restrict charge states check box under the Charge states group box because the charge state was previously limited to 1 in the Charge state tab. If a larger number of charge states is allowed, then marking this parameter and entering a value filters the results.

f Edit the parameters on the Mass Filters tab.

The parameters on this tab allow you to remove specific ion noise from the data without regard for retention time. This filter feature can be unmarked and per-formed in Mass Profiler Professional more effectively due to the addition of reten-tion time. Mass Profiler Professional is the preferred place to perform mass filtering. 1. Click the Mass Filters tab. 2. Clear the Filter mass list check box unless a specific list of neutral masses is

known to be present in the data set that you wish to remove. 3. If you mark the Filter mass list check box, select Exclude these mass(es).

Note: It is not recommended to select Include only theses mass(es). 4. If Filter mass list is marked, click the button under the Source of masses head-

ing indicating the source of the masses for the exclude filter. Two example masses to exclude are 120.0434 and 921.0013.

g Edit the parameters on the Mass Defect tab.

The parameters on this tab allow you to supply a range of mass defect within which the mass may still be a metabolite. Since the range necessary for filtering by mass defect must be rather large it is not recommended to filter metabolomics results by mass defect. 1. Click the Mass Defect tab. 2. Clear the Filter results on mass defects check box. 3. If the Filter results on mass defects check box is marked, it is recommended

to select Variable and type in values that work with your data set. If you select Variable, then the natural mass defect increases with increasing mass.

h Edit the parameters on the Peak Filters (MS/MS) tab.

The parameters on this tab allow you to filter ions by height and quantity. 1. Click the Peak Filters (MS/MS) tab. 2. Clear the Absolute height check box under Height filters. If this parameter is

marked, do not type a value into counts that is less than 10. The value of

44

Page 45: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Finding with MassHunter Qualitative Analysis

counts is more likely around 100. The best value to use is determined empiri-cally.

3. Mark the Relative height check box and type 1 for the % of largest peak. The best analytical information is found using ions with an intensity at least within a factor of 100 of the base peak.

4. Clear the Limit (by height) to the largest check box under Maximum number of peaks.

i Edit the parameters on the Results tab.

The parameters on this tab allow you to customize the display of the results. It is recommended to not draw graphics when running MFE to improve the speed of the extraction. If more information about a feature is desired it may be selectively obtained later instead of being generated for all of the features at this time. 1. Click the Results tab. 2. Mark Delete previous compounds under the Previous results group box. 3. Click Highlight first compound under the New results group box. 4. Clear all check boxes under the Chromatograms and spectra group box. 5. Clear the Display only the largest check box under Display limits if you intend

to create a CEF file.

j Edit the parameters on the Advanced tab.

The parameters on this tab allow you to filter features by ion count and neutral mass. 1. Click the Advanced tab. 2. Click Include all under the Compound ion count threshold group box. Filtering

by Two or more ions is a very useful feature, but it can filter out small molecu-lar weight valid ions.

3. Click the Exclude button under the Compounds with indeterminate neutral mass group box, especially when you click Include all. When the Exclude but-ton is clicked, then the algorithm disregards features that MFE has not been able to assign a certain neutral mass.

k Save the method. Click the Method > Save As command and type a method name. If a method name was previously entered, then click Method > Save.

2. Enable the method to run using MassHunter DA Reprocessor.

a In the Method Explorer window click Worklist Automation > Worklist Actions.

b Remove all actions in the Actions to be run list.

c Double-click the Find Compounds by Molecular Feature action in the Available actions list. You can also select the action and click the down arrow to copy the Find Compounds by Molecular Feature action to the Actions to be run list.

d Select the Export to CEF action in the Available actions list.

e Click on the down arrow to copy the Export to CEF action to the Actions to be run list. The Export to CEF action must be listed after the Find Compounds by Molec-ular Feature action as shown in Figure 17 on page 46.

45

Page 46: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Finding with MassHunter Qualitative Analysis

Figure 17 Assign Actions to Run from Worklist for use with DA Reprocessor

f Click the Save Method button in the Method Editor window or click Method > Save to update the saved method.

3. Check parameters using Find Compounds by Molecular Feature on a single sample

Find Compounds by Molecular Feature is run on the entire metabolomics sample set using MassHunter DA Reprocessor because of the large number of sample files and large number of features present in each sample. However, before the entire sample set is run in the MassHunter DA Reprocessor program, a single file can be pro-cessed within MassHunter Qualitative Analysis.

a Run Find Compounds by Molecular Feature by doing one of the following:• Click Actions > Find Compounds by Molecular Feature• Click the Find Compounds by Molecular Feature button in the Method Editor.

b Select a single data file process as an example from the List of opened data files.

c Click the Find button.

4. Display and review the Compound List

a Right-click anywhere in the Compound List window and click Add/Remove Col-umns. The (Enhanced) Add/Remove Columns dialog box is opened.

b Click the Clear All button.

c Click the Column Name column header twice to sort alphabetically by the Column Name.

d Mark the check boxes for at least the following Column Names:

Abund, Area, Base Peak, Cpd, File, Height, Ions, Mass, RT, Saturated, Show/Hide, Width, and Vol• Each of these columns is documented in the MassHunter Qualitative Analysis

Help in Reference > Columns > Compound List Table Columns under the Con-tents tab.

• It is normal at this time for the Area and Abund columns to be blank.• All of the compounds are exported to the CEF file.

e Click the OK button.

46

Page 47: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Finding with MassHunter Qualitative Analysis

f Click and drag the column names so that they are arranged in the order you like. A useful order is shown in Figure 18.

g Click the Height column heading to sort the compounds by ascending height (the abundance value of the base peak), so that the lower height values shown are at the top of the list.

h Click and drag to select the first few compounds (around ten compounds). The compounds are highlighted.

i Right-click the Compound List window and click Extract Complete Result Set.

The results are displayed in the Chromatogram Results and MS Spectrum Results windows. These windows are updated when you use the arrow keys to move up and down the Compound List. See Figure 19 on page 48.

• In the Chromatogram Results window, the extracted ion chromatograms (EIC) for each compound is compared to the total ion chromatogram for the ions contained in all of the molecular features.

• In the MS Spectrum Results window, the compound spectrum for each com-pound is compared to the data spectrum.

• A compound may be deleted and removed from the features available for exporting by highlighting the compound and then pressing the delete key.

j If a significant number of compounds are too weak to provide confidence as a molecular feature, adjust the parameters entered in the tabs that filter the results and display the graphics. Then, run Run Find Compounds by Molecular Feature and repeat step 4 - Display and Review the Compounds List. MFE runs very quickly if no changes are made to the tabs that control compound finding.

Figure 18 The Compound List window

47

Page 48: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Finding with MassHunter Qualitative Analysis

Figure 19 Display of the Extract Complete Result Set from the compound list

5. Optional - Export the results for the single sample to a CEF file.

This step is optional. The CEF files for all of the samples are generated in step 6 - Find Compounds by Molecular Feature using MassHunter DA Reprocessor.

a Click File > Export > as CEF. The Export CEF Options dialog box is opened.

b Select the data files to be exported from the List of opened data files. It is recom-mended to create a new folder for the exported CEF files to aid documentation of the metabolomics workflow and to make it easier to distinguish any new CEF files from previous CEF files.

c Update the other parameters in the Export CEF Options dialog box.

d Click the OK button.

6. Find Compounds by Molecular Feature using MassHunter DA Reproces-sor.

a Click Start > Programs > Agilent > MassHunter Workstation > Acq Tools > DA Reprocessor.

Press F1 from within MassHunter DA Reprocessor to start on-line help. For exam-ple, click DA Reprocessor > Shortcut Menu for Worklist (from the top left cell) for instructions on creating a worklist containing multiple samples.

b Right-click the top left cell of the worklist and click Add Multiple Samples from the shortcut menu. See Figure 20 on page 49.

48

Page 49: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Finding with MassHunter Qualitative Analysis

Figure 20 Adding samples to the MassHunter DA Reprocessor worklist

c Select the folder and file names that are processed.

d Click the OK button.

e Click the Method name for the first sample (row 1) and select the directory and name of the method saved from MassHunter Qualitative Analysis (Figure 21).

Figure 21 Selecting the MassHunter Qualitative Analysis method

f Copy the method from the first sample to each of the samples in the worklist by right-clicking on the method in row 1 and clicking Fill > Column. (Figure 22).

Figure 22 Copying the data analysis method to each sample in the worklist

49

Page 50: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Finding with MassHunter Qualitative Analysis

g Run the worklist actions (including Find by Molecular Feature and Export to CEF). Click Start in the toolbar. The progress is indicated on the worklist as each sam-ple is completed.

The CEF files containing the molecular features from each of the samples are placed in the directory containing each of the .d data files. Each CEF file has the same root name as the sample data file. These are the files that are imported into Mass Profiler Professional for feature selection in the next step of the metabolomics workflow.

The results are automatically saved with the data file when you run the Mass-Hunter DA Reprocessor program.

7. Display and review the Compound List after running the MassHunter DA Reprocessor.

a Return to MassHunter Qualitative Analysis. If you closed the MassHunter Quali-tative Analysis program, do the following:• Click Start > Programs > Agilent > MassHunter Workstation > Qualitative

Analysis B.04.00.• Click Cancel when the Open Data File dialog box opens.

b Click File > Close All to close any open data files. Do not save any results.

c Click File > Open Data File to open the original sample data file including the chromatographic data and the results of Find by Molecular Feature. Mark the Load result data check box.

or

Click File > Import Compound to open the CEF file that contains the molecular feature results of Find by Molecular Feature.

Note: Due the large number of features in a typical metabolomics sample file, it is recommended to open only one file at a time to review the results. Close the open file and then open the next file.

d You can display and review the compound list as noted previously in “Display and review the Compound List” on page 46. The chromatographic results are only vis-ible if the original data file is opened.

50

Page 51: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Feature selection for recursion - experiment setup

Feature selection for recursion - experiment setup

This is the second of four components of the metabolomics workflow: 1. Find Compounds by Molecular Feature (MassHunter Qualitative Analysis).

Molecular features are extracted from the data based on their mass spectral and chromatographic characteristics. The process is referred to as Molecular Feature Extraction (MFE).

2. Feature Selection (Mass Profiler Professional). An initial differential expres-sion identifying the most statistically significant features from among all of the features previously found using Find Compounds by Molecular Feature.

3. Find Compounds by Formula (MassHunter Qualitative Analysis). Importing the most statistically significant features back into MassHunter Qualitative Analy-sis as targeted features results is an improvement in finding of the features from the samples. This repeated feature analysis is referred to as recursion. An improved reliability in finding leads to a concomitant improvement in the accuracy of the analysis.

4. Analysis (Mass Profiler Professional). Recursive finding of the most statisti-cally significant features are processed by Mass Profiler Professional into a final statistical analysis and interpretation. The results from the final interpre-tation may be used to prove or disprove the hypothesis.

Mass Profiler Professional imports CEF files created by MassHunter Qualitative Analysis and performs statistical and graphical analyses. Mass Profiler Professional exploits the high information content of chromatography/mass spectrometry data and can easily import, analyze and visualize GC/MS, LC/MS, CE/MS and ICP-MS data from large sample sets and complex MS data sets.

Mass Profiler Professional provides a guided workflow for importing data and a choice for either guided or advanced workflows for statistical and graphical analy-ses. A Guided Workflow guides you through an initial set of filters and models for differential analysis. Since the Advanced Workflow does not guide you through the initial steps of the differential analysis, it is not recommended to skip the Guided Workflow. Regardless of your personal expertise, the Guided Workflow provides you with a quality control to your analysis that improves your results. The Guided Work-flow ends with the Advanced Workflow interface where you may proceed to cus-tomize the entire analysis.

Note: Help and detailed information regarding the various fields and statistical treat-ments are available at anytime by pressing the F1 key on the keyboard or referring to the Mass Profiler Professional User Manual.

Feature selection for recursion It is recommended to process the molecular features in Mass Profiler Professional through at least Filter Flags of the Guided Workflow - Find Differential Expression page (Step 3 of 8) (page 65). The Filter Flag step is used to require that a feature must be present in at least two samples which removes one-hit wonders and allow the recursive finding in MassHunter Qualitative Analysis to run efficiently.

Note: Importing several thousand features back into MassHunter Qualitative Analy-sis for targeted finding may cause MassHunter Qualitative Analysis to run out of memory.

51

Page 52: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Feature selection for recursion - experiment setup

1. Start Mass Profiler Profes-sional.

a Double-click the Mass Profiler Professional icon ( ) located on the desktop.

or

Click Start > Programs > Agilent > MassProfilerPro > MassProfilerPro

Project setup A project is a container for a collection of experiments. A project can have multiple experiments on different sample types and organisms. Four steps are involved in the project setup:

Startup: Select creation of a new project.

Create New Project: Type descriptive information about the project.

Experiment Selection Dialog: Select either a new experiment or to open an exist-ing experiment as part of the project.

New Experiment: Type and select information that guides the experiment cre-ation.

2. Create a new Mass Profiler Professional project in the Startup dialog.

a Click the Create new project button.

b Click the OK button.

Figure 23 Welcome to Mass Profiler Professional startup page

3. Type descriptive information in the Create New Project dialog.

a Type a descriptive name for the project in Name.

b Type descriptive notes for the project in Notes.

c Click the OK button.

Figure 24 Create New Project dialog box

52

Page 53: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Feature selection for recursion - experiment setup

4. Click whether this is a new experiment or whether to open an existing experiment in the Experiment Section Dialog.

a Click the Create new experiment button.

b Click the OK button.

If you click the Open existing experiment button, you are prompted for the exper-iment to add to the analysis.

Figure 25 Experiment Selection dialog box

5. Type and select information that guides the experiment creation in the New Experiment dialog.

a Type a descriptive name for the experiment in Experiment name. This entry may be different from the project name previously entered.

b Select Unidentified for the Experiment type. Unidentified is the proper selection when the compound features have only been identified by their neutral mass and retention time using MFE. The selection of experiment type determines how Mass Profiler Professional manages the data. Use Combined (Identified + Uniden-tified) when you are unsure if the data is identified in full or in part or when Mass-Hunter Qualitative Analysis has been used previously to identify some of the compound features.

c Select Homo sapiens for Organism or select Not Applicable if the organism is not available, not known, or not necessary.

d Select Guided Workflow for Workflow type.

e Type descriptive notes for the experiment in Experiment Notes.

f Click the OK button.

Figure 26 Experiment description page in the New Experiment wizard

Guided Import - Experiment Creation

An experiment contains samples, interpretations, and associated entity lists. Several experiments may be within a project. Up to ten steps are involved in the experiment

53

Page 54: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Feature selection for recursion - experiment setup

creation. Importing CEF files created with MassHunter Qualitative Analysis involves steps 1, 2, 6, 7, 8, 9, and 10:

1. Select Data Source: Depending upon the type of experiment and supported file format, this displays the options of data source.

2. Select Data to Import: This allows you to select the samples. 3. Input Validation: Validates different types of samples. This step appears only if

different types of files are selected and prompts you to select same type of files to proceed. This step applies to ChemStation and AMDIS data.

4. Signal Chooser: Allows you to select signal options specific to MassHunter Quantitation experiments.

5. Sample Summary: This provides an option to select the samples. In Generic Experiments one file can have multiple samples.

6. Filtering: Filters the data features by abundance, mass range, number of ions per feature, and by charge state.

7. Alignment: This aligns the features across the samples based on retention time and mass to the input tolerances.

8. Sample Summary: Displays Mass versus RT plot, spreadsheet, and legend for the distribution of aligned and unaligned entities in the sample.

9. Normalization: Changes the sample feature intensity values (referred to as abundance units) from that based on the absolute value of the extracted ion chromatogram (EIC) signal measured at the detector to a relative intensity based on the signal provided by either an (1) internal standard, (2) an external scalar, or (3) both an internal standard and an external scalar.

10. Baselining Options: Changes the signal intensities of the features found in each sample data set from a value that is based on the original instrumental parameters to a new “abundance unit” that relates the feature strength with respect to the feature strength among all of the sample data. Baselining may be performed with or without normalization.

6. Select Data Source: Select the experiment data source in the MS Experiment Creation Wizard (Step 1 of 10).

a Click the MassHunter Qual button.

b Click the Next button.

Figure 27 Select Data Source page in the MS Experiment Creation Wizard

54

Page 55: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Feature selection for recursion - experiment setup

7. Select Data to Import: Select the experiment data files or data from previous experiments in the MS Experiment Creation Wizard (Step 2 of 10).

a Click Select Data Files.

b Select the data file or data files to open.

Note: Orderly naming of the data files with respect to the parameters related to the independent variables helps you make sure that all of the data is selected.

c Click the Open button.

d Click the Next button.

Figure 28 Selection of several CEF files from MassHunter Qualitative Analysis

8. Filtering: Select and type in the data filter parameters in the MS Experiment Creation Wizard (Step 6 of 10).

Filtering during the data import process may be used to reject low-intensity data or restrict the range of data. After data is imported, several filtering options may be applied: Abundance, Retention Time, Mass, Flags, Number of ions, Mass and Mini-mum Quality Score.

Note: Filtering works with both GC/MS and LC/MS data. The term abundance actu-ally refers to volume as stored in a find by molecular feature generated CEF file. The term abundance actually refers to area as stored in a find by formula generated CEF file. The parameters may be cleared to preserve prior filtering that was used to gen-erate the CEF file.

a Clear the Minimum relative abundance check box under the Abundance filtering group box.

b Mark the Minimum absolute abundance check box and type a value of 5000 counts.

c Clear the Limit to the largest check box. It is not recommended to use this parameter with metabolomics.

55

Page 56: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Feature selection for recursion - experiment setup

d Mark the Use all available data check box under the Retention time filtering group box.

e Clear the Use all available data check box and type 50.01 for the Min Mass and 1000 for the Max Mass under the Mass filtering group box. Filtering by maximum mass improves the statistical analysis by rejecting masses that are not significant to the experiment.

f Click the Minimum number of ions button and type 2 under the Number of ions heading. The mass filter need not be set to include reference ions.

g Click the Multiple charge states forbidden button under the Charge states check box.

h Click the Next button.

Figure 29 Recommended filtering parameters

9. Alignment: Select and type in the alignment parameters in the MS Experiment Creation Wizard (Step 7 of 10).

Unidentified compounds from different samples are aligned or grouped together if their retention times are within the specified tolerance window and the mass spec-tral similarity as determined by a simple dot product calculation are above the spec-ified level.

Note: The alignment methodologies for the data types are explained in Section 2.1.5 in the Mass Profiler Professional User Manual. Retention alignment rewrites the retention times in the data file so that the user input or algorithmically selected fea-tures are used to correct the retention times.

a Clear the Perform RT correction check box under the Retention Time correction group box. A larger retention time shift may be used to compensate for less than ideal chromatography.

If retention time correction is used, it is recommended to perform retention time correction with standards, such as 9 anthracine carboxolic acid, provided that at least two widely spaced standards exist, and those standards must be present in every sample. With standards the correction is based on a piecewise linear fit.

b Type 0.1 % and 0.15 min for RT Window under Compound alignment. Smaller values result in reduced compound grouping among the samples leading to a larger list of unique compounds in the experiment.

56

Page 57: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Feature selection for recursion - experiment setup

c Type 5.0 ppm and 2.0 mDa for Mass Window. It is not recommended to set the mass window less than 2.0 mDa for higher masses.

d Click the Next button.

Figure 30 Alignment Parameters page

10. Sample Summary: View and review the compounds present and absent in each sample in the MS Experiment Creation Wizard (Step 8 of 10).

This step allows you to see a summary of the compounds present and absent in each of the samples based on the experiment parameters including the application of the filter and alignment parameters.

Note: It is useful to click the Back button to make changes in the Filtering (Step 6 of 10) page and the Alignment (Step 7 of 10) page parameters and then return to this Sample Summary (Step 8 of 10) page several times to develop a feel for how each of the parameters affects compound summary. You can even independently assess the effects of retention time alignment versus compound alignment.

If you set the RT Window to 0.15 minutes in the Alignment (Step 7 of 10) page, the Total number of Aligned Compounds is 4424. If you set the RT Window to 0.05 minutes, this number increases to 4518, and a larger value of 0.30 min for RT Win-dow decreases this number to 4361.

Replicates should have similar numbers of compounds present and absent. You can see this easily if the files have a systematic naming system that allows replicates to be sorted together.

a Clear the Export for Recursion check box. It is not recommended to export the compounds for recursion at this step in the MS Experiment Creation. Better results are obtained after the data has been filtered for significance in the follow-ing steps.

b Click the Next button.

57

Page 58: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Feature selection for recursion - experiment setup

Figure 31 Two-independent variables data set sample summary page

Note: Any part of the sample summary (Mass vs. RT, Spreadsheet, and Legend) may be exported by right-clicking on that part of the summary and clicking Export As and clicking of the image type. The option Image allows you to enter the image file loca-tion, file name, image size and resolution to meet your organization and publication requirements.

Figure 32 Exporting an image of the sample summary for publication

58

Page 59: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Feature selection for recursion - experiment setup

11. Normalization: Select whether to normalize the data to reduce the variability caused by sample preparation and instrument response in the MS Experiment Creation Wizard (Step 9 of 10).

From the list of compounds present in all of the samples you may pick one as an internal standard.

No internal or external standard is selected at this time.

Note: Creatinine is a good choice for an internal standard for urine samples.

a Clear the Use Internal Standard check box on the Internal Standard tab.

b Clear the Use External Scalar check box on the External Scalar tab.

c Click the Next button.

12. Baselining Options: Select whether to baseline the response of the compounds across the samples in the MS Experiment Creation Wizard (Step 10 of 10).

The three options are None, Z-Transform, and Baseline each entity to median across samples.

None: Recommended if only a few features in the samples exist.

Z-Transform: Recommended if the data sets are very dense, data where very few instances of compounds are absent from any sample, such as a quantitation data set from recursion.

Baseline each entity to median across samples: The abundance for each com-pound is normalized to its median abundance across all of the samples. This has the effect of reducing the weight of very large and very small compound features on later statistical analyses. This is the recommended baselining option.

a Click Baseline each entity to median across samples.

b Click the Finish button.

Figure 33 Selecting baselining options

59

Page 60: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Feature selection for recursion - guided workflow

Feature selection for recursion - guided workflow

Guided Workflow helps you through the basic data analysis of an experiment. Default parameters in conjunction with your experimental design guides you to an initial differential expression that identifies the most statistically significant features from among all of the features previously found using molecular feature extraction. All parameters, including the default parameters used during the Guided Workflow, can be edited at the conclusion of the Guided Workflow by using the Advanced Workflow options in the Workflow Browser (See Figure 47, Mass Profiler Profes-sional Layout).

The Guided Workflow wizard helps you create an initial differential expression by displaying the sequence of steps on the left hand side navigator with the current step highlighted (Figure 34 on page 61). The workflow allows you to proceed through each step by clicking the Next button. The steps followed are predeter-mined and based on the experiment type, experimental grouping, and conditions. Some of the steps can be skipped for certain experiments.

Eight steps are involved in the Guided Workflow - Find Differential Expression. To skip later steps in the Guided Workflow, click the Finish button at each step. When you click the Finish button, the last step of the entity list is saved, and you can com-mence analysis using the Advanced Workflow.

Since this part of the metabolomics workflow is to identify the most statistically sig-nificant features, Steps 7 and 8 are skipped during the creation of an entity list for recursion.

1. Summary Report: Displays the summary view of the created experiment based on the parameters provided during the guided import. A Profile Plot with the samples on the X-axis and the Log Normalized abundance values on the Y-axis is displayed. If the number of samples is more than 30, they are represented in a spreadsheet view instead of Profile Plot.

2. Experiment Grouping: Independent variables and the attribute values of the independent variables must be specified to define grouping of the samples. An independent variable is referred to as a parameter name. The attribute values within an independent variable are referred to as parameter values. Samples with same parameter values within a parameter name are treated as repli-cates.

3. Filter Flags: The compounds created during the experiment creation are now referred to as entities. The entities are filtered (removed) from further analysis based on their presence across samples and parameter values (now referred to as a condition). See “Definitions” on page 157 for definitions and relation-ships of the terms used by the metabolomics workflow.

4. Filter by Frequency: Entities are further filtered based on their frequency of presence in specified samples and conditions. This filter removes irreproduc-ible entities.

5. Quality Control on Samples: The samples are presented by grouping and the current Principal Component Analysis (PCA). PCA calculates all the possible principal components and visually represents them in a 3D scatter plot. The scores shown by the axes scales are used to check data quality. The scatter plot shows one point per sample colored-coded by the experiment grouping. Replicates within a group should cluster together and be separated from sam-ples in other groups

6. Significance Analysis: The entities are filtered based on their p-values calcu-lated from a statistical analysis. The statistical analysis performed depends on the samples and experiment grouping.

60

Page 61: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Feature selection for recursion - guided workflow

7. Fold Change: Compounds are further filtered based on their abundance ratios or differences between a treatment and a control that are greater than a spec-ified cut-off or threshold value.

8. ID Browser Identification: The final entity list may be exported in a CEF file and directly imported into IDBrowser for identification.

Note: The Guided Workflow does not start if Advanced Workflow was selected as the Workflow type in the New Experiment dialog.

1. Summary Report: Review the Profile Plot presented in the Summary Report of the Guided Workflow - Find Differential Expression (Step 1 of 8).

You do not enter analysis parameters at this time. You may review the data, change the plot view, export selected data, or export the plot to a file by clicking and right-clicking features available on the plot.

It is recommended that you try the review options presented to become familiar with the tools available to you. The graphical plot operations available are described in Section 5 Data Visualization in the Mass Profiler Professional User Manual. Specifi-cally, the Profile Plot operations available are described in Section 5.5 The Profile Plot View in the Mass Profiler Professional User Manual. Some of the tools available for you to review the data that has passed through the prior filters are:

• Click: You may click any line in the Profile Plot to select an entity line. Multiple entities may be selected by pressing the Ctrl key during successive clicks.

• Double-click: You may double-click any one line within the Profile Plot to inspect a single entity. You may also double-click the last entity selected while pressing the Ctrl key to inspect multiple entities. The entity inspector allows viewing of the specific entity Annotation, Data, Profile Plot and Spectra.

• Right-click: You may right-click anywhere on the Profile Plot to change the selection mode, entity selection, and plot view or to export the plot to a file.

Figure 34 Summary Report Profile Plot of the two-variable experiment example showing sixteen samples

61

Page 62: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Feature selection for recursion - guided workflow

Figure 35 Entity inspector from the Summary Report

Figure 36 Shortcut menu for the Summary Report Profile Plot (right-click)

a Click the Next button.

2. Experiment Grouping: Provide the parameters associated with the independent variables and their concomitant attribute values in the Experiment Grouping of the Guided Workflow - Find Differential Expression (Step 2 of 8).

An independent variable is referred to as a parameter name. The attribute values within an independent variable are referred to as parameter values. Samples with the same parameter values within a parameter name are treated as replicates.

Only the first two parameter names (independent variables) are used for analysis in the Guided Workflow. All of the parameters are available in the Advanced Workflow at the completion of the Guided Workflow.

Note: In order to proceed, at least one parameter with two values must be assigned.

Note: When entering Parameter Names and parameter Assign Values, it is very important that the entries use identical letters, case, numbers, and punctuation in order for the Experiment Grouping to function properly. Click the Back button or Experiment Setup > Experiment Grouping to return to Experiment Grouping if an error is identified later in the Guided or Advanced Workflow process, respectively.

Values for the First, or only, Independent Variable

a Click the Add Parameter button. The Add/Edit Experimental Parameter dialog box is opened.

62

Page 63: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Feature selection for recursion - guided workflow

Figure 37 Experiment grouping in the Guided Workflow

b Type a brief, descriptive name for the first independent variable into the Parame-ter Name. Type Infection for the two-variable experiment example.

c Click while pressing the Shift or Ctrl key as necessary to select the sample rows that are part of the first attribute of the typed parameter name.

d Click Assign Value after the rows have been selected.

e Type Control in the Assign Value dialog box.

63

Page 64: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Feature selection for recursion - guided workflow

Figure 38 Add Parameter and Assign Value during Experiment Grouping

f Click the OK button.

g Press the Shift or Ctrl key as necessary and click to select the sample rows that are part of the next attribute value within the parameter name.

h Click the Assign Value button after the rows have been selected.

i Type Infected in the Assign Value dialog box.

j Click the OK button.

k Repeat the row selection and Assign Value process as necessary to complete the assignment of the samples to each of the attribute values for the independent variable.

l Click the OK button when all of the attribute values (Assign Value) are assigned to the current independent variable (Parameter Name).

Values for the Second Indepen-dent Variable

a Click the Add Parameter button to begin parameter assignment for the next inde-pendent variable. Otherwise skip to step n.

b Type a brief, descriptive name for the second independent variable into the Parameter Name. Type Treatment for the two-variable experiment example.

c Press the Shift or Ctrl key as necessary and click to select the sample rows that are part of the second attribute identified by the typed parameter name.

d Click Assign Value after the rows have been selected.

e Type 000 in the Assign Value dialog box.

f Click the OK button.

64

Page 65: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Feature selection for recursion - guided workflow

g Press the Shift or Ctrl key as necessary and click to select the sample rows that are part of the next attribute value within the parameter name.

h Click the Assign Value button after the rows have been selected.

i Type 250 in the Assign Value dialog box.

j Click the OK button.

k Repeat the row selection and Assign Value process as necessary to complete the assignment of the samples to each of the attribute values for the independent variable.

l Click the OK button when all of the attribute values (Assign Value) are assigned to the current independent variable (Parameter Name).

Values for the remaining Inde-pendent Variables

m Repeat step b through step l as often as necessary to assign all of the indepen-dent variables parameter names and to assign their concomitant attribute values.

Finished entering Values n Click the Next button.

Figure 39 Completed two-variable experiment example Experiment Grouping

3. Filter Flags: Review the Profile Plot presented in the Filter Flags of the Guided Workflow - Find Dif-ferential Expression (Step 3 of 8).

Before clicking Re-run Filter, you can review the data, change the plot view, export selected data, or export the plot to a file by clicking and right-clicking features avail-able on the plot in the same manner as presented in “Summary Report: Review the Profile Plot presented in the Summary Report of the Guided Workflow - Find Differ-ential Expression (Step 1 of 8).” on page 61. The graphical plot operations available are described in section 5 “Data Visualization” in the Mass Profiler Professional User Manual.

65

Page 66: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Feature selection for recursion - guided workflow

Figure 40 Revised Profile Plot after Experiment Grouping showing the final four permutations (based on two independent variables, each with two attribute values) from the sixteen samples

Provide the parameters for the Filter Flags.

The entities may now be filtered (removed) from further analysis based on the qual-ity of the entities presence across samples and parameter values (now referred to as a condition). See “Definitions” on page 157 for definitions and relationships of the terms used by the metabolomics workflow.

A major objective of Filter Flags is to remove “one-hit wonders” from further consid-eration. A one-hit wonder is an entity that appears in only one sample, is absent from the replicate samples, and does not provide any utility for statistical analysis.

A flag is a term used to denote the quality of an entity within a sample. A flag indi-cates if the entity was detected in each sample as follows: Present means the entity was detected, Absent means the entity was not detected, and Marginal means the signal for the entity was saturated.

a Click the Re-run Filter button.

b Mark the Present check box under the Acceptable Flags group box.

c Mark the Marginal check box under the Acceptable Flags group box.

d Clear the Absent check box under the Acceptable Flags group box. This flag is useful when you want to identify entities that are missing in the samples.

e Click at least ___ out of X samples have acceptable values under the “Retain Entities in which” group box. The “X” is replaced in your display with the total number of samples in your data set.

66

Page 67: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Feature selection for recursion - guided workflow

f Type 2 in the entry box. By setting this parameter to a value of two or more, one-hit wonders are filtered.

g Click the OK button.

Figure 41 Recommended filter parameters for Filter by Flags

Note: With the two-variable experiment example, the Profile Plot changes to reflect displaying of 2930 out of 4165 entities if you started with MFE from the original data files. The display states displaying 2957 out of 4414 entities if you started with the sample CEF files. In either case this reflects the successful filtering of one-hit won-ders.

h (optional) Re-adjust the filter parameters again by clicking Re-run Filter until the results displayed in the Profile Plot are satisfactory. It is recommended that the filter be re-run several times with differing parameters to develop an understand-ing of how each parameter affects the results.

You can review the data, change the plot view, export selected data, or export the plot to a file by using left-click and right-click features available on the plot in the same manner as presented in “Summary Report: Review the Profile Plot presented in the Summary Report of the Guided Workflow - Find Differential Expression (Step 1 of 8).” on page 61. The graphical plot operations available are described in section 5 Data Visualization in the Mass Profiler Professional User Manual.

i Click the Next button.

67

Page 68: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Feature selection for recursion - guided workflow

Figure 42 Final Profile Plot after removing one-hit wonders. One entity is selected and shown as a bold green line.

4. Filter by Frequency: Provide the parameters for the Filter By Frequency in the Guided Workflow - Find Dif-ferential Expression (Step 4 of 8).

The entities may now be filtered from further analysis based on their frequency of occurrence among the samples and conditions. See “Definitions” on page 157 for definitions and relationships of the terms used by the metabolomics workflow.

Filter by frequency defines the filter by the minimum percentage of samples an entity must be present in to pass the filter. The filter is specified by typing the mini-mum percentage and selecting the applicable condition of the samples that each entity must be present:

• of all the samples (conditions are not evaluated)• of samples in only one condition (one and only one condition)• of samples in at least one condition (one or more conditions)• of samples within each condition (all conditions)

By default for Filter by Frequency, the filter is set to retain the entities that appear in at least 100% of all the samples. This is the recommended percentage for experi-ments that contain five or fewer replicates. A larger percentage removes more enti-ties from further statistical consideration. For experiments with a larger number of replicates the filter frequency percentage may be lowered to reflect the required occurrence.

a Type 100 in the Retain entities that appear in at least box.

b Click the of samples in at least one condition button.

c Click the OK button.

68

Page 69: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Feature selection for recursion - guided workflow

Figure 43 Recommended Filter by Frequency parameters for experiments with few replicates

Note: With the two-variable experiment example, the Profile Plot changes to reflect displaying 1220 out of the total number of entities, reflecting the successful filtering by frequency.

d Re-adjust the filter parameters by clicking Re-run Filter until the results displayed in the Profile Plot are satisfactory. It is recommended that the filter be re-run sev-eral times with differing parameters to develop an understanding of how each parameter affects the results.

You can review the data, change the plot view, export selected data, or export the plot to a file by clicking and right-clicking features available on the plot in the same manner as presented in “Summary Report: Review the Profile Plot presented in the Summary Report of the Guided Workflow - Find Differential Expression (Step 1 of 8).” on page 61.

e Click the Next button.

5. Quality control on samples: Assess the sample quality from QC on samples in the Guided Workflow - Find Dif-ferential Expression (Step 5 of 8).

This step provides the first view of the data using a Principle Component Analysis (PCA). PCA allows you to assess the data by viewing a 3D scatter plot of the calcu-lated principle components. The PCA scores are shown in each of the selection boxes located along the bottom of the 3D PCA Scores window. A higher score indi-cates that the principle component contains more to the variability of the data. The components generated in 3D PCA Scores graph are represented in the X, Y and Z axes and numbered 1, 2, 3 ... in order of their decreasing significance.

Principal component analysis: The mathematical process by which data contain-ing a number of potentially correlated variables is transformed into a data set in relation to a smaller number of variables called principal components that account for the most variability in the data. The result of the data transformation leads to the identification of the best explanation of the variance in the data, e.g. identification of the meaningful information.

Principal component: Transformed data into axes, principal components, so that the patterns between the axes most closely describe the relationships between the data. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible. The principle components are viewed and interpreted in 3D graphical axes with additional dimensions represented by color and/or shape representing the parameters.

Experiment Grouping view: This table allows separate viewing of each sample within a parameter (now referred to as a group). Ideally, replicates within a group should cluster together and be separated from samples in other groups. The spreadsheet operations available are described in section 5.3 The Spreadsheet View in the Mass Profiler Professional User Manual. See “Definitions” on

69

Page 70: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Feature selection for recursion - guided workflow

page 157 for definitions and relationships of the terms used by the metabolomics workflow.

3D PCA Scores scatter plot view: You may change the plot view or export the plot to a file by using left-click and right-click features available on the plot in a same manner similar to that presented in “Summary Report: Review the Profile Plot presented in the Summary Report of the Guided Workflow - Find Differential Expression (Step 1 of 8).” on page 61. Additional controls available are:• The 3D PCA scores plot can be customized by right-clicking and then clicking

Properties.• To zoom into the 3D scatter plot, press the Shift key and simultaneously click

the mouse button and drag the mouse upwards.• To zoom out, press the Shift key and simultaneously click the mouse button

and drag the mouse downwards.• To rotate, press the Ctrl key and simultaneously click the mouse button and

drag the mouse around the plot.

Legend - 3D PCA Scores view: This window shows the legend of the scatter plot.

Note: It is recommended to click the Back button to make changes in the “Filter Flags: Review the Profile Plot presented in the Filter Flags of the Guided Workflow - Find Differential Expression (Step 3 of 8).” on page 65 and the “Filter by Frequency: Provide the parameters for the Filter By Frequency in the Guided Workflow - Find Differential Expression (Step 4 of 8).” on page 68 parameters and then return to this step several times to understand how each of the parameters affects compound summary. You may even independently assess the effects of retention time align-ment versus compound alignment.

Figure 44 QC on samples PCA Score showing the initial separation of the untreated infection parameters and separation of the treatment values

a Click the Next button.

70

Page 71: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Feature selection for recursion - guided workflow

6. Significance Analysis: Assess the differential sig-nificance from Significance Analysis in the Guided Workflow - Find Differential Expression (Step 6 of 8).

The entities are filtered based on their p-values calculated from a statistical analysis that is selected based on the samples and experiment grouping.

The statistical analysis is either a T-test or an Analysis of Variance (ANOVA) based on the samples and experiment grouping. The statistical analysis applied is described in “Section 3.4 Significance Analysis” in the Mass Profiler Professional User Manual.

Differential Expression Analysis Report view: This window describes the statis-tical test that was applied to the samples along with a summary table that orga-nizes the results by p-value. A p-value of 0.05 is similar to stating that if the means for each parameter value (a condition of an independent variable) are iden-tical, then a 5% chance or less exists of observing a difference in the parameter value means as large as you observed. In other words, statistical treatment of random sampling from identical populations with a p-value set at 0.05 leads to a difference smaller than you observed in 95% of experiments and larger than you observed in 5% of the experiments.

The last row of data in the Result Summary spreadsheet (Figure 45) shows the number of entities that would be expected to meet the significance analysis by random chance based on the p-value specified along at each column heading. If the number of entities expected by chance is much smaller than those based on the corrected p-value you have realized a selection of entities that show signifi-cance differing the means of the parameter values.

Figure 45 Result Summary table in the Differential Expression Analysis Report view.

The spreadsheet operations available are described in Section 5.3 The Spread-sheet View in the Mass Profiler Professional User Manual.

p-values: Each entity that survived the filters is now presented by compound along with the p-values expected and corrected for each of the interpretation sets. Each entity is uniquely identified by its average neutral mass and retention time from across the data sets.

Venn Diagram: Display of the Venn Diagram, or other plot, depends on the sam-ples and experiment grouping for the analysis (see Figure 46). The entities that make up each selected section of the Venn diagram are highlighted in the p-val-

71

Page 72: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Feature selection for recursion - guided workflow

ues spreadsheet. The Venn diagram is a graphical view of the most significant entities in each of the sample analysis. Where entities in common to the analyses exist, they are depicted as overlapping section of the circles. Fewer entities in the regions of overlap are an indication that the samples support the hypothesis that a difference exists in the samples based on the experimental parameters.

You can change the plot view or export the plot to a file by clicking and right-click-ing features available on the plot in a same manner similar to that presented in “Summary Report: Review the Profile Plot presented in the Summary Report of the Guided Workflow - Find Differential Expression (Step 1 of 8).” on page 61. The graphical plot operations available are described in section 5 Data Visualization in the Mass Profiler Professional User Manual.

a Click the Re-run Analysis button to apply a new p-value cut-off.

b Move the slider or type in the p-value cut-off value. The default value is 0.05.

c Click the Close button.

d Re-adjust the p-value cut-off by clicking Re-run Analysis until the results dis-played are satisfactory. It is recommended that the analysis be re-run several times to develop an understanding of how the p-value cut-off affects the results. A larger p-value passes a larger number of entities.

Figure 46 Significance analysis based on a 2-way ANOVA using the two-variable experiment example. The results show a Venn diagram. A 1-way ANOVA signifi-cance analysis does not present a graphical representation of the entities relation-ships.

e Click the Next button.

f Fold Change is skipped by the Guided Workflow for the two-variable example.

72

Page 73: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Feature selection for recursion - guided workflow

g ID Browser is not selected at this time because the object of this session with Mass Profiler Professional is to generate a CEF file containing the most signifi-cant features for recursion in MassHunter Qualitative Analysis.

h Click the Finish button.

Mass Profiler Professional - Advanced Workflow

You are now in the advanced workflow mode and have access to all features avail-able in Mass Profiler Professional. Figure 47 shows the layout of Mass Profiler Pro-fessional.

Further information about may be obtained by pressing the F1 key or reviewing the Mass Profiler Professional User Manual.

Figure 47 The main functional areas of the Mass Profiler Professional program

Export project zip file - save the project

7. Save the current analysis as a zip file for archiving, restoration of any future analysis to the current results, sharing the data with a collaborator, or sharing the data with Agilent customer support.

a Click Project > Export Project Zip.

b Mark the check box next to the experiment you wish to save.

c Click the OK button.

d Select or create the file directory.

e Type the File name.

f Click the Save button.

73

Page 74: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Feature selection for recursion - guided workflow

Export for recursion

8. Export the entities that have been identified as the most relevant to the differ-ential analysis for recursive finding in MassHunter Qualitative Analysis.

a Click Export for Recursion in the Workflow Browser under the Results Interpreta-tions Workflow.

b Click Choose to select the Entity List for exporting.

c Click Filtered by frequency from the entity lists in the Choose Entity List window.

d Click the OK button.

e Click the Selection Arrow on the Output File entry box.

Do not type a file name at this location.

f Select the folder to which to save the file.

g Type the File name. For example, you can type Two-Variable Filtered by Frequency.cef for our current example data set.

h Click the Save button.

i Click the OK button.

74

Page 75: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Recursive finding with MassHunter Qualitative Analysis

Recursive finding with MassHunter Qualitative Analysis

This is the third of four components of the metabolomics workflow: 1. Find Compounds by Molecular Feature (MassHunter Qualitative Analysis).

Molecular features are extracted from the data based on their mass spectral and chromatographic characteristics. The process is referred to as Molecular Feature Extraction (MFE).

2. Feature Selection (Mass Profiler Professional). An initial differential expres-sion identifying the most statistically significant features from among all of the features previously found using Find Compounds by Molecular Feature.

3. Find Compounds by Formula (MassHunter Qualitative Analysis). Importing the most statistically significant features back into MassHunter Qualitative Analy-sis as targeted features results is an improvement in finding of the features from the samples. This repeated feature analysis is referred to as recursion. An improved reliability in finding leads to a concomitant improvement in the accuracy of the analysis.

4. Analysis (Mass Profiler Professional). Recursive finding of the most statisti-cally significant features are processed by Mass Profiler Professional into a final statistical analysis and interpretation. The results from the final interpre-tation may be used to prove or disprove the hypothesis.

Find compounds by formula typically uses input molecular formula to calculate the ions and isotope patterns derived from the formula as the basis to find features in the sample data file. When the input data consists of mass and retention time molecular features, Find by Formula calculates reasonable isotope patterns and use these with retention time tolerances to find the target features in the sample data files. When the input features are filtered, previously untargeted, molecular features this repeated process of finding molecular features is referred to as recursive find-ing.

Recursive finding consists of three steps: 1. Untargeted Find Compounds by Molecular Feature in MassHunter Qualitative

Analysis (see “Finding with MassHunter Qualitative Analysis” on page 38) 2. Filtering the molecular features by abundance, retention time, sample variabil-

ity, flags, frequency, and statistical significance in Mass Profiler Professional. (See section “Feature selection for recursion - experiment setup” on page 51)

3. Targeted find compounds by formula in MassHunter Qualitative Analysis. (this section)

Importing the most statistically significant features back into MassHunter Qualita-tive Analysis as targeted features results in an improvement in finding of the fea-tures from the samples. An improved reliability in finding compounds results in a concomitant improvement in the accuracy of the analysis. Recursive finding, with replicate sampling, reduces the potential for obtaining a false positive or false nega-tive answer to the hypothesis.

75

Page 76: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Recursive finding with MassHunter Qualitative Analysis

Start Agilent MassHunter Qualitative Analysis

1. Start MassHunter Qualita-tive Analysis Software.

a Double-click the Qualitative Analysis icon located on the desktop.

or (for Qualitative Analysis version B.04.00 or later)

Click Start > Programs > Agilent > MassHunter Workstation > Qualitative Analysis B.04.00

or (for Qualitative Analysis version B.03.01)

Click Start > Programs > Agilent > MassHunter Workstation > Qualitative Analysis

Note: On-line help is available at anytime within MassHunter Qualitative Analysis by pressing the F1 key on the keyboard. When the F1 key is pressed, the informa-tion presented is specifically related to the software fields and options available in the active display.

b Select the same data files used during “Finding with MassHunter Qualitative Analysis” on page 38.

If MassHunter Qualitative Analysis was not closed from the prior steps in the metabolomics workflow, click File > Open Data File to make sure the files are properly selected.

An individual data file may be selected by using a single click.

An inclusive range of data files may be selected by pressing the Shift key when a second data file is selected.

A number of data files may be selected and added to any previously selected data files by pressing the Ctrl key when a data file is selected.

c Click Open to start MassHunter Qualitative Analysis with the selected data files or click Cancel to start MassHunter Qualitative Analysis without opening any data files. Data files may opened later by clicking File > Open Data File.

d Click Open to start MassHunter Qualitative Analysis with the selected data files

or click Cancel to start MassHunter Qualitative Analysis without opening any data files. Data files may opened later by clicking File > Open Data File.

2. Enable advanced parameters.

Advanced parameters must be enabled in MassHunter Qualitative Analysis in order to show tabs labeled Advanced in the method editor and to enable compound importing for recursive finding of molecular features.

a Click the Configuration > User Interface Configuration command.

b Mark the Show advanced parameters check box under Other. See Figure 48 on page 77.

If the files intended to be processed include GC/MS data, you mark the GC check box under the Separation types group box and mark Unit Mass (Q, QQQ) under the Mass accuracy group box.

76

Page 77: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Recursive finding with MassHunter Qualitative Analysis

Figure 48 MassHunter Qualitative Analysis user interface configuration

c Click the OK button.

d Check to make sure that File > Import Compound is an available command. See Figure 14. This command is necessary to review CEF files before importing into Mass Profiler Professional.

Figure 49 MassHunter Qualitative Analysis import compounds

Find compounds using Find Compounds by Formula

All of the parameters involved in find by formula are accessed in the tabs presented in four method editor sections that are selected from the Method Explorer window:

• Find by Formula - Options• Find by Formula - Chromatograms• Find by Formula - Mass Spectra• Find by Formula - Sample Purity

After the parameters are entered in all four method editor sections, find compounds by formula is run by clicking the Find Compounds by Formula button from within any one of these method editor sections.

3. Update the parameters in the Find by Formula - Options section.

The parameters in this section specify the rules that are applied to the formula data-base to match the database against the data based on isotope patterns (m/z and

77

Page 78: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Recursive finding with MassHunter Qualitative Analysis

abundance) and retention time. This is the first part of the recursive refinement of finding features.

a In the Method Explorer window, click Find Compounds by Formula > Find by For-mula - Options. The input options specify the rules that are applied to the input molecular formula database.

b Edit the parameters on the Formula Source tab. 1. Click the Formula Source tab.

The parameters on this tab allow you to use a molecular formula or previously created databases as the source of targeted features to find. 2. Click the Compound exchange file (CEF) button. 3. Type the directory and file name of the CEF file or click the Browse button and

select a CEF file from the Open CEF file dialog box. You want to open the CEF file that contains the most significant features created using Mass Profiler Professional from the “Feature selection for recursion - guided workflow” on page 60. For example, you can select Two-Variable Filtered by Frequency.cef.

4. Click the Mass and retention time (retention time required) button.

Values to match parameters are only active if the Compound exchange file (.CEF) button or the Database button is clicked. Mass and retention time (retention time required) is the proper selection for a CEF file. Mass and retention time (retention time optional) is the proper selection for a Database source.

Figure 50 Formula Source tab in the Find by Formula - Options section of the Method Editor

c Edit the parameters on the Formula Matching tab. 1. Click the Formula Matching tab.

The parameters in this tab specify the tolerances that are used to match the input values for mass and retention against those found in the data. 2. Type 20 for Masses tolerance and select ppm as the match tolerance units. It

is important to set this value wider than the measured instrumental acquisi-tion mass tolerance to avoid losing valid feature matches.

3. Type 0.15 for Retention times. This parameter should be no less than two times the measured retention time tolerance.

4. Select Symmetric (ppm) and ± 20 ppm for Possible m/z. The parameters under the Expansion of values for chromatographic extraction group box are used to direct the algorithm on how to handle saturated chromatographic data.

5. Mark the Limit EIC extraction range check box. 6. Type a value between 1.0 and 1.5 minutes for Expected retention time.

78

Page 79: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Recursive finding with MassHunter Qualitative Analysis

d Enter the parameters on the Positive Ions tab.

The parameters in this tab specify the positive ion adducts that the algorithm uses with the molecular formula to confirm that the feature was found in the data. Better results are derived from acquisition methodologies that have mini-mized adducts, especially sodium and potassium. 1. Click the Positive Ions tab. 2. Select the charge carriers that are known to be present in the data file. Typi-

cally positive protonated is the ideal selection. Non-adducted molecular ions, loss of an electron, are an option in find by formula.

3. Molecular formulas for specific charge carries may be entered in the input box below the charge carrier selection.

4. Neutral losses are not typically used. An exception is when a facile loss is expected.

5. Type 1 for Charge state range. 6. Clear the Dimers check box and the Trimers check box under the Aggregates

group box.

e Edit the parameters on the Negative Ions tab.

The parameters in this tab specify the negative ion adducts that the algorithm uses with the molecular formula to confirm that the feature was found in the data. Better results are derived from acquisition methodologies that have mini-mized adducts. 1. Click the Negative Ions tab. 2. Select the charge carriers that are known to be present in the data file. Typi-

cally negative deprotonated is the ideal selection. Non-adducted molecular ions, attachment of an electron, are an option in find by formula.

3. Neutral losses are not typically used. An exception is when a facile loss is expected.

4. Type 1 for Charge state range. 5. Clear the Dimers and the Trimers check boxes under the Aggregates group

box.

f Edit the parameters on the Scoring tab.

The parameters in this tab specify how to rate whether the spectral pattern is cor-rect for the molecular formula. The scoring determines a goodness of fit between observed ions to against the expected ions in the database. It is expected that as signal levels decrease the scoring parameters entered may not match as well. The defaults provided are adequate. 1. Click the Scoring tab. 2. Type the default values of 100 for Mass score, 60 for Isotope abundance

score, and 50 for Isotope spacing score under Contribution to overall score.

You can set the values of 100 for Mass score and 0 for both Isotope abundance score and Isotope spacing score, and the Isotope abundance score and the Iso-tope spacing score are not included when calculating the Score. 3. Type the default values of 2.0 for mDa and 5.6 ppm for MS mass under the

Expected data variation. 4. Type the default value of 7.5 % for MS isotope abundance. 5. Type the default values of 5.0 for mDa and 7.5 ppm for MS/MS mass.

g Edit the parameters on the Results tab.

79

Page 80: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Recursive finding with MassHunter Qualitative Analysis

The parameters in this tab specify how the results are saved which affect the ease in reviewing the results. 1. Click the Results tab. 2. Mark the Delete previous compounds check box to delete prior compound

results. Clear the Delete previous compounds check box when you want to concatenate the find by formula results to the find by MFE results and thereby allow for a manual review of whether the feature was found in both instances.

3. Click the Highlight first compounds under the New results group box. 4. Clear the Only generate compounds for matched formulas check box under

the Matched results group box. It is important to be able to review all of the results.

5. Under the Chromatograms and spectra group box, mark the Extract EIC check box.

6. Under the Chromatograms and spectra group box, clear the Extract raw spec-trum check box.

7. Under the Chromatograms and spectra group box, mark the Extract cleaned spectrum check box. Extracting chromatograms or spectra slows the process-ing. Once you are comfortable with the results, processing time is reduced by clearing these check boxes.

4. Update the parameters in the Find by Formula - Chro-matograms section.

In this section, you enter integrator parameters that are applied to the data to extract features for matching to the input formula. This is the second and most criti-cal part of the recursive refinement of the feature finding.

a In the Method Explorer window, click Find Compounds by Formula > Find by For-mula - Chromatograms. The input options specify the integrator parameters.

b Enter the parameters on the EIC Integration tab.

The parameters in this tab specify which integrator to use for the data extraction. 1. Click the EIC Integration tab. 2. Select Agile under the Integrator selection group box. Agile is the preferred

metabolomics integrator. No additional user parameters are associated with this integration.

Figure 51 EIC Integration tab in the Find by Formula - Chromatograms section

The General integrator may be selected as an alternate. When selected, an Options tab is presented below the Integrator selection heading.

Under the Detector heading: 1. Type 2 for Point sampling, 0.02 for Start threshold, and 0.1 for Stop

threshold. 2. Select 7 point for Filtering and Top for Peak location.

Under Baseline allocation: 3. Type 5 for Baseline reset > and 100 for If either edge <. 4. Click the Drop else tangent skim button.

80

Page 81: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Recursive finding with MassHunter Qualitative Analysis

The other integrator integration selections are not recommended for metabolomics.

c Enter the parameters on the EIC Peak Filters tab.

The parameters in this tab specify which ions to filter out of the chromatogram integrator results. 1. Click the EIC Peak Filters tab. 2. Click the Peak height button. 3. Mark the Absolute height check box and type in a value of 1000 counts

under the Height filters group box. With targeted feature finding, the minimum Absolute height of the feature may be smaller than the absolute height used in the compound filters for untargeted molecular feature extraction.

4. Mark the Limit (by height) to the largest check box and type in a value of 5 under the Maximum number of peaks group box. If more than five peaks are found and pass the isotope test and retention times are not used, then the most abundant peaks are reported.

5. Update the parameters in the Find by Formula - Mass Spectra section.

The parameters in the Find by Formula - Mass Spectra section are applied to the data to extract features for matching to the input formula. This is the third and final part of the recursive refinement of the feature finding.

a In the Method Explorer window, click Find Compounds by Formula > Find by For-mula - Mass Spectra. The input options specify criteria for mass spectra to include in the feature processing.

b Enter the parameters in the Peak Spectrum tab.

The parameters in this tab specify which spectra from the extracted ion chro-matograms to include in the feature processing and whether to perform back-ground subtraction on the spectra. 1. Click the Peak Spectrum tab 2. Click the Average scans > button and type 10 for the % of peak height under

the Spectra to include group box. Average scans provide mass accuracy. 3. Clear the Exclude if above X% of saturation under the TOF spectra group box.

If this check box is marked, any spectrum containing a peak within the given percentage of being saturated ion is excluded from processing for any com-pound feature.

4. Select None for MS under the Peak spectrum background group box.

Figure 52 Peak Spectrum tab in the Find by Formula - Mass Spectra section

c Enter the parameters on the Peak Location tab.

81

Page 82: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Recursive finding with MassHunter Qualitative Analysis

The parameters in this tab specify the m/z values in a spectrum that are consid-ered peaks. These parameters are only applicable profile data files, they are not applicable to centroid collected data files and may be left at the defaults. 1. Click the Peak Location tab. 2. Type the default of 2 for Maximum spike width. 3. Type the default of 0.70 for Required valley.

d Enter the parameters on the Charge State tab.

The parameters in this tab specify isotope grouping tolerances and charge state limits. It is important to set the maximum charge state to one and adjustments to the grouping model can change the compound results. 1. Click the Charge State tab. 2. Verify that the Peak spacing tolerance parameters are at the default values of

0.0025 for m/z and 7.0 ppm under the Isotope grouping check box. 3. Select Common organic molecules for the Isotope model. You select Unbi-

ased if the compounds are known to contain metals. 4. Mark the Limit assigned charge state to a maximum of check box and type 1.

This parameter should match the value typed into the Charge state range under the Charge state group box in the Positive Ions and Negative Ions tabs in the Find by Formula - Options section.

5. Clear the Treat ions with unassigned charge as singly-charged check box.

6. Turn off the sample purity calculations in the Find by Formula - Sample Purity section.

The Find by Formula - Sample Purity is not used in metabolomics analyses.

a In the Method Explorer window, click Find Compounds by Formula > Find by For-mula - Sample Purity. The input options specify criteria for mass spectra to include in the feature processing.

b Turn off sample purity calculations on the Options tab. 1. Click the Options tab. 2. Clear the Compute sample purity check box. If this check box is cleared, then

sample purity is not calculated. All other options on this tab are unavailable and the entries in the remaining tabs are not considered.

7. Save the method. Click the Method > Save As command and type a method name. If a method name was previously entered, then click Method > Save.

8. Run Find Compounds by Formula.

a Click Find > Find Compounds by Formula or click Actions > Find Compounds by Formula or click the Find Compounds by Formula icon in the toolbar in the Method Editor. If more than one data file is opened, the Find Compounds dialog box is opened.

b Select the data files to be processed from the List of opened data files.

c Click the Find button.

9. Display the Compound List table.

a If the Compound List window is not displayed, click the View > Compound List command.

b Right-click the Compound List window and click the Add/Remove Columns com-mand.

c Click Clear All.

82

Page 83: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Recursive finding with MassHunter Qualitative Analysis

d Click the Column Name column header twice to sort alphabetically by the Column Name.

e Mark the check boxes for at least the following Column Names:

Abund, Area, Base Peak, Cpd, File, Height, Ions, Mass, RT, Saturated, Show/Hide, Width, and Vol• Each of these columns is documented in the MassHunter Qualitative Analysis

Help in Reference > Columns > Compound List Table Columns under the Contents tab.

• It is normal at this time for the Area and Abund columns to be blank.• Everything in the table is exported to the CEF file.

f Click the OK button.

g Click and drag the column names so that they are arranged in the order you like. A useful order is shown in Figure 18 on page 47.

h Click the Height column heading to sort the compounds by ascending height (the abundance value of the base peak) so that the lower height values shown are the top of the list.

i Click and drag to select the first few compounds (around ten compounds). The compounds show as highlighted.

j Right-click the Compound List pane and click Extract Complete Result Set.

The results are displayed on the Chromatogram Results and MS Spectrum Results panes and are updated for each of the low height compounds by using the arrow keys to move up and down the Compound List. (See Figure 19 on page 48)• In the Chromatogram Results the extraction ion chromatograms (EIC) for each

compound is compared to the total ion chromatogram for the ions contained in all of the molecular features.

• In the MS Spectrum Results the compound spectrum for each compound is compared to the data spectrum.

k A compound may be deleted and removed from the features available for export-ing by highlighting the compound and then pressing the delete key.

10. Export the Data to a CEF file.

a Click File > Export > as CEF.

b Select the data files to be exported from the List of opened data files.

c Review and update the other parameters in the Export CEF Options dialog box.

d Click the OK button.

83

Page 84: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - experiment setup

Analysis with Mass Profiler Professional - experiment setup

This is the fourth of four components of the metabolomics workflow: 1. Find Compounds by Molecular Feature (MassHunter Qualitative Analysis).

Molecular features are extracted from the data based on their mass spectral and chromatographic characteristics. The process is referred to as Molecular Feature Extraction (MFE).

2. Feature Selection (Mass Profiler Professional). An initial differential expres-sion identifying the most statistically significant features from among all of the features previously found using Find Compounds by Molecular Feature.

3. Find Compounds by Formula (MassHunter Qualitative Analysis). Importing the most statistically significant features back into MassHunter Qualitative Analy-sis as targeted features results is an improvement in finding of the features from the samples. This repeated feature analysis is referred to as recursion. An improved reliability in finding leads to a concomitant improvement in the accuracy of the analysis.

4. Analysis (Mass Profiler Professional). Recursive finding of the most statisti-cally significant features are processed by Mass Profiler Professional into a final statistical analysis and interpretation. The results from the final interpre-tation may be used to prove or disprove the hypothesis.

The emphasis of the metabolomics workflow up to this point has been on finding the most significant molecular features from the sample data. The most significant molecular features were exported back into MassHunter Qualitative Analysis where Find by Molecular Formula was used to improve the accuracy of the feature finding within the sample files, a process referred to as recursive finding.

The recursive molecular features generated using MassHunter Qualitative Analysis are now imported into Mass Profiler Professional and subject to statistical analysis and interpretation to prove or disprove the hypothesis.

Mass Profiler Professional provides a guided workflow for importing data and a choice for either guided or advanced workflow. A Guided Workflow guides you through an initial set of filters and models for differential analysis. The Advanced Workflow does not guide you through the initial steps of the differential analysis and is not recommended. The Guided Workflow steps end with the Advanced Workflow interface.

Note: Help and detail information regarding the various fields and statistical treat-ments are available at anytime by pressing the F1 key on the keyboard or referring to the Mass Profiler Professional User Manual.

Experiment Setup is detailed in “Feature selection for recursion - experiment setup” on page 51.

The three main steps to the experiment setup are: 1. Start Mass Profiler Professional 2. Project Setup 3. Guided Import - Experiment Creation

The steps involved in the guided import typically include numbers 1, 2, 6, 7, 8, 9, and 10 from the list below. Steps 3, 4, and 5 are omitted.

1. Select Data Source: Depending upon the type of experiment and supported file format, this displays the options of data source.

2. Select Data to Import: This allows you to select the samples.

84

Page 85: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - experiment setup

3. Input Validation: Validates different types of samples. This step appears only if different types of files are selected and prompts you to select same type of files to proceed. This step applies to ChemStation and AMDIS data.

4. Signal Chooser: Allows you to select signal options specific to MassHunter Quantitation experiments.

5. Sample Summary: This provides an option to select the samples. In Generic Experiments one file can have multiple samples.

6. Filtering: Filters the data features by abundance, mass range, number of ions per feature, and by charge state.

7. Alignment: This aligns the features across the samples based on retention time and mass to the input tolerances.

8. Sample Summary: Displays Mass versus RT plot, spreadsheet, and legend for the distribution of aligned and unaligned entities in the sample.

9. Normalization: Normalizes the samples based on the available options. 10. Baselining Options: Baselines the samples.

85

Page 86: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - guided workflow

Analysis with Mass Profiler Professional - guided workflow

The Guided Workflow wizard helps you create an initial differential expression by displaying the sequence of steps on the left hand side navigator with the current step highlighted. The workflow allows you to proceed through each step by clicking the Next button. The steps followed are predetermined and based on the experiment type, experimental grouping, and conditions. Some of the steps may be skipped for certain experiments.

Guided Workflow is detailed in “Feature selection for recursion - guided workflow” on page 60.

Eight steps are involved in the Guided Workflow - Initial Quality Control. To skip later steps in the Guided Workflow, click Finish at any step. When you click Finish, the entity list is saved, and you may commence analysis using the Advanced Workflow.

1. Summary Report: Displays the summary view of the created experiment based on the parameters provided during the guided import. A Profile Plot with the samples on the X-axis and the Log Normalized abundance values on the Y-axis is displayed. If the number of samples is more than 30, they are represented in a spreadsheet view instead of Profile Plot.

2. Experiment Grouping: Independent variables and the attribute values of the independent variables must be specified to define grouping of the samples. An independent variable is referred to as a parameter name. The attribute values within an independent variable are referred to as parameter values. Samples with same parameter values within a parameter name are treated as repli-cates.

3. Filter Flags: The compounds created during the experiment creation are now referred to as entities. The entities are filtered (removed) from further analysis based on their presence across samples and parameter values (now referred to as a condition). See “Definitions” on page 157 for definitions and relation-ships of the terms used by the metabolomics workflow.

4. Filter by Frequency: Entities are further filtered based on their frequency of presence in specified samples and conditions. This filter removes irreproduc-ible entities.

5. Quality Control on Samples: The samples are presented by grouping and the current Principal Component Analysis (PCA). PCA calculates all the possible principal components and visually represents them in a 3D scatter plot. The scores shown by the axes scales are used to check data quality. The scatter plot shows one point per sample colored-coded by the experiment grouping. Replicates within a group should cluster together and be separated from sam-ples in other groups

6. Significance Analysis: The entities are filtered based on their p-values calcu-lated from a statistical analysis. The statistical analysis performed depends on the samples and experiment grouping.

7. Fold Change: Compounds are further filtered based on their abundance ratios or differences between a treatment and a control that are greater than a spec-ified cut-off or threshold value.

8. ID Browser Identification: The final entity list may be exported in a CEF file and directly imported into IDBrowser for identification.

86

Page 87: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - advanced workflow

Analysis with Mass Profiler Professional - advanced workflow

The Advanced Workflow in Mass Profiler Professional provides the tools necessary for analyzing features from your mass spectrometry data depending upon the need and aim of the analysis, the experimental design and the focus of the study. This helps you to create different interpretations to carry out the analysis based on the different filtering, normalization, and standard statistical methods.

It is recommended that you choose and use the Guided Workflow to develop in the beginning of experiment creation by choosing Guided Workflow as the Workflow Type in the New Experiment drop-down list. When the Guided Workflow is com-pleted you automatically entered in the Advanced Workflow mode and have access to all features available in Mass Profiler Professional. However, at any step in the Guided Workflow you can stop the guided workflow and immediately enter the Advanced Workflow by clicking on the Cancel button.

More information regarding the specific details of the Advanced Workflow may be found in the Mass Profiler Professional User Manual.

Figure 53 The main functional areas of Mass Profiler Professional program

87

Page 88: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - advanced workflow

Experiment setup Experiment setup allows you to review a quick start guide to Mass Profiler Profes-sional and defines the grouping and replicate structure of the experiment for inter-pretations.

Quick Start Guide a Click Quick Start Guide in the Workflow Browser.

b The Mass Profiler Professional Quick Start Guide opens in your internet browser.

c Review the Mass Profiler Professional Quick Start Guide.

Experiment Grouping a Click Experiment Grouping in the Workflow Browser.

b Follow the same process presented in Experiment Grouping (Guided Workflow - Find Differential Expression (Step 2 of 8)) on page 62.

c Click the OK button.

Create Interpretation a Click Create Interpretation in the Workflow Browser.

b Update parameters on the Parameters page (Create Interpretation (Step 1 of 3)): 1. Mark the experiment parameters you would like to use as the basis for display

and analysis. The experiments parameters are the independent variables. 2. Click the Next button.

c Update parameters on the Conditions page (Create Interpretation (Step 2 of 3)): 1. Mark the conditions that you would like to be included in the interpretation

under the Unselect conditions to exclude heading. The conditions are the attri-bute values of the independent variable or permutation of attribute values when more than one independent variable is selected in Select Parameters.

2. Mark Average over replicates in conditions to use a single value for each entity with respect to each condition. The single value is the average of the entity across the replicates of the condition.

3. Mark the measurement permitted flags under the Use Measurements flagged heading. A flag is a term used to denote the quality of an entity within a sam-ple. A flag indicates if the entity was detected in each sample as follows: Pres-ent means the entity was detected, Absent means the entity was not detected, and Marginal means the signal for the entity was saturated.

4. Click the Next button.

d Update parameters on the Interpretation page (Create Interpretation (Step 3 of 3)): 1. Type details that are saved as part of the interpretation. 2. Click the Finish button.

Quality control Quality Control allows you to decide which samples are ambiguous, or outliers, and which pass the quality criteria. Quality Control in Mass Profiler Professional is done by • “Quality Control on samples”• “Filter by Frequency”• “Filter on Sample Variability”

88

Page 89: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - advanced workflow

• “Filter by Flags”• “Filter by Abundance”• “Filter by Annotations”

Quality Control on samples a Click Quality Control on Samples in the Workflow Browser.

Display and printing of the 3D PCA Scores view and Experiment Grouping spread-sheet are described in the Guided Workflow - Find Differential Expression (Step 5 of 8) on page 69.

b Click Add/Remove Samples to limit the samples to those that meet the quality criteria based on he PCA scores.

c Click the OK button.

d Click Close to complete the quality control.

Filter by Frequency You can filter entities based on the number of times an entity occurs in the samples. The entity list created is placed under Analysis in the Experiment Navigator.

a Click Filter by Frequency in the Workflow Browser.

b Enter parameters on the Entity List and Interpretation page (Filter by Frequency (Step 1 of 4)): 1. Choose the entity list and the interpretation on which you want to run the fil-

ter. 2. Click the Next button.

c Enter parameters on the Input Parameters page (Filter by Frequency (Step 2 of 4)):

You enter the parameters that set the minimum percentage of samples an entity must be present in order to pass the filter. 1. Type the minimum percentage. 2. Select the applicable condition of the samples that each entity must be pres-

ent. 3. Click the Next button.

d Enter parameters on the Output Views of Filter by Frequency page (Filter by Fre-quency (Step 3 of 4)): 1. Review the entities that passed the filter conditions using the spreadsheet

view or the profile plot view. The total number of entities and number of enti-ties passing the filter along with their filtering condition are displayed on the top of the navigator window.

2. Click the Next button.

e Enter parameters on the Save Entity List page (Filter by Frequency (Step 4 of 4)): 1. Type in descriptive information that is stored with the saved entity list. 2. Click the Finish button.

Filter on Sample Variability Filter entities based on the standard deviation or coefficient of variation of intensity values within a condition. The entity list created is placed under Analysis in the Experiment Navigator.

a Click Filter on Sample Variability in the Workflow Browser.

89

Page 90: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - advanced workflow

b Enter parameters on the Entity List and Interpretation page (Filter on Sample Variability (Step 1 of 4)): 1. Choose the entity list and the interpretation on which you want to run the fil-

ter. 2. Click the Next button.

c Enter parameters on the Input Parameters page (Filter on Sample Variability (Step 2 of 4)): 1. Select the parameters that are used for the filter. 2. Type the percent or standard deviation value that is used for the filter. 3. Type a value for the number of conditions that an entity must meet the filter

setting in Retain entities in which at least ___ out of total conditions have values within range. The default value is one.

4. Click the Next button.

d Enter parameters on the Output Views of Sample Variability page (Filter on Sam-ple Variability (Step 3 of 4)): 1. Review the entities that passed the filter conditions using the spreadsheet

view or the profile plot view. The total number of entities and number of enti-ties passing the filter along with their filtering condition are displayed on the top of the navigator window.

2. Click the Next button.

e Enter parameters on the Save Entity List page (Filter on Sample Variability (Step 4 of 4)): 1. Type in descriptive information that is stored with the saved entity list. 2. Click the Finish button.

Filter by Flags The main goal of Filter by Flags is to remove “one-hit wonders” from further consid-eration. A one-hit wonder is an entity that appears in only one sample, is absent from the replicate samples, and does not provide any utility for statistical analysis. The entity list created is placed under Analysis in the Experiment Navigator.

a Click Filter by Flags in the Workflow Browser.

b Enter parameters on the Entity List and Interpretation page (Filter by Flags (Step 1 of 4)): 1. Click the Choose buttons to select the Entity List and the Interpretation on

which you want to run the filter. 2. Click the Next button.

c Enter parameters on the Input Parameters page (Filter by Flags (Step 2 of 4)): 1. Mark the acceptable flags. A flag is a term used to denote the quality of an

entity within a sample. A flag indicates if the entity was detected in each sam-ple as follows: Present means the entity was detected, Absent means the entity was not detected, and Marginal means the signal for the entity was sat-urated.

2. Click at least ___ out of # samples have acceptable values under the Retain Entities in which group box. The # symbol is replaced in your display with the total number of samples in your data set.

3. Type at least 2 in the entry box. 4. Click the Next button.

d Review results on the Output Views of Filter by Flags page (Filter by Flags (Step 3 of 4)):

90

Page 91: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - advanced workflow

1. Review the entities that passed the filter conditions using the spreadsheet view or the profile plot view. The total number of entities and number of enti-ties passing the filter along with their filtering condition are displayed on the top of the navigator window.

2. Click the Next button.

e Enter parameters on the Save Entity List page (Filter by Flags (Step 4 of 4)): 1. Type in descriptive information that is stored with the saved entity list. 2. Click the Finish button.

Filter by Abundance Filter out masses that have abundance values below the reliable detection limit of the mass spectrometer. You can set the proportion of conditions that must meet the criteria you specify. For example, to eliminate masses that do not meet a specified control value at least once in the experiment, you can filter them out by setting a minimum abundance value to be met in at least one condition. You can decide the proportion of conditions which must meet a certain threshold. The entity list created is placed under Analysis in the Experiment Navigator.

a Click Filter by Abundance in the Workflow Browser.

b Enter parameters on the Entity List and Interpretation page (Filter by Abundance (Step 1 of 4)): 1. Select the entity list and the interpretation on which you want to run the filter. 2. Click the Next button.

c Enter parameters on the Input Parameters page (Filter by Abundance (Step 2 of 4)): 1. Select the options to apply this filter to either Raw Data or Normalized Data. 2. Filter by: Select either the Filter by Percentile option or the Filter by Values

option with respect to the entity signal intensity (volume). 3. Range of interest: Upon selecting either the percentile or the values, you

assign upper and lower cut-off range. By default, the Upper Cut-off values recommended is 100 percent and lower cutoff value is 20 percent for normal-ized values. Similarly, the same relative default values are shown as an abso-lute range for raw values.

4. Retain entities in which: Select any one of the two options based on the condi-tions you want to retain entities in the samples and type in the respective value.

5. Click the Next button.

d Enter parameters on the Output Views of Filter by Abundance page (Filter by Abundance (Step 3 of 4)): 1. Review the entities that passed the filter conditions using the spreadsheet

view or the profile plot view. The total number of entities and number of enti-ties passing the filter along with their filtering condition are displayed on the top of the navigator window.

2. Click the Next button.

e Enter parameters on the Save Entity List page (Filter by Abundance (Step 4 of 4)): 1. Type in descriptive information that is stored with the saved entity list. 2. Click the Finish button.

91

Page 92: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - advanced workflow

Filter by Annotations Filter entities based on search fields, conditions, and search values that are part of the sample descriptions. The entity list created is placed under Analysis in the Experiment Navigator.

a Click Filter by Annotations in the Workflow Browser.

b Enter parameters on the Input Parameters page (Filter by Annotations (Step 1 of 4)): 1. Select the entity list on which you want to run the filter. 2. Click the Next button.

c Enter parameters on the Add/Remove filter conditions page (Filter by Annota-tions (Step 2 of 4)): 1. Add a condition for filtering the entities by selecting any of the options from

the Search drop-down list such as Compound Name, CAS Number, Compos-ite Spectrum, Compound Name, CompoundAlgo, Frequency Column, Library Spectrum, and Mass Column.

2. Select one of the options from the Condition drop-down list such as Equals, Does not Equal, Starts with, Ends with, and Includes.

3. Enter the value in the Search Value box. 4. You can execute multiple search queries by selecting AND or OR from Com-

bine search conditions using the drop-down list. If you do not add any valid condition, all the entities are selected.

5. You add a separate row of query by clicking the Add button, or you remove the row by clicking the Remove button.

6. Click the Next button.

d Enter parameters on the Filter Results page (Filter by Annotations (Step 3 of 4)): 1. Review the entities that passed the filter conditions using the spreadsheet

view or the profile plot view. The total number of entities and number of enti-ties passing the filter along with their filtering condition are displayed on the top of the navigator window.

2. Click the Next button.

e Enter parameters on the Save Entity List page (Filter by Annotations (Step 4 of 4)): 1. Type in descriptive information that is stored with the saved entity list. 2. Click the Finish button.

92

Page 93: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

Analysis with Mass Profiler Professional - analysis

Statistical analysis is the pair wise comparison between two conditions. Because metabolomics involves the analysis of large data sets spanning many conditions, the result of the analysis is not simply with respect to a single hypothesis, but the result of comparisons made against many hypothesis as each pair wise comparison is made. For any particular test, a p-value may be thought of as a probability of reject-ing the null hypothesis when it is in fact true. For a p-value of 0.05, approximately one out of every twenty comparisons results in a false positive (rejection of the null hypothesis when in fact it is true). Thus, if our experiment involves performing com-parisons with a p-value of 0.05, we expect five results to be false positives. Thus, a proper statistical treatment is performed by this analysis to control the false positive rate for the entire comparison set among the samples that make up the experiment.

The Statistical Analysis wizard has eight steps. Steps 3, 4, and 6 of 8 are presented in cases when 1-way ANOVA and t-test are performed against zero such as when the data has a single independent variable. The entity list created is placed under Analysis in the Experiment Navigator.

Statistical Analysis

1. Click Statistical Analysis in the Workflow Browser.

2. Enter parameters on the Input Parameters page (Step 1 of 8).

a Click the Choose button to select the Entity List. By default, the active entity list is selected and shown in the dialog.

b Click the Choose button to select the Interpretation. By default, the active inter-pretation of the experiment is selected and shown in the dialog.

c (optional) Mark Exclude missing values from calculation of fold change and p-value.

d Click the Next button.

93

Page 94: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

Figure 54 Input Parameters page (Statistical Analysis (Step 1 of 8))

3. Enter parameters on the Select Test page. (Step 2 of 8).

a Select the test to be performed. The test list varies based on the input parame-ters.

b Click the Next button.

Figure 55 Select Test page (Statistical Analysis (Step 2 of 8))

4. Enter parameters on the Posthoc test page (Step 3 of 8).

After performing a t-test or ANOVA, the posthoc test is used to determine which of the groups are particularly different from each other. The posthoc method is designed to compensate for the overstatement that typically results from performing a series of t-tests on all of the possible pairs of means.

a Select the Post Hoc test.

b Click the Next button.

Figure 56 Posthoc Test page (Statistical Analysis (Step 3 of 8))

5. Enter parameters on the Column reordering page (Step 4 of 8).

This page is presented when the selected test involves pair wise comparisons; such as when Repeated measures or Friedman is the selected test.

a Reorder the columns as you like to carry out a paired test.

b Click the Next button.

94

Page 95: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

Figure 57 Column reordering page (Statistical Analysis (Step 4 of 8))

6. Enter parameters on the p-value Computation page (Step 5 of 8).

a By default the p-value computation algorithm selects Asymptotic.

b Click the multiple testing correction method to use to control the false positive rate for the entire comparison set among the samples that make up the experi-ment. Benjamini Hochberg FDR (false discovery rate) is the default selection and is used by the Guided Workflow.

c Click the Next button.

Figure 58 p-value Computation page (Statistical Analysis (Step 5 of 8))

7. Enter parameters on the p-value Computation page (Step 6 of 8).

This p-value computation page is presented in cases when 1-way ANOVA and t-test are performed against zero - such as when the data has a single independent vari-able.

a Click the p-value Computation from a choice of either Asymptotic or Permutative.

b Type the Number of Permutations if Permutative is selected.

c Click the multiple testing correction method to use to control the false positive rate for the entire comparison set among the samples that make up the experi-

95

Page 96: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

ment. Benjamini Hochberg FDR (false discovery rate) is the default selection and is used by the Guided Workflow.

d Click the Next button.

Figure 59 p-value Computation page (Statistical Analysis (Step 6 of 8))

8. Review results on the Results page (Step 7 of 8).

This step displays the results of analysis. For example on completion of T-test, the results are displayed as three, tiled windows. The views differ based upon the tests performed.

Differential Expression Analysis Report view: This window describes the statistical test that was applied to the samples along with a summary table that organizes the results by p-value. A p-value of 0.05 is similar to stating that if the means for each parameter value (a condition of an independent variable) are identical, then 5% chance or less exists of observing a difference in the parameter value means as large as you observed. In other words, statistical treatment of random sampling from identical populations with a p-value set at 0.05 would lead to a difference smaller than you observed in 95% of experiments and larger than you observed in 5% of the experiments.

The test description used for computing p-values, type of correction used, and P-value computation type (Asymptotic or Permutative) are reported.

The last row of data in the spreadsheet shows the number of entities that would be expected to meet the significance analysis by random chance based on the p-value specified along at each column heading. If the number of entities expected by chance is much smaller than those based on the corrected p-value you have realized a selection of entities that show significance differing the means of the parameters.

The spreadsheet operations available are described in Section 5.3 The Spreadsheet View in the Mass Profiler Professional User Manual.

p-values: Each entity that survived the filters is now presented by compound along with the p-values expected and corrected for each of the interpretation sets. Each entity is uniquely identified by its average neutral mass and retention time from across the data sets.

96

Page 97: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

Venn Diagram: Display of the Venn Diagram, or other plot, depends on the samples and experiment grouping for the analysis. The entities that make up each selected section of the Venn diagram are highlighted in the p-values spreadsheet. The Venn diagram is a graphical view of the most significant entities in each of the sample analysis. Entities in common to the analyses are depicted as overlapping section of the circles. Fewer entities in the regions of overlap are an indication that the sam-ples support the hypothesis that a difference exists in the samples based on the experimental parameters.

Volcano plot: This plot comes up only if two groups exist in the Experiment Group-ing. The entities which satisfy the default p-value cutoff (0.05) appear in red and the rest appear in grey. This plot shows the negative log10 of p-value against log (base2.0) of the fold change. Compounds with a large fold-change and low p-value are easily identifiable on this view. If no significant entities are found, then you can change the p-value cut off using the Rerun Analysis button. You can choose an alternative control group from the Rerun Analysis button. The label at the top of the wizard shows the number of entities satisfying the given p-value.

You may change the plot view or export the plot to a file by clicking and right-clicking features available on the plot in a same manner similar to that presented in “Sum-mary Report: Review the Profile Plot presented in the Summary Report of the Guided Workflow - Find Differential Expression (Step 1 of 8).” on page 61. The graphical plot operations available are described in section 5 Data Visualization in the Mass Pro-filer Professional User Manual.

a Click Change cut-off to apply a new p-value cut-off.

b Move the slider or type in the p-value cut-off value. The default value is 0.05.

c Click the Close button.

d The p-value cut-off may be re-adjusted again by clicking Change cut-off until the results displayed are satisfactory. It is recommended that the analysis be re-run several times to develop an understanding of how the p-value cut-off affects the results. A larger p-value passes a larger number of entities.

e You can select one or more entities in the p-values table by clicking or clicking while pressing the Shift or Ctrl key.

f (optional) Save the selected entities as a custom entity list by clicking Save cus-tom list. 1. Click Configure Columns to select custom annotations. 2. Click the OK button. 3. Type in descriptive information that is stored with the saved entity list. 4. Click the OK button.

g Click the Next button.

97

Page 98: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

Figure 60 Results page (Statistical Analysis (Step 7 of 8))

9. Save Entity List page (Step 8 of 8).

a Type in descriptive information that is stored with the saved entity list.

b Click the Configure Columns button to select custom annotations.

c Click the OK button.

d Click the Finish button.

Figure 61 Save Entity List page (Statistical Analysis (Step 8 of 8))

98

Page 99: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

Filter on Volcano Plot A volcano plot is a log-log scatter plot where the p-value is plotted against the fold-change. A volcano plot is useful for identifying events that differ significantly between two groups of subjects. The name “volcano plot” comes from the plot's resemblance to an image of a pyroclastic volcanic eruption with the most significant points at the top of the plot as if they were spewed pieces of molten lava.

A volcano allows you to visualize the relationship between fold-change (magnitude of change) and statistical significance (which takes both magnitude of change and variability into consideration).

The Filter on Volcano Plot wizard has seven steps. Some steps may be skipped based on the experimental design. The entity list created is placed under Analysis in the Experiment Navigator.

1. Click Filter on Volcano Plot in the Workflow Browser.

2. Enter parameters on the Input Parameters page (Step 1 of 7).

a Click the Choose button to select the Entity List. By default, the active entity list is selected and shown in the dialog.

b Click the Choose button to select the Interpretation. By default, the active inter-pretation of the experiment is selected and shown in the dialog.

c Mark or clear the Exclude missing values from calculation of fold change and p-value check box.

d Click the Next button.

Figure 62 Input Parameters page (Filter on Volcano Plot (Step 1 of 7))

3. Enter parameters on the Select Test page (Step 2 of 7).

a Select Condition 1 and Condition 2.

b Select the test to be performed.

c Click the Next button.

Note: If you select TTest paired or MannWhitney paired for the Select test, then the wizard continues to steps 3 and 4 and skip step 5. At step 4, the p-value computation is selected as Asymptotic by default.

99

Page 100: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

Figure 63 Select Test page (Filter on Volcano Plot (Step 2 of 7))

4. Enter parameters on the Column reordering page (Step 3 of 7).

This page is presented when the selected test involves pair wise comparisons (for example, when TTest paired or MannWhitney paired is the selected test).

a Reorder the columns as you like to carry out a paired test.

b Click the Next button.

Figure 64 Column reordering page (Filter on Volcano Plot (Step 2 of 7))

5. Enter parameters on the p-value computation page (Step 4 of 7).

a By default the p-value Computation algorithm selects Asymptotic.

b Click the Multiple Testing Correction method to use to control the false positive rate for the entire comparison set among the samples that make up the experi-ment. Benjamini Hochberg FDR (false discovery rate) is the default selection and is used by the Guided Workflow.

c Click the Next button.

100

Page 101: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

Figure 65 p-value Computation page (Filter on Volcano Plot (Step 4 of 7))

6. Enter parameters on the p-value computation page (Step 5 of 7).

If at the Select Test page (Step 2 of 7) you selected t-Test Unpaired, Unequal Vari-ance, or MannWhitney Unpaired as the Test to be performed, then the wizard skips Column reordering page (Step 3 of 7) and p-value computation page (Step 4 of 7), and you are at this step. Otherwise, this step is skipped if you were guided through Column reordering page (Step 3 of 7) and p-value computation page (Step 4 of 7).

a Click either the Asymptotic button or the Permutative button for the p-value Computation.

b Type the Number of Permutations if you clicked the Permutative button.

c Click the Multiple Testing Correction option to use to control the false positive rate for the entire comparison set among the samples that make up the experi-ment. Benjamini Hochberg FDR (false discovery rate) is the default option and is used by the Guided Workflow.

d Click the Next button.

Figure 66 p-value Computation page (Filter on Volcano Plot (Step 5 of 7))

7. Review results on the Results page (Step 6 of 7).

This step displays the results of analysis. For example on completion of T-test, the results are displayed as three tiled windows. The views differ based upon the tests performed.

p-values table: The table shows Compound Name, p-values, corrected p-values, Fold change (Absolute), and regulation. FC Absolute means that the fold-change reported is absolute. In other words, if an entity is 2-fold up or 2-fold down, it is still called as 2.0 fold, instead of being called 2.0 fold (for up-regulation) and 0.5 (for

101

Page 102: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

down-regulation). Absolute essentially means that no directionality associated with the value exists. Directionality or regulation is indicated separately under the regula-tion column.

Differential expression analysis report: The test description used for computing p-values, type of correction used, and P-value computation type (Asymptotic or Permu-tative) are reported.

Volcano plot: This plot comes up only if two groups exist in the Experiment Group-ing. The entities which satisfy the default p-value cutoff 0.05 appear in red and the rest appear in grey. This plot shows the negative log10 of p-value against log (base 2.0) of the fold change. Compounds with a large fold-change and low p-value are easily identifiable on this view. If no significant entities are found then you can change p-value cut off using the Rerun Analysis button. You can choose an alterna-tive control group from Rerun Analysis button. The label at the top of the wizard shows the number of entities satisfying the given p-value.

a Click the Change cut-off button to apply a new p-value cut-off.

b Move the slider or type in the p-value cut-off value. The default value is 0.05.

c Click the Close button.

d The p-value cut-off may be re-adjusted again by clicking Change cut-off until the results displayed are satisfactory. It is recommended that the analysis be re-run several times to develop an understanding of how the p-value cut-off affects the results. A larger p-value passes a larger number of entities.

e You can select one or more entities in the p-values table by clicking alone or while pressing the Shift or Ctrl key.

f (optional) The selected entities can be saved as a custom entity list by clicking Save custom list. 1. Click Save custom list to save the selected entities as a custom entity list. 2. Click Configure Columns to select custom annotations. 3. Click the OK button. 4. Type in descriptive information that is stored with the saved entity list. 5. Click the OK button.

g Click the Next button.

102

Page 103: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

Figure 67 Results page (Filter on Volcano Plot (Step 6 of 7))

8. Enter information on the Save Entity List page (Step 7 of 7).

a Type in descriptive information that is stored with the saved entity list.

b Click Configure Columns to select custom annotations.

c Click the OK button.

d Click the Finish button.

Figure 68 Save Entity List page (Filter on Volcano Plot (Step 7 of 7))

103

Page 104: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

Fold Change Fold change is a signed value that describes how much a quantity changes from its initial to its final value. For example, when a quantity changes from a value of 60 to a value of 15 the fold change is -4. The quantity experienced a four-fold decrease. Fold change is the ratio of the final value to the initial value.

Fold change analysis in metabolomics is used to identify entities with abundance ratios, or differences between a treatment and a control, that are in excess of speci-fied cut-off or threshold value. Fold change is calculated between the conditions where Condition 1 and another condition, Condition 2, are treated as a single group.

Raw Abundance: Evaluates the absolute ratio between Condition 2 and Condition 1; Fold change= |Condition1/Condition2|.

Normalized Abundance: Evaluates the absolute difference between the normalized intensities of the conditions; Fold change= |(Condition1 - Condition2)|.

The Fold Change wizard has six steps, and step 3 of 6 is skipped and not docu-mented. Some steps may be skipped based on the experimental design. The entity list created is placed under Analysis in the Experiment Navigator.

1. Click Fold Change in the Workflow Browser.

2. Enter parameters on the Input Parameters page (Fold Change (Step 1 of 6)).

a Click the Choose button to select the Entity List. By default, the active entity list is selected and shown in the dialog.

b Click the Choose button to select the Interpretation. By default, the active inter-pretation of the experiment is selected and shown in the dialog.

c Mark, if desired, Exclude missing values from calculation of fold change and p-value.

d Click the Next button.

Figure 69 Input Parameters page (Fold Change (Step 1 of 6))

3. Enter parameters on the Pairing Options page (Fold Change (Step 2 of 6)).

a Select the pairing option based on parameters and conditions in the selected interpretation.

With two or more groups, you can evaluate fold change either pair-wise or with respect to a control by selecting All against single condition. If All against a sin-gle condition is selected, the control samples need to be specified. In pair-wise conditions, the order of conditions can also be reversed using the change icon.

104

Page 105: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

b Reorder the columns as you like to carry out a paired test.

c Mark the test to be performed in the Select column.

d Click the Next button.

Figure 70 Pairing Options page (Fold Change (Step 2 of 6))

4. Enter parameters on the p-value Computation page (Fold Change (Step 4 of 6)).

a Click either Asymptotic or Permutative for the p-value Computation.

b Type the Number of Permutations if permutative is selected.

c Click the Multiple Testing Correction method to use to control the false positive rate for the entire comparison set among the samples that make up the experi-ment. Benjamini Hochberg FDR (false discovery rate) is the default selection and is used by the Guided Workflow.

d Click the Next button.

105

Page 106: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

Figure 71 p-value Computation page (Fold Change (Step 4 of 6))

5. Review results on the Results page (Fold Change (Step 5 of 6)).

All of the entities passing the fold change cut-off are displayed along with their annotations. The default view opens a Profile Plot (by group).

a Click the Change cut-off button to apply a new p-value cut-off.

b Move the slider or type in the p-value cut-off value. The default value is 0.05.

c Click the Close button.

d The p-value cut-off may be re-adjusted again by clicking Change cut-off until the results displayed are satisfactory. It is recommended that the analysis be re-run several times to develop an understanding of how the p-value cut-off affects the results. A larger p-value passes a larger number of entities.

e You can select one or more entities in the p-values table using left-click alone or while pressing the Shift or Ctrl key.

f Optional - The selected entities can be saved as a custom entity list by clicking the Save custom list button. 1. Click the Save custom list button. 2. Click Configure Columns to select custom annotations. 3. Click the OK button. 4. Type in descriptive information that is stored with the saved entity list. 5. Click the OK button.

g Click the Next button.

106

Page 107: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

Figure 72 Results page (Fold Change (Step 5 of 6))

6. Enter parameters on the Save Entity List page (Fold Change (Step 6 of 6)).

a Type in descriptive information that is stored with the saved entity list.

b Click Configure Columns to select custom annotations.

c Click the OK button.

d Click the Finish button.

Figure 73 Save Entity List page (Fold Change (Step 6 of 6))

107

Page 108: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

Clustering Clustering is the organization of a set of samples or entities into subsets (clusters) in a fashion such that the samples in each cluster have similar features. Clustering is a method of unsupervised learning and is a powerful method to organize com-pounds or entities and conditions in the data set into clusters based on the similarity of their feature abundance profiles.

Mass Profiler Professional implements a variety of clustering methods: K-Means, Hierarchical, and Self Organizing Maps (SOM), clustering, along with a variety of dis-tance functions - Euclidean, Square Euclidean, Manhattan, Chebychev, Differential, Pearson Absolute, Pearson Centered, and Pearson Uncentered. Data is sorted on the basis of the available distance measures to group entities or conditions. Since differ-ent algorithms work on different kinds of data, these algorithms and distance mea-sures ensures that a wide variety of data can be clustered effectively.

Interactive views such as the ClusterSet View, the Dendrogram View, and the U Matrix View are provided for visualization. These views allow drilling down into sub-sets of data and collecting together individual entity lists into new entity lists for fur-ther analysis.

The entity list created is placed under Analysis in the Experiment Navigator. Cluster-ing results are presented as:

Compound Tree: This is a dendrogram of the entities showing the relationship between the entities generated by Hierarchical Clustering.

Condition Trees: This is a dendrograms of the conditions and shows the relation-ship between the conditions in the experiment generated by Hierarchical Cluster-ing.

Combined Trees: This is a two-dimensional dendrograms that results from per-forming Hierarchical Clustering on both entities and conditions which are grouped according to the similarity of their abundance profiles.

Classification: This is a cluster set view of entities grouped into clusters based on the similarity of their abundance profiles.

Some clustering algorithms, like Hierarchical Clustering, do not distribute data into a fixed number of clusters, but rather produce a grouping hierarchy. Most similar enti-ties are merged together to form a cluster and this combined entity is treated as a unit thereafter until all the entities are grouped together. The result is a tree struc-ture or a dendrogram, where the leaves represent individual entities and the internal nodes represent clusters of similar entities.

The Fold Change wizard has four steps.

1. Click Clustering in the Workflow Browser.

2. Enter parameters on the Input Parameters page (Clustering (Step 1 of 4)).

a Click the Choose button to select the Entity List. By default, the active entity list is selected and shown in the dialog.

108

Page 109: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

b Click the Choose button to select the Interpretation. By default, the active inter-pretation of the experiment is selected and shown in the dialog.

c Select the Clustering Algorithm.

d Click the Next button.

Figure 74 Input Parameters page (Clustering (Step 1 of 4))

3. Enter parameters on the Input Parameters page (Clustering (Step 2 of 4)).

a Select the Cluster on parameter. The possible values are Entities, Conditions or Both entities and conditions.

b Select the Distance metric. This parameter determines how the similarity of two entities or conditions is calculated. The metric influences the shape of the clus-ters, as some elements may be close to one another according to one metric and farther away according to another metric.

c Type the Number of clusters. The default is 3.

d Type the Number of iterations. The default is 50.

e Click the Next button.

Figure 75 Input Parameters page (Clustering (Step 2 of 4))

4. Enter parameters on the Output views page (Clustering (Step 3 of 4)).

Depending on the clustering method and parameters selected, clustering results are presented as: ClusterSet View, the Dendrogram View, and the U Matrix View. These views allow you to inspect the clustering analysis results. If the results are not satis-factory, click the Back button, change the parameters and rerun the clustering algo-rithm.

Additional information regarding the options for changing the output view can be found by pressing F1 or reviewing Chapter 6 Clustering in the Mass Profiler Profes-sional User Manual.

109

Page 110: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

a Click the Next button.

Figure 76 Output views page (Clustering (Step 3 of 4))

5. Enter parameters on the Object Details page (Clustering (Step 4 of 4)).

a Type in descriptive information that is stored with the saved entity list.

b Click the Finish button.

Figure 77 Object Details page (Clustering (Step 4 of 4))

Find Similar Entities Similarity metrics available are Euclidean (finds similar entities based on the vector distance between two entities), Pearson (finds similar entities based on the trend and rate of change of trend of the chosen entity), and Spearman (finds similar enti-ties based on the trend of the chosen entity).

Euclidean - It calculates the Euclidean distance where the vector elements are the columns. The square root of the sum of the square of the A and the B vectors for

110

Page 111: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

each element is calculated and then the distances are scaled between -1 and +1. Result = (A-B) · (A-B).

Pearson Correlation - It calculates the mean of all elements in vector X. Then it sub-tracts that value from each element in X and calls the resulting vector A. It does the same for Y to make a vector B. Result = A · B/(|A| · |B|)

Spearman Correlation - It orders all the elements of vector X and uses this order to assign a rank to each element of X. It makes a new vector X' where the i-th element in X' is the rank of Xi in X and then makes a vector A from X' in the same way as A was made from a in the Pearson Correlation. Similarly, it makes a vector B from Y. Result = A · B/(|A| · |B|). The advantage of using Spearman Correlation is that it reduces the effect of the outliers on the analysis.

The entity list created is placed under Analysis in the Experiment Navigator. The Find Similar Entities wizard has four steps.

1. Click Find Similar Entities in the Workflow Browser.

2. Enter parameters on the Input Parameters page (Find Similar Entities (Step 1 of 3)).

a Click the Choose button to select the Entity List. By default, the active entity list is selected and shown in the dialog.

b Click the Choose button to select the Interpretation. By default, the active inter-pretation of the experiment is selected and shown in the dialog.

c Click the Select button to select the Choose Query Entry.

d Select the Similarity Metric.

e Click the Next button.

Figure 78 Input Parameters page (Find Similar Entities (Step 1 of 3))

3. Enter parameters on the Output View of Find Similar Entities page (Find Similar Entities (Step 2 of 3)).

Visualize the results of the analysis in a profile plot. The abundance profile of the target entity is shown in bold and along with the profiles of the entities whose corre-lation coefficients to the target profile are above the similarity cutoff.

a Click the Change cut-off button to apply new cut-off values.

b Move the slider or type in the Minimum and Maximum values. The default for the Minimum is 0.95 and the default for the Maximum is 1.0.

111

Page 112: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

c Click the Close button.

d Optional - The selected entities can be saved as a custom entity list by clicking Save custom list. 1. Click the Save custom list button. 2. Click Configure Columns to select custom annotations. 3. Click the OK button. 4. Type in descriptive information that is stored with the saved entity list. 5. Click the OK button.

e Click the Next button.

Figure 79 Output View page (Find Similar Entities (Step 2 of 3))

4. Enter information on the Save Entity List page (Find Similar Entities (Step 3 of 3)).

a Type in descriptive information that is stored with the saved entity list.

b Click the Configure Columns button to select custom annotations.

c Click the OK button.

d Click the Finish button.

112

Page 113: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

Figure 80 Save Entity List page (Find Similar Entities (Step 3 of 3))

Principal Component Analysis

Principal Component Analysis (PCA) facilitates viewing of data using a separation method. This is fairly useful for data containing the several thousand pieces of infor-mation (features) for each sample. Redundancy can exist in a large collection of compounds, and reducing the dimensionality of the input data is the challenge that PCA overcomes.

Viewing data in 2 or 3 dimensions is easier than viewing it in multiple dimensions. You can either overlook the less important dimensions or combine several dimen-sions to create a manageable number of dimensions. The latter is performed by PCA.

PCA detects the major trend in your data by linearly combining dimensions. Each lin-ear combination produces an Eigen vector that in short represents a fraction of the variability of the samples. The linear combinations (called Principal Axes or Compo-nents) are ordered in decreasing order of the associated Eigen value. Typically, two or three of the top few linear combinations in this ordering serve as very good set of dimensions to project and view the data in a PCA plot. These dimensions capture most of the variability or information in the sample data.

The entity list created is placed under Analysis in the Experiment Navigator. The Principle Component Analysis wizard has four steps.

113

Page 114: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

1. Click Principle Component Analysis in the Workflow Browser.

2. Enter parameters on the Entity List and Interpreta-tion page (PCA (Step 1 of 4)).

a Click the Choose button to select the Entity List. By default, the active entity list is selected and shown in the dialog.

b Click the Choose button to select the Interpretation. By default, the active inter-pretation of the experiment is selected and shown in the dialog.

c Click the Next button.

Figure 81 Entity List and Interpretation page (PCA (Step 1 of 4))

3. Enter parameters on the Input Parameters page (PCA (Step 2 of 4)).

a Select the value for PCA on. You can select either Entities or Conditions.

b Click a button for the Pruning option.

Typically, only the first few Eigen vectors (principal components) capture most of the variation in the data. The execution speed of PCA algorithm can be greatly enhanced only when a few Eigen vectors are computed. The pruning option deter-mines how many Eigen vectors are computed eventually. You can explicitly spec-ify the exact number by selecting Number of Principal Components option, or specify that the algorithm compute as many Eigen vectors as required to capture the specified Total Percentage Variation in the data.

c Type in the Total percentage variation or the Number of principal components to use for the pruning.

d Mark the Mean centered check box.

e Mark the Scale check box.

The normalization options allow you to normalize all columns to zero mean and unit standard deviation before performing PCA. This is enabled by default. It is recommended to use these normalization options if the range of values in the data columns varies widely.

f Click the Next button.

114

Page 115: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

Figure 82 Input Parameters page (PCA (Step 2 of 4))

4. Review the Output views page (PCA (Step 3 of 4)).

Eigenvalues: The Eigen values (E0, E1, E2, etc.) are displayed on the X-axis and their respective percentage contribution to the sample variability is displayed on the Y-axis. The minimum number of principal axes required to capture most of the informa-tion in the data can be gauged from this plot. The red line indicates the actual varia-tion captured by each Eigen value, and the blue line indicates the cumulative variation captured by all Eigen values up to that point.

The minimum value for a PCA Eigen values is (1 x 10-3) / (total number of Principal components).

The maximum value is the square root of the maximum float value handled by your PC.

PCA Scores: This is a scatter plot of data projected along the principal component axes. By default, the first and second PCA components capture the maximum varia-tion of the data. If the data set has parameter assignments, the points are colored with respect to the assignments. It is possible to visualize the separation of assign-ments in the data. You can select different PCA components using the menu for the X and Y axes. You can select entities and save using Save custom list button.

PCA Loadings: Each principal component is a linear combination of the selected col-umns. The relative contribution of each column to an Eigen vector is called its load-ing and is depicted in the PCA loadings plot. The X-axis consists of columns, and the Y-axis denotes the weight contributed to an Eigen vector by that column. Each Eigen vector is plotted as a profile, and it is possible to visualize whether a certain subset of columns exists which overwhelmingly contribute (large absolute value of weight) to an important Eigen vector. This would indicate that those columns are distin-guishing features in the whole data.

Legend: This shows the legend for the respective active window.

a Click the Next button.

115

Page 116: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

Figure 83 Output Views page (PCA (Step 3 of 4))

5. Enter information on the Save Entity List page (PCA (Step 4 of 4)).

a Type in descriptive information that is stored with the saved entity list.

b Click Configure Columns to select custom annotations.

c Click the OK button.

d Click the Finish button.

116

Page 117: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

Figure 84 Save Entity List page (PCA (Step 4 of 4))

Find Minimal Entities Find Minimal Entities is the machine learning techniques to classify or analyze data to identify the specific features or attributes of the available data which are helpful in making decision. Find minimal masses becomes relevant when you have a large numbers of features and strong intercorrelation among the features.

In Mass Profiler Professional, feature subset selection is done using Forward Selec-tion Algorithm, Backward Elimination Algorithm, and Genetic Algorithm.

The entity list created is placed under Analysis in the Experiment Navigator. The Find Minimal Entities wizard has ten steps.

1. Click Find Minimal Entities in the Workflow Browser.

2. Enter parameters on the Entity List and Interpreta-tion page (Feature Selection (Step 1 of 10)).

a Click the Choose button to select the Entity List. By default, the active entity list is selected and shown in the dialog.

b Click the Choose button to select the Interpretation. By default, the active inter-pretation of the experiment is selected and shown in the dialog.

c Click the Next button.

117

Page 118: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

Figure 85 Entity List and Interpretation page (Feature Selection (Step 1 of 10))

3. Enter parameters on the Choose Selection Algorithm page (Feature Selection (Step 2 of 10)).

a Click one of the buttons for the Choose selection algorithm.

If you click Forward selection algorithm, then after you click Next, the Forward Selection Parameters page (Step 3 of 10) is displayed.

If you click Backward selection algorithm, then after you click Next, the Back-ward Selection Parameters page (Step 4 of 10) is displayed.

If you click Genetic algorithm, then after you click Next, the Genetic Parameters page (Step 5 of 10) is displayed.

b Click the Next button.

Figure 86 Entity List and Interpretation page (Feature Selection (Step 2 of 10))

4. Enter parameters on the Forward Selection Parameters page (Feature Selection (Step 3 of 10)).

a Type the Target size for features. You must enter the target size of the features. By default the value shown is one tenth of the total entities.

b Select an Evaluation metric. Select Overall Accuracy or Minimum Class Accu-racy.

c Type the Target metric value. The default value is 100.

d Select an Evaluation Algorithm. Select one of the Evaluation Algorithms to build class: Naive Bayes, Neural Network, Support Vector Machine, and Axis Parallel Decision Tree. See the Class Prediction Chapter in the Mass Profiler Professional User Manual for more information about these classification algorithms.

e Select the Evaluation metric type. Select Validation Accuracy or Train Accuracy.

f Click the Next button.

118

Page 119: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

Figure 87 Entity List and Interpretation page (Feature Selection (Step 3 of 10))

5. Enter parameters on the Backward Elimination Parameters page (Feature Selection (Step 4 of 10)).

If you selected the Backward Elimination Algorithm in the Choose Selection Algo-rithm page (Feature Selection (Step 2 of 10)), then this step is where you specify the algorithm parameters. The parameters are the same as in Forward Selection Algo-rithm.

Backward elimination begins with a full set of features and removes one feature per cycle. The resulting set, with one less feature than the earlier one is evaluated, and the algorithm selects the highest performing candidate (again subject to a specified validation metric). If an empty set is reached or the subsequent removal of any fea-ture only deteriorates the current performance, the search is stopped.

The goal of backward elimination is to consider the contribution of all features ini-tially and then try to remove the most irrelevant features, leaving a smaller and more predictive subset. Since backward elimination starts with all of the features present in the set, it is more computationally intensive than the forward selection algorithm.

a Type the Target size for features. You must enter the target size of the features. By default the value shown is one tenth of the total entities.

b Select an Evaluation metric. Select Overall Accuracy or Minimum Class Accu-racy.

c Type the Target metric value. The default value is 100.

d Select an Evaluation Algorithm. Select one of the evaluation algorithms to build class: Naive Bayes, Neural Network, Support Vector Machine, and Axis Parallel Decision Tree. See the Class Prediction Chapter in the Mass Profiler Professional User Manual for more information about these classification algorithms.

e Select the Evaluation metric type. Select Validation Accuracy or Train Accuracy.

f Click the Next button.

119

Page 120: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

Figure 88 Backward Elimination Parameters page (Feature Selection (Step 4 of 10))

6. Enter parameters on the Genetic Algorithm Parameters page (Feature Selection (Step 5 of 10)).

If you select Genetic Algorithm in the Choose Selection Algorithm page (Feature Selection (Step 2 of 10)), then this step is where you specify the algorithm parame-ters. More information about this algorithm may be reviewed in the Genetic Algo-rithm Parameters section of the Mass Profiler Professional User Manual.

a Type the Population size. The default is 25.

b Type the Number of generations. The default is 10.

c Type the Mutation rate. The default is 1.

d Type the Target size (max) for features. The default is 1.

e Select a Fitness metric. Select Overall Accuracy or Minimum Class Accuracy.

f Type the Target fitness value. The default value is 100.

g Select an Evaluation Algorithm. Select one of the evaluation algorithms to build class: Naive Bayes, Neural Network, Support Vector Machine, and Axis Parallel Decision Tree. See the Class Prediction Chapter in the Mass Profiler Professional User Manual for more information about these classification algorithms.

h Select the Evaluation metric type. Select Validation Accuracy or Train Accuracy.

i Click the Next button.

Figure 89 Genetics Algorithm Parameters page (Feature Selection (Step 5 of 10))

120

Page 121: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

7. Enter the parameters on the Feature Set Evaluation Algorithm page (Feature Selection (Step 6 of 10)).

The parameters vary depending on the algorithm selected previously.

a Select a Distribution type.

b Select a Validation type.

c Type the Number of folds. The default value is 3.

d Type the Number of repeats. The default value is 10.

e Click the Next button.

Figure 90 Feature Set Evaluation Algorithm page (Feature Selection (Step 6 of 10))

8. Enter parameters on the Choose Features page (Feature Selection (Step 7 of 10)).

Select the features from the available items list by selecting and clicking on the movement direction arrow to place them in the Selected items list. You can also search for an item from the Match by item drop-down list. By default, all of the fea-tures are selected.

a Click the Next button.

Figure 91 Choose Features page (Feature Selection (Step 7 of 10))

9. Enter parameters on the Forced Features page (Feature Selection (Step 8 of 10)).

Select the features from the available items list by selecting and clicking on the movement direction arrow to place them in the Selected items list. You can also search for an item from the Match by item drop-down list. By default, none of the feature are selected.

a Click the Next button.

121

Page 122: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

Figure 92 Forced Features page (Feature Selection (Step 8 of 10))

10. Review the output on the Algorithm Outputs page (Feature Selection (Step 9 of 10)).

The output of the algorithm used in the find minimal mass process is displayed in a spreadsheet as well as a Feature vs. Accuracy plot. In the plot the drop-down list of X and Y axis allows you to select different combination of the parameters and view the plot.

The spreadsheet displays Number of descriptors, Train Overall Accuracy, Train Accuracy in condition 1, Train Accuracy in condition 2, Minimum of Train Accu-racy, Validation Overall Accuracy, Validation Accuracy in condition 1, Validation Accuracy in condition 2, and Minimum of Validation Accuracy. This also consists of options of Save Custom Feature Set and Run Model.

Save Custom Feature Set allows you to open and save the entities in the entity list by selecting the descriptor and clicking on the icon.

Run Model makes the prediction with help of train accuracies. Select Descriptor and click the Run Model icon. Application displays the Prediction in the form of Confu-sion Matrix, Lorenz Curve, Validation Report, and Model Formula.

Model Formula form has Save Model option which allows you to save the model in the specified path.

The number of descriptors continues increasing after each cycle in case of forward selection algorithm.

a Click the Next button.

122

Page 123: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - analysis

Figure 93 Algorithm Outputs page (Feature Selection (Step 9 of 10))

11. Enter parameters on the Feature Selection Analysis page (Feature Selection (Step 10 of 10)):

a Type in descriptive information that is stored with the saved entity list.

b Click the Configure Columns button to select custom annotations.

c Click the OK button.

d Click the Finish button.

Figure 94 Feature Selection Analysis page (Feature Selection (Step 10 of 10))

123

Page 124: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

Analysis with Mass Profiler Professional - additional features

Additional features includes Class Prediction, Results Interpretations, and Utilities that allow you to complete your statistical analysis, apply your results, and make fur-ther refinements in your results.

Results Interpretations provide you with tools to analyze your experimental results through Pathway Analysis, Find Significant Pathways, and Find Similarity Lists. Your results are a list of compounds or an entity list that are significantly abundant in your experiment. Mass Profiler Professional provides you with different ways to ana-lyze your entity list and provides you with steps that help you find entities similar to the chosen entities and help you compare the entity lists with pathways.

Class Prediction

Class prediction is a supervised learning method where the prediction algorithm learns characteristic features from known samples (a training set) and establishes a prediction rule to classify new samples (a test set).

Prediction models in Mass Profiler Professional build a model based on the abun-dance or expression profile in any known condition and with the help of this model try to predict the class of an unknown sample. For example, given mass spectrome-try data from samples representing subjects with different kinds of cancer, a model can be built from this data which can predict the cancer type in a new sample.

Mass Profiler Professional provides a workflow to build a model and predict the sample from mass spectrometry data. The process of learning abundance profiles from the sample data is called training. Training requires a data set in which class labels of the various samples are known. Performing statistical validation on classifi-cation methods and model parameters is called validation. Once validated the class prediction signatures can be tested on new samples.

This technique is best used with at least 20 samples or conditions per class. Mass Profiler Professional supports five different class prediction tools:

• Decision Tree (DT)• Neural Network (NN)• Support Vector Machine (SVM)• Naive Bayesian (NB)• Partial Least Square Discrimination (PLSD)

Models built with these algorithms can then be used to classify samples or com-pounds into discrete classes for further functional and/or diagnostic studies.

Information about class prediction can be found in Chapter 7 - Class Prediction: Learning and Predicting Outcomes in the Mass Profiler Professional User Manual.

Build Prediction Model The process of learning abundance profiles from the most significant features in the sample data is called training. Training is the process of building the prediction model. Training requires a data set in which class labels of the various samples are known. Once the prediction model is validated, the class prediction can be tested on new samples using Run Prediction.

124

Page 125: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

The Build Prediction Model launches a wizard with five steps for building a predic-tion model. Statistical information about the prediction model can be found in Sec-tion 7.2.1and thereafter in the Mass Profiler Professional User Manual.

1. Click Build Prediction Model in the Workflow Browser.

2. Enter parameters on the Input Parameters page (Class Prediction (Step 1 of 5)).

a Click the Choose button to select the Entity List. By default, the active entity list is selected and shown in the dialog.

b Click the Choose button to select the Interpretation. By default, the active inter-pretation of the experiment is selected and shown in the dialog.

c Select the Class prediction algorithm.

d Click the Next button.

Figure 95 Input Parameters page for Class Prediction

3. Enter parameters on the Validation Parameters page (Class Prediction (Step 2 of 5)).

a Choose the parameters for the class prediction model. In most cases, the defaults give acceptable results.

b Select the Pruning method.

c Select the Goodness function.

d Type the Leaf impurity.

e Select the Leaf impurity type.

f Select the Validation type.

g Type the Number of folds.

h Type the Number of repeats.

i Type the Attribute Fraction at nodes.

j Click the Next button.

125

Page 126: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

Figure 96 Validation Parameter page for Class Prediction (Step 2 of 5)

4. Review the results on the Validation Algorithm Outputs page (Class Prediction (Step 3 of 5)).

a Review and modify the Prediction Results and Confusion Matrix displays.

b Click the Next button.

Figure 97 Validation Algorithm Outputs page for Class Prediction (Step 3 of 5)

5. Review the Training Algorithm Outputs page (Class Prediction (Step 4 of 5)).

a Review and modify the Prediction Results, Confusion Matrix, Lorenz, and Model Formula displays.

b Click the Next button.

126

Page 127: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

Figure 98 Training algorithm outputs for class prediction (Step 4 of 5)

6. Enter information on the Class Prediction Model page (Class Prediction (Step 5 of 5)).

a Type in descriptive information that is stored with the class prediction model.

b Click the Finish button.

Figure 99 Class prediction Model information (Step 5 of 5)

Run Prediction The prediction model is run on the current experiment and the results are output in a table. The prediction model runs on all the samples in the experiment and predict the outcome for each sample in the experiment. The predicted results are shown in the table along with a confidence measure appropriate to the model.

127

Page 128: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

The Run Prediction presents you with a set of model/data sets that are applicable to apply the model. Detail about the prediction model can be found in Section 7.2.2 and thereafter in the Mass Profiler Professional User Manual.

a Click Run Prediction in the Workflow Browser.

b Select the Prediction model/data set to run from the list.

c Double-click a list item to review more information about the prediction model and data set.

d Click the OK button.

Figure 100 Run Prediction dialog

e Click the Predict button.

Figure 101 Output views of the prediction classification

f Click the Close button.

128

Page 129: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

Results Interpretation

Mass Profiler Professional provides several tools to analyze your experimental results, extract the most significant features, and perform feature identification.

Pathways Analysis After identifying compounds of interest in Mass Profiler Professional, Pathway Anal-ysis allows you to put the compounds into a biological context.

Mass Profiler Professional supports pathway analysis via the following:• A database of biological and chemical entities, relationships between entities,

and properties of these entities and relationships.• A set of pathway creation algorithms which query biological databases or

other sources of literature with a specified list of entities.• An interactive pathway viewer which allows visualization of these pathways

and overlay of data on these pathways.• A Find Significant Pathway function that helps to determine which of the cre-

ated pathways have significant overlap with a specified list of entities.

More information about class prediction can be found in Chapter 8 - Pathway Analy-sis in the Mass Profiler Professional User Manual.

Install pathway database Mass Profiler Professional supports databases for various organisms via updatable data packages.

Click Annotations > Update Pathway Interactions from the menu to see the list of available organism and update the database for the organism of your choice.

Click Annotations > Pathway Database Statistics to confirm that the pathway data-bases are available.

Perform Pathway Analysis The Perform Pathway Analysis launches a wizard with five steps for building a pre-diction model.

The available Algorithm options for the selection of the Analysis Type of Simple are:

Direct Interactions: Finds interactions that connect the entities in the selected entity list.

Network Targets and Regulators: Finds entities that are upstream and down-stream of two or more entities from the original list.

Network Targets: Finds downstream entity targets that connect two or more enti-ties from the original list.

Network Regulators: Finds upstream entity regulators that connect to two or more entities from the original list.

Network Binders: Finds entities that bind (connected by binding interactions) to two or more entities from the original entity list.

129

Page 130: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

Network Modifiers: Finds protein entities that are either regulators or targets of biochemical protein modifications of two or more proteins from the original entity list.

Transcription Regulators: Find protein entities regulating mRNA expression of, or whose expressions are regulated by, two or more entities from the original list.

Transport Regulators: Finds all molecules that are regulating the transport of other molecules.

Metabolism Regulators: Finds molecules that are regulating the metabolism of biomolecules.

Small Molecules: Finds all small molecules, regulators, and targets of two or more entities from the original list.

Biological Processes: Finds all biological process entities connected to two or more entities from the original list.

Shortest Connect: Finds the shortest path that connects all entities in a given list into a single network.

The available Algorithm options for the selection of the Analysis Type of Advanced are:

Direct: Finds interactions that connect the entities in the selected entity list.

Expand: Expands the existing network to include the first-degree neighbors of the selected entities.

Shortest Connect: Finds the shortest path to connect two sets of nodes. The cri-teria use the smallest number of intermediate entities to trace out an uninter-rupted path between any two sub-clusters within a network.

The Pathway Analysis launches a wizard with five steps.

1. Click Pathway Analysis in the Workflow Browser.

2. Enter parameters on the Input Parameter page (Pathway Analysis (Step 1 of 5)).

a Click the Choose button to select the Input List. By default, the active entity list is selected and shown in the dialog.

b Select the Analysis Type. Select Simple for the guided pathway analysis. Select Advanced for changing the parameters for pathway creation.

c Select the Algorithm.

d Click the Next button.

130

Page 131: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

Figure 102 Input Parameters page for Pathway Analysis (Step 1 of 5)

If Simple was selected for the Analysis Type, the wizard proceeds directly to the Analysis Result page (Pathway Analysis (Step 4 of 5)) and the filters are set to the following defaults:

• Algorithms type: local/ global• Connectivity: Connectivity relevance: 50 and Connectivity: <= 2• Entity filter: All entities are selected• Quality filter: >=9• Relation filter: All interaction types are selected

3. Review results on the Matching Statistics page (Pathway Analysis (Step 2 of 5)).

This wizard displays how many entities matched from the input Entity List, to the pathway database. The Compound Name, Name, Type (Type of the entity: protein, small molecule etc.), Global Connectivity (total number of interactions in which it participates in the database) and Match Result (indicating status on whether the entity was matched or unmatched into the database) are displayed in a table. A sta-tus dialog shows the number of matched, unmatched and redundant entities.

a Click the Next button.

Figure 103 Matching Statistics page for Pathway Analysis (Step 2 of 5)

131

Page 132: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

4. Enter parameters on the Analysis Filters page (Pathway Analysis (Step 3 of 5)).

This page is displayed when the Analysis Type is Advanced and the Algorithm is Direct Interactions on the Input Parameter page (Pathway Analysis (Step 1 of 5)):

The Analysis Type is selected on the Input Parameter page (Pathway Analysis (Step 1 of 5)).

Type a value between 1 and 10 in the Relation score to use for the matching. You can build or expand a network by including only interactions with a particular quality score. The default score defining interaction quality is set to 9.

b Mark the Select relation type for the interactions to be evaluated. You can select one or more types of interactions to generate the network from the starting list of entities, depending upon the biology of the problem. The default settings use all type of interactions to create the network. For example, to create a network of transcriptional analysis, you can select only expression and promoter binding types of interactions.

c Click the Next button.

Figure 104 Analysis Filters page (Pathway Analysis (Step 3 of 5)) when the Algo-rithm is Direct Interactions

5. Enter parameters on the Analysis Filters page (Pathway Analysis (Step 3 of 5)).

when the Analysis Type is Advanced and Algorithm is Expand Interactions on the Input Parameter page (Pathway Analysis (Step 1 of 5)):

This step only appears if Advanced was selected as the Analysis Type on the Input Parameter page (Pathway Analysis (Step 1 of 5)).

a Type a value between 1 and 10 in the Relation score to use for the matching. You can build or expand a network by including only interactions with a particular quality score. The default score defining interaction quality is set to 9.

b Mark the Select relation type for the interactions to be evaluated. You can select one or more types of interactions to generate the network from the starting list of entities, depending upon the biology of the problem. The default settings use all type of interactions to create the network. For example, to create a network of transcriptional analysis, you can select only expression and promoter binding types of interactions.

c Type a value for Entity local connectivity. This filter allows you to add new enti-ties to a given network by ranking the new entities based upon their local connec-

132

Page 133: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

tivity. The default settings are > 2, to consider only those entities which are connected to 2 or more entities from the starting list.

d Mark the Select entity type for the entities to be evaluated check box. You can preferentially select the types of entities to add to the expanded network, depend-ing upon the biological question. The default settings use all type of entities to create the network. For example, you can choose to study all biological processes and functions regulating the list of compounds. In this case, you expand the list by limiting the newly added entities to Processes and Functions.

e Select Limit analysis results based on.

Local connectivity allows you to add a certain number entities to the given net-work based upon their rank on local connectivity. New entities are ranked with decreasing priority, based upon how many entities they connect within a given list of entities.

Local to global connectivity ratio allows you to add a certain number entities to the given network based upon their local/global connectivity ratio. The ratio is computed for each new entity. Local connectivity is based upon the number of entities to which it connects within a given list and global connectivity is the number of interactions that it participates in the entire database. New entities are ranked with decreasing priority, based upon this local/ global connectivity ratio.

f Type a value for the Maximum number of new entities. This allows you to limit the maximum number of entities to be added to a network. The default is set to 50.

g Click the Next button.

Figure 105 Analysis Filters page (Pathway Analysis (Step 3 of 5)) when the Algo-rithm is Expand Interactions.

133

Page 134: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

6. Enter parameters on the Analysis Filters page (Pathway Analysis (Step 3 of 5)).

This page is only opened when the Analysis Type is Advanced and Algorithm is Shortest Connect on the Input Parameter page (Pathway Analysis (Step 1 of 5)):

This step only appears if Advanced was selected as the Analysis Type on the Input Parameter page (Pathway Analysis (Step 1 of 5)).

a Type a value between 1 and 10 in the Relation score to use for the matching. You can build or expand a network by including only interactions with a particular quality score. The default score defining interaction quality is set to 9.

b Mark the Select relation type check box for the interactions to be evaluated. You can select one or more types of interactions to generate the network from the starting list of entities, depending upon the biology of the problem. The default settings use all type of interactions to create the network. For example, to create a network of transcriptional analysis, you can select only expression and pro-moter binding types of interactions.

c Type a value for Entity global connectivity. This filter allows you add new entities to connect two disconnected network clusters by ranking the new entities base upon their global connectivity. The default settings are 500 and above.

d Mark the Select entity type for the entities to be evaluated check box. You can preferentially select the types of entities to add to the expanded network, depend-ing upon the biological question. The default settings use all type of entities to create the network. For example, you can choose to study all biological processes and functions regulating the list of compounds. In this case, you expand the list by limiting the newly added entities to Processes and Functions.

e Click the Next button.

Figure 106 Analysis Filters page (Pathway Analysis (Step 3 of 5)) when the Algo-rithm is Shortest Connect

134

Page 135: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

7. Review the results on the Analysis Result page (Pathway Analysis (Step 4 of 5)).

This wizard view displays the created pathway, the initial number of entities, the number of new relations, and the number of new entities.

a Review the result page using the various interaction features and views. You can review the information of each entity by launching its property table. More infor-mation on your options to review the results is found in Section 8.5.3 Pathway View in the Mass Profiler Professional User Manual.

b If you double-click any node, its property table opens.

c Click the Next button.

Figure 107 Analysis Result page for Pathway Analysis (Step 4 of 5)

8. Enter parameters on the Save Pathway page (Pathway Analysis (Step 5 of 5)).

This wizard view displays the created pathway, the initial number of entities, the number of new relations, and the number of new entities.

a Type in descriptive information that is stored with the pathway analysis.

b Click the Finish button. The Pathway view is saved in the Experiment Navigator as a child under the selected input entity list.

135

Page 136: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

Figure 108 Save Pathway page for Pathway Analysis (Step 5 of 5)

Find Similar Entity Lists You can find entity lists stored in the experiment navigator that contain a significant number of entities in common to the entities in the selected entity list.

1. Click Find Similar Entity Lists in the Workflow Browser.

2. Enter parameters on the Input Parameters page (Find Similar Entities List (Step 1 of 3)):

a Click the Choose button to select the Entity List. By default, the active entity list is selected and shown in the dialog.

b Select the Target entity list.

c Select the Type of targets. If you selected Custom for the Target entity list this entry field is disabled; otherwise, you need to select one of the options from the Type of targets list.

d Click the Next button.

136

Page 137: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

Figure 109 Input Parameters page for Find Similar Entities List (Step 1 of 3)

3. Set parameters on the Choose Entity Lists page (Find Similar Entities List (Step 2 of 3)).

If you selected Custom for the Target entity list then this step displays the Entity List search wizard and allows you to build search query. From the search results you need to select the entities which get populated in the entity lists.

If you are selected any other option except Custom from the Target entity list then the wizard skips this step and takes you directly to step 3 of 3.

a Click the Next button.

Figure 110 Choose Entity Lists page for Find Similar Entities List (Step 2 of 3)

4. Enter parameters on the Find Similar Entity Lists Results page (Find Similar Entities List (Step 3 of 3)).

This step displays the results of Find Similar Entity List. Entity Lists showing signifi-cant overlap with the entity list selected for analysis are displayed in the left side of the spreadsheet.

a Click the Change cut-off button to modify the level of significance used for the results.

b Type a new p-value Cut-off.

c Click the OK button. The spreadsheet of results is automatically updated with respect to the new p-value.

d Click Custom Save button to save a subset of the significant pathways.

137

Page 138: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

e Click the Finish button.

Figure 111 Find Similar Entity Lists Results page for Find Similar Entities List (Step 3 of 3)

Find Significant Pathways This feature finds all pathways that contain a significant number of entities in com-mon to the entities in the selected entity list. Overlap denotes the number of com-mon entities between the selected list and the pathway. Commonness is determined via the presence of a shared identifier. In other words, Find Significant Pathways allows you to determine in which biological pathways exists a significant enrich-ment of the masses of interest based on the input entity list.

Note that each pathway has an associated organism and a pathway can only be searched with Find Significant Pathways if the organism of the active experiment matches the organism of the pathway.

The Find Significant Pathways launches a wizard with two steps.

1. Click Find Significant Pathways in the Workflow Browser.

2. Enter parameters on the Input Parameters page (Find Significant Pathways (Step 1 of 2)).

a Click the Choose button to select the Entity List. By default, the active entity list is selected and shown in the dialog.

b Click the Next button.

138

Page 139: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

Figure 112 Input Parameters page for Find Significant Pathways

3. Enter parameters on the Find Significant Pathways Results page (Find Significant Pathways (Step 2 of 2)):

This view of the wizard shows a number of significant pathways satisfying the p-value cut-off (default is p=0.05). The resulting pathways are shown in two windows. The window on the left shows Significant Pathways (i.e., those which have an entity in common with the input entity list) and the window on the right shows Non-Signif-icant Pathways (i.e., those which do not have any entity in common with the input entity list). For the resulting significant pathway window on left, the Pathway Name, Number of Entities, Matched with technology, Matched with Entity list, and p-values are displayed.

For the Non-Significant pathways, Pathway Name and Number of Entities are dis-played.

Overlap of entities is determined based upon the presence of either a CAS ID or SwissProt ID in the relevant pathway. Once the number of common entities is deter-mined, the p-value computation works exactly as in the case of Find Similar Entity Lists.

a Click the Change cut-off button to modify the level of significance used for the results.

b Type a p-value Cut-off.

c Click the OK button. The spreadsheet of results is automatically updated with respect to the new p-value.

d Click the Custom Save button to save a subset of the significant pathways.

e Click the Finish button.

139

Page 140: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

Figure 113 Find Significant Pathways Results page (Step 2 of 2)

Extract Relations via NLP NLP (Natural Language Processing) is a mechanism to create new pathways from the literature or local documents. NLP can be run directly on PubMed abstracts or on local pdf, doc, and html files. NLP first recognizes entities in sentences and then perform information extraction to identify relationships between the document and the selected entity list. The entities NLP can recognize are restricted to those pack-aged in the pathway interaction databases. Additional entities imported into these databases, such as via BioPAX import, do not participate in NLP. Mass Profiler Pro-fessional determines which organism database to use based on the currently active experiment.

The Extract Relations via NLP launches a wizard with four steps.

1. Click Extract Relations via NLP in the Workflow Browser.

2. Choose type of input data page (NLP Wizard (Step 1 of 4)).

a Select the Input source. The source can be PubMed abstracts or user specified files.

b Type search terms in the Search text.

c Click the Next button.

Figure 114 Choose type of input data page for NLP Wizard (Step 1 of 4)

140

Page 141: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

3. View Tagged Content page (NLP Wizard (Step 2 of 4)):

The target documents containing the search terms are identified and tagging is per-formed using the entity dictionary. All molecular and biological processes/functions in which entities are present in the Mass Profiler Professional database are tagged in these searched documents. The tagged entities are assigned a color that matched the default settings. The name or the PubMed ID of the target articles are displayed in the left hand column. Each tagged document can be separately viewed in the right hand column by selecting the corresponding document name or PubMed ID.

a Click the Next button.

Figure 115 View Tagged Content page for NLP Wizard (Step 2 of 4)

4. Review results on the Pathway View page (NLP Wizard (Step 3 of 4)).

Extracted relations from the relational inference rules used to parse relations between the tagged entities are displayed. The total number of extracted relations is displayed at the top of the view.

a Click the Next button.

141

Page 142: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

Figure 116 Pathway View page for NLP Wizard (Step 3 of 4)

5. Enter information on the Pathway created from NLP page (NLP Wizard (Step 4 of 4)).

Extracted relations from the relational inference rules used to parse relations between the tagged entities are displayed. The total number of extracted relations is displayed at the top of the view.

a Type in descriptive information that is stored with the pathway.

b Click the Finish button.

142

Page 143: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

Figure 117 Pathway created from NLP page for NLP Wizard (Step 4 of 4)

Export for Recursion Export the entities that have been identified as the most relevant to the differential analysis for recursive finding in MassHunter Qualitative Analysis.

a Click Export for Recursion in the Workflow Browser.

b Click the Choose button to select the Entity List for exporting.

c Click on the entity list in the Choose EntityList window.

d Click the OK button.

e Select a file from the Output File combo box.

Note: Do not type a file name at this location.

f Select the folder to save the file.

g Type the File name and click the Save button.

h Click the OK button.

143

Page 144: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

Figure 118 Export for recursion entity list selection and output file directory and name entry

IDBrowser Identification Saves the selected entity list into a CEF file format, opens Agilent MassHunter ID Browser, and imports the saved CEF file for identification. Only AMDIS, ChemSta-tion, and Non-mzXML data experiment entity lists are exported by IDBrowser.

Once identification is completed, ID Browser saves and sends back and an identified CEF file. This CEF which is imported into the Mass Profiler Professional experiment and annotations are updated.

For an unidentified experiment, this application saves the files in the specified for-mats. For all data sources the file is saved in CEF format. You can select the entity list and save them.

1. Mass Profiler Professional a Click IDBrowser Identification in the Workflow Browser.

b Click the Choose button to select the Entity List to be identified.

c Click the OK button.

Figure 119 Choose the entity list to be identified

2. MassHunter ID Browser MassHunter ID Browser opens, and you are prompted by two dialogs. Press F1 to get help in the ID Browser program or click Help > Contents from the main menu. When ID Browser completes identification of the features, you click Save and Return.

a Select an options for the Compound selection.

b Mark the Compound identification methods.

c Click the Next button if you want to change the defaults setting for the ID Browser identification method. Otherwise click the Finish button to start ID Browser.

144

Page 145: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

d Enter the parameters that you wish to use for the ID Browser identification method.

e Click the Finish button to start ID Browser.

f Click the Save and Return button to complete ID Browser and return the CEF file to Mass Profiler Professional.

Figure 120 First MassHunter ID Browser dialog

Figure 121 Second MassHunter ID Browser dialog

3. Mass Profiler Professional a Type in descriptive information that is stored with the saved entity list in the Entitylist Inspector dialog

b Click the Configure Columns button to select custom annotations. Click the OK button when you have configured the columns.

c Click the OK button to complete ID Browser Identification.

145

Page 146: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

Figure 122 ID Browser screen after identification is complete

Export for identification For an unidentified experiment, this application allows you to save selected entities for identification with another program. For all data sources the file is saved in CEF format.

a Click Export Inclusion List in the Workflow Browser.

b Click the Choose button to select the Entity List for exporting.

c Click the Output File selection box to choose the file path and type in a file name to store the list.

d Click the OK button.

Figure 123 Export for identification entity selection and file specification

Export Inclusion List Export inclusion parameters in the entity list as specified. This is applicable to Mass-Hunter Qualitative Analysis, MassHunter Qual GC Scan, AMDIS, and ChemStation experiment creation.

The Export Inclusion List launches a wizard with two steps.

146

Page 147: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

1. Click Export Inclusion List in the Workflow Browser.

2. Enter parameters on the Entity List and Path Chooser page (Export Inclusion List (Step 1 of 2)).

a Click the Choose button to select the Entity List for exporting.

b Click the Output File selection box to choose the file path and type in a file name to store the list.

c Click the Next button.

Figure 124 Entity List and Path Chooser page for Export Inclusion List (Step 1 of 2)

3. Enter parameters on the Filtering Parameters for Inclusion List page (Export Inclusion List (Step 2 of 2)).

a Type values in the Retention time window. The default values are 0.0 percent and 0.25 min.

b Mark the Limit number of precursor ions per compound to check box, if desired. By default this check box is cleared.

If you mark the Limit number of precursor ions per compound to check box, type in a value for the number of ions. The default value is 1 ion.

c Mark the Minimum ion abundance if you desire to set minimum abundance. By default, this check box is cleared.

If you mark the Minimum ion abundance check box, type in the minimum ion counts. The default value is 2000 counts.

If the sample data is from MassHunter Qualitative Analysis, all filters are enabled and available. Sample data from other experiment types can not be processing using the Positive and Negative ions filter, Exported m/z value filter, and Charge state Preference filter. These filters are disabled for non-MassHunter Qualitative Analysis sample data.

d Select the Exported m/z value as the monoisotopic value or the value repre-sented by the ion with the highest abundance.

e Select the Charge state preference option as highest abundance charge state or as specified by the charge state preference order.

If you select the Specify charge state preference order, the Charge state prefer-ences become active.

Positive and Negative ions check box selections become available.

f Mark the Positive ions and Negative ions that are included in the filter.

g Click the Finish button.

147

Page 148: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

Figure 125 Filtering Parameters for Inclusion List page for Export Inclusion List (Step 2 of 2)

The filters are applied in following order: 1. Ions filter (Positive Ions and Negative Ions filter). Peaks which contain the

selected ions only are passed. e.g. if user has selected only +H and +Na ions then peak with ion species like M+H, M+2H, M+Na, M+2Na, .. M+H+Na are selected for further screening.

2. Peaks with same charge state and same ion species are grouped in one iso-tope cluster.(eg. M+H, M+H+1, M+H+2 in one cluster). From this cluster only one peak is exported depending upon the Exported m/z value filter. If this filter is set to Export monoisotopic m/z then the peak similar to the M+H (or M+Na, M+2H, etc) is selected from isotope cluster. Otherwise, if the filter is set to Export highest abundance m/z then the peak is exported which has the maxi-mum abundance in each isotope cluster.

3. Charge State Preference Filter. If this is set to Prefer highest abundance charge state then the peaks per compound are listed in descending order of abundance. Otherwise if the filter is set to Specify charge state preference order then only those peaks whose charge states are specified in the Active window are passed and ordered as specified. For example, if you specified Charge states as 2, 3, and >3 then peaks with charge states 1 are filtered out and the peaks with charge states 2, 3, and >3 are passed. Those peaks are ordered as all peaks with charge state 2 then all peaks with charge state 3 and finally those with a charge state >3 in descending order of abundance.

4. If Minimum Ion abundance filter is checked then those peak ions whose abun-dance is greater than the specified value are passed for further screening.

5. Limit number of precursor ions, if selected, results in the export of the top number of peaks/compounds as specified.

148

Page 149: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

Import Annotations This option allows you to import annotations from CEF files. When you invoke this option, it allows you to select the CEF files and update annotations for compounds whose Mass Profiler Professional ID match that of the compounds in the imported CEF file.

a Click Import Annotations in the Workflow Browser.

b Click the Annotation File selection box to choose the file containing the annota-tions.

c Click the OK button.

Figure 126 Import Annotations file selection

Utilities

Remove Entities with missing signal values

This utility allows you to create a copy of any entity list that contains only entities that are present in every sample. The utility removes all entities from the input entity list that are missing in any single sample and creates a new entity list as a child to the input entity list. Removing entities that are missing from any sample is particu-larly important to statistical treatments that are sensitive to processing zero abun-dance values, such as Clustering and Class Prediction analyses.

a Click Remove Entities with missing signal values in the Workflow Browser.

b Click the Choose button to select the Entity List.

c Select an Entity List.

d Click the OK button.

e Click the OK button.

The new entity list is added as a child of the selected entity list (the parent).

Figure 127 Remove entities with missing signal values dialog

149

Page 150: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Metabolomics Workflow Analysis with Mass Profiler Professional - additional features

Differential expression guided workflow

This option allows you to launch the Guided Workflow Wizard on all samples as described in “Analysis with Mass Profiler Professional - guided workflow” on page 86. The Guided Workflow utility can be launched at any time during your data analysis. It is recommended to use the Guided Workflow to help develop good data analysis habits.

150

Page 151: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Process Summary

This chapter summarizes the metabolomics workflow.

What is metabolomics? 152 The Agilent metabolomics workflow 154

Page 152: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Process Summary What is metabolomics?

What is metabolomics? Metabolomics is the process of identifying and quantifying of all metabolites of an organism in a specified biological state. The metabolites of an organism represent a chemical “fingerprint” of the organism at a well defined state where the state is defined by a set of specific circumstances.

Metabolomics involves the study of two types of variables:

Independent variables: When one or more of the attributes of the state of the organism are known to you in advance of sampling those attributes are each referred to as an independent variable.

The known state of the organism or externalities to which the organism was sub-jected may be referred to as a parameter, parameter name, condition, or attribute value during the various steps of the metabolomic data analysis. The known state and externalities represent independent variables in the statistical analyses.

Dependent variables: The biological response to the change in the attributes can be manifest in a change in the metabolic profile. Each metabolite that undergoes a change in expressed concentration is referred to as a dependent variable.

The metabolites in a sample may be individually referred to as a compound, fea-ture, element, or entity during the various steps of the metabolomic data analysis. Metabolites represent dependent variables in the statistical analyses.

When hundreds to thousands of metabolites exist, chemometric data analyses can be employed to reveal accurate and statistically meaningful correlations between the independent variables and the metabolic profile. Meaningful information learned from the metabolite responses can subsequently be used for clinical diagnostics, for understanding the onset and progression of human diseases, and for treatment. This analysis is the metabolomics workflow illustrated in Figure 128.

Figure 128 Agilent Metabolomics Workflow Flow Diagram

152

Page 153: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Process Summary What is metabolomics?

The hypothesis The first and most important step in the metabolomic process is to formulate the question of correlation that is answered by the analysis - the hypothesis. This ques-tion is a statement that proposes a possible correlation, for example a cause and effect, between a set of independent variables and the resulting metabolic profile. The metabolomic process is used to prove or disprove the hypothesis.

Natural variability Before the metabolomics workflow is begun it is important to understand how any one sample represents the population as a whole. Because of natural variability, the uncertainties associated with both the measurement and associated with the popu-lation, no assurance exists that any single sample from a population represents the mean of the population. Thus, increasing the sample size greatly improves the accu-racy of the sample set in describing the characteristics of the population.

Replicate sampling Sampling the entire population is not typically feasible due to constraints imposed by time, resources, and finances. On the other hand, fewer samples increase the probability of concluding a false positive or false negative correlation. At a minimum, it is recommended that your metabolomics analysis include ten (10) or more repli-cate samples for each attribute value for each condition in your metabolomics study.

System suitability System suitability involves collecting data in such a way to that the data provides you a means to evaluate and compensate for drift and instrumental variations to assure quality results. The techniques of (1) retention time alignment, (2) intensity normalization, (3) chromatographic deconvolution, and (4) baselining employed by the metabolomics workflow produce the highest quality results. However, the pro-cess cannot compensate for excessive drift in the acquisition parameters. The best results are achieved by maintaining your instrument and using good chromatogra-phy.

Improving data quality Improved data quality for metabolomics analysis comes from matching the sampling methodology to the experimental design so that replicate data is collected to span the attribute values for each condition. A larger number of samples appropriate to the population under study results in a better answer to the hypothesis. An under-standing of the methodologies used in sampling and using more than one method of sample collection have a positive impact to the significance of the results.

153

Page 154: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Process Summary What is metabolomics?

The Agilent metabolomics workflow

The Agilent metabolomics workflow presented in Chapter 3 consists of four main components using Agilent MassHunter Qualitative Analysis, Agilent MassHunter DA Reprocessor, and Agilent Mass Profiler Professional:

1. Find Compounds by Molecular Feature (MassHunter Qualitative Analysis and DA Reprocessor). Molecular features are extracted from the data based on their mass spectral and chromatographic characteristics. The process is referred to as Molecular Feature Extraction (MFE). MassHunter Qualitative Analysis is used to create the MFE method and evaluate the suitability of the method on single samples representative of the experiment. MassHunter DA Reprocessor is used to automate MFE using the method created by Mass-Hunter Qualitative Analysis on all of the samples in a single batch operation.

The molecular features for each sample data file are exported as a CEF file for importing and filtering by Mass Profiler Professional.

2. Feature Selection (Mass Profiler Professional). The untargeted molecular fea-tures are imported into Mass Profiler Professional and filtered to retain only the most statistically significant features. This process is aided by the experi-ment setup and guided workflow provided by Mass Profiler Professional.

The most significant molecular features for each sample data file are exported as a CEF file for recursive finding by MassHunter Qualitative Analysis.

3. Find Compounds by Formula (MassHunter Qualitative Analysis). Importing the most statistically significant features back into MassHunter Qualitative Analy-sis as targeted features results is a significant improvement in finding of the features from the samples. This repeated feature analysis is referred to as recursion. Improved reliability in finding leads to a concomitant improvement in the accuracy of the analysis.

The recursively found, most significant molecular features for each sample data file are exported as a CEF file for importing, filtering, and analysis by Mass Profiler Professional.

4. Analysis (Mass Profiler Professional). Recursive finding of the most statisti-cally significant features are processed by Mass Profiler Professional into a final statistical analysis and interpretation. The results from the final interpre-tation may be used to prove or disprove the hypothesis.

154

Page 155: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Process Summary What is metabolomics?

Figure 129 The main functional areas of Mass Profiler Professional

155

Page 156: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Reference

Definitions 157References 166

Page 157: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Reference Definitions

Definitions

Alignment Adjustment of the chromatographic retention time of eluting components based on the elution of specific component(s) that are (1) naturally present in each sample or (2) deliberately added to the sample through spiking the sample with a known com-pound or set of compounds that do not interfere with the sample to improve the data correlation among data sets.

AMDIS Acronym for automated mass spectral deconvolution and identification system developed by NIST (http://www.amdis.net).

Amino acid Biologically significant molecules that contain a core carbon positioned between a carboxyl and amine group in addition to an organic substituent. The dual carboxyl and amine functionalities allow for the formation of peptides and proteins.

ANOVA Abbreviation for analysis of variance which is a statistical method that allows for the simultaneous comparison of the means between two or more attributes or parame-ters of a data set. ANOVA is used to determine if a statistical difference exists between the means of two or more data sets and thereby prove or disprove the hypothesis. See also t-Test.

Attribute Another term for an independent variable. Referred to as a parameter or parameter name and is assigned a parameter name during the various steps of the metabolo-mic data analysis.

Attribute value Another term for one of several values within an attribute for which exist correlating samples. Referred to as a condition or a parameter value and is given an assigned value during the various steps of the metabolomic data analysis.

Bayesian A term used to refer to statistical techniques named after the Reverend Thomas Bayes (ca. 1702 - 1761).

Bayesian inference The use of statistical reasoning, instead of direct facts, to calculate the probability that a hypothesis may be true. Also known as Bayesian statistics.

Bioinformatics The use of computers, statistics, and informational techniques to increase the understanding of biological processes.

Biomarker A chemical compound, organic molecule that is an indication of a biological state and which by analytical measurement of its presence and concentration in a biolog-ical sample indicates a normal or altered function of higher level biological activity.

Carbohydrate An organic molecule consisting entirely of carbon, hydrogen, and oxygen.

157

Page 158: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Reference Definitions

Cell The fundamental unit of an organism consisting of several sets of biochemical func-tions within an enclosing membrane. Animals and plants are made of one or more cells that combine to form varying tissues and perform varying living functions.

Census Collection of a sample from every member of a population.

Cheminformatics The use of computers and informational techniques (such as analysis, classification, manipulation, storage, and retrieval) to analyze and solve problems in the field of chemistry.

Chemometrics A science employing mathematical and analytical processes to extract information from chemical data sets. The processes involve interactive applications of tech-niques employed in disciplines such as multivariate statistics, applied mathematics, and computer science to obtain meaningful information from complex data sets. Chemometrics is typically used to obtain meaningful information from data derived from chemistry, biochemistry and chemical engineering. Agilent Mass Profiler Pro-fessional is designed to employ chemometrics processes to CG/MS and LC/MS data sets to obtain useful information.

Child A subset of information that is created by an algorithm or other reason from an orig-inal set of information. The information is an entity list in Mass profiler Professional. An original entity list is referred to as the parent of one or more child entity lists.

Co-elution When compounds elute from a chromatographic column at nominally the same time making the assignment of the observed ions to each compound difficult.

Composite spectrum The compound spectrum generated to represent the molecular feature and used by Mass Profiler Professional for recursive analysis and ID Browser.

Compound A metabolite and may be individually referred to as a compound, feature, element, or entity during the various steps of the metabolomic data analysis.

Condition Another term for one of several values within a parameter for which exist correlating samples. Condition may also be referred to as a parameter value during the various steps of the metabolomic data analysis. See also attribute value.

Data Information in a form suitable for storing and processing by a computer that repre-sent the qualitative or quantitative attributes of a subject. Examples include GC/MS and LC/MS data consisting of fundamentally of time, ion m/z, and ion abundance from a chemical sample.

Data processing Conversion of data into meaningful information. Computers are employed to enable rapid recording and handling of large amounts of data, i.e. Agilent MassHunter Workstation and Agilent Mass Profiler Professional.

158

Page 159: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Reference Definitions

Data reduction See reduction.

Dependant variable An element in a data set that can only be observed as a result of the influence from the variation of an independent variable. For example, a pharmaceutical compound structure and quantity may be controlled as two independent variables while the metabolite profile presents a host of independent small molecules products that make up the independent variables of a study.

Determinate Having exact and definite limits on an analytical result that provide a conclusive degree of correlation of the subject to the specimen.

Element A metabolite and may be individually referred to as a compound, feature, element, or entity during the various steps of the metabolomic data analysis.

Endogenous Cause, development, or origination from within an organism by the process that occur within the organism.

Entity A metabolite and may be individually referred to as a compound, feature, element, or entity during the various steps of the metabolomic data analysis.

Experiment Data acquired in an attempt to understand causality where tests or analyses are defined and performed on an organism to discover something that is not yet known, to demonstrate as proof of something that is known, or to find out whether some-thing is effective.

Externality A quality, attribute, or state that is external to the specimen under evaluation.

Extraction The process of retrieving a deliberate subset of data from a larger data set whereby the subset of the data preserves the meaningful information as opposed to the redundant and less meaningful information. Also known as data extraction.

Feature Independent, distinct characteristics of phenomena and data under observation. Features are an important part of the identification of patterns, pattern recognition, within data whether processed by a human or artificial intelligence, e.g. the latter may be via a computer process such as Agilent MassHunter Workstation and Agi-lent Mass Profiler Professional. In metabolomics analysis a feature is a metabolite and may be individually referred to as a compound, feature, element, or entity during the various steps of the metabolomic data analysis.

Feature extraction The reduction of data size and complexity through the removal of redundant and non-specific data by using the important variables (features) associated with the data. Careful feature extraction yields a smaller data set that is more easily pro-cessed without any compromise in the information quality. This is part of the princi-pal component analysis process employed by Agilent Mass Profiler Professional.

159

Page 160: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Reference Definitions

Feature selection The identification of important, or non-important, variables and the variable relation-ships in a data set using both analytical and a priori knowledge about the data. This is part of the principal component analysis process employed by Agilent Mass Pro-filer Professional.

Filter The process of establishing criteria by which entities are removed (filtered) from fur-ther analysis during the metabolomics workflow.

Filter by Flag A flag is a term used to denote the quality of an entity within a sample. A flag indi-cates if the entity was detected in each sample as follows: Present means the entity was detected, Absent means the entity was not detected, and Marginal means the signal for the entity was saturated

Hypothesis A meaningful piece of knowledge, such as a proposed explanation for observable phenomena that may be supported by the analytical data. Statistical data analysis is performed to quantify the probability that the hypothesis is true. Also known as the scientific hypothesis.

Hypothetical A statement based on, involving, or having the nature of a hypothesis for the pur-poses of serving as an example and not necessarily based on an actuality.

Identified compound Chromatographic components that have an assigned, exact identity, such as com-pound name and molecular formula, based on prior assessment or comparison with a database. See also Unidentified Compound.

Independent variable An essential element, constituent, attribute, or quality in a data set that is deliber-ately controlled in an experiment. For example, a pharmaceutical compound struc-ture and quantity may be controlled as two independent variables while the metabolite profile presents a host of independent small molecules products that make up the independent variables of a study. An independent variable may be referred to as a parameter and is assigned a parameter name during the various steps of the metabolomic data analysis.

Inorganic compound Non carbon and non biological origin compounds such as minerals and salts.

Interpretation Groupings of samples based on the experimental parameters. By default, when you open an experiment, the “All Samples” interpretation is active. You can click on another interpretation to activate it.

Mass variation The mass to charge (m/z) resolution of the mass spectral data allows for the identi-fication of compounds with nearly identical and identical chromatographic behavior by adjusting the m/z range for extracting ion chromatograms.

Mean The numerical result of dividing the sum of the data values by the number of individ-ual data observations.

160

Page 161: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Reference Definitions

Metabolism The chemical reactions and physical processes whereby living organisms convert input compounds into living compounds, structures, energy and waste.

Metabolite Intermediate compounds and products produced as part of metabolism. In order to distinguish metabolites from lager biological molecules, known as macromolecules such as proteins, DNA and others, metabolites are typically under approximately 1000 Da. A metabolite may be individually referred to as a compound, feature, ele-ment, or entity during the various steps of the metabolomic data analysis.

Metabolome The complete set of small-molecule metabolites that may be found within a biologi-cal sample. Small molecules are typically in the range of 50 to 600 Da.

Metabolomics The process of identification and quantification of all metabolites of an organism in a specified biological situation. The study of the metabolites of an organism pres-ents a chemical “fingerprint” of the organism under the specific situation. See meta-bonomics for the study of the change in the metabolites in response to externalities.

Metabonomics The metabolic response to externalities such as drugs, environmental factors, and disease. The study of metabonomics by the medical community may lead to more efficient drug discovery and may lead to individualized patient treatment. Meaningful information learned from the metabolite response can be used for clinical diagnos-tics or for understanding the onset and progression of human diseases. See metabo-lomic for the identification and quantitation of metabolites.

Normalization A technique used to adjust the ion intensity of mass spectral data from an absolute value based on the signal measured at the detector to a relative intensity of 0 to 100 percent based on the signal of either (1) the ion of the greatest intensity or (2) a spe-cific ion in the mass spectrum.

Null hypothesis The default position taken by the hypothesis that no effect or correlation of the inde-pendent variables exists with respect to the measurements taken from the samples.

Observation Data acquired in an attempt to understand causality where no ability exists to (1) control how subjects are sampled and/or (2) control the exposure each sample group receives.

One-hit wonder An entity that appears in only one sample, is absent from the replicate samples, and does not provide any utility for statistical analysis. Entities that are one-hit wonders may be filtered using Filter by Flags.

Organic compound Carbon-based compounds, often with biological origin.

Organism A group of biochemical systems that function together as a whole thereby creating an individual living entity such as an animal, plant, or microorganism. The individual entities may be multicellular or unicellular. See also specimen.

161

Page 162: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Reference Definitions

p-value The probability of obtaining a statistical result that is comparable to or greater in magnitude to the result that was actually observed, assuming that the null hypothe-sis is true. The null hypothesis is stated that no correlation exists between the inde-pendent variables and the measurements taken from the samples. Rejection of the null hypothesis is typically made when the p-value is less than 0.05 or 0.01. A p-value of 0.05 or 0.01 may be restated as a 5% or 1%, respectively, chance of rejecting the null hypothesis when it is true. When the null hypothesis is rejected, the result is said to be statistically significant meaning that a correlation exists between the independent variables and the measurements as specified in the hypothesis.

Parameter Another term for an independent variable. Referred to as a parameter or parameter name and is assigned a parameter name during the various steps of the metabolo-mic data analysis. See also condition and attribute.

Parameter value Another term for one of several values within a parameter for which exist correlating samples. Parameter value may also be referred to as a condition during the various steps of the metabolomic data analysis. See also attribute value.

Parent A set of information that when process by an algorithm or other reason creates one or more subsets of information. The information is an entity list in Mass profiler Pro-fessional. A subset entity list is referred to as the child of a parent entity list.

Peptide Linear chain of amino acids that is shorter than a protein. The length of a peptide is sufficiently short that they are easily made synthetically from the constituent amino acids.

Peptide bond The covalent bond formed by the reaction of a carboxyl group with an amine group between two molecules, e.g. between amino acids.

Permutation Any of the total number of subsets that may be formed by the combination of indi-vidual parameters among the independent variables. For example the number of per-mutations of A and B in variable Φ in combination with X, Y, and Z in variable θ equals six (6 = 2 x 3) and may be represented as AX, AY, AZ, BX, BY, and BZ. Note that the combinations of parameters within a variable are not relevant such as AB, XY, XZ, and YZ.

Polarity The condition of an effect as being positive or negative, additive or subtractive, with respect to some point of reference, such as with respect to the concentration of a metabolite.

Polymer A molecule formed by the covalent bonding of a repeating molecular group to form a larger molecule.

Pooled sample When the amount of available biological material is very small samples may be com-bined into a single sample (pooled) and then split into different aliquots for multiple analyses. By pooling the sample, sufficient material exists to obtain replicates analy-

162

Page 163: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Reference Definitions

ses of each sample where formerly there was insufficient material to obtain repli-cate analytical results. The trade-off loss of information about the biological variation that was formerly present in each unique sample is offset by a gain in sta-tistical significance of the results.

Principal component Transformed data into axes, principal components, so that the patterns between the axes most closely describe the relationships between the data. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possi-ble. The principle components often may be viewed, and interpreted, most readily in graphical axes with additional dimensions represented by color and/or shape repre-senting the key elements (independent variables) of the hypothesis. This is part of the principal component analysis process employed by Agilent Mass Profiler Profes-sional.

Principal component analysis The mathematical process by which data containing a number of potentially corre-lated variables is transformed into a data set in relation to a smaller number of vari-ables called principal components that account for the most variability in the data. The result of the data transformation leads to the identification of the best explana-tion of the variance in the data, e.g. identification of the meaningful information. Also known as PCA.

Protein Linear chain of amino acids whose amino acid order and three-dimensional struc-ture are essential to living organisms. Also know as a polypeptide.

Proteomics The study of the structure and function of proteins occurring in living organisms. Proteins are assemblies of amino acids (polypeptides) based on information encoded in the genes of an organism and are the main components of the physiolog-ical metabolic pathways of the organism.

Quality A feature, attribute, and/or characteristic element that makes an analytical result of a sample representative to a high degree of certainty of the larger specimen.

Recursive Reapplying the same algorithm to a subset of a previous result in order to generate an improved result.

Recursive finding A three-step process in the metabolomics workflow that improves the accuracy in finding statistically significant features in sample data files. Step 1: Find untargeted compounds by molecular feature in MassHunter Qualitative Analysis. Step 2: Filter the molecular features in Mass Profiler Professional. Step 3: Find targeted com-pounds by formula in MassHunter Qualitative Analysis. Importing the most statisti-cally significant features identified using Mass Profiler Professional back into MassHunter Qualitative Analysis as targeted features improves the accuracy in find-ing these features from the original sample data files.

Reduction The process whereby the number of variables in a data set is decreased to improve computation time and information quality. For example, an extracted ion chromato-

163

Page 164: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Reference Definitions

gram obtained from GC/MS and LC/MS data files. Reduction provides smaller, view-able and interpretable data sets by employing feature selection and feature extraction. Also know as dimension reduction and data reduction. This is part of the principal component analysis process employed by Agilent Mass Profiler Profes-sional.

Regression analysis Mathematical techniques for analyzing data to identify the relationship between dependent and independent variables present in the data. Information is gained from the estimation, regression, of the sign and proportionality of the effects of the inde-pendent variables on the dependent variables. This is part of the principal compo-nent analysis process employed by Agilent Mass Profiler Professional. Also known as regression.

Replicate Collecting multiple identical samples from a population to that when the samples are evaluated a value is obtained that more closely approximates the true value.

Sample A part, piece, or item that is taken from a specimen and understood as being repre-sentative of the larger specimen (e.g., blood sample, cell culture, body fluid, aliquot) or population. An analysis may be derived from samples taken at a particular geo-graphical location, taken at a specific period of time during an experiment, and taken before or after a specific treatment.

Specimen An individual organism, e.g., a person, animal, plant, or other organism, of a class or group that is used in representation of a whole class or group. See also organism.

Standard A chemical or mixture of chemicals selected for use as a basis of comparing the quality of analytical results or for use to measure and compensate the precise offset or drift incurred over a set of analyses.

Standard deviation A measure of variability among a set of data that is equal to the square root of the arithmetic average of the squares of the deviations from the mean. A low standard deviation vale indicates that the individual data tend to be very close to the mean, whereas a high standard deviation indicates that the data is spread out over a larger range of values from the mean.

State A set of circumstances or attributes characterizing a biological organism at a given time. A few sample attributes may include temperature, time, pH, nutrition, geogra-phy, stress, disease, and controlled exposure.

Statistics The scientific process employed in manipulating numerical data from scientific experiments to derive meaningful information. This is part of the principal compo-nent analysis process employed by Agilent Mass Profiler Professional.

Subject Something, such as chemical or biological sample taken from a specimen, or a whole specimen, that are made to undergo a treatment, experiment, or an analysis for the purposes of further understanding.

164

Page 165: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Reference Definitions

Survey Collection of samples from a less than the entire population in order to estimate the population attributes.

t-Test A statistical test to determine whether the mean of the data differs significantly from that expected if the samples followed a normal distribution in the population. The test may also be used to assess statistical significance between the means of two normally distributed data sets. See also ANOVA.

Unidentified compound Chromatographic components that are only uniquely denoted by their mass and retention times and which have not been assigned an exact identity, such as com-pound name and molecular formula. Unidentified compounds are typically produced by feature finding and deconvolution algorithms. See also Identified Compound.

Variable An element in a data set that assumes changing values, e.g. values that are not con-stant over the entire data set. The two types of variables are independent and dependent.

Volume The area of the extracted compound chromatogram (ECC). The ECC is formed from the sum of the individual ion abundances within the compound spectrum at each retention time in the specified time window. The compound volume generated by MFE is used by Mass Profiler Professional to make quantitative comparisons.

165

Page 166: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Reference References

References

Manuals • Agilent Mass Profiler Professional - Manual (Agilent publication n/a, March 2009)

• Agilent MassHunter Workstation Software Qualitative Analysis - Familiarization Guide (Agilent publication G3336-90007, Revision B, February 2011)

• Agilent MassHunter Workstation Software Quantitative Analysis - Familiarization Guide (Agilent publication G3335-90061, Fourth Edition, April 2010)

• Agilent MassHunter Mass Profiler Software - Quick Start (Agilent publication G3297-90005, Third Edition, April 2010)

Primers • Proteomics: Biomarker Discovery and Validation (Agilent publication 5990-5357EN, February 11, 2010)

• Metabolomics: Approaches Using Mass Spectrometry (Agilent publication 5990-4314EN, October 27, 2009)

Application Notes • An LC/MS Metabolomics Discovery Workflow for Malaria-Infected Red Blood Cells Using Mass Profiler Professional Software and LC-Triple Quadrupole MRM Confirmation (Agilent publication 5990-6790EN, November 19, 2010)

• Profiling Approach for Biomarker Discovery using an Agilent HPLC-Chip Coupled with an Accurate-Mass Q-TOF LC/MS (Agilent publication 5990-4404EN, October 20, 2009)

• Metabolite Identification in Blood Plasma Using GC/MS and the Agilent Fiehn GC/MS Metabolomics RTL Library (Agilent publication 5990-3638EN, April 1, 2009)

• Metabolomic Profiling of Bacterial Leaf Blight in Rice (Agilent publication 5989-6234EN, February 14, 2007)

Technical Overviews • Metabolomics at Agilent: Driving Value with Comprehensive Solutions (Agilent publication n/a, December 2010, http://www.agilent.com/labs/features/2010_metabolomics.html)

• Superior Molecular Formula Generation from Accurate-Mass Data (Agilent publication 5989-7409EN, January 4, 2008)

166

Page 167: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

Reference References

Presentations • Metabolomics LCMS Approach to: Identifying Red Wines according to their vari-ety and Investigating Malaria infected red blood cells (Agilent publication n/a, November 3, 2010)

• Small Molecule Metabolomics (Agilent publication n/a, November 3, 2010)

• Presentation: Metabolome Analysis from Sample Prep through Data Analysis (Agilent publication n/a, November 3, 2010)

Product Brochures • Emerging Insights: Metabolomics at Agilent (Agilent publication 5990-6048EN, September 10, 2010)

• Integrated Biology from Agilent: The Future is Emerging (Agilent publication 5990-6047EN, September 1, 2010)

• Agilent Mass Profiler Professional Software (Agilent publication 5990-4164EN, December 22, 2009)

• Agilent Metabolomics Laboratory: The breadth of tools you need for successful metabolomics research (Agilent publication 5989-5472EN, January 31, 2007)

• Agilent Fiehn GC/MS Metabolomics RTL Library (Agilent publication 5989-8310EN, December 5, 2008)

• Agilent METLIN Personal Metabolite Database (Agilent publication 5989-7712EN, December 31, 2007)

Training Videos • ! MassHunter Interactive Training MPP - Aug 2009 (TS, NK, DP)• ! MassHunter Interactive Training MPP Wine Study - Apr 2010 (FK)

Software • Agilent Mass Profiler Professional Software B.02.01 or later• Agilent MassHunter Workstation Qualitative Analysis Software B.03.01 or B.04.00

or later• Agilent MassHunter Workstation Quantitative Analysis Software B.03.02 or later

167

Page 168: Metabolomics - Agilent MS Connect You Begin Introduction 6 of performing metabolomics data analysis using Agilent MassHunter Qualitative Analysis and Agilent Mass Profiler Professional

© Agilent Technologies, Inc. 2011Printed in USA, August 2011

*5990-7067EN*

5990-7067EN

www.agilent.com