bionumerics - biosystematica

BioNumerics ® THE UNIVERSAL PLATFORM FOR DATABASING AND ANALYSIS OF ALL BIOLOGICAL DATA www.applied-maths.com


Post on 03-Feb-2022


BioNumerics®

THE UNIVERSAL PLATFORM FOR DATABASING AND ANALYSIS OF ALL BIOLOGICAL DATA

www.applied-maths.com

BioNumerics

• Organisms, samples: animals, plants, microbial strains or communities, fungi, tissue, samples, etc.
• 1-D fingerprints: electrophoresis gels, densitometric records, HPLC, spectrophotometry, etc.
• Character sets: phenotypic test panels, antibiotic resistance profiles, microarrays, etc.
• Sequences: DNA, RNA, and protein sequences
• 2-D gels: two-dimensional protein gels
• Input: import, conversion, image acquisition, normalization of gels, assembly of sequences, etc.
• BioNumerics Database: different experiments and descriptive information linked to database entries
• Identification: quick similarity search, construction of libraries, neural networks
• Clustering: dendrograms from individual experiments and composite data sets, phylogenetic tree construction
• Dimensioning: principal components analysis, discriminant analysis, self-organising maps
• Statistical tools: cluster and group significance, group validation techniques, multivariate analysis, congruence of techniques
• Export: professional printing of reports and analyses, export as enhanced metafiles or bitmaps, customized reporting using scripts
• Database sharing: peer-to-peer exchange, client-server projects

INTRODUCTION

The advent of high-throughput sequencers, microarrays, MALDI, and numerous other fast and automated molecular typing techniques has made it possible to effortlessly produce millions of data points for each single sample under study. As easy as it is to generate massive amounts of data, it has become equally difficult and challenging to manage the data and extract a meaningful signal from it. Even more challenging is the consensus analysis of data from different experimental techniques in order to obtain more conclusive answers. The BioNumerics software platform addresses these challenges through four fundamental achievements:

1 Import and, where appropriate, automated batch-processing of any kind of biodata, from 1-D and 2-D electrophoresis gels or spectrometry profiles to sequences, microarrays, or phenotype characters.

2 A relational multi-user database environment for lab-wide storage and retrieval of experimental and descriptive information.

3 Powerful querying, data mining and exploration, analysis, comparison, and visualization.

4 Integrated networking, data exchange, and Internet connectivity in a peer-to-peer or client-server environment.

BioNumerics is the most complete and powerful solution for databasing and comparative analysis of biodata. The software has gained worldwide recognition through daily use at many research sites, including universities, hospitals and public health centers, food, drug and pharmaceutical industries, and a wide range of federal and private laboratories involved in typing, quality control, screening, testing, breeding, etc.

CONCEPTS

The uniqueness of BioNumerics lies in the combination of a rich databasing platform with analysis tools for all existing biological data types. The software combines a powerful, multi-user database environment specifically designed for biodata with the most advanced tools for the analysis of patterns and fingerprints, character arrays, sequences, curves, and 2-D gels. The biological entities of the database, i.e. the database entries, can be any biological sample under study, including bacterial or viral strains, animals, plants, fungi, tissue, or any other organic samples for which experimental data can be obtained. The concept of the database allows various experiments of different nature to be entered for the same sample or strain (entry). As a result, multiple experimental data can be explored and compared among the entries studied, and groupings or identifications can be obtained for any combination of database entries and experiments available.

The experimental data can be subdivided into six classes, which include all possible experiment types employed to express relationships in biology: (1) 1-D densitometric patterns, called fingerprint types, (2) character-based data, called character types, (3) DNA, RNA, and protein sequences, called sequence types, (4) 2-D electrophoresis gels, called 2-D gel types, (5) curve readings that express an evolution (trend) of one parameter as a function of another (e.g. kinetic readings), called trend data types, and (6) matrix types, including similarity and distance matrices. Within each of these generic experimental classes, the user can create custom experiment types with particular settings. The six experiment classes are the basis of the modular subdivision of the BioNumerics software.

Fingerprint types

Any densitometric record seen as a profile of peaks or bands can be considered as a fingerprint type. Examples are electrophoresis patterns, gas chromatography or HPLC profiles, spectrophotometric curves, MALDI, SELDI, etc. Through its easy and powerful script language, BioNumerics can import and process virtually any type of fingerprint data from any manufacturer.

Electrophoresis is an important component in studying relationships in biology; therefore, comprehensive tools for preprocessing electrophoresis fingerprints, both from slab gels and capillary sequencers, are incorporated into BioNumerics. These tools include reading graphical and densitometric file formats from image files and automated sequencers, lane finding, normalization (alignment of patterns), band finding and quantification, band matching, etc.

The quality and completeness of electrophoresis fingerprint analysis in BioNumerics is illustrated by the fact that the famous GelCompar II software, with all its functions and possibilities, is entirely contained in BioNumerics’ Fingerprint types application.

In addition to the range of advanced functions for gel analysis, BioNumerics provides comprehensive tools for automated preprocessing and analysis of other fingerprint data such as MALDI, dHPLC, and chromatogram files from automated sequencers. The software also offers a number of specialist plugins for electrophoresis-based applications such as VNTR or MLVA, HDA or CSCE, spa-typing, AFLP-based breeding, etc.

Character types

Using the character types, it is possible to define any array of named characters, binary or continuous, with fixed or undefined length. The size of a character type in BioNumerics can range from one single character (e.g. a morphological feature) to microarray experiments of many thousands of gene expression values. Further adjustable features include the range of the characters, the number of digits, the color scale, similarity coefficients used for comparison, etc. Examples of character types include fatty acid profiles, metabolic assimilation or enzyme activity test panels such as API, Biolog, Vitek, antibiotic resistance profiles, morphological and biochemical features, microarrays and gene chips, etc. Character sets can also be the result of processed data from other sources, for example, copy numbers in electrophoresis-based MLVA or allele numbers in sequence-based MLST. BioNumerics’ powerful script language and ODBC functions allow for direct import of data from external databases or from text-formatted files or Excel spreadsheets.

Figure: The BioNumerics main window

• Button bars can be docked or removed as desired. Individual buttons can be added or removed.
• Database panels can be docked conveniently or removed if not used. Panels can also be tabbed with other panels to optimize display usage.
• All database components are described by useful information fields that can be shown or hidden, queried, sorted, etc. The user can define custom fields.
• Clear icons in menus and buttons aid quick access to frequently used tools.
• Creating ‘Levels’ in a database allows for a richer database structure and increased flexibility in data organisation.

Sequence types

Within the sequence types, the user can enter sequences of nucleic acids (DNA and RNA) and amino acids. BioNumerics recognizes widely used sequence file formats such as EMBL, GenBank, and Fasta, with the possibility to import user-selected header tags as information fields. In addition, BioNumerics’ powerful sequence assembler tool allows direct import of raw chromatogram files from automated sequencers. The assembler has both an excellent alignment engine and a smart, user-friendly interface. The program is fully scriptable, allowing for automated batch-processing in high-throughput sequencing projects such as for typing and surveillance. Complete gene assembly projects with aligned chromatograms can be saved into projects and popped up with a single mouse-click on a sequence stored in the BioNumerics database. Each sequence type in BioNumerics can be stored with its own reference sequence, and with specific alignment and clustering settings.
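As a rough illustration of importing header tags as information fields, the following Python sketch parses Fasta-formatted text and splits each header into fields. It is not BioNumerics code: the `key=value` header layout, the function name `parse_fasta`, and the field names are assumptions made for this example only.

```python
def parse_fasta(text):
    """Parse Fasta text into a list of (fields, sequence) records.

    Assumed (hypothetical) header layout: >ID key1=value1 key2=value2
    """
    records = []
    fields, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if fields is not None:
                records.append((fields, "".join(chunks)))
            tokens = line[1:].split() or ["unnamed"]
            fields = {"id": tokens[0]}
            for token in tokens[1:]:
                if "=" in token:
                    key, value = token.split("=", 1)
                    fields[key] = value
            chunks = []
        elif fields is not None:
            chunks.append(line.upper())
    if fields is not None:
        records.append((fields, "".join(chunks)))
    return records


example = """>strain01 gene=16S country=BE
acgtacgt
ttga
>strain02 gene=16S
ACGTTCGT
"""
records = parse_fasta(example)
```

Each record pairs a dictionary of header fields with the concatenated sequence, which is the kind of structure a database import step could map onto information fields.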

BioNumerics offers probably the finest and most comprehensive sequence alignment and clustering tools that currently exist for PCs. It combines clustering of thousands of nucleotide or protein sequences of almost unlimited length with multiple alignment and display of homology matrices. The versatile user interface allows sequences in multiple alignments to be displayed as raw chromatogram files as well as translated protein sequences, and direct editing is possible in any visualization. Multiple alignments associated with dendrograms can be edited manually in drag-and-drop mode, and a multistep undo/redo function makes editing even more convenient.

In addition to well-established alignment algorithms described in the literature, the software contains extremely fast and reliable algorithms elaborated at Applied Maths.

BioNumerics’ sequence alignment application is an invaluable tool for SNP and mutation analysis. SNPs or mutations are screened for through up to many thousands of aligned sequences and the software statistically calculates the probability of each SNP based upon the quality of the base assignments and the curves in the chromatogram files. Various filters allow for screening SNPs with specific thresholds or other features such as the type of mutation they induce. By looking at sequencer chromatograms directly, the user has excellent control over the probability of each potential SNP locus.

Complete alignments, including all SNP and subsequence search listings, can be saved and re-edited at any time. Additional sequences can be added to existing projects.

The software offers a wide range of phylogenetic clustering techniques and various tools for the estimation of the significance and reliability of clusters, which are discussed below under the Analysis and comparison functions.

2-D gel types

The 2-D gel analysis module in BioNumerics is a fully featured application for complete analysis and databasing of 2-D gels. Applied Maths’ experience in image analysis has been fully exploited to achieve more reliable automatic gel alignments than ever before. A project-based interface allows for the automated batch processing of multiple gels, including experiments with repeats or multiplexed gels such as DIGE. In addition, interactive overlay images with gels shown in different colors allow the user to manually correct normalizations and detect unique and common spots at a glance.

BioNumerics allows protein spots from 2-D gels to be identified and stored in the database. As such, 2-D gel information can be analyzed in an unparalleled way, using all the querying, clustering, identification, and ordination techniques available in BioNumerics.

Trend data types

Trend data types include all types of sequential readings that express an evolution of one parameter as a function of another. Unlike character data, the measurements are not independent but together form a curve, through which a function can be fitted. The most prevalent trend data experiments are kinetic readings, i.e. the measurement of a parameter, e.g., the concentration of a product, as a function of time. Examples are enzymatic activity measurements, real-time PCR, growth curves, etc. BioNumerics offers a large number of curve fit models, ranging from linear, logarithmic and Gaussian functions to more complex models such as Logistic growth and Gompertz. In addition, a number of curve-derived parameters can be calculated to compare curve data in a sensible way. These parameters can be calculated dynamically as character values and used for clustering, identification and statistics purposes.
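To make the idea of curve models and curve-derived parameters concrete, here is a minimal Python sketch of the logistic and Gompertz functions in one common parameterization, plus a numerically estimated maximum slope as an example of a derived parameter. The parameter names and the choice of "maximum slope" are assumptions for illustration; the brochure does not specify which parameterizations or derived parameters BioNumerics uses.

```python
import math


def logistic(t, a, k, t0):
    """Logistic growth: upper asymptote a, growth rate k, midpoint t0."""
    return a / (1.0 + math.exp(-k * (t - t0)))


def gompertz(t, a, b, c):
    """Gompertz curve: upper asymptote a, displacement b, growth rate c."""
    return a * math.exp(-b * math.exp(-c * t))


def max_slope(curve, t_start, t_end, steps=1000):
    """Estimate the steepest rise of a curve and the time at which it occurs."""
    dt = (t_end - t_start) / steps
    best_slope, best_t = float("-inf"), t_start
    for i in range(steps):
        t = t_start + i * dt
        slope = (curve(t + dt) - curve(t)) / dt  # finite difference
        if slope > best_slope:
            best_slope, best_t = slope, t + dt / 2
    return best_slope, best_t


# Derived parameter for a fitted logistic curve (parameter values are made up).
slope, t_at_max = max_slope(lambda t: logistic(t, a=1.0, k=0.5, t0=10.0), 0.0, 20.0)
```

For a logistic curve the steepest rise is a·k/4 at the midpoint t0, so the numeric estimate can be checked against the closed form. A value like this could then be stored as one character per entry and compared across entries.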

Matrix types

With matrix types, it is possible to import externally generated similarity or distance matrices, providing similarity between entries revealed directly by the technique, or by other software. These matrices can be linked to the database entries in BioNumerics and they are used in conjunction with other information to obtain classifications and identifications. A typical example of a native matrix type is a table of DNA homology values.
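A minimal Python sketch of what importing an external matrix involves is given below. The file layout (one row per entry, lower-triangular values including the diagonal) and the function name are assumptions for this example; they do not describe an actual BioNumerics import format.

```python
def read_similarity_matrix(text):
    """Read a lower-triangular similarity matrix from whitespace-separated text.

    Assumed (hypothetical) layout: each row holds an entry key followed by the
    similarities to all previous entries and to the entry itself.
    """
    keys, rows = [], []
    for line in text.strip().splitlines():
        parts = line.split()
        keys.append(parts[0])
        rows.append([float(v) for v in parts[1:]])
    n = len(keys)
    matrix = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(rows):
        if len(row) != i + 1:
            raise ValueError(f"row {keys[i]} should have {i + 1} values")
        for j, value in enumerate(row):
            matrix[i][j] = matrix[j][i] = value  # mirror into the upper triangle
    return keys, matrix


keys, sim = read_similarity_matrix("""
A 100
B 72.5 100
C 31.0 45.5 100
""")
```

The returned keys are what would link each matrix row to a database entry.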

Composite data sets

Each single database entry, a strain, organism, or sample, can have several experiments of different type linked to it. For example, a bacterial strain in the database could be characterized by a PFGE pattern, a 16S rDNA sequence, an antibiotic resistance profile, and seven housekeeping genes. A plant cultivar or variety could have an AFLP pattern and a microarray experiment linked to it. There is virtually no limit to the number and variety of experiments that can be linked to a single object under study.

When more than one experiment is available for a set of entries, it may be interesting to generate an overall table of characters, which includes all the characters of the available experiments, or a selection made by the user. The result is a seventh class of experiments, the so-called composite data set. A composite data set may include character types, sequences, 1-D or 2-D gels, and curve parameter data, and can be used for clustering, identification, or statistical analysis just like a single experiment type.

Analysis of 1-D fingerprints

Over more than 15 years, Applied Maths has built unparalleled experience and leadership in electrophoresis typing and analysis. With the Fingerprint types module, BioNumerics offers the most comprehensive and reliable platform that exists for the analysis of 1-D profiles, including electrophoresis fingerprints, MALDI and SELDI profiles, chromatography, spectrophotometry, HPLC, and virtually every type of densitometric record that can be used for comparison purposes.

The software handles 8-bit, 12-bit, and 16-bit TIFF files as well as densitometric curves from capillary sequencers, scanners, and spectrophotometers. Convenient wizards enable the user to define new fingerprint types and choose optimal settings for normalization, resolution, background subtraction, smoothing, band finding, etc. The whole process of analyzing a run or gel, starting with track preprocessing, normalization, band or peak finding, and ending with quantification, is contained in a powerful tab-based window, allowing the user to re-edit the processing at any stage without losing any editing done in another step. In addition, the software can optionally record history files, keeping track of any changes made.

Reference peaks or bands used for alignment can be given a name or a size value (e.g., molecular weight or length in base pairs), which is used by the software to calculate the size regression. The full information of reference peaks or bands used for the normalization of a specific type of electrophoresis is called a reference system. The concept of reference systems also makes it possible to automatically and reliably remap experiments run under different conditions or using different reference markers into any other. This important feature makes it possible for different labs to exchange and compare electrophoresis data obtained with different conditions or setups.

Reliable quantification of bands or peaks is often a requirement in molecular research, in genetic breeding, and for quantitative comparisons. BioNumerics calculates best-fitting Gaussian curves for 1-D peaks, and 2-D images can even be quantified by determining the contours of the bands. A regression of known calibration bands can be calculated, resulting in a reliable estimate of concentration.

Defining bands or peaks on patterns can often be a critical and time-consuming task. BioNumerics offers accurate band/peak search algorithms that can be adapted to all types of patterns through a number of adjustable parameters. The software allows bands/peaks within certain intensity thresholds to be marked as uncertain, in which case they are considered neither as a match nor as a mismatch in comparisons. For techniques where the band/peak intensity varies as a function of size (e.g. EtBr stained gels such as PFGE), a peak intensity regression can be created based upon processed database patterns. The software uses the obtained regression to define peaks with much higher accuracy.

Zoom-sliders in all images, convenient buttons, tool tips, floating menus, and multilevel undo/redo features make the processing of gels easy to follow and give the user quick access to the wealth of advanced features available in BioNumerics.

Numerous other features such as spot removal, 2-D and 1-D background subtraction, filtering, spectral analysis, alignment distortion bars, and optimization & tolerance statistics have made BioNumerics the absolute standard for fingerprint analysis in environments where speed, volume and reliability are critical issues. As a last important feature, the script language in BioNumerics allows any action involved in gel processing to be executed from a script, which makes it possible to introduce various levels of automation in the gel analysis procedure. In environments where large numbers of standardized gels are run, this feature forms an invaluable basis for low-cost, high-throughput routine analysis.
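The treatment of uncertain bands in this section, where they count neither as a match nor as a mismatch, can be sketched in Python as follows. The Dice coefficient, the position tolerance, and the rule that a band facing an uncertain band in the other pattern is also left out are assumptions chosen for this illustration; the brochure does not state how BioNumerics implements this.

```python
def dice_with_uncertain(bands_a, bands_b, tolerance=1.0):
    """Dice similarity (%) between two patterns given as (position, is_uncertain).

    Uncertain bands are excluded, and so is any band that lies within
    `tolerance` of an uncertain band in the other pattern, so that such pairs
    add neither a match nor a mismatch.
    """
    def near(p, q):
        return abs(p - q) <= tolerance

    def certain_bands(own, other):
        uncertain_other = [p for p, unc in other if unc]
        return [p for p, unc in own
                if not unc and not any(near(p, q) for q in uncertain_other)]

    a = certain_bands(bands_a, bands_b)
    b = certain_bands(bands_b, bands_a)
    unmatched_b = list(b)
    matches = 0
    for p in a:
        for q in unmatched_b:
            if near(p, q):
                matches += 1
                unmatched_b.remove(q)  # each band can be matched only once
                break
    total = len(a) + len(b)
    return 100.0 * 2 * matches / total if total else 100.0


pattern_a = [(10.0, False), (20.0, False), (30.0, True), (40.0, False)]
pattern_b = [(10.0, False), (20.5, False), (30.0, False), (50.0, False)]
score = dice_with_uncertain(pattern_a, pattern_b)
```

In this made-up example the uncertain band at position 30 and its counterpart are dropped, leaving two matches out of three certain bands per pattern.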

DATABASING

The backbone of BioNumerics is a powerful relational database, specifically designed for storing and retrieving biological data. By default, the software will create Microsoft™ Access databases, which are suitable for most purposes and occasional multi-user access. For high volume databasing, lab-wide access, permission control, automatic backup etc., BioNumerics will also manage a number of professional database engines such as Oracle™, SQL Server™, PostgreSQL, MySQL, DB2™.

The rich and flexible database structure allows information to be added at numerous levels. For example, an organism can have its own descriptive information fields (up to 150), and can also have a number of attachments associated with it. These include text files, images, Word, Excel and PDF files, and HTML/XML files or URLs.

Experiments linked to the organism, for example the gel pattern and the gel file in which the pattern occurs, can have their own descriptive information fields. Even comparisons, subsets, libraries, and other objects can have associated information fields. To add even more flexibility, multiple Levels can be defined within a database. As an example in clinical diagnostics, one level could hold the patients, a second level the samples that were taken from these patients, and a third level the experimental data obtained from these samples. Each level can have its own associated descriptive information fields, and levels are related to each other through Relations.
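The three-level clinical example can be pictured as three related tables. The sketch below uses Python's built-in `sqlite3` module purely for illustration; SQLite is not one of the database engines named in this brochure, and the table names, columns, and rows are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, ward TEXT);
CREATE TABLE samples (
    id INTEGER PRIMARY KEY,
    patient_id INTEGER REFERENCES patients(id),  -- relation to level 1
    taken_on TEXT
);
CREATE TABLE experiments (
    id INTEGER PRIMARY KEY,
    sample_id INTEGER REFERENCES samples(id),    -- relation to level 2
    technique TEXT
);
INSERT INTO patients VALUES (1, 'Patient A', 'ICU'), (2, 'Patient B', 'Surgery');
INSERT INTO samples VALUES (10, 1, '2020-01-05'), (11, 1, '2020-01-09'),
                           (12, 2, '2020-01-07');
INSERT INTO experiments VALUES (100, 10, 'PFGE'), (101, 11, 'PFGE'),
                               (102, 12, 'MLST');
""")

# Follow the relations from experiments back to patients in one query.
rows = conn.execute("""
    SELECT p.name, s.taken_on, e.technique
    FROM experiments AS e
    JOIN samples AS s ON s.id = e.sample_id
    JOIN patients AS p ON p.id = s.patient_id
    WHERE p.ward = 'ICU'
    ORDER BY s.taken_on
""").fetchall()
```

Each table carries its own descriptive fields, and the foreign keys play the role of the relations between levels.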

One of the most appreciated database features of BioNumerics is its set of advanced querying tools. Query components can be created based upon database fields, ranges of fields, availability of experiments, presence of bands or characters, character values, subsequences, etc. These components can be combined using logical operators such as AND, NOT, OR, XOR, giving rise to complex queries that are clearly represented in an interactive diagram. Virtually no search query is too complex to be realized in BioNumerics. Queries can be saved to be reused or modified at any time. For full control, experienced users can also enter SQL query statements.
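The way query components combine through logical operators can be sketched in Python with small predicate functions. All names here (`field_equals`, `q_and`, the entry fields, and so on) are hypothetical and only illustrate the principle; they are not part of BioNumerics or its script language.

```python
entries = [
    {"key": "E1", "species": "E. coli", "year": 2019, "experiments": {"PFGE", "MLST"}},
    {"key": "E2", "species": "E. coli", "year": 2021, "experiments": {"PFGE"}},
    {"key": "E3", "species": "S. aureus", "year": 2021, "experiments": {"MLST"}},
]


# Query components: each returns a predicate that tests a single entry.
def field_equals(field, value):
    return lambda e: e[field] == value


def field_in_range(field, low, high):
    return lambda e: low <= e[field] <= high


def has_experiment(name):
    return lambda e: name in e["experiments"]


# Logical operators: each combines predicates into a new predicate.
def q_and(*qs):
    return lambda e: all(q(e) for q in qs)


def q_or(*qs):
    return lambda e: any(q(e) for q in qs)


def q_not(q):
    return lambda e: not q(e)


def q_xor(a, b):
    return lambda e: a(e) != b(e)


query = q_and(
    field_in_range("year", 2020, 2022),
    q_xor(has_experiment("PFGE"), has_experiment("MLST")),
    q_not(field_equals("species", "S. aureus")),
)
selected = [e["key"] for e in entries if query(e)]
```

Because every operator returns another predicate, components nest to any depth, which mirrors how a query diagram is built up from simple parts.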

Every biological experiment, including gel patterns, densitometric curves, carbohydrate assimilation panels, antibiotic resistance profiles, blots, microarrays, 2-D gels and sequences, can instantly be visualized with a single mouse-click and comparisons between experiments can be shown. Character-type experiments can be visualized in table format or graphically to resemble e.g. commercial test kits.

In addition to the six experiment type modules, BioNumerics offers three comparison type modules: (i) Cluster Analysis and phylogeny, (ii) Non-hierarchic grouping techniques and statistics, and (iii) Identification and decision networks. Each of these modules is very comprehensive in terms of functionality and possibilities, so that only their most important features can be highlighted in the following paragraphs.

Cluster Analysis and Phylogeny

Since the availability of computers to biologists, cluster analysis, also called unsupervised learning, has been a fundamental tool in bioinformatics. Putting together the concepts of a relational database, the contribution of multiple techniques, and a range of powerful clustering algorithms has resulted in a clustering module with unique capabilities in BioNumerics.

The Comparison window. This crucial window in BioNumerics presents a comprehensive overview of all available experiments for a selection of entries and enables the user to show and compare any combination of experiments.

Similarity or distance matrices and dendrograms can be calculated for any selected experiment, and the obtained groupings can be compared with patterns or characters obtained from other experiments. A variety of similarity and distance coefficients and clustering methods are available, in order to provide the most appropriate clustering for all data types.

Composite cluster analysis. Composite clusterings can be generated from selected combinations of experiments, and various methods can be used to obtain a combined dendrogram. Similarities can be adopted from the individual experiments and averaged by user-defined weights, or weights determined by the program, based upon the number of characters available in each experiment. Alternatively, all characters from the individual experiments can be pooled to form one global data set, which can be clustered. Advanced mathematical algorithms allow the calculation of a consensus similarity matrix and dendrogram based upon individual matrices from different experiments.
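The averaging of similarities from individual experiments can be sketched in Python as a weighted mean of matrices. This is only an illustration of the weighting idea with invented numbers; the function name and data layout are assumptions, and the consensus algorithms mentioned above are not reproduced here.

```python
def composite_similarity(matrices, weights):
    """Weighted average of equally sized similarity matrices keyed by experiment."""
    total = sum(weights[name] for name in matrices)
    n = len(next(iter(matrices.values())))
    return [
        [sum(weights[name] * m[i][j] for name, m in matrices.items()) / total
         for j in range(n)]
        for i in range(n)
    ]


matrices = {
    "PFGE": [[100.0, 80.0], [80.0, 100.0]],
    "AFLP": [[100.0, 60.0], [60.0, 100.0]],
}

# Option 1: user-defined weights.
user_weighted = composite_similarity(matrices, {"PFGE": 3, "AFLP": 1})

# Option 2: weights taken from the number of characters in each experiment
# (the counts 20 and 60 are made up for this example).
size_weighted = composite_similarity(matrices, {"PFGE": 20, "AFLP": 60})
```

With these numbers the pairwise similarity moves from 75% under the user weights to 65% when the larger experiment dominates, which shows how much the weighting choice can matter.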

Dendrogram functions. BioNumerics offers a comprehensive set of features for clustering and mining of complex data sets. Numerous viewing modes and editing tools such as two-way zoom-sliders, swapping and abridging of branches, rerooting of trees, and displaying data (characters, patterns, curves or sequences) in various modes make the interpretation of large cluster analyses easier.

Incremental clustering. The “incremental clustering” algorithm allows batches of entries to be added to, or deleted from, existing dendrograms without having to recalculate the entire similarity matrix. BioNumerics automatically updates the existing matrix and rebuilds the dendrogram accordingly, so that adding or deleting entries becomes a matter of seconds instead of minutes or hours.

Special attention has been paid to the incremental construction of multiple alignments of sequences. In order to maintain alignments edited by the user, new sequences can be realigned while preserving the existing alignment.

Dendrogram significance tools. Several statistical methods are available for evaluating the confidence level of a global tree, and of each individual branch. These methods include the standard deviation and cophenetic correlation at each branching level and the root, bootstrap analysis at each branching level of a rooted or unrooted tree, and the Jackknife method.

BioNumerics can also search and show all degeneracies on a dendrogram and display a consensus dendrogram that encompasses all degeneracies. In a similar way, consensus dendrograms can be calculated from different techniques. Partitioning methods provide an alternative way of discovering group structures in complex data sets.

Clustering of characters. Not only can entries be clustered based upon their common and different characters, but also characters can simultaneously be clustered based upon the swapped data matrix. This approach results in a transversal clustering or two-way clustering, a combined view in which both database entries and characters are clustered, and which allows the user to easily reveal the characters that determine and distinguish groups of related entries.

ANALYSIS AND COMPARISON

Phylogenetic inference. In addition to pair-wise clustering techniques such as UPGMA, Ward, Single Linkage, Complete Linkage and Neighbor Joining, BioNumerics offers true phylogenetic clustering algorithms based upon evolutionary optimization criteria. These include the Generalized Maximum Parsimony method and the Maximum Likelihood algorithm. Parsimony can be combined with bootstrap analysis whereas maximum likelihood offers the Likelihood Ratio Test. Both methods result in an unrooted “seaweed” dendrogram, which can be converted into a “pseudo-rooted” tree after assignment of a root. To correct phylogenetic distance scaling, the Jukes & Cantor or Kimura 2-parameter correction factors can be chosen.
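For reference, the two corrections named above have well-known textbook forms: Jukes & Cantor gives d = -(3/4) ln(1 - 4p/3) for an observed proportion p of differing sites, and Kimura's 2-parameter model gives d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q)) for transition and transversion proportions P and Q. The Python sketch below applies these standard formulas to a made-up pair of aligned sequences; it is not taken from BioNumerics, and the function names are illustrative.

```python
import math


def substitution_proportions(seq_a, seq_b):
    """Return (P, Q): proportions of transitions and transversions.

    Only aligned positions where both sequences have A, C, G or T are compared.
    """
    purines = {"A", "G"}
    transitions = transversions = compared = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if x != y:
            if (x in purines) == (y in purines):
                transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
            else:
                transversions += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return transitions / compared, transversions / compared


def jukes_cantor(p):
    """Jukes & Cantor corrected distance from the proportion of differences p."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p(P, Q):
    """Kimura 2-parameter corrected distance."""
    return -0.5 * math.log((1.0 - 2.0 * P - Q) * math.sqrt(1.0 - 2.0 * Q))


P, Q = substitution_proportions("ACGTACGTAC", "GCGTACGTCC")
d_jc = jukes_cantor(P + Q)
d_k2p = kimura_2p(P, Q)
```

Both corrected distances exceed the raw proportion of differences, because the models account for multiple substitutions at the same site.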

Minimum Spanning Trees. Whereas parsimony and maximum likelihood techniques are suitable for inferring deeper phylogenetic relationships, the Minimum Spanning Tree (MST) algorithm allows short-term divergence and micro-evolution in populations to be reconstructed based upon sampled data. The MST technique as implemented in BioNumerics is an excellent tool for analyzing genetic subtyping data such as derived from MLST, MLVA and other allele-comparison techniques. The MST interface offers great interaction with the database and other techniques and is the ideal platform for plotting epidemic divergence against other factors such as geographical distribution, date of sampling, serotypes, etc.
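As a minimal sketch of the idea, the Python code below builds a minimum spanning tree over allelic profiles with the classical Prim algorithm, using the number of differing loci as the distance. This is a generic textbook version with invented profiles; any additional rules the BioNumerics implementation applies are not described in this brochure and are not modelled here.

```python
def allele_distance(a, b):
    """Number of loci at which two allelic profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles):
    """Prim's algorithm; returns a list of edges (from_key, to_key, distance)."""
    keys = list(profiles)
    in_tree = [keys[0]]
    edges = []
    while len(in_tree) < len(keys):
        best = None
        # Find the cheapest edge that connects the tree to a new entry.
        for u in in_tree:
            for v in keys:
                if v in in_tree:
                    continue
                d = allele_distance(profiles[u], profiles[v])
                if best is None or d < best[2]:
                    best = (u, v, d)
        edges.append(best)
        in_tree.append(best[1])
    return edges


# Seven-locus profiles in the style of MLST data (values are made up).
profiles = {
    "ST1": (1, 1, 1, 1, 1, 1, 1),
    "ST2": (1, 1, 1, 1, 1, 1, 2),
    "ST3": (1, 1, 1, 1, 1, 3, 2),
    "ST4": (4, 4, 1, 1, 1, 1, 1),
}
edges = minimum_spanning_tree(profiles)
```

The resulting edges connect each type to its closest relative already in the tree, giving the chain-like picture of stepwise divergence that the MST is used for.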

Dimensioning techniques and statistics

Under dimensioning techniques, we classify all techniques that place the entries in a space of two or more dimensions, rather than imposing a hierarchical, bifurcating structure like a dendrogram.

Principal Components Analysis (PCA). This technique starts directly from a character table to obtain groupings in a multidimensional space. Any combination of axes can be displayed in two or three dimensions.
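To show what starting "directly from a character table" means, here is a small Python sketch that extracts the first principal component of a table by power iteration on the covariance matrix. Power iteration is just one simple way to obtain the leading component and is an assumption of this example; the brochure does not say how BioNumerics computes PCA.

```python
import math


def first_principal_component(table, iterations=200):
    """Return (unit direction, scores) of the first principal component.

    `table` is a list of rows (entries) with one numeric column per character.
    """
    n, m = len(table), len(table[0])
    means = [sum(row[j] for row in table) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in table]
    # Sample covariance matrix of the characters.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    v = [1.0] * m  # arbitrary start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Coordinates of the entries along the component.
    scores = [sum(r[j] * v[j] for j in range(m)) for r in centered]
    return v, scores


# Two characters that rise together, roughly y = 2x (values are made up).
table = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
direction, scores = first_principal_component(table)
```

The scores are the positions of the entries along the first axis; further components would supply the second and third axes of a plot.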

Multi-Dimensional Scaling (MDS). Rather than starting from the data set, MDS uses the similarity matrix as input, which has the advantage over PCA that it can be applied directly to banding patterns. The MDS algorithm iteratively optimizes the distances between the entries in the MDS space according to the similarity values of the matrix.

The advanced presentation modes of both PCA and MDS produce fascinating three-dimensional graphs in an X-Y-Z coordinate system, which can rotate in real time to enhance the perception of the spatial structures. All dimensioning techniques in BioNumerics provide great interactive features, making it possible to select, add or remove entries directly on the plot, display additional database information as colors or labels, relate groupings directly to discriminatory characters, etc.

Self-Organizing Maps (SOM). Being essentially a type of neural network, a SOM is able to place many thousands of entries in a two-dimensional representation, a map, according to overall relatedness. For complex data sets with large numbers of entries, SOM analysis is to be preferred over traditional clustering. An interesting option of a SOM is that unknown entries can be placed in an existing map with very little computing time, which offers a quick and easy-to-interpret identification tool. BioNumerics was the first software to apply this exciting technique to the study of biological relatedness and to identification.
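Placing an unknown entry in an existing map is cheap because it only requires finding the node whose weight vector is closest to the entry, the so-called best-matching unit. The Python sketch below shows this step for a tiny map with invented weights; training the map is omitted, and the squared Euclidean distance is an assumption of the example.

```python
def best_matching_unit(som, entry):
    """Return the (row, col) of the map node whose weights are closest to entry."""
    best_pos, best_dist = None, float("inf")
    for r, row in enumerate(som):
        for c, weights in enumerate(row):
            dist = sum((w - x) ** 2 for w, x in zip(weights, entry))
            if dist < best_dist:
                best_pos, best_dist = (r, c), dist
    return best_pos


# A tiny, already trained 2 x 2 map with three characters per node
# (the weights are made up for illustration).
som = [
    [[0.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
    [[1.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
]
position = best_matching_unit(som, [0.9, 0.1, 0.2])
```

The cost grows only with the number of nodes, not with the number of entries used to train the map.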

Discriminant Analysis and MANOVA. These very useful statistical analysis methods allow the relation between groups of entries and characters to be discovered, and the significance of such groups to be determined. The groups can be clusters derived from a dendrogram, or any user-defined selections of entries (e.g., by origin, species, serotype …).

Statistical tests and charts. An easy and intuitive tool is available to perform a number of parametric and non-parametric statistical tests (Chi-square test, t-test, Wilcoxon signed-rank test, Kruskal-Wallis test, ANOVA, Pearson correlation test, Spearman rank-order test…). For each input data type, the software displays the suitable tests and the available plot types.
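As an example of one of the listed tests, the Python sketch below computes the Pearson chi-square statistic and its degrees of freedom for a contingency table. The counts are invented, and the p-value lookup is left out to keep the example dependency-free.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Rows: two clusters; columns: resistant / susceptible (made-up counts).
stat, dof = chi_square([[20, 10], [5, 25]])
```

A large statistic relative to the degrees of freedom indicates that the character is unevenly distributed over the groups.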

Libraries and identification

Identification, also called supervised learning or classification, is without doubt one of the most important techniques in bioinformatics. The possibility of identifying unknown organisms based upon various available experiment data sets is another big step forward realized in BioNumerics, leading to more faithful consensus identifications. The same range of similarity and distance coefficients available for cluster analysis can be used for identification.

Identification libraries. Identification can be as quick and simple as sorting a large list of database entries according to similarity with an “unknown” entry. However, the use of libraries can make the identification between complex groups much more reliable. An identification library is a collection of units, each of which consists of one or more entries of the same group (taxon, subtype, variant, ecotype…).

The identification of unknown samples depends on the similarity to the available library units. A clear, easy-to-read identification report lists the identifications obtained by all individual data sets. The number of closest matches shown can be expanded or reduced, and full detailed information on the identification of a specific entry is shown instantly with a simple click. Mathematical and statistical methods allow the estimation of the reliability and the relevance of each identification case.
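A library-based identification can be sketched in Python as ranking library units by their similarity to the unknown entry. Scoring a unit by the mean similarity to its member entries and using a simple matching coefficient are assumptions made for this example; the brochure does not specify the scoring rule BioNumerics applies.

```python
def simple_matching(a, b):
    """Percentage of characters with identical states."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def identify(unknown, library, similarity, top=3):
    """Rank library units by mean similarity between the unknown and their entries."""
    ranking = []
    for unit, members in library.items():
        mean = sum(similarity(unknown, m) for m in members) / len(members)
        ranking.append((unit, mean))
    ranking.sort(key=lambda item: item[1], reverse=True)
    return ranking[:top]


# Two library units with binary character profiles (values are made up).
library = {
    "Taxon A": [(1, 1, 0, 0, 1), (1, 1, 0, 1, 1)],
    "Taxon B": [(0, 0, 1, 1, 0), (0, 1, 1, 1, 0)],
}
result = identify((1, 1, 0, 0, 0), library, simple_matching)
```

The `top` argument plays the role of the adjustable number of closest matches shown in a report.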

Page 13: BioNumerics - BIOSYSTEMATICA

13BioNumerics

A detailed pairwise comparison can be obtained between

any two entries from the database, which lists all the experi-

ments that both entries share, together with the percentage

similarity. With a simple mouse-click on the experiment type,

the gel strips, character sets, aligned sequences, or whatever

other data were entered for both entries are shown together. As an

interesting alternative to classical similarity-based identifica-

tion, BioNumerics allows neural networks to be generated

for each experiment type. For large databases containing

groups that are difficult to distinguish, neural networks can

be the quickest and most reliable identification tool.

Decision networks. Decision networks are one of the most

versatile and powerful tools in BioNumerics, allowing the

user to build automated workflows to make decisions, pre-

dict features, perform queries, fill in fields, create graphs and

plots, and much more.

A decision network is an operational workflow that carries

out one or more [logical] operations and/or actions on the

database. The network is built of Operators as building blocks

that form the Nodes of the network:

• Input operators to retrieve specific, usually experimental,

data

• String, Value and Sequence operators, which perform a

manipulation on data types

• Boolean operators, which combine one or more binary

states into a new binary state

• Output actions, performing a specific action on the data-

base, for example writing a field.

The operators of a decision network together form an

easy-to-use construction kit that allows one to build automat-

ed decision or action workflows, with endless possibilities.
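The operator types listed above can be pictured as nodes in a small evaluation graph. The following is a minimal sketch under stated assumptions: the `Node` class, the helper names (`input_op`, `value_op`, `bool_op`, `output_action`), the field names, and the cutoff values are hypothetical and are not part of the BioNumerics script language.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Entry = Dict[str, object]


@dataclass
class Node:
    """One operator: a function of the entry and the results of its input nodes."""
    func: Callable[..., object]
    inputs: List["Node"]

    def evaluate(self, entry: Entry) -> object:
        args = [node.evaluate(entry) for node in self.inputs]
        return self.func(entry, *args)


def input_op(field: str) -> Node:
    """Input operator: retrieve a value stored with the entry."""
    return Node(lambda entry: entry[field], [])


def value_op(fn, *nodes: Node) -> Node:
    """Value operator: transform the values produced upstream."""
    return Node(lambda entry, *args: fn(*args), list(nodes))


def bool_op(fn, *nodes: Node) -> Node:
    """Boolean operator: combine binary states into a new binary state."""
    return Node(lambda entry, *args: bool(fn(*args)), list(nodes))


def output_action(field: str, node: Node) -> Node:
    """Output action: write the upstream result into a database field."""
    def write(entry, value):
        entry[field] = value
        return value
    return Node(write, [node])


# Hypothetical rule: flag an entry when both measurements reach a cutoff.
high_a = value_op(lambda v: v >= 16, input_op("measure_a"))
high_b = value_op(lambda v: v >= 4, input_op("measure_b"))
network = output_action("flagged", bool_op(lambda a, b: a and b, high_a, high_b))

entry = {"measure_a": 32, "measure_b": 8}
network.evaluate(entry)
print(entry["flagged"])
```

Evaluating the output node pulls values through the whole graph, so the same network can be applied unchanged to every entry in a selection.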

Analysis of congruence between techniques

When comparisons are made between groupings based upon

different techniques, the question arises to what extent there

exists any congruence between these different techniques.

Another interesting aim is to find which technique is the clos-

est to the consensus classification, since this technique will in

general be the most reliable for identifying the organisms or

samples under study. This is another analytical tool offered

by BioNumerics: similarity matrices obtained from different

techniques are compared in a pairwise manner by comparing

corresponding similarity values by either Kendall’s Tau coef-

ficient or the product-moment correlation. This results in a

congruence matrix, expressing the global similarity or con-

gruence between different techniques. This matrix in turn can

be clustered into a dendrogram, now grouping techniques ac-

cording to congruence.
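The congruence calculation described above can be sketched as follows: the values above the diagonal of each similarity matrix are flattened into a vector, and every pair of techniques is compared with either the product-moment (Pearson) correlation or Kendall's tau. This is an illustrative sketch only. The technique names and similarity values are made up, and the tau-a variant (ties counted as neither concordant nor discordant) is an assumption, since the brochure does not say which variant is used.

```python
from itertools import combinations
from math import sqrt


def upper_triangle(matrix):
    """Flatten the pairwise similarity values above the diagonal."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Product-moment correlation (undefined if either vector is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (a1, b1), (a2, b2) in combinations(zip(x, y), 2):
        sign = (a1 - a2) * (b1 - b2)
        concordant += sign > 0
        discordant += sign < 0
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


def congruence_matrix(similarities, coefficient=pearson):
    """Compare every pair of techniques on their corresponding similarity values."""
    names = list(similarities)
    flat = {name: upper_triangle(similarities[name]) for name in names}
    return {a: {b: coefficient(flat[a], flat[b]) for b in names} for a in names}


# Hypothetical similarity matrices (percent) for the same four entries.
similarities = {
    "technique_1": [[100, 90, 60, 50], [90, 100, 65, 55],
                    [60, 65, 100, 80], [50, 55, 80, 100]],
    "technique_2": [[100, 85, 55, 45], [85, 100, 60, 50],
                    [55, 60, 100, 75], [45, 50, 75, 100]],
}
congruence = congruence_matrix(similarities, kendall_tau)
print(congruence["technique_1"]["technique_2"])
```

The resulting matrix is itself a similarity matrix between techniques, which is why it can be clustered into a dendrogram in the same way as entry-level data.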

Pairwise comparisons between any two techniques are ob-

tained by plotting the corresponding similarity values in an

X-Y diagram.


Such plots are very useful to reveal the taxonomic level

or depth of one technique compared to another: they show

whether one technique is discriminative at a lower or higher

level than another technique and provide insight into the

limitations and benefits of each technique in building identifi-

cation strategies.

Database sharing

Today, the exchange of information among different laborato-

ries is of the utmost importance in the life sciences. The need

to exchange biodata has become particularly urgent in clini-

cal and epidemiological research and surveillance networks.

BioNumerics offers a powerful solution to this important is-

sue with its integrated Database Sharing Tools, available as a

separate module.

Peer-to-peer data exchange. The Database Sharing Tools

allow BioNumerics users to exchange information at a peer-

to-peer level by simply making a selection of database en-

tries, and clicking the information fields and experiment data

to be exported in XML format. Received XML files can be

imported and directly analyzed together with other database

entries. BioNumerics automatically recognizes which experi-

ments are compatible. XML exchange files can be optionally

compressed and encrypted.
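To make the exchange idea concrete, the sketch below writes a selection of entries and chosen information fields to XML, compresses the result, and reads it back on the receiving side. The element names (`exchange`, `entry`, `field`), the entry keys, and the field values are hypothetical; the actual BioNumerics XML schema is not described in this brochure. Encryption is left out because the Python standard library offers no suitable cipher.

```python
import gzip
import xml.etree.ElementTree as ET


def export_entries(entries, fields):
    """Serialize selected entries and chosen information fields to XML bytes."""
    root = ET.Element("exchange")
    for entry in entries:
        node = ET.SubElement(root, "entry", key=entry["key"])
        for name in fields:
            field = ET.SubElement(node, "field", name=name)
            field.text = str(entry.get(name, ""))
    return ET.tostring(root, encoding="utf-8")


def import_entries(xml_bytes):
    """Parse received XML back into a list of entry dictionaries."""
    root = ET.fromstring(xml_bytes)
    return [
        {"key": node.get("key"),
         **{f.get("name"): f.text or "" for f in node.findall("field")}}
        for node in root.findall("entry")
    ]


# Hypothetical selection of database entries.
entries = [
    {"key": "E001", "genus": "Listeria", "source": "food"},
    {"key": "E002", "genus": "Salmonella", "source": "clinical"},
]

payload = gzip.compress(export_entries(entries, ["genus"]))  # optional compression
received = import_entries(gzip.decompress(payload))
print(received)
```

Only the fields named in the export call travel with the entries, which corresponds to choosing the information fields and experiment data to be exported.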

Client-Server approach. BioNumerics’ advanced client-serv-

er system is the perfect solution for collaborative research

projects, networks, and private initiatives of any size, where

central databases are made available to a restricted or unre-

stricted number of client users. Each BioNumerics software

package that contains the Database Sharing Tools comes as

a client version, which can connect and communicate with

a BioNumerics Server using TCP/IP. A direct connection is

established between the Server and the Client allowing up-

loading and downloading of database entries, interactive que-

rying, and automatic identification of profiles uploaded by the

client. Using the script language at both the client and the server side,

the most sophisticated implementations can be designed.

Examples include automatic creation and broadcasting of re-

ports and notices, or automatic alerts of members in surveil-

lance networks.

Geographical mapping. In many research projects, especially

epidemiological, biological data is closely linked to geographi-

cal data. BioNumerics’ Database Sharing Tools enable a Geo

plugin to be installed, offering a simple yet powerful way to

map the results from queries, comparisons, identifications,

etc. on geographical maps. Geographical information for

database entries can be provided as city names, postal or zip

codes, or geographical coordinates. Entries can be plotted in-

dividually or as stacked bar graphs or pie charts, using differ-

ent colors according to groups defined in the database. The

powerful geographic tools of Google™ Maps and versatile

search, select, and query interface of BioNumerics together

make the Geo plugin a very useful and interactive asset.

Plugin tools

Although BioNumerics is a versatile and comprehensive plat-

form for the analysis and databasing of any type of biological

data, a number of applications are too specific to be provided

in a generic environment. This is the case for import and ex-

port tools, but also for a number of cutting-edge techniques

that require continuous updating of the analysis tools to keep

up with the latest developments. Therefore, most technique-

oriented functionality has been enabled as Plugin applications.

These plugins are well-documented in separate manuals and

are officially supported by Applied Maths. Free specialist plu-

gins are available for antibiotic susceptibility

analysis, HIV drug resistance analysis, MLST analysis, spa

typing, MLVA-VNTR analysis, project-based automated 2-D

gel analysis, etc. BioNumerics also offers free plugins for

import and export, XML-based exchange, automated batch

sequence assembly, geographical mapping, and a wide variety

of extra functionality for the database, dendrograms and

reporting. A few highly advanced plugin-based modules are

also available as separate software licenses, e.g. the HDA-

plugin for CSCE-based heteroduplex analysis (HDA), and the

Band Scoring plugin for electrophoresis-based codominant

band scoring and recurrent parent analysis.

Modular structure

BioNumerics consists of 10 different modules in total, of

which 6 modules are related to the different experimental

applications that can be analyzed (application modules), and

4 modules constitute the different analysis tools that the soft-

ware contains (analysis modules).

• 6 application modules: Fingerprint types, Character types,

Sequence types, Trend data types, 2-D gel types, and Ma-

trix types.

• 4 analysis modules: Cluster analysis, Identification & Librar-

ies, Dimensioning techniques & Statistics, and Database

sharing tools.

The full BioNumerics functionality is physically contained in

the same program unit, which guarantees perfect integration

of the modules and easy co-evaluation of different data sets

and analyses. For example, a selection of entries highlighted

on a dendrogram (Cluster analysis module) is also

highlighted on a PCA (Dimensioning techniques module) and

in the database.

Any or all of the application and analysis modules can be com-

bined with each other. At least one application module is re-

quired to operate the software.

Compatibility

Import of fingerprints: Accepts uncompressed gel images as

8-bit, 12-bit, and 16-bit TIFF files generated by any imaging

system. Direct import and processing of multichannel chro-

matogram files from automated sequencers (Applied Bio-

systems, Beckman, Amersham). Import of absorbance and

densitometry profiles from a variety of scanners, sequencers

and automated systems (electrophoresis, spectrophotom-

etry, HPLC, mass spectrometry, MALDI, SELDI, etc.). Import

of processed densitometric data as peak MW, RF, height

and/or surface tables. Scriptable import of any densitometric

or peak table record available in text format.

Import of character data: Easy wizard-driven import of

character data from text tables, Excel spreadsheets or da-

tabases. Plugin tools available for import of most common

phenotypic test panels and automated identification systems

such as fatty acid profiling. Scriptable import of any character

array available in text format or contained in a database or

spreadsheet. RGB channel-quantification of grid-based char-

acter data scanned as 8-bit to 24-bit TIFF images, such as mi-

croplates, phenotypic test panels, DNA arrays, etc.

Import of sequences: Processing and contig construction

in BioNumerics Assembler of multichannel chromatogram

files from automated sequencers (Applied Biosystems, Beck-

man, Amersham) and text SCF or binary files. Compatible

with EMBL, GenBank, FASTA sequence formats for import

of annotated sequences (header descriptions, features and

qualifiers). Import of aligned sequences possible. Script-based

import of less common file formats.

Import of database information: Import of information fields

from any text file type, spreadsheet or database using the

import plugin. Direct link with SQL and ODBC compatible

databases.

Printing and export: Professional print reports in color or

grayscale. Each graphical or text-oriented print job can be

copied to the clipboard for import in other Windows soft-

ware, or can be saved as bitmap file with adjustable resolu-

tion. Creation of custom graphics or text reports possible

using scripts.

Script language: Powerful script language to realize tasks

like importing data from files or databases, automated im-

port and processing of fingerprints, automated sequence

assembly, exporting data, creating customized graphics and

text reports, manipulation of database fields, manipulation

of experimental data, performing complex queries, creating

specific analysis tools, etc.

www.applied-maths.com

Keistraat 120, B-9830 Sint-Martens-Latem, Belgium

Phone +32 9 2222100, Fax +32 9 2222102

13809 Research Blvd., Suite 645, Austin, Texas 78750, USA

Phone +1 512-482-9700, Fax +1 512-482-9708

BioNumerics is a trademark of Applied Maths NV.

All other trademarks are the properties of their respective owners.

The information in this brochure is subject to change without

prior notice.

Copyright 1998-2008, Applied Maths NV.

All rights reserved.