SRI International Bioinformatics 1 The Structured Advanced Query Page Tomer Altman Bioinformatics Research Group SRI, International February 1, 2008

SRI International Bioinformatics

The Structured Advanced Query Page

Tomer Altman
Bioinformatics Research Group

SRI International
February 1, 2008

Introduction

BioVelo is a query language, like SQL but simpler and easier to learn. Documentation: http://biocyc.org/bioveloLanguage.html

The Free-Form Advanced Query Page allows Web submission of BioVelo queries.

The Structured Advanced Query Page (SAQP) is a Web page for interactively constructing advanced queries to Pathway/Genome Databases (PGDBs). Queries are translated to BioVelo and sent to the PGDB.

SAQP: http://biocyc.org/query.html
Documentation: http://biocyc.org/webQueryDoc.html

Why a query interface?

Allows structured access to the rich data representation stored in a PGDB.

Most advanced databases offer a high-level, declarative method of access (e.g., SQL).

Provides an intermediate level of access between graphically browsing the PGDB and programmatically processing the data in Lisp.

The Structured Advanced Query Page

'Advanced', in that it lets you ask more complex queries than the basic search interface does.

'Structured', in that it is a dynamic HTML form that makes queries easier to craft, but trades flexibility and power for simplicity compared with the Free-Form Advanced Query Page (FFAQP).

'Page', in that it is primarily accessed via the BioCyc web interface (www.biocyc.org/query.html) or from your own Pathway Tools web server.

SAQP Architecture

The SAQP is built on top of a high-level functional declarative language called BioVelo (Mario Latendresse, SRI), which is built on top of Pathway Tools.

Every result page shows the BioVelo query that the SAQP generated, which in turn produced the results.

You don't need to know anything about BioVelo to use the SAQP, but it can help later if you need to write even more complex queries using the Free-Form Advanced Query Page.

The Structure of the SAQP:

- Database specification
- Class specification
- 'Where' constraints on attributes of classes
- Output attributes description
- Data format (HTML vs. TXT)
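The five parts of the form fit together as a small data structure. The Python sketch below is hypothetical: the class `QuerySpec`, its field names, the function `run`, and the toy in-memory data are illustrations only, not the SAQP's or BioVelo's internal representation. It shows how the parts combine: choose a database and class, keep the objects that satisfy every 'where' constraint, then project the requested output attributes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical model of the SAQP form; field names are illustrative only.
@dataclass
class QuerySpec:
    database: str                                    # which PGDB to query
    cls: str                                         # which class to search
    where: list[Callable[[dict], bool]] = field(default_factory=list)
    output: list[str] = field(default_factory=list)  # attributes to display
    fmt: str = "html"                                # "html" or "txt"; not used by run()

def run(spec: QuerySpec, pgdbs: dict[str, dict[str, list[dict]]]) -> list[list[Any]]:
    """Evaluate a spec against toy in-memory data (not a real PGDB)."""
    rows = []
    for obj in pgdbs[spec.database][spec.cls]:
        # An object qualifies only if it satisfies every 'where' constraint.
        if all(constraint(obj) for constraint in spec.where):
            rows.append([obj.get(attr) for attr in spec.output])
    return rows

# Usage with placeholder data: one database, one class, one constraint.
toy = {"DB-1": {"Proteins": [{"name": "protein-A", "size": 8},
                             {"name": "protein-B", "size": 12}]}}
spec = QuerySpec("DB-1", "Proteins",
                 where=[lambda p: p.get("size", float("inf")) < 10],
                 output=["name"])
print(run(spec, toy))  # [['protein-A']]
```

With no 'where' constraints, `all()` over an empty list is true, so every object of the class is returned, which matches the shape of a simple one-class query.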

Example #1:

A simple query usually consists of querying a particular database about a particular class.

Find all the proteins in E. coli K-12. Display the protein names.
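Example #1 is a pure select-and-project query: one class, no constraints, one output attribute. A minimal Python sketch of that shape follows; the records and the `frame_id` and `name` keys are invented placeholders, not EcoCyc data or Pathway Tools slot names.

```python
# Placeholder stand-in for the Proteins class of one PGDB; these records are
# invented for illustration and are not EcoCyc data.
proteins = [
    {"frame_id": "PROT-1", "name": "protein-A"},
    {"frame_id": "PROT-2", "name": "protein-B"},
    {"frame_id": "PROT-3", "name": "protein-C"},
]

# No 'where' constraints: every instance of the class qualifies, and the
# output specification keeps only the name attribute.
names = [p["name"] for p in proteins]
print(names)  # ['protein-A', 'protein-B', 'protein-C']
```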

Structure of the Results

A line that shows the equivalent BioVelo expression that the SAQP generated to answer the query.

An HTML table of the results, with each entry hyperlinked to its Pathway Tools web page.

If a text data format was requested, then a tab-delimited text file is generated, with just the table data.
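A tab-delimited result file can be loaded with Python's standard `csv` module. This is a sketch under assumptions: it assumes one record per line with no quoting of embedded tabs, and the slides do not say whether a header row is included, so the sample below has none. Every field comes back as a string, so numeric columns need explicit conversion.

```python
import csv
import io

def read_results(text: str) -> list[list[str]]:
    """Split tab-delimited result text into rows of string fields.
    Assumes one record per line and no quoting; blank lines are skipped."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row]

# Placeholder two-column result (name, DNA footprint size).
sample = "protein-A\t8\nprotein-B\t9\n"
rows = read_results(sample)
print(rows)  # [['protein-A', '8'], ['protein-B', '9']]
```

For a downloaded file, pass the file's contents (or an open file object directly to `csv.reader`) instead of the inline sample.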

Example #2:

Find all the proteins of E. coli K-12 for which the DNA-FOOTPRINT-SIZE is smaller than 10.

Display the protein name and the DNA footprint size.
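Example #2 adds a single 'where' constraint. The Python sketch below uses a hypothetical key `dna_footprint_size` as a stand-in for the DNA-FOOTPRINT-SIZE attribute, with placeholder records. One design question any such filter has to answer is what to do with objects that have no value for the attribute; here a missing value simply does not match, which is an assumption rather than documented SAQP behavior.

```python
from typing import Optional

# Hypothetical records; "dna_footprint_size" stands in for the
# DNA-FOOTPRINT-SIZE attribute and is None when the slot has no value.
proteins = [
    {"name": "protein-A", "dna_footprint_size": 8},
    {"name": "protein-B", "dna_footprint_size": 25},
    {"name": "protein-C", "dna_footprint_size": None},
]

def footprint_below(protein: dict, limit: int) -> bool:
    """True only when the attribute has a value and that value is below limit.
    Treating a missing value as 'no match' is an assumption of this sketch."""
    size: Optional[int] = protein.get("dna_footprint_size")
    return size is not None and size < limit

# Apply the 'where' constraint, then project onto the two requested columns.
result = [(p["name"], p["dna_footprint_size"])
          for p in proteins if footprint_below(p, 10)]
print(result)  # [('protein-A', 8)]
```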

Example #3:

In EcoCyc, display polypeptides constrained by experimentally determined molecular weight and isoelectric point.

The experimental molecular weight should be between 50 and 100 kDa.

The pI should be less than 7.

Display the protein name, the experimental molecular weight, and the pI.
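Example #3 combines two constraints, and an object must satisfy both. A short Python sketch with placeholder polypeptides (molecular weight in kDa, plus pI) follows. Reading "between 50 and 100" as inclusive at both ends is an assumption; the slides do not say how the SAQP treats the bounds.

```python
# Hypothetical polypeptide records: experimental molecular weight (kDa) and pI.
polypeptides = [
    {"name": "poly-A", "mw_kda": 72.4, "pi": 5.9},
    {"name": "poly-B", "mw_kda": 42.0, "pi": 6.1},   # too light
    {"name": "poly-C", "mw_kda": 88.0, "pi": 8.3},   # pI too high
    {"name": "poly-D", "mw_kda": 100.0, "pi": 6.8},  # exactly on the upper bound
]

def matches(p: dict, mw_low: float = 50.0, mw_high: float = 100.0,
            pi_max: float = 7.0) -> bool:
    # Both constraints must hold (a conjunction). The weight range is treated
    # as inclusive here; that reading of "between" is an assumption.
    return mw_low <= p["mw_kda"] <= mw_high and p["pi"] < pi_max

# Print the three requested output columns for each matching polypeptide.
for p in polypeptides:
    if matches(p):
        print(p["name"], p["mw_kda"], p["pi"])
```

Under an exclusive reading, poly-D would drop out, so it is worth checking the boundary behavior against the BioVelo expression shown on the result page.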

Example #4:

The SAQP allows quantifiers to be specified on relations between PGDB classes.

Extending Example #3, we now want only proteins for which at least one of the encoding genes lies within the first 500 kilobases of the E. coli chromosome.
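"At least one" is an existential quantifier over the protein-to-gene relation, which Python expresses with `any()`. The sketch below uses placeholder genes and proteins, and it assumes that "within the first 500 kilobases" means the gene's left end lies below 500,000 bp; the slides do not define the exact test. The universal counterpart, `all()`, is shown only for comparison of the two quantifiers, not as a claim about which quantifiers the SAQP offers.

```python
# Hypothetical data: each protein lists the genes that encode it, and each
# gene has a left-end coordinate on the chromosome, in base pairs.
genes = {
    "gene-1": {"left_end": 120_000},
    "gene-2": {"left_end": 2_300_000},
    "gene-3": {"left_end": 480_500},
}
proteins = [
    {"name": "protein-A", "genes": ["gene-2"]},
    {"name": "protein-B", "genes": ["gene-2", "gene-3"]},
    {"name": "protein-C", "genes": ["gene-1"]},
    {"name": "protein-D", "genes": []},
]

WINDOW_BP = 500_000  # first 500 kilobases (assumed test: left end < 500,000)

def some_gene_in_window(protein: dict) -> bool:
    # Existential quantifier: true if at least one encoding gene starts inside
    # the window; a protein with no genes never matches.
    return any(genes[g]["left_end"] < WINDOW_BP for g in protein["genes"])

def every_gene_in_window(protein: dict) -> bool:
    # Universal quantifier, for comparison. all([]) is True, so a protein
    # with no genes matches vacuously unless excluded explicitly.
    return all(genes[g]["left_end"] < WINDOW_BP for g in protein["genes"])

print([p["name"] for p in proteins if some_gene_in_window(p)])
# ['protein-B', 'protein-C']
```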

Example #5: Queries with Several Components

A query with several components performs a 'Cartesian product' of the classes from each search component.

Search for all pathways of E. coli that also exist in H. pylori strains 26695 and HPAG1.
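A multi-component query can be pictured as a Cartesian product followed by a filter. In the Python sketch below, each list stands for one search component (one organism's pathways); the pathway names are placeholders, not BioCyc contents, and matching pathways by name is an assumption about how "also exists" is decided. `itertools.product` enumerates every combination, and the filter keeps combinations in which all three components name the same pathway.

```python
from itertools import product

# Hypothetical pathway-name lists, one per search component.
ecoli = ["pathway-1", "pathway-2", "pathway-3"]
hpylori_strain_a = ["pathway-2", "pathway-3"]
hpylori_strain_b = ["pathway-3", "pathway-4"]

# product() forms every (e, a, b) combination; the condition keeps only
# combinations where all three components refer to the same pathway.
shared = [
    e for e, a, b in product(ecoli, hpylori_strain_a, hpylori_strain_b)
    if e == a == b
]
print(shared)  # ['pathway-3']
```

The product grows as the product of the component sizes (here 3 × 2 × 2 = 12 combinations), so for a simple equality match a set intersection such as `set(ecoli) & set(hpylori_strain_a) & set(hpylori_strain_b)` gives the same answer more cheaply; the product form is shown because it mirrors the slide's description.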