ICIS5 and the new Internet

Benjamin Good
CIHR and MSFHR Strategic Training Program in Bioinformatics
Department of Molecular Biology and Biochemistry
Simon Fraser University
Vancouver, Canada

Posted on 21-Dec-2015



Objectives

• Briefly introduce the concept of web services.
• Provide some reasons for you to be interested in them.
• Introduce the problem of translation.
• Introduce two approaches to service development that deal with this problem in different ways:
  – one uses XML-Schema;
  – one uses the BioMOBY data-type ontology.
• Describe a potential strategy for ICIS development that takes advantage of the strengths of both approaches with minimal duplication of effort.
• Conclude with the predicted consequences of this strategy.

Web Service Intro: Standards

• Web services are programs that can be executed by other programs over the Internet.
• HTTP and TCP/IP make Internet communication possible.
• English makes this presentation possible.
• SOAP makes communication between anonymous web services possible.
  – As we will see, additional standards become necessary to describe the content of the messages exchanged between web services.
  – But first, time to clean up with SOAP.

SOAP Web Services

• SOAP: a new XML protocol
  – Simple Object Access Protocol.
  – Wraps communications in a consistent XML structure.
  – Results in a "mail-like" system.

[Figure: the client ("consumer") SOAP-packages the request <input>hello</input> and sends it to the service ("provider"), which unpackages it, parses and processes the XML, and SOAP-packages the response <output>world</output>; the client then unpackages the response.]
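The packaging and unpackaging steps in the figure can be sketched with the JDK's built-in DOM parser. This is a minimal sketch, assuming the SOAP 1.2 envelope namespace and hand-built strings; in practice Axis performs both steps for you. The class and method names (SoapMailDemo, wrap, unwrap) are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper illustrating the "mail-like" round trip from the figure.
class SoapMailDemo {
    static final String SOAP_NS = "http://www.w3.org/2003/05/soap-envelope";

    // SOAP packaging: place the payload inside env:Envelope/env:Body.
    static String wrap(String payloadXml) {
        return "<env:Envelope xmlns:env=\"" + SOAP_NS + "\">"
                + "<env:Body>" + payloadXml + "</env:Body>"
                + "</env:Envelope>";
    }

    // SOAP unpackaging: parse the envelope and return the first element inside env:Body.
    static Element unwrap(String envelopeXml) {
        Document doc;
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // required to match elements by the SOAP namespace
            doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(envelopeXml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not a well-formed envelope", e);
        }
        NodeList bodies = doc.getElementsByTagNameNS(SOAP_NS, "Body");
        if (bodies.getLength() == 0) {
            throw new IllegalArgumentException("missing env:Body");
        }
        for (Node n = bodies.item(0).getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) n;
            }
        }
        throw new IllegalArgumentException("empty env:Body");
    }

    public static void main(String[] args) {
        String request = wrap("<input>hello</input>");           // client packages
        Element in = unwrap(request);                             // provider unpackages
        String reply = "hello".equals(in.getTextContent()) ? "world" : "?";
        String response = wrap("<output>" + reply + "</output>"); // provider packages
        Element out = unwrap(response);                           // client unpackages
        System.out.println(out.getTagName() + ": " + out.getTextContent()); // prints "output: world"
    }
}
```

Note that the payload itself is opaque to the envelope code: any well-formed XML can travel in the Body, which is exactly why a separate agreement about payload content is still needed.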

A SOAP example (source: http://www.w3.org/):

<?xml version='1.0' ?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
  <env:Header>
    <m:reservation xmlns:m="http://travelcompany.example.org/reservation"
        env:mustUnderstand="true">
      <m:reference>uuid:093a2da1-q345-739r-ba5d-pqff98fe8j7d</m:reference>
      <m:dateAndTime>2001-11-29T13:20:00.000-05:00</m:dateAndTime>
    </m:reservation>
    <n:passenger xmlns:n="…" env:role="…" env:mustUnderstand="true">
      <n:name>Benjamin Good</n:name>
    </n:passenger>
  </env:Header>
  <env:Body>
    <p:itinerary xmlns:p="http://travelcompany.example.org/reservation/travel">
      <p:departure>
        <p:departing>New York</p:departing>
        <p:arriving>Los Angeles</p:arriving>
        <p:departureDate>2001-12-14</p:departureDate>
        <p:departureTime>late afternoon</p:departureTime>
        <p:seatPreference>aisle</p:seatPreference>
      </p:departure>
      <p:return>
        <p:departing>Los Angeles</p:departing>
        <p:arriving>New York</p:arriving>
        <p:departureDate>2001-12-20</p:departureDate>
        <p:departureTime>mid-morning</p:departureTime>
        <p:seatPreference/>
      </p:return>
    </p:itinerary>
  </env:Body>
</env:Envelope>

• env:Header: optional extension.
• env:mustUnderstand="true": forces processing; the receiver must process that header block or reject the message with a fault.
• env:Body: primary payload; could be any valid XML.
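The "forces processing" rule can be sketched as follows: a receiver walks the header blocks and must fault if a block marked env:mustUnderstand is one it does not recognize. This is a minimal sketch, assuming the set of understood headers is supplied as "{namespace}localName" strings; it ignores env:role targeting, which the SOAP specification also takes into account. The class name and the urn:example:audit header in the usage code are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class MustUnderstandCheck {
    static final String SOAP_NS = "http://www.w3.org/2003/05/soap-envelope";

    // Lists "{namespace}localName" for each mandatory header block the receiver does not
    // recognize. A non-empty list means the message must be rejected with a fault.
    static List<String> notUnderstood(String envelopeXml, Set<String> understood) {
        Document doc;
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(envelopeXml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not a well-formed envelope", e);
        }
        List<String> missing = new ArrayList<>();
        NodeList headers = doc.getElementsByTagNameNS(SOAP_NS, "Header");
        if (headers.getLength() == 0) {
            return missing; // env:Header is optional
        }
        for (Node n = headers.item(0).getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            Element block = (Element) n;
            // xs:boolean in SOAP 1.2: "true" or "1" make the block mandatory
            String flag = block.getAttributeNS(SOAP_NS, "mustUnderstand");
            boolean mandatory = "true".equals(flag) || "1".equals(flag);
            String name = "{" + block.getNamespaceURI() + "}" + block.getLocalName();
            if (mandatory && !understood.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String envelope = "<env:Envelope xmlns:env='" + SOAP_NS + "'><env:Header>"
                + "<m:reservation xmlns:m='http://travelcompany.example.org/reservation'"
                + " env:mustUnderstand='true'/>"
                + "<x:audit xmlns:x='urn:example:audit' env:mustUnderstand='true'/>"
                + "</env:Header><env:Body/></env:Envelope>";
        Set<String> understood =
                Collections.singleton("{http://travelcompany.example.org/reservation}reservation");
        System.out.println(notUnderstood(envelope, understood)); // prints [{urn:example:audit}audit]
    }
}
```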

Why Should We Bother?

• Participation in a newly forming global community
  – By incorporating web service clients, ICIS can take advantage of services offered by other institutions. For example, NCBI's Entrez information retrieval system is now provided as web services.
  – In the opposite direction, ICIS services can be shared with applications at institutes around the world.
    • For example, Gramene could embed calls to ICIS methods directly within its website.
• Inter-ICIS collaboration
  – Provides a protocol for data and software sharing between institutes or applications.
    • Example: a Windows ICIS application written in C makes use of Java methods provided as web services (Gerard Sylvester).

A unifying framework?

How Should We Proceed?

• Apache Axis
  – Handles all SOAP processing, leaving only the content of the XML payload to worry about.
  – The content of the payload must be described somewhere for a transaction to succeed:
    • typically with XML-Schema;
    • alternatively with the Moby data-type (object) ontology.

XML-Schema

• Defines the allowable structure of XML documents.
• Is embedded in WSDL service descriptions.
• Can be used to map from XML to other representations.
  – Axis can automatically map between XML-Schema and Java classes if the classes follow the JavaBeans conventions or are simple types such as int, String or Integer:
    • a public constructor that takes no arguments;
    • private attributes manipulated through "get" and "set" methods;
    • for more information, see http://java.sun.com/products/javabeans/docs/spec.html
  – Otherwise, custom serialization/deserialization routines must be developed.
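Because the JavaBeans conventions are purely naming conventions, a framework can discover a bean's properties by reflection alone. The sketch below uses the JDK's java.beans.Introspector to list the read/write properties of a small, hypothetical SequenceBean; it illustrates the idea behind automatic bean mapping rather than reproducing Axis's own code.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

class BeanPropertyLister {

    // Hypothetical bean following the conventions above.
    public static class SequenceBean {
        private String sequence;
        private int length;

        public SequenceBean() { } // public no-argument constructor

        public String getSequence() { return sequence; }
        public void setSequence(String sequence) { this.sequence = sequence; }

        public int getLength() { return length; }
        public void setLength(int length) { this.length = length; }
    }

    // Names of properties that have both a getter and a setter, found only from method names.
    static List<String> readWriteProperties(Class<?> beanClass) {
        try {
            // Stop at Object.class so getClass() is not reported as a "class" property.
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
                    names.add(pd.getName());
                }
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalArgumentException("cannot introspect " + beanClass.getName(), e);
        }
    }

    public static void main(String[] args) {
        // Lists "length" and "sequence"; no mapping code was written for SequenceBean.
        System.out.println(readWriteProperties(SequenceBean.class));
    }
}
```

A class that breaks the conventions (for example, one with only a constructor taking arguments) cannot be instantiated and filled this way, which is why it would need a hand-written serializer.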

Evaluation of XML-Schema

• Pro
  – Many users and tools; allows an unlimited variety of data representations.
• Con
  – Without a shared vocabulary and data model, collaboration is challenging.
    • Semantically identical things may be represented in different ways by different groups, so multiple serialization routines must be written to accomplish the same task.
    • Self-describing documents are not sufficient to create a common language.
  – Not sufficient for service discovery by anonymous users; a registry such as UDDI or Moby Central must be employed.
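The duplicated-effort problem can be made concrete. Suppose two hypothetical groups publish the same DNA sequence as <DNASequence><residues>ATGG</residues></DNASequence> and as <dna_sequence seq="ATGG"/>. Both documents are valid and self-describing, yet a client needs a separate deserialization routine for each. The element and attribute names are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class TwoSchemasOneConcept {

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Routine written for group A's schema: the sequence is the text of a child element.
    static String readGroupA(String xml) {
        return parseRoot(xml).getElementsByTagName("residues").item(0).getTextContent().trim();
    }

    // Routine written for group B's schema: the same sequence is stored in an attribute.
    static String readGroupB(String xml) {
        return parseRoot(xml).getAttribute("seq").trim();
    }

    public static void main(String[] args) {
        String a = readGroupA("<DNASequence><residues>ATGG</residues></DNASequence>");
        String b = readGroupB("<dna_sequence seq=\"ATGG\"/>");
        System.out.println(a.equals(b)); // prints true: same data, two parsers
    }
}
```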

Moby Web Services

• Provides a shared protocol for building and accessing web services.
• This shared protocol is formally encoded in two principal, extensible ontologies ("ontology" = a set of related terms that describe some domain):
  – Data-type ontology: defines the data types used as input to and output from services (sequence, GenBank record, BLAST output, …).
  – Service ontology: defines the kinds of services that are available (data retrieval, analysis, …).

The Class Ontology

[Figure: part of the Moby class ontology. Classes shown: Object, String, Integer, VirtualSequence, GenericSequence, NucleotideSequence, DNASequence, AminoAcidSequence, text/plain, text/html, text/base64 and base64_gif, connected by ISA (subclass) and HAS-A (containment) relationships.]

XML serialization of Classes

<String namespace='' id=''/>
<Object namespace='NCBI_gi' id='83'/>
<Integer namespace='' id=''/>

<VirtualSequence namespace='NCBI_gi' id='163483'>
  <Integer namespace='' id='' articleName='Length'>975</Integer>
</VirtualSequence>

<GenericSequence namespace='NCBI_gi' id='163483'>
  <Integer namespace='' id='' articleName='Length'>975</Integer>
  <String namespace='' id='' articleName='Sequence'>ATGG...</String>
</GenericSequence>

Note: the classes contained through HASA and HAS relationships are named (the articleName attribute); at present, these names are not drawn from a controlled vocabulary and are thus only human-readable.
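A GenericSequence object like the one above can be produced with the JDK's DOM and Transformer APIs. This is a minimal sketch: the helper names are hypothetical, element and attribute names follow the slide, and the moby: namespace prefix and the surrounding Moby message envelope are omitted. The serializer may emit attributes in a different order than the slide shows, which does not change the XML's meaning.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class MobySerializer {

    // Every Moby object carries a namespace and an id.
    static Element mobyObject(Document doc, String type, String namespace, String id) {
        Element e = doc.createElement(type);
        e.setAttribute("namespace", namespace);
        e.setAttribute("id", id);
        return e;
    }

    // A HASA member: a primitive child object named by its articleName.
    static Element member(Document doc, String type, String articleName, String value) {
        Element e = mobyObject(doc, type, "", "");
        e.setAttribute("articleName", articleName);
        e.setTextContent(value);
        return e;
    }

    static String genericSequence(String namespace, String id, String residues) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element seq = mobyObject(doc, "GenericSequence", namespace, id);
            seq.appendChild(member(doc, "Integer", "Length", Integer.toString(residues.length())));
            seq.appendChild(member(doc, "String", "Sequence", residues));
            doc.appendChild(seq);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(genericSequence("NCBI_gi", "163483", "ATGG"));
    }
}
```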

Value of the Moby Approach

1. A common language that allows for interoperability
   – Avoids naming problems: DNASequence != dna_sequence.
   – Avoids reinventing the wheel: re-use other people's objects.
   – Potential for automatic pipeline generation.

2. Service discovery
   – Moby Central is better than UDDI for bioinformatics applications because:
     • it is specific to biology;
     • it can traverse the Moby ontologies. For example:
       – A_Service processes GenericSequences;
       – I have a DNASequence;
       – I can find and use A_Service because DNASequence ISA GenericSequence.
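The ISA reasoning in this example can be sketched as a walk up a child-to-parent map. This is a minimal sketch, assuming a hard-coded chain DNASequence → NucleotideSequence → GenericSequence → VirtualSequence → Object (plus AminoAcidSequence → GenericSequence), loosely following the class ontology figure; the real ontology is larger and is queried from Moby Central rather than hard-coded, and the service registry in the usage code is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OntologyDiscovery {
    // child -> parent ISA edges; an assumed fragment of the data-type ontology.
    static final Map<String, String> ISA = new HashMap<>();
    static {
        ISA.put("VirtualSequence", "Object");
        ISA.put("GenericSequence", "VirtualSequence");
        ISA.put("NucleotideSequence", "GenericSequence");
        ISA.put("AminoAcidSequence", "GenericSequence");
        ISA.put("DNASequence", "NucleotideSequence");
    }

    // True if 'type' equals 'ancestor' or reaches it by following ISA edges upward.
    static boolean isA(String type, String ancestor) {
        for (String t = type; t != null; t = ISA.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // Services (name -> declared input type) that can accept an object of type myType.
    static List<String> servicesAccepting(String myType, Map<String, String> registry) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, String> entry : registry.entrySet()) {
            if (isA(myType, entry.getValue())) {
                found.add(entry.getKey());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, String> registry = new HashMap<>();
        registry.put("A_Service", "GenericSequence");
        registry.put("B_Service", "AminoAcidSequence");
        System.out.println(servicesAccepting("DNASequence", registry)); // prints [A_Service]
    }
}
```

A plain keyword registry has no way to make this inference: a search for "DNASequence" would never match a service declared to take GenericSequence.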

Moby vs. XML-Schema

Measure | XML-Schema | Moby
Popularity: proliferation of tools and support | Clear advantage | –
Service discovery | – | Clear advantage in the biological domain
Ease of client development | Depends on the complexity of the Schema and the presence of client-side tools | Easy to access the XML strings; parsing support is lacking but on the way

Moby + XML-Schema

• Services can be deployed using both protocols.
• If the inputs and outputs of the exposed methods are primitives or JavaBeans, this process is greatly simplified for both approaches.

Suggestion:

1. Establish a Bean-based data model for the ICIS Java data classes.

2. Web services can then be deployed for any class whose methods take Beans in and return Beans out, using Axis and a deployment descriptor. No code needs to be written, and the XML-Schema is produced automatically.

3. Establish a data-binding framework for mapping from ICIS Beans to Moby XML. Castor is designed for this and could probably perform all serialization/deserialization automatically.

4. Use the data-binding methods in wrappers around the original ICIS methods.

Germplasm search: Input Class

package org.cgiar.icis.GMSexample;

public class QueryBean {
    /** A query string (may contain wildcard characters) */
    private String query;
    /** The first row to return */
    private int startRow;
    /** The last row to return */
    private int lastRow;
    /** The GMS class for limiting the search */
    private int gmsClass;

    // bean accessor methods
    public String getQuery() { return query; }
    public int getStartRow() { return startRow; }
    public int getLastRow() { return lastRow; }
    public int getGmsClass() { return gmsClass; }

    // bean setter methods
    public void setQuery(String s) { query = s; }
    public void setStartRow(int i) { startRow = i; }
    public void setLastRow(int i) { lastRow = i; }
    public void setGmsClass(int i) { gmsClass = i; }
}

Germplasm search: Output Class

package org.cgiar.icis.GMSexample;

public class GMSBean {
    // one field per column of the names table row returned by the search
    private int nid;
    private int gid;
    private int ntype;
    private int nstat;
    private int nuid;
    private String nval;
    private int nlocn;
    private String ndate;
    private int nref;

    // bean accessor methods: get…

    // bean setter methods: set…
}

Germplasm Search: Method call

public GMSBean[] getGermplasmList(QueryBean qb) throws java.rmi.RemoteException {
    // set input parameters
    String germplasmName = qb.getQuery();
    int germplasmType = qb.getGmsClass();
    int batchNum = qb.getStartRow();
    int batchSz = qb.getLastRow();
    // ---> Execute logic to ensure the batch number specified is positive.

    // Formulate the query.
    String sqlCondition = prepareSearchString(germplasmName);
    final String queryStmt = "SELECT g.gid, g.methn, g.glocn, n.*"
            + " FROM germplsm g, names n"
            + " WHERE g.gid = n.gid"
            + " AND g.grplce = 0"
            + (germplasmType > 0 ? " AND n.ntype = ?" : "")
            + " AND (" + sqlCondition + ") LIMIT ?, ?";

    // ---> Bind the input parameters (ntype only when germplasmType > 0, then batchNum and batchSz).
    // ---> Execute the query.
    // ---> For each row in the result, construct a GMSBean and append it to the output array.
    // Return the array of GMSBeans.
}
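The "bind input parameters" step is easy to get wrong because the n.ntype placeholder exists only when germplasmType > 0, which shifts the positions of the two LIMIT placeholders. Below is a minimal sketch of one way to keep the SQL text and its parameters in step, assuming the condition fragment from prepareSearchString is passed in already built and contains no placeholders of its own. The class and method names and the example condition are hypothetical; the query text is copied from the method above. Since MySQL-style LIMIT takes an offset and then a row count, whether lastRow must first be converted into a count is left to the caller here.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class GermplasmQueryBuilder {
    final String sql;
    final List<Object> params;

    private GermplasmQueryBuilder(String sql, List<Object> params) {
        this.sql = sql;
        this.params = Collections.unmodifiableList(params);
    }

    // Builds the SQL text and its parameter list together, so the optional
    // ntype placeholder can never leave the LIMIT values misaligned.
    static GermplasmQueryBuilder build(String sqlCondition, int germplasmType, int offset, int rowCount) {
        List<Object> params = new ArrayList<>();
        StringBuilder sql = new StringBuilder("SELECT g.gid, g.methn, g.glocn, n.*")
                .append(" FROM germplsm g, names n")
                .append(" WHERE g.gid = n.gid")
                .append(" AND g.grplce = 0");
        if (germplasmType > 0) {
            sql.append(" AND n.ntype = ?");
            params.add(germplasmType);
        }
        sql.append(" AND (").append(sqlCondition).append(") LIMIT ?, ?");
        params.add(offset);   // LIMIT offset, ...
        params.add(rowCount); // ... row count
        return new GermplasmQueryBuilder(sql.toString(), params);
    }

    // Binds the parameters in order; JDBC parameter indexes start at 1.
    void bind(PreparedStatement ps) throws SQLException {
        for (int i = 0; i < params.size(); i++) {
            ps.setObject(i + 1, params.get(i));
        }
    }

    public static void main(String[] args) {
        GermplasmQueryBuilder q = build("n.nval LIKE 'IR64%'", 1, 0, 20); // hypothetical condition
        System.out.println(q.sql);
        System.out.println(q.params); // prints [1, 0, 20]
    }
}
```

With a real connection, the caller would do connection.prepareStatement(q.sql), then q.bind(ps), then ps.executeQuery(), and map each row to a GMSBean.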

Germplasm Search: Deployment

<deployment xmlns="http://xml.apache.org/axis/wsdd/"
            xmlns:java="http://xml.apache.org/axis/wsdd/providers/java">
  <service name="MockGMSBeanEater" provider="java:RPC">
    <parameter name="className" value="org.cgiar.icis.GMSexample.MockGMSBeanEater"/>
    <parameter name="allowedMethods" value="getStandardName, getPreferredName, getGermplasmList"/>
    <beanMapping qname="myNS:QueryBean" xmlns:myNS="urn:BeanService"
                 languageSpecificType="java:org.cgiar.icis.GMSexample.QueryBean"/>
    <beanMapping qname="myNS:GMSBean" xmlns:myNS="urn:BeanService"
                 languageSpecificType="java:org.cgiar.icis.GMSexample.GMSBean"/>
  </service>
</deployment>

Germplasm Search: Deploy

• Execute the Axis AdminClient on the deployment descriptor. This:
  – posts (deploys) the web service;
  – makes available a WSDL document that describes it (at the service URL with ?wsdl appended), including the XML-Schema for mapping between XML and the JavaBeans.
• The WSDL file consists of:
  – a types section, inside the root "definitions" element, holding the XML-Schema for the service parameters;
  – a set of descriptions of the deployed services:
    • names of the classes;
    • location;
    • input and output.
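To see how a client can learn the operations and the endpoint from such a document, here is a minimal sketch that reads a WSDL 1.1 file with the JDK's DOM parser and lists each portType operation with its input and output messages, plus the soap:address location. The WSDL fragment in the usage code is hypothetical and far smaller than what Axis generates; WSDL2Java does this (and much more) for real clients.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class WsdlSummary {
    static final String WSDL_NS = "http://schemas.xmlsoap.org/wsdl/";
    static final String SOAP_BINDING_NS = "http://schemas.xmlsoap.org/wsdl/soap/";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // WSDL elements are matched by namespace
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not a well-formed WSDL document", e);
        }
    }

    // One entry per abstract operation: "name(inputMessage) -> outputMessage".
    static List<String> operations(String wsdlXml) {
        List<String> result = new ArrayList<>();
        NodeList portTypes = parse(wsdlXml).getElementsByTagNameNS(WSDL_NS, "portType");
        for (int p = 0; p < portTypes.getLength(); p++) {
            NodeList ops = ((Element) portTypes.item(p)).getElementsByTagNameNS(WSDL_NS, "operation");
            for (int i = 0; i < ops.getLength(); i++) {
                Element op = (Element) ops.item(i);
                result.add(op.getAttribute("name") + "(" + message(op, "input") + ") -> " + message(op, "output"));
            }
        }
        return result;
    }

    // The "message" attribute of the operation's wsdl:input or wsdl:output child, or "" if absent.
    static String message(Element op, String direction) {
        NodeList list = op.getElementsByTagNameNS(WSDL_NS, direction);
        return list.getLength() == 0 ? "" : ((Element) list.item(0)).getAttribute("message");
    }

    // Endpoint URLs declared by soap:address elements in the service section.
    static List<String> locations(String wsdlXml) {
        List<String> urls = new ArrayList<>();
        NodeList addresses = parse(wsdlXml).getElementsByTagNameNS(SOAP_BINDING_NS, "address");
        for (int i = 0; i < addresses.getLength(); i++) {
            urls.add(((Element) addresses.item(i)).getAttribute("location"));
        }
        return urls;
    }

    public static void main(String[] args) {
        String wsdl = "<wsdl:definitions xmlns:wsdl='" + WSDL_NS + "' xmlns:wsdlsoap='" + SOAP_BINDING_NS + "'>"
                + "<wsdl:portType name='MockGMSBeanEater'><wsdl:operation name='getGermplasmList'>"
                + "<wsdl:input message='impl:getGermplasmListRequest'/>"
                + "<wsdl:output message='impl:getGermplasmListResponse'/>"
                + "</wsdl:operation></wsdl:portType>"
                + "<wsdl:service name='MockGMSBeanEaterService'><wsdl:port name='MockGMSBeanEater'>"
                + "<wsdlsoap:address location='http://iris-genome:8081/TestWeb/services/MockGMSBeanEater'/>"
                + "</wsdl:port></wsdl:service></wsdl:definitions>";
        System.out.println(operations(wsdl));
        System.out.println(locations(wsdl));
    }
}
```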

Consuming GMS Search: XML-Schema Method

• Axis's WSDL2Java tool generates all needed client classes from the published WSDL document.
  – To execute the service:

      MockGMSBeanEaterService service = new MockGMSBeanEaterServiceLocator();
      java.net.URL u = new java.net.URL("http://iris-genome:8081/TestWeb/services/MockGMSBeanEater");
      MockGMSBeanEater m = service.getMockGMSBeanEater(u);
      QueryBean qb = new QueryBean();
      qb.setQuery("IR64");
      qb.setStartRow(0);
      …
      GMSBean[] gbs = m.getGermplasmList(qb);

Germplasm Search: Moby Input

<moby:IcisQuery namespace="" id="">
  <moby:String articleName="query">IR 64</moby:String>
  <moby:Integer articleName="startRow">0</moby:Integer>
  <moby:Integer articleName="endRow">0</moby:Integer>
  <moby:Integer articleName="GmsClass">0</moby:Integer>
</moby:IcisQuery>

Germplasm Search: Moby Output

<moby:GMS namespace="IRIS" id="707">
  <moby:String articleName="gmsId">12</moby:String>
  <moby:Integer articleName="NID">1210</moby:Integer>
  <moby:Integer articleName="GID">70</moby:Integer>
  <moby:Integer articleName="NTYPE">1</moby:Integer>
  …
</moby:GMS>

Germplasm Search: Moby Method

public String mobyGMS(String moby) {
    // MobyQueryBinder and MobyGmsBinder stand for the data-binding layer (hand-written or Castor-generated)
    QueryBean qb = MobyQueryBinder.demarshall(moby);
    GMSBean[] gbs = gms.getGermplasmList(qb);
    String mobyOut = MobyGmsBinder.marshall(gbs);
    return mobyOut;
}

• Can be accomplished with standard XML parsers and document builders.
• Castor (www.castor.org) conceals parsing and maps directly from XML to Bean-like Java classes.
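The "XML parsers" route can be sketched with the JDK's DOM parser. This is a minimal sketch, assuming the IcisQuery layout from the input slide; it reads each member's name from articleName, falling back to an attributeName attribute if that spelling is used. QueryFields is a hypothetical stand-in for QueryBean so the example compiles on its own; Castor would generate this kind of mapping instead of requiring it by hand.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class MobyQueryDemarshaller {

    // Hypothetical stand-in for QueryBean.
    static class QueryFields {
        String query;
        int startRow;
        int endRow;
        int gmsClass;
    }

    static QueryFields demarshall(String mobyXml) {
        Element root;
        try {
            // Not namespace-aware: "moby:String" is matched as a plain tag name,
            // so the fragment parses even without an xmlns:moby declaration.
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(mobyXml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed Moby XML", e);
        }
        QueryFields q = new QueryFields();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            Element member = (Element) n;
            String name = member.getAttribute("articleName");
            if (name.isEmpty()) {
                name = member.getAttribute("attributeName"); // alternative spelling
            }
            String value = member.getTextContent().trim();
            switch (name) {
                case "query":    q.query = value; break;
                case "startRow": q.startRow = Integer.parseInt(value); break;
                case "endRow":   q.endRow = Integer.parseInt(value); break;
                case "GmsClass": q.gmsClass = Integer.parseInt(value); break;
                default: break; // ignore members this sketch does not know
            }
        }
        return q;
    }

    public static void main(String[] args) {
        String xml = "<moby:IcisQuery namespace='' id=''>"
                + "<moby:String articleName='query'>IR 64</moby:String>"
                + "<moby:Integer articleName='startRow'>0</moby:Integer>"
                + "<moby:Integer articleName='endRow'>0</moby:Integer>"
                + "<moby:Integer articleName='GmsClass'>0</moby:Integer>"
                + "</moby:IcisQuery>";
        System.out.println(demarshall(xml).query); // prints IR 64
    }
}
```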

Consuming GMS Search: XML-Schema Method

• Service location and intent known.
• Axis, .NET or equivalent present.
• Access the XML-Schema-based method.
• Axis's WSDL2Java generates all needed client classes.
  – To execute the service:

      Service = ..

      Gms.execute

Consuming GMS Search:Moby method

• Service discovered using Moby Central.
• Accessed via the Moby Java API:
  – find, execute, bind/parse.

My Rotation Products

• Code templates that could be used to deploy ICIS web services using two popular paradigms.

• Code templates for consumption of XML-Schema based and Moby Ontology based web services.

• Technical design document describing the suggested approach in greater conceptual and technical detail.

• Many new friends and a truly unique experience.

Thank You BBU!

I'll remember Los Baños in the freezing Vancouver winter!