V School of Information Engineering
Master of Science in Information Engineering
ICT for Health Care and Life Sciences
Dipartimento di Elettronica e Informazione
Davide Chicco http://www.davidechicco.it/
Service-Oriented Architectures (SOA) and (Bio)Web Services
2nd part
Davide Chicco
Marco Masseroli
Workflows, a reference
What is a workflow?
• Definition: a workflow is a sequence of connected steps
• A depiction of a sequence of operations, declared as work of a person, a group of persons, or one or more simple or complex mechanisms.
• A workflow may be seen as any abstraction of real work.
• The flow being described may refer to a document or product that is being transferred from one step to another
from Wikipedia in English
Workflows, a reference (2)
What is a workflow?
• A workflow is a model to represent real work for further assessment, e.g., for describing a reliably repeatable sequence of operations.
• More abstractly, a workflow is a pattern of activity enabled by a systematic organization of resources, defined roles and mass, energy and information flows, into a work process that can be documented and learned.
• Workflows are designed to achieve processing intents of some sort, such as physical transformation, service provision, or information processing.
from Wikipedia in English
Workflows, a reference (3)
What makes a workflow?
• A workflow can usually be described using formal or informal flow diagramming techniques, showing directed flows between processing steps.
• Single processing steps or components of a workflow can basically be defined by three parameters:
1. input description: the information required to complete the step
2. transformation rules and algorithms, which may be carried out by associated human roles, by machines, or by a combination of both
3. output description: the information, material and energy produced by the step and provided as input to downstream steps.
from Wikipedia in English
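The three parameters above map naturally onto a small data structure. The following is a minimal, hypothetical Python sketch, not taken from any workflow system: the `WorkflowStep` class, its field names and the `count_bases` step are assumptions, and plain Python types stand in for the input and output descriptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class WorkflowStep:
    """One processing step, described by the three parameters listed above."""
    name: str
    inputs: Dict[str, type]                    # 1. input description: name -> type
    transform: Callable[..., Dict[str, Any]]   # 2. transformation rule / algorithm
    outputs: Dict[str, type]                   # 3. output description: name -> type

    def run(self, **data: Any) -> Dict[str, Any]:
        # Check that every declared input is present and has the declared type
        for key, expected in self.inputs.items():
            if not isinstance(data.get(key), expected):
                raise TypeError(f"{self.name}: input '{key}' must be {expected.__name__}")
        # Pass only the declared inputs to the transformation
        result = self.transform(**{key: data[key] for key in self.inputs})
        # Check that the step produced what its output description promises
        for key, expected in self.outputs.items():
            if not isinstance(result.get(key), expected):
                raise TypeError(f"{self.name}: output '{key}' must be {expected.__name__}")
        return result


# Hypothetical step: measure the length of a DNA sequence
count_bases = WorkflowStep(
    name="count_bases",
    inputs={"sequence": str},
    transform=lambda sequence: {"length": len(sequence)},
    outputs={"length": int},
)

print(count_bases.run(sequence="ACGT"))  # {'length': 4}
```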
Workflows, a reference (4)
What does a workflow need?
• Components can only be plugged together if the output of the preceding component (or set of components) matches the mandatory input requirements of the following component.
• Thus, the essential description of a component actually comprises only input and output that are described fully in terms of data types and their meaning (semantics).
• A description of the algorithms or rules need only be included when there are several alternative ways to transform one type of input into one type of output, possibly with different accuracy, speed, etc.
from Wikipedia in English
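The plugging rule can be checked mechanically when each component is described only by its input and output types. This is a hedged Python sketch under simplifying assumptions: components are plain tuples, and the type labels (`DNASequence`, `RNASequence`, `Integer`) are hypothetical strings rather than terms from a real ontology.

```python
from typing import Any, Callable, List, Tuple

# A component is described only by its interface: (name, input type, output type, function).
# The type labels are hypothetical; a real system would take them from an ontology.
Component = Tuple[str, str, str, Callable[[Any], Any]]


def check_chain(components: List[Component]) -> None:
    """Raise ValueError if an output type differs from the next component's input type."""
    for (name_a, _, out_a, _), (name_b, in_b, _, _) in zip(components, components[1:]):
        if out_a != in_b:
            raise ValueError(f"cannot plug {name_a} ({out_a}) into {name_b} ({in_b})")


def run_chain(components: List[Component], data: Any) -> Any:
    check_chain(components)       # validate the whole workflow before running anything
    for _, _, _, func in components:
        data = func(data)         # each output becomes the next component's input
    return data


pipeline: List[Component] = [
    ("to_upper", "DNASequence", "DNASequence", str.upper),
    ("transcribe", "DNASequence", "RNASequence", lambda s: s.replace("T", "U")),
    ("length", "RNASequence", "Integer", len),
]
print(run_chain(pipeline, "acgt"))  # 4
```

Validating the whole chain before executing any step means a mismatch is reported up front instead of halfway through a long analysis.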
Workflows, a reference (5)
What does a workflow need?
• When the components are non-local services that are invoked remotely via a computer network, such as Web Services, additional descriptors, such as quality of service (QoS) and availability, also must be considered.
from Wikipedia in English
Workflows, a reference (8)
Workflows and UML
• Unified Modeling Language (UML), not to be confused with UMLS (Unified Medical Language System)!
• Workflow graphic depiction: Activity Diagrams
• representations of workflows of stepwise activities and actions with support for choice, iteration and concurrency
• activity diagrams can be used to describe the business and operational step-by-step workflows of components in a system.
• An activity diagram shows the overall flow of control.
You may (and should!) have already studied workflows, for example during a Software Engineering course.
Workflows, a reference (10)
Software to create Workflows:
• Open Source:
• Dia http://live.gnome.org/Dia
• Calligra Flow http://www.calligra-suite.org/flow
• Proprietary:
• Microsoft Visio http://visio.microsoft.com
Workflows, a reference (11)
Easy exercise
• Write a simple workflow of the following process, on paper or on your PC:
• a division of two values ( X / Y )
• Steps: read X; read Y; check whether Y is different from zero; if it is, compute the division; otherwise, read Y again.
• The shapes to be used are: Start / End, Process (generic), Decision
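The exercise's flowchart translates almost one-to-one into code. In this hypothetical Python sketch, the two reading steps are passed in as functions (an assumption made so the flow can be exercised without a keyboard; `input()` could be used instead), and the Decision shape becomes a `while` loop.

```python
from typing import Callable


def division_workflow(read_x: Callable[[], float], read_y: Callable[[], float]) -> float:
    """Start -> read X -> read Y -> (Y != 0?) -> divide -> End, as in the exercise."""
    x = read_x()          # Process: reading of X
    y = read_y()          # Process: reading of Y
    while y == 0:         # Decision: is Y different from zero?
        y = read_y()      # "no" branch: repeat the reading of Y
    return x / y          # "yes" branch: compute the division, then End


# Scripted readers stand in for keyboard input (e.g. float(input("Y = ")))
y_values = iter([0, 0, 4])
print(division_workflow(lambda: 10, lambda: next(y_values)))  # 2.5
```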
Workflows through webservices
The original problem: many services/operations of the workflow are located in different resources
Webservices: state of the art
• As organizations expand and technology evolves, application integration becomes increasingly important.
• Component reuse and interoperability requirements have driven companies to move toward a Service-Oriented Architecture (SOA), where self-contained business logic can be exposed and shared efficiently across applications and platforms.
• At the heart of recent success in SOA designs are Web services, a technology that enables disjoint applications to communicate with each other in a platform-independent and language-independent manner.
from McPressOnline.com
Webservices: state of the art
• Specifically, a Web service is a self-contained piece of software available via standard network protocols (such as HTTP(S), FTP, and SMTP) and exposed through a standardized interface described in the Web Services Description Language (WSDL).
• The WSDL is a schema-defined XML document that includes all of the information an application requires to call, or consume, the Web service.
• Data is exchanged between the application and the Web service, using a standard XML messaging format called Simple Object Access Protocol (SOAP).
from McPressOnline.com
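To make the SOAP messaging format concrete, here is a hedged sketch that builds a SOAP 1.1 request envelope with Python's standard `xml.etree.ElementTree`. The operation `getSequence`, its `accession` parameter, the value `P12345` and the service namespace are all hypothetical; a real client would take them from the service's WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/bioservice"            # hypothetical service namespace

ET.register_namespace("soap", SOAP_ENV)  # serialize the envelope with a "soap:" prefix


def build_soap_request(operation: str, params: dict) -> bytes:
    """Wrap an operation call and its parameters in a SOAP 1.1 Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


request = build_soap_request("getSequence", {"accession": "P12345"})
print(request.decode("utf-8"))
```

In a SOAP 1.1 exchange, such an envelope is typically sent as the body of an HTTP POST with a `text/xml` content type and a `SOAPAction` header; that transport step is left out here.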
Webservices: state of the art
• Web services, like all technology, are rapidly evolving. Implementations range from basic Remote Procedure Calls (RPCs) to loosely coupled, document-style messaging.
• The WSDL and SOAP specifications are expanding to include support for secure, reliable, transactional Web services.
• Enhancements in these areas are helping to make Web services the preferred solution for application integration throughout the industry.
from McPressOnline.com
Webservices: advantages
1) Interoperability - This is the most important benefit of Web Services.
• Web Services typically work outside of private networks, offering developers a non-proprietary route to their solutions.
• Services developed are likely, therefore, to have a longer life-span, offering better return on investment of the developed service.
• Web Services also let developers use their preferred programming languages.
• In addition, thanks to the use of standards-based communications methods, Web Services are virtually platform-independent.
from Msdn Microsoft
Webservices: advantages
2) Usability
• Web Services allow the business logic of many different systems to be exposed over the Web.
• This gives your applications the freedom to choose the Web Services that they need.
• Instead of re-inventing the wheel for each client, you need only include additional application-specific business logic on the client-side.
• This allows you to develop services and/or client-side code using the languages and tools that you want.
from Msdn Microsoft
Webservices: advantages
3) Reusability
• Web Services provide not a component-based model of application development, but the closest thing possible to zero-coding deployment of such services.
• This makes it easy to reuse Web Service components as appropriate in other services.
• It also makes it easy to deploy legacy code as a Web Service.
from Msdn Microsoft
Webservices: advantages
4) Deployability
• Web Services are deployed over standard Internet technologies.
• This makes it possible to deploy Web Services even through the firewall to servers running on the Internet on the other side of the globe.
• Also, thanks to the use of proven community standards, underlying security (such as SSL) is already built in.
from Msdn Microsoft
Webservices: disadvantages
Unfortunately, webservices present disadvantages, too...
1) Simplicity is not always good
• Although the simplicity of Web services is an advantage in some respects, it can also be a hindrance.
• Web services use plain text protocols that use a fairly verbose method to identify data.
• This means that Web service requests are larger than requests encoded with a binary protocol.
• The extra size is really only an issue over low-speed connections, or over extremely busy connections.
from Msdn Microsoft
Webservices: disadvantages
2) Long-term sessions
• Although HTTP and HTTPS (the core Web protocols) are simple, they weren't really meant for long-term sessions.
• Typically, a browser makes an HTTP connection, requests a Web page and maybe some images, and then disconnects. In a typical CORBA or RMI environment, a client connects to the server and might stay connected for an extended period of time. The server may periodically send data back to the client.
• This kind of interaction is difficult with Web services, and you need to do a little extra work to make up for what HTTP doesn't do for you.
from Msdn Microsoft
Webservices: disadvantages
3) Client and server are not aware of each other
• The problem with HTTP and HTTPS when it comes to web services is that these protocols are stateless.
• The interaction between the server and client is typically brief and when there is no data being exchanged, the server and client have no knowledge of each other.
• For example, if a client makes a request to the server, receives some information, and then immediately crashes due to a power outage, the server never knows that the client is no longer active.
• The server needs a way to keep track of what a client is doing and also to determine when a client is no longer active.
from Msdn Microsoft
Webservices: disadvantages
4) Timeout
• Typically, a server sends some kind of session identification to the client when the client first accesses the server. The client then uses this identification when it makes further requests to the server.
• This enables the server to recall any information it has about the client. A server must usually rely on a timeout mechanism to determine that a client is no longer active.
• If a server doesn't receive a request from a client after a predetermined amount of time, it assumes that the client is inactive and removes any client information it was keeping. This extra overhead means more work for Web service developers.
from Msdn Microsoft
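The session-identifier and timeout mechanism described above can be sketched in a few lines. This is a hypothetical, in-memory Python illustration (class and method names are assumptions, not part of any Web service framework); the clock is injectable so that expiry can be demonstrated without actually waiting.

```python
import secrets
import time
from typing import Callable, Dict, Optional


class SessionStore:
    """Server-side sessions that expire after `timeout` seconds without requests."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock                      # injectable clock, handy for testing
        self._last_seen: Dict[str, float] = {}
        self._data: Dict[str, dict] = {}

    def open(self) -> str:
        """First access: create a session identifier to send back to the client."""
        session_id = secrets.token_hex(16)
        self._last_seen[session_id] = self.clock()
        self._data[session_id] = {}
        return session_id

    def touch(self, session_id: str) -> Optional[dict]:
        """Further request: recall the client's data, or None if the session expired."""
        self.expire()
        if session_id not in self._data:
            return None
        self._last_seen[session_id] = self.clock()
        return self._data[session_id]

    def expire(self) -> None:
        """Remove every client that has been silent for longer than the timeout."""
        now = self.clock()
        for session_id, seen in list(self._last_seen.items()):
            if now - seen > self.timeout:
                del self._last_seen[session_id]
                del self._data[session_id]


# Simulated clock so the timeout can be observed without waiting
now = [0.0]
store = SessionStore(timeout=30, clock=lambda: now[0])
sid = store.open()
now[0] = 10
print(store.touch(sid))   # {}    -> still active, last_seen reset to 10
now[0] = 50
print(store.touch(sid))   # None  -> 40 s of silence > 30 s timeout
```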
Webservices: some other benefits
Exposing the function on the network
• A Web service is a unit of managed code that can be remotely invoked using HTTP, that is, it can be activated using HTTP requests.
• So, web services allow you to expose the functionality of your existing code over the network.
• Once it is exposed on the network, other applications can use the functionality of your program.
from JavaBeat.net
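As an illustration of exposing existing code over the network, the hedged sketch below wraps a hypothetical local function, `gc_content`, in a minimal JSON-over-HTTP service using only Python's standard library, and then calls it as another application would. The `/gc` path and the JSON response shape are assumptions; this is a simple REST-style sketch, not a SOAP/WSDL service.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import urlopen


def gc_content(sequence: str) -> float:
    """Existing local code (hypothetical): fraction of G and C bases in a sequence."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence) if sequence else 0.0


class GCHandler(BaseHTTPRequestHandler):
    """Maps GET /gc?sequence=... onto a call of gc_content()."""

    def do_GET(self) -> None:
        url = urlparse(self.path)
        if url.path != "/gc":
            self.send_error(404)
            return
        sequence = parse_qs(url.query).get("sequence", [""])[0]
        payload = json.dumps({"gc_content": gc_content(sequence)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, fmt: str, *args) -> None:
        pass  # keep the example quiet; remove this override to see the request log


def start_server(port: int = 0) -> HTTPServer:
    """Serve in a background thread; port 0 lets the OS choose a free port."""
    server = HTTPServer(("127.0.0.1", port), GCHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_gc_service(base_url: str, sequence: str) -> float:
    """Client side: another application using the exposed functionality over HTTP."""
    with urlopen(f"{base_url}/gc?{urlencode({'sequence': sequence})}") as response:
        return json.loads(response.read())["gc_content"]


server = start_server()
host, port = server.server_address
print(call_gc_service(f"http://{host}:{port}", "ACGT"))  # 0.5
server.shutdown()
```

The client only needs the URL and the response format; it does not care which language `gc_content` is written in, which is the interoperability point made on these slides.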
Webservices: some other benefits
Connecting different applications
• Web Services allow different applications to talk to each other and share data and services among themselves.
• Other applications can also use the services of the web services. For example, a VB or .NET application can talk to Java web services and vice versa.
• So, Web services are used to make applications platform- and technology-independent.
from JavaBeat.net
Webservices: some other benefits
Standardized protocol
• Web Services use standardized, industry-standard protocols for communication.
• All four layers (Service Transport, XML Messaging, Service Description and Service Discovery layers) use well-defined protocols in the Web Services protocol stack.
• This standardization of the protocol stack gives businesses many advantages, such as a wide range of choices, lower costs due to competition, and higher quality.
from JavaBeat.net
Webservices: some other benefits
Low cost of communication
• Web Services use REST or SOAP over HTTP for communication, so you can use your existing low-cost Internet connection to implement Web Services.
• This solution is much less costly than proprietary solutions such as EDI/B2B.
from JavaBeat.net
Webservices: some other benefits
Support for other communication
• Besides SOAP over HTTP, Web Services can also be implemented on other reliable transport mechanisms.
• This gives the flexibility to use the communication means that best fits your requirements.
• For example, Web Services can also be implemented using the FTP protocol (Web services over FTP).
from JavaBeat.net
Webservices: some other benefits
Web services sharing
• Nowadays, due to the complexity of business, organizations use different technologies such as EAI, EDI, B2B, portals, etc. for distributed computing.
• Web Services support all these technologies, thus helping businesses to use existing investments in other technologies.
Web services are self describing
• Web Services are self-describing applications, which reduces software development time.
• This helps business partners to quickly develop applications and start doing business, saving time and money by cutting development time.
from JavaBeat.net
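Being self-describing means a client can read a service's WSDL to learn which operations it offers. The hedged Python sketch below lists the operations of a tiny, hypothetical WSDL 1.1 fragment; the service, operation and message names are invented, and the messages, bindings and service elements a complete WSDL would contain are omitted for brevity.

```python
import xml.etree.ElementTree as ET

WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}  # WSDL 1.1 namespace

# A tiny, hypothetical WSDL 1.1 fragment (messages, bindings and service omitted)
WSDL_DOC = """<?xml version="1.0"?>
<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
                  xmlns:tns="http://example.org/bioservice"
                  targetNamespace="http://example.org/bioservice">
  <wsdl:portType name="SequenceService">
    <wsdl:operation name="getSequence">
      <wsdl:input message="tns:getSequenceRequest"/>
      <wsdl:output message="tns:getSequenceResponse"/>
    </wsdl:operation>
    <wsdl:operation name="blastSearch">
      <wsdl:input message="tns:blastSearchRequest"/>
      <wsdl:output message="tns:blastSearchResponse"/>
    </wsdl:operation>
  </wsdl:portType>
</wsdl:definitions>"""


def list_operations(wsdl_text: str) -> dict:
    """Return {operation name: (input message, output message)} read from the WSDL."""
    root = ET.fromstring(wsdl_text)
    operations = {}
    for op in root.iterfind("wsdl:portType/wsdl:operation", WSDL_NS):
        inp = op.find("wsdl:input", WSDL_NS)
        out = op.find("wsdl:output", WSDL_NS)
        operations[op.get("name")] = (inp.get("message"), out.get("message"))
    return operations


for name, (request, response) in list_operations(WSDL_DOC).items():
    print(f"{name}: {request} -> {response}")
```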
Webservices: some other benefits
Automatic discovery
• The automatic discovery mechanism of Web Services helps businesses to easily find service providers. It also helps your customers to find your services easily.
• With the help of Web Services, your business can also increase revenue by making its own Web Services available to others.
Business opportunity
• Web Services have opened the door to new business opportunities by making it easy to connect with partners.
from JavaBeat.net
Galaxy
What is Galaxy?
• Galaxy is a scientific workflow and data integration platform that aims to make computational biology accessible to research scientists who do not have computer programming experience.
• Although it was initially developed for genomics research, it is largely domain-agnostic and is now used as a general bioinformatics workflow management system.
from Wikipedia in English
Galaxy
When and where
• Developed by a research group at Penn State University, Pennsylvania, USA
• First version in 2005
• A community of developers and users is involved in the project
Galaxy
“Galaxy: A platform for interactive large-scale genome analysis”
Belinda Giardine et al. (Genome Research, 2005)
[…] An interactive system that combines the power of existing genome annotation databases with a simple Web portal to enable users to search remote resources, combine data from independent queries, and visualize the results.
The heart of Galaxy is a flexible history system that stores the queries from each user; performs operations such as intersections, unions, and subtractions; and links to other computational tools. […]
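The history operations mentioned in the abstract (intersections, unions, subtractions of query results) can be illustrated with Python sets. This is a deliberately simplified, hypothetical sketch: Galaxy works on datasets such as genomic intervals, whereas here each query result is just a set of gene symbols, and the `QueryHistory` class is an invented name.

```python
import operator
from typing import Dict, Set

# Set operator behind each history operation named in the abstract
OPERATIONS = {"intersect": operator.and_, "union": operator.or_, "subtract": operator.sub}


class QueryHistory:
    """A per-user history: query results and derived results are stored by name."""

    def __init__(self) -> None:
        self.items: Dict[str, Set[str]] = {}

    def add_query(self, name: str, results: Set[str]) -> None:
        self.items[name] = set(results)

    def combine(self, op: str, first: str, second: str, new_name: str) -> Set[str]:
        """Apply intersect/union/subtract to two stored items and keep the result."""
        self.items[new_name] = OPERATIONS[op](self.items[first], self.items[second])
        return self.items[new_name]


history = QueryHistory()
history.add_query("chr7_genes", {"CFTR", "EGFR", "MET"})     # hypothetical query 1
history.add_query("cancer_genes", {"EGFR", "MET", "TP53"})   # hypothetical query 2
print(sorted(history.combine("intersect", "chr7_genes", "cancer_genes", "both")))
# ['EGFR', 'MET']
```

Because every derived result is stored back into the history under a name, it can itself be combined again, which is what makes chaining independent queries possible.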
Galaxy
Foundations in Galaxy
• Galaxy is “an open, web-based platform for performing accessible, reproducible, and transparent genomic science”
Accessibility
• Galaxy stresses a simple user interface over the ability to build complex workflows.
• This design choice makes it relatively easy to build typical analyses, but more difficult to build complex workflows that include, for example, looping constructs.
from Wikipedia in English
Galaxy
Reproducibility
• Reproducibility is a key goal of science: when scientific results are published, the publications should include enough information that others can repeat the experiment and get the same results.
• Galaxy supports reproducibility by capturing sufficient information about every step in a computational analysis, so that the analysis can be repeated, exactly, at any point in the future.
• This includes keeping track of all input, intermediate, and final datasets, as well as the parameters provided to, and the order of, each step of the analysis.
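The bookkeeping described above (inputs, intermediate and final datasets, parameters, step order) is what makes an exact replay possible. The following is a minimal, hypothetical Python sketch of that idea, not Galaxy's actual data model: the tool names, the `History` and `StepRecord` classes and the JSON export are all assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical tools; each receives the previous dataset plus its parameters.
TOOLS: Dict[str, Callable[..., Any]] = {
    "filter_min_length": lambda seqs, min_len: [s for s in seqs if len(s) >= min_len],
    "count": lambda seqs: len(seqs),
}


@dataclass
class StepRecord:
    order: int                # position of the step in the analysis
    tool: str                 # which tool was run
    params: Dict[str, Any]    # every parameter provided to the tool
    output: Any               # the intermediate or final dataset produced


@dataclass
class History:
    inputs: List[str]                                  # the input dataset
    steps: List[StepRecord] = field(default_factory=list)

    def run(self, tool: str, **params: Any) -> Any:
        """Run a tool on the latest dataset and record everything about the step."""
        data = self.steps[-1].output if self.steps else self.inputs
        result = TOOLS[tool](data, **params)
        self.steps.append(StepRecord(len(self.steps) + 1, tool, dict(params), result))
        return result

    def replay(self) -> "History":
        """Repeat the recorded analysis: same input, same tools, same parameters, same order."""
        again = History(list(self.inputs))
        for step in sorted(self.steps, key=lambda s: s.order):
            again.run(step.tool, **step.params)
        return again


history = History(["ACGT", "AC", "ACGTTT"])
history.run("filter_min_length", min_len=4)
history.run("count")
print(json.dumps(asdict(history)))   # a complete, shareable record of the analysis
print(history.replay() == history)   # True: the replay reproduces every dataset
```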
Galaxy
Transparency
• Galaxy supports transparency in scientific research by enabling researchers to share any of their Galaxy objects either publicly or with specific individuals.
• Shared items can be examined in detail, rerun at will, and copied and modified to test hypotheses.
MyExperiment
What is MyExperiment?
“The myExperiment Virtual Research Environment enables you and your colleagues to share digital items associated with your research — in particular it enables you to share and execute scientific workflows.
You can use MyExperiment.org to find publicly shared workflows […]”
definition on the MyExperiment.org website
http://www.myexperiment.org
MyExperiment
What is MyExperiment?
“MyExperiment is a social website for researchers sharing research objects such as scientific workflows.
Its website […] contains a significant collection of scientific workflows for a variety of workflow systems, most notably Taverna, but also other tools such as Bioclipse.
myExperiment has a REST API and is based on an open source Ruby on Rails codebase.
[…]”
definition on Wikipedia in English
MyExperiment
MyExperiment details
• Started in 2007
• By research groups at the University of Manchester and the University of Southampton, United Kingdom
• As of November 2011, it had:
• over 5,000 members
• over 250 groups
• over 2,000 workflows
• over 450 files
• over 150 packs
MyExperiment
MyExperiment videoclip: JISC - myExperiment
http://www.youtube.com/watch?v=x83pzMMw7lk
BioCatalogue
What is BioCatalogue?
“The BioCatalogue is a centralised registry of curated life science web services.
It allows you to easily discover, register, annotate, monitor and use web services”
definition on the BioCatalogue.org website
http://www.biocatalogue.org
“The BioCatalogue is a curated catalogue of Life Science Web Services”
definition from Wikipedia in English
BioCatalogue
“BioCatalogue: a universal catalogue of web services for the life sciences”
Jiten Bhagat et al. (Nucleic Acids Res. 2010)
“[...] The BioCatalogue provides a common interface for registering, browsing and annotating Web Services to the Life Science community.
Services in the BioCatalogue can be described and searched in multiple ways based upon their technical types, bioinformatics categories, user tags, service providers or data inputs and outputs.”
BioCatalogue
“Services in the BioCatalogue are also subject to constant monitoring, allowing the identification of service problems and changes and the filtering-out of unavailable or unreliable resources.
The system is accessible via a human-readable ‘Web 2.0’-style interface and a programmatic Web Service interface. The BioCatalogue follows a community approach in which all services can be registered, browsed and incrementally documented with annotations by any member of the scientific community. [...]”
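The combination of multi-criteria search and monitoring described in the quote can be sketched as a simple filter. This is a hypothetical Python illustration, not BioCatalogue's data model or its programmatic interface: the `ServiceEntry` fields, the example services and the 0.9 availability threshold are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ServiceEntry:
    name: str
    provider: str
    categories: Set[str]
    inputs: Set[str]
    outputs: Set[str]
    tags: Set[str] = field(default_factory=set)
    checks: List[bool] = field(default_factory=list)  # recent monitoring probes (True = up)

    def availability(self) -> float:
        """Fraction of recent monitoring probes that succeeded."""
        return sum(self.checks) / len(self.checks) if self.checks else 0.0


def search(catalogue: List[ServiceEntry], category: Optional[str] = None,
           input_type: Optional[str] = None, tag: Optional[str] = None,
           min_availability: float = 0.9) -> List[ServiceEntry]:
    """Filter by category, input and tag, then drop services monitoring flags as unreliable."""
    hits = []
    for service in catalogue:
        if category is not None and category not in service.categories:
            continue
        if input_type is not None and input_type not in service.inputs:
            continue
        if tag is not None and tag not in service.tags:
            continue
        if service.availability() < min_availability:
            continue  # filtered out as unavailable or unreliable
        hits.append(service)
    return hits


catalogue = [
    ServiceEntry("AlignA", "Lab A", {"Sequence alignment"}, {"FASTA"}, {"Alignment"},
                 {"blast"}, [True, True, True, True]),
    ServiceEntry("AlignB", "Lab B", {"Sequence alignment"}, {"FASTA"}, {"Alignment"},
                 set(), [True, False, False, True]),
    ServiceEntry("PathwaysC", "Lab C", {"Pathways"}, {"GeneID"}, {"Pathway"},
                 set(), [True, True, True, True]),
]
print([s.name for s in search(catalogue, category="Sequence alignment", input_type="FASTA")])
# ['AlignA']
```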
BioCatalogue
BioCatalogue, some details
• Started in 2009
• Collaboration between the myGrid project at the University of Manchester (UK) and the European Bioinformatics Institute
• Based on an open source Ruby on Rails codebase
• As of November 2011, it had:
• 2,261 services
• 155 service providers
• 620 members
BioMoby
What is BioMoby?
“MOBY is a system for interoperability between biological data hosts and analytical services.
The MOBY-S system defines an ontology-based messaging standard through which a client will be able to automatically discover and interact with task-appropriate biological data and analytical service providers, without requiring manual manipulation of data formats as data flows from one provider to the next.”
definition on the BioMoby.org website
BioMoby
What is BioMoby?
“BioMOBY is a registry of web services used in bioinformatics.
It allows interoperability between biological data hosts and analytical services by annotating services with terms taken from standard ontologies.”
definition from Wikipedia in English
BioMoby
“BioMOBY: An open source biological web services proposal”
Mark Wilkinson et al. (Briefings in Bioinformatics 2002)
“BioMOBY is an Open Source research project which aims to generate an architecture for the discovery and distribution of biological data through web services.
Data and services are decentralised, but the availability of these resources, and the instructions for interacting with them, are registered in a central location called MOBY Central.”
BioMoby
“BioMOBY adds to the web services paradigm, as exemplified by Universal Data Discovery and Integration (UDDI), by having an object-driven registry query system with object and service ontologies.
This allows users to traverse expansive and disparate data sets where each possible next step is presented based on the data object currently in-hand […]”
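The object-driven query idea, where the next possible steps depend on the data object in hand and an object ontology lets a service accept more specific types, can be sketched as follows. This is a simplified, hypothetical Python illustration, not the MOBY Central API: the ontology, the registry entries and the function names are assumptions.

```python
from typing import Dict, List, Optional

# Hypothetical object ontology: each data type points to its parent ("is-a").
OBJECT_ONTOLOGY: Dict[str, Optional[str]] = {
    "Object": None,
    "GenericSequence": "Object",
    "DNASequence": "GenericSequence",
    "AminoAcidSequence": "GenericSequence",
}

# Hypothetical registry: service name -> (input type, output type)
REGISTRY = {
    "getSequenceLength": ("GenericSequence", "Integer"),
    "translateDNA": ("DNASequence", "AminoAcidSequence"),
    "predictStructure": ("AminoAcidSequence", "Structure"),
}


def is_a(data_type: str, ancestor: str) -> bool:
    """True if data_type equals ancestor or inherits from it in the ontology."""
    current: Optional[str] = data_type
    while current is not None:
        if current == ancestor:
            return True
        current = OBJECT_ONTOLOGY.get(current)
    return False


def next_steps(data_type: str) -> List[str]:
    """Services that can consume the data object currently in hand."""
    return sorted(name for name, (in_type, _) in REGISTRY.items() if is_a(data_type, in_type))


print(next_steps("DNASequence"))        # ['getSequenceLength', 'translateDNA']
print(next_steps("AminoAcidSequence"))  # ['getSequenceLength', 'predictStructure']
```

Because `getSequenceLength` is registered for the generic parent type, it is offered for both DNA and protein objects, which is the benefit of querying through an ontology rather than by exact type name.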