from genotype to phenotype - future perspectives on data and services integration

FROM GENOTYPE TO PHENOTYPE FUTURE PERSPECTIVES ON DATA AND SERVICE INTEGRATION TÓPICOS AVANÇADOS EM ENGENHARIA INFORMÁTICA BIOINFORMÁTICA Programa Doutoral em Engenharia Informática 2008-2009 Pedro Lopes | [email protected]


DESCRIPTION

GEN2PHEN internal report created for a bioinformatics course. Provides a general overview of current Web2.0 applications and how they can be used in bioinformatics.


TABLE OF CONTENTS

Introduction – The GEN2PHEN Project
Integration Scenarios and Related Work
    Semantic Web
    Social Environments
    Integration
    Summary
Our Ongoing Developments
    DynamicFlow
    DiseaseCard
    Summary
Future Perspectives
    Cloud-computing
    Information Integration
    Data Visualization
    Summary
Conclusion
References


INTRODUCTION – THE GEN2PHEN PROJECT 

Bioinformatics is emerging as one of the fastest-growing areas of computer science. Recent hardware and software developments show an evolution faster than Moore’s Law predictions. This development began with the Human Genome Project¹, which succeeded in decoding the complete human genetic code. It generated a tremendous amount of readily available information, and the scientific community rapidly started designing applications, increasing the amount of resources needed in this area. Following the Human Genome Project came the Human Variome Project², which aims to collect information about genome variations and their influence on human health. Alongside the latter, the European Community is also sponsoring a bioinformatics project in its Seventh Framework Programme: Genotype to Phenotype Databases: a Holistic Solution (GEN2PHEN)³.

The GEN2PHEN Project is a collaborative project with 19 partners. Most of the partners are European institutions with relevant work in bioinformatics. GEN2PHEN is an ambitious project that aims to unify human and model organism genetic variation databases, allowing the creation of a central genome browser able to blend GEN2PHEN data and medical data. The overall goal is to create a complete biomedical knowledge environment. The strategy and objectives of this project may be divided into several research areas:

• Analyze the genotype-to-phenotype field and investigate current needs and practices in order to obtain complete knowledge of other ongoing projects with similar objectives. The active biology community must be consulted in order to develop an accurate state-of-the-art document describing the general process in the field, enabling a correct definition of what this particular area is lacking and which models and technologies are being effectively used.

1 Human Genome Project: http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml
2 Human Variome Project: http://www.humanvariomeproject.org
3 GEN2PHEN: http://www.gen2phen.org


• Develop standards for the genotype-to-phenotype field of research in order to speed up the standardization process with new data models, nomenclature and technology standards.

• Create generic database components, services and integration infrastructures for the genotype-to-phenotype domain. These solutions will mostly be web applications that apply new interface usability standards and are customized to their end users. Solutions for genetic and genomic databases will be developed. This particular objective aims to create a central GEN2PHEN database spanning all the research areas, together with a simpler application that can be deployed by any research group.

• Create data search and presentation solutions for genotype-to-phenotype knowledge. The applications designed to fulfil the previous objective won’t be complete without proper search mechanisms that encompass information distributed across different applications and architecture layers. The applications must also have an effective interface layer designed to meet the community’s requests.

• Facilitate the population of research and diagnostic genotype-to-phenotype databases by developing new tools and promoting them in the scientific community. The newly developed applications will also support more efficient methods for data insertion, allowing anyone to collaborate in this project.

• Build a major genotype-to-phenotype Internet portal, a GEN2PHEN knowledge centre. This portal will contain all GEN2PHEN-related information, ranging from calendars to databases and from publications to discussion forums.

• Deploy the developed solutions to the community in order to increase researchers’ interest and participation. Several resources will be devoted to advertising, explaining and training researchers in the use of the developed solutions.

The project’s main focus is on developing and promoting a new generation of applications that will aid different types of researchers in their scientific work and, at the same time, gather and integrate information from different sources, which will be shared with the community. GEN2PHEN applications have to be state-of-the-art web applications. It is important to study the most popular Web2.0 (and upcoming Web3.0) applications in order to improve developers’ knowledge of what captivates users, increasing the interest of the general biomedical community. This research should focus mainly on user interaction issues such as usability, interfaces, “quality of service” and overall user satisfaction. This new wave of applications has to address issues such as semantic data integration, user collaboration, information sharing and improvements to search engine algorithms.

Fig. 1 – GEN2PHEN strategy

Developing a simple Rich Internet Application is, by now, a somewhat trivial process that does not require great software engineering and programming knowledge. However, bioinformatics and biomedicine don’t depend only on good-looking interfaces. What matters, and this is the difficult part, is what’s under the hood. Going deeper into the application’s composition, several issues arise, such as data integration, service integration, service orchestration, workflow composition, distributed processing, query expansion and object ontologies.

This report intends to give an overview of the GEN2PHEN project, with special emphasis on the problems of these next-generation web applications. It describes some solutions under active development elsewhere, as well as systems being developed in our workgroup, and how both can help in assessing GEN2PHEN application design.


INTEGRATION SCENARIOS AND RELATED WORK 

First of all, it is necessary to understand to whom these new application paradigms will be important and why these generic GEN2PHEN goals are so significant. The biological and biomedical scientific community is witnessing an exponential increase in the information available. This growth leads, in turn, to a growing number of applications (web or desktop) that solve the same specific problems. Along with these new applications come new data sources and new services, and the heterogeneity among them is huge. The main issue one may find when doing scientific research is where to find information. A few years ago this was a problem because of the lack of applications and databases. Now it is a problem because of the excessive amount of information available on every corner of the web.

Fig. 2 – Web2.0 integration

From the users’ perspective, we believe they are looking for a central, unifying portal, customized to their personal status, where they can easily find all the information they need. This is the added value GEN2PHEN solutions may have. Currently, there are numerous ongoing works focusing on this problem. However, there isn’t a universal solution for all the heterogeneity problems raised by data and service integration. And the problems don’t boil down to this; there are also the novel functionalities made possible by the semantic web [1] and the great developments made in information mining. Following the work of Goble and Stevens [2], one can conclude that not all is well in the kingdom of data integration in bioinformatics and that data integration has a long way to go in order to completely satisfy its initial goals.

The applications that should be studied may be divided into three main areas that are closely connected and that enable integration. There are developments in the semantic web and its application to biology, including how the bridge between generic ontologies and biological ones can be built. Other groups are working on collaboration tools for the community, which offer better information sharing and productivity features. The largest group is the integration one, which encompasses data integration, service integration, service orchestration, workflow composition and mashup applications.

SEMANTIC WEB

Semantic web developments have the main purpose of describing, with a predefined ontology, all the information that exists on the web. The key components of the semantic web are RDF⁴, OWL⁵ and SPARQL⁶. RDF stands for Resource Description Framework and is a generic metadata model for describing online information and content. OWL is the Web Ontology Language, the ontology-authoring language usually associated with RDF Schema. SPARQL is a recursive acronym for SPARQL Protocol and RDF Query Language; it is a query language, with a syntax resembling SQL, for retrieving information stored in the RDF format. Implementing semantic web architectures is not a trivial task [3] for any kind of data. However, it is important to introduce these metadata structures and algorithms in bioinformatics, as they will become part of Web3.0.
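To make the RDF and SPARQL description more concrete, the following minimal sketch stores statements as subject-predicate-object triples and answers a single triple-pattern query, which is the basic building block of a SPARQL query. It deliberately uses plain Python instead of a real RDF library, and the `ex:` identifiers and the `match` helper are hypothetical names chosen for illustration only.

```python
from typing import Iterable, Iterator, Optional

Triple = tuple[str, str, str]

# Hypothetical statements; the "ex:" names are illustrative, not a real vocabulary.
TRIPLES: list[Triple] = [
    ("ex:BRCA1", "rdf:type", "ex:Gene"),
    ("ex:BRCA1", "ex:associatedWith", "ex:BreastCancer"),
    ("ex:TP53", "rdf:type", "ex:Gene"),
    ("ex:TP53", "ex:associatedWith", "ex:LiFraumeniSyndrome"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None plays the role of a variable."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# Roughly equivalent to: SELECT ?gene WHERE { ?gene rdf:type ex:Gene }
genes = [subject for subject, _, _ in match(TRIPLES, p="rdf:type", o="ex:Gene")]
print(genes)
```

A real deployment would rely on an RDF store and a SPARQL engine; the sketch only shows why a uniform triple model makes data from different sources queryable in the same way.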

Applying semantic web concepts and technologies in bioinformatics, one can access, in a unified manner, several biological documents described with RDF. Applying these concepts also enables process automation and improved machine-to-machine data exchange. Belleau et al. propose Bio2RDF [4], a preliminary approach to creating an engine that provides RDF access to biological data distributed across several databases such as KEGG or NCBI. Bio2RDF⁷ makes all the data available on its website, using only the URL to locate the resources. Splendiani [5] also has a proposal to bring the semantic web to biology, but the implementation isn’t as advanced as Bio2RDF. These are the most recent implementations, but biology and medicine are very difficult scientific areas because of the complexity of defining a proper ontology that covers all the life sciences concepts and terms.

4 Resource Description Framework: http://www.w3.org/RDF
5 Web Ontology Language: http://www.w3.org/2004/OWL
6 SPARQL Query Language for RDF: http://www.w3.org/TR/rdf-sparql-query

SOCIAL ENVIRONMENTS

Social networks and collaboration environments are some of the most popular Web2.0 applications. These applications connect users and allow them to share personal information, music, videos or any other type of data. Additionally, many small applications are developed to integrate information about different users or entertainment areas. For instance, a movies application would allow every user to describe their personal movie tastes; when used in a large-scale environment, it would give the developers important information about cinema, which could be used to improve the advertisements shown to each user: a user who likes horror movies would have a greater probability of seeing horror movie ads than one who likes comedies. Facebook⁸ is one of the most widely used social web applications worldwide, with over 120 million users. Using personal connections, personal preferences and other specific applications, Facebook’s owners have valuable market information. Like Facebook, MySpace⁹ and Google’s Orkut¹⁰ provide almost the same functionalities to users. Experiencing sustained growth is myExperiment¹¹, by Carole Anne Goble et al. [6], which is the first bioinformatics social network application where one can connect with others, share files (with a focus on Taverna workflows, described later in this report) and create scientific communities. Despite the focus on Taverna, myExperiment provides a rich scientific ecosystem, offering the community a wide range of tools essential in any social collaborative environment. myExperiment also offers access to its services through RESTful programming interfaces; thus, it is possible to build new applications on the framework or to use myExperiment data and tools to improve existing ones.

7 Bio2RDF: http://www.bio2rdf.org
8 Facebook: http://www.facebook.com
9 MySpace: http://www.myspace.com
10 Orkut: http://www.orkut.com
11 myExperiment: http://www.myexperiment.org

INTEGRATION

Integration is one of the areas of bioinformatics that interests the most groups and has the most ongoing work. It is a research area that includes the semantic web and social networking tools mentioned above, as well as other fields such as mashups and workflows. A workflow is a simple sequence of logical steps or activities that are executed independently of each other [7]. Applying this generic concept to bioinformatics, one may assume that a workflow is an organized information flow connecting distinct services and/or data sources in order to solve a problem in a modular manner. The most widely used solution for workflow building and execution is Taverna [8, 9]. Taverna is a Java-based desktop application offering a simple interface for workflow composition and execution. It can access several types of services, such as BioMoby [10] or generic WSDL web services. The major setback is that, to integrate services, one must define an XML integration component to assist in piping information from the output of service A to the input of service B. Taverna can also be used from within other applications, allowing access to the results of previously saved workflows or the execution of workflows in real time. One of myExperiment’s functionalities is workflow sharing: one may access a large workflow repository and find solutions developed by others, or share one’s own workflows and important development information. Currently, Taverna’s greatest flaw is being desktop-based, as we are witnessing a shift in the computational paradigm, with web application usage overtaking desktop usage.
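The workflow idea described above, including the glue component that pipes the output of service A into the input of service B, can be sketched as follows. This is only an illustration under stated assumptions: the three functions stand in for remote services, all names and data are hypothetical, and nothing here reflects how Taverna itself represents workflows.

```python
from typing import Any, Callable

Step = Callable[[Any], Any]


def find_gene_ids(disease: str) -> dict:
    """Stand-in for service A: returns a record listing gene identifiers."""
    catalogue = {"demo disease": ["GENE1", "GENE2"]}  # hypothetical data
    return {"query": disease, "genes": catalogue.get(disease, [])}


def adapt_a_to_b(record: dict) -> list[str]:
    """Glue step: service B only accepts the bare list of identifiers."""
    return record["genes"]


def summarise_genes(gene_ids: list[str]) -> str:
    """Stand-in for service B: summarises the identifiers it receives."""
    return f"{len(gene_ids)} gene(s): {', '.join(gene_ids)}"


def run_workflow(steps: list[Step], initial_input: Any) -> Any:
    """Execute the steps in order, piping each output into the next step."""
    data = initial_input
    for step in steps:
        data = step(data)
    return data


result = run_workflow([find_gene_ids, adapt_a_to_b, summarise_genes], "demo disease")
print(result)  # 2 gene(s): GENE1, GENE2
```

The adapter is the part that must be written by hand for every pair of services, which is precisely the cost that semantic service descriptions aim to reduce.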

Alongside workflows there are mashups. Mashups began in the music industry: they were simple mixes of several songs into a single song. With Web2.0, this idea crossed over to web applications. Mashups are web applications that combine information from a predefined collection of data sources or services in a single interface. We can consider a mashup a meta-application: it creates a new application by using functionalities provided by other applications. Several workflow and mashup building frameworks are available online. It is important to mention Yahoo! Pipes¹² and Microsoft Popfly¹³ because they have remarkable interfaces and pre-built components for accessing the most popular websites on the World Wide Web. Bioinformaticians can use these tools with data from different data sources to develop new applications. Cheung et al. [11] pursued this approach to create a biomedical mashup application. However, these tools weren’t specifically designed for the life sciences. Therefore, several researchers are working on service integration frameworks: de Knikker et al. [12] present a basic web service choreography scenario; Bio-jETI, from Margaria et al. [13], is a similar solution that uses the same principles as de Knikker’s. These tools share a common integration problem: the heterogeneity of the information sources doesn’t allow a fully automated integration solution. Each service stores and offers data in its own model, increasing the difficulty of concept mapping and information exchange. There isn’t yet an automated tool that offers a simple integration interface allowing the use of components from any arbitrary service. BioMoby [10] is an initiative to create an ontology and a central repository of bioinformatics resources. With this semantic framework, one can share or use online services created by others in an almost automated fashion [14]. The BioMoby¹⁴ central repository faces typical resource discovery problems such as validation and duplication. Anyone can add services, and the description provided or the service functionality may not be scientifically valid and may mislead users. Duplication of services is also a problem: there can be any number of services doing the same task, so it is difficult to choose which one best fits the desired requirements.

12 Yahoo! Pipes: http://pipes.yahoo.com/pipes
13 Microsoft Popfly: http://www.popfly.com
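The heterogeneity problem can be illustrated with a small mashup sketch. Two hypothetical sources describe the same gene using different field names, and a hand-written concept mapping translates each record into a shared vocabulary before the records are merged. The source contents, field names and mappings are all assumptions made for the example.

```python
# Two hypothetical sources describing the same gene with different field names.
SOURCE_A = [{"symbol": "GENE1", "chrom": "7"}]
SOURCE_B = [{"gene_name": "GENE1", "disease": "demo disease"}]

# Hand-written concept mappings: source field -> shared field name.
MAPPING_A = {"symbol": "gene", "chrom": "chromosome"}
MAPPING_B = {"gene_name": "gene", "disease": "disease"}


def normalise(record: dict, mapping: dict) -> dict:
    """Rename the fields of one record to the shared vocabulary."""
    return {mapping[name]: value for name, value in record.items() if name in mapping}


def mashup(*sources: tuple[list[dict], dict]) -> dict[str, dict]:
    """Merge the normalised records of every source, keyed by gene."""
    merged: dict[str, dict] = {}
    for records, mapping in sources:
        for record in records:
            item = normalise(record, mapping)
            merged.setdefault(item["gene"], {}).update(item)
    return merged


combined = mashup((SOURCE_A, MAPPING_A), (SOURCE_B, MAPPING_B))
print(combined)
```

Every new source needs its own mapping, which is why integration of this kind cannot yet be fully automated without a shared ontology.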

Fig. 3 – Existing development categories

SUMMARY

Fully automated and dynamic integration is the panacea that developers haven’t yet reached. Workflow and mashup solutions are the most popular ways to integrate services and data sources. However, both imply hard-coding several functionalities, which increases the dependency on developers to add new ones. Applying a semantic web approach to bioinformatics will empower developers to create more independent applications. Describing services and information semantically will allow automated communication between heterogeneous applications. This will enhance existing workflow and mashup applications: it will be easier for users to add new services to existing applications, becoming developers of new meta-applications adjusted to their needs.

14 BioMoby: http://www.biomoby.org


OUR ONGOING DEVELOPMENTS 

Our bioinformatics group is, like others, developing software solutions to problems in this specific area. Our earlier work did not focus on integration or the semantic web; it was mostly focused on aiding microarray laboratory research. ANACONDA [15] is a tool for studying gene primary structure. The Microarray Information Database (MIND) [16] is a web application that helps researchers analyze microarray experiment results. More abstract than MIND is GeneBrowser [17], a tool for gene expression studies based on the gene lists that result from microarray experiments.

However, web trends and the association with projects like GEN2PHEN or ALERT¹⁵ brought the need to expand our group’s application range. DynamicFlow [18, 19] is a web-based workflow management application providing Web2.0 semi-autonomous service integration. DiseaseCard [20] is an older application; however, it already implements basic collaboration and integration functionalities that later became famous with Web2.0. Further developments are being studied to implement semantic web engines, mashup applications and novel information visualization techniques.

DYNAMICFLOW

DynamicFlow is a framework for the dynamic integration of heterogeneous information sources. The main goal in developing this framework was to create a novel and agile interface for service integration. The application should have a usable, easy and intuitive interface for solving problems using a “divide and conquer” strategy: the main problem is divided into smaller tasks, each of which can be solved with a particular web service; the tasks are then combined, using the workflow metaphor, creating an information flow from task to task until the final solution is reached. This modular approach could be useful for researchers because it is closer to the plan they follow when solving problems in the wet lab: structuring the problem and then solving it iteratively, using simple tasks in a web application running in their browser.

15 ALERT Project: http://www.alert-project.org


Fig. 4 – DynamicFlow framework model

One of DynamicFlow’s key elements is its innovative model. The three-layered model (Fig. 4) divides the application into: access, the bottom layer, containing the databases and the external services; design, the top layer, where user interactions such as workflow building occur, using AJAX technology and drag-and-drop metaphors; and core, the processing layer, which encompasses server-side processing on the application’s web server and client-side processing in the client’s browser. This division of the processing layer into two separate components is one of the framework’s main features. The web server processes client requests and connects to the authentication server and the framework’s DBMS, but service-application communication and data piping between tasks are processed on the client side, reducing server load and improving the application’s efficiency and response time. This semi-autonomous process of maintaining a valid information flow from one service to the next is possible thanks to a previously defined service definition standard. The standard follows a simple ontology and provides an easy way to edit the available services. Using it, the application can validate workflow consistency, execute the workflow and display intermediate results, all using the browser’s resources. It is a primitive form of semantics in an information integration application.
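A consistency check of the kind mentioned above can be sketched as follows, assuming that a service definition declares at least one input type and one output type. The `ServiceDefinition` class, the type labels and the service names are hypothetical and are not taken from the DynamicFlow standard itself; the sketch only shows how typed descriptions let an application detect a broken link between two consecutive tasks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDefinition:
    """Hypothetical service description: a name plus input and output types."""
    name: str
    input_type: str
    output_type: str


def validate_workflow(tasks: list[ServiceDefinition]) -> list[str]:
    """Return one message per broken link between consecutive tasks."""
    problems = []
    for producer, consumer in zip(tasks, tasks[1:]):
        if producer.output_type != consumer.input_type:
            problems.append(
                f"{producer.name} outputs {producer.output_type}, "
                f"but {consumer.name} expects {consumer.input_type}"
            )
    return problems


gene_search = ServiceDefinition("gene_search", "disease_name", "gene_id")
sequence_fetch = ServiceDefinition("sequence_fetch", "gene_id", "dna_sequence")
alignment = ServiceDefinition("alignment", "protein_sequence", "alignment_report")

print(validate_workflow([gene_search, sequence_fetch]))  # []
print(validate_workflow([gene_search, sequence_fetch, alignment]))
```

An empty list means every task can consume what the previous one produces; a richer ontology could relax the strict equality test, for example by accepting subtypes.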

The work conducted resulted in a web application prototype that is available for testing and open to new developments. These new developments will cover five main topics: perfecting the service definition standard, including semantic web technologies (RDF), improving the interface, adding new user interactions and widening the range of services.


DISEASECARD

The DiseaseCard¹⁶ project began in 2003 with the objective of creating a rare disease link aggregator, integrating information from distributed and heterogeneous medical and genomic databases. The links were gathered by a web crawling engine and grouped into nodes representing concepts (Fig. 5). For instance, for the Peters anomaly¹⁷ disease, the node References contains all the reference sections of the NCBI OMIM¹⁸ database that refer to this disease, and the node Pathology contains Orphanet¹⁹ information about it. Along with the external information, each disease also has a forum entry, where any registered user can share their personal experience. A tree, similar to the one in Windows Explorer, shows all the nodes and their collections of links, displaying, in a unified interface, information from the genotype to the phenotype. As we want to gather as much information as possible, rare diseases are the main target because of their strong association between genotype and phenotype. It is important to mention that no database information is replicated: DiseaseCard only stores links to the shared data. Modern concepts like integration (heterogeneous link gathering) and collaboration (public disease forums) were already considered when developing the system.
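The link-aggregation structure described above can be sketched as a small tree in which every concept node holds only links, never copies of the remote data. The class name, the node names beyond those mentioned in the text and the `example.org` URLs are placeholders for illustration, not the actual DiseaseCard data model.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptNode:
    """A concept in the card tree; it stores links only, never copied data."""
    name: str
    links: list[str] = field(default_factory=list)
    children: list["ConceptNode"] = field(default_factory=list)


def render(node: ConceptNode, depth: int = 0) -> list[str]:
    """Return an indented, explorer-like listing of the tree."""
    lines = [f"{'  ' * depth}{node.name} [{len(node.links)}]"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Placeholder URLs; a real card would point at the external databases.
card = ConceptNode(
    "Demo disease",
    children=[
        ConceptNode("References", links=["https://example.org/omim/entry#references"]),
        ConceptNode("Pathology", links=["https://example.org/orphanet/entry"]),
        ConceptNode("Forum"),
    ],
)
print("\n".join(render(card)))
```

Because only links are stored, the card stays small and never goes stale in content, although the links themselves must be re-checked whenever the remote sites change.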

16 DiseaseCard: http://www.diseasecard.org
17 Peters Anomaly disease card: http://diseasecard.org/evaluateCard.do?diseaseid=604229
18 OMIM Home: http://www.ncbi.nlm.nih.gov/omim
19 Orphanet: http://www.orpha.net/consor/cgi-bin/index.php


Fig. 5 – DiseaseCard concept map

As the application got older, it lost quality: the web crawling engine didn’t automatically adapt to link changes and so, for several concepts, the resulting nodes were empty. In a preliminary analysis of the GEN2PHEN goals and how they could be achieved, we concluded that DiseaseCard was the most adequate solution and should be put under development again. After a careful analysis and the definition of an action plan, its operability was restored, the crawler was corrected, the interface got a new look and DiseaseCard is back on track.

As far as GEN2PHEN is concerned, DiseaseCard will be a simple way to achieve some of the initially proposed goals. In the future, adding GEN2PHEN-related databases and web portals is a priority in order to complete the application. The inclusion of semantics in DiseaseCard and in the portals it crawls will ease the crawling process and improve the precision of the results obtained. Information mining features are also being researched: even if it only stores links, DiseaseCard holds valuable information in those links, which can be useful in new types of queries.

SUMMARY

Both DynamicFlow and DiseaseCard are ongoing projects that will be developed within the GEN2PHEN perspective. The next section details new functionalities, interfaces and user interactions that can be implemented in either of these applications in order to improve their quality.


FUTURE PERSPECTIVES 

Web2.0 changed the Internet forever. Developers no longer care only about what the application does but also about what the users want it to do. Users are now the most important part of the Internet. They produce content, they have their own web footprint, and they are part of a new online community. If Web2.0 is the social web, Web3.0 may be the intelligent web. Despite sounding like science fiction, Web3.0 is nearer than one may think. Different platforms can communicate with each other automatically; “cloud-computing” is taking over the web; the web is getting intelligent with new semantics; distributed applications are being integrated. These facts, which were mere dreams a few years ago, are empowering the Internet with new solutions and establishing it as the platform for everything: productivity, entertainment, research, leisure…

CLOUD‐COMPUTING

New computing paradigms are changing the Internet at the architecture level. GRID [21] architectures are the new solution for distributed computing. Virtualization improvements [22] make virtual machines almost as powerful as real ones. “Cloud-computing” [23] uses the best of both to offer an online development environment. Microsoft with the Azure Services Platform²⁰, Amazon with the Elastic Compute Cloud²¹ and Google with its App Engine²² offer access to virtualized environments where anyone can deploy applications that use distributed resources to provide scalability, flexibility and availability.

Following the same trend, new web applications and web application suites are replacing traditional desktop apps. For instance, Microsoft’s Live²³ suite offers almost all the Office suite tools online, and Google²⁴ also offers the essential productivity tools online, in the “cloud”.

20 Azure Services Platform: http://www.microsoft.com/azure/default.mspx
21 Amazon Elastic Compute Cloud: http://aws.amazon.com/ec2
22 Google App Engine: http://code.google.com/appengine
23 Microsoft Live: http://www.live.com
24 Google Apps: http://www.google.com/apps


INFORMATION INTEGRATION

Among information integration tools, one can explore mashups and web desktops. Popular mashup applications are personal, customizable web portals made of gadgets that access almost any web application. Netvibes²⁵ is arguably the most complete personal portal on the Web. However, the most famous is Google’s iGoogle²⁶. Both offer, in a simple interface, the ability to customize a page with any gadgets one wants. Available gadgets include e-mail access, calendars, to-do lists, news readers and almost any other tool worth including in a single portal.

Fig. 6 – iGoogle gadget interface stub

Web desktops are web applications that simulate the traditional desktop environment: there is wallpaper, icons for accessing applications, a trash bin, a taskbar and application menus. eyeOS²⁷ is a cloud computing operating system allowing any user to work online with a vast set of applications. It is also an open source development platform: users can create their own applications and install them on their web desktop.

25 Netvibes: http://www.netvibes.com
26 iGoogle: http://www.google.com/ig
27 eyeOS: http://eyeos.org


DATA VISUALIZATION

Another interesting area is data visualization. Traditionally, search results are listed with a simple description. However, new search engines like Viewzi²⁸ or Searchme²⁹ present results in different, much more visually appealing interfaces. Screenshots are taken of the pages and shown in grids or lists. Results are ordered by date to form a chronological sequence. Information is gathered from distinct search engines in order to rank the results better. Context relations are established among results to create a visual relational tree. These distinct visualizations of the same results are important, as they can offer distinctive insights into the same data. Aiming at improved user interaction and greater user satisfaction, these tools rely on AJAX, Flash or Silverlight to create captivating and usable interfaces.
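Combining the result lists of several search engines into one ranking can be done in many ways. The sketch below uses reciprocal rank fusion, a generic rank aggregation technique chosen here purely for illustration; nothing in this report indicates which method Viewzi or Searchme actually use, and the URLs and the constant `k = 60` are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each result scores 1 / (k + position) per list."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for position, url in enumerate(ranking, start=1):
            scores[url] += 1.0 / (k + position)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical result lists returned by two engines for the same query.
engine_a = ["a.example/gen2phen", "b.example/variome", "c.example/omim"]
engine_b = ["b.example/variome", "d.example/taverna", "a.example/gen2phen"]

fused = reciprocal_rank_fusion([engine_a, engine_b])
print(fused)
```

Results that several engines rank highly rise to the top, while results found by a single engine are kept but ranked lower.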

Fig. 7 – Viewzi result grid for a gen2phen search

28 Viewzi: http://www.viewzi.com
29 Searchme: http://www.searchme.com


SUMMARY

All the applications and interfaces presented are new solutions that are being considered in several thematic fields. They represent the first step towards the next generation of web applications and open the door to a new level of user interaction.

This new wave of web applications will have repercussions on bioinformatics. New applications like iBioinformatics and BioDesktop, or new result visualization tools, could leave their mark on the bioinformatics world.

Following the iGoogle and Netvibes examples, one could develop a similar portal, integrating gadgets and applications in a single interface. iBioinformatics or BioVibes would represent a leap forward in integration and personalization. If a large range of services were available in the gadget repository, any researcher could customize the application according to his needs, thus creating his own personal meta-application.
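The gadget repository idea can be sketched in a few lines. This is only a minimal illustration, assuming a hypothetical shared repository of named gadgets from which each researcher picks a personal selection. The class names, gadget names and render functions are invented for the example and do not correspond to iGoogle, Netvibes or any existing bioinformatics portal.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class GadgetRepository:
    """Shared catalogue of gadgets (hypothetical, not an existing API)."""
    gadgets: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def register(self, name: str, render: Callable[[str], str]) -> None:
        # Each gadget is a function that turns a query into displayable text.
        self.gadgets[name] = render


@dataclass
class Portal:
    """One researcher's personal page built from repository gadgets."""
    repository: GadgetRepository
    selected: List[str] = field(default_factory=list)

    def add(self, name: str) -> None:
        if name not in self.repository.gadgets:
            raise KeyError(f"unknown gadget: {name}")
        if name not in self.selected:
            self.selected.append(name)

    def render(self, query: str) -> List[str]:
        # Run the query through every selected gadget, in the chosen order.
        return [self.repository.gadgets[n](query) for n in self.selected]


# Placeholder gadgets: real ones would call remote services.
repo = GadgetRepository()
repo.register("gene_search", lambda q: f"genes matching '{q}'")
repo.register("pathway_view", lambda q: f"pathways involving '{q}'")
repo.register("news", lambda q: f"latest news about '{q}'")

portal = Portal(repo)
portal.add("gene_search")
portal.add("pathway_view")
page = portal.render("BRCA2")
```

Keeping the repository separate from each portal means new services only need to be registered once, while every user keeps an independent selection.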

BioDesktop or BiOS could be an eyeOS-based bioinformatics and biomedical web desktop. Following the desktop metaphor, one could create a web desktop implementation containing applications and tools useful for researchers. Any user could then have his own personal desktop online, customized according to his own needs and taste.

Integration plays a large role in the future of bioinformatics, but data visualization is also important. Web screenshots are useful to show a preview of a page returned by a search. This idea could be applied to bioinformatics search results, showing pathway previews or protein structure previews. By arranging the results in grids or lists and using technologies like AJAX, Flash or Silverlight to create new interfaces, one could develop interesting and useful applications.


CONCLUSION 

Bioinformatics applications are evolving. Evolution isn't a simple process and choosing the right path isn't a trivial task. This evolution process is usually sustained by large projects like the Human Genome Project a few years ago or the European GEN2PHEN project now.

As bioinformatics is evolving, so are other software applications. The trend is to move software to the web and to make it freely available to the entire world. This process may be complex but, in the end, the benefits outweigh the tradeoffs that have to be made.

For bioinformatics, keeping pace with state-of-the-art web technologies is a tremendous task. The life sciences are definitely one of the areas where the amount of data is largest and where the differences between applications and services are most noticeable. This leads to enormous complexity in integrating heterogeneous information sources.

Despite these difficulties, several groups are working to solve integration problems, following several approaches. Semantic web concepts for better machine-to-machine exchanges and "proprietary" integration frameworks using hard-coded concept mapping are solutions currently under development. However, there isn't any perfect solution for these problems. Fully automatic and dynamic information integration hasn't yet been achieved and is still science fiction.

Hopefully, using the presented perspectives, together with more concepts from success stories in other areas like entertainment or CRM, will enhance current bioinformatics web applications and empower developers with tools to design new ones.

