Finding our way in information space Phil Ashworth Phil Scordis

Post on 28-Dec-2015

Page 1: Finding our way in information space Phil Ashworth Phil Scordis

Finding our way in information space

Phil Ashworth

Phil Scordis

Page 2

UCB: The Next Generation Biopharmaceutical Leader

R&D activities at 10 global sites

R&D Headcount = 2,100 (August 2007)

Braine (Be)

Atlanta (US)

Bulle (CH)

Tokyo (Jap)

Slough & Cambridge (UK)

RTP (US)

Rochester (US)

Shannon (Ire)

Monheim (De)

Global biopharmaceutical company with specialist focus: Neurology, Inflammation and Oncology

Proven sales and marketing – creating global brands

• Keppra®, Xyzal®, Zyrtec®

Revenues of €3.5 billion in 2006 (pro forma)

Successfully transformed with:

• Celltech acquisition in 2004

• Integration of SCHWARZ PHARMA in September 2007

Over 10,000 employees across more than 40 countries

Listed on EURONEXT (Brussels); current market cap of €7.5 bn

Page 3

Apology

Health Warning

• We are still in the middle of all of this; I don’t have all of the answers

Page 4

History

Research and Development in UCB

• Comes from integration of Schwarz Pharma, Celltech, OGS, Chiroscience, Darwin

Variety of data source issues

• Silos, vendor systems, structured, un-structured etc.

Data integration

• A mess of legacy approaches and many situations where no attempt has been made.

• To warehouse or not to warehouse?

  • After a rollout of a research warehouse, at least two distinct examples of different working practice “break” the model

  • Difficult to extend and rebuild warehouses – just another rigid system

Page 5

Principles and Ideals of the Semantic Web

“The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” [Tim Berners-Lee et al 2001]

Ideal environment

• Starting from scratch, building connectivity

• Start defining the problem space from a blank page

How applicable is this attractive approach to us?

Let’s find out…

Page 6

The Dream

What did we want

• Getting UCB’s pipeline to market faster

• Better ROI: an environment in which investment in data generation can be exploited to the full

• Breaking down data boundaries

Major Areas for Improvement

• Operational Orchestration

• Data Integration

• Knowledge discovery and creation

The fantasy

• Legacy systems remain in place where appropriate

• Data integration is seamless and facilitates aggregation and querying based on the meaning of the data

• Facilitated exploration of data and exploitation of connections

Page 7

Starting the journey

Heard of others oscillating around the semantic vs warehouse question

• Large investment in both technologies, building components, rolling out home built solutions

Our initial investment

• Minimal resource

• Limited to vendor applications (best of breed) rather than building our own

  • But not the all-or-nothing approach offered by some

Our learning curve has been steep

• Made many mistakes

• Visited many dead ends

• Experienced limitations first hand

• Had many frustrations

Data Integration was our key goal

Page 8

Where to start

Principles of the Semantic Web

• Understanding the concepts of semantics – so much reading.

Semantic Technologies

• Differences between the semantic and OO mindsets

Academia

• Some nice projects, but not enterprise orientated

Data Integration

• RDF

  • Has desirable flexibility and inherent potential for integration

• OWL

  • Builds on top of RDF, with potential for a rich descriptive framework, plus the power of DL to facilitate knowledge discovery through reasoning

  • Making connections

• But our data is in relational systems!
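
As a minimal sketch of the flexibility claimed for RDF above, the Python below treats two data sources as sets of (subject, predicate, object) triples, so that integrating them is a plain set union with no shared schema agreed in advance. All identifiers (`compound:123`, `assay:7`, `hasTarget` and so on) are invented for illustration and are not UCB data; a real system would use an RDF library and proper URIs.

```python
# Two hypothetical sources, each expressed as (subject, predicate, object) triples.
chemistry = {
    ("compound:123", "hasName", "Example compound"),
    ("compound:123", "testedIn", "assay:7"),
}
biology = {
    ("assay:7", "hasTarget", "target:ABC"),
    ("assay:7", "performedAt", "site:Slough"),
}

# Integration is a set union: no schema change is needed to combine sources.
merged = chemistry | biology


def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def targets_for(graph, compound):
    """Follow compound -> assay -> target links across the merged sources."""
    return sorted(
        target
        for _, _, assay in match(graph, s=compound, p="testedIn")
        for _, _, target in match(graph, s=assay, p="hasTarget")
    )
```

Calling `targets_for(merged, "compound:123")` follows a link that neither source holds on its own, which is the "making connections" point in miniature.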

Page 9

How to integrate: Getting RDF from RDB

RDF from RDB

• D2RQ

  • Offered the ability to read/query relational databases as RDF

  • Limitations

    • Open source

    • Didn’t work on real-world databases in our hands

    • Concerns about query speed when using multiple data sources; wanted an asynchronous distributed environment

    • Reasoning very slow across multiple data sources (forward chaining)

• Cerebra server

  • Tantalising prospect. A dead end? Recent changes within the company meant that the direction for the tool was uncertain.

• SDS – Interesting prospect (www.insilicodiscovery.com)

• Integrated query environment across a variety of data sources (relational, excel, web services etc.)

• Distributed asynchronous computing model

• No RDF!
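
To make "RDF from RDB" concrete, here is a rough sketch of the general idea: each row becomes a subject, each column a predicate, and each cell value an object. It uses only Python's standard `sqlite3` module and is not D2RQ's mapping language or API, nor how SDS works; the `compound` table, its columns and the `http://example.org/` base URI are assumptions made up for the example.

```python
import sqlite3


def rows_to_triples(conn, table, key_column, base="http://example.org/"):
    """Yield one (subject, predicate, object) triple per non-null, non-key cell.

    The table name is interpolated into the SQL, so it must come from a
    trusted source in this sketch.
    """
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        # The primary key identifies the row, so it forms the subject URI.
        subject = f"{base}{table}/{record[key_column]}"
        for column, value in record.items():
            if column == key_column or value is None:
                continue  # NULL cells simply produce no triple
            yield (subject, f"{base}{table}#{column}", value)


# Hypothetical in-memory table standing in for a legacy relational system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compound (id INTEGER PRIMARY KEY, name TEXT, project TEXT)")
conn.execute("INSERT INTO compound VALUES (1, 'Example A', 'P1')")
conn.execute("INSERT INTO compound VALUES (2, 'Example B', NULL)")

triples = list(rows_to_triples(conn, "compound", "id"))
```

A mapping this naive only restates the relational structure as triples; the harder part, as the limitations above suggest, is doing it at query time against large real-world databases.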

Page 10

How to integrate: RDF Stores / Warehouse

Triple stores

• AllegroGraph – Franz

• Sesame

Problems

• Immature technology

  • Supported data volumes are limited relative to life science data volumes

• Security and backup – primitive

• Limited integration with other tools

  • Needed tighter integration – queries not being carried out directly in RDF stores. Again, slow queries and reasoning from tools due to forward chaining.

• Still have data duplication issues and requirements for ETL processes

One step forward, two steps back!
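
For readers unfamiliar with the term, forward chaining means applying inference rules repeatedly and storing every derived triple until nothing new appears. The sketch below is a deliberately naive illustration with two RDFS-style rules; the class and instance names are hypothetical, and it is not how AllegroGraph, Sesame or any particular reasoner is implemented. Its nested loop over all pairs of triples hints at why this approach can be slow at life science data volumes.

```python
def forward_chain(triples):
    """Apply two RDFS-style rules until no new triples are produced."""
    inferred = set(triples)
    while True:
        new = set()
        # Naive pairwise join: cost grows quadratically with the triple count.
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if o != s2 or p2 != "subClassOf":
                    continue
                if p == "subClassOf":
                    # Rule 1: A subClassOf B, B subClassOf C => A subClassOf C
                    new.add((s, "subClassOf", o2))
                elif p == "type":
                    # Rule 2: x type A, A subClassOf B => x type B
                    new.add((s, "type", o2))
        if new <= inferred:
            return inferred  # fixed point reached: nothing new was derived
        inferred |= new


asserted = {
    ("KinaseInhibitor", "subClassOf", "SmallMolecule"),
    ("SmallMolecule", "subClassOf", "Compound"),
    ("cmpd1", "type", "KinaseInhibitor"),
}
closure = forward_chain(asserted)
```

Here three asserted triples grow to six once the inferences are materialised, which also illustrates the extra storage that comes with the approach.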

Page 11

How to integrate: Development Tools

Few professional development and deployment environments

• Roll your own vs the use of open source

Protégé

• Great for model development but lacked integration with other tools (when we looked)

TopBraidComposer - TopQuadrant

• Excellent functionality out of the box. Easy interface, File imports, navigation etc

• Integrated with a variety of third-party systems

  • D2RQ, AllegroGraph, Sesame, Jena, Oracle

• But still could not do everything we wanted it to.

• TopQuadrant supported our limited resource to enhance our understanding and knowledge.

• TopBraidLive: one of the first development -> deployment applications

Reasoners

• Several looked at - Each had their quirks

• None did as we thought or wanted with the data volume we had.

• Used rules to achieve what we needed

  • Isn’t this cheating?

Page 12

Stop the journey – we are getting off

We have tried to achieve data integration by pursuing several avenues

• RDF from RDB

• RDF warehouse

  • Via RDB data -> txt -> RDF -> RDF Store

• Semantic SOA, another approach

  • Pragmatic semantics

Now we understand the messages others have been trying to pass

• Blowing hot and cold on the whole idea

• Wavering over semantic vs conventional warehousing

• Heavy investment in home brew technology or enterprise environment

Is this a dead end?

Page 13

The end

Thanks for coming …

Page 14

Hang on, we are not giving up yet

RDF Stores

Ontologies

Data Integration Tools

Delivery Tools

Development Tools

Visualisation

We decided to persevere

• But we still don’t have a large amount of resource to throw at this

• We need to take a different path

  • Community action

  • Collaboration

• There is a vibrant and active community out there

  • W3C …

  • Involved in direction and calling for standards

Page 15

So where are we today?

Page 16

Driving change

TopBraidComposer - A semantic development environment using open source and limited data integration tools.

• Help with SDS

• Tighter integration with RDF stores

  • TQ also had to drive other vendors to provide functionality for them

• Many other changes as we pushed the boundaries of the tool

• TopBraidLive looks very promising as an easy deployment environment

SDS - A data integration platform, enterprise ready, lacking a semantic direction

• SPARQL integration (not just RDF from RDB, but RDF from RDB, Excel, web services)

  • We believe this is key to our future strategy

• Changes to their interfaces, tools and capabilities

• Integration with TBC

UCB is driving collaborative development

• Helping bring companies together (A big thank you to TQ and ISD)

• Helping drive the community

Page 17

In Summary

The semantic wave is too large to surf alone

• Too unpredictable to control

There are some big hurdles to overcome

• Integration, tools, enterprise solutions, visualisation, orchestration

However we are committed to helping make things happen

• Always on the lookout for open-minded enthusiasts

• Committed to contribute to the community

Still believe that Semantic Technologies are part of the solution

• But it is not just something we can adopt (at the moment)

• It is still something we have to help forge so others can be adopters.

Page 18

Thank you

Any advice or questions?