High-quality integration using Semantics
Lab of the Future, 2019 Wellcome Genome Campus, Cambridge, UK
Etzard Stolte, PhD – Global Head Knowledge Management, PTD, F. Hoffmann-La Roche
Die Zeit, 2019
© Copyright 2018 FA Hoffmann La Roche – [email protected] - The information contained herein is subject to change without notice.
Genentech, R&D and commercial operations US
CHF 57 billion sales
30 R&D sites in pharma and diagnostics worldwide
Roche Group Headquarters in Basel, Switzerland
94 442 employees worldwide
Chugai, R&D and commercial operations Japan
Roche - a global pioneer in pharmaceuticals and diagnostics
Among top 10 R&D investors worldwide across industries (for 2018)
CHF 11 billion invested into R&D
Pharma Technical Development
Process development, formulation design, clinical manufacturing, device development
Research Genentech (gRED)
Pharma Technical Development
Research Basel (pRED)
Development (PD)
Manufacturing (PT)
• About 2500 FTE
• 800 applications in use, 330 GxP
• 130 applications used daily
• Most applications are owned by other functions (e.g. Research & Development)
Integration Challenge
Complex Information Landscape across 800 systems, 127 used daily
Regulatory, Safety
• Enabling Computer System Validation (CSV), GxP and quality management systems and processes
• e.g. TrackWise, IDM, SpecA
Strategy, PM, MDM
• Tools, processes, governance, that consistently define and manage the critical data of an organization
• e.g. MDMS, Roformis, SAP
Lab & Manufacturing
• Platforms to control, manage and document all activities in the laboratory and manufacturing
• e.g. B/Pali, Damaz, PMX, cELN
Knowledge Mngt
• Tools, processes & governance to build a culture that values, shares, and re-uses our information and knowledge
• e.g. SysWiki, TP, Discourse
Data/Inf Mngt
• Tools, policies, and best practices to ensure that data are understandable, trusted, accessible, and interoperable
• e.g. MIA, ECM, BaseCamp, CTMS
Knowledge Management
Integration Roadmap
• Discovering, accessing and re-using internal & external data, information and knowledge sources more effectively
• Complex information landscape, but one integration platform
• Integrate, not replace existing processes & systems
• Targeting the use of semantics where e.g. master data or curation are missing
• Focusing on quality, to increase our opportunities for innovation
• Culture / mind change is a key goal for each phase
Figure axes: structured vs. unstructured; textual vs. numerical
Documents on e.g. fileshares
Document Management Systems
LIMS
ELN
Files on e.g. fileshares
Sample Management
ERP
Web pages, Wikis
Instrument database
1. Semantic Integration Platform
2. Single Document Management Process
3. Numerical Integration
4. Information Lifecycle Management
2015-2018
2017-2020
Hype Cycle for AI
Gartner, 2019
Quality!
Example – NLP - Taxonomies
The foundation for any semantic understanding
• 86 taxonomies, internal & external, commercial & open-source & home-grown
• majority internally hosted
• no alignment / merging
• -> high ETL costs, but no run-time impact
• -> no consistent semantic context
• -> highly relevant & specific concepts
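The trade-off in the bullets above (taxonomies kept separate, compiled once at load time, looked up cheaply at run time) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the platform's actual implementation: the taxonomy names, terms and concept IDs are invented for the example.

```python
import re
from collections import defaultdict

# Hypothetical taxonomies (names, terms and IDs are invented for illustration).
# They are deliberately NOT merged: each concept keeps its source taxonomy.
TAXONOMIES = {
    "assays": {"hplc": "ASSAY:001", "mass spectrometry": "ASSAY:002"},
    "formulation": {"lyophilization": "FORM:010", "buffer": "FORM:011"},
}


def build_index(taxonomies):
    """One-off load step: compile all terms into one lookup table and one regex."""
    index = defaultdict(list)
    for taxonomy, terms in taxonomies.items():
        for term, concept_id in terms.items():
            index[term.lower()].append((taxonomy, concept_id))
    # Longest terms first so multi-word terms win over shorter alternatives.
    ordered = sorted(index, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    return index, pattern


def extract_concepts(text, index, pattern):
    """Run-time step: a single regex pass followed by dictionary lookups."""
    hits = []
    for match in pattern.finditer(text):
        for taxonomy, concept_id in index[match.group(1).lower()]:
            hits.append((match.group(1), taxonomy, concept_id))
    return hits


INDEX, PATTERN = build_index(TAXONOMIES)
```

Because every hit carries its taxonomy name, the same term can resolve to different concepts in different taxonomies, which illustrates the "no consistent semantic context" point.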
Example – NLP – Concept Extraction & Matching
2018 Roche-wide announcement
Example – NLP – Named Entity Recognition (NER)
Direct Data Access - show me the data behind this reference…
Everywhere we encounter cryptic identifiers
Semantic Web detects identifiers and adds web links to the source, e.g. to a LIMS, ELN, IDM, etc.
For example, a new link would open the analysis behind the ID in a LIMS system, or an ELN record -> parsing is hard
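A rough sketch of the idea, detecting identifiers in free text and wrapping them in links to their source system, could look like this. The identifier formats and URLs below are assumptions made up for illustration; real identifiers are far less regular, which is why the slide notes that parsing is hard.

```python
import re

# Hypothetical identifier formats and link targets; real systems will differ.
ID_PATTERNS = [
    ("LIMS", re.compile(r"\bLIMS-\d{6}\b"), "https://lims.example.com/analysis/{id}"),
    ("ELN", re.compile(r"\bELN-\d{4}-\d{3}\b"), "https://eln.example.com/record/{id}"),
]


def find_identifiers(text):
    """Return (start, end, system, identifier, url) tuples, sorted by position."""
    found = []
    for system, pattern, url_template in ID_PATTERNS:
        for m in pattern.finditer(text):
            url = url_template.format(id=m.group(0))
            found.append((m.start(), m.end(), system, m.group(0), url))
    return sorted(found)


def add_links(text):
    """Wrap every detected identifier in an HTML link to its source system."""
    out, pos = [], 0
    for start, end, _system, ident, url in find_identifiers(text):
        if start < pos:  # skip matches that overlap an earlier one
            continue
        out.append(text[pos:start])
        out.append(f'<a href="{url}">{ident}</a>')
        pos = end
    out.append(text[pos:])
    return "".join(out)
```

The sketch does not escape HTML in the surrounding text; a production version would need to, and would likely resolve identifiers against the source systems rather than trust a pattern match alone.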
Examples – Model based integration
• A complete data catalogue across all data/information/knowledge systems
• Uses a Minimal Attribute Set (8 attributes chosen to find all data)
• Data models behind each attribute for higher accuracy
• Around 60 systems (LIMS, ELN, etc.) targeted out of 127 total
• Will offer a search front-end and URIs (Uniform Resource Identifiers)
• Incomplete models are a key challenge
“Smart” Data Catalogue - Find all relevant experimental data
Find data at scale
Classify Visibility
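To make the catalogue idea concrete, here is a minimal sketch of an entry described by a small fixed attribute set, with a URI and an attribute search. The slides do not name the eight attributes, so the ones below are purely hypothetical, as is the URI scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    # Eight hypothetical attributes; the slides do not name the actual set.
    system: str
    record_id: str
    project: str
    molecule: str
    experiment_type: str
    owner: str
    created: str      # ISO date string
    visibility: str   # e.g. "open" or "restricted"

    @property
    def uri(self):
        """Stable identifier built from system and record ID (invented scheme)."""
        return f"urn:catalogue:{self.system.lower()}:{self.record_id}"


def search(entries, **criteria):
    """Return entries whose attributes equal all given criteria (case-insensitive)."""
    unknown = set(criteria) - set(CatalogueEntry.__dataclass_fields__)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    return [
        e for e in entries
        if all(str(getattr(e, k)).lower() == str(v).lower() for k, v in criteria.items())
    ]


# Invented sample records from two source systems.
CATALOGUE = [
    CatalogueEntry("LIMS", "A-1", "P1", "mAb-X", "stability", "jdoe", "2018-03-01", "open"),
    CatalogueEntry("ELN", "B-7", "P1", "mAb-X", "formulation", "asmith", "2018-05-12", "restricted"),
    CatalogueEntry("LIMS", "A-2", "P2", "mAb-Y", "stability", "jdoe", "2019-01-20", "open"),
]
```

For example, `search(CATALOGUE, molecule="mAb-X", visibility="open")` would return only the first LIMS record. Keeping the attribute set small is what lets one query shape work across many heterogeneous systems.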
Examples – NLP - Natural Language Processing
AI / ML Examples
NN - Automated Translation
• On the fly translation in good quality
• Challenge: documents contain many, many data types; complex access controls across systems
NLP – Expert Finder
• Match colleagues to concepts
• Challenge: meta data quality
NLP – Fact extraction
• Return relevant details, e.g. table rows / columns; text snippets; database entries
• Challenge: text understanding w/ brief context
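As one illustration of the Expert Finder idea, colleagues can be matched to concepts by counting how often they authored documents tagged with each concept. This is a simplified sketch, not the method used in the platform; the document structure and sample data are assumptions.

```python
from collections import Counter, defaultdict

# Invented sample documents; "authors" and "concepts" are assumed metadata fields.
DOCS = [
    {"authors": ["A. Smith", "j. doe"], "concepts": ["lyophilization"]},
    {"authors": ["a. smith "], "concepts": ["lyophilization", "hplc"]},
    {"authors": [], "concepts": ["hplc"]},  # missing author metadata
]


def build_expert_index(documents):
    """Count, per concept, how many documents each (normalised) author wrote."""
    index = defaultdict(Counter)
    for doc in documents:
        authors = {a.strip().lower() for a in doc.get("authors", []) if a and a.strip()}
        for concept in set(doc.get("concepts", [])):
            for author in authors:
                index[concept][author] += 1
    return index


def find_experts(index, concept, top_n=3):
    """Return up to top_n authors with the most documents on the concept."""
    if concept not in index:
        return []
    return [author for author, _count in index[concept].most_common(top_n)]
```

The third sample document shows why metadata quality is the challenge: a document without author metadata contributes nothing, and inconsistent name spellings would split one colleague into several unless normalised.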
Integration methods have evolved
Moving to loosely coupled, semantic approaches
1980: monolithic (one system)
1990: client/server (common codebase)
2000: components / APIs (common frameworks)
2010: SOA / iPaaS (common protocols)
2020: fabrics (common semantics)
Syntax
Structure
Semantic
Interfaces & Implicit Semantics
Artificial Intelligence & Explicit Semantics
KM Integration Platform
Agility, Costs, IP control
• Open source, in-house
• Scalable to large user & data numbers
• Enabled by semantic & machine learning services
• Processing pipelines to segment, enrich & index data / information / knowledge
• Runs on internal High Performance Computer cluster
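The "segment, enrich & index" pipeline mentioned above can be pictured with a small sketch. It is an assumption-laden toy: segmentation is by blank line, enrichment is a single-word vocabulary match, and the index is an in-memory inverted index, whereas the real platform presumably uses semantic and machine learning services at each stage.

```python
import re
from collections import defaultdict


def segment(document):
    """Split a document into paragraph-level segments on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def enrich(segment_text, vocabulary):
    """Attach the (single-word, lowercase) vocabulary terms found in a segment."""
    tokens = set(re.findall(r"[a-z0-9]+", segment_text.lower()))
    return {"text": segment_text, "concepts": sorted(tokens & set(vocabulary))}


def build_inverted_index(enriched_segments):
    """Map each concept to the positions of the segments that mention it."""
    inverted = defaultdict(list)
    for position, seg in enumerate(enriched_segments):
        for concept in seg["concepts"]:
            inverted[concept].append(position)
    return dict(inverted)


def run_pipeline(document, vocabulary):
    """Chain the three stages: segment -> enrich -> index."""
    enriched = [enrich(s, vocabulary) for s in segment(document)]
    return enriched, build_inverted_index(enriched)
```

Each stage takes plain data and returns plain data, so stages can be swapped (for example, a better enricher) or distributed across cluster nodes without touching the others.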
Buy if you can, build if you must
Semantics, Enrichment, Translation, Machine Learning, Analytics
2016
Semantic & Numerical Integration
Experiment / Pilot
FAIR (Findable, Accessible, Interoperable, Re-usable)
2017
Capture & Collaboration
Pilot / Service
Document Systems (across Pharma)
Numerical Databases (100 identified)
Websites (internal & external)
KM Integration Platform (KIP)
Semantic & Numerical Integration
2016
Master Data & Registration systems
Taxonomies & Ontologies
other…
Service
Models
Knowledge Management
Our platform will be worth your while
adapted from PT KM Grab&Go Deck 2015
Company: Benefits from KM program
Merck: 81% increase in employee engagement
Chevron: $2 billion reduction in annual operating costs within 7 years
Dow Chemical: $4 million savings in first year, $100 million total
Schlumberger: 300% increase in daily output productivity
Hewlett-Packard: 50% reduction in customer support operating costs in 2 years
A mature KM approach will deliver > 2 x return on investment
American Productivity & Quality Center (APQC) Benchmark Study
5 x in PTD
Next Generation Knowledge Management @ PTD
• “KM is such a hot topic right now”
• “Super. Search is ‘the’ most important capability in Roche”
• “.. I am really excited to see what you guys created”
• “I wanted to let you and team know that I love this tool..”
• “This is really nice!” “AWESOME!”
• “Whoa!!! That's incredible!!!!!”
• “I like it. Thanks a bunch!”
• “.. this search is really great!”
• “This is a great achievement!”
• “Thanks to you .. this work is amazing!”
Our focus on quality has been well received
• “Well done! Thank you for your leadership.”
• “It's an amazing tool! I am really impressed by the convenience and speed! Thank you so much!”
• “Supercool ….. really great!”
• “I cannot remember another IT project that went so well”
• “Outstanding outcome ... in a short time…”
• “Super, what you guys are delivering”
Thank You!
Next Generation Knowledge Management @ PTD
Long-term vision
Data Dictionary; Data Commons; Insight Pipeline
Personal Cloud; Semantic Integration; Automated Quality Controls
Standardization & Efficiency
Quality & Compliance
Semantic Integration & Text Understanding
Numerical Integration & Analysis
Collaboration & Insights
FAIR Data & Information
Security & Privacy; Integrated Laboratory
Expectations leading to new trends
Industry & technology trends driving
Meaning Based Computing
Project portfolio, strategic goals
Doing now what patients need next