how and why novartis is exploiting grid technology? hpc and semantic web prof. manuel c. peitsch,...

35
How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

Upload: alexandrina-jenkins

Post on 24-Dec-2015

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

How and Why Novartis is Exploiting GRID

Technology?HPC and Semantic Web

Prof. Manuel C. Peitsch, PhD

Global Head of Systems Biology

Page 2: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Mechanism-based Drug Discovery

Understanding Disease

Pathways elucidation

Target validation

Clinical PoC

New drug candidates (to be tested in PoC studies)

Reduce project life cycle

Increase PoS after D3 (Lead optimisation)

The Challenges of Drug Discovery

Systems Biology: Combination of *Omics & Mathematical Modelling

Page 3: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Japan• Oncology• Diabetes• Cardiovascular

Austria• Autoimmunity

Great Britain• Respiratory• Gastrointestinal

Switzerland• Muscular and Bone• Nervous system• Oncology• Transplantation• Ophthalmology• Genome and Proteome Sciences• Discovery Techologies• Discovery Chemistry• Protease Platform• GPCR

United States• Diabetes• Infectious diseases• Cardiovascular• Oncology• Discovery Techologies• Discovery Chemistry• Animal Models• Pathways• Genome and Proteome

Sciences

Organizational complexity

Page 4: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Data and Information complexity

Raw data from instruments

Literature

Molecular Structure

S

1

S

2

L

3

L

4

E

5

K

6

G

7

L

8

D

9

G

10

A

11

K

12

K

13

A

14

V

15

G

16

G

17

L

18

G

19

K

20

L

21

G

22

K

23

D

24

A

25

V

26

E

27

D

28

L

29

E

30

S

31

V

32

G

33

K

34

G

35

A

36

V

37

H

38

D

39

V

40

K

41

D

42

40 30 20 10

V

43

L

44

D

45

S

46

V

47

L

48

1

S

1

S

2

L

3

L

4

E

5

K

6

G

7

L

8

D

9

G

10

A

11

K

12

K

13

A

14

V

15

G

16

G

17

L

18

G

19

K

20

L

21

G

22

K

23

D

24

A

25

V

26

E

27

D

28

L

29

E

30

S

31

V

32

G

33

K

34

G

35

A

36

V

37

H

38

D

39

V

40

K

41

D

42

40 30 20 10

V

43

L

44

D

45

S

46

V

47

L

48

1

Mass (m/z)

% I

nte

ns

ity

1500 2200 2900 3600 4300 5000

50

100

3876

.3

2738

.9

2324

.7

2495

.6

3832

.1

4174

.9

2081

.1

4503

.2

2981

.5

2623

.8

3321

.5

3717

.1

3491

.6

4059

.6

2795

.8

2209

.3

3094

.331

67.7

4290

.3

1838

.1

1652

.2

1911

.5

b27

b42 - D

b30

b38

y39 -D9

y11

y27

y33

y18 [M+H]+

y35

b39-D

b28-D (y26)

y24 -Db24-D (y22)

y20 -D

b23

b45 - D

Genomics and Proteomics

Page 5: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

The Vision

Computational life science and HPC

GRIDs

People Networks

Data Information and Knowledge GRID Knowledge Space /

Semantic Web

Enable and transform the Drug

Discovery process through:

- Comprehensive and reliable Data

and Information

- Seamless information integration

for easy navigation

- Turning Data into Knowledge

using in silico science

- Simulate biomolecular processes

using in silico science

- E-Collaboration and v-communities

Page 6: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Computational Aspects in Drug Discovery

Targetfinding

Targetvalidation

Leadfinding

Leadoptim.

Bioinformatics LabMacromolecular

Structure & Function LabComputationalChemistry Lab

Page 7: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Signal Transduction Networks

5

- 30- 25- 20- 15- 10- 50

0 50 100 150 200

0 1 2 3 4 5-2

0

2

4

6

time

contr

ol

cyto

0 1 2 3 4 5-1

0

1

2

3

time

nuc

0 1 2 3 4 5-2

0

2

4

6

time

dru

g

0 1 2 3 4 5-1

0

1

2

3

time

...

Mathematical Representation

Omics experiments

Page 8: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Human data

SNPDNA SampleSequencing

DB DB

SAPTranslate &Map/Align

Model &Map

DB DB

Disease associationValidated Targets

Virtual Drug DiscoveryIn Silico Docking

In Silico “Chemogenomics”Virtual Library DesignPredictive MedChem

Tox PK/PK ADME modelling

Functional and Structural insights

DBKinasesNRProteases

Structures &Modelling templates

Pro

tein

s

Compounds

QSAR

In Silico Drug Discovery

Page 9: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

3D-Crunch

In Silico Drug Discovery Pipeline: Can it be done?

ProductiveAutomated Protein

modelling email server

ProductiveAutomated Protein

modelling Web server

Genome scale Automated Protein modelling

SETI@Home

1990 1995 2000 2005

Protein Model Structure database

SETI@Home recognised as a leading new concept (ComputerWorld Award)

SWISS-MODEL and 3D-Crunch recognised as a leading new concept (ComputerWorld Award)

GeneCrunch

GeneCrunch recognised as a leading new concept (ComputerWorld Award)

First PC-GRID at Novartis

Docking in productionat Novartis

Automated ToxCheck and other CIx tools

Full TranscriptomeModelling at Novartis

First automated pipelines

UD recognised for visionary use of information technology in the category of Medicine (ComputerWorld Award)

In Silico Drug Discovery and

Chemogenomics pipeline

Page 10: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Novartis’ HPC Grid Strategy

Linux Clusters Shared Servers

PC GRID

ExternalCollaborations

Job

su

bm

issio

n layer

Page 11: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Influencing Biomolecular Processes

Target

Drug

Target = enzyme, receptor, nucleic acid, …Ligand = substrate, hormone, other messenger, ...

Target

ACTIVE

Ligand

INACTIVE

Page 12: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

PC Grid Success Story: Protein Kinase CK2 Inhibition

Target finding:

Protein Kinase CK2 has roles in cell growth, proliferation and survival.

Protein Kinase CK2 has a possible role cancer and its over expression has been associated with lymphoma.

Target validation:

To elucidate the different functions and roles of CK2 and confirm it as a drug target for oncology, one needs a potent and selective inhibitor.

Approach:

The problem was addressed by in silico screening (docking).

Page 13: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Virtual Screening by in silico Docking

> 400,000 Compounds

DockingProcess

andSelection

ofpossible

hits

< 10 Compound

s

Page 14: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Important results

ConclusionWe have identified a 7-substitued Indoloquinazoline compound as a novel inhibitor of protein kinase CK2 by virtual screening of 400 000 compounds, of which a dozen were selected for actual testing in a biochemical assay. The compound inhibits the enzymatic activity of CK2 with an IC50 value of 80 nM, making it the mostpotent inhibitor of this enzyme ever reported. Its high potency, associated with high selectivity, provides a valuable tool for the study of the biological function of CK2.

“The reported work clearly shows that large database docking in conjunction with appropriate scoring and filtering processes can be useful in medicinal chemistry. This approach has reached a maturation stage where it can start contributing to the lead finding process. At the time of this study, nearly one month was necessary to complete such a docking experiment in our laboratory settings. The Grid computing architecture recently developed by United Devices allows us to now perform the same task in less than five working days using the power of hundreds of desktop PC’s. High-throughput docking has therefore acquired the status of a routine screening technique.”

“The reported work clearly shows that large database docking in conjunction with appropriate scoring and filtering processes can be useful in medicinal chemistry. This approach has reached a maturation stage where it can start contributing to the lead finding process. At the time of this study, nearly one month was necessary to complete such a docking experiment in our laboratory settings. The Grid computing architecture recently developed by United Devices allows us to now perform the same task in less than five working days using the power of hundreds of desktop PC’s. High-throughput docking has therefore acquired the status of a routine screening technique.”

Page 15: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Major benefits of GRID computing

Optimization of resources utilization: HPC platforms usage is maximized and Technology expertise is

shared. Response to additional performance requirements is easier and

faster No service downtime due to possibility to run same job on many

platforms across different sites.

Enable cross business units collaboration and synergies: Single efficient access path to Data and Compute resources. Tools are easily exchanged between scientists/programs.

Favor “out of the box” thinking: Apply HPC to areas which one would not even have considered a

year ago. This has created a fertile ground for a new paradigms in Drug Discovery leading to Business Process transformation.

Page 16: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Performance of the PC-GRID (today)

Computing Power:

Theoretical >5 TeraFLOPS harvested from 3000 PCs in all geographical locations.

Acceleration of the in silico Docking process versus 1 standard 2002 PC (start of project): ~4000x

Financial:

Immediate savings in excess of 2m$.

No need for additional data centre to support this computing power.

Optimally use of existing hardware (associates’ PCs)

Page 17: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Building a GRID: Management focus

You need a champion!

Do not punctuate every sentence with the GRID word and avoid the Hype!

Demonstrate value through pilots: Think “Iterative Improvement”. The conceptual layers

are there, prototype are emerging, improvements and optimization is essential, maturity will follow

Leadership, transcendence, entrepreneurship and tenacity are the essence of transformation! Concepts are easy to draw on a napkin over beer!

But new and great things are hard to achieve!

Use external goodwill to create internal acceptance!

Page 18: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Peru

Community projects help with acceptance

Page 19: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Building a GRID: User base

You need a clearly defined and communicated HP Computing strategy. Address unmet computational needs.

Apply HPC to areas which one would not even have considered two years ago. This has created a fertile ground for a new paradigms in Drug Discovery leading to Business Process transformation. Are all problems “GRIDable”?

Further applications: Sequence identification in proteomics from LC-MS/MS

data

Text Mining and semantic Web infrastructure

Page 20: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Building a GRID: Software

The Software licensing models will have to evolve

Do not stop because of software licensing issues.

Show success with freeware and home grown algorithms.

Demonstrate business value and cost leadership.

Opportunity to develop your own code?

Unification of HPC applications environment:

Ensure that applications can run on maximum number of systems.

Introduce HPC software management:

Influence licensing models. The classical models do not fit the GRID and HPC paradigm.

Page 21: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Building a GRID: PC owners

Education and awareness. Ensure that the HelpDesk is well trained and gives

the right answers.

Ensure that PC owners know about the REAL impacts, including network.

The PCs are company and not personal assets! Strategy to use them when they are idle is not a

user but a company decision.

Address power saving policies in a transparent manner.

Page 22: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Knowledge Space - Vision

The "Knowledge Space Portal” is a Drug Discovery oriented implementation of the Semantic Web. Through a single customizable interface it:

• Federates heterogeneous data resources and provide precise organization of the content

• Provides quick and intuitive access to information

• Provides data extraction, analysis and exploration tools

• Allows data integration, data exchange and interoperability of applications

• Provides mechanisms for data capture and annotation

• Provides knowledge sharing and collaborative tools

Page 23: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Basic principles behind the Knowledge Space

The Knowledge Space consists of:

The collection of all types of data and information within the scope of interest defined by a particular business. There is no conceptual difference between internal and external data/information.

The Meta Data and the Knowledge Map which describe the collection in terms of content and location.

The Text Mining platform which allows the identification of entities (using vocabularies) and the concepts they belong to using ontologies.

The Ultralinker, which associates identified entities and concepts with specific contextual rules.

A user interface.

Page 24: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

What is an Ultralink?

The Ultralink is an “intelligent” context-sensitive Hyperlink created at run time by the Ultralinker.

The Ultralink is generally a menu of links instead of a single link.

This menu will only offers sensible actions/options:

No dead ends due to a verification process ensuring that the link has a target.

The Ultralink provides direct interaction between any type of entity (gene name, compound name, mode of action, disease name, company name, etc… with an appropriate set of tools and resources as defined by the rules encoded in the Ultralinker.

The Ultralink functionality allows the selection of any portion of text in the Web browser and sends it as input to the Ultralinker for analysis and menu creation.

The Ultralink allows easy navigation across the information domains contained in the Knowledge Space.

Page 25: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

How the Ultralinker works

The Ultralinker is a Web service which analyses any information (such as a complete web pages) it receives for recognisable entities using text mining and pattern recognition methods.

Each recognised item is mapped onto the ontologies and the Knowledge Map.

The Expert System will define what can be done with the identified entities e.g.

If a gene name is recognised then Ultralinks are created to:

get its sequence and perform sequence similarity searches;

query genetic disorder databases and map it onto the chromosome;

produce a 3D structure by comparative modelling;

look for hits from High Throughput Screening;

etc…

Automated predefined processes can thus be activated by a single click (Ultraaction or work-flow).

The Ultralinker will create a menu that will be sent to the User interface.

Page 26: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Ultralinker

SemanticSearch

Text Mining

Analytics

What constitutes the Knowledge Space

Internet

Other ResearchDocumentation

Chemistry

Biology

Literature

Comp. Inf.

Bioinformatics

Meta Data K map

ThesauriiOntologies

Rules

Definedworkflows

Page 27: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Knowledge Space Search Modes

Text Structure Concepts

Page 28: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Knowledge Space: Text search

ACE modulator Cholecystokinin modulator Metalloprotease 4 modulatorACE-related carboxypeptidase modulator Chymase modulator Metalloprotease 7 modulatorAcrosin modulator Chymotrypsin modulator Metalloprotease 8 modulatorAggrecanase modulator Clipsin modulator Metalloprotease 9 modulatorAlpha 1 protease modulator Collagenase modulator Metalloprotease modulatorAlpha 1 proteinase inhibitor Complement cascade modulator NAALADase modulatorAlpha 1 proteinase modulator Complement factor modulator Pepsin modulatorAminopeptidase modulator Cysteine protease modulator Peptidase modulatorAmyloid protease modulator Dipeptidase modulator Plasmepsin modulatorAntitrypsin modulator Elastase modulator Plasmin modulatorAspartic protease modulator Endopeptidase modulator Protease inhibitorAtriopeptidase modulator Endothelin converting enzyme modulator Protease stimulantCalpain inhibitor Factor IX modulator Proteasome inhibitorCalpain modulator Factor VII modulator Proteasome modulatorCarboxypeptidase modulator Factor X modulator Renin modulatorCaspase modulator Factor XII modulator Secretase modulatorCathepsin B modulator Gelatinase modulator Serine protease modulatorCathepsin D modulator Interleukin 1 converting enzyme modulator Thrombin modulatorCathepsin F modulator Kallikrein modulator Thrombokinase modulatorCathepsin G modulator Metalloprotease 1 modulator Trypsin modulatorCathepsin K modulator Metalloprotease 11 modulator Tryptase modulatorCathepsin L modulator Metalloprotease 12 modulator Ubiquitin-specific protease inhibitorCathepsin modulator Metalloprotease 13 modulator Ubiquitin-specific protease modulatorCathepsin S modulator Metalloprotease 2 modulator Ubiquitin-specific protease stimulantCathepsin V modulator Metalloprotease 3 modulator Urokinase modulatorCathepsin X modulator Viral protease modulator

AntiviralCMV protease inhibitorCMV protease modulatorHepatitis C protease inhibitorHepatitis C protease modulatorHerpes simplex virus protease inhibitorHerpes simplex virus protease modulatorHIV protease inhibitorHIV protease modulatorHIV-1 protease inhibitorHIV-1 protease modulatorHIV-2 protease inhibitorHIV-2 protease modulatorNS3 protease inhibitorNS3 protease modulatorPicornavirus protease inhibitorPicornavirus protease modulator

Expansion: EMTREE + Novartis proprietary dictionary expansion for protease modulators + respective synonyms

Page 29: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Analysis tools

Display-Navigation-UltralinkProtease modulator in Literature DB (Medline-Embase)

Sort capabilities

Easy navigation in record titles

Search report: Number of Docs, Key-words extracted

Ranking value and access to document

Page 30: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Document view

Take advantage of the full-text article provided by PubMed

Page 31: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Analysis Tools

Page 32: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Data Analysis Protease modulators in CI DBs July 2004 - ADIS & Pharmaprojects

Univariate - Companies Univariate - MOA

Univariate - Diseases conditionned by Companies Clustering Diseases -MOAs

Page 33: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Graph Navigator Protease modulators in CI DBs July 2004 - ADIS & Pharmaprojects

Page 34: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Clustering

Page 35: How and Why Novartis is Exploiting GRID Technology? HPC and Semantic Web Prof. Manuel C. Peitsch, PhD Global Head of Systems Biology

HPTS Asilomar / M. Peitsch / September, 2005

Chemistry, Chemoinformatics and Structural Biology