Open-Source Bioinformatics for Data Scientists with Amanda Schierz

Upload: jessica-willis

Post on 21-Apr-2017

TRANSCRIPT

Page 1: Open-Source Bioinformatics for Data Scientists with Amanda Schierz

Open Source Bioinformatics for Data Scientists

Amanda Schierz

Page 2: Open-Source Bioinformatics for Data Scientists with Amanda Schierz

Recent Projects

• Druggability prediction
  - 3D structure
  - Protein sequence
  - Predict a protein’s druggability based on its position in the protein-protein interaction network
• Drug resistance
• Therapeutic opportunities
  - Identification of new gene targets for cancer
  - Are they druggable?
• Candidate compounds
  - Compounds more likely to be a hit for a bioassay

Page 3: Open-Source Bioinformatics for Data Scientists with Amanda Schierz

Drug Discovery Process

• Early-stage Discovery: Target Evaluation; Compound Screening
• Optimisation: Computational Chemistry; Structure-based Drug Design
• ADMET: Absorption, Distribution, Metabolism, Excretion, Toxicity
• Clinical Trials: Patient Stratification; Protocol
• Paperwork: Drug Approval

Page 4: Open-Source Bioinformatics for Data Scientists with Amanda Schierz

Biology 101

• There is a many-to-many relationship between genes and proteins
• A protein is a large molecule; a drug is a small molecule
• Gene expression data
  - The amount of a gene's product (e.g. mRNA) that is produced. Epigenetics.
  - Highly / lowly / over / under expressed – reported as fold change
  - Warning: platforms and preprocessing differ
• Gene copy number
  - Loss / gain of a gene
  - On one strand or two?
• There are only approx. 400 genetic targets of approved pharmaceuticals
  - Only from a handful of protein families
  - Desperate need for diversity
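
The fold-change idea on this slide can be sketched in a few lines. This is a minimal illustration, assuming expression values are non-negative abundances on a linear scale. The pseudocount of 1 and the two-fold cut-off (|log2 fold change| of at least 1) are illustrative choices rather than values from the talk, and the function names are hypothetical.

```python
import math


def log2_fold_change(case, control, pseudocount=1.0):
    """Log2 ratio of case to control expression.

    The pseudocount (an assumed convention) avoids division by zero
    when a gene is not detected in one of the conditions.
    """
    return math.log2((case + pseudocount) / (control + pseudocount))


def classify(lfc, threshold=1.0):
    """Label a gene as over/under expressed using an assumed two-fold cut-off."""
    if lfc >= threshold:
        return "over"
    if lfc <= -threshold:
        return "under"
    return "unchanged"


# Hypothetical expression values: (tumour, normal)
examples = {"GENE_A": (7.0, 1.0), "GENE_B": (1.0, 7.0), "GENE_C": (5.0, 5.0)}
for gene, (tumour, normal) in examples.items():
    lfc = log2_fold_change(tumour, normal)
    print(gene, round(lfc, 2), classify(lfc))
```

Working on the log scale makes a doubling and a halving symmetric (+1 and -1), which is why fold changes are usually reported this way. Values from different platforms or preprocessing pipelines are not directly comparable without normalisation.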

Page 5: Open-Source Bioinformatics for Data Scientists with Amanda Schierz

• TCGGTCAGGCTAGCCGTTACAGGG

Page 6: Open-Source Bioinformatics for Data Scientists with Amanda Schierz

Target Identification

• Prediction of disease-associated genes
  - patient level
  - gene / protein level
  - network
• Prediction of mechanisms of disease
  - Epigenetic targets – meta-targets
• Prediction of protein function – from sequence / structure / network
  - multi-class; multi-label
• Prediction of 3D structure
• Prediction of protein binding
  - New immune targets

Page 7: Open-Source Bioinformatics for Data Scientists with Amanda Schierz

Druggability Prediction

• Drugs – FDA approved, ~350. Very strict – known therapeutic benefit
• DrugBank – loose – binds but no therapeutic benefit
• Tractable or druggable
  - Rule of 5 compliant
• Precedence-based
  - Druggable families / homology
  - Ligand-based scoring
  - UniProt, bioassays – EBI and PubChem BioAssay
  - Statistical analysis
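
The "Rule of 5 compliant" criterion refers to Lipinski's rule of five for small molecules: molecular weight of at most 500 Da, logP of at most 5, at most 5 hydrogen-bond donors and at most 10 hydrogen-bond acceptors. A minimal sketch of such a check follows. It assumes the descriptors have already been computed by a cheminformatics toolkit; the class name, function names and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (assumed inputs)."""
    mol_weight: float       # molecular weight in Daltons
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d):
    """Count how many of Lipinski's four criteria the compound breaks."""
    checks = [
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def is_rule_of_five_compliant(d, max_violations=1):
    """A common convention tolerates a single violation; pass 0 to be strict."""
    return rule_of_five_violations(d) <= max_violations


# Illustrative values only, not measurements of real compounds
small = Descriptors(mol_weight=180.2, log_p=1.2, h_bond_donors=1, h_bond_acceptors=4)
large = Descriptors(mol_weight=900.0, log_p=6.5, h_bond_donors=6, h_bond_acceptors=12)
print(is_rule_of_five_compliant(small), is_rule_of_five_compliant(large))
```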

Page 8: Open-Source Bioinformatics for Data Scientists with Amanda Schierz

Druggability Prediction

• Sequence analysis
  - Amino acid motifs and composition
  - Physicochemical descriptors
    - infinite amount – very wide data set
  - Supervised classification
• FASTA – can download all human sequences from UniProt
  >seq0
  FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTD
• R protr; R Bioconductor
• species,mhc,peptide_length,A,R,N,D,C,E,Q,G,H,I,L,K,M,F,P,S,T,W,Y,V,scl1.lag1,scl2.lag1,scl1.lag2,scl2.lag2,scl1.2.lag1,scl2.1.lag1,scl1.2.lag2,scl2.1.lag2,AA,RA,NA,DA,CA,EA,QA,GA,HA,IA,LA,KA,MA,FA,PA,SA,TA,WA,YA,VA,AR,RR,NR,DR,CR ..... ,Schneider.Xr.K,Schneider.Xr.M,Schneider.Xr.F, Grantham.Xr.A,Grantham.Xr.R,
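
The single-letter and two-letter columns in the header above look like amino acid and dipeptide composition features. The sketch below shows how such features could be derived from a FASTA record using only the Python standard library. It is an illustration of the idea, not the protr implementation, and the function names are hypothetical. The residue order is copied from the header.

```python
# Residue order as it appears in the feature header on the slide
AMINO_ACIDS = "ARNDCEQGHILKMFPSTWYV"


def parse_fasta(text):
    """Return {record name: sequence} for FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else "unnamed"
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def amino_acid_composition(seq):
    """Fraction of each of the 20 standard residues (20 features)."""
    seq = seq.upper()
    counts = {a: seq.count(a) for a in AMINO_ACIDS}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return {a: counts[a] / total for a in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Fraction of each overlapping residue pair (400 features)."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not valid:
        raise ValueError("sequence contains no standard residue pairs")
    return {a + b: valid.count(a + b) / len(valid)
            for a in AMINO_ACIDS for b in AMINO_ACIDS}


fasta = ">seq0\nFQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTD\n"
sequence = parse_fasta(fasta)["seq0"]
features = {**amino_acid_composition(sequence), **dipeptide_composition(sequence)}
print(len(sequence), "residues ->", len(features), "features")
```

Even these two simple descriptor families give 420 columns for a 41-residue sequence, which illustrates why sequence-derived data sets become very wide.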

Page 9: Open-Source Bioinformatics for Data Scientists with Amanda Schierz

Druggability Prediction

• 3D structure
  - Pockets, surface area
  - Ligand interaction fingerprints
  - Supervised classification

Page 10: Open-Source Bioinformatics for Data Scientists with Amanda Schierz

3D Structure

• PDB, ProtDCal, PockDrug

Page 11: Open-Source Bioinformatics for Data Scientists with Amanda Schierz

Druggability Prediction

• Interaction network
  - Many use cases
  - Data from EBI and Y2H (yeast two-hybrid)
• List of binary interactions
  - Be careful 1: data is inherently biased
  - Be careful 2: complex interactions
• R igraph; Gephi for visualisation
  - Topological properties
  - Community analysis
  - Subgraph analysis
  - Statistical analysis, network analysis and supervised classification
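
As a rough illustration of "topological properties" computed from a list of binary interactions, the sketch below builds an undirected graph and derives each protein's degree and local clustering coefficient. It uses only the Python standard library rather than igraph, and the protein identifiers and edges are made up.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency sets from (protein_a, protein_b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions in this sketch
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def degree(adj):
    """Number of interaction partners per protein."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def clustering_coefficient(adj, node):
    """Fraction of a protein's partner pairs that also interact with each other."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


# Hypothetical binary interactions
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
graph = build_graph(edges)
for protein in sorted(graph):
    print(protein, degree(graph)[protein], round(clustering_coefficient(graph, protein), 2))
```

Features like these can be joined with sequence or structure descriptors as inputs to a supervised classifier. Because well-studied proteins tend to have more recorded interactions, degree in particular inherits the study bias the slide warns about.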

Page 12: Open-Source Bioinformatics for Data Scientists with Amanda Schierz
Page 13: Open-Source Bioinformatics for Data Scientists with Amanda Schierz

Drug Resistance

Page 14: Open-Source Bioinformatics for Data Scientists with Amanda Schierz

Drug Resistance

Page 15: Open-Source Bioinformatics for Data Scientists with Amanda Schierz

Compound Bioactivity

• Brute force mass screening
• 1000s of compounds screened in batches
• Primary assays; secondary / confirmatory assays
• Can be binary classification or regression
  - The IC50 is the concentration of a compound needed to inhibit a target's activity by 50%; a lower IC50 means a more potent compound.
  - Active / inactive: IC50 threshold
• Goal is also to identify diverse compound structures
  - Scaffold hopping
• Same kind of method as protein sequence conversion
  - Pharmacophore fingerprints
• https://www.chemaxon.com/free-software/
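
The regression and classification framings of bioactivity can be sketched as follows. For regression, IC50 values are often converted to pIC50 (the negative log10 of the molar concentration). For classification, compounds are labelled against an IC50 threshold. The 10 µM threshold, compound names and IC50 values below are illustrative assumptions, not values from the talk.

```python
import math


def pic50_from_nm(ic50_nm):
    """pIC50 = -log10(IC50 in mol/L), with IC50 supplied in nanomolar."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def label_activity(ic50_nm, threshold_nm=10_000.0):
    """Binary label: active if IC50 is at or below the (assumed) threshold."""
    return "active" if ic50_nm <= threshold_nm else "inactive"


# Hypothetical screening results: compound -> IC50 in nM
results = {"cmpd_1": 50.0, "cmpd_2": 1_000.0, "cmpd_3": 250_000.0}
for name, ic50 in results.items():
    print(name, round(pic50_from_nm(ic50), 2), label_activity(ic50))
```

The threshold choice changes the class balance substantially, so it is worth recording alongside any model trained on the resulting labels.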

Page 16: Open-Source Bioinformatics for Data Scientists with Amanda Schierz
Page 17: Open-Source Bioinformatics for Data Scientists with Amanda Schierz

Compound ADMET

• Many use cases
• ADMET of hits
  - Absorption
  - Distribution
  - Metabolism
  - Excretion
  - Toxicity
• Mutagenicity
• Protein binding

Page 18: Open-Source Bioinformatics for Data Scientists with Amanda Schierz

General Resources

• EBI (European Bioinformatics Institute) / PubChem
  - API
  - Integrates several downloadable data sources (expression, copy number, bioassays, network, disease-specific)
  - Baseline data (normal, not diseased)
• Protein Data Bank – 3D structures
• DrugBank
• Cancer – The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC)
• Coding tools – R Bioconductor, BioPerl, Biopython
• https://docs.chemaxon.com/display/docs/Documentation
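
As one example of the API access mentioned above, the sketch below builds a request for compound properties from PubChem's PUG REST service. The URL layout and the `PropertyTable` response shape reflect my understanding of that service and should be checked against the current PubChem documentation before use. The helper names are hypothetical, and the fetch function is defined but not called here because it needs network access.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base URL of the PubChem PUG REST service (assumed current)
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(compound_name, properties):
    """Build a PUG REST URL asking for the given properties as JSON."""
    return (f"{PUG_REST}/compound/name/{quote(compound_name)}"
            f"/property/{','.join(properties)}/JSON")


def fetch_properties(compound_name, properties, timeout=30):
    """Download and return the first property record (requires network)."""
    with urlopen(property_url(compound_name, properties), timeout=timeout) as resp:
        data = json.load(resp)
    return data["PropertyTable"]["Properties"][0]


# Example (not executed here):
#   fetch_properties("aspirin", ["MolecularWeight", "XLogP"])
print(property_url("aspirin", ["MolecularWeight", "XLogP"]))
```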

Page 19: Open-Source Bioinformatics for Data Scientists with Amanda Schierz

General Resources

• canSAR database
  - Integration of biological, pharmacological, chemical, structural biology and protein network data

Page 20: Open-Source Bioinformatics for Data Scientists with Amanda Schierz

Beware 101

• Non-standard gene names
• Some experiments report genes, some report proteins
• We need new drug targets, different from established ones.
  - Keep in mind when analysing results
• Cancer is difficult
  - Drug resistance
  - Data is not up to date with the science
  - Tumour heterogeneity
• Wide data = random patterns
• Different expression / sequencing platforms
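
The "wide data = random patterns" warning can be demonstrated with pure noise. The sketch below generates random features for a handful of samples and reports the strongest correlation with an arbitrary binary label. All data are simulated, and the sample and feature counts are illustrative.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_chance_correlation(n_samples, n_features, seed=0):
    """Largest |correlation| between random features and an unrelated label."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]  # alternating 0/1 classes
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, labels)))
    return best


for n_features in (5, 200, 2000):
    print(n_features, "features:", round(max_chance_correlation(10, n_features), 2))
```

With ten samples and thousands of features, some noise column will look like a strong marker purely by chance. This is why correction for multiple testing and held-out validation matter for expression-scale data.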

Page 21: Open-Source Bioinformatics for Data Scientists with Amanda Schierz

Therapeutic Opportunities

• Only approximately 350–400 protein targets
• DNA damage response (DDR) is essential for maintaining the genomic integrity of the cell
  - Currently targeted by chemotherapy and radiation. Goal is small-molecule targeting
• TCGA patient analysis: expression, copy number variation and mutation data.
  - 15 cancer disease types
• Telegraph, March 2015
  - New drugs to tackle cancer cell weak spots could end 'scattergun' chemotherapy

Laurence H. Pearl, Amanda C. Schierz, Simon E. Ward, Bissan Al-Lazikani, Frances M. G. Pearl. Therapeutic opportunities within the DNA damage response. Nature Reviews Cancer

Page 22: Open-Source Bioinformatics for Data Scientists with Amanda Schierz

Therapeutic Opportunities

• Statistical analysis of DDR deregulation in patients compared to a random set of genes
• Druggability prediction of deregulated DDR genes
• Synthetic lethality analysis of yeast DDR orthologues
  - Two genes are synthetic lethal if mutation of either alone is fine but mutation of both leads to cell death. Targeting a gene that is synthetic lethal to a cancer-relevant mutation theoretically will kill only cancer cells.
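
One common way to compare a gene set against random sets of genes is an empirical (resampling) p-value. The sketch below illustrates that general technique and is not necessarily the analysis used in the cited paper. The gene identifiers, the deregulation calls and the number of random draws are all hypothetical.

```python
import random


def empirical_p_value(deregulated, gene_set, all_genes, n_random=1000, seed=0):
    """Compare deregulation in gene_set with random gene sets of the same size.

    Returns (observed count, empirical p-value). The +1 terms keep the
    p-value above zero when no random set matches the observed count.
    """
    rng = random.Random(seed)
    size = len(gene_set)
    observed = sum(1 for g in gene_set if g in deregulated)
    at_least_as_extreme = 0
    for _ in range(n_random):
        sample = rng.sample(all_genes, size)
        if sum(1 for g in sample if g in deregulated) >= observed:
            at_least_as_extreme += 1
    return observed, (at_least_as_extreme + 1) / (n_random + 1)


# Hypothetical genome of 200 genes, 20 of which are called deregulated
all_genes = [f"G{i}" for i in range(200)]
deregulated = set(all_genes[:20])
pathway = all_genes[:10]  # stand-in for a pathway gene set
observed, p = empirical_p_value(deregulated, pathway, all_genes)
print(observed, "deregulated genes in set, empirical p =", round(p, 4))
```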

Page 23: Open-Source Bioinformatics for Data Scientists with Amanda Schierz

Therapeutic Opportunities

Page 24: Open-Source Bioinformatics for Data Scientists with Amanda Schierz

DDR Pathway Signatures