Cancer Genes lists

Alfonso Valencia, Structural and Computational Biology Programme, Spanish National Cancer Research Centre (CNIO), Madrid

BioSapiens Workshop "From Genome to Proteome and Biological Function", Brussels, April 2008


Page 1

Page 2

Cancer genes

Page 3

Transcriptome classification of B-cell non-Hodgkin lymphomas. Mohit Aggarwal et al., Cancer Cell 2007

CGH and microarray data in Ewing sarcomas. Ferreira et al., Oncogene, 2007 Oct 22


Epigenetics: The DNA Methylomes of Double-Stranded DNA Viruses Associated with Human Cancer. Agustin Fernandez-Fernandez, ….. Osvaldo Graña, Gonzalo Gomez-Lopez, David G. Pisano, Alfonso Valencia, …… Manel Esteller

Page 4

BioSapiens

Network of Excellence

€12 million shared among 26 partners in 14 countries

The objective of the BIOSAPIENS Network of Excellence is to provide a large-scale, concerted effort to annotate genome data by laboratories distributed around Europe, using both informatics tools and input from experimentalists.

The BioSapiens-sponsored project concentrated on the protein-coding loci, and in particular on the alternatively spliced products. This work is part of the BioSapiens efforts for the annotation of the human genome (www.biosapiens.info).

Page 5

_Line of action 1_: Making information about cancer genes accessible to experimental biologists.

The idea here is to take the lists of genes provided by experimental groups, starting with the one published by Sjöblom et al. (Science. 2006 Oct 13;314(5797):268-274), and add the information/annotations provided by the different groups. Other gene lists will be added as they are published, which makes it important to have the methods working as automatically as possible. We need proposals from groups on what they can provide, and we have to avoid duplication.

Represent the information for biologists: we can use protein DAS or the CARGO system (see http://cargo.bioinfo.cnio.es).

The aim of this line of action is to publish a rich resource of annotated cancer gene lists in a format useful for biologists, and the goal is to do it by this summer.

DO IT!

Page 6

A web portal to integrate customized biological information.

Page 7

• CARGO is a configurable biological web portal designed as a tool to facilitate, integrate and visualize results from Internet resources, independently of their native format or access method through the use of small agents, called widgets (or BioWidgets).

• CARGO provides pieces of minimal, relevant and descriptive biological information.

• The tool is designed to be used by experimental biologists with no training in bioinformatics.

• Available at http://cargo2.bioinfo.cnio.es

Cases I, Pisano DG, Andres E, Carro A, Fernández JM, Gómez-López G, Rodriguez JM, Vera JF, Valencia A, Rojas AM. CARGO: a web portal to integrate customized biological information. PubMed 17483515.

Page 8

Page 9

• CARGO has an iGoogle Gadget version.

• iGoogle Gadgets are simple HTML and JavaScript mini-applications served in iFrames that can be embedded in webpages and other apps.

Page 10

A widget for CARGO is described by an XML Document that contains several fields providing information and documentation.

Page 11

• How do widgets work?

3D files, SNPs

PDB/sequence alignments

• Ensembl request

Distributed Annotation System (DAS)

FTP

Asynchronous JavaScript and XML (AJAX)
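As a hedged illustration of the DAS route, the sketch below parses a DAS 1.x features response (the DASGFF XML document a server returns for a request along the lines of `<server>/das/<source>/features?segment=P04637`) and lists the features overlapping a given residue. The sample response, its feature IDs, types and coordinates are invented for the example, and the function names are hypothetical; a real widget would fetch the XML over HTTP (for instance through AJAX) instead of reading it from a string.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical DASGFF response: the element layout follows the DAS 1.x
# features document, but the IDs, types and coordinates are made up.
SAMPLE_RESPONSE = """
<DASGFF>
  <GFF version="1.0" href="http://example.org/das/demo/features?segment=P04637">
    <SEGMENT id="P04637" start="1" stop="393" version="1.0">
      <FEATURE id="feat1" label="example domain">
        <TYPE id="domain">domain</TYPE>
        <METHOD id="demo">demo</METHOD>
        <START>100</START>
        <END>290</END>
      </FEATURE>
      <FEATURE id="feat2" label="example site">
        <TYPE id="site">site</TYPE>
        <METHOD id="demo">demo</METHOD>
        <START>248</START>
        <END>248</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>
"""


@dataclass
class DasFeature:
    segment: str
    feature_id: str
    feature_type: str
    start: int
    end: int


def parse_das_features(xml_text: str) -> list[DasFeature]:
    """Collect every FEATURE of every SEGMENT in a DASGFF document."""
    root = ET.fromstring(xml_text.strip())
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.findall("FEATURE"):
            type_elem = feature.find("TYPE")
            features.append(
                DasFeature(
                    segment=segment.get("id", ""),
                    feature_id=feature.get("id", ""),
                    feature_type=type_elem.get("id", "") if type_elem is not None else "",
                    start=int(feature.findtext("START", default="0")),
                    end=int(feature.findtext("END", default="0")),
                )
            )
    return features


def features_at(features: list[DasFeature], position: int) -> list[DasFeature]:
    """Return the features whose [start, end] interval contains the residue."""
    return [f for f in features if f.start <= position <= f.end]


if __name__ == "__main__":
    feats = parse_das_features(SAMPLE_RESPONSE)
    for f in features_at(feats, 248):
        print(f.feature_id, f.feature_type, f.start, f.end)
```

Keeping parsing separate from the overlap query means the same feature list can be reused when several residues (for example all SNPs of one protein) are checked against one DAS source.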

Page 12

DAS Infrastructure

By Henning Hermjakob

Page 13

By Andreas Prlic

Page 14

• Search for a term (like "regulation") or gene name ("p53")

• See some gene lists related to cancer (Sjöblom et al., Science 2006; Matsuoka et al., Science 2007; etc.) and some protein lists.

Cancer

Spindle

Page 15

Register new widgets, log in and manage accounts with the new “Widget Manager” web form.

Page 16

• Open any classified widget by clicking its name in the menu bar at the top.

• See the global information related to the query in the “Input description panel”.

Page 17

BioSapiens Ontology

• Aim: standardise DAS feature types

• Developed a protein feature ontology in close collaboration with UniProt and HUPO PSI

• Three main branches:

– Positional features: “Donated” terms to the existing Sequence Ontology from GO consortium

– Protein Modifications: Adopted the existing PSI MI MOD ontology

– Non-positional features: BioSapiens

• Delivered as De107.8

By Gabby Reeves and Henning Hermjakob

Page 18

By Ildefonso Cases

Page 19

By Ildefonso Cases

Page 20

By Ildefonso Cases

Page 21

By Ildefonso Cases

Page 22

By Ildefonso Cases

Page 23

By Ildefonso Cases

Page 24

By Ildefonso Cases

Page 25

By Ildefonso Cases

Page 26

CIPF, Joaquin Tarraga: FatiGo (GO classification assignments); IDConverter (ID translator)

PCB, Adam Hospital: MoDel (Molecular Dynamics Extended Library); Pmut (prediction of pathological mutations)

BSC, Dmitry Repchevsky: 3D-Annotation (domain annotation on 3D structures)

CNB, Natalia Jimenez: Visual Genomics (gene expression on anatomical atlases)

Teresa Paramo: Gene2SNPs (SNPs in HapMap); Gene2tagSNPs (tag SNPs); Gene3GADStudies (association studies)

UPF, Nuria Bigas: CGPROP (cancer gene properties)

MIPS, Philip Wong: Corum, the Comprehensive Resource of Mammalian protein complexes (http://mips.gsf.de/genre/proj/corum/)

PBD Cb-Cb 8a, Pawel Smialowski (data are calculated directly from structures of biological units)

Univ Roma, Alejandro Giorgetti, Tiziana Castrignano, Ildefonso Cases (CNIO): PMDB, Protein Models Database (http://mi.caspur.it/PMDB/)

MPI Inf., Fidel Rodriguez: annotation similarity

EBI-Thornton, David Talavera: CSA and PDBsum (http://www.ebi.ac.uk/thornton-srv/databases/CSA/)

EBI-Brazma, Misha Kapushesky: ArrayExpress top 5 experiments (http://www.ebi.ac.uk/microarray-as/aew/)

Uni Bologna, Piero Fariselli, Ildefonso Cases (CNIO): PhD-SNP, predictor of human deleterious single nucleotide polymorphisms (http://gpcr2.biocomp.unibo.it/cgi/predictors/PhD-SNP/PhD-SNP.cgi)

CBS, Peter Wad Sackett (service), Ildefonso Cases (CNIO): ProtMod, protein modification and transmembrane predictions (http://www.cbs.dtu.dk/services/)

UCL, Corin Yeats, Jonathan Lees: Gene3D and CATH

ENSEMBL, Andreas Prlic

CNIO: iHop (Jose Manuel Rodríguez): text mining; OMIM (Jose Maria Fernández): disease; FunCut (Jose Manuel Rodríguez): function; AllDomains (Ildefonso Cases): domains; Enviro (Jaime Fernández): interactions; SNP 3D (Ildefonso Cases): structure and SNPs; Mutation Viewer (Jaime Fernández): cancer mutations; general framework (Angel Carro, Eduardo Andrés León)

BioSapiens Widgets

By Ildefonso Cases

Page 27

Combining SNP3D and OMIM facilitates the study of the structural consequences of each variant (SNPs and/or mutations). In this case the mutation “0001, R248” is clearly part of the DNA interaction site.

A comparative study with OMIM shows that R249S, associated with hepatocellular carcinoma, is not related to DNA binding. Could it be related to phenotypic differences?

The “Functional Residues” widget reports that S249 is involved in ligand binding.

The SNP-3D widget shows that this residue is part of the interaction interface between P53 and P53-BP (1GZH structure) and part of the interaction with the SV40 oncoprotein (2H1L structure).

“Enviro” Widget provides additional information on other interactions.

By Ildefonso Cases

Page 28

_Line of action 2_: Annotating with detailed manual interpretation of genes potentially associated with cancer and the mutations already detected.

The plan here is to collaborate with the Sanger Cancer Genome Project in the analysis of their list of genes, in particular the analysis of human protein kinases in a large collection of cancers (Greenman ... Futreal and Stratton. Patterns of somatic mutation in human cancer genomes. Nature. 2007 Mar 8;446(7132):153-8.).

We will study the possible functional consequences of the mutations, knowing that fewer than one third of them are truly related to cancer. This will require a combination of structural bioinformatics and genomics (e.g. splicing analysis, comparative genomics).

The automatic results of modelling and analysis tools will not be sufficient, and we have to think about how to develop a sufficiently robust analysis framework that is also valid for other families.

The interested users will be cancer groups searching for targets and interested in the relationships between cancer, genes, SNPs and mutations.

For Discussion

Page 29

Page 30

Driver vs. Passenger mutations

There are two different kinds of mutations that arise as the cancer cell spreads:

- Driver mutations: mutations that confer a growth advantage on the cell in which they occur, are causally implicated in cancer development and have therefore been positively selected. By definition, they are found in cancer cells.

- Passenger mutations: mutations not subject to positive selection. They are present in the cell that was the progenitor of the final clonal expansion of the cancer, are biologically neutral and do not confer a growth advantage.

Figure: normal tissue, mutation, cancer; passenger and driver mutations (Greenman et al., Nature 2007; Wood et al., Science 2007)

Page 31

Single Nucleotide Polymorphisms

A SNP is a DNA sequence variation occurring when a single nucleotide in the genome differs between members of a species (or between paired chromosomes in an individual).

Almost all common SNPs have only two alleles, so we say they are dimorphic.

Within a population, SNPs can be assigned a minor allele frequency (MAF): the fraction of chromosomes in the population that carry the less common allele. Only variants with a MAF of ≥ 1% (or 0.5%, depending on the dataset) are given the title "SNP". Note that allele frequencies vary between human populations, so a SNP allele that is common in one geographical or ethnic group may be much rarer in another.
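The minor allele frequency and the SNP cut-off can be written down directly. The sketch below is a minimal illustration assuming a biallelic site whose allele counts are taken over chromosomes (two per diploid individual); the function names and the example counts are hypothetical.

```python
def minor_allele_frequency(allele_counts: dict[str, int]) -> float:
    """Fraction of chromosomes carrying the less common allele at a biallelic site."""
    if len(allele_counts) != 2:
        raise ValueError("this sketch only handles biallelic (dimorphic) sites")
    if any(count < 0 for count in allele_counts.values()):
        raise ValueError("allele counts must be non-negative")
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no chromosomes observed")
    return min(allele_counts.values()) / total


def is_snp(allele_counts: dict[str, int], threshold: float = 0.01) -> bool:
    """Apply the MAF cut-off (1% by default; some datasets use 0.5%)."""
    return minor_allele_frequency(allele_counts) >= threshold


if __name__ == "__main__":
    # 100 diploid individuals = 200 chromosomes, 20 of which carry T.
    print(minor_allele_frequency({"C": 180, "T": 20}))   # 0.1
    print(is_snp({"A": 995, "G": 5}))                    # False at the 1% cut-off
    print(is_snp({"A": 995, "G": 5}, threshold=0.005))   # True at the 0.5% cut-off
```

The same variant can pass or fail depending on the threshold, which is why the dataset's cut-off should be recorded alongside any SNP list.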

SNPs can occur anywhere in the genome:
- within the coding sequences of genes
- in non-coding regions of genes
- in intergenic regions between genes.

A SNP within a coding sequence is termed synonymous (sometimes called a silent mutation) when both alleles encode the same polypeptide sequence, owing to the degeneracy of the genetic code; if the alleles produce different polypeptide sequences, the SNP is non-synonymous.
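The synonymous/non-synonymous distinction can be checked mechanically by translating the reference and alternative codons. A minimal sketch, assuming the standard genetic code (NCBI translation table 1), an in-frame coding sequence starting at its first base, and 0-based positions; the function name and the toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snv(cds: str, position: int, alt_base: str) -> str:
    """Classify a single-base change at a 0-based position of an in-frame CDS."""
    cds = cds.upper()
    alt_base = alt_base.upper()
    if alt_base not in BASES:
        raise ValueError(f"unexpected base {alt_base!r}")
    if cds[position] == alt_base:
        raise ValueError("alternative allele is identical to the reference")
    codon_start = position - position % 3
    ref_codon = cds[codon_start:codon_start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position falls in an incomplete codon")
    offset = position % 3
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    # A change to or from a stop codon ('*') is counted as non-synonymous.
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "non-synonymous"


if __name__ == "__main__":
    cds = "CTGCGC"  # toy sequence: CTG (Leu) CGC (Arg)
    print(classify_coding_snv(cds, 2, "A"))  # CTG -> CTA, Leu -> Leu
    print(classify_coding_snv(cds, 4, "A"))  # CGC -> CAC, Arg -> His
```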

SNPs that are not in protein coding regions may still have consequences for gene splicing, transcription factor binding, or the sequence of non-coding RNA.

By Jose M. G.-Izarzugaza

Page 32

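Maximal distance between changes:

$D_{\max} = \dfrac{x_n - x_1}{L}$

where $x_1$ and $x_n$ are the positions of the first and last change along the sequence and $L$ is the sequence length.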

- Cancer-related mutations from the paper by Sjöblom et al. (2006)
- Ten randomly generated sets of positions
- SNPs downloaded from Ensembl

By David Talavera
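Taking the maximal distance between changes as D_max = (x_n − x_1)/L, with x_1 and x_n the first and last changed positions and L the sequence length, the comparison against random sets can be sketched as follows. This assumes 1-based residue positions and draws the ten random sets uniformly without replacement; the helper names, the protein length and the observed positions are hypothetical, and this is not presented as the authors' procedure.

```python
import random


def max_distance(positions: list[int], length: int) -> float:
    """D_max = (x_n - x_1) / L for the sorted positions x_1 <= ... <= x_n."""
    if not positions:
        raise ValueError("need at least one position")
    if length <= 0:
        raise ValueError("sequence length must be positive")
    xs = sorted(positions)
    return (xs[-1] - xs[0]) / length


def random_position_sets(
    length: int, n_positions: int, n_sets: int = 10, seed: int = 0
) -> list[list[int]]:
    """Draw n_sets sets of distinct random positions in 1..length."""
    rng = random.Random(seed)
    return [rng.sample(range(1, length + 1), n_positions) for _ in range(n_sets)]


if __name__ == "__main__":
    length = 393                      # hypothetical protein length
    observed = [175, 245, 248, 273]   # hypothetical mutated positions
    d_obs = max_distance(observed, length)
    d_random = [max_distance(s, length) for s in random_position_sets(length, len(observed))]
    print(f"observed D_max = {d_obs:.3f}")
    print("random D_max values:", [round(d, 3) for d in d_random])
```

Mutations clustered in one region give a D_max well below the values obtained for random sets of the same size, which is the kind of contrast the slide's comparison looks for.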

Page 33

Effect of mutations: effect on functional sites

Site type | Cancer-related mutations | Random positions
Ligand-binding | 17% | 21%
Metal-binding | 7% | 7%
Nucleic acid-binding | 10% | 11%
Catalytic | 0% | 0%

By David Talavera

Page 34

Effect of mutations: kind of substitution

Substitution type | Cancer-related mutations | SNPs
Conservative changes | 55.1% | 55.3%
Non-conservative changes | 44.9% | 44.7%

By David Talavera

• Cancer mutations are not randomly distributed along the sequence; however, they show no relation to functional sites.

• Cancer-related mutations do not occur at extremely conserved positions.

• Cancer-related mutations do not seem to be more drastic than SNPs.
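Whether a substitution counts as conservative depends on how amino acids are grouped, and the slides do not say which grouping was used. The sketch below assumes one common grouping by side-chain properties and substitutions written as reference residue, position, alternative residue (for example R248W); the grouping, function names and example list are assumptions for illustration.

```python
import re

# Assumed grouping of amino acids by side-chain properties.
GROUPS = {
    "aliphatic": "AVLIM",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "cysteine": "C",
    "glycine": "G",
    "proline": "P",
}
AA_TO_GROUP = {aa: group for group, members in GROUPS.items() for aa in members}
SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def is_conservative(substitution: str) -> bool:
    """True if a substitution such as 'R248K' stays within one group."""
    match = SUBSTITUTION.fullmatch(substitution)
    if match is None:
        raise ValueError(f"cannot parse {substitution!r}")
    ref, _, alt = match.groups()
    if ref == alt:
        raise ValueError("not an amino acid substitution")
    return AA_TO_GROUP[ref] == AA_TO_GROUP[alt]


def conservative_fraction(substitutions: list[str]) -> float:
    """Fraction of substitutions classified as conservative."""
    if not substitutions:
        raise ValueError("empty list")
    return sum(is_conservative(s) for s in substitutions) / len(substitutions)


if __name__ == "__main__":
    example = ["R248K", "R248W", "R249S", "V143A"]  # illustrative list
    print(conservative_fraction(example))  # R248K and V143A are conservative
```

Running the same function over a cancer mutation list and a SNP list gives the two columns of the substitution table above, under whatever grouping is chosen.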

Page 35

Protein kinases are enzymes that modify other proteins by chemically adding phosphate groups to them (phosphorylation). Phosphorylation usually results in a functional change of the target protein (substrate) by changing enzyme activity, cellular location, or association with other proteins.

The chemical activity of a kinase involves removing a phosphate group from ATP and covalently attaching it to one of three amino acids that have a free hydroxyl group. Most kinases act on both serine and threonine, others act on tyrosine, and a number (dual specificity kinases) act on all three.

The human genome contains about 520 protein kinase genes [Manning et al., 2001]. Dysregulated kinase activity is a frequent cause of disease, particularly cancer, since kinases regulate cell growth, movement and death. Protein kinase is the most commonly found domain in known cancer genes [Futreal et al., 2004]. Since protein kinases have key effects on the cell, their activity is highly regulated:
- by phosphorylation (sometimes autophosphorylation)
- by binding of activator or inhibitor proteins
- by binding of activator or inhibitor small molecules
- by controlling their location in the cell relative to their substrates.

Drugs which inhibit specific kinases are being developed to treat several diseases, and some are currently in clinical use, including Gleevec (imatinib, leukaemia) and Iressa (gefitinib, lung cancer).

Protein Kinases

By Jose M. G.-Izarzugaza

Page 36

Many Structures

Inactive / Active

Kinases undergo a large articulated motion when they turn “on” and “off”

Source: Src tyrosine kinase from the Protein Data Bank

By Jose M. G.-Izarzugaza

Page 37

Query Family (Kinases)

Family Members (From Kinbase)

Family Representatives (From PDB)

Feature Distribution Analysis

Multiple Structure Alignment

Get SNPs

Map SNPs onto PDBs

Mutation analysis workflow for SNPs (very similar for mutations)

By Jose M. G.-Izarzugaza
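The "Map SNPs onto PDBs" step amounts to translating sequence positions into structure residue numbers through an alignment. A minimal sketch, assuming the pairwise alignment between the kinase sequence and the PDB chain has already been computed with any aligner and that the structure's residue numbers are known; the function names and toy sequences are hypothetical, and this is not the workflow's actual implementation.

```python
from typing import Optional


def alignment_position_map(
    seq_aln: str, pdb_aln: str, pdb_residue_numbers: list[int]
) -> dict[int, int]:
    """Map 1-based query positions to PDB residue numbers.

    seq_aln and pdb_aln are the two rows of a pairwise alignment ('-' = gap);
    pdb_residue_numbers gives one residue number per non-gap character of
    pdb_aln, in alignment order.
    """
    if len(seq_aln) != len(pdb_aln):
        raise ValueError("alignment rows must have equal length")
    if sum(c != "-" for c in pdb_aln) != len(pdb_residue_numbers):
        raise ValueError("need one residue number per non-gap structure residue")
    mapping = {}
    seq_pos = 0    # 1-based position in the query sequence
    pdb_index = 0  # 0-based index into pdb_residue_numbers
    for s, p in zip(seq_aln, pdb_aln):
        if s != "-":
            seq_pos += 1
        if s != "-" and p != "-":
            mapping[seq_pos] = pdb_residue_numbers[pdb_index]
        if p != "-":
            pdb_index += 1
    return mapping


def map_variants(positions: list[int], mapping: dict[int, int]) -> dict[int, Optional[int]]:
    """PDB residue number for each variant, or None if the structure lacks it."""
    return {pos: mapping.get(pos) for pos in positions}


if __name__ == "__main__":
    # Toy alignment: the structure starts at query residue 3 and lacks residue 7.
    m = alignment_position_map("MKTAYIAK", "--TAYI-K", [94, 95, 96, 97, 98])
    print(map_variants([5, 7], m))  # position 7 is not covered by the structure
```

Variants that map to None are the ones that fall in unresolved or missing regions of the structure and cannot be analysed structurally with that PDB entry.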

Page 38

Statistics on the PK PDB retrieval

Total human sequences in Kinbase: 620
Sequences in Kinbase that are not pseudogenes: 516
Sequences with known Swiss-Prot ID (assigned by BLAST): 488
Sequences with known Swiss-Prot ID, BLAST identity > 95%: 474
Kinases with at least one solved protein structure (PDB): 145
Human kinase sequences in the multiple sequence alignment: 266

Total number of SNPs (kinase domain): 569
Synonymous SNPs: 263
Non-synonymous SNPs: 306

Total number of mutations (kinase domain): 140
Driver mutations: 73
Passenger mutations: 63

By Jose M. G.-Izarzugaza

Page 39

TreeDet vs firedb vs conserv

By David de Juan

TreeDet vs firedb

Page 40

By David de Juan

Page 41

Mean: 3.61, Median: 4.68, St. Dev.: 3.12, Xd: 1.72

Mean: 6.50, Median: 6.35, St. Dev.: 4.16, Xd: 1.24

Mean: 11.07, Median: 10.26, St. Dev.: 7.06, Xd: -0.07

Mean: 4.34, Median: 4.94, St. Dev.: 2.58, Xd: -0.89

Mean: 6.71, Median: 5.57, St. Dev.: 3.69, Xd: -0.30

Mean: 10.26, Median: 9.94, St. Dev.: 6.32, Xd: -0.78

By Jose M. G.-Izarzugaza

Driver

Passenger

Page 42

Next

- “CARGO cancer gene list” paper to be presented tomorrow with action items (scope: Cancer Research)

- Mutation analysis is still a key challenge. Creation of analysis pipelines for all proteins and for protein families (SNPs versus mutations, driver versus passenger mutations)