INBIOMEDvision Workshop at MIE 2011. Victoria López


Upload: inbiomedvision

Post on 26-Jan-2015


Page 1: INBIOMEDvision Workshop at MIE 2011. Victoria López

2nd Consortium Meeting, Barcelona 16th May, 2011

Victoria López Alonso, PhD
Medical Bioinformatics Area, Instituto de Salud Carlos III, Spain

Bioinformatics challenges in a personalized medicine pipeline

Workshop INBIOMEDvision, MIE 2011

Page 2: INBIOMEDvision Workshop at MIE 2011. Victoria López

Bridging gaps between Bioinformatics and MI

Biomedical informatics (BMI) deals with the integrative management and synergistic exploitation of the wide, inter-related scope of information that is generated and needed in healthcare settings, biomedical research institutions and health-related industry.

Page 3: INBIOMEDvision Workshop at MIE 2011. Victoria López

Overview:

Personalized medicine in current practice

Bioinformatics challenges for Personalized medicine
1- Processing large-scale genomic data
2- Interpretation of functional effect of genomic variation
3- Integration of systems data
4- Translation into medical practice

Page 4: INBIOMEDvision Workshop at MIE 2011. Victoria López

Personalized medicine in current practice

Translational bioinformatics uses computational tools to analyze large biological databases and to understand disease mechanisms, not only through genetics and proteomics but also by associating them with clinical data.

Page 5: INBIOMEDvision Workshop at MIE 2011. Victoria López

Advances of molecular science

•Human Genome Project in 2003
Finishing the euchromatic sequence of the human genome. Nature 2004; 431(7011): 931-945.

•Phase I HapMap project in 2005; Phase II and Phase III
A haplotype map of the human genome. Nature 2005; 437(7063): 1299-1320.

•Encyclopedia of DNA Elements (ENCODE) project in 2007
Identification and analysis of functional elements in 1% of the human genome. Nature 2007; 447(7146): 799-816.

•1000 Genomes Project in 2008
DNA sequences. A plan to capture human diversity in 1000 genomes. Science 2008; 319(5863): 395.

$1000 Genome in …2013 ??

Page 6: INBIOMEDvision Workshop at MIE 2011. Victoria López

Personalized medicine in current practice

•Chemotherapy medications trastuzumab and imatinib (Gambacorti-Passerini, 2008; Hudis, 2007).

•A targeted pharmacogenetic dosing algorithm is used for warfarin (International Warfarin Pharmacogenetics Consortium et al., 2009).

•Genetic markers are associated with the incidence of adverse events for the drugs abacavir, carbamazepine and clozapine (Dettling et al., 2007; Ferrell and McLeod, 2008, 2002).

The inclusion of genetics in EHRs will provide risk assessment. Clinical assessment incorporating a personal genome. Ashley et al. Lancet (2010)

Page 7: INBIOMEDvision Workshop at MIE 2011. Victoria López

Personalized medicine in current practice

Today, patients' genetics are consulted for only a few diagnoses and treatments, and only in certain medical centers (e.g. cystic fibrosis, breast cancer).

With easy access to a well-annotated human genome, an individual could acquire a genetic health profile, including risk and resistance factors, that could be used to guide medical decisions.

Bentley D. “Genomes for Medicine”. (2004). Nature Insight 429, p440-446

Page 8: INBIOMEDvision Workshop at MIE 2011. Victoria López

Bioinformatics challenges for Personalized medicine

1- Processing large-scale genomic data
2- Interpretation of functional effect of genomic variation
3- Integration of systems data
4- Translation into medical practice

Several informatics challenges must be addressed to create the tools that tailor medical care to each individual genome and realize the potential of personalized medicine.

Page 9: INBIOMEDvision Workshop at MIE 2011. Victoria López

1- Processing large-scale genomic data

SNPs (Single Nucleotide Polymorphisms) are key enablers in realizing the concept of personalized medicine.
SNP: a variant whose frequency in the human population is higher than 1%.

Sequencing technologies are becoming accessible:
•Whole genome in < 2 weeks
•1 error per 100 kb, i.e. about 30,000 erroneous variant calls

The error rate of these technologies is a source of significant challenges in applications, including the discovery of novel variants.
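The figure of 30,000 erroneous calls on this slide can be reproduced with simple arithmetic. The sketch below assumes a haploid human genome of roughly 3 billion bases; that size is not stated on the slide and is only an approximation, and the function name is illustrative.

```python
# Assumption (not on the slide): a haploid human genome of about 3 billion bases.
GENOME_SIZE_BP = 3_000_000_000


def expected_erroneous_calls(genome_size_bp: int, bases_per_error: int) -> int:
    """Expected number of erroneous calls at one error every `bases_per_error` bases."""
    return genome_size_bp // bases_per_error


# 1 error per 100 kb, as quoted on the slide.
print(expected_erroneous_calls(GENOME_SIZE_BP, 100_000))  # 30000
```

Even a low per-base error rate therefore yields tens of thousands of false calls, which is why novel variants need independent verification.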

Page 10: INBIOMEDvision Workshop at MIE 2011. Victoria López

1- Processing large-scale genomic data

Between 100,000 and 300,000 previously undiscovered SNPs
Variant discovery: a “needle in a haystack”

Novel variants must be verified because of the false positive rate.

In addition, there are other important classes of variation for clinical applications:
•short insertion–deletion variants (indels)
•copy number variants (CNVs)
•structural variants (SVs)

New algorithms are needed to detect these variations from sequencing data.

Page 11: INBIOMEDvision Workshop at MIE 2011. Victoria López

1- Processing large-scale genomic data

High quality sequence reads must be placed into their genomic context to identify variants.

The challenge is to develop new algorithms that make de novo assembly computationally feasible. De novo assembly is slow and complicated by repetitive elements.

Sequences are mapped to a genomic reference sequence: BLAST has traditionally been used, but its execution speed depends on the genome size.

Page 12: INBIOMEDvision Workshop at MIE 2011. Victoria López

1- Processing large-scale genomic data

New mapping and alignment algorithms:
•BLAT: uses an indexed version of the genome (Kent, 2002)
•Burrows-Wheeler Aligner (BWA) (Li and Homer, 2010)
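To illustrate why an indexed reference speeds up mapping, here is a toy sketch of exact read matching with a Burrows-Wheeler transform and backward search. It is not BWA's implementation: real aligners use compressed indexes and tolerate mismatches and gaps, and the function names and the example reference are hypothetical.

```python
from collections import Counter


def build_index(reference: str):
    """Build a naive suffix array, first-column offsets and BWT occurrence table."""
    text = reference + "$"  # sentinel, sorts before A/C/G/T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is the sentinel when i == 0
    counts = Counter(text)
    # first[c]: number of characters in the text strictly smaller than c
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    # occ[c][i]: occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in counts:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, first, occ


def find_exact(read: str, index) -> list:
    """Return sorted reference positions where `read` occurs exactly."""
    sa, first, occ = index
    lo, hi = 0, len(sa)  # half-open range of matching suffixes
    for c in reversed(read):  # backward search: extend the match leftwards
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = build_index("ACGTACGTTAGC")  # hypothetical reference
print(find_exact("ACG", index))  # [0, 4]
```

Once the index is built, each lookup costs a number of steps proportional to the read length rather than to the genome size, which is the property that makes this family of methods attractive for large references.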

Mapping is ideally performed on a cluster or using cloud computing.
The program must allow for mismatches without producing false alignments.
Quality control metrics need to improve: base transition ratios, Mendelian inheritance errors (MIE), relative quality scores…
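Two of the quality control metrics named above can be sketched in a few lines. The example assumes that "ratios of base transition" refers to the transition/transversion ratio, and that genotypes are available as pairs of alleles for a child and both parents; the function names and sample data are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """A transition swaps purine for purine or pyrimidine for pyrimidine."""
    return ref != alt and ({ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES)


def ts_tv_ratio(snvs) -> float:
    """Transition/transversion ratio over (ref, alt) single-nucleotide variants."""
    ts = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    tv = sum(1 for ref, alt in snvs if ref != alt and not is_transition(ref, alt))
    return ts / tv if tv else float("inf")


def is_mendelian_error(child, mother, father) -> bool:
    """True if the child's genotype cannot be formed from one allele of each parent."""
    a, b = child
    consistent = (a in mother and b in father) or (a in father and b in mother)
    return not consistent


# Hypothetical calls: three transitions and one transversion.
print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C"), ("G", "A")]))  # 3.0
# Child A/A cannot inherit an A from a G/G mother.
print(is_mendelian_error(("A", "A"), ("G", "G"), ("A", "G")))  # True
```

A ratio far from the value expected for the sequenced region, or a high rate of Mendelian errors in trios, suggests that many of the calls are artefacts.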

Page 13: INBIOMEDvision Workshop at MIE 2011. Victoria López

2- Interpretation of functional effect

After genomic data have been processed, the functional effect and the impact of the genetic variation must be analyzed.

Genome-wide association studies (GWAS) have been used to assess the statistical associations of SNPs with many important common diseases.

GWAS provide new insights, but only a limited number of variants have been characterized, and the functional relationship between variants and phenotypes is still poorly understood.

https://www.wtccc.org.uk
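The statistical association that a GWAS tests at each SNP can be illustrated with an allelic chi-square test on a 2x2 table of allele counts in cases and controls. This is a minimal sketch with hypothetical counts; it assumes all row and column totals are non-zero and ignores the multiple-testing correction that a real genome-wide scan requires.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square (1 degree of freedom) and p-value for a 2x2 allele table."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    n = sum(map(sum, table))
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: the alternative allele is enriched in cases.
chi2, p = allelic_chi_square(60, 40, 40, 60)
print(chi2)  # 8.0
```

In a genome-wide study this test is repeated for every SNP, so the per-SNP significance threshold has to be far stricter than the usual 0.05.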

Page 14: INBIOMEDvision Workshop at MIE 2011. Victoria López

2- Interpretation of functional effect

Important issues for predicting the effect of SNPs are data management, retrieval and quality control.

SNP databases:
•The dbSNP database (20 million validated SNPs)
•The Human Gene Mutation Database (HGMD) (SNPs associated with diseases)
•SwissVar
•Online Mendelian Inheritance in Man (OMIM) database
•PharmGKB database
•Catalogue of Somatic Mutations in Cancer (COSMIC)

Figure: number of known SNPs (Fernald et al. 2011)

Page 15: INBIOMEDvision Workshop at MIE 2011. Victoria López

2- Interpretation of functional effect

Computational methods to predict mSNPs:

•Empirical rules (Ng and Henikoff, 2003; Ramensky et al., 2002)

•Hidden Markov Models (HMMs) (Thomas and Kejariwal, 2004)

•Neural Networks (Bromberg et al., 2008; Ferrer-Costa et al., 2005)

•Decision Trees (Dobson et al., 2006; Krishnan and Westhead, 2003)

•Random Forests (Li et al., 2009; Wainreb et al., 2010)

•Support Vector Machines (Calabrese et al., 2009)

The input features of the prediction algorithms include:
•amino acid sequence
•protein structure
•evolutionary information

Page 16: INBIOMEDvision Workshop at MIE 2011. Victoria López

2- Interpretation of functional effect

New algorithms that combine knowledge-based information with evolutionary information are being developed to predict the effect of SNPs:

•PANTHER uses a library of protein family HMMs. http://www.pantherdb.org/
•PolyPhen uses different sequence-based features. http://genetics.bwh.harvard.edu/pph
•MutPred evaluates the probabilities of gain or loss of structure and function upon mutation using a random forest. http://mutdb.org/mutpred
•SIFT uses a multiple sequence alignment between homologous proteins. http://sift.jcvi.org
•SNAP (sequence-based) http://rostlab.org/services/snap
•SNPEffect http://snpeffect.vib.be
•SNPs3D: structure-based SVM predictor. http://www.snps3d.org
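The idea of using evolutionary information from a multiple sequence alignment can be sketched as follows. This is a deliberately simplified conservation check, not the scoring used by SIFT or any other tool listed on this slide: the alignment, the 0.05 threshold and the function names are hypothetical.

```python
from collections import Counter


def column_frequencies(alignment, position):
    """Relative frequency of each residue in one alignment column, ignoring gaps."""
    residues = [seq[position] for seq in alignment if seq[position] != "-"]
    total = len(residues)
    if total == 0:
        return {}
    return {aa: n / total for aa, n in Counter(residues).items()}


def predict_substitution(alignment, position, new_residue, threshold=0.05):
    """Call a substitution 'tolerated' if homologs already carry that residue there."""
    freqs = column_frequencies(alignment, position)
    if freqs.get(new_residue, 0.0) >= threshold:
        return "tolerated"
    return "possibly damaging"


# Hypothetical alignment of four homologous protein fragments.
homologs = ["MKVLA", "MKILA", "MRVLA", "MKVLG"]
print(predict_substitution(homologs, 2, "I"))  # tolerated
print(predict_substitution(homologs, 0, "P"))  # possibly damaging
```

Positions that never vary across homologs are presumed to be functionally constrained, so a substitution there is flagged, while a residue already seen in related proteins is presumed to be tolerated.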

Page 17: INBIOMEDvision Workshop at MIE 2011. Victoria López

2- Interpretation of functional effect

Experimental tests are required to validate genetic predictions. There is a need for fast and accurate methods for gene prioritization.

Eleftherohorinou et al., 2010

Currently, the most effective strategy prioritizes genes that are linked to the biological process of interest. The input data for gene prioritization are functional annotation, protein–protein interactions, biological pathways and the literature.
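Ranking candidates by their links to genes already known to take part in the process of interest can be sketched with a simple "guilt by association" score over protein–protein interactions. This is an illustrative assumption about how such a score might look, not the method of any specific tool; gene identifiers and interactions are hypothetical.

```python
def prioritize(candidates, seed_genes, interactions):
    """Rank candidates by the fraction of their interaction partners that are seed genes."""
    neighbours = {}
    for a, b in interactions:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seeds = set(seed_genes)
    scores = {}
    for gene in candidates:
        partners = neighbours.get(gene, set())
        scores[gene] = len(partners & seeds) / len(partners) if partners else 0.0
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical network: S1 and S2 are genes known to act in the process of interest.
ppi = [("G1", "S1"), ("G1", "S2"), ("G2", "S1"), ("G2", "X"), ("G3", "X")]
print(prioritize(["G1", "G2", "G3"], ["S1", "S2"], ppi))
# [('G1', 1.0), ('G2', 0.5), ('G3', 0.0)]
```

Real prioritization tools combine several evidence types (annotation, pathways, literature) rather than a single network score, but the ranking principle is the same.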

Page 18: INBIOMEDvision Workshop at MIE 2011. Victoria López

2- Interpretation of functional effect

•SUSPECT: sequence features, gene expression data, functional terms…
•ToppGene: mouse phenotype data with human gene annotations and literature
•MedSim: human disease genes with mouse genes
•ENDEAVOUR: genes involved in a known biological process
•G2D and PolySearch: data mining on biological databases
•MimMiner: text mining comparing the human phenome and disease phenotypes
•PhenoPred: uses protein sequence and function
•GeneMANIA: uses functional assays

The Gene Prioritization Portal provides comprehensive descriptions of available predictors:

http://homes.esat.kuleuven.be/~bioiuser/gpp/index.php

Page 19: INBIOMEDvision Workshop at MIE 2011. Victoria López

2- Interpretation of functional effect

Last year, the first edition of the Critical Assessment of Genome Interpretation (CAGI) was organized to assess the available methods for predicting phenotypic impact of genomic variation and to stimulate future research.

http://genomeinterpretation.org/

Page 20: INBIOMEDvision Workshop at MIE 2011. Victoria López

3-Integration of systems data

There is concern that pharmacogenomic GWAS are themselves susceptible to many limitations:

•Insufficient sample size, selection biases for genetic variants and environmental interactions may affect the outcome measures.
•Multiple gene–gene interactions may underlie what remains unexplained.

HapMap Project, 2004

Page 21: INBIOMEDvision Workshop at MIE 2011. Victoria López

3- Integration of systems data

Model selection methods have been successful in disease and trait GWAS. They use selection techniques to choose multifactorial models that balance the false positive rate, statistical power and computational requirements of the search.

Dimensionality reduction methods:
•Principal Components Analysis
•Information Gain
•Multifactor Dimensionality Reduction (e.g. hypertension and familial amyloid polyneuropathy type I)

Ritchie and Monsimger, 2010
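Of the dimensionality reduction methods listed, information gain is the simplest to show in code: it measures how much knowing a SNP genotype reduces the uncertainty about the phenotype. The sketch below uses hypothetical genotype codes (0, 1, 2 copies of the minor allele) and binary phenotypes.

```python
import math
from collections import Counter


def entropy(labels) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(genotypes, phenotypes) -> float:
    """Reduction in phenotype entropy obtained by splitting samples on genotype."""
    n = len(phenotypes)
    groups = {}
    for g, p in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(p)
    conditional = sum(len(ps) / n * entropy(ps) for ps in groups.values())
    return entropy(phenotypes) - conditional


# Hypothetical SNP that separates cases (1) from controls (0) perfectly.
print(information_gain([0, 0, 2, 2], [0, 0, 1, 1]))  # 1.0
```

SNPs can then be ranked by this score and only the top ones carried forward into the more expensive search for multifactorial models.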

Page 22: INBIOMEDvision Workshop at MIE 2011. Victoria López

3- Integration of systems data

Naylor and Chen, 2010

No external knowledge source informs the biology behind the interactions.

Systems biology and network approaches address the problem of complexity by integrating molecular data at multiple levels of biology, including genomes, transcriptomes, metabolomes, proteomes and functional and regulatory networks.

Page 23: INBIOMEDvision Workshop at MIE 2011. Victoria López

3- Integration of systems data

The simple “one SNP, one phenotype” approach is insufficient. Most medically relevant phenotypes are thought to be the result of gene–gene and gene–environment interactions.

Adeyemo et al., 2010

Page 24: INBIOMEDvision Workshop at MIE 2011. Victoria López

3- Integration of systems data

Limdi and Veenstra, 2008

Drug response often depends on multiple pharmacokinetic and pharmacodynamic interactions.

Some success: studies of warfarin have linked the majority of variation in response to two genes, CYP2C9 and VKORC1, leading to an improved dosing algorithm.

Page 25: INBIOMEDvision Workshop at MIE 2011. Victoria López

Goh et al., 2007

3- Integration of systems data

Combining disparate data sources can result in novel associations:
•Disease–Gene Networks
•Chemical structures, Diseases and Protein sequences
•Epigenetic data and Drug Phenotypes
•Pathways and Gene sets

•Gene Set Enrichment Analysis (GSEA)
•SNP Ratio Test
•The Prioritizing Risk Pathways method

Assumptions must also be examined carefully!
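The pathway and gene-set methods named on this slide all ask whether a set of genes is over-represented among the genes carrying associated variants. A simple way to illustrate the question is a hypergeometric over-representation test. Note that this is a swapped-in, simpler technique: it is not the running-sum statistic used by GSEA, nor the SNP Ratio Test, and the counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_in_set, n_hits, n_hits_in_set) -> float:
    """P(X >= n_hits_in_set) when n_hits genes are drawn at random from n_background
    genes, of which n_in_set belong to the gene set of interest."""
    upper = min(n_in_set, n_hits)
    total = comb(n_background, n_hits)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_hits - k)
        for k in range(n_hits_in_set, upper + 1)
    )
    return tail / total


# Hypothetical example: all 5 associated genes fall in a 5-gene pathway out of 10 genes.
print(enrichment_p_value(10, 5, 5, 5))  # 1/252, about 0.004
```

The test assumes genes are drawn independently and with equal probability, which is exactly the kind of assumption the slide warns should be examined carefully, since gene length and linkage between SNPs can violate it.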

Page 26: INBIOMEDvision Workshop at MIE 2011. Victoria López

4- Translation into medical practice

Much of this research has yet to be translated to the clinic for improved patient care. One of the areas where bioinformatics can have the greatest clinical impact is pharmacogenomics, by improving drug prescription and dosing. Pharmacogenomic prescription and dosing algorithms need to be accessible to physicians. Martin-Sanchez et al. 2006

Warfarin dosing could save up to 60% of the cost and reduce possible adverse events.

Page 27: INBIOMEDvision Workshop at MIE 2011. Victoria López

4- Translation into medical practice

Medical practice needs to be updated to include routine pharmacogenetic testing, education and training of physicians in personalized medicine, and further clinical trials to prove the efficacy of predictions.

Bioinformatics also translates discoveries to the clinic by disseminating them through curated, searchable databases:
•The Pharmacogenomics Knowledge Database: http://www.pharmgkb.org/
•The database of Genotypes and Phenotypes: www.ncbi.nlm.nih.gov/gap
•Pharmacogenetics-Cell line database: http://pacdb.org/
•The Adverse Event Reporting System (AERS): www.fda.gov/Drugs/

Page 28: INBIOMEDvision Workshop at MIE 2011. Victoria López

4- Translation into medical practice

Biologically and medically focused text mining algorithms can speed the collection of this structured data, for example methods that use sentence syntax and natural language processing to derive drug–gene and gene–gene interactions from the scientific literature.

Opportunities for bioinformatics to integrate with the electronic medical record (EMR):
•BioBank system at Vanderbilt: www.mc.vanderbilt.edu/
•RTI International with NHGRI: www.phenx.org/
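The text mining idea mentioned on this slide can be illustrated with the simplest possible baseline: counting drug and gene names that appear in the same sentence. This is a much cruder technique than the syntax-based natural language processing the slide refers to, and the sentences, dictionaries and function name below are hypothetical.

```python
import re
from collections import Counter
from itertools import product


def cooccurrences(text, drugs, genes) -> Counter:
    """Count (drug, gene) pairs that are mentioned within the same sentence."""
    pairs = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = set(re.findall(r"[A-Za-z0-9]+", sentence.lower()))
        found_drugs = [d for d in drugs if d.lower() in tokens]
        found_genes = [g for g in genes if g.lower() in tokens]
        pairs.update(product(found_drugs, found_genes))
    return pairs


# Hypothetical abstract text, written for this example.
abstract = (
    "Warfarin response varies with CYP2C9 genotype. "
    "VKORC1 variants also alter warfarin dose. "
    "Imatinib was not discussed here."
)
print(cooccurrences(abstract, ["warfarin", "imatinib"], ["CYP2C9", "VKORC1"]))
```

Co-occurrence only suggests that a relationship may exist; parsing the sentence structure is what allows a system to say what kind of interaction is being reported.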

Page 29: INBIOMEDvision Workshop at MIE 2011. Victoria López

http://biotic.isciii.es/

Instituto de Salud Carlos III
Medical Bioinformatics Area

Thanks!

http://www.inbiomedvision.eu/