INBIOMEDvision Workshop at MIE 2011. Victoria López
2nd Consortium Meeting, Barcelona 16th May, 2011
Victoria López Alonso, PhD
Medical Bioinformatics Area
Instituto de Salud Carlos III
Spain
Bioinformatics challenges in a personalized medicine pipeline
Workshop INBIOMEDvision, MIE 2011
Bridging gaps between Bioinformatics and MI
Biomedical informatics (BMI) deals with the integrative management and synergistic exploitation of the wide and inter-related range of information that is generated and needed in healthcare settings, biomedical research institutions and health-related industry.
Overview:
Personalized medicine in current practice
Bioinformatics challenges for personalized medicine
1- Processing large-scale genomic data
2- Interpretation of functional effect of genomic variation
3- Integration of systems data
4- Translation into medical practice
Personalized medicine in current practice
Translational bioinformatics uses computational tools to analyze large biological databases and to understand disease mechanisms more fully, not only through genetics and proteomics but also by associating them with clinical data.
Advances of molecular science
•Human Genome Project in 2003
Finishing the euchromatic sequence of the human genome. Nature 2004; 431(7011): 931-945.
•Phase I HapMap project in 2005; Phase II and Phase III
A haplotype map of the human genome. Nature 2005; 437(7063): 1299-1320.
•Encyclopedia of DNA Elements (ENCODE) project in 2007
Identification and analysis of functional elements in 1% of the human genome. Nature 2007; 447(7146): 799-816.
•1000 Genomes Project in 2008
DNA sequences. A plan to capture human diversity in 1000 genomes. Science 2008; 319(5863): 395.
$1000 Genome in …2013 ??
Personalized medicine in current practice
Chemotherapy medications trastuzumab and imatinib (Gambacorti-Passerini, 2008; Hudis, 2007)
A targeted pharmacogenetic dosing algorithm is used for warfarin (International Warfarin Pharmacogenetics Consortium et al., 2009)
Incidence of adverse events for the drugs abacavir, carbamazepine and clozapine (Dettling et al., 2007; Ferrell and McLeod, 2008, 2002).
The inclusion of genetics in EHRs will provide risk assessment.
Clinical assessment incorporating a personal genome. Ashley et al. Lancet (2010)
Bentley D. “Genomes for Medicine”. (2004). Nature Insight 429, p440-446
Today, a patient's genetics are consulted for only a few diagnoses and treatments, and only in certain medical centers (cystic fibrosis, breast cancer).
With easy access to a well-annotated human genome, an individual could acquire a genetic health profile, including risk and resistance factors, that could be used to guide medical decisions.
Personalized medicine in current practice
Bioinformatics challenges for personalized medicine
1- Processing large-scale genomic data
2- Interpretation of functional effect of genomic variation
3- Integration of systems data
4- Translation into medical practice
Several informatics challenges must be addressed to create the tools that tailor medical care to each individual genome and to realize the potential of personalized medicine.
SNPs (Single Nucleotide Polymorphisms) are key enablers in realizing the concept of personalized medicine.
Sequencing technologies are becoming accessible
Whole genome < 2 weeks
1 error per 100 kb: about 30,000 erroneous variant calls per genome
The error rate of these technologies is a source of significant challenges in applications, including discovering novel variants
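The figure of 30,000 erroneous calls follows from simple arithmetic. A minimal sketch, assuming a genome length of roughly 3 billion bases (a round figure that is an assumption here, since the slide does not state the genome size):

```python
# Expected number of erroneous variant calls in one sequenced genome.
# GENOME_SIZE_BP is an assumed round figure (~3 Gb), not taken from the slides.
GENOME_SIZE_BP = 3_000_000_000
BP_PER_ERROR = 100_000  # "1 error per 100 kb" from the slide


def expected_erroneous_calls(genome_size_bp, bp_per_error):
    """Return the expected count of erroneous calls for a given error spacing."""
    return genome_size_bp // bp_per_error


print(expected_erroneous_calls(GENOME_SIZE_BP, BP_PER_ERROR))
```

Under these assumptions the result matches the 30,000 quoted on the slide, which is why every candidate novel variant needs verification.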
1-Processing large-scale genomic data
SNP: a variant whose frequency in the human population is higher than 1%
Between 100,000 and 300,000 previously undiscovered SNPs
Variant discovery: a “needle in a haystack”
Novel variants must be verified because of the false positive rate
In addition, there are other important classes of variation for clinical applications:
short insertion–deletion variants (indels), copy number variants (CNVs), structural variants (SVs)
1-Processing large-scale genomic data
New algorithms are needed to detect these variations from sequencing data
1- Processing large-scale genomic data
High quality sequence reads must be placed into their genomic context to identify variants.
The challenge is to develop new algorithms that make “de novo assembly” computationally feasible. De novo assembly is slow and complicated by repetitive elements.
Sequences are mapped to a genomic reference sequence: BLAST has traditionally been used, but its execution speed depends on the genome size.
1- Processing large-scale genomic data
New mapping and alignment algorithms
BLAT uses an indexed version of the genome (Kent, 2002).
Burrows-Wheeler Aligner (BWA) (Li and Homer, 2010).
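To illustrate why an indexed reference makes mapping fast, here is a toy sketch of exact-match lookup with a Burrows-Wheeler index and backward search. It is not BWA's implementation: the function names are hypothetical, the index is built naively, and only exact matches are handled.

```python
def build_bwt_index(reference):
    """Build a toy Burrows-Wheeler index (suffix array, C table, Occ table)."""
    text = reference + "$"  # sentinel; sorts before A, C, G, T
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)
    alphabet = sorted(set(text))
    # smaller[ch] = number of characters in text strictly smaller than ch
    smaller, total = {}, 0
    for ch in alphabet:
        smaller[ch] = total
        total += text.count(ch)
    # occ[ch][i] = number of occurrences of ch in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return suffix_array, smaller, occ


def find_exact(read, index):
    """Return sorted reference positions where `read` occurs exactly."""
    suffix_array, smaller, occ = index
    lo, hi = 0, len(suffix_array)  # half-open range of suffix-array rows
    for ch in reversed(read):  # backward search, one character per step
        if ch not in smaller:
            return []
        lo = smaller[ch] + occ[ch][lo]
        hi = smaller[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(suffix_array[lo:hi])


index = build_bwt_index("ACGTACGTTACG")
print(find_exact("ACG", index))
```

Once the index is built, each lookup takes a number of steps proportional to the read length rather than to the genome size. Real aligners extend this idea to tolerate mismatches and gaps.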
Ideally performed on a cluster or by using cloud computing
The program must allow for mismatches without resulting in false alignments
Improved quality control metrics: ratios of base transition, Mendelian inheritance errors (MIE), relative quality scores…
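One of the quality control metrics can be sketched in a few lines. This assumes that "ratios of base transition" refers to the transition/transversion (Ti/Tv) ratio of the called single-nucleotide variants; the function names and the input format (pairs of reference and alternate bases) are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """A transition swaps purine for purine or pyrimidine for pyrimidine."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snvs):
    """Compute the transition/transversion ratio of (ref, alt) base pairs."""
    substitutions = [(r, a) for r, a in snvs if r != a]
    transitions = sum(1 for r, a in substitutions if is_transition(r, a))
    transversions = len(substitutions) - transitions
    if transversions == 0:
        raise ValueError("no transversions; ratio is undefined")
    return transitions / transversions


calls = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "C"), ("G", "T")]
print(ti_tv_ratio(calls))
```

A call set whose ratio departs strongly from the value expected for the sequenced region may contain an excess of false positives, so the metric is a cheap sanity check before downstream analysis.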
2- Interpretation of functional effect
After genomic data has been processed, the functional effect and the impact of the genetic variation must be analyzed.
Genome-wide association studies (GWASs) have been used to assess the statistical associations of SNPs with many important common diseases.
GWAS provides new insights, but only a limited number of variants have been characterized, and understanding the functional relationship between variants and phenotypes remains a challenge.
https://www.wtccc.org.uk
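The statistical association assessed for a single SNP can be illustrated with an allelic chi-square test on a 2x2 table of allele counts in cases and controls. This is a minimal sketch with made-up counts, not the analysis pipeline of any particular study; it assumes all row and column totals are non-zero.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 allele table."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Illustrative counts only: 60/40 alt/ref alleles in cases, 40/60 in controls.
stat = allelic_chi_square(60, 40, 40, 60)
print(stat, p_value_1df(stat))
```

A genome-wide study repeats such a test for every SNP, so the significance threshold has to be far stricter than the usual 0.05 to keep false positives under control.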
2- Interpretation of functional effect
Important issues for predicting the effect of SNPs are data management, retrieval and quality control.
SNP databases:
•The dbSNP database (20 million validated SNPs)
•The Human Gene Mutation Database (HGMD) (SNPs associated with diseases)
•SwissVar
•Online Mendelian Inheritance in Man (OMIM) database
•PharmGKB database
•Catalogue of Somatic Mutations in Cancer (COSMIC)
Figure: Number of known SNPs (Fernald et al. 2011)
2- Interpretation of functional effect
Computational methods to predict mSNPs:
•Empirical rules (Ng and Henikoff, 2003; Ramensky et al., 2002)
•Hidden Markov Models (HMMs) (Thomas and Kejariwal, 2004)
•Neural Networks (Bromberg et al., 2008; Ferrer-Costa et al., 2005)
•Decision Trees (Dobson et al., 2006; Krishnan and Westhead, 2003)
•Random Forests (Li et al., 2009; Wainreb et al., 2010)
•Support Vector Machines (Calabrese et al., 2009)
The input features of the prediction algorithms include:
•amino acid sequence
•protein structure
•evolutionary information
New algorithms that combine knowledge-based and evolutionary information are being developed to predict the effect of SNPs:
•PANTHER uses a library of protein family HMMs. http://www.pantherdb.org/
•PolyPhen uses different sequence-based features. http://genetics.bwh.harvard.edu/pph
•MutPred evaluates the probabilities of gain or loss of structure and function upon mutation using a random forest. http://mutdb.org/mutpred
•SIFT uses a multiple sequence alignment between homologous proteins. http://sift.jcvi.org
•SNAP Sequence http://rostlab.org/services/snap
•SNPEffect http://snpeffect.vib.be
•SNPs3D Structure-based SVM predictor http://www.snps3d.org
2- Interpretation of functional effect
2- Interpretation of functional effect
Experimental tests are required to validate genetic predictions. There is a need for fast and accurate methods for gene prioritization.
Eleftherohorinou et al., 2010
Currently the most effective strategy prioritizes genes that are linked to the biological process of interest.
The input data for gene prioritization are functional annotation, protein–protein interactions, biological pathways and literature.
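The strategy of ranking genes by their links to a process of interest can be sketched with protein–protein interaction data alone. In this toy example each candidate is scored by the fraction of its interaction partners that are already known to be involved in the process. The gene names, the scoring rule and the function name are all hypothetical; published tools combine many more data sources.

```python
def prioritize(candidates, seed_genes, interactions):
    """Rank candidates by the fraction of their partners that are seed genes."""
    neighbors = {}
    for a, b in interactions:  # interactions are undirected pairs
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seeds = set(seed_genes)
    scores = {}
    for gene in candidates:
        partners = neighbors.get(gene, set())
        scores[gene] = len(partners & seeds) / len(partners) if partners else 0.0
    # Highest score first; ties are broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Placeholder identifiers, not real gene symbols.
interactions = [
    ("CAND1", "SEED1"), ("CAND1", "SEED2"),
    ("CAND2", "SEED1"), ("CAND2", "OTHER1"),
    ("CAND3", "OTHER1"),
]
print(prioritize(["CAND1", "CAND2", "CAND3"], ["SEED1", "SEED2"], interactions))
```

The ranked list tells the experimentalist which candidates to validate first, which is the practical purpose of prioritization.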
2- Interpretation of functional effect
•SUSPECT: sequence features, gene expression data, functional terms…
•ToppGene: mouse phenotype data with human gene annotations and literature
•MedSim: human disease genes with mouse genes
•ENDEAVOUR: genes involved in a known biological process
•G2D and PolySearch: data mining on biological databases
•MimMiner: text mining comparing the human phenome and disease phenotypes
•PhenoPred: uses protein sequence and function
•GeneMANIA: uses functional assays
The Gene Prioritization Portal provides comprehensive descriptions of available predictors:
http://homes.esat.kuleuven.be/~bioiuser/gpp/index.php
2- Interpretation of functional effect
Last year (2010), the first edition of the Critical Assessment of Genome Interpretation (CAGI) was organized to assess the available methods for predicting the phenotypic impact of genomic variation and to stimulate future research.
http://genomeinterpretation.org/
3-Integration of systems data
There is concern that pharmacogenomic GWAS are themselves susceptible to many limitations:
Insufficient sample size, selection biases for genetic variants, and environmental interactions may affect the outcome measures.
Multiple gene–gene interactions may underlie unexplained.
HapMap Project, 2004
3- Integration of systems data
Model selection methods have been successful with disease and trait GWAS, using selection techniques to choose multifactorial models that balance the false positive rate, statistical power and computational requirements of the search.
Dimensionality reduction methods
•Principal Components Analysis
•Information Gain
•Multifactor Dimensionality Reduction (e.g. hypertension and familial amyloid polyneuropathy type I)
Ritchie and Monsimger, 2010
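Of the dimensionality reduction methods listed above, information gain is the simplest to show in code. The sketch below scores a single SNP by how much knowing its genotype reduces the entropy of the phenotype labels. The data are invented for illustration and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(genotypes, phenotypes):
    """Reduction in phenotype entropy obtained by splitting on genotype."""
    n = len(phenotypes)
    groups = {}
    for genotype, phenotype in zip(genotypes, phenotypes):
        groups.setdefault(genotype, []).append(phenotype)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(phenotypes) - conditional


# Toy data: genotype coded as the number of minor alleles, phenotype as 0/1.
phenotype = [0, 0, 1, 1]
print(information_gain([0, 0, 2, 2], phenotype))  # genotype separates the classes
print(information_gain([0, 2, 0, 2], phenotype))  # genotype carries no information
```

SNPs with low information gain can be filtered out before the more expensive search for multifactorial models, which is the trade-off between power and computational cost described above.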
3- Integration of systems data
Naylor and Chen, 2010
No external knowledge source informs about the biology behind the interactions.
Systems biology and network approaches address the problem of complexity by integrating molecular data at multiple levels of biology, including genomes, transcriptomes, metabolomes, proteomes and functional and regulatory networks.
3- Integration of systems data
The simple “one SNP, one phenotype” approach is insufficient.
Most medically relevant phenotypes are thought to be the result of gene–gene and gene–environment interactions.
Adeyemo et al., 2010
3- Integration of systems data
Limdi and Veenstra, 2008
Drug response often depends on multiple pharmacokinetic and pharmacodynamic interactions.
Some success: studies of warfarin have linked the majority of variation in response to two genes, CYP2C9 and VKORC1, leading to an improved dosing algorithm.
Goh et al., 2007
3- Integration of systems data
Combining disparate data sources can result in novel associations:
•Disease–Gene Networks
•Chemical structures, Diseases and Protein sequences
•Epigenetic data and Drug Phenotypes
•Pathways and Gene sets
Gene Set Enrichment Analysis (GSEA), SNP Ratio Test, the Prioritizing Risk Pathways method
Assumptions must also be examined carefully!
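To make the pathway and gene-set idea concrete, the sketch below tests whether a list of associated genes overlaps a gene set more than expected by chance. Note that this is a plain hypergeometric over-representation test, a simpler relative of the methods named on the slide, and not the rank-based GSEA statistic itself. All numbers and names are hypothetical.

```python
from math import comb


def enrichment_p_value(universe_size, set_size, hits, hits_in_set):
    """P(overlap >= hits_in_set) when `hits` genes are drawn at random
    without replacement from a universe containing a set of `set_size` genes."""
    upper = min(set_size, hits)
    total = comb(universe_size, hits)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, hits - k)
        for k in range(hits_in_set, upper + 1)
    )
    return tail / total


# Illustrative numbers: 10 genes in total, a pathway of 5 genes,
# 5 associated genes, all of which fall inside the pathway.
print(enrichment_p_value(10, 5, 5, 5))
```

The test assumes that genes are independent and equally likely to be hit, which is rarely true in practice (gene length and linkage both matter). This is one reason why the assumptions behind such methods need careful examination.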
4- Translation into medical practice
Much of this research has yet to be translated to the clinic for improved patient care.
One of the areas where bioinformatics can have the greatest clinical impact is pharmacogenomics, through improved drug prescription and dosing.
Pharmacogenomic prescription and dosing algorithms need to be accessible to physicians. Martin-Sanchez et al. 2006
Warfarin dosing could save up to 60% of the cost and reduce possible adverse events
Medical practice needs to be updated to include routine pharmacogenetic testing, education and training of physicians in personalized medicine, and further clinical trials to prove the efficacy of predictions.
Bioinformatics also translates discoveries to the clinic by disseminating them through curated, searchable databases.
4-Translation into medical practice
The database of Genotypes and Phenotypes (dbGaP): www.ncbi.nlm.nih.gov/gap
The Pharmacogenomics Knowledge Base (PharmGKB): http://www.pharmgkb.org/
Pharmacogenetics-Cell line database (PACdb): http://pacdb.org/
The Adverse Event Reporting System (AERS): www.fda.gov/Drugs/
Biologically and medically focused text mining algorithms can speed the collection of this structured data, for example methods that use sentence syntax and natural language processing to derive drug–gene and gene–gene interactions from the scientific literature.
Opportunities for bioinformatics to integrate with the electronic medical record (EMR)
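The text mining idea can be illustrated with its simplest form, sentence-level co-occurrence: a drug and a gene mentioned in the same sentence are reported as a candidate pair. This sketch is much cruder than the syntax-based methods mentioned above. The function name, the dictionaries of drug and gene names, and the example text are assumptions made for illustration.

```python
import re
from itertools import product


def cooccurring_pairs(text, drugs, genes):
    """Return (drug, gene) pairs that are mentioned in the same sentence."""
    pairs = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        # Lower-cased alphanumeric tokens, so matching is case-insensitive.
        tokens = set(re.findall(r"[A-Za-z0-9]+", sentence.lower()))
        found_drugs = [d for d in drugs if d.lower() in tokens]
        found_genes = [g for g in genes if g.lower() in tokens]
        pairs.update(product(found_drugs, found_genes))
    return pairs


abstract = (
    "Warfarin response is associated with CYP2C9 and VKORC1. "
    "Imatinib is a chemotherapy medication."
)
print(sorted(cooccurring_pairs(abstract, ["warfarin", "imatinib"], ["CYP2C9", "VKORC1"])))
```

Co-occurrence finds candidate pairs but cannot tell what kind of relationship the sentence states, which is where sentence syntax and natural language processing add value.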
4- Translation into medical practice
BioBank system at Vanderbilt: www.mc.vanderbilt.edu/
RTI International with NHGRI: www.phenx.org/
http://biotic.isciii.es/
Instituto de Salud Carlos III
Medical Bioinformatics Area
Thanks!
http://www.inbiomedvision.eu/