Computational tools for disease gene identification
Sonia ABDELHAK, PhD
Molecular Investigation of Genetic Orphan Disorders
Institut Pasteur de Tunis

Post on 19-Dec-2015


TRANSCRIPT

Page 1: Computational tools for disease gene identification Sonia ABDELHAK, PhD Molecular Investigation of Genetic Orphan Disorders Institut Pasteur de Tunis

Computational tools for disease gene identification

Sonia ABDELHAK, PhD
Molecular Investigation of Genetic Orphan Disorders

Institut Pasteur de Tunis

Page 2:

Summary

How could we identify genes involved in human disorders?
- Positional cloning in the pre-genomic era.
- Monogenic/multifactorial diseases.

Computational tools: positional cloning in the post-genomic era.

Page 3:

Monogenic versus Complex Diseases: Genes & Environment

[Figure: diseases positioned by Genetic Component versus Environmental Effect. Labels, in extraction order: Hemophilia, Familial Colon or Breast Cancer, Alzheimer’s, Asthma, Skin Cancer, Motor Vehicle Accident, Cardiovascular Disease, Schizophrenia, Cystic Fibrosis, Stroke, Type 2 Diabetes, Lung Cancer, Bipolar Disorder.]

S.K. Brahmachari, GENOMED-HEALTH meeting

Page 4:

What could we learn from disease gene identification?

- Better understanding of the underlying biology of the trait in question
- Serve as direct targets for better treatments
  - Pharmacogenetics
  - Interventions
- Predictions of susceptibility to the disease
- Predictions of the course of the disease
- Knowledge for treatment or prevention

Page 5:

“SIMPLE” MENDELIAN GENETIC DISEASES

Diseases of Simple Genetic Architecture
- Can tell how trait is passed in a family: follows a recognizable pattern (Mendelian disease)
- One gene altered per family (exceptions)
- Usually quite rare in population (exceptions)
- “Causative” gene

Page 6:

Some examples of deleterious mutations

Stop codon creation: CAG (Gln) → TAG (stop)
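
The effect of a single-codon change such as CAG → TAG can be checked by translating both codons. The following is a minimal sketch, assuming the standard genetic code; the function name `classify_substitution` is illustrative and does not come from the slides.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code: codons enumerated in TCAG order, "*" marks a stop codon.
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a codon change as synonymous, nonsense, or missense."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"  # a stop codon is created
    return "missense"  # any other amino acid change (stop-loss is not split out)


print(classify_substitution("CAG", "TAG"))  # Gln -> stop: nonsense
print(classify_substitution("GAG", "GTG"))  # Glu -> Val: missense
```

The second call uses the GAG → GTG change that appears in the positional cloning diagrams later in the talk.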

Page 7:

Modes of inheritance

- X-linked: Duchenne muscular dystrophy

Page 8:

- Autosomal dominant: Huntington disease

Page 9:

- Autosomal recessive: Cystic fibrosis

Page 10:

- Mitochondrial: Leber optic atrophy

Page 11:

Functional cloning versus positional cloning of genes

[Diagram: two parallel flows, each linking the boxes Disease, Function/Protein, Gene, and Chromosomal localisation]

Page 12:

Position-Independent Methods

- Gene-specific oligonucleotides: hemophilia A Factor VIII gene (most common form of hemophilia, X-linked)
- Clotting factor purified from pig, and its N-terminal amino acids were sequenced.
- This allowed a group of oligonucleotides to be synthesized.
- These probes were used with colony hybridization against a cDNA library.
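
A group of oligonucleotides is needed, rather than a single one, because most amino acids are encoded by several codons, so a short peptide back-translates to many possible DNA sequences. The sketch below counts how many distinct oligonucleotides would cover a peptide, assuming the standard genetic code. The function name and the example peptides are hypothetical and are not the actual Factor VIII N-terminal sequence.

```python
from collections import Counter

# Amino acids of the standard genetic code, one character per codon
# (codons enumerated in TCAG order, "*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of codons that encode each amino acid, e.g. M -> 1, L -> 6.
CODONS_PER_RESIDUE = Counter(AMINO_ACIDS)


def oligo_pool_size(peptide):
    """Return the number of DNA sequences that can encode the peptide."""
    size = 1
    for residue in peptide.upper():
        if residue == "*" or residue not in CODONS_PER_RESIDUE:
            raise ValueError(f"unknown residue: {residue!r}")
        size *= CODONS_PER_RESIDUE[residue]
    return size


# Hypothetical peptides: residues with few codons keep the probe pool small.
for peptide in ("MWDQ", "LSRL"):
    print(peptide, oligo_pool_size(peptide))
```

Stretches rich in Met and Trp (one codon each) are the usual choice for probe design because they minimise the pool size.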

Page 13:

Positional cloning of genes

[Diagram: two parallel flows, each linking the boxes Disease, Function/Protein, Gene, and Chromosomal localisation]

Page 14:

[Diagram: positional cloning workflow]

- Identification of informative families
- Genetic mapping
- Physical mapping
- Identification of coding sequences (candidate genes)
- Mutation screening
- Functional analysis

Normal: ... CCT GAG GAG ... (... Pro Glu Glu ...)
Mutant: ... CCT GTG GAG ... (... Pro Val Glu ...)

Page 15:

Genetic mapping

Which markers are used for genetic mapping?

Page 16:

Polymorphisms used in Gene Mapping

- 1980s: RFLP marker maps
- 1990s: microsatellite marker maps

Page 17:

Identification of microsatellite polymorphisms by sequence analysis:

tggtggcagaaatcattgtctgaaaagtaattgttttacttttattcttttcgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgcatgtgccagatttcttgtttgaaaggcaatgagcttcatccaagtatcaa  

IL-12p35AC F

IL-12p35AC R

atttcaggtgtgagccactgtgcctggccagaactttttcaatgaatattcaagataattgtatacacattttatatatatatatatatatacacacacacacacacacacatatgtatacacacattatatatataatccatgttatatacatctctacattatatatatccactatatatattttacttatacatatagattttatttttatgaactaggatcaaattgta

IL-12p40AC F

IL-12p40AC R

[Figure residue: values 78.57% and 69.23%; gel image with size labels 174, 170, 166 and lanes 1 to 5]
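
Finding microsatellites by sequence analysis amounts to scanning a sequence for long runs of a short repeated unit, such as the (gt)n and (ca)n stretches visible in the two sequences on this slide. The following is a minimal sketch for dinucleotide repeats using a regular expression. The function name, the minimum of 8 repeats, and the shortened test sequence are illustrative assumptions, not the procedure used by the author.

```python
import re


def find_dinucleotide_repeats(seq, min_repeats=8):
    """Return (start, unit, repeat_count) for each dinucleotide run in seq."""
    # A 2-base unit followed by at least (min_repeats - 1) more copies of itself.
    pattern = re.compile(r"([acgt]{2})\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.lower()):
        unit = match.group(1)
        if unit[0] == unit[1]:
            continue  # skip homopolymer runs such as "aaaa..."
        hits.append((match.start(), unit, len(match.group(0)) // 2))
    return hits


# Shortened, made-up stand-in for a flanked (gt)n repeat.
example = "ttttc" + "gt" * 12 + "gcatg"
print(find_dinucleotide_repeats(example))  # [(5, 'gt', 12)]
```

Primers such as the F/R pairs named on the slide would then be placed in the unique sequence flanking each reported run.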

Page 18:

SNPs in Genetic Analysis

- Abundance: lots
- Position: throughout genome
- Haplotype patterns: groups of SNPs may provide exploitable diversity
- Rapid and efficient to genotype
- Increased stability over other types of mutation

Page 19:

Gene mapping: Linkage analysis

Do marker alleles co-segregate with the disease by chance, or are they linked to the underlying gene?

Page 20:

Crossing over and Recombination

Page 21:

Recombination Fraction

θ = ½: independent assortment (Mendel)

θ < ½: linked loci

θ = 0: tightly linked loci (no recombination)

Page 22:

LOD Score Analysis

The likelihood ratio as defined by Morton (1955):

$$\frac{L(\text{pedigree} \mid \theta = x)}{L(\text{pedigree} \mid \theta = 0.50)}$$

where $\theta$ represents the recombination fraction and where $0 \le x \le 0.49$.

When all meioses are “scorable”, the LR is constructed as:

$$\text{L.R.} = \frac{\theta^{R}\,(1-\theta)^{NR}}{(0.5)^{N}}$$

where $R$ is the number of recombinant meioses, $NR$ the number of non-recombinant meioses, and $N = R + NR$.

The LOD score ($z$) is $\log_{10}(\text{L.R.})$.

- $z(\theta)$ is the lod score at a particular value of the recombination fraction.
- $z(\hat{\theta})$ is the maximum lod score, which occurs at the MLE of the recombination fraction.

H1: Linkage; H0: Exclusion; $\theta = 0$
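
The LOD score for fully scorable meioses can be computed directly from the counts of recombinants and non-recombinants. The sketch below evaluates $z(\theta)$ and scans a grid of $\theta$ values from 0 to 0.49 for the maximum. The function names and the example counts (2 recombinants, 8 non-recombinants) are illustrative assumptions.

```python
import math


def lod_score(recombinants, non_recombinants, theta):
    """log10 of theta^R (1 - theta)^NR / 0.5^N for fully scorable meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    n = recombinants + non_recombinants
    if theta == 0.0:
        # theta^R is zero as soon as one recombinant has been observed.
        return -math.inf if recombinants > 0 else n * math.log10(2)
    return (recombinants * math.log10(theta)
            + non_recombinants * math.log10(1.0 - theta)
            - n * math.log10(0.5))


def max_lod(recombinants, non_recombinants):
    """Scan theta = 0.00, 0.01, ..., 0.49 and return (best_theta, max_lod)."""
    grid = [i / 100 for i in range(50)]
    best = max(grid, key=lambda t: lod_score(recombinants, non_recombinants, t))
    return best, lod_score(recombinants, non_recombinants, best)


theta_hat, z_max = max_lod(2, 8)
print(f"theta_hat = {theta_hat:.2f}, z_max = {z_max:.3f}")
```

With these counts the maximum falls at the observed recombinant proportion, 2/10, which is the maximum-likelihood estimate in this simple setting.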

Page 23:
Page 24:

[Diagram: positional cloning workflow]

- Identification of informative families
- Genetic mapping
- Physical mapping
- Identification of coding sequences (candidate genes)
- Mutation screening
- Functional analysis

Also shown: Cytogenetic anomalies; Animal model; Functional candidate genes

1 to 10 years!

Normal: ... CCT GAG GAG ... (... Pro Glu Glu ...)
Mutant: ... CCT GTG GAG ... (... Pro Val Glu ...)

Page 25:

Branchio-oto-renal syndrome

Clinical features: deafness, renal anomalies, cervical cysts…

Mapped to 8q13.

PAC contig: 11083, 9480, 4405, 10910

cDNA library screening, cDNA selection and exon trapping

Page 26:

[Diagram: sequencing workflow for a PAC clone]

- PAC (P1 derived)
- Sonication or partial digestion
- Subcloning in pBCSK+
- Selection of clones
- Sequencing (T7, T3)
- Sequence assembly and analysis

Page 27:

The different steps used for sequence analysis

- Quality assessment
- Elimination of contaminating sequences: Blastn against vector, bacteria, yeast… databases
- Assembly using Phred, Phrap, Consed
- Identification of candidate genes by blastx and tblastx; gene prediction tool: GRAIL

[Chromatogram trace: A G C T A T]

Page 28:

BLASTX 1.4.7 [19-Dec-94] [Build 07:11:56 Jun 16 1995]

Query= w1g9t7.Seq (743 letters)

Translating both strands of query sequence in all 6 reading frames
Database: ../../databases/fasta/nrprot
          244,544 sequences; 71,258,360 total letters.
Searching..................................................done

                                                            Smallest Sum
                                                   Reading  High  Probability
Sequences producing High-scoring Segment Pairs:      Frame Score  P(N)      N
pir|S|A45174 eyes absent (eya) protein (alternatively...  -2   173  5.6e-15   1

>pir|S|A45174 eyes absent (eya) protein (alternatively spliced) - fruit fly (Drosophila melanogaster) >gp||DRONOEYE_
 Length = 760

Minus Strand HSPs:

 Score = 173 (79.6 bits), Expect = 5.6e-15, P = 5.6e-15
 Identities = 29/36 (80%), Positives = 34/36 (94%), Frame = -2

Query: 169 LCLPXGVRGGVDWMRKLAFRYRRVKEIYNTYKNNVG 62
           LCLP GVRGGVDWMRKLAFRYR++K+IYN+Y+ NVG
Sbjct: 586 LCLPTGVRGGVDWMRKLAFRYRKIKDIYNSYRGNVG 621
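
The "Identities" figure in the BLASTX report can be reproduced by comparing the two aligned protein segments position by position. The sketch below does this for the Query and Sbjct lines shown above; the function name is illustrative, and the "Positives" count is not reproduced because it depends on the substitution matrix used by BLAST.

```python
def count_identities(query, subject):
    """Return (identical_positions, alignment_length) for an ungapped alignment."""
    if len(query) != len(subject):
        raise ValueError("aligned segments must have the same length")
    matches = sum(q == s for q, s in zip(query, subject))
    return matches, len(query)


# Aligned segments copied from the BLASTX hit against Drosophila eyes absent.
QUERY = "LCLPXGVRGGVDWMRKLAFRYRRVKEIYNTYKNNVG"
SBJCT = "LCLPTGVRGGVDWMRKLAFRYRKIKDIYNSYRGNVG"

matches, length = count_identities(QUERY, SBJCT)
print(f"Identities = {matches}/{length}")  # Identities = 29/36
```

The query segment spans nucleotides 169 down to 62 on the minus strand, that is 108 bases or 36 codons, which matches the alignment length.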

11083 9480 4405 10910

Page 29:

EYA1 gene structure

[Figure: EYA1 gene structure; exon labels -1, 1', 1 to 16 (Arabic numerals) and -I, I, I', II to XV (Roman numerals)]

Identification of a new gene family: EYA1, EYA2, EYA3, …

Page 30:

COMPLEX (MULTIFACTORIAL) GENETIC DISEASE

Diseases of Complex Genetic Architecture
- No clear pattern of inheritance
- Moderate to strong evidence of being inherited
- Common in population: cancer, heart disease, dementia etc.
- Involves many genes and environment
- “Susceptibility” genes

Page 31:

Complex disease loci mapping

- Linkage Analysis: Large Families, Small Families
- Association Studies: Family-Based, Case-Control

Page 32:
Page 33:

Study Designs

- Linkage Analysis: Large Families, Small Families
- Association Studies: Family-Based, Case-Control

Page 34:

TDT calculation

[Figure: pedigree with genotype labels 12, 12, 11; 2×2 table of Transmitted versus Non-Transmitted alleles (1, 2) with cells A, B (first row) and C, D (second row)]

$$\text{TDT} = \frac{(B - C)^{2}}{B + C}$$

With > 5 per cell, this follows a χ² distribution with 1 df
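
The TDT statistic uses only the two off-diagonal cells, B and C. The sketch below computes the statistic and its p-value from the χ² distribution with 1 df, using the identity that this tail probability equals erfc(√(x/2)). The function name and the example counts (B = 30, C = 10) are illustrative assumptions.

```python
import math


def tdt(b, c):
    """Return (statistic, p_value) for the transmission disequilibrium test."""
    if b + c == 0:
        raise ValueError("no informative transmissions (B + C = 0)")
    statistic = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


stat, p = tdt(30, 10)
print(f"TDT = {stat:.1f}, p = {p:.4f}")
```

Using `math.erfc` keeps the example free of external dependencies; a statistics library would give the same tail probability.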

Page 35:

Examples: Alzheimer’s

Alzheimer’s disease and ApoE

            E4 present   E4 absent
Patients    58           33
Controls    16           55

The E4 allele appears to be positively associated with Alzheimer’s disease:

Odds Ratio = (58/16)/(33/55) ≈ 6
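
The odds ratio above can be reproduced from the four cell counts. The sketch below also adds an approximate 95% confidence interval using the standard error of the log odds ratio (Woolf's method), which is not on the slide and is included here as an assumption about how one might quantify the uncertainty.

```python
import math


def odds_ratio(case_exposed, case_unexposed, control_exposed, control_unexposed):
    """Return (odds_ratio, ci_low, ci_high) for a 2x2 case-control table."""
    a, b, c, d = case_exposed, case_unexposed, control_exposed, control_unexposed
    ratio = (a * d) / (b * c)
    # Standard error of log(OR), Woolf's approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(ratio) - 1.96 * se)
    high = math.exp(math.log(ratio) + 1.96 * se)
    return ratio, low, high


# ApoE E4 counts from the slide: patients 58/33, controls 16/55.
ratio, low, high = odds_ratio(58, 33, 16, 55)
print(f"OR = {ratio:.2f} (95% CI {low:.2f} to {high:.2f})")
```

The cross-product form (58 × 55)/(33 × 16) used in the code is algebraically the same as (58/16)/(33/55).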

Page 36:

[Figure: human genome sequence milestones: February 2001; “Finished” sequence, April 1953-April 2003]

Page 37:

[Diagram: positional cloning workflow]

- Identification of informative families
- Genetic mapping
- Physical mapping
- Identification of coding sequences (candidate genes)
- Mutation screening
- Functional analysis

Normal: ... CCT GAG GAG ... (... Pro Glu Glu ...)
Mutant: ... CCT GTG GAG ... (... Pro Val Glu ...)

Page 38:

Past and present tools

- Genetic mapping
- Physical mapping
- Cytogenetic abnormalities
- Animal models
- Positional and functional candidates

- Genome databases and genome browsers
- Comparative Genomic Hybridization
- Comparative Genomics
- Microarray analysis

Page 39:
Page 40:

Visualize all the genes in an interval

NCBI genome browser

Page 41:

UCSC genome browser

Page 42:

Ensembl genome browser

Page 43:

NCBI genome browser showing candidate region for EV

Page 44:
Page 45:

How to collect and interpret all the data?

How to choose the best “candidate” gene?

Page 46:

Strategies and adapted tools for gene selection are urgently needed!

- Find candidate genes for the trait (time and cost!)
  - WHAT genes are there?
  - WHAT do they do?
  - How could they play a role in the disease?
  - = Data mining and integration!!
- Visualization of the whole picture
  - Global view
  - Option to zoom into detail

Page 47:

http://www.esat.kuleuven.be/endeavour.

Page 48:
Page 49:
Page 50:

Disease Gene Finding

(Center for Biological Sequence Analysis)

Combining network theory and phenotype associations in an automated large scale disease gene finding platform

Networks – deducing functional relationships from network theory

Phenotype association

Grouping disorders based on their phenotype.

Page 51:

Phenotype association

(Brunner and van Driel 2004)

Phenotype clustering:

- Each arrow represents a KEYWORD vector.
- The components in a keyword vector correspond to terms in the document.
- Vectors that point in the same direction are more alike.
- Ordering phenotypes in “syndrome families” could tell us about the relationships of the underlying genes.
  - Disease gene identification.
  - Clues to gene interactions, pathways and functions.

[Figure: word vectors]
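
Saying that vectors pointing in the same direction are more alike corresponds to measuring the cosine of the angle between two keyword vectors. The sketch below builds term-count vectors from keyword lists and compares them. The function name and the keyword lists are illustrative assumptions loosely based on the clinical features mentioned in this talk, not data from the cited study.

```python
import math
from collections import Counter


def cosine_similarity(terms_a, terms_b):
    """Cosine of the angle between two term-count vectors (1.0 = same direction)."""
    vec_a, vec_b = Counter(terms_a), Counter(terms_b)
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description shares no direction with anything
    return dot / (norm_a * norm_b)


# Hypothetical keyword lists for three phenotype descriptions.
bor = ["deafness", "renal anomalies", "cervical cysts"]
bo = ["deafness", "cervical cysts"]
ev = ["skin cancer", "papillomavirus"]

print(round(cosine_similarity(bor, bo), 3))  # 0.816
print(cosine_similarity(bor, ev))            # 0.0
```

Phenotypes with a high pairwise similarity would fall into the same "syndrome family" when the full set is clustered.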

Page 52:

%608389 BRANCHIOOTIC SYNDROME 3 14q23.1 SIX1

Page 53:

SIX1 mutations cause branchio-oto-renal syndrome by disruption of EYA1-SIX1-DNA complexes.

Ruf RG, Xu PX, Silvius D, Otto EA, Beekmann F, Muerb UT, Kumar S, Neuhaus TJ, Kemper MJ, Raymond RM Jr, Brophy PD, Berkman J, Gattas M, Hyland V, Ruf EM, Schwartz C, Chang EH, Smith RJ, Stratakis CA, Weil D, Petit C, Hildebrandt F.

Department of Pediatrics, University of Michigan, Ann Arbor, MI 48109, USA.

Urinary tract malformations constitute the most frequent cause of chronic renal failure in the first two decades of life. Branchio-otic (BO) syndrome is an autosomal dominant developmental disorder characterized by hearing loss. In branchio-oto-renal (BOR) syndrome, malformations of the kidney or urinary tract are associated. Haploinsufficiency for the human gene EYA1, a homologue of the Drosophila gene eyes absent (eya), causes BOR and BO syndromes. We recently mapped a locus for BOR/BO syndrome (BOS3) to human chromosome 14q23.1. Within the 33-megabase critical genetic interval, we located the SIX1, SIX4, and SIX6 genes, which act within a genetic network of EYA and PAX genes to regulate organogenesis. These genes, therefore, represented excellent candidate genes for BOS3. By direct sequencing of exons, we identified three different SIX1 mutations in four BOR/BO kindreds, thus identifying SIX1 as a gene causing BOR and BO syndromes. To elucidate how these mutations cause disease, we analyzed the functional role of these SIX1 mutations with respect to protein-protein and protein-DNA interactions. We demonstrate that all three mutations are crucial for Eya1-Six1 interaction, and the two mutations within the homeodomain region are essential for specific Six1-DNA binding. Identification of SIX1 mutations as causing BOR/BO offers insights into the molecular basis of otic and renal developmental diseases in humans.

PMID: 15141091 [PubMed - indexed for MEDLINE]

Page 54:

Computational tools for disease gene identification
Application to EV and T2D
Olfa MESSAOUD and Manel BALI

- GENE SEEKER
- DGP
- PROSPECTR
- SUSPECTS
- G2D
- TOM

Page 55:

GeneSeeker
http://www.cmbi.ru.nl/geneseeker/

- Web tool
- Gathers and combines data from several databases (MIMMAP, MGD, GDB etc.)
- Selects positional candidate genes according to their expression and phenotypic data from both human and mouse.

Page 56:

A general overview of the GeneSeeker program

Page 57:
Page 58:

Output of the GeneSeeker program

Page 59:

G2D = Genes to Diseases
http://www.ogic.ca/projects/g2d_2/

- Scoring all terms in GO according to their relevance to each disease using MEDLINE and RefSeq.
- Identifying candidate genes by performing BLASTX searches.

Page 60:

[Screenshot residue: values 131244, q13.2, Band(s), 1]

Page 61:

[Screenshot residue: values 63950000, 73950000, Band(s), 1; 3667, 3630, 3767; "Databases used"]

Page 62:
Page 63:

DGP = Disease Gene Prediction
http://cgg.ebi.ac.uk/services/dgp/

- A decision tree-based model built on sequence properties.
- This model is then applied to all the genes in the disease loci analysed in order to obtain a probability score for these proteins to be involved in hereditary disease.

Page 64:

[Screenshot residue: values 22500000, 33200000]

Page 65:
Page 66:

PROSPECTR
http://www.genetics.med.ed.ac.uk/prospectr/

- Automatic classifier based on sequence features using the alternating decision tree algorithm, which ranks genes in the order of likelihood of involvement in disease
- Score: > 0.5 likely to be involved; < 0.5 unlikely to be involved

Page 67:
Page 68:
Page 69:
Page 70:
Page 71:
Page 72:

SUSPECTS
http://www.genetics.med.ed.ac.uk/suspects/

- Web-based server.
- Builds on PROSPECTR (sequence features) and combines annotation data (from GO, InterPro and expression libraries).

Page 73:

[Screenshot residue: values q21.1, 1]

Page 74:
Page 75:

TOM = Transcriptomics of OMIM
http://www-micrel.deis.unibo.it/~tom

An automated pipeline for the extraction of the best candidate genes for a given genetic disease.

Page 76:

Global description of the process

Page 77:
Page 78:

The second option (two loci option) is designed for poorly characterized diseases when no specific gene is a priori known. At least 2 linkage areas need to be present. (Looks for pairs that have similar expression and functional profiles.)

Page 79:

The results page (genes and GO annotation)

Page 80:

Application

- A monogenic disorder:

Epidermodysplasia verruciformis

- A multifactorial disorder:

Type 2 diabetes

Page 81:

Epidermodysplasia verruciformis (EV)

- Genetic skin disease (genodermatosis)
- Predisposition to skin cancer
- High susceptibility to human papillomavirus (HPV)

Page 82:

Genomic organisation of EV1 locus

(Ramoz et al., 2002)

Page 83:

Haplotypic analysis of microsatellites

Page 84:

(A) Sources of input data for each method, (B) number of genes in the starting candidate set and number of genes selected by each method

Methods                          GeneSeeker  DGP  Prospectr  Suspects  G2D  TOM

Input
PubMed abstracts                 X           -    -          -         X    -
Sequence data                    -           X    X          X         X    X
GO annotation                    -           -    -          X         X    X
Protein data                     X           -    -          X         X    -
Expression libraries             X           -    -          X         -    X
Orthologous mouse genes          X           -    -          -         -    -
OMIM                             X           -    -          -         -    X

Number of genes selected
EV: starting set of candidates   85          85   85         85        85   85
EV: selected genes               11          37   40         45        20   54
T2D: starting set of candidates  260         260  260        260       260  ?
T2D: selected genes              24          76   14         26        3    ?

Page 85:

Personal annotation  GeneSeeker  DGP     PROSPECTR  SUSPECTS  G2D     TOM
(SLC30A6)            ALK         HADHB   SPG4       OTOF      BFSP2   LBH
BIRC6                CARD12      SNX17   HADHA      KCNK3     KRT19   KHK
SLC5A6               MSH2        NULL    OTOF       XDH       KRT12   FOSL2
GTF3C2               PDE1C       LBH     CAD        SLC5A6    KRT18   KRT18
PREB                 POMC        BIRC6   GALNTM4    CAD       GFAP    PPP1CB
KCNK3                PPM1G       SLC5A6  KCNK3      HADHB     NEF3    GTF3C2
NRBP1                SDC1        POMC    SLC5A6     SPG4      KRT23   HADHA
SELI                 (SLC23A3)   HADHA   KIF3C      NP056477  KRT33B  KRTCAP3
RAB10                SMARCAD1    XDH     RNF30      HADHA     KRT1    FLJ20254
-                    SOS1        SRD5A2  MAPRE3     RAB10     KRT14   XDH
-                    SRD5A2      EIF2B4  DPSYSL5    CENPA     KRT35   HADHB
-                    -           OTOF    ALK        EHD3      KRT14   PPP1CB
-                    -           SPG4    XDH        GALNT14   KRT15   HIBCH

Comparison between results obtained by each method
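
One simple way to compare the methods is to count how many of them select each gene and keep those supported by several tools. The sketch below does this with the gene lists from the comparison table. The assignment of the shorter final rows to columns is an assumption about the flattened table, the "NULL" entry is omitted, the "Personal annotation" column is excluded, and the function names are illustrative.

```python
from collections import Counter

# Gene lists transcribed from the comparison table (column assignment of the
# last four rows is assumed; "NULL" dropped; duplicates are counted once).
SELECTIONS = {
    "GeneSeeker": ["ALK", "CARD12", "MSH2", "PDE1C", "POMC", "PPM1G", "SDC1",
                   "SLC23A3", "SMARCAD1", "SOS1", "SRD5A2"],
    "DGP": ["HADHB", "SNX17", "LBH", "BIRC6", "SLC5A6", "POMC", "HADHA",
            "XDH", "SRD5A2", "EIF2B4", "OTOF", "SPG4"],
    "PROSPECTR": ["SPG4", "HADHA", "OTOF", "CAD", "GALNTM4", "KCNK3", "SLC5A6",
                  "KIF3C", "RNF30", "MAPRE3", "DPSYSL5", "ALK", "XDH"],
    "SUSPECTS": ["OTOF", "KCNK3", "XDH", "SLC5A6", "CAD", "HADHB", "SPG4",
                 "NP056477", "HADHA", "RAB10", "CENPA", "EHD3", "GALNT14"],
    "G2D": ["BFSP2", "KRT19", "KRT12", "KRT18", "GFAP", "NEF3", "KRT23",
            "KRT33B", "KRT1", "KRT14", "KRT35", "KRT14", "KRT15"],
    "TOM": ["LBH", "KHK", "FOSL2", "KRT18", "PPP1CB", "GTF3C2", "HADHA",
            "KRTCAP3", "FLJ20254", "XDH", "HADHB", "PPP1CB", "HIBCH"],
}


def method_support(selections):
    """Count, for each gene, how many methods list it at least once."""
    support = Counter()
    for genes in selections.values():
        support.update(set(genes))
    return support


def consensus(selections, min_methods):
    """Return the sorted genes selected by at least min_methods methods."""
    support = method_support(selections)
    return sorted(gene for gene, n in support.items() if n >= min_methods)


print(consensus(SELECTIONS, 4))
print(consensus(SELECTIONS, 3))
```

A vote count like this treats all tools as equally reliable; it is only a starting point for prioritising candidates, not a substitute for the individual scores each tool reports.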

Page 86:

Conclusion

Several promising computational tools

Need for more accurate methods

Page 87:

Thank you!

Page 88:

Some References and H-References

- For a good review see: Nucleic Acids Res. 2006 Jun 6;34(10):3067-81.
- kc.vanderbilt.edu/quant/Seminar/Stat-Gen02-2006.ppt
- http://www.cbs.dtu.dk/
- http://www.bios.niu.edu/johns/humgen/Finding_Disease_Genes.ppt