

Abstract

In a large family with a specific disease, patients usually share the identity-by-descent (IBD) haplotype linked to the disease susceptibility genes. Although many haplotype analysis methods have been developed to detect the shared interval, reconstructing haplotypes on a genome-wide basis remains very difficult. A non-parametric method, Homozygosity Haplotype (HH), was recently proposed for the genome-wide search of shared autosomal segments using high-density SNP genotypes. Rather than phasing the haplotype, HH uses a form of haplotype described by the homozygous SNPs only, which allows it to perform a genome-wide search with high efficiency. We studied the applicability and effectiveness of HH in identifying the candidate region of a causative gene using Illumina 550K genotype data from affected members of a large family with Schnyder crystalline corneal dystrophy (SCCD, MIM 121800), a rare autosomal dominant disease. HH successfully detected the ~1 Mb shared segment with a minimum set of three samples. We propose that HH can be applied to screen known causative genes or loci by searching for the shared homozygosity haplotype among patients who have inherited a susceptibility gene from a common ancestor. We developed a new strategy for genome-wide screening of known causative genes or loci with high-density SNP genotype data, which has the potential to serve as an efficient alternative to sequencing or microsatellite-based fine mapping in genetic disease research and clinical diagnosis.

Applications of Homozygosity Haplotype in the Study of Human Genetic Diseases with High Density SNP Genotype

Haiyan Jiang (1), Mark Samuels (1,2), Duane Guernsey (1), Andrew Orr (1,3)

Departments of (1) Pathology and (3) Ophthalmology and Visual Sciences, Dalhousie University, Halifax, NS, Canada

(2) Department of Medicine, University of Montreal, Montreal, QC, Canada

3. Application to the screening of known causative genes

Assuming that patients who have inherited the disease susceptibility gene from a common ancestor also share a haplotype in the surrounding genomic interval, the HH approach can be applied to screen known causative genes or loci by searching for a shared homozygosity haplotype around each gene. If the patients do not share a significant RCHH around a known gene, that gene can be excluded.
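The exclusion rule above can be sketched as an interval-overlap test. This is only an illustration, not the authors' program: the names `screen_genes`, `overlaps`, and the `flank_bp` parameter are hypothetical. The coordinates are the ones this poster reports for the myoclonus dystonia family, and the list of shared RCHHs contains only the single largest interval the poster gives.

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start_bp, end_bp)

def overlaps(a: Region, b: Region) -> bool:
    """True when two regions lie on the same chromosome and intersect."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]

def screen_genes(genes: Dict[str, Region],
                 shared_rchhs: List[Region],
                 flank_bp: int = 0) -> Dict[str, bool]:
    """Map each gene to True if some shared RCHH overlaps it (gene retained),
    or False if none does (gene excluded). flank_bp widens the gene window
    and is a hypothetical knob, not a parameter named on the poster."""
    result = {}
    for name, (chrom, start, end) in genes.items():
        window = (chrom, max(0, start - flank_bp), end + flank_bp)
        result[name] = any(overlaps(window, r) for r in shared_rchhs)
    return result

# Gene coordinates as reported on the poster.
genes = {
    "SGCE": ("chr7", 94_052_472, 94_123_457),
    "DRD2": ("chr11", 112_785_527, 112_851_211),
    "DYT1": ("chr9", 131_616_072, 131_626_199),
}
# Only the largest shared RCHH reported on the poster is listed here.
shared = [("chr7", 93_168_493, 130_965_632)]

status = screen_genes(genes, shared)
for gene, retained in status.items():
    print(gene, "retained" if retained else "excluded")
```

With these inputs SGCE is retained and DRD2 and DYT1 are excluded, which matches the outcome the poster describes.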

Impact of genotyping errors

Genotyping errors are difficult to identify when only a few affected individuals in a family are available for genotyping, so an approach was developed to estimate the probability that the observed mismatches arise from genotyping error:

1. Replace the mismatched compSNPs with concordant SNPs to create a consistent homozygosity haplotype.
2. Run Monte Carlo (MC) simulations of genotyping errors on the modified genotypes, using the selected error model and error rate.
3. Model the number of mismatched compSNPs created by the simulated genotyping errors with a Poisson distribution.
4. Calculate the probability of obtaining N mismatched compSNPs from genotyping error alone.
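The Monte Carlo step can be sketched as follows, under stated assumptions. The sketch uses two subjects only, encodes genotypes as "AA", "AB", "BB", and applies a uniform error model in which an erroneous call is equally likely to be either of the other two genotypes. The function names and the choice to estimate the Poisson mean λ as the average count over replicates are illustrative, not taken from the poster.

```python
import random
from typing import List

GENOTYPES = ("AA", "AB", "BB")

def add_errors(genotypes: List[str], error_rate: float,
               rng: random.Random) -> List[str]:
    """With probability error_rate, replace a call by one of the other two
    genotypes chosen uniformly (assumed uniform error model)."""
    out = []
    for g in genotypes:
        if rng.random() < error_rate:
            out.append(rng.choice([x for x in GENOTYPES if x != g]))
        else:
            out.append(g)
    return out

def count_mismatched_compsnps(a: List[str], b: List[str]) -> int:
    """Count SNPs that are homozygous in both subjects but for different alleles."""
    homozygous = ("AA", "BB")
    return sum(1 for ga, gb in zip(a, b)
               if ga in homozygous and gb in homozygous and ga != gb)

def estimate_lambda(subj_a: List[str], subj_b: List[str], error_rate: float,
                    n_rep: int = 1000, seed: int = 0) -> float:
    """Average number of mismatched compSNPs created by simulated errors."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_rep):
        total += count_mismatched_compsnps(
            add_errors(subj_a, error_rate, rng),
            add_errors(subj_b, error_rate, rng))
    return total / n_rep

# Toy input: 1000 SNPs already made concordant between the two subjects (step 1).
concordant = ["AA"] * 500 + ["BB"] * 500
lam = estimate_lambda(concordant, concordant, error_rate=0.01, n_rep=200, seed=1)
print(f"estimated lambda: {lam:.2f}")
```

Because the input is concordant by construction, every mismatch counted here was created by a simulated error, which is what makes the resulting λ usable as the error-only expectation.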

1. Homozygosity Haplotype Method

An HH is a haplotype described by only homozygous SNPs and is obtained by the deletion of heterozygous SNPs.

A compSNP is a SNP that is homozygous in both of two subjects being compared.

An RCHH is a run of compSNPs matched for allelic type, the genetic length of which is longer than the cutoff value. An RCHH is bounded by either a mismatched compSNP(s) or by the end(s) of an autosome.
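The three definitions above translate fairly directly into code. The following is a minimal sketch, not the published HH program: the genotype encoding ("AA", "AB", "BB", or None for a missing call), the function names, and the choice to measure a run from its first to its last matched compSNP are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

def homozygosity_haplotype(genotypes: List[Optional[str]]) -> List[Optional[str]]:
    """Keep the allele at homozygous SNPs; heterozygous or missing calls become None."""
    return [g[0] if g is not None and g[0] == g[1] else None for g in genotypes]

def find_rchh(hh_a: List[Optional[str]], hh_b: List[Optional[str]],
              positions_cm: List[float], cutoff_cm: float) -> List[Tuple[float, float]]:
    """Return (start_cM, end_cM) for each run of matched compSNPs longer than
    cutoff_cm on one chromosome. Non-compSNPs are skipped and do not break a
    run; a mismatched compSNP or the end of the chromosome closes it."""
    runs = []
    run_start = None
    run_end = None
    for a, b, pos in zip(hh_a, hh_b, positions_cm):
        if a is None or b is None:
            continue  # not homozygous in both subjects, so not a compSNP
        if a == b:
            if run_start is None:
                run_start = pos
            run_end = pos
        else:
            if run_start is not None and run_end - run_start > cutoff_cm:
                runs.append((run_start, run_end))
            run_start = None
    if run_start is not None and run_end - run_start > cutoff_cm:
        runs.append((run_start, run_end))
    return runs

# Toy data: six SNPs with genetic positions in cM.
hh_a = homozygosity_haplotype(["AA", "AB", "AA", "BB", "AA", "AA"])
hh_b = homozygosity_haplotype(["AA", "AA", "AA", "AA", "AA", "AB"])
print(find_rchh(hh_a, hh_b, [0.0, 1.0, 2.0, 3.0, 5.0, 9.0], cutoff_cm=1.5))
```

In the toy data the mismatched compSNP at 3.0 cM closes the first run, and the remaining matched compSNP at 5.0 cM is too short a run to pass the cutoff.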

HH program

1. Read in genotypes of all affected subjects.
2. Delete heterozygous SNPs and generate the HH for each subject.
3. Generate the list of RCHHs for each pair of subjects with a cutoff value.
4. Determine the RCHHs shared by multiple subjects.
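The poster does not spell out how the final step combines the pairwise results. One plausible reading, sketched here as an assumption, is to intersect the RCHH intervals of every pair of subjects on a chromosome. The names `intersect` and `shared_rchh` are hypothetical. Looping over all pairs is also where the quadratic cost in the number of subjects comes from.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start, end) on one chromosome

def intersect(xs: List[Interval], ys: List[Interval]) -> List[Interval]:
    """All non-empty overlaps between an interval in xs and one in ys."""
    out = []
    for x0, x1 in xs:
        for y0, y1 in ys:
            lo, hi = max(x0, y0), min(x1, y1)
            if lo < hi:
                out.append((lo, hi))
    return sorted(out)

def shared_rchh(pairwise: Dict[Tuple[str, str], List[Interval]],
                subjects: List[str]) -> List[Interval]:
    """Intersect pairwise RCHH lists over every pair of subjects.
    Keys of `pairwise` are (id_a, id_b) tuples with id_a < id_b."""
    shared = None
    for pair in combinations(sorted(subjects), 2):
        runs = pairwise.get(pair, [])
        shared = runs if shared is None else intersect(shared, runs)
    return shared or []

# Toy pairwise RCHH lists for three subjects on one chromosome.
pairwise = {
    ("p1", "p2"): [(0.0, 10.0), (20.0, 30.0)],
    ("p1", "p3"): [(5.0, 12.0)],
    ("p2", "p3"): [(3.0, 8.0), (25.0, 28.0)],
}
print(shared_rchh(pairwise, ["p1", "p2", "p3"]))
```

Whether the length cutoff should be applied again after intersection is left open here, since the poster does not say.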

[Figure: RCHHs identified by the HH method with the 550K genotypes of patients 1351 and 1425. Panels labelled short arm / long arm.]

HH successfully detected the ~1 Mb shared segment on Chr1 with genotypes of three patients

I. An RCHH at chr1:10,679,786-11,639,887 was identified by the HH method with the genotype data of 10 patients.

[Figure: SimWalk2 multipoint linkage, chromosome 1. x-axis: cM (0 to 14); y-axis: LOD (-4 to 10); marker D1S2667; LOD = 8.7; series: ped 1, ped 115.]

Schnyder crystalline corneal dystrophy (SCCD) is a rare genetic disorder presenting with opacification of the cornea caused by aberrant intracellular cholesterol storage and possible systemic dyslipidemia. AMGGI ascertained a large family segregating SCCD from Nova Scotia. Mapping analysis confirmed linkage of the Nova Scotia SCCD family to a previously published locus at chromosome 1p34.

DNA resequencing identified a heterozygous segregating missense variant in the gene UBIAD1 at chr1:11,255,866-11,268,929 in the NS family, and four other missense variants in four additional small families ascertained nationally and internationally with SCCD.

The presumptive pathogenic variants, at positions 102, 112, 119, 175 and 232 are all in highly conserved residues, and lie near each other in a 3-D model of protein folding. UBIAD1 encodes a potential prenyltransferase, and may participate in biosynthesis or regulation of intracellular cholesterol trafficking. Thus UBIAD1 is a potential novel therapeutic target for treatment of hyperlipidemia, a significant risk factor for cardiovascular disease.

Taken from Orr et al. PLoS One 2: e685 (2007)

Conclusions

Our study of the HH approach with Illumina 550K SNP genotype data from a series of monogenic disease projects demonstrates that the HH method is efficient and effective in identifying disease-linked regions. Based on the idea of the homozygosity haplotype, we developed a new approach for genome-wide screening of known causative genes or loci using high-density SNP genotype data. Its successful application to a family with a known causative mutation suggests that the method has the potential to serve as an efficient alternative to sequencing or microsatellite-based fine mapping in research and clinical diagnosis of genetic diseases.

Genome-wide mapping of RCHHs shared by four patients from a Canadian family with Myoclonus dystonia

Results

[Figure panels labelled short arm / long arm. Gene positions marked on the map:]

DRD2: chr11:112,785,527-112,851,211

DYT1: chr9:131,616,072-131,626,199

SGCE: chr7:94,052,472-94,123,457

2. Use HH to identify the candidate loci for Schnyder crystalline corneal dystrophy

Results

Background

Cutoff value selection

Here O is the observed genotype of a SNP, G is the underlying genotype, and E is the genotype error rate, which is the same for all possible underlying genotypes (Lincoln SE, Lander ES. Genomics 1992; 14:604–610).
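The equation that these symbols describe did not survive in the transcript. As a hedged illustration only, a uniform error model with three genotype classes (AA, AB, BB) is commonly written as below; the split of E equally between the two incorrect genotypes is an assumption and may differ from the exact form used on the poster.

```latex
P(O \mid G) =
\begin{cases}
1 - E, & O = G, \\
E / 2, & O \neq G .
\end{cases}
```

Under this form the three probabilities for any fixed G sum to (1 - E) + 2(E/2) = 1, and the error rate does not depend on which genotype is the true one.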

The P value, the probability of obtaining N mismatched compSNPs from genotyping error alone, is calculated with a Poisson distribution.
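A minimal sketch of this calculation, assuming λ is the mean number of error-induced mismatches estimated by simulation. The poster does not say whether P is the point probability of exactly N mismatches or the tail probability of N or more, so both are shown; the function names are hypothetical.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def poisson_upper_tail(n: int, lam: float) -> float:
    """P(X >= n) for X ~ Poisson(lam), clamped at zero against rounding error."""
    return max(0.0, 1.0 - sum(poisson_pmf(k, lam) for k in range(n)))

# lam is the value reported on the poster for the DRD2 region;
# n_observed is a made-up count used only to exercise the functions.
lam = 8.98
n_observed = 60
print(poisson_pmf(n_observed, lam), poisson_upper_tail(n_observed, lam))
```

When the observed count is far above λ, both quantities are numerically indistinguishable from zero, which is one way a reported value of P = 0 could arise.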

Gene   Region                          Error simulation (E = 0.01)
DRD2   Chr11:111,851,211-113,785,527   λ = 8.98, P = 0
DYT1   Chr9:130,626,199-132,616,072    λ = 8.76, P = 0

Features of HH method

Non-parametric.

High efficiency: complexity O(n^2), where n is the number of subjects. For Marfan syndrome, with Affymetrix 500K SNP genotypes of 9 subjects, the computational time is 6 s on a laptop.

Both dominant and recessive disease loci can be detected.

HH analysis may provide an advantage when 6 ≤ m+n ≤ 50 (m and n are the numbers of generations removed from a common ancestor of two subjects), where haplotype analysis or linkage analysis is difficult to perform.

HH is well suited to the local population in the Atlantic region, with m+n < 20.

Region from a common ancestor (RCA)

An RCA is an autosomal region where subjects share a chromosomal segment derived from a common ancestor (i.e. IBD). The presence of RCA is predicted through the RCHH.

Miyazawa H, et al. Homozygosity haplotype allows a genomewide search for the autosomal segments shared among patients. Am J Hum Genet. 2007 Jun; 80(6):1090-102.

II. Minimal subset required to identify the interval

Sample selection: select distantly related individuals, because they share fewer RCAs.

Ratio of RCA to the total genetic length shared by two descendants of a common ancestor, where m and n are the numbers of generations separating each of the two subjects from the common ancestor.
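The formula for this ratio is missing from the transcript. As a standard result that may or may not match the poster's exact expression, the expected fraction of the autosomal genome that two descendants share IBD through a single common ancestor can be written as follows, with m and n as defined above:

```latex
\frac{\text{RCA}}{\text{total genetic length}} \approx \left(\frac{1}{2}\right)^{m+n-1}
```

For example, m + n = 6 gives an expected shared fraction of 1/32 (about 3.1%), while m + n = 20 gives 2^{-19}, roughly 1.9 x 10^{-6}. The rapid decay is why distantly related patients are expected to share few regions other than the one carrying the disease gene. If the two subjects descend from an ancestral couple rather than a single ancestor, the expected fraction is approximately doubled.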

Ten patients were genotyped with Illumina 550K BeadChips. HH analyses were run with a cutoff of 3.0 cM.

[Figure: RCHHs identified by the HH method with the 550K genotypes of 10 patients. Panels labelled short arm / long arm.]

Results of genotyping error simulation

The two genes DRD2 and DYT1 can be excluded because no RCHH was detected around them. In the genotyping error simulations, P = 0 for both regions, which suggests that the mismatched compSNPs are unlikely to be genotyping errors and that the genotype data are reliable.

The largest RCHH, at chr7:93,168,493-130,965,632 (about 37.8 Mb), includes the gene SGCE (chr7:94,052,472-94,123,457).

The study of myoclonus dystonia demonstrated that the proposed screening approach successfully excluded all non-causative genes. It also identified potential linkage at SGCE.

The whole-genome screening approach was validated using a family with myoclonus dystonia (MIM 159900). The known causative genes are SGCE, DRD2, and DYT1. A published causative mutation, c.304C>T (R102X), in the SGCE gene had been detected in the affected family members by sequencing. We tested whether the proposed screening approach could correctly exclude the non-causative genes. Four patients from the family were genotyped with Illumina HumanHap550 BeadChips, and HH was run with a cutoff of 3.0 cM to identify RCHHs shared by the four patients.

Lincoln and Lander error model

July 2008, ISMB 2008