Statistical Genetics Using Sequence Data
Dajiang J. Liu
Department of Statistics
Why We Study Statistical Genetics
• Statistics has its origins in genetics
• R.A. Fisher: “The Correlation Between Relatives on the Supposition of Mendelian Inheritance”
– Introduced the concept of variance in this article
• Francis Galton: Regression of human height toward the mean
– Introduced correlation and regression
• Karl Pearson:
– “Mendelism and the problem of mental defect”
– “Tuberculosis, heredity and environment”
• Why don’t we seek our roots?
• In order to find disease genes in the genome, statistics is a must
Statistical Genetics
• Disease gene mapping:
– The determination of the sequence of genes and their relative distances from one another on a specific chromosome
– Technology-driven field:
1. Mendel’s era: Segregation Analysis
- Patience: peas, fruit fly: inbreeding is necessary
Experimental Design
Statistical Genetics
• Modern era:
– Microsatellite markers:
• Genetic linkage analysis
– Extremely successful for mapping and identifying Mendelian traits
– Single nucleotide polymorphism (SNP) markers:
• Case-control studies:
– Genome-Wide Association Studies: to identify common variants involved in complex traits
Computational techniques for likelihood in pedigrees
Statistics plays a major role
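As an illustration of the case-control comparison behind a genome-wide association study, the sketch below runs a Pearson chi-square test on a 2x2 table of allele counts for a single SNP. The function name `allelic_chi_square` and the counts are hypothetical and are not taken from the slides; a real study would repeat a test like this at every marker and correct for multiple testing.

```python
import math

def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Rows are cases and controls; columns are alternate and reference alleles.
    Returns the test statistic and its p-value.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical allele counts for one SNP (assumed for illustration only).
stat, p_value = allelic_chi_square(case_alt=60, case_ref=140,
                                   control_alt=40, control_ref=160)
print(f"chi-square = {stat:.3f}, p = {p_value:.4f}")
```

The test uses only the standard library, so the p-value comes from the identity between a 1-df chi-square and a squared standard normal rather than from a statistics package.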
Statistical Genetics
• Sequencing era:
– Study of diseases due to rare variants is emerging
ABI SOLiD sequencer
Statistics is ALL for sequencing data
Statistical Genetics
• Data we work with
Human Genome Project
HapMap Project
1000 Genomes Project
Multifactorial Disease Etiology Hypothesis
• Common Disease Common Variants (CD/CV) hypothesis:
– Common diseases are caused by a few common variants with moderate effect
– E.g. Age-related Macular Degeneration
• Common variants are likely to have lower odds ratios than rare variants
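To make the odds-ratio comparison concrete, here is a minimal sketch that computes an odds ratio and a Woolf (log-scale) 95% confidence interval from a 2x2 table of carrier status by disease status. The function names and all counts are hypothetical, chosen only to show a common variant with a modest odds ratio next to a rare variant with a larger one.

```python
import math

def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Odds ratio from a 2x2 table of carrier status by disease status."""
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)

def woolf_interval(a, b, c, d, z=1.96):
    """Approximate confidence interval for the odds ratio on the log scale."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical counts: (case carriers, case non-carriers,
#                       control carriers, control non-carriers).
common_variant = (300, 700, 250, 750)
rare_variant = (12, 988, 3, 997)

for name, counts in [("common", common_variant), ("rare", rare_variant)]:
    low, high = woolf_interval(*counts)
    print(f"{name}: OR = {odds_ratio(*counts):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

With so few carriers, the interval for the rare variant is much wider than the one for the common variant, which is one reason single-variant tests have little power for rare variants.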
Multifactorial Disease Etiology Hypothesis
• Common Disease Rare Variants hypothesis:
– Common diseases are caused by multiple rare variants with large effect size
– The discovery of rare variants will have high impact on public health since they will aid in risk prediction and treatment
• E.g. Multiple Rare Alleles Contribute to Low Plasma Levels of HDL Cholesterol
• E.g. Colorectal Adenomas
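Because each rare variant is carried by very few individuals, rare variants in a gene are often analyzed jointly. The sketch below shows one generic approach, a collapsing test with a permutation p-value: each individual is scored as a carrier if they have at least one rare allele in the gene, and case labels are shuffled to see how often the number of carriers among cases is as large as observed. This is a simple illustration under assumed toy data, not the method presented in these slides; the function names and genotypes are hypothetical.

```python
import random

def carrier_counts(genotypes, labels):
    """Count carriers of at least one rare allele among cases (1) and controls (0)."""
    cases = sum(1 for g, y in zip(genotypes, labels) if y == 1 and any(g))
    controls = sum(1 for g, y in zip(genotypes, labels) if y == 0 and any(g))
    return cases, controls

def collapsing_permutation_test(genotypes, labels, n_perm=2000, seed=0):
    """One-sided permutation p-value for enrichment of carriers among cases."""
    rng = random.Random(seed)
    observed, _ = carrier_counts(genotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between genotype and phenotype
        permuted, _ = carrier_counts(genotypes, shuffled)
        if permuted >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)

# Toy data (assumed): rows are individuals, columns are rare variant sites in one gene.
genotypes = [
    [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 0, 0], [0, 1, 0],  # cases
    [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 1],  # controls
]
labels = [1] * 6 + [0] * 6

print(carrier_counts(genotypes, labels))
print(collapsing_permutation_test(genotypes, labels))
```

A collapsing score treats every rare variant as equally likely to be causal, so its power drops when many non-causal variants are included or when causal variants act in opposite directions.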
Challenges for Statistical Methodologies
• Variant misclassification:
– Non-causal variants included:
• Huge number of mutations in the genome:
– Most of them do not cause the disease under study
– Causal variants excluded:
• Intronic mutations
• Intergenic regions
• Unknown patterns of interactions:
1. Within-gene interactions: e.g. Hirschsprung’s disease (RET gene)
2. Gene x gene interactions: e.g. breast cancer genes (BRCA1, BRCA2 x CHEK2)
Adaptive methods are needed
Kernel Based Adaptive Clustering
• Combine variant classification with association testing into a coherent framework
• Applicable to population-based case/control studies using unrelated individuals
• Robust against variant misclassification
• Can handle gene x gene interactions and gene x environment interactions