statistical models and numerical methods in...
TRANSCRIPT
Biostatistics 666
Statistical Models and Numerical Methods in Human Genetics
How to find me…� Gonçalo Abecasis
� Assistant ProfessorDept. of Biostatistics
� SPH II, Room M4132
� Phone: (734) 763-4901� E-mail: [email protected]
� Office Hours: Monday 4:00pm - 5:00pm
Course Grading� Written Exams 50%
� In-class midterm - 20%� Final exam - 30%
� Problem Sets 20%
� Research Project or Review 30%� In-class presentation for 10% extra credit
Paper or Project� Clearly written paper
� No more than 2,000 words
� Original thinking� Critical evaluation of literature� Original data analysis� Interesting computer simulation� Investigate analytical model
� Extra credit for oral presentation to class
Paper or Project� Review Paper
� Critically evaluate a recently published research article
� I will provide list of suggested articles
� Research Project� Analyse your own data� Carry out a simple simulation project
Class Schedule� Wednesdays 3:00pm – 4:30pm
� Room M4332
� Fridays 3:00pm – 4:30pm� Room M4332� Occasionally in Computing Lab, SPH II A
� Please bring your schedules this Friday
DNA – Information Store� Encodes the information required for
cells and organisms to produce new cells and organisms and to function.
� DNA variation is responsible for many individual differences, some of which are medically important.
DNA – A string of bases� Each base is either a:
� Purine� (A) – Adenine� (G) – Guanine
� Pyrimidine� (C) – Cytosine� (T) – Thymine
� Has an orientation
DNA Double Helix� Pair of DNA strands
� Each strand is a sequence of A, C, T and G
� Complementary Strands� Facilitate replication� Bound by Hydrogen Bonds
Chromosomes� Human DNA is protected by special
proteins� DNA is coiled around histone proteins
� Nucleosomes� Higher order structures
� Reference points in chromosomes� Telomeres� Centromeres
Human Genome� Multiple chromosomes
� Each one is a DNA double helix� 22 autosomes
� Present in 2 copies� One maternal, one paternal
� 1 pair of sex chromosomes� Females have two X chromosomes� Males have one X chromosome and one Y chromosome
� Total of ~3 x 109 bases
Inheritance� Offspring inherit one chromosome from
each parent
� Through meiosis, germ line cells produce haploid gametes
� These fuse to create an egg, and eventually a new human being
Meiosis� DNA is replicated
� Chromosomes are paired
� DNA stretches exchanged between chromosomes
� Successive cell divisions take place
Some Types of DNA Sequence� Genes (<5% of all human DNA)
� ~30,000-35,000 in humans� Exons, translated into protein� Introns, transcribed into RNA, but not protein
� Promoters� Enhancers� Repeat DNA� Pseudogenes
Central Dogma� Information in cells is stored in DNA.
� DNA can be transcribed into RNA.
� RNA can be translated into protein.� Proteins can catalyze chemical reactions.� Proteins receive and transmit signals.� Proteins constitute structural building blocks.
Genetic Code� DNA � RNA � Protein� DNA: 4 bases (A,T,C,G)� RNA: 4 bases (A,U,C,G)� Proteins: 20 amino-acids� Universal Genetic Code
� Translation between DNA/RNA and protein� Three bases code for one amino-acid
Genetic Code
Human Variation� When two chromosomes are compared most
of their sequence is identical� Consensus sequence
� About 1 per 1,000 bases differs between pairs of chromosomes in the population� In the same individual� In the same geographic location� Across the world
Types of Genetic Variation� Sequence Polymorphisms
� A single or few bases differ between individuals
� Length Polymorphisms� In some regions of DNA, the number of
copies of particular DNA repeat varies
Repeat Length Polymorphisms� Variable Number Tandem Repeats
� VNTRs� Typical repeat units of 10 – 100s bp� E.g.: ~110 bp repeat in IL1RN gene
� Microsatellites� Simple repeat sequences
� Most popular are 2, 3 or 4 bp
� E.g.: ACACACAC …� D naming scheme (e.g., D2S160)
Example VNTR� Picture of DNA on an
agarose gel� This is a repeat
sequence near IL1 gene
� Small fragments move faster towards positive pole
–
+
Microsatellites� Most popular markers for linkage
analysis� Large number of alleles (10 is common)� Can distinguish and track individual
chromosomes in families
� Relatively abundant� ~15,000 mapped loci
SNPs� SNP is usually read as Snip� Single nucleotide polymorphisms
� Change one nucleotide� Replace� Insert� Delete
� Abundant, but traditionally hard to detect� Typically have one or a few alleles
Single Base Changes� Transitions
� A/G, C/T� Purine � Purine� Pyrimidine � Pyrimidine
� Transversions� Purine � Pyrimidine� A/T, A/C, C/G, G/T
A little more on SNPs� Most SNPs have only
two alleles� Easy to automate their
scoring� Becoming extremely
popular� Typing Methods
� Sequencing� Restriction Site� Hybridization
Phenotypes� Can measured genetic variation indirectly
� E.g., Cystic Fibrosis� Patients must carry two mutations in CF gene� Parents of patients must carry one mutation� Normal individuals carry 0 or 1 mutations
3 Stages of Genetic Mapping� Are there genes influencing this trait?
� Epidemiological studies
� Where are those genes?� Linkage analysis
� What are those genes?� Association analysis
Is a trait genetic?� Examine distribution of trait in the population
and among relatives
� E.g. Inflammatory Bowel Disease (Crohn’s)� General population
� 1-3 cases per 1,000 individuals
� Twins of affected individuals� 44% of monozygotic twins also have Crohn’s� 3.8% of dizygotic twins also have Crohn’s
Where are those genes?� Find genetic markers that co-segregate
with disease
� E.g. D16S3136co-segregateswith Crohn’s
What are those genes?� Identify genetic variants that are associated
with disease…
� E.g. Disruptive mutations in NOD2 much more common in Crohn’s patient� Crohn’s Controls� Arg702Trp: 11% 4%� Gly908Arg: 4% 2%� Leu1007fs 8% 4%
Checking Assumptions…
Take Home Reading!� An introduction to important issues in
genetics:
� Lander and Schork (1994)Science 265:2037-48