statistical models and numerical methods in...

32
Biostatistics 666 Statistical Models and Numerical Methods in Human Genetics

Upload: others

Post on 08-Jul-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

Biostatistics 666

Statistical Models and Numerical Methods in Human Genetics

Page 2: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

How to find me…� Gonçalo Abecasis

� Assistant ProfessorDept. of Biostatistics

� SPH II, Room M4132

� Phone: (734) 763-4901� E-mail: [email protected]

� Office Hours: Monday 4:00pm - 5:00pm

Page 3: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

Course Grading� Written Exams 50%

� In-class midterm - 20%� Final exam - 30%

� Problem Sets 20%

� Research Project or Review 30%� In-class presentation for 10% extra credit

Page 4: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

Paper or Project� Clearly written paper

� No more than 2,000 words

� Original thinking� Critical evaluation of literature� Original data analysis� Interesting computer simulation� Investigate analytical model

� Extra credit for oral presentation to class

Page 5: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

Paper or Project� Review Paper

� Critically evaluate a recently published research article

� I will provide list of suggested articles

� Research Project� Analyse your own data� Carry out a simple simulation project

Page 6: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

Class Schedule� Wednesdays 3:00pm – 4:30pm

� Room M4332

� Fridays 3:00pm – 4:30pm� Room M4332� Occasionally in Computing Lab, SPH II A

� Please bring your schedules this Friday

Page 7: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

DNA – Information Store� Encodes the information required for

cells and organisms to produce new cells and organisms and to function.

� DNA variation is responsible for many individual differences, some of which are medically important.

Page 8: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

DNA – A string of bases� Each base is either a:

� Purine� (A) – Adenine� (G) – Guanine

� Pyrimidine� (C) – Cytosine� (T) – Thymine

� Has an orientation

Page 9: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

DNA Double Helix� Pair of DNA strands

� Each strand is a sequence of A, C, T and G

� Complementary Strands� Facilitate replication� Bound by Hydrogen Bonds

Page 10: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

Chromosomes� Human DNA is protected by special

proteins� DNA is coiled around histone proteins

� Nucleosomes� Higher order structures

� Reference points in chromosomes� Telomeres� Centromeres

Page 11: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

Human Genome� Multiple chromosomes

� Each one is a DNA double helix� 22 autosomes

� Present in 2 copies� One maternal, one paternal

� 1 pair of sex chromosomes� Females have two X chromosomes� Males have one X chromosome and one Y chromosome

� Total of ~3 x 109 bases

Page 12: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

Inheritance� Offspring inherit one chromosome from

each parent

� Through meiosis, germ line cells produce haploid gametes

� These fuse to create an egg, and eventually a new human being

Page 13: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

Meiosis� DNA is replicated

� Chromosomes are paired

� DNA stretches exchanged between chromosomes

� Successive cell divisions take place

Page 14: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

Some Types of DNA Sequence� Genes (<5% of all human DNA)

� ~30,000-35,000 in humans� Exons, translated into protein� Introns, transcribed into RNA, but not protein

� Promoters� Enhancers� Repeat DNA� Pseudogenes

Page 15: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

Central Dogma� Information in cells is stored in DNA.

� DNA can be transcribed into RNA.

� RNA can be translated into protein.� Proteins can catalyze chemical reactions.� Proteins receive and transmit signals.� Proteins constitute structural building blocks.

Page 16: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

Genetic Code� DNA � RNA � Protein� DNA: 4 bases (A,T,C,G)� RNA: 4 bases (A,U,C,G)� Proteins: 20 amino-acids� Universal Genetic Code

� Translation between DNA/RNA and protein� Three bases code for one amino-acid

Page 17: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

Genetic Code

Page 18: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

Human Variation� When two chromosomes are compared most

of their sequence is identical� Consensus sequence

� About 1 per 1,000 bases differs between pairs of chromosomes in the population� In the same individual� In the same geographic location� Across the world

Page 19: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

Types of Genetic Variation� Sequence Polymorphisms

� A single or few bases differ between individuals

� Length Polymorphisms� In some regions of DNA, the number of

copies of particular DNA repeat varies

Page 20: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

Repeat Length Polymorphisms� Variable Number Tandem Repeats

� VNTRs� Typical repeat units of 10 – 100s bp� E.g.: ~110 bp repeat in IL1RN gene

� Microsatellites� Simple repeat sequences

� Most popular are 2, 3 or 4 bp

� E.g.: ACACACAC …� D naming scheme (e.g., D2S160)

Page 21: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

Example VNTR� Picture of DNA on an

agarose gel� This is a repeat

sequence near IL1 gene

� Small fragments move faster towards positive pole

+

Page 22: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

Microsatellites� Most popular markers for linkage

analysis� Large number of alleles (10 is common)� Can distinguish and track individual

chromosomes in families

� Relatively abundant� ~15,000 mapped loci

Page 23: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

SNPs� SNP is usually read as Snip� Single nucleotide polymorphisms

� Change one nucleotide� Replace� Insert� Delete

� Abundant, but traditionally hard to detect� Typically have one or a few alleles

Page 24: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

Single Base Changes� Transitions

� A/G, C/T� Purine � Purine� Pyrimidine � Pyrimidine

� Transversions� Purine � Pyrimidine� A/T, A/C, C/G, G/T

Page 25: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

A little more on SNPs� Most SNPs have only

two alleles� Easy to automate their

scoring� Becoming extremely

popular� Typing Methods

� Sequencing� Restriction Site� Hybridization

Page 26: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

Phenotypes� Can measured genetic variation indirectly

� E.g., Cystic Fibrosis� Patients must carry two mutations in CF gene� Parents of patients must carry one mutation� Normal individuals carry 0 or 1 mutations

Page 27: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

3 Stages of Genetic Mapping� Are there genes influencing this trait?

� Epidemiological studies

� Where are those genes?� Linkage analysis

� What are those genes?� Association analysis

Page 28: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

Is a trait genetic?� Examine distribution of trait in the population

and among relatives

� E.g. Inflammatory Bowel Disease (Crohn’s)� General population

� 1-3 cases per 1,000 individuals

� Twins of affected individuals� 44% of monozygotic twins also have Crohn’s� 3.8% of dizygotic twins also have Crohn’s

Page 29: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

Where are those genes?� Find genetic markers that co-segregate

with disease

� E.g. D16S3136co-segregateswith Crohn’s

Page 30: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

What are those genes?� Identify genetic variants that are associated

with disease…

� E.g. Disruptive mutations in NOD2 much more common in Crohn’s patient� Crohn’s Controls� Arg702Trp: 11% 4%� Gly908Arg: 4% 2%� Leu1007fs 8% 4%

Page 31: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

Checking Assumptions…

Page 32: Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

Take Home Reading!� An introduction to important issues in

genetics:

� Lander and Schork (1994)Science 265:2037-48