proposed problem

15

Upload: slone

Post on 11-Jan-2016

26 views

Category:

Documents


0 download

DESCRIPTION

Proposed Problem. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Proposed Problem
Page 2: Proposed Problem

Various studies have found evidence for a founder event (bottleneck) in East Asian and European populations, associated with the human dispersal out-of-Africa event around 60 thousand years (kyr) ago. However, these studies have disadvantages. (they had to assume simplified demographic models with few parameters, and they do not provide a precise date for the start and stop times of the bottleneck)

Proposed Problem

Page 3: Proposed Problem

A diploid genome sequence contains hundreds of thousands of independent loci, each with its own TMRCA between the two alleles carried by an individual. In principle, it should be possible to reconstruct the TMRCA distribution across the auto- somes and the X chromosome by studying how the local density of heterozygous sites changes across the genome.

Why diploid genome sequence

Page 4: Proposed Problem

Aim: Inference of human population history from individual whole-genome sequences

Data: Simulated Data Complete diploid genome sequences of a

Chinese male (YH)6, a Korean male (SJK)7, three European individuals (J. C. Venter8, NA12891 and NA12878 (ref. 9)) and two Yoruba males (NA18507 (ref. 10) and NA19239).

Summary of the Paper

Page 5: Proposed Problem

Main Method: PSMC-HMM Technics HMM EM Baum-Welch Iteration Bootstraping

Summary of the Paper

Page 6: Proposed Problem

HMM: states, emission, transition probability, emission probability

States: discrete TMRCA Emission or Observation: “0”, “1” and “.” Emission Probability: The emission

probability from t is

Page 7: Proposed Problem

Transition probability: is the scaled mutation rate, is

the scaled recombination rate, is the Dirac delta function and

is the transition Probability conditional on there being a recombination event, where

is the relative population size at state t.

Technics

Page 8: Proposed Problem

EM: The expectation- maximization iteration started from a constant-sized population history. Powell’s direction set method was used for the maximization step. Parameter values stabilized by the twentieth iteration and these were taken as the final estimate. All parameters were scaled to a constant that is further determined under the assumption of a neutral mutation rate.

Technics

Page 9: Proposed Problem
Page 10: Proposed Problem
Page 11: Proposed Problem
Page 12: Proposed Problem

European and Chinese populations had very similar population-size histories before 10–20 kyr ago. Both populations experienced a severe bottleneck 10–60 kyr ago, whereas African populations experienced a milder bottleneck from which they recovered earlier. All three populations have an elevated effective population size between 60 and 250 kyr ago, possibly due to population substructure. We also infer that the differentiation of genetically modern humans may have started as early as 100–120 kyr ago, but considerable genetic exchanges may still have occurred until 20–40 kyr ago.

Main Results

Page 13: Proposed Problem

There is the potential to extend this type of sequentially Markovian coalescent hidden Markov model approach to data from several individuals, which would access more recent times, but this will require inference over a substantially more complex hidden-state-space of trees on the haplotypes, with each Markov path representing an ancestral recombination graph. In addition, there is the potential to apply the method to investigate the population-size history of other species for which a single diploid genome sequence has been obtained.

What we can do

Page 14: Proposed Problem

HMM PSMC EM BW Iteration How to calculate simulation variance:

Bootstraping

Taking home

Page 15: Proposed Problem