The bonobo genome compared with the chimpanzee and human genomes
Kay Prüfer et al., Nature (June 2012)
Presenter: Chia-Ying Chen
Materials
a female bonobo (Ulindi, Leipzig Zoo) -- 454 sequencing -- paired-end reads with insert sizes of 3, 9 and 20 kb (a total depth of 26X)
19 individuals: 3 bonobos, 2 western chimpanzees, 7 eastern chimpanzees, 7 central chimpanzees -- Illumina 76 or 101 bp paired-end reads (about 1X coverage)
Assessment of the Bonobo Genome
panTro2 (Clint) (6X Sanger sequenced)
Retrotransposon Evolution in the Bonobo Genome
using GMAP to align all available bonobo (198 million reads) and chimpanzee (46 million reads) sequence traces to hg18
[Figure values: 99.7%; 991, 27, 30]
Using Alu retrotransposons to estimate split times
[Figure: tree of B, C, H, O (bonobo, chimpanzee, human, orangutan) with node labels 2.2M, 6.5M and 15M]
Ontology analysis of transposon
• to create 2 million simulated insertions
• to count numbers of observed vs. simulated transposon integrants inside or within +/- 50 kb of RefSeq genes
• to query the PANTHER data
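The counting step in the list above can be sketched roughly as follows. This is only an illustration, not the pipeline used in the paper: the single-chromosome coordinates, the uniform placement of simulated insertions, and all function names are assumptions made for the example.

```python
import random
from bisect import bisect_right

FLANK = 50_000  # +/- 50 kb around each gene, as stated on the slide


def merge_windows(genes, flank=FLANK):
    """Extend each (start, end) gene by the flank and merge overlapping windows."""
    windows = sorted((max(0, start - flank), end + flank) for start, end in genes)
    merged = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def fraction_in_windows(positions, merged):
    """Fraction of insertion positions that fall inside any merged window."""
    starts = [start for start, _ in merged]
    hits = 0
    for pos in positions:
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos <= merged[i][1]:
            hits += 1
    return hits / len(positions)


def simulate_insertions(n, chrom_length, seed=0):
    """Place n insertions uniformly at random (a simplifying assumption)."""
    rng = random.Random(seed)
    return [rng.randrange(chrom_length) for _ in range(n)]


def enrichment(observed, simulated, genes):
    """Ratio of observed to simulated fractions near genes (>1 means enriched)."""
    merged = merge_windows(genes)
    return fraction_in_windows(observed, merged) / fraction_in_windows(simulated, merged)
```

A ratio above 1 would suggest more integrants near genes than expected by chance, and a ratio below 1 would suggest depletion. Running this per gene category would need a category-to-gene mapping, which is not modelled here.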
Enrichment or depletion of L1 integrants
[Figure: enrichment or depletion of L1 integrants; panels: biological processes, molecular functions]
Divergence, Site Pattern Analysis and Signals of Admixture
to gain insight into the relationship between and within bonobo and chimpanzee populations
Data: Illumina reads of 16 chimpanzees and 3 bonobos; the 454 reads of Ulindi; the Sanger sequencing reads of Clint (panTro2)
Divergence times
Bonobo - Chimpanzee : 2.2 million years
Clint - Central chimpanzee : 1.3 million years
Clint - Eastern chimpanzee : 1.3 million years
Clint - Western chimpanzee : 0.5 million years
Ulindi - Bonobo : 0.5 million years
$N_c$: A equals B, and C different
$N_b$: A equals C, and B different
Divergence between A and B: $\frac{2 N_b}{N_b + N_c}$
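As a rough sketch of the site-pattern counts, assume the divergence on this slide is 2·N_b / (N_b + N_c), that is, the A-B divergence expressed relative to the divergence from the reference sequence C. The function names, and the choice to skip any column containing a character other than A, C, G or T, are assumptions made for this example.

```python
VALID = set("ACGT")


def site_pattern_counts(a, b, c):
    """Count N_b (A == C, B differs) and N_c (A == B, C differs) over aligned sequences."""
    n_b = n_c = 0
    for x, y, z in zip(a, b, c):
        if not {x, y, z} <= VALID:
            continue  # skip gaps and ambiguous bases (assumption)
        if x == y and x != z:
            n_c += 1
        elif x == z and x != y:
            n_b += 1
    return n_b, n_c


def relative_divergence(n_b, n_c):
    """Divergence between A and B as 2 * N_b / (N_b + N_c)."""
    return 2 * n_b / (n_b + n_c)


n_b, n_c = site_pattern_counts("ACGTACGT", "ACGAACGT", "ACGTTCGA")
print(n_b, n_c, relative_divergence(n_b, n_c))
```

The factor of 2 reflects that differences accumulate on both the A and B lineages, while only those on the B lineage are counted in N_b.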
$D(C_1, C_2, B, H) = \frac{n_{ABBA} - n_{BABA}}{n_{ABBA} + n_{BABA}}$
in blocks of 5 megabases
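A minimal sketch of the ABBA/BABA comparison, assuming the statistic is the usual D = (n_ABBA - n_BABA) / (n_ABBA + n_BABA). Treating the 5-megabase blocks as units of a leave-one-block-out jackknife for the standard error is an assumption about how the blocks were used. The input format and function names are hypothetical.

```python
import math

BLOCK = 5_000_000  # 5 megabases, as stated on the slide


def d_statistic(n_abba, n_baba):
    """D = (n_ABBA - n_BABA) / (n_ABBA + n_BABA)."""
    return (n_abba - n_baba) / (n_abba + n_baba)


def block_counts(sites, block=BLOCK):
    """Group (position, pattern) sites into blocks; return [(n_abba, n_baba), ...]."""
    counts = {}
    for pos, pattern in sites:
        entry = counts.setdefault(pos // block, [0, 0])
        if pattern == "ABBA":
            entry[0] += 1
        elif pattern == "BABA":
            entry[1] += 1
    return [tuple(counts[k]) for k in sorted(counts)]


def jackknife_d(blocks):
    """Return (D, standard error) from an unweighted leave-one-block-out jackknife."""
    total_abba = sum(a for a, _ in blocks)
    total_baba = sum(b for _, b in blocks)
    d_all = d_statistic(total_abba, total_baba)
    g = len(blocks)
    leave_one_out = [d_statistic(total_abba - a, total_baba - b) for a, b in blocks]
    mean = sum(leave_one_out) / g
    variance = (g - 1) / g * sum((d - mean) ** 2 for d in leave_one_out)
    return d_all, math.sqrt(variance)
```

A D value whose distance from zero is large compared with its standard error would point to an excess of shared derived alleles between B and one of the two chimpanzee groups. A value near zero is what is expected without admixture.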
Site Pattern Analysis and Signals of Admixture
Speciation Times, Ancestral Population Size and Incomplete Lineage sorting
Two conditions can lead to gene trees with a topology different from the species tree:
The population size of the ancestral species is sufficiently large.
The time span between speciation events is sufficiently small.
Regions where this happens are said to show incomplete lineage sorting (ILS)
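Why these two conditions matter can be illustrated with the standard three-species coalescent result. With T generations between the two speciation events and an ancestral effective population size N (diploid), two lineages fail to coalesce in the ancestral branch with probability exp(-T / (2N)), and the gene tree then disagrees with the species tree two thirds of the time. The numbers in the example are hypothetical, not estimates from the paper.

```python
import math


def prob_no_coalescence(t_generations, n_effective):
    """Probability that two lineages do not coalesce between the speciation events."""
    return math.exp(-t_generations / (2 * n_effective))


def prob_discordant_tree(t_generations, n_effective):
    """Probability that the gene tree differs from the species tree."""
    return (2 / 3) * prob_no_coalescence(t_generations, n_effective)


# Hypothetical values: a shorter interval or a larger population gives more ILS.
for t, n in [(100_000, 25_000), (50_000, 25_000), (100_000, 50_000)]:
    print(t, n, round(prob_discordant_tree(t, n), 4))
```

Halving T or doubling N has the same effect on the probability, since only the ratio T / (2N) enters the formula.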
Based on the 4-way alignment (HCBO): set phred score = 30; masked the RepeatMasker track; removed regions over-collapsed due to duplications
CoalHMM analysis is run on each 1-megabase chunk of the alignment
Correlation between ILS and gene ontology classes
to count the bases in each of the four ILS states for the entire length of genes including introns
to carry out GO enrichment test using FUNC
to identify GO categories that are either enriched or depleted for CH and BH bases using Wilcoxon rank test
Genes depleted in ILS: intracellular, transcription, translation
Genes enriched in ILS: protein signal to the membrane, cell adhesion
However, no GO terms stand out when GO categories are identified separately for CH and BH bases.
Incomplete Lineage Sorting Regions and Balancing Selection
Some regions may be enriched in incomplete lineage sorting (ILS) due to long-standing balancing selection.
We considered ILS assignment in 50 kb windows to identify candidate regions
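The windowing step could look roughly like the sketch below. The per-site state labels (with "CH" and "BH" taken as the ILS states), the input format, and the 0.5 cutoff are assumptions for illustration. The slide does not say how candidate windows were ranked or thresholded.

```python
from collections import defaultdict

WINDOW = 50_000  # 50 kb windows, as stated on the slide
ILS_STATES = {"CH", "BH"}  # assumed labels for the ILS gene-tree states


def ils_fraction_by_window(assignments, window=WINDOW):
    """Map window index -> fraction of sites assigned to an ILS state.

    assignments: iterable of (position, state) pairs.
    """
    totals = defaultdict(int)
    ils = defaultdict(int)
    for pos, state in assignments:
        w = pos // window
        totals[w] += 1
        if state in ILS_STATES:
            ils[w] += 1
    return {w: ils[w] / totals[w] for w in sorted(totals)}


def candidate_windows(fractions, min_fraction=0.5):
    """Windows whose ILS fraction reaches the (hypothetical) cutoff."""
    return [w for w, f in fractions.items() if f >= min_fraction]
```

Windows returned by `candidate_windows` would then be checked against present-day polymorphism data.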
If balancing selection has remained active until the present, it may also affect the patterns of polymorphism in present-day populations.
Balancing selection candidates:
• to exhibit high diversity in chimpanzee
• to be enriched for shared SNPs