Python Meetup 2014 (Ying Liu)
The New York Python Meetup: Biology & Python (05-29-2014)
Unveiling Epigenetic Regulation with Next Generation Sequencing (NGS) and Python
Ying Liu
Weill Cornell Medical College
The New York Python Meetup, 05-29-2014
About Me
• PhD candidate, Weill Cornell Medical College
• Major Area: Stem cell epigenomics, Computational Biology
• Graduation: Fall 2014
• Dream job: data scientist in biomedical informatics
• Email: [email protected]
• LinkedIn: https://www.linkedin.com/pub/ying-liu/b/669/605
Generate Pluripotent Stem Cells from Mature Adult Cells
2012 Nobel Prize in Physiology or Medicine
Adult cells → induced pluripotent stem (iPS) cells
• Express 4 pluripotent stem cell-specific genes
• Reprogramming takes > 20 days (thousands of genes change expression)
Limitations
• Reprogramming efficiency: 0.01-0.1%
• Molecular mechanism is not fully understood
Human Genome Project
• Human genome: ~3 billion DNA base pairs
• First sequence draft: 2001
• Complete sequence: 2003
Nature 454, 711-715
Gene Expression
My project: Histone X, which is enriched at expressed genes
Epigenetic Regulation
• Epigenetics: study of heritable changes in gene activity that are NOT caused by changes in the DNA sequence
• One of the major epigenetic regulators: histone proteins
[Figure: histone proteins and DNA]
Project
Goal: determine the function of histone X in initiating the reprogramming of adult cells into iPS cells.
Experiment
• Collect cells at the beginning of reprogramming (Days 0, 3, 6, 10) and after reprogramming (iPS);
• Map genome-wide histone X localization with Next Generation Sequencing (NGS);
• Analyze the dynamic changes in genome-wide histone X localization with Python programs and frameworks.
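Comparing histone X signal across Day 0, 3, 6, 10 and iPS requires putting samples sequenced to different depths on a common scale. The slides do not say how this was done; below is a minimal sketch assuming one common approach, reads per million mapped reads (RPM) with an optional pseudocount. The function names, the pseudocount and the dict layout are hypothetical.

```python
def reads_per_million(count, total_reads, pseudocount=1.0):
    """Scale a raw read count in one region to reads per million mapped reads (RPM).

    The pseudocount (an assumed value, not from the talk) keeps regions with zero
    reads from causing division by zero in later ratios.
    """
    return (count + pseudocount) * 1_000_000 / total_reads


def fold_change_vs_day0(region_counts, library_sizes, pseudocount=1.0):
    """RPM signal of one region at each time point, relative to Day 0.

    region_counts: {time point: reads falling in the region}
    library_sizes: {time point: total mapped reads in that sample}
    """
    rpm = {
        t: reads_per_million(region_counts[t], library_sizes[t], pseudocount)
        for t in region_counts
    }
    baseline = rpm["Day 0"]
    return {t: value / baseline for t, value in rpm.items()}


# Toy numbers for illustration only, not data from the study.
counts = {"Day 0": 10, "Day 3": 40}
depths = {"Day 0": 20_000_000, "Day 3": 40_000_000}
print(fold_change_vs_day0(counts, depths, pseudocount=0))  # {'Day 0': 1.0, 'Day 3': 2.0}
```

The raw count rises 4-fold, but half of that comes from Day 3 being sequenced twice as deeply, so the depth-normalized change is 2-fold.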
Genome-wide Analysis of Epigenetic Regulation
• Next Generation DNA Sequencing (Illumina, Inc.)
• Align DNA sequences to chromosomes
• Computational analysis (genome-wide); tools: Python, R, etc.
• Display in a genome browser (gene by gene)
Histone X Enriches Near Stem Cell Specific Genes at the Beginning of Cell Reprogramming
[Figure: genome browser tracks (10 kb scale) of histone X and K27me3 at Day 0, 3, 6 and 10 near Pou5f1 and Nanog]
Genome browser (IGV)
Alignment output (BED format)
Chrom Start End Name Score Strand
chr1 3000062 3000113 HWI-1KL117_0134:6:2101:14893:19331#ACAGTG/A..GTG. 37 +
chr1 3000113 3000164 HWI-1KL117_0134:6:2302:6790:10626#ACAGTG/A..GT.. 37 +
chr1 3000146 3000197 HWI-1KL117_0134:6:2303:8145:108924#ACAGTG/A..GT.. 37 -
chr1 3000154 3000205 HWI-1KL117_0134:6:2202:14995:109690#ACAGTG/A..GT.. 37 -
chr1 3000241 3000292 HWI-1KL117_0134:6:1304:12589:77263#ACAGTG/A..GT.. 25 -
chr1 3000311 3000362 HWI-1KL117_0134:6:1101:17212:111473#ACAGTG/A..GT.. 37 -
chr1 3000334 3000385 HWI-1KL117_0134:6:2308:10385:78074#ACAGTG/A..GT.. 25 -
chr1 3000385 3000436 HWI-1KL117_0134:6:2102:20734:102615#ACAGTG/A..GG.. 37 +
chr1 3000498 3000549 HWI-1KL117_0134:6:1203:3146:72739#ACAGTG/A..GTG. 37 -
chr1 3000538 3000589 HWI-1KL117_0134:6:1101:1921:57017#ACAGTG/A..GT.. 37 +
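Each BED line above has six whitespace-separated columns (chrom, start, end, read name, score, strand), and every read spans 51 bp. Below is a minimal parsing sketch in plain Python, assuming BED6 input like this excerpt; `BedRead`, `parse_bed`, `reads_per_bin` and the 250 bp bin size are hypothetical names and choices for illustration, not part of the pipeline described in the talk.

```python
from collections import Counter
from typing import NamedTuple


class BedRead(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    name: str
    score: int
    strand: str  # '+' or '-'


def parse_bed(lines):
    """Parse BED6 lines into BedRead records, skipping blank, comment and header lines."""
    reads = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        # split() handles both tabs (standard BED) and the spaces used in the excerpt
        chrom, start, end, name, score, strand = line.split()[:6]
        reads.append(BedRead(chrom, int(start), int(end), name, int(score), strand))
    return reads


def reads_per_bin(reads, bin_size=250):
    """Count reads per (chromosome, bin index), assigning each read by its start."""
    return Counter((r.chrom, r.start // bin_size) for r in reads)


# First, third, fifth and sixth rows of the alignment output above.
excerpt = """\
chr1 3000062 3000113 HWI-1KL117_0134:6:2101:14893:19331#ACAGTG/A..GTG. 37 +
chr1 3000146 3000197 HWI-1KL117_0134:6:2303:8145:108924#ACAGTG/A..GT.. 37 -
chr1 3000241 3000292 HWI-1KL117_0134:6:1304:12589:77263#ACAGTG/A..GT.. 25 -
chr1 3000311 3000362 HWI-1KL117_0134:6:1101:17212:111473#ACAGTG/A..GT.. 37 -
"""
reads = parse_bed(excerpt.splitlines())
print(Counter(r.strand for r in reads))  # Counter({'-': 3, '+': 1})
print(reads_per_bin(reads))              # Counter({('chr1', 12000): 3, ('chr1', 12001): 1})
```

Binning on the read start keeps the sketch simple; a real analysis would more likely bin on shifted fragment centers, as MACS does.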
Computational Pipeline for Genome-wide DNA Sequence Analysis
Bardet AF, Stark A, Nature Protocols, 2012
Alignment Analysis (Python, Perl)
• BWA
• Picard
• Samtools
• MACS, Cistrome (X. Shirley Liu Lab)
• ChIPseeqer (Olivier Elemento Lab)
Peak Identification with Python Program: Model-based Analysis of ChIP-Seq (MACS)
Zhang Y, Liu XS, et al. Genome Biology 2008
Feng J, Liu XS, et al. Nature Protocols 2012
Requirements: ~3 GB of RAM and ~1.5 h per data set of 30 million DNA sequence reads.
d: estimated DNA fragment size, i.e. the offset between the read distributions on the forward (5’→3’) and reverse (3’→5’) strands
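MACS uses d to move each read toward the middle of the fragment it came from: reads are shifted by d/2 toward their 3’ ends, so the signal from both strands piles up over the binding site. Below is a minimal sketch of that shift for single reads, assuming BED coordinates; `fragment_center` and d = 200 bp are hypothetical, not MACS internals or a value from the talk.

```python
def fragment_center(start, end, strand, d):
    """Approximate center of the DNA fragment a read came from.

    Coordinates are BED-style (0-based start, exclusive end). A '+' strand read
    sits at the fragment's left end, so the center is d/2 downstream of `start`;
    a '-' strand read sits at the fragment's right end, so the center is d/2
    upstream of `end`.
    """
    half = d // 2
    if strand == "+":
        return start + half
    if strand == "-":
        return end - half
    raise ValueError(f"unknown strand: {strand!r}")


# d = 200 bp is a hypothetical fragment size.
# Reads taken from the first and third rows of the alignment output shown earlier.
print(fragment_center(3000062, 3000113, "+", 200))  # 3000162
print(fragment_center(3000146, 3000197, "-", 200))  # 3000097
```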
• Read distribution: Poisson distribution
• Use a dynamic $\lambda_{\text{local}}$ to capture local biases in the genome:
  $\lambda_{\text{local}} = \max(\lambda_{BG}, [\lambda_{\text{region}}, \lambda_{1k},]\ \lambda_{5k}, \lambda_{10k})$
  $\lambda_{BG}$: constant estimated from the genome background
  $\lambda_{\text{region}}$: estimated from the candidate region
  $\lambda_{1k}$, $\lambda_{5k}$, $\lambda_{10k}$: estimated from 1 kb, 5 kb and 10 kb local windows in the control
• p-value: default threshold is $10^{-5}$
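The two formulas on this slide can be mirrored in a few lines of plain Python: take the largest of the λ estimates, then ask how likely the observed read count would be under a Poisson distribution with that rate. This is a minimal sketch, not MACS's own code; the λ values below are hypothetical expected read counts for one candidate window, and window sizes and control scaling are left out.

```python
import math


def lambda_local(lambda_bg, lambda_5k, lambda_10k, lambda_region=None, lambda_1k=None):
    """Dynamic Poisson rate: the largest of the background and local estimates.

    lambda_region and lambda_1k correspond to the bracketed (optional) terms in
    the formula; pass None to leave them out.
    """
    candidates = [lambda_bg, lambda_5k, lambda_10k]
    candidates += [lam for lam in (lambda_region, lambda_1k) if lam is not None]
    return max(candidates)


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summing the upper tail directly.

    Summing the tail avoids the precision loss of 1 - CDF for very small p-values.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k <= 0:
        return 1.0
    total, i = 0.0, k
    while True:
        # log-space PMF avoids overflow in lam**i and i!
        term = math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
        total += term
        # Past the mode (i > lam) the terms only shrink, so stop once they are negligible.
        if i > lam and term <= total * 1e-16:
            return total
        i += 1


# Hypothetical expected read counts for one candidate window (not values from the talk).
lam = lambda_local(lambda_bg=0.5, lambda_5k=1.2, lambda_10k=0.9)
for k in (8, 9):
    p = poisson_sf(k, lam)
    print(k, p, p < 1e-5)  # 8 reads: p ≈ 3.7e-5 (fails 1e-5); 9 reads: p ≈ 4.9e-6 (passes)
```

Taking the maximum is the conservative choice: a window only counts as a peak if it stands out against the highest nearby background, which is how the dynamic λ absorbs local biases.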
Galaxy / Cistrome (web-based application integrating MACS): http://cistrome.org/ap/
ChIPseeqer
• Graphical user interface
• Command line
http://physiology.med.cornell.edu/faculty/elemento/lab/chipseq.shtml
Histone X Enriches at Stem Cell Specific Gene Promoters Prior to Gene Expression Activation
[Figure: heatmaps of expression and histone X enrichment (color scale L to H) at Day 0, 3, 6, 10, iPS and E, with genes grouped as "Expression Change" and "Expression Stable"]
Genes: Pou5f1, Sox2, Cdh1, Cldn3, Jag2, Zbtb32, Elf3, Msh6, Lefty1, Piwil2, Notch4, Tjp3, Fbxo15, Cldn6, Foxh1, Zp3, Fgf15, Nodal, Tdgf1, Gdf3, Nanog, Fgf4, Dppa3
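The slide title claims that histone X enrichment comes before expression activation at stem cell gene promoters. One way to make that comparison explicit for a single gene, assuming each time point has already been called enriched or not and expressed or not (the calling thresholds are not given in the slides), is sketched below; `TIMEPOINTS`, `first_true` and `enrichment_precedes_expression` are hypothetical names.

```python
TIMEPOINTS = ["Day 0", "Day 3", "Day 6", "Day 10", "iPS"]


def first_true(flags):
    """Index of the first True flag, or None if no flag is True."""
    return next((i for i, flag in enumerate(flags) if flag), None)


def enrichment_precedes_expression(enriched, expressed):
    """True if histone X enrichment is first called at an earlier time point than expression.

    enriched, expressed: one boolean per entry of TIMEPOINTS (hypothetical calls).
    Genes never enriched or never expressed in the time course return False.
    """
    if len(enriched) != len(TIMEPOINTS) or len(expressed) != len(TIMEPOINTS):
        raise ValueError("expected one call per time point")
    first_enriched = first_true(enriched)
    first_expressed = first_true(expressed)
    if first_enriched is None or first_expressed is None:
        return False
    return first_enriched < first_expressed


# Hypothetical calls for one gene: enriched from Day 3, expressed from Day 10.
enriched = [False, True, True, True, True]
expressed = [False, False, False, True, True]
print(enrichment_precedes_expression(enriched, expressed))  # True
print(TIMEPOINTS[first_true(enriched)], TIMEPOINTS[first_true(expressed)])  # Day 3 Day 10
```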
Gene Ontology Analysis
[Figure: enriched GO terms by expression status (Active, Stable) and histone X group (a, b, c)]
GO terms: Embryonic placenta development; Stem cell maintenance; Response to nutrient; Cell-cell signaling; DNA metabolic process; DNA recombination; Formation of primary germ layer; Chromosome organization; Mesoderm development; Cell fate commitment; Stem cell differentiation; Blastocyst formation; Meiosis; Sexual reproduction; Thyroid hormone metabolic process; Cellular response to abiotic stimulus
a. Histone X enriched during Days 0-10
b. Histone X enriched in iPS cells (after Day 10)
c. Histone X not enriched
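The slides do not say which tool or test produced the GO enrichment. A common choice is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test): given a background of N genes, K of them annotated with a GO term, how surprising is it that k of the n genes in one group (for example group a) carry that term? Below is a minimal sketch with only the standard library; the function name and the counts in the example are hypothetical, and testing many terms would also need a multiple-testing correction.

```python
from math import comb


def go_term_enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k) for over-representation of a GO term.

    N: genes in the background set
    K: background genes annotated with the GO term
    n: genes in the group being tested (e.g. group a)
    k: genes in that group annotated with the GO term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts must satisfy 0 <= k <= n <= N and 0 <= K <= N")
    # Sum the probabilities of drawing exactly i annotated genes, for i = k .. min(n, K).
    # math.comb returns 0 when its second argument exceeds the first.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Hypothetical counts, not results from the talk:
# 20,000 background genes, 50 annotated to a term, 300 genes in one group, 8 of them annotated.
print(go_term_enrichment_p(k=8, n=300, K=50, N=20_000))
```

Exact integer arithmetic via `math.comb` keeps the sketch free of overflow; for large gene sets a log-space or library implementation would be faster.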
Our genome-wide analysis suggests:
Histone X participates in stem cell gene activation at the early stage of adult cell reprogramming.