Python Meetup2014 (Ying Liu)

Unveiling Epigenetic Regulation with Next Generation Sequencing (NGS) and Python. Ying Liu, Weill Cornell Medical College. The New York Python Meetup, 05-29-2014


Post on 26-Jan-2015


DESCRIPTION

The New York Python Meetup: Biology & Python (05-29-2014)

TRANSCRIPT

Page 1: Python Meetup2014 (Ying Liu)

Unveiling Epigenetic Regulation with Next Generation Sequencing (NGS) and Python

Ying Liu

Weill Cornell Medical College

The New York Python Meetup, 05-29-2014

Page 2: Python Meetup2014 (Ying Liu)

About Me

• PhD candidate, Weill Cornell Medical College

• Major Area: Stem cell epigenomics, Computational Biology

• Graduation: Fall 2014

• Dream job: data scientist in biomedical informatics

• Email: [email protected]

• LinkedIn: https://www.linkedin.com/pub/ying-liu/b/669/605

Page 3: Python Meetup2014 (Ying Liu)

Generate Pluripotent Stem Cells from Mature Adult Cells

2012 Nobel Prize in Physiology or Medicine

Adult cells → express pluripotent stem cell specific genes (4 genes) → reprogramming for > 20 days (thousands of genes change expression) → induced pluripotent stem (iPS) cells

Limitation

• Reprogramming efficiency: 0.01 - 0.1%

• Molecular mechanism is not fully understood

Page 4: Python Meetup2014 (Ying Liu)

Human Genome Project

• Human genome: ~3 billion DNA base pairs

• First sequence draft: 2001

• Complete sequence: 2003

Page 5: Python Meetup2014 (Ying Liu)

Epigenetic Regulation

• Epigenetics: the study of heritable changes in gene activity that are NOT caused by changes in the DNA sequence

• One of the major epigenetic regulators: histone proteins

[Figure: histone proteins, DNA, and gene expression. Source: Nature 454, 711-715]

My project: Histone X

• Enriched at expressed genes

Page 6: Python Meetup2014 (Ying Liu)

Generate Pluripotent Stem Cells from Mature Adult Cells

2012 Nobel Prize in Physiology or Medicine

Adult cells → express pluripotent stem cell specific genes (4 genes) → reprogramming for > 20 days (thousands of genes change expression) → induced pluripotent stem (iPS) cells

Project

Determine the function of histone X in initiating the reprogramming of adult cells to iPS cells.

Experiment

• Collect cells at the beginning of reprogramming (Day 0, 3, 6, 10) and after reprogramming (iPS);

• Map genome-wide histone X localization with Next Generation Sequencing (NGS);

• Analyze the dynamic changes in genome-wide histone X localization with Python programs and frameworks.

Page 7: Python Meetup2014 (Ying Liu)

Genome-wide Analysis of Epigenetic Regulation

• Next generation DNA sequencing (Illumina, Inc)

• Align DNA sequences to chromosomes

• Display in genome browser (by gene)

• Computational analysis (by genome); tools: Python, R, etc.
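As an illustration of the step between alignment and display, the sketch below counts aligned reads in fixed-size bins along a chromosome, which is roughly what a coverage track in a genome browser summarizes. The function name `binned_coverage`, the 100 bp bin size, and the example read tuples are assumptions made for this example, not part of the talk's pipeline.

```python
from collections import defaultdict


def binned_coverage(reads, bin_size=100):
    """Count reads per fixed-size bin, keyed by (chrom, bin_start).

    Each read is assigned to the bin that contains its midpoint.
    """
    counts = defaultdict(int)
    for chrom, start, end in reads:
        midpoint = (start + end) // 2
        bin_start = (midpoint // bin_size) * bin_size
        counts[(chrom, bin_start)] += 1
    return dict(counts)


# Hypothetical aligned reads as (chromosome, start, end) tuples.
aligned = [
    ("chr1", 3000062, 3000113),
    ("chr1", 3000113, 3000164),
    ("chr1", 3000146, 3000197),
    ("chr1", 3000241, 3000292),
]

coverage = binned_coverage(aligned, bin_size=100)
for (chrom, bin_start), n in sorted(coverage.items()):
    print(chrom, bin_start, bin_start + 100, n)
```

Assigning each read to a single bin by its midpoint keeps the counts simple; a per-base pileup would be closer to what a browser draws, at the cost of more bookkeeping.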

Page 8: Python Meetup2014 (Ying Liu)

Histone X Enriches Near Stem Cell Specific Genes At the Beginning of Cell Reprogramming

[Figure: genome browser (IGV) tracks for Histone X and K27me3 at Day 0, 3, 6, and 10 around Pou5f1 and Nanog; scale bar 10 kb]

Alignment output (BED format)

Chrom Start End Name Score Strand
chr1 3000062 3000113 HWI-1KL117_0134:6:2101:14893:19331#ACAGTG/A..GTG. 37 +
chr1 3000113 3000164 HWI-1KL117_0134:6:2302:6790:10626#ACAGTG/A..GT.. 37 +
chr1 3000146 3000197 HWI-1KL117_0134:6:2303:8145:108924#ACAGTG/A..GT.. 37 -
chr1 3000154 3000205 HWI-1KL117_0134:6:2202:14995:109690#ACAGTG/A..GT.. 37 -
chr1 3000241 3000292 HWI-1KL117_0134:6:1304:12589:77263#ACAGTG/A..GT.. 25 -
chr1 3000311 3000362 HWI-1KL117_0134:6:1101:17212:111473#ACAGTG/A..GT.. 37 -
chr1 3000334 3000385 HWI-1KL117_0134:6:2308:10385:78074#ACAGTG/A..GT.. 25 -
chr1 3000385 3000436 HWI-1KL117_0134:6:2102:20734:102615#ACAGTG/A..GG.. 37 +
chr1 3000498 3000549 HWI-1KL117_0134:6:1203:3146:72739#ACAGTG/A..GTG. 37 -
chr1 3000538 3000589 HWI-1KL117_0134:6:1101:1921:57017#ACAGTG/A..GT.. 37 +
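A minimal sketch of reading this kind of alignment output in Python follows. It assumes the common six-column BED layout (chromosome, start, end, read name, score, strand) with score and strand as separate whitespace-delimited columns; the `BedRead` tuple and `parse_bed_line` helper are names chosen for this example.

```python
from collections import Counter, namedtuple

# Field names follow the common BED6 layout (an assumption for this sketch).
BedRead = namedtuple("BedRead", "chrom start end name score strand")


def parse_bed_line(line):
    """Parse one whitespace-delimited BED6 line into a BedRead."""
    chrom, start, end, name, score, strand = line.split()[:6]
    return BedRead(chrom, int(start), int(end), name, int(score), strand)


# Two rows in the form of the alignment output shown on the slide.
example = """\
chr1 3000062 3000113 HWI-1KL117_0134:6:2101:14893:19331#ACAGTG/A..GTG. 37 +
chr1 3000146 3000197 HWI-1KL117_0134:6:2303:8145:108924#ACAGTG/A..GT.. 37 -
"""

reads = [parse_bed_line(row) for row in example.splitlines() if row.strip()]
strand_counts = Counter(r.strand for r in reads)   # reads per strand
read_lengths = {r.end - r.start for r in reads}    # distinct read lengths
print(strand_counts, read_lengths)
```

BED coordinates are zero-based and half-open, so `end - start` gives the read length directly.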

Page 9: Python Meetup2014 (Ying Liu)

Computational Pipeline for Genome-wide DNA Sequence Analysis

Bardet AF, Stark A, Nature Protocols, 2012

Alignment

• BWA

• Picard

• Samtools

Analysis (Python, Perl)

• MACS, Cistrome (X. Shirley Liu Lab)

• ChIPseeqer (Olivier Elemento Lab)
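To make the order of the tools concrete, here is a hedged sketch that only builds the shell commands for one sample and prints them, without running anything. It assumes single-end reads aligned with `bwa aln` / `bwa samse`, conversion to BAM with `samtools view`, and peak calling with the MACS 1.4 executable `macs14` on a mouse genome (`-g mm`); the file names and the `build_pipeline` helper are hypothetical, and the exact commands used in the talk are not given in the slides. A duplicate-removal step with Picard would normally sit between alignment and peak calling and is left out here.

```python
def build_pipeline(reference, fastq, control_bam, sample):
    """Return the shell commands for one sample, in order, without running them."""
    sai = f"{sample}.sai"
    sam = f"{sample}.sam"
    bam = f"{sample}.bam"
    return [
        # Align single-end reads to the reference genome.
        f"bwa aln {reference} {fastq} > {sai}",
        f"bwa samse {reference} {sai} {fastq} > {sam}",
        # Convert the SAM alignment to BAM.
        f"samtools view -bS {sam} > {bam}",
        # Call peaks against a control sample (assumed MACS 1.4 options).
        f"macs14 -t {bam} -c {control_bam} -f BAM -g mm -n {sample} -p 1e-5",
    ]


# Hypothetical file names for one time point.
commands = build_pipeline("mm9.fa", "histoneX_day0.fastq", "input_day0.bam", "histoneX_day0")
for command in commands:
    print(command)
```

Keeping the commands as data makes it easy to review or log them per time point before handing them to a job scheduler or `subprocess`.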

Page 10: Python Meetup2014 (Ying Liu)

Peak Identification with Python Program: Model-based Analysis of ChIP-Seq (MACS)

Zhang Y, Liu XS, et al. Genome Biology 2008

Feng J, Liu XS, et al. Nature Protocols 2012

Requirement: ~3 GB of RAM, 1.5 h per data set with 30 million DNA sequence reads.

[Figure: panels (1) to (4); diagram of the two DNA strands (5' to 3' and 3' to 5') with the distance d marked]

d: estimated DNA fragment size

• Read distribution: Poisson distribution

• Use dynamic λ_local to capture local biases in the genome

λ_local = max(λ_BG, [λ_region, λ_1k], λ_5k, λ_10k)

λ_BG: constant estimated from the genome background
λ_region: estimated from the candidate region
λ_1k, λ_5k, λ_10k: estimated from 1 kb, 5 kb, 10 kb local windows in the control

• p-value: default threshold is 10^-5
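The statistical idea on this slide can be sketched in a few lines: take the largest of the available rate estimates as the local rate λ, then ask how likely it is to see at least the observed number of reads under a Poisson distribution with that rate. This is an illustration of the formula only, not the MACS source code; the function names, the example rates, and the read count are made up for the example.

```python
import math


def lambda_local(lam_bg, lam_region=None, lam_1k=None, lam_5k=None, lam_10k=None):
    """Largest of the available rate estimates; estimates passed as None are skipped."""
    candidates = [lam_bg, lam_region, lam_1k, lam_5k, lam_10k]
    return max(c for c in candidates if c is not None)


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the first k pmf terms."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


# Hypothetical rates (expected reads per candidate window) and observed count.
lam = lambda_local(lam_bg=2.0, lam_1k=4.0, lam_5k=3.5, lam_10k=3.0)
p_value = poisson_upper_tail(20, lam)
is_peak = p_value < 1e-5  # threshold taken from the slide's default
print(lam, p_value, is_peak)
```

Using the maximum makes the test conservative: a region with a locally high control signal needs more reads to be called a peak. For extremely small p-values, subtracting from 1 loses precision, so production code would typically work with a survival function or in log space.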

Page 11: Python Meetup2014 (Ying Liu)

Galaxy / Cistrome

MACS integrated web-based application: http://cistrome.org/ap/

Page 12: Python Meetup2014 (Ying Liu)

ChIPseeqer

• Graphical User Interface

• Command-line

http://physiology.med.cornell.edu/faculty/elemento/lab/chipseq.shtml

Page 13: Python Meetup2014 (Ying Liu)

Histone X Enriches Near Stem Cell Specific Genes At the Beginning of Cell Reprogramming

[Figure: genome browser tracks for Histone X and K27me3 at Day 0, 3, 6, and 10 around Pou5f1 and Nanog; scale bar 10 kb]

Page 14: Python Meetup2014 (Ying Liu)

Histone X Enriches At Stem Cell Specific Gene Promoters Prior to Gene Expression Activation

[Figure: heatmaps of Expression and Histone X at Day 0, 3, 6, 10, iPS, and E; color scale from L to H; rows grouped into Expression Change and Expression Stable]

Genes shown: Pou5f1, Sox2, Cdh1, Cldn3, Jag2, Zbtb32, Elf3, Msh6, Lefty1, Piwil2, Notch4, Tjp3, Fbxo15, Cldn6, Foxh1, Zp3, Fgf15, Nodal, Tdgf1, Gdf3, Nanog, Fgf4, Dppa3

Gene Ontology Analysis

[Figure: Gene Ontology terms by expression (Active, Stable) and group (a, b, c)]

Gene Ontology terms: Embryonic placenta development; Stem cell maintenance; Response to nutrient; Cell-cell signaling; DNA metabolic process; DNA recombination; Formation of primary germ layer; Chromosome organization; Mesoderm development; Cell fate commitment; Stem cell differentiation; Blastocyst formation; Meiosis; Sexual reproduction; Thyroid hormone metabolic process; Cellular response to abiotic stimulus

Groups:

a. Histone X enriched during Day 0 - 10

b. Histone X enriched in iPS (after Day 10)

c. Histone X not enriched
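The three groups can be expressed as a small rule over per-time-point enrichment calls. The sketch below assumes each gene comes with a mapping from time point to a boolean (whether a histone X peak was called at its promoter); the `classify_gene` helper, the time point labels, and the placeholder gene names are hypothetical and carry no real results.

```python
EARLY_DAYS = ("Day 0", "Day 3", "Day 6", "Day 10")


def classify_gene(enriched):
    """Assign group a, b, or c from a mapping of time point -> bool."""
    if any(enriched.get(day, False) for day in EARLY_DAYS):
        return "a"  # enriched at some point during Day 0 - 10
    if enriched.get("iPS", False):
        return "b"  # enriched only in iPS (after Day 10)
    return "c"      # not enriched


# Placeholder genes with made-up enrichment calls, for illustration only.
calls = {
    "gene_1": {"Day 0": False, "Day 3": True, "Day 6": True, "Day 10": True, "iPS": True},
    "gene_2": {"Day 0": False, "Day 3": False, "Day 6": False, "Day 10": False, "iPS": True},
    "gene_3": {"Day 0": False, "Day 3": False, "Day 6": False, "Day 10": False, "iPS": False},
}

groups = {gene: classify_gene(enriched) for gene, enriched in calls.items()}
print(groups)
```

The resulting gene lists per group are what would then be passed to a Gene Ontology enrichment tool.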

Page 15: Python Meetup2014 (Ying Liu)

Generate Pluripotent Stem Cells from Mature Adult Cells

Adult cells → express pluripotent stem cell specific genes (4 genes) → reprogramming for > 20 days (thousands of genes change expression) → induced pluripotent stem (iPS) cells

Limitation

• Reprogramming efficiency: 0.01 - 0.1%

• Molecular mechanism is not fully understood

Our genome-wide analysis suggests:

Histone X participates in stem cell gene activation at the early stage of adult cell reprogramming.