Genome-wide association study between DSE polymorphism and Poly-A
usage in the human population
Hiren Karathia, Sridhar Hannenhalli
Transcription & Polyadenylation (Poly-A)
Objectives
• Genome-wide estimation of alternate Poly-A (PA) usage on the 3’UTR
• Genome-wide prediction and investigation of polymorphisms in DSE (Downstream Sequence Element) motifs
• Population-wide correlation study between PA usage and DSE polymorphisms
Annotation status of Poly-A sites on the 3’UTR of the human genome (hg19, 2009)
Figure: Frequency of transcripts vs. cleavage points (y-axis: log10 transcript frequency)
Cleavage points            1      2     3     4    5    6    7   8   9  10  11
Poly-A usage transcripts   12224  4784  1567  588  189  107  19  19  3  2   1
37% - Multiple Poly-A points
Target of the analysis
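The 37% figure can be checked against the transcript counts listed for each number of cleavage points. A minimal sketch follows; the variable names are illustrative and not taken from the slides.

```python
# Transcript counts by number of annotated Poly-A sites (1..11), as listed on the slide.
transcripts_per_pa_count = {
    1: 12224, 2: 4784, 3: 1567, 4: 588, 5: 189, 6: 107,
    7: 19, 8: 19, 9: 3, 10: 2, 11: 1,
}

total = sum(transcripts_per_pa_count.values())
# Transcripts with more than one annotated Poly-A site.
multiple = sum(n for sites, n in transcripts_per_pa_count.items() if sites > 1)
fraction_multiple = multiple / total

print(f"{multiple} of {total} transcripts ({fraction_multiple:.1%}) have multiple Poly-A sites")
```

With these counts, 7279 of 19503 transcripts carry more than one annotated site, which rounds to the 37% shown.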
RNA-Seq processing for Human Samples
• Sample FASTQ files
• BAM file (BWA, Samtools)
• Merged BAM file (Samtools)
• Sorted BAM file (Samtools)
• De-duplicated file (Picard tool)
• Indexing the BAM (Samtools)
• SAM file
• Calculate coverage (Bedtools)
• Calculate relative usage of PAs (Python script)
Symbol  Group of Samples                                              Male  Female  DNA  RNA
BR      British in England and Scotland                               1     1
FI      Finnish in Finland                                            1     1
UT      Utah residents with Northern and Western European ancestry   1     1
YO      Yoruba in Ibadan, Nigeria                                     1     1
Differential Expression of UTR
Cuffdiff tools
Python script
De-novo assembly
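The alignment and coverage steps of this pipeline can be sketched as a list of shell commands. This is only an illustration under stated assumptions: the slide names the tools (BWA, Samtools, Picard, Bedtools) but not the subcommands, options, or exact step order, so `bwa mem`, `bedtools genomecov`, the `picard.jar` path, and all file names below are assumed choices. The function only builds command strings and does not run anything.

```python
import shlex
from typing import List


def build_pipeline_commands(reference: str, fastq_files: List[str], sample: str) -> List[str]:
    """Return shell command strings for one sample; nothing is executed here."""
    q = shlex.quote
    commands: List[str] = []
    bams: List[str] = []
    for i, fastq in enumerate(fastq_files):
        sam, bam = f"{sample}.{i}.sam", f"{sample}.{i}.bam"
        # Align reads (assumed algorithm: `bwa mem`), writing SAM to stdout.
        commands.append(f"bwa mem {q(reference)} {q(fastq)} > {q(sam)}")
        # Convert SAM to BAM.
        commands.append(f"samtools view -b -o {q(bam)} {q(sam)}")
        bams.append(bam)

    merged = f"{sample}.merged.bam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup = f"{sample}.dedup.bam"
    # Merge per-file BAMs, then coordinate-sort the result.
    commands.append(f"samtools merge {q(merged)} " + " ".join(q(b) for b in bams))
    commands.append(f"samtools sort -o {q(sorted_bam)} {q(merged)}")
    # Remove duplicate reads with Picard (jar location is a placeholder).
    commands.append(
        f"java -jar picard.jar MarkDuplicates I={q(sorted_bam)} O={q(dedup)} "
        f"M={q(sample + '.metrics.txt')} REMOVE_DUPLICATES=true"
    )
    commands.append(f"samtools index {q(dedup)}")
    # Per-base depth (assumed subcommand: `bedtools genomecov -d`).
    commands.append(f"bedtools genomecov -ibam {q(dedup)} -d > {q(sample + '.coverage.txt')}")
    return commands


cmds = build_pipeline_commands("hg19.fa", ["a.fastq", "b.fastq"], "BR1")
for cmd in cmds:
    print(cmd)
```

Keeping the commands as data makes it easy to review them before handing them to a shell or a workflow manager.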
Genome-wide estimation of alternate Poly-A (PA) usage on 3’UTR
PA1 Coverage PA2 Coverage
PA1 Junction PA2 Junction
Complete UTR coverage
\[ \text{PA1 Usage} = \frac{\text{Coverage(Stop codon – PA1 junction)} / \text{Distance}}{\text{Coverage(complete 3'UTR)} / \text{Distance}} \]
\[ \text{PA2 Usage} = \frac{\text{Coverage(Stop codon – PA2 junction)} / \text{Distance}}{\text{Coverage(complete 3'UTR)} / \text{Distance}} \]
Stop Codon
Cleaved 3’UTR
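The PA usage ratio on this slide can be written as a small function. This is a minimal sketch, assuming that "Coverage" is the per-base read depth summed over a region and "Distance" is that region's length in bases; the function names, the zero-coverage handling, and the example numbers are hypothetical.

```python
def length_normalized_coverage(total_coverage: float, distance: int) -> float:
    """Summed coverage over a region divided by the region length in bases."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return total_coverage / distance


def pa_usage(cov_stop_to_pa: float, dist_stop_to_pa: int,
             cov_full_utr: float, dist_full_utr: int) -> float:
    """Usage of one PA site: normalized coverage from the stop codon to the
    PA junction, relative to normalized coverage of the complete 3'UTR."""
    numerator = length_normalized_coverage(cov_stop_to_pa, dist_stop_to_pa)
    denominator = length_normalized_coverage(cov_full_utr, dist_full_utr)
    if denominator == 0:
        # Assumption: a 3'UTR with no coverage is reported as zero usage.
        return 0.0
    return numerator / denominator


# Hypothetical numbers: 6000 summed depth over 200 bases up to PA1,
# 12000 summed depth over the complete 800-base 3'UTR.
usage_pa1 = pa_usage(6000, 200, 12000, 800)
print(usage_pa1)
```

A value above 1 means the region upstream of the PA junction is covered more densely than the complete 3'UTR on average.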
Prediction of DSE
Coding Strand of DNA
Sample A RNA-Seq
Sample A DNA-Seq
De-novo assembled 3’UTR fragment
Prediction of DSE motif
Template Strand of DNA
Frequency of Poly-A usage in the samples
Sample             BR-1 (F)  BR-2 (M)  FN-1 (F)  FN-2 (M)  UT-1 (F)  UT-1 (M)
Not Expressed      5499      5833      5211      5677      5849      5514
Single PA Usage    9185      8913      9302      9037      8852      9012
Multiple PA Usage  4819      4757      4988      4787      4802      4975
Figure: Frequency of Poly-A usage per sample (y-axis: Frequency of Transcript)
Correlation of different PA usage in a Human Sample
PA1 – PA2: r = -0.643; p = 0.0
PA2 – PA3: r = -0.182; p = 1.06e-33
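A correlation of this kind can be sketched as follows, assuming r denotes the Pearson coefficient (the slide does not name the statistic). The usage values in the example are hypothetical and only show the calculation; the p-value is not computed here.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-transcript usage values for two PA sites of the same 3'UTR.
pa1_usage = [0.9, 0.7, 0.5, 0.2]
pa2_usage = [0.1, 0.3, 0.5, 0.8]
r = pearson_r(pa1_usage, pa2_usage)
print(round(r, 3))
```

A negative r, as reported for PA1 – PA2, indicates that higher usage of one site tends to go with lower usage of the other.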
Correlation of PA usage and corresponding DSE polymorphism
Figure: -log(P) per pairwise sample comparison (x-axis: sample pairs such as Utah-Female <=> British-Male; y-axis: -LOG(P)). Panels: TT composition in DSE motifs; GT composition in DSE motifs; GG composition in DSE motifs; Length of DSE motifs.
Correlation of PA usage and corresponding DSE polymorphism
Functional enrichment of genes associated with differential PA usage and polymorphic forms of DSEs in the population
Thank you !!
Differential Expression of complete 3’UTR
Inter/Intra group correlation of a PA usage
PA1 usage
BR1 – BR2: r = 0.8; p = 0.0
FN1 – FN2: r = 0.8; p = 0.0
BR1 – FN1: r = 0.98; p = 0.0
Statistics of predicted DSE motifs
Sample  PA type   Mean(Motif Length)  Max(Motif Length)  Min(Motif Length)  Mean(Distance)  Max(Distance)  Min(Distance)
BR-1    Single    12                  79                 9                  30              89             1
BR-1    Multiple  12                  52                 9                  34              89             1
BR-2    Single    12                  62                 9                  31              89             1
BR-2    Multiple  12                  52                 9                  34              89             1
FN-1    Single    12                  90                 9                  35              89             1
FN-1    Multiple  12                  54                 9                  39              89             1
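Summary rows like these can be produced from a list of predicted motifs. This is a minimal sketch, assuming each motif is recorded as a (motif length, distance) pair; the slides do not specify the reference point for the distance, so that is left open. The function name and the example values are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def summarize_motifs(motifs: List[Tuple[int, int]]) -> Dict[str, int]:
    """Mean/max/min of motif length and distance, with means rounded to integers."""
    if not motifs:
        raise ValueError("no motifs to summarize")
    lengths = [length for length, _ in motifs]
    distances = [distance for _, distance in motifs]
    return {
        "mean_length": round(mean(lengths)),
        "max_length": max(lengths),
        "min_length": min(lengths),
        "mean_distance": round(mean(distances)),
        "max_distance": max(distances),
        "min_distance": min(distances),
    }


# Hypothetical motifs as (motif length, distance) pairs.
example_motifs = [(9, 1), (12, 30), (15, 89)]
summary = summarize_motifs(example_motifs)
print(summary)
```

Running the function once per sample and PA type (single or multiple) would give one table row each.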
Pending
• Find polymorphism in the DSEs
• Find correlation between PA usage and DSE polymorphism
Alternate Poly-A selection mechanism
Complete 3’UTR coverage vs. alternate 3’UTR coverage
• Differential expression of complete 3’UTR usage
• Differential expression of PA usage
Poly Adenylation Usage on 3’UTR
PA1 Coverage PA2 Coverage
PA1 Junction PA2 Junction
Complete UTR coverage
\[ \text{Relative PA1 Usage} = \frac{\text{PA1 Coverage}}{\text{Longest UTR Coverage}} \]
\[ \text{Relative PA2 Usage} = \frac{\text{PA2 Coverage}}{\text{Longest UTR Coverage}} \]
Stop Codon
Intron
Cleaved 3’UTR
+ strand
- strand
Gene Strand
Template Strand
+ Read
+ Read
+ Read
- Read
- Read
RNA Strand DNA Strand
Locations of annotated multiple PA locations on 3’UTR
Diagram labels: Stop Codon, PA1 Junction, PA2 Junction, Cleaved 3’UTR
• PAs on same exon
• PAs on multiple exons
Plot (axes: Poly-A Location; Length of 3’UTR): r = 0.2578; p = 8.44e-111