Bioinformatic analysis of the interface between
mitochondrial biogenesis and apoptotic cell death signaling
pathways in Parkinson’s disease.
Robert Bentham
Supervised by Dr G. Szabadkai and Dr K. Bryson
March 3, 2012
Contents
1 Introduction
2 Microarray Analysis
    2.1 Data acquisition
    2.2 Quality Control
        2.2.1 Normalisation
    2.3 LIMMA
        2.3.1 Results
    2.4 Gene Set Analysis
        2.4.1 GSEA
        2.4.2 GAGE
        2.4.3 Results
3 Conclusion
References
A Tables
B R code
1 Introduction
Mitochondria are subcellular organelles present in most eukaryotic cells. They have a complex evolutionary history: according to endosymbiotic theory, they evolved from free-living bacteria that became incorporated within a cell. They have their own DNA (known as mtDNA), which is inherited from the mother only. The primary function of mitochondria is to provide ATP, which the rest of the cell uses as a source of energy, so they are essential for the healthy function of a cell.
Cell survival depends on the maintenance of a healthy cellular mitochondrial pool, which in turn depends on two processes: the degradation of damaged mitochondria by autophagy, and the process of mitochondrial renewal, mitochondrial biogenesis. This project chiefly concerns the latter. Biogenesis is simply the process by which new mitochondria are formed; the precise biological machinery controlling it, however, is highly complex. Despite this complexity, the PGC-1 family of transcriptional coactivators has been identified as the master regulators of mitochondrial biogenesis [14].
Cancer, cardiovascular disease and neurodegenerative diseases such as Parkinson’s have all been associated with dysfunction of the mitochondria [14] [5]. A recent review of the overlapping pathways involved in Parkinson’s and cancer [3] stresses the role of mitochondria in both.
It has previously been shown that PGC-1α is downregulated in Parkinson’s disease [19]. This could contribute to the pathogenesis of Parkinson’s disease through mitochondrial dysfunction, which would make PGC-1α a potential therapeutic target. Additionally, a previous bioinformatic analysis of the role of PGC-1 in cancer [1] found that PGC-1 also downregulates stress pathways involved in DNA damage. Interestingly, DNA damage has also been suggested to be associated with Parkinson’s disease [12].
The aim of this work is to test the hypothesis that, in clinical samples of Parkinson’s disease, besides downregulation of mitochondrial pathways there are alterations in pathways involving DNA damage. To do this, previously published microarray data will be studied, and significantly differentially expressed genes and gene pathways identified.
2 Microarray Analysis
A microarray is a device for measuring the expression levels of large numbers of genes. It does this by utilising the process of DNA hybridisation, which is illustrated in Figure 1. The expression level of each gene is detected by hybridising with a number of oligonucleotide fragments on the chip that act as probes. For a single gene there are 11 perfect match (PM) probes and 11 mismatch (MM) probes, in which the sequence differs by a single base. The MM probes are important for quality control: they measure the specificity of the hybridisation by giving an indication of any cross-hybridisation that has occurred. Thus the chip is covered with a large number of DNA probes of both PM and MM type. The target RNA from the experimental sample is processed and fluorescently labelled, so that when hybridisation occurs with the probes, a measure of gene expression is obtained from the intensity of the fluorescence at each spot on the microarray. For Affymetrix chips, two microarrays from different experimental conditions, one being from a control sample, can then be compared, and differences arising from the experimental condition inferred.
Figure 1: Image from Affymetrix illustrating the construction and workings of an Affymetrix microarray.
There are numerous issues in the use of microarrays, as with any other high-throughput technique. Firstly, there is a huge amount of data that must be analysed in a statistically robust manner. To maintain this robustness, quality control is an essential part of any analysis; these issues and others are discussed in [20]. Unfortunately, with microarrays different statistical techniques can lead to quite different results, so one must proceed with care. There are also many things beyond our control, such as technical variability in the actual experiment. This comes from differences in temperature and pH, which affect hybridisation on the microarray. Additionally, the probes cannot all be optimised equally for hybridisation; adding the stochastic nature of biological systems, this leads to very noisy results with large systematic bias. Any statistical analysis must deal with these levels of noise and judge when to reject a microarray from the analysis because its systematic bias can no longer be tolerated.
2.1 Data acquisition
For the aims of this report, four datasets involving microarrays from patients with Parkinson’s disease were identified for analysis. The first dataset, which will be referred to as the Zheng dataset (available from GEO Series accession number GSE24378 [9]), is part of the meta study that identified PGC-1α as a potential target for Parkinson’s disease [19]. This particular study is made up of 17 samples, with 8 replicates for Parkinson’s disease and 9 replicates for the controls; the RNA used on the microarray is from 500 dopamine (DA) neurons from the substantia nigra pars compacta (SNc).
Three further datasets were selected for analysis. The first, which will be called the Middleton dataset (available through GEO Series accession number GSE20292 [8]), was also used in the meta study [19] [28]. Middleton has 18 control replicates and 11 replicates with Parkinson’s disease. The next dataset chosen, named Mullen (available through GEO Series accession number GSE7621 [6]), has 16 replicates for Parkinson’s disease and 9 replicates for the controls [17]. The final dataset will be referred to as Moran (available through GEO Series accession number GSE8397 [7] [21]). The Moran dataset had microarrays from the Affymetrix U133A and U133B chips; of these only the U133A chips were used, and only microarrays taken from the substantia nigra, with no distinction made between the lateral and medial parts. After this selection the Moran dataset contained 24 replicates for Parkinson’s disease and 15 replicates for the controls.
All of the datasets chosen were from microarrays using Affymetrix chips. Middleton and Moran used U133A chips, while Mullen used the more recent U133 Plus 2.0; the two differ in the number of genes they detect, the Plus 2.0 having probes for an additional 6500 genes. In contrast, the Zheng study used the U133 X3P chip, whose probes are designed to examine sequences closer to the 3’ end of transcripts. This is useful in cases of bad RNA degradation, since degradation proceeds from the 5’ end of transcripts.
2.2 Quality Control
The purpose of quality control is to identify arrays that cannot be corrected and so cannot be used in the analysis. Problems may include mistakes in the experimental procedure or a very low signal-to-noise ratio. For a comprehensive look at array quality a variety of measures should be examined. This can be quite time consuming; however, it is possible to automate the process somewhat with the R package arrayQualityMetrics [15]. A few of the main methods of quality control used will be discussed here, though there are many different techniques, many of which are generated automatically by the arrayQualityMetrics package.
The first thing to check for is array defects, by looking at a spatial plot of intensities. Areas of high intensity could indicate uneven hybridisation, while patterns in the spatial plot could indicate a particle being loose in the chip and scratching the surface while hybridisation occurs in a centrifuge. Figure 2a shows the spatial plots for all chips in the Middleton dataset; in this case, and in all other datasets examined, there were no problems with either array defects or hybridisation effects.
(a) Spatial plot showing probe intensities of microarrays for the Middleton dataset; all microarrays here are normal. This plot was generated with the arrayQualityMetrics package.

[Plot: RNA degradation plot; x axis: probe number, 5' to 3'; y axis: mean intensity, shifted and scaled; legend: samples C1-C9 and PD1-PD8]

(b) RNA degradation plot showing severe degradation for samples in the Zheng dataset, despite the special U133 X3P chip designed for cases with bad RNA degradation.

(c) PM and MM log2 intensity graph for data in the Middleton study, generated with the arrayQualityMetrics package.

Figure 2: Quality control measures used in the analysis.

The next quantity to check for is RNA degradation or poor labeling. It is well known that RNA degradation starts from the 5’ end of a molecule and finishes at the 3’ end, a feature that the U133 X3P chip makes use of. For this reason, if RNA degradation has occurred the mean intensity of the probes at the 3’ end should be much higher; this can easily be checked and plotted in R. Figure 2b shows an increase in the intensities of probes at the 3’ end in the Zheng dataset. Indeed, all the other datasets showed similar results. This result could also be due to inefficient labeling, as the labeling reaction used in preparing the RNA sample proceeds from the 3’ end; however, since all samples were taken post mortem, it is very likely that the cause is RNA degradation. As all samples in each dataset have comparable degradation, this should not affect the analysis [2].
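The 5' to 3' intensity check described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used for the report; the function names and the toy intensities are hypothetical. It averages log2 intensities by probe position and fits a least-squares slope, where a clearly positive slope indicates higher intensity towards the 3' end.

```python
from statistics import mean

def mean_by_position(probesets):
    """Average intensity at each probe position across probesets.

    probesets: list of lists, each ordered from the 5' to the 3' end
    and all of the same length.
    """
    return [mean(column) for column in zip(*probesets)]

def slope(values):
    """Least-squares slope of values against position 0, 1, 2, ..."""
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

# Toy log2 intensities for three probesets on one array (hypothetical data).
toy_array = [
    [6.0, 6.5, 7.0, 7.5],
    [5.0, 5.5, 6.0, 6.5],
    [7.0, 7.5, 8.0, 8.5],
]
profile = mean_by_position(toy_array)
degradation_slope = slope(profile)
```

Comparing the slope across arrays, rather than reading its absolute value, matches the argument in the text that comparable degradation across samples should not distort the analysis.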
Additional measures of quality control include checking the density histogram of the PM and MM log2 intensities. MM probes measure the non-specific hybridisation, or cross-hybridisation, that occurs. It is expected that the RNA should bind more strongly to the PM probes than to the MM probes; if this is not the case then there will be a low signal-to-noise ratio in the results. This graph for the Middleton dataset is shown in Figure 2c. For all studies it was found that the RNA bound more strongly to the PM probes, though the graphs suggest that the levels of noise are possibly quite high.
2.2.1 Normalisation
An abundance of variation exists between chips in microarray analysis, even between replicates from the same sample. The variation has both technical causes, such as the temperature and pH during hybridisation, and biological causes, such as differences between two samples coming from patients with the same condition. Therefore, for a fair comparison, all chips being analysed must be normalised with respect to each other. Checking for successful normalisation is the final step in quality control. The method of normalisation used in this report was Robust Multichip Average (RMA), which is fully described, along with possible alternative methods, in [11].
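To make the idea of normalising chips against each other concrete, the sketch below shows quantile normalisation in plain Python. RMA itself is described in [11]; this covers only the step commonly described as RMA's between-chip normalisation and omits background correction and probe summarisation. The function name and the toy values are hypothetical, and ties are handled naively.

```python
def quantile_normalise(chips):
    """Give every chip the same empirical distribution of intensities.

    chips: list of equal-length lists, one list of intensities per chip.
    Ties are broken by position; a fuller implementation would average them.
    """
    n_probes = len(chips[0])
    # Sort each chip, then average across chips at each rank.
    sorted_chips = [sorted(chip) for chip in chips]
    rank_means = [
        sum(chip[r] for chip in sorted_chips) / len(chips)
        for r in range(n_probes)
    ]
    normalised = []
    for chip in chips:
        # order[r] is the index of the r-th smallest value on this chip.
        order = sorted(range(n_probes), key=lambda i: chip[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalised.append(out)
    return normalised

# Two toy chips with different overall brightness (hypothetical values).
chips = [[2.0, 4.0, 6.0], [5.0, 3.0, 10.0]]
normalised = quantile_normalise(chips)
```

After this step each chip contains the same set of values, only in a different order, which is why boxplots of normalised chips line up.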
Successful normalisation can be checked by examining boxplots and MvA plots both pre and post normalisation. Figure 3 shows the effect of normalisation on boxplots of the intensity values on each chip. In an MvA plot, M is the log2 fold change between the intensity values of each probeset on two different arrays, while A is the average log2 intensity of each probeset on those arrays. Every probeset is plotted with M on the y axis and A on the x axis. Figure 4 shows MvA plots post normalisation; ideally an MvA plot is symmetrical about the x axis and resembles a comet shape [24]. Once all the data have been normalised, the next task is to find the significantly differentially expressed genes.
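The definitions of M and A translate directly into code. The following is a small Python sketch with a hypothetical function name and made-up intensities, not the plotting code used for the figures.

```python
import math

def mva(intensities_1, intensities_2):
    """Return (M, A) per probeset for two arrays of raw intensities."""
    m_values, a_values = [], []
    for x1, x2 in zip(intensities_1, intensities_2):
        l1, l2 = math.log2(x1), math.log2(x2)
        m_values.append(l1 - l2)        # log2 fold change between arrays
        a_values.append((l1 + l2) / 2)  # average log2 intensity
    return m_values, a_values

# Three probesets measured on two arrays (hypothetical intensities).
m, a = mva([8.0, 16.0, 4.0], [4.0, 16.0, 8.0])
```

For well-normalised replicates the M values should scatter evenly around zero at every value of A.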
(a) Boxplot showing pre-normalised data for the intensity of each chip in the Middleton study.

(b) Boxplot showing post-normalised data for the intensity of each chip in the Middleton study.

Figure 3: The effect of normalisation on the boxplot showing intensity for each chip; graphs generated with the arrayQualityMetrics package. Normalisation is needed so that all chips can be compared fairly.
2.3 LIMMA
LIMMA, or linear models for microarray data [25], is a package in R designed for finding significant genes by estimating the log fold changes in expression level between different experimental conditions. The method LIMMA uses is fully explained in [24]. The first step is to calculate the log fold change, for which LIMMA assumes a linear model:
\[ E[y_j] = D \alpha_j \tag{1} \]
Figure 4: MvA plots between a sample of control replicates in the Moran study, shown post normalisation. The ideal shape of an MvA plot for replicates post normalisation is symmetrical about the x axis and resembles a comet shape. Here the LOESS line is shown in red, and if normalisation is done well it should lie on the x axis. The replicates shown here have all been normalised fairly well.

Here y_j is the vector of expression values for gene j across the samples, and E[y_j] is its expected value. D is the design matrix, which will be explained shortly, and α_j is the vector of coefficients, containing the differences between the experimental conditions. The design matrix and vector of coefficients can be constructed so that the comparison of interest, here the log fold change between the control and Parkinson’s case, is built into the fitted model. To see this, suppose that there are 4 samples: 2 replicates for the control case and 2 for Parkinson’s disease. Then the design matrix and vector of coefficients can be written as:
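\[ D = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 1 \\ 1 & 1 \end{pmatrix}, \qquad \alpha_j = \begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix} \tag{2} \]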
Here θ_2 can be written as x_pd − x_c, where x_c and x_pd are the log expression levels of a particular gene in the control and Parkinson’s samples respectively. Written like this, θ_2 gives the difference between the log expression levels in the Parkinson’s and control cases, that is, the log fold change. In contrast, θ_1 is the baseline term: with this design matrix it is the log expression level x_c in the control samples. LIMMA estimates both coefficients, but it is only the value of θ_2, representing the log fold change, that is of interest.
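A small numerical check makes the roles of the two coefficients visible. The sketch below is plain Python with hypothetical expression values, not LIMMA's own fitting code; it solves the least-squares problem for the four-sample design directly. The fitted first coefficient is the control mean and the second is the Parkinson's mean minus the control mean.

```python
def fit_two_group(design, y):
    """Least-squares fit of E[y] = D * alpha for a two-column design.

    Solves the 2x2 normal equations (D^T D) alpha = D^T y directly.
    """
    # Entries of D^T D and D^T y.
    a11 = sum(row[0] * row[0] for row in design)
    a12 = sum(row[0] * row[1] for row in design)
    a22 = sum(row[1] * row[1] for row in design)
    b1 = sum(row[0] * yi for row, yi in zip(design, y))
    b2 = sum(row[1] * yi for row, yi in zip(design, y))
    det = a11 * a22 - a12 * a12
    theta1 = (a22 * b1 - a12 * b2) / det
    theta2 = (a11 * b2 - a12 * b1) / det
    return theta1, theta2

# Design matrix for two controls followed by two Parkinson's samples.
D = [[1, 0], [1, 0], [1, 1], [1, 1]]
# Hypothetical log2 expression values for one gene.
y = [7.0, 7.4, 8.1, 8.5]
theta1, theta2 = fit_two_group(D, y)
```

With these toy values the control mean is 7.2 and the Parkinson's mean is 8.3, so the fit should return approximately 7.2 and 1.1.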
LIMMA then uses an empirical Bayes method to moderate the standard errors of these estimated coefficients. Empirical Bayes borrows information across genes and keeps the analysis stable, which is especially needed for experiments with small numbers of arrays [24]. After this, LIMMA automatically calculates FDR-adjusted p values; this is needed because multiple hypotheses are being tested for significance. For example, if 1000 genes were tested at a significance level of 0.01, it is expected that 10 genes would be deemed significant even if in reality no gene is differentially expressed. The false discovery rate (FDR) adjustment modifies the p values so that this rate of false discoveries is controlled. The adjusted p value is essentially the expected proportion of false discoveries among those genes which have been classified as differentially expressed. If a particular gene has an FDR-adjusted p value of 0.07, it means that an estimated 7% of the genes with adjusted p values at or below this value are false positives.
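The adjustment can be illustrated with a short Python sketch. The text does not name the FDR procedure used, so the Benjamini-Hochberg method is assumed here as a common choice; the function name and the raw p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

# Expected number of false positives with no adjustment, as in the text.
expected_false_positives = 1000 * 0.01

raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
```

Each raw p value is scaled by the number of tests divided by its rank, so small p values are penalised most heavily when few other genes look significant.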
2.3.1 Results
Running LIMMA on the Zheng dataset identifies precisely zero significantly expressed genes after multiple hypothesis adjustment. This is surprising given the difference that is expected between patients with and without Parkinson’s. One possible reason is the experimental design of the Zheng dataset: samples were taken from only 500 DA neurons, which is a very small amount of material, so it is not surprising that little is found in the analysis [2]. Another telling sign is that this dataset is only part of a meta study [19] and has no papers published using its results alone, suggesting that by itself it yields no significant findings. For these reasons the Zheng dataset was discarded from further analysis.
The other three datasets did yield significantly expressed genes: 91 in the Middleton study, 180 in Mullen and 3360 in Moran, all with multiple-hypothesis-adjusted p values of less than 0.05. Clearly the Moran study found a much greater number of significantly expressed genes. This could be due to Moran having the largest number of replicates of all the studies, and thereby being able to find more significant genes; in contrast, Middleton had the fewest Parkinson’s disease replicates and has the smallest number of significantly expressed genes.
2.4 Gene Set Analysis
Using LIMMA, a list of significant genes has been found for each study. This tells us there are differences between the controls and those with Parkinson’s disease; however, extracting biological meaning from these lists is difficult. In biology a gene does not act in isolation but in concert with many others. An improved approach is to examine differences in sets of genes, gene sets or gene pathways, that share a common function or purpose. These gene sets are largely taken from major databases such as Gene Ontology or KEGG. Finding significant gene pathways involves a set of statistical techniques known as Gene Set Analysis (GSA); two of these techniques will be described here.
2.4.1 GSEA
GSEA, or gene set enrichment analysis, is one of the standard methods for GSA. It was originally developed by Subramanian et al. [26] in 2005; since then it has found wide use in the bioinformatics community, and is certainly one of the most popular methods of GSA. The original method involves taking a ranked gene list, such as the ones generated by a LIMMA analysis, and calculating what is referred to as an ‘Enrichment Score’ for each gene set. This enrichment score is fully described in [26]; it gives a score based on whether the genes in a gene set fall towards the top or bottom of the ranked list. Using this enrichment score, significance is inferred by using sample permutations to derive a null distribution from which the p values can be calculated.
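The idea of scoring a gene set by where its members fall in a ranked list can be sketched as below. This is a simplified, unweighted running-sum score written in Python for illustration; the full enrichment score in [26] also weights each step by the gene's ranking statistic, which is omitted here. The function name and gene labels are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    Walks down the ranked list, stepping up by 1/n_hits at each gene in
    the set and down by 1/n_misses otherwise, and returns the running
    sum with the largest absolute value.
    """
    members = set(gene_set)
    n_hits = sum(1 for g in ranked_genes if g in members)
    n_misses = len(ranked_genes) - n_hits
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        if gene in members:
            running += 1.0 / n_hits
        else:
            running -= 1.0 / n_misses
        if abs(running) > abs(best):
            best = running
    return best

ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
top_score = enrichment_score(ranked, {"g1", "g2"})
bottom_score = enrichment_score(ranked, {"g5", "g6"})
```

A set concentrated at the top of the list gets a score near +1, one concentrated at the bottom a score near -1, and a set spread evenly through the list a score near 0.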
Many variants of GSEA can be found in the literature, since the calculation of the enrichment score has been seen as overcomplicated. Other approaches include using a two-sample t test, as in [13] and [27]. In particular, [13] introduces the Jiang and Gentleman statistic, or J-G statistic:
\[ \tau_k = \frac{1}{\sqrt{|S_k|}} \sum_{g \in S_k} t_g \tag{3} \]
Here t_g is the t statistic for a single gene g and |S_k| is the size of the gene pathway of interest. The J-G statistic is normalised by the square root of the size of the pathway, so that as |S_k| approaches infinity the distribution of the J-G statistic approaches the unit normal. The significance of gene pathways is then inferred, as in the original GSEA paper, using sample permutations.
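Computed directly, the J-G statistic is a one-line aggregation. The Python sketch below uses hypothetical per-gene t statistics and a hypothetical function name.

```python
import math

def jg_statistic(t_stats, gene_set):
    """Sum of per-gene t statistics in the set, divided by sqrt(set size).

    Genes in gene_set that have no t statistic are ignored.
    """
    members = [g for g in gene_set if g in t_stats]
    return sum(t_stats[g] for g in members) / math.sqrt(len(members))

# Hypothetical per-gene t statistics, e.g. from a two-group comparison.
t_stats = {"g1": 2.0, "g2": 1.0, "g3": -0.5, "g4": 3.0, "g5": 0.1}
tau = jg_statistic(t_stats, ["g1", "g2", "g3", "g4"])
```

Dividing by the square root of the set size, rather than the size itself, keeps the variance of the statistic comparable between small and large gene sets.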
This method was slightly adapted in [22], where instead of the J-G statistic, based on an aggregation of t statistics, a statistic based on the aggregation of gene-level regression residuals was used. This method assumes that there is a linear relationship between the mean response variable, i.e. gene expression, and the explanatory covariates, such as the presence of Parkinson’s disease. If such a linear regression model holds, the regression residuals can be calculated and aggregated in a similar way to the J-G statistic:
\[ R_{ki} = \frac{1}{\sqrt{|S_k|}} \sum_{g \in S_k} r_{gi} \tag{4} \]
Significance is again inferred using sample permutations. This last procedure, calculating the significant pathways from regression residuals, is implemented in the R package GSEAlm [23].
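Inference by sample permutation can be sketched as follows. For simplicity this Python example aggregates Welch t statistics in the style of the J-G statistic rather than the regression residuals used by GSEAlm, and it enumerates every relabelling exactly, which is only feasible for very small studies. All names and expression values are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, variance

def t_stat(values, group_a, group_b):
    """Welch t statistic comparing values at indices group_b vs group_a."""
    a = [values[i] for i in group_a]
    b = [values[i] for i in group_b]
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / se

def set_statistic(expr, gene_set, group_a, group_b):
    """Aggregate per-gene t statistics as sum / sqrt(set size)."""
    total = sum(t_stat(expr[g], group_a, group_b) for g in gene_set)
    return total / math.sqrt(len(gene_set))

def permutation_p_value(expr, gene_set, n_samples, case_idx):
    """Two-sided p value from all relabellings of the samples."""
    all_idx = range(n_samples)
    control_idx = [i for i in all_idx if i not in case_idx]
    observed = abs(set_statistic(expr, gene_set, control_idx, case_idx))
    count = total = 0
    for perm_cases in combinations(all_idx, len(case_idx)):
        perm_controls = [i for i in all_idx if i not in perm_cases]
        stat = abs(set_statistic(expr, gene_set, perm_controls, perm_cases))
        if stat >= observed - 1e-12:  # tolerance for floating-point ties
            count += 1
        total += 1
    return count / total

# Samples 0-2 are controls and 3-5 are cases (hypothetical log2 values).
expr = {
    "g1": [1.0, 1.2, 0.9, 2.1, 2.3, 2.0],
    "g2": [3.0, 3.1, 2.8, 4.0, 4.2, 3.9],
}
p = permutation_p_value(expr, ["g1", "g2"], 6, [3, 4, 5])
```

With three samples per group there are only 20 possible labellings, so the smallest attainable two-sided p value is 2/20; this is one reason permutation methods lose sensitivity when replicate numbers are small.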
2.4.2 GAGE
GAGE, or Generally Applicable Gene-set Enrichment for pathway analysis, is a method for gene set analysis developed by Luo et al. [18], in which the authors claim to improve on previous GSA methods such as GSEA and PAGE [16], another popular method for GSA. GAGE, like PAGE, determines the significance of gene sets by a parametric analysis, as opposed to the permutation of sample labels used to calculate significance in GSEA. Some claim that GSEA has low sensitivity, while the authors of GAGE claim that PAGE is overly sensitive.
The procedure of GAGE is outlined in [18], and will be given in brief here. As with all GSA methods, the aim is to rank gene set pathways and assign significance to them. GAGE does this by comparing the mean fold change of the target gene set with that of the background by means of a two-sample t test; PAGE, in comparison, uses a z test. The two-sample t test and its degrees of freedom are defined as follows:
\[ t = \frac{m - M}{\sqrt{\dfrac{s^2}{n} + \dfrac{S^2}{n}}} \tag{5} \]

\[ df = \frac{(n - 1)(s^2 + S^2)^2}{s^4 + S^4} \tag{6} \]
Here m, s and n are the mean fold change, standard deviation and number of genes in the gene set respectively, while M and S are the mean fold change and standard deviation of all the genes. The t test essentially compares the gene set of interest with a gene set of identical size whose mean fold change and standard deviation are derived from the background.
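The test statistic and its degrees of freedom can be evaluated directly from the definitions above. The following Python sketch is not the GAGE package's code; the function name and fold changes are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption.

```python
import math
from statistics import mean, stdev

def gage_t_test(set_fold_changes, all_fold_changes):
    """Two-sample t statistic and degrees of freedom for one gene set.

    The set is compared with a background 'set' of the same size n whose
    mean and standard deviation come from all genes.
    """
    n = len(set_fold_changes)
    m, s = mean(set_fold_changes), stdev(set_fold_changes)
    M, S = mean(all_fold_changes), stdev(all_fold_changes)
    t = (m - M) / math.sqrt(s**2 / n + S**2 / n)
    df = (n - 1) * (s**2 + S**2) ** 2 / (s**4 + S**4)
    return t, df

# Hypothetical log2 fold changes; the gene set is a subset of all genes.
all_genes = [-1.0, 0.0, 1.0, 1.0, 2.0, 3.0]
gene_set = [1.0, 2.0, 3.0]
t, df = gage_t_test(gene_set, all_genes)
```

Here m = 2, s = 1, M = 1 and S is the square root of 2, so t should be close to 1 and the degrees of freedom close to 3.6.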
P values can then be obtained from the t test. However, GAGE combines the p values from the different replicates into a global p value. GAGE has two modes: it either compares experiment-control samples one-to-one, if the samples are paired, or compares the experimental samples with average gene expression levels, if they are unpaired. Since all studies used in this report were unpaired, only the latter case will be discussed.
With k = 1, ..., K experimental samples and l = 1, ..., L control samples, where L ≠ K, the p values need to be combined in such a way that each p value is independent. If the null hypothesis is true, the p values from the two-sample t test follow a Uniform(0,1) distribution. Additionally, it is known that the negative sum of the logs of K independent p values follows a Gamma(K,1) distribution. Thus a global p value is simple to calculate using the gamma distribution:
\[ P(X > x), \qquad X \sim \mathrm{Gamma}(K, 1) \tag{7} \]
The only issue, therefore, is constructing K independent p values for unpaired data such as are used in this report; however, this turns out to be fairly simple. For the first experimental sample, P_1 is calculated as the average (on the log scale, as in Equation 8) of the one-on-one comparisons of that experimental sample with each of the control samples. In this way K independent p values are constructed, whose negative log sum x follows the gamma distribution:
\[ x = -\frac{1}{L} \sum_{k=1}^{K} \sum_{l=1}^{L} \log P_{kl} \tag{8} \]
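The combination step can be sketched in Python as follows. This is an illustration of the two formulas above rather than the GAGE package's code; the function names and p values are hypothetical. It relies on the closed form of the Gamma(K, 1) upper tail for integer K, which is exp(-x) multiplied by the sum of x^i / i! for i from 0 to K - 1.

```python
import math

def gamma_survival(x, k):
    """P(X > x) for X ~ Gamma(k, 1) with integer shape k."""
    term, total = 1.0, 0.0
    for i in range(k):
        total += term          # term equals x**i / i!
        term *= x / (i + 1)
    return math.exp(-x) * total

def global_p_value(p_matrix):
    """Combine p_matrix[k][l], the p value of experiment k vs control l."""
    n_controls = len(p_matrix[0])
    # Average log p over controls, then sum over experimental samples.
    x = -sum(math.log(p) for row in p_matrix for p in row) / n_controls
    return gamma_survival(x, len(p_matrix))

# Two experimental samples, each compared with two controls (hypothetical).
p_matrix = [[0.01, 0.02], [0.03, 0.04]]
p_global = global_p_value(p_matrix)
```

As a sanity check, a single experimental sample compared with a single control returns its own p value unchanged, since the Gamma(1, 1) tail is exp(-x).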
Running GAGE through its R package is extremely simple; it automatically ranks the gene sets and corrects the p values for multiple testing. The results gained from the GAGE and GSEA analyses are discussed below.
2.4.3 Results
Both GSEAlm and GAGE, each implemented in R, were used to analyse the data. Of these, only the results for GAGE are presented in this report, due to problems with the GSEAlm analysis. GSEAlm predicted that there were no significant pathways with p values less than 0.05 for the Middleton study. Since the LIMMA analysis showed earlier that there were significantly expressed genes between the Parkinson’s cases and the controls in the Middleton study, this seems surprising and biologically unrealistic. It could be explained by the low sensitivity of GSEA that has been suggested in the literature [4]. Additionally, there seems to be a problem with this version of GSEA, GSEAlm: the output has many different pathways with exactly the same p value, which from online resources [10] seems to be a common feature of the program. Because of this, the GSEAlm output fails to give a definitive ranking of the significance of the gene pathways and fails to hit pathways which are biologically relevant, and so it was judged unsuitable for use in this report.
The results from the GAGE analysis are given in Tables 1, 2 and 3. Table 1 shows the consensus pathways across all three studies that were significantly up or down regulated. Table 2 shows the significant pathways relevant to DNA damage and stress in each individual study, and Table 3 shows the significantly expressed genes in these pathways. The full implications of these results are discussed in the conclusion.
3 Conclusion
Evidence from the bioinformatic results in this report suggests that the hypothesis given in the introduction is correct. The clearest demonstration of this is in Tables 1 and 2. Table 1 shows many mitochondrial pathways down regulated, as expected, but also that DNA damage and stress related pathways are altered. Table 2 gives the significant pathways related to DNA damage and stress in each of the datasets analysed, which shows just how many DNA damage and stress related pathways were found to be involved. Table
1(a)
GO Term      Description
GO:0016564   transcription repressor activity
GO:0004861   cyclin-dependent protein kinase inhibitor activity
GO:0007050   cell cycle arrest
GO:0005540   hyaluronic acid binding
GO:0006954   inflammatory response
GO:0007507   heart development
GO:0006968   cellular defense response
GO:0042326   negative regulation of phosphorylation
GO:0030511   positive regulation of transforming growth factor beta receptor signaling pathway

1(b)
GO Term      Description
GO:0006887   exocytosis
GO:0007268   synaptic transmission
GO:0003924   GTPase activity
GO:0005743   mitochondrial inner membrane
GO:0016192   vesicle-mediated transport
GO:0051437   positive regulation of ubiquitin-protein ligase activity during mitotic cell cycle
GO:0030426   growth cone
GO:0031145   anaphase-promoting complex-dependent proteasomal ubiquitin-dependent protein catabolic process
GO:0042416   dopamine biosynthetic process
GO:0007626   locomotory behavior
GO:0001975   response to amphetamine
GO:0051436   negative regulation of ubiquitin-protein ligase activity during mitotic cell cycle
GO:0006836   neurotransmitter transport
GO:0048169   regulation of long-term neuronal synaptic plasticity
GO:0051281   positive regulation of release of sequestered calcium ion into cytosol
GO:0005759   mitochondrial matrix
GO:0006886   intracellular protein transport
GO:0006108   malate metabolic process
GO:0043524   negative regulation of neuron apoptosis
GO:0008344   adult locomotory behavior
GO:0000502   proteasome complex
GO:0043274   phospholipase binding
GO:0008198   ferrous iron binding
GO:0005777   peroxisome
GO:0030424   axon
GO:0051258   protein polymerization
GO:0048854   brain morphogenesis
GO:0030666   endocytic vesicle membrane
GO:0006099   tricarboxylic acid cycle
GO:0007264   small GTPase mediated signal transduction
GO:0070469   respiratory chain
GO:0015992   proton transport
GO:0030672   synaptic vesicle membrane
GO:0006120   mitochondrial electron transport, NADH to ubiquinone
GO:0006626   protein targeting to mitochondrion
GO:0006096   glycolysis
GO:0009636   response to toxin
GO:0005978   glycogen biosynthetic process
GO:0016829   lyase activity
GO:0019717   synaptosome
GO:0016820   hydrolase activity, acting on acid anhydrides, catalyzing transmembrane movement of substances
GO:0005747   mitochondrial respiratory chain complex I
GO:0008137   NADH dehydrogenase (ubiquinone) activity
GO:0030170   pyridoxal phosphate binding
GO:0022900   electron transport chain
GO:0046933   hydrogen ion transporting ATP synthase activity, rotational mechanism
GO:0006091   generation of precursor metabolites and energy
GO:0000226   microtubule cytoskeleton organization
GO:0045263   proton-transporting ATP synthase complex, coupling factor F(o)
GO:0005838   proteasome regulatory particle
GO:0051289   protein homotetramerization
GO:0006800   oxygen and reactive oxygen species metabolic process
GO:0017157   regulation of exocytosis
GO:0007269   neurotransmitter secretion
GO:0019003   GDP binding
GO:0042776   mitochondrial ATP synthesis coupled proton transport
GO:0017075   syntaxin-1 binding
GO:0007612   learning
GO:0005504   fatty acid binding
GO:0046961   proton-transporting ATPase activity, rotational mechanism
GO:0015078   hydrogen ion transmembrane transporter activity
GO:0006413   translational initiation
GO:0048488   synaptic vesicle endocytosis
GO:0051246   regulation of protein metabolic process
GO:0044262   cellular carbohydrate metabolic process
GO:0005883   neurofilament
GO:0030234   enzyme regulator activity
GO:0009055   electron carrier activity
GO:0019787   small conjugating protein ligase activity
GO:0051287   NAD or NADH binding
GO:0051536   iron-sulfur cluster binding
GO:0004298   threonine-type endopeptidase activity

Table 1: 1(a) and (b) show all the significant gene pathways found from the Gene Ontology database using GAGE. Table 1(a) shows all the pathways which were significantly up regulated, while Table 1(b) shows all the pathways that were significantly down regulated. The results have been annotated, with the help of Dr Gyorgy Szabadkai, into pathways which are related to DNA damage and stress, and pathways related to mitochondrial functions. Pathways in blue represent those associated with DNA damage and stress, while those in red are the mitochondrial (PGC-1 dependent) pathways. As can be seen, the up regulated pathways are strongly related to DNA damage and stress, while the down regulated pathways contain many associated with mitochondrial function, which are PGC-1 dependent. These results confirm the conclusions in [19], where PGC-1α was shown to be down regulated, leading to defects in mitochondrial function. Unlike in [19], a large meta study, PGC-1α was not shown to be statistically significant in the LIMMA analysis, but PGC-1α related pathways clearly are significant here.
GO Term      Description
GO:0016564   transcription repressor activity
GO:0000122   negative regulation of transcription from RNA polymerase II promoter
GO:0007050   cell cycle arrest
GO:0000080   G1 phase of mitotic cell cycle
GO:0016563   transcription activator activity
GO:0004861   cyclin-dependent protein kinase inhibitor activity
GO:0032582   negative regulation of gene-specific transcription
GO:0006968   cellular defense response
(a) Significantly upregulated GO pathways related to stress/DNA damage in the Middleton study

GO Term      Description
GO:0051437   positive regulation of ubiquitin-protein ligase activity during mitotic cell cycle
(b) Significantly downregulated GO pathways related to stress/DNA damage in the Middleton study

GO Term      Description
GO:0000122   negative regulation of transcription from RNA polymerase II promoter
GO:0016564   transcription repressor activity
GO:0004861   cyclin-dependent protein kinase inhibitor activity
GO:0043065   positive regulation of apoptosis
GO:0030530   heterogeneous nuclear ribonucleoprotein complex
GO:0007050   cell cycle arrest
GO:0000123   histone acetyltransferase complex
GO:0045941   positive regulation of transcription
GO:0032582   negative regulation of gene-specific transcription
GO:0003705   RNA polymerase II transcription factor activity, enhancer binding
GO:0000118   histone deacetylase complex
GO:0000060   protein import into nucleus, translocation
GO:0008285   negative regulation of cell proliferation
GO:0016563   transcription activator activity
GO:0003676   nucleic acid binding
GO:0006968   cellular defense response
GO:0043966   histone H3 acetylation
GO:0042771   DNA damage response, signal transduction by p53 class mediator resulting in induction of apoptosis
GO:0008656   caspase activator activity
GO:0006357   regulation of transcription from RNA polymerase II promoter
GO:0005667   transcription factor complex
GO:0043066   negative regulation of apoptosis
GO:0003727   single-stranded RNA binding
GO:0003714   transcription corepressor activity
GO:0006950   response to stress
GO:0008630   DNA damage response, signal transduction resulting in induction of apoptosis
GO:0045893   positive regulation of transcription, DNA-dependent
GO:0006309   DNA fragmentation involved in apoptosis
GO:0006978   DNA damage response, signal transduction by p53 class mediator resulting in transcription of p21 class mediator
GO:0003950   NAD+ ADP-ribosyltransferase activity
GO:0003690   double-stranded DNA binding
GO:0008284   positive regulation of cell proliferation
GO:0048384   retinoic acid receptor signaling pathway
GO:0006281   DNA repair
(c) Significantly upregulated GO pathways related to stress/DNA damage in the Moran study

GO Term      Description
GO:0016564   transcription repressor activity
GO:0045892   negative regulation of transcription, DNA-dependent
GO:0032583   regulation of gene-specific transcription
GO:0003704   specific RNA polymerase II transcription factor activity
GO:0004861   cyclin-dependent protein kinase inhibitor activity
GO:0010553   negative regulation of gene-specific transcription from RNA polymerase II promoter
GO:0043433   negative regulation of transcription factor activity
GO:0016566   specific transcriptional repressor activity
GO:0005667   transcription factor complex
GO:0035257   nuclear hormone receptor binding
GO:0043984   histone H4-K16 acetylation
GO:0005694   chromosome
GO:0043966   histone H3 acetylation
GO:0042800   histone methyltransferase activity (H3-K4 specific)
GO:0030530   heterogeneous nuclear ribonucleoprotein complex
GO:0006950   response to stress
GO:0007050   cell cycle arrest
GO:0000118   histone deacetylase complex
GO:0003714   transcription corepressor activity
GO:0000084   S phase of mitotic cell cycle
GO:0008656   caspase activator activity
GO:0016581   NuRD complex
GO:0006260   DNA replication
GO:0045941   positive regulation of transcription
GO:0006338   chromatin remodeling
GO:0016605   PML body
GO:0010552   positive regulation of gene-specific transcription from RNA polymerase II promoter
GO:0012501   programmed cell death
GO:0006281   DNA repair
GO:0044428   nuclear part
GO:0045767   regulation of anti-apoptosis
(d) Significantly upregulated GO pathways related to stress/DNA damage in the Mullen study

Table 2: Gene pathways related to stress/DNA damage that have significantly been up or down regulated with p values < 0.05.
Gene Name   logFC       Adjusted P value
GAS1        1.094643    2.055961E-02
BANF1       0.541628    2.219783E-02
DNAJB6      1.241308    2.219783E-02
MYST3       0.484861    2.349617E-02
HSPA1L      0.766496    2.784302E-02
PHF21A      0.503459    2.784302E-02
INSR        0.759715    2.934864E-02
HNRNPH3     0.552047    2.934864E-02
CXXC1       0.430215    2.934864E-02
CUL2        -0.672837   2.943436E-02
TRIM28      0.372411    3.553511E-02
KAT2A       0.567463    3.926393E-02
HBP1        0.491259    4.561574E-02
HNRNPH3     0.548458    4.884567E-02
PHF15       0.873731    4.901984E-02
(a) Genes in the Mullen study with P values < 0.05 in gene pathways related to DNA damage and stress

Gene Name   Moran   Middleton   Mullen
DNAJB6      ✓       ✗           ✓
HSPA1L      ✓       ✗           ✓
PHF21A      ✓       ✗           ✓
INSR        ✓       ✗           ✓
HNRNPH3     ✓       ✗           ✓
CXXC1       ✓       ✗           ✓
CUL2        ✓       ✗           ✓
KAT2A       ✓       ✗           ✓
HBP1        ✓       ✗           ✓
PHF15       ✓       ✗           ✓
MBD3        ✓       ✓           ✗
IKBKB       ✓       ✓           ✗
MAP3K11     ✗       ✓           ✗
TCIRG1      ✗       ✓           ✗
FOXO1       ✗       ✓           ✗
GAS1        ✗       ✗           ✓
BANF1       ✗       ✗           ✓
MYST3       ✗       ✗           ✓
TRIM28      ✗       ✗           ✓
(b) Genes which are significant (P value < 0.05) in multiple studies. Most significant genes in the Moran study omitted here and fully given in Appendix A

Gene Name   logFC       Adjusted P value
MBD3        0.588015    2.692973E-02
MAP3K11     0.656159    4.069257E-02
IKBKB       0.211414    4.160097E-02
TCIRG1      0.509469    4.639875E-02
FOXO1       0.528559    4.903848E-02
(c) Genes in the Middleton study with P values < 0.05 in gene pathways related to DNA damage and stress
Table 3: Significant genes in pathways related to DNA damage and stress. Panels (a) and (c) show the significant genes in the Mullen and Middleton studies, while (b) shows which genes are significant in multiple studies. 381 genes related to DNA damage and stress pathways were significant in the Moran study and are fully listed in Appendix A.
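The cross-study comparison in panel (b) amounts to set-membership tests on the per-study lists of significant genes. A minimal sketch in base R follows; the variable names are hypothetical, and the gene lists are small illustrative subsets of the panels above rather than the full results.

```r
# Illustrative subsets of the per-study significant gene lists (not the full results)
sig <- list(
  Moran     = c("DNAJB6", "INSR", "CUL2", "MBD3", "IKBKB"),
  Middleton = c("MBD3", "IKBKB", "FOXO1"),
  Mullen    = c("DNAJB6", "INSR", "CUL2", "GAS1")
)

# One row per gene, one logical column per study
genes   <- sort(unique(unlist(sig)))
overlap <- sapply(sig, function(s) genes %in% s)
rownames(overlap) <- genes

# Genes significant in at least two studies
multi <- rownames(overlap)[rowSums(overlap) >= 2]
print(overlap)
print(multi)
```

Building the full membership matrix first, instead of calling intersect and setdiff pairwise, keeps every study combination in one object and avoids hard-coding how many genes fall into each group.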
3 then shows significant genes involved in these DNA damage related pathways, and which genes were significant in more than one of the datasets. These significant genes could be useful in finding a new therapeutic target for Parkinson’s disease.
A larger study, or a meta-study with more microarray data, would give a clearer picture of the genes and pathways that have been up or down regulated than the fairly noisy one presented in this study. However, despite the relatively small sample sizes and accompanying noise, the overall trend of down regulated mitochondrial pathways and altered DNA damage and stress pathways is clear. More research on the interface of these two areas and the role of PGC-1 in Parkinson’s disease would heighten our understanding and could help develop new approaches for the treatment of Parkinson’s disease.
References
[1] T.E. Bartlett. Bioinformatic analysis of the interface between mitochondrial biogenesis and apoptotic cell death signalling pathways in cancer. Mres Summer Project, 2011.
[2] Kevin Bryson. private communication, 2012.
[3] M.J. Devine, H. Plun-Favreau, and N.W. Wood. Parkinson’s disease and cancer: two wars, one front. Nature Reviews Cancer, 11(11):812–823, 2011.
[4] I. Dinu, J. Potter, T. Mueller, Q. Liu, A. Adewale, G. Jhangri, G. Einecke, K. Famulski, P. Halloran, and Y. Yasui. Improving gene set analysis of microarray data by sam-gs. BMC bioinformatics, 8(1):242, 2007.
[5] M.R. Duchen and G. Szabadkai. Roles of mitochondria in human disease. Essays Biochem, 47:115–137, 2010.
[6] GEO. http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=GSE7621, 2007. Accessed: 28/02/2012.
[7] GEO. http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=GSE8397, 2008. Accessed: 28/02/2012.
[8] GEO. http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=GSE20292, 2010. Accessed: 28/02/2012.
[9] GEO. http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=GSE24378, 2011. Accessed: 28/02/2012.
[10] Daniel Gusenleitne. Gene set enrichment analysis (gsealm) tutorial. http://bcb.dfci.harvard.edu/~aedin/courses/cccb-introduction-to-r-and-bioconductor-may-2011/tutorial.pdf. Accessed: 28/02/2012.
[11] R.A. Irizarry, B. Hobbs, F. Collin, Y.D. Beazer-Barclay, K.J. Antonellis, U. Scherf, and T.P. Speed. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 4(2):249, 2003.
[12] D.K. Jeppesen, V.A. Bohr, and T. Stevnsner. Dna repair deficiency in neurodegeneration. Progress in Neurobiology, 2011.
[13] Z. Jiang and R. Gentleman. Extensions to gene set enrichment. Bioinformatics, 23(3):306, 2007.
[14] A.W.E. Jones, Z. Yao, J.M. Vicencio, A. Karkucinska-Wieckowska, and G. Szabadkai. Pgc-1 family coactivators and cell fate: Roles in cancer, neurodegeneration, cardiovascular disease and retrograde mitochondria-nucleus signalling. Mitochondrion, 2011.
[15] Audrey Kauffmann, Robert Gentleman, and Wolfgang Huber. arrayqualitymetrics–a bioconductor package for quality assessment of microarray data. Bioinformatics, 25(3):415–6, 2009.
[16] S.Y. Kim and D. Volsky. Page: parametric analysis of gene set enrichment. BMC bioinformatics, 6(1):144, 2005.
[17] T.G. Lesnick, S. Papapetropoulos, D.C. Mash, J. Ffrench-Mullen, L. Shehadeh, M. De Andrade, J.R. Henley, W.A. Rocca, J.E. Ahlskog, and D.M. Maraganore. A genomic pathway approach to a complex disease: axon guidance and parkinson disease. PLoS genetics, 3(6):98, 2007.
[18] W. Luo, M. Friedman, K. Shedden, K. Hankenson, and P. Woolf. Gage: generally applicable gene set enrichment for pathway analysis. BMC bioinformatics, 10(1):161, 2009.
[19] J.K. McGill and M.F. Beal. Pgc-1α, a new therapeutic target in huntington’s disease? Cell, 127(3):465–468, 2006.
[20] M. Miron and R. Nadon. Inferential literacy for experimental high-throughput biology. Trends in Genetics, 22(2):84–89, 2006.
[21] LB Moran, DC Duke, M. Deprez, DT Dexter, R.K.B. Pearce, and MB Graeber. Whole genome expression profiling of the medial and lateral substantia nigra in parkinson’s disease. Neurogenetics, 7(1):1–11, 2006.
[22] A.P. Oron, Z. Jiang, and R. Gentleman. Gene set enrichment analysis using linear models and diagnostics. Bioinformatics, 24(22):2586–2591, 2008.
[23] Assaf Oron, Robert Gentleman (with contributions from S. Falcon, and Z. Jiang). GSEAlm: Linear Model Toolset for Gene Set Enrichment Analysis. R package version 1.8.0.
[24] G. Smyth. Limma: linear models for microarray data. Bioinformatics and computational biology solutions using R and Bioconductor, pages 397–420, 2005.
[25] Gordon K. Smyth. Limma: linear models for microarray data. In R. Gentleman, V. Carey, S. Dudoit, and W. Huber R. Irizarry, editors, Bioinformatics and Computational Biology Solutions using R and Bioconductor, pages 397–420. Springer, New York, 2005.
[26] A. Subramanian, P. Tamayo, V.K. Mootha, S. Mukherjee, B.L. Ebert, M.A. Gillette, A. Paulovich, S.L. Pomeroy, T.R. Golub, E.S. Lander, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 102(43):15545, 2005.
[27] L. Tian, S.A. Greenberg, S.W. Kong, J. Altschuler, I.S. Kohane, and P.J. Park. Discovering statistically significant pathways in expression profiling studies. Proceedings of the National Academy of Sciences of the United States of America, 102(38):13544, 2005.
[28] Y. Zhang, M. James, F.A. Middleton, and R.L. Davis. Transcriptional analysis of multiple brain regions in parkinson’s disease supports the involvement of specific protein processing, energy metabolism, and signaling pathways, and suggests novel disease mechanisms. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics, 137(1):5–16, 2005.
Appendices
A Tables
Gene name logFC Adjusted P value | Gene name logFC Adjusted P value | Gene name logFC Adjusted P value
CUX2 -0.957411 3.110158E-08 | FGFR1 0.220495 3.832065E-03 | MYST1 0.147059 1.765914E-02
NR4A2 -1.206677 2.511338E-07 | ORC2L -0.238690 3.850204E-03 | KRT18 -0.293551 1.768079E-02
YTHDC2 -0.956063 5.114295E-07 | SIRT2 0.337159 3.852516E-03 | ERI3 -0.357999 1.789257E-02
PSEN2 -0.597398 6.201595E-07 | DRD2 -0.323548 3.898446E-03 | PFDN5 0.257103 1.796585E-02
RBM9 -0.691800 9.260810E-07 | MAFF 0.312910 3.917795E-03 | DLG3 -0.201248 1.810632E-02
ICMT -0.454469 1.077351E-06 | VEZF1 0.327521 4.006702E-03 | APEX1 -0.229858 1.863233E-02
DRD2 -0.833074 1.736227E-06 | C1D -0.376084 4.010220E-03 | WARS -0.465903 1.888945E-02
SUB1 -1.168283 2.409076E-06 | SMARCA4 -0.421016 4.029591E-03 | MAB21L1 -0.242650 1.902206E-02
NR4A2 -1.305759 2.505296E-06 | NDRG4 -0.964717 4.030569E-03 | CUL3 -0.280591 1.915695E-02
PBX1 -1.030599 3.204438E-06 | KDM5A 0.359093 4.083478E-03 | CDKN1C 0.580427 1.927160E-02
ATP8A2 -0.757973 9.099608E-06 | TPD52L1 0.553849 4.128617E-03 | DUSP10 0.227539 1.970679E-02
MED24 -0.337476 1.521774E-05 | RPS9 0.213239 4.145544E-03 | RPS9 0.331523 1.970843E-02
FABP7 -0.948737 1.784250E-05 | DEAF1 -0.456666 4.187411E-03 | PPM1D 0.292983 1.991855E-02
PAN2 0.587286 1.862470E-05 | TFEB 0.377877 4.333228E-03 | SMARCC1 0.311631 1.996298E-02
BASP1 -1.101985 2.422178E-05 | ZC3H7B 0.323857 4.454371E-03 | ATRX -0.252841 2.003025E-02
RNF14 -0.762596 2.657807E-05 | NAB1 -0.390073 4.481002E-03 | CAPN10 0.171640 2.032344E-02
MXI1 0.426898 2.777041E-05 | RPS4X 0.301142 4.481002E-03 | IKBKB 0.233266 2.032344E-02
OBFC1 -0.345506 3.074203E-05 | INSR 0.571061 4.494150E-03 | HSPA1L 0.337327 2.060335E-02
NR4A2 -0.858372 3.837342E-05 | MED7 -0.229032 4.634760E-03 | RNF14 -0.613698 2.060335E-02
DRD2 -0.528345 4.053527E-05 | RASSF1 0.217266 4.686959E-03 | FOXO4 0.261129 2.123364E-02
ZBTB16 0.804387 4.154816E-05 | SAP30 0.370421 4.750235E-03 | CHD3 -0.231212 2.129811E-02
CASC3 0.459059 4.227519E-05 | CIAO1 -0.233149 4.828717E-03 | CIZ1 0.198300 2.132111E-02
HLTF -0.610986 4.407303E-05 | CDKN1C 0.630543 4.884169E-03 | RAF1 0.297875 2.213752E-02
RNF10 -0.363115 4.492968E-05 | LRCH4 0.240531 4.909274E-03 | MBD3 0.232923 2.216529E-02
HBP1 0.430443 6.173774E-05 | PHF17 0.288786 5.008478E-03 | CDKN2C 0.191727 2.217483E-02
TOB2 0.625310 7.242554E-05 | RYBP 0.437083 5.008588E-03 | DNAJA2 -0.565387 2.217483E-02
ODZ1 -0.774075 8.018387E-05 | AGGF1 -0.357470 5.177286E-03 | PTBP1 0.312594 2.278352E-02
FOXA1 -0.493710 1.082717E-04 | HTATIP2 0.260162 5.269930E-03 | WARS -0.405446 2.320651E-02
SIN3B -0.362442 1.132238E-04 | YWHAB -0.278063 5.304984E-03 | TFDP2 -0.173172 2.323106E-02
SORT1 0.567520 1.433114E-04 | RAN -0.494002 5.435535E-03 | ZCCHC14 0.311412 2.339757E-02
FABP7 -0.856448 1.466798E-04 | ABCA2 0.597212 5.484714E-03 | CREBBP 0.273296 2.360719E-02
RBM9 -0.790446 1.526903E-04 | RBM9 -0.576966 5.580142E-03 | ZNF274 0.205700 2.373376E-02
DNAJB6 1.030789 1.622180E-04 | YWHAB -0.526150 5.645709E-03 | PRKAR1A -0.349559 2.384159E-02
AZGP1 1.429307 1.780275E-04 | SMARCA4 -0.309234 5.717543E-03 | MLH1 -0.151832 2.420398E-02
R3HDM1 -0.407426 1.873502E-04 | RBMS1 -0.331339 5.727718E-03 | CUL2 -0.244754 2.434984E-02
PKNOX2 -0.323459 1.917057E-04 | TARDBP 0.366918 5.785979E-03 | DLG3 -0.186573 2.493001E-02
DNAJB2 0.566402 1.917057E-04 | MAPK1 -0.465367 5.900381E-03 | RTEL1 0.097418 2.494372E-02
MAPK9 -0.697986 1.950452E-04 | C11orf9 0.530671 5.954850E-03 | FOXA2 -0.248195 2.496548E-02
RXRA 0.486204 2.353479E-04 | TPD52L1 0.708256 6.003614E-03 | NBN -0.389994 2.509963E-02
SCFD1 -0.435021 2.383114E-04 | ZC3H11A 0.286946 6.123835E-03 | FOXL2 0.129132 2.518111E-02
PSEN2 -0.360370 2.443693E-04 | SMARCC1 0.298454 6.163572E-03 | DBP -0.215670 2.519681E-02
TRAK1 -0.334989 2.905580E-04 | KDM4B 0.215654 6.163572E-03 | ARNTL 0.187634 2.555248E-02
LRCH4 0.280912 3.346337E-04 | PHB 0.351591 6.179972E-03 | CLEC11A 0.211593 2.560748E-02
ATR -0.487612 3.365620E-04 | FTH1 0.333135 6.211837E-03 | CDKN1C 0.592224 2.578780E-02
PBX1 -0.977497 3.587020E-04 | FOXO3 0.414799 6.499113E-03 | ENPP2 0.411463 2.582428E-02
SCG2 -1.678182 3.704575E-04 | CAND1 -0.421703 6.700786E-03 | ASH2L -0.208881 2.603854E-02
SIN3B -0.458179 3.775224E-04 | RBMS1 -0.379899 6.708206E-03 | EIF1 0.184540 2.616135E-02
TXNIP 0.763070 3.835557E-04 | EIF1 0.227191 6.810593E-03 | OLIG2 0.361263 2.620637E-02
TCF12 0.521666 3.865018E-04 | TIPARP 0.423771 6.953670E-03 | CTBP2 0.186639 2.696214E-02
TCF25 -0.371687 4.053739E-04 | ADARB2 0.367469 7.017789E-03 | TBL1X 0.182123 2.738215E-02
MYT1L -1.093676 4.053739E-04 | ING1 0.186267 7.034542E-03 | PEX14 -0.219804 2.757462E-02
DRD2 -0.351110 4.269094E-04 | AHSA1 0.583125 7.146545E-03 | CUL4B -0.190014 2.793711E-02
PPP2R5C -0.556722 4.727540E-04 | PTPRU -0.249405 7.167527E-03 | DDX23 0.216878 2.814519E-02
MTMR15 0.426398 4.882360E-04 | CNOT8 0.349014 7.291980E-03 | GAS7 0.281240 2.832397E-02
CUL2 -0.204652 5.021639E-04 | SOX10 0.519145 7.291980E-03 | EXOG -0.243930 2.834072E-02
PIAS2 -0.439832 5.083587E-04 | KPNB1 -0.304317 7.508054E-03 | NFE2 0.171944 2.899066E-02
SMARCA4 -0.523213 5.326174E-04 | HUS1 -0.151773 7.508054E-03 | ZNF143 0.249454 2.919955E-02
HTR2A -0.603349 5.429389E-04 | TRAK1 -0.360190 7.714198E-03 | SERTAD2 0.285793 2.950488E-02
HNRNPA0 -0.394298 5.512777E-04 | C16orf5 0.480987 7.784919E-03 | CAPNS1 -0.265057 2.958784E-02
NFKBIA 0.848332 5.735661E-04 | SMARCD3 -0.376585 7.784919E-03 | HRAS -0.315547 3.000099E-02
AZGP1 0.876736 5.963782E-04 | HR -0.251331 7.966696E-03 | ZNF862 0.168448 3.027087E-02
USP21 0.384855 6.023984E-04 | FADS1 -0.339560 7.966696E-03 | PRKRIR -0.197368 3.055533E-02
ATF4 0.493282 6.129153E-04 | SERP1 0.493409 8.013586E-03 | PINK1 -0.331692 3.103766E-02
UCHL1 -1.130250 6.129153E-04 | MTDH -0.430631 8.035706E-03 | CAT 0.316974 3.121511E-02
ATR -0.197517 6.612463E-04 | CDC7 -0.229709 8.044592E-03 | SQSTM1 0.298721 3.255658E-02
PHF21A 0.360997 6.661077E-04 | BCL6 0.684969 8.434040E-03 | ELF2 0.232207 3.255658E-02
SGK1 0.808574 7.873610E-04 | CTBP2 0.301135 8.446550E-03 | CALCOCO1 0.235879 3.285726E-02
P2RX7 0.851832 7.873610E-04 | TCEB1 -0.200684 8.638273E-03 | NCOA6 0.167214 3.374271E-02
FOXA2 -0.297080 7.874555E-04 | CA2 0.830987 8.686105E-03 | HTATIP2 0.370030 3.377235E-02
SIAH2 0.263142 7.961976E-04 | YBX1 0.380355 8.805353E-03 | PARP4 0.253744 3.379494E-02
CXXC1 0.304683 8.022894E-04 | EIF1 0.202097 9.381447E-03 | PRKCZ -0.380026 3.393437E-02
TOB2 0.803282 8.116605E-04 | RPS4X 0.257787 9.594234E-03 | SP1 0.133998 3.402302E-02
MXD4 0.330607 8.430294E-04 | BECN1 -0.358866 9.722317E-03 | IGF1R 0.663056 3.447343E-02
PRKDC -0.419466 8.506714E-04 | MKRN1 -0.419459 9.889990E-03 | SERTAD2 0.319964 3.451420E-02
MUS81 0.231609 8.752821E-04 | CTCF 0.281895 9.943180E-03 | ADRA2A -0.233688 3.482569E-02
PHF15 0.480905 9.156354E-04 | HNRNPD -0.251050 9.943782E-03 | SF3A3 -0.182652 3.499074E-02
KIFAP3 -0.812213 1.086406E-03 | NME1 -0.563889 9.944915E-03 | PTPRF -0.222122 3.531795E-02
RAD17 -0.225555 1.125597E-03 | SIRT4 0.223675 1.003110E-02 | PIAS2 -0.083276 3.626421E-02
KRAS -0.511846 1.139155E-03 | CCK -0.777129 1.067137E-02 | SSR1 -0.285912 3.632779E-02
MKNK2 1.029514 1.139155E-03 | TMEM161A 0.169364 1.071402E-02 | HTR2A -0.210176 3.670744E-02
PNKP -0.365547 1.233909E-03 | CAT 0.447409 1.085852E-02 | SMARCD2 0.190971 3.749642E-02
GADD45G 0.441461 1.254979E-03 | SAP30 0.387418 1.102105E-02 | USP22 -0.230039 3.773870E-02
CAND2 -0.326311 1.278468E-03 | ZC3H15 -0.401399 1.114595E-02 | LUC7L3 0.364510 3.776656E-02
NDN -0.469761 1.292416E-03 | TXNIP 0.783723 1.122945E-02 | SMARCA4 -0.351617 3.785179E-02
ZCCHC24 0.446347 1.357363E-03 | APC -0.443112 1.123172E-02 | LITAF 0.273056 3.849488E-02
TRMT11 -0.427785 1.388872E-03 | BRPF1 0.233774 1.150433E-02 | RPS4X 0.169256 3.864644E-02
PTN -0.449501 1.388872E-03 | FTSJ2 -0.196468 1.150433E-02 | IL6ST 0.300552 3.947735E-02
GAS7 0.422373 1.401236E-03 | RPS6 0.245071 1.218943E-02 | FTSJ1 -0.269207 3.952213E-02
CXXC1 0.223148 1.463128E-03 | NFKB1 0.235819 1.218943E-02 | NOTCH1 0.235740 3.976578E-02
KDM1A -0.226117 1.477855E-03 | HNRNPH3 0.286614 1.247986E-02 | NFX1 0.136595 3.977656E-02
BTG1 0.659258 1.477855E-03 | REXO4 0.230182 1.258635E-02 | APBB2 0.185202 4.005845E-02
TXNIP 0.769755 1.512227E-03 | PRPF19 -0.214907 1.265220E-02 | DBC1 -0.355175 4.005845E-02
PBX1 -0.403184 1.539236E-03 | PTBP1 0.456295 1.279337E-02 | PIAS4 0.265963 4.073692E-02
NOP2 0.304194 1.574186E-03 | HNRNPH3 0.415568 1.296613E-02 | TRAP1 -0.162923 4.076317E-02
YWHAB -0.496887 1.577064E-03 | HNRNPH3 0.401294 1.296613E-02 | STK3 0.256124 4.077380E-02
TCEA2 -0.538326 1.586035E-03 | BCL2 0.419075 1.307179E-02 | RB1CC1 -0.238072 4.137764E-02
PHLDA3 0.233626 1.586035E-03 | CTBP2 0.420036 1.307265E-02 | KDM4B 0.141568 4.174237E-02
ZNF24 0.349494 1.598659E-03 | DFFB 0.228604 1.325943E-02 | PMS2L3 -0.191117 4.193934E-02
MXD4 0.287742 1.619508E-03 | SUB1 -0.344900 1.340769E-02 | HINFP 0.219310 4.205649E-02
STS -0.526712 1.923440E-03 | PTBP1 0.408555 1.360090E-02 | PTBP1 0.291009 4.218885E-02
HNRNPF 0.435092 1.940266E-03 | ZHX2 0.249500 1.393329E-02 | KHDRBS1 -0.275841 4.314293E-02
TASP1 -0.205034 2.116698E-03 | PTN -0.406477 1.425780E-02 | BARD1 0.226402 4.330154E-02
SESN1 0.317038 2.167532E-03 | HDAC3 -0.152992 1.427661E-02 | MXD4 0.384053 4.344766E-02
TGIF1 0.413216 2.291665E-03 | RPS6 0.179358 1.459720E-02 | KPNB1 -0.194875 4.373219E-02
ATP8A2 -0.947873 2.312591E-03 | TBP -0.133868 1.462089E-02 | CAND1 -0.319295 4.409346E-02
CHD3 -0.319365 2.413011E-03 | NARS -0.231926 1.463586E-02 | SETMAR 0.202307 4.425921E-02
HDAC1 0.443182 2.416774E-03 | NFE2L2 0.447991 1.470688E-02 | ATF3 0.289605 4.451829E-02
EIF1 0.305392 2.455681E-03 | MAFF 1.314520 1.482125E-02 | CDC123 -0.195283 4.469899E-02
HNRNPH3 0.443930 2.456575E-03 | CHD1L 0.256668 1.512965E-02 | TAF9 -0.341545 4.469899E-02
SIRT3 -0.315936 2.462529E-03 | ASNS -0.651617 1.520338E-02 | MAPK9 -0.295685 4.526086E-02
SMARCA4 -0.366475 2.571390E-03 | DDX39 0.380958 1.523832E-02 | ST18 0.501593 4.539651E-02
CAND1 -0.295231 2.644765E-03 | NARS2 -0.262346 1.547714E-02 | NPAT -0.197968 4.600492E-02
HTRA2 -0.269759 2.680301E-03 | CIZ1 0.215167 1.547714E-02 | RELA 0.228300 4.623955E-02
TGFB3 0.443781 2.806006E-03 | SMARCC1 0.260351 1.553457E-02 | CDKN1C 0.583804 4.636397E-02
SATB1 -0.472312 2.811070E-03 | LRCH4 0.224864 1.555004E-02 | BRD1 0.138380 4.648539E-02
ARID5B 0.381866 2.857923E-03 | TMEM204 0.309821 1.561598E-02 | HDAC4 0.204620 4.663850E-02
RING1 0.174505 2.864779E-03 | L3MBTL 0.210720 1.569126E-02 | CCND1 -0.441077 4.689704E-02
KAT2A 0.293493 2.902642E-03 | EDNRB -0.924036 1.570343E-02 | CIZ1 0.169355 4.735165E-02
CDKN2C 0.433614 3.136708E-03 | MEIS2 0.374867 1.593708E-02 | BRD7 0.260610 4.755995E-02
BNIP3L 0.279732 3.194006E-03 | SRCAP 0.143141 1.630483E-02 | PHF16 0.205263 4.785161E-02
MCTS1 -0.452188 3.226521E-03 | NCOR1 0.215121 1.654489E-02 | KCNMA1 0.267207 4.785472E-02
GAS7 0.300674 3.336826E-03 | PBXIP1 0.410710 1.674284E-02 | PFDN5 0.190537 4.824541E-02
ZNF282 0.206052 3.484791E-03 | AARSD1 -0.230421 1.677547E-02 | HNRNPD -0.328193 4.874030E-02
SMAD2 -0.327491 3.649291E-03 | YTHDC2 -0.209308 1.692677E-02 | RNASE4 0.209620 4.923652E-02
ZNF423 -0.464177 3.728742E-03 | YBX1 0.451395 1.694194E-02 | ZEB1 -0.306020 4.923652E-02
RBMS1 -0.340321 3.728742E-03 | ZFP161 0.208603 1.714257E-02 | PATZ1 0.186908 4.941602E-02
APC -0.279449 3.789623E-03 | BCL2L13 -0.314110 1.744508E-02 | CDKN1C 0.542014 4.942412E-02
Genes in the Moran study with P values < 0.05 in gene pathways related to DNA damage and stress
B R code
Below is the R code for this report showing the major steps, as used for the Middleton dataset.
Quality Control
# Quality control for the Middleton study.
# Step 1: make sure the CEL files are in the working directory, then load them into R.

library("affy")
library("arrayQualityMetrics")
library("limma")

Middleton <- ReadAffy()

# Define pheno_data for the AffyBatch to include PD/C (Parkinson's disease / control) status.
# One row per array: 29 arrays in total, 18 controls and 11 PD.
Middleton_Status <- c("C", rep("PD", 7), "C", "PD", rep("C", 7), "PD", "C", "C", "PD", "PD", rep("C", 7))
Middleton_pheno_data <- new("AnnotatedDataFrame",
    data = data.frame(sample = seq_along(Middleton_Status), Status = factor(Middleton_Status)))
sampleNames(Middleton_pheno_data) <- list.celfiles()
phenoData(Middleton) <- Middleton_pheno_data

# Calculate and plot RNA degradation graph
Middleton_degrade <- AffyRNAdeg(Middleton, log.it = TRUE)
plotAffyRNAdeg(Middleton_degrade, transform = "shift.scale")

# Plot MvA plots before normalisation (controls in two batches of 9, then PD samples)
Middleton_controls <- which(Middleton_Status == "C")
Middleton_park <- which(Middleton_Status == "PD")
mva.pairs(exprs(Middleton[, Middleton_controls[1:9]]), log.it = TRUE, plot.method = "smoothScatter")
mva.pairs(exprs(Middleton[, Middleton_controls[10:18]]), log.it = TRUE, plot.method = "smoothScatter")
mva.pairs(exprs(Middleton[, Middleton_park]), log.it = TRUE, plot.method = "smoothScatter")

# Note: for full quality analysis use the arrayQualityMetrics package:
# arrayQualityMetrics(expressionset = Middleton, outdir = "Middleton_QAraw", force = FALSE, do.logtransform = TRUE, intgroup = "Status")
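The MvA plots above show, for each pair of arrays, the log ratio M against the mean log intensity A. As a sketch of what is being plotted, the two quantities can be computed by hand; the intensities below are simulated purely for illustration and are not taken from any of the studies.

```r
set.seed(1)
# Simulated raw intensities for two arrays (hypothetical data)
x <- 2^rnorm(1000, mean = 8, sd = 1.5)
y <- x * 2^rnorm(1000, mean = 0.2, sd = 0.3)  # second array with a small systematic shift

M <- log2(x) - log2(y)        # log ratio between the two arrays
A <- (log2(x) + log2(y)) / 2  # mean log intensity

# For comparable arrays M should scatter around zero across the range of A;
# a median far from zero points to a systematic difference that normalisation should remove
median(M)
```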
Normalisation, Quality Control and LIMMA Analysis
# Normalisation, post-normalisation quality control and LIMMA analysis

# Normalise using RMA
Middleton_normed <- rma(Middleton)

# Do post-normalisation quality control
mva.pairs(exprs(Middleton_normed[, Middleton_controls[1:9]]), log.it = TRUE, plot.method = "smoothScatter")
mva.pairs(exprs(Middleton_normed[, Middleton_controls[10:18]]), log.it = TRUE, plot.method = "smoothScatter")
mva.pairs(exprs(Middleton_normed[, Middleton_park]), log.it = TRUE, plot.method = "smoothScatter")

# Note: for full quality analysis use the arrayQualityMetrics package:
# arrayQualityMetrics(expressionset = Middleton_normed, outdir = "Middleton_QAnorm", force = FALSE, do.logtransform = TRUE, intgroup = "Status")

# Continue with LIMMA analysis: create the design matrix
# (intercept = control level, second column = PD versus control)
Middleton_design <- model.matrix(~Middleton_normed$Status)
colnames(Middleton_design) <- c("C", "PDvsC")

# Run lmFit and eBayes
Middleton_fit <- lmFit(Middleton_normed, Middleton_design)
Middleton_fit <- eBayes(Middleton_fit)

# Do multiple hypothesis adjustment (Benjamini-Hochberg; "fdr" is an alias for "BH")
Middleton_top <- topTable(Middleton_fit, coef = "PDvsC", adjust.method = "BH", number = nrow(Middleton_normed))
Middleton_results <- decideTests(Middleton_fit, adjust.method = "fdr", p.value = 0.05)

# Write significant genes (non-zero test result for the PDvsC coefficient) to file
Middleton_sig <- rownames(Middleton_results)[which(as.integer(Middleton_results[, 2]) != 0)]
write(Middleton_sig, file = "Middleton_sig_genes.txt")
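The adjusted P values in the listing come from the Benjamini-Hochberg procedure. As a minimal sketch of what that adjustment does, the calculation can be reproduced by hand and checked against base R's p.adjust; the p-values below are made up for illustration.

```r
# Toy raw p-values (hypothetical, already sorted for readability)
p <- c(0.0001, 0.004, 0.019, 0.03, 0.2, 0.6)
m <- length(p)

# BH by hand: scale the i-th smallest p-value by m / i,
# then enforce monotonicity working down from the largest p-value
o <- order(p)
bh_manual <- numeric(m)
bh_manual[o] <- pmin(1, rev(cummin(rev(p[o] * m / seq_len(m)))))

# Same result from base R
bh <- p.adjust(p, method = "BH")
all.equal(bh, bh_manual)

# Number of tests declared significant at a false discovery rate of 0.05
sum(bh < 0.05)
```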
Gene Set analysis
GSEAlm
# GSEAlm method for Gene Set Analysis for pathways from the Gene Ontology (GO) database

library(genefilter)
library(hgu133a.db) # Check with annotation() if this is right; note that Mullen needs hgu133aplus2.db
#library(KEGG.db)
library(GO.db)
library(GSEABase)
library(GSEAlm)

# Get GeneSetCollection from GO with all pathways
Middleton_gsc <- GeneSetCollection(Middleton_normed, setType = GOCollection())

# Create incidence matrix from the GeneSetCollection describing all pathways
Middleton_Am <- incidence(Middleton_gsc)

# Create expression set with only genes in the incidence matrix
Middleton_nsF <- Middleton_normed[colnames(Middleton_Am), ]

# Only select the pathways with more than 10 genes, as short pathways are difficult to analyse statistically
Middleton_selectedrows <- (rowSums(Middleton_Am) > 10)
Middleton_Am2 <- Middleton_Am[Middleton_selectedrows, ]

# Apply the GSEAlm algorithm with 2000 permutations
Middleton_perm <- gsealmPerm(Middleton_nsF, ~Status, mat = Middleton_Am2, nperm = 2000)

# Per-pathway statistic used to decide the direction of change.
# ASSUMPTION: Middleton_tA and Middleton_tAadj are used below but are not defined in the
# original listing. They are reconstructed here as the per-gene t-statistics for the Status
# term (from lmPerGene), summed over each pathway and scaled by sqrt(pathway size).
Middleton_lm <- lmPerGene(Middleton_nsF, ~Status)
Middleton_tA <- as.vector(Middleton_Am2 %*% Middleton_lm$tstat[2, ])
Middleton_tAadj <- Middleton_tA / sqrt(rowSums(Middleton_Am2))
names(Middleton_tA) <- rownames(Middleton_Am2)

# Prepare the output: smaller of the two one-sided permutation p-values, plus direction
Middleton_permA <- pmin(Middleton_perm[, 1], Middleton_perm[, 2])
Middleton_permB <- ifelse(Middleton_tAadj < 0, "DOWN", "UP")
Middleton_permAdj <- p.adjust(Middleton_permA, method = "fdr", n = length(Middleton_permA))

Middleton_GO <- cbind(as.vector(names(Middleton_tA)), as.vector(Term(names(Middleton_tA))), as.vector(Middleton_permB),
    as.vector(Middleton_permA), as.vector(Middleton_permAdj))
Middleton_GO <- Middleton_GO[order(as.numeric(Middleton_GO[, 4])), ]
colnames(Middleton_GO) <- c("GOID", "GO Term", "UP/DOWN", "P value", "Adjusted P value")

# Save results to file
write.table(Middleton_GO, file = "Middleton_GO_terms.txt", sep = "\t")
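The UP/DOWN labels in the listing depend on the sign of a per-pathway aggregate of per-gene statistics (Middleton_tAadj). A sketch of one common way to form such an aggregate, as a sum over an incidence matrix scaled by the square root of the set size, is given below on toy data; whether this matches the statistic used for the report is an assumption, and all names and numbers here are hypothetical.

```r
# Toy incidence matrix: 2 gene sets (rows) x 5 genes (columns)
Am <- matrix(c(1, 1, 1, 0, 0,
               0, 0, 1, 1, 1),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("setA", "setB"), paste0("g", 1:5)))

# Per-gene t-statistics (PD versus control), made up for illustration
tstat <- c(g1 = -2.1, g2 = -1.5, g3 = -0.4, g4 = 1.8, g5 = 2.6)

# Sum the statistics within each set, then scale by sqrt(set size)
# so that large and small sets are comparable
tA    <- as.vector(Am %*% tstat)
tAadj <- tA / sqrt(rowSums(Am))

# Negative aggregate: set is mostly lower in PD; positive: mostly higher
direction <- ifelse(tAadj < 0, "DOWN", "UP")
print(data.frame(set = rownames(Am), tAadj = round(tAadj, 3), direction = direction))
```

Note that gene g3 belongs to both sets, which is why the incidence-matrix form is convenient: overlapping membership is handled by the matrix product without any bookkeeping.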
GAGE
# GAGE method for Gene Set Analysis for pathways from the Gene Ontology (GO) database

#library(KEGG.db)
library(GO.db)
library(GSEABase)
library(gage)

# Use the GSEABase package to get the gene set collection; the format needs to be changed slightly to work with GAGE.
Middleton_gsc <- GeneSetCollection(Middleton_normed, setType = GOCollection())
Middleton_geneset <- geneIds(Middleton_gsc)
Middleton_genesetnames <- names(Middleton_gsc)

# Apply GAGE algorithm
Middleton_gage <- gage(exprs(Middleton_normed), gsets = Middleton_geneset, ref = Middleton_controls, samp = Middleton_park, compare = 'unpaired')

# Get GOID from Middleton_genesetnames in the right format.
# As in the original indexing, each result row name is taken to be a one-character prefix
# followed by the position of the gene set in Middleton_genesetnames.
Middleton_lessrows <- rownames(Middleton_gage$less)
Middleton_greaterrows <- rownames(Middleton_gage$greater)
Middleton_lessterms <- Middleton_genesetnames[as.numeric(substring(Middleton_lessrows, 2))]
Middleton_greaterterms <- Middleton_genesetnames[as.numeric(substring(Middleton_greaterrows, 2))]

# Prepare file for saving (columns 3 and 4 of the GAGE output: P value and adjusted P value)
Middleton_GOgageless <- cbind(Middleton_lessterms, as.vector(Term(Middleton_lessterms)),
    as.vector(Middleton_gage$less[, 3]), as.vector(Middleton_gage$less[, 4]))
Middleton_GOgagegreater <- cbind(Middleton_greaterterms, as.vector(Term(Middleton_greaterterms)),
    as.vector(Middleton_gage$greater[, 3]), as.vector(Middleton_gage$greater[, 4]))
colnames(Middleton_GOgageless) <- c("GOID", "GO Term", "P value", "Adjusted P value")
colnames(Middleton_GOgagegreater) <- c("GOID", "GO Term", "P value", "Adjusted P value")

write.table(Middleton_GOgagegreater, file = "Middleton_GO_gage_greater.txt", sep = "\t")
write.table(Middleton_GOgageless, file = "Middleton_GO_gage_less.txt", sep = "\t")
Find GO Pathways which are significant
# Find GO pathways that are significantly over expressed in all studies.
# Moran_GOgagegreater and Mullen_GOgagegreater are built in the same way as Middleton_GOgagegreater.
# The result matrices are character matrices, so the adjusted P values are converted to numeric before comparison.

Middleton_GOgagegreatersig <- Middleton_GOgagegreater[which(as.numeric(Middleton_GOgagegreater[, 4]) < 0.05), , drop = FALSE]
Moran_GOgagegreatersig <- Moran_GOgagegreater[which(as.numeric(Moran_GOgagegreater[, 4]) < 0.05), , drop = FALSE]
Mullen_GOgagegreatersig <- Mullen_GOgagegreater[which(as.numeric(Mullen_GOgagegreater[, 4]) < 0.05), , drop = FALSE]

# GO terms significant in all three studies
sig_terms <- intersect(Mullen_GOgagegreatersig[, 1], intersect(Moran_GOgagegreatersig[, 1], Middleton_GOgagegreatersig[, 1]))

GO_gagegreatersig <- cbind(sig_terms, Term(sig_terms))

write.table(Middleton_GOgagegreatersig, file = "Middleton_GO_gage_greater_sig.txt", sep = "\t")
write.table(Moran_GOgagegreatersig, file = "Moran_GO_gage_greater_sig.txt", sep = "\t")
write.table(Mullen_GOgagegreatersig, file = "Mullen_GO_gage_greater_sig.txt", sep = "\t")
write.table(GO_gagegreatersig, file = "GO_gage_greater_sig.txt", sep = "\t")
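One detail worth checking when filtering result tables built with cbind: mixing character and numeric vectors gives a character matrix, so a threshold test on the P value column becomes a string comparison unless the column is converted back to numeric. A small sketch with made-up values (the term names are placeholders, not real GO IDs):

```r
# cbind() on mixed character/numeric vectors yields a character matrix
res <- cbind(GOID = c("termA", "termB", "termC"),
             adjP = c(3e-04, 0.2, 0.04))
is.character(res)

# String comparison: "3e-04" sorts after "0.05", so the most significant row is lost
wrong <- res[res[, "adjP"] < 0.05, "GOID"]

# Numeric comparison keeps both significant rows
right <- res[as.numeric(res[, "adjP"]) < 0.05, "GOID"]

print(wrong)
print(right)
```

Keeping the results in a data.frame instead of a matrix would avoid the issue altogether, since each column retains its own type.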
Table for Significant genes in pathways related to DNA damage and stress
# Create table of significant genes in pathways related to DNA damage in the Middleton study.

# Create mapping between Affy probes and gene names
a <- hgu133aSYMBOL
mapped_probes <- mappedkeys(a)
xx <- as.list(a[mapped_probes])

# Import relevant GO pathways from premade file (one GO ID per line)
Middleton_GO_DNA <- as.character(read.table("~/CP2/NewStudies/Middleton_GO_DNA", quote = "\"")[, 1])

# Find all genes involved in a DNA damage related pathway.
# ASSUMPTION: genepaths is not defined in the original listing; it is taken here to be the
# list of probe IDs per gene set, i.e. Middleton_geneset from the GAGE section.
genepaths <- Middleton_geneset
A <- match(Middleton_GO_DNA, Middleton_genesetnames)
C1 <- unique(unlist(genepaths[A]))

# Select genes of interest (column 1 of Middleton_top is taken to hold the probe ID, as in the original)
D1 <- which(Middleton_top[, 1] %in% C1)

# Map Affy probe ID to gene symbol (NA where a probe has no mapped symbol)
C1a <- sapply(as.character(Middleton_top[D1, 1]),
    function(p) if (is.null(xx[[p]])) NA_character_ else xx[[p]])

# Select only significant genes and save file (columns 2 and 6 of Middleton_top: logFC and adjusted P value)
Middleton_DNA_genes <- cbind(C1a, Middleton_top[D1, c(2, 6)])
Middleton_DNA_genes_sig <- Middleton_DNA_genes[which(Middleton_DNA_genes[, 3] < 0.05), ]
for (i in seq_len(nrow(Middleton_DNA_genes_sig))) {
    write.table(sprintf("%s & %f & %E \\\\ \\hline", as.character(Middleton_DNA_genes_sig[i, 1]), Middleton_DNA_genes_sig[i, 2], Middleton_DNA_genes_sig[i, 3]),
        file = "Middletontable.txt", append = TRUE, row.names = FALSE, col.names = FALSE)
}

# Create table for significant genes common to several datasets.
# Moran_DNA_genes_sig and Mullen_DNA_genes_sig are built in the same way as for Middleton.
# LaTeX columns: gene, Moran, Middleton, Mullen.
Middleton_genes <- as.character(Middleton_DNA_genes_sig[, 1])
Moran_genes <- as.character(Moran_DNA_genes_sig[, 1])
Mullen_genes <- as.character(Mullen_DNA_genes_sig[, 1])

# Significant in Moran and Mullen
cat(sprintf("%s & \\Checkmark & \\XSolidBrush & \\Checkmark \\\\ \\hline", intersect(Mullen_genes, Moran_genes)), sep = "\n")
# Significant in Moran and Middleton
cat(sprintf("%s & \\Checkmark & \\Checkmark & \\XSolidBrush\\\\ \\hline", intersect(Middleton_genes, Moran_genes)), sep = "\n")
# Significant in Middleton but not in Moran
cat(sprintf("%s & \\XSolidBrush & \\Checkmark & \\XSolidBrush\\\\ \\hline", setdiff(Middleton_genes, Moran_genes)), sep = "\n")
# Significant in Mullen but not in Moran
cat(sprintf("%s & \\XSolidBrush & \\XSolidBrush & \\Checkmark \\\\ \\hline", setdiff(Mullen_genes, Moran_genes)), sep = "\n")