Genome and Proteome Data Integration in RDF
Nadia Anwar, Ela Hunt, Walter Kolch and Andy Pitt
Semantic Web Applications and Tools for Life Sciences, November 2008
Data Discovery
[Figure: Genome, Transcripts, Proteins, Metabolites]
Outline
• Data integration in bioinformatics
• Semantic data integration
• Francisella
• Integrating genome annotations with experimental proteomics data in RDF
• Further work
Data Integration is not a solved problem
Information discovery is not integrated
[Figure: separate data silos, each behind its own LIMS]
• Proteomics: peptide profiles, peptide abundance, protein identification, protein interactions, PT-modifications
• Gene expression: transcript profile, transcript abundance
• Genomics: sequence, ORF prediction, genome comparisons
• Metabolomics
[Figure labels: genome, metabolic pathways, microarray experiments, proteomics experiments, high-throughput sequencing, computational analysis, systems biology, synthetic networks, pathways, predictions, regulatory networks, translational medicine]
Semantic Data Integration across 'omes data silos
[Figure: Data, Information; Genes, Transcripts, Peptides, Metabolites, Genotype]
Data Discovery
Proof of concept: Francisella tularensis
[Images: ulceroglandular tularaemia, respiratory tularaemia, oculoglandular tularaemia]
Bioterrorism
• Francisella tularensis is a very successful intracellular pathogen that causes severe disease (respiratory tularaemia is the most acute form)
• Low infectious dose (10-50 bacteria, compared with anthrax, which requires 8000-15000 spores)
• Weaponisation fears
Data sources: Genome
[Figure: RDF description of one gene; the numbers (1) to (4) label parts of the graph]
(4) IMG gene_oid=639752258
(3) IMG_S:locus_tag  FTN_0209
(3) IMG_S:genomic_location_start  229107
(3) IMG_S:genomic_location_end  229976
(3) IMG_S:genomic_location_strand  +
(2) RDFS:comment  TPR
(1) RDF:type  RDF:description
Export from: http://img.jgi.doe.gov/cgi-bin/pub/main.cgi?section=TaxonDetail&page=taxonDetail&taxon_oid=639633024
Data sources: Genome annotations
Francisella SUPERFAMILY data (http://supfam.cs.bris.ac.uk)
[Figure: RDF description of one protein]
Subject: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=118496616
RDF:type (RDF:description)  SUPERFAMILY:cgi-bin/model.cgi?model=0040419, http://purl.uniprot.org/core/Protein_Family
SUPERFAMILY:Assignment_Region  155-367
SUPERFAMILY:Score  5.1e-39
SUPERFAMILY:SCOP_ID  SUPERFAMILY:cgi-bin/scop.cgi?sunid=52540
SUPERFAMILY:SCOP_Fold  P-loop containing nucleoside triphosphate hydrolases
SUPERFAMILY:Family_ID  81269
SUPERFAMILY:Evalue  7.33e-06
SUPERFAMILY:Family_Description  Extended AAA-ATPase domain
SUPERFAMILY:Similar_Structure  1l8q A:77-289
Data sources: Genome annotations - KEGG
[Figure: RDF description of one KEGG gene entry]
Pathway: http://www.genome.jp/dbget-bin/www_bget?pathway+ftn00010
Subject: http://www.genome.jp/dbget-bin/www_bget?ftn:FTN_0298
http://img.jgi.doe.gov/schema/gene  glpX
http://img.jgi.doe.gov/schema/gene_name  (value not recovered)
rdfs:comment  fructose
rdfs:seeAlso  http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+[EC:3.1.3.11]
rdfs:seeAlso  http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+[SP:A0Q4N9_FRATN]
Data sources: Genome annotations - NCBI protein
[Figure: RDF description of one NCBI protein record]
Subject: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=118496616
RDF:type  RDF:description
RDF:id / symbol  YP_897666.1
RDFS:seeAlso  http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[refseqp-SeqVersion:YP_897666.1]+-e
http://purl.uniprot.org/Annotation  chromosomal
Sources: http://www.genome.jp/dbget-bin/www_bfind?F.tularensis_U112 and http://www.ncbi.nlm.nih.gov/sites/gquery?term=Francisella+tularensis+novicida
Data sources: Genome annotations - GO
[Figure: RDF description of one GO annotation]
Subject: http://www.genome.jp/dbget-bin/www_bget?ftn:FTN_0277
RDF:type  RDF:description
mgla:GO_AnnotationID  http://amigo.geneontology.org/cgi-bin/amigo/go.cgi?view=details&query=0006749
mgla:GO_AnnotationTerm  glutathione
mgla:GO_AnnotationOntology  biological_process
mgla:GO_AnnotationLevel  7
http://www.compbio.dundee.ac.uk/Software/GOtcha/iscore  0.879989490261963
http://www.compbio.dundee.ac.uk/Software/GOtcha/cscore  57273821328517
Poson annotations - COGs
[Figure: RDF description of one poson]
Subject: https://tools.nwrce.org/cgi-bin/fnu112/poson.cgi?poson=PSN082435
mgla:cogNumber  http://www.ncbi.nlm.nih.gov/sites/entrez?db=cdd&cmd=search&term=COG0508
mgla:cogDomain  AceF
mgla:cogDescription  Pyruvate/2-oxoglutarate
mgla:cogCategory  dihydrolipoamide
Data sources - experiments: Transcriptomics
Data sources - experiments: Proteomics
Proteomics: WT vs MglA mutant
Francisella tularensis novicida U112
[Figure: experimental design]
• Wild type: Whole Cell (3), Soluble (3), Membrane (3)
• MglA mutant: Whole Cell (3), Soluble (3), Membrane (3)
• (4) for each of the six fractions
• Each fraction analysed with Sequest and DRAGON (identification, relative abundance)
• Two-sided t-test, P val < 0.01
[Figure: RDF graph. An analysis node links an identified peptide (peptide sequence, abundance) through mgla:poson to a poson (PSN). rdfs:seeAlso links connect PSN, PSNV2, PSNV3, FTN and DDB ID, and lead on to GO, EC and SP.]
RDF - Excel conversion
[Figure: spreadsheet cells mapped onto subject, predicate and object; labels: Genome, mgla:experiment, Pval, Pval-1]
Data integration: Reconciled identifiers
• (WashU-B) PSNV1
• (WashU-B) PSNV2
• (WashU-B) PSNV3
• (WashU-P) DDB
• (Fn ORF ID) FTN
• (COGs) COGID
• (Gene Ontology) GOID
• (Refseq) ACNo
• (Uniprot) ACNo
• (ENZYME) ECNo
• (IMG) GENEID
• (NCBI) PROTEINID
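Reconciling identifiers amounts to following rdfs:seeAlso links from one identifier scheme to the next. The sketch below is a minimal illustration in Python, not the method used in the talk: it walks such links over an in-memory list of triples. The identifier values and the helper names (`build_index`, `reachable`) are hypothetical.

```python
from collections import defaultdict

SEE_ALSO = "rdfs:seeAlso"


def build_index(triples):
    """Map each subject to the objects it points at via rdfs:seeAlso."""
    index = defaultdict(list)
    for subj, pred, obj in triples:
        if pred == SEE_ALSO:
            index[subj].append(obj)
    return index


def reachable(index, start):
    """Return every identifier reachable from start, in breadth-first order."""
    seen, queue, order = {start}, [start], []
    while queue:
        node = queue.pop(0)
        for nxt in index.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Hypothetical identifiers, for illustration only.
triples = [
    ("PSNV1:0001", SEE_ALSO, "PSNV2:0001"),
    ("PSNV2:0001", SEE_ALSO, "PSNV3:0001"),
    ("PSNV3:0001", SEE_ALSO, "FTN_0001"),
    ("FTN_0001", SEE_ALSO, "EC:0.0.0.0"),
    ("FTN_0001", SEE_ALSO, "GO:0000000"),
]
index = build_index(triples)
linked = reachable(index, "PSNV1:0001")
print(linked)
```

Starting from an old poson identifier, the walk collects every newer identifier and annotation linked to it, which is the property the integrated graph relies on.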
Data Integration: Adding new experiments
[Figure: Experiments 1 to 4 and public domain data all attach to the same identifier graph, in which PSN, PSNV2, PSNV3, FTN and DDB ID are linked by rdfs:seeAlso and lead on to GO, EC and AC No]
Data integration: Sesame
NadiaAnwar:~ nadia$ openrdf-sesame-2.1/bin/console.sh
Connected to default data directory
Commands end with '.' at the end of a line
Type 'help.' for help
> connect http://127.0.0.1:8080/openrdf-sesame.
Disconnecting from default data directory
Connected to http://127.0.0.1:8080/openrdf-sesame
> show r.
+----------
|SYSTEM (System configuration repository)
|ftnRepoNative (Francisella Test)
|FrancisellaNative (FrancisellaTestStore)
|FrancisellaReified (Native store with RDF Schema inferencing)
|FrancisellaReified_index2 (Native store with RDF Schema inferencing)
|Francisella (Native store with RDF Schema inferencing)
+----------
> open FrancisellaReified_index2.
Opened repository FrancisellaReified_index2
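The same repositories can also be reached over HTTP rather than through the console. As a hedged sketch, assuming the Sesame 2 HTTP protocol exposes each repository at `<server>/repositories/<id>` and accepts `query` and `queryLn` parameters, the Python below only builds the request URL and sends nothing. The function name `sesame_query_url` and the example query are illustrative assumptions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sesame_query_url(server, repository, query, query_language="serql"):
    """Build a GET URL for a query against one repository (assumed URL layout)."""
    params = urlencode({"query": query, "queryLn": query_language})
    return f"{server.rstrip('/')}/repositories/{repository}?{params}"


# Server and repository names are taken from the console session in the talk.
query = "SELECT psn FROM {analysis} mgla:poson {psn}"
url = sesame_query_url(
    "http://127.0.0.1:8080/openrdf-sesame",
    "FrancisellaReified_index2",
    query,
)
print(url)

# Decoding the query string again recovers the original parameters.
decoded = parse_qs(urlsplit(url).query)
print(decoded["queryLn"])
```

Building the URL separately from sending it keeps the encoding step easy to check before any request reaches the server.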
Sesame: Data load (ftnRepoNative) - native (spoc,posc)
Data File                                time (s)   triples
francisella_locus_tag.nt                 893        1767
interact-prot.nt                         8851       20682
interact-prot-peptides.nt                           248647
mgla search dbfastablastp4 ypURL.n3      97         1719
NC_008601.nt                             4314       12781
Ft_novicidaU112go.nt                     35914      2548
francisellardf2.nt                       4341       10434
francisellaSUPERFAMILY.nt                5788       16110
francisellaPROTEINfasta.nt               1363       5160
Soluble.nt                               58887      336761
WholeCell.nt                             46902      112625
Membranes.nt                             100319     298771
Data Integration: MglA data (ftnRepoNative)
[Figure: RDF graph. An analysis node links an identified peptide (peptide sequence, experiment, abundance) through mgla:poson to a poson (PSN). rdfs:seeAlso links connect PSN, PSNV2, PSNV3, FTN and DDB ID, and lead on to GO, EC and SP.]
SELECT psn, ftn, ec
FROM {ftn} rdfs:seeAlso {ec},
     {psn} rdfs:seeAlso {ftn},
     {analysis} mgla:poson {psn}
WHERE ec LIKE "[EC"
USING NAMESPACE
     mgla = <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#>

SELECT abundance, psn, ec, ftn
FROM {ftn} rdfs:seeAlso {ec},
     {psn} rdfs:seeAlso {ftn},
     {analysis} mgla:poson {psn},
     {analysis} mgla:experiment {abundance}
WHERE ec LIKE "[EC"
USING NAMESPACE
     mgla = <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#>
Data Integration: MglA data (ftnRepoNative)
[Figure: the same RDF graph, with the analysis node identified by rdf:about and carrying mgla:sequence and mgla:experiment properties alongside mgla:poson and abundance]
Really easy. But:
• Simple Excel to RDF conversion does not enable all queries
• Not a simple conversion - data needs to be "modelled"
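To make the "simple conversion" concrete, here is a minimal Python sketch of a naive row-to-triples export: each spreadsheet row becomes one subject, and each column becomes a property with a literal value. The namespace, column names and sample row are assumptions made for illustration and are not taken from the actual MglA spreadsheets.

```python
import csv
import io

NS = "http://example.org/mgla#"  # hypothetical namespace, not the one used in the talk


def row_to_triples(row_id, row):
    """Yield one N-Triples line per column of a spreadsheet row."""
    subject = f"<{NS}row{row_id}>"
    for column, value in row.items():
        # Escape backslashes and double quotes so the literal stays valid.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        yield f'{subject} <{NS}{column}> "{escaped}" .'


def sheet_to_ntriples(csv_text):
    """Convert a CSV export of a spreadsheet into a list of N-Triples lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row_id, row in enumerate(reader, start=1):
        lines.extend(row_to_triples(row_id, row))
    return lines


# Made-up sample row; only the replicate name echoes the talk's data.
sheet = "sequence,poson,experiment,abundance\nAAAK,PSN000001,wildtype01_wc_01,2594\n"
ntriples = sheet_to_ntriples(sheet)
for line in ntriples:
    print(line)
```

In this flat form the abundance hangs off the row rather than off the pairing of a peptide with an experiment replicate, which is one way to see why the data need to be modelled before every query can be answered.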
[Figure: the flat graph (analysis node with rdf:about, mgla:sequence, mgla:experiment, mgla:poson to PSN, abundance) beside the intended model: Peptide Sequence identifiedIn Experiment Replicate, and that statement hasAbundance abundance]
Data Integration: Reified statements
[Figure: RDF graph. The analysis data node has rdf:type rdf:Statement, with rdf:subject the identified peptide (peptide sequence), rdf:predicate InExperimentReplicate and rdf:object the experiment replicate; mgla:PeptideAbundance attaches the abundance to that statement. mgla:poson links the peptide to PSN, and rdfs:seeAlso links connect PSN, PSNV2, PSNV3, FTN and DDB ID, and lead on to GO, EC and SP.]
Sesame: Reified data load - native-RDFS (spoc,posc,posc)
Data File                         time (s)     time (mins)   triples
FnU112Version3.nt                 383.44       6.3           58474
PosonMappings.nt                  84.56        1.4           13760
francisella_locus_tag.nt          16.73        0.3           1767
ConstructHasGeneID.nt             23.00        0.4           1719
interact-prot.nt                  124.95       2.1           20682
interact-prot-pepteides.nt        1127.97      18.7          248647
interact-protSeeAlsoisbURL.nt     10.67        0.2           1528
goAnnotation_URLID.nt             74.14        1.2           20501
NC_008601.nt                      75.84        1.3           12781
Membranes_CogNumberURL.nt         8.60         0.1           2548
Ft_novicida_U112_go.nt            561.38       9.3           2548
francisellardf2.nt                46.19        0.8           10602
francisellaSUPERFAMILY.nt         66.67        1.1           16110
francisellaPROTEINfasta.nt        15.27        0.3           5160
SolubleReifeid_3.rdf              1392.98      23.2          580873
WholeCellReified_3.rdf            941.16       15.6          184221
Membranes_3.rdf                   1026.66      17.111        416086
fnU112_draftRDFschemaV4.nt        2150.1098    35.835        501
Queries: which posons have the most highly abundant peptides?
SELECT ftn, psn, exp, abundance
FROM {psn} rdfs:seeAlso {psnv2},
     {psnv2} rdfs:seeAlso {psnv3},
     {psnv3} rdfs:seeAlso {ftn},
     {analysis} fnu112:poson {psn},
     {analysis} rdf:type {rdf:Statement},
     {analysis} rdf:object {exp},
     {analysis} mgla:PeptideAbundance {abundance}
WHERE xsd:integer(abundance) > 100000
  AND ftn LIKE "FTN"
USING NAMESPACE
     mgla = <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#>,
     fnu112 = <http://www.francisella.org/novicida/fnu112/schema/fnu112/experiments/mgla#>
Queries: which experiments have the most highly abundant peptides?
Reified statements
• Reified MglA data are much bigger (4 more statements per abundance value)
• The really interesting queries return a Java out-of-memory error (-Xms1024M -Xmx1536M)
• Haven't yet tested the shortcut path expression:
  { {reifSubj} reifPred {reifObj} } pred {obj}
  { {seq} identifiedIn {ExpRep} } hasAbundance {abd}
<WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement> .
<WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#WholeCell_Lvl7_021> .
<WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#InExperimentReplicate> .
<WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#object> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#wildtype01_wc_01> .
<WholeCell_Lvl7_0212> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#PeptideAbundance> "2594" .
[Figure: Peptide Sequence identifiedIn Experiment Replicate; that statement hasAbundance abundance]
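The reified pattern can be generated mechanically: one statement node, the three standard rdf:subject, rdf:predicate and rdf:object links, an rdf:type, and the abundance attached to the statement node. The Python sketch below illustrates this under assumptions: the namespace `NS` and the function name `reify` are hypothetical, and only the identifier strings echo the example in the talk.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = "http://example.org/mgla#"  # hypothetical namespace, not the one used in the talk


def reify(statement_id, subject, predicate, obj, abundance):
    """Return the five N-Triples lines describing one reified observation."""
    node = f"<{NS}{statement_id}>"
    return [
        f"{node} <{RDF}type> <{RDF}Statement> .",
        f"{node} <{RDF}subject> <{NS}{subject}> .",
        f"{node} <{RDF}predicate> <{NS}{predicate}> .",
        f"{node} <{RDF}object> <{NS}{obj}> .",
        # The abundance is a property of the statement, not of the peptide.
        f'{node} <{NS}PeptideAbundance> "{abundance}" .',
    ]


lines = reify(
    "WholeCell_Lvl7_0212",
    "WholeCell_Lvl7_021",
    "InExperimentReplicate",
    "wildtype01_wc_01",
    2594,
)
for line in lines:
    print(line)
```

Each observation costs five triples here, four of which only carry the reification structure, which matches the growth in repository size reported for the reified data.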
Comparison of integrated experimental data
Distinct and overlapping posons identified within each biological fraction (> 20000)
[Venn diagram: sol and mem; region counts on the slide: 171, 146, 185]

sol INTERSECT mem:
SELECT DISTINCT psn
FROM {x} fns:poson {psn},
     {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x},
     {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
WHERE xsd:integer(abundance) > 20000 AND experiment LIKE "sol"
INTERSECT
SELECT DISTINCT psn
FROM {x} fns:poson {psn},
     {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x},
     {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
WHERE xsd:integer(abundance) > 20000 AND experiment LIKE "mem"
USING NAMESPACE (declarations not shown on the slide)

sol MINUS mem: the same two sub-queries, with the "sol" sub-query first, joined by MINUS in place of INTERSECT.
mem MINUS sol: the same two sub-queries, with the "mem" sub-query first, joined by MINUS.
Comparison of integrated experimental data
Distinct and overlapping posons identified within each biological fraction (< 5000)
[Venn diagram: sol and mem; region counts on the slide: 219, 125, 245]

sol INTERSECT mem:
SELECT DISTINCT psn
FROM {x} fns:poson {psn},
     {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x},
     {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
WHERE xsd:integer(abundance) < 5000 AND experiment LIKE "sol"
INTERSECT
SELECT DISTINCT psn
FROM {x} fns:poson {psn},
     {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x},
     {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
WHERE xsd:integer(abundance) < 5000 AND experiment LIKE "mem"
USING NAMESPACE (declarations not shown on the slide)

sol MINUS mem: the same two sub-queries, with the "sol" sub-query first, joined by MINUS in place of INTERSECT.
mem MINUS sol: the same two sub-queries, with the "mem" sub-query first, joined by MINUS.
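The three regions of each Venn diagram correspond to the INTERSECT and MINUS queries: posons found only in the soluble fraction, only in the membrane fraction, or in both. A minimal Python sketch of the same comparison on query results already fetched into memory follows; the poson identifiers and the function name `compare_fractions` are made up for illustration and do not reproduce the counts on the slides.

```python
def compare_fractions(sol, mem):
    """Split posons into soluble-only, shared and membrane-only groups."""
    sol, mem = set(sol), set(mem)
    return {
        "sol_only": sol - mem,  # sol MINUS mem
        "both": sol & mem,      # sol INTERSECT mem
        "mem_only": mem - sol,  # mem MINUS sol
    }


# Hypothetical query results, for illustration only.
sol_posons = {"PSN1", "PSN2", "PSN3"}
mem_posons = {"PSN2", "PSN3", "PSN4", "PSN5"}

regions = compare_fractions(sol_posons, mem_posons)
counts = {name: len(members) for name, members in regions.items()}
print(counts)
```

Doing the set arithmetic on the client needs only two simple queries, one per fraction, which may help when the combined query is too expensive for the store.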
Further work
• Queries are slow in the native repository; database repositories are probably faster
• Adding a transcriptomic experiment: Wt vs mglA mutant (GEO AC GSE5468)
• RDF-S inferencing
Acknowledgements
• Funding: BBSRC - Radical Solutions for Researching the Proteome
• University of Glasgow, Glasgow
  • Prof Walter Kolch
  • Dr Andy Pitt
• University of Strathclyde, Glasgow
  • Dr Ela Hunt (Scientific Advisor)
• University of Washington, Seattle
  • Prof Dave Goodlett (Scientific Advisor)
  • Dr Mitch Brittnacher, Mathew Radey, Laurence Rohmer
  • Dr Tina Guina (MglA experiment)
Abundance thresholds
• SeRQL aggregate functions would be nice to have
• Queries to find low and high abundance values:
  • WHERE abundance BETWEEN MEDIAN(abundance) AND MAX(abundance)
  • WHERE abundance BETWEEN MIN(abundance) AND MEDIAN(abundance)
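Until aggregates are available in the query language, the median split can be computed on the client after fetching the abundance values. The Python sketch below is one possible workaround, not something described in the talk. The sample values are invented, and both ranges include the median because BETWEEN is inclusive.

```python
from statistics import median


def split_by_median(abundances):
    """Return the median plus the low and high ranges, both inclusive of it."""
    mid = median(abundances)
    low = [a for a in abundances if a <= mid]   # MIN .. MEDIAN
    high = [a for a in abundances if a >= mid]  # MEDIAN .. MAX
    return mid, low, high


# Invented abundance values, for illustration only.
values = [2594, 4800, 12000, 25000, 110000]
mid, low, high = split_by_median(values)
print(mid, low, high)
```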
Data Integration is not a solved problem
Information discovery is not Integrated
ProteomicsPeptide Profiles
Peptide AbundanceProtein IdentificationProtein Interactions
PT-ModificationsLIMS
Gene ExpressionTranscript Profile
Transcript Abundance
LIMS
GenomicsSequence
ORF PredictionGenome
Comparisons
LIMS
Genome Metabolic Pathways
Microarrayexperiments
Computationalanalysis Systems Biology
Synthetic NetworksPathways
Predictions
MetabolomicsLIMS
Translational Medicine
Regulatory Networks
Proteomicsexperiments
Computationalanalysis
High TPSequencing
Semantic Data Integration across omes data silos
Data Information Genes Transcripts Peptides Metabolites Genotype
Data Discovery
Proof of conceptFrancisella tularensis
ulceroglandular tularaemia
respiratory tularaemia
oculoglandular tularaemia
Bioterrorism
bull Francisella tularensis is a very successful intracellular pathogen that causes severe disease (respiratory tulareamia is the most acute form of the disease)
bull low infectious dose (10-50 bacterium compared to anthrax which requires 8000-15000 spores)
bull weaponisation fears
Data sourcesGenome
RDF
(4)IMGgene_oid=639752258 FTN_0209 (3)IMG_Slocus_tag
229107
(3)IMG_Sgenomic_location_start
229976
(3)IMG_Sgenomic_location_end
+
(3)IMG_Sgenomic_location_strand
TPR
(2)RDFScomment
RDFdescription
(1)RDFtype
httpimgjgidoegovcgi-binpubmaincgisection=TaxonDetailamppage=taxonDetailamptaxon_oid=639633024export
Data sourcesGenome annotations
Francisella SuperFamily Data
httpwwwncbinlmnihgoventrezviewerfcgidb=proteinampid=118496616
RDFdescriptionRDFtype
SUPERFAMILYcgi-binmodelcgimodel=0040419httppurluniprotorgcoreProtein_Family
155-367SUPERFAMILYAssignment_Region
51e-39SUPERFAMILYScore
SUPERFAMILYcgi-binscopcgisunid=52540SUPERFAMILYSCOP_ID
P-loop containing nucleoside triphosphate hydrolases
SUPERFAMILYSCOP_Fold
81269
SUPERFAMILYFamily_ID
733e-06
SUPERFAMILYEvalue
Extended AAA-ATPase domain
SUPERFAMILYFamily_Description
1l8q A77-289
SUPERFAMILYSimilar_Structure
httpsupfamcsbrisacuk
Data sourcesGenome annotations - KEGG
httpwwwgenomejpdbget-binwww_bgetpathway+ftn00010
httpwwwgenomejpdbget-binwww_bgetftnFTN_0298
httpimgjgidoegovschemagene
glpX
httpimgjgidoegovschemagene_name
fructose
rdfscomment
httpsrsebiacuksrsbincgi-binwgetz-e+[EC31311]
rdfsseeAlso
httpsrsebiacuksrsbincgi-binwgetz-e+[SPA0Q4N9_FRATN]
rdfsseeAlso
httpwwwncbinlmnihgoventrezviewerfcgidb=proteinampid=118496616
RDFdescription
RDFtype
YP_8976661
RDFidsymbol
httpsrsebiacuksrsbincgi-binwgetz[refseqp-SeqVersionYP_8976661]+-e
RDFSseeAlso
chromosomal
httppurluniprotorgAnnotation
Genome annotations - NCBI protein
httpwwwgenomejpdbget-binwww_bfindFtularensis_U112
httpwwwncbinlmnihgovsitesgqueryterm=Francisella+tularensis+novicida
Data sourcesGenome annotations - GO
httpwwwgenomejpdbget-binwww_bgetftnFTN_0277
RDFdescriptionRDFtype
httpamigogeneontologyorgcgi-binamigogocgiview=detailsampquery=0006749mglaGO_AnnotationID
glutathionemglaGO_AnnotationTerm
biological_processmglaGO_AnnotationOntology
7
mglaGO_AnnotationLevel
0879989490261963
httpwwwcompbiodundeeacukSoftwareGOtchaiscore
57273821328517
httpwwwcompbiodundeeacukSoftwareGOtchacscore
Poson annotations - Cogs
httpstoolsnwrceorgcgi-binfnu112posoncgiposon=PSN082435
httpwwwncbinlmnihgovsitesentrezdb=cddampcmd=searchampterm=COG0508mglacogNumber
AceFmglacogDomain
Pyruvate2-oxoglutarate
mglacogDescription
dihydrolipoamide
mglacogCategory
Data sources - experimentsTranscriptomics
Data sources - experimentsProteomics
Proteomics WT vs Mgla Mutant
Francisella tularensis novicida U112
Whole Cell(3)
Soluble(3)
Membrane(3)
Whole Cell(3)
Soluble(3)
Membrane(3)
WildType MglA mutant
(4) (4) (4) (4) (4) (4)
Sequest DRAGONSequest DRAGON
Sequest DRAGONSequest DRAGON
Sequest DRAGONSequest DRAGON
Relative AbundanceIdentification
Two-sided t-test
P val lt001
Identified Peptide
analysis
Peptide sequence
mglaposon
abundance PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
DDBID
rdfsseeAlso
GO ECSP
RDF - excel conversion
Genome
mglaexperiment
subject
object
predicate
Pval
Pval-1
Data integration Reconciled Identifiers
(WashU-B) PSNV1
(WashU-B) PSNV2(COGs) COGID
(Gene Ontology) GOID
(WashU-B) PSNV3
(Fn ORF ID) FTN
(WashU-P) DDB
(Refseq) ACNo
(Uniprot) ACNo(ENZYME) ECNo
(IMG) GENEID(NCBI) PROTEINID
Data IntegrationAdding new experiments
Experiment 1
PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
DDBID
rdfsseeAlso
GO ECAC No
Experiment 4
Experiment 2
Experiment 3
Public domain data
NadiaAnwar:~ nadia$ openrdf-sesame-2.1/bin/console.sh
Connected to default data directory
Commands end with '.' at the end of a line
Type 'help.' for help
> connect http://127.0.0.1:8080/openrdf-sesame.
Disconnecting from default data directory
Connected to http://127.0.0.1:8080/openrdf-sesame
> show r.
+----------
|SYSTEM (System configuration repository)
|ftnRepoNative (Francisella Test)
|FrancisellaNative (FrancisellaTestStore)
|FrancisellaReified (Native store with RDF Schema inferencing)
|FrancisellaReified_index2 (Native store with RDF Schema inferencing)
|Francisella (Native store with RDF Schema inferencing)
+----------
> open FrancisellaReified_index2.
Opened repository FrancisellaReified_index2
Data integration: Sesame
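The console is one way to reach the repository; queries can also be sent over HTTP. The sketch below only builds the request URL and makes no network call. It assumes, from the Sesame 2 HTTP protocol as I understand it, that a repository is addressed at `/repositories/<id>` and takes `query` and `queryLn` parameters; treat that layout as an assumption to check against the server's own documentation. The server address and repository name are the ones in the console session, and the query string is a placeholder.

```python
from urllib.parse import urlencode

# Local server address as used in the console session.
SERVER = "http://127.0.0.1:8080/openrdf-sesame"


def query_url(repository_id, serql, server=SERVER):
    """Build a GET URL for a SeRQL query.

    Assumption: repositories live under /repositories/<id> and accept the
    'query' and 'queryLn' parameters. Nothing is sent over the network here.
    """
    params = urlencode({"query": serql, "queryLn": "serql"})
    return f"{server}/repositories/{repository_id}?{params}"


url = query_url("FrancisellaReified_index2", "SELECT s FROM {s} p {o}")
print(url)
```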
Sesame: Data load (ftnRepoNative) - native (spoc, posc)
Data file                              time (s)    triples
francisella_locus_tag.nt                   8.93       1767
interact-prot.nt                          88.51      20682
interact-prot-peptides.nt                           248647
mgla search dbfastablastp4 ypURLn3         9.7        1719
NC_008601.nt                              43.14      12781
Ft_novicidaU112go.nt                     359.14       2548
francisellardf2.nt                        43.41      10434
francisellaSUPERFAMILY.nt                 57.88      16110
francisellaPROTEINfasta.nt                13.63       5160
Soluble.nt                               588.87     336761
WholeCell.nt                             469.02     112625
Membranes.nt                            1003.19     298771
Data integration: MglA data (ftnRepoNative)
[Figure: RDF graph of the MglA data. Node labels: Identified Peptide, analysis, Peptide sequence, Experiment, abundance, PSN, PSNV2, PSNV3, FTN, DDBID, GO, EC, SP. Edge labels: mgla:poson, rdfs:seeAlso.]
SELECT psn, ftn, ec
FROM {ftn} rdfs:seeAlso {ec},
     {psn} rdfs:seeAlso {ftn},
     {analysis} mgla:poson {psn}
WHERE ec LIKE "*[EC*"
USING NAMESPACE
     mgla = <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#>

SELECT abundance, psn, ec, ftn
FROM {ftn} rdfs:seeAlso {ec},
     {psn} rdfs:seeAlso {ftn},
     {analysis} mgla:poson {psn},
     {analysis} mgla:experiment {abundance}
WHERE ec LIKE "*[EC*"
USING NAMESPACE
     mgla = <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#>
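To make the join in the first query concrete, here is a small in-memory sketch of the same pattern: an analysis points to a poson, the poson links to an FTN identifier, and the FTN identifier links to something whose text contains `[EC`. The triples are invented, the prefixed names are plain strings rather than real IRIs, and the nested loop stands in for what the repository's query engine does far more efficiently.

```python
RDFS_SEE_ALSO = "rdfs:seeAlso"
MGLA_POSON = "mgla:poson"

# Made-up triples; only the shape of the EC link follows the slides.
TRIPLES = [
    ("analysis1", MGLA_POSON, "PSN_A"),
    ("PSN_A", RDFS_SEE_ALSO, "FTN_A"),
    ("FTN_A", RDFS_SEE_ALSO, "wgetz?-e+[EC:0.0.0.0]"),
    ("analysis2", MGLA_POSON, "PSN_B"),
    ("PSN_B", RDFS_SEE_ALSO, "FTN_B"),
    ("FTN_B", RDFS_SEE_ALSO, "GO:0000000"),
]


def match(triples, predicate):
    """Return (subject, object) pairs for one predicate."""
    return [(s, o) for s, p, o in triples if p == predicate]


def posons_with_ec(triples):
    """Join poson -> FTN -> EC, keeping only objects that look like EC links."""
    see_also = match(triples, RDFS_SEE_ALSO)
    posons = {o for _, o in match(triples, MGLA_POSON)}
    results = set()
    for ftn, ec in see_also:
        if "[EC" not in ec:  # plays the role of the LIKE filter on ec
            continue
        for psn, linked_ftn in see_also:
            if linked_ftn == ftn and psn in posons:
                results.add((psn, ftn, ec))
    return sorted(results)


print(posons_with_ec(TRIPLES))
```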
Data integration: MglA data (ftnRepoNative)
[Figure: RDF graph of the MglA data. Node labels: Identified Peptide, analysis, Peptide sequence, abundance, PSN, PSNV2, PSNV3, FTN, DDBID, GO, EC, SP. Edge labels: mgla:poson, mgla:sequence, mgla:experiment, rdfs:seeAlso, rdf:about.]
Really easy, but
• Simple Excel to RDF conversion does not enable all queries
• Not a simple conversion: data needs to be "modelled"
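[Figure: two models side by side. First model labels: Identified Peptide, analysis, Peptide sequence, abundance, PSN; edges mgla:poson, mgla:sequence, mgla:experiment, rdf:about. Second model labels: Peptide Sequence, Experiment Replicate, abundance; edges identifiedIn, hasAbundance.]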
Data integration: Reified statements
[Figure: reified RDF graph. Node labels: Identified Peptide, analysis, Peptide sequence, Experiment Replicate, analysis data, rdf:Statement, abundance, PSN, PSNV2, PSNV3, FTN, DDBID, GO, EC, SP. Edge labels: rdf:type, rdf:subject, rdf:predicate, rdf:object, InExperimentReplicate, mgla:PeptideAbundance, mgla:poson, rdfs:seeAlso.]
Sesame: Reified data load - native-RDFS (spoc, posc, posc)
Data file                        time (s)     time (mins)   triples
FnU112Version3.nt                  383.44         6.3         58474
PosonMappings.nt                    84.56         1.4         13760
francisella_locus_tag.nt            16.73         0.3          1767
ConstructHasGeneID.nt               23.00         0.4          1719
interact-prot.nt                   124.95         2.1         20682
interact-prot-pepteides.nt        1127.97        18.7        248647
interact-protSeeAlsoisbURL.nt       10.67         0.2          1528
goAnnotation_URLID.nt               74.14         1.2         20501
NC_008601.nt                        75.84         1.3         12781
Membranes_CogNumberURL.nt            8.60         0.1          2548
Ft_novicida_U112_go.nt             561.38         9.3          2548
francisellardf2.nt                  46.19         0.8         10602
francisellaSUPERFAMILY.nt           66.67         1.1         16110
francisellaPROTEINfasta.nt          15.27         0.3          5160
SolubleReifeid_3.rdf              1392.98        23.2        580873
WholeCellReified_3.rdf             941.16        15.6        184221
Membranes_3.rdf                   1026.66        17.111      416086
fnU112_draftRDFschemaV4.nt        2150.1098      35.835         501
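The load figures are easier to compare as rates and ratios. The sketch below takes the Soluble and WholeCell rows from the two load tables and computes triples loaded per second and how much the reified files grew. It assumes the seconds column carries two decimal places, which is roughly consistent with the minutes column of the reified table; the dictionary names are my own.

```python
# (load time in seconds, triples). Seconds are read with two decimal places,
# which is an assumption about how the slide's numbers were formatted.
NATIVE = {"Soluble": (588.87, 336761), "WholeCell": (469.02, 112625)}
REIFIED = {"Soluble": (1392.98, 580873), "WholeCell": (941.16, 184221)}


def rate(seconds, triples):
    """Triples loaded per second."""
    return triples / seconds


def growth(name):
    """How many times more triples the reified file holds than the flat one."""
    return REIFIED[name][1] / NATIVE[name][1]


for name in NATIVE:
    flat = rate(*NATIVE[name])
    reified = rate(*REIFIED[name])
    print(f"{name}: {flat:.0f} vs {reified:.0f} triples/s, {growth(name):.2f}x triples")
```

On these assumptions the reified files hold roughly 1.6 to 1.7 times as many triples and load at a lower rate, in line with the remark that the reified data are much bigger.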
select ftn, psn, exp, abundance
from {psn} rdfs:seeAlso {psnv2},
     {psnv2} rdfs:seeAlso {psnv3},
     {psnv3} rdfs:seeAlso {ftn},
     {analysis} fnu112:poson {psn},
     {analysis} rdf:type {rdf:Statement},
     {analysis} rdf:object {exp},
     {analysis} mgla:PeptideAbundance {abundance}
where xsd:integer(abundance) > 100000
  and ftn LIKE "*FTN*"
using namespace
     mgla = <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#>,
     fnu112 = <http://www.francisella.org/novicida/fnu112/schema/fnu112/experiments/mgla#>
Queries: which posons have the most highly abundant peptides?
Queries: which experiments have the most highly abundant peptides?
Reified statements
• Reified mgla data are much bigger (4 more statements per abundance)
• The really interesting queries return a Java out-of-memory error (-Xms1024M -Xmx1536M)
• Haven't yet tested the shortcut path expression
{ {reifSubj} reifPred {reifObj} } pred {obj}
{ {seq} identifiedIn {ExpRep} } hasAbundance {abd}
<WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement> .
<WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#WholeCell_Lvl7_021> .
<WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#InExperimentReplicate> .
<WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#object> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#wildtype01_wc_01> .
<WholeCell_Lvl7_0212> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#PeptideAbundance> "2594" .
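The reified example on this slide follows a fixed recipe: one statement node, the four standard reification triples, and one extra triple carrying the abundance. A minimal sketch of that recipe is given here. The `http://example.org/mgla#` namespace is hypothetical and stands in for the talk's own; the local names and the value 2594 are taken from the slide.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical namespace used only for this illustration.
EX = "http://example.org/mgla#"


def reify(statement_id, subject, predicate, obj, abundance):
    """Return N-Triples lines for a reified statement plus its abundance."""
    node = f"<{EX}{statement_id}>"
    return [
        f"{node} <{RDF}type> <{RDF}Statement> .",
        f"{node} <{RDF}subject> <{EX}{subject}> .",
        f"{node} <{RDF}predicate> <{EX}{predicate}> .",
        f"{node} <{RDF}object> <{EX}{obj}> .",
        # The abundance hangs off the statement node, not off the peptide.
        f'{node} <{EX}PeptideAbundance> "{abundance}" .',
    ]


lines = reify(
    "WholeCell_Lvl7_0212",
    "WholeCell_Lvl7_021",
    "InExperimentReplicate",
    "wildtype01_wc_01",
    2594,
)
print("\n".join(lines))
```

Attaching the abundance to the statement node ties each value to one peptide and one experiment replicate, at the cost of five triples per observation.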
[Figure: Node labels: Peptide Sequence, Experiment Replicate, abundance. Edge labels: identifiedIn, hasAbundance.]
Comparison of integrated experimental data
Distinct and overlapping posons identified within each biological fraction (> 20000)

sol INTERSECT mem

select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000 and experiment LIKE "*sol*"
INTERSECT
select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000 and experiment LIKE "*mem*"
using namespace

sol MINUS mem

select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000 and experiment LIKE "*sol*"
MINUS
select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000 and experiment LIKE "*mem*"
using namespace

mem MINUS sol

select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000 and experiment LIKE "*mem*"
MINUS
select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000 and experiment LIKE "*sol*"
using namespace

[Figure: Venn diagram of sol and mem posons; counts shown: 171, 146, 185]
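The INTERSECT and MINUS comparisons map directly onto set operations. The sketch below runs the same three comparisons on a handful of invented observations; the replicate names are modelled on `wildtype01_wc_01` from the slides but are otherwise made up, and matching a fraction by substring is an assumption about how the replicates are named.

```python
# Made-up observations: (poson, experiment replicate, abundance).
OBSERVATIONS = [
    ("PSN_A", "wildtype01_sol_01", 25000),
    ("PSN_A", "wildtype01_mem_01", 30000),
    ("PSN_B", "wildtype01_sol_02", 21000),
    ("PSN_C", "wildtype01_mem_02", 45000),
    ("PSN_D", "wildtype01_sol_01", 100),
]


def posons(observations, fraction, keep):
    """Posons seen in replicates naming `fraction` whose abundance passes `keep`."""
    return {psn for psn, exp, ab in observations if fraction in exp and keep(ab)}


def compare(observations, keep):
    """Return the three regions of the sol/mem Venn diagram."""
    sol = posons(observations, "sol", keep)
    mem = posons(observations, "mem", keep)
    return {
        "sol INTERSECT mem": sol & mem,
        "sol MINUS mem": sol - mem,
        "mem MINUS sol": mem - sol,
    }


high = compare(OBSERVATIONS, lambda ab: ab > 20000)
for region, members in high.items():
    print(region, sorted(members))
```

Passing a different predicate, for example `lambda ab: ab < 5000`, gives the low-abundance comparison without changing anything else.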
Comparison of integrated experimental data
Distinct and overlapping posons identified within each biological fraction (< 5000)

sol INTERSECT mem

select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) < 5000 and experiment LIKE "*sol*"
INTERSECT
select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) < 5000 and experiment LIKE "*mem*"
using namespace

sol MINUS mem

select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) < 5000 and experiment LIKE "*sol*"
MINUS
select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) < 5000 and experiment LIKE "*mem*"
using namespace

mem MINUS sol

select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) < 5000 and experiment LIKE "*mem*"
MINUS
select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) < 5000 and experiment LIKE "*sol*"
using namespace

[Figure: Venn diagram of sol and mem posons; counts shown: 219, 125, 245]
Further work
• Queries are slow in the native repository; database repositories are probably faster
• Adding transcriptomic experiment
  WT vs mglA mutant
  GEO AC: GSE5468
• RDF-S inferencing
Acknowledgements
• Funding: BBSRC - Radical Solutions for Researching the Proteome
• University of Glasgow, Glasgow
• Prof Walter Kolch
• Dr Andy Pitt
• University of Strathclyde, Glasgow
• Dr Ela Hunt (Scientific Advisor)
• University of Washington, Seattle
• Prof Dave Goodlett (Scientific Advisor)
• Dr Mitch Brittnacher, Mathew Radey, Laurence Rohmer
• Dr Tina Guina (MglA experiment)
Abundance thresholds
• SeRQL aggregate functions would be nice to have
• Queries to find low and high abundance values
• WHERE abundance BETWEEN MEDIAN(abundance) AND MAX(abundance)
• WHERE abundance BETWEEN MIN(abundance) AND MEDIAN(abundance)
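Until aggregates are available in the query language, the same thresholds can be computed on the client after fetching the abundance values. This is a minimal sketch of that workaround, not something shown in the talk; the function name is my own, and apart from 2594 the sample values are invented. Both ranges are inclusive, mirroring BETWEEN, so the median itself falls in both.

```python
from statistics import median


def split_by_median(abundances):
    """Return (median, low values, high values) with inclusive bounds.

    Raises statistics.StatisticsError on an empty input.
    """
    values = sorted(abundances)
    mid = median(values)
    low = [v for v in values if values[0] <= v <= mid]    # MIN .. MEDIAN
    high = [v for v in values if mid <= v <= values[-1]]  # MEDIAN .. MAX
    return mid, low, high


mid, low, high = split_by_median([2594, 100, 25000, 4800, 120000])
print(mid, low, high)
```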
Semantic Data Integration across omes data silos
Data Information Genes Transcripts Peptides Metabolites Genotype
Data Discovery
Proof of conceptFrancisella tularensis
ulceroglandular tularaemia
respiratory tularaemia
oculoglandular tularaemia
Bioterrorism
bull Francisella tularensis is a very successful intracellular pathogen that causes severe disease (respiratory tulareamia is the most acute form of the disease)
bull low infectious dose (10-50 bacterium compared to anthrax which requires 8000-15000 spores)
bull weaponisation fears
Data sourcesGenome
RDF
(4)IMGgene_oid=639752258 FTN_0209 (3)IMG_Slocus_tag
229107
(3)IMG_Sgenomic_location_start
229976
(3)IMG_Sgenomic_location_end
+
(3)IMG_Sgenomic_location_strand
TPR
(2)RDFScomment
RDFdescription
(1)RDFtype
httpimgjgidoegovcgi-binpubmaincgisection=TaxonDetailamppage=taxonDetailamptaxon_oid=639633024export
Data sourcesGenome annotations
Francisella SuperFamily Data
httpwwwncbinlmnihgoventrezviewerfcgidb=proteinampid=118496616
RDFdescriptionRDFtype
SUPERFAMILYcgi-binmodelcgimodel=0040419httppurluniprotorgcoreProtein_Family
155-367SUPERFAMILYAssignment_Region
51e-39SUPERFAMILYScore
SUPERFAMILYcgi-binscopcgisunid=52540SUPERFAMILYSCOP_ID
P-loop containing nucleoside triphosphate hydrolases
SUPERFAMILYSCOP_Fold
81269
SUPERFAMILYFamily_ID
733e-06
SUPERFAMILYEvalue
Extended AAA-ATPase domain
SUPERFAMILYFamily_Description
1l8q A77-289
SUPERFAMILYSimilar_Structure
httpsupfamcsbrisacuk
Data sourcesGenome annotations - KEGG
httpwwwgenomejpdbget-binwww_bgetpathway+ftn00010
httpwwwgenomejpdbget-binwww_bgetftnFTN_0298
httpimgjgidoegovschemagene
glpX
httpimgjgidoegovschemagene_name
fructose
rdfscomment
httpsrsebiacuksrsbincgi-binwgetz-e+[EC31311]
rdfsseeAlso
httpsrsebiacuksrsbincgi-binwgetz-e+[SPA0Q4N9_FRATN]
rdfsseeAlso
httpwwwncbinlmnihgoventrezviewerfcgidb=proteinampid=118496616
RDFdescription
RDFtype
YP_8976661
RDFidsymbol
httpsrsebiacuksrsbincgi-binwgetz[refseqp-SeqVersionYP_8976661]+-e
RDFSseeAlso
chromosomal
httppurluniprotorgAnnotation
Genome annotations - NCBI protein
httpwwwgenomejpdbget-binwww_bfindFtularensis_U112
httpwwwncbinlmnihgovsitesgqueryterm=Francisella+tularensis+novicida
Data sourcesGenome annotations - GO
httpwwwgenomejpdbget-binwww_bgetftnFTN_0277
RDFdescriptionRDFtype
httpamigogeneontologyorgcgi-binamigogocgiview=detailsampquery=0006749mglaGO_AnnotationID
glutathionemglaGO_AnnotationTerm
biological_processmglaGO_AnnotationOntology
7
mglaGO_AnnotationLevel
0879989490261963
httpwwwcompbiodundeeacukSoftwareGOtchaiscore
57273821328517
httpwwwcompbiodundeeacukSoftwareGOtchacscore
Poson annotations - Cogs
httpstoolsnwrceorgcgi-binfnu112posoncgiposon=PSN082435
httpwwwncbinlmnihgovsitesentrezdb=cddampcmd=searchampterm=COG0508mglacogNumber
AceFmglacogDomain
Pyruvate2-oxoglutarate
mglacogDescription
dihydrolipoamide
mglacogCategory
Data sources - experimentsTranscriptomics
Data sources - experimentsProteomics
Proteomics WT vs Mgla Mutant
Francisella tularensis novicida U112
Whole Cell(3)
Soluble(3)
Membrane(3)
Whole Cell(3)
Soluble(3)
Membrane(3)
WildType MglA mutant
(4) (4) (4) (4) (4) (4)
Sequest DRAGONSequest DRAGON
Sequest DRAGONSequest DRAGON
Sequest DRAGONSequest DRAGON
Relative AbundanceIdentification
Two-sided t-test
P val lt001
Identified Peptide
analysis
Peptide sequence
mglaposon
abundance PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
DDBID
rdfsseeAlso
GO ECSP
RDF - excel conversion
Genome
mglaexperiment
subject
object
predicate
Pval
Pval-1
Data integration Reconciled Identifiers
(WashU-B) PSNV1
(WashU-B) PSNV2(COGs) COGID
(Gene Ontology) GOID
(WashU-B) PSNV3
(Fn ORF ID) FTN
(WashU-P) DDB
(Refseq) ACNo
(Uniprot) ACNo(ENZYME) ECNo
(IMG) GENEID(NCBI) PROTEINID
Data IntegrationAdding new experiments
Experiment 1
PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
DDBID
rdfsseeAlso
GO ECAC No
Experiment 4
Experiment 2
Experiment 3
Public domain data
NadiaAnwar~ nadia$ openrdf-sesame-21binconsolesh Connected to default data directory
Commands end with at the end of a lineType help for helpgt connect http1270018080openrdf-sesameDisconnecting from default data directoryConnected to http1270018080openrdf-sesamegt show r+----------|SYSTEM (System configuration repository)|ftnRepoNative (Francisella Test)|FrancisellaNative (FrancisellaTestStore)|FrancisellaReified (Native store with RDF Schema inferencing)|FrancisellaReified_index2 (Native store with RDF Schema inferencing)|Francisella (Native store with RDF Schema inferencing)+----------gt open FrancisellaReified_index2Opened repository FrancisellaReified_index2
Data integration Sesame
SesameData load (ftnRepoNative) - native (spocposc)
Data File time (s) triples
francisella_locus_tagnt 893 1767
interact-protnt 8851 20682
interact-prot-peptidesnt 248647
mgla search dbfastablastp4 ypURLn3 97 1719
NC_008601nt 4314 12781
Ft_novicidaU112gont 35914 2548
francisellardf2nt 4341 10434
francisellaSUPERFAMILYnt 5788 16110
francisellaPROTEINfastant 1363 5160
Solublent 58887 336761
WholeCellnt 46902 112625
Membranesnt 100319 298771
Data IntegrationMgla data (ftnRepoNative)
Identified Peptide
analysis
Peptide sequence
Experiment
mglaposon
abundance PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
DDBID
rdfsseeAlso
GO ECSP
SELECT psn ftn ec FROM ftn rdfsseeAlso ec psn rdfsseeAlso ftn analysis mglaposon psnWHERE ec LIKE ldquo[ECrdquoUSING NAMESPACEmgla =lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglagt
SELECT abundance psn ec ftn FROM ftn rdfsseeAlso ec psn rdfsseeAlso ftn analysis mglaposon psnanalysis mglaexperiment abundanceWHERE ec LIKE ldquo[ECrdquoUSING NAMESPACEmgla =lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglagt
Data IntegrationMgla data (ftnRepoNative)
Identified Peptide
analysis
Peptide sequence
mglaposon
abundance PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
DDBID
rdfsseeAlso
GO ECSP
mglasequencemglaexperiment
rdfabout
Really easy Butbull Simple excel to RDF conversion does not enable all queries
bull Not a simple conversion - Data needs to be ldquomodelledrdquo
Identified Peptide
analysis
Peptide sequence
mglaposon
abundance PSNmglasequence
mglaexperiment
rdfabout
Peptide SequenceExperiment Replicate
abundance
identifiedIn
hasAbundance
Data IntegrationReified statements
Identified Peptideanalysis
Peptide sequence
Experiment Replicate
rdftype
mglaposon
PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
analysis datardfStatement
analysis data
InExperimentReplicate
rdfobject
rdftype
rdfsubject
rdfpredicateabundance
mglaPeptideAbundance
DDBID
rdfsseeAlso
GO ECSP
SesameReified Data load - native-RDFS (spocposcposc)
Data File time (s) time(mins) triples
FnU112Version3nt 38344 63 58474
PosonMappingsnt 8456 14 13760
francisella_locus_tagnt 1673 03 1767
ConstructHasGeneIDnt 2300 04 1719
interact-protnt 12495 21 20682
interact-prot-pepteidesnt 112797 187 248647
interact-protSeeAlsoisbURLnt 1067 02 1528
goAnnotation_URLIDnt 7414 12 20501
NC_008601nt 7584 13 12781
Membranes_CogNumberURLnt 860 01 2548
Ft_novicida_U112_gont 56138 93 2548
francisellardf2nt 4619 08 10602
francisellaSUPERFAMILYnt 6667 11 16110
francisellaPROTEINfastant 1527 03 5160
SolubleReifeid_3rdf 139298 232 580873
WholeCellReified_3rdf 94116 156 184221
Membranes_3rdf 102666 17111 416086
fnU112_draftRDFschemaV4nt 21501098 35835 501
select ftn psn exp abundance from psn rdfsseeAlso psnv2psnv2 rdfsseeAlso psnv3psnv3 rdfsseeAlso ftnanalysis fnu112poson psnanalysis rdftype rdfStatementanalysis rdfobject expanalysis mglaPeptideAbundance abundancewhere xsdinteger(abundance) gt 100000and ftn LIKE FTNusing namespace mgla=lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglagtfnu112=lthttpwwwfrancisellaorgnovicidafnu112schemafnu112experimentsmglagt
Querieswhich posons have the most highly abundant peptides
Querieswhich posons have the most highly abundant peptides
Querieswhich experiments have the most highly abundant peptides
Reified statementsbull Reified mgla data are much bigger (4 more statementsabundance)
bull The really interesting queries return Java out of memory error (-Xms-1024M -Xmx 1536M)
bull Havenrsquot yet tested shortcut path expression
reifSubj reifPred reifObj pred obj
seq identifiedIn ExpRep hasAbundance abd
ltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nstypegt lthttpwwww3org19990222-rdf-syntax-nsStatementgtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nssubjectgt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaWholeCell_Lvl7_021gtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nspredicategt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaInExperimentReplicategtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nsobjectgt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglawildtype01_wc_01gtltWholeCell_Lvl7_0212gt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaPeptideAbundancegt 2594
Peptide SequenceExperiment Replicate
abundance
identifiedIn
hasAbundance
Comparison of integrated experimental data
Distinct and overlapping posons identified within each biological fraction (abundance > 20000)
[Venn diagram: sol and mem, with regions sol MINUS mem, INTERSECT and mem MINUS sol; counts shown: 171, 146, 185]

INTERSECT:
select distinct psn from
  {x} fns:poson {psn},
  {x} fn:InExperimentReplicate {experiment},
  {analysis} rdf:subject {x},
  {analysis} rdf:object {exp},
  {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000
and experiment LIKE "*sol*"
INTERSECT
select distinct psn from
  {x} fns:poson {psn},
  {x} fn:InExperimentReplicate {experiment},
  {analysis} rdf:subject {x},
  {analysis} rdf:object {exp},
  {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000
and experiment LIKE "*mem*"
using namespace

sol MINUS mem: the same two subqueries (sol first, mem second) joined by MINUS instead of INTERSECT.
mem MINUS sol: the same two subqueries in the opposite order (mem first, sol second), joined by MINUS.
Comparison of integrated experimental data
Distinct and overlapping posons identified within each biological fraction (abundance < 5000)
[Venn diagram: sol and mem, with regions sol MINUS mem, INTERSECT and mem MINUS sol; counts shown: 219, 125, 245]

INTERSECT:
select distinct psn from
  {x} fns:poson {psn},
  {x} fn:InExperimentReplicate {experiment},
  {analysis} rdf:subject {x},
  {analysis} rdf:object {exp},
  {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) < 5000
and experiment LIKE "*sol*"
INTERSECT
select distinct psn from
  {x} fns:poson {psn},
  {x} fn:InExperimentReplicate {experiment},
  {analysis} rdf:subject {x},
  {analysis} rdf:object {exp},
  {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) < 5000
and experiment LIKE "*mem*"
using namespace

sol MINUS mem: the same two subqueries (sol first, mem second) joined by MINUS instead of INTERSECT.
mem MINUS sol: the same two subqueries in the opposite order (mem first, sol second), joined by MINUS.
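The three regions of each comparison are plain set operations on the posons selected per fraction. A minimal sketch in Python of that logic, outside the repository: the function names are hypothetical, the observations are assumed to be (poson, experiment replicate, abundance) tuples, and matching a fraction by substring mirrors the LIKE filter only loosely.

```python
def posons_in_fraction(observations, fraction, keep):
    """Distinct posons seen in replicates whose name contains `fraction`
    and whose abundance passes the `keep` predicate."""
    return {
        psn
        for psn, experiment, abundance in observations
        if fraction in experiment and keep(abundance)
    }


def compare_fractions(observations, keep):
    """Return the three regions of the sol/mem comparison."""
    sol = posons_in_fraction(observations, "sol", keep)
    mem = posons_in_fraction(observations, "mem", keep)
    return {
        "sol MINUS mem": sol - mem,
        "INTERSECT": sol & mem,
        "mem MINUS sol": mem - sol,
    }


if __name__ == "__main__":
    # Invented example rows; the poson and replicate names are placeholders.
    rows = [
        ("PSN_A", "wildtype01_sol_01", 25000),
        ("PSN_A", "wildtype01_mem_01", 30000),
        ("PSN_B", "wildtype01_sol_02", 21000),
        ("PSN_C", "wildtype01_mem_02", 40000),
        ("PSN_D", "wildtype01_sol_01", 100),
    ]
    for region, posons in compare_fractions(rows, lambda a: a > 20000).items():
        print(region, sorted(posons))
```

Passing `lambda a: a < 5000` as `keep` gives the low-abundance comparison with the same code.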
Further work
• Queries are slow in the native repository; database repositories are probably faster
• Adding a transcriptomic experiment: Wt vs mglA mutant (GEO AC: GSE5468)
• RDF-S inferencing
Acknowledgements
• Funding: BBSRC - Radical Solutions for Researching the Proteome
• University of Glasgow, Glasgow
  • Prof Walter Kolch
  • Dr Andy Pitt
• University of Strathclyde, Glasgow
  • Dr Ela Hunt (Scientific Advisor)
• University of Washington, Seattle
  • Prof Dave Goodlett (Scientific Advisor)
  • Dr Mitch Brittnacher, Mathew Radey, Laurence Rohmer
  • Dr Tina Guina (MglA experiment)
Abundance thresholds
• SeRQL aggregate functions would be nice to have
• Queries to find low and high abundance values:
  • WHERE abundance BETWEEN MEDIAN(abundance) AND MAX(abundance)
  • WHERE abundance BETWEEN MIN(abundance) AND MEDIAN(abundance)
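Until aggregates are available in the query language, the median split can be computed on the client after fetching the abundance values. A minimal sketch in Python, assuming the abundances have already been retrieved as integers; `split_by_median` is a hypothetical helper, not part of Sesame.

```python
from statistics import median


def split_by_median(abundances):
    """Return (median, low values, high values).

    low  = values between MIN and MEDIAN (inclusive)
    high = values between MEDIAN and MAX (inclusive)
    """
    values = sorted(abundances)
    if not values:
        raise ValueError("no abundance values")
    mid = median(values)
    low = [a for a in values if a <= mid]
    high = [a for a in values if a >= mid]
    return mid, low, high


if __name__ == "__main__":
    # Invented abundance values for illustration.
    mid, low, high = split_by_median([2594, 100, 20000, 5000, 150000])
    print("median:", mid)
    print("low:", low)
    print("high:", high)
```

The median can then be substituted for the fixed 20000 and 5000 thresholds in the queries.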
Data sources: Genome annotations
Francisella SUPERFAMILY data (http://supfam.cs.bris.ac.uk)
[Diagram: RDF annotation of one protein]
Protein: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=118496616
  rdf:description                  SUPERFAMILY:cgi-bin/model.cgi?model=0040419
  rdf:type                         http://purl.uniprot.org/core/Protein_Family
  SUPERFAMILY:Assignment_Region    155-367
  SUPERFAMILY:Score                5.1e-39
  SUPERFAMILY:SCOP_ID              SUPERFAMILY:cgi-bin/scop.cgi?sunid=52540
  SUPERFAMILY:SCOP_Fold            P-loop containing nucleoside triphosphate hydrolases
  SUPERFAMILY:Family_ID            81269
  SUPERFAMILY:Evalue               7.33e-06
  SUPERFAMILY:Family_Description   Extended AAA-ATPase domain
  SUPERFAMILY:Similar_Structure    1l8q A:77-289
Data sources: Genome annotations - KEGG
Source: http://www.genome.jp/dbget-bin/www_bfind?F.tularensis_U112
Pathway: http://www.genome.jp/dbget-bin/www_bget?pathway+ftn00010
[Diagram: RDF annotation of one gene]
Gene: http://www.genome.jp/dbget-bin/www_bget?ftn:FTN_0298
  (type)                                     http://img.jgi.doe.gov/schema#gene
  http://img.jgi.doe.gov/schema#gene_name    glpX
  rdfs:comment                               fructose
  rdfs:seeAlso                               http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+[EC:3.1.3.11]
  rdfs:seeAlso                               http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+[SP:A0Q4N9_FRATN]

Genome annotations - NCBI protein
Source: http://www.ncbi.nlm.nih.gov/sites/gquery?term=Francisella+tularensis+novicida
[Diagram: RDF annotation of one protein]
Protein: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=118496616
  rdf:description, rdf:type
  RDF:id / symbol                            YP_897666.1
  rdfs:seeAlso                               http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[refseqp-SeqVersion:YP_897666.1]+-e
  http://purl.uniprot.org/Annotation         chromosomal
Data sources: Genome annotations - GO
[Diagram: RDF annotation of one gene]
Gene: http://www.genome.jp/dbget-bin/www_bget?ftn:FTN_0277
  rdf:description, rdf:type
  mgla:GO_AnnotationID          http://amigo.geneontology.org/cgi-bin/amigo/go.cgi?view=details&query=0006749
  mgla:GO_AnnotationTerm        glutathione
  mgla:GO_AnnotationOntology    biological_process
  mgla:GO_AnnotationLevel       7
  http://www.compbio.dundee.ac.uk/Software/GOtcha#iscore    0.879989490261963
  http://www.compbio.dundee.ac.uk/Software/GOtcha#cscore    57273821328517
Poson annotations - COGs
[Diagram: RDF annotation of one poson]
Poson: https://tools.nwrce.org/cgi-bin/fnu112/poson.cgi?poson=PSN082435
  mgla:cogNumber         http://www.ncbi.nlm.nih.gov/sites/entrez?db=cdd&cmd=search&term=COG0508
  mgla:cogDomain         AceF
  mgla:cogDescription    Pyruvate/2-oxoglutarate
  mgla:cogCategory       dihydrolipoamide
Data sources - experiments: Transcriptomics
Data sources - experiments: Proteomics
Proteomics: WT vs MglA mutant, Francisella tularensis novicida U112
[Workflow diagram: WildType and MglA mutant, each split into Whole Cell (3), Soluble (3) and Membrane (3), with (4) under each fraction; Sequest and DRAGON applied to every fraction for Identification and Relative Abundance; two-sided t-test, P val < 0.01]
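RDF - Excel conversion
[Diagram: each spreadsheet row becomes an Identified Peptide (analysis) with a Peptide sequence, an abundance, Pval and Pval-1; mgla:experiment is drawn as a subject / predicate / object statement; mgla:poson links the analysis to a PSN, and PSN, PSNV2, PSNV3, FTN and DDBID are connected by rdfs:seeAlso, leading on to GO, EC and SP in the Genome annotations]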
Data integration: Reconciled identifiers
(WashU-B) PSNV1
(WashU-B) PSNV2
(COGs) COGID
(Gene Ontology) GOID
(WashU-B) PSNV3
(Fn ORF ID) FTN
(WashU-P) DDB
(Refseq) AC No
(Uniprot) AC No
(ENZYME) EC No
(IMG) GENEID
(NCBI) PROTEINID
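Reconciling these identifiers amounts to following rdfs:seeAlso links from one identifier to the next. A minimal sketch in Python of that traversal over an in-memory link table; the function name `reconcile` and every identifier in the example are invented placeholders, not real mappings from the repository.

```python
def reconcile(start, see_also, max_hops=10):
    """Follow seeAlso-style links breadth-first from `start`.

    `see_also` maps an identifier to the identifiers it points at.
    Returns every identifier reached, in discovery order, `start` first.
    """
    seen = [start]
    frontier = [start]
    for _ in range(max_hops):
        next_frontier = []
        for ident in frontier:
            for target in see_also.get(ident, []):
                if target not in seen:
                    seen.append(target)
                    next_frontier.append(target)
        if not next_frontier:
            break
        frontier = next_frontier
    return seen


if __name__ == "__main__":
    # Placeholder link table: PSN -> PSNV2 -> PSNV3 -> FTN -> EC / AC No.
    links = {
        "PSN-example": ["PSNV2-example"],
        "PSNV2-example": ["PSNV3-example"],
        "PSNV3-example": ["FTN-example"],
        "FTN-example": ["EC-example", "AC-example"],
    }
    print(reconcile("PSN-example", links))
```

The `seen` check keeps the walk finite even if two identifier sets point at each other.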
Data integration: Adding new experiments
[Diagram: Experiment 1, Experiment 2, Experiment 3 and Experiment 4 all attach to the same identifier backbone (PSN, PSNV2, PSNV3, FTN and DDBID, connected by rdfs:seeAlso), which links to GO, EC and AC No in the public domain data]
Data integration: Sesame

NadiaAnwar:~ nadia$ openrdf-sesame-2.1/bin/console.sh
Connected to default data directory
Commands end with '.' at the end of a line
Type 'help.' for help
> connect http://127.0.0.1:8080/openrdf-sesame.
Disconnecting from default data directory
Connected to http://127.0.0.1:8080/openrdf-sesame
> show r.
+----------
|SYSTEM (System configuration repository)
|ftnRepoNative (Francisella Test)
|FrancisellaNative (FrancisellaTestStore)
|FrancisellaReified (Native store with RDF Schema inferencing)
|FrancisellaReified_index2 (Native store with RDF Schema inferencing)
|Francisella (Native store with RDF Schema inferencing)
+----------
> open FrancisellaReified_index2.
Opened repository FrancisellaReified_index2
Sesame: Data load (ftnRepoNative) - native (spoc, posc)

Data File                              time (s)   triples
francisella_locus_tag.nt               893        1767
interact-prot.nt                       8851       20682
interact-prot-peptides.nt                         248647
mgla search dbfastablastp4 ypURL.n3    97         1719
NC_008601.nt                           4314       12781
Ft_novicidaU112go.nt                   35914      2548
francisellardf2.nt                     4341       10434
francisellaSUPERFAMILY.nt              5788       16110
francisellaPROTEINfasta.nt             1363       5160
Soluble.nt                             58887      336761
WholeCell.nt                           46902      112625
Membranes.nt                           100319     298771
Data integration: Mgla data (ftnRepoNative)
[Diagram: Identified Peptide (analysis) with Peptide sequence, Experiment and abundance; mgla:poson links the analysis to a PSN; PSN, PSNV2, PSNV3, FTN and DDBID are connected by rdfs:seeAlso, leading on to GO, EC and SP]

SELECT psn, ftn, ec
FROM {ftn} rdfs:seeAlso {ec},
     {psn} rdfs:seeAlso {ftn},
     {analysis} mgla:poson {psn}
WHERE ec LIKE "*[EC*"
USING NAMESPACE
  mgla = <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#>

SELECT abundance, psn, ec, ftn
FROM {ftn} rdfs:seeAlso {ec},
     {psn} rdfs:seeAlso {ftn},
     {analysis} mgla:poson {psn},
     {analysis} mgla:experiment {abundance}
WHERE ec LIKE "*[EC*"
USING NAMESPACE
  mgla = <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#>
Data integration: Mgla data (ftnRepoNative), continued
[Diagram: the Identified Peptide (analysis) node is identified by rdf:about and carries mgla:sequence (Peptide sequence), mgla:experiment (abundance) and mgla:poson (PSN); PSN, PSNV2, PSNV3, FTN and DDBID are connected by rdfs:seeAlso, leading on to GO, EC and SP]
Really easy, but...
• Simple Excel to RDF conversion does not enable all queries
• Not a simple conversion - data needs to be "modelled"
[Diagram: the simple conversion (Identified Peptide (analysis), identified by rdf:about, with mgla:sequence, mgla:experiment, abundance and mgla:poson to PSN) set against the modelled form: Peptide Sequence identifiedIn Experiment Replicate, and that statement hasAbundance abundance]
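One way to see why modelling matters is to compare what each form can answer. A minimal sketch in Python, under a loudly stated assumption: the "simple" conversion here hangs replicate and abundance directly off the peptide, which may not be exactly what the original spreadsheet conversion did. All function names, predicate names and values are invented for illustration.

```python
def flat_triples(rows):
    """Simple conversion (assumed): every cell becomes a property of the peptide."""
    triples = set()
    for row in rows:
        triples.add((row["sequence"], "experiment", row["replicate"]))
        triples.add((row["sequence"], "abundance", row["abundance"]))
    return triples


def flat_abundances(triples, sequence, replicate):
    """Best the flat form can do: all abundances of a peptide seen in the replicate."""
    if (sequence, "experiment", replicate) not in triples:
        return set()
    return {o for s, p, o in triples if s == sequence and p == "abundance"}


def modelled_triples(rows):
    """Modelled form: one observation node per (peptide, replicate) pair."""
    triples = set()
    for i, row in enumerate(rows):
        obs = f"obs{i}"
        triples.add((obs, "subject", row["sequence"]))
        triples.add((obs, "object", row["replicate"]))
        triples.add((obs, "hasAbundance", row["abundance"]))
    return triples


def modelled_abundances(triples, sequence, replicate):
    """Abundances attached to observations of this peptide in this replicate."""
    of_peptide = {s for s, p, o in triples if p == "subject" and o == sequence}
    in_replicate = {s for s, p, o in triples if p == "object" and o == replicate}
    wanted = of_peptide & in_replicate
    return {o for s, p, o in triples if s in wanted and p == "hasAbundance"}


if __name__ == "__main__":
    rows = [
        {"sequence": "PEPTIDEK", "replicate": "rep_sol_01", "abundance": 2594},
        {"sequence": "PEPTIDEK", "replicate": "rep_mem_01", "abundance": 31000},
    ]
    print(flat_abundances(flat_triples(rows), "PEPTIDEK", "rep_sol_01"))
    print(modelled_abundances(modelled_triples(rows), "PEPTIDEK", "rep_sol_01"))
```

In the flat form the link between an abundance and its replicate is lost once a peptide appears in two replicates; the modelled form keeps it.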
Data integration: Reified statements
[Diagram: an analysis data node of rdf:type rdf:Statement, with rdf:subject the Identified Peptide (analysis, Peptide sequence), rdf:predicate InExperimentReplicate and rdf:object the Experiment Replicate; mgla:PeptideAbundance attaches the abundance to the statement; mgla:poson links the peptide to a PSN, and PSN, PSNV2, PSNV3, FTN and DDBID are connected by rdfs:seeAlso, leading on to GO, EC and SP]
Sesame: Reified data load - native-RDFS (spoc, posc, posc)

Data File                          time (s)    time (mins)   triples
FnU112Version3.nt                  38344       63            58474
PosonMappings.nt                   8456        14            13760
francisella_locus_tag.nt           1673        03            1767
ConstructHasGeneID.nt              2300        04            1719
interact-prot.nt                   12495       21            20682
interact-prot-pepteides.nt         112797      187           248647
interact-protSeeAlsoisbURL.nt      1067        02            1528
goAnnotation_URLID.nt              7414        12            20501
NC_008601.nt                       7584        13            12781
Membranes_CogNumberURL.nt          860         01            2548
Ft_novicida_U112_go.nt             56138       93            2548
francisellardf2.nt                 4619        08            10602
francisellaSUPERFAMILY.nt          6667        11            16110
francisellaPROTEINfasta.nt         1527        03            5160
SolubleReifeid_3.rdf               139298      232           580873
WholeCellReified_3.rdf             94116       156           184221
Membranes_3.rdf                    102666      17111         416086
fnU112_draftRDFschemaV4.nt         21501098    35835         501
Queries: which posons have the most highly abundant peptides?

select ftn, psn, exp, abundance from
  {psn} rdfs:seeAlso {psnv2},
  {psnv2} rdfs:seeAlso {psnv3},
  {psnv3} rdfs:seeAlso {ftn},
  {analysis} fnu112:poson {psn},
  {analysis} rdf:type {rdf:Statement},
  {analysis} rdf:object {exp},
  {analysis} mgla:PeptideAbundance {abundance}
where xsd:integer(abundance) > 100000
and ftn LIKE "*FTN*"
using namespace
  mgla = <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#>,
  fnu112 = <http://www.francisella.org/novicida/fnu112/schema/fnu112/experiments/mgla#>

Queries: which experiments have the most highly abundant peptides?
Reified statements
• Reified mgla data are much bigger (4 more statements per abundance)
• The really interesting queries return a Java out-of-memory error (-Xms1024M -Xmx1536M)
• Haven't yet tested the shortcut path expression:
  { {reifSubj} reifPred {reifObj} } pred {obj}
  { {seq} identifiedIn {ExpRep} } hasAbundance {abd}
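The size increase can be read off the triple counts in the two data-load tables. A minimal sketch in Python of that arithmetic, assuming Soluble.nt, WholeCell.nt and Membranes.nt correspond to SolubleReifeid_3.rdf, WholeCellReified_3.rdf and Membranes_3.rdf respectively (the slides do not state this pairing); `growth_ratio` and `report` are hypothetical helper names.

```python
# (triples in the plain load, triples in the reified load), as listed in the tables.
LOADS = {
    "Soluble": (336761, 580873),
    "WholeCell": (112625, 184221),
    "Membranes": (298771, 416086),
}


def growth_ratio(flat_triples, reified_triples):
    """How many times larger the reified file is than the plain one."""
    if flat_triples <= 0:
        raise ValueError("flat_triples must be positive")
    return reified_triples / flat_triples


def report(loads):
    """Growth ratio per fraction, rounded to two decimals."""
    return {
        name: round(growth_ratio(flat, reified), 2)
        for name, (flat, reified) in loads.items()
    }


if __name__ == "__main__":
    for name, ratio in report(LOADS).items():
        print(f"{name}: x{ratio}")
```

The ratios stay well below 5 because only the abundance observations are reified; the other triples in each file are unchanged.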
Data sourcesGenome
RDF
(4)IMGgene_oid=639752258 FTN_0209 (3)IMG_Slocus_tag
229107
(3)IMG_Sgenomic_location_start
229976
(3)IMG_Sgenomic_location_end
+
(3)IMG_Sgenomic_location_strand
TPR
(2)RDFScomment
RDFdescription
(1)RDFtype
httpimgjgidoegovcgi-binpubmaincgisection=TaxonDetailamppage=taxonDetailamptaxon_oid=639633024export
Data sourcesGenome annotations
Francisella SuperFamily Data
httpwwwncbinlmnihgoventrezviewerfcgidb=proteinampid=118496616
RDFdescriptionRDFtype
SUPERFAMILYcgi-binmodelcgimodel=0040419httppurluniprotorgcoreProtein_Family
155-367SUPERFAMILYAssignment_Region
51e-39SUPERFAMILYScore
SUPERFAMILYcgi-binscopcgisunid=52540SUPERFAMILYSCOP_ID
P-loop containing nucleoside triphosphate hydrolases
SUPERFAMILYSCOP_Fold
81269
SUPERFAMILYFamily_ID
733e-06
SUPERFAMILYEvalue
Extended AAA-ATPase domain
SUPERFAMILYFamily_Description
1l8q A77-289
SUPERFAMILYSimilar_Structure
httpsupfamcsbrisacuk
Data sourcesGenome annotations - KEGG
httpwwwgenomejpdbget-binwww_bgetpathway+ftn00010
httpwwwgenomejpdbget-binwww_bgetftnFTN_0298
httpimgjgidoegovschemagene
glpX
httpimgjgidoegovschemagene_name
fructose
rdfscomment
httpsrsebiacuksrsbincgi-binwgetz-e+[EC31311]
rdfsseeAlso
httpsrsebiacuksrsbincgi-binwgetz-e+[SPA0Q4N9_FRATN]
rdfsseeAlso
httpwwwncbinlmnihgoventrezviewerfcgidb=proteinampid=118496616
RDFdescription
RDFtype
YP_8976661
RDFidsymbol
httpsrsebiacuksrsbincgi-binwgetz[refseqp-SeqVersionYP_8976661]+-e
RDFSseeAlso
chromosomal
httppurluniprotorgAnnotation
Genome annotations - NCBI protein
httpwwwgenomejpdbget-binwww_bfindFtularensis_U112
httpwwwncbinlmnihgovsitesgqueryterm=Francisella+tularensis+novicida
Data sourcesGenome annotations - GO
httpwwwgenomejpdbget-binwww_bgetftnFTN_0277
RDFdescriptionRDFtype
httpamigogeneontologyorgcgi-binamigogocgiview=detailsampquery=0006749mglaGO_AnnotationID
glutathionemglaGO_AnnotationTerm
biological_processmglaGO_AnnotationOntology
7
mglaGO_AnnotationLevel
0879989490261963
httpwwwcompbiodundeeacukSoftwareGOtchaiscore
57273821328517
httpwwwcompbiodundeeacukSoftwareGOtchacscore
Poson annotations - Cogs
httpstoolsnwrceorgcgi-binfnu112posoncgiposon=PSN082435
httpwwwncbinlmnihgovsitesentrezdb=cddampcmd=searchampterm=COG0508mglacogNumber
AceFmglacogDomain
Pyruvate2-oxoglutarate
mglacogDescription
dihydrolipoamide
mglacogCategory
Data sources - experimentsTranscriptomics
Data sources - experimentsProteomics
Proteomics WT vs Mgla Mutant
Francisella tularensis novicida U112
Whole Cell(3)
Soluble(3)
Membrane(3)
Whole Cell(3)
Soluble(3)
Membrane(3)
WildType MglA mutant
(4) (4) (4) (4) (4) (4)
Sequest DRAGONSequest DRAGON
Sequest DRAGONSequest DRAGON
Sequest DRAGONSequest DRAGON
Relative AbundanceIdentification
Two-sided t-test
P val lt001
Identified Peptide
analysis
Peptide sequence
mglaposon
abundance PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
DDBID
rdfsseeAlso
GO ECSP
RDF - excel conversion
Genome
mglaexperiment
subject
object
predicate
Pval
Pval-1
Data integration Reconciled Identifiers
(WashU-B) PSNV1
(WashU-B) PSNV2(COGs) COGID
(Gene Ontology) GOID
(WashU-B) PSNV3
(Fn ORF ID) FTN
(WashU-P) DDB
(Refseq) ACNo
(Uniprot) ACNo(ENZYME) ECNo
(IMG) GENEID(NCBI) PROTEINID
Data IntegrationAdding new experiments
Experiment 1
PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
DDBID
rdfsseeAlso
GO ECAC No
Experiment 4
Experiment 2
Experiment 3
Public domain data
NadiaAnwar~ nadia$ openrdf-sesame-21binconsolesh Connected to default data directory
Commands end with at the end of a lineType help for helpgt connect http1270018080openrdf-sesameDisconnecting from default data directoryConnected to http1270018080openrdf-sesamegt show r+----------|SYSTEM (System configuration repository)|ftnRepoNative (Francisella Test)|FrancisellaNative (FrancisellaTestStore)|FrancisellaReified (Native store with RDF Schema inferencing)|FrancisellaReified_index2 (Native store with RDF Schema inferencing)|Francisella (Native store with RDF Schema inferencing)+----------gt open FrancisellaReified_index2Opened repository FrancisellaReified_index2
Data integration Sesame
Sesame: Data load (ftnRepoNative) - native (spoc, posc)
Data File                              time (s)   triples
francisella_locus_tag.nt               893        1767
interact-prot.nt                       8851       20682
interact-prot-peptides.nt                         248647
mgla search dbfastablastp4 ypURL.n3    97         1719
NC_008601.nt                           4314       12781
Ft_novicidaU112go.nt                   35914      2548
francisellardf2.nt                     4341       10434
francisellaSUPERFAMILY.nt              5788       16110
francisellaPROTEINfasta.nt             1363       5160
Soluble.nt                             58887      336761
WholeCell.nt                           46902      112625
Membranes.nt                           100319     298771
Data Integration: Mgla data (ftnRepoNative)
[Diagram labels: Identified Peptide, analysis, Peptide sequence, Experiment, mgla:poson, abundance; identifier labels joined by rdfs:seeAlso: PSN, PSNV2, PSNV3, FTN, DDBID, GO, EC, SP.]
SELECT psn, ftn, ec
FROM {ftn} rdfs:seeAlso {ec},
     {psn} rdfs:seeAlso {ftn},
     {analysis} mgla:poson {psn}
WHERE ec LIKE "*[EC*"
USING NAMESPACE
  mgla = <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#>

SELECT abundance, psn, ec, ftn
FROM {ftn} rdfs:seeAlso {ec},
     {psn} rdfs:seeAlso {ftn},
     {analysis} mgla:poson {psn},
     {analysis} mgla:experiment {abundance}
WHERE ec LIKE "*[EC*"
USING NAMESPACE
  mgla = <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#>
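The join expressed by the first query above (analysis to poson, poson to FTN, FTN to an EC cross-reference) can be sketched over an in-memory list of triples. This is a toy model rather than a Sesame client: the analysis and PSN identifiers are invented, and the substring check on "[EC" stands in for the SeRQL LIKE filter.

```python
SEE_ALSO = "rdfs:seeAlso"
POSON = "mgla:poson"

# Toy triples; the analysis and PSN identifiers are illustrative stand-ins.
TRIPLES = [
    ("analysis_1", POSON, "PSN000001"),
    ("analysis_2", POSON, "PSN000002"),
    ("PSN000001", SEE_ALSO, "FTN_0298"),
    ("PSN000002", SEE_ALSO, "FTN_0277"),
    ("FTN_0298", SEE_ALSO, "[EC:3.1.3.11]"),
    ("FTN_0277", SEE_ALSO, "GO:0006749"),
]


def objects(subject, predicate):
    """All objects of statements with the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def posons_with_ec():
    """Join analysis -> poson -> FTN -> cross-reference, keeping EC links only."""
    rows = set()
    for _analysis, predicate, psn in TRIPLES:
        if predicate != POSON:
            continue
        for ftn in objects(psn, SEE_ALSO):
            for ec in objects(ftn, SEE_ALSO):
                if "[EC" in ec:
                    rows.add((psn, ftn, ec))
    return sorted(rows)


print(posons_with_ec())
```

Each nested loop corresponds to one path expression in the FROM clause, which is why adding a further identifier hop to the query adds one more level of join.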
Data Integration: Mgla data (ftnRepoNative)
[Diagram labels: Identified Peptide, analysis, Peptide sequence, mgla:poson, abundance, mgla:sequence, mgla:experiment, rdf:about; identifier labels joined by rdfs:seeAlso: PSN, PSNV2, PSNV3, FTN, DDBID, GO, EC, SP.]
Really easy. But:
• Simple Excel to RDF conversion does not enable all queries
• Not a simple conversion - data needs to be "modelled"
[Diagram: the simple conversion (Identified Peptide, analysis, Peptide sequence, mgla:poson, abundance, PSN, mgla:sequence, mgla:experiment, rdf:about) shown next to the modelled form, in which a Peptide Sequence is identifiedIn an Experiment Replicate and that statement hasAbundance an abundance.]
Data Integration: Reified statements
[Diagram: the statement linking a Peptide sequence to an Experiment Replicate through InExperimentReplicate is reified as an analysis data node of rdf:type rdf:Statement, with rdf:subject, rdf:predicate and rdf:object; the node carries the abundance through mgla:PeptideAbundance. Other labels: Identified Peptide, analysis, mgla:poson; identifier labels joined by rdfs:seeAlso: PSN, PSNV2, PSNV3, FTN, DDBID, GO, EC, SP.]
Sesame: Reified data load - native-RDFS (spoc, posc, posc)
Data File                        time (s)     time (mins)   triples
FnU112Version3.nt                383.44       6.3           58474
PosonMappings.nt                 84.56        1.4           13760
francisella_locus_tag.nt         16.73        0.3           1767
ConstructHasGeneID.nt            23.00        0.4           1719
interact-prot.nt                 124.95       2.1           20682
interact-prot-pepteides.nt       1127.97      18.7          248647
interact-protSeeAlsoisbURL.nt    10.67        0.2           1528
goAnnotation_URLID.nt            74.14        1.2           20501
NC_008601.nt                     75.84        1.3           12781
Membranes_CogNumberURL.nt        8.60         0.1           2548
Ft_novicida_U112_go.nt           561.38       9.3           2548
francisellardf2.nt               46.19        0.8           10602
francisellaSUPERFAMILY.nt        66.67        1.1           16110
francisellaPROTEINfasta.nt       15.27        0.3           5160
SolubleReifeid_3.rdf             1392.98      23.2          580873
WholeCellReified_3.rdf           941.16       15.6          184221
Membranes_3.rdf                  1026.66      17.111        416086
fnU112_draftRDFschemaV4.nt       215010.98    3583.5        501
Queries: which posons have the most highly abundant peptides
select ftn, psn, exp, abundance
from {psn} rdfs:seeAlso {psnv2},
     {psnv2} rdfs:seeAlso {psnv3},
     {psnv3} rdfs:seeAlso {ftn},
     {analysis} fnu112:poson {psn},
     {analysis} rdf:type {rdf:Statement},
     {analysis} rdf:object {exp},
     {analysis} mgla:PeptideAbundance {abundance}
where xsd:integer(abundance) > 100000
  and ftn LIKE "*FTN*"
using namespace
  mgla = <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#>,
  fnu112 = <http://www.francisella.org/novicida/fnu112/schema/fnu112/experiments/mgla#>
Queries: which experiments have the most highly abundant peptides
Reified statements
• Reified mgla data are much bigger (4 more statements per abundance)
• The really interesting queries return a Java out of memory error (-Xms1024M -Xmx1536M)
• Haven't yet tested the shortcut path expression
{ {reifSubj} reifPred {reifObj} } pred {obj}
{ {seq} identifiedIn {ExpRep} } hasAbundance {abd}
<WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement> .
<WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#WholeCell_Lvl7_021> .
<WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#InExperimentReplicate> .
<WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#object> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#wildtype01_wc_01> .
<WholeCell_Lvl7_0212> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#PeptideAbundance> "2594" .
[Diagram: Peptide Sequence identifiedIn Experiment Replicate; that statement hasAbundance abundance.]
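The five-statement pattern in the N-Triples excerpt above can be generated mechanically, which also shows where the extra statements per abundance come from. The sketch below is an assumption-laden illustration: the `http://example.org/mgla#` namespace is a placeholder, and the helper name `reify_with_abundance` is invented; only the RDF reification vocabulary and the local names are taken from the excerpt.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MGLA = "http://example.org/mgla#"  # placeholder namespace, not the project's URI


def reify_with_abundance(statement_id, peptide, replicate, abundance):
    """Return the five N-Triples lines for one reified, annotated statement."""
    node = f"<{MGLA}{statement_id}>"
    return [
        f"{node} <{RDF}type> <{RDF}Statement> .",
        f"{node} <{RDF}subject> <{MGLA}{peptide}> .",
        f"{node} <{RDF}predicate> <{MGLA}InExperimentReplicate> .",
        f"{node} <{RDF}object> <{MGLA}{replicate}> .",
        f'{node} <{MGLA}PeptideAbundance> "{abundance}" .',
    ]


lines = reify_with_abundance(
    "WholeCell_Lvl7_0212", "WholeCell_Lvl7_021", "wildtype01_wc_01", 2594
)
print("\n".join(lines))
```

A flat model needs one statement to record an abundance; the reified form needs five, which matches the "4 more statements" noted on the slide.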
Comparison of integrated experimental data: distinct and overlapping posons identified within each biological fraction (> 20000)
sol INTERSECT mem
select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000
  and experiment LIKE "*sol*"
INTERSECT
select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000
  and experiment LIKE "*mem*"
using namespace
sol MINUS mem
select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000
  and experiment LIKE "*sol*"
MINUS
select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000
  and experiment LIKE "*mem*"
using namespace
mem MINUS sol
select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000
  and experiment LIKE "*mem*"
MINUS
select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000
  and experiment LIKE "*sol*"
using namespace
Counts shown in the diagram: 171, 146, 185
Comparison of integrated experimental data: distinct and overlapping posons identified within each biological fraction (< 5000)
sol INTERSECT mem
select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) < 5000
  and experiment LIKE "*sol*"
INTERSECT
select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) < 5000
  and experiment LIKE "*mem*"
using namespace
sol MINUS mem
select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) < 5000
  and experiment LIKE "*sol*"
MINUS
select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) < 5000
  and experiment LIKE "*mem*"
using namespace
mem MINUS sol
select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) < 5000
  and experiment LIKE "*mem*"
MINUS
select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) < 5000
  and experiment LIKE "*sol*"
using namespace
Counts shown in the diagram: 219, 125, 245
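The INTERSECT and MINUS comparisons between the soluble and membrane fractions correspond to ordinary set intersection and difference. The following sketch mirrors them in Python over a handful of invented observations; the poson identifiers, fraction tags and abundance values are hypothetical and are not results from the talk.

```python
# Hypothetical (poson, fraction, abundance) observations; not data from the talk.
OBSERVATIONS = [
    ("PSN000001", "sol", 25000),
    ("PSN000001", "mem", 31000),
    ("PSN000002", "sol", 42000),
    ("PSN000003", "mem", 27000),
    ("PSN000004", "sol", 1200),
    ("PSN000004", "mem", 26000),
]


def posons(fraction, keep):
    """Distinct posons seen in a fraction whose abundance passes `keep`."""
    return {
        psn
        for psn, frac, abundance in OBSERVATIONS
        if frac == fraction and keep(abundance)
    }


def is_high(abundance):
    """Threshold used on the first comparison slide."""
    return abundance > 20000


sol = posons("sol", is_high)
mem = posons("mem", is_high)

both = sol & mem       # sol INTERSECT mem
sol_only = sol - mem   # sol MINUS mem
mem_only = mem - sol   # mem MINUS sol
print(len(sol_only), len(both), len(mem_only))
```

Swapping `is_high` for a predicate such as `abundance < 5000` gives the low-abundance comparison with the same three set expressions.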
Further work
• Queries are slow in the native repository; database repositories are probably faster
• Adding transcriptomic experiment: WT vs mglA mutant (GEO AC GSE5468)
• RDF-S inferencing
Acknowledgements
• Funding: BBSRC - Radical Solutions for Researching the Proteome
• University of Glasgow, Glasgow
• Prof Walter Kolch
• Dr Andy Pitt
• University of Strathclyde, Glasgow
• Dr Ela Hunt (Scientific Advisor)
• University of Washington, Seattle
• Prof Dave Goodlett (Scientific Advisor)
• Dr Mitch Brittnacher, Mathew Radey, Laurence Rohmer
• Dr Tina Guina (MglA experiment)
Abundance thresholds
• SeRQL aggregate functions would be nice to have
• Queries to find low and high abundance values
• WHERE abundance BETWEEN MEDIAN(abundance) AND MAX(abundance)
• WHERE abundance BETWEEN MIN(abundance) AND MEDIAN(abundance)
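Since the slide notes that SeRQL has no aggregate functions, one workaround is to compute the thresholds on the client after fetching the abundance values. The sketch below emulates the two wished-for BETWEEN clauses with Python's `statistics.median`; the abundance list is invented for illustration and the function name `split_at_median` is hypothetical.

```python
from statistics import median

# Hypothetical abundance values; in practice these would come from a query result.
abundances = [2594, 310, 48000, 12750, 905, 20100, 7600]


def split_at_median(values):
    """Emulate BETWEEN MIN..MEDIAN and BETWEEN MEDIAN..MAX on the client side."""
    mid = median(values)
    low = [v for v in values if min(values) <= v <= mid]
    high = [v for v in values if mid <= v <= max(values)]
    return mid, low, high


mid, low, high = split_at_median(abundances)
print(mid, sorted(low), sorted(high))
```

Because BETWEEN is inclusive at both ends, a value equal to the median appears in both the low and the high group, exactly as it would with the two WHERE clauses on the slide.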
Data sources: Genome annotations
Francisella SuperFamily Data
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=118496616
RDF:description
RDF:type
http://purl.uniprot.org/core/Protein_Family = SUPERFAMILY:cgi-bin/model.cgi?model=0040419
SUPERFAMILY:Assignment_Region = 155-367
SUPERFAMILY:Score = 51e-39
SUPERFAMILY:SCOP_ID = SUPERFAMILY:cgi-bin/scop.cgi?sunid=52540
SUPERFAMILY:SCOP_Fold = P-loop containing nucleoside triphosphate hydrolases
SUPERFAMILY:Family_ID = 81269
SUPERFAMILY:Evalue = 733e-06
SUPERFAMILY:Family_Description = Extended AAA-ATPase domain
SUPERFAMILY:Similar_Structure = 1l8q A:77-289
http://supfam.cs.bris.ac.uk
Data sources: Genome annotations - KEGG
http://www.genome.jp/dbget-bin/www_bget?pathway+ftn00010
http://www.genome.jp/dbget-bin/www_bget?ftn:FTN_0298
http://img.jgi.doe.gov/schema#gene
http://img.jgi.doe.gov/schema#gene_name = glpX
rdfs:comment = fructose
rdfs:seeAlso = http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+[EC:3.1.3.11]
rdfs:seeAlso = http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+[SP:A0Q4N9_FRATN]
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=118496616
RDF:description
RDF:type
RDF:id symbol = YP_897666.1
RDFS:seeAlso = http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[refseqp-SeqVersion:YP_897666.1]+-e
http://purl.uniprot.org/Annotation = chromosomal
Genome annotations - NCBI protein
http://www.genome.jp/dbget-bin/www_bfind?F.tularensis_U112
http://www.ncbi.nlm.nih.gov/sites/gquery?term=Francisella+tularensis+novicida
Data sources: Genome annotations - GO
http://www.genome.jp/dbget-bin/www_bget?ftn:FTN_0277
RDF:description
RDF:type
mgla:GO_AnnotationID = http://amigo.geneontology.org/cgi-bin/amigo/go.cgi?view=details&query=0006749
mgla:GO_AnnotationTerm = glutathione
mgla:GO_AnnotationOntology = biological_process
mgla:GO_AnnotationLevel = 7
http://www.compbio.dundee.ac.uk/Software/GOtcha/iscore = 0.879989490261963
http://www.compbio.dundee.ac.uk/Software/GOtcha/cscore = 57273821328517
Poson annotations - COGs
https://tools.nwrce.org/cgi-bin/fnu112/poson.cgi?poson=PSN082435
mgla:cogNumber = http://www.ncbi.nlm.nih.gov/sites/entrez?db=cdd&cmd=search&term=COG0508
mgla:cogDomain = AceF
[Diagram labels for mgla:cogDescription and mgla:cogCategory; visible text: Pyruvate/2-oxoglutarate, dihydrolipoamide]
Data sources - experimentsTranscriptomics
Data sources - experimentsProteomics
Proteomics WT vs Mgla Mutant
Francisella tularensis novicida U112
Whole Cell(3)
Soluble(3)
Membrane(3)
Whole Cell(3)
Soluble(3)
Membrane(3)
WildType MglA mutant
(4) (4) (4) (4) (4) (4)
Sequest DRAGONSequest DRAGON
Sequest DRAGONSequest DRAGON
Sequest DRAGONSequest DRAGON
Relative AbundanceIdentification
Two-sided t-test
P val lt001
Identified Peptide
analysis
Peptide sequence
mglaposon
abundance PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
DDBID
rdfsseeAlso
GO ECSP
RDF - excel conversion
Genome
mglaexperiment
subject
object
predicate
Pval
Pval-1
Data integration Reconciled Identifiers
(WashU-B) PSNV1
(WashU-B) PSNV2(COGs) COGID
(Gene Ontology) GOID
(WashU-B) PSNV3
(Fn ORF ID) FTN
(WashU-P) DDB
(Refseq) ACNo
(Uniprot) ACNo(ENZYME) ECNo
(IMG) GENEID(NCBI) PROTEINID
Data IntegrationAdding new experiments
Experiment 1
PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
DDBID
rdfsseeAlso
GO ECAC No
Experiment 4
Experiment 2
Experiment 3
Public domain data
NadiaAnwar~ nadia$ openrdf-sesame-21binconsolesh Connected to default data directory
Commands end with at the end of a lineType help for helpgt connect http1270018080openrdf-sesameDisconnecting from default data directoryConnected to http1270018080openrdf-sesamegt show r+----------|SYSTEM (System configuration repository)|ftnRepoNative (Francisella Test)|FrancisellaNative (FrancisellaTestStore)|FrancisellaReified (Native store with RDF Schema inferencing)|FrancisellaReified_index2 (Native store with RDF Schema inferencing)|Francisella (Native store with RDF Schema inferencing)+----------gt open FrancisellaReified_index2Opened repository FrancisellaReified_index2
Data integration Sesame
SesameData load (ftnRepoNative) - native (spocposc)
Data File time (s) triples
francisella_locus_tagnt 893 1767
interact-protnt 8851 20682
interact-prot-peptidesnt 248647
mgla search dbfastablastp4 ypURLn3 97 1719
NC_008601nt 4314 12781
Ft_novicidaU112gont 35914 2548
francisellardf2nt 4341 10434
francisellaSUPERFAMILYnt 5788 16110
francisellaPROTEINfastant 1363 5160
Solublent 58887 336761
WholeCellnt 46902 112625
Membranesnt 100319 298771
Data IntegrationMgla data (ftnRepoNative)
Identified Peptide
analysis
Peptide sequence
Experiment
mglaposon
abundance PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
DDBID
rdfsseeAlso
GO ECSP
SELECT psn ftn ec FROM ftn rdfsseeAlso ec psn rdfsseeAlso ftn analysis mglaposon psnWHERE ec LIKE ldquo[ECrdquoUSING NAMESPACEmgla =lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglagt
SELECT abundance psn ec ftn FROM ftn rdfsseeAlso ec psn rdfsseeAlso ftn analysis mglaposon psnanalysis mglaexperiment abundanceWHERE ec LIKE ldquo[ECrdquoUSING NAMESPACEmgla =lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglagt
Data IntegrationMgla data (ftnRepoNative)
Identified Peptide
analysis
Peptide sequence
mglaposon
abundance PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
DDBID
rdfsseeAlso
GO ECSP
mglasequencemglaexperiment
rdfabout
Really easy Butbull Simple excel to RDF conversion does not enable all queries
bull Not a simple conversion - Data needs to be ldquomodelledrdquo
Identified Peptide
analysis
Peptide sequence
mglaposon
abundance PSNmglasequence
mglaexperiment
rdfabout
Peptide SequenceExperiment Replicate
abundance
identifiedIn
hasAbundance
Data IntegrationReified statements
Identified Peptideanalysis
Peptide sequence
Experiment Replicate
rdftype
mglaposon
PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
analysis datardfStatement
analysis data
InExperimentReplicate
rdfobject
rdftype
rdfsubject
rdfpredicateabundance
mglaPeptideAbundance
DDBID
rdfsseeAlso
GO ECSP
SesameReified Data load - native-RDFS (spocposcposc)
Data File time (s) time(mins) triples
FnU112Version3nt 38344 63 58474
PosonMappingsnt 8456 14 13760
francisella_locus_tagnt 1673 03 1767
ConstructHasGeneIDnt 2300 04 1719
interact-protnt 12495 21 20682
interact-prot-pepteidesnt 112797 187 248647
interact-protSeeAlsoisbURLnt 1067 02 1528
goAnnotation_URLIDnt 7414 12 20501
NC_008601nt 7584 13 12781
Membranes_CogNumberURLnt 860 01 2548
Ft_novicida_U112_gont 56138 93 2548
francisellardf2nt 4619 08 10602
francisellaSUPERFAMILYnt 6667 11 16110
francisellaPROTEINfastant 1527 03 5160
SolubleReifeid_3rdf 139298 232 580873
WholeCellReified_3rdf 94116 156 184221
Membranes_3rdf 102666 17111 416086
fnU112_draftRDFschemaV4nt 21501098 35835 501
select ftn psn exp abundance from psn rdfsseeAlso psnv2psnv2 rdfsseeAlso psnv3psnv3 rdfsseeAlso ftnanalysis fnu112poson psnanalysis rdftype rdfStatementanalysis rdfobject expanalysis mglaPeptideAbundance abundancewhere xsdinteger(abundance) gt 100000and ftn LIKE FTNusing namespace mgla=lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglagtfnu112=lthttpwwwfrancisellaorgnovicidafnu112schemafnu112experimentsmglagt
Querieswhich posons have the most highly abundant peptides
Querieswhich posons have the most highly abundant peptides
Querieswhich experiments have the most highly abundant peptides
Reified statementsbull Reified mgla data are much bigger (4 more statementsabundance)
bull The really interesting queries return Java out of memory error (-Xms-1024M -Xmx 1536M)
bull Havenrsquot yet tested shortcut path expression
reifSubj reifPred reifObj pred obj
seq identifiedIn ExpRep hasAbundance abd
ltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nstypegt lthttpwwww3org19990222-rdf-syntax-nsStatementgtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nssubjectgt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaWholeCell_Lvl7_021gtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nspredicategt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaInExperimentReplicategtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nsobjectgt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglawildtype01_wc_01gtltWholeCell_Lvl7_0212gt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaPeptideAbundancegt 2594
Peptide SequenceExperiment Replicate
abundance
identifiedIn
hasAbundance
sol
select distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE solINTERSECTselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE memusing namespace
Comparison of integrated experimental dataDistinct and overlapping posons identified within each biological fraction (gt20000)
mem
INTERSECT
sol MINUS memmem MINUS solselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE solMINUSselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE memusing namespace
select distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE memMINUSselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE solusing namespace
171 146185
sol
select distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE solINTERSECTselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE memusing namespace
Comparison of integrated experimental dataDistinct and overlapping posons identified within each biological fraction (lt5000)
mem
INTERSECT
sol MINUS memmem MINUS solselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE solMINUSselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE memusing namespace
select distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE memMINUSselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE solusing namespace
219 125245
Further work
bull Queries are slow in the native repository database repositories are probably faster
bull Adding transcriptomic experiment
Wt Vs mglA mutant
GEO AC GSE5468
bull RDF-S inferencing
Acknowledgements
bull Funding BBSRC -Radical Solutions for Researching the Proteome
bull University of Glasgow Glasgow
bull Prof Walter Kolch
bull Dr Andy Pitt
bull University of Strathclyde Glasgow
bull Dr Ela Hunt (Scientific Advisor)
bull University of Washington Seattle
bull Prof Dave Goodlett (Scientific Advisor)
bull Dr Mitch Brittnacher Mathew Radey Laurence Rohmer
bull Dr Tina Guina (MglA experiment)
Abundance thresholds
• SeRQL aggregate functions would be nice to have
• Queries to find low and high abundance values:
  • WHERE abundance BETWEEN MEDIAN(abundance) AND MAX(abundance)
  • WHERE abundance BETWEEN MIN(abundance) AND MEDIAN(abundance)
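Until aggregate functions are available in the query language, one possible workaround is to fetch the abundance values and compute the thresholds on the client. The following is a minimal Python sketch under that assumption; the function name `split_by_median` and the sample values are hypothetical and not part of the original work.

```python
from statistics import median

def split_by_median(abundances):
    """Split values into (low, high) around the median, both ends inclusive.

    Mirrors BETWEEN MIN(abundance) AND MEDIAN(abundance) and
    BETWEEN MEDIAN(abundance) AND MAX(abundance).
    """
    if not abundances:
        return [], []
    mid = median(abundances)
    low = [a for a in abundances if a <= mid]
    high = [a for a in abundances if a >= mid]
    return low, high

# Hypothetical abundance values, not taken from the experiment.
sample = [2594, 4800, 12000, 25000, 130000]
low, high = split_by_median(sample)
print("low:", low)
print("high:", high)
```

Because BETWEEN is inclusive at both ends, a value equal to the median appears in both groups.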
Data sources: Genome annotations - KEGG
http://www.genome.jp/dbget-bin/www_bget?pathway+ftn00010
http://www.genome.jp/dbget-bin/www_bget?ftn:FTN_0298
[RDF graph for gene FTN_0298 (IMG schema: gene); each value is shown with the predicate on its edge]
  IMG schema gene_name   glpX
  rdfs:comment           fructose
  rdfs:seeAlso           http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+[EC:3.1.3.11]
  rdfs:seeAlso           http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+[SP:A0Q4N9_FRATN]

Genome annotations - NCBI protein
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=118496616
[RDF graph for the protein record (rdf:Description, rdf:type); each value is shown with the predicate on its edge]
  RDF id / symbol        YP_897666.1
  rdfs:seeAlso           http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[refseqp-SeqVersion:YP_897666.1]+-e
  UniProt Annotation     chromosomal
http://www.genome.jp/dbget-bin/www_bfind?F.tularensis_U112
http://www.ncbi.nlm.nih.gov/sites/gquery?term=Francisella+tularensis+novicida
Data sources: Genome annotations - GO
http://www.genome.jp/dbget-bin/www_bget?ftn:FTN_0277
[RDF graph for the gene (rdf:Description, rdf:type); each value is shown with the predicate on its edge]
  mgla:GO_AnnotationID         http://amigo.geneontology.org/cgi-bin/amigo/go.cgi?view=details&query=0006749
  mgla:GO_AnnotationTerm       glutathione
  mgla:GO_AnnotationOntology   biological_process
  mgla:GO_AnnotationLevel      7
  GOtcha iscore                0.879989490261963
  GOtcha cscore                57273821328517
Poson annotations - COGs
https://tools.nwrce.org/cgi-bin/fnu112/poson.cgi?poson=PSN082435
[RDF graph for the poson; each value is shown with the predicate on its edge]
  mgla:cogNumber        http://www.ncbi.nlm.nih.gov/sites/entrez?db=cdd&cmd=search&term=COG0508
  mgla:cogDomain        AceF
  mgla:cogDescription   Pyruvate/2-oxoglutarate
  mgla:cogCategory      dihydrolipoamide
Data sources - experiments: Transcriptomics
Data sources - experiments: Proteomics
Proteomics: WT vs MglA mutant
Francisella tularensis novicida U112
[Experimental design diagram]
  WildType:     Whole Cell (3), Soluble (3), Membrane (3)
  MglA mutant:  Whole Cell (3), Soluble (3), Membrane (3)
  (4) under each of the six fractions
  Sequest and DRAGON applied to each of the six fractions
  Relative abundance; identification
  Two-sided t-test, P val < 0.01
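RDF - excel conversion
[Diagram: an Identified Peptide (analysis) node carries the Peptide sequence and is linked by mgla:poson to a PSN and by mgla:experiment to an abundance value. PSN, PSNV2, PSNV3, FTN and DDBID are joined by rdfs:seeAlso and lead on to the Genome annotations (GO, EC, SP). Legend: subject, predicate, object. Further labels: Pval, Pval-1.]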
Data integration: Reconciled identifiers
(WashU-B) PSNV1
(WashU-B) PSNV2
(COGs) COGID
(Gene Ontology) GOID
(WashU-B) PSNV3
(Fn ORF ID) FTN
(WashU-P) DDB
(Refseq) AC No
(Uniprot) AC No
(ENZYME) EC No
(IMG) GENEID
(NCBI) PROTEINID
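In the RDF graph these identifiers are tied together by rdfs:seeAlso links, so reconciling one identifier with another amounts to walking that chain. A minimal Python sketch of such a walk follows; the function name `resolve`, the dictionary representation and all identifier values are hypothetical, chosen only to illustrate the idea.

```python
def resolve(identifier, see_also, target_prefix):
    """Follow seeAlso links from identifier until an ID with target_prefix is found.

    see_also maps an identifier to the list of identifiers it points to.
    Returns None when no such identifier is reachable.
    """
    seen = set()
    frontier = [identifier]
    while frontier:
        current = frontier.pop()
        if current.startswith(target_prefix):
            return current
        if current in seen:
            continue  # guard against cycles in the link graph
        seen.add(current)
        frontier.extend(see_also.get(current, []))
    return None

# Hypothetical identifiers illustrating the PSN -> PSNV2 -> PSNV3 -> FTN chain.
see_also = {
    "PSN000001": ["PSNV2_000001"],
    "PSNV2_000001": ["PSNV3_000001"],
    "PSNV3_000001": ["FTN_9999"],
}
print(resolve("PSN000001", see_also, "FTN"))
```

In a triple store the same walk is expressed as a chain of rdfs:seeAlso path expressions in the query itself; the sketch only makes the traversal explicit.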
Data integration: Adding new experiments
[Diagram: Experiment 1, Experiment 2, Experiment 3 and Experiment 4 attach to the same identifier chain (PSN, PSNV2, PSNV3, FTN and DDBID joined by rdfs:seeAlso), which links to public domain data (GO, EC, AC No).]
Data integration: Sesame
NadiaAnwar:~ nadia$ openrdf-sesame-2.1/bin/console.sh
Connected to default data directory
Commands end with '.' at the end of a line
Type 'help.' for help
> connect http://127.0.0.1:8080/openrdf-sesame.
Disconnecting from default data directory
Connected to http://127.0.0.1:8080/openrdf-sesame
> show r.
+----------
|SYSTEM (System configuration repository)
|ftnRepoNative (Francisella Test)
|FrancisellaNative (FrancisellaTestStore)
|FrancisellaReified (Native store with RDF Schema inferencing)
|FrancisellaReified_index2 (Native store with RDF Schema inferencing)
|Francisella (Native store with RDF Schema inferencing)
+----------
> open FrancisellaReified_index2.
Opened repository FrancisellaReified_index2
Sesame: Data load (ftnRepoNative) - native (spoc,posc)
Data file                               time (s)   triples
francisella_locus_tag.nt                    8.93      1767
interact-prot.nt                           88.51     20682
interact-prot-peptides.nt                           248647
mgla search dbfastablastp4 ypURL.n3          9.7      1719
NC_008601.nt                               43.14     12781
Ft_novicidaU112go.nt                      359.14      2548
francisellardf2.nt                         43.41     10434
francisellaSUPERFAMILY.nt                  57.88     16110
francisellaPROTEINfasta.nt                 13.63      5160
Soluble.nt                                588.87    336761
WholeCell.nt                              469.02    112625
Membranes.nt                             1003.19    298771
Data integration: Mgla data (ftnRepoNative)
[Diagram: Identified Peptide (analysis) with Peptide sequence and Experiment; mgla:poson links the analysis to a PSN, and the abundance is attached to the analysis. PSN, PSNV2, PSNV3, FTN and DDBID are joined by rdfs:seeAlso and lead to GO, EC and SP.]

SELECT psn, ftn, ec
FROM {ftn} rdfs:seeAlso {ec},
     {psn} rdfs:seeAlso {ftn},
     {analysis} mgla:poson {psn}
WHERE ec LIKE "*[EC*"
USING NAMESPACE
  mgla = <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#>

SELECT abundance, psn, ec, ftn
FROM {ftn} rdfs:seeAlso {ec},
     {psn} rdfs:seeAlso {ftn},
     {analysis} mgla:poson {psn},
     {analysis} mgla:experiment {abundance}
WHERE ec LIKE "*[EC*"
USING NAMESPACE
  mgla = <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#>
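The first query joins three triple patterns and keeps only rows whose seeAlso target is an EC link. A minimal Python sketch of that join over an in-memory list of triples is given below; it is an illustration under stated assumptions rather than how Sesame evaluates the query, and the function name `ec_links` and every identifier in the sample data are hypothetical.

```python
def ec_links(triples):
    """Return sorted (psn, ftn, ec) rows for posons that reach an EC link.

    Joins: {analysis} mgla:poson {psn}, {psn} rdfs:seeAlso {ftn},
    {ftn} rdfs:seeAlso {ec}, keeping rows where ec contains "[EC".
    """
    see_also = [(s, o) for s, p, o in triples if p == "rdfs:seeAlso"]
    posons = {o for s, p, o in triples if p == "mgla:poson"}
    rows = set()
    for psn, ftn in see_also:
        if psn not in posons:
            continue  # only identifiers that some analysis points to
        for subject, ec in see_also:
            if subject == ftn and "[EC" in ec:
                rows.add((psn, ftn, ec))
    return sorted(rows)

# Hypothetical triples; prefixed names stand in for full URIs.
triples = [
    ("analysis1", "mgla:poson", "PSN_example"),
    ("PSN_example", "rdfs:seeAlso", "FTN_example"),
    ("FTN_example", "rdfs:seeAlso", "wgetz?-e+[EC:0.0.0.0]"),
    ("FTN_example", "rdfs:seeAlso", "wgetz?-e+[SP:example]"),
]
rows = ec_links(triples)
print(rows)
```

The substring test plays the role of the LIKE filter: the Swiss-Prot link is reachable through the same chain but is dropped because it does not contain "[EC".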
Data integration: Mgla data (ftnRepoNative)
[Diagram: Identified Peptide (analysis) node, identified by rdf:about, with edges mgla:sequence to the Peptide sequence, mgla:poson to the PSN and mgla:experiment to the abundance. PSN, PSNV2, PSNV3, FTN and DDBID are joined by rdfs:seeAlso and lead to GO, EC and SP.]
Really easy, but
• Simple excel to RDF conversion does not enable all queries
• Not a simple conversion - data needs to be "modelled"
[Diagram, simple conversion: Identified Peptide (analysis) node, identified by rdf:about, with mgla:sequence to the Peptide sequence, mgla:poson to the PSN and mgla:experiment to the abundance.]
[Diagram, modelled: Peptide Sequence identifiedIn Experiment Replicate; that statement hasAbundance abundance.]
Data integration: Reified statements
[Diagram: the "analysis data" node has rdf:type rdf:Statement; its rdf:subject is the Identified Peptide (analysis, with Peptide sequence), its rdf:predicate is InExperimentReplicate and its rdf:object is the Experiment Replicate. mgla:PeptideAbundance attaches the abundance to the statement node. mgla:poson links to the PSN; PSN, PSNV2, PSNV3, FTN and DDBID are joined by rdfs:seeAlso and lead to GO, EC and SP.]
Sesame: Reified data load - native-RDFS (spoc,posc,posc)
Data file                        time (s)    time (mins)   triples
FnU112Version3.nt                  383.44            6.3     58474
PosonMappings.nt                    84.56            1.4     13760
francisella_locus_tag.nt            16.73            0.3      1767
ConstructHasGeneID.nt               23.00            0.4      1719
interact-prot.nt                   124.95            2.1     20682
interact-prot-pepteides.nt        1127.97           18.7    248647
interact-protSeeAlsoisbURL.nt       10.67            0.2      1528
goAnnotation_URLID.nt               74.14            1.2     20501
NC_008601.nt                        75.84            1.3     12781
Membranes_CogNumberURL.nt            8.60            0.1      2548
Ft_novicida_U112_go.nt             561.38            9.3      2548
francisellardf2.nt                  46.19            0.8     10602
francisellaSUPERFAMILY.nt           66.67            1.1     16110
francisellaPROTEINfasta.nt          15.27            0.3      5160
SolubleReifeid_3.rdf              1392.98           23.2    580873
WholeCellReified_3.rdf             941.16           15.6    184221
Membranes_3.rdf                   1026.66         17.111    416086
fnU112_draftRDFschemaV4.nt      21501.098         358.35       501
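The load figures are easier to compare as throughput. The short Python sketch below converts three rows of the table to minutes and triples per second. It assumes the decimal points lost in extraction belong where the seconds and minutes columns agree (for example 1392.98 s, roughly 23.2 min); the helper name `triples_per_second` is hypothetical.

```python
def triples_per_second(seconds, triples):
    """Load throughput for one file."""
    return triples / seconds

# (file, seconds, triples) taken from the table above, with decimal
# points placed under the assumption stated in the lead-in.
rows = [
    ("FnU112Version3.nt", 383.44, 58474),
    ("SolubleReifeid_3.rdf", 1392.98, 580873),
    ("fnU112_draftRDFschemaV4.nt", 21501.098, 501),
]
for name, secs, n in rows:
    rate = triples_per_second(secs, n)
    print(f"{name}: {secs / 60:.1f} min, {rate:.2f} triples/s")
```

On this reading the schema file, with only 501 triples, accounts for by far the longest load, which would be consistent with RDF Schema inferencing being applied across the data already in the store.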
Queries: which posons have the most highly abundant peptides?

SELECT ftn, psn, exp, abundance
FROM {psn} rdfs:seeAlso {psnv2},
     {psnv2} rdfs:seeAlso {psnv3},
     {psnv3} rdfs:seeAlso {ftn},
     {analysis} fnu112:poson {psn},
     {analysis} rdf:type {rdf:Statement},
     {analysis} rdf:object {exp},
     {analysis} mgla:PeptideAbundance {abundance}
WHERE xsd:integer(abundance) > 100000
  AND ftn LIKE "*FTN*"
USING NAMESPACE
  mgla = <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#>,
  fnu112 = <http://www.francisella.org/novicida/fnu112/schema/fnu112/experiments/mgla#>

Queries: which experiments have the most highly abundant peptides?
Reified statements
• Reified mgla data are much bigger (4 more statements per abundance)
• The really interesting queries return a Java out of memory error (-Xms1024M -Xmx1536M)
• Haven't yet tested the shortcut path expression:
  { {reifSubj} reifPred {reifObj} } pred {obj}
  { {seq} identifiedIn {ExpRep} } hasAbundance {abd}

<WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement> .
<WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#WholeCell_Lvl7_021> .
<WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#InExperimentReplicate> .
<WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#object> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#wildtype01_wc_01> .
<WholeCell_Lvl7_0212> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#PeptideAbundance> "2594" .

[Diagram: Peptide Sequence identifiedIn Experiment Replicate; that statement hasAbundance abundance.]
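The cost of reification is easy to see when the triples are generated programmatically: one observation becomes five statements. The Python sketch below builds them as N-Triples lines. It is an illustration only; the function name `reify_with_abundance` and the `http://example.org/mgla#` namespace are hypothetical stand-ins, not the identifiers used in the repository.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def reify_with_abundance(stmt, subject, predicate, obj, abundance_pred, abundance):
    """Return five N-Triples lines: the reification quad plus the abundance."""
    return [
        f"<{stmt}> <{RDF}type> <{RDF}Statement> .",
        f"<{stmt}> <{RDF}subject> <{subject}> .",
        f"<{stmt}> <{RDF}predicate> <{predicate}> .",
        f"<{stmt}> <{RDF}object> <{obj}> .",
        # The measurement hangs off the statement node, not off the peptide.
        f'<{stmt}> <{abundance_pred}> "{abundance}" .',
    ]

EX = "http://example.org/mgla#"  # hypothetical namespace
lines = reify_with_abundance(
    EX + "observation1",
    EX + "peptide1",
    EX + "InExperimentReplicate",
    EX + "replicate1",
    EX + "PeptideAbundance",
    2594,
)
print("\n".join(lines))
```

Attaching the abundance to the statement node is what lets the same peptide carry a different abundance in each experiment replicate.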
Comparison of integrated experimental data
Distinct and overlapping posons identified within each biological fraction (>20000)

sol INTERSECT mem:
SELECT DISTINCT psn
FROM {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
WHERE xsd:integer(abundance) > 20000 AND experiment LIKE "*sol*"
INTERSECT
SELECT DISTINCT psn
FROM {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
WHERE xsd:integer(abundance) > 20000 AND experiment LIKE "*mem*"
USING NAMESPACE

sol MINUS mem:
SELECT DISTINCT psn
FROM {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
WHERE xsd:integer(abundance) > 20000 AND experiment LIKE "*sol*"
MINUS
SELECT DISTINCT psn
FROM {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
WHERE xsd:integer(abundance) > 20000 AND experiment LIKE "*mem*"
USING NAMESPACE

mem MINUS sol:
SELECT DISTINCT psn
FROM {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
WHERE xsd:integer(abundance) > 20000 AND experiment LIKE "*mem*"
MINUS
SELECT DISTINCT psn
FROM {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
WHERE xsd:integer(abundance) > 20000 AND experiment LIKE "*sol*"
USING NAMESPACE

[Venn diagram: sol and mem; counts shown: 171, 146, 185]
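The three regions of the comparison (sol MINUS mem, INTERSECT, mem MINUS sol) correspond directly to set difference and intersection, so they can also be derived on the client from two plain result lists. A minimal Python sketch, assuming the poson identifiers for each fraction have already been retrieved; the function name `compare_fractions` and the identifiers are hypothetical.

```python
def compare_fractions(sol, mem):
    """Return the three Venn regions for two collections of poson IDs."""
    sol, mem = set(sol), set(mem)
    return {
        "sol MINUS mem": sol - mem,
        "INTERSECT": sol & mem,
        "mem MINUS sol": mem - sol,
    }

# Hypothetical poson identifiers for the soluble and membrane fractions.
sol = ["PSN_a", "PSN_b", "PSN_c"]
mem = ["PSN_b", "PSN_c", "PSN_d"]
result = compare_fractions(sol, mem)
for region, posons in result.items():
    print(region, len(posons), sorted(posons))
```

Running two simple SELECT queries and combining them this way is one option when the combined INTERSECT and MINUS queries are slow or run out of memory in the repository.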
Data sources - experimentsTranscriptomics
Data sources - experimentsProteomics
Proteomics WT vs Mgla Mutant
Francisella tularensis novicida U112
Whole Cell(3)
Soluble(3)
Membrane(3)
Whole Cell(3)
Soluble(3)
Membrane(3)
WildType MglA mutant
(4) (4) (4) (4) (4) (4)
Sequest DRAGONSequest DRAGON
Sequest DRAGONSequest DRAGON
Sequest DRAGONSequest DRAGON
Relative AbundanceIdentification
Two-sided t-test
P val lt001
Identified Peptide
analysis
Peptide sequence
mglaposon
abundance PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
DDBID
rdfsseeAlso
GO ECSP
RDF - excel conversion
Genome
mglaexperiment
subject
object
predicate
Pval
Pval-1
Data integration Reconciled Identifiers
(WashU-B) PSNV1
(WashU-B) PSNV2(COGs) COGID
(Gene Ontology) GOID
(WashU-B) PSNV3
(Fn ORF ID) FTN
(WashU-P) DDB
(Refseq) ACNo
(Uniprot) ACNo(ENZYME) ECNo
(IMG) GENEID(NCBI) PROTEINID
Data IntegrationAdding new experiments
Experiment 1
PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
DDBID
rdfsseeAlso
GO ECAC No
Experiment 4
Experiment 2
Experiment 3
Public domain data
NadiaAnwar~ nadia$ openrdf-sesame-21binconsolesh Connected to default data directory
Commands end with at the end of a lineType help for helpgt connect http1270018080openrdf-sesameDisconnecting from default data directoryConnected to http1270018080openrdf-sesamegt show r+----------|SYSTEM (System configuration repository)|ftnRepoNative (Francisella Test)|FrancisellaNative (FrancisellaTestStore)|FrancisellaReified (Native store with RDF Schema inferencing)|FrancisellaReified_index2 (Native store with RDF Schema inferencing)|Francisella (Native store with RDF Schema inferencing)+----------gt open FrancisellaReified_index2Opened repository FrancisellaReified_index2
Data integration Sesame
SesameData load (ftnRepoNative) - native (spocposc)
Data File time (s) triples
francisella_locus_tagnt 893 1767
interact-protnt 8851 20682
interact-prot-peptidesnt 248647
mgla search dbfastablastp4 ypURLn3 97 1719
NC_008601nt 4314 12781
Ft_novicidaU112gont 35914 2548
francisellardf2nt 4341 10434
francisellaSUPERFAMILYnt 5788 16110
francisellaPROTEINfastant 1363 5160
Solublent 58887 336761
WholeCellnt 46902 112625
Membranesnt 100319 298771
Data IntegrationMgla data (ftnRepoNative)
Identified Peptide
analysis
Peptide sequence
Experiment
mglaposon
abundance PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
DDBID
rdfsseeAlso
GO ECSP
SELECT psn ftn ec FROM ftn rdfsseeAlso ec psn rdfsseeAlso ftn analysis mglaposon psnWHERE ec LIKE ldquo[ECrdquoUSING NAMESPACEmgla =lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglagt
SELECT abundance psn ec ftn FROM ftn rdfsseeAlso ec psn rdfsseeAlso ftn analysis mglaposon psnanalysis mglaexperiment abundanceWHERE ec LIKE ldquo[ECrdquoUSING NAMESPACEmgla =lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglagt
Data IntegrationMgla data (ftnRepoNative)
Identified Peptide
analysis
Peptide sequence
mglaposon
abundance PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
DDBID
rdfsseeAlso
GO ECSP
mglasequencemglaexperiment
rdfabout
Really easy Butbull Simple excel to RDF conversion does not enable all queries
bull Not a simple conversion - Data needs to be ldquomodelledrdquo
Identified Peptide
analysis
Peptide sequence
mglaposon
abundance PSNmglasequence
mglaexperiment
rdfabout
Peptide SequenceExperiment Replicate
abundance
identifiedIn
hasAbundance
Data IntegrationReified statements
Identified Peptideanalysis
Peptide sequence
Experiment Replicate
rdftype
mglaposon
PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
analysis datardfStatement
analysis data
InExperimentReplicate
rdfobject
rdftype
rdfsubject
rdfpredicateabundance
mglaPeptideAbundance
DDBID
rdfsseeAlso
GO ECSP
SesameReified Data load - native-RDFS (spocposcposc)
Data File time (s) time(mins) triples
FnU112Version3nt 38344 63 58474
PosonMappingsnt 8456 14 13760
francisella_locus_tagnt 1673 03 1767
ConstructHasGeneIDnt 2300 04 1719
interact-protnt 12495 21 20682
interact-prot-pepteidesnt 112797 187 248647
interact-protSeeAlsoisbURLnt 1067 02 1528
goAnnotation_URLIDnt 7414 12 20501
NC_008601nt 7584 13 12781
Membranes_CogNumberURLnt 860 01 2548
Ft_novicida_U112_gont 56138 93 2548
francisellardf2nt 4619 08 10602
francisellaSUPERFAMILYnt 6667 11 16110
francisellaPROTEINfastant 1527 03 5160
SolubleReifeid_3rdf 139298 232 580873
WholeCellReified_3rdf 94116 156 184221
Membranes_3rdf 102666 17111 416086
fnU112_draftRDFschemaV4nt 21501098 35835 501
select ftn psn exp abundance from psn rdfsseeAlso psnv2psnv2 rdfsseeAlso psnv3psnv3 rdfsseeAlso ftnanalysis fnu112poson psnanalysis rdftype rdfStatementanalysis rdfobject expanalysis mglaPeptideAbundance abundancewhere xsdinteger(abundance) gt 100000and ftn LIKE FTNusing namespace mgla=lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglagtfnu112=lthttpwwwfrancisellaorgnovicidafnu112schemafnu112experimentsmglagt
Querieswhich posons have the most highly abundant peptides
Querieswhich posons have the most highly abundant peptides
Querieswhich experiments have the most highly abundant peptides
Reified statementsbull Reified mgla data are much bigger (4 more statementsabundance)
bull The really interesting queries return Java out of memory error (-Xms-1024M -Xmx 1536M)
bull Havenrsquot yet tested shortcut path expression
reifSubj reifPred reifObj pred obj
seq identifiedIn ExpRep hasAbundance abd
ltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nstypegt lthttpwwww3org19990222-rdf-syntax-nsStatementgtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nssubjectgt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaWholeCell_Lvl7_021gtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nspredicategt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaInExperimentReplicategtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nsobjectgt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglawildtype01_wc_01gtltWholeCell_Lvl7_0212gt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaPeptideAbundancegt 2594
Peptide SequenceExperiment Replicate
abundance
identifiedIn
hasAbundance
sol
select distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE solINTERSECTselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE memusing namespace
Comparison of integrated experimental dataDistinct and overlapping posons identified within each biological fraction (gt20000)
mem
INTERSECT
sol MINUS memmem MINUS solselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE solMINUSselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE memusing namespace
select distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE memMINUSselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE solusing namespace
171 146185
sol
select distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE solINTERSECTselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE memusing namespace
Comparison of integrated experimental dataDistinct and overlapping posons identified within each biological fraction (lt5000)
mem
INTERSECT
sol MINUS memmem MINUS solselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE solMINUSselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE memusing namespace
select distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE memMINUSselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE solusing namespace
219 125245
Further work
bull Queries are slow in the native repository database repositories are probably faster
bull Adding transcriptomic experiment
Wt Vs mglA mutant
GEO AC GSE5468
bull RDF-S inferencing
Acknowledgements
• Funding: BBSRC, Radical Solutions for Researching the Proteome
• University of Glasgow, Glasgow
  • Prof Walter Kolch
  • Dr Andy Pitt
• University of Strathclyde, Glasgow
  • Dr Ela Hunt (Scientific Advisor)
• University of Washington, Seattle
  • Prof Dave Goodlett (Scientific Advisor)
  • Dr Mitch Brittnacher, Mathew Radey, Laurence Rohmer
  • Dr Tina Guina (MglA experiment)
Abundance thresholds
• SeRQL aggregate functions would be nice to have
• Queries to find low and high abundance values
  • WHERE abundance BETWEEN MEDIAN(abundance) AND MAX(abundance)
  • WHERE abundance BETWEEN MIN(abundance) AND MEDIAN(abundance)
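Since SeRQL has no MEDIAN, MIN or MAX, one possible workaround is to fetch the abundance values with a plain SELECT and compute the thresholds on the client. The Python sketch below illustrates that idea only; the function name and the sample values are hypothetical and not taken from the Francisella data.

```python
from statistics import median

def split_by_median(abundances):
    """Split abundance values into a low and a high range around the median.

    Mirrors the two BETWEEN clauses wished for on the slide. BETWEEN is
    inclusive, so a value equal to the median lands in both lists.
    """
    if not abundances:
        return [], []
    mid = median(abundances)
    low = [a for a in abundances if a <= mid]    # MIN(abundance) .. MEDIAN(abundance)
    high = [a for a in abundances if a >= mid]   # MEDIAN(abundance) .. MAX(abundance)
    return low, high

# Hypothetical abundance values, as they might come back from a query.
values = [2594, 120, 48000, 7300, 150000]
low, high = split_by_median(values)
print("low:", low)
print("high:", high)
```

Computing the median outside the store costs one extra round trip, but it keeps the queries themselves within what SeRQL supports.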
Data sources: experiments
Proteomics
Proteomics: WT vs MglA mutant
Francisella tularensis novicida U112
WildType: Whole Cell (3), Soluble (3), Membrane (3)
MglA mutant: Whole Cell (3), Soluble (3), Membrane (3)
(4) shown under each of the six fractions
Sequest and DRAGON applied to each of the six fractions
Identification, Relative Abundance
Two-sided t-test
P val < 0.01
RDF - excel conversion
[Diagram labels: Identified Peptide; analysis; Peptide sequence; mgla:poson; mgla:experiment; abundance; PSN, PSNV2, PSNV3 and FTN joined by rdfs:seeAlso; rdfs:seeAlso to DDBID, GO, EC, SP; Genome; subject, predicate, object; Pval; Pval-1]
Data integration: Reconciled Identifiers
(WashU-B) PSNV1
(WashU-B) PSNV2
(COGs) COGID
(Gene Ontology) GOID
(WashU-B) PSNV3
(Fn ORF ID) FTN
(WashU-P) DDB
(Refseq) ACNo
(Uniprot) ACNo
(ENZYME) ECNo
(IMG) GENEID
(NCBI) PROTEINID
Data Integration: Adding new experiments
[Diagram labels: Experiment 1, Experiment 2, Experiment 3, Experiment 4; PSN, PSNV2, PSNV3 and FTN joined by rdfs:seeAlso; rdfs:seeAlso to DDBID, GO, EC, AC No; Public domain data]
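The reconciled identifiers are tied together by rdfs:seeAlso links, so any experiment keyed on one identifier can reach the others by walking that chain. The sketch below shows the idea in Python over an in-memory list of triples. Apart from the locus tag FTN_0209, every identifier value is a made-up placeholder, and the function is an illustration rather than the code used for the talk.

```python
SEE_ALSO = "rdfs:seeAlso"

def resolve(start, triples):
    """Return every identifier reachable from `start` via rdfs:seeAlso links."""
    links = {}
    for s, p, o in triples:
        if p == SEE_ALSO:
            links.setdefault(s, []).append(o)
    reached, queue = [start], [start]
    while queue:                      # breadth-first walk along the links
        current = queue.pop(0)
        for target in links.get(current, []):
            if target not in reached:
                reached.append(target)
                queue.append(target)
    return reached

# Hypothetical identifiers following the PSN -> PSNV2 -> PSNV3 -> FTN -> EC pattern.
triples = [
    ("psn:0001", SEE_ALSO, "psnv2:0001"),
    ("psnv2:0001", SEE_ALSO, "psnv3:0001"),
    ("psnv3:0001", SEE_ALSO, "FTN_0209"),
    ("FTN_0209", SEE_ALSO, "EC:0.0.0.0"),
    ("analysis:1", "mgla:poson", "psn:0001"),   # not a seeAlso link, ignored
]
print(resolve("psn:0001", triples))
```

A new experiment then only needs to link its peptides to one identifier in the chain to become queryable alongside the public annotations.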
Data integration: Sesame
NadiaAnwar:~ nadia$ openrdf-sesame-2.1/bin/console.sh
Connected to default data directory
Commands end with '.' at the end of a line
Type 'help.' for help
> connect http://127.0.0.1:8080/openrdf-sesame.
Disconnecting from default data directory
Connected to http://127.0.0.1:8080/openrdf-sesame
> show r.
+----------
|SYSTEM (System configuration repository)
|ftnRepoNative (Francisella Test)
|FrancisellaNative (FrancisellaTestStore)
|FrancisellaReified (Native store with RDF Schema inferencing)
|FrancisellaReified_index2 (Native store with RDF Schema inferencing)
|Francisella (Native store with RDF Schema inferencing)
+----------
> open FrancisellaReified_index2.
Opened repository FrancisellaReified_index2
Sesame: Data load (ftnRepoNative), native store (indexes spoc, posc)
Data File                              time (s)    triples
francisella_locus_tag.nt               8.93        1767
interact-prot.nt                       88.51       20682
interact-prot-peptides.nt                          248647
mgla search dbfastablastp4 ypURL.n3    9.7         1719
NC_008601.nt                           43.14       12781
Ft_novicidaU112go.nt                   359.14      2548
francisellardf2.nt                     43.41       10434
francisellaSUPERFAMILY.nt              57.88       16110
francisellaPROTEINfasta.nt             13.63       5160
Soluble.nt                             588.87      336761
WholeCell.nt                           469.02      112625
Membranes.nt                           1003.19     298771
Data Integration: Mgla data (ftnRepoNative)
[Diagram labels: Identified Peptide; analysis; Peptide sequence; Experiment; mgla:poson; abundance; PSN, PSNV2, PSNV3 and FTN joined by rdfs:seeAlso; rdfs:seeAlso to DDBID, GO, EC, SP]
SELECT psn, ftn, ec
FROM {ftn} rdfs:seeAlso {ec},
     {psn} rdfs:seeAlso {ftn},
     {analysis} mgla:poson {psn}
WHERE ec LIKE "[EC"
USING NAMESPACE
     mgla = <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#>
SELECT abundance, psn, ec, ftn
FROM {ftn} rdfs:seeAlso {ec},
     {psn} rdfs:seeAlso {ftn},
     {analysis} mgla:poson {psn},
     {analysis} mgla:experiment {abundance}
WHERE ec LIKE "[EC"
USING NAMESPACE
     mgla = <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#>
Data Integration: Mgla data (ftnRepoNative)
[Diagram labels: Identified Peptide; analysis; Peptide sequence; mgla:poson; mgla:sequence; mgla:experiment; rdf:about; abundance; PSN, PSNV2, PSNV3 and FTN joined by rdfs:seeAlso; rdfs:seeAlso to DDBID, GO, EC, SP]
Really easy. But
• Simple excel to RDF conversion does not enable all queries
• Not a simple conversion: data needs to be "modelled"
[Diagram labels, converted data: Identified Peptide; analysis; Peptide sequence; mgla:poson; abundance; PSN; mgla:sequence; mgla:experiment; rdf:about]
[Diagram labels, modelled data: Peptide Sequence identifiedIn Experiment Replicate; hasAbundance abundance]
Data Integration: Reified statements
[Diagram labels: Identified Peptide; analysis; Peptide sequence; Experiment Replicate; InExperimentReplicate; mgla:poson; analysis data with rdf:type rdf:Statement, rdf:subject, rdf:predicate, rdf:object; mgla:PeptideAbundance; abundance; PSN, PSNV2, PSNV3 and FTN joined by rdfs:seeAlso; rdfs:seeAlso to DDBID, GO, EC, SP]
Sesame: Reified data load, native store with RDFS inferencing (indexes spoc, posc, posc)
Data File                          time (s)     time (mins)   triples
FnU112Version3.nt                  383.44       6.3           58474
PosonMappings.nt                   84.56        1.4           13760
francisella_locus_tag.nt           16.73        0.3           1767
ConstructHasGeneID.nt              23.00        0.4           1719
interact-prot.nt                   124.95       2.1           20682
interact-prot-pepteides.nt         1127.97      18.7          248647
interact-protSeeAlsoisbURL.nt      10.67        0.2           1528
goAnnotation_URLID.nt              74.14        1.2           20501
NC_008601.nt                       75.84        1.3           12781
Membranes_CogNumberURL.nt          8.60         0.1           2548
Ft_novicida_U112_go.nt             561.38       9.3           2548
francisellardf2.nt                 46.19        0.8           10602
francisellaSUPERFAMILY.nt          66.67        1.1           16110
francisellaPROTEINfasta.nt         15.27        0.3           5160
SolubleReifeid_3.rdf               1392.98      23.2          580873
WholeCellReified_3.rdf             941.16       15.6          184221
Membranes_3.rdf                    1026.66      17.111        416086
fnU112_draftRDFschemaV4.nt         21501.098    358.35        501
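To compare loads of very different sizes it can help to turn each row into a throughput figure. The sketch below does that arithmetic for the three reified experiment files. It assumes the time column is in seconds with two decimal places (so the Soluble file reads as 1392.98 s), which is an interpretation that agrees with the minutes column; the function name is hypothetical.

```python
def load_stats(seconds, triples):
    """Return (minutes, triples per second) for one load."""
    return seconds / 60.0, triples / seconds

# (file, load time in seconds, triples loaded); times read with two decimals (assumption).
rows = [
    ("SolubleReifeid_3.rdf", 1392.98, 580873),
    ("WholeCellReified_3.rdf", 941.16, 184221),
    ("Membranes_3.rdf", 1026.66, 416086),
]

for name, seconds, triples in rows:
    minutes, rate = load_stats(seconds, triples)
    print(f"{name}: {minutes:.1f} min, {rate:.0f} triples/s")
```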
Queries: which posons have the most highly abundant peptides
select ftn, psn, exp, abundance
from {psn} rdfs:seeAlso {psnv2},
     {psnv2} rdfs:seeAlso {psnv3},
     {psnv3} rdfs:seeAlso {ftn},
     {analysis} fnu112:poson {psn},
     {analysis} rdf:type {rdf:Statement},
     {analysis} rdf:object {exp},
     {analysis} mgla:PeptideAbundance {abundance}
where xsd:integer(abundance) > 100000
  and ftn LIKE "FTN"
using namespace
     mgla = <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#>,
     fnu112 = <http://www.francisella.org/novicida/fnu112/schema/fnu112/experiments/mgla#>
Queries: which experiments have the most highly abundant peptides
Reified statements
• Reified mgla data are much bigger (4 more statements per abundance)
• The really interesting queries return a Java out-of-memory error (-Xms1024M -Xmx1536M)
• Haven't yet tested the shortcut path expression
  { {reifSubj} reifPred {reifObj} } pred {obj}
  { {seq} identifiedIn {ExpRep} } hasAbundance {abd}
<WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement> .
<WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#WholeCell_Lvl7_021> .
<WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#InExperimentReplicate> .
<WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#object> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#wildtype01_wc_01> .
<WholeCell_Lvl7_0212> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#PeptideAbundance> 2594 .
[Diagram labels: Peptide Sequence identifiedIn Experiment Replicate; hasAbundance abundance]
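The growth in size comes from the reification pattern itself: the statement "peptide InExperimentReplicate replicate" is described by four rdf: triples, and the abundance is attached to that description as a fifth. A minimal Python sketch of the pattern follows. The function name and the example.org namespace are placeholders, not the project's actual code or namespace; the local names and the value 2594 come from the example on the slide.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = "http://example.org/mgla#"   # placeholder namespace (assumption)

def reify_abundance(stmt_id, peptide, replicate, abundance, ns=NS):
    """Build the five triples that attach an abundance value to the
    statement 'peptide InExperimentReplicate replicate'."""
    return [
        (stmt_id, RDF + "type", RDF + "Statement"),
        (stmt_id, RDF + "subject", ns + peptide),
        (stmt_id, RDF + "predicate", ns + "InExperimentReplicate"),
        (stmt_id, RDF + "object", ns + replicate),
        (stmt_id, ns + "PeptideAbundance", str(abundance)),
    ]

triples = reify_abundance("WholeCell_Lvl7_0212", "WholeCell_Lvl7_021",
                          "wildtype01_wc_01", 2594)
for s, p, o in triples:
    print(s, p, o)
```

Every abundance measurement therefore costs five triples instead of one, which is consistent with the larger triple counts reported for the reified files.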
Comparison of integrated experimental data
Distinct and overlapping posons identified within each biological fraction (>20000)
[Venn diagram regions: sol, mem, INTERSECT, sol MINUS mem, mem MINUS sol. Counts shown: 171, 146, 185]
Query for INTERSECT:
select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp}, {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000 and experiment LIKE "sol"
INTERSECT
select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp}, {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000 and experiment LIKE "mem"
using namespace
Query for sol MINUS mem:
select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp}, {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000 and experiment LIKE "sol"
MINUS
select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp}, {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000 and experiment LIKE "mem"
using namespace
Query for mem MINUS sol:
select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp}, {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000 and experiment LIKE "mem"
MINUS
select distinct psn
from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp}, {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000 and experiment LIKE "sol"
using namespace
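The three regions of each comparison correspond to ordinary set operations on the poson identifiers returned for the soluble and membrane fractions. The Python sketch below shows the same logic on the client side; the poson identifiers are invented for illustration, and the function name is hypothetical.

```python
def compare_fractions(sol, mem):
    """Split poson identifiers into the three Venn regions."""
    sol, mem = set(sol), set(mem)
    return {
        "sol MINUS mem": sol - mem,   # found only in the soluble fraction
        "INTERSECT": sol & mem,       # found in both fractions
        "mem MINUS sol": mem - sol,   # found only in the membrane fraction
    }

# Hypothetical poson identifiers for each fraction.
sol_posons = ["psn1", "psn2", "psn3", "psn4"]
mem_posons = ["psn3", "psn4", "psn5"]

regions = compare_fractions(sol_posons, mem_posons)
for name, members in regions.items():
    print(name, len(members), sorted(members))
```

Running one SELECT per fraction and combining the results this way is an alternative when the combined SeRQL query is too slow or exhausts memory.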
Identified Peptide
analysis
Peptide sequence
mglaposon
abundance PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
DDBID
rdfsseeAlso
GO ECSP
RDF - excel conversion
Genome
mglaexperiment
subject
object
predicate
Pval
Pval-1
Data integration Reconciled Identifiers
(WashU-B) PSNV1
(WashU-B) PSNV2(COGs) COGID
(Gene Ontology) GOID
(WashU-B) PSNV3
(Fn ORF ID) FTN
(WashU-P) DDB
(Refseq) ACNo
(Uniprot) ACNo(ENZYME) ECNo
(IMG) GENEID(NCBI) PROTEINID
Data IntegrationAdding new experiments
Experiment 1
PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
DDBID
rdfsseeAlso
GO ECAC No
Experiment 4
Experiment 2
Experiment 3
Public domain data
NadiaAnwar~ nadia$ openrdf-sesame-21binconsolesh Connected to default data directory
Commands end with at the end of a lineType help for helpgt connect http1270018080openrdf-sesameDisconnecting from default data directoryConnected to http1270018080openrdf-sesamegt show r+----------|SYSTEM (System configuration repository)|ftnRepoNative (Francisella Test)|FrancisellaNative (FrancisellaTestStore)|FrancisellaReified (Native store with RDF Schema inferencing)|FrancisellaReified_index2 (Native store with RDF Schema inferencing)|Francisella (Native store with RDF Schema inferencing)+----------gt open FrancisellaReified_index2Opened repository FrancisellaReified_index2
Data integration Sesame
SesameData load (ftnRepoNative) - native (spocposc)
Data File time (s) triples
francisella_locus_tagnt 893 1767
interact-protnt 8851 20682
interact-prot-peptidesnt 248647
mgla search dbfastablastp4 ypURLn3 97 1719
NC_008601nt 4314 12781
Ft_novicidaU112gont 35914 2548
francisellardf2nt 4341 10434
francisellaSUPERFAMILYnt 5788 16110
francisellaPROTEINfastant 1363 5160
Solublent 58887 336761
WholeCellnt 46902 112625
Membranesnt 100319 298771
Data Integration: Mgla data (ftnRepoNative)
[Diagram labels: Identified Peptide (analysis), Peptide sequence, Experiment, mgla:poson, abundance; identifiers PSN, PSNV2, PSNV3, FTN, DDBID, GO, EC, SP joined by rdfs:seeAlso]

    SELECT psn, ftn, ec
    FROM {ftn} rdfs:seeAlso {ec},
         {psn} rdfs:seeAlso {ftn},
         {analysis} mgla:poson {psn}
    WHERE ec LIKE "[EC"
    USING NAMESPACE
         mgla = <httpwwwfrancisellaorgnovicidaschemafnu112experimentsmgla>

    SELECT abundance, psn, ec, ftn
    FROM {ftn} rdfs:seeAlso {ec},
         {psn} rdfs:seeAlso {ftn},
         {analysis} mgla:poson {psn},
         {analysis} mgla:experiment {abundance}
    WHERE ec LIKE "[EC"
    USING NAMESPACE
         mgla = <httpwwwfrancisellaorgnovicidaschemafnu112experimentsmgla>
Data Integration: Mgla data (ftnRepoNative)
[Diagram labels: Identified Peptide (analysis), Peptide sequence, mgla:poson, mgla:sequence, mgla:experiment, rdf:about, abundance; identifiers PSN, PSNV2, PSNV3, FTN, DDBID, GO, EC, SP joined by rdfs:seeAlso]
Really easy, but
• A simple Excel-to-RDF conversion does not enable all queries
• Not a simple conversion: the data needs to be "modelled"
[Diagram labels, flat model: Identified Peptide (analysis), Peptide sequence, mgla:poson, mgla:sequence, mgla:experiment, rdf:about, abundance, PSN]
[Diagram labels, modelled data: Peptide Sequence identifiedIn Experiment Replicate; hasAbundance abundance]
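One way to see why a column-by-column conversion falls short is to run it on two spreadsheet rows for the same peptide. This is a hedged illustration, not the conversion script used for the talk: the namespace, the column names, the peptide sequence and the second replicate are all made up.

```python
import csv
import io

EX = "http://example.org/mgla#"  # hypothetical namespace

# Two rows for the same peptide in two replicates (illustrative values).
SHEET = """sequence,poson,experiment,abundance
PEPTIDEK,PSN_0001,wildtype01_wc_01,2594
PEPTIDEK,PSN_0001,wildtype01_wc_02,3100
"""


def rows_to_triples(text):
    """Turn every cell into one triple whose subject is the peptide sequence."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{EX}{row['sequence']}>"
        for column in ("poson", "experiment", "abundance"):
            triples.append(f'{subject} <{EX}{column}> "{row[column]}" .')
    return triples


if __name__ == "__main__":
    for triple in rows_to_triples(SHEET):
        print(triple)
```

The peptide ends up with two experiment values and two abundance values, and nothing in the triples records which abundance belongs to which replicate. Attaching the abundance to the statement "peptide identifiedIn replicate" is what the modelled version adds.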
Data Integration: Reified statements
[Diagram labels: Identified Peptide (analysis), Peptide sequence, Experiment Replicate, mgla:poson, InExperimentReplicate; analysis data node of rdf:type rdf:Statement with rdf:subject, rdf:predicate, rdf:object and mgla:PeptideAbundance (abundance); identifiers PSN, PSNV2, PSNV3, FTN, DDBID, GO, EC, SP joined by rdfs:seeAlso]
Sesame: Reified data load - native-RDFS (spocposcposc)

    Data file                        time (s)    time (mins)   triples
    FnU112Version3.nt                383.44      6.3           58474
    PosonMappings.nt                 84.56       1.4           13760
    francisella_locus_tag.nt         16.73       0.3           1767
    ConstructHasGeneID.nt            23.00       0.4           1719
    interact-prot.nt                 124.95      2.1           20682
    interact-prot-pepteides.nt       1127.97     18.7          248647
    interact-protSeeAlsoisbURL.nt    10.67       0.2           1528
    goAnnotation_URLID.nt            74.14       1.2           20501
    NC_008601.nt                     75.84       1.3           12781
    Membranes_CogNumberURL.nt        8.60        0.1           2548
    Ft_novicida_U112_go.nt           561.38      9.3           2548
    francisellardf2.nt               46.19       0.8           10602
    francisellaSUPERFAMILY.nt        66.67       1.1           16110
    francisellaPROTEINfasta.nt       15.27       0.3           5160
    SolubleReifeid_3.rdf             1392.98     23.2          580873
    WholeCellReified_3.rdf           941.16      15.6          184221
    Membranes_3.rdf                  1026.66     17.111        416086
    fnU112_draftRDFschemaV4.nt       21.501098   0.35835       501
Queries: which posons have the most highly abundant peptides?

    select ftn, psn, exp, abundance
    from {psn} rdfs:seeAlso {psnv2},
         {psnv2} rdfs:seeAlso {psnv3},
         {psnv3} rdfs:seeAlso {ftn},
         {analysis} fnu112:poson {psn},
         {analysis} rdf:type {rdf:Statement},
         {analysis} rdf:object {exp},
         {analysis} mgla:PeptideAbundance {abundance}
    where xsd:integer(abundance) > 100000
      and ftn LIKE "FTN"
    using namespace
         mgla = <httpwwwfrancisellaorgnovicidaschemafnu112experimentsmgla>,
         fnu112 = <httpwwwfrancisellaorgnovicidafnu112schemafnu112experimentsmgla>

Queries: which experiments have the most highly abundant peptides?
Reified statements
• Reified mgla data are much bigger (4 more statements per abundance value)
• The really interesting queries return a Java out-of-memory error (-Xms1024M -Xmx1536M)
• Haven't yet tested the shortcut path expression
    { {reifSubj} reifPred {reifObj} } pred {obj}
    { {seq} identifiedIn {ExpRep} } hasAbundance {abd}

    <WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement> .
    <WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject> <httpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaWholeCell_Lvl7_021> .
    <WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate> <httpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaInExperimentReplicate> .
    <WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#object> <httpwwwfrancisellaorgnovicidaschemafnu112experimentsmglawildtype01_wc_01> .
    <WholeCell_Lvl7_0212> <httpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaPeptideAbundance> "2594" .

[Diagram labels: Peptide Sequence identifiedIn Experiment Replicate; hasAbundance abundance]
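The reified record above uses five triples to say "this peptide was identified in this replicate, with this abundance". A small sketch of how such a block could be generated follows. The example.org namespace is a placeholder rather than the namespace used in the talk; the local names and the value 2594 are copied from the example above.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/mgla#"  # placeholder namespace, not the one from the talk


def reify_abundance(statement_id, peptide, replicate, abundance):
    """Return the five N-Triples lines for one reified abundance observation."""
    stmt = f"<{EX}{statement_id}>"
    return [
        f"{stmt} <{RDF}type> <{RDF}Statement> .",
        f"{stmt} <{RDF}subject> <{EX}{peptide}> .",
        f"{stmt} <{RDF}predicate> <{EX}InExperimentReplicate> .",
        f"{stmt} <{RDF}object> <{EX}{replicate}> .",
        f'{stmt} <{EX}PeptideAbundance> "{abundance}" .',
    ]


if __name__ == "__main__":
    lines = reify_abundance(
        "WholeCell_Lvl7_0212", "WholeCell_Lvl7_021", "wildtype01_wc_01", 2594
    )
    print("\n".join(lines))
```

Four of the five triples are reification overhead, which matches the "4 more statements" noted on the slide.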
Comparison of integrated experimental data
Distinct and overlapping posons identified within each biological fraction (> 20000)

sol INTERSECT mem

    select distinct psn
    from {x} fnsposon {psn},
         {x} fn:InExperimentReplicate {experiment},
         {analysis} rdf:subject {x},
         {analysis} rdf:object {exp},
         {analysis} fn:PeptideAbundance {abundance}
    where xsd:integer(abundance) > 20000
      and experiment LIKE "sol"
    INTERSECT
    select distinct psn
    from {x} fnsposon {psn},
         {x} fn:InExperimentReplicate {experiment},
         {analysis} rdf:subject {x},
         {analysis} rdf:object {exp},
         {analysis} fn:PeptideAbundance {abundance}
    where xsd:integer(abundance) > 20000
      and experiment LIKE "mem"
    using namespace

sol MINUS mem

    select distinct psn
    from {x} fnsposon {psn},
         {x} fn:InExperimentReplicate {experiment},
         {analysis} rdf:subject {x},
         {analysis} rdf:object {exp},
         {analysis} fn:PeptideAbundance {abundance}
    where xsd:integer(abundance) > 20000
      and experiment LIKE "sol"
    MINUS
    select distinct psn
    from {x} fnsposon {psn},
         {x} fn:InExperimentReplicate {experiment},
         {analysis} rdf:subject {x},
         {analysis} rdf:object {exp},
         {analysis} fn:PeptideAbundance {abundance}
    where xsd:integer(abundance) > 20000
      and experiment LIKE "mem"
    using namespace

mem MINUS sol

    select distinct psn
    from {x} fnsposon {psn},
         {x} fn:InExperimentReplicate {experiment},
         {analysis} rdf:subject {x},
         {analysis} rdf:object {exp},
         {analysis} fn:PeptideAbundance {abundance}
    where xsd:integer(abundance) > 20000
      and experiment LIKE "mem"
    MINUS
    select distinct psn
    from {x} fnsposon {psn},
         {x} fn:InExperimentReplicate {experiment},
         {analysis} rdf:subject {x},
         {analysis} rdf:object {exp},
         {analysis} fn:PeptideAbundance {abundance}
    where xsd:integer(abundance) > 20000
      and experiment LIKE "sol"
    using namespace
Venn diagram counts: 171, 146, 185
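The INTERSECT and MINUS queries correspond to ordinary set operations on the poson identifiers returned for each fraction. A minimal sketch with invented identifiers (the real result sets are not reproduced in the slides):

```python
def compare_fractions(soluble, membrane):
    """Split poson identifiers into the three regions of a two-set Venn diagram."""
    sol, mem = set(soluble), set(membrane)
    return {
        "sol MINUS mem": sol - mem,
        "sol INTERSECT mem": sol & mem,
        "mem MINUS sol": mem - sol,
    }


if __name__ == "__main__":
    # Hypothetical poson identifiers for each fraction.
    soluble = ["PSN_0001", "PSN_0002", "PSN_0003"]
    membrane = ["PSN_0003", "PSN_0004"]
    for region, posons in compare_fractions(soluble, membrane).items():
        print(region, sorted(posons))
```

Doing the comparison client-side like this is an alternative when the combined query is too expensive for the repository; it needs only the two plain select queries.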
Comparison of integrated experimental data
Distinct and overlapping posons identified within each biological fraction (< 5000)

sol INTERSECT mem

    select distinct psn
    from {x} fnsposon {psn},
         {x} fn:InExperimentReplicate {experiment},
         {analysis} rdf:subject {x},
         {analysis} rdf:object {exp},
         {analysis} fn:PeptideAbundance {abundance}
    where xsd:integer(abundance) < 5000
      and experiment LIKE "sol"
    INTERSECT
    select distinct psn
    from {x} fnsposon {psn},
         {x} fn:InExperimentReplicate {experiment},
         {analysis} rdf:subject {x},
         {analysis} rdf:object {exp},
         {analysis} fn:PeptideAbundance {abundance}
    where xsd:integer(abundance) < 5000
      and experiment LIKE "mem"
    using namespace

sol MINUS mem

    select distinct psn
    from {x} fnsposon {psn},
         {x} fn:InExperimentReplicate {experiment},
         {analysis} rdf:subject {x},
         {analysis} rdf:object {exp},
         {analysis} fn:PeptideAbundance {abundance}
    where xsd:integer(abundance) < 5000
      and experiment LIKE "sol"
    MINUS
    select distinct psn
    from {x} fnsposon {psn},
         {x} fn:InExperimentReplicate {experiment},
         {analysis} rdf:subject {x},
         {analysis} rdf:object {exp},
         {analysis} fn:PeptideAbundance {abundance}
    where xsd:integer(abundance) < 5000
      and experiment LIKE "mem"
    using namespace

mem MINUS sol

    select distinct psn
    from {x} fnsposon {psn},
         {x} fn:InExperimentReplicate {experiment},
         {analysis} rdf:subject {x},
         {analysis} rdf:object {exp},
         {analysis} fn:PeptideAbundance {abundance}
    where xsd:integer(abundance) < 5000
      and experiment LIKE "mem"
    MINUS
    select distinct psn
    from {x} fnsposon {psn},
         {x} fn:InExperimentReplicate {experiment},
         {analysis} rdf:subject {x},
         {analysis} rdf:object {exp},
         {analysis} fn:PeptideAbundance {abundance}
    where xsd:integer(abundance) < 5000
      and experiment LIKE "sol"
    using namespace
Venn diagram counts: 219, 125, 245
Further work
• Queries are slow in the native repository; database repositories are probably faster
• Adding a transcriptomic experiment: wild type vs mglA mutant (GEO accession GSE5468)
• RDFS inferencing
Acknowledgements
• Funding: BBSRC, Radical Solutions for Researching the Proteome
• University of Glasgow, Glasgow
  • Prof Walter Kolch
  • Dr Andy Pitt
• University of Strathclyde, Glasgow
  • Dr Ela Hunt (Scientific Advisor)
• University of Washington, Seattle
  • Prof Dave Goodlett (Scientific Advisor)
  • Dr Mitch Brittnacher, Mathew Radey, Laurence Rohmer
  • Dr Tina Guina (MglA experiment)
Abundance thresholds
• SeRQL aggregate functions would be nice to have
• Queries to find low and high abundance values
• WHERE abundance BETWEEN MEDIAN(abundance) AND MAX(abundance)
• WHERE abundance BETWEEN MIN(abundance) AND MEDIAN(abundance)
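Until aggregates are available in the query language, the two BETWEEN ranges can be computed on the client after fetching the abundance values. A minimal sketch, assuming the values have already been retrieved as integers; apart from 2594, the sample values are invented.

```python
from statistics import median


def split_by_median(abundances):
    """Return the median plus the low (MIN..MEDIAN) and high (MEDIAN..MAX) groups."""
    mid = median(abundances)
    # BETWEEN is inclusive, so a value equal to the median falls in both groups.
    low = [a for a in abundances if a <= mid]
    high = [a for a in abundances if a >= mid]
    return mid, low, high


if __name__ == "__main__":
    values = [2594, 4800, 20000, 100500, 31000]
    mid, low, high = split_by_median(values)
    print("median:", mid)
    print("low:", low)
    print("high:", high)
```

The median found this way can then be substituted into the where clause as a fixed threshold, in place of hand-picked cut-offs.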
NadiaAnwar~ nadia$ openrdf-sesame-21binconsolesh Connected to default data directory
Commands end with at the end of a lineType help for helpgt connect http1270018080openrdf-sesameDisconnecting from default data directoryConnected to http1270018080openrdf-sesamegt show r+----------|SYSTEM (System configuration repository)|ftnRepoNative (Francisella Test)|FrancisellaNative (FrancisellaTestStore)|FrancisellaReified (Native store with RDF Schema inferencing)|FrancisellaReified_index2 (Native store with RDF Schema inferencing)|Francisella (Native store with RDF Schema inferencing)+----------gt open FrancisellaReified_index2Opened repository FrancisellaReified_index2
Data integration Sesame
SesameData load (ftnRepoNative) - native (spocposc)
Data File time (s) triples
francisella_locus_tagnt 893 1767
interact-protnt 8851 20682
interact-prot-peptidesnt 248647
mgla search dbfastablastp4 ypURLn3 97 1719
NC_008601nt 4314 12781
Ft_novicidaU112gont 35914 2548
francisellardf2nt 4341 10434
francisellaSUPERFAMILYnt 5788 16110
francisellaPROTEINfastant 1363 5160
Solublent 58887 336761
WholeCellnt 46902 112625
Membranesnt 100319 298771
Data IntegrationMgla data (ftnRepoNative)
Identified Peptide
analysis
Peptide sequence
Experiment
mglaposon
abundance PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
DDBID
rdfsseeAlso
GO ECSP
SELECT psn ftn ec FROM ftn rdfsseeAlso ec psn rdfsseeAlso ftn analysis mglaposon psnWHERE ec LIKE ldquo[ECrdquoUSING NAMESPACEmgla =lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglagt
SELECT abundance psn ec ftn FROM ftn rdfsseeAlso ec psn rdfsseeAlso ftn analysis mglaposon psnanalysis mglaexperiment abundanceWHERE ec LIKE ldquo[ECrdquoUSING NAMESPACEmgla =lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglagt
Data IntegrationMgla data (ftnRepoNative)
Identified Peptide
analysis
Peptide sequence
mglaposon
abundance PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
DDBID
rdfsseeAlso
GO ECSP
mglasequencemglaexperiment
rdfabout
Really easy Butbull Simple excel to RDF conversion does not enable all queries
bull Not a simple conversion - Data needs to be ldquomodelledrdquo
Identified Peptide
analysis
Peptide sequence
mglaposon
abundance PSNmglasequence
mglaexperiment
rdfabout
Peptide SequenceExperiment Replicate
abundance
identifiedIn
hasAbundance
Data IntegrationReified statements
Identified Peptideanalysis
Peptide sequence
Experiment Replicate
rdftype
mglaposon
PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
analysis datardfStatement
analysis data
InExperimentReplicate
rdfobject
rdftype
rdfsubject
rdfpredicateabundance
mglaPeptideAbundance
DDBID
rdfsseeAlso
GO ECSP
SesameReified Data load - native-RDFS (spocposcposc)
Data File time (s) time(mins) triples
FnU112Version3nt 38344 63 58474
PosonMappingsnt 8456 14 13760
francisella_locus_tagnt 1673 03 1767
ConstructHasGeneIDnt 2300 04 1719
interact-protnt 12495 21 20682
interact-prot-pepteidesnt 112797 187 248647
interact-protSeeAlsoisbURLnt 1067 02 1528
goAnnotation_URLIDnt 7414 12 20501
NC_008601nt 7584 13 12781
Membranes_CogNumberURLnt 860 01 2548
Ft_novicida_U112_gont 56138 93 2548
francisellardf2nt 4619 08 10602
francisellaSUPERFAMILYnt 6667 11 16110
francisellaPROTEINfastant 1527 03 5160
SolubleReifeid_3rdf 139298 232 580873
WholeCellReified_3rdf 94116 156 184221
Membranes_3rdf 102666 17111 416086
fnU112_draftRDFschemaV4nt 21501098 35835 501
select ftn psn exp abundance from psn rdfsseeAlso psnv2psnv2 rdfsseeAlso psnv3psnv3 rdfsseeAlso ftnanalysis fnu112poson psnanalysis rdftype rdfStatementanalysis rdfobject expanalysis mglaPeptideAbundance abundancewhere xsdinteger(abundance) gt 100000and ftn LIKE FTNusing namespace mgla=lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglagtfnu112=lthttpwwwfrancisellaorgnovicidafnu112schemafnu112experimentsmglagt
Querieswhich posons have the most highly abundant peptides
Querieswhich posons have the most highly abundant peptides
Querieswhich experiments have the most highly abundant peptides
Reified statementsbull Reified mgla data are much bigger (4 more statementsabundance)
bull The really interesting queries return Java out of memory error (-Xms-1024M -Xmx 1536M)
bull Havenrsquot yet tested shortcut path expression
reifSubj reifPred reifObj pred obj
seq identifiedIn ExpRep hasAbundance abd
ltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nstypegt lthttpwwww3org19990222-rdf-syntax-nsStatementgtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nssubjectgt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaWholeCell_Lvl7_021gtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nspredicategt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaInExperimentReplicategtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nsobjectgt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglawildtype01_wc_01gtltWholeCell_Lvl7_0212gt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaPeptideAbundancegt 2594
Peptide SequenceExperiment Replicate
abundance
identifiedIn
hasAbundance
Comparison of integrated experimental data
Distinct and overlapping posons identified within each biological fraction (> 20000)

[Venn diagram: sol and mem fractions, with regions sol MINUS mem, INTERSECT and mem MINUS sol; counts shown: 171, 146, 185]

INTERSECT

SELECT DISTINCT psn
FROM {x} fn:sposon {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
WHERE xsd:integer(abundance) > 20000 AND experiment LIKE "*sol*"
INTERSECT
SELECT DISTINCT psn
FROM {x} fn:sposon {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
WHERE xsd:integer(abundance) > 20000 AND experiment LIKE "*mem*"
USING NAMESPACE ...

sol MINUS mem

SELECT DISTINCT psn
FROM {x} fn:sposon {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
WHERE xsd:integer(abundance) > 20000 AND experiment LIKE "*sol*"
MINUS
SELECT DISTINCT psn
FROM {x} fn:sposon {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
WHERE xsd:integer(abundance) > 20000 AND experiment LIKE "*mem*"
USING NAMESPACE ...

mem MINUS sol

SELECT DISTINCT psn
FROM {x} fn:sposon {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
WHERE xsd:integer(abundance) > 20000 AND experiment LIKE "*mem*"
MINUS
SELECT DISTINCT psn
FROM {x} fn:sposon {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
WHERE xsd:integer(abundance) > 20000 AND experiment LIKE "*sol*"
USING NAMESPACE ...
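The three Venn regions correspond to ordinary set operations, so the same comparison can be reproduced client-side once each fraction's poson list has been fetched. A minimal sketch follows; the poson identifiers are made up for illustration, and `compare_fractions` is a hypothetical helper rather than part of the original work.

```python
def compare_fractions(sol, mem):
    """Split two collections of poson identifiers into the three Venn regions."""
    sol, mem = set(sol), set(mem)
    return {
        "sol MINUS mem": sol - mem,   # found only in the soluble fraction
        "INTERSECT": sol & mem,       # found in both fractions
        "mem MINUS sol": mem - sol,   # found only in the membrane fraction
    }


if __name__ == "__main__":
    # Hypothetical identifiers standing in for query results.
    regions = compare_fractions(["P1", "P2", "P3"], ["P2", "P3", "P4"])
    for name, members in regions.items():
        print(name, sorted(members))
```

Running the two fraction queries once and combining them this way avoids evaluating each sub-query three times, which may help when the repository is slow.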
Comparison of integrated experimental data
Distinct and overlapping posons identified within each biological fraction (< 5000)

[Venn diagram: sol and mem fractions, with regions sol MINUS mem, INTERSECT and mem MINUS sol; counts shown: 219, 125, 245]

INTERSECT

SELECT DISTINCT psn
FROM {x} fn:sposon {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
WHERE xsd:integer(abundance) < 5000 AND experiment LIKE "*sol*"
INTERSECT
SELECT DISTINCT psn
FROM {x} fn:sposon {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
WHERE xsd:integer(abundance) < 5000 AND experiment LIKE "*mem*"
USING NAMESPACE ...

sol MINUS mem

SELECT DISTINCT psn
FROM {x} fn:sposon {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
WHERE xsd:integer(abundance) < 5000 AND experiment LIKE "*sol*"
MINUS
SELECT DISTINCT psn
FROM {x} fn:sposon {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
WHERE xsd:integer(abundance) < 5000 AND experiment LIKE "*mem*"
USING NAMESPACE ...

mem MINUS sol

SELECT DISTINCT psn
FROM {x} fn:sposon {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
WHERE xsd:integer(abundance) < 5000 AND experiment LIKE "*mem*"
MINUS
SELECT DISTINCT psn
FROM {x} fn:sposon {psn}, {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
WHERE xsd:integer(abundance) < 5000 AND experiment LIKE "*sol*"
USING NAMESPACE ...
Further work
• Queries are slow in the native repository; database repositories are probably faster
• Adding a transcriptomic experiment: Wt vs mglA mutant (GEO accession GSE5468)
• RDF-S inferencing
Acknowledgements
• Funding: BBSRC, Radical Solutions for Researching the Proteome
• University of Glasgow, Glasgow
• Prof Walter Kolch
• Dr Andy Pitt
• University of Strathclyde, Glasgow
• Dr Ela Hunt (Scientific Advisor)
• University of Washington, Seattle
• Prof Dave Goodlett (Scientific Advisor)
• Dr Mitch Brittnacher, Mathew Radey, Laurence Rohmer
• Dr Tina Guina (MglA experiment)
Abundance thresholds
• SeRQL aggregate functions would be nice to have
• Queries to find low and high abundance values
• WHERE abundance BETWEEN MEDIAN(abundance) AND MAX(abundance)
• WHERE abundance BETWEEN MIN(abundance) AND MEDIAN(abundance)
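Until aggregates are available in the query language, one possible workaround is to fetch the abundance values with a plain query and compute the thresholds on the client. The sketch below assumes the values arrive as strings, as untyped literals would, and uses made-up numbers; `split_by_median` is a hypothetical helper.

```python
from statistics import median


def split_by_median(abundances):
    """Return (median, low values, high values) for a list of abundance literals.

    Both ranges include the median itself, mirroring an inclusive BETWEEN.
    """
    values = [int(a) for a in abundances]
    mid = median(values)
    low = [v for v in values if v <= mid]
    high = [v for v in values if v >= mid]
    return mid, low, high


if __name__ == "__main__":
    # Hypothetical abundance literals as they might come back from a query.
    mid, low, high = split_by_median(["100", "2594", "20000", "5000", "150000"])
    print("median:", mid)
    print("low:", sorted(low))
    print("high:", sorted(high))
```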
Sesame: Data load (ftnRepoNative) - native (spoc, posc)

Data File                              time (s)   triples
francisella_locus_tag.nt                   8.93      1767
interact-prot.nt                          88.51     20682
interact-prot-peptides.nt                          248647
mgla search dbfastablastp4 ypURLn3           97      1719
NC_008601.nt                              43.14     12781
Ft_novicidaU112go.nt                     359.14      2548
francisellardf2.nt                        43.41     10434
francisellaSUPERFAMILY.nt                 57.88     16110
francisellaPROTEINfasta.nt                13.63      5160
Soluble.nt                               588.87    336761
WholeCell.nt                             469.02    112625
Membranes.nt                            1003.19    298771
Data Integration: Mgla data (ftnRepoNative)

[Diagram: Identified Peptide (analysis) node with Peptide sequence, Experiment and abundance values; mgla:poson links it to PSN; rdfs:seeAlso links connect PSN, PSNV2, PSNV3, FTN, DDBID, GO, EC and SP]

SELECT psn, ftn, ec
FROM {ftn} rdfs:seeAlso {ec},
     {psn} rdfs:seeAlso {ftn},
     {analysis} mgla:poson {psn}
WHERE ec LIKE "[EC*"
USING NAMESPACE
  mgla = <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#>

SELECT abundance, psn, ec, ftn
FROM {ftn} rdfs:seeAlso {ec},
     {psn} rdfs:seeAlso {ftn},
     {analysis} mgla:poson {psn},
     {analysis} mgla:experiment {abundance}
WHERE ec LIKE "[EC*"
USING NAMESPACE
  mgla = <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#>
Data Integration: Mgla data (ftnRepoNative)

[Diagram: Identified Peptide (analysis) node, identified by rdf:about, with mgla:sequence to Peptide sequence, mgla:experiment to abundance and mgla:poson to PSN; rdfs:seeAlso links connect PSN, PSNV2, PSNV3, FTN, DDBID, GO, EC and SP]
Really easy, but
• A simple Excel-to-RDF conversion does not enable all queries
• Not a simple conversion: the data needs to be “modelled”

[Diagram, simple conversion: Identified Peptide (analysis) node, identified by rdf:about, with mgla:sequence to Peptide sequence, mgla:experiment to abundance and mgla:poson to PSN]
[Diagram, modelled data: Peptide Sequence identifiedIn Experiment Replicate; that statement hasAbundance abundance]
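To make the modelling point concrete, here is a small sketch contrasting a cell-by-cell spreadsheet conversion with a conversion that gives each measurement its own node. The column names, predicate names and values are hypothetical, chosen only to show the shape of the problem.

```python
def naive_row_to_triples(row_id, row):
    """One triple per spreadsheet cell; column headers become predicates."""
    return [(row_id, column, value) for column, value in row.items()]


def modelled_row_to_triples(row_id, row, replicate_columns):
    """Give every (peptide, replicate) measurement its own observation node."""
    triples = []
    for column, value in row.items():
        if column in replicate_columns:
            observation = f"{row_id}_{column}"
            triples.append((observation, "subject", row_id))
            triples.append((observation, "inExperimentReplicate", column))
            triples.append((observation, "peptideAbundance", value))
        else:
            triples.append((row_id, column, value))
    return triples


def replicates_above(triples, threshold):
    """Replicates with at least one abundance above the threshold."""
    abundant = {s for s, p, o in triples
                if p == "peptideAbundance" and int(o) > threshold}
    return sorted(o for s, p, o in triples
                  if p == "inExperimentReplicate" and s in abundant)


if __name__ == "__main__":
    # Hypothetical spreadsheet row: two annotation columns, two replicate columns.
    row = {"sequence": "PEPTIDEK", "poson": "PSN1",
           "wt_sol_01": "25000", "wt_mem_01": "3000"}
    modelled = modelled_row_to_triples("pep1", row, {"wt_sol_01", "wt_mem_01"})
    print(naive_row_to_triples("pep1", row))
    print(replicates_above(modelled, 20000))
```

In the naive form the replicate is hidden inside the predicate name, so a question such as "which replicates show abundance above a threshold" has no single pattern to match. In the modelled form the replicate is a value that a query can bind and filter on.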
Data Integration: Reified statements

[Diagram: Identified Peptide (analysis) node with Peptide sequence and mgla:poson to PSN; the analysis data node has rdf:type rdf:Statement, rdf:subject the identified peptide, rdf:predicate InExperimentReplicate, rdf:object the Experiment Replicate, and mgla:PeptideAbundance the abundance; rdfs:seeAlso links connect PSN, PSNV2, PSNV3, FTN, DDBID, GO, EC and SP]
Sesame: Reified Data load - native-RDFS (spoc, posc, posc)

Data File                       time (s)   time (mins)   triples
FnU112Version3.nt                 383.44           6.3     58474
PosonMappings.nt                   84.56           1.4     13760
francisella_locus_tag.nt           16.73           0.3      1767
ConstructHasGeneID.nt              23.00           0.4      1719
interact-prot.nt                  124.95           2.1     20682
interact-prot-pepteides.nt       1127.97          18.7    248647
interact-protSeeAlsoisbURL.nt      10.67           0.2      1528
goAnnotation_URLID.nt              74.14           1.2     20501
NC_008601.nt                       75.84           1.3     12781
Membranes_CogNumberURL.nt           8.60           0.1      2548
Ft_novicida_U112_go.nt            561.38           9.3      2548
francisellardf2.nt                 46.19           0.8     10602
francisellaSUPERFAMILY.nt          66.67           1.1     16110
francisellaPROTEINfasta.nt         15.27           0.3      5160
SolubleReifeid_3.rdf             1392.98          23.2    580873
WholeCellReified_3.rdf            941.16          15.6    184221
Membranes_3.rdf                  1026.66        17.111    416086
fnU112_draftRDFschemaV4.nt      21501098         35835       501
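To compare load runs, the time and triples columns can be turned into minutes and a triples-per-second rate. The sketch below shows the arithmetic only; the figures in it are placeholders, not values taken from the table, and `load_rate` is a hypothetical helper.

```python
def load_rate(seconds, triples):
    """Return (minutes, triples per second), each rounded to one decimal."""
    minutes = seconds / 60
    rate = triples / seconds
    return round(minutes, 1), round(rate, 1)


if __name__ == "__main__":
    # Placeholder figures: a file of 6000 triples loaded in 120 seconds.
    minutes, rate = load_rate(120.0, 6000)
    print(f"{minutes} min, {rate} triples/s")
```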
Queries: which posons have the most highly abundant peptides?

SELECT ftn, psn, exp, abundance
FROM {psn} rdfs:seeAlso {psnv2},
     {psnv2} rdfs:seeAlso {psnv3},
     {psnv3} rdfs:seeAlso {ftn},
     {analysis} fnu112:poson {psn},
     {analysis} rdf:type {rdf:Statement},
     {analysis} rdf:object {exp},
     {analysis} mgla:PeptideAbundance {abundance}
WHERE xsd:integer(abundance) > 100000
  AND ftn LIKE "FTN*"
USING NAMESPACE
  mgla = <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#>,
  fnu112 = <http://www.francisella.org/novicida/fnu112/schema/fnu112/experiments/mgla#>

Queries: which experiments have the most highly abundant peptides?
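Because the query language offers no ordering by an aggregate, ranking has to happen after the rows come back. A minimal sketch, assuming each result row has been reduced to a (poson, abundance) pair with the abundance still a string; the identifiers and values are invented, and `top_posons` is a hypothetical helper.

```python
def top_posons(rows, n=3):
    """Rank posons by the highest abundance seen for any of their peptides."""
    best = {}
    for poson, abundance in rows:
        value = int(abundance)
        if value > best.get(poson, -1):
            best[poson] = value
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical query results.
    rows = [("P1", "120000"), ("P2", "150000"),
            ("P1", "300000"), ("P3", "101000")]
    for poson, abundance in top_posons(rows, n=2):
        print(poson, abundance)
```

The same function answers the experiment question if the first element of each pair is the experiment replicate instead of the poson.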
Reified statements
• Reified mgla data are much bigger (4 more statements per abundance value)
• The really interesting queries return a Java out-of-memory error (-Xms1024M -Xmx1536M)
• Haven't yet tested the shortcut path expression
reifSubj reifPred reifObj pred obj
seq identifiedIn ExpRep hasAbundance abd
ltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nstypegt lthttpwwww3org19990222-rdf-syntax-nsStatementgtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nssubjectgt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaWholeCell_Lvl7_021gtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nspredicategt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaInExperimentReplicategtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nsobjectgt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglawildtype01_wc_01gtltWholeCell_Lvl7_0212gt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaPeptideAbundancegt 2594
Peptide SequenceExperiment Replicate
abundance
identifiedIn
hasAbundance
sol
select distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE solINTERSECTselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE memusing namespace
Comparison of integrated experimental dataDistinct and overlapping posons identified within each biological fraction (gt20000)
mem
INTERSECT
sol MINUS memmem MINUS solselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE solMINUSselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE memusing namespace
select distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE memMINUSselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE solusing namespace
171 146185
sol
select distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE solINTERSECTselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE memusing namespace
Comparison of integrated experimental dataDistinct and overlapping posons identified within each biological fraction (lt5000)
mem
INTERSECT
sol MINUS memmem MINUS solselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE solMINUSselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE memusing namespace
select distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE memMINUSselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE solusing namespace
219 125245
Further work
bull Queries are slow in the native repository database repositories are probably faster
bull Adding transcriptomic experiment
Wt Vs mglA mutant
GEO AC GSE5468
bull RDF-S inferencing
Acknowledgements
bull Funding BBSRC -Radical Solutions for Researching the Proteome
bull University of Glasgow Glasgow
bull Prof Walter Kolch
bull Dr Andy Pitt
bull University of Strathclyde Glasgow
bull Dr Ela Hunt (Scientific Advisor)
bull University of Washington Seattle
bull Prof Dave Goodlett (Scientific Advisor)
bull Dr Mitch Brittnacher Mathew Radey Laurence Rohmer
bull Dr Tina Guina (MglA experiment)
Abundance thresholdsbull SeRQL aggregate functions would be nice to have
bull Queries to find low and high abundance values
bull WHERE abundance BETWEEN MEDIAN(abundance) AND MAX(abundance)
bull WHERE abundance BETWEEN MIN(abundance) and MEDIAN(abundance)
Data IntegrationMgla data (ftnRepoNative)
Identified Peptide
analysis
Peptide sequence
Experiment
mglaposon
abundance PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
DDBID
rdfsseeAlso
GO ECSP
SELECT psn ftn ec FROM ftn rdfsseeAlso ec psn rdfsseeAlso ftn analysis mglaposon psnWHERE ec LIKE ldquo[ECrdquoUSING NAMESPACEmgla =lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglagt
SELECT abundance psn ec ftn FROM ftn rdfsseeAlso ec psn rdfsseeAlso ftn analysis mglaposon psnanalysis mglaexperiment abundanceWHERE ec LIKE ldquo[ECrdquoUSING NAMESPACEmgla =lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglagt
Data IntegrationMgla data (ftnRepoNative)
Identified Peptide
analysis
Peptide sequence
mglaposon
abundance PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
DDBID
rdfsseeAlso
GO ECSP
mglasequencemglaexperiment
rdfabout
Really easy Butbull Simple excel to RDF conversion does not enable all queries
bull Not a simple conversion - Data needs to be ldquomodelledrdquo
Identified Peptide
analysis
Peptide sequence
mglaposon
abundance PSNmglasequence
mglaexperiment
rdfabout
Peptide SequenceExperiment Replicate
abundance
identifiedIn
hasAbundance
Data IntegrationReified statements
Identified Peptideanalysis
Peptide sequence
Experiment Replicate
rdftype
mglaposon
PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
analysis datardfStatement
analysis data
InExperimentReplicate
rdfobject
rdftype
rdfsubject
rdfpredicateabundance
mglaPeptideAbundance
DDBID
rdfsseeAlso
GO ECSP
SesameReified Data load - native-RDFS (spocposcposc)
Data File time (s) time(mins) triples
FnU112Version3nt 38344 63 58474
PosonMappingsnt 8456 14 13760
francisella_locus_tagnt 1673 03 1767
ConstructHasGeneIDnt 2300 04 1719
interact-protnt 12495 21 20682
interact-prot-pepteidesnt 112797 187 248647
interact-protSeeAlsoisbURLnt 1067 02 1528
goAnnotation_URLIDnt 7414 12 20501
NC_008601nt 7584 13 12781
Membranes_CogNumberURLnt 860 01 2548
Ft_novicida_U112_gont 56138 93 2548
francisellardf2nt 4619 08 10602
francisellaSUPERFAMILYnt 6667 11 16110
francisellaPROTEINfastant 1527 03 5160
SolubleReifeid_3rdf 139298 232 580873
WholeCellReified_3rdf 94116 156 184221
Membranes_3rdf 102666 17111 416086
fnU112_draftRDFschemaV4nt 21501098 35835 501
select ftn psn exp abundance from psn rdfsseeAlso psnv2psnv2 rdfsseeAlso psnv3psnv3 rdfsseeAlso ftnanalysis fnu112poson psnanalysis rdftype rdfStatementanalysis rdfobject expanalysis mglaPeptideAbundance abundancewhere xsdinteger(abundance) gt 100000and ftn LIKE FTNusing namespace mgla=lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglagtfnu112=lthttpwwwfrancisellaorgnovicidafnu112schemafnu112experimentsmglagt
Querieswhich posons have the most highly abundant peptides
Querieswhich posons have the most highly abundant peptides
Querieswhich experiments have the most highly abundant peptides
Reified statementsbull Reified mgla data are much bigger (4 more statementsabundance)
bull The really interesting queries return Java out of memory error (-Xms-1024M -Xmx 1536M)
bull Havenrsquot yet tested shortcut path expression
reifSubj reifPred reifObj pred obj
seq identifiedIn ExpRep hasAbundance abd
ltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nstypegt lthttpwwww3org19990222-rdf-syntax-nsStatementgtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nssubjectgt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaWholeCell_Lvl7_021gtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nspredicategt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaInExperimentReplicategtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nsobjectgt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglawildtype01_wc_01gtltWholeCell_Lvl7_0212gt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaPeptideAbundancegt 2594
Peptide SequenceExperiment Replicate
abundance
identifiedIn
hasAbundance
sol
select distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE solINTERSECTselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE memusing namespace
Comparison of integrated experimental dataDistinct and overlapping posons identified within each biological fraction (gt20000)
mem
INTERSECT
sol MINUS memmem MINUS solselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE solMINUSselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE memusing namespace
select distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE memMINUSselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE solusing namespace
171 146185
sol
select distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE solINTERSECTselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE memusing namespace
Comparison of integrated experimental dataDistinct and overlapping posons identified within each biological fraction (lt5000)
mem
INTERSECT
sol MINUS memmem MINUS solselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE solMINUSselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE memusing namespace
select distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE memMINUSselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE solusing namespace
219 125245
Further work
bull Queries are slow in the native repository database repositories are probably faster
bull Adding transcriptomic experiment
Wt Vs mglA mutant
GEO AC GSE5468
bull RDF-S inferencing
Acknowledgements
bull Funding BBSRC -Radical Solutions for Researching the Proteome
bull University of Glasgow Glasgow
bull Prof Walter Kolch
bull Dr Andy Pitt
bull University of Strathclyde Glasgow
bull Dr Ela Hunt (Scientific Advisor)
bull University of Washington Seattle
bull Prof Dave Goodlett (Scientific Advisor)
bull Dr Mitch Brittnacher Mathew Radey Laurence Rohmer
bull Dr Tina Guina (MglA experiment)
Abundance thresholdsbull SeRQL aggregate functions would be nice to have
bull Queries to find low and high abundance values
bull WHERE abundance BETWEEN MEDIAN(abundance) AND MAX(abundance)
bull WHERE abundance BETWEEN MIN(abundance) and MEDIAN(abundance)
SELECT abundance psn ec ftn FROM ftn rdfsseeAlso ec psn rdfsseeAlso ftn analysis mglaposon psnanalysis mglaexperiment abundanceWHERE ec LIKE ldquo[ECrdquoUSING NAMESPACEmgla =lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglagt
Data IntegrationMgla data (ftnRepoNative)
Identified Peptide
analysis
Peptide sequence
mglaposon
abundance PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
DDBID
rdfsseeAlso
GO ECSP
mglasequencemglaexperiment
rdfabout
Really easy Butbull Simple excel to RDF conversion does not enable all queries
bull Not a simple conversion - Data needs to be ldquomodelledrdquo
Identified Peptide
analysis
Peptide sequence
mglaposon
abundance PSNmglasequence
mglaexperiment
rdfabout
Peptide SequenceExperiment Replicate
abundance
identifiedIn
hasAbundance
Data IntegrationReified statements
Identified Peptideanalysis
Peptide sequence
Experiment Replicate
rdftype
mglaposon
PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
analysis datardfStatement
analysis data
InExperimentReplicate
rdfobject
rdftype
rdfsubject
rdfpredicateabundance
mglaPeptideAbundance
DDBID
rdfsseeAlso
GO ECSP
SesameReified Data load - native-RDFS (spocposcposc)
Data File time (s) time(mins) triples
FnU112Version3nt 38344 63 58474
PosonMappingsnt 8456 14 13760
francisella_locus_tagnt 1673 03 1767
ConstructHasGeneIDnt 2300 04 1719
interact-protnt 12495 21 20682
interact-prot-pepteidesnt 112797 187 248647
interact-protSeeAlsoisbURLnt 1067 02 1528
goAnnotation_URLIDnt 7414 12 20501
NC_008601nt 7584 13 12781
Membranes_CogNumberURLnt 860 01 2548
Ft_novicida_U112_gont 56138 93 2548
francisellardf2nt 4619 08 10602
francisellaSUPERFAMILYnt 6667 11 16110
francisellaPROTEINfastant 1527 03 5160
SolubleReifeid_3rdf 139298 232 580873
WholeCellReified_3rdf 94116 156 184221
Membranes_3rdf 102666 17111 416086
fnU112_draftRDFschemaV4nt 21501098 35835 501
select ftn psn exp abundance from psn rdfsseeAlso psnv2psnv2 rdfsseeAlso psnv3psnv3 rdfsseeAlso ftnanalysis fnu112poson psnanalysis rdftype rdfStatementanalysis rdfobject expanalysis mglaPeptideAbundance abundancewhere xsdinteger(abundance) gt 100000and ftn LIKE FTNusing namespace mgla=lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglagtfnu112=lthttpwwwfrancisellaorgnovicidafnu112schemafnu112experimentsmglagt
Querieswhich posons have the most highly abundant peptides
Querieswhich posons have the most highly abundant peptides
Querieswhich experiments have the most highly abundant peptides
Reified statementsbull Reified mgla data are much bigger (4 more statementsabundance)
bull The really interesting queries return Java out of memory error (-Xms-1024M -Xmx 1536M)
bull Havenrsquot yet tested shortcut path expression
reifSubj reifPred reifObj pred obj
seq identifiedIn ExpRep hasAbundance abd
ltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nstypegt lthttpwwww3org19990222-rdf-syntax-nsStatementgtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nssubjectgt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaWholeCell_Lvl7_021gtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nspredicategt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaInExperimentReplicategtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nsobjectgt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglawildtype01_wc_01gtltWholeCell_Lvl7_0212gt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaPeptideAbundancegt 2594
Peptide SequenceExperiment Replicate
abundance
identifiedIn
hasAbundance
sol
select distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE solINTERSECTselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE memusing namespace
Comparison of integrated experimental dataDistinct and overlapping posons identified within each biological fraction (gt20000)
mem
INTERSECT
sol MINUS memmem MINUS solselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE solMINUSselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE memusing namespace
select distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE memMINUSselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE solusing namespace
171 146185
sol
select distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE solINTERSECTselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE memusing namespace
Comparison of integrated experimental dataDistinct and overlapping posons identified within each biological fraction (lt5000)
mem
INTERSECT
sol MINUS memmem MINUS solselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE solMINUSselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE memusing namespace
select distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE memMINUSselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) lt 5000and experiment LIKE solusing namespace
219 125245
Further work
bull Queries are slow in the native repository database repositories are probably faster
bull Adding transcriptomic experiment
Wt Vs mglA mutant
GEO AC GSE5468
bull RDF-S inferencing
Acknowledgements
bull Funding BBSRC -Radical Solutions for Researching the Proteome
bull University of Glasgow Glasgow
bull Prof Walter Kolch
bull Dr Andy Pitt
bull University of Strathclyde Glasgow
bull Dr Ela Hunt (Scientific Advisor)
bull University of Washington Seattle
bull Prof Dave Goodlett (Scientific Advisor)
bull Dr Mitch Brittnacher Mathew Radey Laurence Rohmer
bull Dr Tina Guina (MglA experiment)
Abundance thresholdsbull SeRQL aggregate functions would be nice to have
bull Queries to find low and high abundance values
bull WHERE abundance BETWEEN MEDIAN(abundance) AND MAX(abundance)
bull WHERE abundance BETWEEN MIN(abundance) and MEDIAN(abundance)
Really easy Butbull Simple excel to RDF conversion does not enable all queries
bull Not a simple conversion - Data needs to be ldquomodelledrdquo
Identified Peptide
analysis
Peptide sequence
mglaposon
abundance PSNmglasequence
mglaexperiment
rdfabout
Peptide SequenceExperiment Replicate
abundance
identifiedIn
hasAbundance
Data IntegrationReified statements
Identified Peptideanalysis
Peptide sequence
Experiment Replicate
rdftype
mglaposon
PSNV3 FTNPSN PSNV2rdfsseeAlso rdfsseeAlso rdfsseeAlso
analysis datardfStatement
analysis data
InExperimentReplicate
rdfobject
rdftype
rdfsubject
rdfpredicateabundance
mglaPeptideAbundance
DDBID
rdfsseeAlso
GO ECSP
SesameReified Data load - native-RDFS (spocposcposc)
Data File time (s) time(mins) triples
FnU112Version3nt 38344 63 58474
PosonMappingsnt 8456 14 13760
francisella_locus_tagnt 1673 03 1767
ConstructHasGeneIDnt 2300 04 1719
interact-protnt 12495 21 20682
interact-prot-pepteidesnt 112797 187 248647
interact-protSeeAlsoisbURLnt 1067 02 1528
goAnnotation_URLIDnt 7414 12 20501
NC_008601nt 7584 13 12781
Membranes_CogNumberURLnt 860 01 2548
Ft_novicida_U112_gont 56138 93 2548
francisellardf2nt 4619 08 10602
francisellaSUPERFAMILYnt 6667 11 16110
francisellaPROTEINfastant 1527 03 5160
SolubleReifeid_3rdf 139298 232 580873
WholeCellReified_3rdf 94116 156 184221
Membranes_3rdf 102666 17111 416086
fnU112_draftRDFschemaV4nt 21501098 35835 501
select ftn psn exp abundance from psn rdfsseeAlso psnv2psnv2 rdfsseeAlso psnv3psnv3 rdfsseeAlso ftnanalysis fnu112poson psnanalysis rdftype rdfStatementanalysis rdfobject expanalysis mglaPeptideAbundance abundancewhere xsdinteger(abundance) gt 100000and ftn LIKE FTNusing namespace mgla=lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglagtfnu112=lthttpwwwfrancisellaorgnovicidafnu112schemafnu112experimentsmglagt
Querieswhich posons have the most highly abundant peptides
Querieswhich posons have the most highly abundant peptides
Querieswhich experiments have the most highly abundant peptides
Reified statementsbull Reified mgla data are much bigger (4 more statementsabundance)
bull The really interesting queries return Java out of memory error (-Xms-1024M -Xmx 1536M)
bull Havenrsquot yet tested shortcut path expression
reifSubj reifPred reifObj pred obj
seq identifiedIn ExpRep hasAbundance abd
ltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nstypegt lthttpwwww3org19990222-rdf-syntax-nsStatementgtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nssubjectgt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaWholeCell_Lvl7_021gtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nspredicategt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaInExperimentReplicategtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nsobjectgt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglawildtype01_wc_01gtltWholeCell_Lvl7_0212gt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaPeptideAbundancegt 2594
Peptide SequenceExperiment Replicate
abundance
identifiedIn
hasAbundance
sol
select distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE solINTERSECTselect distinct psn fromx fnsposon psnx fnInExperimentReplicate experiment analysis rdfsubject xanalysis rdfobject expanalysis fnPeptideAbundance abundancewhere xsdinteger(abundance) gt 20000and experiment LIKE memusing namespace
Comparison of integrated experimental dataDistinct and overlapping posons identified within each biological fraction (gt20000)
mem
INTERSECT
Data Integration: Reified statements
[Figure: RDF graph of the reified data model. Labels recoverable from the diagram: Identified Peptide, Peptide sequence, Experiment Replicate, analysis data (rdf:Statement), abundance; RDF terms rdf:type, rdf:subject, rdf:predicate, rdf:object, rdfs:seeAlso, mgla:poson, InExperimentReplicate, mgla:PeptideAbundance; linked identifiers PSN, PSNV2, PSNV3, FTN, DDBID, GO, EC, SP.]
Sesame: Reified data load, native-RDFS (spoc, posc, posc)

Data File                        time (s)    time (mins)   triples
FnU112Version3.nt                38.344      0.63          58474
PosonMappings.nt                 8.456       0.14          13760
francisella_locus_tag.nt         1.673       0.03          1767
ConstructHasGeneID.nt            2.300       0.04          1719
interact-prot.nt                 12.495      0.21          20682
interact-prot-pepteides.nt       112.797     1.87          248647
interact-protSeeAlsoisbURL.nt    1.067       0.02          1528
goAnnotation_URLID.nt            7.414       0.12          20501
NC_008601.nt                     7.584       0.13          12781
Membranes_CogNumberURL.nt        0.860       0.01          2548
Ft_novicida_U112_go.nt           56.138      0.93          2548
francisellardf2.nt               4.619       0.08          10602
francisellaSUPERFAMILY.nt        6.667       0.11          16110
francisellaPROTEINfasta.nt       1.527       0.03          5160
SolubleReifeid_3.rdf             139.298     2.32          580873
WholeCellReified_3.rdf           94.116      1.56          184221
Membranes_3.rdf                  102.666     1.7111        416086
fnU112_draftRDFschemaV4.nt       21.501098   0.35835       501
Queries: which posons have the most highly abundant peptides?

    select ftn, psn, exp, abundance
    from {psn} rdfs:seeAlso {psnv2},
         {psnv2} rdfs:seeAlso {psnv3},
         {psnv3} rdfs:seeAlso {ftn},
         {analysis} fnu112:poson {psn},
         {analysis} rdf:type {rdf:Statement},
         {analysis} rdf:object {exp},
         {analysis} mgla:PeptideAbundance {abundance}
    where xsd:integer(abundance) > 100000
      and ftn LIKE "*FTN*"
    using namespace
      mgla = <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#>,
      fnu112 = <http://www.francisella.org/novicida/fnu112/schema/fnu112/experiments/mgla#>

Queries: which experiments have the most highly abundant peptides?
Reified statements
• Reified mgla data are much bigger (4 more statements per abundance value)
• The really interesting queries return a Java out-of-memory error (-Xms1024M -Xmx1536M)
• Haven't yet tested the shortcut path expression
    { {reifSubj} reifPred {reifObj} } pred {obj}
    { {seq} identifiedIn {ExpRep} } hasAbundance {abd}

<WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement> .
<WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#WholeCell_Lvl7_021> .
<WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#InExperimentReplicate> .
<WholeCell_Lvl7_0212> <http://www.w3.org/1999/02/22-rdf-syntax-ns#object> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#wildtype01_wc_01> .
<WholeCell_Lvl7_0212> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#PeptideAbundance> "2594" .

[Diagram: Peptide Sequence identifiedIn Experiment Replicate; the reified statement hasAbundance abundance.]
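The reification pattern shown in the N-Triples example can be sketched in Python. This is only an illustration using the standard library, not the authors' loader: the helper `reify_abundance` is hypothetical, the `MGLA` base IRI is an assumption (the URL separators were lost in the transcript), and the statement node is placed in that namespace for simplicity.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumed namespace: the slide's URL lost its punctuation in extraction.
MGLA = "http://www.francisella.org/novicida/schema/fnu112/experiments/mgla#"


def reify_abundance(statement_id, peptide, replicate, abundance):
    """Return N-Triples lines for one reified abundance measurement.

    The statement "peptide InExperimentReplicate replicate" is reified
    (rdf:type, rdf:subject, rdf:predicate, rdf:object) and the abundance
    value is attached to the reified statement node.
    """
    node = f"<{MGLA}{statement_id}>"
    triples = [
        (node, f"<{RDF}type>", f"<{RDF}Statement>"),
        (node, f"<{RDF}subject>", f"<{MGLA}{peptide}>"),
        (node, f"<{RDF}predicate>", f"<{MGLA}InExperimentReplicate>"),
        (node, f"<{RDF}object>", f"<{MGLA}{replicate}>"),
        (node, f"<{MGLA}PeptideAbundance>", f'"{abundance}"'),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


# Identifiers taken from the slide's example.
lines = reify_abundance("WholeCell_Lvl7_0212", "WholeCell_Lvl7_021",
                        "wildtype01_wc_01", 2594)
for line in lines:
    print(line)
```

Each measurement produces five triples here, which appears consistent with the slide's remark that reified data need four more statements per abundance value.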
Comparison of integrated experimental data
Distinct and overlapping posons identified within each biological fraction (>20000)

sol INTERSECT mem

    select distinct psn
    from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
         {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
         {analysis} fn:PeptideAbundance {abundance}
    where xsd:integer(abundance) > 20000 and experiment LIKE "*sol*"
    INTERSECT
    select distinct psn
    from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
         {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
         {analysis} fn:PeptideAbundance {abundance}
    where xsd:integer(abundance) > 20000 and experiment LIKE "*mem*"
    using namespace ...

sol MINUS mem

    select distinct psn
    from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
         {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
         {analysis} fn:PeptideAbundance {abundance}
    where xsd:integer(abundance) > 20000 and experiment LIKE "*sol*"
    MINUS
    select distinct psn
    from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
         {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
         {analysis} fn:PeptideAbundance {abundance}
    where xsd:integer(abundance) > 20000 and experiment LIKE "*mem*"
    using namespace ...

mem MINUS sol

    select distinct psn
    from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
         {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
         {analysis} fn:PeptideAbundance {abundance}
    where xsd:integer(abundance) > 20000 and experiment LIKE "*mem*"
    MINUS
    select distinct psn
    from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
         {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
         {analysis} fn:PeptideAbundance {abundance}
    where xsd:integer(abundance) > 20000 and experiment LIKE "*sol*"
    using namespace ...

Counts shown on the slide: 171, 146, 185
Comparison of integrated experimental data
Distinct and overlapping posons identified within each biological fraction (<5000)

sol INTERSECT mem

    select distinct psn
    from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
         {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
         {analysis} fn:PeptideAbundance {abundance}
    where xsd:integer(abundance) < 5000 and experiment LIKE "*sol*"
    INTERSECT
    select distinct psn
    from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
         {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
         {analysis} fn:PeptideAbundance {abundance}
    where xsd:integer(abundance) < 5000 and experiment LIKE "*mem*"
    using namespace ...

sol MINUS mem

    select distinct psn
    from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
         {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
         {analysis} fn:PeptideAbundance {abundance}
    where xsd:integer(abundance) < 5000 and experiment LIKE "*sol*"
    MINUS
    select distinct psn
    from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
         {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
         {analysis} fn:PeptideAbundance {abundance}
    where xsd:integer(abundance) < 5000 and experiment LIKE "*mem*"
    using namespace ...

mem MINUS sol

    select distinct psn
    from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
         {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
         {analysis} fn:PeptideAbundance {abundance}
    where xsd:integer(abundance) < 5000 and experiment LIKE "*mem*"
    MINUS
    select distinct psn
    from {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
         {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
         {analysis} fn:PeptideAbundance {abundance}
    where xsd:integer(abundance) < 5000 and experiment LIKE "*sol*"
    using namespace ...

Counts shown on the slide: 219, 125, 245
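The sol/mem comparisons rely on the INTERSECT and MINUS set operators. As a rough sketch of the same logic outside the repository, the three counts can be computed with Python sets once the poson identifiers for each fraction have been fetched. The function `venn_counts` and the identifiers below are made up for illustration and are not data from the study.

```python
def venn_counts(sol_posons, mem_posons):
    """Count posons unique to each fraction and shared by both.

    Mirrors the three queries: sol MINUS mem, sol INTERSECT mem,
    and mem MINUS sol. Duplicates are collapsed, like SELECT DISTINCT.
    """
    sol, mem = set(sol_posons), set(mem_posons)
    return {
        "sol MINUS mem": len(sol - mem),
        "sol INTERSECT mem": len(sol & mem),
        "mem MINUS sol": len(mem - sol),
    }


# Hypothetical identifiers, for illustration only.
sol_hits = ["PSN001", "PSN002", "PSN003", "PSN003"]
mem_hits = ["PSN003", "PSN004"]
counts = venn_counts(sol_hits, mem_hits)
print(counts)
```

Doing the set arithmetic client-side trades one round trip per fraction for simpler queries, which may help when the combined query is slow or memory-hungry in the store.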
Further work
• Queries are slow in the native repository; database repositories are probably faster
• Adding a transcriptomic experiment
  Wt vs mglA mutant
  GEO accession GSE5468
• RDFS inferencing
Acknowledgements
• Funding: BBSRC, Radical Solutions for Researching the Proteome
• University of Glasgow, Glasgow
  • Prof Walter Kolch
  • Dr Andy Pitt
• University of Strathclyde, Glasgow
  • Dr Ela Hunt (Scientific Advisor)
• University of Washington, Seattle
  • Prof Dave Goodlett (Scientific Advisor)
  • Dr Mitch Brittnacher, Mathew Radey, Laurence Rohmer
  • Dr Tina Guina (MglA experiment)
Abundance thresholds
• SeRQL aggregate functions would be nice to have
• Queries to find low and high abundance values
  • WHERE abundance BETWEEN MEDIAN(abundance) AND MAX(abundance)
  • WHERE abundance BETWEEN MIN(abundance) AND MEDIAN(abundance)
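Until aggregate functions are available in the query language, the median-based thresholds could be computed client-side. The sketch below assumes the abundance values have already been retrieved as integers; `split_by_median` and the sample values are hypothetical.

```python
from statistics import median


def split_by_median(abundances):
    """Split abundance values into low and high groups around the median.

    Emulates the wished-for clauses
      BETWEEN MIN(abundance) AND MEDIAN(abundance)   (low)
      BETWEEN MEDIAN(abundance) AND MAX(abundance)   (high)
    BETWEEN is inclusive, so a value equal to the median lands in both.
    """
    values = sorted(abundances)
    mid = median(values)  # raises StatisticsError on an empty list
    low = [v for v in values if values[0] <= v <= mid]
    high = [v for v in values if mid <= v <= values[-1]]
    return mid, low, high


# Hypothetical abundance values, for illustration only.
sample = [2594, 4800, 21000, 150000]
mid, low, high = split_by_median(sample)
print(mid, low, high)
```

The resulting median can then be substituted as a literal threshold into a WHERE clause of the kind used in the comparison queries.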
Reified statementsbull Reified mgla data are much bigger (4 more statementsabundance)
bull The really interesting queries return Java out of memory error (-Xms-1024M -Xmx 1536M)
bull Havenrsquot yet tested shortcut path expression
reifSubj reifPred reifObj pred obj
seq identifiedIn ExpRep hasAbundance abd
ltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nstypegt lthttpwwww3org19990222-rdf-syntax-nsStatementgtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nssubjectgt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaWholeCell_Lvl7_021gtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nspredicategt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaInExperimentReplicategtltWholeCell_Lvl7_0212gt lthttpwwww3org19990222-rdf-syntax-nsobjectgt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglawildtype01_wc_01gtltWholeCell_Lvl7_0212gt lthttpwwwfrancisellaorgnovicidaschemafnu112experimentsmglaPeptideAbundancegt 2594
Peptide SequenceExperiment Replicate
abundance
identifiedIn
hasAbundance
Comparison of integrated experimental data
Distinct and overlapping posons identified within each biological fraction (abundance > 20000)

[Venn diagram: sol and mem fractions; regions sol MINUS mem, INTERSECT, mem MINUS sol; counts shown: 171, 146, 185]

INTERSECT
select distinct psn
from {x} fn:sposon {psn},
     {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x},
     {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000
  and experiment LIKE "sol"
INTERSECT
select distinct psn
from {x} fn:sposon {psn},
     {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x},
     {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000
  and experiment LIKE "mem"
using namespace

sol MINUS mem
select distinct psn
from {x} fn:sposon {psn},
     {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x},
     {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000
  and experiment LIKE "sol"
MINUS
select distinct psn
from {x} fn:sposon {psn},
     {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x},
     {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000
  and experiment LIKE "mem"
using namespace

mem MINUS sol
select distinct psn
from {x} fn:sposon {psn},
     {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x},
     {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000
  and experiment LIKE "mem"
MINUS
select distinct psn
from {x} fn:sposon {psn},
     {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x},
     {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000
  and experiment LIKE "sol"
using namespace
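The INTERSECT and MINUS comparisons on this slide amount to set operations over the posons found in each fraction. The sketch below shows the same logic on a handful of made-up rows. The poson identifiers, the replicate names containing "sol" and "mem", and the helper `posons_in_fraction` are all assumptions for illustration; none of the values come from the experiment.

```python
def posons_in_fraction(rows, fraction, keep):
    """Distinct posons seen in replicates whose name contains `fraction`
    and whose abundance passes the `keep` predicate."""
    return {poson for poson, replicate, abundance in rows
            if fraction in replicate and keep(abundance)}


# Hypothetical (poson, replicate, abundance) rows; not real data.
rows = [
    ("psn_A", "wildtype01_sol_01", 25000),
    ("psn_B", "wildtype01_sol_01", 31000),
    ("psn_B", "wildtype01_mem_01", 22000),
    ("psn_C", "wildtype01_mem_01", 40000),
    ("psn_D", "wildtype01_mem_01", 1200),
]


def high(abundance):
    """Same threshold as the slide: abundance > 20000."""
    return abundance > 20000


sol = posons_in_fraction(rows, "sol", high)
mem = posons_in_fraction(rows, "mem", high)

both = sol & mem      # corresponds to INTERSECT
sol_only = sol - mem  # corresponds to sol MINUS mem
mem_only = mem - sol  # corresponds to mem MINUS sol

print(sorted(both), sorted(sol_only), sorted(mem_only))
```

Running the two fraction queries separately and combining the results client-side in this way is one possible route around memory limits in the query engine, although that has not been tested here.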
Comparison of integrated experimental data
Distinct and overlapping posons identified within each biological fraction (abundance < 5000)

[Venn diagram: sol and mem fractions; regions sol MINUS mem, INTERSECT, mem MINUS sol; counts shown: 219, 125, 245]

INTERSECT
select distinct psn
from {x} fn:sposon {psn},
     {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x},
     {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) < 5000
  and experiment LIKE "sol"
INTERSECT
select distinct psn
from {x} fn:sposon {psn},
     {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x},
     {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) < 5000
  and experiment LIKE "mem"
using namespace

sol MINUS mem
select distinct psn
from {x} fn:sposon {psn},
     {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x},
     {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) < 5000
  and experiment LIKE "sol"
MINUS
select distinct psn
from {x} fn:sposon {psn},
     {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x},
     {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) < 5000
  and experiment LIKE "mem"
using namespace

mem MINUS sol
select distinct psn
from {x} fn:sposon {psn},
     {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x},
     {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) < 5000
  and experiment LIKE "mem"
MINUS
select distinct psn
from {x} fn:sposon {psn},
     {x} fn:InExperimentReplicate {experiment},
     {analysis} rdf:subject {x},
     {analysis} rdf:object {exp},
     {analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) < 5000
  and experiment LIKE "sol"
using namespace
Further work
• Queries are slow in the native repository; database repositories are probably faster
• Adding a transcriptomic experiment: wt vs. mglA mutant (GEO accession GSE5468)
• RDF-S inferencing
Acknowledgements
• Funding: BBSRC, Radical Solutions for Researching the Proteome
• University of Glasgow, Glasgow
• Prof Walter Kolch
• Dr Andy Pitt
• University of Strathclyde, Glasgow
• Dr Ela Hunt (Scientific Advisor)
• University of Washington, Seattle
• Prof Dave Goodlett (Scientific Advisor)
• Dr Mitch Brittnacher, Mathew Radey, Laurence Rohmer
• Dr Tina Guina (MglA experiment)
Abundance thresholds
• SeRQL aggregate functions would be nice to have
• Queries to find low and high abundance values:
• WHERE abundance BETWEEN MEDIAN(abundance) AND MAX(abundance)
• WHERE abundance BETWEEN MIN(abundance) AND MEDIAN(abundance)
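Until aggregate functions are available in the query language, the two wished-for BETWEEN filters could be approximated client-side once the abundance values have been fetched. The following is a minimal sketch under that assumption. The function name `split_by_median` and the sample values are hypothetical, and the bounds are treated as inclusive, as BETWEEN usually is.

```python
from statistics import median


def split_by_median(abundances):
    """Split abundance values into low (MIN..MEDIAN) and high (MEDIAN..MAX) groups.

    Both ranges are inclusive, so a value equal to the median appears in both.
    Raises statistics.StatisticsError if `abundances` is empty.
    """
    mid = median(abundances)
    lowest, highest = min(abundances), max(abundances)
    low = [a for a in abundances if lowest <= a <= mid]
    high = [a for a in abundances if mid <= a <= highest]
    return low, high, mid


# Hypothetical abundance values, not taken from the experiment.
values = [1200, 2594, 4800, 22000, 31000]
low, high, mid = split_by_median(values)
print(mid, low, high)
```

A data-driven cut-off of this kind would replace the fixed thresholds (20000 and 5000) used in the comparison queries.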