Distributed Collaboration on RDF Datasets Using Git
Towards the Quit Store
Natanael Arndt, Norman Radtke and Michael Martin
SEMANTiCS 2016, Leipzig
September 14, 2016
Problem & Motivation
Linked Datasets as of August 2014
[Figure: Public LOD Cloud and Enterprise Workspace, connected by the steps "clone" and "enrich"]
Problem & Motivation
Remark (Co-Evolution): The process of datasets evolving simultaneously and separately from each other while influencing each other's evolution
Problem & Motivation
Usage of public LOD as background knowledge
Mobile use cases
In distributed collaboration on RDF datasets
⇒ Support for multiple versions of the same dataset at the same time
Approach
The same problem exists for source code repositories
For around 10 years, distributed version control systems have been solving this problem
Multiple working copies exist at the same time and can be synchronized
[Figure: five Server/Client nodes]
Approach
Git is successful in software development
We have decided to see if this also works for RDF
So we have put RDF into the repositories
Methodology
[Figure: Quit Store, consisting of an RDF Quad Store and a SPARQL 1.1 Interface (Query & Update)]
Read/write interface
Translating read/write operations to versioning
Synchronizes the store with the current working copy
Methodology: Serialization of RDF data
Multiple RDF serialization formats are available
For versioning with Git we need:
  Same RDF graph = same representation
  Minimal difference between versions
  Meaningful difference between versions
⇒ We have chosen a canonicalized N-Quads serialization
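The three requirements above can be illustrated with a minimal Python sketch. It assumes the data is already valid N-Quads with one statement per line, and it interprets "canonicalized" as deduplicated and sorted lines; the exact canonicalization used by the Quit Store is not specified on this slide, and the function name `canonicalize_nquads` is hypothetical.

```python
def canonicalize_nquads(nquads: str) -> str:
    """Return a deduplicated, sorted N-Quads document (one statement per line)."""
    statements = set()
    for line in nquads.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines; they carry no RDF statements.
        if not line or line.startswith("#"):
            continue
        statements.add(line)
    # Sorting makes the output independent of the input order, so the same
    # graph always yields the same file and Git diffs stay line-based.
    return "".join(statement + "\n" for statement in sorted(statements))


XSD_INT = "<http://www.w3.org/2001/XMLSchema#integer>"
LABEL = '<urn:ex:Tilia> <urn:ex:label> "Tilia"@en <urn:ex:graph> .'
AGE = '<urn:ex:Tilia> <urn:ex:age> "1000"^^' + XSD_INT + ' <urn:ex:graph> .'

# The same graph written in two different statement orders.
version_1 = LABEL + "\n" + AGE + "\n"
version_2 = AGE + "\n\n" + LABEL + "\n" + AGE + "\n"

print(canonicalize_nquads(version_1) == canonicalize_nquads(version_2))
```

With one statement per line in a stable order, adding or removing a statement changes exactly one line of the file, which is what keeps the differences between versions minimal and meaningful.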
Methodology: Blank Nodes in Versioning
Even with RDF as an exchange format, blank nodes are still a problem
Blank node identifiers only have a local scope
… are not persistent or portable identifiers
… are purely an artifact of the serialization
We follow the recommendation of RDF 1.1 to replace blank nodes with IRIs
([Cyganiak et al., 2014] sections 3.4 and 3.5)
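Replacing blank nodes with IRIs could be sketched as follows. This is only an illustration, not the Quit Store implementation: the function name `skolemize`, the authority `http://example.org` and the use of random UUIDs are assumptions. The `/.well-known/genid/` path follows the Skolem IRI scheme suggested in section 3.5 of RDF 1.1 Concepts.

```python
import re
import uuid

# Match IRIs and string literals first so that "_:" inside them is left alone;
# only the third alternative (group 1) is a real blank node label.
TOKEN = re.compile(
    r'<[^>]*>'
    r'|"(?:[^"\\]|\\.)*"'
    r'|(_:[A-Za-z0-9_](?:[A-Za-z0-9_.\-]*[A-Za-z0-9_\-])?)'
)


def skolemize(nquads: str, authority: str = "http://example.org") -> str:
    """Replace every blank node label with a freshly minted Skolem IRI."""
    mapping = {}  # blank node label -> Skolem IRI, stable within one document

    def replace(match):
        label = match.group(1)
        if label is None:
            # An IRI or a literal: return it unchanged.
            return match.group(0)
        if label not in mapping:
            mapping[label] = f"<{authority}/.well-known/genid/{uuid.uuid4().hex}>"
        return mapping[label]

    return TOKEN.sub(replace, nquads)


sample = '_:a <urn:ex:p> "_:a is text" .\n_:a <urn:ex:q> _:b .\n'
print(skolemize(sample))
```

Once written to the repository, the minted IRIs are ordinary persistent identifiers, so later versions of the file can be compared line by line.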
Methodology: Read/Write Interface
SPARQL 1.1 Query and Update
Query proxy providing a SPARQL endpoint
Executes queries on the store
Triggers read or write operations on the versioning layer
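How a query proxy might decide between a read and a write operation can be sketched with a simple keyword check. This is a rough illustration under stated assumptions, not the actual Quit Store logic: the function name `classify` is hypothetical, only the first operation of a request is inspected, and a real implementation would use a full SPARQL parser.

```python
import re

READ_FORMS = {"SELECT", "ASK", "CONSTRUCT", "DESCRIBE"}
WRITE_FORMS = {"INSERT", "DELETE", "LOAD", "CLEAR", "CREATE",
               "DROP", "COPY", "MOVE", "ADD", "WITH"}

# BASE/PREFIX declarations that may precede the operation keyword.
PROLOGUE = re.compile(
    r"\s*(?:BASE\s*<[^>]*>|PREFIX\s+[^\s:]*:\s*<[^>]*>)", re.IGNORECASE
)


def classify(sparql: str) -> str:
    """Return 'read' or 'write' based on the first keyword after the prologue."""
    # Simplification: only whole-line comments are dropped.
    lines = [l for l in sparql.splitlines() if not l.lstrip().startswith("#")]
    text = "\n".join(lines)
    pos = 0
    while True:
        declaration = PROLOGUE.match(text, pos)
        if declaration is None:
            break
        pos = declaration.end()
    keyword = re.match(r"\s*([A-Za-z]+)", text[pos:])
    if keyword is None:
        raise ValueError("no SPARQL operation keyword found")
    word = keyword.group(1).upper()
    if word in READ_FORMS:
        return "read"
    if word in WRITE_FORMS:
        return "write"
    raise ValueError(f"unknown SPARQL operation: {word}")


print(classify("PREFIX ex: <urn:ex:>\nSELECT * WHERE { ?s ?p ?o }"))
print(classify("PREFIX ex: <urn:ex:>\nINSERT DATA { ex:Tilia ex:height 40 }"))
```

A "write" result would be the point at which the versioning layer is triggered, while a "read" result only needs to be answered from the store.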
Methodology: Translating Read/Write Operations
SPARQL read/write operations are transformed to commit, merge, revert, push and pull
Methodology: Commit
A commit is triggered by UPDATE queries
The changed graphs are added and committed in a new Git commit
A commit contains the lines, i.e. the statements, that were added or removed
[Figure: commit graph with commit A followed by commit B]
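The content of a commit, seen as added and removed statements, can be sketched in Python by comparing two line-based serializations. The function name `statement_delta` is hypothetical, and the sketch assumes one statement per line as in the canonicalized serialization; Git itself computes this difference on the files.

```python
def statement_delta(before: str, after: str):
    """Return (added, removed) statement sets between two serializations."""
    old = {line.strip() for line in before.splitlines() if line.strip()}
    new = {line.strip() for line in after.splitlines() if line.strip()}
    return new - old, old - new


LINDA = '<urn:ex:Tilia> <urn:ex:label> "Linda"@de .'
LINDE = '<urn:ex:Tilia> <urn:ex:label> "Linde"@de .'
TILIA = '<urn:ex:Tilia> <urn:ex:label> "Tilia"@en .'

commit_a = "\n".join([LINDA, TILIA]) + "\n"
commit_b = "\n".join([LINDE, TILIA]) + "\n"  # the typo "Linda" was fixed

added, removed = statement_delta(commit_a, commit_b)
print("added:", added)
print("removed:", removed)
```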
Methodology: Commit
A commit always refers to its predecessor, not vice versa
We can also create two commits with the same predecessor
Branching/Forking
[Figure: commit graph with commits A, B, C and a branched commit D]
Methodology: Merge
If the commits have diverged, we need to synchronize the versions
Create a commit with two predecessors
We still need to actually consolidate the graphs
[Figure: commit graph with commits A, B, C, the branched commit D and a merge commit E]
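The commit graph from the figures can be modelled in a few lines of Python. This is a sketch of the data structure only: the class `Commit` and the helper `ancestors` are hypothetical names, and the assumption that D branches off B is not stated on the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    name: str
    parents: tuple = ()  # a commit knows its predecessors, not its successors


def ancestors(commit: Commit) -> set:
    """Return the names of the commit and of everything reachable via parents."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current.name not in seen:
            seen.add(current.name)
            stack.extend(current.parents)
    return seen


a = Commit("A")
b = Commit("B", (a,))
c = Commit("C", (b,))
d = Commit("D", (b,))    # assumption: D has the same predecessor as C
e = Commit("E", (c, d))  # merge commit with two predecessors

# Commits shared by both branches; the newest of them is the merge base.
print(sorted(ancestors(c) & ancestors(d)))
```

Creating E only records that the two lines of history were joined; consolidating the graph content is a separate step.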
Methodology: Merge
Using the default three-way merge from Git
<urn:ex:Tilia> a <urn:ex:Tree> .
<urn:ex:Tilia> <urn:ex:age> "1000"^^xsd:integer .
<urn:ex:Tilia> <urn:ex:label> "Linda"@de .
<urn:ex:Tilia> <urn:ex:label> "Tilia"@en .
On the syntactic level, Git produces conflicts
Branch A
<urn:ex:Tilia> a <urn:ex:Tree> .
<urn:ex:Tilia> <urn:ex:age> "1000"^^xsd:integer .
+ <urn:ex:Tilia> <urn:ex:height> "40"^^xsd:integer .
<urn:ex:Tilia> <urn:ex:label> "Linda"@de .
<urn:ex:Tilia> <urn:ex:label> "Tilia"@en .
Branch B
<urn:ex:Tilia> a <urn:ex:Tree> .
<urn:ex:Tilia> <urn:ex:age> "1000"^^xsd:integer .
+ <urn:ex:Tilia> <urn:ex:label> "Linde"@de .
- <urn:ex:Tilia> <urn:ex:label> "Linda"@de .
<urn:ex:Tilia> <urn:ex:label> "Tilia"@en .
Git Merge:
<urn:ex:Tilia> a <urn:ex:Tree> .
<urn:ex:Tilia> <urn:ex:age> "1000"^^xsd:integer .
<<<<<<< HEAD
<urn:ex:Tilia> <urn:ex:height> "40"^^xsd:integer .
<urn:ex:Tilia> <urn:ex:label> "Linda"@de .
=======
<urn:ex:Tilia> <urn:ex:label> "Linde"@de .
>>>>>>> typo
<urn:ex:Tilia> <urn:ex:label> "Tilia"@en .
But actually there is no conflict
Conflicts have to be looked for on other levels
<urn:ex:Tilia> a <urn:ex:Tree> .
<urn:ex:Tilia> <urn:ex:age> "1000"^^xsd:integer .
<urn:ex:Tilia> <urn:ex:height> "40"^^xsd:integer .
<urn:ex:Tilia> <urn:ex:label> "Linde"@de .
<urn:ex:Tilia> <urn:ex:label> "Tilia"@en .
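The Tilia example merges cleanly once the three versions are treated as sets of statements instead of sequences of lines. The following Python sketch shows one possible statement-level three-way merge; it is an illustration only and not necessarily a strategy that Quit Merge implements, and the function name `three_way_merge` is hypothetical.

```python
def three_way_merge(base: set, ours: set, theirs: set) -> set:
    """Merge two statement sets that diverged from a common base."""
    added = (ours - base) | (theirs - base)    # added on either branch
    removed = (base - ours) | (base - theirs)  # deleted on either branch
    return (base - removed) | added


TYPE = '<urn:ex:Tilia> a <urn:ex:Tree> .'
AGE = '<urn:ex:Tilia> <urn:ex:age> "1000"^^xsd:integer .'
HEIGHT = '<urn:ex:Tilia> <urn:ex:height> "40"^^xsd:integer .'
LINDA = '<urn:ex:Tilia> <urn:ex:label> "Linda"@de .'
LINDE = '<urn:ex:Tilia> <urn:ex:label> "Linde"@de .'
TILIA = '<urn:ex:Tilia> <urn:ex:label> "Tilia"@en .'

base = {TYPE, AGE, LINDA, TILIA}
branch_a = {TYPE, AGE, HEIGHT, LINDA, TILIA}  # adds the height
branch_b = {TYPE, AGE, LINDE, TILIA}          # fixes the typo in the label

for statement in sorted(three_way_merge(base, branch_a, branch_b)):
    print(statement)
```

A merge like this never reports a conflict, so conflicts on other levels, for example two branches adding different German labels, would still have to be detected separately.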
Methodology: Revert
Reverting a commit undoes an earlier change
This is done by exchanging the add-set and the delete-set of statements
[Figure: commit graph with commits A, B and the reverting commit B⁻¹]
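Exchanging the add-set and the delete-set can be sketched as follows. The helper names `apply_delta` and `invert_delta` are hypothetical, and statements are treated as opaque strings.

```python
def apply_delta(statements: set, added: set, removed: set) -> set:
    """Apply a commit, given as add-set and delete-set, to a set of statements."""
    return (statements - removed) | added


def invert_delta(added: set, removed: set):
    """Build the reverting commit B⁻¹ by swapping add-set and delete-set."""
    return removed, added


LINDA = '<urn:ex:Tilia> <urn:ex:label> "Linda"@de .'
LINDE = '<urn:ex:Tilia> <urn:ex:label> "Linde"@de .'
TILIA = '<urn:ex:Tilia> <urn:ex:label> "Tilia"@en .'

state_a = {LINDA, TILIA}
commit_b = ({LINDE}, {LINDA})               # (added, removed)
state_b = apply_delta(state_a, *commit_b)

revert_b = invert_delta(*commit_b)
state_after_revert = apply_delta(state_b, *revert_b)
print(state_after_revert == state_a)
```

The revert is itself a new commit on top of B, so the history A, B, B⁻¹ stays intact.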
Implementation
[Figure: Quit Store architecture. Components: SPARQL 1.1 Interface, Query-Analyzer, Quad-Store, File References, Local Git Repository, Public Git Repository. Flows: SPARQL Query, Select, Update, Dump to files, Parse files, Response]
Written in Python, using the Flask API as the HTTP interface and RDFLib for SPARQL and RDF
Integration
Quit Store has the role of managing the repository
Provide the read/write interface
Synchronize the repository and the store
[Figure: the Quit components Quit Store (RDF Quad Store, SPARQL 1.1 Interface for Query & Update), Quit Diff, Quit Merge and Quit Notify]
Quit Diff can calculate differences between commits
Trace provenance of statements
Transmit patches to collaborators
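Transmitting a difference as a patch could look like the following Python sketch. The patch format, one statement per line prefixed with "+ " or "- " as in the merge example, and the function names `make_patch` and `apply_patch` are assumptions for illustration and not the actual Quit Diff format.

```python
def make_patch(old: set, new: set) -> str:
    """Serialize the difference between two statement sets as +/- lines."""
    removed = sorted(old - new)
    added = sorted(new - old)
    return ("".join(f"- {statement}\n" for statement in removed)
            + "".join(f"+ {statement}\n" for statement in added))


def apply_patch(statements: set, patch: str) -> set:
    """Apply a patch produced by make_patch on the collaborator's side."""
    result = set(statements)
    for line in patch.splitlines():
        if line.startswith("+ "):
            result.add(line[2:])
        elif line.startswith("- "):
            result.discard(line[2:])
    return result


LINDA = '<urn:ex:Tilia> <urn:ex:label> "Linda"@de .'
LINDE = '<urn:ex:Tilia> <urn:ex:label> "Linde"@de .'
TILIA = '<urn:ex:Tilia> <urn:ex:label> "Tilia"@en .'

old_version = {LINDA, TILIA}
new_version = {LINDE, TILIA}

patch = make_patch(old_version, new_version)
print(patch, end="")
```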
Integration & Future Work
Quit Notify can actively inform other clones of updates
This enables distributed setups for collaboration and synchronization
Quit Merge will implement various merge strategies for RDF
Detect conflicts in diverged versions
Conclusion
With Quit we have presented a methodology for
version control and tracking provenance of contributions,
synchronization: clone, push and pull by other participants, and
distributed collaboration on RDF datasets (gitflow)
Hopefully this can help to utilize the big ecosystem of methodologies and tools around Git
Questions?
Natanael Arndt
References
Cyganiak, R., Wood, D., and Lanthaler, M. (2014). RDF 1.1 Concepts and Abstract Syntax. https://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/