DANIEL J. ABADI, ADAM MARCUS, SAMUEL R. MADDEN, AND KATE HOLLENBACH.
2009. THE VLDB JOURNAL.
SW-Store: a vertically partitioned DBMS for Semantic Web data management
Presented by: Sanam Narula
Agenda
Introduction
Current State Of Art
A simple Alternative
SW Store Design
Benchmarks and Results
Introduction
SW-Store is a new DBMS built as an extension of a column-oriented DBMS.
It achieves high performance for RDF data management.
Current State Of Art
Efficient storage mechanism for RDF triples
Example query: find all authors of books whose title contains the word “Transaction”.
On a triples table this requires a 5-way self-join.
The query also becomes very slow to execute as the data size increases.
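To make the self-join cost concrete, here is a minimal sketch using Python's built-in sqlite3 module. The sample data and the property names (type, title, author) are assumptions for illustration, and the query is a reduced three-way version rather than the five-way join mentioned on the slide. Each property the query touches adds one more copy of the triples table to the join.

```python
import sqlite3

# Hypothetical sample data; the property names are illustrative only.
TRIPLES = [
    ("book1", "type", "book"),
    ("book1", "title", "Transaction Processing"),
    ("book1", "author", "Gray"),
    ("book2", "type", "book"),
    ("book2", "title", "Compilers"),
    ("book2", "author", "Aho"),
    ("cd1", "type", "cd"),
    ("cd1", "title", "Transaction Blues"),
    ("cd1", "author", "Smith"),
]


def authors_of_matching_books(word):
    """Return authors of books whose title contains `word`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (subj TEXT, prop TEXT, obj TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", TRIPLES)
    # The single triples table is joined with itself once per property.
    rows = conn.execute(
        """
        SELECT a.obj
        FROM triples AS t, triples AS ti, triples AS a
        WHERE t.prop = 'type' AND t.obj = 'book'
          AND ti.prop = 'title' AND ti.obj LIKE ?
          AND a.prop = 'author'
          AND t.subj = ti.subj AND t.subj = a.subj
        ORDER BY a.obj
        """,
        (f"%{word}%",),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(authors_of_matching_books("Transaction"))
```

The CD whose title also contains "Transaction" is filtered out by the type predicate, which is exactly the kind of condition that forces another self-join.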
Current State of Art
Two approaches to implementing property tables:
1. Data clustering approach: denormalize the RDF tables by storing them in a wider, flattened representation.
2. Property-class table: exploits the type property of subjects to cluster similar sets of subjects together in the same table.
Data Clustering Approach
Property Class table
It exploits the type property of subjects to cluster similar subjects into the same table.
Advantages & Disadvantages
Advantages:
• Improves performance by reducing the number of self-joins.
• Opens up the possibility of attribute typing.
• Fewer NULL values.
Disadvantages:
• Increases complexity by requiring property clustering.
• Multi-valued attributes remain a problem.
• NULL values are reduced but still present.
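The NULL and multi-valued attribute problems can be seen in a small sketch. This is not the clustering algorithm from the paper: the column set is chosen by hand, and the function names and sample data are hypothetical. It flattens triples into one wide row per subject and reports what does not fit.

```python
# Hypothetical sample data: book1 has two authors, cd1 has no author.
TRIPLES = [
    ("book1", "title", "Transaction Processing"),
    ("book1", "author", "Gray"),
    ("book1", "author", "Reuter"),
    ("book2", "title", "Compilers"),
    ("cd1", "title", "Blue"),
    ("cd1", "artist", "Smith"),
]


def build_property_table(triples, columns):
    """Flatten triples into one row per subject with a fixed column set.

    Returns (rows, leftovers). rows maps subject -> {column: value or None}.
    leftovers holds triples that do not fit: either the property is not a
    column, or the column already holds a value (multi-valued attribute).
    """
    rows = {}
    leftovers = []
    for subj, prop, obj in triples:
        if prop not in columns:
            leftovers.append((subj, prop, obj))
            continue
        row = rows.setdefault(subj, dict.fromkeys(columns))
        if row[prop] is None:
            row[prop] = obj
        else:
            leftovers.append((subj, prop, obj))
    return rows, leftovers


def count_nulls(rows):
    """Count empty cells in the flattened table."""
    return sum(value is None for row in rows.values() for value in row.values())


rows, leftovers = build_property_table(TRIPLES, ["title", "author", "artist"])
print(count_nulls(rows), leftovers)
```

With these six triples the wide table has four empty cells, and the second author of book1 has nowhere to go without an extra table.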
A Simple Alternative
Approach: a fully decomposed storage model.
The triples table is rewritten into n two-column tables, one per property.
Each table is sorted by subject so that fast merge joins can take place.
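The decomposition and the merge join it enables can be sketched as follows. This is an in-memory illustration only; the function names and sample data are assumptions, not SW-Store's actual implementation.

```python
from collections import defaultdict

# Hypothetical sample data; book1 has a multi-valued author property.
TRIPLES = [
    ("book1", "title", "Transaction Processing"),
    ("book1", "author", "Gray"),
    ("book1", "author", "Reuter"),
    ("book2", "title", "Compilers"),
    ("cd1", "title", "Blue"),
    ("cd1", "artist", "Smith"),
]


def vertically_partition(triples):
    """Split triples into one (subject, object) table per property, sorted by subject."""
    tables = defaultdict(list)
    for subj, prop, obj in triples:
        tables[prop].append((subj, obj))
    return {prop: sorted(rows) for prop, rows in tables.items()}


def merge_join(left, right):
    """Join two subject-sorted tables on subject in a single linear pass."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_subj, right_subj = left[i][0], right[j][0]
        if left_subj < right_subj:
            i += 1
        elif left_subj > right_subj:
            j += 1
        else:
            # Find the run of equal subjects on each side, so that
            # multi-valued attributes produce one output row per value pair.
            i_end = i
            while i_end < len(left) and left[i_end][0] == left_subj:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_subj:
                j_end += 1
            for _, left_obj in left[i:i_end]:
                for _, right_obj in right[j:j_end]:
                    out.append((left_subj, left_obj, right_obj))
            i, j = i_end, j_end
    return out


tables = vertically_partition(TRIPLES)
print(merge_join(tables["title"], tables["author"]))
```

Only the title and author tables are touched by this join; the artist table is never read.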
Analysis of this approach
Advantages:
• Effective handling of multi-valued attributes.
• Support for heterogeneous records.
• Only the property tables required by a query need to be read.
• No clustering algorithms needed.
• Fewer unions.
Disadvantages:
• Increased number of joins.
• Slow inserts.
Extending a column Oriented DBMS
The idea is to store tables as collections of columns rather than collections of rows.
Projection occurs for free: only the columns relevant to a query need to be read into memory.
Record headers are stored in separate columns, which reduces tuple width and lets us choose a different compression technique for each column.
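As one illustration of per-column compression, a sorted subject column compresses well with run-length encoding. The slides do not say which techniques are used, so run-length encoding here is an assumed example, and the table layout (a dict of column lists) is a simplification.

```python
from itertools import groupby


def run_length_encode(column):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


# Hypothetical two-column table stored column by column.
table = {
    "subject": ["book1", "book1", "book1", "book2"],
    "object": ["Gray", "Lorie", "Reuter", "Aho"],
}

# Projection: a query that needs only subjects never touches table["object"].
encoded_subjects = run_length_encode(table["subject"])
print(encoded_subjects)
```

Because each column is stored separately, the sorted subject column can use run-length encoding while the object column uses something else, or nothing at all.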
SW-Store Design
Input: RDF data in the form of triples <subject, property, object>.
Output: an efficient storage system for RDF data.
Objective: improve query performance for complex real-world queries.
Components
SW-Store consists of four major components:
• A vertically partitioned storage system.
• A relational query engine.
• A query rewriter that converts incoming queries to run over the vertical partitions.
• A batch writer.
Data Representation
Query Engine and Query Translation
• Each column is scanned to produce tuples that satisfy all three predicates.
• The tupleize operator becomes a merge join over two-column vertical partitions.
Overflow Table
• A mechanism to support inserts in batches.
• An additional table in the standard triples schema.
• Not indexed or read-optimized.
• Properties that appear only a small number of times in the overflow table are not merged, due to the cost of merging.
• Horizontal “chunks” improve the efficiency of merging.
• Queries must go to both the overflow table and the vertical partitions.
• A merge operation is still needed.
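The interaction between the overflow table, the batch merge, and query processing can be sketched with a toy in-memory model. The class name, the merge_threshold parameter, and its default value are hypothetical, and horizontal chunking is not modeled.

```python
from bisect import insort
from collections import Counter


class OverflowStore:
    """Toy model: subject-sorted vertical partitions plus an unindexed overflow table."""

    def __init__(self, merge_threshold=2):
        self.partitions = {}  # property -> subject-sorted list of (subject, object)
        self.overflow = []    # plain triples in arrival order, not indexed
        self.merge_threshold = merge_threshold  # hypothetical tuning knob

    def insert(self, subj, prop, obj):
        """Inserts are cheap: they only append to the overflow table."""
        self.overflow.append((subj, prop, obj))

    def merge_batch(self):
        """Move properties with enough pending triples into their partitions."""
        counts = Counter(prop for _, prop, _ in self.overflow)
        remaining = []
        for subj, prop, obj in self.overflow:
            if counts[prop] >= self.merge_threshold:
                # Keep the partition sorted by subject for merge joins.
                insort(self.partitions.setdefault(prop, []), (subj, obj))
            else:
                # Rare properties stay behind; merging them is not worth the cost.
                remaining.append((subj, prop, obj))
        self.overflow = remaining

    def lookup(self, prop):
        """A query must consult both the partition and the overflow table."""
        merged = list(self.partitions.get(prop, []))
        pending = [(s, o) for s, p, o in self.overflow if p == prop]
        return sorted(merged + pending)


store = OverflowStore(merge_threshold=2)
store.insert("book2", "author", "Aho")
store.insert("book1", "author", "Gray")
store.insert("book1", "isbn", "123")
store.merge_batch()
print(store.partitions, store.overflow)
```

After the batch merge, the two author triples live in a sorted partition while the single isbn triple remains in the overflow table, so a lookup on isbn still has to scan the overflow.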
Benchmark and Results
Property tables and vertical partitioning outperform the triple store by a factor of 2-3.
C-Store adds another factor of 10 in performance.
For property tables, careful selection of column names is required.
Vertical partitioning represents the best case and worst case scenario.
Linear scaling for all tested queries
Thanks