Back to the Future –Should SQL Surrender to SPARQL?

Rainer Manthey

© 2015 Prof. Dr. Rainer Manthey SOFSEM 2015 1

How to Communicate with Databases?

From: http://www.intsolgrp.com/

?

Communicating with Google: Our Everyday Experience

Our request: A line of symbols

Google's answer: 139 million links

( … If Google is/has/uses a database?! )

Asking a Relational Database: More Complex, More Goal-Directed

From: technet.microsoft.com

Our request: An SQL query

The DB's answer: A table with data rows

Reminder of Basic Terminology: DBS = DBMS + n*DB

• DBMS: Database Management System
• DB: Database
• DBS: Database System (one DBMS managing several DBs)

Basics (2): Query Language and Query Manager

[Figure: a DBS with a DBMS and two DBs; labels: Query (= declarative program), Query Language Interpreter]

Relational Databases and SQL Systems: A Multi-Billion Dollar Market

• 1970: Proposal of the Relational Model of Data (RM) by Edgar F. Codd
• 1974: Design of SQL by Chamberlin/Boyce started
• 1979: First commercial SQL DBMS (Oracle 2)
• 1986: First SQL standard

RM/SQL: A success story of more than 30 years ... up to now?

SQL: End of an Era?

NoSQL takes the database market by storm

Is it the end of the line for SQL?

Are SQL Databases Dead?

The relational model is dead, SQL is dead, and I don't feel so good myself

History Repeats Itself: Sensible and NonsenSQL Aspects of the NoSQL Hoopla

From: http://crossfitlittleton.net

SPARQL: The Hardest New Competitor

The Semantic Web Dream: SPARQL's Vision and Goal

“I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers.

A "Semantic Web", which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The "intelligent agents" people have touted for ages will finally materialize.”

© 2014 by LyonLabs, LLC and Barrett Lyon

From: "Weaving the Web" (1999), Sir Timothy Berners-Lee

W3C Activities in Developing New Query Languages

From: taxoncuration.myspecies.info

A SPARQL Taster

Restriction in this Talk: No Distributed Data Management!

SPARQL:

• Designed for managing data over "the Semantic Web"
• Navigation in distributed data (re)sources is a big issue
• IRIs are used intensively as identifiers for such (re)sources
• At the same time able to manage data without a web

In this presentation: All web-related aspects of SPARQL are ignored, as SQL was not made for this context.

On Syntax: Triples vs. Tuples

• SQL: query language based on the data model RM
• SPARQL: query language based on the data model RDF

• Goal of this contribution: Compare SQL and SPARQL with respect to their data management capabilities only!

• Therefore: First look at the underlying data models of the two languages!
    • RM: Tables of rows and columns (or: relations as sets of tuples)
    • RDF: Datasets consisting of triples

RDF: The (Only?) Data Model for the Semantic Web

“RDF (Resource Description Framework) is one of the three foundational Semantic Web technologies, the other two being SPARQL and OWL.

In particular, RDF is the data model of the Semantic Web. That means that all data in Semantic Web technologies is represented as RDF.

If you store Semantic Web data, it's in RDF. If you query Semantic Web data (typically using SPARQL), it's RDF data. If you send Semantic Web data to your friend, it's RDF.”

http://www.cambridgesemantics.com/semantic-university/rdf-101

RDF Data: Graphs or Triples?

From: http://www.openarchives.org/ore/1.0/primer

[Figure: the same RDF data in graph representation (nodes marked as resource or literal) and in triple representation]

RDF Datasets Are Relations

Some quite obvious observations:

• Every RDF triple can be perceived as a relational tuple.
• Every RDF dataset can be perceived as a relational table.
• Every RDF dataset has the same attributes: S, P, O

⇒ We could accommodate every RDF database in an RM database!

(If we wanted to do so!)
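The observation above can be tried out directly. The following is a minimal sketch, assuming Python's built-in sqlite3 module as the relational system; the table name `dataset` and the sample triples are hypothetical and not taken from the talk.

```python
import sqlite3

# Hypothetical sample triples (not from the talk).
triples = [
    ("person1", "name", "Jim"),
    ("person1", "age", "23"),
    ("person1", "knows", "person2"),
    ("person2", "name", "Ann"),
]

conn = sqlite3.connect(":memory:")
# Every RDF dataset has the same three attributes: S, P, O.
conn.execute("CREATE TABLE dataset (S TEXT, P TEXT, O TEXT, PRIMARY KEY (S, P, O))")
conn.executemany("INSERT INTO dataset VALUES (?, ?, ?)", triples)

# Ordinary SQL now works on the triples, e.g. all statements about person1.
about_person1 = conn.execute(
    "SELECT P, O FROM dataset WHERE S = ? ORDER BY P", ("person1",)
).fetchall()
print(about_person1)
```

The three-column primary key reflects that a dataset is a set of triples: the same statement cannot be stored twice.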

Relational Tables Represented in RDF

T as an RM table (A: primary key attribute):

    T:  A    B    C    D    E
        . . .
        1    a    23   Jim  4.5
        . . .

T as an RDF dataset:

    T:  S    P    O
        . . .
        1    B    a
        1    C    23
        1    D    Jim
        1    E    4.5
        . . .

• N-ary tuple into n-1 triples
• Attributes into predicate values, i.e., meta-data into data
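The transformation can be sketched in a few lines. This is an illustration under the assumption that a tuple is given as a Python dict from attribute names to values; the helper names `tuple_to_triples` and `triples_to_tuple` are made up for this example.

```python
def tuple_to_triples(row, key):
    """Turn one n-ary tuple (dict: attribute -> value) into n-1 triples.

    The value of the primary key attribute becomes the subject, every
    other attribute name becomes a predicate (meta-data into data).
    """
    subject = row[key]
    return [(subject, attr, value) for attr, value in row.items() if attr != key]


def triples_to_tuple(triples, key):
    """Rebuild the tuple from triples that all share one subject."""
    subjects = {s for s, _, _ in triples}
    if len(subjects) != 1:
        raise ValueError("expected triples about exactly one subject")
    row = {key: subjects.pop()}
    row.update({p: o for _, p, o in triples})
    return row


# The row of table T shown above, with A as primary key attribute.
row = {"A": 1, "B": "a", "C": 23, "D": "Jim", "E": 4.5}
triples = tuple_to_triples(row, key="A")
print(triples)
```

Going back from triples to the tuple needs the name of the key attribute, because that name is the one piece of meta-data the triples do not carry.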

Tuples as Graphs, RM DBs as Graph Databases

Tuple in serialized notation:

    T:  A    B    C    D    E
        . . .
        1    a    23   Jim  4.5
        . . .

Tuple in graphical notation:

[Figure: node 1 connected to the nodes a, 23, Jim and 4.5 by edges labelled B, C, D and E]

• Tables in RM can represent graphs as easily as RDF datasets.
• No need to introduce a new data model for "graph-structured data".
• RM databases are graph databases.

RM vs. RDF: Brief Summary

• Triples are special tuples.
    • Uniform length 3, representing SPO statements
    • S, O: "Things", P: "Relationships"

• Tuples can be turned into sets of triples (systematically):
    • Provided they have a unary primary key!
    • Attributes are turned into data: become queryable!

• Datasets are special tables.
• Tables can be turned into datasets.

RM and RDF are (in principle) equally expressive.

SQL Basics (1)

Table:

    T:  A    B    C    D    E
        . . .
    x   7    a    123  eg   2.1
        . . .

SQL query:

    SELECT B, E
    FROM T
    WHERE A = 7

in full syntax:

    SELECT x.B, x.E
    FROM T AS x
    WHERE x.A = 7

• x: tuple variable
• Attributes: Functions
    • Written in postfix notation, e.g., x.A
    • Applied to each tuple in turn
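The reading of x as a tuple variable and of attributes as functions applied to each tuple in turn can be mimicked in plain Python. This is only an illustrative sketch: the `Row` type and the second row of `T` are assumptions added for the example.

```python
from collections import namedtuple

# A row type whose fields play the role of the attributes A..E.
Row = namedtuple("Row", ["A", "B", "C", "D", "E"])

T = [
    Row(7, "a", 123, "eg", 2.1),   # the row shown in the table above
    Row(8, "b", 456, "fh", 3.4),   # hypothetical second row
]

# SELECT x.B, x.E FROM T AS x WHERE x.A = 7
# x ranges over the tuples of T; x.A, x.B, x.E select components of x.
result = [(x.B, x.E) for x in T if x.A == 7]
print(result)
```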

SQL Basics (2)

    T:  A    B    C    D    E
        . . .
    x   . .  a    123  . .  . .
        . . .
    y   123  . .  . .  . .  3.4
        . . .

    SELECT x.B, y.E
    FROM T AS x, T AS y
    WHERE x.C = y.A

In SQL: Tuples from different tables (or copies of a table) are linked by explicit comparisons of attribute values from both tuples.
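A runnable version of this self-join, as a sketch using Python's built-in sqlite3 module. The concrete values in the columns that the slide leaves open are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T (A INTEGER PRIMARY KEY, B TEXT, C INTEGER, D TEXT, E REAL)"
)
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?, ?)",
    [
        (1, "a", 123, "u", 0.5),    # plays the role of tuple x (C = 123)
        (123, "b", 999, "v", 3.4),  # plays the role of tuple y (A = 123)
    ],
)

# Two tuple variables over the same table, linked by an explicit comparison.
rows = conn.execute(
    "SELECT x.B, y.E FROM T AS x, T AS y WHERE x.C = y.A"
).fetchall()
print(rows)
```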

SPARQL Basics

SPARQL query:

    SELECT ?x ?y
    FROM T
    WHERE { ?x 2 ?z .
            ?z ?y a . }

    T:  S    P    O
        . . .
        ?x   2    123 (= ?z)
        . . .
        123  ?y   a
        . . .

• ?x, ?y, ?z: triple component variables
• Each triple is represented by a single (triple) pattern in the WHERE part
• Positional syntax, not attributes as selectors
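How triple patterns with shared variables select data can be sketched with a tiny pattern matcher. This is a simplified illustration, not how a SPARQL engine is actually built: the helper names are made up, the sample triples are hypothetical, and constants such as 2 and a are kept as schematic as on the slide.

```python
def is_var(term):
    """Variables are written as strings starting with '?', as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if term in extended and extended[term] != value:
                return None          # same variable, different value
            extended[term] = value
        elif term != value:
            return None              # constant in the wrong position
    return extended


def evaluate(patterns, dataset):
    """Match all patterns conjunctively; return the list of variable bindings."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in dataset:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


T = [
    ("s1", 2, 123),
    (123, "p1", "a"),
    ("s2", 2, 456),      # hypothetical: its continuation does not end in "a"
    (456, "p2", "b"),
]
patterns = [("?x", 2, "?z"), ("?z", "?y", "a")]
result = [(b["?x"], b["?y"]) for b in evaluate(patterns, T)]
print(result)
```

The join between the two triples is never written down as a comparison; it follows from using ?z in both patterns.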

Common Query Processing Paradigm

[Figure: a query with its FROM, WHERE and SELECT parts]

• Common principle: Sets of data elements (triples, tuples) as both input and output

• Difference:
    • In SQL: Both input and output are tables; the output can always be used as further input, so algebraic composition is possible.
    • In SPARQL: The output does not necessarily consist of triples, thus no composition is possible.
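Composition on the SQL side can be illustrated with a query whose FROM part is itself a query. A minimal sketch using Python's built-in sqlite3 module; the table contents and the alias `big` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (A INTEGER PRIMARY KEY, B TEXT, E REAL)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(7, "a", 2.1), (8, "b", 3.4), (9, "a", 5.0)],
)

# The inner SELECT produces a table, which the outer SELECT uses as input.
rows = conn.execute(
    """
    SELECT big.A
    FROM (SELECT A, B, E FROM T WHERE E > 3.0) AS big
    WHERE big.B = 'a'
    """
).fetchall()
print(rows)
```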

Datalog: SQL's (Relatively) Unknown Brother

• SQL is (was) not the only relational query language, e.g.:
    • Theoretical languages: calculus-based (TRC, DRC), relational algebra (RA)
    • Early languages: QUEL, Query-by-Example

• Nearly as old as SQL (developed in the 1970s/80s): Datalog (Database + PROLOG)
    • Syntactically: Like pure PROLOG (facts and rules, goals as queries)
    • Semantically: Like SQL (set-oriented evaluation, no backtracking)

• In style:
    • Datalog: Minimalistic, purely symbolic (mathematical)
    • SQL: Verbose, rich in variants, English keywords (user-friendly?)

• In science: Quite successful for understanding complex problems (e.g., recursion)
• Commercially: Completely "irrelevant", no Datalog DBMS product ever
• Datalog was never standardized: Free for scientific experiments

• Datalog is (at least) as expressive as SQL, if equipped with the same built-ins.

SQL and Datalog: Two Different Linguistic Approaches to Querying

Datalog rule:

    p(X,Y) ← t(X,2,Z), t(Z,Y,a).

• Based on DRC: Domain Relational Calculus
• Variables represent individual tuple components.
• No attributes necessary!
• Strictly symbolic style

SQL view:

    CREATE VIEW p AS
    SELECT x.A, y.B
    FROM t AS x, t AS y
    WHERE x.B = 2 AND y.C = 'a' AND x.C = y.A

• Based on TRC: Tuple Relational Calculus
• Variables represent entire tuples.
• Tuple components accessed via attributes!
• Keyword-based style ("verbose")
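That the rule and the view define the same relation p can be checked on sample data. The sketch below assumes Python's built-in sqlite3 module for the view and imitates the Datalog rule with a set comprehension in domain-calculus style (one variable per component). The facts in `facts` are hypothetical, and the Datalog constant a is written as the string 'a'.

```python
import sqlite3

# Hypothetical facts for t, with components in the order (A, B, C).
facts = [
    ("s1", 2, 123),
    (123, "p1", "a"),
    ("s2", 2, 456),
    (456, "p2", "b"),
]

# SQL view: tuple variables x and y, components accessed via attributes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (A, B, C)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", facts)
conn.execute(
    """
    CREATE VIEW p AS
    SELECT x.A, y.B
    FROM t AS x, t AS y
    WHERE x.B = 2 AND y.C = 'a' AND x.C = y.A
    """
)
view_rows = set(conn.execute("SELECT * FROM p").fetchall())

# p(X,Y) <- t(X,2,Z), t(Z,Y,a): variables stand for single components.
datalog_rows = {
    (X, Y)
    for (X, P1, Z) in facts
    for (Z2, Y, O2) in facts
    if P1 == 2 and O2 == "a" and Z == Z2
}
print(view_rows, datalog_rows)
```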

SPARQL and Datalog: Two Real „Brother“ Languages

SPARQL query:

    SELECT ?x ?y
    FROM t
    WHERE { ?x 2 ?z . ?z ?y a . FILTER (?y > ?z) }

Datalog query:

    { (X,Y) : t(X,2,Z), t(Z,Y,a), Y > Z } ?

• Obviously very similar basic principle!
• In both languages: Variables represent components of tuples/triples
• Literals in Datalog = Triple patterns in SPARQL
• More than one literal/triple pattern: connected conjunctively (AND)
• Identity conditions expressed indirectly in literals/triple patterns:
    • Constant values appearing in a suitable position
    • Identity of values in different positions: same variable

SPARQL and SQL: Quite Unrelated (Except on the Surface)

SPARQL query:

    SELECT ?x ?y
    FROM t
    WHERE { ?x 2 ?z . ?z ?y a . FILTER (?y > ?z) }

Datalog query:

    { (X,Y) : t(X,2,Z), t(Z,Y,a), Y > Z } ?

In comparison: SQL is very different in "philosophy" and style from both of these!

SQL query:

    SELECT t1.A, t2.B
    FROM T AS t1, T AS t2
    WHERE t1.B = 2 AND t1.C = t2.A AND
          t2.C = 'a' AND t2.B > t2.A
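The difference in style can be made concrete by translating triple patterns mechanically into an SQL self-join: every constant and every repeated variable, implicit in the pattern notation, turns into an explicit comparison. The following is a hedged sketch only. The function `patterns_to_sql` is invented for this illustration, the FILTER condition is left out, the table is assumed to have the columns A, B, C as on the earlier slides, and the sample rows are hypothetical.

```python
import sqlite3

COLUMNS = ("A", "B", "C")  # positions S, P, O of the table t


def patterns_to_sql(patterns, select_vars, table="t"):
    """Translate triple patterns into an SQL self-join with explicit comparisons."""
    first_seen = {}   # variable -> "alias.column" of its first occurrence
    conditions = []   # explicit comparisons for the WHERE part
    params = []       # constants, passed as query parameters
    sources = []      # one tuple variable (alias) per triple pattern
    for i, pattern in enumerate(patterns, start=1):
        alias = f"t{i}"
        sources.append(f"{table} AS {alias}")
        for column, term in zip(COLUMNS, pattern):
            ref = f"{alias}.{column}"
            if isinstance(term, str) and term.startswith("?"):
                if term in first_seen:
                    # Same variable again: becomes an equality comparison.
                    conditions.append(f"{first_seen[term]} = {ref}")
                else:
                    first_seen[term] = ref
            else:
                # Constant in some position: becomes a comparison with a value.
                conditions.append(f"{ref} = ?")
                params.append(term)
    sql = "SELECT " + ", ".join(first_seen[v] for v in select_vars)
    sql += " FROM " + ", ".join(sources)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


sql, params = patterns_to_sql([("?x", 2, "?z"), ("?z", "?y", "a")], ["?x", "?y"])
print(sql)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (A, B, C)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("s1", 2, 123), (123, "p1", "a"), ("s2", 2, 456), (456, "p2", "b")],
)
rows = conn.execute(sql, params).fetchall()
print(rows)
```

Apart from the FILTER condition and the parameter placeholders, the generated statement has the same shape as the SQL query above.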

SQL and SPARQL: A Brief Summary of Additional Complex Features

• SQL: More complex queries constructed by . . .
    • Combining SELECT-FROM-WHERE blocks using set operators UNION, INTERSECT, MINUS
    • Nesting SFW blocks (using the EXISTS quantifier in WHERE conditions)
    • Explicit propositional operators AND, OR, NOT
    • Aggregate functions (e.g., COUNT, AVG) and GROUP BY
    • Ordering of query results: ORDER BY

• SPARQL:
    • UNION operator available for merging patterns in WHERE parts
    • No other set operators, no combination of several queries
    • EXISTS operator since SPARQL 1.1 for nested patterns in FILTER
    • Boolean operators only in special situations
    • Aggregation as in SQL since SPARQL 1.1
    • ORDER BY as in SQL
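Three of the SQL features listed above, shown on a small example. This is a sketch using Python's built-in sqlite3 module with a hypothetical table; note that the set difference is spelled EXCEPT in standard SQL and in SQLite, while MINUS is the Oracle spelling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (A INTEGER PRIMARY KEY, B TEXT, E REAL)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(1, "a", 2.0), (2, "a", 4.0), (3, "b", 6.0)],
)

# Set operator combining two SELECT-FROM-WHERE blocks.
difference = conn.execute(
    "SELECT A FROM T WHERE E > 1 EXCEPT SELECT A FROM T WHERE B = 'b' ORDER BY A"
).fetchall()

# Nested block with EXISTS: rows that share their B value with another row.
shared = conn.execute(
    """
    SELECT x.A FROM T AS x
    WHERE EXISTS (SELECT 1 FROM T AS y WHERE y.B = x.B AND y.A <> x.A)
    ORDER BY x.A
    """
).fetchall()

# Aggregation with GROUP BY and ordering of the result.
groups = conn.execute(
    "SELECT B, COUNT(*), AVG(E) FROM T GROUP BY B ORDER BY B"
).fetchall()
print(difference, shared, groups)
```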

SPARQL has been enhanced step by step with further SQL keywords.

Back to the Future?

• Successful science fiction movie from 1985
• Crazy inventor tries to travel through time using a futuristic high-tech car
• Reaches the past (1955), aiming at the future
• 30 years back (like SPARQL to SQL)

As far as data management is concerned, SPARQL seems to be a step back in time.

SQL and (even more so) Datalog are very close to it, but this is hidden by idiosyncratic new syntax details and by IRIs all over the place.

Conclusion: A Tale of Two Languages

Should SQL Surrender to SPARQL?

• As far as database management is concerned: Certainly not!
    • Both RM and RDF are very close in style and equally expressive.
    • SPARQL cannot really claim any advantage with respect to graph databases.
    • The two languages have more commonalities than differences.
    • Superiority on the SPARQL side is not really visible.
    • Surprising: SPARQL is much closer to Datalog than to SQL!

• As far as "serving the web" is concerned: No competition by SQL (yet)!

Some (more) personal opinions:

• SPARQL's style ("look and feel") is consistent in some aspects, but appears to me quite ugly and overblown otherwise.
• The documentation of SPARQL & Co by W3C is hard to "digest".
• The "propaganda" for SPARQL by the "Semantic Web Movement" is making fair comparisons hard.
• Whether the SQL vendors will again be able to "swallow" a competitor this time remains to be seen . . . I have my doubts.