On the need for a W3C community group on RDF Stream Processing
by Oscar Corcho @ISWC2013 Workshop on Ordering and Reasoning, Sydney, 22/10/2013
![Page 1: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/1.jpg)
On the need for a W3C community group on RDF Stream Processing
ISWC2013 Workshop on Ordering and Reasoning, Sydney, 22/10/2013
Oscar Corcho
@ocorcho
http://www.slideshare.net/ocorcho/
![Page 2: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/2.jpg)
Disclaimer…
This presentation expresses my view, which is not necessarily that of the rest of the group (although I hope that it is similar)
![Page 3: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/3.jpg)
Acknowledgements
• All those that I have “stolen” slides, material and ideas from
  • Emanuele Della Valle
  • Daniele Dell’Aglio
  • Marco Balduini
  • Jean Paul Calbimonte
• And many others who have already started contributing…
![Page 4: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/4.jpg)
Why set up a community group?
Heterogeneity
• In RDF Stream models (timestamps, events, time intervals, triple-based, graph-based, …)
• In RDF Stream query languages (windows, stream selection, CEP-based operators, …)
• In implementations (RDF native, query rewriting, continuous query registration, scalability, static vs streaming data, …)
• In operational semantics (tick, window content, report)
![Page 5: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/5.jpg)
You may think that we do not like heterogeneity…
![Page 6: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/6.jpg)
But at least I love it…
• However, we need to tell people what to expect with each system, and smooth differences when they are not crucial…
![Page 7: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/7.jpg)
The solution…
• Let’s create a W3C community group…
  • To understand better those differences
  • The requirements on which we are based
  • And explain to others
  • …
  • And maybe get some “recommendation” out
![Page 8: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/8.jpg)
The W3C RDF Stream Processing Community Group
• http://www.w3.org/community/rsp/
![Page 9: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/9.jpg)
W3C RSP Community Group mission
“The mission of the RDF Stream Processing Community Group (RSP) is to define a common model for producing, transmitting and continuously querying RDF Streams. This includes extensions to both RDF and SPARQL for representing streaming data, as well as their semantics. Moreover this work envisions an ecosystem of streaming and static RDF data sources whose data can be combined through standard models, languages and protocols. Complementary to related work in the area of databases, this Community Group looks at the dynamic properties of graph-based data, i.e., graphs that are produced over time and which may change their shape and data over time.”
![Page 10: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/10.jpg)
Use cases
• We have started collecting them
• And I hope that by the end of my talk you will consider contributing some more…
![Page 11: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/11.jpg)
A template to describe use cases (I)
• Streaming Information
  • Type: Environmental data: temperatures, pressures, salinity, acidity, fluid velocities, etc.
  • Nature:
    • Relational Stream: yes
    • Text stream: no
  • Origin: Data is produced by sensors in oil wells and on oil and gas platform equipment. Each oil platform has an average of 400.000.
  • Frequency of update:
    • from sub-second to minutes
    • In triples/minute: [10000-10] t/min
  • Quality: It varies, due to instrument/sensor issues
  • Management / access
    • Technology in use: Dedicated (relational and proprietary) stores
    • Problems: The ability of users to access data from different sources is limited by an insufficient description of the context
    • Means of improvement: Add context (metadata) to the data so it becomes meaningful and use reasoning techniques to process that metadata
![Page 12: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/12.jpg)
A template to describe use cases (II)
• [optional] Static Information required to interpret the streaming information
  • Type: Topology of the sensor network, position of each sensor, the descriptions of the oil platform
  • Origin: Oil and gas production operations
  • Dimension:
    • 100s of MB as PostGIS dump
    • In triples: 10^8
  • Quality: Good
  • Management / access
    • Technology in use: RDBMS, proprietary technologies
    • Available Ontologies and Vocabularies: Reference Semantic Model (RSM), based on ISO 15926
![Page 13: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/13.jpg)
A tale of four heterogeneities
![Page 14: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/14.jpg)
Heterogeneity #1: Representing RDF Streams
![Page 15: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/15.jpg)
What is an RDF stream?
• Several possibilities:
  • An RDF stream is an infinite sequence of timestamped events (triples or graphs), where timestamps are non-decreasing
    … <event_i, t_i>, <event_i+1, t_i+1>, <event_i+2, t_i+2> …
  • An RDF stream is an infinite sequence of triple occurrences <<s,p,o>, tα, tω>, where <s,p,o> is an RDF triple and tα and tω are the start and end of the interval
• How are timestamps assigned?
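The two candidate definitions above can be sketched as plain data types. This is only an illustration: the class names `TimestampedEvent` and `TripleOccurrence` and the helper `checked` are hypothetical, integer timestamps are an assumption, and an event is restricted to a single triple for brevity.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)
class TimestampedEvent:
    """Point-in-time model: one event (here a single triple) and one timestamp."""
    triple: Triple
    t: int


@dataclass(frozen=True)
class TripleOccurrence:
    """Interval model: <<s,p,o>, t_alpha, t_omega>."""
    triple: Triple
    t_alpha: int  # start of the interval
    t_omega: int  # end of the interval


def checked(stream: Iterable[TimestampedEvent]) -> Iterator[TimestampedEvent]:
    """Yield events lazily, rejecting any timestamp that goes backwards.

    Equal timestamps are accepted, since the definition only asks for
    non-decreasing timestamps.
    """
    last = None
    for ev in stream:
        if last is not None and ev.t < last:
            raise ValueError(f"timestamp {ev.t} is smaller than {last}")
        last = ev.t
        yield ev
```

Because `checked` is a generator, it also works on unbounded inputs, which is closer to the "infinite sequence" wording than a list would be.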
![Page 16: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/16.jpg)
Some examples…
• What would be the best/possible RDF stream representation for the following types of problems?
  • Does Alice meet Bob before Carl?
  • Who does Carl meet first?
  • How many people has Alice met in the last 5m?
  • Does Diana meet Bob and then Carl within 5m?
  • Which are the meetings that last less than 5m?
  • Which are the meetings with conflicts?
[Figure: events e1 (:alice :isWith :bob), e2 (:alice :isWith :carl), e3 (:bob :isWith :diana) and e4 (:diana :isWith :carl) placed on a time axis t with ticks at 1, 3, 6 and 9]
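To see why the choice of representation matters for these questions, here is a small sketch over an interval-based representation. The start and end times are hypothetical (the slide does not give them), and `Meeting`, `first_meeting`, `meets_before` and `shorter_than` are illustrative names.

```python
from typing import List, NamedTuple, Optional


class Meeting(NamedTuple):
    a: str
    b: str
    start: int  # minutes
    end: int    # minutes


# Hypothetical intervals, chosen only for illustration.
meetings = [
    Meeting(":alice", ":bob", 1, 3),
    Meeting(":alice", ":carl", 6, 9),
    Meeting(":bob", ":diana", 3, 9),
    Meeting(":diana", ":carl", 6, 9),
]


def first_meeting(ms: List[Meeting], x: str, y: str) -> Optional[int]:
    """Earliest start of a meeting between x and y, or None if they never meet."""
    starts = [m.start for m in ms if {m.a, m.b} == {x, y}]
    return min(starts) if starts else None


def meets_before(ms: List[Meeting], x: str, first: str, then: str) -> bool:
    """Does x meet `first` before meeting `then`? Only start timestamps are used."""
    t1 = first_meeting(ms, x, first)
    t2 = first_meeting(ms, x, then)
    return t1 is not None and t2 is not None and t1 < t2


def shorter_than(ms: List[Meeting], minutes: int) -> List[Meeting]:
    """Meetings lasting less than `minutes`. This needs both ends of the interval."""
    return [m for m in ms if m.end - m.start < minutes]
```

Ordering questions such as "Does Alice meet Bob before Carl?" can be answered with a single timestamp per item, while duration questions such as "Which meetings last less than 5m?" need two timestamps per item or an equivalent pair of start and end events.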
![Page 17: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/17.jpg)
Data types for semantic streams - Summary
• Multiple notions of RDF stream proposed
  • Ordered sequence (implicit timestamp)
  • One timestamp per triple (point-in-time semantics)
  • Two timestamps per triple (interval-based semantics)
• Comparison between existing approaches

| System | Data item | Time model | # of timestamps |
|---|---|---|---|
| INSTANS | triple | Implicit | 0 |
| C-SPARQL | triple | Point in time | 1 |
| SPARQLstream | triple | Point in time | 1 |
| CQELS | triple | Point in time | 1 |
| Sparkwave | triple | Point in time | 1 |
| Streaming Linked Data | RDF graph | Point in time | 1 |
| ETALIS | triple | Interval | 2 |

• More investigation is required to agree on an RDF stream model
![Page 18: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/18.jpg)
Heterogeneity #2: RDF Stream processors
![Page 19: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/19.jpg)
Existing RDF Stream Processing systems
• C-SPARQL: RDF Store + Stream processor
  • Combined architecture
• CQELS: Implemented from scratch. Focus on performance
  • Native + adaptive joins for static-data and streaming data
• CQELS-Cloud: Reusing Storm
  • Paper presentation on Thursday
[Diagram: C-SPARQL query → RDF Store (static) + Stream processor (streaming) → continuous results]
[Diagram: CQELS query → Native RSP → continuous results]
[Diagram: CQELS query → translator → Storm topology → continuous results]
![Page 20: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/20.jpg)
Existing RSP systems
• EP-SPARQL: Complex-event detection
  • SEQ, EQUALS operators
• SPARQLStream: Ontology-based stream query answering
  • Virtual RDF views, using R2RML mappings
  • SPARQL stream queries over the original data streams
• INSTANS: RETE-based evaluation
[Diagram: EP-SPARQL query → translator → Prolog engine → continuous results]
[Diagram: SPARQLStream query → rewriter (R2RML mappings) → DSMS/CEP → continuous results]
![Page 21: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/21.jpg)
Query languages for semantic streams - Summary
• Different architectural choices
  • It is not clear when each choice is best for which type of use case
  • Wrappers over existing systems
    • C-SPARQL, ETALIS, SPARQLstream, CQELS-Cloud
    • Better reliability and maintainability?
  • Native implementations
    • CQELS, Streaming Linked Data, INSTANS
    • Better scalability: optimizations that are not possible in other systems
• Different operational semantics
  • See later
![Page 22: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/22.jpg)
Heterogeneity #3: Querying RDF Streams
![Page 23: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/23.jpg)
Querying data streams (from CQL to SPARQL-X)
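• CQL model
  • Streams: … <s,τ> … (infinite unbounded bag)
  • Relations: <s1>, <s2>, <s3> (finite bag). Mapping: T → R
  • Operators: stream-to-relation (S2R), relation-to-relation (R2R), relation-to-stream (R2S)
  • Stream → S2R → Relation R(t) → R2S → Stream, with R2R applied on relations
• RDF counterpart
  • RDF Stream → S2R window operators → RDF
  • SPARQL operators work on RDF (R2R)
  • RDF → R2S operators → RDF Stream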
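A minimal sketch of the S2R and R2R steps, assuming a time-based sliding window that is open on the left and closed on the right. That boundary choice, the integer timestamps and the function names `s2r_time_window` and `r2r_match` are assumptions for illustration, not part of any of the systems discussed.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
Item = Tuple[Triple, int]       # (triple, timestamp)


def s2r_time_window(stream: List[Item], now: int, width: int) -> List[Triple]:
    """S2R: the finite bag of triples whose timestamp lies in (now - width, now]."""
    return [triple for triple, t in stream if now - width < t <= now]


def r2r_match(relation: List[Triple], predicate: str) -> List[Tuple[str, str]]:
    """R2R: a stand-in for evaluating the pattern ?s <predicate> ?o."""
    return [(s, o) for s, p, o in relation if p == predicate]


stream = [
    ((":alice", ":isWith", ":bob"), 1),
    ((":alice", ":isWith", ":carl"), 3),
    ((":bob", ":isWith", ":diana"), 6),
    ((":diana", ":isWith", ":carl"), 9),
]

# Window of width 5 evaluated at time 6 keeps the items stamped 3 and 6.
window = s2r_time_window(stream, now=6, width=5)
pairs = r2r_match(window, ":isWith")
```

The window turns the unbounded stream into a finite bag, after which ordinary (static) query evaluation applies.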
![Page 24: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/24.jpg)
Output: relation
• Case 1: the output is a set of timestamped mappings
  • Queries are registered in the RSP
  • SELECT ?a ?b … FROM …. WHERE …. returns bindings of ?a and ?b at [t1], [t3], [t5], [t7]
  • CONSTRUCT {?a :prop ?b} FROM …. WHERE …. returns triples <… :prop …> at [t1], [t3], [t5], [t7]
![Page 25: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/25.jpg)
Output: stream
• Case 2: the output is a stream
  • R2S operators: CONSTRUCT RSTREAM {?a :prop ?b} FROM …. WHERE ….
  • Query → RSP → stream: … <… :prop …> [t1], <… :prop …> [t1], <… :prop …> [t3], <… :prop …> [t5], <… :prop …> [t7] …
• ISTREAM: stream out data in the last step that wasn’t on the previous step
• DSTREAM: stream out data in the previous step that isn’t in the last step
• RSTREAM: stream out all data in the last step
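The three R2S operators can be sketched as functions over the results of two consecutive evaluation steps. For simplicity the sketch uses sets, whereas CQL defines relations as bags; the function names simply mirror the operator names.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def istream(prev: Set[Triple], curr: Set[Triple]) -> List[Triple]:
    """Data in the last step that was not in the previous step."""
    return sorted(curr - prev)


def dstream(prev: Set[Triple], curr: Set[Triple]) -> List[Triple]:
    """Data in the previous step that is not in the last step."""
    return sorted(prev - curr)


def rstream(prev: Set[Triple], curr: Set[Triple]) -> List[Triple]:
    """All data in the last step, regardless of the previous one."""
    return sorted(curr)


a = (":s1", ":prop", ":o1")
b = (":s2", ":prop", ":o2")
c = (":s3", ":prop", ":o3")

previous_step = {a, b}
last_step = {b, c}
```

With these two steps, `istream` emits only `c`, `dstream` emits only `a`, and `rstream` emits `b` and `c`.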
![Page 26: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/26.jpg)
Other operators
• Sequence operators and CEP world
[Figure: events e1, e2, e3 and e4 on a time axis (ticks 1, 3, 6, 9), illustrating sequence and simultaneous occurrence]
• SEQ: joins e(ti,tf) and e’(ti’,tf’) if e’ occurs after e
• EQUALS: joins e(ti,tf) and e’(ti’,tf’) if they occur simultaneously
• OPTIONALSEQ, OPTIONALEQUALS: Optional join variants
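A sketch of the two join conditions over interval-stamped events. It assumes that "occurs after" means the second event starts strictly after the first one has ended, and that "simultaneously" means identical start and end times; the `Event` type and the sample intervals are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    name: str
    ti: int  # start of the interval
    tf: int  # end of the interval


def seq(left: List[Event], right: List[Event]) -> List[Tuple[Event, Event]]:
    """SEQ: pairs (e, e2) where e2 starts strictly after e has ended."""
    return [(e, e2) for e in left for e2 in right if e.tf < e2.ti]


def equals(left: List[Event], right: List[Event]) -> List[Tuple[Event, Event]]:
    """EQUALS: pairs (e, e2) that share the same start and end."""
    return [(e, e2) for e in left for e2 in right
            if (e.ti, e.tf) == (e2.ti, e2.tf)]


# Hypothetical intervals, chosen only for illustration.
e1 = Event("e1", 1, 3)
e2 = Event("e2", 6, 9)
e3 = Event("e3", 6, 9)
```

Here `seq([e1], [e2, e3])` pairs `e1` with both later events, while `equals([e2], [e3])` pairs the two events that share the interval from 6 to 9.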
![Page 27: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/27.jpg)
Query languages for semantic streams - Summary
• Comparison between existing approaches

| System | S2R | R2R | Time-aware | R2S |
|---|---|---|---|---|
| INSTANS | Based on time events | SPARQL update | Based on time events | Ins only |
| C-SPARQL Engine | Logical and triple-based | SPARQL 1.1 query | timestamp function | Batch only |
| SPARQLstream | Logical and triple-based | SPARQL 1.1 query | no | Ins, batch, del |
| CQELS | Logical and triple-based | SPARQL 1.1 query | no | Ins only |
| Sparkwave | Logical | SPARQL 1.0 | no | Ins only |
| Streaming Linked Data | Logical and graph-based | SPARQL 1.1 | no | Batch only |
| ETALIS | no | SPARQL 1.0 | SEQ, PAR, AND, OR, DURING, STARTS, EQUALS, NOT, MEETS, FINISHES | Ins only |

• Is it time to converge on a standard?
![Page 28: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/28.jpg)
Query languages for semantic streams - Issues
• Different syntax for S2R operator
• Semantics of query languages is similar, but not identical
• Lack of R2S operator in some cases
• Different support for time-aware operators
![Page 29: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/29.jpg)
Classification of existing systems
![Page 30: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/30.jpg)
Heterogeneity #4: Operational Semantics
![Page 31: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/31.jpg)
Operational Semantics
[Figure: stream S with items S1 to S4 on a time axis t (ticks 1, 3, 6, 9), carrying the triples :bob :isIn :hall, :bob :isIn :kitchen, :alice :isIn :hall and :alice :isIn :kitchen]
Where are both alice and bob in the last 5s?
• System 1: :hall [5] :kitchen [10]
• System 2: :hall [3] :kitchen [9]
Both correct?
ISWC 2013 evaluation track: "On Correctness in RDF stream processor benchmarking" by Daniele Dell’Aglio, Jean-Paul Calbimonte, Marco Balduini, Oscar Corcho and Emanuele Della Valle
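The two answers can be reproduced with a small simulation in which both systems use a 5 s window but differ in when they report. Everything specific here is an assumption made for illustration: the arrival order and timestamps of the four triples (the slide only shows ticks at 1, 3, 6 and 9), the window boundaries, and the two reporting policies, which are not claimed to be those of any particular engine.

```python
from typing import List, Tuple

Item = Tuple[str, str, int]  # (person, room, timestamp in seconds)

# Assumed arrival order and timestamps; not stated in the slide.
STREAM: List[Item] = [
    (":bob", ":hall", 1),
    (":alice", ":hall", 3),
    (":bob", ":kitchen", 6),
    (":alice", ":kitchen", 9),
]


def shared_rooms(items: List[Item]) -> List[str]:
    """Rooms in which both :alice and :bob appear in the given window content."""
    rooms = {}
    for person, room, _ in items:
        rooms.setdefault(room, set()).add(person)
    return sorted(r for r, people in rooms.items()
                  if {":alice", ":bob"} <= people)


def report_on_window_close(stream: List[Item], width: int = 5,
                           end: int = 10) -> List[Tuple[str, int]]:
    """Tumbling windows [k*width, (k+1)*width); report when each window closes."""
    out = []
    for close in range(width, end + 1, width):
        content = [i for i in stream if close - width <= i[2] < close]
        out += [(room, close) for room in shared_rooms(content)]
    return out


def report_on_content_change(stream: List[Item],
                             width: int = 5) -> List[Tuple[str, int]]:
    """Re-evaluate the window (t - width, t] whenever a new item arrives at t."""
    out = []
    for _, _, t in stream:
        content = [i for i in stream if t - width < i[2] <= t]
        out += [(room, t) for room in shared_rooms(content)]
    return out
```

Under these assumptions, reporting on window close yields :hall at 5 and :kitchen at 10, while reporting on content change yields :hall at 3 and :kitchen at 9. Both outputs follow from the same query and data, which is why tick, window content and report policy need to be stated before results can be compared.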
![Page 32: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/32.jpg)
Conclusions…
![Page 33: On the need for a W3C community group on RDF Stream Processing](https://reader036.vdocuments.us/reader036/viewer/2022081519/555dec8bd8b42a1e2c8b57ad/html5/thumbnails/33.jpg)
Next steps in the community group…
• Agree on an RDF model?
  • Metamodel?
  • Timestamps in graphs?
  • Timestamp intervals
  • Compatibility with normal (static) RDF
• Additional operators for SPARQL?
  • Windows (not only time based?)
  • CEP operators
  • Semantics
• Go Web
  • Volatile URIs
  • Serialization: terse, compact
  • Protocols: HTTP, Websockets?