Stream Reasoning: mastering the velocity and the variety
dimensions of Big Data at once
Emanuele Della Valle DEIB -‐ Politecnico di Milano
@manudellavalle [email protected] hBp://emanueledellavalle.org
University of Olso, Norway -‐ 3.11.2015
It's a streaming world … • Off-‐shore oil operaQons
• Smart CiQes
• Global Contact Center
• Social networks
• Generate data streams!
E. Della Valle, S. Ceri, F. van Harmelen, D. Fensel It's a Streaming World! Reasoning upon Rapidly Changing Informa:on. IEEE Intelligent Systems 24(6): 83-‐89 (2009)
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 2
… looking for reacQve answers … • What is the expected Qme to failure when that
turbine's barring starts to vibrate as detected in the last 10 minutes?
• Is public transportaQon where the people are?
• Who are the best available agents to route all these unexpected contacts about the tariff plan launched yesterday?
• Who is driving the discussion about the top 10 emerging topics ?
• Require conQnuous processing and reacQve answer
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 3
…with conflicQng requirements 1/8 A system able to answer those queries must be able to • handle massive datasets
– A typical oil producQon plaeorm is equipped with about 400.000 sensors
– Telecom data is the most pervasive data source in urban are, in Milano there are 1.8 million mobile users
– A global contact centre of a Telecom operator counts 500 millions of clients
– Facebook alone has 1.1 billion of acQve users
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 4
…with conflicQng requirements 2/8 A system able to answer those queries must be able to • process data streams on the fly
– The sensors on typical oil producQon plaeorm generates 10,000 observaQons per minute with peaks of 100,000 o/m
– The mobile users in Milano generates 20,000 call/sms/data connecQons per minute with peaks of 80,000 c/m
– A global contact centre receives 10,000 contacts per minute with peaks of 30,000 c/m
– Facebook, as of May 2013, observes 3 millions "I like" per minute
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 5
…with conflicQng requirements 3/8 A system able to answer those queries must be able to • cope with heterogeneous dataset
– The sensors on typical oil producQon have been deployed over 10 years by 10s of different producers
– Tens of data sources are normally needed to make sense of an urban phenomena
– A global contact centre consists in 100s of offices owned by different subsidiary companies engaged yearly
– Each social network has its own data model, APIs, …
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 6
…with conflicQng requirements 4/8 A system able to answer those queries must be able to • cope with incomplete data
– 10s of sensors and networking links broke down daily
– Coverage is incomplete
– Only standard cases are covered by fully machine processable data records 100s of contacts per minute are manage ad-‐hoc
– Conversa:ons happen outside the social networks, too!
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 7
…with conflicQng requirements 5/8 A system able to answer those queries must be able to • cope with noisy data
– Sensor out-‐of-‐opera:ng range
– Faulty sensors
– Agents misunderstand, get :red, …
– Irony, sarcasm, …
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 8
…with conflicQng requirements 6/8 A system able to answer those queries must be able to • provide reac:ve answers
– detecQon of dangerous situaQons must occur within minutes
– recommendaQons to ciQzens must be performed in few seconds
– rouQng a contact through each step of the decision tree must take less than a second
– Search autocompleQng may need to be updated every few minutes
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 9
…with conflicQng requirements 7/8 A system able to answer those queries must be able to • support fine-‐grained informa:on access
– IdenQfy a turbine among thousands
– Locate a bus among thousands
– Contact an agent among thousands
– IdenQfy an opinion maker among thousands of influencers for a topic
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 10
…with conflicQng requirements 8/8 A system able to answer those queries must be able to • integrate complex domain models of
– opera:onal and control process
– various city aspects
– contact management, contract types, agent skills, contactor profiles, …
– topics, user profiles, …
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 11
Challenges A system able to answer those queries must be able to • handle massive datasets x • process data streams on the fly x • cope with heterogeneous datasets x • cope with incomplete data x x • cope with noisy data x • provide reac:ve answers x • support fine-‐grained access x x • integrate complex domain models x
Volum
e'
Veloc
ity'
Varie
ty'
Verac
ity'
In Big Data terms
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 12
Grand challenge
• Volume + Velocity + Variety = hard deal
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org
Volu
me
months days hours min. sec. ms. velocity
ZB EB PB TB GB MB KB
Variety
13
A good reason to embrace it!
• ++ Variety à ++ value
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org
Valu
e
ms. sec. min. hours days months years velocity
Variety
14
From challenges to opportuniQes • Formally data streams are :
– unbounded sequences of Qme-‐varying data elements
• Less formally, in many applicaQon domains, they are: – a “conQnuous” flow of informaQon – where recent informa:on is more relevant as it describes the current state of a dynamic system
• OpportuniQes – Forget old enough informa:on – Exploit the implicit ordering (by recency) in the data
time
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 15
State-‐of-‐the-‐art: DSMS and CEP • A paradigma:c change! • ConQnuous queries registered over streams that
are observed trough windows
window
input streams streams of answer Registered ConQnuous Query
Dynamic System
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 16
DSMS and CEP vs. requirements
Requirement DSMS CEP
massive datasets data streams heterogeneous dataset incomplete data noisy data reactive answers fine-grained information access complex domain models
✗ ✗
✗
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 17
State of the art: OBDA
• Given ontology O and query Q, use O to rewrite Q as Q’ so that, for any set of ground facts A contained in mulQple databases: – answer(Q,O,A) = answer(Q’,!,A)
The answer of the query Q using the ontology O for any set of ground facts A is equal to answer of a query Q’ without considering the ontology O
• Use mapping M to map Q’ to mulQple SQL queries to the various databases
Rewrite
O
Q Q’
Map SQL
M
answer AUiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 18
DSMS/CEP,OBDA vs. requirements
Requirement DSMS CEP
OBDA
massive datasets data streams heterogeneous dataset incomplete data noisy data reactive answers fine-grained information access complex domain models
✗ ✗
✗
✗
✗ ✗
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 19
Stream Reasoning • Research quesQon
– is it possible to make sense in real :me of mul:ple, heterogeneous, gigan:c and inevitably noisy and incomplete data streams in order to support the decision processes of extremely large numbers of concurrent users?
• Proposed approach
Complexity Raw Stream Processing
SemanQc Streams
DL-‐Lite
DL AbstracQon
SelecQon
InterpretaQon
Reasoning
Querying
Re-‐wriQng
Change Frequency
PTIME
NEXPTIME
104 Hz
1 Hz
Complexity vs. Dynamics AC0
H. Stuckenschmidt, S. Ceri, E. Della Valle, F. van Harmelen: Towards Expressive Stream Reasoning. Proceedings of the Dagstuhl Seminar on SemanQc Aspects of Sensor Networks, 2010.
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 20
Sub-‐research quesQons
1. Is it possible extend the Seman:c Web stack in order to represent heterogeneous data streams, conQnuous queries, and conQnuous reasoning tasks?
2. Does the ordered nature of data streams and the possibility to forget old enough informaQon allow to op:mize con:nuous querying and con:nuous reasoning tasks so to provide reac:ve answers to large number of concurrent users without forsaking correctness or completeness?
3. Can SemanQc Web and Machine Learning technologies be jointly employed to cope with the noisy and incomplete nature of data streams?
4. Are there prac:cal cases where processing data stream at semanQc level is the best choice?
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 21
Sub-‐research quesQons 1. Is it possible extend the Seman:c Web stack
in order to represent heterogeneous data streams, conQnuous queries, and conQnuous reasoning tasks?
2. Does the ordered nature of data streams and the possibility to forget old enough informaQon allow to op:mize con:nuous querying and con:nuous reasoning tasks so to provide reac:ve answers to large number of concurrent users without forsaking correctness or completeness?
3. Can SemanQc Web and Machine Learning technologies be jointly employed to cope with the noisy and incomplete nature of data streams?
4. Are there prac:cal cases where processing data stream at semanQc level is the best choice?
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 22
State-‐of-‐the-‐art: RDF model • RDF: Resource DescripQon Framework
– It allows to make statements about resources in the form of subject-‐predicate-‐object expressions
• In RDF terminology triples • E.g. @BarakObama posts "Four more years"
– A collecQon of RDF statements represents a labelled, directed graph
• In RDF terminology a graph • E.g., the tweet above by Barak Obama is connected to
– 800,000+ twiBer user profiles via retweets – 300,000+ twiBer user profiles favorite – …
subject predicate object
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 23
ContribuQon: RDF stream Models • RDF Stream (the C-‐SPARQL way)
– Unbound sequence of :me-‐varying triples – each represented by a pair made of an RDF triple and its Qmestamp
– Timestamp are non-‐decreasing (allowing for simultaneity) … @BarakObama posts "Four more years", 8:16PM 6 Nov 2012 @Alice posts "RT: Four more years", 8:17PM 6 Nov 2012 …
D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus: Querying RDF streams with C-‐SPARQL. SIGMOD Record 39(1): 20-‐26 (2010)
subject predicate object timestamp
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 24
ContribuQon: RDF stream Models
• RDF Stream (the Streaming Linked Data way) – Unbound sequence of :me-‐varying graphs – each represented by a pair made of an RDF graph and its Qmestamp
– Timestamps (if present) are monotonically increasing – Graphs act as a form of punctuaQon (all triples in a graph are simultaneous)
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org
D.F. Barbieri, E. Della Valle: A Proposal for Publishing Data Streams as Linked Data -‐ A Posi:on Paper. LDOW (2010) 25
RDF streams Qme semanQcs 1/3
• A RDF stream without Qmestamp is an ordered sequence of data items
• The order can be exploited to perform queries – Does Alice meet Bob before Carl? – Who does Carl meet first?
S e1
:alice :isWith :bob
e2
:alice :isWith :carl
e3
:bob :isWith :diana
e4
:diana :isWith :carl
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 26
RDF streams Qme semanQcs 2/3
• One Qmestamp: the Qme instant on which the data item occurs
• We can start to compose queries taking into account the Qme – How many people has Alice met in the last 5m? – Does Diana meet Bob and then Carl within 5m?
e1 e2 e3 e4 S
t 3 6 9 1
:alice :isWith :bob :alice :isWith :carl
:bob :isWith :diana :diana :isWith :carl
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 27
RDF streams Qme semanQcs 3/3
• Two Qmestamps: the Qme range on which the data item is valid (from, to]
• It is possible to write even more complex constraints: – Which are the meeQngs the last less than 5m? – Which are the meeQngs with conflicts?
.
S
t 3 6 9 1
:alice :isWith :bob :alice :isWith :carl
:bob :isWith :diana :diana :isWith :carl
e1
e2
e3
e4
D. Anicic, P. Fodor, S. Rudolph, & N. Stojanovic. EP-‐SPARQL: a unified language for event processing and stream reasoning. In WWW 2011, pages 635–644
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 28
Finding • The Seman:c Web stack can be extended so to incorporate streaming data as a first class ciQzen – RDF stream data model(s) – Con:nuous SPARQL syntax and semanQcs – Con:nuous deduc:ve reasoning semanQcs
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 29
Work in progress
• In 2013, an RDF Stream Processing (RSP) community group was created at W3C
hBp://www.w3.org/community/rsp/
• RSP data model and serializaQon – hBps://github.com/streamreasoning/RSP-‐QL/blob/master/SerializaQon.md
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 30
State-‐of-‐the-‐art: SPARQL
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 31
ContribuQon: ConQnuous-‐SPARQL
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 32
ContribuQon: ConQnuous-‐SPARQL Who are the opinion makers? i.e., the users who are
likely to influence the behavior their followers
REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS CONSTRUCT { ?opinionMaker sd:about ?resource }
FROM STREAM <http://…> [RANGE 30m STEP 5m] WHERE {
?opinionMaker ?opinion ?res .
?follower sioc:follows ?opinionMaker.
?follower ?opinion ?res.
FILTER (cs:timestamp(?follower ?opinion ?res) > cs:timestamp(?opinionMaker ?opinion ?res) ) }
HAVING ( COUNT(DISTINCT ?follower) > 3 )
SR 2015, Austria -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 33
ContribuQon: ConQnuous-‐SPARQL Who are the opinion makers? i.e., the users who are
likely to influence the behavior their followers
REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS CONSTRUCT { ?opinionMaker sd:about ?resource }
FROM STREAM <http://…> [RANGE 30m STEP 5m] WHERE {
?opinionMaker ?opinion ?res .
?follower sioc:follows ?opinionMaker.
?follower ?opinion ?res.
FILTER (cs:timestamp(?follower ?opinion ?res) > cs:timestamp(?opinionMaker ?opinion ?res) ) }
HAVING ( COUNT(DISTINCT ?follower) > 3 )
Query registra:on (for con:nuous execu:on)
FROM STREAM clause
WINDOW
RDF Stream added as new ouput format
Buil:n to access :mestamps
SR 2015, Austria -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org
D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus: Querying RDF streams with C-‐SPARQL. SIGMOD Record 39(1): 20-‐26 (2010) 34
Finding • The Seman:c Web stack can be extended so to incorporate streaming data as a first class ciQzen – RDF stream data model – Con:nuous SPARQL syntax and semanQcs – Con:nuous deduc:ve reasoning semanQcs
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 35
AlternaQves to C-‐SPARQL • CQELS
– What: STREAM clause, focus on new answer – Ref: Le-‐Phuoc, D., Dao-‐Tran, M., Xavier Parreira, J., & Hauswirth, M.
A naQve and adapQve approach for unified processing of linked streams and linked data. In ISWC 2011, pages 370–388.
• SPARQLStream – What: window in the past, focus on RDF to Stream operators – Ref: Calbimonte, J.-‐P., Corcho, O., & Gray, A. J. G. Enabling ontology-‐based
access to streaming data sources. In ISWC, 2010, pages 96–111. • EP-‐SPARQL
– What: focus on event specific operators – Ref: Anicic, D., Fodor, P., Rudolph, S., & Stojanovic, N. EP-‐SPARQL: a unified
language for event processing and stream reasoning. In WWW 2011, pages 635–644.
• TEF-‐SPARQL – What: adds "facts" as first class elements – Ref: hBps://www.merlin.uzh.ch/publicaQon/show/8467
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 36
AlternaQves to C-‐SPARQL • Comparison between exisQng approaches
System S2R R2R Time-‐aware R2S
C-‐SPARQL Engine Logical and triple-‐based
SPARQL 1.1 query
Qmestamp funcQon Batch only
Streaming Linked Data Framework
Logical and graph-‐based
SPARQL 1.1 no Batch only
SPARQLstream Logical and triple-‐based
SPARQL 1.1 query
no Ins, batch, del
CQELS Logical and triple-‐based
SPARQL 1.1 query
no Ins only
TEF-‐SPARQL no SPARQL-‐like Temporarily Facts, BEFORE SINCE, UNTIL, DURING,
Batch only
EP-‐SPARQL no SPARQL 1.0 SEQ, PAR, AND, OR, DURING, STARTS, EQUALS, NOT, MEETS, FINISHES
Ins only
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 37
Work in progress at RSP@W3C • RSP-‐QL
– Syntax • hBps://github.com/streamreasoning/RSP-‐QL/blob/master/RSP-‐QL%20Sample%20Queries.md
– Proposed semanQcs • D.Dell'Aglio, E.Della Valle, J.-‐P.Calbimonte, Ó. Corcho: RSP-‐QL SemanQcs: A Unifying Query Model to Explain Heterogeneity of RDF Stream Processing Systems. Int. J. SemanQc Web Inf. Syst. 10(4): 17-‐44 (2014)
– SemanQcs (work in progress) • hBps://github.com/streamreasoning/RSP-‐QL/blob/master/SemanQcs.md
– Quick ref. • D. Dell'Aglio, J.-‐P. Calbimonte, E. Della Valle, Ó. Corcho: Towards a Unified Language for RDF Stream Query Processing. ESWC (Satellite Events) 2015: 353-‐363
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 38
ContribuQon: conQnuous deducQve reasoning
• DL Ontology Stream ST – A ontology stream with respect to a staQc Tbox T is a sequence of Abox axioms ST(i)
• A Windowed Ontology Stream ST(o,c] – A windowed ontology stream with respect to a staQc Tbox T is the union of the Abox axioms ST(i) where o<i≤c
• Reasoning on a Windowed Ontology Stream ST(o,c] is as reasoning on a staQc DL KB
SR 2015, Austria -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 39 Emanuele Della Valle, Stefano Ceri, Davide Francesco Barbieri, Daniele Braga, Alessandro Campi: A First Step Towards Stream Reasoning. FIS 2008: 72-‐81
discusses discusses discusses
discusses discusses
discusses
discusses
Example of conQnuous deducQve reasoning
What impact has been my micropost p1 creating in the last hour? Let’s count the number of microposts that discuss it … REGISTER STREAM ImpactMeter AS SELECT (count(?p) AS ?impact) FROM STREAM <http://…/fb> [RANGE 60m STEP 10m] WHERE { :Alice posts [ sr:discusses ?p ] }
p1 p3 p5 p8
p2 p4 p7
p6 7!
Transitive property
Alice posts p1 .
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 40
Finding • The Seman:c Web stack can be extended so to incorporate streaming data as a first class ciQzen – RDF stream data model – Con:nuous SPARQL syntax and semanQcs – Con:nuous deduc:ve reasoning semanQcs
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 41
AlternaQves to conQnuous deducQve (RDFS++) reasoning
• ETALIS – What: RDFS + Allen Algebra – Ref: Anicic, D., Rudolph, S., Fodor, P., & Stojanovic, N. Stream reasoning and
complex event processing in ETALIS. SemanQc Web, 3(4), 2012, 397–407. • STARQL
– What: • DL-‐Lite + ConjuncQve Query + Qme-‐series • SHI + Grounded ConjuncQve Queries + Qme-‐series
– Ref: ÖL Özçep, R Möller. Ontology Based Data Access on Temporal and Streaming Data. Reasoning Web, 2014
• ASP-‐based – What: Qme-‐decaying ASP – Ref: hBp://arxiv.org/abs/1301.1392
• LARS – What: high-‐level unified formal foundaQon for stream reasoning – Ref: H. Beck, M. Dao-‐Tran, T. Eiter, M. Fink: LARS: A Logic-‐Based Framework
for Analyzing Reasoning over Streams. AAAI 2015: 1431-‐1438H.
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 42
Sub-‐research quesQons 1. Is it possible extend the Seman:c Web stack
in order to represent heterogeneous data streams, conQnuous queries, and conQnuous reasoning tasks?
2. Does the ordered nature of data streams and the possibility to forget old enough informaQon allow to op:mize con:nuous querying and con:nuous reasoning tasks so to provide reac:ve answers to large number of concurrent users without forsaking correctness or completeness?
3. Can SemanQc Web and Machine Learning technologies be jointly employed to cope with the noisy and incomplete nature of data streams?
4. Are there prac:cal cases where processing data stream at semanQc level is the best choice?
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 43
ContribuQon: opQmize querying for reacQve answers
• C-‐SPARQL engine Qme window-‐based selecQon outperforms SPARQL filter-‐based selecQon (Jena-‐ARQ)
D. Barbieri, D. Braga, S. Ceri, E. Della Valle, Y. Huang, V. Tresp, A.Re�nger, H. Wermser: Deduc:ve and Induc:ve Stream Reasoning for Seman:c Social Media Analy:cs IEEE Intelligent Systems, 30 Aug. 2010.
Our In-memory RDF stream processing
engine
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 44
Finding • Stream Reasoning task is feasible and the very nature of streaming data offers opportuniQes to op:mise reasoning tasks where data is ordered by recency and can be forgoBen a�er a while – C-‐SPARQL Engine prototype – IMaRS conQnuous incremental reasoning algorithm
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 45
Work in progress
• When volumes also maBers …
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 46
Join
Data Stream SPARQL endpoint
Window
Maintenance Policy
Local View
RSP engine
Web
Soheila Dehghanzadeh, Daniele Dell'Aglio, Shen Gao, Emanuele Della Valle, Alessandra Mileo, Abraham Bernstein: Approximate Con:nuous Query Answering over Streams and Dynamic Linked Data Sets. ICWE 2015: 307-‐325
State-‐of-‐the-‐art deducQve reasoning
• Data-‐driven (a.k.a. forward reasoning)
• Query-‐driven – backward reasoning
• Query-‐driven – query rewriQng (a.k.a. ontology based data access)
Reasoner RDFdata SPARQL Inferred
data
ontology
SPARQL
ontology
RewriBen query
Reasoner
Reasoner RDFdata SPARQL
ontology
data
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 47
Naïve approaches to Stream Reasoning windowing then reasoning
• Data-‐driven (a.k.a. forward reasoning)
• Query-‐driven – backward reasoning
• Query-‐driven – query rewriQng (a.k.a. ontology based data access)
Reasoner RDF data SPARQL Inferred
data
ontology
ontology
RewriBen query
Reasoner
Reasoner RDF data
ontology
Window
Window
Window
SPARQL
SPARQL data
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 48
Not so naïve approach to stream reasoning
• The problem is that materializaQon (the result of data-‐driven processing) are very difficult to decrement efficiently. – State-‐of-‐the-‐art: DRed algorithm
• Over delete • Re-‐derive • Insert
Reasoner Inferred data
ontology
window inserQons
deleQons
Incremental !!!
SPARQL
Y. Ren, J. Z. Pan. OpQmising ontology stream reasoning with truth maintenance system. In CIKM (2011)
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 49
Is DRed needed? • DRed works with random inserQons and deleQons • In a streaming sedng, when a triple enters the window, given the size of the window, the reasoner knows already when it will be deleted!
• E.g., – if the window is 40 minutes long, and,
– it is 10:00, the triple(s) entering now
– will exit on 10:40. • Conclusion
– dele:ons are predictable
Time Enter
window Exit
window Explicitly in
window Inferred in
window # Exp. # Inf. SUM
10:00 A!B 1 1
10:10 B!C 1 1 2
10:20 A!E 2 1 3
10:30 E!C 2 1 3
10:40 A!B 2 1 3
10:50 B!C 2 1 3
11:00 A!E 0 0 0
A B
A B C A C
A B CE
A C
A B CE
A C
A CE
A C
A B CE
A C
CE
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 50
ContribuQon: IMaRS algorithm
• Idea: – add an expira:on :me to each triple and – use an hash table to index triples by their expiraQon Qme
• The algorithm 1. deletes expired triples 2. Adds the new derivaQons that are consequences of
inserQons annota:ng each inferred triple with an expira:on :me (the min of those of the triple it is derived from), and
3. when mul:ple deriva:ons occur, for each mulQple derivaQon, it keeps the max expiraQon Qme.
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 51
ContribuQon: IMaRS algorithm • Incremental Reasoning on RDF streams (IMaRS): new reasoning
algorithm opQmized for reacQve query answering
D.F. Barbieri, D. Braga, S.Ceri, E. Della Valle, M. Grossniklaus: Incremental Reasoning on Streams and Rich Background Knowledge. ESWC (1) 2010: 1-‐15 D. Dell'Aglio, E. Della Valle: Incremental Reasoning on RDF Streams. In A.Harth, K.Hose, R.Schenkel (Eds.) Linked Data Management, CRC Press 2014, ISBN 9781466582408
! Re-materialize after each window slide
! Use DRed
! IMaRS
% of deletions w.r.t. the content of the window
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 52
ContribuQon: IMaRS algorithm • comparison of the average Qme needed to answer
a C-‐SPARQL query, when 2% of the content exits the window each Qme it slides, using – A backward reasoner on the window content – DRed + standard SPARQL on the materializaQon – IMaRS + standard SPARQL on the materializaQon
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 53
Finding • Stream Reasoning task is feasible and the very nature of streaming data offers opportuniQes to op:mise reasoning tasks where data is ordered by recency and can be forgoBen a�er a while – C-‐SPARQL Engine prototype – IMaRS conQnuous incremental reasoning algorithm
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 54
OpQmizing for stream reasoning alternaQve approaches
• DyKnow – How: logical models of an observed dynamic system + metric temporal logics – Fredrik Heintz, Jonas Kvarnström, Patrick Doherty: Bridging the sense-‐reasoning gap:
DyKnow -‐ Stream-‐based middleware for knowledge processing. Advanced Engineering InformaQcs 24(1): 14-‐26 (2010)
• MorphStream – How: rewriQng in DSMS languages (one at a Qme) – Ref: Calbimonte, J.-‐P., Corcho, O., & Gray, A. J. G. Enabling ontology-‐based access to
streaming data sources. In ISWC, 2010, pages 96–111. • TR-‐OWL
– How: Truth maintenance for EL++ with syntacQc approximaQons – Ref: Y. Ren, J. Z. Pan. OpQmising ontology stream reasoning with truth maintenance
system. In CIKM (2011) • ETALIS
– How: rewriQng in prolog – Ref: Anicic, D., Rudolph, S., Fodor, P., & Stojanovic, N.. Stream reasoning and
complex event processing in ETALIS. SemanQc Web, 3(4), 2012, 397–407. (conQnues in the next slide)
SR 2015, Austria -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 55
OpQmizing for stream reasoning alternaQve approaches
• Sparkwave – How: extended RETE algorithm for windows and RDFS – Ref: Sparkwave: ConQnuous Schema-‐Enhanced PaBern Matching over RDF Data
Streams. Komazec S, Cerri D. DEBS 2012 • DynamiTE
– How: Truth maintenance for ρDF (a fragment of RDFS) – J. Urbani, A. Margara, C. J. H. Jacobs, F. van Harmelen, H.E. Bal: DynamiTE: Parallel
MaterializaQon of Dynamic RDF Data. ISWC (1) 2013: 657-‐672 • STARQL
– How: rewriQng on a scalable DSMS with Qme-‐series support – Ref: ÖL Özçep, R Möller. Ontology Based Data Access on Temporal and Streaming
Data. Reasoning Web, 2014 • ASP-‐based
– How: opQmizing ASP for incremental and Qme-‐decaying programs – Ref: hBp://arxiv.org/abs/1301.1392
• The Backward/Forward Algorithm – How: opQmizing DRed – B. MoQk, Y. Nenov, R.E.F. Piro, I. Horrocks: Incremental Update of Datalog
MaterialisaQon: the Backward/Forward Algorithm. AAAI 2015: 1560-‐1568
SR 2015, Austria -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 56
Sub-‐research quesQons 1. Is it possible extend the Seman:c Web stack
in order to represent heterogeneous data streams, conQnuous queries, and conQnuous reasoning tasks?
2. Does the ordered nature of data streams and the possibility to forget old enough informaQon allow to op:mize con:nuous querying and con:nuous reasoning tasks so to provide reac:ve answers to large number of concurrent users without forsaking correctness or completeness?
3. Can SemanQc Web and Machine Learning technologies be jointly employed to cope with the noisy and incomplete nature of data streams?
4. Are there prac:cal cases where processing data stream at semanQc level is the best choice?
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 57
Cope with the noisy and incomplete data
• "Noise" is reduced using DSMS techniques • Deduc:ve stream reasoning copes with incompleteness deducing implicit facts • Induc:ve stream reasoning copes with "irrepairable" incompleteness inducing
missing facts
D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, Y. Huang, V. Tresp, A. Re�nger, H. Wermser: Deduc:ve and Induc:ve Stream Reasoning for Seman:c Social Media Analy:cs. IEEE Intelligent Systems 25(6): 32-‐41 (2010)
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 58
Findings • A combina:on of deduc:ve and induc:ve stream reasoning techniques can cope with incomplete and noisy data
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 59
AlternaQve approaches • Stream Reasoning with ProbabilisQc Answer Set Programming – MaBhias Nickles, Alessandra Mileo: Web Stream Reasoning Using ProbabilisQc Answer Set Programming. RR 2014: 197-‐205
– Anastasios SkarlaQdis, Georgios Paliouras, Alexander ArQkis, George A. Vouros: ProbabilisQc Event Calculus for Event RecogniQon. ACM Trans. Comput. Log. 16(2): 11:1-‐11:37 (2015)
– Anni-‐Yasmin Turhan, Erik Zenker: Towards Temporal Fuzzy Query Answering on Stream-‐based Data. HiDeSt@KI 2015: 56-‐69
SR 2015, Austria -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 60
Sub-‐research quesQons 1. Is it possible extend the Seman:c Web stack
in order to represent heterogeneous data streams, conQnuous queries, and conQnuous reasoning tasks?
2. Does the ordered nature of data streams and the possibility to forget old enough informaQon allow to op:mize con:nuous querying and con:nuous reasoning tasks so to provide reac:ve answers to large number of concurrent users without forsaking correctness or completeness?
3. Can SemanQc Web and Machine Learning technologies be jointly employed to cope with the noisy and incomplete nature of data streams?
4. Are there prac:cal cases where processing data stream at semanQc level is the best choice?
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 61
ContribuQon: Streaming Linked Data Framework
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 62
Stream Bus
Recorder Re-player
Analyser Decorator
Adapter Publisher Visualizer Stream
HTT
P
HTT
P
Data Source Streaming Linked Data Server HTML5 Browser
Marco Balduini, Emanuele Della Valle, Daniele Dell'Aglio, Mikalai Tsytsarau, Themis Palpanas, CrisQan Confalonieri: Social Listening of City Scale Events Using the Streaming Linked Data Framework. InternaQonal SemanQc Web Conference (2) 2013: 1-‐16
ContribuQon: RSP services
• RSP services: a RESTful interface for RSP engines
– hBp://streamreasoning.org/download/rsp-‐services
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 63
PracQcal cases • 10+ deployments in Sensor Networks & Social media analyQcs, e.g.
BOTTARI
Winner of Semantic Web Challenge 2011
City Data Fusion
Winner of IBM
faculty award 2013
M. Balduini, I. Celino, D. Dell’Aglio, E. Della Valle, Y. Huang, T. Lee, S.-‐H. Kim, V. Tresp: BOTTARI: An augmented reality mobile applica:on to deliver personalized and loca:on-‐based recommenda:ons by con:nuous analysis of social media streams. J. Web Sem. 16: 33-‐41 (2012)
Social Listener
M.Balduini, E.Della Valle, M.Azzi, R.Larcher, F.Antonelli, and P.Ciuccarelli: CitySensing: Fusing City Data for Visual Storytelling. IEEE MulQMedia 22(3): 44-‐53 (2015)
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 64
Findings 1. The Seman:c Web stack can be extended so to incorporate
streaming data as a first class ciQzen – RDF stream data model – Con:nuous SPARQL syntax and semanQcs – Con:nuous deduc:ve reasoning semanQcs
2. Stream Reasoning task is feasible and the very nature of streaming data offers opportuniQes to op:mise reasoning tasks where data is ordered by recency and can be forgoBen a�er a while – IMaRS conQnuous incremental reasoning algorithm – C-‐SPARQL Engine prototype
3. A combinaQon of deduc:ve and induc:ve stream reasoning techniques can cope with incomplete and noisy data
4. There are applica:on domains where Stream Reasoning offers an adequate soluQon
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 65
Open issues 1. The Seman:c Web stack can be extended
– "NavigaQng the Chasm between the Scylla of PracQcal ApplicaQons and the Charybdis of TheoreQcal Approaches"
A. Bernstein, 2015 2. Stream Reasoning task is feasible
– It's Qme to start removing assumpQons • knowledge does not change • background data does not change
– OBDA for SQL ≠ OBDA for conQnuous querying 3. Stream reasoning can cope with incomplete and noisy data
– Theory is needed! 4. There are applica:on domains where Stream Reasoning offers
an adequate soluQon – Rigorous quanQtaQve comparaQve research is needed
UiO, Norway -‐ 3.11.2015 @manudellavalle -‐ hBp://emanueledellavalle.org 66
AdverQsements :-‐P
• Check out my PhD thesis – hBp://dare.ubvu.vu.nl/handle/1871/53293 – Chapter 1: IntroducQon
• The content of this presentaQon – Chapter 8: conclusions
• A review of stream reasoning approaches updated in spring 2015
• Put an "I like" to Stream Reasoning on Facebook – hBps://www.facebook.com/streamreasoning
@manudellavalle -‐ hBp://emanueledellavalle.org UiO, Norway -‐ 3.11.2015 67
Thank you! Any QuesQon?
Emanuele Della Valle DEIB -‐ Politecnico di Milano [email protected] hBp://emanueledellavalle.org
University of Olso, Norway -‐ 3.11.2015