fueling the future with semantic web patterns - keynote at wop2014@iswc
DESCRIPTION
I will claim that Semantic Web Patterns can drive the next technological breakthrough: they can be key for providing intelligent applications with sophisticated ways of interpreting data. I will picture scenarios of a possible, not-so-distant future in order to support my claim. I will argue that current Semantic Web Patterns are not sufficient for addressing the envisioned requirements, and I will suggest a research direction for fixing the problem, which includes the hybridisation of existing computer science pattern-based approaches and human computing.
TRANSCRIPT
Fueling the future with Semantic Web Patterns
Valentina Presutti
STLab, Institute of Cognitive Sciences and Technologies, CNR, Rome (IT)
WOP 2014, October 19th, Riva del Garda (IT)
Outline
• Can we implement the original Semantic Web scenario?
• Knowledge sources heterogeneity problem
• Semantic alignment at pattern level
• Knowledge Patterns as key elements
• Some STLab results on KP-based knowledge extraction
• A possible research direction to pattern alignment
• Conclusion
What’s the message?
Knowledge Patterns are a wormhole in the Web to knowledge interpretation and understanding
We all want a Personal Assistant Robot!
• Answering our questions
• Giving opinions on facts and things
• Providing guidelines for procedures
• Solving our problems
• Planning and reminding our schedule
WOODY
–Tim Berners-Lee, James Hendler and Ora Lassila, 2001
“Pete and Lucy could use their agents to carry out all these tasks thanks not to the World Wide Web of today but rather the Semantic Web that it will evolve into tomorrow.”
WOODY
Today is 13 years later
How would we implement it?
Background knowledge
Heterogeneity
We want WOODY to read and understand background knowledge and use it in a smart way
Structured and Unstructured data
Syntactic and Semantic interoperability
Heterogeneity
Syntactic interoperability
• To unify the format of knowledge sources, enabling e.g. distributed queries
Tom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web, Morgan & Claypool Publishers 2011
Semantic interoperability
• Making sense of distributed data
• Enabling their automatic interpretation
• Different semantic perspectives must be addressed
Heterogeneity
Semantic interoperability
An ontology is a formal specification of a shared conceptualisation
Heterogeneity
This definition is valid for any Semantic Web knowledge resource
Semantic interoperability: formal specification
• Shared knowledge representation language
• Semantic interoperability to the extent of its formal semantics
rdfs:subClassOf
owl:equivalentClass
owl:sameAs
rdfs:subPropertyOf
owl:equivalentProperty
Semantic interoperability: conceptualisation
• We have to cope with knowledge sources conceptualisations
• Aligning knowledge sources at a conceptual level
formal specification
knowledge representation
cognition
conceptualisation
Semantic alignment
Semantic alignment 1+2+3
• One-by-one alignment of classes, properties and individuals
Xianpei Han, Le Sun, Jun Zhao: Collective entity linking in web text: a graph-based method. Proceedings of SIGIR 2011, ACM.
Jérôme Euzenat, Pavel Shvaiko: Ontology Matching, 2nd ed. 2013, Springer.
Semantic alignment 1+2+3
• Alignment to foundational theories, e.g. DOLCE
• They provide a universal reference framework from which to derive all possible consequences, inferences, errors.
• Assumption: foundational theory axioms always hold
Daniel Oberle et al.: DOLCE ergo SUMO: On foundational and domain models in the SmartWeb Integrated Ontology (SWIntO). J. Web Sem. 5(3): 156-174 (2007)
Aldo Gangemi, Nicola Guarino, Claudio Masolo, Alessandro Oltramari, Luc Schneider: Sweetening Ontologies with DOLCE. EKAW 2002: 166-181
Prateek Jain et al.: Contextual Ontology Alignment of LOD with an Upper Ontology: A Case Study with Proton
Smith B, Rosse C.: The role of foundational relations in the alignment of biomedical ontologies. Stud Health Technol Inform. 2004;107(Pt 1):444-8
dul:Agent
dul:NaturalPerson
Semantic alignment 1+2+3
• They provide a decontextualized view on data
• It is not enough for contextualized interoperability: making sense of data for a certain interactive/cognitive task
Alignment one-by-one
Alignment to foundational theories
Imagine we are interested in comparing the governors of California based on the laws they created.
[Figure: links labelled “one-by-one”]
In order to select the information that is relevant for performing our task, we need to extract only those facts that are framed by certain political concepts and relations.
lmdb:Terminator rdf:type lmdb:film
lmdb:Terminator lmdb:actor dbpedia:Arnold_Schwarzenegger
lmdb:Terminator lmdb:date "1984"^^xsd:date
lmdb:Terminator lmdb:director dbpedia:James_Cameron
lmdb:Terminator lmdb:sequel dbpedia:Terminator_2
dbpedia:Arnold_Schwarzenegger rdf:type dbpedia-owl:Office_Holder
dbpedia:Arnold_Schwarzenegger dbpprop:predecessor dbpedia:Lee_Haney
dbpedia:California_foie_gras_law dbpprop:governor dbpedia:Arnold_Schwarzenegger

ex:law_dp_CA_2010 rdf:type ex:Law
ex:law_dp_CA_2010 ex:creator dbpedia:Arnold_Schwarzenegger
ex:law_dp_CA_2010 ex:jurisdiction dbpedia:California
ex:law_dp_CA_2010 ex:name ex:drug_policy_CA_2010
ex:law_dp_CA_2010 ex:creationTime "2010"^^xsd:date
ex:law_dp_CA_2010 ex:forbidden “marijuana possession of up to one ounce”
The boundary problem
Aldo Gangemi, Valentina Presutti: Towards a pattern science for the Semantic Web. Semantic Web 1(1-2): 61-68 (2010)
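One way to picture the boundary problem is as a filter over triples. The sketch below is only an illustration, not the method of the cited paper: it assumes a hypothetical "law making" boundary given as a set of properties (LAW_MAKING_KP is an invented name) and keeps the facts from the slide that fall inside it.

```python
# A subset of the triples from the slide, kept as plain strings.
TRIPLES = [
    ("lmdb:Terminator", "rdf:type", "lmdb:film"),
    ("lmdb:Terminator", "lmdb:actor", "dbpedia:Arnold_Schwarzenegger"),
    ("dbpedia:Arnold_Schwarzenegger", "rdf:type", "dbpedia-owl:Office_Holder"),
    ("dbpedia:California_foie_gras_law", "dbpprop:governor", "dbpedia:Arnold_Schwarzenegger"),
    ("ex:law_dp_CA_2010", "rdf:type", "ex:Law"),
    ("ex:law_dp_CA_2010", "ex:creator", "dbpedia:Arnold_Schwarzenegger"),
    ("ex:law_dp_CA_2010", "ex:jurisdiction", "dbpedia:California"),
]

# Hypothetical boundary: the properties we assume belong to the political frame.
LAW_MAKING_KP = {"dbpprop:governor", "ex:creator", "ex:jurisdiction"}


def within_boundary(triples, kp_properties):
    """Keep only the triples whose predicate belongs to the pattern."""
    return [t for t in triples if t[1] in kp_properties]


selected = within_boundary(TRIPLES, LAW_MAKING_KP)
for s, p, o in selected:
    print(s, p, o)
```

The film-related facts about the same person are left out, which is the point of a local boundary: relevance is decided by the group of relations, not by the entity alone.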
[Figure annotation on the triples above: “similar”]
Semantic alignment 1+2+3
• We need interoperability at the level of groups of relations that together identify specific interpretational contexts!
• We need local reference theories defining conceptual boundaries -> Knowledge Patterns*
*(cf. Gangemi & Presutti, 2010)
Patterns are present in the (Semantic) Web
domain
Administrative frames
Geographic frames
Communication frames
DBpedia
Top-down resources
• Linguistic resources: FrameNet, VerbNet, Corpus Pattern Analysis
• Ontology Design Patterns (Content Patterns)
• EarthCube content patterns
• Component Library
• Cyc microtheories
• Data model patterns (David C. Hay)
• Infobox templates, microformats
All of them define patterns that provide conceptual context for representing data
Knowledge extraction methods
• Entity Linking based on key discovery (almost-key discovery*)
• Data/graph mining: frequent itemset/subgraphs, anomalies
• NLP: frame detection, event extraction
* Danai Symeonidou: Automatic key discovery for Data Linking, PhD Thesis, 2014.
They all mine data looking for patterns that help make sense of it.
KP hypothesis
Independently of the specific data structure or knowledge representation format, certain patterns share the same intensional meaning
Knowledge Pattern
Three heterogeneous knowledge sources (different data structures, different formats) that share the same intensional meaning, i.e. they all describe (model) a cooking situation
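The hypothesis can be made concrete with a toy example. The sketch below is an assumption-laden illustration: the three sources, their field names, and the KP roles (cook, food, instrument) are all invented, since the slide's figures are not recoverable. It shows three differently shaped records reducing to the same role assignment.

```python
# Three toy sources describing the same cooking situation (all content invented).
json_like = {"chef": "Anna", "dish": "risotto", "tool": "pan"}
table_row = ("Anna", "risotto", "pan")  # columns: cook, recipe, utensil
triples = [
    ("ex:cooking1", "ex:hasAgent", "Anna"),
    ("ex:cooking1", "ex:hasProduct", "risotto"),
    ("ex:cooking1", "ex:hasInstrument", "pan"),
]


# Each adapter maps a source-specific shape onto hypothetical KP roles.
def from_json(record):
    return {"cook": record["chef"], "food": record["dish"], "instrument": record["tool"]}


def from_row(row):
    cook, food, instrument = row
    return {"cook": cook, "food": food, "instrument": instrument}


def from_triples(ts):
    role_of = {
        "ex:hasAgent": "cook",
        "ex:hasProduct": "food",
        "ex:hasInstrument": "instrument",
    }
    return {role_of[p]: o for _, p, o in ts if p in role_of}


views = [from_json(json_like), from_row(table_row), from_triples(triples)]
print(all(view == views[0] for view in views))
```

Here the adapters are written by hand; the research question raised in the talk is how to discover such mappings automatically.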
Cognitive foundations of KPs
• People tend to remember items that fit into a schema (cf. Bartlett and a lot of CS since then)
• In particular, schemas that are associated with some functional similarity (cf. Gibson’s affordances)
• Schema similar to (conceptual) frame, script, knowledge pattern
How to represent KPs
• Class or property punning (with KP description)
• Property domain/range axiom punning (with KP roles)
• Typed named graphs
• OWL ontology modules (cf. ODP)
• SPARQL query patterns, SPIN patterns
• hasKey patterns
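To illustrate the "query pattern" option in the list above, here is a minimal sketch of a KP written as a set of triple patterns and matched against a small graph. It is plain Python rather than SPARQL, and the pattern LAW_CREATION and the matcher are hypothetical; a real system would hand the equivalent query to a SPARQL engine.

```python
def match(patterns, triples, binding=None):
    """Yield one dict of variable bindings per way the patterns fit the triples."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable binds on first use and must agree afterwards.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, triples, candidate)


GRAPH = [
    ("lmdb:Terminator", "lmdb:actor", "dbpedia:Arnold_Schwarzenegger"),
    ("ex:law_dp_CA_2010", "rdf:type", "ex:Law"),
    ("ex:law_dp_CA_2010", "ex:creator", "dbpedia:Arnold_Schwarzenegger"),
    ("ex:law_dp_CA_2010", "ex:jurisdiction", "dbpedia:California"),
]

# Hypothetical KP expressed as triple patterns; names starting with '?' are variables.
LAW_CREATION = [
    ("?law", "rdf:type", "ex:Law"),
    ("?law", "ex:creator", "?creator"),
    ("?law", "ex:jurisdiction", "?place"),
]

results = list(match(LAW_CREATION, GRAPH))
print(results)
```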
Pattern alignment
Peter Clark’s KP morphisms
Dedre Gentner’s analogical structure mapping
Content Pattern specialisation
Pattern alignment
Investigating the application of similarity measures to complex structures
vector spaces, graph matching, structure matching, etc.
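As a very rough starting point for such measures, the sketch below compares two patterns by the Jaccard overlap of their labelled edges. Both patterns are invented, and exact label matching is a strong simplifying assumption: it sidesteps the harder part of the problem, which is matching structures whose vocabularies differ.

```python
def jaccard(a, b):
    """Share of elements two sets have in common (1.0 for two empty sets)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Two invented patterns, each a set of (domain, relation, range) edges.
kp_a = {
    ("Cook", "prepares", "Food"),
    ("Cook", "uses", "Instrument"),
    ("Food", "hasIngredient", "Ingredient"),
}
kp_b = {
    ("Cook", "prepares", "Food"),
    ("Cook", "uses", "Instrument"),
    ("Food", "servedAt", "Place"),
}

similarity = jaccard(kp_a, kp_b)
print(similarity)
```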
Pattern alignment
• Network alignment (cf. Roded Sharan*)
• Modular structure of conserved clusters among yeast, worm, and fly
• Multiple network alignment revealed 183 conserved clusters.
*Roded Sharan et al.: Conserved patterns of protein interaction in multiple species, PNAS, 2005.
Some results at STLab on KP-based KE
Content Ontology Patterns
http://www.ontologydesignpatterns.org
Pattern-based Ontology Design
eXtreme Design
Including patterns in ontologies by design
Centrality discovery in datasets
[Figure: schema graph. Classes: mo:Track, mo:MusicArtist, mo:Playlist, mo:Torrent, tags:Tag, mo:Record, rdfs:Literal. Properties: foaf:maker, dc:title, dc:date, mo:image, dc:description, mo:track, tags:taggedWithTag, mo:available_as.]
Valentina Presutti, Lora Aroyo, Alessandro Adamou, Balthasar Schopman, Aldo Gangemi, Guus Schreiber: Extracting Core Knowledge from Linked Data. COLD2011, CEUR-WS.org Vol-782.
Schema induction of linked datasets based on patterns. Patterns are built around central concepts and used for automatic design of SPARQL queries
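A minimal sketch of the idea follows, under stated assumptions: the edges below are hypothetical (the figure names these classes and properties, but its exact edges are not recoverable from the transcript), and plain degree centrality stands in for whatever measure the cited paper uses. The query is built around the most connected type.

```python
from collections import Counter

# Hypothetical schema-level edges: (domain type, property, range type).
EDGES = [
    ("mo:Track", "foaf:maker", "mo:MusicArtist"),
    ("mo:Track", "dc:title", "rdfs:Literal"),
    ("mo:Track", "tags:taggedWithTag", "tags:Tag"),
    ("mo:Track", "mo:available_as", "mo:Torrent"),
    ("mo:Record", "mo:track", "mo:Track"),
    ("mo:Playlist", "mo:track", "mo:Track"),
]


def degree_centrality(edges):
    """Count incoming plus outgoing edges for every type."""
    degree = Counter()
    for source, _, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree


def query_around(center, edges):
    """Build a SPARQL SELECT over the outgoing properties of the central type."""
    props = [p for s, p, _ in edges if s == center]
    lines = [f"  ?x a {center} ."]
    lines += [f"  OPTIONAL {{ ?x {p} ?v{i} . }}" for i, p in enumerate(props)]
    return "SELECT * WHERE {\n" + "\n".join(lines) + "\n}"


center, _ = degree_centrality(EDGES).most_common(1)[0]
query = query_around(center, EDGES)
print(center)
print(query)
```

The generated query omits PREFIX declarations, which would have to be added before sending it to an endpoint.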
Encyclopedic Knowledge Patterns: example
• An Encyclopedic Knowledge Pattern (EKP) is discovered from the paths emerging from Wikipedia page link structure
• They are represented as OWL2 ontologies
Andrea Giovanni Nuzzolese, Aldo Gangemi, Valentina Presutti, Paolo Ciancarini: Encyclopedic Knowledge Patterns from Wikipedia Links. International Semantic Web Conference (1) 2011: 520-536
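The discovery step can be sketched as counting type-to-type link paths. What follows is a simplified illustration on invented data, not the procedure of the cited paper: for a subject type, it keeps the object types that a large enough share of that type's pages link to. The threshold value and the popularity ratio are assumptions made for the example.

```python
from collections import defaultdict

# Toy data (invented): one type per page, and page-to-page links.
TYPE_OF = {
    "Rome": "City", "Milan": "City", "Florence": "City",
    "Italy": "Country", "Dante": "Person", "Verdi": "Person",
}
LINKS = [
    ("Rome", "Italy"), ("Milan", "Italy"), ("Florence", "Italy"),
    ("Rome", "Dante"), ("Milan", "Verdi"), ("Dante", "Florence"),
]


def ekp_for(subject_type, links, type_of, threshold=0.75):
    """Object types linked from at least `threshold` of the subject type's pages."""
    subjects = {page for page, t in type_of.items() if t == subject_type}
    linking = defaultdict(set)
    for source, target in links:
        if source in subjects:
            linking[type_of[target]].add(source)
    return {
        object_type: len(pages) / len(subjects)
        for object_type, pages in linking.items()
        if len(pages) / len(subjects) >= threshold
    }


city_ekp = ekp_for("City", LINKS, TYPE_OF)
print(city_ekp)
```

In this toy data every City page links to a Country, while only two of three link to a Person, so only the City-to-Country path survives the threshold.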
Serendipity in exploratory browsing
Aemoo: exploratory search based on EKP - Semantic Web Challenge @ISWC 2011 – Short listed, 4th place
http://www.aemoo.org
Andrea Giovanni Nuzzolese, Valentina Presutti, Aldo Gangemi, Alberto Musetti, Paolo Ciancarini: Aemoo: exploring knowledge on the web. WebSci 2013: 272-275
Using Encyclopedic Knowledge Patterns for browsing Wikipedia
KP-based machine reading with FRED
http://wit.istc.cnr.it/stlab-tools/fred/
Valentina Presutti, Francesco Draicchio, Aldo Gangemi: Knowledge Extraction Based on Discourse Representation Theory and Linguistic Frames. EKAW 2012: 114-129
The New York Times reported that John McCarthy died. He invented the programming language LISP.
From natural language to linked data graphs, which are designed including event- and frame-based patterns
Relation discovery and property generation
http://wit.istc.cnr.it/kore-dev/legalo
Valentina Presutti et al. Uncovering the semantics of Wikipedia pagelinks. EKAW 2014.
f-measure=.83
Exploiting event- and frame-based patterns for relation discovery
Sentic frames from text
http://wit.istc.cnr.it/stlab-tools/sentilo
Superimposing sentic frames on event- and frame-based linked data graphs representing opinions, for sentiment analysis
• Hybridisation is the common factor of these methods
• Still far from solving the pattern alignment problem
• KP-based design of knowledge sources can support easier procedures for pattern alignment
Back to pattern alignment
KP hypothesis
Independently of the specific data structure or knowledge representation format, certain patterns share the same intensional meaning
Leveraging different techniques for knowledge extraction
Techniques: Ontology Matching, Social Network Analysis, Frame detection, Event extraction, Data Mining, Graph Mining, Rules
Results: Correspondence patterns, Unusual records, Frames, Events, Association rules, Frequent subgraphs, Anomalies, Frequent itemsets
Unifying their results by representing them as KPs
KP distributed system
Building a KP distributed system
The KP system starts with potentially approximate and incomplete patterns and evolves to become more and more robust and accurate thanks to continuous feedback
Knowledge pattern system
• Inspired by Minsky’s frame-systems
• Statistical methods can help to identify relations between KPs:
  • co-occurrence, causality, triggering, etc.
[Figure: a network of KPs]
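For the co-occurrence case, one common statistic is pointwise mutual information (PMI). The sketch below is an illustration on invented observations, where each set lists the KPs detected in one document; it measures association only and says nothing about causality or triggering.

```python
import math
from collections import Counter
from itertools import combinations

# Invented observations: the KPs detected in each document.
DOCS = [
    {"Respond_to_proposal", "Quarreling"},
    {"Respond_to_proposal", "Quarreling"},
    {"Respond_to_proposal"},
    {"Cooking"},
]


def pmi(docs):
    """PMI (base 2) for every pair of KPs that co-occur in at least one document."""
    n = len(docs)
    single = Counter(kp for doc in docs for kp in doc)
    pairs = Counter(pair for doc in docs for pair in combinations(sorted(doc), 2))
    return {
        (a, b): math.log2((count / n) / ((single[a] / n) * (single[b] / n)))
        for (a, b), count in pairs.items()
    }


scores = pmi(DOCS)
print(scores)
```

A positive score means the two KPs appear together more often than independence would predict; with so few observations the value is of course not reliable.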
A reviewing complaint case
• Imagine someone gets a paper rejection …
• … and comments on Facebook …
If we want to enable smart reasoning on heterogeneous sources we need a way to relate data like this paper’s review with this FB status
KP entailment
E.g. Patrick Pantel’s “VerbOcean”
reject [can-result-in] argue :: 11.634112
fn:Respond_to_proposal vo:can-result-in fn:Quarreling
reject ⊑ Respond_to_proposal
argue ⊑ Quarreling
x ∈ Interlocutor.respond_to_proposal
y ∈ Speaker.respond_to_proposal
z ∈ Proposal.respond_to_proposal
k ∈ Arguer1.quarreling
m ∈ Arguer2.quarreling
n ∈ Issue.quarreling
x = k, y = m, z ≈ n
reject(r,x,y,z,…) ⊢ argue(s,k,m,n,…) (entails)
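The lifting step on this slide, from a relation between verbs to a relation between frames, can be sketched as a lookup. The code below is a hedged illustration: the verb-to-frame table and the single scored relation come from the slide, while the function name and the data layout are assumptions.

```python
# Verb-to-frame assignments and the scored verb relation shown on the slide.
VERB_TO_FRAME = {
    "reject": "fn:Respond_to_proposal",
    "argue": "fn:Quarreling",
}
VERB_RELATIONS = [("reject", "can-result-in", "argue", 11.634112)]


def lift(verb_relations, verb_to_frame):
    """Rewrite verb-level relations as frame-level triples, keeping the score."""
    return [
        (verb_to_frame[a], "vo:" + relation, verb_to_frame[b], score)
        for a, relation, b, score in verb_relations
        if a in verb_to_frame and b in verb_to_frame
    ]


frame_relations = lift(VERB_RELATIONS, VERB_TO_FRAME)
print(frame_relations)
```

Relations whose verbs have no known frame are silently skipped here; aligning the frame roles (who rejects, who argues, about what) is a separate step not covered by this sketch.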
However…
• Automatic methods are never 100% accurate
• Regularities can reach statistical significance even when they are not relevant
• We need procedures and metrics for validating KPs
http://tylervigen.com/
Patterns vs KP
• A pattern is a motivated structure that is proposed by experts or emerges from inductive methods
• A KP formalises the intensional description of a class of situations, events, cases, etc.
• When is a proposed or emerging pattern a KP?
• Real data are dirty: spurious correlations
• How to single out spurious ones?
–Protagoras, ~450 B.C.
“Human is the measure of all things.”
We need humans in the cycle
Techniques: Ontology Matching, Social Network Analysis, Frame detection, Event extraction, Data Mining, Graph Mining, Rules, Crowdsourcing methods
Results: Correspondence patterns, Unusual records, Frames, Events, Association rules, Frequent subgraphs, Anomalies, Frequent itemsets
[Figure: nodes labelled K and KP]
Marco Fossati, Claudio Giuliano, Sara Tonelli: Outsourcing FrameNet to the Crowd. ACL (2) 2013: 742-747
Video games with a purpose applied to semantic tasks: http://knowledgeforge.org/ (Roberto Navigli)
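One simple way to put humans in the cycle is to aggregate crowd judgements on each candidate KP. The sketch below is an assumption, not a procedure from the talk or the cited works: the candidate names, the minimum number of votes and the agreement threshold are all invented for illustration.

```python
from collections import Counter


def validate(votes, min_votes=3, min_agreement=0.7):
    """Accept a candidate when enough workers judged it and most of them said yes."""
    decisions = {}
    for candidate, judgements in votes.items():
        if len(judgements) < min_votes:
            continue  # not enough evidence yet, leave the candidate undecided
        share = Counter(judgements)[True] / len(judgements)
        decisions[candidate] = share >= min_agreement
    return decisions


# Invented judgements: True means "this pattern makes sense".
VOTES = {
    "reject can-result-in argue": [True, True, True, False],
    "spurious_candidate": [False, False, True],
    "new_candidate": [True],
}

decisions = validate(VOTES)
print(decisions)
```

Undecided candidates stay in the pool, which fits the picture of a KP system that improves through continuous feedback.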
Conclusion
• We are less than halfway to implementing the original Semantic Web scenario
• A significant step ahead is introducing semantic interoperability at pattern level
• This requires the hybridisation of knowledge extraction methods as well as the reconciliation of patterns having different provenance (data mining, graph mining, ontology patterns, etc.)
• Knowledge Patterns are a key element for enabling such hybridisation
• Knowledge Patterns should be organised as a distributed linked system where links are relations enabling smart reasoning
• A distributed KP system is a resource evolving by a feeding cycle, which includes human computation
Special thanks to:
Aldo Gangemi, Malvina Nissim, Misael Mongiovì, Claudia d’Amato for their help and inspiring discussions.