aldo gangemi - meaning on the web: an empirical design perspective
DESCRIPTION
Presentation from Aldo Gangemi at SSSW 2012TRANSCRIPT
Meaning on the WebAn Empirical Design Perspective
Aldo [email protected]
Semantic Technology LabInstitute for Cognitive Sciences and Technologies
National Research Council, Rome, ItalyWork described is by STLab people:
Francesco Draicchio, Alberto Musetti, Andrea Nuzzolese, Valentina PresuttiAlessandro Adamou, Eva Blomqvist, Enrico Daga
1
What is this?
2
Are these similar?
3
What do you mean?• Mathieu did it slowly and deliberately, at
midnight, in the bathroom, with a knife
4
Killing?
5
Shaving?
6
Hanging puppets?
7
Actually ...• Mathieu buttered the toast slowly and
deliberately, at midnight, in the bathroom, with a knife
8
Donald Davidson: The Logical Form of Ac1on Sentences, 1967“the 'it' ... seems to refer to some en?ty”Davidson’s analysis started formal analysis of language in a modern way, and gave a founda?on to frame logics, which were the original use case for descrip?on logics ...That ‘it’ is key to meaning
What is the key to access meaning?• When interpreting images? pattern recognition• When understanding text? parsing, resolution, expectations (background
knowledge), hypothesis formation• For querying databases? key discovery• For social data mashups? application design (e.g. FB)• For linked data mashups? sparql queries, ontologies• For event recognition, situation awareness? context ontologies, cqels queries• Such “keys” provide access by putting the pieces together. Can we make
ontology design as an empirical science of putting pieces of meaning together?
9
Cognitive background
10
Blink and “thin slicing”• Many organisms share the ability to gather a
snap judgment about a situation in a blink of an eye (cf. M. Gladwell, Blink)–making sense of situations based on thin slices of
experience: few color spots, a quick movement, a hat, the length of hair, a facial expression, the relative position of two persons, ...
• Positive and negative slicing–quick reaction and expectation building vs.
stereotypes and prejudices
11
Blink and “thin slicing”• Many organisms share the ability to gather a
snap judgment about a situation in a blink of an eye (cf. M. Gladwell, Blink)–making sense of situations based on thin slices of
experience: few color spots, a quick movement, a hat, the length of hair, a facial expression, the relative position of two persons, ...
• Positive and negative slicing–quick reaction and expectation building vs.
stereotypes and prejudices
11
Foreground and background• People tend to remember things because they
stick out from the background (“profiling”)– but what makes a background as such?
• Expectations create scenarios– even things that are not there become part of the
scenario if activated by an expectation• Cf. Gestalt psychology (Köhler, Langacker, etc.)
12
Schema-based memory• People tend to remember items that fit into a
schema. – Things that are associated through some functional
similarity (cf. Gibson’s affordances)• Schemata seem to be learnt mostly inductively
–blocks world, repeated verbalization of invariant scenes, peek-a-boo, etc. Cf. Deb Roy’s TED talk
• Schema similar to (conceptual) frame, script, knowledge pattern, etc.
13
Origin of modern frames and knowledge patterns
• «When one encounters a new situation (or makes a substantial change in one's view of the present problem) one selects from memory a structure called a Frame. This is a remembered framework to be adapted to fit reality by changing details as necessary … a frame is a data-structure for representing a stereotyped situation» (Minsky 1975)
• Frames, schemas, scripts … «These large-scale knowledge configurations supply top-down input for a wide range of communicative and interactive tasks … the availability of global patterns of knowledge cuts down on non-determinacy enough to offset idiosyncratic bottom-up input that might otherwise be confusing» (Beaugrande, 1980)
How many frames?
• We (STLab) are collecting, reengineering, and aligning knowledge patterns from different knowledge formats into RDF and OWL–Web data: microformats, microdata, XML patterns–Linked data: induced schemas, data patterns–(Semantic web) ontologies: ontology design patterns,
extracted modules, query patterns–Linguistics: FrameNet and VerbNet frames, NLP-
extracted selectional restrictions, situations from parsed NL text
15
Limited keys totext/discourse understanding
16
What is a window?• (i) opening in a wall
– An opening, usually covered by one or more panes of clear glass, to allow light and air from outside to enter a building or vehicle. (Wiktionary)
• (ii) glass-filled frame fitting an opening in a wall– Welcome to Window World, America's Largest
Replacement Windows Company! (...) installs over 1 million replacement windows annually. (Ad)
• (iii) glass-filled frame– A window is a transparent or translucent opening in
a wall or door that allows the passage of light and, if not closed or sealed, air and sound. (Wikipedia)
• (iv) glass from glass-filled frame– A pane of glass in a window. (WordNet)
• (v) frame from glass-filled frame– A framework of wood or metal that contains a glass
windowpane and is built into a wall or roof to admit light or air. (WordNet) 17
Genres of polysemy in WordNet• Adam’s apple (metaphor)
– thyroid cartilage_1 (dul:PhysicalObject)– crape jasmine_1 (dul:Organism)
• Bucket shop (diachronic metaphor)– bucket shop_2 (dul:PhysicalObject)
• “(formerly) a cheap saloon selling liquor by the bucket”
– bucket shop_1 (dul:Organization)• “an unethical or overly aggressive brokerage firm”
• Stadium (established metonymy, “dot object”)– stadium_1 (dul:PhysicalObject)
• “a large structure for open-air sports or entertainments”
– arena_4 (d0:Location)• “a playing field where sports events take place”
– do we need a relation between these senses, or a coarser concept? 18
Dot objects and co-predication (Pustejovsky, Asher)
• This book is heavy but interesting– physical vs. information
• Lunch was delicious but took forever– food vs. event
• I saw the Coliseum in my tourist guide and wanted to go there– artifact vs. place
• Actually relevant?– Power of ambiguity (“systematic polysemy”)– Minimal effort seems to count in human evolution of lexical knowledge, but only if
we can easily reconstruct the context (or frame, relation, ...)• “The communicative function of ambiguity in language” (Piantadosi, Tilyb,
Gibson): ambiguity allows for greater ease of processing by permitting efficient linguistic units to be re-used. “All efficient communication systems will be ambiguous, assuming that context is informative about meaning”
– Also in science: inflammation has six interrelated meanings19
“Fictive motion” (Talmy)• The path descended abruptly• The road runs along the coast for two hours• The fence zigzags from the plateau to the valley• The highway crawls through the city• The road leads us to Cercedilla• Need for “type coercion” to satisfy hidden frame
– highway is actually a path that “can be crawled”, therefore crawling frame here is descriptive of a state, not of an action
– fence is actually an object whose shape “can be followed by zigzaging”– road is actually an object that “can be followed as an indication” to our
destination– Actually an inversion of roles: the path descends because it can be
descended: another version of systematic polysemy20
Meaning as relation
21
Meaning as relation• Event discovery and recognition
– { Water in pot ; pot on the hot stove ; egg in water } ∴ ???
• Meaning as a relational thing– Does “Egg” have a meaning independently of its relations? no, if meaning is what someone
does (or might do) with something?– cf. Bartlett, Davidson, Fillmore, Minsky, Schank, Brian Smith, Gibson, ...
• Dynamics (induction, abduction) of general and domain relations– Cook(Egg, Stove) as Absorb_heat(Entity, Heat_source) as Becoming(Entity, Cause)
21
Meaning as relation• Event discovery and recognition
– { Water in pot ; pot on the hot stove ; egg in water } ∴ ???
• Meaning as a relational thing– Does “Egg” have a meaning independently of its relations? no, if meaning is what someone
does (or might do) with something?– cf. Bartlett, Davidson, Fillmore, Minsky, Schank, Brian Smith, Gibson, ...
• Dynamics (induction, abduction) of general and domain relations– Cook(Egg, Stove) as Absorb_heat(Entity, Heat_source) as Becoming(Entity, Cause)
21
FrameNet frameshave a par?al order
Meaning as relation• Event discovery and recognition
– { Water in pot ; pot on the hot stove ; egg in water } ∴ ???
• Meaning as a relational thing– Does “Egg” have a meaning independently of its relations? no, if meaning is what someone
does (or might do) with something?– cf. Bartlett, Davidson, Fillmore, Minsky, Schank, Brian Smith, Gibson, ...
• Dynamics (induction, abduction) of general and domain relations– Cook(Egg, Stove) as Absorb_heat(Entity, Heat_source) as Becoming(Entity, Cause)
21
FrameNet frameshave a par?al order
Limited keys tosocial tagging analysis
22
Flickr, desire tag
23
Flickr, desire tag
24
Flickr, desire tag
25
Similarly in other folksonomies
• like in FB, G+, etc.– like what? thing, comment to thing, content of
cited thing, implicit negative attitude to thing?• positive/negative polarity in sentiment
analysis– polarity of what? tweet, citation, irony, ...
26
Limited keys tolinked data analysis
27
28
Aggregated RDF data for “Barack
Obama” (sig.ma)
What about the Knowledge Graph?
29
however, where are the IRIs? named en11es are only searchable within the Google engine, content is completely encrypted
Ontologies disagree
30
Is that a problem?For many, yes
schema.org• The use cases that motivated schema.org are
related to search engine requirements, probably to meet popularity and advertisement
• We received schemas from the “Sponsors”. As often happens, “sponsors” create meaning, because they have either authority or money
• Will ontologies be swallowed by the Sponsors?• Anyway, schemas from schema.org seem to
address keys more accurately, with some exceptions ...
31
Pragmatic issue: we can communicate with landforms
32
Pragmatic issue: we can communicate with landforms
32
Pragmatic issue: we can communicate with landforms
32
Pragmatic issue: we can communicate with landforms
32
Pragmatic issue: we can communicate with landforms
32
33
Application issue: bone pathologies, locations, functions are all literals
33
Application issue: bone pathologies, locations, functions are all literals
33
Application issue: bone pathologies, locations, functions are all literals
33
Application issue: bone pathologies, locations, functions are all literals
Creating keys for the Semantic Web
34
Top-down: expertise patterns• Evidence that units of expertise are larger than what we have
from average linked data triples, or ontology learning– Cf. cognitive scientist Dedre Gentner: “uniform relational representation is a
hallmark of expertise”– We need to create expertise-oriented boundaries unifying multiple triples– “Competency questions” are used to link ontology design patterns to
requirements:• Which objects take part in a certain event?• Which tasks should be executed in order to achieve a certain goal?• What’s the function of that artifact?• What norms are applicable to a certain case?• What inflammation is active in what body part with what morphology?
35
Layered pattern morphisms• A logical design pattern describes a formal expression that
can be exemplified, morphed, instantiated, and expressed in order to solve a domain modelling problem
• owl:Class:_:x rdfs:subClassOf owl:Restriction:_:y• Inflammation rdfs:subClassOf (localizedIn some BodyPart)• Colitis rdfs:subClassOf (localizedIn some Colon)• John’s_colitis isLocalizedIn John’s_colon• “John’s colon is inflammated”, “John has got colitis”, “Colitis is the
inflammation of colon”
LogicalPaPern(MBox)
GenericContentPaPern (TBox)
SpecificContentPaPern (TBox)
DataPaPern (ABox)
exemplifiedAs morphedAs instan1atedAs Linguis?cPaPern
expressedAs
Logic Meaning Reference Expression
expressedAs
expressedAs
Abstrac1on
Bottom-up: schema extraction (COLD2011)
37
Results• A method for extracting the main knowledge components
of a LD dataset
mo:available_as
mo:format
mo:Track
mo:Playlist
mo:Record
mo:track
mo:Record
mo:trackmo:Track
foaf:maker
foaf:name
mo:MusicAr1st
dc:1tle
mo:available_asmo:Playlist
mo:Record mo:track mo:Track mo:available_as mo:Playlist dc:format rdfs:Literalmo:Signal mo:published_as mo:Track mo:available_as mo:Playlist dc:format rdfs:Literalmo:MusicArtist foaf:made mo:Record mo:available_as mo:Torrent dc:format rdfs:Literalmo:MusicArtist foaf:made mo:Record mo:available_as mo:ED2K dc:format rdfs:Literalmo:MusicArtist foaf:made mo:Record mo:available_as mo:Playlist dc:format rdfs:Literal
Path
Cluster of paths
Knowledge PaPern
Path identification (length 3)mo:MusicAr?st
mo:Track
mo:Record
mo:Playlist
mo:Signal
rdfs:Literal
PREFIX mo : http://purl.org/ontology/mo/MusicArtist
rdfs:Resource
dc:1tle
mo:published_as
mo:recorded_as
(Jamendo)
foaf:Document
Path identification (length 3)mo:MusicAr?st
mo:Track
mo:Record
mo:Playlist
mo:Signal
rdfs:Literal
PREFIX mo : http://purl.org/ontology/mo/MusicArtist
rdfs:Resource
dc:1tle
mo:published_as
mo:recorded_as
(Jamendo)
foaf:Document
Path ElementPosi?on 2
Centrality (types)mo:MusicAr?st
mo:Track
mo:Record
mo:Playlist
mo:Signal
rdfs:Literal
PREFIX mo : http://purl.org/ontology/mo/MusicArtist
rdfs:Resource
(Jamendo)
foaf:Document
dc:1tle
mo:published_as
mo:track
mo:license
mo:track_number
mo:available_as
Centrality (properties)mo:MusicAr?st
mo:Track
mo:Record
mo:Playlist
mo:Signal
rdfs:Literal
PREFIX mo : http://purl.org/ontology/mo/MusicArtist
rdfs:Resource
(Jamendo)
foaf:Document
dc:1tle
mo:published_as
mo:license
mo:track_number
mo:available_as
mo:recorded_as
42
0 2 4 6 8 10 12
foaf:made foaf:maker
mo:available_as mo:published_as
mo:track mo:time
tags2:taggedWithTag
0 20000 40000 60000 80000 100000 120000 140000
mo:available_as tags2:taggedWithTag
mo:published_as mo:time mo:track
foaf:made foaf:maker
0 2 4 6 8 10 12
mo:Record
mo:Track
mo:MusicArtist
mo:Playlist
mo:Signal
time:Interval
mo:ED2K
mo:Torrent
tags2:Tag
mo:Lyrics
0 20000 40000 60000 80000 100000 120000
mo:Playlist
mo:Track
mo:Signal
time:Interval
mo:ED2K
mo:Torrent
tags2:Tag
mo:Lyrics
mo:Record
mo:MusicArtist
Centrality in Jamendo
frequency
betweenness
Emerging Knowledge Patternmo:Track
mo:MusicAr.st
mo:Playlist
mo:Torrent
mo:ED2K
tags:Tag
mo:Record
foaf:maker
rdfs:Literal
dc:1tle
dc:datemo:image
dc:descrip1on
mo:track
tags:taggedWithTag
mo:available_as
mo:available_as
mo:available_as
Bottom-up: Encyclopedic Knowledge Patterns (EKP) (ISWC2011)
• Improving knowledge exploration and summarization by:–Empirically discovering invariances in conceptual
organization of knowledge – encyclopedic knowledge patterns – from Wikipedia crowd-sourced page links
–Understanding the most intuitive way of selecting relevant entities used to describe a given entity
– Identifying the typical / atypical types of things that people use for describing other things
–Enabling serendipitous search
Input data• Wikipedia page links generate 107.9M triples• Infobox-based triples are 13.6M, including data value triples (9.4M) • “Unmapped” object value triples are only 7% of page links
Input data• Wikipedia page links generate 107.9M triples• Infobox-based triples are 13.6M, including data value triples (9.4M) • “Unmapped” object value triples are only 7% of page links
dbpo:MusicalArtist
dbpo:MusicalArtist dbpo:Organisation dbpo:Place
Input data• Wikipedia page links generate 107.9M triples• Infobox-based triples are 13.6M, including data value triples (9.4M) • “Unmapped” object value triples are only 7% of page links
dbpo:MusicalArtistdbpo:MusicalArtist
dbpo:Organisation
dbpo:PlacelinksToMusicalArtist linksToPlace
linksToOrganisation
• Paths are used to discover Encyclopedic Knowledge Patterns– Such patterns should make it emerge the most typical types of things that the Wikipedia
crowd uses to describe a resource of a given type
Input data• Wikipedia page links generate 107.9M triples• Infobox-based triples are 13.6M, including data value triples (9.4M) • “Unmapped” object value triples are only 7% of page links
dbpo:MusicalArtistdbpo:MusicalArtist
dbpo:Organisation
dbpo:PlacelinksToMusicalArtist linksToPlace
linksToOrganisation
• Paths are used to discover Encyclopedic Knowledge Patterns– Such patterns should make it emerge the most typical types of things that the Wikipedia
crowd uses to describe a resource of a given type
Path Pi,j= [Si, p, Oj]
Encyclopedic Knowledge Patterns• An Encyclopedic Knowledge Pattern (EKP) is discovered from the
paths emerging from Wikipedia page link invariances• They are represented as OWL2 ontologies
Paths and indicators• Emerging paths are stored in RDF according
to the “Knowledge Architecture” vocabulary–Cf. our COLD2011 paper “Extracting core
knowledge from linked data”• Paths and types are associated with a set of
indicators
nRes(dbpo:MusicalArtist)
Chad_Smith
John_Lennon
Michael_Jackson
PREFIX dbpo : http://dbpedia.org/ontology/
Paul_McCartney
Jackie_Jacksonrdf:type
Anthony_Kiedis
dbpo:MusicalArtist
dbpo:MusicalArtist
rdf:type
rdf:type
dbpo:MusicalArtist
rdf:type
rdf:type
rdf:type
Number of resources having type dbpo:MusicalArtist
dbpo:MusicalArtist
nSubjectRes(Pi,j)
Dave_Grohl
Foo Fighters
John_Lennon
Michael_Jackson
PREFIX dbpo : http://dbpedia.org/ontology/
Paul_McCartney
Jackson_5
Beatles
Jackie_Jackson
dbpo:MusicalArtist dbpo:Banddbpo:wikiPageWikiLinkSi Oj
Nirvana
nSubjectRes(Pi,j)
Dave_Grohl
Foo Fighters
John_Lennon
Michael_Jackson
PREFIX dbpo : http://dbpedia.org/ontology/
Paul_McCartney
Jackson_5
Beatles
Jackie_Jackson
dbpo:MusicalArtist dbpo:Banddbpo:wikiPageWikiLinkSi Oj
Nirvana
Number of distinct resources that participate in a path as subjects
pathPopularity(Pi,j,Si)
Dave_Grohl
Foo Fighters
John_Lennon
Michael_Jackson
Paul_McCartney
Jackson_5
Beatles
Jackie_Jackson
Nirvana
Madonna
PrinceCharlie_Parker Keith_Jarrett
nSubjectRes(Pi,j)/nRes(Si)
Path Popularity distribution examplePath pathPopularity
[MusicFestival,Band] 82.33[MusicFestival,MusicalArtist] 74.17[MusicFestival,Country] 74.02[MusicFestival,MusicGenre] 72.81[MusicFestival,City] 38.37[MusicFestival,AdministrativeRegion] 32.78[MusicFestival,MusicFestival] 23.26[MusicFestival,Album] 18.13[MusicFestival,Film] 12.39[MusicFestival,Stadium] 9.52[MusicFestival,RadioStation] 9.52[MusicFestival,Single] 8.76[MusicFestival,Town] 8.61[MusicFestival,Magazine] 8.46[MusicFestival,Broadcast] 7.55[MusicFestival,Newspaper] 6.95[MusicFestival,TelevisionShow] 6.04[MusicFestival,University] 5.74[MusicFestival,Continent] 5.59[MusicFestival,Comedian] 5.29[MusicFestival,OfficeHolder] 4.98[MusicFestival,Island] 4.98
Boundaries of EKPs• An EKP(Si) is a set of paths, such that
Pi,j ∈ EKP(Si) ! pathPopularity(Pi,j, Si) ≥ t
• t is a threshold, under which a path is not included in an EKP
• How to get a good value for t?
Boundary inductionStep Description
1. For each path, calculate the path popularity
2. For each subject type, get the 40 top-ranked path popularity values*
3. Apply multiple correlation (Pearson ρ) between the paths of all subject types by rank, and check for homogeneity of ranks across subject types
4. For each of the 40 path popularity ranks, calculate its mean across all subject types
5. Apply k-means clustering on the 40 ranks
6. Decide threshold(s) based on k-means as well as other indicators (e.g. FrameNet roles distribution)
* 40 covers most “core” path popularity values, as well as many of the unusual ones.
.9
k-means clustering on Path Popularity
Sample distribution of pathPopularity for DBpedia paths. The x-axis indicates how many paths (on average) are above a certain value t for pathPopularity
k-means clustering on Path Popularity
1 big cluster (4-cluster) with ranks below 18.18%
Sample distribution of pathPopularity for DBpedia paths. The x-axis indicates how many paths (on average) are above a certain value t for pathPopularity
k-means clustering on Path Popularity
1 big cluster (4-cluster) with ranks below 18.18%
3 small clusters with ranks above 22.67%
Sample distribution of pathPopularity for DBpedia paths. The x-axis indicates how many paths (on average) are above a certain value t for pathPopularity
k-means clustering on Path Popularity
1 big cluster (4-cluster) with ranks below 18.18%
3 small clusters with ranks above 22.67%
Sample distribution of pathPopularity for DBpedia paths. The x-axis indicates how many paths (on average) are above a certain value t for pathPopularity
k-means clustering on Path Popularity
1 big cluster (4-cluster) with ranks below 18.18%
3 small clusters with ranks above 22.67%
Sample distribution of pathPopularity for DBpedia paths. The x-axis indicates how many paths (on average) are above a certain value t for pathPopularity
1 alternative cluster (6-cluster) with ranks below 11.89%
What is the “agreement” between DBpedia and our sample users?
Average multiple correlation (Spearman ρ) between users’ assigned scores, and
pathPopularityDBpedia based scoresSpearman ρ range: [-1,1] −1 = no agreement
+1 = complete agreement)
Multiple correlation coefficient (Spearman ρ) between users’s assigned score, and
pathPopularityDBpedia based score
What is the “agreement” between DBpedia and our sample users?
Average multiple correlation (Spearman ρ) between users’ assigned scores, and
pathPopularityDBpedia based scoresSpearman ρ range: [-1,1] −1 = no agreement
+1 = complete agreement)
Multiple correlation coefficient (Spearman ρ) between users’s assigned score, and
pathPopularityDBpedia based score
Good correlation
Satisfactory precision
What is the “agreement” between DBpedia and our sample users?
Average multiple correlation (Spearman ρ) between users’ assigned scores, and
pathPopularityDBpedia based scoresSpearman ρ range: [-1,1] −1 = no agreement
+1 = complete agreement)
Multiple correlation coefficient (Spearman ρ) between users’s assigned score, and
pathPopularityDBpedia based score
Good correlation
Satisfactory precision
Pi,j ∈ EKP (Si) ⇔pathPopularity(Pi,j, Si) ≥
11%
Aemoo• http://aemoo.org exploratory search application based on
EKP
Comparing entities of the same type
58
Nicolas Sarkozy Ronald Regan Angela MerkelBarack Obama Silvio Berlusconi
Applying topic-sensitive EKP as lenses
59
Limited keys totext/discourse analysis
60
Ontology learning: what structures?
• State of art: what do we learn?– instances (NER): BarackObama– classes (sense tagging): Person, or even TelevisionActor– relations between instances: CabezaDeVaca departed Spain– axioms: mostly taxonomic or disjointness, some work also on
learning restrictions: TelevisionActor disjointWith TVSeries– entire ontologies, but only as collections of independently learnt
axioms
61
Kinds of text analyses• Lead example:
–“In early 1527, Cabeza De Vaca departed Spain as the treasurer of the Narvaez royal expedition to occupy the mainland of North America. After landing near Tampa Bay, Florida on April 15, 1528, Cabeza De Vaca and three other men would be the only survivors of the expedition party of 600 men.”
62
Shallow parsing (Alchemy)• Language
– english
• Person (1)
– the only survivors
• GeographicFeature (1)
– Tampa Bay
• Continent (1)
– North America
• Country (1)
– Spain
• StateOrCounty (1)
– Florida
• Concept Tags (8)
– United States
– Florida
– Gulf of Mexico
– North America
– Tampa, Florida
– Narváez expedition
– American Civil War
– Tampa Bay
• Category
– Culture & Politics (confidence: 0.6598)
Named en?ty recogni?on
Language recogni?on
Concept tagging
Subject classifica?on
Rela?on (ac?on) extrac?on
Entity resolution (Wikifier)
64Wikipedia-‐based topic detec?onNamed en?ty resolu?on
Multimedia enrichment (Zemanta)
65
Related images
Related news
Related links and geo-‐referencing
Linked data positioning (RelFinder)
66
Deep parsing• Usage of syntactic parse trees• Formal semantic interpretation on top of
syntax
67
Categorial parsing
68
w(1, 1, 'In', 'in', 'IN', 'I-PP', 'O', '(S[X]/S[X])/NP').w(1, 2, 'early', 'early', 'JJ', 'I-NP', 'I-DAT', 'N/N').w(1, 3, '1527', '1527', 'CD', 'I-NP', 'I-DAT', 'N').w(1, 4, ',', ',', ',', 'O', 'I-DAT', ',').w(1, 5, 'Cabeza', 'Cabeza', 'NNP', 'I-NP', 'I-ORG', 'N/N').w(1, 6, 'De', 'De', 'NNP', 'I-NP', 'I-ORG', 'N/N').w(1, 7, 'Vaca', 'Vaca', 'NNP', 'I-NP', 'I-ORG', 'N').w(1, 8, 'departed', 'depart', 'VBD', 'I-VP', 'O', '((S[dcl]\NP)/PP)/NP').w(1, 9, 'Spain', 'Spain', 'NNP', 'I-NP', 'O', 'N').w(1, 10, 'as', 'as', 'IN', 'I-PP', 'O', 'PP/NP').w(1, 11, 'the', 'the', 'DT', 'I-NP', 'O', 'NP[nb]/N').w(1, 12, 'treasurer', 'treasurer', 'NN', 'I-NP', 'O', 'N').w(1, 13, 'of', 'of', 'IN', 'I-PP', 'O', '(NP\NP)/NP').w(1, 14, 'the', 'the', 'DT', 'I-NP', 'O', 'NP[nb]/N').w(1, 15, 'Narvaez', 'Narvaez', 'NNP', 'I-NP', 'I-ORG', 'N/N').w(1, 16, 'royal', 'royal', 'NN', 'I-NP', 'O', 'N/N').w(1, 17, 'expedition', 'expedition', 'NN', 'I-NP', 'O', 'N').w(1, 18, 'to', 'to', 'TO', 'I-VP', 'O', '(S[to]\NP)/(S[b]\NP)').w(1, 19, 'occupy', 'occupy', 'VB', 'I-VP', 'O', '(S[b]\NP)/NP').w(1, 20, 'the', 'the', 'DT', 'I-NP', 'O', 'NP[nb]/N').w(1, 21, 'mainland', 'mainland', 'NN', 'I-NP', 'O', 'N').w(1, 22, 'of', 'of', 'IN', 'I-PP', 'O', '(NP\NP)/NP').w(1, 23, 'North', 'North', 'NNP', 'I-NP', 'I-LOC', 'N/N').w(1, 24, 'America', 'America', 'NNP', 'I-NP', 'I-LOC', 'N').w(1, 25, '.', '.', '.', 'O', 'O', '.').w(1, 26, 'After', 'after', 'IN', 'I-PP', 'O', '(S[X]/S[X])/NP').w(1, 27, 'landing', 'landing', 'NN', 'I-NP', 'O', 'N').w(1, 28, 'near', 'near', 'IN', 'I-PP', 'O', '(NP\NP)/NP').w(1, 29, 'Tampa', 'Tampa', 'NNP', 'I-NP', 'I-LOC', 'N/N').w(1, 30, 'Bay', 'Bay', 'NNP', 'I-NP', 'I-LOC', 'N').w(1, 31, ',', ',', ',', 'O', 'O', ',').w(1, 32, 'Florida', 'Florida', 'NNP', 'I-NP', 'I-LOC', 'N').w(1, 33, 'on', 'on', 'IN', 'I-PP', 'O', '(NP\NP)/NP').w(1, 34, 'April', 'April', 'NNP', 'I-NP', 'I-DAT', 'N/N[num]').w(1, 35, '15', '15', 'CD', 'I-NP', 'I-DAT', 'N[num]').w(1, 36, ',', ',', ',', 'I-NP', 'I-DAT', ',').w(1, 37, '1528', '1528', 'CD', 'I-NP', 'I-DAT', 'N\N').w(1, 38, ',', ',', ',', 'O', 'O', ',').w(1, 39, 'Cabeza', 'Cabeza', 'NNP', 'I-NP', 'I-ORG', 'N/N').w(1, 40, 'De', 'De', 'NNP', 'I-NP', 'I-ORG', 'N/N').w(1, 41, 'Vaca', 'Vaca', 'NNP', 'I-NP', 'I-ORG', 'N').w(1, 42, 'and', 'and', 'CC', 'O', 'O', 'conj').w(1, 43, 'three', 'three', 'CD', 'I-NP', 'O', 'N/N').w(1, 44, 'other', 'other', 'JJ', 'I-NP', 'O', 'N/N').w(1, 45, 'men', 'man', 'NNS', 'I-NP', 'O', 'N').w(1, 46, 'would', 'would', 'MD', 'I-VP', 'O', '(S[dcl]\NP)/(S[b]\NP)').w(1, 47, 'be', 'be', 'VB', 'I-VP', 'O', '(S[b]\NP)/NP').w(1, 48, 'the', 'the', 'DT', 'I-NP', 'O', 'NP[nb]/N').w(1, 49, 'only', 'only', 'JJ', 'I-NP', 'O', 'N/N').w(1, 50, 'survivors', 'survivor', 'NNS', 'I-NP', 'O', 'N').w(1, 51, 'of', 'of', 'IN', 'I-PP', 'O', '(NP\NP)/NP').w(1, 52, 'the', 'the', 'DT', 'I-NP', 'O', 'NP[nb]/N').w(1, 53, 'expedition', 'expedition', 'NN', 'I-NP', 'O', 'N/N').w(1, 54, 'party', 'party', 'NN', 'I-NP', 'O', 'N').w(1, 55, 'of', 'of', 'IN', 'I-PP', 'O', '(NP\NP)/NP').w(1, 56, '600', '600', 'CD', 'I-NP', 'O', 'N/N').w(1, 57, 'men', 'man', 'NNS', 'I-NP', 'O', 'N').w(1, 58, '.', '.', '.', 'O', 'O', '.').
Relation extraction with OIE• Cabeza De Vaca ___departed___Spain• 0.9059944442645228• In early 1527 , Cabeza De Vaca departed Spain as the treasurer of the Narvaez royal expedition to
occupy the mainland of North America .• IN JJ CD , NNP NNP NNP VBD NNP IN DT NN IN DT NNP JJ NN TO VB DT NN IN NNP NNP .• B-PP B-NP I-NP O B-NP I-NP I-NP B-VP B-NP B-PP B-NP I-NP I-NP I-NP I-NP I-NP I-NP B-VP I-
VP B-NP I-NP I-NP I-NP I-NP O• cabeza de vaca___depart___spain
• three other men___would be the only survivors of___the expedition party of 600 men• 0.36592485864619345• After landing near Tampa Bay , Florida on April 15 , 1528 , Cabeza De Vaca and three other men
would be the only survivors of the expedition party of 600 men .• IN NN IN NNP NNP , NNP IN NNP CD , CD , NNP NNP NNP CC CD JJ NNS MD VB DT JJ NNS
IN DT NN NN IN CD NNS .• B-PP B-NP B-PP B-NP I-NP O B-NP B-PP B-NP I-NP I-NP I-NP O B-NP I-NP I-NP O B-NP I-NP I-
NP B-VP I-VP B-NP I-NP I-NP I-NP I-NP I-NP I-NP I-NP I-NP I-NP O• # other men___be survivor of___the expedition party of # men
69
Creating keys for NL
70
DRT• Discourse Representation Theory
– Hans Kamp: “A theory of truth and semantic representation”, 1981• A sentence meaning is taken to be an update operation on a context• DRT represents “the discourse context” as a discourse representation
structure (or DRS). A DRS includes:– A set of referents: the entities which have been introduced into the
context– A set of conditions: the predicates which are known to hold of these
entities– Basically a fragment of FOL
71
DRT notation
72
Boxer• Implementation of computational semantics (J. Bos)
with DRT output and Davidsonian predicates: reification of n-ary relations– E.g. preparing a coffee: agent, instrument, mix, place, time,
method, ...
• Semantic role labelling with VerbNet or FrameNet roles
• Pragmatic grasp, with statistical NER and sense tagging, tense logic, co-reference, presupposition, sentence integration, entailment, ...
73
DRT semantic parsing + semantic role labellling in Boxer
74
Issues when porting (Boxer) DRT to RDF
• Discourse referent variables: implicit or explicit?• No terminology recognition/extraction• No term compositionality• No periphrastic relations for properties (e.g. survivorOf)• Redundant variables• Missing definitional pragmatics• Implicit local restrictions• Many types of redundant “boxing” for embedded propositions, non-
standard negation, etc.• How to map efficiently to RDF/OWL?
75
Using FRED (EKAW2012)
76
http://wit.istc.cnr.it/stlab-tools/fred/
FRED RDF graph output
77
Another example
78
The New York Times reported that John McCarthy died. He invented the programming language LISP.
Heuristics to map Boxer DRT to RDF (1/2)
• Default predicates – H_pred1: special vocabulary for defaults, e.g. boxer:Per rdf:type owl:Class– H_pred2: mappings to existing vocabularies, e.g. foaf:Person ; dul:associatedWith ; time:Interval
• Default roles (SRL binary predicates)– H_rol: VerbNet/FrameNet vocabularies, e.g. verbnet:agent rdf:type owl:ObjectProperty
• Domain predicates– H_dom: default namespace, customizable, e.g. domain:Expedition ; domain:of
• Discourse referent variables: implicit or explicit?– H_ref: only referents of predicates are materialized, e.g. domain:survivor_1 rdf:type
domain:Survivor ; depart_1 rdf:type domain:Depart
• No terminology recognition/extraction– H_term: co-referential predicate merging, e.g. domain:ExpeditionParty
• No term compositionality– H_termc: create taxonomies by “unmerging” predicates, e.g. domain:ExpeditionParty
rdfs:subClassOf domain:Party
• No periphrastic relations– H_peri: generate “webby” properties, e.g. domain:survivorOf rdf:type owl:ObjectProperty79
• Redundant variables– H_red: unify variables based on local Unique Name Assumption
• Boolean constructs– H_boo: special relations between domain predicates or individuals, e.g. domain:man_1
owl:differentFrom domain:man_2
• Propositions as arguments– H_pro: special relation between events, e.g. domain:report_1 vn:theme domain:depart_1
• Missing definitional pragmatics– H_def: bypassing discourse referents and forcing universal quantification, e.g.
domain:WindInstrument rdfs:subClassOf domain:MusicalInstrument
• Implicit local restrictions (optional)– H_res: inducing anonymous classes, e.g. domain:WindInstrument rdfs:subClassOf (locationOf
some domain:Contain)
80
Heuristics to map Boxer DRT to RDF (2/2)
Wikipedia typing• Typing Wikipedia entities is not easy• A lot of resources, limited alignment, limited
coverage, different granularity• We realized a fresh approach by
– extracting NL definitions– producing RDF with FRED– refactoring FRED output to distill types– linking to DBpedia entities– linking to WordNet 3.0– linking to SuperSenses and DOLCE+DnS (DUL)
81
Tìpalo: a FRED app
82
http://wit.istc.cnr.it/stlab-tools/tipalo/
Definition of chaise longue
83
A chaise longue is an upholstered sofa in the shape of a chair that is long enough to support the legs
Conclusions• Cognitive science is quite explicit on what meaning is
for humans: an activity of “framing reality for a purpose”
• Frames can be keys to that relational meaning• The Semantic Web is now growing a lot of data: our
tools are trying to make sense of them by using appropriate keys
• Empirical extraction and discovery of frames/knowledge patterns is feasible
• But let’s join forces: meaning is what we do, and should be shared
84
References• Alchemy (shallow parsing): http://www.alchemyapi.com/api/demo.html
• Wikifier (entity resolution): http://wit.istc.cnr.it/stlab-tools/wikifier
• Zemanta (multimedia enrichment): http://www.zemanta.com/demo/
• RelFinder (linked data visualization): http://www.visualdataweb.org/relfinder/demo.swf
• ReVerb (relation extraction): http://openie.cs.washington.edu/
• C&C (categorial grammar): http://svn.ask.it.usyd.edu.au/trac/candc/wiki/Demo
• Boxer (DRT deep parsing): http://svn.ask.it.usyd.edu.au/trac/candc/wiki/boxer
• FRED (DRT to RDF): http://wit.istc.cnr.it/stlab-tools/fred/
• Tìpalo (Wikipedia definitions in RDF): http://wit.istc.cnr.it/stlab-tools/tipalo
• Aemoo (serendipitous search): http://aemoo.org/
• ontologydesignpatterns.org portal: http://www.ontologydesignpatterns.org
• Talmy’s fictive motion: http://en.wikipedia.org/wiki/Fictive_motion
• Pustejovsky’ dot objects: http://pages.cs.brandeis.edu/~jamesp/publications.php
• Knowledge Architecture paper: http://ceur-ws.org/Vol-782/PresuttiEtAl_COLD2011.pdf
• FRED paper: http://ekaw2012.ekaw.org/node/137
• EKP paper: http://www.stlab.istc.cnr.it/documents/papers/wikilinkpatterns.pdf 85