aldo gangemi - meaning on the web: an empirical design perspective

106
Meaning on the Web An Empirical Design Perspective Aldo Gangemi [email protected] Semantic Technology Lab Institute for Cognitive Sciences and Technologies National Research Council, Rome, Italy Work described is by STLab people: Francesco Draicchio, Alberto Musetti, Andrea Nuzzolese, Valentina Presutti Alessandro Adamou, Eva Blomqvist, Enrico Daga 1

Upload: sssw2012

Post on 27-Jun-2015

1.442 views

Category:

Education


1 download

DESCRIPTION

Presentation from Aldo Gangemi at SSSW 2012

TRANSCRIPT

Page 1: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Meaning on the WebAn Empirical Design Perspective

Aldo [email protected]

Semantic Technology LabInstitute for Cognitive Sciences and Technologies

National Research Council, Rome, ItalyWork described is by STLab people:

Francesco Draicchio, Alberto Musetti, Andrea Nuzzolese, Valentina PresuttiAlessandro Adamou, Eva Blomqvist, Enrico Daga

1

Page 2: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

What is this?

2

Page 3: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Are these similar?

3

Page 4: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

What do you mean?• Mathieu did it slowly and deliberately, at

midnight, in the bathroom, with a knife

4

Page 5: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Killing?

5

Page 6: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Shaving?

6

Page 7: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Hanging puppets?

7

Page 8: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Actually ...• Mathieu buttered the toast slowly and

deliberately, at midnight, in the bathroom, with a knife

8

Donald  Davidson:  The  Logical  Form  of  Ac1on  Sentences,  1967“the  'it'  ...  seems  to  refer  to  some  en?ty”Davidson’s  analysis  started  formal  analysis  of  language  in  a  modern  way,  and  gave  a  founda?on  to  frame  logics,  which  were  the  original  use  case  for  descrip?on  logics  ...That  ‘it’  is  key  to  meaning

Page 9: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

What is the key to access meaning?• When interpreting images? pattern recognition• When understanding text? parsing, resolution, expectations (background

knowledge), hypothesis formation• For querying databases? key discovery• For social data mashups? application design (e.g. FB)• For linked data mashups? sparql queries, ontologies• For event recognition, situation awareness? context ontologies, cqels queries• Such “keys” provide access by putting the pieces together. Can we make

ontology design as an empirical science of putting pieces of meaning together?

9

Page 10: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Cognitive background

10

Page 11: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Blink and “thin slicing”• Many organisms share the ability to gather a

snap judgment about a situation in a blink of an eye (cf. M. Gladwell, Blink)–making sense of situations based on thin slices of

experience: few color spots, a quick movement, a hat, the length of hair, a facial expression, the relative position of two persons, ...

• Positive and negative slicing–quick reaction and expectation building vs.

stereotypes and prejudices

11

Page 12: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Blink and “thin slicing”• Many organisms share the ability to gather a

snap judgment about a situation in a blink of an eye (cf. M. Gladwell, Blink)–making sense of situations based on thin slices of

experience: few color spots, a quick movement, a hat, the length of hair, a facial expression, the relative position of two persons, ...

• Positive and negative slicing–quick reaction and expectation building vs.

stereotypes and prejudices

11

Page 13: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Foreground and background• People tend to remember things because they

stick out from the background (“profiling”)– but what makes a background as such?

• Expectations create scenarios– even things that are not there become part of the

scenario if activated by an expectation• Cf. Gestalt psychology (Köhler, Langacker, etc.)

12

Page 14: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Schema-based memory• People tend to remember items that fit into a

schema. – Things that are associated through some functional

similarity (cf. Gibson’s affordances)• Schemata seem to be learnt mostly inductively

–blocks world, repeated verbalization of invariant scenes, peek-a-boo, etc. Cf. Deb Roy’s TED talk

• Schema similar to (conceptual) frame, script, knowledge pattern, etc.

13

Page 15: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Origin of modern frames and knowledge patterns

• «When one encounters a new situation (or makes a substantial change in one's view of the present problem) one selects from memory a structure called a Frame. This is a remembered framework to be adapted to fit reality by changing details as necessary … a frame is a data-structure for representing a stereotyped situation» (Minsky 1975)

• Frames, schemas, scripts … «These large-scale knowledge configurations supply top-down input for a wide range of communicative and interactive tasks … the availability of global patterns of knowledge cuts down on non-determinacy enough to offset idiosyncratic bottom-up input that might otherwise be confusing» (Beaugrande, 1980)

Page 16: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

How many frames?

• We (STLab) are collecting, reengineering, and aligning knowledge patterns from different knowledge formats into RDF and OWL–Web data: microformats, microdata, XML patterns–Linked data: induced schemas, data patterns–(Semantic web) ontologies: ontology design patterns,

extracted modules, query patterns–Linguistics: FrameNet and VerbNet frames, NLP-

extracted selectional restrictions, situations from parsed NL text

15

Page 17: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Limited keys totext/discourse understanding

16

Page 18: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

What is a window?• (i) opening in a wall

– An opening, usually covered by one or more panes of clear glass, to allow light and air from outside to enter a building or vehicle. (Wiktionary)

• (ii) glass-filled frame fitting an opening in a wall– Welcome to Window World, America's Largest

Replacement Windows Company! (...) installs over 1 million replacement windows annually. (Ad)

• (iii) glass-filled frame– A window is a transparent or translucent opening in

a wall or door that allows the passage of light and, if not closed or sealed, air and sound. (Wikipedia)

• (iv) glass from glass-filled frame– A pane of glass in a window. (WordNet)

• (v) frame from glass-filled frame– A framework of wood or metal that contains a glass

windowpane and is built into a wall or roof to admit light or air. (WordNet) 17

Page 19: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Genres of polysemy in WordNet• Adam’s apple (metaphor)

– thyroid cartilage_1 (dul:PhysicalObject)– crape jasmine_1 (dul:Organism)

• Bucket shop (diachronic metaphor)– bucket shop_2 (dul:PhysicalObject)

• “(formerly) a cheap saloon selling liquor by the bucket”

– bucket shop_1 (dul:Organization)• “an unethical or overly aggressive brokerage firm”

• Stadium (established metonymy, “dot object”)– stadium_1 (dul:PhysicalObject)

• “a large structure for open-air sports or entertainments”

– arena_4 (d0:Location)• “a playing field where sports events take place”

– do we need a relation between these senses, or a coarser concept? 18

Page 20: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Dot objects and co-predication (Pustejovsky, Asher)

• This book is heavy but interesting– physical vs. information

• Lunch was delicious but took forever– food vs. event

• I saw the Coliseum in my tourist guide and wanted to go there– artifact vs. place

• Actually relevant?– Power of ambiguity (“systematic polysemy”)– Minimal effort seems to count in human evolution of lexical knowledge, but only if

we can easily reconstruct the context (or frame, relation, ...)• “The communicative function of ambiguity in language” (Piantadosi, Tilyb,

Gibson): ambiguity allows for greater ease of processing by permitting efficient linguistic units to be re-used. “All efficient communication systems will be ambiguous, assuming that context is informative about meaning”

– Also in science: inflammation has six interrelated meanings19

Page 21: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

“Fictive motion” (Talmy)• The path descended abruptly• The road runs along the coast for two hours• The fence zigzags from the plateau to the valley• The highway crawls through the city• The road leads us to Cercedilla• Need for “type coercion” to satisfy hidden frame

– highway is actually a path that “can be crawled”, therefore crawling frame here is descriptive of a state, not of an action

– fence is actually an object whose shape “can be followed by zigzaging”– road is actually an object that “can be followed as an indication” to our

destination– Actually an inversion of roles: the path descends because it can be

descended: another version of systematic polysemy20

Page 22: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Meaning as relation

21

Page 23: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Meaning as relation• Event discovery and recognition

– { Water in pot ; pot on the hot stove ; egg in water } ∴ ???

• Meaning as a relational thing– Does “Egg” have a meaning independently of its relations? no, if meaning is what someone

does (or might do) with something?– cf. Bartlett, Davidson, Fillmore, Minsky, Schank, Brian Smith, Gibson, ...

• Dynamics (induction, abduction) of general and domain relations– Cook(Egg, Stove) as Absorb_heat(Entity, Heat_source) as Becoming(Entity, Cause)

21

Page 24: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Meaning as relation• Event discovery and recognition

– { Water in pot ; pot on the hot stove ; egg in water } ∴ ???

• Meaning as a relational thing– Does “Egg” have a meaning independently of its relations? no, if meaning is what someone

does (or might do) with something?– cf. Bartlett, Davidson, Fillmore, Minsky, Schank, Brian Smith, Gibson, ...

• Dynamics (induction, abduction) of general and domain relations– Cook(Egg, Stove) as Absorb_heat(Entity, Heat_source) as Becoming(Entity, Cause)

21

FrameNet  frameshave  a  par?al  order

Page 25: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Meaning as relation• Event discovery and recognition

– { Water in pot ; pot on the hot stove ; egg in water } ∴ ???

• Meaning as a relational thing– Does “Egg” have a meaning independently of its relations? no, if meaning is what someone

does (or might do) with something?– cf. Bartlett, Davidson, Fillmore, Minsky, Schank, Brian Smith, Gibson, ...

• Dynamics (induction, abduction) of general and domain relations– Cook(Egg, Stove) as Absorb_heat(Entity, Heat_source) as Becoming(Entity, Cause)

21

FrameNet  frameshave  a  par?al  order

Page 26: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Limited keys tosocial tagging analysis

22

Page 27: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Flickr, desire tag

23

Page 28: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Flickr, desire tag

24

Page 29: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Flickr, desire tag

25

Page 30: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Similarly in other folksonomies

• like in FB, G+, etc.– like what? thing, comment to thing, content of

cited thing, implicit negative attitude to thing?• positive/negative polarity in sentiment

analysis– polarity of what? tweet, citation, irony, ...

26

Page 31: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Limited keys tolinked data analysis

27

Page 32: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

28

Aggregated RDF data for “Barack

Obama” (sig.ma)

Page 33: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

What about the Knowledge Graph?

29

however,  where  are  the  IRIs?  named  en11es  are  only  searchable  within  the  Google  engine,  content  is  completely  encrypted

Page 34: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Ontologies disagree

30

Is that a problem?For many, yes

Page 35: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

schema.org• The use cases that motivated schema.org are

related to search engine requirements, probably to meet popularity and advertisement

• We received schemas from the “Sponsors”. As often happens, “sponsors” create meaning, because they have either authority or money

• Will ontologies be swallowed by the Sponsors?• Anyway, schemas from schema.org seem to

address keys more accurately, with some exceptions ...

31

Page 36: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Pragmatic issue: we can communicate with landforms

32

Page 37: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Pragmatic issue: we can communicate with landforms

32

Page 38: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Pragmatic issue: we can communicate with landforms

32

Page 39: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Pragmatic issue: we can communicate with landforms

32

Page 40: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Pragmatic issue: we can communicate with landforms

32

Page 41: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

33

Application issue: bone pathologies, locations, functions are all literals

Page 42: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

33

Application issue: bone pathologies, locations, functions are all literals

Page 43: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

33

Application issue: bone pathologies, locations, functions are all literals

Page 44: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

33

Application issue: bone pathologies, locations, functions are all literals

Page 45: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Creating keys for the Semantic Web

34

Page 46: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Top-down: expertise patterns• Evidence that units of expertise are larger than what we have

from average linked data triples, or ontology learning– Cf. cognitive scientist Dedre Gentner: “uniform relational representation is a

hallmark of expertise”– We need to create expertise-oriented boundaries unifying multiple triples– “Competency questions” are used to link ontology design patterns to

requirements:• Which objects take part in a certain event?• Which tasks should be executed in order to achieve a certain goal?• What’s the function of that artifact?• What norms are applicable to a certain case?• What inflammation is active in what body part with what morphology?

35

Page 47: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Layered pattern morphisms• A logical design pattern describes a formal expression that

can be exemplified, morphed, instantiated, and expressed in order to solve a domain modelling problem

• owl:Class:_:x rdfs:subClassOf owl:Restriction:_:y• Inflammation rdfs:subClassOf (localizedIn some BodyPart)• Colitis rdfs:subClassOf (localizedIn some Colon)• John’s_colitis isLocalizedIn John’s_colon• “John’s colon is inflammated”, “John has got colitis”, “Colitis is the

inflammation of colon”

LogicalPaPern(MBox)

GenericContentPaPern  (TBox)

SpecificContentPaPern  (TBox)

DataPaPern  (ABox)

exemplifiedAs morphedAs instan1atedAs Linguis?cPaPern

expressedAs

Logic Meaning Reference Expression

expressedAs

expressedAs

Abstrac1on

Page 48: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Bottom-up: schema extraction (COLD2011)

37

Page 49: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Results• A method for extracting the main knowledge components

of a LD dataset

mo:available_as

mo:format

mo:Track

mo:Playlist

mo:Record

mo:track

mo:Record

mo:trackmo:Track

foaf:maker

foaf:name

mo:MusicAr1st

dc:1tle

mo:available_asmo:Playlist

mo:Record mo:track mo:Track mo:available_as mo:Playlist dc:format rdfs:Literalmo:Signal mo:published_as mo:Track mo:available_as mo:Playlist dc:format rdfs:Literalmo:MusicArtist foaf:made mo:Record mo:available_as mo:Torrent dc:format rdfs:Literalmo:MusicArtist foaf:made mo:Record mo:available_as mo:ED2K dc:format rdfs:Literalmo:MusicArtist foaf:made mo:Record mo:available_as mo:Playlist dc:format rdfs:Literal

Path

Cluster  of  paths

Knowledge  PaPern

Page 50: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Path identification (length 3)mo:MusicAr?st

mo:Track

mo:Record

mo:Playlist

mo:Signal

rdfs:Literal

PREFIX mo : http://purl.org/ontology/mo/MusicArtist

rdfs:Resource

dc:1tle

mo:published_as

mo:recorded_as

(Jamendo)

foaf:Document

Page 51: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Path identification (length 3)mo:MusicAr?st

mo:Track

mo:Record

mo:Playlist

mo:Signal

rdfs:Literal

PREFIX mo : http://purl.org/ontology/mo/MusicArtist

rdfs:Resource

dc:1tle

mo:published_as

mo:recorded_as

(Jamendo)

foaf:Document

Path  ElementPosi?on  2

Page 52: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Centrality (types)mo:MusicAr?st

mo:Track

mo:Record

mo:Playlist

mo:Signal

rdfs:Literal

PREFIX mo : http://purl.org/ontology/mo/MusicArtist

rdfs:Resource

(Jamendo)

foaf:Document

dc:1tle

mo:published_as

mo:track

mo:license

mo:track_number

mo:available_as

Page 53: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Centrality (properties)mo:MusicAr?st

mo:Track

mo:Record

mo:Playlist

mo:Signal

rdfs:Literal

PREFIX mo : http://purl.org/ontology/mo/MusicArtist

rdfs:Resource

(Jamendo)

foaf:Document

dc:1tle

mo:published_as

mo:license

mo:track_number

mo:available_as

mo:recorded_as

Page 54: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

42

0 2 4 6 8 10 12

foaf:made foaf:maker

mo:available_as mo:published_as

mo:track mo:time

tags2:taggedWithTag

0 20000 40000 60000 80000 100000 120000 140000

mo:available_as tags2:taggedWithTag

mo:published_as mo:time mo:track

foaf:made foaf:maker

0 2 4 6 8 10 12

mo:Record

mo:Track

mo:MusicArtist

mo:Playlist

mo:Signal

time:Interval

mo:ED2K

mo:Torrent

tags2:Tag

mo:Lyrics

0 20000 40000 60000 80000 100000 120000

mo:Playlist

mo:Track

mo:Signal

time:Interval

mo:ED2K

mo:Torrent

tags2:Tag

mo:Lyrics

mo:Record

mo:MusicArtist

Centrality in Jamendo

frequency

betweenness

Page 55: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Emerging Knowledge Patternmo:Track

mo:MusicAr.st

mo:Playlist

mo:Torrent

mo:ED2K

tags:Tag

mo:Record

foaf:maker

rdfs:Literal

dc:1tle

dc:datemo:image

dc:descrip1on

mo:track

tags:taggedWithTag

mo:available_as

mo:available_as

mo:available_as

Page 56: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Bottom-up: Encyclopedic Knowledge Patterns (EKP) (ISWC2011)

• Improving knowledge exploration and summarization by:–Empirically discovering invariances in conceptual

organization of knowledge – encyclopedic knowledge patterns – from Wikipedia crowd-sourced page links

–Understanding the most intuitive way of selecting relevant entities used to describe a given entity

– Identifying the typical / atypical types of things that people use for describing other things

–Enabling serendipitous search

Page 57: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Input data• Wikipedia page links generate 107.9M triples• Infobox-based triples are 13.6M, including data value triples (9.4M) • “Unmapped” object value triples are only 7% of page links

Page 58: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Input data• Wikipedia page links generate 107.9M triples• Infobox-based triples are 13.6M, including data value triples (9.4M) • “Unmapped” object value triples are only 7% of page links

dbpo:MusicalArtist

dbpo:MusicalArtist dbpo:Organisation dbpo:Place

Page 59: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Input data• Wikipedia page links generate 107.9M triples• Infobox-based triples are 13.6M, including data value triples (9.4M) • “Unmapped” object value triples are only 7% of page links

dbpo:MusicalArtistdbpo:MusicalArtist

dbpo:Organisation

dbpo:PlacelinksToMusicalArtist linksToPlace

linksToOrganisation

• Paths are used to discover Encyclopedic Knowledge Patterns– Such patterns should make it emerge the most typical types of things that the Wikipedia

crowd uses to describe a resource of a given type

Page 60: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Input data• Wikipedia page links generate 107.9M triples• Infobox-based triples are 13.6M, including data value triples (9.4M) • “Unmapped” object value triples are only 7% of page links

dbpo:MusicalArtistdbpo:MusicalArtist

dbpo:Organisation

dbpo:PlacelinksToMusicalArtist linksToPlace

linksToOrganisation

• Paths are used to discover Encyclopedic Knowledge Patterns– Such patterns should make it emerge the most typical types of things that the Wikipedia

crowd uses to describe a resource of a given type

Path Pi,j= [Si, p, Oj]

Page 61: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Encyclopedic Knowledge Patterns• An Encyclopedic Knowledge Pattern (EKP) is discovered from the

paths emerging from Wikipedia page link invariances• They are represented as OWL2 ontologies

Page 62: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Paths and indicators• Emerging paths are stored in RDF according

to the “Knowledge Architecture” vocabulary–Cf. our COLD2011 paper “Extracting core

knowledge from linked data”• Paths and types are associated with a set of

indicators

Page 63: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

nRes(dbpo:MusicalArtist)

Chad_Smith

John_Lennon

Michael_Jackson

PREFIX dbpo : http://dbpedia.org/ontology/

Paul_McCartney

Jackie_Jacksonrdf:type

Anthony_Kiedis

dbpo:MusicalArtist

dbpo:MusicalArtist

rdf:type

rdf:type

dbpo:MusicalArtist

rdf:type

rdf:type

rdf:type

Number of resources having type dbpo:MusicalArtist

dbpo:MusicalArtist

Page 64: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

nSubjectRes(Pi,j)

Dave_Grohl

Foo Fighters

John_Lennon

Michael_Jackson

PREFIX dbpo : http://dbpedia.org/ontology/

Paul_McCartney

Jackson_5

Beatles

Jackie_Jackson

dbpo:MusicalArtist dbpo:Banddbpo:wikiPageWikiLinkSi Oj

Nirvana

Page 65: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

nSubjectRes(Pi,j)

Dave_Grohl

Foo Fighters

John_Lennon

Michael_Jackson

PREFIX dbpo : http://dbpedia.org/ontology/

Paul_McCartney

Jackson_5

Beatles

Jackie_Jackson

dbpo:MusicalArtist dbpo:Banddbpo:wikiPageWikiLinkSi Oj

Nirvana

Number of distinct resources that participate in a path as subjects

Page 66: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

pathPopularity(Pi,j,Si)

Dave_Grohl

Foo Fighters

John_Lennon

Michael_Jackson

Paul_McCartney

Jackson_5

Beatles

Jackie_Jackson

Nirvana

Madonna

PrinceCharlie_Parker Keith_Jarrett

nSubjectRes(Pi,j)/nRes(Si)

Page 67: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Path Popularity distribution examplePath pathPopularity

[MusicFestival,Band] 82.33[MusicFestival,MusicalArtist] 74.17[MusicFestival,Country] 74.02[MusicFestival,MusicGenre] 72.81[MusicFestival,City] 38.37[MusicFestival,AdministrativeRegion] 32.78[MusicFestival,MusicFestival] 23.26[MusicFestival,Album] 18.13[MusicFestival,Film] 12.39[MusicFestival,Stadium] 9.52[MusicFestival,RadioStation] 9.52[MusicFestival,Single] 8.76[MusicFestival,Town] 8.61[MusicFestival,Magazine] 8.46[MusicFestival,Broadcast] 7.55[MusicFestival,Newspaper] 6.95[MusicFestival,TelevisionShow] 6.04[MusicFestival,University] 5.74[MusicFestival,Continent] 5.59[MusicFestival,Comedian] 5.29[MusicFestival,OfficeHolder] 4.98[MusicFestival,Island] 4.98

Page 68: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Boundaries of EKPs• An EKP(Si) is a set of paths, such that

Pi,j ∈ EKP(Si) ! pathPopularity(Pi,j, Si) ≥ t

• t is a threshold, under which a path is not included in an EKP

• How to get a good value for t?

Page 69: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Boundary inductionStep Description

1. For each path, calculate the path popularity

2. For each subject type, get the 40 top-ranked path popularity values*

3. Apply multiple correlation (Pearson ρ) between the paths of all subject types by rank, and check for homogeneity of ranks across subject types

4. For each of the 40 path popularity ranks, calculate its mean across all subject types

5. Apply k-means clustering on the 40 ranks

6. Decide threshold(s) based on k-means as well as other indicators (e.g. FrameNet roles distribution)

* 40 covers most “core” path popularity values, as well as many of the unusual ones.

.9

Page 70: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

k-means clustering on Path Popularity

Sample distribution of pathPopularity for DBpedia paths. The x-axis indicates how many paths (on average) are above a certain value t for pathPopularity

Page 71: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

k-means clustering on Path Popularity

1 big cluster (4-cluster) with ranks below 18.18%

Sample distribution of pathPopularity for DBpedia paths. The x-axis indicates how many paths (on average) are above a certain value t for pathPopularity

Page 72: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

k-means clustering on Path Popularity

1 big cluster (4-cluster) with ranks below 18.18%

3 small clusters with ranks above 22.67%

Sample distribution of pathPopularity for DBpedia paths. The x-axis indicates how many paths (on average) are above a certain value t for pathPopularity

Page 73: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

k-means clustering on Path Popularity

1 big cluster (4-cluster) with ranks below 18.18%

3 small clusters with ranks above 22.67%

Sample distribution of pathPopularity for DBpedia paths. The x-axis indicates how many paths (on average) are above a certain value t for pathPopularity

Page 74: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

k-means clustering on Path Popularity

1 big cluster (4-cluster) with ranks below 18.18%

3 small clusters with ranks above 22.67%

Sample distribution of pathPopularity for DBpedia paths. The x-axis indicates how many paths (on average) are above a certain value t for pathPopularity

1 alternative cluster (6-cluster) with ranks below 11.89%

Page 75: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

What is the “agreement” between DBpedia and our sample users?

Average multiple correlation (Spearman ρ) between users’ assigned scores, and

pathPopularityDBpedia based scoresSpearman ρ range: [-1,1] −1 = no agreement

+1 = complete agreement)

Multiple correlation coefficient (Spearman ρ) between users’s assigned score, and

pathPopularityDBpedia based score

Page 76: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

What is the “agreement” between DBpedia and our sample users?

Average multiple correlation (Spearman ρ) between users’ assigned scores, and

pathPopularityDBpedia based scoresSpearman ρ range: [-1,1] −1 = no agreement

+1 = complete agreement)

Multiple correlation coefficient (Spearman ρ) between users’s assigned score, and

pathPopularityDBpedia based score

Good correlation

Satisfactory precision

Page 77: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

What is the “agreement” between DBpedia and our sample users?

Average multiple correlation (Spearman ρ) between users’ assigned scores, and

pathPopularityDBpedia based scoresSpearman ρ range: [-1,1] −1 = no agreement

+1 = complete agreement)

Multiple correlation coefficient (Spearman ρ) between users’s assigned score, and

pathPopularityDBpedia based score

Good correlation

Satisfactory precision

Pi,j ∈ EKP (Si) ⇔pathPopularity(Pi,j, Si) ≥

11%

Page 78: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Aemoo• http://aemoo.org exploratory search application based on

EKP

Page 79: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Comparing entities of the same type

58

Nicolas Sarkozy Ronald Regan Angela MerkelBarack Obama Silvio Berlusconi

Page 80: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Applying topic-sensitive EKP as lenses

59

Page 81: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Limited keys totext/discourse analysis

60

Page 82: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Ontology learning: what structures?

• State of art: what do we learn?– instances (NER): BarackObama– classes (sense tagging): Person, or even TelevisionActor– relations between instances: CabezaDeVaca departed Spain– axioms: mostly taxonomic or disjointness, some work also on

learning restrictions: TelevisionActor disjointWith TVSeries– entire ontologies, but only as collections of independently learnt

axioms

61

Page 83: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Kinds of text analyses• Lead example:

–“In early 1527, Cabeza De Vaca departed Spain as the treasurer of the Narvaez royal expedition to occupy the mainland of North America. After landing near Tampa Bay, Florida on April 15, 1528, Cabeza De Vaca and three other men would be the only survivors of the expedition party of 600 men.”

62

Page 84: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Shallow parsing (Alchemy)• Language

– english

• Person (1)

– the only survivors

• GeographicFeature (1)

– Tampa Bay

• Continent (1)

– North America

• Country (1)

– Spain

• StateOrCounty (1)

– Florida

• Concept Tags (8)

– United States

– Florida

– Gulf of Mexico

– North America

– Tampa, Florida

– Narváez expedition

– American Civil War

– Tampa Bay

• Category

– Culture & Politics (confidence: 0.6598)

Named  en?ty  recogni?on

Language  recogni?on

Concept  tagging

Subject  classifica?on

Rela?on  (ac?on)  extrac?on

Page 85: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Entity resolution (Wikifier)

64Wikipedia-­‐based  topic  detec?onNamed  en?ty  resolu?on

Page 86: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Multimedia enrichment (Zemanta)

65

Related  images

Related  news

Related  links  and  geo-­‐referencing

Page 87: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Linked data positioning (RelFinder)

66

Page 88: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Deep parsing• Usage of syntactic parse trees• Formal semantic interpretation on top of

syntax

67

Page 89: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Categorial parsing

68

w(1, 1, 'In', 'in', 'IN', 'I-PP', 'O', '(S[X]/S[X])/NP').w(1, 2, 'early', 'early', 'JJ', 'I-NP', 'I-DAT', 'N/N').w(1, 3, '1527', '1527', 'CD', 'I-NP', 'I-DAT', 'N').w(1, 4, ',', ',', ',', 'O', 'I-DAT', ',').w(1, 5, 'Cabeza', 'Cabeza', 'NNP', 'I-NP', 'I-ORG', 'N/N').w(1, 6, 'De', 'De', 'NNP', 'I-NP', 'I-ORG', 'N/N').w(1, 7, 'Vaca', 'Vaca', 'NNP', 'I-NP', 'I-ORG', 'N').w(1, 8, 'departed', 'depart', 'VBD', 'I-VP', 'O', '((S[dcl]\NP)/PP)/NP').w(1, 9, 'Spain', 'Spain', 'NNP', 'I-NP', 'O', 'N').w(1, 10, 'as', 'as', 'IN', 'I-PP', 'O', 'PP/NP').w(1, 11, 'the', 'the', 'DT', 'I-NP', 'O', 'NP[nb]/N').w(1, 12, 'treasurer', 'treasurer', 'NN', 'I-NP', 'O', 'N').w(1, 13, 'of', 'of', 'IN', 'I-PP', 'O', '(NP\NP)/NP').w(1, 14, 'the', 'the', 'DT', 'I-NP', 'O', 'NP[nb]/N').w(1, 15, 'Narvaez', 'Narvaez', 'NNP', 'I-NP', 'I-ORG', 'N/N').w(1, 16, 'royal', 'royal', 'NN', 'I-NP', 'O', 'N/N').w(1, 17, 'expedition', 'expedition', 'NN', 'I-NP', 'O', 'N').w(1, 18, 'to', 'to', 'TO', 'I-VP', 'O', '(S[to]\NP)/(S[b]\NP)').w(1, 19, 'occupy', 'occupy', 'VB', 'I-VP', 'O', '(S[b]\NP)/NP').w(1, 20, 'the', 'the', 'DT', 'I-NP', 'O', 'NP[nb]/N').w(1, 21, 'mainland', 'mainland', 'NN', 'I-NP', 'O', 'N').w(1, 22, 'of', 'of', 'IN', 'I-PP', 'O', '(NP\NP)/NP').w(1, 23, 'North', 'North', 'NNP', 'I-NP', 'I-LOC', 'N/N').w(1, 24, 'America', 'America', 'NNP', 'I-NP', 'I-LOC', 'N').w(1, 25, '.', '.', '.', 'O', 'O', '.').w(1, 26, 'After', 'after', 'IN', 'I-PP', 'O', '(S[X]/S[X])/NP').w(1, 27, 'landing', 'landing', 'NN', 'I-NP', 'O', 'N').w(1, 28, 'near', 'near', 'IN', 'I-PP', 'O', '(NP\NP)/NP').w(1, 29, 'Tampa', 'Tampa', 'NNP', 'I-NP', 'I-LOC', 'N/N').w(1, 30, 'Bay', 'Bay', 'NNP', 'I-NP', 'I-LOC', 'N').w(1, 31, ',', ',', ',', 'O', 'O', ',').w(1, 32, 'Florida', 'Florida', 'NNP', 'I-NP', 'I-LOC', 'N').w(1, 33, 'on', 'on', 'IN', 'I-PP', 'O', '(NP\NP)/NP').w(1, 34, 'April', 'April', 'NNP', 'I-NP', 'I-DAT', 'N/N[num]').w(1, 35, '15', '15', 'CD', 'I-NP', 'I-DAT', 'N[num]').w(1, 36, ',', ',', ',', 'I-NP', 'I-DAT', ',').w(1, 37, '1528', '1528', 'CD', 'I-NP', 'I-DAT', 'N\N').w(1, 38, ',', ',', ',', 'O', 'O', ',').w(1, 39, 'Cabeza', 'Cabeza', 'NNP', 'I-NP', 'I-ORG', 'N/N').w(1, 40, 'De', 'De', 'NNP', 'I-NP', 'I-ORG', 'N/N').w(1, 41, 'Vaca', 'Vaca', 'NNP', 'I-NP', 'I-ORG', 'N').w(1, 42, 'and', 'and', 'CC', 'O', 'O', 'conj').w(1, 43, 'three', 'three', 'CD', 'I-NP', 'O', 'N/N').w(1, 44, 'other', 'other', 'JJ', 'I-NP', 'O', 'N/N').w(1, 45, 'men', 'man', 'NNS', 'I-NP', 'O', 'N').w(1, 46, 'would', 'would', 'MD', 'I-VP', 'O', '(S[dcl]\NP)/(S[b]\NP)').w(1, 47, 'be', 'be', 'VB', 'I-VP', 'O', '(S[b]\NP)/NP').w(1, 48, 'the', 'the', 'DT', 'I-NP', 'O', 'NP[nb]/N').w(1, 49, 'only', 'only', 'JJ', 'I-NP', 'O', 'N/N').w(1, 50, 'survivors', 'survivor', 'NNS', 'I-NP', 'O', 'N').w(1, 51, 'of', 'of', 'IN', 'I-PP', 'O', '(NP\NP)/NP').w(1, 52, 'the', 'the', 'DT', 'I-NP', 'O', 'NP[nb]/N').w(1, 53, 'expedition', 'expedition', 'NN', 'I-NP', 'O', 'N/N').w(1, 54, 'party', 'party', 'NN', 'I-NP', 'O', 'N').w(1, 55, 'of', 'of', 'IN', 'I-PP', 'O', '(NP\NP)/NP').w(1, 56, '600', '600', 'CD', 'I-NP', 'O', 'N/N').w(1, 57, 'men', 'man', 'NNS', 'I-NP', 'O', 'N').w(1, 58, '.', '.', '.', 'O', 'O', '.').

Page 90: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Relation extraction with OIE• Cabeza De Vaca ___departed___Spain• 0.9059944442645228• In early 1527 , Cabeza De Vaca departed Spain as the treasurer of the Narvaez royal expedition to

occupy the mainland of North America .• IN JJ CD , NNP NNP NNP VBD NNP IN DT NN IN DT NNP JJ NN TO VB DT NN IN NNP NNP .• B-PP B-NP I-NP O B-NP I-NP I-NP B-VP B-NP B-PP B-NP I-NP I-NP I-NP I-NP I-NP I-NP B-VP I-

VP B-NP I-NP I-NP I-NP I-NP O• cabeza de vaca___depart___spain

• three other men___would be the only survivors of___the expedition party of 600 men• 0.36592485864619345• After landing near Tampa Bay , Florida on April 15 , 1528 , Cabeza De Vaca and three other men

would be the only survivors of the expedition party of 600 men .• IN NN IN NNP NNP , NNP IN NNP CD , CD , NNP NNP NNP CC CD JJ NNS MD VB DT JJ NNS

IN DT NN NN IN CD NNS .• B-PP B-NP B-PP B-NP I-NP O B-NP B-PP B-NP I-NP I-NP I-NP O B-NP I-NP I-NP O B-NP I-NP I-

NP B-VP I-VP B-NP I-NP I-NP I-NP I-NP I-NP I-NP I-NP I-NP I-NP O• # other men___be survivor of___the expedition party of # men

69

Page 91: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Creating keys for NL

70

Page 92: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

DRT• Discourse Representation Theory

– Hans Kamp: “A theory of truth and semantic representation”, 1981• A sentence meaning is taken to be an update operation on a context• DRT represents “the discourse context” as a discourse representation

structure (or DRS). A DRS includes:– A set of referents: the entities which have been introduced into the

context– A set of conditions: the predicates which are known to hold of these

entities– Basically a fragment of FOL

71

Page 93: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

DRT notation

72

Page 94: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Boxer• Implementation of computational semantics (J. Bos)

with DRT output and Davidsonian predicates: reification of n-ary relations– E.g. preparing a coffee: agent, instrument, mix, place, time,

method, ...

• Semantic role labelling with VerbNet or FrameNet roles

• Pragmatic grasp, with statistical NER and sense tagging, tense logic, co-reference, presupposition, sentence integration, entailment, ...

73

Page 95: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

DRT semantic parsing + semantic role labellling in Boxer

74

Page 96: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Issues when porting (Boxer) DRT to RDF

• Discourse referent variables: implicit or explicit?• No terminology recognition/extraction• No term compositionality• No periphrastic relations for properties (e.g. survivorOf)• Redundant variables• Missing definitional pragmatics• Implicit local restrictions• Many types of redundant “boxing” for embedded propositions, non-

standard negation, etc.• How to map efficiently to RDF/OWL?

75

Page 97: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Using FRED (EKAW2012)

76

http://wit.istc.cnr.it/stlab-tools/fred/

Page 98: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

FRED RDF graph output

77

Page 99: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Another example

78

The  New  York  Times  reported  that  John  McCarthy  died.  He  invented  the  programming  language  LISP.

Page 100: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Heuristics to map Boxer DRT to RDF (1/2)

• Default predicates – H_pred1: special vocabulary for defaults, e.g. boxer:Per rdf:type owl:Class– H_pred2: mappings to existing vocabularies, e.g. foaf:Person ; dul:associatedWith ; time:Interval

• Default roles (SRL binary predicates)– H_rol: VerbNet/FrameNet vocabularies, e.g. verbnet:agent rdf:type owl:ObjectProperty

• Domain predicates– H_dom: default namespace, customizable, e.g. domain:Expedition ; domain:of

• Discourse referent variables: implicit or explicit?– H_ref: only referents of predicates are materialized, e.g. domain:survivor_1 rdf:type

domain:Survivor ; depart_1 rdf:type domain:Depart

• No terminology recognition/extraction– H_term: co-referential predicate merging, e.g. domain:ExpeditionParty

• No term compositionality– H_termc: create taxonomies by “unmerging” predicates, e.g. domain:ExpeditionParty

rdfs:subClassOf domain:Party

• No periphrastic relations– H_peri: generate “webby” properties, e.g. domain:survivorOf rdf:type owl:ObjectProperty79

Page 101: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

• Redundant variables– H_red: unify variables based on local Unique Name Assumption

• Boolean constructs– H_boo: special relations between domain predicates or individuals, e.g. domain:man_1

owl:differentFrom domain:man_2

• Propositions as arguments– H_pro: special relation between events, e.g. domain:report_1 vn:theme domain:depart_1

• Missing definitional pragmatics– H_def: bypassing discourse referents and forcing universal quantification, e.g.

domain:WindInstrument rdfs:subClassOf domain:MusicalInstrument

• Implicit local restrictions (optional)– H_res: inducing anonymous classes, e.g. domain:WindInstrument rdfs:subClassOf (locationOf

some domain:Contain)

80

Heuristics to map Boxer DRT to RDF (2/2)

Page 102: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Wikipedia typing• Typing Wikipedia entities is not easy• A lot of resources, limited alignment, limited

coverage, different granularity• We realized a fresh approach by

– extracting NL definitions– producing RDF with FRED– refactoring FRED output to distill types– linking to DBpedia entities– linking to WordNet 3.0– linking to SuperSenses and DOLCE+DnS (DUL)

81

Page 103: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Tìpalo: a FRED app

82

http://wit.istc.cnr.it/stlab-tools/tipalo/

Page 104: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Definition of chaise longue

83

A chaise longue is an upholstered sofa in the shape of a chair that is long enough to support the legs

Page 105: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

Conclusions• Cognitive science is quite explicit on what meaning is

for humans: an activity of “framing reality for a purpose”

• Frames can be keys to that relational meaning• The Semantic Web is now growing a lot of data: our

tools are trying to make sense of them by using appropriate keys

• Empirical extraction and discovery of frames/knowledge patterns is feasible

• But let’s join forces: meaning is what we do, and should be shared

84

Page 106: Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective

References• Alchemy (shallow parsing): http://www.alchemyapi.com/api/demo.html

• Wikifier (entity resolution): http://wit.istc.cnr.it/stlab-tools/wikifier

• Zemanta (multimedia enrichment): http://www.zemanta.com/demo/

• RelFinder (linked data visualization): http://www.visualdataweb.org/relfinder/demo.swf

• ReVerb (relation extraction): http://openie.cs.washington.edu/

• C&C (categorial grammar): http://svn.ask.it.usyd.edu.au/trac/candc/wiki/Demo

• Boxer (DRT deep parsing): http://svn.ask.it.usyd.edu.au/trac/candc/wiki/boxer

• FRED (DRT to RDF): http://wit.istc.cnr.it/stlab-tools/fred/

• Tìpalo (Wikipedia definitions in RDF): http://wit.istc.cnr.it/stlab-tools/tipalo

• Aemoo (serendipitous search): http://aemoo.org/

• ontologydesignpatterns.org portal: http://www.ontologydesignpatterns.org

• Talmy’s fictive motion: http://en.wikipedia.org/wiki/Fictive_motion

• Pustejovsky’ dot objects: http://pages.cs.brandeis.edu/~jamesp/publications.php

• Knowledge Architecture paper: http://ceur-ws.org/Vol-782/PresuttiEtAl_COLD2011.pdf

• FRED paper: http://ekaw2012.ekaw.org/node/137

• EKP paper: http://www.stlab.istc.cnr.it/documents/papers/wikilinkpatterns.pdf 85