Inducing Ontologies from Folksonomies using Natural Language Understanding
Marta Tatu, Dan Moldovan
Lymba Corporation
Presenter: Chris Irwin Davis
Overview
LREC 2010 May 19th, 2010
NLP
Folksonomy
• typographical errors, spelling variations
• singular/plural forms, lower case
• space/punctuation used as delimiters
• same tag in different contexts
• tag synonymy
Ontology
• lexical normalization of tags
• semantic consistency
• tag-tag relations
Folksonomy-based applications
• social annotations (author vs. user)
• browse/search bookmarks
• resource discovery (recommendations)
• collaborative tagging (across folksonomies)
Reasoning applications
Semantic Approach
1. Folksonomy semantic representation
2. Tag understanding
o Lexical: language identification, tokenization and spelling corrections, capitalization restoration
o Syntactic: part-of-speech tagging, syntactic parsing
o Semantic: acronym understanding, word sense disambiguation, named entity recognition, semantic parsing
3. Deriving the ontological structure
o Semantic relations between tags
• Sources of information
o Tag text semantics
o Social bookmarking annotations
o Machine understanding of bookmark content
Representing Folksonomies
• knowledge → knowledge[NN]1
• advertisign → advertising[NN]1
• americanhistory → American[JJ]1 history[NN]2 (TOPIC)
• read-now → now[RB]3 read[VB]1 (TEMPORAL)
Representing Folksonomies
SYNONYMY cluster: knowledge, cognition (WN synsetId = 20729)
Folksonomic tag: knowledge
• associated (user, document) pairs: (axlape, www.wolframalpha.com/), (nicksoni, www.curatingthecity.org/map.jsp), (pilx, www.wolframalpha.com/), ...
knowledge|NN|1
• associated (user, document) pairs: (bernsnarok, www.wolframalpha.com/), (_tarea_, academicearth.org/), (_tarea_, www.howstuffworks.com/), ...
cognition|NN|1
• associated (user, document) pairs: (omnamoprabhu, www.goertzel.org/dynapsyc/dynacon.html), (MikeMolto, cvcl.mit.edu/), (latrippi, nymag.com/news/features/56793/), ...
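The cluster structure on this slide can be sketched as a small data structure: one cluster per WordNet synset id, holding sense-tagged tags and the (user, document) pairs attached to each. This is only an illustration. The class names `SenseTag` and `SynonymyCluster` are hypothetical, and the assignment of the sample pairs to tags is assumed rather than taken from the authors' system.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SenseTag:
    """A tag after disambiguation, written lemma|POS|sense on the slides."""
    lemma: str
    pos: str
    sense: int

    def __str__(self) -> str:
        return f"{self.lemma}|{self.pos}|{self.sense}"


@dataclass
class SynonymyCluster:
    """All sense-tagged tags that share one WordNet synset id."""
    synset_id: int
    # Maps each SenseTag to the set of (user, document) pairs that used it.
    members: dict = field(default_factory=lambda: defaultdict(set))

    def add(self, tag: SenseTag, user: str, document: str) -> None:
        self.members[tag].add((user, document))

    def bookmarks(self) -> set:
        """Union of the (user, document) pairs of every member tag."""
        return set().union(*self.members.values())


cluster = SynonymyCluster(synset_id=20729)
knowledge = SenseTag("knowledge", "NN", 1)
cognition = SenseTag("cognition", "NN", 1)
# Assumed assignment of sample pairs to tags, for illustration only.
cluster.add(knowledge, "bernsnarok", "www.wolframalpha.com/")
cluster.add(knowledge, "_tarea_", "academicearth.org/")
cluster.add(cognition, "MikeMolto", "cvcl.mit.edu/")

print(sorted(str(t) for t in cluster.members))
print(len(cluster.bookmarks()))
```

Keying the cluster on the synset id rather than on the tag string is what lets differently spelled tags with the same meaning share one node.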
Representing Folksonomies
Nodes (synonymy clusters):
• cognition|NN|1; knowledge|NN|1
• module|NN|1; faculty|NN|1
• organization|NN|1; organisation|NN|2
• pattern|NN|1; form|NN|3
• cognitive|JJ|1
• perception|NN|1
• design|NN|2
• calendar|NN|1
• PDA|NN|1 – Personal Digital Assistant
• organization|NN|1; governance|NN|1
Edge labels: ISA, PERTAIN, PW, SIM
Compound tags:
• adaptive|JJ|1 design|NN|2 (PAH)
• instructional|JJ|1 design|NN|2 (AGT)
System Architecture
Social Annotations: user, document, tag
Document Cache
NLP processing
• Doc-2-Text
• Language Identification
• NLP of EN documents: Tokenization, Part-of-speech tagging, Sentence boundary detection, Named entity recognition, Syntactic parsing, Word sense disambiguation, Semantic parsing
Document NLP Repository
Social Tag-Tag & Tag-Doc Associations
Lexical Processing of Tags
Syntactic Processing of Tags
Semantic Processing of Tags
Semantic Representation of Tags
Tag Classification Rules
Ontology generation (Tag-Tag relations)
Induced Ontology
Applications: Search, browse, visualize; Recommendations; Collaborative tagging
Tag Understanding
Sources used to understand tags
Tag text; Social bookmarking data; Document content
Lexical
Language identification X X X
Tokenization and Spell checking X X X
Capitalization restoration X X
Syntactic
Part-of-speech tagging X X
Syntactic parsing X
Semantic
Abbreviation and acronym expansion X X X
Word sense disambiguation (+ NER) X X X
Semantic parsing X
Acronym/Abbreviation Understanding
• Abbreviation dictionary: (abbreviation - expansion - domain of usage)
o 118,055 distinct abbreviations
o 137 domains: Law, Music, TV/Radio Stations, Countries, Airport, Domain Names, Chat, Emoticons, etc.
o 25% of the abbreviations have more than one definition
• (unambiguous) Zip codes – (76012 : Arlington, TX)
• (ambiguous) SS : 192 definitions in 66 domains
o Social Security – Business and US Government, Screen Saver – File Extensions, Stainless Steel – Housing and Products, Subtropical Storm – Meteorology, Style Sheet – Software
• Check whether the tag is in the abbreviation dictionary
• Use lexical chains to link document content to abbreviation domain
• Use co-occurring tags to identify correct expansion
• Use text alignment to find new abbreviation definitions within document content
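The disambiguation steps above can be illustrated with a deliberately simplified sketch. It replaces the lexical-chain linking named on the slide with plain string matching: each candidate expansion is scored by how often it occurs in the document text plus how many of its words appear among the co-occurring tags. The function name `expand_abbreviation`, the three-entry dictionary, and the sample sentence are all hypothetical, and the domain-of-usage field is left out.

```python
def expand_abbreviation(tag, cooccurring_tags, document_text, dictionary):
    """Return the best-supported expansion of `tag`, or None if nothing matches."""
    candidates = dictionary.get(tag.upper(), [])
    text = document_text.lower()
    other_tags = {t.lower() for t in cooccurring_tags}
    best, best_score = None, 0
    for expansion in candidates:
        phrase = expansion.lower()
        # Evidence 1: the full expansion occurs in the document content.
        in_document = text.count(phrase)
        # Evidence 2: words of the expansion occur as other tags of the bookmark.
        in_tags = sum(1 for word in phrase.split() if word in other_tags)
        score = in_document + in_tags
        if score > best_score:
            best, best_score = expansion, score
    return best


# Toy dictionary with three of the expansions of "PR" listed in the talk.
ABBREVIATIONS = {"PR": ["Press Release", "Public Relations", "Puerto Rico"]}

# Made-up document text; the co-occurring tags are the ones quoted in the talk.
sample_text = "A public relations strategy for online comments."
result = expand_abbreviation(
    "PR", ["public", "relations", "media", "strategy"], sample_text, ABBREVIATIONS
)
print(result)
```

Returning `None` when no candidate has any support mirrors the main error source reported in the evaluation: a tag that cannot be identified within the document.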
Acronym/Abbreviation Understanding
• “PR” ~ 1409 documents
• 87 definitions for PR
o Press Release, Public Relations, Puerto Rico, Page Rank, Public Radio, Permanent Resident/Residency, etc.
• http://prsarahevans.com/2009/06/do-you-have-a-strategy-for-online-comments
o “PR” = “public relations” (6 times in document content)
o Other tags of the bookmark: “public”, “relations”, “media”, “strategy”
• http://www.bbc.co.uk/pressoffice/pressreleases/category/new_media_index.shtml
o “PR” = “press releases” (in document content)
• http://escape.topuertorico.com
o “PR” = “Puerto Rico” (in document content)
Evaluation
• Experimental data
o ~150,000 (user, document, tag) from del.icio.us
• 8,460 tags; 83,827 documents; 58,198 users
• Main error source: tag cannot be identified within document
o Lack of document content (images, non-EN content, etc.)
• Errors propagate from initial processing steps to later ones
o Bad capitalization leads to bad named entity recognition
Ontological Tag-Tag Relations
• EQUALITY relations
o same lemma, part-of-speech, and sense number
o EQ(activity, activities), EQ(after-effects, AfterEffects), EQ(opinion, Opnion), etc.
• SYNONYMY clusters
o Same synset id
o SYN(OS, operating.system), SYN(LA, losangeles), SYN(nyt, nytimes)
• ISA relations between named entities and type tags
o ISA(OracleCorporation, organization), ISA(davidfosterwallace, person)
• WordNet relations between tags
o ISA(vegan, vegetarian), ANTONYMY(peace, war), PART_WHOLE(Businesses, markets), ENTAIL(proofreading, +read), SIMILARITY(important, general), DOMAIN(light, physics)
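The EQUALITY and SYNONYMY criteria above amount to grouping analyzed tags under two different keys. A minimal sketch, assuming each tag has already been lemmatized, part-of-speech tagged, and disambiguated upstream: the `AnalyzedTag` record and both function names are hypothetical, the sense numbers are assumed, and the synset id given to "activity" is a placeholder (only 20729 comes from the slides).

```python
from collections import defaultdict
from typing import NamedTuple


class AnalyzedTag(NamedTuple):
    surface: str    # tag exactly as typed by the user
    lemma: str
    pos: str
    sense: int
    synset_id: int


def eq_clusters(tags):
    """EQ: tags with the same lemma, part of speech, and sense number."""
    clusters = defaultdict(list)
    for t in tags:
        clusters[(t.lemma, t.pos, t.sense)].append(t.surface)
    return dict(clusters)


def syn_clusters(tags):
    """SYN: sense-tagged tags that share a synset id."""
    clusters = defaultdict(set)
    for t in tags:
        clusters[t.synset_id].add((t.lemma, t.pos, t.sense))
    return dict(clusters)


tags = [
    AnalyzedTag("activity", "activity", "NN", 1, 1),      # synset id 1 is a placeholder
    AnalyzedTag("activities", "activity", "NN", 1, 1),
    AnalyzedTag("knowledge", "knowledge", "NN", 1, 20729),
    AnalyzedTag("cognition", "cognition", "NN", 1, 20729),
]

eq = eq_clusters(tags)
syn = syn_clusters(tags)
print(eq[("activity", "NN", 1)])
print(sorted(syn[20729]))
```

Because the EQ key includes the sense number, two tags with the same spelling but different senses land in different clusters, which is consistent with the evaluation reporting more EQ clusters than unique tags.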
Ontological Tag-Tag Relations
• Lexical chains of size 2 and Semantic calculus
– tag1 rel1 synset rel2 tag2
• rel1 & rel2 ⇒ rel3
• rel3(tag1, tag2) is added to the ontology
– ISA(integration, events,) ⇐ ISA(integration, group_action/NN/1) and ISA(group_action/NN/1, events,)
– PART_WHOLE(lobby, hotels) ⇐ PART_WHOLE(lobby, building/NN/1) and ISA(building/NN/1, hotels)
• ISA relations between “modifier head” and “head” tags
– ISA(book-cover, covers)
– ISA(theoryofmind, theory)
– ISA(photoshoptutorials, tutorials,)
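The size-2 lexical chain step on this slide can be sketched as a lookup in a relation-composition table. The full rule set of the semantic calculus is not given in the slides, so the table below contains only the two compositions illustrated by the examples (ISA with ISA, and PART_WHOLE with ISA). The function name `derive_relations` and the triple encoding are hypothetical.

```python
# Only the two compositions shown in the slide examples; the real rule set is larger.
COMPOSITION = {
    ("ISA", "ISA"): "ISA",
    ("PART_WHOLE", "ISA"): "PART_WHOLE",
}


def derive_relations(edges, tags):
    """Derive rel3(tag1, tag2) from chains tag1 -rel1-> synset -rel2-> tag2.

    edges: iterable of (relation, source, target) triples.
    tags:  set of names that are folksonomy tags (chain end points).
    """
    edges = list(edges)
    derived = set()
    for rel1, tag1, middle in edges:
        if tag1 not in tags:
            continue
        for rel2, start, tag2 in edges:
            if start != middle or tag2 not in tags or tag2 == tag1:
                continue
            rel3 = COMPOSITION.get((rel1, rel2))
            if rel3 is not None:
                derived.add((rel3, tag1, tag2))
    return derived


# Edges copied from the slide examples, including the trailing comma in "events,".
edges = [
    ("ISA", "integration", "group_action/NN/1"),
    ("ISA", "group_action/NN/1", "events,"),
    ("PART_WHOLE", "lobby", "building/NN/1"),
    ("ISA", "building/NN/1", "hotels"),
]
tags = {"integration", "events,", "lobby", "hotels"}

derived = derive_relations(edges, tags)
print(sorted(derived))
```

Pairs of relations without an entry in the table produce nothing, so extending the sketch means adding entries rather than changing the traversal.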
Ontological Tag-Tag Relations
• Relations between “modifier_i head_i” tags (i = 1, 2)
– ISA(build-solar-panel, create-solar-panel)
– SIMILARITY(socialnetworks, socialweb)
Rule diagram:
• ISA(modifier1, modifier2) & ISA(head1, head2), OR
• ISA(modifier1, modifier2) & SYN(head1, head2), OR
• SYN(modifier1, modifier2) & ISA(head1, head2)
& modifier_i REL head_i for the same relation REL (i = 1, 2)
⇒ ISA(modifier1 head1, modifier2 head2)
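One possible reading of the rule diagram on this slide is sketched below: two compound tags stand in an ISA relation when the same relation links modifier and head in both, and their parts are pairwise related by ISA or SYN with at least one ISA. The `Compound` record, the function name `compound_isa`, the way each tag is split into parts, the placeholder relation label "REL", and the ISA fact for build/create are all assumptions made for illustration.

```python
from typing import NamedTuple


class Compound(NamedTuple):
    modifier: str
    head: str
    rel: str  # relation between modifier and head


def compound_isa(t1, t2, isa, syn):
    """Return True if ISA(t1, t2) follows from the parts of the two compounds.

    isa: set of (hyponym, hypernym) pairs.
    syn: set of synonym pairs; identical words count as synonyms.
    """
    if t1.rel != t2.rel:
        return False

    def is_syn(a, b):
        return a == b or (a, b) in syn or (b, a) in syn

    def is_isa(a, b):
        return (a, b) in isa

    return (
        (is_isa(t1.modifier, t2.modifier) and is_isa(t1.head, t2.head))
        or (is_isa(t1.modifier, t2.modifier) and is_syn(t1.head, t2.head))
        or (is_syn(t1.modifier, t2.modifier) and is_isa(t1.head, t2.head))
    )


# Assumed decomposition of the slide example ISA(build-solar-panel, create-solar-panel).
build_panel = Compound("build", "solar-panel", "REL")
create_panel = Compound("create", "solar-panel", "REL")
isa_facts = {("build", "create")}  # assumed background fact

print(compound_isa(build_panel, create_panel, isa_facts, syn=set()))
print(compound_isa(create_panel, build_panel, isa_facts, syn=set()))
```

The check is asymmetric by design: swapping the two tags fails unless the ISA facts also hold in the other direction.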
Evaluation
• 9,820 EQ clusters for the 8,460 unique tags
o Same abbreviation expanded to different definitions
o EQ: tutorial, tutorials, tutorials,
• 8,801 SYN clusters
o Largest cluster (133 bookmarks): car, automobiles, auto, autos, cars, automobile
• 17% of tags placed into incorrect SYN cluster
o Errors caused by imperfect word sense disambiguation
• 5,439 ontological tag-tag relations
o 3,869 ISA, 601 SIMILARITY, 429 PART_WHOLE, etc.
o 1,778 relations derived using WordNet’s lexical chains and Lymba’s semantic calculus rules
Folksonomic Ontology
• Portion of ontology generated from experimental folksonomy