Converting Information to Topic Maps with Wandora Tutorial, Aki Kivelä & Olli Lyytinen, 29.9.2010

Page 1: Converting Information to Topic Maps with Wandorawandora.org/download/other/tmra10/wandora_workshop_tmra2010.… · Converting Information to Topic Maps with Wandora Tutorial, Aki

Converting Information to Topic Maps with Wandora

Tutorial, Aki Kivelä & Olli Lyytinen, 29.9.2010

Page 2

Tutorial Outline
● Introduction to Wandora
● Information extractors
● Detailed look at select extractors
● Hands-on: using the information extractors

Page 3

Preparing Hands-on
● Download Wandora
  ● http://www.wandora.org/wandora/download/wandora.zip
  ● http://www.wandora.org/wandora/download/other/tmra10/wandora_workshop_tmra2010.zip
● Unzip the package
● Use the scripts in the bin folder to start the application

Page 4

Introduction to Wandora
● General purpose Topic Maps editor
● Desktop application
● Java 6, Swing
● FOSS with GNU GPL 3.0
● Developed since 2001 at Grip Studios Interactive
● Used in several real-life projects
● Download, documentation and forum at www.wandora.org
● ~300 downloads per month

Page 5

Introduction to Wandora
● Topic map editor
  ● Topic, occurrence and association editors

Page 6

Introduction to Wandora
● Graph visualizer

More at http://www.wandora.org/wandora/wiki/index.php?title=Graph_topic_panel

Page 7

Introduction to Wandora
● Huge set of information importers, extractors, and generators

Extractors are discussed in more detail later on.

Page 8

Introduction to Wandora
● Topic map analyzers
  + Topic map diameter
  + Clustering coefficient of a topic map
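The two measures named on this slide can be illustrated on a plain undirected graph in which topics are nodes and associations are edges. The following is a minimal sketch under that assumption, not Wandora's own analyzer code; the function names and the sample graph are hypothetical.

```python
from collections import deque

def shortest_path_lengths(graph, start):
    """Breadth-first search: hop counts from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def diameter(graph):
    """Longest shortest path between any two connected nodes."""
    return max(max(shortest_path_lengths(graph, n).values()) for n in graph)

def clustering_coefficient(graph, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = list(graph[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if neighbours[j] in graph[neighbours[i]])
    return 2.0 * links / (k * (k - 1))

# Hypothetical topic graph: each key is a topic, each set its associated topics.
topic_graph = {
    "Helsinki": {"Finland", "Wandora"},
    "Finland": {"Helsinki", "Wandora"},
    "Wandora": {"Helsinki", "Finland", "Topic Maps"},
    "Topic Maps": {"Wandora"},
}

print(diameter(topic_graph))
print(clustering_coefficient(topic_graph, "Wandora"))
```

A real topic map has typed, n-ary associations, so reducing it to a simple graph like this is itself a modelling decision.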

Page 9

Introduction to Wandora
● Exporters

Page 10

Introduction to Wandora
● Embedded HTTP server
  + Drupal bridge
  + Joomla bridge

http://www.wandora.org/wandora/wiki/index.php?title=Embedded_HTTP_server

Page 11

Information Extractors
● Extractors are not as strict in preserving information as importers: they may modify imported information heavily and may import only a small fraction of the original information.
● Wandora has more than 50 different information extractors in 13 categories.
● Both file format extractors and web service extractors.
● Topic maps generated with extractors may carry limited use rights. Always consult the license of the extraction source.

Page 12

Information Extractors
● Incremental extractions
  ● The next extraction builds on the topic map generated during the previous extraction.
● An information mashup is a topic map generated using different information sources and different extractors.
  ● For example: Compose a topic map that interleaves information from Flickr (some photos have geo coordinates) and Geonames (more information about places).
● Limitation: Mashups are static!
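The idea of incremental extraction can be sketched as repeated merging into one topic map. The sketch below assumes a very simplified model in which a topic map is a dictionary keyed by subject identifier and topics with the same identifier are merged; the data, the example.org identifiers and the function name are hypothetical and do not reflect Wandora's internal API.

```python
def merge_extraction(topic_map, extracted):
    """Merge newly extracted topics into an existing topic map in place.

    Topics sharing a subject identifier are treated as the same subject,
    so their names and associations are unioned rather than duplicated.
    """
    for subject_id, topic in extracted.items():
        target = topic_map.setdefault(
            subject_id, {"names": set(), "associations": set()})
        target["names"] |= topic["names"]
        target["associations"] |= topic["associations"]
    return topic_map

# Hypothetical result of a photo extraction: a photo linked to a place.
photo_result = {
    "http://example.org/photo/1": {
        "names": {"Harbour at dusk"},
        "associations": {("taken-at", "http://example.org/place/helsinki")},
    },
    "http://example.org/place/helsinki": {
        "names": {"Helsinki"},
        "associations": set(),
    },
}

# Hypothetical result of a later place extraction about the same place.
place_result = {
    "http://example.org/place/helsinki": {
        "names": {"Helsinki", "Helsingfors"},
        "associations": {("located-in", "http://example.org/place/finland")},
    },
    "http://example.org/place/finland": {
        "names": {"Finland"},
        "associations": set(),
    },
}

topic_map = {}
merge_extraction(topic_map, photo_result)   # first extraction
merge_extraction(topic_map, place_result)   # second extraction builds on the first
print(sorted(topic_map))
```

The merged map is a snapshot: if either source changes later, the extractions have to be run again, which is the "static" limitation the slide mentions.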

Page 13

Patterns in Information Extraction
● Building a topic map with information extractors is essentially a design process where...
  ● The user consciously triggers information extractors depending on her current topic map, vision of the goal, and the available set of information extractors and resources.
● An Information-Extraction-Pattern is a recipe describing the extractions and topic map operations required to achieve a desired goal.
● Information-Extraction-Patterns are part of best practices in information design and architecture.

Page 14

Extractor Categories in Wandora
● Subjects
● Search engines
● Feeds
● Classification
● Language
● Wiki
● Bibliographical
● Social
● Media
● Simple files
● HTML structures
● Microformats
● RDF schemas
● Other

Next: Looking at selected extractors

Page 15

Subject Extractors

DBpedia Extractor
● DBpedia is a huge knowledge base distilled out of Wikipedia
● The extractor is used to enrich selected subjects
● The extractor takes a list of terms → Builds DBpedia URLs → Reads RDF resources → Converts RDF to topic maps
● Generated topic map is structurally like RDF
● Requires no service token or authentication
● dbpedia.org

http://www.wandora.org/wandora/wiki/index.php?title=DBpedia_extractor
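The term → URL → RDF → topic map pipeline described on this slide might look roughly like the sketch below. It is not Wandora's implementation: the `http://dbpedia.org/data/<term>.rdf` URL pattern is an assumption about how DBpedia serves RDF/XML, the function names are hypothetical, and the RDF sample is hand-written so that the example runs without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def dbpedia_data_url(term):
    """Build an RDF/XML data URL for a term (assumed URL pattern)."""
    return "http://dbpedia.org/data/" + quote(term.strip().replace(" ", "_")) + ".rdf"

def rdf_to_associations(rdf_xml):
    """Turn each RDF statement into a (subject, predicate, object) triple.

    Only rdf:Description elements with rdf:resource or literal-text
    properties are handled; this is far from a full RDF parser.
    """
    triples = []
    root = ET.fromstring(rdf_xml)
    for description in root.findall("{%s}Description" % RDF_NS):
        subject = description.get("{%s}about" % RDF_NS)
        for prop in description:
            # ElementTree tags look like "{namespace}local"; join them into one URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            obj = prop.get("{%s}resource" % RDF_NS) or (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples

# Hand-written sample standing in for a downloaded RDF resource.
SAMPLE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:dbo="http://dbpedia.org/ontology/">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Helsinki">
    <rdfs:label>Helsinki</rdfs:label>
    <dbo:country rdf:resource="http://dbpedia.org/resource/Finland"/>
  </rdf:Description>
</rdf:RDF>"""

print(dbpedia_data_url("Topic Maps"))
for triple in rdf_to_associations(SAMPLE_RDF):
    print(triple)
```

Keeping every statement as a subject-predicate-object triple is what makes the resulting topic map "structurally like RDF": each triple maps naturally to a binary association or an occurrence.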

Page 16

Subject Extractors

Subj3ct Record Extractor
● Subj3ct is a web service of Networked Planet
● The extractor is used to resolve and bridge subjects
● 5 subextractors: By identifiers, by resources, by URIs, search, and URLs
● The extractor takes input → Builds Subj3ct web service URLs → Reads XML feeds → Converts XML feed to topic maps
● Requires no service token or authentication
● subj3ct.com

http://www.wandora.org/wandora/wiki/index.php?title=Subj3ct_record_extractor

Page 17

Subject Extractors

OpenCYC Extractor
● THE Knowledge base
● The extractor is used to describe given subjects
● Subextractors: Specializations, Generalizations, Siblings, Comments, Denotations, Classes and Instances
● Extractors use OpenCYC web service API
● Extractors take terms as input → Build web service URLs → Read XML feeds → Convert the XML to topic maps
● Requires no service token or authentication
● opencyc.org

http://www.wandora.org/wandora/wiki/index.php?title=OpenCyc_extractor

Page 18

Search Engine Extractor

Bing Extractor
● Bing is Microsoft's search engine
● The extractor uses the Bing web service API and constructs a topic map out of the query and the search results
● The extractor can "search" both web and images
● The extractor takes search query and API key as input → Builds web service URLs → Reads XML feeds → Converts XML to topic map
● Idea: Think of the search result as a fingerprint of the subject addressed by the query
● Extractor requires a Bing service API key
● bing.com

http://www.wandora.org/wandora/wiki/index.php?title=Bing_extractor

Page 19

Feed Extractors

RSS and Atom Extractors
● Extractors essentially build a topic map where feed items are associated with the feed
● Extractors take a feed URL, file or raw data as input → Read feed XML → Convert feed XML to Topic Maps
● Interpretation of resulting topic map depends on feed content
● Requires no service token or authentication
  ● If the addressed feed is open

http://www.wandora.org/wandora/wiki/index.php?title=RSS_2.0_Extractor
http://www.wandora.org/wandora/wiki/index.php?title=Atom_extractor
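The feed-to-topic-map conversion described above can be sketched for RSS 2.0 with the standard library. This is an illustrative sketch only: the dictionary layout, the "feed-item" association type and the sample feed are hypothetical, and Wandora's extractor produces a richer structure.

```python
import xml.etree.ElementTree as ET

def rss_to_topic_map(rss_xml):
    """Convert RSS 2.0 text into a feed topic, item topics and associations."""
    channel = ET.fromstring(rss_xml).find("channel")
    feed = {"name": channel.findtext("title", default=""),
            "locator": channel.findtext("link", default="")}
    items, associations = [], []
    for element in channel.findall("item"):
        item = {"name": element.findtext("title", default=""),
                "locator": element.findtext("link", default="")}
        items.append(item)
        # Each feed item is associated with the feed it came from.
        associations.append(("feed-item", feed["locator"], item["locator"]))
    return feed, items, associations

# Hand-written sample feed so the example runs offline.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <title>Example news</title>
  <link>http://example.org/</link>
  <item><title>First post</title><link>http://example.org/1</link></item>
  <item><title>Second post</title><link>http://example.org/2</link></item>
</channel></rss>"""

feed, items, associations = rss_to_topic_map(SAMPLE_RSS)
print(feed["name"], len(items))
print(associations)
```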

Page 20

Classification Extractors

OpenCalais Classifier
● Calais is an entity extraction service
● The extractor is used to classify text fragments and uses OpenCalais web service API
● The extractor takes a text as input → Sends the text to OpenCalais web service → Receives an XML feed containing extracted terms → Converts the XML to Topic Maps
● Results in a topic map where extracted entities are associated with the topic representing the text fragment
● Requires an application key (included in Wandora)
● opencalais.com

http://www.wandora.org/wandora/wiki/index.php?title=OpenCalais_classifier

Page 21

Classification Extractors

AlchemyAPI Extractors
● AlchemyAPI is a web service with several information-out-of-text extractors
● 4 extractors: Entity, Keyword, Category and Language
● The extractors are used to classify free text
● Takes plain text as input → Sends the text to AlchemyAPI → Receives XML data → Converts the data to topic map
● Results in a topic map where extracted entities are associated with the topic representing the text fragment
● Requires a personal API key
● alchemyapi.com

http://www.wandora.org/wandora/wiki/index.php?title=AlchemyAPI_extractors

Page 22

Language Extractors

Big Huge Thesaurus Extractor
● Thesaurus containing word relations
● Takes a word as input → Sends the word to BHT web service → Receives an XML result → Parses a topic map out of the XML
● Extractor results in a topic map where a given word topic is associated with other word topics
● Requires a personal API token
● words.bighugelabs.com

http://www.wandora.org/wandora/wiki/index.php?title=Big_Huge_Thesaurus_API_extractor

Page 23

Language Extractors

Stands4 word describer
● Thesaurus, Acronym expander
● The extractor is used to describe words using synonyms, antonyms and part-of-speech relations
● Takes a word as input → Sends the word to Stands4 web service → Receives an XML result → Parses a topic map out of the XML
● The resulting topic map associates the input word with an equivalent concept, and the concept with all its synonym words.
● Requires a personal API token
● abbreviations.com

http://www.wandora.org/wandora/wiki/index.php?title=Stands4_word_describer

Page 24

Wiki Extractors

Wikipedia extractor
● Reads a page from Wikipedia and constructs a topic for the page.
● Takes a wiki term as input → Builds a Wikipedia page URL → Reads the page source in XML format → Transforms the XML to a topic map.
● Source text is transformed to an occurrence.
  ● No wiki markup cleaning!
● Extractor can be used to attach Wikipedia page source to given subjects.
● Requires no API key or authentication.

http://www.wandora.org/wandora/wiki/index.php?title=Wikipedia_extractor

Page 25

Bibliographical Extractors

Bibtex Extractor
● Bibtex is a file format used to describe bibliographical data, books for example.
● The extractor transforms any Bibtex file, URL resource or raw data to topic map structures.
● Uses an internal Bibtex parser

http://www.wandora.org/wandora/wiki/index.php?title=Bibtex_extractor

Page 26

Bibliographical Extractors

MarcXML Extractor
● MarcXML is an XML variant used to describe bibliographical data
● The extractor takes a MarcXML file, a URL resource or raw data and transforms it to a topic map, i.e. topics and associations.
● Wandora also has a batch extractor for MarcXML.

http://www.wandora.org/wandora/wiki/index.php?title=MARCXML_extractor

Page 27

Social Extractors

Facebook Extractor
● Facebook is a social media service used by over 590 million people.
● Facebook's Open Graph API provides data in JSON.
● The extractor takes a graph node → reads the equivalent JSON feed → converts the feed to a topic map.
● The extractor can be used to blueprint a social graph: friends, feeds, likes etc. Preservation of social events and graph.
● Incremental topic map building.
● Requires a valid Facebook user account. The user account limits possible extractions.

http://www.wandora.org/wandora/wiki/index.php?title=Facebook_Graph_extractor
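The node → JSON → topic map step might be sketched as follows. The JSON shapes used here (an object with `id` and `name`, and a connection wrapped in a `data` list) are assumptions made for illustration, as are the function name and the "friend" association type; the sample strings are hand-written, and no Facebook API call is made.

```python
import json

def graph_node_to_topics(node_json, connection_json, association_type):
    """Create topics for a graph node and its connected nodes.

    Returns a dict of node id -> display name and a list of
    (association type, node id, connected node id) tuples.
    """
    node = json.loads(node_json)
    connection = json.loads(connection_json)
    topics = {node["id"]: node.get("name", node["id"])}
    associations = []
    for other in connection.get("data", []):
        topics[other["id"]] = other.get("name", other["id"])
        associations.append((association_type, node["id"], other["id"]))
    return topics, associations

# Hypothetical JSON documents standing in for real API responses.
node_json = '{"id": "100", "name": "Example User"}'
friends_json = '{"data": [{"id": "200", "name": "Friend A"}, {"id": "300", "name": "Friend B"}]}'

topics, associations = graph_node_to_topics(node_json, friends_json, "friend")
print(topics)
print(associations)
```

Running the same function on one of the connected nodes and merging the result into the existing topics is one way to picture the incremental building the slide refers to.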

Page 28

Media Extractors

Flickr Extractor
● Flickr is a photograph sharing service.
● Has a service API.
● Wandora has profile, photo and group extractors.
● The extractors transform Flickr data to topic maps. The extracted topic map contains topics representing photos, with information about them as other associated topics.
● Requires a Flickr user account.
● Idea: Use Flickr as an image storage for a Topic Maps based web service

http://www.wandora.org/wandora/wiki/index.php?title=Flickr_extractors

Page 29

Media Extractors

YouTube Extractor
● YouTube is a video sharing service with a web service API.
● Several extractors: Extract predefined video feed, Extract using context, Search, Extract user, Extract exact feed URL.
● The extracted topic map contains video topics associated with other topics representing additional information (author, genre, keywords, thumbnails etc.).
● Incremental extractions.
● Requires a YouTube user account.

http://www.wandora.org/wandora/wiki/index.php?title=YouTube_extractor

Page 30

Media Extractors

Last.fm Extractor
● Last.fm is a social music service. Keeps track of what you listen to. Contains general information related to music artists, records and music tracks.
● 8 different extractors: overall top tags, top albums with a tag, top artists with a tag, album info, top tracks of an artist, top tags of an artist, similar artists.
● The extractors use Last.fm web service API and convert the XML feeds to a topic map.
● Requires no API key or authentication.
● An excellent source for music related topic maps.
● Incremental extractions.

http://www.wandora.org/wandora/wiki/index.php?title=Last.fm_extractors

Page 31

Miscellaneous Extractors

Geonames Extractor
● Geonames is a geographical database covering all countries and over eight million place names
● Geonames provides a web service API.
● Wandora features a family of 11 different Geonames extractors: Neighbours, Siblings, Children, Hierarchy, Near by, Country info, Cities, Search, Weather, Wikipedia search, Wikipedia b-box.
● The extractors build a web service URL → Read XML feed → Transform the XML feed to topic maps
● Incremental extractions.
● Requires no API key or authentication.

http://www.wandora.org/wandora/wiki/index.php?title=Geonames_extractors

Page 32

Miscellaneous Extractors

Simple Email Extractor
● Extractor converts email files and repositories to topic maps.
● Supported formats: DBX and MBOX
  ● Thunderbird and Outlook
● Limited support for attachments
● Preservation of emails

http://www.wandora.org/wandora/wiki/index.php?title=Email_extractor

Page 33

Miscellaneous Extractors

GEDCOM Extractor
● GEDCOM is a file format for genealogical information, i.e. individual and family relations such as child-of, married-to, birth, death, etc.
● The extractor transforms a GEDCOM file, a URL resource or raw data to a topic map.

http://www.wandora.org/wandora/wiki/index.php?title=GEDCOM_extractor
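How GEDCOM records could map to topics and to child-of and married-to associations is sketched below. The sketch handles only level-0 records and their level-1 tags (NAME, HUSB, WIFE, CHIL); the function names, association labels and sample data are hypothetical, and the real extractor covers much more of the format.

```python
def parse_gedcom(text):
    """Group level-1 tags under their level-0 record."""
    records = {}
    current = None
    for line in text.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level, second = parts[0], parts[1]
        rest = parts[2] if len(parts) == 3 else ""
        if level == "0":
            # Records look like "0 @I1@ INDI"; other level-0 lines (HEAD, TRLR) are skipped.
            current = None
            if second.startswith("@"):
                current = {"type": rest, "fields": []}
                records[second] = current
        elif level == "1" and current is not None:
            current["fields"].append((second, rest))
    return records

def gedcom_to_topic_map(records):
    """Individuals become topics; families become associations."""
    topics, associations = {}, []
    for record_id, record in records.items():
        fields = record["fields"]
        if record["type"] == "INDI":
            names = [value for tag, value in fields if tag == "NAME"]
            topics[record_id] = names[0].replace("/", "") if names else record_id
        elif record["type"] == "FAM":
            spouses = [value for tag, value in fields if tag in ("HUSB", "WIFE")]
            children = [value for tag, value in fields if tag == "CHIL"]
            if len(spouses) == 2:
                associations.append(("married-to", spouses[0], spouses[1]))
            for child in children:
                for parent in spouses:
                    associations.append(("child-of", child, parent))
    return topics, associations

# Hand-written sample data.
SAMPLE_GEDCOM = """
0 HEAD
0 @I1@ INDI
1 NAME Matti /Virtanen/
0 @I2@ INDI
1 NAME Liisa /Virtanen/
0 @I3@ INDI
1 NAME Anna /Virtanen/
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
0 TRLR
"""

topics, associations = gedcom_to_topic_map(parse_gedcom(SAMPLE_GEDCOM))
print(topics)
print(associations)
```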

Page 34

Simple File Extractors
● Directory Structure Extractor
● Simple Text Document Extractor
● JPG Image Extractor
● Simple PDF Extractor

Page 35

HTML Extractors
● Link extractor
● Property table extractor
● Association table extractor
● Instance list extractor
● Superclass – Subclass list extractor
● Definition list extractor
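As an illustration of the first item, a link extractor can be sketched with Python's standard HTML parser: every anchor on a page becomes an association between the page and the link target. The class name, function name and "links-to" association type are hypothetical, and this is not Wandora's implementation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect absolute URLs of all <a href="..."> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

def extract_link_associations(html, page_url):
    """Return one (type, source page, target) tuple per link found."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    return [("links-to", page_url, target) for target in collector.links]

sample_html = '<p><a href="/wandora/wiki/">Wiki</a> <a href="http://www.topicmaps.org/">TM</a></p>'
links = extract_link_associations(sample_html, "http://www.wandora.org/index.html")
print(links)
```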

Page 36

[Figure: Usage of www.wandora.org, Aug 21, 2010 – Sep 20, 2010]

Page 37

[Figure: Usage of www.wandora.org, Aug 21, 2010 – Sep 20, 2010]

Page 38

Summary
● Wandora is an open source Topic Maps editor application with GNU GPL license.
● Wandora contains a huge set of information extractors.
● Information extractors enable rapid topic map construction.
● An information mashup is a topic map built using several different information extractors and information sources.
● Information-Extraction-Patterns are part of best practices in information design and architecture.

Page 39

Thank You

For more information, visit www.wandora.org