Blaz Fortuna, Marko Grobelnik, Dunja Mladenic

Jozef Stefan Institute

http://ontogen.ijs.si

ONTOGEN SEMI-AUTOMATIC ONTOLOGY EDITOR

Outline

  Motivation
  Functionality
  Conclusion

HCII2007, July 26th

Blaz Fortuna, Jozef Stefan Institute, Slovenia

Motivation

What is an ontology?

An ontology is a data model that represents a set of concepts within a domain and the relationships between those concepts.

It generally consists of:
  Classes: sets, collections, or types of objects
  Instances: the basic or "ground level" objects
  Relations: ways that objects can be related to one another

It can be used
  … as a schema for a knowledge management system,
  … to reason about the objects within that domain, etc.
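The three building blocks listed above can be sketched as a small data structure. This is only an illustration, not OntoGen's internal model: the class names `Concept` and `Ontology`, the relation name "sub-concept-of", and the grouping of companies into "Technology" and "Media" are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """A class in the ontology: a named set of instances."""
    name: str
    instances: set[str] = field(default_factory=set)


@dataclass
class Ontology:
    """Concepts plus named relations between them (hypothetical model)."""
    concepts: dict[str, Concept] = field(default_factory=dict)
    # Each relation is a (subject, relation_name, object) triple.
    relations: set[tuple[str, str, str]] = field(default_factory=set)

    def add_concept(self, name: str) -> Concept:
        # Create the concept on first use, otherwise return the existing one.
        return self.concepts.setdefault(name, Concept(name))

    def add_instance(self, concept: str, instance: str) -> None:
        self.add_concept(concept).instances.add(instance)

    def relate(self, subject: str, relation: str, obj: str) -> None:
        self.add_concept(subject)
        self.add_concept(obj)
        self.relations.add((subject, relation, obj))

    def subconcepts(self, name: str) -> list[str]:
        """Names of concepts directly below `name` in the hierarchy."""
        return sorted(s for s, r, o in self.relations
                      if r == "sub-concept-of" and o == name)


onto = Ontology()
onto.relate("Technology", "sub-concept-of", "Companies")
onto.relate("Media", "sub-concept-of", "Companies")
onto.add_instance("Technology", "Xerox")
onto.add_instance("Technology", "Yahoo!")
onto.add_instance("Media", "The Washington Post")
print(onto.subconcepts("Companies"))  # ['Media', 'Technology']
```

A topic ontology of the kind discussed in this talk would mostly use a single hierarchical relation, with documents as the instances behind each concept.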

Sample Ontology

Ontology is normally designed by knowledge engineers using ontology editors: Protégé, OntoStudio, …
  Domain experts are needed to help the knowledge engineer understand the domain
  Ontology editors are not aware of the ontology’s domain

Our goal is to make the ontology editor easy to use and domain-aware so that it can be used by domain experts.
  This reduces the need for a knowledge engineer
  This is done through the use of text mining and machine learning.

In this presentation we focus on the construction of Topic Ontologies

Ontology Editor

Creating Ontology

[Diagram labels: Domain Expert, Knowledge Engineer]

Xerox

Xerox Corporation is a technology and services enterprise engaged in developing, manufacturing, marketing, servicing and financing a portfolio of document equipment, software, solutions and services. It manages its business in four segments: Production, Office, Developing Markets Operations (DMO) and Other. The Production segment includes black and white products, which operate at speeds over 90 pages per minute …

Yahoo!

Yahoo! Inc. is a provider of Internet products and services to consumers and businesses through the Yahoo! Network, its worldwide network of online properties. The Company's properties and services for consumers and businesses reside in four areas: Search and Marketplace, …

The Washington Post

Company's principal business activities consist of newspaper publishing (principally The Washington Post), television broadcasting (through the ownership and operation of six television broadcast stations), the ownership and operation of cable television systems, magazine publishing (principally Newsweek magazine), and (through its Kaplan subsidiary) the provision of educational services. …

How does it work?

OntoGen suggests concepts
  Suggestions are generated automatically
    … from the text corpus by clustering similar documents
    … based on a user query
    … through a map of the text corpus

User selects appropriate suggestions and adds them to the ontology
  OntoGen helps decide which suggestions to include
    … by extracting main keywords from the documents
    … with ontology and concept visualizations
    … by listing the documents behind concepts

Behind each concept there is a set of documents
  Documents are automatically assigned to concepts
  Document assignments can be edited manually
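The clustering route to suggestions can be sketched as follows. The slides do not name a clustering algorithm or a document representation, so this example assumes TF-IDF vectors and a k-means-style procedure with cosine similarity; the function names and the four toy documents are hypothetical. Each resulting cluster is a candidate concept, and the cluster index is the automatic document assignment.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn each document into a unit-length TF-IDF vector (term -> weight)."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        vec = {t: c * math.log(len(docs) / doc_freq[t])
               for t, c in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(vectors):
    """Unit-length mean direction of a group of vectors."""
    total = Counter()
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def suggest_concepts(vectors, k, iterations=10):
    """Cluster documents into k groups; each group is a candidate concept."""
    # Deterministic seeding: start from the first document, then repeatedly
    # add the document least similar to the centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(min(
            vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    assignment = []
    for _ in range(iterations):
        # Assign every document to its most similar centroid.
        assignment = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                      for v in vectors]
        # Recompute each centroid from the documents assigned to it.
        for j in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:
                centroids[j] = centroid(members)
    return assignment


docs = [
    "printer copier document equipment printer",
    "document printer office equipment",
    "newspaper television broadcast magazine",
    "newspaper magazine publishing television",
]
print(suggest_concepts(tfidf_vectors(docs), k=2))  # [0, 0, 1, 1]
```

In an interactive setting the user would inspect each cluster, accept the ones that make sense as sub-concepts, and correct individual document assignments by hand.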

Example

Domain

Text corpus

Ontology

Concept A

Concept B

Concept C

Functionality

Main Features

Interactive user interface
  User can interact in real time with the integrated machine learning and text mining methods

Concept discovery methods:
  Unsupervised
    System provides suggestions
  Supervised
    Concept learning
    Concept visualization

Methods for helping to understand the discovered concepts:
  Keyword extraction
    Generates a list of characteristic keywords of a given concept
  Concept visualization
    Creates a map of documents from a given concept
    Also available as a separate tool named Document Atlas
    http://docatlas.ijs.si
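Keyword extraction for a concept can be sketched in a few lines. The slides do not say how OntoGen scores keywords, so this example assumes a simple approach: sum the TF-IDF weights of each term over the documents behind the concept and keep the top-ranked terms. The function name `concept_keywords` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def concept_keywords(corpus, concept_doc_ids, top_n=3):
    """Rank terms by summed TF-IDF weight over the concept's documents."""
    tokenized = [doc.lower().split() for doc in corpus]
    # Document frequency is computed on the whole corpus, so terms that are
    # common everywhere receive a low weight.
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    scores = Counter()
    for i in concept_doc_ids:
        for term, count in Counter(tokenized[i]).items():
            scores[term] += count * math.log(len(corpus) / doc_freq[term])
    # Sort by score (descending), then alphabetically to break ties.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


corpus = [
    "printer printer copier document",
    "printer document office",
    "newspaper magazine document",
    "television newspaper broadcast",
]
# Documents 0 and 1 are assumed to belong to the concept being described.
print(concept_keywords(corpus, [0, 1]))  # ['printer', 'copier', 'office']
```

Note that "document" occurs in both concept documents but is ranked last, because it also appears outside the concept and is therefore less characteristic.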

Main view

[Screenshot labels: Concept hierarchy, List of suggested sub-concepts, Ontology visualization, Selected concept]

Concept suggestion

[Screenshot labels: Selected concept, Suggested subconcepts, Add new concept, New concept]

Personalized suggestions

Topics view

Countries view

UK takeovers and mergers

The following are additions and deletions to the takeovers and mergers list for the week beginning August 19, as provided by the Takeover …

Lloyd’s CEO questioned in recovery suit in U.S.

Ronald Sandler, chief executive of Lloyd's of London, on Tuesday underwent a second day of court interrogation about …

Concept learning

[Screenshot labels: Query, New Concept, Finish]

Concept’s instances visualization

Instances are visualized as points on a 2D map
  The distance between two instances on the map corresponds to their content similarity
  Characteristic keywords are shown for all parts of the map
  User can select groups of instances on the map to create sub-concepts.
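The notion of content similarity behind the map can be illustrated with a pairwise distance. This sketch assumes plain bag-of-words vectors and cosine distance, which the slides do not specify, and it leaves out the projection step that places documents on the 2D map; it only computes the distances such a layout would try to preserve. The function name and the three short texts are hypothetical.

```python
import math
from collections import Counter


def cosine_distance(doc_a, doc_b):
    """1 - cosine similarity of two bag-of-words vectors (0 = same content)."""
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    # Treat an empty document as maximally distant from everything.
    return 1.0 - dot / norm if norm else 1.0


docs = [
    "takeovers and mergers list",
    "mergers and takeovers report",
    "court interrogation of chief executive",
]
# Documents 0 and 1 share most words, so they should sit close together on
# the map; document 2 shares none and should be placed far from both.
print(cosine_distance(docs[0], docs[1]))  # 0.25
print(cosine_distance(docs[0], docs[2]))  # 1.0
```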

Concept management

[Screenshot labels: Concept’s details, Concept’s instance management, Selected concept, Keywords, Selected instance]

Adding new documents to ontology

[Screenshot labels: New documents, Classification of selected document, Content of selected document, Selected document]

Conclusions

Evaluation

First prototype was successfully used in several commercial projects:
  Applied in multiple domains: business, legislation and digital libraries
  Users were always domain experts with limited knowledge of and experience with ontology construction / knowledge engineering
  Valuable data from the first trials was used as input for the interface design of the second prototype (the one presented here).

Feedback from the users of the second prototype
  Main impression was that the tool saves time and is especially useful when working with large collections of documents
  Among the main disadvantages were abstraction and unattractive look
  Many users use the program for exploration of the data

Future work

Tools for suggestion and learning of more complex relations

Extended support for collaborative editing of ontologies

Easier input of background knowledge

Improvement of the user interface based on the feedback from user trials and real-world users

Questions? Comments?

Thank you for listening!

http://ontogen.ijs.si
