Building and Integrating Competitive Intelligence Reports Using the Topic Map Technology

Vojtěch Svátek, Tomáš Kliegr, Jan Nemrava, Martin Ralbovský, Vojtěch Roček, Jan Rauch

University of Economics, Winston Churchill Sq. 4, Prague, Czech Republic

Jiří Šplíchal, Tomáš Vejlupek

Tovek s.r.o., Chrudimská 1418, Prague, Czech Republic

Upload: tmra

Post on 27-Jun-2015

DESCRIPTION

Competitive intelligence (CI) supports decision makers in understanding the competitive environment by means of textual reports prepared from public resources. CI is particularly demanding in the context of larger business clusters. We report on a long-term project featuring large-scale manual semantic annotation of CI reports with respect to business clusters in several industries. The underlying ontologies are the result of collaborative editing by multiple student teams. The annotation results are finally merged into CI maps that allow easy access to both the original documents and the knowledge structures.

TRANSCRIPT

Page 1: Building and Integrating Competitive Intelligence Reports Using the Topic Map Technology

Page 2: Building and Integrating Competitive Intelligence Reports Using the Topic Map Technology

CI and Business Clusters

• CI (Competitive Intelligence) is a sub-field of business intelligence that supports decision makers in understanding the competitive environment by means of reports prepared from (public) resources.

• A cluster is a set of companies in related fields operating in the same geographical area.

How to link and search multiple CI reports?

Envisaged Solution: Create a complementary topic map that would put the important facts into context

Page 3: Building and Integrating Competitive Intelligence Reports Using the Topic Map Technology

The Topic Map

1. Ontology: putting concepts into context

Topic Types, Instances, Associations

2. Annotate important bits of text with ontology concepts
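The two ingredients above (an ontology of topic types, instances and associations, plus text annotated with ontology concepts) can be illustrated with a minimal in-memory sketch. This is only an illustration of the data model, not the representation used by TTM or OKS; all class names, identifiers and the sample company are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Topic:
    """A topic; type_id is None when the topic is itself a topic type."""
    topic_id: str
    name: str
    type_id: Optional[str] = None
    # Annotated text fragments (or URLs) attached to this topic.
    occurrences: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class Association:
    """A typed relationship between two topics."""
    assoc_type: str
    source_id: str
    target_id: str


@dataclass
class TopicMap:
    topics: Dict[str, Topic] = field(default_factory=dict)
    associations: List[Association] = field(default_factory=list)

    def add_topic(self, topic: Topic) -> None:
        self.topics[topic.topic_id] = topic

    def instances_of(self, type_id: str) -> List[Topic]:
        return [t for t in self.topics.values() if t.type_id == type_id]


# Hypothetical example data, not taken from the project.
tm = TopicMap()
tm.add_topic(Topic("company", "Company"))                   # topic type
tm.add_topic(Topic("product", "Product"))                   # topic type
tm.add_topic(Topic("acme", "ACME", type_id="company"))      # instance
tm.add_topic(Topic("widget", "Widget", type_id="product"))  # instance
tm.associations.append(Association("produces", "acme", "widget"))
# Step 2: annotate a bit of report text with an ontology concept.
tm.topics["acme"].occurrences.append("ACME opened a new plant.")
```

Calling `tm.instances_of("company")` then returns the single `ACME` topic together with its annotated fragment.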

Page 4: Building and Integrating Competitive Intelligence Reports Using the Topic Map Technology

Testbed

A case study assignment at an introductory knowledge engineering course, attended by 150-200 students each semester

• The goal is to get a picture of the whole industry

• Students work in groups of 5

• Each group covers one company and its environment

Two assignments:

1) Students write CI reports of about 25 pages based on publicly available sources of information.

2) Important pieces of information are expressed in a machine-readable way with topic maps.

Each semester we tested a slightly different setting (S1-S3) of tools and techniques… now running for the fourth semester

Page 5: Building and Integrating Competitive Intelligence Reports Using the Topic Map Technology

S1: Individual ontologies, merge

1. Each team wrote the CI report (in a text editor)

2. Subsequently, they obtained a copy of a startup ontology

3. Students extended the ontology with new topic types using Tovek Topic Mapper (TTM): an ontology editor and annotating tool (desktop application)

4. Students used TTM to annotate bits of text with a topic type.

5. Annotated text became an internal occurrence in the topic map

6. The ontologies enriched with new topic types and annotations were collected from all teams

7. We used OKS to merge the topic maps

[Diagram labels: Startup Ontology; Extend ontology; Annotate; file formats DOC, HTML, XTM]

The result is a linking file between the document and the shared topic map

Page 6: Building and Integrating Competitive Intelligence Reports Using the Topic Map Technology

Topic Maps Merging

• Merging of: business cluster topic map, all unstructured documents, linking files

[Diagram labels: Shared industry topic map; Linking files; CI reports; XTM, HTML, DOC]

Page 7: Building and Integrating Competitive Intelligence Reports Using the Topic Map Technology

Issues

• Annotated text is fragmented, since each fragment is stored as an internal occurrence

• Laborious

• Duplicate topic types

• Effective merging requires unique identifiers, which was achieved only for companies (registration numbers used in subject indicators)
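The last issue can be illustrated with a simplified sketch of identifier-based merging. It is not how OKS merges topic maps; it only shows why topics with a shared subject identifier collapse into one while topics without one stay duplicated. The dictionary layout, the `reg:` identifier format and the sample topics are assumptions made for the example.

```python
def merge_topics(topic_maps):
    """Merge topics from several maps, keyed by subject identifier.

    Each topic is a dict with "name", an optional "subject_id" and
    "occurrences". Topics sharing a subject_id collapse into one topic;
    topics without an identifier cannot be matched and stay separate.
    """
    merged = {}     # subject_id -> merged topic
    unmatched = []  # topics without an identifier (possible duplicates)
    for topic_map in topic_maps:
        for topic in topic_map:
            sid = topic.get("subject_id")
            if sid is None:
                unmatched.append(dict(topic))
                continue
            target = merged.setdefault(
                sid,
                {"name": topic["name"], "subject_id": sid, "occurrences": []},
            )
            target["occurrences"].extend(topic["occurrences"])
    return list(merged.values()) + unmatched


# Hypothetical maps from two teams; the registration number is made up.
team_a = [
    {"name": "ACME a.s.", "subject_id": "reg:00000001",
     "occurrences": ["report A, fragment 3"]},
    {"name": "Supplier", "occurrences": []},
]
team_b = [
    {"name": "ACME", "subject_id": "reg:00000001",
     "occurrences": ["report B, fragment 7"]},
    {"name": "Supplier", "occurrences": []},
]
result = merge_topics([team_a, team_b])
```

In this sketch the two company topics merge into one with both occurrences, while the two `Supplier` topic types, which lack an identifier, remain as duplicates.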

Page 8: Building and Integrating Competitive Intelligence Reports Using the Topic Map Technology

S2: Collaborative Ontology Population

Goal: remove duplicate topic types

1. Startup ontology was placed on a PostgreSQL server

2. Student teams collaboratively enriched the ontology with the topic types, association types and occurrence types they expected to use during the annotation in Topic Mapper

3. The ontology was then frozen: each team got its copy.

4. TTM was used only for annotation, and then OKS for merging

[Diagram labels: Collaborative Ontology Creation; remote repository; Shared topic map; students; Import ontology; Annotate only; Topic Maps for Merging]
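The "freeze, then annotate only" rule of steps 3 and 4 can be sketched as a simple check that every annotation uses a type from the frozen ontology. The type names and the function are hypothetical and only illustrate the constraint; neither TTM nor OKS is known to expose such a check.

```python
# Hypothetical set of topic types agreed on before the ontology was frozen.
FROZEN_TOPIC_TYPES = frozenset({"company", "product", "competitor"})


def check_annotations(annotations, allowed_types=FROZEN_TOPIC_TYPES):
    """Split (fragment, topic_type) pairs by whether the type is in the frozen ontology."""
    accepted, rejected = [], []
    for fragment, topic_type in annotations:
        if topic_type in allowed_types:
            accepted.append((fragment, topic_type))
        else:
            rejected.append((fragment, topic_type))
    return accepted, rejected


annotations = [
    ("ACME opened a new plant.", "company"),
    ("Exchange rates were volatile.", "macroeconomy"),  # type not in the frozen set
]
accepted, rejected = check_annotations(annotations)
```

A rejected annotation shows the practical cost of this setting: the annotator has to anticipate every needed type before the freeze.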

Page 9: Building and Integrating Competitive Intelligence Reports Using the Topic Map Technology

Issues

• Separation of ontology enrichment and document annotation is not natural and requires an experienced annotator

• Annotations still kept as internal occurrences

• Multiple concurrent instances of OKS servers resulted in corruption in the topic map, probably due to caching in OKS

• Two topic map tools used, original documents not easily accessible

Page 10: Building and Integrating Competitive Intelligence Reports Using the Topic Map Technology

S3: Annotation by linking

Goal: move annotation fully to the web

1. All students used one instance of OKS server

2. CI reports were placed into a CMS (Joomla!)

3. Each structural unit was assigned an id (via HTML’s <a name>)

4. Annotation was done via external occurrences

External occurrences point at a specific bookmark in the document, where the annotated fragment starts. The annotated fragment is assumed to span up to the nearest following bookmark.
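The bookmark convention can be sketched as follows: given the HTML of a report and an external occurrence URL ending in `#bookmark`, collect the text from that `<a name>` anchor up to the next one. This is a minimal sketch using Python's standard `html.parser`; the sample page and the `example.org` URL are made up, and the project's actual tooling may resolve fragments differently.

```python
from html.parser import HTMLParser


class BookmarkSplitter(HTMLParser):
    """Collect the text that follows each <a name="..."> bookmark."""

    def __init__(self):
        super().__init__()
        self.fragments = {}  # bookmark name -> list of raw text pieces
        self.current = None  # bookmark whose fragment is being collected

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("name")
        if tag == "a" and name:
            # A new bookmark ends the previous fragment and starts a new one.
            self.current = name
            self.fragments[name] = []

    def handle_data(self, data):
        if self.current is not None:
            self.fragments[self.current].append(data)


def fragment_for(html, occurrence_url):
    """Return the text that an external occurrence URL such as '...#sec2' refers to."""
    bookmark = occurrence_url.rpartition("#")[2]
    parser = BookmarkSplitter()
    parser.feed(html)
    parser.close()
    pieces = parser.fragments.get(bookmark, [])
    return " ".join("".join(pieces).split())  # normalize whitespace


# Hypothetical report page with three bookmarked structural units.
page = (
    '<a name="sec1"></a><p>ACME leads the market.</p>'
    '<a name="sec2"></a><p>Its main rival is <b>Beta</b>.</p>'
    '<a name="sec3"></a><p>Outlook.</p>'
)
text = fragment_for(page, "http://example.org/report.html#sec2")
```

Because the annotation stores only the bookmark, the text between two bookmarks can be edited in the CMS without touching the topic map.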

Page 11: Building and Integrating Competitive Intelligence Reports Using the Topic Map Technology

Issues … and finally advantages

Issues:

• OKS Ontopoly was not stable enough in a concurrent setting

• XPointer technology, which could be used to mark spans in the document, is not supported by current browsers

Advantages:

• The text with full content (including even figures or links) in the CMS is more intelligible than fragments in internal occurrences

• Further editing of an article is possible in the CMS without invalidating the annotation

• Full-text search feature of the CMS can be exploited

• Bringing the best from the CMS world and OKS

Page 12: Building and Integrating Competitive Intelligence Reports Using the Topic Map Technology

Summary & Plans

• On the competitive intelligence use case, we tested several approaches for collaborative ontology design and document annotation with some 500 users altogether.

• OKS is a great tool, which gains an additional edge by being web-based

• We deem the last approach taken (documents stored in a CMS, linked through external occurrences with OKS) usable, contingent on improvements in Ontopoly and Joomla!

Ontopoly wishes

• Greater stability in case of concurrent user access

• We missed user management and versioning in Ontopoly

Joomla! wishes

• Support for “tagging” arbitrary bits of text

• A tool for creating XPointer URLs based on user selection

• A functionality that would highlight part of the document based on a URL containing an XPointer span