Building and Integrating Competitive Intelligence Reports Using the Topic Map Technology
DESCRIPTION
Competitive intelligence (CI) supports decision makers in understanding the competitive environment by means of textual reports prepared from public resources. CI is particularly demanding in the context of larger business clusters. We report on a long-term project featuring large-scale manual semantic annotation of CI reports with respect to business clusters in several industries. The underlying ontologies are the result of collaborative editing by multiple student teams. The results of annotation are finally merged into CI maps that allow easy access to both the original documents and the knowledge structures.
TRANSCRIPT
Building and Integrating Competitive Intelligence Reports Using the Topic Map Technology
Vojtěch Svátek, Tomáš Kliegr, Jan Nemrava, Martin Ralbovský, Vojtěch Roček, Jan Rauch
University of Economics, Winston Churchill Sq. 4, Prague, Czech Republic
Jiří Šplíchal, Tomáš Vejlupek
Tovek s.r.o., Chrudimská 1418, Prague, Czech Republic
CI and Business Clusters
• Competitive intelligence (CI) is a sub-field of business intelligence that supports decision makers in understanding the competitive environment by means of reports prepared from (public) resources.
• A cluster is a set of companies in related fields operating in the same geographical area.
Envisaged Solution: Create a complementary topic map that would put the important facts into context
How to link and search multiple CI reports?
The Topic Map
1] Ontology: putting concepts into context
Topic types, instances, associations
2] Annotate important bits of text with ontology concepts
Testbed
A case study assignment in an introductory knowledge engineering course, attended by 150-200 students each semester
• The goal is to get a picture of the whole industry
• Students work in groups of 5
• Each group covers one company and its environment
Two assignments:
1) Students write CI reports of about 25 pages based on publicly available sources of information.
2) Important pieces of information are expressed in a machine-readable way with topic maps.
Each semester we tested a slightly different setting (S1-S3) of tools and techniques. The assignment is now running for the fourth semester.
S1: Individual ontologies, merge
1. Each team wrote the CI report (in a text editor)
2. Subsequently, they obtained a copy of a startup ontology
3. Students extended the ontology with new topic types using Tovek Topic Mapper (TTM), an ontology editor and annotation tool (desktop application)
4. Students used TTM to annotate bits of text with a topic type.
5. Annotated text became an internal occurrence in the topic map
6. The ontologies enriched with new topic types and annotations were collected from all teams
7. We used OKS to merge the topic maps
[Figure: annotation workflow. Step labels: startup ontology, extend ontology, annotate. Format labels: DOC, HTML, XTM]
The result is a linking file connecting the document with the shared topic map
Topic Maps Merging
• Merging of: business cluster topic map, all unstructured documents, linking files
[Figure labels: shared industry topic map, linking files, CI reports; formats XTM, HTML, DOC]
Issues
• Annotated text is fragmented, since each fragment is stored as an internal occurrence
• Laborious
• Duplicate topic types
• Effective merging requires unique identifiers, which was achieved only for companies (registration numbers used in subject indicators)
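To illustrate why merging depends on unique identifiers, here is a minimal Python sketch of identifier-based merging. It is not the OKS API: the `Topic` class, the `merge_topic_maps` function, and the example.org identifiers are hypothetical names chosen only for this illustration. Topics that share a subject identifier collapse into one; topics without an identifier stay separate, which is how duplicate topic types arise.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical, simplified topic: a name, identifiers, and occurrences."""
    name: str
    subject_identifiers: set = field(default_factory=set)
    occurrences: list = field(default_factory=list)


def merge_topic_maps(*topic_maps):
    """Merge lists of topics; topics sharing a subject identifier become one.

    Simplification: a topic whose identifiers match two different,
    already merged topics is attached to one of them only.
    """
    merged = []          # resulting topics, in order of first appearance
    by_identifier = {}   # subject identifier -> topic in `merged`
    for topic_map in topic_maps:
        for topic in topic_map:
            # Look for an already merged topic with a common identifier.
            target = next(
                (by_identifier[i] for i in topic.subject_identifiers
                 if i in by_identifier),
                None,
            )
            if target is None:
                # No identifier match: the topic is kept as a new one,
                # even if another topic carries the same name.
                target = Topic(topic.name)
                merged.append(target)
            target.subject_identifiers |= topic.subject_identifiers
            target.occurrences += topic.occurrences
            for i in target.subject_identifiers:
                by_identifier[i] = target
    return merged


# Two teams describe the same (made-up) company under different names.
team_a = [
    Topic("Company A", {"http://example.org/company/12345678"}, ["report-a#u3"]),
    Topic("Supplier"),
]
team_b = [
    Topic("Company A, Inc.", {"http://example.org/company/12345678"}, ["report-b#u7"]),
    Topic("Supplier"),
]
merged = merge_topic_maps(team_a, team_b)
```

In this sketch the two company topics merge because they share an identifier, while the two identifier-less "Supplier" topic types remain duplicated.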
S2: Collaborative Ontology Population
[Figure labels: collaborative ontology creation, remote repository, shared topic map, students]
Goal: remove duplicate topic types
1. The startup ontology was placed on a PostgreSQL server
2. Student teams collaboratively enriched the ontology with the topic types, association types and occurrence types they expected to use during annotation in Topic Mapper
3. The ontology was then frozen: each team got its copy.
4. TTM was used only for annotation, and then OKS for merging
[Figure labels: import ontology, annotate only, topic maps for merging]
Issues
• Separation of ontology enrichment and document annotation is not natural and requires an experienced annotator
• Annotations still kept as internal occurrences
• Multiple concurrent instances of OKS servers resulted in corruption in the topic map, probably due to caching in OKS
• Two topic map tools used, original documents not easily accessible
S3: Annotation by linking
Goal: move annotation fully to the web
1. All students used one instance of OKS server
2. CI reports were placed into a CMS (Joomla!)
3. Each structural unit was assigned an id (via HTML’s <a name>)
4. Annotation was done via external occurrences
External occurrences point to a specific bookmark in the document, where the annotated fragment starts. The annotated fragment is assumed to span up to the nearest following bookmark.
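The bookmark convention above can be sketched in a few lines of Python using only the standard library. This is an illustrative assumption of how a fragment could be resolved, not how OKS or Joomla! do it; the `BookmarkSplitter` class, the `resolve_occurrence` function, and the sample HTML and URL are hypothetical.

```python
from html.parser import HTMLParser


class BookmarkSplitter(HTMLParser):
    """Collect the text that follows each <a name="..."> bookmark."""

    def __init__(self):
        super().__init__()
        self.fragments = {}   # bookmark name -> list of text pieces
        self.current = None   # bookmark whose fragment is being filled

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("name")
        if tag == "a" and name:
            # A new bookmark closes the previous fragment and opens a new one.
            self.current = name
            self.fragments[name] = []

    def handle_data(self, data):
        # Text before the first bookmark belongs to no fragment.
        if self.current is not None:
            self.fragments[self.current].append(data)


def resolve_occurrence(html, url):
    """Return the text of the fragment that an external occurrence URL points to."""
    bookmark = url.partition("#")[2]
    splitter = BookmarkSplitter()
    splitter.feed(html)
    splitter.close()
    # Join the pieces and normalize whitespace.
    return " ".join("".join(splitter.fragments[bookmark]).split())


# Made-up report with two structural units, each starting with a bookmark.
report = (
    '<p><a name="u1"></a>Company A opened a plant.</p>'
    '<p><a name="u2"></a>Company B is its <b>supplier</b>.</p>'
)
fragment = resolve_occurrence(report, "http://example.org/report.html#u2")
```

Because the fragment boundary is defined by bookmarks rather than by copied text, the report can be edited within a unit without touching the occurrence URL.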
Issues … and finally advantages
Issues:
• OKS Ontopoly was not stable enough in a concurrent setting
• XPointer technology, which could be used to mark spans in the document, is not supported by current browsers
Advantages:
• The text with full content (including even figures or links) in the CMS is more intelligible than fragments in internal occurrences
• Further editing of an article is possible in the CMS without invalidating the annotation
• Full-text search feature of the CMS can be exploited
• Bringing the best from the CMS world and OKS
Summary & Plans
• On the competitive intelligence use case, we tested several approaches to collaborative ontology design and document annotation with some 500 users altogether.
• OKS is a great tool, which gets an additional edge by being web-based
• We deem the last approach taken (documents stored in a CMS and linked through external occurrences with OKS) usable, contingent on improvements in Ontopoly and Joomla!
Ontopoly wishes
• Greater stability in case of concurrent user access
• We missed user management and versioning in Ontopoly
Joomla! wishes
• Support for "tagging" arbitrary bits of text
• A tool for creating XPointer URLs based on user selection
• A functionality that would highlight part of the document based on a URL containing an XPointer span