Knowledge Organization in Digital Libraries (II)

Digital Libraries, INFO 653, Week 6

Xia Lin, College of Information Science and Technology, Drexel University

Post on 26-Dec-2015

Approaches

Keyword Indexing
  Making search engines functional
Metadata (bottom-up)
  Extending traditional subject indexing
Classification (top-down)
  Using a structured classification frame to provide hierarchical browsing and access
Ontology Approach

Keyword Indexing

Highly automated process.
Use every meaningful word to index documents.
Make search engines functional.
Make large amounts of information accessible.
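
To make "use every meaningful word to index documents" concrete, here is a minimal sketch of an inverted index. The stop-word list, the sample documents, and the function names are assumptions made for illustration; they are not taken from the lecture.

```python
import re
from collections import defaultdict

# Hypothetical stop list; a real system would use a much longer one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def tokenize(text):
    """Lower-case the text and keep only the meaningful (non-stop) words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def build_index(documents):
    """Map every meaningful word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every meaningful query word."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Illustrative documents only.
docs = {
    1: "Knowledge organization in digital libraries",
    2: "Metadata for digital objects",
    3: "The organization of libraries",
}
index = build_index(docs)
print(search(index, "digital libraries"))
```

No human assigns terms here, which is why the approach scales; the price is that the index knows nothing about synonyms or relationships between words.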

Metadata Approach

Digital Object Identifiers
Dublin Core
  Subject tag
  Description tag
RDF
  Data model
  Resource
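
As an illustration of the Dublin Core subject and description tags, the sketch below builds a small record with Python's standard library. The namespace URI is the standard one for the Dublin Core element set; the `record` wrapper element, the helper name, and the field values are assumptions chosen for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a flat record with one dc: element per (name, value) pair."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return record


# Illustrative values; elements such as subject may repeat.
record = dublin_core_record([
    ("title", "Knowledge Organization in Digital Libraries (II)"),
    ("creator", "Xia Lin"),
    ("subject", "Digital libraries"),
    ("subject", "Knowledge organization"),
    ("description", "Lecture slides for INFO 653, week 6."),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```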

Classification Approach

Use a current classification scheme
  LC Classification
  Dewey Classification
  Most projects are not completed
    A mile wide, an inch deep
Use ad hoc classification schemes
  Yahoo-style hierarchical list
Use automatic classification

Ontology Approach

Ontologies

Define not only concepts but also relationships of concepts.

Define both links and types of links.

Ontology

An ontology is a specification of a conceptualization.

An ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents.

An ontology is a commitment to use the shared vocabulary in a coherent and consistent manner.
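
One simple way to picture "both links and types of links" is a list of typed statements. The sketch below is only an illustration: the concept names and the particular statements are hypothetical, and real ontology languages offer far more than this.

```python
from collections import namedtuple

# One typed link: subject --relation--> object.
Link = namedtuple("Link", ["subject", "relation", "object"])

# Hypothetical statements, for illustration only.
links = [
    Link("Person", "writes", "Document"),
    Link("Document", "describes", "Project"),
    Link("Organization", "sponsors", "Project"),
    Link("Handbook", "example-of", "Document"),
]


def related(links, subject, relation=None):
    """Objects linked from `subject`, optionally restricted to one link type."""
    return [
        link.object
        for link in links
        if link.subject == subject and (relation is None or link.relation == relation)
    ]


def link_types(links):
    """The vocabulary of link types that this small ontology commits to."""
    return sorted({link.relation for link in links})


print(related(links, "Organization", "sponsors"))
print(link_types(links))
```

Because every link carries its type, a program can ask not only "what is connected to Project?" but also "who sponsors it?", which an untyped cross-reference cannot answer.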

[Figure: Work Force Digital Library Ontology. Nodes: Projects; People; Government; Organizations; Concepts (taxonomy and ontology); Document; Workforce Programs; Info Resources; Events (conferences, workshops, ...); Policy and regulation documents; Guides, Handbooks; Presentations; Cases that worked; Lessons learned; Peter Creticos. Link types: write, describes, initiates, refers-to, includes, is-related-to, example-of, sponsors, represents, uses, is-part-of.]

Why Develop an Ontology?

To enable a machine to use the knowledge in some application.

To enable multiple machines to share their knowledge.

To help yourself understand some area of knowledge better.

To help other people understand some area of knowledge.

To help people reach a consensus in their understanding of some area of knowledge.

Ontology and Thesaurus

An ontology inherits the ideas, purposes, and functions of the thesaurus.

An ontology extends relationships among concepts beyond those in a thesaurus (NT, BT, RT, synonyms).

An ontology is intended to be consumed by both humans and machines.

Topic Maps

A key component of the Semantic Web
A new ISO standard
  ISO 13250 Topic Maps
XML-like syntax
  XML Schema
  XTM: XML Topic Maps

XTM Topic Maps

XML Topic Maps (XTM) defines an abstract model and XML grammar for topic maps.

XTM does not define topic maps at the implementation level.

Each implementation may interpret XTM differently or define its own “metadata” within the framework of XTM.

TAO of Topic Maps

<topicmap>
  TOPIC
    topname
      basename
      dispname
      sortname
    OCCURS
  ASSOC
    assocrl
  facet
    fvalue
  addthms
</topicmap>
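
The three core constructs (topics, associations, occurrences) can be sketched as plain data classes. This is an assumed in-memory model for illustration, not the ISO 13250 or XTM data model; the class, field, and method names are hypothetical, and facets and added themes are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Occurrence:
    type: str   # kind of resource, e.g. "website"
    href: str   # address of the resource


@dataclass
class Topic:
    id: str
    basename: str
    occurrences: list = field(default_factory=list)


@dataclass
class Association:
    type: str      # the kind of link
    members: list  # ids of the topics that take part in it


@dataclass
class TopicMap:
    topics: dict = field(default_factory=dict)
    associations: list = field(default_factory=list)

    def add_topic(self, topic):
        self.topics[topic.id] = topic

    def associate(self, assoc_type, *topic_ids):
        """Link existing topics; refuse ids that are not in the map."""
        missing = [t for t in topic_ids if t not in self.topics]
        if missing:
            raise KeyError(f"unknown topics: {missing}")
        self.associations.append(Association(assoc_type, list(topic_ids)))

    def neighbors(self, topic_id):
        """Ids of topics that share at least one association with topic_id."""
        found = set()
        for assoc in self.associations:
            if topic_id in assoc.members:
                found.update(m for m in assoc.members if m != topic_id)
        return found


tm = TopicMap()
tm.add_topic(Topic("bibliography", "Bibliography"))
tm.add_topic(Topic(
    "ifla-bibliography",
    "IFLA bibliography",
    [Occurrence("website", "http://www.ifla.org/II/diglib.htm")],
))
tm.associate("includes", "bibliography", "ifla-bibliography")
print(tm.neighbors("bibliography"))
```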

Topic Maps for Knowledge Representation

Establishing an associative network between resources which represent concepts

Organizing legacy resources into a new information/knowledge space, by relating them to topics, and associating those topics, in a structured way

Enabling disparate sets of information resources to be used together, by interrelating them using a unifying conceptual framework

Topic Map Implementation

Why is topic map implementation hard?
  There are no “magic” solutions for content representation.
  It is labor-intensive and involves many manual activities to create a complete TAO.
  There are no good tools for topic map creation.
  XML is not designed to let end users work directly on objects contained in an XML file.

Topic Maps and Thesaurus

Different directions of indexing
  Thesaurus: assign descriptors to documents
  Topic maps: associate occurrences to terms
Different structures
  Thesaurus: mainly a hierarchy plus some cross-references
  Topic maps: more link types

All Together

[Diagram labels: XML; RDF; Ontology; Topic Maps; Thesaurus; Classification; Keyword indexing; Metadata; Semantic Web; Libraries]

Personal Research Projects

Explore solutions to make knowledge organizing practical
  Knowledge Class
  KEPT
  Knowledge Middleware

Knowledge Class Purposes

to customize knowledge organization and access,

to supplement and complement existing devices for Web users, and

to explore the possibility of combining existing methods of knowledge organization with advanced Web technology.

Knowledge Class Design Principles

balance of browsing and searching

balance of manual indexing and automatic indexing

balance of personal (topical) information space and the whole web space

Knowledge Class: Three Components

an organizing framework
a dynamic web interface
search strategies for each term

Knowledge Class Features

A hierarchical structure of subject terms constructed on classification principles

Multiple levels of knowledge organization: expandable and contractible branches of the hierarchy allow varying levels of depth

Static links to remote resources and related sites or pages

Dynamic links to target information through search engines such as Google, AltaVista, InfoSeek, Yahoo!, and Lycos

Coded search strategies for terms

Use of scope terms for classes and for branches

Knowledge Class Features

Referral links among terms within a knowledge class, and potentially among knowledge classes, to assist cross-referencing.

Instant switching among the search engines available on the Web, giving access to the variety of resources covered by different search engines.

A Knowledge Class for Digital Libraries

Developed by students two years ago

Yahoo Categories: References – Libraries – Digital Libraries

Cataloging Electronic Resources@
Conferences (5)
Electronic Literature@
Electronic Theses and Dissertations (ETDs) (14)
Metadata@
Organizations (2)
Projects and Collections (33)

IFLA page: Resources and Projects

Cataloguing & Indexing of Electronic Resources
Electronic Text & Journal Archives
Metadata Resources

Digital Libraries: a Selected Resource Guide

Overview and general resources
Project planning & management
Architecture
Technology
Standards and guidelines
Archiving & Preservation
Metadata
Intellectual property rights

Northern Light folders

Digital Libraries
Special collections
Conferences
dlib.org
dlib.org.ar
uh.edu
rutgers.edu
stanford.edu
stfx.ca
vt.edu
uni-trier.de
ucla.edu
Class notes & Assignments
all others...

Digital Libraries by William Y. Arms: Table of Contents

1. Libraries, Technology, and People
2. The Internet and the World Wide Web
3. Libraries and Publishers
4. Innovation and Research
5. People, Organizations, and Change
6. Economic and Legal Issues
7. Access Management and Security
8. User Interfaces and Usability
9. Text
10. Information Retrieval and Descriptive Metadata
11. Distributed Information Discovery
12. Object Models, Identifiers, and Structural Metadata
13. Repositories and Archives
14. Digital Libraries and Electronic Publishing Today

Practical Digital Libraries: Books, Bytes, and Bucks by Michael Lesk

1. Evolution of Libraries
2. Text Access Methods
3. Images of Pages
4. Multimedia Storage and Access
5. Knowledge Representation Methods
6. Distribution
7. Usability and Retrieval Evaluation
8. Collections and Preservation
9. Economics
10. Intellectual Property Rights
11. International Activities
12. Future: Ubiquity, Diversity, Creativity, and Public Policy

How do I build a Thesaurus?

Use existing dictionaries and thesauri to decide on the terms and their relationships.

Collect a set of representative documents and try to index them; take the set of indexing terms as your preliminary list.

Review and organize the preliminary term set:
  decide on preferred terms and make USE references from the variants and synonyms;
  build hierarchical and associative relationships among the preferred terms.

Produce a draft list, test, and revise.
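
The result of these steps can be held in a very small data structure: USE references from variants to preferred terms, plus broader-term (BT) and related-term (RT) links among preferred terms. The sketch below is one assumed representation; the class and method names are hypothetical, and the "garbage" USE reference is invented for illustration. The other terms are ERIC descriptors that appear elsewhere in these slides.

```python
class Thesaurus:
    """A minimal thesaurus: USE references plus BT/NT and RT relationships."""

    def __init__(self):
        self.use = {}      # variant or synonym (lower-cased) -> preferred term
        self.broader = {}  # preferred term -> set of broader terms (BT)
        self.related = {}  # preferred term -> set of related terms (RT)

    def add_use(self, variant, preferred):
        self.use[variant.lower()] = preferred

    def add_broader(self, term, broader_term):
        self.broader.setdefault(term, set()).add(broader_term)

    def add_related(self, a, b):
        # RT is symmetric, so record it in both directions.
        self.related.setdefault(a, set()).add(b)
        self.related.setdefault(b, set()).add(a)

    def preferred(self, term):
        """Follow a USE reference if there is one; otherwise return the term itself."""
        return self.use.get(term.lower(), term)

    def narrower(self, term):
        """NT is derived by inverting the BT links."""
        return {t for t, bts in self.broader.items() if term in bts}


t = Thesaurus()
t.add_use("garbage", "Wastes")                 # hypothetical USE reference
t.add_broader("Recycling", "Waste Disposal")
t.add_broader("Waste Disposal", "Sanitation")
t.add_related("Recycling", "Pollution")
print(t.preferred("Garbage"), t.narrower("Waste Disposal"))
```

Storing only BT and deriving NT keeps the two directions from drifting apart during the "test and revise" step.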

Scope terms

Each knowledge class can have one scope term to limit the search scope:
  Technology -- will be searched as technologies AND “digital libraries” in the knowledge class of Digital Libraries.

Each branch of a knowledge class can have one scope term:
  Issues -- in the Technology branch will be searched as “Issues AND Technology AND digital libraries”.

Data Format: first year

--, mutual funds, mutual-funds Investment-trusts Unit-trusts, http://www.brill.com, 1

1. Hierarchical level
2. Display term
3. Search term (synonyms)
4. URL
5. Search strategy code
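
A record in this five-field format could be read as in the sketch below. Several points are assumptions rather than facts from the slides: that the number of dashes gives the hierarchical level, that hyphens join the words of a multi-word search term while spaces separate synonyms, and that no field contains a comma. The `Entry` class and `parse_entry` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    level: int
    display_term: str
    search_terms: list
    url: str
    strategy_code: int


def parse_entry(line):
    """Split one comma-separated record into its five fields."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    dashes, display, search, url, code = parts
    return Entry(
        level=len(dashes),  # assumption: one dash per level
        display_term=display,
        # assumption: spaces separate synonyms, hyphens join words
        search_terms=[term.replace("-", " ") for term in search.split()],
        url=url,
        strategy_code=int(code),
    )


entry = parse_entry(
    "--, mutual funds, mutual-funds Investment-trusts Unit-trusts, http://www.brill.com, 1"
)
print(entry)
```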

Second year: last year's student project

<topicmap title="Digital Libraries">
  <topic id="General Resources" type="Main category">
    <topic id="Bibliography">
      <topname>
        <basename>Bibliography</basename>
        <dispname>Bibliography</dispname>
        <sortname></sortname>
      </topname>
      <occurs> </occurs>
      <topic id="IFLA bibliography" type="reference">
        <topname>
          <basename>IFLA bibliography</basename>
          <dispname>IFLA bibliography</dispname>
          <sortname></sortname>
        </topname>
        <occurs> type="website" href="http://www.ifla.org/II/diglib.htm" </occurs>
      </topic>

Third year: Visual Editing

Search Strategy

Keyword search:
  0  search term + branch scope term + class scope term
  1  search term + class scope term
  2  search term only

Phrase search:
  3  search term (as a phrase) + branch scope term + class scope term
  4  search term (as a phrase) + class scope term
  5  search term (as a phrase)

Hierarchical search:
  6  search term + all its children + branch scope term + class scope term
  7  search term + all its children + class scope term
  8  search term + all its children

No search:
  9  no search; no link for this display term (label only)

Search terms + display term:
  10  same as 0 except that the display term is also added to the query
  11  same as 1 except that the display term is also added to the query
  12  … …
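
The strategy codes 0 to 9 can be read as a small decision table: the code selects the kind of search (keyword, phrase, hierarchical) and how many scope terms to append. The sketch below shows one possible reading. The `AND`/`OR` syntax, the quoting of the class scope term, and the function name are assumptions; codes 10 and above are not handled because the slides do not spell them out.

```python
def build_query(code, search_term, branch_scope="", class_scope="", children=()):
    """Return a query string for codes 0-8, or None for code 9 (label only)."""
    if code == 9:
        return None
    if not 0 <= code <= 8:
        raise ValueError(f"unsupported strategy code: {code}")

    family, variant = divmod(code, 3)
    if family == 0:    # keyword search
        terms = [search_term]
    elif family == 1:  # phrase search
        terms = [f'"{search_term}"']
    else:              # hierarchical search: the term plus all its children
        terms = ["(" + " OR ".join([search_term, *children]) + ")"]

    # variant 0: branch + class scope; variant 1: class scope only; variant 2: none
    if variant == 0 and branch_scope:
        terms.append(branch_scope)
    if variant <= 1 and class_scope:
        terms.append(f'"{class_scope}"')
    return " AND ".join(terms)


# The "Issues" example from the scope-terms slide, under strategy 0.
print(build_query(0, "Issues", "Technology", "digital libraries"))
```

Encoding the strategy as a number keeps each term's record compact, and the same display term can be retargeted at a different search engine by changing only how the query string is rendered.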

Digital Libraries

General Resources
Technology
Projects
Indexing & Cataloging
  Knowledge representation
  Metadata Resources
Collections and Repositories
Digital Preservation
Economic and legal issues
  Intellectual Property Rights
People and organizations

Next Version

Convert to XML
Use topic map standards
Improve the editing tool

Next Integration: KEPT

[Architecture diagram: Knowledge-Enabled Personalization Tool (KEPT). Labels: Information Resources; Web Browser; HTTP Server; XML Application Server; RDF-ISO Standards; Search engines; OAI protocol; Knowledge Repository; Drag and drop; Hierarchical Generator; Co-occurrence Mapping; Topic Map Editor; Searching/Browsing Interface; XML Schema; XML XSLT; Relational Database; Thesauri; Ontologies; Topic maps; ...]

New Interface

Search: Recycling

Topic Map

Related Terms (ERIC Thesaurus): Conservation (Environment), Depleted Resources, Ecology, Natural Resources, Pollution, Recycling, Solid Wastes, Waste Disposal, Waste Water, Wastes, Water Treatment

Broader Terms (ERIC Thesaurus): Sanitation, Waste Disposal, Recycling

Co-occurrence Terms (ERIC Database): Environmental Education, Waste Disposal, Conservation (Environment), Science Education, Natural Resources, Solid Wastes, Ecology, Pollution, Learning Activities, Higher Education, Wastes, Instructional Materials, Conservation Education, Energy, Environment

MeSH Terms matched “Pollution”:
  Air Pollution
  Air Pollution, Indoor (Indoor Air Pollution)
  Air Pollution, Radioactive
  Environmental Pollution (Pollution, Environmental)
  Tobacco Smoke Pollution (Air Pollution, Tobacco Smoke; Environmental Pollution, Tobacco Smoke; Environmental Smoke Pollution, Tobacco; Environmental Tobacco Smoke Pollution)
  Water Pollution (Thermal Water Pollution; Water Pollution, Thermal)
  Water Pollution, Chemical (Chemical Water Pollution)
  Water Pollution, Radioactive

Primary Source: ERIC Thesaurus

Secondary Source: MeSH

Recycling, Ecology, Wastes, Waste Water, Waste disposal

Pollution, Air pollution, Water pollution, Indoor pollution, Energy, Natural Resources, Water, Power, Conservation, Education, Attitudes, Motivations, ……

Next Level: Building a Knowledge Middleware

[Diagram: Knowledge Repository Middleware. Labels: CORE Collections (three instances); Visual Interface; Thesaurus A; Thesaurus B; Thesaurus C; Unification; PFNET mapping; Switching; Latent Semantic; Crosswalks; Repository; Knowledge Base; Authoring tool; Personalized topic maps; Kohonen Mapping; Semantic Neighborhoods; Search Engine; Personalization; Conceptual structures]

The Knowledge Middleware

A centralized repository that integrates diverse knowledge structures

A set of mapping tools and protocols for crosswalks among various thesauri

A dynamic knowledge base for semantic neighborhoods that uses term occurrences and co-occurrences

A web-based authoring and editing tool for building personalized topic maps from existing knowledge structures in the repository

A visual search interface for content-based searching with the help of knowledge structures in the repository
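
The idea of deriving semantic neighborhoods from term co-occurrences can be sketched with simple pair counting. This is a deliberately naive illustration, not the middleware's actual method: the function names are hypothetical and the sample records are invented, although the terms themselves are ERIC descriptors that appear in these slides.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(records):
    """Count how often each unordered pair of terms is assigned to the same record."""
    pairs = Counter()
    for terms in records:
        for a, b in combinations(sorted(set(terms)), 2):
            pairs[(a, b)] += 1
    return pairs


def neighborhood(pairs, term, top=3):
    """Terms that co-occur with `term`, most frequent first, ties alphabetical."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if a == term:
            scores[b] += n
        elif b == term:
            scores[a] += n
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [t for t, _ in ranked[:top]]


# Invented records: each list holds the index terms assigned to one document.
records = [
    ["Recycling", "Waste Disposal", "Environmental Education"],
    ["Recycling", "Waste Disposal", "Pollution"],
    ["Recycling", "Environmental Education"],
    ["Pollution", "Ecology"],
]
pairs = cooccurrence_counts(records)
print(neighborhood(pairs, "Recycling"))
```

Unlike thesaurus relationships, these neighborhoods come from how terms are actually used in a collection, so they change as the collection changes.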

A semantic map for “Digital Libraries” in the INSPEC database

Conclusions

Knowledge organization is one of the major challenges of digital libraries.

There is increasing demand for formalized (marked-up) knowledge.

There is a growing number of tools and specifications for subject access (or knowledge access) to the Web and to digital libraries.
