Knowledge Engineering for TELDAP
Keh-Jiann Chen
Principal Investigator, Core Platforms for Digital Contents Project, TELDAP
Research Fellow, Research Center for Information Technology Innovation & Institute of Information Science, Academia Sinica
Outline
Introduction
Union catalog
Databases and metadata for digital contents and websites
Knowledge engineering
Future perspective
Introduction
The integration and management of digital contents have become an important issue as the amount of digital content produced by different projects and institutions increases rapidly.
Our project goal is to achieve optimized preservation, retrieval, and presentation of digital collections.
1. Union Catalog
What is the union catalog?
It is a catalog and portal for all digital collections of TELDAP.
It is an integrated platform for browsing and searching the entire digital content of TELDAP.
Metadata provides core descriptions and licensing information for each digital collection.
Browsing by topics
Search by keywords
Home Page of Union Catalogs
2. Databases and metadata for digital contents and websites
Metadata models for different types of objects
Archived digital items
  Union catalog metadata model: Dublin Core+
Web sites
  DCCAP (Dublin Core Collections Application Profile)
  Fields for internal use only: Unique Identifier, Format, Evaluation, Cataloging History
Documents
  Document metadata: Dublin Core
Metadata for digital items: over 2 million digital items and still increasing
Element: Definition
  Title: A name given to the resource
  Creator: An entity primarily responsible for making the content of the resource
  Subject and Keywords: The topic of the content of the resource
  Description: An account of the content of the resource
  Publisher: An entity responsible for making the resource available
  Contributor: An entity responsible for making contributions to the content of the resource
  Date: A date associated with an event in the life cycle of the resource
  Resource Type: The nature or genre of the content of the resource
  Format: The physical or digital manifestation of the resource
  Resource Identifier: An unambiguous reference to the resource within a given context
  Source: A reference to a resource from which the present resource is derived
  Language: A language of the intellectual content of the resource
  Relation: A reference to a related resource
  Coverage: The extent or scope of the content of the resource
  Rights Management: Information about rights held in and over the resource
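As an illustration of how a record built on these 15 elements might be held in code, here is a minimal Python sketch. The snake_case key names, the `make_record` helper, and the sample values are assumptions made for this example; they are not the project's actual schema.

```python
# The 15 elements listed above, as short keys (our own naming choice).
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def make_record(**values):
    """Build a record dict, rejecting keys that are not in DC_ELEMENTS."""
    unknown = set(values) - set(DC_ELEMENTS)
    if unknown:
        raise KeyError(f"not a Dublin Core element: {sorted(unknown)}")
    # Dublin Core elements are optional and repeatable, so each holds a list.
    return {name: list(values.get(name, [])) for name in DC_ELEMENTS}


# Hypothetical item, for illustration only.
record = make_record(
    title=["Example item"],
    creator=["Unknown"],
    identifier=["example:0001"],
)

# Elements that still need cataloging for this item.
missing = [name for name, vals in record.items() if not vals]
```

Storing every element as a list keeps single-valued and multi-valued fields uniform, which simplifies later keyword extraction over the same records.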
Metadata for websites
Over 200 websites and still increasing
Metadata: DCCAP (Dublin Core Collections Application Profile)
Combines the standard with our requirements: 19 data fields
The Website Homepage Picture
URL, Project Information
Type, Name, Author, Subject, Description, Language, Item Type, Target
Archived Information: URL, time, authorization
Copyright, Purpose, Other Information
Figure: http://digitalarchives.tw
Metadata for websites
Dynamic categorization
  User-oriented categorization: general, elementary school students, high school students, researchers, etc.
  Topic-based categorization: archaeology, painting, animal, plant, document, etc.
  Function-based categorization: research, education, business, technology, etc.
  Institution-based categorization: Academia Sinica, Taiwan U., Palace Museum, etc.
Purpose: Education
  Target: Elementary school student, junior high school student, teacher…
  Select items: top 5 websites according to 40 evaluation indicators
Purpose: Creative applications
  Select items: top 5 websites according to 40 evaluation indicators
Purpose: Academic research
  Subject: Animal, Archaeology, Anthropology…
  Select items: top 3 websites according to 40 evaluation indicators
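The selection step above (filter by purpose and target, then keep the top-ranked sites) can be sketched in Python as follows. The slides do not say how the 40 indicators are combined, so this sketch assumes the overall score is their plain mean. The site names, field names, and scores are hypothetical, and only three indicators per site are shown for brevity.

```python
from statistics import mean


def select_top(websites, purpose, k, target=None):
    """Return the names of the k best-scoring websites for a purpose."""
    candidates = [
        w for w in websites
        if purpose in w["purposes"]
        and (target is None or target in w["targets"])
    ]
    # Assumption: overall score = mean of the evaluation indicator scores.
    ranked = sorted(candidates, key=lambda w: mean(w["indicators"]), reverse=True)
    return [w["name"] for w in ranked[:k]]


# Hypothetical catalog entries, for illustration only.
sites = [
    {"name": "site-a", "purposes": {"education"},
     "targets": {"teacher"}, "indicators": [4, 5, 3]},
    {"name": "site-b", "purposes": {"education"},
     "targets": {"teacher", "elementary school student"}, "indicators": [5, 5, 5]},
    {"name": "site-c", "purposes": {"academic research"},
     "targets": {"researcher"}, "indicators": [3, 3, 3]},
    {"name": "site-d", "purposes": {"education"},
     "targets": {"teacher"}, "indicators": [2, 3, 4]},
]

top_education = select_top(sites, "education", k=2)
top_for_pupils = select_top(sites, "education", k=5,
                            target="elementary school student")
```

Because the categories are computed from metadata at query time rather than stored as fixed folders, the same site can appear under several purposes or targets.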
Figure: http://digitalarchives.tw
Metadata for project documents
Over 5000 documents and still increasing
Metadata: Dublin Core
Construct Teldapwiki, a Wikipedia for TELDAP: http://wiki.teldap.tw/
3. Knowledge Engineering
Plans of making knowledge structures for TELDAP
Construct metadata models for different objects.
Establish hyperlinks between contents and objects.
Develop keyword extraction tools.
Design automatic tagging tools.
Construct the TELDAP ontology and thesaurus
  Art & Architecture Thesaurus by Getty
  Chinese WordNet
(1) Metadata models for different objects
Digital collections
  Union catalog metadata model: Dublin Core+
Web sites
  DCCAP (Dublin Core Collections Application Profile)
  Public fields
  Private fields: Unique Identifier, Format, Evaluation, Cataloging History
Documents
  Document metadata: Dublin Core
(2) Establish hyperlinks between contents and objects
Identify keywords in contents
Tag keywords with related object hyperlinks
Develop hyperlink tagging tools: word segmentation tools
  Resolve word segmentation ambiguities and identify keywords.
  CKIP word segmentation system: http://ckipsvr.iis.sinica.edu.tw/
Develop hyperlink tagging tools: TELDAP keyword dictionary
  Extract keywords from metadata and establish object-keyword relations.
  Extract text from the XML data for each object.
  The text is classified by topic, title, description, author, location, era, etc.
  From each class of text file, extract keywords by automatic word segmentation and keyword extraction techniques.
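The extraction pipeline above can be sketched in Python. This sketch does not call the CKIP system; it substitutes a simple forward maximum-matching segmenter over a small lexicon, which does not resolve ambiguities the way a full segmenter does. The XML layout (`<object id=...>` with child tags such as `<title>` and `<era>`), the lexicon, and the sample objects are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed tag names for the text classes; the real XML schema may differ.
FIELDS = ("title", "description", "author", "location", "era")


def max_match(text, lexicon):
    """Forward maximum matching: at each position take the longest lexicon word."""
    longest = max(map(len, lexicon), default=1)
    found, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            if text[i:i + size] in lexicon:
                found.append(text[i:i + size])
                i += size
                break
        else:
            i += 1  # no lexicon word starts here; skip one character
    return found


def index_objects(xml_source, lexicon):
    """Map (field, keyword) to the set of object ids that contain it."""
    relations = defaultdict(set)
    for obj in ET.fromstring(xml_source).iter("object"):
        for name in FIELDS:
            text = obj.findtext(name) or ""
            for keyword in max_match(text, lexicon):
                relations[(name, keyword)].add(obj.get("id"))
    return relations


# Hypothetical metadata and lexicon, for illustration only.
SAMPLE = """<objects>
  <object id="obj-1"><title>青花瓷碗</title><era>明代</era></object>
  <object id="obj-2"><title>白瓷杯</title><era>明代</era></object>
</objects>"""
LEXICON = {"青花瓷", "瓷", "碗", "杯", "明代"}

relations = index_objects(SAMPLE, LEXICON)
```

Keeping the field name in the key preserves the classification by title, era, and so on, so a keyword found in an author field is not confused with the same string in a title.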
Prototype system for hyperlink tagger
  Identify and select keywords from the input text
  Produce text with hyperlinks
  Hyperlinks point to the related digital collections
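A minimal sketch of what such a tagger does, assuming the keyword dictionary is a plain mapping from keyword to collection URL. This is not the prototype's code: the `tag_hyperlinks` function and the example.org URLs are hypothetical. Longer keywords are tried first so that a short keyword does not split a longer one.

```python
import html
import re


def tag_hyperlinks(text, keyword_urls):
    """Wrap each known keyword in an HTML link; longer keywords win."""
    if not keyword_urls:
        return html.escape(text)
    ordered = sorted(keyword_urls, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in ordered))
    parts, last = [], 0
    for match in pattern.finditer(text):
        # Escape the untouched text between keywords.
        parts.append(html.escape(text[last:match.start()]))
        url = html.escape(keyword_urls[match.group()], quote=True)
        parts.append(f'<a href="{url}">{html.escape(match.group())}</a>')
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


# Hypothetical keyword dictionary, for illustration only.
KEYWORD_URLS = {
    "瓷": "https://example.org/collection/porcelain",
    "青花瓷": "https://example.org/collection/blue-and-white",
}

tagged = tag_hyperlinks("青花瓷 & 碗", KEYWORD_URLS)
```

In a fuller system the keyword matches would come from the word segmenter rather than a regular expression, since segmentation decides where keyword boundaries actually fall.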
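(3) Construct TELDAP ontology and thesaurus
Topical relation
Synonym relation
  蘇軾 = 蘇東坡 (Su Dongpo) = Su Shi
  鄭成功 (Zheng Chenggong) = 延平郡王 (Prince of Yanping)
Hypernym/hyponym
  器物 (artifacts) → [陶器 (pottery), 瓷器 (porcelain)] / [杯 (cup), 盤 (plate), 碗 (bowl), 甕 (urn)]
Establish implicit links between objects by author, material, object type, etc.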
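The synonym and hypernym/hyponym relations above can be sketched as two lookup tables in Python, using only the example terms from the slide. The table layout and the `canonical` and `expand` helpers are assumptions for illustration, and the two hyponym groupings on the slide are flattened into one list here.

```python
# Variant name -> preferred term, from the synonym examples above.
SYNONYMS = {
    "蘇東坡": "蘇軾",
    "延平郡王": "鄭成功",
}

# Broader term -> narrower terms, from the hypernym/hyponym example above.
HYPONYMS = {
    "器物": ["陶器", "瓷器", "杯", "盤", "碗", "甕"],
}


def canonical(term):
    """Map a variant to its preferred term; unknown terms map to themselves."""
    return SYNONYMS.get(term, term)


def expand(term):
    """Return the preferred term followed by all narrower terms, depth-first."""
    seen, stack = [], [canonical(term)]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            # Reverse so narrower terms come out in their listed order.
            stack.extend(reversed(HYPONYMS.get(current, [])))
    return seen


artifact_terms = expand("器物")
poet_terms = expand("蘇東坡")
```

A search for 器物 expanded this way also retrieves objects cataloged only under a narrower term such as 碗, which is one way implicit links between objects can be established.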
(3) Construct TELDAP ontology and thesaurus
Establish association links between Chinese keywords and Getty AAT.
Merge Chinese WordNet with English WordNet.
Technology development
Construct multilingual thesauri (Getty AAT)
Maintain the TELDAP keyword and object relation database
Construct name authority files, gazetteers, and universal calendars
Design hyperlink taggers and keyword extension tools
Design an authoring tool that automatically provides hyperlinks to keyword-related digital contents
Design a knowledge-based content retrieval system
Future Perspectives
Content enrichment
Within TELDAP:
  Standardize the object metadata model and data format
  All TELDAP objects should have their metadata
  Write scripts and stories for different topics with a Wiki-like knowledge structure
  Enrich the digital collections
  Establish hyperlinks between textbooks and TELDAP collections
Extend the knowledge sources, e.g. Wikipedia
Thank you for your attention! Your comments and advice are welcome.