ETANA-ADD: An Interactive Tool for Integrating Archaeological DL Collections

JCDL 2006, Chapel Hill, NC

June 13, 2006

Naga Srinivas Vemuri, Rao Shen, Sameer Tupe, Weiguo Fan, Edward A. Fox

http://fox.cs.vt.edu

Acknowledgements (Selected)

• Sponsors: NSF grant ITR-0325579, ASOR, CWRU, ETANA, Vanderbilt U., Virginia Tech

• VT Students: Vidhya Vijayaraghavan, other DLRL members

• Others: Umm el-Jimal Dig Team

Acknowledgements (Selected) (Cont.)

• Karen Borstad, MPP

• Giorgio Buccellati, UCLA

• Douglas Clark, Walla Walla College

• Joanne Eustis, CWRU

• Nick Fischio, CWRU

• Israel Finkelstein, Tel-Aviv University

• Paul Gherman, Vanderbilt U.

• Andrew Graham, U. Toronto

• Tim Harrison, U. Toronto

• Larry Herr, Canadian University College

• Christopher Holland, LRP

• Paul Jacobs, Mississippi State U.

• Douglas Knight, Vanderbilt U.

• Stan LaBianca, Andrews U.

• David McCreery, Willamette U.

• Eric Meyers, Duke U.

• Adam Porter, Illinois College

• Jack Sasson, Vanderbilt U.

• Tom Schaub, Indiana U. of Penn.

• Randall Younker, Andrews U.

Outline

• Introduction

• Related Work

• ETANA-ADD Tool

• Conclusions

Introduction

ETANA, ETANA-DL, 5S

What issues arise when integrating new collections into a DL whose metadata schema is evolving (i.e., bottom-up schema evolution)?

Can we partially automate the process of integrating new collections in such situations?

ETANA-DL

Heterogeneity: 8 archaeological sites, 13 different artifact types

Example artifact types: Bone, Burial, Figurine, Locus, Pottery, Seed, etc.

Union services: Multidimensional Browsing, Searching, Recommendation, Annotation, etc.

ETANA-DL (Cont.)

Individual (archaeological) site approach

• Local conventions for metadata

• Custom-built services

ETANA-DL

• Provides union services across sites

• A global schema based on an incremental approach

The Mapping Process in ETANA

Mapping Process: Global schema defines collections (metadata) in the system using an incremental approach.

Adding a new artifact collection:

• If the artifact type is already defined, perform mapping

• If the artifact type is not defined, extend the global schema and then perform mapping
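
The incremental rule above can be illustrated with a minimal Python sketch. It is not the ETANA implementation: the dictionary-based global schema, the function name `add_collection`, and the element names in the usage lines are all hypothetical, chosen only to show the two branches (type already defined versus type not yet defined).

```python
from typing import Dict, Set

# Hypothetical stand-in for the global schema:
# artifact type -> set of global metadata element names.
GlobalSchema = Dict[str, Set[str]]


def add_collection(schema: GlobalSchema,
                   artifact_type: str,
                   mapping: Dict[str, str]) -> bool:
    """Integrate a collection whose `mapping` is local element -> global element.

    Returns True if the global schema had to be extended, False otherwise.
    """
    if artifact_type not in schema:
        # Artifact type is not defined: extend the global schema with the
        # elements the mapping targets, then the mapping applies as-is.
        schema[artifact_type] = set(mapping.values())
        return True

    # Artifact type is already defined: only mapping is performed, so every
    # target element must already exist in the global schema.
    unknown = set(mapping.values()) - schema[artifact_type]
    if unknown:
        raise ValueError(f"unmapped global elements: {sorted(unknown)}")
    return False


# Usage with made-up artifact types and element names.
schema: GlobalSchema = {"Pottery": {"site", "ware"}}
extended_burial = add_collection(schema, "Burial", {"bur_site": "site", "bur_age": "age"})
extended_pottery = add_collection(schema, "Pottery", {"p_site": "site"})
```

In this sketch a defined artifact type is never silently widened; whether the real global schema allows adding elements to an existing type is not stated in the slides.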

The Whole Integration Process

Conversion process: custom DB to XML format

• Needs to identify metadata elements in DB

• Results in local XML data, local XML schema

Mapping process

• Needs to perform schema mapping

• Results in global schema extension, and the evolution of a new global collection

Integration process

• New site to be “published” as OAI Provider

• OAI harvesting results in integration of the new collection
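
As an illustration of the conversion step (custom DB to local XML), here is a minimal Python sketch. It assumes a SQLite database purely for convenience; the slides do not say which database engine the sites use, and the table names, column names, and the helper `table_to_xml` are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn: sqlite3.Connection, query: str, record_tag: str) -> ET.Element:
    """Run `query` and wrap each result row in a <record_tag> element."""
    cursor = conn.execute(query)
    # Column names of the result set become the local XML element names.
    columns = [description[0] for description in cursor.description]
    root = ET.Element("collection")
    for row in cursor:
        record = ET.SubElement(root, record_tag)
        for name, value in zip(columns, row):
            ET.SubElement(record, name).text = "" if value is None else str(value)
    return root


# Made-up sample data: a burial table joined with a locus table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE locus (locus_id INTEGER PRIMARY KEY, area TEXT);
    CREATE TABLE burial (burial_id INTEGER PRIMARY KEY, locus_id INTEGER, age TEXT);
    INSERT INTO locus VALUES (1, 'A');
    INSERT INTO burial VALUES (10, 1, 'adult');
""")
query = """
    SELECT b.burial_id AS burial_id, b.age AS age, l.area AS area
    FROM burial AS b JOIN locus AS l ON b.locus_id = l.locus_id
"""
local_xml = table_to_xml(conn, query, "burial")
print(ET.tostring(local_xml, encoding="unicode"))
```

The join stands in for the step where a user selects the tables belonging to one artifact type; a local XML schema could then be derived from the element names that appear in the output.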

The Integration Problem

Problem: The integration process requires both technical and domain expertise.

Proposal: Partially automate the process to minimize the need for technical skills

Solution: ETANA-ADD Tool

Related Work

Gatherer: A tool used in Greenstone for adding new collections.

• Tightly coupled with Greenstone

• No knowledge of its ability to handle evolving schema and its content

Database to OAI Provider: OAICat and OAI PMH2 Perl

• Doesn’t accommodate mapping process

Related Work (Cont.)

OCHRE proposed archaeoML to define DL collections

• Doesn’t automate integration process

• Ability to handle heterogeneous data is not known

Altova MapForce

• Doesn’t support incremental mapping

ETANA-ADD Tool

• An interactive tool for end users

• Partially automates the integration process

• Minimizes the need for technical skills

• Reuses existing tools to some extent, by providing an easy GUI wrapper on top of them

ETANA-ADD Tool (Cont.)

The process flow involved while using the tool:

DB2XML → Schema Mapper → OAI XML File Data Provider
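
The schema-mapping stage of this flow can be sketched in Python as a rename from local element names to global ones. This is only an illustration under assumptions: the function `to_global`, the mapping dictionary, and all element names are hypothetical, and the actual Schema Mapper is an interactive GUI whose internals the slides do not describe.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def to_global(local_root: ET.Element,
              mapping: Dict[str, str],
              global_record_tag: str) -> ET.Element:
    """Rewrite local records into global records using local -> global names."""
    global_root = ET.Element("collection")
    for local_record in local_root:
        record = ET.SubElement(global_root, global_record_tag)
        for field in local_record:
            target = mapping.get(field.tag)
            if target is None:
                # Local elements without a mapping are dropped in this sketch.
                continue
            ET.SubElement(record, target).text = field.text
    return global_root


# Made-up local XML and mapping for a burial collection.
local = ET.fromstring(
    "<collection><burial><bur_age>adult</bur_age><notes>x</notes></burial></collection>"
)
global_xml = to_global(local, {"bur_age": "AGE"}, "BURIAL")
```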

An Integration Scenario

Adding burial artifacts collection to ETANA-DL

• Perform DB2XML process using ETANA-ADD

• Perform Schema Mapping

• Publish Burial Collection as OAI Data Provider

Initial Screen with Umm el-Jimal Database Open

Tables corresponding to Burial artifact selected

Performing join on tables for burial artifact

DB2XML Process Complete

Invoking Schema Mapper

Opening Global Schema

Performing Mapping Process

Extending Global Schema to Integrate Burial Artifact

Mapping Complete, Generating Global XML Collection

Complete Global XML Generation, Publishing as OAI Provider

Publishing as OAI Provider
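
Once a collection is published as an OAI provider, a harvester retrieves it through OAI-PMH requests. The sketch below builds a `ListRecords` request URL in Python; `verb`, `metadataPrefix`, and `set` are standard OAI-PMH request parameters, while the base URL, the set name, and the helper `list_records_url` are placeholders and not taken from ETANA-DL.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL for a data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional selective harvesting by set.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Placeholder base URL and set name, for illustration only.
url = list_records_url("http://example.org/oai", set_spec="burial")
print(url)
```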

Results

Integrated Umm el-Jimal site with the help of ETANA-ADD

• Bone, Burial, Locus, Miscellaneous Artifact, Pottery, Pottery Bucket

• No additional code written

A comparison with earlier integrated site, Megiddo (7 artifact collections)

Results (Cont.)

                          Umm el-Jimal   Megiddo
Additional LOC Required   0              1350
Human Hours               2              20

Conclusions

Target users: Administrators handling archaeology data (to be invited in fall for usability studies)

Developed ETANA-ADD to minimize technical expertise in integrating new archaeological collections

Willing to share our software, which may be applicable to other domains with similar problems (i.e., with evolving global schema)

References

Bainbridge, D., Thompson, J., and Witten, I. H. Assembling and enriching digital library collections. In Proc. JCDL 2003: 323-334.

Raghavan, A., Vemuri, N. S., Shen, R., Gonçalves, M.A., Fan, W. and Fox, E.A. Incremental, Semi-automatic, Mapping-Based Integration of Heterogeneous Collections into Archaeological Digital Libraries: Megiddo Case Study. In Proc. ECDL 2005: 139-150.

Ravindranathan, U., Shen, R., Gonçalves, M.A., Fan, W., Fox, E.A., and Flanagan, J.W. ETANA-DL: a digital library for integrating heterogeneous archaeological data. In Proc. JCDL 2004: 76-77.

Suleman, H. Open Digital Libraries, Ph.D. Dissertation, Dept. Comp. Sci., Virginia Tech, http://scholar.lib.vt.edu/theses/available/etd-11222002-155624, 2002.

Questions/Comments?