Alexandria Digital Library Project
Larry Carver, James Frew, Greg Janée, Mike Goodchild, Linda Hill, Terry Smith
www.alexandria.ucsb.edu
Smith et al • NSF • April 3, 2003
Outline
  Alexandria Digital Library Project (ADLP)
    History
    Goals, activities, partners
  Distributed DL supporting georeferenced access
    Research and development issues
    Operational collections and services
  Knowledge organization systems (KOS)
    Gazetteers and related KOS
  ADEPT learning environment
    Concept-based learning spaces
    Collections and services
ADLP History
  Pre-1994: UCSB geo-information and map library
  1994-98 DLI-1: georeferenced collections/access
  1998-99: Operational ADL (UCSB Library/CDL)
  1999-2004 DLI-2: distributed DL
    Extension of architecture and access services
    Knowledge organization services
    Integration of learning services
    Geo/GIS-based interfaces
    Basic CS research
  2004-2008: Large-scale DLs and beyond
    NSDL Core Infrastructure and services
    Cyber Infrastructure
ADLP Goals
  Current goals: Distributed DLs and applications
    Operational distributed digital library
      – services for construction/use of georeferenced collections
      – DL federation and interoperation
      – scalability over many heterogeneous collections
    Development/integration of KOS services
    Integration of concept-based learning spaces
      – services for creating/using learning environments
    Development of geo-based interfaces
    Evaluation of services
    Basic computational science research
  Emerging goals: Large-scale DLs and beyond
    Extending NSDL Core Infrastructure and services
    Cyber Infrastructure
ADLP Major Collaborative Activities
  1994-98
    4 DLI-1 partners: CMU, Illinois, Stanford, UCB
    SDSC, U.Arizona, US Navy, NIMA, LoC, MSFT, ESRI, …
  1999-2004
    UCSB Library, CDL
    DLI-2 partners: UCLA, GT, SDSC/NPACI, Stanford, UCB
    DLESE
    NSDL CI partners: Cornell, Columbia, U.Mass
    NSDL Services partners: IIT Chicago, UCSD
    JISC partners: Penn State, Southampton, Leeds
    …
ADLP Activities
  OPERATIONAL APPLICATIONS
    • georeferenced DL tutorials
    • distributable software packages
    • operational libraries: UCSB library, ...
    • outreach; federated nodes
  KNOWLEDGE ORGANIZATION
    • gazetteers: research and community
    • gazetteer content standard
    • web service protocols for gazetteers, thesauri, and other KOS
    • ADL gazetteer
    • thesauri for feature and object types
    • duplicate detection for gazetteers
    • textual-geospatial integration services
  GEOREFERENCED DIGITAL LIBRARIES
    • distributed georeferenced DL services
    • NSDL core infrastructure
    • data environment (e.g., GIS) integration
    • hardware acceleration for spatial data
    • collaborative tools
    • Z39.50 support
    • ingest and workflow systems
  EDUCATIONAL APPLICATIONS
    • knowledgebase and lecture composing, visualization, and presentation tools
    • physical geography concept space and learning object collections
    • applications to undergraduate education
    • educational evaluation
    • learning services and DL integration
    • digital classrooms
    • metadata content standards
    • learning objects
    • computational models
  USER INTERFACES
    • reusable user interface components
    • contextual maps, footprint creation
    • KOS navigation
    • lightweight GIS functionality
    • Digital Earth visualization
    • image processing
    • query-by-content, classification
    • spatial extent determination
Goals
  Digital library architecture for geospatial/georeferenced information
    heterogeneous
    rich services
    scalable
      – many providers
      – collections, large and small
  DL infrastructure, not artifact
    standard components and interfaces
    distributed participants
Issue: discovery
  Naïve approach
    I want a map of Boulder
    “Downtown street map of Boulder, Colorado”
  But... remote-sensing imagery is nameless
    AVHRR NOAA-13 2002-06-03 14:33 UTC
  But... direct placename search is unreliable
    I want a map of the Flatirons in the Rocky Mountains just behind Boulder, Colorado
    USGS topographic map “Eldorado Springs”
    generally: many names for any given place
ADL approach
  Coordinate-based representation and discovery
    lat/lon coordinates
    rich geometry
      – polygons, polylines
    spatial operators
      – overlaps, contains
  Gazetteer
    content standard defines representation
    service maps placenames to coordinates
  [Diagram: client, gazetteer, library; flows labeled placenames and coordinates]
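The two-step discovery described on this slide (placename to coordinates via the gazetteer, then a spatial operator against item footprints) can be sketched as follows. This is a minimal illustration, not ADL code: the `BBox` type, the `find_items` function, the item names, and the rough coordinates are all assumptions, and footprints are simplified to bounding boxes that do not cross the 180° meridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned lat/lon footprint in decimal degrees (hypothetical type)."""
    west: float
    south: float
    east: float
    north: float

    def overlaps(self, other: "BBox") -> bool:
        # True when the two boxes share at least one point.
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)

    def contains(self, other: "BBox") -> bool:
        # True when `other` lies entirely inside this box.
        return (self.west <= other.west and other.east <= self.east
                and self.south <= other.south and other.north <= self.north)


# Illustrative data only: coordinates are rough and item names are made up.
GAZETTEER = {"Boulder": BBox(-105.30, 39.96, -105.18, 40.09)}
LIBRARY = {
    "quad-map": BBox(-105.375, 39.875, -105.25, 40.0),
    "state-map": BBox(-109.05, 37.0, -102.05, 41.0),
    "coastal-image": BBox(-120.5, 34.0, -119.5, 35.0),
}


def find_items(placename: str, operator: str = "overlaps") -> list:
    """Resolve a placename to coordinates, then apply a spatial operator."""
    region = GAZETTEER[placename]
    return [name for name, footprint in LIBRARY.items()
            if getattr(footprint, operator)(region)]
```

With this data, `find_items("Boulder")` returns the items whose footprint overlaps the query region, and `find_items("Boulder", "contains")` returns only those whose footprint fully contains it.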
Issue: multiple data types
  Geospatial discovery
    is not amenable to text treatment
    constitutes new data type
  Adding notion of different data types has many implications:
    input validation
    internal structures, external representations
    query language and processing
    ranking
    user interface components
ADL approach
  Discovery: “bucket framework”
    extensible data type system for metadata
      – XML representations, search operations
    native metadata is explicitly mapped to buckets
    software supports bucket views over arbitrary RDBMSs
    9 Dublin Core-like standard buckets
  User interface components
    background maps, item footprint identification/creation
  Spatial ranking
    by spatial similarity to query region
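The slide does not say which similarity measure ADL uses for spatial ranking. As one plausible stand-in, the sketch below ranks bounding-box footprints by the ratio of intersection area to union area with the query region. The function names and the sample boxes are hypothetical.

```python
def area(box):
    """Area of a (west, south, east, north) box in squared degrees."""
    west, south, east, north = box
    return (east - west) * (north - south)


def similarity(a, b):
    """Intersection-over-union of two boxes; 0.0 when they do not overlap."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    shared = width * height
    return shared / (area(a) + area(b) - shared)


def rank(query, footprints):
    """Return item names ordered from most to least similar to the query."""
    return sorted(footprints,
                  key=lambda name: similarity(query, footprints[name]),
                  reverse=True)


# Made-up footprints around a query region of (0, 0, 2, 2).
FOOTPRINTS = {
    "far": (5.0, 5.0, 6.0, 6.0),
    "huge": (-10.0, -10.0, 10.0, 10.0),
    "exact": (0.0, 0.0, 2.0, 2.0),
    "half": (1.0, 0.0, 3.0, 2.0),
}
```

A measure of this kind favors items whose extent matches the query over very large items that merely happen to cover it, which is why the "huge" footprint ranks below the partially overlapping one.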
Bucket mapping
[Diagram: bucket “Originator”; native fields FGDC Citation/Originator and USGS DOQ Producer; example values “U.S. Geological Survey” and “Photo Science, Inc.”; labels: bucket-level searching, field-level searching, collection statistics]
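A rough sketch of the bucket idea follows: native metadata fields are explicitly mapped to a bucket, a bucket-level search looks across every mapped field, and a field-level search is restricted to one native field. The data structures and function names are hypothetical, and the pairing of each example value with a particular native field is an assumption made for illustration.

```python
# Hypothetical bucket definition: bucket name -> native (schema, field) pairs.
BUCKETS = {
    "originator": [("FGDC", "Citation/Originator"), ("USGS DOQ", "Producer")],
}

# Illustrative records; which value belongs to which native field is assumed.
RECORDS = [
    {"id": "item-1", "schema": "FGDC",
     "fields": {"Citation/Originator": "U.S. Geological Survey"}},
    {"id": "item-2", "schema": "USGS DOQ",
     "fields": {"Producer": "Photo Science, Inc."}},
]


def search(bucket, text, field=None):
    """Return ids of records whose mapped value contains `text`.

    field=None searches every native field mapped to the bucket
    (bucket-level); field=(schema, name) restricts the search to one
    native field (field-level).
    """
    targets = [field] if field else BUCKETS[bucket]
    hits = []
    for record in RECORDS:
        for schema, name in targets:
            if record["schema"] != schema:
                continue
            value = record["fields"].get(name)
            if value and text.lower() in value.lower():
                hits.append(record["id"])
                break
    return hits


def bucket_counts():
    """Collection statistic: number of records with a value in each bucket."""
    return {
        bucket: sum(
            1 for record in RECORDS
            if any(record["schema"] == schema and name in record["fields"]
                   for schema, name in mapped)
        )
        for bucket, mapped in BUCKETS.items()
    }
```

Because the mapping is explicit, the same table drives both kinds of search and the per-bucket counts that a collection could publish as statistics.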
Collection statistics
  Object Type              Count
  cartographic works       324,876
    maps                   324,876
  images                   2,014,799
    photographs            484,083
      aerial photographs   484,083
  • • •
  [Panels: Temporal, Spatial]
ADL in context
[Diagram: ADL plotted alongside Web, DLs, Greenstone, ODL, OAI, and GIS; axis labels: affordances; generality • structure]
Issue: scalability
  Size
    easy to accumulate lots of data
      – satellites image continuously
    geospatial discovery scales... not so well
      – indexing unwieldy at 10^6 items
    efficiently joining spatial, other constraint types is difficult
  Burden & management
    collection building is labor-intensive
    providers have differing content, services, IP concerns, policies, lifetimes
    providers already exist
      – MS Terraserver: 3 TB, 750 million items
ADL approach
  Distributed library of peer nodes
    library nodes host collections
    other nodes host gazetteers, thesauri, other KOS
    other components, e.g., map servers
  Federated item-level search
    over buckets
    over individual metadata fields mapped to buckets
  Centralized collection-level search/ranking
    over collection statistics derived from bucket mappings
      – space, time, type, format
    any library node can act as collection registry
  Collection aggregation
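One way to picture the collection-level step: a registry holds per-collection statistics (spatial extent, time range, object types), and a query is first matched against those statistics so that item-level search is sent only to collections that could hold matches. The sketch below assumes that reading. `CollectionStats`, `candidate_collections`, and the registry contents are hypothetical, and ranking of the candidates is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionStats:
    """Summary a registry might keep per collection (hypothetical shape)."""
    name: str
    bbox: tuple        # (west, south, east, north) covering all items
    years: tuple       # (earliest, latest)
    types: frozenset   # object types present in the collection


# Made-up registry entries for illustration.
REGISTRY = [
    CollectionStats("aerial-photos", (-125.0, 32.0, -114.0, 42.0),
                    (1930, 2000), frozenset({"aerial photographs"})),
    CollectionStats("historic-maps", (-180.0, -90.0, 180.0, 90.0),
                    (1600, 1900), frozenset({"maps"})),
    CollectionStats("satellite-scenes", (-180.0, -90.0, 180.0, 90.0),
                    (1980, 2003), frozenset({"images"})),
]


def candidate_collections(bbox, year, item_type):
    """Names of collections whose statistics are compatible with the query."""
    hits = []
    for stats in REGISTRY:
        west, south, east, north = stats.bbox
        spatial = (west <= bbox[2] and bbox[0] <= east
                   and south <= bbox[3] and bbox[1] <= north)
        temporal = stats.years[0] <= year <= stats.years[1]
        if spatial and temporal and item_type in stats.types:
            hits.append(stats.name)
    return hits
```

Only the collections returned here would then receive the federated item-level query over buckets.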
Issue: context & use of library items
  Context is critical in geospatial DLs
    formulating queries
    evaluating result sets and individual results
  Use of geospatial data
    need access descriptions
      – “item content = single URL” is insufficient
      – multiple formats
      – multiple access methods
      – multiple components
    need integration with common data environments
      – ARC/INFO, etc.
Geospatial context
Does this answer your question?
[Map labels: Flatirons #1-5, Flagstaff Rd., Green Mountain]
ADL approach
  All library functionality is accessible via...
    web service APIs
    Java RMI
  Content access model
    characterizes methods of access
    multiple “access points”
      – download, service, web interface, offline
      – hierarchies of alternatives, decompositions
  Context
    background maps
    library-supplied lightweight GIS functionality
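The content access model can be pictured as a small tree: leaves are access points of one of the four kinds listed, and inner nodes say whether their members are alternatives (any one will do) or a decomposition (all parts are needed). The classes below are a hypothetical rendering of that idea, not the ADL schema, and the `example.org` URLs are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class AccessPoint:
    """One way of getting at an item's content (hypothetical shape)."""
    kind: str                       # "download", "service", "web interface", "offline"
    label: str
    location: Optional[str] = None  # URL or ordering instructions


@dataclass
class Group:
    """Inner node: members are either alternatives or parts of a whole."""
    relation: str                   # "alternatives" or "decomposition"
    members: List[Union["Group", AccessPoint]] = field(default_factory=list)


def access_points(node):
    """Flatten the hierarchy into a list of its leaf access points."""
    if isinstance(node, AccessPoint):
        return [node]
    found = []
    for member in node.members:
        found.extend(access_points(member))
    return found


# Illustrative item: one full download, or two tiles, or an offline copy.
ITEM = Group("alternatives", [
    AccessPoint("download", "full image", "https://example.org/item.tif"),
    Group("decomposition", [
        AccessPoint("download", "tile 1", "https://example.org/item-1.tif"),
        AccessPoint("download", "tile 2", "https://example.org/item-2.tif"),
    ]),
    AccessPoint("offline", "copy on physical media"),
])
```

A client can walk such a description to choose the access method it supports, which is what a single content URL cannot express.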
Incorporation into NSDL/CI
  Geospatial/georeferenced data is an instance of science data
    complex, well-defined structure
    rich metadata
    large size
    poorly served by traditional information retrieval methods
  Science data belongs in NSDL
  For NSDL: comparable infrastructure enabling...
    distributed, content-specific search services
    association of DL items and content-specific helper tools
Operational status
  ADL co-developed with UCSB Library
    production-quality software
    foundation of operational library since 2000
    complete system in 2003
  UCSB Library: Map & Imagery Laboratory (MIL)
    self-supporting, 5 full-time employees
    2.6 million items, 6.5 TB, growing 1.5 TB/year
    4.5 million item gazetteer
  Remote sites
    ESSW, CNR, DLESE, SIO, NTNU, AUT
KOS activities & contributions
  KOS as primary components of DL architecture
    Heretofore not acknowledged as a major component
    ADL/ADEPT thesaurus and gazetteer service protocols
  Gazetteer components of DLs
    Growth of a research and development community, adopting/adapting/sharing our ADL Gazetteer components
    Gazetteer research issues
    NSDL Textual Geospatial Integration Project
  KOS integration into learning environments
    Terry Smith will address this in detail
Digital Library Components
  CATALOG OF METADATA
  SERVICES: ACCESSING, ANALYZING, ARCHIVING, CATALOGING, DIGITIZING, RETRIEVING, SEARCHING, VISUALIZING
  KNOWLEDGE ORGANIZATION SYSTEMS: AUTHORITY FILES, CLASSIFICATION SYSTEMS, CONCEPT SPACES, DICTIONARIES, GAZETTEERS, GLOSSARIES, ONTOLOGIES, SUBJECT HEADING SETS, THESAURI
  DATA STORE OF OBJECTS
  Libraries, Collections
KOS Generalization
[Diagram labels: Label, Type, Definition, Meaning, Relationships; Navigation, Translation, Sense-making]
Digital Gazetteer Essentials
(controlled vocabulary)
• None of these elements are unique identifiers of a particular place
Building gazetteer research community
1994-1996: ADL built the first multi-million-entry international gazetteer and integrated it into the ADL system
1996-1999: ADL created...
  Gazetteer Content Standard
  Feature Type Thesaurus (210 preferred terms; 1046 non-preferred)
  rebuilt the ADL Gazetteer (over 4 million entries)
  provided web interfaces for searching the ADL Gazetteer
Building a research community 1999-present
Digital Gazetteer Information Exchange (DGIE) Workshop, funded by NSF (66 participants), 1999
JCDL 2002 workshop on Digital Gazetteers – Integration in Digital Library Services (38 participants; sponsored by NKOS)
NAACL 2003 workshop on Analysis of Geographic References
ADL-hosted discussion list for gazetteer issues; archived by NSF DLI2 (146 subscribers)
Set of 5.9 million geographic names available for download – useful for placename recognition in text
Gazetteer Service Protocol and protocol server code
An “external identifier” for ADL Gazetteer records
New gazetteer client that is based on the gazetteer protocol
Our network of gazetteer interactions
  Electronic Cultural Atlas Initiative (ECAI) gazetteer project
  Academia Sinica’s Taiwan Gazetteer
  UK Historical Boundaries project
  UK Geo-crosswalk project
  Digital Library for Earth System Education (DLESE)
  Biodiversity research, such as the “Specify” system – University of Kansas
  State projects, such as NY Agricultural History project (in proposal stage) and Florida statewide gazetteer project
  University of Redlands internship proposal (mini-GIS)
  Bulgarian Antarctic Place-Names Commission
  SRI’s Artificial Intelligence Center (spatial reasoning)
  Navy’s SPAWAR Systems Center (natural language processing)
  THREDDS project at UCAR (event gazetteers)
  Illinois Institute of Technology (geoparsing research)
Advancing and extending gazetteers
  Named Time Periods
    World War I: 1914 to 1918
  Named Spatiotemporal Events
    Such as Hurricane Hugo
Advancing and extending gazetteers
What happens when we extend the digital gazetteer model to anatomy: named structures in the brain, for example?
http://www.ohiou.edu/~linguist/l550ex/brainpic.htm
Credit & Copyright: Sherry Buttnor http://antwrp.gsfc.nasa.gov/apod/ap011120.html
Anticline Famennian sandstone, Hastière http://www.nitg.tno.nl/eng/iccp_tripj.shtml
Or to celestial space and 3-d features?
Advancing and extending gazetteers
Obtaining extents from image analysis
• Recognizing patterns
• Identifying features from gazetteers
• Deriving the extent of the features from feature analysis
• Adding bounding box footprints to gazetteer entries
[Image label: Santa Barbara Municipal Airport]
Advancing and extending gazetteers
Lake Bigler, thru 1920s
Lake Bonpland (also Bondland), thru 1890s
Da-ow-a-ga, thru 1850s
The duplicate detection problem.
Given variant names and variant footprints, how do we determine that two pieces of information are about the same place?
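The slide poses duplicate detection as an open problem and gives no method. As a rough illustration of how name and footprint evidence might be combined, the sketch below scores a pair of gazetteer entries by a weighted sum of best string similarity between any two of their names and footprint overlap. The weights, the threshold, the entry layout, and the coordinates are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def name_similarity(names_a, names_b):
    """Best string similarity (0.0 to 1.0) between any pair of variant names."""
    return max(SequenceMatcher(None, a.lower(), b.lower()).ratio()
               for a in names_a for b in names_b)


def footprint_overlap(a, b):
    """Intersection-over-union of two (west, south, east, north) boxes."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    shared = width * height
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return shared / (area_a + area_b - shared)


def likely_same_place(a, b, threshold=0.6):
    """Weight footprint evidence above name evidence (assumed 60/40 split)."""
    score = (0.4 * name_similarity(a["names"], b["names"])
             + 0.6 * footprint_overlap(a["bbox"], b["bbox"]))
    return score >= threshold


# Illustrative entries; the coordinates are invented, not real footprints.
ENTRY_A = {"names": ["Lake Bigler"], "bbox": (-120.0, 39.0, -119.0, 40.0)}
ENTRY_B = {"names": ["Lake Bonpland", "Lake Bondland"],
           "bbox": (-120.0, 39.0, -119.0, 40.0)}
ENTRY_C = {"names": ["Lake Bigler"], "bbox": (10.0, 50.0, 11.0, 51.0)}
```

Weighting the footprint more heavily lets entries with unrelated historical names still match when they occupy the same extent, while identical names in distant locations do not. A fuller treatment would likely also compare feature types and time ranges.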
Advancing and extending gazetteers
From Michael Freeston, New Generic Indexing Technology
Effective and efficient database indexing techniques for large spatial + text data collections
Test database of 2-d shapes in a geographic area to test the “sufficiency” of spatial generalizations (e.g., bounding boxes) for information retrieval based on spatial similarity (e.g., degree of overlap or containment)
Gazetteer ITR Proposal
  Advancing and Extending Georeferencing Interoperability and Services (AEGIS)
    Medium ITR proposal for 2003
    Michael Goodchild, UCSB, PI
    Lewis Lancaster, Berkeley/ECAI, co-PI
  Formalization and extension
  Performance and scalability
  Cross-cultural issues
  Cognitive and behavior issues
  Extents: representation of a feature’s geometry
  Integration of locator services
NSDL Textual Geospatial Integration
  Goals
    Extend NSDL infrastructure by
      enabling geographic queries across heterogeneous, text and non-text resources
      spatial georeferencing of arbitrary texts without explicit geographic cataloging
  2001 - 2003
  Participants
    University of California, Santa Barbara
      James Frew, PI; Terence Smith; Michael Bueno; Linda Hill
    Information Retrieval Lab, Illinois Institute of Technology
      Ophir Frieder; David Grossman; Eric Jensen; Steve Beitzel
The American Geological Institute (AGI) has permitted us to use a set of their GeoRef records for system training.
Example text -> Estimated footprint
Structure and petrography of the schist of Skookum Gulch, Callahan-Yreka area, eastern Klamath Mountains, Northern California
<key>blueschist | California | Callahan California | foliation | Klamath Mountains | melange | metamorphic rocks | Ordovician | Paleozoic | petrology | schists | Silurian | Siskiyou County California | Skookum Gulch | United States | Yreka California</key>
<ab>The schist of Skookum Gulch (SSG) is an informal name applied to a fault-bounded melange composed mainly of schistose metamorphic rocks and less abundant sedimentary and igneous rocks located in the eastern Klamath Mountains of Northern California. The SSG features outcrops of lawsonite+sodic amphibole blueschist and epidote+sodic amphibole rocks transitional to the greenschist facies. Isotopic dating indicates that the schist was metamorphosed during the Ordovician. The SSG is the oldest known Paleozoic blueschist-bearing melange in California and one of the oldest preserved blueschist terranes in North America. Tonalitic rocks associated with the schist have Early Cambrian ages and are among the oldest rocks yet dated within the Klamath Mountains. Field relations indicate that the schist of Skookum Gulch is a complex tectonic melange composed of metavolcanic, ...</ab>
<coord>N410000N420000W1220000W1230000</coord>
• Derived footprint - small
• Blue: derived footprint – large
• Red: GeoRef footprint
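To compare a derived footprint with the cataloged one, the `<coord>` string in the example record has to be turned into a bounding box. The sketch below assumes, judging only from the example above, that the string holds two latitudes followed by two longitudes, each as a hemisphere letter plus degrees, minutes, and seconds (DDMMSS for latitude, DDDMMSS for longitude). That layout and the function names are assumptions, not a statement of the GeoRef specification.

```python
import re

# Assumed layout: two latitudes then two longitudes, hemisphere + D(D)DMMSS.
_COORD = re.compile(
    r"([NS])(\d{2})(\d{2})(\d{2})"
    r"([NS])(\d{2})(\d{2})(\d{2})"
    r"([EW])(\d{3})(\d{2})(\d{2})"
    r"([EW])(\d{3})(\d{2})(\d{2})"
)


def _degrees(hemisphere, degrees, minutes, seconds):
    """Convert one hemisphere/DMS group to signed decimal degrees."""
    value = int(degrees) + int(minutes) / 60 + int(seconds) / 3600
    return -value if hemisphere in "SW" else value


def parse_coord(text):
    """Return (west, south, east, north) in decimal degrees."""
    match = _COORD.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized coordinate string: {text!r}")
    groups = match.groups()
    lats = [_degrees(*groups[0:4]), _degrees(*groups[4:8])]
    lons = [_degrees(*groups[8:12]), _degrees(*groups[12:16])]
    return (min(lons), min(lats), max(lons), max(lats))
```

For the record above, `parse_coord("N410000N420000W1220000W1230000")` gives a box spanning latitudes 41 to 42 north and longitudes 123 to 122 west. Taking the minimum and maximum makes the result independent of the order in which the two values of each pair appear.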
Applications services based on DLs
  Integrate applications with DL infrastructure
    Web portals lack library organization
    “packages” not integrated with DLs
  Important applications include
    Services/collections supporting learning environments
    Services/collections supporting research
  Apply domain-specific KOS principles for organizing collections/services for given application
    Geospatial applications: use georeference
    Science learning environments: use concept spaces
Science learning spaces: Concept KOS
  Concepts of science as basic knowledge granules
  Sets of concepts form bases for scientific representation
  DL and KOS technology can support organization of science learning materials in terms of concepts
    – Collections of models of science concepts (knowledge base)
    – Collections of learning objects (LO) cataloged with concepts
    – Collections of instructional materials organized by concepts
  Organize learning materials as “trajectory through concept space”
    Lecture, lab, self-paced materials
    Services for creating/editing/displaying such materials
Learning environment components/services
Application to learning environments
  Application
    Introductory physical geography (F2002, S2003)
  Collections created
    Knowledge base (KB) of strongly structured concepts
    Structured lectures and labs
    Learning objects cataloged by ADN metadata (+ concepts)
  Services created
    For concepts
      – Web-based concept input tool
      – Graphic and text-based display tools
    For instructional materials
      – Web-based “lecture composer”
      – “Conceptualization” graphing tool
    For learning objects
      – Metadata input tool
Learning environment display (lecture mode)
The lecture is presented on three projection screens, showing the Concept window (left), Lecture window (center), and Object window (right).
Model of science concepts
  Representing a concept involves more than terms
    Objective, information-rich, scientific representations
      – e.g., for concepts of heat diffusion, DNA, drainage basin, …
    Associated semantics
      – e.g., relating to measurement, recognition, …
    Many interrelationships
      – e.g., hierarchical, causative, property, …
  Models of science concepts
    Already exist for chemistry (ASA), materials (NIST), …
    Generalize such models for this application
  Structure items in concept KB using model
Model of science concepts
  ID
  TYPE and FACET
  CONTEXT (KNOWLEDGE DOMAIN)
  TERM(S) (P/NP)
  DESCRIPTION(S)
  HISTORICAL ORIGIN(S)
  EXAMPLE(S)
  HIERARCHICAL RELATIONS
  DEFINING OPERATIONS
  SCIENTIFIC REPRESENTATION(S)
    – Scientific classifications
    – Data/Graphical/Mathematical/Computational reps
  PROPERTIES
  CAUSAL RELATIONS
  CO-RELATIONS
  APPLICATION(S)
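The element list above maps naturally onto a record type. The sketch below is one hypothetical rendering, in which field names follow the slide but the Python types, the reading of P/NP as preferred/non-preferred terms, and the sample values for "drainage basin" are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    text: str
    preferred: bool  # assumed reading of "P/NP": preferred or non-preferred


@dataclass
class Concept:
    """One knowledge-base item, with one attribute per element on the slide."""
    id: str
    type: str
    facet: str
    context: str                      # knowledge domain
    terms: List[Term]
    descriptions: List[str] = field(default_factory=list)
    historical_origins: List[str] = field(default_factory=list)
    examples: List[str] = field(default_factory=list)
    hierarchical_relations: Dict[str, List[str]] = field(default_factory=dict)
    defining_operations: List[str] = field(default_factory=list)
    scientific_representations: Dict[str, List[str]] = field(default_factory=dict)
    properties: List[str] = field(default_factory=list)
    causal_relations: List[str] = field(default_factory=list)
    co_relations: List[str] = field(default_factory=list)
    applications: List[str] = field(default_factory=list)

    def preferred_term(self) -> str:
        """Return the text of the first term flagged as preferred."""
        return next(term.text for term in self.terms if term.preferred)


# Illustrative entry; every value here is invented for the example.
BASIN = Concept(
    id="c-001",
    type="abstract",
    facet="landform",
    context="physical geography",
    terms=[Term("watershed", False), Term("drainage basin", True)],
    hierarchical_relations={"broader": ["fluvial landscape"]},
    scientific_representations={"graphical": ["basin outline map"]},
)
```

Keeping relations and representations as keyed collections leaves room for the several kinds the slide names (hierarchical, causal, co-relations; data, graphical, mathematical, computational) without fixing their internal structure.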
Item in concept knowledge base
Concept input tool
Collections of learning materials
  Lecture/lab composer
    Creates learning materials with
      – Tailorable structure
      – Underlying organization as “forest of trees” of concepts
    Small reusable granules for
      – Easy creation/edit/access/re-use
    Can link in
      – Concepts from concept KB
      – Items from learning object collections
      – Items from lecture collection
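The "forest of trees" organization can be sketched as a list of small tree-shaped granules, each optionally linked to a concept in the knowledge base and to learning objects. The `Granule` class, the identifiers, and the `trajectory` helper below are hypothetical. The example reuses the Fluvial Landscape, River, and Watershed concepts mentioned elsewhere in this deck.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Granule:
    """A small reusable unit of lecture or lab material (hypothetical shape)."""
    title: str
    concept_id: Optional[str] = None            # link into the concept KB
    object_ids: List[str] = field(default_factory=list)  # linked learning objects
    children: List["Granule"] = field(default_factory=list)


def trajectory(forest):
    """Concept ids in presentation order: a depth-first walk of each tree."""
    path = []

    def visit(node):
        if node.concept_id:
            path.append(node.concept_id)
        for child in node.children:
            visit(child)

    for root in forest:
        visit(root)
    return path


# Illustrative lecture with a single tree; ids are made up.
LECTURE = [
    Granule("Fluvial landscapes", "fluvial-landscape", children=[
        Granule("Rivers", "river", object_ids=["photo-17"]),
        Granule("Watersheds", "watershed"),
    ]),
]
```

Reading the trees depth-first yields the sequence of concepts a lecture visits, which is one way to make the "trajectory through concept space" idea concrete.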
Current instructional material window
The left-hand frame displays the structure of the lecture
The right-hand frame displays the content of the lecture
ADL icons (globe image) attached to a concept link to a display of concept properties in the concept window
Other icons attached to a concept link to a display of concept examples in the illustration window
View of learning material by concepts
Lecture/lab/… composer tool
Learning object collections
  Cataloged with tool for metadata creation
    ADN metadata content standard with concept fields
  Use of ADL/ADEPT middleware search services
    E.g., in creation of lecture/lab presentation materials
  Display of collection items in collection window
    Photos, images, maps, text, videos, …
  Support in display window for ADL browser
    Allows dynamic search of collection holdings
The illustrations window
Evaluation of concept-based approach
  Evaluation of efficacy for student learning
    Do students attain “deeper levels” of understanding?
    Comparison approach to evaluation
  Evaluation of value to instructors/TAs
    UCLA evaluation team
  Evaluation issues
    Instrumenting students’ use of course materials
    Time to assess pedagogic value of approach
Example of lessons learned
  Importance of “conceptualizations” of concept
    e.g., characterize concept of Fluvial Landscape with concepts of {River, Watershed}
    Embed conceptualizations in lecture/labs (not in KB)
    Idea of learning materials as trees in concept space
  Construct labs using analogous “lab” composer
    Tailored for lab presentations/work
    Supports logic of using concepts as framework
    Can import material from lecture/other collections
Summary
  DL infrastructure as basis for Learning Environments
    Collections
      – Concept KBs, Lectures, DL objects
    Services
      – Creation/Search/Display
  Evaluation of efficacy of approach
  Community-based development of KBs, Learning Materials, Collections