1
Vienna University of Technology (Vienna – 21 Sept 2007)
“From information retrieval to digital libraries to computer science education”
Edward A. Fox
• http://fox.cs.vt.edu
• Dept. of Computer Science, Virginia Tech
• Blacksburg, VA 24061 USA
2
“From information retrieval to digital libraries to computer science education”
• ABSTRACT: Information is a fundamental human need. The field of information retrieval has helped address this need since the 1960s, with a range of models and systems. A broad view of this field leads to digital libraries, a re-definition of the concepts, systems, and human involvement in sharing information across time and space, supported by digital technologies. We can formalize and better operationalize this through the 5S framework, which addresses information with regard to Societies, Scenarios, Spaces, Structures, and Streams. This approach has supported our work with personalization and computer science syllabi, curriculum development regarding digital libraries, and ensuring that college graduates are prepared not only to live in, but also to help build our future cyberinfrastructure, i.e., for Living In the KnowlEdge Society (LIKES). This talk will summarize our related research and education innovation.
Acknowledgements (selected)
• Colleagues: Lillian Cassel, Debra Dudley, Weiguo Fan, Marcos Gonçalves, Doug Gorton, Rohit Kelapure, Neill Kipp, Aaron Krowne, Ming Luo, Uma Murthy, Manuel Perez, Ananth Raghavan, Rao Shen, Hussein Suleman, Srinivas Vemuri, Layne Watson, …
• Sponsors: ACM, AOL, CAPES, DFG, Google, IBM, IMLS, INL, Microsoft, NSF (CCF-0722259; IIS-9986089, 0080748, 0086227, 0307867, 0325579, 0535057, 0535060, 0736055; DUE-0121679, 0121741, 0136690, 0333531, 0333601, 0435059, 0532825), SUN, …
4
Acknowledgements - Mentors
• J.C.R. Licklider – undergraduate advisor (1969-71)
– Author in 1965 of “Libraries of the Future”
– Earlier, at ARPA, funded the start of the Internet
• Michael Kessler – BS thesis advisor
– Project TIP (Technical Information Project)
– Defined bibliographic coupling
• Gerard Salton – graduate advisor (1978-83)
– “Father of Information Retrieval”
5
Information Retrieval: Algorithms and Heuristics, 2nd Ed.
• By David A. Grossman & Ophir Frieder
• Kluwer Academic Publishers
6
Document Retrieval (Grossman & Frieder Fig. 1.1)
7
Vector Space Model – 2 terms (Grossman & Frieder Fig. 2.2)
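The two-term vector space figure can be made concrete with a short sketch: documents and the query are vectors of term weights, and documents are ranked by the cosine of the angle between each document vector and the query. The weights and document names below are made up for illustration; a real system would derive weights from a scheme such as tf-idf.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector shares no terms with anything
    return dot / (norm_u * norm_v)

def rank(query, docs):
    """Return (doc_id, score) pairs sorted by decreasing cosine similarity."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical weights for the two index terms (t1, t2).
docs = {"D1": (0.8, 0.3), "D2": (0.2, 0.7)}
query = (0.4, 0.8)
print(rank(query, docs))
```

Here D2 ranks above D1 (cosine about 0.98 versus 0.73) because its vector points in nearly the same direction as the query, even though neither document matches the query weights exactly.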
8
Language Model (Grossman & Frieder Fig. 2.5)
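The figure itself is not reproduced in this transcript. As one common instance of the language-modeling approach, the sketch below scores each document by the likelihood of the query under a unigram document model, smoothed with the collection model (Jelinek-Mercer smoothing). The smoothing weight `lam=0.5` and the toy documents are assumptions, not values from the slides.

```python
import math
from collections import Counter

def query_log_likelihood(query, doc, collection, lam=0.5):
    """log P(query | doc) with Jelinek-Mercer smoothing:
    P(t|d) = lam * tf(t, d) / |d| + (1 - lam) * cf(t) / |C|."""
    doc_tf = Counter(doc)
    coll_tf = Counter(t for d in collection for t in d)
    doc_len = len(doc)
    coll_len = sum(coll_tf.values())
    score = 0.0
    for term in query:
        p_doc = doc_tf[term] / doc_len if doc_len else 0.0
        p_coll = coll_tf[term] / coll_len
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # term occurs nowhere in the collection
        score += math.log(p)
    return score

docs = {
    "D1": "digital library services for archaeology".split(),
    "D2": "vector space retrieval model".split(),
}
collection = list(docs.values())
query = "digital library".split()
ranking = sorted(docs, key=lambda d: query_log_likelihood(query, docs[d], collection), reverse=True)
print(ranking)
```

Smoothing is what lets D2 receive a finite score even though it contains neither query term; without it, any document missing a query term would get probability zero.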
9
Document-Term-Query Inference Network (Grossman & Frieder Fig. 2.7)
10
Inference Network Layers (Grossman & Frieder Fig. 2.8)
11
Relevance Feedback Process (Grossman & Frieder Fig. 3.1)
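The relevance feedback process reformulates a query from the user's judgments of retrieved documents. One classic way to do this in the vector space model is Rocchio's formula, q' = αq + β·centroid(relevant) − γ·centroid(non-relevant). The sketch below implements that textbook formula, which is not necessarily the exact process in the figure; α = 1, β = 0.75, γ = 0.15 are commonly cited defaults assumed here, and negative weights are clipped to zero.

```python
def centroid(vectors, dim):
    """Component-wise mean of a list of vectors (zero vector if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return the modified query vector; negative weights are clipped to 0."""
    dim = len(query)
    rel_c = centroid(relevant, dim)
    non_c = centroid(nonrelevant, dim)
    return [
        max(0.0, alpha * q + beta * r - gamma * n)
        for q, r, n in zip(query, rel_c, non_c)
    ]

# Three-term vocabulary; the user marked one document relevant and one not relevant.
new_q = rocchio([1.0, 0.0, 0.0], relevant=[[0.0, 1.0, 0.0]], nonrelevant=[[0.0, 0.0, 1.0]])
print(new_q)
```

The new query keeps its original term, picks up the term from the relevant document (weight 0.75), and leaves the term seen only in the non-relevant document at zero.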
12
Information Life Cycle
Authoring, Modifying
Organizing, Indexing
Storing, Retrieving
Distributing, Networking
Retention / Mining
Accessing, Filtering
Using, Creating
13
Asynchronous, Digital Library Mediated Scholarly Communication
Different time and/or place
14
DLs Shorten the Chain
[Figure: the Digital Library linking Author, Editor, Reviewer, Librarian, Teacher, Learner, and Reader]
15
DL Definitions - 1
• “A digital library is an organized and focused collection of digital objects, including text, images, video, and audio, along with methods of access and retrieval, and for selection, creation, organization, maintenance, and sharing of the collection.”
• Witten & Bainbridge – “How to Build a Digital Library” – Morgan Kaufmann 2003
16
DL Definitions - 2
• “Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities”
• Waters, D.J. CLIR Issues, July/August 1998
• www.clir.org/pubs/issues/issues04.html
17
DL Definitions - 3
• Issues and Spectra
– Collection vs. Institution
– Content vs. System
– Access vs. Preservation
– “Free” vs. Quality
– Managed vs. Comprehensive
– Centralized vs. Distributed
18
DL Definitions - 4
• NOT a “digitized library”
• NOT a “deconstruction” of existing systems and institutions, moving them to an electronic box in a Library
• IS a new way to deal with knowledge
– Authoring, Self-archiving, Collecting
– Organizing, Preserving
– Accessing, Propagating, Re-using
19
Digital Library Content
Content Types:
• Text Documents: articles, reports, books
• Video, Audio: speech, music
• Geographic Information: (aerial) photos
• Software, Programs: models, simulations
• BioInformation: genome (human, animal, plant)
• Images and Graphics: 2D, 3D, VR, CAT
20
Informal 5S & DL Definitions
DLs are complex systems that
• help satisfy info needs of users (societies)
• provide info services (scenarios)
• organize info in usable ways (structures)
• present info in usable ways (spaces)
• communicate info with users (streams)
21
Hypotheses
• A formal theory for DLs can be built based on 5S.
• The formalization can serve as a basis for modeling and building high-quality DLs.
22
5S Framework
• “Streams”
– All types of (multimedia) content (as well as communications and flows over networks, or into sensors, or sense perceptions; data stream management systems)
• “Structures”
– Organizational schemes (including data structures, databases, and knowledge representations – taxonomies, ontologies)
23
5S Framework
• “Spaces”
– 2D and 3D interfaces, GIS data, representations of documents and queries
• “Scenarios”
– System states and events, but can also represent situations of use by human users (or machine processes, yielding services or transformations of data)
• “Societies”
– Both software “service managers” and fairly generic “actors” who could be (collaborating) human users
24
5Ss
• Streams
– Examples: text; video; audio; image
– Objectives: describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data
• Structures
– Examples: collection; catalog; hypertext; document; metadata
– Objectives: specifies organizational aspects of the DL content
• Spaces
– Examples: measure; measurable, topological, vector, probabilistic
– Objectives: defines logical and presentational views of several DL components
• Scenarios
– Examples: searching, browsing, recommending
– Objectives: details the behavior of DL services
• Societies
– Examples: service managers, learners, teachers, etc.
– Objectives: defines managers, who are responsible for running DL services; actors, who use those services; and relationships among them
25
26
ETANA-DL
• Archaeological DL
• Integrated DL
– Heterogeneous data handling
• Applies and extends the OAI-PMH
– Open Archives Initiative Protocol for Metadata Harvesting
• Design considerations
– Componentized
– Extensible
– Portable
27
28
ETANA Societies
1. Historic and pre-historic societies (being studied)2. Archaeologists (in academic institutes, fieldwork
settings, or local and national governmental bodies)
3. Project directors4. Technical staff (consisting of photographers,
technical illustrators, and their assistants)5. Field staff (responsible for the actual work of
excavation)6. Camp staff (e.g., camp managers, registrars, tool
stewards)7. General public (e.g., educators, learners, citizens)
29
ETANA Societies
• Social issues
1. Who owns the finds?
2. Where should they be preserved?
3. What nationality and ethnicity do they represent?
4. Who has publication rights?
5. What interactions took place between those at the site studied and others? What theories about this are proposed, and by whom?
30
ETANA Scenarios
1. Life in the site in former times
2. Digital recording: the planning stage and the excavation stage
3. Planning stage: remote sensing, fieldwalking, field surveys, building surveys, consulting historical and other documentary sources, and managing the sites and monuments
4. Excavation
   1. Detailed information is recorded, including for each layer of soil, and for features such as pole holes, pits, and ditches.
   2. Data about each artifact is recorded together with information about its exact find spot.
   3. Numerous environmental and other samples are taken for laboratory analysis, and the location and purpose of each is carefully recorded.
   4. Large numbers of photographs are taken, both general views of the progress of excavation and detailed shots showing the contexts of finds.
5. Organization and storage of material
6. Analysis, and hypothesis generation and testing
7. Publications, museum displays
8. Information services for the general public
31
ETANA Spaces
1. Geographic distribution of found artifacts
2. Temporal dimension (as inferred by archaeologists)
3. Metric or vector spaces
   1. used to support retrieval operations, and to calculate distance (and similarity)
   2. used to browse / constrain searches spatially
4. 3D models of the past, used to reconstruct and visualize archaeological ruins
5. 2D interfaces for human-computer interaction
32
ETANA Structures
1. Site organization
   1. Region, site, partition, sub-partition, locus, …
2. Temporal orderings (ages, periods)
3. Taxonomies
   1. for bones, seeds, building materials, …
4. Stratigraphic relationships
   1. above, beneath, coexistent
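Stratigraphic "above/beneath" relationships form a directed graph, so a relative deposition order can be derived by topological sorting, and contradictory records show up as cycles. The sketch below assumes that a locus lying above another was deposited later; "coexistent" relations are left out for simplicity, and the locus names are hypothetical. It uses Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter

def deposition_order(above_pairs):
    """Order loci from earliest to latest deposit.

    above_pairs: iterable of (upper, lower) meaning `upper` lies above `lower`,
    so `lower` was deposited first and is recorded as a predecessor of `upper`.
    Raises CycleError if the recorded relationships contradict each other.
    """
    graph = {}
    for upper, lower in above_pairs:
        graph.setdefault(upper, set()).add(lower)
        graph.setdefault(lower, set())
    return list(TopologicalSorter(graph).static_order())

# Hypothetical loci from one excavation square.
relations = [("L1", "L2"), ("L2", "L3"), ("L1", "L4"), ("L4", "L3")]
order = deposition_order(relations)
print(order)  # L3 first and L1 last; the relative order of L2 and L4 is not fixed

try:
    deposition_order([("L1", "L2"), ("L2", "L1")])
except CycleError:
    print("inconsistent stratigraphy")
```

Only a partial order is recoverable: L2 and L4 both lie between L3 and L1, but the records say nothing about which of them came first.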
33
ETANA Streams
1. successive photos and drawings of excavation sites, loci, unearthed artifacts
2. audio and video recordings of excavation activities and discussions
3. textual reports
4. 3D models used to reconstruct and visualize archaeological ruins.
34
5S and DL formal definitions and compositions (April 2004 TOIS)
[Figure: dependency diagram of the formal definitions]
• Mathematical foundations: relation (d.1), function (d.2), sequence (d.3), tuple (d.4), language (d.5), graph (d.6), grammar (d.7)
• The 5S: streams (d.9), structures (d.10), spaces (d.18; measurable d.12, measure d.13, probability d.14, vector d.15, topological d.16), scenarios (d.21; with event, state, and services d.22), transmission (d.23), societies (d.24)
• DL terms: structural metadata specification (d.25), descriptive metadata specification (d.26), structured stream (d.29), digital object (d.30), collection (d.31), metadata catalog (d.32), repository (d.33), indexing service (d.34), searching service (d.35), hypertext (d.36), browsing service (d.37)
• Digital library (minimal) (d.38)
35
Fox & Gonçalves Book Outline
• Ch. 1. Introduction (Motivation, Synopsis)
• Part 1 – The “Ss”
• Part 2 – Higher DL Constructs
• Part 3 – Advanced Topics
• Appendix
36
Book Parts and Chapters - 1
• Ch. 1. Introduction (Motivation, Synopsis)
• Part 1 – The “Ss”
– Ch. 2: Streams
– Ch. 3: Structures
– Ch. 4: Spaces
– Ch. 5: Scenarios
– Ch. 6: Societies
37
Book Parts and Chapters - 2
• Part 2 – Higher DL Constructs
– Ch. 7: Collections
– Ch. 8: Catalogs
– Ch. 9: Repositories and Archives
– Ch. 10: Services
– Ch. 11: Systems
– Ch. 12: Case Studies
38
Book Parts and Chapters - 3
• Part 3 – Advanced Topics
– Ch. 13: Quality
– Ch. 14: Integration
– Ch. 15: How to build a digital library
– Ch. 16: Research Challenges, Future Perspectives
• Appendix
– A: Mathematical preliminaries
– B: Formal Definitions: Ss
– C: Formal Definitions: DL terms, Minimal DL
– D: Formal Definitions: Archaeological DL
– E: Glossary of terms, mappings
39
Chapter 3: (Degree of) Structure
Chaotic (Web) → Organized (DLs) → Structured (DBs)
40
Digital Objects (DOs)
• Born digital
• Digitized version of “real” object
– Is the DO version the same, better, or worse?
– Decision for ETDs: structured + rendered
• Surrogate for “real” object
– Not covered explicitly in metamodel for a minimal DL
– Crucial in metamodel for archaeology DL
41
Metadata: Complex to Simple
[Figure: MARC ($50) at the complex end; Dublin Core (DC) +thesis at the simple end]
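At the simple end of the spectrum, an unqualified Dublin Core description is just a flat list of repeatable elements. The sketch below builds such a record in the `oai_dc` XML form used by OAI-PMH, using Python's standard `xml.etree.ElementTree`. The thesis values are hypothetical, the thesis-specific extension ("+thesis") is not shown, and the `xsi:schemaLocation` attribute that served records normally carry is omitted.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

def dc_record(fields):
    """Build an oai_dc XML string from (element, value) pairs; elements may repeat."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, value in fields:
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

# Hypothetical thesis description using unqualified Dublin Core elements.
xml_text = dc_record([
    ("title", "A Sample Thesis on Digital Libraries"),
    ("creator", "Doe, Jane"),
    ("date", "2007"),
    ("type", "Text"),
    ("subject", "digital libraries"),
])
print(xml_text)
```

Compared with MARC, there are no fixed fields, indicators, or subfield codes to get right, which is much of what makes Dublin Core cheap to create.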
42
Also Important: Epub, SGML, XML
• 5S perspective: streams, structures, scenarios
• Authoring
• Rendering, presenting
• Tagging, Markup, DOM
• Semi-structured information
• Dual-publishing, eBooks
• Styles (XSL, XSLT)
• Structured queries
43
Chapter 4 Overview (Spaces)
• Retrieval models
– Boolean, extended Boolean
– Vector, LSI
– Probabilistic: classical, belief network, inference network, language models
• User interfaces and visualization – cont’d
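Of the retrieval models listed, Boolean retrieval is the simplest to sketch: an inverted index maps each term to the set of documents containing it, and AND, OR, and NOT become set intersection, union, and difference. Whitespace tokenization and the sample documents are simplifying assumptions; extended Boolean, vector, and probabilistic models would instead produce ranked results.

```python
from collections import defaultdict

def build_index(docs):
    """Inverted index: lowercase term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def q_and(index, *terms):
    """Documents containing every term (posting-set intersection)."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()

def q_or(index, *terms):
    """Documents containing at least one term (posting-set union)."""
    return set().union(*(index.get(t, set()) for t in terms))

def q_not(index, all_ids, term):
    """Documents not containing the term (complement against all ids)."""
    return set(all_ids) - index.get(term, set())

docs = {"D1": "digital library services", "D2": "library catalog", "D3": "digital images"}
index = build_index(docs)
print(q_and(index, "digital", "library"))                        # {'D1'}
print(sorted(q_or(index, "catalog", "images")))                  # ['D2', 'D3']
print(q_and(index, "library") & q_not(index, docs, "digital"))   # {'D2'}
```

NOT is computed against the full set of document ids, which is why Boolean systems usually discourage queries that consist of a negation alone.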
44
User interfaces and visualization
• 2D interfaces
• 3D interfaces
• GIS
• Other paradigms: trees, graphs, bubbles, coordinated views, …
• Stepping Stones and Pathways
– http://fox.cs.vt.edu/SSP/
45
Chapter 6 Overview (Societies)
• User communities
– Authors, editors, teachers, students, readers
– Personal(ization), group(ware), community, global
– Accessibility, universal access
• Librarians: reference, acquisition, operations
• Research community
– Associations, conferences, publications, labs, projects
• Economics
– Copyright, intellectual property rights, digital rights management, authorization, authentication, security, privacy, self-archiving (eprints)
– Publishers, catalogers, distributors, sustainability
– Open source, commercial, hybrid
46
Chapter 9 Archives & Repositories
• Open Archives Initiative (OAI)
• Institutional Repositories
• Persistent storage of digital objects
• Coupling of metadata with digital objects
• Use of “handles” as identifiers for digital objects
• Put, get, harvest
47
OAI - Open Archives Initiative
• Advocacy for interoperability
• Standard for transferring metadata among digital libraries
– Protocol for Metadata Harvesting (PMH)
• Simplicity
• Generality
• Extensibility
• Support for PMH => Open Archive (OA)
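Much of OAI-PMH's simplicity comes from harvesting being plain HTTP requests with a `verb` parameter, plus paging through `resumptionToken` values. The sketch below builds `ListRecords` requests, extracts record identifiers and the resumption token from each response page, and loops until no token remains. The base URL and canned responses are hypothetical stand-ins for a real repository (real responses also carry `responseDate`, `request`, and metadata elements), and error responses such as `noRecordsMatch` are not handled.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL. resumptionToken is an exclusive
    argument in OAI-PMH, so follow-up requests send it without metadataPrefix."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"

def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty resumptionToken element marks the last page of the list.
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token

def harvest(base_url, fetch):
    """Collect identifiers across pages; fetch(url) must return the response body as text."""
    ids, token = parse_list_records(fetch(list_records_url(base_url)))
    while token:
        more, token = parse_list_records(fetch(list_records_url(base_url, resumption_token=token)))
        ids.extend(more)
    return ids

# Canned responses standing in for a real repository (hypothetical base URL).
BASE = "http://example.org/oai"
HEAD = '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
TAIL = "</ListRecords></OAI-PMH>"

def record(identifier):
    return f"<record><header><identifier>{identifier}</identifier></header></record>"

PAGES = {
    list_records_url(BASE): HEAD + record("oai:example.org:1") + record("oai:example.org:2")
    + "<resumptionToken>page2</resumptionToken>" + TAIL,
    list_records_url(BASE, resumption_token="page2"): HEAD + record("oai:example.org:3")
    + "<resumptionToken/>" + TAIL,
}

print(harvest(BASE, PAGES.__getitem__))
```

Passing `fetch` in as a parameter keeps the protocol logic testable offline; in practice it could wrap `urllib.request.urlopen` and decode the response body.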
48
OAI – Repository Perspective
Required: Protocol
[Figure: a repository holding digital objects (DO) and metadata objects (MDO)]
49
OAI – Black Box Perspective
[Figure: open archives OA 1 through OA 7, each treated as a black box]
50
Tiered Model of Interoperability
Mediator services
Metadata harvesting
Document models
51
Institutional Repositories - 1
• “Institutional repositories are digital collections that capture and preserve the intellectual output of a single university or a multiple institution community of colleges and universities.”
• Crow, R. “Institutional repository checklist and resource guide”, SPARC, Washington, D.C., USA
• www.arl.org/sparc/IR/IR_Guide_v1.pdf
52
Institutional Repositories - 2
• “A university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution.”
• Lynch, C.A. In ARL Bimonthly Report 226, pp. 1-7, Feb. 2003, www.arl.org/newsltr/226/ir.html
53
What is a Digital Object Repository?
• Also called: digital repository, digital asset repository, institutional repository
• Stores and maintains digital objects (assets)
• Provides external interface for digital objects
– Creation, Modification, Access
• Enforces access policies
• Provides for content type disseminations
Adapted from Slide by V. Chachra, VTLS
54
Goals of Institutional Repositories (by Stevan Harnad, U. Southampton)
• Self-archiving of institutional research
– Theses and dissertations (VTLS NDLTD Project)
– Article preprints and postprints
– Internal documents and maps
• Management of digital collections
• Preservation of materials – decentralized approach
• Housing of teaching materials
• Electronic publishing of journals, books, posters, maps, audio, video, and other multimedia objects
Adapted from Slide by V. Chachra, VTLS
55
Chapter 10 Services
• Taxonomy of services
• Ontology, composition, reuse
• Evaluation
• Key services in-depth:
– Crawling, indexing
– Clustering, classifying
– Recommending, using social networks
– Logging
56
[Figure: taxonomy of DL services]
• Information Satisfaction Services: Browsing, Collaborating, Customizing, Filtering, Providing access, Recommending, Requesting, Searching, Visualizing
• Infrastructure Services
– Add Value: Annotating, Classifying, Clustering, Evaluating, Extracting, Indexing, Measuring, Publicizing, Rating, Reviewing (peer), Surveying, Translating (language)
– Repository-Building, Preservational: Conserving, Converting, Copying/Replicating, Emulating, Renewing, Translating (format)
– Repository-Building, Creational: Acquiring, Cataloging, Crawling (focused), Describing, Digitizing, Federating, Harvesting, Purchasing, Submitting
57
Ontology: Applications
• Expand definition of minimal DL by characterizing
– typical DL services
– in the context of “employs” and “produces” relationships
• Use characterization to:
– Reason about how DL services can be built from other DL components
– As well as be composed with other services through extension or reuse
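The "employs"/"produces" characterization lends itself to a mechanical check of which services can be chained. The sketch below only illustrates that idea and is not the ontology's formal definitions: each service lists the concepts it employs and produces, one service can feed another when it produces something the other employs, and a service becomes runnable once everything it employs is available. All service and concept names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Service:
    """A DL service characterized by the concepts it employs (inputs) and produces (outputs)."""
    name: str
    employs: frozenset
    produces: frozenset

def can_follow(first, second):
    """`second` can consume the output of `first` if first produces something second employs."""
    return bool(first.produces & second.employs)

def runnable(services, available):
    """Repeatedly run any service whose employed concepts are all available,
    adding what it produces; return the names in the order they became runnable."""
    available = set(available)
    order = []
    pending = list(services)
    progress = True
    while pending and progress:
        progress = False
        for svc in list(pending):
            if svc.employs <= available:
                available |= svc.produces
                order.append(svc.name)
                pending.remove(svc)
                progress = True
    return order

# Hypothetical services and concepts.
indexing = Service("Indexing", frozenset({"collection"}), frozenset({"index"}))
searching = Service("Searching", frozenset({"index", "query"}), frozenset({"digital objects"}))
filtering = Service("Filtering", frozenset({"digital objects", "user model"}),
                    frozenset({"filtered digital objects"}))

print(runnable([filtering, searching, indexing], {"collection", "query", "user model"}))
```

Given a collection, a query, and a user model, the composition resolves to Indexing, then Searching, then Filtering, regardless of the order in which the services are listed.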
58
[Figure: 5S ontology of DL concepts and relationships. Streams: text, audio, image, video. Spaces: measurable, measure, metric, probabilistic, topological, vector. Other concepts across Structures, Scenarios, and Societies: digital object, collection, catalog, metadata specifications, repository, index, service, scenario, event, operation, service manager, actor. Relationships shown include describes, stores, contains, is_version_of/cites/links_to, employs, produces, extends, reuses, redefines, invokes, executes, participates_in, recipient, runs, uses, association, inherits_from/includes, precedes, happens_before, and is_a.]
59
Ontology: Applications
60
[Figure: composition of DL services through employs (e) and produces (p) relationships. Services shown: Searching, Browsing, Recommending, Filtering, Binding, Visualizing, Expanding query, Requesting, Rating, Training, and Indexing, grouped into Information Satisfaction Services and Infrastructure Services (Add_Value) and labeled fundamental or composite. Inputs and outputs include Society, actor, Collection, {digital object}, query, query’, anchor, query/category, user model, binder, space, handle, {(digital object, actor, rate)}, classifier, Index, and transformer.]
61
5S and Generating DLs
• 5S Framework
• 5S definitions, services taxonomy, ontology
• 5SL (specification language)
• 5SGraph (to prepare 5SL)
• 5SGen (for DL development, incl. DSpace)
• SchemaMapper for development of union DL
62
[Figure: DL development life cycle (Requirements, Analysis, Design, Implementation, Test), supported by the 5S formal theory/metamodel, 5SL, 5SGraph, 5SLGen, OO classes, workflow, components, DL evaluation, and an XML log]
63
Chapter 11 Systems: Architectural Issues
• Independent system vs. part of federation
• Centralized vs. distributed vs. open services
• Monolithic vs. modular vs. componentized
• Topologies: bus vs. star vs. hierarchical vs. network
• Decompositions vary
– search engine, browser, DBMS, MM support
– repository, handle server, client
– information resources + mediators, bus or agent
– collection + client with workspace/environment
64
Also Important: Agents
• 5S perspective: societies, streams, spaces, scenarios, structures
• Protocols: light-weight
• Knowledge interchange: mediators, wrappers
• Negotiation, registries
• Distributed issues
• Webbots (automatic indexing)
• Ontologies (standard upper)
65
Fedora™ Digital Object Architecture
• Persistent ID (PID): globally unique persistent id
• Disseminators – public view: access methods for obtaining “disseminations” of digital object content
• System Metadata – internal view: metadata necessary to manage the object
• Datastreams – protected view: content that makes up the “basis” of the object
– EAD, TEI, DC, MARC, VRA Core, MIX, etc.
– Images, E-books, E-journals, Music, Video, etc.
The Mellon Fedora Project
Adapted from Slide by V. Chachra, VTLS
66
Example Disseminators
[Figure: a digital object with Persistent ID (PID), Disseminators, System Metadata, and Datastreams]
• Default disseminator: Get Profile, List Items, Get Item, List Methods, Get DC Record
• Simple Image disseminator: Get Thumbnail, Get Medium, Get High, Get VeryHigh
67
Fedora™ Repository
[Figure: Fedora repository architecture. Clients (client application, batch program, server application, Web browser) connect over HTTP and SOAP to a Web Service Exposure Layer offering Manage, Access, Search, and OAI Provider interfaces. Behind it are the Management, Access, Security, and Storage subsystems; components shown include component mgmt, object mgmt, object validation, PID generation, object dissemination, object reflection, remote and local services, session management, user authentication, policy enforcement, policy mgmt, policies, users/groups, digital objects, datastreams, XML files, a relational DB, and content. An External Content Retriever fetches content from external content sources over HTTP and FTP.]
Adapted from Slide by V. Chachra, VTLS
68
5SL: a DL design language
• Domain-specific languages
– Address a particular class of problems by offering specific abstractions and notations for the domain at hand
– Advantages: domain-specific analysis, program management, visualization, testing, maintenance, modeling, and rapid prototyping
• XML-based realization of 5S
– Interoperability
– Use of many sub-languages (e.g., MIME types, XML Schemas, UML notations)
69
5SGraph: A DL Modeling Tool
• Helps users model their own instances of a digital library (DL) in the 5S language (5SL)
• A simple modeling process that enables rapid generation of digital libraries
• Features
– 5SGraph loads and displays a metamodel in a structured toolbox.
– The structured editor of 5SGraph provides a top-down visual building environment for the DL designer.
– 5SGraph produces syntactically correct 5SL files according to the visual model built by the designer.
70
Overview of 5SGraph
[Figure: 5SGraph with a structured toolbox (metamodel) and a workspace (instance model)]
71
72
73
74
75
5SGen
• Version 1
– MARIAN as the target system
– Focused on rich structures: semantic networks
– Behavior attached to nodes/links
• Version 2
– Shifted for later work to componentized (ODL) approach
– Focused on scenarios/societies
– Structures/Spaces encapsulated within components (e.g., relational tables, indexes)
– Only textual streams supported
• Version 3
– Into DSpace (practical DL)
76
5SLGen – Version 2: ODL, Services, Scenarios
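[Figure: 5SLGen pipeline. The DL designer's 5SL-Societies model (1) is transformed via XPath/JDOM (2) into an XMI class model (3), which Xmi2Java (4) turns into Java model classes (5). The 5SL-Scenario model (6) is transformed via XPath/JDOM (7) into a statechart model (8); scenario synthesis (9) yields a deterministic FSM (10), which SMC (11) compiles into a Java finite state machine class acting as the controller (12), linked to the model classes by a superclass relationship. The generated code binds to Java-wrapped ODL components (ODLSearch, ODLBrowse, …) imported from a component pool, and JSP pages provide the user interface view (13), yielding the generated DL services.]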
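In version 2, 5SLGen turns a scenario model into a deterministic finite state machine that serves as a controller (generated through SMC). The sketch below shows only the underlying idea, as a hand-written Python transition table for a hypothetical searching scenario; the state and event names are invented, and this is not SMC output or 5SLGen's generated code.

```python
class ScenarioFSM:
    """Deterministic FSM: each (state, event) pair maps to exactly one next state."""

    def __init__(self, start, transitions):
        self.state = start
        self.transitions = transitions  # {(state, event): next_state}

    def fire(self, event):
        """Apply an event; reject events the scenario does not allow in the current state."""
        key = (self.state, event)
        if key not in self.transitions:
            raise ValueError(f"event {event!r} not allowed in state {self.state!r}")
        self.state = self.transitions[key]
        return self.state

# Hypothetical searching scenario.
SEARCH_SCENARIO = {
    ("idle", "submit_query"): "searching",
    ("searching", "results_ready"): "showing_results",
    ("showing_results", "select_item"): "showing_item",
    ("showing_item", "back"): "showing_results",
    ("showing_results", "new_search"): "idle",
}

fsm = ScenarioFSM("idle", SEARCH_SCENARIO)
for ev in ["submit_query", "results_ready", "select_item", "back"]:
    fsm.fire(ev)
print(fsm.state)  # showing_results
```

Because a dictionary key can map to only one next state, determinism is enforced by the data structure itself; invalid user actions surface as errors instead of undefined behavior.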
77
Tools/Applications
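[Figure: tools and applications. A DL expert and DL designer use 5SGraph, based on the 5S MetaModel, to produce a 5SL DL model; 5SLGen generates a tailored DL for practitioners, researchers, and teachers from a component pool (ODLSearch, ODLBrowse, ODLRate, ODLReview, …), with a logging module (XMLLog)]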
78
[Figure: generating ETANA-DL. Using 5SGraph and the 5S archaeology metamodel, the ArchDL expert and ArchDL designer describe the structure and scenario sub-models and the ETANA-DL union services (harvesting, mapping, searching, browsing, …). A mapping tool relates the VN and HD metadata formats to the ETANA-DL metadata format; Wrapper4VN and Wrapper4HD expose the VN and HD catalogs via XOAI for harvesting into the union catalog. 5SGen assembles the browse and search services (with indexes, inverted files, a browse DB, and a services DB) and other ETANA-DL services from the component pool (browsing, …), behind a Web interface.]
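The mapping step in the figure can be pictured as a field-level crosswalk: each harvested record is renamed into the union metadata format before it enters the union catalog. In the sketch below, "VN" and "HD" are taken from the figure, but every field name and mapping is invented for illustration; a real mapping tool would also handle value-level mappings such as controlled vocabularies and period names.

```python
# Hypothetical field crosswalks: source field name -> union (ETANA-DL-style) field name.
CROSSWALKS = {
    "VN": {"objName": "title", "siteName": "site", "phase": "period"},
    "HD": {"item_label": "title", "excavation_site": "site", "date_range": "period"},
}

def to_union(source, record):
    """Rename a harvested record's fields into the union format; unmapped fields are dropped."""
    mapping = CROSSWALKS[source]
    union = {target: record[field] for field, target in mapping.items() if field in record}
    union["source"] = source  # keep provenance so results can link back to the site catalog
    return union

def build_union_catalog(harvested):
    """harvested: iterable of (source, record) pairs from the site catalogs."""
    return [to_union(source, record) for source, record in harvested]

catalog = build_union_catalog([
    ("VN", {"objName": "bowl", "siteName": "Site A", "phase": "Iron Age", "notes": "rim sherd"}),
    ("HD", {"item_label": "bone", "excavation_site": "Site B"}),
])
for entry in catalog:
    print(entry)
```

Keeping the source label on every union record is a simple way to preserve provenance, so a search over the union catalog can still point users back to the originating site collection.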
Ch. 12 Case Studies: CS -> CSTC
• NSF and ACM Education Committee funded a 2-year project, “A Computer Science Teaching Center” (CSTC), http://www.cstc.org/
• College of NJ, U. Ill. Springfield, Virginia Tech
• Focus initially on labs, visualization, multimedia
• Multimedia part supported by a 2nd grant to Virginia Tech and The George Washington University (with curricular guidelines)
CS Teaching Center (CSTC)
• Instead of building large, expensive multimedia packages, that become obsolete and are difficult to re-use, concentrate on small knowledge units.
• Learners benefit from having well-crafted modules that have been reviewed and tested.
• Use digital libraries to build a powerful base of support for learners, upon which a variety of courses, self-study tutorials & reference resources can be built.
• ACM support led to Journal of Educational Resources in Computing (JERIC): completed 2 co-EIC terms
81
82
Browsing (2)
83
84
85
Computing and Information Technology Interactive Digital Educational Library (CITIDEL)
• Domain: computing / information technology
• Genre: one-stop-shopping for teachers & learners: courseware (CSTC, JERIC), leading DLs (ACM, IEEE-CS, DB&LP, CiteSeer), PlanetMath.org, NCSTRL (technical reports), …
• Submission & Collection: sub/partner collections www.citidel.org
86
Overview of CITIDEL architecture
[Figure: three layers: user portals, digital library services, and repositories]
87
Distributed repository structure
[Figure: Laboratories, Applets, Papers, and Syllabi repositories (. . .), a union metadata repository, an OAI data provider, an OAI data harvester, and digital library services]
88
Digital library architecture for local and interoperable CITIDEL services
[Figure: portals for educators, administrators, and learners; services for multilingual searching, revising, annotating, filtering, browsing, and administering, supported by annotations, filtering profiles, and user profiles; repositories with union metadata, an OAI data provider, and an OAI data harvester, interoperating with remote and peer digital libraries (e.g., NSDL-CIS)]
89
90
91
92
93
CITIDEL -> NSDL
• A collection project in the National STEM (science, technology, engineering, and mathematics) education Digital Library – NSDL
• National Science Digital Library
• www.nsdl.org
• (Next slides courtesy Lee Zia, NSF)
95
Connects:
Users: students, educators, life-long learners
Content: structured learning materials; large real-time or archived datasets; audio, images, animations; primary sources; digital learning objects (e.g. applets); interactive (virtual, remote) laboratories; ...
Tools: search; refer; validate; integrate; create; customize; publish; share; notify; collaborate; ...
96
Enables:
• Environments for communication, collaboration, creation, validation, evaluation, recognition, ...
• AND discovery, stability, reliability, reusability, interoperability, customizability, ... of resources
97
Collections
• Discovery of content
• Classification and cataloguing
• Acquisition and/or linking; referencing
• Disciplinary-based themes define a natural body of content, but other possibilities are also encouraged
• Access to massive real-time or archived datasets
• Software tool suites for analysis, modeling, simulation, or visualization
• Reviewed commentary on learning materials and pedagogy
98
Services
• Help services, frequently asked questions, etc.
• Synchronous/asynchronous collaborative learning environments using shared resources
• Mechanisms for building personal annotated digital information spaces
• Reliability testing for applets or other digital learning objects
• Audio, image, and video search capability
• Metadata system translation
• Community feedback mechanisms
99
100
101
102
NSDL Information Architecture (essentially as developed by the Technical Infrastructure Workgroup)
[Figure: NSDL collections, referenced items & collections, and special databases connected through a core NSDL “bus”. Collection building: core collection-building services (harvesting, protocols) and core services (metadata gathering, information retrieval). Usage enhancement: CI services (annotation, discussion, personalization, authentication, browsing) and other NSDL services. User interfaces: portals & clients.]
A Digital Library Case Study
• Domain: graduate education, research
• Genre: ETDs = electronic theses & dissertations
• Submission: ETD-db, DSpace, Proquest, …
• Collection: local archives, regional collaborations, global union catalog
Project: Networked Digital Library of Theses & Dissertations (NDLTD) www.ndltd.org
104
Student Gets Committee Signatures and Submits ETD
[Figure: signed ETD submitted to the Grad School]
What are we doing?
• Aiding universities to enhance graduate education, publishing, and IPR efforts
• Helping improve the availability and content of theses and dissertations
• Educating ALL future scholars so they can publish electronically and effectively use digital libraries (i.e., are Information Literate and can be more expressive)
107
Why ETD? Short Answer
• For Students:
– Gain knowledge and skills for the Information Age
– Richer communication (digital information, multimedia, …)
• For Universities:
– Easy way to enter the digital library field and benefit thereby
• For the World:
– Global digital library – large, useful, many services
• General:
– Save time and money
– Increased visibility for all associated with research results
108
Metamodels in the 5S Framework
• Modeling archaeological information systems using the 5S theory to better understand the domain and design the system and the supported services
• Minimal DL
• Minimal ArchDL
• …
109
A Minimal DL in the 5S Framework
[Figure: building blocks from Streams, Structures, Spaces, Scenarios, and Societies (structured stream, hypertext, structural and descriptive metadata specifications, digital object, metadata catalog, collection, repository, and the indexing, browsing, and searching services) compose the minimal DL]
110
A Minimal ArchDL in the 5S Framework
[Figure: the minimal DL building blocks (Streams, Structures, Spaces, Scenarios, Societies; structured stream, hypertext, descriptive metadata specification, and the indexing, browsing, and searching services) extended with archaeology-specific definitions (SpaTemOrg, StraDia, Arch descriptive metadata specification, ArchDO, ArchObj, ArchColl, Arch metadata catalog, ArchDColl, ArchDR) to form the minimal ArchDL]
111
Moving from a minimal DL towards a DL reference model (1/2)
[Figure: from the minimal DL toward a DL reference model, via multimedia, annotation, knowledge management, practical DL systems, PIM, DL quality, and domain-specific DLs]
112
Moving from a minimal DL towards a DL reference model (2/2)
• Content-based image retrieval services in a DL
• A superimposed-information-supported DL
• Practical DL generation
113
Superimposing information
• Superimposed layer: new information/structures
• Base layer: existing information from heterogeneous sources: text, images, audio/video documents
• Mark: reference to base information element
114
Preliminary SI-DL metamodel
115
Minimal CBIR DL
[Figure: Stream, Structure, Space, Service, and Society components: ImageStream, FeatureVector, Image Descriptor, Composite Descriptor, StructuredFeatureVector, ImageContentDescription, ImageDigitalObject, ImageObject, ImageCollection, Image Descriptor Metadata Catalog, User InfoNeed, VisualizationOperation, and the Content-based Image Searching Service with KNNQ and RQ queries]
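The searching service in the CBIR metamodel answers KNNQ and RQ queries, which are read here as k-nearest-neighbor and range queries over image feature vectors. The sketch below implements both by linear scan with Euclidean distance over toy three-component vectors; the image ids and vectors are hypothetical, and real image descriptors define their own distance functions and typically rely on index structures rather than scanning the whole collection.

```python
import heapq
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_query(query_vec, collection, k):
    """Ids of the k images nearest to query_vec; collection maps image id -> feature vector."""
    return heapq.nsmallest(k, collection, key=lambda img: euclidean(query_vec, collection[img]))

def range_query(query_vec, collection, radius):
    """Ids of all images within `radius` of query_vec, nearest first."""
    hits = sorted((euclidean(query_vec, vec), img) for img, vec in collection.items())
    return [img for dist, img in hits if dist <= radius]

# Hypothetical feature vectors (e.g., a coarse color histogram per image).
images = {
    "img1": (0.1, 0.2, 0.7),
    "img2": (0.8, 0.1, 0.1),
    "img3": (0.2, 0.2, 0.6),
}
query = (0.1, 0.2, 0.68)
print(knn_query(query, images, 2))      # ['img1', 'img3']
print(range_query(query, images, 0.1))  # ['img1']
```

The two query types answer different needs: KNNQ always returns k results however far away they are, while RQ may return nothing if no image is similar enough.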
116
Summary
• 5S and Generating DLs
– 5S Framework
– 5S definitions, services taxonomy, ontology
– 5SL
– 5SGraph
– 5SGen (and DL development)
– DL development of union DL
– 5SGen into DSpace
• 5S Metamodels
– Minimal DL
– Archaeology DL
– Multimedia (CBIR) DL
– Union DL
– Practical DL, superimposed information, personal DL, …
117
NSF Workshop on DL Future, Chatham, MA
118
People
• Digital librarians
• DL system developers
• DL system administrators
• DL managers
• DL collection development staff
• DL evaluators
• DL users
119
120
Living In the KnowlEdge Society
(LIKES)
Grant: NSF 06-608, CPATH
Proposal: for VT Pathways (themed version of core curriculum)
PI: Edward A. Fox
121
Purpose
• Graduates from colleges & universities should be prepared to live in and contribute to the Knowledge Society emerging in the 21st century.
• Computing/LIS education can be revitalized:
– if the LIKES theme spreads in programs (so graduates can help build the Knowledge Society);
– if faculty collaborate (both in education and research endeavors) with colleagues globally who are interested in LIKES.
122
[Figure: the Knowledge Society at the core, surrounded by HCI, Visualization, Knowledge Management, Systems Analysis & Design, Programming, Database, Algorithms, Architecture, Net-Centricity, Intelligent Systems, Social & Ethical, Library & Information Science, Simulation, Modeling, Chemistry, Biology, Communications, Healthcare, Art, Music, Marketing, Finance, Economics, Engineering, Sociology, Psychology, Physics, History, Political Science, Geography, English, and Math]
Living In the KnowlEdge Society (LIKES): core surrounded by enabling concepts and problem-providing disciplines
123
Objectives – 1 of 3
• Enhance education in the discipline:
– New courses: Living in the Global Knowledge Society, Knowledge Management
– Existing courses enhanced to be driven more by the LIKES theme: Artificial Intelligence, Data Mining, Digital Libraries, Multimedia/Hypertext/Information Access, …
124
Objectives – 2 of 3
• Give special attention, inside the discipline and across disciplines:
– to the areas of data, information, and knowledge;
– to key concepts and methods, such as representation/views, search/discovery, inference/decisions, comparison/matching, complexity/heuristics, analysis/mining, integration/mapping, and modeling/simulation.
125
Objectives – 3 of 3
• Engage researchers, teachers, and students in the Knowledge Society’s problems, as motivation and orientation, and to help with solutions, e.g.,
– Shifting toward digital government, including statutes, rules, regulations, and procedures;
– Handling attacks, including spam and viruses;
– Ensuring quality even in the presence of disinformation, through knowledge sourcing, provenance, and sharing of community expertise;
– Effecting change through education that is cross-disciplinary, globally contextualized, and based on awareness of human development, learning theory, and cognitive psychology.
126
Potential Course Areas/Courses
• Personal Knowledge Management
– Computer Science and Information Systems, e.g., multimedia, process design and evaluation, and human-computer/human-information interaction.
– Psychology, e.g., knowledge organization principles, human cognitive processes.
– Industrial Systems Engineering, e.g., ergonomic factors of knowledge environments.
– Ethics, e.g., ethical issues of information disclosure.
• Communication and Collaboration
– Communications, e.g., communication using digital visualizations; using knowledge access in constructing digital messages.
– Information Systems and Computer Science, e.g., computer-supported cooperative work and group support systems.
– Marketing, e.g., influence of knowledge presentation on online customer behavior.
• Organization
– Information Systems, e.g., service innovation and development, system design and development.
– Management Science, e.g., decision support systems concepts, capabilities, techniques, and tools.
– Management, Marketing, Accounting, and Finance, e.g., business in the information age.
• Society
– Sociology, e.g., impact of knowledge differentials across society and countries.
– Political Science, e.g., governmental collection and use of knowledge, impact of technology on elections and government.
127
DL Curriculum Project (NSF supporting VT, UNC-CH)
• Identify, develop and test educational DL modules, guided by
- Experts, international collaborators
- Computing Curricula 2001 (CC2001)
- 5S framework
- Analysis of DL course syllabi
…
128
CC2001 Information Management Areas
IM1. Information models and systems*
IM2. Database systems*
IM3. Data modeling*
IM4. Relational DBs
IM5. Database query languages
IM6. Relational DB design
IM7. Transaction processing
IM8. Distributed DBs
IM9. Physical DB design
IM10. Data mining
IM11. Information storage and retrieval
IM12. Hypertext and hypermedia
IM13. Multimedia information & systems
IM14. Digital libraries
(* core unit in CC2001)
129
Why Modular Design
• Flexibility, e.g., for ETD programs:
– Self-study by NDLTD trainers
– Self-study by ETD authors
– Short courses by NDLTD trainers for ETD authors
– A course based on a single module
– Course sequence (program) from multiple modules
– Plug modules into an existing course (enhancement)
• Module 1. Overview + Module 10. DL Education & Research
130
Modules
1. Collection Development
2. Digital objects / Composites / Packages
3. Metadata, Cataloging, Author submission
4. Architecture, Interoperability
5. Data visualization
6. Services
7. Intellectual property rights management, Privacy, Protection
8. Social issues / Future of DLs
9. Archiving and Preservation
131
Ascertaining Priority Topics
• We’ve manually classified and analyzed publications using the 9 modules:
Source: count
– JCDL proceedings, ’01–’05: 354
– ACM DL proceedings, ’96–’00: 189
– D-Lib Magazine articles, ’95–’06: 521
– Session titles (JCDL, ACM DL, ECDL): 264
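The per-module counts reported in this analysis amount to tallying classified items. The sketch below shows that counting step under assumptions: each publication is a hypothetical (source, module ID) pair produced by manual classification, and the toy records are invented for illustration, not taken from the study.

```python
from collections import Counter


def tally_by_module(classified):
    """Count manually classified items per module ID (1-9)."""
    return Counter(module_id for _source, module_id in classified)


def most_popular(counts, n=2):
    """Module IDs with the n highest counts, most frequent first."""
    return [module_id for module_id, _ in counts.most_common(n)]


# Toy (source, module_id) pairs for illustration only; NOT the study's data.
toy = [("JCDL 05", 4), ("JCDL 05", 6), ("ACM DL 99", 4),
       ("D-Lib 03", 8), ("D-Lib 03", 6), ("JCDL 02", 4)]
counts = tally_by_module(toy)
print(counts[4], most_popular(counts))  # 3 [4, 6]

# The two proceedings rows of the table add up to the 543 papers analyzed.
print(354 + 189)  # 543
```

Grouping by (source, module ID) instead of module ID alone would give the per-venue or per-year breakdowns shown in the charts.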
132
Figure: Conference papers × modules. Number of conference papers per module ID (1–9), by proceedings: ACM DL 96–00 and JCDL 01–05.
133
• Analysis results:
– Total of 543 proceedings papers: the most popular topics were architecture (Module 4) and services (Module 6).
134
Figure: Distribution of D-Lib Magazine articles across module topics. Number of D-Lib articles per module ID (1–9), by year: D-Lib 95 through D-Lib 06.
135
• Analysis results:
– Total of 521 articles: the most popular topics were architecture (Module 4), services (Module 6), and social issues (Module 8).
136
Figure: Distribution of session titles across module topics. Number of panel sessions per module ID (1–9), by conference: JCDL & ACM DL, ECDL, ICADL.
137
• Analysis results:
– Total of 264 session titles (JCDL, ECDL, ICADL): the most popular topic was services (Module 6), followed by architecture (Module 4).
138
Fox & Gonçalves Book Outline
• Ch. 1: Introduction (Motivation, Synopsis)
• Part 1 – The “Ss”
– Ch. 2: Streams
– Ch. 3: Structures
– Ch. 4: Spaces
– Ch. 5: Scenarios
– Ch. 6: Societies
139
Textbook Outline (2)
• Part 2 – Higher DL Constructs
– Ch. 7: Collections
– Ch. 8: Catalogs
– Ch. 9: Repositories and Archives
– Ch. 10: Services
– Ch. 11: Systems
– Ch. 12: Case Studies
140
Textbook Outline (3)
• Part 3 – Advanced Topics
– Ch. 13: Quality
– Ch. 14: Integration
– Ch. 15: How to build a digital library
– Ch. 16: Research Challenges, Future Perspectives
• Appendix
– A: Mathematical preliminaries
– B: Formal Definitions: Ss
– C: Formal Definitions: DL terms, Minimal DL
– D: Formal Definitions: Archeological DL
– E: Glossary of terms, mappings
141
Pointers and Summary
• http://fox.cs.vt.edu
• http://fox.cs.vt.edu/talks
• www.dlib.vt.edu
• IR -> DL
• Education: CSTC, CITIDEL, NSDL, NDLTD, LIKES, DLcurric