Information Network Overlay Architecture
Adding Value to Digital Content
Carl Lagoze
CS 431 – May 4, 2005
Cornell University
Overview of the Talk
Digital Libraries for search & access
Beyond Access: Adding value to digital content
Information Network Overlay Architecture
Implementing the Architecture
Input Phase Research Questions
Indexing and search
  non-textual
  cross language
Preservation
Scale issues
  everything becomes hard at mega-scale
OCR
  especially non-Roman
Workflow
  getting stuff in cheaply/reliably
Intellectual property
  hard enough at intra-national level
Description
  Metadata issues
Digital Libraries – Federation Phase
Z39.50
Dienst
SDLIP
OAI-PMH
SRW/SRU
Federation Phase Research Questions
Heterogeneity
State Maintenance
Reliability
  Network level
  Management level
Ranking
So, are we done?
The primary goal of digital libraries has often been misconstrued as providing accessibility to a massive volume of resources. The real opportunity is to reestablish the library as a collaborative place where people learn from each other and organize around ideas and knowledge.
Opportunities: Not the same old information flow
Suppliers (Publishers)
Intermediaries (Librarians)
Consumers
…Towards a participatory information environment
[Diagram: Producers, Consumers, Experts, Novices, and Professionals around a Shared Information Context]
Digital Libraries: Beyond Search and Access
Build on foundation of near universal access
Provide context for:
  Content aggregation: combining information entities in novel ways
  Knowledge integration: capturing semantic relationships between information entities
  Information reuse: allowing secondary, tertiary products
  Information transformation: combining information entities with computational services
  Collaboration and contribution: blurring the line between authors, publishers, users, experts…
A bit of NSDL background
Mission: “Improve Science, Math, Engineering education through digital libraries”
Original NSDL solicitation in 1999
Over 180 projects funded
Core integration (Columbia, Cornell, UCAR) charged with providing organizational, technical infrastructure
Funding through 2006
http://www.nsdl.org
Existing Metadata-Centric Approach
[Diagram: Users, Services, Metadata repository, and Collections, with two links labeled OAI-PMH]
The metadata repository is a resource for service providers.
It holds information about every collection and item known to the NSDL.
Characteristics of the Metadata Repository
Oracle database
Qualified Dublin Core
Item records with collection association
OAI-PMH ingest and exposure
Current collection ~ 800,000
Metadata quality issues
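
The "OAI-PMH ingest and exposure" item can be pictured as ordinary HTTP requests. The following is a minimal sketch, not the NSDL implementation: the base URL and the set name are hypothetical, while `verb`, `metadataPrefix`, `from`, and `set` are standard OAI-PMH request arguments and `oai_dc` is the unqualified Dublin Core prefix that the protocol requires repositories to support. The prefix NSDL used for qualified Dublin Core is not given in the slides.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint, for illustration only; not an address from the talk.
BASE_URL = "http://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL for harvesting item records."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date  # selective harvesting by datestamp
    if set_spec is not None:
        params["set"] = set_spec    # selective harvesting by set (here, a made-up collection)
    return base_url + "?" + urlencode(params)


url = list_records_url(BASE_URL, from_date="2005-01-01", set_spec="physics")
query = parse_qs(urlparse(url).query)
print(url)
```

The same request shape serves both directions: the repository harvests collections this way, and service providers harvest the repository.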
Problems in this approach
Mere access does not equate to value
  Reeves, Impact of Media and Technology in Schools
Static metadata records don’t capture changing and multiple contexts of use and applicability
  Recker and Wiley, Designing Instruction with Learning Objects
Patterns of use, informal opinions, descriptions often more useful than taxonomic classification.
  Collis and Strijker, Technology and Human Issues in Reusing Learning
Requirements of a New Approach
Represent (directly or by reference) multiple entities,
  standards
  taxonomies
  agents (user profiles and roles)
  curricula
that are contributed by multiple parties,
  users as actors
  reuse of primary resources for secondary, tertiary products
that are inter-related to express context,
  applicability to standards
  usage in curricula
  usage patterns by particular groups/people
and can be integrated with services and simulations
Information Network Overlay
[Diagram: layered architecture]
Source Layer: Data Stores, Document Repositories, Databases, Web Resources, Publisher Repositories
Network Representation Layer
Network API
Client Layer
Information Network Instance
[Diagram: graph of typed nodes joined by labeled edges]
Node types: resource, metadata, agent, service, standard
Edge labels: metadataFor, annotates, contributes, derivedFrom, transformedBy, appliesTo
Access paths shown: http, oai, SOAP, API
Translate to Technical Requirements
Rich information objects
  Integration of local and remote sources
  Mixed genre
Dynamic information objects
  Integration with local and distributed services
Graph-based information model
  Nodes are information objects
  Edges are relationships among those objects
Access and management API
  exposing full functionality for programmatic access
Fine granularity access management
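
The graph-based information model can be made concrete with a small sketch. This is an illustration only, not the architecture's API: the class, the method names, and the node identifiers are hypothetical. The edge labels are taken from the Information Network Instance diagram, and the edge directions are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class InformationNetwork:
    """Toy graph: nodes are information objects, edges are typed relationships."""
    nodes: dict = field(default_factory=dict)                       # node id -> node type
    edges: dict = field(default_factory=lambda: defaultdict(list))  # node id -> [(label, target)]

    def add_node(self, node_id, node_type):
        self.nodes[node_id] = node_type

    def relate(self, source, label, target):
        # Both ends must already be known, so no edge points at a missing node.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("unknown node in relationship")
        self.edges[source].append((label, target))

    def related(self, source, label):
        """Targets reachable from `source` over edges with the given label."""
        return [tgt for (lbl, tgt) in self.edges[source] if lbl == label]

    def context_of(self, target):
        """All (source, label) pairs pointing at `target`: its accumulated context."""
        return sorted(
            (src, lbl)
            for src, out in self.edges.items()
            for (lbl, tgt) in out
            if tgt == target
        )


net = InformationNetwork()
for node_id, node_type in [
    ("res:1", "resource"), ("md:1", "metadata"), ("note:1", "resource"),
    ("agent:teacher", "agent"), ("std:1", "standard"),
]:
    net.add_node(node_id, node_type)

net.relate("md:1", "metadataFor", "res:1")
net.relate("note:1", "annotates", "res:1")
net.relate("agent:teacher", "contributes", "note:1")
net.relate("std:1", "appliesTo", "res:1")

print(net.context_of("res:1"))
```

The point of the sketch is that context accumulates on a resource as other parties add edges, without the resource's own record changing.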
Fedora History
Cornell Research (1997-present)
  DARPA and NSF-funded research
  First reference implementation developed
  Distributed, Interoperable Repositories (experiments with CNRI)
  Policy Enforcement
First Application (1999-2001)
  University of Virginia digital library prototype
  Technical implementation: adapted to web; RDBMS storage
  Scale/stress testing for 10,000,000 objects
Open Source Software (2002-present)
  Andrew W. Mellon Foundation grants
  Technical implementation: XML and web services
  Fedora 1.0 (May 2003)
  Fedora 2.0 (Jan 2005)
Fedora Features
Digital Object Model
  Container for content and metadata
  Aggregate local and remote content
  Associate behaviors with objects (integrate content and web services)
Relationships
  Define and query object-to-object relationships
Repository web service
  Digital object storage
  Web service APIs (SOAP and REST) to manage, access, search
Objects, Representations, Relationships
[Diagram: graph of three objects linked by hasMember edges, each with hasRep edges to its representations]
Objects: info:fedora/demo:10, info:fedora/demo:11, info:fedora/demo:12
hasMember edges: two, between objects
hasRep edges point to:
  info:fedora/demo:10/bdef:1/MEMBERS
  info:fedora/demo:11/DC
  info:fedora/demo:11/THUMB
  info:fedora/demo:11/HIGH
  info:fedora/demo:11/bdef:2/ZPAN
  info:fedora/demo:12/DC
  info:fedora/demo:12/THUMB
Fedora Digital Object Model: Component View
Persistent ID (PID): digital object identifier
Reserved Datastreams: key object metadata
  Dublin Core (DC)
  Audit Trail (AUDIT)
  Relations (RELS-EXT)
Datastreams: set of content or metadata items
  Datastream
  Datastream
Disseminators: pointers to service definitions to provide service-mediated views
  Disseminator
  Default Disseminator
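
The component view can be read as a simple container type. The sketch below is illustrative and does not reproduce Fedora's internal classes or storage format: all class and field names are hypothetical, and the only values taken from the slides are the reserved datastream identifiers (DC, AUDIT, RELS-EXT) and the demo identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class Datastream:
    ds_id: str           # e.g. "DC", "THUMB"
    mime_type: str
    content: bytes = b""


@dataclass
class Disseminator:
    service_definition: str  # pointer to a service definition


@dataclass
class DigitalObject:
    pid: str                                          # persistent identifier
    datastreams: dict = field(default_factory=dict)   # ds_id -> Datastream
    disseminators: list = field(default_factory=list)

    # Reserved datastream ids named on the slide (plain class attribute, not a field).
    RESERVED = ("DC", "AUDIT", "RELS-EXT")

    def add(self, ds):
        self.datastreams[ds.ds_id] = ds

    def reserved_ids(self):
        """Reserved datastreams present on this object (key object metadata)."""
        return [ds_id for ds_id in self.RESERVED if ds_id in self.datastreams]

    def content_ids(self):
        """All other datastreams, in the order they were added."""
        return [ds_id for ds_id in self.datastreams if ds_id not in self.RESERVED]


obj = DigitalObject(pid="demo:11")
obj.add(Datastream("DC", "text/xml", b"<dc/>"))
obj.add(Datastream("THUMB", "image/gif"))
obj.add(Datastream("HIGH", "image/jpeg"))
obj.disseminators.append(Disseminator("bdef:2"))

print(obj.reserved_ids(), obj.content_ids())
```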
Simple Fedora model for aggregating static content
Representations map to datastreams
Datastreams may be local or surrogates (redirect) to remote data
REST (or SOAP) URLs provide uniform client access to representations
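
A minimal sketch of the idea that clients see one URL shape whether a datastream is stored locally or is a surrogate for remote data. The host names, the `/get/<pid>/<dsID>` URL pattern, and every function name here are assumptions made for illustration, not Fedora's documented interface.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical repository base; the URL pattern is assumed for this sketch.
BASE = "http://repo.example.org/fedora/get"


@dataclass
class Datastream:
    ds_id: str
    mime_type: str
    local_bytes: Optional[bytes] = None  # set when content is held in the repository
    remote_url: Optional[str] = None     # set when the datastream is a surrogate


def representation_url(pid, ds):
    """Client-facing URL: same shape for local and remote content."""
    return f"{BASE}/{pid}/{ds.ds_id}"


def resolve(ds):
    """Repository-side decision: serve local bytes or redirect to the remote source."""
    if ds.local_bytes is not None:
        return ("serve", ds.local_bytes)
    if ds.remote_url is not None:
        return ("redirect", ds.remote_url)
    raise ValueError(f"datastream {ds.ds_id} has no content")


dc = Datastream("DC", "text/xml", local_bytes=b"<dc/>")
high = Datastream("HIGH", "image/jpeg", remote_url="http://images.example.org/11/high.jpg")

urls = [representation_url("demo:11", ds) for ds in (dc, high)]
print(urls)
print(resolve(dc)[0], resolve(high)[0])
```

The client never learns which branch `resolve` took; that is the sense in which access is uniform.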
Simple Content Aggregation
[Diagram: one object with three hasRep edges to representations at URL1, URL2, URL3]
Datastreams: THUMB (image/gif), DC (text/xml), HIGH (image/jpeg)
Aggregating local and remote content
[Diagram: one object with three hasRep edges to representations at URL1, URL2, URL3, with an HTTP link to remote content]
Datastreams: THUMB (image/gif), DC (text/xml), HIGH (image/jpeg)
Dynamic Content
Take advantage of computational services to process content
Representations map to service-based transforms of static data
Opaque at the access level (client sees only representations, not how they are produced)
Motivating examples
  Canonical XML metadata format – XSLT to Dublin Core
  Document source in TeX, programmatic transform to PDF, PS, HTML, etc.
  Linkage of data to analysis tools
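
The first motivating example can be sketched as follows. The slides name XSLT as the transform; this sketch swaps in a plain Python function built on the standard library's ElementTree, since the point is only that a representation may be computed on request. The canonical record, its element names, and the field mapping are all invented for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# Hypothetical canonical record; element names are made up for this sketch.
CANONICAL = b"<record><name>Pendulum Lab</name><author>A. Teacher</author></record>"

# Hypothetical mapping from canonical element names to Dublin Core elements.
FIELD_MAP = {"name": "title", "author": "creator"}


def to_dublin_core(canonical_xml):
    """Service-style transform: canonical XML in, Dublin Core XML out."""
    src = ET.fromstring(canonical_xml)
    out = ET.Element("metadata")
    for child in src:
        if child.tag in FIELD_MAP:
            el = ET.SubElement(out, f"{{{DC_NS}}}{FIELD_MAP[child.tag]}")
            el.text = child.text
    return ET.tostring(out)


# Every representation is a zero-argument callable, so stored and computed
# representations are indistinguishable to the caller.
representations = {
    "CANONICAL": lambda: CANONICAL,           # static datastream
    "DC": lambda: to_dublin_core(CANONICAL),  # produced by a transform on request
}


def get_representation(name):
    return representations[name]()


dc = ET.fromstring(get_representation("DC"))
titles = [e.text for e in dc.findall(f"{{{DC_NS}}}title")]
creators = [e.text for e in dc.findall(f"{{{DC_NS}}}creator")]
print(titles, creators)
```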
Dynamic Representations
[Diagram: one object with four hasRep edges to representations at URL1, URL2, URL3, URL4, plus a service call]
Datastreams: HIGH (image/jpeg), DC (text/xml), THUMB (image/gif)
Expressing Relationships Between Objects
Object-to-object Relationships
  Ontology of common relationships (RDF schema)
  Relationships stored in special datastream (RELS-EXT)
Resource Index (RI)
  RDF-based index of repository (Kowari triple-store)
RI Search
  Powerful querying of graph of inter-related objects
  REST-based query interface (using RDQL or ITQL)
  Can be used in dynamic disseminations
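
A rough sketch of the path from RELS-EXT to a queryable index. It is a stand-in, not the Resource Index: a Python list replaces the Kowari triple store, and a wildcard pattern match replaces RDQL or ITQL. The RDF/XML shape and the relationship namespace are assumptions about how a RELS-EXT datastream might look; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Namespace assumed here for the relationship ontology.
REL = "info:fedora/fedora-system:def/relations-external#"


def rels_ext(pid, collection_pid):
    """Build an illustrative RELS-EXT document asserting collection membership."""
    return (
        f'<rdf:RDF xmlns:rdf="{RDF}" xmlns:rel="{REL}">'
        f'<rdf:Description rdf:about="info:fedora/{pid}">'
        f'<rel:isMemberOf rdf:resource="info:fedora/{collection_pid}"/>'
        f"</rdf:Description></rdf:RDF>"
    )


def triples_from(xml_text):
    """Yield (subject, predicate, object) triples from one RELS-EXT document."""
    root = ET.fromstring(xml_text)
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            predicate = prop.tag.split("}", 1)[1]  # drop the namespace part
            yield (subject, predicate, prop.get(f"{{{RDF}}}resource"))


def match(index, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in index
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    ]


# Index the relationship datastreams of two objects.
index = []
for pid in ("demo:11", "demo:12"):
    index.extend(triples_from(rels_ext(pid, "demo:10")))

members = [s for (s, _, _) in match(index, p="isMemberOf", o="info:fedora/demo:10")]
print(members)
```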
Uses of Object Relationships
Define collections (e.g., collection objects)
Assert semantic relationships among objects
Enable network overlay
  Surrogate objects referring to external entities
  Assert relationships among them
  Assert other relationships (e.g., annotations)
Fedora Relationship Ontology (RDFS)
isPartOf / hasPart
isMemberOf / hasMember
isDescriptionOf / hasDescription
hasEquivalent
… others
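
One way to read the paired names is as inverse relationships, so that asserting one direction implies the other. That reading, and treating hasEquivalent as symmetric, are assumptions of this sketch rather than statements from the slides; the function name is hypothetical.

```python
# Pairs as listed above; treating each pair as mutual inverses is an assumption.
PAIRS = [
    ("isPartOf", "hasPart"),
    ("isMemberOf", "hasMember"),
    ("isDescriptionOf", "hasDescription"),
]

INVERSE = {}
for forward, backward in PAIRS:
    INVERSE[forward] = backward
    INVERSE[backward] = forward
INVERSE["hasEquivalent"] = "hasEquivalent"  # assumed symmetric


def with_inverses(triples):
    """Return the given triples plus the inverse statement implied by each one."""
    closed = set(triples)
    for s, p, o in triples:
        if p in INVERSE:
            closed.add((o, INVERSE[p], s))
    return closed


asserted = [("info:fedora/demo:11", "isMemberOf", "info:fedora/demo:10")]
closed = with_inverses(asserted)
print(sorted(closed))
```

With this closure, a collection object can be asked for its members even when only the member objects carry the assertion.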
Deployment Plans
Production release Phase 1 – July 2005
  black box replacement for metadata repository
Future releases
  API available at public level
  Relationship building
Example 1 – Branding
Provenance of Data and Metadata
Example 2 – Aggregations
Semantic, Management, etc.
Some open questions
Scalability of this model
Management
Control – trusted actors
Cross-ontology relationships
Exposing to the user - visualization
Concluding Goals
Exploit the increasing ubiquity of digital content
Provide the architecture for adding value to underlying content
  Aggregation
  Reuse
  Integration with computational services