Information Network Overlay Architecture
Adding Value to Digital Content

Carl Lagoze
CS 431 – May 4, 2005
Cornell University



Overview of the Talk

Digital Libraries for search & access
Beyond Access: Adding value to digital content
Information Network Overlay Architecture
Implementing the Architecture

Digital Libraries – Ingest Focus

Input Phase Research Questions

Indexing and search
  non-textual
  cross language
Preservation
Scale issues
  everything becomes hard at mega-scale
OCR
  especially non-Roman
Workflow
  getting stuff in cheaply/reliably
Intellectual property
  hard enough at intra-national level
Description
  Metadata issues

Digital Libraries – Federation Phase

Z39.50

Dienst

SDLIP

OAI-PMH

SRW/SRU

Federation Phase Research Questions

Heterogeneity
State Maintenance
Reliability
  Network level
  Management level
Ranking

We have been very successful!

So, are we done?

The primary goal of digital libraries has often been misconstrued as providing accessibility to a massive volume of resources. The real opportunity is to reestablish the library as a collaborative place where people learn from each other and organize around ideas and knowledge.

Opportunities: Not the same old information flow

Suppliers (Publishers)
Intermediaries (Librarians)
Consumers

…Towards a participatory information environment

[Figure labels: Shared Information Context; Producers; Consumers; Experts; Novices; Professionals; Data; Information; Knowledge; Wisdom; description; IP; preservation; modeling]

Digital Libraries: Beyond Search and Access

Build on foundation of near universal access
Provide context for:
  Content aggregation: combining information entities in novel ways
  Knowledge integration: capturing semantic relationships between information entities
  Information reuse: allowing secondary, tertiary products
  Information transformation: combining information entities with computational services
  Collaboration and contribution: blurring the line between authors, publishers, users, experts…

[Figure labels: Information Foundation; Value-add, customized Projections]

NSDL Context

A bit of NSDL background

Mission: “Improve Science, Math, Engineering education through digital libraries”
Original NSDL solicitation in 1999
Over 180 projects funded
Core integration (Columbia, Cornell, UCAR) charged with providing organizational, technical infrastructure
Funding through 2006
http://www.nsdl.org

Existing Metadata-Centric Approach

The metadata repository is a resource for service providers.
It holds information about every collection and item known to the NSDL.

[Figure labels: Users; Services; Metadata repository; Collections; OAI-PMH (two links)]

Characteristics of the Metadata Repository

Oracle database
Qualified Dublin Core
Item records with collection association
OAI-PMH ingest and exposure
Current collection ~ 800,000
Metadata quality issues
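The "OAI-PMH ingest and exposure" item can be made concrete with a small sketch. It builds a ListRecords request and pulls Dublin Core titles out of a response. The base URL, the sample record, and the helper names are illustrative assumptions; nothing here is taken from the NSDL deployment. The sketch uses the simple `oai_dc` format that the protocol requires of every repository, not the qualified Dublin Core the repository stores.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def record_titles(response_xml):
    """Map each record identifier to the Dublin Core titles in a response."""
    root = ET.fromstring(response_xml)
    titles = {}
    for record in root.iter("{%s}record" % OAI_NS):
        identifier = record.findtext(
            "{%s}header/{%s}identifier" % (OAI_NS, OAI_NS)
        )
        titles[identifier] = [t.text for t in record.iter("{%s}title" % DC_NS)]
    return titles


# Hypothetical response body, hand-written for this example.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:item-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample item</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://repository.example.org/oai")
titles = record_titles(SAMPLE_RESPONSE)
```

A real harvester would also follow resumption tokens and handle protocol errors; both are left out to keep the sketch short.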

Problems in this approach

Mere access does not equate to value
  Reeves, Impact of Media and Technology in Schools
Static metadata records don't capture changing and multiple contexts of use and applicability
  Recker and Wiley, Designing Instruction with Learning Objects
Patterns of use, informal opinions, descriptions often more useful than taxonomic classification.
  Collis and Strijker, Technology and Human Issues in Reusing Learning

Requirements of a New Approach

Represent (directly or by reference) multiple entities,
  standards
  taxonomies
  agents (user profiles and roles)
  curricula
that are contributed by multiple parties,
  users as actors
  reuse of primary resources for secondary, tertiary products
that are inter-related to express context,
  applicability to standards
  usage in curricula
  usage patterns by particular groups/people
and can be integrated with services and simulations

Information Network Overlay

[Figure labels: Source Layer (Data Stores, Document Repositories, Databases, Web Resources, Publisher Repositories); Network Representation Layer; Network API; Client Layer]

Information Network Instance

[Figure labels: nodes resource, metadata, agent, service, standard; edges metadataFor, annotates, contributes, derivedFrom, transformedBy, appliesTo; access via http, oai, SOAP, API]

Translate to Technical Requirements

Rich information objects
  Integration of local and remote sources
  Mixed genre
Dynamic information objects
  Integration with local and distributed services
Graph-based information model
  Nodes are information objects
  Edges are relationships among those objects
Access and management API
  exposing full functionality for programmatic access
Fine granularity access management
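The graph-based information model above can be sketched as a small directed, labeled graph. This is a minimal illustration, not the architecture's actual data structure. The class name, the node identifiers, and the edge directions are assumptions; only the relationship names (metadataFor, annotates, contributes) come from the slides.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class InformationNetwork:
    """Directed, labeled graph: nodes are information objects, edges are relationships."""
    nodes: dict = field(default_factory=dict)  # node id -> kind
    edges: dict = field(default_factory=lambda: defaultdict(set))  # (source, label) -> targets

    def add_node(self, node_id, kind):
        self.nodes[node_id] = kind

    def relate(self, source, label, target):
        # Both endpoints must already be known information objects.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("unknown node in relationship")
        self.edges[(source, label)].add(target)

    def targets(self, source, label):
        """Objects reached from `source` along edges named `label`."""
        return set(self.edges.get((source, label), set()))

    def sources(self, label, target):
        """Objects that point at `target` along edges named `label`."""
        return {s for (s, l), ts in self.edges.items() if l == label and target in ts}


# Hypothetical identifiers; edge directions are a guess at the figure.
net = InformationNetwork()
net.add_node("resource:1", "resource")
net.add_node("metadata:1", "metadata")
net.add_node("annotation:1", "resource")
net.add_node("agent:1", "agent")
net.relate("metadata:1", "metadataFor", "resource:1")
net.relate("annotation:1", "annotates", "resource:1")
net.relate("agent:1", "contributes", "annotation:1")
```

Asking `net.sources("annotates", "resource:1")` then returns every annotation attached to the resource, whoever contributed it. That is the kind of contextual lookup a flat metadata record cannot express.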

Fedora History

Cornell Research (1997-present)
  DARPA and NSF-funded research
  First reference implementation developed
  Distributed, Interoperable Repositories (experiments with CNRI)
  Policy Enforcement
First Application (1999-2001)
  University of Virginia digital library prototype
  Technical implementation: adapted to web; RDBMS storage
  Scale/stress testing for 10,000,000 objects
Open Source Software (2002-present)
  Andrew W. Mellon Foundation grants
  Technical implementation: XML and web services
  Fedora 1.0 (May 2003)
  Fedora 2.0 (Jan 2005)

Fedora Features

Digital Object Model
  Container for content and metadata
  Aggregate local and remote content
  Associate behaviors with objects (integrate content and web services)
Relationships
  Define and query object-to-object relationships
Repository web service
  Digital object storage
  Web service APIs (SOAP and REST) to manage, access, search

Objects, Representations, Relationships

[Figure: objects info:fedora/demo:10, info:fedora/demo:11, info:fedora/demo:12; representations info:fedora/demo:10/bdef:1/MEMBERS, info:fedora/demo:11/DC, info:fedora/demo:11/THUMB, info:fedora/demo:11/HIGH, info:fedora/demo:11/bdef:2/ZPAN, info:fedora/demo:12/DC, info:fedora/demo:12/THUMB; edge labels hasRep, hasMember]

Fedora Digital Object Model Component View

Persistent ID (PID): digital object identifier
Reserved datastreams: key object metadata
Datastreams: set of content or metadata items
Disseminators: pointers to service definitions to provide service-mediated views

[Figure: component boxes labeled Persistent ID (PID), Dublin Core (DC), Datastream, Datastream, Audit Trail (AUDIT), Relations (RELS-EXT), Disseminator, Default Disseminator]

Simple Content Aggregation

Simple Fedora model for aggregating static content
Representations map to datastreams
Datastreams may be local or surrogates (redirect) to remote data
REST (or SOAP) URLs provide uniform client access to representations

[Figure: one object with three hasRep edges to URL1, URL2, URL3, backed by datastreams THUMB (image/gif), DC (text/xml), HIGH (image/jpeg)]

Aggregating local and remote content

[Figure: one object with three hasRep edges to URL1, URL2, URL3, backed by datastreams THUMB (image/gif), DC (text/xml), HIGH (image/jpeg); one datastream is fetched over HTTP]
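The aggregation model in the last two slides can be sketched as an object whose datastreams are either stored locally or redirect to remote data, while the client always sees the same kind of representation identifier. This is a minimal illustration and not Fedora's implementation. The class names, the `resolve` method, and the remote URL are assumptions; the `info:fedora/<pid>/<datastream id>` form is copied from the figure labels.

```python
from dataclasses import dataclass, field


@dataclass
class Datastream:
    ds_id: str             # e.g. "THUMB"
    mime_type: str         # e.g. "image/gif"
    content: bytes = b""   # bytes held locally by the repository
    remote_url: str = ""   # set for a surrogate (redirect) datastream

    @property
    def is_remote(self):
        return bool(self.remote_url)


@dataclass
class DigitalObject:
    pid: str
    datastreams: dict = field(default_factory=dict)

    def add(self, datastream):
        self.datastreams[datastream.ds_id] = datastream

    def representation_uri(self, ds_id):
        # URI form taken from the slides: info:fedora/<pid>/<datastream id>
        if ds_id not in self.datastreams:
            raise KeyError(ds_id)
        return f"info:fedora/{self.pid}/{ds_id}"

    def resolve(self, ds_id):
        """Return ("local", bytes) or ("redirect", url) for a representation."""
        ds = self.datastreams[ds_id]
        return ("redirect", ds.remote_url) if ds.is_remote else ("local", ds.content)


# Hypothetical object mixing two local datastreams with one remote surrogate.
obj = DigitalObject("demo:11")
obj.add(Datastream("DC", "text/xml", content=b"<dc/>"))
obj.add(Datastream("THUMB", "image/gif", content=b"GIF89a"))
obj.add(Datastream("HIGH", "image/jpeg", remote_url="http://images.example.org/high.jpg"))
```

The client-facing identifier for HIGH has the same shape as the one for DC. Only `resolve` knows that one is served from local storage and the other by redirect.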

Dynamic Content

Take advantage of computational services to process content
Representations map to service-based transforms of static data
Opaque at the access level (client sees only representations, not how they are produced)
Motivating examples
  Canonical XML metadata format – XSLT to Dublin Core
  Document source in TeX, programmatic transform to PDF, PS, HTML, etc.
  Linkage of data to analysis tools

Dynamic Representations

[Figure: one object with four hasRep edges to URL1, URL2, URL3, URL4; datastreams HIGH (image/jpeg), DC (text/xml), THUMB (image/gif); one representation is produced by a service call]
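The idea that a representation may be a service-based transform of stored data, invisible to the client, can be sketched as follows. A plain Python function stands in for the XSLT service mentioned in the motivating examples. The canonical record format, the tag mapping, and all class and method names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def to_dublin_core(canonical_xml):
    """Stand-in for an XSLT service: map a made-up canonical record to DC elements."""
    record = ET.fromstring(canonical_xml)
    dc = ET.Element("dc")
    for source_tag, dc_tag in (("name", "title"), ("author", "creator")):
        for element in record.findall(source_tag):
            ET.SubElement(dc, dc_tag).text = element.text
    return ET.tostring(dc, encoding="unicode")


class DynamicObject:
    def __init__(self, pid):
        self.pid = pid
        self._static = {}    # representation id -> stored content
        self._dynamic = {}   # representation id -> (source id, transform)

    def add_datastream(self, rep_id, content):
        self._static[rep_id] = content

    def add_service_view(self, rep_id, source_id, transform):
        self._dynamic[rep_id] = (source_id, transform)

    def representations(self):
        # Clients see one flat list, with no hint of which entries are computed.
        return sorted(set(self._static) | set(self._dynamic))

    def get(self, rep_id):
        if rep_id in self._dynamic:
            source_id, transform = self._dynamic[rep_id]
            return transform(self._static[source_id])
        return self._static[rep_id]


obj = DynamicObject("demo:11")
obj.add_datastream(
    "CANONICAL",
    "<record><name>Sample item</name><author>A. Author</author></record>",
)
obj.add_service_view("DC", "CANONICAL", to_dublin_core)
```

Calling `obj.get("DC")` looks the same to the caller as fetching a stored datastream, which is the opacity the slide describes.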

Expressing Relationships Between Objects

Object-to-object Relationships
  Ontology of common relationships (RDF schema)
  Relationships stored in special datastream (RELS-EXT)
Resource Index (RI)
  RDF-based index of repository (Kowari triple-store)
RI Search
  Powerful querying of graph of inter-related objects
  REST-based query interface (using RDQL or ITQL)
  Can be used in dynamic disseminations
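To give a feel for what querying a graph of inter-related objects means, here is a toy in-memory triple matcher. It is not Kowari, and it implements neither RDQL nor ITQL. The function names are invented, and the triples are an assumed reading of the earlier "Objects, Representations, Relationships" figure, in particular the guess that demo:10 is the object holding the hasMember edges.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Assumed edges; the figure's exact wiring is not recoverable from the transcript.
TRIPLES = [
    ("info:fedora/demo:10", "hasMember", "info:fedora/demo:11"),
    ("info:fedora/demo:10", "hasMember", "info:fedora/demo:12"),
    ("info:fedora/demo:11", "hasRep", "info:fedora/demo:11/DC"),
    ("info:fedora/demo:11", "hasRep", "info:fedora/demo:11/THUMB"),
    ("info:fedora/demo:12", "hasRep", "info:fedora/demo:12/DC"),
]


def member_representations(triples, collection):
    """Two-step graph query: representations of every member of a collection."""
    reps = []
    for _, _, member in match(triples, subject=collection, predicate="hasMember"):
        reps.extend(o for _, _, o in match(triples, subject=member, predicate="hasRep"))
    return reps


reps = member_representations(TRIPLES, "info:fedora/demo:10")
```

A triple store answers the same kind of join declaratively and over an index, where this sketch scans a list.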

Uses of Object Relationships

Define collections (e.g., collection objects)
Assert semantic relationships among objects
Enable network overlay
  Surrogate objects referring to external entities
  Assert relationships among them
  Assert other relationships (e.g., annotations)

Fedora Relationship Ontology (RDFS)

isPartOf / hasPart
isMemberOf / hasMember
isDescriptionOf / hasDescription
hasEquivalent
… others
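The slash-separated pairs above read naturally as inverse relationships. Under that assumption (the slides do not say whether the repository derives inverses automatically), a small sketch can add the inverse statement for each asserted triple. The function name and the sample triple are illustrative.

```python
# Pairs as listed on the slide; treating them as inverses is an assumption.
INVERSE = {
    "isPartOf": "hasPart",
    "isMemberOf": "hasMember",
    "isDescriptionOf": "hasDescription",
}
# Make the lookup work in both directions.
INVERSE.update({v: k for k, v in list(INVERSE.items())})


def with_inverses(triples):
    """Add the inverse statement for every triple whose predicate has a named inverse."""
    closed = set(triples)
    for s, p, o in triples:
        if p in INVERSE:
            closed.add((o, INVERSE[p], s))
    return closed


asserted = {("info:fedora/demo:11", "isMemberOf", "info:fedora/demo:10")}
closed = with_inverses(asserted)
```

Predicates with no listed partner, such as hasEquivalent, pass through unchanged.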

Deployment Plans

Production release Phase 1 – July 2005
  black box replacement for metadata repository
Future releases
  API available at public level
  Relationship building

Example 1 – Branding
Provenance of Data and Metadata

Example 2 – Aggregations
Semantic, Management, etc.

Some open questions

Scalability of this model
Management
Control – trusted actors
Cross-ontology relationships
Exposing to the user - visualization

Concluding Goals

Exploit the increasing ubiquity of digital content
Provide the architecture for adding value to underlying content
  Aggregation
  Reuse
  Integration with computational services