June 2008 Approved for Public Release, Distribution Unlimited Digital Object Storage and Retrieval (DOSR) Vision Josh Alspector

Page 1

June 2008 Approved for Public Release, Distribution Unlimited

Digital Object Storage and Retrieval (DOSR) Vision

Josh Alspector

Page 2

04/18/23 Approved for Public Release, Distribution Unlimited

Disclaimer

This presentation discusses areas of technology investigation and interest. It does not relate to any existing DARPA program, nor should it be inferred to anticipate a future DARPA program.

Page 3

The Mundaneum

In 1910 Belgians Paul Otlet and future Nobel Peace Prize laureate Henri La Fontaine opened the Palais Mondial, later renamed the Mundaneum.

The Mundaneum’s mission was to collect metadata on every book, journal, and periodical ever published and record it in a card file system that embodied what we would call a faceted classification scheme. By 1934 it contained over 15 million entries.

Unique identifiers included embedded links to related documents.

Staff responded to search requests received by post and telegraph and returned hand-copied cards by post.

In 1934 Otlet conceived a global network of “electric telescopes” that would allow people to search and browse through interlinked documents, images, audio and motion picture recordings. He wrote that, “from his armchair, everyone will hear, see, participate, will even be able to applaud, give ovations, sing in the chorus, add his cries of participation to those of all the others.”

Fatal Flaw: Scalability

Mundaneum Infrastructure (diagram labels): Documents, Images, Recordings; “Hyper-linked” Card Catalog; Human Search Engine; Telegraph and postal “network”; “Social Network” Feedback

Page 4

DOSR Vision

Create a resilient, distributed, scalable, and secure network of information that does not require a completely trusted or stable network of processing nodes [employ network overlays and advanced cryptographic techniques]

Advance the state of the art in automated metadata generation and interoperability [apply machine learning techniques]

Automatically get information where it is needed, or may be needed, using less bandwidth and processing [integrate user models, compact information retrieval encodings, and distributed content delivery]

Reliably track where information goes and where it came from [encapsulate provenance and audit information in network-maintained virtual objects]

Enable secure, resilient information storage, characterization, retrieval, and collaboration across barriers of time, geography, community of interest, technology, and administrative domain
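The slides do not say how provenance and audit information would be encapsulated. One possible reading, offered only as a sketch, is an append-only log in which every entry carries the hash of the entry before it, so that later tampering is detectable. The function names `append_event` and `chain_is_intact` and the record fields are hypothetical, not part of DOSR.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder digest for the first entry


def _digest(body: dict) -> str:
    # Hash a canonical JSON encoding so the digest is reproducible.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_event(log: list, actor: str, action: str) -> list:
    """Return a new log with one more event chained to the previous entry."""
    prev = log[-1]["digest"] if log else GENESIS
    body = {"actor": actor, "action": action, "prev": prev}
    return log + [{**body, "digest": _digest(body)}]


def chain_is_intact(log: list) -> bool:
    """Recompute every digest and check each entry points at its predecessor."""
    prev = GENESIS
    for entry in log:
        body = {"actor": entry["actor"], "action": entry["action"], "prev": entry["prev"]}
        if entry["prev"] != prev or entry["digest"] != _digest(body):
            return False
        prev = entry["digest"]
    return True
```

A log built with `append_event` passes `chain_is_intact`; changing any field of an earlier entry makes the check fail. A real system would also need signatures so that entries can be attributed to their authors, which this sketch omits.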

Object types shown: E-mail, Text files, Images, Spreadsheets, Videos, Web pages

Automated Metadata Generation

User and Data Models

What we can find defines what we can do

Photos courtesy of U.S. Army, U.S. Navy

Page 5

Hard Problems

Automated metadata extraction and generation
– DoD has many stovepipe systems with limited metadata
– Automatic extraction of metadata, especially from non-textual information, is an unsolved problem requiring some form of artificial intelligence
– Email, papers, presentations, forms, and databases do not possess a community-maintained mesh of reciprocal references, so Google-like search, relevance, and ranking algorithms do not work

Scalable security for sharable objects
– Decentralized (for scalability) key distribution systems present security challenges
– Protection from known cryptographic and corruption attacks is hard; protection from unknown attacks is harder
– Usable secure sharing (as convenient as email) is needed or the system won’t be used
– Scalable, revocable group access to synchronized, encrypted, versioned documents is essential

Scalable replicated storage and parallel data distribution
– Globally unique identifiers (GUIDs) for retrieval and update are essential, and must be unbreakable, verifiable, and afford scalable resolution of a retrievable, trackable object
– How to track fragmented and replicated objects for persistence and provenance
– Object replication for secure, scalable, high-bandwidth distribution (secure BitTorrent-style)
– Enhance resiliency and service in network-poor areas
– Respond adaptively to service degradation for high-demand data and large-scale disruptions

Personalization, intelligent agents and user models
– Intelligent agents needed to locate content near likely users, based on user models
– User models based on authorization, active input and passive tracking
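The slides call for identifiers that are verifiable but do not specify a scheme. One common way to get that property, shown here only as an illustrative sketch, is to derive the identifier from a cryptographic hash of the object's bytes, so that anyone holding the object can recheck it. The names `make_guid` and `verify` and the `sha256:` prefix are assumptions made for this example.

```python
import hashlib


def make_guid(content: bytes) -> str:
    """Derive an identifier from the object's bytes (assumed scheme: SHA-256)."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def verify(guid: str, content: bytes) -> bool:
    """Check that the retrieved bytes actually match the identifier."""
    return make_guid(content) == guid


if __name__ == "__main__":
    guid = make_guid(b"field report, version 1")
    print(verify(guid, b"field report, version 1"))  # matching bytes
    print(verify(guid, b"field report, altered"))    # altered bytes
```

A content-derived identifier changes whenever the object changes, so a system that also needs a stable name across versions would have to map that name to the current hash by some separate, trusted mechanism.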

Page 6

Key Capabilities

Architecture and protocols
– Protocols for exchanging objects, metadata, and security controls
– Mobile agents and federated requests for information

Persistence of digital objects
– Distribute replicas and coded fragments
– Global, persistent, verifiable, unique identifiers (GUIDs)
– Version-controlled, collaborative updates

Trust, security and provenance
– Authorized, authenticated access
– Decentralized encryption for scalability
– Verifiable provenance and tracking of all objects
– Resilience to attacks

Scalability
– “Scale-free” architecture
– Decentralized, peer-to-peer techniques
– Manage latency, consistency and security as scale grows

Metadata and search
– Extract metadata from video, maps, images
– Relevance feedback
– Efficient federated search

Accessibility and User Models
– User models include authorization, preferences, location, need-to-know
– Content finds you without search
– Information locally available is personally relevant
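"Coded fragments" are not defined further in the slides. As a minimal sketch of the general idea, the example below splits an object into k data fragments plus one XOR parity fragment, so that any single lost fragment can be rebuilt from the others. This is the simplest possible erasure code and is an assumption for illustration; a production system would more likely use a code that tolerates several losses. The names `encode` and `recover` are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal fragments (zero-padded) plus one parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    return fragments + [reduce(xor_bytes, fragments)]


def recover(fragments: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing fragment (None) and return the data fragments."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost fragment")
    repaired = list(fragments)
    if missing:
        present = [f for f in fragments if f is not None]
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired[:-1]  # drop the parity fragment
```

Because fragments are zero-padded to equal length, the original object length has to be stored alongside the fragments so the padding can be removed after reassembly.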

Diagram labels: Object 1, Version 1; Object 1, Version 2 update; Replicas and fragments; Retrieve latest version from closest fragments or replica; Decentralized, scalable key distribution; Scalable resources, storage and participant networks; Needed objects migrate to local server for user
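The rule "retrieve latest version from closest fragments or replica" can be read as a two-step selection: first keep only replicas holding the newest version, then choose the nearest of those. The sketch below assumes that reading and uses measured latency as a stand-in for "closest"; the `Replica` type, its fields, and `pick_replica` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Replica:
    node: str          # hypothetical node name
    version: int       # version number of the object held by this node
    latency_ms: float  # assumed proxy for network distance


def pick_replica(replicas: Iterable[Replica]) -> Replica:
    """Return the lowest-latency replica among those holding the newest version."""
    candidates = list(replicas)
    if not candidates:
        raise ValueError("no replicas available")
    latest = max(r.version for r in candidates)
    return min((r for r in candidates if r.version == latest),
               key=lambda r: r.latency_ms)
```

With this ordering a nearby but stale replica is never preferred over a more distant current one; a system that tolerates stale reads could weigh the two criteria differently.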

Page 7

Interesting Research Ongoing in…

– Automated metadata extraction
– Decentralized, self-configuring, location and routing
– Federated search
– Information retrieval
– Personalization and user models
– Proxy re-encryption
– Scalable security and PKI
– Search over encrypted indexes
– Securing resilient peer-to-peer networks

DOSR Workshop will address these areas

Page 8

Preliminary Schedule

July 15 Talks
8:30 am Opening remarks – DARPA

Architecture
8:45 am Dr. Robert Kahn - keynote address
9:15 am Dr. Peter Lucas – MAYA
9:35 am Dr. Daniel Crichton – NASA
9:55 am Break

Metadata
10:15 am Dr. Ajay Divakaran - Sarnoff Corp.
10:35 am Dr. Randal Burns - JHU
10:55 am Dr. Shmuel Peleg - HU-J
11:15 am Mr. Jason Byassee - Northrop Grumman

Security
11:35 am Dr. James Allan - U. Mass-Amherst
11:55 am Dr. Rafail Ostrovsky – UCLA
12:15 pm Lunch
1:40 pm Dr. Urs Muller - Net-Scale Tech.
2:00 pm Dr. Matt Staker - IBM Research
2:20 pm Dr. Angelos Stavrou - Global InfoTek Inc.
2:40 pm Break

User Models
3:00 pm Dr. Peter Brusilovsky – U. Pittsburgh
3:20 pm Dr. Michael Walfish - UT-Austin
3:40 pm Dr. Rafael Alonso - SET Corp.
4:00 pm Mr. Peter Haglich - Lockheed Martin

July 15 Posters
4:20 pm Break
4:40 pm Poster Session 1
5:20 pm Poster Session 2
6:00 pm Adjourn

July 16 Breakouts
9:00 am Dr. Josh Alspector - DOSR vision and breakout group instructions
9:30 am Breakout group discussions
Noon Lunch
1:30 pm Brief out Group 1
2:00 pm Brief out Group 2
2:30 Break
2:50 Brief out Group 3
3:20 Brief out Group 4
3:45 Plenary Session
4:15 Adjourn

Page 9

Levels of Success

– DoD adopts system internally
– Portions of system are made available for open-source uses by Apache
– Legal, medical, and financial records management firms adopt GUIDs, protocols, and system components
– ISPs and media companies adopt GUIDs, protocols, and system components for subscription services
– Amazon, Google and iTunes use GUIDs and protocols

Page 10

Prior Art

– Coda (CMU)
– Cooperative File System (MIT)
– FARSITE (Microsoft)
– Grid (Argonne National Laboratory)
– Lustre (now owned by Sun Microsystems)
– OceanStore (UC Berkeley)
– PASIS (CMU)
– Universal Database (Maya Design)