Update on Memento (IIPC 2011 Plenary)
Memento Update
2011 IIPC General Assembly, The Hague
Update on Memento
http://www.mementoweb.org/
Herbert Van de Sompel, Robert Sanderson, Michael L. Nelson
This research is funded by the Library of Congress
Towards Seamless Navigation of the Web of the Past
Overview of Memento Framework
Deployment Progress
Memento and Discovery
Memento and Branding
Alternative Web Archiving Strategies
Overview of Memento Framework
Memento wants to make it easy to navigate the Web of the Past.
Tate Online Today
Select Date March 16 2008
Tate Online March 16 2008
From National Archives
Versions: Web vs CMS
Content Management Systems
• Designed to be aware of all versions of a resource
• Self-contained
• Variety of proprietary version mechanisms
• Versions interlinked using proprietary mechanisms
World Wide Web
• Designed to forget about prior versions of a resource
• Highly distributed
• No standard version mechanisms
• Standardized interlinking mechanisms
Versions are not Integrated
The Web architecture has a hard time dealing with the versions that do exist:
• Cannot talk about a resource as it used to exist
• Cannot access a prior version given the current one
• Cannot access the current version given a prior one
Memento Framework
• Regards the Web as a big Content Management System
• Introduces a uniform capability to access versions on the Web
• Does not build new archives but leverages all systems that host versions
Memento Framework
• Is distributed: versions may exist on several servers
• Uses time as a global version indicator
• Is based on the primitives of the Web: resource, resource state, representation, content negotiation, link
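Because the framework uses time as the version indicator and content negotiation as the mechanism, a client asks for a past state by sending the desired datetime in an Accept-Datetime request header. The following is a minimal sketch of building that header, not a full client; the function name is an illustrative assumption, and the date is the one used in the Tate Online example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_headers(when: datetime) -> dict:
    """Return request headers asking for the state of a resource at `when`.

    `when` must be timezone-aware; it is converted to GMT because HTTP
    dates are expressed in GMT.
    """
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return {"Accept-Datetime": http_date}


headers = accept_datetime_headers(datetime(2008, 3, 16, tzinfo=timezone.utc))
print(headers)  # {'Accept-Datetime': 'Sun, 16 Mar 2008 00:00:00 GMT'}
```

These headers would be sent with an ordinary GET to a TimeGate, which is expected to answer with (or redirect to) the version closest to the requested datetime.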
Memento Interaction Overview
Original Resource and Versions
Bridge from Present to Past
Bridge from Past to Present
Memento Framework
Framework with Multiple Archives
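In Memento, the two bridges are carried by typed links in HTTP Link headers: an Original Resource points to a TimeGate with rel="timegate" (present to past), and a Memento points back with rel="original" (past to present). The sketch below groups Link targets by relation type. It is a simplified parser that assumes quoted parameter values, and the URIs in the usage example are hypothetical.

```python
import re

# One link-value: <target> followed by zero or more ; name="value" parameters.
_LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*"[^"]*")*)')
_REL = re.compile(r'\brel\s*=\s*"([^"]*)"')


def links_by_rel(link_header: str) -> dict[str, list[str]]:
    """Group the target URIs of an HTTP Link header by relation type."""
    found: dict[str, list[str]] = {}
    for target, params in _LINK.findall(link_header):
        rel = _REL.search(params)
        if rel is None:
            continue
        # A rel value may hold several space-separated relation types.
        for rel_type in rel.group(1).split():
            found.setdefault(rel_type, []).append(target)
    return found


# Hypothetical header, as a Memento might return it.
example = (
    '<http://example.org/page>; rel="original", '
    '<http://archive.example.org/timegate/http://example.org/page>; rel="timegate"'
)
links = links_by_rel(example)
```

Matching whole link-values with a regular expression, rather than splitting on commas, keeps parameters such as datetime values (which contain commas) intact.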
Deployment Progress
Significant progress has been made towards seamless navigation of the Web of the Past.
Standardization
• Standardization process started via the IETF
• Interest from IETF and W3C
• Encouraged by major Web architects, including Tim Berners-Lee, Mark Nottingham, and Michael Hausenblas
https://datatracker.ietf.org/doc/draft-vandesompel-memento/
Memento Clients
• Several client tools developed by us and others
• Add-ons for Firefox (operational) and Internet Explorer (experimental)
• Applications for Android (operational) and iPhone/iPad (in development)
• Paper in the current issue of Code4Lib Journal
http://www.mementoweb.org/tools/
Memento Server Support
• Memento-compliant Wayback software:
• In use by Internet Archive
• Available to Web archives worldwide
• Please experiment with this new 1.6 version!
http://www.mementoweb.org/tools/
Memento Server Support (2)
• Plug-in for MediaWiki (operational)
• Used on W3C’s main wiki
• Please install it for your MediaWiki!
http://www.mementoweb.org/tools/
Memento Server Validator
• Server-side client:
• Attempts to perform all Memento actions against a given URI
• Reports success/failure of the interactions and warnings for optional aspects
• Kept up to date with the IETF Internet Draft
http://www.mementoweb.org/tools/validator/
Memento Proxy Support
• Several systems that host Mementos have been made Memento-compliant “by proxy”:
• Many Web Archives that do not yet run Memento-compliant software
• 3,000+ MediaWiki systems, including Wikipedia and Wikia
• We would love all of these to become natively Memento-compliant!
Memento Web Site
• Ongoing effort to add materials that support understanding and adoption:
• Introduction to Memento
• How to recognize Mementos, TimeGates, Original Resources?
• Guidelines for servers that host Mementos (Web Archives, CMS, snapshot archives, etc.)
http://www.mementoweb.org/guide/
Funding
• 2007-2010: US $250K grant from Library of Congress
• Approx. $50K on Memento
• 2010-2011: US $1 Million follow-up grant from Library of Congress
• For: Specification, outreach, tool development, further research
Memento and Discovery
Very few Web sites provide a “timegate” link.
Need additional mechanisms to support Discovery.
Batch Discovery: TimeMaps
A TimeMap minimally lists:
• URI and datetime of Mementos known to an archive
• URI of Original Resource
TimeMaps can be aggregated across systems that host Mementos
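The minimal TimeMap content and the aggregation step can be sketched as a small data structure. This is an illustration under stated assumptions, not the serialization defined by the Memento specification: the class and function names are hypothetical, and aggregation is taken to mean merging the Memento lists of several archives for the same Original Resource.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Memento:
    uri: str            # where the archived version lives
    captured: datetime  # datetime of that version


@dataclass
class TimeMap:
    original: str             # URI of the Original Resource
    mementos: list[Memento]   # versions known to one archive


def aggregate(timemaps: list[TimeMap]) -> TimeMap:
    """Merge TimeMaps for the same Original Resource into one, oldest first."""
    originals = {tm.original for tm in timemaps}
    if len(originals) != 1:
        raise ValueError("TimeMaps must describe exactly one Original Resource")
    # A set drops Mementos that several archives report identically.
    merged = {m for tm in timemaps for m in tm.mementos}
    ordered = sorted(merged, key=lambda m: (m.captured, m.uri))
    return TimeMap(originals.pop(), ordered)
```

Sorting by datetime makes the merged list directly usable for picking the version closest to a requested datetime, regardless of which archive holds it.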
Batch Discovery: Feed of TimeMaps
A system that hosts Mementos exposes a Feed of TimeMaps so that applications can remain in sync with its collection:
• One Atom entry per Original Resource
• The entry links to or includes a TimeMap
• The entry's updated element changes when additional Mementos become available
• The ID of the entry is a tag URI based on the URI of the Original Resource
• Can be protected, and include license information
• Could be anonymized by an aggregating service
Batch Discovery: robots.txt
• robots.txt file is used by Web servers to convey crawling policies
• Add directives to support discovery of TimeGates and Feeds of TimeMaps
TimeGate: http://dutch.archive.org/timegate/
Archived: .nl
TimeGate: http://all.archive.org/timegate/
Archived: *
TimeMapFeed: http://dutch.archive.org/feed/feed1.xml
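A client could read these proposed directives with a few lines of parsing. The sketch below rests on two assumptions that the slide does not spell out: each directive sits on its own line, and an Archived value applies to the TimeGate directive that precedes it. The function name and result layout are illustrative only.

```python
def parse_memento_directives(robots_txt: str) -> dict:
    """Collect the proposed TimeGate, Archived and TimeMapFeed directives."""
    timegates = []  # each item: {"uri": ..., "archived": [...]}
    feeds = []
    for raw in robots_txt.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue
        # Split on the first colon only, so the colon in "http://" survives.
        name, value = (part.strip() for part in line.split(":", 1))
        key = name.lower()
        if key == "timegate":
            timegates.append({"uri": value, "archived": []})
        elif key == "archived" and timegates:
            # Assumption: scope applies to the most recent TimeGate.
            timegates[-1]["archived"].append(value)
        elif key == "timemapfeed":
            feeds.append(value)
    return {"timegates": timegates, "feeds": feeds}


sample = """\
TimeGate: http://dutch.archive.org/timegate/
Archived: .nl
TimeGate: http://all.archive.org/timegate/
Archived: *
TimeMapFeed: http://dutch.archive.org/feed/feed1.xml
"""
result = parse_memento_directives(sample)
```

Other robots.txt lines, such as User-agent and Disallow, are ignored, so the function can be run over a complete file.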
Memento and Branding
Memento can recreate pages using resources from different archives.
This poses a branding challenge.
Current Branding Practice for Web Archives
Page and embedded resources from same Web Archive
Branding for page and embedded resources from single archive
Branding for Web Archives in Memento Mode
Will be researched
Page and embedded resources from various Web Archives
HTML's branding
No branding
No branding
Alternative Web Archiving Strategies
Crawl-based Archives host distinct observations.
Transactional Archives never miss an update.
Crawl-Based Web Archives
Distinct Observations are Archived for Many Servers
Server-Side Transactional Web Archives
Entire Change History is Archived for a Single Server
Development of Transactional Web Archive Software
Access:
• Online, real-time access via Memento TimeGates
• Batch export via WARC files for long-term preservation
Capture:
• Apache connection filter module captures URI, headers, body
• POSTs in real time to transactional archive
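The capture flow (record URI, headers and body, then POST to the archive) can be illustrated in Python, although the software described here is an Apache module. Everything specific in this sketch is an assumption made for illustration: the endpoint URL, the JSON payload layout and the function name are hypothetical and do not describe the actual module's wire format.

```python
import base64
import json
from datetime import datetime, timezone
from urllib.request import Request

# Hypothetical archive endpoint, for illustration only.
ARCHIVE_ENDPOINT = "http://archive.example.org/capture"


def capture_request(uri, headers, body, observed=None):
    """Build (but do not send) a POST carrying one served response."""
    observed = observed or datetime.now(timezone.utc)
    record = {
        "uri": uri,
        "datetime": observed.isoformat(),
        "headers": headers,
        # Bodies may be binary, so they are base64-encoded for JSON transport.
        "body": base64.b64encode(body).decode("ascii"),
    }
    return Request(
        ARCHIVE_ENDPOINT,
        data=json.dumps(record).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it. Building and sending are kept separate so that the capture step can be inspected without a running archive.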
Update on Memento http://mementoweb.org/
Herbert Van de Sompel, Robert Sanderson, Michael L. Nelson
Towards Seamless Navigation of the Web of the Past