The Web Archiving Service
Tracy Seneca, California Digital Library
California Digital Library, New York University, University of North Texas
National Digital Information Infrastructure and Preservation Program (NDIIPP), Library of Congress
and the Web-at-Risk NDIIPP Project
Overview
1. Web archiving: what & why
2. Web-at-Risk grant: scope & purpose
3. Web Archiving Service: sample screens
Web archiving: what & why
“Web Archiving”: Assumptions
• Using automated methods to gather web content
• Building some kind of collection composed of more than one site
• Intent on preserving captured content
• Results are searchable
  – Public access may not be available
How is the material at risk?
• Vulnerability of:
  – Digital publications
  – Web publications
  – Government web publications
  – Local government web publications
The Ephemeral Web
Issues Unique to Government and Political Web Documents
• Publication & notification streams
• Elections, political change
• Security vs. freedom of information
• Local agencies often don’t have the resources to archive their own publications
Web-at-Risk grant: scope & purpose
Grant Scope: Jan 2005 – Jun 2009
• Build tools to allow librarians to capture, curate and preserve web-based government and political information.
  – Create topical and event-based archives
  – Capture individual sites and documents
• Assess the impact of these tools on traditional collection development practices.
• Explore web archiving service sustainability.
Project Partners
Web-at-Risk Collections
Beyond the Grant
• Support web archiving for the University of California
  – Enable collaboration across campuses
  – Enable collaboration between librarians and researchers/faculty
Web Archiving Service (WAS)
• Tangible outcome of grant work
• Being developed and released over a series of pilot tests
• Pilot test 5 underway until May 23
• 2008–2009: develop rights management and public access features
WAS Production
• Early summer 2008: Web Archiving Service goes into ‘limited’ production
  – Available 24/7 to the curators who have taken part in the pilot tests so far
• Expand the user community within UC as CDL confirms that WAS infrastructure, user support and training are sufficient.
Web Archiving Service: Workflow and Sample Screens
WAS workflow: Project > Site > Capture > Collection
• Set up a project (usually a topic or event)
• Define the sites to capture
• Run single or multiple captures of each site
• Choose which results to add to a single, searchable collection
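The four-level workflow can be pictured as a small data model. The sketch below is a hypothetical Python illustration, not WAS's actual implementation: the class names `Project`, `Site`, `Capture` and `Collection` mirror the slide, but every field and method (`add_site`, `run_capture`, `add`) is an assumption, and the crawl itself is stubbed out by passing in the captured URLs.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Capture:
    """One crawl of a site, holding the URLs fetched in that run (hypothetical)."""
    site_url: str
    started: datetime
    urls: set[str] = field(default_factory=set)


@dataclass
class Site:
    """A site the curator has chosen to capture; it accumulates captures."""
    url: str
    captures: list[Capture] = field(default_factory=list)

    def run_capture(self, started: datetime, urls: set[str]) -> Capture:
        # A real service would run a crawler here; this sketch takes the URLs as input.
        capture = Capture(self.url, started, set(urls))
        self.captures.append(capture)
        return capture


@dataclass
class Collection:
    """A single searchable set of captures selected by the curator."""
    name: str
    captures: list[Capture] = field(default_factory=list)

    def add(self, capture: Capture) -> None:
        self.captures.append(capture)


@dataclass
class Project:
    """Top level of the workflow, usually a topic or event."""
    topic: str
    sites: list[Site] = field(default_factory=list)
    collections: list[Collection] = field(default_factory=list)

    def add_site(self, url: str) -> Site:
        site = Site(url)
        self.sites.append(site)
        return site
```

In this sketch captures stay attached to their site, and a collection only references the captures a curator picks, which matches the slide's point that choosing results for the searchable collection is a separate step from capturing.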
Capture sites individually
Set Frequency
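A capture frequency ultimately turns into concrete capture dates. This is a hypothetical sketch: the frequency names (`once`, `daily`, `weekly`, `monthly`) and the `schedule` and `add_months` functions are assumptions for illustration, not WAS settings.

```python
import calendar
from datetime import datetime, timedelta

# Hypothetical frequency names; the actual WAS options are not described in the slides.
FREQUENCIES = {"once", "daily", "weekly", "monthly"}


def add_months(when: datetime, months: int) -> datetime:
    """Shift a datetime by whole months, clamping the day to the target month's length."""
    month_index = when.month - 1 + months
    year = when.year + month_index // 12
    month = month_index % 12 + 1
    day = min(when.day, calendar.monthrange(year, month)[1])
    return when.replace(year=year, month=month, day=day)


def schedule(start: datetime, frequency: str, count: int) -> list[datetime]:
    """Return up to `count` capture times, each offset from `start`."""
    if frequency not in FREQUENCIES:
        raise ValueError(f"unknown frequency: {frequency!r}")
    if frequency == "once":
        return [start] if count > 0 else []
    times = []
    for i in range(count):
        if frequency == "daily":
            times.append(start + timedelta(days=i))
        elif frequency == "weekly":
            times.append(start + timedelta(weeks=i))
        else:  # monthly: offset from start, not from the previous date, to avoid drift
            times.append(add_months(start, i))
    return times
```

Because monthly dates are computed from the start date, a schedule beginning on January 31 falls on the last day of shorter months and returns to the 31st afterwards instead of drifting earlier.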
Add metadata (or not)
Sites can be captured in batches
When Capture Finishes
Display Results (QA capture effectiveness)
Display Results: Overview & Reports
Display Results: Full Text Search
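Full-text search over captured pages is commonly backed by an inverted index that maps each term to the pages containing it. The sketch below assumes the captured pages are available as a URL-to-text mapping; the `tokenize`, `build_index` and `search` functions, the simple tokenizer and the boolean AND query are illustrative simplifications, not the search engine WAS uses.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of captured URLs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```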
Display Results
Display Results (metadata)
Create Collection
Build Collection (add entire captures)
Build Collection
WAS features for analysis
• It’s impossible to know what a web site ‘contains’ until after you capture it!
• Tools for understanding where the data comes from and how it has changed.
What’s the nature of this content?
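One way to answer this question is to tally a capture's crawl records by MIME type and by host, which also shows where the data comes from. The `(url, mime_type, size)` record shape and the `content_report` function are hypothetical; a real crawl log would need parsing first.

```python
from collections import Counter
from urllib.parse import urlsplit


def content_report(records):
    """Summarize a capture by MIME type (count and bytes) and by host.

    `records` is an iterable of (url, mime_type, size_in_bytes) tuples,
    an assumed shape for this sketch.
    """
    by_type = Counter()
    bytes_by_type = Counter()
    by_host = Counter()
    for url, mime_type, size in records:
        by_type[mime_type] += 1
        bytes_by_type[mime_type] += size
        # hostname is lowercased by urlsplit, so hosts group case-insensitively
        by_host[urlsplit(url).hostname] += 1
    return by_type, bytes_by_type, by_host
```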
What new publications are in this capture?
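Comparing two captures of the same site amounts to a set difference over their URLs, plus a check for changed content on URLs present in both. This sketch assumes each capture is a mapping from URL to a content digest (such as a SHA-1 of the response body); `compare_captures`, `new_publications` and the default file suffixes are illustrative assumptions, not the logic behind the WAS "Compare" screen.

```python
from urllib.parse import urlsplit


def compare_captures(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify URLs as added, removed, or changed between two captures."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(u for u in new.keys() & old.keys() if new[u] != old[u])
    return {"added": added, "removed": removed, "changed": changed}


def new_publications(old: dict[str, str], new: dict[str, str],
                     suffixes: tuple[str, ...] = (".pdf", ".doc", ".xls")) -> list[str]:
    """Added URLs whose path looks like a document (hypothetical suffix heuristic)."""
    return [u for u in compare_captures(old, new)["added"]
            if urlsplit(u).path.lower().endswith(suffixes)]
```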
Build Collection (select files from “Compare” screen)
How volatile is this site? (Not yet available)
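Since this feature was not yet available, the following is only one possible way to measure volatility, under stated assumptions: each capture is a mapping from URL to content digest, and volatility is the mean share of URLs added, removed or changed between consecutive captures. The `change_ratio` and `volatility` functions are hypothetical.

```python
def change_ratio(old: dict[str, str], new: dict[str, str]) -> float:
    """Share of URLs added, removed, or changed between two captures (0.0 to 1.0)."""
    union = old.keys() | new.keys()
    if not union:
        return 0.0
    # A URL missing from one capture yields None on that side, so it counts as differing.
    differing = sum(1 for u in union if old.get(u) != new.get(u))
    return differing / len(union)


def volatility(captures: list[dict[str, str]]) -> float:
    """Mean change ratio across consecutive captures, ordered oldest first."""
    pairs = list(zip(captures, captures[1:]))
    if not pairs:
        return 0.0
    return sum(change_ratio(a, b) for a, b in pairs) / len(pairs)
```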
Potential
• We can now capture the “chit chat” – the popular reaction to historic events, in ways never before possible.
• How will researchers interact with captured content once it is in an archive?
  – Visualization
  – Text analysis
• What is the potential, beyond simple search and display?
Web Archive Visualization
Doantam Phan – Stanford University
Questions?
Web-at-Risk Wiki
http://wiki.cdlib.org/WebAtRisk
YouTube video: “Web-at-Risk Collections”