saa 2014 session 703

From Crawling to Walking: Improving Access to Web Archives SAA 2014 Session 703

Uploaded by rosalielack on 14-Jun-2015


DESCRIPTION

This SAA 2014 lightning talk session (Session 703, http://sched.co/1hIEcE2) highlights challenges and solutions for promoting access to and discovery of web archives. Speakers discussed descriptive strategies for integrating web archives with EAD finding aids, MARC records in library catalogs, and other discovery methods and tools.

TRANSCRIPT

Page 1: SAA 2014 session 703

From Crawling to Walking: Improving Access to Web Archives

SAA 2014 Session 703

Page 2: SAA 2014 session 703

From Crawling to Walking: Improving Access to Web Archives

1. Jane Zhang
2. Michael Paulmeno
3. Meg Tuomala
4. Benn Joseph
5. Polina Ilieva
6. Jennifer Wright
7. John Bence
8. Olga Virakhovskaya
9. Anna Perricci
10. Rick Fitzgerald
11. Rosalie Lack

Page 3: SAA 2014 session 703

Jane Zhang

Catholic University of America

Page 4: SAA 2014 session 703

Web Records, Web Archived Files, and Web Archives Access Models

Jane Zhang, Catholic University of America

Session 703 - From Crawling to Walking: Improving Access to Web Archives

SAA 2014, Washington, DC, Saturday, August 16

Page 5: SAA 2014 session 703

Introduction

Web as records

Web ARChive (WARC) files as recordkeeping formats

Web archives access models

Page 6: SAA 2014 session 703

Web Archiving Initiatives

• A survey on web archiving initiatives – Daniel Gomes et al., Foundation for National Scientific Computing, Portuguese Web Archive Team
  – International Conference on Theory and Practice of Digital Libraries 2011, 25–29 September 2011

• Wikipedia: List of Web archiving initiatives

Page 7: SAA 2014 session 703

Web Archiving Initiatives

A survey on web archiving initiatives (2011): 42 web archiving initiatives worldwide; 9 initiatives from the United States

List of Web Archiving Initiatives (July 2014): 70 web archiving initiatives worldwide; 17 initiatives from the United States

Page 8: SAA 2014 session 703

Web File Formats

2011 Worldwide Survey: The ARC and WARC formats are dominant, being used by 54% of the initiatives.

2014 List – USA: 10 out of 17 initiatives (58% of the US web archiving initiatives) were identified as using the ARC and/or WARC formats.
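Because both surveys single out ARC and WARC as the dominant formats, it may help to see what a WARC record actually contains. Under the WARC/1.0 specification (ISO 28500), each record is a version line, a block of named header fields, a blank line, and then a content block whose size is given by Content-Length. The minimal sketch below parses one hand-made record; the URL, date, and record ID are placeholders, and it deliberately ignores compression, multi-record files, folded header lines, and the older ARC layout.

```python
# Hand-made WARC/1.0 record; the URL, date and record ID are placeholders.
SAMPLE_RECORD = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: http://example.org/\r\n"
    b"WARC-Date: 2014-08-16T09:00:00Z\r\n"
    b"WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000001>\r\n"
    b"Content-Type: application/http; msgtype=response\r\n"
    b"Content-Length: 19\r\n"
    b"\r\n"
    b"HTTP/1.1 200 OK\r\n\r\n"  # the 19-byte content block (a bare HTTP response head)
    b"\r\n\r\n"                 # two CRLFs close every WARC record
)


def parse_warc_record(data: bytes) -> tuple[str, dict[str, str], bytes]:
    """Split one uncompressed WARC record into version, header fields and content block."""
    header_block, _, rest = data.partition(b"\r\n\r\n")
    lines = header_block.decode("utf-8").split("\r\n")
    version = lines[0]
    if not version.startswith("WARC/"):
        raise ValueError("not a WARC record")
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only: values such as dates and URIs contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    length = int(headers["Content-Length"])
    return version, headers, rest[:length]


version, headers, block = parse_warc_record(SAMPLE_RECORD)
print(version, headers["WARC-Type"], headers["WARC-Target-URI"], len(block))
```

In practice a dedicated WARC library would handle gzip members and record iteration; the point here is only the record layout.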

Page 9: SAA 2014 session 703

Web Archives Access Models

2011 Worldwide Survey: 89% support access to URL history; 79% enable searching metadata; 67% provide full-text search over archived content

2014 List – USA: URL history: 12 out of 17 (70%); metadata: 13 out of 17 (76%); full-text: 12 out of 17 (70%)
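The US percentages can be rechecked from the raw counts on the 2014 list. The quick check below suggests the slides truncate rather than round (12 of 17 is about 70.6%, which would round to 71%); that reading is an inference from the numbers, not something the slides state.

```python
US_TOTAL = 17  # US initiatives on the July 2014 list, per the slides

# Counts reported on the slides for the US initiatives
counts = {"ARC/WARC": 10, "URL history": 12, "Metadata": 13, "Full-text": 12}


def percent_truncated(part: int, whole: int) -> int:
    """Whole-number percentage with the fractional part dropped (integer division)."""
    return 100 * part // whole


def percent_rounded(part: int, whole: int) -> int:
    """Whole-number percentage rounded to the nearest integer."""
    return round(100 * part / whole)


for feature, n in counts.items():
    exact = 100 * n / US_TOTAL
    print(f"{feature}: {n}/{US_TOTAL} = {exact:.1f}% "
          f"(truncated {percent_truncated(n, US_TOTAL)}%, "
          f"rounded {percent_rounded(n, US_TOTAL)}%)")
```

The truncated values reproduce the slides' 58%, 70%, and 76%; rounding would give 59%, 71%, and 76%.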

Page 10: SAA 2014 session 703

Metadata: Theme-based Collections

• Collection overview, name, title, subject, abstract, language, year captured
• Site title, subject, place, language
• Collection description, keyword, filter by site title and/or file type, topic group
• Catalog records (collection or website)

Page 11: SAA 2014 session 703

Metadata: Provenance-based Collections

• Site owner, business activity, topic, sub-topic, region, country, language, year created, date archived
• Collection/series description, site title
• Keyword search, browse by agency
• Collection description, title keyword, browse by agency name, government branch, or agency expiration date
• Browse by region, then site owner

Page 12: SAA 2014 session 703

Archival

Jane Zhang @ Catholic University of America, [email protected]

Thank You

Page 13: SAA 2014 session 703

Michael Paulmeno

Delta State University

Page 14: SAA 2014 session 703

Accessing Web Archives Through the Library Catalog

By Michael Paulmeno

Page 15: SAA 2014 session 703

Overview
• Many challenges to making web archives accessible
• Archival description not fully compatible with library catalogs
• Problem not unique to web archives
• Differing metadata and content standards lead to separation between libraries and archives (i.e., silos)
• Researchers who access archives through library systems tend to use them longer¹

1 Noah Huffman, “More than Just Linking: Integrating MARC and EAD in a Single Discovery Interface at Duke, UNC-Chapel Hill, and NCSU,” 14.

Page 16: SAA 2014 session 703

The Current State of Affairs

• Collections accessed through multiple access points
• Subject headings²
• Many organizations create two descriptions and link them via the MARC 856 field; this can cause confusion³
• Yet significant discovery occurs through search engines⁴

2 Michelle Mascaro, “Controlled Access Headings in EAD Finding Aids: Current Practices in Number of and Types of Headings Assigned,” 223.
3 Noah Huffman, “More than Just Linking: Integrating MARC and EAD in a Single Discovery Interface at Duke, UNC-Chapel Hill, and NCSU,” 3–5.
4 Ibid., 13.
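The MARC 856 link mentioned above is a single data field. As a minimal sketch, the code below renders one in MARCMaker/MarcEdit-style mnemonic text: first indicator 4 (access via HTTP), second indicator 2 (a related resource rather than the resource itself), subfield $3 naming what is linked, and $u holding the URL. The finding aid URL is a placeholder, and the DataField class is a hypothetical helper, not part of any MARC library.

```python
from dataclasses import dataclass, field


@dataclass
class DataField:
    """One MARC variable data field: tag, two indicators, ordered subfields."""
    tag: str
    ind1: str
    ind2: str
    subfields: list[tuple[str, str]] = field(default_factory=list)

    def to_mnemonic(self) -> str:
        # MARCMaker-style line "=TAG  II$aValue...", with blank indicators shown as "\"
        indicators = (self.ind1 + self.ind2).replace(" ", "\\")
        subs = "".join(f"${code}{value}" for code, value in self.subfields)
        return f"={self.tag}  {indicators}{subs}"


def finding_aid_link(url: str, label: str = "Finding aid") -> DataField:
    # ind1 "4" = HTTP; ind2 "2" = related resource (the finding aid, not the website itself)
    return DataField("856", "4", "2", [("3", label), ("u", url)])


link_856 = finding_aid_link("https://example.edu/findingaids/web-archive-0001")  # hypothetical URL
print(link_856.to_mnemonic())
```

Keeping the $3 label explicit is one way to reduce the confusion the slide mentions, since the catalog display then says what the link leads to.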

Page 17: SAA 2014 session 703

Challenges to Integration
• MARC records lack detail⁵,⁶
• Archivists uncertain about readiness to adopt new standards⁷
• Many different systems (ArchivesSpace, EBSCO Discovery, Blacklight, various integrated library systems) and metadata standards
• Other challenges specific to web archives
  – e.g., how to represent a continuously accessioned resource?

5 Peter Carini and Kelcy Shepherd, “The MARC Standard and Encoded Archival Description,” 19.
6 Karen F. Gracy and Frank Lambert, “Who’s Ready to Surf the Next Wave? A Study of Perceived Challenges to Implementing New and Revised Standards for Archival Description,” 102.
7 Ibid., 117.

Page 18: SAA 2014 session 703

Towards the Future
• Increasing efforts to integrate archival description and library catalogs
  – University of Denver Penrose Library⁸
  – Triangle Research Libraries Network⁹
  – Library of Congress
  – UNC-Chapel Hill
• Adaptability is key to future collaboration
• What affects archives affects web archives as well

8 Gregory C. Colati, Katherine M. Crowe, and Elizabeth S. Meagher, “Better, Faster, Stronger: Integrating Archives Processing and Technical Services.”
9 Noah Huffman, “More than Just Linking: Integrating MARC and EAD in a Single Discovery Interface at Duke, UNC-Chapel Hill, and NCSU.”

Page 19: SAA 2014 session 703

Works Cited

• Carini, Peter, and Kelcy Shepherd. “The MARC Standard and Encoded Archival Description.” Library Hi Tech 22, no. 1 (2004): 18–27. doi:10.1108/07378830410524468.

• Colati, Gregory C., Katherine M. Crowe, and Elizabeth S. Meagher. “Better, Faster, Stronger: Integrating Archives Processing and Technical Services.” Library Resources and Technical Services 53, no. 4 (October 2009): 261–270.

• Gracy, Karen F., and Frank Lambert. “Who’s Ready to Surf the Next Wave? A Study of Perceived Challenges to Implementing New and Revised Standards for Archival Description.” The American Archivist 77, no. 1 (Spring/Summer 2014): 96–132.

• Huffman, Noah. “More than Just Linking: Integrating MARC and EAD in a Single Discovery Interface at Duke, UNC-Chapel Hill, and NCSU.” Journal for the Society of North Carolina Archivists 8, no. 2 (April 2011): 2–17.

• Mascaro, Michelle. “Controlled Access Headings in EAD Finding Aids: Current Practices in Number of and Types of Headings Assigned.” Journal of Archival Organization 9, no. 3–4 (January 2011): 208–225. doi:10.1080/15332748.2011.643690.

Page 20: SAA 2014 session 703

Meg Tuomala

Gates Archive

Page 21: SAA 2014 session 703

Different strokes for different folks: Meeting the descriptive & access needs of multiple web archive collections with minimal workflow and process change

Meg Tuomala, Assistant Archivist, Gates Archive

Formerly e-records archivist at UNC-Chapel Hill

Page 22: SAA 2014 session 703

Web archiving at UNC: context

● Started in 2013; using Archive-It
● 6 web archive collections
● Extension of / supplement to existing collections
● Special collections at UNC consolidated; archival & biblio tech services are one dept

Page 23: SAA 2014 session 703

Different folks: the collections

Archival
● Southern Historical Collection
● Southern Folklife Collection
● University Archives

Biblio
● North Carolina Collection
● Rare Book Collection
● Digital Artists’ File

Page 24: SAA 2014 session 703

Different strokes: cataloging/description & access

Archival
● Archival arrangement & description at the collection level and series level
  ○ Online finding aid
  ○ Catalog record (and EAD) in library catalog

Biblio
● Bibliographic cataloging at the item level
  ○ Catalog record in library catalog

Page 25: SAA 2014 session 703

[Diagram labels: artificial collection; catalog record in library catalog; finding aids]

Page 26: SAA 2014 session 703

Benn Joseph

Northwestern University Library

Page 27: SAA 2014 session 703

WASsup?: Describing Web Archives Using Archon

SAA, Washington, D.C., August 16, 2014

Benn Joseph, Manuscript Librarian

Northwestern University, [email protected]

Page 28: SAA 2014 session 703

Image of WAS public interface

Page 29: SAA 2014 session 703

Item record for crawled site in WAS

Page 30: SAA 2014 session 703

NU version of Archon:
• Only used for collection management
• Separate Blacklight/Solr public interface that searches and displays the finding aids
• Finding aids all live in a Fedora repository
• “Ingest EAD” button added to Archon puts the XML into Fedora, from which it is served via the finding aids portal

Page 31: SAA 2014 session 703

Screenshot: entering the container list in Archon

Page 32: SAA 2014 session 703

Entering WAS site URL as digital object in Archon

Page 33: SAA 2014 session 703

NUWA finding aid

Page 34: SAA 2014 session 703

NUWA finding aid

Page 35: SAA 2014 session 703

Finding aids exported as MODS and ingested by Primo
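The slide does not show how Northwestern maps finding-aid data to MODS, so the sketch below is only an assumed, minimal crosswalk: collection title to titleInfo/title, creator to name/namePart, a summary to abstract, and the finding-aid URL to location/url, built with Python's standard xml.etree.ElementTree. The function name and all field values are hypothetical, and how Primo harvests or normalizes the resulting records is outside its scope.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag: str) -> str:
    """Qualify a MODS element name with its namespace (Clark notation)."""
    return f"{{{MODS_NS}}}{tag}"


def finding_aid_to_mods(title: str, creator: str, abstract: str, url: str) -> ET.Element:
    """Hypothetical collection-level crosswalk from finding-aid data to MODS."""
    mods = ET.Element(q("mods"), {"version": "3.5"})
    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = title
    name = ET.SubElement(mods, q("name"))
    ET.SubElement(name, q("namePart")).text = creator
    # collection="yes" marks the record as describing a collection, not one item
    ET.SubElement(mods, q("typeOfResource"), {"collection": "yes"}).text = "mixed material"
    ET.SubElement(mods, q("abstract")).text = abstract
    location = ET.SubElement(mods, q("location"))
    ET.SubElement(location, q("url"), {"usage": "primary display"}).text = url
    return mods


record = finding_aid_to_mods(
    "Example web archive collection",  # placeholder values throughout
    "Example University",
    "Archived captures of university websites.",
    "https://example.edu/findingaids/example-web-archive",
)
print(ET.tostring(record, encoding="unicode"))
```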

Page 36: SAA 2014 session 703

Benn Joseph, Manuscript Librarian, Northwestern University, [email protected]

THE END!

Page 37: SAA 2014 session 703

Polina Ilieva

University of California, San Francisco

Page 38: SAA 2014 session 703

August 16, 2014

Polina Ilieva, UCSF Archives & Special Collections

Science Online: Evaluating usage, impact and appraisal

Page 39: SAA 2014 session 703

Why collect?

• Because they are so easily accessible, lab websites are used as reference tools by lab members
• Sharing datasets
• Channels for scholarly communications
• After funding ends, the website can be the only place where the data is preserved and available

Page 40: SAA 2014 session 703

Impact

• Not just preserved for future use: scientists need instant access
• Websites become an integral part of scientific scholarly output

Page 41: SAA 2014 session 703

Curation and Appraisal

• How to select from hundreds of labs?
• Web archive pilot project in collaboration with the library’s Research Informationist: Research @UCSF collection
• Will use UCSF Profiles: Research Networking and Expertise Mining Tool
• Collect and analyze info about faculty and researchers who lead labs: length of service/title, number of scholarly publications, availability of websites, grants and awards
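The UCSF slide lists the data points to be gathered but not how they will be weighed. Purely as an illustration, the sketch below turns those fields into a shortlist: labs without a website are dropped, and the rest are ranked by publications, then grants and awards, then years of service. The records and the ranking order are hypothetical, not UCSF's method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lab:
    name: str
    years_of_service: int    # lab head's length of service
    publications: int
    grants_and_awards: int
    website: Optional[str]   # None if the lab has no site to crawl


def shortlist(labs: list[Lab], limit: int) -> list[Lab]:
    """Hypothetical appraisal pass over lab data."""
    # A lab without a website has nothing to capture.
    candidates = [lab for lab in labs if lab.website]
    # Hypothetical ranking: publications, then grants/awards, then years of service.
    candidates.sort(
        key=lambda lab: (lab.publications, lab.grants_and_awards, lab.years_of_service),
        reverse=True,
    )
    return candidates[:limit]


labs = [
    Lab("Lab A", 12, 85, 6, "https://example.edu/lab-a"),
    Lab("Lab B", 4, 20, 2, None),
    Lab("Lab C", 20, 85, 9, "https://example.edu/lab-c"),
    Lab("Lab D", 7, 40, 1, "https://example.edu/lab-d"),
]
print([lab.name for lab in shortlist(labs, limit=2)])  # → ['Lab C', 'Lab A']
```

Labs A and C tie on publications, so the grants-and-awards count decides their order.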

Page 42: SAA 2014 session 703

What to collect?

Protocols, data, images, lectures (a/v), publications, list of lab members

Page 43: SAA 2014 session 703

Access

Page 44: SAA 2014 session 703

Access

We need to know how data and collections are used in order to find the optimal way to provide access

Page 45: SAA 2014 session 703

Thank you!

Polina E. Ilieva, CA

Head of Archives and Special Collections

University of California, San Francisco

[email protected]

Page 46: SAA 2014 session 703

Jennifer Wright

Smithsonian Institution Archives

Page 47: SAA 2014 session 703

Square Peg in a Round Hole: Integrating Web Archives into Existing Descriptive Practices

Jennifer Wright, Archives and Information Management Team Leader

SAA 2014 Session 703

[email protected]

Page 48: SAA 2014 session 703

Accession-based Collections Management

• Each transfer is a separate accession
• Each accession is cataloged separately in the CMS
• Each accession has its own finding aid

Solution for websites:

Crawls with similar dates and the same creator are combined into one accession
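The Smithsonian rule of combining crawls with similar dates and the same creator into one accession can be sketched as a grouping pass. This version assumes that "similar dates" means no more than a fixed number of days between consecutive crawls by the same creator; the 30-day window, the Crawl record, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from itertools import groupby


@dataclass
class Crawl:
    creator: str   # unit responsible for the website/blog
    url: str
    crawled: date


def group_into_accessions(crawls: list[Crawl], window_days: int = 30) -> list[list[Crawl]]:
    """Group crawls by creator, starting a new accession whenever the gap between
    consecutive crawl dates for that creator exceeds window_days (hypothetical rule)."""
    accessions: list[list[Crawl]] = []
    ordered = sorted(crawls, key=lambda c: (c.creator, c.crawled))
    for _, same_creator in groupby(ordered, key=lambda c: c.creator):
        current: list[Crawl] = []
        for crawl in same_creator:
            if current and crawl.crawled - current[-1].crawled > timedelta(days=window_days):
                accessions.append(current)
                current = []
            current.append(crawl)
        accessions.append(current)
    return accessions


crawls = [
    Crawl("Unit X", "http://example.org/x-blog", date(2013, 3, 1)),
    Crawl("Unit X", "http://example.org/x", date(2013, 3, 10)),
    Crawl("Unit X", "http://example.org/x", date(2013, 9, 1)),
    Crawl("Unit Y", "http://example.org/y", date(2013, 3, 5)),
]
for accession in group_into_accessions(crawls):
    first, last = accession[0].crawled, accession[-1].crawled
    print(accession[0].creator, len(accession), "crawl(s),", first, "to", last)
```

Because the gap is measured between consecutive crawls, a steady series of frequent crawls keeps extending one accession; a fixed calendar period would be the alternative if accessions need firm boundaries.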

Page 49: SAA 2014 session 703

Description and Cataloging

• Describes each website/blog in the accession
• Notes technical and other issues
• Includes crawl date(s)
• Indexes subjects, website/blog/exhibition titles, and other creators

Page 50: SAA 2014 session 703

EAD Finding Aid

• Includes descriptive data from CMS

• Lists each website/blog included in accession

• Uses DAO tag to link to crawl on Archive-It

Search on “Website Records” at http://siarchives.si.edu/search/sia_search_findingaids
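A minimal sketch of the per-website listing described above, using Python's xml.etree.ElementTree: each website or blog becomes a file-level `<c>` whose `<did>` carries a title, the crawl date(s), and a `<dao>` pointing at the archived copy. It assumes EAD 2002 DTD-style attributes (`href`, `linktype`, `show`); files following the EAD 2002 schema would use `xlink:href` instead. The titles and archive URLs are placeholders.

```python
import xml.etree.ElementTree as ET


def website_component(title: str, crawl_dates: str, archive_url: str) -> ET.Element:
    """Build one file-level EAD component for an archived website or blog."""
    c = ET.Element("c", {"level": "file"})
    did = ET.SubElement(c, "did")
    ET.SubElement(did, "unittitle").text = title
    ET.SubElement(did, "unitdate").text = crawl_dates
    # EAD 2002 DTD-style link attributes; schema-based EAD would use xlink:href
    ET.SubElement(did, "dao", {"linktype": "simple", "href": archive_url, "show": "new"})
    return c


# Placeholder websites included in one accession
sites = [
    ("Example exhibition website", "2013 March", "https://example.org/archive/exhibition"),
    ("Example blog", "2013 March-April", "https://example.org/archive/blog"),
]

dsc = ET.Element("dsc")
for title, dates, url in sites:
    dsc.append(website_component(title, dates, url))

print(ET.tostring(dsc, encoding="unicode"))
```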

Page 51: SAA 2014 session 703

Archive-It

• Browse URLs
• Search across all Smithsonian crawls
• Search by keyword or limiting options
• Plan to take better advantage of metadata

Smithsonian on Archive-It: https://archive-it.org/organizations/660

Page 52: SAA 2014 session 703

John Bence

Emory University

Page 53: SAA 2014 session 703

WAS GOING ON AT EMORY?

Integration of WAS-CDL web archives with MARBL online finding aids and web presence

John Bence, [email protected]

@jdbence

Page 54: SAA 2014 session 703


“Topics” for browsing sites by creator or by institutional hierarchies (Laney Graduate School; ‘Administration’)

Page 55: SAA 2014 session 703


The supplied URL from WAS is given an ID and a persistent URL, which is then linked in the <dao> element

Page 56: SAA 2014 session 703


• “Digital Materials Available” banner indicates the existence of a <dao> element
• Choosing “Series 3: Web Archives” provides a link to the WAS site for relevant content
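Emory's banner logic is described only as "a `<dao>` exists." A minimal sketch of that check, assuming the banner should appear whenever any `<dao>` occurs anywhere in the finding aid, handles both un-namespaced EAD 2002 (DTD) files and schema files in the `urn:isbn:1-931666-22-9` namespace. The function name and the minimal (not schema-valid) sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def has_digital_materials(ead_xml: str) -> bool:
    """Return True if the finding aid contains at least one <dao> element."""
    root = ET.fromstring(ead_xml)
    # Accept <dao> with no namespace (EAD 2002 DTD) or any namespace,
    # e.g. the EAD 2002 schema namespace urn:isbn:1-931666-22-9.
    return any(el.tag == "dao" or el.tag.endswith("}dao") for el in root.iter())


# Hypothetical minimal finding aids
dtd_style = (
    '<ead><archdesc level="collection"><dsc>'
    '<c level="series"><did><unittitle>Series 3: Web Archives</unittitle>'
    '<dao href="https://example.org/was/site"/></did></c>'
    "</dsc></archdesc></ead>"
)
schema_style = (
    '<ead xmlns="urn:isbn:1-931666-22-9" xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<archdesc level="collection"><dsc><c><did>'
    '<dao xlink:href="https://example.org/was/site"/>'
    "</did></c></dsc></archdesc></ead>"
)
no_dao = (
    '<ead><archdesc level="collection"><dsc><c><did>'
    "<unittitle>Correspondence</unittitle></did></c></dsc></archdesc></ead>"
)
for doc in (dtd_style, schema_style, no_dao):
    print(has_digital_materials(doc))
```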

Page 57: SAA 2014 session 703


Website migration in summer 2013 allowed for integration of the WAS search interface as a page on the MARBL website

Page 58: SAA 2014 session 703


• Next steps
  – UX testing on finding aids integration vs. local search page
  – Gather (read: develop) additional use analytics
• For more, go to:
  – http://marbl.library.emory.edu/collections/archives/web.html
  – http://findingaids.library.emory.edu/

Google Analytics for the search interface from Feb 2013 to June 2014. Page went live in June 2013.

• #1 referral: Redirected URL of single web archive

• #2 referral: MARBL website search interface

• #3 referral: finding aids database

Thanks!

Page 59: SAA 2014 session 703

Olga Virakhovskaya

Bentley Historical Library, University of Michigan

Page 60: SAA 2014 session 703

Describing <archived> web content from single sites to web archives

Olga Virakhovskaya, [email protected]

http://bentley.umich.edu/

Page 61: SAA 2014 session 703

Local subject heading (MARC field 690)

LC subject headings (MARC fields 6xx)

MARC field 260/264

MARC fields 1xx/7xx

MARC fields 520 & 545 / History & Scope and Content notes

MARC field 245
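The Bentley slide maps descriptive elements onto MARC fields. The sketch below renders a hypothetical website description into those fields as MARCMaker-style text. The tags follow the slide; the indicator values (110 in direct order, 245 with an added entry, 264 for production, 520 as scope and content, 545 as administrative history, 650 as LCSH, blanks for the local 690) are assumptions about reasonable choices, and all values are placeholders.

```python
def marc_field(tag: str, ind1: str, ind2: str, *subfields: tuple[str, str]) -> str:
    """Render one data field as MARCMaker-style text; blank indicators become a backslash."""
    indicators = (ind1 + ind2).replace(" ", "\\")
    return f"={tag}  {indicators}" + "".join(f"${code}{value}" for code, value in subfields)


def describe_site(site: dict) -> list[str]:
    """Map a (hypothetical) website description onto the fields named on the slide."""
    fields = [
        marc_field("110", "2", " ", ("a", site["creator"])),  # 1xx: creator as main entry
        marc_field("245", "1", "0", ("a", site["title"])),    # 245: title
        marc_field("264", " ", "0", ("c", site["dates"])),    # 264: production date(s)
        marc_field("520", "2", " ", ("a", site["scope"])),    # 520: scope and content
        marc_field("545", "1", " ", ("a", site["history"])),  # 545: administrative history
    ]
    fields += [marc_field("650", " ", "0", ("a", s)) for s in site["lc_subjects"]]     # 6xx: LCSH
    fields += [marc_field("690", " ", " ", ("a", s)) for s in site["local_subjects"]]  # 690: local
    return fields


site = {  # placeholder description of one archived website
    "creator": "Example Student Organization",
    "title": "Example Student Organization website",
    "dates": "2010-2014",
    "scope": "Captures of the organization's public website.",
    "history": "Example Student Organization is a campus group.",
    "lc_subjects": ["Example LC heading"],
    "local_subjects": ["Example local heading"],
}
for line in describe_site(site):
    print(line)
```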

Page 62: SAA 2014 session 703

– Think BIG
– Automate
– Follow standards
– Be consistently clear
– Communicate

…because machines don’t know everything

Page 63: SAA 2014 session 703

Anna Perricci

Columbia University Libraries

Page 64: SAA 2014 session 703

MARC records for the Contemporary Composers Web Archive

Anna Perricci, Columbia University Libraries

SAA Lightning Talk (August 16, 2014)

Page 65: SAA 2014 session 703

Web Archiving at Columbia

We’ve only got 5 minutes!

• Columbia University Libraries web archiving program precedents

• Current Mellon grant

• Collaborative web archiving

Page 66: SAA 2014 session 703

Contemporary Composers Web Archive

Selectors
• Borrow Direct Music Librarians Group: music librarians at Brown, Columbia, Cornell, Dartmouth, Harvard, Johns Hopkins, Princeton, and Yale universities, MIT, and the universities of Chicago and Pennsylvania

Cataloging expertise
• Russell Merritt (cataloger specializing in music resources)
• Kate Harcourt (Director of Original and Special Materials Cataloging)
• Alex Thurman (Web Resources Collection Coordinator)

Page 67: SAA 2014 session 703

CCWA in Archive-It

Page 68: SAA 2014 session 703

Creating MARC records for web archives

• Creating MARC records for archived websites is standard practice at CUL
  – MARC records make web archives discoverable in CLIO (Columbia Libraries Information Online)
• Collection-level and seed-level records
• Will use the Archive-It interface to make Dublin Core records
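Columbia plans to produce Dublin Core records through the Archive-It interface; that interface is not shown here. For a sense of what a seed-level Dublin Core record looks like, the sketch below builds a simple OAI-PMH-style `oai_dc` record with ElementTree and rejects anything outside the 15 Dublin Core elements. The seed values are placeholders.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)

# The 15 elements of the Dublin Core Metadata Element Set
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def seed_record(metadata: dict[str, list[str]]) -> ET.Element:
    """Build a simple oai_dc record; every element may repeat."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in metadata.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return root


seed = seed_record({
    "title": ["Example composer website"],  # placeholder values
    "creator": ["Example Composer"],
    "subject": ["Composers", "Music"],
    "identifier": ["http://example.org/"],
})
print(ET.tostring(seed, encoding="unicode"))
```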

Page 69: SAA 2014 session 703

Patron view of record in CLIO

Page 70: SAA 2014 session 703

Cataloger’s view of record in CLIO

Page 71: SAA 2014 session 703

Anticipating wider use of MARC records

• Records have been released to WorldCat

• Collaborators on cataloging were attentive to which fields will ordinarily be stripped out when a MARC record is imported to another institution’s OPAC
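One way to anticipate what survives the trip to another OPAC is to preview the record without its local fields. The sketch assumes a common convention (not a universal rule) that 59X, 69X, and 9XX tags are local and tend to be dropped on import; the record lines are hypothetical MARCMaker-style text.

```python
def is_local(tag: str) -> bool:
    """Assumed rule: 59X, 69X and 9XX tags count as local fields."""
    return tag.startswith("9") or tag[:2] in {"59", "69"}


def strip_local_fields(lines: list[str]) -> list[str]:
    """Drop local fields from MARCMaker-style lines ("=TAG  ..."); the tag is chars 1-3."""
    return [line for line in lines if not is_local(line[1:4])]


# Hypothetical record; each "\\" in the source is one backslash (a blank indicator).
record = [
    "=245  00$aExample composer website",
    "=590  \\\\$aLocal note about the archived site",
    "=650  \\0$aComposers",
    "=690  \\\\$aLocal heading",
    "=856  40$uhttp://example.org/",
    "=910  \\\\$aLocal processing code",
]
for line in strip_local_fields(record):
    print(line)
```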

Page 72: SAA 2014 session 703

Conclusions

• So far, a sample of 10 records has taught us…

• Positive feedback from music librarians

• Next, we will soon add another 44 records for the archived sites in CCWA

Page 73: SAA 2014 session 703

Thanks!

Anna Perricci, [email protected], @AnnaPerricci, Columbia University Libraries

Page 74: SAA 2014 session 703

Rick Fitzgerald

Library of Congress

Page 75: SAA 2014 session 703

Access in Transition:

Rethinking Descriptive Practices for the LC Web Archives

Page 76: SAA 2014 session 703

Migration effort

• Began in 2013, ongoing
• Move web archives from stand-alone web application at http://loc.gov/lcwa to library-wide discovery system at http://loc.gov/websites/
• Metadata and content migration
• Cross-functional team effort

Page 77: SAA 2014 session 703

Interface - before and after

Page 78: SAA 2014 session 703

New Possibilities

• Web archives discoverable alongside other LC collections for the first time
• Web archives searchable from the LC main page for the first time: greater visibility
• Consistent navigation, look and feel mirror the LC website

Page 79: SAA 2014 session 703

Integrated into search

Page 80: SAA 2014 session 703

New Challenges

• Thousands of MODS records already created for access: how to repurpose them?

• Different interfaces, different needs

• Enable new ideas (combined records)

• Keeping useful elements, old and new

Page 81: SAA 2014 session 703

Thanks!

Rick Fitzgerald ([email protected])

Page 82: SAA 2014 session 703

Rosalie Lack

California Digital Library

Web Archiving Service (WAS)

Page 83: SAA 2014 session 703

From Crawling to Walking: Improving Access to Web Archives

SAA 2014

Rosalie Lack, [email protected]

Page 84: SAA 2014 session 703

SAA Web Archiving Roundtable

Follow the blog!
• http://webarchivingrt.wordpress.com/

Learn more!
• http://www2.archivists.org/groups/web-archiving-roundtable

Page 85: SAA 2014 session 703

Tearing Down Silos

Page 86: SAA 2014 session 703

What We’re Doing

• Creating finding aids for each web archive
• Adding links to existing finding aids for the relevant archived sites
• Providing a web archive collection search page
• Uploading records into library catalogs
• Sending records to OCLC
• Building collaborative collections and providing unified access
• Integrating access with other formats in our discovery systems

Page 87: SAA 2014 session 703

What Else Should We Be Doing?

Open Discussion

Page 88: SAA 2014 session 703

Image credits
Title: The razing of silos on the former Roy Ranch, San Geronimo, California, May, 1964 [photograph]
Creator/Contributor: unknown
Date: May, 1964
Contributing Institution: Marin County Free Library
http://content.cdlib.org/ark:/13030/kt3489r96r/?order=1
http://content.cdlib.org/ark:/13030/kt067nf0kk/?order=1
http://content.cdlib.org/ark:/13030/kt467nf1dq/?order=1

Page 89: SAA 2014 session 703

Thank you!