Content Working Group
Archive-It Partner Meeting
November 18, 2014
“Barrage balloon manufacture...” by Alfred T. Palmer under public domain
Nicholas Taylor (@nullhandle)
Web Archiving Service Manager
Stanford University Libraries
A Snapshot of the U.S. Web Archiving Landscape through the 2013 NDSA Survey Report
NDSA Web Archiving Survey Working Group
• Jefferson Bailey, Internet Archive / Archive-It
• Kristine Hanna, Internet Archive / Archive-It
• Edward McCain, University of Missouri
• Cathy Hartman, University of North Texas
• Abbie Grotke, Library of Congress
• Christie Moffatt, National Library of Medicine
• Nicholas Taylor, Stanford University
NDSA Web Archiving survey background
2011
• 78 respondents
• program info
• tools and services
• access
• policies
2013
• 92 respondents
• program info
  • staff time, metrics, skills, content concerns
• tools and services
• access and discovery
  • new discovery options
• policies
  • embargo, social media, robots.txt, resources
Respondent Characteristics
“Lego People” by Scoobay under CC BY-NC-SA 2.0
less than half repeat respondents
[Figure: overlap between 2011 and 2013 respondents; values shown: 56, 33, 40]
universities still make up most programs
2011
• College or University: 47%
• Archive: 13%
• State Gov: 13%
• Other: 12%
• Fed Gov: 8%
• Museum: 3%
• Commercial: 2%
• Public Library: 2%
2013
• College or University: 52%
• Archive: 15%
• State Gov: 13%
• Other: 8%
• Fed Gov: 5%
• Commercial: 4%
• Public Library: 2%
• Museum: 1%
Archive-It and SAA top group affiliations
[Table: group affiliation, 2011 vs. 2013; group names not recoverable. Values shown: 8% and 7%; 31% and 33%; 45%; 72% and 71%]
most programs are fractionally staffed
less than 25% FTE
25% FTE
40-50% FTE
1 FTE
1 to 3 FTE
3.5 to 15 FTE
web/archiving tech savviness are key skills
[Bar chart: percentage of organizations citing each skill; skill labels not recoverable. Values shown: 39%, 37%, 24%, 21%, 21%, 10%, 6%, 6%]
data volume and archive use are key metrics
Percentage of organizations:
• Volume: 53%
• Usage: 47%
• Cost: 22%
• Quality: 20%
• Buy-in: 8%
• Loss: 4%
• Policy: 4%
Maturity and Progress
“Apple Mouse Evolution” by raneko under CC BY 2.0
programs have matured slightly since 2011
• Active: 64% (2011), 72% (2013)
• Testing: 16% (2011), 14% (2013)
• Planning: 17% (2011), 9% (2013)
• No longer collecting: 4% (2011), 2% (2013)
strong perceptions of progress since 2011
• Significant progress: 40%
• Some progress: 36%
• About the same: 20%
• Slightly worse off: 2%
• Much worse off: 2%
many new programs since 2011
Number of organizations by program start year:
1996: 1, 1997: 0, 1998: 3, 1999: 0, 2000: 2, 2001: 1
2002: 2, 2003: 0, 2004: 2, 2005: 3, 2006: 8, 2007: 6
2008: 5, 2009: 4, 2010: 6, 2011: 7, 2012: 12, 2013: 19
two-thirds of them now use Archive-It
Number of organizations by program start year (other organizations / Archive-It partners as of 2013):
1996: 0 / 1, 1997: 0 / 0, 1998: 1 / 2, 1999: 0 / 0, 2000: 2 / 0, 2001: 0 / 1
2002: 1 / 1, 2003: 0 / 0, 2004: 1 / 1, 2005: 0 / 3, 2006: 3 / 5, 2007: 3 / 3
2008: 1 / 4, 2009: 2 / 2, 2010: 4 / 2, 2011: 2 / 5, 2012: 6 / 6, 2013: 4 / 15
Archiving Focus
“Ant Farm Media Van v.08 (Time Capsule) in Bellewether at Southern Exposure” by Steve Rhodes under CC BY-NC-SA 2.0
more programs are only self-archiving
• Archive other sites only: 31% (2011), 15% (2013)
• Archive both: 49% (2011), 48% (2013)
• Archive own site only: 20% (2011), 37% (2013)
concern about social media, databases, video
Number of organizations:
• Social Media: 69
• Databases: 65
• Video: 64
• Interactive Media: 49
• Audio: 40
• Blogs: 32
• Art: 16
untapped interest in collaboration
2011
• Yes: 21%
• No: 72%
• 7% (category label not recoverable)
2013
• Yes: 17%
• No: 47%
• Not yet, but interested: 33%
• Don't know: 2%
Tools and Services
“Photocopier” by Joriel "Joz" Jimenez under CC BY-NC-ND 2.0
web archiving as a service still most popular
• External: 60% (2011), 63% (2013)
• In-house: 25% (2011), 20% (2013)
• Both: 14% (2011), 16% (2013)
data not transferred from service provider
• Transferred: 19% (2011), 20% (2013)
• Haven't transferred: 81% (2011), 80% (2013)
increased use of tools supporting W/ARC
• Supports W/ARC: 24% (2011), 38% (2013)
• Doesn't support W/ARC: 76% (2011), 62% (2013)
less granular descriptive metadata
[Bar chart: levels of descriptive metadata, 2011 vs. 2013; category labels not recoverable. Values shown: 62%, 66%, 47%, 55%, 30%, 36%, 54%, 60%, 43%, 50%, 22%, 18%, 20%, 5%, 20%]
Archiving Policies
“Handle With Care” by ServInt under CC BY-NC-ND 2.0
most don’t notify or seek permission
[Bar chart: number of organizations taking no action, notifying, or requesting permission before each activity (capture, provide restricted access, provide public access); pairing of values to bars not recoverable. Values shown: 42, 42, 45, 17, 7, 11, 14, 13, 15]
more conditional handling of robots.txt
• Always respect robots.txt: 38% (2011), 22% (2013)
• Sometimes/conditionally respect robots.txt: 33% (2011), 55% (2013)
• Never respect robots.txt: 8% (2011), 8% (2013)
• Don't know: 21% (2011), 16% (2013)
social media archiving policies are uncommon
• Has social media archiving policy: 24%
• Lacks social media archiving policy: 76%
policies based on community practices
Percentage of organizations:
• Other organizations: 54%
• ARL Code of Best Practices: 40%
• Section 108 Study Group: 25%
• Counsel or service provider: 11%
• Oakland Archive Policy: 5%
• Statute: 5%
• Don't know: 7%
Landscape Summary
“Mt Baldy from Box Springs Mountain wi Theodolite” by signal mirror under CC BY 2.0
profile of the average survey respondent
• university archive
• started in last three years
• Archive-It user
• ¼ FTE web-savvy archivist
• concerned w/ content capture, cost, and use
• broad level of description
• ambivalent about collaboration
“Container” by Glyn Lowe under CC BY 2.0
maturity and convergence
• maturity
  • 75% cite some or significant progress since 2011
  • 38% started programs since 2011
  • 8% more programs in active status since 2011
• convergence
  • 79% using external service providers
  • 81% devoting ½ FTE or less to web archiving
  • 67% rely on community practices for policy-making
  • 13% more using Wayback since 2011
challenges and opportunities
• challenges
  • 53% concerned about data volume growth
  • 47% concerned about fostering access
  • more than 73% concerned about content capture
• opportunities
  • 33% interested but not yet involved in collaborations
  • 76% lack social media archiving policies
  • less than 23% of archived materials are described
implications and questions
• implications
  • web archiving not (yet) a top institutional priority
  • demand for ongoing Archive-It technical investment
  • U.S. web archiving landscape is changing quickly
• questions
  • how to build institutional support?
  • collaboration with whom and on what?
  • what’s not being archived?
  • how well are we curating what we do archive?
Nicholas Taylor (@nullhandle)
“Thank You” by vistamommy under CC BY 2.0