RLG Programs
Measuring Uniqueness in System-wide Book Holdings: Implications for Collection Management
Constance Malpas
Program Officer
RLG Programs
RLG Programs Managing Last Copies
CCDO Meeting, ALA Midwinter – 12 January 2008
This presentation
Summarizes recent data-mining efforts by OCLC Programs and Research
System-wide sample (Summer 2007 – Spring 2008)
ARL unique print books (Autumn 2007)
Suggests implications for collection managers
Outlines next steps for RLG Programs
An opportunity to discuss what additional evidence and analysis is needed
What we mean by ‘last copy’
Monographic title uniquely-held by a single WorldCat contributor
  Cf. ‘single copy’ repositories, where ‘last copy’ is relative to local/group holdings
May represent a last manifestation, expression or work
  Bibliographic records describe manifestations, not copies; unique manifestations are the point of departure for analysis
Some are intrinsically unique; others are rendered unique by erosion of system-wide holdings
  Historical data may help document increased copy or work-level availability, but weren’t included in the studies presented here
Distribution of uniquely-held print books in ARL member institutions
[Figure: bar chart. Y-axis: unique titles, 0 to 700,000. X-axis: ARL member institution; sampled labels run from LC, Yale, Alberta and Columbia to Case Western, Manitoba and Howard.]
Distribution of wealth: ARL unique books
A classic Pareto distribution
20% of the population holds >75% of unique titles
Median institutional holdings = 19K titles
institutional excellence?
(or) a “network effect?”
N = 6.95 M titles
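The concentration and median figures above can be checked with a few lines of code. The following is a minimal sketch only: the per-institution counts are hypothetical placeholders (not the study data), and the function name top_share is an assumption introduced for illustration.

```python
from statistics import median


def top_share(holdings, top_fraction=0.20):
    """Share of all unique titles held by the largest `top_fraction` of institutions."""
    if not holdings:
        raise ValueError("holdings must not be empty")
    ranked = sorted(holdings, reverse=True)
    # Round to the nearest whole institution, but always count at least one.
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical per-institution counts of uniquely-held titles (NOT the study data).
example_holdings = [
    600_000, 250_000, 60_000, 30_000, 19_000,
    19_000, 12_000, 6_000, 3_000, 1_000,
]

share = top_share(example_holdings)
mid = median(example_holdings)
print(f"Top 20% of institutions hold {share:.0%} of unique titles")
print(f"Median institutional holdings: {mid:,.0f} titles")
```

Run against real per-institution counts, the same two numbers would show how far the distribution departs from an even spread.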
Why focus on uniquely-held titles?
“Scarcity is common”
  limited redundancy in holdings = limited preservation guarantee, limited opportunity to create economies of scale by aggregating supply
Research institutions bear the brunt of responsibility for long-term preservation of, and access to, unique titles
  Academic and independent research libraries hold up to 70% of aggregate unique print book collection
Continuing costs of managing (storing, providing access to) print collections are high; use is generally declining
  Space pressure on physical plant (on-campus, remote) is high; understanding distribution and characteristics of unique holdings can inform decisions about disposition of physical collection
Increased attention to stewardship of special collections
  ARL SCWG, CLIR, LC Task Force on Bibliographic Control – new attention to what constitutes ‘special’ collections, appropriate standards of care, modes and metrics of use
Challenges
Identification requires group / network view of holdings
  WorldCat provides a reasonable proxy for system-wide collection
Some materials (MSS, theses and dissertations, etc.) are intrinsically unique; not all can be algorithmically identified in MARC records
  hybrid approach combines computational and manual analysis of bibliographic data
Sparse bibliographic records impede efficient work/title matching, may introduce spurious measure of uniqueness
  external sources (including Google) sometimes helpful in filling gaps
Non-English titles (especially transliterated non-roman scripts) are particularly difficult to match
  we resisted the temptation to exclude these
Study I: System-wide Sampling
250 randomly selected, uniquely-held titles
  Limited to printed books (including theses) published before 2005
  English-language cataloging only
  Iterative re-sampling required to fill gaps
Independently reviewed by three project staff
  Level of uniqueness
  Material type
Results periodically collated for group analysis
  Compare results of individual analysis for consistency
  Seek consensus on difficult cases – relatively few of these
  Re-sample as necessary to fill gaps
White paper anticipated March 2008
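The sampling procedure described on this slide (random draw, eligibility limits, re-sampling to fill gaps) might be sketched as follows. This is an illustration under stated assumptions, not the project's actual method: the record fields (format, pub_year, cataloging_language), the generated records, and the function names are all hypothetical.

```python
import random


def eligible(record):
    """Apply the stated limits: printed book, published before 2005, English-language cataloging."""
    return (
        record["format"] == "print_book"
        and record["pub_year"] < 2005
        and record["cataloging_language"] == "eng"
    )


def draw_sample(records, size, rng):
    """Draw random records in batches, keeping eligible ones until `size` is reached."""
    pool = list(records)
    rng.shuffle(pool)
    sample = []
    while pool and len(sample) < size:
        # Draw only as many candidates as are still needed, then
        # re-sample on the next pass to fill gaps left by rejected records.
        needed = size - len(sample)
        batch, pool = pool[:needed], pool[needed:]
        sample.extend(r for r in batch if eligible(r))
    return sample


# Hypothetical uniquely-held records (field names and values are assumptions).
records = [
    {
        "id": i,
        "format": "print_book" if i % 5 else "serial",
        "pub_year": 1900 + (i % 110),
        "cataloging_language": "eng" if i % 7 else "fre",
    }
    for i in range(5000)
]

rng = random.Random(2007)  # fixed seed so the draw is repeatable
sample = draw_sample(records, 250, rng)
print(len(sample), "eligible titles sampled")
```

Shuffling once and consuming the pool in batches means no record is drawn twice, so the final sample contains 250 distinct titles whenever enough eligible records exist.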
Study II: ARL uniquely-held books
Ad hoc analysis by RLG Programs, prompted by IMLS Connecting to Collections grant announcement
How might the existing evidence base be used to focus regional preservation investments?
Based on January 2007 snapshot of WorldCat database: 13M records for titles (6.95M print books) uniquely held by ARL institutions; 300+ OCLC symbols; 123 institutions
Iterative analysis examined relative impact of theses/dissertations and recent imprints on system-wide uniqueness; regional and institutional distribution of holdings
Findings shared with ARL Special Collections Working Group (October 2007) and selected RLG partner institutions (UC; CIC; ReCAP; Harvard; ASU; NYU)
Heritage Preservation willing to share Heritage Health survey data for cross-tabulation on as-needed basis
Limitations
Current studies limited to printed books – serials and special collections are excluded; only a partial measure of uniqueness in system-wide collection
Incomplete representation of world book collection; for non-English titles especially, uniqueness of North American holdings is only relative
Cataloging backlogs of up to 5 years mean that holdings for recent acquisitions are imperfectly reflected
Incomplete coverage of rare books and special collections prior to (ongoing) integration of RLG Union Catalog
Our findings – distribution of unique titles
Research and academic libraries hold >70% of aggregate unique print book collection
  while value and utility of these holdings may be widely distributed across the library community, holdings are concentrated at institutions with a research / teaching / learning mandate
  limited data on aggregate use, sources of demand
Institutional distribution of unique holdings is highly skewed, with a handful of libraries holding a majority share of collective assets
  ARL unique print book holdings range from 400 to 600K titles per institution; median holdings = 19K titles
  generally, institutions with large collections hold more unique materials – but absolute size of collection is not an indicator of relative uniqueness
Unique titles by library type
Based on a randomly selected sample of 250 uniquely-held print book titles in WorldCat (Jan. 2007)
[Figure: pie chart. Shares and legend entries are paired here in the order extracted.]
ARL: 50%
Academic (non-ARL): 27%
Gov't: 6%
State and National: 6%
Special: 4%
Public: 4%
Unknown: 2%
Networks: 1%
Distribution of Unique Print Books in ARL Member Institutions
[Figure: bar chart. Y-axis: 0 to 700,000. X-axis: ARL member institution; sampled labels run from LC, Michigan and NAL to Texas Tech, McMaster and Queen's. Series: Print Books; Excluding theses; Pub'd before 2000.]
National libraries and institutions with deep collections and an aggressive approach to collecting and cataloging new monographs – LC, Harvard, Libraries & Archives Canada – have an exceptional range of unique holdings
Unique Print Books in ARL Institutions
CRL’s focus on theses and dissertations is evident – most uniqueness is attributable to these holdings
Institutions with younger collections, actively seeking to increase scope of coverage – NCSU, Temple – are building uniqueness in new titles
Content-type Distributions: CRL and ARL
[Figure: stacked bar chart, 0% to 100%. Bars: Center for Research Libraries; ARL aggregate collection. Series: Unique theses; Unique print books pub'd 2000 and after; Unique print books pub'd before 2000.]
Intrinsically unique content, “only copies”
May include “first copies” in cataloging queue; uniqueness subject to rapid erosion
Our findings – levels of uniqueness
~60% of titles represent unique works
  Ex: Report and recommendation … on a proposed loan … equivalent to US$70 million to the … Islamic Republic of Pakistan for a power plant efficiency improvement project (1987) – World Bank report held by George Washington University
~15% of titles represent unique manifestations
  Ex: Gallipolis … an account of the French five hundred and of the town they established … compiled by Workers of the Writers' program of the Work projects administration (1940) – microform pamphlet held by Yale University; related manifestations at 40 libraries
~5% of titles represent unique expressions
  Ex: E.J. Luck. A pedigree of the families Luck, Lock and Lee (1908) – book held by Massanutten Regional Library, VA; similar title (Luck, Lock) by same author, pub'd in 1900, held at LC
~20% of titles not unambiguously unique: duplicate or near-duplicate records can be found in WorldCat
  Ex: K. Kimura. Edo no akebono (1956) – book held by Harvard Yenching; apparent duplicate (cataloged with original scripts) held by Waseda, Yale
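Because these shares are approximate, it can help to see roughly how many sampled titles each represents and how wide the sampling uncertainty is. The sketch below assumes the levels were assigned on the 250-title sample from Study I (the slides do not state this explicitly) and uses a simple normal-approximation interval, which is rough for the smallest category.

```python
from math import sqrt


def proportion_interval(p, n, z=1.96):
    """Normal-approximation (Wald) interval for a sample proportion p from n items."""
    margin = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - margin), min(1.0, p + margin)


SAMPLE_SIZE = 250  # assumption: levels of uniqueness were judged on the 250-title sample

# Approximate shares as reported on the slide.
levels = {
    "unique works": 0.60,
    "unique manifestations": 0.15,
    "unique expressions": 0.05,
    "not unambiguously unique": 0.20,
}

for label, p in levels.items():
    low, high = proportion_interval(p, SAMPLE_SIZE)
    print(f"{label}: ~{p * SAMPLE_SIZE:.0f} titles, roughly {low:.1%} to {high:.1%}")
```

With a sample of this size the interval around the largest category spans several percentage points in each direction, which is consistent with the slide reporting rounded, approximate shares.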
Our findings – content characterization
Material types
~35% are books (>50pp)
  most appear to be non-fiction titles, less likely to have additional manifestations
~20% theses and dissertations
  many at Master’s level – unlikely to be held beyond issuing institution
~15% government documents
  mostly federal and state, may be duplicated in depositories
~10% pamphlets
  unique content, but rarely useful in isolation
~10% analytics; single articles or issues bound as a separate volume
  non-unique content
<5% early imprints
  lost treasures?
Small numbers of by-laws, scripts, legal briefs, minutes, etc.
Implications
Institutions with significant unique holdings may benefit from ‘splitting the difference’ between unique works and manifestations
unique manifestations and analytics should be judged with an eye to provenance history; unless they contribute to local distinctiveness, immediate action may not be warranted
A preliminary sort by material type may help guide local decision-making regarding the physical disposition of unique holdings
pamphlets and technical reports may be candidates for cataloging enhancement and storage transfer; books may be short-listed for digitization and/or transfer to special collections
Institutions with smaller unique print book collections may benefit from collective action to aggregate supply (through effective disclosure) and demand (through special resource-sharing and digitization initiatives) around specific topical and disciplinary interests
local collections gain in significance when presented in context with related holdings
Recommendations
Adopt a nuanced understanding of ‘relative uniqueness’ when assessing local holdings
Unique manifestations may not represent unique intellectual content, but may have other value
  As artifacts: special collections
  As a networked resource: increased availability
Unique works may gain relevance and value when presented as part of a larger disciplinary or topical collection
  Theses and dissertations may benefit from special discovery tools, integration in local scholarly communications initiatives
  Pamphlets and technical reports may be virtually aggregated for specific communities of use
Maximize disclosure of unique holdings to increase their impact and value
Focus on use and utility of unique holdings to ensure long-term preservation, enduring value to parent institution
What’s Next . . .
Holdings validation study will examine a sample of scarcely-held (<5 copies) US imprints in North American research libraries
Compare current WorldCat holdings to historical holdings – looking for signs of collection erosion; elimination of local backlogs (diminishing uniqueness)
Compare local holdings to current WorldCat holdings – location changes/storage transfers, withdrawals
Assess impact of local preservation actions on system-wide holdings (availability, condition) and potential value of ‘full disclosure’
Collaborative effort with RLG partner institutions anticipated Spring/Summer 2008
Some closing observations
Opportunities
Large research libraries hold a wealth of unique materials – long tail resources with broad potential audience
Aggregated bibliographic data supports programmatic analysis and enrichment – work-level clustering, identification of duplicates
Largest institutions, with enduring commitments to retention and access, hold majority of potential ‘at risk’ titles
Challenges
Libraries ill-equipped to measure potential demand for unique holdings
Technical and social infrastructure for aggregating supply is lacking
University presses are potential distribution partners, but alliances are weak
Questions, Comments?
‘Managing the Collective Collection’ work agenda
  Data-mining for management intelligence
  Shared print collections
http://www.oclc.org/programs/ourwork/collectivecoll
Midwinter RLG Update Session
1:30-3:30
Marriott 302-304
Contact: Constance Malpas
Program Officer
N = 5.9M titles
Median institutional holdings = 96K unique titles