CNI Fall Task Force Presentation: Copyright and Large-scale Digitization: Implications for Access, by Merrilee Proffitt and Constance Malpas with RLG Programs
RLG Programs
Copyright and Large-scale Digitization: Implications for Access
Merrilee Proffitt, Constance Malpas, RLG Programs
CNI Fall Task Force, Washington, DC, 10 December 2007
RLG Programs Copyright and Large-scale Digitization
CNI Fall Task Force Meeting - 10 December 2007
This presentation:
- Summarizes findings from conversations with RLG Programs Partners regarding copyright assessment practice
- Considers the implications of these practices in light of:
  - what we know about the system-wide book collection (‘supply’)
  - what we can observe about need for and use of that collection (‘demand’)
  - speculation about how increased discoverability of digitized text may affect use (and management) of library print collections
Interviews with RLG Programs Partners
- 8 interviewees; some (not all) engaged in mass digitization
- All identify “high-risk materials” in order to eliminate them from the pool, and focus on making as much low-risk content available as possible
  - Books published in the US before 1923
  - Not a lot of effort devoted to this work at this time
  - Some well-established numbers from the University of Michigan on costs for the “low-hanging fruit” and for identifying low-risk materials published up to 1963
- Left aside: riskier materials published up to 1963; materials published outside the US; materials published after 1963
1923-1963: How much? What’s the impact on research and teaching?
- Based on a January 2007 snapshot of WorldCat, we can estimate that ~15% of US imprints were published between 1923 and 1963 (~2M titles)
- Independent studies at Stanford and Michigan suggest that ~30% of US imprints from this period are in copyright; up to 70% may be in the public domain
- An optimistic scenario: ~2M * 0.70 = ~1.4M titles
  - Add to this the pre-1923 books already in the public domain (est. ~15% of US imprints, or another ~2M titles); optimistically, a total of ~3.4M titles, or the volume equivalent of a mid-level ARL collection
- Suppose we go as far as we can with this: what’s the likely impact?
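A minimal Python sketch of the optimistic-scenario arithmetic, using the rounded figures from this slide (~2M titles in each 15% slice of US imprints, up to 70% of 1923-1963 titles in the public domain). The function name and parameters are hypothetical; integer percentages keep the arithmetic exact.

```python
def optimistic_public_domain_titles(titles_1923_1963: int,
                                    pd_share_pct: int,
                                    titles_pre_1923: int) -> int:
    """Estimate US-imprint titles that could optimistically be opened up.

    Hypothetical helper: integer percentages avoid float rounding.
    """
    # Portion of the 1923-1963 cohort assumed to be in the public domain
    newly_cleared = titles_1923_1963 * pd_share_pct // 100
    # Pre-1923 US imprints are already in the public domain
    return titles_pre_1923 + newly_cleared


# Rounded figures from the slide: ~2M titles per 15% slice, up to 70% PD
total = optimistic_public_domain_titles(2_000_000, 70, 2_000_000)
print(f"{total:,}")  # 3,400,000
```

Varying `pd_share_pct` shows how sensitive the ~3.4M figure is to the renewal-study estimate.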
Supply: the system-wide book collection
Based on historical samples of monographic titles in the WorldCat database:
- 15-20% published (anywhere) before 1923; ~10-14M titles
- 15% published (anywhere) 1923-1963; ~10M titles
US imprints only (i.e., the titles for which North American libraries might reasonably expect to undertake copyright assessment efforts), based on a random sample of 1000 monographic titles:
- 15% published before 1923: public domain
- 15% published 1923-1963: moderate risk/effort
- 36% published 1964-1988: high risk/effort
- 27% published 1989 or later: greatest risk/effort
- 7% ambiguous publication data: unknown risk/effort
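The US-imprint breakdown above amounts to bucketing each sampled record by publication year. The Python sketch below assumes inclusive year boundaries (the slide's "after 1989" is read as 1989 or later) and a hypothetical input of bare publication years, with None standing in for ambiguous publication data; it illustrates the bucketing, not the procedure used to draw the sample.

```python
from collections import Counter
from typing import Dict, List, Optional

# Risk bands used for the 1000-record sample of US imprints.
# Inclusive year boundaries are an assumption; the slide only says
# "before 1923", "1923-1963", "1964-1988" and "after 1989".
BANDS = [
    (None, 1922, "public domain"),
    (1923, 1963, "moderate risk/effort"),
    (1964, 1988, "high risk/effort"),
    (1989, None, "greatest risk/effort"),
]


def risk_band(pub_year: Optional[int]) -> str:
    """Map a publication year to a copyright-assessment risk band."""
    if pub_year is None:
        # Ambiguous publication data in the catalog record
        return "unknown risk/effort"
    for start, end, label in BANDS:
        if (start is None or pub_year >= start) and (end is None or pub_year <= end):
            return label
    raise AssertionError("bands cover every year")  # not reached


def band_shares(years: List[Optional[int]]) -> Dict[str, float]:
    """Percent of sampled titles falling in each risk band."""
    counts = Counter(risk_band(y) for y in years)
    return {label: 100 * n / len(years) for label, n in counts.items()}


# Toy sample for illustration only (not drawn from WorldCat)
print(band_shares([1850, 1930, 1970, 1995, None]))
```

Applied to a real sample, the shares returned by `band_shares` correspond to the percentages listed on the slide.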
Increasing risk = increased reward?
Distribution of Content by US Copyright Regime, based on a random sample of US imprints
Books published between 1923 and 1963 are only part of the picture
[Figure: US imprints in 1000 record sample, titles in sample by decade of publication (1700s to 2000-2007), grouped by period of publication. Annotations: 200 years of production, 15% of sample; 4 decades, 15% of sample; 13 yrs, 17%; 10 yrs, 19%; 18 yrs, 27%]
[Figure: US imprints in 1000 record sample, titles in sample by decade of publication]
~74% of US books will require more work and other players
Optimistically, ~26% of US imprints could be made accessible with some research
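The ~26% / ~74% split on this slide appears to follow from the sample shares: all pre-1923 US imprints (15%) plus up to 70% of the 1923-1963 slice (15%). A small sketch of that reading, with hypothetical names:

```python
def accessible_share_pct(pre_1923_pct: float,
                         mid_century_pct: float,
                         mid_century_pd_pct: float) -> float:
    """Percent of all US imprints that could optimistically be opened up.

    Assumes every pre-1923 title is public domain and that a fixed share
    of the 1923-1963 titles can be cleared after copyright research.
    """
    cleared_mid_century = mid_century_pct * mid_century_pd_pct / 100
    return pre_1923_pct + cleared_mid_century


open_pct = accessible_share_pct(15, 15, 70)
remaining_pct = 100 - open_pct
print(f"open: {open_pct:.1f}%, requires more work: {remaining_pct:.1f}%")
# open: 25.5%, requires more work: 74.5%
```

The slide rounds these to ~26% and ~74%.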
What’s missing from this picture?
[Figure: US imprints in 1000 record sample, titles in sample by decade of publication, grouped by period of publication]
What’s missing from this picture?
[Figure: titles in sample by decade of publication, alongside holdings for US imprints in 1000 record sample (titles vs. holdings by decade of publication)]
What’s missing from this picture?
[Figure: holdings for US imprints in 1000 record sample, titles vs. holdings by decade and period of publication]
While the ratio of holdings to titles increases over time, aggregate supply dips in the period when copyright restrictions are most onerous
Median holdings per manifestation = 2
Max. holdings for a single manifestation = 737
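The holdings statistics on this slide (median holdings per manifestation, maximum holdings, and the holdings-to-titles ratio by decade) can be computed from per-manifestation records. A sketch assuming a hypothetical input of (decade, holdings) pairs; the toy records in the example are made up and are not drawn from the sample.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple


def holdings_summary(records: Iterable[Tuple[int, int]]) -> Dict[str, object]:
    """Summarize holdings for manifestations given as (decade, holdings) pairs.

    The input format is hypothetical: one pair per manifestation, where
    holdings is the number of libraries reporting a copy.
    """
    totals = defaultdict(lambda: [0, 0])  # decade -> [titles, holdings]
    all_holdings = []
    for decade, holdings in records:
        totals[decade][0] += 1
        totals[decade][1] += holdings
        all_holdings.append(holdings)
    if not all_holdings:
        raise ValueError("no records")
    # Holdings per title for each decade (the "holdings : titles" ratio)
    ratio_by_decade = {
        decade: holdings / titles
        for decade, (titles, holdings) in sorted(totals.items())
    }
    return {
        "median": median(all_holdings),
        "max": max(all_holdings),
        "ratio_by_decade": ratio_by_decade,
    }


# Toy records for illustration only
toy = [(1930, 1), (1930, 3), (1990, 2), (1990, 10)]
summary = holdings_summary(toy)
print(summary["median"], summary["max"], summary["ratio_by_decade"])
# 2.5 10 {1930: 2.0, 1990: 6.0}
```

Note that `statistics.median` averages the two middle values when the count is even, so a median can be fractional on small inputs.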
What’s missing from this picture?
Books published outside of the United States
[Figure: published print books in WorldCat by place of publication, based on a January 2007 snapshot (n = 48M titles): US imprints 27%; books published elsewhere 69%; unknown (?) 4%]
Other Dimensions of Supply
What about holdings/availability?
- In our sample of US imprints:
  - ~90% of titles with >50 holdings were published after 1963
  - All titles with >300 holdings were published after 1963
- Work-level holdings may help fill the gap for titles with sparse holdings at the manifestation level; mostly for teaching/learning
What about non-US book titles?
- Based on a January 2007 snapshot of WorldCat: US imprints account for ~30% of the global book collection; non-US publications account for ~70% of print book records in WorldCat
- Holdings for non-US publications are relatively scarce (see the OCLC/ARL Global Resources report, 2007)
- Place of publication is not always explicit; additional research is needed before copyright assessment can even begin
What about non-book materials?
- Monographs are just one part of the scholarly record
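The holdings observations on this slide (~90% of titles with more than 50 holdings, and all titles with more than 300, published after 1963) are threshold filters over the sample. A sketch assuming hypothetical (publication year, holdings) pairs; the toy data is invented for illustration.

```python
from typing import Iterable, Tuple


def share_published_after(records: Iterable[Tuple[int, int]],
                          min_holdings: int,
                          after_year: int = 1963) -> float:
    """Percent of titles with more than `min_holdings` holdings that were
    published after `after_year`.

    records: hypothetical (publication_year, holdings) pairs, one per title.
    """
    years = [year for year, holdings in records if holdings > min_holdings]
    if not years:
        raise ValueError("no titles above the holdings threshold")
    later = sum(1 for year in years if year > after_year)
    return 100 * later / len(years)


# Toy records for illustration only
toy = [(1950, 60), (1970, 80), (1999, 400), (1990, 10)]
print(share_published_after(toy, 50))   # share among titles with >50 holdings
print(share_published_after(toy, 300))  # share among titles with >300 holdings
```

Raising the threshold narrows the pool to the most widely held titles, which is where the post-1963 skew is strongest.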
Demand: What access is needed to support scholarship?
Citations to US imprints (monographs only), by period of publication:

Work                    | before 1923 | 1923-1963 | 1964-1977 | 1978-1988 | 1989-
Lawrence and Aaronsohn  | 8 (8%)      | 28 (29%)  | 12 (12%)  | 9 (9%)    | 40 (41%)
Shakespeare the Thinker | 1 (2%)      | 16 (38%)  | 12 (29%)  | 8 (19%)   | 5 (12%)
The First Word          | 0 (0%)      | 2 (2%)    | 5 (6%)    | 5 (6%)    | 70 (85%)
All three (combined)    | 4%          | 21%       | 13%       | 9%        | 52%

- Lawrence and Aaronsohn: US imprints account for only 1/3 of works cited
- Shakespeare the Thinker: US imprints account for less than ¼ of works cited
- The First Word: almost all monographs cited were published in the US; 2/3 of sources were from the journal literature (not counted)
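The percentages on this slide are each work's citation counts divided by its total of cited US monographs. A short sketch that recomputes them from the counts on this slide; the data structure and function names are illustrative.

```python
from typing import Dict, List

PERIODS = ["before 1923", "1923-1963", "1964-1977", "1978-1988", "1989-"]

# Citation counts to US monographs, by period of publication (from the slide)
CITATIONS: Dict[str, List[int]] = {
    "Lawrence and Aaronsohn": [8, 28, 12, 9, 40],
    "Shakespeare the Thinker": [1, 16, 12, 8, 5],
    "The First Word": [0, 2, 5, 5, 70],
}


def period_shares(counts: List[int]) -> List[int]:
    """Percent of a work's cited US monographs in each period, rounded."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]


for work, counts in CITATIONS.items():
    print(work, dict(zip(PERIODS, period_shares(counts))))
```

For example, 28 of the 97 US monographs cited in Lawrence and Aaronsohn (29%) were published between 1923 and 1963.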
Consequences of greater discoverability of monographs: Scenario A
Use of print decreases:
- Learners, teachers, and researchers turn to what’s available and usable in digital form rather than print materials; use of print collections declines
  - The scope of the scholarly record is defined opportunistically, based on what’s most conveniently available
- For some fortunate scholars, greater discoverability is accompanied by greater rights to use digitized text, but availability is determined by institutional affiliation
  - Inequitable access to ‘liquid text’ produces an uneven body of scholarly analysis; incentives to create new analytic tools are limited
Consequences of greater discoverability of monographs: Scenario B
Use and value of print collections increase:
- Learners, teachers, and researchers find more materials online; because they can’t get these in digital form, use of print increases.
  - Existing print copies and delivery apparatus can meet the demand. (But what about shifting models for print?)
  - Existing copies and delivery apparatus can’t meet the need, and that creates an opportunity for someone to make print or electronic forms of high-demand materials more available despite rights restrictions. The materials must be of high enough value to bring rights holders to the table.
  - Existing copies and delivery apparatus can’t meet the need, but there isn’t enough incentive for anyone to solve this problem.