Copyright and Mass-Digitization: The strategic importance of data-mining. Presentation Details: Matthew Sag, Professor of Law, Loyola University of Chicago, [email protected], www.matthewsag.com


Post on 28-Mar-2015



Copyright and Mass-Digitization: The strategic importance of data-mining

Presentation Details

Matthew Sag

Professor of Law, Loyola University of Chicago

[email protected]


Abbreviated Time Line

2004
  Google library project begins
2005
  Class action suit filed by Authors Guild (among others)
2008 & 2009
  Settlement proposed, objections follow, settlement revised
2011
  (March) Settlement rejected
  (September) Authors Guild v. HathiTrust filed
2012
  (August) Oral argument in Authors Guild v. HathiTrust
  (October) Judge Baer ruled against the plaintiffs in Authors Guild v. HathiTrust. Library digitization (ADA + data) is fair use.
2013
  (July) Second Cir. tells Judge Chin: no class certification without addressing the fair use issue
  (September) Oral argument on fair use in Authors Guild v. Google


The strategic importance of text-mining

Different kinds of digitization program raise different legal issues and bring in different stakeholders.


The Many Faces of Library/Archive Digitization

Preservation

Data production and analysis*
  Searching books, testing search algorithms, computational linguistics, automated translation, natural language processing, macro-analysis of text

A platform for display and distribution of individual works
  Disabled access*
  Scholarly access
  General access


Strategic Considerations

Library digitization for data production and analysis
  Significant academic and commercial constituency (not just Google!)
  Strong normative appeal
  Obvious orphan works problem
  Justifies digitizing entire collections

Even if some other uses are ‘too much’, no all-copyright owner class action possible


The Legal Argument #1

Metadata – facts about the work – does not infringe the rights of the copyright owner.

– This is not usually contested, but it’s important to make sure everyone understands the reasons why metadata can’t infringe. Those reasons are …

  Idea-expression distinction
  Merger doctrine
  Metadata is not substantially similar to the underlying text
  Facts about the work don’t originate with the author


Whale v. Dinosaur

[Figure: chart with a numeric axis from 0 to 1200 and word labels: whale(s), old, boat(s), sea, such, hand(s), head, men, Captain, good, might, Starbuck, water, far, cried, world, crew, air, night]


Whale v. Dinosaur


Legal Argument #2

A copying process that only produces metadata does not infringe.
  Intermediate non-expressive use is either (a) not copying in the relevant sense or (b) fair use

The distinction between expressive and nonexpressive parts of works is well recognized (no copyright in a phone book, etc). The same distinction should be made in relation to potential acts of infringement.

Intermediate non-expressive uses don’t communicate the author’s original expression to the public. No expressive substitution, no infringement
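
A minimal Python sketch of a metadata-only process of the kind described above, assuming word counts are the metadata of interest. The function name `word_counts` and the sample sentence are illustrative and do not come from the presentation. The text is read in full, but only facts about it (words and their frequencies) come out.

```python
import re
from collections import Counter


def word_counts(text, top_n=5):
    """Return the top_n most frequent words in text as (word, count) pairs."""
    # Lowercase the text and split it into runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Counter tallies each word; most_common sorts by descending count.
    return Counter(words).most_common(top_n)


# Hypothetical sample sentence, invented for this sketch.
sample = "the whale and the boat and the sea"
print(word_counts(sample, top_n=2))  # [('the', 3), ('and', 2)]
```

The output holds counts only; the sentence itself cannot be read back from it.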


Application to Fair Use

Sect. 107 Factors

(1) Purpose and character: Like transformative uses, a nonexpressive use poses no risk of expressive substitution.

(2) Nature of the work … “not much use”

(3) Amount and substantiality: Like transformative uses, because there is no expressive substitution in a nonexpressive use, the amount of copying is qualitatively insignificant.

(4) Market effect: Like transformative uses, a nonexpressive use poses no risk of expressive substitution, thus no cognizable market effect.


Legal Argument #3

Non-expressive use does not harm copyright owners and has great social value


“The United States is” versus “The United States are”, 1780–1900
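
As a hedged illustration of how an is/are comparison like this one might be computed, the Python sketch below counts the two phrases by decade. The corpus format (pairs of year and text), the function name `usage_by_decade`, and the three toy sentences are assumptions invented for this example; they are not data from the presentation.

```python
import re
from collections import defaultdict

# Matches either phrase, ignoring case; group 1 captures "is" or "are".
PATTERN = re.compile(r"\bthe united states (is|are)\b", re.IGNORECASE)


def usage_by_decade(corpus):
    """Count 'is' vs 'are' after 'the United States', grouped by decade.

    corpus is an iterable of (year, text) pairs.
    Returns a dict of the form {decade: {"is": n, "are": m}}.
    """
    counts = defaultdict(lambda: {"is": 0, "are": 0})
    for year, text in corpus:
        decade = (year // 10) * 10
        for match in PATTERN.finditer(text):
            counts[decade][match.group(1).lower()] += 1
    return dict(counts)


# Toy corpus, invented for this sketch (not real historical data).
toy_corpus = [
    (1795, "The United States are at peace."),
    (1798, "The United States are resolved."),
    (1885, "The United States is a nation. The United States is growing."),
]
print(usage_by_decade(toy_corpus))
# {1790: {'is': 0, 'are': 2}, 1880: {'is': 2, 'are': 0}}
```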



American Slavery in American, English, and Irish Literature, 1800-1899. Matthew Jockers, Macroanalysis: Digital Methods for Literary History (2013)

Proportion of Irish Literature with a topic of ‘slavery’ spikes ~ 1860-65


Importance of the Digital Humanities Brief

Focused attention on digitization for the sake of data

  Demonstrated importance
  Disentangled it from other issues
    Not just a Google issue, not just an internet issue, not just a research/scholarship issue

Powerful examples tied directly to the understanding of literature

» In case making the Internet work through caching and search was not enough for you!


Quotes from HathiTrust judgment …

“I cannot imagine a definition of fair use that would not encompass the transformative uses made by Defendants' MDP and would require that I terminate this invaluable contribution to the progress of science and cultivation of the arts that at the same time effectuates the ideals espoused by the ADA.”

– “The search capabilities of the HDL have already given rise to new methods of academic inquiry such as text mining.” (brief cited)

– … metadata and text mining, which “could actually enhance the market for the underlying work, by causing researchers to revisit the original work and reexamine it in more detail” (brief quoted)


Impact of the Digital Humanities Amicus Brief

Three for the price of one
  Authors Guild v. HathiTrust (district court)
  Authors Guild v. Google (district court)
  Authors Guild v. HathiTrust (court of appeals)

Over 100 signatories!

Discussed with approval in HathiTrust
  United States is/are example made its way into the judgment in HathiTrust last year and oral argument in Google Books this week!


Some Concluding Thoughts

Specific legal issues vary by jurisdiction
  Fair use, fair dealing, legislative reform

Underlying policy questions are global
  Idea-expression distinction
  The promise of big data and problem of orphan works

Challenge for libraries and archives is making courts/decision makers understand the broader consequences


Action Items

Commercial and non-commercial digitizers need to work together and defend everyone’s right to non-expressive use

  Digital Humanities, Linguistics, Comp. Sci., Libraries
  Search providers, plagiarism and copyright infringement detection tools, music identification tools, reverse engineering

Advantage of flexible limitations and exceptions
  Without reform, other nations cede ground to the U.S. as the data engine of the world.


Abbreviated Issues Summary

Issue | Status | Case | Notes
Preservation | Still open, but court unconvinced | v. HathiTrust |
Orphan works display | Still open, not ripe | v. HathiTrust | Trove (Australia); best practices
Disability access | Digitization ok | v. HathiTrust | On appeal
Data mining | Digitization ok | v. HathiTrust | All but given up in v. Google
Library copies as quid pro quo | Still open | v. Google | Easier now underlying use is fair use
Making/retaining excessive copies | Still open | v. Google |
Snippet display | Still open | v. Google |
Standing, remedies, class action … | Mixed | v. HathiTrust, v. Google |


Further Reading

Matthew Jockers, Matthew Sag & Jason Schultz, Digital Archives: Don’t Let Copyright Block Data Mining, 490 NATURE 29-30 (October 4, 2012)


Further reading

Matthew Sag, Orphan Works as Grist for the Data Mill, 27 BERKELEY TECHNOLOGY LAW JOURNAL 1503–1550 (2012)

Matthew Sag, Copyright and Copy-Reliant Technology, 103 NORTHWESTERN UNIVERSITY LAW REVIEW 1607–1682 (2009)