Doing Digital History: Heuristics, Hermeneutics, and Source Criticism in a Digital Age

Doing Digital History Heuristics, Hermeneutics, and Source Criticism in a Digital Age Melvin Wevers @melvinwevers www.translantis.nl February 25, 2015 UCLDH Seminar - UCL


Post on 23-Jul-2015


Page 1: Doing Digital History: Heuristics, Hermeneutics, and Source Criticism in a Digital Age

Doing Digital History

Heuristics, Hermeneutics, and Source Criticism in a Digital Age

Melvin Wevers

@melvinwevers www.translantis.nl

February 25, 2015 UCLDH Seminar - UCL


Overview

• Consuming America: the role of the United States as a reference culture in Dutch consumer society between 1890-1940

• Digital Humanities Cycle: heuristics, hermeneutics, corpus creation, source criticism, and tool criticism

• Methods: Full-text search, N-gram analysis, Topic modeling, Named entity recognition


What is a Reference Culture?

• Reference culture is an analytical concept to study geopolitical formations in a transnational context.

• Reference cultures serve as a model for other countries, e.g. the Byzantine Empire, 19th-century England, the Caliphate.

• Twentieth century: The American Century - Henry Luce

• Culture of references > imagined, symbolic, and metaphysical ‘America’

• Focus on the receiving end within a wider context of globalization, Americanization, and modernization (cf. Rob Kroes, John Muthyala)


How do we research Reference Cultures?

• Reference cultures emerge in collective discussions on specific products, ideas, and practices

• Against a background of cultural, technological, and economic developments

• In other words, a reference culture is an imagined, symbolic ‘America’ grounded within actual material conditions and practices

• The project aims to use digital technologies to analyze reference cultures in Dutch digitized newspapers between 1890-1990


Case Study: Cigarettes 1890-1940

• Cultural icon of American entrepreneurialism

• “Product that defined America” (Allan Brandt)

• production, distribution, and consumption

• How was its symbolic connotation perceived outside the United States?

• Geographical connotation

• Debates on technological changes: taste and packaging

• Changing consumer behavior > consumerist abundance, female smokers


Geographical connotations of the cigarette - RQ

• How have the geographic connotations of the cigarette shifted between 1890-1940?

• How has this informed the idea of America? In other words, the performance of America as a reference culture?


Is this Big Data Research?

The change of scale has led to a change of state. The quantitative change has led to a qualitative one. […]

[B]ig data refers to things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value

Viktor Mayer-Schönberger and Kenneth Cukier, Big Data: A Revolution That Will Transform How We Live, Work, and Think (Boston 2013) 13.


Distant reading

‘Distant reading’, I have once called this type of approach; where distance is however not an obstacle, but a specific form of knowledge; fewer elements, hence a sharper sense of their overall interconnection. Shapes, relations, structures. Forms. Models.

Franco Moretti, Graphs, Maps, Trees. Abstract Models for a Literary History (London and New York 2005) 1.


How Big is Big Data?

• The Dutch newspaper archive is not really big data (biggish data?)

• Do we want to do big data research and look for big patterns? Or do we aim for more extensive searching and more complexity in our sources?

• “[D]ata does not always have to be used as evidence, but can be simply for discovering and framing research questions. […] [P]laying with data – in all its formats and forms – is more important than ever.” Frederick W. Gibbs and Trevor J. Owens, ‘The Hermeneutics of Data and Historical Writing’, in: Kristen Nawrotzki and Jack Dougherty (eds.), Writing History in the Digital Age (Ann Arbor, MI: University of Michigan Press, 2013).

• Exploratory searching as an advance corrective against the threat of essentialism and determinism [important in the case of history/Americanization]


Digital Humanities Cycle

Heuristics

Corpus Selection

Hermeneutics

Full-text search, text analytics, topic modeling, named entity recognition, n-gram analysis

Tool Criticism

Source Criticism


Heuristics: Full-text search

• Large amounts of data

• Digital archives

• International data

• Ability to search full-text

Delpher.nl


Heuristics using metadata

“At least for research, digital history can be defined as the theory and practice of bringing technology to bear on the abundance we now confront.”

‘Interchange: The Promise of Digital History’, The Journal of American History 95 (2008) 452-491, 454.


New Way of Doing History

Bob Nicholson “The Digital Turn” Media History (2013)


Source Criticism

[T]he problem is that while we think we are searching newspapers, we are actually searching markedly inaccurate representations of text, hidden behind a poor quality image. And even more damning, by citing a hard copy of the original we are then refusing to document our research path, making it difficult for others to critique the process.

Tim Hitchcock, ‘Confronting the Digital: Or How Academic History Writing Lost the Plot’, Cultural and Social History 10 (2013) 9-23.
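
One way to make the OCR problem concrete is to measure it. The sketch below computes a character error rate (CER) between an OCR transcription and a hand-corrected version of the same passage. The function names and the sample strings are illustrative assumptions, not part of the talk or of any archive's tooling.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr_text: str, reference: str) -> float:
    """Edit distance divided by the length of the hand-corrected reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(ocr_text, reference) / len(reference)


# Hypothetical example: an OCR misreading of a Dutch advertisement line.
reference = "Rookt Virginia sigaretten"
ocr_text = "Rookt Virglnia sigaretlen"
print(f"CER: {character_error_rate(ocr_text, reference):.2f}")
```

A search for "Virginia" would miss this line entirely, which is why reporting an error rate for a sample of the corpus helps others judge how much a full-text search may have overlooked.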


N-gram analysis

• http://kbkranten.politicalmashup.nl
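
As a rough illustration of what an n-gram viewer computes, the sketch below counts how often a phrase occurs per year, relative to all n-grams of the same length in that year. It is a minimal stand-in under stated assumptions: the mini-corpus and the function names are invented for the example and do not describe how the tool linked above is implemented.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency_by_year(documents, phrase):
    """Share of all same-length n-grams per year that equal `phrase`."""
    n = len(phrase.split())
    hits = Counter()
    totals = Counter()
    for year, text in documents:
        grams = ngrams(text.lower().split(), n)
        totals[year] += len(grams)
        hits[year] += grams.count(phrase.lower())
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


# Hypothetical mini-corpus of (year, text) pairs; real input would be newspaper articles.
documents = [
    (1895, "egyptische sigaretten te koop"),
    (1895, "fijne egyptische sigaretten"),
    (1925, "amerikaansche sigaretten voor iedereen"),
    (1925, "rookt amerikaansche sigaretten"),
]
trend = relative_frequency_by_year(documents, "amerikaansche sigaretten")
print(trend)
```

Normalizing by the yearly total matters because the number of digitized pages differs from year to year; raw counts would mostly reflect the size of the archive.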


Corpus Selection

• Corpus Selection

• API (JSON)

• Texcavator

• Cleaning up the Corpus: Python/OpenRefine/NLTK

• Corpus analysis / Corpus Linguistics

• Topic modeling

• Named entity recognition (NER)
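
The corpus selection and clean-up steps above can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the JSON field names (`date`, `type`, `text`), the sample records, and the helper names are hypothetical and do not reflect the actual API or Texcavator export format.

```python
import json
import re

# Hypothetical API response; the field names are assumptions for illustration only.
raw_response = """
[
  {"date": "1925-03-14", "type": "advertisement", "text": "Rookt CAMEL sigaretten!  15 cent."},
  {"date": "1925-03-15", "type": "article", "text": "De gemeenteraad vergaderde gisteren."},
  {"date": "1931-07-02", "type": "advertisement", "text": "Virginia-sigaret, de beste kwaliteit"}
]
"""


def clean(text):
    """Lowercase the text and keep only runs of letters, joined by single spaces."""
    return " ".join(re.findall(r"[^\W\d_]+", text.lower()))


def build_corpus(records, keyword, start_year, end_year):
    """Keep cleaned texts that mention `keyword` and fall inside the year range."""
    corpus = []
    for record in records:
        year = int(record["date"][:4])
        cleaned = clean(record["text"])
        if start_year <= year <= end_year and keyword in cleaned:
            corpus.append({"year": year, "type": record["type"], "tokens": cleaned.split()})
    return corpus


corpus = build_corpus(json.loads(raw_response), "sigaret", 1924, 1929)
for doc in corpus:
    print(doc["year"], doc["tokens"])
```

The keyword test is a plain substring match, so "sigaret" also selects "sigaretten"; whether that is desirable is itself a corpus selection decision worth documenting.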


Tool Criticism

• Tools as instrument (STS)

• Bruno Latour and Steve Woolgar - Laboratory Life: The Construction of Scientific Facts (1986)

• Steven Shapin - Never Pure: Historical Studies of Science as if It Was Produced by People with Bodies, Situated in Time, Space, Culture, and Society, and Struggling for Credibility and Authority (2010)

• Explain how the tool works

• How do we define whether the tool works?
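
One possible answer to the last question is to compare a tool's output with a small hand-annotated sample and report precision and recall. The sketch below assumes entities are represented as (document id, entity) pairs; the sample data and the function name are hypothetical.

```python
def precision_recall(predicted, gold):
    """Compare two collections of (document_id, entity) pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical sample: locations found by a tool vs. locations marked by a human reader.
tool_output = {
    ("ad-1", "London"), ("ad-1", "Virginia"), ("ad-2", "Camel"), ("ad-3", "Egypte"),
}
hand_annotated = {
    ("ad-1", "London"), ("ad-1", "Virginia"), ("ad-3", "Egypte"),
    ("ad-3", "Cairo"), ("ad-4", "Turkije"),
}

precision, recall = precision_recall(tool_output, hand_annotated)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision says how much of what the tool found is correct; recall says how much of what a reader would find the tool also found. For exploratory work, low recall is the more dangerous of the two because it stays invisible.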


Topic Modeling

• Method (MALLET) to discover latent structures within a collection of texts

• Words acquire meaning through context -> Topic Modeling

• Contextual comparisons between different periods or corpora

• Main goal: discover events, users, and objects > Topics > Hidden debates

• In other words: not to prove stuff, but to find more stuff


Key topics in advertisements, 1924-1929

• sigaret virginia whip chief ardath london goud cigarettes kwaliteit olympia kurk nummer rook beste gezondheid zoo zulk vooraan punten

• sigaret sigaar pijp beter smakelijker wybert amersfoort virginia houbaer tabletten rooken oudste prijs magnums hollands nasmaak cent nemen noch

• sigaret nieuwe onze tabakken vervaardigd doosje import vraagt rookt smaak betere cents sigaretten turksche fijne kwaliteit edelste uwe proef

• sigaret club sigaretten gij army camel tabak cent sopla camels wereld kwaliteit prijs gemaakt virginia sigaren eerst rookt keel

• sigaret adamas egyptische mildste tegenwoordig stuks coupon coupons mavrides fijnste sigaretten cts geschenken gratis ste naam fijn slechts omar
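
A simple way to start reading such topics for geographical connotations is to flag the place-related words in each one. The sketch below does this for the five topics listed above. The marker lexicon is a hand-picked assumption made for this example, not a list used in the project, and the step is plain word matching after topic modeling, not topic modeling itself.

```python
# Topic word lists copied from the advertisement topics listed above.
topics = [
    "sigaret virginia whip chief ardath london goud cigarettes kwaliteit olympia "
    "kurk nummer rook beste gezondheid zoo zulk vooraan punten",
    "sigaret sigaar pijp beter smakelijker wybert amersfoort virginia houbaer "
    "tabletten rooken oudste prijs magnums hollands nasmaak cent nemen noch",
    "sigaret nieuwe onze tabakken vervaardigd doosje import vraagt rookt smaak "
    "betere cents sigaretten turksche fijne kwaliteit edelste uwe proef",
    "sigaret club sigaretten gij army camel tabak cent sopla camels wereld "
    "kwaliteit prijs gemaakt virginia sigaren eerst rookt keel",
    "sigaret adamas egyptische mildste tegenwoordig stuks coupon coupons mavrides "
    "fijnste sigaretten cts geschenken gratis ste naam fijn slechts omar",
]

# Hand-picked place-related words; this lexicon is an assumption for illustration.
PLACE_MARKERS = {"virginia", "london", "amersfoort", "hollands", "turksche", "egyptische"}


def place_markers_per_topic(topic_lines, markers):
    """For each topic, list the marker words it contains, in topic order."""
    return [[word for word in line.split() if word in markers] for line in topic_lines]


found = place_markers_per_topic(topics, PLACE_MARKERS)
for number, words in enumerate(found, start=1):
    print(f"topic {number}: {', '.join(words) or 'none'}")
```

Which words count as geographical markers is an interpretive choice, so the lexicon should be reported alongside any result based on it.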


Named Entity Recognition

• Stanford NER is a tool that automatically detects specific entities within texts

• Locations

• Persons

• Organizations
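
To show how tagger output can be turned into location counts, the sketch below assumes the tagged text has been exported as word/TAG tokens and merges consecutive LOCATION tokens into one entity. The sample sentence, the threshold, and the function name are hypothetical; the code does not call the tagger itself.

```python
from collections import Counter


def extract_entities(tagged_text, wanted_tag="LOCATION"):
    """Merge consecutive tokens carrying `wanted_tag` into one entity string."""
    entities, current = [], []
    for token in tagged_text.split():
        word, _, tag = token.rpartition("/")
        if tag == wanted_tag:
            current.append(word)
        elif current:
            entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


# Hypothetical tagger output in word/TAG form; not taken from the project's data.
tagged = (
    "Sigaretten/O uit/O New/LOCATION York/LOCATION en/O London/LOCATION "
    "bij/O firma/O Ardath/ORGANIZATION te/O Rotterdam/LOCATION ./O "
    "Ook/O in/O Rotterdam/LOCATION verkrijgbaar/O"
)
counts = Counter(extract_entities(tagged))
frequent = {place: n for place, n in counts.items() if n >= 2}
print(counts.most_common())
print(frequent)
```

One limitation of this merging rule is that two different places appearing directly next to each other would be joined into a single entity, so a sample of the output should still be checked by hand.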


Named Entity Recognition - output 1890-1920

Foreign Locations (N>20) | Dutch Locations (N>20) | American Locations
The United Kingdom / London (151 / 84) | Rotterdam (496) | America (70)
Germany / Berlin / Hamburg (146 / 81 / 22) | Amsterdam (177) | New York (34)
France / Paris (139 / 154) | Tilburg (107) | Washington (11)
Russia (102) | Groningen (94) | United States (Vereenigde Staten) (11)
The United States / America / New York ( / 70 / 34) | Breda (64) | Chicago (3)
Belgium / Bruxelles / Antwerp ( / 57 / 46) | Haarlem (48) | Virginia (3)
Austria / Vienna (40 / 21) | Utrecht (43) | North-America (Noord-Amerika) (3)
Turkey (39) | Arnhem (35) |
Holland (39) | The Hague (26) |
Europe (36) | Leeuwarden (24) |
Spain (33) | Maastricht (26) |
 | Leiden (24) |
 | Friesland (21) |


Good ‘Ole Close Reading

• Don’t say goodbye to your traditional methods or theories

• Country-of-Origin effect (branding theory)

• Theories of modernization/globalization/Americanization

• Discourse analysis > Foucault

• Conceptual history > Braudel, Koselleck, Armitage [Big history manifesto]

• DH is too often about the tools or the methods, but these can be combined with theoretical and analytical models into a critical digital humanities [cf. David Berry, Alan Liu]


Conclusion (I): Geographical connotations

• Country of origin effect

• From actual locations to symbolic references

• Shift of geographical connotation of cigarette

• Oriental, British, European, American

• Detached from United States / United States as floating signifier


Conclusion (II): Collateral damage

• The output provided me with topics to research further in other chapters > data-driven

• These are provided by the source material and not only by secondary literature

• Technologies of Taste

• Consumer Behavior