07 verheul texcavator
Toine Pieters and Jaap Verheul, Utrecht University, The Netherlands
Texcavator: Text Mining Historical Newspapers
Overview
Translantis research project
• Concept of reference cultures
• Digital humanities
Texcavator tool
• Requirements
• Features
• Configuration
Texcavator use cases
Future ambitions
• Challenges
• Cultural Text Mining
KB Big Data Conference 24 March 2015
translantis.nl
Translantis research project
Translantis
Topic: emergence of the United States in public discourse in the Netherlands, 1890-1990
Concept: transnational reference cultures
Method: digital humanities text mining
Translantis.nl
Culture Mining
Culture
• Ideas
• Knowledge
• Practices
Public Sphere
• Public Opinion
• Citizens engaging in enlightened debate
Public Media
• Periodicals
• Radio
• TV
• Internet
Digitized Newspapers
• Sample of 10% of all printed newspapers
Mediation
Texcavator
Texcavator
• Generic tool for cultural text mining and big data research
• Enables scholars to systematically search very large quantities of textual data in a reliable and reproducible way
• Supports exploration and contextualization
• Serves multiple user groups:
  • Wide community of historians using big data
  • Translantis team (NWO-funded)
  • Asymmetrical Encounters team (HERA-funded)
Features
• Direct access to big data repository
• Integrated text-mining tools
  • Boolean search
  • Named Entity Recognition
  • Sentiment mining
  • Stemming
• Real-time visualization of search results
  • Dynamic word clouds (and export of underlying data)
  • Timelines (normalized, bursts)
• Input-output storage
• Close and distant reading
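The slides do not say how the word clouds or their exported data are computed. As a rough illustration of what "underlying data" for a word cloud can look like, the sketch below counts terms across a set of search results and serializes the counts as CSV. The stop-word list, the tokenizer, and the function names `term_frequencies` and `to_csv` are assumptions made for this example, not part of Texcavator.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical, deliberately tiny Dutch stop-word list (assumption for this sketch).
STOPWORDS = {"de", "het", "een", "en", "van", "in", "is", "op"}

def term_frequencies(documents, top_n=10):
    """Count lower-cased word tokens across documents, skipping stop words."""
    counts = Counter()
    for text in documents:
        # Letters only, keeping hyphenated compounds such as "coca-cola" together.
        tokens = re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)

def to_csv(frequencies):
    """Serialize (term, count) pairs as CSV text: the data behind a word cloud."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["term", "count"])
    writer.writerows(frequencies)
    return buffer.getvalue()

# Invented example sentences, not taken from the newspaper corpus.
docs = ["Coca-Cola is de drank van Amerika", "Drink Coca-Cola in het cafe"]
print(to_csv(term_frequencies(docs)))
```

A word-cloud renderer would then scale each term by its count; exporting the table lets a researcher check or reuse the numbers outside the tool.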
Current configuration
• Digitized newspapers (National Library): 9 million pages
• Texcavator interface
• Elasticsearch (500 GB): real-time, scalable indexing
• xTAS: eXtensible Text Analysis Suite
Use cases
• Buffalo Bill
• Coca-Cola
• Taylorism
Records and word cloud
Timeline + cloud of one “burst” (1965)
Normalized timeline
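The slide names a normalized timeline and a "burst" without describing how either is calculated. One common reading is sketched below: normalization divides the yearly hit count by the total number of documents for that year, and a burst is a year that rises well above the series average. The threshold rule (mean plus a multiple of the standard deviation) is a stand-in chosen for this example and may differ from what Texcavator does; all counts are invented.

```python
from statistics import mean, pstdev

def normalize(hits_per_year, total_per_year):
    """Relative frequency: hits divided by all documents published that year."""
    return {
        year: hits_per_year.get(year, 0) / total
        for year, total in sorted(total_per_year.items())
        if total > 0
    }

def find_bursts(series, threshold=1.5):
    """Years whose value exceeds mean + threshold * standard deviation."""
    values = list(series.values())
    cutoff = mean(values) + threshold * pstdev(values)
    return [year for year, value in series.items() if value > cutoff]

# Made-up counts for illustration only, not data from the project.
hits = {1963: 10, 1964: 12, 1965: 60, 1966: 11, 1967: 9}
totals = {1963: 1000, 1964: 1200, 1965: 1500, 1966: 1100, 1967: 900}

timeline = normalize(hits, totals)
print(find_bursts(timeline))
```

Normalizing first matters because the number of digitized pages varies by year; a raw peak may only reflect a year with more surviving newspapers.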
Access to original
Configuration
Visualizing historical change
Soft drinks
References to both Coca-Cola and America in advertisements
Explain the peaks and troughs
Soft drinks
References to Coca-Cola without America in advertisements
Explain the peak
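The two soft-drink slides contrast a search for Coca-Cola together with America against a search for Coca-Cola without it. Since the deck names Boolean search and Elasticsearch, the pair can be sketched as two Elasticsearch `bool` query bodies built as plain dictionaries. The field names `text` and `date`, the helper name `cooccurrence_query`, and the date range are assumptions for this example; the actual index mapping and the queries used in the project are not shown in the slides, and restricting results to advertisements would need an additional field that is not modelled here.

```python
import json

def cooccurrence_query(term, other, with_other, start, end,
                       text_field="text", date_field="date"):
    """Build an Elasticsearch bool query body as a plain dict.

    with_other=True  -> documents containing both `term` and `other` (AND)
    with_other=False -> documents containing `term` but not `other` (AND NOT)
    Field names are hypothetical; adapt them to the real index mapping.
    """
    other_clause = {"match_phrase": {text_field: other}}
    bool_query = {
        "must": [{"match_phrase": {text_field: term}}],
        "filter": [{"range": {date_field: {"gte": start, "lte": end}}}],
    }
    if with_other:
        bool_query["must"].append(other_clause)
    else:
        bool_query["must_not"] = [other_clause]
    return {"query": {"bool": bool_query}}

with_america = cooccurrence_query(
    "Coca-Cola", "Amerika", True, "1890-01-01", "1990-12-31")
without_america = cooccurrence_query(
    "Coca-Cola", "Amerika", False, "1890-01-01", "1990-12-31")

print(json.dumps(without_america, indent=2))
```

Running both queries per year and comparing the two timelines is one way to see when the brand stopped being advertised as explicitly American.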
Topic modeling and GIS
Taylorism
Voyant word cloud of the “wetenschappelijke bedrijfsleiding” (“scientific management”) dataset
References over time within the “wetenschappelijke bedrijfsleiding” dataset to “Taylor”, “taylor-stelsel”, “Taylor-systeem”
Ambitions
Challenges & Opportunities
Challenges
Software development
• Stable version of Texcavator
• Intuitive interface
• Additional features
Technological
• Processor and server capacity
• Data exchange and standardization (metatags)
• OCR
Scientific
• Combining close and distant reading
• Reproducibility
Cultural Text Mining
Mining of cultural aspects of entities and events
• Concepts, mentalities, ideas, utopias, etc.
Mining for Meaning
• Towards digital conceptual history or digital history of mentalities
Address macro-historical questions
• Trends, patterns, structures in debates
• Circulation of knowledge
• Emergence of transnational reference cultures
Thank you!