April 21, 2023
WAHSP/BILAND
Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media
Research team: Stephen Snelders (UU), Pim Huijnen (UU), Daan Odijk (ISLA, UvA), Fons Laan (ISLA), Maarten de Rijke (ISLA), Toine Pieters (UU)
Research
Creating big-data resources
National Library of the Netherlands, Digital Newspaper Archive
> 10,000,000 pages
> 1200 titles
1618-1995
> 30,000,000 articles
Still growing...
How did/do you study 30 million newspaper articles?
Dutch press on Germany, Frank van Vree (1989)
> 1200 titles
1618-1995
> 31,000,000 articles
4
1930-1939
4,000
Sampling
Developing semantic document selection tools
WE NEED:
A semi-automatic and interactive open-source application.
An application that does not replace, but supports, the intuition and insights of the historical researcher with expert knowledge of a specific topic or domain.
An application that is user-friendly.
Problem:
Context and background of Dutch drug and eugenics debates over time
Aim
Understanding and evaluating public debates around drugs, addiction and eugenics in the Netherlands, 1900-1945
Research question
What are the dynamics (in terms of patterns and trends) of public debates and sentiments around drugs, addiction and eugenics in Dutch newspapers in the first half of the twentieth century?
Poe’s detective finds the truth by using data in those newspaper articles that do not concern the murder.
In a similar way we will find terms and sentiments in those newspaper articles that may seem irrelevant, but are not.
E-everything
Information-extraction
Recognize structure in text
Part of speech
Noun, verb, …
Entities
people, organisations, locations, temporal expressions, …
Relations
Who, what, with whom, how, why
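The levels listed above can be illustrated with a very small sketch. It is not the project's tooling: it uses only regular expressions from the Python standard library, whereas real information extraction relies on trained taggers. The function name `extract`, both patterns, and the sample sentence are invented for illustration.

```python
import re

# Toy pattern for temporal expressions: four-digit years from 1600 to 1999.
YEAR = re.compile(r"\b(1[6-9]\d{2})\b")
# Runs of capitalized words, treated here as entity candidates (an assumption).
CAPITALIZED = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*")


def extract(text):
    """Return candidate temporal expressions and entity mentions."""
    years = YEAR.findall(text)
    entities = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        # Skip the first word of each sentence: it is capitalized regardless.
        _, _, rest = sentence.partition(" ")
        entities.extend(CAPITALIZED.findall(rest))
    return {"years": years, "entities": entities}


# Invented sample sentence, not taken from the newspaper archive.
sample = "In 1924 a reporter in Amsterdam wrote about opium. Later Jan Jansen replied in 1925."
result = extract(sample)
print(result)
```

Relations (who, what, with whom) are out of reach for patterns like these, which is one reason trained models are normally used for this step.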
Information-extraction (2)
Enjoyable, but what does it tell us?
Start query: Opium
Drugs and drug policy
Odijk, D., de Rooij, O., Peetz, M.-H., Pieters, T., de Rijke, M., Snelders, S. (2012). "Semantic Document Selection". TPDL 2012: Theory and Practice of Digital Libraries. Springer, September.
Combining and clustering queries
By carefully inspecting the word counts, we found quantitative evidence for historical turning points that indicated the criminalization of the drug debate around 1924.
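A minimal sketch of how word counts per year might point to such a turning point. This is not the WAHSP/BILAND implementation. The corpus, the two term groups, and the helper names `counts_per_year` and `first_year_dominated_by` are all invented for illustration, and the toy years carry no historical meaning.

```python
from collections import Counter, defaultdict

# Invented toy corpus of (year, article text) pairs; not real archive data.
ARTICLES = [
    (1920, "opium as medicine for the patient"),
    (1922, "the doctor prescribes opium as medicine"),
    (1925, "police arrest opium smuggler"),
    (1927, "court sentences opium smuggler after police raid"),
]


def counts_per_year(articles, terms):
    """Count how often each term of interest occurs per year."""
    per_year = defaultdict(Counter)
    for year, text in articles:
        for token in text.lower().split():
            if token in terms:
                per_year[year][token] += 1
    return per_year


def first_year_dominated_by(per_year, group_a, group_b):
    """Return the earliest year in which group_b terms outnumber group_a terms."""
    for year in sorted(per_year):
        a = sum(per_year[year][t] for t in group_a)
        b = sum(per_year[year][t] for t in group_b)
        if b > a:
            return year
    return None


# Hypothetical term groups standing in for a medical and a criminal context.
MEDICAL = {"medicine", "doctor", "patient"}
CRIMINAL = {"police", "smuggler", "court"}

per_year = counts_per_year(ARTICLES, MEDICAL | CRIMINAL)
turning_point = first_year_dominated_by(per_year, MEDICAL, CRIMINAL)
print(turning_point)
```

On a real archive the counts would need normalizing by the number of articles per year, and the researcher would still have to read the underlying articles to interpret any shift.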
Eugenics case: query "overerving" (inheritance), 1867
Primarily associations with health-related terms/entities
Eugenics case: query "overerving", 1935
In 1935, however, the medical context in which the term "inheritance" was used gave way to a legal and racial context.
E-Humanity Approaches to Reference Cultures: The Emergence of the United States in Public Discourse in the Netherlands, 1890-1990
Challenges:
1. OCR repair
2. Improving text-mining software and data infrastructure
3. Developing new historical research strategies
4. Educating historians and other humanities researchers
NEW HORIZONS in DIGITAL HUMANITIES