DIVE and DSS: newspapers as big data
Dutch Ships and Sailors
Victor de Boer - [email protected]
Digitale historische kranten als big data (Digital historical newspapers as big data), 24-3-2015
DIVE
Dutch Ships and Sailors
Victor de Boer, Matthias van Rossum, Jur Leinenga, Rik Hoekstra
With input from Andrea Bravo Balado and Robin Ponstein
Netherlands Institute for Sound and Vision / VU University Amsterdam [email protected]
The Problem: ((Maritime) historical) data is not integrated
25+ Maritime datasets; Heterogeneous
The solution
Well, Linked Data obviously!
KB Delpher
Dutch-Asiatic Shipping (DAS) – Voyages (Huygens ING)
“VOC Opvarenden” – Mustering and payroll information (DANS Easy)
Dutch Ships and Sailors
[Diagram]
Datasets: DAS, GZMVOC, MDB, VOCOPVBegunstigden, VOCOPVSoldijboeken, VOCOPVOpvarenden
External vocabularies: PROV, AAT, foaf
Link types: owl:sameAs, dss:hasKBLink, rdfs:subClassOf, rdfs:subPropertyOf, dss:DAS link, skos:exactMatch
Links to original scans
Linking to Historical newspapers
• Use ML to detect links between ships and historical newspaper articles (delpher.nl)
– Features: ship name, time intervals, captains' names, ship type, named entities, keywords, background knowledge
• 179,120 links
- Andrea Bravo Balado
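The feature list above can be illustrated with a small sketch. This is not the classifier used in the project: it replaces the machine-learned model with hand-picked integer weights, and the class names, weights and the date interval are all assumptions made for illustration. Only the ship name, captain, ship type and the Dutch sentence come from the Transit example in these slides.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ShipRecord:
    name: str
    captain: str
    ship_type: str
    active_from: date  # start of the interval in which the ship was in service
    active_to: date    # end of that interval


@dataclass
class Article:
    text: str
    published: date


# Hypothetical weights (out of 10); a trained model would learn these instead.
WEIGHTS = {"name": 4, "captain": 3, "type": 1, "date": 2}


def link_features(ship, article):
    """Return which of the simple features fire for this ship/article pair."""
    text = article.text.lower()
    return {
        "name": ship.name.lower() in text,
        "captain": ship.captain.lower() in text,
        "type": ship.ship_type.lower() in text,
        "date": ship.active_from <= article.published <= ship.active_to,
    }


def link_score(ship, article):
    """Sum the weights of all features that fire (0 = no evidence, 10 = all)."""
    features = link_features(ship, article)
    return sum(WEIGHTS[key] for key, present in features.items() if present)


# The dates below are placeholders, not taken from the source.
transit = ShipRecord("Transit", "Schaap", "schoonerschip",
                     date(1850, 1, 1), date(1860, 12, 31))
article = Article(
    "het hier behoorende schoonerschip Transit, kapitein Schaap, "
    "in de Noordzee is gezonken",
    date(1855, 10, 24),
)
unrelated = Article("Mede zijn hier drie vreemde schepen binnengeloopen.",
                    date(1900, 1, 1))

score_match = link_score(transit, article)
score_unrelated = link_score(transit, unrelated)
```

A pair would be accepted as a link when its score passes some threshold; choosing that threshold is exactly the part a trained model handles better than fixed weights.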
Example
[HARLINGEN, 24 October.] . «et gestrande
Zweedsche schip , waarvan wij ons vorig no.
melding maakten , is door de 'eepboot van hier
afgebragt en hier binnengede u BiJ die
gelegenheid werd ons medegeeeid, dat nog vier
vaartuigen op Terschelling aren gestrand.
Tevens is het berigt ontvan°e > dat het hier
behoorende schoonerschip Transit, kapitein
Schaap, in de Noordzee is gezonken, nadat het
achterschip was weggeslagen ; een ligtmatroos
verloor daarbij het leven. Mede zijn hier drie
vreemde schepen met meer en minder zware
averij binnengeloopen.
Spoiler alert! It sank in the North Sea.
Provenance (PROV-O)
• Individual named graphs have provenance information
– Who made it (people/software?)
– Based on what source
– Content confidence
• Matches historical science requirements
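A minimal sketch of how per-graph provenance could be used when querying: each named graph carries statements about who made it, what it was derived from, and how confident we are in its content, and a researcher selects only the graphs that meet a chosen confidence level. prov:wasAttributedTo and prov:wasDerivedFrom are PROV-O terms; the ex: identifiers, the ex:confidence property and the confidence values are assumptions made for this illustration, not the project's actual data.

```python
PROV_ATTRIBUTED = "prov:wasAttributedTo"  # PROV-O: who made it
PROV_DERIVED = "prov:wasDerivedFrom"      # PROV-O: based on what source
CONFIDENCE = "ex:confidence"              # hypothetical property, not part of PROV-O

# Provenance statements about named graphs: (graph, predicate, value).
# Graph names, agents and confidence values are invented for this sketch.
graph_metadata = [
    ("ex:graph/das-voyages", PROV_ATTRIBUTED, "ex:agent/huygens-ing"),
    ("ex:graph/das-voyages", PROV_DERIVED, "ex:source/das"),
    ("ex:graph/das-voyages", CONFIDENCE, 1.0),
    ("ex:graph/newspaper-links", PROV_ATTRIBUTED, "ex:agent/link-classifier"),
    ("ex:graph/newspaper-links", PROV_DERIVED, "ex:source/delpher"),
    ("ex:graph/newspaper-links", CONFIDENCE, 0.7),
]

# Data as quads: (subject, predicate, object, named graph).
quads = [
    ("ex:ship/transit", "rdf:type", "ex:Ship", "ex:graph/das-voyages"),
    ("ex:ship/transit", "dss:hasKBLink", "ex:article/1", "ex:graph/newspaper-links"),
]


def describe(metadata):
    """Group the provenance statements by named graph."""
    info = {}
    for graph, predicate, value in metadata:
        info.setdefault(graph, {})[predicate] = value
    return info


def trusted_graphs(metadata, min_confidence):
    """Return the names of graphs whose confidence is at least min_confidence."""
    return sorted(
        graph
        for graph, props in describe(metadata).items()
        if props.get(CONFIDENCE, 0.0) >= min_confidence
    )


def select(all_quads, graphs):
    """Keep only the quads that live in one of the given named graphs."""
    return [quad for quad in all_quads if quad[3] in graphs]


strict = select(quads, trusted_graphs(graph_metadata, 0.9))
lenient = select(quads, trusted_graphs(graph_metadata, 0.5))
```

Keeping automatically generated links in their own graph means they can be included or excluded as a whole, without touching the curated source data.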
Data analysis and visualisation
Take home
• Linked Data principles are a great fit to digital history requirements
– Heterogeneous models/datasets, light-weight reusable integration
– Multiple levels of normalisation, through separate named graphs
– SW Provenance matches Historical Provenance
• Watch out when you sail your Schooner into the North Sea
DIVE INTO THE EVENT-BASED BROWSING OF LINKED HISTORICAL MEDIA
VICTOR DE BOER, JOHAN OOMEN, OANA INEL, LORA AROYO, ELCO VAN STAVEREN, WERNER HELMICH AND DENNIS DE BEURS
DIGITAL HUMANITIES
RESEARCHERS
Media researcher Lars Arve Røssland of the University of Bergen. (Photo: Andreas R. Graven) https://www.flickr.com/photos/drainrat/14779928998/
EXPLORATIVE SEARCH
Erp, M. van; Oomen, J.; Segers, R.; Akker, C. van de; Aroyo, L.; Jacobs, G.; Legêne, S.; Meij, L. van der; Ossenbruggen, J.R. van; Schreiber, G. Automatic Heritage Metadata Enrichment with Historic Events. Museums and the Web 2011. http://www.museumsandtheweb.com/mw2011/papers/automatic_heritage_metadata_enrichment_with_hi
DATA: OPENIMAGES.EU
Open videos Netherlands Institute for Sound and Vision
3000, mostly news broadcasts
DATA: DELPHER.NL
Scans of Radio bulletins (hand annotated)
• 1937 – 1984
• 1.5 million, OCR'ed and with named entities recognised (NER)
ENTITY EXTRACTION
• CROWDTRUTH.ORG
ENTITY EXTRACTION
EVENTS CROWDSOURCING AND LINKING TO CONCEPTS THROUGH CROWDTRUTH.ORG
SEGMENTATION & KEYFRAMES
LINKING EVENTS AND CONCEPTS TO KEYFRAMES
SIMPLE EVENT MODEL (SEM), OPENANNOTATION (OA) AND SKOS
DIVE:MEDIA OBJECT
SEM:EVENT
SEM:PLACE
SEM:TIME
SEM:ACTOR
SKOS:CONCEPT
OA:ANNOTATION
• LINKS TO EUROPEANA (MULTILINGUAL)
• LINKS TO DBPEDIA
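A minimal sketch of event-based browsing over such a model: media objects are connected to events through annotations, so moving from one media object to another means following an annotation to its event and then following other annotations on that event back to media. The use of oa:hasTarget and oa:hasBody to connect media objects and events is an assumption about how the pieces fit together, and every ex: identifier is invented for illustration.

```python
# Triples as (subject, predicate, object); all ex: identifiers are made up.
triples = [
    ("ex:event/1", "rdf:type", "sem:Event"),
    ("ex:event/1", "sem:hasPlace", "ex:place/1"),
    ("ex:event/1", "sem:hasActor", "ex:actor/1"),
    ("ex:event/2", "rdf:type", "sem:Event"),
    # Annotations tie a media object (target) to an event (body).
    ("ex:anno/1", "oa:hasTarget", "ex:media/video-1"),
    ("ex:anno/1", "oa:hasBody", "ex:event/1"),
    ("ex:anno/2", "oa:hasTarget", "ex:media/bulletin-7"),
    ("ex:anno/2", "oa:hasBody", "ex:event/1"),
    ("ex:anno/3", "oa:hasTarget", "ex:media/video-2"),
    ("ex:anno/3", "oa:hasBody", "ex:event/2"),
]


def bodies_of(data, media):
    """Events (or concepts) that annotations attach to a media object."""
    annotations = {s for s, p, o in data if p == "oa:hasTarget" and o == media}
    return {o for s, p, o in data if p == "oa:hasBody" and s in annotations}


def media_for(data, entity):
    """Media objects annotated with a given event (or concept)."""
    annotations = {s for s, p, o in data if p == "oa:hasBody" and o == entity}
    return {o for s, p, o in data if p == "oa:hasTarget" and s in annotations}


def related_media(data, media):
    """Other media objects that share at least one event with `media`."""
    related = set()
    for entity in bodies_of(data, media):
        related |= media_for(data, entity)
    related.discard(media)
    return sorted(related)


next_steps = related_media(triples, "ex:media/video-1")
```

The same two-hop walk works for places, actors and concepts, which is what lets a browsing session continue indefinitely from one object to the next.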
INFINITY OF EXPLORATION
https://www.flickr.com/photos/mibuchat/2774251415
https://www.flickr.com/photos/benjcarson/245171885
DIGITAL SUBMARINE UI