

Page 1: The first steps towards a Belgian web archive: a federal strategy · netpreserve.org/ga2019/wp-content/uploads/2019/07/... · 2019-07-23

The first steps towards a Belgian web archive: a federal strategy

WAC – Zagreb

June 6th 2019

Friedel Geeraert, Royal Library (KBR) and State Archives of Belgium

Sébastien Soyez, State Archives of Belgium

Page 2

Overview

The PROMISE project

The strategy

Lessons learnt

Next steps

Page 3

Antwerp

Brussels

Ghent

Louvain-la-Neuve

Leuven

Page 4

I The PROMISE project

AIM: To develop a federal strategy for the preservation of the Belgian web

PARTNERS

TIMING: July 2017 – December 2019

Page 5

I The PROMISE project

Identify best practices in the field of web archiving:

• Literature review
• Interviews with representatives of web archiving initiatives

Page 6

Page 7

II The strategy

Page 8

Selection

● .be, .vlaanderen, .brussels, .gent
● gTLDs (.org, .com) and other ccTLDs (.fr, .nl), IF:
○ websites are registered by Belgians
○ content concerns Belgium or its general affairs
○ web content is created, produced, published, … on the Belgian territory
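The selection rule above can be sketched as a small filter. This is a minimal illustration, not the PROMISE implementation: it assumes that the conditions on the slide are alternatives (any one of them is enough), and the `Website` fields, the `BELGIAN_TLDS` set and the `in_scope` function are hypothetical names. In practice the registrant and content flags would have to come from registry data or from curator review.

```python
from dataclasses import dataclass

# TLDs that the slide lists as Belgian without further conditions.
BELGIAN_TLDS = {"be", "vlaanderen", "brussels", "gent"}


@dataclass
class Website:
    domain: str                       # e.g. "www.example.be"
    registered_by_belgian: bool       # assumption: taken from registrant data
    about_belgium: bool               # assumption: a curator's judgement
    created_in_belgium: bool = False  # created/produced/published on Belgian territory


def top_level_domain(domain: str) -> str:
    """Return the last label of a domain name, lower-cased ('Example.GENT.' -> 'gent')."""
    return domain.rstrip(".").rsplit(".", 1)[-1].lower()


def in_scope(site: Website) -> bool:
    """Return True if the website falls inside the (assumed) Belgian web scope."""
    if top_level_domain(site.domain) in BELGIAN_TLDS:
        return True
    # Other gTLDs/ccTLDs (.com, .org, .fr, .nl, ...) need at least one Belgian link.
    return site.registered_by_belgian or site.about_belgium or site.created_in_belgium


sites = [
    Website("www.example.be", False, False),
    Website("example.com", True, False),
    Website("example.fr", False, False),
]
print([s.domain for s in sites if in_scope(s)])  # ['www.example.be', 'example.com']
```

Checking the TLD first keeps the cheap, automatic part of the rule separate from the conditions that need human or registry input.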

Page 9

Selection

SELECTIVE CRAWLS

● Complete capture of web content

● More frequent captures

BROAD CRAWL

● Sampling of the Belgian web

● Superficial capture

● Collected once a year

● Problem: no access to the full list of Belgian domain names
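The difference between the two crawl types can be sketched as two job definitions. Because no complete list of Belgian domain names is available, the broad crawl below can only sample from whatever candidate list exists. The `CrawlJob` structure, the per-host cap of 50 pages and the 12 selective runs per year are illustrative assumptions, not figures from the project.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CrawlJob:
    seeds: List[str]
    max_pages_per_host: Optional[int]  # None = no cap (complete capture)
    runs_per_year: int


def broad_crawl_job(candidates: List[str], sample_size: int, seed: int = 2019) -> CrawlJob:
    """Broad crawl: a reproducible random sample of known domains, captured
    superficially (small per-host cap) once a year."""
    unique = sorted(set(candidates))   # de-duplicate, stable order
    rng = random.Random(seed)          # fixed seed -> same sample on every rerun
    sample = rng.sample(unique, min(sample_size, len(unique)))
    return CrawlJob(seeds=sample, max_pages_per_host=50, runs_per_year=1)


def selective_crawl_job(curated_seeds: List[str], runs_per_year: int = 12) -> CrawlJob:
    """Selective crawl: curator-chosen seeds, complete capture, more frequent runs."""
    return CrawlJob(seeds=list(curated_seeds), max_pages_per_host=None, runs_per_year=runs_per_year)


broad = broad_crawl_job(["a.be", "b.be", "c.be", "a.be"], sample_size=2)
selective = selective_crawl_job(["www.example.be"])
print(len(broad.seeds), broad.runs_per_year, selective.max_pages_per_host)  # 2 1 None
```

Fixing the random seed makes a yearly sample reproducible, which helps when documenting how a broad crawl was composed.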

Page 10

● Special heritage collections
● Contemporary collections
● Spanish, Italian and Portuguese communities

● Federal institutions
● Ministerial cabinets, ministers/secretaries of state
● Public bodies linked to the federal level
● Provinces, regions and communities
● Projects funded by BELSPO

601 websites
928 websites
1416 web pages
37 sections of websites

Page 11

Collecting

● AGR (State Archives) and KBR
● Crawler robots
● Server for collecting
● Descriptive metadata: XML/OCLC
● WARC (Web ARChive) files + technical metadata
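To make the collecting output more concrete, the sketch below serialises a single WARC "response" record by hand, following the record layout of WARC 1.1 (ISO 28500). It is illustrative only: a real crawler writes these records itself, and the function name `warc_response_record` is hypothetical. The Base32 SHA-1 block digest is a common convention for fixity information, shown here as one example of technical metadata stored with the captured content.

```python
import base64
import hashlib
import uuid
from datetime import datetime, timezone


def warc_response_record(target_uri: str, http_response: bytes, captured: datetime) -> bytes:
    """Serialise one WARC/1.1 'response' record: header fields, a blank line,
    the content block (the raw HTTP response), then the CRLF CRLF terminator."""
    sha1_b32 = base64.b32encode(hashlib.sha1(http_response).digest()).decode("ascii")
    fields = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", captured.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("WARC-Block-Digest", f"sha1:{sha1_b32}"),      # fixity information
        ("Content-Type", "application/http;msgtype=response"),
        ("Content-Length", str(len(http_response))),    # block length in bytes
    ]
    header = "WARC/1.1\r\n" + "".join(f"{name}: {value}\r\n" for name, value in fields) + "\r\n"
    return header.encode("utf-8") + http_response + b"\r\n\r\n"


http_response = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html><body>...</body></html>"
record = warc_response_record(
    "https://www.example.be/", http_response, datetime(2019, 6, 6, 10, 0, tzinfo=timezone.utc)
)
print(record.split(b"\r\n")[0])  # b'WARC/1.1'
```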

Page 12

Quality control

Lack of automated tools

Prototype visual correspondence
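One way to automate part of the quality control is a visual comparison between a screenshot of the live page and one of the archived capture. The sketch below is a simple pixel-difference check, not the project's prototype: it assumes the screenshots are already available as grayscale pixel grids, and the tolerance (16 grey levels) and review threshold (5 % of pixels) are hypothetical values that would need tuning.

```python
from typing import Sequence

# A screenshot as rows of grayscale values (0-255). Producing the screenshots
# (for example with a headless browser) is outside the scope of this sketch.
Screenshot = Sequence[Sequence[int]]


def visual_difference(live: Screenshot, archived: Screenshot, tolerance: int = 16) -> float:
    """Share of pixels whose grey value differs by more than `tolerance`.
    Screenshots of different dimensions are treated as a full mismatch."""
    same_shape = len(live) == len(archived) and all(
        len(a) == len(b) for a, b in zip(live, archived)
    )
    if not same_shape:
        return 1.0
    total = sum(len(row) for row in live)
    if total == 0:
        return 0.0
    differing = sum(
        1
        for row_live, row_arch in zip(live, archived)
        for p, q in zip(row_live, row_arch)
        if abs(p - q) > tolerance
    )
    return differing / total


def needs_manual_review(live: Screenshot, archived: Screenshot, threshold: float = 0.05) -> bool:
    """Flag a capture for a human check when the screenshots diverge too much."""
    return visual_difference(live, archived) > threshold


live = [[0, 0], [0, 0]]
archived = [[0, 0], [0, 255]]  # one broken pixel out of four
print(visual_difference(live, archived), needs_manual_review(live, archived))  # 0.25 True
```

Such a check does not replace manual review; it only narrows down which captures a person should look at first.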

Page 13

Access

USERS

BELGIAN WEB ARCHIVES

Web collection AGR/KBR

Descriptive metadata:
● AGR (XML/EAD)
● KBR (XML/MARC21)
● KBR & AGR (XML/OCLC)
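The slides do not say which access tool is used. As an illustration of how archived captures are commonly looked up, the sketch below keeps a hypothetical in-memory index of capture timestamps per URL and returns the capture closest to a requested date, similar in spirit to Wayback-style replay. URL normalisation and the EAD/MARC21 descriptive records are left out.

```python
import bisect
from collections import defaultdict
from datetime import datetime
from typing import DefaultDict, List, Optional

TS_FORMAT = "%Y%m%d%H%M%S"  # 14-digit timestamps, as in Wayback-style URLs


class CaptureIndex:
    """Minimal in-memory index from URL to its sorted capture timestamps."""

    def __init__(self) -> None:
        self._captures: DefaultDict[str, List[str]] = defaultdict(list)

    def add(self, url: str, timestamp: str) -> None:
        # Fixed-width timestamps sort chronologically as plain strings.
        bisect.insort(self._captures[url], timestamp)

    def closest(self, url: str, wanted: str) -> Optional[str]:
        """Return the capture of `url` nearest in time to `wanted`, or None."""
        stamps = self._captures.get(url)
        if not stamps:
            return None
        i = bisect.bisect_left(stamps, wanted)
        neighbours = stamps[max(i - 1, 0): i + 1]  # at most one before, one after
        target = datetime.strptime(wanted, TS_FORMAT)
        return min(neighbours, key=lambda ts: abs(datetime.strptime(ts, TS_FORMAT) - target))


index = CaptureIndex()
index.add("https://www.example.be/", "20190601000000")
index.add("https://www.example.be/", "20190101000000")
print(index.closest("https://www.example.be/", "20190520000000"))  # 20190601000000
```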

Page 14

Access

● Copyright legislation
● Privacy legislation (GDPR)
● Illegal content
● Law on Archives
● Legal Deposit Law

Page 15

III Lessons learnt

A lot of web pages and Wikipedia

What is useful content?

Selection: from book to web

DNS Belgium

Preservation

Page 16

III Lessons learnt

Setting crawl parameters = trial and error

Quality control = pain point

Web crawling takes time

Estimations: time and cost

Too ambitious?
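The lesson that crawling takes time can be made concrete with a back-of-the-envelope estimate. The model below is a simplified assumption: each host is crawled sequentially with a fixed politeness delay, hosts are processed in parallel batches, and storage grows linearly with an average page size. All parameter values in the example are hypothetical, not the project's figures; `CrawlPlan` is an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class CrawlPlan:
    hosts: int             # number of websites (hosts) to crawl
    pages_per_host: int    # average number of URLs fetched per host
    delay_seconds: float   # politeness delay between two requests to one host
    parallel_hosts: int    # hosts the crawler works on at the same time
    avg_page_kb: float     # average stored size per URL, in kilobytes

    def duration_hours(self) -> float:
        """Hosts are crawled in parallel batches; requests within one host are
        sequential because of the politeness delay."""
        per_host_seconds = self.pages_per_host * self.delay_seconds
        batches = -(-self.hosts // self.parallel_hosts)  # ceiling division
        return batches * per_host_seconds / 3600

    def storage_gib(self) -> float:
        """Uncompressed storage estimate in GiB (1 GiB = 1024 * 1024 KB)."""
        return self.hosts * self.pages_per_host * self.avg_page_kb / (1024 * 1024)


plan = CrawlPlan(hosts=1000, pages_per_host=500, delay_seconds=2.0, parallel_hosts=50, avg_page_kb=100)
print(round(plan.duration_hours(), 1), round(plan.storage_gib(), 1))  # 5.6 47.7
```

In this model doubling the politeness delay doubles the duration, which shows how sensitive time and cost estimates are to a single crawl parameter.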

Page 17

IV Next steps

Web archiving = part of KBR strategy 2019-2021

Access to archived web content

Validation of shared strategy

Recommendations and procedures

October 18th – Colloquium ‘Saving the web: the promise of a Belgian web archive’

Page 18

Thank you!

Friedel Geeraert

Researcher on PROMISE project