You ask, we scan
DESCRIPTION
This is a copy of the presentation given by Ellen Fleurbaay and Marc Holtman of the Amsterdam City Archives at the MARAC Plenary Session in Jersey City on Friday, October 30, 2009.
TRANSCRIPT
You ask, we scan
MARAC Conference October 30 2009
The Amsterdam City Archives
and the Archiefbank
This morning
• Ellen: Amsterdam City Archives
  – a new Service Concept
• Marc: large scale scanning
  – on request, by order and subsidised projects
  – economic principles and workflow
• Ellen: Mission accomplished
  – government and customers satisfied
City Archives 1848 - 2009
Growing FAST
- 5.000 different archives
- 15 repositories with 20 miles of shelf-length
- 91.000 prints, maps and drawings
- 824.000 photos
- 372.000 reference books
- 16.000 video and audio tapes
BUT…
Fewer visitors each year
Visitors
Year   Reading rooms
1982   24.027
1988   29.788
1992   27.738
1998   26.598
2002   25.014
2006   17.958
Archives are dusty
And…
MORE web visitors
Visitors
Year   Reading rooms   Website
1982   24.027
1988   29.788
1992   27.738
1998   26.598          40.048
2002   25.014          224.050
2006   17.958          512.592
New Service Concept
1. We want visitors to come to the archives
– to experience the look and feel of authentic archival documents
– to teach them the pleasure of doing their own historical research
2. Everybody should be able to use all archival collections at home 24/7
How to attract visitors?
To experience the look and feel of an archive
– be where the visitors are: in the city centre
– new corporate identity, new name, new logo: City Archives
– new products: Museum Night, historical building, open at weekends on Saturday and Sunday
How to attract visitors?
To experience the pleasure of research
- New reading room formula: use the reference library, use the internet, no silence please, do discuss with your fellow researchers
- Staff walk around and offer free assistance
- Staff are trained in educational and social skills
How to create an internet reading room?
All documents online?
– do not think about the 20 miles in your repository, think about the few thousand documents your customers use per week
Realistic and economic principles
– estimate the costs of the complete process, not just the costs of scan production
– Dutch legislation: consulting the original is free, reproduction is paid for, so scans are to be paid for
You ask, we scan
Scanning on customer’s request, economic principles, technical issues and work process
You ask: scanning on customer’s request, economic principles
We scan: image quality and workflow principles
We store: compression and file size
We do: workflow, tools and practical issues
Q. How many scans can be made from 20 miles of archives?
1 foot = 2.000 scans
A. 739.200.001 scans
Q. How long does it take to scan it all?
Production = 10.000 scans a week
A. 406 years
Will this be our ultimate solution?
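The 406 years can be checked with a short calculation. This is only a sketch: it assumes 5.280 feet per mile and a production year of 52 weeks, neither of which is stated on the slide, and the function name is illustrative. With the slide's inputs (20 miles, 2.000 scans per foot, 10.000 scans a week) it gives roughly 211 million scans, so the larger scan total quoted on the slide presumably rests on a different scans-per-foot estimate.

```python
FEET_PER_MILE = 5280      # assumption: statute miles
WEEKS_PER_YEAR = 52       # assumption: production runs every week of the year


def scanning_backlog(shelf_miles, scans_per_foot, scans_per_week):
    """Return (total_scans, years_needed) for scanning a whole repository."""
    total_scans = shelf_miles * FEET_PER_MILE * scans_per_foot
    weeks_needed = total_scans / scans_per_week
    return total_scans, weeks_needed / WEEKS_PER_YEAR


total, years = scanning_backlog(shelf_miles=20, scans_per_foot=2000, scans_per_week=10000)
print(f"{total:,} scans, about {years:.0f} years")
```

Whatever the exact total, the outcome is the same: scanning everything is not a realistic plan, which is why requests from users set the priorities.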
You ask
1. Scanning at customer’s request
We let our users set priorities in digitization
All archive files can be requested for digitization via the online finding aids
In principle all requests are honored, unless:
– the item cannot be digitized for material reasons
– the material is under copyright
– disclosure restrictions apply
We speak of a request for digitization and not of an order
The user doesn’t commit to anything by placing a request, but neither does the archive
The requester is not obliged to buy all scans
All scans made are available for all users
Scans available are integrated in the online finding aids
Costs for purchasing scans are equal for all users (the more you buy, the cheaper it gets)
You ask
2. Low costs
Customers think a low price is important
Archival research easily runs into the use of dozens to hundreds of documents
100 scans should not cost $ 100
The price of an ordinary copy in our reading room should be the benchmark
The costs of purchasing scans online should be competitive with the travel costs of visiting our reading room
This means that the costs for producing and storing scans have to be as low as possible
This asks for a streamlined, efficiently organized work process
You ask
3. Fast delivery
Digitization takes time, but research should not have to be planned weeks ahead
Delivery time in a scanning on request service should be as short as possible
Aim is a delivery time of 2 – 3 weeks
You ask
Conclusion
If we can make sure that
– all finding aids can be selected for digitization by users
– the scans are delivered in a short time
– for low costs
it can be stated that we have no backlog in digitizing, and the objective that the customer is able to consult a digitized item has been achieved
We need:
– an efficiently organized work process
– low incidental and structural costs
We scan
Digitization at the Amsterdam City Archives in general
In this presentation the focus is on large scale digitization at customer’s request
However, scanning on request is only a part of all digitization that takes place in the archives
Besides scanning on request, projects are based on:
– grant money (often on specific topics, like WWII)
– selections of photographs, drawings etc. for the Imagebank (Beeldbank)
– cooperation with Amsterdam district councils and services
We always work on a project basis
Goals of digitization projects vary from access to substitution of the originals
In every project the quality standard and method are set, depending on purpose and type of material
For all projects we have one workflow
We scan
1. At large scale
Large scale production is a prerequisite in order to keep production costs as low as possible
The more scans are made, the lower the price per scan
2. With a constant production
Large scale production can only be organized effectively when constant production is assumed
This way tasks can be planned best and deployment of staff is most efficient
Experience shows that a constant production of 10.000 scans (at customer’s request) each week is achievable
We scan
3. A broad spectrum of document types
Documents that are digitized in this reproduction process can have the following forms:
– small and large size
– bound and loose-leaf entities
– card indexes
– old and modern material
– low and high contrast documents
– text alone, text and image together
– hybrid forms
We scan
4. For archival research from screen or print
Costs for producing and storing scans are determined to a high extent by the quality standard set for the scans
The higher the standard of quality, the higher the costs will be
In order to keep costs low it is prudent to let the standard of quality follow from the requirements the end user places on the scan
Purpose of the scans: archival research using the web, straight from screen or print
Specified (basic) quality standard:
– textual information legible in the originals must be legible in the scans
– reproduction of all significant information
– reproduction of details which are not part of the textual information is not required
A quality higher than that will inevitably push up both incidental and structural costs
But has no added value for the customer at all
We scan
Scan quality and legibility
Example: very “light” original
High quality scan: optimal tonal range, excellent flexibility, poor legibility
Modified scan (contrast): poor tonal range, little flexibility, excellent legibility
Which one would you buy?
Experience shows that what is perceived as “good legibility” is very personal.
We decided to solve this problem with a smart filter in the document viewer.
Skimping on the quality of scans (it can be better) is purely an economic decision, not one taken on principle
We scan
4. For archival research from screen or print
Price comparison scanning costs
Price rates scanning, external partner
High-end                3 – 10 $
Legibility              0,30 – 0,75 $
Legibility, auto-feed   0,05 $
It does make sense to let the standard of quality follow from the purpose the end user has for the scans
We scan
5. For conservation and security
The scans in the scanning on request service are made for the purpose of access / archival research
Not as a substitute for the originals
Nevertheless, digitization does have a real conservation function
After digitization the originals cannot be requested in the reading room anymore
This way damage or loss of the originals is ruled out
Conservation of the originals remains the major concern
We scan
6. Always complete files
A file can contain from one to hundreds of documents
By definition the entire file is scanned
Never just a selection of pages
There are a few reasons for this:
– the costs for scanning are not so much a factor of quantity, but rather of the manual processing involved in it
– in the originals or the metadata it would have to be indicated which documents have been digitized
– when shown in the Archiefbank, the user expects completeness
– when non-scanned pages have to be digitized later, the entire preparation process has to be gone through once again
We scan
7. Contracting out the scanning to external partners
Contracting out of scanning was a logical choice
The in-house scan facilities are not designed for large-scale digitizing
The complexity of the workflow and material to be scanned calls for:
– specialized hardware and software
– specialized set-ups
– knowledge
– a very complex technical infrastructure
Investing only makes sense with very high production, organized on a large scale
There are many scanning companies
Most do have experience in bulk processing
But not in this degree of complexity and diversity
Contracting out scanning is more than awarding a contract to a supplier
This calls for intensive collaboration
Also, the workflows of archive and digitizer have to dovetail
We store
Scans with a file size as small as possible
Storage costs are still considerable when producing large quantities of scans
In order to bring structural costs down, the file size of the scans has to be as low as possible
This can be achieved in three ways
1. Skimping on resolution
2. Skimping on bit depth / number of colors (only possible in formats like TIFF and PNG)
3. Using (lossless or lossy) compression on the files
We use a combination of 1 and 3
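The effect of the first two options can be estimated before any compression is applied, because the uncompressed size of a scan is simply pixels times bytes per pixel. The sketch below assumes an A4-sized original (8,27 x 11,69 inch), which is not a figure from the presentation, and uses decimal megabytes; the function name is illustrative.

```python
def uncompressed_size_mb(width_in, height_in, dpi, bits_per_pixel=24):
    """Uncompressed image size in MB (1 MB = 1,000,000 bytes)."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel / 8 / 1_000_000


# Assumed example document: one A4 page.
a4_300 = uncompressed_size_mb(8.27, 11.69, dpi=300)
a4_200 = uncompressed_size_mb(8.27, 11.69, dpi=200)
a4_gray = uncompressed_size_mb(8.27, 11.69, dpi=300, bits_per_pixel=8)

print(f"300 dpi, 24 bits: {a4_300:.1f} MB")
print(f"200 dpi, 24 bits: {a4_200:.1f} MB")
print(f"300 dpi,  8 bits: {a4_gray:.1f} MB")
```

Going from 300 to 200 dpi cuts the size to about 4/9, since resolution counts in both directions; going from 24 to 8 bits cuts it to a third.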
We store
Scans with a file size as small as possible
Resolution, compression and legibility: an example
300 dpi, high quality JPEG
200 dpi, low quality JPEG
We store
Scans with a file size as small as possible
Storage of compressed files as master images was “not done”
The main arguments were:
– when using lossy compression you’ll lose information
– compressed files are more vulnerable (preservation)
But no compression means: large files, high storage costs
Research into these arguments showed:
– even when using strong lossy compression, legibility is still guaranteed
– compressed files are not more vulnerable to loss than uncompressed files
Storage of uncompressed files is not necessary
We store
Scans with a file size as small as possible
Comparison between file format, compression, resolution and file size
Filesize
Format     Compression   Type       Resolution   Color     Avg       500.000   %
TIFF       No            ---        300 dpi      24 bits   22,1 MB   11 TB     100%
JPEG       Qua (ps) 12   Lossy      300 dpi      24 bits   7,5 MB    3,7 TB    34%
JPEG       Qua (ps) 10   Lossy      300 dpi      24 bits   2,1 MB    1,1 TB    10%
JPEG       Qua (ps) 4    Lossy      200 dpi      24 bits   255 KB    124 GB    1,1%
JPEG       Qua (ps) 10   Lossy      400 dpi      24 bits   3,3 MB    1,6 TB    15%
JPEG2000   Part 1        Lossless   300 dpi      24 bits   12 MB     6 TB      55%
JPEG2000   Part 6        Lossy      300 dpi      24 bits   120 KB    59 GB     0,5%
Examples shown: TIFF uncompressed, JPEG (ps) 10, JPEG (ps) 4, JPEG2000 lossless
We store
Scans with a file size as small as possible
Comparison storage costs
Storage of 500.000 images; avg size per scan uncompressed = 22,1 MB
File format              Storage   Costs 1 year   Costs 10 years
TIFF uncompressed        11 TB     $ 77.000       $ 770.000
JPEG 10                  1,1 TB    $ 7.700        $ 77.000
JPEG 4 (200 dpi)         124 GB    $ 868          $ 8.680
JPEG 2000 (part 1, ll)   6 TB      $ 42.000       $ 420.000
Price rate: $ 7.000 per TB per year, storage in a controlled e-repository environment on two separate locations, including IT costs (NLD, nov 2009)
(File)size still does matter!
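The storage costs follow directly from the average file size, the number of scans and the price rate. A minimal sketch of that calculation, assuming decimal units (1 TB = 1.000.000 MB) and a flat yearly rate; the function and variable names are illustrative. The slide rounds the storage volume before multiplying, so its amounts differ slightly from the unrounded results here.

```python
PRICE_PER_TB_YEAR = 7000   # USD, the price rate quoted on the slide
SCANS = 500_000            # number of images in the comparison


def storage_cost(avg_mb_per_scan, years=1, scans=SCANS):
    """Return (volume_in_tb, cost_in_usd) for storing `scans` images."""
    volume_tb = avg_mb_per_scan * scans / 1_000_000   # assumption: decimal units
    return volume_tb, volume_tb * PRICE_PER_TB_YEAR * years


for label, avg_mb in [
    ("TIFF uncompressed", 22.1),
    ("JPEG 10", 2.1),
    ("JPEG 2000 lossless", 12.0),
]:
    volume, cost = storage_cost(avg_mb)
    print(f"{label:20s} {volume:6.2f} TB  $ {cost:,.0f} per year")
```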
Developing the reproduction process
A streamlined, standardized process is indispensable when digitizing on a large scale
Projects with different goals, document types and partners take place at the same time
Guidelines and best practices often take no account of these complex factors and the number of scans to be produced
We developed a process in which large scale and flexibility are starting points
All digitization projects follow this process
1. Requesting digitization
2. Providing order number(s)
3. Assessing the originals
4. Preparing the originals
5. Transport
6. Scanning
7. Assigning filenames
8. Transport
9. Checking originals
10. Checking scans
11. Checking metadata
12. Originals back to repository
13. Import in controlled storage system
14. Import in metadata system
15. Export scans
16. Export metadata
17. Import scans
18. Import metadata
We Do
Developing the reproduction process
For all projects, at any moment, it has to be clear:
– what the current status is of each unit to be digitized
– where each unit can be located
– what current and succeeding tasks are to be performed on each unit
This asks for workflow management with a user-friendly application
We developed a simple, but effective workflow application in-house
In the following slides we focus on the weekly production of 10.000 scans in the digitizing on request service
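The three questions (status, location, next task) map onto a very small data model. The sketch below is not the in-house application, which the presentation does not show; the class, field names and example locations are hypothetical, and only the list of eighteen steps comes from the process described in this presentation.

```python
from dataclasses import dataclass

STEPS = [
    "Requesting digitization", "Providing order number(s)",
    "Assessing the originals", "Preparing the originals", "Transport",
    "Scanning", "Assigning filenames", "Transport", "Checking originals",
    "Checking scans", "Checking metadata", "Originals back to repository",
    "Import in controlled storage system", "Import in metadata system",
    "Export scans", "Export metadata", "Import scans", "Import metadata",
]


@dataclass
class Unit:
    """One unit to be digitized (hypothetical model)."""
    order_number: str
    location: str
    step: int = 1  # 1-based position in STEPS

    def status(self):
        return STEPS[self.step - 1]

    def next_task(self):
        return STEPS[self.step] if self.step < len(STEPS) else None

    def advance(self, new_location=None):
        if self.step >= len(STEPS):
            raise ValueError("unit has already completed the process")
        self.step += 1
        if new_location is not None:
            self.location = new_location


unit = Unit(order_number="A20758", location="repository")
unit.advance()
print(unit.order_number, "|", unit.status(), "|", unit.location, "| next:", unit.next_task())
```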
We Do
1. Requesting digitization
Production of 10.000 scans on a weekly basis
All public files can be requested for digitization via the finding aids in the Archiefbank
Just by clicking on the “digitize” button
We Do
2. Providing order numbers
A unit to be digitized must be identifiable at each step of the handling process
The units therefore get a unique, meaningless order number
The order number is provided by the metadata management system and is the basis for:
– communication with the digitizer
– scanning
– assigning filenames
– registration of filenames
– billing by the digitizer
In practice: all units to be digitized get an order ticket
We Do
3. Assessing the originals
The workflow system generates a list of all originals to assess from the repositories
The list is sorted on repository / shelf to make retrieval efficient
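Sorting the list on repository and shelf is a plain two-key sort. A small sketch with made-up data: the field names, repository and shelf numbers and all order numbers except A20758 are hypothetical.

```python
from operator import itemgetter

# Hypothetical open requests; only the order number A20758 appears in the presentation.
requests = [
    {"order": "A20758", "repository": 3, "shelf": 12},
    {"order": "A20759", "repository": 1, "shelf": 40},
    {"order": "A20760", "repository": 3, "shelf": 2},
    {"order": "A20761", "repository": 1, "shelf": 7},
]


def pick_list(open_requests):
    """Order requests so each repository is walked once, shelf by shelf."""
    return sorted(open_requests, key=itemgetter("repository", "shelf"))


for item in pick_list(requests):
    print(item["repository"], item["shelf"], item["order"])
```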
We Do
4. Checking the originals
All assessed originals are stored in a special room
In this room all checks are executed
We Do
4. Checking the originals
A rough check of the originals takes place
Information about the originals in our management systems is not always complete
A. Content
Copyrights, disclosure (publicity), privacy
If an item falls into one of these categories the request is rejected
B. Condition of the material
Items that are in such a condition that digitizing or transport could cause damage, or that are packaged in a way that makes scanning in conventional set-ups impossible, do not qualify for the standard way of digitization
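The two checks lead to three possible outcomes, which can be written down as a simple decision rule. This is only an illustration of the rule as described here; the field names and outcome labels are hypothetical, and in practice the checks are made by staff inspecting the originals.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """Result of the rough check of one item (hypothetical field names)."""
    copyright_protected: bool = False
    disclosure_restricted: bool = False
    privacy_sensitive: bool = False
    fragile: bool = False               # digitizing or transport could cause damage
    unsuitable_packaging: bool = False  # cannot be scanned in conventional set-ups


def decide(check):
    # A. Content: any of the three categories means the request is rejected.
    if check.copyright_protected or check.disclosure_restricted or check.privacy_sensitive:
        return "rejected"
    # B. Condition: the item does not qualify for the standard way of digitization.
    if check.fragile or check.unsuitable_packaging:
        return "not suitable for standard digitization"
    return "standard digitization"


print(decide(Assessment()))
print(decide(Assessment(privacy_sensitive=True)))
print(decide(Assessment(fragile=True)))
```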
We Do
4. Checking the originals
Material preparation is limited to the bare minimum
We do:
– staples are removed as a rule
– small repairs are carried out by our restoration staff
We don’t:
– the sequence of the originals as found in the repository is not checked or altered
– the originals are not numbered
We Do
Not number the originals
Numbering the originals has one advantage:
– the completeness of the scans (compared to the originals) can be guaranteed
But this is only true when the numbering tallies exactly, because:
– a missing number in a sequence of scans leads to the conclusion that one original has not been scanned
– numbers that are assigned twice lead to illogical end numbers (100 scans: scan 100 has been numbered as 99)
Experiments with numbering showed that in practice faultless numbering cannot be achieved
We Do
Not number the originals
Securing completeness can be realized by other means:
– comparing scans to originals 1:1 after digitization
– scanning the originals twice
Low quality: # scans = 365
High quality master files: # scans = 365
We Do
5. Transport
For secure transport, special flight cases are used
We Do
6. / 7. Scanning and assigning filenames
After scanning, the scan operator or data manager has to assign filenames to the scans
It has to be perfectly clear which filenames these should be
Filenames are the key between scans and metadata
As a rule filenames contain no meaningful information
Because, when the meaning changes, filenames should change too
Assigning filenames at City Archives Amsterdam
Customer request, management systems, order ticket
Archive 195, File 836, Order: A20758
Filename: 12 digits
First 6#: ordernr
Last 6#: serial nr
Range: A20758000001 – A20758999999
Scanning the order, scan report:
A20758000001
A20758000002
A20758000003
A20758000004
A20758000005
Registration filenames, import
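The naming scheme is easy to express in code: the six-position order number followed by a six-digit serial number. The sketch below assumes that every order number has the same shape as the one example shown (one capital letter plus five digits); that pattern and the function names are assumptions.

```python
import re

ORDER_RE = re.compile(r"[A-Z]\d{5}")               # assumed shape, based on "A20758"
FILENAME_RE = re.compile(r"([A-Z]\d{5})(\d{6})")   # order number + serial number


def make_filename(order_number, serial):
    """Build the 12-position filename for scan `serial` of an order."""
    if not ORDER_RE.fullmatch(order_number):
        raise ValueError(f"unexpected order number: {order_number!r}")
    if not 1 <= serial <= 999_999:
        raise ValueError("serial number must be between 1 and 999999")
    return f"{order_number}{serial:06d}"


def split_filename(filename):
    """Return (order_number, serial) for a filename, or raise ValueError."""
    match = FILENAME_RE.fullmatch(filename)
    if match is None:
        raise ValueError(f"not a valid filename: {filename!r}")
    return match.group(1), int(match.group(2))


scan_report = [make_filename("A20758", n) for n in range(1, 6)]
print(scan_report)
print(split_filename("A20758000005"))
```

Because the filename carries nothing but the order number and a counter, it never has to change when the description of the file in the finding aid changes.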
We Do
10. 11. Checking scans and metadata
Scans and metadata are checked efficiently
Where possible checks are automated
An application from which all checks can be executed is in development
Basic checks
Check                  Method
Viruses                Virus checker
Data integrity         MD5 checksum comparison
File format validity   JHOVE
Quality scans          Visual check reference scans
                       Visual check production scans
Completeness           Depends on project
Filenames              Script
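The data integrity check compares the MD5 checksum computed on receipt with the checksum supplied with the delivery. A minimal sketch using Python's standard `hashlib`; the function names, the `.jpg` extension and the stand-in file content are assumptions, and the scripts actually used at the archive are not shown in the presentation.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 checksum of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_md5):
    """True when the file still matches the checksum delivered with it."""
    return md5_of_file(path) == expected_md5.lower()


# Demonstration with a stand-in file instead of a real scan.
with tempfile.TemporaryDirectory() as folder:
    scan = os.path.join(folder, "A20758000001.jpg")
    with open(scan, "wb") as handle:
        handle.write(b"stand-in for image data")
    delivered_checksum = md5_of_file(scan)
    ok_before = is_intact(scan, delivered_checksum)

    with open(scan, "ab") as handle:   # simulate damage during transport
        handle.write(b"!")
    ok_after = is_intact(scan, delivered_checksum)

print(ok_before, ok_after)
```

MD5 is adequate here because the goal is to detect accidental corruption during transfer and storage, not deliberate tampering.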
We Do
13. 14. Import metadata and scans into management systems
After all checks have been approved, scans and metadata are imported into the management systems
The imports are executed automatically, on the basis of scripts and standard protocols for file transfer
After import the “order for digitization” of each unit is completed
18. Import metadata into the website
- The website is hosted at an external location
- Metadata are uploaded to the webserver by simple HTTP transfer, from any workstation at the archive, directly via the CMS of the website
- After import, the metadata are optimized for the search system
- For the exchange of finding aids we use EAD
17. Import scans into the website: transport medium
- The bandwidth of the internet connections at the archive is still too small for direct SFTP (or similar) upload of large quantities of scans to the webserver
- It seems likely that this will change in the near future
- Until then, scans are transported on portable USB hard disks
17. Import scans into the website: import
- After the hard disk is connected to the server, the import process starts
- Some basic checks are executed on the scans
- Derivatives are made for thumbnails and for the zoom / contrast functionality
Request complete!
- When both scans and metadata have been imported, an email is sent automatically to the requester of the digitization
- This email contains a link to the finding aid and thumbnails on the website
- The happy customer: the requester can decide whether to buy the scans or not
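The notification rule described on this slide (an email only once both scans and metadata are imported, containing a link to the finding aid) might look roughly like the sketch below. It uses Python's standard `email.message.EmailMessage`. The sender address, the subject wording, the message text and the function name `build_notification` are all hypothetical and not taken from the Archiefbank.

```python
from email.message import EmailMessage
from typing import Optional

SENDER = "archive@example.org"  # hypothetical sender address


def build_notification(
    requester_email: str,
    order_nr: str,
    finding_aid_url: str,
    scans_imported: bool,
    metadata_imported: bool,
) -> Optional[EmailMessage]:
    """Return the notification email, or None while either import is still pending."""
    if not (scans_imported and metadata_imported):
        return None
    message = EmailMessage()
    message["From"] = SENDER
    message["To"] = requester_email
    message["Subject"] = f"Your digitization request {order_nr} is complete"
    # The body carries the link to the finding aid and thumbnails on the website.
    message.set_content(
        "The scans you requested are now available.\n"
        f"Finding aid and thumbnails: {finding_aid_url}\n"
    )
    return message
```

The function only builds the message; actually sending it (for example with `smtplib`) is left out, since the slides do not describe the mail setup.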
Mission accomplished
1. Government satisfied: number of visitors increased fivefold
2. Management satisfied: costs and funding balance each other
3. Staff satisfied: enjoy their new role
4. Customers satisfied: lots of compliments
Government satisfied
Visitors
Year        Reading rooms  Website
1982        24.027         -
1988        29.788         -
1992        27.738         -
1998        26.598         40.048
2002        25.014         224.050
2006        17.958         512.592
2007        92.678         520.483
2008        118.312        538.483
2009 (3/4)  77.298         531.143
Management satisfied
Costs Archiefbank (2008)
Digitization on request  € 140,000
Web services             € 52,000
Digitization projects    € 200,000
Income Archiefbank (2008)
Digitization on request  € 100,000
Project funding          € 330,350
Government               € 40,000
Customers satisfied
- Registered users: ca. 15.000
- Requests: 10.605
- Scans online: more than 7 million
- Archives Next and Computable awards