Scanning woes and war stories

Upload: toke-eskildsen

Post on 29-Jan-2018

Toke Eskildsen [email protected] ELAG 2014 – Scanning woes and war stories - 1/28

Scanning woes and war stories

ELAG 2014

Toke Eskildsen IT nerd (boss says “System Architect”)

State and University Library

Denmark

“Everything online in 2020”

- Vision

We would like plentiful, raw, visible, solid pixels

Zoom

Not like this! Like this!

Histogram

Reference

Adjust Color Levels

That's a nice scan!

A shame it was sharpened

Halos around text indicate sharpening

Reference

But this one seems fine?

Sharpened and JPEG compressed

Square areas and localized noise indicate JPEG compression

Reference
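
One rough way to look for the square areas numerically is to compare brightness jumps across 8-pixel boundaries with jumps everywhere else. This is only a sketch: it assumes an 8-bit greyscale image given as a list of pixel rows, assumes the 8x8 JPEG block grid starts at the left edge (cropping would shift it), and looks at horizontal neighbours only. The function name `blockiness` is made up for this example.

```python
def blockiness(rows, block=8):
    """Ratio of the mean jump across block boundaries to the mean jump elsewhere.

    rows: list of pixel rows, each a list of grey values.
    A ratio near 1.0 means boundaries look like any other pixel pair;
    a clearly larger ratio hints at a visible block grid.
    """
    boundary_sum = boundary_n = 0
    inner_sum = inner_n = 0
    for row in rows:
        for x in range(1, len(row)):
            diff = abs(row[x] - row[x - 1])
            if x % block == 0:      # pixel pair straddles a block boundary
                boundary_sum += diff
                boundary_n += 1
            else:                   # pixel pair inside one block
                inner_sum += diff
                inner_n += 1
    if boundary_n == 0 or inner_n == 0:
        return 1.0                  # image too narrow to say anything
    boundary_mean = boundary_sum / boundary_n
    inner_mean = inner_sum / inner_n
    if inner_mean == 0:
        return float("inf") if boundary_mean > 0 else 1.0
    return boundary_mean / inner_mean
```

Any threshold on the ratio would have to be tuned against known-good reference scans; printed text itself produces sharp edges that can raise or lower the ratio by chance.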

Lossless! We promise!

Lossless workflow

A chain is only as strong...

Lossless! We promise!

JPEG

This one? Please?

JPEG 2000 compression

The signs of lossy JPEG 2000 compression are best learned from multiple examples

Reference

Burnout

Sharp spikes at either end of the histogram indicate burnout

Reference
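
The spike check can be sketched in a few lines. The example assumes 8-bit grey values in a flat list and calls it a spike when the darkest or brightest level holds several times more pixels than the level right next to it. The name `burnout_spikes` and the factor of 5 are assumptions picked for illustration, not values from the talk.

```python
from collections import Counter

def burnout_spikes(pixels, spike_factor=5.0, levels=256):
    """Return (dark_spike, bright_spike) for a flat list of grey values.

    An end of the histogram counts as a spike when its bin holds more than
    spike_factor times as many pixels as the neighbouring bin.
    """
    counts = Counter(pixels)
    # max(..., 1) avoids treating a single stray pixel next to an empty bin as a spike of infinite height
    dark_spike = counts[0] > spike_factor * max(counts[1], 1)
    bright_spike = counts[levels - 1] > spike_factor * max(counts[levels - 2], 1)
    return dark_spike, bright_spike
```

A scan with many fully black pixels but a smooth bright end would give `(True, False)`. A page with large genuinely black or white areas can trigger the check too, so a hit is a reason to look at the scan, not a verdict.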

Burnout - visualization

But we need the dark to read!

No you don't!

Visualisation of ALTO-OCR files: https://github.com/tokee/quack

¦ “. i N ¦ M i; sk s, 011 el en al vei dens -t oi - te kom ei ner milen t'i bi -i i. 1 1 id iiiip .il -ukkersvg, I i ; \\ . : i .i i mod! , 1 1 km ikui i ene i !i;i • 1. t.v U.ilisk kali! Dit ei l u 't eksi isk.a bet /.> . i : 1 1 'I I l'b.! : man 'lit le.l b der ledes at Lundbecks tulbpeie forsk mi I I i 1 ' kt ' '1 F.va Sti 11 iess m i m ! o! i i . ! i , v! I ; vende l.epemiddel • ¦ ! a a 1. 1 >!:' a ! t \\ p. ¦ 2 tliahvtcs ; n 'i i te '. oi st, ¦ Klm.sKe i ol sop I n slik kel s vpe pat len! er Di amerikanske 1 1 1 1 1 1 1 i , F! >A il pi ve";

ABBYY FineReader 10.5

Some software upgrades matter!

Novo Nordisk, som er en af verdens største koncerner inden for behandling af sukkersyge, bliver nu mødt af konkurrence fra en ny dansk kant. Det er biotekselskabet Zea - land Pharmaceuticals, der ledes af Lundbecks tidligere forsk - ningsdirektør Eva Steiness, som fører et nyt lovende lægemiddel til behandling af type 2-diabetes frem til de første kliniske forsøg på sukkersyge-patienter. De amerikanske sundheds - myndigheder, FDA, har givet

ABBYY FineReader 11

Post-processing

Holes in the histogram indicate level, exposure or contrast adjustments
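
Counting the holes is simple enough to sketch. The example assumes 8-bit grey values in a flat list and reports every level that is unused although darker and brighter levels are present. `histogram_holes` is a name invented here.

```python
def histogram_holes(pixels, levels=256):
    """Return the unused grey levels lying between the darkest and brightest used level."""
    counts = [0] * levels
    for value in pixels:
        counts[value] += 1
    used = [level for level, n in enumerate(counts) if n > 0]
    if not used:
        return []
    return [level for level in range(used[0], used[-1] + 1) if counts[level] == 0]

# Stretching 150 grey levels over the full 0-255 range leaves gaps behind.
original = list(range(50, 200))
stretched = [(v - 50) * 255 // 149 for v in original]
print(len(histogram_holes(original)), len(histogram_holes(stretched)))
```

A small image or one with very few tones has holes for innocent reasons, so the count only means something on full-size scans of ordinary material.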

Beware: Post-processing + JPEG

Post-processing indicators become less distinct when the image is JPEG compressed

Reference

Notice the lines?

300 DPI @ 20x enlargement
300 DPI @ 19x enlargement

Every other line

Time for scanner calibration

G E N E R A L C A M E R A S E T T I N G S:

Camera Model No.: P2-20-08K40
Camera Serial No.: 11041074
Camera Network ID: 0
Network Message Mode: disabled

Firmware Design Rev.: 03-081-20017-01 Aug 29 2007
DSP Design Rev.: 03-056-20013-00

SETTINGS FOR UNCALIBRATED MODE:

Analog Gain (dB): +0.0 +0.0
Analog Offset: 634 630

SETTINGS FOR CALIBRATED MODE:

Analog Gain (dB): -0.4 -0.5
Analog Offset: 624 630
Digital Offset: 0 0
Calibration Status: FPN [uncalibrated] PRNU [calibrated]

SETTINGS COMMON TO CALIBRATED AND UNCALIBRATED MODES:

System Gain: 0 0
Background Subtract: 0 0

Pretrigger: 0
Number Of Line Samples: 32
Video Mode: calibrated
Data Mode: 0
Exposure Mode: 4

SYNC Frequency: external (9398.09) Hz
Exposure Time: external

End-Of-Line Sequence: on
Upper Threshold: 240
Lower Threshold: 15
Region Of Interest: 0001 - 8192

OK>

Systematic alternating lines indicate that the scanner should be calibrated
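
A quick numeric check for the alternating pattern is to average each line and compare the even-numbered lines with the odd-numbered ones. The sketch assumes a greyscale image as a list of pixel rows; whether the stripes run along rows or columns depends on the scanner, so both directions are worth testing. The function names are made up for this example.

```python
def line_means(lines):
    """Mean grey value of each line (a line is a sequence of pixel values)."""
    return [sum(line) / len(line) for line in lines]

def alternating_offset(means):
    """Mean of even-indexed lines minus mean of odd-indexed lines.

    Close to zero for ordinary content; a consistent non-zero offset across
    many scans suggests every other line is read slightly brighter or darker.
    """
    even = means[0::2]
    odd = means[1::2]
    if not even or not odd:
        return 0.0                  # fewer than two lines: nothing to compare
    return sum(even) / len(even) - sum(odd) / len(odd)

rows = [[100] * 4, [104] * 4, [100] * 4, [104] * 4]
row_offset = alternating_offset(line_means(rows))
# zip(*rows) transposes the image so the same check runs on columns
column_offset = alternating_offset(line_means(list(zip(*rows))))
print(row_offset, column_offset)
```

On a single page the offset can be non-zero by coincidence, for example with ruled paper, so it is the same sign and size showing up on page after page that points at the scanner.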

Last one is tricky

Zoom 9000

New tool – Grid lines

Upscaling

I'm working on a small tool for detecting scaling. Very alpha: https://github.com/tokee/telltale

Are your scans just fine?

Toke Eskildsen, Statsbiblioteket

http://en.statsbiblioteket.dk/newsdigi

http://[email protected]

@TokeEskildsen