Second Look at Google Books
Ryan James
University of Hawaii at Manoa
[email protected]
Legibility
Methodology
• 50 randomly selected books
  – Random.org
  – OED
• 2,500 pages examined for legibility errors
  – First fifty pages, excluding prefatory material
• Major errors = loss of information
• Minor errors = difficult to read information
Summary of Results: Legibility
Number of Books 50
Number of Pages 2500
Major Errors 15
% of Pages with Major Errors 0.6
Minor Errors 9
% of Pages with Minor Errors 0.36
% of Pages with Errors (both types combined) 0.96
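The percentages in the summary follow directly from the counts. A minimal sketch of the arithmetic, assuming each error falls on a distinct page (the slides treat error counts and page counts interchangeably); the function name `pct_of_pages` is illustrative only:

```python
# Counts taken from the legibility summary above.
PAGES_EXAMINED = 2500
MAJOR_ERRORS = 15
MINOR_ERRORS = 9


def pct_of_pages(error_count, pages=PAGES_EXAMINED):
    """Return error_count as a percentage of pages, rounded to 2 places.

    Assumption: one error per affected page.
    """
    return round(100 * error_count / pages, 2)


major_pct = pct_of_pages(MAJOR_ERRORS)
minor_pct = pct_of_pages(MINOR_ERRORS)
combined_pct = pct_of_pages(MAJOR_ERRORS + MINOR_ERRORS)

print(major_pct, minor_pct, combined_pct)
```

The combined figure is the sum of the two error types, not the share of pages that contain both a major and a minor error.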
Major Error
Minor Error
Impact of Legibility Errors
• Loss of information
• Difficult to read information
• Frustrated Users
• OCR problems
Metadata Errors
Methodology
• 400 randomly selected records reviewed
  – Random.org
  – OED
• 1,600 metadata fields reviewed
  – <Author> <Title> <Publication Date> <Publisher>
Metadata Field Errors Found % of Errors
Publisher 83 41%
Author 48 24%
Publication Date 41 20%
Title 31 15%
Total 203 100%
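The "% of Errors" column is each field's share of the 203 errors found. A short sketch that reproduces the column from the counts, assuming ordinary rounding to the nearest whole percent:

```python
# Error counts per metadata field, from the table above.
field_errors = {
    "Publisher": 83,
    "Author": 48,
    "Publication Date": 41,
    "Title": 31,
}

total_errors = sum(field_errors.values())

# Share of all errors attributable to each field, as a whole percent.
shares = {
    field: round(100 * count / total_errors)
    for field, count in field_errors.items()
}

for field, share in shares.items():
    print(f"{field}: {field_errors[field]} errors, {share}%")
print(f"Total: {total_errors} errors")
```

These shares describe how the errors are distributed across fields; they are not the error rate of each field.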
Errors per Book
• Expected: an error rate of roughly 1-12%, based on error rates in traditional library catalogs
• Found 35.76% error rate (records having at least one error)
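The rate above counts a record once however many of its fields are wrong. A minimal sketch of that record-level measure, using a small invented sample (the four records below are hypothetical, not the study data):

```python
def record_error_rate(records):
    """Percentage of records with at least one field flagged as erroneous.

    Each record is a dict mapping a field name to True when that field
    contains an error.
    """
    if not records:
        return 0.0
    flagged = sum(1 for fields in records if any(fields.values()))
    return 100 * flagged / len(records)


# Hypothetical sample for illustration only.
sample = [
    {"author": False, "title": False, "date": False, "publisher": False},
    {"author": False, "title": False, "date": False, "publisher": True},
    {"author": True, "title": False, "date": True, "publisher": False},
    {"author": False, "title": False, "date": False, "publisher": False},
]

rate = record_error_rate(sample)
print(rate)
```

In the sample, the third record has two bad fields but still counts as one erroneous record, which is why a record-level rate and a per-field error count answer different questions.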
Types of Errors
[Bar chart: errors found by metadata field (Title, Author, Publisher, Date); vertical axis 0 to 90]
Errors per Book
[Bar chart: number of books with 1, 2, 3, or 4 errors; vertical axis 0 to 120]
Example Search
• Author Edgar Allan Poe
• Works published before 1809
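Poe was born in 1809, so any hit for this search points to a wrong author or date field. A sketch of the same sanity check done programmatically; the `Record` type, the lookup table, and the sample records are all hypothetical and do not reflect any Google Books API:

```python
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    author: str
    pub_year: int


# Birth years used as a lower bound on plausible publication dates.
AUTHOR_BIRTH_YEARS = {"Edgar Allan Poe": 1809}


def implausible_dates(records, birth_years=AUTHOR_BIRTH_YEARS):
    """Return records dated before their listed author's birth year."""
    return [
        r for r in records
        if r.author in birth_years and r.pub_year < birth_years[r.author]
    ]


# Invented records for illustration; the second one mimics a metadata error.
records = [
    Record("The Raven", "Edgar Allan Poe", 1845),
    Record("Collected Tales", "Edgar Allan Poe", 1800),
]

suspect = implausible_dates(records)
print([r.title for r in suspect])
```

A check like this only flags a record; it cannot say whether the author or the date is the field in error.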
Impact of Metadata Errors
• Errors in search results
  – results list order
  – advanced searching
  – Frustrated Users
• Problems integrating with other information systems
Optical Character Recognition
Ham. To be, or not to be, that is the question, Whether tis nobler in the minde to suffer
The flings and arrowes of outragious fortune, Or to take Armes against a sea of troubles, And by opposing, end them, to die to fleepe
Impact of OCR Errors
• Keyword searching!
• Frustrated Users
• Frustrated Disabled Users
  – text-to-speech technology
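The Hamlet passage shows why keyword searching suffers: the long s of early printing is read as "f", so a search for "sleepe" never matches "fleepe". A rough sketch of one possible workaround, a search pattern that tolerates s/f confusion in non-final positions (the long s was not used at the end of a word). The helper `long_s_pattern` is hypothetical and is not how Google Books handles the problem:

```python
import re

# OCR output quoted in the slide above.
OCR_TEXT = "And by opposing, end them, to die to fleepe"


def long_s_pattern(word):
    """Build a regex matching `word` where any non-final 's' may appear as 'f'."""
    last = len(word) - 1
    parts = []
    for i, ch in enumerate(word):
        if ch.lower() == "s" and i != last:
            parts.append("[sf]")  # tolerate long s misread as f
        else:
            parts.append(re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)


# An exact keyword search misses the OCR'd word.
exact = re.search(r"\bsleepe\b", OCR_TEXT)

# The tolerant pattern finds it.
tolerant = long_s_pattern("sleepe").search(OCR_TEXT)

print(exact, tolerant.group(0))
```

The trade-off is false positives: a tolerant search for "sleet" would also match "fleet", so this kind of matching widens recall at the cost of precision.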
The Importance of Fleepe
• Cross-contamination of errors
  – errors in one Google product show up in other Google products
Ngram Viewer
Final thoughts
• How significant are the errors found in Google Books?
• Is it useful to patrons?
• What role can Google Books play?
James, R. (2010). An assessment of the legibility of Google Books. Journal of Access Services, 7(4), 223-228.
Pre-publication version: http://scholarspace.manoa.hawaii.edu/handle/10125/15358
Email: [email protected]