British Library Labs presentation at Elpub 2014, June 20, 2014
DESCRIPTION
Keynote presentation given at Elpub 2014 on June 20 about the Digital Scholarship department and the work of the Digital Research Team and British Library Labs.
TRANSCRIPT
Putting data to use for researchers: How the British Library's Digital Scholarship department is putting data to use for researchers through its Digital Research Team and British Library Labs project
Mahendra Mahey
18th International Conference on Electronic Publishing (Elpub)
Keynote speech, Friday 20 June, 2014, 0930 – 1030 (EEST)
Alexander Technological Educational Institute of Thessaloniki, Greece
Manager of British Library Labs
http://labs.bl.uk #bl_labs [email protected]
Overview
• The British Library and a typical scholar
• The Nature of Digital and the Digital Scholar
• The British Library supporting Digital Scholarship
• Experiences of the Digital Research Team and British
Library Labs project in supporting digital scholarship
• Conclusions
The British Library
St Pancras, London, UK: many books are stored 5 storeys below the building
Inside the British Library: space for 1,200 readers, around 400,000 visitors per year
Storage at Boston Spa: uses low oxygen and robots
British Library Collections > 150 million items
> 0.8 m serial titles
> 8 m stamps
> 14 m books
> 3 m sound recordings
> 4 m maps
> 1.6 m musical scores
> 0.3 m manuscripts
> 60 m patents
King’s Library
Our Scholar in Humanities…
Pieter Francois, postdoctoral researcher at University of Oxford
• Travel routes in the 19th Century
Bob Nicholson, History Lecturer at Edge Hill University
• History lecturer specialising in the Victorian period
The Nature of Digital
Data broken down, recombined and duplicated
Image: Tower of Babble, book sculpture by Brian Dettmer
The Digital Scholar
It is someone who employs digital, networked and open approaches to demonstrate their specialism.
They need not be a recognised academic or someone who posts online, just a specialist.
Digital, Networked, Open
From The Digital Scholar: How Technology Is Transforming Scholarly Practice, Martin Weller, Bloomsbury Academic, 2011, page 4
Digital Humanities
“The emergence of the new digital humanities isn’t an
isolated academic phenomenon. The institutional and
disciplinary changes are part of a larger cultural shift, inside
and outside the academy, a rapid cycle of emergence and
convergence in technology and culture”
Steven E Jones, Emergence of the Digital Humanities (2013)
http://emergenceofdhbook.tumblr.com/
http://www.corpusthomisticum.org/it/index.age
Father Roberto Busa (1913-2011)
“Reading individual works is as irrelevant as describing the architecture of a building from a single brick, or the layout of a city from a single church.” -Franco Moretti
Example Digital research methods
http://labs.bl.uk/Launch+Event (has some examples from researchers)
Corpus analysis tools
Text Mining
Visualisations
Location based searching
Geotagging
Annotation
Natural Language Processing
Using Application Programming Interfaces for datasets e.g. Metadata, Images
Transcribing
Crowdsourcing / Human Computation
Digitisation at the British Library
Digitised Books
250,000 books being digitised with Google
68,000 volumes digitised with Microsoft
17th, 18th and 19th Century
Image taken from page 344, Volume 2, Cassell's Illustrated History of the Russo-Turkish War, etc. by OLLIER, Edmund.
Otto, King of Greece
Image taken from page 10, "The Greece of the Greeks", PERDICARIS, G. A.
http://goo.gl/v7p1Lj
Digitised Newspapers
Newspapers stored at Colindale (now closed)
http://www.britishnewspaperarchive.co.uk/
Not just text…Moving Image Collections
Digitisation - Transforming access
Spreading the value of collections, content and expertise
Connecting as much as collecting, e.g. social media
Encouraging others to integrate our materials into their services – and vice versa
Challenges of Digital access
British Library digital content can be:
• only in Reading Rooms due to ©
• only on site due to ©
• not online – various storage devices
• online and open
• online behind paywall
Digital Scholarship Department
…become a leading centre of digital scholarship
…internationally recognised for innovation and collaboration in support of research and learning…
• The Digital Research Team
– Digital Curators
• The British Library Labs project
What is a Digital Curator?
• Explore how digital technologies are re/shaping research and how this informs how the library does its business.
• Support staff across the library to identify the opportunities that digital tools and collections afford in modern scholarship and to gain the skills to engage confidently in this area.
• Partner with libraries and institutions to enable innovation in digital scholarship.
• No specific collection but rather expertise in digital scholarship, broadly defined.
Digital Curators: James Baker, Nora McGregor, Stella Wisdom, Aquiles Alencar-Brayner
Training Library Staff
Digital Scholarship Training Programme
• Behind the Screen: Basics of the Web
• What is Digital Scholarship?
• Digital Collections at British Library
• Digitisation at British Library
• Text Encoding Initiative & Annotation
• Geo-referencing and Digital Mapping
• Crowdsourcing in Libraries, Museums and Cultural Heritage Institutions
• Foundations in working with Digital Objects: From Images to A/V
• Data Visualisation for Analysis in Scholarly Research
• Information Integration: Mash-ups, APIs and The Semantic Web
Opening up Digital content
• Picturing Canada: Mapping a Collection:
http://bit.ly/13GhLIe
http://commons.wikimedia.org/wiki/Commons:British_Library/Picturing_Canada
Crowdsourcing Digitised Maps
http://www.bl.uk/maps/georeferencingmap.html
Creative with Wildlife Sounds
http://goo.gl/s7siv0
Sound Edit Wildlife Films Competition 2013 http://vimeo.com/60401313
'Dave's Wild Life' by Samuel de Ceccatty won first prize!
http://sounds.bl.uk/Environment
The Big Data Experiment
• Microsoft Azure
• University College London’s Computing and Digital Humanities department
• Recommender engine for BL Public domain content
http://goo.gl/VN0Wg2
Technology Strategy Board Competition Winner
• Competition with Technology Strategy Board
• Focus on understanding the value and impact of making the British Library’s Digital Content and data open / in the public domain
• Peter Balman will develop an analytics dashboard for the Library showing what is happening to our public domain content
Challenge details: http://goo.gl/Hb6l4A https://www.vimeo.com/94067983
Computer Games
Off the Map Competition 2013
Pudding Lane Productions, 6 second-year students from De Montfort University, Leicester, won first prize.
Off the Map Gothic 2014!
http://youtu.be/SPY-hr-8-M0
http://offthemap.gamecity.org/
Funded by the Andrew W. Mellon Foundation
Stakeholders involved in Labs
[Diagram labels]
British Library: Digital Scholarship; Digital Research; Access & Reuse Group; ©; Developers / Technical Staff; Curators / Researchers; Digital Content
Universities & wider (e.g. companies, start-ups, independent scholars etc.): Researchers; Developers
United Kingdom; The World
BL Labs
What is Labs…
[Diagram labels]
BL Labs: Data Driven; Experimenting with our digital collections
Audience: Researchers; Developers; Research question / idea
Engagement: Competition; Contact; Events; Meetings and visits
Data: BL Digital Collection / Data; Other Digital Collection / Data
Outputs from engagement: Open Software; Publications; Tools & services to support Digital Scholarship; Case Studies
Labs audience
[Diagram courtesy of Ben O’Steen]
Researchers: specific domain knowledge; ability to question, review or interpret domain in potentially meaningful way
Developers: skill and / or capability* to realise that potential (*human and / or hardware)
Where the two overlap: good things happen here (people of interest)
Upskill: desired outcome of BL Labs activities
Sample Labs Digital Collections
http://labs.bl.uk/Digital+Collections
British National Bibliography
UK Web Archive Data
Text-mining of electronic journals
Book ordering and anonymised reader data
• Copyright cleared for research use
• Curated (Is there someone who knows the ‘story’ about the collection?)
• Collection / Item Level Metadata available? (What state is it in and does it need cleaning?)
• Where is it? http://data.bl.uk coming soon!!
Engaging with Labs
Hack and Data days
1. Brainstorm ideas & group
2. Consider and choose
3. Work late and show what has been done
Labs Data Cards
Ideas Labs
Projects
http://dml.city.ac.uk/
The winners of the Labs 2013 competition
Two entries chosen in June 2013
They both worked in residence with Labs from July to October 2013 to complete their projects
Pieter Francois (left) and Dan Norton (right) each received a cheque for £2,000 in November 2013 as winners of the first British Library Labs Competition 2013
Sample Generator: representative samples
• Pieter Francois
• Focus on European travel in the 19th Century
• Uses statistical methods to support text analysis
• Tool produces representative samples of texts based on search criteria
http://goo.gl/YFnZmu
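A minimal sketch of the sampling idea above, not the Sample Generator's actual code: filter catalogue records by search criteria, then draw a seeded simple random sample so that every matching text has the same chance of being selected. The record fields, the catalogue entries and the function names are hypothetical and for illustration only.

```python
import random


def representative_sample(records, matches, size, seed=0):
    """Draw a simple random sample from the records that satisfy `matches`."""
    pool = [record for record in records if matches(record)]
    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    return rng.sample(pool, min(size, len(pool)))


# Hypothetical catalogue records; real metadata would come from the Library.
catalogue = [
    {"title": "Travel book A", "year": 1821, "subject": "travel"},
    {"title": "Travel book B", "year": 1846, "subject": "travel"},
    {"title": "Travel book C", "year": 1873, "subject": "travel"},
    {"title": "Novel D", "year": 1850, "subject": "fiction"},
    {"title": "Travel book E", "year": 1905, "subject": "travel"},
]


def is_19th_century_travel(record):
    """Search criteria: travel books published between 1801 and 1900."""
    return record["subject"] == "travel" and 1801 <= record["year"] <= 1900


sample = representative_sample(catalogue, is_19th_century_travel, size=2)
print([record["title"] for record in sample])
```

Sampling uniformly from the full set of matches, rather than taking the first hits a search returns, is what lets conclusions drawn from the sample be generalised to the whole result set.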
Pieter Francois
https://www.youtube.com/watch?v=xK80Jy0ijkA
Mixing the Library: The Disc Jockey & the Digital Collection
http://www.tompro.co.uk
http://www.ablab.org/shetland
http://www.ablab.org/pd/di/
Prototype design
Collection ‘stalks’ made of ‘items’. Each ‘item’ is a URL.
The order of the ‘items’ can be ‘shuffled’ and sent to the ‘left’ or ‘right’ channels
Preview ‘item’
Selected ‘left’ channel ‘item’
Selected ‘right’ channel ‘item’
Annotation
‘Play back’ of ‘items’ (Blue) and annotations (Yellow)
Basic functioning prototype: http://212.71.253.54:8000/a
Living Lab: Library of the Future, see: http://alturl.com/284zw
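The ‘stalk’, ‘item’ and ‘channel’ vocabulary of the prototype can be sketched as a small data structure. This is an illustrative guess at the model described on the slide, not the prototype's code: the class names, method names and example URLs are all hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Stalk:
    """A collection 'stalk': an ordered list of items, each item a URL."""
    name: str
    items: list = field(default_factory=list)

    def shuffle(self, seed=None):
        """Reorder the items in place."""
        random.Random(seed).shuffle(self.items)


@dataclass
class Mixer:
    """Holds the item currently selected on the 'left' and 'right' channels."""
    channels: dict = field(default_factory=lambda: {"left": None, "right": None})

    def send(self, stalk, index, channel):
        """Send one item from a stalk to the named channel."""
        if channel not in self.channels:
            raise ValueError(f"unknown channel: {channel}")
        self.channels[channel] = stalk.items[index]


# Placeholder URLs standing in for digital collection items.
stalk = Stalk("maps", [f"https://example.org/item/{n}" for n in range(1, 5)])
stalk.shuffle(seed=1)

mixer = Mixer()
mixer.send(stalk, 0, "left")
mixer.send(stalk, 1, "right")
print(mixer.channels)
```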
Curatorial for Library metadata
Geo location
http://datatales.artefacto.org.uk/
Timeline Slide show
India Office Select materials
Winners of 2014 Competition
Victorian Meme Machine: Bob Nicholson of Edge Hill University
Text to Image Linking Tool (TILT): Anna Gerber and Desmond Schmidt from Queensland University
Blog posting http://goo.gl/iJy0aT
YouTube: http://goo.gl/mBTlk2
Blog: http://goo.gl/ofpNosl
YouTube: http://goo.gl/iseHTE
Bob Nicholson
https://www.youtube.com/watch?v=zK95lzaPNp0
Story of one digital collection
What can 68,000 books tell us?
Image: Artwork by Alicia Martin
Extracting Images from OCR
Image snipped out algorithmically from ALTO XML
<?xml version="1.0" encoding="UTF-8" ?>
<mets:mets xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:mets="http://www.loc.gov/METS/"
xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/version18/mets.xsd info:lc/xmlns/premis-v2 …
Image taken from page 207 of 'London and its Environs. A picturesque survey of the metropolis and the suburbs ... Translated by Henry Frith. With ... illustrations'
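A hedged sketch of how illustrations can be located in ALTO XML so they can be snipped out of a page scan. It is not the Library's actual pipeline. It relies on the ALTO schema's Illustration element and its HPOS, VPOS, WIDTH and HEIGHT attributes; the sample document is invented, and the coordinates are assumed to be in pixels (ALTO files can declare other measurement units).

```python
import xml.etree.ElementTree as ET

# Invented, minimal ALTO document used only to exercise the function.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P207" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="TB1" HPOS="100" VPOS="100" WIDTH="1800" HEIGHT="400"/>
        <Illustration ID="IL1" HPOS="250" VPOS="600" WIDTH="1500" HEIGHT="1100"/>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def illustration_boxes(alto_xml):
    """Return (left, top, right, bottom) boxes for each Illustration element."""
    root = ET.fromstring(alto_xml)
    boxes = []
    for element in root.iter():
        # Compare local names so any ALTO namespace version is accepted.
        if element.tag.rsplit("}", 1)[-1] != "Illustration":
            continue
        left = int(float(element.get("HPOS")))
        top = int(float(element.get("VPOS")))
        width = int(float(element.get("WIDTH")))
        height = int(float(element.get("HEIGHT")))
        boxes.append((left, top, left + width, top + height))
    return boxes


print(illustration_boxes(SAMPLE_ALTO))
```

Each box is in the (left, upper, right, lower) order that image libraries such as Pillow expect for cropping, so the page scan could be cropped with each box to produce one image file per illustration.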
Face Recognition of 19th Century Faces
The face-recognition algorithm worked better for women's faces than for men's
The Mechanical Curator
http://mechanicalcurator.tumblr.com
• #similar_to_77576796197_published_date
• #similar_to_77576796197_slantyness
• #similar_to_77576796197_bubblyness_x
• #similar_to_77576796197_bubblyness_y
• #new_train_of_thought
Image from ‘A Lost Estate’, by Mary E. Mann, Volume: 02, Page: 91, 1889, London, Bentley & Son
Flickr Commons – 1,020,418 images!
http://www.flickr.com/photos/britishlibrary/
Each image has a URL
Some metadata, but you can add tags!
Flickr has an API so researchers and developers can build apps and query the data
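As an illustration of querying the collection through the Flickr API, here is a minimal sketch that builds a flickr.photos.search request against Flickr's REST endpoint. The API key and the account's user ID are placeholders you would need to supply; the helper names are hypothetical, and the response handling assumes the JSON layout Flickr returns for this method.

```python
import json
import urllib.parse
import urllib.request

FLICKR_REST = "https://api.flickr.com/services/rest/"


def build_search_url(api_key, user_id, tags, per_page=10):
    """Build a flickr.photos.search URL for one account's tagged photos."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "user_id": user_id,
        "tags": ",".join(tags),
        "per_page": per_page,
        "format": "json",
        "nojsoncallback": 1,  # ask for plain JSON rather than JSONP
    }
    return FLICKR_REST + "?" + urllib.parse.urlencode(params)


def search_photos(api_key, user_id, tags, per_page=10):
    """Run the search over the network and return the list of photo records."""
    url = build_search_url(api_key, user_id, tags, per_page)
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    return payload["photos"]["photo"]


# Placeholders: substitute a real API key and the account's Flickr user ID.
print(build_search_url("YOUR_API_KEY", "ACCOUNT_USER_ID", ["map", "ship"]))
```

Only the URL is built here, so the sketch runs without a key; calling search_photos with real credentials performs the request.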
Flickr in numbers
163,000,000 image views!!! From launch on December 13th, 2013 to June 10th, 2014
Almost all images seen at least 5 times
90,699 tags added
18,567 images favourited
Labs involved with 2 potential research projects & 4 grassroots crowdsourcing efforts.
Tagging a million images
- Metadata games and other projects
http://www.metadatagames.org/
Games will probably be developed using Flickr sets
http://goo.gl/j6fxac
Cardiff University’s Lost Visions Project
Risks of releasing the images
Funny Books for Boys and Girls. Struwelpeter. Good-for-nothing Boys
and Girls. Troublesome Children. King Nutcracker and Poor Reinhold.
Opportunities – increasing traffic to Library services
[Annotated Flickr image page]
• You can purchase a ‘High Res’ copy
• View in the Library Item Viewer
• Download .pdf
• All illustrations in book
• Other illustrations in books published in same year
• View the item in the Library Catalogue
• Tags auto generated
• User generated tag
• Grouping for image
Flickr coverage in the media!
Creative Uses
http://goo.gl/qPPgxX
http://goo.gl/OH6FSn
Jura’s Sound Skateboard
Burning Man
http://goo.gl/Htg4XS
David Normal is creating light boxes around the Burning Man, using the British Library’s Flickr images
Other Labs stories….
• Augmenting news metadata
• Opening up over 100,000 Playbills
• 3D printed objects representing statistical data with possibly embedded USBs and RFID chips
• data.bl.uk, place for all our open data and digital collections
• Content next to parallel compute power, analysis at scale
• Funding till 2017!
Conclusions
• Huge appetite for openly available digital content
• There needs to be a continuous dynamic interaction between the data and the researchers to formulate and reformulate research questions
• Working with Digital Scholars creates new opportunities
• Content and service providers, researchers and technical people need to talk to each other to create the new tools, services and data needed to facilitate new discoveries
• Don’t be afraid to experiment and make mistakes too!
Acknowledgements
Ben O’Steen - Labs Technical Lead
Digital Curator Team
Stella Wisdom - Digital Curator
Nora McGregor - Digital Curator
James Baker - Digital Curator
Digital Scholarship Head
Adam Farquhar - Head of Digital Scholarship (Wrote Labs proposal)
Email Labs
• Let us know your ideas for engaging with Labs!
• Questions?