Palimpsest: an Edinburgh Literary Cityscape
Beatrice Alex
University of Edinburgh
British Library Labs Symposium 2014, London, November 3rd 2014
Palimpsest
AHRC (Big Data) project: 01/2014 to 03/2015
Literature, University of Edinburgh
James Loxley, Professor of Early Modern Literature
Miranda Anderson, Research Fellow
Informatics, University of Edinburgh
Jon Oberlander, Professor of Epistemics
Beatrice Alex, Research Fellow in Text Mining
Claire Grover, Senior Research Fellow
SACHI: St Andrews Human Computer Interaction Research
Aaron Quigley, Director of SACHI & Chair of Human Computer Interaction
David Harris-Birtill, Research Fellow
Uta Hinrichs, Research Fellow
EDINA
James Reid, Workgroup Leader, Geoservices
Nicola Osborne, Social Media Officer
Prototype
I visited Edinburgh with languid eyes and mind; and yet that city might have interested the most unfortunate being. Clerval did not like it so well as Oxford; for the antiquity of the latter city was pleasing to him. But the beauty and regularity of the new town of Edinburgh, its romantic castle and its environs, the most delightful in the world, Arthur’s Seat, St. Bernards Well, and the Pentland Hills, compensated him for the change and filled him with cheerfulness and admiration.
Mary Shelley, Frankenstein
Frankenstein
Edinburgh: Picturesque Notes
But it is not only pipers who have vanished, many a solid bulk of masonry has been likewise spirited into the air. Here, for example, is the shape of a heart let into the causeway. This was the site of the Tolbooth, the Heart of Midlothian, a place old in story and namefather to a noble book.
Robert Louis Stevenson, Edinburgh: Picturesque Notes
Trainspotting
These burds ur gaun oantay us aboot how fuckin beautiful Edinburgh is, and how lovely the fuckin castle is oan the hill ower the gairdins n aw that shite. That's aw they tourist cunts ken though, the castle n Princes Street, n the High Street. Like whin Monny's auntie came ower fae that wee village oan that Island oaf the west coast ay Ireland, wi aw her bairns. The wifey goes up tae the council fir a hoose. The council sais tae her, whair's it ye want tae fuckin stey, like? The woman sais, ah want a hoose in Princes Street lookin oantay the castle.…Perr cunt jist liked the look ay the street whin she came oaf the train, thoat the whole fuckin place wis like that. The cunts in the council jist laugh n stick the cunt n one ay they hoatline joabs in West Granton, thit nae cunt else wants. Instead ay a view ay the castle, she's goat a view ay the gasworks. That's how it fuckin works in real life, if ye urnae a rich cunt wi a big fuckin hoose n plenty poppy.
Irvine Welsh, Trainspotting
Datasets
HathiTrust collection (all worldwide public domain material)
British Library Nineteenth Century Books collection
English Project Gutenberg books
Oxford Text Archive data
National Library of Scotland data
ECCO/EEBO?
Limited set of copyrighted material, if author/publisher agrees (Irvine Welsh, Muriel Spark, Alexander McCall Smith ...)
Palimpsest Workflow
DIGITISED DOCUMENTS: HathiTrust collection, British Library Nineteenth Century Books, National Library of Scotland collection, Oxford Text Archive, Project Gutenberg, ...
DOCUMENT RETRIEVAL & FILTERING: ranked lists of Edinburgh-specific candidates, for example:
24.189 The Journal of Sir Walter Scott (Scott, Walter)
22.079 Robert Louis Stevenson (Black, Margaret Moyes)
20.725 The Modern Scottish Minstrel, Volumes I-VI. (Various)
19.610 Spare Hours (Brown, John)
17.181 The Heart of Mid-Lothian (Scott, Walter)
15.369 The Works of Robert Louis Stevenson (Stevenson, Robert L.)
15.018 Rab and His Friends and Other Papers (Brown, John)
14.177 Greyfriars Bobby (Atkinson, Eleanor)
...
MANUAL CURATION: curation of Edinburgh-specific literature
EDINBURGH GAZETTEER: gazetteer of Edinburgh place names and their latitude/longitude pairs or shape files derived from several sources
TEXT MINING: fine-grained location extraction and geo-referencing using the Edinburgh Geoparser
RELATIONAL DATABASE: geo-referenced locations, snippets, meta data
USER INTERFACES
Big data IN
Small data OUT
Geo-specific Tasks
Retrieve literary works which are at least partly set in Edinburgh from all literature accessible to us.
Devise a method for identifying “loco-specificity” in literature automatically based on input from literary scholars.
Create a fine-grained location gazetteer for Edinburgh.
Identify and geo-reference locations (including street names and buildings) using the Edinburgh Geoparser.
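As a rough illustration of the last task, the sketch below spots gazetteer names in a piece of text and returns their coordinates. It is plain string matching written for this transcript, not the Edinburgh Geoparser, which does considerably more (for instance, resolving ambiguous names). The `Place` type and the `find_places` function are hypothetical names; the two gazetteer entries are copied from the aggregated gazetteer examples shown later in the talk.

```python
import re
from typing import NamedTuple


class Place(NamedTuple):
    name: str
    lat: float
    lon: float
    source: str


# Entries copied from the aggregated gazetteer examples in this talk.
GAZETTEER = [
    Place("Adam Smith Statue", 55.9497628, -3.1900024, "osm"),
    Place("Oxford Bar", 55.9529618, -3.2047389, "osm"),
]


def find_places(text, gazetteer):
    """Return (character offset, Place) pairs for gazetteer names found in text."""
    hits = []
    for place in gazetteer:
        # Match the whole name, case-insensitively, on word boundaries.
        pattern = r"\b" + re.escape(place.name) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), place))
    # Report hits in the order they occur in the text.
    return sorted(hits, key=lambda hit: hit[0])
```

Exact matching of this kind will also pick up generic names that happen to be in the gazetteer, which is one reason gazetteer quality matters.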
But first …
All input documents must first be:
Converted to a common format.
Identified as written English text.
Post-corrected automatically, if necessary.
Once curated, linguistically pre-processed.
Document Retrieval
Goal: Find all Edinburgh loco-specific items which fit our remit (fiction, autobiography, travel writing, memoirs, ...).
Index collections and perform query- and metadata-dependent ranking.
Initial experiments on HathiTrust data (239,481 documents = all books, serials, journals, biographies).
Ranked outputs are checked by literary scholars, whose feedback is used to improve the retrieval component.
Applied improved methods to all other collections.
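The slides do not give the scoring function, so the following is only a toy sketch of what query- and metadata-dependent ranking could look like: each document is scored by the density of query terms in its text, and the score is multiplied by a boost when a query term also appears in the title. The `Document` class, the `title_boost` parameter and the formula itself are assumptions made for illustration, not the project's method.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    title: str
    text: str


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score(doc, query_terms, title_boost=2.0):
    """Query-term hits per 1,000 tokens, boosted if the title matches a term."""
    tokens = tokenize(doc.text)
    if not tokens:
        return 0.0
    terms = {t.lower() for t in query_terms}  # single-word terms only
    hits = sum(1 for tok in tokens if tok in terms)
    density = 1000.0 * hits / len(tokens)
    # Metadata-dependent part: a title match multiplies the score.
    boost = title_boost if terms & set(tokenize(doc.title)) else 1.0
    return density * boost


def rank(docs, query_terms):
    """Return (score, title) pairs, highest score first."""
    scored = [(score(d, query_terms), d.title) for d in docs]
    return sorted(scored, reverse=True)
```

A list in this shape is what the literary scholars would review, and their judgements could then be used to tune the boost or the query terms.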
Ranked Documents
255.23 Cassell's Old and New Edinburgh, etc GRANT, James (1880)
…
130.41 Picturesque Edinburgh LOCKIE, Katharine F. (1899)
…
98.02 Memorials of Edinburgh in the olden time. 2nd ed. WILSON, Daniel (1891)
95.33 Memorials of Edinburgh in the olden time WILSON, Daniel (1848)
…
90.13 Home Country of R. L. Stevenson ... GEDDIE, John (1898)
89.75 Water of Leith, Source to Sea…. GEDDIE, John (1896)
…
27.18 Brought to Bay; Experiences of a City Detective MACGOVAN, James (1878)
Assisted Curation
Discovery
Gazetteer Creation
Our text mining tools use the Edinburgh Geoparser to mark up place names and resolve them to coordinates, with a choice of gazetteer as the reference source (GeoNames, OS, ...).
We need to create a local gazetteer by aggregating information from multiple sources:
OpenStreetMap
OSLocator (Ordnance Survey roads)
RCAHMS and Historic Scotland (listed buildings, parks, monuments)
Multiple Sources and Formats
Aggregated Gazetteer
Good examples:
<place name="Adam Bothwell's House" lat="55.949947" long="-3.19158" source="rcahms"/>
<place name="Adam Smith Statue" lat="55.9497628" long="-3.1900024" source="osm"/>
<place name="Oxford Bar" lat="55.9529618" long="-3.2047389" source="osm"/>
<place name="Oxford Bar" lat="55.952983" long="-3.204677" source="rcahms"/>
Bad examples:
<place name="Spring" lat="55.9097916" long="-3.2180268" source="osm"/>
<place name="Aerated Water Factory" lat="55.942886" long="-3.04704" source="rcahms"/>
<place name="Amaryllis" lat="55.9406106" long="-3.2079740" source="osm"/>
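Aggregating several sources produces near-duplicate entries, such as the two Oxford Bar records above, which lie only a few metres apart. The sketch below shows one way such entries could be merged: parse the `<place>` elements, group them by name, and fold together same-name entries that fall within a distance threshold. The `<gazetteer>` wrapper element, the function names and the 50-metre threshold are assumptions; the slides do not say how the project handled duplicates.

```python
import math
import xml.etree.ElementTree as ET
from collections import defaultdict

# Entries copied from the examples above, wrapped in a root element
# (the <gazetteer> wrapper is an assumption) so they parse as one document.
SAMPLE = """<gazetteer>
<place name="Adam Smith Statue" lat="55.9497628" long="-3.1900024" source="osm"/>
<place name="Oxford Bar" lat="55.9529618" long="-3.2047389" source="osm"/>
<place name="Oxford Bar" lat="55.952983" long="-3.204677" source="rcahms"/>
</gazetteer>"""


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def merge_duplicates(xml_text, max_dist_m=50.0):
    """Merge same-name entries within max_dist_m of an entry already kept."""
    by_name = defaultdict(list)
    for el in ET.fromstring(xml_text).iter("place"):
        by_name[el.get("name")].append(
            (float(el.get("lat")), float(el.get("long")), el.get("source"))
        )
    merged = []
    for name, entries in by_name.items():
        kept = []  # each item: [lat, lon, set of sources]
        for lat, lon, source in entries:
            for item in kept:
                if haversine_m(lat, lon, item[0], item[1]) <= max_dist_m:
                    item[2].add(source)  # same place, record the extra source
                    break
            else:
                kept.append([lat, lon, {source}])
        merged.extend((name, lat, lon, sorted(srcs)) for lat, lon, srcs in kept)
    return merged
```

The merged entry keeps the coordinates of the first record seen and lists every contributing source. Filtering out the bad examples is a separate problem that this sketch does not attempt.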
Geo-Referencing
Mobile Interface
start walking
select mode
literature-on-the-go
… under development at SACHI.
Summary
We are 10 months into the Palimpsest project.
The final outputs will be web-based visualisations and a mobile app. The aim is to create interfaces for literary scholars and the general public.
Big data to small data.
Assisted curation of literary works set in Edinburgh ensures that the final data content is of high precision and recall.
A good example of digital humanities and interdisciplinary collaboration at work.
The fine-grained Edinburgh gazetteer and the Geoparser can be used for future research.
LTG
Ongoing projects of the Edinburgh Language Technology Group:
Palimpsest (Mining Literary Edinburgh)
UK Connectivity (Analysis of social media for the British Council)
BotaniTours (Information aggregation and presentation of botanical points of interest in the Scottish Borders).
Trading Consequences (Text mining trends in commodity trading of large 19th century text collections).
New: Text mining brain scan reports for clinical neurologists.
Thank You
LTG: www.ltg.ed.ac.uk
Palimpsest: palimpsest.blogs.edina.ac.uk
Twitter: @LitPalimpsest
Contact:
Beatrice Alex | [email protected]