the implementation of historical and humanities big data

22
DC2020 The Implementation of Historical and Humanities Big Data Platform of Shanghai Library Xia Cuijuan

Upload: others

Post on 26-May-2022

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Implementation of Historical and Humanities Big Data

D C 2 0 2 0

T h e I m p l e m e n t a t i o n o f H i s t o r i c a l a n d H u m a n i t i e s

B i g D a t a P l a t f o r m o f S h a n g h a i L i b r a r y

Xia Cuijuan

Page 2: The Implementation of Historical and Humanities Big Data

海字

字B a c k g r o u n d a n d p r o b l e m s壹

S o l u t i o n s贰

P l a n n i n g肆

T e c h n o l o g i e s a n d M e t h o d s 叁

目录

C o n t e n t s

1

2

3

4

Page 3: The Implementation of Historical and Humanities Big Data

DH Projects of Shanghai Library and the difficulties

during the development process

B a c k g r o u n d

壹1

Page 4: The Implementation of Historical and Humanities Big Data

D H P r o j e c t s o fS h a n g h a i L i b r a r y • Transform all metadata records of different

resources in different formats into RDF. Take authors,contributors, publishers,etc. as entities.

• Give every entity cool URI as global identifier and locator then link them together after NER and entity disambiguation.

• Enrich more semantic data for the Entities by extracting structured data from the content of resources or Open datasets on the Web.

• Provide open data and authority control in a web-scale , Digital Humanities services for researchers and APIs for the third-party developers.

Process(from DL to DH)

Page 5: The Implementation of Historical and Humanities Big Data

海字

字 Basic knowledgebases

Name Authority Database

CN Geo NamesCN Chronology

His & Cul Event Knowledgebase

SH Architecture Knowledgebase

Special collection knowledgebases

Genealogy

Old Photos Old Movies

Manuscripts & Archives

Rest APIs

Shanghai Memory mobile Web Apps

Wukang Road Tour Old Movie TourRed Tour

Red Revolution Books

URI Link

Linked entities and data of different knowledgebases with URIs semanticly by the One

Ontology Abstract Model.

Immovable cultural relics DatasetGLAM Organization Dataset

SH Geo Names Dataset

Acient Books

Modern Books, Magazines & Newspapers

Search Restful APIHTTP URI LinkCrowd Sourcing Sparql Endpoint

DevelopersEnd Users

Page 6: The Implementation of Historical and Humanities Big Data

L i n k e d O p e n D a t a P l a t f o r m f o r D H

• Data Cleaning and NER. Have to correct the Inconsistency of metadata records in digital library systems and deal with the entity disambiguation manually.

• Provide proper Services for researchers. Not easy to understand the specific requirements of researchers from different areas. The services of so many knowledgebases and applications need to be integrated and well designed for user need and user experience.

Difficulties

Page 7: The Implementation of Historical and Humanities Big Data

Build an infrastructure to Bridge the gaps among the process of

digitalization, data production, data cleaning, NER, and LOD

publishing.

Build one platform to integrate the services of multiple DH systems

and meet the specific requirments of researchers from different areas.

S o l u t i o n s

贰2

Page 8: The Implementation of Historical and Humanities Big Data

Historical and Humanities Big Data Platform

Goals

https://dhc.library.sh.cn

• One portal to access all

• One engin to search all

• One KOS to integrate all

• One data hub to Mash-up all

• One infrastructure to fulfill all

Page 9: The Implementation of Historical and Humanities Big Data

Historical and Humanities Big Data Platform

Technical Achitecture• Digital objects Layer

I I IF Server ,PDF Server , Image Server provide scan image management and access for all digital ob jects .

• LOD Layer

The RDF data of all Metadata and authority data is stored in RDF Store and published as LOD.

• APIs Layer

Performs data output and input between LOD layer and user Service layer

• User Service Layer

Provid navigation,search, visualization, statistics and analysis. . . . . .

Digital Objects Layer

LOD Layer

APIs Layer

User Service Layer

Page 10: The Implementation of Historical and Humanities Big Data

Historical and Humanities Big Data Platform

KOS Based on One Ontology

Old Movies

Old Photos

Archives and Manuscrips

Genealogy

Ancient Books

Page 11: The Implementation of Historical and Humanities Big Data

Historical and Humanities Big Data Platform

Serch Engin Cross Knowledgebases

• Eevry knowledgebase is based on LOD technologies, and the RDF data is stored in Openlink Virtuoso(VT).

• Every Digital Object(scaned image) is linked tO the RDF data and displayed in the framework of I I IF based on IIP Server.

• A Federal search engin through APIs provided by ElasSearch Index of every single knowledgebase.

Page 12: The Implementation of Historical and Humanities Big Data

Crowd sourcing for Image2TextLOD for metadata and authority data publishing and sharing on the webMachine Learning for NERIIIF for displaying and reorganization of the scaned imagesText analysis, SNS, GIS for typical DH research paradigms

T e c h n o l o g i e s

叁3

Page 13: The Implementation of Historical and Humanities Big Data

Historical and Humanities Big Data Platform

Crowd Sourcing for Image2Text

• Transcription online

• Captcha

http://zb.library.sh.cn

Page 14: The Implementation of Historical and Humanities Big Data

Historical and Humanities Big Data Platform

LOD for Metadata and Authority data publishing and sharing

(more than 300 millions of RDF triples totally)

Page 15: The Implementation of Historical and Humanities Big Data

Historical and Humanities Big Data Platform

Machine Learning for NERBERT (Bidirectional Encoder Representation from Transformers)

5548 person entities from 7430 records,159 incorect,314 omitted, Accuracy: 91.47%

Page 16: The Implementation of Historical and Humanities Big Data

Historical and Humanities Big Data Platform

IIIF for Displaying, Sharing, and Reorganization of the Scaned Images

Page 17: The Implementation of Historical and Humanities Big Data

Historical and Humanities Big Data Platform

Text Analysis

Page 18: The Implementation of Historical and Humanities Big Data

Historical and Humanities Big Data Platform

SNS for exploring the relationships among people

Page 19: The Implementation of Historical and Humanities Big Data

Historical and Humanities Big Data Platform

GIS for searching and visualizing big data on the map

HGIS Architecture

Page 20: The Implementation of Historical and Humanities Big Data

The future plans in the next 5 years to accomplish the Platform

P l a n n i n g

肆4

Page 21: The Implementation of Historical and Humanities Big Data

Historical and Humanities Big Data Platform

PlanningPhase 2• New Technologies• New Methods• Data and knowledge

Integration• Busyness and services

integration

Phase 4• Outreach

• Cultural tourisim• Digital exhibition

• Innovation Support

Phase 1• Transform Digital library

systems to knowledgebases based on lOD technologies

• DH projects development

Phase 3Support Specific Application scenaries• Rebuilding and long term

preservation of City memory • Specific fields research support

2015~2025

Page 22: The Implementation of Historical and Humanities Big Data

海字

字Thanks

Questions and Comments

Xia Cuijuan

[email protected]

D C 2 0 2 0