Between information retrieval services and bibliometrics research
– new ways of semantic browsing and visual analytics
Rob Koopman, Shenghui Wang, OCLC Research
Andrea Scharnhorst, DANS-KNAW
November 7, 2015
ASIST, sigmetrics workshop
Content
- New approach to find structure in bibliographic information – ARIADNE (2 Method)
- Applications:
  - Data curation – author disambiguation (1 Motivation)
  - Illustration of topics – the case of digital humanities; topical browsing – DEMO (3)
  - Excursion into bibliometrics – the Berlin group challenge (4)
- Wrapping up (5)
Data curation – author disambiguation
Mapping topics, communities, research fronts, …
Bibliometrics
Documents are similar because they:
- Cite each other
- Are cited together
- Use the same references
- Use the same vocabulary
- Have the same authors
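The first three citation-based notions in the list above can be illustrated with a small citation matrix. This is a minimal sketch on made-up data: the matrix `A` and the variable names are assumptions for illustration and are not taken from the slides.

```python
import numpy as np

# Toy citation matrix (hypothetical data): A[i, j] = 1 if document i cites document j.
A = np.array([
    [0, 0, 1, 1],   # document 0 cites documents 2 and 3
    [0, 0, 1, 1],   # document 1 cites documents 2 and 3
    [0, 0, 0, 1],   # document 2 cites document 3
    [0, 0, 0, 0],   # document 3 cites nothing
])

direct = A + A.T          # "cite each other": a citation in either direction
co_citation = A.T @ A     # "are cited together": entry [i, j] counts documents citing both i and j
coupling = A @ A.T        # "use the same references": entry [i, j] counts references shared by i and j

print(coupling[0, 1])     # documents 0 and 1 share two references
print(co_citation[2, 3])  # documents 2 and 3 are cited together by two documents
```

Vocabulary and author overlap can be counted the same way by replacing the citation matrix with a document-by-term or document-by-author matrix.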
Information retrieval
Documents are similar because they:
- Use the same vocabulary
- …
ARIADNE is about similarity of entities!
Document/work, Record and Entity
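[Figure: a record (Authors, Title, Journal, …, Reference, Subject) is split into entities of several types: author names (e.g. Glänzel, W. and Glanzel, W.), topical terms (e.g. bibliometrics, citations, …, Casimir effect), references and journals. Annotation: N = SUM(doc).]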
A MARC record
title
authors
issn
dewey
publisher
Demo examples
• http://thoth.pica.nl/demo/relate WorldCat
• http://thoth.pica.nl/relate ArticleFirst
• http://thoth.pica.nl/astro/relate Astrophysics data Berlin group
Dataset
● WorldCat, 300+ million records
● Selected 13 million items (topical terms, authors, ISSNs, Dewey decimal codes, publishers, subject headings)
● Represented by 6 million topical terms
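As a rough illustration of how items could be represented by topical terms, the sketch below counts how often each entity co-occurs with each term across records. The records, the field names and the `author:` / `journal:` prefixes are hypothetical; the slides do not specify the actual data model or counting scheme.

```python
from collections import Counter, defaultdict

# Hypothetical toy records; only the example names and terms echo the slides.
records = [
    {"authors": ["Glänzel, W."], "journal": "journal-A", "terms": ["bibliometrics", "citations"]},
    {"authors": ["Glanzel, W."], "journal": "journal-A", "terms": ["bibliometrics"]},
    {"authors": ["Author, X."], "journal": "journal-B", "terms": ["Casimir effect"]},
]

# cooc[entity][term] = number of records in which the entity and the term appear together.
cooc = defaultdict(Counter)
for rec in records:
    entities = [f"author:{a}" for a in rec["authors"]] + [f"journal:{rec['journal']}"]
    for entity in entities:
        cooc[entity].update(rec["terms"])

# One row of the (entity x term) matrix, with columns in a fixed term order.
vocab = sorted({t for rec in records for t in rec["terms"]})
row = [cooc["journal:journal-A"][t] for t in vocab]
print(vocab, row)
```

In this toy data the two spellings of the same author name end up with overlapping term profiles, which is the kind of signal an author disambiguation step could use.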
But a matrix of 13M x 6M is too big to process
C: a co-occurrence matrix
R: a random matrix of +/-1
C’: approximation of C after random projection -- Semantic matrix
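The relation between C, R and C' can be sketched on a toy scale as follows. The sizes, the Poisson-distributed counts and the 1/sqrt(k) scaling are assumptions made for this illustration; the slides only state that R is a random matrix of +/-1 and that C' approximates C after random projection.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_terms, k = 200, 1000, 256   # toy sizes, far smaller than 13M x 6M

# C: stand-in co-occurrence matrix (items x terms) with mostly zero counts.
C = rng.poisson(0.05, size=(n_items, n_terms)).astype(float)

# R: random matrix of +/-1 (terms x k).
R = rng.choice([-1.0, 1.0], size=(n_terms, k))

# C': reduced representation (items x k); the scaling keeps vector norms comparable.
C_prime = (C @ R) / np.sqrt(k)

def cosine_matrix(M):
    """Pairwise cosine similarity between the rows of M."""
    unit = M / np.linalg.norm(M, axis=1, keepdims=True)
    return unit @ unit.T

# How much do pairwise similarities change after the projection?
err = np.abs(cosine_matrix(C) - cosine_matrix(C_prime))
print(C_prime.shape, err.mean())
```

The point of the sketch is that similarities computed in the k-dimensional space stay close to those in the full term space, so the large matrix never has to be compared row by row.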
Koopman, R., Wang, S., Scharnhorst, A., Englebienne, G.: Ariadne's thread: Interactive navigation in a world of networked information. In: CHI'15 Extended Abstracts. (2015)
Step 1: Building the semantic matrix – dimension reduction based on random projection
Step 2: Interactive exploration
- Provide a simple search/text box
- Calculate the top 500 most related candidates
- Find mutually related items
- Convert distances to probabilities
- Project to 2D
- Enhance interface with links to other spaces
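The computational steps in the list above could look roughly like the sketch below. The slides do not name the concrete methods, so every choice here is an assumption: cosine similarity for relatedness, mutual k-nearest neighbours for "mutually related", a Gaussian kernel for turning distances into probabilities, and PCA for the 2D projection. The random matrix `S` is a stand-in for the semantic matrix.

```python
import numpy as np

def top_related(S, query, n=500):
    """Return the indices of the n rows most cosine-similar to row `query`, plus unit rows."""
    unit = S / np.linalg.norm(S, axis=1, keepdims=True)
    sims = unit @ unit[query]
    return np.argsort(-sims)[:n], unit

def mutual_neighbours(sim, k=10):
    """Boolean mask that is True where i is in j's top-k and j is in i's top-k."""
    n = sim.shape[0]
    s = sim.copy()
    np.fill_diagonal(s, -np.inf)            # an item is not its own neighbour
    nn = np.argsort(-s, axis=1)[:, :k]
    knn = np.zeros((n, n), dtype=bool)
    knn[np.arange(n)[:, None], nn] = True
    return knn & knn.T

def distances_to_probabilities(D, sigma=1.0):
    """Row-normalised Gaussian kernel over a distance matrix (assumed conversion)."""
    W = np.exp(-(D ** 2) / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W / W.sum(axis=1, keepdims=True)

def project_2d(X):
    """Project rows of X onto their first two principal components."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :2] * s[:2]

rng = np.random.default_rng(1)
S = rng.normal(size=(2000, 64))             # stand-in semantic matrix (hypothetical data)

idx, unit = top_related(S, query=0, n=500)  # top 500 candidates for the query item
cand = unit[idx]
sim = cand @ cand.T
mask = mutual_neighbours(sim, k=10)         # mutually related pairs among the candidates
D = np.sqrt(np.maximum(2 - 2 * sim, 0))     # Euclidean distance between unit vectors
P = distances_to_probabilities(D)
xy = project_2d(cand)                       # 2D coordinates for display
print(xy.shape, int(mask.sum()) // 2)
```

Restricting the expensive pairwise steps to the 500 candidates is what keeps such a pipeline fast enough for interactive use.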
Exploration of a topic
http://thoth.pica.nl/relate?input=hirsch%20index&fsize=100&ncluster=
Digital libraries
Science, computer science, ontologies
Many different humanities fields, prominently language & literary studies
Illustration of context around a topic/field – journal view
Koopman, R., Wang, S., Scharnhorst, A., Englebienne, G.: Ariadne's thread: Interactive navigation in a world of networked information. In: CHI'15 Extended Abstracts. (2015)
As visual exploration of any dataset – astrophysics case
Wrapping up – future work
● Compare the algorithm to other existing algorithms – benchmarking
● More metadata fields (publisher, subject, identifiers) – ongoing
● Identify further problems to which Ariadne can be applied
  - Curation (e.g. author name disambiguation)
  - Knowledge discovery (e.g. matching chemical molecules)
  - Information science – population of libraries, subject areas, …
● Feedback from users – Prepare user scenarios for usability testing and set up an evaluation project – tbd
● Improve visualisation
● More functionality (timeline, history)
● Extend the implementation to other databases
Thank you
http://thoth.pica.nl/relate (ArticleFirst)
http://thoth.pica.nl/astro/relate (Astrophysics articles)
http://thoth.pica.nl/demo/relate (WorldCat)
References
Koopman, R., Wang, S., Scharnhorst, A., Englebienne, G.: Ariadne's thread: Interactive navigation in a world of networked information. In: B. Begole, J. Kim, K. Inkpen, W. Woo (eds.) Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems, CHI 2015 Extended Abstracts, Seoul, Republic of Korea, April 18-23, 2015, pp. 1833–1838. ACM (2015). DOI 10.1145/2702613.2732781. URL http://doi.acm.org/10.1145/2702613.2732781 (preprint on arXiv.org)
Koopman, R., Wang, S., Scharnhorst, A.: Contextualization of Topics - Browsing through Terms, Authors, Journals and Cluster Allocations. In: A.A. Salah, Y. Tonta, A.A.A. Salah, C. Sugimoto, U. Al (eds.) Proceedings of ISSI 2015 Istanbul. 15th International Society of Scientometrics and Informetrics Conference, Istanbul, Turkey, 29th June to 4th July 2015, pp. 1042–1053. Bogazici University Printhouse, Istanbul (2015). URL http://www.issi2015.org/en/Proceedings-of-ISSI-2015.html