exploring a world of networked information built from free-text metadata

Post on 28-Jul-2015


Shenghui Wang, Rob Koopman

Exploring a world of networked information built from free-text metadata

OCLC Research EMEA

ELAG2015

What would you do if you were interested in a topic?

Difficult to answer these questions:
• What are the different aspects of this topic?
• Are there related aspects missing in my search terms?
• Who are the most prominent authors about this topic?
• Which journals publish most about this topic?
• How have others, e.g. librarians, described and classified this topic?

Demo

• http://thoth.pica.nl/relate?input=opac

How do we do this?

• OFFLINE: generates a semantic representation for each entity

• ONLINE: finds the most related entities and uses multidimensional scaling to display them

Build semantic representation

• Basic assumptions
– Entities can be represented by their context
– Entities which share more context are more likely to be related
• Context is the textual environment where an entity occurs

• The effects of state prekindergarten programs on young children’s school readiness in five states

• [author:jung kwanghee]
• [subject:readiness for school]
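The example record above can be turned into per-entity context counts roughly as follows. This is a minimal sketch, not the authors' pipeline: the record layout, the function name `build_context_counts`, and the whitespace tokenisation of the title are all assumptions made for illustration (the talk describes context in terms of topical terms, whose extraction is not shown here).

```python
from collections import Counter, defaultdict


def build_context_counts(records):
    """Map each entity to a Counter of the title words it co-occurs with.

    Assumption: a record is a dict with a "title" string and a list of
    "entities"; words are obtained by lowercasing and splitting on whitespace.
    """
    contexts = defaultdict(Counter)
    for record in records:
        words = record["title"].lower().split()
        for entity in record["entities"]:
            contexts[entity].update(words)
    return contexts


# The single record shown on the slide, in the hypothetical layout above.
records = [
    {
        "title": "The effects of state prekindergarten programs on young "
                 "children's school readiness in five states",
        "entities": ["author:jung kwanghee", "subject:readiness for school"],
    },
]

contexts = build_context_counts(records)
```

Both entities of the record receive the same context here because they occur in the same title; with many records, entities that keep appearing next to the same words end up with similar count vectors.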

Dataset

● ArticleFirst, 65 million articles
● Selected 4 million entities (topical terms, authors, ISSNs, Dewey decimal codes)
● Represented by 1 million topical terms

But a matrix of 4M x 1M is too big to process

Dimension reduction based on Random Projection

C: a co-occurrence matrix

R: a random matrix of +/-1

C’: approximation of C after random projection -- Semantic matrix
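One common way to write this step is C' = C R, where C has one row per entity and one column per context term, and R has one row per context term and far fewer columns. The sketch below illustrates that product on a toy matrix in plain Python. The function names, the toy counts, and the reduced dimension of 4 are assumptions for illustration; the slides do not state the dimension used or whether the result is rescaled.

```python
import random


def random_sign_matrix(n_rows, n_cols, seed=0):
    """Return an n_rows x n_cols matrix with entries drawn uniformly from {-1, +1}."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n_cols)] for _ in range(n_rows)]


def project(C, R):
    """Return C' = C R as a list of rows (plain matrix product)."""
    k = len(R[0])
    projected = []
    for row in C:
        out = [0] * k
        for j, count in enumerate(row):
            if count:  # co-occurrence rows are mostly zeros, so skip them
                for d in range(k):
                    out[d] += count * R[j][d]
        projected.append(out)
    return projected


# Toy co-occurrence matrix: 3 entities x 6 context terms (made-up counts).
C = [
    [2, 0, 1, 0, 0, 3],
    [2, 0, 1, 0, 0, 3],
    [0, 4, 0, 1, 0, 0],
]
R = random_sign_matrix(n_rows=6, n_cols=4, seed=42)
C_reduced = project(C, R)  # 3 entities x 4 dimensions
```

Each entity keeps one row, but the row length drops from the number of context terms to the number of columns of R. Entities with identical contexts, such as the first two rows here, still map to identical reduced vectors.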

Online interface

• Find mutual nearest neighbors

• Use multidimensional scaling to display

Nearest neighbors

Mutual nearest neighbors
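The difference between the two notions can be sketched as follows: b is a nearest neighbor of a if b is among the k entities most similar to a, and the pair is mutual only if a is also among the k most similar to b. The sketch assumes cosine similarity and a brute-force search; neither is confirmed by the slides, and the vectors and names are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_neighbors(vectors, k):
    """For each entity, return the k other entities with the highest cosine similarity."""
    names = list(vectors)
    result = {}
    for a in names:
        ranked = sorted(
            (b for b in names if b != a),
            key=lambda b: cosine(vectors[a], vectors[b]),
            reverse=True,
        )
        result[a] = ranked[:k]
    return result


def mutual_nearest_neighbors(vectors, k):
    """Keep only neighbors b of a for which a is also among b's k nearest neighbors."""
    nn = nearest_neighbors(vectors, k)
    return {a: [b for b in nn[a] if a in nn[b]] for a in nn}


# Made-up 2-dimensional semantic vectors for four entities.
vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.5, 0.5],
}
nn = nearest_neighbors(vectors, k=1)
mutual = mutual_nearest_neighbors(vectors, k=1)
```

In this toy data the nearest neighbor of "c" is "d", but the nearest neighbor of "d" is "b", so "c" has a nearest neighbor without having a mutual one. Only "a" and "b" pick each other.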

Possible applications

• Explorative interface
• Context based search:
– brain
• Journal finder
– Arctic ice journals
– http://brain.oxfordjournals.org/
• Author name disambiguation
– pre kindergarten

Ariadne (demo): http://thoth.pica.nl/relate

• An extremely fast way of navigating large-scale heterogeneous entities

• Generalisable to different datasets
– Full WorldCat
– Small but highly curated astrophysics dataset

• Supports explorative information retrieval and entity disambiguation

References

• Koopman, Rob, and Shenghui Wang. 2014. “Where Should I Publish? Detecting Journal Similarity Based on What Has Been Published There.” In Proceedings of Digital Libraries 2014, 483–484. London, United Kingdom. Association for Computing Machinery.

• Koopman, Rob, Shenghui Wang, Andrea Scharnhorst, and Gwenn Englebienne. 2015. “Ariadne’s Thread - Interactive Navigation in a World of Networked Information.” In CHI '15 Extended Abstracts on Human Factors in Computing Systems. ACM, Seoul, South Korea.

• Koopman, Rob, Shenghui Wang, and Andrea Scharnhorst. 2015. “Contextualization of topics - browsing through terms, authors, journals and cluster allocations.” In Proceedings of 15th International Conference on Scientometrics & Informetrics. Istanbul, Turkey.

Explore. Share. Magnify.

Thank you

Shenghui Wang, Rob Koopman

OCLC Research EMEA

shenghui.wang@oclc.org rob.koopman@oclc.org
