seed and expand
![Page 1: Seed and Expand](https://reader034.vdocuments.us/reader034/viewer/2022052600/5582f036d8b42a21168b49a7/html5/thumbnails/1.jpg)
Seed+Expand: aggregating the scientific output of the Netherlands, 2000-2010
Linda Reijnhoudt, Rodrigo Costas, Ed Noyons, Katy Börner, Andrea Scharnhorst
1 Royal Netherlands Academy of Arts and Sciences (KNAW), The Hague, the Netherlands
2 Centre for Science and Technology Studies (CWTS), Leiden University, Leiden, the Netherlands
3 Cyberinfrastructure for Network Science Center, School of Library and Information Science, Indiana University, Bloomington, Indiana, United States of America
![Page 2: Seed and Expand](https://reader034.vdocuments.us/reader034/viewer/2022052600/5582f036d8b42a21168b49a7/html5/thumbnails/2.jpg)
goal
to study the dynamics of the output of Dutch professors (2001-2011)
but: lack of data on the output of full professors!
![Page 3: Seed and Expand](https://reader034.vdocuments.us/reader034/viewer/2022052600/5582f036d8b42a21168b49a7/html5/thumbnails/3.jpg)
the problem
given a Dutch professor in the NARCIS system
find all his/her publications
how to connect bibliographic data from CWTS with the NARCIS system?
![Page 4: Seed and Expand](https://reader034.vdocuments.us/reader034/viewer/2022052600/5582f036d8b42a21168b49a7/html5/thumbnails/4.jpg)
context
CWTS bibliometric publications database:
● author
● author-order
● email (sometimes)
● affiliation (sometimes)
● journal
=?
DANS NARCIS, Dutch scholars:
● name, initials
● DAI
● affiliations
● organisation
● email
![Page 5: Seed and Expand](https://reader034.vdocuments.us/reader034/viewer/2022052600/5582f036d8b42a21168b49a7/html5/thumbnails/5.jpg)
non trivial I
● misspelled names
○ Van Knienberg instead of Van Knippenberg
● different initials / first name
○ Johannes and Hans
● different formats in the data across sources
○ Prefixes separated in the NARCIS system
■ P.M.P. | van | Bergen en Henegouwen
○ Reduced to initials or concatenated in WoS
■ Henegouwen, PMPVE (Henegouwen, Paul M. P. van Bergen En)
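The format differences above can be partly absorbed by generating several comparable keys per scholar before matching. The following is a minimal sketch, not the authors' procedure: the function names and the choice of variants are assumptions made for illustration, and it does not address misspellings or first-name variants such as Johannes and Hans.

```python
import unicodedata


def _ascii_lower(text: str) -> str:
    """Strip diacritics and lowercase, e.g. 'Börner' -> 'borner'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def name_variants(initials: str, prefix: str, surname: str) -> set[str]:
    """Return hypothetical 'surname, initials' keys for one NARCIS-style record.

    The three arguments mirror the separated fields shown on the slide
    (initials | prefix | surname); the variant rules are illustrative only.
    """
    init = _ascii_lower(initials).replace(".", "").replace(" ", "")
    pre = _ascii_lower(prefix).strip()
    sur = _ascii_lower(surname).strip()
    full = f"{pre} {sur}".strip()
    surnames = {
        sur,                    # prefix dropped
        full,                   # prefix kept
        full.replace(" ", ""),  # prefix and surname concatenated
        sur.split()[-1],        # last surname token only
    }
    return {f"{s}, {init}" for s in surnames}


variants = name_variants("P.M.P.", "van", "Bergen en Henegouwen")
for key in sorted(variants):
    print(key)
```

A bibliographic author string normalised the same way can then be looked up in this set. More variants increase recall but also the risk of homonym matches.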
![Page 6: Seed and Expand](https://reader034.vdocuments.us/reader034/viewer/2022052600/5582f036d8b42a21168b49a7/html5/thumbnails/6.jpg)
non trivial II
● multiple scholars have the same author name (homonymy)
● the same scholar with multiple author names (synonymy)
○ changes over time, e.g., due to marriage
![Page 7: Seed and Expand](https://reader034.vdocuments.us/reader034/viewer/2022052600/5582f036d8b42a21168b49a7/html5/thumbnails/7.jpg)
the raw data
NARCIS database (DANS)
○ 8378 Dutch full professors
■ affiliation to Dutch organizations
■ name, initials
■ email
■ DAI
CWTS bibliometric data system
○ close to 23 million publications in more than 12,000 journals
○ no unique author identifier for all authors
![Page 8: Seed and Expand](https://reader034.vdocuments.us/reader034/viewer/2022052600/5582f036d8b42a21168b49a7/html5/thumbnails/8.jpg)
the Gold Standard
we already know the complete oeuvre of 1400 Dutch full professors, thanks to publication lists manually verified by CWTS (2001-2010)
USEFUL TO VALIDATE OUR METHODOLOGY
the 1400 of the 8376 full professors (17%) who already appear in this list: the Gold Standard
![Page 9: Seed and Expand](https://reader034.vdocuments.us/reader034/viewer/2022052600/5582f036d8b42a21168b49a7/html5/thumbnails/9.jpg)
the sources & main overview
![Page 10: Seed and Expand](https://reader034.vdocuments.us/reader034/viewer/2022052600/5582f036d8b42a21168b49a7/html5/thumbnails/10.jpg)
Seed+Expand main concept
● seed creation, precision
○ given a full professor, {initials, name, email, affiliations}
○ find one or more publications that are most likely authored by this professor
● seed expansion, recall
○ given these 'seed' publications,
○ find publications by the same author
1. publication-based classifications
2. Scopus Author Identifier
![Page 11: Seed and Expand](https://reader034.vdocuments.us/reader034/viewer/2022052600/5582f036d8b42a21168b49a7/html5/thumbnails/11.jpg)
seed creation
1. Email seed (EM)
2. Author Address approaches (*)
a. Reprint Author (RP)
b. Direct linkage author-addresses (DL)
c. Approximate linkage author addresses (AL)
3. Digital Author Identifier seed (DAI)
(*) For these seeds, very common names have been excluded
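As an illustration of the simplest seed, the email seed (EM) can be thought of as an exact, case-insensitive match between a professor's e-mail address and the e-mail addresses recorded on publications. This is a sketch under assumed data structures: the field names `id` and `emails`, the function name, and the example.org addresses are all made up for the example.

```python
def email_seed(professor_email: str, publications: list[dict]) -> list[str]:
    """Return ids of publications whose listed e-mails include the professor's.

    Each publication is assumed to be a dict with an 'id' and an optional
    'emails' list, since e-mail is only sometimes available.
    """
    target = professor_email.strip().lower()
    return [
        pub["id"]
        for pub in publications
        if target in {e.strip().lower() for e in pub.get("emails", [])}
    ]


# Made-up records for illustration only.
pubs = [
    {"id": "pub1", "emails": ["A.Jansen@example.org"]},
    {"id": "pub2", "emails": ["someone.else@example.org"]},
    {"id": "pub3"},  # no e-mail recorded
]
print(email_seed("a.jansen@example.org", pubs))
```

An exact match of this kind favours precision: it only fires when the address is present and identical, which is why it yields few but reliable seed publications.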
![Page 12: Seed and Expand](https://reader034.vdocuments.us/reader034/viewer/2022052600/5582f036d8b42a21168b49a7/html5/thumbnails/12.jpg)
seed expansion
1. CWTS Paper-Based Classification (2001-2011)
○ based on citation relationships of publications
○ 672 meso, over 20K micro disciplines
○ micro: +23% unique papers over seed
○ meso: +34% unique papers over seed
2. Scopus Author Identifier (1996-2011)
○ +69% unique papers over seed
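One way to picture expansion through a paper-based classification is to add candidate publications that fall in the same cluster as at least one seed publication. The slides do not spell out the exact rule, so this sketch is an assumption: `candidates` stands for publications already matching the author's name, and `cluster_of` is a hypothetical mapping from publication id to micro or meso discipline.

```python
def expand_by_cluster(
    seed_ids: set[str],
    candidates: list[str],
    cluster_of: dict[str, str],
) -> set[str]:
    """Add candidates that share a cluster with at least one seed publication."""
    seed_clusters = {cluster_of[p] for p in seed_ids if p in cluster_of}
    expanded = set(seed_ids)
    for pub_id in candidates:
        # Candidates without a known cluster are never added.
        if pub_id in cluster_of and cluster_of[pub_id] in seed_clusters:
            expanded.add(pub_id)
    return expanded


# Made-up ids and cluster labels for illustration only.
clusters = {"p1": "c7", "p2": "c7", "p3": "c9"}
oeuvre = expand_by_cluster({"p1"}, ["p2", "p3", "p4"], clusters)
gain = 100 * (len(oeuvre) - 1) / 1  # percent of papers added over one seed
print(sorted(oeuvre), gain)
```

Finer clusters (micro) admit fewer candidates than coarser ones (meso), which matches the direction of the gains listed above.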
![Page 13: Seed and Expand](https://reader034.vdocuments.us/reader034/viewer/2022052600/5582f036d8b42a21168b49a7/html5/thumbnails/13.jpg)
evaluation
Gold standard: 2001-2010
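Evaluating against a gold standard comes down to comparing, per professor, the set of publications found with the manually verified set. The sketch below uses the standard definitions of precision and recall; the function name and the example ids are assumptions for illustration.

```python
def precision_recall(found: set[str], gold: set[str]) -> tuple[float, float]:
    """Return (precision, recall) of a found oeuvre against a verified one."""
    true_pos = len(found & gold)
    precision = true_pos / len(found) if found else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


# Made-up publication ids for illustration only.
found = {"a", "b", "c", "d"}
gold = {"a", "b", "c", "e", "f"}
print(precision_recall(found, gold))
```

Here three of the four found papers are correct (precision 0.75) and three of the five verified papers are recovered (recall 0.6).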
![Page 14: Seed and Expand](https://reader034.vdocuments.us/reader034/viewer/2022052600/5582f036d8b42a21168b49a7/html5/thumbnails/14.jpg)
results
● 80% of Dutch professors detected
● Micro-disciplines: highest precision (88.5)
● Scopus Author id & micro disciplines: same recall (95.9)
● This methodology can be applied to other sets and author identity schemes (ORCID, VIVO, etc.)
● Further research on disciplinary differences and improvements
![Page 15: Seed and Expand](https://reader034.vdocuments.us/reader034/viewer/2022052600/5582f036d8b42a21168b49a7/html5/thumbnails/15.jpg)
general discussion
● a growing number of bibliographic data sources, but author-disambiguated data are still lacking!
● lack of research on how to connect databases
○ repositories
○ bibliographic databases (WoS, Scopus, etc.)
○ altmetrics
● e-mail data and DAI/ORCID-like identifiers are powerful linking elements across systems
![Page 16: Seed and Expand](https://reader034.vdocuments.us/reader034/viewer/2022052600/5582f036d8b42a21168b49a7/html5/thumbnails/16.jpg)
the end ...
thank you very much for your attention!
questions?
comments?
![Page 17: Seed and Expand](https://reader034.vdocuments.us/reader034/viewer/2022052600/5582f036d8b42a21168b49a7/html5/thumbnails/17.jpg)
five seeds
combined: 6753 of 8376 full professors found
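The counts can be checked against the percentages quoted in the deck (80% of professors detected, 17% in the Gold Standard). A small calculation, using only the figures stated on the slides:

```python
found, total, gold = 6753, 8376, 1400

share_found = round(100 * found / total, 1)  # professors with at least one seed
share_gold = round(100 * gold / total, 1)    # professors in the Gold Standard

print(share_found, share_gold)
```

Both values round to the whole-number percentages given on the slides.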