![Page 1: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/1.jpg)
Overview of Citation Analysis
Clickstream Data Yields High-Resolution Maps of Science. By Johan Bollen, Herbert Van de Sompel, Aric Hagberg, Luis Bettencourt, Ryan Chute, Marko A. Rodriguez, Lyudmila Balakireva. Public Library of Science ONE, March 11, 2009.
Version: 5/6/14
![Page 2: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/2.jpg)
Micah Altman
Director of Research, MIT Libraries
Sean Thomas
Program Manager for Scholarly Repository Services and the Product Manager of DSpace@MIT
Prepared for
IAPril
MIT
April 2014
![Page 3: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/3.jpg)
DISCLAIMER
These opinions are my own; they are not the opinions of MIT, Brookings, any of the project funders, nor (with the exception of co-authored previously published work) my collaborators.
Secondary disclaimer:
“It’s tough to make predictions, especially about the future!”
-- Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White, etc.
![Page 4: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/4.jpg)
Collaborators & Co-Conspirators
• Thanks to:
– Michael Noga
– Peter Cohn
– Courtney Crummett
![Page 5: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/5.jpg)
Related Work
• K. Smith-Yoshimura, et al., 2014. Registering Researchers in Authority Files. OCLC Research.
• Liz Allen, Jo Scott, Amy Brand, Marjorie M.K. Hlava, Micah Altman (Forthcoming). Beyond authorship: recognising the contributions to research. Nature.
• Data Synthesis Task Group, 2014. Joint Principles for Data Citation.
• CODATA Data Citation Task Group, 2013. Out of Cite, Out of Mind: The Current State of Practice, Policy and Technology for Data Citation. Data Science Journal. 2013;12:1–75.
Slides and reprints available from: informatics.mit.edu
![Page 6: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/6.jpg)
And now, a word from our sponsor… The Libraries @ MIT
The MIT Libraries provide support for all researchers at MIT:
• Research consulting, including: bibliographic information management; literature searches; subject-specific consultation
• Data management, including: data management plan consulting; data archiving; metadata creation
• Data acquisition and analysis, including: database licensing; statistical software training; GIS consulting, analysis & data collection
• Scholarly publishing: open access publication & licensing
libraries.mit.edu
![Page 7: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/7.jpg)
Roadmap
* Background * * Metrics *
* Data * * Tools *
* Data Processing * * Putting it all together *
* Resources *
![Page 8: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/8.jpg)
Background
(Why?) (What?) (Which?)
![Page 9: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/9.jpg)
What are bibliometrics? (Simple Definition)
Bibliometrics are measures of scholarly outputs.
![Page 10: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/10.jpg)
Scholarly output affects the reputation, ranking, and funding of the discipline, institution, and individual scholar
We initially use bibliometric analysis to look at the top institutions, by publications and citation count for the past ten years…
Universities are ranked by several indicators of academic or research performance, including… highly cited researchers…
Citations… are the best understood and most widely accepted measure of research strength.
![Page 11: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/11.jpg)
Then
Clarke, Beverly L. "Multiple authorship trends in scientific papers." Science 143.3608 (1964): 822-824.
![Page 12: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/12.jpg)
Now
![Page 13: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/13.jpg)
Now is More
![Page 14: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/14.jpg)
What are bibliometrics? (Extended Definition)
• Analysis of characteristics of/relationships among research/scholarly outputs/publications
– Analysis includes: lists, descriptive statistics, visualization, inference
– Outputs include: grants, articles, books, databases, software, patents
![Page 15: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/15.jpg)
Which questions are bibliometrics being used to answer?
Some examples:
• What are the most influential journals in a particular field?
• How influential is this scholar?
• Where is interdisciplinary research occurring?
• Which groups of people effectively collaborate?
• Which institutions are using funding most productively?
![Page 16: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/16.jpg)
Data
(Leading Databases) (Subject-Specific) (MIT Internal) (Selection)
![Page 17: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/17.jpg)
Google Scholar
Data Sources
• Unspecified coverage, but…
• Wide coverage of books, preprints, conference proceedings, non-English work, working papers, patents, institutional repositories
Built-in Metrics
• Journal H-Index
• Author Profiles
– Total & Five-Year Counts
– i10-index and H-index
– Yearly citations
• Limited filtering
scholar.google.com
![Page 18: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/18.jpg)
Scopus
Data
• Frequently updated/current
• Covers journal articles published after 1995
• Wide disciplinary coverage
• Includes theses and patents, and citations from these
• Includes some institutional repositories
• Commercial
Metrics
• Citation lists & counts
• Author impact & articles
– Statistics
– Metrics
– Graphs
• Journal impact
– Statistics
– Metrics
– Graphs
scopus.com
![Page 19: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/19.jpg)
Web of Science
Data
• Journal coverage after 1899
• Many conference proceedings since 1990
• Many books since 2005
• Limited coverage of non-English works
• Doesn’t index institutional repositories and e-print servers
• Commercial
Metrics
• Citation lists & counts
• Author impact & articles
– Statistics
– Metrics
– Graphs
• Journal impact
– Statistics
– Metrics
– Graphs
apps.webofknowledge.com
![Page 20: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/20.jpg)
Major Subject-Specific Catalogs With Citation Metrics
• SciFinder: chemical abstracts
scifinder.cas.org
• PsycInfo: psychological literature
www.apa.org/pubs/databases/psycinfo/
• Business Source Complete: business articles
www.ebscohost.com/academic/business-source-complete
• arXiv: physics, mathematics, nonlinear sciences, computer science, quantitative biology, quantitative finance, statistics (integrates w/NASA-ADS and INSPIRE)
arxiv.org
• MathSciNet: Mathematical Reviews. Computes collaboration distances.
www.ams.org/mathscinet/
• IEEE Digital Library: content published by the IEEE, including citing references
• USPTO: find patents that are cited by/cite others
uspto.gov/patft/
• ACM Digital Library: full text and citations of ACM articles and proceedings
dl.acm.org
VERA: owens.mit.edu/sfx_local/az/mit_db
![Page 21: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/21.jpg)
APIs for Scholarly Resources
What are APIs?
• Application programming interfaces (APIs) are tools used to expose raw data, query interfaces, or other functions to other software applications
• Typically more flexible than interactive interfaces
Challenges
• Requires programming
• Requires data manipulation and reorganization
• Variety of interfaces, coverage, results, and terms of service
libguides.mit.edu/apis
![Page 22: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/22.jpg)
Using APIs: Choosing tools
• Recommend Python or R
• Many resources such as PubMed, Dataverse, and arXiv are accessible through the OAI-PMH protocol
• More in tools section and resources section
Example: Harvesting arXiv with pyoai
# Harvest Dublin Core records from arXiv over OAI-PMH (requires pyoai and lxml).
from oaipmh.client import Client
from oaipmh.metadata import MetadataRegistry
from lxml import etree

URL = 'http://export.arxiv.org/oai2'
registry = MetadataRegistry()

class Reader(object):
    """Metadata reader that returns each record's XML as text."""
    def __call__(self, element):
        # encoding='unicode' returns str rather than bytes on Python 3
        return etree.tostring(element, pretty_print=True, encoding='unicode')

registry.registerReader('oai_dc', Reader())
client = Client(URL, registry)

# Each record is a tuple; element 0 is the header, element 1 the metadata (may be empty).
for count, record in enumerate(client.listRecords(metadataPrefix='oai_dc')):
    header = record[0]
    metadata = record[1] or ''
    print(header.identifier())
    print(metadata)
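For readers who prefer not to install a harvesting library, the same kind of OAI-PMH request and response can be handled with the Python standard library. The following is a minimal sketch, not the method used in the slides: the helper names `list_records_url` and `parse_records` and the `SAMPLE` response are hypothetical and exist only for illustration. The XML namespaces are the ones defined by OAI-PMH 2.0 and Dublin Core.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", **extra):
    """Build a ListRecords request URL; extra may carry e.g. set='...'."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    params.update(extra)
    return base_url + "?" + urlencode(params)

def parse_records(xml_text):
    """Return one dict per record: identifier, title, and list of creators."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier",
                                       default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            "creators": [c.text for c in rec.iterfind(".//dc:creator", NS)],
        })
    return records

# Hypothetical response fragment, trimmed to the fields used above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:0001</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Title</dc:title>
          <dc:creator>Author, A.</dc:creator>
          <dc:creator>Author, B.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://export.arxiv.org/oai2"))
    for record in parse_records(SAMPLE):
        print(record["identifier"], record["title"], record["creators"])
```

A real harvest would also need to fetch the URL, follow resumption tokens for large result sets, and respect the repository's terms of service; those steps are left out of this sketch.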
![Page 23: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/23.jpg)
MIT Internal Data
Institute Data (Restricted Use)
• IS&T Data Warehouse: data from administrative systems, e.g. MIT people, organizations, grants and awards
ist.mit.edu/warehouse
• Office of the Provost, Institutional Research: provides analytical and research support to the Provost, academic departments, research laboratories and centers
web.mit.edu/ir/
Libraries Data
• DSpace@MIT: lists of publications in DSpace by author/department
dspace.mit.edu
• Barton: lists of MIT theses by author/advisor
library.mit.edu
![Page 24: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/24.jpg)
Comparing Databases
Coverage
• Years
• Disciplines
• Publishers/sources
• Venue: journals/conferences/working papers/IR/personal web sites
• Documentation of coverage
• Completeness
Characteristics
• Internal vs. external
• Free vs. fee-based
• API vs. interactive
• Open data vs. restrictively licensed
• Structured vs. unstructured
• Full text vs. metadata
![Page 25: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/25.jpg)
Selecting a Database
• Free, quick, and useful: Google Scholar
• Extract data for further simple analysis: Scholarometer (Google Scholar extract), Scopus, WOS
• More complete coverage: use multiple databases
• Specialized subject/single article: disciplinary database/API
• Extract data for network analysis: API
Free & Easy
$$ and/or programmatic
![Page 26: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/26.jpg)
Measures
(Article Metrics) (Author Impact) (Journal Impact) (Collaboration) (Network Analysis)
![Page 27: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/27.jpg)
Article Metrics: Overview
What are article-level metrics?
• Measures on specific published articles
• Typically used in construction of literature reviews, or as building blocks for other measures
Common measures
• Citations list
• Citation counts
• References
• Captures/bookmarks
• Downloads
• Mentions
• Likes
• Views
• Readers
sparc.arl.org/sites/default/files/sparc-alm-primer.pdf
![Page 28: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/28.jpg)
Article Metrics: Using Google Scholar
Steps
1. Go to scholar.google.com
2. Search (Full Text + Metadata)
– Unstructured keyword search, OR
– “Advanced” fielded search
3. Sort
– By relevance, OR
– By date
4. Filter
– By date range, AND/OR
– By corpus (case law, patents)
Results
• Number of citations to the article, as indexed by Google Scholar
• List of citing articles
• Article text (sometimes)
![Page 29: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/29.jpg)
Article Metrics: Example – Google Scholar
![Page 30: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/30.jpg)
Article Metrics: Altmetrics
Types
• Captures/bookmarks
• Downloads
• Mentions
• Likes
• Views
• Readers
Sources
• Social media
• Reference management (e.g. CiteULike, Mendeley)
• Indexes/searches (e.g. Scopus)
Sources
• PLOS article metrics
article-level-metrics.plos.org
• Plum Analytics
plumanalytics.com
• ImpactStory
impactstory.org
![Page 31: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/31.jpg)
Article Metrics: Database Comparison

| | Google Scholar, Scopus, WOS | PLOS | PlumX |
|---|---|---|---|
| Coverage | Wide variety | PLOS articles only | Wide variety |
| Measures | Citation count, citation list | Citation count, citation list, views, downloads, mentions, bookmarks, comments | Citation count, citation list, views, downloads, mentions, bookmarks, comments |
![Page 32: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/32.jpg)
‘Impact’ Factors: Overview
What are impact factors?
• Descriptive statistics
• Usually based on citations
• Commonly treated as a proxy for the level of influence of an article, person, or journal
Common measures
• ISI Journal Impact Factor: the frequency with which the “average article” in a journal has been cited in a particular year. It is based on citations to articles the journal published in the preceding two years. It is only supplied for journals indexed by ISI in the Web of Science.
• Article Citation Count: total number of citations received from other articles to the target article.
• H-Index: the maximum number h such that h articles have each received at least h citations.
libraries.mit.edu/scholarly/publishing/impact-factors/
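The definitions above translate directly into a few lines of code. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the input is a plain list of per-article citation counts, and the impact factor uses the usual two-year form (citations in a year to items from the two preceding years, divided by the number of items published in those two years).

```python
def h_index(citations):
    """Largest h such that h articles each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

def two_year_impact_factor(citations_to_prior_two_years, items_in_prior_two_years):
    """Citations received this year to items from the previous two years,
    divided by the number of items published in those two years."""
    if items_in_prior_two_years <= 0:
        raise ValueError("need at least one item published in the prior two years")
    return citations_to_prior_two_years / items_in_prior_two_years

if __name__ == "__main__":
    # Hypothetical citation counts for one author's five articles.
    print(h_index([10, 8, 5, 4, 3]))
    # Hypothetical journal: 300 citations this year to 150 recent items.
    print(two_year_impact_factor(300, 150))
```

Both functions are descriptive statistics only; they inherit whatever coverage gaps exist in the database that supplied the counts.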
![Page 33: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/33.jpg)
Author Impact: Example – Google Scholar
![Page 34: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/34.jpg)
Author Impact: Example – Exporting Data with Scholarometer
![Page 35: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/35.jpg)
Author Impact: Example – Web of Science
![Page 36: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/36.jpg)
Author Impact: Database Comparison

| | Google Scholar | Scholar + Scholarometer | Scopus | Web of Science |
|---|---|---|---|---|
| Select any author | Only w/profiles | Yes | Yes | Yes |
| Export data | No | Yes | Yes | Yes |
| Exclude articles | No | Yes | Yes | Yes |
| Metrics | H-index, i10, num cites | H-index, i10, num cites | H-index, … | H-index |
| Visualization | Minimal | Minimal | Yes | Yes |
| Longitudinal | Minimal | Minimal | Yes | Yes |
![Page 37: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/37.jpg)
Journal Impact: Using Online Services
Scholar
1. Go to scholar.google.com
2. Click on METRICS
3. Google rank and journal h5-index displayed
4. Filter by country & field
Scopus
• Go to scopus.com
• Click on Journal Analyzer
• Select journal
• Select statistics
Web of Science
1. Go to admin-apps.webofknowledge.com/JCR/
2. Select field and year + SUBMIT
3. Select subject + SUBMIT
![Page 38: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/38.jpg)
Journal Impact: Example – Google Scholar
![Page 39: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/39.jpg)
Journal Impact: Example – Web of Science
![Page 40: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/40.jpg)
Journal Impact: Example – Scopus
![Page 41: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/41.jpg)
Journal Impact: Database Comparison

| | Google Scholar | Scopus | Web of Science |
|---|---|---|---|
| Journals covered | Top 100 ranked in each language | Mostly English-language | Many (selected) journals |
| Metrics | H5 median | Many | Impact factor, many others |
| Visualization | No | Yes | Yes |
| Longitudinal analysis | No | Yes | Yes |
| Discipline rankings | No | No | Yes |
![Page 42: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/42.jpg)
Network Analysis
What is network analysis?
• Study of objects and interactions modeled as an induced network (or graph)
• Units of observation form nodes
• Relationships form edges
Common measures
• Community detection
– Modularity
– Clustering
– Clique
• Centrality
– Betweenness
– Degree
– Closeness
• Diameter
• Visualization
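Several of the measures listed above can be computed with nothing more than a breadth-first search. The following is a minimal sketch on a hypothetical four-node undirected network; the function names and the toy graph are assumptions for illustration, and it assumes a connected graph with at least two nodes. For real analyses, a dedicated library such as NetworkX (listed under the command-line tools) is the more practical route.

```python
from collections import deque

def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(graph)
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}

def closeness_centrality(graph):
    """(n - 1) divided by the sum of distances to all other nodes."""
    n = len(graph)
    result = {}
    for node in graph:
        total = sum(bfs_distances(graph, node).values())
        result[node] = (n - 1) / total if total else 0.0
    return result

def diameter(graph):
    """Longest shortest path between any pair of nodes."""
    return max(max(bfs_distances(graph, node).values()) for node in graph)

# Hypothetical co-authorship network: nodes are authors, edges are joint papers.
G = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C"},
}

if __name__ == "__main__":
    print(degree_centrality(G))
    print(closeness_centrality(G))
    print(diameter(G))
```

In this toy network, author B is linked to everyone else and so scores highest on both centrality measures. Betweenness and modularity-based community detection need more machinery and are omitted here.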
![Page 43: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/43.jpg)
Network Analysis: Example – CitNetExplorer
![Page 44: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/44.jpg)
Network Analysis: Example – CitNetExplorer
1. Use WOS to locate records
2. Add records to “marked list”
3. Click “marked list”
4. Check “cited references”
5. Save to other file formats
6. Select Windows tab-delimited
7. Open in CitNetExplorer
![Page 45: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/45.jpg)
CoAuthorship Analysis Example – Using R and JSTOR – Part 1
![Page 46: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/46.jpg)
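CoAuthorship Analysis Example – Using R and JSTOR – Part 2
# Shell: keep only the first 11 comma-separated fields of the JSTOR export
% cut -d"," -f 1-11 citations.CSV > areastudies2003.csv
# R: read the trimmed file produced by cut
R> areastudies.df <- read.table(file="areastudies2003.csv", row.names=NULL, sep=",", quote="", stringsAsFactors=FALSE, header=TRUE)
# Split the tab-separated author field into one vector of names per paper
R> authorList <- strsplit(areastudies.df$author, perl=TRUE, split="\t")
# Plot the distribution of the number of authors per paper
R> plot(table(sapply(authorList, length)))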
![Page 47: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/47.jpg)
CoAuthorship Analysis Example – Using R and JSTOR – Part 3
createCoauthorlist <- function(pl) {
  # pl: list of character vectors, one vector of author names per paper
  coauthors <- list()
  updateCoauthor <- function(co, paperAuthors) {
    # merge this paper's authors into the set already recorded for author co
    # (the set includes co's own name)
    tmp <- unlist(coauthors[co])
    tmp <- union(tmp, unlist(paperAuthors))
    coauthors[[co]] <<- tmp
  }
  sapply(pl, function(x) sapply(x, function(y) updateCoauthor(y, x)))
  return(coauthors)
}
R> coa <- createCoauthorlist(authorList)
R> plot(table(sapply(coa, length)))
Note: Results are biased downward if only a sample of records is used!
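For comparison, the same co-authorship tally can be sketched in Python, the other language recommended in this deck. This is an illustrative port under stated assumptions, not the authors' code: the function names and the sample `papers` list are hypothetical. Unlike the R version above, each author's set here excludes the author's own name.

```python
from collections import Counter, defaultdict

def authors_per_paper(author_lists):
    """Distribution of the number of authors per paper."""
    return Counter(len(authors) for authors in author_lists)

def coauthor_sets(author_lists):
    """Map each author to the set of distinct people they have published with."""
    coauthors = defaultdict(set)
    for authors in author_lists:
        for author in authors:
            # update() with an empty generator still registers solo authors
            coauthors[author].update(a for a in authors if a != author)
    return dict(coauthors)

# Hypothetical input: one list of author names per paper.
papers = [["Ann", "Bob"], ["Ann", "Cy", "Di"], ["Eve"]]

if __name__ == "__main__":
    print(authors_per_paper(papers))
    sets = coauthor_sets(papers)
    # Distribution of the number of distinct co-authors per author.
    print(Counter(len(s) for s in sets.values()))
```

The same caveat applies as in the R example: if only a sample of records is analyzed, co-author counts are biased downward.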
![Page 48: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/48.jpg)
Variations: Retrieving Authors from PLOS
library(rplos)
options(PlosApiKey = "YOURKEY")

# Fetch results in batches until a request fails or returns no rows.
# searchplos() arguments are kept as on the slide; check them against your installed rplos version.
fetchPlosResults <- function(qstring, fstring, start = 0) {
  moreResults <- TRUE
  results.df <- NULL
  batStart <- start
  batSize <- 999
  while (moreResults) {
    tmp.df <- try(silent = TRUE,
      searchplos(terms = "*:*", toquery = qstring, fields = fstring,
                 start = batStart, limit = batSize))
    if (inherits(tmp.df, "try-error")) {
      moreResults <- FALSE
    } else if (is.null(dim(tmp.df))) {
      moreResults <- FALSE
    } else if (dim(tmp.df)[1] == 0) {
      moreResults <- FALSE
    } else {
      results.df <- merge(tmp.df, results.df, all = TRUE)
      batStart <- batStart + batSize
      cat(paste(batStart, date(), "\n"))
      # checkpoint partial results in case the harvest is interrupted
      save(results.df, file = "/tmp/plosTMP.RData")
    }
  }
  return(results.df)
}

plosRes.df <- fetchPlosResults(
  qstring = 'publication_date:[2012-01-01T00:00:00Z TO 2012-12-31T23:59:59Z]',
  fstring = "id,author,journal,publication_date,subject,subject_level_1,references,article_type")
![Page 49: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/49.jpg)
Limitations
Limitations of data
• Citation differs systematically from sharing, reading, or ‘use’
• Relationships signaled by citation are heterogeneous: citations may indicate evidentiary support, definitions, disagreement, kudos, …
• Cited objects are heterogeneous, e.g. journals include letters, comments, reviews and original research
• Databases may have limited or inconsistent coverage of publishers, fields, years, types of publications (e.g. conference proceedings), or types of objects (databases, software, books, articles)
• Some types of objects are often used without being cited
Limitations of measures
• Most measures are vulnerable to self-citation and other sorts of manipulation
• Most measures are descriptive estimates; they are not forecasts or causal inferences
• Few studies of the external validity of measures
• Few studies on error and bias in estimators
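One way to gauge how sensitive a count is to self-citation is to compute it both ways. The following is a minimal sketch under an assumed definition: a citation counts as a self-citation when the citing and cited papers share at least one author name. The function name, the data layout, and the sample records are hypothetical, and real data would first need author names disambiguated.

```python
from collections import defaultdict

def citation_counts(citations, authors_of):
    """Count citations per cited paper, with and without self-citations.

    citations  : iterable of (citing_id, cited_id) pairs
    authors_of : dict mapping paper id -> set of author names
    """
    totals = defaultdict(lambda: {"all": 0, "external": 0})
    for citing, cited in citations:
        totals[cited]["all"] += 1
        shared = authors_of.get(citing, set()) & authors_of.get(cited, set())
        if not shared:
            # no author in common, so this is not a self-citation
            totals[cited]["external"] += 1
    return dict(totals)

# Hypothetical records: p2 cites p1 and shares the author "Ann" with it.
authors_of = {"p1": {"Ann"}, "p2": {"Ann", "Bob"}, "p3": {"Cy"}}
citations = [("p2", "p1"), ("p3", "p1"), ("p3", "p2")]

if __name__ == "__main__":
    for paper, counts in sorted(citation_counts(citations, authors_of).items()):
        print(paper, counts)
```

A large gap between the two counts for a paper or author is a signal to inspect the underlying citations before relying on the raw number.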
![Page 50: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/50.jpg)
Tools
(Built-in tools) (Analysis tools)
![Page 51: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/51.jpg)
Built-in Tools
• Database portals have built-in tools: Google Scholar; Scholarometer; Web of Science …
• Typical restrictions of built-in tools
– Single database
– Number of records
– Usually single-author/single-journal metrics
– Lacks statistical forecasting/causal models
– Limited data-cleaning options
– Simple visualizations
![Page 52: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/52.jpg)
External Tools
Feature sets
• Data retrieval
• Data processing (next section)
• Core statistics
• Visualization
• Exploratory network analysis
• Network modeling
Choosing a tool
• Open vs. closed source
• Free vs. commercial
• GUI vs. CLI
• Scalability
• Single platform/multi-platform
• Feature set
• Maintenance/support
![Page 53: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/53.jpg)
Publish or Perish
• Automatic data retrieval
– MS Academic Search
– Google Scholar
• Standard single-author metrics
– Total number of papers and total number of citations
– Average citations per paper, citations per author, papers per author, and citations per year
– Hirsch's h-index and related parameters and variations
• Data export to CSV
www.harzing.com/pop.htm
![Page 54: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/54.jpg)
Scholarometer
Data
• Google Scholar
• Crowd-sourced tags (disciplines), available through API
• Data export to CSV
Metrics
• Single/combined author citation count/h-index rank
• Discipline rank
• Author network visualization
• Discipline network visualization
scholarometer.indiana.edu
![Page 55: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/55.jpg)
Pajek
Analysis
• Network visualization
• Supports complex networks: multi-relational, longitudinal, 2-mode
• Layout control
• Clustering
• Community detection
pajek.imfm.si
Source: www.public.asu.edu/~majansse/pubs/SupplementIHDP.htm
![Page 56: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/56.jpg)
CitNetExplorer
Features
• Citation/bibliometric-specific tool
• Web of Science import
• Pajek export
• Large networks (millions of publications)
• Simple network visualizations
• Network measures: connected components, clusters, core publications …
citnetexplorer.nl
![Page 57: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/57.jpg)
CiteSpace
Features
• Citation/bibliometric tool
• Import from WOS, arXiv, NSF, ADS, PubMed
• Export to CSV, GraphML, Pajek
• Time slicing
• Network measures: connected components, clusters, core publications …
• Topic clustering
cluster.cis.drexel.edu/~cchen/citespace
![Page 58: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/58.jpg)
SciMat
Features
• Workflow support
• Network visualization
• Data processing and cleanup
• Longitudinal analysis
• Metrics: h-index
sci2s.ugr.es/scimat/
![Page 59: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/59.jpg)
Gephi
Analysis
• Network graphs & layout
• Dynamic filtering (including time-sliders)
• Clustering
• SNA: betweenness, closeness, diameter, PageRank, HITS, …
• Community detection (modularity)
gephi.org
![Page 60: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/60.jpg)
Sci2 Tool
Analysis and Visualization
• Temporal: burst detection
• Geospatial
• Topical
• Networks: trees and graphs
Additional Benefits
• Parsers for citation data
• Bibliometric analysis tools
• Portable output files
• Direct connections to R and Gephi
http://sci2.cns.iu.edu
![Page 61: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/61.jpg)
Command-Line Tools
Using Python
• SciPy: scientific data processing, statistics, visualization
scipy.org
• NLTK: text processing and analysis
nltk.org
• NetworkX: network measures (descriptive)
networkx.github.io
• Bibtools: parse WOS data and identify communities of cocitation
www.sebastian-grauwin.com/?page_id=492
• PythonOAI: retrieve bibliographic metadata from OAI sources, such as arXiv
pypi.python.org/pypi/pyoai/
Using R
• tm: simple text processing and analysis
cran.r-project.org/web/packages/tm/
• StatNet: network measures (descriptive); social network analysis (forecasting, causal); visualization
statnet.org
• CITAN: citation analysis
cran.r-project.org/web/packages/CITAN
• rplos: retrieve citation data from PLOS
http://cran.r-project.org/web/packages/rplos/
• RMendeley: retrieve citation data from Mendeley
http://ropensci.org/packages/rmendeley.html
• RISmed: retrieve data from NCBI
http://cran.r-project.org/web/packages/RISmed/index.html
• OAIHarvester: retrieve data from OAI-PMH sources
cran.r-project.org/web/packages/OAIHarvester/
Web integration for interactive visualization: d3js.org
![Page 62: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/62.jpg)
Characteristics of Tools
• Built-in vs. external
• Free vs. fee-based
• Command line vs. interactive
• Open source vs. closed source
• Domain
  – Data extraction, retrieval, integration
  – Data cleaning and manipulation
  – Network visualization
  – Advanced measures
  – Statistical analysis
![Page 63: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/63.jpg)
Choosing tools
• Simple standard impact: built-in database tools; Publish or Perish; Scholarometer
• Messy data: OpenRefine + …
• Network analysis measures
  – Network measures: Sci2, SciMat, Pajek
  – Visualizations: Gephi, Pajek, CitNet, SciMat
• Need to estimate complex (predictive, statistical) models: R
• Need maximum software flexibility, integration with other software: Python
Quick Start
Power Tools
![Page 64: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/64.jpg)
Data Processing
(reorganizing data) (cleaning data) (matching names)
![Page 65: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/65.jpg)
Open Refine
• Spreadsheet/database combination
  – Ease of use of spreadsheets
  – Reporting and manipulative power of databases
• Filters, facets, and clustering
  – Allow granular overview of what’s in your data
  – Easily see occurrence distribution of values
  – Easily make global corrections
• Supports both row-level and record-level (multi-row) operations
openrefine.org
![Page 66: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/66.jpg)
Open Refine – Reorganize Data
Reorganizing Data
• Splitting/joining multi-valued cells
• Transposing rows/columns
• Supports logic-based transformation
  – Google Refine Expression Language (GREL)
  – Clojure
  – Jython
openrefine.org
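Splitting a multi-valued cell can be sketched outside OpenRefine in a few lines of Python. This is only an illustration of the idea, not OpenRefine's implementation: the function name `split_multivalued`, the column names, and the sample records are all hypothetical, and the sketch copies the other columns onto every new row, which may differ from how OpenRefine lays out the resulting record.

```python
def split_multivalued(rows, column, separator=";"):
    """Return one output row per value found in row[column]."""
    out = []
    for row in rows:
        # Split the cell and drop empty fragments and surrounding spaces.
        values = [v.strip() for v in row[column].split(separator) if v.strip()]
        # Keep a row with an empty value if the cell held nothing usable.
        for value in values or [""]:
            new_row = dict(row)      # copy the other columns unchanged
            new_row[column] = value
            out.append(new_row)
    return out


# Hypothetical sample data: one row per paper, authors packed into one cell.
records = [
    {"title": "Paper A", "authors": "Smith, J.; Lee, K."},
    {"title": "Paper B", "authors": "Lee, K."},
]
long_rows = split_multivalued(records, "authors")
for row in long_rows:
    print(row["title"], "|", row["authors"])
```

The long format (one author per row) is usually what co-authorship and co-citation counts need as input.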
![Page 67: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/67.jpg)
Open Refine – Cleaning Data
Cleaning Data
• Duplicate detection
• Common data transformations
  – Trimming whitespace
  – Normalizing text case
• Cluster/edit for matching and normalization
Additional Benefits
• Perform mass edits efficiently
• Revision history allows for roll-back to earlier state
• Transformations recorded as JSON
  – Portable for future data sets
• Browser-based
openrefine.org
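The cleaning steps listed above (trimming whitespace, normalizing case, detecting duplicates) can be approximated in plain Python. This is a minimal sketch under stated assumptions: `normalize` and `find_duplicates` are hypothetical helper names, the titles are made-up sample data, and OpenRefine's own transformations offer more options than shown here.

```python
def normalize(value):
    """Trim, collapse internal whitespace, and lowercase a cell value."""
    return " ".join(value.split()).lower()


def find_duplicates(values):
    """Group original values by normalized form; keep groups of two or more."""
    groups = {}
    for value in values:
        groups.setdefault(normalize(value), []).append(value)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Hypothetical sample data with stray whitespace and inconsistent case.
titles = ["  Maps of Science ", "maps of  science", "Citation Analysis"]
dupes = find_duplicates(titles)
for key, members in dupes.items():
    print(key, "<-", members)
```

Grouping by a normalized key keeps the original values available, so a correction can be reviewed before it is applied to every member of the group.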
![Page 68: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/68.jpg)
Open Refine – Matching Names
Matching names
• Create filters to navigate larger datasets
• Create facets to see all unique values/occurrences
• Auto-detect variant entries
• Cluster/edit for matching and normalization
• Reconciliation services against external data for normalization/aggregation
openrefine.org
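One way to auto-detect variant name entries is key collision: reduce each name to a normalized key and group the names that share a key. The sketch below is loosely modeled on the "fingerprint" keying idea used for clustering in OpenRefine, but it is a simplified assumption rather than OpenRefine's actual code; the function names and the sample names are illustrative only.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(name):
    """Build an order-insensitive, punctuation-free key for a name."""
    # Fold accented characters to their closest ASCII form.
    folded = unicodedata.normalize("NFKD", name)
    ascii_name = folded.encode("ascii", "ignore").decode("ascii")
    # Lowercase and drop punctuation.
    table = str.maketrans("", "", string.punctuation)
    cleaned = ascii_name.lower().translate(table)
    # Sort the unique tokens so that word order does not matter.
    return " ".join(sorted(set(cleaned.split())))


def cluster(names):
    """Return groups of distinct spellings that share a fingerprint."""
    buckets = defaultdict(list)
    for name in names:
        buckets[fingerprint(name)].append(name)
    return [group for group in buckets.values() if len(set(group)) > 1]


# Sample variants (spellings invented for illustration).
names = ["Bollen, Johan", "Johan Bollen", "johan bollen.", "Van de Sompel, Herbert"]
clusters = cluster(names)
print(clusters)
```

Key collision is fast but only catches variants that differ in case, punctuation, accents, or word order. Misspellings and initials need a distance-based method or an author identifier.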
![Page 69: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/69.jpg)
Name Disambiguation
Methods
• Dictionary-based entity matching
• Phonetic matching
• Rules-based linkage
• Probability-based linking
  – Edit distance
  – Fellegi-Sunter algorithm
  – Machine learning
Tools
• Febrl: sourceforge.net/projects/febrl/
• RecordLinkage (for R): cran.r-project.org/web/packages/RecordLinkage/
• Link-King (for SAS): the-link-king.com
Source: en.wikipedia.org/wiki/Record_linkage
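Of the methods listed, edit distance is the easiest to show in code. The following is a minimal sketch of the standard Levenshtein distance (insertions, deletions, and substitutions each cost 1), plus a hypothetical `similarity` helper that rescales it to the range 0 to 1. The tools named above use more elaborate comparison functions, so treat this as an illustration only.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Rescale edit distance to 0..1, where 1 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


print(edit_distance("Fellegi", "Felligi"))
print(round(similarity("Fellegi", "Felligi"), 3))
```

In a linkage workflow, a score like this would typically be one input among several (name, affiliation, co-authors) rather than the sole matching rule.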
![Page 70: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/70.jpg)
Matching Names – Author Identifiers
What are Author Identifiers?
• Author identifiers give you a way to reliably and unambiguously connect your name(s) with your work throughout your career, including your papers, data, biographical information, etc. This can be helpful in a number of ways:
• Provides a means to distinguish between you and other authors with identical or similar names.
• Links together all of your works even if you have used different names over the course of your career.
• Makes it easy for others (grant funders, other researchers etc.) to find your research output.
• Ensures that your work is clearly attributed to you.
Getting started with ORCID...
• ORCID (Open Researcher and Contributor ID) is a non-proprietary, non-profit, community-based registry of researcher identifiers.
• Links authors to their datasets and other works in addition to articles.
• Authors can control what information in their ORCID profile they share. Only the ORCID ID is automatically shared. (See their privacy policy.)
• It is easy to import research output from other sources (including ResearcherID, Scopus Author ID, and the DataCite Metadata Store) to your ORCID profile. (See ORCID's import works page.)
• Many organizations and publishers have created integrations with ORCID including Nature Publishing Group, Elsevier, and the American Physical Society.
• Free, private, 30-second registration: orcid.org/register
libguides.mit.edu/content.php?pid=573578&sid=4729602
![Page 71: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/71.jpg)
Application
(Combining External and Internal Sources) (Co-authorship Analysis) (Visualization)
![Page 72: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/72.jpg)
Citation analysis – export citations
Question: For a given paper’s citing articles, what other articles were frequently cited?
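Once the citing articles and their reference lists have been exported, the question reduces to a counting problem. The sketch below assumes the export has already been parsed into a dictionary that maps each citing article to its list of cited references; that structure, the identifiers, and the function name `frequently_co_cited` are hypothetical and are not taken from any particular database export format.

```python
from collections import Counter


def frequently_co_cited(citing_articles, target, top_n=5):
    """Count references that appear alongside `target` in citing articles."""
    counts = Counter()
    for references in citing_articles.values():
        unique_refs = set(references)  # count each reference once per article
        if target in unique_refs:
            counts.update(unique_refs - {target})
    return counts.most_common(top_n)


# Made-up example: three articles cite the target paper, one does not.
citing_articles = {
    "citing-1": ["target", "ref-A", "ref-B"],
    "citing-2": ["target", "ref-A"],
    "citing-3": ["target", "ref-A", "ref-C"],
    "other": ["ref-B", "ref-C"],
}
top = frequently_co_cited(citing_articles, "target")
print(top)
```

In practice the hard part is upstream of this function: cited references need to be split out of a multi-valued field and normalized so that variant spellings of the same reference are counted together.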
![Page 73: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/73.jpg)
Citation analysis – Open Refine
![Page 74: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/74.jpg)
Citation analysis – Open Refine
![Page 75: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/75.jpg)
Citation analysis – Open Refine
![Page 76: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/76.jpg)
Resources
(Readings) (Software) (Data) (Glossary)
![Page 77: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/77.jpg)
Recommended Reading
• Data Processing - General
  – Getting Started: programminghistorian.org/lessons/cleaning-data-with-openrefine
  – References: Verborgh, Ruben, and Max De Wilde. Using OpenRefine. Packt Publishing Ltd, 2013.
  – Tutorials: github.com/OpenRefine/OpenRefine/wiki/External-Resources
• Data Processing – Dealing with Names
  – Getting Started (author identifiers guide): libguides.mit.edu/content.php?pid=573578&sid=4729602
  – References: Winkler 2012; Name Matching and Record Linkages, U.S. Census: http://www.census.gov/srd/papers/pdf/rr93-8.pdf
![Page 78: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/78.jpg)
Recommended Reading (Continued)
• Bibliometric Analysis
  – Tutorials:
    Anne-Wil Harzing, 2011, The Publish or Perish Book, part 3: Doing bibliometric research with Google Scholar, Tarma software press
    Wouter De Nooy, et al., 2011, Exploratory Social Network Analysis with Pajek, 2nd Edition, Cambridge University Press
    Author identifiers guide: libguides.mit.edu/content.php?pid=573578&sid=4729602
    Article-level metrics: sparc.arl.org/sites/default/files/sparc-alm-primer.pdf
  – References:
    Eric D. Kolaczyk, 2009, Statistical Analysis of Network Data: Methods and Models, Springer.
![Page 79: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/79.jpg)
Available Databases & APIs
• Scholarly APIs: libguides.mit.edu/apis
• Google Scholar: scholar.google.com
• Scopus: scopus.com
• Web of Science: admin-apps.webofknowledge.com
• Author identifiers: libguides.mit.edu/content.php?pid=573578&sid=4729602
• List of MIT-licensed Databases: owens.mit.edu/sfx_local/az/mit_db
• Altmetrics
  – PLOS article metrics: article-level-metrics.plos.org
  – Plum Analytics: plumanalytics.com
  – ImpactStory: impactstory.org
![Page 80: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/80.jpg)
Additional Selected Tools
• OpenRefine: openrefine.org
• Publish or Perish: www.harzing.com/pop.htm
• Scholarometer: scholarometer.indiana.edu
• CitNet: citnetexplorer.nl
• CiteSpace: cluster.cis.drexel.edu/~cchen/citespace
• Gephi: gephi.org
• Sci2: sci2.cns.iu.edu
• Pajek: pajek.imfm.si
• Scimat: sci2s.ugr.es/scimat/
• R Packages:
  – tm: cran.r-project.org/web/packages/tm/
  – StatNet: statnet.org
  – CITAN: cran.r-project.org/web/packages/CITAN
  – Rplos: cran.r-project.org/web/packages/rplos/
  – Rmendeley: ropensci.org/packages/rmendeley.html
  – RISmed: cran.r-project.org/web/packages/RISmed
  – OAIHarvester: cran.r-project.org/web/packages/OAIHarvester/
• Python Packages:
  – scipy: scipy.org
  – Nltk: nltk.org
  – networkx: networkx.github.io
  – bibtools: www.sebastian-grauwin.com/?page_id=492
  – pyOAI: pypi.python.org/pypi/pyoai/
![Page 81: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/81.jpg)
Glossary of Metrics
• Author H-Index: The largest number h such that the author has h articles that have each received at least h citations.
• Centrality: A measure of the importance of a node in the network, based on a selected abstract model of influence or flow across the network. Centrality measures include degree centrality (number of connections); closeness centrality (distance from the node to the other nodes in the network); and betweenness centrality (share of shortest paths between other nodes that pass through the node).
• (ISI Journal) Impact Factor: The frequency with which the “average article” in a journal has been cited in a particular year. It is computed as the citations received that year by articles the journal published in the previous two years, divided by the number of those articles. It is only supplied for journals indexed by ISI in the Web of Science.
• Clustering: Methods that partition n observations into k clusters based on the characteristics of the observations. Clusters are defined either by a set of heuristics for forming the cluster, or according to a solution concept that the clusters will satisfy. One common algorithm, k-means, assigns each observation to one of a fixed number k of clusters such that each observation belongs to the cluster whose mean is closest to it.
• Network community structure measures: The detection of highly interconnected groups of nodes within a network. Methods include hierarchical clustering; information maximization; modularity; clique detection.
• Network Diameter: The greatest shortest-path distance between any two nodes in the network.
• PageRank: A family of iteratively calculated, recursive impact factors in which citations from other journals are weighted by the impact of those journals.
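Three of the glossary entries are simple enough to compute by hand. The sketch below uses only the Python standard library and made-up data; the function names are illustrative, the co-authorship graph is hypothetical, and degree centrality is normalized by n - 1 here, which is one common convention rather than the only one. The diameter function assumes a connected, undirected graph with at least two nodes.

```python
from collections import deque


def h_index(citations):
    """Largest h such that h articles have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def degree_centrality(graph):
    """Fraction of the other nodes that each node is connected to."""
    n = len(graph)
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def shortest_path_lengths(graph, source):
    """Breadth-first search distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def diameter(graph):
    """Greatest shortest-path distance between any two nodes."""
    return max(max(shortest_path_lengths(graph, s).values()) for s in graph)


# Hypothetical co-authorship network: node -> set of co-authors.
coauthors = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(h_index([10, 8, 5, 4, 3]))
print(degree_centrality(coauthors))
print(diameter(coauthors))
```

For real networks, packages such as NetworkX or StatNet provide these measures along with closeness, betweenness, and community detection, so hand-written versions like this are mainly useful for checking intuition on small examples.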
![Page 82: Overview of Bibliometrics - IAP Course version 1.1](https://reader033.vdocuments.us/reader033/viewer/2022061114/54628666b4af9f671c8b4844/html5/thumbnails/82.jpg)
Questions?
E-mail: [email protected]
informatics.mit.edu