analogist/ezpaarse: analysing locally gathered logfiles to determine users’ accesses to subscribed...
DESCRIPTION
AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources (Thomas Jouneau, Université de Lorraine, France). This presentation was one of the 10 most highly ranked at LIBER's Annual Conference 2014 in Riga, Latvia. Learn more: www.libereurope.eu
TRANSCRIPT
LIBER 2014 - RIGA - 3/07/2014
ANALOGIST/EZPAARSE: ANALYSING LOCALLY GATHERED LOGFILES TO DETERMINE USERS’ ACCESSES TO SUBSCRIBED E-RESOURCES
http://ezpaarse.couperin.org
http://analogist.couperin.org
Presentation Outline
1. The Context: A Need for Evaluation
2. Gathering Local Data
3. Parsers and Analyses
4. AnalogIST and ezPAARSE
5. Results and Visualization
6. Project Organization
1 The Context :
A Need for Evaluation
1. The Context : A need for evaluation
The Scientific and Technical Information Market
Some well-known facts
5,000 to 10,000 publishers / 23,000 e-journals
$25 billion global revenue in 2012, increasing by 4-5% per year
The 4 biggest publishers account for half the market
For 10 years, the price of most journals has increased by 3% to 5% per year
5,500,000 researchers, increasing by 3.5% per year
1.5 billion articles downloaded per year by 10 million users
We need to assess and evaluate the use of these e-resources
1. The Context : A need for evaluation
What we’ve currently got
Publisher-provided statistics...
… are not available
… are available and COUNTER-compliant
… are available but not COUNTER-compliant
1st limitation: vendors are the only source → We need to assess these numbers
2nd limitation: only a partial view, no comparison possible → We need to complete the figures
3rd limitation: these numbers offer mere quantification → We need to qualify them
A possible solution: → locally gathered usage quantification
2 Gathering Usage
Data Locally
2. Gathering usage data locally
The reverse proxy
2. Gathering usage data locally with a reverse proxy
Where ezPAARSE comes into play
3 Parsers and Analyses
3. Parsers and analyses
Example of a URL structure
http://pdn.sciencedirect.com/science?_ob=MiamiImageURL&_cid=271664&_user=4046427&_pii=S0001457512000747&_check=y&_origin=browse&_zone=rslt_list_item&_coverDate=2012-07-31&wchp=dGLbVlt-zSkWb&md5=f5d8d157ccda6d597cb466af123dbff3/1-s2.0-S0001457512000747-main.pdf
3. Parsers and analyses
Example of a URL structure
ISSN & type of the downloaded file
http://pdn.sciencedirect.com/science?_ob=MiamiImageURL&_cid=271664&_user=4046427&_pii=S0001457512000747&_check=y&_origin=browse&_zone=rslt_list_item&_coverDate=2012-07-31&wchp=dGLbVlt-zSkWb&md5=f5d8d157ccda6d597cb466af123dbff3/1-s2.0-S0001457512000747-main.pdf
http://www.sciencedirect.com/science/journal/00014575
ISSN: trying the URL manually, we find an HTML table of contents
3. Parsers and analyses
Example of a URL structure
http://www.cairn.info/load_pdf.php?ID_ARTICLE=RFG_218_0009
We know it’s a PDF, but we only get a publisher-specific identifier: we need a correspondence table, the Publisher Knowledge Base (ideally a KBART file)
3. Parsers and analyses
Example of a URL structure
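The Cairn case above can be sketched as a lookup against a small knowledge base. This is only an illustration, not ezPAARSE's actual parser: the tab-separated layout borrows KBART column names (`title_id`, `publication_title`, `print_identifier`), the row values are placeholders rather than real data, and the idea that the part of `ID_ARTICLE` before the first underscore identifies the journal is an assumption made for the example.

```python
import csv
import io
from urllib.parse import urlsplit, parse_qs

# Placeholder knowledge base in a KBART-like, tab-separated layout.
# The title and ISSN values are dummies, not real data.
KB_TEXT = (
    "title_id\tpublication_title\tprint_identifier\n"
    "RFG\tPlaceholder journal title\t0000-0000\n"
)


def load_knowledge_base(text):
    """Index knowledge-base rows by their publisher-specific title_id."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["title_id"]: row for row in reader}


def resolve_cairn_url(url, kb):
    """Return the article id found in the URL plus what the KB knows about it."""
    params = parse_qs(urlsplit(url).query)
    article_id = params.get("ID_ARTICLE", [""])[0]
    # Assumption: the part before the first underscore names the journal.
    title_id = article_id.split("_", 1)[0]
    entry = kb.get(title_id, {})
    return {
        "unitid": article_id,
        "title_id": title_id,
        "publication_title": entry.get("publication_title"),
        "print_identifier": entry.get("print_identifier"),
    }


kb = load_knowledge_base(KB_TEXT)
event = resolve_cairn_url(
    "http://www.cairn.info/load_pdf.php?ID_ARTICLE=RFG_218_0009", kb
)
print(event)
```

An identifier that is missing from the knowledge base simply yields `None` for the bibliographic fields, which is one reason the knowledge base has to be kept up to date.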
http://pdn.sciencedirect.com/science?_ob=MiamiImageURL&_cid=271664&_user=4046427&_pii=S0001457512000747&_check=y&_origin=browse&_zone=rslt_list_item&_coverDate=2012-07-31&wchp=dGLbVlt-zSkWb&md5=f5d8d157ccda6d597cb466af123dbff3/1-s2.0-S0001457512000747-main.pdf
/_pii=S([0-9]{0,7}[0-9X])/i
3. Parsers and analyses
Parse the URL
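As a rough illustration of the parsing step, the sketch below applies a regular expression close to the one on the slide to the ScienceDirect URL. It is not the ezPAARSE parser itself. The pattern is a slightly stricter variant (exactly seven digits followed by a digit or X), and the output field names (`issn`, `rtype`, `mime`) and the rule "a URL ending in .pdf is a PDF article" are assumptions made for the example.

```python
import re

# Stricter variant of the slide's pattern: 7 digits plus a check character.
PII_PATTERN = re.compile(r"_pii=S([0-9]{7}[0-9X])", re.IGNORECASE)


def parse_sciencedirect_url(url):
    """Extract an ISSN and a resource type from a ScienceDirect-style URL."""
    event = {}
    match = PII_PATTERN.search(url)
    if match:
        raw = match.group(1).upper()
        # Present the 8 characters in the usual NNNN-NNNN form.
        event["issn"] = raw[:4] + "-" + raw[4:]
    # Assumption: a URL ending in ".pdf" is a full-text article in PDF.
    if url.lower().endswith(".pdf"):
        event["rtype"] = "ARTICLE"
        event["mime"] = "PDF"
    return event


URL = (
    "http://pdn.sciencedirect.com/science?_ob=MiamiImageURL&_cid=271664"
    "&_user=4046427&_pii=S0001457512000747&_check=y&_origin=browse"
    "&_zone=rslt_list_item&_coverDate=2012-07-31&wchp=dGLbVlt-zSkWb"
    "&md5=f5d8d157ccda6d597cb466af123dbff3/1-s2.0-S0001457512000747-main.pdf"
)

print(parse_sciencedirect_url(URL))
# {'issn': '0001-4575', 'rtype': 'ARTICLE', 'mime': 'PDF'}
```

The extracted ISSN matches the `00014575` segment of the journal table-of-contents URL shown earlier.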
3. Parsers and analyses
What do we count?
Serials: Articles (ARTICLE); Abstract (ABS); Table of contents (TOC); Reference (REF); Article preview (e.g. the “Look inside” function of SpringerLink) (PREVIEW); Article in basket/personal folder (BOOKMARK)
E-books: Book by title (BOOK); Chapter, section (BOOK_SECTION); Book series (BOOKSERIE); Manuals, handbooks (HANDBOOK)
Law databases: Law encyclopedia (ENCYCLOPEDIES); Law memento (FORMULES); Law manual (BROCHES); Law codes (CODES)
Inst. repositories: PHD_THESIS; MD_THESIS; MASTER_THESIS
- The availability of these items depends on the elements present in the URL
- The law databases currently covered are only French ones
3. Parsers and analyses
Platforms covered
Each platform has its own URL structure...
...we need one parser for each
Opaque URLs: session ids, encryption… Example: the former Springer platform
http://www.springerlink.com/content/j5q872410p510m63/fulltext.pdf
Publisher IDs that need to be linked to a knowledge base or a reference file. Example: Cairn
http://www.cairn.info/load_pdf.php?ID_ARTICLE=RFG_218_0009
- Opaque URLs (session ids, encryption…)
- Knowledge bases that have to be edited manually
3. Parsers and analyses
Some limitations apply
4 AnalogIST
and ezPAARSE
● AnalogIST: the wiki portal
Analyse des Logs de l'IST = Analysing the Logs of Scientific and Technical Information
→ The place where we gather the platform analyses and synchronise the new parsers with the local installations
http://analogist.couperin.org
4. AnalogIST and ezPAARSE
● ezPAARSE: the software
ez: easy / PAARSE: Progiciel d'Analyse des Accès aux RessourceS Electroniques = Software for Analysing the Accesses to Online Resources
● as a local installation
● as an online service (SaaS)
Free (libre) software
Multi-platform
http://ezpaarse.couperin.org
4. AnalogIST and ezPAARSE
Local installations: Univ 1, Univ 2, ...
Global installation + collaborative space: AnalogIST
4. AnalogIST and ezPAARSE
Through a web form
With the command line (cURL)
Use the web form to build the command line that suits your needs.
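Submitting logs with cURL implies an HTTP request to an ezPAARSE instance. The sketch below shows the same idea with Python's standard library. The instance address, the use of a plain POST with the log lines in the body, and the `Accept: text/csv` header are all assumptions to check against your own installation and the command generated by the web form.

```python
import urllib.request

# Assumption: address of a local ezPAARSE instance; adjust to your installation.
EZPAARSE_URL = "http://127.0.0.1:59599/"


def build_request(log_bytes, url=EZPAARSE_URL):
    """Prepare (without sending) a POST request carrying raw log lines."""
    return urllib.request.Request(
        url,
        data=log_bytes,
        method="POST",
        headers={"Accept": "text/csv"},  # assumption: ask for CSV output
    )


def submit_logs(log_path, url=EZPAARSE_URL):
    """Send a log file and return the response body as text."""
    with open(log_path, "rb") as handle:
        request = build_request(handle.read(), url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Building the request does not need a running server.
request = build_request(b"sample log line\n")
print(request.get_method(), request.full_url)
```

Keeping `build_request` separate from `submit_logs` makes it possible to inspect what would be sent before contacting a server.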
5. ezPAARSE : Using the Results
Example of an ezPAARSE output
KBART fields, geoip fields
Deduplicate consultation events: COUNTER recommendation
Text file (CSV format)
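The deduplication of consultation events mentioned above can be sketched as a "double-click" filter. This is not ezPAARSE's own implementation. The field names (`login`, `unitid`, `mime`, `datetime`) are hypothetical, and the 10-second (HTML) and 30-second (PDF) windows are the values commonly cited from the COUNTER Code of Practice release 4; treat them as assumptions and check the release you report against. For simplicity the sketch keeps the first event of each burst.

```python
from datetime import datetime, timedelta

# Assumed double-click windows per format (see the hedging in the text above).
WINDOWS = {"HTML": timedelta(seconds=10), "PDF": timedelta(seconds=30)}
DEFAULT_WINDOW = timedelta(seconds=10)


def deduplicate(events):
    """Drop repeated requests for the same item by the same user in a short window."""
    last_seen = {}
    kept = []
    for event in sorted(events, key=lambda e: e["datetime"]):
        key = (event["login"], event["unitid"], event["mime"])
        window = WINDOWS.get(event["mime"], DEFAULT_WINDOW)
        previous = last_seen.get(key)
        if previous is None or event["datetime"] - previous > window:
            kept.append(event)
        # Remember the latest request so a chain of rapid clicks collapses to one.
        last_seen[key] = event["datetime"]
    return kept


start = datetime(2014, 7, 3, 10, 0, 0)
events = [
    {"login": "u1", "unitid": "A", "mime": "PDF", "datetime": start},
    {"login": "u1", "unitid": "A", "mime": "PDF", "datetime": start + timedelta(seconds=20)},
    {"login": "u1", "unitid": "A", "mime": "PDF", "datetime": start + timedelta(seconds=100)},
    {"login": "u2", "unitid": "A", "mime": "PDF", "datetime": start + timedelta(seconds=5)},
]
kept = deduplicate(events)
print(len(kept))  # 3: the second request by u1 falls inside the 30-second window
```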
5 ezPAARSE :
Using the Results
5. ezPAARSE : using the results
(Libre/MS) Office rendering macros
5. ezPAARSE : using the results
Exploiting the Results with
5. ezPAARSE : using the results
Who (student, researcher, staff) consults what? (UL)
Distribution of consultations of paid content (books, journals, law references…) by user type at the Université de Lorraine
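A breakdown like this one can be computed from the CSV output with a simple count per category. The sketch below assumes a semicolon-delimited file and a hypothetical `user_type` column added by local enrichment; the sample rows are invented for illustration only and are not results from the Université de Lorraine.

```python
import csv
import io
from collections import Counter

# Invented sample rows; the column names are hypothetical.
SAMPLE_CSV = (
    "datetime;login;user_type;rtype\n"
    "2014-03-01T09:00:00;u1;student;ARTICLE\n"
    "2014-03-01T09:05:00;u2;researcher;ARTICLE\n"
    "2014-03-01T09:10:00;u3;student;BOOK\n"
    "2014-03-01T09:15:00;u4;staff;ARTICLE\n"
)


def count_by(csv_text, column, delimiter=";"):
    """Count consultation events per distinct value of one column."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return Counter(row[column] for row in reader)


by_user_type = count_by(SAMPLE_CSV, "user_type")
print(by_user_type.most_common())
```

The same function works for any other column, for example a research unit or teaching unit field, if the output has been enriched with one.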
5. ezPAARSE : using the results
Consultations by research unit (UL)
Consultations of articles from Jan 2014 to May 2014 by research units at the Université de Lorraine
5. ezPAARSE : using the results
Consultations by teaching unit (UL)
Consultations of articles by teaching unit or faculty at the Université de Lorraine
5. ezPAARSE : using the results
Geolocalisation of consultations (CNRS)
5. ezPAARSE : using the results
Detection of an anomaly (CNRS)
The consultation peak corresponds to an abuse of an e-resource. Detecting it makes it possible to react promptly to the incident.
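One simple way to spot such a peak is to compare each day's count with a typical day. The sketch below flags days whose count exceeds a multiple of the median. The threshold factor and the daily figures are invented for illustration; this is not the method used by CNRS or built into ezPAARSE.

```python
from statistics import median


def find_peaks(daily_counts, factor=5.0):
    """Return the days whose count exceeds `factor` times the median daily count."""
    baseline = median(daily_counts.values())
    if baseline <= 0:
        return []
    return [
        day
        for day, count in sorted(daily_counts.items())
        if count > factor * baseline
    ]


# Invented daily consultation counts for one e-resource.
daily_counts = {
    "2014-03-01": 120,
    "2014-03-02": 130,
    "2014-03-03": 110,
    "2014-03-04": 2400,
    "2014-03-05": 125,
}
peaks = find_peaks(daily_counts)
print(peaks)  # ['2014-03-04']
```

The median is used as the baseline because, unlike the mean, it is barely affected by the peak itself.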
6 Project Organization
6. Project organization : the method
SCRUM : An agile development method
PRODUCT VISION
6. Project organization : the team
In conclusion
● ezPAARSE is free and open source
● Simple to use and test
● State-of-the-art technologies
● Feel free to test it
● Send us log samples
● Give us feedback!
Any Questions?
http://ezpaarse.couperin.org
http://analogist.couperin.org
https://twitter.com/ezpaarse
http://analogist.couperin.org/platforms/analyse-helper/start
The rest is automatically processed
dokuwiki syntax generated
More features : exploiting the results with geolocalization