code4 lib 20141129 python

The LITA Forum & library data in Python

Upload: tdsmithcapu

Post on 27-Jul-2015



Library and Information Technology Association (LITA)

Nov 5-8, LITA Forum, Albuquerque

Learn Python by Playing with Library Data

By Francis Kayiwa & Eric Phetteplace

GitHub

Bitbucket

Main class

https://bitbucket.org/fkayiwa/litaconf/overview

PyMARC scripts

By Eric Phetteplace

https://github.com/phette23/pymarc-ebooks-scripts

• count-tag.py: find out how many records have a particular tag
• dual856.py: find all your records with multiple 856 (electronic location) tags
• ebooks-to-csv.py: save all your ebook (defined as anything with an 856 $u) titles to a CSV file
• gmd-counter.py: count the number of occurrences of different General Material Designations (245 $h) in a collection of records. Example JSON output included.
• pymarc-notes.md: some very minimal notes on using pymarc, mostly links to documentation
• python-on-windows.md: notes on getting set up on a Windows machine
• proxy-ebooks.py: the main script I wrote; the others were basically tests leading up to this. We were implementing a proxy server and this cleaned up our 856 fields while proxying appropriate vendor URLs.
• search-gmd.py: find titles of records with a certain GMD
• subfield-counter.py: count subfields used in all records? I actually don't know, this is horrible code, Eric.
• web-links.py: output stats on 856 fields in records
• webfeet.py: find records with "[selected by Web Feet]" in the title, since at some point we imported one of these misguided attempts to catalog "the good parts" of the Internet
• write856s.py: write records with multiple 856 fields out to a separate MARC file
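To give a feel for what scripts like count-tag.py and dual856.py do, here is a minimal sketch. It is not the code from the repository. The helper names (count_tag, records_with_multiple, read_marc) are made up for illustration, and the only pymarc calls it relies on are MARCReader and Record.get_fields.

```python
def count_tag(records, tag):
    """Return how many records contain at least one field with this tag."""
    return sum(
        1 for record in records
        if record is not None and record.get_fields(tag)
    )


def records_with_multiple(records, tag):
    """Yield the records that carry more than one field with this tag."""
    for record in records:
        # Skip None entries, which a reader may yield for unreadable records.
        if record is not None and len(record.get_fields(tag)) > 1:
            yield record


def read_marc(path):
    """Yield records from a binary MARC file (needs `pip install pymarc`)."""
    # Imported here so the two helpers above work without pymarc installed.
    from pymarc import MARCReader

    with open(path, "rb") as handle:
        for record in MARCReader(handle):
            yield record
```

Assuming a file named ebooks.mrc (a placeholder), `count_tag(read_marc("ebooks.mrc"), "856")` would report how many records have an electronic location field, and `records_with_multiple(read_marc("ebooks.mrc"), "856")` would give the records with more than one.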

MARCkbart

https://github.com/lpmagnuson

EZproxy Analysis

https://github.com/robincamille/ezproxy-analysis

Analyzes EZproxy-generated log files and spits out a CSV with this info:

• Filename of log being analyzed
• # total connections
• # on-campus connections (as determined by IP addresses starting with "10." -- may be different for your campus)
• % on-campus connections of total
• # off-campus connections
• % off-campus connections of total
• # library connections (as determined by IP addresses starting with "10.11" and "10.12" -- will almost certainly be different for your campus)
• % library of on-campus connections
• % library of total connections
• # student sessions off-campus
• % student sessions of total off-campus
• # fac/staff sessions off-campus
• % fac/staff sessions of total off-campus
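The connection counts in that list can be sketched with nothing but the standard library. This is an illustration, not the ezproxy-analysis code. It assumes the client IP is the first whitespace-separated token on each log line, which depends on how your EZproxy log format is configured. The function name and the dictionary keys are invented here, and the prefixes get a trailing dot so that "10.11." does not also match "10.110.x.x".

```python
def summarize_connections(lines, campus_prefix="10.",
                          library_prefixes=("10.11.", "10.12.")):
    """Count total, on-campus, off-campus and library connections."""
    total = on_campus = library = 0
    for line in lines:
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines
        ip = parts[0]  # assumption: the IP is the first token on the line
        total += 1
        if ip.startswith(campus_prefix):
            on_campus += 1
            # str.startswith accepts a tuple of prefixes
            if ip.startswith(library_prefixes):
                library += 1
    off_campus = total - on_campus

    def pct(part, whole):
        """Percentage rounded to one decimal place, 0.0 if whole is zero."""
        return round(100.0 * part / whole, 1) if whole else 0.0

    return {
        "total": total,
        "on_campus": on_campus,
        "pct_on_campus": pct(on_campus, total),
        "off_campus": off_campus,
        "pct_off_campus": pct(off_campus, total),
        "library": library,
        "pct_library_of_on_campus": pct(library, on_campus),
        "pct_library_of_total": pct(library, total),
    }
```

Passing an open log file works, since a file object iterates over its lines. The student and fac/staff session counts are left out because they depend on how usernames or groups are recorded in your logs.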

Beautiful Soup

Real world


TIPS: Don’t use Python 3

Albuquerque is lovely and small