anne l. highsmith director, consortia systems texas a&m university do it yourself primo...

Post on 08-Jan-2018

217 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

DESCRIPTION

Our Primo Environment  Texas A&M University is a hosted, Direct customer, in production since June  As a hosted customer, we have a staging system as well as production. All program development for these extracts has been done on the production system.  We are currently on release 4.4.1

TRANSCRIPT

ANNE L. HIGHSMITHDIRECTOR, CONSORTIA SYSTEMSTEXAS A&M UNIVERSITYHISMITH@TAMU.EDUHTTP://LIBRARY.TAMU.EDU/DIRECTORY/HISMITH

Do It Yourself Primo StatisticsThe Art of the (Relatively) Painless Extraction

Our Environment

Our Primo Environment Texas A&M University is a hosted, Direct

customer, in production since June 2012. As a hosted customer, we have a staging

system as well as production. All program development for these extracts has been done on the production system.

We are currently on release 4.4.1

Our Reporting Environment Report server with an Oracle database Oracle is separately licensed, so we can

do development on it Contains SFX/MetaLib extracts and

statistics and a full copy of the Voyager database, rebuilt nightly from backup

Viewing the Views

How to see what’s available Log in as primo user Execute: s+ RPT00 Execute: SELECT VIEW_NAME FROM

ALL_VIEWS WHERE OWNER LIKE ‘%RPT00’ CLICK_EVENTS SEARCH_STATISTICS SEARCH_STRINGS

To see view definition, execute: SELECT TEXT FROM ALL_VIEWS WHERE VIEW_NAME = ‘CLICK_EVENTS’

SELECT ID,SUMMARY_TIMESTAMP EVENT_DATE,CLICK_TYPE EVENT_TYPE,CASE WHEN CLICK_VALUE='N/A' THEN '' ELSE CLICK_VALUE END CLICK_VALUE,CLICK_COUNT,SOURCE_VIEW,SOURCE_INSTITUTION,SOURCE_ON_CAMPUS,SOURCE_USER_GROUPfrom P41_PRM00.S_CLICK_SUMMARIESWHERE CLICK_TYPE NOT IN ('File System', 'DB Listener', 'Load', 'Indexes', 'Table Space', 'Search Problem', 'IO Wait', 'Memory')

View Definitions All stats views seem to be based on

S_SEARCH_SUMMARIES & S_CLICK_SUMMARIES tables Notice that CLICK_EVENTS excludes some

system-type stats SEARCH_STATISTICS is a subset of

S_SEARCH_SUMMARIES, where SUMMARY_TYPE='SEARCH_COUNT‘

SEARCH_STRINGS is a subset of S_SEARCH_SUMMARIES, where SUMMARY_TYPE = 'TOP_SEARCHES_SUMMARY'

Pop quiz #1 In 1745, settlers from the English

colonies, assisted by the British fleet, invaded and captured the capital of one of the provinces of New France. Which one?

(Will accept the name of the French province, the modern Canadian province of which it is a part, the fortress, or the place name.)

Data Anomalies

SQL vs. BIRT Reports Replicate BIRT report for Click EventsEvent type

BIRT SQL

Display details tab

5 49,629

DS 444,539 89,791GetIt!Link2

1 20,682

SQL Selection Criteria Issues Some tables contain “junk” Out of 10M rows in the

CLICK_EVENTS view, 36% had no institution name

Myriad variations in INSTITUTION_NAME

Basic Selection CriteriaSELECT event_type, click_value, click_count, institution, \"VIEW\" AS view_name, on_campus, user_groupFROM p41_rpt00.click_eventsWHERE to_char(event_date,'YYYYMM') = '$previous_month'AND institution is not nullAND lower(institution) not like 'primo%'

Scope Names Hoped that SCOPE_NAME would be

equivalent to the Search Scope Name as it appears on the Search Scope List in the Primo Back Office.

Current default SCOPE_NAME appears as: scope:("MSL"),scope:(libguides),scope:

(archon),scope:(AMDB_VOYAGER),scope:(TAMU-SFX ),scope:(EVANS),scope:(tamu_dspace_qdc),primo_central_multiple_fe

Collected all known SCOPE_NAME values in a Perl module, TAMU_Primo.pm

Scope Types SEARCH_STATISTICS and

SEARCH_STRINGS views contain an element called SCOPE_TYPE

SCOPE_TYPE in SEARCH_STRINGS should be limited to LOCAL/REMOTE

SCOPE_TYPE IN SEARCH_STATISTICS should be limited to LOCAL/REMOTE/DS

Scope Types (Continued) SEARCH_STATISTICS – 16% of

SCOPE_TYPE values are something other than LOCAL/REMOTE

SEARCH_STRINGS – 12% of SCOPE_TYPE values are something other than LOCAL/REMOTE/DS

If the retrieved value didn’t match the list of defined values, I set it to null.

Data I Can’t Make Sense of SEARCH_STRINGS has only 149,127 rows

in the view Are these unique strings? If yes, why does the same string appear

in different rows? What do the numbers, such as

AVERAGE_RESULTS and SEARCH_COUNT, really mean?

Example “Fluid mechanics” appears as a search

string in the default scope 5 times in the period 1/18/2014-3/5/2014.

AVERAGE_RESULTS by date 18-Jan-14 210677 31-Mar-14 150528 27-Feb-14 58544 5-Mar-14 58576 5-Mar-14 74119

Pop quiz #2 Which city in western Canada was the

birthplace/hometown of the following personalities: Deanna Durbin, actress Anna Pacquin, actress Doug Henning, magician and entertainer Sir William Stephenson, AKA Intrepid, spy Guy Gavriel Kay, novelist and poet Brett Hull, professional hockey player Marshall McLuhan, media guru Fred Turner, musician, Bachman-Turner Overdrive Monty Hall, host of Let’s Make a Deal

Perl Extract Programs

Generalities The extract and processing programs for

the TAMU report server are written in Perl; the front end is written in PHP

The Primo stats extract programs I have written live on the production Primo server; they sftp output to the report server

The perl programs use a local symlink from /exlibris/product/perl-5.8.9/bin/perl to /exlibris/primo/scripts/perl

Generalities (Continued) The Primo group consists of 5 Perl programs

and 1 module click_extract.pl, click_compile.pl, facets.pl,

search_statistics.pl, search_strings.pl, TAMU_Primo.pm

click_extract.pl extracts data from the CLICK_EVENTS view and stores it in output files, which are mined by click_compile.pl & facets.pl to create useful output.

search_statistics.pl & search_strings.pl extract data from their corresponding views to an output file

Generalities (Continued) Programs are designed to be run on a

monthly basis, to be put into a cronjob and cumulate the previous month’s data. But they can also be run from the command line with parameters that let you select other months earlier in the calendar.

The programs that create output files also have a step to sftp the output to a different server. But you have to do the sftp setup between servers yourself.

A Few Specifics Facets.pl creates 2 sets of output files –

one set which cumulates all facet requests and a second one that provides detail about certain facet types

If it’s a domain, language, library, resource type, or top-level facet, it cumulates the individual values under each of those types. So you would know how many times the facet for English language was applied or the facet for Thesis resource type.

Normalization Contained in TAMU_Primo.pm

Defines variations in the institution value, code versus spelled out name, and normalizes them all to the codes

Defines a list of valid view names Normalizes the user groups.

Defines a long list of valid scope_names Search_statistics.pl collects undefined

scope_names and emails the list to a designated email account so that the list can be updated

Pop quiz #3 In 1993, the National Hockey League

changed its conference and division names to boring stuff like “Eastern Conference” and “Pacific Division”. Before that, the historic conference and division names were based on people who had something to do with hockey and the history of the NHL. Give one of those old conference or division names.

top related