HATHI TRUST RESEARCH CENTER Building Collections and Analyzing Data Stacy Kowalczyk


Page 1

HATHI TRUST RESEARCH CENTER

Building Collections and Analyzing Data

Stacy Kowalczyk

Page 2

Goals

• To show the current capability of the HTRC
• To get feedback about additional functionality
• To get feedback on the usefulness of the applications
• To find out more about researchers' needs for subcollections

Page 3

Agenda

• Introduction
• Application demos
• Hands-on collection building
• Hands-on data analysis
• Discussion

Page 4

HTRC Applications

Page 5

HTRC Collection Builder

• Provides a familiar search interface to the entire HTRC collection
• Allows for fielded and free text searching
• Interface for personalized subcollections of the data for further processing
  – Creating new subcollections
  – Updating existing subcollections
• Based on the Blacklight open source search interface
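
The difference between fielded and free text searching can be sketched as a small helper that assembles a Solr query string. This is only an illustration under stated assumptions: the field names `author` and `title` are hypothetical (the actual HTRC index schema is not given in the slides), and the helper `build_solr_query` is not part of the Collection Builder. It relies only on standard Solr query syntax (`field:"phrase"`) and the common `q`, `rows`, and `wt` request parameters.

```python
from urllib.parse import urlencode


def build_solr_query(free_text=None, fields=None, rows=20):
    """Build the query-string part of a Solr select request.

    free_text: terms matched against the default search field.
    fields:    mapping of field name -> phrase (fielded search).
    """
    clauses = []
    if free_text:
        clauses.append(free_text)
    for name, value in (fields or {}).items():
        # Escape backslashes and quotes, then quote the phrase so that
        # multi-word values are matched as a unit.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'{name}:"{escaped}"')
    # With no clauses at all, fall back to Solr's match-everything query.
    q = " AND ".join(clauses) if clauses else "*:*"
    return urlencode({"q": q, "rows": rows, "wt": "json"})


# Hypothetical field names; the real index may use different ones.
query_string = build_solr_query("whaling", {"author": "Melville"})
print(query_string)
```

The free text term and the fielded clause are joined with `AND`, so a result must satisfy both. A real search interface would normally let the user choose how clauses are combined.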

Page 6

HTRC Interactive Web Interface

• Provides simple interface to the HTRC infrastructure
  – Manage Collections
  – Submit Algorithms
  – Use Computational resources
  – Manage Jobs
  – View Results

Page 7

Infrastructure Components

• Authentication
• Agent
• Registry
• APIs
  – Data access
  – Solr access
• Secured computational resources

Page 8

Collection Builder Demo

Page 9

Collection Builder Process Flow

[Diagram. Steps: Login, Search, Retrieve a Collection, Save a Collection. Components: Search Application, Authentication, Solr API, Registry, Agent.]

Page 10

Collection Builder Process Flow

[Diagram. Steps: Login, Search, Retrieve a Collection, Save a Collection. Components: Solr API, Registry, Agent.]

Page 11

Build a Collection

• http://htrc.mine.nu/blacklight
• Use the Firefox, Safari, or Chrome browser
• Log in using the ID provided
• Create a collection to be analyzed
  – Search
  – Create a collection
  – Retrieve the collection
  – Add or remove a volume
  – Save the revised collection
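
The create, retrieve, revise, and save steps above can be pictured with a minimal sketch that treats a collection as a named set of volume identifiers. Everything here is an assumption made for illustration: the `Collection` class, its JSON layout, and the placeholder IDs are hypothetical and do not describe how the HTRC registry actually stores collections.

```python
import json


class Collection:
    """A named subcollection, modeled as a set of volume identifiers."""

    def __init__(self, name, volume_ids=()):
        self.name = name
        self.volume_ids = set(volume_ids)

    def add(self, volume_id):
        self.volume_ids.add(volume_id)

    def remove(self, volume_id):
        # discard() does nothing if the volume is not in the collection.
        self.volume_ids.discard(volume_id)

    def to_json(self):
        # Sort the IDs so the saved form is stable between runs.
        return json.dumps({"name": self.name, "volumes": sorted(self.volume_ids)})

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        return cls(data["name"], data["volumes"])


# Placeholder IDs, not real volume identifiers.
created = Collection("demo", ["vol-001", "vol-002"])   # create a collection
saved = created.to_json()                              # save it
retrieved = Collection.from_json(saved)                # retrieve the collection
retrieved.add("vol-003")                               # add a volume
retrieved.remove("vol-001")                            # remove a volume
revised = retrieved.to_json()                          # save the revised collection
print(revised)
```

Using a set means that adding the same volume twice has no effect, which matches the intuition that a volume is either in a collection or not.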

Page 12

HTRC Web Interface Demo

Page 13

Algorithm Process Flow

[Diagram. Steps: Login, View Algorithm Parameters, Submit Algorithm, View Job Status, View Job Results. Components: HTRC Web Application, Authentication, Registry, Agent, Computation Resource.]

Page 14

Run an Algorithm

• http://smoketree.cs.indiana.edu:8999/HTRC-UI-Portal2/LogoutAction
• Sign on with the IDs provided
• View your collections
• Run an algorithm
  – Word count
  – Simple tag cloud
• Results
  – Job management
  – View Results
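
For readers unfamiliar with the two algorithms named above, the following is a minimal sketch of what a word count and a simple tag cloud compute. It is not the HTRC implementation: the function names, the tokenization rule, and the font-size range are all assumptions chosen for illustration.

```python
import re
from collections import Counter


def word_count(texts):
    """Count word occurrences across an iterable of text strings."""
    counts = Counter()
    for text in texts:
        # Simplistic tokenizer: lowercase runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def tag_cloud(counts, top_n=10, min_size=10, max_size=40):
    """Map the top_n most frequent words to font sizes.

    Sizes are scaled linearly between min_size and max_size according
    to each word's frequency among the selected words.
    """
    top = counts.most_common(top_n)
    if not top:
        return []
    highest, lowest = top[0][1], top[-1][1]
    span = (highest - lowest) or 1  # avoid division by zero when all counts tie
    return [
        (word, round(min_size + (n - lowest) * (max_size - min_size) / span))
        for word, n in top
    ]


counts = word_count(["The whale and the sea", "the ship"])
cloud = tag_cloud(counts, top_n=3)
print(counts.most_common(3))
print(cloud)
```

A tag cloud is therefore just a word count with a display rule attached: the counting step is the expensive part on a large collection, while the scaling step only touches the few most frequent words.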

Page 15

Feedback

• What worked for you?
• What did not work?
• Were there interactions that were awkward?
• What additional functions would you like to see?
• What types of algorithms do you use in your research?
• What environments do these algorithms require?
• What is the approximate size of the collections you use in your work?
• Were you able to find useful materials for your research? Would subsetting the collection a priori help you find materials?