feb.2016 demystifying digital humanities - workshop 3

Post on 13-Apr-2017


Data Wrangling II: Programming on the Whiteboard

February 26, 2016

Paige Morgan

Digital Humanities Librarian

Starting Activity: Open Syllabus Project

http://opensyllabusproject.org/

Open Syllabus Project

• Use the syllabus explorer to examine the data.
• Keep track of each step you take as you drill down.
• Goal: develop a research question based on your explorations.
• What other data would you need to answer this research question?

Last week...

• The work of creating usable data
• Forms that this data might take:
  • Markup language
  • Spreadsheets (MySQL & relational DBs)
  • Non-relational databases (RDF/Linked Open Data)

This week:

• Caveat Curator (challenges of working with data)
• Programming on the Whiteboard, i.e., conceptualizing the specific steps that you need to take to accomplish your goals

Goals/Takeaways

• A better understanding of the workflow for dealing with data
• How to start small and scale up effectively
• Greater ability to talk about what you’re trying to do

Why this focus on data?

• Understanding your data, and your intended actions, is a key skill for developing any digital project (big or small).
• You may have one big project, but your data may support several small/intermediary projects.

Image: Josh Lee, @wtrsld, via Twitter, January 2014.

What if your data is crowdsourced?

You can require a particular format for submissions.

You can even put programmatic limits on the formats available for submission.

But in the end, you’re probably still going to need to scrub and/or format.

This is true even for data from supposedly reputable sources, like government or media organizations.
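The points above, limiting the submission format and then scrubbing anyway, can be sketched in a few lines of Python. This is only an illustration: the field names (`title`, `year`), the four-digit-year rule, and the sample submissions are all hypothetical and do not come from any real project.

```python
import re

def scrub(submission):
    """Trim stray whitespace from every text field of a submission."""
    return {key: value.strip() for key, value in submission.items()}

def is_valid(submission):
    """Hypothetical format rule: a non-empty title and a four-digit year."""
    has_title = bool(submission.get("title"))
    has_year = re.fullmatch(r"\d{4}", submission.get("year", "")) is not None
    return has_title and has_year

# Invented sample submissions, for illustration only.
submissions = [
    {"title": "  Sample Entry ", "year": "1850"},
    {"title": "Another Entry", "year": "the fifties"},
]

# Scrub first, then sort submissions into accepted and rejected.
cleaned = [scrub(s) for s in submissions]
accepted = [s for s in cleaned if is_valid(s)]
rejected = [s for s in cleaned if not is_valid(s)]
```

Even with a rule like this in place, the rejected entries still have to be reviewed by a person, which is the scrubbing work the slides describe.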

Example: Doctor Who Villains dataset

http://tinyurl.com/doctorwhovillains

Data Dictionaries

If you are thinking about your data, and the tasks that you need to accomplish, then it’s easier to determine what sort of language or platform your project needs.
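A data dictionary is, in general terms, a description of each field in a dataset: its name, the kind of value it holds, and what it means. A minimal sketch in Python might look like the following. The two fields (`name`, `first_year`) are hypothetical and are not taken from any dataset mentioned in this workshop.

```python
# Hypothetical data dictionary: each field name maps to its expected type
# and a plain-language description.
DATA_DICTIONARY = {
    "name": {"type": str, "description": "Name of the entry"},
    "first_year": {"type": int, "description": "Year the entry first appears"},
}

def check_record(record):
    """Compare one record to the data dictionary and list any problems."""
    problems = []
    for field, spec in DATA_DICTIONARY.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], spec["type"]):
            problems.append(f"wrong type for {field}")
    return problems
```

For example, `check_record({"name": "Sample"})` reports that `first_year` is missing. Writing the dictionary down first makes this kind of mismatch easier to spot before choosing a tool.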

Pseudocode

• Used by programmers to break down a complex task into single steps
• Easily adaptable for use by non-programmers

Pseudocode Example (Visible Prices)

• Computer has a file that contains prices from different texts.
• Computer must know that each price amount is connected with an object, and with a bibliographical record.
• Users can input a price amount, and computer will retrieve all objects that match the price, and display them to the user, along with bibliographical information.
• (More complex): Computer is able to retrieve prices linked with certain categories (clothing, food, etc.)
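The pseudocode steps above could be translated into Python roughly as follows. This is a sketch under stated assumptions, not the Visible Prices implementation: the `PriceRecord` class, the function names, the price notation, and the three sample records are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class PriceRecord:
    amount: str    # price as written in the text (hypothetical notation)
    item: str      # the object the price is connected with
    category: str  # e.g. "food" or "clothing"
    source: str    # bibliographical record for the text

# Invented sample data standing in for "a file that contains prices".
RECORDS = [
    PriceRecord("1s", "loaf of bread", "food", "Sample Text A"),
    PriceRecord("1s", "pair of gloves", "clothing", "Sample Text B"),
    PriceRecord("5s", "hat", "clothing", "Sample Text A"),
]

def find_by_amount(records, amount):
    """Retrieve every record whose price matches the user's input."""
    return [r for r in records if r.amount == amount]

def find_by_category(records, category):
    """More complex step: retrieve prices linked with a category."""
    return [r for r in records if r.category == category]

# Display each match together with its bibliographical information.
for record in find_by_amount(RECORDS, "1s"):
    print(f"{record.item} ({record.amount}), from {record.source}")
```

Each bullet in the pseudocode maps onto one piece of the sketch: the list is the file, the record ties a price to an object and a source, and each function is one retrieval step.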

It is likely that your data will have a longer life span than any specific project you create.

In many instances, it may be more useful to focus on data curation as much as on a single project.

Getting Data

• Figshare
• Datahub.io
• Project websites
• APIs

Cleaning Data

• OpenRefine http://openrefine.org/

Key DH Values

• Adaptive
• Sustainable/resource-aware
• Collaborative
• Social

Key skills

• Thinking flexibly about your data (and potential project)
• Are there portions of your dataset that could be extracted for use in a particular tool?
• How can you adjust your data in order to show it to people (and be better able to talk, write, and present about your research interests)?

And now, it’s your turn...

Group Activity

• What questions can you ask and answer with this data as it is?
• What data would you need in order to ask & answer other research questions?
• What are the steps that you would need to take in order to answer those research questions?

Next steps

• What’s the smallest version of your dataset possible? (useful for testing out tools)
• Possible tools to examine (as ways of presenting your data)
  • Omeka (http://www.omeka.net)
  • Scalar (http://scalar.usc.edu)
  • Simile (http://www.simile-widgets.org)
  • Google Fusion Tables (https://support.google.com/fusiontables/answer/2571232)

Thank you!

• Questions? Ideas? Book a consult at http://paigecmorgan.youcanbook.me
