Data for the Humanities

Data for the Humanities, February 21, 2017. Rafia Mirza, Digital Humanities Librarian, [email protected], @librarianrafia. Peace Ossom Williamson, Director of Research Data Services, [email protected], @123POW.

Uploaded by librarianrafia on 22-Jan-2018


TRANSCRIPT

Page 1: Data for the Humanities

Data for the Humanities, February 21, 2017

Rafia Mirza, Digital Humanities Librarian, [email protected], @librarianrafia

Peace Ossom Williamson, Director of Research Data Services, [email protected], @123POW

Page 2: Data for the Humanities

Learning Outcomes

• Understand the use of data in answering humanities research questions

• Understand descriptive metadata and the rationale for its use

• Recognize areas of potential bias and ambiguous or misleading representation in reporting

Page 3: Data for the Humanities

What are data?

Page 4: Data for the Humanities

“All content in digital formats can be characterized as structured or unstructured data.”

Introduction to Digital Humanities: Concepts, Methods, and Tutorials

Page 5: Data for the Humanities

Examples:

• Audio

• Notes

• Geospatial

• Textual

Data are more than numbers

https://www.lib.umn.edu/datamanagement/whatdata

Page 6: Data for the Humanities

What is data literacy?

Page 7: Data for the Humanities

the ability to read, create, utilize, communicate, and criticize data.

Data Literacy

Page 8: Data for the Humanities

data quality

accessibility, usability, and understandability on the basis of context, provenance, and metadata

Data Literacy

Page 9: Data for the Humanities

data structure

the structure of different objects, arranged in a way that helps evaluate developing hypotheses

Data Literacy

Page 10: Data for the Humanities

• Recognize research potential

• Be aware of research methods

• Understand context and provenience

Humanities Data Literacy

Page 11: Data for the Humanities

“Humanists have data, and they need data skills.”

Digital Humanities Data Curation

Data in the Humanities

Page 12: Data for the Humanities

Types of Humanities Data

• Scholarly editions

• Text corpora

• Text with markup

• Thematic research collections

• Data with accompanying analysis or annotation

• Finding aids and other information maps, such as bibliographies

Digital Humanities Data Curation Introduction

Page 13: Data for the Humanities

Big Data Digital Humanities vs. Small Data Digital Humanities

• “Research in Big Data Digital Humanities focuses on large or dense cultural datasets, which call for new processing and interpretation methods”

• “…Small Data Digital Humanities regroup more focused works that do not use massive data processing…”

• A map for big data research in digital humanities, Frédéric Kaplan

Page 14: Data for the Humanities

1. research the context: know the data about the data (so meta!)

How to understand data

Page 15: Data for the Humanities

Data versus Metadata

Big? Smart? Clean? Messy? Data in the Humanities, Christof Schöch

Figure: a grid of data values with a row of metadata above it, and an “About this dataset” panel listing Title, Date Created, Creator, and Methods Used as metadata.
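The “About this dataset” panel can be sketched as a small record type. This is a minimal illustration, not a metadata standard: the class name DatasetMetadata, the helper missing_fields, and all sample values are hypothetical, chosen only to mirror the four fields shown on the slide.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DatasetMetadata:
    """Hypothetical record for the 'About this dataset' fields on the slide."""
    title: Optional[str] = None
    date_created: Optional[str] = None  # e.g. an ISO 8601 date string
    creator: Optional[str] = None
    methods_used: Optional[str] = None


def missing_fields(meta: DatasetMetadata) -> List[str]:
    """Return the names of metadata fields that were left empty."""
    return [f.name for f in fields(meta) if not getattr(meta, f.name)]


# Illustrative values only; not describing a real dataset.
record = DatasetMetadata(
    title="Letters sample (hypothetical)",
    date_created="2016-10-03",
    creator="Example Project",
)
print(missing_fields(record))  # → ['methods_used']
```

The check flags the empty methods field, the kind of gap that makes a dataset hard to interpret later. In practice a project would more likely adopt an established schema, such as Dublin Core, than an ad hoc class.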

Page 16: Data for the Humanities

2. research who the data is about

How to understand data

Page 17: Data for the Humanities

What are the historical contexts around their language and style?

Page 18: Data for the Humanities

A note on data ethics.

Page 19: Data for the Humanities

Zine Librarians Code of Ethics

• “Zines are not like mass-distributed books. They are often self-published and self-distributed, and sometimes printed in very small runs, intended for a small audience. In addition, perzines are by definition ‘personal’, and zinesters may feel different about having their zines distributed in print than they would about having them openly available on the internet or print. This can be especially true in the case of ‘historical’ zines in library collections — for example, a teen girl writing a zine for her close friends in 1994 may not want her zine distributed online or in print 20 years later.”

• Via Zinelibraries

Page 21: Data for the Humanities

3. investigate the source

How to understand data

Page 22: Data for the Humanities

Recognizing uncertainty and bias

Data on killings in the Syrian conflict.

https://responsibledata.io/reflection-stories/uncertainty-statistics/

Let’s investigate the source…

Page 23: Data for the Humanities

Recognizing uncertainty and bias

Sources include

• Syrian government

• Syrian Center for Statistics and Research

• Syrian Network for Human Rights

• Syrian Observatory for Human Rights

and many more.

https://responsibledata.io/reflection-stories/uncertainty-statistics/

Page 24: Data for the Humanities
Page 25: Data for the Humanities

Many human decisions go into creating these statistics.

Without knowing how these deaths have been coded, it is difficult to trust the figures.

Page 26: Data for the Humanities

4. highlight un/common data entries to gain rough insights

How to understand data

Page 27: Data for the Humanities

Descriptive analysis

i.e., description of the data from a sample

Page 28: Data for the Humanities

Quick descriptive statistics

• frequency

• rank from lowest to highest

• average (mean, median, mode)

• variability
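Each of these quick statistics can be computed with Python's standard library. A minimal sketch: the sample values are invented for illustration (imagine word counts per line of a short poem), and the range and sample standard deviation are only two of several possible measures of variability.

```python
import statistics
from collections import Counter

# Invented sample: words per line in a hypothetical short poem.
sample = [7, 9, 7, 12, 5, 7, 9, 10]

frequency = Counter(sample)               # how often each value occurs
ranked = sorted(sample)                   # rank from lowest to highest
mean = statistics.mean(sample)            # arithmetic average
median = statistics.median(sample)        # middle value of the ranked list
mode = statistics.mode(sample)            # most frequent value
value_range = max(sample) - min(sample)   # simplest measure of variability
stdev = statistics.stdev(sample)          # sample standard deviation

print(frequency.most_common(3))
print(ranked)
print(mean, median, mode)
print(value_range, round(stdev, 2))
```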

Page 29: Data for the Humanities

Bivariate descriptive statistics

a fancy way of saying we are looking at two variables at once

Figure of speech   Hamlet   Macbeth   Othello
Similes                50         9        59
Metaphors              20        38        58
Total                  70        47       117

Evaluating Comparison Methods
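The table can be compared in two ways: by raw counts or by proportions within each play. The sketch below uses the counts from the slide; the function names column_totals and column_proportions are hypothetical. It shows why the choice matters: Othello has the most similes in absolute terms, but Hamlet has the highest share.

```python
# Counts copied from the slide's table: figures of speech per play.
counts = {
    "Hamlet":  {"Similes": 50, "Metaphors": 20},
    "Macbeth": {"Similes": 9,  "Metaphors": 38},
    "Othello": {"Similes": 59, "Metaphors": 58},
}


def column_totals(table):
    """Total figures of speech per play (the 'Total' row)."""
    return {play: sum(row.values()) for play, row in table.items()}


def column_proportions(table):
    """Share of each figure type within a play, so plays of different sizes compare fairly."""
    totals = column_totals(table)
    return {
        play: {kind: n / totals[play] for kind, n in row.items()}
        for play, row in table.items()
    }


totals = column_totals(counts)
shares = column_proportions(counts)
for play in counts:
    print(f"{play}: {totals[play]} total, {shares[play]['Similes']:.0%} similes")
```

This prints 71% similes for Hamlet, 19% for Macbeth, and 50% for Othello.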

Page 30: Data for the Humanities

Correlation

most common way to describe a relationship between two measures
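For two numeric measures, the usual summary is Pearson's correlation coefficient r, which runs from -1 (perfect negative linear relationship) through 0 (no linear relationship) to 1 (perfect positive). A minimal sketch written out by hand so the arithmetic is visible; the paired values are invented. On Python 3.10 or later, `statistics.correlation` computes the same coefficient.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: summed co-deviations divided by the product of the
    square roots of the summed squared deviations (the 1/n factors cancel)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    # Raises ZeroDivisionError if either measure never varies.
    return co_dev / (spread_x * spread_y)


# Invented measures for five hypothetical texts.
chapters = [1, 2, 3, 4, 5]
named_characters = [2, 4, 5, 4, 5]
print(round(pearson_r(chapters, named_characters), 3))  # → 0.775
```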

Page 32: Data for the Humanities

What if the dataset you need does not exist?

Page 33: Data for the Humanities

How to data

1. Determine what to say

2. Find/collect/create the data you need

3. Wrangle!

4. Clean!

5. Do it many more times.

Page 34: Data for the Humanities

ID      Religion     Income     Age  Q1    Q2  Q3
26371   Jewish       <$10K      19   Yes   6   20
26372   Atheist      $50-75K    24   -     4   21
26373   Catholic     $75-100K   56   Yes   3   21
26374   Withheld     $75-100K   33   No    6   21
26375   Pentecostal  withheld   49   Yes   8   20
26376   Jewish       $40-50K    29   Yes   5   19
26377   Catholic     $20-30K    37   No    4   22

http://vita.had.co.nz/papers/tidy-data.pdf

Tidy Data
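Part of the “Clean!” step for a table like this one is deciding what counts as missing and converting numbers stored as text. A minimal sketch using the slide's rows; treating "-", "Withheld", and "withheld" all as "no answer given" is an assumption made for this example, and the names clean and clean_value are hypothetical.

```python
# Rows transcribed from the slide's example table.
HEADER = ["ID", "Religion", "Income", "Age", "Q1", "Q2", "Q3"]
raw_rows = [
    ["26371", "Jewish", "<$10K", "19", "Yes", "6", "20"],
    ["26372", "Atheist", "$50-75K", "24", "-", "4", "21"],
    ["26373", "Catholic", "$75-100K", "56", "Yes", "3", "21"],
    ["26374", "Withheld", "$75-100K", "33", "No", "6", "21"],
    ["26375", "Pentecostal", "withheld", "49", "Yes", "8", "20"],
    ["26376", "Jewish", "$40-50K", "29", "Yes", "5", "19"],
    ["26377", "Catholic", "$20-30K", "37", "No", "4", "22"],
]

# Assumption: these markers all mean the respondent gave no answer.
MISSING_MARKERS = {"-", "withheld"}
NUMERIC_COLUMNS = {"ID", "Age", "Q2", "Q3"}


def clean_value(column, value):
    """Map missing-value markers to None and numeric text to int."""
    if value.strip().lower() in MISSING_MARKERS:
        return None
    if column in NUMERIC_COLUMNS:
        return int(value)
    return value.strip()


def clean(rows):
    """Return one dict per respondent with consistently typed values."""
    return [{col: clean_value(col, val) for col, val in zip(HEADER, row)} for row in rows]


records = clean(raw_rows)
print(sum(r["Religion"] is None for r in records))  # → 1
```

Whether "Withheld" should be merged with a plain dash is itself a judgment call; a respondent who declined to answer is not the same as one who skipped a question, so a real project might keep them distinct.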

Page 35: Data for the Humanities

Most common problems

• Column headers are values, not variable names.

• Multiple variables are stored in one column.

• Variables are stored in both rows and columns.

• Multiple types of observational units are stored in the same table.

• A single observational unit is stored in multiple tables.

http://vita.had.co.nz/papers/tidy-data.pdf
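The first problem, column headers that are values, is usually fixed by “melting” the table into a long format with one row per observation. A sketch in plain Python; the table and its counts are hypothetical, laid out the way a religion-by-income summary often is, and the melt helper here is our own illustration. In pandas, `pandas.melt` (or `DataFrame.melt`) performs the same reshaping.

```python
# Hypothetical wide table: income brackets are used as column headers,
# so the headers are values of an Income variable, not variable names.
# All counts are made up for illustration.
wide = {
    "Catholic": {"<$10K": 4, "$10-20K": 7, "$20-30K": 9},
    "Jewish":   {"<$10K": 2, "$10-20K": 3, "$20-30K": 5},
}


def melt(table, id_name, var_name, value_name):
    """Turn column headers into values of a new variable: one row per observation."""
    long_rows = []
    for row_id, columns in table.items():
        for header, value in columns.items():
            long_rows.append({id_name: row_id, var_name: header, value_name: value})
    return long_rows


tidy = melt(wide, "Religion", "Income", "Count")
for row in tidy[:3]:
    print(row)
```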

Page 36: Data for the Humanities

If you torture data long enough, it will confess to anything.

Page 37: Data for the Humanities

How can a visualization be misleading?

Page 38: Data for the Humanities

What’s wrong?

Page 39: Data for the Humanities

A little less dramatic than you thought.
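One common way a chart looks more dramatic than its data is a value axis that does not start at zero. The sketch below assumes that scenario rather than reconstructing any particular chart; the function names and values are hypothetical. It compares how much taller one bar looks with how much larger the value actually is.

```python
def drawn_ratio(smaller, larger, baseline):
    """How many times taller the larger bar looks when the axis starts at `baseline`."""
    return (larger - baseline) / (smaller - baseline)


def true_ratio(smaller, larger):
    """How many times larger the value actually is (axis starting at zero)."""
    return larger / smaller


# Hypothetical values 95 and 100, plotted on an axis that starts at 90.
print(drawn_ratio(95, 100, 90))       # → 2.0
print(round(true_ratio(95, 100), 3))  # → 1.053
```

A roughly 5% difference is drawn as a doubling, which is why checking where an axis starts is a quick first test of any chart.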

Page 40: Data for the Humanities

http://www.visualisingdata.com/2014/04/the-fine-line-between-confusion-and-deception/

Page 41: Data for the Humanities

https://thesyriacampaign.org/

Page 42: Data for the Humanities

Open Data: Things to Consider

http://www.slideshare.net/libereurope/humanities-data-literacy-student-perspective-on-digital-cultural-heritage-collections?qid=70bd86f2-10c5-43a6-b053-56d264ca28ab&v=&b=&from_search=1

Page 43: Data for the Humanities

Recommended Reading / Viewing

“Numbers are Only Human” – Brian Root

“Ethical Principles of Psychologists and Code of Conduct” – American Psychological Association

“On Not Looking: Ethics and Access in the Digital Humanities” – Kimberly Christen Withey

Page 44: Data for the Humanities

Upcoming Workshops and Events: library.uta.edu/scholcomm

Rafia Mirza, [email protected], @librarianrafia

Peace Ossom Williamson, [email protected], @123POW