Digging Deeper, Reaching Further Module 5: Visualizing Textual Data An Introduction

Post on 29-Apr-2022


Page 1: DiggingDeeper, Reaching Further Module 5: Visualizing

Digging Deeper, Reaching Further

Module 5: Visualizing Textual Data

An Introduction

In this module we’ll…

§ Introduce common visualization strategies for text data
  → Communicate with researchers about their options
§ Use a web-based visualization tool, HathiTrust+Bookworm
  → Gain experience creating and reading data visualizations
§ See how Sam used HathiTrust+Bookworm for his project
  → Learn how HT+BW was used in research

Where we’ll end up

Create a visualization of word usage trends across the HathiTrust corpus.

Data visualization

§ Data visualization is the process of converting data sources into a visual representation
§ Visualizations present particular ways of interpreting data
§ Data visualization is an entire field of study; we’re barely scratching the surface

Why visualize text data?

§ Understand broader themes of a dataset
§ Explore patterns in the data
§ Cluster texts for overview or classification
§ Compare data to other data (e.g., correlating with social networks)

Adapted from Jason Chuang’s Text Visualization course at Stanford University: http://hci.stanford.edu/courses/cs448b/f11/lectures/CS448B-20111117-Text.pdf

Place in research process

§ In the earlier exploration stage of a project:
  • Explore the full range of data
  • Discover characteristics and themes in data
§ In the later explanation stage of a project:
  • Communicate findings to others in a clearer and more efficient way

Common text data visualizations

Word cloud
§ Relatively unsophisticated, but effective
§ Size of word relates to prominence or salience

Topic models from HTRC Algorithms
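
The link between word size and prominence can be sketched in a few lines of Python. This is a minimal illustration, assuming that prominence means raw frequency and that font sizes scale linearly between a minimum and a maximum; the function name `word_sizes`, its parameters, and the sample text are all hypothetical and are not taken from Voyant, Wordle, or HTRC Algorithms.

```python
import re
from collections import Counter

def word_sizes(text, min_pt=10, max_pt=48, top_n=5):
    """Map the top_n most frequent words to font sizes in points."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    highest = counts[0][1]
    lowest = counts[-1][1]
    # Fall back to 1 so equal counts do not cause a division by zero.
    span = (highest - lowest) or 1
    return {
        word: round(min_pt + (count - lowest) / span * (max_pt - min_pt))
        for word, count in counts
    }

# Hypothetical sample text, for illustration only.
sample = "the data and the text and the words in the data"
sizes = word_sizes(sample, top_n=3)
print(sizes)
```

In this toy sample the largest word is "the", which is why word cloud tools usually remove common stop words before sizing.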

Common text data visualizations

Trees or hierarchies
§ Word trees

Occurrences of “I have a dream” in Martin Luther King’s historical speech. (Wattenberg and Viégas, 2008)

Common text data visualizations

Networks
§ Node-link diagrams
§ Good for representing topic models
§ Visualize connections between named entities

Topic model of English books, 1850-1899 (Underwood, 2012)
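
Connections between named entities are often stored as a weighted edge list before being drawn as a node-link diagram. The sketch below assumes one simple definition of a connection: two entities appearing in the same document. The entity lists and the function name `cooccurrence_edges` are hypothetical, and the output is only the data a tool such as Gephi would need, not a drawing.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: the named entities found in each document.
docs = [
    ["Lincoln", "Gettysburg", "Douglas"],
    ["Lincoln", "Douglas"],
    ["Gettysburg", "Lincoln"],
]

def cooccurrence_edges(entity_lists):
    """Count how many documents each pair of entities shares."""
    edges = Counter()
    for entities in entity_lists:
        # Sort and de-duplicate so each pair has one canonical key.
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges

edges = cooccurrence_edges(docs)
for (a, b), weight in edges.most_common():
    print(a, "--", b, "weight", weight)
```

Each key is a pair of nodes and each value is an edge weight, so heavier edges mark entities that appear together more often.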

Common text data visualizations

Temporal- or spatial-based visualizations

§ Temporal visualizations

Percent representation of female characters in English literature (Underwood and Bamman, 2016)
https://tedunderwood.com/2016/12/28/the-gender-balance-of-fiction-1800-2007/

Common text data visualizations

Temporal or spatial visualizations

§ Maps

Percent of newspaper pages containing the term “hoosier” (Palmer, Polley, & Pollock, n.d.)
http://centerfordigschol.github.io/chroniclinghoosier/map1.html

Common text data visualizations

Other “multi-dimensional” visualizations

§ Bubble charts
§ Heat maps

Bubble chart: readability of U.S. presidential speeches (The Guardian, 2013)

Common text data visualizations

Other “multi-dimensional” visualizations

§ Heat maps

Heatmap of MARC cataloging at the Library of Congress by book year and cataloging year (Schmidt, 2017)
http://sappingattention.blogspot.com/2017/05/a-brief-visual-history-of-marc.html

Activity

Match type of use to the type of visualization:

Visualization              What would it be good for?
Word cloud
Trees or hierarchies
Networks
Timeline
Map
Bubble chart
Heatmap

Uses
• Change over time
• Spatial
• Topical density
• Relationships
• Word distribution

See Handout p. 1

** Bonus: what kinds of variables (i.e., data points) would you need for each visualization?

Common visualization tools

§ Word clouds
  • Voyant
  • Wordle
§ Word use trends
  • Google Books Ngram Viewer
  • HathiTrust+Bookworm
§ Tabular data visualization
  • Tableau
§ Mapping
  • ArcGIS Online with StoryMaps
  • Tableau
§ Network graphs
  • Gephi
  • NodeXL
  • DH Press

Common visualization libraries

§ Python
  • matplotlib (including its pyplot interface)
  • ggplot library
§ R
  • ggplot2
§ D3.js
  • JavaScript library for visualizations

Review: key terms in text analysis

N-gram

four score, score and, and seven, seven years, years ago, ago our, our fathers, fathers brought, brought forth, forth on, on this, this continent, continent a, a new, new nation, nation conceived, conceived in, in liberty, liberty and…

A contiguous chain of n items from a sequence of text, where n is the number of items. The example above lists bigrams (n = 2).
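
The definition can be tried out with a short Python sketch. It assumes the simplest possible tokenization (lowercasing and splitting on whitespace), and the function name `ngrams` is an illustrative choice rather than part of any HTRC tool.

```python
def ngrams(text, n=2):
    """Return the contiguous n-grams of a text as space-joined strings."""
    # Assumption: tokens are lowercase, whitespace-separated words.
    tokens = text.lower().split()
    # Slide a window of length n across the token list.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

phrase = "Four score and seven years ago"
bigrams = ngrams(phrase, n=2)
print(bigrams)
# ['four score', 'score and', 'and seven', 'seven years', 'years ago']

unigrams = ngrams(phrase, n=1)
print(unigrams)
```

Setting n to 1 gives unigrams (single words), and setting n to 3 gives trigrams.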

N-gram visualization: HathiTrust + Bookworm

Brings together:

HathiTrust
§ Text data (unigrams)
§ Bibliographic metadata

Bookworm
§ Visualization tool
§ Track trends in a repository

Bookworm framework

§ Visualizes categories
§ The category is plotted along the x-axis
  • Often plot years along the x-axis
  • Can plot other things!
§ HathiTrust+Bookworm is just one implementation of the framework

Adapted from Ben Schmidt, “Bookworm API Philosophy”
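
The idea of plotting a category along the x-axis can be sketched as a grouping step: count a word within each category, then normalize by the size of that category. The code below is a toy illustration and not the Bookworm API. The records, the function name `relative_frequency`, and the choice of occurrences per million tokens as the measure are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy records: (publication year, tokens in one volume).
volumes = [
    (1850, ["the", "lady", "spoke", "to", "the", "lady"]),
    (1850, ["a", "woman", "and", "a", "lady"]),
    (1950, ["the", "woman", "spoke", "to", "a", "woman"]),
]

def relative_frequency(records, term):
    """Return {category: occurrences of term per million tokens}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for category, tokens in records:
        hits[category] += tokens.count(term)
        totals[category] += len(tokens)
    # Sorted categories become the x-axis; values become the y-axis.
    return {c: hits[c] / totals[c] * 1_000_000 for c in sorted(totals)}

for term in ("lady", "woman"):
    print(term, relative_frequency(volumes, term))
```

Swapping the year for another metadata field, such as language or library classification, changes the x-axis without changing the counting logic.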

Example HT+Bookworm view

Track social change: lady vs. woman over time

Reading an HT+BW graph

§ Let’s look at how verbs change over time
  • E.g., burned vs. burnt

Do you see any trends?

Bookworm interface

Limit your search with facets

https://bookworm.htrc.illinois.edu/develop

Bookworm interface

Fine-tune your results

Bookworm interface

Links directly to texts in the HTDL

Sample Reference Question

I’m a student in history who would like to incorporate digital methods into my research. I study American politics, and in particular I’d like to examine how concepts such as liberty change over time.

Approach:

Explore word usage trends of political concepts within the HathiTrust using HT+BW

Hands-on activity

§ In this activity, you will use HT+BW to explore lexical trends

Website: https://bookworm.htrc.illinois.edu/develop

See Handout p. 1

Examples

Discussion

§What trends did you discover?

Case Study: Inside the Creativity Boom

§Sam used HT+Bookworm to visualize the use of “creative” in the HTDL over time

Case Study: Inside the Creativity Boom

§ Sam also used an experimental HT+BW interface to create different kinds of visualizations…

Case Study: Inside the Creativity Boom

§ “Creative” by language and year

Case Study: Inside the Creativity Boom

§ “Creativity” by library classification and year

Discussion

§ Where does visual literacy fit into data literacy overall?
§ What would it mean to be visually literate, particularly with regard to text analysis?

Questions?

References

§ Chuang, J. (2011). Text Visualization. November 2011. Retrieved January 25, 2017, from http://hci.stanford.edu/courses/cs448b/f11/lectures/CS448B-20111117-Text.pdf

§ Palmer, K., Polley, T., & Pollock, C. (n.d.). Chronicling Hoosier. Retrieved August 16, 2017, from http://centerfordigschol.github.io/chroniclinghoosier/map1.html

§ Roskey Legal Education Blog. (2011, July 15). Martin Luther King, Jr.’s “I have a dream” speech as a word tree. Retrieved August 16, 2017, from http://roskylegaled.com/blog/post/martin-luther-king-jr-s-i-have-a-dream-speech-as-a/

§ Schmidt, B. (2017, May 16). A brief visual history of MARC cataloging at the Library of Congress. Retrieved August 16, 2017, from http://sappingattention.blogspot.com/2017/05/a-brief-visual-history-of-marc.html

§ Schmidt, B. (n.d.). API Philosophy | Bookworm. Retrieved August 16, 2017, from https://bookworm-project.github.io/Docs/api_philosophy.html

§ Theguardian.com. (2013, February 12). The state of our union is … dumber: How the linguistic standard of the presidential address has declined. Retrieved August 16, 2017, from https://www.theguardian.com/world/interactive/2013/feb/12/state-of-the-union-reading-level

§ Underwood, T., & Bamman, D. (2016, December 28). The Gender Balance of Fiction, 1800-2007 | The Stone and the Shell. Retrieved August 16, 2017, from https://tedunderwood.com/2016/12/28/the-gender-balance-of-fiction-1800-2007/

§ Underwood, T. (2012, November 11). Visualizing topic models. | The Stone and the Shell. Retrieved August 16, 2017, from https://tedunderwood.com/2012/11/11/visualizing-topic-models/

§ Wattenberg, M., & Viégas, F. B. (2008). The word tree, an interactive visual concordance. IEEE Transactions on Visualization and Computer Graphics, 14(6). doi:10.1109/TVCG.2008.172
