TRANSCRIPT
Digging Deeper, Reaching Further
Module 5: Visualizing Textual Data
An Introduction
M5 -
In this module we’ll…
§ Introduce common visualization strategies for text data
→ Communicate with researchers about their options
§ Use a web-based visualization tool, HathiTrust+Bookworm
→ Gain experience creating and reading data visualizations
§ See how Sam used HathiTrust+Bookworm for his project
→ Learn how HT+BW was utilized in research
Where we’ll end up
Create a visualization of word usage trends across the HathiTrust corpus.
Data visualization
§ Data visualization is the process of converting data sources into a visual representation
§ Visualizations present particular ways of interpreting data
§ Data visualization is an entire field of study; we’re barely scratching the surface
Why visualize text data?
§ Understand broader themes of a dataset
§ Explore patterns in the data
§ Cluster texts for overview or classification
§ Compare data to other data (e.g., correlating with social networks)
Adapted from Jason Chuang’s Text Visualization course at Stanford University: http://hci.stanford.edu/courses/cs448b/f11/lectures/CS448B-20111117-Text.pdf
Place in research process
§ In the earlier exploration stage of a project:
• Explore the full range of data
• Discover characteristics and themes in data
§ In the later explanation stage of a project:
• Communicate findings to others in a clearer and more efficient way
Common text data visualizations
Word cloud
§ Relatively unsophisticated, but effective
§ Size of word relates to prominence or salience
Topic models from HTRC Algorithms
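To make the word cloud sizing idea concrete, here is a minimal Python sketch. It assumes that a word's prominence is simply its raw count and that font sizes scale linearly between two arbitrary point sizes. The function names (word_counts, font_sizes), the short stopword list, and the size range are illustrative assumptions and do not come from any tool named in this module.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real tools use longer lists).
STOPWORDS = {"the", "a", "of", "and", "to", "in"}

def word_counts(text):
    """Count lowercase word tokens, skipping stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def font_sizes(counts, min_pt=10, max_pt=48):
    """Map each word's count linearly onto a font size in points."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for word, n in counts.items():
        # When every word has the same count, use the largest size for all.
        scale = (n - lo) / span if span else 1.0
        sizes[word] = round(min_pt + scale * (max_pt - min_pt))
    return sizes

# Made-up sample sentence, used only to exercise the functions.
sample = "the data and the text: data in text, data of text mining"
sizes = font_sizes(word_counts(sample))
```

Here the most frequent words get the largest size and the rarest get the smallest. Real word cloud tools may weight words differently (for example, by a salience score rather than a raw count).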
Common text data visualizations
Trees or hierarchies
§ Word trees
Occurrences of “I have a dream” in Martin Luther King’s historic speech (Wattenberg and Viégas, 2008)
Common text data visualizations
Networks
§ Node-link diagrams
§ Good for representing topic models
§ Visualize connections between named entities
Topic model of English books, 1850-1899 (Underwood, 2012)
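As a rough illustration of the data behind a node-link diagram of named entities, the Python sketch below counts how often two entities appear in the same document. Each entity becomes a node and each counted pair becomes a weighted edge. The function name cooccurrence_edges and the hand-tagged entity lists are hypothetical; in practice the entities would come from a named entity recognition step, and a tool such as Gephi would draw the graph.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(documents):
    """Count how often each pair of entities appears in the same document."""
    edges = Counter()
    for entities in documents:
        # Deduplicate and sort so (a, b) and (b, a) count as one undirected edge.
        for pair in combinations(sorted(set(entities)), 2):
            edges[pair] += 1
    return edges

# Hypothetical, hand-tagged entity lists, one list per document.
docs = [
    ["Lincoln", "Gettysburg"],
    ["Lincoln", "Douglas"],
    ["Gettysburg", "Lincoln", "Douglas"],
]
edges = cooccurrence_edges(docs)
```

The resulting counts can be exported as an edge list, with the count serving as the edge weight.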
Common text data visualizations
Temporal- or spatial-based visualizations
§ Temporal visualizations
Percent representation of female characters in English literature (Underwood and Bamman, 2016) https://tedunderwood.com/2016/12/28/the-gender-balance-of-fiction-1800-2007/
Common text data visualizations
Temporal or spatial visualizations
§ Maps
Percent of newspaper pages containing the term “hoosier” (Palmer, Polley, & Pollock, n.d.) http://centerfordigschol.github.io/chroniclinghoosier/map1.html
Common text data visualizations
Other “multi-dimensional” visualizations
§ Bubble charts
§ Heat maps
Bubble chart: readability of U.S. presidential speeches (The Guardian, 2013)
Common text data visualizations
Other “multi-dimensional” visualizations
§ Heat maps
Heatmap of MARC cataloging at the Library of Congress by book year and cataloging year (Schmidt, 2017) http://sappingattention.blogspot.com/2017/05/a-brief-visual-history-of-marc.html
Activity
Match the type of use to the type of visualization:
Visualization (What would it be good for?)
• Word cloud
• Trees or hierarchies
• Networks
• Timeline
• Map
• Bubble chart
• Heatmap
Uses
• Change over time
• Spatial
• Topical density
• Relationships
• Word distribution
See Handout p. 1
** Bonus: what kinds of variables (i.e., data points) would you need for each visualization?
Common visualization tools
§ Word clouds
• Voyant
• Wordle
§ Word use trends
• Google Books Ngram Viewer
• HathiTrust+Bookworm
§ Tabular data visualization
• Tableau
§ Mapping
• ArcGIS Online with StoryMaps
• Tableau
§ Network graphs
• Gephi
• NodeXL
• DH Press
Common visualization libraries
§ Python
• matplotlib (including its pyplot interface)
• ggplot library
§ R
• ggplot2
§ D3.js
• JavaScript library for visualizations
Review: key terms in text analysis
N-gram
A contiguous chain of n items from a sequence of text, where n is the number of items.
Example (bigrams, n = 2): four score, score and, and seven, seven years, years ago, ago our, our fathers, fathers brought, brought forth, forth on, on this, this continent, continent a, a new, new nation, nation conceived, conceived in, in liberty, liberty and…
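The n-gram definition can be sketched in a few lines of Python. This is a minimal illustration that assumes the text has already been split into word tokens on whitespace; the function name ngrams is our own and is not taken from any HTRC tool.

```python
def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence of k tokens yields k - n + 1 n-grams (none if k < n).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "four score and seven years ago".split()
unigrams = ngrams(tokens, 1)  # single words
bigrams = ngrams(tokens, 2)   # overlapping pairs of adjacent words
```

Setting n to 1 gives unigrams (single words), and n to 2 gives bigrams such as "four score" and "score and".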
N-gram visualization: HathiTrust + Bookworm
Brings together:
§ Text data (unigrams)
§ Bibliographic metadata
§ Visualization tool
§ Track trends in a repository
HathiTrust
Bookworm
Bookworm framework
§ Visualizes categories
§ The category is plotted along the x-axis
• Often plot years along the x-axis
• Can plot other things!
§ HathiTrust+Bookworm is just one implementation of the framework
Adapted from Ben Schmidt, “Bookworm API Philosophy”
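The idea of plotting a category along the x-axis can be illustrated with a small Python sketch. It groups word counts by a metadata category (here, publication year) and computes the word's share of all words in each group, which is the kind of value a trend line would plot. This is not the Bookworm API: the function name relative_frequency_by_category and the toy records are invented for illustration only.

```python
from collections import defaultdict

# Invented toy records: (publication year, text). Not real HathiTrust data.
RECORDS = [
    (1900, "the lady and the woman"),
    (1900, "a lady spoke"),
    (1950, "the woman and another woman"),
]

def relative_frequency_by_category(records, word):
    """Return {category: share of all tokens in that category equal to word}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for category, text in records:
        tokens = text.lower().split()
        totals[category] += len(tokens)
        hits[category] += tokens.count(word)
    # Sorted categories become the x-axis positions; the shares are the y values.
    return {c: hits[c] / totals[c] for c in sorted(totals)}

woman = relative_frequency_by_category(RECORDS, "woman")
lady = relative_frequency_by_category(RECORDS, "lady")
```

Swapping the year for another metadata field, such as language or library classification, changes the x-axis without changing the counting logic.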
Example HT+Bookworm view
Track social change: lady vs. woman over time
Reading an HT+BW graph
§ Let’s look at how verbs change over time
• E.g., burned vs. burnt
Do you see any trends?
Bookworm interface
Limit your search with facets
https://bookworm.htrc.illinois.edu/develop
Bookworm interface
Fine-tune your results
Bookworm interface
Links directly to texts in the HTDL
Sample Reference Question
I’m a student in history who would like to incorporate digital methods into my research. I study American politics, and in particular I’d like to examine how concepts such as liberty change over time.
Approach:
Explore word usage trends of political concepts within the HathiTrust using HT+BW
Hands-on activity
§ In this activity, you will use HT+BW to explore lexical trends
Website: https://bookworm.htrc.illinois.edu/develop
See Handout p. 1
Examples
Discussion
§ What trends did you discover?
Case Study: Inside the Creativity Boom
§ Sam used HT+Bookworm to visualize the use of “creative” in the HTDL over time
Case Study: Inside the Creativity Boom
§ Sam also used an experimental HT+BW interface to create different kinds of visualizations…
Case Study: Inside the Creativity Boom
§ “Creative” by language and year
Case Study: Inside the Creativity Boom
§ “Creativity” by library classification and year
Discussion
§ Where does visual literacy fit into data literacy overall?
§ What would it mean to be visually literate, particularly with regard to text analysis?
Questions?
References
§ Chuang, J. (2011). Text Visualization. November 2011. Retrieved January 25, 2017, from http://hci.stanford.edu/courses/cs448b/f11/lectures/CS448B-20111117-Text.pdf
§ Palmer, K., Polley, T., & Pollock, C. (n.d.). Chronicling Hoosier. Retrieved August 16, 2017, from http://centerfordigschol.github.io/chroniclinghoosier/map1.html
§ Roskey Legal Education Blog. (2011, July 15). Martin Luther King, Jr.’s “I have a dream” speech as a word tree. Retrieved August 16, 2017, from http://roskylegaled.com/blog/post/martin-luther-king-jr-s-i-have-a-dream-speech-as-a/
§ Schmidt, B. (2017, May 16). A brief visual history of MARC cataloging at the Library of Congress. Retrieved August 16, 2017, from http://sappingattention.blogspot.com/2017/05/a-brief-visual-history-of-marc.html
§ Schmidt, B. (n.d.). API Philosophy | Bookworm. Retrieved August 16, 2017, from https://bookworm-project.github.io/Docs/api_philosophy.html
§ Theguardian.com. (2013, February 12). The state of our union is … dumber: How the linguistic standard of the presidential address has declined. Retrieved August 16, 2017, from https://www.theguardian.com/world/interactive/2013/feb/12/state-of-the-union-reading-level
§ Underwood, T., & Bamman, D. (2016, December 28). The Gender Balance of Fiction, 1800-2007 | The Stone and the Shell. Retrieved August 16, 2017, from https://tedunderwood.com/2016/12/28/the-gender-balance-of-fiction-1800-2007/
§ Underwood, T. (2012, November 11). Visualizing topic models. | The Stone and the Shell. Retrieved August 16, 2017, from https://tedunderwood.com/2012/11/11/visualizing-topic-models/
§ Wattenberg, M., & Viégas, F. B. (2008). The word tree, an interactive visual concordance. IEEE Transactions on Visualization and Computer Graphics, 14(6). doi:10.1109/TVCG.2008.172