Interactive Text Mining Suite: Data Visualization for Literary Studies


Uploaded by olga-scrivner on 15 April 2017


Page 1: Interactive Text Mining Suite: Data Visualization for Literary Studies

Introduction | Visualization Methods | ITMS | Medieval Corpus | Conclusion

Olga Scrivner and Jefferson Davis

Indiana University

CDH 2017

Page 2

Outline

1 Visual Analytics in Digital Humanities

2 Shiny Web Application - Interactive Text Mining Suite

3 Case Study: Visualization of the Medieval Romance of Flamenca

Page 3

Digital Humanities - Transformation

The “epic transformation of archives” - shifting from print to digital archival form (Folsom, 2007)

Page 4

Digital Humanities

“As our collective knowledge continues to be digitized and stored (...) it becomes more difficult to find and discover what we are looking for.” (Blei, 2012)

Page 5

Digital Humanities Manifesto 2.0 (2009) and Berry (2011)

1st Wave: “The first wave of digital humanities work was quantitative, mobilizing the search and retrieval powers of the database, automating corpus linguistics, stacking hypercards into critical arrays”

2nd Wave: “The second wave is qualitative, interpretive”, concentrating on new tools for creating and curating digital repositories (Berry, 2011)

3rd Wave: Concentration on computationality and on search, retrieval and analysis originating in humanities-based work

Page 6

Visual Analytics in Literature

“The science of analytical reasoning facilitated by interactive visual interfaces”

(Thomas and Cook, 2005)

Page 7

Close Reading

Concept: Micro-analysis (Jockers, 2013)

Close textual analysis of individual texts to “unveil words, verbal images, elements of style, sentences, argument patterns” (Jasinski, 2001)

Methods: Color coding, marginal comments, underlining

Tools: Poem Viewer, PRISM, Juxta, eMargin

Page 8

Close Reading Visualization: eMargin and JUXTA

Page 9

Distant Reading

Concept: Macro-analysis (Jockers, 2013)

“the construction of abstract models” (Jasinski, 2001)

Methods: Tag clouds, heat maps, clusters, topics, network graphs

Tools: GUI: Voyant, Paper Machines; TUI: Mallet, MeTA, R and Python packages

Page 10

Visualization Methods in Literature

Graphs, maps and trees for literature analysis (Moretti, 2005)

Page 11

Visualization Methods in Literature

Word clouds to analyze a novel (Vuillemot et al., 2009)
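A word cloud sizes each word by how often it occurs, so the computation underneath is a token count. A minimal sketch, assuming tokenization on runs of letters, a small hypothetical stopword list (`STOPWORDS`) and linear scaling of counts onto font sizes; a real tool would use a fuller stopword list and its own scaling:

```python
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "and", "of", "to", "in", "is"}


def word_frequencies(text, top_n=10):
    """Return the top_n most frequent non-stopword tokens as (word, count) pairs."""
    # [^\W\d_]+ matches runs of letters, including accented ones.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


def font_sizes(freqs, min_size=10, max_size=48):
    """Scale counts linearly onto font sizes; the most frequent word is largest."""
    if not freqs:
        return {}
    top, bottom = freqs[0][1], freqs[-1][1]
    span = max(top - bottom, 1)  # avoid division by zero when all counts tie
    return {w: min_size + (c - bottom) * (max_size - min_size) / span
            for w, c in freqs}


print(font_sizes(word_frequencies("The knight loved the lady and the lady loved the song")))
```

Linear scaling is the simplest choice; square-root or logarithmic scaling keeps a few very frequent words from dominating the cloud.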

Page 12

Visualization Methods in Literature

Social network graphs of characters in Greek tragedies (Rydberg-Cox, 2011)
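Character networks of this kind are commonly derived from co-occurrence: two characters are linked when they share a scene, and the edge weight counts the shared scenes. A minimal sketch of that generic approach (not necessarily Rydberg-Cox's exact procedure), assuming each scene has already been reduced to a list of the characters present; the function names are hypothetical:

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(scenes):
    """Map each unordered character pair to the number of scenes they share.

    scenes: iterable of character-name lists, one list per scene.
    """
    edges = Counter()
    for cast in scenes:
        # set() drops repeats; sorted() gives each pair one canonical key.
        for a, b in combinations(sorted(set(cast)), 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum of edge weights per character, a simple centrality measure."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Toy scenes with placeholder names, not data from any tragedy.
edges = cooccurrence_edges([["A", "B"], ["B", "C"], ["A", "B", "C"]])
print(edges, weighted_degree(edges))
```

The resulting weighted edge list can be passed to any graph-drawing library; thicker edges then show the pairs of characters who share the most scenes.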

Page 13

Visualization Methods in Literature

Literary fingerprint and summaries (Oelke et al., 2012)
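In a literary fingerprint, a text is cut into consecutive blocks, one feature value is computed per block, and each block is drawn as a colored cell. A minimal sketch, assuming fixed-size word blocks and the type-token ratio (distinct words divided by words) as the feature; both are illustrative choices rather than the specific features used by Oelke et al.:

```python
import re


def fingerprint(text, block_size=100):
    """Type-token ratio (distinct words / words) for consecutive blocks of tokens."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    values = []
    for start in range(0, len(tokens), block_size):
        block = tokens[start:start + block_size]
        values.append(len(set(block)) / len(block))
    return values


def to_shades(values, levels=5):
    """Bin each block value into a color shade from 0 (lowest) to levels - 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    return [min(int((v - lo) / (hi - lo) * levels), levels - 1) for v in values]


values = fingerprint("a b a b c d e f", block_size=4)
print(values, to_shades(values))
```

Type-token ratio depends on block length, so a short final block will tend to score higher; dropping or padding it is a reasonable refinement.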

Page 14

Visualization Methods in Literature

Tracking emotion and sentiment in fairy tales (Mohammad, 2011)
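Lexicon-based emotion tracking splits a text into segments and measures, in each, the share of words associated with each emotion in a word-emotion lexicon (Mohammad's study draws on the NRC Emotion Lexicon). A minimal sketch; the four-entry `LEXICON` is a hypothetical stand-in that contains no real NRC data, and the segmenting rule is an assumption:

```python
import re
from collections import Counter

# Hypothetical toy lexicon (word -> emotions); not real NRC Emotion Lexicon data.
LEXICON = {
    "happy": {"joy"},
    "weep": {"sadness"},
    "wolf": {"fear"},
    "love": {"joy", "trust"},
}


def emotion_profile(segment):
    """Share of tokens in `segment` associated with each emotion in LEXICON."""
    tokens = re.findall(r"[^\W\d_]+", segment.lower())
    counts = Counter()
    for token in tokens:
        for emotion in LEXICON.get(token, ()):
            counts[emotion] += 1
    total = len(tokens) or 1  # guard against empty segments
    return {emotion: n / total for emotion, n in counts.items()}


def track(text, n_segments=10):
    """Split text into about n_segments equal word chunks and profile each one."""
    words = text.split()
    size = max(1, -(-len(words) // n_segments))  # ceiling division
    return [emotion_profile(" ".join(words[i:i + size]))
            for i in range(0, len(words), size)]


print(track("the wolf made them weep and love made them happy", n_segments=2))
```

Plotting each emotion's share across segments gives the kind of emotional arc shown on the slide.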

Page 15

Topic Modeling

Discovering the underlying themes of a collection of Science magazine articles, 1990-2000 (Blei, 2012)

Page 16

Technological and Methodological Obstacles

Many tools require some programming skills (Mallet, MeTA, R and Python libraries)

GUI tools are limited to certain formats and functions (Voyant, Paper Machines)

Lack of active control by users

Page 17

Our Goals - Interactive Text Mining Suite

A user-friendly interactive tool for quantitative analysis and visualization

Designed for linguistic and literary analysis

Incorporation of annotated corpora in macro-analysis

Page 18

Background

1 R - a free programming language for statistical computing and graphics

2 RStudio - Integrated Development Environment: a source code editor, an executor and a debugger

3 Shiny App - a web application framework for R

Page 19

ITMS

Platform-independent, user-friendly and interactive

State-of-the-art statistical and graphical tools (R libraries)

http://www.interactivetextminingsuite.com

Page 20

Multi-Functional

1 Import of txt, pdf and rdf files, and of texts via the Google Books API

2 Metadata extraction

3 Interactive data pre-processing

4 Dynamic visualization
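Interactive pre-processing usually means exposing cleaning steps (lowercasing, removing numbers and punctuation, filtering stopwords) as switches the user can toggle before re-running an analysis. A minimal sketch of such a configurable pipeline; `PreprocessOptions`, its defaults and the three-word stopword list are hypothetical and do not reproduce the ITMS settings:

```python
import re
from dataclasses import dataclass, field


@dataclass
class PreprocessOptions:
    """Hypothetical switches of the kind a GUI might expose to the user."""
    lowercase: bool = True
    remove_numbers: bool = True
    remove_punctuation: bool = True
    stopwords: frozenset = field(default_factory=lambda: frozenset({"the", "and", "of"}))


def preprocess(text, options=None):
    """Apply the enabled cleaning steps in a fixed order and return tokens."""
    if options is None:
        options = PreprocessOptions()
    if options.lowercase:
        text = text.lower()
    if options.remove_numbers:
        text = re.sub(r"\d+", " ", text)
    if options.remove_punctuation:
        text = re.sub(r"[^\w\s]", " ", text)
    return [token for token in text.split() if token not in options.stopwords]


print(preprocess("The knight and the lady, 2 songs!"))
print(preprocess("Hello, World", PreprocessOptions(lowercase=False, remove_punctuation=False)))
```

Keeping every step behind a flag lets a user see immediately how, for example, keeping punctuation changes a frequency list or a cluster plot.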

Page 21

Case Study - Medieval Occitan

Occitan (Provençal) constitutes an important element of the literary, linguistic, and cultural heritage in the history of Romance languages

Interactive online database and linguistically annotated corpus (Scrivner et al., 2014): http://www.oldoccitancorpus.org

Page 22

Comparative Analysis of POS: Original and Translation

Figure panels: Occitan corpus; English translation
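Comparing part-of-speech profiles across an original and its translation comes down to normalizing tag counts by the size of each text. A minimal sketch, assuming both texts are available as whitespace-separated `word/TAG` tokens; this token format, the placeholder words and the tag names in the example are assumptions, not the annotation scheme of the Old Occitan corpus:

```python
from collections import Counter


def pos_distribution(tagged_text):
    """Relative frequency of each POS tag in a 'word/TAG word/TAG ...' string."""
    tags = [token.rsplit("/", 1)[1] for token in tagged_text.split() if "/" in token]
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def compare(original, translation):
    """Per-tag difference in relative frequency (translation minus original)."""
    a, b = pos_distribution(original), pos_distribution(translation)
    return {tag: b.get(tag, 0.0) - a.get(tag, 0.0) for tag in sorted(set(a) | set(b))}


print(compare("w1/D w2/N w3/V", "w1/D w2/N w3/V w4/EX"))
```

Normalizing to proportions matters because a translation is rarely the same length as its source; raw counts would mostly reflect that length difference.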

Page 23

Key-Word-in-Context Analysis of POS

Figure panels: Existential there; Negation
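A key-word-in-context view of POS lists every token bearing a given tag together with its left and right context, so that the keyword column can be read vertically. A minimal sketch, assuming `word/TAG` tokens and Penn-style tags such as `EX` for existential there; the token format, the tags and the function names are illustrative assumptions:

```python
def kwic_by_tag(tagged_text, tag, window=3):
    """Return (left context, keyword, right context) for every token with `tag`.

    tagged_text: whitespace-separated 'word/TAG' tokens (an assumed format).
    """
    pairs = [token.rsplit("/", 1) for token in tagged_text.split() if "/" in token]
    words = [word for word, _ in pairs]
    hits = []
    for i, (word, token_tag) in enumerate(pairs):
        if token_tag == tag:
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            hits.append((left, word, right))
    return hits


def format_kwic(hits, width=30):
    """Right-align left contexts so the keywords line up in one column."""
    return [f"{left:>{width}} [{word}] {right}" for left, word, right in hits]


hits = kwic_by_tag("and/CC there/EX was/VBD no/DT song/NN there/RB", "EX", window=2)
print("\n".join(format_kwic(hits)))
```

Matching on the tag rather than the word is what separates existential there from locative there (tagged `RB` in the example).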

Page 24

Stylistic Similarities - Sentence Length

Figure panels: Occitan corpus; English translation
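A sentence-length comparison needs a sentence splitter, per-sentence word counts and a summary of their distribution. A minimal sketch, assuming sentences end in ., ! or ? followed by whitespace (a naive rule) and a histogram bin width of 5 words; both are illustrative assumptions:

```python
import re
from collections import Counter
from statistics import mean, median


def sentence_lengths(text):
    """Words per sentence; a sentence ends at ., ! or ? followed by whitespace."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def summarize(lengths, bin_width=5):
    """Mean, median and a binned histogram of sentence lengths (non-empty input)."""
    histogram = Counter((n // bin_width) * bin_width for n in lengths)
    return {
        "mean": mean(lengths),
        "median": median(lengths),
        "histogram": dict(sorted(histogram.items())),  # bin start -> count
    }


lengths = sentence_lengths("He came. She sang a long song tonight! Why?")
print(lengths, summarize(lengths))
```

Running the same functions over the original and the translation yields two comparable distributions for side-by-side plotting.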

Page 25

Stylistic Comparison - Punctuation

Figure panels: Occitan corpus; English translation

Color key: question marks and exclamation marks in red; quotation marks, hyphens and parentheses in green; semicolons, colons, commas and periods in blue
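The color key on this slide amounts to a lookup from punctuation mark to class. A minimal sketch that reproduces it and returns the per-mark color sequence such a plot would draw; which quotation-mark characters to include (straight, curly, guillemets) is an assumption, and the function names are hypothetical:

```python
# Punctuation classes following the slide's color key. The exact set of
# quotation-mark characters is an assumption about the editions used.
PUNCT_CLASSES = {
    "red": {"?", "!"},
    "green": {'"', "“", "”", "«", "»", "-", "(", ")"},
    "blue": {";", ":", ",", "."},
}

# Invert to a per-character lookup table.
CHAR_TO_CLASS = {ch: color for color, chars in PUNCT_CLASSES.items() for ch in chars}


def punctuation_classes(text):
    """Color class of each punctuation mark in reading order (other chars skipped)."""
    return [CHAR_TO_CLASS[ch] for ch in text if ch in CHAR_TO_CLASS]


def class_counts(text):
    """Number of marks in each color class."""
    sequence = punctuation_classes(text)
    return {color: sequence.count(color) for color in PUNCT_CLASSES}


print(punctuation_classes('Who sings? "I do," she said (softly).'))
```

Drawing the sequence as a strip of colored cells, one per mark, gives a quick visual comparison of punctuation habits between the two texts.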

Page 26

Document Level Cluster Analysis

Cluster analysis groups documents into subgroups. These subgroups “are coherent internally, but clearly different from each other” (Manning, 2009)
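One common way to produce such document groupings is hierarchical (agglomerative) clustering over word-frequency vectors. A minimal pure-Python sketch, assuming bag-of-words vectors, cosine distance and average linkage; these choices are illustrative and are not necessarily the settings ITMS uses:

```python
import math
import re
from collections import Counter
from itertools import combinations


def vectorize(text):
    """Bag-of-words vector: word -> count, over lowercased letter runs."""
    return Counter(re.findall(r"[^\W\d_]+", text.lower()))


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse count vectors (1.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def agglomerate(docs, n_clusters):
    """Average-linkage agglomerative clustering down to n_clusters groups.

    docs: dict mapping a document name to its text.
    Returns a list of clusters, each a sorted list of document names.
    """
    vectors = {name: vectorize(text) for name, text in docs.items()}
    clusters = [[name] for name in sorted(docs)]

    def linkage(c1, c2):
        # Average cosine distance over all cross-cluster document pairs.
        total = sum(cosine_distance(vectors[x], vectors[y]) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


docs = {"d1": "knight lady song", "d2": "knight lady love",
        "d3": "wheat barley harvest", "d4": "barley harvest field"}
print(agglomerate(docs, 2))
```

In practice one would usually weight terms (for example with tf-idf) and inspect the full merge history as a dendrogram instead of stopping at a fixed number of clusters.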

Page 27

Document Level Topic Analysis

Documents in a collection are “represented as random mixtures over latent topics, where each topic is characterized by a distribution over words” (Blei et al., 2003)
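The quoted definition corresponds to the generative process of latent Dirichlet allocation (LDA). A compact statement in standard notation, written in the commonly used variant with a Dirichlet prior on the topics; the symbols K (topics), D (documents), N_d (words in document d), α and η are conventional notation, not taken from the slides:

```latex
\begin{align*}
\beta_k &\sim \operatorname{Dirichlet}(\eta), && k = 1, \dots, K && \text{(each topic: a distribution over words)}\\
\theta_d &\sim \operatorname{Dirichlet}(\alpha), && d = 1, \dots, D && \text{(each document: a mixture over topics)}\\
z_{d,n} &\sim \operatorname{Categorical}(\theta_d), && n = 1, \dots, N_d && \text{(topic assignment of word } n\text{)}\\
w_{d,n} &\sim \operatorname{Categorical}(\beta_{z_{d,n}}) && && \text{(the observed word)}
\end{align*}
\[
p(w \mid \theta_d, \beta) = \sum_{k=1}^{K} \theta_{d,k}\, \beta_{k,w}
\]
```

The last line makes the "random mixture" explicit: once the topic assignment is summed out, the probability of a word in document d is a θ_d-weighted average of the topics' word distributions.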

Page 28

Conclusion

1 There is a need for text mining tools designed for linguists and literary scholars

2 Interactive user-friendly applications bridge the gap between data mining and digital humanities

3 The Shiny framework can be incorporated into any digital corpus to exhibit, search or visualize written collections

Page 29

ITMS

Browser and Smart Phone

Questions, comments

https://languagevariationsuite.wordpress.com/

Page 30

References

Mohammad, Saif. 2011. From Once Upon a Time to Happily Ever After: Tracking Emotions in Novels and Fairy Tales. In Proceedings of the ACL Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), Portland, OR.

Moretti, Franco. 2005. Graphs, Maps, Trees: Abstract Models for a Literary History. R.R. Donnelley & Sons.

Oelke, Daniela, Dimitrios Kokkinakis and Mats Malm. 2012. Advanced Visual Analytics Methods for Literature Analysis. In Proceedings of the 6th EACL Workshop, 35-44.

Rydberg-Cox, Jeff. 2011. Social Networks and the Language of Greek Tragedy. Journal of the Chicago Colloquium on Digital Humanities and Computer Science 1(3): 1-11.

Thomas, James and Kristin Cook. 2005. Illuminating the Path: The Research and Development Agenda for Visual Analytics. National Visualization and Analytics Center.

Vuillemot, Romain, Tanya Clement, Catherine Plaisant and Amit Kumar. 2009. What’s Being Said Near “Martha”? Exploring Name Entities in Literary Text Collections. In Proceedings of the IEEE Symposium on Visual Analytics Science and Technology, Atlantic City, New Jersey, 107-114.

http://www.clipartbest.com/clipart-9i4A55xiE
