Data visualization and digital humanities research: a survey of available data sets and tools

LITA National Forum 2011, St. Louis, MO
Friday, September 30, 2011
Erik Mitchell, University of Maryland
Susan Sharpless Smith, Wake Forest University


Motivation

“Digital humanities needs gateway drugs. Kudos to the pushers on the Google Books team.” - Dan Cohen http://www.dancohen.org/2010/12/19/

“Linked open data could have the same leveraging effect that the World Wide Web had on computing, said Micki McGee, an assistant professor of sociology at Fordham University” - Steve Kolowich, The Promise of Digital Humanities, Inside Higher Ed

Birth of a word

“Imagine if you could record your life, everything you said, everything you did, available in a perfect memory store at your fingertips.”

- Deb Roy, The Birth of a Word, http://www.ted.com/

Overview

• Discuss examples of data-focused research tools
• Explore tools
• Consider roles for librarians
• Wrap-up / Q&A

Taxonomy of uses

Resource type            Research methods
Discovery                Text searching, citation chaining, concept exploration
Visualization            Mapping, graphing, charting
Analysis / publishing    Dataset publishing, statistical analysis, annotation

Searching and Discovery

Examples: BYU Corpora http://corpus.byu.edu/

Web of Knowledge (WOK) Citation Mapping

Visualization

Free Visualization Tools

Analysis and publishing

NodeXL http://nodexl.codeplex.com/

Tool Comparison - linguistics

Evaluation areas                     Tool features
Index approach features              Concordancing, lemmatization, semantic relationships, collocation/KWIC, sense disambiguation
External links / interoperability    Links to lexical databases (e.g. WordNet), data export, metadata structures, common search features
Dataset population                   Population definition, open or closed, data source, synchronic/diachronic, mono-, bi-, or plurilingual

Tool exploration

• Discover / Search
  • What kinds of discovery tools exist, and how common are the discovery features across different datasets / systems?
• Visualization
  • What visualization features exist? Are there products that are easy to use? Are the skills transferable?
• Analysis / Annotation
  • What analytical tools are included? What analysis techniques are common?

Perseus

http://www.perseus.tufts.edu

JSTOR Data For Research

http://dfr.jstor.org

Wordseer

Aditi Muralidharan, Marti Hearst
http://bebop.berkeley.edu/wordseer

Google’s Ngram Viewer

books.google.com/ngrams

culturomics.org

“But here's the rub. Google Books, as others point out, wasn't really built for research. . . That means Google Books didn't come with the interfaces scholars need for vast data manipulation . . .”
- http://chronicle.com/article/The-Humanities-Go-Google/65713/
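
To make the idea behind an n-gram viewer concrete, here is a minimal sketch of computing the relative frequency of a phrase per year. It is a toy illustration only, not Google's implementation: the function names, the whitespace tokenizer, and the two-sentence corpus are all assumptions made for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined into a phrase."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency(corpus_by_year, phrase):
    """Share of same-length n-grams in each year that equal the phrase."""
    n = len(phrase.split())
    result = {}
    for year, text in sorted(corpus_by_year.items()):
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        grams = ngrams(tokens, n)
        counts = Counter(grams)
        result[year] = counts[phrase.lower()] / len(grams) if grams else 0.0
    return result


# Hypothetical toy corpus: one short "book" per year.
toy_corpus = {
    1900: "the digital age is not yet here",
    2000: "digital humanities and digital libraries grow",
}

freq = relative_frequency(toy_corpus, "digital")
bigram_freq = relative_frequency(toy_corpus, "digital humanities")
for year in freq:
    print(year, round(freq[year], 3), round(bigram_freq[year], 3))
```

Dividing by the number of n-grams in each year, rather than reporting raw counts, keeps years with more digitized text from dominating the curve.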

TED talk on Google Ngram Viewer

http://www.ted.com/talks/what_we_learned_from_5_million_books.html

Concordancing

Eric Lease Morgan - http://dh.crc.nd.edu/sandbox/cyl/catalog/
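
For readers new to concordancing, the following is a minimal keyword-in-context (KWIC) sketch. It is not the code behind the catalog linked above; the function name `kwic`, the regular-expression tokenizer, and the sample sentence are assumptions chosen for illustration.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, match, right context) for each keyword hit."""
    tokens = re.findall(r"[A-Za-z']+", text)  # simple word tokenizer (assumption)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


sample = "To be, or not to be, that is the question."
concordance = kwic(sample, "be", width=2)

# Print each hit with the keyword aligned in a center column.
for left, word, right in concordance:
    print(f"{left:>12} | {word} | {right}")
```

Aligning every hit on the keyword is what lets a reader scan the surrounding words for recurring patterns.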

Google’s Public Data Explorer

http://www.google.com/publicdata/

Data cleaning – Google Refine

http://code.google.com/p/google-refine
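
One common data-cleaning task is grouping spelling variants of the same name. The sketch below is loosely modeled on the key-collision "fingerprint" idea associated with Google Refine's clustering feature; it is not Refine's actual code, and the function names and sample values are assumptions for illustration.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Normalize a string so that trivially different variants share one key."""
    # Strip accents by decomposing characters and dropping non-ASCII marks.
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Trim, lowercase, and remove ASCII punctuation.
    cleaned = ascii_text.strip().lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    # Sort and de-duplicate tokens so word order no longer matters.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values whose fingerprints collide; keep groups of two or more."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical messy author column.
names = [
    "Shakespeare, William",
    "William Shakespeare",
    "william shakespeare ",
    "Marlowe, Christopher",
]
clusters = cluster(names)
print(clusters)
```

Each cluster is a candidate for merging; a person should still review the groups, since unrelated values can occasionally share a key.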

Data visualization – Google Fusion Tables

http://google.com/fusiontables

http://www.google.com/fusiontables/DataSource?dsrcid=332788

Research/teaching need

• Researcher needs vary from advanced linguistic analysis and IT support to basic digital content and infrastructure

Corpus-based research

Librarian contributions

• Domain-specific and tool-type-specific comparisons

• IT and research support: data analysis, data curation, identification of tools and data sources

• Shift from “reference” to “research”, in step with the move from resource discovery to thematic analysis

Next steps

• Build new skills, develop new systems
• Create tutorials and guides
• Explore connections between data curation / publishing and these tools: is there a connection?
• Explore the role of library discovery systems and consider new feature implementation

Sites of interest

Data analysis
• Google Refine
• RapidMiner
• Lingua tools (http://search.cpan.org/~emorgan/)
• http://alias-i.com/lingpipe/web/competition.html
• Digital Resource Tools

Visualization
• NodeXL
• Google Public Data Explorer
• Google Fusion Tables
• http://bit.ly/lita_datatools
• Projectbamboo.org

Data publishing
• Corpus of Contemporary American English
• British National Corpus
• http://corpus.byu.edu/
• JSTOR DFR
• digitalresearchtools.pbworks.com

Discovery
• Wordseer
• Perseus (Tufts)
• Google Ngram Viewer
• Corpus.byu.edu