unwritten history of literary practice
DESCRIPTION
text mining, distant reading, macroanalysis, eighteenth- and nineteenth-century literary history, data mining, machine learning
TRANSCRIPT
![Page 1: Unwritten History of Literary Practice](https://reader033.vdocuments.us/reader033/viewer/2022052601/5595f28c1a28ab5e0e8b47d9/html5/thumbnails/1.jpg)
THE UNWRITTEN HISTORY
OF LITERARY PRACTICE.
TED UNDERWOOD
FEB 28, 2013
![Page 2: Unwritten History of Literary Practice](https://reader033.vdocuments.us/reader033/viewer/2022052601/5595f28c1a28ab5e0e8b47d9/html5/thumbnails/2.jpg)
PRE- AND POST-1150 EXAMPLES
PRE: all, well, good, world, make, king, name
POST: general, country, power, state, present, interest, number
![Page 3: Unwritten History of Literary Practice](https://reader033.vdocuments.us/reader033/viewer/2022052601/5595f28c1a28ab5e0e8b47d9/html5/thumbnails/3.jpg)
DIFFERENTIATION OF FOUR GENRES.
[Figure: one series per genre (Poetry, Drama, Prose fiction, Nonfiction); x axis runs from 1700 to 1900, y axis from 1.0 to 3.0.]
The y axis is a ratio: number of pre-1150 words / number of post-1150 words.
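The ratio on the y axis can be sketched in a few lines. This is a minimal illustration, not the code behind the slide: the two word sets below are only the seven examples listed on the earlier slide (a real analysis would need complete lists of words dated by when they entered English), and the function name `pre_post_ratio` is hypothetical.

```python
import re
from collections import Counter

# Assumption: tiny illustrative word lists copied from the "examples" slide.
PRE_1150 = {"all", "well", "good", "world", "make", "king", "name"}
POST_1150 = {"general", "country", "power", "state", "present", "interest", "number"}


def pre_post_ratio(text):
    """Return (count of pre-1150 words) / (count of post-1150 words).

    Returns None when the text contains no post-1150 words, because the
    ratio is undefined in that case.
    """
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    pre = sum(counts[w] for w in PRE_1150)
    post = sum(counts[w] for w in POST_1150)
    if post == 0:
        return None
    return pre / post


sample = "The king made a good name in the world, but the state had power."
ratio = pre_post_ratio(sample)
```

Computed per volume and averaged by genre and year, a ratio like this would give one point per genre per time slice.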
![Page 4: Unwritten History of Literary Practice](https://reader033.vdocuments.us/reader033/viewer/2022052601/5595f28c1a28ab5e0e8b47d9/html5/thumbnails/4.jpg)
CORRELATE WITH THE RISING PRE-1150 TREND.
![Page 5: Unwritten History of Literary Practice](https://reader033.vdocuments.us/reader033/viewer/2022052601/5595f28c1a28ab5e0e8b47d9/html5/thumbnails/5.jpg)
CORRELATE NEGATIVELY WITH THE TREND.
![Page 6: Unwritten History of Literary Practice](https://reader033.vdocuments.us/reader033/viewer/2022052601/5595f28c1a28ab5e0e8b47d9/html5/thumbnails/6.jpg)
HOW DO YOU FIND THE FICTION IN A COLLECTION OF 469,000 VOLUMES?
![Page 7: Unwritten History of Literary Practice](https://reader033.vdocuments.us/reader033/viewer/2022052601/5595f28c1a28ab5e0e8b47d9/html5/thumbnails/7.jpg)
1. Tag a “training corpus” of example documents.
2. Identify features.
3. Train an ensemble.
[Diagram: the label "naive Bayes" appears four times.]
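The three steps above can be sketched as a toy pipeline. Everything specific here is an assumption made for illustration: the training examples are invented, the features are bare word tokens, and the ensemble simply averages the probabilities of two naive Bayes models (one trained on titles, one on text), which is a simpler stand-in for whatever combination rule the talk actually used.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict_proba(self, tokens):
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                if tok in self.vocab:  # skip words never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            log_scores[label] = score
        # Turn log scores into probabilities that sum to 1.
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        z = sum(exp.values())
        return {k: v / z for k, v in exp.items()}


def ensemble_predict(models, views):
    """Average class probabilities across models, one feature view per model."""
    avg = Counter()
    for model, tokens in zip(models, views):
        for label, p in model.predict_proba(tokens).items():
            avg[label] += p / len(models)
    return max(avg, key=avg.get)


# Step 1: a hypothetical tagged training corpus (title tokens, text tokens, genre).
train = [
    (["a", "novel"], ["she", "said", "he", "looked"], "fiction"),
    (["a", "tale"], ["he", "said", "she", "smiled"], "fiction"),
    (["a", "treatise"], ["the", "state", "of", "commerce"], "nonfiction"),
    (["history", "of", "england"],
     ["the", "general", "state", "of", "the", "country"], "nonfiction"),
]
genres = [g for _, _, g in train]

# Steps 2 and 3: word tokens as features, one model per feature view.
title_model = NaiveBayes().fit([t for t, _, _ in train], genres)
text_model = NaiveBayes().fit([x for _, x, _ in train], genres)

guess = ensemble_predict([title_model, text_model],
                         [["a", "tale"], ["she", "looked"]])
```

Training separate models on separate views keeps a short, informative title from being drowned out by the much longer body text.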
![Page 8: Unwritten History of Literary Practice](https://reader033.vdocuments.us/reader033/viewer/2022052601/5595f28c1a28ab5e0e8b47d9/html5/thumbnails/8.jpg)
NAIVE BAYES ON TEXT AND TITLES, COMBINED WITH LOGISTIC REGRESSION.
(432 VOLS HELD OUT FROM CORPUS OF 1356 19C VOLS.)

| actual (rows), predicted (columns) | prose nonfiction | prose fiction | verse and drama | Recall |
|---|---|---|---|---|
| prose nonfiction | 118 | 5 | 0 | 0.959 |
| prose fiction | 1 | 143 | 1 | 0.986 |
| verse and drama | 0 | 7 | 157 | 0.957 |
| Precision | 0.992 | 0.923 | 0.994 | |
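The recall and precision figures follow directly from the counts, reading rows as actual genres and columns as predicted genres. The short check below recomputes them; the function names are hypothetical, and the overall accuracy is derived from the same counts rather than stated on the slide.

```python
GENRES = ["prose nonfiction", "prose fiction", "verse and drama"]

# Rows are actual genres, columns are predicted genres (counts from the slide).
CONFUSION = [
    [118, 5, 0],
    [1, 143, 1],
    [0, 7, 157],
]


def recall(matrix, i):
    """Share of volumes that actually belong to class i and were labeled i."""
    return matrix[i][i] / sum(matrix[i])


def precision(matrix, j):
    """Share of volumes labeled j that actually belong to class j."""
    return matrix[j][j] / sum(row[j] for row in matrix)


def accuracy(matrix):
    """Share of all held-out volumes that were labeled correctly."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


for k, genre in enumerate(GENRES):
    print(f"{genre}: recall {recall(CONFUSION, k):.3f}, "
          f"precision {precision(CONFUSION, k):.3f}")
```

The three rows sum to 432, matching the number of held-out volumes. Most errors land in the "prose fiction" column, which is why that class has the lowest precision.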
![Page 9: Unwritten History of Literary Practice](https://reader033.vdocuments.us/reader033/viewer/2022052601/5595f28c1a28ab5e0e8b47d9/html5/thumbnails/9.jpg)
Weight classifiers by proximity to the date of the unknown document.
[Figure: timeline from 1700 to 1900 with the labels "18c classifier" and "19c classifier".]
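One way to picture this weighting is a linear blend between the two period classifiers. The slide does not give the weighting function, so the linear interpolation, the center years 1750 and 1850, and the function names below are all assumptions chosen for illustration.

```python
def blend_weights(doc_year, early_center=1750, late_center=1850):
    """Return (weight for the 18c classifier, weight for the 19c classifier).

    Assumption: weights move linearly from the early classifier to the late
    one between the two center years, and are clamped outside that range.
    """
    t = (doc_year - early_center) / (late_center - early_center)
    t = min(1.0, max(0.0, t))
    return 1.0 - t, t


def blended_probability(doc_year, p_early, p_late):
    """Combine the two classifiers' probabilities for one class."""
    w_early, w_late = blend_weights(doc_year)
    return w_early * p_early + w_late * p_late


# A volume dated 1775 leans on the 18c classifier three times as heavily.
p_fiction = blended_probability(1775, p_early=0.8, p_late=0.4)
```

A document from 1800 would draw equally on both classifiers, while one from 1700 or 1900 would rely on a single classifier.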
![Page 10: Unwritten History of Literary Practice](https://reader033.vdocuments.us/reader033/viewer/2022052601/5595f28c1a28ab5e0e8b47d9/html5/thumbnails/10.jpg)
![Page 11: Unwritten History of Literary Practice](https://reader033.vdocuments.us/reader033/viewer/2022052601/5595f28c1a28ab5e0e8b47d9/html5/thumbnails/11.jpg)
![Page 12: Unwritten History of Literary Practice](https://reader033.vdocuments.us/reader033/viewer/2022052601/5595f28c1a28ab5e0e8b47d9/html5/thumbnails/12.jpg)
FEATURES CONSISTENTLY MORE COMMON (WILCOXON TEST, N = 220)
IN FIRST PERSON | IN THIRD PERSON
![Page 13: Unwritten History of Literary Practice](https://reader033.vdocuments.us/reader033/viewer/2022052601/5595f28c1a28ab5e0e8b47d9/html5/thumbnails/13.jpg)
MEAN SIMILARITY TO FIRST-PERSON.
Mean prob. of “first-person” for all fiction vols. I’ve left out 1700-1720 here, because the sample size is so small.
[Figure: x axis "timespan" (ticks at 1750, 1800, 1850); y axis "mean first" (ticks from 0.3 to 0.6).]
![Page 14: Unwritten History of Literary Practice](https://reader033.vdocuments.us/reader033/viewer/2022052601/5595f28c1a28ab5e0e8b47d9/html5/thumbnails/14.jpg)
WHAT DO THIRD-PERSON
NARRATORS TALK ABOUT?
herself -0.489
himself -0.475
him -0.440
had -0.369
eyes -0.312
was -0.281
face -0.274
hers -0.269
voice -0.249
remembered -0.246
lips -0.243
felt -0.242
turned -0.231
girl -0.227
pale -0.226
loved -0.226
watched -0.223
trembling -0.222
looked -0.222
conscious -0.219
smile -0.216
sudden -0.212
silent -0.209
silence -0.206
husband -0.204
daughter -0.203
![Page 15: Unwritten History of Literary Practice](https://reader033.vdocuments.us/reader033/viewer/2022052601/5595f28c1a28ab5e0e8b47d9/html5/thumbnails/15.jpg)
![Page 16: Unwritten History of Literary Practice](https://reader033.vdocuments.us/reader033/viewer/2022052601/5595f28c1a28ab5e0e8b47d9/html5/thumbnails/16.jpg)
“WE DIDN’T NEED FIRST PERSON. WE HAD … FACES!!” N = 47,500, R = -0.247
[Figure: x axis "log(pronounratio+1)" (ticks from 0.0 to 1.5); y axis "aggregate frequency of 'facial gestures'" (ticks from 0.000 to 0.010).]
bodily signs of emotion: eyes, face, voice, lips, smile, glance, tears, pale, trembling, sigh
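The two quantities on this plot, and the correlation between them, can be sketched as follows. This is a hedged illustration: the slide does not say which logarithm was used (natural log is assumed here), "pronounratio" is read as the ratio of first-person to third-person pronouns, and the function names are hypothetical.

```python
import math
import re

# The ten "bodily signs of emotion" listed on the slide.
GESTURES = {"eyes", "face", "voice", "lips", "smile",
            "glance", "tears", "pale", "trembling", "sigh"}


def log_pronoun_ratio(first_count, third_count):
    """log(ratio + 1), where ratio = first-person / third-person pronouns.

    Assumption: natural log; third_count must be greater than zero.
    """
    return math.log(first_count / third_count + 1)


def gesture_frequency(text):
    """Share of a text's tokens that are 'facial gesture' words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(tok in GESTURES for tok in tokens) / len(tokens)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

With one (x, y) pair per unit of text, `pearson_r` returns a value between -1 and 1; a negative value like the one reported on the slide means gesture words become rarer as first-person pronouns become more dominant.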
![Page 17: Unwritten History of Literary Practice](https://reader033.vdocuments.us/reader033/viewer/2022052601/5595f28c1a28ab5e0e8b47d9/html5/thumbnails/17.jpg)
FIRST-PERSON NARRATORS QUANTIFY MORE: R = 0.21 ON N = 47,500
[Figure: x axis "log(+1) ratio of first to third person pronouns" (ticks from 0.5 to 1.5); y axis "aggregate frequency of numbers" (ticks from 0.005 to 0.015).]
![Page 18: Unwritten History of Literary Practice](https://reader033.vdocuments.us/reader033/viewer/2022052601/5595f28c1a28ab5e0e8b47d9/html5/thumbnails/18.jpg)
DEFOE’S ‘QUANTIFYING NARRATOR’ IS NOT ALONE.*
“I never saw them afterwards, or any sign of them, except three of their hats, one cap, and two shoes that were not fellows.”
The Life and Strange Surprising Adventures of Robinson Crusoe
see also …
Perseverance Island, or the Robinson Crusoe of the 19c.
The Boy Tar: or, A Voyage in the Dark
A Lady’s Experiences in the Wild West in 1883
The Swiss Family Robinson
The Shipwreck and Adventures of M. Pierre Viaud
etc. etc. etc.
* ‘quantifying narrator’ h/t Brett D. Wilson.
![Page 19: Unwritten History of Literary Practice](https://reader033.vdocuments.us/reader033/viewer/2022052601/5595f28c1a28ab5e0e8b47d9/html5/thumbnails/19.jpg)
WHAT GETS QUANTIFIED (H/T PATRICK JUOLA)
IN FIRST PERSON: inches, canoes, feet, guns, savages, pistols, barrels, englishmen, gallons, slaves
IN THIRD: centuries, figures, tears, friends, eyes