

A Statistical Study of Latin Elegiac Couplets

C.W. Forstall¹ and W.J. Scheirer²

¹ Department of Classics, State University of New York at Buffalo

² Department of Computer Science, University of Colorado at Colorado Springs

Motivation

References

Elegiac Couplets

The Functional n-gram Analysis

The Significance of the bi-gram er

A Comparison of Two Meters

1. C. Forstall, S. Jacobson and W. Scheirer, “Evidence of Intertextuality: Investigating Paul the Deacon’s Angustae Vitae,” presented at Digital Humanities, July 2010.

2. N. Coffee, J. Koenig, S. Poornima, C. Forstall and R. Ossewaarde, The Tesserae Project. http://tesserae.caset.buffalo.edu

3. C. Forstall and W. Scheirer, “Features from Frequency: Authorship and Stylistic Analysis Using Repetitive Sound,” Chicago Colloquium on Digital Humanities and Computer Science, 2009.

4. M. Platnauer, Latin Elegiac Verse: A Study of the Metrical Usages of Tibullus, Propertius & Ovid. Cambridge University Press, 1951.

5. G. Conte, Latin Literature: A History, translated by J.B. Solodow. Johns Hopkins University Press, 1999.

Observation: Sound plays a fundamental role in an author’s style, particularly for poets.

Feature: The Functional n-gram

The functional n-gram is a feature for stylistic analysis that exploits the Zipfian distribution of n-grams: the n-grams that occur most frequently are selected as features, and their relative probabilities are preserved as the actual feature elements.

$$P(e_n \mid e_{n-N+1}^{n-1}) = \frac{C(e_{n-N+1}^{n-1}\, e_n)}{C(e_{n-N+1}^{n-1})} \quad \text{iff} \quad \mathrm{freq}(e_{n-N+1}^{n-1}\, e_n) > \varphi$$

Here C(·) counts the occurrences of a sequence in the text, and φ is a frequency threshold.

In this work, we consider primitive sound elements as functional character-level bi-grams.
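
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `functional_bigrams`, the choice to count bi-grams only inside words, and the reading of `freq` as a share of all bi-grams in the text are assumptions made for the example.

```python
from collections import Counter

def functional_bigrams(text, phi):
    """Return {bigram: P(second | first)} for bigrams whose share of all
    bigrams in the text exceeds the threshold phi."""
    # Assumption: keep letters only and never pair characters across words.
    words = ["".join(ch for ch in tok if ch.isalpha())
             for tok in text.lower().split()]
    pair_counts = Counter()     # C(e_{n-1} e_n)
    history_counts = Counter()  # C(e_{n-1}), counted where a letter follows
    for word in words:
        for first, second in zip(word, word[1:]):
            pair_counts[first + second] += 1
            history_counts[first] += 1
    total = sum(pair_counts.values())
    return {
        bigram: count / history_counts[bigram[0]]
        for bigram, count in pair_counts.items()
        if count / total > phi
    }
```

Calling `functional_bigrams("terra erat", 0.2)` keeps only `er` and `ra`, the two bi-grams that each account for more than 20% of the bi-grams in that toy string, and reports their conditional probabilities rather than their raw counts.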

ōd’ ĕt ămō. quār’ īd făcĭām, fōrtāssĕ rĕquīrīs.
nēscĭŏ, sēd fĭĕrī sēntĭ’ ĕt ēxcrŭcĭōr.

I hate and I love. Perhaps you ask why I do it?
I don't know, but I feel it happening, and I am in torment.

Catullus 85

nēscĭŏ quīd fūrtīvŭs ămōr părăt. ūtĕrĕ, qua͞esō,
dūm lĭcĕt: īn lĭquĭdā nāt tĭbĭ līntĕr ăquā.

Sneaky Love is up to something. Enjoy it while you can, I beg:
your boat sails in clear waters.

Tibullus 1.5, lines 75 & 76

sōlă vĭrō mŭlĭēr spŏlĭīs ēxūltăt ădēmptīs,
sōlă lŏcāt nōctēs, sōlă lĭcēndă vĕnīt,

Alone woman delights in what she steals from a man,
Alone she hires out her nights, alone she comes up for sale.

Ovid Amores 1.10, lines 29 & 30


The elegiac meter [4] is used for a variety of themes, most notably Love [5]. The elegiac couplet is a pair of two different one-line “verses”:

[Figure: metrical scheme of the two verses of the elegiac couplet]

In the above, — represents a long syllable and ˘˘ a pair of short syllables; the two symbols superimposed represent the poet's choice of either one long or two shorts. The first verse is identical to a verse of dactylic hexameter; the second, often called the “pentameter” verse of the couplet, is shorter by two half-feet.
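
As a rough illustration of the scheme just described, the sketch below enumerates the foot patterns each verse can take. The ASCII notation (`-` long, `u` short, `x` a final syllable of either length) is an assumption of this example, not the poster's notation, and the sketch takes the fifth hexameter foot to be a dactyl, setting aside the rarer spondaic fifth foot.

```python
from itertools import product

DACTYL = "-uu"   # long, short, short
SPONDEE = "--"   # long, long

def hexameter_patterns():
    """Feet 1-4 are free (dactyl or spondee); foot 5 is assumed to be a
    dactyl; foot 6 has two syllables, the last of either length."""
    return ["|".join(feet) + "|-uu|-x"
            for feet in product((DACTYL, SPONDEE), repeat=4)]

def pentameter_patterns():
    """Two free feet and a long syllable, a break (||), then two dactyls
    and a final syllable."""
    return ["|".join(feet) + "|-||-uu|-uu|x"
            for feet in product((DACTYL, SPONDEE), repeat=2)]

def syllable_count(pattern):
    """Count syllable symbols, ignoring the foot separators."""
    return sum(pattern.count(symbol) for symbol in "-ux")
```

Under these assumptions the hexameter admits 16 patterns of 13 to 17 syllables and the pentameter 4 patterns of 12 to 14 syllables, which makes concrete how much less room the second verse leaves the poet.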

We are interested in understanding the nature of the sound that is constrained by elegiac couplets – does it reflect the voice of the poet, or the general style of the elegiac form?

[Figure: Word Length in Elegiac Couplets and Dactylic Hexameters. Scatter plot of word length in characters per word (axis ticks 5.0 to 6.2) against sample index (0 to 2000). Series: catull. hex., catull. eleg., hor. hex., juv. hex., luc. hex., lucr. hex., ov. eleg., ov. hex., prop. eleg., stat. hex., tib. eleg., verg. hex.]

[Figure: Analysis of stylistic difference in the elegiac couplets of Catullus. Bi-gram frequencies for the sequence 'er' in 50-line samples: probability (axis ticks 0.1 to 0.35) against sample number (5 to 45). Groups and averages, in the order listed: Catullus 1-60, average 0.179; Catullus 61-64, average 0.171; Catullus 65-116, average 0.241. Annotation: major stylistic shift between 64 and 65; poem 65 begins the elegiac corpus.]

[Figure: Analysis of stylistic variation between different books of Tibullus. Bi-gram frequencies for the sequence 'er': probability (axis ticks 0.1 to 0.35) against sample number (5 to 35). Standard deviations, in the order listed: Tibullus Book 1, σ = 0.03226; Book 2, σ = 0.04335; Book 3, σ = 0.0532. Annotation: highest deviation in Book 3; Book 3 is generally attributed to other poets.]

Key to the word-length figure: red marks elegiac couplets; black marks dactylic hexameters.

The values taken on by a distinct functional n-gram have been found to vary by meter and poet, and they can reveal much about the style of a single poet.

Calculating the probabilities associated with er over a collection of 50-line samples spanning the entire Catullan corpus exposes a clear break between the elegiac poems (65-116) and the rest.
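
The per-sample calculation can be sketched as follows. This is a hedged illustration rather than the authors' code: it reads the probability of er as P(r | e) within words, assumes text without macrons, and cuts a poem into consecutive 50-line samples, dropping any short remainder. The function names are hypothetical.

```python
from statistics import mean

def bigram_probability(lines, first="e", second="r"):
    """P(second | first): how often `first` is followed by `second`,
    out of all cases where `first` is followed by a letter in the same word."""
    pair = history = 0
    for line in lines:
        for token in line.lower().split():
            letters = [ch for ch in token if ch.isalpha()]
            for a, b in zip(letters, letters[1:]):
                if a == first:
                    history += 1
                    pair += (b == second)
    return pair / history if history else 0.0

def sample_probabilities(lines, size=50):
    """One probability per consecutive sample of `size` lines."""
    starts = range(0, len(lines) - size + 1, size)
    return [bigram_probability(lines[i:i + size]) for i in starts]

def group_average(lines, size=50):
    """Average of the per-sample probabilities, or None if no full sample."""
    values = sample_probabilities(lines, size)
    return mean(values) if values else None
```

Plotting the list returned by `sample_probabilities` against its index would give a chart of the kind described for the Catullan corpus, and `group_average` corresponds to the averages quoted for each group of poems.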

The standard deviation of the bi-gram frequency er, calculated over samples drawn from a particular poet, indicates the additional presence of an author signal. For instance, for 50-line samples representing the three individual books of Tibullus, the highest standard deviation belongs to Book 3, which is attributed to a collection of poets, including Tibullus, Sulpicia, and other (often inferior) writers.

Beyond bi-gram frequencies, useful results were obtained from mean word length, the feature most sensitive to meter. The number of characters per word tended to be higher for dactylic hexameter than for elegiac couplets, both within and between authors. On this measure Catullus 64 was dramatically higher: it was separated completely from the rest of the Catullan corpus, and generally higher than samples from any author in either meter.
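
Both summary measures used here are simple to compute. The sketch below is illustrative only: it assumes that word length means the number of letters per whitespace-separated word with punctuation ignored, and it uses the population standard deviation, since the poster does not say which variant was used.

```python
from statistics import pstdev

def mean_word_length(lines):
    """Average number of letters per word; punctuation is ignored."""
    lengths = []
    for line in lines:
        for token in line.split():
            letters = sum(ch.isalpha() for ch in token)
            if letters:
                lengths.append(letters)
    return sum(lengths) / len(lengths) if lengths else 0.0

def spread(values):
    """Population standard deviation of per-sample feature values."""
    return pstdev(values) if values else 0.0
```

`spread` accepts any list of per-sample values, so the same call serves for per-sample er probabilities from one book of Tibullus or for per-sample word lengths.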

Problem: the study lacked a large database of poets who wrote in both meters. Solution: split the elegiac corpus into a hexameter half and a pentameter half by cutting each couplet in two.
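
A minimal sketch of the split, assuming the input is a clean list of verse lines in couplet order that begins with a hexameter and has no missing or extra lines (real editions would need more careful handling):

```python
def split_couplets(lines):
    """Return (hexameters, pentameters) from verse lines in couplet order."""
    verses = [line.strip() for line in lines if line.strip()]
    # Lines 1, 3, 5, ... are hexameters; lines 2, 4, 6, ... are pentameters.
    return verses[0::2], verses[1::2]
```

Each half can then be treated as its own corpus and sampled in the same way as any other text.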

A preliminary study considered Catullus, Ovid, Propertius, and Tibullus, using samples of 150 randomly chosen words. Features considered: the bi-gram frequency nt, the ratio um:am, and word length.

Results: all features are sensitive to the difference between hexameter and pentameter. While, as expected, word length was greater for the hexameter half of the elegiac couplet than for the pentameter, it was still not as high as for stichic (continuous) hexameters. One model to explain this postulates a blending of a genre-dependent signal with the meter signal.
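
The three features of the preliminary study could be computed along the following lines. The details are assumptions made for illustration: nt is read as the conditional probability P(t | n) within words, um:am as a ratio of raw bi-gram counts, and the 150 words are drawn without replacement. The poster does not specify any of these choices.

```python
import random

def clean(words):
    """Lowercase each token and keep only its letters."""
    stripped = ("".join(ch for ch in tok.lower() if ch.isalpha()) for tok in words)
    return [w for w in stripped if w]

def features(words):
    """Compute the nt probability, the um:am ratio, and mean word length."""
    words = clean(words)
    pairs = [a + b for w in words for a, b in zip(w, w[1:])]
    n_history = sum(pair[0] == "n" for pair in pairs)
    um, am = pairs.count("um"), pairs.count("am")
    return {
        "nt": pairs.count("nt") / n_history if n_history else 0.0,
        "um:am": um / am if am else None,  # undefined when am never occurs
        "word_length": sum(map(len, words)) / len(words) if words else 0.0,
    }

def random_sample(words, k=150, seed=0):
    """Draw up to k words without replacement, reproducibly."""
    rng = random.Random(seed)
    return rng.sample(words, min(k, len(words)))
```

A typical use would be `features(random_sample(hexameter_words))` and the same call on the pentameter words, repeated with different seeds to see how stable the difference between the two halves is.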

This work is part of an ongoing study [1, 3] of repetitive sound and its relationship to style in poetry.

Within the Digital Humanities, stylistic studies have been produced for a wide variety of literature, including poetry. Existing feature sets and analysis techniques have most often examined texts at the word level. A word-level examination captures only part of the underlying sound content of a poem, which is fundamental to its composition. Here we introduce a variety of sound-based statistical features found to be useful descriptors of Latin poetics.

In this work, we look at the role repetitive sound plays in the Latin elegiac couplet, where just a single character-level bi-gram can be a defining component of the form [1]. We are working to incorporate our feature sets and classification components into the University at Buffalo’s Tesserae [2] project, an online tool that gives scholars studying Latin poetry easy access to sophisticated textual analysis tools.

Functional n-grams for elegiac couplets:
er: top bi-gram that is common to all poets considered
nt: bi-gram with the greatest metrical variation
um: bi-gram sensitive to meter signal
am: bi-gram sensitive to meter signal

Latin Elegists considered in this study: Catullus, Ovid, Propertius, Tibullus

Other Latin poets considered in this study: Horace, Juvenal, Lucan, Lucretius, Statius, Vergil