![Page 1: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/1.jpg)
Part-of-Speech Tagging for Historical English
Yi Yang and Jacob Eisenstein, Georgia Tech
![Page 2: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/2.jpg)
[Muralidharan and Hearst, 2011 & 2012]
‣ Digital humanities research
‣ How does the portrayal of men and women differ in Shakespeare’s plays?
‣ What are the language use patterns in North American slave narratives?
‣ NLP can help!
‣ Only if NLP works for historical texts…
![Page 5: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/5.jpg)
Early Modern English
Hee said nobody had said anything agt mee.
[Henry Oxinden, 1660]
‣ Spelling variation: Hee → He, agt → against, mee → me
![Page 7: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/7.jpg)
Stanford POS Tagger
Hee said nobody had said anything agt mee.
‣ Spelling variation
[Figure: Stanford tagger output aligned with the gold tags; three tokens are marked as errors (X).]
![Page 9: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/9.jpg)
Transfer Loss for POS Tagging
[Rayson et al., 2007]
Error rate (%):
‣ Modern English: 3.0
‣ Early Modern English: 18.0
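The error rate on this slide is a token-level measure: the share of tokens whose predicted tag differs from the gold tag. A minimal sketch of that computation follows; the tag sequences are hypothetical values chosen for illustration and are not the output of any real tagger.

```python
def error_rate(predicted, gold):
    """Percentage of tokens whose predicted tag differs from the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("tag sequences must have the same length")
    if not gold:
        raise ValueError("tag sequences must be non-empty")
    wrong = sum(p != g for p, g in zip(predicted, gold))
    return 100.0 * wrong / len(gold)


# Hypothetical tags for an eight-token sentence (assumption, for illustration only).
gold = ["PRP", "VBD", "NN", "VBD", "VBN", "NN", "IN", "PRP"]
predicted = ["NNP", "VBD", "NN", "VBD", "VBN", "NN", "NN", "NN"]
rate = error_rate(predicted, gold)  # 3 of 8 tokens differ
```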
![Page 11: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/11.jpg)
Approaches
‣ Spelling normalization: map from historical spellings to contemporary forms.
  Rayson et al. (2007), Scheible et al. (2011), Bollmann (2011)
‣ Domain adaptation (this work): build robust NLP systems with representation learning.
  Yang & Eisenstein (2014), Yang & Eisenstein (2015)
![Page 13: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/13.jpg)
Spelling Normalization
[VARD; Baron and Rayson, 2008]
Original: Hee said nobody had said anything agt mee.
Normalized: Hee said nobody had said anything aged me.
‣ Correct normalization: mee → me
‣ Incorrect normalization: agt → aged (should be against)
‣ False negative: Hee is left unchanged (should be He)
[Figure: Stanford tagger output on the normalized sentence aligned with the gold tags; tagging errors (X) remain.]
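The three normalization outcomes on this slide can be illustrated with a toy lookup-based normalizer. This is a sketch under stated assumptions: `TOY_NORMALIZER` is a made-up table that reproduces the slide's example, not VARD's actual rules, and `classify` is a hypothetical helper.

```python
# Toy lookup table (assumption): reproduces the slide's output, not VARD itself.
TOY_NORMALIZER = {"agt": "aged", "mee": "me"}

# Intended modern forms for the example sentence, as given on the slides.
GOLD = {"Hee": "He", "agt": "against", "mee": "me"}


def classify(token):
    """Label what the toy normalizer does to a single token."""
    output = TOY_NORMALIZER.get(token, token)
    target = GOLD.get(token, token)
    if output == target:
        return "correct" if output != token else "unchanged"
    if output == token:
        return "false negative"  # a variant spelling the normalizer missed
    return "incorrect"  # the normalizer changed the token to the wrong form


sentence = "Hee said nobody had said anything agt mee".split()
outcomes = {tok: classify(tok) for tok in sentence if tok in GOLD}
```

Each kind of failure leaves a token that a tagger trained on modern text may still mishandle, which is the motivation the slides give for moving to representation learning.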
![Page 19: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/19.jpg)
Representation Learning
Hee said nobody had said anything agt mee.
‣ OOV (out-of-vocabulary): Hee; context: said, was, came, told, …
‣ IV (in-vocabulary): He, I, We, …; context: said, was, came, told, …
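The slide's point is that an out-of-vocabulary spelling such as "Hee" occurs in the same contexts as in-vocabulary words such as "He". A rough way to see this is to compare next-word count vectors with cosine similarity. The corpus below is made up for illustration, and this counting scheme is only a stand-in for the learned representations discussed in the talk.

```python
import math
from collections import Counter

# Tiny made-up corpus (assumption): each pair is (word, following word).
pairs = [
    ("He", "said"), ("He", "was"), ("He", "came"), ("I", "said"),
    ("We", "told"), ("Hee", "said"), ("Hee", "was"), ("Hee", "told"),
    ("nobody", "had"), ("anything", "agt"),
]


def context_vector(word):
    """Counts of the words that follow `word` in the toy corpus."""
    return Counter(ctx for w, ctx in pairs if w == word)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


sim_he = cosine(context_vector("Hee"), context_vector("He"))
sim_nobody = cosine(context_vector("Hee"), context_vector("nobody"))
```

In this toy corpus "Hee" shares two of its three contexts with "He" and none with "nobody", so `sim_he` exceeds `sim_nobody`.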
![Page 23: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/23.jpg)
Model
![Page 24: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/24.jpg)
Feature Embeddings
[FEMA; Yang and Eisenstein, 2015]
Hee said nobody had said anything agt mee.
Features of the first token:
1. CurrWord=hee
2. NextWord=said
3. Prefix1=h
4. Suffix1=e
…
Input embedding: $u_2$; output embeddings: $v_1$, $v_3$, $v_4$
$$p(f_t \mid f_2) \propto \exp\left(u_2^\top v_t\right)$$
$$\ell = \sum_{t \neq 2}^{T} \log p(f_t \mid f_2)$$
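The objective above can be evaluated directly on a toy example. The sketch below assumes that the proportionality is normalized with a softmax over a small, hypothetical set of candidate values per feature template; the candidate sets, the dimension, and the random initialization are all illustrative assumptions, and the slides do not say how the normalization is computed in practice.

```python
import math
import random

random.seed(0)
DIM = 4  # embedding size (assumption)


def random_vector():
    return [random.uniform(-0.5, 0.5) for _ in range(DIM)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical candidate values for each feature template (numbered as on the slide).
candidates = {
    1: ["CurrWord=hee", "CurrWord=he"],
    2: ["NextWord=said", "NextWord=was"],
    3: ["Prefix1=h", "Prefix1=s"],
    4: ["Suffix1=e", "Suffix1=d"],
}
u = {f: random_vector() for fs in candidates.values() for f in fs}  # input embeddings
v = {f: random_vector() for fs in candidates.values() for f in fs}  # output embeddings


def log_prob(target, given, template):
    """log p(f_t | f_given), normalizing exp(u^T v) over the template's candidates."""
    scores = {f: dot(u[given], v[f]) for f in candidates[template]}
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    return scores[target] - log_z


def objective(active, source=2):
    """Sum of log p(f_t | f_source) over all templates t != source."""
    return sum(log_prob(active[t], active[source], t) for t in active if t != source)


active = {1: "CurrWord=hee", 2: "NextWord=said", 3: "Prefix1=h", 4: "Suffix1=e"}
ell = objective(active)
```

Training would adjust `u` and `v` to increase `ell` over all tokens; that optimization step is omitted here.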
![Page 32: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/32.jpg)
Word Embeddings
[word2vec; Mikolov et al., 2013]
‣ Word embeddings
  ‣ Units are words: hee, said, nobody, had, …
  ‣ Generic representations
  ‣ Word co-occurrences
‣ Feature embeddings
  ‣ Units are features: CurrWord=hee, NextWord=said, Prefix1=h, Suffix1=e, …
  ‣ Task-specific representations
  ‣ Feature co-occurrences
![Page 37: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/37.jpg)
Learning from Multiple Domains
[FEMA; Yang and Eisenstein, 2015]
‣ Previous work on unsupervised domain adaptation involves two domains.
‣ Unsupervised multi-domain adaptation
![Page 40: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/40.jpg)
Multiple Feature Embeddings
[FEMA; Yang and Eisenstein, 2015]
Hee said nobody had said anything agt mee.
Domain attributes: Genre = letters; Epoch = 1600+
Features of the first token:
1. CurrWord=hee
2. NextWord=said
3. Prefix1=h
4. Suffix1=e
…
Each feature embedding is the sum of a shared embedding and one embedding per domain attribute:
$$u_2 = h_2^{(\text{shared})} + h_2^{(\text{letters})} + h_2^{(\text{1600+})}$$
$$p(f_t \mid f_2) \propto \exp\left(u_2^\top v_t\right)$$
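The additive composition of $u_2$ can be sketched in a few lines. The embedding values, the dimension, and the extra `1500+` attribute below are made up for illustration; only the structure (shared vector plus one vector per active domain attribute) follows the slide.

```python
DIM = 3  # embedding size (assumption)

# Hypothetical embedding tables h^(attribute) for feature 2 (NextWord=said).
h = {
    "shared": {"NextWord=said": [0.2, -0.1, 0.4]},
    "letters": {"NextWord=said": [0.1, 0.0, -0.2]},
    "1600+": {"NextWord=said": [-0.3, 0.5, 0.1]},
    "1500+": {"NextWord=said": [0.9, 0.9, 0.9]},  # not active for this sentence
}


def compose(feature, attributes):
    """u = h^(shared) + sum of h^(a) over the instance's domain attributes."""
    total = [0.0] * DIM
    for name in ["shared", *attributes]:
        vec = h[name].get(feature, [0.0] * DIM)
        total = [t + x for t, x in zip(total, vec)]
    return total


# A letter written in the 1600+ epoch activates the genre and epoch embeddings.
u2 = compose("NextWord=said", attributes=["letters", "1600+"])
```

Because the shared component is part of every sum, information learned in one genre or epoch can carry over to instances from others, while the attribute-specific vectors capture what differs.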
![Page 49: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/49.jpg)
Experiments
![Page 50: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/50.jpg)
Penn Corpora of Historical English
[Kroch and Taylor, 2000; Kroch et al., 2004]
Modern British English (MBE), # of tokens:
‣ 1840-1914: 322,255
‣ 1770-1839: 427,424
‣ 1700-1769: 343,024
Early Modern English (EME), # of tokens:
‣ 1640-1710: 614,315
‣ 1570-1639: 706,587
‣ 1500-1569: 640,255
![Page 51: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/51.jpg)
Tagset Mappings
[Moon and Baldridge, 2007]
‣ Penn Corpora of Historical English (PCHE) tagset: 83 tags
‣ Penn Treebank (PTB) tagset: 45 tags
PCHE → PTB:
‣ ADJ → JJ
‣ ADV, ALSO → RB
‣ VB, VBI → VB
‣ … → …
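A many-to-one tagset mapping of this kind is naturally expressed as a lookup table. The sketch below contains only the five mappings visible on the slide; the rest of the table is elided there and is not filled in here. The function name `to_ptb` is a hypothetical helper.

```python
# Only the mappings shown on the slide; the full PCHE-to-PTB table is larger.
PCHE_TO_PTB = {
    "ADJ": "JJ",
    "ADV": "RB",
    "ALSO": "RB",
    "VB": "VB",
    "VBI": "VB",
}


def to_ptb(pche_tags):
    """Map PCHE tags to PTB tags; raise on tags outside this partial table."""
    missing = [t for t in pche_tags if t not in PCHE_TO_PTB]
    if missing:
        raise KeyError(f"no mapping for PCHE tags: {missing}")
    return [PCHE_TO_PTB[t] for t in pche_tags]


mapped = to_ptb(["ADJ", "ALSO", "VBI"])
```

Raising on unknown tags, rather than passing them through, makes gaps in the mapping visible before they distort an evaluation.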
![Page 53: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/53.jpg)
Systems
‣ Support vector machine (SVM) tagger
  ‣ Sixteen basic feature templates by Ratnaparkhi (1996)
‣ Representation learning methods [Blitzer et al., 2006; Brown et al., 1992; Mikolov et al., 2013]
  ‣ Structural correspondence learning (SCL)
  ‣ Brown clustering
  ‣ word2vec embeddings
  ‣ Multiple feature embeddings (FEMA)
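To make the notion of a feature template concrete, here is a sketch that extracts the four example features shown earlier in the talk (CurrWord, NextWord, Prefix1, Suffix1). It covers only those four, not the full set of sixteen templates, and the `<END>` padding symbol is an assumption introduced for this example.

```python
def extract_features(tokens, i):
    """Four slide-style features for token i (lowercased, as in CurrWord=hee)."""
    word = tokens[i].lower()
    # "<END>" is a hypothetical padding symbol for the last token.
    nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else "<END>"
    return [
        f"CurrWord={word}",
        f"NextWord={nxt}",
        f"Prefix1={word[:1]}",
        f"Suffix1={word[-1:]}",
    ]


tokens = "Hee said nobody had said anything agt mee .".split()
features = extract_features(tokens, 0)
```

Even when `CurrWord=hee` is unseen in modern training data, `NextWord=said` and `Prefix1=h` are shared with the modern spelling, which is what feature-level representations can exploit.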
![Page 55: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/55.jpg)
Temporal Adaptation
[Figure: the Modern British English (MBE) and Early Modern English (EME) token-count charts, with the three periods of each corpus labeled Train, Test1, and Test2.]
![Page 56: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/56.jpg)
Results: Modern British English
Average error rate (%):
‣ Baseline: 4.6
‣ SCL: 4.3
‣ Brown: 4.2
‣ word2vec: 4.4
‣ FEMA (our method): 3.7 (-0.9)
![Page 59: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/59.jpg)
Results: Early Modern English
Average error rate (%):
‣ Baseline: 9.4
‣ SCL: 8.2
‣ Brown: 8.0
‣ word2vec: 8.3
‣ FEMA (our method): 6.6 (-2.8)
![Page 62: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/62.jpg)
Adaptation from PTB
The Penn Treebank is the training set (Train); the two historical corpora are the test sets (Test1, Test2).
# of tokens:
‣ Penn Treebank: 969,905
‣ Modern British English: 1,092,703
‣ Early Modern English: 1,961,157
![Page 63: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/63.jpg)
Adaptation from PTB
Standard evaluation scenario for English POS tagging.
‣ Low-resource languages
‣ Specific genres, styles, or epochs
Insufficient data annotation for historical texts.
![Page 65: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/65.jpg)
Results: Modern British English
Error rate (%):
‣ Baseline: 18.9
‣ SCL: 18.4
‣ Brown: 18.4
‣ word2vec: 18.3
‣ FEMA (our method): 17.5 (-1.4)
![Page 68: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/68.jpg)
Results: Early Modern English
Error rate (%):
‣ Baseline: 25.9
‣ SCL: 24.1
‣ Brown: 24.0
‣ word2vec: 24.2
‣ FEMA (our method): 22.1 (-3.8)
![Page 71: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/71.jpg)
Normalization vs. Representation Learning
[Bar chart: error rate (%). Baseline 25.9; FEMA (representation learning) 22.1 (-3.8).]
![Page 72: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/72.jpg)
Normalization vs. Representation Learning
[Bar chart: error rate (%). Baseline 25.9; VARD (spelling normalization) 23.3 (-2.6); FEMA (representation learning) 22.1 (-3.8).]
![Page 73: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/73.jpg)
Normalization vs. Representation Learning
[Bar chart: error rate (%). Baseline 25.9; VARD (spelling normalization) 23.3 (-2.6); FEMA (representation learning) 22.1 (-3.8); FEMA+VARD (representation learning + normalization) 21.0 (-4.9).]
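The deltas on this slide are absolute drops in error rate relative to the baseline. As a minimal sketch, the snippet below recomputes them from the four error rates shown on the slide and also prints the relative reduction. The dictionary layout and function names are illustrative assumptions, not part of the original work.

```python
# Error rates (%) as shown on the slide; the dictionary layout is illustrative.
error_rate = {
    "Baseline": 25.9,
    "VARD": 23.3,
    "FEMA": 22.1,
    "FEMA+VARD": 21.0,
}


def absolute_reduction(system, baseline="Baseline"):
    """Percentage-point drop in error rate relative to the baseline."""
    return round(error_rate[baseline] - error_rate[system], 1)


def relative_reduction(system, baseline="Baseline"):
    """Fraction of the baseline's errors that the system removes."""
    return (error_rate[baseline] - error_rate[system]) / error_rate[baseline]


for name in ("VARD", "FEMA", "FEMA+VARD"):
    print(f"{name}: -{absolute_reduction(name)} points, "
          f"{relative_reduction(name):.1%} relative")
```

The combined gain (4.9 points) is larger than either individual gain but smaller than their sum (6.4 points), which is consistent with the two techniques being complementary while partly overlapping.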
![Page 74: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/74.jpg)
Error Analysis
‣ Annotation inconsistencies and tagset mismatches

| token | annotations in PCHE | annotations in PTB |
|---|---|---|
| , (comma) | , (comma; 83.4%) . (period; 16.6%) | , (comma) |
| . (period) | , (comma; 12.3%) . (period; 87.7%) | . (period) |
| to | TO (54.6%) IN (44.3%) | TO |
| all/any/every | JJ | DT |
![Page 75: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/75.jpg)
![Page 76: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/76.jpg)
![Page 77: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/77.jpg)
Error Analysis
‣ Annotation inconsistencies and tagset mismatches

| token | annotations in PCHE | annotations in PTB |
|---|---|---|
| , (comma) | , (comma; 83.4%) . (period; 16.6%) | , (comma) |
| . (period) | , (comma; 12.3%) . (period; 87.7%) | . (period) |
| to | TO (54.6%) IN (44.3%) | TO |
| all/any/every | JJ (quantifier) | DT |
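Per-token percentages like those in the error analysis can be obtained by counting how often each tag is assigned to a given token. The following is a minimal sketch of that counting step, assuming the corpus is available as (token, tag) pairs. The function name and the toy data are invented for illustration and are not drawn from PCHE or PTB.

```python
from collections import Counter, defaultdict


def tag_distribution(tagged_tokens):
    """Map each token to {tag: share of that token's occurrences}."""
    counts = defaultdict(Counter)
    for token, tag in tagged_tokens:
        counts[token][tag] += 1
    return {
        token: {tag: n / sum(tags.values()) for tag, n in tags.items()}
        for token, tags in counts.items()
    }


# Toy data, invented for illustration; not drawn from PCHE or PTB.
toy_corpus = [
    ("to", "TO"), ("to", "IN"), ("to", "TO"), ("to", "TO"),
    (",", ","), (",", "."), (",", ","), (",", ","),
]

dist = tag_distribution(toy_corpus)
for token, shares in dist.items():
    print(token, {tag: f"{share:.1%}" for tag, share in shares.items()})
```

Running the same count on two corpora and comparing the dominant tag per token is one simple way to surface candidate tagset mismatches before evaluating a tagger across them.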
![Page 78: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/78.jpg)
Conclusions
![Page 79: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/79.jpg)
Conclusions
‣ Feature embeddings outperform word embeddings by exploiting task-specific information in feature templates.
![Page 80: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/80.jpg)
Conclusions
‣ Representation learning and spelling normalization are complementary for improving tagging performance.
‣ Feature embeddings outperform word embeddings by exploiting task-specific information in feature templates.
![Page 81: Part-of-Speech Tagging for Historical English · Penn Corpora of Historical English Modern BriEsh English (MBE) 1840-1914 1770-1839 1700-1769 0 110,000 220,000 330,000 440,000 343,024](https://reader033.vdocuments.us/reader033/viewer/2022053014/5f130f6783caa2412155751b/html5/thumbnails/81.jpg)
Conclusions
‣ Representation learning and spelling normalization are complementary for improving tagging performance.
‣ Tagset mismatches make it hard to evaluate modern POS taggers for historical English.
‣ Feature embeddings outperform word embeddings by exploiting task-specific information in feature templates.