Metaphor Mining in Historical German Novels: An Unsupervised Learning Approach

Stefan Pernes, University of Würzburg


Posted on 11 April 2017


Overview

‣ Intro ‣ Dataset ‣ Method ‣ Examples ‣ Conclusion


Context

“Manifestations of metaphor are frequent in language, appearing on average in every third sentence of general-domain text, according to corpus studies”

- Shutova 2015

Definition

Linguistic Metaphor

A systematic mapping, or rather, a projection of one domain of experience (the source, e.g. war) onto another (the target, e.g. argument)

Definition

Conceptual Metaphor

A cognitive 'bracket' that encompasses any number of specific realizations (e.g. She shot down all of my arguments)

Applications

‣Cognitive anthropology ‣Critical Discourse Analysis ‣Literary Studies ‣Philosophy ‣Psycholinguistics ‣…

Dataset

Corpus characteristics

‣ about 1,700 historical German novels ‣ ranging from the early 16th to the early 20th century ‣ based on: http://www.germanistik.uni-wuerzburg.de/lehrstuehle/computerphilologie/forschung/projekte/digibib

‣ the source collection consists of 120,000 literary works, covering almost all canonical titles ‣ it also includes encyclopaedias and dictionaries

Dev Corpus #1: 143 novels / 15 million tokens

Dev Corpus #2: 383 novels / 41 million tokens / from 1780 onward

Method

Selectional preferences

“Selectional preferences are the tendency for a word to semantically select or constrain which other words may appear in a direct syntactic relation with it.”

- Roberts and Egg 2014

Preprocessing

‣ POS tagging (TreeTagger) ‣ dependency parsing (mate-tools) ‣ using a modular NLP pipeline designed for book-length documents: http://github.com/DARIAH-DE/DARIAH-DKPro-Wrapper


‣ extracting the most frequent nouns and the verbs they combine with in specific grammatical relations: subject, direct object, indirect object ‣ in German terms: subject, accusative object, dative object
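The extraction step can be sketched in Python. This is a minimal illustration, not the pipeline's actual code, and it rests on several assumptions: a hypothetical simplified parse format of (index, lemma, POS, head index, label) tuples; STTS part-of-speech tags (NN for common nouns, tags starting with VV for full verbs); and TIGER-style dependency labels (SB, OA, DA) as the assumed output of the German mate-tools model, mapped to the subj/dobj/iobj names used on the example slides. "Most frequent" is approximated here by the number of features extracted per noun.

```python
from collections import Counter, defaultdict

# Assumed mapping from TIGER-style dependency labels (hypothetically, as produced
# by the German mate-tools model) to the relation names used on the example slides.
RELATIONS = {"SB": "subj", "OA": "dobj", "DA": "iobj"}


def extract_features(sentence):
    """Count verb-relation features for each noun in one parsed sentence.

    `sentence` uses a hypothetical simplified format: a list of
    (index, lemma, pos, head_index, label) tuples with 1-based indices and
    head_index 0 for the root. STTS tags are assumed (NN = common noun,
    tags starting with VV = full verbs).
    """
    tokens = {idx: (lemma, pos) for idx, lemma, pos, _, _ in sentence}
    counts = defaultdict(Counter)
    for _, lemma, pos, head, label in sentence:
        if pos != "NN" or label not in RELATIONS or head not in tokens:
            continue
        head_lemma, head_pos = tokens[head]
        if head_pos.startswith("VV"):
            # Feature string in the same shape as on the slides, e.g. "geben-dobj".
            counts[lemma.lower()][f"{head_lemma}-{RELATIONS[label]}"] += 1
    return counts


def build_counts(sentences, top_n):
    """Aggregate features over a corpus and keep the top_n nouns.

    Noun frequency is approximated by the number of extracted features.
    """
    totals = defaultdict(Counter)
    for sentence in sentences:
        for noun, features in extract_features(sentence).items():
            totals[noun].update(features)
    ranked = sorted(totals, key=lambda n: sum(totals[n].values()), reverse=True)
    return {noun: totals[noun] for noun in ranked[:top_n]}


if __name__ == "__main__":
    # Hand-made toy parses: "Der Vater gibt dem Kind Geld." / "Er bekommt Geld."
    s1 = [(1, "der", "ART", 2, "NK"), (2, "Vater", "NN", 3, "SB"),
          (3, "geben", "VVFIN", 0, "--"), (4, "dem", "ART", 5, "NK"),
          (5, "Kind", "NN", 3, "DA"), (6, "Geld", "NN", 3, "OA")]
    s2 = [(1, "er", "PPER", 2, "SB"), (2, "bekommen", "VVFIN", 0, "--"),
          (3, "Geld", "NN", 2, "OA")]
    print(build_counts([s1, s2], top_n=1))
```

Pronouns such as "er" are skipped because only NN tokens are kept, which mirrors the slides' focus on nouns.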


‣ vector representation of each noun over its verb-relation features, followed by normalization ‣ Jensen-Shannon divergence is used to construct a noun-noun similarity matrix
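A minimal sketch of the normalization and similarity step, assuming that normalization means relative frequencies (this matches the weights listed on the example slides) and that similarity is taken as 1 minus the Jensen-Shannon divergence computed with base-2 logarithms, so that values fall between 0 and 1. The divergence-to-similarity conversion is an assumption; the slides do not specify it. For distributions P and Q with M = (P + Q) / 2, JSD(P, Q) = ½ KL(P‖M) + ½ KL(Q‖M).

```python
import math
from collections import Counter


def normalize(counts):
    """Relative frequencies: divide each feature count by the noun's total."""
    total = sum(counts.values())
    return {feature: c / total for feature, c in counts.items()}


def kl_divergence(p, m):
    """KL(p || m) in bits; m must be non-zero wherever p is non-zero."""
    return sum(pv * math.log2(pv / m[f]) for f, pv in p.items() if pv > 0)


def jensen_shannon(p, q):
    """JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m) with m = (p + q) / 2.

    With base-2 logarithms the result lies between 0 (identical
    distributions) and 1 (disjoint support).
    """
    features = set(p) | set(q)
    m = {f: 0.5 * (p.get(f, 0.0) + q.get(f, 0.0)) for f in features}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def similarity_matrix(counts_by_noun):
    """Noun-noun similarity matrix, assuming similarity = 1 - JSD."""
    nouns = sorted(counts_by_noun)
    dists = {n: normalize(counts_by_noun[n]) for n in nouns}
    matrix = [[1.0 - jensen_shannon(dists[a], dists[b]) for b in nouns]
              for a in nouns]
    return nouns, matrix


if __name__ == "__main__":
    # Invented toy counts for illustration only, not corpus figures.
    toy = {
        "geld": Counter({"bekommen-dobj": 3, "ausgeben-dobj": 1}),
        "sicherheit": Counter({"bekommen-dobj": 1, "bedroht-dobj": 1}),
        "blume": Counter({"pflücken-dobj": 2}),
    }
    nouns, sim = similarity_matrix(toy)
    for noun, row in zip(nouns, sim):
        print(noun, [round(s, 3) for s in row])
```

Nouns with no shared verb-relation features get similarity 0, while nouns sharing a dominant feature (here bekommen-dobj) move closer together.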

Clustering: connectivity-based (baseline)

[Figure panels: Dev Corpus #1, Dev Corpus #2]
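The connectivity-based baseline is a form of hierarchical agglomerative clustering. The naive sketch below (cubic time, suitable for toy inputs only) repeatedly merges the two closest clusters until a target number remains, using complete linkage, with single linkage included for comparison. Ward linkage, which is also used for the examples, needs a variance-based merge criterion and is not implemented here. Distances are assumed to be 1 minus the noun-noun similarity, and the similarity values in the demo are hypothetical.

```python
def agglomerative(dist, n_clusters, linkage="complete"):
    """Naive agglomerative clustering on a symmetric distance matrix.

    Starts with singleton clusters and merges the closest pair until
    n_clusters remain. Cluster distance is the largest ("complete") or
    smallest ("single") pairwise distance between members.
    """
    if linkage not in ("complete", "single"):
        raise ValueError(f"unsupported linkage: {linkage}")
    combine = max if linkage == "complete" else min
    clusters = [[i] for i in range(len(dist))]

    def cluster_distance(a, b):
        return combine(dist[i][j] for i in a for j in b)

    while len(clusters) > max(n_clusters, 1):
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        # Merge cluster y into cluster x; y > x, so deleting y keeps x's index valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


if __name__ == "__main__":
    # Hypothetical similarities for four nouns; distance = 1 - similarity.
    nouns = ["bildung", "erinnerung", "geld", "sicherheit"]
    sim = [[1.0, 0.6, 0.1, 0.2],
           [0.6, 1.0, 0.15, 0.05],
           [0.1, 0.15, 1.0, 0.7],
           [0.2, 0.05, 0.7, 1.0]]
    dist = [[1.0 - s for s in row] for row in sim]
    for cluster in agglomerative(dist, n_clusters=2):
        print([nouns[i] for i in cluster])
```

For realistic vocabulary sizes a library implementation is preferable, for example SciPy's `scipy.cluster.hierarchy.linkage` on a condensed distance matrix (`method="complete"` or `method="ward"`) followed by `fcluster(Z, t, criterion="maxclust")` to cut the tree at a fixed number of clusters. SciPy's Ward method is only strictly valid for Euclidean distances, which divergence-based distances are not.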

Clustering: subspace-based

[Figure panels: Dev Corpus #1, Dev Corpus #2]


Examples

based on:

‣ Dev Corpora #1 and #2 ‣ Ward linkage at 400 clusters ‣ complete linkage at 1000 clusters

IDEAS ARE FOOD¹

education

bildung (10): geben-dobj-0.181818181818 beanspruchen-dobj-0.0909090909091 taxieren-dobj-0.0909090909091 voraneilen-subj-0.0909090909091 überstrahlt-dobj-0.0909090909091 ausspräch-subj-0.0909090909091 nahestehen-subj-0.0909090909091 heraustreiben-dobj-0.0909090909091 ermangelnd-dobj-0.0909090909091 abschöpfen-dobj-0.0909090909091

memory

erinnerung (48): geben-dobj-0.135593220339 wachzurufen-dobj-0.0338983050847 stören-dobj-0.0338983050847 mahnen-dobj-0.0338983050847 aufgrischen-dobj-0.0338983050847 verlöschen-subj-0.0169491525424 wiederzuerwecken-dobj-0.0169491525424 neubeleben-dobj-0.0169491525424 frischen-dobj-0.0169491525424 hervorschießen-subj-0.0169491525424

hunger

hunger (10): geben-dobj-0.181818181818 erweren-dobj-0.0909090909091 büssen-dobj-0.0909090909091 schaben-subj-0.0909090909091 überhen-iobj-0.0909090909091 verschmachten-dobj-0.0909090909091 stärkern-dobj-0.0909090909091 bittern-subj-0.0909090909091 hinausgetreiben-subj-0.0909090909091 trainieren-iobj-0.0909090909091

¹ cf. Lakoff et al., Master Metaphor List
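Each example line appears to follow the pattern noun (n): verb-relation-weight, with n the number of distinct verb-relation features and the weights relative frequencies; for bildung, 2/11 + 9 × 1/11 = 1. The sketch below parses such lines and lists the features two nouns share, which hints at why they end up in the same cluster. The input strings are the first three features of the bildung and hunger lines above, and the helper names are hypothetical.

```python
# Hypothetical helpers for reading the feature lines shown on the example slides.


def parse_features(line):
    """Split 'noun (n): verb-rel-weight ...' into (noun, n, {verb-rel: weight})."""
    head, _, body = line.partition(":")
    noun, _, count = head.strip().partition(" ")
    features = {}
    for item in body.split():
        # rpartition: the weight follows the last hyphen, e.g. "geben-dobj-0.18".
        verb_rel, _, weight = item.rpartition("-")
        features[verb_rel] = float(weight)
    return noun, int(count.strip("()")), features


def shared_features(a, b):
    """Verb-relation features that two nouns have in common."""
    return sorted(set(a) & set(b))


# First three features of two lines from the IDEAS ARE FOOD slide.
BILDUNG = ("bildung (10): geben-dobj-0.181818181818 "
           "beanspruchen-dobj-0.0909090909091 taxieren-dobj-0.0909090909091")
HUNGER = ("hunger (10): geben-dobj-0.181818181818 "
          "erweren-dobj-0.0909090909091 büssen-dobj-0.0909090909091")

if __name__ == "__main__":
    _, _, bildung = parse_features(BILDUNG)
    _, _, hunger = parse_features(HUNGER)
    print(shared_features(bildung, hunger))  # ['geben-dobj']
    print(max(bildung, key=bildung.get))     # geben-dobj
```

In the full lines, geben-dobj is also the top feature of erinnerung, so all three nouns share their strongest selectional context.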

WELL-BEING IS WEALTH

money

geld (265): bekommen-dobj-0.216306156406 ausgeben-dobj-0.108153078203 leihen-dobj-0.0549084858569 reichen-subj-0.0249584026622 aufbringen-dobj-0.0166389351082 einbringen-dobj-0.0149750415973 herausgeben-dobj-0.0133111480865 liehen-dobj-0.0116472545757 zurückzahlen-dobj-0.0116472545757 vertun-dobj-0.00998336106489

security

sicherheit (12): bekommen-dobj-0.1875 bedroht-dobj-0.1875 fehlen-dobj-0.0625 glorios-dobj-0.0625 erquickt-subj-0.0625 bedrücken-iobj-0.0625 mistrauen-iobj-0.0625 geringhätzen-dobj-0.0625 geringschätzen-dobj-0.0625 sen-dobj-0.0625 betrage-dobj-0.0625 halber-dobj-0.0625

LUSTFUL PERSON IS AN ANIMAL

creature

kreatur (11): haben-subj-0.388888888889 anpreisen-dobj-0.111111111111 entfremden-subj-0.0555555555556 entquillen-dobj-0.0555555555556 herfürbringen-dobj-0.0555555555556 verübeln-dobj-0.0555555555556 fleuchen-dobj-0.0555555555556 spieen-subj-0.0555555555556 hinwegwerfen-iobj-0.0555555555556 posaunt-subj-0.0555555555556

lust

lust (48): haben-subj-0.37962962963 hättest-dobj-0.0925925925926 anwandeln-dobj-0.0555555555556 verspüren-subj-0.0277777777778 wegzulaufen-dobj-0.0185185185185 vergieng-dobj-0.0185185185185 hest-dobj-0.0185185185185 verspürt-dobj-0.0185185185185 fabulieren-dobj-0.00925925925926 verlör-subj-0.00925925925926

COMMUNICATION IS TRANSFER + PSYCHOLOGICAL FORCES ARE PHYSICAL FORCES

scent

geruch (5): verbreiten-dobj-0.555555555556 hielten-dobj-0.111111111111 geben-dobj-0.111111111111 naschnüffeln-dobj-0.111111111111 entgegentrügen-iobj-0.111111111111

distrust

mißtrauen (3): verbreiten-dobj-0.333333333333 umwölkt-subj-0.333333333333 aufkeimen-subj-0.333333333333

EMOTIONS ARE PLANTS²

flower

blume (47): pflücken-dobj-0.290697674419 liegen-subj-0.093023255814 lieben-dobj-0.0581395348837 begießen-dobj-0.0348837209302 welken-dobj-0.0232558139535 duften-dobj-0.0232558139535 duftet-subj-0.0116279069767 durchhauten-subj-0.0116279069767 hingesenken-subj-0.0116279069767 erblüht-dobj-0.0116279069767

emotion

gefühl (90): liegen-subj-0.0825688073394 ersticken-dobj-0.0642201834862 abstumpfen-dobj-0.0275229357798 hervorraufen-dobj-0.0183486238532 halten-iobj-0.0183486238532 entspinnen-dobj-0.0183486238532 hinausdehnen-dobj-0.00917431192661 aufwekken-dobj-0.00917431192661 anhielen-subj-0.00917431192661 arten-subj-0.00917431192661

² not in the Master Metaphor List

COMMUNICATION IS A LIQUID

milk

milch (11): fließen-subj-0.375 abmaß-dobj-0.0625 beschnuppern-dobj-0.0625 auflecken-dobj-0.0625 fleussen-dobj-0.0625 vertrocknen-dobj-0.0625 abrahmen-dobj-0.0625 herniederfleussen-dobj-0.0625 austranken-dobj-0.0625 hinuntergelassen-subj-0.0625

speech

rede (50): verschlagen-dobj-0.0535714285714 fließen-subj-0.0357142857143 verschlug-subj-0.0357142857143 abgewinnen-dobj-0.0357142857143 bestürzen-subj-0.0357142857143 hageln-dobj-0.0178571428571 kämen-dobj-0.0178571428571 heimzahlen-subj-0.0178571428571 verwunderen-dobj-0.0178571428571 coupieren-subj-0.0178571428571

PHYSICAL FORCES ARE ANIMALS

bear

bär (8): nehmen-dobj-0.272727272727 erliegen-dobj-0.181818181818 läg'-subj-0.0909090909091 herauszotteln-dobj-0.0909090909091 beschnüffeln-subj-0.0909090909091 losrennen-subj-0.0909090909091 erlegst-dobj-0.0909090909091 plumpt-iobj-0.0909090909091

flood

flut (14): nehmen-dobj-0.277777777778 spie-dobj-0.0555555555556 tosten-dobj-0.0555555555556 hinauswälzen-dobj-0.0555555555556 zubrüllen-dobj-0.0555555555556 wühlen-iobj-0.0555555555556 vorüberrausch-dobj-0.0555555555556 vertropfen-dobj-0.0555555555556 vorüberwälzen-dobj-0.0555555555556 herauffahren-dobj-0.0555555555556

CAUSES ARE OBJECTS

shirt

hemd (12): anziehen-dobj-0.391304347826 anlegen-dobj-0.0869565217391 durchstechen-dobj-0.0869565217391 herumzerren-dobj-0.0869565217391 entzweigerissen-dobj-0.0434782608696 übergewerfen-dobj-0.0434782608696 herunterschälen-dobj-0.0434782608696 zurückgeschlagen-dobj-0.0434782608696 strammen-dobj-0.0434782608696 anziehst-dobj-0.0434782608696

coercion

zwang (7): anlegen-dobj-0.588235294118 überheben-dobj-0.117647058824 überziehen-subj-0.0588235294118 befreyet-dobj-0.0588235294118 dummen-subj-0.0588235294118 yfüern-iobj-0.0588235294118 rechn-subj-0.0588235294118


Conclusion (as of yet)

‣ the initial concept graph is robust across various settings ‣ substantially more data is needed, along with normalization and balancing ‣ Shutova and Sun’s (2013) original Hierarchical Graph Factorization Clustering (HGFC) algorithm could greatly improve the outcome ‣ metadata should be introduced into the model for diachronic analysis

References

‣ G. Lakoff, J. Espenson, and A. Schwartz, “The master metaphor list,” University of California at Berkeley, Tech. Rep., 1991.
‣ W. Roberts and M. Egg, “A comparison of selectional preference models for automatic verb classification,” in Proceedings of EMNLP, 2014.
‣ E. Shutova and L. Sun, “Unsupervised metaphor identification using hierarchical graph factorization clustering,” in HLT-NAACL, 2013, pp. 978–988.
‣ E. Shutova, “Design and evaluation of metaphor processing systems,” Computational Linguistics, 2015.