Hands-On Session Miriam Butt & Dominik Sacha LingVis: Visual Analytics for Linguistics DGfS 2016 | 24.-26.2.2016


Page 1: Hands-On Session · Self Organizing Maps for Pitch Contour Analysis Analyzing Spoken Data 3 D. Sacha, Y. Asano, C. Rohrdantz, F. Hamborg , D. A. Keim, B. Braun and M. Butt. Self Organizing

Hands-On Session

Miriam Butt & Dominik Sacha

LingVis: Visual Analytics for Linguistics, DGfS 2016 | 24.-26.2.2016

Hands-On

• We can work with:
  • Self Organizing Maps for Pitch Contour Analysis
  • Diachronic Visualization of Language Properties
  • Cluster Visualization
  • The WALS Explorer
  • DoubleTreeJS/KWIC

• In the following, we explain:
  • how to work with them
  • type of data needed

• Also pointers to further sites


Self Organizing Maps for Pitch Contour Analysis
Analyzing Spoken Data

D. Sacha, Y. Asano, C. Rohrdantz, F. Hamborg, D. A. Keim, B. Braun and M. Butt. Self Organizing Maps for the Visual Analysis of Pitch Contours. Proceedings of the 20th Nordic Conference of Computational Linguistics. Vilnius, Lithuania, ACL Anthology, 23():181-190, 2015.

SOM – Starting the Tool

• All material (software and data) is contained in the “prosomvis” folder

• Double-click the prosomvis JAR file
  • prosomvis-1.0.0-SNAPSHOT-jar-with-dependencies.jar
  • The app should start
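
If double-clicking does not start the app (for example because .jar files are not associated with Java on your machine), an executable JAR can usually be started with `java -jar` instead. The following is a minimal sketch, assuming a `java` executable is on the PATH and that the JAR sits directly inside the “prosomvis” folder; the helper names `jar_command` and `launch` are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path

JAR_NAME = "prosomvis-1.0.0-SNAPSHOT-jar-with-dependencies.jar"

def jar_command(folder, jar_name=JAR_NAME):
    """Build the command line that starts an executable JAR via `java -jar`."""
    return ["java", "-jar", str(Path(folder) / jar_name)]

def launch(folder):
    """Start the tool; fail early if Java or the JAR file cannot be found."""
    cmd = jar_command(folder)
    if shutil.which("java") is None:
        raise RuntimeError("no `java` executable found on PATH")
    if not Path(cmd[2]).is_file():
        raise FileNotFoundError(cmd[2])
    # Popen returns immediately, so the GUI runs alongside this script.
    return subprocess.Popen(cmd)
```

Calling `launch("prosomvis")` from the directory that contains the folder should then open the same window as a double-click.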


SOM – Loading Data

1. Click on the button and load a data file

2. You may inspect the loaded data

3. Finally, switch to the config tab

SOM – Configuration (Simple)

1. Select the vector that shall be analyzed
   (For the sumimasen dataset, a preprocessed vector is included)

2. If no preprocessed vector is detected, further preprocessing steps have to be applied

3. A data filter may be applied, e.g., utterance = “sumimasen”
   Note: data may be filtered during the analysis process as well

4. Set the SOM dimensions. Feel free to try different ones

5. The SOM training may be “long” or “short”
   Suggestion: Try short first and switch to a longer training to confirm a concrete hypothesis

6. Finally, launch the SOM
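
To make the two main settings (SOM dimensions and training length) more concrete, here is a minimal, generic SOM sketch in plain Python. It is not the prosomvis implementation: the linear decay of learning rate and radius, the Gaussian neighbourhood, and the names `train_som` and `best_matching_unit` are assumptions chosen for illustration. `rows`/`cols` correspond to the SOM dimensions, and `iterations` plays the role of “short” vs. “long” training.

```python
import math
import random

def best_matching_unit(grid, x):
    """Return the grid position whose prototype vector is closest to x."""
    return min(grid, key=lambda cell: math.dist(grid[cell], x))

def train_som(vectors, rows, cols, iterations, seed=0):
    """Train a tiny SOM; returns {(row, col): prototype vector}."""
    rng = random.Random(seed)
    # Initialise every cell with a copy of a randomly chosen input vector.
    grid = {(r, c): list(rng.choice(vectors))
            for r in range(rows) for c in range(cols)}
    start_radius = max(rows, cols) / 2.0
    for step in range(iterations):
        progress = step / iterations
        rate = 0.5 * (1.0 - progress)                       # learning rate decays
        radius = max(start_radius * (1.0 - progress), 0.5)  # neighbourhood shrinks
        x = rng.choice(vectors)
        bmu = best_matching_unit(grid, x)
        for cell, proto in grid.items():
            d = math.dist(cell, bmu)                        # distance on the grid
            pull = rate * math.exp(-(d * d) / (2.0 * radius * radius))
            for i, value in enumerate(x):
                proto[i] += pull * (value - proto[i])       # move towards the input
    return grid
```

With two toy contours, one falling and one rising, the trained map ends up with different best-matching cells for the two shapes, which is the effect to look for when the real contours settle into separate regions of the grid.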

SOM Training – Just Watch

Observe the distance bars

Observe the changing cells (contours, background)

SOM Visualization – Configuration

1.) Add/remove attributes to be visualized, e.g., “utterance” or “ns” (Hint: Start with only one attribute)

2.) Try out different grid and cell visualizations (if the checkboxes do not work, use the update button)

3.) Once an attribute has been selected (1.)), a separate heatmap may be opened for each attribute value. Try out different normalization techniques

4.) A subsequent SOM may be trained with the remaining data items
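
The slides do not say which normalization techniques the tool offers. As a rough illustration of why the choice matters, the sketch below counts how many items with a given attribute value fall into each cell and then scales the counts in two common ways (by the largest cell, or by the total). Both schemes and all function names are assumptions, not the tool's API.

```python
from collections import Counter

def cell_counts(assignments, attribute_value):
    """Count, per SOM cell, the items that carry the given attribute value.

    `assignments` is a list of (cell, value) pairs, e.g. ((0, 1), "sumimasen").
    """
    return Counter(cell for cell, value in assignments if value == attribute_value)

def normalize_by_max(counts):
    """Scale so that the fullest cell gets intensity 1.0."""
    peak = max(counts.values(), default=0)
    return {cell: n / peak for cell, n in counts.items()} if peak else {}

def normalize_by_total(counts):
    """Scale so that all cell intensities sum to 1.0."""
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()} if total else {}
```

Scaling by the maximum makes heatmaps of rare and frequent attribute values look equally “hot”, whereas scaling by the total keeps each heatmap a distribution over cells.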

SOM Visualization – Data Filtering & Grid Interaction

1.) Select an attribute to exclude it from the next SOM training

2.) Right-click on a cell to open the context menu (cells may be deleted or pinned)

3.) Cells may be dragged & dropped (moved) & pinned

4.) Choose another color (click on the color cell)

SOM - History

1.) Click on the SOMs to bring them to the front of the screen

Some Hints - I

• Sumimasen_Entschuldigung
  • Dataset contains recorded utterances “sumimasen” (JP) & “entschuldigung” (GER) [--> Excuse me]
  • German and Japanese speakers and learners (L1, L2)
  • Dataset is well preprocessed and needs no further configuration

• Visualize the whole dataset (select the utterance attribute)
• Filter out the “Entschuldigung” utterances (in the 1st SOM)
• Visualize the different speaker nationalities (“ns” attribute)


Sumimasen - Example

• Example:
  • Analysis of Pitch Contours via Self-Organizing Maps
  • in combination with Visual Analytics

• Data
  • Japanese vs. German ‘sorry’
  • Japanese pitch contour always has a fall
  • The German contour can vary according to pragmatic intent
  • Recorded German and Japanese natives
  • vs. learners of German and Japanese (beginners/advanced)
  • learners of Japanese were German and vice versa

Some Hints

• Beat the tunes
  • Dataset of recorded nonsense words of Japanese and German speakers
  • A lot of metadata is contained
  • Dataset is not preprocessed

• Try out different preprocessing options
• Visualize the whole dataset (select the “pitch” attribute)
• Filter out the “HH” utterances (in the 1st SOM)
• Train another SOM (based on the 1st SOM) that filters out the “HL”
• Analyze the different dialects/segments/speakers (use the heatmaps)
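
Which preprocessing options the tool offers is not listed on the slide. As a hedged illustration of what preprocessing a pitch contour typically involves, the sketch below shows two generic steps: resampling every contour to the same number of points (so that all vectors have equal length) and mean-centering (so that overall pitch height, e.g. speaker register, does not dominate the shape). The function names and the choice of steps are assumptions.

```python
def resample(contour, n_points):
    """Linearly interpolate a pitch contour to exactly n_points samples."""
    if n_points < 2 or len(contour) < 2:
        raise ValueError("need at least two input samples and two output points")
    last = len(contour) - 1
    out = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)      # fractional index into the input
        lo = min(int(pos), last - 1)         # left neighbour (clamped at the end)
        frac = pos - lo
        out.append(contour[lo] + frac * (contour[lo + 1] - contour[lo]))
    return out

def mean_center(contour):
    """Subtract the mean so that only the contour's shape remains."""
    mean = sum(contour) / len(contour)
    return [v - mean for v in contour]
```

After both steps, a contour that rises from 100 Hz to 200 Hz and one that rises from 200 Hz to 300 Hz yield the same vector, so the SOM groups them by shape rather than by speaker.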


Diachronic Visualization of Text Properties/Icelandic
Analyzing Language Change Based on Feature Occurrences

M. Butt, T. Bögel, K. Kotcheva, C. Schätzle, C. Rohrdantz, D. Sacha, N. Dehe and D. A. Keim. V1 in Icelandic: A multifactorical visualization of historical data. Proceedings of the LREC 2014 Workshop VisLR: Visualization as added value in the development, use and evaluation of Language Resources, pages 33-40, 2014.

C. Schätzle, D. Sacha and M. Butt. Diachronic Visualization of Oblique Subjects in Icelandic. Workshop Poster: Big Data Visual Computing – Quantitative Perspectives for Visual Computing, 2014.

Diachronic Visualization of Text Properties/Icelandic

• You are provided with two tools for analyzing the annotated diachronic corpus of Icelandic (IcePaHC).

• Two questions so far:
  • When does V1 in Icelandic occur?
  • What governs the appearance of dative subjects?

• Two tools
  • icelandicV1 – Jar
  • Icelandic_dative – Jar

• Dataset is the same (data/text/)
• Derived features/properties (extracted sentences) are different


Check out the data

• Navigate into the data folder
• Take a look at the raw texts
• And the extracted language properties

Starting the Visualization

• To start the tools, just double-click on the JAR files
  • icelandic-1.0-SNAPSHOT-jar-with-dependencies.ja
  • dative-1.0-SNAPSHOT-jar-with-dependencies.jar
  • (Two separate folders)

• A window should open

• Enlarge the window, and start your analysis
  • Zoom (mouse wheel)
  • Pan/Navigate
  • Details on demand (tooltips)
  • Further interactions are described on top of the genres (for dative vis)


V1 in Icelandic

[Figure: V1 visualization. Labels in the figure: Timeline (1150, 1210, 2008); Text length & V1 occurrences; Pronoun, Def. Noun, Indef. Noun; BE, DO, HV, MD, RD, VB; PRO; EXP. Caption: Factors identified by linguists as being relevant to V1 in Icelandic.]

Dative in Icelandic – Expand/Fold

[Figure: dative visualization. Annotated elements: Expand/Fold Voice, Expand/Fold Feature, Sentence Bar, Tick-Mark. Labels in the figure: Active, Passive, Middle; EXPR, HAPPST, MOD, PSY, BOT, EX, SEND, CONC, SOC, PUT, COMM, PRED, MEASMOT, COP, COS, PERM, ASP.]

• Press “e” or “t” to expand/fold the text nodes
• Clicking on the texts is also possible

Dative in Icelandic – Layout (Press “l”)

Dative in Icelandic – Interactive Tick-Marks

• Press “d” to disable the tooltips
• Press “l” to change the layout
• Hover over a glyph to activate its tick marks


Hints – It is a Research Prototype!

• If the program hangs (only white space is shown)
  • → Just restart the app (open a new window)

• Task suggestion
  • Work with the software as is.
  • Think critically about the visualization/interactive possibilities.
  • See if you can identify patterns from the visualization without necessarily knowing anything about Icelandic or the phenomenon (we could).


Cluster Visualization
Various Perspectives on Data

Andreas Lamprecht, Annette Hautli, Christian Rohrdantz, Tina Bögel. 2013. A Visual Analytics System for Cluster Exploration. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, System Demo, 109–114, Sofia, Bulgaria.

Cluster Visualization

• Automatic clustering methods are increasingly being used by a wide range of linguists.

• However, it is often hard to understand what the clustering method is doing.
• And it is hard to interact with it.
• The following presents an interactive, flexible visual analytic approach to clustering information.

Lamprecht, Andreas, Hautli, Annette, Rohrdantz, Christian and Tina Bögel. 2013. 'A Visual Analytics System for Cluster Exploration'. In Proceedings of ACL 2013, 109-114, Sofia, Bulgaria.

Cluster Visualization

• So far, the tool allows for standard k-means or GVM clustering.
• Important note: the visualization adds the visual and interactive component – it does not improve on the statistical approaches per se.
• Each data point is represented by a dot.
• The user can specify the number of clusters desired.
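
For readers who have not met k-means before, the following is a minimal textbook sketch (Lloyd's algorithm) in plain Python. It is not the code used inside the tool, and GVM is not shown; the function name `kmeans`, the random initialization and the fixed iteration count are assumptions for illustration. The parameter `k` corresponds to the number of clusters the user specifies.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Return (labels, centroids) for a list of equal-length numeric vectors."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Changing `k` and re-running is the programmatic counterpart of experimenting with different numbers of clusters in the visualization.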


Sample Visualization

Glyphs in the Cluster Visualization

• Glyphs are combinations of symbols that are defined to have a certain meaning.

• Data objects in the visualization can be presented either as circles, normal glyphs or star glyphs.

  • Circles: Every item (e.g. a noun or a verb) is represented by a colored circle
  • Normal glyphs: Relative bigram frequencies are mapped onto the length of arcs (ordered clockwise around the center, beginning in north position)
  • Star glyphs: Extension of normal glyphs; the ends of the arcs are connected to form a “star”.
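
How the tool draws its glyphs internally is not documented here. The sketch below shows one plausible geometry under stated assumptions: each relative frequency becomes a spoke whose length is proportional to the frequency, spokes are spread evenly clockwise starting at north (12 o'clock), and connecting the returned end points in order gives the outline of a star glyph. The function name and the spoke interpretation of “arcs” are assumptions.

```python
import math

def star_glyph_points(rel_freqs, max_radius=1.0):
    """Return one (x, y) end point per frequency, with y pointing up.

    Spokes are ordered clockwise, starting in the north position.
    """
    n = len(rel_freqs)
    points = []
    for i, freq in enumerate(rel_freqs):
        angle = 2.0 * math.pi * i / n        # clockwise angle measured from north
        r = freq * max_radius                # spoke length encodes the frequency
        points.append((r * math.sin(angle), r * math.cos(angle)))
    return points
```

For four features, the end points land on north, east, south and west in that order, so two items with similar frequency profiles produce visibly similar outlines.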


Glyphs in the Cluster Visualization

• The data shown here is that from the N-V complex predicates case study.
• There are four light verbs (kar ‘do’, ho ‘be’, hu ‘become’ and rakh ‘put’).
• The numbers show the frequency with which they appear with a given noun – the data point represented by the dot.

• Overplotting is a problem when the data set becomes large or when the data points are very similar to one another.

• Several strategies to handle this interactively:
  • change transparency of objects
  • reposition data objects
  • scale data objects

Cluster Visualization

• You are provided with a Java program in class.
• The software is still under development, so if you want to use it for purposes outside of this class, please contact Miriam.
• It should start by just clicking on it.
• There is a Readme file to guide you through what needs to be done.
• There is also an extensive handbook in PDF format.

Cluster Visualization

Data:
• The data needs to be in a txt file.
• The data points need to be separated by a symbol (e.g. “,”)
• We have provided sample data from our work on Urdu
  • Motion verbs courtesy of Annette Hautli – this seems to be broken on my version.
  • Urdu N-V complex predicates
• We have also provided some data based on Levin’s verb classes (levin-classes.txt).
  • Feel free to add to this data as you wish.
  • Some information on Levin’s verb classes is provided in levin-verbs-lawler.txt.

Cluster Visualization

Task/Interaction Suggestion:
• work with the Urdu motion verbs or the Levin verb classes file to get a feel for the visualization
• experiment with different numbers of clusters
• experiment with different visualizations of the data points (glyphs, star glyphs)
• the Levin verb classes file contains three errors (three verbs contain wrong information)
  • see if you can spot that via the visualization
• think critically about the visualization and the interactive possibilities

Cluster Visualization

Task/Interaction Suggestion:
• enter your own data into a file by using the existing ones as a model
• you need to think about how to encode your data so that the system can compute with it
• Example: you may be interested in properties like whether a noun takes a certain case marker
  • Noun1: accusative, instrumental
  • Noun2: accusative, no instrumental
• this can be encoded as:
  NounType, accusative, instrumental
  Noun1, 1, 1
  Noun2, 1, 0
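
Before loading a hand-made file into the tool, it can help to check that every row parses into numbers. The sketch below reads the noun example from this slide with Python's standard `csv` module; the exact file format the tool expects is described in its Readme, so the header-row convention and the function name `read_feature_table` are assumptions based only on the example above.

```python
import csv
import io

SAMPLE = """NounType, accusative, instrumental
Noun1, 1, 1
Noun2, 1, 0
"""

def read_feature_table(text, delimiter=","):
    """Parse a delimited table into (feature names, {item: feature vector})."""
    rows = csv.reader(io.StringIO(text), delimiter=delimiter, skipinitialspace=True)
    header = next(rows)                     # first row names the columns
    table = {}
    for row in rows:
        if not row:                         # skip empty lines
            continue
        # First column is the item label, the rest are numeric features.
        table[row[0]] = [float(v) for v in row[1:]]
    return header[1:], table
```

A `ValueError` from `float` points directly at a cell that is not numeric, which is the most common encoding mistake in such files.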

ClusterVis without verbs

• Idea: Sometimes we can use visualizations with data other than what they were originally designed for.

• Example: ClusterVis was designed to analyse properties of verbs, but it can be used to analyze any similarly encoded properties, no matter what those properties are for.

• To try: bierce-freq.txt and bierce-freq-2.txt contain information about letters that Ambrose Bierce wrote. The features include things like the number of pronouns (PP), and the number of words longer than 6 characters. Are there any clusters? If so, can you interpret them? (The data originally are from Chris Culy and we don't know the answers....)

The WALS Sunburst Explorer
Visualization of Typological Patterns

Christian Rohrdantz, Michael Hund, Thomas Mayer, Bernhard Wälchli and Daniel A. Keim. 2012. The World’s Languages Explorer: Visual Analysis of Language Features in Genealogical and Areal Contexts. Computer Graphics Forum 31(3), 935-944.

WALS Explorer

• The WALS Explorer can be accessed on-line:
  • http://th-mayer.de/wals/
• You cannot use your own data here.
• It is meant for an exploration of the World Atlas of Language Structures (http://wals.info).
• Task/Interaction Suggestion:
  • pick a phenomenon you are interested in
  • see what you can find out about it
  • think critically about the visualization and the interactive possibilities

DoubleTree
Visualizations by Chris Culy

Exploring Texts

Chris Culy has designed a number of visualization possibilities
• http://linguistics.chrisculy.net/lx/software/
• You can either download these or use them on-line

Suggestions: Explore the examples provided with the visualizations
• What are advantages/disadvantages of each?
• What would you like them to do that they can't?
• If you have tagged data available, try your own.

Some Further Available Visualization Tools/Ideas

• http://magic-table.googlecode.com/svn/trunk/magic-table/google_visualisation/example_1.html

• http://www.eurac.edu/en/research/autonomies/commul/projects/Pages/Linfovis/programs.aspx

• http://www.knime.org/

Exploring bigrams with MagicTable
Medium-advanced: basic programming
http://magic-table.googlecode.com/svn/trunk/magic-table/google_visualisation/example_1.html

• Use the MagicTable visualization from Google Charts to look at bigram co-occurrences
• cell row,column is for the bigram: row column
• Data:
  • maybe look at POS tag bigrams
  • have to count and normalize
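
The “count and normalize” step can be sketched in a few lines of Python. The example assumes the input is already a flat list of POS tags for one text and normalizes by the total number of bigrams; normalizing per row (per first tag) would be an equally reasonable choice. The function name `bigram_table` and the tag names are illustrative only, and feeding the result into MagicTable is not shown.

```python
from collections import Counter

def bigram_table(tags):
    """Return {(row_tag, column_tag): relative frequency} for adjacent tag pairs."""
    counts = Counter(zip(tags, tags[1:]))   # count each pair of neighbouring tags
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}
```

Each key `(row, column)` corresponds to one table cell in the sense of the slide: the cell at row, column holds the value for the bigram “row column”.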