Dimensional Music Emotion Recognition
Yi-Hsuan Yang
Assistant Research Fellow
Music & Audio Computing (MAC) Lab
Research Center for IT Innovation
Academia Sinica
Dec. 2011 @ MTG, UPF
Music & Emotion
Music conveys emotion and modulates our mood
Music emotion recognition (MER)
• Understand how humans perceive/feel emotion when listening to music
• Develop systems for emotion-based music retrieval
Why Do We Listen to Music?
Motive                                                  Ratio
“to express, release, and influence emotions”           47%
“to relax and settle down”                              33%
“for enjoyment, fun, and pleasure”                      22%
“as company and background sound”                       16%
“because it makes me feel good”                         13%
“because it’s a basic need, I can’t live without it”    12%
“because I like/love music”                             11%
“to get energized”                                      9%
“to evoke memories”                                     4%
“Expression, Perception, and Induction of Musical Emotions: A Review and a Questionnaire Study of Everyday Listening,” Patrik N. Juslin and Petri Laukka, Journal of New Music Research, 2004
Categories of Emotion
Expressed (intended) emotion
• What a performer tries to express
Perceived emotion
• What a listener perceives as being expressed in music
• Usually the same as the expressed emotion
Felt (induced) emotion
• What a listener actually feels
• Strongly influenced by the context of music listening (environment, mood)
Emotion Description w/ Mood Labels
Courtesy of Ching-Wei Chen @ Gracenote
Description w/ Latent Dimensions
Categorical Approach
Hevner’s model (1936)
Audio spectrum
Dimensional Approach
Emotion plane (Russell 1980, Thayer 1989)
Audio spectrum
Categorical vs. Dimensional
             Pros                          Cons
Categorical  • Intuitive                   • Lack a unifying model
             • Natural language            • Ambiguous
             • Atomic description          • Subjective
                                           • Difficult to offer fine-grained differentiation
Dimensional  • Focus on a few dimensions   • Less intuitive
             • Good user interface         • Semantic loss in projection
                                           • Difficult to obtain ground truth
Q: No Consensus on Mood Taxonomy
Work                                                 #    Emotion description
Katayose et al [icpr98]                              4    Gloomy, urbane, pathetic, serious
Feng et al [sigir03]                                 4    Happy, angry, fear, sad
Li et al [ismir03], Wieczorkowska et al [imtci04]    13   Happy, light, graceful, dreamy, longing, dark, sacred, dramatic, agitated, frustrated, mysterious, passionate, bluesy
Wang et al [icsp04]                                  6    Joyous, robust, restless, lyrical, sober, gloomy
Tolos et al [ccnc05]                                 3    Happy, aggressive, melancholic+calm
Lu et al [taslp06]                                   4    Exuberant, anxious/frantic, depressed, content
Yang et al [mm06]                                    4    Happy, angry, sad, relaxed
Skowronek et al [ismir07]                            12   Arousing, angry, calming, carefree, cheerful, emotional, loving, peaceful, powerful, sad, restless, tender
Wu et al [mmm08]                                     8    Happy, light, easy, touching, sad, sublime, grand, exciting
Hu et al [ismir08]                                   5    Passionate, cheerful, bittersweet, witty, aggressive
Trohidis et al [ismir08]                             6    Surprised, happy, relaxed, quiet, sad, angry
Fuzzy Boundary b/w Mood Classes
Subjective usage of affective terms
• Cheerful, happy, joyous, party/celebratory
• Melancholy, gloomy, sad, sorrowful
Semantic overlap (#2 and #4) and acoustic overlap (#1 and #5) [mirex07.cyril&perfe]
MIREX AMC Taxonomy
Cluster 1   Passionate, rowdy, rousing, confident, boisterous
Cluster 2   Amiable/good-natured, sweet, fun, rollicking, cheerful
Cluster 3   Literate, wistful, bittersweet, autumnal, brooding, poignant
Cluster 4   Witty, humorous, whimsical, wry, campy, quirky, silly
Cluster 5   Aggressive, volatile, fiery, visceral, tense/anxious, intense
Granularity of Emotion Description
Small set of emotion classes
• Insufficient compared to the richness of our perception
Large set of emotion classes
• Difficult to obtain reliable ground truth data
Acerbic, Aggressive, Ambitious, Amiable, Angry, Bittersweet, Bright, Brittle, Calm/, Carefree, Cathartic, Cerebral, Cheerful, Circular, Clinical, Cold, Confident, Delicate, Dramatic, Dreamy, Druggy, Earnest, Eccentric, Elegant, Energetic, Enigmatic, Epic, Exciting, Exuberant, Fierce, Fiery, Fun, Gentle, Gloomy, Greasy, Happy, …
□ Happy   □ Sad   □ Angry   □ Relaxed
Sol: Describing Emotions in Emotion Space
Arousal
○ Activation, activity
○ Energy and stimulation level
Valence
○ Pleasantness
○ Positive and negative affective states
[psp80]
The Dimensional Approach
Strength
• No need to consider which and how many emotions
• Generalize MER from categorical domain to real-valued domain
• Easy to compare different computational models
[Figure: emotion plane with arousal and valence axes]
The Dimensional Approach
Weakness
• Semantic loss due to projection
• Blurs important psychological distinctions
3rd dimension: potency [psy07]
• Angry ↔ afraid
• Proud ↔ shameful
• Interested ↔ disappointed
4th dimension: unpredictability
• Surprised
• Tense ↔ afraid
• Contempt ↔ disgust
Music Retrieval in VA Space
Provide a simple means for 2D user interface
• Pick a point
• Draw a trajectory
Useful for mobile devices with small display space
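The two interactions above (pick a point, draw a trajectory) can be sketched as a nearest-neighbor lookup in the VA plane. This is only an illustration, not the demo's implementation: the catalogue `SONGS`, its VA values, and the function names are hypothetical, and Euclidean distance is an assumed similarity measure.

```python
import math

# Hypothetical catalogue: song name -> (valence, arousal), each in [-1, 1].
SONGS = {
    "song_a": (0.8, 0.6),
    "song_b": (-0.7, 0.7),
    "song_c": (-0.6, -0.5),
    "song_d": (0.7, -0.4),
    "song_e": (0.1, 0.0),
}


def nearest_songs(point, songs, k=1):
    """Return the k song names closest to `point` in the VA plane."""
    ranked = sorted(songs, key=lambda name: math.dist(point, songs[name]))
    return ranked[:k]


def playlist_along(trajectory, songs):
    """For each trajectory point, pick the nearest song not used yet."""
    remaining = dict(songs)
    playlist = []
    for point in trajectory:
        if not remaining:
            break
        name = nearest_songs(point, remaining, k=1)[0]
        playlist.append(name)
        del remaining[name]
    return playlist


# Pick a point: the user taps the high-valence, high-arousal corner.
picked = nearest_songs((0.9, 0.5), SONGS)

# Draw a trajectory: from sad/calm, through neutral, to happy/excited.
playlist = playlist_along([(-0.8, -0.8), (0.0, 0.0), (0.9, 0.9)], SONGS)
```

Removing each chosen song from `remaining` keeps a playlist from repeating a track when trajectory points are close together; that is a design choice of this sketch.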
Demo
[Figure: retrieval interface with arousal and valence axes]
Q: How to Predict Emotion Values?
Transformation-based approach [mm06]
• Consider the four quadrants
• Perform 4-class mood classification
• Apply the following transformation
Arousal = u1 + u2 – u3 – u4
Valence = u1 + u4 – u2 – u3
(uk denotes the likelihood of the k-th mood class, one class per quadrant)
Not rigorous
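A minimal sketch of the transformation above. The function name is hypothetical; the signs follow the two formulas directly, so class 1 raises both values, class 2 raises arousal and lowers valence, class 3 lowers both, and class 4 lowers arousal and raises valence.

```python
def quadrant_likelihoods_to_va(u1, u2, u3, u4):
    """Map four class likelihoods to (valence, arousal) with the slide's formulas."""
    arousal = u1 + u2 - u3 - u4
    valence = u1 + u4 - u2 - u3
    return valence, arousal


# A classifier that mostly favors class 1 gives positive valence and arousal
# (roughly 0.4 and 0.6 here, up to floating-point rounding).
valence, arousal = quadrant_likelihoods_to_va(0.6, 0.2, 0.1, 0.1)

# Equal likelihoods for all four classes land at the origin of the plane.
neutral = quadrant_likelihoods_to_va(0.25, 0.25, 0.25, 0.25)
```

The output is only a recombination of four classification scores, which is one way to read the "not rigorous" remark: nothing in the mapping is trained against real-valued emotion annotations.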
Sol: Perform Regression
Given features, predict a numerical value
Given N training pairs (xi, yi), 1 ≤ i ≤ N, where xi is the feature vector and yi is the numerical value to be predicted, train a regression model f(·) that minimizes the mean squared error (MSE):
$$\min_f \; \frac{1}{N} \sum_{i=1}^{N} \bigl( y_i - f(\mathbf{x}_i) \bigr)^2$$
yi : numerical emotion value
xi : feature (input)
f(xi) : prediction result (output)
e.g. linear regression
$$f(\mathbf{x}_i) = \mathbf{w}^\top \mathbf{x}_i + b = \sum_j w_j x_{ij} + b$$
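As a hedged illustration of the linear case, the sketch below fits f(x) = wᵀx + b by plain gradient descent on the MSE. The learning rate, epoch count, function names, and toy data are assumptions chosen for the example; they are not taken from the talk.

```python
def predict(w, b, x):
    """Linear model f(x) = w^T x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def mse(w, b, xs, ys):
    """Mean squared error of the model over all (x, y) pairs."""
    return sum((y - predict(w, b, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_linear(xs, ys, lr=0.1, epochs=2000):
    """Minimize the MSE by batch gradient descent (assumed hyperparameters)."""
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            err = predict(w, b, x) - y
            for j in range(d):
                grad_w[j] += 2.0 * err * x[j] / n   # d(MSE)/d(w_j)
            grad_b += 2.0 * err / n                 # d(MSE)/d(b)
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy data: two hypothetical features per song, noise-free linear target.
xs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]]
ys = [0.5 * x[0] - 0.3 * x[1] + 0.1 for x in xs]
w, b = fit_linear(xs, ys)
```

On this noise-free toy set the fitted parameters approach the generating values (0.5, -0.3, and 0.1). Real annotations are noisy, so the minimum MSE would not be zero.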
Computational Framework [taslp08]
Predict the VA values
• Train a regression model f(·) that minimizes the mean squared error (MSE)
• One for valence; one for arousal
Training: training data → manual annotation → emotion value; training data → feature extraction → feature; (feature, emotion value) → regressor training → regressor
Test: test data → feature extraction → feature → regressor (automatic prediction) → emotion value
$$\min_f \; \frac{1}{N} \sum_{i=1}^{N} \bigl( y_i - f(\mathbf{x}_i) \bigr)^2$$
Obtain Music Emotion Rating
Manual annotation
• Rates the VA values of each song
• Ordinal rating scale
• Scroll bar
User study
• 1240 Chinese pop songs; each 30-sec
• 666 subjects; each rates 8 random songs
Subjective evaluation
• Easiness of annotating emotion
• Within-subject reliability: compare to one month later
• Between-subject reliability: compare to other subjects
Method           Easiness   Within-subject reliability   Between-subject reliability
Emotion rating   2.82       2.92                         2.81
From 1 to 5 (strongly disagree to strongly agree)
Evaluation of Emotion Rating
AnnoEmo: GUI for Emotion Rating [hcm07]
Encourages differentiation
Click to listen again
Drag & drop to modify annotation
Demo
Determining VA values is not that easy
• Difficult to ensure consistency
• Does dist(0.5,0.8) = dist(–0.2,0.1) in terms of our emotion perception?
• Does 0.7 mean the same for two subjects?
Cognitive Load is Still High
[Figure: axes from -1 to 1, marking the values 0.5, 0.8, -0.2, and 0.1]
Sol: Ranking Instead of Rating [taslp11a]
Determines the position of a song
• By the relative ranking with respect to other songs
• Rather than by the exact emotion values
[Figure: songs placed along the valence axis from positive valence (valence = 1) to negative valence (valence = –1), contrasting relative ranking with exact rating]
  Oh Happy Day
  I Want to Hold Your Hand by Beatles
  I Feel Good by James Brown
  What a Wonderful World by Louis Armstrong
  Into the Woods by My Morning Jacket
  The Christmas Song
  C'est La Vie
  Labita by Lisa One
  Just the Way You Are by Billy Joel
  Perfect Day by Lou Reed
  When a Man Loves a Woman by Michael Bolton
  Smells Like Teen Spirit by Nirvana
Ranking-Based Emotion Annotation
Emotion tournament
• Requires only n–1 pairwise comparisons
• The global ordering can later be approximated by a greedy algorithm [jair99]
Which song is more positive?
Songs:   a b c d e f g h
Scores:  0 3 1 0 0 7 0 1
f > b > c = h > a = d = e = g
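The eight-song example (a to h) can be reproduced with a single-elimination tournament followed by a count of transitive wins. This is a simplified stand-in and not the greedy algorithm of [jair99]. The `valence` values below are hypothetical and only simulate a listener answering "which song is more positive?".

```python
def tournament(songs, more_positive):
    """Single-elimination tournament using exactly len(songs) - 1 comparisons.

    `more_positive(x, y)` returns whichever song is judged more positive.
    Returns a dict mapping each song to the set of songs it beat directly.
    """
    beaten = {s: set() for s in songs}
    current = list(songs)
    while len(current) > 1:
        next_round = []
        for i in range(0, len(current) - 1, 2):
            x, y = current[i], current[i + 1]
            winner = more_positive(x, y)
            loser = y if winner == x else x
            beaten[winner].add(loser)
            next_round.append(winner)
        if len(current) % 2 == 1:        # odd song out advances without a match
            next_round.append(current[-1])
        current = next_round
    return beaten


def transitive_wins(beaten):
    """Count the songs each song beats directly or through a chain of wins."""
    def reach(song, seen):
        for other in beaten[song]:
            if other not in seen:
                seen.add(other)
                reach(other, seen)
        return seen
    return {s: len(reach(s, set())) for s in beaten}


# Hypothetical hidden valence used to simulate the listener's answers.
valence = {"a": 0.1, "b": 0.7, "c": 0.5, "d": 0.2,
           "e": 0.3, "f": 0.9, "g": 0.0, "h": 0.4}
comparisons = []


def listener(x, y):
    comparisons.append((x, y))
    return x if valence[x] >= valence[y] else y


scores = transitive_wins(tournament(list("abcdefgh"), listener))
ranking = sorted(scores, key=scores.get, reverse=True)
```

With these simulated answers the listener is asked 7 questions (n–1 for n = 8), and the transitive-win counts come out as 0, 3, 1, 0, 0, 7, 0, 1 for a to h, which gives the ordering f > b > c = h > a = d = e = g. Ties remain because songs eliminated early are never compared with each other.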
Online Interface
Simplify Emotion Annotation
Subjective evaluation
• Both rate and rank
• The ordering of rate and rank does not matter
Result
Strong
Weak
Q: Which Features are Relevant? [psy07]
Sound intensity
Tempo
Rhythm
Pitch range
Mode
Consonance
major
Feature Extraction
Melody/harmony [MIR toolbox]
• Pitch estimate, key clarity, harmonic change, musical mode
Spectral [Marsyas]
• Spectral flatness measures, spectral crest factors, MFCCs
Temporal [Sound description toolbox]
• Zero-crossing rate, temporal centroid, log-attack time
Rhythmic [Rhythm pattern extractor]
• Beat histogram and average tempo
Psycho-acoustic motivated features [PsySound]
• Loudness, sharpness, timbral width, volume, spectral dissonance, tonal dissonance, pure tonal, complex tonal, multiplicity, tonality, chord
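To make one of the listed descriptors concrete, here is a minimal sketch of the zero-crossing rate on a list of audio samples. It is a textbook-style definition written for illustration; the toolboxes named above may frame, window, or normalize the signal differently, and the function name is hypothetical.

```python
def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ.

    Zero is treated as non-negative, which is an assumption of this sketch.
    """
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1
        for prev, cur in zip(samples, samples[1:])
        if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(samples) - 1)


# A signal that flips sign at every step has the maximum rate of 1.0,
# while a signal that never changes sign has a rate of 0.0.
noisy = zero_crossing_rate([1.0, -1.0, 1.0, -1.0])
smooth = zero_crossing_rate([0.2, 0.5, 0.9, 0.4])
```

A high rate usually indicates noisy or percussive content, which is why this descriptor appears alongside the other temporal features.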
Data Collection
Q: Subjective Issue
Each circle represents the emotion annotation for a music piece by a subject
Sol: Probabilistic MER [taslp11b]
Predicts the probability distribution P(e|d) of the perceived emotions e of a music piece d
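One simple way to picture a target distribution over the emotion plane is a normalized histogram of the subjects' annotations for one piece. The sketch below is an assumption made for illustration and is not claimed to be the method of [taslp11b]; the grid size, the [-1, 1] value range, and the function name are hypothetical.

```python
def emotion_histogram(annotations, bins=4):
    """Empirical distribution of (valence, arousal) ratings on a bins x bins grid.

    Assumes every rating lies in [-1, 1]. Rows index arousal, columns valence.
    """
    def cell(value):
        # Map [-1, 1] to a bin index 0..bins-1; the value 1.0 goes to the last bin.
        return min(int((value + 1.0) / 2.0 * bins), bins - 1)

    counts = [[0] * bins for _ in range(bins)]
    for valence, arousal in annotations:
        counts[cell(arousal)][cell(valence)] += 1
    total = len(annotations)
    return [[c / total for c in row] for row in counts]


# Four hypothetical subjects annotate the same piece; three agree on the
# positive-valence, positive-arousal quadrant and one hears it as negative.
ratings = [(0.5, 0.5), (0.6, 0.4), (-0.5, 0.5), (0.9, 0.9)]
hist = emotion_histogram(ratings, bins=2)
```

Here `hist[1][1]` is 0.75 and `hist[1][0]` is 0.25, so the disagreement between subjects is kept as probability mass instead of being averaged away.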
Sol: Personalized MER [sigir09]
From P(e|d) to P(e|d,u)
• General regressor → personal regressor
• Utilize user feedback
[Diagram: the same training and test pipeline as in the computational framework, extended with emotion-based retrieval, user feedback, and personalization]
Evaluation Setup
Training data
• 195 Western/Japanese/Chinese pop songs
• 25-sec segment that is representative of the song
  ○ Too long: the emotion may not be homogeneous
  ○ Too short: the listener may not hear enough
Manual annotation
• 253 subjects; each rates 12 songs
• Rate the VA values in 11 ordinal levels
  ○ 0 ○ 1 ○ 2 ○ 3 ○ 4 ○ 5 ○ 6 ○ 7 ○ 8 ○ 9 ○ 10
• Each song is annotated by 10+ subjects
• Ground truth obtained by averaging
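The averaging step can be sketched as follows. The rescaling of the 11 ordinal levels (0 to 10) to [-1, 1] is an assumption added so the result lines up with the emotion plane; the function name and the sample ratings are hypothetical.

```python
from collections import defaultdict


def ground_truth(ratings, levels=11):
    """Average each song's ratings and rescale levels 0..levels-1 to [-1, 1].

    `ratings` is an iterable of (song, valence_level, arousal_level) tuples.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for song, valence, arousal in ratings:
        acc = sums[song]
        acc[0] += valence
        acc[1] += arousal
        acc[2] += 1

    top = levels - 1

    def rescale(mean_level):
        # Assumed mapping: level 0 -> -1, middle level -> 0, top level -> 1.
        return 2.0 * mean_level / top - 1.0

    return {
        song: (rescale(v / n), rescale(a / n))
        for song, (v, a, n) in sums.items()
    }


# Two subjects rate song "s1", one subject rates song "s2".
truth = ground_truth([("s1", 10, 0), ("s1", 8, 2), ("s2", 5, 5)])
```

For "s1" the mean levels are 9 and 1, giving roughly (0.8, -0.8); "s2" sits at the middle level on both scales and maps to (0.0, 0.0).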
Quantitative Result
Result
• R2: squared correlation between y and f(x)
• Valence prediction is challenging
  ○ Valence: 0.25 ~ 0.35
  ○ Arousal: 0.60 ~ 0.85
Method                                        R2 of valence   R2 of arousal
Multiple linear regression                    0.109           0.568
Adaboost.RT [ijcnn04]                         0.117           0.553
SVR (support vector regression) [sc04]        0.222           0.570
SVR + RReliefF (feature selection) [ml03]     0.254           0.609
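The R2 statistic as defined here, the squared correlation between ground truth y and prediction f(x), can be computed as in this sketch. The function name and the convention of returning 0.0 when either series is constant are assumptions of the example.

```python
def r_squared(y_true, y_pred):
    """Squared Pearson correlation between ground truth and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        return 0.0  # assumption: treat an undefined correlation as 0
    return cov * cov / (var_t * var_p)


# Predictions that are a perfect linear function of the truth score 1,
# even when the scale differs; constant predictions score 0.
perfect = r_squared([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
useless = r_squared([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

Because this definition ignores scale and offset, it can differ from the coefficient of determination 1 − SSE/SST, which penalizes biased predictions.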
Qualitative Result
No No No Part 2 - Beyonce
All Of Me - 50 Cent
New York Giants - Big Pun
Why Do I Have To Choose - Willie Nelson
The Last Resort - The Eagles
Mammas Don't Let Your Babies Grow Up To Be Cowboys - Willie Nelson
Live For The One I Love - Celine Dion
If Only In The Heaven's Eyes - NSYNC
I've Got To See You Again - Norah Jones
Bodies - Sex Pistols
You're Crazy - Guns N' Roses
Out Ta Get Me - Guns N' Roses
Missing 1: Temporal Context of Music
“Sweet anticipation” by David Huron
Music’s most expressive qualities probably relate to structural changes across time
Music emotion can also vary within an excerpt [tsmc06]
Missing 2: Context of Music Listening
• Listening mood/context
• Familiarity/associated memory
• Preference of the singer/performer/song
• Social relationship
Conclusion
A computational framework for predicting numerical emotion values
• Generalizes MER from categorical to dimensional
• Resolves some issues of emotion description
• Rank instead of rate
• 2D user interface for music retrieval
Valence & subjectivity
Content & context
Acknowledgement
Prof. Homer Chen, National Taiwan University
Reference
Music Emotion Recognition, CRC Press, 2011
“A regression approach to music emotion recognition,” IEEE TASLP, 2008. (cited by 76)
“Ranking-based emotion recognition for music organization and retrieval,” IEEE TASLP, 2011
“Prediction of the distribution of perceived music emotions using discrete samples,” IEEE TASLP, 2011
“Exploiting online tags for music emotion classification,” ACM TOMCCAP, 2011
“Machine recognition of music emotion: A review,” ACM TIST, 2012