
Wuhan University Journal of Natural Sciences, 2009, Vol.14 No.5, 393-398

Article ID: 1007-1202(2009)05-0393-06

DOI: 10.1007/s11859-009-0505-1

Personalized Emotion Space for Video Affective Content Representation

SUN Kai, YU Junqing†, HUANG Yue, HU Xiaoqiang, LIU Qing

College of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China

Abstract: A personalized emotion space is proposed to bridge the "affective gap" in video affective content understanding. In order to unify the discrete and dimensional emotion models, the fuzzy C-means (FCM) clustering algorithm is adopted to divide the emotion space, and a Gaussian mixture model (GMM) is used to determine the membership functions of the typical affective subspaces. At every step of modeling the space, the inputs rely completely on the affective experiences recorded by the audiences. The advantages of the improved V-A (Valence-Arousal) emotion model are its personalization, its ability to define typical affective state areas in the V-A emotion space, and the convenience of explicitly expressing the intensity of each affective state. The experimental results validate the model and show that it can be used as a personalized emotion space for video affective content representation.

Key words: video affective computing; personalized emotion space; video affective content representation; fuzzy C-means clustering (FCM); Gaussian mixture model (GMM)

CLC number: TP 391.4

Received date: 2008-12-07

Foundation item: Supported by the National Natural Science Foundation of China (60703049), the "Chenguang" Foundation for Young Scientists (200850731353) and the National Post-doctoral Foundation of China (20060400847)

Biography: SUN Kai (1977-), male, Ph.D. candidate, research direction: video affective computing, content-based video retrieval. E-mail: sunkai4u@gmail.com

† To whom correspondence should be addressed. E-mail: [email protected]

0 Introduction

With the proliferation of digital audio-visual data, the challenge of extracting meaningful content from such data sets has driven research and development in content-based video retrieval (CBVR). Video affective computing is one of the latest research areas in CBVR; it draws on both affective computing [1] and CBVR theories to understand video affective content [2,3]. Affective content is an important natural cue that humans use to classify and retrieve information. Recognizing video affective content and using it to automatically label significant affective features potentially enables a new modality for users to interact with video content.

To understand video affective content automatically, the primary task is to transform the abstract concepts of emotion into a form that a computer can handle easily. Furthermore, the emotional experiences inspired by video content vary from individual to individual, so modeling a personalized space to represent video affective content is one of the biggest challenges. There are many psychological emotion models. They can be categorized into two classes: (a) discrete emotion states [4,5] and (b) dimensional continuous emotion spaces [6]. The 2D valence-arousal emotion model [7,8] (V-A model) is a well-known dimensional continuous emotion space that is more precise and general than discrete emotion states. The V-A model allows a smooth passage from one state to another within an infinite set of values. However, practical affective recognizers usually


use a discrete set of typical emotion states to describe affective experiences. It is therefore desirable to define typical emotion state areas in the 2D plane of the V-A model, so as to unify the two main classes of emotion models (discrete and dimensional). Another issue with the V-A model is that it does not allow the intensity of an emotion state to be expressed explicitly. Moreover, it does not cover the personalization issue.

To address the problems mentioned above, a personalized emotion space is presented. The basic idea is to define a set of typical fuzzy emotion subspaces in the V-A emotion space. Each affective state is a point in the V-A plane and is characterized by a fuzzy emotion subspace. The intensity of each state is expressed by the membership function of the fuzzy subspace. By introducing the typical fuzzy emotion subspaces, the proposed emotion space can represent discrete emotion states in the continuous V-A plane. The fuzzy emotion subspaces and their membership functions are modeled from the personalized emotion coordinates annotated by the audiences, which allows personalization to be achieved. The centers, borders, shapes and densities of these subspaces can truthfully reflect the emotional tendencies of the audiences.

1 The Establishment of the Personalized Emotion Space

For the convenience of discussion, the formal descriptions of modeling the emotion space are given as follows.

Let $S = \{e \mid e(v,a) \in \mathbb{R} \times \mathbb{R},\ -1 \leqslant v \leqslant 1,\ -1 \leqslant a \leqslant 1\}$ be the V-A plane (Fig. 1), where $e(v,a)$ is one of the affective states, and $v$ and $a$ denote the intensities of valence and arousal. $E_i \subseteq S\ (i = 1, \ldots, k)$ are the typical fuzzy emotion subspaces, which can be expressed as $E_i = \int_{e \in S} \mu_{E_i}(e)/e\ (i = 1, \ldots, k)$. $T = \{x_1, x_2, \ldots, x_n\}\ (n \in \mathbb{N})$ is the training set of video clips; the corresponding emotion coordinates annotated by the audiences (i.e., the points in the V-A plane) are $S_T = \{e_1, e_2, \ldots, e_n\}$. The coordinate value $(v_i, a_i)$ of $e_i \in S_T$ can be collected with our software tool, which is designed according to the theory of emotional psychology (Fig. 2). The modeling objectives are to define the typical fuzzy emotion subspaces $E_i$ in the V-A plane $S$ and to determine their affective membership functions $\mu_{E_i}(e)$, where $i = 1, \ldots, k$.

Fig. 1 V-A emotion space and the typical emotion subspaces
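
For illustration, the annotations $S_T$ can be held in a simple array structure; the following minimal Python sketch (hypothetical code, with randomly generated values standing in for real annotations) shows one such representation:

```python
import numpy as np

# Each annotated clip is a point e = (v, a) in the V-A plane S = [-1, 1] x [-1, 1].
n = 1037                       # number of annotated video clips (as in Section 2)
rng = np.random.default_rng(0)

# S_T: one audience's annotations as an (n, 2) array of (v, a) coordinates.
# Uniform random values are used here only as a stand-in for real annotations.
S_T = rng.uniform(-1.0, 1.0, size=(n, 2))

# Every coordinate must lie inside the V-A plane S.
assert np.all((S_T >= -1.0) & (S_T <= 1.0))
```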

1.1 FCM-Based Division of the V-A Emotion Space

We use fuzzy C-means clustering (FCM) to define the typical fuzzy emotion subspaces $E_i$ in the V-A plane $S$. FCM, also known as fuzzy ISODATA, is a data clustering algorithm in which each data point belongs to a cluster to a degree specified by a membership grade. Cannon et al. proposed this algorithm [9] as an improvement over K-means clustering.

Our task is to use FCM to define $k$ typical emotion subspaces $E_i\ (i = 1, \ldots, k)$ based on $S_T = \{e_1, e_2, \ldots, e_n\}$. Suppose that the cluster centers of the $k$ subspaces $E_i$ are $c_1, c_2, \ldots, c_k$. We can use a $k \times n$ membership matrix $U$ to express how well every emotion state belongs to the typical subspaces. To accommodate fuzzy partitioning, the elements of $U$ are allowed to take values between 0 and 1. However, normalization stipulates that the degrees of belongingness of an affective state $e_j \in S_T$ always sum to unity:
$$\sum_{i=1}^{k} u_{ij} = 1, \quad \forall j = 1, 2, \ldots, n \qquad (1)$$

We can use $U$ to define the cost function (or objective function) $J$ for FCM:
$$J(U, c_1, \ldots, c_k) = \sum_{i=1}^{k} J_i = \sum_{i=1}^{k}\sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2} \qquad (2)$$
where $u_{ij}$ is between 0 and 1; $c_i$ is the cluster center of the fuzzy typical affective subspace $E_i$; $d_{ij} = \|c_i - e_j\|$ is the Euclidean distance between the $i$th cluster center and the $j$th affective state; and $m \in (1, \infty)$ is a weighting exponent.

The necessary conditions for equation (2) to reach a minimum can be found by forming a new objective function $J_{\text{new}}$ as follows:


$$J_{\text{new}}(U, c_1, \ldots, c_k, \lambda_1, \lambda_2, \ldots, \lambda_n) = \sum_{i=1}^{k}\sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2} + \sum_{j=1}^{n} \lambda_j \left( \sum_{i=1}^{k} u_{ij} - 1 \right) \qquad (3)$$

where $\lambda_j$, $j = 1, \ldots, n$, are the Lagrange multipliers for the $n$ constraints in equation (1). By differentiating $J_{\text{new}}(U, c_1, \ldots, c_k, \lambda_1, \lambda_2, \ldots, \lambda_n)$ with respect to all its input arguments, the necessary conditions for equation (2) to reach its minimum are

$$c_i = \frac{\sum_{j=1}^{n} u_{ij}^{m}\, e_j}{\sum_{j=1}^{n} u_{ij}^{m}} \qquad (4)$$

and

$$u_{ij} = \frac{1}{\sum_{p=1}^{k} \left( \dfrac{d_{ij}}{d_{pj}} \right)^{2/(m-1)}} \qquad (5)$$

The FCM algorithm is simply an iterative procedure that cycles through the two preceding necessary conditions.

Based on the above discussion, the steps of using FCM to define the typical emotion subspaces $E_i$ and the membership matrix $U$ can be summarized as follows:

Step 1 Initialize the membership matrix $U$ with random values between 0 and 1 such that the constraints in equation (1) are satisfied.

Step 2 Calculate the $k$ centers of the typical emotion subspaces $c_i$, $i = 1, \ldots, k$, using equation (4).

Step 3 Compute the cost function according to equation (2). Stop if it is either below a certain tolerance value or its improvement over the previous iteration is below a certain threshold.

Step 4 Compute a new $U$ using equation (5). Go to Step 2.
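
For illustration, Steps 1-4 can be sketched in a few lines of NumPy. The function below is a hypothetical implementation of equations (1)-(5), not the code used in our experiments; the default parameter values simply mirror the settings reported in Section 2.2.

```python
import numpy as np

def fcm(points, k, m=2.0, tol=1e-5, max_iter=100, seed=0):
    """Fuzzy C-means over the annotated (v, a) points, following Eqs. (1)-(5).

    points : (n, 2) array of affective states e_j
    Returns the cluster centers c (k, 2) and the membership matrix U (k, n).
    """
    rng = np.random.default_rng(seed)
    n = points.shape[0]

    # Step 1: random memberships, normalized so each column sums to 1 (Eq. (1))
    U = rng.random((k, n))
    U /= U.sum(axis=0, keepdims=True)

    prev_cost = np.inf
    for _ in range(max_iter):
        Um = U ** m

        # Step 2: cluster centers, Eq. (4)
        c = (Um @ points) / Um.sum(axis=1, keepdims=True)

        # Euclidean distances d_ij between center i and point j
        d = np.linalg.norm(c[:, None, :] - points[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)               # guard against division by zero

        # Step 3: cost function, Eq. (2); stop when the improvement is small
        cost = np.sum(Um * d ** 2)
        if abs(prev_cost - cost) < tol:
            break
        prev_cost = cost

        # Step 4: update the membership matrix, Eq. (5)
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=1)

    return c, U
```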

1.2 GMM-Based Membership Functions of Fuzzy Affective Subspaces

Although the typical emotion subspaces $E_i$ can be fuzzily divided in the V-A plane based on FCM, their continuous membership functions $\mu_{E_i}(e)$ still cannot be determined. A fuzzy set is specified by its membership function, so it is very important to determine the membership functions of the subspaces rationally.

The Gaussian mixture model (GMM) is an effective tool for data modeling and pattern classification [10]. GMM assumes that the data under modeling are generated via a probability density distribution that is the weighted sum of a set of Gaussian PDFs (probability density functions). Our study shows that the distribution of the elements' membership degrees in every subspace fits the assumptions of GMM, i.e., GMM can be used to formulate these membership functions.

Suppose the typical emotion subspace is $E_i = \{e_{ij}\}$ $(i = 1, \ldots, k;\ j = 1, \ldots, s_i)$, where $\sum_{i=1}^{k} s_i = n$. If the distribution of the elements in $E_i$ is similar to an ellipsoid, we can use a single multivariate Gaussian PDF $g(e; \mu, \Sigma)$ to express the PDF of $E_i$:

$$g(e; \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (e - \mu)^{\mathrm{T}} \Sigma^{-1} (e - \mu) \right] \qquad (6)$$

where $\mu$ is the center of the PDF and $\Sigma$ is its covariance matrix. These parameters determine the characteristics of the PDF, such as the center, width and direction of the function. However, the distribution of $E_i$ is not a rigorous single multivariate Gaussian distribution. A flexible solution is the weighted sum of a set of Gaussian PDFs, which can be denoted as:

$$p(e) = \sum_{i=1}^{r} \alpha_i\, g(e; \mu_i, \Sigma_i) \qquad (7)$$

where $r$ is the number of Gaussian PDFs. The parameters in (7) are $(\alpha_1, \ldots, \alpha_r;\ \mu_1, \ldots, \mu_r;\ \Sigma_1, \ldots, \Sigma_r)$, and $\alpha_1, \alpha_2, \ldots, \alpha_r$ should satisfy the constraint condition $\sum_{i=1}^{r} \alpha_i = 1$. We call $p(e)$ the Gaussian mixture model.

To simplify the discussion, we restrict the covariance matrix of each Gaussian PDF to the form:
$$\Sigma_i = \sigma_i^2 I = \sigma_i^2 \begin{pmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{pmatrix}, \quad i = 1, \ldots, r \qquad (8)$$

In this case, the single Gaussian PDF can be expressed as:
$$g(e; \mu, \sigma^2) = (2\pi\sigma^2)^{-d/2} \exp\left[ -\frac{(e - \mu)^{\mathrm{T}}(e - \mu)}{2\sigma^2} \right] \qquad (9)$$

Equation (7) can be rewritten as:
$$p(e) = \sum_{i=1}^{r} \alpha_i\, g(e; \mu_i, \sigma_i^2) \qquad (10)$$
The parameters in equation (10) are
$$\theta = \{\alpha_1, \ldots, \alpha_r;\ \mu_1, \ldots, \mu_r;\ \sigma_1^2, \ldots, \sigma_r^2\}$$
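
To make equations (9) and (10) concrete, a brief sketch (hypothetical helper functions, not part of the original formulation) that evaluates an isotropic Gaussian component and the resulting mixture density at a point $e$ of the V-A plane is given below:

```python
import numpy as np

def isotropic_gaussian_pdf(e, mu, sigma2):
    """Single isotropic Gaussian of Eq. (9), covariance sigma2 * I in d dimensions."""
    d = e.shape[-1]
    diff = e - mu
    sq = np.sum(diff * diff, axis=-1)
    return (2.0 * np.pi * sigma2) ** (-d / 2.0) * np.exp(-sq / (2.0 * sigma2))

def gmm_pdf(e, alphas, mus, sigma2s):
    """Mixture density of Eq. (10): p(e) = sum_i alpha_i * g(e; mu_i, sigma_i^2)."""
    return sum(a * isotropic_gaussian_pdf(e, m, s2)
               for a, m, s2 in zip(alphas, mus, sigma2s))
```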

To compute the optimum estimate of $\theta$, we can use maximum likelihood estimation (MLE) and find the maximum of equation (11):


$$J(\theta) = \ln \prod_{j=1}^{s_i} p(e_j) = \sum_{j=1}^{s_i} \ln p(e_j) = \sum_{j=1}^{s_i} \ln \left[ \sum_{k=1}^{r} \alpha_k\, g(e_j; \mu_k, \sigma_k^2) \right] \qquad (11)$$

Based on the above discussion, the steps of using GMM to determine the membership functions $\mu_{E_i}(e)$ of the fuzzy typical emotion subspaces $E_i$ can be summarized as follows:

Step 1 Initialize the parameter vector
$$\theta = \{\alpha_1, \ldots, \alpha_r;\ \mu_1, \ldots, \mu_r;\ \sigma_1^2, \ldots, \sigma_r^2\} \qquad (12)$$

Step 2 Calculate $\beta_j(e)$, $j = 1, 2, \ldots, r$, using $\theta$, where $\beta_j(e)$ is the posterior probability that the point $e$ was generated by the $j$th Gaussian component.

Step 3 Compute the new $\mu_j'$ according to
$$\mu_j' = \frac{\sum_{k=1}^{s_i} \beta_j(e_k)\, e_k}{\sum_{k=1}^{s_i} \beta_j(e_k)} \qquad (13)$$

Step 4 Compute the new $\sigma_j'^2$ according to
$$\sigma_j'^2 = \frac{1}{d} \cdot \frac{\sum_{k=1}^{s_i} \beta_j(e_k)\,(e_k - \mu_j')^{\mathrm{T}}(e_k - \mu_j')}{\sum_{k=1}^{s_i} \beta_j(e_k)} \qquad (14)$$

Step 5 Compute the new $\alpha_j'$ according to
$$\alpha_j' = \frac{1}{s_i} \sum_{k=1}^{s_i} \beta_j(e_k)$$

Step 6 Let $\theta' = \{\alpha_1', \ldots, \alpha_r';\ \mu_1', \ldots, \mu_r';\ \sigma_1'^2, \ldots, \sigma_r'^2\}$. Stop if $\|\theta' - \theta\|$ is below a certain tolerance value; otherwise let $\theta = \theta'$ and go to Step 2.
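
For illustration, Steps 1-6 amount to an EM-style iteration; the sketch below is a hypothetical NumPy implementation (with $\beta_j(e_k)$ computed as the posterior responsibility of the $j$th component), not the code used in our experiments.

```python
import numpy as np

def fit_membership_gmm(E_i, r=2, tol=1e-5, max_iter=200, seed=0):
    """Fit the isotropic GMM of Eq. (10) to one subspace E_i following Steps 1-6.

    E_i : (s_i, 2) array of the (v, a) points assigned to the subspace
    Returns the mixture weights, component means and isotropic variances.
    """
    rng = np.random.default_rng(seed)
    s_i, d = E_i.shape

    # Step 1: initialize theta = {alpha, mu, sigma^2} (Eq. (12))
    alphas = np.full(r, 1.0 / r)
    mus = E_i[rng.choice(s_i, size=r, replace=False)]
    sigma2s = np.full(r, E_i.var())

    for _ in range(max_iter):
        # Step 2: responsibilities beta_j(e_k) of every component for every point
        diff = E_i[None, :, :] - mus[:, None, :]                    # (r, s_i, d)
        sq = np.sum(diff * diff, axis=2)                            # (r, s_i)
        comp = alphas[:, None] * (2 * np.pi * sigma2s[:, None]) ** (-d / 2) \
               * np.exp(-sq / (2 * sigma2s[:, None]))
        beta = comp / np.maximum(comp.sum(axis=0, keepdims=True), 1e-300)

        # Step 3: new means, Eq. (13)
        new_mus = (beta @ E_i) / beta.sum(axis=1, keepdims=True)

        # Step 4: new isotropic variances, Eq. (14)
        diff = E_i[None, :, :] - new_mus[:, None, :]
        new_sigma2s = np.sum(beta * np.sum(diff * diff, axis=2), axis=1) \
                      / (d * beta.sum(axis=1))

        # Step 5: new mixture weights
        new_alphas = beta.sum(axis=1) / s_i

        # Step 6: stop once the overall parameter change is below the tolerance
        change = (np.abs(new_alphas - alphas).sum()
                  + np.linalg.norm(new_mus - mus)
                  + np.abs(new_sigma2s - sigma2s).sum())
        alphas, mus, sigma2s = new_alphas, new_mus, new_sigma2s
        if change < tol:
            break

    return alphas, mus, sigma2s
```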

2 Experimental Results and Discussion

2.1 Video Affective Content Database

Video affective computing is a hot but fairly new

research topic in CBVR, which still lacks a standard

video affective content database (VACDB) to validate

our proposed video affective semantic space. We choose

movies to create our VACDB because they have rich

affective contents. Based on the statistical figures of theIMDB

[11], we select 46 typical movies as the source of

affective video clips in VACDB. The total length of

these movies is 84 hours 43 minutes 4 seconds. These

movies can be classified into 6 genres: 9 animations, 10

actions, 11 dramas, 7 science fictions, 3 horrors and 6

comedies.

The ground truth for the 6 typical affective contents, i.e., joy, tension, fear, relaxation, sadness and neutral, was manually determined within the 46 movies. If a video clip was labeled with the same affective content by at least 6 of 9 researchers, we assigned that clip to the corresponding affective content. Finally, we selected a total of 1 037 video clips from the 46 movies to create the VACDB.

The total length of the 1 037 clips is 10 hours, 42 minutes and 41 seconds. We chose 4 audiences (A1, A2, A3 and A4) of different fields, genders, ages and backgrounds to record the coordinate values of these 1 037 video clips. The coordinate values were recorded with the affective content annotation tool (Fig. 2), which is designed according to the theory of emotional psychology. We did not tell the audiences the emotion labels of the movie clips. Each audience watched every movie clip and recorded its emotion coordinate based on their affective experience, i.e., the intensities of their valence and arousal (V-A). After the steps mentioned above, we obtained 4 sets of the 1 037 emotional coordinates of these movie clips, which were used to validate our proposed emotion space.

Fig. 2 Video affective content annotation tool


2.2 Experimental Results

Modeling our emotion space has two steps: (1) defining the fuzzy typical affective subspaces in the V-A plane based on FCM and (2) determining the affective membership functions of these subspaces based on GMM.

In the first step of our experiment, the inputs of the FCM are the 1 037 emotional coordinates labeled by the audiences, i.e., $S = \{e_i\}$, $e_i(v_i, a_i) \in \mathbb{R} \times \mathbb{R}$, $v_i \in [-1, 1]$ and $a_i \in [-1, 1]$, $i = 1, 2, \ldots, 1\,037$. The number of typical affective subspaces (i.e., clusters) is 6. The weighting exponent $m$ of the cost function $J$ is 2. The maximum number of iterations and the tolerance value in FCM are 100 and $1 \times 10^{-5}$, respectively. Fig. 3 (a), (b), (c) and (d) demonstrate the division results of the V-A plane. The centers, borders, shapes and densities of the 6 typical affective subspaces in Fig. 3 (a), (b), (c) and (d) are different, which shows that our modeling method can characterize the personalized emotion experiences of the audiences.

Fig. 3 The 4 partitioning results of the V-A plane based on FCM
(a), (b), (c) and (d) are plotted according to the coordinate values recorded by A1, A2, A3 and A4, respectively
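
In terms of the hypothetical fcm sketch from Section 1.1, this experimental setting corresponds to a call of the following form (S_T being one audience's 1 037 annotated coordinates):

```python
# Hypothetical usage with the settings reported above: 6 typical affective
# subspaces, weighting exponent m = 2, tolerance 1e-5 and at most 100 iterations.
centers, U = fcm(S_T, k=6, m=2.0, tol=1e-5, max_iter=100)
print(centers)    # one (v, a) center per typical affective subspace
```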

The second step uses GMM to formulate the membership functions of the subspaces. Because the performance of GMM is highly related to its number of mixtures, we designed another experiment to find the best number of mixtures in GMM. We partitioned $S$ into a design set (DS, 519 samples) and a test set (TS, 518 samples). DS was used for training and TS was used for testing. We found that the training and test recognition rates both reached their highest points when the number of mixtures in GMM was 2. Therefore, we chose 2 mixtures to model the GMM. Fig. 4(a) shows the six 3-D membership functions determined by the GMM in our experiments, and Fig. 4(b) shows the labeled 2-D membership functions (the coordinate values used for modeling were recorded by A1; the membership functions based on the coordinate values recorded by A2, A3 and A4 are similar). Feeding the 1 037 coordinate values into these 6 membership functions, the average recognition rate reaches 97.8%. The experimental results show that our proposed emotion space can represent video affective content very well.

Fig. 4 GMM-based membership functions of the typical emotion subspaces
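
One plausible reading of this recognition step is that each annotated coordinate is assigned to the subspace whose membership function returns the largest value; a minimal sketch of such a decision rule, reusing the hypothetical gmm_pdf and fit_membership_gmm helpers from Section 1, is shown below.

```python
import numpy as np

# Hypothetical recognition rule: label a clip's (v, a) coordinate with the
# typical affective subspace of highest membership value.
LABELS = ["joy", "tension", "fear", "relaxation", "sadness", "neutral"]

def recognize(e, fitted_models):
    """fitted_models: one (alphas, mus, sigma2s) tuple per typical subspace."""
    scores = [gmm_pdf(e, a, m, s2) for a, m, s2 in fitted_models]
    return LABELS[int(np.argmax(scores))]
```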

2.3 Discussions

The proposed personalized emotion space has two prominent characteristics. On the one hand, the emotion space originates from the well-known 2D V-A emotion model, which is continuous and can express infinite affective states in the V-A plane. Meanwhile, we define $k$ typical emotion subspaces ($k = 6$ in the experiments) based on fuzzy set theory. By introducing the typical fuzzy emotion subspaces, the proposed emotion space can also represent discrete affective states in the continuous V-A plane. Therefore, our emotion space successfully unifies the discrete and dimensional


emotion models of psychological theory. On the other hand, at every step of modeling the emotion space, the inputs rely completely on the affective experiences recorded by the audiences. The centers, borders, shapes and densities of these subspaces can truthfully reflect the emotional tendencies of the audiences, which means that our proposed space covers the personalization issues. As more video clips are annotated, the emotion space will become more and more audience-oriented.

3 Conclusion

Modeling a personalized emotion space and utilizing it to represent and recognize video affective content is one of the most important problems in video affective computing. Filling this theoretical gap is the main objective of this paper. The experimental results demonstrate that the proposed emotion space can serve as an overall solution to this problem.

All of this is the foundation for our future work. To understand video affective content automatically, it is desirable to design a set of video affective features that relate video clips to emotion coordinate values based on our emotion space.

References

[1] Picard R. Affective Computing [M]. Cambridge: MIT Press, 1997.
[2] Hanjalic A, Xu L Q. Affective Video Content Representation and Modeling [J]. IEEE Transactions on Multimedia, 2005, 7(1): 143-154.
[3] Hanjalic A. Extracting Moods from Pictures and Sounds: Towards Truly Personalized TV [J]. IEEE Signal Processing Magazine, 2006, 23(2): 90-100.
[4] Ekman P. Are There Basic Emotions? [J]. Psychological Review, 1992, 99(3): 550-553.
[5] Ortony A, Clore G L, Collins A. The Cognitive Structure of Emotions [M]. Cambridge: Cambridge University Press, 1988.
[6] Russell J A. A Circumplex Model of Affect [J]. Journal of Personality and Social Psychology, 1980, 39(6): 1161-1178.
[7] Lang P J, Bradley M M, Cuthbert B N. International Affective Picture System (IAPS): Instruction Manual and Affective Ratings [EB/OL]. [2008-04-15]. http://www.unifesp.br/dpsicobio/adap/instructions.pdf.
[8] Wang L H, Cheong L F. Affective Understanding in Film [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2006, 16(6): 689-704.
[9] Cannon R L, Dave J V, Bezdek J C. Efficient Implementation of the Fuzzy C-Means Clustering Algorithms [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986, 8(2): 248-255.
[10] Zhang Z X. Data Clustering and Pattern Recognition [EB/OL]. [2008-04-15]. http://neural.cs.nthu.edu.tw/jang/books/dcpr/doc/08gmm.pdf.
[11] The Internet Movie Database (IMDB) [EB/OL]. [2008-04-15]. http://www.imdb.com/chart/top.