[ieee 2013 international conference on cyberworlds (cw) - yokohama, japan (2013.10.21-2013.10.23)]...

8
Improving Structured Content Design of Digital Signage Evolutionarily Through Utilizing Viewers’ Involuntary Behaviors Ken Nagao and Issei Fujishiro Graduate School of Science and Technology Keio University Yagami, Kohoku-ku, Yokohama, Kanagawa 223-8522, Japan {nagao, fuji}@fj.ics.keio.ac.jp Abstract—Digital signage has been getting more popularity due to the recent development of underlying hardware technol- ogy and improvement in installing environments. Any signage is required to make its content more attractive to viewers by evaluating the current attractiveness on the fly, in order to deliver the message from the sender more effectively. However, most previous methods for signage require time to reflect the viewers’ evaluations. In this paper, we present a novel, viewer-adaptive digital signage displaying academic conference posters, which automatically learns what structure and content are being attractive through the viewers’ involuntary behaviors, and makes them more adapted to the viewers. This system takes away the current gap between viewers’ evaluations and the content actually displayed on digital signage, and makes efficient mutual transmission of information between the cyberworld and the reality. Keywords-digital signage; image processing; genetic algorith- m; human behavior recognition I. I NTRODUCTION Recent developments of underlying hardware technology and improvements in installing environments have facilitated the dissemination of digital signage. There can be found many advantages to use digital signage compared to tradi- tional paper signage. Especially, digital sigange can interact with viewers and thus has recently attracted much attention as a new interaction device from the field of computer vision. In order to deliver the original message effectively, any signage has to provide “attractive” content to its potential viewers. Therefore, it is important to survey viewers’ eval- uations towards the signage and modify the appearance of the content accordingly. In general, such survey and modifi- cation are carried out by humans because it is necessary to allow for the subjective matters as well as aesthetic issues. This imposes them tedious tasks, which brings on the gap between viewers’ evaluations and the content actually displayed on digital signage. Direct interaction between viewers and digital signage can be a solution for this, while most current systems rely on viewers’ voluntary interactions with digital signage. In order to remove the gap, we propose an automatic method for the survey and modification. Our method utilizes viewers’ involuntary behaviors in front of a digital signage. For example, a viewer may change his head angle for reading each part of the content. If the part was convincing he would nod or if it was unacceptable he would shrug his shoulders. Such involuntary behaviors come directly from his emotion, and hence, if the digital signage system can capture and recognize them, it can automatically estimate his actual feeling towards each part of the content. In addition to this, the gaze trajectory of the viewer suggests an ideal structure of the content to the system. Not only the survey but also the modification of the content should be done automatically. Our method utilizes an idea of genetic algorithm (GA): mutation and crossover. Through the evolutionary algorithm, the system can auto- matically learn “what structure and content are attractive” for the modification. In the system, “content designs” are regarded as “individuals” in the GA. With these novel functionalities, the system takes away the gap, and produces a new comfortable style of transmitting information. In our previous position paper [1] and poster [2], we have proposed the basic idea, while they do not include practical algorithms, nor does not take into account learning attractive structure of a content. Through the course of this study, we have recently extended our method so as to include the utilization of viewers’ attention point trajectories for learning attractive structure of a content. In order to prove the effectiveness of the proposed method, we have developed a system for automatically making academic conference posters more adapted to viewers by utilizing viewers’ involuntary behaviors. Figure 1 shows an example of adaptive conference poster. The proposed system in actual use can be seen in Figure 2, and Figure 3 illustrates how the viewers’ behaviors are recognized. II. RELATED WORK So far, direct questions to viewers, such as interviews or questionnaires, have been commonly used for evaluating a content of digital signage [3]. One major drawback in such direct questions is that much human labor is required for gathering and analyzing the sufficient evaluation data. In addition to this, it is necessary to check up the validity of the answers because they do not always tell their actual feeling in such situations. 2013 International Conference on Cyberworlds 978-1-4799-2245-1/13 $31.00 © 2013 IEEE DOI 10.1109/CW.2013.68 108

Upload: issei

Post on 25-Dec-2016

221 views

Category:

Documents


2 download

TRANSCRIPT

Improving Structured Content Design of Digital SignageEvolutionarily Through Utilizing Viewers’ Involuntary Behaviors

Ken Nagao and Issei Fujishiro

Graduate School of Science and TechnologyKeio University

Yagami, Kohoku-ku, Yokohama, Kanagawa 223-8522, Japan{nagao, fuji}@fj.ics.keio.ac.jp

Abstract—Digital signage has been getting more popularitydue to the recent development of underlying hardware technol-ogy and improvement in installing environments. Any signageis required to make its content more attractive to viewers byevaluating the current attractiveness on the fly, in order todeliver the message from the sender more effectively. However,most previous methods for signage require time to reflectthe viewers’ evaluations. In this paper, we present a novel,viewer-adaptive digital signage displaying academic conferenceposters, which automatically learns what structure and contentare being attractive through the viewers’ involuntary behaviors,and makes them more adapted to the viewers. This systemtakes away the current gap between viewers’ evaluationsand the content actually displayed on digital signage, andmakes efficient mutual transmission of information betweenthe cyberworld and the reality.

Keywords-digital signage; image processing; genetic algorith-m; human behavior recognition

I. INTRODUCTION

Recent developments of underlying hardware technology

and improvements in installing environments have facilitated

the dissemination of digital signage. There can be found

many advantages to use digital signage compared to tradi-

tional paper signage. Especially, digital sigange can interact

with viewers and thus has recently attracted much attention

as a new interaction device from the field of computer vision.

In order to deliver the original message effectively, any

signage has to provide “attractive” content to its potential

viewers. Therefore, it is important to survey viewers’ eval-

uations towards the signage and modify the appearance of

the content accordingly. In general, such survey and modifi-

cation are carried out by humans because it is necessary

to allow for the subjective matters as well as aesthetic

issues. This imposes them tedious tasks, which brings on the

gap between viewers’ evaluations and the content actually

displayed on digital signage. Direct interaction between

viewers and digital signage can be a solution for this, while

most current systems rely on viewers’ voluntary interactions

with digital signage. In order to remove the gap, we propose

an automatic method for the survey and modification.

Our method utilizes viewers’ involuntary behaviors in

front of a digital signage. For example, a viewer may change

his head angle for reading each part of the content. If the part

was convincing he would nod or if it was unacceptable he

would shrug his shoulders. Such involuntary behaviors come

directly from his emotion, and hence, if the digital signage

system can capture and recognize them, it can automatically

estimate his actual feeling towards each part of the content.

In addition to this, the gaze trajectory of the viewer suggests

an ideal structure of the content to the system.

Not only the survey but also the modification of the

content should be done automatically. Our method utilizes

an idea of genetic algorithm (GA): mutation and crossover.

Through the evolutionary algorithm, the system can auto-

matically learn “what structure and content are attractive”

for the modification. In the system, “content designs” are

regarded as “individuals” in the GA.

With these novel functionalities, the system takes away the

gap, and produces a new comfortable style of transmitting

information. In our previous position paper [1] and poster

[2], we have proposed the basic idea, while they do not

include practical algorithms, nor does not take into account

learning attractive structure of a content. Through the course

of this study, we have recently extended our method so

as to include the utilization of viewers’ attention point

trajectories for learning attractive structure of a content. In

order to prove the effectiveness of the proposed method, we

have developed a system for automatically making academic

conference posters more adapted to viewers by utilizing

viewers’ involuntary behaviors. Figure 1 shows an example

of adaptive conference poster. The proposed system in actual

use can be seen in Figure 2, and Figure 3 illustrates how

the viewers’ behaviors are recognized.

II. RELATED WORK

So far, direct questions to viewers, such as interviews or

questionnaires, have been commonly used for evaluating a

content of digital signage [3]. One major drawback in such

direct questions is that much human labor is required for

gathering and analyzing the sufficient evaluation data. In

addition to this, it is necessary to check up the validity of the

answers because they do not always tell their actual feeling

in such situations.

2013 International Conference on Cyberworlds

978-1-4799-2245-1/13 $31.00 © 2013 IEEE

DOI 10.1109/CW.2013.68

108

In many academic fields, there have been various attempts

to detect human behaviors. Some works achieved to recog-

nize human behaviors in a non-contact manner. For example,

there are many known attempts to recognize human faces in

video [5] [6], and Microsoft KinectTMenables non-contact

motion captures in an inexpensive way. The reliability of the

motion captured by Microsoft KinectTMis verified through

previous attempts of research and development [7].

These methods can gather viewers’ behaviors without

making them aware of the system, and thus have been

becoming to be used for content evaluation of digital sig-

nage. Some digital signage systems have recently started to

use a fixed video camera for gathering viewers’ evaluations

towards the content. Those systems mainly gather demo-

graphics segmentation, such as gender, age and viewing

time [8]. However, those data are not necessarily related to

the viewers’ feelings, and these systems cannot distinguish

whether the viewer has negative or positive feeling towards

the content. In addition, those systems cannot detect where

the viewers are looking at, and employed evaluations are

not enough to analyze which section of the content is

attractive or not. There are many known works to utilize

gaze trajectory for analyzing the structure of a Web page

[9], but there are few research attempts towards the content

of digital signage as far as we know.

The question on the attractive content design has been

a difficult question because it involves many subjective

issues encompassing from psychology to other disciplines.

There have been some research challenges into tackling this

problem. For example, Singh and Bhattacharya proposed an

algorithm to improve the aesthetics of Web interface utilizing

a GA [10]. For the evaluation in the GA, they used more

than ten geometrical features of page layout. However, the

approach does not take user’s evaluation to the Web interface

into primary account, and they concluded that it is difficult

to define the cases in which the approach works well.

Moreover, in the case of digital signage, it should be

considered that local tastes towards the content may be

different depending on the place where it is installed. There

are some known digital signage systems which enable inter-

action with viewers through utilizing interactive devices such

as touch panels. This makes it possible for the systems to

directly show them contents which they want to see, while

the systems have to wait for the explicit interaction from

them, which would not work efficiently.

Considering pros and cons of these methods, we have

proposed an idea of making digital signage more attractive

utilizing viewers’ involuntary behaviors [1] [2]. To the best

of our knowledge, this was the only one attempt to utilize

viewers’ involuntary behaviors for automatically making

digital signage more attractive. We took advantages of a

GA, and presented a method which evolutionarily produces

attractive content designs. However, the users of the system

are required to preliminarily define a hierarchical content

structure they want to show, and modification towards digital

signage is limited in terms of structure, while the defined

structure may not fit local viewers’ evaluations. On the top of

that, the position paper and poster do not include sufficiently

pragmatic algorithms.

In order to ameliorate these problems, our extended

method utilizes viewers’ involuntary behaviors not only for

evaluating each section of the content but also for learning

an ideal, overall structure of the content through utilizing

viewers’ attention point trajectories.

(a) A randomly-generated initial design in a first generation.

(b) An evolved design in the seventh generation.

Figure 1. An example of adaptive conference poster.

Figure 2. A snapshot of the proposed system in actual use.

III. APPROACH

As shown in Figure 4, the main processing flow in our

system is composed of two steps: (a) evaluation step and

(b) modification step. Hereafter, we will refer to a person

109

Figure 4. The processing flow of the proposed method. First, based on the defined content by a sender, the system randomly creates a first generation ofcontent designs. (a) The system displays each content design of the same generation in turn. When each content design is displayed, the system recognizesviewers’ involuntary behaviors and surveys the evaluations. (b) After all content designs are evaluated, they are sorted in an evaluation order. Then thesystem breeds new content designs through mutation and crossover, considering the evaluations. Also, some highly-ranked content designs are inherited tothe next generation. After generating new content designs of the next generation, the system displays them again for re-evaluations. Repeating these steps,the system evolutionarily produces more attractive content designs. If an acceptable content design is reached, the evolution terminates.

Figure 3. An example of recognition of viewers’ behaviors. The systemuses Microsoft KinectTMand Kinect for Windows SDK for motion captureand face recognition.

who uses this system for delivering his messages to viewers

simply as “sender”.

Firstly, a sender is required to give the system what he

wants to show to the public in a certain format. Using these

contents, the system randomly creates some content designs,

which constitute a first generation of content designs.

Next, the system starts to display them to the public

in turn. While each content design is being displayed, the

system recognizes the viewers’ involuntary behaviors in

front of the digital signage for evaluating each section and

the sectional structure of each content design. Without the

need to wait for the viewers’ voluntary interactions, the

system can gather evaluations from a number of viewers.

In addition to this, those behaviors come directly from their

actual feelings towards the content designs, and thus it is not

necessary to check up the validity of the evaluation. “Invol-

untary behaviors” include gaze-related information such as

head pose and pointing fingers. Therefore, our system can

understand which section of the content is attractive from

them. Also, how the viewer’s attention point moves tells the

system the relationships among the sections.

Using these analysis results as a guide, the system gen-

erates the next generation of content designs in two ways:

mutation and crossover. The next generation also includes

the top-ranked content designs in the current generation.

After that, the system displays the new content designs once

again for viewers to re-evaluate them. Repeating these steps,

the system evolutionarily produces more attractive content

designs, and reaches an ideal content design which fits local

tastes. This comes from an idea of GA, which is useful for

solving problems whose structure is not well understood and

for which no exact solution is found, such as the problem

of seeking “what structure and content are attractive”. Also,

GA has an ability to overwhelm manual designs.

Through the proposed method, the system automatically

learns “what structure and content are attractive” and mod-

ifies the content based on the evaluations, so the senders

do not need to analyze the local tastes, search for attractive

designs, nor modify the content manually.

Figure 1 illustrates one example of evolved content de-

signs. In the evolved content design, each of the figures in

the poster is independently adjusted in terms of size, and

each section is positioned based on its relationship with the

others. Also, some sections are modified for its attractiveness

in the evolved content design.

110

A. Defining a Content

First, the sender is required to make it clear what he

wants to show to the public. Here, the sender needs to

divide the content into smaller sections. For example, an

academic conference poster can actually be divided into

“title”; “authors”; “section titles”; “content of sections”; and

“reference figures”. Hereafter, we will simply refer to the

overall design of the content as content design, and small

content sections defined in this step as sections.

Each section is required to have two kinds of properties:

content properties and graphical properties. Content prop-

erties are what the sender wants to tell to the public in

the corresponding section. For example, in the section of

“authors”, content properties are specified as the names of

the authors. On the other hand, graphical properties indicate

how to decorate the section, such as font; font color; frame

color; and size of the section. Based on these properties, the

system creates the first generation of content designs.

B. Evaluation Step

In this step, the system displays the content designs of the

same generation to the public in turn. Here, the content

design will switch to another when a fixed time duration

runs over and at that point no viewer is looking at it.

While each content design is being displayed, the system

gathers the viewers’ evaluation towards the content design

by recognizing their involuntary behaviors, and analyzing

the content design.

1) Recognizing Viewers’ Involuntary Behaviors: In front

of digital signage, the viewers behave in various ways. It is

important that the system recognizes the behaviors without

making the viewers aware of the system, or their behaviors

would be biased. Viewers’ behaviors should be “involuntary”

and come directly from their actual feelings. Therefore, in

addition to the necessity of capturing the behaviors in a non-

contact way, the device should be hidden from the viewers.

In order to make these behaviors easy to understand, we

define two types of behaviors: behaviors indicating viewers’

attention point (attention pointer) and behaviors indicating

viewers’ feeling (feeling indicator).

Attention pointers are behaviors such as changing head

angle and pointing gesture. Recognizing these behaviors and

considering the position where the viewer is standing and the

size of digital signage, the system can estimate which section

the viewer is looking at. In addition to this, the trajectory of

the viewer’s attention point tells the system the relationships

among the sections.

On the other hand, feeling indicators are gestures such

as folding arms; nodding; and shrugging shoulders, which

are deeply related to the viewers’ emotions. Finding the

relationships between these behaviors and feelings has been

studied extensively [4]. Our system can learn the meaning

of the behaviors referring to such a kind of database of our

own that indicates the relationships.

When recognizing these behaviors, the system needs to

take the diversities of the viewers into account. One point is

a difference in moving speed. For example, when the viewer

is a senior, he behaves slowly, while most students behave

quickly. The system needs to consider an average motion

speed of a viewer, and changes the weight of evaluation

according to the motion speed. Another point is a difference

in culture. Meanings of viewers’ behaviors can be different

depending on cultures. Considering that previous research

on the detailed meaning of gestures has limited capability

because human affection is a subjective matter as mentioned

before, we herein focus only on one culture, Japanese, and

then classify the meanings just according to whether the

gesture has positive or negative meaning.

2) Surveying Evaluations of Each Section: Based on

these behaviors, in each content design, the system surveys

and analyzes the gathered evaluations of each section in two

aspects: how eye-catching and unconvincing the section is.

The former one is defined as the total of time duration

when viewers are looking at the section, no matter how they

behave. This time is given to each section respectively and

normalized by the size of the section. This indicates how

eye-catching the section is, without reference to viewers’

feelings towards its content properties. We will refer to this

evaluation as eye-catching evaluation of the section. The

latter one is defined as the total of time duration when the

viewers are looking at the section with negative feeling. In

the same way as eye-catching evaluation, this time is given

to each section and normalized by the size. We will refer to

this evaluation as unconvincing evaluation of the section.

3) Surveying Evaluations of Overall Structure: Unlike

the previous method [1] [2], the new system does not require

the sender to define the relationships between the sections of

the content, which are important for the layout of the content

design. One advantage of this is that it reduces human labors.

The sender only needs to define what he wants to show to

the public, and does not need to analyze the structure. The

other point is that there may be a more ideal structure rather

than the one the sender thinks the content has. For example,

in the case of academic conference posters, the relationships

between a section “system overview” and reference images

can be different. If there were professional people in the

field of the content in a place where the digital signage

is installed, they would find the relationships between the

section “system overview” and a flowchart image, while

if there were nonprofessional people, they would find the

relationships between the section and execution example

images. An appropriate structure depends on a situation

where digital signage is set, and the system needs to have

an allowance for this.

The system surveys the sectional structure evaluations

utilizing the trajectory of a viewer’s attention point. A viewer

changes his attention point while he reads the content of

digital signage, and the trajectory is based on the roles

111

and meanings of the sections. Therefore, by recognizing the

trajectory, the system can estimate the relationships between

sections. Figure 5 shows an example of relationships be-

tween sections found by the system, connecting each section

with a section which was estimated to have the biggest

relationship.

Figure 5. Based on the trajectories of viewers’ attention point, the systemestimates the structure of the content design.

C. Modification Step

After all content designs are evaluated, they are sorted

in an evaluation order based on the time duration each

content design was looked at without unconvincing feelings.

Then, the system breeds new content designs utilizing the

idea of GA. In this step, the system needs to consider the

relationships which the system has analyzed. In addition to

this, the system should not change the property values of

each section in an equally-weighted manner, because the

weight of information each section has can be different.

For example, in the case of academic conference posters,

changing the appearance of the title extensively affects the

look and feel of the content design more than changing that

of section title. Keeping such information inertia in mind,

our system makes modifications to sections with heavy

information such as the title smaller than other sections with

light information.

The modification is performed in two ways: mutation and

crossover. Figure 6 shows examples of the breeding.

1) Mutation: In this step, the system breeds some new in-

dividuals by changing the property values of each section in

the highest-ranked content design. In the system, mutation is

not executed at random. Here, the system takes into account

the evaluations of each section and sectional structure.

The system uses eye-catching evaluation and unconvinc-ing evaluation for the guide of mutation to each section. A

section which got a high value of eye-catching evaluation is

a section which the viewers found worth looking at. There-

fore, to emphasize the section is important for making the

content design more attractive, and thus the system randomly

changes its graphical properties. The system does not need to

consider what kind of change should be done to the graphical

properties for attractiveness, because unattractive individuals

will be exterminated later.

A section which got a high value of unconvincing eval-

uation is a section whose content property is thought to be

unconvincing by the viewers, and thus content properties

of the section should be changed for making the content

easier to understand, especially in the case the section

got a high value of eye-catching evaluation. In the same

way as before, the system does not need to consider what

kind of change should be done to the content properties

for making the section more convincing, and the system

randomly selects the value from the pre-defined values of

the content properties.

In addition to these modifications to each section, the

system modifies the layout of content design based on the

analysis about the relationships of sections. The system puts

up a layout where mutually-related sections are set closer.

In Figure 6, the section “system overview” is emphasized

because it gathered much attention from the viewers. On

the other hand, the sentences of the section “approach” are

replaced with other pre-defined sentence patterns because

the previous sentences were thought to be unconvincing by

the viewers. In addition to these, the sections which were

thought to have the relationships were set closer.

2) Crossover: In addition to those by mutation, the

system breeds new individuals by crossover. From highly-

ranked content designs, the system selects two content

designs for crossover, and in crossover, the system breeds

new content designs by combining property values of each

section in the two content designs.

In Figure 6, the system breeds a new content design whose

properties come from the two previous content designs.

IV. ALGORITHM

In this section, each of the steps is detailed by describing

its algorithms.

Figure 6. Examples of mutation and crossover of designs.

112

A. Defining a Content

Each section is given its section number. When the total

number of section is n, each section is given one sectionnumber from 1 to n respectively. Here, the sender defines

what values the content properties can take, because there

exist various ways to express the message and the current

system does not have natural language processing capability

for producing new sentences. On the other hand, values

of graphical properties are given to each by the system

automatically.

Giving random values to graphical properties and random-

ly selecting the pre-defined values for content properties of

each section, the system creates a fixed number of initial

content designs. Layout of each content design is also

randomly set, while there is a constraint which requires the

section “title” and “authors” to be set at the top of a poster so

that each content design satisfies minimal requirements for

a layout of an academic conference poster. These generated

content designs constitute the first generation of content

designs. The sender can define the total number N of the

content designs of the same generation. Here, the system

gives each content design one content design number from

1 to N respectively.

B. Evaluation Step

For capturing the viewers’ behaviors, our system uses

Microsoft KinectTMand Kinect for Windows SDK 1.7, which

allow us to capture various behaviors such as head pose;

facial expression; and gestures of multiple viewers involun-

tarily (See Figure 3).

By categorizing these behaviors as attention pointers or

feeling indicators, the system gathers eye-catching evalua-

tions and unconvincing evaluations. These evaluations are

approximated by Gaussian kernels. From attention pointers,

the system detects where the viewers are looking at in the

content design, and regard these as the center of Gaussian

distribution respectively. As you can see in Figure 7, the

system gives values to the content design based on the

distributions and accumulates the values until the content

design is changed. Meanings and weights of the values

are defined by feeling indicators and considered in the two

aspects as mentioned above. Therefore, when evaluations are

finished, for each content design, there are two heightmaps:

one indicates eye-catching evaluation and the other indi-

cates unconvincing evaluation. Then, in each heightmap,

the system distributes and normalizes the values to each

section based on the sizes and positions of the sections (See

Figure 7). Here, let Ej,k be the total amount of eye-catching

evaluation which has been given to the k-th section of the

j-th content design. In a similar way, let U j,k be the total

amount of unconvincing evaluation which has been given to

the k-th section of the j-th content designs.

In addition, the system recognizes the trajectories of

viewers’ attention points and estimates the relationships

Figure 7. An example of gathering evaluation. The system uses Gaussiandistributions and accumulates each evaluation by viewers until displayingthe content design finishes. There is the other heightmap which indicatesthe other evaluation.

between sections. Let F vx (x = 1 · · ·n) be a value which

indicates how long a viewer has been focusing on the x-th

section, where v indicates the identifier of the viewer given

by the system. Initially, when one content design starts to be

displayed, for any v, F v1 , F

v2 , · · · , F v

n are set to zero. When

the viewer v is looking at the f -th section, F vf increases

according to the looking time, and when the section is not

looked at, it decreases slowly until it comes to zero as time

goes. In this way, the system memorizes which sections the

viewer was looking at before. For example, if F vf was the

biggest among F v1 , F

v2 , · · · , F v

n , this means that the viewer

v was looking mainly at the f -th section before.

On the other hand, each section has values which indicate

the relationships with the other sections. Let Rj,kx (x =

1 · · ·n) be the values which the k-th section of the j-

th content design has. When some viewer is looking at

the f -th section when the j-th content design is dis-

played, Rj,f1 , Rj,f

2 , · · · , Rj,fn increase correspondingly based

on F1, F2, · · · , Fn the viewer has, and accumulate the values

by every viewer:

for any v ΔRj,fx = F v

x (x �= f, x = 1 · · ·n)From these values, the system estimates the degrees

of the relationships among sections. For example, when

displaying the number j content design is finished, if

max{Rj,f1 , · · · , Rj,f

n } is Rj,fm , this means that the f -th

section has the biggest relationship with the m-th section

in the j-th content design. Figure 8 illustrates an example

of calculating the relationships among the sections.

C. Modification Step

A probability that the modification of graphical property

is applied to a section (let pj,k be the probability of the

k-th section of the j-th content design) corresponds to the

calculated eye-catching evaluation value of each section:

pj,k =Ej,k

max{Ej,1, Ej,2, · · · , Ej,n} × aggressiveness,

113

Figure 8. An example of calculating the relationships among the sections,where viewer 1 is looking at the 3rd section of the j-th content design.The values which indicate the relationships between the 3rd section andthe other sections will increase according to the values indicating whichsection viewer 1 was looking at before mostly.

where aggressiveness means how aggressive modifications

are applied. This value can be defined by the sender. On

the other hand, considering both eye-catching evaluation and

unconvincing evaluation, a probability that the modification

of content property is applied to the section (let p′j,k be

the probability in same way as pj,k) can be formulated as

follows:

p′j,k =U j,kpj,k

max{U j,1, U j,2, · · · , U j,n} .

Here, the system creates a new layout based on

the relationships the system estimates. Firstly, the

system puts up a fixed number of layout candidates,

which are rearranged randomly, while they satisfy

minimal requirements for a layout of an academic

conference poster as mentioned before. Next,

for every layout candidate, the system calculates

d =∑n

k=1 distance(k−th section,mk−th section).Here, mk-th section is the section which has the

biggest relationship with k-th section (therefore,

Rj,kmk

is equal to max{Rj,k1 , · · · , Rj,k

n }), and

distance(k−th section,mk−th section) means Euclidean

distance between the center of k-th section and that of

mk-th section. After all d are calculated, the system finds

out the layout which has the smallest value of d. In this

layout, mutually-related sections get closer, and thus the

system selects the layout for the modification so that a

viewer can easily find out important sections which are

related to the section he is looking at.

In addition to those by mutation, the system breeds

new individuals by crossover. From highly-ranked content

designs, the system selects two content designs for crossover.

The possibility of values in these combination is based on the

eye-catching evaluation and unconvincing evaluation of each

content design so that sections with eye-catching appearance

and convincing contents are selected for the crossover. The

system compares each value in the same sections. For

example, when the system crossovers two content designs, aand b, in the section k, the probability that graphical property

in each content design is used is in the ratio of Ea,k to Eb,k.

V. RESULTS AND EVALUATION

For an empirical evaluation of the system, we made up

some sets of three content designs: (a) a content design

whose layout and properties were set by a man; (b) one

from the first generation; and (c) one from the seventh

generation where N is six and the time interval of each

content design is ten minutes. In this case, five to ten non-

professional students attended as viewers in each content

design. The reason why N is so small is that content designs

of academic conference posters are confined within a few

variety of designs and there could be similar designs if Nwas large. Figure 9 shows example results of (a), (b) and

(c).

In order to compare attractiveness, convincingness and

layout of (a), (b) and (c), we showed these produced content

designs to thirty-two subjects. As was the case with the

viewers, these subjects were non-professional students. As

we mentioned above, there are different sets of (a), (b) and

(c), and thus these people did not always see the same set

of (a), (b) and (c). After this, we asked them how they feel

about the three content designs. The results are tabulated in

Table 1.

(a) (b) (c)

Figure 9. An example of content designs used for evaluations: (a) onemade by a man; (b) one from the first generation; and (c) one from theseventh generation.

Table 1. Result of user evaluation with the three designs (a)-(c),whose examples are shown in Figure 9. These results show howmany subjects thought (c) is better than (a), and (c) is better than(b) in terms of attractiveness, convincingness and layout.

(c) is better

than (a)

(c) is better

than (b)

Attractiveness 53% 78%

Convincingness 40% 69%

Mutually-related layout 50% 69%

In the results, in every case, about 70% of the subjects

thought evolved content designs are better than the initial

content designs. This suggests most of the content designs

were improved through the method. In addition to this, about

114

half of the subjects thought evolved content designs are

better than the one by the man, which indicates the system

has an ability to overwhelm manual designs. On the other

hand, as you can see, in some cases some subjects thought

(b) is better than (c). In most of the cases, these subjects

answered this was because they preferred “simple” content

designs. For example, in Figure 10, we can easily identify

the relationships among sections in the evolved design (right)

because of the frames, while some subjects thought these

frames as annoying and preferred the initial design (left)

even though the layout of the initial design was randomly

set and did not consider the relationships among the sections.

This gives us a useful suggestion to consider the weight of

“way to change graphics” and individual preferences.

Figure 10. An example of a case where an initial content design (left) isthought to be better than an evolved content design (right).

VI. CONCLUSION

In this paper, we presented a novel pragmatic method for

surveying, evaluating and modifying the content design as

well as the structure of the content design, which can

be differentiated from the position paper [1] and poster

[2]. This method realizes the digital signage system which

automatically learns what structure and content of digital

sigange are attractive and makes the content of the digital

signage more adapted to the viewers, through utilizing the

viewers’ involuntary behaviors and the idea of GA. The

system takes away the current gap which lies between the

viewers’ evaluations and the content actually displayed on

digital signage. We proved the feasibility of the method

empirically through the development of a viewer-adaptive

digital signage for displaying academic conference posters.

One major problem in our method is the timing of chang-

ing the content design. In the method, the content design

will switch to another within the same generation when a

fixed time duration runs over. However, this is inefficient if

there were a small number of people, and we need to seek a

better way to switch the content design. Besides, we limited

the usage of digital signage only for displaying academic

conference posters, but we will be able to generalize the

method towards a variety of applications. Especially, the

proposed method enables the digital signage system to learn

the structure of the content, which paves the way for the

system to handle various contents. As for behaviors, we

just bisected involuntary behaviors, but we will be able to

categorize those behaviors into multiple kinds.

ACKNOWLEDGMENT

The work is supported in part by a Grant-in-Aid for the

Leading Graduate School Program for “Science for De-

velopment of Super Mature Society” from the Ministry

of Education, Culture, Sports, Science and Technology in

Japan.

REFERENCES

[1] K. Nagao and I. Fujishiro, “Making digital signage adaptivethrough a genetic algorithm – Utilizing viewers’ involuntarybehaviors –, Proceedings of the INSTICC International Con-ference on Computer Vision Theory and Applications, Vol. 2,pp. 54–59, 2013.

[2] K. Nagao and I. Fujishiro, “Understanding viewers involun-tary behaviors for adaptive digital signage, presented as posterat ACM SAP2013, and abstract available from IEEE Xplorer.

[3] F. Alt, S. Schneegaß, A. Schmidt, J. Muller, and N. Memarovic,“How to evaluate public displays,” Proceedings of the 2012 In-ternational Symposium on Pervasive Displays, Article No. 17,2012.

[4] H. Gunes and M. Piccardi, “A bimodal face and body gesturedatabase for automatic analysis of human nonverbal affectivebehavior,” Proceedings of the 18th International Conferenceon Pattern Recognition, pp. 1148–1153, 2006.

[5] S. Gutta, J. R. J. Huang, P. Jonathon, and H. Wechsler,“Mixture of experts for classification of gender, ethnic origin,and pose of human faces,” IEEE Transactions on NeuralNetworks, Vol. 11, No. 4, pp. 948–960, 2000.

[6] M. Pantic and L. J. M. Rothkrantz, “Automatic analysis offacial expressions: the state of the art,” IEEE Transactions onPattern Analysis and Machine Intelligence, Vol. 22, No. 12,pp. 1424–1445, 2000.

[7] R. A. Clark, Y.-H. Pua, K. Fortin, C. Ritchie, K. E. Webster, L.Denehy, and A. L. Bryant, “Validity of the Microsoft Kinectfor assessment of postural control,” Gait & Posture, Vol. 36,No. 3, pp. 372–377, 2012.

[8] J. Muller, J. Exeler, M. Buzeck, and A. Kruger, “Reflec-tiveSigns: Digital signs that adapt to audience attention,”Proceedings of the 7th International Conference on PervasiveComputing, pp. 17–24, 2009.

[9] J. H. Goldberg, M. J. Stimson, M. Lewenstein, N. Scott, andA. M. Wichansky, “Eye tracking in web search tasks: designimplications,” Proceedings of the 2002 Symposium on Eyetracking research & applications, pp. 51–58, 2002.

[10] N. Singh and S. Bhattacharya, “A GA-based approach toimprove web page aesthetics,” Proceedings of the First Interna-tional Conference on Intelligent Interactive Technologies andMultimedia, pp. 29–32, 2010.

115