
Int. J., Vol. x, No. x, xxxx

Copyright © 200x Inderscience Enterprises Ltd.

Human-Computer Interaction for Smart Environment Applications Using Hand-Gestures and Facial-Expressions

Qing Chen, Marius D. Cordea, Emil M. Petriu*
University of Ottawa, 800 King Edward, Ottawa, Ontario, K1N 6N5, Canada
E-mail: {qchen, petriu, marius}@discover.uottawa.ca
*Corresponding author

Annamaria R. Varkonyi-Koczy
Budapest University of Technology and Economics, Magyar Tudosok krt. 2, 1521 Budapest, Hungary
E-mail: [email protected]

Thomas E. Whalen
Communications Research Centre, 3701 Carling Avenue, Ottawa, Ontario, K2H 8S2, Canada
E-mail: [email protected]

Abstract – The paper discusses two body-language human-computer interaction modalities, namely facial expressions and hand gestures, for healthcare and smart environment applications.

Keywords – human-computer interface, hand gesture, face expression, healthcare, smart environment

Reference to this paper should be made as follows: (to be filled)

Biographical notes: Qing Chen is a Postdoctoral Fellow in the Discover Lab, School of Information Technology and Engineering, University of Ottawa. He received his Ph.D. and

Page 2: Human-Computer Interaction for Smart Environment Applications …petriu/IJMAC-HandGestureFace... · 2009. 6. 2. · Considering the global hand pose and each finger joint, human hand

2

M.A.Sc. in Electrical and Computer Engineering in 2008 and 2004 from the University of Ottawa. He received his Bachelor of Engineering from Jianghan Petroleum Institute, Hubei, China in 1994. His research interests include computer vision, human-computer interaction and image processing. Marius D. Cordea received the engineering degree from the Polytechnic Institute of Cluj, Romania. He received his Ph.D. and M.A.Sc. in Electrical and Computer Engineering from the University of Ottawa, Ottawa, ON, Canada. His research interests include interactive virtual environments, pattern recognition, and animation languages. Emil M. Petriu received the Dipl. Eng. and Dr. Eng. degrees from the Polytechnic Institute of Timisoara, Romania, in 1969 and 1978, respectively. He is a Professor and University Research Chair in the School of Information Technology and Engineering at the University of Ottawa, Ottawa, ON, Canada. His research interests include machine sensing and perception, interactive virtual environments, and soft computing. Annamaria R. Varkonyi-Koczy received her M.Sc. E.E., M.Sc.M.E.-T., and Ph.D. degrees from the Technical University of Budapest, Hungary, in 1981, 1983, and 1996, respectively. She is an Associate Professor at the Department of Measurement and Information Systems of the Budapest University of Technology and Economics. Her research interests include digital image and signal processing, soft computing, anytime, and hybrid techniques in complex measurement, diagnostics, and control systems. She is the author or co-author of 13 books, 36 journal papers, and more than 140 conference papers. Thomas E. Whalen received the Ph.D. degree in experimental psychology from Dalhousie University, Halifax, NS, Canada, in 1979. He has been conducting research in human-computer interactions at the Communications Research Centre (CRC), Ottawa, ON, Canada, a research institute of the Government of Canada, since 1979. His current research interests include access to information through natural language queries, retrieval of images using visual features, Web-based training, and the control of avatars in shared virtual environments. He is also an Adjunct Professor with the Psychology Department, Carleton University, Ottawa, and with the Department of Management Science and Finance, St. Mary's University, Halifax.


1. INTRODUCTION

Pervasive computing and network-intensive technologies are enabling the development of a new human-centric, symbiotic human-computer partnership [1], [2]. The resulting emergent e-society paradigm encompasses a variety of domains such as e-health, intelligent housing, and e-commerce.

New human-computer interaction (HCI) modalities have to be developed to allow humans to interact naturally and intuitively with these intelligent spaces through non-verbal body-language modalities, which include facial expressions, eye movements, hand gestures, body postures and walking styles [3], [4], [5].

Hand gestures represent a powerful non-verbal human communication modality [6], [7]. The expressiveness of hand gestures can be exploited to achieve natural human-computer interaction in smart environments.

To gather gesture data, device-based techniques such as data gloves have been used to capture hand motions. However, recent progress in computer vision and pattern recognition algorithms offers more efficient alternatives for hand gesture recognition. We discuss vision-based hand tracking and gesture classification, focusing on tracking the bare hand and recognizing hand gestures without the help of any markers or gloves. This paper addresses both aspects of hand gesture recognition: hand postures and hand gestures.

Interest in facial expression dates back to the 19th century, when Charles Darwin wrote The Expression of the Emotions in Man and Animals [8]. Later, the psychologists Ekman and Friesen developed the anatomically oriented Facial Action Coding System (FACS) based on numerous experiments with facial muscles [9], [10]. They defined the Action Unit (AU) as a basic visual facial movement that cannot be decomposed into smaller units; each AU is controlled by one or more muscles. Their system describes complex facial expressions in terms of basic facial muscle actions. Ekman and Friesen reduced the distinguishable expression space to a comprehensive system which could distinguish all possible visually


facial expressions by using a limited set of only 46 AUs. Complex facial expressions can be obtained by combining different AUs.

Facial expression analysis, recognition and interpretation represent challenging tasks for image processing, pattern recognition and computer vision. We have developed a vision-based facial expression recognition system that includes automatic detection, tracking and modeling of human faces from image sequences of human subjects without makeup, using un-calibrated cameras operating under unknown lighting conditions and backgrounds.

Recognizing facial movements can help physically impaired people to control devices. Facial models may also be used in the rehabilitation of people with impairments of the facial muscles, and represent a useful tool for psychological studies of facial recognition and nonverbal communication.

Extrapolating ideas published by the authors over the years on the evolution of the instrumentation and measurement field and on human-computer interaction [11], [12], [13], [14], [15], this paper discusses two body-language human-computer interaction modalities based on the video recognition of human hand gestures and facial expressions.

These non-verbal interaction modalities are particularly suited for the real-time monitoring of human behavior in a variety of human-centered ubiquitous computing applications, such as intelligent networked spaces for healthcare and eldercare, as well as for living (smart houses), working, shopping, and entertainment.

One typical application helps elderly and disabled individuals live securely in their homes by monitoring their behavior. The traditional approach is based on an alarm button worn on the wrist, which is used to call for help in emergency situations [16]. However, a considerable portion of users do not wear this device permanently as expected. A more convenient solution is provided by a smart video monitoring system installed in the living premises, which triggers an alarm as soon as abnormal events, such as a person falling, are detected.


2. HAND GESTURE RECOGNITION

Device-based techniques such as data gloves are commonly used to capture hand motions in laboratory environments [17]. However, the gloves and their attached wires are cumbersome and awkward for users to wear. Moreover, the cost of data gloves is often prohibitive for regular users. With the latest technical advances in computer vision, image processing and pattern recognition, real-time vision-based gesture recognition is becoming a preferable alternative.

The human hand is a complex articulated structure consisting of many connected links and joints. Considering the global hand pose and each finger joint, human hand motion has roughly 27 degrees of freedom [18]. Hand gesture recognition is based on two major aspects [19], [20]:

• hand posture, defined as a static hand pose and its location without any movements involved.

• hand gesture, defined as a sequence of hand postures connected by continuous motions over a short time span.

The dynamic movement of hand gestures includes two aspects: global hand motions and local finger motions [21]. Global hand motions change the position or orientation of the hand. Local finger motions move the fingers without changing the position or orientation of the hand. Compared with hand postures, hand gestures can be viewed as complex composite hand actions constructed from global hand motions and a series of hand postures that act as transition states.

Based on the hierarchical composite property of hand gestures, it is natural to use a divide-and-conquer strategy to break hand gestures into basic hand postures and movements, which can be treated as primitives for grammar-based syntactic analysis. To implement this strategy, we use the two-level architecture shown in Figure 1. This architecture decouples hand gesture recognition into two stages: low-level hand posture detection and tracking, and high-level hand gesture recognition and motion analysis.


Figure 1. The two-level architecture for hand gesture recognition.
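To make the decoupling concrete, the following Python sketch (with hypothetical class and method names, not the authors' implementation) shows how a low-level posture detector could feed posture symbols to a high-level grammar-based recognizer; the symbols 'p', 'f', 't', 'l' and the gesture names correspond to the postures and gestures introduced later in this section:

```python
# Minimal sketch of the two-level architecture (hypothetical names).
# The low level turns each frame into a posture symbol (or None);
# the high level accumulates symbols and matches them against grammar rules.

from typing import List, Optional


class LowLevelPostureDetector:
    """Stage 1: statistical posture detection and tracking (e.g., boosted Haar cascades)."""

    def detect(self, frame) -> Optional[str]:
        # Placeholder: return one of the posture symbols 'p', 'f', 't', 'l',
        # or None when no known posture is found in the frame.
        raise NotImplementedError


class HighLevelGestureRecognizer:
    """Stage 2: syntactic analysis of the posture/motion primitives."""

    def __init__(self, rules: dict):
        # rules maps a terminal string (e.g., 'pf') to a gesture name.
        self.rules = rules
        self.symbols: List[str] = []

    def push(self, symbol: Optional[str]) -> Optional[str]:
        if symbol is None:
            return None
        # Avoid repeating the same posture while the hand is held still.
        if not self.symbols or self.symbols[-1] != symbol:
            self.symbols.append(symbol)
        string = "".join(self.symbols[-2:])   # gestures here use two postures
        return self.rules.get(string)


# Usage: feed frames through the low level, then symbols through the high level.
detector = LowLevelPostureDetector()
recognizer = HighLevelGestureRecognizer({"pf": "Grasp", "tf": "Quote", "lf": "J"})
```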

Because there is no obvious structural information for hand postures, it is more appropriate to use a statistical approach for the low-level of the architecture. Two components are included in the statistical approach: the training component and the posture detection and tracking component.

The high-level of the architecture is responsible for gesture recognition and motion analysis. A syntactic approach based on the linguistic pattern recognition technique is selected to fully exploit the composite property of hand gestures. The detected hand postures and motion trajectories are sent from the low-level to the high-level of the architecture as primitives for syntactic analysis so that the whole gesture can be recognized according to the defined grammars. There are two components included in the high-level of the architecture: the local finger motion analysis component and the global hand motion analysis component.

For the low-level hand posture detection, to achieve speed, accuracy and robustness against different backgrounds and lighting conditions, we use a statistical approach based on a set of Haar-like features (see Figure 2), which focus on the information within a certain area of the image rather than on each pixel [22]. Compared with other image features such as skin color [7], hand shape [23] and fingertips [24], [25], Haar-like features are more robust against noise and different lighting conditions.

Figure 2. The extended set of Haar-like features.
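As an illustration of why Haar-like features are cheap to evaluate, the following Python/NumPy sketch computes a two-rectangle (edge-type) feature in constant time from an integral image; the feature layout and the random test frame are illustrative assumptions, not the exact feature set of [22]:

```python
import numpy as np

def integral_image(gray: np.ndarray) -> np.ndarray:
    """Summed-area table with a leading zero row/column: ii[y, x] = sum of gray[:y, :x]."""
    ii = gray.cumsum(axis=0).cumsum(axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def rect_sum(ii: np.ndarray, x: int, y: int, w: int, h: int) -> float:
    """Sum of the w×h rectangle with top-left corner (x, y), using 4 lookups."""
    return float(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])

def two_rect_haar_feature(ii: np.ndarray, x: int, y: int, w: int, h: int) -> float:
    """Edge-type Haar-like feature: left half minus right half of a w×h window."""
    left = rect_sum(ii, x, y, w // 2, h)
    right = rect_sum(ii, x + w // 2, y, w // 2, h)
    return left - right

# Example on a random 320×240 grayscale frame.
frame = np.random.randint(0, 256, (240, 320)).astype(np.float64)
ii = integral_image(frame)
print(two_rect_haar_feature(ii, x=10, y=20, w=24, h=24))
```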

To improve the detection accuracy and meet the real-time performance requirements, we use the AdaBoost (Adaptive Boosting) learning algorithm, which can efficiently reject false images and detect the target images with the Haar-like features selected in each step [26]. The AdaBoost learning algorithm takes a set of positive samples (i.e., images containing the object of interest) and a set of negative samples (i.e., images not containing the object of interest) for training. During the training process, distinctive Haar-like features are adaptively selected at each training stage to detect the images containing the object of interest.
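The following toy Python sketch illustrates the boosting idea behind this training component, i.e., the weighted selection of discriminative features. It is a simplified stand-in for, not a reproduction of, the cascade training of [26]; the mean-value stump threshold and the function name are assumptions:

```python
import numpy as np

def adaboost_select(features: np.ndarray, labels: np.ndarray, n_rounds: int = 10):
    """Toy discrete AdaBoost over threshold stumps.

    features: (n_samples, n_features) Haar-like feature values
    labels:   ±1 (posture present / absent)
    Returns a list of (feature_index, threshold, polarity, alpha)."""
    n, m = features.shape
    w = np.full(n, 1.0 / n)                       # sample weights
    chosen = []
    for _ in range(n_rounds):
        best = None
        for j in range(m):                        # try each feature as a weak classifier
            thr = features[:, j].mean()
            for polarity in (1, -1):
                pred = np.where(polarity * (features[:, j] - thr) > 0, 1, -1)
                err = w[pred != labels].sum()
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity, pred)
        err, j, thr, polarity, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)     # weight of this weak classifier
        w *= np.exp(-alpha * labels * pred)       # boost the misclassified samples
        w /= w.sum()
        chosen.append((j, thr, polarity, alpha))
    return chosen
```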

In our implementation, the four hand postures shown in Figure 3 are selected. The selection of this posture set is based on our experimental results, which show that the employed algorithm does not confuse these postures with one another. The camera provides video capture at a resolution of 320×240 and up to 15 frames per second. We collected 400 positive samples for each posture and 500 random images as negative samples. The required false alarm rate is set at 1×10⁻⁶ to terminate the training process. A multi-stage cascade classifier is trained for each of the postures when the training process terminates. The average detection rate of the final cascade classifiers is 97.5%. The time required for the classifiers to process 100 320×240 true-color images is around 3 seconds.

Figure 3. The four selected hand postures.
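Assuming the trained cascades are stored as XML files (the file names below are hypothetical), posture detection on live video could look like this OpenCV sketch:

```python
import cv2

# Hypothetical file names for the four trained posture cascades.
CASCADES = {
    "palm": "palm.xml",
    "fist": "fist.xml",
    "two_fingers": "two_fingers.xml",
    "little_finger": "little_finger.xml",
}

detectors = {name: cv2.CascadeClassifier(path) for name, path in CASCADES.items()}
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 320)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 240)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for name, det in detectors.items():
        # detectMultiScale parameters here are common defaults, not the paper's settings.
        for (x, y, w, h) in det.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(frame, name, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0))
    cv2.imshow("postures", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
```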

Background subtraction (Figure 4) is used to achieve robustness against cluttered backgrounds. The system takes the first input image (with no postures in the scene) as the background and subtracts this image from later input frames. To reduce the noise left after the background subtraction, median filtering and image dilation/erosion are used.
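A minimal OpenCV sketch of this clean-up chain might look as follows; the binarization threshold of 25 and the 3×3 kernel are assumed values, not taken from the paper:

```python
import cv2
import numpy as np

def clean_foreground(frame_gray: np.ndarray, background_gray: np.ndarray) -> np.ndarray:
    """Subtract the stored background image, then suppress noise as described above."""
    diff = cv2.absdiff(frame_gray, background_gray)            # background subtraction
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)  # 25 is an assumed threshold
    mask = cv2.medianBlur(mask, 5)                             # median filtering
    kernel = np.ones((3, 3), np.uint8)
    mask = cv2.dilate(mask, kernel, iterations=1)              # dilation ...
    mask = cv2.erode(mask, kernel, iterations=1)               # ... followed by erosion
    return mask

# Usage: the first captured frame (with no posture in the scene) is the background.
# background = cv2.cvtColor(first_frame, cv2.COLOR_BGR2GRAY)
# mask = clean_foreground(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), background)
```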

Figure 4. Background subtraction and noise removal.

With the hand postures detected by the low-level of the architecture, three hand gestures are constructed as shown in Figure 5. Each gesture is composed of two postures. The order of the postures can be reversed so that each gesture can be a repetitive action, which improves the comfort for the user.


Figure 5. The hand gestures constructed by different hand postures.

We use linguistic pattern recognition techniques and a semantic model to develop a syntactic representation that allows different gestures to be understood. With this approach, hand gestures can be specified as being built up from a series of hand postures in various compositions, just as phrases are built up from words. The rules governing the composition of hand postures into different hand gestures can be specified by a grammar. The syntactic approach provides the capability to describe complex hand gestures using a small group of simple hand postures and a set of grammars.

To describe the inherently ambiguous dynamic gesture motions, we use a linguistic pattern representation based on Stochastic Context-Free Grammars (SCFGs), which assign probabilities to the grammatical production rules [27].

The SCFG that generates hand gestures is defined by:

G_G = (V_NG, V_TG, P_G, S)

where V_NG = {S}, V_TG = {p, f, t, l}, and the production rules P_G are:

r1: S → pf (40%), r2: S → tf (35%), r3: S → lf (25%)

The terminals p, f, t, l stand for the four postures: “palm”, “fist”, “two fingers” and “little finger”.

The pipe structure shown in Figure 6 converts the input postures into a sequence of terminal strings.


Figure 6. The pipe structure to convert hand postures to terminal strings.

After a string "x" is converted from the postures, we can determine the most likely production rule that generates this string by computing the probability:

P(r ⇒ x) = D(z_r, x) · P(r)

where D(z_r, x) is the degree of similarity between the input string "x" and "z_r", the string derived by the production rule "r".

D(z_r, x) can be computed according to:

D(z_r, x) = Count(z_r ∩ x) / (Count(z_r) + Count(x))

For example, suppose an input string is detected as "pf". Since:

P(r1 ⇒ pf) = 2/4 × 40% = 20%
P(r2 ⇒ pf) = 1/4 × 35% = 8.75%
P(r3 ⇒ pf) = 1/4 × 25% = 6.25%

According to the greatest probability, the gesture represented by this string should be identified as a “Grasp” gesture.
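The rule-selection computation above can be summarized in a few lines of Python; the function names are ours, but the similarity measure and rule probabilities follow the definitions given earlier:

```python
RULES = {                 # production rule -> (derived string, probability)
    "r1": ("pf", 0.40),   # "Grasp"
    "r2": ("tf", 0.35),   # "Quote"
    "r3": ("lf", 0.25),   # "J"
}

def similarity(z_r: str, x: str) -> float:
    """D(z_r, x) = Count(z_r ∩ x) / (Count(z_r) + Count(x))."""
    return len(set(z_r) & set(x)) / (len(z_r) + len(x))

def classify(x: str) -> str:
    """Return the production rule with the greatest probability of generating x."""
    scores = {r: similarity(z, x) * p for r, (z, p) in RULES.items()}
    return max(scores, key=scores.get)

print(classify("pf"))     # -> 'r1', i.e. the "Grasp" gesture
```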

The SCFG can be extended to describe more complex gestures that include more postures. The probability assigned to each production rule can also be used to control the "wanted" and "unwanted" gestures. A greater probability could be assigned to the "wanted" gestures, such as the "Grasp" gesture in the previous example, and a smaller probability to the "unwanted" gestures, so that the resulting SCFG generates the "wanted" gestures with higher probabilities. For example, assume we notice that the "Quote" gesture is performed more often by users and the "J" gesture is less popular; we can then define P_G as:

r1: S → pf (35%), r2: S → tf (40%), r3: S → lf (25%)

where r2 (corresponding to the "Quote" gesture) is assigned the highest probability. If an input string is detected as "tl", a similar computation classifies it as a "Quote" gesture.

3. FACE EXPRESSION RECOGNITION

Our real-time head tracking and facial expression recovery is based on an Anthropometric Muscle-Based Active Appearance Model (AMB AAM) approach. Each image is analyzed and transformed into a 3D wireframe head model, as illustrated in Figure 7, controlled by 7 pairs of muscles: zygomatic major, anguli depressor, frontalis inner, frontalis outer, frontalis major, lateral corrugator, nasi labii, and the jaw drop.

Figure 7. Muscle-controlled 3D wireframe head model.

Using a priori knowledge about the images to be transmitted, a wireframe model of a generic human face is molded on the live image of the particular person at the emission end.

In order to represent faces by 3D models, the vision system has to perform challenging tasks such as face detection, tracking and facial expression understanding.


As the head motion and facial expressions are combined in video images, we developed a statistical model-based real-time tracking system able to separate and recover the head’s rigid motions and the non-rigid facial expressions. It uses an Extended Kalman Filter (EKF) to recover the head pose (global motion) and the 3D AMB AAM to recover the facial expressions (local motion) [28].

Figure 8. Visual tracking and recognition of facial expression (from [29]).

Our system, shown in Figure 8, consists of two main modules: (i) the tracking module, delivering the 2D point measurements p_i(u_i, v_i) of the tracked features, where i = 1,…,m and m is the number of measurement points, and (ii) the estimator module for the estimation of 3D motion and expressions, delivering a state vector:

s = (t_x, t_y, t_z, α, β, λ, f, a_j)


where (t_x, t_y, t_z, α, β, λ) are the 3D camera/object relative translation and rotation, f is the camera focal length, and a_j are the 3D model's muscle actuators that generate the facial expressions, where j = 1,…,n and n is the number of muscles.

The set of 2D features to be tracked is obtained by projecting their corresponding 3D model points P_i(X_i, Y_i, Z_i), where i = 1,…,m and m is the number of tracked features.
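As a minimal illustration (not the paper's full camera model), a pinhole projection with focal length f maps the 3D model points to 2D image measurements:

```python
import numpy as np

def project_points(points_3d: np.ndarray, f: float) -> np.ndarray:
    """Pinhole projection of model points P_i = (X_i, Y_i, Z_i) to p_i = (u_i, v_i)."""
    X, Y, Z = points_3d.T
    return np.stack([f * X / Z, f * Y / Z], axis=1)
```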

The 3D AMB AAM allows modeling the shape and appearance of human faces using a generic 3D wireframe model of the face. The shape of our model is described by two sets of controls: (i) the muscle actuators to model facial expressions, and (ii) the anthropometrical controls to model different facial types.

The 3D face deformations are defined in the Anthropometric-Expression (AE) space derived from Waters' muscle-based face model [30] and the Facial Action Coding System (FACS) [10]. FACS defines the Action Unit (AU) as a basic facial movement controlled by one or more muscles. The Expression Action Unit (EAU) represents the facial expression space, and the Anthropometry Action Unit (AAU) represents the space of facial types.

Using additive combinations of muscle actuators, we implemented the six basic emotional expressions. The muscle combinations responsible for generating these expressions and their ranges of deformation are shown in Figure 9.
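As a sketch of how such additive actuator combinations could be driven programmatically, the following Python snippet encodes the actuator ranges listed in Figure 9 and scales each actuator from its neutral value toward the extreme of its range; the function and key names are hypothetical, not the model's actual interface:

```python
# Muscle-actuator deformation ranges per basic expression, as listed in Figure 9.
EXPRESSIONS = {
    "happiness": {"mouth_corners": (0.0, 0.9), "eyebrow_inner": (0.0, 0.2), "jaw_drop": (0.0, 0.5)},
    "anger":     {"nose_base": (0.0, 0.7), "eyebrow_inner": (-0.9, 0.0),
                  "eyebrow_middle": (-0.9, 0.0), "eyebrow_outer": (-0.5, 0.0)},
    "surprise":  {"mouth_corners": (0.0, 0.4), "eyebrow_inner": (0.0, 0.5),
                  "eyebrow_middle": (0.0, 0.6), "jaw_drop": (0.0, 0.9)},
    "sadness":   {"mouth_corners": (-0.9, 0.0), "eyebrow_inner": (0.0, 0.7), "eyebrow_middle": (0.0, 0.5)},
    "fear":      {"eyebrow_inner": (0.0, 0.2), "eyebrow_middle": (0.0, 0.5),
                  "eyebrow_outer": (0.0, 0.2), "jaw_drop": (0.0, 0.4)},
    "disgust":   {"nose_base": (0.0, 0.9), "eyebrow_inner": (-0.1, 0.0),
                  "eyebrow_outer": (0.0, 0.2), "jaw_drop": (0.0, 0.1)},
}

def expression_actuators(emotion: str, intensity: float) -> dict:
    """Drive each actuator from neutral (0) toward the extreme of its Figure 9 range."""
    return {m: intensity * (lo if abs(lo) > abs(hi) else hi)
            for m, (lo, hi) in EXPRESSIONS[emotion].items()}

print(expression_actuators("happiness", 0.5))
```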

Tracking should allow the system to estimate the motion while locating the face and coping with different degrees of rigid motion and non-rigid deformations. Since we separate the global from local motion search spaces, the features to be tracked must originate from rigid regions of the face.

The main purpose of the EKF is to filter and predict the rigid and non-rigid facial motion. It consists of two stages: the time update (prediction) and the measurement update (correction). The EKF iteratively provides an optimal estimate of the current state using the current input measurement, and produces an estimate of the future state using the underlying state model.


Emotional Expressions:

1. Happiness: Mouth Corners 0.0 – 0.9, Eyebrow Inner 0.0 – 0.2, Jaw Drop 0.0 – 0.5
2. Anger: Nose Base 0.0 – 0.7, Eyebrow Inner -0.9 – 0.0, Eyebrow Middle -0.9 – 0.0, Eyebrow Outer -0.5 – 0.0
3. Surprise: Mouth Corners 0.0 – 0.4, Eyebrow Inner 0.0 – 0.5, Eyebrow Middle 0.0 – 0.6, Jaw Drop 0.0 – 0.9
4. Sadness: Mouth Corners -0.9 – 0.0, Eyebrow Inner 0.0 – 0.7, Eyebrow Middle 0.0 – 0.5
5. Fear: Eyebrow Inner 0.0 – 0.2, Eyebrow Middle 0.0 – 0.5, Eyebrow Outer 0.0 – 0.2, Jaw Drop 0.0 – 0.4
6. Disgust: Nose Base 0.0 – 0.9, Eyebrow Inner -0.1 – 0.0, Eyebrow Outer 0.0 – 0.2, Jaw Drop 0.0 – 0.1

Figure 9. Emotion FACS models.

The EKF state and measurement equations are:

s(k+1) = A s(k) + ξ(k)
m(k) = H s(k) + η(k)

where s is the state vector, m is the measurement vector, A is the state transition matrix, H is the Jacobian that relates the state to the measurement, and ξ(k) and η(k) are error terms modeled as Gaussian white noise.

In our case, the state vector s = (translation, rotation, velocity, focal_length, actuators) contains the relative 3D camera-object translation and rotation and their velocities, the camera focal length, and the actuators/muscles.
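A minimal sketch of one EKF predict/correct cycle, written for the state and measurement equations above, is shown here; it is generic textbook EKF code, not the authors' tracker, and the measurement function h and its Jacobian are assumed to be supplied by the projection model:

```python
import numpy as np

def ekf_step(s, P, m, A, Q, R, h, H_jac):
    """One predict/correct cycle for s(k+1) = A s(k) + xi(k), m(k) = h(s(k)) + eta(k)."""
    # Time update (prediction).
    s_pred = A @ s
    P_pred = A @ P @ A.T + Q
    # Measurement update (correction), linearising h around the prediction.
    H = H_jac(s_pred)                       # Jacobian relating state to measurement
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain
    s_new = s_pred + K @ (m - h(s_pred))
    P_new = (np.eye(len(s)) - K @ H) @ P_pred
    return s_new, P_new
```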

The tracking system can cope with different degrees of rigid motion and non-rigid deformations as illustrated in Figure 10 and Figure 11.

Figure 10. A typical real pose-expression EKF tracking sequence (face from the BU database [31]).

The system can track the 3D head pose and four EAUs

(Jaw Drop, Mouth Corners, Eyebrow Middle, and Eyebrow Inner). Experiments have shown that the motion tracking system is able to work in a realistic environment


without makeup on the face, with an un-calibrated camera, and unknown lighting conditions and background.

Figure 11. A typical real expression EKF tracking sequence (face from the MMI database [32]).

Figure 12 and Figure 13 illustrate the performance of

the facial expression recognition for two cases, person-dependent and person-independent respectively.

Figure 12. Person-dependent recognition of facial expressions (faces from the MMI database [32]).

While, as expected, the person-dependent cases have a better recognition rate, 86.1%, the system has a


respectable 76.2% recognition rate in person-independent cases, which could be quite acceptable for some practical applications.

Figure 13. Person-independent recognition of facial expressions (faces from the MMI database [32]).

4. CONCLUSIONS

Experiments have shown that both the hand gesture and the facial expression tracking and recognition systems are able to work in a realistic environment without makeup, with un-calibrated cameras and unknown lighting conditions and background.

Hand gestures and facial expressions are powerful, strongly context-dependent non-verbal human-to-human communication modalities. While understanding them may come naturally to humans, describing them in an unambiguous algorithmic way is not an easy task.

In order to reduce the level of ambiguity in the meaning of hand gestures and facial expressions, they have to be considered in their operational “context”, or status of their environmental situation, which includes the


physical environment (location, identity, time, and activity), as well as human factors.

ACKNOWLEDGMENT

This work was funded in part by the Natural Sciences and Engineering Research Council of Canada (NSERC), Communications and Information Technology Ontario (CITO), and the Communications Research Centre Canada (CRC).

REFERENCES

[1] W. Pedrycz, “Computational intelligence and visual computing: an emerging technology for software engineering,” Soft Computing - A Fusion of Foundations, Methodologies and Applications, Vol. 7, No.1, pp. 1432-7643, Nov. 2002.

[2] L. Marchesotti, C. Bonamico, C. Regazzoni, F. Lavagetto, "Video Processing and Understanding Tools for Augmented Multisensor Perception and Mobile User Interaction in Smart Spaces," Int. J. Image and Graphics, Vol. 5, No. 3, pp. 679-698, July 2005.

[3] A. Schmidt et al., “Ubiquitous Interaction – Using Surfaces in Everyday Environments as Pointing Devices,” in User Interfaces for All, (N. Carbonell and C. Stephanidis - Eds.), LNCS 2615, pp. 263-279, Springer Verlag, Berlin, Heidelberg, 2003.

[4] W. Hu, T. Tan, L. Wang, S. Maybank, "A survey on visual surveillance of object motion and behaviors," IEEE Trans. SMC, Part C, vol. 34, no. 3, pp. 334-352, 2004.

[5] R.D. Green, L. Guan, "Quantifying and recognizing human movement patterns from monocular video images," IEEE Trans. Circuits and Systems for Video Technology, 13(2): 154-165, 2004.

[6] V.I. Pavlovic et al., "Visual interpretation of hand gestures for human-computer interaction: a review," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, pp. 677-695, Jul. 1997.

[7] K. Imagawa et al., “Recognition of local features for camera-based sign language recognition system,”


Proc. Int. Conf. Pattern Recognition, vol.4, pp. 849–853, 2000.

[8] C. R. Darwin, The Expression of the Emotions in Man and Animals, New York, Appleton, 1896.

[9] P. Ekman, W. Friesen, Facial Action Coding System: A Technique for the Measurement of the Facial Movement, Consulting Psychologists Press, Palo Alto, 1977.

[10] P. Ekman, W. Friesen, Manual for the Facial Action Coding System, Consulting Psychologists Press, Palo Alto, 1986.

[11] E.M. Petriu, T.E. Whalen, “Computer-Controlled Human Operators,” IEEE Instrum. Meas. Mag., Vol. 5, No. 1, pp. 35 -38, 2002.

[12] M.D. Cordea, D.C. Petriu, E.M. Petriu, N.D. Georganas, T.E. Whalen, “3-D Head Pose Recovery for Interactive Virtual Reality Avatars,” IEEE Trans. Instr. Meas., Vol. 51, No. 4, pp. 640 -644, 2002.

[13] T.E. Whalen, D.C. Petriu, L. Yang, E.M. Petriu, M.D. Cordea, “Capturing Behaviour for the Use of Avatars in Virtual Environments,” CyberPsychology & Behavior, Vol. 6, No. 5, pp. 537-544, 2003.

[14] Q. Chen, N.D. Georganas, E.M. Petriu, “Real-Time Vision-Based Hand Gesture Recognition with Haar-like Features and Grammars,” (6 pages), Proc. IMTC/2007, IEEE Instrum. Meas. Technol. Conf., Warsaw, Poland, May 2007.

[15] M.D. Cordea, E.M. Petriu, D.C. Petriu, “3D Head Tracking and Facial Expression Recovery using an Anthropometric Muscle-Based Active Appearance Model,” (6 pages), Proc. IMTC/2007, IEEE Instrum. Meas. Technol. Conf., Warsaw, Poland, May 2007.

[16] A. Särelä, I. Korhonen, J. Lötjönen, “IST Vivago® - an intelligent social and remote wellness monitoring system for the elderly,” Proc. 4th Annual IEEE Conf. Inf. Tech. App. Biomed. Birmingham, UK, pp. 362-365, 2003.

[17] Q. Chen, A. El-Sawah, C. Joslin, N. D. Georganas, “A dynamic gesture interface for virtual environments based on Hidden Markov Models,” Proc. IEEE Int. Workshop on Haptic, Audio and Visual Environments and their Applications, 2005, pp. 110–115.


[18] Y. Wu, T. S. Huang, “Hand modeling analysis and recognition for vision-based human computer interaction,” IEEE Signal Processing Magazine, Special Issue on Immersive Interactive Technology, vol. 18, no. 3, pp. 51-60, 2001.

[19] R. H. Liang, M. Ouhyoung, "A real-time continuous gesture recognition system for sign language," Proc. Third Int. Conf. on Automatic Face and Gesture Recognition, pp. 558-565, 1998.

[20] A. Corradini, “Real-time gesture recognition by means of hybrid recognizers,” Revised Papers from the Int. Gesture Workshop on Gesture and Sign Languages in Human-Computer Interaction, LNCS-2298, pp. 34-46, Springer, 2001.

[21] J. Lin, Y. Wu, T. S. Huang, “Modeling the constraints of human hand motion,” Proc. IEEE Workshop on Human Motion, 2000, pp. 121-126.

[22] R. Lienhart, J. Maydt, “An extended set of Haar-like features for rapid object detection,” Proc. IEEE Int. Conf. Image Processing, vol. 1, pp.900-903, 2002.

[23] F. Chen, C. Fu, C. Huang, "Hand gesture recognition using a real-time tracking method and Hidden Markov Models," Image and Vision Computing, vol. 21, no. 8, pp. 745-758, 2003.

[24] Z. Zhang, Y. Wu, Y. Shan, S. Shafer, “Visual panel: Virtual mouse keyboard and 3D controller with an ordinary piece of paper,” Proc. Workshop on Perceptive User Interfaces, 2001.

[25] S. Malik, C. McDonald, G. Roth, “Hand tracking for interactive pattern-based augmented reality,” Proc. Int. Symposium on Mixed and Augmented Reality, 2002.

[26] P. Viola, M. Jones, Robust real-time object detection, Cambridge Research Laboratory Technical Report Series, CRL2001/01, pp.1-24, 2001.

[27] R.C. Gonzalez, M.G. Thomason, Syntactic Pattern Recognition: An Introduction, Addison-Wesley Publishing Co., Reading, MA, 1978.

[28] M.D. Cordea, E.M. Petriu, “A 3-D Anthropometric Muscle-Based Active Appearance Model,” IEEE Trans. Instrum. Meas., vol. 55, No. 1, pp. 91 - 98, 2006.


[29] M. D. Cordea, A 3D Anthropometric Muscle-Based Active Appearance Model for Model-Based Video Coding, Ph.D. Thesis, University of Ottawa, Ottawa, ON, Canada, 2007.

[30] K. Waters, “A Muscle Model for Animating 3D Facial Expressions,” Computer Graphics, 21(4): 17, July 1987.

[31] S. Sclaroff, J. Isidoro, "Active Blobs," Proc. Int. Conf. on Computer Vision, pp. 1146-1153, Mumbai, India, 1998.

[32] M. Pantic, M. F. Valstar, R. Rademaker, L. Maat, “Web-based database for facial expression analysis,” Proc. IEEE Conf. Multimedia and Expo, pp. 317–321, 2005.