
A Phonological Expression for Physical Movement Monitoring in Body Sensor Networks

Hassan Ghasemzadeh, Jaime Barnes, Eric Guenterberg, Roozbeh Jafari
Embedded Systems and Signal Processing Laboratory

Department of Electrical Engineering, University of Texas at Dallas
{h.ghasemzadeh, jaime.barnes, mavpion}@student.utdallas.edu, rjafari@utdallas.edu

Abstract

Monitoring human activities using wearable wireless sensor nodes has the potential to enable many useful applications for everyday situations. The deployment of a compact and computationally efficient grammatical representation of actions reduces the complexities involved in the detection and recognition of human behaviors in a distributed system. In this paper, we introduce a road map to a linguistic framework for the symbolic representation of inertial information for physical movement monitoring. Our method for creating phonetic descriptions consists of constructing primitives across the network and assigning certain primitives to each movement. Our technique exploits the notion of a decision tree to identify atomic actions corresponding to every given movement. We pose an optimization problem for the fast identification of primitives. We then prove that this problem is NP-Complete and provide a fast greedy algorithm to approximate the solution. Finally, we demonstrate the effectiveness of our phonetic model on data collected from three subjects.

1. Introduction

Wireless sensor networks are emerging as a promising platform for a large number of application domains. Applications range from monitoring systems such as environmental and medical monitoring to detection and supervision applications such as acoustic beamforming and military surveillance. In most applications, a sensor node is expected to acquire physical measurements, perform local processing and storage, and communicate within a short distance. Employing sensor networks to understand human actions is expected to be of use in numerous aspects of everyday life. In particular, sensor networks can be effective for rehabilitation, sports medicine, geriatric care, and gait analysis.

Human motion recognition using wireless sensor networks can be done with either off-body or on-body sensor devices. The fields of computer vision and surveillance have traditionally used cameras, off-body devices, to monitor human movement [1]. Processing image sequences to recognize motions is the foundation of this approach. Cameras are used in tracking-based systems to detect actions performed by subjects inside an area. The use of video streams in wireless networks to interpret human motion has been considered for assisted living applications [2-4]. Body sensor networks (BSNs), in contrast, are built of lightweight sensor platforms, which are used to recognize the actions performed by the person wearing them [5]. The sensors can be mounted on the human body or clothing or even woven into the fabric of the clothing itself. Unlike vision-based platforms, BSNs require no environmental infrastructure. They are also less expensive. Moreover, the signal readings from on-body sensors are clearer and unbiased by environmental effects such as lighting and background. This makes BSNs potentially more accurate than vision frameworks for motion recognition.

We address the problem of movement representation in BSNs. Current methodologies in BSNs map all sensor readings to an identical feature space and then use traditional pattern recognition techniques to detect movement [6, 7]. A more efficient form of representation is a linguistic framework. Motivation for such an approach comes from the fact that movement and spoken language use a similar cognitive substrate in terms of grammatical hierarchy [8]. To construct a language for movement, the first step is to find the basic primitives, called phonemes, and assign appropriate symbols to them. This step, called phonology, creates the foundation for the following steps. The second step, morphology, consists of combining primitives to form higher-level movements. These are called the morphemes of the motion language. The last step is syntax. It addresses the construction of sentences composed of morphemes using predefined rules and enables the most meaningful level of movement recognition [9].

In general, modeling human movements as a language gives a compact representation of activities. Therefore, a linguistic model would be especially useful in BSNs, which have limited communication and computation capabilities but must collect and process data from several sensor nodes to recognize movements. The linguistic model also enables the extraction of temporal characteristics of motion. However, before one can apply a linguistic approach to movement recognition, signals must be segmented into different movements and labeled. If the signal is segmented in a less meaningful way, such as by using a fixed-size window, movements cannot be decomposed into primitives, and low-level actions cannot be combined to form higher-level actions.

In this paper, we investigate a compact and flexible representation for atomic movements using a BSN platform. We provide phonological rules for constructing primitives associated with each movement. We also provide insights into the detection of movements using primitives. In our framework, each action is described by a set of primitives in which each primitive is associated with a different node.

2. Related Work

Understanding human activity has attracted many researchers in different fields of study. In the computer vision community, both static and dynamic vision-based approaches have been developed. In static methods, individual time frames of a video sequence are used as the basic components for analysis. Recognition then involves the combination of discrete information resulting from individual frames. In dynamic methods, a fixed length of a video stream is the major unit of analysis. The Hidden Markov Model (HMM) [10], which takes into account the correlation between adjacent time instances by formulating a Markov process [11], is often used for the dynamic representation of motion due to its ability to handle uncertainty in its stochastic framework. Examples include the work presented in [12], which introduces a statistical technique for synthesizing walking patterns. The motion is expressed by a sequence of primitives extracted using a model based on HMM.

The limitation of HMM in efficiently handling several independent processes has led to the creation of grammar-based representations of human actions. In [8], Guerra-Filho et al. propose a platform for a visuo-motor language. They provide the basic kernel for the symbolic manipulation of visual and motor information in a sensory-motor system. The phonology, morphology, and syntax of the obtained information are detailed. Their phonological rules are modeled by a finite automaton. For syntax, they present a model for sentence construction in which active joints, posture, human activity, and timing coordination among different joints form the basic components of language. These components correspond respectively to noun, adjective, verb and adverb in a natural language. Reng et al. [13] present an algorithm to find primitives for human gestures using electromagnetic sensors. The motion capture system measures the 3D position of body parts. The trajectory of motion is considered a gesture, and primitives are constructed based on the density of the training data set. In [14], Jin et al. propose a technique to reduce the semantic gap between raw 3D motion and actual human behavior. The process begins with spatial dimension reduction and then maps the new temporally distributed feature space into the space of motion primitives. They construct motion documentation for 3D human motion clips that can then be used for motion recognition using string matching algorithms. In [15], Ogale et al. present a probabilistic context-free grammar (PCFG) [16] for action recognition. They create the grammar using a multi-view video sequence, and the resulting grammar is independent of the video perspective. Each action is described by a short sentence of basic key poses, and each pose is represented by a family of images observed from different viewpoints. The system has the ability to extract key poses and actions from multi-camera, multi-person training data and construct a probabilistic grammar. The grammar is used to parse a single-viewpoint video to explore the sequence of actions within the video. More content-based 3D motion detection techniques can be found in [17], [18], [19], and [20].

Recently, human motion recognition using sensor networks has been the center of much work. Lymberopoulos et al. [21] introduce a general linguistic framework for the construction of hierarchical probabilistic context-free grammars in sensor networks. The system uses a series of locations in time from sensor nodes equipped with cameras. At the lowest level, sensor readings are converted to a set of symbols that become the inputs of higher-level grammars. Each level of the system reduces the data that needs to be propagated to the higher layers. Therefore, the transition from lower levels to higher levels represents more macroscopic activities. An application of this platform is presented in [22], where the detection of cooking activity is obtained by parsing actions through a hierarchy of grammars. This hierarchical approach to human motion representation and recognition is also the foundation of the work presented in [2], in which the potential for extending the platform to an assisted-living environment is discussed. Barnes et al. report that BSNs are a promising platform for locomotion monitoring [23]. In [24], Logan et al. report the results of a study on activity recognition using different types of sensory devices, including built-in wired sensors, RFID tags, and wireless motion sensors. The analysis performed on 104 hours of data collected from more than 900 sensor inputs shows that motion sensors outperform the other sensors on many of the movements studied. A system called CareNet for physical activity monitoring using a wireless sensor network is presented by Jiang et al. in [25]. Commercial systems for activity recognition at home, such as [26] and [27], are available, but they work only with certain movements. In [28], Marin-Perianu et al. introduce a correlation algorithm for dynamically grouping sensor nodes equipped with motion sensors. Nodes are considered to be in the same group if their movements correlate for a certain amount of time.

Although the aforementioned linguistic methods have been successful in providing a grammatical platform for action recognition, they could not be efficiently employed in systems based on BSNs. The main reason for this incompatibility is that they use visual movements captured by cameras, whereas BSNs use inertial sensors to capture motion. The primitives constructed based on visual data are not appropriate for inertial data.

3. System Architecture and Signal Processing Flow

For our pilot application, we use a BSN consisting of several sensor units in a wireless network. Each node, which is also called a mote, has a triaxial accelerometer, a biaxial gyroscope, a microcontroller, and a radio, as shown in Figure 1. The processing unit of each node samples sensor readings at 22 Hz and transmits the data wirelessly to a base station using a TDMA protocol. The sampling rate is experimentally chosen to provide sufficient resolution of human motion data while compensating for bandwidth constraints on our sensor platform. Our motes, Tmote Sky motes, are commercially available from moteiv® and are powered by two AA batteries each. The sensor board is custom designed. The base station is another mote connected to a laptop by USB. For our experiments, we arranged 18 sensor nodes on the subjects, as shown in Figure 2. We have used 18 nodes to ensure that the system captures inertial information associated with all major parts of the body and to study which locations are most useful. Fewer nodes are sufficient to accurately recognize motion, as the results of this paper will show.

Figure 1. A node with inertial sensors

Figure 2. Experimental subject wearing sensor nodes.

The signal processing, phoneme construction, and identification process consists of the following steps, as shown in Figure 3.

1. Sensor Data Collection: Data is collected from each of the five sensors on each of the 18 sensor nodes at 22 Hz.

2. Preprocessing: The data is filtered to enable easier processing. We use an eight-point moving average.

3. Segmentation: We determine the part of the signal that represents an action. Currently we perform this process manually to avoid injecting errors from automatic segmentation into our process.

4. Feature Extraction: Single-value features are extracted from the filtered data. Features include mean, start-to-end amplitude, standard deviation, peak-to-peak amplitude, and RMS power.


Figure 3. Signal processing flow.

5. Primitive Construction and Symbolization: This is the main focus of this paper. In this stage, we use a phonological approach to construct primitives for each sensor node and present human action in terms of meaningful phonemes.

6. Per-Node Identification: Each node uses a phonetic expression of movements to map a given action to the corresponding primitive. In this way, each node offers local knowledge of the current event in the system.

7. Global Identification: The global state of the system is recognized by combining local knowledge from different nodes.

Currently, we process collected data offline in MATLAB. This is convenient for rapid prototyping and algorithm development. However, the algorithms we use for signal processing and movement recognition are developed from relatively simple techniques that can be implemented and executed on the motes.
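To make the processing flow above concrete, the following sketch mirrors steps 2 and 4 (the eight-point smoothing filter and the per-segment features) in Python. NumPy, the synthetic 22 Hz test signal, and the function names are illustrative assumptions, not the authors' MATLAB implementation.

import numpy as np

def moving_average(signal, window=8):
    # Eight-point moving average used in the preprocessing step.
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

def extract_features(segment):
    # Five single-value features computed per sensor stream for one segmented movement.
    return {
        "mean": float(np.mean(segment)),
        "start_to_end": float(segment[-1] - segment[0]),
        "std": float(np.std(segment)),
        "peak_to_peak": float(np.max(segment) - np.min(segment)),
        "rms": float(np.sqrt(np.mean(segment ** 2))),
    }

# Example: one accelerometer axis of a two-second movement segment sampled at 22 Hz.
t = np.arange(0, 2.0, 1.0 / 22.0)
raw = 0.5 * np.sin(2 * np.pi * t) + 0.05 * np.random.randn(t.size)
print(extract_features(moving_average(raw)))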

4. Preliminaries

In this section, we first investigate properties of motion recognition using our system. We then discuss the need for clustering techniques in our system. We look into the most popular clustering algorithms and the mechanisms for evaluation of clusters. Finally, we apply clustering to the problem of primitive construction.

4.1. Recognition Objectives

While detection of movements requires a global view of the whole system, each individual node in a BSN has only local knowledge of the event taking place. This makes the problem of training and recognition very challenging [21]. The amount of knowledge provided by each node determines the usefulness of that node in recognizing movements.


We believe the right place to begin a discussion on phoneme construction for our system is to consider the properties of movements and the relationship between each movement and the system configuration. To achieve a generalized view of these properties, we use a per-node study; i.e., we do not make any assumptions about network topology or applications, to avoid ad hoc methods. We summarize the properties of our system below.

4.1.1. Location independence: Our study is focused on using inertial sensors to measure attributes of human motions. Therefore, we do not embed any context data within the system. In particular, the training and recognition of movements is independent of the location where the action takes place. However, contextual information such as location may help to improve the accuracy of detection at a later stage. Context awareness can be implemented on top of the proposed system.

4.1.2. Distributed recognition: Each node in the system has a different perspective on the movement. The ability of a node to recognize movements varies based on the type of movement. For example, consider the two movements "stand to sit" and "bending". A node mounted on the arm might contribute to distinguishing these two movements, but a sensor on the ankle might not provide useful information. The main advantage of unsupervised classification in our system is that by clustering training sets at each node, nodes not contributing to a certain movement will be recognized. That is, we will be able to determine which nodes are useful for recognizing which movements.

4.1.3. Recognition invariance: The detection of a movement should be invariant with respect to the target population. This includes invariance to age, gender, strength, and physical capabilities. At this stage of our study, we have not investigated this property, as investigating it will require data from a large and diverse set of subjects.

4.2. Clustering Implications

We employ clustering to find primitives for each movement. In this section, we explain how we benefit from clustering algorithms and our strategies for finding the most effective clustering configuration.

4.2.1. Clustering Techniques. Clustering is grouping together those data points in a training set that are most similar to each other. Two major clustering techniques are hierarchical clustering [29] and K-means clustering [30]. In the hierarchical method, each data item is initially considered a single cluster. At each stage of the algorithm, similar clusters are grouped together to form new clusters. In the K-means algorithm, the idea is to define K centroids, one for each cluster. In this way, training data are grouped into a predefined number of clusters.

Unlike hierarchical clustering, in which clusters are not improved after being created, the K-means algorithm visits the constructed clusters at each iteration and improves them gradually. The continuously improving nature of this algorithm leads to high-quality clusters when provided appropriate data. We use this algorithm for our analysis because it is simple, straightforward, and based on the firm foundation of analysis of variances [31].

4.2.2. Cluster Validation. Although K-means is a popular clustering technique, the partition attained by this algorithm is dependent on both the initial value of the centroids and the number of clusters. To increase the likelihood of arriving at a good partitioning of the data, many improvements to K-means have been proposed in the literature. The sum of square error (SSE) is a reasonable metric used to find the global optimal solution [32]. To cope with the effects of initialization, we use uniformly distributed initial centers [33] and repeatedly search for the configuration that gives the minimum error. We calculate the SSE error function as in (1), where x_i denotes the ith data item, \mu_k denotes the centroid vector associated with cluster C_k, and K is the total number of clusters.

SSE = \sum_{k=1}^{K} \sum_{i \in C_k} (x_i - \mu_k)^2    (1)
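For reference, equation (1) can be evaluated directly as in the short sketch below; the helper name is a hypothetical choice, and for a fitted scikit-learn KMeans model the same quantity is reported by its inertia_ attribute.

import numpy as np

def sse(features, labels, centroids):
    # Eq. (1): squared distance of every item to the centroid of its own cluster.
    return float(sum(np.sum((features[labels == k] - centroids[k]) ** 2)
                     for k in range(len(centroids))))

features = np.array([[0.0], [0.2], [1.0], [1.2]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.1], [1.1]])
print(sse(features, labels, centroids))  # 0.04 for this toy partition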

Another problem with K-means is that it requires prediction of the correct number of clusters. Usually, a cluster-validity framework provides insight into this problem. We employ the silhouette quality measure [34], which is robust and takes into account both intra-cluster and inter-cluster similarities to determine the quality of a cluster. Let C_k be a cluster constructed based on the K-means algorithm. The silhouette measure assigns a quality metric s_i to the ith data item of C_k. This value indicates the confidence of the membership of the ith item to cluster C_k. s_i is defined by (2), where a_i is the average distance between the ith data item and all of the items inside cluster C_k, and b_i is the minimum of the average distances between the ith item and all of the items in each cluster besides C_k. That is, the silhouette measure compares the distance between an item and the other items in its assigned cluster to the distance between that item and the items in the nearest neighboring cluster. The larger the s_i, the higher the level of confidence about the membership of the ith sample in the training set to cluster C_k.

s_i = \frac{b_i - a_i}{\max\{a_i, b_i\}}    (2)

While s_i, also called the silhouette width, describes the quality of the membership of a single data item, the quality measure of a partition, called the silhouette index, for a given number of clusters K is calculated using (3), where N is the number of data items in the training set.

Sil(K) = \frac{1}{N} \sum_{i=1}^{N} s_i    (3)

To obtain the most effective configuration in terms of the number of clusters, one can choose the K that has the largest silhouette index, as shown in (4).

K^{*} = \arg\max_{K} \{Sil(K)\}    (4)
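As an illustration of this selection rule, the sketch below runs K-means for several candidate values of K and keeps the partition with the largest silhouette index. scikit-learn (with its default k-means++ seeding) is an assumed stand-in for the uniformly initialized, repeatedly restarted K-means described above, and the feature matrix is synthetic.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_kmeans(features, k_range=range(2, 8), restarts=10):
    # Return (best_k, labels) maximizing the silhouette index Sil(K) of eq. (3).
    best_k, best_labels, best_score = None, None, -1.0
    for k in k_range:
        km = KMeans(n_clusters=k, n_init=restarts, random_state=0).fit(features)
        score = silhouette_score(features, km.labels_)  # mean silhouette width
        if score > best_score:
            best_k, best_labels, best_score = k, km.labels_, score
    return best_k, best_labels

# Toy per-node training data: rows are segmented movements, columns are features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 5)) for c in (0.0, 2.0, 4.0)])
print(best_kmeans(X)[0])  # expected to recover K = 3 for this synthetic data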

4.3. Phonology

Physical movement monitoring involves mapping a series of behaviors onto a vocabulary of actions. The vocabulary represents movements performed previously and stored as training sets across nodes. This vocabulary consists of words, each composed of segments named phonemes or primitives. The initial step in a linguistic framework for action recognition is phonology, which is the process of finding basic primitives for human movement. In our networked framework, primitives are distributed among sensor nodes with varying ability to recognize movements. Our phonetic description is able to characterize each action in terms of primitives using a two-stage process that we will describe next.

4.3.1. Primitive Construction. The first step in constructing our phonetic description is to find local primitives at each node. To do this, we use K-means clustering to perform local clustering at each node, transforming the feature space into groups of dense data items. Each cluster corresponds to an original primitive in our model. This technique is effective since it provides insights into the usefulness of nodes in detecting each movement. Actions with similar patterns at a certain node tend to be assigned to the same cluster at that node, while they might be represented by different clusters at another node. Suppose sensor readings are mapped to m different actions {A_i1, A_i2, ..., A_im} on some node i by segmentation. The clustering algorithm will transform these actions into a series of clusters {P_i1, P_i2, ..., P_ic}, where c ≤ m represents the number of clusters. We employ the validation techniques explained in Section 4.2.2 to find the most effective clustering configuration.
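A minimal sketch of this per-node clustering appears below; it also anticipates the cluster refinement described in the symbolization step that follows by discarding clusters whose mean silhouette width falls below a threshold. The feature layout, the threshold value, and the use of scikit-learn are illustrative assumptions only.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples

def node_primitives(features, n_clusters, quality_threshold=0.1):
    # Cluster one node's feature matrix (one row per segmented movement), then
    # keep only the clusters whose mean silhouette width meets the threshold.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(features)
    widths = silhouette_samples(features, km.labels_)
    kept = [c for c in range(n_clusters)
            if widths[km.labels_ == c].mean() >= quality_threshold]
    return km.labels_, kept  # the kept clusters act as the node's final primitives

# Toy data for one node: 60 segmented movements, 25 features each
# (5 sensor streams x 5 features per stream).
rng = np.random.default_rng(1)
features = np.vstack([rng.normal(loc=c, size=(20, 25)) for c in (0.0, 3.0, 6.0)])
labels, final_primitives = node_primitives(features, n_clusters=3)
print(final_primitives)  # indices of the clusters retained as primitives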

4.3.2. Symbolization. The second step in constructing our phonetic description is to select a final group of primitives and assign symbols to them. Some of the clusters defining our initial primitives are of low quality, meaning the primitives they define will not be good representations of our movements. We refine our clusters by calculating the silhouette quality measure for each cluster and eliminating clusters that do not meet a certain threshold. In this way, the set of primitives at node i might be reduced to {P_i1, P_i2, ..., P_ik}, where k ≤ c is the number of final primitives after applying the quality measure and c is the number of original primitives. After cluster refinement, each movement is represented by a set of final primitives at each node.

5. Movement Identification Problem

Physical movement monitoring by sensor networks requires the combination of local knowledge to achieve a global view of human behaviors. In this section we study the problem of action recognition using the semantic subspace generated by primitives. We pose this problem from a general detection perspective. That is, we do not make any assumptions about the communication model and signal processing techniques used for classification. Compared to a traditional classification approach, our movement recognition algorithm is faster and requires less communication and power.

We use the notations in Table 1 throughout this section.

Table 1. Notations

Symbol   Description
A        set of all actions to be detected
P        set of primitives extracted across the network
S        set of sensor nodes
m        number of actions
p        number of final primitives
n        number of sensor nodes
k        index for an action
j        index for a primitive
i        index for a node

5.1. Decision Tree Representation

The problem of recognizing movements using primitives can be viewed as a decision tree problem in which internal decision nodes represent different sensor nodes, and terminal leaves correspond to movements. The input to the tree is a vector A = [A_1, ..., A_n], where A_i is the primitive from node i that describes the movement to be identified, and n is the number of nodes in the system. The aim of motion recognition is to assign A to one of m mutually exclusive actions. The ordering of nodes in the tree changes its height and thus the time needed to converge to a solution. We seek a linear ordering of the nodes that minimizes convergence time. A solution to this problem could potentially provide insights into several crucial problems, including decision tree construction, distributed classification, and scheduling.

To better illustrate the identification problem, we provide a simple example, shown in Figure 4, which depicts the mapping of actions to primitives. The system consists of three sensor nodes denoted by s1, s2, and s3, and four movements denoted by A, B, C, and D. The circles depict the mapping of movements to primitives rather than the actual distribution of data. The system has seven final primitives denoted by {P1, P2, ..., P7}. In node s1, movement A is mapped to primitive P1, movements B and C are mapped to primitive P2, and movement D is mapped to primitive P3. In other nodes, the movements are mapped to primitives as shown. This phonetic expression can effectively describe the ability of individual nodes to identify the movements. For instance, node s1 can distinguish movement A from the rest of the movements, as it finds no ambiguity when mapping an action to P1, but it cannot distinguish between movements B and C, as they are mapped to the same primitive. While each node has limited knowledge of the system, we require a global view in which every movement is distinguished from the rest. Furthermore, we require an ordering of sensor nodes that minimizes the total time of convergence.

Figure 4. An example of three nodes (s1, s2, s3) and four movements (A, B, C, D). The movements are mapped to seven primitives (P1, ..., P7). Each movement is symbolized by its corresponding primitives: A = {P1, P4, P6}; B = {P2, P4, P6}; C = {P2, P5, P6}; D = {P3, P5, P7}.

Given an instance of a decision problem, one can construct different decision trees. Figure 5 illustrates a sample decision tree for the example represented in Figure 4. The problem of finding a minimal decision tree is known to be hard to approximate [35]. Therefore, in this paper, we investigate a static design decision for constructing a decision tree for motion recognition. That is, we try to make an offline decision about the optimal ordering of the sensor nodes for recognition. Our current method of linearly ordering the nodes restricts the shape of the decision tree so that all nodes are placed on a single path from the root and the tree has a height equal to the total number of nodes required for recognition. We will investigate the construction of a full decision tree in future work.

Figure 5. A sample decision tree for the example illustrated in Figure 4.

5.2. Problem Formulation

In this section, we present a formal definition of our movement identification problem.

Definition (Local Discrimination Set): Let A = {A_1, A_2, ..., A_m} be a finite set of movements mapped to a set of primitives P = {P_1, P_2, ..., P_p}. The local discrimination set LDS_i for a node s_i is defined by:

LDS_i = \{(A_k, A_{k'}) \mid A_k \in P_j, A_{k'} \in P_{j'}, P_j \in s_i, P_{j'} \in s_i, j \neq j'\}    (5)

The LDS_i expresses the pairs of actions that can be distinguished by the ith node. In the example shown in Figure 4, the set of movements is {A, B, C, D}, and the set of primitives is {P1, P2, ..., P7}. Therefore, the local discrimination set for node s1 is LDS_1 = {(A,B), (A,C), (A,D), (B,D), (C,D)}. For the other nodes, we have LDS_2 = {(A,C), (A,D), (B,C), (B,D)} and LDS_3 = {(A,D), (B,D), (C,D)}.

Definition (Global Discrimination Set): Let A = {A_1, A_2, ..., A_m} be a finite set of actions and P = {P_1, P_2, ..., P_p} a collection of primitives. The global discrimination set GDS is defined by:

GDS = \{(A_k, A_{k'}) \mid A_k \in A, A_{k'} \in A, k \neq k'\}    (6)

The global discrimination set contains all the pairs of movements that are required to be distinguished from one another. For example, the global discrimination set for the system shown in Figure 4 is GDS = {(A,B), (A,C), (A,D), (B,C), (B,D), (C,D)}. In this example, the objective is to distinguish between every pair of movements.

Definition (Complete Ordering): Let A = {A_1, A_2, ..., A_m} be a finite set of actions and P = {P_1, P_2, ..., P_p} a collection of primitives. An ordering O = {s_1, s_2, ..., s_n} is complete if the following condition holds:

\bigcup_{i=1}^{n} LDS_i = GDS    (7)

This indicates that the ordering is capable of distinguishing between all required pairs of movements. In the previous example, the ordering O = {s1, s2} is complete since LDS_1 ∪ LDS_2 = GDS, but the ordering O = {s2, s3} is not complete because this ordering cannot discriminate between movements A and B.

Definition (Ordering Cost): Let O = {s_1, s_2, ..., s_n} be a complete ordering of sensor nodes and f(A_k) a function that gives the index of the first node in the ordering for which the following condition holds:

\{(A_k, A_{k'}) \mid k \neq k', A_{k'} \in A\} \subseteq \bigcup_{i=1}^{f(A_k)} LDS_i    (8)

That is, f(A_k) is the number of nodes necessary to distinguish movement A_k from all other movements. Then the total cost of the ordering is given by the following equation:

Cost(O) = \sum_{A_k \in A} f(A_k)    (9)

This formulation weights the cost of an ordering so that an ordering in which more movements require fewer nodes has a lower cost. For example, let O = {s3, s2, s1} be a complete ordering for the example shown in Figure 4. Then f(D) = 1 because movement D can be completely identified at the first visited node (s3). At the next node (s2), movement C can be distinguished from the remaining movements (A and B). Thus f(C) = 2 because movement C is identified at the second node. Finally, movements A and B will be detected at the third visited node (s1), meaning that f(A) = f(B) = 3. Therefore, the total cost for this ordering is 9.

Definition (Min Cost Identification Problem): Given a finite set GDS and a collection LDS = {LDS_1, LDS_2, ..., LDS_n} of subsets of GDS such that the union of all LDS_i forms GDS, Min Cost Identification (MCI) is the problem of finding a complete linear ordering such that the cost of the ordering is minimized.

In the above example, it would be easy to find the optimal solution by a brute-force technique. We can see that the cost for the optimal ordering ({s1, s2}) is 6.
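The following sketch reproduces the discrimination sets and the ordering cost for the Figure 4 example; the data layout and function names are our own illustrative choices rather than part of the formulation above.

from itertools import combinations

# Per-node primitive assignment from Figure 4: node -> {movement: primitive}.
node_primitives = {
    "s1": {"A": "P1", "B": "P2", "C": "P2", "D": "P3"},
    "s2": {"A": "P4", "B": "P4", "C": "P5", "D": "P5"},
    "s3": {"A": "P6", "B": "P6", "C": "P6", "D": "P7"},
}
actions = ["A", "B", "C", "D"]

# GDS (eq. 6): every pair of distinct movements.
GDS = set(combinations(actions, 2))

# LDS_i (eq. 5): pairs mapped to different primitives at node s_i.
LDS = {node: {(a, b) for a, b in GDS if mapping[a] != mapping[b]}
       for node, mapping in node_primitives.items()}

def ordering_cost(ordering):
    # Eqs. (8)-(9): sum over movements of the index of the first node in the
    # ordering at which the movement is separated from all other movements.
    cost = 0
    for act in actions:
        needed = {pair for pair in GDS if act in pair}
        covered = set()
        for depth, node in enumerate(ordering, start=1):
            covered |= LDS[node] & needed
            if covered == needed:
                cost += depth
                break
        else:
            raise ValueError("ordering is not complete for movement " + act)
    return cost

print(ordering_cost(["s3", "s2", "s1"]))  # 9, as in the worked example above
print(ordering_cost(["s1", "s2"]))        # 6, the optimal ordering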

5.3. Problem Complexity

In this section we address the complexity of Min Cost Identification. We show that this problem is NP-Complete by reduction from Min Sum Set Cover. Therefore, no polynomial-time algorithm exists that solves it unless P = NP.

Definition (Min Sum Set Cover): Let U be a finite set of elements and S = {S_1, S_2, ..., S_m} a collection of subsets of U such that their union forms U. A linear ordering of S is a bijection f from S to {1, 2, ..., m}. For each element e ∈ U and linear ordering f, we define f(e) as the minimum of f(S_j) over all {S_j : e ∈ S_j}. The goal is to find a linear ordering that minimizes \sum_{e \in U} f(e).

Theorem 1. The Min Cost Identification problem is NP-Complete.

Proof: We will prove that the Min Cost Identification problem is NP-Complete by reduction from Min Sum Set Cover (MSSC). Consider an MSSC instance (U, S) consisting of a finite set of elements U and a collection S of subsets of U. The objective is to find a minimum-cost linear ordering of subsets such that the union of the chosen subsets of U contains all elements in U. We now define a set U' by replacing the elements of U with the elements (A_k, A_k') of the GDS. We also define S' by replacing the subsets S_i with the sets LDS_i. Then (U', S') is an instance of the MCI problem. Therefore, MCI is NP-hard. Since solutions for the decision version of MCI are verifiable in polynomial time, it is in NP, and consequently, the MCI decision problem is also NP-Complete.

Theorem 2. There exists no polynomial-time approximation algorithm for MCI with an approximation ratio less than 4.

Proof: The reduction from MSSC to MCI in the proof of Theorem 1 is approximation preserving; that is, it implies that any lower bound for MSSC also holds for MCI. In [36], it is shown that for every ε > 0, it is NP-hard to approximate MSSC within a ratio of 4 − ε. Therefore, 4 is also a lower bound for the approximation ratio of MCI.

5.4. Greedy Solution

The greedy algorithm for MCI is adapted from the greedy algorithm for MSSC and is shown in Algorithm 1. At each step, it looks for the node that can distinguish between the maximum number of remaining movements. It then adds such a node to the solution space and removes the movements it distinguishes from further consideration. The algorithm terminates when all pairs of movements are distinguished from each other. The approximation ratio is 4, as previously discussed.

Algorithm 1. Greedy solution for MCI

Inputs: A, P, S
Output: O

1.  Calculate the set LDS_i for every node s_i
2.  Calculate the set GDS
3.  O = ∅; D = ∅
4.  while (D ≠ GDS)
5.      take the node s_i such that |LDS_i| is maximum
6.      O = O ∪ {s_i}; D = D ∪ LDS_i
7.      for each e ∈ LDS_i
8.          remove e from LDS_j for j = 1, ..., n
9.      end for
10. end while

Here D denotes the set of movement pairs already distinguished by the nodes selected so far; the loop ends once D covers the entire GDS.

Algorithm 1 can be used to find the minimum number and preferred locations of sensor nodes required to recognize certain movements. This can be used for power optimization because at any time, only a subset of sensor nodes will be required to be active, based on the movements of interest at that time. Furthermore, reducing the number of required nodes helps to achieve a more wearable system, which is a critical issue in the design of BSNs.
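A compact Python rendering of Algorithm 1 on the Figure 4 example is sketched below. The variable names and the arbitrary tie-breaking of max() are assumptions; the greedy rule itself follows the listing above, picking the node whose remaining LDS has maximum cardinality and then removing the newly covered pairs from every node's set.

from itertools import combinations

node_primitives = {
    "s1": {"A": "P1", "B": "P2", "C": "P2", "D": "P3"},
    "s2": {"A": "P4", "B": "P4", "C": "P5", "D": "P5"},
    "s3": {"A": "P6", "B": "P6", "C": "P6", "D": "P7"},
}
GDS = set(combinations(sorted(node_primitives["s1"]), 2))
LDS = {n: {(a, b) for a, b in GDS if m[a] != m[b]} for n, m in node_primitives.items()}

def greedy_mci(LDS, GDS):
    # Greedy ordering: repeatedly take the node that distinguishes the largest
    # number of still-uncovered movement pairs (steps 4-10 of Algorithm 1).
    remaining = {node: set(pairs) for node, pairs in LDS.items()}
    ordering, covered = [], set()
    while covered != GDS:
        best = max(remaining, key=lambda node: len(remaining[node]))
        if not remaining[best]:
            raise ValueError("the LDS sets do not cover GDS")
        ordering.append(best)
        covered |= LDS[best]
        newly_covered = remaining[best].copy()
        for node in remaining:  # step 8: drop covered pairs from every LDS
            remaining[node] -= newly_covered
    return ordering

print(greedy_mci(LDS, GDS))  # ['s1', 's2'] for the Figure 4 example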

6. Experimental Results

To validate our proposed linguistic expression and identification framework, we prepared an experiment using the system described in Section 3. We had three subjects: one male, age 32, and two females, ages 22 and 55. We placed 18 sensor nodes on each subject, as shown in Figure 2 and listed in Table 3. The subjects performed the movements listed in Table 2 for ten trials each. The experiment was designed to involve a relatively wide range of movements that required motion from different parts of the human body. Although our experiments were carried out in a controlled environment in which subjects were asked to repeatedly perform specific movements, we expect that our method will remain effective in less controlled environments when given a larger training set that includes a wider range of movements.

Table 2. Movements for experimental analysis.

No.  Description
1    Stand to sit (armchair)
2    Sit to stand (armchair)
3    Stand to sit (dining chair)
4    Sit to stand (dining chair)
5    Sit to lie
6    Lie to sit
7    Bend and grasp from ground with right hand
8    Bend and grasp from ground with left hand
9    Bend and grasp from coffee table with right hand
10   Bend and grasp from coffee table with left hand
11   Turn clockwise 90 degrees and return
12   Turn counterclockwise 90 degrees and return
13   Look back clockwise and return to the initial position
14   Look back counterclockwise and return to the initial position
15   Kneeling, right leg first
16   Kneeling, left leg first
17   Move forward one step, right leg
18   Move forward one step, left leg
19   Reach up to a cabinet with right hand
20   Reach up to a cabinet with left hand
21   Reach up to a cabinet with both hands
22   Grasp an object with right hand, turn clockwise and release
23   Grasp an object with two hands, turn clockwise and release
24   Turn clockwise 360 degrees
25   Turn counterclockwise 360 degrees
26   Jumping
27   Going up stairs, right leg first (one stair)
28   Going down stairs, right leg first (one stair)
29   Rising from kneeling, right leg
30   Rising from kneeling, left leg

For each of the five data streams we receive from each sensor node (x, y, z acceleration and x, y angular velocity), we extracted the five features listed in Section 3. We used 50% of the data as a training set and 50% as a test set. The training set was used for constructing primitives and finding the minimum-cost ordering of the nodes, while the test set was used for verifying the accuracy of our classification technique.


Table 3. Experiment Statistics

Node#  Description    #P  Ordering  #Actions  Σ f(A_k)
1      Waist          3   -         -         -
2      Neck           2   5         2         40
3      Right-arm      4   -         -         -
4      Right-elbow    3   1         0         0
5      Right-forearm  3   8         3         131
6      Right-wrist    2   9         3         158
7      Left-arm       3   -         -         -
8      Left-elbow     3   -         -         -
9      Left-forearm   3   4         3         30
10     Left-wrist     2   -         -         -
11     Right-thigh    2   -         -         -
12     Right-knee     2   -         -         -
13     Right-shin     4   6         3         58
14     Right-ankle    3   10        3         188
15     Left-thigh     2   2         0         0
16     Left-knee      2   -         -         -
17     Left-shin      4   7         7         107
18     Left-ankle     2   3         6         18

6.1. Static Construction of Decision Tree

As previously stated, our method constructs a decision tree using an ordering of sensor nodes that leads to a near-optimal solution for minimum-cost action recognition. First, the feature space is mapped to the initial phonemes using our primitive-construction approach. Next, the silhouette quality measure is applied to select high-quality primitives from among the phonemes. The number of primitives for each sensor node is shown in Table 3. The number of primitives per node depends on the ability of the node to distinguish between different movements. A node with only one primitive could not distinguish between any movements, whereas a node with as many primitives as there are movements could distinguish between all movements. Our nodes have two to four phonemes each.

To verify the effectiveness of the greedy solution to the Min Cost Identification problem, we used the primitive representation of data collected from 18 nodes. Since there are 30 movements to be recognized in our experiment, the size of the GDS is 435, which is the total number of pairs (A_k, A_k') to be distinguished.

The ordering obtained by the greedy algorithm is given in Table 3. The ordering implies that, in the worst case, 10 sensor nodes are sufficient to achieve global knowledge of the current event in the system. The most informative node is node 4, which has ordering 1, meaning that it should be the first node visited for action detection. The value of the cost function associated with this node is zero because it alone cannot completely distinguish any movement from all the others. In other words, collaboration between distributed sensor nodes is required to gain a global view of the system. The remaining nodes that should be visited to recognize all movements in our system are 15, 18, 9, 2, 13, 17, 5, 6, and 14, in that order.

The fifth column in Table 3 shows how many movements are distinguished as nodes are visited. At the third visited node (node 18), six movements are identified, and at the fourth visited node (node 9), three movements are identified. The next visited node (node 2) supplies the information necessary to distinguish two additional movements from the rest. The last column in Table 3 shows the accumulated value of the cost function f(A_k). The total cost of the ordering is 188. Figure 6 shows the nodes required to identify each movement using the ordering we obtained. Visited nodes are listed along the x-axis, and movements are listed along the y-axis.

Figure 6. Identification order of movements.

6.2. Recognition Accuracy

To show the effectiveness of our motion detection algorithm using only the active nodes reported above, we used the linear ordering of nodes we obtained to classify our test set (50% of the collected data). After filtering, segmentation, and feature extraction, each test vector was mapped to its corresponding primitives on the active nodes. Using these primitives and the decision tree defined by the obtained ordering of active nodes, we achieved an accuracy of 91%.

7. Conclusions and Future Work

In this paper, we proposed a phoneme representation of human behavior for body sensor networks. This phonological approach produces a compact expression of atomic actions. We used a cluster-based technique to generate basic primitives for each sensor node. The global state of the system is determined using the local knowledge given by primitives at each sensor node and a fast recognition policy. We showed that the problem of determining the optimal ordering of nodes for movement monitoring at the primitive level in our distributed platform is NP-Complete and presented a fast greedy solution for this problem. Our experiments demonstrated the functionality of the proposed framework. We verified that our technique achieves 91% accuracy in classifying human actions. Although our linguistic framework is simplified by constructing and using only primitives, it demonstrates the potential of a linguistic approach. In the future, we plan to investigate the construction of a full decision tree for dynamic optimization. We also plan to investigate the recognition of human behavior by combining primitives to form words that describe more complex activities and combining words to form sentences that describe general behavior.

8. References

[1] H. Weiming, T. Tieniu, W. Liang, and S. Maybank, "A survey on visual surveillance of object motion and behaviors," IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews, vol. 34, pp. 334-352, 2004.

[2] T. Teixeira, D. Lymberopoulos, E. Culurciello, Y. Aloimonos, and A. Savvides, "A Lightweight Camera Sensor Network Operating on Symbolic Information," in First Workshop on Distributed Smart Cameras, 2006.

[3] D. J. Patterson, D. Fox, H. Kautz, and M. Philipose, "Fine-grained activity recognition by aggregating abstract object usage," in Ninth IEEE International Symposium on Wearable Computers, 2005, pp. 44-51.

[4] M. Philipose, K. P. Fishkin, M. Perkowitz, D. J. Patterson, D. Fox, H. Kautz, and D. Hahnel, "Inferring activities from interactions with objects," IEEE Pervasive Computing, vol. 3, pp. 50-57, 2004.

[5] A. Vehkaoja, M. Zakrzewski, J. Lekkala, S. Iyengar, R. Bajcsy, S. Glaser, S. Sastry, and R. Jafari, "A resource optimized physical movement monitoring scheme for environmental and on-body sensor networks," in Proceedings of the 1st ACM SIGMOBILE International Workshop on Systems and Networking Support for Healthcare and Assisted Living Environments, San Juan, Puerto Rico, 2007, pp. 64-66.

[6] R. Jafari, W. Li, R. Bajcsy, S. Glaser, and S. Sastry, "Physical Activity Monitoring for Assisted Living at Home," in 4th International Workshop on Wearable and Implantable Body Sensor Networks (BSN 2007), Aachen University, Germany, 2007, pp. 213-219.

[7] H. Ghasemzadeh, E. Guenterberg, K. Gilani, and R. Jafari, "Action coverage formulation for power optimization in body sensor networks," in Asia and South Pacific Design Automation Conference (ASPDAC 2008), 2008, pp. 446-451.

[8] G. Guerra-Filho, C. Fermuller, and Y. Aloimonos, "Discovering a language for human activity," in AAAI 2005 Fall Symposium on Anticipatory Cognitive Embodied Systems, Washington, D.C., 2005.

[9] G. Guerra-Filho and Y. Aloimonos, "A Language for Human Action," Computer, pp. 61-69, May 2007.

[10] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, pp. 257-286, 1989.

[11] J. K. Aggarwal and P. Sangho, "Human motion: modeling and recognition of actions and interactions," in 2nd International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT 2004), 2004, pp. 640-647.

[12] N. Naotake, Y. Junichi, and K. Takao, "Human Walking Motion Synthesis with Desired Pace and Stride Length Based on HSMM," IEICE Trans. Inf. Syst., vol. E88-D, pp. 2492-2499, 2005.

[13] L. Reng, T. Moeslund, and E. Granum, "Finding Motion Primitives in Human Body Gestures," in Gesture in Human-Computer Interaction and Simulation, 2006, pp. 133-144.

[14] Y. Jin and B. Prabhakaran, "Latent Semantic Mapping for 3D Human Motion Document Retrieval," in The 31st Annual International ACM SIGIR Conference, Singapore, 2008.

[15] A. Ogale, A. Karapurkar, and Y. Aloimonos, "View-invariant modeling and recognition of human actions using grammars," in Workshop on Dynamical Vision at ICCV'05, 2005.

[16] C. S. Wetherell, "Probabilistic Languages: A Review and Some Open Questions," ACM Comput. Surv., vol. 12, pp. 361-379, 1980.

[17] M. Muller, T. Röder, and M. Clausen, "Efficient content-based retrieval of motion capture data," ACM Trans. Graph., vol. 24, pp. 677-685, 2005.

[18] L. Feng, Z. Yueting, W. Fei, and P. Yunhe, "3D motion retrieval with motion index tree," Comput. Vis. Image Underst., vol. 92, pp. 265-284, 2003.

[19] J. Xiang and H. Zhu, "Motion Retrieval Based on Temporal-Spatial Features by Decision Tree," in Digital Human Modeling, 2007, pp. 224-233.

[20] O. C. Jenkins and M. J. Mataric, "Deriving action and behavior primitives from human motion data," in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2002, pp. 2551-2556, vol. 3.

[21] D. Lymberopoulos, A. S. Ogale, A. Savvides, and Y. Aloimonos, "A sensory grammar for inferring behaviors in sensor networks," in The Fifth International Conference on Information Processing in Sensor Networks (IPSN 2006), 2006, pp. 251-259.

[22] D. Lymberopoulos, A. Barton-Sweeney, T. Teixeira, and A. Savvides, "An easy-to-program system for parsing human activities," ENALAB, 2006.

[23] J. Barnes and R. Jafari, "Locomotion Monitoring Using Body Sensor Networks," in The 1st International Conference on PErvasive Technologies Related to Assistive Environments (PETRA), Athens, Greece, 2008.

[24] B. Logan, J. Healey, M. Philipose, E. Tapia, and S. Intille, "A Long-Term Evaluation of Sensing Modalities for Activity Recognition," in UbiComp 2007: Ubiquitous Computing, 2007, pp. 483-500.

[25] S. Jiang, Y. Xue, Y. Cao, S. Iyengar, R. Bajcsy, P. Kuryloski, S. Wicker, and R. Jafari, "CareNet: An Integrated Wireless Sensor Networking Environment for Remote Healthcare," in Third International Conference on Body Area Networks, Tempe, Arizona, USA, 2008.

[26] Quiet Care Systems. Available: www.quietcaresystems.com.

[27] E-Neighbor system from Healthsense. Available: www.healthsense.com.

[28] R. Marin-Perianu, M. Marin-Perianu, P. Havinga, and H. Scholten, "Movement-Based Group Awareness with Wireless Sensor Networks," in Pervasive Computing, 2007, pp. 298-315.

[29] S. C. Johnson, "Hierarchical Clustering Schemes," Psychometrika, vol. 2, pp. 241-254, 1967.

[30] J. B. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations," in 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, 1967, pp. 281-297.

[31] P. Berkhin, "Survey of Clustering Data Mining Techniques," Accrue Software, San Jose, CA, 2002.

[32] D. Steinley and M. J. Brusco, "Initializing K-means Batch Clustering: A Critical Evaluation of Several Techniques," J. Classif., vol. 24, pp. 99-121, 2007.

[33] S. Lamrous and M. Taileb, "Divisive Hierarchical K-Means," in International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce, 2006, pp. 18-18.

[34] P. Rousseeuw, "Silhouettes: a graphical aid to the interpretation and validation of cluster analysis," J. Comput. Appl. Math., vol. 20, pp. 53-65, 1987.

[35] D. Sieling, "Minimization of decision trees is hard to approximate," J. Comput. Syst. Sci., vol. 74, pp. 394-403, 2008.

[36] U. Feige and P. Tetali, "Approximating Min Sum Set Cover," Algorithmica, vol. 40, pp. 219-234, 2004.