
Online-learning and Attention-based Approach to Obstacle Avoidance Using a Range Finder

Shuqing Zeng and Juyang Weng*

Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824-1226, USA. zengshuq, [email protected]

Abstract

The problem of developing local reactive obstacle-avoidance behaviors by a mobile robot through online real-time learning is considered. The robot operated in an unknown bounded 2-D environment populated by static or moving obstacles (with slow speeds) of arbitrary shape. The sensory perception was based on a laser range finder. A learning-based approach to the problem is presented. To greatly reduce the number of training samples needed, an attentional mechanism was used. An efficient, real-time implementation of the approach was tested, demonstrating smooth obstacle-avoidance behaviors in a corridor with a crowd of moving students as well as static obstacles.

Key words: Humanoid, Reactive Collision Avoidance, Range Finder, Perception

* Please send correspondence to Dr. Juyang Weng at the Department of Computer Science and Engineering, 3115 Engineering Building, East Lansing, MI 48824-1226, USA.


1 Introduction

The problem of range-based obstacle avoidance has been studied by many researchers. Reported methods fall into two categories: path-planning approaches and local reactive approaches. Path-planning approaches are conducted off-line in a known environment. In [15, 9], an artificial potential field is used to find a collision-free path in a 3-D space. Such methods can handle long-term path planning, but are computationally expensive for real-time obstacle avoidance in a dynamic environment (moving obstacles).

The local reactive approaches are efficient in unknown or partially unknown dynamic environments since they reduce the problem's complexity by computing short-term actions based on current local sensing. In the vector field histogram method [5], a 1-D polar histogram representation is constructed to model the environment. The method calculates steering and speed in two separate steps. The curvature-velocity method [21] and the dynamic window (DW) approach [10] formulate obstacle avoidance as a constrained optimization problem in a 2-D velocity space. These methods assume that the robot moves in circular paths. Obstacles and the robot's dynamics are considered by restricting the search space to a set of admissible velocities. Although they are suited for high velocities (e.g., 95 cm/s), the local minima problem remains [2]. The DW approach has been extended to integrate with a map of the environment [11, 23]. More extensive goals include localization and map building [25, 24, 4].

A major challenge of scene-based behavior generation is the complexity of the scene. Human expert knowledge has been used to design rules that produce behaviors from pre-specified features. In [16], a fuzzy logic control approach is used to incorporate human expert knowledge for realizing obstacle-avoidance behaviors. One difficulty of a pure fuzzy approach is obtaining the fuzzy rules and membership functions. Neuro-fuzzy approaches [28, 17, 1] were introduced with the purpose of generating fuzzy rules and membership functions automatically. Their training processes are usually conducted on a human-supplied training data set (e.g., trajectories) in an off-line fashion, and the dimensionality of the input variables (features) must be low in order to be manageable.

In contrast with the above efforts, which concentrate on behavior generation without requiring sophisticated perception, another body of research deals with perception-guided behaviors. Studies of perception-guided behaviors have a long history. Usually, human programmers define features (e.g., edges, colors, tones) or environmental models [13, 6]. An important research direction, the appearance-based method [19, 8], aims at reducing or avoiding such human-defined features for better adaptation to unknown scenes. The need to process high-dimensional sensory vector inputs in appearance-based methods brings out a sharp difference between behavior modeling and perceptual modeling: the effectors of a robot are known with the former, but the sensory space is extremely complex and unknown with the latter and, therefore, very challenging. To deal with the highly demanding requirement for computational resources, several attentional mechanisms have been proposed (e.g., the state-based attentional model [8]).

In this paper, an approach is presented for the development of a local obstacle-avoidance behavior by a mobile humanoid through online real-time incremental learning. A major distinction of the approach is that an appearance-based method for range-map learning was used, rather than an environment-dependent algorithm (e.g., obstacle segmentation and classification) for obstacle avoidance. The new appearance-based learning method was able to distinguish small range-map differences that are critical in altering the navigation behavior (e.g., passable and not passable sections). In principle, the appearance-based method is complete in the sense that it is able to learn any complex function that maps from the range-map space to the behavior space. This also implies that the number of training samples required to approximate the complex function is very large. To reduce the number of training samples required, an attentional selective mechanism was introduced, which dynamically selected regions in near proximity for analysis and treated other regions as negligible for the purpose of local obstacle avoidance. Further, online training was used so that the trainer could dynamically choose the training scenarios according to the system's current strengths and weaknesses, further reducing the time and number of training samples.

The remainder of this paper is organized as follows. Section 2 presents the proposed algorithm. The robotic platform and the real-time online learning procedure are described in Section 3. The results of simulation and real-robot experiments are reported in Section 4. Discussion and concluding remarks are given in Section 5.

2 Approach

2.1 Problem statement

The obstacle-avoidance behavior is considered as a reactive decision process, which converts the current range map to an action. The robot neither senses and stores the scene configuration (e.g., a global map of the environment) nor the global robot position. In other words, it is assumed that the current range map contains all of the information that is sufficient for the robot to derive the next motor control signal. In effect, the robot uses the real world as its major representation. In the work presented here, the robot's only goal is to move safely according to the scene: it has no target location. Such a navigation system is useful for applications where a human guides global motion but local motion is autonomous, including, but not limited to, web-based telematics, semi-autonomous robots on a battlefield, and roving on remote planets, where local intelligence reduces the remote operator's intervention level.

The range scanner observes r(t) ∈ R ⊂ R^n at time t, where R denotes the space of all possible range images in a specific environment. r(t) is a vector of distances, whose ith component r_i denotes the distance to the nearest obstacle at a specific angle. The vector v(t) gives the current velocities of the vehicle at time t, which are measured by the vehicle's encoders. The vectors r(t) and v(t) are given by two sensors whose dimensions and scales are different; thus, a normalization procedure is needed when merging the two vectors into the integrated observation vector.

Fig. 1. The control architecture of the range-based navigation. "Attention" denotes an attentional module; r_p(t) denotes the retrieved range prototype.

Definition 1. The vector x(t) denotes the system's observation of the environment at time t. It is defined as

x(t) = \left( \frac{r(t) - \bar{r}}{w_r}, \frac{v(t) - \bar{v}}{w_v} \right) \in \mathcal{X},   (1)

where w_r and w_v are two positive numbers that denote the scatter measurements of the variates r and v, respectively, and \bar{r} and \bar{v} are the corresponding sample means. (For simplicity, the covariance matrix \Sigma_u of a variate u is assumed to equal \sigma^2 I, where I denotes the identity matrix; the corresponding scatter is then w_u = \sqrt{\mathrm{tr}\,\Sigma_u} = \sqrt{n}\,\sigma, where n is the dimension of u; see [18], page 13. \sigma can be estimated from samples.)

The action vector y(t) ∈ Y consists of the control signals sent to all of the effectors at time t, where Y denotes the sample space of action vectors.
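To make Definition 1 concrete, here is a minimal sketch (not the authors' code) of how the observation vector x(t) could be assembled; the scan size, means, and scatter values below are illustrative assumptions, with the scatter following the footnote's w_u = sqrt(n)*sigma convention.

```python
import numpy as np

def observation_vector(r, v, r_bar, v_bar, w_r, w_v):
    """Eq. (1): center each sensor stream by its sample mean, scale by its
    scatter, and concatenate into one normalized observation vector."""
    return np.concatenate([(r - r_bar) / w_r, (v - v_bar) / w_v])

# Illustrative values only: a 361-beam scan and a 3-D velocity vector.
n = 361
r = np.random.uniform(0.0, 10.0, n)   # current range image r(t)
v = np.array([0.2, 0.0, 0.05])        # current base velocities v(t)
r_bar = np.full(n, 5.0)               # assumed sample mean of r
v_bar = np.zeros(3)                   # assumed sample mean of v
w_r = np.sqrt(n) * 2.0                # scatter w_u = sqrt(n) * sigma
w_v = np.sqrt(3) * 0.1
x = observation_vector(r, v, r_bar, v_bar, w_r, w_v)
```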

The obstacle-avoidance behavior can be formulated as a mapping f : X → Y; i.e., the primed action y(t + 1) (the signal sent to the motors) is a function of x(t):

y(t + 1) = f(x(t)).   (2)

Fig. 1 shows the coarse control architecture of the presented approach. An Incremental Hierarchical Discriminating Regression (IHDR) [14] tree is generated to estimate the control signal y from x. The current input range image r(t) and the vehicle's velocities v(t) are used to derive the next control signal y(t + 1). An attentional module is added to extract partial views from the whole view of a scan.

2.2 Attentional mechanism

Direct use of an image as a long vector for statistical feature derivation and learning is called the appearance-based approach in the computer vision community. Usually the appearance-based approach utilizes monolithic views, where the entire range data (or visual image) frame is treated as a single entity. However, the importance of signal components is not uniform. There are cases where the appearances of two scenes are quite similar globally, but different actions are required. Conversely, similar actions may be required in situations where the global appearances of two scenes look quite different. Both cases indicate that there are critical areas whose differences crucially determine the action needed. This necessitates an attentional mechanism to select such critical areas.


Fig. 2. (a1), (b1) and (c1) show three scenarios along the robot's path. The solid small circles denote the robot, with a short line segment at the front indicating orientation. Thick lines mark the walls along the corridor. T and \bar{r} denote the threshold and the mean, respectively. (a2), (b2) and (c2) are the range images taken by the robot at the three scenarios, respectively. (a3), (b3) and (c3) are the corresponding images after passing the attentional module. The diagrams in the lower two rows use logarithmic scales for the y-axis. The distance between (a3) and (b3) becomes larger, while the distance between (b3) and (c3) becomes smaller.

For example, in Fig. 2, (a1), (b1) and (c1) show three scenarios along the robot's path. Range images (a2) and (b2) are quite similar globally, judged from the entire image (except the critical area on the left side). In the context of an appearance-based method, this means that the distance (e.g., the Euclidean one) between the two is small. Yet they require different actions: turning right for (a1), and going straight or turning left for (b1). In another case, range images (b2) and (c2) are very different globally, but their desired actions are similar: going straight or turning left. Thus, it is difficult to discriminate the three cases correctly using a distance metric defined on the entire image. But if we look at the left subregions in (a2), (b2) and (c2) of Fig. 2, the similarities and differences become clear. Without the capability to attend to this critical region, the learning system requires significantly more training samples when complex scenarios are considered.

In the above three cases, the critical area is the input component where range readings are very small. This is true in general, because near obstacles determine heading more than, and often take precedence over, far-away objects. As shown later, this can be accomplished by an attentional mechanism.

First, the scalar attentional effector is defined.


Fig. 3. The attentional signal generator f and the attentional executor g. r(t) and z(t) denote the input and output, respectively.

Definition 2. The operation of the attentional effector a(t) for input r(t) and output z(t) is defined by

z(t) = g(r(t), a(t)) = \begin{cases} r(t), & a(t) = 1, \\ \bar{r}, & a(t) = 0, \end{cases}   (3)

where \bar{r} denotes the sample mean of the raw signal r(t).

For the intended application, it is desirable to have a(t) behave in the following way. First, when all input components have large values, the attentional selection is in its default mode, turning on all components. Second, when there are nearby objects, the attentional selection activates only the nearby objects, which are critical for obstacle avoidance, while far-away objects are replaced by their mean readings. This attentional action can be realized by two programmed functions g and f:

z_i(t) = g(r_i(t), a_i(t))   (4)

and

a_i(t) = f(r(t)) = \begin{cases} 1, & \text{if } r_i(t) < T \text{ or } \forall j,\ r_j(t) \ge T, \\ 0, & \text{otherwise,} \end{cases}   (5)

where T is a threshold, i = 1, 2, ..., n, and r(t) = (r_1(t), r_2(t), ..., r_n(t)) denotes the input vector. z(t) = (z_1(t), z_2(t), ..., z_n(t)) is written as the output of the attentional action and a(t) = (a_1(t), a_2(t), ..., a_n(t)) as the attention vector. The function f suppresses far-away components (a_i(t) = 0) when there are objects closer than T. If all readings are far away, it is not desirable to turn the attention off completely; therefore, all attentional effectors are left on (∀j, a_j(t) = 1). This operation is illustrated in Fig. 3.

In practice, this raw attentional vector a(t) is smoothed by convolving it with a flat window:

a'_i(t) = \mathrm{round}\!\left( \frac{1}{11} \sum_{j=i-5}^{i+5} a_j(t) \right),

where round(·) denotes rounding to the nearest integer. This smoothing serves to eliminate point-wise noise and to provide a neighborhood influence to the output attentional vector.
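The attentional operation of Eqs. (3)-(5), together with the window smoothing above, can be sketched as follows (a sketch under stated assumptions: the 11-sample flat window from the text, numpy convolution with approximate edge handling, and rounding to the nearest integer).

```python
import numpy as np

def attention(r, r_bar, T):
    """Eqs. (3)-(5): attend only to readings nearer than T; far readings are
    replaced by the mean. If nothing is within T, attend to everything."""
    a = (r < T).astype(float)
    if not a.any():
        a = np.ones_like(a)              # default mode: all components on
    # Smooth with an 11-sample flat window and round (Section 2.2).
    a = np.rint(np.convolve(a, np.ones(11) / 11.0, mode='same')).astype(int)
    z = np.where(a == 1, r, r_bar)       # attentional executor g, Eq. (3)
    return z, a
```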

In Fig. 2, the readings in the left part of diagrams (b2) and (c2) are smaller than T. Thus, only the left part passes through the attentional mechanism without change, whereas the other parts are suppressed by being set to the mean, as shown in (b3) and (c3) of Fig. 2. This is needed for the robot to pass through tight areas, where a small change in the width of a gap determines whether the robot can pass. The attentional mechanism enables the robot to focus on critical areas (i.e., parts with close range) and, thus, the learned behaviors sensitively depend on the attended part of the range map. All range readings are attended when there is no nearby object, as shown by (a2) and (a3) of Fig. 2.

In Fig. 1, the learner IHDR is a hierarchically organized high-dimensional regression algorithm. In order to develop stable collision-avoidance behaviors, the robot needs sufficient training samples. Here, it is shown that the attentional mechanism greatly reduces the number of necessary training samples when there are objects close to the robot. The following definition allows for a quantitative analysis of the proposed attentional mechanism.

Definition 3. Consider a scanner operating in an environment that can be approximated by piecewise planar 3-D surfaces (for simplicity of analysis only). Each range map r can be approximated by a polygon with h segments (as in Fig. 4(a)). Write P = (p_1, p_2, ..., p_h) for the map, where the ith end point p_i is denoted by its polar coordinates (r_i, α_i). Without loss of generality, the angle coordinates are sorted: α_1 < α_2 < ... < α_h. P' = (p'_1, p'_2, ..., p'_h) is the post-attentional map, where p'_i = (z_i, α_i) and the range z_i has been defined earlier.

Remark. The larger h is, the closer the approximation of r, in general. In the particular case h = n, the polygon representation becomes a regular grid.

The post-attentional approximation P' can be written as a function of P, i.e.,

P' = g^*(P).   (6)

The attentional mechanism, defined in Eq. (6), is not a one-to-one mapping, as shown in Fig. 4. The post-attentional map P' is the representative of a set of pre-attentional maps besides P if condition C: ∃ p_j, p_j ∈ P ∧ r_j < T is satisfied. This set is denoted by R(P'), i.e., R(P') ≡ {P | g^*(P) = P'}. The following theorem gives a lower bound on the average size of R(P') when there are objects within distance T of the robot.

Theorem 1. Let Δ and r_m denote, respectively, the range resolution and the maximum distance of each radial line, and suppose the ith end point p_i's radial length r_i is a random variable with a uniform distribution over the sample space {0, Δ, 2Δ, ..., r_m}. Then the average size of the set R(P') conditioned on C satisfies

E_{P'}\{|R(P')| \mid C\} > \frac{(1 - p)\, h\, q^{h-1}}{1 - p^h},   (7)

where q = (r_m − T)/Δ, p = (r_m − T)/r_m, and E_{P'}\{\cdot \mid C\} denotes the expectation conditioned on C.

The proof of Theorem 1 is relegated to Appendix A. Table 1 shows the lower bound of the reduction due to attention at two typical parameter settings. It can be observed that the reduction is large. Of course, the size of the remaining space to learn is also large; the ratio of the space removed to the original space is roughly p.


Table 1. The lower bound of the reduction due to attention at several parameter settings.

Parameters                                        Reduced size
r_m = 50 m, Δ = 0.2 m, T = 2 m, h = 10            9.51 × 10^20
r_m = 10 m, Δ = 0.2 m, T = 2 m, h = 15            7.52 × 10^22
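As a quick numerical check, one can evaluate the right-hand side of Eq. (7) directly, as in the sketch below for the two settings of Table 1; since the exact values depend on rounding conventions for q and p, treat the output as an order-of-magnitude illustration rather than a reproduction of the table.

```python
def attention_bound(r_m, delta, T, h):
    """Right-hand side of Eq. (7): a lower bound on the expected number of
    pre-attentional maps represented by one post-attentional map."""
    q = (r_m - T) / delta    # number of discrete range values in [T, r_m]
    p = (r_m - T) / r_m      # fraction of the range interval beyond T
    return (1 - p) * h * q ** (h - 1) / (1 - p ** h)

print(attention_bound(50.0, 0.2, 2.0, 10))   # first setting of Table 1
print(attention_bound(10.0, 0.2, 2.0, 15))   # second setting of Table 1
```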

Fig. 4. (a) The pre-attentional map r is approximated by a polygon P, specified by h end points p_1, p_2, ..., p_h; the ith end point is denoted by (r_i, α_i). (b) The post-attentional approximation P' after the attentional function g^*: end points whose distances are larger than T are set to \bar{r}. The half-circle with symbol T shows the area the robot pays attention to. Numerous pre-attentional approximations map to a single post-attentional approximation P', since the points outside the half-circle in (a) are free to change position without affecting the shape of (b).

2.3 IHDR: Memory associative mapping engine

In the context of the appearance-based approach, the mapping (e.g., f in Eq. (2)) from high-dimensional sensory inputs to actions remains a nontrivial problem in machine learning, particularly in incremental and real-time formulations. Surveying the literature on function approximation with high-dimensional input data, one can identify two classes of approaches: (1) approaches that fit global models, typically by approximating a predefined parametric model using a pre-collected training data set, and (2) approaches that fit local models, usually by using temporally and spatially localized simple (therefore computationally efficient) models and growing the complexity automatically (e.g., the number of local models and the hierarchical structure of local models) to account for the nonlinearity and the complexity of the problem.

The function-approximation literature concentrates primarily on methods that fit global models. For example, the Multi-Layer Perceptron (MLP) [3], Radial Basis Function (RBF) networks [7], and Support Vector Machine based kernel regression methods (SVMKR) [22] employ global optimization criteria. In spite of their theoretical grounding, they are not suitable for online real-time learning in high-dimensional spaces. First, they require a priori task-specific knowledge to select the right topological structure and parameters, and their convergence properties are sensitive to initialization biases. Second, some of these methods are designed primarily for batch learning and are not easy to adapt to incrementally arriving data. Third, their fixed network topologies eliminate the possibility of learning increasingly complicated scenes.


Fig. 5. Y-clusters in space Y and the corresponding X-clusters in space X, shown on the plane of the discriminating feature subspace. Each sample is indicated by a number denoting its order of arrival. The first- and second-order statistics are updated for each cluster: the first-order statistic gives the position of the cluster, while the second-order statistics give the size, shape, and orientation of each cluster.

In contrast to the global learning methods described above, local model learning approaches are better suited for incremental and real-time learning, especially for a robot in an uncertain scene. Since there is limited knowledge about the scene, the robot itself needs to develop the representation of the scene in a generative, data-driven fashion. Typically, local analysis (e.g., IHDR [14] and LWPR [26]) generates local models that sample the high-dimensional space X × Y sparsely, based on the presence of data points, in a vector-quantization manner.

Two major differences exist between IHDR and LWPR. First, IHDR organizes its local models hierarchically, as shown in Fig. 6, while LWPR is a flat model. IHDR's tree structure recursively excludes many far-away local models from consideration (e.g., an input face does not search among non-faces); thus, the time to retrieve and update the tree for each newly arrived data point x is O(log n), where n is the size of the tree, i.e., the number of local models. This extremely low time complexity is essential for real-time online learning with a very large memory. Second, IHDR automatically derives discriminating feature subspaces in a coarse-to-fine manner from the input space X in order to generate a decision-tree architecture for realizing self-organization.

The full formulation of IHDR is not within the scope of this paper (see [14] for a complete presentation). The version used in this paper is outlined in Procedures 1, 2 and 3.

Two kinds of nodes exist in IHDR: internal nodes (e.g., the root node) and leaf nodes. Each internal node has q children, and each child is associated with a discriminating function:

l_i(x) = \frac{1}{2}(x - c_i)^T W_i^{-1} (x - c_i) + \frac{1}{2} \ln(|W_i|),   (8)

where W_i and c_i denote the distance metric matrix and the x-center of child node i, respectively, for i = 1, 2, ..., q. Meanwhile, a leaf node, say c, only keeps a set of prototypes P_c = {(x_1, y_1), (x_2, y_2), ..., (x_{n_c}, y_{n_c})}. The decision boundaries of internal nodes are fixed, and a leaf node may develop into an internal node by spawning (see Procedure 1, step 8) once enough prototypes are received.
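For illustration, Eq. (8) can be evaluated as in the sketch below; in the actual algorithm the input would first be projected into the node's MDF subspace and W_i would be maintained incrementally (Procedure 1, step 7), so this is only the bare scoring rule.

```python
import numpy as np

def discriminating_score(x, c_i, W_i):
    """Eq. (8): per-child score; the child with the smallest l_i(x) is chosen.
    Uses solve() and slogdet() instead of an explicit inverse for stability."""
    d = x - c_i
    _, logdet = np.linalg.slogdet(W_i)   # ln |W_i|
    return 0.5 * d @ np.linalg.solve(W_i, d) + 0.5 * logdet
```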

Fig. 6. The autonomously developed IHDR tree. Each node corresponds to a local model and covers a certain region of the input space; a higher-level node covers a larger region, which it may partition into several smaller regions. Each node has its own Most Discriminating Feature (MDF) subspace D, which decides its child models' activation levels.

The need for learning the matrices W_i, i = 1, 2, ..., q, in Eq. (8) and inverting them makes it impossible to define W_i (for i = 1, 2, ..., q) directly on the high-dimensional space X. Given the empirical observation that the true intrinsic dimensionality of high-dimensional data is often very low [20], it is possible to develop the most discriminating feature (MDF) subspace D to avoid degeneracy and other numerical problems caused by redundancy and irrelevant dimensions of the input data (see Figs. 5 and 6).

Procedure 1 (Add pattern). Given a labeled sample (x, y), update the IHDR tree T.

1: Find the best-matched leaf node c by calling Procedure 3.
2: Add the training sample (x, y) to c's prototype set P_c.
3: Find the cluster closest to the x and y vectors by computing

m = \arg\min_{1 \le i \le q} \left( w_x \frac{\|x - c_i\|}{\sigma_x} + w_y \frac{\|y - \bar{y}_i\|}{\sigma_y} \right),   (9)

where w_x and w_y are two positive weights that sum to one (w_x + w_y = 1); σ_x and σ_y denote incrementally estimated average lengths of the x and y vectors, respectively; and c_i and \bar{y}_i denote, respectively, the x-center and the y-center of the ith cluster of node c.
4: Let μ(n) denote the amnesic function that controls the updating rate, depending on n:

\mu(n) = \begin{cases} 0, & n \le n_1, \\ b(n - n_1)/(n_2 - n_1), & n_1 < n \le n_2, \\ b + (n - n_2)/d, & n_2 < n, \end{cases}   (10)

where n denotes the number of visits to cluster m (see Eq. (9)) in node c, and b, n_1, n_2 and d are parameters. Update the x-center c_m and the y-center \bar{y}_m of cluster m in node c, leaving the other clusters unchanged (a sketch of this amnesic update follows the procedure):

c_m(n) = \frac{n - 1 - \mu(n)}{n} c_m(n-1) + \frac{1 + \mu(n)}{n} x,
\bar{y}_m(n) = \frac{n - 1 - \mu(n)}{n} \bar{y}_m(n-1) + \frac{1 + \mu(n)}{n} y,   (11)

where c_m(n−1) and \bar{y}_m(n−1) are the old estimates (before the update) of the x-center and y-center of the mth cluster in node c, and c_m(n) and \bar{y}_m(n) are the new estimates. Note that the weights w_x and w_y control the influence of the x and y vectors on the clustering algorithm; for example, when w_x = 0, Eq. (11) is equivalent to the y-label clustering algorithm used in [14].
5: Compute the sample mean, say \bar{c}, of all q clusters in node c. Let the current x-centers of the q clusters be {c_i | c_i ∈ X, i = 1, 2, ..., q} and the number of samples in cluster i be n_i. Then

\bar{c} = \sum_{i=1}^{q} n_i c_i \bigg/ \sum_{i=1}^{q} n_i.

6: Call the Gram-Schmidt Orthogonalization (GSO) procedure (see Appendix A of [14]) using {c_i − \bar{c} | i = 1, 2, ..., q} as input. Then form the projection matrix M of the subspace D as

M = [b_1, b_2, \ldots, b_{q-1}],   (12)

where b_1, b_2, ..., b_{q−1} denote the orthonormal basis vectors derived by the GSO procedure.
7: Update W_m, the distance matrix of the mth cluster in node c, as

b_e = \min\{n - 1, n_s\}, \quad b_m = \min\{\max\{2(n - q)/q,\ 0\}, n_s\}, \quad b_g = 2(n - q)/q^2,
b = b_e + b_m + b_g, \quad w_e = b_e/b, \quad w_m = b_m/b, \quad w_g = b_g/b,
A = M'^T M,
B = M^T (x - c_m)(x - c_m)^T M,
\Gamma_m(n) = \frac{n - 1 - \mu(n)}{n} A^T \Gamma_m(n-1) A + \frac{1 + \mu(n)}{n} B,
S = \sum_{i=1}^{q} n_i \Gamma_i(n) \bigg/ \sum_{i=1}^{q} n_i,
W_m = w_e \rho^2 I + w_m S + w_g \Gamma_m(n),

where μ(n) is the amnesic function defined in Eq. (10), ρ and n_s are empirically defined parameters, M' is the old estimate of the projection matrix (before the update using Eq. (12)), and Γ_m(n−1) and Γ_m(n) denote, respectively, the old and new estimates of the covariance matrix of cluster m.
8: if the size of P_c is larger than n_f, a predefined parameter, then
9: Mark c as an internal node. Create q nodes as c's children, and reassign each prototype x_i in P_c to child k based on the discriminating functions defined in Eq. (8), i.e., compute k = \arg\min_{1 \le i \le q} l_i(x_i).
10: end if
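To illustrate step 4, the following is a minimal sketch of the amnesic average of Eqs. (10)-(11); the parameter values b, n_1, n_2 and d are placeholders, not those used on Dav.

```python
def amnesic_mu(n, b=2.0, n1=20, n2=200, d=2000.0):
    """Eq. (10): the amnesic function controlling the update rate."""
    if n <= n1:
        return 0.0
    if n <= n2:
        return b * (n - n1) / (n2 - n1)
    return b + (n - n2) / d

def amnesic_update(old_mean, sample, n):
    """Eq. (11): incremental mean; mu(n) > 0 weights recent samples slightly
    more than a plain running average would."""
    mu = amnesic_mu(n)
    return (n - 1 - mu) / n * old_mean + (1 + mu) / n * sample
```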

Procedure 2 (Retrieval). Given an IHDR tree T and an input vector x, return the corresponding estimated output y.

1: Obtain the best-matched leaf node c by calling Procedure 3.
2: Compute y using the nearest-neighbor decision rule over the set P_c = {(x_1, y_1), (x_2, y_2), ..., (x_{n_c}, y_{n_c})}:

y = y_m,   (13)

where m = \arg\min_{1 \le i \le n_c} \|x_i - x\|.
3: Return y.
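Step 2 of Procedure 2 is a plain nearest-neighbor rule over the leaf's stored prototypes; a sketch:

```python
import numpy as np

def leaf_retrieve(x, prototypes):
    """Eq. (13): return the y of the prototype whose x is closest to the
    query. `prototypes` is the leaf's set P_c as a list of (x_i, y_i) pairs."""
    xs = np.stack([xi for xi, _ in prototypes])
    m = int(np.argmin(np.linalg.norm(xs - x, axis=1)))
    return prototypes[m][1]
```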

Procedure 3 (Select leaf node). Given an IHDR tree T and a sample (x, y), where y may or may not be given, output the best-matched leaf node c.

1: c ← the root node of T.
2: while c is an internal node do
3:   c ← the mth child of node c, where m = \arg\min_{1 \le i \le q} l_i(x) and the l_i(x), i = 1, 2, ..., q, are defined in Eq. (8).
4: end while
5: Return node c.

3 The robotic system and online training procedure

The tests were performed on a humanoid robot, called Dav (see Fig. 7), built in the Embodied Intelligence Laboratory at Michigan State University. For details about the Dav robot, please refer to [12] and [27].

Fig. 8 shows the block diagram of the mobile drive-base. Dav is equipped with a desktop-class computer with four Pentium III Xeon processors and a large memory. The vehicle's low-level servo control consists of two Actuator Control Units (ACUs) inter-networked by a CAN bus. A CAN-to-Ethernet bridge was designed and implemented using a PC/104 embedded computer system. This architecture provides a client/server model for realizing the robot interface: acting as a client, the high-level control program sends UDP packets to the server, which executes low-level device control.

Mounted on the front of Dav, the laser scanner (SICK PLS) tilts down 3.8° to catch possible low objects. The local vehicle coordinate system and control variables are depicted in Fig. 9. During training, the control variables (θ, d) are given interactively by the position of the mouse pointer P through a GUI. Once the trainer clicks the mouse button, the following equations are used to compute the imposed (taught) action y = (v, ω):


Fig. 7. Dav: a mobile humanoid robot.

Fig. 8. A block diagram of the control system of the drive-base. It is a two-level architecture: the low-level servo control is handled by ACUs while the high-level learning algorithms are hosted in the main computer. A client/server model is used to implement the robot interface.


Fig. 9. The local coordinate system and control variables. Once a mouse button is clicked, the position of the mouse pointer (θ, d) gives the imposed action, where θ denotes the desired steering angle and d controls the speed of the base.

\omega = -K_p(\pi/2 - \theta), \qquad v = \begin{cases} K_v d, & P \in \text{Area I}, \\ 0, & P \in \text{Area II}, \end{cases}   (14)

where K_p and K_v are two predetermined positive constants. Area II corresponds to rotation about the center of the robot with v = 0.
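A sketch of the click-to-action mapping of Eq. (14); the gains K_p and K_v and the area test are placeholders for whatever the GUI actually used.

```python
import math

def imposed_action(theta, d, in_area_one, Kp=0.5, Kv=0.3):
    """Eq. (14): convert a mouse click (theta, d) into the taught action
    (v, omega). A click in Area II commands pure rotation (v = 0)."""
    omega = -Kp * (math.pi / 2.0 - theta)
    v = Kv * d if in_area_one else 0.0
    return v, omega
```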

Dav's drive-base has four wheels, each driven by two DC motors. Let q denote the vector of velocity readings from the encoders of the four wheels. Suppose v_x and v_y denote the base's translational velocities and ω denotes the angular velocity of the base. Assuming the wheels do not slip, the kinematics of the base is

q = B (v_x, v_y, \omega)^T,   (15)

where B, defined in [27], is an 8 × 3 matrix. The base velocities (v_x, v_y, ω)^T are not directly available for learning, but they can be estimated from the wheels' speed vector q in the least-square-error sense:

v = (v_x, v_y, \omega)^T = (B^T B)^{-1} B^T q.   (16)

In this paper, two velocities, (v_y, ω), are used as the control vector y. Thus, the IHDR tree learns the following mapping incrementally:

y(t + 1) = f(z(t), v(t)).

During interactive learning, y is given. Whenever y is not given, IHDR approximates f while the robot performs (testing). At the low level, the controller servos q based on y.
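Eq. (16) is an ordinary least-squares fit, which can be computed with a pseudo-inverse or, equivalently, a least-squares solver; the 8 × 3 matrix B below stands in for the kinematic matrix defined in [27].

```python
import numpy as np

def base_velocity(q, B):
    """Eq. (16): least-square-error estimate of (vx, vy, omega) from the
    8-dimensional wheel-encoder speed vector q of Eq. (15)."""
    v, *_ = np.linalg.lstsq(B, q, rcond=None)  # same as (B^T B)^-1 B^T q
    return v
```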

3.1 Online incremental training

The learning procedure is outlined as follows:


1. At time frame t, grab a new laser map r(t) and the wheels' velocity q(t). Use Eq. (16) to calculate the base's velocity v(t).
2. Compute a(t) from r(t) using Eq. (5), and apply the attention a(t) to obtain z(t) using Eq. (4). Merge z(t) and the current vehicle velocities v(t) into a single vector x(t) using Eq. (1).
3. If the mouse button is clicked, use Eq. (14) to calculate the imposed action y(t), then go to step 4. Otherwise, go to step 6.
4. Use the input-output pair (x(t), y(t)) to train the IHDR tree by calling Procedure 1 as one incremental step.
5. Send the action y(t) to the controller, which gives q(t + 1). Increment t by 1 and go to step 1.
6. Query the IHDR tree by calling Procedure 2 to get the primed action y(t + 1). Send y(t + 1) to the controller, which gives q(t + 1). Increment t by 1 and go to step 1.

Online incremental training does not explicitly have separate training and testing phases; the learning process continues whenever actions are imposed. One frame of this loop is sketched below.
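The per-frame logic of the list above amounts to the loop sketched here; `sensors`, `gui`, `controller`, and `ihdr` are hypothetical interfaces standing in for the robot's actual ones, and the helper functions are the sketches given earlier.

```python
def training_step(ihdr, sensors, gui, controller,
                  r_bar, v_bar, w_r, w_v, B, T):
    """One frame of the Section 3.1 procedure: sense, attend, train or act."""
    r = sensors.read_laser()                          # step 1: laser map r(t)
    v = base_velocity(sensors.read_encoders(), B)     # Eq. (16)
    z, _ = attention(r, r_bar, T)                     # step 2: Eqs. (3)-(5)
    x = observation_vector(z, v, r_bar, v_bar, w_r, w_v)  # Eq. (1)
    click = gui.poll_click()          # None, or (theta, d, in_area_one)
    if click is not None:             # steps 3-5: learn the imposed action
        y = imposed_action(*click)    # Eq. (14)
        ihdr.add_pattern(x, y)        # Procedure 1
    else:                             # step 6: act autonomously
        y = ihdr.retrieve(x)          # Procedure 2
    controller.send(y)                # low level servos q from y
```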

4 Experimental results

4.1 Simulation experiments

To show the importance of the attentional mechanism, two IHDR trees were trained simultaneously: one used attention and the other used the raw range image directly. The simulated robot was interactively trained in the 16 scenarios shown in Fig. 10, during which it acquired 1157 samples.

To test the generalization capability of the learning system, a leave-one-out test was performed for both IHDR trees. The 1157 training samples were divided into 10 bins; 9 bins were chosen for training and one bin was left for testing. This procedure was repeated ten times, once for each choice of test bin. In the jth test (j = 1, 2, ..., 10), let y_{ij} and \hat{y}_{ij} denote, respectively, the true and estimated outputs, and let e_{ij} = |y_{ij} - \hat{y}_{ij}| denote the error of the ith testing sample. The mean error \bar{e} and standard deviation σ_e of the error are defined as

\bar{e} = \frac{1}{10m} \sum_{j=1}^{10} \sum_{i=1}^{m} e_{ij}, \qquad
\sigma_e = \sqrt{ \frac{1}{10m} \sum_{j=1}^{10} \sum_{i=1}^{m} (e_{ij} - \bar{e})^2 },

where m denotes the number of testing samples per bin. The results of the leave-one-out test for the two IHDR trees are shown in Fig. 11. Comparing the results, both the mean error and its deviation decreased by about 50 percent when attention was introduced, which indicates that the generalization capability was improved.
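In code, the error statistics above reduce to a mean and a standard deviation over all folds; a sketch, assuming `errors` holds the e_ij values:

```python
import numpy as np

def loo_stats(errors):
    """Mean error and its spread for the leave-one-out test (Section 4.1).
    `errors` is a 10 x m array of e_ij for fold j and test sample i."""
    e = np.asarray(errors, dtype=float)
    return e.mean(), e.std()
```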

Second, a test was performed on the two IHDR trees in an environment different from the training scenarios. For each of 100 trials, a start position in free space was chosen for the simulated robot. Table 2 shows the rate of success for the two trained IHDR trees.

Fig. 10. The 16 scenarios used to train the simulated robot. The small solid circles denote the robot, and solid dark lines represent the walls. The small trailing circles record the online training trajectories. During this training process, 1157 samples were acquired to build the two IHDR trees.

Table 2. The results of tests with random starting positions.

                   With attention   Without attention
Rate of success    0.91             0.63

A run was considered successful if the robot could run continually for three minutes without hitting an obstacle. As seen in Table 2, the rate of success increased greatly when attention was introduced.


Fig. 11. The results of the leave-one-out test. (a) The mean and variance of the error in estimating the robot's orientation; the y-axis unit is degrees. (b) The mean and variance of the error in the robot's velocity; the y-axis unit is the ratio to the robot's maximum velocity.

Fig. 12. A 5-minute run by the simulated robot with the attentional module. The solid dark lines denote walls and the dotted lines show the trajectory. Obstacles of irregular shapes are scattered about the corridor. The circle around the robot denotes the threshold T used by the attentional mechanism.

In Fig. 12, with attention, the simulated robot successfully performed a continuous 5-minute run. The robot's trajectory is shown by small trailing circles. Recall that no environmental map was stored across the laser maps and the robot had no global position sensors. Fig. 13 shows that, without attention selection, the robot failed several times in a half-minute test run.

4.2 Experiment on the Dav robot

The purpose of this experiment was to show the capability of generalization. The robot was trained incrementally online using different obstacle configurations (e.g., trash cans, walls, and human legs). In total, 4,655 samples were used for training. More training samples can be added whenever the operator is not satisfied with the robot's performance in a particular setting.

Fig. 13. The result of the test without attention selection. Two collisions occurred, indicated by arrows.

Fig. 14. Dav moved autonomously in a corridor crowded with people.

A continuous 15-minute test run was performed by Dav in the corridor of the Engineering Building at Michigan State University. The corridor was crowded with high school students, as shown in Fig. 14. Dav successfully navigated at roughly 20-35 cm/s in this dynamically changing environment without colliding with the moving students. It is worth noting that the testing scenarios were not the same as the training scenarios.

5 Discussion and Conclusion

The system may fail when obstacles are outside the field-of-view of the laser scanner. Since the laser scanner has to be installed at the front, nearby objects on the side are not "visible." This means that the trainer needs to "look ahead" when providing desired control signals so that objects are not too close to the "blind spots." In addition, the attentional mechanism assumes that far-away objects are not related to the desired control signal. This does not work well for long-term planning; e.g., the robot may be trapped in a U-shaped setting. This problem can be solved by integrating this local collision avoidance with a path planner, but the latter is beyond the scope of this paper.


This paper described a range-based obstacle-avoidance learning system implemented on a mobile humanoid robot. The attention selection mechanism reduces the importance of far-away objects when nearby objects are present. The power of the learning-based method is that it enables the robot to learn a very complex function between the input range map and the desired behavior; such a function is typically so complex that it is not possible to write a program that simulates it accurately. Indeed, the complex range-perception-based human action learned by y = f(z, v) is too complex to capture in a program that does not include learning. The success of the learning for the high-dimensional input (z, v) is mainly due to the power of IHDR, and the real-time speed is due to the logarithmic time complexity of IHDR. The optimal subspace-based Bayesian generalization enables quasi-optimal interpolation of behaviors from matched learned samples. The online incremental learning allows the trainer to dynamically select scenarios according to the robot's weaknesses (i.e., problem areas) in performance. It is true that training requires extra effort, but it enables the navigation behaviors to adjust to a wide variety of changes in the range map.

Appendix A

Proof of Theorem 1. Consider the cases where k (1 ≤ k ≤ h) end points are located within the half-circle of radius T (see Fig. 4). The number of possible configurations of the h − k end points outside the circle, denoted by s_k, is

s_k = q^{h-k},   (17)

because the radial distances of the h − k end points are free to take values from the interval [T, r_m], which contains q discrete values. By definition,

E_{P'}\{|R(P')| \mid C\} = \sum_{k=1}^{h} s_k P(k \mid C),

where P(k | C) denotes the conditional probability that k end points are located within the half-circle of radius T. We have

P(k \mid C) = C_h^k (1 - p)^k / (1 - p^h).

Therefore,

E_{P'}\{|R(P')| \mid C\} = \sum_{k=1}^{h} q^{h-k} \frac{C_h^k (1 - p)^k}{1 - p^h}
= \frac{\sum_{k=0}^{h} C_h^k q^{h-k} (1 - p)^k - q^h}{1 - p^h}
= \frac{(q + (1 - p))^h - q^h}{1 - p^h} > \frac{(1 - p)\, h\, q^{h-1}}{1 - p^h}.

In the last step, the inequality (x + \delta)^n - x^n > n x^{n-1} \delta for 0 < \delta \ll x is used.


Acknowledgements

This work was supported in part by an MSU Strategic Partnership Grant, the National Science Foundation under grant No. IIS 9815191, the DARPA ETO under contract No. DAAN02-98-C-4025, and the DARPA ITO under grant No. DABT63-99-1-0014.

References

1. L. Acosta, G. N. Marichal, L. Moreno, J. A. Mendez, and J. J. Rodrigo. Obstacle avoidance using the human operator experience for a mobile robot. Journal of Intelligent and Robotic Systems, 27:305-319, 2000.
2. K. O. Arras, J. Persson, N. Tomatis, and R. Siegwart. Real-time obstacle avoidance for polygonal robots with a reduced dynamic window. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 3050-3055, Washington, DC, May 2002.
3. C. M. Bishop. Neural Networks for Pattern Recognition. Clarendon Press, Oxford, 1995.
4. B. L. Boada, D. Blanco, C. Castejon, and L. E. Moreno. A genetic solution for the SLAM problem. In Proc. of the 11th International Conference on Advanced Robotics (ICAR'03), 2003.
5. J. Borenstein and Y. Koren. The vector field histogram - fast obstacle avoidance for mobile robots. IEEE Transactions on Robotics and Automation, 7(3):341-346, 1991.
6. C. Breazeal, A. Edsinger, P. Fitzpatrick, B. Scassellati, and P. Varchavskaia. Social constraints on animate vision. IEEE Intelligent Systems and Their Applications, 15(4):32-37, 2000.
7. S. Chen, C. F. N. Cowan, and P. M. Grant. Orthogonal least squares learning for radial basis function networks. IEEE Transactions on Neural Networks, 2(2):321-355, 1991.
8. S. Chen and J. Weng. State-based SHOSLIF for indoor visual navigation. IEEE Trans. Neural Networks, 11(6):1300-1314, November 2000.
9. J. H. Chuang. Potential-based modeling of three-dimensional workspace for obstacle avoidance. IEEE Transactions on Robotics and Automation, 14(5):778-785, 1998.
10. D. Fox, W. Burgard, and S. Thrun. The dynamic window approach to collision avoidance. IEEE Robotics and Automation Magazine, 4(1):23-33, 1997.
11. D. Fox, W. Burgard, S. Thrun, and A. B. Cremers. A hybrid collision avoidance method for mobile robots. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 1238-1243, 1998.
12. J. D. Han, S. Q. Zeng, K. Y. Tham, M. Badgero, and J. Y. Weng. Dav: A humanoid robot platform for autonomous mental development. In Proc. IEEE 2nd International Conference on Development and Learning (ICDL 2002), pages 73-81, MIT, Cambridge, MA, June 12-15, 2002.
13. W. Hwang and J. Weng. Vision-guided robot manipulator control as learning and recall using SHOSLIF. In Proc. IEEE Int'l Conf. on Robotics and Automation, pages 2862-2867, Albuquerque, NM, April 20-25, 1997.
14. W. S. Hwang and J. Weng. Hierarchical discriminant regression. IEEE Trans. Pattern Analysis and Machine Intelligence, 22(11):1277-1293, November 2000.
15. Y. K. Hwang and N. Ahuja. A potential field approach to path planning. IEEE Transactions on Robotics and Automation, 8(1):23-32, 1992.
16. T. Lee and C. Wu. Fuzzy motion planning of mobile robots in unknown environments. Journal of Intelligent and Robotic Systems, 37:177-191, 2003.
17. Chin-Teng Lin and C. S. George Lee. Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent Systems. Prentice-Hall PTR, 1996.
18. K. V. Mardia, J. T. Kent, and J. M. Bibby. Multivariate Analysis. Academic Press, London, New York, 1979.
19. D. A. Pomerleau. Efficient training of artificial neural networks for autonomous navigation. Neural Computation, 3(1):88-97, 1991.
20. S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323-2326, December 22, 2000.
21. R. Simmons. The curvature-velocity method for local obstacle avoidance. In IEEE International Conference on Robotics and Automation (ICRA'96), pages 2275-2282, Minneapolis, April 1996.
22. A. J. Smola and B. Scholkopf. A tutorial on support vector regression. NeuroCOLT Technical Report NC-TR-98-030, Royal Holloway College, University of London, UK, 1998.
23. S. Thrun, M. Bennewitz, W. Burgard, A. B. Cremers, F. Dellaert, D. Fox, D. Haehnel, C. Rosenberg, N. Roy, J. Schulte, and D. Schulz. Minerva: A second generation mobile tour-guide robot. In Proc. of the IEEE International Conference on Robotics and Automation (ICRA'99), 1999.
24. N. Tomatis, I. Nourbakhsh, and R. Siegwart. Hybrid simultaneous localization and map building: a natural integration of topological and metric. Robotics and Autonomous Systems, 44:3-14, 2003.
25. A. C. Victorino, P. Rives, and J. Borrelly. Safe navigation for indoor mobile robots. Part I: A sensor-based navigation framework. The International Journal of Robotics Research, 22(12):1005-1118, 2003.
26. S. Vijayakumar, A. D'Souza, T. Shibata, J. Conradt, and S. Schaal. Statistical learning for humanoid robots. Autonomous Robots, 12(1):55-69, 2002.
27. S. Zeng, D. M. Cherba, and J. Weng. Dav developmental humanoid: The control architecture and body. In IEEE/ASME International Conference on Advanced Intelligent Mechatronics, pages 974-980, Kobe, Japan, July 2003.
28. S. Q. Zeng and Y. B. He. Learning and tuning fuzzy logic controllers through genetic algorithm. In IEEE International Conference on Neural Networks, volume 3, pages 1632-1637, Orlando, Florida, June 1994.