
Whole-Body Balancing Walk Controller for Position Controlled Humanoid Robots

Seung-Joon Yi

GRASP Laboratory, University of Pennsylvania,

Philadelphia PA 19104, USA

[email protected]

Byoung-Tak Zhang

BI Laboratory, Seoul National University, Seoul, Korea

[email protected]

Dennis Hong

RoMeLa Laboratory, University of California, Los Angeles CA 90095, USA

[email protected]

Daniel D. Lee

GRASP Laboratory, University of Pennsylvania,

Philadelphia PA 19104, USA

[email protected]

Received 25 May 2015

Accepted 13 January 2016

Published 17 March 2016

Bipedal humanoid robots are intrinsically unstable against unforeseen perturbations. Conventional zero moment point (ZMP)-based locomotion algorithms can reject perturbations by incorporating sensory feedback, but they are less effective than the dynamic full body behaviors humans exhibit when pushed. Recently, a number of biomechanically motivated push recovery behaviors have been proposed that can handle larger perturbations. However, these methods are based upon simplified and transparent dynamics of the robot, which makes them suboptimal to implement on common humanoid robots with local position-based controllers. To address this issue, we propose a hierarchical control architecture. Three low-level push recovery controllers that replicate human recovery behaviors are implemented for position controlled humanoid robots. These low-level controllers are integrated with a ZMP-based walk controller that is capable of generating reactive step motions. The high-level controller constructs empirical decision boundaries to choose the appropriate behavior based upon trajectory information gathered during experimental trials. Our approach is evaluated in physically realistic simulations and on a commercially available small humanoid robot.

Keywords: Position controlled humanoid robot; biomechanically motivated push recovery;

low-dimensional policy; online learning.

International Journal of Humanoid Robotics

Vol. 13, No. 1 (2016) 1650011 (28 pages)

© World Scientific Publishing Company

DOI: 10.1142/S0219843616500110

Int. J. Human. Robot. 2016.13. Downloaded from www.worldscientific.com by SEOUL NATIONAL UNIVERSITY on 06/18/16. For personal use only.

http://dx.doi.org/10.1142/S0219843616500110

1. Introduction

Due to their small footprint and high center of mass (COM), bipedal humanoid robots are prone to losing balance because of uneven floors, robot modeling errors, or imprecise actuators. Thus, active stabilization of humanoid robots has been an important topic in robotics research. Biomechanical studies of human walking and balancing behavior have shown that humans use three basic balance control strategies, denoted the ankle, hip and step strategies, which are illustrated in Figs. 1(a)–1(c).1 The ankle strategy controls torque at the ankle joint, the hip strategy uses the angular acceleration of the torso and free limbs to apply a counteractive ground reaction force (GRF), and the step strategy changes the base of support to a new position. All three strategies seek to control the horizontal position of the system's COM by changing the horizontal component of the GRF.

The conventional approach to bipedal locomotion control is to use zero moment point (ZMP)-based control algorithms built upon the linear inverted pendulum model (LIPM).2 The reference ZMP trajectory is typically designed in advance according to footstep locations, and the torso and foot trajectories are then calculated from the reference ZMP using the LIPM.3 Stabilization is accomplished by measuring the state error and applying feedback control to track the reference ZMP, which updates the COM trajectory and generates an inertial force resulting in an effective control torque at the ankle joints, as shown in Fig. 1(d). Closed-loop ZMP tracking approaches are usually confined to the ankle strategy, as reactive stepping requires online modification of the ZMP trajectory. However, there has recently been some work on simultaneously generating COM and ZMP trajectories in real time to enable the step strategy.4–6

The main advantage of ZMP tracking-based approaches is that they can easily be integrated into existing walk controllers, and they have been successfully incorporated on many humanoid robot platforms. However, they usually require fast online computation, a precise dynamic model of the robot, and accurate estimation of the current dynamic state, which makes them harder to use on resource constrained robots with restricted actuation, sensing and processing capabilities. The ankle strategy alone has limited effectiveness against strong perturbations. The step strategy can be used after large perturbations, but it is not always physically feasible due to step timing or foot configuration. Figure 2 shows an example where the ZMP tracking-based ankle/step controller fails to stabilize the robot.

Fig. 1. A comparison of three biomechanically motivated push recovery approaches and the ZMP tracking approach. (a) Ankle strategy. (b) Hip strategy. (c) Step strategy. (d) ZMP tracking approach.

On the other hand, an active line of research has focused on the theoretical analysis of biomechanically motivated push recovery controllers using an abstract model of the robot. These models include an ankle control torque for the ankle strategy, a flywheel body and hip control torque for the hip strategy, and a secondary support point for the step strategy.7–9 Such approaches result in very simple analytical controllers that can reject stronger perturbations as they utilize angular momentum degrees of freedom. However, the biggest drawback of these approaches is that most of them assume simplified and transparent dynamics of the robot, which is often hard to realize as most of the humanoid robots currently available have highly distributed mass and local position-based controllers with high feedback gain.

Our aim is to get the best of both worlds by devising an integrated walk controller that can exhibit the full range of biomechanically inspired behaviors in response to external perturbations. We take a hybrid approach where walking is governed by a ZMP-based walk controller, and large perturbations trigger simple, biomechanically motivated push recovery controllers. First, we design a simple ZMP-based walk controller that simultaneously plans the ZMP and COM trajectories in real time for reactive stepping. To incorporate the biomechanically motivated push recovery controllers, we utilize a hierarchical architecture consisting of low-level controllers, each of which governs one biomechanically motivated push recovery behavior, and a high-level controller that switches between the low-level controllers based on the current state of the robot. Instead of relying upon the accuracy of the theoretical model parameters,10,11 we use empirical decision boundaries between the controllers that are learned from experience.

Fig. 2. Comparison of the ZMP tracking approach and the biomechanical push recovery approach under lateral pushes during walking. An impulsive lateral force is applied for 0.01 s to the COM of the robot at the middle of the single support phase. Note that the step strategy is not possible in this case due to kinematic constraints. (a) The ZMP tracking approach, 0.9 Ns of lateral push. (b) The ZMP tracking approach, 1.2 Ns of lateral push. (c) The ankle and hip strategies, 1.2 Ns of lateral push.

The main contribution of this work is twofold. From the theoretical point of view, we show that the physical humanoid robot has similarly shaped but quantitatively different stability regions from those derived by theoretical models of varying simplicity. In terms of implementation, we propose an integrated system that effectively combines three push recovery behaviors and a walk controller to enable a humanoid robot to perform push recovery behaviors while walking. We demonstrate how this controller is learned from experience and evaluate its performance on a small humanoid robot.

The remainder of the paper is organized as follows. Section 2 reviews three biomechanically motivated push recovery controllers and their implementations on position controlled humanoid robots. Section 3 explains the step-based omnidirectional walk controller, which can perform reactive stepping for push recovery control during walking. Section 4 shows how to learn the high-level controller from repeated trials in a simulated environment, and Sec. 5 shows the experimental results using the DARwIn-OP humanoid robot. Finally, we conclude with a discussion of outstanding issues and potential future directions arising from this work.

2. Biomechanically Motivated Push Recovery Controllers for Position Controlled Robots

Biomechanical studies show that humans display three distinctive motion patterns in response to sudden external perturbations, which we denote as the ankle, hip and step push recovery strategies.1 The ankle strategy applies control torque at the ankle joint, the hip strategy uses the angular acceleration of the torso and free limbs to apply a counteractive GRF, and finally the step strategy changes the base of support to a new position. For each push recovery strategy, we first review the basic push recovery controllers for the simplified model, and then explain how we implement the behaviors of such controllers on resource constrained humanoid robots which lack force/torque control and only provide position-based control with high proportional gain. Finally, we explain how we handle possible issues with those controllers when the robot is moving.

2.1. Ankle push recovery

The ankle strategy applies control torque on the ankle joints to keep the COM within the base of support. It is widely adopted in the form of closed-loop ZMP tracking, and this approach has been successfully implemented on a number of full-sized humanoid robots equipped with force/torque sensors at the ankles.3,12–16 Furthermore, it was recently shown that the approach is robust enough to make a full-sized humanoid robot walk on a public street with unknown surface inclinations and unevenness.17 It has also been widely implemented on small humanoid robots: two currently commercially available small humanoid robots, Nao (footnote a) and DARwIn-OP (footnote b), are provided with walk controllers that use the ankle strategy for stabilization.18,19 There have also been other closed-loop walk control implementations utilizing the ankle strategy on small humanoid robots.20,21

We first examine the abstract model in Fig. 3(a), where an ankle torque $\tau_{\mathrm{ankle}}$ is applied to a LIPM with mass $m$, COM height $z_0$ and COM horizontal position $x$ measured from the current support point. The resulting linearized dynamic model is

$$\ddot{x} = \omega^2 \left(x - \tau_{\mathrm{ankle}}/mg\right), \tag{1}$$

where $\omega = \sqrt{g/z_0}$ and $g$ is the gravitational acceleration. If we assume a reference trajectory $x_{\mathrm{ref}}$ which satisfies the LIPM without additional ankle torque,

$$\ddot{x}_{\mathrm{ref}} = \omega^2 x_{\mathrm{ref}}, \tag{2}$$

then the state error $x_{\mathrm{err}} = x - x_{\mathrm{ref}}$ follows the same dynamic model as (1):

$$\ddot{x}_{\mathrm{err}} = \omega^2 \left(x_{\mathrm{err}} - \tau_{\mathrm{ankle}}/mg\right), \tag{3}$$

which can be controlled by PD control on $x_{\mathrm{err}}$:

$$\tau_{\mathrm{ankle}} = K_p x_{\mathrm{err}} + K_d \dot{x}_{\mathrm{err}}, \tag{4}$$

where $K_p$ and $K_d$ are control gains. This requires torque control of the ankle actuators, but in practice it can be approximated for position controlled actuators with proportional control by directly setting the target angle of the ankle actuator

$$\Delta\theta_{\mathrm{ankle}} = K'_p x_{\mathrm{err}} + K'_d \dot{x}_{\mathrm{err}}, \tag{5}$$

where $\Delta\theta_{\mathrm{ankle}}$ is the target ankle angle bias.10,19 In addition to the ankle joints, we use the same control law to modulate the arm position to apply additional effective torque at the ankles in a similar way, unless overridden by the hip controller.

Fig. 3. The ankle strategy that applies control torque on ankle joints. (a) The abstract model for the ankle strategy. (b) The ankle strategy implemented on the DARwIn-OP humanoid robot.

a http://www.aldebaran-robotics.com/.
b http://www.robotis.com/xe/darwin en.

When the robot is walking, we only apply the ankle bias to the current support foot during the middle phase of single support. This prevents the ankle strategy from setting a nonzero ankle bias for the foot currently in the air, which can result in premature landing. We use a trapezoid function $f(\phi)$ to make a smooth transition at landing and takeoff:

$$\Delta\theta_{\mathrm{ankle}} = f(\phi_{\mathrm{single}})\left(K'_p x_{\mathrm{err}} + K'_d \dot{x}_{\mathrm{err}}\right), \tag{6}$$

where $0 \le \phi_{\mathrm{single}} < 1$ is the single support phase and $f(\phi)$ is the following function:

$$f(\phi) = \begin{cases} \phi/\phi_{\mathrm{lift}} & 0 \le \phi < \phi_{\mathrm{lift}}, \\ 1 & \phi_{\mathrm{lift}} \le \phi < \phi_{\mathrm{land}}, \\ (1-\phi)/(1-\phi_{\mathrm{land}}) & \phi_{\mathrm{land}} \le \phi < 1, \end{cases} \tag{7}$$

where $\phi_{\mathrm{lift}}$ and $\phi_{\mathrm{land}}$ are timing parameters. Figure 3(b) shows the ankle strategy controller implemented on the DARwIn-OP small humanoid robot.
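As an illustration only, the gated ankle bias of Eqs. (6) and (7) might be sketched in Python as follows. The function names, the argument names and the default timing values 0.2 and 0.8 are assumptions made for this sketch; the gains are left as free parameters because the text gives no values for them.

```python
def trapezoid(phase, phase_lift, phase_land):
    """Gating function f(phi) of Eq. (7): ramp up, hold at 1, ramp down.

    Assumes 0 < phase_lift <= phase_land < 1 and 0 <= phase < 1.
    """
    if not 0.0 <= phase < 1.0:
        raise ValueError("phase must lie in [0, 1)")
    if phase < phase_lift:
        return phase / phase_lift
    if phase < phase_land:
        return 1.0
    return (1.0 - phase) / (1.0 - phase_land)


def ankle_bias(x_err, xdot_err, phase_single, kp, kd,
               phase_lift=0.2, phase_land=0.8):
    """Target ankle angle bias of Eq. (6) for the current support foot.

    kp and kd stand for the position-control gains K'_p and K'_d.
    The default timing parameters are illustrative assumptions.
    """
    gate = trapezoid(phase_single, phase_lift, phase_land)
    return gate * (kp * x_err + kd * xdot_err)
```

With this gating the bias is zero at the start of single support, reaches the full PD value in the middle of the phase, and decays back towards zero before landing.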

2.2. Hip push recovery

The hip strategy uses angular acceleration of the torso and limbs to generate a backward GRF to pull the COM back towards the base of support. A two-phase hip strategy for a humanoid has been suggested, which uses angular acceleration to absorb the disturbance in the reflex phase and returns to the initial pose in the recovery phase.22 An extended LIPM with angular momentum was used to derive analytic control laws for the hip and step strategies, and the concept of the capture point was suggested as the calculated stepping position for the step strategy.7 This approach was further extended by using a simplified model that results in analytic decision surfaces for push recovery strategies as functions of the state of the robot.8,9 These approaches were extended to control the GRF and ZMP at each foot using angular momentum, and were shown to balance a 3D full-body model of a humanoid robot in a simulated environment on nonlevel and nonstationary ground.23 The hip strategy for a stationary robot has also been implemented on a full-sized, torque controlled humanoid robot.24

The abstract model in Fig. 4(a) includes a flywheel with mass $m$, COM height $z_0$ and rotational inertia $I$, and a control torque $\tau_{\mathrm{hip}}$ applied at the center of the flywheel. The resulting linearized dynamic model is then:

$$\ddot{x} = \omega^2 \left(x - \tau_{\mathrm{hip}}/mg\right), \tag{8}$$

$$\ddot{\theta}_{\mathrm{hip}} = \tau_{\mathrm{hip}}/I. \tag{9}$$

However, the flywheel should not exceed joint limits. In this case, the following bang–bang profile can be used for applying hip torque to maximize the effect while satisfying the joint angle constraint,7

$$\tau_{\mathrm{hip}}(t) = \begin{cases} \tau^{\mathrm{MAX}}_{\mathrm{hip}} & 0 \le t < T_{H1}, \\ -\tau^{\mathrm{MAX}}_{\mathrm{hip}} & T_{H1} \le t < 2T_{H1}, \end{cases} \tag{10}$$

where $\tau^{\mathrm{MAX}}_{\mathrm{hip}}$ is the maximum torque that can be applied on the torso and $T_{H1}$ is the time at which the torso stops accelerating. This torque profile angularly accelerates the torso with maximum torque and then decelerates it with maximum negative torque, making it stop at angle $\theta^{\mathrm{MAX}}_{\mathrm{hip}}$. This behavior can be approximately implemented with high gain position controlled actuators by directly setting the hip target angle bias $\Delta\theta^{\mathrm{TARGET}}_{\mathrm{hip}}$ to $\theta^{\mathrm{MAX}}_{\mathrm{hip}}$, which makes the torso accelerate with the maximum torque and stop at that position with nearly maximum deceleration. After $t = 2T_{H1}$, the hip angle bias should return to zero.22 This two-phase behavior can be simply implemented as

$$\Delta\theta^{\mathrm{TARGET}}_{\mathrm{hip}} = \begin{cases} \theta^{\mathrm{MAX}}_{\mathrm{hip}} & 0 \le t < 2T_{H1}, \\ \theta^{\mathrm{MAX}}_{\mathrm{hip}}\,\dfrac{2T_{H1} + T_{H2} - t}{T_{H2}} & 2T_{H1} \le t < 2T_{H1} + T_{H2}, \end{cases} \tag{11}$$

where $T_{H2}$ is the duration of the returning phase. The same controller is used for the arm angles to apply additional GRF from the angular momentum of the limbs as well.
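A minimal Python sketch of the two-phase target of Eq. (11) is given below, together with the stop angle that follows from integrating Eq. (9) under the bang–bang torque of Eq. (10). The function names are hypothetical, and returning zero outside the interval covered by Eq. (11) is an assumption of this sketch, motivated by the statement that the bias should return to zero.

```python
def flywheel_stop_angle(tau_max, inertia, t_h1):
    """Angle at which the flywheel stops under the torque of Eq. (10).

    Integrating Eq. (9) twice, the flywheel turns tau_max*t_h1**2/(2*I)
    while accelerating and the same amount while decelerating to rest.
    """
    return tau_max * t_h1 ** 2 / inertia


def hip_target_bias(t, theta_max, t_h1, t_h2):
    """Hip target angle bias of Eq. (11) at time t since the push."""
    if t < 0.0:
        return 0.0  # assumption: no bias before the strategy is triggered
    if t < 2.0 * t_h1:
        return theta_max  # reflex phase: hold the maximum angle target
    if t < 2.0 * t_h1 + t_h2:
        # recovery phase: linear return from theta_max to zero
        return theta_max * (2.0 * t_h1 + t_h2 - t) / t_h2
    return 0.0  # assumption: bias stays at zero once recovery is complete
```

The step change in the target at t = 0 is what makes a high gain position controlled actuator approximate the maximum-torque acceleration described above.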

When the robot is pushed hard during walking, the robot may lift its currently

tipped foot, which can instantly destabilize the robot. To prevent this, when the hip

strategy is initiated, we shorten the single support phase and extend the double

support phase until the hip strategy is completed and the robot stands stably on two

feet. Figure 4(b) shows the hip strategy controller implemented on the DARwIn-OP

small humanoid robot.

Fig. 4. The hip strategy uses angular acceleration of the torso and limbs to apply a counteractive GRF. (a) The abstract model for the hip strategy. (b) The hip strategy implemented on the DARwIn-OP humanoid robot.



2.3. Step push recovery

When the magnitude of the disturbance exceeds the capability of the other two push recovery controllers, the step controller can be used to move the base of support in the direction of the push by taking a step. If we assume that the push occurs while the robot is in the single support phase, this strategy can be implemented in a straightforward manner by changing the landing position of the currently lifted foot in the direction of the perturbation. This step strategy has been implemented on various full-sized humanoid robots, including HRP-2 (Refs. 11 and 25) and the Sarcos robot (Ref. 26) while walking, and Hubo (Ref. 10) and the Toyota partner robot (Ref. 6) while hopping in place. There have been some analytical studies of where the robot should step assuming simplified models, including the capture point,7 foot placement estimator27 and generalized foot placement estimator28 approaches. They all share the inverted pendulum model shown in Fig. 5(a), which models the step strategy as three stages: an initial single support stage from the initial condition, a support point transition stage, and a final single support stage to a stable state. Their main difference is how they model each stage. In the capture point approach, a LIPM is used for all three stages, and the support point transition is assumed to occur instantaneously while preserving linear momentum, which results in the following landing position from the initial support point7:

$$x_{\mathrm{capture}} = \dot{x}/\omega + x. \tag{12}$$

In reality, we cannot instantly change the support point, and landing impacts reduce the linear momentum. In Ref. 26, an inverted pendulum model with fixed leg length $z_0$ and pendulum tilt angle $\theta$ is used for the first and second stages, and a LIPM with body height $z_0$ is used for the third stage. Landing is modeled as an impulse force along the landing leg, which brings the vertical velocity to zero. In Refs. 28 and 10, an inverted pendulum model with leg length $l$ and an angular momentum conserving impact model for the transition are used. Those models do not admit a closed form solution in general, but an approximate solution is provided in Ref. 10 as:

$$x_{\mathrm{capture}} = 2\cos(a/2), \tag{13}$$

$$a = 2\cos^{-1}\left(1 - \sqrt{\left(l\dot{\theta}^2/2 + \cos\theta - 1\right)/8}\right). \tag{14}$$

Fig. 5. The step strategy changes the support point by stepping. (a) The abstract model for the step strategy. (b) The step strategy implemented on the DARwIn-OP humanoid robot.
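The LIPM capture point of Eq. (12) is simple enough to compute directly. The following Python sketch assumes a gravitational acceleration of 9.81 m/s^2 by default, and its function names are illustrative rather than taken from any existing code base.

```python
import math


def lipm_omega(z0, g=9.81):
    """Natural frequency omega = sqrt(g / z0) of the LIPM."""
    return math.sqrt(g / z0)


def capture_point(x, xdot, z0, g=9.81):
    """Landing position of Eq. (12), measured from the initial support point.

    x and xdot are the horizontal COM position and velocity, and z0 is the
    COM height. The model assumes an instantaneous support transition that
    preserves linear momentum, so real landings will deviate from it.
    """
    return xdot / lipm_omega(z0, g) + x
```

For a fixed COM velocity, a lower COM height gives a larger omega and therefore a shorter required step.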

One practical issue for a physical implementation of the step strategy is the landing shock. As the step strategy is meant to be used with large perturbations, it can lead to a hard landing that can make the robot bounce back and fall down. There have been approaches to handle this by incorporating mechanical or electrical compliance; we use a simpler approach of lowering the proportional gain for the swing leg in the later part of stepping. Figure 5(b) shows the step strategy implemented on the DARwIn-OP robot.

We should also consider that the step strategy may not always be possible for a walking humanoid robot due to kinematic and timing constraints. Most humanoid robots cannot cross their legs due to kinematic constraints, and the amount by which the robot can change the landing position of the currently lifted foot decreases over time due to velocity constraints. Also, if the robot is pushed when it is in double support or is about to land its foot, it needs to take a new step for push recovery. In this case, we have to determine which foot the robot should use for stepping, as lifting the foot with the current support edge will result in the robot instantly falling. Figure 6 shows three possible stepping cases according to the direction of perturbation from the same foot stance. The support foot for the capture step can be determined based on the angle between the two feet and the perturbation vector, as shown in Figs. 6(a) and 6(b). For cases like Fig. 6(c), the step strategy is not available due to a kinematic constraint.

(a) (b) (c)

Fig. 6. Determining step foot based upon the direction of perturbation.



2.4. The high-level push recovery controller

We have explained three biomechanically motivated push recovery controllers and their implementations for walking on a position controlled humanoid robot. When pushed, humans perform a combination of push recovery behaviors according to the particular situation. To select the appropriate set of push recovery behaviors as humans do, we use the hierarchical controller shown in Fig. 7, where the ankle, hip and step push recovery controllers work as low-level sub-controllers and the high-level push recovery controller triggers each of them according to the direction and magnitude of the external disturbance estimated using the onboard sensors.

For the abstract models we have seen in Figs. 3–5, there have been analytic studies of the decision boundaries of each controller.8,29 If we assume a maximum ankle torque of $\tau^{\mathrm{MAX}}_{\mathrm{ankle}}$, then the stability region for the ankle push recovery controller, the region of state space in which the system can be stabilized, can be derived as

$$\left|\dot{x}/\omega + x\right| < \tau^{\mathrm{MAX}}_{\mathrm{ankle}}/mg, \tag{15}$$

and the stability region for the hip strategy plus the ankle strategy is

$$\left|\dot{x}/\omega + x\right| < \left(\tau^{\mathrm{MAX}}_{\mathrm{ankle}} + \tau^{\mathrm{MAX}}_{\mathrm{hip}}\left(e^{\omega T_{H1}} - 1\right)^2\right)/mg. \tag{16}$$

Finally, if we assume instantaneous support point transition without loss of linear momentum, we have the following stability region for using all three strategies at once:

$$\left|\dot{x}/\omega + x\right| < \left(\tau^{\mathrm{MAX}}_{\mathrm{ankle}} + \tau^{\mathrm{MAX}}_{\mathrm{hip}}\left(e^{\omega T_{H1}} - 1\right)^2\right)/mg + x^{\mathrm{MAX}}_{\mathrm{capture}}, \tag{17}$$

where $x^{\mathrm{MAX}}_{\mathrm{capture}}$ is the maximum step size available. In this case, we can use the two boundary conditions in (15) and (16) to select between controllers based on the current state. For the more realistic case of a multi-segmented body with motor dynamics, as on a physical robot, these theoretical boundaries do not fit well and the high-level controller needs to be trained from experience. This is covered in more detail later in this paper.
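To make the role of the theoretical boundaries concrete, the sketch below classifies a state by testing Eqs. (15)–(17) in order of increasing effort. It is only an illustration of the abstract-model rule, not the learned high-level controller; the function name, the returned labels and the "unrecoverable" case are assumptions of this sketch.

```python
import math


def select_strategy(x, xdot, z0, m, tau_ankle_max, tau_hip_max, t_h1,
                    x_capture_max, g=9.81):
    """Pick the least demanding strategy set whose region contains the state.

    Uses the abstract-model stability regions of Eqs. (15)-(17), which all
    bound the same quantity |xdot/omega + x|.
    """
    omega = math.sqrt(g / z0)
    xi = abs(xdot / omega + x)

    ankle_limit = tau_ankle_max / (m * g)                      # Eq. (15)
    hip_gain = (math.exp(omega * t_h1) - 1.0) ** 2
    hip_limit = (tau_ankle_max + tau_hip_max * hip_gain) / (m * g)  # Eq. (16)
    step_limit = hip_limit + x_capture_max                     # Eq. (17)

    if xi < ankle_limit:
        return "ankle"
    if xi < hip_limit:
        return "ankle+hip"
    if xi < step_limit:
        return "ankle+hip+step"
    return "unrecoverable"  # hypothetical label: outside all three regions
```

Because the three regions are nested intervals of a single scalar, the rule reduces to comparing that scalar against three thresholds, which is what makes a low-dimensional decision boundary attractive.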

Fig. 7. The hierarchical control structure for push recovery.


3. Integration with Walk Controller

As we have seen before, the step strategy requires reactive modification of the stepping sequence and the foot trajectory, as shown in Fig. 8. However, reactive stepping is generally not possible with typical ZMP tracking approaches, where the reference ZMP trajectory is calculated in advance and the COM trajectory is generated to minimize the ZMP error.

Recently, there have been approaches that generate walking patterns online to overcome this limitation, including a ZMP preview-based algorithm that updates the COM trajectory at a high frequency,30 and a real-time gait planning method based on the analytic solution of the LIPM with a parametrized ZMP trajectory.4,5 Step push recovery based on these approaches has been successfully implemented on the HRP-2 robot25,11 and the Toyota partner robot.6

Another method for real-time walk pattern generation is the biologically inspired, central-pattern-generator-based approach. This approach has been implemented on the Hubo robot and demonstrated step push recovery behavior while hopping in place.10 Due to its simplicity, this approach has been widely used for small, resource constrained humanoid robots,19,31,32 but it is generally harder to design a stable trajectory as it is not based on an explicit stability criterion.

Our walk controller is based on the analytic solution of the LIPM, but is further simplified so that it can be implemented on resource constrained robots. The walk pattern is divided into discrete steps, and the overall walk control is separated into a footstep generation controller and a trajectory controller. The footstep generation controller generates the parameters for the next step, including the initial and final position of each foot and the support foot information, and generates the reference ZMP trajectory based on them. The trajectory controller generates the foot and torso trajectories for the current step based on those parameters. We describe our walk controller in more detail in the following subsections.

3.1. Footstep generation controller

Fig. 8. Two different cases of reactive stepping. (a) The inter-step override, which uses the same support foot and updates the foot trajectory for the next step. (b) The intra-step override, which updates the current foot trajectory during stepping.

Our first assumption is that walking is divided into discrete steps, which start and end with a double support phase. Then we can define the $i$th step as a set of parameters

$$\mathrm{STEP}_i = \{SF_i, L_i, C_i, R_i, L_{i+1}, C_{i+1}, R_{i+1}\}, \tag{18}$$

where $SF_i$ denotes the support foot, and $L_i, C_i, R_i$ and $L_{i+1}, C_{i+1}, R_{i+1}$ are the initial and final 2D poses of the left foot, torso and right foot in $(x, y, \theta)$ coordinates. The landing foot pose is calculated from the current foot configuration, the commanded walk velocity, and kinematic and self-collision constraints. To make the step transition occur at the most stable posture, we set the boundary torso pose $C_i$ to be the midpoint of $L_i$ and $R_i$ for all $i$. When an inter-step override is required as in Fig. 8(a), the current commanded walk velocity is overridden and the next landing foot position is determined according to the push direction. A single step is further divided into three stages: the first double support stage, when the ZMP moves to the current support foot; the single support stage, when the ZMP lies on the support foot; and the second double support stage, when the ZMP moves back to the final torso position. If we define the walk phase $\phi$ as $t/t_{\mathrm{STEP}}$, where $t$ is the time elapsed since the step started and $t_{\mathrm{STEP}}$ is the duration of the step, we can design the ZMP trajectory $p_i(\phi)$ as a piecewise linear function of $\phi$ as

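$$p_i(\phi) = \begin{cases} C_i\left(1 - \dfrac{\phi}{\phi_1}\right) + L_i\dfrac{\phi}{\phi_1} & 0 \le \phi < \phi_1, \\ L_i & \phi_1 \le \phi < \phi_2, \\ C_{i+1}\left(1 - \dfrac{1-\phi}{1-\phi_2}\right) + L_i\dfrac{1-\phi}{1-\phi_2} & \phi_2 \le \phi < 1, \end{cases} \tag{19}$$

for the left support foot case and

$$p_i(\phi) = \begin{cases} C_i\left(1 - \dfrac{\phi}{\phi_1}\right) + R_i\dfrac{\phi}{\phi_1} & 0 \le \phi < \phi_1, \\ R_i & \phi_1 \le \phi < \phi_2, \\ C_{i+1}\left(1 - \dfrac{1-\phi}{1-\phi_2}\right) + R_i\dfrac{1-\phi}{1-\phi_2} & \phi_2 \le \phi < 1, \end{cases} \tag{20}$$

for the right support foot case, where $\phi_1, \phi_2$ are the timing parameters determining the transitions between the single support and double support phases. The step controller and the resulting ZMP trajectory are shown in Fig. 9.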
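The piecewise linear reference of Eqs. (19) and (20) can be sketched per coordinate as below. Passing the support foot position as a single argument covers both the left and right support cases; the function and argument names are assumptions, and the default timing values are the ones quoted in the caption of Fig. 9.

```python
def zmp_reference(phase, torso_start, support, torso_end,
                  phase1=0.2, phase2=0.8):
    """Reference ZMP p_i(phi) of Eqs. (19)/(20) for one coordinate.

    torso_start and torso_end stand for C_i and C_{i+1}; support stands for
    L_i or R_i depending on the support foot. Requires 0 <= phase < 1.
    """
    if phase < phase1:
        # first double support: ZMP moves from the torso to the support foot
        s = phase / phase1
        return torso_start * (1.0 - s) + support * s
    if phase < phase2:
        # single support: ZMP stays on the support foot
        return support
    # second double support: ZMP moves from the support foot to the new torso
    s = (1.0 - phase) / (1.0 - phase2)
    return torso_end * (1.0 - s) + support * s
```

Calling the function once for the x coordinate and once for the y coordinate gives the planar reference; the trajectory is continuous at both phase boundaries by construction.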

3.2. Trajectory controller

The trajectory controller generates the foot and torso trajectories for the current step defined in (18). First, we define the single support walk phase $\phi_{\mathrm{single}}$ as

$$\phi_{\mathrm{single}} = \begin{cases} 0 & 0 \le \phi < \phi_1, \\ \dfrac{\phi - \phi_1}{\phi_2 - \phi_1} & \phi_1 \le \phi < \phi_2, \\ 1 & \phi_2 \le \phi < 1. \end{cases} \tag{21}$$

Then we use the following heuristic trajectory function with parameters $\alpha, \beta$:

$$f_T(\phi) = \phi^{\alpha} + \beta\phi(1 - \phi), \tag{22}$$

to generate the foot trajectories for both feet $l_i(\phi)$, $r_i(\phi)$:

$$l_i(\phi) = L_i\left(1 - f_T(\phi_{\mathrm{single}})\right) + L_{i+1} f_T(\phi_{\mathrm{single}}), \tag{23}$$

$$r_i(\phi) = R_i\left(1 - f_T(\phi_{\mathrm{single}})\right) + R_{i+1} f_T(\phi_{\mathrm{single}}). \tag{24}$$

Then the torso trajectory $x_i$ is calculated to satisfy the following ZMP criterion for the LIPM:

$$\ddot{x}_i = \left(x_i - p_i(\phi)\right)/t_{\mathrm{ZMP}}^2, \tag{25}$$

where $t_{\mathrm{ZMP}} = \sqrt{z_0/g}$. The piecewise linear ZMP trajectory we use in (19) and (20) yields the following closed-form solution of $x_i(\phi)$ during the step period $0 \le \phi < 1$:

$$x_i(\phi) = \begin{cases} p_i(\phi) + a^p_i e^{\phi/\phi_{\mathrm{ZMP}}} + a^n_i e^{-\phi/\phi_{\mathrm{ZMP}}} + m_i t_{\mathrm{ZMP}}\left(\dfrac{\phi - \phi_1}{\phi_{\mathrm{ZMP}}} - \sinh\dfrac{\phi - \phi_1}{\phi_{\mathrm{ZMP}}}\right) & 0 \le \phi < \phi_1, \\ p_i(\phi) + a^p_i e^{\phi/\phi_{\mathrm{ZMP}}} + a^n_i e^{-\phi/\phi_{\mathrm{ZMP}}} & \phi_1 \le \phi < \phi_2, \\ p_i(\phi) + a^p_i e^{\phi/\phi_{\mathrm{ZMP}}} + a^n_i e^{-\phi/\phi_{\mathrm{ZMP}}} + n_i t_{\mathrm{ZMP}}\left(\dfrac{\phi - \phi_2}{\phi_{\mathrm{ZMP}}} - \sinh\dfrac{\phi - \phi_2}{\phi_{\mathrm{ZMP}}}\right) & \phi_2 \le \phi < 1, \end{cases} \tag{26}$$

where $\phi_{\mathrm{ZMP}} = t_{\mathrm{ZMP}}/t_{\mathrm{STEP}}$ and $m_i$, $n_i$ are ZMP slopes, which are defined as follows for the left support case:

$$m_i = (L_i - C_i)/\phi_1, \tag{27}$$

$$n_i = -(L_i - C_{i+1})/(1 - \phi_2), \tag{28}$$

and for the right support case:

$$m_i = (R_i - C_i)/\phi_1, \tag{29}$$

$$n_i = -(R_i - C_{i+1})/(1 - \phi_2). \tag{30}$$

Fig. 9. The step-based walk controller. (a) An example of walking behavior which is composed of two steps, $\mathrm{STEP}_i$ and $\mathrm{STEP}_{i+1}$. (b) Corresponding lateral ZMP and torso trajectories $p(\phi)$ and $x(\phi)$. Timing parameters of $\phi_1 = 0.2$ and $\phi_2 = 0.8$ are used.

The parameters $a^p_i$ and $a^n_i$ can then be uniquely determined from the boundary conditions $x_i(0) = C_i$ and $x_i(1) = C_{i+1}$. This analytic solution of the torso trajectory is continuous and has zero ZMP error during each step period, but may have discontinuous velocity at the step transition when the commanded velocity is changing. However, we found that this does not hamper stability, as the transition occurs in the middle of the most stable double support stance. In addition to calculating foot trajectories based upon predetermined target foot poses from the step controller, the trajectory controller handles the intra-step override shown in Fig. 8(b) by updating the landing position of the current swing foot towards the capture point. As the new landing point has to satisfy kinematic and velocity constraints, the override is most effective in the initial phase of the step.
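The foot trajectory generation of Eqs. (21)–(24) can be sketched as follows. This is a hedged illustration: the function names are hypothetical, the defaults alpha = 1 and beta = 0 (plain linear interpolation) are assumptions chosen for the sketch, and the heading component of each pose is interpolated linearly without angle wrapping.

```python
def single_support_phase(phase, phase1=0.2, phase2=0.8):
    """Single support walk phase of Eq. (21)."""
    if phase < phase1:
        return 0.0
    if phase < phase2:
        return (phase - phase1) / (phase2 - phase1)
    return 1.0


def swing_profile(phase_single, alpha, beta):
    """Heuristic trajectory function f_T of Eq. (22); f_T(0)=0, f_T(1)=1."""
    return phase_single ** alpha + beta * phase_single * (1.0 - phase_single)


def foot_pose(phase, start, end, alpha=1.0, beta=0.0,
              phase1=0.2, phase2=0.8):
    """Foot pose of Eq. (23)/(24), interpolated between two (x, y, theta) poses.

    start and end stand for L_i and L_{i+1} (or R_i and R_{i+1}).
    """
    f = swing_profile(single_support_phase(phase, phase1, phase2), alpha, beta)
    return tuple(s * (1.0 - f) + e * f for s, e in zip(start, end))
```

Under these definitions the foot stays at its initial pose throughout the first double support stage and at its final pose throughout the second. An intra-step override would correspond to changing the `end` pose while the step is in progress.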

4. Learning the High-Level Push Recovery Controller

In the previous sections, we have described our hierarchical push recovery controller structure and its implementation for a position controlled robot. As we have discussed, although there are analytic decision boundaries for simplified models to select the appropriate set of push recovery controllers based on the current state, such decision rules may not work well with more realistic dynamic models. Instead of relying on the abstract model, our previous work used a machine learning approach, where we directly trained the parametrized controller from experience. We implemented three parametrized push recovery strategies for a resource constrained robot with high gain position control, used reinforcement learning to learn the high-level controller that governs the three push recovery controllers from raw sensory inputs using a full-body model of the robot in a simulated environment, and used the learned controller on a small humanoid robot walking in place.33

An insight gained from physical experiments is that modest pushes can be effectively stabilized using the ankle strategy alone, and that the magnitudes of the hip and step strategies are limited on the physical robot due to kinematic and motor constraints. In other words, it is sufficient to fix $\|\tau_{hip}^{MAX}\|$ and $\|x_{capture}\|$, which greatly reduces the action space compared to previous parametrized controllers. Still, such a direct approach is not data efficient, as it does not utilize knowledge of the decision surface, and applying this approach on a real robot with scarce training data requires much simplification of the controller.34

In this work, we take a hybrid approach. We use a low-dimensional decision boundary in state space, but instead of relying on a theoretical boundary from a simplified model, we use training data from a simulated environment to get the empirical decision boundary for the push recovery controllers. The model can then be trained easily with a limited amount of data from the physical robot.


4.1. Resource constrained humanoid platform

Most of the physical implementations of push recovery controllers introduced so far use human-sized robots, usually equipped with harmonic gear drive trains, triaxial force–torque sensors and torque-controlled actuators. On the other hand, lightweight, low-cost humanoid robots with off-the-shelf servomotors are now gaining popularity, in part due to the commercial availability of affordable small humanoids. Although those affordable humanoids are limited in terms of their sensory, motor and processing power, they have been used as a viable research platform in many areas, including balancing control during walking. A number of push recovery approaches have been implemented on such platforms, including a crouching reflex similar to the hip strategy,35 a frontal hip strategy,36,37 a lateral step strategy31 and a frontal ankle and step strategy.38

For this work, we use the commercially available DARwIn-OP humanoid robot and its simulation model as the test platform. It is 45 cm tall, weighs 2.8 kg, and has 20 degrees of freedom. It has a 3-axis accelerometer and gyroscope for inertial sensing, and joint encoders at each joint for proprioceptive sensing. Position-controlled Dynamixel servos are used as actuators, which are controlled by a custom microcontroller connected to an embedded PC at a control frequency of 100 Hz.

4.2. The extended inverted pendulum model

The abstract model we used in previous sections does not fit the physical humanoid platform well. The most notable difference is that the physical robot has feet with nonzero size, so the robot can be tipped over the edge of the foot. Furthermore, the ankle torque is only indirectly controlled through proportional control. Finally, estimating the linear position and velocity of the COM from noisy sensors can be very hard. Proprioceptive sensors can be used to determine the COM position if we assume that the support foot is on the ground, but this assumption will not hold if the robot is perturbed hard. Instead, we have found that the angular velocity and tilt angle information from the inertial sensors is more reliable. Thus, we propose a new abstract model for a resource constrained humanoid robot, which is shown in Fig. 10(a). It is an inverted pendulum with the tilt angle $\theta$ as state, and it has a foot with toe position $\delta^+$ and heel position $\delta^-$ measured from the ankle joint. The ankle torque $\tau_{ankle}$ is controlled by PD control of $\theta$ with saturation values $mg\delta^+$ and $mg\delta^-$:

$$\ddot{\theta} = \omega^2\left(\sin(\theta) - (\tau_{ankle} + \tau_{hip})/(mgz_0)\right), \tag{31}$$

$$\tau_{ankle} = f_{sat}(K''_p\theta + K''_d\dot{\theta}), \tag{32}$$

$$f_{sat}(x) = \begin{cases} mg\delta^+, & x \ge mg\delta^+, \\ x, & mg\delta^- < x < mg\delta^+, \\ mg\delta^-, & x \le mg\delta^-. \end{cases} \tag{33}$$



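We can linearize the stability regions in (15), (16) and consider the saturated case to get the following stability regions for the ankle, hip and step strategies:

$$\delta^-/z_0 < \dot{\theta}/\omega + \theta < \delta^+/z_0, \tag{34}$$

$$\begin{aligned} \dot{\theta}/\omega + \theta &> \delta^-/z_0 - \tau_{hip}^{MAX}(e^{\omega T_{H1}} - 1)^2/(mgz_0), \\ \dot{\theta}/\omega + \theta &< \delta^+/z_0 + \tau_{hip}^{MAX}(e^{\omega T_{H1}} - 1)^2/(mgz_0), \end{aligned} \tag{35}$$

$$\begin{aligned} \dot{\theta}/\omega + \theta &> \delta^-/z_0 - \tau_{hip}^{MAX}(e^{\omega T_{H1}} - 1)^2/(mgz_0) - x_{capture}^{MAX}/z_0, \\ \dot{\theta}/\omega + \theta &< \delta^+/z_0 + \tau_{hip}^{MAX}(e^{\omega T_{H1}} - 1)^2/(mgz_0) + x_{capture}^{MAX}/z_0. \end{aligned} \tag{36}$$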

Figures 10(b)–10(d) show three trajectory plots acquired from various initial pushes using the extended inverted pendulum model and three different sets of push recovery strategies. The parameters used are $m = 2$, $z_0 = 0.295$, $\delta^+ = 0.05$, $\delta^- = -0.05$, $K''_p = 500$, $K''_d = 57.83$, $\tau_{hip}^{MAX} = 1$, $T_{H1} = 0.3$ and $x_{capture}^{MAX} = 0.08$, which are based on the multi-body model of the DARwIn-OP robot. A reduced mass of $m = 2$ is used to compensate for the large leg mass of the robot. We see that the hip and step strategies help to enlarge the stability region, and that even with the nonlinear dynamic model we use, the empirical stability region of the ankle strategy closely follows the theoretical one derived using the simpler LIPM.
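The ankle-only case of the extended inverted pendulum model can be reproduced with a few lines of Python. The following is a minimal sketch, assuming $\tau_{hip} = 0$, $g = 9.81$, SI units for the listed parameters, semi-implicit Euler integration with a 1 ms step, and a fall threshold of 0.5 rad; the function names, the integrator and the threshold are illustrative choices and are not taken from the experiments above.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def f_sat(x, lower, upper):
    """Saturation function of Eq. (33): clamp x to [lower, upper]."""
    return max(lower, min(upper, x))


def simulate_ankle(theta0, theta_dot0, m=2.0, z0=0.295,
                   delta_plus=0.05, delta_minus=-0.05,
                   kp=500.0, kd=57.83,
                   dt=0.001, duration=3.0, fall_angle=0.5):
    """Integrate Eq. (31) with tau_hip = 0; return True if the pendulum stays up."""
    omega_sq = G / z0
    theta, theta_dot = theta0, theta_dot0
    for _ in range(round(duration / dt)):
        # PD ankle torque of Eq. (32), limited by the foot size
        tau_ankle = f_sat(kp * theta + kd * theta_dot,
                          m * G * delta_minus, m * G * delta_plus)
        theta_ddot = omega_sq * (math.sin(theta) - tau_ankle / (m * G * z0))
        theta_dot += theta_ddot * dt  # semi-implicit Euler: velocity first
        theta += theta_dot * dt
        if abs(theta) > fall_angle:
            return False
    return True


def inside_ankle_region(theta, theta_dot, z0=0.295,
                        delta_plus=0.05, delta_minus=-0.05):
    """Linearized ankle strategy stability region of Eq. (34)."""
    omega = math.sqrt(G / z0)
    xi = theta_dot / omega + theta
    return delta_minus / z0 < xi < delta_plus / z0


for push in (0.8, 1.2):  # initial angular velocity in rad/s
    print(push, simulate_ankle(0.0, push), inside_ankle_region(0.0, push))
```

For an upright start, the linearized region (34) places the boundary near an initial angular velocity of $\omega\delta^+/z_0 \approx 0.98$ rad/s, so the two example pushes are chosen to lie on either side of it.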

4.3. The ankle strategy with multi-body model

To model more realistic, multi-body dynamics of the robot, we use the Webots commercial robotic simulator39 based on the Open Dynamics Engine physics library and the supplied simulated model of the DARwIn-OP robot. We use our modular open source humanoid framework40 for controlling the robot. The controller update frequency and the physics simulation frequency are set to 100 Hz. We use the COM height $z_0 = 0.295$, step duration $t_{STEP} = 0.50$ and robot center to ankle width $d_{stance} = 0.375$ for walk parameters. For the ankle strategy gain parameters, we use values of $K''_p = 0$ and $K''_d = 0.15$, which are found to be effective in practice, as the position controlled joints and nonpoint feet already apply a positional negative feedback to the system. The robot is pushed with impulse forces for one timestep (0.01 s) with different magnitudes and directions; various combinations of push recovery strategies are evaluated, and the state trajectories are logged.

Fig. 10. The extended inverted pendulum model and the phase space trajectory plots generated using different push recovery strategies. (a) An inverted pendulum model of the robot with position controlled ankle joint and foot. (b) Ankle strategy. (c) Ankle plus hip strategy. (d) Ankle plus step strategy. White and gray regions in (b)–(d) are theoretical stable and unstable regions from (34). Darker gray regions in (c) and (d) are the increased stable region compared to using the ankle strategy alone.

Figures 11(a) and 11(b) show the empirical decision boundaries found for the ankle strategy from frontal and lateral pushes. Although the empirical trajectory plots have shapes similar to those in Fig. 10, the empirical stability regions differ significantly from those obtained via the abstract models. From the trajectory curves, we fit a linear classifier that best separates the two regions over the interval $0.03 < t < 0.3$, as our impulse impact setup produces an unrealistically large spike in the sensor readings for one or two simulation steps. We then get the estimated values for $\delta$ and $z_0$ shown in Table 1, which imply the following empirical stability boundaries for the ankle strategy:

$$\sqrt{g/z_0^-}\left(-\theta + \delta^-/z_0^-\right) < \dot{\theta} < \sqrt{g/z_0^+}\left(-\theta + \delta^+/z_0^+\right). \tag{37}$$
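Each side of (37) is a straight line in the $(\theta, \dot{\theta})$ plane, so a fitted linear boundary can be converted to and from the pair $(\delta, z_0)$. The Python sketch below shows that conversion. It assumes $g = 9.81$ and a boundary with negative slope; the function names are hypothetical, and the classifier fit itself is not shown.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def boundary_line(delta, z0):
    """Return (slope, intercept) of theta_dot = slope * theta + intercept
    for one side of Eq. (37)."""
    omega = math.sqrt(G / z0)
    return -omega, omega * delta / z0


def params_from_line(slope, intercept):
    """Invert boundary_line: recover (delta, z0) from a fitted line.
    The slope must be negative, since slope = -sqrt(g / z0)."""
    omega = -slope
    z0 = G / omega ** 2
    delta = intercept * z0 / omega
    return delta, z0


# Upper frontal boundary built from the Table 1 values.
slope, intercept = boundary_line(0.45, 1.09)
print(slope, intercept, params_from_line(slope, intercept))
```

The slope depends only on $z_0$, while the intercept depends on both $\delta$ and $z_0$, which is why the two quantities can be identified separately from a single fitted line.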

4.4. Deciding between hip and step strategies

Given the empirical stability region of the ankle strategy controller, if a perturbation falls outside that region, we need to employ other push recovery controllers in addition to the ankle controller. The LIPM-based abstract models (Figs. 4 and 5) imply the theoretical stability regions described in (16) and (17), which can grow quite large with large $\tau_{hip}^{MAX}$ and $x_{capture}^{MAX}$. However, due to kinematic and velocity constraints, we have practical limits for those values. Taking a step also takes time, which further restricts the effectiveness of the step strategy. We set $\|x_{hip}\| = 40^\circ$, $T_{H1} = 0.15$, $T_{H2} = 0.3$ and $\|x_{capture}\| = 0.08$ for the hip and step strategy parameters and compare the results of the two strategies. Figures 11(c) and 11(d) show trajectory plots acquired from two sets of push recovery controllers: the ankle plus hip strategy and the ankle plus step strategy. In this case, a clear boundary for the ankle plus hip strategy is not as evident as in Fig. 10(c), as the inertial sensor of our robot lies in the torso and rotates when the hip strategy is triggered. Instead of decoupling the hip rotation from the sensor readings, which turned out to be very hard with the noisy sensor model we use, we compared the outcomes of the two push recovery strategies against various magnitudes of perturbation. We have found that against a frontal push, the step strategy can withstand slightly larger maximum perturbations than the hip strategy (1.05 Ns for the step strategy versus 1.04 Ns for the hip strategy), and that the step strategy has a wider region of stability than the hip strategy with fixed parameter values $\tau_{hip}^{MAX}$ and $x_{capture}^{MAX}$. On the other hand, the step strategy is not available for purely lateral perturbations due to kinematic constraints, and we have to rely on the hip strategy for such cases.

Fig. 11. Phase space trajectory plots generated with the multi-body model and different push recovery strategies in physically realistic simulations. (a) Ankle strategy, frontal push. (b) Ankle strategy, lateral push. (c) Ankle plus hip strategy, lateral push. (d) Ankle plus step strategy, frontal push. White and gray regions are theoretical stable and unstable regions from (34)–(36). Thick dashed lines are the estimated linear boundaries between stable and unstable regions.

Table 1. Parameter values estimated from the multi-body model.

| Parameter | $\delta^+$ | $z_0^+$ | $\delta^-$ | $z_0^-$ |
|---|---|---|---|---|
| Frontal | 0.45 | 1.09 | −0.42 | 1.02 |
| Lateral | 0.84 | 1.45 | −0.84 | 1.45 |

In summary, the decision rule for the push recovery strategies is as follows. The ankle strategy is active at all times, and if the state estimate moves beyond the empirical stability boundaries in (37), the step strategy is triggered. If the step strategy is not available due to kinematic constraints, the hip strategy is triggered instead.
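The decision rule can be written compactly as a threshold test on the inertial state. The following Python sketch is one possible reading of the rule, using the simulation-derived values of Table 1 as defaults. The function names, the `step_available` flag, the axis labels and $g = 9.81$ are assumptions made for illustration; how step availability is determined on the robot is not specified here.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

# (delta_plus, z0_plus, delta_minus, z0_minus) per push axis, from Table 1
TABLE_1 = {
    "frontal": (0.45, 1.09, -0.42, 1.02),
    "lateral": (0.84, 1.45, -0.84, 1.45),
}


def inside_ankle_region(theta, theta_dot, axis, table=TABLE_1):
    """Check the empirical ankle strategy boundaries of Eq. (37)."""
    d_plus, z_plus, d_minus, z_minus = table[axis]
    lower = math.sqrt(G / z_minus) * (-theta + d_minus / z_minus)
    upper = math.sqrt(G / z_plus) * (-theta + d_plus / z_plus)
    return lower < theta_dot < upper


def select_strategies(theta, theta_dot, axis, step_available):
    """Return the list of active strategies for the current state estimate."""
    active = ["ankle"]  # the ankle strategy is always active
    if not inside_ankle_region(theta, theta_dot, axis):
        # prefer a reactive step; fall back to the hip strategy otherwise
        active.append("step" if step_available else "hip")
    return active


print(select_strategies(0.0, 1.0, "frontal", True))
print(select_strategies(0.0, 2.0, "frontal", True))
print(select_strategies(0.0, 2.0, "lateral", False))
```

Keeping the boundary parameters in a table makes it straightforward to swap in values estimated from the physical robot without changing the rule itself.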

4.5. Comparison with ZMP tracking controller

To demonstrate the effectiveness of the hierarchical push recovery controller, we compare it to the commonly used closed-loop ZMP tracking controller. We implement the ZMP tracking controller based on Ref. 18, with the single difference that the current state is estimated using an inertial sensor rather than joint encoders and forward kinematics. All other parameters remain unchanged. Various amounts of frontal and lateral impulses were applied to the robot, and the outcome of the push recovery effort was logged for each controller. Figure 12 shows the comparison of the two controllers for forward, backward and sideways pushes. Figure 13 shows the stability regions of four different combinations of push recovery controller settings. We can see that the step strategy handles frontal perturbations fairly well, and that the hip strategy is effective for lateral perturbations, where the step strategy cannot be utilized due to the kinematic constraint. Overall, we see that the stability region of the suggested approach is approximately 21% larger than, and completely encompasses, that of the ZMP tracking method.

4.6. Extension to the full-sized humanoid robots

In this paper, we have used only the DARwIn-OP miniature humanoid robot for testing in both the simulated and the real environments. This robot has a relatively large foot size and a larger power to weight ratio compared to common full-sized robots. To see how our method can extend to a larger robot, we have made a comparison to a full-sized position controlled humanoid robot, THOR-RD, which we have used for the DARPA Robotics Challenge.41,42 The overall dimensions of the two robots are shown in Fig. 14, and a more detailed comparison between the two robots is provided in Table 2.

We have found that due to the relatively lower COM height, the larger THOR-RD robot has a slightly larger foot length to COM height ratio, which affects the maximum tilt angle the robot can recover from. On the other hand, due to the lower power to weight ratio and larger dimensions of the robot, the maximum horizontal torso acceleration possible with full ankle torque is approximately three times smaller than that of the DARwIn-OP robot. Overall, we expect the ankle strategy to work similarly with the larger robot, albeit less responsively. From (16) and (35), we can assume that the effectiveness of the hip strategy is roughly proportional to $\tau_{hip}^{MAX}/mg$. Comparing this quantity for the two robots shows that, under this assumption, the hip strategy will be approximately 15% less effective with the larger THOR-RD robot. Finally, the THOR-RD robot has a longer natural pendulum period due to its higher COM height, and it can take a larger step relative to the foot length. We expect both of these factors to help the step strategy.

Fig. 12. A comparison of the ZMP tracking controller and the suggested hierarchical push recovery controller with different impulse forces. (a) ZMP tracking controller, 1.05 Ns of frontal push. (b) Hierarchical push recovery controller, 1.05 Ns of frontal push. (c) ZMP tracking controller, 1.04 Ns of backward push. (d) Hierarchical push recovery controller, 1.04 Ns of backward push. (e) ZMP tracking controller, 1.61 Ns of lateral push. (f) Hierarchical push recovery controller, 1.61 Ns of lateral push.
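The derived rows of Table 2 appear to be consistent with simple point-mass relations, which the Python sketch below reproduces from the measured rows. The formulas are assumptions inferred from the tabulated numbers rather than definitions given in the text: the natural pendulum period is taken as $\sqrt{z_0/g}$, the maximum COM acceleration as $\tau^{MAX}/(m z_0)$, and $g = 9.81$; the dictionary keys and function name are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

# Measured rows of Table 2: COM height (m), weight (kg), max torque (Nm)
ROBOTS = {
    "DARwIn-OP": {"com_height": 0.295, "mass": 2.8, "max_torque": 2.5},
    "THOR-RD": {"com_height": 0.70, "mass": 58.0, "max_torque": 44.2},
}


def derived_quantities(robot):
    """Compute the derived comparison quantities under point-mass assumptions."""
    z0, m, tau = robot["com_height"], robot["mass"], robot["max_torque"]
    return {
        "pendulum_period": math.sqrt(z0 / G),  # assumed to mean sqrt(z0 / g)
        "max_com_accel": tau / (m * z0),       # full torque on a point mass at z0
        "torque_per_mass": tau / m,            # proportional to tau_max / (m g)
    }


small = derived_quantities(ROBOTS["DARwIn-OP"])
large = derived_quantities(ROBOTS["THOR-RD"])
for key in small:
    print(key, round(small[key], 2), round(large[key], 2),
          round(large[key] / small[key], 2))
```

Under these assumptions, the ratio of the torque per mass values corresponds to the estimated 15% reduction in hip strategy effectiveness, and the ratio of the acceleration values corresponds to the roughly threefold reduction in achievable torso acceleration.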

In summary, we expect the suggested controller to work with larger position controlled humanoid robots as well, although the torque limit of the actuators can moderately degrade the performance of some strategies. Unfortunately, at the time of writing this paper, we could not test the controller with the THOR-RD robot, as we could not risk possible hardware damage. This remains as future work.

Fig. 13. Comparison of stability regions for four different push recovery settings.

Fig. 14. Comparison of the dimensions of the DARwIn-OP miniature humanoid robot and the THOR-RD full-sized humanoid robot.

5. Experimental Results

In addition to the simulated environment, we have implemented the integrated walk controller with push recovery on a commercially available DARwIn-OP small humanoid robot. All code and parameter values used for the simulation are used to control the physical robot as well, with the help of our modular open source humanoid framework.

5.1. Hardware setup

To generate repeatable external perturbations, a motorized moving platform was constructed using Dynamixel servomotors (Fig. 15). To generate the maximum peak acceleration, the platform is slowly accelerated in one direction and then suddenly accelerated in the opposite direction. We have found that the platform can generate accelerations greater than 0.5 g while carrying the robot, providing perturbations large enough to make the robot fall without stabilization.

Table 2. Detailed comparison of the DARwIn-OP miniature humanoid robot and the THOR-RD full-sized humanoid robot.

| | DARwIn-OP | THOR-RD | Ratio |
|---|---|---|---|
| Total height (m) | 0.454 | 1.54 | 3.4 |
| COM height (m) | 0.295 | 0.70 | 2.37 |
| Foot length (m) | 0.104 | 0.260 | 2.5 |
| Foot width (m) | 0.066 | 0.160 | 2.42 |
| Leg link length (m) | 0.186 | 0.600 | 3.22 |
| Weight (kg) | 2.8 | 58 | 20.7 |
| Max torque (Nm) | 2.5 | 44.2 | 17.68 |
| Foot length/COM height ratio | 0.35 | 0.37 | 1.06 |
| Leg/foot length ratio | 1.78 | 2.30 | 1.29 |
| Natural pendulum period (s) | 0.17 | 0.27 | 1.59 |
| Max COM acceleration (m/s²) | 3.03 | 1.09 | 0.36 |
| Max torque/mass ratio | 0.89 | 0.76 | 0.85 |

Fig. 15. The servo platform to generate controlled perturbation. (a) The ankle strategy alone cannot withstand the perturbation generated by the moving platform. (b) The robot can withstand the same magnitude of perturbation with the hip strategy.

5.2. Empirical decision boundary with physical robot

We applied various magnitudes of perturbations to the robot from the front, back, and one side while running the ankle strategy controller, and we measured the inertial sensor readings for one second to generate state trajectories of the robot. Figure 16 shows the state trajectories in the frontal and lateral axes, which are filtered with a moving average filter with $n = 3$. For the frontal pushes, we can see that the trajectory plot shown in the top part of Fig. 16(a) closely follows the graph acquired using the simulated multi-body model in Fig. 11(a), showing an almost linear boundary between stable and unstable trajectories, while the slope is quite different from the theoretical one from the LIPM shown in gray shade. However, for backward pushes, we see that the shape of the boundary is nonlinear at the initial part of the trajectory. This is due to mechanical backlash of the joints, and it is only noticeable for backward pushes, as the robot leans slightly to the front in the default standing pose, which eliminates the effect of backlash for frontal pushes. From the sets of state trajectories, we obtain the linear boundaries with estimated parameters $\delta$ and $z_0$ shown in Table 3.
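The smoothing step can be sketched in Python as follows. This is a minimal sketch that assumes a trailing (causal) window, with the first samples averaged over however many readings are available; whether a trailing or a centered window was used for Fig. 16 is not stated, and the function name is hypothetical.

```python
def moving_average(samples, n=3):
    """Trailing moving average over up to n samples.

    The first n - 1 outputs average only the samples seen so far, so the
    output has the same length as the input.
    """
    smoothed = []
    for i in range(len(samples)):
        window = samples[max(0, i - n + 1):i + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Example: smoothing a short sequence of tilt-rate readings.
print(moving_average([1.0, 2.0, 3.0, 4.0, 5.0]))
```

A short window such as $n = 3$ suppresses sample-to-sample sensor noise while adding only a small delay at a 100 Hz control rate.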

5.3. Testing the push recovery controller

After estimating the boundary values shown in Table 3, we test the hierarchical push recovery controller against perturbations in a realistic setting. Figure 17(a) shows the experimental setup. In each test, a pendulum starts swinging from a stationary state, where the initial position is determined experimentally so that the perturbation is large enough to knock down the standing robot without any stabilization. We use a pendulum mass of 500 g, a length of 75 cm, and swing angles of 30° and 45° for the frontal and lateral trials, which translate into perturbations of 1.35 Ns and 1.61 Ns, respectively. For each of the three push recovery settings, we performed five trials to obtain the standard deviation, where each trial consists of 20 tests. Figure 17(b) shows the comparison of the three stabilization methods. Due to a number of causes such as battery depletion, slight differences in impact position and temperature buildup at the actuators, there is some deviation in the results, but our controller still significantly helps the robot to withstand large disturbances. Interestingly, we have found that the physical robot can withstand larger perturbations than the simulated one in Fig. 13, probably due to the longer impact duration with the physical setup.

Fig. 16. Phase space trajectory plots acquired from frontal and lateral push experiments with the DARwIn-OP robot. (a) Frontal push. (b) Lateral push.

We then let the robot walk with nonzero speed and applied disturbances to the robot using a soft-tipped stick to see how the walk controller handles reactive stepping during locomotion. Figure 18 shows some examples of the robot's response to external disturbances. We see that the suggested controller can successfully trigger appropriate push recovery behaviors during locomotion to keep the robot from falling down.

Fig. 17. The comparison of push recovery controller performances using the DARwIn-OP robot. (a) The experimental setup. (b) Test results.

Table 3. Parameter values estimated from the DARwIn-OP robot.

| Parameter | $\delta^+$ | $z_0^+$ | $\delta^-$ | $z_0^-$ |
|---|---|---|---|---|
| Frontal | 1.3 | 2.7 | −1.1 | 2.7 |
| Lateral | 1.4 | 2.7 | −1.4 | 2.7 |



Fig. 18. Responses of the push recovery controller against perturbation while walking. (a) The ankle and step strategies while walking forward at 18 cm/s. (b) The ankle and step strategies while walking backward at 12 cm/s. (c) The ankle and hip strategies while walking forward at 18 cm/s. (d) The ankle and hip strategies while turning at 0.6 rad/s.


6. Conclusions

We have demonstrated an integrated controller that enables full-body push recovery for humanoid robots without specialized sensors and actuators. Three low-level biomechanically motivated push recovery strategies are implemented on a position controlled humanoid robot and integrated with a ZMP-based walk controller that allows reactive stepping. Instead of relying on inaccurate theoretical decision surfaces, we propose to use a low-dimensional empirical decision surface for a hierarchical controller that is learned from repeated trials, both in a simulated environment using a multi-body model with proportional control joints and in a real environment using a servo-controlled moving platform and the DARwIn-OP small humanoid robot. Experimental results show that the trained controller can successfully initiate a full body push recovery behavior under external perturbations. Potential future work includes incorporating more sophisticated learning algorithms to better utilize the limited training data, and implementing these algorithms on full-sized humanoid robots.

Acknowledgments

We acknowledge the support of the NSF PIRE program under contract OISE-0730206, and the ONR SAFFIR program under contract N00014-11-1-0074.

References

1. A. G. Hofmann, Robust Execution of Bipedal Walking Tasks from Biomechanical Principles, Ph.D. Thesis, Computer Science Department (Massachusetts Institute of Technology, Cambridge, MA, USA, 2006), 407 pp.

2. S. Kajita and K. Tani, Study of dynamic biped locomotion on rugged terrain, in IEEE Int. Conf. Robotics and Automation (Sacramento, CA, 1991), pp. 1405–1411.

3. S. Kajita, F. Kanehiro, K. Kaneko, K. Fujiwara, K. Harada and K. Yokoi, Biped walking pattern generation by using preview control of zero-moment point, in IEEE Int. Conf. Robotics and Automation (2003), pp. 1620–1626.

4. K. Harada, S. Kajita, K. Kaneko and H. Hirukawa, An analytical method on real-time gait planning for a humanoid robot, in IEEE–RAS Int. Conf. Humanoid Robots, Vol. 2 (2004), pp. 640–655.

5. M. Morisawa, K. Harada, S. Kajita, K. Kaneko, F. Kanehiro, K. Fujiwara, S. Nakaoka and H. Hirukawa, A biped pattern generation allowing immediate modification of foot placement in real-time, in IEEE–RAS Int. Conf. Humanoid Robots (2006), pp. 581–586.

6. R. Tajima, D. Honda and K. Suga, Fast running experiments involving a humanoid robot, in IEEE Int. Conf. Robotics and Automation (Piscataway, NJ, USA, 2009), pp. 1418–1423.

7. J. Pratt, J. Carff and S. Drakunov, Capture point: A step toward humanoid push recovery, in 6th IEEE–RAS Int. Conf. Humanoid Robots (2006), pp. 200–207.

8. B. Stephens, Humanoid push recovery, in IEEE–RAS Int. Conf. Humanoid Robots (2007).

9. B. Jalgha, D. C. Asmar and I. Elhajj, A hybrid ankle/hip pre-emptive falling scheme for humanoid robots, in IEEE Int. Conf. Robotics and Automation (2011), pp. 1256–1262.



10. B.-K. Cho, S.-S. Park and J.-H. Oh, Stabilization of a hopping humanoid robot for a push, in IEEE–RAS Int. Conf. Humanoid Robots (2010), pp. 60–65.

11. M. Morisawa, F. Kanehiro, K. Kaneko, N. Mansard, J. Sol, E. Yoshida, K. Yokoi and J.-P. Laumond, Combining suppression of the disturbance and reactive stepping for recovering balance, in IEEE/RSJ Int. Conf. Intelligent Robots and Systems (IEEE, 2010), pp. 3150–3156.

12. K. Hirai, M. Hirose, Y. Haikawa and T. Takenaka, The development of Honda humanoid robot, in IEEE Int. Conf. Robotics and Automation, Vol. 2 (IEEE, 1998), pp. 1321–1326.

13. I.-W. Park, J.-Y. Kim, J. Lee and J.-H. Oh, Mechanical design of humanoid robot platform khr-3 (kaist humanoid robot 3: Hubo), in IEEE–RAS Int. Conf. Humanoid Robots (2005), pp. 321–326.

14. S. Kajita, T. Nagasaki, K. Kaneko, K. Yokoi and K. Tanie, A running controller of humanoid biped hrp-2lr, in IEEE Int. Conf. Robotics and Automation (2005), pp. 616–622.

15. T. Buschmann, S. Lohmeier and H. Ulbrich, Humanoid robot Lola: Design and walking control, J. Physiology-Paris 103(3–5) (2009) 141–148.

16. B.-K. Cho, J.-H. Kim and J.-H. Oh, Online balance controllers for a hopping and running humanoid robot, Adv. Robot. 25(9–10) (2011) 1209–1225.

17. S. Kajita, M. Morisawa, K. Miura, S. Nakaoka, K. Harada, K. Kaneko, F. Kanehiro and K. Yokoi, Biped walking stabilization based on linear inverted pendulum tracking, in IEEE/RSJ Int. Conf. Intelligent Robots and Systems (IEEE, 2010), pp. 4489–4496.

18. D. Gouaillier, C. Collette and C. Kilner, Omni-directional closed-loop walk for NAO, in IEEE–RAS Int. Conf. Humanoid Robots (2010), pp. 448–454.

19. I. Ha, Y. Tamura and H. Asama, Gait pattern generation and stabilization for humanoid robot based on coupled oscillators, in IEEE/RSJ Int. Conf. Intelligent Robots and Systems (2011), pp. 3207–3212.

20. V. Prahlad, D. Goswami and M.-H. Chia, Disturbance rejection by online ZMP compensation, Robotica 26 (2008) 9–17.

21. C. Graf and T. Röfer, A closed-loop 3D-LIPM gait for the Robocup standard platform league humanoid, in Fourth Workshop on Humanoid Soccer Robots (2010), pp. 18–22.

22. M. Abdallah and A. Goswami, A biomechanically motivated two-phase strategy for biped upright balance control, in IEEE Int. Conf. Robotics and Automation (2005), pp. 2008–2013.

23. S.-H. Lee and A. Goswami, Ground reaction force control at each foot: A momentum-based humanoid balance controller for non-level and non-stationary ground, in IEEE/RSJ Int. Conf. Intelligent Robots and Systems (2010), pp. 3157–3162.

24. B. Stephens, Push Recovery Control for Force-Controlled Humanoid Robots, Ph.D. Thesis (Pittsburgh, PA, USA, 2011), 180 pp.

25. H. Diedam, D. Dimitrov, P.-B. Wieber, K. Mombaur and M. Diehl, Online walking gait generation with adaptive foot positioning through linear model predictive control, in IEEE/RSJ Int. Conf. Intelligent Robots and Systems (2008), pp. 1121–1126.

26. B. Stephens and C. Atkeson, Modeling and control of periodic humanoid balance using the linear biped model, in IEEE–RAS Int. Conf. Humanoid Robots (2009), pp. 379–384.

27. D. L. Wight, E. G. Kubica and D. W. L. Wang, Introduction of the foot placement estimator: A dynamic measure of balance for bipedal robotics, J. Comput. Nonlinear Dynam. 3(1) (2008) 011009.

28. S.-K. Yun and A. Goswami, Momentum-based reactive stepping controller on level and non-level ground for humanoid robot push recovery, in IEEE/RSJ Int. Conf. Intelligent Robots and Systems (IEEE, 2011), pp. 3943–3950.


29. T. Sugihara, Standing stabilizability and stepping maneuver in planar bipedalism basedon the best COM-ZMP regulator, in Proc. 2009 IEEE Int. Conf. Robotics and Auto-mation (ICRA'09) (2009), pp. 669–674.

30. K. Nishiwaki and S. Kagami, High frequency walking pattern generation based on pre-view control of ZMP, in IEEE Int. Conf. Robotics and Automation (2006), pp. 2667–2672.

31. M. Missura and S. Behnke, Lateral capture steps for bipedal walking, in IEEE–RAS Int.Conf. Humanoid Robots (2011), pp. 401–408.

32. M. Missura and S. Benke, Omnidirectional capture steps for bipedal walking, in IEEE Int.Conf. Humanoid Robots (2013), pp. 14–20.

33. S.-J. Yi, B.-T. Zhang, D. Hong and D. D. Lee, Learning full body push recovery controlfor small humanoid robots, in IEEE Int. Conf. Robotics and Automation (2011),pp. 2047–2052.

34. S.-J. Yi, B.-T. Zhang, D. Hong and D. D. Lee, Online learning of a full body push recoverycontroller for omnidirectional walking, in IEEE–RAS Int. Conf. Humanoid Robots (2011),pp. 1–6.

35. R. Renner and S. Behnke, Instability detection and fall avoidance for a humanoid usingattitude sensors and re°exes, in IEEE/RSJ Int. Conf. Intelligent Robots and Systems(2006), pp. 2967–2973.

36. D. N. Nenchev and A. Nishio, Ankle and hip strategies for balance recovery of a bipedsubjected to an impact, Robotica 26(5) (2008) 643–653.

37. B. Jalgha and D. Asmar, A simple momentum controller for humanoid push recovery, inAdvances in Robotics, Vol. 5744 (Springer, Berlin, 2009). Lecture Notes in ComputerScience, pp. 95–102.

38. B. Hengst, M. Lange and B. White, Learning ankle-tilt and foot-placement control for°at-footed bipedal balancing and walking, in IEEE–RAS Int. Conf. Humanoid Robots(2011), pp. 288–293.

39. O. Michel, Webots: Professional mobile robot simulation, J. Adv. Robot. Syst. 1(1) (2004)39–42.

40. S. G. McGill, J. Brindza, S.-J. Yi and D. D. Lee, Uni¯ed humanoid robotics softwareplatform, in 5th Workshop on Humanoid Soccer Robots (2010), pp. 7–11.

41. S.-J. Yi, S. G. McGill, L. Vadakedathu, Q. He, I. Ha, J. Han, H. Song, M. Rouleau, B.-T.Zhang, D. Hong, M. Yim and D. D. Lee, Team THOR's entry in the DARPA roboticschallenge trials 2013, J. Field Robot. 32(3) (2014) 315–335.

42. S.-G. McGill, S. Yi and D. D. Lee, Team THOR's adaptive autonomy for disaster response humanoids, in IEEE Int. Conf. Humanoid Robots (2015), pp. 453–460.

Seung-Joon Yi received the B.Sc. degree from the School of Electrical Engineering and the Ph.D. degree from the School of Computer Science and Engineering, Seoul National University, Seoul, Korea, in 2000 and 2013, respectively. He is currently a Postdoctoral Fellow at the GRASP Laboratory, University of Pennsylvania, where he also worked as a Visiting Scholar from 2009 to 2013. He is the author of over 20 technical publications, proceedings, editorials and books. He has been the main developer of the University of Pennsylvania RoboCup robotic soccer team and the DARPA Robotics Challenge team. His research interests include reinforcement learning and humanoid robotics.



Byoung-Tak Zhang received the B.Sc. and M.Sc. degrees in Computer Science and Engineering from Seoul National University (SNU), Seoul, Korea, in 1986 and 1988, respectively, and the Ph.D. degree in Computer Science from the University of Bonn, Bonn, Germany, in 1992. He is currently a Professor with the School of Computer Science and Engineering and the Graduate Programs in Bioinformatics, Brain Science and Cognitive Science, SNU, and directs the Biointelligence Laboratory and the Center for Bioinformation Technology. Prior to joining SNU, he was a Research Associate with the German National Research Center for Information Technology (GMD) from 1992 to 1995. From August 2003 to August 2004, he was a Visiting Professor with the Computer Science and Artificial Intelligence Laboratory (CSAIL), MIT, Cambridge. His research interests include probabilistic models of learning and evolution, biomolecular/DNA computing, and molecular learning/evolvable machines.

Dennis Hong is an Associate Professor and the Founding Director of the Robotics and Mechanisms Laboratory (RoMeLa) of the Mechanical Engineering Department at Virginia Tech. His research focuses on robot locomotion and manipulation, autonomous vehicles and humanoid robots. His past awards include the NSF CAREER Award, the SAE Ralph R. Teetor Award and the ASME Freudenstein/GM Young Investigator Award, and he has been named to Popular Science's "Brilliant 10," to name a few. As the inventor of a number of novel robots and mechanisms, he was called "the Leonardo da Vinci of robots" by Washington Post magazine. He received his degrees in Mechanical Engineering: a B.Sc. from the University of Wisconsin Madison (1994), and M.Sc. and Ph.D. degrees from Purdue University (1999, 2002).

Daniel D. Lee is currently a Professor in the School of Engineering and Applied Science at the University of Pennsylvania. He studied Physics, receiving his A.B. from Harvard in 1990, and his Ph.D. in Condensed Matter Physics from MIT in 1995. After completing his studies, he joined Bell Labs, the research and development arm of Lucent Technologies, where he was a Researcher in the Theoretical Physics and Biological Computation departments. After six years in industrial research, he joined the faculty at Penn in 2001 where he is currently in the Electrical and Systems Engineering Department and at the GRASP Robotics Laboratory. His research interests include machine learning, robotics and computational neuroscience.
