Whole-Body Balancing Walk Controller
for Position Controlled Humanoid Robots
Seung-Joon Yi
GRASP Laboratory, University of Pennsylvania,
Philadelphia PA 19104, USA
Byoung-Tak Zhang
BI Laboratory, Seoul National University, Seoul, Korea
Dennis Hong
RoMeLa Laboratory, University of California, Los Angeles CA 90095, USA
Daniel D. Lee
GRASP Laboratory, University of Pennsylvania,
Philadelphia PA 19104, USA
Received 25 May 2015
Accepted 13 January 2016
Published 17 March 2016
Bipedal humanoid robots are intrinsically unstable against unforeseen perturbations.
Conventional zero moment point (ZMP)-based locomotion algorithms can reject perturbations by
incorporating sensory feedback, but they are less effective than the dynamic full body behaviors
humans exhibit when pushed. Recently, a number of biomechanically motivated push recovery
behaviors have been proposed that can handle larger perturbations. However, these methods are
based upon simplified and transparent dynamics of the robot, which makes them suboptimal to
implement on common humanoid robots with local position-based controllers. To address this
issue, we propose a hierarchical control architecture. Three low-level push recovery controllers
are implemented for position controlled humanoid robots that replicate human recovery
behaviors. These low-level controllers are integrated with a ZMP-based walk controller that is
capable of generating reactive step motions. The high-level controller constructs empirical
decision boundaries to choose the appropriate behavior based upon trajectory information
gathered during experimental trials. Our approach is evaluated in physically realistic simulations
and on a commercially available small humanoid robot.
Keywords: Position controlled humanoid robot; biomechanically motivated push recovery;
low-dimensional policy; online learning.
International Journal of Humanoid Robotics
Vol. 13, No. 1 (2016) 1650011 (28 pages)
© World Scientific Publishing Company
DOI: 10.1142/S0219843616500110
Int. J. Human. Robot. 2016.13. Downloaded from www.worldscientific.com by SEOUL NATIONAL UNIVERSITY on 06/18/16. For personal use only.
1. Introduction
Due to their small footprint and high center of mass (COM), bipedal humanoid
robots are prone to losing balance because of uneven floors, robot modeling errors, or
imprecise actuators. Thus, active stabilization of humanoid robots has been an
important topic in robotics research. Biomechanical studies of human walking and
balancing behavior showed that humans use three basic balance control strategies,
denoted the ankle, hip and step strategies, which are illustrated in Figs. 1(a)–1(c) [1]. The
ankle strategy controls torque at the ankle joint, the hip strategy uses the angular
acceleration of the torso and free limbs to apply a counteractive ground reaction force
(GRF), and the step strategy changes the base of support to a new position. All three
strategies seek to control the horizontal position of the system's COM by changing
the horizontal component of the GRF.
The conventional approach to bipedal locomotion control uses zero moment point
(ZMP)-based control algorithms built upon the linear inverted pendulum model
(LIPM) [2]. The reference ZMP trajectory is typically designed in advance according to
footstep locations, then the torso and foot trajectories are calculated from the
reference ZMP using the LIPM [3]. Stabilization is accomplished by measuring the state
error and applying feedback control to track the reference ZMP, which updates the COM
trajectory and generates an inertial force resulting in an effective control torque at
the ankle joints, as shown in Fig. 1(d). Closed-loop ZMP tracking approaches are
usually confined to the ankle strategy, as reactive stepping requires online
modification of the ZMP trajectory. However, there has recently been some work on
simultaneously generating COM and ZMP trajectories in real time to enable the step
strategy [4–6].
The main advantage of ZMP tracking-based approaches is that they can easily be
integrated in existing walk controllers, and they have been successfully incorporated
Fig. 1. A comparison of three biomechanically motivated push recovery approaches and the ZMP
tracking approach. (a) Ankle strategy. (b) Hip strategy. (c) Step strategy. (d) ZMP tracking approach.
on many humanoid robot platforms. However, they usually require fast online
computation, a precise dynamic model of the robot, and accurate estimation of the
current dynamic state, which makes them harder to use on resource constrained robots
with restricted actuation, sensing and processing capabilities. The ankle strategy
alone has limited effectiveness against strong perturbations. The step strategy can be
used after large perturbations, but it is not always physically feasible due to step
timing or foot configuration. Figure 2 shows an example where the ZMP tracking-based
ankle/step controller fails to stabilize the robot.
On the other hand, an active line of research has focused on the theoretical
analysis of biomechanically motivated push recovery controllers using an abstract
model of the robot. These models include ankle control torque for the ankle
strategy, a flywheel body and hip control torque for the hip strategy, and a secondary
support point for the step strategy [7–9]. Such approaches result in very simple
analytical controllers that can reject stronger perturbations as they utilize angular
momentum degrees of freedom. However, the biggest drawback of these approaches
is that most of them assume simplified and transparent dynamics of the
robot, which is often hard to realize as most of the humanoid robots currently
available have highly distributed mass and local position-based controllers with
high feedback gain.
Our aim is to get the best of both worlds, devising an integrated walk controller
that can exhibit the full range of biomechanically inspired behaviors to respond to
external perturbations. We take a hybrid approach where walking is governed by a
ZMP-based walk controller, and large perturbations trigger simple biomechanically
motivated push recovery controllers. First, we design a simple ZMP-based walking
controller that simultaneously plans the ZMP and COM trajectories in real time for
reactive stepping. To incorporate the biomechanically motivated push recovery
controllers, we utilize a hierarchical architecture consisting of low-level controllers
that each govern one biomechanically motivated push recovery behavior and a
high-level controller that switches between the low-level controllers based on the current
state of the robot. Instead of relying upon the accuracy of the theoretical model
Fig. 2. Comparison of the ZMP tracking approach and the biomechanical push recovery approach under
lateral pushes during walking. An impulsive lateral force for 0.01 s is applied to the COM of the robot at
the middle of the single support phase. Note that the step strategy is not possible for this case due to
kinematic constraints. (a) The ZMP tracking approach, 0.9 Ns of lateral push. (b) The ZMP tracking
approach, 1.2 Ns of lateral push. (c) The ankle and hip strategies, 1.2 Ns of lateral push.
parameters [10,11], we use empirical decision boundaries between the controllers that
are learned from experience.
The main contribution of this work is twofold. From the theoretical point of view,
we show that the physical humanoid robot has similarly shaped but quantitatively
different stability regions from those derived by theoretical models of varying
simplicity. In terms of implementation, we propose an integrated system that effectively
combines three push recovery behaviors and a walk controller to enable a humanoid
robot to perform push recovery behaviors while walking. We demonstrate how this
controller is learned from experience and evaluate its performance on a small
humanoid robot.
The remainder of the paper is organized as follows. Section 2 reviews three
biomechanically motivated push recovery controllers and their implementations on
position controlled humanoid robots. Section 3 explains the step-based omnidirectional
walk controller which can perform reactive stepping for push recovery control
during walking. Section 4 shows how to learn the high-level controller from repeated
trials in a simulated environment, and Sec. 5 shows the experimental results using
the DARwIn-OP humanoid robot. Finally, we conclude with a discussion of
outstanding issues and potential future directions arising from this work.
2. Biomechanically Motivated Push Recovery Controllers
for Position Controlled Robots
Biomechanical studies show that humans display three distinctive motion patterns in
response to sudden external perturbations, which we denote as the ankle, hip and step
push recovery strategies [1]. The ankle strategy applies control torque at the ankle
joint, the hip strategy uses the angular acceleration of the torso and free limbs to apply
a counteractive GRF, and finally the step strategy changes the base of support to a
new position. For each push recovery strategy, we first review the basic push
recovery controllers for the simplified model, and then explain how we implement the
behaviors of such controllers on resource constrained humanoid robots which lack
force/torque control and only provide position-based control with high proportional
gain. Finally, we explain how we handle possible issues with those controllers when
the robot is moving.
2.1. Ankle push recovery
The ankle strategy applies control torque on the ankle joints to keep the COM
within the base of support. It is widely adopted in the form of closed-loop ZMP
tracking, and this approach has been successfully implemented on a number of full-sized
humanoid robots equipped with force/torque sensors at the ankles [3,12–16].
Furthermore, it was recently shown that the approach is robust enough to make a full-sized
humanoid robot walk on a public street with unknown surface inclinations and
unevenness [17]. It has also been widely implemented on small humanoid robots and
two current commercially available small humanoid robots, Nao (footnote a) and
DARwIn-OP (footnote b), are provided with walk controllers using the ankle strategy for
stabilization [18,19]. There have also been other closed-loop walk control implementations
utilizing the ankle strategy on small humanoid robots [20,21].
We first examine the abstract model in Fig. 3(a), where ankle torque $\tau_{\mathrm{ankle}}$ is
applied to a LIPM with mass $m$, COM height $z_0$ and COM horizontal position $x$ from
the current support point. The resulting linearized dynamic model is
$$\ddot{x} = \omega^2 \left(x - \tau_{\mathrm{ankle}}/mg\right), \tag{1}$$
where $\omega = \sqrt{g/z_0}$ and $g$ is the gravitational acceleration. If we assume a reference
trajectory $x_{\mathrm{ref}}$ which satisfies the LIPM without additional ankle torque,
$$\ddot{x}_{\mathrm{ref}} = \omega^2 x_{\mathrm{ref}}, \tag{2}$$
then the state error $x_{\mathrm{err}} = x - x_{\mathrm{ref}}$ follows the same dynamic model as (1):
$$\ddot{x}_{\mathrm{err}} = \omega^2 \left(x_{\mathrm{err}} - \tau_{\mathrm{ankle}}/mg\right), \tag{3}$$
which can be controlled by PD control on $x_{\mathrm{err}}$:
$$\tau_{\mathrm{ankle}} = K_p x_{\mathrm{err}} + K_d \dot{x}_{\mathrm{err}}, \tag{4}$$
where $K_p$ and $K_d$ are control gains. This requires torque control of ankle actuators,
but in practice it can be approximated for position controlled actuators with
proportional control by directly setting the target angle of the ankle actuator
$$\Delta\theta_{\mathrm{ankle}} = K'_p x_{\mathrm{err}} + K'_d \dot{x}_{\mathrm{err}}, \tag{5}$$
Footnote a: http://www.aldebaran-robotics.com/
Footnote b: http://www.robotis.com/xe/darwin_en
Fig. 3. The ankle strategy that applies control torque on ankle joints. (a) The abstract model for the
ankle strategy. (b) The ankle strategy implemented on the DARwIn-OP humanoid robot.
where $\Delta\theta_{\mathrm{ankle}}$ is the target ankle angle bias [10,19]. In addition to the ankle joints, we use
the same control law to modulate arm position to apply additional effective torque at
the ankles in a similar way, unless overridden by the hip controller.
When the robot is walking, we only apply ankle bias to the current support foot
during the middle phase of single support. This prevents the ankle strategy from setting
a nonzero ankle bias for the foot currently in the air, which can result in premature
landing. We use a trapezoid function $f(\phi)$ to make a smooth transition at landing
and takeoff:
$$\Delta\theta_{\mathrm{ankle}} = f(\phi_{\mathrm{single}}) \left(K'_p x_{\mathrm{err}} + K'_d \dot{x}_{\mathrm{err}}\right), \tag{6}$$
where $0 \le \phi_{\mathrm{single}} < 1$ is the single support phase and $f(\phi)$ is the following function:
$$f(\phi) = \begin{cases} \phi/\phi_{\mathrm{lift}} & 0 \le \phi < \phi_{\mathrm{lift}}, \\ 1 & \phi_{\mathrm{lift}} \le \phi < \phi_{\mathrm{land}}, \\ (1-\phi)/(1-\phi_{\mathrm{land}}) & \phi_{\mathrm{land}} \le \phi < 1, \end{cases} \tag{7}$$
where $\phi_{\mathrm{lift}}$ and $\phi_{\mathrm{land}}$ are timing parameters. Figure 3(b) shows the ankle strategy
controller implemented on the DARwIn-OP small humanoid robot.
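The gated control law of Eqs. (6) and (7) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the gain values and the default timing parameters `phi_lift=0.1` and `phi_land=0.9` are assumptions chosen for the example, and returning zero outside the single support phase is likewise an assumption.

```python
def trapezoid_gate(phi, phi_lift, phi_land):
    """Trapezoid function of Eq. (7) over the single support phase 0 <= phi < 1.

    Assumption: outside that interval the gate is taken to be zero, so no
    ankle bias is applied when the foot is not in single support.
    """
    if not 0.0 <= phi < 1.0:
        return 0.0
    if phi < phi_lift:                       # ramp up after takeoff
        return phi / phi_lift
    if phi < phi_land:                       # full bias in the middle phase
        return 1.0
    return (1.0 - phi) / (1.0 - phi_land)    # ramp down before landing


def ankle_bias(x_err, xdot_err, phi_single, kp, kd, phi_lift=0.1, phi_land=0.9):
    """Gated PD law of Eq. (6): target ankle angle bias for the support foot.

    kp and kd stand for the position-control gains K'_p and K'_d; the default
    timing parameters are hypothetical values, not taken from the paper.
    """
    return trapezoid_gate(phi_single, phi_lift, phi_land) * (kp * x_err + kd * xdot_err)
```

With these assumed values the bias ramps linearly from zero at takeoff, stays at the full PD output during the middle of single support, and ramps back to zero before landing, so the swing foot is never commanded a nonzero bias at touchdown.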
2.2. Hip push recovery
The hip strategy uses angular acceleration of the torso and limbs to generate a
backward GRF to pull the COM back towards the base of support. A two-phase
hip strategy for a humanoid has been suggested, which uses angular acceleration
to absorb the disturbance in the reflex phase and returns to the initial pose in the recovery
phase [22]. An extended LIPM with angular momentum was used to derive analytic
control laws for the hip and the step strategy, and the concept of the capture point was
suggested as the calculated stepping position for the step strategy [7]. This approach was
further extended by using a simplified model that results in analytic decision surfaces
for push recovery strategies as functions of the state of the robot [8,9]. These approaches
were extended to control the GRF and ZMP at each foot using angular momentum,
and were shown to balance a 3D full-body model of a humanoid robot in a simulated
environment on nonlevel and nonstationary ground [23]. The hip strategy for a
stationary robot has also been implemented on a full-sized, torque controlled humanoid
robot [24].
The abstract model in Fig. 4(a) includes a flywheel with mass $m$, COM height $z_0$
and rotational inertia $I$, and control torque $\tau_{\mathrm{hip}}$ applied at the center of the flywheel.
The resulting linearized dynamic model is then
$$\ddot{x} = \omega^2 \left(x - \tau_{\mathrm{hip}}/mg\right), \tag{8}$$
$$\ddot{\theta}_{\mathrm{hip}} = \tau_{\mathrm{hip}}/I. \tag{9}$$
However, the flywheel should not exceed joint limits. In this case, the following
bang–bang profile can be used for applying hip torque to maximize the effect while
satisfying the joint angle constraint [7]:
$$\tau_{\mathrm{hip}}(t) = \begin{cases} \tau^{\mathrm{MAX}}_{\mathrm{hip}} & 0 \le t < T_{H1}, \\ -\tau^{\mathrm{MAX}}_{\mathrm{hip}} & T_{H1} \le t < 2T_{H1}, \end{cases} \tag{10}$$
where $\tau^{\mathrm{MAX}}_{\mathrm{hip}}$ is the maximum torque that can be applied on the torso and $T_{H1}$ is the
time at which the torso stops accelerating. This torque profile angularly accelerates the torso
with maximum torque and then decelerates it with maximum negative torque, making
it stop at angle $\theta^{\mathrm{MAX}}_{\mathrm{hip}}$. This behavior can be approximately implemented with high
gain position controlled actuators by directly setting the hip target angle bias
$\Delta\theta^{\mathrm{TARGET}}_{\mathrm{hip}}$ to $\theta^{\mathrm{MAX}}_{\mathrm{hip}}$, which makes the torso accelerate with the maximum torque and
stop at that position with nearly maximum deceleration. After $t = 2T_{H1}$, the hip
angle bias should return to zero [22]. This two-phase behavior can be simply
implemented as
$$\Delta\theta^{\mathrm{TARGET}}_{\mathrm{hip}} = \begin{cases} \theta^{\mathrm{MAX}}_{\mathrm{hip}} & 0 \le t < 2T_{H1}, \\ \theta^{\mathrm{MAX}}_{\mathrm{hip}} \dfrac{2T_{H1} + T_{H2} - t}{T_{H2}} & 2T_{H1} \le t < 2T_{H1} + T_{H2}, \end{cases} \tag{11}$$
where $T_{H2}$ is the duration of the returning phase. The same controller is used for arm
angles to apply additional GRF from the angular momentum of the limbs as well.
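The two-phase target of Eq. (11) is simple enough to write down directly. The following Python sketch is only illustrative: the function name is hypothetical, and returning zero before the strategy is triggered and after the returning phase ends is an assumption consistent with the statement that the bias should return to zero.

```python
def hip_target_bias(t, theta_max, t_h1, t_h2):
    """Hip target angle bias of Eq. (11), with t measured from the trigger time.

    theta_max is the angle at which the torso should stop, t_h1 the duration
    of each bang-bang half, and t_h2 the duration of the returning phase.
    Assumption: the bias is zero before triggering and after the return ends.
    """
    if t < 0.0:
        return 0.0
    if t < 2.0 * t_h1:
        # Reflex phase: a high gain position servo drives the joint toward
        # theta_max, approximating the bang-bang torque profile of Eq. (10).
        return theta_max
    if t < 2.0 * t_h1 + t_h2:
        # Recovery phase: ramp the bias linearly back to zero.
        return theta_max * (2.0 * t_h1 + t_h2 - t) / t_h2
    return 0.0
```

Calling this function once per control cycle and adding the result to the nominal hip (and arm) joint targets reproduces the step-then-ramp command described above.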
When the robot is pushed hard during walking, the robot may lift its currently
tipped foot, which can instantly destabilize the robot. To prevent this, when the hip
strategy is initiated, we shorten the single support phase and extend the double
support phase until the hip strategy is completed and the robot stands stably on two
feet. Figure 4(b) shows the hip strategy controller implemented on the DARwIn-OP
small humanoid robot.
Fig. 4. The hip strategy uses angular acceleration of the torso and limbs to apply a counteractive GRF. (a) The
abstract model for the hip strategy. (b) The hip strategy implemented on the DARwIn-OP humanoid robot.
2.3. Step push recovery
When the magnitude of the disturbance exceeds the capability of the other two
push recovery controllers, the step controller can be used to move the base of
support towards the direction of the push by taking a step. If we assume that the
push occurs while the robot is in the single support phase, this strategy can be
implemented in a straightforward manner by changing the landing position of the currently
lifted foot towards the direction of perturbation. This step strategy has been
implemented on various full-sized humanoid robots, including HRP-2 [25,11] and the
Sarcos robot [26] while walking, and Hubo [10] and the Toyota partner robot [6] while
hopping in place. There have been some analytical studies of where the robot
should step assuming simplified models, including the capture point [7], foot
placement estimator [27] and generalized foot placement estimator [28] approaches. They all
share the inverted pendulum model shown in Fig. 5(a), which models the step
strategy as three stages: an initial single support stage from the initial condition,
a support point transition stage, and a final single support stage to a stable state. Their
main difference is how they model each stage. In the capture point approach, a LIPM is
used for all three stages, and the support point transition is assumed to occur
instantaneously while preserving linear momentum, which results in the following landing
position from the initial support point [7]:
$$x_{\mathrm{capture}} = \dot{x}/\omega + x. \tag{12}$$
In reality, we cannot instantly change the support point, and landing impacts
reduce the linear momentum. In Ref. 26, an inverted pendulum model with fixed
leg length $z_0$ and pendulum tilt angle $\theta$ is used for the first and second stages, and a
LIPM with body height $z_0$ is used for the third stage. Landing is modeled as an
impulse force along the landing leg, which makes the vertical velocity descend to
zero. In Refs. 28 and 10, an inverted pendulum model with leg length $l$ and an
angular momentum conserving impact model for the transition are used. Those models
do not admit a closed form solution in general, but an approximate solution is
Fig. 5. The step strategy changes the support point by stepping. (a) The abstract model for the step
strategy. (b) The step strategy implemented on the DARwIn-OP humanoid robot.
provided in Ref. 10 as:
$$x_{\mathrm{capture}} = 2\cos(a/2), \tag{13}$$
$$a = 2\cos^{-1}\left(1 - \sqrt{\left(l\dot{\theta}^2/2 + \cos\theta - 1\right)/8}\right). \tag{14}$$
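For the simplest of these models, the LIPM capture point of Eq. (12), the defining property can be checked numerically: if the support point is moved instantaneously to $x_{\mathrm{capture}}$, the COM converges to it and comes to rest. The Python sketch below uses the closed-form LIPM solution about a fixed support point. The function names, the COM height of 0.3 m and the push velocity of 0.5 m/s are assumptions made for illustration, not values from the paper.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def capture_point(x, xdot, z0, g=G):
    """LIPM capture point of Eq. (12), measured from the current support point."""
    omega = math.sqrt(g / z0)
    return x + xdot / omega


def lipm_state(x0, xdot0, support, z0, t, g=G):
    """Closed-form LIPM state at time t with the support point held fixed.

    Solves xddot = omega^2 * (x - support) from the initial state (x0, xdot0).
    """
    omega = math.sqrt(g / z0)
    d = x0 - support
    x = support + d * math.cosh(omega * t) + (xdot0 / omega) * math.sinh(omega * t)
    xdot = d * omega * math.sinh(omega * t) + xdot0 * math.cosh(omega * t)
    return x, xdot


# Hypothetical example: COM over the support point, pushed to 0.5 m/s.
z0 = 0.3
x_cp = capture_point(0.0, 0.5, z0)
x_end, v_end = lipm_state(0.0, 0.5, x_cp, z0, t=3.0)
```

Stepping short of `x_cp` leaves a growing exponential term in the solution, which is why the more detailed impact models discussed above matter on a physical robot that loses momentum at landing.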
One practical issue for a physical implementation of the step strategy is the
landing shock. As the step strategy is meant to be used with large perturbation, it
can lead to a hard landing that can make the robot bounce back and fall down.
There have been approaches to handle this by incorporating mechanical or elec-
trical compliance, and we use a simpler approach of lowering the proportional gain
for the swing leg at the later part of stepping. Figure 5(b) shows the step strategy
implemented on the DARwIn-OP robot.
We should also consider that the step strategy may not always be possible for a
walking humanoid robot due to kinematic and timing constraints. Most humanoid
robots cannot cross their legs due to kinematic constraints, and the amount by
which the robot can change the landing position of the currently lifted foot decreases
over time due to velocity constraints. Also, if the robot is pushed while it is in
double support or is about to land its foot, it needs to take a new step for push
recovery. In this case, we have to determine which foot the robot should use for
stepping, as lifting the foot with the current support edge will result in the robot
instantly falling. Figure 6 shows three possible stepping cases according to the
direction of perturbation from the same foot stance. The support foot for the capture step
can be determined based on the angle between the two feet and the perturbation
vector, as shown in Figs. 6(a) and 6(b). For cases like Fig. 6(c), the step strategy is
not available due to a kinematic constraint.
Fig. 6. Determining step foot based upon the direction of perturbation.
2.4. The high-level push recovery controller
We have explained three biomechanically motivated push recovery controllers
and their implementations for walking on a position controlled humanoid robot.
When pushed, humans perform a combination of push recovery behaviors
according to the particular situation. To select the appropriate set of push
recovery behaviors as humans do, we use the hierarchical controller shown in Fig. 7,
where the ankle, hip and step push recovery controllers work as low-level
sub-controllers and the high-level push recovery controller triggers each according to
the direction and the amount of the external disturbance estimated using the
onboard sensors.
For the abstract models we have seen in Figs. 3–5, there have been analytic studies of
the decision boundaries of each controller [8,29]. If we assume a maximum ankle torque of
$\tau^{\mathrm{MAX}}_{\mathrm{ankle}}$, then the stability region for the ankle push recovery controller, the region of state
space in which the system can be stabilized, can be derived as
$$\left|\dot{x}/\omega + x\right| < \tau^{\mathrm{MAX}}_{\mathrm{ankle}}/mg, \tag{15}$$
and the stability region for the hip strategy plus the ankle strategy as
$$\left|\dot{x}/\omega + x\right| < \left(\tau^{\mathrm{MAX}}_{\mathrm{ankle}} + \tau^{\mathrm{MAX}}_{\mathrm{hip}} \left(e^{\omega T_{H1}} - 1\right)^2\right)/mg. \tag{16}$$
Finally, if we assume instantaneous support point transition without loss of linear
momentum, we have the following stability region for using all three strategies
at once:
$$\left|\dot{x}/\omega + x\right| < \left(\tau^{\mathrm{MAX}}_{\mathrm{ankle}} + \tau^{\mathrm{MAX}}_{\mathrm{hip}} \left(e^{\omega T_{H1}} - 1\right)^2\right)/mg + x^{\mathrm{MAX}}_{\mathrm{capture}}, \tag{17}$$
where $x^{\mathrm{MAX}}_{\mathrm{capture}}$ is the maximum step size available. In this case we can use the two
boundary conditions in (15) and (16) to select between controllers based on the current
state. For the more realistic case of a multi-segmented body with motor dynamics,
as on a physical robot, these theoretical boundaries do not fit well and the high-level
controller needs to be trained from experience. This is covered in more detail later in
this paper.
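As a concrete reading of the theoretical boundaries (15)–(17), the sketch below classifies a COM state by the smallest set of strategies whose stability region contains it. It is a minimal illustration for the abstract model only: the function name, the returned labels and all numerical values in the example are assumptions, and, as noted above, these boundaries are not expected to fit a physical robot well.

```python
import math


def select_strategy(x, xdot, m, z0, tau_ankle_max, tau_hip_max, t_h1,
                    x_capture_max, g=9.81):
    """Pick the smallest strategy set whose region (15)-(17) contains the state."""
    omega = math.sqrt(g / z0)
    s = abs(xdot / omega + x)                       # left-hand side of (15)-(17)
    ankle_limit = tau_ankle_max / (m * g)           # Eq. (15)
    hip_limit = ankle_limit + tau_hip_max * (math.exp(omega * t_h1) - 1.0) ** 2 / (m * g)  # Eq. (16)
    step_limit = hip_limit + x_capture_max          # Eq. (17)
    if s < ankle_limit:
        return "ankle"
    if s < hip_limit:
        return "ankle+hip"
    if s < step_limit:
        return "ankle+hip+step"
    return "unrecoverable"


# Hypothetical parameters for a small humanoid (not taken from the paper).
params = dict(m=3.0, z0=0.3, tau_ankle_max=0.6, tau_hip_max=0.5,
              t_h1=0.1, x_capture_max=0.1)
print(select_strategy(0.01, 0.0, **params))
print(select_strategy(0.0, 0.4, **params))
```

Because all three boundaries depend on the state only through $\dot{x}/\omega + x$, the decision reduces to comparing one scalar against three thresholds, which is what makes a low-dimensional learned boundary a natural replacement.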
Fig. 7. The hierarchical control structure for push recovery.
3. Integration with Walk Controller
As we have seen before, the step strategy requires reactive modification of the
stepping sequence and the foot trajectory, as shown in Fig. 8. However, reactive
stepping is generally not possible with typical ZMP tracking approaches where the
reference ZMP trajectory is calculated in advance and the COM trajectory is
generated to minimize the ZMP error.
Recently, there have been approaches to generate walking patterns online to
overcome this limitation, including a ZMP preview-based algorithm that updates
the COM trajectory at a high frequency [30] and a real-time gait planning method based on
the analytic solution of the LIPM with a parametrized ZMP trajectory [4,5]. Step
push recovery based on these approaches has been successfully implemented on the
HRP-2 robot [25,11] and the Toyota partner robot [6].
Another method for real-time walk pattern generation is the biologically inspired,
central-pattern-generator-based approach. This approach has been implemented on
the Hubo robot, where it demonstrated step push recovery behavior while hopping in
place [10]. Due to its simplicity, this approach has been widely used for small, resource
constrained humanoid robots [19,31,32], but it is generally harder to design a stable
trajectory as it is not based on an explicit stability criterion.
Our walk controller is based on the analytic solution of the LIPM, but further
simplified to be implemented on resource constrained robots. The walk pattern is
divided into discrete steps, and the overall walk control is separated into a footstep
generation controller and a trajectory controller. The footstep generation controller
generates the parameters for the next step, including the initial and final position of
each foot and support foot information, and generates the reference ZMP trajectory
based on them. The trajectory controller generates foot and torso trajectories for the
current step based on those parameters. We describe our walk controller in more
detail in the following subsections.
3.1. Footstep generation controller
Our first assumption is that walking is divided into discrete steps, which start and
end with a double support phase. Then we can define the ith step as a set of
Fig. 8. Two different cases of reactive stepping. (a) The inter-step override, which uses the same support
foot and updates the foot trajectory for the next step. (b) The intra-step override, which updates the
current foot trajectory during stepping.
parameters
$$\mathrm{STEP}_i = \{SF_i, L_i, C_i, R_i, L_{i+1}, C_{i+1}, R_{i+1}\}, \tag{18}$$
where $SF_i$ denotes the support foot, and $L_i, C_i, R_i$ and $L_{i+1}, C_{i+1}, R_{i+1}$ are the initial and
final 2D poses of the left foot, torso and right foot in $(x, y, \theta)$ coordinates. The landing foot
pose is calculated from the current foot configuration, the commanded walk velocity, and
kinematic and self-collision constraints. To make the step transition occur at the
most stable posture, we set the boundary torso pose $C_i$ to be the midpoint of $L_i$ and
$R_i$ for all $i$. When an inter-step override is required as in Fig. 8(a), the current
commanded walk velocity is overridden and the next landing foot position is
determined according to the push direction. A single step is further divided into three
stages: the first double support stage, when the ZMP moves to the current
support foot; the single support stage, when the ZMP lies on the support foot; and
the second double support stage, when the ZMP moves back to the final torso position. If
we define the walk phase $\phi$ as $t/t_{\mathrm{STEP}}$, where $t$ is the time passed since the step started
and $t_{\mathrm{STEP}}$ is the duration of the step, we can design the ZMP trajectory $p_i(\phi)$ as a
piecewise linear function of $\phi$:
$$p_i(\phi) = \begin{cases} C_i\left(1 - \dfrac{\phi}{\phi_1}\right) + L_i \dfrac{\phi}{\phi_1} & 0 \le \phi < \phi_1, \\ L_i & \phi_1 \le \phi < \phi_2, \\ C_{i+1}\left(1 - \dfrac{1-\phi}{1-\phi_2}\right) + L_i \dfrac{1-\phi}{1-\phi_2} & \phi_2 \le \phi < 1, \end{cases} \tag{19}$$
for the left support foot case and
$$p_i(\phi) = \begin{cases} C_i\left(1 - \dfrac{\phi}{\phi_1}\right) + R_i \dfrac{\phi}{\phi_1} & 0 \le \phi < \phi_1, \\ R_i & \phi_1 \le \phi < \phi_2, \\ C_{i+1}\left(1 - \dfrac{1-\phi}{1-\phi_2}\right) + R_i \dfrac{1-\phi}{1-\phi_2} & \phi_2 \le \phi < 1, \end{cases} \tag{20}$$
for the right support foot case, where $\phi_1, \phi_2$ are the timing parameters determining
the transition between the single support and double support phases. The step controller
and resulting ZMP trajectory are shown in Fig. 9.
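The piecewise linear reference of Eqs. (19) and (20) differs between the two cases only in which foot is the support foot, so a single function can cover both. The Python sketch below evaluates one coordinate of the reference ZMP. The function and argument names are hypothetical; the default timing parameters are the values $\phi_1 = 0.2$ and $\phi_2 = 0.8$ quoted in the caption of Fig. 9, and the 0.05 m lateral foot offset in the example is an assumption.

```python
def zmp_reference(phi, torso_start, support, torso_end, phi1=0.2, phi2=0.8):
    """Reference ZMP of Eq. (19)/(20) for one coordinate at walk phase phi.

    torso_start and torso_end stand for C_i and C_{i+1}; support stands for
    L_i or R_i, depending on which foot is the support foot.
    """
    if phi < phi1:
        # First double support: ZMP moves from the torso to the support foot.
        w = phi / phi1
        return torso_start * (1.0 - w) + support * w
    if phi < phi2:
        # Single support: ZMP stays on the support foot.
        return support
    # Second double support: ZMP moves to the final torso position.
    w = (1.0 - phi) / (1.0 - phi2)
    return torso_end * (1.0 - w) + support * w


# Lateral example with a hypothetical support foot offset of 0.05 m.
samples = [zmp_reference(k / 10.0, 0.0, 0.05, 0.0) for k in range(11)]
```

For full $(x, y, \theta)$ poses the same function can be applied to each coordinate independently.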
3.2. Trajectory controller
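The trajectory controller generates the foot and torso trajectories for the current step
defined in (18). First, we define the single support walk phase $\phi_{\mathrm{single}}$ as
$$\phi_{\mathrm{single}} = \begin{cases} 0 & 0 \le \phi < \phi_1, \\ \dfrac{\phi - \phi_1}{\phi_2 - \phi_1} & \phi_1 \le \phi < \phi_2, \\ 1 & \phi_2 \le \phi < 1, \end{cases} \tag{21}$$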
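then we use the following heuristic trajectory function with parameters $\alpha, \beta$:
$$f_T(\phi) = \phi^{\alpha} + \beta\phi(1 - \phi), \tag{22}$$
to generate the foot trajectories for both feet, $l_i(\phi)$ and $r_i(\phi)$:
$$l_i(\phi) = L_i\left(1 - f_T(\phi_{\mathrm{single}})\right) + L_{i+1} f_T(\phi_{\mathrm{single}}), \tag{23}$$
$$r_i(\phi) = R_i\left(1 - f_T(\phi_{\mathrm{single}})\right) + R_{i+1} f_T(\phi_{\mathrm{single}}). \tag{24}$$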
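Equations (21)–(24) amount to a phase remapping followed by an interpolation between the initial and final foot poses. A minimal Python sketch is given below. The function names are hypothetical, and because this section does not state values for $\alpha$ and $\beta$, the defaults `alpha=1.0` and `beta=0.0` (plain linear interpolation) are assumptions; the timing defaults are those quoted in the caption of Fig. 9.

```python
def single_support_phase(phi, phi1=0.2, phi2=0.8):
    """Single support walk phase of Eq. (21)."""
    if phi < phi1:
        return 0.0
    if phi < phi2:
        return (phi - phi1) / (phi2 - phi1)
    return 1.0


def trajectory_function(phi_single, alpha=1.0, beta=0.0):
    """Heuristic trajectory function f_T of Eq. (22).

    For alpha > 0 it maps 0 to 0 and 1 to 1, so the foot starts and ends
    exactly at its boundary poses whatever the value of beta.
    """
    return phi_single ** alpha + beta * phi_single * (1.0 - phi_single)


def foot_pose(phi, start, end, alpha=1.0, beta=0.0):
    """Foot pose of Eq. (23)/(24), interpolating each (x, y, theta) coordinate.

    Note: theta is interpolated linearly here without angle wrapping, which
    is a simplification of this sketch.
    """
    f = trajectory_function(single_support_phase(phi), alpha, beta)
    return tuple(s * (1.0 - f) + e * f for s, e in zip(start, end))


# Hypothetical 0.06 m forward step of the left foot.
mid_pose = foot_pose(0.5, (0.0, 0.05, 0.0), (0.06, 0.05, 0.0))
```

The support foot is handled by the same call, since its initial and final poses coincide and the interpolation leaves it in place.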
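Then the torso trajectory $x_i$ is calculated to satisfy the following ZMP criterion for the
LIPM:
$$\ddot{x}_i = \left(x_i - p_i(\phi)\right)/t_{\mathrm{ZMP}}^2, \tag{25}$$
where $t_{\mathrm{ZMP}} = \sqrt{z_0/g}$. The piecewise linear ZMP trajectory we use in (19) and (20)
yields the following closed-form solution of $x_i(\phi)$ during the step period $0 \le \phi < 1$:
$$x_i(\phi) = \begin{cases} p_i(\phi) + a^p_i e^{\phi/\phi_{\mathrm{ZMP}}} + a^n_i e^{-\phi/\phi_{\mathrm{ZMP}}} + m_i t_{\mathrm{ZMP}} \left(\dfrac{\phi - \phi_1}{\phi_{\mathrm{ZMP}}} - \sinh\dfrac{\phi - \phi_1}{\phi_{\mathrm{ZMP}}}\right) & 0 \le \phi < \phi_1, \\ p_i(\phi) + a^p_i e^{\phi/\phi_{\mathrm{ZMP}}} + a^n_i e^{-\phi/\phi_{\mathrm{ZMP}}} & \phi_1 \le \phi < \phi_2, \\ p_i(\phi) + a^p_i e^{\phi/\phi_{\mathrm{ZMP}}} + a^n_i e^{-\phi/\phi_{\mathrm{ZMP}}} + n_i t_{\mathrm{ZMP}} \left(\dfrac{\phi - \phi_2}{\phi_{\mathrm{ZMP}}} - \sinh\dfrac{\phi - \phi_2}{\phi_{\mathrm{ZMP}}}\right) & \phi_2 \le \phi < 1, \end{cases} \tag{26}$$
where $\phi_{\mathrm{ZMP}} = t_{\mathrm{ZMP}}/t_{\mathrm{STEP}}$ and $m_i$, $n_i$ are ZMP slopes which are defined as follows for
the left support case:
$$m_i = (L_i - C_i)/\phi_1, \tag{27}$$
$$n_i = -(L_i - C_{i+1})/(1 - \phi_2), \tag{28}$$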
Fig. 9. The step-based walk controller. (a) An example of walking behavior which is composed of two
steps, $\mathrm{STEP}_i$ and $\mathrm{STEP}_{i+1}$. (b) Corresponding lateral ZMP and torso trajectories $p(\phi)$ and $x(\phi)$. Timing
parameters of $\phi_1 = 0.2$ and $\phi_2 = 0.8$ are used.
and for the right support case:
$$m_i = (R_i - C_i)/\phi_1, \tag{29}$$
$$n_i = -(R_i - C_{i+1})/(1 - \phi_2). \tag{30}$$
The parameters $a^p_i$ and $a^n_i$ can then be uniquely determined from the boundary
conditions $x_i(0) = C_i$ and $x_i(1) = C_{i+1}$. This analytic solution of the torso trajectory
is continuous and has zero ZMP error during each step period, but may have
discontinuous velocity at the step transition when the commanded velocity is changing.
However, we found this does not hamper stability as the transition occurs in the middle of
the most stable double support stance. In addition to calculating foot trajectories
based upon predetermined target foot poses from the step controller, the trajectory
controller handles the intra-step override shown in Fig. 8(b) by updating the
landing position of the current swing foot towards the capture point. As the new
landing point has to satisfy kinematic and velocity constraints, the override is most
effective at the initial phase of the step.
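The boundary-value construction described above can be sketched and checked numerically. The Python code below works in the phase variable $\phi$, where Eq. (25) reads $d^2x/d\phi^2 = (x - p)/\phi_{\mathrm{ZMP}}^2$. One point is an assumption of this sketch rather than a statement of the paper: the double support correction is written as $-m_i \phi_{\mathrm{ZMP}} \sinh((\phi - \phi_1)/\phi_{\mathrm{ZMP}})$ (and likewise with $n_i$, $\phi_2$) added to the linear $p_i(\phi)$, with the slopes of Eqs. (27)–(30) taken per unit phase. This form was chosen because it satisfies Eq. (25) exactly and keeps the velocity continuous at $\phi_1$ and $\phi_2$. The function name, the COM height, the step duration and the foot offset are hypothetical.

```python
import math


def torso_trajectory(torso_start, support, torso_end, z0, t_step,
                     phi1=0.2, phi2=0.8, g=9.81):
    """Build x(phi) for one coordinate of the torso over a single step."""
    phi_zmp = math.sqrt(z0 / g) / t_step            # t_ZMP / t_STEP
    m = (support - torso_start) / phi1              # slope as in Eq. (27)/(29)
    n = -(support - torso_end) / (1.0 - phi2)       # slope as in Eq. (28)/(30)

    def zmp(phi):
        """Piecewise linear reference ZMP of Eq. (19)/(20)."""
        if phi < phi1:
            return torso_start + m * phi
        if phi < phi2:
            return support
        return support + n * (phi - phi2)

    def correction(phi):
        """Homogeneous term that removes the velocity jump at phi1 and phi2."""
        if phi < phi1:
            return -m * phi_zmp * math.sinh((phi - phi1) / phi_zmp)
        if phi < phi2:
            return 0.0
        return -n * phi_zmp * math.sinh((phi - phi2) / phi_zmp)

    # Solve x(0) = torso_start and x(1) = torso_end for the two coefficients.
    c0, c1 = correction(0.0), correction(1.0)
    e1 = math.exp(1.0 / phi_zmp)
    a_p = (c0 / e1 - c1) / (e1 - 1.0 / e1)
    a_n = -c0 - a_p

    def x(phi):
        return (zmp(phi) + a_p * math.exp(phi / phi_zmp)
                + a_n * math.exp(-phi / phi_zmp) + correction(phi))

    return x, zmp, phi_zmp


# Lateral example: torso starts and ends midway, support foot offset 0.05 m.
x, zmp, phi_zmp = torso_trajectory(0.0, 0.05, 0.0, z0=0.3, t_step=0.5)
```

A finite-difference check of Eq. (25) at a few phases, together with the two boundary conditions, is a quick way to confirm that the coefficients were solved correctly before using the trajectory on a robot.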
4. Learning the High-Level Push Recovery Controller
In the previous sections, we have described our hierarchical push recovery controller
structure and its implementation for a position controlled robot. As we have dis-
cussed, although there are analytic decision boundaries for simpli¯ed models to select
the appropriate set of push recovery controllers based on current state, such decision
rules may not work well with more realistic dynamic models. Instead of relying on the
abstract model, our previous works have been using a machine learning approach,
where we directly train the parametrized controller from experience. We have
implemented three parametrized push recovery strategy for a resource constrained
robot with high gain position control, and used reinforcement learning to learn the
high-level controller that governs three push recovery controllers from raw sensory
inputs using a full-body model of robot in simulated environment, and used the
learned controller on small humanoid robot walking in place.33
An insight gained from physical experiments is that modest pushes can be effectively
stabilized using the ankle strategy alone, and the magnitudes of hip and step strategies
are limited with the physical robot due to kinematic and motor constraints. In other words,
it is sufficient to fix $\|\tau^{MAX}_{hip}\|$ and $\|x_{capture}\|$, which greatly reduces
the action space compared to previous parametrized controllers. Still, such a direct
approach is not data efficient as it does not utilize knowledge of the decision surface,
and applying this approach on a real robot with scarce training data requires much
simplification of the controller.34
In this work, we take a hybrid approach. We use a low-dimensional decision boundary in
state space, but instead of relying on a theoretical boundary from a simplified model, we
use the training data from a simulated environment to obtain the empirical decision
boundary for the push recovery controllers. The model can then be easily trained with a
limited amount of data from the physical robot afterwards.
4.1. Resource constrained humanoid platform
Most physical implementations of push recovery controllers introduced so far use
human-sized robots, usually equipped with harmonic gear drive trains, triaxial force–torque
sensors and torque-controlled actuators. On the other hand, lightweight, low-cost humanoid
robots with off-the-shelf servomotors are now gaining popularity, in part due to the
commercial availability of affordable small humanoids. Although those affordable humanoids
are limited in terms of their sensory, motor and processing power, they have been used as a
viable research platform in many areas, including balancing control during walking. A number
of push recovery approaches have been implemented on such platforms, including a crouching
reflex similar to the hip strategy,35 frontal hip strategy,36,37 lateral step strategy31 and
frontal ankle and step strategy.38
For this work, we use the commercially available DARwIn-OP humanoid robot and its
simulation model as the test platform. It is 45 cm tall, weighs 2.8 kg, and has 20 degrees
of freedom. It has a 3-axis accelerometer and gyroscope for inertial sensing, and joint
encoders at each joint for proprioceptive sensing. Position-controlled Dynamixel servos are
used as actuators, which are controlled by a custom microcontroller connected to an
embedded PC at a control frequency of 100 Hz.
4.2. The extended inverted pendulum model
The abstract model we used in previous sections does not fit the physical humanoid
platform well. The most notable difference is that the physical robot has feet with nonzero
size, and the robot can be tipped on the boundary of the foot. Furthermore, the ankle
torque is only indirectly controlled by proportional control. Finally, estimating the
linear position and velocity of the COM using noisy sensors can be very hard.
Proprioceptive sensors can be used to determine the COM position if we assume the support
foot is on the ground, but such an assumption will not hold if the robot is perturbed hard.
Instead, we have found that the angular velocity and tilt angle information from inertial
sensors are more reliable. Thus, we propose a new abstract model for a resource constrained
humanoid robot, which is shown in Fig. 10(a). It is an inverted pendulum with the tilt
angle $\theta$ as state, and has a foot with toe position $\delta^+$ and heel position
$\delta^-$ from the ankle joint. The ankle torque $\tau_{ankle}$ is controlled by a PD
control of $\theta$ with saturation values $mg\delta^+$ and $mg\delta^-$:
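$$\ddot{\theta} = \omega^2 \left( \sin(\theta) - (\tau_{ankle} + \tau_{hip})/(m g z_0) \right), \tag{31}$$
$$\tau_{ankle} = f_{sat}(K''_p \theta + K''_d \dot{\theta}), \tag{32}$$
$$f_{sat}(x) = \begin{cases} mg\delta^+ & x \geq mg\delta^+, \\ x & mg\delta^- < x < mg\delta^+, \\ mg\delta^- & x \leq mg\delta^-. \end{cases} \tag{33}$$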
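As an illustration, the model of (31)–(33) can be integrated numerically to see whether a given initial push is absorbed by the ankle strategy. The following is a minimal sketch rather than the simulation used in this work: the semi-implicit Euler scheme, the time step, the simulated duration and the fall threshold are all assumptions, the hip torque is simply held constant, and $\omega^2 = g/z_0$ is taken from the LIPM. The default parameter values are those quoted for the DARwIn-OP model.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def f_sat(x, m, delta_plus, delta_minus):
    """Saturate the commanded ankle torque, Eq. (33)."""
    return max(m * G * delta_minus, min(m * G * delta_plus, x))


def recovers(theta0, theta_dot0, m=2.0, z0=0.295, delta_plus=0.05,
             delta_minus=-0.05, kp=500.0, kd=57.83, tau_hip=0.0,
             dt=1e-3, duration=3.0, fall_angle=0.5):
    """Integrate Eqs. (31)-(32) and report whether the pendulum settles.

    dt, duration and fall_angle are assumptions of this sketch; tau_hip is
    held constant (zero corresponds to the ankle strategy alone).
    """
    omega_sq = G / z0
    theta, theta_dot = theta0, theta_dot0
    for _ in range(int(duration / dt)):
        tau_ankle = f_sat(kp * theta + kd * theta_dot, m, delta_plus, delta_minus)
        theta_ddot = omega_sq * (math.sin(theta)
                                 - (tau_ankle + tau_hip) / (m * G * z0))
        # Semi-implicit Euler: update the velocity first, then the angle.
        theta_dot += theta_ddot * dt
        theta += theta_dot * dt
        if abs(theta) > fall_angle:
            return False  # treated as a fall
    return abs(theta) < 0.01 and abs(theta_dot) < 0.01


if __name__ == "__main__":
    for push in (0.5, 1.5):
        print(push, recovers(0.0, push))
```

With these parameters, an initial angular velocity of 0.5 rad/s from the upright pose is recovered, whereas 1.5 rad/s is not; the transition lies near $\omega\delta^+/z_0 \approx 0.98$ rad/s.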
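We can linearize the stability regions in (15), (16) and consider the saturated
case to get the following stability regions for the ankle, hip and step strategies:
$$\delta^-/z_0 < \dot{\theta}/\omega + \theta < \delta^+/z_0, \tag{34}$$
$$\begin{aligned}
\dot{\theta}/\omega + \theta &> \delta^-/z_0 - \tau^{MAX}_{hip}\,(e^{\omega T_{H1}} - 1)^2/(m g z_0), \\
\dot{\theta}/\omega + \theta &< \delta^+/z_0 + \tau^{MAX}_{hip}\,(e^{\omega T_{H1}} - 1)^2/(m g z_0),
\end{aligned} \tag{35}$$
$$\begin{aligned}
\dot{\theta}/\omega + \theta &> \delta^-/z_0 - \tau^{MAX}_{hip}\,(e^{\omega T_{H1}} - 1)^2/(m g z_0) - x^{MAX}_{capture}/z_0, \\
\dot{\theta}/\omega + \theta &< \delta^+/z_0 + \tau^{MAX}_{hip}\,(e^{\omega T_{H1}} - 1)^2/(m g z_0) + x^{MAX}_{capture}/z_0.
\end{aligned} \tag{36}$$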
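The three regions in (34)–(36) are nested intervals of the single quantity $\dot{\theta}/\omega + \theta$, which can be made explicit with a short sketch. The function names and the region labels are hypothetical, $\omega = \sqrt{g/z_0}$ is assumed from the LIPM, and the expressions are transcribed from (34)–(36) as written.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def stability_bounds(m, z0, delta_plus, delta_minus,
                     tau_hip_max, t_h1, x_capture_max):
    """Return omega and the (lower, upper) bounds of Eqs. (34)-(36)."""
    omega = math.sqrt(G / z0)
    ankle = (delta_minus / z0, delta_plus / z0)                    # Eq. (34)
    hip_gain = tau_hip_max * (math.exp(omega * t_h1) - 1.0) ** 2 / (m * G * z0)
    hip = (ankle[0] - hip_gain, ankle[1] + hip_gain)               # Eq. (35)
    step_gain = x_capture_max / z0
    step = (hip[0] - step_gain, hip[1] + step_gain)                # Eq. (36)
    return omega, [("ankle", ankle), ("hip", hip), ("step", step)]


def smallest_region(theta, theta_dot, omega, regions):
    """Name of the first region containing theta_dot/omega + theta, else None."""
    s = theta_dot / omega + theta
    for name, (lower, upper) in regions:
        if lower < s < upper:
            return name
    return None


if __name__ == "__main__":
    omega, regions = stability_bounds(2.0, 0.295, 0.05, -0.05, 1.0, 0.3, 0.08)
    print(smallest_region(0.0, 0.0, omega, regions))
```

Listing the regions from smallest to largest means the first match is the least aggressive set of strategies that the linearized model predicts to be sufficient.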
Figures 10(b)–10(d) show three trajectory plots acquired from various initial pushes using
the extended inverted pendulum model and three different sets of push recovery strategies.
The parameters used are $m = 2$, $z_0 = 0.295$, $\delta^+ = 0.05$, $\delta^- = -0.05$,
$K''_p = 500$, $K''_d = 57.83$, $\tau^{MAX}_{hip} = 1$, $T_{H1} = 0.3$ and
$x^{MAX}_{capture} = 0.08$, which are based on the multi-body model of the DARwIn-OP robot.
A reduced mass of $m = 2$ is used to compensate for the large leg mass of the robot. We see
that the hip and step strategies help to enlarge the stability region, and even with the
nonlinear dynamic model we use, the empirical stability region of the ankle strategy
closely follows the theoretical one derived using the simpler LIPM.
4.3. The ankle strategy with multi-body model
To model more realistic, multi-body dynamics of the robot, we use the Webots commercial
robotic simulator39 based on the Open Dynamics Engine physics library and the supplied
simulated model of the DARwIn-OP robot. We use our modular open source humanoid framework40
for controlling the robot. The controller update frequency and physics simulation frequency
are set to 100 Hz. We use the COM height $z_0 = 0.295$, step duration $t_{STEP} = 0.50$ and
robot center to ankle width $d_{stance} = 0.375$ for walk parameters. For the ankle
strategy gain parameters, we use values of
Fig. 10. The extended inverted pendulum model and the phase space trajectory plots generated
using different push recovery strategies. (a) An inverted pendulum model of robot with
position controlled ankle joint and foot. (b) Ankle strategy. (c) Ankle plus hip strategy.
(d) Ankle plus step strategy. White and gray regions in (b)–(d) are theoretical stable and
unstable regions from (34). Darker gray regions in (c) and (d) are the increased stable
region compared to using the ankle strategy alone.
$K''_p = 0$, $K''_d = 0.15$, which are found to be effective in practice, as the position
controlled joints and nonpoint feet already apply a positional negative feedback to the
system. The robot is pushed with impulse forces for one timestep (0.01 s) with different
magnitudes and directions, and various combinations of push recovery strategies are
evaluated and the state trajectories are logged.
Figures 11(a) and 11(b) show the empirical decision boundaries found for the ankle
strategy from frontal and lateral pushes. Although the empirical trajectory plots have
shapes similar to those in Fig. 10, the empirical stability regions differ significantly
from those obtained via the abstract models. From the trajectory curves, we fit a linear
classifier that best separates the two regions for the duration $0.03 < t < 0.3$, as our
impulse impact setup produces an unrealistically large spike in the sensor readings for one
or two simulation steps. We then obtain the estimated values for $\delta$ and $z_0$ shown
in Table 1, which imply the following empirical stability boundaries for the ankle strategy:
$$\sqrt{g/z_0^-}\,\left(-\theta + \delta^-/z_0^-\right) < \dot{\theta} < \sqrt{g/z_0^+}\,\left(-\theta + \delta^+/z_0^+\right). \tag{37}$$
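The relation between a fitted boundary line and the parameters in (37) can be spelled out in code. Writing each boundary as $\dot{\theta} = a\theta + b$, (37) gives $a = -\sqrt{g/z_0}$ and $b = \sqrt{g/z_0}\,\delta/z_0$, so that $z_0 = g/a^2$ and $\delta = b z_0/\sqrt{g/z_0}$. The sketch below only performs this conversion; it does not reproduce the classifier fitting itself, and the function names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def line_from_params(delta, z0):
    """Slope and intercept of the boundary theta_dot = a*theta + b in Eq. (37)."""
    omega = math.sqrt(G / z0)
    return -omega, omega * delta / z0


def params_from_line(slope, intercept):
    """Recover (delta, z0) from a fitted boundary line."""
    z0 = G / slope ** 2
    delta = intercept * z0 / math.sqrt(G / z0)
    return delta, z0


if __name__ == "__main__":
    a, b = line_from_params(0.45, 1.09)   # frontal delta+, z0+ values from Table 1
    print(a, b)
    print(params_from_line(a, b))
```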
4.4. Deciding between hip and step strategies
Given the empirical stability region of the ankle strategy controller, if the perturbations
fall outside that region, we need to employ other push recovery controllers in addition to
the ankle controller to handle them. The LIPM-based abstract models (Figs. 4 and 5) imply
the theoretical stability regions described in (16) and (17),
Fig. 11. Phase space trajectory plots generated with the multi-body model and different push
recovery strategies in physically realistic simulations. (a) Ankle strategy, frontal push.
(b) Ankle strategy, lateral push. (c) Ankle plus hip strategy, lateral push. (d) Ankle plus
step strategy, frontal push. White and gray regions are theoretical stable and unstable
regions from (34)–(36). Thick dashed lines are estimated linear boundaries between stable
and unstable regions.
Table 1. Parameter values estimated from the multi-body model.

Parameter   $\delta^+$   $z_0^+$   $\delta^-$   $z_0^-$
Frontal     0.45         1.09      −0.42        1.02
Lateral     0.84         1.45      −0.84        1.45
which can grow quite large with large $\tau^{MAX}_{hip}$ and $x^{MAX}_{capture}$. However,
due to kinematic and velocity constraints we have practical limits for those values. Taking
a step also takes time, which further restricts the effectiveness of the step strategy. We
set $\|x_{hip}\| = 40^\circ$, $T_{H1} = 0.15$, $T_{H2} = 0.3$ and $\|x_{capture}\| = 0.08$
for the hip and step strategy parameters and compare the results of the two strategies.
Figures 11(c) and 11(d) show trajectory plots acquired from two sets of push recovery
controllers: the ankle plus hip strategy and the ankle plus step strategy. In this case, a
clear boundary for the ankle plus hip strategy is not as evident as in Fig. 10(c), as the
inertial sensor of our robot lies in the torso and rotates when the hip strategy is
triggered. Instead of decoupling the hip rotation and sensory readings, which turned out to
be very hard with the noisy sensor model we use, we compared the outcomes of the two push
recovery strategies against various magnitudes of perturbation to better compare the
effectiveness of the two controllers. We have found that against a frontal push, the step
strategy can withstand slightly larger maximum perturbations than the hip strategy (1.05 Ns
for the step strategy versus 1.04 Ns for the hip strategy), and the step strategy has a
wider region of stability than the hip strategy with fixed parameter values
$\tau^{MAX}_{hip}$ and $x^{MAX}_{capture}$. On the other hand, the step strategy is not
available for purely lateral perturbations due to kinematic constraints, and we have to
rely on the hip strategy for such cases.
In summary, the decision rule for push recovery strategies is as follows. We set the
ankle strategy active all the time, and if the state estimate moves beyond the empirical
stability boundaries in (37), the step strategy is triggered. In case the step
strategy is not available due to constraints, the hip strategy is triggered instead.
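The decision rule can be written down compactly. The sketch below is illustrative rather than the implementation used on the robot: the names `Strategy`, `within_boundary` and `select_strategy` are hypothetical, the availability of the step strategy is passed in as a flag instead of being derived from kinematic constraints, and the boundary parameters are the simulation estimates from Table 1.

```python
import math
from enum import Enum

G = 9.81  # gravitational acceleration (m/s^2)


class Strategy(Enum):
    ANKLE = "ankle"
    ANKLE_STEP = "ankle+step"
    ANKLE_HIP = "ankle+hip"


# Boundary parameters estimated from the multi-body model (Table 1).
TABLE_1 = {
    "frontal": {"delta_plus": 0.45, "z0_plus": 1.09,
                "delta_minus": -0.42, "z0_minus": 1.02},
    "lateral": {"delta_plus": 0.84, "z0_plus": 1.45,
                "delta_minus": -0.84, "z0_minus": 1.45},
}


def within_boundary(theta, theta_dot, p):
    """Check the empirical ankle-strategy stability region of Eq. (37)."""
    upper = math.sqrt(G / p["z0_plus"]) * (-theta + p["delta_plus"] / p["z0_plus"])
    lower = math.sqrt(G / p["z0_minus"]) * (-theta + p["delta_minus"] / p["z0_minus"])
    return lower < theta_dot < upper


def select_strategy(theta, theta_dot, params, step_available):
    """Ankle strategy is always on; add step, or hip if stepping is unavailable."""
    if within_boundary(theta, theta_dot, params):
        return Strategy.ANKLE
    return Strategy.ANKLE_STEP if step_available else Strategy.ANKLE_HIP


if __name__ == "__main__":
    print(select_strategy(0.0, 0.0, TABLE_1["frontal"], True))
    print(select_strategy(0.0, 2.0, TABLE_1["frontal"], True))
    print(select_strategy(0.0, 2.0, TABLE_1["lateral"], False))
```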
4.5. Comparison with ZMP tracking controller
To demonstrate the effectiveness of the hierarchical push recovery controller, we compare
it to the commonly used closed-loop ZMP tracking controller. We implement the ZMP tracking
controller based on Ref. 18, with a single difference that the current state is estimated
using an inertial sensor rather than joint encoders and forward kinematics. All other
parameters remain unchanged. Various amounts of frontal and lateral impulses were applied
to the robot, and the outcome of the push recovery effort is logged for each controller.
Figure 12 shows the comparison of the two controllers for forward, backward and sideways
pushes. Figure 13 shows the stability regions of four different combinations of push
recovery controller settings. We can see that the step strategy can handle the frontal
perturbations fairly well, and the hip strategy is effective for lateral perturbations
where the step strategy cannot be utilized due to the kinematic constraint. Overall, we see
that the stability region of the suggested approach is approximately 21% larger and
completely encompasses that of the ZMP tracking method.
4.6. Extension to the full-sized humanoid robots
In this paper, we have used only the DARwIn-OP miniature humanoid robot for
testing in both the simulated and the real environments, which has relatively large feet
size and a larger power to weight ratio compared to common full-sized robots. To see
how our method can extend to a larger robot, we have made a comparison to a full-sized
position controlled humanoid robot, THOR-RD, which we have used for the
DARPA Robotics Challenge.41,42 The overall dimensions of the two robots are shown in
Fig. 14, and a more detailed comparison between the two robots is provided in Table 2.
We have found that due to the relatively lower COM height, the larger THOR-RD robot has a
slightly larger foot length to COM height ratio, which affects the maximum tilt angle the
robot can recover from. On the other hand, due to the lower power to weight ratio and
larger dimensions of the robot, the maximum horizontal torso acceleration possible with
full ankle torque is approximately three times smaller than that of the DARwIn-OP robot.
Overall, we therefore expect the ankle strategy to work similarly with the larger robot,
albeit less responsively. From (16) and (35), we can assume that the effectiveness of the
hip strategy is roughly proportional to $\tau^{MAX}_{hip}/mg$. The comparison of this
quantity over the two robots shows that under this assumption, the hip strategy will be
approximately 15% less effective with the larger THOR-RD robot. Finally, the THOR-RD robot
has a longer natural pendulum period
Fig. 12. A comparison of the ZMP tracking controller and suggested hierarchical push
recovery controller with different impulse forces. (a) ZMP tracking controller, 1.05 Ns of
frontal push. (b) Hierarchical push recovery controller, 1.05 Ns of frontal push. (c) ZMP
tracking controller, 1.04 Ns of backward push. (d) Hierarchical push recovery controller,
1.04 Ns of backward push. (e) ZMP tracking controller, 1.61 Ns of lateral push.
(f) Hierarchical push recovery controller, 1.61 Ns of lateral push.
due to its higher COM height, and can take a larger step relative to the foot length.
We expect both of these factors to improve the effect of the step strategy.
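The derived rows of Table 2 can be recomputed from the base quantities as a consistency check. The formulas below are inferred by matching the tabulated numbers and are not stated explicitly in the text: the natural pendulum period is taken as $\sqrt{z_0/g}$ and the maximum COM acceleration as $\tau_{max}/(m z_0)$. The dictionary keys and function name are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)

# Base quantities from Table 2.
ROBOTS = {
    "DARwIn-OP": {"com_height": 0.295, "foot_length": 0.104,
                  "leg_length": 0.186, "mass": 2.8, "max_torque": 2.5},
    "THOR-RD": {"com_height": 0.70, "foot_length": 0.260,
                "leg_length": 0.600, "mass": 58.0, "max_torque": 44.2},
}


def derived(r):
    """Derived quantities; formulas are assumptions that match Table 2."""
    return {
        "foot_to_com": r["foot_length"] / r["com_height"],
        "leg_to_foot": r["leg_length"] / r["foot_length"],
        "pendulum_period": math.sqrt(r["com_height"] / G),
        "max_com_accel": r["max_torque"] / (r["mass"] * r["com_height"]),
        "torque_per_mass": r["max_torque"] / r["mass"],
    }


if __name__ == "__main__":
    small, large = derived(ROBOTS["DARwIn-OP"]), derived(ROBOTS["THOR-RD"])
    for key in small:
        print(f"{key}: {small[key]:.3f} {large[key]:.3f} "
              f"ratio {large[key] / small[key]:.2f}")
```

Under these assumptions, the acceleration ratio of about 0.36 corresponds to the "approximately three times smaller" torso acceleration, and the torque-to-mass ratio of about 0.85 to the roughly 15% reduction expected for the hip strategy.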
In summary, we expect the suggested controller to work with larger position
controlled humanoid robots as well, although the torque limit of the actuators can
moderately degrade the performance of some strategies. Unfortunately, at the point of
Fig. 13. Comparison of stability regions for four different push recovery settings.
Fig. 14. Comparison of the dimensions of the DARwIn-OP miniature humanoid robot and the
THOR-RD full-sized humanoid robot.
writing this paper, we could not test the controller with the THOR-RD robot as we could
not risk possible hardware damage. This remains future work.
5. Experimental Results
In addition to the simulated environment, we have implemented the integrated
walk controller with push recovery on a commercially available DARwIn-OP small
humanoid robot. All code and parameter values used for simulation are used to
control the physical robot as well, with the help of our modular open source humanoid
framework.
5.1. Hardware setup
To generate repeatable external perturbations, a motorized moving platform was
constructed using Dynamixel servomotors (Fig. 15). To generate maximum peak
acceleration, the platform is slowly accelerated in one direction and then suddenly
accelerated in the opposite direction. We have found the platform can generate
Table 2. Detailed comparison of the DARwIn-OP miniature humanoid robot and the THOR-RD
full-sized humanoid robot.

                                  DARwIn-OP   THOR-RD   Ratio
Total height (m)                  0.454       1.54      3.4
COM height (m)                    0.295       0.70      2.37
Foot length (m)                   0.104       0.260     2.5
Foot width (m)                    0.066       0.160     2.42
Leg link length (m)               0.186       0.600     3.22
Weight (kg)                       2.8         58        20.7
Max torque (Nm)                   2.5         44.2      17.68
Foot length/COM height ratio      0.35        0.37      1.06
Leg/foot length ratio             1.78        2.30      1.29
Natural pendulum period (s)       0.17        0.27      1.59
Max COM acceleration (m/s^2)      3.03        1.09      0.36
Max torque/mass ratio             0.89        0.76      0.85
Fig. 15. The servo platform to generate controlled perturbation. (a) The ankle strategy alone cannot
withstand the perturbation generated by the moving platform. (b) The robot can withstand the same
magnitude of perturbation with the hip strategy.
accelerations greater than 0.5 g while carrying the robot, providing large enough
perturbations to make the robot fall without stabilization.
5.2. Empirical decision boundary with physical robot
We applied various magnitudes of perturbations to the robot from the front, back,
and one side while running the ankle strategy controller and measured the inertial
sensor readings for one second to generate state trajectories of the robot. Figure 16 shows
the state trajectories in the frontal and lateral axes, which are filtered with a moving
average filter with $n = 3$. For the frontal pushes, we can see that the trajectory plot
shown in the top part of Fig. 16(a) closely follows the graph acquired using the simulated
multi-body model in Fig. 11(b), showing an almost linear boundary between stable and
unstable trajectories, while the slope is quite different from the theoretical one from the
LIPM shown in gray shade. However, for backward pushes, we see that the shape of the
boundary is nonlinear at the initial part of the trajectory. This is due to mechanical
backlash of the joint, and it is only noticeable for backward pushes as the robot leans
slightly to the front with the default standing pose, eliminating the effect of backlash
for frontal pushes. From the sets of state trajectories, we obtain the linear boundaries
with estimated parameters $\delta$ and $z_0$ shown in Table 3.
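For reference, a moving average with $n = 3$ can be sketched as follows. Whether the filter applied to the sensor readings was causal or centered is not stated, so the causal form below, including its handling of the first samples, is an assumption.

```python
def moving_average(samples, n=3):
    """Causal moving average over the last n samples.

    The first n-1 outputs average only the samples seen so far
    (an assumption of this sketch).
    """
    out = []
    window_sum = 0.0
    for i, x in enumerate(samples):
        window_sum += x
        if i >= n:
            window_sum -= samples[i - n]  # drop the sample leaving the window
        out.append(window_sum / min(i + 1, n))
    return out


if __name__ == "__main__":
    print(moving_average([0.0, 3.0, 6.0, 9.0]))
```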
5.3. Testing the push recovery controller
After estimating the boundary values shown in Table 3, we test the hierarchical push
recovery controller against perturbations in a realistic setting. Figure 17(a) shows the
experimental setup. At each test, the pendulum starts swinging from a stationary
Fig. 16. Phase space trajectory plot acquired from frontal and lateral push experiment with the
DARwIn-OP robot. (a) Frontal push. (b) Lateral push.
state, where the initial position is determined experimentally so that the perturbation
is large enough to knock down the standing robot without any stabilization. We use a
pendulum mass of 500 g and length of 75 cm, and swing angles of 30° and 45° for frontal and
lateral trials, which translate into 1.35 Ns and 1.61 Ns of perturbation, respectively. For
each of three different push recovery settings, we have performed five total trials to get
the standard deviation, where each trial consists of 20 tests. Figure 17(b) shows the
comparison of three stabilization methods. We can see that due to a number of causes such
as battery depletion, slight differences in impact position and temperature buildup at the
actuators, there is some deviation in the results, but our controller still significantly
helps the robot to withstand large disturbances. Interestingly, we have found that the
physical robot can withstand larger perturbations than the simulated one in Fig. 13,
probably due to the longer impact duration with the physical setup.
Then we let the robot walk with nonzero speed, and applied disturbances to the robot using
a soft tipped stick to see how the walk controller handles reactive stepping during
locomotion. Figure 18 shows some examples of the robot response to external disturbances.
We see that the suggested controller can successfully trigger appropriate push recovery
behaviors during locomotion to keep the robot from falling down.
Fig. 17. The comparison of push recovery controller performances using the DARwIn-OP robot. (a) The
experimental setup. (b) Test results.
Table 3. Parameter values estimated from the DARwIn-OP robot.

Parameter   $\delta^+$   $z_0^+$   $\delta^-$   $z_0^-$
Frontal     1.3          2.7       −1.1         2.7
Lateral     1.4          2.7       −1.4         2.7
Fig. 18. Responses of the push recovery controller against perturbation while walking. (a) The
ankle and step strategies while walking forward at 18 cm/s. (b) The ankle and step strategies
while walking backward at 12 cm/s. (c) The ankle and hip strategies while walking forward at
18 cm/s. (d) The ankle and hip strategies while turning at 0.6 rad/s.
6. Conclusions
We have demonstrated an integrated controller that enables full-body push recovery
for humanoid robots without specialized sensors and actuators. Three low-level
biomechanically motivated push recovery strategies are implemented on a position
controlled humanoid robot, and integrated with a ZMP-based walk controller that
allows reactive stepping. Instead of relying on inaccurate theoretical decision surfaces,
we propose to use a low-dimensional empirical decision surface for a hierarchical
controller that is learned from repeated trials both in a simulated environment
using a multi-body model with proportional control joints, and in a real environment
using a servo-controlled moving platform and the DARwIn-OP small humanoid robot.
Experimental results show that the trained controller can successfully initiate a full
body push recovery behavior under external perturbations. Potential future work
includes incorporating more sophisticated learning algorithms to better utilize the
limited training data, and implementing these algorithms on full-sized humanoid
robots.
Acknowledgments
We acknowledge the support of the NSF PIRE program under contract OISE-0730206,
and the ONR SAFFIR program under contract N00014-11-1-0074.
References
1. A. G. Hofmann, Robust Execution of Bipedal Walking Tasks from Biomechanical Principles, Ph.D. Thesis, Computer Science Department (Massachusetts Institute of Technology, Cambridge, MA, USA, 2006), 407 pp.
2. S. Kajita and K. Tani, Study of dynamic biped locomotion on rugged terrain, in IEEE Int. Conf. Robotics and Automation (Sacramento, CA, 1991), pp. 1405–1411.
3. S. Kajita, F. Kanehiro, K. Kaneko, K. Fujiwara, K. Harada and K. Yokoi, Biped walking pattern generation by using preview control of zero-moment point, in IEEE Int. Conf. Robotics and Automation (2003), pp. 1620–1626.
4. K. Harada, S. Kajita, K. Kaneko and H. Hirukawa, An analytical method on real-time gait planning for a humanoid robot, in IEEE–RAS Int. Conf. Humanoid Robots, Vol. 2 (2004), pp. 640–655.
5. M. Morisawa, K. Harada, S. Kajita, K. Kaneko, F. Kanehiro, K. Fujiwara, S. Nakaoka and H. Hirukawa, A biped pattern generation allowing immediate modification of foot placement in real-time, in IEEE–RAS Int. Conf. Humanoid Robots (2006), pp. 581–586.
6. R. Tajima, D. Honda and K. Suga, Fast running experiments involving a humanoid robot, in IEEE Int. Conf. Robotics and Automation (Piscataway, NJ, USA, 2009), pp. 1418–1423.
7. J. Pratt, J. Carff and S. Drakunov, Capture point: A step toward humanoid push recovery, in 6th IEEE–RAS Int. Conf. Humanoid Robots (2006), pp. 200–207.
8. B. Stephens, Humanoid push recovery, in IEEE–RAS Int. Conf. Humanoid Robots (2007).
9. B. Jalgha, D. C. Asmar and I. Elhajj, A hybrid ankle/hip pre-emptive falling scheme for humanoid robots, in IEEE Int. Conf. Robotics and Automation (2011), pp. 1256–1262.
10. B.-K. Cho, S.-S. Park and J.-H. Oh, Stabilization of a hopping humanoid robot for a push, in IEEE–RAS Int. Conf. Humanoid Robots (2010), pp. 60–65.
11. M. Morisawa, F. Kanehiro, K. Kaneko, N. Mansard, J. Sol, E. Yoshida, K. Yokoi and J.-P. Laumond, Combining suppression of the disturbance and reactive stepping for recovering balance, in IEEE/RSJ Int. Conf. Intelligent Robots and Systems (IEEE, 2010), pp. 3150–3156.
12. K. Hirai, M. Hirose, Y. Haikawa and T. Takenaka, The development of Honda humanoid robot, in IEEE Int. Conf. Robotics and Automation, Vol. 2 (IEEE, 1998), pp. 1321–1326.
13. I.-W. Park, J.-Y. Kim, J. Lee and J.-H. Oh, Mechanical design of humanoid robot platform khr-3 (kaist humanoid robot 3: Hubo), in IEEE–RAS Int. Conf. Humanoid Robots (2005), pp. 321–326.
14. S. Kajita, T. Nagasaki, K. Kaneko, K. Yokoi and K. Tanie, A running controller of humanoid biped hrp-2lr, in IEEE Int. Conf. Robotics and Automation (2005), pp. 616–622.
15. T. Buschmann, S. Lohmeier and H. Ulbrich, Humanoid robot Lola: Design and walking control, J. Physiology-Paris 103(3–5) (2009) 141–148.
16. B.-K. Cho, J.-H. Kim and J.-H. Oh, Online balance controllers for a hopping and running humanoid robot, Adv. Robot. 25(9–10) (2011) 1209–1225.
17. S. Kajita, M. Morisawa, K. Miura, S. Nakaoka, K. Harada, K. Kaneko, F. Kanehiro and K. Yokoi, Biped walking stabilization based on linear inverted pendulum tracking, in IEEE/RSJ Int. Conf. Intelligent Robots and Systems (IEEE, 2010), pp. 4489–4496.
18. D. Gouaillier, C. Collette and C. Kilner, Omni-directional closed-loop walk for NAO, in IEEE–RAS Int. Conf. Humanoid Robots (2010), pp. 448–454.
19. I. Ha, Y. Tamura and H. Asama, Gait pattern generation and stabilization for humanoid robot based on coupled oscillators, in IEEE/RSJ Int. Conf. Intelligent Robots and Systems (2011), pp. 3207–3212.
20. V. Prahlad, D. Goswami and M.-H. Chia, Disturbance rejection by online ZMP compensation, Robotica 26 (2008) 9–17.
21. C. Graf and T. Röfer, A closed-loop 3D-LIPM gait for the Robocup standard platform league humanoid, in Fourth Workshop on Humanoid Soccer Robots (2010), pp. 18–22.
22. M. Abdallah and A. Goswami, A biomechanically motivated two-phase strategy for biped upright balance control, in IEEE Int. Conf. Robotics and Automation (2005), pp. 2008–2013.
23. S.-H. Lee and A. Goswami, Ground reaction force control at each foot: A momentum-based humanoid balance controller for non-level and non-stationary ground, in IEEE/RSJ Int. Conf. Intelligent Robots and Systems (2010), pp. 3157–3162.
24. B. Stephens, Push Recovery Control for Force-Controlled Humanoid Robots, Ph.D. Thesis (Pittsburgh, PA, USA, 2011), 180 pp.
25. H. Diedam, D. Dimitrov, P.-B. Wieber, K. Mombaur and M. Diehl, Online walking gait generation with adaptive foot positioning through linear model predictive control, in IEEE/RSJ Int. Conf. Intelligent Robots and Systems (2008), pp. 1121–1126.
26. B. Stephens and C. Atkeson, Modeling and control of periodic humanoid balance using the linear biped model, in IEEE–RAS Int. Conf. Humanoid Robots (2009), pp. 379–384.
27. D. L. Wight, E. G. Kubica and D. W. L. Wang, Introduction of the foot placement estimator: A dynamic measure of balance for bipedal robotics, J. Comput. Nonlinear Dynam. 3(1) (2008) 011009.
28. S.-K. Yun and A. Goswami, Momentum-based reactive stepping controller on level and non-level ground for humanoid robot push recovery, in IEEE/RSJ Int. Conf. Intelligent Robots and Systems (IEEE, 2011), pp. 3943–3950.
29. T. Sugihara, Standing stabilizability and stepping maneuver in planar bipedalism based on the best COM-ZMP regulator, in Proc. 2009 IEEE Int. Conf. Robotics and Automation (ICRA'09) (2009), pp. 669–674.
30. K. Nishiwaki and S. Kagami, High frequency walking pattern generation based on preview control of ZMP, in IEEE Int. Conf. Robotics and Automation (2006), pp. 2667–2672.
31. M. Missura and S. Behnke, Lateral capture steps for bipedal walking, in IEEE–RAS Int. Conf. Humanoid Robots (2011), pp. 401–408.
32. M. Missura and S. Behnke, Omnidirectional capture steps for bipedal walking, in IEEE Int. Conf. Humanoid Robots (2013), pp. 14–20.
33. S.-J. Yi, B.-T. Zhang, D. Hong and D. D. Lee, Learning full body push recovery control for small humanoid robots, in IEEE Int. Conf. Robotics and Automation (2011), pp. 2047–2052.
34. S.-J. Yi, B.-T. Zhang, D. Hong and D. D. Lee, Online learning of a full body push recovery controller for omnidirectional walking, in IEEE–RAS Int. Conf. Humanoid Robots (2011), pp. 1–6.
35. R. Renner and S. Behnke, Instability detection and fall avoidance for a humanoid using attitude sensors and reflexes, in IEEE/RSJ Int. Conf. Intelligent Robots and Systems (2006), pp. 2967–2973.
36. D. N. Nenchev and A. Nishio, Ankle and hip strategies for balance recovery of a biped subjected to an impact, Robotica 26(5) (2008) 643–653.
37. B. Jalgha and D. Asmar, A simple momentum controller for humanoid push recovery, in Advances in Robotics, Vol. 5744 (Springer, Berlin, 2009). Lecture Notes in Computer Science, pp. 95–102.
38. B. Hengst, M. Lange and B. White, Learning ankle-tilt and foot-placement control for flat-footed bipedal balancing and walking, in IEEE–RAS Int. Conf. Humanoid Robots (2011), pp. 288–293.
39. O. Michel, Webots: Professional mobile robot simulation, J. Adv. Robot. Syst. 1(1) (2004) 39–42.
40. S. G. McGill, J. Brindza, S.-J. Yi and D. D. Lee, Unified humanoid robotics software platform, in 5th Workshop on Humanoid Soccer Robots (2010), pp. 7–11.
41. S.-J. Yi, S. G. McGill, L. Vadakedathu, Q. He, I. Ha, J. Han, H. Song, M. Rouleau, B.-T. Zhang, D. Hong, M. Yim and D. D. Lee, Team THOR's entry in the DARPA robotics challenge trials 2013, J. Field Robot. 32(3) (2014) 315–335.
42. S.-G. McGill, S. Yi and D. D. Lee, Team THOR's adaptive autonomy for disaster response humanoids, in IEEE Int. Conf. Humanoid Robots (2015), pp. 453–460.
Seung-Joon Yi received the B.Sc. degree from the School of Electrical Engineering and the
Ph.D. degree from the School of Computer Science and Engineering, Seoul National University,
Seoul, Korea, in 2000 and 2013, respectively. He is currently a Postdoctoral Fellow at the
GRASP Laboratory, University of Pennsylvania, where he has also worked as a Visiting Scholar
from 2009–2013. He is the author of over 20 technical publications, proceedings, editorials
and books. He has been the main developer of the University of Pennsylvania RoboCup robotic
soccer team and the DARPA Robotics Challenge team. His research interests include
reinforcement learning and humanoid robotics.
Byoung-Tak Zhang received the B.Sc. and M.Sc. degrees in
Computer Science and Engineering from Seoul National Univer-
sity, Seoul, Korea, in 1986 and 1988, respectively, and the Ph.D.
degree in Computer Science from the University of Bonn, Bonn,
Germany, in 1992. He is currently a Professor with the School of
Computer Science and Engineering and the Graduate Programs
in Bioinformatics, Brain Science and Cognitive Science, SNU,
and directs the Biointelligence Laboratory and the Center for
Bioinformation Technology. Prior to joining SNU, he was a Research Associate with
the German National Research Center for Information Technology (GMD) from
1992 to 1995. From August 2003 to August 2004, he was a Visiting Professor with the
Computer Science and Artificial Intelligence Laboratory (CSAIL), MIT, Cambridge.
His research interests include probabilistic models of learning and evolution,
biomolecular/DNA computing, and molecular learning/evolvable machines.
Dennis Hong is an Associate Professor and the Founding Director
of the Robotics and Mechanisms Laboratory (RoMeLa) of the
Mechanical Engineering Department at Virginia Tech. His research
focuses on robot locomotion and manipulation, autonomous
vehicles and humanoid robots. His past awards include the
NSF CAREER Award, the SAE Ralph R. Teetor Award, and the ASME
Freudenstein/GM Young Investigator Award, and he has been
named to Popular Science's "Brilliant 10," to name a few. As the
inventor of a number of novel robots and mechanisms, he was called
"the Leonardo da Vinci of robots" by Washington Post magazine. He received his
degrees in Mechanical Engineering: a B.Sc. from the University of Wisconsin-Madison (1994),
and M.Sc. and Ph.D. degrees from Purdue University (1999, 2002).
Daniel D. Lee is currently a Professor in the School of Engineering
and Applied Science at the University of Pennsylvania.
He studied Physics, receiving his A.B. from Harvard in 1990, and
his Ph.D. in Condensed Matter Physics from MIT in 1995. After
completing his studies, he joined Bell Labs, the research and
development arm of Lucent Technologies, where he was a
Researcher in the Theoretical Physics and Biological Computation
departments. After six years in industrial research, he joined
the faculty at Penn in 2001 where he is currently in the Electrical and Systems
Engineering Department and at the GRASP Robotics Laboratory. His research
interests include machine learning, robotics and computational neuroscience.