combining optimal and neuromuscular controllers for agile ...cga/tmp-public/humanoid_main.pdfwork...

26
Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut Geyer and Chris Atkeson, Carnegie Mellon University Overview. A central conundrum in humanoid robotics is that manually designed, policy-based controllers are often more robust than model-based optimizing controllers, but model-based controllers more easily generalize to automated behavior generation with direct control of foot or hand placements. The overall goal of this proposal is to test the hypothesis that incorporating policy-based control as either a cost, a constraint, or a replacement within the model-based online optimal control framework produces controllers that combine the advantages of both approaches. Several research activities are performed to achieve this goal. A state-of-the-art model-based online opti- mal control framework and a neuromuscular control policy established by the researchers in previous work on human and humanoid locomotion serve as starting points. The three proposed variants of a hybrid ap- proach are formally developed involving theoretical research on nonlinear control, development of novel control algorithms, and advancement of neuromuscular control models. The resulting controllers are im- plemented and tested against state-of-art model-based control in simulation and hardware experiments with dierent humanoid robots available at the performing institution. The controllers will be evaluated with respect to locomotion speed, cadence, duty factor, as well as kinematics, kinetics and ground reaction force patterns; external push resistance and survival rate on rocky, sandy, muddy, slippery, and potentially mov- ing terrain; model parameter variations and artificial sensory noise; and cycle time of controller execution. These metrics address the human-likeness, robustness, and computational eciency of humanoid control. Intellectual Merit. The intellectual contributions of this research are: (1) A mathematical framework blending model-based and parametric policy-based control for whole body locomotion in robotics. The framework will be independent of the particular policy, although we focus on a neuromuscular policy as an example in this project. The framework may generalize to other scientific domains and problems which ben- efit from incorporating domain knowledge into optimization. (2) The experimental evaluation in simulation and humanoid robot testbeds of three specific approaches within this framework that focus on the policy as a cost, constraint, or replacement. (3) An understanding of how well these types of blended control generalize across robot platforms of dierent morphology and sensory and actuation modalities. (4) New hypotheses about human sensory-motor control of locomotion behaviors through the advancement of the neuromuscular model. An outcome confirming the overall hypothesis could trigger a paradigm shift in control approaches to robust behavior in robotics, and other agile vehicles and machines. Broader Impacts. This research will lead to more useful robots. A successful outcome will enable practical controllers for robust locomotion of legged robots in uncertain environments with applications in disaster response and elderly care. The project may generate new insights into human sensorimotor control of behavior, which can lead to better prostheses and exoskeletons and more eective approaches to fall risk management. The proposed project integrates research into education and mentoring by directly training graduate and undergraduate students and by incorporating research outcomes into course work. The project strengthens the infrastructure for research and education by helping to maintain the robot testbeds available for research and education. The project supports the broad dissemination of research results by publishing and freely providing simulation and implementation software. We also expect unplanned broader impacts such as appearing in the news, on TV shows, and in movies. Unplanned broader impacts from previous work include our technologies being demonstrated on entries in the DARPA Robotics Challenge, a graduate student being a participant on a Discovery Channel TV series, “The Big Brain Theory: Pure Genius”, a graduate student assisting a group of all female high school students in the robot FIRST competition, and a robot character in a Disney movie being inspired by our work on soft robots (Baymax in Big Hero 6). Key words: humanoid robotics, robust locomotion, model-based control, neuromuscular control 1

Upload: others

Post on 13-Oct-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior

Hartmut Geyer and Chris Atkeson, Carnegie Mellon University

Overview. A central conundrum in humanoid robotics is that manually designed, policy-based controllersare often more robust than model-based optimizing controllers, but model-based controllers more easilygeneralize to automated behavior generation with direct control of foot or hand placements. The overallgoal of this proposal is to test the hypothesis that incorporating policy-based control as either a cost, aconstraint, or a replacement within the model-based online optimal control framework produces controllersthat combine the advantages of both approaches.

Several research activities are performed to achieve this goal. A state-of-the-art model-based online opti-mal control framework and a neuromuscular control policy established by the researchers in previous workon human and humanoid locomotion serve as starting points. The three proposed variants of a hybrid ap-proach are formally developed involving theoretical research on nonlinear control, development of novelcontrol algorithms, and advancement of neuromuscular control models. The resulting controllers are im-plemented and tested against state-of-art model-based control in simulation and hardware experiments withdifferent humanoid robots available at the performing institution. The controllers will be evaluated withrespect to locomotion speed, cadence, duty factor, as well as kinematics, kinetics and ground reaction forcepatterns; external push resistance and survival rate on rocky, sandy, muddy, slippery, and potentially mov-ing terrain; model parameter variations and artificial sensory noise; and cycle time of controller execution.These metrics address the human-likeness, robustness, and computational efficiency of humanoid control.

Intellectual Merit. The intellectual contributions of this research are: (1) A mathematical frameworkblending model-based and parametric policy-based control for whole body locomotion in robotics. Theframework will be independent of the particular policy, although we focus on a neuromuscular policy as anexample in this project. The framework may generalize to other scientific domains and problems which ben-efit from incorporating domain knowledge into optimization. (2) The experimental evaluation in simulationand humanoid robot testbeds of three specific approaches within this framework that focus on the policy as acost, constraint, or replacement. (3) An understanding of how well these types of blended control generalizeacross robot platforms of different morphology and sensory and actuation modalities. (4) New hypothesesabout human sensory-motor control of locomotion behaviors through the advancement of the neuromuscularmodel. An outcome confirming the overall hypothesis could trigger a paradigm shift in control approachesto robust behavior in robotics, and other agile vehicles and machines.

Broader Impacts. This research will lead to more useful robots. A successful outcome will enablepractical controllers for robust locomotion of legged robots in uncertain environments with applications indisaster response and elderly care. The project may generate new insights into human sensorimotor controlof behavior, which can lead to better prostheses and exoskeletons and more effective approaches to fall riskmanagement. The proposed project integrates research into education and mentoring by directly traininggraduate and undergraduate students and by incorporating research outcomes into course work. The projectstrengthens the infrastructure for research and education by helping to maintain the robot testbeds availablefor research and education. The project supports the broad dissemination of research results by publishingand freely providing simulation and implementation software. We also expect unplanned broader impactssuch as appearing in the news, on TV shows, and in movies. Unplanned broader impacts from previouswork include our technologies being demonstrated on entries in the DARPA Robotics Challenge, a graduatestudent being a participant on a Discovery Channel TV series, “The Big Brain Theory: Pure Genius”, agraduate student assisting a group of all female high school students in the robot FIRST competition, and arobot character in a Disney movie being inspired by our work on soft robots (Baymax in Big Hero 6).

Key words: humanoid robotics, robust locomotion, model-based control, neuromuscular control

1

Page 2: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

PROJECT DESCRIPTION

1 Objectives

Motivation. A central conundrum in robotics is that manually designed parametric policy-based controllers(e.g. [102, 52, 160, 101, 41, 140, 92, 122]) are often more robust than model-based optimizing controllers,but model-based controllers (e.g. [66, 87, 104, 105, 111, 55, 56, 49, 106, 26, 34, 65, 141, 95, 103, 76, 93, 89,71, 73, 162, 26, 17, 18, 59, 72]) better generalize to automated behavior generation, which requires directcontrol of foot or hand placement for visually guided locomotion or the use of handholds, canes, and otherwalking aids. We believe that finding a way to combine the advantages of both approaches will lead tohuman levels of performance and robust locomotion in humanoid robots, give us a better understanding ofthe science of locomotion, and potentially lead to new insights into human motor control.

Research Question and Proposed Solution. Our motivation translates into a specific research question:Can these two approaches be combined within one formal framework such that the resulting hybrid approachshares the human-likeness, human levels of performance, and robustness of policy-based control with theversatility of model-based optimizing control? We propose as a solution to this question the use of policy-based control as either a cost, a constraint, or a replacement within the model-based online optimal controlframework. We hypothesize that each of these variants offers different advantages and shortcoming withrespect to human levels of performance, robustness, and computational efficiency of humanoid control.

As an example of the model-based controller, we will focus on a hierarchical optimization-based con-troller developed by Atkeson’s group [125, 143, 38, 35, 36, 37, 154, 156] (Sec. 3). This controller providesdirect control of foot and hand placement and steering. It can be combined with a vision system, and canautomatically generate a variety of behaviors. For instance, we have demonstrated visually guided roughterrain locomotion, getting out of a vehicle, opening and moving through a door. manipulating a valve anda tool (drill), cutting a hole in a wall, operating a large circuit breaker switch, inserting a peg into a hole,picking up debris, walking through debris, climbing stairs, and we have added the use of hands for climbinga ladder with the ATLAS robot [36, 11, 10, 79, 37]. In the DARPA Robotics Challenge Finals our robotwas the only biped that did not fall down or need to be rescued by humans during the two days of theevaluations [9]. Our robot performance was also the most consistent across the two days.

As an example of the policy-based control, we will focus on a neuromuscular model of human controlcreated by Geyer’s group [41, 120, 118, 122] (Sec. 4). The model presents an attractive candidate policythat is inherently human-like, comparably robust to disturbances, and that is less demanding in terms of stateestimation (for most joints only position is required and not velocity, for example).

Specific Objectives. Our specific objectives are to formally develop three variants of a hybrid approachand to evaluate their performance in simulation and hardware experiments with different humanoid robots(Fig. 1) (Sec. 5). The metrics according to which we will evaluate the performance include locomotionspeed, cadence, duty factor, as well as kinematics, kinetics and ground reaction force patterns (human levelsof performance), external push resistance and survival rate on rocky, sandy, muddy, slippery, and potentiallymoving terrain (robustness to external perturbations), model parameter variations and artificial sensory noise(robustness to internal uncertainty), and cycle time of controller execution (computational costs). For thehardware experiments, our groups are equipped with humanoid robots of different complexity (ATRIAS andSARCOS Primus), and we have access to another humanoid with different actuation and sensing capabilities(COMAN) through our collaboration with Disney Research Pittsburgh. This access to a range of hardwareplatforms will allow us to evaluate how well the observed outcomes generalize across humanoids.

Intellectual Merit. The expected contributions of the proposed research are: (1) A mathematical frame-work blending model-based and parametric policy-based control for whole body locomotion in robotics.The framework will be independent of the particular policy, although we focus on a neuromuscular policyas an example in this project. (2) The experimental evaluation in simulation and humanoid robot testbeds ofthree specific approaches within this framework that focus on the policy as a cost, constraint, or replacement.(3) An understanding of how well these types of blended control generalize across robot platforms of dif-

1

Page 3: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

Figure 1. Robot platforms that we work with. (a) CMU’s ATRIAS compliant electrical humanoid with boom constraint forsagittal plane locomotion. ATRIAS is 3D capable and can be taken off the boom. (b) CMU’s SARCOS Primus hydraulichumanoid. (c) ATLAS hydraulic humanoid used by Atkeson’s group crossing rubble field in the DARPA Robotics Challenge.(d) COMAN compliant electrical humanoid. To evaluate our control hypotheses of this proposal, we will use ATRIAS, SARCOS(equipment in PIs labs), and COMAN (through collaboration with Disney Research).

ferent morphology and sensory and actuation modalities. (4) New hypotheses about human sensory-motorcontrol of locomotion behaviors through evaluating and refining the neuromuscular policy.

Significance. These contributions can have significant intellectual and societal impact. Although hu-manoid robots are considered key technological platforms with applications in disaster response and elderlycare, robust humanoid locomotion remains a research topic in a nascent stage. Successfully developinga mathematical framework that blends model-based online optimal control and policy-based control, andcombines the advantages of both individual control approaches, could trigger a paradigm shift in generatingrobust locomotion behaviors in robotics with an impact on future robotics technology including humanoids,prosthetics, and exoskeletons. In addition, the mathematical framework may generalize to other scientificdomains and problems which benefit from incorporating domain knowledge into optimization.

2 Related Work

Model-based Control in Humanoid Robotics. Force-controlled humanoid robots are often controlled us-ing models of the robot and environment dynamics. These computed torque approaches are generalized tofloating base inverse dynamics [66, 87, 104, 105, 111, 55, 56, 49, 106, 26, 34, 65, 141, 95, 103, 76, 93, 89,71, 73, 162, 26, 17, 18, 59, 72], which enable the robot to track the behavior of an underlying, simplifieddynamics model. In legged locomotion, common underlying models are, for instance, the linear invertedpendulum model used in humanoid robots like ASIMO, HRP-4 and HUBO [63, 50, 64, 131, 21], or the zerodynamics manifold for bipedal robots like RABBIT, MABEL, and AMBER, controlled within the hybridzero dynamics framework [20, 142, 1, 123]. The model-based control paradigm is mathematically elegant,and the corresponding locomotion problem has been solved in theory with optimal control techniques thatautomatically generate behavior [60, 13, 90, 132, 135]. In practice, however, the paradigm requires estimat-ing the states and dynamic parameters of a robot with high accuracy, and humanoids struggle to demonstraterobust dynamic locomotion [37, 154, 156].

Parametric Policy-Based Controllers. Parametric policy-based controllers for legged locomotion havebeen developed in the robotics, computer graphics, and neuromechanics communities [102, 52, 160, 101,41, 92, 140]. These policy-based controllers capture explicit domain knowledge, acquired either by intuitionor by a more formal and time-consuming analysis of individual locomotion behaviors. The Raibert robots inparticular demonstrate robust locomotion while navigating rough terrain [101, 92, 31]. In addition, simula-tion experiments demonstrate that policy controlled models of humans and humanoids can withstand com-parably large disturbances including unexpected ground changes and pushes to the body [140, 121, 39, 122].

One motivation for investigating parametric policies is the observation that, in animals and humans, de-centralized neural control circuits (central pattern generators and muscle reflex modules) at the level ofthe spinal cord seem to contribute substantially to the generation and robustness of legged locomotion

2

Page 4: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

Figure 2. Controller architecture based on hierarchical optimization.

Figure 3. ATLAS robot practicing for the terrain task of the DRC. Photo snapshots taken every 5 seconds. Measuredtrajectories in the transverse (a) and sagittal planes (b) of the left (red) and right foot (green) and the center of mass (CoM,blue) are shown. The robot walks in a straight line along y = 0 in reality. Our state estimator at that time drifted significantly.This has been fixed by adding vision-based SLAM.

[115, 19, 46, 29, 97, 54, 40, 23]. It is believed that imitating these types of parametric policy-based controlwill help machines to match the agility and robustness of animals [114, 161, 91].

Hybrid Approaches. Very few attempts have been made to combine the benefits of both types of controlapproaches in humanoid robotics. The control overview reported for the Petman humanoid [92] seemsclosest in spirit, although the paper does not reveal implementation details. Specifically, during stance thecontrol regulates global goals like the height and orientation of the body similar to other implementations[63, 100, 38]. Desired behaviors also include target trajectories for the pelvis, back and arm motions, whichcan be interpreted as costs that improve the human-likeness of behavior. A force distribution algorithm isthen used to best achieve the necessary leg forces that track the different behaviors simultaneously.

We are not aware of a formal framework or analysis of a hybrid approach to humanoid control whichseamlessly blends model-based online optimal control and policy based control.

3 Control Strategy using Model-Based Hierarchical Optimization

Figure 2 shows the architecture of our robot control based on hierarchical optimization. It was successfullytested on our SARCOS and ATLAS platforms (Fig. 1b,c). Our ATLAS robot demonstrated rough terrainwalking, ladder climbing, and full body manipulation in the DARPA Robotics Challenge (DRC) Trials of2013. This work also received a “Best Oral Paper Award” at Humanoids 2013 [38]. Figure 3 shows asequence of snapshots of ATLAS walking on rough terrain and plots of measured trajectories of the feetand the center of mass (CoM) for the DRC Trials. Figure 4 shows similar data for ladder climbing. In theDARPA Robotics Challenge Finals in 2015 our robot was the only biped that did not fall down or need tobe rescued by humans during the two days of the evaluations [9]. Our robot performance was also the mostconsistent across the two days. Because of the way the evaluations were run, we have only limited data fromthe actual evaluations. Videos of our robot’s performance during the DRC Finals are available [8].

In the controller architecture, the Behavior Library (Fig. 2) contains learned behaviors and behaviorscreated from demonstration or motion capture of humans or other robots, and also acts as a cache forplanning. The output of the library is used to prime, speed up, and guide the real time optimization process,and can be motion or force trajectories in joint or task coordinates (xd(t)), as well as possibly includingvalue, heuristic cost, or utility functions (Vx(t)) that can be used to guide search and optimization. Libraries

3

Page 5: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

Figure 4. ATLAS climbing the top half of the same ladder as used in the DRC. Photo snapshots taken every 13 seconds.The top row shows repositioning of the hook hands, and the bottom row shows stepping up one tread. The two plots showATLAS climbing the first five treads during the actual run at the DRC Trials. Axis definitions and color codes for foot and CoMtrajectories are the same as in Fig. 3. Left and right hand positions are plotted with cyan and magenta dashed lines.

can be built in advance, or when task specification are known. The rest of the controller can also operatewithout information from the library if suitable information is not available.

The Multistep Low-Order Dynamics Planner uses receding horizon control to continually optimizetrajectories of motion and force of a simplified model of the current task. In walking, this is usually centerof mass (COMd(t)), angular momentum (Ld(t)), contact force (contactsd(t)) trajectories, and value, heuristiccost, or utility functions (Vc(t)) specifying tradeoffs between these quantities. We optimize with a timehorizon of several seconds using Differential Dynamic Programming [58].

For walking, our planner is similar in spirit to Kajita’s Preview Control for a Linear Inverted PendulumModel (LIPM) [62]. We are also using a CoM model, reason about Zero Moment Point (ZMP), and usefuture information to guide the current trajectory. However, our planner can be generalized to nonlinearmodels and adjust foot steps while optimizing the CoM trajectory. Like capture point methods [99], we takethe next few steps into consideration but do not plan to come to rest at the end. Ogura et al. [94] investigatedgenerating human-like walking with heel-strike and toe-off by parameterizing the swing foot trajectory. Incontrast, we use very simple rules to guide the low level controller to achieve the same behaviors.

The Full Body QP Optimization uses quadratic programming (QP) with a very short time horizon (1mson SARCOS and 2ms on ATLAS) to optimize the desired full state of the robot (θθd, θθd), joint torques (ττd),and contact forces (fd). Currently, we pre-specify gain matrices (Kθ, Kθ, Kτ, Kf) for the robot controller,but plan to explore how to optimize gains as well as trajectories. This module enforces constraints suchas robot kinematics and dynamics and joint, velocity, torque, actuator, friction cone, and center of pressurelimits, and resolves redundancies. The module performs inverse kinematics and dynamics to provide us withcompliant motions and dynamic behaviors, while compensating for the effects of modeling errors.

For the QP, we use a formulation developed in our group [125, 144, 143]. Unlike [87, 104] who useorthogonal decomposition to project the allowable motions into the null space of the constraint Jacobian,and minimize costs in the contact constraints and the commands, we directly optimize a quadratic cost interms of state accelerations, torques and contact forces on the full robot model. This design choice allowsus to trade off physical quantities of interest. We are also able to directly reason about inequality constraintssuch as center of pressure within the support polygon, friction, and torque limits. Although it becomes abigger QP problem, we are still able to solve it in real time. Hutter el al. [55] resolved redundancy in inversedynamics using a kinematic task prioritization approach that ensures lower priority tasks always exist in thenull space of higher priority ones. In contrast to their strictly hierarchical approach, we minimize a sum ofweighted terms. We can directly specify the relative importance of the terms by adjusting the weights.

The Low-Level Robot Controller is a blend of our work and controllers provided by the robot manu-facturers. For SARCOS and ATLAS, hydraulic valve commands for each joint i are generated by high rate

4

Page 6: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

Figure 5. Neuromuscular control policy. (a) Example walking on ground with large unexpected height changes. Refer to textfor details on model structure (i-iv). (b) Robustness to external disturbances. Survival rate on terrain with unexpected stepsand resistance to unexpected pushes. (c) Robustness to internal disturbances. Black trace: trunk pitch during level groundwalking. Red trace: disturbed pitch signal seen by control policy (15ms time delay, 5deg/s uniform random noise).

servo controllers provided by the manufacturers,

vi = kθ,i(θi − θd,i) + kθ,i(θi − θd,i) + kτ,i(τi − τd,i) + v f f ,i , (1)

where vi is the valve command, θi is the joint angle, θi is the joint angular velocity, τi is the joint torque, andv f f ,i is a feedforward valve command. Subscripts d indicate desired values, and quantities denoted with kare scalar gains. This joint level servo runs at 5kHz on SARCOS and at 1kHz on ATLAS.

We augment this independent joint control by coupling joints and providing force control based onforce/torque sensors in the wrists, ankles, and skin of the robot,

v = Kθ(θθ − θθd) + Kθ(θθ − θθd) + Kτ(ττ − ττd) + Kf(f − fd) ,+v f f (2)

where v, θθ, θθ, ττ, and v f f are the corresponding vector quantities, and the K are matrix gains coupling alljoints. f are sensed contact forces. The coupled controller runs at 1kHz on SARCOS and 500Hz on ATLAS.

Figure 2 does not include our state estimator, which provides appropriate state information to all othermodules. The details of this work on state estimation are described in [156, 154].

4 Control Strategy using Neuromuscular Policy

Figure 5 shows the architecture of our human locomotion model controlled by a modular neuromuscularcontrol policy. We have shown in previous work that this model successfully captures human-like kinemat-ics, kinetics, and ground reaction forces [41, 120, 122], is robust to comparably large external and internaldisturbances [41, 119, 120, 118, 121, 122] (Fig. 5b,c), and shows a range of human locomotion behaviors[122] (Fig. 6). Other researchers have confirmed and furthered these results [140, 39, 32, 30]. We also haveapplied individual modules of the neuromuscular policy to the control of prosthetic limbs [33, 134, 133],leading to the first powered ankle prosthesis commercially available to transtibial amputees [48, 15], and tonew controllers for balance recovery of transfemoral amputees [134, 133].

The Musculoskeletal Dynamics of the model are represented with 8 segments connected by revolutejoints (Fig. 5a, i) that are spanned by 18 Hill-type muscles (Fig. 5a, ii). The muscles generate net torquesabout the joints they span. Each muscle’s force is computed based on a Hill-type model [137] with seriesand parallel elasticities connected to a contractile element (Fig. 5a, iii). The contractile element’s forceFce = AFmax fl fv depends on mechanical muscle properties described by the maximum isometric force Fmaxand the nonlinear force-length and force-velocity relationships ( fl and fv), as well as on the muscle activationA. Muscle activations are low-pass filtered stimulations S generated by a model of human neural control.

5

Page 7: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

Figure 6. Examples of steady and transient locomotion behaviors that the neuromuscular policy can generate [122].

The Neural Control (Figs. 5, iv) of the model almost exclusively represents spinal reflex circuits thatare physiologically plausible and embed major functions of legged dynamics and control as local feedbackcontrol modules [41, 122]. For example, the compliant leg behavior important to legged dynamics in stance[16, 86, 44] is realized with simple positive force reflexes of the anti-gravity muscles, and their stimulation ismodeled as S (t) = S 0+GFce(t−∆t), where S 0 is a reference membrane potential at the α-motoneuron, G is asynaptic gain, and ∆t describes a neural time delay (up to 20ms) [42]. Other such functional control modules,for instance, ensure that the segmented legs do not buckle in stance and robustly reach foot placement targetsin swing. In total, ten functional modules have been identified in our neuromuscular control work [122].Together they form the neuromuscular control policy.

Our approach to modeling neural control stands in contrast to more common approaches which emphasizecentral pattern generation [130, 128, 129, 47, 70, 114, 91]. An advantage that our approach provides is thatit links physiology to physics-based function, making the contribution of individual control modules clearand transferrable to rehabilitation and humanoid robotics.

The Neuromuscuclar Control Policy (NMP) represents only one example of several control policieswhich we could consider for a hybrid approach that blends model-based online optimal control and policybased control (compare Sec. 2). In fact, the main ideas behind the approaches we propose to investigate donot hinge on the particular policy. However, the NMP provides several advantages that make it an attractivecandidate policy. It is inherently human-like, does not depend on a high-fidelity model of the full systemdynamics, and is comparably robust to disturbances (Fig. 5). In addition, it can express a range of steadyand transient locomotion behaviors including slope and stair negotiation, walking and running, acceleratingand decelerating, turning and obstacle avoidance (Fig. 6).

5 Results from Prior NSF Support

5.1 Related to Model-Based Hierarchical Optimization

(a) NSF award: IIS-0964581 (PI: Hodgins, Atkeson co-PI); amount: $699,879; period: 7/1/10 - 6/30/14.(b) Title: RI: Medium: Collaborative Research: Trajectory Libraries for Locomotion on Rough Terrain.(c) Summary of Results: This grant has supported work on a variety of approaches to controlling humanoidrobots based on trajectory libraries. Section 3 focused on the hierarchical optimization approach, whichforms one starting point for the proposed work and received a “Best Oral Paper Award” at Humanoids 2013.

Intellectual Merit. (1) Identification of a hierarchical approach to online optimal control of behavior,as described above. (2) Algorithms for training robust policies/controllers using multiple models, form-ing the basis for a new approach to robust learning and for algorithms greatly speeding up reinforcementlearning. (3) Simple learning approaches to concisely represent and accurately predict optimal trajectories.(4) Globally optimal control of instantaneously coupled systems (ICS). (5) A new paradigm, “informed pri-

6

Page 8: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

ority control”, which coordinates multiple sub-policies to faster design complex control systems. (6) Stateestimation for mobile and humanoid robots with “floating body” dynamics.

Broader Impacts. We are working toward more useful robots and knowledge that may help with a sig-nificant social problem, why people fall and injure themselves. We have coordinated our outreach activitieswith the larger outreach efforts of CMU’s NSF Engineering Research Center on Quality of Life Technologyto scale up reach and effectiveness. Our technologies are being shared by being published, and papers areavailable electronically. The technologies were demonstrated on entries in the DARPA Robotics Challenge.We have created and teach relevant courses, including an undergraduate course on humanoid robotics anda graduate course on optimization of behavior. Our work is being used in Disney Research and throughthis technology transfer will eventually be used in entertainment and education applications, and will beavailable to and inspire the public. A graduate student participated on a Discovery Channel TV series, ”TheBig Brain Theory: Pure Genius”. One purpose of the TV series is getting people excited about engineering.A graduate student supported a group of all female high school students in the robot FIRST competition. Arobot character in a Disney movie was inspired by our work on soft robots (Baymax in Big Hero 6) [7].

Development of Human Resources. The project involved five PhD graduate students (four studentshave graduated) and four postdocs. We had weekly meetings, weekly lab meetings, and we individuallymentored all participants. All participants actively did research, made presentations to our group, gaveconference presentations, gave lectures in courses. The students served as teaching assistants.(d) Publications resulting from this NSF award: [158, 88, 3, 74, 157, 108, 127, 124, 126, 153, 163, 117, 107,4, 164, 6, 69, 145, 155, 53, 80, 2, 143, 125, 144, 153, 146, 67, 38]. This award led to further work supportedby DARPA: [35, 36, 37, 154, 156, 11, 10, 79, 9].(e) Other research products: We have made our motion data available in the CMU motion capture database [51].(f) Renewed support. This proposal is not for renewed support.

5.2 Related to Neuromuscular Policy

(a) NSF award number: EEC-0540865; amount: $29,560,917; period of support: 6/1/06 - 5/31/15.(b) Title: NSF Engineering Research Center on Quality of Life Technology (PI: Siewiorek).(c) Summary of Results: This NSF Engineering Research Center supported during the period 6/2010-5/2013a research project (lead H. Geyer) on swing leg control in the NMP. We only report on this part of the award.

Intellectual Merit. (1) Identification of a policy-based swing leg controller that robustly places the footinto targets on the ground [27]. (2) Identification of plausible spinal reflex circuits that embed this swingcontrol in an NMP. The NMP generated resulting muscle activations that match human muscle activationsin walking and running, providing evidence for a similar control in humans [28, 122].

Broader Impacts. The project has supported our work toward control algorithms in rehabilitation robotics,which may help people who depend on legged assistance to achieve greater mobility and quality of life. Thecommercially available, powered ankle prosthesis [33, 15] is a testament to the social impact that this workcan have. We currently expand on this path focusing on the control of active knee prostheses for balancerecovery in amputee gait after large disturbances [134, 109, 133]. Our results are being shared by beingpublished, and papers are available electronically. The results have been integrated into the curriculum of agraduate course on biomechanics and motor control.

Development of Human Resources. The project trained two graduate students, who presented their re-sults at conferences (ROBIO 2012, ICRA 2013 and EMBC 2013). The graduate students have completedtheir Master’s degree and won prestigious scholarships (R. Desai: Siebel and Google Anita Borg scholar-ships) during tenure with the project. They are continuing their training and education within the RoboticsInstitute’s PhD program.(d) Publications resulting from this NSF award: [27, 28, 120, 118, 121, 122].(e) Other research products: None.(f) Renewed support. This proposal is not for renewed support.

7

Page 9: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

MultistepLow-Order Dynamics

Planner

Approach 1:Neuromuscular Policy

integrated via Cost Terms

θd, Kθ

BehaviorLibrary

Full BodyQP

Optimization

Low-LevelRobot

Controller

θd, Kθ

τd, Kτfd, Kf

COMd(t)Ld(t)

contactsd(t)Vc(t)

xd(t)Vx(t)

Approach 2:Neuromuscular Policy

integrated via Constraint Terms

Approach 3:Neuromuscular Policy

as Replacement

(i) (ii)(iii)

Figure 7. Key concepts for hybrid approaches blending optimal and policy-based control in humanoid locomotion. Theneuromuscular policy control acts via cost terms (1), via constraint terms (2), or by replacement (3). The three approaches tointegration can be applied at different stages (i-iii) of the model-based optimization control hierarchy.

6 Research Plan

Both the model-based optimization approach and the policy-based approach have distinct strengths andweaknesses. The policy-based approach generates quite robust locomotion while being insensitive to kine-matic singularities, joint and actuator limits, and model parameters in general, which are all concerns thatmake it difficult to apply controllers from the model-based optimization approach on humanoid robots. Onthe other hand, the policy-based approach does not generalize well to automated behavior generation, whichrequires direct control of foot or hand placement for visually guided locomotion or the use of handholds,canes, walkers, and other walking aids. In this domain, the model-based online optimal control approachshines. To understand how these two approaches can be combined to get the best of each, we propose toinvestigate three distinct hybrid approaches.

6.1 Hybrid approaches that we will pursue

Our key concepts for hybrid approaches are to use the neuromuscular policy as either a cost, a constraint, ora replacement (Fig. 7). Each of these ideas can be applied to different stages of the model-based optimizationwith distinct advantages and shortcomings.

Hybrid Approach 1: Policy as Cost Terms. The concept of the first hybrid approach is to use the controlmodules (CM) of the neuromuscular control policy (NMP) as terms in the cost function at different stagesof the hierarchical optimization control,

C =∑

i

Ci +∑

k

CCM,k, (3)

where C is the total cost, Ci are the i original cost terms, and CCM,k are k new cost terms that bias thecontroller to imitate different modules that make up the overall NMP. We can alter the costs in each stage(i, ii, or iii) or apply the same or different cost functions in some combination of the stages. The generaladvantage of this approach is that it is simple to implement in the state-of-art control frameworks of hu-manoid controllers [159]. The NMP simply acts through bias terms in existing optimization stages. Themain shortcomings of this approach are that it maintains a high demand on the quality of sensory signalsand of the robot-environment dynamics model, M ¨q+ H = S τ+ JT λ, and that the computational speed of theoptimization remains slow as no dimensionality reduction occurs. We hypothesize that the NMP bias willimprove a humanoid robot’s performance including, for example, more human-like step lengths and timingin locomotion. However, we do not anticipate substantial improvements on locomotion robustness.

Hybrid Approach 2: Policy as Optimization Constraints. Instead of a cost term, the second approachintroduces the NMP as constraint terms in the hierarchical model-based optimization,

min C s.t. fi(x) = 0, g j(x) > 0, (4)

8

Page 10: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

where x = [q ˙q]T . Several individual functions of the limbs are common in locomotion. One exampleis compliant leg behavior, which sometimes leads to buckling [113]. One control module of the NMPcounteracts this tendency to buckle [41, 122] (Sec. 4). Including such common limb functions (domainknowledge) as constraints instead of as costs ensures that the optimization respects known danger zones thatthreaten locomotion stability. For instance, the knee is particularly prone to buckling, and a torque constraint

fi(x) = τk − τCM,b (5)

in the full-body QP stage of the hierarchical model-based control (Fig. 7, ii) could ensure that the kneetorque τk follows the counteraction torque τCM,b produced in the neuromuscular policy to prevent buckling.Besides this main advantage, the second approach can also lead to partial dimensionality reduction and arelated computational speed improvement if the basis for the optimization is chosen to automatically fulfillNMP constraints. The reliance on a quality dynamics model of the robot remains a shortcoming of thisapproach. In addition, the implementation of the NMP as constraints will be technically more challenging.We hypothesize, however, that it improves over the first approach with more stable and robust behaviors.

Hybrid Approach 3: Policy as Replacement for Optimization. The third approach directly replacesoptimization by the complete NMP. The motivation behind this concept is that the NMP requires much lesssensory information and model precision than the dynamics model in the model-based optimization (Fig. 7).For instance, replacing the QP optimization (ii in Fig. 7), the NMP acts as a black box, virtual model thatreceives task level inputs xgoal from the dynamics planner (i) and generates desired joint torques

τcmd = f (q, ˆFcnt,ˆθ f loat, xgoals) (6)

for the robot from the current positions q of the robot’s joints, a gross estimate ˆFcnt of the left and rightlegs’ contact forces, and an estimate of the trunk orientation ˆθ f loat in the world. The NMP does not requireprecise measurements of environment interaction (λ and JT ) or joint velocity estimates ( ˙q). In addition,replacement with the NMP enables very fast computation as the dimensionality of the optimization problemreduces to that of the low dimensional dynamics planner. However, this approach can only be appliedwhen the topologies match between the NMP and the target robot. Another major shortcoming is that therobots’ behavioral versatility is limited to that of the NMP, which furthermore must be accessible with afew high level inputs xgoal. We hypothesize that the third approach will be the least sensitive to unmodeledrobot-environment dynamics for the behaviors that the NMP can address.

Robustness and Stability of The Hybrid Approaches. We will use numerical methods to assess therobustness and stability of our complex nonlinear high-dimensional hybrid controllers in simulation andimplemented on robots [116, 82, 75, 77, 152, 139]. We are interested in identifying point attractors, limitcycles, their region of attraction, situations with multiple attractors and domains of attraction, situationswhere a system oscillates due to unmodeled dynamics or parameter change, and situations where a systementers an unacceptable region of the state space such as falling down. For regulation tasks (constant desiredposition or velocity), we will empirically measure the local linear dynamics and estimate eigenvalues ofthe full system as well as gain and phase margins both in simulation and on the robots in the region of thefixed point. In simulation, we will also measure the sensitivity of eigenvalues and gain and phase margins tosystem and controller parameter variations. Given random parameters, we will estimate probability densityfunctions for eigenvalues and gain and phase margins. In the presence of discontinuities in dynamics, suchas impacts, we will use a describing function approach to apply linear stability analysis techniques [147].For limit cycles, we will empirically measure local fast linear dynamics in the vicinity of the limit cycleat various phases of the cycle, and estimate eigenvalues of the local fast dynamics. We will use Poincaresections to estimate longer term and more global dynamics around the entire limit cycle. We will alsodo sensitivity analyses of the perturbation dynamics to system and controller parameter variations. Wewill calculate Lyapunov functions for the controlled systems in simulation using dynamic programmingtechniques. Given a controller, we can use policy iteration to calculate a value function, which can serveas a numerical Lyapunov function. This process is orders of magnitude faster than computing an optimal

9

Page 11: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

Figure 8. Preliminary results on neuromuscular policy as replacement [12]. (a) Adaptation of neuromuscular model to ATRIAStopology. (b) Robot negotiating rough terrain (upto ± 7cm) in testbed simulator using swing leg targets (xgoal). Arrow indicatesdirection of motion. (c) Center of mass and foot point trajectories at about 6m into the motion of the robot around the boom.

control using value iteration (often thought of as traditional dynamic programming). The calculated valuefunction can also be used to estimate a viability kernel or the stable region of the controlled system.

6.2 Research activities that we will perform

The starting points for the proposed research are our model-based optimization control framework (de-scribed in section 3) and neuromuscular control policy (described in section 4). The proposed researchactivities detail the intellectual steps that will take us from these starting points to the evaluation of ourhypotheses about hybrid approaches to humanoid control.

Activity 1: Adapt Realtime Optimal Control Framework to ATRIAS and COMAN. Our existingmodel-based control framework is implemented in realtime in both the SARCOS Primus and ATLAS con-trollers [125, 143, 36, 37, 156, 154]. We will adapt this framework to the ATRIAS and COMAN platforms.The adaptation provides new intellectual challenges. ATRIAS and COMAN have physically compliant, se-ries elastic actuators (SEAs) [98], and it remains open how these additional states should be treated in therealtime optimization framework. We will investigate two directions, either keeping the low level controlof the SEAs separate from optimization or integrating their states into the optimization. From a theoreticalstandpoint, the second version should provide superior control performance.

Activity 2: Develop Hybrid Approaches with NMP as Costs or Constraints. Using the NMP as costsor constraints raises two research questions: Which cost or constraint terms of the NMP maximize controlperformance? At what stage in the control framework (i-iii, Fig. 7) should the NMP be integrated? We willaddress the first question by investigating the effect of different cost or constraint terms. In particular, weplan to focus on terms using torques, accelerations, and contact forces. For instance, JCM,i = τT

CM,iWττCM,i +

¨qTCM,iW ¨q ¨qCM,i would generate a cost biasing torques and accelerations of the optimization toward control

module CMi of the NMP. Ideas that we will pursue include the optimization of the cost term weights to findthe most robust controller and the ordering of the constraints in a priority list that gets satisfied until degreesof freedom are exhausted. For the second question, we will investigate how the bias with NMP terms atdifferent stages affects the control performance. A bias in the planner (i, Fig. 7) may provide better footplacement targets than the linear inverted pendulum currently used in the planner. NMP bias in the full bodyoptimization (ii) can address individual limb functions. Finally, a behavior library populated with the NMP(iii) can provide fast, pre-computed solutions, and may allow the humanoid to learn behavior over time.

In general, the NMP constraints are nonlinear constraints. To implement them in the optimization frame-work, we will use sequential quadratic programming techniques [45].

Activity 3: Develop Hybrid Approach with NMP as a Replacement. Replacing individual stagespresents a more radical change than influencing an established optimization framework with additionalcosts or constraints. To address the feasibility of this approach, we have implemented a preliminary versionin a simulation of the 2D ATRIAS testbed (see section 6.3 for details on our testbeds). We removed the footof the neuromuscular model and adapted it to ATRIAS (Fig. 8a). We then mapped the torques generatedby the NMP into desired joint torques (Eq. 6) for a previously validated simulation of the testbed [84, 85].

10

Page 12: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

Figure 9. Proposed evaluation of human-likeness and robustness to external disturbances. (a) Human joint kinematics andkinetics as a benchmark for likeness. (b) Standard methods of push recovery and ground changes for assessing robustnessto external perturbation. Refer to section 6.3 for more details and additionally proposed evaluation metrics.

The resulting control generated robust walking of ATRIAS over terrain with unknown ground level changesof ±7cm (Fig. 8b,c) [12]. The preliminary results suggest the feasibility of the approach. To evaluate it onreal hardware, we will embed the NMP replacement for the individual robot platforms within the existingrealtime optimal control framework (activity 1).

In addition, we will address the research question of whether the NMP can be formulated in a hierarchythat controls versatile behavior with a few high-level task goals xgoal. Specifically, we will investigate ifa hierarchy of reflex control can be identified for the key inputs of target speed, target gait, and targetturning rate. Our original NMP produced only walking [41]. An understanding of how the swing leg controlof foot placement can be formulated with a hierarchical structure [27, 28] led us to an NMP which canexpress a rich set of behaviors from walking and running to slope negotiation and turning [122] (compareFig. 6). Although we can formulate offline optimizations that generate transitions between these behaviors,we do not yet understand how these adaptations can be reinterpreted with hierarchical control structures.A solution to this question not only would enrich the online NMP replacement approach, but also canprovide new insights into human sensorimotor control, providing a direct hypothesis about how humansadapt locomotion behaviors.

Activity 4: Implement, Evaluate and Refine Hybrid Approaches in Simulation. We will implementthe hybrid approaches in simulation to evaluate their performance and refine them before conducting ac-tual hardware tests. The simulations will model the experimental tests that we plan to perform within therobot testbeds. For this activity, we will build on our previous work, in which we have developed similarsimulations for the ATRIAS, SARCOS and ATLAS platforms [85, 125, 143, 38].

Activity 5: Implement, Evaluate and Refine Hybrid Approaches in Robot Experiments. We willimplement the three hybrid approaches on several different robot testbeds. The details of our experimentalevaluation plan, including performance metrics and benchmarks, robot testbeds, and data collection, areelaborated in the next section.

6.3 How we will evaluate controller performance in experiments

To test our hypotheses about the performance of the three hybrid approaches, we will evaluate the human-likeness, the robustness to external perturbations, the robustness to internal model uncertainty, and the com-putational cost of the individual humanoid controllers in robot testbeds of increasing complexity.

Performance Metrics and Benchmarks. Well-defined standards and benchmarks for assessing locomo-tion in humanoid robots are rare [136]. Benchmarks that have been used often focus on accomplishing aspecific task with discrete pass or fail metrics (climb ladder, open door, negotiate rubble field, [25]). Estab-lished performance metrics for a more continuous comparison of competing locomotion controllers includeenergy-efficiency, the likeness to human locomotion, and the assessment of push recovery [136].

We will quantify human-likeness using established methods and metrics from biomechanical gait analysisfocusing on speed, cadence, and duty factor [57, 96], as well as on similarity of kinematic, kinetic, and

11

Page 13: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

Figure 10. Measuring maximum survivable push for one of our walking simulations [143] using polar plots. A point representsthe maximum survivable perturbation in a given direction. Concentric circles are in increments of 10N (a) or 10Ns (b,c). (a) 3second push recovery as a function of push angle. (b) 0.1 second push recovery as a function of push angle and time afterleft foot lift off. (c) Data is shown for four different conditions. Pushes occur midway through right single support.

ground reaction force patterns [96, 148, 61] (Fig. 9a). We will evaluate locomotion robustness to externalperturbations by quantifying the recovery from pushes in the sagittal and frontal planes (Fig. 9b), and thesurvival rate on terrain with unknown ground disturbances of increasing severity (Fig. 9c). The pusheswill be administered with a force-sensor equipped device to a fixed location on the torso near the COMof the robots to quantify disturbance impulse. (We have successfully used this method in previous workassessing standing balance [126].) Figure 10 shows an example of how we can systematically evaluate thesame performance metric in simulation [143]. The ground disturbances will include tracks with pits of rocks(three different granularities ranging from 1cm to 10cm) [8] and of tiny roundball pellets (1-6mm, emulatebehavior of sand) and other regular and irregular granular materials, and slippery surfaces of decreasingfriction coefficient (teflon, steel, oil layer). We will evaluate the robustness to internal model uncertainty bysystematically changing the model parameters (mass and inertial properties) and adding artificial sensorynoise in the software of the controller during locomotion on level ground. Finally, we will quantify thecomputational cost by tracking the execution time for each controller.

Benchmarks for some of these metrics have been reported in the literature; however, these benchmarks arehighly dependent on the physical capability of the individual robot hardware platform. To objectively assessthe benefits and shortcomings of the proposed hybrid approaches to humanoid control, we will establishbenchmark values for each of our robot platforms using the state of the art model-based optimization control(detailed in section 3) as a null hypothesis.

Robot Testbeds. The groups of the PIs are equipped with humanoid robots of different morphology(ATRIAS and SARCOS Primus) and have access to another humanoid with different actuation and sensinghardware (COMAN) (Fig. 1). We expect to continue to have access to an ATLAS robot as well. This accessto a range of platforms will provide us with the unique opportunity to quantify our results across platforms.Such a systematic evaluation of performance, either in simulation or on actual robots, has rarely been donein the field of humanoid robotics, including our previous work.

We will evaluate the control approaches on the simplest robot testbed first (7 degrees of freedom, DOF)and then systematically advance to the testbeds with increasing complexity (up to 34 DOF). First, we willuse CMU’s ATRIAS biped constrained to a boom (Fig. 1a). With only 3 external and 4 internal DOF, thetestbed substantially simplifies the realtime implementation of the different controllers. Second, we willtake ATRIAS off the boom. This will increase the level of complexity to 6 external and 6 internal DOF, andwill allow us to assess the performance of the controllers including lateral balance. Third, we will apply andtest the controllers on CMU’s SARCOS Primus humanoid. With 12 internal DOF used for locomotion, thehumanoid matches the full complexity of the neuromuscular model policy control. We will also explore fullbody control with 34 DOF (includes legs, arms, torso, and neck, but not fingers, eyes, or mouth). Finally,we plan to test the generalizability of our results with the COMAN and ATLAS robots, which are similar incomplexity to the SARCOS robot.

Data Collection. All of the humanoid robots are equipped with position, force and torque sensing. In

12

Page 14: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

addition, we will use our motion capture and force plate systems to collect ground truth for joint motionsand ground reaction forces during the experiments. The measurements are sufficient to analyze the human-likeness and the robustness to external and internal disturbances with the proposed metrics.

7 Management Plan

7.1 Roles of the Investigators

This project will bring together researchers with complementary areas of expertise attacking an interdis-ciplinary problem. The key personnel will include the two PIs, Chris Atkeson and Hartmut Geyer (CMURobotics Institute), one postdoctoral researcher, and two graduate students.

Principal Investigators. Chris Atkeson has expertise in model-based optimal control of humanoids [125,143, 37, 154, 156], robot learning, behavior libraries, and humanoid robotics. Hartmut Geyer has exper-tise in legged systems including conceptual gait models [43, 44, 27, 150], computational models of hu-man neuromuscular control policies [42, 41, 119, 118, 121, 122], powered prosthetics and legged robots[33, 110, 84, 85], and gait analysis [112, 42, 27]. Neither PI can do the project alone. A joint effort is nec-essary to combine our expertise in this interdisciplinary project involving humanoid robotics, model-basedoptimization, and human neuromechanics. Our collaboration will provide a thrust towards a new generationof students that can take on the challenging questions in this interdisciplinary area of research.

Postdoctoral Researcher. We have found that for research involving technologically advanced platforms,the combination of a postdoctoral researcher with several faculty and students provides a very productivemix of expertise. It accelerates the training and learning of the graduate students while maintaining a highlevel of research productivity. We will recruit a postdoctoral researcher with expertise in humanoid roboticsto ensure such a productive environment. She will participate in all activities of the project with a focus ondeveloping a single framework that can seamlessly blend optimization and policy-based control. She willbe jointly mentored by the PIs (for details, see our Postdoctoral Researcher Mentoring Plan).

Graduate Students. The project will involve two graduate students who will be jointly supervised bythe PIs and the postdoctoral researcher. For the theoretical parts of the project, one student will focus on thehybrid approaches using the NMP as a cost or constraint (activity 2), and the other student will focus on theNMP as a replacement (activity 3). For the experimental validation in simulation and on robots, the studentswill work together (activities 1, 4 & 5). The students will be trained by the PIs in legged systems and optimalcontrol theory as well as in model-based optimization and human neuromechanics. They will also acquireexpertise in advanced techniques and algorithms for rigid body simulation and realtime hardware control.

Project Coordination. The PIs will jointly coordinate the activities supported by this project and willbe responsible for all aspects of it. For the day-to-day administration and financial management, they willbe supported by the staff of the CMU Robotics Institute. Frequent ad hoc meetings and interactions of allpersonnel involved are guaranteed as as the PIs’ groups share lab space and participate in a weekly, jointCMU seminar on bipedal locomotion. In addition, all personnel in this project will meet bi-weekly in amore formal setting to review research results and discuss future directions.

7.2 Time Line of Activities

Year 1. Theory: Develop hybrid approaches for planar locomotion (Activities 2 & 3). Simulation Experi-ments: Implement, evaluate and refine theory in simulation experiments of 2D ATRIAS platform (A1 & 4).

Year 2. Theory: Develop hybrid approaches for 3D locomotion (A2 & 3). Simulation Experiments: Imple-ment, evaluate and refine theory in simulation experiments of 3D ATRIAS and SARCOS platforms (A1 &4). Robot Experiments: Evaluate hybrid approaches on 2D (early) and 3D ATRIAS testbed (later) (A5).

13

Page 15: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

Year 3. Theory: Refine theory based on year 2 robot experiments (A2 & 3). Simulation Experiments:Generalize in simulation experiments to COMAN platform (A1 & 4). Robots: Evaluate hybrid approacheson SARCOS (early) and COMAN testbed (later) (A5).

7.3 Risk Assessment and Mitigation

Theory Development Risks (Activities 2 and 3). Our ideas may not work out as expected, and we may en-counter unforeseen barriers to progress. Mitigation: The proposed research activities include the explorationof different cost and constraint modalities as well as alternative stages for the integration of the policy basedcontrol into the optimization framework (Fig. 7). For activity 2, we have working baseline controllers forboth the SARCOS and ATLAS robots. For activity 3, we have preliminary results that support the viabilityof the replacement approach (Sec. 6.2). Additionally, the intellectual independence of the three approachesprovides a layer of risk mitigation should the theoretical development for one of them be delayed or fail.

Implementation Risks (Activities 1, 4 and 5). As with any project including hardware evaluation,we face implementation risks. We may not be able to achieve real time performance of the blended con-trol framework. The humanoid robot platforms may break or have mechanical limitations that hamperimplementation (torque control limited on ATLAS and SARCOS due to stiction in actuators and play intransmission and bearings, joint velocity estimation required for approaches 1 and 2 limited on ATLASand SARCOS). Mitigation: We have extensive prior experience working with three hardware platforms[84, 85, 126, 38, 125, 143, 36, 37, 156, 154] including realtime control implementation, state estimation,and evaluation of locomotion controllers. We are currently updating some of the platforms to improve jointvelocity estimation (ATLAS and SARCOS). Working with multiple platforms also balances the risk of animplementation delay or failure on a single platform.

8 Broader Impacts of the Proposed Work

Generality of the Hybrid Approaches. The proposed control approaches can generalize to a wide rangeof behaviors including walking, running, velocity control, acceleration, deceleration, slope and stair climb-ing and descent, whole body climbing, turning, obstacle avoidance, stumble and trip recovery, handlingrocky, sandy, muddy, slippery, and potentially moving terrain (standing on a bus or a ship), and the useof handholds, canes, walkers, and other walking aids. We also believe the proposed control approacheswill generalize to a wide range of agile machines and vehicles, including self driving cars. Often humansmanually design intuitive control systems. We believe that combining these with a constrained model-basedonline optimizing control can improve the performance and “steerability” of the manual control design, andthe general optimal controller can benefit from the bias provided by the intuitive control.

Unplanned Broader Impact. Often the broader impacts of our work are serendipitous, and not plannedin advance. Examples of such ad hoc broader impacts from our recent work include: 1) Our technologiesbeing demonstrated on entries in the DARPA Robotics Challenge. 2) A graduate student was a participanton a Discovery Channel TV series, ”The Big Brain Theory: Pure Genius”. One purpose of the TV series isgetting people excited about engineering. 3) A graduate student supported a group of all female high schoolstudents in the robot FIRST competition. 4) Our work on soft robotics inspired the soft inflatable robotBaymax in the Disney movie Big Hero 6 [7]. We have participated in extensive publicity as a result. Anexplicit goal of this movie was to support STEAM. We expect to be involved with sequels to Big Hero 6,and to support Disney’s STEAM effort (which is quite large and well funded). We expect similar unplannedbroader impacts to result from this work, especially based on dramatic videos of agile robots.

Development of Human Resources. Students working on this project will gain experience and expertisein teaching and mentoring by assisting the course students in class projects, and will have the opportunity totrain their communication and inter-personal skills, as they will actively participate in the dissemination ofthe research results at conferences and in related K-12 outreach programs of the Robotics Institute.

14

Page 16: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

Participation of Underrepresented Groups. We will attract both undergraduate and graduate students,especially those from underrepresented groups. We will also make use of ongoing efforts in the RoboticsInstitute and CMU-wide. These efforts include supporting minority visits to CMU, recruiting at variousconferences and educational institutions, and providing minority fellowships. CMU is fortunate in beingsuccessful in attracting an usually high percentage of female undergraduates in Computer Science. Ourcollaboration with the Rehabilitation Science and Technology Department of the University of Pittsburgh inthe area of assistive robotics is a magnet for students with disabilities and students who are attracted by thepossibility of working on technology that directly helps people.

Outreach. One form of outreach we pursue is an aggressive program of visiting students and postdocs.This has been most successful internationally, with visitors from the Delft University of Technology (4students) [5, 83, 149], the HUBO lab at KAIST (1 student and 1 postdoc) [14, 22, 68], and the ChineseScholarship Council supported 5 students [78, 153]. We are currently experimenting with the use of Youtubeand lab notebooks on the web as wikis to make public preliminary results as well as final papers and videos.We have found this is a useful way to support internal communication as well as potentially create outsideinterest in our work. We will continue to give lectures to visiting K-12 classes. The Carnegie MellonRobotics Institute already has an aggressive outreach program at the K-12 level, and we will participate inthat program, as well as other CMU programs such as Andrew’s Leap (summer enrichment program for highschool students), the SAMS program (Summer Academy for Mathematics & Science: a summer programfor diversity aimed at high schoolers), Creative Tech Night for Girls, and other minority outreach programs.

Curriculum Development. We will develop course material on robot control and biologically inspiredapproaches to humanoid and rehabilitation robotics, which will directly be influenced by the planned activi-ties of this proposal and freely available on the web. The PIs currently teach several courses that will benefitfrom this material. The first course, 16-642: Manipulation, Mobility & Control, is part of the RoboticsInstitute’s recently established professional Master’s degree program that aims at training future leaders inthe workforce of robotics and intelligent automation enterprises and agencies. Two other courses, 16-868:Biomechanics and Motor Control and 16-745: Dynamic Optimization, directly address the research areas inwhich this proposal is embedded. We also teach a course designed to attract undergraduates into the field,16-264: Humanoids. All of these courses emphasize learning from interaction with real robots.

Dissemination Plan. For a more complete description of our dissemination plan, see our Data Man-agement Plan. We will maintain a public website to freely share our simulations and control code, and todocument research progress with video material. We will present our work at conferences and publish it injournals, and use these vehicles to advertise our work to potential collaborators in science and industry.

Technology Transfer. Our research results and algorithms are being used in Disney Research and throughthis technology transfer path will eventually be used in entertainment and education applications, and will beavailable to and inspire the public. Three former students work at Boston Dynamics/Google transferring ourwork to industrial applications, several students have done internships at Disney Research, and two formerpostdocs work there full time.

Benefits to Society. The proposed research has the potential to lead to more useful robots. A successfuloutcome can enable practical controllers for robust locomotion of legged robots in uncertain environmentswith applications in disaster response and elderly care. The outcomes of this project may also provide newinsights into human sensorimotor control of locomotion behaviors. Understanding how humans actuallycontrol their limbs has the potential to improve neural and robotic prostheses and exoskeletons which restorelegged mobility to people who have damaged or lost sensorimotor functions of their legs due to diabeticneuropathy, stroke, or spinal cord injury as well as reduce fall risks in older adults [138, 24, 81, 151].

Enhancement of Infrastructure for Research and Education. The project will help to maintain CMU’sATRIAS and SARCOS robot testbeds, powerful tools for research and education in dynamics and control.These robots regularly attract the interest of students and inspire them to advanced education and training inrobotics.

15

Page 17: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

References

[1] A. D. Ames. First steps toward underactuated human-inspired bipedal robotic walking. In Robot.Autom. (ICRA), 2012 IEEE Int. Conf., pages 1011–1017. IEEE, 2012.

[2] S. Anderson. The Design of Control Architectures for Force-controlled Humanoids Performing Dy-namic Tasks. PhD thesis, Robotics Institute, Carnegie Mellon University, 2013.

[3] S. Anderson and J. Hodgins. Adaptive torque-based control of a humanoid robot on an unstableplatform. In Humanoid Robots (Humanoids), 2010 10th IEEE-RAS International Conference on,pages 511–517, 2010.

[4] S. Anderson and J. Hodgins. Informed priority control for humanoids. In Humanoid Robots (Hu-manoids), 2011 11th IEEE-RAS International Conference on, pages 416–422, 2011.

[5] S. O. Anderson, M. Wisse, C. G. Atkeson, J. K. Hodgins, G. J. Zeglin, and B. Moyer. Powered bipedsbased on passive dynamic principles. In Proceedings of the 5th IEEE-RAS International Conferenceon Humanoid Robots (Humanoids 2005), pages 110–116, 2005.

[6] C. G. Atkeson. Efficient robust policy optimization. In American Control Conference, 2012.

[7] C. G. Atkeson. Big Hero 6: Let’s Build Baymax. build-baymax.org, 2015.

[8] C. G. Atkeson. Team WPI-CMU: Darpa Robotics Challenge. www.cs.cmu.edu/˜cga/drc, 2015.

[9] C. G. Atkeson, B. P. W. Babu, N. Banerjee, D. Berenson, C. P. Bove, X. Cui, M. DeDonato, R. Du,S. Feng, M. Gennert, J. P. Graff, P. He, A. Jaeger, J. Kim, K. Knoedler, L. Li, C. Liu, X. Long,T. Padir, F. Polido, G. G. Tighe, and X. Xinjilefu. NO FALLS, NO RESETS: Reliable humanoidbehavior in the DARPA Robotics Challenge. In IEEE-RAS International Conference on HumanoidRobots (Humanoids), 2015.

[10] B. P. W. Babu, R. Du, T. Padir, and M. Gennert. Improving robustness in complex tasks for a supervi-sor operated humanoid˙ In IEEE-RAS International Conference on Humanoid Robots (Humanoids),2015.

[11] N. Banerjee, X. Long, R. Du, F. Polido, S. Feng, C. G. Atkeson, M. Gennert, and T. Padir. Human-supervised control of the ATLAS humanoid robot for traversing doors. In IEEE-RAS InternationalConference on Humanoid Robots (Humanoids), 2015.

[12] Z. Batts, S. Song, and H. Geyer. Toward a Virtual Neuromuscular Control for Robust Walking inBipedal Robots. In IEEE Conf. Intell. Robot. Syst. Accept.

[13] R. Bellman. Dynamic Programming (Dover Books on Computer Science). Dover Publications, 2003.

[14] D. Bentivegna, C. G. Atkeson, and J.-Y. Kim. Compliant control of a hydraulic humanoid joint. InIEEE-RAS International Conference on Humanoid Robots (Humanoids), 2007.

[15] BiOM. BiOM Clinical Reference Page. www.biom.com/prosthetists/clinical-references/.

[16] R. Blickhan. The spring-mass model for running and hopping. J. Biomech., 22:1217–1227, 1989.

[17] K. Bouyarmane and A. Kheddar. Using a multi-objective controller to synthesize simulated humanoidrobot motion with changing contact configurations. In Intelligent Robots and Systems (IROS), 2011IEEE/RSJ International Conference on, pages 4414–4419, San Francisco, CA, USA, Sept 2011.

1

Page 18: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

[18] K. Bouyarmane, J. Vaillant, F. Keith, and A. Kheddar. Exploring humanoid robots locomotion capa-bilities in virtual disaster response scenarios. In Humanoid Robots (Humanoids), 2012 12th IEEE-RAS International Conference on, pages 337–342, Osaka, Japan, Nov 2012.

[19] T. G. Brown. On the nature of the fundamental activity of the nervous centres; together with ananalysis of the conditioning of rhythmic activity in progression, and a theory of the evolution offunction in the nervous system. J Physiol, 48(1):18–46, 1914.

[20] C. Chevallereau, E. R. Westervelt, and J. W. Grizzle. Asymptotically stable running for a five-link,four-actuator, planar bipedal robot. Int. J. Rob. Res., 24(6):431–464, 2005.

[21] B. Cho, I. Park, and J. Oh. Running pattern generation of a humanoid biped with a fixed point and itsrealization. Int. J. Humanoid Robot., 6(4):631–656, 2009.

[22] D. Choi, C. G. Atkeson, S. J. Cho, and J. Y. Kim. Phase plane control of a humanoid. In IEEE-RASInternational Conference on Humanoid Robots (Humanoids), pages 145–150, 2008.

[23] G. Courtine, Y. Gerasimenko, R. van den Brand, A. Yew, P. Musienko, H. Zhong, B. Song, Y. Ao,R. M. Ichiyama, I. Lavrov, R. R. Roy, M. V. Sofroniew, and V. R. Edgerton. Transformationof nonfunctional spinal circuits into functional states after the loss of brain input. Nat Neurosci,12(10):1333–1342, 2009.

[24] C. C. Cowie, K. F. Rust, E. S. Ford, M. S. Eberhardt, D. D. Byrd-Holt, C. Li, D. E. Williams, E. W.Gregg, K. E. Bainbridge, S. H. Saydah, and L. S. Geiss. Full accounting of diabetes and pre-diabetesin the U.S. population in 1988-1994 and 2005-2006. Diabetes Care, 32(2):287–294, 2009.

[25] DARPA. www.theroboticschallenge.org.

[26] M. de Lasa, I. Mordatch, and A. Hertzmann. Feature-based locomotion controllers. ACM Trans.Graph., 29(4):131:1–131:10, July 2010.

[27] R. Desai and H. Geyer. Robust swing leg placement under large disturbances. In Proc. IEEE Int ConfRobot. Biomimetics, pages 265–270, 2012.

[28] R. Desai and H. Geyer. Muscle-Reflex Control of Robust Swing Leg Placement. In Proc IEEE IntConf Robot. Autom., pages 2169 – 2174, 2013.

[29] V. Dietz. Proprioception and locomotor disorders. Nat Rev Neurosci, 3(10):781–790, 2002.

[30] T. W. Dorn, J. M. Wang, J. L. Hicks, and S. L. Delp. Predictive Simulation Generates Human Adap-tations during Loaded and Inclined Walking. PLoS One, 10(4):e0121407, Jan. 2015.

[31] B. Dynamics. Youtube: Boston dynamics channel, 2015.https://www.youtube.com/user/BostonDynamics.

[32] F. Dzeladini, J. van den Kieboom, and A. Ijspeert. The contribution of a central pattern generator ina reflex-based neuromuscular model. Front. Hum. Neurosci., 8:1–18, 2014.

[33] M. Eilenberg, H. Geyer, and H. Herr. Control of a Powered Ankle-Foot Prosthesis Based on a Neu-romuscular Model. IEEE Trans Neural Syst Rehabil Eng, 18(2):164–173, 2010.

[34] A. Escande, N. Mansard, and P.-B. Wieber. Hierarchical quadratic programming: Fast onlinehumanoid-robot motion generation. The International Journal of Robotics Research, page in press,march 2014.

2

Page 19: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

[35] S. Feng, E. Whitman, X. Xinjilefu, and C. G. Atkeson. Optimization Based Full Body Control forthe Atlas Robot. In Proc IEEE Conf Humanoids, Madrid, Spain, 2014.

[36] S. Feng, E. Whitman, X. Xinjilefu, and C. G. Atkeson. Optimization-based full body control for theDARPA Robotics Challenge. Journal of Field Robotics, 32(2):293–312, 2015.

[37] S. Feng, X. Xinjilefu, C. G. Atkeson, and J. Kim. Optimization based controller design and imple-mentation for the Atlas robot in the DARPA Robotics Challenge Finals. In IEEE-RAS InternationalConference on Humanoid Robots (Humanoids), 2015.

[38] S. Feng, X. Xinjilefu, W. Huang, and C. G. Atkeson. 3d walking based on online optimization. InIEEE-RAS International Conference on Humanoid Robots (Humanoids), 2013.

[39] T. Geijtenbeek, M. van de Panne, and A. F. van der Stappen. Flexible muscle-based locomotion forbipedal creatures. ACM Trans. Graph., 32:1–11, 2013.

[40] Y. Gerasimenko, R. R. Roy, and R. V. Edgerton. Epidural stimulation: Comparison of the spinalcircuits that generate and control locomotion in rats, cats and humans. Exp. Neurol., 209:417–425,2008.

[41] H. Geyer and H. Herr. A Muscle-Reflex Model that Encodes Principles of Legged Mechanics Pro-duces Human Walking Dynamics and Muscle Activities. IEEE Trans Neural Syst Rehabil Eng,18(3):263–273, 2010.

[42] H. Geyer, A. Seyfarth, and R. Blickhan. Positive force feedback in bouncing gaits? Proc. R. Soc.Lond. B, 270:2173–2183, 2003.

[43] H. Geyer, A. Seyfarth, and R. Blickhan. Spring-mass running: simple approximate solution andapplication to gait stability. J. Theor. Biol., 232(3):315–328, 2005.

[44] H. Geyer, A. Seyfarth, and R. Blickhan. Compliant leg behaviour explains the basic dynamics ofwalking and running. Proc. R. Soc. Lond. B, 273:2861–2867, 2006.

[45] P. E. Gill, W. Murray, and M. A. Saunders. SNOPT: An SQP algorithm for large-scale constrainedoptimization. SIAM J. Optim, 12:979–1006, 2002.

[46] S. Grillner. Neurobiological bases of rhythmic motor acts in vertebrates. Science, 228(4696):143–9,Apr. 1985.

[47] K. Hase and N. Yamazaki. Computer Simulation Study of Human Locomotion with a Three-Dimensional Entire-Body Neuro-Musculo-Skeletal Model. I. Acquisition of Normal Walking. JSMEInt. J. Ser. C, 45:1040–1050, 2002.

[48] H. Herr, H. Geyer, and M. Eilenberg. Model-based neuromechanical controller for a robotic leg. USPat. No. 8864846, 2014.

[49] A. Herzog, L. Righetti, F. Grimminger, P. Pastor, and S. Schaal. Balancing experiments on a torque-controlled humanoid with hierarchical inverse dynamics. In Intelligent Robots and Systems (IROS),2011 IEEE/RSJ International Conference on, Sept 2014.

[50] M. Hirose and K. Ogawa. Honda humanoid robots development. Philos Trans. A Math Phys Eng Sci,365(1850):11–19, Jan. 2007.

[51] J. Hodgins. mocap.cs.cmu.edu.

3

Page 20: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

[52] J. K. Hodgins. Three-Dimensional Human Running. In Proc. 1996 IEEE Int. Conf. Robot. Autom.,pages 3271–3276, 1996.

[53] W. Huang, J. Kim, and C. G. Atkeson. Energy-based optimal step planning for humanoids. InRobotics and Automation (ICRA), 2013 IEEE International Conference on, pages 3124–3129, 2013.

[54] H. Hultborn and J. B. Nielsen. Spinal control of locomotion - from cat to man. Acta Physiol.,189(2):111–121, 2007.

[55] M. Hutter, M. Hoepflinger, C. Gehring, C. D. R. M. Bloesch, and R. Siegwart. Hybrid operationalspace control for compliant legged systems. In Proc. of the 8th Robotics: Science and SystemsConference (RSS), 2012.

[56] M. Hutter, H. Sommer, C. Gehring, M. Hoepflinger, M. Bloesch, and R. Siegwart. Quadrupedal loco-motion using hierarchical operational space control. The International Journal of Robotics Research,2014.

[57] V. T. Inman, H. J. Ralston, F. Todd, and J. C. Lieberman. Human walking. Williams & Wilkins,Baltimore, 1981.

[58] D. H. Jacobson and D. Q. Mayne. Differential Dynamic Programming. Elsevier, 1970.

[59] M. Johnson, B. Shrewsbury, S. Bertrand, T. Wu, D. Duran, M. Floyd, P. Abeles, D. Stephen,N. Mertins, A. Lesman, J. Carff, W. Rifenburgh, P. Kaveti, W. Straatman, J. Smith, M. Griffioen,B. Layton, T. de Boer, T. Koolen, P. Neuhaus, and J. Pratt. Team IHMC’s lessons learned from theDARPA Robotics Challenge Trials. Journal of Field Robotics, 32(2):192–208, 2015.

[60] J. J. K. Jr., S. Kagami, K. Nishiwaki, M. Inaba, and H. Inoue. Dynamically-Stable Motion Planningfor Humanoid Robots. Auton. Robots, 12(1):105–118, Jan. 2002.

[61] S. Kagami, M. Mochimaru, Y. Ehara, N. Miyata, K. Nishiwaki, H. Inoue, and T. Kanade. Measure-ment and comparison of humanoid h7 walking with human being. Robotics and Autonomous Systems,48(4):177–187, 2004.

[62] S. Kajita, F. Kanehiro, K. Kaneko, K. Fujiwara, K. Harada, K. Yokoi, and H. Hirukawa. Biped walk-ing pattern generation by using preview control of zero-moment point. In Robotics and Automation,2003. IEEE International Conference on, volume 2, pages 1620–1626 vol.2, 2003.

[63] S. Kajita, F. Kanehiro, K. Kaneko, K. Fujiwara, K. Harada, K. Yokoi, and H. Hirukawa. Resolvedmomentum control: humanoid motion planning based on the linear and angular momentum. In Intell.Robot. Syst., pages 1644–1650, 2003.

[64] S. Kajita, T. Nagasaki, K. Kaneko, and H. Hirukawa. ZMP-Based Biped Running Control. Robot.Autom. Mag. IEEE, 14(2):63–72, 2007.

[65] O. Kanoun, F. Lamiraux, and P. B. Wieber. Kinematic control of redundant manipulators: Generaliz-ing the task-priority framework to inequality task. Robotics, IEEE Transactions on, 27(4):785–792,Aug 2011.

[66] O. Khatib. A unified approach for motion and force control of robot manipulators: The operationalspace formulation. Robotics and Automation, IEEE Journal of, 3(1):43–53, February 1987.

[67] J. Kim, N. S. Pollard, and C. G. Atkeson. Quadratic encoding of optimized humanoid walking. InIEEE-RAS International Conference on Humanoid Robots (Humanoids), 2013.

4

Page 21: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

[68] J.-Y. Kim, C. G. Atkeson, J. K. Hodgins, D. Bentivegna, and S. J. Cho. Online gain switching algo-rithm for joint position control of a hydraulic humanoid robot. In IEEE-RAS International Conferenceon Humanoid Robots (Humanoids), 2007.

[69] S. Kim, C. G. Atkeson, and S. Park. Perturbation-dependent selection of postural feedback gain andits scaling. Journal of Biomechanics, 45(8):1379–1386, 2012.

[70] Y. Kim, Y. Tagawa, G. Obinata, and K. Hase. Robust control of CPG-based 3D neuromusculoskeletalwalking model. Biol. Cybern., 105:269–282, 2011.

[71] T. Koolen, J. Smith, G. Thomas, S. Bertrand, J. Carff, N. Mertins, D. Stephen, P. Abeles, J. En-glsberger, S. McCrory, J. van Egmond, M. Griffioen, M. Floyd, S. Kobus, N. Manor, S. Alsheikh,D. Duran, L. Bunch, E. Morphis, L. Colasanto, K.-L. H. Hoang, B. Layton, P. Neuhaus, M. Johnson,, and J. Pratt. Summary of team IHMCs Virtual Robotics Challenge entry. In Humanoid Robots(Humanoids), 2013 13th IEEE-RAS International Conference on, 2013.

[72] S. Kuindersma, R. Deits, M. Fallon, A. Valenzuela, H. Dai, F. Permenter, T. Koolen, P. Marion, andR. Tedrake. Optimization-based locomotion planning, estimation, and control design for the Atlashumanoid robot. Autonomous Robots, in press, 2015.

[73] S. Kuindersma, F. Permenter, and R. Tedrake. An efficiently solvable quadratic program for sta-bilizing dynamic locomotion. In Robotics and Automation, 2013. ICRA ’13. IEEE InternationalConference on, 2013.

[74] T. Kwon and J. K. Hodgins. Control systems for human running using an inverted pendulum modeland a reference motion capture sequence. In Proceedings of the 2010 ACM SIGGRAPH/EurographicsSymposium on Computer Animation, SCA ’10, 2010.

[75] J. Lee. Model predictive control: Review of the three decades of development. Int. J. Control. Autom.Syst., 9(3):415–424, 2011.

[76] S.-H. Lee and A. Goswami. Ground reaction force control at each foot: A momentum-based hu-manoid balance controller for non-level and non-stationary ground. In Intelligent Robots and Systems(IROS), 2010 IEEE/RSJ International Conference on, pages 3157–3162, Oct 2010.

[77] G. Leonov and N. Kuznetsov. Analytical-numerical methods for investigation of hidden oscillationsin nonlinear control systems. In IFAC Proceedings, volume 18(1), pages 2494–2505, 2011.

[78] C. Liu and C. G. Atkeson. Standing balance control using a trajectory library. In IEEE/RSJ Interna-tional Conference on Intelligent Robots and Systems (IROS), pages 3031 — 3036, 2009.

[79] C. Liu, C. G. Atkeson, S. Feng, and X. Xinjilefu. Full-body motion planning and control for the caregress task of the DARPA Robotics Challenge. In IEEE-RAS International Conference on HumanoidRobots (Humanoids), 2015.

[80] C. Liu, C. G. Atkeson, and J. Su. Biped walking control using a trajectory library. ROBOTICA,31:311–322, 2013.

[81] D. Lloyd-Jones, R. J. Adams, T. M. Brown, M. Carnethon, S. Dai, G. De Simone, and Others. Heartdisease and stroke statistics–2010 update: a report from the American Heart Association. Circulation,121(7):e46–e215, 2010.

[82] L. Luyckx, M. Loccuer, and E. Noldus. Computational methods in nonlinear stability analysis: stabil-ity boundary calculations. Journal of Computational and Applied Mathematics, 168:289–297, 2004.

5

Page 22: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

[83] T. Mandersloot, M. Wisse, and C. Atkeson. Controlling velocity in bipedal walking: A dynamicprogramming approach. In Proceedings of the 6th IEEE-RAS International Conference on HumanoidRobots (Humanoids), 2006.

[84] W. Martin, A. Wu, and H. Geyer. Robust Spring-Mass Model Running on ATRIAS Biped. In Dyn.Walk. 2014, 2014.

[85] W. Martin, A. Wu, and H. Geyer. Robust Spring Mass Model Running for a Physical Bipedal Robot.In IEEE Int Conf Robot. Autom., pages 6307–6312, 2015.

[86] T. A. McMahon and G. C. Cheng. The mechanism of running: how does stiffness couple with speed?J. Biomech., 23:65–78, 1990.

[87] M. Mistry, J. Buchli, and S. Schaal. Inverse dynamics control of floating base systems using orthog-onal decomposition. In Robotics and Automation (ICRA), 2010 IEEE International Conference on,pages 3406–3412, 2010.

[88] M. Mistry, A. Murai, K. Yamane, and J. Hodgins. Sit-to-stand task on a humanoid robot from humandemonstration. In Humanoid Robots (Humanoids), 2010 10th IEEE-RAS International Conferenceon, pages 218–223, 2010.

[89] M. Mistry, A. Murai, K. Yamane, and J. Hodgins. Model-Based Control and Estimation of HumanoidRobots via Orthogonal Decomposition. In O. Khatib, V. Kumar, and G. Sukhatme, editors, Exp.Robot., pages 839–854. Springer Berlin Heidelberg, 2014.

[90] I. Mordatch, M. de Lasa, and A. Hertzmann. Robust physics-based locomotion using low-dimensionalplanning. ACM Trans. Graph., 29(4):1, July 2010.

[91] J. Morimoto, G. Endo, J. Nakanishi, and G. Cheng. A Biologically Inspired Biped LocomotionStrategy for Humanoid Robots: Modulation of Sinusoidal Patterns by a Coupled Oscillator Model.IEEE Trans. Robot., 24(1):185–191, Feb. 2008.

[92] G. Nelson, A. Saunders, N. Neville, B. Swilling, J. Bondaryk, D. Billings, C. Lee, R. Playter, andM. Raibert. Petman: A humanoid robot for testing chemical protective clothing. J. Robot. Soc. Japan,30(4):372–377, 2012.

[93] S. Noda, M. Murooka, S. Nozawa, Y. Kakiuchi, K. Okada, and M. Inaba. Generating whole-bodymotion keep away from joint torque, contact force, contact moment limitations enabling steep climb-ing with a real humanoid robot. In 2014 IEEE Int. Conf. Robot. Autom., pages 1775–1781. IEEE,May 2014.

[94] Y. Ogura, K. Shimomura, H. Kondo, A. Morishima, T. Okubo, S. Momoki, H. ok Lim, and A. Takan-ishi. Human-like walking with knee stretched, heel-contact and toe-off motion by a humanoid robot.In Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on, pages 3976–3981,2006.

[95] C. Ott, M. Roa, and G. Hirzinger. Posture and balance control for biped robots based on contact forceoptimization. In Humanoid Robots (Humanoids), 2011 11th IEEE-RAS International Conference on,pages 26–33, Oct 2011.

[96] J. Perry. Gait analysis: normal and pathological function. SLACK Inc., Thorofare, NJ, 1992.

[97] E. Pierrot-Desseilligny and D. C. Burke. The circuitry of the human spinal cord: its role in motorcontrol and movement disorders. Cambridge University Press, Cambridge, 2005.

6

Page 23: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

[98] G. A. Pratt and M. M. Williamson. Series elastic actuators. IEEE/RSJ Int. Conf. Intell. Robot. Syst.,1:399–406, 1995.

[99] J. Pratt, J. Carff, S. Drakunov, and A. Goswami. Capture point: A step toward humanoid pushrecovery. In Humanoid Robots, 2006 6th IEEE-RAS International Conference on, pages 200–207,2006.

[100] J. Pratt, T. Koolen, T. de Boer, J. Rebula, S. Cotton, J. Carff, M. Johnson, and P. Neuhaus.Capturability-based analysis and control of legged locomotion, Part 2: Application to M2V2, a lower-body humanoid. Int. J. Rob. Res., 31:1117–1133, 2012.

[101] M. Raibert, K. Blankenspoor, G. Nelson, and R. Playter. BigDog, the Rough-Terrain QuadrupedRobot. In Proc. 17th World Congr. Int Fed Autom. Control, pages 10823–10825, 2008.

[102] M. H. Raibert. Legged robots that balance. MIT press, Cambridge, 1986.

[103] O. Ramos, N. Mansard, O. Stasse, and P. Soueres. Walking on non-planar surfaces using an in-verse dynamic stack of tasks. In Humanoid Robots (Humanoids), 2012 12th IEEE-RAS InternationalConference on, pages 829–834, Nov 2012.

[104] L. Righetti, J. Buchli, M. Mistry, M. Kalakrishnan, and S. Schaal. Optimal distribution of contactforces with inverse dynamics control. International Journal of Robotics Research, pages 280–298,2013.

[105] L. Righetti, J. Buchli, M. Mistry, and S. Schaal. Inverse dynamics control of floating-base robots withexternal constraints: A unified view. In Robotics and Automation (ICRA), 2011 IEEE InternationalConference on, pages 1085–1090, 2011.

[106] L. Saab, O. Ramos, F. Keith, N. Mansard, P. Soueres, and J. Fourquet. Dynamic whole-body motiongeneration under rigid contacts and other unilateral constraints. Robotics, IEEE Transactions on,29(2):346–362, April 2013.

[107] S. Sanan, M. H. Ornstein, and C. G. Atkeson. Physical human interaction for an inflatable manipula-tor. In IEEE Engineering in Medicine and Biology Society (EMBC), pages 7401–7404, 2011.

[108] S. Schaal and C. G. Atkeson. Learning control for robotics. IEEE Robotics & Automation Magazine,17(2):20–29, 2010.

[109] A. Schepelmann, J. Austin, and H. Geyer. Evaluation of Decentralized Reactive Swing-Leg Controlon a Powered Robotic Leg. In Proc IEEE Int Conf Intell. Robot. Syst.

[110] A. Schepelmann, M. D. Taylor, and H. Geyer. Development of a Testbed for Robotic NeuromuscularControllers. In Robot. Sci. Syst. VIII - Online Proc., 2012.

[111] L. Sentis, J. Park, and O. Khatib. Compliant control of multicontact and center-of-mass behaviors inhumanoid robots. IEEE Trans. Robot., 26:483–501, 2010.

[112] A. Seyfarth, H. Geyer, and H. M. Herr. Swing-leg retraction: a simple control model for stablerunning. J. Exp. Biol., 206:2547–2555, 2003.

[113] A. Seyfarth, M. Gunther, and R. Blickhan. Stable operation of an elastic three-segmented leg. Biol.Cybern., 84:365–382, 2001.

[114] J. Shan and F. Nagashima. Neural Locomotion Controller Design and Implementation for HumanoidRobot HOAP-1. In Proc. 20th Annu. confrence Robot. Soc. Japan, 2002.

7

Page 24: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

[115] C. S. Sherrington. Flexion-reflex of the limb, crossed extension-reflex, and reflex stepping and stand-ing. J Physiol, 40(1-2):28–121, 1910.

[116] P. B. Sistu and B. W. Bequette. Nonlinear model-predictive control: Closed-loop stability analysi.AIChE Journal, 42(12):3388–3402, 1996.

[117] K. W. Sok, K. Yamane, J. Lee, and J. Hodgins. Editing dynamic human motions via momentumand force. In Proceedings of the 2010 ACM SIGGRAPH/Eurographics Symposium on ComputerAnimation, SCA ’10, pages 11–20, 2010.

[118] S. Song, R. Desai, and H. Geyer. Integration of an adaptive swing control into a neuromuscularhuman walking model. In 2013 35th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., pages 4915–4918.IEEE, July 2013.

[119] S. Song and H. Geyer. Regulating speed and generating large speed transitions in a neuromuscularhuman walking model. In Proc IEEE Int. Conf. Robot. Autom., pages 511–516. IEEE, May 2012.

[120] S. Song and H. Geyer. Generalization of a muscle-reflex control model to 3D walking. In 2013 35thAnnu. Int. Conf. IEEE Eng. Med. Biol. Soc., pages 7463–7466. IEEE, July 2013.

[121] S. Song and H. Geyer. Robust 3D locomotion models using primarily reflex control. In Online ProcDyn. Walk., 2013.

[122] S. Song and H. Geyer. A neural circuitry that emphasizes spinal feedbacks generates diverse be-haviours of human locomotion. J. Physiol., 593(16):3493–3511, 2015.

[123] K. Sreenath, H.-W. Park, I. Poulakakis, and J. W. Grizzle. Embedding active force control withinthe compliant hybrid zero dynamics to achieve stable, fast running on MABEL. Int. J. Rob. Res.,32(3):324–345, 2013.

[124] B. Stephens. Dynamic balance force control for compliant humanoid robots. In International Con-ference on Intelligent Robots and Systems (IROS), pages 1248–1255, 2010.

[125] B. Stephens. Push Recovery Control for Force-Controlled Humanoid Robots. PhD thesis, CarnegieMellon University, 2011.

[126] B. J. Stephens and C. G. Atkeson. Push Recovery by stepping for humanoid robots with force con-trolled joints. 2010 10th IEEE-RAS Int. Conf. Humanoid Robot., pages 52–59, 2010.

[127] M. Stolle and C. G. Atkeson. Finding and transferring policies using stored behaviors. AutonomousRobots, 29(2):169—200, 2010.

[128] G. Taga. A model of the neuro-musculo-skeletal system for human locomotion: I. Emergence ofbasic gait. Biol. Cybern., 73(2):97–111, 1995.

[129] G. Taga. A model of the neuro-musculo-skeletal system for anticipatory adjustment of human loco-motion during obstacle avoidance. Biol Cybern, 78(1):9–17, 1998.

[130] G. Taga, Y. Yamaguchi, and H. Shimizu. Self-organized control of bipedal locomotion by neuraloscillators in unpredictable environment. Biol Cybern, 65(3):147–159, 1991.

[131] T. Takenaka, T. Matsumoto, T. Yoshiike, and S. Shirokura. Real time motion generation and controlfor biped robot 2nd report: Running gait pattern generation. In Intell. Robot. Syst. 2009. IROS 2009.IEEE/RSJ Int. Conf., pages 1092–1099. IEEE, 2009.

8

Page 25: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

[132] R. Tedrake, I. R. Manchester, M. Tobenkin, and J. W. Roberts. LQR-trees: Feedback Motion Planningvia Sums-of-Squares Verification. Int. J. Rob. Res., 29(8):1038–1052, Apr. 2010.

[133] N. Thatte and H. Geyer. Toward Balance Recovery with Leg Prostheses using Neuromuscular ModelControl. IEEE Trans. Biomed. Eng. Accept.

[134] N. Thatte and H. Geyer. Towards Local Reflexive Control of a Powered Transfemoral Prosthesis forRobust Amputee Push and Trip Recovery. IEEE Conf. Intell. Robot. Syst., 2014.

[135] E. Theodorou, Y. Tassa, and E. Todorov. Stochastic Differential Dynamic Programming. In Am.Control Conf., pages 1125–1132. IEEE, 2010.

[136] D. Torricelli, R. Mizanoor, J. Gonzalez, V. Lippi, G. Hettich, L. Asslaener, M. Weckx, B. Vander-borght, S. Dosen, M. Sartori, J. Zhao, S. Schuetz, Q. Liu, T. Mergner, D. Lefeber, D. Farina, K. Berns,and J. Pons. Benchmarking Human-Like Posture and Locomotion of Humanoid Robots: A Prelim-inary Scheme. In A. Duff, N. Lepora, A. Mura, T. Prescott, and P. Verschure, editors, Lect. NotesComput. Sci. Living Mach., pages 320–331. Springer International Publishing Switzerland, 2014.

[137] J. L. van Leeuwen. Muscle function in locomotion. In R. Alexander, editor, Mech. Anim. Locomot.,volume 11, pages 191–249. Springer-Verlag, New York, 1992.

[138] A. I. Vinik, B. D. Mitchell, S. B. Leichter, A. L. Wagner, J. T. O’Brian, and L. P. Georges. Epidemi-ology of the complications of diabetes, pages 221–287. Cambridge University Press, Cambridge,1995.

[139] J. Wang. Numerical Nonlinear Robust Control with Applications to Humanoid Robots. PhD thesis,Robotics Institute, Carnegie Mellon University, 2015.

[140] J. M. Wang, S. R. Hamner, S. L. Delp, and V. Koltun. Optimizing locomotion controllers usingbiologically-based actuators and objectives. ACM Trans. Graph., 31:1–11, 2012.

[141] P. Wensing and D. Orin. Generation of dynamic humanoid behaviors through task-space control withconic optimization. In Robotics and Automation (ICRA), 2013 IEEE International Conference on,pages 3103–3109, Karlsruhe, Germany, May 2013.

[142] E. R. Westervelt and J. W. Grizzle. Feedback Control of Dynamic Bipedal Robot Locomotion. Controland Automation Series. CRC PressINC, 2007.

[143] E. Whitman. Coordination of Multiple Dynamic Programming Policies for Control of Bipedal Walk-ing. PhD thesis, Robotics Institute, Carnegie Mellon University, 2013.

[144] E. Whitman and C. G. Atkeson. Control of instantaneously coupled systems applied to humanoidwalking. In International Conference on Humanoid Robots, pages 210–217, 2010.

[145] E. C. Whitman and C. G. Atkeson. Multiple model robust dynamic programming. In AmericanControl Conference (ACC), pages 5998–6004, 2012.

[146] E. C. Whitman, B. J. Stephens, and C. G. Atkeson. Torso rotation for push recovery using a simplechange of variables. In International Conference on Humanoid Robots, pages 50–56, 2012.

[147] Wikipedia. Describing function. en.wikipedia.org/wiki/Describing_function, 2015.

[148] D. A. Winter. Biomechanics and motor control of human movement. Wiley, Hoboken, N.J., 4th editioedition, 2009.

9

Page 26: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/Humanoid_Main.pdfwork blending model-based and parametric policy-based control for whole body locomotion

[149] M. Wisse, C. G. Atkeson, and D. K. Kloimwieder. Swing leg retraction helps biped walking stability.In Proceedings of the 5th IEEE-RAS International Conference on Humanoid Robots (Humanoids),2005.

[150] A. Wu and H. Geyer. The 3-D Spring-Mass Model Reveals a Time-Based Deadbeat Control forHighly Robust Running and Steering in Uncertain Environments. IEEE Trans. Robot., 29(5):1114–1124, 2013.

[151] M. Wyndaele and J.-J. Wyndaele. Incidence, prevalence and epidemiology of spinal cord injury: whatlearns a worldwide literature survey? Spinal Cord, 44(9):523–529, 2006.

[152] W. Xian, W. Yuan, and T. Coombs. Numerical assessment of efficiency and control stability of an htssynchronous motor. In 9th European Conference on Applied Superconductivity (EUCAS 09), Journalof Physics: Conference Series, volume 234, 2010.

[153] D. Xing, C. G. Atkeson, J. Su, and B. Stephens. Gain scheduled control of perturbed standing balance.In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4063–4068,2010.

[154] Xinjilefu. State Estimation for Humanoid Robots. PhD thesis, Robotics Institute, Carnegie MellonUniversity, 2015.

[155] Xinjilefu and C. Atkeson. State estimation of a walking humanoid robot. In Intelligent Robots andSystems (IROS), 2012 IEEE/RSJ International Conference on, pages 3693–3699, 2012.

[156] X. Xinjilefu, S. Feng, and C. G. Atkeson. Center of mass estimator for humanoids and its appli-cation in modelling error compensation, fall detection and prevention. In IEEE-RAS InternationalConference on Humanoid Robots (Humanoids), 2015.

[157] K. Yamane, S. Anderson, and J. Hodgins. Controlling humanoid robots with human motion data:Experimental validation. In Humanoid Robots (Humanoids), 2010 10th IEEE-RAS InternationalConference on, pages 504–510, 2010.

[158] K. Yamane and J. Hodgins. Control-aware mapping of human motion data with stepping for hu-manoid robots. In Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conferenceon, pages 726–733, 2010.

[159] K. Yamane, J. Kuffner, and J. Hodgins. Synthesizing animations of human manipulation tasks. ACMTransactions on Graphics, 23(3):532–539, 2004.

[160] K. Yin, K. Loken, and M. van der Penne. SIMBICON: simple biped locomotion control. In ProcACM SIGGRAPH, volume 26, page 105, 2007.

[161] R. Zaier and S. Kanda. Adaptive locomotion controller and reflex system for humanoid robots. In2008 IEEE/RSJ Int. Conf. Intell. Robot. Syst. IROS, pages 2492–2497, 2008.

[162] S. Zapolsky, E. Drumwright, I. Havoutis, J. Buchli, and S. C. Inverse dynamics for a quadruped robotlocomoting along slippery surfaces. In International Conference on Climbing and Walking Robots(CLAWAR), 2013.

[163] M. Zucker, J. A. Bagnell, C. G. Atkeson, and J. Kuffner. An optimization approach to rough terrainlocomotion. In IEEE International Conference on Robotics and Automation (ICRA), pages 3589–3595, 2010.

[164] M. Zucker, N. D. Ratliff, M. Stolle, J. E. Chestnutt, J. A. Bagnell, C. G. Atkeson, and J. Kuffner.Optimization and learning for rough terrain legged locomotion. International Journal of RoboticResearch, 30(2):175–191, 2011.

10