Robot Perception and Control

Legged Locomotion

Last updated: Jul 25, 2024
Kashu Yamazaki
kyamazak@andrew.cmu.edu

What can we do with RL-trained legged robots?

Kashu Yamazaki, 2024

Actuator Networks arxiv

Actuators are extremely difficult to model accurately.

  • their dynamics include nonlinear and non-smooth dissipation.
  • they contain cascaded feedback loops and a number of internal states that are not directly observable.

An actuator network is a data-driven approach that uses supervised learning to simulate an actuator more accurately.

  • learns the action-to-torque relationship, including all software and hardware dynamics.
  • estimates the torque at each joint from a history of position errors and velocities.
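
A minimal sketch of how such a network could be queried at every control step, assuming a plain multilayer perceptron whose input is the three most recent (position error, velocity) pairs of one joint. The class name `ActuatorNet`, the history length of 3, the three hidden layers of 32 units, and the softsign activation are illustrative assumptions; the random weights stand in for weights that would come from supervised training.

```python
import math
import random
from collections import deque
from typing import List, Sequence


def softsign(x: float) -> float:
    """Softsign activation: x / (1 + |x|)."""
    return x / (1.0 + abs(x))


class ActuatorNet:
    """Hypothetical MLP: history of (position error, velocity) -> joint torque.

    Assumed configuration: history of 3 steps, three hidden layers of 32 units,
    softsign activations. Random weights stand in for trained ones.
    """

    def __init__(self, history_len: int = 3, hidden=(32, 32, 32), seed: int = 0):
        rng = random.Random(seed)
        sizes = [2 * history_len, *hidden, 1]
        # One (weights, biases) pair per layer; weights[i][j] connects input j to unit i.
        self.layers = []
        for n_in, n_out in zip(sizes[:-1], sizes[1:]):
            weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                       for _ in range(n_out)]
            self.layers.append((weights, [0.0] * n_out))
        # Newest pair first; zero-padded until enough history has accumulated.
        self.history = deque([(0.0, 0.0)] * history_len, maxlen=history_len)

    def forward(self, features: Sequence[float]) -> float:
        x: List[float] = list(features)
        for k, (weights, biases) in enumerate(self.layers):
            x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
            if k < len(self.layers) - 1:  # the output layer stays linear
                x = [softsign(v) for v in x]
        return x[0]

    def step(self, pos_target: float, pos: float, vel: float) -> float:
        """Record the newest (position error, velocity) pair and return the estimated torque."""
        self.history.appendleft((pos_target - pos, vel))  # oldest pair drops off the right
        features = [value for pair in self.history for value in pair]
        return self.forward(features)


net = ActuatorNet()
torque = net.step(pos_target=0.5, pos=0.4, vel=0.1)
```

In a simulator, `step` would be called once per control step for each joint, and its output applied as that joint's torque in place of an idealized motor model.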

Training data: more than a million samples of joint position errors, velocities, and torques, collected by running a controller with varied amplitudes and frequencies and applying manual disturbances to cover diverse situations.
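
Logged signals like these can be turned into supervised training pairs by sliding a window over each joint's time series. A minimal sketch, assuming the signals are already synchronized and sampled at the control rate; `make_windows` and `mse` are hypothetical helpers, and the tiny log in the test stands in for the million-plus real samples.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[List[float], float]  # (input history, target torque)


def make_windows(
    pos_err: Sequence[float],
    vel: Sequence[float],
    torque: Sequence[float],
    history_len: int = 3,
) -> List[Sample]:
    """Turn logged joint signals into (input history, target torque) pairs.

    Each input holds the `history_len` most recent (error, velocity) pairs,
    newest first; the target is the torque measured at the same time step.
    """
    if not (len(pos_err) == len(vel) == len(torque)):
        raise ValueError("logged signals must have equal length")
    samples = []
    for t in range(history_len - 1, len(torque)):
        features: List[float] = []
        for k in range(history_len):
            features += [pos_err[t - k], vel[t - k]]
        samples.append((features, torque[t]))
    return samples


def mse(predict: Callable[[List[float]], float], samples: List[Sample]) -> float:
    """Mean squared torque error of a model `predict(features) -> torque`."""
    return sum((predict(x) - y) ** 2 for x, y in samples) / len(samples)
```

Any regression model, for example a small MLP, would then be fit by minimizing `mse` over these pairs, ideally with held-out logs to check generalization.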

Learning by Cheating arxiv github

Proposes a two-stage training procedure for effective imitation learning: first train a privileged agent, then use that agent as a teacher to train a purely vision-based system. This teacher-student paradigm is the underlying concept of much of legged RL.
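
The two-stage idea can be illustrated with a deliberately tiny toy, assuming a one-dimensional state: the privileged teacher acts on the true state `x`, while the student only sees a transformed reading `s = 2x + 1` that stands in for raw sensor input. All functions and numbers are hypothetical, and plain behavior cloning with gradient descent on a linear student replaces the deep networks and richer training data of the actual method.

```python
from typing import List, Tuple


def teacher_action(x: float) -> float:
    """Stage 1 result (toy): a privileged policy acting on the ground-truth state."""
    return -0.5 * x


def sensor(x: float) -> float:
    """What the student observes instead of x (hypothetical sensor model)."""
    return 2.0 * x + 1.0


def distill(states: List[float], lr: float = 0.1, steps: int = 2000) -> Tuple[float, float]:
    """Stage 2: fit a linear student a = w*s + b to the teacher's actions by MSE."""
    data = [(sensor(x), teacher_action(x)) for x in states]  # teacher labels the data
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * s + b - a) * s for s, a in data) / n
        grad_b = sum(2 * (w * s + b - a) for s, a in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


states = [-1.0 + 2.0 * i / 20 for i in range(21)]  # grid over x in [-1, 1]
w, b = distill(states)
```

Because the reading is an invertible function of the true state, the student can match the teacher exactly here (`w` converges to -0.25 and `b` to 0.25). In real systems the student's observations are partial and noisy, so an exact match is not expected.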

Learning Locomotion over Challenging Terrain arxiv github

RMA: Rapid Motor Adaptation paper

Learning to Walk in Minutes arxiv github

Presents a training setup that achieves fast policy generation for real-world robotic tasks by using massive parallelism on a single workstation GPU (showcase of Isaac Gym).

  • Its codebase is widely used as a baseline for developing legged locomotion systems.
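
The speedup comes from stepping thousands of simulated robots together and updating the policy on the combined batch. The pure-Python sketch below only mirrors that structure under stated assumptions: `BatchedToyEnv` and `collect_rollout` are hypothetical names, the environment is a toy 1-D system, nothing runs on a GPU, and the policy is a fixed placeholder. Finished environments are reset in place so the batch always stays full.

```python
import random
from typing import List, Tuple


class BatchedToyEnv:
    """Hypothetical batch of N independent 1-D 'robots' stepped in lockstep."""

    def __init__(self, num_envs: int, max_episode_len: int, seed: int = 0):
        self.rng = random.Random(seed)
        self.num_envs = num_envs
        self.max_episode_len = max_episode_len
        self.pos = [0.0] * num_envs
        self.episode_len = [0] * num_envs

    def step(self, actions: List[float]) -> Tuple[List[float], List[float], List[bool]]:
        """Advance every environment by one step; auto-reset those that finish."""
        assert len(actions) == self.num_envs
        rewards, dones = [], []
        for i, a in enumerate(actions):
            self.pos[i] += a
            self.episode_len[i] += 1
            rewards.append(-abs(self.pos[i]))  # toy reward: stay near the origin
            done = self.episode_len[i] >= self.max_episode_len
            if done:  # reset in place instead of waiting for the whole batch
                self.pos[i] = self.rng.uniform(-1.0, 1.0)
                self.episode_len[i] = 0
            dones.append(done)
        return list(self.pos), rewards, dones


def collect_rollout(env: BatchedToyEnv, steps_per_env: int):
    """Gather (obs, action, reward, done) from every env: num_envs * steps_per_env samples."""
    samples = []
    for _ in range(steps_per_env):
        obs = list(env.pos)
        actions = [-0.1 * p for p in obs]  # placeholder policy
        _, rewards, dones = env.step(actions)
        samples.extend(zip(obs, actions, rewards, dones))
    return samples


env = BatchedToyEnv(num_envs=8, max_episode_len=5)
samples = collect_rollout(env, steps_per_env=24)
```

Each policy update therefore sees `num_envs * steps_per_env` transitions; with, for example, 4096 environments and 24 steps per environment (illustrative values), that is 98,304 transitions per update, gathered in only 24 simulator calls.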

Walk These Ways arxiv

Perceptive locomotion

Perceptive locomotion for quadrupeds

Presents a three-stage training and deployment method that achieves zero-shot sim-to-real transfer [1].

  1. A teacher policy, which has access to privileged information, is trained to follow a random target velocity over randomly generated terrain with random disturbances.
  2. A student policy is trained to reproduce the teacher policy’s actions without using this privileged information.
  3. The learned student policy is transferred to the physical robot and deployed in the real world with onboard sensors.
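
The source does not specify the teacher's reward, but a common choice in legged RL for following a random target velocity is an exponential tracking kernel. A minimal sketch of that piece of stage 1, assuming a command of (v_x, v_y, yaw rate), a kernel width `sigma = 0.25`, uniform command ranges, and random pushes modeled as velocity kicks; all names and values are hypothetical, and practical reward functions add further terms (for example, penalties on joint torques) and often track linear and angular velocity separately.

```python
import math
import random
from typing import Tuple

Vel = Tuple[float, float, float]  # (v_x, v_y, yaw_rate)


def sample_command(rng: random.Random, max_lin: float = 1.0, max_yaw: float = 1.0) -> Vel:
    """Draw a random target velocity; the uniform ranges are assumptions."""
    return (rng.uniform(-max_lin, max_lin),
            rng.uniform(-max_lin, max_lin),
            rng.uniform(-max_yaw, max_yaw))


def random_push(rng: random.Random, base_vel: Vel, max_push: float = 0.5) -> Vel:
    """Random disturbance modeled as a uniform kick to the planar base velocity."""
    vx, vy, yaw_rate = base_vel
    return (vx + rng.uniform(-max_push, max_push),
            vy + rng.uniform(-max_push, max_push),
            yaw_rate)


def tracking_reward(cmd: Vel, base_vel: Vel, sigma: float = 0.25) -> float:
    """exp(-||cmd - base_vel||^2 / sigma): 1.0 for perfect tracking, decaying toward 0."""
    err = sum((c - v) ** 2 for c, v in zip(cmd, base_vel))
    return math.exp(-err / sigma)


rng = random.Random(0)
cmd = sample_command(rng)    # new target velocity at the start of an episode
vel = random_push(rng, cmd)  # robot was tracking perfectly, then got pushed
r = tracking_reward(cmd, vel)
```

The smooth kernel gives the teacher a useful gradient-like signal even when tracking is poor, while saturating at 1.0 once the command is met.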

Training teacher policy

Training student policy

Deployment

Legged Locomotion using Egocentric Vision arxiv

Parkour Learning arxiv

Extreme Parkour arxiv

Humanoid Parkour Learning arxiv
