ManyQuadrupeds: Learning a Single Locomotion Policy for Diverse Quadruped Robots

Milad Shafiee             Guillaume Bellegarda             Auke Ijspeert

Abstract

Learning a locomotion policy for quadruped robots has traditionally been constrained to a specific robot morphology, mass, and size. The learning process must usually be repeated for every new robot, where hyperparameters and reward function weights must be re-tuned to maximize performance for each new system. Alternatively, attempting to train a single policy to accommodate different robot sizes, while maintaining the same degrees of freedom (DoF) and morphology, requires either complex learning frameworks, or mass, inertia, and dimension randomization, which leads to prolonged training periods. In our study, we show that drawing inspiration from animal motor control allows us to effectively train a single locomotion policy capable of controlling a diverse range of quadruped robots. The robot differences encompass a variable number of DoFs (i.e., 12 or 16 joints), three distinct morphologies, a broad mass range spanning from 2 kg to 200 kg, and nominal standing heights ranging from 16 cm to 100 cm. Our policy modulates a representation of the Central Pattern Generator (CPG) in the spinal cord, effectively coordinating both frequencies and amplitudes of the CPG to produce rhythmic output (Rhythm Generation), which is then mapped to a Pattern Formation (PF) layer. Across different robots, the only varying component is the PF layer, which adjusts the scaling parameters for the stride height and length. Subsequently, we evaluate sim-to-real transfer by testing the single policy on both the Unitree Go1 and A1 robots. Remarkably, we observe robust performance, even when adding a 15 kg load, equivalent to 125% of the A1 robot's nominal mass.
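To make the architecture concrete, below is a minimal sketch of how a policy-modulated CPG (Rhythm Generation) layer can feed a Pattern Formation layer that scales stride length and height per robot. The specific oscillator equations, gain `a`, and all numeric values are illustrative assumptions, not the paper's exact implementation; `mu` and `omega` stand in for the amplitude and frequency commands that the learned policy would output.

```python
import numpy as np

class CPGLeg:
    """One amplitude-controlled phase oscillator per leg (Rhythm Generation).

    The RL policy commands an amplitude setpoint mu and a frequency omega;
    the oscillator state (r, theta) converges smoothly toward them.
    """
    def __init__(self, a=50.0):
        self.r = 1.0        # current amplitude
        self.r_dot = 0.0    # amplitude rate (second-order dynamics)
        self.theta = 0.0    # current phase
        self.a = a          # convergence gain (assumed value)

    def step(self, mu, omega, dt=0.001):
        # Critically damped second-order convergence of r toward mu
        r_ddot = self.a * (self.a / 4.0 * (mu - self.r) - self.r_dot)
        self.r_dot += r_ddot * dt
        self.r += self.r_dot * dt
        self.theta = (self.theta + omega * dt) % (2.0 * np.pi)
        return self.r, self.theta

def pattern_formation(r, theta, d_step, h_ground, g_clearance, g_penetration):
    """PF layer: map oscillator output (r, theta) to a desired foot position.

    d_step (stride length scale) and g_clearance (stride height scale) are
    the robot-specific parameters that, per the abstract, are the only
    components varying across robots.
    """
    x = -d_step * r * np.cos(theta)                 # fore-aft foot position
    if np.sin(theta) > 0:                           # swing phase
        z = h_ground + g_clearance * np.sin(theta)
    else:                                           # stance phase
        z = h_ground + g_penetration * np.sin(theta)
    return x, z

# Usage: the policy would output (mu, omega) at each control step;
# here we integrate fixed placeholder commands for one second.
leg = CPGLeg()
for _ in range(1000):
    r, theta = leg.step(mu=1.5, omega=2.0 * np.pi * 2.0)  # 2 Hz gait (assumed)
foot_x, foot_z = pattern_formation(r, theta, d_step=0.1, h_ground=-0.3,
                                   g_clearance=0.05, g_penetration=0.01)
```

The desired foot positions would then be converted to joint targets with each robot's inverse kinematics, which is how a single policy operating in this abstracted space can drive robots of very different sizes.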

Simulations

Training a single policy for 16 diverse robots. These robots exhibit variations in mass, ranging from 2 to 200 kg, and nominal height, from 18 to 100 cm, and come in three different morphologies with either 12 or 16 DoFs.

We trained a single policy for 13 quadruped robots, deliberately excluding HYQ, Dog3, and B1 from training to test the generalization capabilities of the framework. See the next video on the right.

We observed reasonable locomotion behavior for the HYQ, Dog3, and B1 robots even though the policy had never been trained on them, demonstrating the robustness of the trained policy.


Experiments A1

Trotting at 1 m/s.

Trotting with an additional 10 kg mass.

Trotting with an additional 15 kg mass.


Experiments Go1

Trotting at 1 m/s.

Trotting on uneven grass.



BibTeX


@article{shafiee2023manyquadrupeds,
  title={ManyQuadrupeds: Learning a Single Locomotion Policy for Diverse Quadruped Robots},
  author={Shafiee, Milad and Bellegarda, Guillaume and Ijspeert, Auke},
  journal={arXiv preprint arXiv:2310.10486},
  year={2023}
}