Mechanical Intelligence-Aware Curriculum Reinforcement Learning for Humanoids with Parallel Actuation
Presented at Humanoids 2025
Overview
Reinforcement learning (RL) has enabled advances in humanoid robot locomotion, yet most learning frameworks do not account for the mechanical intelligence embedded in parallel actuation mechanisms, owing to simulators' limited support for closed kinematic chains. This omission can lead to inaccurate motion modeling and suboptimal policies, particularly for robots with high actuation complexity. This paper presents general formulations and simulation methods for three types of parallel mechanisms (a differential pulley, a five-bar linkage, and a four-bar linkage) and trains a parallel-mechanism-aware policy through an end-to-end curriculum RL framework for BRUCE, a custom kid-sized humanoid robot. Unlike prior approaches that rely on simplified serial approximations, we simulate all closed-chain constraints natively using GPU-accelerated MuJoCo (MJX), preserving the hardware's nonlinear mechanical properties during training. We benchmark our RL approach against a model predictive controller (MPC), demonstrating better surface generalization and performance in real-world zero-shot deployment. This work highlights the computational approaches and performance benefits of fully simulating parallel mechanisms in end-to-end learning pipelines for legged humanoids.
BRUCE's lower body uses three distinct parallel mechanisms: a cable-driven differential pulley at the hip, and a four-bar linkage and a five-bar linkage in the legs. These designs combine motor outputs, reduce moving mass, and provide high transmission ratios. Instead of simplifying these closed-chain linkages into serial joints, our simulation models all equality constraints directly in GPU-accelerated MuJoCo (MJX). This preserves the true actuator-to-output mappings for position, velocity, and torque, while also making the nonlinear transmissions and singularities of the four-bar and five-bar linkages explicit. With this high-fidelity model, we train locomotion policies through curriculum reinforcement learning that command the same actuator joints as the hardware, achieving zero-shot transfer and demonstrating the benefits of fully simulating parallel actuation.
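To illustrate how a closed kinematic chain can be expressed natively in MuJoCo rather than approximated as a serial chain, the minimal MJCF sketch below closes a planar four-bar loop with an equality `connect` constraint while actuating only the crank joint. The link names and dimensions here are illustrative placeholders, not BRUCE's actual model:

```xml
<mujoco model="four_bar_sketch">
  <option timestep="0.002"/>
  <worldbody>
    <!-- The ground link is the world body; crank and rocker both pivot on it. -->
    <body name="crank" pos="0 0 0">
      <joint name="crank_hinge" type="hinge" axis="0 1 0"/>
      <geom type="capsule" fromto="0 0 0  0 0 0.1" size="0.01"/>
      <body name="coupler" pos="0 0 0.1">
        <joint name="coupler_hinge" type="hinge" axis="0 1 0"/>
        <geom type="capsule" fromto="0 0 0  0.3 0 0" size="0.01"/>
      </body>
    </body>
    <body name="rocker" pos="0.3 0 0">
      <joint name="rocker_hinge" type="hinge" axis="0 1 0"/>
      <geom type="capsule" fromto="0 0 0  0 0 0.1" size="0.01"/>
    </body>
  </worldbody>
  <equality>
    <!-- Closes the loop: the coupler tip must coincide with the rocker tip. -->
    <connect body1="coupler" body2="rocker" anchor="0.3 0 0"/>
  </equality>
  <actuator>
    <!-- Only the crank is driven; the rocker moves through the constraint. -->
    <motor joint="crank_hinge" gear="1"/>
  </actuator>
</mujoco>
```

Because MJX solves such equality constraints inside the simulation step, the learned policy sees the true nonlinear mapping from the driven joint to the output link, including any singular configurations of the linkage.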
We trained BRUCE with a curriculum reinforcement learning pipeline that gradually increased task difficulty, starting from simple balancing and progressing to dynamic walking under disturbances. The entire process ran on GPU-accelerated MuJoCo (MJX), allowing thousands of parallel environments to simulate BRUCE's closed-chain mechanisms efficiently. This large-scale, staged training produced robust policies that transferred directly to the real robot.
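A staged curriculum of this kind can be sketched as a simple promotion schedule: each stage widens the commanded velocities and disturbance magnitudes once the policy's mean episode reward clears a threshold. The stage names, thresholds, and ranges below are illustrative placeholders, not the values used in the paper:

```python
# Sketch of a staged curriculum: training advances from balancing to
# walking to walking under pushes. All numbers are illustrative.
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    max_push_force: float   # N, magnitude of random pushes on the torso
    max_cmd_speed: float    # m/s, range of sampled velocity commands
    promote_reward: float   # mean episode reward required to advance

CURRICULUM = [
    Stage("balance",     max_push_force=0.0,  max_cmd_speed=0.0, promote_reward=0.8),
    Stage("walk",        max_push_force=5.0,  max_cmd_speed=0.3, promote_reward=0.7),
    Stage("walk_pushed", max_push_force=20.0, max_cmd_speed=0.5, promote_reward=float("inf")),
]

def current_stage(stage_idx: int, mean_reward: float) -> int:
    """Return the next stage index, advancing when the threshold is met."""
    if mean_reward >= CURRICULUM[stage_idx].promote_reward:
        return min(stage_idx + 1, len(CURRICULUM) - 1)
    return stage_idx
```

In a vectorized MJX setup, the active stage's ranges would parameterize the command and disturbance sampling shared across all parallel environments, so difficulty increases globally as the policy improves.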
Full Video
Citation
If you use this work or find it helpful, please consider citing (BibTeX):
@INPROCEEDINGS{11203130,
  author={Tanaka, Yusuke and Zhu, Alvin and Wang, Quanyou and Hong, Dennis},
  booktitle={2025 IEEE-RAS 24th International Conference on Humanoid Robots (Humanoids)},
  title={Mechanical Intelligence-Aware Curriculum Reinforcement Learning for Humanoids with Parallel Actuation},
  year={2025},
  pages={882-889},
  doi={10.1109/Humanoids65713.2025.11203130}}