A METHOD FOR COMPLEX SKILL ACQUISITION IN HUMANOID ROBOTS BASED ON IMITATION LEARNING AND REINFORCEMENT LEARNING
Volume 7, Issue 5, pp. 80-89, 2025
DOI: https://doi.org/10.61784/jcsee3095
Author(s)
ShuoPei Yang, MengLi Wei*, YaNing Zhao
Affiliation(s)
Lingjing Jushen (Ningbo) Electronic Technology Co., Ltd., Ningbo 31500, Zhejiang, China.
Corresponding Author
MengLi Wei
ABSTRACT
This paper proposes a hybrid framework that integrates imitation learning and reinforcement learning to address core challenges in humanoid robot skill acquisition: low sample efficiency, poor safety, and weak generalization in high-dimensional continuous action spaces and complex dynamic environments. The method employs a temporal variational autoencoder for behavior cloning and introduces an action-state alignment loss to improve imitation quality. In the reinforcement learning phase, model-based safe exploration is combined with curriculum-based reward shaping to optimize the policy while maintaining safety. Experimental results show that the proposed framework significantly outperforms baseline methods in task success rate, sample efficiency, and zero-shot transfer, enabling efficient and robust skill learning from demonstration to autonomous execution. It thus offers a practical solution for deploying humanoid robots in complex environments.
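To make the behavior-cloning objective concrete, the sketch below shows one plausible form of the combined loss: a temporal-VAE reconstruction term over demonstrated actions, a Gaussian KL regularizer on the latent, and an action-state alignment term that asks decoded actions (pushed through a learned dynamics model) to reproduce the demonstrated next states. This is an illustrative reconstruction, not the authors' code; the weights `beta` and `lam`, the function names, and the mean-squared-error choices are all assumptions.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ),
    # summed over latent dimensions, averaged over the batch.
    return 0.5 * np.mean(np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1))

def bc_loss(actions, recon_actions, mu, logvar,
            pred_next_states, next_states, beta=0.1, lam=1.0):
    """Hypothetical behavior-cloning objective:
    reconstruction + beta * KL + lam * action-state alignment."""
    # Temporal-VAE reconstruction of the demonstrated action sequence.
    l_recon = np.mean((recon_actions - actions) ** 2)
    # Latent regularization toward the standard-normal prior.
    l_kl = gaussian_kl(mu, logvar)
    # Alignment: decoded actions, fed through a dynamics model, should
    # land on the demonstrated next states (pred_next_states = f(s_t, a_hat_t)).
    l_align = np.mean((pred_next_states - next_states) ** 2)
    return l_recon + beta * l_kl + lam * l_align
```

In a training loop, `pred_next_states` would come from a learned forward model evaluated on the decoder's actions, so the alignment term couples imitation quality to state-level consequences rather than to action matching alone.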
KEYWORDS
Humanoid robot; Skill acquisition; Imitation learning; Reinforcement learning; Zero-shot transfer
CITE THIS PAPER
ShuoPei Yang, MengLi Wei, YaNing Zhao. A method for complex skill acquisition in humanoid robots based on imitation learning and reinforcement learning. Journal of Computer Science and Electrical Engineering. 2025, 7(5): 80-89. DOI: https://doi.org/10.61784/jcsee3095.
