The International Journal of Robotics Research

Learning CPG-based Biped Locomotion with a Policy Gradient Method: Application to a Humanoid Robot

Gen Endo

Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo 152-8550, Japan, gendo{at}sms.titech.ac.jp

Jun Morimoto

ATR Computational Neuroscience Laboratories; Computational Brain Project, ICORP, Japan Science and Technology Agency; 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan, xmorimo{at}atr.jp

Takamitsu Matsubara

ATR Computational Neuroscience Laboratories, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan; Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma-shi, Nara 630-0192, Japan; takam-m{at}atr.jp

Jun Nakanishi

ATR Computational Neuroscience Laboratories; Computational Brain Project, ICORP, Japan Science and Technology Agency; 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan, jun{at}atr.jp

Gordon Cheng

ATR Computational Neuroscience Laboratories; ICORP, Japan Science and Technology Agency; 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan, gordon{at}atr.jp

In this paper we describe a learning framework for a central pattern generator (CPG)-based biped locomotion controller using a policy gradient method. Our goals in this study are to achieve CPG-based biped walking with a 3D hardware humanoid and to develop an efficient learning algorithm for the CPG-based controller by reducing the dimensionality of the state space used for learning. We demonstrate that an appropriate feedback controller can be acquired within a few thousand trials in numerical simulation, and that the controller obtained in simulation achieves stable walking on a physical robot in the real world. We evaluate walking velocity and stability in both numerical simulations and hardware experiments. The results suggest that the learning algorithm is capable of adapting to environmental changes. Furthermore, we present an online learning scheme that starts from an initial policy and improves the controller on the hardware robot within 200 iterations.
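The core idea of a policy gradient method can be illustrated in miniature. The sketch below is a generic likelihood-ratio (REINFORCE-style) update, not the authors' algorithm: the Gaussian policy, the single learnable mean, the toy reward that peaks at an assumed target, the running-average baseline, and the function name `reinforce_gaussian` with all of its parameter values are hypothetical choices made only for illustration.

```python
import random


def reinforce_gaussian(target=0.7, sigma=0.2, lr=0.005, iters=8000, seed=0):
    """Toy likelihood-ratio (REINFORCE-style) policy gradient in one dimension.

    Hypothetical setup (not from the paper): actions are drawn from a Gaussian
    policy a ~ N(mu, sigma^2) with a learnable mean mu, and the reward
    r(a) = -(a - target)^2 is highest when the action equals target.
    """
    rng = random.Random(seed)
    mu = 0.0
    baseline = 0.0  # running average of reward, used to reduce gradient variance
    for _ in range(iters):
        a = rng.gauss(mu, sigma)          # sample an action from the policy
        r = -(a - target) ** 2            # hypothetical scalar reward
        # Gradient of log N(a; mu, sigma^2) with respect to mu.
        grad_log_pi = (a - mu) / sigma ** 2
        # Stochastic gradient ascent on expected reward.
        mu += lr * (r - baseline) * grad_log_pi
        baseline += 0.1 * (r - baseline)  # update the baseline after the step
    return mu


if __name__ == "__main__":
    print(f"learned mean: {reinforce_gaussian():.3f}")
```

In the setting described in the abstract, the learned policy is a feedback controller for the CPG-based walker; the scalar toy reward here merely stands in for a walking objective to show the shape of the update. Subtracting a baseline leaves the expected gradient unchanged while lowering its variance, which is one common way to keep the number of required trials down.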

Key Words: humanoid robots • reinforcement learning • bipedal locomotion • central pattern generator

The International Journal of Robotics Research, Vol. 27, No. 2, 213-228 (2008)
DOI: 10.1177/0278364907084980
