Theses and Dissertations

Date of Award


Document Type


Degree Name

Master of Science (MS)


Computer Science

First Advisor

Dr. Dong-Chul Kim

Second Advisor

Dr. Zhixiang Chen

Third Advisor

Dr. Emmett Tomai


When deep reinforcement learning (DRL) is applied to learning physics-based character skills, training a reinforcement learning (RL) agent can converge slowly and settle on locally optimal solutions. In an environment with reward saltation, that is, abrupt jumps in reward, those saltatory rewards can be magnified from the perspective of sample usage to increase the agent's experience pool during training. In this work, we propose two modified algorithms. The first adds a parameter-based reward optimization process to the proximal policy optimization (PPO) algorithm; it magnifies the saltatory rewards and thereby increases the agent's utilization of previous experiences. The second introduces generalized advantage estimation (GAE) into the advantage computation of the advantage actor-critic (A2C) algorithm, which resulted in faster convergence toward the globally optimal solutions of DRL. We conducted all experiments in a custom RL environment built with the PyBullet physics engine. In that environment, the RL agent has a humanoid body that learns humanlike motions, e.g., walk, run, spin, cartwheel, spinkick, and backflip, by imitating example reference motions. Our experiments show significant improvements in the performance and convergence speed of DRL for learning humanlike motions when the modified versions of PPO and A2C are compared with their vanilla versions.
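For readers unfamiliar with generalized advantage estimation, the following is a minimal sketch of the standard GAE recursion, not the thesis implementation. It assumes a single trajectory segment with no episode termination inside it; the function name `gae_advantages` and the default values of `gamma` and `lam` are illustrative assumptions.

```python
def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Standard GAE over one trajectory segment (illustrative sketch).

    rewards[t]  : reward received after the action at step t
    values[t]   : critic's value estimate for the state at step t
    last_value  : critic's value estimate for the state after the final step
    gamma, lam  : discount factor and GAE smoothing parameter (assumed defaults)
    """
    # Value of the successor state for every step in the segment.
    next_values = list(values[1:]) + [last_value]
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the discounted sum of later TD errors.
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error at step t.
        delta = rewards[t] + gamma * next_values[t] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With `lam=0` this reduces to the one-step temporal-difference error, and with `lam=1` it becomes the discounted return minus the value estimate; intermediate values trade bias against variance in the advantage that the actor's policy-gradient update consumes.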


Copyright 2021 Md Rysul Kabir. All Rights Reserved.