Solver-Critic: A Reinforcement Learning Method for Discrete-Time-Constrained-Input Systems.

08:00 EDT 20th March 2020

Summary of "Solver-Critic: A Reinforcement Learning Method for Discrete-Time-Constrained-Input Systems."

In this article, a solver-critic (SC) architecture is developed for optimal control problems of discrete-time (DT) constrained-input systems. The proposed design consists of three parts: 1) a critic network; 2) an action solver; and 3) a target network. The critic network approximates the action-value function (Q-function) using a sum-of-squares (SOS) polynomial. The action solver then uses SOS programming to obtain control inputs that lie within the constraint set. The target network introduces a soft update mechanism into policy evaluation to stabilize the learning process. With this architecture, the constrained-input control problem can be solved without adding nonquadratic functionals to the reward function. The article presents a theoretical analysis of the convergence properties and investigates the effects of different initial Q-functions and different discount factors. It is proven that the learned policy converges to the optimal solution of the Hamilton-Jacobi-Bellman equation. Four numerical examples are provided to validate the theoretical analysis and to demonstrate the effectiveness of the proposed approach.
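To make the roles of the critic, the action solver, and the target network more concrete, here is a minimal Python sketch of a deliberately simplified case. It is not the authors' implementation, and every name in it (`QuadQ`, `constrained_greedy_action`, `soft_update`, `bellman_target`) is hypothetical. The sketch assumes a scalar state x, a scalar input u with a box constraint u_min <= u <= u_max, a hypothetical linear system x' = a*x + b*u, and a quadratic stage cost that is minimized, as is usual in this control setting. The critic is a quadratic form Q(x, u) = h_xx*x^2 + 2*h_xu*x*u + h_uu*u^2. A quadratic form is SOS exactly when its coefficient matrix is positive semidefinite, so this is the simplest SOS Q-function. Because a convex quadratic in a scalar u is minimized over an interval by clipping its unconstrained minimizer, the action solver here is a closed-form clip. It stands in for the SOS program described in the paper, which is needed for general polynomial Q-functions and constraint sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuadQ:
    """Hypothetical critic: Q(x, u) = h_xx*x^2 + 2*h_xu*x*u + h_uu*u^2.

    This quadratic form is SOS exactly when [[h_xx, h_xu], [h_xu, h_uu]] is
    positive semidefinite, the simplest instance of an SOS Q-function.
    """
    h_xx: float
    h_xu: float
    h_uu: float

    def value(self, x: float, u: float) -> float:
        return self.h_xx * x * x + 2.0 * self.h_xu * x * u + self.h_uu * u * u

    def is_sos(self) -> bool:
        # A symmetric 2x2 matrix is PSD iff both diagonal entries and the determinant are >= 0.
        det = self.h_xx * self.h_uu - self.h_xu ** 2
        return self.h_xx >= 0.0 and self.h_uu >= 0.0 and det >= 0.0


def constrained_greedy_action(q: QuadQ, x: float, u_min: float, u_max: float) -> float:
    """Minimize Q(x, u) over u in [u_min, u_max].

    Stand-in for the paper's SOS-programming action solver: for h_uu > 0, Q is a
    convex parabola in u, so the constrained minimizer is the unconstrained
    minimizer -h_xu*x/h_uu clipped to the interval.
    """
    if q.h_uu <= 0.0:
        raise ValueError("h_uu must be positive so Q is strictly convex in u")
    u_unconstrained = -q.h_xu * x / q.h_uu
    return min(max(u_unconstrained, u_min), u_max)


def soft_update(target: QuadQ, online: QuadQ, tau: float) -> QuadQ:
    """Soft target update: theta_target <- tau*theta_online + (1 - tau)*theta_target."""
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    return QuadQ(
        h_xx=tau * online.h_xx + (1.0 - tau) * target.h_xx,
        h_xu=tau * online.h_xu + (1.0 - tau) * target.h_xu,
        h_uu=tau * online.h_uu + (1.0 - tau) * target.h_uu,
    )


def bellman_target(target: QuadQ, x: float, u: float, *, a: float, b: float,
                   q_cost: float, r_cost: float, gamma: float,
                   u_min: float, u_max: float) -> float:
    """Policy-evaluation target y = c(x, u) + gamma * min over u' of Q_target(x', u').

    Assumes a hypothetical scalar system x' = a*x + b*u, stage cost
    c(x, u) = q_cost*x^2 + r_cost*u^2 (minimized), and u' in [u_min, u_max].
    """
    x_next = a * x + b * u
    u_next = constrained_greedy_action(target, x_next, u_min, u_max)
    return q_cost * x * x + r_cost * u * u + gamma * target.value(x_next, u_next)


if __name__ == "__main__":
    q = QuadQ(h_xx=1.0, h_xu=0.5, h_uu=1.0)
    print(q.is_sos())                                    # True
    print(constrained_greedy_action(q, 4.0, -1.0, 1.0))  # -1.0 (clipped from -2.0)
    print(bellman_target(q, 1.0, 0.0, a=1.0, b=1.0, q_cost=1.0, r_cost=1.0,
                         gamma=0.9, u_min=-1.0, u_max=1.0))
```

In this sketch the constraint is enforced directly when the greedy action is computed, rather than through a nonquadratic penalty in the cost, which mirrors the idea stated in the summary. With a small `tau`, the soft update lets the target Q-function, and therefore the targets `y`, change only gradually between iterations. That is the usual motivation for a soft-updated target network.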


Journal Details

This article was published in the following journal.

Name: IEEE Transactions on Cybernetics
ISSN: 2168-2275


