In this article, a solver-critic (SC) architecture is developed for optimal control problems of discrete-time (DT) constrained-input systems. The proposed design consists of three parts: 1) a critic network; 2) an action solver; and 3) a target network. The critic network first approximates the action-value function using a sum-of-squares (SOS) polynomial. The action solver then adopts SOS programming to obtain control inputs within the constraint set. The target network introduces a soft update mechanism into policy evaluation to stabilize the learning process. With the proposed architecture, the constrained-input control problem can be solved without adding nonquadratic functionals to the reward function. A theoretical analysis of the convergence property is presented, and the effects of both different initial Q-functions and different discount factors are investigated. It is proven that the learned policy converges to the optimal solution of the Hamilton-Jacobi-Bellman equation. Four numerical examples are provided to validate the theoretical analysis and demonstrate the effectiveness of the approach.
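The soft update mechanism mentioned in the abstract, in which the target network slowly tracks the critic to stabilize policy evaluation, is commonly realized as Polyak averaging of parameters. A minimal sketch of that idea, assuming the parameters are stored as NumPy arrays (the names `soft_update` and `tau` are illustrative, not taken from the paper):

```python
import numpy as np

def soft_update(target_params, online_params, tau):
    """Polyak-averaged soft update: target <- tau * online + (1 - tau) * target.

    Blends the online (critic) parameters into the target parameters,
    so the target used in policy evaluation changes slowly.
    """
    return [tau * w + (1.0 - tau) * t
            for w, t in zip(online_params, target_params)]

# Toy example with a single 2x2 weight matrix per "network".
online = [np.ones((2, 2))]
target = [np.zeros((2, 2))]
target = soft_update(target, online, tau=0.1)
print(target[0])  # each entry has moved 10% of the way toward the online value
```

A small `tau` (e.g. 0.01 to 0.1) keeps the evaluation target nearly stationary between updates, which is the stabilizing effect the abstract attributes to the target network.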
This article was published in the following journal: IEEE Transactions on Cybernetics.
This article studies an optimal event-triggered control (ETC) problem of nonlinear continuous-time systems subject to asymmetric control constraints. The present nonlinear plant differs from many stud...
In this article, a synchronous reinforcement-learning-based algorithm is developed for input-constrained partially unknown systems. The proposed control also alleviates the need for an initial stabili...
Utilizing the idea of long-term cumulative return, reinforcement learning (RL) has shown remarkable performance in various fields. We follow the formulation of landmark localization in 3D medical imag...
This article studies an event-triggered communication and adaptive dynamic programming (ADP) co-design control method for the multiplayer nonzero-sum (NZS) games of a class of nonlinear discrete-time ...
We propose a method for learning multi-agent policies to compete against multiple opponents. The method consists of recurrent neural network-based actor-critic networks and deterministic policy gradie...
This is a clinical study designed to test the hypothesis that a computer model for dosing warfarin is superior to current clinical practice. Subjects will be randomized to two groups based...
The main aim of the study is to investigate whether intranasal oxytocin (24IU) influences reward sensitivity and performance monitoring during reinforcement learning.
This study will test a computational model of reinforcement learning in depression and anxiety and test the extent to which the computational model predicts response to an adapted version of ...
Nocebo effects are adverse effects induced by patients' expectations. Nocebo effects on pain may underlie several clinical conditions, such as chronic pain. These effects can be learned vi...
The primary objective is to analyze the safety and efficacy of CD133+ cells obtained from peripheral blood in the treatment of diabetic patients with critical ischemia in the lower limbs. The ...
Learning the correct route through a maze to obtain reinforcement. It is used for human or animal populations. (Thesaurus of Psychological Index Terms, 6th ed)
Use of word stimulus to strengthen a response during learning.
Process in which individuals take the initiative in diagnosing their learning needs, formulating learning goals, identifying resources for learning, choosing and implementing learning strategies, and evaluating learning outcomes (Knowles, 1975).
The course of learning of an individual or a group. It is a measure of performance plotted over time.
Learning situations in which the sequence of responses of the subject is instrumental in producing reinforcement. When the correct response, selected from among a repertoire of responses, occurs, the subject is immediately reinforced.