Solver-Critic: A Reinforcement Learning Method for Discrete-Time-Constrained-Input Systems.

08:00 EDT 20th March 2020 | BioPortfolio

Summary of "Solver-Critic: A Reinforcement Learning Method for Discrete-Time-Constrained-Input Systems."

This article develops a solver-critic (SC) architecture for optimal control of discrete-time (DT) constrained-input systems. The proposed design consists of three parts: 1) a critic network; 2) an action solver; and 3) a target network. The critic network first approximates the action-value function with a sum-of-squares (SOS) polynomial. The action solver then uses SOS programming to obtain control inputs within the constraint set. The target network introduces a soft update mechanism into policy evaluation to stabilize the learning process. With this architecture, the constrained-input control problem can be solved without adding nonquadratic functionals to the reward function. A theoretical analysis of the convergence property is presented, and the effects of different initial Q-functions and different discount factors are investigated. It is proven that the learned policy converges to the optimal solution of the Hamilton-Jacobi-Bellman equation. Four numerical examples validate the theoretical analysis and demonstrate the effectiveness of the approach.
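The summary does not give the update rules, so the following Python sketch is only an illustration of two of the ideas it names, under stated assumptions. The soft update is written in its common form, target = tau * critic + (1 - tau) * target, with a hypothetical mixing rate `tau`. The action step is a deliberately simplified stand-in for the SOS-programming solver: it assumes a scalar input, a box constraint, and a critic that is a concave quadratic in the input, so the constrained maximizer is the clipped unconstrained one. All function and parameter names are hypothetical and are not taken from the article.

```python
import numpy as np


def soft_update(target_params, critic_params, tau=0.1):
    """Blend critic parameters into the target parameters.

    Assumed form: target <- tau * critic + (1 - tau) * target.
    `tau` is a hypothetical mixing rate in (0, 1].
    """
    target_params = np.asarray(target_params, dtype=float)
    critic_params = np.asarray(critic_params, dtype=float)
    return tau * critic_params + (1.0 - tau) * target_params


def constrained_greedy_action(a, b, u_min, u_max):
    """Maximize Q(u) = a*u**2 + b*u over the interval [u_min, u_max].

    Simplified stand-in for the action solver (not SOS programming):
    for a concave scalar quadratic (a < 0), the constrained maximizer
    is the unconstrained maximizer clipped to the interval.
    """
    if a >= 0:
        raise ValueError("this sketch assumes a concave quadratic (a < 0)")
    u_free = -b / (2.0 * a)  # stationary point of the quadratic
    return float(np.clip(u_free, u_min, u_max))


if __name__ == "__main__":
    target = soft_update([0.0, 2.0], [1.0, 0.0], tau=0.1)
    print(target)
    print(constrained_greedy_action(a=-1.0, b=4.0, u_min=-1.0, u_max=1.0))
```

The point of the second function is that the input constraint is enforced when the action is selected, rather than through an extra penalty term in the reward. A general polynomial critic with vector inputs would need an actual constrained solver in place of the clipping step.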


Journal Details

This article was published in the following journal.

Name: IEEE Transactions on Cybernetics
ISSN: 2168-2275


PubMed Articles [20119 Associated PubMed Articles listed on BioPortfolio]

Adaptive Critic Learning for Constrained Optimal Event-Triggered Control With Discounted Cost.

This article studies an optimal event-triggered control (ETC) problem of nonlinear continuous-time systems subject to asymmetric control constraints. The present nonlinear plant differs from many stud...

Reinforcement Learning-Based Nearly Optimal Control for Constrained-Input Partially Unknown Systems Using Differentiator.

In this article, a synchronous reinforcement-learning-based algorithm is developed for input-constrained partially unknown systems. The proposed control also alleviates the need for an initial stabili...

Partial Policy-based Reinforcement Learning for Anatomical Landmark Localization in 3D Medical Images.

Utilizing the idea of long-term cumulative return, reinforcement learning (RL) has shown remarkable performance in various fields. We follow the formulation of landmark localization in 3D medical imag...

Decentralized Event-Triggered Adaptive Control of Discrete-Time Nonzero-Sum Games Over Wireless Sensor-Actuator Networks With Input Constraints.

This article studies an event-triggered communication and adaptive dynamic programming (ADP) co-design control method for the multiplayer nonzero-sum (NZS) games of a class of nonlinear discrete-time ...

Multi-agent reinforcement learning with approximate model learning for competitive games.

We propose a method for learning multi-agent policies to compete against multiple opponents. The method consists of recurrent neural network-based actor-critic networks and deterministic policy gradie...

Clinical Trials [6488 Associated Clinical Trials listed on BioPortfolio]

Reinforcement Learning for Warfarin Dosing

This is a clinical study designed to test the hypothesis that a computer model for dosing warfarin is superior to current clinical practice. Subjects will be randomized to two groups based...

Effects of Oxytocin on Reinforcement Learning

The main aim of the study is to investigate whether intranasal oxytocin (24IU) influences reward sensitivity and performance monitoring during reinforcement learning.

Set Your Goal: Engaging Go/No-Go Active Learning

This study will test a computational model of reinforcement learning in depression and anxiety and test the extent to which the computational model predicts response to an adapted version of ...

The Role of Learning in Nocebo Hyperalgesia

Nocebo effects are adverse effects induced by patients' expectations. Nocebo effects on pain may underlie several clinical conditions, such as chronic pain. These effects can be learned vi...

A Prospective, Open, Non-randomized Phase I/II Study of Therapeutic Angiogenesis in Diabetic Patients With Critic Ischemia of Lower Limbs While Administering Positive CD133 Mobilized With G-CSF

The primary objective is to analyze the safety and efficacy of CD133+ cells, obtained from peripheral blood in the treatment of diabetic patients with critic ischemia in lower limbs. The ...

Medical and Biotech [MESH] Definitions

Learning the correct route through a maze to obtain reinforcement. It is used for human or animal populations. (Thesaurus of Psychological Index Terms, 6th ed)

Use of word stimulus to strengthen a response during learning.

Process in which individuals take the initiative, in diagnosing their learning needs, formulating learning goals, identifying resources for learning, choosing and implementing learning strategies and evaluating learning outcomes (Knowles, 1975)

The course of learning of an individual or a group. It is a measure of performance plotted over time.

Learning situations in which the sequence responses of the subject are instrumental in producing reinforcement. When the correct response occurs, which involves the selection from among a repertoire of responses, the subject is immediately reinforced.
