Utilizing the idea of long-term cumulative return, reinforcement learning (RL) has shown remarkable performance in various fields. We follow the formulation of landmark localization in 3D medical images as an RL problem. Whereas value-based methods have been widely used to solve RL-based localization problems, we adopt an actor-critic based direct policy search method framed in a temporal difference learning approach. In RL problems with large state and/or action spaces, learning the optimal behavior is challenging and requires many trials. To improve learning, we introduce partial policy-based reinforcement learning, which makes the large localization problem tractable by learning optimal policies on smaller partial domains. Independent actors efficiently learn the corresponding partial policies, each utilizing its own independent critic. The proposed policy reconstruction from the partial policies ensures robust and efficient localization: the sub-agents contribute uniformly to the state transitions through their simple partial policies, each mapping to binary actions. Experiments on three localization problems in 3D CT and MR images showed that the proposed method requires significantly fewer trials to learn the optimal behavior than the original behavior-learning scheme in RL, and it maintains satisfactory performance when trained on fewer images.
This article was published in the following journal.
Name: IEEE transactions on medical imaging
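The partial-policy decomposition described in the abstract can be sketched in code. The following is an illustrative reading of the idea, not the authors' implementation: each coordinate axis gets an independent sub-agent whose partial policy maps the current state to a binary action in {-1, +1}, and the full state transition is reconstructed by composing the per-axis moves. The names `SubAgent`, `reconstruct_action`, and `localize`, and the hand-crafted stand-in policies, are assumptions made for this sketch.

```python
# Illustrative sketch only (not the authors' code): per-axis sub-agents
# with binary partial policies, composed into one 3D state transition.
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = Tuple[int, int, int]

@dataclass
class SubAgent:
    axis: int                        # which coordinate this agent controls
    policy: Callable[[State], int]   # partial policy: state -> {-1, +1}

def reconstruct_action(agents: List[SubAgent], state: State) -> Tuple[int, ...]:
    """Compose the full 3D action from the binary partial actions."""
    move = [0, 0, 0]
    for agent in agents:
        move[agent.axis] = agent.policy(state)
    return tuple(move)

def localize(start: State, agents: List[SubAgent], steps: int) -> State:
    """Run the composed policy for a fixed number of voxel steps."""
    state = start
    for _ in range(steps):
        dx, dy, dz = reconstruct_action(agents, state)
        state = (state[0] + dx, state[1] + dy, state[2] + dz)
    return state

# Hand-crafted stand-in for a learned partial policy: step toward a
# known landmark along one axis. In the paper, each of these would be
# learned by an independent actor with its own critic.
target = (5, 3, 7)
def toward(axis: int) -> Callable[[State], int]:
    return lambda s: 1 if s[axis] < target[axis] else -1

agents = [SubAgent(a, toward(a)) for a in range(3)]
end = localize((0, 0, 0), agents, steps=20)
# With binary-only actions, the agent oscillates within one voxel of
# the landmark on each axis once it arrives there.
```

Restricting each partial policy to two actions keeps every sub-agent's action space minimal, which is consistent with the abstract's claim that the decomposition reduces the number of trials needed to learn the optimal behavior.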
In recent years, the deep reinforcement learning (DRL) algorithms have been developed rapidly and have achieved excellent performance in many challenging tasks. However, due to the complexity of netwo...
In many medical image analysis applications, only a limited amount of training data is available due to the costs of image acquisition and the large manual annotation effort required from experts. Tra...
This paper investigates the automatic exploration problem under the unknown environment, which is the key point of applying the robotic system to some social tasks. The solution to this problem via st...
Minimally invasive alternatives are now available for many complex surgeries. These approaches are enabled by the increasing availability of intra-operative image guidance. Yet, fluoroscopic X-rays su...
It is commonly thought that visuomotor adaptation is mediated by the cerebellum while reinforcement learning is mediated by the basal ganglia. In contrast to this strict dichotomy, we demonstrate a ro...
This study will test a computational model of reinforcement learning in depression and anxiety and test the extent to which the computational model predicts response to an adapted version of ...
This is a clinical study designed to test the hypothesis that a computer model for dosing warfarin is superior to current clinical practice. Subjects will be randomized to two groups based...
The main aim of the study is to investigate whether intranasal oxytocin (24IU) influences reward sensitivity and performance monitoring during reinforcement learning.
Nocebo effects are adverse effects induced by patients' expectations. Nocebo effects on pain may underlie several clinical conditions, such as chronic pain. These effects can be learned vi...
As most adolescents visit a healthcare provider once a year, health behavior change interventions linked to clinic-based health information technologies hold significant promise for improv...
Learning the correct route through a maze to obtain reinforcement. It is used for human or animal populations. (Thesaurus of Psychological Index Terms, 6th ed)
Use of word stimulus to strengthen a response during learning.
Process in which individuals take the initiative in diagnosing their learning needs, formulating learning goals, identifying resources for learning, choosing and implementing learning strategies, and evaluating learning outcomes (Knowles, 1975)
A MACHINE LEARNING paradigm used to make predictions about future instances based on a given set of unlabeled training (sample) data.
A MACHINE LEARNING paradigm used to make predictions about future instances based on a given set of labeled paired input-output training (sample) data.
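The two glossary entries above can be contrasted with a small sketch (hypothetical toy data, pure Python, not tied to the article): supervised learning fits a predictive rule from labeled input-output pairs, while unsupervised learning finds structure in unlabeled inputs alone.

```python
# Toy contrast of the two paradigms (hypothetical data for illustration).

# Supervised: labeled pairs (x, y); learn y ~ w * x by least squares,
# then predict outputs for future instances.
pairs = [(1, 2.1), (2, 3.9), (3, 6.0), (4, 8.1)]
w = sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)
predict = lambda x: w * x

# Unsupervised: unlabeled inputs only; split them into two groups
# around the midpoint of their range (a crude 1D clustering).
xs = [0.2, 0.3, 0.25, 9.8, 10.1, 9.9]
threshold = (min(xs) + max(xs)) / 2
clusters = [0 if x < threshold else 1 for x in xs]
```

The supervised model can score new inputs against known outputs; the unsupervised grouping has no outputs to check against, only the discovered structure.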