Selector-Actor-Critic and Tuner-Actor-Critic Algorithms for Reinforcement Learning

Date
2019-01-01
Authors
Masadeh, Ala'eddin
Wang, Zhengdao
Kamal, Ahmed
Organizational Unit
Electrical and Computer Engineering

The Department of Electrical and Computer Engineering (ECpE) has two focus areas. The Electrical Engineering focus covers control systems, electromagnetics and nondestructive evaluation, microelectronics, electric power and energy systems, and related fields. The Computer Engineering focus covers software systems, embedded systems, networking, information security, computer architecture, and related fields.

History
The Department of Electrical Engineering was formed in 1909 from the division of the Department of Physics and Electrical Engineering. In 1985 its name changed to the Department of Electrical Engineering and Computer Engineering, and in 1995 it became the Department of Electrical and Computer Engineering.

Dates of Existence
1909-present

Historical Names

  • Department of Electrical Engineering (1909-1985)
  • Department of Electrical Engineering and Computer Engineering (1985-1995)

Abstract

This work presents two reinforcement learning (RL) architectures that mimic the way rational humans analyze available information and make decisions. The proposed algorithms, called selector-actor-critic (SAC) and tuner-actor-critic (TAC), are obtained by modifying the well-known actor-critic (AC) algorithm. SAC is equipped with an actor, a critic, and a selector. The role of the selector is to determine the most promising action at the current state based on the critic's latest estimate. TAC is model based and consists of a tuner, a model-learner, an actor, and a critic. After receiving the approximated value of the current state-action pair from the critic and the learned model from the model-learner, the tuner uses the Bellman equation to tune the value of the current state-action pair. The actor then uses this tuned value to optimize the policy. We investigate the performance of the proposed algorithms and compare them with the AC algorithm using numerical simulations, which demonstrate the advantages of the proposed algorithms.
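
To make the selector and tuner roles concrete, the following is a minimal tabular sketch in Python. It assumes a discrete MDP in which a Q-table stands in for the critic's value estimate and arrays P and R stand in for the model-learner's estimated transition probabilities and rewards; the function names, shapes, and greedy selection rule are illustrative assumptions, not the paper's implementation.

import numpy as np

def selector(Q, s):
    # SAC selector: return the most promising action at state s,
    # judged by the critic's latest value estimate (greedy over Q[s]).
    # (Illustrative assumption; the paper may use a different rule.)
    return int(np.argmax(Q[s]))

def tuner(Q, P, R, s, a, gamma=0.9):
    # TAC tuner: refine the critic's estimate of the pair (s, a) with one
    # Bellman backup through the learned model:
    #   Q'(s, a) = R(s, a) + gamma * sum_{s'} P(s' | s, a) * max_{a'} Q(s', a')
    return R[s, a] + gamma * np.dot(P[s, a], np.max(Q, axis=1))

# Toy usage with random numbers, purely to show the shapes involved.
nS, nA = 5, 3
rng = np.random.default_rng(0)
Q = rng.random((nS, nA))               # critic's value table
R = rng.random((nS, nA))               # model-learner's reward estimates
P = rng.random((nS, nA, nS))
P /= P.sum(axis=2, keepdims=True)      # normalize into transition probabilities

a_star = selector(Q, s=0)                  # action the selector proposes at state 0
q_tuned = tuner(Q, P, R, s=0, a=a_star)    # tuned value the actor would use

In this sketch the tuned value q_tuned is what the actor would consume when updating its policy, in place of the critic's raw estimate.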

Comments

This is a manuscript of a proceeding published as Masadeh, Ala'eddin, Zhengdao Wang, and Ahmed E. Kamal. "Selector-Actor-Critic and Tuner-Actor-Critic Algorithms for Reinforcement Learning." In 2019 11th International Conference on Wireless Communications and Signal Processing (WCSP). DOI: 10.1109/WCSP.2019.8928124. Posted with permission.

DOI
10.1109/WCSP.2019.8928124
Copyright
2019