qualia2.rl.agents package
Submodules
qualia2.rl.agents.ddqn module
class qualia2.rl.agents.ddqn.DDQN(eps, actions)
Bases: qualia2.rl.rl_core.ValueAgent
DQN 2015 implementation.
This implementation uses two networks (an online network and a periodically synchronized target network) for learning. The DDQN class incorporates the model (Module) and the optim (Optimizer). The model learns with experience replay, which is implemented in the update() method.
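For illustration, a minimal NumPy sketch of the double-network (Double DQN style) target such an update() could compute; the function and array names here are hypothetical, not qualia2 API:

import numpy as np

def ddqn_target(q_online_next, q_target_next, reward, done, gamma=0.99):
    # Double DQN target (a sketch, not qualia2's actual update()):
    # the online network selects the next action, the target network scores it.
    a_star = np.argmax(q_online_next, axis=1)               # action selection
    q_eval = q_target_next[np.arange(len(a_star)), a_star]  # action evaluation
    return reward + gamma * (1.0 - done) * q_eval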
class qualia2.rl.agents.ddqn.DDQNTrainer(memory, batch, capacity, gamma=0.99, target_update_interval=3)
Bases: qualia2.rl.rl_util.Trainer
Args:
memory (deque): replay memory object
batch (int): batch size for training
capacity (int): capacity of the memory
gamma (float): discount factor
target_update_interval (int): interval for updating the target network
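From the documented signatures alone, construction might look like the following; the eps value, action count, and memory sizes are placeholder assumptions, and whether memory expects a deque instance, as well as the Trainer's training entry point, is not shown here:

from collections import deque
from qualia2.rl.agents.ddqn import DDQN, DDQNTrainer

agent = DDQN(eps=1.0, actions=4)                   # placeholder eps / action count
trainer = DDQNTrainer(memory=deque(maxlen=10000),  # replay memory object
                      batch=32, capacity=10000,
                      gamma=0.99, target_update_interval=3)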
qualia2.rl.agents.dqn module
class qualia2.rl.agents.dqn.DQN(eps, actions)
Bases: qualia2.rl.rl_core.ValueAgent
DQN 2013 implementation.
This implementation uses a single network for learning. The DQN class incorporates the model (Module) and the optim (Optimizer). The model learns with experience replay, which is implemented in the update() method.
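For comparison with the double-network variant above, a minimal NumPy sketch of the single-network (2013-style) target; the names are hypothetical, not qualia2 API:

import numpy as np

def dqn_target(q_next, reward, done, gamma=0.99):
    # 2013-style target: one network both selects and evaluates the next
    # action, so the max is taken over its own value estimates.
    return reward + gamma * (1.0 - done) * np.max(q_next, axis=1)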
class qualia2.rl.agents.dqn.DQNTrainer(memory, batch, capacity, gamma=0.99)
Bases: qualia2.rl.rl_util.Trainer
Args:
memory (deque): replay memory object
batch (int): batch size for training
capacity (int): capacity of the memory
gamma (float): discount factor
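Uniform experience replay with a bounded deque, as the memory argument suggests, can be sketched as follows; the transition layout is a placeholder assumption:

import random
from collections import deque

memory = deque(maxlen=10000)  # capacity bounds the replay memory

# store placeholder (state, action, reward, next_state, done) transitions
for t in range(1000):
    memory.append((t, 0, 0.0, t + 1, False))

minibatch = random.sample(memory, 32)  # uniform sample for one update step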
qualia2.rl.agents.td3 module
class qualia2.rl.agents.td3.TD3(actor, critic)
Bases: qualia2.rl.rl_core.ActorCriticAgent
Twin Delayed DDPG (TD3)
Args:
actor (Module): actor network
critic (Module): critic network
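TD3's twin critics feed a clipped double-Q target; a minimal NumPy sketch under assumed names, not qualia2's actual method:

import numpy as np

def td3_target(q1_next, q2_next, reward, done, gamma=0.99):
    # clipped double-Q: take the smaller of the twin critics' estimates
    # of the next state-action value to curb overestimation.
    return reward + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)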
class qualia2.rl.agents.td3.TD3Trainer(memory, batch, capacity, gamma=0.99, polyak=0.995, policy_delay=2, exploration_noise=0.1, policy_noise=0.2, noise_clip=0.5)
Bases: qualia2.rl.rl_util.Trainer
TD3 Trainer
Args:
memory (deque): replay memory object
batch (int): batch size for training
capacity (int): capacity of the memory
gamma (float): discount factor
polyak (float): Polyak averaging coefficient for target network updates
policy_delay (int): interval (in critic updates) between delayed policy and target network updates
exploration_noise (float): std of the Gaussian noise added to actions during exploration
policy_noise (float): std of the smoothing noise added to target actions
noise_clip (float): clipping bound for the target policy smoothing noise
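The polyak and policy_delay arguments correspond to TD3's soft target updates and delayed policy updates; a hedged sketch with hypothetical names, not qualia2's actual Trainer logic:

def polyak_update(target_params, online_params, polyak=0.995):
    # soft target update: target <- polyak * target + (1 - polyak) * online
    return [polyak * t + (1.0 - polyak) * o
            for t, o in zip(target_params, online_params)]

# Delayed updates: the actor and the target networks move only once per
# policy_delay critic updates, e.g.
#     if step % policy_delay == 0:
#         update_actor(); targets = polyak_update(targets, online)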