qualia2.rl.agents package
Submodules
qualia2.rl.agents.ddqn module
class qualia2.rl.agents.ddqn.DDQN(eps, actions)

    Bases: qualia2.rl.rl_core.ValueAgent

    DQN 2015 implementation. This implementation uses double networks for learning. The class incorporates the model (Module) and the optim (Optimizer). The model learns with experience replay, which is implemented in the update() method.
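A minimal construction sketch based only on the signature above. Treating eps as an epsilon-greedy exploration rate and actions as the size of a discrete action space is an assumption from the parameter names, not something these docs confirm:

    from qualia2.rl.agents.ddqn import DDQN

    # Assumed meanings: eps = epsilon-greedy rate, actions = number of
    # discrete actions. Both are guesses from the signature, not documented.
    agent = DDQN(eps=0.1, actions=4)
    # Per the docstring, experience-replay learning happens in agent.update().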
class qualia2.rl.agents.ddqn.DDQNTrainer(memory, batch, capacity, gamma=0.99, target_update_interval=3)

    Bases: qualia2.rl.rl_util.Trainer

    Args:
        memory (deque): replay memory object
        batch (int): batch size for training
        capacity (int): capacity of the memory
        gamma (float): discount factor
        target_update_interval (int): interval for updating the target network
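Since the Args describe memory as a deque, constructing the trainer can be sketched directly from the documented signature; how the trainer is driven after construction is left out, because the Trainer API is not shown here:

    from collections import deque
    from qualia2.rl.agents.ddqn import DDQNTrainer

    capacity = 10000
    memory = deque(maxlen=capacity)   # replay memory object, per the Args above
    trainer = DDQNTrainer(memory, batch=32, capacity=capacity,
                          gamma=0.99, target_update_interval=3)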
qualia2.rl.agents.dqn module
class qualia2.rl.agents.dqn.DQN(eps, actions)

    Bases: qualia2.rl.rl_core.ValueAgent

    DQN 2013 implementation. This implementation uses a single network for learning. The DQN class incorporates the model (Module) and the optim (Optimizer). The model learns with experience replay, which is implemented in the update() method.
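To make the single-network vs. double-network distinction concrete, here is the standard Q-learning target math in NumPy. This is textbook DQN background, not qualia2's internal code, and the exact variant that update() implements is not specified by these docs:

    import numpy as np

    def single_network_target(q_next, reward, done, gamma=0.99):
        # Single-network DQN: bootstrap from the same network being trained
        return reward + gamma * (1.0 - done) * q_next.max(axis=1)

    def double_network_target(q_online_next, q_target_next, reward, done, gamma=0.99):
        # Double-network variant: the online net selects the action,
        # the target net evaluates it, which reduces overestimation.
        act = q_online_next.argmax(axis=1)
        return reward + gamma * (1.0 - done) * q_target_next[np.arange(len(act)), act]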
class qualia2.rl.agents.dqn.DQNTrainer(memory, batch, capacity, gamma=0.99)

    Bases: qualia2.rl.rl_util.Trainer

    Args:
        memory (deque): replay memory object
        batch (int): batch size for training
        capacity (int): capacity of the memory
        gamma (float): discount factor
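Experience replay itself, as referenced in the update() docstrings, is a standard technique and can be sketched with the same deque type the Args call for. The helper names below are illustrative, not qualia2 API:

    import random
    from collections import deque

    capacity, batch = 10000, 32
    memory = deque(maxlen=capacity)   # oldest transitions drop off automatically

    def store(state, action, reward, next_state, done):
        memory.append((state, action, reward, next_state, done))

    def sample_batch():
        # Uniform random minibatch, decorrelating consecutive transitions
        transitions = random.sample(memory, batch)
        return list(zip(*transitions))   # columns: states, actions, rewards, ...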
qualia2.rl.agents.td3 module
class qualia2.rl.agents.td3.TD3(actor, critic)

    Bases: qualia2.rl.rl_core.ActorCriticAgent

    Twin Delayed DDPG (TD3)

    Args:
        actor (Module): actor network
        critic (Module): critic network
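The "twin" and "delayed" parts of TD3 are standard and worth spelling out: two critics are trained and the smaller of their target values is used for bootstrapping, and the target action is smoothed with clipped Gaussian noise. A NumPy sketch of that target, using the hyperparameter names from the trainer below (background math, not qualia2 internals):

    import numpy as np

    def smoothed_target_action(target_action, policy_noise=0.2, noise_clip=0.5,
                               low=-1.0, high=1.0):
        # Target policy smoothing: clipped Gaussian noise on the target action
        noise = np.clip(np.random.normal(0.0, policy_noise, target_action.shape),
                        -noise_clip, noise_clip)
        return np.clip(target_action + noise, low, high)

    def td3_target(q1_next, q2_next, reward, done, gamma=0.99):
        # Twin critics: bootstrap from the minimum to curb overestimation
        return reward + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)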
class qualia2.rl.agents.td3.TD3Trainer(memory, batch, capacity, gamma=0.99, polyak=0.995, policy_delay=2, exploration_noise=0.1, policy_noise=0.2, noise_clip=0.5)

    Bases: qualia2.rl.rl_util.Trainer

    TD3 Trainer

    Args:
        memory (deque): replay memory object
        batch (int): batch size for training
        capacity (int): capacity of the memory
        gamma (float): discount factor
        polyak (float): coefficient for polyak (soft) averaging of the target networks
        policy_delay (int): number of critic updates between each actor and target-network update
        exploration_noise (float): std of the Gaussian noise added to actions during exploration
        policy_noise (float): std of the smoothing noise added to the target policy's action
        noise_clip (float): clipping range for the target policy smoothing noise
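The polyak argument corresponds to the standard soft target-network update used by TD3. A minimal sketch, with parameters represented as plain dicts of arrays purely for illustration:

    def polyak_update(target_params, params, polyak=0.995):
        # Each target weight keeps `polyak` of itself and absorbs the rest
        # from the online network: theta_tgt <- p * theta_tgt + (1 - p) * theta
        for name in target_params:
            target_params[name] = (polyak * target_params[name]
                                   + (1.0 - polyak) * params[name])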