qualia2.rl.envs package
Submodules
qualia2.rl.envs.atari module

class qualia2.rl.envs.atari.BreakOut(width=84, height=84)
    Bases: qualia2.rl.envs.atari.AtariBase

    Maximize your score in the Atari 2600 game Breakout.

    Observation:
        Gym default: Box(210, 160, 3) RGB image
        Transformed: (1, 84, 84) BW image (see the sketch below)

    Actions:
        Discrete(4)

        Num  Action
        0    no operation
        1    fire
        2    move right
        3    move left

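The transformed observation above implies a grayscale-and-resize step from the raw 210x160x3 frame down to (1, 84, 84). A minimal sketch of such a transform, assuming a plain luminance conversion and resize; the actual AtariBase preprocessing may differ in details such as cropping, frame skipping, or scaling:

    # Sketch only: one plausible RGB (210, 160, 3) -> BW (1, 84, 84) transform.
    # The exact transform used by qualia2's AtariBase is an assumption here.
    import numpy as np
    from PIL import Image

    def rgb_to_bw(frame, width=84, height=84):
        gray = np.dot(frame[..., :3], [0.299, 0.587, 0.114])       # luminance grayscale
        img = Image.fromarray(gray.astype(np.uint8)).resize((width, height))
        return np.asarray(img, dtype=np.float32)[None, ...] / 255  # shape (1, 84, 84)
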
class qualia2.rl.envs.atari.BreakOutRAM
    Bases: qualia2.rl.envs.atari.AtariBase

    Maximize your score in the Atari 2600 game Breakout.

    Observation:
        Box(128,) the RAM of the Atari machine

    Actions:
        Discrete(4)

        Num  Action
        0    no operation
        1    fire
        2    move right
        3    move left

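For reference, a minimal random-agent loop on the underlying RAM task, assuming the classic gym API and that the installed gym/atari build registers 'Breakout-ram-v0'; the qualia2 wrapper is expected to expose the same Box(128,) data:

    import gym

    env = gym.make('Breakout-ram-v0')
    obs = env.reset()                     # Box(128,): the raw Atari RAM bytes
    done, score = False, 0.0
    while not done:
        action = env.action_space.sample()          # one of the 4 discrete actions
        obs, reward, done, info = env.step(action)
        score += reward
    env.close()
    print('episode score:', score)
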
class qualia2.rl.envs.atari.Pong(width=84, height=84)
    Bases: qualia2.rl.envs.atari.AtariBase

    Maximize your score in the Atari 2600 game Pong.

    Observation:
        Gym default: Box(210, 160, 3) RGB image
        Transformed: (1, 84, 84) BW image

    Actions:
        Discrete(6)

        Num  Action
        0    no operation
        1    fire
        2    move right
        3    move left
        4    RIGHTFIRE
        5    LEFTFIRE

class qualia2.rl.envs.atari.PongRAM
    Bases: qualia2.rl.envs.atari.AtariBase

    Maximize your score in the Atari 2600 game Pong.

    Observation:
        Box(128,) the RAM of the Atari machine

    Actions:
        Discrete(6)

        Num  Action
        0    no operation
        1    fire
        2    move right
        3    move left
        4    RIGHTFIRE
        5    LEFTFIRE

qualia2.rl.envs.box2d module

class qualia2.rl.envs.box2d.BipedalWalker
    Bases: qualia2.rl.core.Env

    Get a 2D biped walker to walk through rough terrain.

    Observation:
        Type: Box(24)

        Num    Observation                Min   Max   Mean
        0      hull_angle                 0     2*pi  0.5
        1      hull_angularVelocity       -inf  +inf  -
        2      vel_x                      -1    +1    -
        3      vel_y                      -1    +1    -
        4      hip_joint_1_angle          -inf  +inf  -
        5      hip_joint_1_speed          -inf  +inf  -
        6      knee_joint_1_angle         -inf  +inf  -
        7      knee_joint_1_speed         -inf  +inf  -
        8      leg_1_ground_contact_flag  0     1     -
        9      hip_joint_2_angle          -inf  +inf  -
        10     hip_joint_2_speed          -inf  +inf  -
        11     knee_joint_2_angle         -inf  +inf  -
        12     knee_joint_2_speed         -inf  +inf  -
        13     leg_2_ground_contact_flag  0     1     -
        14-23  10 lidar readings          -inf  +inf  -

    Actions:
        Type: Box(4) - torque control (default)

        Num  Name                        Min  Max
        0    Hip_1 (Torque / Velocity)   -1   +1
        1    Knee_1 (Torque / Velocity)  -1   +1
        2    Hip_2 (Torque / Velocity)   -1   +1
        3    Knee_2 (Torque / Velocity)  -1   +1

    Rewards:
        Reward is given for moving forward, for a total of 300+ points up to the far end. If the robot falls, it gets -100. Applying motor torque costs a small amount of points; a more optimal agent will get a better score.

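A minimal interaction sketch with the underlying gym task, assuming the classic gym API and the 'BipedalWalker-v3' id (older gym releases register 'BipedalWalker-v2'); the qualia2 wrapper is expected to expose the same Box(24) observation and Box(4) action spaces:

    import gym
    import numpy as np

    env = gym.make('BipedalWalker-v3')
    obs = env.reset()                        # Box(24) observation vector
    action = np.zeros(4, dtype=np.float32)   # Box(4) torques, each in [-1, +1]
    obs, reward, done, info = env.step(action)
    env.close()
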
class qualia2.rl.envs.box2d.BipedalWalkerHardcore
    Bases: qualia2.rl.core.Env

    Get a 2D biped walker to walk through rough terrain.

    Observation:
        Type: Box(24)

        Num    Observation                Min   Max   Mean
        0      hull_angle                 0     2*pi  0.5
        1      hull_angularVelocity       -inf  +inf  -
        2      vel_x                      -1    +1    -
        3      vel_y                      -1    +1    -
        4      hip_joint_1_angle          -inf  +inf  -
        5      hip_joint_1_speed          -inf  +inf  -
        6      knee_joint_1_angle         -inf  +inf  -
        7      knee_joint_1_speed         -inf  +inf  -
        8      leg_1_ground_contact_flag  0     1     -
        9      hip_joint_2_angle          -inf  +inf  -
        10     hip_joint_2_speed          -inf  +inf  -
        11     knee_joint_2_angle         -inf  +inf  -
        12     knee_joint_2_speed         -inf  +inf  -
        13     leg_2_ground_contact_flag  0     1     -
        14-23  10 lidar readings          -inf  +inf  -

    Actions:
        Type: Box(4) - torque control (default)

        Num  Name                        Min  Max
        0    Hip_1 (Torque / Velocity)   -1   +1
        1    Knee_1 (Torque / Velocity)  -1   +1
        2    Hip_2 (Torque / Velocity)   -1   +1
        3    Knee_2 (Torque / Velocity)  -1   +1

    Rewards:
        Reward is given for moving forward, for a total of 300+ points up to the far end. If the robot falls, it gets -100. Applying motor torque costs a small amount of points; a more optimal agent will get a better score.

class qualia2.rl.envs.box2d.CarRacing
    Bases: qualia2.rl.core.Env

    Observation:
        Type: Box(96, 96, 3)

qualia2.rl.envs.classic_control module

class qualia2.rl.envs.classic_control.Acrobot
    Bases: qualia2.rl.core.Env

    The acrobot system includes two joints and two links, where the joint between the two links is actuated.

class qualia2.rl.envs.classic_control.CartPole
    Bases: qualia2.rl.core.Env

    A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The pendulum starts upright, and the goal is to prevent it from falling over by increasing and reducing the cart's velocity.

    Observation:
        Type: Box(4)

        Num  Observation           Min      Max
        0    Cart Position         -4.8     4.8
        1    Cart Velocity         -Inf     Inf
        2    Pole Angle            -24 deg  24 deg
        3    Pole Velocity At Tip  -Inf     Inf

    Actions:
        Type: Discrete(2)

        Num  Action
        0    Push cart to the left
        1    Push cart to the right

    Reward:
        0 for each step; -1 if the termination condition is met before max_steps-5; 1 if it is met after max_steps-5. (Note: the original reward of the gym environment is not used; see the sketch below.)

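A sketch of the shaped reward described above, assuming max_steps is the episode-length cap used by the training loop (the names here are illustrative, not the library's actual helpers):

    def shaped_reward(done, step, max_steps):
        # 0 while the episode is still running; -1 for an early failure, +1 otherwise
        if not done:
            return 0
        return -1 if step < max_steps - 5 else 1
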
class qualia2.rl.envs.classic_control.MountainCar
    Bases: qualia2.rl.core.Env

    Get an underpowered car to the top of a hill (top = 0.5 position).

    Observation:
        Type: Box(2)

        Num  Observation  Min    Max
        0    position     -1.2   0.6
        1    velocity     -0.07  0.07

    Actions:
        Type: Discrete(3)

        Num  Action
        0    push left
        1    no push
        2    push right

    Reward:
        -1 for each step

class qualia2.rl.envs.classic_control.MountainCarContinuous
    Bases: qualia2.rl.core.Env

    Get an underpowered car to the top of a hill (top = 0.5 position).

    Observation:
        Type: Box(2)

        Num  Observation  Min    Max
        0    position     -1.2   0.6
        1    velocity     -0.07  0.07

    Actions:
        Type: Box(1)

        Num  Action
        0    Push car to the left (negative value) or to the right (positive value)

    Reward:
        Reward is 100 for reaching the target on the hill on the right-hand side, minus the squared sum of actions from start to goal (see the sketch below).

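Unlike the discrete MountainCar, whose return is simply -1 per step, the continuous version trades the goal bonus against control effort. A sketch of the stated return, assuming the penalty is the plain sum of squared actions (the per-step scaling in the underlying gym implementation may differ):

    def episode_return(actions, reached_goal):
        effort = sum(a ** 2 for a in actions)            # squared sum of actions
        return (100.0 if reached_goal else 0.0) - effort
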
class qualia2.rl.envs.classic_control.Pendulum
    Bases: qualia2.rl.core.Env

    Try to keep a frictionless pendulum standing up.

    Observation:
        Type: Box(3)

        Num  Observation  Min   Max
        0    cos(theta)   -1.0  1.0
        1    sin(theta)   -1.0  1.0
        2    theta dot    -8.0  8.0

    Actions:
        Type: Box(1)

        Num  Action        Min   Max
        0    Joint effort  -2.0  2.0

    Reward:
        The precise equation for the reward is -(theta^2 + 0.1*theta_dt^2 + 0.001*action^2), where theta is normalized to [-pi, pi]. Therefore the lowest reward is -(pi^2 + 0.1*8^2 + 0.001*2^2) = -16.2736044 and the highest reward is 0. In essence, the goal is to remain at zero angle (vertical), with the least rotational velocity and the least effort (see the sketch below).

    Starting State:
        Random angle from -pi to pi, and random velocity between -1 and 1.

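A worked sketch of the reward equation above, with theta wrapped to [-pi, pi] before squaring:

    import numpy as np

    def pendulum_reward(theta, theta_dot, action):
        theta = ((theta + np.pi) % (2 * np.pi)) - np.pi   # normalize to [-pi, pi]
        return -(theta ** 2 + 0.1 * theta_dot ** 2 + 0.001 * action ** 2)

    # worst case: hanging straight down at maximum speed and maximum torque
    print(pendulum_reward(np.pi, 8.0, 2.0))               # ~ -16.2736
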
qualia2.rl.envs.roboschool module

class qualia2.rl.envs.roboschool.RoboschoolAnt
    Bases: qualia2.rl.envs.roboschool.RoboSchoolBase

    Observation:
        Type: Box(28,)

    Actions:
        Type: Box(8,)

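These wrappers sit on top of the (now archived) roboschool package. A minimal interaction sketch, assuming roboschool is installed and registers 'RoboschoolAnt-v1' with gym, and assuming the classic gym API:

    import gym
    import roboschool  # noqa: F401 -- importing registers the Roboschool environments

    env = gym.make('RoboschoolAnt-v1')
    obs = env.reset()                                               # Box(28,) observation
    obs, reward, done, info = env.step(env.action_space.sample())   # Box(8,) action
    env.close()
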
class qualia2.rl.envs.roboschool.RoboschoolHalfCheetah
    Bases: qualia2.rl.envs.roboschool.RoboSchoolBase

    Observation:
        Type: Box(26,)

    Actions:
        Type: Box(6,)

class qualia2.rl.envs.roboschool.RoboschoolHopper
    Bases: qualia2.rl.envs.roboschool.RoboSchoolBase

    Observation:
        Type: Box(15,)

    Actions:
        Type: Box(3,)

class qualia2.rl.envs.roboschool.RoboschoolHumanoid
    Bases: qualia2.rl.envs.roboschool.RoboSchoolBase

    Observation:
        Type: Box(44,)

    Actions:
        Type: Box(17,)

class qualia2.rl.envs.roboschool.RoboschoolWalker2d
    Bases: qualia2.rl.envs.roboschool.RoboSchoolBase

    Observation:
        Type: Box(22,)

    Actions:
        Type: Box(6,)

qualia2.rl.envs.toy_text module

class qualia2.rl.envs.toy_text.FrozenLake
    Bases: qualia2.rl.core.Env

    The agent controls the movement of a character in a grid world. Some tiles of the grid are walkable, and others lead to the agent falling into the water. Additionally, the movement direction of the agent is uncertain and only partially depends on the chosen direction. The agent is rewarded for finding a walkable path to a goal tile.

    SFFF    (S: starting point, safe)
    FHFH    (F: frozen surface, safe)
    FFFH    (H: hole, fall to your doom)
    HFFG    (G: goal, where the frisbee is located)

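A minimal random-walk sketch on the underlying gym task, assuming the classic gym API and the 'FrozenLake-v0' id (newer gym releases register 'FrozenLake-v1'); note that gym reports the state as a single tile index rather than a grid:

    import gym

    env = gym.make('FrozenLake-v0')   # slippery 4x4 map of S/F/H/G tiles
    state = env.reset()               # integer tile index in [0, 15]
    done = False
    while not done:
        state, reward, done, info = env.step(env.action_space.sample())
    env.close()
    print('reached goal' if reward == 1.0 else 'fell in a hole or timed out')
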
class qualia2.rl.envs.toy_text.FrozenLake8x8
    Bases: qualia2.rl.core.Env

    The agent controls the movement of a character in a grid world. Some tiles of the grid are walkable, and others lead to the agent falling into the water. Additionally, the movement direction of the agent is uncertain and only partially depends on the chosen direction. The agent is rewarded for finding a walkable path to a goal tile.

    SFFF    (S: starting point, safe)
    FHFH    (F: frozen surface, safe)
    FFFH    (H: hole, fall to your doom)
    HFFG    (G: goal, where the frisbee is located)