qualia2.rl.envs package

Submodules

qualia2.rl.envs.atari module

class qualia2.rl.envs.atari.AtariBase(env)

Bases: qualia2.rl.core.Env

property actions
static normalize(image)
static resize(image, width, height)
static state_to_image(state)
static to_gray(image)
class qualia2.rl.envs.atari.BreakOut(width=84, height=84)

Bases: qualia2.rl.envs.atari.AtariBase

Maximize your score in the Atari 2600 game Breakout.

Observation:

Gym default:

Type: Box(210, 160, 3) RGB image

Transformed:

(1, 84, 84) BW image
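The transformed observation above can be produced by a grayscale, resize, and normalize pipeline. The following is a minimal NumPy-only sketch, assuming BT.601 luma weights (0.299, 0.587, 0.114), nearest-neighbour resizing, and scaling to [0, 1]. The function names echo the AtariBase static methods, but this is not their actual source, and `preprocess` is a hypothetical helper; the real implementation may use a different library or interpolation.

```python
import numpy as np


def to_gray(image: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) RGB frame to an (H, W) grayscale frame (BT.601 luma weights)."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return image.astype(np.float32) @ weights  # contracts the last (channel) axis


def resize(image: np.ndarray, width: int, height: int) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W) array to (height, width)."""
    rows = np.arange(height) * image.shape[0] // height
    cols = np.arange(width) * image.shape[1] // width
    return image[rows[:, None], cols[None, :]]


def normalize(image: np.ndarray) -> np.ndarray:
    """Scale pixel intensities from [0, 255] to [0, 1]."""
    return image / 255.0


def preprocess(frame: np.ndarray, width: int = 84, height: int = 84) -> np.ndarray:
    """Hypothetical helper: (210, 160, 3) uint8 RGB frame -> (1, height, width) float32 in [0, 1]."""
    gray = to_gray(frame)
    small = resize(gray, width, height)
    return normalize(small)[np.newaxis, ...].astype(np.float32)
```

For example, `preprocess(frame)` on a raw (210, 160, 3) Gym frame returns an array of shape (1, 84, 84), matching the transformed observation; the leading axis is the single channel.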

Actions:

Discrete(4)

Num    Action
0      no operation
1      fire
2      move right
3      move left

state_transformer(state)
class qualia2.rl.envs.atari.BreakOutRAM

Bases: qualia2.rl.envs.atari.AtariBase

Maximize your score in the Atari 2600 game Breakout.

Observation:

Box(128,): the RAM of the Atari machine

Actions:

Discrete(4)

Num    Action
0      no operation
1      fire
2      move right
3      move left

class qualia2.rl.envs.atari.Pong(width=84, height=84)

Bases: qualia2.rl.envs.atari.AtariBase

Maximize your score in the Atari 2600 game Pong.

Observation:

Gym default:

Type: Box(210, 160, 3) RGB image

Transformed:

(1, 84, 84) BW image

Actions:

Discrete(6)

Num    Action
0      no operation
1      fire
2      move right
3      move left
4      RIGHTFIRE
5      LEFTFIRE

state_transformer(state)
class qualia2.rl.envs.atari.PongRAM

Bases: qualia2.rl.envs.atari.AtariBase

Pong

Maximize your score in the Atari 2600 game Pong.

Observation:

Box(128,): the RAM of the Atari machine

Actions:

Discrete(6)

Num    Action
0      no operation
1      fire
2      move right
3      move left
4      RIGHTFIRE
5      LEFTFIRE

qualia2.rl.envs.box2d module

class qualia2.rl.envs.box2d.BipedalWalker

Bases: qualia2.rl.core.Env

Get a 2D biped walker to walk through rough terrain.

Observation:

Type: Box(24)

Num    Observation                  Min    Max    Mean
0      hull_angle                   0      2*pi   0.5
1      hull_angularVelocity         -inf   +inf   -
2      vel_x                        -1     +1     -
3      vel_y                        -1     +1     -
4      hip_joint_1_angle            -inf   +inf   -
5      hip_joint_1_speed            -inf   +inf   -
6      knee_joint_1_angle           -inf   +inf   -
7      knee_joint_1_speed           -inf   +inf   -
8      leg_1_ground_contact_flag    0      1      -
9      hip_joint_2_angle            -inf   +inf   -
10     hip_joint_2_speed            -inf   +inf   -
11     knee_joint_2_angle           -inf   +inf   -
12     knee_joint_2_speed           -inf   +inf   -
13     leg_2_ground_contact_flag    0      1      -
14-23  10 lidar readings            -inf   +inf   -
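Because the 24 observation values arrive as one flat vector, it can be convenient to slice them into the groups listed in the observation table. This is a sketch assuming exactly that index layout; `split_observation` and the slice constants are hypothetical helpers, not part of qualia2.

```python
import numpy as np

# Index layout taken from the observation table (24 values in total).
HULL = slice(0, 4)     # hull_angle, hull_angularVelocity, vel_x, vel_y
LEG_1 = slice(4, 9)    # hip/knee joint 1 angle and speed, leg_1_ground_contact_flag
LEG_2 = slice(9, 14)   # hip/knee joint 2 angle and speed, leg_2_ground_contact_flag
LIDAR = slice(14, 24)  # 10 lidar readings


def split_observation(obs):
    """Hypothetical helper: split a BipedalWalker observation into named parts."""
    obs = np.asarray(obs, dtype=np.float32)
    if obs.shape != (24,):
        raise ValueError(f"expected shape (24,), got {obs.shape}")
    return {
        "hull": obs[HULL],
        "leg_1": obs[LEG_1],
        "leg_2": obs[LEG_2],
        "lidar": obs[LIDAR],
        "contacts": obs[[8, 13]].astype(bool),  # the two ground-contact flags
    }
```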

Actions:

Type: Box(4), torque control (default)

Num    Name                          Min    Max
0      Hip_1 (Torque / Velocity)     -1     +1
1      Knee_1 (Torque / Velocity)    -1     +1
2      Hip_2 (Torque / Velocity)     -1     +1
3      Knee_2 (Torque / Velocity)    -1     +1

Rewards:

Reward is given for moving forward, totaling 300+ points up to the far end. If the robot falls, it gets -100. Applying motor torque costs a small amount of points, so a more optimal agent will get a better score.

Reference:

https://github.com/openai/gym/wiki/BipedalWalker-v2

class qualia2.rl.envs.box2d.BipedalWalkerHardcore

Bases: qualia2.rl.core.Env

BipedalWalkerHardcore

Get a 2D biped walker to walk through rough terrain.

Observation:

Type: Box(24)

Num    Observation                  Min    Max    Mean
0      hull_angle                   0      2*pi   0.5
1      hull_angularVelocity         -inf   +inf   -
2      vel_x                        -1     +1     -
3      vel_y                        -1     +1     -
4      hip_joint_1_angle            -inf   +inf   -
5      hip_joint_1_speed            -inf   +inf   -
6      knee_joint_1_angle           -inf   +inf   -
7      knee_joint_1_speed           -inf   +inf   -
8      leg_1_ground_contact_flag    0      1      -
9      hip_joint_2_angle            -inf   +inf   -
10     hip_joint_2_speed            -inf   +inf   -
11     knee_joint_2_angle           -inf   +inf   -
12     knee_joint_2_speed           -inf   +inf   -
13     leg_2_ground_contact_flag    0      1      -
14-23  10 lidar readings            -inf   +inf   -

Actions:

Type: Box(4), torque control (default)

Num    Name                          Min    Max
0      Hip_1 (Torque / Velocity)     -1     +1
1      Knee_1 (Torque / Velocity)    -1     +1
2      Hip_2 (Torque / Velocity)     -1     +1
3      Knee_2 (Torque / Velocity)    -1     +1

Rewards:

Reward is given for moving forward, totaling 300+ points up to the far end. If the robot falls, it gets -100. Applying motor torque costs a small amount of points, so a more optimal agent will get a better score.

Reference:

https://github.com/openai/gym/wiki/BipedalWalker-v2

class qualia2.rl.envs.box2d.CarRacing

Bases: qualia2.rl.core.Env

Observation:

Type: Box(96,96,3)

class qualia2.rl.envs.box2d.LunarLander

Bases: qualia2.rl.core.Env

Observation:

Type: Box(8)

Actions:

Type: Discrete(4)

class qualia2.rl.envs.box2d.LunarLanderContinuous

Bases: qualia2.rl.core.Env

Observation:

Type: Box(8)

Actions:

Type: Box(2)

qualia2.rl.envs.classic_control module

class qualia2.rl.envs.classic_control.Acrobot

Bases: qualia2.rl.core.Env

The acrobot system includes two joints and two links, where the joint between the two links is actuated.

class qualia2.rl.envs.classic_control.CartPole

Bases: qualia2.rl.core.Env

A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The pendulum starts upright, and the goal is to prevent it from falling over by increasing and reducing the cart’s velocity.

Observation:

Type: Box(4)

Num    Observation             Min       Max
0      Cart Position           -4.8      4.8
1      Cart Velocity           -Inf      Inf
2      Pole Angle              -24 deg   24 deg
3      Pole Velocity At Tip    -Inf      Inf

Actions:

Type: Discrete(2)

Num    Action
0      Push cart to the left
1      Push cart to the right

Reward:

0 for each step
-1 if the termination condition is met before max_steps-5
1 if the termination condition is met after max_steps-5

(Note: the original reward of the Gym environment is not used.)
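The CartPole reward rule can be written as a small stand-alone function. This is a hypothetical sketch: the documented `reward_transformer(reward, done)` takes no step count, so the real class presumably tracks the episode length internally. The explicit `step` argument, the name `transform_cartpole_reward`, and treating a termination at exactly max_steps-5 as "after" are all assumptions.

```python
def transform_cartpole_reward(done: bool, step: int, max_steps: int) -> int:
    """Hypothetical helper implementing the CartPole reward rule.

    step: number of steps taken so far in the episode (assumed counter).
    """
    if not done:
        return 0  # ordinary step
    # Terminating early is penalized; surviving until near max_steps is rewarded.
    return -1 if step < max_steps - 5 else 1
```

With max_steps=200, an episode that ends at step 10 yields -1, while one that ends at step 198 yields 1.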

Reference:

https://github.com/openai/gym/wiki/CartPole-v0

reward_transformer(reward, done)
step(action)
class qualia2.rl.envs.classic_control.MountainCar

Bases: qualia2.rl.core.Env

Get an underpowered car to the top of a hill (top = position 0.5).

Observation:

Type: Box(2)

Num    Observation    Min     Max
0      position       -1.2    0.6
1      velocity       -0.07   0.07

Actions:

Type: Discrete(3)

Num    Action
0      push left
1      no push
2      push right

Reward:

-1 for each step
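To show why the car counts as underpowered, here is a sketch of one step of the dynamics, following the update rule in Gym's MountainCar-v0 source (force 0.001, gravity 0.0025). The push of 0.001 is smaller than the gravity term on steep slopes (magnitude up to 0.0025), so the agent has to rock back and forth to build momentum. `mountain_car_step` is a hypothetical helper, and details may differ between Gym versions.

```python
import math


def mountain_car_step(position: float, velocity: float, action: int):
    """Hypothetical helper: one MountainCar-v0 step (action 0 push left, 1 no push, 2 push right)."""
    force, gravity = 0.001, 0.0025
    velocity += (action - 1) * force + math.cos(3 * position) * (-gravity)
    velocity = min(max(velocity, -0.07), 0.07)   # clip to the velocity bounds
    position += velocity
    position = min(max(position, -1.2), 0.6)     # clip to the position bounds
    if position == -1.2 and velocity < 0:
        velocity = 0.0                           # stop dead at the left wall
    done = position >= 0.5                       # reached the top of the hill
    return position, velocity, -1.0, done        # reward is -1 for each step
```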

Reference:

https://github.com/openai/gym/wiki/MountainCar-v0

reward_transformer(reward, done)
step(action)
class qualia2.rl.envs.classic_control.MountainCarContinuous

Bases: qualia2.rl.core.Env

MountainCarContinuous

Get an underpowered car to the top of a hill (top = position 0.5).

Observation:

Type: Box(2)

Num    Observation    Min     Max
0      position       -1.2    0.6
1      velocity       -0.07   0.07

Actions:

Type: Box(1)

Num    Action
0      Push car to the left (negative value) or to the right (positive value)

Reward:

Reward is 100 for reaching the target at the top of the hill on the right-hand side, minus the sum of squared actions from start to goal.
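As a worked example of this reward, the sketch below computes an episode return. It assumes Gym's per-step scaling, in which every step subtracts 0.1 * action^2 and the step that reaches the goal also adds 100; `mountain_car_continuous_return` is a hypothetical helper.

```python
def mountain_car_continuous_return(actions, reached_goal: bool) -> float:
    """Hypothetical helper: +100 on reaching the goal, minus 0.1 * action**2 for every step."""
    penalty = sum(0.1 * a ** 2 for a in actions)  # cost of the effort spent
    return (100.0 if reached_goal else 0.0) - penalty
```

For instance, two steps with actions 0.5 and -1.0 that reach the goal give 100 - 0.1 * (0.25 + 1.0) = 99.875, so a gentler policy that still reaches the goal scores higher.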

Reference:

https://github.com/openai/gym/wiki/MountainCarContinuous-v0

reward_transformer(reward, done)
step(action)
class qualia2.rl.envs.classic_control.Pendulum

Bases: qualia2.rl.core.Env

Try to keep a frictionless pendulum standing up.

Observation:

Type: Box(3)

Num    Observation    Min     Max
0      cos(theta)     -1.0    1.0
1      sin(theta)     -1.0    1.0
2      theta dot      -8.0    8.0

Actions:

Type: Box(1)

Num    Action          Min     Max
0      Joint effort    -2.0    2.0

Reward:

The precise equation for the reward is -(theta^2 + 0.1*theta_dot^2 + 0.001*action^2), where theta is normalized to lie between -pi and pi. Therefore, the lowest reward is -(pi^2 + 0.1*8^2 + 0.001*2^2) = -16.2736044, and the highest reward is 0. In essence, the goal is to remain at zero angle (vertical), with the least rotational velocity and the least effort.
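The reward formula and its bounds can be checked numerically. This sketch assumes theta is wrapped with ((theta + pi) mod 2*pi) - pi, as in Gym's `angle_normalize`, and recovers theta from the (cos, sin) observation with atan2; `pendulum_reward` and `theta_from_observation` are hypothetical helpers.

```python
import math


def angle_normalize(theta: float) -> float:
    """Wrap an arbitrary angle into [-pi, pi)."""
    return ((theta + math.pi) % (2 * math.pi)) - math.pi


def theta_from_observation(cos_theta: float, sin_theta: float) -> float:
    """Recover the pole angle from the first two observation entries."""
    return math.atan2(sin_theta, cos_theta)


def pendulum_reward(theta: float, theta_dot: float, action: float) -> float:
    """Hypothetical helper: -(theta^2 + 0.1*theta_dot^2 + 0.001*action^2) with wrapped theta."""
    th = angle_normalize(theta)
    return -(th ** 2 + 0.1 * theta_dot ** 2 + 0.001 * action ** 2)
```

Evaluating `pendulum_reward(math.pi, 8.0, 2.0)` (pendulum hanging down, maximum speed and effort) reproduces the lowest reward of about -16.2736044, while `pendulum_reward(0.0, 0.0, 0.0)` gives the highest reward, 0.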

Starting State:

Random angle from -pi to pi, and random velocity between -1 and 1

qualia2.rl.envs.roboschool module

class qualia2.rl.envs.roboschool.RoboSchoolBase(env)

Bases: qualia2.rl.core.Env

show(filename=None)
class qualia2.rl.envs.roboschool.RoboschoolAnt

Bases: qualia2.rl.envs.roboschool.RoboSchoolBase

Observation:

Type: Box(28,)

Actions:

Type: Box(8,)

class qualia2.rl.envs.roboschool.RoboschoolHalfCheetah

Bases: qualia2.rl.envs.roboschool.RoboSchoolBase

Observation:

Type: Box(26,)

Actions:

Type: Box(6,)

class qualia2.rl.envs.roboschool.RoboschoolHopper

Bases: qualia2.rl.envs.roboschool.RoboSchoolBase

RoboschoolHopper

Observation:

Type: Box(15,)

Actions:

Type: Box(3,)

class qualia2.rl.envs.roboschool.RoboschoolHumanoid

Bases: qualia2.rl.envs.roboschool.RoboSchoolBase

Observation:

Type: Box(44,)

Actions:

Type: Box(17,)

class qualia2.rl.envs.roboschool.RoboschoolWalker2d

Bases: qualia2.rl.envs.roboschool.RoboSchoolBase

RoboschoolWalker2d

Observation:

Type: Box(22,)

Actions:

Type: Box(6,)

qualia2.rl.envs.toy_text module

class qualia2.rl.envs.toy_text.FrozenLake

Bases: qualia2.rl.core.Env

The agent controls the movement of a character in a grid world. Some tiles of the grid are walkable, and others lead to the agent falling into the water. Additionally, the movement direction of the agent is uncertain and only partially depends on the chosen direction. The agent is rewarded for finding a walkable path to a goal tile.

SFFF    (S: starting point, safe)
FHFH    (F: frozen surface, safe)
FFFH    (H: hole, fall to your doom)
HFFG    (G: goal, where the frisbee is located)
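The uncertain movement can be made concrete. In Gym's slippery FrozenLake, the agent moves in the chosen direction or in one of the two perpendicular directions, each with probability 1/3, and actions are numbered 0 left, 1 down, 2 right, 3 up. The sketch below enumerates those outcomes on the 4x4 map. `transitions`, `MAP_4X4`, and the (probability, next_state, reward, done) tuple layout mirror Gym's transition table but are hypothetical here, not qualia2's API.

```python
MAP_4X4 = ["SFFF", "FHFH", "FFFH", "HFFG"]
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3
MOVES = {LEFT: (0, -1), DOWN: (1, 0), RIGHT: (0, 1), UP: (-1, 0)}


def transitions(grid, state, action):
    """Hypothetical helper: [(probability, next_state, reward, done), ...] for a slippery move."""
    n_rows, n_cols = len(grid), len(grid[0])
    row, col = divmod(state, n_cols)
    if grid[row][col] in "HG":  # terminal tiles: the episode is already over
        return [(1.0, state, 0.0, True)]
    outcomes = []
    # Intended direction plus its two perpendicular neighbours, 1/3 each.
    for direction in ((action - 1) % 4, action, (action + 1) % 4):
        dr, dc = MOVES[direction]
        r = min(max(row + dr, 0), n_rows - 1)  # bumping into the edge leaves you in place
        c = min(max(col + dc, 0), n_cols - 1)
        tile = grid[r][c]
        outcomes.append((1 / 3, r * n_cols + c, 1.0 if tile == "G" else 0.0, tile in "HG"))
    return outcomes
```

For example, choosing RIGHT from the start tile (state 0) slides down to state 4, moves right to state 1, or bumps the top edge and stays in state 0, each with probability 1/3.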

Reference:

https://gym.openai.com/envs/FrozenLake-v0/

show(filename=None)
class qualia2.rl.envs.toy_text.FrozenLake8x8

Bases: qualia2.rl.core.Env

The agent controls the movement of a character in a grid world. Some tiles of the grid are walkable, and others lead to the agent falling into the water. Additionally, the movement direction of the agent is uncertain and only partially depends on the chosen direction. The agent is rewarded for finding a walkable path to a goal tile.

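Tile legend, illustrated on the 4x4 layout (this environment uses an 8x8 grid built from the same tile types):

SFFF    (S: starting point, safe)
FHFH    (F: frozen surface, safe)
FFFH    (H: hole, fall to your doom)
HFFG    (G: goal, where the frisbee is located)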

Reference:

https://gym.openai.com/envs/FrozenLake8x8-v0/

show(filename=None)

Module contents