Environments

Package that contains all available MMLF world environments.

A list of all environments:
  • Environment ‘Maze Cliff‘ from module maze_cliff_environment implemented in class MazeCliffEnvironment.

    In this maze, there are two alternative ways from the start to the goal state: one short way which leads along a dangerous cliff and one long but secure way. If the agent happens to step into the cliff, it receives a large negative reward (configurable via cliffPenalty) and is reset to the start state. By default, the maze is deterministic, i.e. the agent always moves in the direction it chooses. However, the parameter stochasticity controls the stochasticity of the environment: for instance, when stochasticity is set to 0.01, the agent performs a random move instead of the chosen one with probability 0.01.

  • Environment ‘Pinball 2D‘ from module pinball_maze_environment implemented in class PinballMazeEnvironment.

    The pinball maze environment class.

  • Environment ‘Linear Markov Chain‘ from module linear_markov_chain_environment implemented in class LinearMarkovChainEnvironment.

    The agent starts in the middle of this linear Markov chain. It can move either right or left. The chain is deterministic: when the agent moves right, the state index is decreased by 1 with probability 1; when the agent moves left, the state index is increased by 1 accordingly.

  • Environment ‘Maze 2D‘ from module maze2d_environment implemented in class Maze2dEnvironment.

    A 2d maze world, in which the agent is situated at each moment in time in a certain field (specified by its (row,column) coordinate) and can move either upwards, downwards, left or right. The structure of the maze can be configured via a text-based config file.

  • Environment ‘Double Pole Balancing‘ from module double_pole_balancing_environment implemented in class DoublePoleBalancingEnvironment.

    In the double pole balancing environment, the task of the agent is to control a cart such that two poles which are mounted on the cart stay in a nearly vertical position (to balance them). At the same time, the cart has to stay in a confined region.

  • Environment ‘Partial Observable Double Pole Balancing‘ from module po_double_pole_balancing_environment implemented in class PODoublePoleBalancingEnvironment.

    In the partially observable double pole balancing environment, the task of the agent is to control a cart such that two poles which are mounted on the cart stay in a nearly vertical position (to balance them). At the same time, the cart has to stay in a confined region. In contrast to the fully observable double pole balancing environment, the agent only observes the current positions of the cart and the two poles but not their velocities. This renders the problem non-Markovian.

  • Environment ‘Mountain Car‘ from module mcar_env implemented in class MountainCarEnvironment.

    In the mountain car environment, the agent has to control a car which is situated somewhere in a valley between two hills. The goal of the agent is to reach the top of the right hill. Unfortunately, the engine of the car is not strong enough to reach the top of the hill directly from many start states. Thus, it has first to drive in the wrong direction to gather enough potential energy.

  • Environment ‘Single Pole Balancing‘ from module single_pole_balancing_environment implemented in class SinglePoleBalancingEnvironment.

    In the single pole balancing environment, the task of the agent is to control a cart such that a pole which is mounted on the cart stays in a nearly vertical position (to balance it). At the same time, the cart has to stay in a confined region.

  • Environment ‘Seventeen and Four‘ from module seventeen_and_four implemented in class SeventeenAndFourEnvironment.

    This environment implements a simplified form of the card game seventeen & four, in which the agent takes the role of the player and plays against a hard-coded dealer.

Single-agent environment base-class

Abstract base classes for environments in the MMLF.

This module defines abstract base classes from which environments in the MMLF must be derived.

The following environment base classes are defined:
SingleAgentEnvironment:
 : Base class for environments in which a single agent can act.
class environments.single_agent_environment.SingleAgentEnvironment(config, baseUserDir, useGUI)

MMLF interface for environments with a single agent

Each environment that should be used in the MMLF needs to be derived from this class and implement the following methods:

Interface Methods

getInitialState:: Returns the initial state of the environment
getStateSpace:: Returns the state space of the environment
getActionSpace:: Returns the action space of the environment
evaluateAction(actionObject):
 : Evaluate the action defined in actionObject
actionTaken(action)

Executes an action chosen by the agent.

Causes a state transition of the environment based on the specific action chosen by the agent. Depending on the successor state, the agent is rewarded, informed about the end of an episode, and/or provided with the next state.

Parameters

action:: A dictionary that specifies, for each dimension of the action space, the value the agent has chosen for that dimension.
evaluateAction(actionObject)

Execute an agent’s action in the environment.

Take an actionObject containing the action of an agent and evaluate this action, computing the next state and the reward the agent should receive for having taken it.

Additionally, decide whether the episode should continue, or end after the reward has been issued to the agent.

This method returns a dictionary with the following keys:
rewardValue:: An integer or float representing the agent’s reward. If rewardValue == None, then no reward is given to the agent.
startNewEpisode:
 : True if the agent's action has caused the current episode to finish.
nextState:: A State object which contains the state the environment takes on after executing the action. This might be the initial state of the next episode if a new episode has just started (startNewEpisode == True).
terminalState:: A State object which contains the terminal state of the environment in the last episode if a new episode has just started (startNewEpisode == True). Otherwise None.
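
The following minimal sketch illustrates this return contract. ToyChainEnvironment and its plain-integer states are hypothetical simplifications for illustration only; a real MMLF environment derives from SingleAgentEnvironment and wraps its states in State objects:

    # Hypothetical, simplified environment illustrating the evaluateAction
    # return dictionary described above (plain integers instead of State objects).
    class ToyChainEnvironment:
        def __init__(self, length=7):
            self.length = length
            self.state = self.getInitialState()

        def getInitialState(self):
            # The agent starts in the middle of the chain
            return self.length // 2

        def evaluateAction(self, actionObject):
            # actionObject maps each action-space dimension to the chosen value
            self.state += 1 if actionObject["direction"] == "right" else -1
            episodeFinished = self.state in (0, self.length - 1)
            resultsDict = {
                "rewardValue": -1,  # illustrative per-step reward
                "startNewEpisode": episodeFinished,
                "terminalState": self.state if episodeFinished else None,
                "nextState": self.getInitialState() if episodeFinished else self.state,
            }
            if episodeFinished:
                self.state = self.getInitialState()  # reset for the next episode
            return resultsDict

    # Example call: ToyChainEnvironment().evaluateAction({"direction": "right"})
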
getActionSpace()

Return the action space of this environment.

More information about action spaces can be found in State and Action Spaces

getInitialState()

Returns the initial state of the environment

More information about (valid) states can be found in State and Action Spaces

getStateSpace()

Returns the state space of this environment.

More information about state spaces can be found in State and Action Spaces

getWish()

Query the next command object for the interaction server.

giveAgentInfo(agentInfo)

Check whether an agent is compatible with this environment.

The parameter agentInfo contains information on whether an agent is suited for continuous state and/or action spaces, episodic domains, etc. It is checked whether the agent has the correct capabilities for this environment. If not, an ImproperAgentException is thrown.

Parameters

agentInfo:: A dictionary-like object of type GiveAgentInfo that contains information regarding the agent’s capabilities.
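
As a rough illustration of this check, consider the sketch below; the keys queried on agentInfo are assumptions for illustration, not the actual fields of GiveAgentInfo:

    # Purely illustrative sketch of a capability check; the keys queried on
    # agentInfo are hypothetical and only mimic the behaviour described above.
    class ImproperAgentException(Exception):
        """Raised when an incompatible agent is added to a world."""

    def checkAgentCompatibility(agentInfo, environmentHasContinuousStates):
        if environmentHasContinuousStates and not agentInfo.get("continuousState", False):
            raise ImproperAgentException(
                "The agent cannot deal with the continuous state space "
                "of this environment.")
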
plotStateSpaceStructure(axis)

Plot structure of state space into given axis.

Just a helper function for viewers and graphic logging.

stop()

Method which is called when the environment should be stopped

class environments.single_agent_environment.ImproperAgentException

Exception thrown if an improper agent is added to a world.

Fully-observable Double Pole Balancing

Module that implements the double pole balancing environment and its dynamics.

class worlds.double_pole_balancing.environments.double_pole_balancing_environment.DoublePoleBalancingEnvironment(useGUI, *args, **kwargs)

The double pole balancing environment

In the double pole balancing environment, the task of the agent is to control a cart such that two poles which are mounted on the cart stay in a nearly vertical position (to balance them). At the same time, the cart has to stay in a confined region.

In every time step, the agent can apply a force between -10 N and 10 N in order to accelerate the cart. Thus the action space is one-dimensional and continuous. The state consists of the cart's current position and velocity as well as the poles' angles and angular velocities. Thus, the state space is six-dimensional and continuous.

The config dict of the environment expects the following parameters:

CONFIG DICT
GRAVITY:: The gravity force. Benchmark default “-9.8”.
MASSCART:: The mass of the cart. Benchmark default “1.0”.
TAU:: The time step between two commands of the agent. Benchmark default “0.02”
MASSPOLE_1:: The mass of pole 1. Benchmark default “0.1”
MASSPOLE_2:: The mass of pole 2. Benchmark default “0.01”
LENGTH_1:: The length of pole 1. Benchmark default “0.5”
LENGTH_2:: The length of pole 2. Benchmark default “0.05”
MUP:: Coefficient of friction of the poles’ hinges. Benchmark default “0.000002”
MUC:: Coefficient that controls friction. Benchmark default “0.0005”
INITIALPOLEANGULARPOSITION1:
 : Initial angle of pole 1. Benchmark default “4.0”
MAXCARTPOSITION:
 : The maximal distance the cart is allowed to move away from its start position. Benchmark default “2.4”
MAXPOLEANGULARPOSITION1:
 : Maximal angle pole 1 is allowed to take on. Benchmark default “36.0”
MAXPOLEANGULARPOSITION2:
 : Maximal angle pole 2 is allowed to take on. Benchmark default “36.0”
MAXSTEPS:: The number of steps the agent must balance the poles. Benchmark default “100000”
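
For reference, the benchmark defaults listed above can be written down as a plain Python dictionary; how such a configuration is actually passed to the environment (typically via the world configuration) is not shown here:

    # Benchmark-default configuration values for the double pole balancing
    # environment, assembled into a plain dict for illustration.
    doublePoleConfig = {
        "GRAVITY": -9.8,
        "MASSCART": 1.0,
        "TAU": 0.02,
        "MASSPOLE_1": 0.1,
        "MASSPOLE_2": 0.01,
        "LENGTH_1": 0.5,
        "LENGTH_2": 0.05,
        "MUP": 0.000002,
        "MUC": 0.0005,
        "INITIALPOLEANGULARPOSITION1": 4.0,
        "MAXCARTPOSITION": 2.4,
        "MAXPOLEANGULARPOSITION1": 36.0,
        "MAXPOLEANGULARPOSITION2": 36.0,
        "MAXSTEPS": 100000,
    }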

Linear Markov Chain

A linear markov chain environment.

class worlds.linear_markov_chain.environments.linear_markov_chain_environment.LinearMarkovChainEnvironment(useGUI, *args, **kwargs)

A linear markov chain.

The agent starts in the middle of this linear Markov chain. It can move either right or left. The chain is deterministic: when the agent moves right, the state index is decreased by 1 with probability 1; when the agent moves left, the state index is increased by 1 accordingly.

New in version 0.9.10: Added LinearMarkovChain environment

CONFIG DICT
length:: The number of states of the linear markov chain
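
A standalone sketch of the transition rule described above, assuming that an episode ends when the agent reaches either end of the chain (an assumption; the function name and state encoding are likewise only illustrative):

    # Illustrative sketch of the deterministic chain dynamics described above:
    # moving "right" decreases the state index by 1, moving "left" increases it.
    def chainTransition(state, action, length):
        nextState = state - 1 if action == "right" else state + 1
        nextState = max(0, min(length - 1, nextState))   # stay inside the chain
        episodeFinished = nextState in (0, length - 1)   # assumed terminal ends
        return nextState, episodeFinished

    # The agent starts in the middle of the chain:
    state = 7 // 2
    state, done = chainTransition(state, "left", length=7)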

Maze 2D

Two-dimensional maze world environment.

class worlds.maze2d.environments.maze2d_environment.Maze2dEnvironment(useGUI, *args, **kwargs)

The two-dimensional maze environment for an agent without orientation.

A 2d maze world, in which the agent is situated at each moment in time in a certain field (specified by its (row,column) coordinate) and can move either upwards, downwards, left or right. The structure of the maze can be configured via a text-based config file.

CONFIG DICT
episodesUntilDoorChange:
 : Number of episodes for which the doors remain in their initial state. After this number of episodes, the door state is inverted.
MAZE:: Name of the config file where the maze is defined. These files are located in the folder ‘worlds/maze2d’

Maze Cliff

This module contains an implementation of the maze 2d dynamics which can be used in a world of the mmlf.framework.

2008-03-08, Jan Hendrik Metzen (jhm@informatik.uni-bremen.de)

class worlds.maze_cliff.environments.maze_cliff_environment.MazeCliffEnvironment(useGUI, *args, **kwargs)

The two-dimensional maze cliff environment.

In this maze, there are two alternative ways from the start to the goal state: one short way which leads along a dangerous cliff and one long but secure way. If the agent happens to step into the cliff, it receives a large negative reward (configurable via cliffPenalty) and is reset to the start state. By default, the maze is deterministic, i.e. the agent always moves in the direction it chooses. However, the parameter stochasticity controls the stochasticity of the environment: for instance, when stochasticity is set to 0.01, the agent performs a random move instead of the chosen one with probability 0.01.

The maze structure is as follows, where “S” is the start state, “G” the goal state and “C” is a cliff field:

    **************
    *            *
    *            *
    *            *
    *SCCCCCCCCCCG*
    **************

CONFIG DICT
cliffPenalty:: The reward an agent obtains when stepping into the cliff area
stochasticity:: The stochasticity of the state transition matrix. With probability 1-stochasticity the desired transition is made; otherwise a random transition occurs.
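
How the two parameters act can be sketched as follows; the helper names and the ordinary step reward of -1 are illustrative assumptions, not the environment's actual implementation:

    import random

    # Illustrative sketch of the stochasticity and cliffPenalty parameters.
    def applyStochasticity(chosenAction, allActions, stochasticity):
        # With probability `stochasticity`, a random action replaces the chosen one.
        if random.random() < stochasticity:
            return random.choice(allActions)
        return chosenAction

    def stepReward(steppedIntoCliff, cliffPenalty):
        # Stepping into the cliff yields the large negative cliffPenalty (and the
        # agent is reset to the start state); otherwise an ordinary step reward
        # is given (-1 is used here purely for illustration).
        return cliffPenalty if steppedIntoCliff else -1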

Mountain Car

Module that implements the mountain car environment and its dynamics.

class worlds.mountain_car.environments.mcar_env.MountainCarEnvironment(config, useGUI, *args, **kwargs)

The mountain car environment.

In the mountain car environment, the agent has to control a car which is situated somewhere in a valley between two hills. The goal of the agent is to reach the top of the right hill. Unfortunately, the engine of the car is not strong enough to reach the top of the hill directly from many start states. Thus, it has first to drive in the wrong direction to gather enough potential energy.

The agent can either accelerate left, right, or coast. Thus, the action space is discrete with three discrete actions. The agent observes two continuous state components: The current position and velocity of the car. The start state of the car is stochastically initialised.

CONFIG DICT
maxStepsPerEpisode:
 : The maximal number of steps the agent has to reach the goal. Benchmark default is “500”.
accelerationFactor:
 : A factor that influences how strong the car's engine is relative to the slope of the hill. Benchmark default is “0.001”.
maxGoalVelocity:
 : Maximum velocity the agent might have when reaching the goal. If smaller than 0.07, this effectively makes the task MountainPark instead of MountainCar. Benchmark default is “0.07”
positionNoise:: Noise that is added to the agent’s observation of the position. Benchmark default is “0.0”
velocityNoise:: Noise that is added to the agent’s observation of the velocity. Benchmark default is “0.0”
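
The two noise parameters can be sketched as follows, assuming Gaussian observation noise with the configured magnitude; the noise model and the function name are assumptions for illustration:

    import random

    # Illustrative sketch of positionNoise and velocityNoise perturbing the
    # agent's observation (Gaussian noise is an assumed noise model).
    def observe(position, velocity, positionNoise=0.0, velocityNoise=0.0):
        return {
            "position": position + random.gauss(0.0, positionNoise),
            "velocity": velocity + random.gauss(0.0, velocityNoise),
        }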

Partially-observable Double Pole Balancing

Module that implements the partially observable double pole balancing environment.

class worlds.po_double_pole_balancing.environments.po_double_pole_balancing_environment.PODoublePoleBalancingEnvironment(useGUI, *args, **kwargs)

The partially observable double pole balancing environment

In the partially observable double pole balancing environment, the task of the agent is to control a cart such that two poles which are mounted on the cart stay in a nearly vertical position (to balance them). At the same time, the cart has to stay in a confined region. In contrast to the fully observable double pole balancing environment, the agent only observes the current positions of the cart and the two poles but not their velocities. This renders the problem non-Markovian.

In every time step, the agent can apply a force between -10 N and 10 N in order to accelerate the cart. Thus the action space is one-dimensional and continuous. The internal state consists of the cart's current position and velocity as well as the poles' angles and angular velocities, but the agent observes only the cart's position and the two poles' angles. Thus, the observation space is three-dimensional and continuous.

The config dict of the environment expects the following parameters:

CONFIG DICT
GRAVITY:: The gravity force. Benchmark default “-9.8”.
MASSCART:: The mass of the cart. Benchmark default “1.0”.
TAU:: The time step between two commands of the agent. Benchmark default “0.02”
MASSPOLE_1:: The mass of pole 1. Benchmark default “0.1”
MASSPOLE_2:: The mass of pole 2. Benchmark default “0.01”
LENGTH_1:: The length of pole 1. Benchmark default “0.5”
LENGTH_2:: The length of pole 2. Benchmark default “0.05”
MUP:: Coefficient of friction of the poles’ hinges. Benchmark default “0.000002”
MUC:: Coefficient that controls friction. Benchmark default “0.0005”
INITIALPOLEANGULARPOSITION1:
 : Initial angle of pole 1. Benchmark default “4.0”
MAXCARTPOSITION:
 : The maximal distance the cart is allowed to move away from its start position. Benchmark default “2.4”
MAXPOLEANGULARPOSITION1:
 : Maximal angle pole 1 is allowed to take on. Benchmark default “36.0”
MAXPOLEANGULARPOSITION2:
 : Maximal angle pole 2 is allowed to take on. Benchmark default “36.0”
MAXSTEPS:: The number of steps the agent must balance the poles. Benchmark default “100000”
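
The partial observability amounts to dropping the velocity components from the six-dimensional internal state; the ordering of the state vector assumed in the following sketch is illustrative:

    # Illustrative sketch: the observation keeps only the position components
    # of the internal state (the ordering of the state vector is assumed).
    def observation(fullState):
        # fullState: [cartPosition, cartVelocity,
        #             poleAngle1, poleAngularVelocity1,
        #             poleAngle2, poleAngularVelocity2]
        cartPosition, _, poleAngle1, _, poleAngle2, _ = fullState
        return [cartPosition, poleAngle1, poleAngle2]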

Pinball

Module that contains the pinball maze environment.

class worlds.pinball_maze.environments.pinball_maze_environment.PinballMazeEnvironment(useGUI, *args, **kwargs)

The pinball maze environment class.

See also

George Konidaris and Andrew G. Barto, “Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining”, in Advances in Neural Information Processing Systems, 2009.

New in version 0.9.9.

CONFIG DICT
DRAG:: Factor that slows the ball down in each time step (the velocity is multiplied by this factor after each step)
NOISE:: Gaussian noise with mean MU_POS for the position [x, y] and mean MU_VEL for the velocity [xdot, ydot]; as a simplification, the covariance matrix is a unit matrix multiplied by SIGMA
THRUST_PENALTY:: Reward the agent gains each time it accelerates the ball
STEP_PENALTY:: Reward the agent gains in each time step in which it neither thrusts nor terminates
END_EPISODE_REWARD:
 : Reward the agent gains if the ball reaches the goal
SUBSTEPS:: Number of dynamics steps of the environment between two consecutive actions of the agent
MAZE:: Name of the config file where the maze is defined. These files are located in the folder ‘worlds/pinball_maze’
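
The roles of DRAG, NOISE and SUBSTEPS can be sketched as follows; the actual pinball dynamics, in particular collisions with obstacles, are not reproduced, and the function is only an illustration of the parameters described above:

    import random

    # Illustrative sketch of one dynamics substep: the ball moves, the velocity
    # is damped by DRAG, and Gaussian noise (means MU_POS / MU_VEL, scaled by
    # SIGMA) perturbs position and velocity. Collisions are omitted.
    def substep(x, y, xdot, ydot, drag, mu_pos, mu_vel, sigma, dt=1.0):
        x, y = x + xdot * dt, y + ydot * dt
        xdot, ydot = xdot * drag, ydot * drag
        x += random.gauss(mu_pos, sigma)
        y += random.gauss(mu_pos, sigma)
        xdot += random.gauss(mu_vel, sigma)
        ydot += random.gauss(mu_vel, sigma)
        return x, y, xdot, ydot

Between two consecutive actions of the agent, such a substep would be repeated SUBSTEPS times.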

Seventeen and Four

Module that contains the seventeen & four environment.

This module contains a simplified implementation of the card game seventeen & four, in which the agent takes the role of the player and plays against a hard-coded dealer.

class worlds.seventeen_and_four.environments.seventeen_and_four.SeventeenAndFourEnvironment(useGUI, *args, **kwargs)

The seventeen & four environment

This environment implements a simplified form of the card game seventeen & four, in which the agent takes the role of the player and plays against a hard-coded dealer.

The player initially starts with two randomly drawn cards with values of 2, 3, 4, 7, 8, 9, 10 or 11. The goal is to get a set of cards whose sum is as close as possible to 21. The agent can either stick with its two cards or draw arbitrarily many additional cards sequentially. If the sum of cards becomes greater than 21, the agent loses and gets a reward of -1. If the agent stops with a card sum of less than 22, a hard-coded dealer policy starts playing against the agent. This dealer draws cards until it has either equal or more points than the agent or more than 21. In the first case, the dealer wins and the agent gets a reward of -1; otherwise the player wins and gets a reward of 0.
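
The dealer policy and the reward assignment described above can be sketched as follows; card handling is simplified and the names are hypothetical:

    import random

    # Illustrative sketch of the dealer policy and reward assignment.
    CARD_VALUES = [2, 3, 4, 7, 8, 9, 10, 11]

    def playDealerAndScore(agentSum):
        if agentSum > 21:
            return -1                       # agent went over 21: immediate loss
        dealerSum = 0
        # The dealer draws cards until it has at least as many points as the
        # agent or more than 21 points.
        while dealerSum < agentSum:
            dealerSum += random.choice(CARD_VALUES)
        if dealerSum <= 21:
            return -1                       # dealer wins (equal or more points)
        return 0                            # dealer exceeded 21: the agent wins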

Single Pole Balancing

Module that implements the single pole balancing environment and its dynamics.

class worlds.single_pole_balancing.environments.single_pole_balancing_environment.SinglePoleBalancingEnvironment(useGUI, *args, **kwargs)

The single pole balancing environment.

In the single pole balancing environment, the task of the agent is to control a cart such that a pole which is mounted on the cart stays in a nearly vertical position (to balance it). At the same time, the cart has to stay in a confined region.

In every time step, the agent can apply a force between -2 N and 2 N in order to accelerate the cart. Thus the action space is one-dimensional and continuous. The state consists of the cart's current position and velocity as well as the pole's angle and angular velocity. Thus, the state space is four-dimensional and continuous.

CONFIG DICT
GRAVITY:: The gravity force. Benchmark default “-9.8”
MASSCART:: The mass of the cart. Benchmark default “1.0”
MASSPOLE:: The mass of the pole. Benchmark default “0.1”
TOTAL_MASS:: The total mass (pole + cart). Benchmark default “1.1”
LENGTH:: The length of the pole. Benchmark default “0.5”
POLEMASS_LENGTH:
 : The center of mass of the pole. Benchmark default “0.05”
TAU:: The time step between two commands of the agent. Benchmark default “0.02”
MAXCARTPOSITION:
 : The maximal distance the cart is allowed to move away from its start position. Benchmark default “7.5”
MAXPOLEANGULARPOSITION:
 : Maximal angle the pole is allowed to take on. Benchmark default “0.7”
MAXSTEPS:: The number of steps the agent must balance the pole. Benchmark default “100000”
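
The failure conditions implied by MAXCARTPOSITION and MAXPOLEANGULARPOSITION can be sketched as follows; the function name and the use of absolute values are illustrative assumptions:

    # Illustrative sketch of the failure conditions implied by the parameters
    # above; balancing succeeds if no failure occurs within MAXSTEPS steps.
    def balancingFailed(cartPosition, poleAngle,
                        maxCartPosition=7.5, maxPoleAngularPosition=0.7):
        return (abs(cartPosition) > maxCartPosition
                or abs(poleAngle) > maxPoleAngularPosition)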