Package that contains all available MMLF world environments.
Environment ‘Maze Cliff‘ from module maze_cliff_environment implemented in class MazeCliffEnvironment.
In this maze, there are two alternative ways from the start to the goal state: a short one that leads along a dangerous cliff and a long but secure one. If the agent happens to step into the cliff area, it receives a huge negative reward (configurable via cliffPenalty) and is reset to the start state. By default, the maze is deterministic, i.e. the agent always moves in the direction it chooses. However, the parameter stochasticity allows one to control the stochasticity of the environment. For instance, when stochasticity is set to 0.01, the agent performs a random move instead of the chosen one with probability 0.01.
Environment ‘Pinball 2D‘ from module pinball_maze_environment implemented in class PinballMazeEnvironment.
The pinball maze environment class.
Environment ‘Linear Markov Chain‘ from module linear_markov_chain_environment implemented in class LinearMarkovChainEnvironment.
The agent starts in the middle of this linear Markov chain. It can move either right or left. The chain is not stochastic, i.e. when the agent wants to move right, the state is decreased by 1 with probability 1; when the agent wants to move left, the state is increased by 1 accordingly.
Environment ‘Maze 2D‘ from module maze2d_environment implemented in class Maze2dEnvironment.
A 2d maze world, in which the agent is situated at each moment in time in a certain field (specified by its (row,column) coordinate) and can move either upwards, downwards, left or right. The structure of the maze can be configured via a text-based config file.
Environment ‘Double Pole Balancing‘ from module double_pole_balancing_environment implemented in class DoublePoleBalancingEnvironment.
In the double pole balancing environment, the task of the agent is to control a cart such that two poles which are mounted on the cart stay in a nearly vertical position (to balance them). At the same time, the cart has to stay in a confined region.
Environment ‘Partial Observable Double Pole Balancing‘ from module po_double_pole_balancing_environment implemented in class PODoublePoleBalancingEnvironment.
In the partially observable double pole balancing environment, the task of the agent is to control a cart such that two poles which are mounted on the cart stay in a nearly vertical position (to balance them). At the same time, the cart has to stay in a confined region. In contrast to the fully observable double pole balancing environment, the agent observes only the current position of the cart and the angles of the two poles, but not their velocities. This renders the problem non-Markovian.
Environment ‘Mountain Car‘ from module mcar_env implemented in class MountainCarEnvironment.
In the mountain car environment, the agent has to control a car which is situated somewhere in a valley between two hills. The goal of the agent is to reach the top of the right hill. Unfortunately, the engine of the car is not strong enough to reach the top of the hill directly from many start states. Thus, the car first has to drive in the wrong direction to gather enough potential energy.
Environment ‘Single Pole Balancing‘ from module single_pole_balancing_environment implemented in class SinglePoleBalancingEnvironment.
In the single pole balancing environment, the task of the agent is to control a cart such that a pole which is mounted on the cart stays in a nearly vertical position (to balance it). At the same time, the cart has to stay in a confined region.
Environment ‘Seventeen and Four‘ from module seventeen_and_four implemented in class SeventeenAndFourEnvironment.
This environment implements a simplified form of the card game seventeen & four, in which the agent takes the role of the player and plays against a hard-coded dealer.
Abstract base classes for environments in the MMLF.
This module defines abstract base classes from which environments in the MMLF must be derived.
SingleAgentEnvironment: Base class for environments in which a single agent can act.
MMLF interface for environments with a single agent.
Each environment that should be used in the MMLF needs to be derived from this class and implement the following methods:
Interface Methods
- getInitialState: Returns the initial state of the environment
- getStateSpace: Returns the state space of the environment
- getActionSpace: Returns the action space of the environment
- evaluateAction(actionObject): Evaluates the action defined in actionObject
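The following stand-alone sketch illustrates the shape of these four methods with a toy two-direction environment. It does not derive from the real MMLF base class, and the plain-dictionary representation of states, spaces, and the evaluateAction result is an assumption made for illustration only.

```python
class ToyChainEnvironment(object):
    """Stand-in illustrating the four interface methods (not the MMLF base class)."""

    def __init__(self):
        self.state = self.getInitialState()

    def getInitialState(self):
        # The state the environment starts in (one discrete dimension).
        return {"position": 0}

    def getStateSpace(self):
        # Hypothetical plain-dict description of one discrete state dimension.
        return {"position": {"type": "DISCRETE", "values": [-2, -1, 0, 1, 2]}}

    def getActionSpace(self):
        # Hypothetical plain-dict description of one discrete action dimension.
        return {"direction": {"type": "DISCRETE", "values": ["left", "right"]}}

    def evaluateAction(self, actionObject):
        # Apply the chosen action and compute reward / episode-end information.
        delta = 1 if actionObject["direction"] == "right" else -1
        self.state = {"position": self.state["position"] + delta}
        episodeFinished = abs(self.state["position"]) >= 2
        reward = 1.0 if episodeFinished else 0.0
        terminalState = self.state if episodeFinished else None
        if episodeFinished:
            self.state = self.getInitialState()
        # Bundling the result into one dictionary is an assumption for this sketch;
        # the field names follow the return-value description given below.
        return {"rewardValue": reward,
                "startNewEpisode": episodeFinished,
                "nextState": self.state,
                "terminalState": terminalState}
```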
Executes an action chosen by the agent.
Causes a state transition of the environment based on the specific action chosen by the agent. Depending on the successor state, the agent is rewarded, informed about the end of an episode, and/or provided with the next state.
Parameters
action: A dictionary that specifies, for each dimension of the action space, the value the agent has chosen for that dimension.
Execute an agent’s action in the environment.
Take an actionObject containing the action of an agent, evaluate this action, and calculate the next state and the reward the agent should receive for having taken it.
Additionally, decide whether the episode should continue, or end after the reward has been issued to the agent.
| Return value | Description |
|---|---|
| rewardValue | An integer or float representing the agent's reward. If rewardValue == None, no reward is given to the agent. |
| startNewEpisode | True if the agent's action has caused the episode to finish. |
| nextState | A State object which contains the state the environment takes on after executing the action. This might be the initial state of the next episode if a new episode has just started (startNewEpisode == True). |
| terminalState | A State object which contains the terminal state of the environment in the last episode if a new episode has just started (startNewEpisode == True). Otherwise None. |
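Assuming the four fields above are returned together in a single dictionary (an assumption made for illustration; chooseAction and giveReward below are likewise hypothetical agent methods, not documented MMLF API), a driver loop could consume the result of evaluateAction as follows:

```python
# Hypothetical driver loop consuming the evaluateAction result described above.
# The dictionary keys mirror the field names of this section; agent.chooseAction
# and agent.giveReward are placeholder names, not documented MMLF methods.
def run_episode(environment, agent, max_steps=1000):
    state = environment.getInitialState()
    for _ in range(max_steps):
        action = agent.chooseAction(state)
        result = environment.evaluateAction(action)
        if result["rewardValue"] is not None:
            agent.giveReward(result["rewardValue"])
        if result["startNewEpisode"]:
            # terminalState holds the final state of the finished episode;
            # nextState would be the initial state of the following episode.
            return result["terminalState"]
        state = result["nextState"]
    return state
```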
Return the action space of this environment.
More information about action spaces can be found in State and Action Spaces
Returns the initial state of the environment
More information about (valid) states can be found in State and Action Spaces
Returns the state space of this environment.
More information about state spaces can be found in State and Action Spaces
Query the next command object for the interaction server.
Check whether an agent is compatible with this environment.
The parameter agentInfo contains information about whether an agent is suited for continuous state and/or action spaces, episodic domains, etc. It is checked whether the agent has the correct capabilities for this environment. If not, an ImproperAgentException is raised.
Parameters
agentInfo: A dictionary-like object of type GiveAgentInfo that contains information regarding the agent's capabilities.
Plot structure of state space into given axis.
Just a helper function for viewers and graphic logging.
Method that is called when the environment should be stopped.
Exception thrown if an improper agent is added to a world.
Module that implements the double pole balancing environment and its dynamics.
The double pole balancing environment
In the double pole balancing environment, the task of the agent is to control a cart such that two poles which are mounted on the cart stay in a nearly vertical position (to balance them). At the same time, the cart has to stay in a confined region.
In every time step, the agent can apply a force between -10N and 10N in order to accelerate the cart. Thus, the action space is one-dimensional and continuous. The state consists of the cart's current position and velocity as well as the poles' angles and angular velocities. Thus, the state space is six-dimensional and continuous.
The config dict of the environment expects the following parameters:
| Parameter | Description |
|---|---|
| GRAVITY | The gravity force. Benchmark default: -9.8. |
| MASSCART | The mass of the cart. Benchmark default: 1.0. |
| TAU | The time step between two commands of the agent. Benchmark default: 0.02. |
| MASSPOLE_1 | The mass of pole 1. Benchmark default: 0.1. |
| MASSPOLE_2 | The mass of pole 2. Benchmark default: 0.01. |
| LENGTH_1 | The length of pole 1. Benchmark default: 0.5. |
| LENGTH_2 | The length of pole 2. Benchmark default: 0.05. |
| MUP | Coefficient of friction of the poles' hinges. Benchmark default: 0.000002. |
| MUC | Coefficient that controls friction. Benchmark default: 0.0005. |
| INITIALPOLEANGULARPOSITION1 | Initial angle of pole 1. Benchmark default: 4.0. |
| MAXCARTPOSITION | The maximal distance the cart is allowed to move away from its start position. Benchmark default: 2.4. |
| MAXPOLEANGULARPOSITION1 | Maximal angle pole 1 is allowed to take on. Benchmark default: 36.0. |
| MAXPOLEANGULARPOSITION2 | Maximal angle pole 2 is allowed to take on. Benchmark default: 36.0. |
| MAXSTEPS | The number of steps the agent must balance the poles. Benchmark default: 100000. |
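For reference, the benchmark defaults from the table above are collected below in a plain Python dictionary. How these values are actually passed to the environment (typically via an MMLF world configuration file) is not shown here and may differ.

```python
# Benchmark defaults from the table above, collected into a plain Python dict.
double_pole_config = {
    "GRAVITY": -9.8,
    "MASSCART": 1.0,
    "TAU": 0.02,
    "MASSPOLE_1": 0.1,
    "MASSPOLE_2": 0.01,
    "LENGTH_1": 0.5,
    "LENGTH_2": 0.05,
    "MUP": 0.000002,
    "MUC": 0.0005,
    "INITIALPOLEANGULARPOSITION1": 4.0,
    "MAXCARTPOSITION": 2.4,
    "MAXPOLEANGULARPOSITION1": 36.0,
    "MAXPOLEANGULARPOSITION2": 36.0,
    "MAXSTEPS": 100000,
}
```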
A linear Markov chain environment.
A linear Markov chain.
The agent starts in the middle of this linear Markov chain. It can move either right or left. The chain is not stochastic, i.e. when the agent wants to move right, the state is decreased by 1 with probability 1; when the agent wants to move left, the state is increased by 1 accordingly.
New in version 0.9.10: Added LinearMarkovChain environment
| Parameter | Description |
|---|---|
| length | The number of states of the linear Markov chain. |
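The transition behaviour described above can be summarized in a few lines. This is an illustrative sketch only, not the MMLF implementation; rewards and episode termination are omitted because they are not specified here.

```python
# Illustrative sketch of the deterministic chain dynamics described above:
# moving right decreases the state index by 1, moving left increases it by 1.
def initial_state(length):
    # The agent starts in the middle of the chain.
    return length // 2

def transition(state, action, length):
    delta = -1 if action == "right" else 1
    # Clamp to the valid state indices 0 .. length-1.
    return min(max(state + delta, 0), length - 1)
```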
Two-dimensional maze world environment.
The two-dimensional maze environment for an agent without orientation.
A 2d maze world, in which the agent is situated at each moment in time in a certain field (specified by its (row,column) coordinate) and can move either upwards, downwards, left or right. The structure of the maze can be configured via a text-based config file.
| Parameter | Description |
|---|---|
| episodesUntilDoorChange | Number of episodes for which the doors remain in their initial state. After this number of episodes, the door state is inverted. |
| MAZE | Name of the config file in which the maze is defined. These files are located in the folder 'worlds/maze2d'. |
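A configuration for this environment might then look as follows. The values below are placeholders chosen for illustration, not documented defaults; the maze file name in particular is hypothetical.

```python
# Hypothetical example configuration for the Maze 2D environment.  The values
# are illustrative placeholders; actual maze definition files are located in
# the folder 'worlds/maze2d'.
maze2d_config = {
    "episodesUntilDoorChange": 20,       # placeholder, not a documented default
    "MAZE": "some_maze_definition.cfg",  # placeholder file name
}
```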
This module contains an implementation of the maze 2d dynamics which can be used in a world of the mmlf.framework.
2008-03-08, Jan Hendrik Metzen (jhm@informatik.uni-bremen.de)
The two-dimensional maze cliff environment.
In this maze, there are two alternative ways from the start to the goal state: a short one that leads along a dangerous cliff and a long but secure one. If the agent happens to step into the cliff area, it receives a huge negative reward (configurable via cliffPenalty) and is reset to the start state. By default, the maze is deterministic, i.e. the agent always moves in the direction it chooses. However, the parameter stochasticity allows one to control the stochasticity of the environment. For instance, when stochasticity is set to 0.01, the agent performs a random move instead of the chosen one with probability 0.01.
The maze structure is as follows, where "S" is the start state, "G" the goal state, and "C" a cliff field:

```
**********
*        *
*        *
*        *
SCCCCCCCCCCG
**********
```
| Parameter | Description |
|---|---|
| cliffPenalty | The reward an agent obtains when stepping into the cliff area. |
| stochasticity | The stochasticity of the state transition matrix. With probability 1-stochasticity the desired transition is made; otherwise a random transition occurs. |
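The effect of the stochasticity parameter can be sketched as follows (an illustration of the description above, not the actual implementation):

```python
import random

# With probability `stochasticity` a random action is executed instead of the
# one the agent chose; otherwise the chosen action is carried out (see above).
def effective_action(chosen_action, stochasticity,
                     all_actions=("up", "down", "left", "right")):
    if random.random() < stochasticity:
        return random.choice(all_actions)
    return chosen_action
```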
Module that implements the mountain car environment and its dynamics.
The mountain car environment.
In the mountain car environment, the agent has to control a car which is situated somewhere in a valley between two hills. The goal of the agent is to reach the top of the right hill. Unfortunately, the engine of the car is not strong enough to reach the top of the hill directly from many start states. Thus, the car first has to drive in the wrong direction to gather enough potential energy.
The agent can either accelerate left, right, or coast. Thus, the action space is discrete with three discrete actions. The agent observes two continuous state components: The current position and velocity of the car. The start state of the car is stochastically initialised.
| Parameter | Description |
|---|---|
| maxStepsPerEpisode | The maximum number of steps the agent is given to reach the goal. Benchmark default: 500. |
| accelerationFactor | A factor that influences how strong the car's engine is relative to the slope of the hill. Benchmark default: 0.001. |
| maxGoalVelocity | Maximum velocity the agent may have when reaching the goal. If smaller than 0.07, this effectively turns the task into MountainPark instead of MountainCar. Benchmark default: 0.07. |
| positionNoise | Noise that is added to the agent's observation of the position. Benchmark default: 0.0. |
| velocityNoise | Noise that is added to the agent's observation of the velocity. Benchmark default: 0.0. |
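For reference, the benchmark defaults from the table above collected into a plain Python dictionary (how they are passed to the environment via the MMLF world configuration is not shown here):

```python
# Benchmark defaults from the table above, collected into a plain Python dict.
mountain_car_config = {
    "maxStepsPerEpisode": 500,
    "accelerationFactor": 0.001,
    "maxGoalVelocity": 0.07,
    "positionNoise": 0.0,
    "velocityNoise": 0.0,
}
```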
Module that implements the partially observable double pole balancing environment.
The partially observable double pole balancing environment
In the partially observable double pole balancing environment, the task of the agent is to control a cart such that two poles which are mounted on the cart stay in a nearly vertical position (to balance them). At the same time, the cart has to stay in a confined region. In contrast to the fully observable double pole balancing environment, the agent observes only the current position of the cart and the angles of the two poles, but not their velocities. This renders the problem non-Markovian.
In every time step, the agent can apply a force between -10N and 10N in order to accelerate the cart. Thus, the action space is one-dimensional and continuous. The internal state consists of the cart's current position and velocity as well as the poles' angles and angular velocities and is thus six-dimensional and continuous; however, only the three position and angle components are observable to the agent (see above).
The config dict of the environment expects the following parameters:
| Parameter | Description |
|---|---|
| GRAVITY | The gravity force. Benchmark default: -9.8. |
| MASSCART | The mass of the cart. Benchmark default: 1.0. |
| TAU | The time step between two commands of the agent. Benchmark default: 0.02. |
| MASSPOLE_1 | The mass of pole 1. Benchmark default: 0.1. |
| MASSPOLE_2 | The mass of pole 2. Benchmark default: 0.01. |
| LENGTH_1 | The length of pole 1. Benchmark default: 0.5. |
| LENGTH_2 | The length of pole 2. Benchmark default: 0.05. |
| MUP | Coefficient of friction of the poles' hinges. Benchmark default: 0.000002. |
| MUC | Coefficient that controls friction. Benchmark default: 0.0005. |
| INITIALPOLEANGULARPOSITION1 | Initial angle of pole 1. Benchmark default: 4.0. |
| MAXCARTPOSITION | The maximal distance the cart is allowed to move away from its start position. Benchmark default: 2.4. |
| MAXPOLEANGULARPOSITION1 | Maximal angle pole 1 is allowed to take on. Benchmark default: 36.0. |
| MAXPOLEANGULARPOSITION2 | Maximal angle pole 2 is allowed to take on. Benchmark default: 36.0. |
| MAXSTEPS | The number of steps the agent must balance the poles. Benchmark default: 100000. |
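The only difference to the fully observable variant lies in what the agent observes. The sketch below illustrates the reduction from the six-dimensional internal state to a three-dimensional observation; the dictionary key names are assumptions, not the identifiers used by MMLF.

```python
# Illustrative only: map the six-dimensional internal state (positions and
# velocities) to the three-dimensional observation (positions/angles only)
# described above.  The key names are assumed for this sketch.
def observe(full_state):
    return {"cartPosition": full_state["cartPosition"],
            "poleAngle1": full_state["poleAngle1"],
            "poleAngle2": full_state["poleAngle2"]}
```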
Module that contains the pinball maze environment.
The pinball maze environment class.
See also
George Konidaris and Andrew G. Barto, "Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining", in Advances in Neural Information Processing Systems, 2009.
New in version 0.9.9.
| Parameter | Description |
|---|---|
| DRAG | Factor that slows the ball in each time step (multiplied with the velocity after each step). |
| NOISE | Gaussian noise with mean MU_POS for the position [x, y] and mean MU_VEL for the velocity [xdot, ydot]; as a simplification, the covariance matrix is a unit matrix multiplied with SIGMA (see the sketch below this table). |
| THRUST_PENALTY | Reward the agent obtains each time it accelerates the ball. |
| STEP_PENALTY | Reward the agent obtains in each time step in which it neither thrusts nor terminates. |
| END_EPISODE_REWARD | Reward the agent obtains if the ball reaches the goal. |
| SUBSTEPS | Number of dynamics steps of the environment between two of the agent's actions. |
| MAZE | Name of the config file in which the maze is defined. These files are located in the folder 'worlds/pinball_maze'. |
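The NOISE parameter can be read as follows: each observed component is perturbed by independent Gaussian noise whose covariance is a unit matrix scaled by SIGMA. The sketch below illustrates that reading; the sampling code is an assumption, not the MMLF implementation.

```python
import random

# Illustrative reading of the NOISE parameter: independent Gaussian noise with
# mean MU_POS on the position components and MU_VEL on the velocity components,
# and a covariance matrix equal to the unit matrix times SIGMA (hence a common
# standard deviation of sqrt(SIGMA)).  This is a sketch, not the MMLF code.
def noisy_observation(x, y, xdot, ydot, mu_pos, mu_vel, sigma):
    std = sigma ** 0.5
    return (x + random.gauss(mu_pos, std),
            y + random.gauss(mu_pos, std),
            xdot + random.gauss(mu_vel, std),
            ydot + random.gauss(mu_vel, std))
```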
Module that contains the seventeen & four environment
This module contains a simplified implementation of the card game seventeen & four, in which the agent takes the role of the player and plays against a hard-coded dealer.
The seventeen & four environment
This environment implements a simplified form of the card game seventeen & four, in which the agent takes the role of the player and plays against a hard-coded dealer.
The player initially starts with two randomly drawn cards with values of 2, 3, 4, 7, 8, 9, 10, or 11. The goal is to get a set of cards whose sum is as close as possible to 21. The agent can stick with two cards or sequentially draw arbitrarily many additional cards. If the sum of cards becomes greater than 21, the agent loses and gets a reward of -1. If the agent stops with a card sum of less than 22, a hard-coded dealer policy starts playing against the agent. This dealer draws cards until it has either equal or more points than the agent or more than 21. In the first case, the dealer wins and the agent gets a reward of -1; otherwise the player wins and gets a reward of 0.
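The dealer behaviour described above can be sketched in a few lines. The card values and the drawing rule are taken from the description; the function itself is an illustration, not the MMLF implementation.

```python
import random

# Card values available in this simplified game (see description above).
CARD_VALUES = [2, 3, 4, 7, 8, 9, 10, 11]

def dealer_reward(player_sum):
    """Sketch of the hard-coded dealer policy described above.

    Assumes the player has stopped with player_sum <= 21.  The dealer draws
    cards until it has at least as many points as the player (or busts by
    exceeding 21).  Returns the agent's reward: -1 if the dealer wins without
    exceeding 21, otherwise 0.
    """
    dealer_sum = 0
    while dealer_sum < player_sum:
        dealer_sum += random.choice(CARD_VALUES)
    return -1 if dealer_sum <= 21 else 0
```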
Module that implements the single pole balancing environment and its dynamics.
The single pole balancing environment.
In the single pole balancing environment, the task of the agent is to control a cart such that a pole which is mounted on the cart stays in a nearly vertical position (to balance it). At the same time, the cart has to stay in a confined region.
In every time step, the agent can apply a force between -2N and 2N in order to accelerate the cart. Thus, the action space is one-dimensional and continuous. The state consists of the cart's current position and velocity as well as the pole's angle and angular velocity. Thus, the state space is four-dimensional and continuous.
The config dict of the environment expects the following parameters:

| Parameter | Description |
|---|---|
| GRAVITY | The gravity force. Benchmark default: -9.8. |
| MASSCART | The mass of the cart. Benchmark default: 1.0. |
| MASSPOLE | The mass of the pole. Benchmark default: 0.1. |
| TOTAL_MASS | The total mass (pole + cart). Benchmark default: 1.1. |
| LENGTH | The length of the pole. Benchmark default: 0.5. |
| POLEMASS_LENGTH | The center of mass of the pole. Benchmark default: 0.05. |
| TAU | The time step between two commands of the agent. Benchmark default: 0.02. |
| MAXCARTPOSITION | The maximal distance the cart is allowed to move away from its start position. Benchmark default: 7.5. |
| MAXPOLEANGULARPOSITION | Maximal angle the pole is allowed to take on. Benchmark default: 0.7. |
| MAXSTEPS | The number of steps the agent must balance the pole. Benchmark default: 100000. |
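For reference, the benchmark defaults from the table above as a plain Python dictionary (how they are passed via the MMLF world configuration is not shown):

```python
# Benchmark defaults from the table above, collected into a plain Python dict.
single_pole_config = {
    "GRAVITY": -9.8,
    "MASSCART": 1.0,
    "MASSPOLE": 0.1,
    "TOTAL_MASS": 1.1,
    "LENGTH": 0.5,
    "POLEMASS_LENGTH": 0.05,
    "TAU": 0.02,
    "MAXCARTPOSITION": 7.5,
    "MAXPOLEANGULARPOSITION": 0.7,
    "MAXSTEPS": 100000,
}
```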