Writing an environment

This tutorial explains how you can implement your own environment for the MMLF.

Note

Implementing a new environment is easier with a local installation of the MMLF (see Installation Tutorial).

See also

Get an overview of the existing environments in Existing environments

Learning about the basic structure of MMLF environments

To begin, take a look into the worlds/linear_markov_chain/environments subdirectory of the MMLF and open linear_markov_chain_environment.py in the Python editor of your choice. The Linear Markov Chain is a simple, straightforward environment that demonstrates the inner workings of an MMLF environment well.

What you can learn from the environment is the following:
  • Each environment has to be a subclass of SingleAgentEnvironment
  • Each environment class must have a static attribute DEFAULT_CONFIG_DICT, which contains the parameters that are available for customizing the environment and their default values.
  • The __init__ method gets passed additional arguments (*args) and keyword arguments (**kwargs). These MUST be passed on to the constructor of the superclass, as in super(LinearMarkovChainEnvironment, self).__init__(useGUI=useGUI, *args, **kwargs)
  • Each environment must have an EnvironmentInfo attribute that specifies, among other things, which communication protocol version the environment supports and which capabilities agents must have in order to be used in this environment.
  • The __init__ method defines the state space and action space of the environment as well as its initial state. In the simplest form, these spaces are defined as dicts that map each dimension name onto a pair specifying whether the dimension has discrete or continuous values and which values may occur (so-called ‘old-style’ spaces).
  • The evaluateAction(self, actionObject) method is called to compute the effect of an action chosen by the agent on the environment. The state transition is computed, it is checked whether the episode has finished (i.e. whether a terminal state has been reached), and the reward is computed accordingly. The method returns a dictionary containing the immediate reward, the terminal state (if one has been reached; otherwise None), the current state (possibly the initial state of the next episode if the episode has terminated), and a boolean indicating whether a new episode starts (the shape of this dictionary is illustrated after this list).
  • In each environment module, the module-level attribute EnvironmentClass needs to be set to the class that inherits from SingleAgentEnvironment. This assignment is usually located at the end of the module: EnvironmentClass = LinearMarkovChainEnvironment
  • Furthermore, the module-level attribute EnvironmentName should be set to the name of the environment, e.g. EnvironmentName = “Linear Markov Chain”. This name is used for instance in the GUI.
  • The environment can send messages to the logger by calling “self.environmentLog.info(message)”
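
To illustrate this, the sketch below shows the shape of such a return dictionary. The keys ("reward", "terminalState", "nextState", "startNewEpisode") are the ones used by the LinearMarkovChainEnvironment listed at the end of this tutorial; the dimension name "field" and the concrete values are only illustrative:

    # Sketch of the dictionary returned by evaluateAction after a
    # non-terminal step ("field" is an example state space dimension)
    {"reward": -1,                # immediate reward for the executed action
     "terminalState": None,       # terminal state, or None if none was reached
     "nextState": {"field": 11},  # state of the environment after the action
     "startNewEpisode": False}    # True if a new episode has just started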

Writing a new MMLF environment

To write a new MMLF environment, the following steps must be executed:
  1. Go into the worlds subdirectory of the MMLF and create a new world directory (e.g. example_world). Make this subdirectory a Python package by adding an empty __init__.py file. Create a subdirectory “environments” in the world directory. In this “environments” subdirectory, again create an empty __init__.py file and a file that will contain the actual Python environment module (e.g. example_environment.py). The resulting directory layout is sketched after this list.

  2. Open the example_environment.py file. In this file, you have to implement a subclass of SingleAgentEnvironment. Let’s call this subclass ExampleEnvironment; a minimal skeleton of this class is sketched after this list.

  3. The environment class must have a class attribute DEFAULT_CONFIG_DICT, which is a dictionary that contains the parameters that are available for customizing the environment and their default values. These parameters can later be configured, e.g., in the MMLF GUI. Each parameter that customizes the behaviour of your environment should be contained in this dictionary. If your environment has no parameters, you can simply set “DEFAULT_CONFIG_DICT = {}”

  4. In the __init__ method of the class, you have to specify EnvironmentInfo. Adapt this object such that it reflects the demands your environment poses on agents that are to be used in it.

  5. A StateSpace and an ActionSpace must be defined. These can be defined either by specifying each of their dimensions explicitly (see State and Action Spaces) and adding them to the spaces, or by defining the spaces directly the “old-style” way. Such an old-style definition is a dictionary that maps each dimension name to a shorthand definition of that dimension:

    {"column": ("discrete", [0,1,2,3]),
    "row": ("discrete", [3,4,5])}
    

    This defines a space with two discrete dimensions with the names “column” and “row”. The “column” dimension can take on the values 0, 1, 2, and 3, and the “row” dimension the values 3, 4, and 5.

  6. The getInitialState method must be implemented: This method is used for sampling a start state at the beginning of each episode. This state is currently NOT an MMLF state object but a dictionary that maps each dimension name to a dimension value. This may change in future releases of the MMLF.

  7. The evaluateAction method is the place where the actual dynamics of the environment are implemented. It gets the actionObject chosen by the agent as a parameter. This actionObject is a dictionary mapping the action space dimensions onto the values chosen by the agent for the respective dimension. Thus, via “actionObject[‘force’]”, one could access the force the agent has chosen (assuming that force is an action space dimension). The implementation of the method depends on your environment; what is important is that a state transition takes place and a reward is computed. The method must thus return a dictionary containing the immediate reward, the terminal state (if one has been reached; otherwise None), the current state (possibly the initial state of the next episode if the episode has terminated), and a boolean indicating whether a new episode starts.

  8. Create a module attribute EnvironmentClass and assign the environment class to it: “EnvironmentClass = ExampleEnvironment”

  9. Create a module attribute EnvironmentName and assign the environment’s name to it: EnvironmentName = "Example"
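
After these steps, the directory layout of the new world looks as follows:

    worlds/
        example_world/
            __init__.py
            environments/
                __init__.py
                example_environment.py

The sketch below shows what a minimal example_environment.py could roughly look like. It is only an illustration, not code shipped with the MMLF: the parameter "length", the dimensions "field" and "action", the dynamics, and the reward values are placeholders that you would replace with the specifics of your environment, the EnvironmentInfo flags must be adapted accordingly, and logging as well as the observables used in the LinearMarkovChainEnvironment below are omitted for brevity.

    from copy import deepcopy

    from mmlf.framework.spaces import StateSpace, ActionSpace
    from mmlf.framework.protocol import EnvironmentInfo
    from mmlf.environments.single_agent_environment import SingleAgentEnvironment

    class ExampleEnvironment(SingleAgentEnvironment):
        """ A minimal example environment (sketch only). """

        # Step 3: parameters for customizing the environment ({} if there are none)
        DEFAULT_CONFIG_DICT = {"length": 11}

        def __init__(self, useGUI, *args, **kwargs):
            # Step 4: the demands this environment poses on agents
            self.environmentInfo = EnvironmentInfo(versionNumber="0.3",
                                                   environmentName="Example",
                                                   discreteActionSpace=True,
                                                   episodic=True,
                                                   continuousStateSpace=False,
                                                   continuousActionSpace=False,
                                                   stochastic=False)

            # Pass useGUI and the remaining arguments on to the superclass
            super(ExampleEnvironment, self).__init__(useGUI=useGUI, *args, **kwargs)

            # Step 5: state and action space, defined the "old-style" way
            oldStyleStateSpace = {"field": ("discrete",
                                            range(self.configDict["length"]))}
            self.stateSpace = StateSpace()
            self.stateSpace.addOldStyleSpace(oldStyleStateSpace, limitType="soft")

            oldStyleActionSpace = {"action": ("discrete", ["left", "right"])}
            self.actionSpace = ActionSpace()
            self.actionSpace.addOldStyleSpace(oldStyleActionSpace, limitType="soft")

            # Initial and current state of the environment
            self.initialState = {"field": 0}
            self.currentState = deepcopy(self.initialState)

        def getInitialState(self):
            """ Step 6: return the start state of an episode. """
            return self.initialState

        def evaluateAction(self, actionObject):
            """ Step 7: compute the effect of the agent's action. """
            # Placeholder dynamics: "right" increases the field index by one,
            # "left" decreases it (but not below 0)
            if actionObject["action"] == "right":
                self.currentState["field"] += 1
            else:
                self.currentState["field"] = max(0, self.currentState["field"] - 1)

            # The episode ends when the last field has been reached
            episodeFinished = \
                self.currentState["field"] == self.configDict["length"] - 1

            terminalState = self.currentState if episodeFinished else None
            reward = 10 if episodeFinished else -1

            if episodeFinished:
                # Reset the environment for the next episode
                self.currentState = deepcopy(self.initialState)

            return {"reward": reward,
                    "terminalState": terminalState,
                    "nextState": self.currentState,
                    "startNewEpisode": episodeFinished}

    # Steps 8 and 9: module-level attributes required by the MMLF
    EnvironmentClass = ExampleEnvironment
    EnvironmentName = "Example"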

LinearMarkovChainEnvironment

# Maja Machine Learning Framework
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published
# by the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.

# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.

# You should have received a copy of the GNU General Public License
# along with this program; if not, see <http://www.gnu.org/licenses/>.

# Author: Jan Hendrik Metzen  (jhm@informatik.uni-bremen.de)
# Created: 2011/04/05
""" A linear markov chain environment. """

__author__ = "Jan Hendrik Metzen"
__copyright__ = "Copyright 2011, University Bremen, AG Robotics"
__credits__ = ['Mark Edgington']
__license__ = "GPLv3"
__version__ = "1.0"
__maintainer__ = "Jan Hendrik Metzen"
__email__ = "jhm@informatik.uni-bremen.de"

from copy import deepcopy

from mmlf.framework.spaces import StateSpace, ActionSpace
from mmlf.framework.protocol import EnvironmentInfo
from mmlf.environments.single_agent_environment import SingleAgentEnvironment

# Each environment has to inherit directly or indirectly from SingleAgentEnvironment
class LinearMarkovChainEnvironment(SingleAgentEnvironment):
    """ A linear markov chain.
    
    The agent starts in the middle of this linear markov chain. He can either
    move right or left. The chain is not stochastic, i.e. when the agent 
    wants to move right, the state is decreased with probability 1 by 1.  
    When the agent wants to move left, the state is increased with probability 1
    by 1 accordingly.
    
    .. versionadded:: 0.9.10
       Added LinearMarkovChain environment
    
    **CONFIG DICT**
        :length: : The number of states of the linear Markov chain
    
    """
    
    # Add default configuration for this environment to this static dict
    # This specific parameter controls how long the linear markov chain is
    # (i.e. how many states there are)
    DEFAULT_CONFIG_DICT = {"length" : 21}
    
    def __init__(self, useGUI, *args, **kwargs):
        # Create the environment info
        self.environmentInfo = \
            EnvironmentInfo(# Which communication protocol version can the 
                            # environment handle?
                            versionNumber="0.3",
                            # Name of the environment (can be chosen arbitrarily) 
                            environmentName="LinearMarkovChain",
                            # Is the action space of this environment discrete?
                            discreteActionSpace=True,
                            # Is the environment episodic?
                            episodic=True,
                            # Is the state space of environment continuous?
                            continuousStateSpace=False,
                            # Is the action space of environment continuous?
                            continuousActionSpace=False,
                            # Is the environment stochastic?
                            stochastic=False)

        # Calls constructor of base class
        # After this call, the environment has an attribute "self.configDict".
        # The values of this dict are evaluated, i.e. if 'length' was given as
        # the string '100', configDict['length'] will contain the integer 100.
        super(LinearMarkovChainEnvironment, self).__init__(useGUI=useGUI, *args, **kwargs)
               
        # The state space of the linear markov chain
        oldStyleStateSpace =  {"field": ("discrete", range(self.configDict["length"]))}
        
        self.stateSpace = StateSpace()
        self.stateSpace.addOldStyleSpace(oldStyleStateSpace, limitType="soft")
        
        # The action space of the linear markov chain
        oldStyleActionSpace =  {"action": ("discrete", ["left", "right"])}
        
        self.actionSpace = ActionSpace()
        self.actionSpace.addOldStyleSpace(oldStyleActionSpace, limitType="soft")
        
        # The initial state of the environment
        self.initialState =  {"field": self.configDict["length"] / 2}
        # The current state is initially set to the initial state
        self.currentState = deepcopy(self.initialState)

    ########################## Interface Functions #####################################
    def getInitialState(self):
        """ Returns the initial state of the environment """
        self.environmentLog.debug("Episode starts in state '%s'." 
                                    % (self.initialState['field']))
        return self.initialState
    
    def evaluateAction(self, actionObject):
        """ Execute an agent's action in the environment.
        
        Take an actionObject containing the action of an agent, and evaluate 
        this action, calculating the next state, and the reward the agent 
        should receive for having taken this action.
        
        Additionally, decide whether the episode should continue,
        or end after the reward has been issued to the agent.
        
        This method returns a dictionary with the following keys:
           :reward: : An integer or float representing the agent's reward.
                      If reward == None, then no reward is given to the agent.
           :startNewEpisode: : True if the agent's action has caused the episode
                               to finish.
           :nextState: : A State object which contains the state the environment
                         takes on after executing the action. This might be the
                         initial state of the next episode if a new episode
                         has just started (startNewEpisode == True)
           :terminalState: : A State object which contains the terminal state 
                             of the environment in the last episode if a new 
                             episode has just started (startNewEpisode == True). 
                             Otherwise None.        
        """
        action = actionObject['action']
        previousState = self.currentState['field']
        
        # Change state of environment deterministically
        if action == 'left':
            self.currentState['field'] -= 1
        else:
            self.currentState['field'] += 1
            
        self.environmentLog.debug("Agent chose action '%s' which caused a transition from '%s' to '%s'." 
                                    % (action, previousState, self.currentState['field']))
        
        # Check if the episode is finished (i.e. the goal is reached)
        episodeFinished = self._checkEpisodeFinished()
        
        terminalState = self.currentState if episodeFinished else None
        
        if episodeFinished:
            self.episodeLengthObservable.addValue(self.episodeCounter,
                                                  self.stepCounter + 1)
            self.returnObservable.addValue(self.episodeCounter,
                                           -self.stepCounter)
            self.environmentLog.debug("Terminal state '%s' reached." 
                                            % self.currentState['field'])
            self.environmentLog.info("Episode %s lasted for %s steps." 
                                     % (self.episodeCounter, self.stepCounter  + 1))
            
            # Reaching the rightmost state of the chain yields reward +10,
            # reaching the leftmost state yields -10
            reward = 10 if self.currentState['field'] != 0 else -10
            
            self.stepCounter = 0
            self.episodeCounter += 1
            
            # Reset the simulation to the initial state (always the same)
            self.currentState = deepcopy(self.initialState)
        else:
            reward = -1
            self.stepCounter += 1
        
        resultsDict = {"reward" : reward, 
                       "terminalState" : terminalState,
                       "nextState" : self.currentState,
                       "startNewEpisode" : episodeFinished}
        return resultsDict
        
    def _checkEpisodeFinished(self):
        """ Checks whether the episode is finished.
        
        An episode is finished whenever the leftmost or rightmost state of the
        chain is reached.
        """        
        return self.currentState['field'] in [0, self.configDict['length']-1]
    
# Each module that implements an environment must have a module-level attribute 
# "EnvironmentClass" that is set to the class that inherits from SingleAgentEnvironment
EnvironmentClass = LinearMarkovChainEnvironment
# Furthermore, the name of the environment has to be assigned to "EnvironmentName".
# This name is used in the GUI.
EnvironmentName = "Linear Markov Chain"