.. _writing_environments:

Writing an environment
=======================

This tutorial explains how you can implement your own environment for the MMLF.

.. note:: Implementing a new environment is easier with a local installation of the MMLF (see :ref:`Installation Tutorial `).

.. seealso:: Get an overview of the existing environments in :ref:`environment_list`

Learning about the basic structure of MMLF environments
--------------------------------------------------------

To begin, please take a look into the worlds/linear_markov_chain/environments subdirectory of the MMLF and open linear_markov_chain_environment.py in the python editor of your choice. The :ref:`Linear Markov Chain <linear_markov_chain_environment>` is a quite simple and straightforward environment which demonstrates well the inner life of an environment.

What you can learn from this environment is the following:

* Each environment has to be a subclass of SingleAgentEnvironment.
* Each environment class must have a static attribute DEFAULT_CONFIG_DICT, which contains the parameters that are available for customizing the environment and their default values.
* The __init__ method gets passed additional arguments (``*args``) and keyword arguments (``**kwargs``). These MUST be passed on to the superclass' constructor using ``super(SingleAgentEnvironment, self).__init__(useGUI, *args, **kwargs)``.
* Each environment must have an EnvironmentInfo attribute that specifies which communication protocol the environment supports, which capabilities agents must have in order to be used in this environment, etc.
* The __init__ method defines the :ref:`state space <state_and_action_spaces>` and :ref:`action space <state_and_action_spaces>` of the environment as well as its initial state. In the simplest form, these spaces are defined as dicts that map each dimension name onto a pair specifying whether the dimension has discrete or continuous values and which values may occur (so-called 'old-style' spaces).
* The evaluateAction(self, actionObject) method is called to compute the effect of an action chosen by the agent onto the environment. The state transition is computed, and it is checked whether an episode has finished (i.e. whether a terminal state has been reached). Depending on this, the reward is computed. The method returns a dictionary containing the immediate reward, the terminal state (if one is reached; otherwise None), the current state (possibly the initial state of the next episode if the episode has been terminated), and a boolean indicating whether a new episode starts.
* In each environment module, the module-level attribute EnvironmentClass needs to be set to the class that inherits from SingleAgentEnvironment. This assignment is usually located at the end of the module: EnvironmentClass = LinearMarkovChainEnvironment
* Furthermore, the module-level attribute EnvironmentName should be set to the name of the environment, e.g. EnvironmentName = "Linear Markov Chain". This name is used, for instance, in the GUI.
* The environment can send messages to the logger by calling "self.environmentLog.info(message)".
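
To make the last points more concrete, the snippet below shows what an old-style space definition and the dictionary returned by evaluateAction typically look like. It is only an illustrative sketch: the dimension name "field" and the exact dictionary keys are assumptions made for this tutorial; the authoritative reference is the Linear Markov Chain source included at the end of this page.

.. code-block:: python

   # An old-style state space: dimension name -> (value type, allowed values).
   # The dimension name "field" is only an example.
   oldStyleStateSpace = {"field": ("discrete", [0, 1, 2, 3, 4])}

   # Shape of the dictionary returned by evaluateAction(). The key names are
   # assumptions; compare with linear_markov_chain_environment.py.
   result = {"reward": -1.0,             # immediate reward
             "terminalState": None,      # terminal state, or None if not terminal
             "nextState": {"field": 2},  # current state (possibly the initial
                                         # state of the next episode)
             "startNewEpisode": False}   # whether a new episode starts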

Writing a new MMLF environment
-------------------------------

For writing a new MMLF environment, the following steps must be executed:

#. Go into the worlds subdirectory of the MMLF and create a new world directory (e.g. example_world). Make this subdirectory a python package by adding an empty __init__.py file. Create a subdirectory "environments" in the world directory. In this "environments" subdirectory, create again an empty __init__.py file and a file that contains the actual python environment module (e.g. example_environment.py).
#. Open the example_environment.py file. In this file, you have to implement a subclass of SingleAgentEnvironment. Let's call this subclass ExampleEnvironment.
#. The environment class must have a class attribute DEFAULT_CONFIG_DICT, which is a dictionary that contains the parameters that are available for customizing the environment and their default values. These parameters can later on be configured, e.g., in the MMLF GUI. Each parameter that can customize the behaviour of your environment should be contained in this dictionary. If your environment has no parameters, you can simply set "DEFAULT_CONFIG_DICT = {}".
#. In the __init__ method of the class, you have to specify EnvironmentInfo. Adapt this object such that it reflects the demands your environment poses onto agents that can be used in it.
#. The state and action spaces must be defined. These can either be defined by defining each of their dimensions explicitly (see :ref:`state_and_action_spaces`) and adding them to the spaces, or by defining the spaces directly in the "old-style" way. Such an old-style definition is a dictionary mapping the dimension names to a shorthand definition of them::

      {"column": ("discrete", [0, 1, 2, 3]),
       "row": ("discrete", [3, 4, 5])}

   This defines a space with two discrete dimensions with the names "column" and "row". The "column" dimension can take on the values 0, 1, 2, and 3 and the "row" dimension the values 3, 4, and 5.
#. The *getInitialState* method must be implemented: This method is used for sampling a start state at the beginning of each episode. This state is currently NOT an MMLF state object but a dictionary which maps each dimension name to a dimension value. This may change in future releases of the MMLF.
#. The *evaluateAction* method is the place where the actual dynamics of the environment are implemented. It gets the actionObject chosen by the agent as a parameter. This actionObject is a dictionary mapping the action space dimensions onto the values chosen by the agent for the respective dimension. Thus, via "actionObject['force']", one could access the force the agent has chosen (let force be an action space dimension). The implementation of the method depends on your environment; what is important is that a state transition happens and a reward is computed. The method must thus return a dictionary containing the immediate reward, the terminal state (if one is reached; otherwise None), the current state (possibly the initial state of the next episode if the episode has been terminated), and a boolean indicating whether a new episode starts.
#. Create a module attribute EnvironmentClass and assign the environment class to it: "EnvironmentClass = ExampleEnvironment"
#. Create a module attribute EnvironmentName and assign the environment's name to it: "EnvironmentName = "Example""

.. _linear_markov_chain_environment:

LinearMarkovChainEnvironment
-----------------------------

.. literalinclude:: ../../mmlf/worlds/linear_markov_chain/environments/linear_markov_chain_environment.py
    :language: python
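
If you prefer to start your own environment from a template rather than from the Linear Markov Chain source above, the following skeleton collects the steps of this tutorial into a single example_environment.py. It is only a sketch, not an authoritative MMLF API reference: the import paths, the EnvironmentInfo arguments, the configDict attribute, the dimension names "field" and "action", and the keys of the dictionary returned by evaluateAction are assumptions that should be checked against linear_markov_chain_environment.py.

.. code-block:: python

   # Minimal sketch of an example_environment.py following the steps above.
   # NOTE: the import paths below are assumptions; use the imports found in
   # linear_markov_chain_environment.py for your MMLF version.
   from mmlf.framework.protocol import EnvironmentInfo                      # assumed path
   from mmlf.worlds.single_agent_environment import SingleAgentEnvironment  # assumed path


   class ExampleEnvironment(SingleAgentEnvironment):

       # Parameters that can be customized (e.g. in the MMLF GUI) and their defaults
       DEFAULT_CONFIG_DICT = {"length": 5}

       def __init__(self, useGUI, *args, **kwargs):
           # Describes the demands this environment poses onto agents; the
           # constructor arguments are omitted here and must be adapted.
           self.environmentInfo = EnvironmentInfo()

           # Additional (keyword) arguments MUST be passed on to the superclass,
           # as prescribed by this tutorial.
           super(SingleAgentEnvironment, self).__init__(useGUI, *args, **kwargs)

           # Old-style space definitions: dimension name -> (value type, values).
           # configDict is assumed to be provided by the superclass (merged from
           # DEFAULT_CONFIG_DICT and the user's configuration). How these dicts
           # are attached to the environment's state/action space objects is
           # version-specific; see the Linear Markov Chain example.
           self.oldStyleStateSpace = {
               "field": ("discrete", list(range(self.configDict["length"])))}
           self.oldStyleActionSpace = {"action": ("discrete", ["left", "right"])}

           self.currentState = self.getInitialState()

       def getInitialState(self):
           # A start state is currently a plain dict mapping dimension name to value
           return {"field": 0}

       def evaluateAction(self, actionObject):
           # The actionObject maps action space dimensions to the chosen values
           action = actionObject["action"]

           # State transition
           self.currentState["field"] += 1 if action == "right" else -1

           # Check for a terminal state and compute the reward
           episodeFinished = self.currentState["field"] >= self.configDict["length"] - 1
           reward = 10.0 if episodeFinished else -1.0
           terminalState = dict(self.currentState) if episodeFinished else None
           if episodeFinished:
               self.environmentLog.info("Episode finished, starting a new one.")
               self.currentState = self.getInitialState()

           # The key names below are assumptions; check the Linear Markov Chain
           # implementation for the keys expected by your MMLF version.
           return {"reward": reward,
                   "terminalState": terminalState,
                   "nextState": self.currentState,
                   "startNewEpisode": episodeFinished}


   # Module-level attributes required by the MMLF
   EnvironmentClass = ExampleEnvironment
   EnvironmentName = "Example"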