Writing an agent

This tutorial explains how you can write your own learning agent for the MMLF.

Note

Writing an agent is easier with a local installation of the MMLF (see Installation Tutorial).

See also

Get an overview of the existing agents in Existing agents

Learning about the basic structure of MMLF agents

To get started, take a look at the agents subdirectory of the MMLF and open random_agent.py in the Python editor of your choice. The RandomAgent is a simple and straightforward agent that demonstrates the inner life of an agent well (though an intelligent agent might choose its actions differently ;-)).

What you can learn from this agent is the following (a condensed sketch illustrating these points follows the list):
  • Each agent has to be a subclass of AgentBase
  • Each agent class must have a static attribute DEFAULT_CONFIG_DICT, which contains the parameters that are available for customizing the agent’s behaviour and their default values.
  • The __init__ method gets passed additional arguments (*args) and keyword arguments (**kwargs). These MUST be passed on to the superclass’ constructor using super(RandomAgent, self).__init__(*args, **kwargs)
  • Each agent must have an AgentInfo attribute that specifies in which kinds of environments it can be used, which communication protocol it supports etc.
  • The setStateSpace(self, stateSpace) method is called to inform the agent about the structure of the state space. A default implementation of this method is contained in the AgentBase class; if this implementation is sufficient the method need not be implemented again.
  • The setActionSpace(self, actionSpace) method is called to inform the agent about the structure of the action space. A default implementation of this method is contained in the AgentBase class; if this implementation is sufficient the method need not be implemented again.
  • The setState(self, state) method is called to inform the agent about the current state of the environment. To correctly interpret the state, the agent has to use the definition of the state space.
  • The getAction(self) method is called to ask the agent for the action it wants to perform. The agent should store its decision in a dictionary that maps each action dimension name to the value chosen for that dimension, for instance: {"gasPedalForce": "extreme", "steeringWheelAngle": 30} (see the action space page). This action dictionary must be converted into an ActionTaken object via the method _generateActionObject(actionDictionary) of AgentBase.
  • The giveReward(self, reward) method is called to reward the agent for its last action(s). The passed reward is a float value. The agent can treat this reward in different ways, e.g. accumulate it, use it directly for policy optimization etc.
  • The nextEpisodeStarted(self) method is called whenever one episode is over and the next one is started, i.e. when the environment is reset. In this method, you should finish all calculations that only make sense within a single episode (such as accumulating the reward obtained during that episode).
  • In each agent module, the module-level attribute AgentClass needs to be set to the class that inherits from AgentBase. This assignment is usually located at the end of the module: AgentClass = RandomAgent
  • Furthermore, the module-level attribute AgentName should be set to the name of the agent, e.g. AgentName = “Random”. This name is used for instance in the GUI.
  • The agent can send messages to the logger by calling “self.agentLog.info(message)”
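
Putting these points together, the overall shape of an agent module looks roughly like the following sketch. It is a condensed, hypothetical agent (the class name MySketchAgent and the agent name "Sketch" are made up for illustration); the complete RandomAgent listing at the end of this page shows all details:

    import mmlf.framework.protocol
    from mmlf.agents.agent_base import AgentBase

    class MySketchAgent(AgentBase):
        """ Minimal sketch of an MMLF agent (for illustration only). """

        # Parameters available for customizing the agent, with their default values
        DEFAULT_CONFIG_DICT = {}

        def __init__(self, *args, **kwargs):
            # Describe which kinds of environments the agent can handle
            self.agentInfo = mmlf.framework.protocol.AgentInfo(
                                    versionNumber = "0.3",
                                    agentName = "Sketch",
                                    continuousState = True,
                                    continuousAction = False,
                                    discreteAction = True,
                                    nonEpisodicCapable = True)
            # Pass the additional arguments on to the superclass' constructor
            super(MySketchAgent, self).__init__(*args, **kwargs)

        def getAction(self):
            """ Request the next action the agent wants to execute """
            # Choose a value for each action dimension ...
            actionDictionary = self.actionSpace.sampleRandomAction()
            # Let the superclass update its internal bookkeeping
            super(MySketchAgent, self).getAction()
            # ... and convert the dictionary into an ActionTaken object
            return self._generateActionObject(actionDictionary)

        def giveReward(self, reward):
            """ Provides a reward to the agent """
            # An agent can send messages to the logger
            self.agentLog.info("Obtained reward %s" % reward)

    # Module-level attributes used by the framework and the GUI
    AgentClass = MySketchAgent
    AgentName = "Sketch"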

Writing a new MMLF agent

Let’s write a new agent. This agent should execute actions independently of the state in a round-robin manner, i.e. when there are three available actions a1, a2, a3, the agent should choose actions in the sequence a1, a2, a3, a1, a2, a3, ... Obviously this is not a very clever approach, but it suffices for this tutorial.
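
The round-robin selection itself can be realized conveniently with itertools.cycle, which yields the elements of a sequence over and over again. The following self-contained snippet (plain Python, independent of the MMLF; the action names a1, a2, a3 are just placeholders) illustrates the action sequence we are after:

    import itertools

    # Three placeholder actions, standing in for the agent's action list
    actions = ["a1", "a2", "a3"]

    # cycle() repeats the sequence indefinitely: a1, a2, a3, a1, a2, a3, ...
    actionIterator = itertools.cycle(actions)

    # Take the first seven choices to see the round-robin pattern
    print [actionIterator.next() for _ in range(7)]
    # prints: ['a1', 'a2', 'a3', 'a1', 'a2', 'a3', 'a1']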

In order to implement a new agent that chooses actions in a round-robin manner, you have to do the following:
  1. Go into the agents subdirectory of the MMLF and create a copy of the random_agent.py (let’s call this copy example_agent.py).

  2. Open example_agent.py and rename the agent class from RandomAgent to ExampleAgent. Replace every occurrence of RandomAgent by ExampleAgent.

  3. Set “DEFAULT_CONFIG_DICT = {}”, since the agent does not have any configuration options.

  4. Set “continuousAction = False”, since the round-robin action selection is only possible for a finite (non-continuous) action set.

  5. Add the following lines at the end of “setActionSpace”:

    # We can only deal with one-dimensional action spaces
    assert self.actionSpace.getNumberOfDimensions() == 1
    # Get a list of all actions this agent might take
    self.actions = self.actionSpace.getActionList()
    # Get name of action dimension
    self.actionDimensionName = self.actionSpace.getDimensionNames()[0]
    # Create an iterator that iterates in a round-robin manner over available actions
    self.nextActionIterator = __import__("itertools").cycle(self.actions)
    
  6. Reimplement the method “getAction()” as follows:

    def getAction(self):
        """ Request the next action the agent wants to execute """
        # Get next action from iterator
        # We are only interested in the value of the first (and only) dimension,
        # thus the "0"
        nextAction = self.nextActionIterator.next()[0]
        # Create a dictionary that maps dimension name to chosen action
        actionDictionary = {self.actionDimensionName : nextAction}

        # Call super class method since this updates some internal information
        # (self.lastState, self.lastAction, self.reward, self.state, self.action)
        super(ExampleAgent, self).getAction()

        # Generate mmlf.framework.protocol.ActionTaken object
        return self._generateActionObject(actionDictionary)
    
  7. Remove the now superfluous methods “setStateSpace()”, “setState()”, “giveReward()”, and “nextEpisodeStarted()”; for these, the agent simply uses the default implementations of the AgentBase class.

  8. Set the AgentClass module attribute appropriately: “AgentClass = ExampleAgent”

  9. Set the AgentName module attribute to something meaningful: “AgentName = "RoundRobin"”

  10. Do not forget to update the comments and the documentation of your new module!

That’s it! Your agent module should now look like the ExampleAgent listing shown below. You can test it in the GUI by selecting “RoundRobin” as the agent.

RandomAgent

# Maja Machine Learning Framework
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published
# by the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.

# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.

# You should have received a copy of the GNU General Public License
# along with this program; if not, see <http://www.gnu.org/licenses/>.

# Author: Jan Hendrik Metzen  (jhm@informatik.uni-bremen.de)
# Created: 2007/07/23

""" MMLF agent that acts randomly

This module defines a simple agent that can interact with an environment.
It chooses all available actions with the same probability.

This module also serves as an example of how to implement an MMLF agent.
"""

__author__ = "Jan Hendrik Metzen"
__copyright__ = "Copyright 2011, University Bremen, AG Robotics"
__credits__ = ['Mark Edgington']
__license__ = "GPLv3"
__version__ = "1.0"
__maintainer__ = "Jan Hendrik Metzen"
__email__ = "jhm@informatik.uni-bremen.de"

from collections import defaultdict

import mmlf.framework.protocol

from mmlf.agents.agent_base import AgentBase

# Each agent has to inherit directly or indirectly from AgentBase
class RandomAgent(AgentBase):
    """ Agent that chooses uniformly randomly among the available actions. """
    
    # Add default configuration for this agent to this static dict
    # This specific parameter controls after how many steps we send information 
    # regarding the accumulated reward to the logger.
    DEFAULT_CONFIG_DICT = {'Reward_log_frequency' : 100}
    
    def __init__(self, *args, **kwargs):
        # Create the agent info
        self.agentInfo = \
            mmlf.framework.protocol.AgentInfo(# Which communication protocol 
                                                 # version can the agent handle?
                                                 versionNumber = "0.3",
                                                 # Name of the agent (can be 
                                                 # chosen arbitrarily) 
                                                 agentName= "Random", 
                                                 # Can the agent be used in 
                                                 # environments with continuous
                                                 # state spaces?
                                                 continuousState = True,
                                                 # Can the agent be used in 
                                                 # environments with continuous
                                                 # action spaces?
                                                 continuousAction = True,
                                                 # Can the agent be used in 
                                                 # environments with discrete
                                                 # action spaces?
                                                 discreteAction = True,
                                                 # Can the agent be used in
                                                 # non-episodic environments
                                                 nonEpisodicCapable = True)
    
        # Calls constructor of base class
        # After this call, the agent has an attribute "self.configDict".
        # The values of this dict are evaluated, i.e. instead of '100' (string),
        # the key 'Reward_log_frequency' will have the value 100 (int).
        super(RandomAgent, self).__init__(*args, **kwargs)
        
        # The superclass AgentBase implements the methods setStateSpace() and
        # setActionSpace() which set the attributes stateSpace and actionSpace
        # They can be overwritten if the agent has to modify these spaces
        # for some reason
        self.stateSpace = None
        self.actionSpace = None
        
        # The agent keeps track of all rewards it obtained in an episode
        # The rewardDict implements a mapping from the episode index to a list
        # of all rewards it obtained in this episode
        self.rewardDict = defaultdict(list)

    ######################  BEGIN COMMAND-HANDLING METHODS ###############################
    
    def setStateSpace(self, stateSpace):
        """ Informs the agent about the state space of the environment
        
        More information about state spaces can be found in 
        :ref:`state_and_action_spaces`
        """
        # We delegate to the superclass, which does the following:
        # self.stateSpace = stateSpace
        # We would not need to reimplement this method at all; it is included
        # here only to show what is going on...
        super(RandomAgent, self).setStateSpace(stateSpace) 

        
    def setActionSpace(self, actionSpace):
        """ Informs the agent about the action space of the environment
        
        More information about action spaces can be found in 
        :ref:`state_and_action_spaces`
        """
        # We delegate to the superclass, which does the following:
        # self.actionSpace = actionSpace
        # We would not need to reimplement this method at all; it is included
        # here only to show what is going on...
        super(RandomAgent, self).setActionSpace(actionSpace) 
    
    def setState(self, state):
        """ Informs the agent of the environment's current state 
        
        More information about (valid) states can be found in 
        :ref:`state_and_action_spaces`
        """
        # We delegate to the superclass, which does the following:
        #     self.state = self.stateSpace.parseStateDict(state) # Parse state dict
        #     self.state.scale(0, 1) # Scale state such that each dimension falls into the bin (0,1)
        #     self.stepCounter += 1 # Count how many steps have passed
        
        # We would not need to reimplement this method at all; it is included
        # here only to show what is going on...
        super(RandomAgent, self).setState(state)

        
    def getAction(self):
        """ Request the next action the agent want to execute """
        # Each action of the agent corresponds to one step
        action = self._chooseRandomAction()
        
        # Call super class method since this updates some internal information
        # (self.lastState, self.lastAction, self.reward, self.state, self.action)
        super(RandomAgent, self).getAction()
        
        return action
    
    def giveReward(self, reward):
        """ Provides a reward to the agent """
        self.rewardDict[self.episodeCounter].append(reward) # remember reward
        # Send a message about the accumulated reward to the logger every
        # self.configDict['Reward_log_frequency'] steps
        if self.stepCounter % self.configDict['Reward_log_frequency'] == 0:
            self.agentLog.info("Reward accumulated after %s steps in episode %s: %s" 
                                % (self.stepCounter, self.episodeCounter,
                                   sum(self.rewardDict[self.episodeCounter])))
        
    def nextEpisodeStarted(self):
        """ Informs the agent that a new episode has started."""
        # We delegate to the superclass, which does the following:
        #     self.episodeCounter += 1
        #     self.stepCounter = 0
        super(RandomAgent, self).nextEpisodeStarted()
    
    ########################  END COMMAND-HANDLING METHODS ###############################
    
    def _chooseRandomAction(self):
        "Chooses an action randomly from the action space"
        
        assert self.actionSpace, "Error: Action requested before actionSpace "\
                                 "was specified"
        
        # We sample a random action from the action space
        # This returns a dictionary with a mapping from action dimension name
        # to the sample value.
        # For instance: {"gasPedalForce": "extreme", "steeringWheelAngle": 30}
        actionDictionary = self.actionSpace.sampleRandomAction()

        # The action dictionary has to be converted into an
        # mmlf.framework.protocol.ActionTaken object.
        # This is done using the _generateActionObject method
        # of the superclass
        return self._generateActionObject(actionDictionary)

# Each module that implements an agent must have a module-level attribute 
# "AgentClass" that is set to the class that inherits from Agentbase
AgentClass = RandomAgent
# Furthermore, the name of the agent has to be assigned to "AgentName". This 
# name is used in the GUI. 
AgentName = "Random"

ExampleAgent

# Maja Machine Learning Framework
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published
# by the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.

# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.

# You should have received a copy of the GNU General Public License
# along with this program; if not, see <http://www.gnu.org/licenses/>.

# Author: Jan Hendrik Metzen  (jhm@informatik.uni-bremen.de)
# Created: 2007/07/23

""" MMLF agent that chooses actions in a round-robin manner.

This agent's sole purpose is to give an example of how to write an agent.
It should not be used for any actual learning.  
"""

__author__ = "Jan Hendrik Metzen"
__copyright__ = "Copyright 2011, University Bremen, AG Robotics"
__credits__ = ['Mark Edgington']
__license__ = "GPLv3"
__version__ = "1.0"
__maintainer__ = "Jan Hendrik Metzen"
__email__ = "jhm@informatik.uni-bremen.de"


import mmlf.framework.protocol

from mmlf.agents.agent_base import AgentBase

# Each agent has to inherit directly or indirectly from AgentBase
class ExampleAgent(AgentBase):
    """ MMLF agent that chooses actions in a round-robin manner. """

    DEFAULT_CONFIG_DICT = {}
    
    def __init__(self, *args, **kwargs):

        # Create the agent info
        self.agentInfo = \
            mmlf.framework.protocol.AgentInfo(# Which communication protocol 
                                                 # version can the agent handle?
                                                 versionNumber = "0.3",
                                                 # Name of the agent (can be 
                                                 # chosen arbitrarily) 
                                                 agentName= "Round Robin", 
                                                 # Can the agent be used in 
                                                  # environments with continuous
                                                 # state spaces?
                                                 continuousState = True,
                                                 # Can the agent be used in 
                                                  # environments with continuous
                                                 # action spaces?
                                                  continuousAction = False,
                                                 # Can the agent be used in 
                                                  # environments with discrete
                                                 # action spaces?
                                                 discreteAction = True,
                                                 # Can the agent be used in
                                                 # non-episodic environments
                                                 nonEpisodicCapable = True)
    
        # Calls constructor of base class
        # After this call, the agent has an attribute "self.configDict"
        # that contains the information from config['configDict'].
        # (For this agent, the config dict is empty, since DEFAULT_CONFIG_DICT
        # does not define any parameters.)
        super(ExampleAgent, self).__init__(*args, **kwargs)
        
        # The superclass AgentBase implements the methods setStateSpace() and
        # setActionSpace() which set the attributes stateSpace and actionSpace
        # They can be overwritten if the agent has to modify these spaces
        # for some reason
        self.stateSpace = None
        self.actionSpace = None
        
        # The agent keeps track of the sum of all rewards it obtained
        self.rewardValue = 0
    
    ######################  BEGIN COMMAND-HANDLING METHODS ###############################
            
    def setActionSpace(self, actionSpace):
        """ Informs the agent about the action space of the environment
        
        More information about action spaces can be found in 
        :ref:`state_and_action_spaces`
        """
        super(ExampleAgent, self).setActionSpace(actionSpace)
        
        # We can only deal with one-dimensional action spaces
        assert self.actionSpace.getNumberOfDimensions() == 1
        
        # Get a list of all actions this agent might take
        self.actions = self.actionSpace.getActionList()
        # Get name of action dimension
        self.actionDimensionName = self.actionSpace.getDimensionNames()[0]
        # Create an iterator that iterates in a round-robin manner over available actions
        self.nextActionIterator = __import__("itertools").cycle(self.actions)        
        
    def getAction(self):
        """ Request the next action the agent want to execute """
        # Get next action  from iterator
        # We are only interested in the value of the first (and only) dimension,
        # thus the "0"
        nextAction = self.nextActionIterator.next()[0] 
        # Create a dictionary that maps dimension name to chosen action
        actionDictionary = {self.actionDimensionName : nextAction}
        
        # Call super class method since this updates some internal information
        # (self.lastState, self.lastAction, self.reward, self.state, self.action)
        super(ExampleAgent, self).getAction()
        
        # Generate mmlf.framework.protocol.ActionTaken object
        return self._generateActionObject(actionDictionary)
    
# Each module that implements an agent must have a module-level attribute 
# "AgentClass" that is set to the class that implements the AgentBase superclass
AgentClass = ExampleAgent
# Furthermore, the name of the agent has to be assigned to "AgentName". This 
# name is used in the GUI. 
AgentName = "RoundRobin"