This tutorial explains how you can write your own learning agent for the MMLF.
Note
Writing an agent is easier with a local installation of the MMLF (see Installation Tutorial).
See also
Get an overview of the existing agents in Existing agents
To start, please take a look at the agents subdirectory of the MMLF and open random_agent.py in the Python editor of your choice. The RandomAgent is a simple and straightforward agent that illustrates the inner workings of an agent well (though an intelligent agent might choose its actions differently ;-)).
Let’s write a new agent. This agent should execute actions independently of the state in a round-robin manner, i.e. when there are three available actions a1, a2, a3, the agent should choose actions in the sequence a1, a2, a3, a1, a2, a3, ... Obviously this is not a very clever approach, but for this tutorial it suffices. A minimal sketch of this selection scheme is shown below.
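The mechanism behind this is simply cycling over a fixed list of actions. As a minimal standalone sketch (the action names a1, a2, a3 are made up purely for illustration):

    import itertools

    actions = ["a1", "a2", "a3"]
    actionIterator = itertools.cycle(actions)

    for _ in range(7):
        print(next(actionIterator))   # prints a1, a2, a3, a1, a2, a3, a1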
Go into the agents subdirectory of the MMLF and create a copy of random_agent.py (let’s call this copy example_agent.py).
Open example_agent.py and rename the agent class from RandomAgent to ExampleAgent, replacing every occurrence of RandomAgent with ExampleAgent.
Set “DEFAULT_CONFIG_DICT = {}”, since the agent does not have any configuration options.
Set “continuousAction = False” in the AgentInfo constructor call, since round-robin action selection is only possible for a finite (non-continuous) action set; see the sketch below.
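With this change (and the empty DEFAULT_CONFIG_DICT), the AgentInfo constructor call in __init__ should look roughly as follows (the comments of the original listing are condensed here):

    self.agentInfo = \
        mmlf.framework.protocol.AgentInfo(
            versionNumber="0.3",      # supported protocol version
            agentName="Round Robin",
            continuousState=True,     # continuous state spaces are fine
            continuousAction=False,   # round-robin needs a finite action set
            discreteAction=True,
            nonEpisodicCapable=True)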
Add the following lines at the end of “setActionSpace()”:
# We can only deal with one-dimensional action spaces
assert self.actionSpace.getNumberOfDimensions() == 1
# Get a list of all actions this agent might take
self.actions = self.actionSpace.getActionList()
# Get name of action dimension
self.actionDimensionName = self.actionSpace.getDimensionNames()[0]
# Create an iterator that cycles round-robin over the available actions
self.nextActionIterator = itertools.cycle(self.actions)
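(For this to work, itertools has to be imported at the top of the module via “import itertools”.) Note that each entry of the list returned by getActionList() contains one value per action dimension; since we assert a one-dimensional action space, every entry holds exactly one value, which is why getAction() below accesses it with the index 0.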
Reimplement the method “getAction()” as follows:
def getAction(self):
    """ Request the next action the agent wants to execute """
    # Get the next action from the iterator.
    # We are only interested in the value of the first (and only)
    # dimension, thus the index 0.
    nextAction = next(self.nextActionIterator)[0]
    # Create a dictionary that maps the dimension name to the chosen action
    actionDictionary = {self.actionDimensionName : nextAction}
    # Call the superclass method, since it updates some internal information
    # (self.lastState, self.lastAction, self.reward, self.state, self.action)
    super(ExampleAgent, self).getAction()
    # Generate an mmlf.framework.protocol.ActionTaken object
    return self._generateActionObject(actionDictionary)
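The action dictionary maps each action dimension name to the chosen value; for a single dimension named, say, “gasPedalForce” (a name borrowed from the RandomAgent listing below, purely for illustration), it would look like {"gasPedalForce": "extreme"}. The superclass method _generateActionObject() converts this dictionary into an mmlf.framework.protocol.ActionTaken object.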
Remove the superfluous methods “setStateSpace()”, “setState()”, “giveReward()”, and “nextEpisodeStarted()”; the agent can simply fall back on the default implementations inherited from the AgentBase class. After this step, ExampleAgent overrides only __init__(), setActionSpace(), and getAction().
Set the AgentClass module attribute appropriately: “AgentClass = ExampleAgent”
Set the AgentName module attribute to something meaningful: AgentName = "RoundRobin"
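The last two lines of the module should thus read:

    AgentClass = ExampleAgent
    AgentName = "RoundRobin"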
Do not forget to update the comments and the documentation of your new module!
That’s it! Your agent module should now look like the example_agent.py listing at the end of this page (the original random_agent.py is reproduced first for comparison). You can test it in the GUI by selecting “RoundRobin” as the agent.
# Maja Machine Learning Framework
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published
# by the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program; if not, see <http://www.gnu.org/licenses/>.
# Author: Jan Hendrik Metzen (jhm@informatik.uni-bremen.de)
# Created: 2007/07/23
""" MMLF agent that acts randomly
This module defines a simple agent that can interact with an environment.
It chooses all available actions with the same probability.
This module also serves as an example of how to implement an MMLF agent.
"""
__author__ = "Jan Hendrik Metzen"
__copyright__ = "Copyright 2011, University Bremen, AG Robotics"
__credits__ = ['Mark Edgington']
__license__ = "GPLv3"
__version__ = "1.0"
__maintainer__ = "Jan Hendrik Metzen"
__email__ = "jhm@informatik.uni-bremen.de"
from collections import defaultdict
import mmlf.framework.protocol
from mmlf.agents.agent_base import AgentBase
# Each agent has to inherit directly or indirectly from AgentBase
class RandomAgent(AgentBase):
""" Agent that chooses uniformly randomly among the available actions. """
# Add default configuration for this agent to this static dict
# This specific parameter controls after how many steps we send information
# regarding the accumulated reward to the logger.
DEFAULT_CONFIG_DICT = {'Reward_log_frequency' : 100}
def __init__(self, *args, **kwargs):
# Create the agent info
self.agentInfo = \
mmlf.framework.protocol.AgentInfo(# Which communication protocol
# version can the agent handle?
versionNumber = "0.3",
# Name of the agent (can be
# chosen arbitrarily)
agentName= "Random",
# Can the agent be used in
# environments with continuous
# state spaces?
continuousState = True,
# Can the agent be used in
# environments with continuous
# action spaces?
continuousAction = True,
# Can the agent be used in
# environments with discrete
# action spaces?
discreteAction = True,
# Can the agent be used in
# non-episodic environments
nonEpisodicCapable = True)
# Calls constructor of base class
# After this call, the agent has an attribute "self.configDict"
# that contains the information from config['configDict'].
# The values of this dict are evaluated, i.e. instead of '100' (string),
# the key 'Reward_log_frequency' will have the value 100 (int).
super(RandomAgent, self).__init__(*args, **kwargs)
# The superclass AgentBase implements the methods setStateSpace() and
# setActionSpace() which set the attributes stateSpace and actionSpace
# They can be overwritten if the agent has to modify these spaces
# for some reason
self.stateSpace = None
self.actionSpace = None
# The agent keeps track of all rewards it obtained in an episode
# The rewardDict implements a mapping from the episode index to a list
# of all rewards it obtained in this episode
self.rewardDict = defaultdict(list)
###################### BEGIN COMMAND-HANDLING METHODS ###############################
def setStateSpace(self, stateSpace):
""" Informs the agent about the state space of the environment
More information about state spaces can be found in
:ref:`state_and_action_spaces`
"""
# We delegate to the superclass, which does the following:
# self.stateSpace = stateSpace
# We would not need to override this method at all; it is included here
# only to show what is going on...
super(RandomAgent, self).setStateSpace(stateSpace)
def setActionSpace(self, actionSpace):
""" Informs the agent about the action space of the environment
More information about action spaces can be found in
:ref:`state_and_action_spaces`
"""
# We delegate to the superclass, which does the following:
# self.actionSpace = actionSpace
# We would not need to override this method at all; it is included here
# only to show what is going on...
super(RandomAgent, self).setActionSpace(actionSpace)
def setState(self, state):
""" Informs the agent of the environment's current state
More information about (valid) states can be found in
:ref:`state_and_action_spaces`
"""
# We delegate to the superclass, which does the following:
# self.state = self.stateSpace.parseStateDict(state) # Parse state dict
# self.state.scale(0, 1) # Scale state such that each dimension falls into the bin (0,1)
# self.stepCounter += 1 # Count how many steps have passed
# We would not need to override this method at all; it is included here
# only to show what is going on...
super(RandomAgent, self).setState(state)
def getAction(self):
""" Request the next action the agent want to execute """
# Each action of the agent corresponds to one step
action = self._chooseRandomAction()
# Call super class method since this updates some internal information
# (self.lastState, self.lastAction, self.reward, self.state, self.action)
super(RandomAgent, self).getAction()
return action
def giveReward(self, reward):
""" Provides a reward to the agent """
self.rewardDict[self.episodeCounter].append(reward) # remember reward
# Send a message about the accumulated reward to the logger every
# self.configDict['Reward_log_frequency'] steps
if self.stepCounter % self.configDict['Reward_log_frequency'] == 0:
self.agentLog.info("Reward accumulated after %s steps in episode %s: %s"
% (self.stepCounter, self.episodeCounter,
sum(self.rewardDict[self.episodeCounter])))
def nextEpisodeStarted(self):
""" Informs the agent that a new episode has started."""
# We delegate to the superclass, which does the following:
# self.episodeCounter += 1
# self.stepCounter = 0
super(RandomAgent, self).nextEpisodeStarted()
######################## END COMMAND-HANDLING METHODS ###############################
def _chooseRandomAction(self):
"Chooses an action randomly from the action space"
assert self.actionSpace, "Error: Action requested before actionSpace "\
"was specified"
# We sample a random action from the action space
# This returns a dictionary with a mapping from action dimension name
# to the sample value.
# For instance: {"gasPedalForce": "extreme", "steeringWheelAngle": 30}
actionDictionary = self.actionSpace.sampleRandomAction()
# The action dictionary has to be converted into an
# mmlf.framework.protocol.ActionTaken object.
# This is done using the _generateActionObject method
# of the superclass
return self._generateActionObject(actionDictionary)
# Each module that implements an agent must have a module-level attribute
# "AgentClass" that is set to the class that inherits from Agentbase
AgentClass = RandomAgent
# Furthermore, the name of the agent has to be assigned to "AgentName". This
# name is used in the GUI.
AgentName = "Random"
# Maja Machine Learning Framework
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published
# by the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program; if not, see <http://www.gnu.org/licenses/>.
# Author: Jan Hendrik Metzen (jhm@informatik.uni-bremen.de)
# Created: 2007/07/23
""" MMLF agent that chooses actions in a round-robin manner.
This agent's sole purpose is to give an example of how to write an agent.
It should not be used for any actual learning.
"""
__author__ = "Jan Hendrik Metzen"
__copyright__ = "Copyright 2011, University Bremen, AG Robotics"
__credits__ = ['Mark Edgington']
__license__ = "GPLv3"
__version__ = "1.0"
__maintainer__ = "Jan Hendrik Metzen"
__email__ = "jhm@informatik.uni-bremen.de"
import itertools

import mmlf.framework.protocol
from mmlf.agents.agent_base import AgentBase
# Each agent has to inherit directly or indirectly from AgentBase
class ExampleAgent(AgentBase):
""" MMLF agent that chooses actions in a round-robin manner. """
DEFAULT_CONFIG_DICT = {}
def __init__(self, *args, **kwargs):
# Create the agent info
self.agentInfo = \
mmlf.framework.protocol.AgentInfo(# Which communication protocol
# version can the agent handle?
versionNumber = "0.3",
# Name of the agent (can be
# chosen arbitrarily)
agentName= "Round Robin",
# Can the agent be used in
# environments with continuous
# state spaces?
continuousState = True,
# Can the agent be used in
# environments with continuous
# action spaces?
continuousAction = False,
# Can the agent be used in
# environments with discrete
# action spaces?
discreteAction = True,
# Can the agent be used in
# non-episodic environments
nonEpisodicCapable = True)
# Calls constructor of base class
# After this call, the agent has an attribute "self.configDict",
# that contains the information from config['configDict'].
# The values of this dict are evaluated, i.e. a string like '100'
# would become the integer 100.
super(ExampleAgent, self).__init__(*args, **kwargs)
# The superclass AgentBase implements the methods setStateSpace() and
# setActionSpace() which set the attributes stateSpace and actionSpace
# They can be overwritten if the agent has to modify these spaces
# for some reason
self.stateSpace = None
self.actionSpace = None
# The agent keeps track of the sum of all rewards it obtained
self.rewardValue = 0
###################### BEGIN COMMAND-HANDLING METHODS ###############################
def setActionSpace(self, actionSpace):
""" Informs the agent about the action space of the environment
More information about action spaces can be found in
:ref:`state_and_action_spaces`
"""
super(ExampleAgent, self).setActionSpace(actionSpace)
# We can only deal with one-dimensional action spaces
assert self.actionSpace.getNumberOfDimensions() == 1
# Get a list of all actions this agent might take
self.actions = self.actionSpace.getActionList()
# Get name of action dimension
self.actionDimensionName = self.actionSpace.getDimensionNames()[0]
# Create an iterator that cycles round-robin over the available actions
self.nextActionIterator = itertools.cycle(self.actions)
def getAction(self):
""" Request the next action the agent want to execute """
# Get next action from iterator
# We are only interested in the value of the first (and only) dimension,
# thus the "0"
nextAction = next(self.nextActionIterator)[0]
# Create a dictionary that maps dimension name to chosen action
actionDictionary = {self.actionDimensionName : nextAction}
# Call super class method since this updates some internal information
# (self.lastState, self.lastAction, self.reward, self.state, self.action)
super(ExampleAgent, self).getAction()
# Generate mmlf.framework.protocol.ActionTaken object
return self._generateActionObject(actionDictionary)
# Each module that implements an agent must have a module-level attribute
# "AgentClass" that is set to the class that implements the AgentBase superclass
AgentClass = ExampleAgent
# Furthermore, the name of the agent has to be assigned to "AgentName". This
# name is used in the GUI.
AgentName = "RoundRobin"