Framework

Package that contains the core components of the MMLF.

World

Module containing a class that represents an RL world in the MMLF.

class framework.world.World(worldConfigObject, baseUserDir, useGUI)

Represents a world consisting of one agent and one environment

The world is created based on a configuration dictionary (worldConfigObject) and provides methods for creating the agent and environment based on this specification, and for running the world for a given number of steps or episodes.

agentPollMethod(commandObject)

Let the agent execute the given command object.

Valid command objects are found in the mmlf.framework.protocol module.

createMonitor(monitorConf=None)

Create monitor based on monitorConf.

environmentPollMethod(commandObject)

Let the environment execute the given command object.

Valid command objects are found in the mmlf.framework.protocol module.

executeEpisode(monitorConf=None)

Executes one episode in the current world.

executeSteps(n=1, monitorConf=None)

Executes n steps of the current world.

getAgent()

Returns the world’s agent

getEnvironment()

Returns the world’s environment

loadAgent(agentConfig, useGUI)

Create agent based on agent config dict.

loadEnvironment(envConfig, useGUI)

Create environment based on environment config dict.

run(numOfEpisodes=inf, monitorConf=None)

Start the execution of the current world.

Let the world run for numOfEpisodes episodes.

setWorldPackageName(worldPackageName)

Set the name of the python package from which the world should be taken.

stop()

Halt the execution of the current world.

Spaces

Modules for state and action spaces

State and action spaces define the range of possible states the agent might perceive and the actions that are available to the agent. These spaces are dict-like objects that map dimension names (the dict keys) to dimension objects that contain information about this dimension. The number of items in this dict-like structure is the dimensionality of the (state/action) space.

class framework.spaces.Dimension(dimensionName, dimensionType, dimensionValues, limitType=None)

A single dimension of a (state or action) space

A dimension is either continuous or discrete. A “discrete” dimension can take on only a finite number of distinct values.

For instance, consider a dimension of a state space describing the color of a fruit. This dimension might take on the values “red”, “blue”, or “green”. In contrast, consider a second, “continuous” dimension, e.g. the weight of a fruit. This weight might be somewhere between 0g and 1000g. If we allow arbitrary weights (not only whole grams), the dimension is truly continuous.

These properties of a dimension can be checked using the following methods:
  • isDiscrete : Returns whether the respective dimension is discrete

  • isContinuous : Returns whether the respective dimension is continuous

  • getValueRanges : Returns the ranges of values a continuous dimension might take on

  • getValues : Returns the allowed values a discrete dimension might take on

getValueRanges()

Returns the ranges of values this dimension can take on

getValues()

Returns a list of all possible values of this dimension

isContinuous()

Returns whether this is a continuous dimension

isDiscrete()

Returns whether this is a discrete dimension

class framework.spaces.Space

Base class for state and action spaces.

Class which represents the state space of an environment or the action space of an agent.

This is essentially a dictionary whose keys are the names of the dimensions, and whose values are Dimension objects.

addContinuousDimension(dimensionName, dimensionValues, limitType='soft')

Add the named continuous dimension to the space.

dimensionValues is a list of (rangeStart, rangeEnd) 2-tuples which define the valid ranges of this dimension, e.g. [(0, 50), (75.5, 82)].

If limitType is set to “hard”, the agent is responsible for checking that the limits are not exceeded. If it is set to “soft”, the agent should not expect all values of this dimension to lie strictly within the bounds of the specified ranges; the ranges serve only as an approximate indication of where the values will lie (e.g. as [mean-std.dev., mean+std.dev.] instead of [absolute min. value, absolute max. value]).
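For illustration, checking whether a value lies within such a list of ranges can be sketched as follows (standalone code, not part of MMLF; the helper name is made up):

```python
# Hypothetical helper (not part of MMLF): check whether a value lies
# within any of the (rangeStart, rangeEnd) tuples of a dimension.
def isWithinRanges(value, dimensionValues):
    return any(rangeStart <= value <= rangeEnd
               for rangeStart, rangeEnd in dimensionValues)

isWithinRanges(77.0, [(0, 50), (75.5, 82)])   # True: 77.0 lies in (75.5, 82)
isWithinRanges(60.0, [(0, 50), (75.5, 82)])   # False: 60.0 falls in no range
```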

addDiscreteDimension(dimensionName, dimensionValues)

Add the named discrete dimension to the space.

dimensionValues is a list of strings representing the possible discrete values of this dimension, e.g. [“red”, “green”, “blue”].

addOldStyleSpace(oldStyleSpace, limitType='soft')

Takes an old-style (using the old format) space dictionary, and adds its dimensions to this object.

getDimensionNames()

Return the names of the space dimensions

getDimensions()

Return the Dimension objects of this space

getNumberOfDimensions()

Returns how many dimensions this space has

hasContinuousDimensions()

Return whether this space has continuous dimensions

hasDiscreteDimensions()

Return whether this space has discrete dimensions

class framework.spaces.StateSpace

Specialization of Space for state spaces.

For instance, a state space could be defined as follows:

{ "color": Dimension(dimensionType = "discrete",
                     dimensionValues = ["red","green", "blue"]),
  "weight": Dimension(dimensionType = "continuous",
                      dimensionValues = [(0,1000)]) }

This state space has two dimensions (“color” and “weight”), a discrete and a continuous one. The discrete dimension “color” can take on three values (“red”,”green”, or “blue”) and the continuous dimension “weight” any value between 0 and 1000.

A valid state of the state space defined above would be:

s1 = {"color": "red", "weight": 300}

Invalid states (s2 since the color is invalid and s3 since its weight is too large):

s2 = {"color": "yellow", "weight": 300}
s3 = {"color": "red", "weight": 1300}

The class provides additional methods for checking if a certain state is valid according to this state space (isValidState) and to scale a state such that it lies within a certain interval (scaleState).

getStateList()

Returns a list of all possible states.

Even if this state space has more than one dimension, it returns a one-dimensional list that contains all possible states.

This is achieved by creating the cross product of the values of all dimensions. It requires that all dimensions are discrete.
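The cross product construction can be sketched as follows (illustrative standalone code with example data, not the MMLF implementation):

```python
from itertools import product

# Two discrete dimensions and their allowed values (example data)
dimensions = {"color": ["red", "green", "blue"],
              "size": ["small", "large"]}

# Cross product of all dimension values -> one flat list of states
names = list(dimensions.keys())
stateList = [dict(zip(names, combination))
             for combination in product(*dimensions.values())]

len(stateList)   # 3 colors * 2 sizes = 6 possible states
```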

class framework.spaces.ActionSpace

Specialization of Space for action spaces.

For instance, an action space could be defined as follows:

{ "gasPedalForce": ("discrete", ["low", "medium", "floored"]),
  "steeringWheelAngle": ("continuous", [(-120,120)]) }

This action space has two dimensions (“gasPedalForce” and “steeringWheelAngle”), a discrete and a continuous one. The discrete dimension “gasPedalForce” can take on three values (“low”,”medium”, or “floored”) and the continuous dimension “steeringWheelAngle” any value between -120 and 120.

A valid action according to this action space would be:

a1 = {"gasPedalForce": "low", "steeringWheelAngle": -50}

Invalid actions (a2 since the gasPedalForce is invalid and a3 since its steeringWheelAngle is too small):

a2 = {"gasPedalForce": "extreme", "steeringWheelAngle": 30}
a3 = {"gasPedalForce": "medium", "steeringWheelAngle": -150}

The class provides additional methods for discretizing an action space (discretizedActionSpace) and to return a list of all available actions (getActionList).

chopContinuousAction(action)

Chop a continuous action into the range of allowed values.
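A plausible reading of this operation, as a standalone sketch (not the MMLF implementation; the helper name is made up), is clipping each continuous value into its allowed range:

```python
# Hypothetical helper (not part of MMLF): clip a continuous value into
# the allowed range (rangeStart, rangeEnd).
def chopValue(value, rangeStart, rangeEnd):
    return max(rangeStart, min(rangeEnd, value))

chopValue(-150, -120, 120)   # -120 (clipped to the lower bound)
chopValue(30, -120, 120)     # 30 (already within the range)
```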

discretizedActionSpace(discreteActionsPerDimension)

Return a discretized version of this action space

Every continuous dimension of this action space is discretized into discreteActionsPerDimension discrete actions.
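The per-dimension discretization can be sketched as follows (illustrative standalone code, not the MMLF implementation; the helper name is made up):

```python
# Hypothetical helper (not part of MMLF): split a continuous range into
# discreteActionsPerDimension evenly spaced discrete actions.
def discretizeRange(rangeStart, rangeEnd, discreteActionsPerDimension):
    step = (rangeEnd - rangeStart) / (discreteActionsPerDimension - 1)
    return [rangeStart + i * step
            for i in range(discreteActionsPerDimension)]

discretizeRange(-120, 120, 5)   # [-120.0, -60.0, 0.0, 60.0, 120.0]
```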

getActionList()

Returns a list of all allowed actions an agent might take.

Even if this action space has more than one dimension, it returns a one-dimensional list that contains all allowed action combinations.

This is achieved by creating the cross product of the values of all dimensions. It requires that all dimensions are discrete.

sampleRandomAction()

Returns a randomly sampled, valid action from the action space.
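Sampling a valid action can be sketched like this (illustrative standalone code, not the MMLF implementation): discrete dimensions are sampled by choosing one of their values, continuous dimensions by drawing uniformly from their range.

```python
import random

# Example action space: one discrete and one continuous dimension
action = {
    "gasPedalForce": random.choice(["low", "medium", "floored"]),
    "steeringWheelAngle": random.uniform(-120, 120),
}
```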

State

Module that contains the main State class of the MMLF

The MMLF uses a state class that is derived from numpy.array. In contrast to numpy.array, MMLF states can be hashed and thus be used as keys for dictionaries and sets. This is realized by calling “hashlib.sha1(self).hexdigest()”. In order to improve performance, State objects cache their hash value. Because of this, a state object must be considered constant, i.e. immutable (except for calling “scale”).
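The hashing behaviour can be illustrated with plain numpy arrays (standalone sketch, not MMLF code): hashlib.sha1 accepts any buffer-protocol object, so arrays with equal contents yield equal digests.

```python
import hashlib
import numpy

s1 = numpy.array([0.5, 300.0])
s2 = numpy.array([0.5, 300.0])

# sha1 hashes the raw array buffer; equal contents give equal digests.
digest1 = hashlib.sha1(s1).hexdigest()
digest2 = hashlib.sha1(s2).hexdigest()
# Modifying s1 afterwards would change its digest, which is why a
# cached hash requires treating the state as constant.
```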

Each state object stores its state dimension definition. Furthermore, it implements a method “scale(self, minValue=0, maxValue=1)” which scales each dimension into the range (minValue, maxValue).

class framework.state.State

State class for the MMLF.

Based on numpy arrays, but can be used as dictionary key

hasDimension(dimension)

Returns whether this state has the given dimension

isContinuous()

Return whether this state is in a continuous state space

scale(minValue=0, maxValue=1)

Scale state such that it falls into the range (minValue, maxValue)

Scale the state so that (for each dimension) the range specified in this state space is scaled to the interval (minValue, maxValue)
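Per dimension, this is a linear mapping; a standalone sketch (not the MMLF implementation; the helper name is made up):

```python
# Hypothetical helper (not part of MMLF): map a value from the
# dimension's range (rangeStart, rangeEnd) into (minValue, maxValue).
def scaleValue(value, rangeStart, rangeEnd, minValue=0, maxValue=1):
    fraction = (value - rangeStart) / (rangeEnd - rangeStart)
    return minValue + fraction * (maxValue - minValue)

scaleValue(300, 0, 1000)   # 0.3: a weight of 300 in the range (0, 1000)
```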

Monitor

Monitoring of the MMLF based on logging performance metrics and graphics.

class framework.monitor.Monitor(world, configDict)

Monitor of the MMLF.

The monitor supervises the execution of a world within the MMLF and stores certain selected information periodically. It always stores the values a FloatStreamObservable takes on into a file with the suffix “fso”. For other observables (FunctionOverStateSpaceObservable, StateActionValuesObservable, ModelObservable) a plot is generated and stored into files if this is specified in the monitor’s config dict (using functionOverStateSpaceLogging, stateActionValuesLogging, modelLogging).

CONFIG DICT
policyLogFrequency:
 : Frequency of storing the agent’s policy in a serialized version to a file. The policy is stored in the file policy_x in the subdirectory “policy” of the agent’s log directory, where x is the episode number.
plotObservables:
 : The names of the observables that should be stored to a file. If “All”, all observables are stored. Defaults to “All” (also if plotObservables is not specified in a config file).
stateActionValuesLogging:
 : Configuration of periodically plotting StateActionValuesObservables. Examples of StateActionValuesObservables are state-action value functions or stochastic policies. The plots are stored in the file episode_x.pdf in a subdirectory of the agent’s log directory named after the observable, where x is the episode number.
functionOverStateSpaceLogging:
 : Configuration of periodically plotting FunctionOverStateSpaceObservables. Examples of FunctionOverStateSpaceObservables are state value functions or deterministic policies. The plots are stored in the file episode_x.pdf in a subdirectory of the agent’s log directory named after the observable, where x is the episode number.
modelLogging:
 : Configuration of periodically plotting ModelObservables. The plots are stored in the file episode_x.pdf in a subdirectory of the agent’s log directory named after the observable, where x is the episode number.
active:
 : Whether the respective kind of logging is activated.
logFrequency:
 : Frequency (in episodes) of creating a plot based on the respective observable and storing it to a file.
stateDims:
 : The state space dimensions that are varied in the plot. All other state space dimensions are kept constant. If None, the stateDims are deduced automatically; this is only possible under specific conditions (2d state space).
actions:
 : The actions for which separate plots are created for each StateActionValuesObservable.
rasterPoints:
 : The resolution (rasterPoints*rasterPoints) of the plot in continuous domains.
colouring:
 : The background colouring of a model plot. Can be either “Rewards” or “Exploration”. If “Rewards”, the reward predicted by the model is used as the background; if “Exploration”, each state-action pair tried at least minExplorationValue times is coloured with one colour, all others with another colour.
plotSamples:
 : If true, the observed state-action pairs are plotted into the model plot; otherwise, the model’s predictions are plotted.
minExplorationValue:
 : If the colouring is “Exploration”, each state-action pair tried at least minExplorationValue times is coloured with one colour, all others with another colour.
notifyEndOfEpisode()

Notify monitor that an episode in the world has terminated.

Experiment

Classes and functions used for MMLF experiments.

class framework.experiment.RetryQueue(maxsize=0)

Queue which will retry if interrupted with EINTR.

class framework.experiment.WorldRunner(worldConfigObject, numberOfEpisodes, exceptionOccurredSignal=None)

Class which encapsulates running of a world in an MMLF experiment.

This class contains code for supporting the distributed, concurrent execution of worlds in an MMLF experiment. One instance of the class WorldRunner should be created for every run of a world. By calling this instance, the world is executed. This may happen in a separate thread or process since the instances of WorldRunner can be passed between threads and processes.

Parameters
worldConfigObject:
 : The world configuration (in a dictionary)
numberOfEpisodes:
 : The number of episodes that should be conducted in the world
exceptionOccurredSignal:
 : An optional PyQt signal to which exceptions can be sent
framework.experiment.runExperiment(experimentConfig, observableQueue, updateQueue, exceptionOccurredSignal, useGUI=True)

Handles the execution of an experiment.

This function handles the concurrent or sequential execution of an experiment.

Parameters
experimentConfig:
 : The experiment configuration (in a dictionary)
observableQueue:
 : A multiprocessing.Queue. Used for informing the main process (e.g. the GUI) of observables created in the world runs.
updateQueue:
 : A multiprocessing.Queue. Used for informing the main process (e.g. the GUI) of changes in observables.
exceptionOccurredSignal:
 : An optional PyQt signal to which exceptions can be sent

Filesystem

class framework.filesystem.BaseUserDirectory(basePath='/home/jmetzen/.mmlf')

Represents a user’s base-directory (typically $HOME/.mmlf)

In this directory, the user can define worlds, YAML configuration files, and the interaction server logs information while running a world.

addAbsolutePath(absDirPath, pathRef, force=False)

Add an absolute path to self.pathDict. This is so that other commands can use this path via the pathRef shortcut.

The force argument allows an existing pathRef to be overridden. Normally this is not recommended, unless the pathRef is known not to be used globally.

appendStringToPath(pathRef, stringToAppend=None)

Modify the absolute path stored under pathRef, appending a “_<timestamp>” string to the end of the path (where <timestamp> will be something like 20070929_12_59_59, which represents 12:59:59 on 2007-09-29). If _<timestamp> has already been appended to the end of the path, it is replaced with a current timestamp.

If stringToAppend is provided, then a timestamp is NOT added, and instead, only the specified string is appended.
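The timestamp suffix described above can be produced with time.strftime (standalone sketch, not the MMLF implementation):

```python
import time

# Build a "_<timestamp>" suffix such as "_20070929_12_59_59"
suffix = "_" + time.strftime("%Y%m%d_%H_%M_%S")
```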

createPath(pathList, refName=None, baseRef=None, force=False)

Attempt to create the path specified in pathList. pathList is a list which contains a series of directory names. For example, it might be ['abc', 'def', 'ghi'], which would cause this method to create all the necessary directories such that the following path exists:

$HOME/.mmlf/abc/def/ghi

refName is a quick reference name which can be used to refer to the path. If specified, the full path will be added to a dictionary, in which the key is the value of refName (a string, for example).

If the baseRef argument is provided, it will be looked up and used as the base directory to which the path defined in pathList will be appended. For example, if there already exists a 'logdir' refName in self.pathDict, then calling createPath(['a', 'b', 'c'], baseRef='logdir') might create (depending on how 'logdir' is defined) the directories needed for the path “$HOME/.mmlf/logs/a/b/c/”.
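The directory-creation part can be sketched with os.makedirs (standalone sketch, not the MMLF implementation; the helper name is made up):

```python
import os
import tempfile

# Hypothetical stand-in (not MMLF code) for the directory creation done
# by createPath: join the base path with the pathList entries and create
# any missing directories along the way.
def createNestedPath(basePath, pathList):
    fullPath = os.path.join(basePath, *pathList)
    os.makedirs(fullPath, exist_ok=True)
    return fullPath

base = tempfile.mkdtemp()          # stands in for $HOME/.mmlf
created = createNestedPath(base, ["abc", "def", "ghi"])
```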

The force argument allows an existing refName to be overridden. Normally this is not recommended, unless the refName is known not to be used globally.

fileExists(pathRef, fileName)

Check to see if the file (located in the path referred to by pathRef) exists.

getAbsolutePath(pathRef, fileName=None)

Return the absolute path to file fileName located in the directory referred to by pathRef. If no filename is specified, then only the path corresponding to pathRef is returned.

getFileAsText(pathRef, fileName)

Get the contents of the file fileName located in the directory referred to by pathRef, and return it as a string.

getFileObj(pathRef, fileName, fileOpenMode='rb')

Get a file handle object for the specified fileName, assuming that the file is located in the directory referred to by pathRef (in self.pathDict). It is the programmer’s responsibility to close this file object after use.

setBasePath(absolutePath=None)

Set (and create if necessary) the base user-directory path. If a path is specified (as an absolute path string), then it is used. Otherwise, the default path is used.

If for some reason this fails, then a writeable temporary directory is chosen.

setTime(timeStr=None)

Sets the time string of baseUserDir (which is used to distinguish the logs of different runs).

If no timeStr is given, it defaults to the current time

class framework.filesystem.LogFile(baseUserDir, logFileName, baseRef=None)

Represents a logfile, to which data can be written.

addText(text)

Add the specified text to the logfile. Note that no newline is appended automatically, so it must be included in the text string if desired.

Interaction Server

InteractionServer that handles the communication between environment and agent.

class framework.interaction_server.InteractionServer(world, monitor, initialize=True)

InteractionServer that handles the communication between environment and agent.

It controls the interaction between the environment and the agents, and allows direct control of various aspects of this interaction.

loopInitialize()

Inform environment about agent by giving its agentInfo.

loopIteration()

Perform one iteration of the interaction server loop.

Call the environmentPollMethod once and based on the result, decide if and how to call the agentPollMethod
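The described dispatch logic might look roughly like this (standalone sketch with stub message classes standing in for mmlf.framework.protocol; not the actual implementation):

```python
# Stub message classes (stand-ins for mmlf.framework.protocol)
class GetWish: pass
class GetAction: pass
class ActionTaken:
    def __init__(self, action):
        self.action = action

def loopIteration(environmentPollMethod, agentPollMethod):
    # Ask the environment for its next wish/command ...
    wish = environmentPollMethod(GetWish())
    if isinstance(wish, GetAction):
        # ... if it asks for an action, demand one from the agent and
        # report the taken action back to the environment.
        action = agentPollMethod(wish)
        environmentPollMethod(ActionTaken(action))
    else:
        # ... otherwise forward the command (e.g. SetState, GiveReward)
        # to the agent.
        agentPollMethod(wish)
```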

run(numOfEpisodes)

Run an MMLF world for numOfEpisodes episodes.

stop()

Stop the execution of a world.

Protocol

class framework.protocol.Message(**kwargs)

A base class which represents a command or response sent between the interaction server and agents/environment.

class framework.protocol.GiveAgentInfo(agentInfo)

Gives the Environment all information about the agents which the agents report on themselves.

AgentInfoList is a list of AgentInfo() objects. Each agent must make available such an object instance named “agentInfo”; i.e. one can access an agent’s info via agentObj.agentInfo

class framework.protocol.GetWish

Get the next wish/command of the Environment

The environment’s transact method should return a single message object in response to receiving this message object via transact().

class framework.protocol.ActionTaken(action=None)

Communicate to the environment (from the interaction server) the action taken by an agent.

This message must be communicated via Env.transact() right after the ‘wish’ is requested by the environment. In other words, there should be no other message communicated via transact() before this message is communicated.

class framework.protocol.SetStateSpace(stateSpace)

Set the state space of one or more agents

class framework.protocol.SetState(state)

Set the state of one or more agents

class framework.protocol.SetActionSpace(actionSpace)

Set the action space of one or more agents

class framework.protocol.GetAction(extendedTestingIsActive=False)

Request (demand) an action from one or more agents.

If this request is issued from the Environment, then it will receive a response via a subsequent GiveResponse command.

class framework.protocol.GiveReward(reward)

Give a reward to one or more agents

class framework.protocol.NextEpisodeStarted

Communicate to an agent that a new episode has begun

class framework.protocol.AgentInfo(**kwargs)

A class which represents the capabilities of an agent.

class framework.protocol.EnvironmentInfo(**kwargs)

A class which represents the configuration of an environment.