Package that contains the core components of the MMLF.
Module containing a class that represents a RL world in the MMLF.
Represents a world consisting of one agent and environment
The world is created based on a configuration dictionary (worldConfigObject) and provides methods for creating the agent and environment based on this specification and for running the world for a given number of steps or episodes.
Let the agent execute the given command object.
Valid command objects are found in the mmlf.framework.protocol module.
Create monitor based on monitorConf.
Let the environment execute the given command object.
Valid command objects are found in the mmlf.framework.protocol module.
Executes one episode in the current world.
Executes n steps of the current world.
Returns the world’s agent
Returns the world’s environment
Create agent based on agent config dict.
Create environment based on environment config dict.
Start the execution of the current world.
Let the world run for numOfEpisodes episodes.
Set the name of the Python package from which the world should be taken.
Halt the execution of the current world.
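A minimal usage sketch of these world methods, assuming a hypothetical World class; the method names (executeEpisode, executeSteps, run, stop) follow the docstrings above, while the constructor and the configuration keys are assumptions:

    # Hypothetical sketch; the World class, its constructor and the config keys
    # shown here are assumptions derived from the docstrings above.
    worldConfigObject = {
        "environment": {"moduleName": "some_environment", "configDict": {}},
        "agent": {"moduleName": "some_agent", "configDict": {}},
        "monitor": {"policyLogFrequency": 250},
    }

    world = World(worldConfigObject)   # creates agent and environment from the config
    world.executeEpisode()             # execute one episode in the current world
    world.executeSteps(100)            # execute 100 steps of the current world
    world.run(numOfEpisodes=10)        # let the world run for 10 episodes
    world.stop()                       # halt the execution of the current world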
Modules for state and action spaces
State and action spaces define the range of possible states the agent might perceive and the actions that are available to the agent. These spaces are dict-like objects that map dimension names (the dict keys) to dimension objects that contain information about this dimension. The number of items in this dict-like structure is the dimensionality of the (state/action) space.
A single dimension of a (state or action) space
A dimension is either continuous or discrete. A “discrete” dimension can take on only a finite number of distinct values.
For instance, consider a dimension of a state space describing the color of a fruit. This dimension can take on the values “red”, “blue”, or “green”. In contrast, consider a second, “continuous” dimension, e.g. the weight of a fruit. This weight might be somewhere between 0g and 1000g. If we allow any arbitrary weight (not only whole grams), the dimension is truly continuous.
isDiscrete : Returns whether the respective dimension is discrete
isContinuous : Returns whether the respective dimension is continuous
Returns the ranges the value of this dimension can take on
Returns a list of all possible values of this dimension
Returns whether this is a continuous dimension
Returns whether this is a discrete dimension
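For illustration, one dimension of each kind, constructed the same way as in the state-space example further below; isDiscrete/isContinuous are the query methods named above, everything else follows that example:

    # Sketch using the Dimension constructor as shown in the state-space example below.
    color = Dimension(dimensionType="discrete",
                      dimensionValues=["red", "green", "blue"])
    weight = Dimension(dimensionType="continuous",
                       dimensionValues=[(0, 1000)])

    assert color.isDiscrete()       # finite set of discrete values
    assert weight.isContinuous()    # any value within the given range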
Base class for state and action spaces.
Class which represents the state space of an environment or the action space of an agent.
This is essentially a dictionary whose keys are the names of the dimensions, and whose values are Dimension objects.
Add the named continuous dimension to the space.
dimensionValues is a list of (rangeStart, rangeEnd) 2-tuples which define the valid ranges of this dimension (e.g. [(0, 50), (75.5, 82)]).
If limitType is set to “hard”, then the agent is responsible for checking that the limits are not exceeded. If it is set to “soft”, then the agent should not expect all values of this dimension to lie strictly within the bounds of the specified ranges; the ranges only serve as an approximate indication of where the values will lie (i.e. as [mean - std. dev., mean + std. dev.] instead of [absolute min. value, absolute max. value]).
Add the named discrete dimension to the space.
dimensionValues is a list of strings representing the possible discrete values of this dimension (e.g. [“red”, “green”, “blue”]).
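A short sketch of building a space with one dimension of each kind; the method names addContinuousDimension/addDiscreteDimension and the exact keyword arguments are assumptions based on the descriptions above:

    # Sketch; the method and keyword names are assumptions, not the verified MMLF API.
    space = Space()
    space.addContinuousDimension("weight",
                                 dimensionValues=[(0, 50), (75.5, 82)],
                                 limitType="soft")
    space.addDiscreteDimension("color",
                               dimensionValues=["red", "green", "blue"])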
Takes a space dictionary in the old format and adds its dimensions to this object.
Return the names of the space dimensions
Return the names of the space dimensions
Returns how many dimensions this space has
Return whether this space has continuous dimensions
Return whether this space has discrete dimensions
Specialization of Space for state spaces.
For instance, a state space could be defined as follows:
{ "color": Dimension(dimensionType = "discrete",
dimensionValues = ["red","green", "blue"]),
"weight": Dimension(dimensionType = "continuous",
dimensionValues = [(0,1000)]) }
This state space has two dimensions (“color” and “weight”), a discrete and a continuous one. The discrete dimension “color” can take on three values (“red”, “green”, or “blue”) and the continuous dimension “weight” any value between 0 and 1000.
A valid state of the state space defined above would be:
s1 = {"color": "red", "weight": 300}
Invalid states (s2 since the color is invalid and s3 since its weight is too large):
s2 = {"color": "yellow", "weight": 300}
s3 = {"color": "red", "weight": 1300}
The class provides additional methods for checking whether a certain state is valid according to this state space (isValidState) and for scaling a state such that it lies within a certain interval (scaleState).
Returns a list of all possible states.
Even if this state space has more than one dimension, it returns a one-dimensional list that contains all possible states.
This is achieved by creating the cross product of the values of all dimensions. It requires that all dimensions are discrete.
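The cross product can be illustrated with plain Python; the dimension values below are made up for the example:

    # Plain-Python illustration: a discrete space with 3 colors and 2 sizes
    # yields a flat, one-dimensional list of 3 * 2 = 6 states.
    import itertools

    dimensionValues = {"color": ["red", "green", "blue"],
                       "size": ["small", "large"]}
    names = sorted(dimensionValues)
    allStates = [dict(zip(names, combination))
                 for combination in itertools.product(*(dimensionValues[n] for n in names))]
    assert len(allStates) == 6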
Specialization of Space for action spaces.
For instance, an action space could be defined as follows:
{ "gasPedalForce": ("discrete", ["low", "medium", "floored"]),
"steeringWheelAngle": ("continuous", [(-120,120)]) }
This action space has two dimensions (“gasPedalForce” and “steeringWheelAngle”), a discrete and a continuous one. The discrete dimension “gasPedalForce” can take on three values (“low”, “medium”, or “floored”) and the continuous dimension “steeringWheelAngle” any value between -120 and 120.
A valid action according to this action space would be:
a1 = {"gasPedalForce": "low", "steeringWheelAngle": -50}
Invalid actions (a2 since the gasPedalForce is invalid and a3 since its steeringWheelAngle is too small):
a2 = {"gasPedalForce": "extreme", "steeringWheelAngle": 30}
a3 = {"gasPedalForce": "medium", "steeringWheelAngle": -150}
The class provides additional methods for discretizing an action space (discretizedActionSpace) and for returning a list of all available actions (getActionList).
Chop a continuous action into the range of allowed values.
Return a discretized version of this action space
Returns a discretized version of this action space. Every continuous action space dimension is discretized into discreteActionsPerDimension actions.
Returns a list of all allowed actions an agent might take.
Even if this action space has more than one dimension, it returns a one-dimensional list that contains all allowed action combinations.
This is achieved by creating the cross product of the values of all dimensions. It requires that all dimensions are discrete.
Returns a randomly sampled, valid action from the action space.
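The discretization of a continuous dimension can be illustrated with plain Python; the evenly spaced grid shown here is an assumption about how discretizedActionSpace chooses its actions:

    # Illustration only; the evenly spaced grid is an assumption, not the MMLF code.
    def discretizeRange(rangeStart, rangeEnd, discreteActionsPerDimension):
        step = (rangeEnd - rangeStart) / float(discreteActionsPerDimension - 1)
        return [rangeStart + i * step for i in range(discreteActionsPerDimension)]

    print(discretizeRange(-120, 120, 5))   # [-120.0, -60.0, 0.0, 60.0, 120.0]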
Module that contains the main State class of the MMLF
The MMLF uses a state class that is derived from numpy.array. In contrast to numpy.array, MMLF states can be hashed and thus be used as keys for dictionaries and sets. This is realized by calling “hashlib.sha1(self).hexdigest()”. In order to improve performance, state objects cache their hash value. Because of this, a state object must be considered constant, i.e. not changeable (except for calling “scale”).
Each state object stores its state dimension definition. Furthermore, it implements a method “scale(self, minValue=0, maxValue=1)” which scales each dimension into the range (minValue, maxValue).
State class for the MMLF.
Based on numpy arrays, but can be used as dictionary key
Returns whether this state has the given dimension
Return whether this state is in a continuous state space
Scale state such that it falls into the range (minValue, maxValue)
Scale the state so that (for each dimension) the range specified in this state space is scaled to the interval (minValue, maxValue)
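The two documented features, sha1-based hashing and per-dimension scaling, can be mimicked with plain numpy and hashlib; this illustrates the described behaviour and is not the MMLF State class itself:

    # Illustration of the documented behaviour, not the MMLF State class.
    import hashlib
    import numpy

    values = numpy.array([300.0, 0.5])
    cachedHash = hashlib.sha1(values).hexdigest()   # how MMLF states are hashed

    def scaleValue(value, low, high, minValue=0.0, maxValue=1.0):
        # map a value from its dimension range (low, high) into (minValue, maxValue)
        return minValue + (value - low) / (high - low) * (maxValue - minValue)

    print(scaleValue(300.0, 0.0, 1000.0))   # 0.3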
Monitoring of the MMLF based on logging performance metrics and graphics.
Monitor of the MMLF.
The monitor supervises the execution of a world within the MMLF and stores certain selected information periodically. It always stores the values a FloatStreamObservable takes on into a file with the suffix “fso”. For other observables (FunctionOverStateSpaceObservable, StateActionValuesObservable, ModelObservable) a plot is generated and stored into files if this is specified in the monitor’s config dict (using functionOverStateSpaceLogging, stateActionValuesLogging, modelLogging).
policyLogFrequency:
    Frequency of storing the agent’s policy in a serialized version to a file. The policy is stored in the file policy_x in the subdirectory “policy” of the agent’s log directory, where x is the episode’s number.
plotObservables:
    The names of the observables that should be stored to a file. If “All”, all observables are stored. Defaults to “All” (also if plotObservables is not specified in a config file).
stateActionValuesLogging:
    Configuration of periodically plotting StateActionValuesObservables. Examples of StateActionValuesObservables are state-action value functions or stochastic policies. The plots are stored in the file episode_x.pdf in a subdirectory of the agent’s log directory named after the observable, where x is the episode’s number.
functionOverStateSpaceLogging:
    Configuration of periodically plotting FunctionOverStateSpaceObservables. Examples of FunctionOverStateSpaceObservables are state value functions or deterministic policies. The plots are stored in the file episode_x.pdf in a subdirectory of the agent’s log directory named after the observable, where x is the episode’s number.
modelLogging:
    Configuration of periodically plotting ModelObservables. Examples of ModelObservables are models. The plots are stored in the file episode_x.pdf in a subdirectory of the agent’s log directory named after the observable, where x is the episode’s number.
active:
    Whether the respective kind of logging is activated.
logFrequency:
    Frequency (in episodes) of creating a plot based on the respective observable and storing it to a file.
stateDims:
    The state space dimensions that are varied in the plot. All other state space dimensions are kept constant. If None, the stateDims are deduced automatically; this is only possible under specific conditions (2d state space).
actions:
    The actions for which separate plots are created for each StateActionValuesObservable.
rasterPoints:
    The resolution (rasterPoints * rasterPoints) of the plot in continuous domains.
colouring:
    The background colouring of a model plot. Can be either “Rewards” or “Exploration”. If “Rewards”, the reward predicted by the model is used as background; if “Exploration”, each state-action pair tried at least minExplorationValue times is coloured with one colour, the others with another colour.
plotSamples:
    If true, the observed state-action pairs are plotted into the model plot; otherwise, the model’s predictions are plotted.
minExplorationValue:
    If the colouring is “Exploration”, each state-action pair tried at least minExplorationValue times is coloured with one colour, the others with another colour.
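For orientation, a monitor configuration dictionary using the keys documented above might look as follows; the nesting and the concrete values are assumptions, not taken from an MMLF example file:

    # Assumed layout; only the key names are taken from the documentation above.
    monitorConf = {
        "policyLogFrequency": 250,
        "plotObservables": "All",
        "stateActionValuesLogging": {"active": True, "logFrequency": 50,
                                     "stateDims": None, "actions": None,
                                     "rasterPoints": 50},
        "functionOverStateSpaceLogging": {"active": False, "logFrequency": 50,
                                          "stateDims": None, "rasterPoints": 50},
        "modelLogging": {"active": False, "logFrequency": 50,
                         "colouring": "Exploration", "plotSamples": True,
                         "minExplorationValue": 1},
    }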
Notify monitor that an episode in the world has terminated.
Classes and functions used for MMLF experiments.
Queue which will retry if interrupted with EINTR.
Class which encapsulates running of a world in an MMLF experiment.
This class contains code for supporting the distributed, concurrent execution of worlds in an MMLF experiment. One instance of the class WorldRunner should be created for every run of a world. By calling this instance, the world is executed. This may happen in a separate thread or process since the instances of WorldRunner can be passed between threads and processes.
worldConfigObject:
    The world configuration (in a dictionary)
numberOfEpisodes:
    The number of episodes that should be conducted in the world
exceptionOccurredSignal:
    An optional PyQt signal to which exceptions can be sent
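A sketch of one concurrent run; the constructor keywords mirror the parameters listed above, while the callable interface and the use of multiprocessing are assumptions:

    # Sketch; only the constructor keywords are taken from the parameter list above.
    import multiprocessing

    worldConfigObject = {}   # world configuration dictionary (contents omitted)
    runner = WorldRunner(worldConfigObject=worldConfigObject,
                         numberOfEpisodes=100,
                         exceptionOccurredSignal=None)
    process = multiprocessing.Process(target=runner)   # calling the instance runs the world
    process.start()
    process.join()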
Handles the execution of an experiment.
This function handles the concurrent or sequential execution of an experiment.
experimentConfig:
    The experiment configuration (in a dictionary)
observableQueue:
    A multiprocessing.Queue used for informing the main process (e.g. the GUI) of observables created in the world runs.
updateQueue:
    A multiprocessing.Queue used for informing the main process (e.g. the GUI) of changes in observables.
exceptionOccurredSignal:
    An optional PyQt signal to which exceptions can be sent
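A sketch of invoking the experiment runner; runExperiment is a placeholder name for the function described above, while the queue parameters follow the list:

    # Sketch; runExperiment is a placeholder name, not a verified MMLF function.
    import multiprocessing

    experimentConfig = {}                       # experiment configuration (contents omitted)
    observableQueue = multiprocessing.Queue()   # observables created in the world runs
    updateQueue = multiprocessing.Queue()       # changes in observables

    runExperiment(experimentConfig=experimentConfig,
                  observableQueue=observableQueue,
                  updateQueue=updateQueue,
                  exceptionOccurredSignal=None)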
Represents a user’s base-directory (typically $HOME/.mmlf)
In this directory, the user can define worlds and YAML configuration files, and the interaction server stores its log information there while running a world.
Add an absolute path to self.pathDict. This is so that other commands can use this path via the pathRef shortcut.
The force argument allows an existing pathRef to be overridden. Normally this is not recommended, unless the pathRef is known not to be used globally.
Modify the absolute path stored under pathRef, appending a “_<timestamp>” string to the end of the path (where <timestamp> will be something like 20070929_12_59_59, which represents 12:59:59 on 2007-09-29). If _<timestamp> has already been appended to the end of the path, it is replaced with a current timestamp.
If stringToAppend is provided, then a timestamp is NOT added, and instead, only the specified string is appended.
Attempt to create the path specified in pathList. pathList is a list which contains a series of directory names. For example, it might be [‘abc’, ’def’, ’ghi’], which would cause this method to create all the necessary directories such that the following path exists:
$HOME/.mmlf/abc/def/ghi
refName is a quick reference name which can be used to refer to the path. If specified, the full path will be added to a dictionary, in which the key is the value of refName (a string, for example).
If the baseRef argument is provided, it will be looked up and used as the base directory to which the path defined in pathList will be appended. For example, if there already exists a ‘logdir’ refName in self.pathDict, then writing createPath([‘a’,’b’,’c’], baseRef=’logdir’) might create (depending on how ‘logdir’ is defined) the directories needed for the path “$HOME/.mmlf/logs/a/b/c/”.
The force argument allows an existing refName to be overridden. Normally this is not recommended, unless the refName is known not to be used globally.
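A sketch of the createPath usage described above; the class name BaseUserDirectory is an assumption based on this section:

    # Sketch; the class name is an assumption, the arguments follow the text above.
    baseUserDir = BaseUserDirectory()

    # creates $HOME/.mmlf/abc/def/ghi and registers it under the refName "mydir"
    baseUserDir.createPath(["abc", "def", "ghi"], refName="mydir")

    # creates a path below an already registered reference, e.g. the log directory
    baseUserDir.createPath(["a", "b", "c"], refName="runDir", baseRef="logdir")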
Check to see if the file (located in the path referred to by pathRef) exists.
Return the absolute path to file fileName located in the directory referred to by pathRef. If no filename is specified, then only the path corresponding to pathRef is returned.
Get the contents of the file fileName located in the directory referred to by pathRef, and return it as a string.
Get a file handle object for the fileName specified, assuming that the file is located in the directory referred to by pathRef (in self.pathDict). It is the programmer’s responsibility to close this file object later on after it has been created.
Set (and create if necessary) the base user-directory path. If a path is specified (as an absolute path string), then it is used. Otherwise, the default path is used.
If for some reason this fails, then a writeable temporary directory is chosen.
Sets the time string of baseUserDir (which is used to distinguish the logs of different runs).
If no timeString is given, it defaults to the current time
InteractionServer that handles the communication between environment and agent.
InteractionServer that handles the communication between environment and agent.
It controls the interaction between the environment and the agents, and allows direct control of various aspects of this interaction.
Inform environment about agent by giving its agentInfo.
Perform one iteration of the interaction server loop.
Call the environmentPollMethod once and based on the result, decide if and how to call the agentPollMethod
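A plain-Python sketch of such a loop iteration; the poll callables and the return handling are placeholders, not the MMLF implementation:

    # Placeholder sketch of one loop iteration as described above.
    def loopIteration(environmentPollMethod, agentPollMethod):
        wish = environmentPollMethod()     # ask the environment for its next wish/command
        if wish is None:                   # nothing to do in this iteration
            return None
        return agentPollMethod(wish)       # forward the wish to the agent and return its response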
Run an MMLF world for numOfEpisodes episodes.
Stop the execution of a world.
A base class which represents a command or response sent between the interaction server and agents/environment.
Gives the Environment all the information that the agents report about themselves.
AgentInfoList is a list of AgentInfo() objects. Each agent must make available such an object instance named “agentInfo”, i.e. one can access an agent’s info via agentObj.agentInfo.
Get the next wish/command of the Environment
The environment’s transact method should return a single message object in response to receiving this message object via transact().
Communicate to the environment (from the interaction server) the action taken by an agent.
This message must be communicated via Env.transact() right after the ‘wish’ is requested by the environment. In other words, there should be no other message communicated via transact() before this message is communicated.
Set the state space of one or more agents
Set the state of one or more agents
Set the action space of one or more agents
Request (demand) an action from one or more agents.
If this request is issued from the Environment, then it will receive a response via a subsequent GiveResponse command.
Give a reward to one or more agents
Communicate to an agent that a new episode has begun
A class which represents the capabilities of an agent.
A class which represents the configuration of an environment.