Policies

Interface for MMLF policies

A policy is a mapping from state to action. Different learning algorithms of the MMLF use different representations of policies, for example neural networks that represent the policy directly or policies based on value functions.

This module encapsulates these details so that a stored policy can be loaded directly.

MMLF policies must implement the following methods:
  • evaluate
  • getParameters
  • setParameters
class resources.policies.policy.Policy(*args, **kwargs)

Interface for MMLF policies

MMLF policies must implement the following methods:
  • evaluate(state): Evaluates the deterministic policy for the given state
  • getParameters(): Returns the parameters of this policy
  • setParameters(parameters): Sets the parameters of the policy to the given parameters
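
A minimal sketch of what a class implementing this interface could look like is given below. The class name, the flat parameter vector, and the greedy action choice are illustrative assumptions, not part of the MMLF API.

    import numpy as np

    # Hypothetical policy implementing the interface above; only evaluate,
    # getParameters and setParameters are prescribed, the rest is assumed.
    class ExampleLinearPolicy(object):
        def __init__(self, numStateDims, actions):
            self.actions = list(actions)  # finite set of allowed actions
            # one weight vector per action (illustrative representation)
            self.weights = np.zeros((len(self.actions), numStateDims))

        def evaluate(self, state):
            # deterministic: return the action whose weight vector scores highest
            scores = self.weights.dot(np.asarray(state, dtype=float))
            return self.actions[int(np.argmax(scores))]

        def getParameters(self):
            # expose the parameters as a flat vector
            return self.weights.ravel()

        def setParameters(self, parameters):
            # restore the parameters from a flat vector
            self.weights = np.asarray(parameters).reshape(self.weights.shape)
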
static create(policySpec, numStateDims, actionSpace)

Factory method that creates a policy based on a spec-dictionary.
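
The exact layout of the spec-dictionary depends on the registered policy classes; the following usage sketch therefore treats the 'type' and option keys as assumptions.

    # Hypothetical usage of the factory method; the keys of policySpec
    # are assumptions about the spec-dictionary layout.
    policySpec = {'type': 'linear_policy', 'bias': True}
    policy = Policy.create(policySpec,
                           numStateDims=4,           # e.g. a 4-dimensional state
                           actionSpace=actionSpace)  # allowed actions
    action = policy.evaluate(currentState)
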

evaluate(state)

Evaluates the deterministic policy for the given state

getParameters()

Returns the parameters of this policy

static getPolicyDict()

Returns a dict that contains a mapping from policy name to policy class.

setParameters(parameters)

Sets the parameters of the policy to the given parameters

Linear Policy

Linear Policies for discrete and continuous action spaces

This module contains classes that represent linear policies for discrete and continuous action spaces.

class resources.policies.linear_policy.LinearDiscreteActionPolicy(inputDims, actionSpace, bias=True, numOfDuplications=1, **kwargs)

Linear policy for discrete action spaces

Class for linear policies on discrete action spaces using a 1-of-n encoding, i.e. pi(s) = argmax_{a_j} sum_{i=0}^n w_ij s_i

For each discrete action, numOfDuplications outputs in the 1-of-n encoding are created, i.e. n = numOfActions * numOfDuplications. This allows more complex policies to be represented.

Expected parameters:
  • inputDims: The number of input (state) dimensions
  • actionSpace: The action space which determines which actions are allowed.
CONFIG DICT
bias: Determines whether or not an additional bias (a state dimension that always equals 1) is added
numOfDuplications: Determines how many outputs there are for each discrete action in the 1-of-n encoding.
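
To make the 1-of-n encoding concrete, the following sketch evaluates such a policy by hand with numpy; the weight layout and the handling of duplicated outputs are assumptions made for illustration only.

    import numpy as np

    # Illustrative evaluation of a linear discrete-action policy with
    # 1-of-n encoding and duplicated outputs (assumed weight layout).
    def evaluateLinearDiscrete(weights, state, actions,
                               numOfDuplications=1, bias=True):
        s = np.asarray(state, dtype=float)
        if bias:
            s = np.append(s, 1.0)                 # bias dimension fixed to 1
        n = len(actions) * numOfDuplications      # number of outputs
        W = np.asarray(weights).reshape(n, s.shape[0])
        scores = W.dot(s)                         # sum_i w_ij * s_i for each output j
        # each action owns numOfDuplications consecutive outputs; the action
        # containing the maximal output is selected
        return actions[int(np.argmax(scores)) // numOfDuplications]
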
class resources.policies.linear_policy.LinearContinuousActionPolicy(inputDims, actionSpace, bias=True, **kwargs)

Linear policy for continuous action spaces

Class for linear policies on continuous action spaces, i.e. pi(s) = [sum_{i=0}^n w_i0 s_i, ..., sum_{i=0}^n w_ij s_i]

Expected parameters:
  • inputDims: The number of input (state) dimensions

  • actionSpace: The action space which determines which actions are allowed.

It is currently only possible to use this policy with one-dimensional action spaces with contiguous value ranges.

CONFIG DICT
bias: Determines whether or not an additional bias (a state dimension that always equals 1) is added
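
A corresponding evaluation sketch for the continuous case is shown below; clipping the result to the allowed action range is an assumption about how out-of-range values would be handled.

    import numpy as np

    # Illustrative evaluation of a linear continuous-action policy
    # (one-dimensional, contiguous action range assumed).
    def evaluateLinearContinuous(weights, state, actionRange, bias=True):
        s = np.asarray(state, dtype=float)
        if bias:
            s = np.append(s, 1.0)              # bias dimension fixed to 1
        w = np.asarray(weights, dtype=float)
        action = float(w.dot(s))               # sum_i w_i * s_i
        low, high = actionRange
        return min(max(action, low), high)     # keep the action inside the allowed range
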

Multi-layer Perceptron Policy

Policies for discrete and continuous action spaces based on an MLP

This module contains classes that represent policies for discrete and continuous action spaces that are based on a multi-layer perceptron representation.

class resources.policies.mlp_policy.MLPPolicy(inputDims, actionSpace, hiddenUnits=5, bias=True, independentOutputs=False, **kwargs)

Policy based on an MLP representation for discrete and continuous action spaces

The MLP is based on the ffnet module. It can be specified how many hiddenUnits the MLP should contain and whether the neurons should get an additional bias input. If independentOutputs is set to True, the hidden layer is cloned for each output, i.e. different outputs do not share common neurons in the network. The actionSpace defines which actions the agent can choose.

If the action space has no continuous actions, the finite set of (n) possible action selections is determined. Action selection is based on a 1-of-n encoding, meaning that for each available action the MLP has one output. The action whose corresponding network output has maximal activation is chosen.

If the action space has continuous actions, it is assumed that the action space is one-dimensional. The MLP has one output in the range [0, 1]. This output is scaled to the allowed action range to yield the action. Currently, it is assumed that continuous action spaces are one-dimensional and contiguous.

CONFIG DICT
bias: Determines whether or not an additional bias (a state dimension that always equals 1) is added
hiddenUnits: Determines the number of neurons in the hidden layer of the multi-layer perceptron
independentOutputs: If True, for each output the hidden layer is cloned, i.e. different outputs do not share common neurons in the network
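
The two action-selection modes described above can be illustrated independently of the ffnet details; the sketch below assumes the trained network is available as a plain callable that maps a state to its output activations, which is not the actual ffnet API.

    import numpy as np

    # Illustrative action selection on top of an MLP; 'mlp' is assumed to be
    # a callable state -> output activations, not the real ffnet interface.
    def selectAction(mlp, state, actions=None, actionRange=None):
        outputs = np.asarray(mlp(state))
        if actions is not None:
            # discrete case: 1-of-n encoding, pick the action with maximal activation
            return actions[int(np.argmax(outputs))]
        # continuous case: single output in [0, 1], scaled to the allowed range
        low, high = actionRange
        return low + float(outputs[0]) * (high - low)
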

Value Function Policy

Policies that are represented using a value function

This module contains a class that wraps function approximators such that they can be used directly as policies (i.e. implement the policy interface).

class resources.policies.value_function_policy.ValueFunctionPolicy(valueFunction, actions)

Class for policies that are represented using a value function

This class wraps a function approximator such that it can be used directly as a policy (i.e. it implements the policy interface).
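
As an illustration of this wrapping, a greedy policy over a state-action value function could look roughly as follows; the computeQ call is a placeholder for whatever interface the wrapped function approximator actually exposes.

    # Rough sketch of a value-function-backed policy; 'computeQ' is a
    # hypothetical accessor, not necessarily the MMLF function approximator API.
    class GreedyValueFunctionPolicy(object):
        def __init__(self, valueFunction, actions):
            self.valueFunction = valueFunction
            self.actions = list(actions)

        def evaluate(self, state):
            # choose the action with the highest estimated Q-value
            return max(self.actions,
                       key=lambda action: self.valueFunction.computeQ(state, action))
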