Interface for MMLF policies
A policy is a mapping from state to action. Different learning algorithms of the MMLF use different representations of policies, for example neural networks that represent the policy directly, or policies based on value functions.
This module encapsulates these details so that a stored policy can be loaded directly.
Factory method that creates a policy based on a spec dictionary
Evaluates the deterministic policy for the given state
Returns the parameters of this policy
Returns a dict that maps policy names to policy classes
Sets the parameters of the policy to the given parameters
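The interface described above can be sketched as an abstract base class. This is an illustrative reconstruction; the actual MMLF class and method names may differ.

```python
class Policy(object):
    """A policy maps a state to an action.

    Hypothetical sketch of the MMLF policy interface: concrete policies
    (linear, MLP-based, value-function-based) implement these methods.
    """

    def evaluate(self, state):
        """Evaluate the deterministic policy for the given state."""
        raise NotImplementedError

    def getParameters(self):
        """Return the parameters of this policy."""
        raise NotImplementedError

    def setParameters(self, parameters):
        """Set the parameters of the policy to the given parameters."""
        raise NotImplementedError
```

Because the interface hides the representation, a stored policy can be loaded and evaluated without knowing whether it is backed by a neural network or a value function.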
Linear Policies for discrete and continuous action spaces
This module contains classes that represent linear policies for discrete and continuous action spaces.
Linear policy for discrete action spaces
Class for linear policies on discrete action spaces using a 1-of-n encoding, i.e. pi(s) = argmax_{a_j} sum_{i=0}^n w_{ij} s_i
For each discrete action, numOfDuplications outputs in the 1-of-n encoding are created, i.e. n = numOfActions * numOfDuplications. This allows more complex policies to be represented.
bias: Determines whether an additional bias (a state dimension that always equals 1) is added
numOfDuplications: Determines how many outputs there are for each discrete action in the 1-of-n encoding
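A minimal sketch of such a linear discrete policy, assuming the argmax rule given above. The class and parameter names (LinearDiscretePolicy, numOfDuplications, bias) follow the text but are illustrative, not the framework's exact API.

```python
import numpy as np

class LinearDiscretePolicy:
    """Linear policy over a discrete action set using a 1-of-n encoding:
    pi(s) = argmax over outputs of sum_i w[i, j] * s[i]."""

    def __init__(self, inputDims, actions, numOfDuplications=1, bias=True):
        self.actions = actions
        self.numOfDuplications = numOfDuplications
        self.bias = bias
        dims = inputDims + (1 if bias else 0)
        # One output column per (action, duplicate) pair: n = |actions| * duplications.
        self.weights = np.zeros((dims, len(actions) * numOfDuplications))

    def evaluate(self, state):
        s = np.asarray(state, dtype=float)
        if self.bias:
            s = np.append(s, 1.0)  # bias as an extra state dimension fixed to 1
        outputs = s.dot(self.weights)
        # Map the winning output index back to its discrete action.
        return self.actions[int(np.argmax(outputs)) // self.numOfDuplications]
```

With all weights zero the first action wins; training adjusts the weight matrix so that different regions of the state space map to different winning outputs.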
Linear policy for continuous action spaces
Class for linear policies on continuous action spaces, i.e. pi(s) = [sum_{i=0}^n w_{i0} s_i, ..., sum_{i=0}^n w_{ij} s_i]
inputDims: The number of input (state) dimensions
It is currently only possible to use this policy with one-dimensional action spaces with contiguous value ranges
bias: Determines whether an additional bias (a state dimension that always equals 1) is added
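For the one-dimensional continuous case, the policy reduces to a single weighted sum over the state dimensions. A sketch under that assumption (illustrative names, not the framework's API):

```python
import numpy as np

class LinearContinuousPolicy:
    """Linear policy for a one-dimensional continuous action space:
    pi(s) = sum_i w_i * s_i, with an optional bias dimension fixed to 1."""

    def __init__(self, inputDims, bias=True):
        self.bias = bias
        self.weights = np.zeros(inputDims + (1 if bias else 0))

    def evaluate(self, state):
        s = np.asarray(state, dtype=float)
        if self.bias:
            s = np.append(s, 1.0)  # bias treated as an extra state dimension
        return float(self.weights.dot(s))  # scalar action
```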
Policies for discrete and continuous action spaces based on an MLP
This module contains classes that represent policies for discrete and continuous action spaces that are based on a multi-layer perceptron representation.
Policy based on a MLP representation for disc. and cont. action spaces
The MLP is based on the ffnet module. It can be specified how many hiddenUnits the MLP should contain and whether the neurons should get an additional bias input. If independentOutputs is set to True, the hidden layer is cloned for each output, i.e. different outputs do not share common neurons in the network. The actionSpace defines which actions the agent can choose.
If the action space has no continuous actions, the finite set of (n) possible action selections is determined. Action selection is based on a 1-of-n encoding, meaning that for each available action the MLP has one output. The action whose corresponding network output has maximal activation is chosen.
If the action space has continuous actions, it is currently assumed to be one-dimensional and contiguous. The MLP has one output falling in the range [0, 1]; this output is scaled to the allowed action range to yield the action.
bias: Determines whether an additional bias (a state dimension that always equals 1) is added
hiddenUnits: Determines the number of neurons in the hidden layer of the multi-layer perceptron
independentOutputs: If True, the hidden layer is cloned for each output, i.e. different outputs do not share common neurons in the network
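The discrete-action selection rule can be sketched as follows. MMLF builds on the ffnet package; here a tiny numpy forward pass stands in for it, so the structure (class name, tanh activation, random initialization) is illustrative rather than the framework's actual implementation.

```python
import numpy as np

class MLPPolicy:
    """MLP policy for a discrete action space: one network output per
    action (1-of-n encoding); the action with maximal activation wins."""

    def __init__(self, inputDims, hiddenUnits, actions, bias=True):
        self.actions = actions
        self.bias = bias
        dims = inputDims + (1 if bias else 0)
        rng = np.random.default_rng(0)
        self.w1 = rng.normal(scale=0.1, size=(dims, hiddenUnits))
        self.w2 = rng.normal(scale=0.1, size=(hiddenUnits, len(actions)))

    def evaluate(self, state):
        s = np.asarray(state, dtype=float)
        if self.bias:
            s = np.append(s, 1.0)  # additional bias input
        hidden = np.tanh(s.dot(self.w1))
        outputs = hidden.dot(self.w2)
        # 1-of-n encoding: choose the action with maximal output activation.
        return self.actions[int(np.argmax(outputs))]
```

For the continuous case, the network would instead have a single [0, 1] output that is rescaled to the allowed action range, as described above.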
Policies that are represented using a value function
This module contains a class that wraps function approximators such that they can be used directly as policies (i.e. implement the policy interface)
Class for policies that are represented using a value function
This class wraps a function approximator so that it can be used directly as a policy (i.e. it implements the policy interface)
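The idea behind the wrapper can be sketched as greedy action selection over a state-action value function. The names below (ValueFunctionPolicy, the callable Q) are hypothetical; the actual MMLF wrapper's interface may differ.

```python
class ValueFunctionPolicy:
    """Wraps a state-action value function Q so it can be used as a policy:
    evaluate(s) greedily returns the action maximizing Q(s, a)."""

    def __init__(self, valueFunction, actions):
        self.valueFunction = valueFunction  # callable: Q(state, action) -> float
        self.actions = actions              # finite set of available actions

    def evaluate(self, state):
        # Greedy action selection with respect to the value function.
        return max(self.actions, key=lambda a: self.valueFunction(state, a))
```

Because the wrapper exposes the same evaluate() entry point as the other policy classes, a stored value function can be loaded and used wherever a policy is expected.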