MMLF function approximator interface
This module defines the interface for function approximators that can be used with temporal difference learning methods.
* computeQ(state, action): Compute the Q-value of the given state-action pair
* train(trainingSet): Train the function approximator on a set of state-action pairs and the desired Q-values for these pairs
The function approximator interface.
Each function approximator must specify two methods:
* computeQ
* train
Compute the action with maximal Q-value for the given state
Computes the Q-value of the given state-action pair
It is assumed that a state is an n-dimensional vector, where n is the dimensionality of the state space. Furthermore, the states must have been scaled externally so that the value of each dimension falls into the interval [0, 1]. action must be one of the actions given to the constructor.
Computes the V-value of the given state
Factory method that creates a function approximator based on a specification dictionary.
Returns a dict that contains a mapping from function approximator (FA) name to FA class.
Trains the function approximator using the given training set.
trainingSet is a dictionary containing the training data, where each key is a (state, action) pair whose Q-value should be updated and the corresponding value is the desired Q-value for this pair.
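To make the interface concrete, here is a minimal Python sketch of how these methods fit together. The method names computeQ and train come from the descriptions above; the class name, the computeOptimalAction name, and defining the V-value as the maximum Q-value over actions are illustrative assumptions, not necessarily MMLF's exact signatures.

```python
class FunctionApproximator(object):
    """Sketch of the function approximator interface described above."""

    def __init__(self, actions):
        # The finite set of actions; states are assumed to be n-dimensional
        # vectors scaled externally into [0, 1] per dimension.
        self.actions = actions

    def computeQ(self, state, action):
        """Compute the Q-value of the given state-action pair."""
        raise NotImplementedError

    def computeV(self, state):
        """Compute the V-value of the given state (assumed: max over actions)."""
        return max(self.computeQ(state, action) for action in self.actions)

    def computeOptimalAction(self, state):
        """Compute the action with maximal Q-value for the given state."""
        return max(self.actions, key=lambda a: self.computeQ(state, a))

    def train(self, trainingSet):
        """Train on a dict mapping (state, action) pairs to desired Q-values."""
        raise NotImplementedError
```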
The Cerebellar Model Articulation Controller (CMAC) function approximator.
| Parameter | Description |
| --- | --- |
| number_of_tilings | The number of independent tilings that are used in each tile coding |
| default | The default value that an entry stored in the function approximator has initially |
| learning_rate | The learning rate used internally in the updates |
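As an illustration of the tile-coding idea behind CMAC, here is a hedged Python sketch, assuming states scaled to [0, 1]. The resolution parameter, the offset scheme, and the dictionary-based tile storage are illustrative assumptions, not MMLF's actual implementation.

```python
import collections

class CMACSketch(object):
    def __init__(self, actions, number_of_tilings=10, default=0.0,
                 learning_rate=0.1, resolution=10):
        self.actions = actions
        self.number_of_tilings = number_of_tilings
        self.learning_rate = learning_rate
        self.resolution = resolution  # tiles per dimension (assumed parameter)
        # Initialize tile weights so that the Q-value of an unseen
        # state-action pair equals the default value.
        self.weights = collections.defaultdict(
            lambda: float(default) / number_of_tilings)

    def _activeTiles(self, state):
        # Each tiling is shifted by a fraction of a tile width; a state
        # activates exactly one tile per tiling.
        for tiling in range(self.number_of_tilings):
            offset = float(tiling) / self.number_of_tilings / self.resolution
            yield (tiling,) + tuple(int((s + offset) * self.resolution)
                                    for s in state)

    def computeQ(self, state, action):
        # The Q-value is the sum of the weights of all active tiles.
        return sum(self.weights[(action,) + tile]
                   for tile in self._activeTiles(state))

    def train(self, trainingSet):
        for (state, action), target in trainingSet.items():
            error = target - self.computeQ(state, action)
            # Distribute the update equally over the active tiles.
            step = self.learning_rate * error / self.number_of_tilings
            for tile in self._activeTiles(state):
                self.weights[(action,) + tile] += step
```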
Function approximator based on k-Nearest-Neighbor interpolation.
A function approximator that stores a given set of (state, action) -> Q-value samples. The sample set is split into subsets, one for each action (thus, a discrete, finite set of actions is assumed). When the Q-value of a state-action pair is queried, the k states most similar to the query state are extracted (under the constraint that the query action is the action applied in these states). The Q-value of the query state-action pair is computed as a weighted linear combination of the k extracted samples, where the weighting is based on the distance between the respective state and the query state. The weight of a sample is computed as exp(-(distance/b_X)**2), where b_X is a parameter that influences the generalization breadth. Smaller values of b_X correspond to increased weight of more similar states.
| Parameter | Description |
| --- | --- |
| k | The number of neighbors considered in k-Nearest Neighbors |
| b_X | The width of the Gaussian weighting function. Smaller values of b_X correspond to increased weight of more similar states. |
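The weighting scheme described above can be sketched as follows. The Euclidean distance metric and the function name knnQ are assumptions made for illustration; samples would be the subset stored for the query action.

```python
import math

def knnQ(queryState, samples, k, b_X):
    """Estimate the Q-value of queryState from the samples stored for
    the query action, given as a list of (state, qValue) pairs."""
    def distance(s1, s2):
        # Euclidean distance between two state vectors.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(s1, s2)))

    # Extract the k samples whose states are most similar to the query state.
    neighbors = sorted(samples,
                       key=lambda sample: distance(sample[0], queryState))[:k]
    # Gaussian weighting: w = exp(-(distance / b_X)**2), so more similar
    # states obtain larger weights.
    weights = [math.exp(-(distance(state, queryState) / b_X) ** 2)
               for state, _ in neighbors]
    # Weighted linear combination of the neighbors' Q-values.
    return (sum(w * q for w, (_, q) in zip(weights, neighbors))
            / sum(weights))
```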
The linear combination function approximator.
This class implements the function approximator interface. It computes the Q-value as the dot product of the feature vector and a weight vector. Its main application area is discrete worlds; however, given appropriate features, it can also be used in continuous worlds. At the moment, it ignores the planned action, since it is assumed to be used in combination with minimax tree search.
| Parameter | Description |
| --- | --- |
| learning_rate | The learning rate used internally in the updates. |
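A minimal sketch of the dot-product computation and a delta-rule style update, assuming the caller supplies the feature vectors; the class name and layout are illustrative.

```python
import numpy

class LinearCombinationSketch(object):
    def __init__(self, numFeatures, learning_rate=0.1):
        self.weights = numpy.zeros(numFeatures)
        self.learning_rate = learning_rate

    def computeQ(self, features):
        # The Q-value is the dot product of feature vector and weight vector;
        # the planned action is ignored, as described above.
        return numpy.dot(features, self.weights)

    def train(self, trainingSet):
        # trainingSet maps feature vectors (as tuples) to desired Q-values.
        for features, target in trainingSet.items():
            features = numpy.asarray(features)
            error = target - self.computeQ(features)
            # Delta rule: move the weights toward the desired Q-value.
            self.weights += self.learning_rate * error * features
```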
This module defines a multi-layer perceptron (MLP) function approximator.
Multi-Layer Perceptron function approximator.
The multilinear grid function approximator.
In this function approximator, the state space is spanned by a regular grid. For each action a separate grid is spanned. The value of a certain state is determined by computing the grid cell it lies in and multilinearly interpolating from the cell corners to the particular state.
| Parameter | Description |
| --- | --- |
| default | The default value that an entry stored in the function approximator has initially. |
| learning_rate | The learning rate used internally in the updates. |
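The interpolation step can be sketched as follows, assuming states scaled to [0, 1] per dimension and a dictionary-based grid; the helper name and signature are illustrative.

```python
import itertools

def multilinearInterpolation(state, gridValues, cellsPerDim, default=0.0):
    """Interpolate a value for state (scaled to [0, 1] per dimension)
    from a regular grid stored as a dict of corner values."""
    # Continuous grid coordinates of the state.
    coords = [s * cellsPerDim for s in state]
    # Lower corner of the grid cell containing the state.
    lower = [min(int(c), cellsPerDim - 1) for c in coords]
    fractions = [c - l for c, l in zip(coords, lower)]
    value = 0.0
    # A cell in an n-dimensional grid has 2^n corners; each corner
    # contributes in proportion to the state's proximity along every axis.
    for corner in itertools.product((0, 1), repeat=len(state)):
        weight = 1.0
        for offset, frac in zip(corner, fractions):
            weight *= frac if offset else 1.0 - frac
        index = tuple(l + o for l, o in zip(lower, corner))
        value += weight * gridValues.get(index, default)
    return value
```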
This module defines the QCON function approximator.
Function approximator based on the connectionist QCON architecture.
This class implements the QCON architecture, which consists of a connectionist Q-learning model where each action has a separate network. The feed-forward neural networks are implemented using the Python package ffnet.
| Parameter | Description |
| --- | --- |
| hidden | The number of neurons in the hidden layer of the multi-layer perceptron |
| learning_rate | The learning rate used internally in the updates. |
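A sketch of the one-network-per-action structure, using ffnet's mlgraph/ffnet constructors. The use of train_momentum with its eta argument, and the class layout, are assumptions about how such networks could be fit, not necessarily how MMLF invokes ffnet.

```python
from ffnet import ffnet, mlgraph

class QCONSketch(object):
    def __init__(self, actions, stateDims, hidden=8, learning_rate=0.2):
        # One separate feed-forward network per action, as in QCON.
        self.learning_rate = learning_rate
        self.networks = dict(
            (action, ffnet(mlgraph((stateDims, hidden, 1))))
            for action in actions)

    def computeQ(self, state, action):
        # An ffnet network is callable and returns its output layer.
        return self.networks[action](state)[0]

    def train(self, trainingSet):
        # Train each action's network on its own (state, Q-value) pairs.
        for action, net in self.networks.items():
            states = [s for (s, a) in trainingSet if a == action]
            targets = [[trainingSet[(s, action)]] for s in states]
            if states:
                net.train_momentum(states, targets, eta=self.learning_rate)
```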
This module defines the Radial Basis Function (RBF) function approximator.
The Radial Basis Function function approximator.
This class implements the function approximator interface using radial basis functions. An RBF function approximator is composed of several radial basis functions, one for each action.
| Parameter | Description |
| --- | --- |
| learning_rate | The learning rate used internally in the updates. |
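A sketch of how Q-values could be computed from Gaussian radial basis functions, one weight vector per action; the shared centers, the width parameter, and the gradient-style update are illustrative assumptions.

```python
import numpy

class RBFSketch(object):
    def __init__(self, actions, centers, width, learning_rate=0.1):
        # One weight vector per action; all actions share the RBF centers.
        self.centers = numpy.asarray(centers)
        self.width = width
        self.learning_rate = learning_rate
        self.weights = dict((a, numpy.zeros(len(centers))) for a in actions)

    def _activations(self, state):
        # Gaussian radial basis functions around the stored centers.
        dists = numpy.linalg.norm(self.centers - numpy.asarray(state), axis=1)
        return numpy.exp(-(dists / self.width) ** 2)

    def computeQ(self, state, action):
        return numpy.dot(self.weights[action], self._activations(state))

    def train(self, trainingSet):
        for (state, action), target in trainingSet.items():
            phi = self._activations(state)
            error = target - numpy.dot(self.weights[action], phi)
            # Gradient-style update toward the desired Q-value.
            self.weights[action] += self.learning_rate * error * phi
```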
This module defines the tabular storage function approximator.
The tabular storage function approximator can be used for discrete worlds. Strictly speaking, it is not really a function approximator: it stores the value function exactly.
Function approximator for small, discrete environments.
This class implements the function approximator interface. It does not really approximate but simply stores the values in a table. Thus, it should not be applied in environments with continuous states.
| Parameter | Description |
| --- | --- |
| default | The default value that an entry stored in the function approximator has initially. |
| learning_rate | The learning rate used internally in the updates. |
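A minimal sketch of the tabular scheme with the default value and learning-rate update from the table above; the dictionary layout and class name are illustrative.

```python
class TabularStorageSketch(object):
    def __init__(self, default=0.0, learning_rate=1.0):
        self.default = default
        self.learning_rate = learning_rate
        self.table = {}  # maps (state, action) pairs to stored values

    def computeQ(self, state, action):
        # Entries that were never trained return the default value.
        return self.table.get((state, action), self.default)

    def train(self, trainingSet):
        for (state, action), target in trainingSet.items():
            old = self.computeQ(state, action)
            # Move the stored value toward the desired Q-value; with a
            # learning rate of 1.0 the value is stored exactly.
            self.table[(state, action)] = (
                old + self.learning_rate * (target - old))
```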