Function Approximators

MMLF function approximator interface

This module defines the interface for function approximators that can be used with temporal difference learning methods.

The following methods must be implemented by each function approximator:
  • computeQ(state, action): Compute the Q-value of the given state-action pair

  • train(trainingSet): Train the function approximator on a training set consisting of state-action pairs and the desired Q-values for these pairs.

class resources.function_approximators.function_approximator.FunctionApproximator(stateSpace, *args, **kwargs)

The function approximator interface.

Each function approximator must implement two methods: computeQ and train.

computeOptimalAction(state)

Compute the action with maximal Q-value for the given state

computeQ(state, action)

Computes the Q-value of the given (state, action) pair

It is assumed that a state is an n-dimensional vector, where n is the dimensionality of the state space. Furthermore, the states must have been scaled externally so that the value of each dimension falls into the interval [0, 1]. action must be one of the actions given to the constructor.

computeV(state)

Computes the V-value of the given state

static create(faSpec, stateSpace, actions)

Factory method that creates a function approximator based on a spec dictionary.

static getFunctionApproximatorDict()

Returns a dict that maps FA names to FA classes.

train(trainingSet)

Trains the function approximator using the given training set.

trainingSet is a dictionary of training data in which each key is the (state, action) pair whose Q-value should be updated and the corresponding value is the desired Q-value for this pair.
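
For illustration, the following minimal sketch (not part of MMLF) shows a class that follows this interface and the trainingSet format described above; the class name DictionaryFA and its behaviour are purely hypothetical:

    # Minimal sketch of a custom function approximator that follows the
    # documented interface; the class DictionaryFA is hypothetical.
    class DictionaryFA(object):
        """Toy approximator that memorizes trained Q-values, 0.0 otherwise."""

        def __init__(self, stateSpace, actions, **kwargs):
            self.actions = actions
            self.qValues = {}                      # (state, action) -> Q-value

        def computeQ(self, state, action):
            # States are assumed to be tuples whose components lie in [0, 1].
            return self.qValues.get((state, action), 0.0)

        def computeOptimalAction(self, state):
            # Action with maximal Q-value for the given state.
            return max(self.actions, key=lambda a: self.computeQ(state, a))

        def computeV(self, state):
            # V(s) = max_a Q(s, a)
            return max(self.computeQ(state, a) for a in self.actions)

        def train(self, trainingSet):
            # trainingSet: {(state, action): desired Q-value}
            for (state, action), qTarget in trainingSet.items():
                self.qValues[(state, action)] = qTarget

    fa = DictionaryFA(stateSpace=None, actions=["left", "right"])
    fa.train({((0.2, 0.7), "left"): 1.5, ((0.2, 0.7), "right"): -0.5})
    print(fa.computeQ((0.2, 0.7), "left"))         # 1.5
    print(fa.computeOptimalAction((0.2, 0.7)))     # left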

Cerebellar Model Articulation Controller

The Cerebellar Model Articulation Controller (CMAC) function approximator.

class resources.function_approximators.cmac.CMAC(stateSpace, actions, number_of_tilings, learning_rate, default, **kwargs)

The Cerebellar Model Articulation Controller (CMAC) function approximator.

CONFIG DICT
number_of_tilings:: The number of independent tilings that are used in each tile coding
default:: The default value that an entry stored in the function approximator has initially
learning_rate:: The learning rate used internally in the updates
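
The following standalone sketch illustrates the tile-coding idea behind a CMAC. The class TileCodingSketch, the tiles_per_dim resolution, and the equal split of the error across tilings are assumptions made for the example, not MMLF's implementation:

    from collections import defaultdict

    class TileCodingSketch(object):
        def __init__(self, actions, number_of_tilings=5, tiles_per_dim=10,
                     learning_rate=0.5, default=0.0):
            self.actions = actions
            self.numTilings = number_of_tilings
            self.tilesPerDim = tiles_per_dim
            self.learningRate = learning_rate
            # (tiling index, tile coordinates, action) -> weight
            self.weights = defaultdict(lambda: default)

        def _activeTiles(self, state):
            # Each tiling is shifted by a fraction of a tile width;
            # state components are assumed to lie in [0, 1].
            for tiling in range(self.numTilings):
                offset = float(tiling) / (self.numTilings * self.tilesPerDim)
                yield tiling, tuple(int((s + offset) * self.tilesPerDim) for s in state)

        def computeQ(self, state, action):
            # Q is the sum of the weights of the active tiles, one per tiling.
            return sum(self.weights[(tiling, tile, action)]
                       for tiling, tile in self._activeTiles(state))

        def train(self, trainingSet):
            for (state, action), target in trainingSet.items():
                error = target - self.computeQ(state, action)
                step = self.learningRate * error / self.numTilings
                for tiling, tile in self._activeTiles(state):
                    self.weights[(tiling, tile, action)] += step

    fa = TileCodingSketch(actions=[0, 1], number_of_tilings=4)
    fa.train({((0.3, 0.6), 0): 1.0})
    print(fa.computeQ((0.3, 0.6), 0))    # 0.5: one update moves Q halfway to the target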

K-Nearest Neighbors

Function approximator based on k-Nearest-Neighbor interpolation.

class resources.function_approximators.knn.KNNFunctionApproximator(stateSpace, actions, k=10, b_X=0.1, **kwargs)

Function approximator based on k-Nearest-Neighbor interpolation

A function approximator that stores a given set of (state, action) -> Q-value samples. The sample set is split into subsets, one for each action (thus, a discrete, finite set of actions is assumed). When the Q-value of a state-action pair is queried, the k states most similar to the query state are extracted (under the constraint that the query action is the action applied in these states). The Q-value of the query state-action pair is computed as a weighted linear combination of the k extracted samples, where the weighting is based on the distance between the respective state and the query state. The weight of a sample is computed as exp(-(distance/b_X)**2), where b_X is a parameter that influences the generalization breadth. Smaller values of b_X correspond to increased weight of more similar states.

CONFIG DICT
k:: The number of neighbors considered in k-Nearest Neighbors
b_X:: The width of the Gaussian weighting function. Smaller values of b_X correspond to increased weight of more similar states.
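
The weighting scheme described above can be illustrated with a small standalone sketch; the function knn_q_value and its sample format are hypothetical and not MMLF's API:

    import math

    def knn_q_value(samples, query_state, query_action, k=10, b_X=0.1):
        """samples: list of (state, action, qValue), states scaled into [0, 1]."""
        # Only samples whose action matches the query action are considered.
        candidates = [(s, q) for (s, a, q) in samples if a == query_action]
        if not candidates:
            return 0.0

        def distance(s):
            return math.sqrt(sum((si - qi) ** 2 for si, qi in zip(s, query_state)))

        nearest = sorted(candidates, key=lambda sq: distance(sq[0]))[:k]
        weights = [math.exp(-(distance(s) / b_X) ** 2) for s, _ in nearest]
        total = sum(weights)
        if total == 0.0:                     # all neighbors are extremely far away
            return sum(q for _, q in nearest) / len(nearest)
        return sum(w * q for w, (_, q) in zip(weights, nearest)) / total

    samples = [((0.10, 0.1), "a", 1.0),
               ((0.15, 0.1), "a", 2.0),
               ((0.90, 0.9), "a", -1.0)]
    print(knn_q_value(samples, (0.12, 0.1), "a", k=2, b_X=0.1))   # ~1.49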

Linear Combination

The linear combination function approximator

This module defines the linear combination function approximator. It computes the Q-value as the dot product of the feature vector and a weight vector. Its main application area is discrete worlds; however, given appropriate features, it can also be used in continuous worlds.

class resources.function_approximators.linear_combination.LinearCombination(stateSpace, actions, learning_rate=1.0, **kwargs)

The linear combination function approximator.

This class implements the function approximator interface. It computes the Q-value as the dot product of the feature vector and a weight vector. Its main application area is discrete worlds; however, given appropriate features, it can also be used in continuous worlds. At the moment, it ignores the planned action since it is assumed that it is used in combination with minimax tree search.

CONFIG DICT
learning_rate:: The learning rate used internally in the updates.
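
A minimal sketch of this idea, assuming a hand-crafted feature vector and a plain gradient-descent update; the names and details are illustrative, not MMLF's implementation:

    class LinearCombinationSketch(object):
        def __init__(self, num_features, learning_rate=1.0):
            self.weights = [0.0] * num_features
            self.learningRate = learning_rate

        def computeQ(self, features):
            # Q is the dot product of the feature vector and the weight vector.
            return sum(f * w for f, w in zip(features, self.weights))

        def train(self, trainingSet):
            # trainingSet: {feature tuple: desired Q-value}; one gradient step each.
            for features, target in trainingSet.items():
                error = target - self.computeQ(features)
                self.weights = [w + self.learningRate * error * f
                                for w, f in zip(self.weights, features)]

    fa = LinearCombinationSketch(num_features=3, learning_rate=0.1)
    fa.train({(1.0, 0.0, 0.5): 2.0})
    print(fa.computeQ((1.0, 0.0, 0.5)))    # 0.25 after a single gradient step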

Multi-layer Perceptron (MLP)

This module defines a multi-layer perceptron (MLP) function approximator.

class resources.function_approximators.mlp.MLP(stateSpace, actions, **kwargs)

Multi-Layer Perceptron function approximator.

Multilinear Grid

The multilinear grid function approximator.

In this function approximator, the state space is spanned by a regular grid. For each action a separate grid is spanned. The value of a certain state is determined by computing the grid cell it lies in and multilinearly interpolating from the cell corners to the particular state.

class resources.function_approximators.multilinear_grid.MultilinearGrid(stateSpace, actions, learning_rate, default, **kwargs)

The multilinear grid function approximator.

In this function approximator, the state space is spanned by a regular grid. For each action a separate grid is spanned. The value of a certain state is determined by computing the grid cell it lies in and multilinearly interpolating from the cell corners to the particular state.

CONFIG DICT
default:: The default value that an entry stored in the function approximator has initially.
learning_rate:: The learning rate used internally in the updates.
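
The interpolation step can be sketched for the two-dimensional case as follows; the function bilinear_value, the grid resolution, and the corner lookup are illustrative assumptions, not MMLF's implementation:

    def bilinear_value(corner_values, state, cells_per_dim=10):
        """corner_values: dict mapping a grid node (i, j) to its stored value;
        state: (x, y) with both components scaled into [0, 1]."""
        x, y = state[0] * cells_per_dim, state[1] * cells_per_dim
        i, j = min(int(x), cells_per_dim - 1), min(int(y), cells_per_dim - 1)
        fx, fy = x - i, y - j                      # position inside the cell
        v00 = corner_values.get((i, j), 0.0)
        v10 = corner_values.get((i + 1, j), 0.0)
        v01 = corner_values.get((i, j + 1), 0.0)
        v11 = corner_values.get((i + 1, j + 1), 0.0)
        # Blend the four corner values according to the position in the cell.
        return ((1 - fx) * (1 - fy) * v00 + fx * (1 - fy) * v10
                + (1 - fx) * fy * v01 + fx * fy * v11)

    corners = {(0, 0): 0.0, (1, 0): 1.0, (0, 1): 1.0, (1, 1): 2.0}
    print(bilinear_value(corners, (0.05, 0.05)))   # 1.0: centre of the first cell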

QCON

This module defines the QCON function approximator

class resources.function_approximators.qcon.QCON(stateSpace, actions, hidden, learning_rate, **kwargs)

Function approximator based on the connectionist QCON architecture

This class implements the QCON architecture, which consists of a connectionist Q-learning model in which each action has a separate network. The feed-forward neural networks are implemented using the Python package ffnet.

CONFIG DICT
hidden:: The number of neurons in the hidden layer of the multi-layer perceptron
learning_rate:: The learning rate used internally in the updates.
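
The "one network per action" structure can be sketched as follows; the tiny hand-rolled perceptron below merely stands in for the ffnet networks that MMLF actually uses, and all names are hypothetical:

    import math, random

    class TinyNet(object):
        """One-hidden-layer perceptron with tanh units, trained by plain SGD."""
        def __init__(self, n_in, hidden, learning_rate):
            rnd = random.Random(0)
            self.w1 = [[rnd.uniform(-0.5, 0.5) for _ in range(n_in)]
                       for _ in range(hidden)]
            self.w2 = [rnd.uniform(-0.5, 0.5) for _ in range(hidden)]
            self.lr = learning_rate

        def forward(self, x):
            self.h = [math.tanh(sum(w * xi for w, xi in zip(row, x)))
                      for row in self.w1]
            return sum(w * h for w, h in zip(self.w2, self.h))

        def update(self, x, target):
            error = target - self.forward(x)
            for j, hj in enumerate(self.h):
                grad_h = error * self.w2[j] * (1 - hj * hj)
                self.w2[j] += self.lr * error * hj
                self.w1[j] = [w + self.lr * grad_h * xi
                              for w, xi in zip(self.w1[j], x)]

    class QCONSketch(object):
        def __init__(self, actions, n_in, hidden=8, learning_rate=0.1):
            # One separate network per action, as in the QCON architecture.
            self.nets = dict((a, TinyNet(n_in, hidden, learning_rate)) for a in actions)

        def computeQ(self, state, action):
            return self.nets[action].forward(state)

        def train(self, trainingSet):
            for (state, action), target in trainingSet.items():
                self.nets[action].update(state, target)

    qcon = QCONSketch(actions=["left", "right"], n_in=2, hidden=4, learning_rate=0.1)
    qcon.train({((0.2, 0.8), "left"): 1.0})
    print(qcon.computeQ((0.2, 0.8), "left"))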

Radial Basis Function

This module defines the Radial Basis Function (RBF) function approximator.

class resources.function_approximators.rbf.RBF_FA(stateSpace, actions, learning_rate, **kwargs)

The Radial Basis Function function approximator

This class implements the function approximator interface using radial basis functions. An RBF function approximator is composed of several radial basis functions, one for each action.

CONFIG DICT
learning_rate:: The learning rate used internally in the updates.
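
A minimal sketch of the underlying idea for a single action (the documented approximator keeps one such model per action); the Gaussian kernel, the fixed centers, and the gradient-descent update are assumptions made for illustration:

    import math

    class RBFSketch(object):
        def __init__(self, centers, sigma=0.25, learning_rate=0.5):
            self.centers = centers                 # points in the scaled state space
            self.sigma = sigma
            self.weights = [0.0] * len(centers)
            self.lr = learning_rate

        def _activations(self, state):
            # Gaussian basis functions: exp(-||state - center||^2 / (2 * sigma^2))
            return [math.exp(-sum((s - c) ** 2 for s, c in zip(state, center))
                             / (2 * self.sigma ** 2))
                    for center in self.centers]

        def computeQ(self, state):
            return sum(w * phi
                       for w, phi in zip(self.weights, self._activations(state)))

        def train(self, trainingSet):
            # trainingSet: {state tuple: desired Q-value}; LMS update of the weights.
            for state, target in trainingSet.items():
                phis = self._activations(state)
                error = target - self.computeQ(state)
                self.weights = [w + self.lr * error * phi
                                for w, phi in zip(self.weights, phis)]

    fa = RBFSketch(centers=[(0.0,), (0.5,), (1.0,)])
    fa.train({(0.5,): 1.0})
    print(fa.computeQ((0.5,)))    # ~0.52 after a single update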

Tabular Storage

This module defines the tabular storage function approximator

The tabular storage function approximator can be used for discrete worlds. Strictly speaking, it is not really a function approximator; it stores the value function exactly.

class resources.function_approximators.tabular_storage.TabularStorage(actions=None, default=0, learning_rate=1.0, stateSpace=None, **kwargs)

Function approximator for small, discrete environments.

This class implements the function approximator interface. It does not really approximate but simply stores the values in a table. Thus, it should not be applied in environments with continuous states.

CONFIG DICT
default:: The default value that an entry stored in the function approximator has initially.
learning_rate:: The learning rate used internally in the updates.
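
A minimal sketch of this behaviour (class name and details are illustrative): values are stored per (state, action) pair and blended toward the training target according to the learning rate:

    class TabularSketch(object):
        def __init__(self, actions, default=0.0, learning_rate=1.0):
            self.actions = actions
            self.default = default
            self.lr = learning_rate
            self.table = {}                        # (state, action) -> stored value

        def computeQ(self, state, action):
            return self.table.get((state, action), self.default)

        def train(self, trainingSet):
            # Move each stored value toward the target; lr = 1.0 overwrites it.
            for (state, action), target in trainingSet.items():
                old = self.computeQ(state, action)
                self.table[(state, action)] = old + self.lr * (target - old)

    fa = TabularSketch(actions=[0, 1], learning_rate=0.5)
    fa.train({((2, 3), 0): 10.0})
    print(fa.computeQ((2, 3), 0))    # 5.0: halfway from the default 0.0 to 10.0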