Abstract base class for planners
This module contains the abstract base class “Planner” for planning algorithms, i.e. algorithms for computing the optimal state-action value function (and thus the optimal policy) for a given model (i.e. state transition and expected reward function). Subclasses of Planner must implement the “plan” method.
New in version 0.9.9.
Factory method that creates a planner based on a specification dictionary.
Returns a dict that maps planner names to planner classes.
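The factory pattern described above might look roughly as follows; the planner class names, helper functions, and spec-dictionary keys used here are illustrative assumptions rather than the library's actual API:

    class ValueIterationPlanner:                 # placeholder planner classes
        def __init__(self, **params):
            self.params = params

    class PrioritizedSweepingPlanner:
        def __init__(self, **params):
            self.params = params

    def getPlannerDict():
        # Mapping from planner name to planner class.
        return {"ValueIteration": ValueIterationPlanner,
                "PrioritizedSweeping": PrioritizedSweepingPlanner}

    def createPlanner(spec, **kwargs):
        # Create a planner from a spec dictionary such as
        # {"name": "ValueIteration", "gamma": 0.99}.
        spec = dict(spec)                        # copy so the name can be popped
        plannerClass = getPlannerDict()[spec.pop("name")]
        return plannerClass(**spec, **kwargs)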
Sets the discrete states on which dynamic programming is performed.
Removes Q-values of state-action pairs that are no longer required. NOTE: Setting states is only meaningful for discrete state sets, where the TabularStorage function approximator is used.
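A minimal sketch of what a Planner subclass could look like; the constructor signature, attribute names, and the setStates behaviour shown here are assumptions for illustration, not the library's actual interface:

    from abc import ABC, abstractmethod

    class Planner(ABC):
        def __init__(self, stateSpace, functionApproximator, gamma, actions):
            self.stateSpace = stateSpace
            self.functionApproximator = functionApproximator   # stores the Q-values
            self.gamma = gamma
            self.actions = actions
            self.states = []

        def setStates(self, states):
            # Restrict dynamic programming to the given discrete states; only
            # meaningful with a tabular Q-function (TabularStorage).
            self.states = list(states)

        @abstractmethod
        def plan(self, stateTransitionFct, rewardFct):
            """Compute (an approximation of) the optimal Q-function."""

    class MyPlanner(Planner):
        def plan(self, stateTransitionFct, rewardFct):
            pass   # a concrete planner would update self.functionApproximator here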
A planner module which is used in the MBDPS agent.
The planner uses a policy search method to search in the space of policies. Policies are evaluated based on the return they achieve in trajectories sampled from a supplied model. A policy's fitness is set to the average return it obtains over several episodes starting from potentially different start states.
Defaults to the module’s function estimatePolicyOutcome.
New in version 0.9.9.
gamma: The discount factor.
planningEpisodes: The number of simulated episodes that can be conducted before planning is stopped.
evalsPerIndividual: The number of episodes for which each policy is evaluated.
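A minimal sketch of how such a fitness evaluation might be implemented; the function name, signature, and the model and policy interfaces (sampleStartState, sampleSuccessor, evaluate) are assumptions, and the module's actual default, estimatePolicyOutcome, may differ:

    def estimatePolicyFitness(policy, model, gamma, episodeLength, evalsPerIndividual):
        # A policy's fitness is its average return over several simulated episodes.
        returns = []
        for _ in range(evalsPerIndividual):
            state = model.sampleStartState()          # potentially different start states
            episodeReturn, discount = 0.0, 1.0
            for _ in range(episodeLength):
                action = policy.evaluate(state)       # action chosen by the policy
                state, reward = model.sampleSuccessor(state, action)
                episodeReturn += discount * reward    # discounted return with factor gamma
                discount *= gamma
            returns.append(episodeReturn)
        return sum(returns) / len(returns)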
Planning based on prioritized sweeping.
A planner based on the prioritized sweeping algorithm that allows computing the optimal state-action value function (and thus the optimal policy) for a given distribution model (i.e. state transition and expected reward function). It is assumed that the MDP is finite and that the available actions are defined explicitly.
stateSpace: The state space of the agent (must be finite)
The Q-function
gamma: The discount factor of the MDP
actions: The actions available to the agent
New in version 0.9.9.
minSweepDelta: The minimal TD error that is considered during prioritized sweeping. If no change larger than minSweepDelta remains, the sweep is stopped.
updatesPerStep: The maximal number of updates that can be performed in one sweep.
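The sweep described above can be sketched roughly as follows, assuming the distribution model is given as dictionaries of transition probabilities and expected rewards; all names and data structures are illustrative, not the class's actual implementation:

    import heapq

    def prioritizedSweep(Q, P, R, actions, gamma, predecessors, queue,
                         minSweepDelta, updatesPerStep):
        # queue is a heap of (-tdError, state, action); predecessors[s] lists the
        # state-action pairs that can lead to state s. Both are illustrative.

        def backupValue(s, a):
            # Full Bellman backup under the distribution model.
            return R[(s, a)] + gamma * sum(
                prob * max(Q[(s2, b)] for b in actions)
                for s2, prob in P[(s, a)].items())

        for _ in range(updatesPerStep):
            if not queue:
                break
            negPriority, s, a = heapq.heappop(queue)       # largest TD error first
            if -negPriority <= minSweepDelta:
                break                                      # remaining changes are too small
            Q[(s, a)] = backupValue(s, a)
            # Predecessors of s may now have a larger TD error; re-queue them.
            for (sp, ap) in predecessors[s]:
                delta = abs(backupValue(sp, ap) - Q[(sp, ap)])
                if delta > minSweepDelta:
                    heapq.heappush(queue, (-delta, sp, ap))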
Planning based on trajectory sampling.
This module contains a planner based on the trajectory sampling algorithm that allows computing an improved state-action value function (and thus an improved policy) for a given sample model. The MDP's state space need not be discrete and finite, but it is assumed that there is only a finite number of actions, which are defined explicitly.
Planning based on trajectory sampling.
A planner based on the trajectory sampling algorithm that allows computing an improved state-action value function (and thus an improved policy) for a given sample model. The MDP's state space need not be discrete and finite, but it is assumed that there is only a finite number of actions, which are defined explicitly.
stateSpace: The state space of the agent
The Q-function
gamma: The discount factor of the MDP
actions: The actions available to the agent
New in version 0.9.9.
maxTrajectoryLength: The maximal length of a trajectory before a new trajectory is started.
updatesPerStep: The maximal number of updates that can be performed in one planning call.
onPolicy: Whether the trajectory is sampled from the on-policy distribution.
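A rough sketch of one planning call under these parameters; the sample model, policy, and Q-function interfaces used here (sampleStartState, sampleSuccessor, evaluate, getValue, train) are hypothetical stand-ins:

    import random

    def planWithTrajectorySampling(Q, model, policy, actions, gamma,
                                   maxTrajectoryLength, updatesPerStep, onPolicy):
        updates = 0
        while updates < updatesPerStep:
            state = model.sampleStartState()               # start a new trajectory
            for _ in range(maxTrajectoryLength):
                if onPolicy:
                    action = policy.evaluate(state)        # on-policy distribution
                else:
                    action = random.choice(actions)        # e.g. uniform exploration
                successor, reward = model.sampleSuccessor(state, action)
                # Sample backup towards the greedy value of the successor state.
                target = reward + gamma * max(Q.getValue(successor, b) for b in actions)
                Q.train(state, action, target)             # update the Q-function
                updates += 1
                if updates >= updatesPerStep:
                    break
                state = successor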
Planning based on value iteration
This module contains a planner based on the value iteration algorithm that allows computing the optimal state-action value function (and thus the optimal policy) for a given distribution model (i.e. state transition and expected reward function). It is assumed that the MDP is finite and that the available actions are defined explicitly.
stateSpace: The state space of the agent (must be finite)
The Q-function
gamma: The discount factor of the MDP
actions: The actions available to the agent
New in version 0.9.9.
minimalBellmanError: The minimal Bellman error (sum of TD errors over all state-action pairs) that enforces another iteration. If the Bellman error falls below this threshold, value iteration is stopped.
maxIterations: The maximum number of iterations before value iteration is stopped.
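Value iteration with these two stopping criteria can be sketched as follows for a finite MDP; the data layout (dictionaries of transition probabilities and expected rewards) and the function signature are assumptions made for illustration:

    def valueIteration(states, actions, P, R, gamma, minimalBellmanError, maxIterations):
        # P[(s, a)] maps successor states to probabilities, R[(s, a)] is the
        # expected reward; both are illustrative representations of the model.
        Q = {(s, a): 0.0 for s in states for a in actions}
        for _ in range(maxIterations):
            bellmanError = 0.0
            for s in states:
                for a in actions:
                    target = R[(s, a)] + gamma * sum(
                        prob * max(Q[(s2, b)] for b in actions)
                        for s2, prob in P[(s, a)].items())
                    bellmanError += abs(target - Q[(s, a)])   # sum of TD errors
                    Q[(s, a)] = target
            if bellmanError < minimalBellmanError:
                break   # no further iteration is enforced
        return Q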