This tutorial explains how to let an agent learn in an environment using the MMLF's command line interface. It assumes that you have already installed the MMLF successfully.
Let's assume you want to test the TD Lambda agent in the Mountain Car environment. Starting this takes essentially a single command:
run_mmlf --config mountain_car/world_td_lambda_exploration.yaml
or for unix users with a local MMLF installation:
./run_mmlf --config mountain_car/world_td_lambda_exploration.yaml
This will start the MMLF and execute the world defined in the world_td_lambda_exploration.yaml file.
If this is your very first run of the MMLF, the MMLF will create the so-called “rw-directory” for your user. The rw-directory is where the MMLF stores the configurations of all worlds, the log files, etc. By default this directory is $HOME/.mmlf (under MS Windows, $USERPROFILE is used in place of $HOME). If you want to change the configuration of a world, edit it in the rw-directory (not in /etc/mmlf). To use a different directory as the rw-directory, specify it with the --rwpath option. For instance, run_mmlf --config mountain_car/world_td_lambda_exploration.yaml --rwpath /home/someuser/Temp/mmlfrw uses the directory /home/someuser/Temp/mmlfrw as the rw-directory. Note that the MMLF does not remember this path; you must specify it every time you invoke the MMLF.
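The rules for locating the rw-directory can be summarized in a few lines of Python. This is a minimal sketch that encodes only the behaviour stated above (an explicit --rwpath wins, otherwise $HOME/.mmlf, with $USERPROFILE standing in for $HOME on MS Windows); the function resolve_rw_path is hypothetical and not part of the MMLF.

```python
import ntpath
import posixpath
from typing import Mapping, Optional


def resolve_rw_path(env: Mapping[str, str], platform: str,
                    rwpath: Optional[str] = None) -> str:
    """Hypothetical helper: return the rw-directory as described in this tutorial."""
    if rwpath is not None:
        # --rwpath is not remembered, so it applies only to the current invocation.
        return rwpath
    if platform.startswith("win"):
        # On MS Windows, $USERPROFILE takes the place of $HOME.
        return ntpath.join(env["USERPROFILE"], ".mmlf")
    return posixpath.join(env["HOME"], ".mmlf")


if __name__ == "__main__":
    print(resolve_rw_path({"HOME": "/home/someuser"}, "linux"))  # prints /home/someuser/.mmlf
```

In practice you would pass os.environ and sys.platform; the example uses a fixed environment so the output is predictable.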
Once the MMLF rw-directory has been created, the world is started. Some information is printed first, such as the state and action spaces that the agent receives from the environment. Then the agent starts to act in the environment, and the environment prints information about how the agent performs:
2011-02-15 11:12:39,700 FrameworkLog INFO Using MMLF RW area /home/jmetzen/.mmlf
2011-02-15 11:12:39,700 FrameworkLog INFO Loading world from config file mountain_car/world_td_lambda_exploration.yaml.
2011-02-15 11:12:40,642 AgentLog INFO TDLambdaAgent got new state-space: StateSpace:
        position : ContinuousDimension(LimitType: soft, Limits: (-1.200,0.600))
        velocity : ContinuousDimension(LimitType: soft, Limits: (-0.070,0.070))
2011-02-15 11:12:40,646 AgentLog INFO TDLambdaAgent got new action-space: ActionSpace:
        thrust : DiscreteDimension(Values: ['left', 'right', 'none'])
2011-02-15 11:12:59,130 EnvironmentLog INFO No goal reached but 500 steps expired!
2011-02-15 11:13:13,181 EnvironmentLog INFO No goal reached but 500 steps expired!
2011-02-15 11:13:22,174 EnvironmentLog INFO No goal reached but 500 steps expired!
2011-02-15 11:13:28,678 EnvironmentLog INFO Goal reached after 202 steps!
2011-02-15 11:13:28,797 EnvironmentLog INFO Goal reached after 6 steps!
2011-02-15 11:13:31,143 EnvironmentLog INFO Goal reached after 137 steps!
....
This shows that the agent wasn’t able to reach the goal during the first episodes, but over time it reaches the goal more frequently and faster. You can watch the agent for a while to see whether its performance keeps improving. To stop the world, press Ctrl-C.
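To follow this trend without reading the console by eye, you can count the episode-end messages. The following minimal Python sketch assumes only the message wording visible in the sample output above ("Goal reached after N steps!" and "No goal reached but N steps expired!"); the functions episode_outcomes and success_rate are hypothetical helpers, not part of the MMLF.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Patterns for the two EnvironmentLog messages shown in the sample output.
GOAL_RE = re.compile(r"Goal reached after (\d+) steps!")
EXPIRED_RE = re.compile(r"No goal reached but (\d+) steps expired!")


def episode_outcomes(lines: Iterable[str]) -> List[Tuple[bool, int]]:
    """Return (goal_reached, steps) for every episode-end message in `lines`."""
    outcomes = []
    for line in lines:
        match = GOAL_RE.search(line)
        if match:
            outcomes.append((True, int(match.group(1))))
            continue
        match = EXPIRED_RE.search(line)
        if match:
            outcomes.append((False, int(match.group(1))))
    return outcomes


def success_rate(outcomes: List[Tuple[bool, int]]) -> Optional[float]:
    """Fraction of episodes that reached the goal (None if there are none)."""
    if not outcomes:
        return None
    return sum(1 for reached, _ in outcomes if reached) / len(outcomes)


if __name__ == "__main__":
    # The six episode messages from the sample output (timestamps omitted).
    sample = ["EnvironmentLog INFO No goal reached but 500 steps expired!"] * 3 + [
        "EnvironmentLog INFO Goal reached after 202 steps!",
        "EnvironmentLog INFO Goal reached after 6 steps!",
        "EnvironmentLog INFO Goal reached after 137 steps!",
    ]
    outcomes = episode_outcomes(sample)
    print(success_rate(outcomes[:3]), success_rate(outcomes[3:]))  # 0.0 1.0
```

Console output is only a quick check; for systematic analysis, the MMLF also writes episode lengths to its log files.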
Once you have stopped the learning, take a look at the MMLF rw-directory (the one created during your first run of the MMLF). It now contains two subdirectories: config and logs. The logs directory contains information about the run you just conducted. Among other things, the lengths of the episodes are stored in a separate log file that can be used for later analysis of the agent’s performance. To learn more about this, see the Logging page. In this tutorial, we focus only on the config directory.
The config directory contains configuration files for all worlds contained in the MMLF. Let's start with the world configuration file we just used for our first MMLF run, which is located in config/mountain_car. The file world_td_lambda_exploration.yaml contains the following:
worldPackage : mountain_car
environment:
    moduleName : "mcar_env"
    configDict:
        maxStepsPerEpisode : 500
        accelerationFactor : 0.001
        maxGoalVelocity : 0.07
        positionNoise : 0.0
        velocityNoise : 0.0
agent:
    moduleName : "td_lambda_agent"
    configDict:
        update_rule: SARSA
        gamma : 1.0
        epsilon : 0.01
        lambda : 0.95
        minTraceValue : 0.5
        stateDimensionResolution : 9
        actionDimensionResolution : 7
        function_approximator :
            name : 'CMAC'
            number_of_tilings : 10
            learning_rate : 0.5
            update_rule : 'exaggerator'
            default : 0.0
monitor:
    policyLogFrequency : 10
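The update_rule, gamma, lambda and minTraceValue entries configure a SARSA(λ) learner. As a rough illustration of what these parameters mean, here is a minimal tabular SARSA(λ) sketch with accumulating eligibility traces. It is not the MMLF's td_lambda_agent, which uses the CMAC function approximator configured above instead of a table; the class TabularSarsaLambda is hypothetical, and reusing learning_rate as the step size alpha and dropping traces that decay below minTraceValue are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Tuple

StateAction = Tuple[Hashable, Hashable]


class TabularSarsaLambda:
    """Illustrative tabular SARSA(lambda) with accumulating eligibility traces.

    Assumption: traces that fall below min_trace_value are discarded, which is
    how minTraceValue is interpreted here (not confirmed for the MMLF agent).
    """

    def __init__(self, gamma: float = 1.0, lam: float = 0.95,
                 alpha: float = 0.5, min_trace_value: float = 0.5) -> None:
        self.gamma = gamma
        self.lam = lam
        self.alpha = alpha
        self.min_trace_value = min_trace_value
        self.q: Dict[StateAction, float] = defaultdict(float)  # unseen pairs start at 0.0
        self.traces: Dict[StateAction, float] = {}

    def update(self, s: Hashable, a: Hashable, reward: float,
               s_next: Hashable, a_next: Hashable, terminal: bool = False) -> float:
        """Apply one SARSA(lambda) step and return the TD error."""
        target = reward if terminal else reward + self.gamma * self.q[(s_next, a_next)]
        delta = target - self.q[(s, a)]
        # Accumulating trace for the pair that was just visited.
        self.traces[(s, a)] = self.traces.get((s, a), 0.0) + 1.0
        for key in list(self.traces):
            # Every traced pair moves in proportion to its eligibility.
            self.q[key] += self.alpha * delta * self.traces[key]
            self.traces[key] *= self.gamma * self.lam
            if self.traces[key] < self.min_trace_value:
                del self.traces[key]  # assumption: prune short traces
        if terminal:
            self.traces.clear()  # traces do not carry over between episodes
        return delta
```

Each call computes the TD error δ = r + γQ(s′, a′) − Q(s, a) (just r at a terminal step), moves every traced value by α·δ·e, and then decays all traces by γλ. With gamma = 1.0 and lambda = 0.95, a trace starting at 1.0 stays above 0.5 for about 13 further steps.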
The world_td_lambda_exploration.yaml file specifies where the Python modules for the agent and the environment are located and which parameters to use for each of them. Furthermore, it specifies which information a module called “Monitor” periodically stores in the log directory (see Monitor for more details). The config directory contains several world specification files, for instance world_dps.yaml in the mountain_car directory:
worldPackage : mountain_car
environment:
    moduleName : "mcar_env"
    configDict:
        maxStepsPerEpisode : 500
        accelerationFactor : 0.001
        maxGoalVelocity : 0.07
        positionNoise : 0.0
        velocityNoise : 0.0
agent:
    moduleName : "dps_agent"
    configDict:
        policy_search :
            method: 'fixed_parametrization'
            policy:
                type: 'linear'
                numOfDuplications: 1
                bias: True
            optimizer:
                name: 'evolution_strategy'
                sigma: 1.0
                populationSize : 5
                evalsPerIndividual: 10
                numChildren: 10
monitor:
    policyLogFrequency : 10
    functionOverStateSpaceLogging:
        active : True
        logFrequency : 250
        stateDims : None
        rasterPoints : 50
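To check which sections of two world files differ, you can compare them as nested dictionaries. In this minimal sketch the two configurations are typed in by hand and abbreviated so that only the standard library is needed (with PyYAML installed, yaml.safe_load could build such dictionaries from the files); the function differing_paths is hypothetical.

```python
from typing import Any, List


def differing_paths(a: Any, b: Any, prefix: str = "") -> List[str]:
    """Return dotted key paths at which two nested config dicts differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        paths: List[str] = []
        for key in sorted(set(a) | set(b)):
            path = f"{prefix}.{key}" if prefix else key
            if key not in a or key not in b:
                paths.append(path)  # key present in only one file
            else:
                paths.extend(differing_paths(a[key], b[key], path))
        return paths
    return [] if a == b else [prefix]


# Abbreviated hand-copies of the two mountain_car world files.
td_lambda_world = {
    "worldPackage": "mountain_car",
    "environment": {"moduleName": "mcar_env",
                    "configDict": {"maxStepsPerEpisode": 500, "accelerationFactor": 0.001,
                                   "maxGoalVelocity": 0.07, "positionNoise": 0.0,
                                   "velocityNoise": 0.0}},
    "agent": {"moduleName": "td_lambda_agent",
              "configDict": {"update_rule": "SARSA", "gamma": 1.0,
                             "epsilon": 0.01, "lambda": 0.95}},
    "monitor": {"policyLogFrequency": 10},
}
dps_world = {
    "worldPackage": "mountain_car",
    "environment": {"moduleName": "mcar_env",
                    "configDict": {"maxStepsPerEpisode": 500, "accelerationFactor": 0.001,
                                   "maxGoalVelocity": 0.07, "positionNoise": 0.0,
                                   "velocityNoise": 0.0}},
    "agent": {"moduleName": "dps_agent",
              "configDict": {"policy_search": {"method": "fixed_parametrization"}}},
    "monitor": {"policyLogFrequency": 10,
                "functionOverStateSpaceLogging": {"active": True}},
}

if __name__ == "__main__":
    for path in differing_paths(td_lambda_world, dps_world):
        print(path)
```

Every reported path lies under agent or monitor (for example agent.moduleName and monitor.functionOverStateSpaceLogging); none lies under environment.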
As you can see, the environment sections of the two configuration files are identical, but the agents differ. This world is started with a command similar to the first one:
run_mmlf --config mountain_car/world_dps.yaml
This will start a different learning agent (one using a Direct Policy Search algorithm for learning) and let it learn the mountain car task.
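The DPS agent in world_dps.yaml uses an optimizer named evolution_strategy with the parameters sigma, populationSize, evalsPerIndividual and numChildren. The following minimal (μ, λ) evolution strategy illustrates what such parameters typically control. It is a generic textbook variant, not the MMLF's optimizer, and mapping populationSize to the number of parents and numChildren to the number of offspring per generation is an assumption. In direct policy search, each candidate vector would hold policy parameters (for instance the weights of the linear policy) and its fitness would be the return the policy achieves in the environment; here the caller supplies an arbitrary fitness function.

```python
import random
from typing import Callable, List, Sequence


def evolution_strategy(fitness: Callable[[Sequence[float]], float], dim: int,
                       population_size: int = 5, num_children: int = 10,
                       sigma: float = 1.0, evals_per_individual: int = 10,
                       generations: int = 50, seed: int = 0) -> List[float]:
    """Maximise `fitness` with a simple (mu, lambda) evolution strategy."""
    if num_children < population_size:
        raise ValueError("need at least as many children as parents")
    rng = random.Random(seed)

    def average_fitness(candidate: List[float]) -> float:
        # Averaging several evaluations reduces the effect of noisy returns.
        total = sum(fitness(candidate) for _ in range(evals_per_individual))
        return total / evals_per_individual

    # Initial parents are drawn around the origin.
    parents = [[rng.gauss(0.0, sigma) for _ in range(dim)]
               for _ in range(population_size)]
    for _ in range(generations):
        # Each child is a randomly chosen parent plus Gaussian noise of scale sigma.
        children = [[x + rng.gauss(0.0, sigma) for x in rng.choice(parents)]
                    for _ in range(num_children)]
        # Comma selection: the best children replace all parents.
        children.sort(key=average_fitness, reverse=True)
        parents = children[:population_size]
    return parents[0]
```

For example, evolution_strategy(lambda w: -sum((x - 3.0) ** 2 for x in w), dim=2) searches for the maximum of a toy quadratic at (3, 3). Because sigma stays fixed in this sketch, it only gets close to the optimum rather than converging exactly.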
If you would rather experiment further with the td_lambda_agent, simply edit the agent section of world_td_lambda_exploration.yaml. The interesting part of this configuration file is the agent’s configDict dictionary, which contains the parameter values used by the agent. For instance, the agent in world_td_lambda_exploration.yaml uses a discount factor gamma of 1.0 and follows an epsilon-greedy policy with epsilon=0.01 (if you are unfamiliar with the concepts behind this agent, check out the excellent (and free) book by Sutton and Barto). You can now modify the parameters and see how they influence the learning performance. For instance, set epsilon to 0.0 to get an agent that always acts greedily, and save the file as world_td_lambda_no_exploration.yaml. To start this world, simply run
run_mmlf --config mountain_car/world_td_lambda_no_exploration.yaml
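The role of epsilon can be illustrated with a minimal sketch of epsilon-greedy action selection over the thrust values 'left', 'right' and 'none' that the agent reported in the log output. This is the textbook rule described by Sutton and Barto, not the MMLF's own code; the function epsilon_greedy is hypothetical.

```python
import random
from typing import Dict, Sequence


def epsilon_greedy(q_values: Dict[str, float], actions: Sequence[str],
                   epsilon: float, rng: random.Random) -> str:
    """With probability epsilon pick a uniformly random action, else a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(list(actions))
    best = max(q_values[a] for a in actions)
    # Break ties between equally valued actions at random.
    return rng.choice([a for a in actions if q_values[a] == best])
```

With epsilon=0.01, the agent takes a uniformly random action on roughly one step in a hundred; with epsilon=0.0, as in world_td_lambda_no_exploration.yaml, the random branch is never taken and the agent always picks a highest-valued action.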
If you want to run a world only for a certain number of episodes (say 100), add the --episodes option on the command line:
run_mmlf --config mountain_car/world_td_lambda_exploration.yaml --episodes 100
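If you run many such experiments, a small script can create the configuration variants and assemble the command lines. This minimal sketch uses only the run_mmlf options shown in this tutorial (--config, --episodes, --rwpath); the helpers with_epsilon and build_command are hypothetical, and with_epsilon assumes the world file contains exactly one line of the form "epsilon : <value>".

```python
import re
from typing import List, Optional

# Matches an "epsilon : <value>" line, keeping its indentation and spacing.
EPSILON_RE = re.compile(r"^([ \t]*epsilon[ \t]*:[ \t]*)\S+", re.MULTILINE)


def with_epsilon(config_text: str, epsilon: float) -> str:
    """Return a copy of a world file's text with the agent's epsilon replaced.

    A plain text substitution, so the layout of the YAML file is preserved.
    """
    new_text, count = EPSILON_RE.subn(rf"\g<1>{epsilon}", config_text)
    if count != 1:
        raise ValueError(f"expected one epsilon entry, found {count}")
    return new_text


def build_command(config: str, episodes: Optional[int] = None,
                  rwpath: Optional[str] = None,
                  executable: str = "run_mmlf") -> List[str]:
    """Assemble a run_mmlf call from the options used in this tutorial."""
    cmd = [executable, "--config", config]
    if episodes is not None:
        cmd += ["--episodes", str(episodes)]
    if rwpath is not None:
        cmd += ["--rwpath", rwpath]
    return cmd


if __name__ == "__main__":
    cmd = build_command("mountain_car/world_td_lambda_no_exploration.yaml", episodes=100)
    print(" ".join(cmd))
```

To create the variant, read world_td_lambda_exploration.yaml from config/mountain_car in your rw-directory, pass its text through with_epsilon(text, 0.0), and write the result as world_td_lambda_no_exploration.yaml. The list returned by build_command can be passed directly to subprocess.run(cmd, check=True).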
You can now use the basic features of the MMLF. Starting other worlds is done similarly, for instance
run_mmlf --config single_pole_balancing/world_dps.yaml
will start the single-pole-balancing scenario with the DPS agent enabled.