Quick start (command line interface)

This tutorial explains how to let an agent learn in a given environment using the command line interface. It assumes that you have already installed the MMLF successfully.

Let's assume you just want to test the TD Lambda agent in the Mountain Car environment. Starting this is essentially a one-liner at the command line:

run_mmlf --config mountain_car/world_td_lambda_exploration.yaml

or, for Unix users with a local MMLF installation:

./run_mmlf --config mountain_car/world_td_lambda_exploration.yaml

This will start the MMLF and execute the world defined in the world_td_lambda_exploration.yaml file.


If this is your very first run of the MMLF, the MMLF creates the so-called “rw-directory” for your user. This is where the MMLF stores the configurations of all worlds, the log files, etc. By default, this directory is $HOME/.mmlf (under MS Windows, $USERPROFILE is used in place of $HOME). If you want to change the configuration of a world, edit it in the rw-directory (not in /etc/mmlf).

To use a different directory as the rw-directory, specify it with the option --rwpath. For instance, run_mmlf --config mountain_car/world_td_lambda_exploration.yaml --rwpath /home/someuser/Temp/mmlfrw uses the directory /home/someuser/Temp/mmlfrw as the rw-directory. Note that the MMLF does not remember this path; it must be specified every time the MMLF is invoked.
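The rules for locating the rw-directory can be summarized in a few lines. The following is a minimal sketch of the lookup order described above (explicit --rwpath first, otherwise $HOME/.mmlf, with $USERPROFILE standing in for $HOME on Windows). The function resolve_rw_dir is hypothetical and is not part of the MMLF API; it only mirrors the documented behavior.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_rw_dir(rwpath: Optional[str] = None,
                   env: Optional[Mapping[str, str]] = None,
                   windows: Optional[bool] = None) -> Path:
    """Return the rw-directory according to the rules in this tutorial.

    Hypothetical helper for illustration; not part of the MMLF itself.
    """
    if env is None:
        env = os.environ
    if windows is None:
        windows = os.name == "nt"
    # An explicit --rwpath always wins (and must be passed on every invocation).
    if rwpath is not None:
        return Path(rwpath)
    # Otherwise $HOME/.mmlf, where $USERPROFILE replaces $HOME on MS Windows.
    home = env["USERPROFILE"] if windows else env["HOME"]
    return Path(home) / ".mmlf"
```

For example, resolve_rw_dir("/home/someuser/Temp/mmlfrw") returns the custom path, while resolve_rw_dir() falls back to the default location in your home directory.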

Once the rw-directory has been created, the world is started. Some information is printed first, such as the state and action spaces that the agent receives from the environment. Then the agent starts to act in the environment, and the environment prints information about how the agent performs:

2011-02-15 11:12:39,700 FrameworkLog         INFO     Using MMLF RW area /home/jmetzen/.mmlf
2011-02-15 11:12:39,700 FrameworkLog         INFO     Loading world from config file mountain_car/world_td_lambda_exploration.yaml.
2011-02-15 11:12:40,642 AgentLog             INFO     TDLambdaAgent got new state-space:
                position        : ContinuousDimension(LimitType: soft, Limits: (-1.200,0.600))
                velocity        : ContinuousDimension(LimitType: soft, Limits: (-0.070,0.070))
2011-02-15 11:12:40,646 AgentLog             INFO     TDLambdaAgent got new action-space:
                thrust          : DiscreteDimension(Values: ['left', 'right', 'none'])
2011-02-15 11:12:59,130 EnvironmentLog       INFO     No goal reached but 500 steps expired!
2011-02-15 11:13:13,181 EnvironmentLog       INFO     No goal reached but 500 steps expired!
2011-02-15 11:13:22,174 EnvironmentLog       INFO     No goal reached but 500 steps expired!
2011-02-15 11:13:28,678 EnvironmentLog       INFO     Goal reached after 202 steps!
2011-02-15 11:13:28,797 EnvironmentLog       INFO     Goal reached after 6 steps!
2011-02-15 11:13:31,143 EnvironmentLog       INFO     Goal reached after 137 steps!


This shows that the agent did not reach the goal during the first episodes but, over time, finds its way to the goal more frequently and faster. You can watch the agent for a while and check whether its performance keeps improving. Stop the world by pressing Ctrl-C.
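If you copy the console output to a file, the episode outcomes can be extracted from the two EnvironmentLog messages shown above. This is a minimal sketch that assumes the message wording is exactly as in the example output; it uses plain regular expressions and no MMLF code. The function name episode_outcomes is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Patterns for the two episode-end messages printed by the environment above.
GOAL = re.compile(r"Goal reached after (\d+) steps!")
TIMEOUT = re.compile(r"No goal reached but (\d+) steps expired!")


def episode_outcomes(lines: Iterable[str]) -> List[Tuple[bool, int]]:
    """Return (goal_reached, steps) for every episode-end message in the log."""
    outcomes = []
    for line in lines:
        # Agent and framework messages never end an episode, so skip them.
        if "EnvironmentLog" not in line:
            continue
        goal = GOAL.search(line)
        if goal:
            outcomes.append((True, int(goal.group(1))))
            continue
        timeout = TIMEOUT.search(line)
        if timeout:
            outcomes.append((False, int(timeout.group(1))))
    return outcomes
```

Applied to the first timeout line and the first "Goal reached" line of the example output, this yields [(False, 500), (True, 202)].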

Once you have stopped the learning, take a look at the MMLF RW area (the rw-directory created during your first run of the MMLF). It now contains two subdirectories: config and logs. The logs directory contains information about the run you just conducted. Among other things, the episode lengths are stored in a separate log file that can be used for later analysis of the agent's performance. To learn more about this, see the Logging page. In this tutorial, we focus only on the config directory.
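A common first analysis of episode lengths is a moving average, which smooths out the large variance between single episodes. The sketch below assumes the lengths have already been loaded into a list of numbers; the format of the MMLF's episode-length log file is not described here, so reading it is left out. The function moving_average is hypothetical.

```python
from typing import List, Sequence


def moving_average(lengths: Sequence[float], window: int = 10) -> List[float]:
    """Average episode length over a sliding window of `window` episodes."""
    if window < 1:
        raise ValueError("window must be >= 1")
    averages = []
    running = 0.0
    for i, value in enumerate(lengths):
        # Keep a running sum so each step costs O(1) instead of O(window).
        running += value
        if i >= window:
            running -= lengths[i - window]
        if i >= window - 1:
            averages.append(running / window)
    return averages
```

With the six episode lengths from the example output, moving_average([500, 500, 500, 202, 6, 137], window=3) returns four values that fall from 500.0 to 115.0, reflecting the improvement described above.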

The config directory contains configuration files for all worlds shipped with the MMLF. Let's start with the world configuration file we used for our first MMLF run, which is located in config/mountain_car. The file world_td_lambda_exploration.yaml contains the following:

worldPackage : mountain_car
    moduleName : "mcar_env"
        maxStepsPerEpisode : 500    
        accelerationFactor : 0.001
        maxGoalVelocity : 0.07
        positionNoise : 0.0
        velocityNoise : 0.0
    moduleName : "td_lambda_agent"
        update_rule: SARSA
        gamma : 1.0
        epsilon : 0.01
        lambda : 0.95
        minTraceValue : 0.5
        stateDimensionResolution : 9
        actionDimensionResolution : 7
        function_approximator : 
            name : 'CMAC'
            number_of_tilings : 10
            learning_rate : 0.5
            update_rule : 'exaggerator'
            default : 0.0
    policyLogFrequency : 10

This file specifies where the Python modules for the agent and the environment are located and which parameters to use for each of them. Furthermore, it specifies which information a module called “Monitor” periodically stores in the log directory (see Monitor for details). The config directory contains several other world specification files, for instance world_dps.yaml in the mountain_car directory:
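The value passed to --config is the path of a world file relative to the config directory of the rw-directory (for example mountain_car/world_dps.yaml). Assuming that layout, the following sketch lists every world you can start; list_worlds is a hypothetical helper, and the world_*.yaml naming pattern is an assumption based on the files shown in this tutorial.

```python
from pathlib import Path
from typing import List


def list_worlds(rw_dir: Path) -> List[str]:
    """List world files in the form expected by --config.

    Assumes worlds live in <rw_dir>/config/<package>/world_*.yaml.
    """
    config_dir = Path(rw_dir) / "config"
    # as_posix() keeps forward slashes, matching the examples in this tutorial.
    return sorted(p.relative_to(config_dir).as_posix()
                  for p in config_dir.glob("*/world_*.yaml"))
```

For example, list_worlds(Path.home() / ".mmlf") would include entries such as mountain_car/world_td_lambda_exploration.yaml.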

worldPackage : mountain_car
    moduleName : "mcar_env"
        maxStepsPerEpisode : 500    
        accelerationFactor : 0.001
        maxGoalVelocity : 0.07
        positionNoise : 0.0
        velocityNoise : 0.0
    moduleName : "dps_agent"
        policy_search : 
            method: 'fixed_parametrization'
                type: 'linear'
                numOfDuplications: 1
                bias: True
                name: 'evolution_strategy'
                sigma:  1.0
                populationSize : 5
                evalsPerIndividual: 10
                numChildren: 10
    policyLogFrequency : 10
        active : True
        logFrequency : 250
        stateDims : None
        rasterPoints : 50

As you can see, the environments in the two configuration files are identical, but the agents differ. This world can be started with a command similar to the first one:

run_mmlf --config mountain_car/world_dps.yaml

This will start a different learning agent (one using a Direct Policy Search algorithm for learning) and let it learn the mountain car task.
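To check exactly where two world files differ (here, only in the agent part), a unified diff is handy. This sketch uses Python's standard difflib module on the file contents; config_diff is a hypothetical helper and the default file names are placeholders.

```python
import difflib
from typing import List


def config_diff(a_text: str, b_text: str,
                a_name: str = "a.yaml", b_name: str = "b.yaml") -> List[str]:
    """Return unified-diff lines between two world configuration texts."""
    return list(difflib.unified_diff(
        a_text.splitlines(), b_text.splitlines(),
        fromfile=a_name, tofile=b_name, lineterm=""))
```

Usage: read world_td_lambda_exploration.yaml and world_dps.yaml from config/mountain_car with Path.read_text() and print "\n".join(config_diff(...)); lines starting with "-" or "+" belong to the agent part only.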

If you are more interested in further experiments with the td_lambda_agent, simply edit the agent part of world_td_lambda_exploration.yaml. The interesting part of this configuration file is the agent's configDict dictionary, which contains the parameter values used by the agent. For instance, the agent in world_td_lambda_exploration.yaml uses a discount factor gamma of 1.0 and follows an epsilon-greedy policy with epsilon=0.01. If you are unfamiliar with the concepts behind this agent, check out the excellent and free book by Sutton and Barto. You can now modify the parameters and see how they influence the learning performance. For instance, set epsilon to 0.0 to get an agent that always acts greedily, and save the file as world_td_lambda_no_exploration.yaml. To start this world, simply run
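Creating such a variant can also be scripted. The sketch below changes one parameter by plain text substitution, so the indentation and layout of the world file are preserved; it does not parse YAML, and it rewrites every line with that key (note that update_rule, for example, appears twice in world_td_lambda_exploration.yaml). Both helper names are hypothetical.

```python
import re
from pathlib import Path


def set_parameter(config_text: str, name: str, value: str) -> str:
    """Replace the value of every `name : ...` line, keeping its indentation.

    Text substitution only; this is not a YAML parser.
    """
    pattern = re.compile(rf"^([ \t]*{re.escape(name)}[ \t]*:[ \t]*).*$",
                         re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + value, config_text)
    if count == 0:
        raise KeyError(f"parameter {name!r} not found")
    return new_text


def make_variant(src: Path, dst: Path, name: str, value: str) -> None:
    """Write a copy of world file `src` to `dst` with one parameter changed."""
    dst.write_text(set_parameter(src.read_text(), name, value))
```

Usage, with rw pointing to your rw-directory: make_variant(rw / "config/mountain_car/world_td_lambda_exploration.yaml", rw / "config/mountain_car/world_td_lambda_no_exploration.yaml", "epsilon", "0.0").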

run_mmlf --config mountain_car/world_td_lambda_no_exploration.yaml

If you want to run a world only for a certain number of episodes (say 100), pass an additional option on the command line:

run_mmlf --config mountain_car/world_td_lambda_exploration.yaml --episodes 100
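To compare several worlds, for example the exploring and the greedy TD Lambda agent, you can start them one after another from a script. This sketch only assembles the options used in this tutorial (--config, --episodes, --rwpath) and assumes that run_mmlf is on your PATH and exits on its own once the requested number of episodes is over. The helper names are hypothetical.

```python
import subprocess
from typing import List, Optional


def mmlf_command(config: str, episodes: Optional[int] = None,
                 rwpath: Optional[str] = None,
                 executable: str = "run_mmlf") -> List[str]:
    """Build a run_mmlf argument list from the options shown in this tutorial."""
    cmd = [executable, "--config", config]
    if episodes is not None:
        cmd += ["--episodes", str(episodes)]
    if rwpath is not None:
        cmd += ["--rwpath", rwpath]
    return cmd


def run_worlds(configs: List[str], episodes: int) -> None:
    """Run each world in turn; check=True stops at the first failing run."""
    for config in configs:
        subprocess.run(mmlf_command(config, episodes=episodes), check=True)
```

Usage: run_worlds(["mountain_car/world_td_lambda_exploration.yaml", "mountain_car/world_td_lambda_no_exploration.yaml"], episodes=100). Unix users with a local installation can pass executable="./run_mmlf" to mmlf_command.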

You now know the basic features of the MMLF. Other worlds are started the same way, for instance

run_mmlf --config single_pole_balancing/world_dps.yaml

will start the single-pole-balancing scenario with the DPS agent enabled.

See also

Tutorial Quick start (graphical user interface)
Learn how to use the MMLF’s graphical user interface
Learn more about Existing agents and Existing environments
Get an overview of the agents and environments that are shipped with the MMLF
Tutorial Writing an agent
Learn how to write your own MMLF agent
Tutorial Writing an environment
Learn how to write your own MMLF environment
Learn more about Experiments
Learn how to carry out serious benchmarking and statistical comparisons of the performance of different agents/environments