There is a variety of applications that contain one or more kinds of units
with the following properties:
Given a set of data by the application, the unit chooses an action from the list of currently applicable actions. The action can be chosen by a predefined policy or by some kind of learning process. The application designer (the toolkit "user") may want to use a learning algorithm to choose the next action, or to evaluate the performance of a predefined policy.
We try to give the application designer a uniform and easy way to implement the learning process.
The set of data provided by the application usually contains too much information to support a reasonable learning process. The application designer (the toolkit "user") therefore has to create a model that reduces the information used in the learning process while still describing the process of execution precisely enough.
We provide the application designer with a toolkit for describing models of the learning process and with a library of learning algorithms from which to pick the most suitable one.
The library is a template library, which permits the user to define different learning process models.
We call a unit that is provided with a data set and chooses an action to perform a "player". We refer to the data type containing the set of data provided to the player as the "Observation" type. An enumeration type denoting all possible actions must also be provided; we refer to it as the "Action" type.
The player class is templated on the Observation and Action parameters. The generic definition of the player class is in the file rl_headers/player.h. This file defines a simple interface to be implemented by a player:
// new run is started : get initial observation & choose action
virtual Action start_run (Observation* startObserv) = 0;
// inside run : get current observation, choose action,
// update statistics using knowledge of the last achieved reward
virtual Action do_step (Observation* currObserv, double lastReward) = 0;
// run is finished : get final observation, update statistics
virtual void end_run (Observation* currObserv, double lastReward) = 0;
// print accumulated statistics.
virtual void print_stats () = 0;
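As an illustration of the interface, here is a minimal sketch of a player that follows a fixed policy and only accumulates reward statistics. The Player template below is a stand-in that repeats the four methods listed above so that the example compiles on its own; the real declaration in rl_headers/player.h may differ. HandObservation, HandAction and ThresholdPlayer are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstdio>

// Stand-in for the interface in rl_headers/player.h (assumption: the real
// header may declare more members or use different qualifiers).
template <class Observation, class Action>
class Player {
public:
    virtual ~Player() {}
    virtual Action start_run(Observation* startObserv) = 0;
    virtual Action do_step(Observation* currObserv, double lastReward) = 0;
    virtual void end_run(Observation* currObserv, double lastReward) = 0;
    virtual void print_stats() = 0;
};

// Hypothetical observation and action types for a simple card game.
struct HandObservation {
    int playerSum;  // current sum of the player's cards
};
enum HandAction { HIT, STAND };

// Predefined policy: hit below a threshold, otherwise stand.
// No learning takes place; the player only records the rewards it receives.
class ThresholdPlayer : public Player<HandObservation, HandAction> {
public:
    explicit ThresholdPlayer(int threshold)
        : threshold_(threshold), runs_(0), totalReward_(0.0) {}

    HandAction start_run(HandObservation* o) override { return choose(o); }

    HandAction do_step(HandObservation* o, double lastReward) override {
        totalReward_ += lastReward;
        return choose(o);
    }

    void end_run(HandObservation*, double lastReward) override {
        totalReward_ += lastReward;
        ++runs_;
    }

    void print_stats() override {
        std::printf("runs=%d mean reward per run=%f\n", runs_, mean_reward());
    }

    int runs() const { return runs_; }
    double mean_reward() const { return runs_ ? totalReward_ / runs_ : 0.0; }

private:
    HandAction choose(const HandObservation* o) const {
        assert(o != nullptr);
        return o->playerSum < threshold_ ? HIT : STAND;
    }

    int threshold_;
    int runs_;
    double totalReward_;
};
```

The application would call start_run once at the beginning of a run, do_step after every intermediate step, and end_run when the run is over. A player of this kind is one way to evaluate the performance of a predefined policy.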
The same file, rl_headers/player.h, also provides the generic definition of a factory that creates players. All parameters needed to create a specific player are passed through a single string parameter.
virtual Player<Observation,Action>* CreatePlayer (const char* params) = 0;
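The format of the parameter string is not specified here, so the following is only a sketch of how a factory implementation might decode it. It assumes a hypothetical format of whitespace-separated key=value tokens; parse_params and param_as_double are illustrative helpers, not toolkit functions.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Split "key=value key=value ..." into a map.
// Assumption: this token format is invented for the example; the toolkit's
// factories may expect a different layout.
std::map<std::string, std::string> parse_params(const char* params) {
    std::map<std::string, std::string> result;
    if (params == nullptr) return result;

    std::istringstream in(params);
    std::string token;
    while (in >> token) {
        std::string::size_type eq = token.find('=');
        if (eq == std::string::npos || eq == 0) continue;  // skip malformed tokens
        result[token.substr(0, eq)] = token.substr(eq + 1);
    }
    return result;
}

// Read a numeric parameter; return `fallback` when the key is absent or
// its value does not start with a number.
double param_as_double(const std::map<std::string, std::string>& params,
                       const std::string& key, double fallback) {
    std::map<std::string, std::string>::const_iterator it = params.find(key);
    if (it == params.end()) return fallback;

    std::istringstream in(it->second);
    double value;
    return (in >> value) ? value : fallback;
}
```

A CreatePlayer implementation could call parse_params on its argument and pass the decoded values (for example a learning rate) to the constructor of the concrete player.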
Generally, there is no restriction on how a player is defined, as long as it implements the interface specified in player.h. The toolkit, however, provides the user with a template library that implements players using Reinforcement Learning algorithms (see rl_headers/lrn_alg.h).
All Reinforcement Learning algorithms are based on the concepts of "State" and "Reward". An action transfers the player to some state (or leaves it in the same state) and provides the player with some reward. The algorithm computes a real-valued function associated either with a State (see rl_headers/evaluation_alg.h) or with a [State,Action] pair (see rl_headers/steplrn_alg.h). This function is used to estimate the total reward for the player when starting from a specific state, or when starting from a specific state by performing a specific action.
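To make the two kinds of value function concrete, here is a sketch of two textbook tabular updates: TD(0) for a function of the State, and Q-learning for a function of the [State,Action] pair. These are standard algorithms chosen for illustration; they are not taken from rl_headers/evaluation_alg.h or rl_headers/steplrn_alg.h, and the toolkit may implement different ones. States and actions are assumed to be small integer indices, alpha is the learning rate and gamma the discount factor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// TD(0): move V(s) toward the one-step target reward + gamma * V(sNext).
// V holds one estimate of the total future reward per state.
void td0_update(std::vector<double>& V, std::size_t s, std::size_t sNext,
                double reward, double alpha, double gamma) {
    assert(s < V.size() && sNext < V.size());
    V[s] += alpha * (reward + gamma * V[sNext] - V[s]);
}

// Q-learning: move Q(s, a) toward reward + gamma * max over a' of Q(sNext, a').
// Q[s][a] estimates the total future reward of taking action a in state s.
void q_update(std::vector<std::vector<double> >& Q, std::size_t s, std::size_t a,
              std::size_t sNext, double reward, double alpha, double gamma) {
    assert(s < Q.size() && sNext < Q.size());
    assert(a < Q[s].size() && !Q[sNext].empty());
    double bestNext = *std::max_element(Q[sNext].begin(), Q[sNext].end());
    Q[s][a] += alpha * (reward + gamma * bestNext - Q[s][a]);
}
```

In both updates a terminal state is handled by keeping its table entries at zero, so that only the final reward contributes to the target.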
The toolkit also gives the user a way to create a model suitable for the reinforcement learning process: the user writes a model definition file and provides it as input to a code generation process.
We distinguish two types of Reinforcement Learning (RL) players.
As stated, players have the observation and action types as template parameters. Different player implementations with the same observation and action types may have different internal player states, different ways of transforming an observation into an internal state, different subsets of permitted actions per state, and so on.
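The transformation from an observation to an internal state can be sketched as follows. This is a hand-written, hypothetical reduction for a Black Jack hand; it is not the model defined in bjack_game/player3.rep and not the output of the toolkit's code generation. BjObservation, kNumStates and to_state are names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical full observation: every card the player holds plus the
// dealer's visible card. Card values are 1..10, with an ace stored as 1.
struct BjObservation {
    std::vector<int> playerCards;
    int dealerCard;
};

// Reduced state space: 11 hand-total buckets x usable ace (yes/no) x
// 10 dealer cards, plus one extra state for a bust hand.
const int kNumStates = 11 * 2 * 10 + 1;

// Map an observation to a state index in [0, kNumStates).
// Many different observations collapse into the same state: card order and
// the exact cards behind a total are discarded.
int to_state(const BjObservation& obs) {
    assert(obs.dealerCard >= 1 && obs.dealerCard <= 10);

    int total = 0;
    bool hasAce = false;
    for (std::size_t i = 0; i < obs.playerCards.size(); ++i) {
        total += obs.playerCards[i];
        if (obs.playerCards[i] == 1) hasAce = true;
    }

    // An ace is "usable" when counting it as 11 does not bust the hand.
    bool usableAce = hasAce && total + 10 <= 21;
    if (usableAce) total += 10;

    if (total > 21) return kNumStates - 1;  // bust

    // Bucket 0 collects all totals up to 11; buckets 1..10 are totals 12..21.
    int bucket = total <= 11 ? 0 : total - 11;
    return (bucket * 2 + (usableAce ? 1 : 0)) * 10 + (obs.dealerCard - 1);
}
```

A different player for the same observation and action types could use a coarser or finer reduction, for example by ignoring the dealer's card, which is the kind of variation a model definition is meant to capture.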
Specific player implementations can be generated by creating a configuration file that contains the learning model definition. The Black Jack sample application is provided with the toolkit, together with its model definition file (see bjack_game/player3.rep).
The model definition file contains the following definitions:
So, to implement a learning process using the toolkit (see the bjack_game/bjack2.cc sample):
Get the rl_player.tar archive from the RL course site.
When unpacked, it contains the current file and 4 directories:
Currently the toolkit compiles and works properly on Linux and Sun OS platforms. Migration to MS Windows is not finished yet because of differences in the C++ Standard Library and Perl implementations.