Optimal Asset Allocation

The Problem

The Model

As an example of the stock configuration described above, let's look at a specific stock definition and its behavior. The stock parameters and transition probabilities are chosen to simulate a situation where the value follows an increasing trend, but the higher the value climbs, the more probable a drop to a very low value becomes. In the terminology above, the stability of the stock decreases as its value increases. The exact configuration of the stock is as follows:

Minimum: 26,  Maximum: 40,  Initial: 31

Trend:     [1.000000 0.966667 0.933333 0.900000 0.866667 0.833333 0.800000 0.766667 0.733333 0.700000 0.666667 0.633333 0.600000 0.566667 0.000000]

Stability: [0.333333 0.666667 1.000000 1.333333 1.666667 2.000000 2.333333 2.666667 3.000000 3.333333 3.666667 4.000000 4.333333 4.666667 5.000000]
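The report does not spell out the transition rule exactly, so the following is only an illustrative sketch: it assumes that trend[i] is the probability of rising one step from the i-th value level, and that a lower stability at that level makes the occasional drop larger. The function name and the drop-size formula are my own assumptions.

```python
import random

def simulate_stock(minimum, maximum, initial, trend, stability, days, rng=None):
    """Simulate one realization of a stock's value over `days` steps.

    Assumed semantics (not stated explicitly in the report): trend[i] is the
    probability that a stock at the i-th value level rises by 1; otherwise it
    drops, and lower stability at that level produces a larger drop.
    """
    rng = rng or random.Random(0)
    values = [initial]
    value = initial
    for _ in range(days):
        i = value - minimum                      # index of the current value level
        if rng.random() < trend[i]:
            value = min(value + 1, maximum)      # follow the upward trend
        else:
            # a drop whose size is inversely related to the current stability
            drop = max(1, round((maximum - minimum) / max(stability[i], 1e-9)))
            value = max(value - drop, minimum)
        values.append(value)
    return values
```

With the example configuration above, the simulated value climbs step by step and occasionally crashes toward the minimum, since trend reaches 0.0 and stability only grows slowly with the value.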

A realization of this stock's behavior looks like this:

Figure 1: A realization of a stock behavior

e being the commission paid.

Portfolio Management as a Markovian Decision Problem

At each time step the new values of the stocks and the portfolio are computed, and a new action is chosen. These three elements determine the new state of the system.
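The log output shown later prints states as tuples such as <[21],0,1.000000,2.000000>, which suggests that a state bundles the stock values, the currently invested stock, the invested fraction, and the portfolio value. A minimal sketch of such a state follows; the field names are my own guesses, only the printed layout is taken from the log.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class State:
    """One state of the portfolio-management MDP.

    Field names are illustrative; the application's log prints states as
    <[stock values], invested stock, invested fraction, portfolio value>,
    e.g. <[21],0,1.000000,2.000000>.
    """
    stock_values: Tuple[int, ...]   # current value of every stock in the market
    held_stock: int                 # index of the invested stock (0 = cash)
    fraction: float                 # fraction of the portfolio that is invested
    portfolio_value: float          # current total value of the portfolio

    def __str__(self):
        vals = " ".join(str(v) for v in self.stock_values)
        return f"<[{vals}],{self.held_stock},{self.fraction:.6f},{self.portfolio_value:.6f}>"
```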

The Learning Process

The Application

As demonstrated in the following screenshot, this information is presented graphically throughout the optimization process. In addition, the main window presents some textual information about the progress of the optimization process at the bottom of the window.

Figure 2: The main control window of the application

The first step when working with the application is loading a model. A model is described by a configuration file. The structure of the model configuration file is straightforward, as the following example demonstrates:

2.0        // Initialization value of the portfolio
2          // Number of stocks in the market
STCK1      // Name of the 1st stock
26 40 31   // Minimum, maximum and initialization value of the 1st stock
1.000000 0.966667 0.933333 0.900000 0.866667 0.833333 0.800000 0.766667 0.733333 0.700000 0.666667 0.633333 0.600000 0.566667 0.000000   // Trend of the 1st stock
0.333333 0.666667 1.000000 1.333333 1.666667 2.000000 2.333333 2.666667 3.000000 3.333333 3.666667 4.000000 4.333333 4.666667 5.000000   // Stability of the 1st stock
STCK2      // Name of the 2nd stock
26 40 31   // etc.
0.800000 0.800000 0.300000 0.300000 0.800000 0.800000 0.300000 0.200000 0.800000 0.700000 0.200000 0.200000 0.700000 0.700000 0.200000
0.222222 0.222222 0.444444 0.444444 0.444444 0.666666 0.666666 1.000000 0.666666 0.666666 0.444444 0.444444 0.444444 0.222222 0.222222
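The layout above (portfolio value, stock count, then name / min-max-init / trend / stability for each stock, with `//` comments) is simple enough that a reader could parse it in a few lines. The sketch below is an assumption about how such a file might be read, not the application's actual loader:

```python
def parse_model_config(text):
    """Parse a model configuration file with the structure shown above.

    Comments after '//' are stripped; the field order (portfolio value,
    stock count, then name / min-max-init / trend / stability per stock)
    follows the example in the text.
    """
    lines = [ln.split("//")[0].strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln]            # drop blank/comment-only lines
    portfolio_value = float(lines[0])
    n_stocks = int(lines[1])
    stocks, pos = [], 2
    for _ in range(n_stocks):
        name = lines[pos]
        minimum, maximum, initial = (int(x) for x in lines[pos + 1].split())
        trend = [float(x) for x in lines[pos + 2].split()]
        stability = [float(x) for x in lines[pos + 3].split()]
        stocks.append({"name": name, "min": minimum, "max": maximum,
                       "init": initial, "trend": trend, "stability": stability})
        pos += 4
    return portfolio_value, stocks
```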

Some general parameters of the simulation and optimization process can be set by the user in the options window of the application:

Figure 3: The configuration options of the application

In the log, every module and sub-module of the application reports each major action it performs. The structure of the log file is straightforward, as the following excerpt demonstrates:

The logged running time is Fri May 05 22:04:34 2000

RandomGenerator::Generator is up

StockMarket::Market is up

Portfolio::Portfolio is up, init value is 2.000000

StockMarket::Adding a new stock (#1) STCK1 to market

Stock::Created stock STCK1 (16,30,21)

Stock::Initialized trend of stock STCK1: (1.000000 0.966667 0.933333 0.900000 0.866667 0.833333...)

Stock::Initialized stability of stock STCK1: (0.333333 0.666667 1.000000 1.333333 1.666667 2.000000 ...)

Model::Model is up with 1 stocks

Policy::Created a new policy with init value of 0.000000

Optimizer::Optimizer is up with Epsilon value of 0.100000 and Lambda value of 1.000000

Optimizer::Epsilon value was changed to 0.100000

Optimizer::Lambda value was changed to 1.000000

Optimizer::Alpha value was changed to -1.000000

Optimizer::Action select method was changed to 0

Optimizer::A single cycle of optimization is starting

Optimizer::>Current state (S) is <[21],0,1.000000,2.000000>

Optimizer::>Chose to perform optimal action

Optimizer::>Chose to perform first action (A) <1,1.000000>

Model::Performing action <1,1.000000>

Portfolio::Payed commission of 0.120000, current value is 1.880000

Portfolio::Invested 100% of total value 1.880000 in stock #1

StockMarket::A new day in the market

Stock::Increased STCK1 by 1, new value is 22

Portfolio::Investment changed in 4%, current value is 1.969524

Optimizer::>Reward for the action was -0.030476

Optimizer::>New state (SS) is <[22],1,1.000000,2.000000>

Optimizer::>Chose to perform optimal action

Optimizer::>Chose to perform second action (AA) <1,1.000000>

Optimizer::>Updated (S,A) with -0.030476

Optimizer::A single cycle of optimization is starting

Optimizer::>Current state (S) is <[22],1,1.000000,2.000000>

...
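The S, A, reward, SS, AA sequence in the log is the classic SARSA pattern, and the Epsilon parameter matches ε-greedy action selection ("Chose to perform optimal action" vs. a random one). As a rough sketch of what one such cycle computes (the Lambda value in the log hints at eligibility traces, i.e. SARSA(λ); the plain one-step variant below is a simplification, and the table layout and names are assumptions):

```python
import random

def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

def sarsa_step(Q, s, a, reward, ss, aa, alpha, gamma):
    """One tabular SARSA update: Q(S,A) += alpha * (R + gamma*Q(SS,AA) - Q(S,A))."""
    q_sa = Q.get((s, a), 0.0)
    q_next = Q.get((ss, aa), 0.0)
    Q[(s, a)] = q_sa + alpha * (reward + gamma * q_next - q_sa)
```

In the excerpt above, the reward -0.030476 is exactly the change in portfolio value over the step (1.969524 - 2.000000), and it is used to update the entry for the state-action pair (S,A).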

Experiments

After ~10,000 time steps ("days" in the market) the simulation was stopped and re-initialized. In order to achieve more exploitative behavior, the exploration parameter ε was decreased to 0.01. The following graphs present the outcome of the agent's actions over the next 300 days:

Figure 4: The stock value (top), the portfolio value and the decisions (bottom)

The running parameters in this experiment were identical to the previous one; however, the obtained policy was checked only after ~100,000 time steps. Since this model only added to the investment opportunities of the first model, it could be expected that (after training) the portfolio value would grow faster: the agent could improve on the previous policy by using the second stock as an additional investment possibility. Indeed, the capital grew slightly faster in this experiment: after initializing the model, it took the agent around 250 days to reach a portfolio value of 100, compared to 300 days in the previous experiment.

Figure 5: The stocks values (top), the portfolio value and the decisions (bottom)

Conclusion

Bibliography

A DOS utility that can be used to display a policy (*.plc files) in a readable textual format

[1] Actually, the implemented model and MDP do allow the agent to invest only part of the portfolio value. However, while experimenting with the application it turned out that adding this degree of freedom drastically increased the convergence time of the learning process. Therefore it was decided that, at this stage, investing would be restricted to the total portfolio value.

[2] Notice that the model configuration file and the optimal policy file of each experiment can be found in the download section.