**Optimal Asset Allocation**

**The Problem**

**The Model**

- A stock market containing several different stocks, each with a different behavior pattern.
- A portfolio consisting of the current asset value and the current investment details.

- The stock market contains several stocks, each with different behavior.
- At each step the investor can decide either to stay out of the market or to invest in a single stock. It is not possible to invest in more than one stock simultaneously.
- The investor is small and does not influence the market by his trading.
- The investor has no risk aversion and always invests the total amount of money he has[1].
- The investor may trade at each time step over an infinite time horizon.
- Each transaction made by the investor costs a commission. The commission is a combination of a fixed base value and a variable cost that depends on the investment value.

- Minimum, Maximum and Initial value: define the range of values that the stock can take and the value at which it starts.
- Trend: defines the trend of the stock. At each time step, whether the stock increases or decreases is decided by a coin toss (Bernoulli trial). The trend is defined by the probability of “success” (i.e. the probability of a value increase) in this trial. The success probability can be defined for each possible value of the stock, so the trend is represented as a vector [Min...Max] of success probabilities.
- Stability: defines the stability of the stock. After deciding at each time step whether the stock increases or decreases, the amount of change must be decided. This amount defines the stability of the stock: the larger it is, the more drastic the changes of the stock, and thus the less stable it is.

The amount of change at each time step is based on a Poisson process. The number of “arrivals” in a single time step is translated to the amount of change in the stock value. This number is computed as a random deviate drawn from a Poisson distribution. The larger the number of arrivals, the greater the change.

The mean (λ) of the Poisson distribution defines its “center” and, being the distribution's only parameter, determines it completely. The mean value can be defined for each possible value of the stock, so the stability is represented as a vector [Min...Max] of Poisson distribution mean values.
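The stock model described above can be sketched in a few lines of Python. This is a minimal illustration, not the application's code: the function names are hypothetical, the change is taken to equal the number of Poisson arrivals, and a move past the [Min...Max] range is assumed to be clipped to the boundary. The constants reproduce the example stock shown with Figure 1.

```python
import math
import random

def poisson_sample(mean, rng):
    """Draw a Poisson deviate with Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def step_stock(value, minimum, maximum, trend, stability, rng):
    """Advance the stock by one time step and return its new value."""
    idx = value - minimum                          # position in the [Min...Max] vectors
    goes_up = rng.random() < trend[idx]            # Bernoulli trial: "success" = increase
    change = poisson_sample(stability[idx], rng)   # number of "arrivals" = size of change
    new_value = value + change if goes_up else value - change
    # Assumption: a move past the allowed range is clipped to the boundary.
    return max(minimum, min(maximum, new_value))

# Parameters of the example stock (Minimum 26, Maximum 40, Initial 31).
MINIMUM, MAXIMUM, INITIAL = 26, 40, 31
TREND = [1.0 - i / 30 for i in range(14)] + [0.0]   # 1.000000, 0.966667, ..., 0.566667, 0.000000
STABILITY = [(i + 1) / 3 for i in range(15)]        # 0.333333, 0.666667, ..., 5.000000

def simulate(days, seed=0):
    """Return one realization of the stock as a list of daily values."""
    rng = random.Random(seed)
    value, path = INITIAL, [INITIAL]
    for _ in range(days):
        value = step_stock(value, MINIMUM, MAXIMUM, TREND, STABILITY, rng)
        path.append(value)
    return path
```

Calling `simulate(200)` yields one realization of 201 daily values. Because the trend falls and the mean change grows as the value rises, the example stock tends to climb slowly and drop sharply.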

**Minimum**: 26, **Maximum**: 40, **Initial**: 31

**Trend**: [1.000000 0.966667 0.933333 0.900000 0.866667 0.833333 0.800000 0.766667 0.733333 0.700000 0.666667 0.633333 0.600000 0.566667 0.000000]

**Stability**: [0.333333 0.666667 1.000000 1.333333 1.666667 2.000000 2.333333 2.666667 3.000000 3.333333 3.666667 4.000000 4.333333 4.666667 5.000000]

**Figure 1: A realization of a stock behavior**

- Sell the current holding of STCK1, pay commission for selling.
- Buy STCK2 using the total portfolio value, pay commission for buying.

1. The new value of each stock is determined.

2. The new value of the investor's portfolio is computed, based on his investment on the previous day. The new value is computed using the formula:

   $$V_{t+1} = V_t \cdot \frac{S_{t+1}}{S_t}$$

   where $V_t$ is the portfolio value at time $t$, and $S_t$ is the value of the invested stock.

3. The investor decides on his action for the next day and performs it, paying commission if necessary:

   $$V_{t+1} \leftarrow V_{t+1} - e$$

   $e$ being the commission paid:

   $$e = c_{\text{fixed}} + c_{\text{var}} \cdot V_{t+1}.$$
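The daily bookkeeping can be illustrated with a short Python sketch. The function names and the commission constants (a base of 0.10 and a rate of 1%) are hypothetical; they were picked only because they reproduce the commission of 0.12 on a portfolio of 2.0 that appears in the log excerpt later in this document, and other combinations of base and rate would do so as well.

```python
def commission(value, base=0.10, rate=0.01):
    """Fixed base fee plus a fee proportional to the traded value (constants are hypothetical)."""
    return base + rate * value

def pay_commission(value, base=0.10, rate=0.01):
    """Portfolio value left after paying the commission for one transaction."""
    return value - commission(value, base, rate)

def revalue(value, old_price, new_price):
    """Scale the invested value by the relative change of the invested stock."""
    return value * new_price / old_price

# One day, following the log excerpt: buy with 2.0, then the stock moves from 21 to 22.
after_buy = pay_commission(2.0)            # about 1.88
next_day = revalue(after_buy, 21, 22)      # about 1.969524
reward = next_day - 2.0                    # about -0.030476, the reward shown in the log
```

A gain of one point on the stock does not yet cover the commission here, which is why the first logged reward is negative.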

**Portfolio Management as a Markovian Decision Problem**

- The current value of each stock in the market: a vector of values.
- The current invested stock: an index indicating the stock id, 0 indicating staying out of the market.
- The current percentage of the portfolio invested: as mentioned earlier, in the current implementation only the total amount (100%) can be invested.
- The current value of the portfolio: since the amount of money in the portfolio is a real value with infinitely many possible values, it must be discretized in order to reduce the number of possible values. This value is therefore discretized into integer “bins”. The size of each bin is a parameter that can be set by the user (a size of 1, for example, results in simple rounding of the real value).

- In which stock to invest: an index indicating the requested stock id, 0 indicating staying out of the market. As explained above, this decision is translated by the model into a series of buy and sell actions, depending on the current holding of the investor.
- The percentage of the portfolio to invest: as mentioned earlier, in the current implementation only the total amount (100%) can be invested.
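As an illustration of the state and action components listed above, the following Python sketch encodes them as tuples and discretizes the portfolio value into bins. The type and function names are hypothetical, and the rounding rule (nearest multiple of the bin size) is an assumption that agrees with the remark that a bin size of 1 amounts to simple rounding.

```python
from typing import NamedTuple, Tuple

class State(NamedTuple):
    stock_values: Tuple[int, ...]   # current value of each stock in the market
    invested_stock: int             # stock id, 0 = out of the market
    invested_fraction: float        # always 1.0 (100%) in the current implementation
    portfolio_bin: float            # discretized portfolio value

class Action(NamedTuple):
    stock: int                      # requested stock id, 0 = stay out of the market
    fraction: float                 # always 1.0 (100%) in the current implementation

def discretize(value, bin_size=1.0):
    """Map a real portfolio value to the nearest multiple of bin_size (assumed rule)."""
    return round(value / bin_size) * bin_size

def make_state(stock_values, invested_stock, portfolio_value, bin_size=1.0):
    """Build a hashable state that can serve as a key in a value table."""
    return State(tuple(stock_values), invested_stock, 1.0,
                 discretize(portfolio_value, bin_size))

# A portfolio worth 1.969524 invested in stock #1, whose value is 22:
state = make_state([22], 1, 1.969524)
```

With a bin size of 1 this gives a portfolio component of 2.0, matching the state `<[22],1,1.000000,2.000000>` in the log excerpt. A smaller bin size separates nearby portfolio values at the cost of a larger state space.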


**The Learning Process**


**The Application**

- Model: implementation of the environment in which the agent operates, as described above.
- Optimizer: implementation of an MDP and of a SARSA-Optimizer, as described above.
- User Interface: a graphic representation of the system state, with controls for setting the parameters relevant to the model and to the optimization process.

- The daily value of each stock in the market.
- The daily decision made by the agent.
- The daily portfolio value.

**Figure 2: The main control window of the application**

2.0 **// Initialization value of the portfolio**

2 **// Number of stocks in the market**

STCK1 **// Name of the 1st stock**

26 40 31 **// Minimum, Maximum and Initialization value of the 1st stock**

1.000000 0.966667 0.933333 0.900000 0.866667 0.833333 0.800000 0.766667 0.733333 0.700000 0.666667 0.633333 0.600000 0.566667 0.000000 **// Trend of the 1st stock**

0.333333 0.666667 1.000000 1.333333 1.666667 2.000000 2.333333 2.666667 3.000000 3.333333 3.666667 4.000000 4.333333 4.666667 5.000000 **// Stability of the 1st stock**

STCK2 **// Name of the 2nd stock**

26 40 31 **// etc.**

0.800000 0.800000 0.300000 0.300000 0.800000 0.800000 0.300000 0.200000 0.800000 0.700000 0.200000 0.200000 0.700000 0.700000 0.200000

0.222222 0.222222 0.444444 0.444444 0.444444 0.666666 0.666666 1.000000 0.666666 0.666666 0.444444 0.444444 0.444444 0.222222 0.222222

**Figure 3: The configuration options of the application**
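A configuration in the layout of Figure 3 could be read with a small parser along the following lines. This is only a sketch under stated assumptions: the function name and dictionary keys are hypothetical, the file is assumed to be a plain sequence of whitespace-separated values, and `//` is assumed to start a comment (the annotations in the figure may not be part of the real file).

```python
def parse_config(text):
    """Parse a model configuration: portfolio value, stock count, then one record per stock."""
    tokens = []
    for line in text.splitlines():
        tokens.extend(line.split("//")[0].split())   # assumption: '//' starts a comment
    it = iter(tokens)
    config = {"initial_portfolio": float(next(it)), "stocks": []}
    for _ in range(int(next(it))):
        name = next(it)
        minimum, maximum, initial = int(next(it)), int(next(it)), int(next(it))
        size = maximum - minimum + 1                  # one entry per possible stock value
        trend = [float(next(it)) for _ in range(size)]
        stability = [float(next(it)) for _ in range(size)]
        config["stocks"].append({
            "name": name, "min": minimum, "max": maximum, "initial": initial,
            "trend": trend, "stability": stability,
        })
    return config

# A tiny hypothetical configuration with one stock whose value ranges over 1..3.
SAMPLE = """
2.0          // initialization value of the portfolio
1            // number of stocks
TINY
1 3 2
1.0 0.5 0.0
0.5 1.0 0.5
"""
parsed = parse_config(SAMPLE)
```

The length of the trend and stability vectors follows from the range: a stock defined on 26 to 40 needs 15 entries in each, as in the figure.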

__The logged running time is Fri May 05 22:04:34 2000__

**RandomGenerator::**Generator is up

**StockMarket::**Market is up

**Portfolio::**Portfolio is up, init value is 2.000000

**StockMarket::**Adding a new stock (#1) STCK1 to market

**Stock::**Created stock STCK1 (16,30,21)

**Stock::**Initialized trend of stock STCK1: (1.000000 0.966667 0.933333 0.900000 0.866667 0.833333...)

**Stock::**Initialized stability of stock STCK1: (0.333333 0.666667 1.000000 1.333333 1.666667 2.000000 ...)

**Model::**Model is up with 1 stocks

**Policy::**Created a new policy with init value of 0.000000

**Optimizer::**Optimizer is up with Epsilon value of 0.100000 and Lambda value of 1.000000

**Optimizer::**Epsilon value was changed to 0.100000

**Optimizer::**Lambda value was changed to 1.000000

**Optimizer::**Alpha value was changed to -1.000000

**Optimizer::**Action select method was changed to 0

**Optimizer::A single cycle of optimization is starting**

**Optimizer::**>Current state (S) is <[21],0,1.000000,2.000000>

**Optimizer::**>Chose to perform optimal action

**Optimizer::**>Chose to perform first action (A) <1,1.000000>

**Model::**Performing action <1,1.000000>

**Portfolio::**Payed commission of 0.120000, current value is 1.880000

**Portfolio::**Invested 100% of total value 1.880000 in stock #1

**StockMarket::**__A new day in the market__

**Stock::**Increased STCK1 by 1, new value is 22

**Portfolio::**Investment changed in 4%, current value is 1.969524

**Optimizer::**>Reward for the action was -0.030476

**Optimizer::**>New state (SS) is <[22],1,1.000000,2.000000>

**Optimizer::**>Chose to perform optimal action

**Optimizer::**>Chose to perform second action (AA) <1,1.000000>

**Optimizer::**>Updated (S,A) with -0.030476

**Optimizer::A single cycle of optimization is starting**

**Optimizer::**>Current state (S) is <[22],1,1.000000,2.000000>

...
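The cycle traced in the log (state S, action A, reward, new state SS, second action AA, update of (S,A)) has the shape of a tabular SARSA step with epsilon-greedy action selection. The Python sketch below shows such a step. It is not the application's optimizer: the class and method names are hypothetical, the logged Lambda is treated here as a discount factor, and the 1/n step size is an assumption of this sketch.

```python
import random
from collections import defaultdict

class SarsaAgent:
    def __init__(self, actions, epsilon=0.1, discount=1.0, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon            # probability of a random (exploring) action
        self.discount = discount          # assumption: plays the role of the logged Lambda
        self.q = defaultdict(float)       # Q(s, a), initialized to 0.0 as in the log
        self.visits = defaultdict(int)    # update counts, used for the 1/n step size
        self.rng = random.Random(seed)

    def choose(self, state):
        """Epsilon-greedy selection: explore with probability epsilon, else act greedily."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, reward, s2, a2):
        """SARSA update: move Q(s, a) toward reward + discount * Q(s2, a2)."""
        self.visits[(s, a)] += 1
        alpha = 1.0 / self.visits[(s, a)]          # assumed step-size schedule
        target = reward + self.discount * self.q[(s2, a2)]
        self.q[(s, a)] += alpha * (target - self.q[(s, a)])
        return self.q[(s, a)]

# Replaying the first logged cycle: actions are (stock id, fraction) pairs.
agent = SarsaAgent(actions=[(0, 1.0), (1, 1.0)])
s, a = ((21,), 0, 1.0, 2.0), (1, 1.0)
s2, a2 = ((22,), 1, 1.0, 2.0), (1, 1.0)
updated = agent.update(s, a, 1.969524 - 2.0, s2, a2)   # about -0.030476
```

Under these assumptions the first update sets Q(S,A) to the reward itself, which agrees with the logged line "Updated (S,A) with -0.030476". Because SARSA uses the action actually chosen in the new state, the second action must be selected before the update, as the log also shows.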

**Experiments**

**Figure 4: The stock value (top), the portfolio value and the decisions (bottom)**

- At the beginning, the stock value oscillated at high values. The policy chose to stay out of the market during this period because the risk was too high. This behavior recurred throughout the whole time period (for example, days 97-105, 107-120 etc.). In these cases the agent chose not to risk his current holdings, but rather to wait until the stock value decreased.
- At the beginning of the time period, when the portfolio value was relatively small, the agent tended to be much more conservative than later on. When the risk was too high, it stayed out of the market for much longer periods than it did later. This course of action seems reasonable given that at small values the fixed commission paid for selling or buying is relatively significant, so the agent should invest only when he has a very good chance of winning. Losing when the total portfolio value is small dramatically decreases the possibility of future investments, so the agent tried to avoid such situations.
- When a drop to low stock values became very probable, the agent anticipated the decrease and sold his holdings. This behavior can be seen at days 73, 140 etc.
- Days 67, 73 and 137 show a similar behavior pattern: when there is a small decrease in the stock value that is sufficient for the agent to get back into the market, he does so. Given that the stock has an increasing trend, in most cases this decision is justified.

**Figure 5: The stocks' values (top), the portfolio value and the decisions (bottom)**

**Conclusion**

**Bibliography**

**Download**

[1] Actually, in the implemented model and MDP it is possible to decide to invest only part of the portfolio value. Yet, while experimenting with the application, it turned out that adding this degree of freedom to the agent drastically increased the convergence time of the learning process. Therefore it was decided that, at this stage, investment is restricted to the total portfolio value.

[2] Notice that the model configuration file and the optimal policy file of each experiment can be found in the download section.