Routing using Q-Learning

Final project in reinforcement learning


Doron Jacoby

  1. Introduction

This project deals with the problem of routing in networks. The question which this project tries to solve, is this:

Given a network of routers/hosts, and a stream of packets to pass between them (each packet, from one router/host to another) what policy should a router take for passing the packets?

For simplicity, from now on I will use only the term router instead of routers & hosts, this is because there is no difference between them in my model, since a router can also create packets or be the destination of them.

The model I use is as follow:

  1. The time is discrete (refer to as clocks-ticks).
  2. In each clock-tick, the following function is called:
  3. For I=1 to create-packets-rate:

    - Create packet p. The packet has a source router (s) and a destination router (d).

    - Put p in s packets-to-pass queue.

    For each router r do:

    - Pop packet p from r packets-to-pass queue.

    - Choose neighbor n to pass p to, based on the routing policy.

    - Put p on the transmission queue from r to n (i.e. the queue of packets, which have to be passed from r to n).

    For each router transmission queue:

    - Pop packet p from the queue (r to n).

    - Push the packet to the n packets-to-pass queue.

  4. If at any time a packet has to be pushed into a queue, and the queue is full – the packet is dropped (i.e. get out from the game).


The parameters I checked were:

The network load: The sum of packets numbers in all the transmission queues divided by the sum of all the queues max size.

The packets drop percent: The number of drops in the last 100 packets that have to be pushed into queues.

The average time of packet delivery: The average time that takes to pass a packet, based on the last 100 packets.

I have built this software model in order to compare some different policies. I check how these parameters change when using these different policies with varying "create packets rate". I try to show visually how the network behaves under these different policies.

This software can be used as a game between 2 people: each one has to write his policy, by filling some blank virtual function of his policy class. The basic function his:

Given a packet p that needs to be routed from r to d, what neighbor of r should p be passed to, and what update should be done to each router information.

In this project I compare between 2 options: Shortest-path and Q-Routing.

Shortest-path is the greedy way – lets assume that each router knows for each other router, what is the shortest path to, or more specifically, what neighbor is sitting on the shortest way to the destination router. When the router r will get a packet P with destination d, it will pass the packet to the router who is sitting on the shortest path to d . Its easy to see that this policy will give the best results for low queue loading but will have problem in high loading rate.

The second policy is Q-Routing, which is described in the article. This policy uses Q-Learning for routing. Each router r has a list for each destination router d, what is the expected time for passing packet to d if we r pass the packet to each neighbor n (expected time for each neighbor). This policy is expected to work almost like the shortest-paths for low loading, and better for high loading (e.g. much less drops).

  1. Related issues and Background materials
  2. Justin A. Boyan and Michael L. Littman. Packet routing in dynamically changing networks: A reinforcement learning approach. In Jack D. Cowan, Gerald Tesauro, and Joshua Alspector, editors, Advances in Neural Information Processing Systems, volume 6, pages 671--678. Morgan Kaufmann, San Francisco CA, 1993. (http://www.cs.duke.edu/~mlittman/docs/routing-nips.abs, http://www.cs.duke.edu/~mlittman/docs/routing-nips.ps)

  3. Experiments

The application runs on Windows, and has to be used like this:

  1. Click Run.
  2. Try to change the request rate, and see what happens. Every connection gets a color according to its load. When the load grows, the connection become redder. You can also see how the parameters change (Network load percent, Average packet deliver time & Packets dump percent).
  3. You can also stop the running and see static picture of the net loading. And than run again, by clicking ‘Run’, or work clock by clock, by clicking ‘One Step’.
  4. The source is also attached, so you can compile it on Visual C++ 6, and change it, as you want.

    Here is some results (average time per packet and packets dump percent) for some different net load.


  5. Conclusion and open issues

I tried to run the application with some different configurations and request rate and see what happens.

For low load there was no difference between the 2 policies. This make the Q-Routing better because in Q-routing we don’t have to know the all structure of the net, or make a pre-processing, as we have in shortest-path.

For high load, some times there were no differences, and for some cases the Q-Routing even loss more packets. But I found that for each case, when the right parameters was setting for the learning process it became much better and dropped much less packets.

The biggest problem was when the network became too loaded. In this case, when the Q-Routing started to drop packets, he dropped much more packets. This was because the Q-Routing passed each packet in a long path and this made the network more loaded than the shortest path policy does.

My conclusions are:

The Q-Routing can be a good policy but the parameters of the policy should be chosen very carefully. The routers should be aware that when they start to drop packets, they should prefer shortest path. In other words, the routing-algorithm should give consideration to the network total load.

Open issues: