Incremental Evolution of Complex General Behavior

Faustino Gomez Risto Miikkulainen

Summary by Haim Helman

 

 

1 Introduction

 

The article deals with the problem of evolving complex general behavior.

Complex behavior means problems that require a series of steps that cannot be evaluated individually, yet still require making a decision in every situation. These are the same problems dealt with by reinforcement learning (prey capture, wall following, obstacle avoidance). General behavior means that the problem is not of the form “find the shortest path between A and B” but rather demands dealing with a family of problems that share the same general goal but differ in their parameters.

The problem with applying direct evolution methods to such tasks is that the behavior that evolves is “mechanical”: it yields better results than any other simple behavior but is very far from the expected solution. For example, in the prey-capture problem, moving in circles is much more effective than staying still, yet it is still far worse than the expected behavior.

Another problem is that complex tasks require the use of memory, which cannot be provided by simple feed-forward neural networks. To address this, recurrent networks are used: neural networks that feed their previous outputs back as part of their next input.
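
As a minimal illustration (with hypothetical sizes and weights, not the network used in the article):

    import numpy as np

    def recurrent_step(x, prev_out, W_in, W_rec):
        # One time step: the previous outputs are fed back alongside the
        # current sensory input, giving the network a simple form of memory.
        return np.tanh(W_in @ x + W_rec @ prev_out)

    # Illustrative sizes only (not the network used in the article):
    n_in, n_out = 8, 4
    W_in = np.random.uniform(-1.0, 1.0, (n_out, n_in))
    W_rec = np.random.uniform(-1.0, 1.0, (n_out, n_out))
    out = np.zeros(n_out)
    for x in np.random.rand(10, n_in):   # a short input sequence
        out = recurrent_step(x, out, W_in, W_rec)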

The solution this article presents to the problem of evolving complex general behavior is incremental evolution: introducing increasingly tougher tasks and evolving the behavior needed to solve them. Two major tools are used for this:

    1. Enforced subpopulations (ESP).
    2. Delta coding.

The test problem that was chosen is the prey-capture problem.

 

2 Prey Capture

 

The prey-capture task is one in which an agent (the predator) must catch another agent (the prey) that tries to evade it. The behavior of the prey is preprogrammed, while the predator evolves.

This sort of problem is interesting for two reasons: first, it is a common behavior in nature, so the evolved behavior can be compared with natural evolution; second, the task is simple to state but requires complex sensorimotor coordination.

The predator and prey move in a synthetic discrete environment; the predator’s perception range is limited, and it is allowed only a limited number of steps to catch the prey.

The speed of an agent is its probability of moving in each step. For example, if an agent’s speed is 0.5 and it decides to move north, it has a 50% chance of moving north and a 50% chance of staying where it is.
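
For illustration, this movement rule could be implemented as follows (the helper and the coordinate convention are hypothetical):

    import random

    def try_move(position, direction, speed):
        # Move in the chosen direction with probability `speed`,
        # otherwise stay in place (illustrative grid coordinates).
        dx, dy = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}[direction]
        if random.random() < speed:
            return (position[0] + dx, position[1] + dy)
        return position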

 

3 Incremental Evolution

 

When a complex general task such as prey capture is attacked “head-on” from the outset of the evolution process, conventional methods suffer from insufficient performance: all the individuals perform poorly, and therefore the GA gets trapped in an unfruitful region of the solution space. However, if an easier version of the task is presented first, some individuals may perform well on it and point to a region of the solution space from which the real task is more accessible. If that still fails, the process can be repeated with an even easier task.

The real task is called the goal task, the easier tasks are called evaluation tasks.

Incremental evolution means creating a series of evaluation tasks t1, t2, ..., tn in which ti is an easier version of ti+1 and tn is the goal task. Goal-task behavior is achieved by evolving on each evaluation task in succession.
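
In outline, this scheme might look like the following sketch, where the task list, the success threshold, and the one_generation helper are illustrative assumptions rather than the paper’s code:

    def evolve_incrementally(tasks, population, one_generation, success_score):
        # tasks[0] is the easiest evaluation task, tasks[-1] is the goal task.
        # `one_generation(population, task)` is assumed to run one GA
        # generation on the given task and return the best fitness found.
        for task in tasks:
            best = float("-inf")
            while best < success_score:
                best = one_generation(population, task)
        return population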

The following two heuristics can be used to derive effective evaluation tasks:

    1. Increase the density of the relevant experiences within a trial, so that a network can be evaluated based on greater information in a shorter amount of time.
    2. Make the evaluation task easier so that the acquisition of fundamental goal-task skills is more feasible.

 

4 NeuroEvolution Method: Enforced Subpopulations and Delta Coding

 

The NE method used is based on Symbiotic, Adaptive NeuroEvolution (SANE). SANE has been shown to be a very powerful reinforcement learning method for tasks characterized by sparse reinforcement. To apply this method to evolving complex general behavior, two extensions are necessary: enforced subpopulations, to build recurrent networks, and delta coding, to assist in transferring to new tasks.

4.1 SANE

What makes SANE different from standard NE methods is that in SANE the population is made up of individual neurons rather than full networks. The process of SANE is as follows (a rough code sketch follows the list):

    1. Initialization: The number of hidden units u in the network that will be formed is specified and a population of neuron chromosomes is created. Each chromosome encodes the input and output connection weights of a neuron with a random string of binary numbers.
    2. Evaluation: u neurons are selected randomly to create a feed-forward network. The network is evaluated on the task and given a fitness score. The score is added to the cumulative fitness of each neuron that participated in the network. The process is repeated until each neuron has been evaluated in, on average, about 10 trials.
    3. Recombination: The average score of each neuron is calculated, and the lowest-ranking half of the population is replaced by offspring created by crossover and mutation of chromosomes from the top quartile.
    4. Goto 2.
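
The loop can be sketched roughly as follows; the array layout, selection details, and real-valued Gaussian mutation are simplifications for illustration (the paper encodes weights as binary strings):

    import numpy as np

    def sane_generation(pop, build_and_evaluate, u, trials_per_neuron=10):
        # pop: (N, L) array, one row of connection weights per neuron.
        # build_and_evaluate(neurons) forms a network from u neuron rows
        # and returns its fitness score on the task.
        N, L = pop.shape
        fitness = np.zeros(N)
        counts = np.zeros(N)
        # Evaluation: repeatedly pick u random neurons, build a network,
        # and credit its score back to the participating neurons.
        for _ in range(N * trials_per_neuron // u):
            idx = np.random.choice(N, size=u, replace=False)
            score = build_and_evaluate(pop[idx])
            fitness[idx] += score
            counts[idx] += 1
        avg = fitness / np.maximum(counts, 1)
        # Recombination: keep the top half, replace the bottom half with
        # mutated crossovers of parents drawn from the top quartile.
        pop = pop[np.argsort(-avg)]
        parents = pop[: max(N // 4, 2)]
        for i in range(N // 2, N):
            a, b = parents[np.random.randint(len(parents), size=2)]
            cut = np.random.randint(1, L)
            child = np.concatenate([a[:cut], b[cut:]])
            mask = np.random.rand(L) < 0.1        # per-gene mutation
            pop[i] = child + mask * np.random.randn(L)
        return pop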

4.2 Enforced subpopulations

Enforced subpopulations (ESP) means dividing the population in advance into subpopulations, each of which is designed to produce one specific neuron in the network. Neurons in different subpopulations are recombined separately.

ESP has two major advantages over plain SANE:

    1. The subpopulations that gradually form in SANE are already circumscribed by design in ESP.
    2. In ESP each neuron is evaluated in a specific role, in the context of all the other specializations. In SANE, a network can contain multiple members of some specialization and omit members of others; therefore, its evaluations are less consistent.

The most important aspect of ESP is that it allows the evolution of recurrent networks. The neurons in a recurrent network are not interchangeable and therefore cannot be recombined with each other. In ESP, a given connection of a neuron from one subpopulation is always made to a neuron from another specific subpopulation, so each recurrent weight keeps a consistent meaning.
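
A minimal sketch of how a network could be assembled under ESP (the subpopulation layout and random sampling details are assumptions for illustration):

    import numpy as np

    def form_esp_network(subpops):
        # subpops: a list of u arrays, each of shape (subpop_size, L).
        # Position i of the network always comes from subpopulation i, so
        # the i-th neuron keeps the same role (and the same recurrent
        # connections to the other positions) in every network formed,
        # and recombination happens only within each subpopulation.
        return [sp[np.random.randint(len(sp))] for sp in subpops]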

4.3 Delta coding

As evolution progresses, each subpopulation declines in diversity. This is a problem in incremental evolution, since a converged population cannot easily adapt to a new task. To accomplish task transfer despite convergence, ESP is combined with an iterative search technique known as delta coding. When the population has converged, delta coding is invoked by first saving the best solution and then initializing a population of new individuals called ∆-chromosomes. The ∆-chromosomes have the same length as the best solution and consist of values that represent differences from it. Evaluation is done on the sum of the best solution and a ∆-chromosome for each neuron, but only the ∆-chromosomes evolve. When the ∆-chromosome population converges, its best ∆ is added to the saved best solution, and the result becomes the new best solution.

In the experiments presented in this article, delta coding is implemented on top of the ESP architecture, which means that the ∆-chromosome population maintains the division into subpopulations.

The initial values of the ∆-chromosomes are chosen randomly from a probability distribution that is concentrated around zero, but not too concentrated; this ensures both stability and flexibility around the best solution.
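
A minimal sketch of this restart step, assuming real-valued chromosomes (the specific zero-centered distribution and scale below are illustrative choices, not taken from the paper):

    import numpy as np

    def delta_coding_restart(best_solution, subpop_size, scale=0.3):
        # Start a new search around the frozen best solution.  Each
        # delta-chromosome has the same length as the best solution and
        # holds differences from it; initial values are drawn from a
        # zero-centered distribution (distribution and scale are
        # illustrative choices).
        L = len(best_solution)
        return np.random.standard_cauchy((subpop_size, L)) * scale

    def candidate(best_solution, delta):
        # The network that is actually evaluated: frozen best solution
        # plus one delta-chromosome; only the deltas evolve.
        return best_solution + delta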

 

5 Prey Capture Performance

 

5.1 Simulation setup

 

Grid world size – 22 x 22

Sensor range – 5

Subpopulation size – 40

NN units – 5

Initial random weight range – [-6.0, 6.0]

Chromosome mutation probability – 0.1
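
Gathered into a single configuration, the setup might be written as follows (the parameter names are illustrative):

    # Illustrative parameter names; values as listed above.
    CONFIG = {
        "grid_size": 22,               # 22 x 22 world
        "sensor_range": 5,
        "subpop_size": 40,
        "num_units": 5,
        "weight_range": (-6.0, 6.0),   # initial random weights
        "mutation_prob": 0.1,          # chromosome mutation probability
    }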

 

The agent is a fully connected recurrent network with 5 units, 4 of which are output units (E, W, N, S); the strongest output determines the direction of movement. The agent has 8 sensors that cover 8 sectors around it, plus a proximity sensor that covers its immediate surroundings. There are also 4 wall detectors, which are activated in proportion to the agent’s distance from the walls.

In each generation, 400 networks were evaluated, each using three trials. In each trial the predator is placed in the center of the world and the prey is placed at a random location within the predator’s sensor range. In each time step, both agents move. The predator is given 60 steps to catch the prey. If the prey is caught, it is placed at a new random location within the predator’s sensor range, and the process repeats until the predator fails. The score is the total number of times the prey was caught over the three trials.
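
One such trial might be sketched as follows, where the environment helpers are hypothetical stand-ins and only the 60-step limit and the re-placement rule come from the summary:

    def run_trial(place_prey, step_world, caught, max_steps=60):
        # One fitness trial: the prey is re-placed after every capture and
        # the score is the number of captures before the predator first
        # fails to catch it within `max_steps` time steps.  The helpers
        # (`place_prey`, `step_world`, `caught`) are hypothetical stand-ins
        # for the simulation environment and the two agents' policies.
        score = 0
        state = place_prey()                  # prey starts within sensor range
        while True:
            for _ in range(max_steps):
                state = step_world(state)     # both agents move once
                if caught(state):
                    score += 1
                    state = place_prey()      # re-place the prey and continue
                    break
            else:                             # 60 steps without a capture
                return score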

The difficulty level of the tasks was modified using two parameters: 1) the speed of the prey, used to develop the predator’s tracking ability, and 2) the number of steps the prey is allowed before the predator starts to move, used to develop memory. The goal task features a prey that moves at speed 1.0 (moving every step), which is the same speed as the predator, and is given a 4-move head start. A score of 100 or more is considered a success.

5.2 Direct evolution

The direct evolution method started with the goal task as its evaluation task. After 20 generations the best agent reached a score of 10. After 200 generations the best score was 13. The best agents simply moved mechanically in the world.

5.3 Incremental evolution

The first task presented a prey with speed 0 and no head start. Over the next three tasks, the prey’s head start was gradually increased to 4 moves, and over the three tasks after that its speed was increased to 1.0. The first task is very easy, and after very few generations agents evolved that moved directly toward the prey. In the next task no real memory was yet necessary to capture the prey, but as the head start increased, memory became more important. When the speed increased, the fast predator at first did not need tracking ability to catch the much slower prey (which did not move most of the time); as the prey became faster, only agents that relied more on the current input than on old knowledge could catch it.
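
The resulting task sequence can be written as a list of (head start, prey speed) pairs; the intermediate values below are only illustrative, since the summary gives just the endpoints of each phase:

    # (prey head start in moves, prey speed); the intermediate values are
    # illustrative -- the summary only gives the endpoints of each phase.
    TASK_SCHEDULE = [
        (0, 0.0),                          # task 1: stationary prey, no head start
        (1, 0.0), (2, 0.0), (4, 0.0),      # head start grows to 4 moves
        (4, 0.3), (4, 0.6), (4, 1.0),      # prey speed grows to 1.0 (goal task)
    ]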

Each step in the evolution took an average of 25 generations to evolve an agent with a score of 80. After less than 200 generations altogether, an agent evolved with a fitness score of 100 on the goal task.

 

6 Experimental Analysis

 

After successful agents have evolved, we want to know what kinds of behavior they exhibit.

Looking at the behavior of all the successful agents reveals that they all operate in 5 distinct phases:

    1. At step 0 the prey is within the predator’s sensor range and the predator recognizes it.
    2. In the next four steps the prey moves out of the predator’s sensory range.
    3. The predator moves to the last point at which it saw the prey, while the prey begins to flee.
    4. The chase continues until the prey reaches a wall, and then it enters the predator’s range again.
    5. The predator moves toward the prey until it catches it.

The whole chase takes about 25 steps.

The next step was to analyze the structure of the network that evolved, that is, to find out which neuron is responsible for which part of the behavior. For that purpose a lesion study was conducted: one of the neurons was removed and the lesioned network was tested. Since 4 of the 5 units were also output units, they could not be fully removed, so only their input connections were disconnected.
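
Such a lesion can be sketched as zeroing a unit’s incoming weights before re-testing (the weight layout is a hypothetical assumption):

    import numpy as np

    def lesion_unit(W_in, W_rec, unit):
        # Disconnect one unit's incoming connections (set them to zero)
        # without removing the unit itself; the lesioned network can then
        # be re-evaluated on the task.  Weight layout is hypothetical:
        # row i of each matrix holds the incoming weights of unit i.
        W_in, W_rec = W_in.copy(), W_rec.copy()
        W_in[unit, :] = 0.0
        W_rec[unit, :] = 0.0
        return W_in, W_rec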

The results of the lesion study show that the evolved network is quite robust: it performs rather well even without one unit (22–63% of the original score), and it does not fail completely even when two units are missing (5–30% of the original score).

It seems that there is no strict correspondence between a unit and a specific behavior; the knowledge is spread throughout the network.

 

7 Discussion and Future Work

 

In this article the goal task was decomposed into a sequence of tasks by the user. This may not be easy, or even possible, for other kinds of tasks. However, the number of tasks that can be decomposed easily justifies exploring the method. It also turns out that delta coding can compensate for imperfect task design: it can simply be applied whenever the evolution process gets stuck, without changing the task.

In the future, a system could be developed that fine-tunes the evaluation-task parameters “on-line” to optimize the incremental evolution process. The tuning could be done continuously from generation to generation, in response to average performance.

 

8 Conclusion

 

This article shows that even when a task is too difficult to evolve directly, NE can be applied incrementally to achieve the desired complex behavior. In this approach, ESP allows the evolution of recurrent networks, which are necessary for tasks that require memory, and the delta coding technique allows the transition between tasks, even when the population has lost diversity during the previous task. The approach should be applicable to many real-world domains such as game playing and robot control, as well as artificial life, where a natural hierarchy of behaviors from simple to complex often exists.

 

9 My Comments

 

I thought the article was very well written. It combined a comprehensive review of the theoretical background with a thorough description of the experiment’s environment and protocol.

Obviously, this is only preliminary research on the subject: the prey-capture task that was chosen is a good example, but the environment used was very basic. It would be worth checking the results in a much larger world, or perhaps in a continuous world instead of a discrete one (with discrete time steps).

On the conceptual level, it is worth exploring the relation between incremental evolution and co-evolution. The increasingly difficult tasks could be viewed as an evolution process of the prey, and co-evolution of the predator and the prey could serve as an automatic task generator.