Name : Assaf Sinvani

IDNum : 00000000000

 

Evolution of Homing Navigation in a Real Mobile Robot

Dario Floreano, Francesco Mondada

 

Introduction

In the last few decades, considerable development in robotics has been achieved. The "manipulator robot" (arm) has introduced automation into several industrial domains, for example the automobile industry. The manipulator has automated mostly simple, repetitive, large-scale tasks that did not require any decision-making.

However, a much larger number of applications require more complex operations and more flexibility at work: assembly of complicated pieces, quality control, etc. In those applications, the lack of perception abilities and "intelligence" of the robots currently in use limits their economic viability and utility. When it comes to mobile robots, these limitations become even more apparent.

"Autonomous mobile robots" are robots capable of doing work without human intervention. The difference between autonomous mobile robotics and industrial robotics is the same as that between the natural world and the artificial world created and mastered by man. Creating a mobile robot that could, for example, guard a forest and signal abnormal situations is beyond current technology. In order to succeed in this challenge, it is necessary to modify our design methodologies so that they provide a greater coherence between the robot and its operating environment and give the robot a higher degree of autonomy.

Autonomous Biological Agents

Autonomous biological agents are characterized by robust and reliable self-adaptation to the characteristics of the environment without external supervision or control. This adaptation process takes place while the agent operates in its own environment.

As a reaction to the partial failure of the classical AI approach to develop robust control systems for robots that need to operate autonomously in real-world situations, a novel approach, termed behavior-based robotics, has recently emerged. Whereas classical AI is more concerned with a high-level definition of the environment and of the knowledge required by the system, behavior-based robotics stresses the importance of continuous interaction between the robot and its own environment for the dynamic development of the control system and for the assessment of its performance.

Simulated Robots and Real Robots

A number of researchers have successfully employed an evolutionary procedure to develop the control system of simulated robots. The rich variety of structures that have been put under evolution (neural networks, Lisp code) and the large number of evolved behaviors (locating food sources, wall-following, obstacle avoidance) have empirically demonstrated the power and generality of the evolutionary methodology. However, the authors think that computer simulations of robots can hardly capture the complexity of the interaction between a real robot and the physical environment.

The Experiments in This Paper

In this paper, the authors describe two experiments.

The first experiment serves as a test of the methodology and as a benchmark. In it, the authors explicitly evolve the ability to navigate in a corridor with several sharp convex and concave corners. Although the fitness function is precisely engineered to reward straight motion and obstacle avoidance, the evolved robots display a number of interesting solutions that were not pre-designed.

The second experiment provides the robot with a simulated battery. A battery charger and a light source are introduced into the environment, and the fitness function is greatly simplified. Although the fitness function specifies neither the location of the battery station nor the fact that the robot should reach it, the robot learns to find it and to periodically return to it while it keeps moving and avoiding the walls. The resulting behavior and the evolved neural mechanisms are studied in detail by analyzing the neural activity while the robot is tested in a number of situations.

General Method

The robot

The Khepera robot employed in the experiments is circular, compact, and robust. It has a diameter of 55 mm, is 30 mm high, and weighs 70 g. In its basic version it is provided with eight infrared proximity sensors placed around its body (six on one side and two on the opposite side) which are based on the emission and reception of infrared light. Each receptor can measure both the ambient infrared light and the reflected infrared light emitted by the robot itself (for objects closer than 4-5 cm in the described experiments).

Experimental setup

New tools and methodologies were developed to study the robot's behavior. The setup employed here reflects the authors' concern to study and understand the solutions provided by the evolutionary procedure. In the experiments described, the robot is attached to a Sun SPARCstation via a serial line by means of an aerial cable. All low-level processes, such as sensor reading, motor control, and other monitoring processes, were performed by the on-board micro-controller, while the other processes (neural network activation and genetic operators) were managed by the Sun CPU. This configuration allowed analysis of the robot's trajectories and of the functioning of its neural control system.

The evolutionary procedure and the neuron model

The evolutionary procedure employed in the experiments consisted of applying a simple genetic algorithm to the synaptic weight values of the neural network that controlled the robot. The synaptic weights were individually coded as floating-point numbers on the chromosome. Each chromosome had a constant length, and the initial population was created randomly. The input units of the neural network were attached to the robot's sensors, and the output units were directly used to set the velocity of the wheels. The genetic operators used were selective reproduction, crossover, and mutation.
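As an illustration, here is a minimal sketch of such a genetic algorithm in Python. The population size, chromosome length, mutation scheme, and truncation selection are illustrative assumptions; the description above only fixes the overall structure (floating-point weight chromosomes of constant length, a random initial population, and selective reproduction, crossover, and mutation).

```python
import random

POP_SIZE = 80        # assumed population size (not specified in the summary)
N_WEIGHTS = 22       # assumed chromosome length (number of synaptic weights)
MUT_RATE = 0.1       # assumed per-weight mutation probability

def random_chromosome():
    # Each synaptic weight is coded as a floating-point number.
    return [random.uniform(-1.0, 1.0) for _ in range(N_WEIGHTS)]

def crossover(a, b):
    # One-point crossover on the fixed-length chromosome.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(c):
    # Perturb each weight with a small probability.
    return [w + random.gauss(0.0, 0.2) if random.random() < MUT_RATE else w
            for w in c]

def evolve(fitness, generations=100):
    pop = [random_chromosome() for _ in range(POP_SIZE)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[:POP_SIZE // 2]          # selective reproduction
        pop = [mutate(crossover(random.choice(parents),
                                random.choice(parents)))
               for _ in range(POP_SIZE)]
    return max(pop, key=fitness)
```

In the experiments, the fitness of each chromosome is of course measured on the real robot; any function mapping a weight vector to a score can be plugged in here.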

The First Experiment - Navigation and Obstacle Avoidance

The first experiment was aimed at explicitly evolving the ability to perform straight navigation while avoiding the obstacles encountered in the environment.

The experiment

The robot was put in an environment consisting of a sort of circular corridor whose external size was approximately 80 × 50 cm. The walls were made of light-blue polystyrene and the floor of thick gray paper. The robot could sense the walls with its IR proximity sensors. The fitness criterion F was defined as a function of three variables, directly measured on the robot at each step, as follows:

F = V · (1 − √Δv) · (1 − i)

0 ≤ V ≤ 1

0 ≤ Δv ≤ 1

0 ≤ i ≤ 1

 

where V is a measure of the average rotation speed of the two wheels, Δv is the absolute value of the algebraic difference between the signed speed values of the wheels (positive in one direction, negative in the other), and i is the activation value of the proximity sensor with the highest activity. The function F has three components: the first is maximized by speed, the second by straight motion, and the third by obstacle avoidance.
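A minimal sketch of this per-step fitness computation, using the multiplicative form F = V · (1 − √Δv) · (1 − i) from the original paper. The normalization of wheel speeds to [-1, 1] and of sensor activations to [0, 1] is an assumption for illustration:

```python
from math import sqrt

def fitness_step(left_speed, right_speed, sensors):
    """One-step fitness F = V * (1 - sqrt(dv)) * (1 - i).

    left_speed, right_speed: signed wheel speeds in [-1, 1] (assumed scaling)
    sensors: list of proximity sensor activations in [0, 1]
    """
    V = (abs(left_speed) + abs(right_speed)) / 2   # average rotation speed
    dv = abs(left_speed - right_speed) / 2         # penalizes turning
    i = max(sensors)                               # closest sensed obstacle
    return V * (1 - sqrt(dv)) * (1 - i)
```

The product structure means the score is high only when all three conditions hold at once: fast motion, a straight trajectory, and no nearby obstacle.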

The neural network architecture was fixed and consisted of a single layer of synaptic weights from eight input units (each connected to the robot's proximity sensors), to two output units (directly connected to the motors) with discrete-time recurrent connections only within the output layer.
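The forward pass of this fixed architecture can be sketched as follows. The weight values here are placeholders (in the experiment they are the quantities being evolved), and the use of a sigmoid activation on the output units is an assumption for illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class ReactiveController:
    """8 sensor inputs -> 2 motor outputs, with discrete-time recurrent
    connections only within the output layer (weights are placeholders)."""

    def __init__(self, w_in, w_rec, bias):
        self.w_in = w_in        # 2 x 8 input weights
        self.w_rec = w_rec      # 2 x 2 recurrent weights (output layer)
        self.bias = bias        # 2 bias terms
        self.out = [0.0, 0.0]   # output activations from the previous step

    def step(self, sensors):
        new_out = []
        for j in range(2):
            a = self.bias[j]
            a += sum(w * s for w, s in zip(self.w_in[j], sensors))
            a += sum(w * o for w, o in zip(self.w_rec[j], self.out))
            new_out.append(sigmoid(a))
        self.out = new_out
        return new_out          # mapped directly to the wheel velocities
```

The recurrence gives each motor unit a short memory of its own previous activation, which is what allows behaviors such as sustained turns away from corners.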

Results

Khepera learned to navigate and avoid obstacles in less than 100 generations, each generation taking approximately 40 minutes. Around the 50th generation, however, the best individuals already exhibited near-optimal behavior: smooth navigation, no collisions with walls or corners, and a straight trajectory. Early in evolution, the individuals settled on a frontal direction of motion, corresponding to the side where more sensors are available. When compared to a simple Braitenberg vehicle, the evolved robots displayed better global performance, especially when facing concave corners.

The Second Experiment - Battery Recharge

The goal of this new experiment was to test the hypothesis that, when employing an evolutionary procedure, more complex behaviors do not necessarily have to be specified in the objective fitness function, but rather emerge from a mere change of the physical characteristics of the robot and of the environment described in the previous experiment.

The experiment

The environment employed for the evolutionary training consisted of a 40x45 cm arena delimited by walls. A 25 cm high tower equipped with 15 small DC lamps oriented toward the arena was placed in one corner. The room did not have any other light sources. Under the light tower, a circular portion of the floor at the corner was painted black. The painted sector, which represented the recharging area, had a radius of approximately 8 cm. When the robot happened to be over the black area, its simulated battery became instantaneously recharged.

An additional ambient-light sensor was placed under the robot platform, pointing downward, and its signal was thresholded so that it was always active except when over the black painted area in the corner. The robot was provided with a simulated battery characterized by a fast linear discharge rate (maximum duration: approximately 20 s), and with a simulated sensor giving information about the battery status.
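The simulated battery can be sketched as follows. This is a hypothetical sketch assuming the battery discharges linearly by one unit per motor step over a full charge of 50 steps (the figure given for a fully charged battery) and recharges instantaneously over the black area:

```python
class SimulatedBattery:
    """Linear discharge over a full charge of FULL_STEPS motor steps,
    instantly recharged when the robot is over the black painted area."""

    FULL_STEPS = 50   # steps of motion allowed by a full charge

    def __init__(self):
        self.steps_used = 0

    @property
    def level(self):
        # Battery status in [0, 1], as exposed to the network's sensor input.
        return 1.0 - self.steps_used / self.FULL_STEPS

    def empty(self):
        return self.steps_used >= self.FULL_STEPS

    def step(self, on_recharge_area):
        if on_recharge_area:
            self.steps_used = 0            # instantaneous recharge
        elif self.steps_used < self.FULL_STEPS:
            self.steps_used += 1
        return self.level
```

Tracking an integer step counter rather than repeatedly subtracting a float keeps the level exact (0.0 at full discharge, 1.0 after a recharge).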

The neural network controlling the robot was a multilayer perceptron of continuous sigmoid units. The hidden layer consisted of 5 units with recurrent connections. Each robot started its life with a fully charged battery that allowed it to move for 50 time steps. An upper limit of 150 steps was allowed for each individual.

Each individual was evaluated during its life according to the following fitness function F:

F = V · (1 − i)

0 ≤ V ≤ 1

0 ≤ i ≤ 1

 

where V and i are the same as in the first experiment.

The function F has two components: the first one is maximized by speed and the second by obstacle avoidance. The fitness value was computed and accumulated at each step, except when the robot was on the black area. The accumulated fitness value of each individual was then divided by the maximum number of steps.
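Putting these rules together, the evaluation loop for one individual might look like the sketch below. Here `controller_step` is a hypothetical interface (not from the paper) that runs one sensorimotor step and returns the robot's normalized speed V, the highest proximity activation i, and whether the robot is over the black area:

```python
def evaluate_individual(controller_step, max_steps=150, battery_steps=50):
    """Accumulate F = V * (1 - i) at each step, except on the recharging
    area, and normalize by the maximum number of steps."""
    total = 0.0
    battery = battery_steps          # full battery at the start of life
    for _ in range(max_steps):
        V, i, on_black = controller_step()
        if on_black:
            battery = battery_steps  # instant recharge; no fitness gained
        else:
            battery -= 1
            total += V * (1 - i)
        if battery <= 0:
            break                    # battery exhausted: end of life
    return total / max_steps
```

Dividing by `max_steps` rather than by the actual life span is what makes longer lives pay off: an individual that never recharges can collect at most a third of the maximum score, so returning to the charger is selected for even though the fitness function never mentions it.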

Results

The robot was left in a room lit only by the small light tower, and its evolution was monitored on the workstation for the next 10 days. The experiment lasted 240 generations. The number of actions performed by each robot increased from generation to generation, suggesting that the robots gradually learned to pass over the recharging zone. The combined data of the best fitness values and of the corresponding life durations showed that, mainly in the last 90 generations, the individuals increased their own life duration while spending a shorter period of time over the recharging area.

Neuro-ethological analysis

Since more precise conclusions could not be drawn from these data, the authors resorted to a method of analysis employed by neurophysiologists: testing the robot's behavior in a number of situations while recording all its internal variables (battery status) and external variables (position, sensor activations, motor activations). The robot was fitted with a "helmet" for absolute position measurement. The tests were performed on the best individual of the last generation.

In the first test, the robot was placed in the recharging area and left free to move while the battery level and motor activities were recorded. The robot rapidly moved out of the recharging area and returned only when its battery level was about 0.1 (approximately 5 steps before complete discharge). The robot was always extremely precise in timing its return to the recharging area, and the period of time spent over it was reduced to the minimum necessary. Most of the time the robot moved at nearly full speed along a slightly bent trajectory, and it always turned to the right when a wall was encountered.

In further tests, the robot was positioned at various locations and left free to move while its position and the activations of the hidden nodes were recorded every 380 ms. These tests were performed with the light tower turned on and off. The nodes in the hidden layer were labeled v_h0 - v_h4.

When placed in a few specific locations on the arena, the robot did not find the recharging zone.

The results showed that the robot starts planning a trajectory toward the recharging zone when the battery level is about one third of full charge. The robot relies on a set of semi-automatic procedures to perform the turns at the walls and the semi-linear trajectories. Node v_h4 seems to be responsible for the battery check and for path planning in the last steps, while nodes v_h0 and v_h2 are very likely responsible for the automatic behavior of straight navigation.

Discussion

Genetic algorithms can be successfully used to develop the control system of a real mobile robot. Two methods were shown. The first consists of a detailed specification of a fitness function that is tailored to a precise task. In this sense, there is not much difference between the tuning of the objective function in a supervised neural algorithm and the engineering of the fitness function in the genetic algorithm. The evolving agent can hardly be said to be autonomous, because it is not possible to identify and specify in advance the desired actions of a truly autonomous agent. An alternative method is to consider the fitness measure a general survival criterion that is automatically translated into a set of specific constraints by the characteristics of the interaction between the organism and the environment. This approach yields an ecologically grounded behavior, rather than a mere task.

The behavior of the evolved agent relies on a topological neural representation of the world that was gradually built through a process of interaction with the environment. The few failures in reaching the recharging area from some starting points thus might be due to a sub-optimal or not fully formed map (when the evolutionary process was stopped, the fitness function was still increasing).

The authors succeeded only partially when obstacles were introduced into the environment. When the obstacles were present from the beginning of the evolutionary run, the robot did not learn to regularly reach the recharging area. In order to make the fitness surface smoother, a more gradual approach was adopted: obstacles were introduced only after the robot had learned to locate the recharging area. In these tests, the best individuals could reach the recharging area only from very few starting positions. This limitation poses a serious question about how well this simple method would scale to harsher environments. The authors' opinion is that allowing the agent to learn during its life would help circumvent these difficulties.

An alternative solution to the scalability problems outlined above, and a way to reduce evolution time, could be provided by a more efficient genetic encoding that uses more compact or suitable representations capturing the essential features of a neural network model.