Optimization From Nature - Genetic Algorithms - A Brief Introduction
In my previous article about optimization, I shared a couple of ways in which optimization appears in nature, from animals (or humans) cutting corners (or making corners, i.e. switchbacks) along steep slopes; to mimicry of another species to increase pollination; to conserving energy by resting during the heat of the day; to branches and leaves reducing the overlap between trees.
Each of these strategies, executed correctly, gives an organism a significant advantage, increasing its likelihood of survival and thus the propagation of its genes through its progeny. Random mutation of those genes, and of the strategies they encode, can occasionally "discover" a new strategy that gives the organism an even greater edge over its neighbors and/or predators. Such an organism will tend to live longer or be better in some way, and so have a higher chance of replicating itself (or at least its genes) through its children, who will, in turn, likely adopt those improved strategies and become increasingly fit.
Natural processes such as survival of the fittest, random mutation, and species propagation have inspired mathematicians, computer scientists, and engineering designers to adapt these principles and apply them, by analogy, to improve the design of a process, part, or system.
In a way, we all do this with our own ideas and designs. We have an idea and we slowly improve and evolve it (whether in our mind or on paper) into a better, perhaps even optimal, design through various iterations or design generations. At times in the process, we incorporate ideas from other designs and cross-fertilize with other concepts, exploring a variety of possible combinations. Sometimes those combinations work out well and the design becomes better. At other times they fail, and we discard those infeasible, unattractive, or undesirable options. This is very much like biological evolution, in which healthier, fitter, or "better" organisms live longer and increase their chance of producing offspring. Similarly, good designs are more likely to make it further along the design process and closer to realization, while bad designs are rejected or forgotten.
As designs pass from one generation to the next, we can accelerate this process by leveraging computers and algorithms. One class of algorithms that performs a process analogous to the one described above is genetic algorithms. Genetic algorithms (GAs) are usually categorized as one of many evolutionary optimization methods. In these methods, the individual designs in a population change and evolve into, hopefully, better designs over time.
To optimize using a GA, the population of designs must first be initialized. This may be accomplished by defining hundreds or even thousands of different individual designs sampled from across the full design space. Initial brainstorming or random sampling (e.g. Latin hypercube sampling, if a model of the design space exists) can be used to define this first generation of designs. Each design must be converted into a set of design variables that can be compared and combined in different ways. For example, each smartphone in the population will have a screen resolution, camera quality, weight, cost, computational speed, and many other attributes. The fitness, quality, performance, or attractiveness of each design depends on those input design variables. Generally, higher screen resolution increases the desirability of a smartphone (up to a point), but a high cost may reduce it. The desirability, output response, or evaluation metric of a design is often called the objective function, and it is formulated so that the designer wants to minimize or maximize its value.
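To make this concrete, here is a minimal sketch in Python of how such an initial population might be created by uniform random sampling. The variable names and bounds (screen resolution, camera quality, cost) are illustrative assumptions for a hypothetical smartphone design, not values from any real product:

```python
import random

# Hypothetical bounds for three smartphone design variables:
# (screen resolution in ppi, camera megapixels, cost in dollars)
BOUNDS = [(200, 600), (8, 108), (150, 1500)]

def random_design(bounds):
    """Sample one design uniformly from the design space."""
    return [random.uniform(lo, hi) for lo, hi in bounds]

def initial_population(size, bounds):
    """Create the first generation of candidate designs."""
    return [random_design(bounds) for _ in range(size)]

population = initial_population(100, BOUNDS)
```

Each design is just a list of numbers (its design variables), which is what allows the later steps to compare and recombine designs mechanically.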
In a large enough population, many designs will be highly desirable and many will not. If the designs cover a wide range of possibilities, the population is called diverse: the input design variables take on many combinations of high and low values, which translate into a large range of desirability across the design space (both low and high).
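One simple way to put a number on this diversity (a sketch of one possible proxy, not a standard definition) is the average spread of each design variable relative to its allowed range:

```python
import statistics

def diversity(population, bounds):
    """Average standard deviation of each design variable,
    normalized by that variable's range. Larger values mean the
    designs are more spread out; 0 means every design is identical."""
    per_var = []
    for i, (lo, hi) in enumerate(bounds):
        values = [design[i] for design in population]
        per_var.append(statistics.pstdev(values) / (hi - lo))
    return sum(per_var) / len(per_var)
```

A measure like this is useful later for detecting when the population has collapsed to a single design.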
A genetic algorithm first selects from among the high-desirability designs and allows them to survive into the next iteration or generation; this step is often called selection. A portion, say 50%, of the initial population is chosen to live on into the next generation. These designs are now considered parent designs and are allowed to have child designs, akin to the cross-fertilization of ideas described previously.
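A minimal sketch of this selection step, assuming the objective function is to be maximized and simply keeping the top fraction of designs (often called truncation selection, one of several selection schemes):

```python
def select_parents(population, objective, survival_fraction=0.5):
    """Truncation selection: rank designs by the objective function
    and keep the best fraction as parents for the next generation."""
    ranked = sorted(population, key=objective, reverse=True)
    keep = max(1, int(len(ranked) * survival_fraction))
    return ranked[:keep]
```

For a minimization problem, the same function works with `reverse=False` or a negated objective.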
In the algorithm, a portion of a child's design variables comes from the first parent and the remainder from a second parent; this recombination is called cross-over. If both parents were "good" or "highly desirable", it is possible that the child will be even more desirable. Of course, this isn't guaranteed, and in some cases the child design will be worse than either parent. Still, with hundreds of parents having hundreds of children, some of the child designs will be better, even if only a small percentage of the time.
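A one-point cross-over, in which the child takes its leading variables from one parent and the rest from the other, might look like this (one common variant among several):

```python
import random

def crossover(parent_a, parent_b):
    """One-point cross-over: pick a random split point, then build a
    child from parent_a's variables before it and parent_b's after."""
    point = random.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]
```

Other variants (two-point, uniform) differ only in how the split is chosen; the idea of mixing two parents' variables is the same.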
These children are then included in the next generation, which is composed of the parents (50% of the initial population) and the new children (the other 50%), returning to the full population size. The process then repeats: the children and parents (who are now all simply parents of this second generation) are evaluated for desirability, and the best half are selected to make it into the third generation and have offspring of their own. There are variations on this selection process that will be discussed in a future article, but this describes the basic genetic algorithm.
This process is repeated until a generation is reached in which all the designs are the same (i.e. the diversity is zero), or until several consecutive generations pass without a single child improving on any of the parents. Essentially, a point is reached at which no new improvement is observed, so the algorithm cannot do any better and is terminated.
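Putting the pieces together, the basic loop described above, selection, cross-over, and a stagnation-based stopping rule, can be sketched as follows. The population size, generation limit, and patience threshold are illustrative choices, and mutation is deliberately left out of this sketch:

```python
import random

def evolve(population, objective, max_generations=200, patience=20):
    """Run the basic generational loop: evaluate, keep the best half,
    refill with one-point cross-over children, and stop once no design
    has improved on the best for `patience` generations in a row."""
    best = max(population, key=objective)
    stale = 0
    for _ in range(max_generations):
        # Selection: the top 50% survive as parents.
        parents = sorted(population, key=objective, reverse=True)
        parents = parents[: len(population) // 2]
        # Cross-over: children refill the population to full size.
        children = []
        while len(parents) + len(children) < len(population):
            a, b = random.sample(parents, 2)
            point = random.randint(1, len(a) - 1)
            children.append(a[:point] + b[point:])
        population = parents + children
        # Termination: stop when improvement stalls.
        challenger = max(population, key=objective)
        if objective(challenger) > objective(best):
            best, stale = challenger, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best
```

Without mutation, the loop can only recombine variable values already present in the first generation, which is exactly the limitation the next step addresses.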
Often one more step, called mutation, is introduced after cross-over. During mutation, one or more of the child design's variables may be randomly changed. The probability of this occurring is set quite low, so that a child design is not made completely random (which would be like resetting the population to the initial generation), but the step is very important for maintaining some variability and diversity among the designs.
If the mutation rate is zero, or set too low, there is little chance of random exploration of the design space, and the population will quickly run out of new combinations to consider. The result can be a less optimal final design, because the population was unable to sustain a diversity of ideas, or more specifically, of design variable values. These design variable values are what are considered genes in a GA; they are allowed to persist into the next generation if the fitness or desirability of the organism (i.e. design) that carries them is high. This is where the "genetic algorithm" gets its name.
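A simple mutation operator along these lines might resample each gene with a small probability; the 5% rate here is an illustrative default, and in practice the rate is a tuning parameter:

```python
import random

def mutate(design, bounds, rate=0.05):
    """With low probability per gene, resample that design variable
    uniformly within its bounds; most genes pass through unchanged."""
    return [
        random.uniform(lo, hi) if random.random() < rate else gene
        for gene, (lo, hi) in zip(design, bounds)
    ]
```

Applied to each child right after cross-over, this keeps injecting fresh variable values so the population never fully loses its diversity.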
In a future article, we’ll continue exploring GAs and describe some of the variations and techniques commonly applied for the three typical stages of GA, namely selection, cross-over, and mutation.