algorithmic modeling for Rhino

Evolutionary Principles applied to Problem Solving

This blog post is a rough approximation of the lecture I gave at the AAG10 conference in Vienna on September 21st 2010. Naturally it will be quite a different experience as the medium is quite different, but it my hope the basic premise of the lecture remains intact. This post deals with Evolutionary Solvers in general, but I use Rhino, Grasshopper and Galapagos to demonstrate the topics.

September 24th 2010

David Rutten

Evolutionary Principles applied to Problem Solving

using

There is nothing particularly new about Evolutionary Solvers or Genetic Algorithms. The first references to this field of computation stem from the early 60's when Lawrence J. Fogel published the landmark paper "On the Organization of Intellect" which sparked the first endeavours into evolutionary computing. The early 70's witnessed further forays with seminal work produced by -among others- Ingo Rechenberg and John Henry Holland. Evolutionary Computation didn't gain popularity beyond the programmer world until Richard Dawkins' book "The Blind Watchmaker" in 1986, which came with a small program that generated a seemingly endless stream of body-plans called "Bio-morphs" based on human selection. Since the 80's the advent of the personal computer has made it possible for individuals without government funding to apply evolutionary principles to personal projects and they have since made it into the common parlance.

The term "Evolutionary Computing" may very well be widely known at this point in time, but they are still very much a programmers tool. 'By programmers for programmers' if you will. The applications out there that apply evolutionary logic are either aimed at solving specific problems, or they are generic libraries that allow other programmers to piggyback along. It is my hope that Galapagos will provide a generic platform for the application of Evolutionary Algorithms to be used on a wide variety of problems by non-programmers.

Pros and Cons

Before we dive into the subject matter too deeply though I feel it is important to highlight some of the (dis)advantages of this particular type of solver, just so you know what to expect. Since we are not living in the best of all possible worlds there is often no such thing as the perfect solution. Every approach has drawbacks and limitations. In the case of Evolutionary Algorithms these are luckily well known and easily understood drawbacks, even though they are not trivial. Indeed, they may well be prohibitive for many a particular problem.

Firstly; Evolutionary Algorithms are slow. Dead slow. It is not unheard of that a single process may run for days or even weeks. Especially complicated set-ups that require a long time in order to solve a single iteration will quickly run out of hand. A light/shadow or acoustic computation for example may easily take a minute per iteration. If we assume we'll need at least 50 generations of 50 individuals each (which is almost certainly an underestimate unless the problem has a very obvious solution.) we're already looking at a two-day runtime.

Secondly, Evolutionary Algorithms do not guarantee a solution. Unless a predefined 'good-enough' value is specified, the process will tend to run on indefinitely, never reaching The Answer, or, having reached it, not recognizing it for what it is.

All is not bleak and dismal however, Evolutionary Algorithms have strong benefits as well, some of them rather unique amongst the plethora of computational methods. They are remarkably flexible for example, able to tackle a wide variety of problems. There are classes of problems which are by definition beyond the reach of even the best solver implementation and other classes that are very difficult to solve, but these are typically rare in the province of the human meso-world. By and large the problems we encounter on a daily basis fall into the 'evolutionary solvable' category.

Evolutionary Algorithms are also quite forgiving. They will happily chew on problems that have been under- or over-constrained or otherwise poorly formulated. Furthermore, because the run-time process is progressive, intermediate answers can be harvested at practically any time. Unlike many dedicated algorithms, Evolutionary Solvers spew forth a never ending stream of answers, where newer answers are generally of a higher quality than older answers. So even a pre-maturely aborted run will yield something which could be called a result. It might not be a very good result, but it will be a result of sorts.

Finally, Evolutionary Solvers allow -in principle- for a high degree of interaction with the user. This too is a fairly unique feature, especially given the broad range of possible applications. The run-time process is highly transparent and browsable, and there exists a lot of opportunity for a dialogue between algorithm and human. The solver can be coached across barriers with the aid of human intelligence, or it can be goaded into exploring sub-optimal branches and superficially dead-ends.

The Process

In this section I shall briefly outline the process of an Evolutionary Solver run. It is a highly simplified version of the remainder of the blog post, and I'll skip over many interesting and even important details. I'll show the process as a series of image frames, where each frame shows the state of the 'population' at a given moment in time. Before I can start however, I need to explain what the image below means.

What you see here is the Fitness Landscape of a particular model. The model contains two variables, meaning two values which are allowed to change. In Evolutionary Computing we refer to variables as genes. As we change Gene A, the state of the model changes and it either becomes better or worse (depending on what we're looking for). So as Gene A changes, the fitness of the entire model goes up or down. But for every value of A, we can also vary Gene B, resulting in better or worse combinations of A and B. Every combination of A and B results in a particular fitness, and this fitness is expressed as the height of the Fitness Landscape. It is the job of the solver to find the highest peak in this landscape.

Of course a lot of problems are defined by not just two but many genes, in which case we can no longer speak of a 'landscape' in the strict sense. A model with 12 genes would be a 12-dimensional fitness volume deformed in 13 dimensions instead of a two-dimensional fitness plane deformed in 3 dimensions. As this is impossible to visualize I shall only use one and two-dimensional models, but note that when we speak of a "landscape", it might mean something terribly more complex than the above image shows.

As the solver starts it has no idea about the actual shape of the fitness landscape. Indeed, if we knew the shape we wouldn't need to bother with all this messy evolutionary stuff in the first place. So the initial step of the solver is to populate the landscape (or "model-space") with a random collection of individuals (or "genomes"). A genome is nothing more than a specific value for each and every gene. In the above case, a genome could for example be {A=0.2 B=0.5}. The solver will then evaluate the fitness for each and every one of these random genomes, giving us the following distribution:

Once we know how fit every genome is (i.e., the elevation of the red dots), we can make a hierarchy from fittest to lamest. We are looking for high-ground in the landscape and it is a reasonable assumption that the higher genomes are closer to potential high-ground than the low ones. Therefore we can kill off the worst performing ones and focus on the remainder:

It is not good enough to simply pick the best performing genome from the initial population and call it quits. Since all the genomes in Generation 0 were picked at random, it is actually quite unlikely that any of them will have hit the jack-pot. What we need to do is breed the best performing genomes in Generation 0 to create Generation 1. When we breed two genomes, their offspring will end up somewhere in the intermediate model-space, thus exploring fresh ground:

We now have a new population, which is no longer completely random and which is already starting to cluster around the three fitness 'peaks'. All we have to do is repeat the above steps (kill off the worst performing genomes, breed the best-performing genomes) until we have reached the highest peak.

In order to perform this process, an Evolutionary Solver requires five interlocking parts, which I'll discuss in something resembling detail. We could call this the anatomy of the Solver.

Fitness Function
Selection Mechanism
Coupling Algorithm
Coalescence Algorithm
Mutation Factory

Fitness Functions

In biological evolution, the quality known as "Fitness" is actually something of a stumbling block. Usually it is very difficult to say exactly what it means to be fit. It certainly has little or nothing to do with being the strongest, or the fastest, or the most vicious. The reason there are no flying dogs isn't that evolution hasn't gotten around to making any yet, it is that the dog lifestyle is supremely incompatible with flying and the sacrifices required to equip a dog with flight would certainly detract more from the overall fitness than flight would add to it. Fitness is the result of a million conflicting forces. Evolutionary Fitness is the ultimate compromise.

A fit individual is on average able to produce more offspring than an unfit one, so we could say that fitness equals the number of genetic children. A better measure yet would be to count the number of grand-children. And a better measure yet would be to count the allele frequency in the gene-pool of the genes that made up the individual in question. But these are all rather ad-hoc definitions that cannot be measured on the spot.

At least in Evolutionary Computation, fitness is a very easy concept. Fitness is whatever we want it to be. We are trying to solve a specific problem, and therefore we know what it means to be fit. If for example we are seeking to position a shape so that it may be milled with minimum material waste, there is a very strict fitness function that leaves no room for argument.

Let's have a look at the fitness landscape again and let's imagine it represents a model that seeks to encase an object in a minimum volume bounding-box. A minimum bounding-box is the smallest orthogonal box that completely contains any given shape. In the image below, the green shape is encased by two bounding boxes. B has a smaller area than A and is therefore fitter.

When we need to mill or 3D-print a shape, it is often a good idea to rotate it until it requires the least amount of material to be used during manufacturing. For a real minimum bounding-box we need at least three rotation axes, but since that will not allow me to display the real fitness landscape, we will restrict ourselves to rotation around the world X and Y axes. So, Gene A will represent the rotation around the X axis and Gene B will represent rotation around the Y axis. There is no need to allow for rotation higher than 360 degrees, so both genes have a limited working domain. (In fact, since we are talking about orthogonal boxes, even a 0-90 degree domain would suffice). Behold rotation around a single axis:

When we pick two rotational angles at random, we end up somewhere on the fitness landscape. If we allow for 4 decimal places in the rotation angles it means we can actually generate almost 810,000,000,000 (or 810 billion) unique rotations. It is therefore exceptionally unlikely that we manage to pick a random rotation that yields the best possible answer. But let's say we don't even manage to get close. Let's say we manage to pick a random genome that is at the bad end of the fitness scale, i.e. at the bottom of the fitness landscape. What can we say about the blood-line of this genome? When we track the descendants of a particular genome there is always a large amount of randomness involved due to the workings of the Solver, but there is a strong general tendency that can be distinguished. Just like water will always flow downhill along the steepest slope, so genetic descendants will generally climb uphill along the steepest slope:

Every individual tries to maximize its own fitness, as high fitness is rewarded by the solver. And the steepest uphill climb is the fastest way towards high fitness. So if the black sphere represents the location of the ancestral genome, the orange track represents the pathway of its most successful offspring. We can repeat this exercise for a large amount of sample points which will tell us something about how the Solver and the Fitness Landscape interact:

Since every genome is pulled uphill, every peak in the fitness landscape has a basin of attraction around it. This basin represents all the points in model-space that will converge upon that specific peak. It is important to notice that the area of the basin is in no way representative of the quality of the peak. Indeed, a very poor solution may have a large basin of attraction while a good peak might have a small catchment area. Problems like this are typically very difficult to solve, as the solution tends to get stuck in local optima. But we'll have a look at problematic fitness functions later on.

First, let's have a closer look at the actual fitness landscape for our minimum bounding-box model. I'm afraid it's not quite as simple as the image we've been using so far. I was actually quite surprised how organic and un-box-like the actual fitness landscape for this problem is. Remember, the x-axis rotation is mapped along the Gene A direction and the y-axis rotation along the Gene B direction. So every point on the AB plane represents a unique rotation composed of two angles. The elevation of this point is a direct mapping of the volume of the bounding-box at those two rotation angles:

The first thing to notice is that the landscape is periodic. I.e., it repeats itself every 90 degrees in both directions. Also, this landscape is in fact inverted as we're looking for a minimum volume, not a maximum one. Thus, the orange peaks in fact represent the worst solutions to this problem. Note that there are 16 of these peaks in the entire range and that they are rounded. When we look at the bottom of this fitness landscape, we get a rather different view:

It would appear that the lowest points in this landscape (the minimum bounding-boxes) are both fewer in number and of a different kind. We only get 8 optimal solutions and they are all very sharp, indicating a somewhat more fragile state.

Still, on the whole we have nothing to complain about. All the solutions are of equal quality and there are no local optima at all. We can generalize this landscape to a 2-dimensional graph:

No matter where you end up as an ancestral genome, your blood-line will always find its way to a minimum bounding box. There's nowhere for it to get 'stuck'. So it's really just a question about who gets there first. If we look at a slightly more complex fitness graph, it becomes apparent that this need not be the case:

This fitness landscape has two kinds of solutions. The high quality sharp ones near the bottom of the graph and the low quality flat ones near the top. The basin of attraction is given for both solutions (yellow for high quality, pink for low quality) and you can see that about half of the model space is attracted to the low quality solutions.

An even worse example (flipped upright again this time, so high values indicate good solutions) would be the following fitness landscape:

The basins for these peaks are very small indeed and therefore easy to miss by a random sampling of the landscape. As soon as a lucky genome finds the peak on the left, its offspring will rapidly populate the low peak causing the rest of the population to go extinct. It is now even less likely that the better peak on the right will be found. The smaller the basins for solution, the harder it is to solve a problem with an evolutionary algorithm.

Another example of a cumbersome problem to solve would be a discontinuous fitness landscape:

Even though there are strictly speaking no local optima, there is also no 'improvement' on the plateaus. A genome which finds itself in the middle of one of these horizontal patches doesn't know where to go. If it takes a step to the left, nothing changes. If it takes a step to the right, nothing changes. There's no 'pressure' in this fitness landscape, so all the genomes will wander about aimlessly, until one of them has the good fortune to suddenly step onto a higher plateau. At this point it will quickly dominate the gene-pool and the wandering starts again until the next plateau is accidentally found.

Even worse than this though is a landscape that has a high degree of noise or chaos. A landscape may be continuous and yet feature so much detail that it becomes impossible to make any intelligible pronunciations regarding the fitness of a local patch:

In a landscape like this, mommy and daddy may both be very similar and both be very fit, but when they mate the offspring might end up in one of the fissures. A landscape like this defies navigation.

Selection Mechanisms

Biological Evolution proceeds by Natural Selection. The ruthless force identified by Darwin as the arbiter of progress. Put simply, Natural Selection affects the direction of the gene-pool over time by regulating who gets to mate. In extreme cases mating is prevented because a specific genome is so unfit that the bearer cannot survive until reproductive age. Another rather extreme case would be sterility. However, there's a myriad ways in which Natural Selection can make it difficult or impossible for certain individuals to pass on their genetic footprint.

However, Natural Selection isn't the only game in town. For a long time now humans have been using Artificial Selection in order to breed specific characteristics into a (sub)species. When we try to solve problems using an Evolutionary Solver, we always use some form of artificial selection. There's no such thing as sex or gender in the computer. The process of selection is also much simpler than in nature, as there is basically only one question that needs to be answered: Who gets to mate?

Allow me to enumerate the mechanisms for parent selection that are available in Galapagos. This is only a small subset of the selection algorithms that are possible, but they seem to cover the basics rather well.

First off, we have Isotropic Selection, which is the simplest kind of algorithm you can imagine. In fact, it is the absence of a selection algorithm. In Isotropic Selection everyone gets to mate:

No matter where you find yourself on this fitness graph, your chances of ending up in a mating couple are constant. You might think that this is a particularly pointless selection strategy as it does nothing to further the evolution of the gene-pool. But it is not without precedent in nature. Take for example wind-pollination or coral spawning. If you're a sexually functioning member of such a species, you get to play ball come mating season. Another example would be females in a walrus colony. Every female in a colony gets to breed with the dominant male, no matter how fit or unfit she is. Isotropic Selection is certainly not without function either. For one, it dampens the speed with which a population runs uphill. It therefore acts as a safe-guard against a premature colonization of a local -and possibly inferior- optimum.

Another mechanism available in Galapagos is Exclusive Selection, where only the top N% of the population get to mate:

If you're lucky enough to be in the top N%, you'll likely have multiple offspring. A good analogy in nature for Exclusive Selection would be Walrus males. There's only a few harems to go around and far too many males to assign them all (a harem of one female after all is not really a harem). The flunkies get to sit on the side-line without a single chance to father a walrus baby, doing whatever it is walruses do when they can't get any action.

Another common pattern in nature is Biased Selection, where the chance of mating increases as the fitness increases. This is something we typically see with species that form stable couples. Everyone is basically capable of finding a mate, but the really attractive individuals manage to get a lot of hanky-panky on the side, thus increasing their chances of becomes genetic founders for future generations. Biased Selection can be amplified by using power functions, which have the effect of flattening or exaggerating the curve.

Coupling Algorithms

Coupling is the process of finding mates. Once a genome has been elected to mate by the active Selection Algorithm, it has to pick a mate from the population to complete the act. There are of course many ways in which mate selection could occur, but Galapagos at the moment only allows one; selection by genomic distance. In order to explain this in detail, I should first tell you how a Genome Map works. This

is a Genome Map. It displays all the genomes (individuals) in a certain population as dots on a grid. The distance between two genomes on the grid is roughly analogous with the distance between the genomes in gene-space. I say roughly because it is in fact impossible to draw a map with exact distances. A single genome is defined by a number of genes. We assume that all the genomes in a species have the same number of genes (this is not technically a limitation of Evolutionary Algorithms, even though it is currently a limitation of Galapagos). Therefore the distance between two genomes is an N-Dimensional value, where N equals the number of genes. It is not possible to accurately display an N-Dimensional point cloud on a 2-Dimensional screen so the Genome Map is only a coarse approximation. It also follows that the axes of this graph have no meaning whatsoever, the only information a Genome Map conveys is which genomes are more or less similar (close together) and which genomes are more or less different (far apart).

Imagine you are an individual that has been selected for mating (yay). The population is well distributed and you are somewhere near the average (I'm sure you are a wildly original and delightful person in real life, but for the time being try to imagine you are in fact sort of average):

That red dot is you. Who looks attractive?

You could of course limit your search of potential partners to your immediate neighbourhood. This means that you mate with individuals who are very much like you and it means your offspring will also be very much like you.

When this is taken to extremes we call it incestuous mating behaviour and it can become detrimental pretty quickly. Biological incest has a nasty habit of expressing unhealthy but recessive genes, but in the digital world of Evolutionary Solvers the biggest risk of incest is a rapid decline in population diversity. Low diversity decreases the chances of finding alternate solution basins and thus it risks getting stuck in local optima.

The other extreme is to exclude everyone near you. You'll often hear it said that opposites attract, but that's true only up to a point. At some point the genomes at the other end of the scale become so different as to be incompatible.

This is called zoophilic mating and it can be equally detrimental. This is especially true when a population is not a single group of genomes, but in fact contains multiple sub-species, each of which is climbing their own little fitness peak.

You definitely do not want to mate with a member in a different sub-species, as the offspring would likely land somewhere in the middle. And since these two species are climbing different peaks, "in the middle" actually puts you in a fitness valley.

It would seem that the best option is to balance in-breeding and out-breeding. To select individuals that are not too close and not too far. In Galapagos you can specify an in-breeding factor (between -100% and +100%, total out-breeding vs. total in-breeding respectively) that allows you to guide this relative offset:

Note that mate selection at present completely ignores mate fitness. This is something that needs looking into for future releases, but even without any advanced selection algorithms the solver still works.

Coalescence Algorithms

Once a mate has been selected, offspring needs to be generated. On the genetic level this is anything but fun and games. The biological process of gene recombination is horrendously complicated and itself subject to evolution (meiotic drive for example). The digital variant is much more basic. This is partially because genes in evolutionary algorithms are not very similar to biological genes. Ironically, biological genes are far more digital than programmatic genes. As Mendel discovered in the 1860's, genes are not continuously variable qualities. Instead they behave like on-off switches. Genes in evolutionary solvers like Galapagos behave like floating point numbers, that can assume all the values between two numerical extremes.

When we mate two genomes, we need to decide what values to assign to the genes of the offspring. Again, Galapagos provides several mechanisms for achieving this.

Imagine we have two genomes of four genes each. There is no gender and no sex-based characteristics in the solver so the combination of M and D is potentially a completely symmetrical process. A mechanism that is somewhat synonymous with biological recombination is Crossover Coalescence.

In Crossover mating, junior inherits a random number of genes from mommy and the remainder from daddy. In this mechanism gene value is maintained.

Blend Coalescence will compute new values for genes based on both parents, basically averaging the values:

It is also possible to add a blending preference based on relative fitness. If mum is fitter than dad for example, her gene values will be more prominent in the offspring:

Mutation Factories

All the mechanisms we have discussed so far (Selection, Coupling and Coalescence) are designed to improve the quality of solutions on a generation by generation basis. However all of them have a tendency to reduce the bio-diversity in a population. The only mechanism which can introduce diversity is mutation. Several types of mutation are available in the Galapagos core, though the nature of the implementation in Grasshopper at the moment restricts the possible mutation to only Point mutations.

Before we get to mutations though, I'd like to talk briefly about Genome Graphs. A popular way to display multi-dimensional points on a two-dimensional medium is to draw them as a series of lines that connect different values on a set of vertical bars. Each bar represents a single dimension. This way we can quite easily display not just points with any number of dimensions, but even points with a different number of dimensions in the same graph:

Here for example we have a genome consisting of 5 genes. This genome is thus a point in the 5-dimensional space that delineates this particular species. When G0 is drawn at ⅓, it means that the value is one-third between the minimum and maximum allowed limits. The benefit of this graph is that it becomes quite easy to spot sub-species in a population, as well as lone individuals. When we apply mutations to a genome, we should see a change in the graph, as every unique genome has a unique graph.

The above modification shows a Point Mutation, where a single gene value is changed. This is currently the only mutation type that is possible in Galapagos. We could also swap two adjacent gene values, in which case we get an Inversion Mutation:

Inversion mutations are only useful when subsequent genes have a very specific relationship. It tends to drastically modify a genome and thus in most cases also drastically modify fitness. This is almost always a detrimental operation.

Two examples of mutations that cannot be used on a species which requires a fixed number of genes are Addition and Deletion mutations.

Conclusion

Galapagos is still a very young product and hasn't really had time to position itself firmly in any work-flow, provided that it could. It seems to be capable of solving relatively small problems quite quickly, but it certainly needs a lot of work to make it more robust and usable. It is likely that the most effective applications for a solver of this type and capability are small or partial problems. To try and evolve anything complicated will almost certainly result in frustration.

Comment

You need to be a member of Grasshopper to add comments!

Comment by Amelia on February 15, 2023 at 12:23am: How inspiring when people make art! I would also like to learn, but I am constantly at work, I invite you to visit us, you will receive quality services and a relaxing atmosphere.

Comment by Alice Billson on October 31, 2022 at 7:13am: You will find a wide range of quality foods to choose from when buying your pet’s food from Amazon.com or another website like Petco or Petsmart, etc. You can even go into a store like Petco and see Holistic Supplements For Dog available there but unless you know what type of food your pet needs then this may not be an option for you since most people don’t realize that they need to know what their pets need in order for them to live healthy lives.

Comment by Thomas Wortmann on February 6, 2017 at 3:14am: Hi David,

Thanks for the excellent explanation!

One question: Did you come up with this framework yourself, or is it based on some published algorithm? If the latter, I'd be curious to know which one. (Especially the coupling algorithm seems a bit different. However, I can see why in theory it's better than using selection twice.)

Comment by Fariz Badi Uzzaman on September 24, 2014 at 9:03pm: what kind of problem that evolutionary solver or galapagos can solve? actually i am interested to apply this in my final project "Sacred Building". Thanks before

Comment by Joao Calheiros on January 23, 2014 at 5:18am: Hello David,

I'm working around in my thesis and i need to talk about evolutionary solvers (galapagos, goat, octopus) and the principles behind those, but i'm messing around with terms. I read already evolutionary solvers, generic solver, genetic solvers, evolutionary algorithms, genetic algorithms, generic algorithms etc. Could you brief explanation about these terms. They represent the same thing (solvers on one side and algorithms on another) or there are some (even slightly) differences ?

Thanks in advance, Joao Calheiros

Comment by Omar Helmy on January 20, 2013 at 1:59am: Sorry if this has been answered before, but is there a limit to the number of genes Galapagos can handle? If there is, is it a memory limit or a practicality limit?

Thanks

Comment by fancy_no1 on June 14, 2012 at 8:04pm

Comment by Santiago Perez on April 1, 2012 at 9:47pm: Update-

I just read the Multiple Fitness Workaround post and understand the problem better.

So- how to make a good fitness function?

In my Mesh Table example, too few edges would result in an uninteresting mesh table.

Too many and the number of joint connections becomes a problem.

So we need to have a goal in mind for the number of edges, and THEN maximize the surface area, hoping that this creates a nicely folded mesh, rather than an (uninteresting) flattened mesh (I'm using a random point generator for X,Y,Z points in space).

My question then revolves around how to make this a Generative Tool for optimizing what may be both pragmatic and aesthetic design goals.

Perhaps understanding how to develop a catalog of case studies for fitness functions is what I'm after- developing an intuitive sense of how to set up the system? For now the solution that I've developed is to divide the Mesh Surface Area by the total number of Faces. This is giving me a good result, but their may be a better fitness function.

Comment by Santiago Perez on April 1, 2012 at 8:46pm: I've just accomplished my first working system, so hopefully this question is not too basic.

Question: Can you run 2 parameters simultaneously to create a more complex fitness landscape?

I am interested in achieving a Delauney Mesh configuration based on two (Fitness) constraints: A) Maximize the Surface Area and B) Minimize the Number of Edges.

This is for a "real-world" project, (welded steel mesh tables) so I'm excited that I have the first fitness landscape running already.