Does Galapagos prefer some genome data structures over others?

Greetings

I've used Karamba for finding the maximum displacements, w[max], of a rectangular plate with uniformly distributed, lateral load and three point supports.

Displacement plot

I've run a brute force algorithm analysing all possible support constellations and their related w[max] values. Therefore I know the global optimum value of the fitness (smallest w[max]) and its related genome (location of the three point supports).

Now the reason I've done this, is because I wanted to test whether one type of genome data structure would yield better results when fed to the Evolutionary Solver of Galapagos. The two genome data structures tested were the following types:

Sequential genome indexing for support positions, uses one slider per point support
[supports @ node 2, 23 & 25]

Coordinate genome indexing for support positions, uses two sliders per point support
[supports @ node (0;2), (3,5) & (4,1)]

My own hypothesis beforehand was that the coordinate indexing system would be the better performer, as I had imagined that the solver might build up some kind of geographic intelligence based on this genome information system.

Based on my tests with the Evolutionary Solver the two genome indexing types seem to perform equally well; they both manage to find the optimal fitness sooner or later with same hit rates. But now I worry that the used example might be too small to draw a final conclusion upon, and as I will be applying the same type of deflection analysis for more complicated deflection cases with both larger and arbitrarily shaped meshes having more than three point supports, I would like to ask:

Should the coordinate indexing system, theoretically speaking, perform better than the sequential indexing system?

Replies to This Discussion

Permalink Reply by Dimitar Baldzhiev on June 8, 2015 at 2:24pm

I must say that I'm talking well over my head but I think genetic algorithms work better when there is some logical structure underneath them. Probably David is the only one who can explain it properly but the basic way I understand it is as follows:
The indexing system might give better/quicker results because there are rows/columns that get into the "good" generation more often, therefore arriving to a solution (that might as well be a local minimum/maximum) quicker. On the other hand having a single sequential identifier removes the logic of rows/columns and the "good" generations are selected not on coordinate basis (2paramenters) but on a single (brute force) basis. I guess the right decision whether to take sequential vs coordinate parameters depends on problem to problem basis. I would surmise that if you can eyeball a coordinate system that may make sense (in terms of a grid-columns and rows) it may be better to use it but if you want to brute force it a sequential system is better.

In this sense the answer to your question is that "Yes" galapagos will eventually pick that for example genome N2 needs a value of 5 and will stick to it (ignoring random mutations).

Please more competent people elaborate on the topic!

Permalink Reply by Daniel Kolling Andersen on June 9, 2015 at 5:13pm

Thanks for your input Dimitar - at least it’s comforting to know that we share the same assumptions!

In this specific optimisation case of solving for structural supports the underlying math becomes particularly merciless as the fitness (smallest w[max]) is a highly nonlinear function of the gathered genomes (support positions). To elaborate, the w[max] does not occur at a fixed node in the mesh for all support position constellations, rather it occurs at variable mesh nodes as a function of the support position constellation. For the same reason no stationary support position can be considered “good” related to the fitness. (E.g. the support position (4;1) is a good support position when the two other support positions are (0;2) and (3;5), but the support position (4;1) is not a good support position when the two other supports are also located at (4;1))
As I see it, the global fitness landscape is a compilation of two kinds of sub fitness landscapes; one-dimensional landscapes and two-dimensional landscapes.
The one-dimensional landscapes arise when two support positions are fixed and the third is free to move around.
The two-dimensional landscapes arise when one support position is fixed and the two others are free to move around.

Because of this complexity I also worry, that it will be almost impossible for Galapagos to build up any intelligence on the system, especially as the fitness landscape will only grow in dimensionality and complexity with the amount of supports. But of course Galapagos always presents the possibility of “getting lucky in higher dimensions” :)

Permalink Reply by Daniel Kolling Andersen on June 26, 2015 at 6:30am

A last gasp, anyone?

Permalink Reply by David Rutten on June 26, 2015 at 6:43am

Theoretically the coordinate system should perform better. You're imposing a sequential structure onto the problem by using ordered indices (though unordered indices would be the worst possible case) which doesn't exist in the actual problem spec. If galapagos is allowed to change the column position in all possible directions, it is less likely to get stuck in some local optimum.

Let's assume that (all other things being equal) column 20 would yield the best possible answer. The current state of the system though is at column 26, which is pretty good too, just not as good. Galapagos is more likely to 'mutate' the state a little bit instead of a lot, so it'll explore the columns near 26. However 20 isn't near 26 at all, only 25 and 27 are nearby, and maybe 24 and 28. But they'll all worse answers, so after sampling in those directions GP will abandon that as fruitless.

If however you specify the columns using two variables, then the columns near 26 are 20, 25, 27 and 32. That's a far richer space to explore which much better approximates the real problem.

Permalink Reply by Daniel Kolling Andersen on June 26, 2015 at 6:51am

Thank you David!

RSS