Fly! Component - accommodate larger-scale, complicated models

Hi everyone,

I was wondering if there is a way to improve upon the Fly! component. I am currently working on Machine Learning and Parametric Design models using HB and LB. My first case study was a success. It was a simple model with only 23,000 different combinations, and generating the 23,000 lists of parameter combinations took around 20 minutes (if memory serves).

The second model is a bit more comprehensive and has a total of 360,000 combinations. I am again using the Fly! component to cycle through the sliders and produce (through a few native GH components) the 360,000 different combinations of model parameters (one list for every model). But I fear it will simply take too long. It's also hard to tell whether GH itself has stopped working, since the environment freezes after Fly! is enabled. I remember reading somewhere that GH might have issues with large lists, but I'm not sure that's true.

Is there a faster way to implement this? Does anyone know of a GH-native way of producing combinations from lists of values (instead of sliders)?

I could probably do some of this outside Rhino/GH/HB/LB, but I'd rather not.

Thanks in advance.

Kind regards,

Theodore.


Replies to This Discussion

Reply by Leland Curtis on November 29, 2016 at 8:33am

Theodoros,

I use the Brute Force component to iterate through all list combinations. It appears to work the same way as Fly!, so it may not speed things up. I haven't found a way to use lists directly.

It sounds like rebuilding a unique model/file for each iteration is what is slowing you down. For 360,000 combinations it probably makes more sense to manipulate the IDF or RAD files directly rather than rebuild the model each time. Maybe use GH to build each unique geometry option, then edit the files to apply the simpler changes, such as material options. I haven't done this myself, but in theory it could save time.
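A minimal sketch of the "edit the file instead of rebuilding the model" idea, assuming the geometry has already been exported once as text and the only thing that changes is the glazing construction name. The IDF fragment is abbreviated (remaining fields omitted), and the `@@GLAZING@@` placeholder and construction names are hypothetical, not an EnergyPlus or Honeybee convention.

```python
# Abbreviated, hypothetical IDF fragment (remaining fields omitted).
# "@@GLAZING@@" is an illustrative placeholder, not an EnergyPlus convention.
IDF_TEMPLATE = """FenestrationSurface:Detailed,
    Window_South,            !- Name
    Window,                  !- Surface Type
    @@GLAZING@@,             !- Construction Name
"""


def make_variants(template, constructions):
    """Return one IDF text per glazing construction; the geometry text is reused as-is."""
    return {name: template.replace("@@GLAZING@@", name) for name in constructions}


# Hypothetical construction names for illustration.
variants = make_variants(IDF_TEMPLATE, ["Clear_Double", "LowE_Double", "Triple_LowE"])
```

Each variant could then be written to its own file and simulated, so GH only has to build the geometry once per geometric option.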

This may be a while down the road, but Sarith and I have recognized that our iterative models have an outrageous amount of redundant calculation because we treat each model as a unique run. If we only change the glass type and no surfaces, the internal and external calcs haven't changed, only the façade has. Why waste time calculating the same thing over and over? Consider DaySim: it uses one DF matrix to calculate the internal bounces and then adjusts the sky dome to calculate each unique hour, with no redundant internal calc. We need a similar method designed for iterative calculations that reuses as much of the previously run iterations as possible. 360,000 runs could be shrunk to a fraction of that. What are your thoughts? The 5-phase method is an obvious fit for massive iterations.
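The reuse idea can be illustrated with plain memoization: cache the expensive, geometry-dependent step and recompute only the cheap glazing-dependent step. This is a toy sketch, not how DaySim or the 3-/5-phase methods work internally (those reuse matrices, not cached function calls); `geometry_result` and its formula are hypothetical placeholders.

```python
from functools import lru_cache

# Counts how often the expensive step really runs (for illustration only).
CALLS = {"geometry": 0}


@lru_cache(maxsize=None)
def geometry_result(depth, height):
    """Hypothetical stand-in for the expensive, geometry-dependent calculation
    (e.g. a daylight-coefficient run). The formula is a toy placeholder."""
    CALLS["geometry"] += 1
    return depth * height


def run_case(depth, height, glass_transmittance):
    """Cheap, glazing-dependent step that reuses the cached geometry result."""
    return geometry_result(depth, height) * glass_transmittance


results = [
    run_case(d, h, t)
    for d in (4, 6)
    for h in (3, 4)
    for t in (0.4, 0.6, 0.7)
]
print(len(results), CALLS["geometry"])  # 12 cases, only 4 geometry runs
```

With 3 glazing options per geometry, two thirds of the expensive calls disappear; the saving grows with the number of non-geometric parameters.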

Reply by Mostapha Sadeghipour Roudsari on November 29, 2016 at 10:47am

Couldn't agree more! I am very excited to say that, thanks to Sarith's amazing contribution, the 3-phase method is finally supported in Honeybee! It will save people a lot of time and will also open up a lot of new opportunities that weren't possible before. Sarith, Chris and I will do a workshop on 3-phase in New York on December 8, and will hopefully release it officially early next year.

Reply by Theodoros Galanos on November 29, 2016 at 3:46pm

Hi Leland,

Thanks a lot for your thoughts. What you mention makes a lot of sense, and I feel it would help a lot in the schematic/design development stages of a real-life project, where higher-accuracy results for different design alternatives are needed. I hadn't thought of manipulating things at that level; it sounds very interesting.

In my case, since this is at the concept design level, the number of combinations in an actual practical model (which would be in the tens of millions) makes even this reduction impractical.

However, ML can also be used for exactly this, in what is called dimensionality reduction. Producing millions of different design alternatives opens up new potential for statistical studies of models and results. One of the processes embedded in ML is feature (i.e. parameter) selection, where parameters that turn out not to be important can be omitted. In a way this is similar to what you describe (though also different, I imagine), only one step earlier and with fewer resources.
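One of the simplest feature-selection filters ranks each parameter by the absolute Pearson correlation between its values and the simulated output, then drops the weak ones. A minimal sketch, assuming a small table of parameter combinations and results; the parameter names, toy output formula and 0.1 threshold are hypothetical. Pearson correlation only detects linear relationships, so scikit-learn's `sklearn.feature_selection` module is a more robust option in practice.

```python
import itertools
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_parameters(rows, outputs, names, threshold=0.1):
    """Score each parameter by |r| with the output; keep those above the threshold."""
    columns = list(zip(*rows))
    scores = {name: abs(pearson(col, outputs)) for name, col in zip(names, columns)}
    kept = [n for n in sorted(scores, key=scores.get, reverse=True) if scores[n] >= threshold]
    return scores, kept


# Toy data: the output depends on wwr and depth but not on orientation.
names = ["wwr", "orientation", "depth"]
rows = list(itertools.product([0.2, 0.4, 0.6], [0, 90], [1, 2]))
outputs = [10 * wwr + depth for wwr, orientation, depth in rows]
scores, kept = rank_parameters(rows, outputs, names)
print(kept)  # ['wwr', 'depth']
```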

Interesting stuff! Let's see where they lead!

Kind regards,

Theodore.

Reply by Mostapha Sadeghipour Roudsari on November 29, 2016 at 10:41am

What about sampling the possibilities and running only that sample, instead of running all possible combinations?

Let's say you have 5 sliders, each with values between 0 and 9, and you only want to test 200 options. This code will generate those options as lists. You can then use Fly! to iterate through only these 200 values using an item selector component.

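```python
import itertools
import random

# Generate the values for each slider. You can change this to get the values from the sliders.
sliders = (range(10),) * 5
# Every possible combination of slider values (10**5 = 100,000 here).
population = itertools.product(*sliders)
# Draw 200 distinct combinations at random (without replacement).
sample = random.sample(tuple(population), 200)
```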
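For a few million combinations, `tuple(population)` has to hold the whole product in memory. One way around that (a sketch, not part of Mostapha's code) is to sample flat indices and decode each one into a combination, in the same order `itertools.product` would produce them. `combination_at` and `sample_combinations` are hypothetical helper names, and the code assumes Python 3.

```python
import random


def combination_at(index, value_lists):
    """Decode a flat index (0 .. total-1) into one combination. The order matches
    itertools.product(*value_lists): the last list varies fastest."""
    combo = []
    for values in reversed(value_lists):
        index, remainder = divmod(index, len(values))
        combo.append(values[remainder])
    return tuple(reversed(combo))


def sample_combinations(value_lists, k, seed=None):
    """Draw k distinct combinations without building the full product."""
    total = 1
    for values in value_lists:
        total *= len(values)
    rng = random.Random(seed)
    # random.sample accepts a range, so for large totals no full index list is built.
    indices = rng.sample(range(total), k)
    return [combination_at(i, value_lists) for i in indices]


sliders = [list(range(10))] * 5
sample = sample_combinations(sliders, 200, seed=1)
print(combination_at(12345, sliders))  # (1, 2, 3, 4, 5)
```

Because each index maps to exactly one combination, distinct indices give distinct combinations, and `combination_at` can also regenerate any single row of the master list on demand.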

Reply by Theodoros Galanos on November 29, 2016 at 3:36pm

Hi Mostapha,

Thank you so much for the code! I was actually using the random list component in GH to reduce the master list (all possible combinations) to a smaller one.

This smaller list is what is called the training set in ML. I run that list through my model, produce results for it, and later use those results to train the ML model.

After the model is trained, I need to pass the master list to it (as a numpy array) in order for it to predict all outcomes. That is why I need to also run all options of the sliders and produce the whole space of combinations.
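If the slider values are known, the master list can be generated outside the GH canvas without cycling any sliders. A minimal sketch, assuming Python 3 and hypothetical parameter names and values; it streams rows to CSV so memory use stays flat even for millions of rows.

```python
import csv
import io
import itertools


def write_master_list(value_lists, names, stream):
    """Write every combination of the parameter values as CSV rows.
    itertools.product yields combinations lazily, so memory stays flat
    even when there are millions of rows."""
    writer = csv.writer(stream)
    writer.writerow(names)
    count = 0
    for combo in itertools.product(*value_lists):
        writer.writerow(combo)
        count += 1
    return count


# Hypothetical slider values; in practice, copy them from the sliders' ranges.
values = [[0.2, 0.4], [0, 90, 180], [1, 2]]
buffer = io.StringIO()
n_rows = write_master_list(values, ["wwr", "orientation", "depth"], buffer)
print(n_rows)  # 12
```

For a real run, pass a file opened with `open("master_list.csv", "w", newline="")` instead of the `StringIO` buffer; the result can then be loaded as a NumPy array with `numpy.loadtxt(..., delimiter=",", skiprows=1)` or with `pandas.read_csv`.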

I do that as a second step, right after I've finished designing the model. I disable the whole definition apart from these sliders so that there is no computational cost anywhere else. Then I just run the Fly! component. That worked like a charm with 23,000 combinations (1st case); with the 2nd case it was two hours into the calculation when I left work yesterday. That is probably normal, since it's a lot of computation. I was just wondering if there's something faster.

Would using code like the one you posted be faster in any way, or different than what Fly does?

Regards,

Theodore.

Reply by Theodoros Galanos on November 29, 2016 at 7:24pm

Hi Mostapha,

Just an update: I came into the office and the Fly! component had finished. It took 264 minutes to run 354,816 different combinations. About 4.4 hours isn't that bad, but this quickly gets out of hand with a few million combinations.

Could this be related to the way I save the combination lists? Because of the way recorders save data, I use the old Concatenate / Text Split trick to save each parameter value into one list, which I later merge with Entwine to create the data frame. This is the only way I have found so far to record all combinations (Concatenate/Text Split creates one unique string at each iteration, ensuring every combination is recorded).
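For comparison, the same record-then-split idea looks like this in plain Python: one delimited string per iteration, then split and regrouped into one list per parameter. This is only an analogy for what the Concatenate / Text Split / Entwine chain does, not a description of GH internals; the `;` separator and parameter names are assumptions.

```python
def encode(values, sep=";"):
    """Join one iteration's parameter values into a single string, as the
    Concatenate trick does, so every iteration produces one unique record."""
    return sep.join(str(v) for v in values)


def to_columns(records, names, sep=";"):
    """Split recorded strings and regroup them into one list per parameter,
    roughly what Text Split followed by Entwine achieves in GH."""
    rows = [record.split(sep) for record in records]
    return {name: [float(x) for x in column] for name, column in zip(names, zip(*rows))}


records = [encode(combo) for combo in [(0.2, 0, 1), (0.4, 90, 2)]]
columns = to_columns(records, ["wwr", "orientation", "depth"])
print(columns["orientation"])  # [0.0, 90.0]
```

Doing this in a script removes the need for the recorder to see a "changed" value on every iteration, since every row is written explicitly.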

Is this taking too much processor time? Is there a better GH-native way to save these lists? Or would it be better to use code like yours above to save all combinations?

Thanks in advance!

Kind regards,

Theodore.

Reply by Devang Chauhan on December 15, 2016 at 9:30am

Hi Theodore,

Thank you for sharing your thoughts here and sparking this great discussion. So far, I have only tried optimization experiments, such as finding a good balance between daylight and energy use. I have been reading about ML too, and I have one question for you.

In ML, the training set is typically used to derive the function that relates the inputs to the outputs. You said that after you design your model, you disable everything and run only your sliders. From that, I infer that you have successfully captured such a function. I like that idea very much, and I am really curious to know how you capture it in GH.
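The thread doesn't say which model Theodore trained, so as one hedged illustration of "capturing a function" from a training set: k-nearest-neighbour regression predicts an unseen combination's output as the mean output of the k closest simulated combinations. The toy data are hypothetical; parameters should be normalised to similar ranges first, and the code assumes Python 3.8+ for `math.dist`. scikit-learn's `KNeighborsRegressor` implements the same idea with more options.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """k-nearest-neighbour regression: predict the output of an unseen parameter
    combination as the mean output of the k closest training combinations."""
    # Sort training samples by Euclidean distance to the query point.
    nearest = sorted(
        zip(train_x, train_y), key=lambda sample: math.dist(sample[0], query)
    )[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Toy training set: two normalised parameters, output = x0 + x1.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
train_y = [0.0, 1.0, 1.0, 2.0]
print(knn_predict(train_x, train_y, (0.9, 0.9), k=1))  # 2.0
```

Once fitted, the model is just data plus a prediction rule, so predicting the whole master list no longer requires running the simulation for every combination.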

Thanks,

-Devang

Reply by Sarith Subramaniam on November 29, 2016 at 4:33pm

Hi Theodore,

Since we are talking of building simulations, I assume that when you talk of ML you are referring to supervised learning (i.e. wherein we know what we are looking for and are not after patterns and such).

I am not convinced that Machine Learning is the way to go for deciding which (or how many) simulations should be run. Having studied a fair bit of ML as well as Design of Experiments (DOE) in grad school, I think a hybrid approach is much more efficient.

More specifically, initial simulations based on screening, factorial designs, etc. can help identify which simulations need to be run. This can be followed by running those simulations and then performing the "learning" step. (Actually, you could forgo ML altogether and just tune your simulations based on something like Response Surface Design.)
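As a small illustration of screening with a factorial design: a two-level full factorial runs every combination of each factor at "low" (-1) and "high" (+1), and each factor's main effect is the mean response at +1 minus the mean response at -1. Factors with negligible effects can then be dropped before the detailed runs. This sketch is not taken from Sarith's or Bo Lin's work, and the toy response is hypothetical; with many factors, fractional factorial designs reduce the run count further.

```python
import itertools


def full_factorial(k):
    """All 2**k runs of a two-level design, coded -1 (low) / +1 (high)."""
    return list(itertools.product((-1, 1), repeat=k))


def main_effects(design, responses):
    """Main effect of each factor: mean response at +1 minus mean response at -1."""
    effects = []
    for j in range(len(design[0])):
        hi = [y for run, y in zip(design, responses) if run[j] == 1]
        lo = [y for run, y in zip(design, responses) if run[j] == -1]
        effects.append(sum(hi) / len(hi) - sum(lo) / len(lo))
    return effects


design = full_factorial(3)
# Toy response: factor A matters a lot, B a little, C not at all.
responses = [10 + 3 * a + 0.5 * b for a, b, c in design]
print(main_effects(design, responses))  # [6.0, 1.0, 0.0]
```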

I learnt ML with R and DOE with Minitab, but I know for a fact that the functionality needed for such hybrid studies can now be realized with Python (using scikit-learn) and R.

Of course, all of this is easier said than done, but I hope someone will take up this line of investigation. Dr. Susan Sanchez has published some very interesting research on this topic.

Sarith

Reply by Theodoros Galanos on November 29, 2016 at 4:55pm

Hi Sarith,

Thanks for the insight, it's really helpful! I have just begun this journey, so I have a lot to learn as I go.

Yes, you are right: this is supervised learning, so I am not trying to discover patterns in the data or anything like that, but merely to fill in the gaps, so to speak. This is by far the simplest way of incorporating ML into our field. While I feel ML is a bit overrated in general, I feel the impact it can have on AEC models is underrated.

I talk about parametric design a lot (mostly to myself, lol) and I even try to practice it a bit, personally and at work. But how many parametric designs are actually feasible? What percentage of the iterations can we run in the 2 weeks we typically have to prepare the concept design report on a project? These are the questions that motivated me to find a way to make this a practical and useful exercise. ML is my first stab at this, and I'm still in the very first steps.

I also don't think I'll be using ML to decide which simulations to run. My goal, with my limited knowledge of the subject, was to use it in the typical way: train a model that can predict outcomes. I feel it's quite good at that, especially in terms of effort (both real time and computational time).

I am indeed currently using scikit-learn, which is an amazing library with (as always, it seems, in Python) a pretty incredible community! I was initially using R for different studies and thought of starting there, but I would eventually like to integrate this directly into environments like GH and Dynamo, so Python (and possibly .NET) seemed a better place to start.

I have only scratched the surface, but there are quite a few developments in ML that seem, to my untrained eye, to have potential for reducing the dimensionality of our models efficiently (e.g. random search vs. grid search). I plan to have a few million alternatives in the third model to test these out. I will post my findings, and the walls I hit, here for those interested!
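Random search draws configurations independently, so by chance some regions of the parameter space can be left sparsely covered. One common space-filling alternative is Latin hypercube sampling: each parameter's range is split into n equal strata and every stratum is used exactly once. A minimal sketch, assuming continuous parameter ranges (the two ranges below are hypothetical); for sliders with discrete steps, each sampled value would be snapped to the nearest slider value.

```python
import random


def latin_hypercube(n, bounds, seed=None):
    """n samples in len(bounds) dimensions. Each dimension's range is split into
    n equal strata, and every stratum receives exactly one sample."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        width = (high - low) / n
        strata = list(range(n))
        rng.shuffle(strata)  # random pairing of strata across dimensions
        columns.append([low + (s + rng.random()) * width for s in strata])
    return list(zip(*columns))


# Hypothetical ranges: window-to-wall ratio 0-1 and room depth 20-40.
points = latin_hypercube(10, [(0.0, 1.0), (20.0, 40.0)], seed=3)
```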

P.S.: Thanks for the link! I'll try to access a few papers, even though without being a student it's a rich man's hobby, lol!

Edit: It seems most of the research is available for free, that's awesome!

Kind regards,

Theodore.

Reply by Sarith Subramaniam on November 29, 2016 at 5:12pm

Hi Theodore,

Just a quick update. Bo Lin, one of my colleagues at Penn State (who now works with Leland!), actually did his dissertation on a similar theme. I don't think he referred to those statistical methods as ML, but I think you will find a lot of similarity in the concepts. I think his dissertation is openly available: https://etda.libraries.psu.edu/catalog/27233

Sarith

Reply by Theodoros Galanos on November 29, 2016 at 7:19pm

Hi Sarith,

Thanks for the link, already added to the queue :) Starting to hate 24h days, need more hours!

Reply by Theodoros Galanos on December 1, 2016 at 10:05pm

Hi everyone,

Does anyone know of a better way to save parametric model results than the native GH recorder?

I am still using a variation of the Pollination definition. But after I left my computer running around 3,000 E+ simulations on my day off, I came back to a Rhino crash, which meant that all the simulation data produced up to that point was lost (even if 99.99% of the runs had finished).

I know this isn't very popular yet, but I'd love a component or workflow that makes this a bit more efficient. Would something as mundane as streaming the recorder output to a .csv file work (as a safeguard against crashes)?
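Streaming results to disk does work as a safeguard if each row is written and flushed as soon as its simulation finishes, instead of being kept in the recorder's memory. A minimal sketch, assuming Python 3 and hypothetical column names (`run_id`, `eui`); in IronPython 2.7 (GHPython), the Python 2 `csv` module expects the file opened in `"ab"` mode without the `newline` argument. `completed_ids` lets a restarted batch skip runs that are already saved.

```python
import csv
import os
import tempfile


def append_result(path, row, header):
    """Append one result row to a CSV file immediately, so a crash loses at most
    the run in progress. The header is written only when the file is new/empty."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(header)
        writer.writerow(row)
        f.flush()
        os.fsync(f.fileno())  # push the row to disk before the next run starts


def completed_ids(path, id_column="run_id"):
    """Run IDs already saved, so a restarted batch can skip finished runs."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="") as f:
        return {row[id_column] for row in csv.DictReader(f)}


# Example with hypothetical column names ("run_id", "eui") in a temporary folder.
demo_path = os.path.join(tempfile.mkdtemp(), "results.csv")
for run_id, eui in (("0", 112.4), ("1", 108.9)):
    append_result(demo_path, [run_id, eui], header=["run_id", "eui"])
print(sorted(completed_ids(demo_path)))  # ['0', '1']
```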

I'd love some ideas if someone has them!

Kind regards,

Theodore.