Hi Giulio and Steve,
I've recently noticed a couple of bottlenecks with the GHPython component that can quite severely impede performance. I thought I would bring them up here in the hope of helping others facing similar issues.
1) Letting Grasshopper perform the "implied" loop can be substantially slower than writing the loop yourself inside the Python script. This is understandable; however, the strangest thing is that it is MUCH slower (by about a factor of 10) if the definition has been saved than when it has not!
2) Setting type hints seems to be slower than inputting data with "No Type Hint". The size of the effect depends a bit on which type is being input, but it seems fairly consistent; in the attached example it is about a factor of 3. I suppose this is understandable, but not exactly ideal.
3) Outputting lists with many items will often take longer than the actual computation performed by the script. I suppose this is more of a Grasshopper thing. My workaround has been to wrap the list in a Python list and pass this along as an item, which is A LOT faster with large lists (this was crucial for both the Tower and ShapeOP, where we pass around large amounts of constraints).
4) Calling certain RhinoCommon methods appears to be randomly much more expensive than in the C# scripting component. For instance, when iterating over a mesh's vertices and calling Mesh.Vertices.GetConnectedVertices(), the summed elapsed time of these calls seems to be concentrated in only a few vertices, which change randomly every time the script is run. The number of such vertices differs between machines, but the pattern remains consistent.
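Point 4) is about a few calls dominating the total time. One way to confirm that pattern is to time every call individually and see what share of the total the slowest handful accounts for. The sketch below is plain Python; `time_each_call` and `share_of_total` are hypothetical helpers, not part of GHPython or RhinoCommon.

```python
import timeit


def time_each_call(func, items):
    """Call func on every item and return (index, seconds) pairs, slowest first."""
    timer = timeit.default_timer  # best available wall-clock timer
    timings = []
    for index, item in enumerate(items):
        start = timer()
        func(item)
        timings.append((index, timer() - start))
    # Put the few unexpectedly expensive calls at the top of the list.
    timings.sort(key=lambda pair: pair[1], reverse=True)
    return timings


def share_of_total(timings, top_n):
    """Fraction of the summed elapsed time spent in the top_n slowest calls."""
    total = sum(seconds for _, seconds in timings)
    if total == 0:
        return 0.0
    return sum(seconds for _, seconds in timings[:top_n]) / total
```

In a GHPython component, `func` could be something like `lambda i: mesh.Vertices.GetConnectedVertices(i)` and `items` could be `range(mesh.Vertices.Count)`, assuming `mesh` is a Mesh input; a large value from `share_of_total(timings, 5)` would reflect the "few random vertices" behaviour described above.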
I'm not sure whether these bottlenecks are just examples of me being dumb; if so, I hope you can enlighten me as to the error of my ways :)
Attached are some screenshots of an unsaved and a saved definition demonstrating the described issues. Please also find the gh definition attached.
Edit: Logged this on GitHub here.
Update: Added point 4), along with a new screenshot and file demonstrating this behaviour.
In case anyone is following this:
The same issues as described above are seemingly also present with the C# component, although point 1) applies to a vastly lesser degree than with GHPython. Note that there may be some inefficiencies in the C# code I wrote (noobz). Nevertheless, the results seem to suggest that one might be better off avoiding the implied Grasshopper loop, type hints, and "large" output parameters if one is concerned with performance.
I've been looking into making my models compute as fast as possible, specifically when implementing iterative solvers and when searching the design space. In both cases, even small speed gains can be instrumental to the value of a design model. If you're dabbling with similar issues, I'd be interested in hearing whether I'm the only one noticing them. For instance, are the same issues present with a compiled C# component?
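As a rough illustration of that advice, the sketch below shows a component body that loops explicitly (input set to List Access) and hands its results on as a single object. The first post wraps the list in a Python list; this sketch swaps in a small wrapper class, `Packed`, which is a hypothetical name, and it assumes GHPython passes an object without `__iter__` along as one item rather than enumerating it.

```python
class Packed(object):
    """Hypothetical wrapper so a whole result list travels as ONE Grasshopper item.

    It deliberately defines no __iter__, so (by assumption) GHPython should not
    enumerate it into separate output items.
    """
    __slots__ = ("items",)

    def __init__(self, items):
        self.items = items


def solve_all(numbers, func):
    """Explicit loop inside the script (List Access input) instead of
    relying on Grasshopper's implied per-item loop."""
    results = []
    for n in numbers:
        results.append(func(n))
    return results


# Upstream GHPython component body (Numbers: List Access, No Type Hint):
#     a = Packed(solve_all(Numbers, foo))
# Downstream GHPython component body (P: Item Access, No Type Hint):
#     values = P.items
```

Since Grasshopper presumably won't copy the wrapper's contents, the downstream component receives the very same list object and should treat `P.items` as read-only.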
Attached an updated definition with all the examples. Here's an image:
Thanks for this performance review. I have to say, how could one not at least be following this discussion, with all the insider knowledge you have about GhPython and, more generally, about scripting in Grasshopper?
My general recommendation for Grasshopper developers who are writing a performance-sensitive part of their library (please note: the performance-sensitive part is often very limited) is to write it in C#, or maybe even C, or maybe even assembly :). Of course, the closer to the machine you are, the easier it will be to harness every small optimization. However, there is always a compromise between "getting things done" and "making them best", and this boundary is not very easy to find, right?
If you want significant speed improvements for numerical calculations, I would at least recommend developing in C# in a compiled component using Visual Studio or SharpDevelop. The reason is that, in order to provide the line numbers of possible errors, Grasshopper compiles C# scripts in debug mode! They will be much less optimized than what is possible even with today's technology. This does not preclude keeping the project open-source, if that is one of your goals.
Regarding the actual list:
1) Yes, the implied loop will probably be slower than a simple for loop. This is because Grasshopper code has to keep track of more things than the ones you need to consider, given your knowledge of your very special case. However, a factor of 10 is simply not acceptable and is likely a symptom of something else. In fact, I think I remember fixing a bug around that in the Rhino WIP. However, it still appears to be slower there too. I've added a bugtracking item here.
2) If you are able to do all the casts that are involved yourself, and do them the way Grasshopper does, please write your code that way. For example, if you supply a curve to an input with a number hint, Grasshopper computes the length of the curve. Somewhere there has to be an "if" that checks whether the input is a curve (or some similar construct). This aid for designers is what slows down hinted inputs.
3) Grasshopper has to keep side effects at bay. For example, suppose components B and C are both connected to outputs of A. If you edit data in component B, and that data came from A, you of course expect that data to be unchanged in C. This means that, even for lists of numbers, Grasshopper has to perform a deep copy of the output for each input. Otherwise, what happens if B sorts the list and C finds the index of the smallest number? This could be improved if GH components had some way of flagging themselves as non-data-mutating (constant). Supplying special types, which Grasshopper has no way of copying, will likely speed things up. But be aware of possibly very annoying side effects creeping in if the data is not immutable. Another option is to perform the copy "optimally", just where you need it, because you know where your data is used. This is not information that is available to GH at present.
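The B/C example in point 3) can be reproduced in plain Python. In this sketch, `component_b` and `component_c` are hypothetical stand-ins for components B and C, and `copy_per_input` toggles a Grasshopper-like deep copy of A's output for each downstream input.

```python
import copy


def component_b(data):
    # B sorts its input in place (a mutating operation).
    data.sort()
    return data


def component_c(data):
    # C reports the index of the smallest number.
    return data.index(min(data))


def run(output_of_a, copy_per_input):
    # With copy_per_input=True, each downstream input gets its own deep copy,
    # as Grasshopper does; otherwise B and C share the same list object.
    feed = copy.deepcopy if copy_per_input else (lambda x: x)
    b_result = component_b(feed(output_of_a))
    c_result = component_c(feed(output_of_a))
    return b_result, c_result
```

With copies, `run([3.0, 1.0, 2.0], True)` returns `([1.0, 2.0, 3.0], 1)`; with a shared list, B's in-place sort leaks into C, and `run([3.0, 1.0, 2.0], False)` returns `([1.0, 2.0, 3.0], 0)`. Handing out an immutable tuple instead would force B to build a new list with `sorted(data)` rather than sorting in place.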
Does this help?
Thanks again for your input,
for Robert McNeel & Associates
That's terrific, thanks Giulio!
Also, many thanks for the breakdown above. That really clarifies the issues. I'm still a bit puzzled by 2). I would have thought that explicitly declaring the type (using a type hint) would be faster than "No Type Hint" (which I suppose is essentially a form of duck typing). But I guess that is simply due to the small cost of casting, which accumulates and becomes more apparent when performed on large input lists. The other two points are very clear and understandable. It's interesting to hear that compiled components will be faster. Do you think this is also the case for the new compiled GHPython component in the Rhino 6 WIP?
Your general advice on performance also makes perfect sense. This was what initially made me run these tests. And the fact that by making just a few simple alterations to a scripting component you can make it substantially faster (whether C# or Python) seemed like important information to share. Hehe, with regard to which language to implement things in, I try to adhere to the Rule of Least Power, which, as you correctly mention, is more often than not about productivity before optimization :)
I have previously been testing whether or not it is worth using things like NumPy/SciPy for numerically intense calculations, but have since dropped the idea due to their incompatibility with 64-bit IronPython. We have also been experimenting with using ctypes to call compiled C++ code, which actually works quite well (although debugging can be a hassle). I suppose in the end it'll likely be much less of a headache to simply go with C# for this kind of work and use libraries from the .NET world. It is a bit of a shame though, given the large number of potentially very useful Python modules from the world of scientific computing (which all seem to rely on NumPy).
Hi again Anders
Just to clarify point 2) above: by itself, with "No Type Hint", GhPython does not need to do anything. With any specific hint, it has to check whether the type is correct or, if it is not, try to convert it to the matching type. This is similar to the coerce_xxx() functions in rhinoscriptsyntax. As always, some optimizations might be possible, but I strongly suspect that "No Type Hint", being a no-op, will always be faster. Duck typing is independent of this and will still be available in both modes: the hint converts the type while the object is passed into the script, and from then on normal Python conventions apply.
About NumPy, I am hearing mixed reports. Maybe contacting the people over at Enthought will help us get a better idea of where that project is going?
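A rough picture of why a hint costs something per item: with no hint the value is passed through unchanged, while a number hint has to test and possibly convert each value. This is an illustrative sketch only; `no_hint`, `number_hint`, `feed`, and the `FakeCurve` stand-in are hypothetical and do not reproduce GhPython's actual conversion code.

```python
class FakeCurve(object):
    """Hypothetical stand-in for a Rhino curve that only knows its length."""

    def __init__(self, length):
        self.length = length

    def get_length(self):
        return self.length


def no_hint(value):
    # "No Type Hint": a no-op, the object is handed to the script untouched.
    return value


def number_hint(value):
    # A number hint: check the type first, then try to convert it.
    if isinstance(value, float):
        return value
    if isinstance(value, (int, str)):
        return float(value)
    if isinstance(value, FakeCurve):
        # Mirrors Grasshopper turning a curve fed into a number input into its length.
        return value.get_length()
    raise TypeError("cannot convert %r to a number" % (value,))


def feed(values, hint):
    # Every item pays the cost of the hint before the script body sees it.
    return [hint(v) for v in values]
```

Once the values are in the script, duck typing works as usual in both cases; the hint only changes what arrives.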
for Robert McNeel & Associates
Thanks Giulio, that makes it perfectly clear.
Indeed, there have been several threads about NumPy here and over on the Discourse board. I also got it working on 32-bit Rhino last year, but have been hesitant to go any further since it appeared that Enthought had dropped development on SciPy and NumPy for .NET entirely. With both IronPython and .NET being open source now, perhaps there's more incentive for them to pick up the project again. My impression was that it was dropped in the first place after Microsoft abandoned/set free IronPython. Anywho, it's certainly worth asking them what's going on :)
A small update to these tests: I looked further into how these types of components (i.e. ones calling a function for each item in a large list of items) could be made even faster. It turns out that using Python generators to loop over the input list (instead of the GH loop, a for loop, or a list comprehension) is substantially faster. Like, by a lot! Note that there are several disadvantages to using generators, but for this type of pattern it does seem like a potentially good (i.e. fast) approach. See the attached file for an example.
Wow!! Thanks for sharing this Anders!
Getting a list out of the generator (for the last component) is still faster than the other approaches:
a = (foo(n) for n in Numbers)
b = list(a)
No worries Djordje, this seemed like rather significant information for anyone using GHPython (especially the points in the initial post). In the tests I've been running, it looks like the computational cost of converting a generator to a list is roughly equivalent to using a list comprehension in the first place, but I may be wrong about that.
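To check that kind of claim on one's own machine, a small `timeit` harness can compare the patterns from this thread side by side. This is a sketch: `foo` here is a hypothetical cheap per-item function, and timings will differ between CPython and IronPython, so no results are claimed. Note that the unconsumed generator case only measures the cost of creating the generator.

```python
import timeit


def foo(n):
    # Hypothetical per-item work standing in for the function in the thread.
    return n * 2.0 + 1.0


def with_for_loop(numbers):
    out = []
    for n in numbers:
        out.append(foo(n))
    return out


def with_comprehension(numbers):
    return [foo(n) for n in numbers]


def with_generator(numbers):
    # Builds the generator object only; foo is not called yet.
    return (foo(n) for n in numbers)


def with_list_of_generator(numbers):
    return list(foo(n) for n in numbers)


def benchmark(numbers, repeat=5, number=10):
    """Best-of-`repeat` seconds for `number` calls of each approach."""
    approaches = {
        "for loop": with_for_loop,
        "list comprehension": with_comprehension,
        "generator (unconsumed)": with_generator,
        "list(generator)": with_list_of_generator,
    }
    results = {}
    for name, fn in approaches.items():
        times = timeit.repeat(lambda: fn(numbers), repeat=repeat, number=number)
        results[name] = min(times)
    return results
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes; comparing "list comprehension" with "list(generator)" is the relevant pair for the observation above.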
Just enabled the GH loops in the example, quite the speed boost:
Edit: Screwed up the file. Attached the correct one.
Edit2: Also note that Giulio's V6 fix above should make the GH loop substantially faster.
If the contents of the generator are actually created, it is still fast, but a little less so.
The reason is that merely creating a generator does not call the function on any item; the calls only happen when the generator is iterated. Does that make sense?
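This can be checked with a call counter. In the sketch below, `make_counted`, `square`, and `demo` are hypothetical names; the counter stays at zero after the generator is created and only reaches one call per item once the generator is turned into a list.

```python
def make_counted(func):
    """Wrap func and count how many times it actually runs."""
    counter = {"calls": 0}  # a dict, so the inner function can update it in Python 2 too

    def wrapped(x):
        counter["calls"] += 1
        return func(x)

    return wrapped, counter


def square(n):
    return n * n


def demo(numbers):
    counted, counter = make_counted(square)
    gen = (counted(n) for n in numbers)  # creating the generator calls nothing
    calls_after_creation = counter["calls"]
    values = list(gen)  # iterating is what triggers one call per item
    return calls_after_creation, values, counter["calls"]
```

`demo(range(5))` returns `(0, [0, 1, 4, 9, 16], 5)`.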
for Robert McNeel & Associates
Thanks Giulio. When you say "actually created", what does that imply? I learned that you can return the generator as an output parameter and it will shoot out all the lines. I guess this is because it has the __iter__ method? Although this of course negates the performance improvements, as the gain is relatively small compared to the cost of outputting the list of lines (in this case). Anywho, guess I'll have to do some more Googling :)
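A generator is iterable (it has `__iter__`, and `iter(gen)` returns the generator itself), so any consumer that simply iterates over its input will pull out every item, which is presumably what happens when it is assigned to an output parameter. One caveat worth knowing: a generator can only be iterated once. The sketch uses hypothetical helpers `lines_from` (pairs of consecutive points stand in for lines) and `consume`.

```python
def lines_from(points):
    # Hypothetical: lazily pair consecutive points; tuples stand in for lines.
    return ((points[i], points[i + 1]) for i in range(len(points) - 1))


def consume(iterable):
    # A consumer that, like an output parameter, just iterates what it receives.
    return [item for item in iterable]


points = [(0, 0), (1, 0), (1, 1)]
gen = lines_from(points)
first_pass = consume(gen)   # both "lines" come out here
second_pass = consume(gen)  # the generator is already exhausted
```

After this runs, `second_pass` is an empty list, so a generator that has already been iterated (for example while debugging) will output nothing.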
Will hopefully get access to the Rhino WIP when I'm back in Copenhagen next week. Looking forward to it.