GHPython Some Bottlenecks (Bugs?)

Hi Giulio and Steve,

I've recently noticed a couple of bottlenecks with the GHPython component which can quite severely impede performance. Thought I would bring them up here so as to hopefully help others facing similar issues.

1) Letting Grasshopper perform the "implied" loop can be substantially slower than making the loop yourself inside the Python script. This is understandable. The strangest thing, however, is that it is MUCH slower if the definition has been saved than when it has not (by about a factor of 10)!

2) Setting type hints seems to be slower than inputting data with "No Type Hint". This depends a bit on which type is being input, but it seems to be fairly consistent: in the attached example, by about a factor of 3. I suppose this is understandable, but not exactly ideal.

3) Outputting lists with many items will often take longer than the actual computation performed by the script. I suppose this is more of a Grasshopper thing. My workaround has been to wrap the list in a Python list and pass this along as an item, which will be A LOT faster with large lists (this was crucial to both the Tower and ShapeOP where we pass around large amounts of constraints).

4) Calling certain RhinoCommon methods appears to be randomly much more expensive than using the C# scripting component. For instance, when iterating over a mesh's vertices and calling Mesh.Vertices.GetConnectedVertices(), the summed elapsed time of these calls seems to be made up almost entirely of only a few vertices, which randomly change every time the script is run. The number of vertices differs on different machines, but the pattern remains consistent.
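
For point 3, the wrapping workaround could look roughly like the sketch below. It is only a minimal illustration: the helper names `wrap_as_single_item` and `unwrap_single_item` are made up here, `constraints` is stand-in data, and it assumes the downstream component's input is set to "Item Access" with "No Type Hint" so that the inner Python list arrives untouched.

```python
# Upstream GHPython component (sketch).
# A plain Python list assigned to an output is unpacked into one
# Grasshopper item per element, which is the slow part for big lists.
def wrap_as_single_item(items):
    """Nest the list so that only the outer, one-element list is unpacked."""
    return [items]


# Downstream GHPython component (sketch).
# Assumed input settings: "Item Access" and "No Type Hint".
def unwrap_single_item(x):
    """Return the original Python list that was received as a single item."""
    return x


constraints = [i * 0.5 for i in range(100000)]  # stand-in for real data

a = wrap_as_single_item(constraints)   # 'a' is the default GHPython output name
received = a[0]                        # what the downstream input would see
restored = unwrap_single_item(received)
```

The outer list holds exactly one element, so only one item has to be handed over, however long the inner list is.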

I'm not sure if these bottlenecks are just examples of me being dumb; if so, I hope you can enlighten me as to the error of my ways :)

Attached some screenshots of an unsaved/saved definition which demonstrates the described issues. Also please find the gh definition attached.

Best,

Anders

Edit: Logged this on Github here.

Update: Added point 4), new screenshot and file demonstrating this behaviour.

System:

Rhino
Version 5 SR11 64-bit
(5.11.50226.17195, 02/26/2015)

Grasshopper
0.9.0073

GHPython
0.6.0.3

Laptop

Attachments:

150429_GHPython_SomeBottleNecks.gh, 10 KB
150810_RhinoCommonCallsFromGHPythonRandomlySlow.gh, 64 KB

Replies to This Discussion

Permalink Reply by Giulio Piacentino on June 4, 2015 at 4:35am

I just mean that, for generators and sequences, unless you want to use their content, the creator function is not called. It is what is known as "lazy". This is similar to the difference between range() and xrange() in Python 2.

If you pass the content of a generator to an output variable, GhPython will notice that it is iterable, it will buffer the data for GH, and it will call the generator function for each item. At that point, all items will be stored in memory. Before, they were just "potential" for items.
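
The laziness described above can be seen in plain Python, outside of GhPython. In this small sketch, `make_items` and the `calls` list are illustrative names only; `list(gen)` stands in for the buffering that happens when the generator is assigned to an output.

```python
calls = []  # records every time the generator body actually produces an item


def make_items(n):
    """Generator: nothing in this body runs until someone iterates over it."""
    for i in range(n):
        calls.append(i)
        yield i * i


gen = make_items(5)          # creates the generator, computes nothing yet
calls_before = len(calls)    # still 0: the items are only "potential"

buffered = list(gen)         # iterating forces every item into memory
calls_after = len(calls)     # now 5: the body ran once per item
```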

Does it make sense?

Giulio
--
Giulio Piacentino
for Robert McNeel & Associates
giulio@mcneel.com

Permalink Reply by Anders Holden Deleuran on June 4, 2015 at 4:40am

It does indeed, thanks again :)

Permalink Reply by Arend on June 3, 2015 at 5:24am

My experience is that setting the type to DataTree instead of list without typehint can also shave off a few seconds.

Permalink Reply by Anders Holden Deleuran on June 3, 2015 at 5:40am

Thanks Arend, do you have an example file demonstrating this behaviour? Would be interesting to figure out why.

Permalink Reply by Ángel Linares on June 4, 2015 at 4:16am

Anchoring to the post :)

Permalink Reply by James Ramsden on June 13, 2015 at 7:12pm

Thanks Anders and Giulio for this interesting discussion. I found something similar when using the 'double' type hint versus the 'object' hint when using the C# component - I got a huge speed increase when I manually did the cast myself.

I didn't know the reason at the time, and it seemed counter-intuitive that providing the component with more information would result in it working slower!

http://www.grasshopper3d.com/forum/topics/setting-data-hints-on-c-c...

Permalink Reply by Ángel Linares on June 19, 2015 at 6:04am

Perhaps it could be interesting to have two kinds of type hint: "hardcoded" and "softcoded". The first would do the usual GHish type hinting, converting between types when necessary. The second would just let you set up a normal type hint without any check or conversion, raising an error in the script component when a conversion is not possible, or simply raising a typical TypeError.

This will just make your code a little bit cleaner, but nothing else...

Permalink Reply by Ángel Linares on June 19, 2015 at 6:09am

And reading the replies I saw Anders saying something about a compiled ghPy component in the WIP. Is there any documentation available about it?

Permalink Reply by Ángel Linares on June 19, 2015 at 8:01am

I actually found it :) Performing some tests.

Permalink Reply by Anders Holden Deleuran on August 10, 2015 at 4:49am

Update: Added another bottleneck. Point number 4: Calling certain RhinoCommon methods appears to be randomly much more expensive than using the C# scripting component.

Permalink Reply by Giulio Piacentino on August 10, 2015 at 5:20am

Hi Anders; there is nothing strange with any call (not only RhinoCommon calls) being slower in Python than C# -- Python is a dynamic, intrinsically evaluated language that focuses on readability, not on execution speed. You could make point 4 slightly faster in Python if you made it evaluate less (in this case, a compiler could work out that mv = M.Vertices can be moved outside the loop). But the overall timing will likely still be larger in Python, increasing the likelihood that a single instance will exceed E, your boundary. For this reason, point 4 is not really a bottleneck in the stricter sense. This is really just comparing (Iron)Python and C#.
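
To illustrate the "evaluate less" idea, here is a minimal sketch that runs without Rhino. `FakeMesh` is a hypothetical stand-in, not a RhinoCommon type; it only counts how often its `Vertices` property is looked up, so the effect of moving `mv = M.Vertices` outside the loop becomes visible.

```python
class FakeMesh(object):
    """Hypothetical stand-in for a mesh; counts 'Vertices' property lookups."""

    def __init__(self, n):
        self._vertices = list(range(n))
        self.lookups = 0

    @property
    def Vertices(self):
        self.lookups += 1
        return self._vertices


def sum_naive(M):
    total = 0
    for i in range(len(M.Vertices)):
        total += M.Vertices[i]  # property is evaluated on every iteration
    return total


def sum_hoisted(M):
    mv = M.Vertices             # property is evaluated once, before the loop
    total = 0
    for i in range(len(mv)):
        total += mv[i]
    return total


m1 = FakeMesh(1000)
naive = sum_naive(m1)

m2 = FakeMesh(1000)
hoisted = sum_hoisted(m2)
```

Both functions return the same result; only the number of property lookups differs (`m1.lookups` versus `m2.lookups`).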

Please note also that ElapsedMilliseconds does not make much sense in profiling when not averaged. This is due to the garbage collector kicking in, system interrupts, etc.
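
One way to average, sketched in plain Python: `time_call` and `work` are illustrative names, and `timeit.default_timer` is used here in place of a .NET Stopwatch. Repeating the measurement and looking at the mean and the middle sample makes a single slow run (garbage collection, an interrupt) stand out as an outlier rather than as the result.

```python
from timeit import default_timer


def time_call(func, repeats=50):
    """Run func several times; return (mean, middle sample, worst) in ms."""
    samples = []
    for _ in range(repeats):
        start = default_timer()
        func()
        samples.append((default_timer() - start) * 1000.0)
    samples.sort()
    mean = sum(samples) / len(samples)
    middle = samples[len(samples) // 2]  # upper median for an even count
    return mean, middle, samples[-1]


def work():
    """Stand-in workload; replace with the call being profiled."""
    return sum(i * i for i in range(10000))


mean_ms, middle_ms, worst_ms = time_call(work)
```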

Permalink Reply by Anders Holden Deleuran on August 10, 2015 at 5:45am

Thanks Giulio. I'm well aware of the reasons why Python generally would be slower. The thing I found strange was that a single call (or a few calls) can take MUCH longer than the other calls in the loop, and that this expense appears to change randomly (i.e. it is not the same vertex each time, for example). Like in this case, where two calls out of 2158 seem to make up the bulk of the computation time (around 16 ms each):

But I guess then that these numbers might not be that trustworthy due to the reasons that you mention (garbage collector kicking in, system interrupts, etc). Is that correct?