Grasshopper

algorithmic modeling for Rhino

Hi all,

Here is an implementation of K-Means for grasshopper I wrote for an aspect of my thesis. What does it do? K-Means Algorthims are a automated way of grouping data. For now my implementation only works on points but other types should work in theory. The VB script is uses alot of loops and i am sure that is far from optimzed.

Check out the K-Means Page on wikipedia

And the general page on data clustering...

The code goes like this:

Private Sub RunScript(ByVal P As DataTree(Of Point3d), ByVal N As DataTree(Of Integer), ByVal D As Double, ByRef G As Object)
Dim t As Integer = 0
Dim i As Integer = 0
Dim j As Integer = 0
Dim k As Integer = 0
Dim numGroups As Integer
Dim groupNum As Integer
Dim X As Double
Dim Y As Double
Dim Z As Double
Dim minDist As Double
Dim dist As Double
'Dim cluster As Integer
'initialize random number generator
Dim r As New Random(System.DateTime.Now.Millisecond)
Dim goupings As New List(Of List(Of Integer))
Dim oldCentroids As New List(Of Point3d)
Dim newCentroids As New List(Of Point3d)
Dim pathList As New List(Of GH_Path)
Dim newCentroid As Point3d
Dim pt As Point3d
Dim isStillMoving As Boolean
Dim groupingsTree As New DataTree(Of Point3d)
Dim inputpts As New List(Of Point3d)

For t = 0 To P.BranchCount - 1

'this mess dose data matching for inputs P and N
'this if is ture is the input branches are equal
If N.BranchCount >= P.BranchCount Then
'check to see if, in the case the branches are equal, that there is only one item in each branch of N
If N.Branch(t).Count = 1 Then
numGroups = N.Branch(t)(0)
'I don't know what to do if the above statment isn't true so let's exit the program
Else
Print("Error: data missmatch between inputs P and N, Please reformat inputs")
Exit Sub
End If
'here we check if it is a simple list...
Else If N.Branch(0).Count = P.BranchCount
numGroups = N.Branch(0)(t)
'here we check if it is a simple list but is shorter or longer than the tree P
Else If N.Branch(0).Count < P.BranchCount Then
If t <= N.Branch(0).Count - 1 Then
numGroups = N.Branch(0)(t)
Else
numGroups = N.Branch(0)(N.Branch(0).Count - 1)
End If
'if all else fails exit the program!
Else
Print("Error: data missmatch between inputs P and N, Please reformat inputs")
Exit Sub
End If

inputpts = P.Branch(t)
print("--------Accessing Branch " & t & "--------")
'***************** Initialize K-Means Algorithm ****************************
print("--------Initializing " & numGroups & " groupings-------")

For i = 0 To numGroups - 1
'Dim pt As Point3d
'Dim group As New List(Of Integer)
Dim tempPath As New GH_Path

tempPath.FromString(t & ";" & i)
print("creating grouping " & i)
Next
'radomly assign points from the input list (intialze the k-mean groups)
print("--------Initializing group assinments (random)-------")
For j = 0 To inputpts.Count - 1
groupNum = RandomNumber(r, numGroups - 1)
print("Adding index value " & goupings(groupNum)(goupings(groupNum).Count - 1) & " to group " & groupNum & ".")
Next

'***************** Do K-Means Algorithm ************************************
print("--------entering main k-means loop--------")
isStillMoving = True
Do While isStillMoving
isStillMoving = False

'find new centroids
print(".....finding new centroids")
For i = 0 To numGroups - 1
oldCentroids(i) = newCentroids(i)
For j = 0 To goupings(i).Count - 1
pt = inputpts(goupings(i)(j))
X = X + pt.X
Y = Y + pt.Y
Z = Z + pt.Z
Next
X = X / (j + 1)
Y = Y / (j + 1)
Z = Z / (j + 1)
newCentroid.X = X
newCentroid.Y = Y
newCentroid.Z = Z
newCentroids(i) = newCentroid
print("new centroid for group " & i & " is " & newCentroids(i).ToString)
goupings(i).Clear
dist = newCentroid.DistanceTo(oldCentroids(i))
'print("dist = " & dist)
If dist > D Then
print("need to do more work")
isStillMoving = True
End If
Next

'create new groupings
print(".....creating new groupings")
For j = 0 To inputpts.Count - 1
pt = inputpts(j)
minDist = pt.DistanceTo(newCentroids(0))
For i = 0 To numGroups - 1
dist = pt.DistanceTo(newCentroids(i))
If minDist >= dist Then
groupNum = i
minDist = dist
'print(minDist)
End If

Next
'Print("Adding index value " & j & " to group " & groupNum & ".")
If isStillMoving = False Then
End If
Next

If isStillMoving = False Then
print("***************all groups setled exiting loop********************")
print("***************total loop count is " & k & "************************")
End If

'a hard limit on the number of loops to execute....
If k > 1000 Then
isStillMoving = False
End If

k = k + 1
Loop
goupings.Clear
newCentroids.Clear
oldCentroids.Clear
pathList.Clear
Next
G = groupingsTree
End Sub

'function taken from http://www.freevbcode.com/ShowCode.Asp?ID=4451 (and slightly modified)
Public Function RandomNumber(ByVal r As Random, ByVal MaxNumber As Integer, Optional ByVal MinNumber As Integer = 0) As Integer

'if passed incorrect arguments, swap them
'can also throw exception or return 0

If MinNumber > MaxNumber Then
Dim t As Integer = MinNumber
MinNumber = MaxNumber
MaxNumber = t
End If

Return r.Next(MinNumber, MaxNumber)

End Function

... So any thoughts?

Views: 10747

Attachments:

### Replies to This Discussion

I'm very curious to try your work. Thanks. Paolo

Interesting stuff. I can see an immediate application for this in Galapagos for finding sub-species in a population :)

--

David Rutten

david@mcneel.com

Thanks Guys.

David, you might want to consider the c-means process for Galapagos since it utilizes fuzzy logic. This means that elements can have degrees of belonging-ness to groups. My goal is to implement k-means, k-means++, c-means and qt (which is apparently computationally expensive) and use them to learn how to create a gha library.

For now I wonder if you see anything that is very obvious about how i did it which is bad since i have gotten to the point were i can hack together vb code but i suspect i am doing things in fairly ugly ways...

There's a lot of code there to review. I can make some superficial comments but I haven't really looked it through.

1) Why are you operating on DataTree(Of Point3d)? Wouldn't it be much easier to just work on lists? There may be a good reason for this, I just cuoldn't work it out while skimming the code.

2) I'd recommend declaring variables at the last possible moment, not all at the top of the file. It makes it very difficult to see what variable is used where that way. Also, if you change code, it's a lot of work figuring out what variables just became obsolete.

3) In VB.NET you can declare for loop iteration variables inside the loop, cleaning up the code: For t As Integer = 0 To X

4) If statements with conditionals should not be written like this: If (value = False) Then. There's nothing technically wrong with it, but the general rule is to write If (Not value) Then or If (value) Then.

5. Things like k = k+1 can be written shorter in VB.NET, namely k += 1. I just think that looks cooler :)

6. In VB.NET, Exit Sub is still legal (for legacy purposes) but the Return keyword is to be preferred.

7. I'm happy to see you're using sensible variable names and casing.

8. For a program like Grasshopper, one would expect to get the same results when the same setup is run at a later time. That means creating Random instances with a fixed seed value, not DateTime.Now.Millisecond. If your result depends in any way on the seed value, it should be kept constant.

On the whole pretty good work, code is quite self-documenting, properly commented and fast. Hats off.

--

David Rutten

david@mcneel.com

Oh, and a personal rule I break myself all the time: "Never make functions longer than a single screen. If I have to scroll up and down to see the whole function, it's much harder to get a feel for the flow."

--

David Rutten

david@mcneel.com

Hve you see Clean Code by Robet C Martin? http://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp...

What's amazing is that when you click to look inside for this book on amazon it gives you the whole text not just the start.

I got it working with a list(of point3d) but I couldn't extend it to work distinct groupings of points in data trees as input. The output would always be a single tree depth long (it should have been {P-BranchCount;N-GroupCount} but was always only {N-GroupCount}. I think this has to do with the fact that i take a given list and split it into multiple trees for output When i get home i'll post the version that used lists and a version cleaned up with your suggestions.

One thing i think i will attempt is the use of arrays in places. I understand they are somewhat faster than lists. Also, with the exception of GH_Path and DataTree i have no idea how to use your datatypes :-(

Forget Arrays. Stick to Lists. The performance benefits are so incredibly small that it's just not worth it. The only time you should be considering arrays is when writing an SDK that other people will use or when you need to interop with unmanaged code, or if for some reason, you need to declare so incredibly many of them that memory consumption becomes an issue.

Micro-optimization is tempting but almost always disappointing in the end. If you want to make code faster, there's two things you should always start with:

1) Profile your code. Don't ever assume a specific operation is slow. Measure it. You can use the System.Diagnostics.StopWatch class to accurately measure duration (it's better than using the OS timer that is accessed via DateTime.Now, read this blog post if you want to know more about the accuracy of timers).

2) Macro-Optimize. Instead of making operations faster by switching to Arrays or removing a variable by combining two lines of code into one, try and figure out how you can change your algorithm by doing less work. Not doing the same amount of work faster.

Lastly, the big problem with optimizing code is that it is difficult. And you should remember it's always more difficult to debug code than it is to write code. So if you write code on the very limit of your ability, you won't be able to debug it.

--

David Rutten

david@mcneel.com

This is another implementation that uses C# in Grasshopper. It works on any type of object uses an arbitrarily long list of data associated with the object to cluster it.

http://parametricmodel.com/K-meansclustering/16.html

ops, well yes so it is... oh well at least i learnt something in making it.

(Keys) my focus was on sorting geometry... don't understand the avantage key's method but i need to look at it more...

I don't mean it in a bad way. It is useful to have more than one way to do something, I am sure some people will find VB much more friendly. I just thought it would be useful to have the C# and VB method together in one thread if someone is searching.

good point.

I now think i understand what you were doing with yours. The keys alow you to specify a centroid (or any arbitrary point really) and then sort the data based on this. This strikes me as similar to the 'co-sort' capablity avaliable in david list sort componet. I would ultimately like to make it so that it could take any abitrary sort of geometry and find a center point directly from it and then create a varialbe lenght input for the co-sort just as david's componet does.

by June Lee

by June Lee

• View All