Grasshopper

algorithmic modeling for Rhino

# K-Means

Hi all,

Here is an implementation of K-Means for grasshopper I wrote for an aspect of my thesis.

What does it do? K-Means Algorthims are a automated way of grouping data. For now my implementation only works on points but other types should work in theory. The VB script is uses alot of loops and i am sure that is far from optimzed.

Check out the K-Means Page on wikipedia

And the general page on data clustering...

The code goes like this:

Private Sub RunScript(ByVal P As DataTree(Of Point3d), ByVal N As DataTree(Of Integer), ByVal D As Double, ByRef G As Object)
Dim t As Integer = 0
Dim i As Integer = 0
Dim j As Integer = 0
Dim k As Integer = 0
Dim numGroups As Integer
Dim groupNum As Integer
Dim X As Double
Dim Y As Double
Dim Z As Double
Dim minDist As Double
Dim dist As Double
'Dim cluster As Integer
'initialize random number generator
Dim r As New Random(System.DateTime.Now.Millisecond)
Dim goupings As New List(Of List(Of Integer))
Dim oldCentroids As New List(Of Point3d)
Dim newCentroids As New List(Of Point3d)
Dim pathList As New List(Of GH_Path)
Dim newCentroid As Point3d
Dim pt As Point3d
Dim isStillMoving As Boolean
Dim groupingsTree As New DataTree(Of Point3d)
Dim inputpts As New List(Of Point3d)

For t = 0 To P.BranchCount - 1

'this mess dose data matching for inputs P and N
'this if is ture is the input branches are equal
If N.BranchCount >= P.BranchCount Then
'check to see if, in the case the branches are equal, that there is only one item in each branch of N
If N.Branch(t).Count = 1 Then
numGroups = N.Branch(t)(0)
'I don't know what to do if the above statment isn't true so let's exit the program
Else
Print("Error: data missmatch between inputs P and N, Please reformat inputs")
Exit Sub
End If
'here we check if it is a simple list...
Else If N.Branch(0).Count = P.BranchCount
numGroups = N.Branch(0)(t)
'here we check if it is a simple list but is shorter or longer than the tree P
Else If N.Branch(0).Count < P.BranchCount Then
If t <= N.Branch(0).Count - 1 Then
numGroups = N.Branch(0)(t)
Else
numGroups = N.Branch(0)(N.Branch(0).Count - 1)
End If
'if all else fails exit the program!
Else
Print("Error: data missmatch between inputs P and N, Please reformat inputs")
Exit Sub
End If

inputpts = P.Branch(t)
print("--------Accessing Branch " & t & "--------")
'***************** Initialize K-Means Algorithm ****************************
print("--------Initializing " & numGroups & " groupings-------")

For i = 0 To numGroups - 1
'Dim pt As Point3d
'Dim group As New List(Of Integer)
Dim tempPath As New GH_Path

tempPath.FromString(t & ";" & i)
print("creating grouping " & i)
Next
'radomly assign points from the input list (intialze the k-mean groups)
print("--------Initializing group assinments (random)-------")
For j = 0 To inputpts.Count - 1
groupNum = RandomNumber(r, numGroups - 1)
print("Adding index value " & goupings(groupNum)(goupings(groupNum).Count - 1) & " to group " & groupNum & ".")
Next

'***************** Do K-Means Algorithm ************************************
print("--------entering main k-means loop--------")
isStillMoving = True
Do While isStillMoving
isStillMoving = False

'find new centroids
print(".....finding new centroids")
For i = 0 To numGroups - 1
oldCentroids(i) = newCentroids(i)
For j = 0 To goupings(i).Count - 1
pt = inputpts(goupings(i)(j))
X = X + pt.X
Y = Y + pt.Y
Z = Z + pt.Z
Next
X = X / (j + 1)
Y = Y / (j + 1)
Z = Z / (j + 1)
newCentroid.X = X
newCentroid.Y = Y
newCentroid.Z = Z
newCentroids(i) = newCentroid
print("new centroid for group " & i & " is " & newCentroids(i).ToString)
goupings(i).Clear
dist = newCentroid.DistanceTo(oldCentroids(i))
'print("dist = " & dist)
If dist > D Then
print("need to do more work")
isStillMoving = True
End If
Next

'create new groupings
print(".....creating new groupings")
For j = 0 To inputpts.Count - 1
pt = inputpts(j)
minDist = pt.DistanceTo(newCentroids(0))
For i = 0 To numGroups - 1
dist = pt.DistanceTo(newCentroids(i))
If minDist >= dist Then
groupNum = i
minDist = dist
'print(minDist)
End If

Next
'Print("Adding index value " & j & " to group " & groupNum & ".")
If isStillMoving = False Then
End If
Next

If isStillMoving = False Then
print("***************all groups setled exiting loop********************")
print("***************total loop count is " & k & "************************")
End If

'a hard limit on the number of loops to execute....
If k > 1000 Then
isStillMoving = False
End If

k = k + 1
Loop
goupings.Clear
newCentroids.Clear
oldCentroids.Clear
pathList.Clear
Next
G = groupingsTree
End Sub

'function taken from http://www.freevbcode.com/ShowCode.Asp?ID=4451 (and slightly modified)
Public Function RandomNumber(ByVal r As Random, ByVal MaxNumber As Integer, Optional ByVal MinNumber As Integer = 0) As Integer

'if passed incorrect arguments, swap them
'can also throw exception or return 0

If MinNumber > MaxNumber Then
Dim t As Integer = MinNumber
MinNumber = MaxNumber
MaxNumber = t
End If

Return r.Next(MinNumber, MaxNumber)

End Function

... So any thoughts?

Views: 11081

Attachments:

### Replies to This Discussion

Very interesting work. Thank you for sharing.

I was wondering if this might be something that could work with a feature I was thinking of developing in Kangaroo.

Instead of trying to tile a freeform surface with identical panels (which in general is not possible), I'm looking at whether one could work with a small set of tile shapes (say 6 triangles of different proportions), and have all the panels (starting off all being different) 'snap' their proportions to the closest one in that set.

Maybe this technique could be useful in determining which 6 triangles will best match the existing variations.

I'm not sure I see how this would work exactly but tell me if i understand the broad strokes.

1. Take triangles and group them into 7 groups
2. (somehow) reintegrate these triangles into the form
3. measure the deviation to the original form
4. use info in 3 to influence 1 and repeat

Yes, that's pretty much what I was thinking of.

So at each iteration every triangle would try and adjust itself to become a little closer to one of the 7 target triangles, they would all interact and pull on each other, and redistribute slightly across the surface.

Making the new groups (and taking the average of each group as the target for the members of that group) might only need to be something that happens every 100 iterations or so, depending on how heavy it is to calculate.

Does this sound like something where this would be applicable ?

I know that these algorithms are used for cluster analysis on non-geometric scientific/marketing data so this application (abstract generalization of geometry) should work. The problem will be in figuring out how the triangles could be comparable to each other. It seems like it could be a fit.

Yea that is how projects like the Carlos Slim Museum have been rationalised. Basically after panelising the surface every panel was unique, so they used k-means clustering to identify families of panels. In the end they had something like 21 panels, although there was a control gap between the panels so they didn't have to touch edge to edge.

The pseudo-code for grouping panels into n groups is:

1. Randomly choose n panels, these will be the representative of each group.

2. For the rest of the panels, assign them to the group where the representative most closely matches the panel (this algorithm can be tricky, but something like difference between points sometimes works)

3. For each group find the most representative panel (the average panel of that group)

4. Go back to 2 and repeat until the representative of the groups is no longer changing.

It is certainly a practical way to make families of panels after the surface has been panelised. I have not seen anyone do it while relaxation is happening, but if there is one person to make it happen, Daniel Piker is it!

Thanks Daniel for the explanation, this is very helpful.

Were you involved in that project ? or is a description of this process published somewhere ?

Hi Daniel,

I don't think it is published, but Alex Pena was involved in the project when he was working for Gehry Technology. The clustering was done in Excel and it took hours to run, but I think if you did it properly (and didn't run it on thousands of panels) it could be quite efficient.

Daniel

Daniel,

I've gotten pretty far with this using Kangaroo as you were proposing!  Problem is I don't have an adequate "spatial" description of triangle uniqueness to get effective k means clustering output (see column 2, row 2).

I have been working with various combinations of triangle sides and angle combinations.  It's a bit tricky but at least it's pretty easy to see what's going on.  Still pondering...

Hey Taz,

Very cool to see! And it looks like you are getting some reasonable results.

Yea the spatial description is a tricky one - k-means is a classic garbage-in, garbage-out algorithm. The description will depend on what you are trying to achieve, but for most manufacturing purposes you need to know the maximum distance a panel deviates from its cluster's prototype panel. Assuming you have access to the k-means algorithm you could measure this distance as:
1. Align centroid of the candidate panel with the centroid of the prototype panel.
2. Rotate the candidate until a vertex on this panel is in line with a vertex on the prototype.
3. Measure the distance between vertexes on the panel - the highest being the distance for this orientation.
4. Rotate the candidate panel into the two other possible positions - the lowest orientation distance being the overall distance the candidate panel deviates from the prototype.
5. Repeat with all the other panels.

Daniel

Hi Daniel D,

First off, thanks for posting your k-means definition.  It's been a great help!

I was actually vaguely following your pseudo-code to work out a semi-manual GH/Kangaroo routine that I was hoping would lead to convergence with a discrete number of unique panels.  I think, in addition to your advice above, I need to rethink my approach.

This is what I have implemented:

1. Group all triangles into n groups.

2. Derive representative (average) triangle for each group.

3. Repopulate surface with corresponding avg triangle

4. Formfind with Kangaroo to regenerate base mesh

5. Repeat until triangle vertices are within tolerance.

I'll post some action photos of problem areas, but basically when I don't get a good grouping the avg tri doesn't fit very well!

Humpty dumpty (taz - step 1)

n = 6

Humpty dumpty back together again (taz - step 3)

Note the darker shade of a group color means the avg tri has been flipped for better alignment.  The reason why is easier to see with the pink isosceles tris.

Hello Taz, would you mind to share the definition for this work?

by June Lee