In this fifth and final post of the series on Spatial Clustering in C++, we’ll look at some of the final outputs produced and wrap up the discussion of techniques and codes used.
In the last post I presented some code that can be used to generate an Alpha Shape from a set of points, create it as an OGR Polygon, and output these polygons to MID/MIF files. I’ve since found that one of the functions used in that code, GetNextRing, is quite inefficient when the polygon has a complex perimeter (many points on its boundary), which happens with very small values for alpha. Still it wouldn’t be too hard to speed that code up.
It’s interesting to generate some different Alpha Shapes with varying of the alpha parameter. Recall that alpha corresponds to how tightly the polygon is “shrink wrapped” around the points it encloses; it is actually the radius of circles that define the “bites” taken out around the perimeter of the shape. If alpha is too high, one gets a very approximate shape that starts to approach the convex hull of the points, while if alpha is too low, the shape starts to get very “spiky”, and eventually one starts to lose whole parts of the polygon completely. The good news is that the polygons produced are not too visually dissimilar if you stay within reasonable bounds for alpha – that is, they are not overly sensitive to its value.
Let’s have a look with the San Francisco bay area as our example. Here I’m using an eps in the DBSCAN of 0.009 with minPts = 8, which gives the following clusters:
And for the Alpha Shapes, an alpha of 2500 gives the following:
In the next picture you can see how the shapes represent the points (note also the hole in the large green cluster – the Alpha Shapes code we’re using doesn’t allow for actual holes within the shapes produced):
Here I’ve used Bing Aerial in MapInfoPro to give the shapes produced some spatial context, and I’ve used 33% transparency on the polygon layer (use Layer Properties). Incidentally you can use Style Overrides to give patterned output as well, such as:
With alpha 5000, the results are very similar, though some of the “holes” along the polygon boundaries have now been “filled in”:
The colours of the shapes are different here just because of the funky unique RGB generator function I’m using which depends on the creation index of the polygon; the clusters are the same, only alpha is different.
Even with alpha = 20000, the shapes are still good visual approximations of the points:
With 50000 the shapes are still OK but some big holes have now been filled in:
And with alpha 5000000 one starts to get nearer to the convex hull:
With smaller values of alpha, such as 1000, the shapes are similar to alpha 2500. Below that, however, the Alpha Shapes algorithm starts to produce multiple, very spiky (many points defining a complex boundary) polygons for each cluster, such as this example with alpha 500:
These anemic shapes no longer approximate the points very well.
With larger data sets just showing the clustered points gets slower, so you could potentially also make a geoset and use zoom levels, with polygons shown at higher zoom levels, then show the clustered points when zoomed in closer.
In terms of output, I’ve found Tab2tab is a useful little program to convert between Mid/Mif and Tab formats, though it is not without its problems: older versions have a loss of precision, and current versions appear to lose the colours of the polygons (regions in the parlance). One could potentially also write out the polygons or clusters using OGR functions directly in the output format desired. And if you don’t have MapInfoPro, QGIS is a good open source alternative.
If you’ve made it this far, then I hope you’ve enjoyed this five part blog post. I’ll leave you with a summary of the tools and algorithms that have been deployed in this exercise:
What it doesNotes/shortcomings Display and edit spatial dataCommercial product, but very good Data structure for storing 2 dimensional pointsAllows fast lookup of nearby points. Good C++ implementation here. Library for computational geometryGood utility functions for reading and writing points and polygons Algorithm for spatial clusteringSimple to implement; previous blog posts in this series show how to make this run really fast. Shape containing a set of pointsNot a good approximation to spatial clusters, but easy and fast to compute e.g. with OGR functions Old style C code for producing Alpha Shapes (polygons representing sets of points)Slightly dodgy code, doesn’t produce polygons with holes inside, but runs OK if called as a black box executable. Map data conversion utilityOlder versions lose precision; new versions lose polygon style/colour information Open source alternative to MapInfoProSlower and not as feature rich as MapInfoPro
|TIGER spatial data||Source for free US spatial data||In this exercise, used as a data source of points for spatial clustering||MapInfo||kdtree||OGR||DBSCAN||Convex Hull||Clarkson code for Alpha Shapes||Tab2Tab||QGIS (Quantum GIS)|