In previous posts we saw how to run the DBSCAN spatial clustering algorithm over a set of spatial data points: in our example, all block centroids in California. We also saw how to make it run really quickly. In this post we’ll investigate a way to turn those clusters into polygons. These polygons will represent our clusters in an approximate way, but will be fast to display and provide a visually compelling way to appraise the clusters we’ve created.
One way to get a polygon from a set of points is to form the Convex Hull. However, for our clusters, convex hulls, though easily generated using OGR’s ConvexHull() on a MultiPoint object, do not look so great – they cover over large swathes of what we’d intuitively think of as the “shape” describing the points. An example from a real cluster produced by DBSCAN is shown below:

In this example, the light green coloured cluster would be badly represented by a convex hull (which is shown in pink):

Another example can be found in Part 1 of this blog series, for the largest cluster in the picture.
Unfortunately there’s no unambiguous way to wrap a set of points in a non-convex polygon (that is, there’s no such thing as a “bounding polygon” – think about it…). However, we can use something called Alpha Shapes, which have a parameter (alpha) which describes “how concave” we’d like our shape to go (intuitively, how tightly to “shrink wrap” our points). Good descriptions of Alpha Shapes are provided here, here and here, and in the context of spatial data, here.
For the above example, with a suitably chosen alpha (in this case, 2500), the alpha shape looks like the following:

There’s not a lot of Alpha Shape C++ code around and it’s not something you’d probably want to code up yourself unless you’re very interested in computational geometry and have lots of spare time. One source is Ken Clarkson’s code. This is old-style C code which I initially made the mistake of trying to integrate into my own code as a library, eschewing its command line interface. Unfortunately this meant making the code (which is full of global variables) re-entrant; it turns out the code contains subtle crashes and can generate unexpected results if repeatedly run over many inputs. So keeping it as a “black box” and calling it as an exe from my own code seems like the only option – however it does work well in that way, though it is somewhat slower as it has to invoke an exe for every polygon to be created.
Another shortcoming of this particular code is that the alpha shapes produced do not include any “holes”.
For my alpha I’ve chosen 2500, and I’ve also used a multiplier in the command line to reduce the risk of numerical instability with points that are very close to each other (latitude/longitude coordinates going down to 6 or more decimal places).
Another minor issue to note is that when your data includes more than one data item “stacked” at the same spatial point (for example if the data has several points representing the distinct addresses within an apartment building or suchlike), then in the worst case if such stacked points are isolated, you can end up with a cluster that only occupies a single point and hence doesn’t get processed as a polygon.
For very large clusters (bigger than say, 40000 or so) one could potentially also reduce the time for the Alpha Shape calculation by putting a grid over the points and taking only a sample of points in each grid cell.
In the next post we’ll see the results of running the Alpha Shapes code on our clusters. For now, here is the code that calls the Clarkson executable. Note that it contains a Windows-centric function which is not cross-platform, RunExeFromCommandLine.
#include
#include
#include
#include
#include
#include // for SHELLEXECUTEINFO etc
#include "node.h"
using namespace std;
bool RunExeWithCommandLine(wstring wstrExelocation, wstring wstrCommand)
{
SHELLEXECUTEINFO lpExecInfo;
lpExecInfo.cbSize = sizeof(SHELLEXECUTEINFO);
lpExecInfo.lpFile = wstrExelocation.c_str();
lpExecInfo.fMask = SEE_MASK_DOENVSUBST | SEE_MASK_NOCLOSEPROCESS;
lpExecInfo.hwnd = NULL;
lpExecInfo.lpVerb = L”open”;
lpExecInfo.lpParameters = wstrCommand.c_str();
lpExecInfo.lpDirectory = NULL;
lpExecInfo.nShow = SW_SHOWMINNOACTIVE;
lpExecInfo.hInstApp = (HINSTANCE)SE_ERR_DDEFAIL;
BOOL fSuccess = ShellExecuteEx(&lpExecInfo);
if (lpExecInfo.hProcess != NULL)
{
::WaitForSingleObject(lpExecInfo.hProcess, INFINITE);
::CloseHandle(lpExecInfo.hProcess);
}
return fSuccess == TRUE;
}
OGRLinearRing* GetNextRing(vector<pair<int, int> >& rgptsPoly, const vector<Node*>& rgpts, int first, set<int>& new_points)
{
int rgpt_index = first;
OGRLinearRing* ring = NULL;
for (;;)
{
int from = -1, to = -1;
for (vector<pair<int, int> >::iterator it = rgptsPoly.begin(); it != rgptsPoly.end(); it++)
{
if (it->first == rgpt_index)
{
from = it->first;
to = it->second;
*it = pair<int, int>(-1,-1);
break;
}
if (it->second == rgpt_index)
{
from = it->second;
to = it->first;
*it = pair<int, int>(-1,-1);
break;
}
}
if (from == -1)
return NULL;
OGRPoint point(rgpts[from]->second, rgpts[from]->first);
if (ring == NULL)
ring = new OGRLinearRing;
ring->addPoint(&point);
rgpt_index = to;
new_points.insert(from);
new_points.insert(to);
if (to == first)
break;
}
return ring;
}
int GetFirstNonZero(vector<pair<int, int> >& rgptsPoly)
{
for (vector<pair<int, int> >::const_iterator it = rgptsPoly.begin(); it != rgptsPoly.end(); it++)
if (it->first >= 0)
return it->first;
return -1;
}
int GetRings(vector<pair<int, int> >& rgptsPoly, const vector<Node*>& rgpts, vector<OGRLinearRing*>& rings)
{
set<int> setStartingPoints;
setStartingPoints.insert(GetFirstNonZero(rgptsPoly));
for (;;)
{
if (GetFirstNonZero(rgptsPoly) < 0)
return rings.size();
set<int> new_points;
for (set<int>::const_iterator it = setStartingPoints.begin(); it != setStartingPoints.end(); it++)
{
OGRLinearRing* ring = GetNextRing(rgptsPoly, rgpts, *it, new_points);
if (ring != NULL && ring->getNumPoints() >= 3)
rings.push_back(ring);
}
if (new_points.size() == 0)
{
new_points.insert(GetFirstNonZero(rgptsPoly));
if (new_points.size() == 0 || *(new_points.begin()) < 0)
break;
}
setStartingPoints.clear();
setStartingPoints.insert(new_points.begin(), new_points.end());
}
return rings.size();
}
int GetPolygonBoundary(const vector<Node*>& rgpts, int nCluster, vector<OGRPolygon*>& rgNewPolys)
{
wstring wstrPathForTempFiles = L”C:/temp/”;
wstring wstrTempFileInput = wstrPathForTempFiles + L”sites”;
wstring wstrTempFileOutput = wstrPathForTempFiles + L”sites-alf”;
wstring wstrCommandLineArgs = L”-A -oN -i” + wstrTempFileInput + L” -aa2500 -m10000″;
ofstream os;
os.open(wstrTempFileInput.c_str(), ios::trunc);
if (!os.is_open())
return 0;
for (vector<Node*>::const_iterator it = rgpts.begin(); it != rgpts.end(); it++)
os << setprecision(9) << (*it)->second << ” ” << (*it)->first << endl;
os.close();
if (RunExeWithCommandLine(L”C:/Temp/hull/Release/addcl.exe”, wstrCommandLineArgs))
{
vector<pair<int, int> > rgptsPoly;
ifstream in;
in.open(wstrTempFileOutput.c_str());
if (!in.is_open())
return 0;
char blank[500];
in.getline(blank, 500); // ignore first line
while (!in.eof())
{
int n1, n2;
in >> n1 >> n2;
rgptsPoly.push_back(pair<int, int>(n1, n2));
}
if (rgptsPoly.size() == 0)
return 0;
vector<OGRLinearRing*> rings;
if (rgptsPoly.size() <= 2)
return 1;
if (GetRings(rgptsPoly, rgpts, rings) <= 0)
return 0;
int nTotalNumberOfPointsInRings = 0;
for (vector<OGRLinearRing*>::const_iterator it = rings.begin(); it != rings.end(); it++)
nTotalNumberOfPointsInRings += (*it)->getNumPoints();
// Note if nTotalNumberOfPointsInRings < (int)rgptsPoly.size(), then GetRing only processed
// nTotalNumberOfPointsInRings out of rgptsPoly.size() points. This is OK, as sometimes
// the shape includes a line (polygon with 2 points), which we’ll generally ignore.
for (vector<OGRLinearRing*>::const_iterator it = rings.begin(); it != rings.end(); it++)
{
OGRPolygon* poly = new OGRPolygon;
poly->addRing(*it);
if (poly->getExteriorRing()->getNumPoints() >= 3)
poly->closeRings();
rgNewPolys.push_back(poly);
}
return rgNewPolys.size();
}
return 0;
}
Note that I do not consider the code above to be particularly good; I was still trying to gain an understanding of the output of Clarkson’s alpha shape code as I was developing it, and it shows. Anyhow, it has the useful property of working fairly robustly.
The above code is called with code like the following:
vector<OGRPolygon*> rgNewPolys;
map<int, vector<Node*> > mapNodesInEachCluster;
for (vector<Node*>::const_iterator it = vecNodes.begin(); it != vecNodes.end(); it++)
if ((*it)->m_nCluster >= 0)
mapNodesInEachCluster[(*it)->m_nCluster].push_back(*it);
for (map<int, vector<Node*> >::const_iterator it = mapNodesInEachCluster.begin(); it != mapNodesInEachCluster.end(); it++)
if (GetPolygonBoundary(it->second, it->first, rgNewPolys) <= 0)
cout << “Failure with ” << it->second.size() << ” points” << endl;
WritePolygonsToMidMifFile(“C:/Temp/Polygons”, rgNewPolys);
Where the WritePolygonsToMidMifFile function is:
bool WritePolygonsToMidMifFile(string strFileName, vector<OGRPolygon*>& polys)
{
ofstream osFileMid, osFileMif;
if (!OpenMidMifAndWritePreamble(strFileName, osFileMid, osFileMif))
return false;
int i = 0;
for (vector<OGRPolygon*>::const_iterator it = polys.begin(); it != polys.end(); it++, i++)
{
osFileMid << “i” << endl;
osFileMif << “Region 1″ << endl
<< ” ” << (*it)->getExteriorRing()->getNumPoints() << endl;
for (int n = 0; n < (*it)->getExteriorRing()->getNumPoints(); n++) // first and last point are the same
{
OGRPoint pt;
(*it)->getExteriorRing()->getPoint(n, &pt);
osFileMif << pt.getX() << ” ” << pt.getY() << endl;
}
int rgb = (40 + ((i * 13) % 215)) * 255 * 255 + (40 + ((i * 4573) % 215)) * 255 + (40 + (i % 215));
osFileMif << ” Pen (1,2,0) ” << endl
<< ” Brush (” << 2 << “,” << rgb << “)” << endl;
}
osFileMif.close();
osFileMid.close();
return true;
}
We’ll see some of the outputs produced in the next and final post in this series, where we’ll also look at the effect of different values of alpha.