Spatial Clustering in C++ (post 3 of 5) – A Speedier DBSCAN

Welcome to part 3 of this five-part blog series on Spatial Clustering using C++. In the two previous posts we read in some map data (Californian census block centroids) and clustered the points using the DBSCAN algorithm. In this post we’ll investigate how to make the code run faster. This matters if you want this type of algorithm to run in reasonable time over a very large data set (e.g. all of the USA), or if you want to run it repeatedly (e.g. to investigate different parameters).

In this investigation I’ll employ some of the techniques for code optimisation that I keep in my mental kit bag, namely: using raw arrays instead of container objects; pre-calculating and caching values wherever possible; using statics to avoid memory-allocation overheads; and “pooling” frequently re-allocated objects.

But first, the golden rule: always profile. The first thing I’ll do is surround the call to RunDBScan() with some simple timing code:

time_t tmStartTime;
time(&tmStartTime);            // record the wall-clock time before the run

int nClusters = RunDBScan();

time_t lTimeNow;
time(&lTimeNow);               // and again afterwards

time_t tsDuration = lTimeNow - tmStartTime;
cout << tsDuration << " seconds to run DBScan" << endl;

The version of DBSCAN presented in the previous post runs in about 86 seconds, so that is the benchmark we start from (note that this is in the Debug build configuration).

Next I do my “poor man’s profiling”: in Visual Studio, I just hit Debug / Break while the main algorithm is running. Breaking a few times gives a picture of where most of the time is being spent. [Incidentally, I love this method of profiling: it’s so quick, requiring no external tools or specialised performance builds, yet it still often enables you to optimise away 90% of the worst bottlenecks in the code.]

In this case I find the majority of time is spent allocating memory when pushing items onto the STL vector rgpNeighbourhood2 inside ExpandCluster / GetNodesInRadius.

I tried to use vector’s “reserve” function to allocate memory in larger chunks. However, it’s not clear what the best chunk size to use would be; in fact experimenting with different values can dramatically affect the overall performance. For my test data set, I found the best value was 25, and this reduced the running time to 60 seconds.
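As a rough sketch of the idea (the original push_back site isn’t shown in the post, so pNode and the surrounding code are illustrative), the capacity is bumped in fixed-size steps whenever the vector is about to grow:

// Sketch only: grow the neighbourhood vector in fixed-size chunks rather than
// relying on the default growth policy. The chunk size of 25 is simply the
// value that happened to work best for this data set.
const size_t nChunkSize = 25;
if (rgpNeighbourhood2.size() == rgpNeighbourhood2.capacity())
    rgpNeighbourhood2.reserve(rgpNeighbourhood2.capacity() + nChunkSize);
rgpNeighbourhood2.push_back(pNode);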

A similar strategy when using STL vectors is to make the vector static and clear it on each call; replace the declaration of the vector with the following code:

static vector<Node*> rgpNeighbourhood2;
rgpNeighbourhood2.clear();

vector::clear does not change the vector’s capacity in this version of the STL (MS Visual Studio 8), so the vector is constructed only once and presumably re-allocated only a handful of times (whenever it needs to grow larger than it has ever been before). This approach is better than using reserve, as there’s no chunk size to tune, and it’s faster too: only 53 seconds.

Of course one can also fall back on good old raw C arrays instead of vectors. I haven’t posted the code for this, but it reduces the time further, to 39 seconds. Raw C arrays do hurt the readability and maintainability of the code, though, and in an industrial-strength application they would need to be wrapped up in a nice class wrapper or suchlike.
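Purely as an illustration (the real code isn’t included here, and MAX_NEIGHBOURS and nNeighbours are names I’ve made up for the sketch), the raw-array replacement looks something like this:

// Hypothetical sketch: a fixed-capacity raw array in place of the STL vector.
// MAX_NEIGHBOURS is an assumed upper bound on the neighbourhood size.
const int MAX_NEIGHBOURS = 4096;
static Node* rgpNeighbourhood2[MAX_NEIGHBOURS];
int nNeighbours = 0;
// ...then inside GetNodesInRadius, instead of push_back:
if (nNeighbours < MAX_NEIGHBOURS)
    rgpNeighbourhood2[nNeighbours++] = pNode;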

Once this is all done, the bottleneck moves inside kdtree, specifically where it allocates and de-allocates its result objects (objects of type kdres). Looking into the code in this area, I saw the allocation guarded by the #define USE_LIST_NODE_ALLOCATOR, and the kdtree documentation advocates switching this flag on. On Windows you will also need to define NO_PTHREADS and comment out the #error line that warns about the code no longer being thread-safe.
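For reference, the two flags look like this (exactly where you define them depends on how kdtree is built into your project; project-level preprocessor definitions work just as well as editing the source):

/* Enable kdtree's pooled allocator for its result list nodes, and disable
   the pthreads locking on Windows. The thread-safety #error in kdtree also
   needs commenting out, as described above. */
#define USE_LIST_NODE_ALLOCATOR
#define NO_PTHREADS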

Using the list node allocator reduces the time to an astounding 15 seconds. Finally, running the code in Release instead of Debug brings it down to 4 seconds, which is not bad considering the first naive version took nearly a minute and a half to run! Profiling now shows that most of the time is spent inside the kdtree functions that return the nearest nodes, and there’s not much we can do about that.

Summarising what we’ve found:

  • 86 seconds for the first version of the algorithm
  • 60 seconds with a reserve chunk of 25 on rgpNeighbourhood2
  • 53 seconds with static and clear instead of reserve
  • 39 seconds with raw C arrays instead of STL vectors
  • 15 seconds with the kdtree list node allocator switched on (and raw C arrays)
  • 4 seconds in the Release configuration (and raw C arrays)

And the lessons learnt (as always, it seems) from optimising code:

  • always measure timings – bottlenecks are very often not where you’d think they would be
  • it’s often sufficient to use very simple profiling
  • it’s frequently surprising how fast you can eventually make the code run

Now that we have our DBSCAN running blazingly fast, in the next post we’ll see how to create alpha shapes that will give the output more visual appeal.
