Lightweight Geocoding

At Biarri we’re always on the lookout for inexpensive, lightweight tools that we can integrate with and pass the value on to our Workbench customers. We’ve been hunting for quite a while but haven’t yet found a good free geocoder that is suitable for potentially intensive commercial use (and preferably worldwide!). Geocoders turn addresses into latitude/longitude coordinates; reverse geocoding, which is also sometimes useful, finds the closest valid address given a coordinate.

Any geocoder worth its salt needs to be able to do a good job at fuzzy string matching because addresses are frequently given incorrectly (e.g. “Road” instead of “Street”, multiple names for highways – “Hume Highway” vs “M31”, common misspellings, ambiguity at suburb boundaries, confusion over locality names, etc). It’s the quality of this matching in combination with the correctness and up-to-date-ness of the underlying map data that sets different geocoders apart.
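As a toy illustration of the kind of fuzzy matching involved, the standard library’s difflib can score a raw address against candidate canonical addresses. Real geocoders use much richer models (abbreviation expansion, gazetteer lookups, phonetic matching); the addresses below are made up for illustration:

```python
# Toy fuzzy address matcher: pick the candidate with the highest
# similarity ratio to the query string.
import difflib

def best_match(query, candidates):
    """Return (score, candidate) for the best-scoring candidate."""
    scored = [
        (difflib.SequenceMatcher(None, query.lower(), c.lower()).ratio(), c)
        for c in candidates
    ]
    return max(scored)

candidates = [
    "1 Hume Highway, Liverpool NSW",
    "1 Hume Street, Crows Nest NSW",
]
```

Here `best_match("1 Hume Hwy Liverpool", candidates)` picks the Liverpool entry despite the abbreviation, which is the behaviour a geocoder’s matcher needs at a much larger scale.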

I recently tried out TinyGeocoder. It has a simple API which appears to work quite well. It doesn’t give you any direct information about what it matched to or how well it matched, though you can use the reverse geocoding function to get an idea. I’m finding the results a little patchy and fickle (some well-known streets just inexplicably won’t geocode, for example), and not particularly fast, but overall it is a reasonable tool.
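A sketch of how a call to such a service might look from Python; the endpoint URL and the plain-text “lat,lng” response format here are assumptions for illustration, not verified TinyGeocoder documentation:

```python
# Minimal sketch of a simple HTTP geocoding lookup. The endpoint and
# response format ("lat,lng" as plain text) are assumed, not taken from
# any official API documentation.
from urllib.request import urlopen
from urllib.parse import urlencode

GEOCODE_URL = "http://tinygeocoder.com/create-api.php"  # assumed endpoint

def parse_latlng(text):
    """Parse a 'lat,lng' plain-text response into a (float, float) pair."""
    lat, lng = text.strip().split(",")
    return float(lat), float(lng)

def geocode(address):
    """Forward geocode: address string -> (lat, lng)."""
    url = GEOCODE_URL + "?" + urlencode({"q": address})
    with urlopen(url) as resp:
        return parse_latlng(resp.read().decode("utf-8"))
```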

Generating KML Thematic Maps

The Biarri Workbench now has a tool for showing the frequency of an event per postcode. The colourised map (funkier name: choropleth map) is generated as KML from postcode regions stored in a PostGIS database. This just requires a little SQL and Python:

dict_cur.execute("""
    select p.gid, p.poa_2006, ST_AsKML(the_geom) as polygon, frequency,
           (frequency - stats.min) / (stats.max - stats.min)::float as scaled_frequency
    from """ + table + """ c, postcode_regions p,
         (select max(frequency), min(frequency) from """ + table + """) as stats
    where p.poa_2006 = c.postcode""")
return my_lookup.get_template("thematic_kml.mako").render(rows=dict_cur.fetchall())

The SQL just normalises the column we want to use as our colour intensity and returns the KML for each postcode region.

The mako template:

<kml xmlns="http://www.opengis.net/kml/2.2">
 <Document>
  <name>Thematic Map</name>
  <description>generated with the Biarri workbench</description>
% for row in rows:
  <Placemark>
   <Style><PolyStyle><color>${intensity(row['scaled_frequency'])}</color></PolyStyle></Style>
   ${row['polygon'].replace(" ", '\n')}
  </Placemark>
% endfor
 </Document>
</kml>

<%!
import webcolors
import colorsys

def intensity(frequency):
    # frequencies are already scaled between 0 and 1
    rgb = tuple(int(a * 255) for a in colorsys.hsv_to_rgb(0.134, frequency, 0.90))
    return 'cc' + webcolors.rgb_to_hex(rgb).strip('#')
%>

The mako template is quite easy to read once you notice that the % sections are actually Python code. The only challenge I had was that ST_AsKML produces KML with the co-ordinates space-separated. Google Earth is fine with this, but the KML viewer we’re using, Web Map Lite, only likes them newline-separated. My colour function just changes the intensity of the colour; a gradient between blue and red would probably be more appropriate.


Biarri’s first Open Source contribution

Today Biarri open sourced its first code. We’ve used Google Code hosting, with an MIT licence, to release a small program called osm2mif that converts Open Street Map (OSM) data to MapInfo Interchange Format (MID/MIF).

From the Google Code project homepage for osm2mif:

Converts an Open Street Map file (.OSM) to mid/mif output, creating a topology for routing (that is, breaking up the OSM ways at intersection points, and also processing “no right turn” restriction relations).

Can use filtering to convert only within a given coordinate bounding box; also can convert only a subset of features (e.g. just roads, or just coastlines and lakes/rivers, or some given mixture); can also transform tag values to given strings, and use particular MIF styles (pen style, polyline vs region etc) based on the given tag values.

The program is written in C++ and consists of a single file of fewer than 1,000 lines, with no external dependencies beyond the standard library. It compiles under both Windows (Visual Studio) and Linux (GCC). It runs quickly for OSM files under a gigabyte, but does use a lot of RAM while running.
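The core topology step, breaking OSM ways up at intersection points so that every output arc runs junction-to-junction, can be sketched as follows. This is an illustrative Python version of the general technique, not osm2mif’s actual C++ code:

```python
# Split each way (a list of node ids) at any interior node that is also
# used by another way, so the output arcs are junction-to-junction.
from collections import Counter

def split_ways(ways):
    """ways: list of node-id lists. Returns a list of arcs (node-id lists)."""
    # Count how many times each node is used across all ways; a node used
    # more than once is a junction.
    use_count = Counter(n for way in ways for n in way)
    arcs = []
    for way in ways:
        arc = [way[0]]
        for n in way[1:]:
            arc.append(n)
            # Interior junction node: end the current arc and start a new
            # one from the same node.
            if use_count[n] > 1 and n != way[-1]:
                arcs.append(arc)
                arc = [n]
        arcs.append(arc)
    return arcs
```

For example, two ways sharing node 3 come out as four arcs that all start or end at the junction, which is the shape a router needs.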

Using Open Street Map data for Australia

I’ve been using Open Street Map (OSM) data for Australia for generating matrices of travel times and distances that are suitable for input to truck routing optimisation problems. OSM is a world-wide map data set that anyone can edit and is impressively comprehensive.

Some of the resources I’ve used include a site that does a regular batch job to extract country-by-country OSM data, and another that keeps an up-to-date list of map data errors. These types of errors are important to know about when producing routable data from OSM, in particular dead-end one-way streets and “almost junctions”.

I’ve been using Quantum GIS (QGIS) to look at the data after converting it to MID/MIF format. I can get it to label the lines that are one-way streets, but unfortunately there is no way to show the line direction in QGIS, which is a major pain. (Actually there is a way, but you have to get a source code branch from their repo and build it all yourself.) [Now that I think of it, a silly cheat would be to add another field to the map data, say a letter representing whether the arc points “D” for down or “L” for left (based on the difference between the lat/longs of the arc’s start and end nodes), and label with that.]
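That “cheat” might look something like the following; the coordinate representation and letter choices are made up for illustration:

```python
# Classify an arc's rough direction from its endpoint coordinates, so a
# one-letter field can be added to the map data and shown as a QGIS label.
def direction_letter(start, end):
    """start and end are (lat, lon) pairs.

    Returns 'U' (up/north), 'D' (down/south), 'L' (left/west) or
    'R' (right/east), based on the dominant coordinate difference.
    """
    dlat = end[0] - start[0]
    dlon = end[1] - start[1]
    if abs(dlat) >= abs(dlon):        # mostly north-south
        return 'U' if dlat > 0 else 'D'
    return 'R' if dlon > 0 else 'L'   # mostly east-west
```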

I also found that the performance of QGIS is quite different depending on whether the data is opened as TAB or MIDMIF format. TAB format is fairly fast (just zooming, panning etc) but MIDMIF is noticeably slower. You’d think it wouldn’t make a difference, as presumably it uses the same internal data representation, but obviously not.

I’m using some extra layers to show some of the processed data (picture below). For example, I have a layer which shows just the arcs involved in restricted turns, which I can overlay on the street network with a thicker line style. I also use dots in another layer, produced by my “island” processing. The dots represent “orphaned” nodes which are not on the main “island” of map data (the component in which every node can reach every other node). These orphaned nodes will be ignored by the travel time calculation. There are around 9,600 of them in the entire set of Australian OSM routable data I’m using (which has 620,042 nodes and 1,503,668 arcs in total). This filtered subset of OSM data includes only street segments with a “highway” tag; this excludes cycleways, pathways, hiking trails, ferry routes, ski slopes, building outlines, administrative boundaries, waterways etc.
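The “island” processing can be sketched as a flood fill over the undirected arc graph; this is an illustrative version under an assumed data shape (arcs as node-id pairs), not the actual production code:

```python
# Find nodes outside the largest connected component ("orphaned" nodes)
# via breadth-first flood fill over the arc graph, treated as undirected.
from collections import defaultdict, deque

def orphaned_nodes(arcs, nodes):
    """arcs: iterable of (u, v) node-id pairs; nodes: all node ids.

    Returns the set of nodes not in the largest connected component.
    """
    adj = defaultdict(list)
    for u, v in arcs:
        adj[u].append(v)
        adj[v].append(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Flood-fill one component from this unvisited node.
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            n = queue.popleft()
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    comp.add(m)
                    queue.append(m)
        components.append(comp)
    main = max(components, key=len)
    return set(nodes) - main
```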

Some observations on the OSM data (for Australia):
• One-way information seems quite complete.
• Only 12% of the streets have road speed information (the “max_speed” tag). This is an issue, as vehicle routing needs highways to have faster speeds, otherwise they won’t be used (and there will be lots of rat-running, for example). On longer routes (e.g. interstate) the travel time will also be grossly overestimated. A couple of things we could do here: search for all segments with “highway”, “motorway” or “freeway” in their names and assume some speed, say 80 km/h, on those segments. Or, with a bit more coding effort, where there are chains of segments in which one has a speed and others with the same street name don’t, assume that speed for all of those segments.
• About 70% of the road segments have street names. Some of the unnamed segments are bits of roundabout, service roads, motorway on/off-ramps etc. But, there are also a lot of streets which are classified (by their “highway” tag) as “secondary”, “tertiary”, “unclassified” and even “residential” which are not named. This is an issue when vehicle routing needs to produce driver directions or verbose path information.
• There are only a few hundred instances of streets with restricted-manoeuvre information (banned right-hand turns being the prime example), most of them in Sydney. In reality this number should be in the thousands or tens of thousands. This will likely be the biggest issue from a routing quality point of view: it will mean routes can include many illegal right turns.
• I found a weird character or two at the end of the file, which causes both Python and C++ file reading functions to get confused and read blanks forever. Weird, but easily avoided.
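The speed-filling heuristics suggested above could be sketched like this; the segment record layout (dicts with “name” and “max_speed” keys) and the 80 km/h default are assumptions for illustration:

```python
# Fill missing max_speed values: first reuse a known speed from a
# same-named segment, then fall back to an assumed default on anything
# whose name suggests a highway.
DEFAULT_HIGHWAY_SPEED = 80  # km/h, assumed default for highway-like roads

def fill_speeds(segments):
    """segments: list of dicts with 'name' and 'max_speed' keys.

    Mutates the list in place and returns it.
    """
    # Pass 1: remember a known speed for each street name.
    by_name = {}
    for seg in segments:
        if seg.get('max_speed') and seg.get('name'):
            by_name.setdefault(seg['name'], seg['max_speed'])
    # Pass 2: fill gaps from same-named segments, else use the default on
    # names containing highway/motorway/freeway.
    for seg in segments:
        if seg.get('max_speed'):
            continue
        name = seg.get('name') or ''
        if name in by_name:
            seg['max_speed'] = by_name[name]
        elif any(w in name.lower() for w in ('highway', 'motorway', 'freeway')):
            seg['max_speed'] = DEFAULT_HIGHWAY_SPEED
    return segments
```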

The benefits of hosted solutions

Cloud computing offers many advantages over more traditional software channels. There is an interesting analogy here between computing power and electricity (elucidated well in The Big Switch: Rewiring the World, from Edison to Google by Nicholas Carr, 2008): cloud computing offers computing power and business applications served up from a “centralised power plant”.

At Biarri we are excited about the advantages of cloud-hosted solutions instead of the more traditional desktop approach, in particular:

  • All clients run the latest and greatest version of the software
  • Accessible anywhere there is an internet connection
  • From a development point of view, problems can be solved once in a controlled computing environment. This lets developers concentrate more on features of value to customers and less on making software robust enough to run under many different environments (OS versions or variants for example).
  • Avoiding desktop licensing issues (e.g. tying software to a particular machine with awkward process for transferring license to another machine)
  • Short-circuits problems with desktop software deployment
  • Instant support – pursuing the electricity analogy, this would be like having a handyman always immediately on call to fix your problem
  • Scalability – ability to increase the computing power required on demand by “spinning up” more machine instances in the cloud
  • And best of all for our business clients – available on a low-cost monthly subscription (with the ability to easily turn on and off the service) without an arduous IT deployment

Biarri’s Workbench continues to be developed and enhanced, offering a hosted solution in which many different optimisation engines “plug in” to a central, flexible hosted architecture. In keeping with Biarri’s key values of accessibility and power, it offers a workflow-oriented interface with rich data manipulation and visualisation functionality.

Accessibility – at work every day

The posts below describe Biarri’s approach to delivering optimisation accessibly through a number of different channels. Whether it is modelling in Excel, a “Biarri Inside” engine, the Biarri Workbench or even a managed service, these are all great ways for businesses to quickly and affordably benefit from the power of optimisation and Operations Research.

On a more practical level each day we are seeing examples where Biarri’s open and accessible approach is giving customers the power to “drive” the model or application and get the benefits of new insights and approaches to complex problems which deliver savings.

Being able to change parameters and input data and re-run optimisations in a matter of seconds means that, in a very short time, customers have used Biarri’s tools to gain a deep understanding of what drives cost and yield in a supply chain or business operation, and of the impact of the different levers they can pull to effect change.

We at Biarri believe that powerful insight and analysis should be available quickly, to answer the burning questions that management have. For this reason, every day the team at Biarri look for new ways to make our commercial mathematics more accessible for our customers.

Accessible Engines: Biarri Inside

From the start Biarri has been committed to providing easy access to powerful optimisation. Our focus at inception was twofold:

  • Custom optimisation models delivered through consultancy projects, and
  • Task-specific engines made available on a Software as a Service (SaaS) basis via the “Biarri WorkBench” (WB).

The strong demand for our analytical consulting services continues to surprise us and the WB is now in Beta with a launch expected later in the year.

However, a new delivery channel has emerged that provides even greater access to our optimisation capability. Over the last few months we have been developing optimisation formulations and engines that other companies embed in their own software. This ‘Biarri Inside’ approach allows companies to quickly and affordably add optimisation capability to an existing application, leveraging Biarri’s expertise in formulating and coding optimisation engines.

We are responding to demand for the ‘Biarri Inside’ approach by ensuring that the optimisation engines we develop have a well-designed, well-documented API, so that each can be easily incorporated into existing solutions. We are now involved in a number of projects that use this delivery channel for accessible optimisation.