Database vs pure python performance

I was just looking through some of our common code and noticed some maths code in Python that I knew could be replaced by a function in our geo-database, PostGIS. However, I wondered what the performance difference would be between calling out to C code via the database and doing a simple bit of maths in Python. The code has no loops, is simple, and should be very fast.

It turns out the database approach is around 100X slower…

>>> timeit.Timer('common.lat_long_distance_db(db_wrap, 144.865906, 144.865237, -37.836183, -37.831359 + random.random())',
...              "from __main__ import common, db_wrap, random").timeit(10000)
>>> timeit.Timer('common.lat_long_distance(144.865906, 144.865237, -37.836183, -37.831359 + random.random())',
...              "from __main__ import common, random, db_wrap").timeit(10000)
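The body of `common.lat_long_distance` isn't shown above; a minimal sketch, assuming it is a haversine great-circle distance taking the two longitudes then the two latitudes, matching the argument order in the timing calls (the PostGIS side would be something like `ST_Distance` over geography values):

```python
import math

def lat_long_distance(lon1, lon2, lat1, lat2):
    """Great-circle distance in metres between two lon/lat points,
    via the haversine formula (a sketch; the real function may differ)."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))
```

On the two Melbourne points above this gives roughly half a kilometre; the whole call is a handful of trig operations, so there is very little work for the database round trip to amortise.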

I would suggest this isn’t the right way to use the DB! 🙂

If we write a function that just selects a constant value from the database, we can measure the call overhead of the database round trip with the PostGIS computation and coordinate-parsing cost removed.

>>> timeit.Timer('lat_long_distance_db_just_select(db_wrap, 144.865906, 144.865237, -37.836183, -37.831359 + random.random())',
...              "from __main__ import db_wrap, random, lat_long_distance_db_just_select").timeit(10000)
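`lat_long_distance_db_just_select` isn't shown either; a minimal sketch of the idea, using the stdlib `sqlite3` module as a stand-in for the real PostgreSQL connection (`db_wrap`'s actual API is an assumption here):

```python
import sqlite3
import timeit

# Stand-in for the real PostGIS connection held by db_wrap.
conn = sqlite3.connect(":memory:")

def lat_long_distance_db_just_select(conn):
    # A round trip that does no real work: the arguments are ignored and
    # we just select a constant, so the elapsed time is pure per-call
    # overhead (query dispatch, parsing, result marshalling).
    return conn.execute("SELECT 1").fetchone()[0]

# Same shape as the timeit measurements in the post.
overhead = timeit.timeit(lambda: lat_long_distance_db_just_select(conn),
                         number=10000)
```

Subtracting this baseline from the full `lat_long_distance_db` timing separates the fixed round-trip cost from the cost of the PostGIS computation itself.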

The overhead turns out to be a large fraction of the total call time.

This seems to suggest I should look into the difference between selecting out the data, applying the Python function, then updating the database with the results. I suspect applying the PostGIS functions to data that is already in the database, instead of looping in Python, will be fastest, but empirical data is enlightening!
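The two approaches can be sketched side by side; the `points` table and the doubled-value computation below are hypothetical stand-ins for the real schema and PostGIS function, again using `sqlite3` in place of PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, x REAL, d REAL)")
conn.executemany("INSERT INTO points (x) VALUES (?)",
                 [(i * 1.5,) for i in range(5)])

# Approach 1: row at a time. Pull each row out, compute in Python,
# push the result back. Pays the round-trip overhead once per row.
for pid, x in conn.execute("SELECT id, x FROM points").fetchall():
    conn.execute("UPDATE points SET d = ? WHERE id = ?", (x * 2, pid))

# Approach 2: set-based. One statement; the whole computation stays
# inside the database engine (with PostGIS, the expression would be a
# spatial function rather than simple arithmetic).
conn.execute("UPDATE points SET d = x * 2")
```

Given how large the per-call overhead measured above is, the set-based form should win as soon as there is more than a handful of rows.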
