Database vs pure Python performance
I was looking through some of our common code and noticed some maths in Python that I knew could be replaced by a function in our geo-database, PostGIS. I wondered, though, what the performance difference would be between calling out to C code via the database and doing a simple bit of maths in Python. The code has no loops, is simple, and should be very fast.
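For reference, here's roughly what the two implementations might look like. The real common.lat_long_distance and common.lat_long_distance_db aren't shown in the post, so the haversine formula, the argument order, and the db_wrap interface (assumed to behave like a DB-API connection) are my assumptions for illustration:

import math

def lat_long_distance(long1, long2, lat1, lat2):
    # Great-circle distance in metres via the haversine formula,
    # computed entirely in Python.
    r = 6371000  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(long2 - long1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def lat_long_distance_db(db_wrap, long1, long2, lat1, lat2):
    # The same calculation, delegated to PostGIS: ST_Distance on
    # geography values returns the distance in metres.
    cursor = db_wrap.cursor()
    cursor.execute(
        "SELECT ST_Distance("
        "ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography, "
        "ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography)",
        (long1, lat1, long2, lat2))
    return cursor.fetchone()[0]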
It turns out the database approach is roughly 70× slower (about 2.0 s vs 0.03 s for 10,000 calls)…
>>> timeit.Timer('common.lat_long_distance_db(db_wrap, 144.865906, 144.865237, -37.836183, -37.831359 + random.random())',
"from __main__ import common, db_wrap, random").timeit(10000)
2.001802921295166
>>> timeit.Timer('common.lat_long_distance(144.865906, 144.865237, -37.836183, -37.831359 + random.random())',
"from __main__ import common, random, db_wrap").timeit(10000)
0.02936482429504394
I would suggest this isn’t the right way to use the DB! 🙂
If we write a function that just selects a constant value from the database, we can measure the pure call overhead of the database round trip, with the PostGIS processing and parsing cost removed.
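A minimal sketch of what that baseline might look like, keeping the same signature but asking the database to do no real work (again assuming a DB-API style db_wrap):

def lat_long_distance_db_just_select(db_wrap, long1, long2, lat1, lat2):
    # Same signature as the real function, but the query is trivial:
    # whatever time this takes is round-trip and driver overhead.
    cursor = db_wrap.cursor()
    cursor.execute("SELECT 1")
    return cursor.fetchone()[0]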
>>> timeit.Timer('lat_long_distance_db_just_select(db_wrap, 144.865906, 144.865237, -37.836183, -37.831359 + random.random())',
"from __main__ import db_wrap, random, lat_long_distance_db_just_select").timeit(10000)
1.4460480213165283
That's a large fraction of the total: roughly 1.45 s of the 2.00 s (about 72%) is pure call overhead rather than PostGIS computation.
This suggests I should look into the difference between selecting out the data, applying the Python function, and then updating the DB with the results. I suspect applying the PostGIS functions to data that is already in the DB, instead of looping in Python, will be fastest, but empirical data is enlightening!
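As a rough sketch of that set-based approach (the places table, its location geometry column, and the distance_from_office column are invented here for illustration):

cursor = db_wrap.cursor()
cursor.execute(
    "UPDATE places "
    "SET distance_from_office = ST_Distance("
    "location::geography, "
    "ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography)",
    (144.865906, -37.836183))
db_wrap.commit()  # assuming db_wrap behaves like a DB-API connection

A single statement like this keeps the loop inside the database, so the per-call overhead measured above is paid once rather than once per row.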