Biarri’s customers are regularly faced with the problem of making decisions “here and now” which will have consequences for the future. To make life harder, there is often uncertainty about what will happen in the future. Historically, there have been two main approaches to this sort of problem:
- Optimise for “here and now”, and let the future take care of itself – the myopic policy.
- Apply some simple rule of thumb – a very common policy.
Many of these problems are resource allocation problems. What we would like to know is this: if we optimise for “here and now”, what value should we place on leftover resources? In other words, how can we take care of the future while optimising for the present?
If we can assign reasonable estimated values to leftover resources then we can make better decisions. Approximate dynamic programming is a way of doing this. These estimated values of leftover resources are known as “value function approximations”.
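The idea can be sketched in a few lines of Python. This is an illustration only: the reward and value functions below are made-up stand-ins for whatever a real application would use.

```python
# A minimal sketch: when choosing a decision, add an estimated value for
# the resources it leaves behind, rather than optimising the immediate
# reward alone. All functions and numbers here are hypothetical.

def immediate_reward(units_used):
    # Hypothetical: each unit used today earns 10, with diminishing returns.
    return 10 * units_used - units_used ** 2

def value_of_leftover(units_left):
    # Hypothetical value function approximation for leftover stock.
    return 6 * units_left

def best_decision(stock):
    # Myopic policy: maximise today's reward only.
    myopic = max(range(stock + 1), key=immediate_reward)
    # ADP-style policy: today's reward plus the estimated future value.
    adp = max(range(stock + 1),
              key=lambda u: immediate_reward(u) + value_of_leftover(stock - u))
    return myopic, adp

print(best_decision(8))  # the myopic policy uses more stock than the ADP one
```

With these made-up numbers the myopic policy uses 5 of the 8 units today, while the ADP-style policy uses only 2 and holds the rest back for tomorrow.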
A couple of examples will help. These are taken from “Approximate Dynamic Programming: Solving the Curses of Dimensionality” by Warren B. Powell.
Blood Bank
A blood bank needs to decide each day how to allocate its stocks of blood to operations. Operations are either critical (life saving) or elective. There are rules for substitution of blood types. The aim is to maximise the “value” of the operations performed, where critical operations are much more valuable. Each day new (random) blood supplies arrive. Each day, there is a new (random) list of demands for blood for operations.
A myopic policy would just assign the blood to operations in the best way possible for the day. This might mean that some elective operations are done today using a rare blood type, leading to some critical operations not being done the next day. Approximate dynamic programming places a value on blood of each type – at different levels of storage – so that enough blood is kept back to meet critical demand the next day.
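For a single blood type, the “keep some back” logic might look like the following sketch. The rewards and the value-of-held-blood function are invented for illustration; a real model would estimate them from data.

```python
# Hedged sketch of the blood-bank decision for one blood type.
# All numbers are made up: serve critical demand first, then serve
# electives only while an elective is worth more than the estimated
# value of keeping that unit for tomorrow's (uncertain) critical demand.

CRITICAL_REWARD = 100   # hypothetical value of a critical operation
ELECTIVE_REWARD = 10    # hypothetical value of an elective operation

def value_of_held_unit(units_already_held):
    # Hypothetical VFA: the first few held units will probably save a
    # critical operation tomorrow; further units are worth much less.
    return 40 if units_already_held < 3 else 2

def allocate(stock, critical_demand, elective_demand):
    used_critical = min(stock, critical_demand)
    remaining = stock - used_critical
    used_elective = 0
    held = 0
    for _ in range(remaining):
        # Use a unit for an elective only if that beats holding it back.
        if used_elective < elective_demand and ELECTIVE_REWARD > value_of_held_unit(held):
            used_elective += 1
        else:
            held += 1
    return used_critical, used_elective, held

print(allocate(6, 2, 10))
```

With 6 units in stock, 2 critical and 10 elective demands, this sketch serves both critical operations, holds 3 units back for tomorrow, and performs only 1 elective – whereas a myopic policy would have used all 4 spare units on electives.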
Long Distance Trucking
A long distance trucking company needs to assign drivers, with their prime movers, to requests to haul trailers. Random requests arrive every day and the assignments are made every day. A driver needs to return to their home base for a couple of days off at regular intervals. A myopic policy would just assign the “cheapest” driver to each job – perhaps the driver who can get to the start of the job quickest. Even a myopic policy would check that the driver could get home in time after the job.
Approximate dynamic programming would place a value on a driver from city X being in city Y with Z days left until they need to return to their home base. This would be some measure of the chance they would be able to work their way home, rather than just deadheading. When assigning drivers to jobs, we can consider not just the cost of assigning the drivers to the jobs, but an estimate of the value of each driver when they have finished the job.
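Scoring an assignment this way is a one-liner once the value function exists. The cities, costs and values below are invented for illustration.

```python
# Illustrative only: score each candidate driver for a job as the
# assignment cost minus the estimated value of the state the job leaves
# them in (city, days until they must be home). Data is made up.

def driver_value(end_city, days_until_home_leave):
    # Hypothetical VFA: finishing near home with time to spare is valuable;
    # finishing far away with the clock running down is not.
    values = {("Melbourne", 3): 50.0, ("Perth", 1): 5.0}
    return values.get((end_city, days_until_home_leave), 20.0)

def score(assignment_cost, end_city, days_left_after_job):
    # Lower is better: immediate cost, less the value of the end state.
    return assignment_cost - driver_value(end_city, days_left_after_job)

# Driver A is cheaper to position but the job strands them in Perth;
# driver B costs more now yet finishes close to their Melbourne base.
a = score(100.0, "Perth", 1)
b = score(120.0, "Melbourne", 3)
best = "B" if b < a else "A"
print(best)
```

A myopic policy would pick driver A on cost alone; once the end-state value is counted, driver B wins.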
So how does it work?
The long answer is in the book by Powell, but briefly… Approximate dynamic programming uses a combination of simulation and optimisation to iteratively calculate the value function approximations – the value of left over resources. Once this is done, operational decisions are taken using an optimisation procedure that combines the value function approximations and the short term costs.
The actual operational outcomes can even be used to update the value function approximations. This means that if the underlying statistical model is different from what was used in the simulation process, the value function approximations will gradually get better.
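The update itself is typically a simple smoothing of the old estimate towards each newly observed outcome (Powell covers step-size choices in depth). The step size and the observation values below are arbitrary.

```python
# Sketch of the smoothing update commonly used in approximate dynamic
# programming: blend the old estimate with each observed outcome.
# The step size alpha is a tuning choice; 0.1 here is arbitrary.

def update_vfa(old_estimate, observed_value, alpha=0.1):
    return (1 - alpha) * old_estimate + alpha * observed_value

v = 50.0                            # initial guess at the value
for observed in [80, 75, 82, 78]:   # hypothetical operational outcomes
    v = update_vfa(v, observed)
print(round(v, 1))
```

Even if the initial estimate (or the simulation it came from) is off, the estimate drifts towards what the operational outcomes actually show.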
Of course, there is a lot of “devil in the detail”. Typically the number of possible combinations of leftover resources is huge, so we need tricks to estimate a value for any particular combination that may occur. Again, see Powell for more details.
What data do I need?
- The same type of cost and rule data you need to run your operations anyway.
- Enough historic information on demand/supply so that a reasonable simulation model can be built.
Approximate Dynamic Programming is still in its infancy as a tool, but it promises a way to close the gap between moderate-sized, deterministic applications that can be solved to optimality and larger, random applications that have previously only been addressed by simulation.