Dynamic scheduling in the face of uncertainty

Many businesses operate in uncertain environments, at the mercy of breakdowns, human error, traffic or the weather. The traditional approach is to build slack into a schedule so it can survive some expected level of uncertainty, and then to hope the schedule can be executed in spite of whatever unforeseen events actually occur. When the schedule cannot be implemented as planned, operational decisions are made to try to meet the business goals as far as possible. Under complexity and time pressure, decision makers must often resort to short-term, locally focused decisions, without the time or tools to consider the implications downstream.

In this blog post, optimisation consultant Dave Scerri describes how recent algorithmic advances and smart tools have enabled the best of both worlds: an operational scheduling approach that responds to changing circumstances while keeping a focus on system-wide efficiency.

Dealing with operational uncertainty is one of the biggest challenges facing businesses today. Even the most well-thought-out plans and schedules rarely survive contact with the real world, and the consequence is often a loss of productivity and overall efficiency. Businesses should be endeavouring to make the best decision at any point in time, given the current circumstances and the best information available to them. Operational scheduling, when done right, allows businesses to make quality, timely decisions, maximising their productivity in the face of uncertainty.

The best laid plans of mice and men often go awry

– Robert Burns

Approximate Dynamic Programming

The Challenge

Biarri’s customers are regularly faced with the problem of making decisions “here and now” that will have consequences for the future. To make life harder, there is often uncertainty about what will happen in the future. Historically, there have been two main approaches to this sort of problem:
  • Optimise for “here and now”, and let the future take care of itself – the myopic policy.
  • Apply some simple rule of thumb – a very common policy in practice.
Many of these problems are resource allocation problems. What we would like to know is this: if we optimise for “here and now”, what value should we give to leftover resources? In other words, how can we take care of the future while optimising for the present?
If we can assign reasonable estimated values to leftover resources, then we can make better decisions. Approximate dynamic programming is a way of doing this. The estimated values of leftover resources are known as “value function approximations”.
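To make the idea concrete before the examples, here is a minimal sketch in Python. It is illustrative only – the function names and the form of the value function approximation are assumptions, not a prescribed implementation.

```python
# A minimal sketch of the core idea (all names here are illustrative).
# A myopic policy optimises today's reward alone; the ADP policy adds
# an estimate of the value of the resources left over afterwards.

def myopic_choice(actions, immediate_reward):
    # Pick the action that looks best right now.
    return max(actions, key=immediate_reward)

def adp_choice(actions, immediate_reward, leftover_after, value_approx):
    # Pick the action that balances today's reward against the
    # approximate value of the resources it leaves behind.
    return max(actions,
               key=lambda a: immediate_reward(a) + value_approx(leftover_after(a)))
```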
A couple of examples will help.  These are taken from “Approximate Dynamic Programming: Solving the Curses of Dimensionality” by Warren B. Powell.

Blood Bank

A blood bank needs to decide each day how to allocate its stocks of blood to operations.  Operations are either critical (life saving) or elective.  There are rules for substitution of blood types.  The aim is to maximise the “value” of the operations performed, where critical operations are much more valuable.  Each day new (random) blood supplies arrive.  Each day, there is a new (random) list of demands for blood for operations.
A myopic policy would just assign the blood to operations in the best way possible for the day. This might mean that some elective operations are done today using a rare blood type, leading to some critical operations not being done the next day. Approximate dynamic programming places a value on blood of each type – at different levels of storage – so that enough blood is held back to meet critical demand the next day.
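Here is a toy Python sketch of that decision. The blood types, stock levels, rewards and the marginal_value numbers are all invented for illustration – in practice the value function approximations would come out of the ADP procedure described below.

```python
# A toy model of the blood bank decision. Stock levels, rewards and the
# marginal_value numbers are all invented for illustration.

# Which recipient types each donor type can serve (simplified subset).
COMPATIBLE = {
    "O-": {"O-", "O+", "A-", "A+", "B-", "B+", "AB-", "AB+"},  # universal donor
    "A+": {"A+", "AB+"},
}

stock = {"O-": 3, "A+": 5}                       # units on hand
demands = [("A+", "critical"), ("A+", "elective"), ("AB+", "elective")]
REWARD = {"critical": 100.0, "elective": 10.0}

def marginal_value(donor_type, units_left):
    # Value function approximation: the scarcer the type and the lower
    # the stock, the more the next unit is worth tomorrow.
    scarcity = {"O-": 30.0, "A+": 5.0}[donor_type]
    return scarcity / (units_left + 1)

def allocate(demands, stock, use_value_function):
    served = []
    # Serve the most valuable (critical) demands first.
    for patient_type, urgency in sorted(demands, key=lambda d: -REWARD[d[1]]):
        best, best_score = None, 0.0
        for donor_type, units in stock.items():
            if units == 0 or patient_type not in COMPATIBLE[donor_type]:
                continue
            score = REWARD[urgency]
            if use_value_function:
                # Charge the approximate future value of the unit used.
                score -= marginal_value(donor_type, units)
            if score > best_score:
                best, best_score = donor_type, score
        if best is not None:
            stock[best] -= 1
            served.append((patient_type, urgency, best))
    return served

print(allocate(demands, dict(stock), use_value_function=False))
print(allocate(demands, dict(stock), use_value_function=True))
```

With the value function switched off, the rare O- stock is burned on demands that A+ could cover; with it switched on, the O- is held back for tomorrow’s critical operations.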

Long Distance Trucking

A long distance trucking company needs to assign drivers, with their prime movers, to requests to haul trailers. Random requests arrive every day and the assignments are made every day. A driver needs to return to their home base for a couple of days off at regular intervals. A myopic policy would just assign the “cheapest” driver to each job – perhaps the driver who can get to the start of the job quickest. Even a myopic policy would consider whether the driver would be able to get home in time after the job.
Approximate dynamic programming would place a value on a driver from city X being in city Y with Z days left until they need to return to their home base. This would be some measure of the chance they would be able to work their way home, rather than just dead-heading. When making the assignment of drivers to jobs, we can then consider not just the cost of assigning the drivers to the jobs, but also an estimate of the value of each driver once they have finished the job.
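A sketch of how this changes a single assignment decision (the cities, cost and the value table entries are made up – in practice the table would be learned by the ADP procedure):

```python
# Scoring a single driver-to-job assignment (all numbers illustrative).

# value[(city, days_left)] ~ expected future profit of a driver ending
# the job in `city` with `days_left` before they must be home.
value = {
    ("Sydney", 2): 800.0,    # plenty of freight on the way home
    ("Broome", 2): -400.0,   # likely to dead-head back
}

def assignment_score(assignment_cost, end_city, days_left):
    # A myopic score would be just -assignment_cost; ADP adds the
    # estimated value of the driver's end state.
    return -assignment_cost + value.get((end_city, days_left), 0.0)

# Two jobs with the same immediate cost look identical to a myopic
# policy, but very different once the driver's end state is valued.
print(assignment_score(500.0, "Sydney", 2))   # -500 + 800 = 300.0
print(assignment_score(500.0, "Broome", 2))   # -500 - 400 = -900.0
```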

So how does it work?

The long answer is in the book by Powell, but briefly… Approximate dynamic programming uses a combination of simulation and optimisation to iteratively calculate the value function approximations – the values of leftover resources. Once this is done, operational decisions are taken using an optimisation procedure that combines the value function approximations with the short term costs.
The actual operational outcomes can even be used to update the value function approximations. This means that if the underlying statistical model differs from the one used in the simulation process, the value function approximations will gradually get better.
Of course, there is a lot of “devil in the detail”. Typically the number of possible combinations of leftover resources is huge, so we need tricks to estimate a value for any particular combination that may occur. Again, see Powell for more details.
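The skeleton below shows the shape of that simulate-and-update loop on a toy problem with a single discrete resource. Everything in it is a simplifying assumption – a plain table of values indexed by stock level and a basic smoothing update – where real applications need the more sophisticated, aggregated approximations Powell describes.

```python
# A bare-bones version of the simulate-and-update loop for a toy
# problem with a single discrete resource. All parameters are
# illustrative; Powell develops far more sophisticated updates.
import random

MAX_R = 10                 # maximum units of the resource
GAMMA = 0.9                # discount applied to future value
ALPHA = 0.05               # smoothing step size for updates
PRICE = 10.0               # reward per unit of demand served

v = [0.0] * (MAX_R + 1)    # value function approximation: v[r]

def solve_day(r, demand):
    # Optimise "here and now": reward for units used today plus the
    # approximate (discounted) value of the units left over.
    return max(PRICE * use + GAMMA * v[r - use]
               for use in range(min(r, demand) + 1))

for _ in range(5000):
    r = random.randint(0, MAX_R)       # simulate a resource state
    demand = random.randint(0, 8)      # simulate random demand
    obj = solve_day(r, demand)
    # Smooth the sampled objective into the current approximation.
    v[r] = (1 - ALPHA) * v[r] + ALPHA * obj

# v should now be increasing in r: more leftover stock is worth more.
print([round(x, 1) for x in v])
```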

What data do I need?

  • The same type of cost and rule data you need to run your operations anyway.
  • Enough historical data on demand and supply that a reasonable simulation model can be built.
Approximate dynamic programming is still in its infancy as a tool, but it promises a way to close the gap between moderate-sized, deterministic problems that can be solved to optimality and larger, stochastic problems that have previously only been addressed by simulation.