Supply Chain planning

Making the most of storage facilities in large construction projects

Large construction projects have a huge logistics component often requiring equipment from all over the globe to be brought together through a complex supply chain. In an ideal world, our suppliers would produce items on time and they would be transported to our construction contractors just as they needed them.

Unfortunately, we don’t work in an ideal world – suppliers can fail to produce on time, problems can occur in transit, and external forces outside our control can interrupt the construction schedule. For this reason, we want a contingency plan – usually in the form of a storage facility where we can hold spare or excess items if our supply and demand schedules don’t match up.

These facilities are used heavily over a short period of time, meaning that any inefficiencies quickly add up to large, unnecessary costs. However, given the size of these projects, identifying and fixing these issues is far from easy or intuitive. It is through analytics that you can drive quantitative answers and support effective decisions in your logistics and facility planning.

Read more

Mitigation by Iteration

Mitigation by Iteration – Facilitating the Optimisation Journey

For companies to remain competitive, they require smart systems that solve their unique day to day business problems. However, when applying these systems many decision makers get lost in the complexity due to limited communication and collaboration within the implementation process.

We are at the cutting edge of the latest optimisation methodologies and web technologies. However, unlike many other optimisation, and analytics companies out there, one of our main goals is to make powerful optimisation accessible in the real world to bring value to our clients.

We specialise in the development of web applications and smart optimisation engines– delivered in less time than you would probably think. It’s not unusual for us to go from an initial workshop with a client, to understanding their problem, and then having a fully functional optimiser in a production environment within three months.

On top of this the same people are often involved through the entire SDLC (software development life cycle) i.e. from the spec/design, theory, implementation and delivery/support. This reduces the overhead many organisations incur by having different people in business analytics and developer positions. The people implementing the solution actually understand and work with you to solve your problem.

Read more

Biarri FIFO Management

Grounding the complexity to Fly in Fly Out management

Being able to close the labour and skill gap is a critical factor in sustaining growth and maximising profitability for remote operations. It is imperative that companies have the tools and skills available to unravel the complexity to FIFO management.

FIFO workforces are commonly used by large infrastructure and resource projects in remote regions including rural and offshore. These regions often don’t have adequate infrastructure or an available local workforce with the right skillset which leads to companies requiring the use of workers from interstate and sometimes overseas.

The FIFO problem is complex for many companies. It involves determining efficient ways to move people via aircraft, taking into consideration: multiple projects at various phases over multiple locations, with a dynamic workforce utilising different skillsets on a variety of roster patterns, as well as using a fleet consisting of different types and numbers of aircraft.

Often the goal with FIFO management is to determine the number, and type, of aircraft needed in order to minimise cost whilst working with the opposing objectives of ensuring: the staff arrive before the start of their shift (but not too early), depart after the end of their shift (but not too late) and keeping travel durations to acceptable lengths (to ensure low fatigue).

Balancing FIFO Complexity

Analytics to break through the complexity

With this level of complexity, a traditional excel approach lacks the rigour and power to find the most efficient and effective results. As a result we’ve developed a number of different FIFO optimisers at Biarri to help ensure the best outcome for clients.

The reality is that there are often many more factors that need to be considered which complicates the problem further. Each FIFO optimisation problem often turns out to be quite different once the detail of the problem is better understood.

High Level FIFO Requirements

Some companies just want us to help them “define their fleet, or travel requirements” so they can then go out to tender (it also helps to keep the vendors honest), others actually want an operational tool. Others may be looking to see if there is a business case for upgrading an airport (e.g. if the airport is upgraded, then larger aircraft can be used which can reduce the need for bus in bus out (BIBO) which will alter their risk profile due to road km and can dramatically alter travel durations).

Specific FIFO requirements

Our clients often want different levels of detail in the solution. Some are happy with a solution that ensures adequate movements at the week level (e.g. 15 flights of aircraft type A between locations B and C per week), others want very detailed minute by minute schedules which take into account: turnaround time, time between takeoff and landing, number of aircraft gates with solutions showing exactly who is travelling on which flight and aircraft and when.

Across Multiple Projects

Our clients have also had multiple projects which are often on the go at the same time and sometimes different priorities are given to different projects. These priorities can be used to ensure that if all the people movement demands can’t be met, then the lower priority movements are less likely to be satisfied.

Optimising the time horizon

The optimisation time horizon can also vary significantly with some clients optimising over a 24 hour period (or even less if they want to re-optimise in the middle of the day due to unpredictable events such as delays due to weather) through to clients wanting higher level schedules over several years to help them make strategic decisions and determine how their fleet needs to change over time.

Understanding the constraints

Constraints such as: the maximum distance an aircraft can travel before needing to refuel, maintenance schedules and the refuelling locations themselves often also need to be considered. We’ve dealt with both fixed and rotary wing (helicopters) aircraft. Helicopters have the additional complication of sometimes having to take more fuel (and thus weight) to travel further, which results in the reduction of passengers because of the helicopter’s limited total payload capacity.

Finding the right FIFO parameters

We have outlined some of the parameters that our FIFO optimisers have considered. It is by no means comprehensive and we can always include new parameters if a different problem requires them but it gives a good understanding into the different variables that can, and should be considered.

Some of the typical inputs include:

Airport information

  • Location
  • Hours of operation
  • Refuelling capability
  • Refuelling duration
  • Availability (i.e. you can specify a start and end date for which the airport is available)

Aircraft information

  • Serial number
  • Category (e.g. fixed wing or rotary wing)
  • Type (e.g. DASH 8-200)
  • Average speed
  • Passenger seats
  • Maximum payload
  • Fuel density
  • Fuel tank capacity
  • Re-fuelling time
  • Fuel burn rate
  • Base location
  • Availability (i.e. you can specify a start and end date for which the aircraft is available)
  • Costs

Flight Legs

  • From location
  • To Location
  • Distance
  • Aircraft types able to fly this leg

Project priorities

People Movement Demands

  • Origin
  • Destination
  • Project
  • Number of passengers
  • From Date
  • To Date
  • Arrive Before (i.e. must arrive on their first working day of the roster by this time)
  • Depart After (i.e. must depart after this time on the last working day of the roster)
  • Roster Pattern (e.g. 14:14 = 14 days on, 14 days off)
  • Day of week (i.e. which day of the week can this person travel)
  • Group (demands can be grouped together to allow the user to specify which demands can be grouped on the same aircraft)

Some of the typical outputs include:

KPIs - There are around 40 KPIs, some of them are listed below

  • Total flights
  • Total distance flown
  • Total fuel burned
  • Total number of aircraft required
  • Utilisation Percentage
  • Total unused pax capacity
  • Total passenger demand
  • Total passenger demand satisfied

Resource Summary (i.e. which aircraft are required and when)

  • Serial number
  • Date
  • Total pax
  • Total hours flown
  • Total distance flown
  • Total fuel burned
  • Total flights
  • Total legs
  • Cost

Flight leg details (i.e. which flight legs are required and when)

  • Flight ID
  • Resource ID
  • Pax capacity
  • Available pax capacity (this is < pax capacity if the fuel weight is a limiting factor)
  • Total used pax
  • Utilisation Percentage
  • Departure location
  • Departure date and time
  • Arrival location
  • Arrival date and time
  • Day of week
  • Total distance
  • Total hours flown
  • Total fuel burned
  • Fuel weight at start of leg
  • Refuel at destination (true or false)
  • Turn around time
  • Cost

Flight leg pax details (i.e. which people movement demands travel on which flight legs)

  • Flight ID
  • Origin
  • Departure date and time
  • Destination
  • Arrival date and time
  • Project
  • Pax

Project summary (i.e. which demands from which projects were satisfied)

  • Project name
  • Total demand
  • Total satisfied demand
  • Total unsatisfied demand (e.g. this will be non zero if there is not enough capacity to transport demand)
  • Total impossible to satisfy demand (e.g. this will be non zero if a flight path has not been specified in the inputs that results in some demand being impossible to satisfy regardless of aircraft resources available)

Flight summary

  • Flight ID
  • Number of instances (i.e. how many times is this flight route flown at the same time – but on different dates)
  • Resource
  • Date of first flight
  • Date of last flight
  • Day of week
  • Departure time
  • Arrival time
  • Total people
  • Total distance
  • Total hours flown
  • Total fuel burned

Unravel the complexity to FIFO Management

The work we have done for companies such as Arrow, Origin, QGC, BMA, IBS, and Santos has shown us that despite having FIFO problems, they all required different approaches in order to achieve the right result.

This has demonstrated to us that when approaching a FIFO problem, where so many different variables have to be considered depending on the client, a standard approach (Commercial off the shelf product) and excel models will generally struggle with the complexity.

Having a tool built around specific variables demonstrates the benefits to bespoke solutions for FIFO problems.

Find out more about Biarri in Mining >>
Find out more about Biarri in Oil & Gas >>
Find out more about Biarri and FIFO Scheduling >>

Or, Get in contact so we can discuss your requirements.

Driving efficiency in the Oil and Gas industry

Driving efficiency in the Oil and Gas industry

The oil and gas industry has been under severe pressure since late 2014 when oil prices dropped significantly. The highly volatile international market and oversupply of oil has meant companies have had to reduce costs in one way shape or form.

Bill Kroger, co-chair of law firm Baker Botts told Rigzone in an interview that, “Energy companies may need to lower their prices in response to a drop in demand …. For this reason, we may see CAPEX [capital expenditures] begin to decline until there is some stability with oil prices,”

This has been evident in Australia where many oil and gas companies have reduced capital spending significantly. However, with a lot of oil and gas projects shifting towards the operational phase, how can we make processes and decisions more efficient and effective?

Read more

Biarri Optimisation - 21

One tech to rule them all

Back in high school when you were battling polynomials with derivatives, you learnt one of your first all-purpose optimisation techniques. Given a function, take its derivative, set it to zero and solve for all optima. Voila! After you had coerced the equation to give up all its maxima secrets, you could then be safe in the knowledge that you had found the optimal values.

If you enjoyed computing the optimal speed to jump your calculus hurdles, then you probably went on to university to learn more about other mathematical tools. Once there you most likely met your second all-purpose optimisation technique, Linear Programming. Given a simple linear function with linear constraints, you learnt that you could use any of a number of tools such as the Simplex Method to easily and quickly solve such problem.

Ant Colonay Optimisation in Action

Figure 1 Ant Colony Optimisation in Action

From there, if you had more of an applied orientation, you probably went on to learn about the exotic optimisation creatures. For example, Genetic Algorithms, Simulated Annealing as well as a host of other nature-inspired algorithms such as Ant Colony Optimisations, Swarm Optimisation, Artificial Bee Colony Optimisation techniques, etc. In the 1990s and 2000s, for almost any type of problem, it seemed that one could find a new nature-inspired optimisation technique to solve it better than previous more mainstream tools.

However, in spite of the swarm of new optimisation tools, most problems for which these new techniques were developed could be solved with existing tools. Although probably not as efficiently. Hence the question was which method was best? The answer to this question came in 1995 when two researchers, David Wolpert and William MacReady at the renowned Sante Fe Institute, proved a number of theorems collectively referred to as the “No free lunch” theorems. These results could be seen as implying that there is no one optimisation method that is best for all problems. In addition, when we average over all problems, we expect all methods to be equal.

This result has important and deep implications. It means that if you swear by a single generic optimisation method and try and use it to solve all your problems, then don’t expect it to perform better than your three year old son who guesses random solutions to all your different stochastic multi-objective constrained optimisation techniques.

Given this it would seem strange then that I am about to suggest the idea of solving numerous problems with a single optimisation technique. Besides the fact that a number of these problems don’t look like optimisation problems at all! The reason for doing this is to see the power of an interesting optimisation technique, as well as its flexibility and generality. In addition, one needs to bear in mind that the “No free lunch” theorem is a result about every imaginable optimisation problem, which is just a tad more than the few I will touch on here.

The ST5 antenna’s final optimised design

Figure 2 The ST5 antenna’s final optimised design

The class of optimisation technique that I want to discuss here is generally referred to as Genetic Algorithms. They have been successfully used on thousands of research and industrial problems and continue to amaze researchers with their potential to solve problems far beyond their original scope. For example, one of the most famous applications of Genetic Algorithms was by NASA in 2006 to develop a new antenna design for their ST5 antenna. It can be seen that the optimal design was anything but intuitive and most likely would not have been found by a “standard” optimisation technique based on initial human guesses.

So what type of problems would be considered inappropriate for Genetic Algorithms? First of all, you couldn’t do much worse than write some code to give to your son or daughter so that in their next algebra exam they can solve x + 5 = 6 with your tailored made Genetic Algorithm. They could just let your code chug away and sit there patiently while it aces the exam for them. Although not probably the most effective use of Genetic Algorithms, it is entirely possible.

So let’s take that thought one step further. What about solving the humble quadratic equation with a genetic algorithm? It has been done (and done againand again). But the quadratic equation belongs to pure mathematics right? In addition, it’s an equation you can solve directly isn’t it? Yes and yes but interestingly enough Genetic Algorithms have started to make their way into some of the purest area of mathematics to help solve problems that are stumping pure mathematicians, This is truly one area where you would not expect the tools of applied mathematicians to come to the rescue of pure mathematicians.

We have only scratched the surface of unexpected applications of Genetic Algorithms. In fact, they have made an appearance in almost every endeavour of research from physics to economics, from architecture to chemistry and even all the way back to their nature-inspired beginnings with numerous applications in biology. So in spite of our knowledge that there is no one method to solve all problems, Genetic Algorithms present us with a versatile and powerful tool that seems to have a lot more in store for our future problem solvers.

Read the second part here.

Biarri Group

Optimising With Big Data

Can optimisation save the world? Well it depends on the problem.

Optimisation may not be able to solve all of humanity’s problems but it has recently made important strides to help avert the impending global warming disaster. As I’m sure you’re aware, the global warming crisis is rated by scientists worldwide as being one of the most important issues of the twenty-first century. For decades now scientists have been warning us that we need to wean ourselves off our dependency on fossil fuels. Although replacements to fossil fuel-based energy have existed for over half a century, none of them seem to have made a significant impact on the global warming issue yet. For example wind power can currently produce electricity at a price that is close to fossil fuels based electricity. However, in spite of the impending crisis, wind power is still to enter our energy mix in a big way. One of the reasons for this is not simply because of lack of political will but because of a lack of good optimisation tools. Although, in the last five years it seems that the winds of change are blowing because some of wind power’s biggest opponents have gone silent. So what has happened?

To be able to appreciate what has happened and why optimisation is so important, we need to understand what the problem is and to be honest, the problem is wind. Unfortunately, like visits from your in-laws, wind does not come when we want it to nor does it behave like we want it to (imagine generating wind power during a cyclone!). So, due to its unpredictability, naively using wind power requires companies to have back-up power plant on standby just in case the wind suddenly stops. Back-up power plants are much less efficient than a permanently on power plant and counter-intuitively a situation like this can lead to more CO2 pollution than if we had just used a coal fired power plant!

What has happened in the recent past, which has turned wind energy from a much debated dream into a reality, is an optimal use of wind turbines to generate energy. Due to such optimal use of wind turbines, in Colorado, USA, the energy company Xcel Energy (which mailed its customers to oppose a meagre 10% renewable energy contribution), is now planning on profitably sourcing up to 30% of its energy from wind by 2030 (Reference). This is more than the famous 20% by 2020 proposed by Germany (which is now coming under fire due to high energy costs) and owes its success to better weather prediction and an optimal use of the wind turbines.

The case of optimising wind turbine electricity production is obviously a complicated affair given the fact that wind can change its direction and strength continuously throughout the whole day. So, in its simplest form, the challenge for those operating wind turbines is, given predictions about wind strength and direction in the next moment, how should the turbine’s parameters be adjusted to maximise electricity output. These parameters typically translate into physical properties of the wind turbine such as the direction the turbine is facing, the angle of the turbine blades, the speed and torque of the turbine drivetrain, and so on. Given this significant challenge, even if we could perfectly predict the wind conditions in the next moment, we probably would have no way of knowing the perfect parameter values for each given set of wind conditions. To find a solution to this problem, scientists have turned to a set of tools used in big data called machine learning tools to predict optimal parameters (and also to help predict the weather too!).

Machine learning tools are typically used when there is a large amount of data available to teach a system to behave in a certain optimal way. In our case, it is to teach the wind turbines to adjust their parameters optimally to maximize their electricity output given the predicted wind conditions. Typical examples of machine learning techniques include Bayesian Networks (used everywhere from biology to economics), Support Vector Machines (often for recognizing handwriting) and, probably most well-known, Artificial Neural Networks (which were used famously in 2012 by Google to pursue the noble cause of identifying pictures of cats in videos).

In the current case of optimising parameter selection for optimal electricity output of wind turbines, one promising tool to find the best choice of parameters is Artificial Neural Networks. For those working in the industry, this choice may seem natural, however, implementing a good neural network is wrought with difficulties. One needs to choose a correct model for the function to be optimised, given the available data one needs to choose a good learning algorithm to train the neural networks correctly and finally the computations need to be robust to new and unseen inputs. It is precisely in these fields that we have seen significant advances (and also stellar buyouts such as DeepMind for $450 million in 2014) which have led to more efficient algorithms for parameter selection. In turn this has led to the recent adoption of more and more wind turbines to generate renewable energy due to the more electricity produced by each turbine. In 2013 in Australia alone, plans were underway to install another eleven wind farms adding 1627 MW to the grid (Reference).

However, when talking about advances in wind turbine technology what constitutes a significant advance? Let’s take an example of a country that is not even a major player on the world scene but exemplary when it comes to wind energy generation, Spain. According to the Guardian, wind energy was Spain’s top source of electricity in 2013. It generated approximately 54,000 GWh of electricity in 2013 from wind turbines. So assuming that we are able to improve our optimisation methods so that electricity production increases by a mere 0.1%, then, in Spain alone, we would generate an extra 54 GWh of electricity. Given that 1 MWh supplies up to roughly 200 households, this would mean that simply by increasing output by 0.1% in Spain alone, we can supply two cities with Sydney’s population with the extra electricity!
Facts like this are what cause insignificant sounding headlines by Siemens and GE to be such game changers. Recently Siemens claims to have had a breakthrough in their neural network algorithms that enables their turbines to produce one percent more electricity annually under moderate wind conditions. Given our previous calculation we can see how important this is. Even more impressive than this, given what a one percent difference can do, was GE’s headline that they are able to increase their wind turbine output by five percent (reference)! They haven’t stated specifically how they did it but their description of “big data” and “predictive algorithms” would lead one to guess at similar machine learning algorithms as for the Siemens success story.
As we can see all these small gains can add up and as the reader can guess, without the optimisation tools available today these feats would not have been possible. We can only hope that these improvements point to an industry that is poised to revolutionise our energy industry and, with the help of optimisation, maybe save the world.

Dynamic scheduling in the face of uncertainty

Many businesses operate in uncertain environments, at the mercy of breakdowns, human error, traffic or the weather. The traditional approach is to build slack into a schedule, to survive some expected level of uncertainty, and then to hope this schedule can be executed in spite of whatever unforeseen events actually occur. In instances where the schedule cannot be implemented as expected, operational decisions are made to try to meet the business goals as much as possible. Under complexity and time pressure, decision makers must often resort to making short term, locally focused decisions without the time or tools to consider implications downstream.

In this blog post, optimisation consultant Dave Scerri describes how recent algorithmic advances and smart tools have enabled the best of both worlds: an operational scheduling approach that responds to changing circumstances while focusing on system wide efficiency.

Dealing with operational uncertainty is one of the biggest challenges facing many businesses today. The most well thought out plans and schedules rarely survive contact with the real world, and the consequences are often a loss of productivity and overall efficiency. Businesses should be endeavouring to make the best decision at any point in time given the current circumstances and the best information available to them. Operational scheduling, when done right, can allow businesses to make quality, timely decisions, maximising their productivity in the face of uncertainty.

The best laid plans of mice and men often go awry

– Robert Burns

Data! Data! Who owns the data?

When I started in this industry there was no such thing as a CIO, though it wasn’t long in coming.  IT was usually found in the Finance department under the CFO (who had absolutely NO CLUE about IT at all).  I typed memos on a PC using some character-based word processor but had to print the memos out and put them into internal mail because we didn’t have email!  At the time, companies owned and managed their own big mainframes, and corporate information systems were generally character-based and accessed via a terminal, or, in advanced cases, through a terminal emulator running on a PC – woo hoo!.  There was no concept of “data as an asset” and it was bloody expensive to buy storage, so every effort was made to minimise the size of any database.  ERP was the hottest thing ever and was going to revolutionise business by eliminating all those pesky employees required to manually execute transactions.

So, what’s different a quarter of a century on?  Lots of things, obviously; I’ll just cherry pick a few to make my point.  The falsities around ERP marketing hype were harshly discovered by everyone who bought into it and the message shifted from “you can get rid of employees by automating your transactions” to “think what you can do with access to all of the data that’s now available!”  The computing platform and apps have shifted so far they’re on another planet; who needs a word processor now that our email programs are so sophisticated?  Do companies even have internal mail rooms any more?  “Data is an asset” and, with relatively cheap storage, companies have lots and lots and lots of it.  We have Chief Information Officers (CIOs) who are supposedly responsible for the company’s information but, after a quick search for a modern definition of the role, seem to be mainly focused on the appropriate use of technology within the organisation.  Now, Analytics is going to revolutionise business!

OK, I’ve bought into that line about analytics.  It’s really cool stuff.  However, analytics is data hungry.  In fact, it’s famished.  But, it doesn’t need just any data.  It needs good, clean, sanitary data!  “So, what is that?” you ask.

Let me illustrate with an example of what it is NOT; I’ve got to be a bit cryptic to protect the innocent, but hopefully you’ll get the idea.

Let’s take a company that has over 10 years of sales data in a 120GB database.  The level of detail tracked for each purchase if fantastic!  Almost overwhelming, in fact, as there are hundreds of tables to mine.  We know each product purchased, quantity and date; each transaction is lovingly coded with a 3-character transaction type and 3-character discount code (if any) amongst a plethora of other highly informative codes.  Codes relate back to a “master” table where you can find the highly informative description.

“Wow!”  “Great!”, you think.  Now we can use those handy codes to look for buying patterns and, maybe, predict sales.  If we are good, we might be able look for a slow down in sales and trigger a “why don’t you introduce discount x” message which, again if we’re good, will result in a boost in sales which we can track (testing our hypothesis and adding that information to the mix).

Everything seems good.  Then you realise that the end-users have the ability, and right, to add any code they want at any time, even if it means the same as a previously used code (just has a slightly different 3-character code).  Even better, they can reuse a code with a totally different description (hence meaning) from year-to-year or product-to-product!  This data is now pretty much useless because there is no consistency in the data at all. We can’t look for patterns programatically.  A human would have to trawl through the hundreds of thousands of records to recode everything consistently.

In talking with the customer, the department that is interested in doing the sales analysis has no control over the department doing data entry.  The department doing data entry is following their own agenda and doesn’t see why they should be inconvenienced by entering data according to the shims of another department.

At another customer, the spatial data that is used to catalogue assets is owned by the spatial specialists who enter the data.  The spatial database has been designed to provide information on which assets are located where (the final state of the asset).  It does not support the question: “what do I need to do to install the asset?”  For example, installation of an asset might require significant infrastructure to be created.  Let’s say a platform needs to be constructed and some holes need to be dug for supports.  Even though someone has gone out and assessed the site ahead of time (it’s going to be on concrete so we need to get through the concrete first, which is harder than just going into grass, and then need to make sure the concrete is fixed after installation) and that information is held in a separate, Excel (ugh) file with a reference back to the asset identifier, it is not supplied in the spatial database.  Why?  because it’s not relevant to the final state of the asset, only the construction of the asset. Once construction is complete they don’t care about the fact that it’s installed over concrete.  So, in planning construction someone has to manually review the Excel files against the spatial database to plan the cost, timing and means of constructing the asset.  The spatial specialists don’t see why the construction information should be entered into the database; it will take them longer to update the database and the information needs to be maintained until it becomes obsolete (after construction).  Yet, by having that data in the database the cost, timing and means of construction could be automatically generated saving not only time, but also errors generated through the manual process!

Am I the only one who finds these situations bizarre?  Irritating?  Annoying?  Unbelievable?

Remember the tree-swing cartoons? http://www.businessballs.com/treeswing.htm

How were these issues resolved in the manufacturing industry?  Someone realised that sales increased and costs reduced when they got it right!  And those companies who didn’t solve the problem eventually went out of business.  Simple, right?

So, I pose the following questions:

  • Are companies who aren’t able to solve this problem with their data going to die a painful death, as those who can solve it overtake them through the “power of analytics?”  I think, Yes!  And, they deserve to die!  (though, what will their employees do?).
  • Who in the organisation has ultimate responsibility for ensuring that data meets the organisation’s needs today and into the future?  I naively assumed that the CIO would make sure that data would be relevant and useful across the entire organisation, across the years (as much as possible).  However, this does not seem to be the case.  Departments are still fighting over who owns what data and don’t seem to be willing to help each other out for the overall good of the company.  Surely we don’t need to pay yet another executive obscene amounts of money to get this right?
  • Maybe the Universe is just trying to send me a message here by sending me all the difficult cases?
  • Maybe I’m just being overly cynical due to lack of sleep and food…

Here’s to better data!

Sexy Graphs with nvd3

For a recent project I needed to implement some nice-looking graphs.  A colleague recommended the nvd3 library, which is built on top of the awesome d3.js package, so sexiness was all-but-guaranteed.  Here’s my experience with using nvd3.

First, I found that there is no documentation for nvd3 per se (there’s plenty for d3).  Fortunately there are lots of source code examples on the github page.

The most trouble I had was with the stacked bar chart.  I’m using a Discrete Stacked Bar Chart (as described here).  It is somewhat fussy with its data format.

The basic format for the data points is:

[
{
key: “series name”,
color: “#FF00FF”,
values:
[
{ x: “A”, y: 33 },
{ x: “B”, y: 44 },
{ x: “C”, y: 14 }
]
},
{

}

]

The chart will start acting quite strangely (weird behaviours including drawing artifacts when displaying and also animating, and “NaN”-style messages in the developer toolbar) if the x’s for different series are out of order (ie. different one series to the next); also all values have to exist for all the series, which means padding out the data with zeroes if necesary.

The library also tries to avoid labels overlapping on the x-axis; it generally seems to do this by missing out values.  However this can run into trouble when the labels are lengthy, for example:

The multichart also has a few slight layout issues, where it can cuts off values on the right axis; it also gets a little finnicky when you filter off (using the little legend circles) all of the series on one side (for me, it refused to get into a state where it would only show series from one side).  Other than these minor issues it seems to work quite well.

It’s worth pointing out that there seems to be more examples in the github area for nvd3 than are displayed on the main website; for example, this rather busy multichart which features several different types of graph:

I found myself needing to simplify some of the examples, to test how particular things work and to expose a bit more explicitly the required format of the data series (as some of the examples feature programmatically-generated data); for example to play with multicharts which have only line graphs in them, I used the following code:

var testdata = [{},{},{},{},{},{},{}];

for (var i = 0; i <= 6; i++) {
testdata[i].type = “line”;
testdata[i].yAxis = (i <=2 ? 1 : 2);
}

nv.addGraph(function() {
var chart = nv.models.multiChart();

chart.xAxis.tickFormat(d3.format(‘,f’));
chart.yAxis1.tickFormat(d3.format(‘,.1f’));
chart.yAxis2.tickFormat(d3.format(‘,.1f’));

for (var i in testdata) {
testdata[i].key =’Stream’ + i;
var c = i * 8 * (0.1*i);
testdata[i].values= [ {x:9,y:12+c},{x:10,y:94+c},{x:12,y:99+c},{x:13,y:121+c},{x:15,y:133+c},{x:16,y:148+c},{x:19,y:199+c},{x:22,y:299+c}];
testdata[i].color=’#FF’ + (100+(150-i*25)).toString(16) + (100+(i*25)).toString(16);
}

d3.select(‘#chart svg’).datum(testdata).transition().duration(500).call(chart);

return chart;
});
Further customisation of the graphs is certainly possible with some tinkering and help from StackOverflow and Google Groups posts (e.g. having one of a series switched off by default – you just include disabled: true when defining the series’ object).

Because nvd3 is SVG based, saving a chart to an image is not easy like it is with, say, canvas.  Two options for accomplishing this appear to be phantomjs, or, possibly more simply, canvg.

Of course nvd3 has competitors, such as xCharts (which is also built on d3.js), polychart.js and Highcharts JS (both commercial products).

Despite the small issues I had with nvd3, most of which are easily enough worked-around once you’re aware of them, the visual results are certainly nice – seeing bars animate and shift when filtering from one set of data to the next is guaranteed to get some “ooh’s” and “aah’s” from most audiences.

scirate logo

Using Scirate to stay up to date

So for the past few months, I’ve been using a website called Scirate to catch up on the latest research in my field (quantum computing), and while Scirate is quite well-known in my community, it seems that it is not so widely known outside it. So in this post I would like to introduce you to it, and comment briefly on my workflow, hoping to inspire you to use it, and create your own!

Scirate assumes familiarity with arXiv – a very popular website among the physics/maths/computer science community for publishing “preprints” – research articles before they are published in professional journals. The arXiv doesn’t do much in the way of filtering for technical quality, so it is an “exercise for the reader” to decide which papers are worth reading. Partly, Scirate helps to solve this problem, but depending on how you use it, you will still need to exercise some discretion based on arbitrary criteria of your choosing. I’ll admit to favouring authors I know (of), or research institutions that I know, or observe do a lot of work in a particular area.

Before continuing on with this article, I recommend you sign up for Scirate. It’s possible to use Scirate without signing up, but by default it only shows papers in the “quantum physics” category.

Having signed up, the most important task is to properly configure the “Subscriptions” section. Navigate to this section of the website, and you will be faced with a very large list of of words with checkboxes. The words correspond to arxiv categories. To find out what they mean, they are listed here – arXiv categories. As an example, below is my part of my selection:

Having properly configured the list of categories you are interested in, head back to the Scirate homepage, and you will see a list of the papers for the currently-selected period. By the default this period will be for the current “day”. I put day in quotes, as on the weekend, no new papers are processed, so it remains constant over that period. Note in particular that you can select different periods to browse.

As mentioned above, one of the main features of Scirate is the ability to “Scite” things. For example, consider my current homepage view (figure below).

Note I have Scited all these papers, as have a few other people. The idea being that particularly “popular” papers appear at the top. The implication is that the more people in your field who use the system the better it becomes!

My personal workflow for Scirate is to browse it daily, at around 2-3pm, and again in the morning, at about 8-9am. I choose these times as it appears that Scirate actually updates its daily listing at around 12pm in my timezone (Melbourne, Australia). So it means that I can go carefully through all the days papers, around lunchtime, “Scite” on the ones I want to read, and then come back the following morning to see what everyone else is interested in. In this way I catch any papers that I might’ve missed in my first pass!

Perhaps you have another way of using it! Feel free to share it!