Posts

Custom Excel Charts: Scatter Plots

 

I am often required to create some very specific charts for clients that are far outside of the capability of regular Excel charting functionality. I use VBA to create these custom charts in a dynamic way to visualise outputs of mathematical modelling. These outputs are important to clients as they allow them to view the complex outputs in simple, familiar ways.

I have in the past created L-Shaped Graphs for clients that helped optimise the use of space for a dynamic Excel Dashboard that needed to be constrained to an A4 print out.

More recently, I have been required to create a custom ‘March Chart’ that plots the movements of different work crews over space and time. I received a print out of an old ‘March Chart’ that had been manually created by the client, and was asked to replicate it exactly in Excel so that it could be viewed as an output of a mathematical model.

 

THE POWER OF SCATTER PLOTS

When creating a custom chart in Excel, you must use one of the existing charts as a template. This template can then be twisted and abused into doing what you need through tricks and manipulation within VBA.

For the March Chart project, I discovered just how very useful a scatter plot can be as the template for an advanced Excel chart. The scatter plot can be thought of as a blank canvas on to which x,y data points can be plotted. Each data point can then be joined by a line to another data point, creating a straight line. A series of sequential data points with straight lines joining them called a ‘data series’ can be used to create paths and even ‘curves’. With this functionality in mind, you essentially now have a blank canvas that you can draw anything on in Excel with VBA!

This all seems far too good, so Excel had to put some barriers in our way to make it a little trickier.

  • For starters, there is the ever present issue of backward compatibility to Excel 2003. The object model used to create charts was changed from Excel 2003 to Excel 2007, so it is very important to be aware of this, and find out early if your client is using Excel 2003. Fortunately for myself, the client for the ‘March Chart’ had upgraded to at least Excel 2007 so this was not an issue for me.
  • More specific to the advanced use of scatter plots, Excel limits you to 255 data series on a chart. Each sequence of coordinates that join together to make a continuous line are defined as a data series. This means that you can only have 255 distinct continuous lines on your chart.
  • Each of these data series, can only consist of 255 data points. Unless your chart is extremely big and going to be shown on a very large screen, this is pretty easy to manage. There are two options:

o Split the data series into multiple data series. If you make the two data series the same colour, and begin the second data series at the last point of the first data series then they will appear to be a continuous line. This is a good option if you are not close to the limit of data series, however can create more issues to overcome as an additional legend entry will be created.

o For any data series that has more than 255 potential data points, skip over every second, third, etc. depending on the actual number so that the number of data points implemented to the actual plotted data series remains under 255. This is my preferred method as no additional work is required in fixing the legend and is in my opinion neater.

The following is an example output of the March Chart showing different colours for different work crew types. You can track the geographic location over time of each work crew very simply with this visualisation.

 

Excel 2D Data Lookup using INDEX with MATCH

INDEX with MATCH enables lookup of a cell in with an unknown reference in a matrix with row and column labels.

The two formulas:

 

INDEX

Syntax: INDEX(array, row_num, [column_num])

Index is very simple and the syntax is fairly self explanatory.

INDEX takes an array of data as the first parameter, a row number as the second and a column number as the third. The function then return the value in the array referenced by the row and column specified.

An example array in Excel:

To return ‘Black’ using the INDEX function use the formula:

=INDEX(B2:D4,3,2)

Note that the second parameter refers to the 3rd row in the array B2:D4 and not the 3rd row of the spreadsheet. This applies for the column reference also.

When used like this, INDEX appears to be fairly limited in its practical applications, as if you already know the row and column number of a cell in the array, you can generally reference the cell directly.

However, when used with MATCH, INDEX becomes one of the most powerful and useful functions in Excel.

 

MATCH

Syntax: MATCH(lookup_value, lookup_array, [match_type])

MATCH is again quite simple on its own however requires a little more consideration and care than INDEX.

MATCH takes a lookup value as its first parameter and a lookup array as the second. For our purposes the lookup array needs to be a single dimension. This can be row wise or column wise.

MATCH then returns the index of the lookup value if it is found in the lookup array.

However false positives are easy, as the third parameter of MATCH is the [match_type] which takes one of three values:

The default value is 1, and this can cause issues as if you forget to add this optional parameter then you could very easily get incorrect lookups occurring that are hard to notice. This is because the lookup will still return a value even if it can not find what you are looking for. MATCH will return some value that is ‘Less than’ the lookup value, which is very rarely what you actually want. It is important when using MATCH with INDEX for matrix lookups to remember to set match_type parameter equal to 0.

 

An example array in Excel:

To return the row index of the array for the value “Colour 2”:

 

=MATCH(“Colour 2”,A2:A4,0)

This will return: 2

Note that this is the row number of the lookup value “Colour 2” in the lookup array and not in the worksheet.

 

INDEX with MATCH

MATCH is used to provide logic to the row_num and column_num parameters of INDEX.

If a matrix in Excel has labels for both the rows and columns, then INDEX with MATCH can lookup a specific cell in the matrix with ease.

An example of such a matrix:

INDEX with MATCH can be used with this matrix to answer questions such as ‘What is Colour 2 of Group 3?

Repeating from above, the syntax for INDEX is:

INDEX(array, row_num, [column_num])

The input array for this example is B2:D4.

The row_num and column_num parameters need to be found using MATCH as shown above.

row_num: MATCH(“Colour 2”,A2:A4,0) will return 2

col_num: MATCH(“Group 3”,B1:D1,0) will return 3

These can then be put into INDEX to create our formula:

=INDEX(B2:D4,MATCH(“Colour 2”,A2:A4,0),MATCH(“Group 3”,B1:D1,0))

This will return pink.

 

I now use INDEX with MATCH nearly every time I do data analysis and manipulation in Excel. Clients often have data in 2D Matrices, and it is no longer a hurdle for me to use this data with ease.

 

 

 

The ‘L’ Shaped Excel Graph

 

I recently had to develop a graphical report in Excel that was required to fit perfectly on a single A4 page for printing purposes. A lot of information was supposed to fit on this page, so using the space well was very important. After the initial formatting was reviewed it was decided that one graph in particular needed to be bigger and clearer. With a bit of reshuffling the final space available for this graph was a big ‘L’ shape at the bottom of the page.

A perfect amount of space but unfortunately part of the ‘L’ would have to go to waste – unless some sort of ‘L’ shaped graph could be created…

The ‘L’ Shaped Graph:

 

The ‘L’ shaped graph utilises the extra space by moving the title and legend away from the actual data. This results in the data having space to expand and become more readable (very important with this detailed ‘Clustered-Stacked Column Graph’). It is important to note that once created, it is very difficult to edit this graph and it should most likely be treated as read only.

How it is done:

  • Finalise the data for your chart. (The legend will no longer update to reflect changes in the data once we are done)
  • Create a copy of the chart
  • Delete all axis labels and gridlines on the copy.

Now comes the trick – removing the data without affecting the legend:

  • To do this, select each data point on the copy (not data series) and right-click: Format data point. When you first click a data point it most likely selects the entire series, so click again to target just the data point. If you were to do this for the series then the legend would update to reflect the change, which we don’t want.
  • Select No-fill in the Fill tab. This will white out that data point. Do this for all the data points until just the legend and title remain on the copy.

Now to fix up the original chart:

  • Select the legend and title of the original chart and delete
  • Resize to utilise the space

The rest is just simple formatting to customise the graph to your needs. Here are some of the things I did:

  • Select the legend of the copy and right-click: Format legend
  • In Legend Options: Legend Position select ‘Top’.
  • Now resize the legend of the copied chart to whatever you like. You may notice that the chart area, although blank, is still selectable and may be an annoyance while resizing other aspects of the copy. This can not be deleted however you can resize it to be very tiny in one of the corners of that graph.
  • Remove the borders of both the original graph and the copy. Right-click: Format Chart area: Border Colour: No line.
  • Now create borders to surround the ‘L’ shaped chart on the spreadsheet itself to make it appear like the original and the copy are joined.

 

The process of whiting out each data point can be very tedious. In practice I created this graph with VBA to automate the process. There are some more tricks required if using VBA for the whole graph however I won’t go into them here. Like all VBA, heavy use of the macro recorder, trial and error, a little blind luck and a lot of patience go a long way.

SQLite

I recently needed to organise and filter a heap of data from a new client. I didn’t want to deal with the overhead of a full-blown database and decided to try sqlite3. As it turns out, it was really easy to work with since the bindings are included with Python2.6. All I needed to do was read a bit on how to interface Python with sqlite here: http://docs.python.org/library/sqlite3.html#module-sqlite3 and I was good to go! I also installed a nice database management utility called SQLite Database Browser v2.0b 1 which you can get here: http://sqlitebrowser.sourceforge.net. It makes managing the structure of the database a bit easier than working in a windows command prompt and you can write SQL on-the-fly if you’re having some problems with your Python. I find that it is pretty stable (though some of my poorly written SQL queries do send it into a tizzy and I need to kill it and reopen).

Now, I’ve decided to create a sqlite database and integrate it with the Excel front-end for one of our solvers (used when clients require desktop deployment). I anticipated that integration with an SQL database would greatly simplify and speed-up the reporting (with the added bonus of a significant reduction in the need for me to write complex vba code). Initially I banged around getting really frustrated with Excel, and DAO (even after I installed the ODBC driver available here: http://www.ch-werner.de/sqliteodbc/). Then I discovered SQLite for Excel here: http://sqliteforexcel.codeplex.com/. Whew! So far, I have found it very easy to work with and I am busy completing my reporting tool.