Melbourne Open Science Workshop 19th of July 2014

Melbourne Open Science Workshop on the 19th of July!

I am very excited to announce that on the 19th of July, the first Open Science Workshop will be held in Melbourne, Australia, at Inspire 9.

 

It’s going to be a really exciting day. It’s completely free to attend, and there are still some tickets available here – Melbourne Open Science Workshop Tickets – or on the website itself.

 

The plan is to get together 100 scientists and researchers, talk to them about all the cool open science software that is out there, and get everyone started with the collaborative tools that can help make papers and research truly reproducible.

 

On the day we will be talking specifically about GitHub and the SageMathCloud. We will spend the morning getting set up with Git, practising pushing, pulling, forking and pull requests, motivated by examples.

 

In the afternoon, we will hear from some people working on open science projects, and then take a look at how to work with IPython notebooks in Git and the SageMathCloud (SMC). Among other things, SMC provides a way to run IPython notebooks in a collaborative environment. This means you can share computations with colleagues, and keep everything nice and controlled in repositories.

 

If you are a scientist or researcher and want to learn about easy ways to make papers more reproducible, about the best ways to collaborate with your colleagues, and generally have fun and work towards making the scientific process more open and available to all, then you should come along! Hope to see you on the day! Head over to the website for more information and to reserve your spot!

 

I should also thank our sponsors: Biarri Networks, GitHub, and Inspire 9.

Data! Data! Who owns the data?

When I started in this industry there was no such thing as a CIO, though it wasn’t long in coming.  IT was usually found in the Finance department under the CFO (who had absolutely NO CLUE about IT at all).  I typed memos on a PC using some character-based word processor but had to print the memos out and put them into internal mail because we didn’t have email!  At the time, companies owned and managed their own big mainframes, and corporate information systems were generally character-based and accessed via a terminal, or, in advanced cases, through a terminal emulator running on a PC – woo hoo!  There was no concept of “data as an asset” and it was bloody expensive to buy storage, so every effort was made to minimise the size of any database.  ERP was the hottest thing ever and was going to revolutionise business by eliminating all those pesky employees required to manually execute transactions.

So, what’s different a quarter of a century on?  Lots of things, obviously; I’ll just cherry pick a few to make my point.  The falsities around ERP marketing hype were harshly discovered by everyone who bought into it and the message shifted from “you can get rid of employees by automating your transactions” to “think what you can do with access to all of the data that’s now available!”  The computing platform and apps have shifted so far they’re on another planet; who needs a word processor now that our email programs are so sophisticated?  Do companies even have internal mail rooms any more?  “Data is an asset” and, with relatively cheap storage, companies have lots and lots and lots of it.  We have Chief Information Officers (CIOs) who are supposedly responsible for the company’s information but, after a quick search for a modern definition of the role, seem to be mainly focused on the appropriate use of technology within the organisation.  Now, Analytics is going to revolutionise business!

OK, I’ve bought into that line about analytics.  It’s really cool stuff.  However, analytics is data hungry.  In fact, it’s famished.  But, it doesn’t need just any data.  It needs good, clean, sanitary data!  “So, what is that?” you ask.

Let me illustrate with an example of what it is NOT; I’ve got to be a bit cryptic to protect the innocent, but hopefully you’ll get the idea.

Let’s take a company that has over 10 years of sales data in a 120GB database.  The level of detail tracked for each purchase is fantastic!  Almost overwhelming, in fact, as there are hundreds of tables to mine.  We know each product purchased, quantity and date; each transaction is lovingly coded with a 3-character transaction type and 3-character discount code (if any) amongst a plethora of other highly informative codes.  Codes relate back to a “master” table where you can find the highly informative description.

“Wow!”  “Great!”, you think.  Now we can use those handy codes to look for buying patterns and, maybe, predict sales.  If we are good, we might be able to look for a slowdown in sales and trigger a “why don’t you introduce discount x” message which, again if we’re good, will result in a boost in sales which we can track (testing our hypothesis and adding that information to the mix).

Everything seems good.  Then you realise that the end-users have the ability, and right, to add any code they want at any time, even if it means the same as a previously used code (it just has a slightly different 3-character code).  Even better, they can reuse a code with a totally different description (hence meaning) from year-to-year or product-to-product!  This data is now pretty much useless because there is no consistency in the data at all. We can’t look for patterns programmatically.  A human would have to trawl through the hundreds of thousands of records to recode everything consistently.

In talking with the customer, it became clear that the department interested in doing the sales analysis has no control over the department doing data entry.  The department doing data entry is following their own agenda and doesn’t see why they should be inconvenienced by entering data according to the whims of another department.

At another customer, the spatial data that is used to catalogue assets is owned by the spatial specialists who enter the data.  The spatial database has been designed to provide information on which assets are located where (the final state of the asset).  It does not support the question: “what do I need to do to install the asset?”  For example, installation of an asset might require significant infrastructure to be created.  Let’s say a platform needs to be constructed and some holes need to be dug for supports.  Even though someone has gone out and assessed the site ahead of time (it’s going to be on concrete so we need to get through the concrete first, which is harder than just going into grass, and then we need to make sure the concrete is fixed after installation) and that information is held in a separate Excel (ugh) file with a reference back to the asset identifier, it is not supplied in the spatial database.  Why?  Because it’s not relevant to the final state of the asset, only the construction of the asset. Once construction is complete they don’t care about the fact that it’s installed over concrete.  So, in planning construction someone has to manually review the Excel files against the spatial database to plan the cost, timing and means of constructing the asset.  The spatial specialists don’t see why the construction information should be entered into the database; it will take them longer to update the database and the information only needs to be maintained until it becomes obsolete (after construction).  Yet, by having that data in the database the cost, timing and means of construction could be automatically generated, saving time and avoiding the errors introduced by the manual process!

Am I the only one who finds these situations bizarre?  Irritating?  Annoying?  Unbelievable?

Remember the tree-swing cartoons? http://www.businessballs.com/treeswing.htm

How were these issues resolved in the manufacturing industry?  Someone realised that sales increased and costs reduced when they got it right!  And those companies who didn’t solve the problem eventually went out of business.  Simple, right?

So, I pose the following questions:

  • Are companies who aren’t able to solve this problem with their data going to die a painful death, as those who can solve it overtake them through the “power of analytics?”  I think, Yes!  And, they deserve to die!  (though, what will their employees do?).
  • Who in the organisation has ultimate responsibility for ensuring that data meets the organisation’s needs today and into the future?  I naively assumed that the CIO would make sure that data would be relevant and useful across the entire organisation, across the years (as much as possible).  However, this does not seem to be the case.  Departments are still fighting over who owns what data and don’t seem to be willing to help each other out for the overall good of the company.  Surely we don’t need to pay yet another executive obscene amounts of money to get this right?
  • Maybe the Universe is just trying to send me a message here by sending me all the difficult cases?
  • Maybe I’m just being overly cynical due to lack of sleep and food…

Here’s to better data!


Using Scirate to stay up to date

So for the past few months, I’ve been using a website called Scirate to catch up on the latest research in my field (quantum computing), and while Scirate is quite well-known in my community, it seems that it is not so widely known outside it. So in this post I would like to introduce you to it, and comment briefly on my workflow, hoping to inspire you to use it, and create your own!

Scirate assumes familiarity with arXiv – a very popular website among the physics/maths/computer science community for publishing “preprints” – research articles before they are published in professional journals. The arXiv doesn’t do much in the way of filtering for technical quality, so it is an “exercise for the reader” to decide which papers are worth reading. Scirate partly helps to solve this problem, but depending on how you use it, you will still need to exercise some discretion based on criteria of your choosing. I’ll admit to favouring authors I know (of), or research institutions that I know of, or that I observe doing a lot of work in a particular area.

Before continuing on with this article, I recommend you sign up for Scirate. It’s possible to use Scirate without signing up, but by default it only shows papers in the “quantum physics” category.

Having signed up, the most important task is to properly configure the “Subscriptions” section. Navigate to this section of the website, and you will be faced with a very large list of words with checkboxes. The words correspond to arXiv categories; to find out what they mean, they are listed here – arXiv categories. As an example, below is part of my selection:

Having properly configured the list of categories you are interested in, head back to the Scirate homepage, and you will see a list of the papers for the currently-selected period. By default this period will be the current “day”. I put “day” in quotes because no new papers are processed on the weekend, so the listing remains constant over that period. Note in particular that you can select different periods to browse.

As mentioned above, one of the main features of Scirate is the ability to “Scite” things. For example, consider my current homepage view (figure below).

Note that I have Scited all these papers, as have a few other people. The idea is that particularly “popular” papers appear at the top. The implication is that the more people in your field who use the system, the better it becomes!

My personal workflow for Scirate is to browse it daily, at around 2-3pm, and again in the morning, at about 8-9am. I choose these times as it appears that Scirate actually updates its daily listing at around 12pm in my timezone (Melbourne, Australia). So it means that I can go carefully through all the day’s papers around lunchtime, “Scite” the ones I want to read, and then come back the following morning to see what everyone else is interested in. In this way I catch any papers that I might’ve missed in my first pass!

Perhaps you have another way of using it! Feel free to share it!

What Google Does Right

I’ve appreciated Google’s mission and its modus operandi for a long time now.  I’ve avidly read Planet Google, many Wired articles, and a number of blogs and other pieces about the company.  But what I want to address here is how Google provides a great user experience, what enables it as a company to follow the path it does, and what smaller companies can learn from it.
Keeping it Simple
It’s easy to state but hard to do right, and often requires deep design to accomplish, but it’s one thing that Google does extremely well: it keeps its interfaces simple.  This ability is, for sure, enabled and exploited by the very nature of the company: it’s a web-based company through and through.  That means it can radically simplify so many things that mass consumer computing users find so hard: a big example being navigating and using a file-based storage system.  Instead, of course, everything is stored by Google (in its “cloud”, if you like) – and this simply obviates the need for a Save button, a Load Button, and all that junk.  Nor do you need any IT infrastructure to use most of Google’s products (email, for example).  By saving your documents or writing automatically – quietly and regularly, the way it should be done – the user never needs to even think about the where or how of storing data.  Except, that is, if you need to categorise – but here again Google makes sure that its core capacity – that of Search – is always front and center and powerful enough to find whatever you need.  A user experience should aim to empower the user, not baffle or frustrate them, and in this regard Google generally succeeds admirably.
The Power of Free
By providing many of its products free to the mass consumer market, Google owes its audience nothing.  This gives it free rein to change and improve (in short, to innovate).  By having lots of small but focused products, it can bring on or cull away products quickly (generally at the lightning-quick speed of the web world, and impressively fast for such a big company).  Here again Google understands right in its DNA both freemium and the web’s “Everything, free” tendencies.  Google is also very good at knowing what to keep hidden – its apps are great at hiding functionality that is less relevant to day-to-day usage from the user (they’re often there, but you have to dig a little to find them).
But Will It Scale?
Google as a company has shown an almost terrifying ability to grow, but to grow without collapsing under its own weight.  One way that they do this is by – in the main – using low-cost, easily available hardware (which has financial benefits as well as intangible benefits), even in huge data centers; a Commodity Computing approach (they even store their servers in shipping containers).  Development has an open feel to it, and is often open sourced or provides public platforms and APIs; Google Labs and techniques that expose Beta versions show Google developing software often in public view – compare to the secrecy that often surrounds Apple development.  Product support is often scaled by using open forums where members of the public help each other.  Internally there is an almost astounding lack of management hierarchies.  In fact one could conjecture that Google is probably not really a big company as such, but a network of highly connected small companies that share common DNA and some common base technologies (often through open sourcing or open standards).  The shelter of the larger entity (not to mention its profitability) gives it the ability to take risks – if one of the smaller companies/products fails, it can be easily absorbed.
Of course, Google’s flagship Web Search also scales (it has to, to have any chance of covering billions of web pages).  But interestingly, it seems to me that the success of Google’s PageRank algorithm – the core of its Web Search function – is largely because at heart the algorithm combines both the human and the machine in a very effective way – the human aspect is the importance of a page due to linking (a result of human activity) along with a series of quantifications (the rank).  But we are now also starting to see meta-data-aware algorithms that are getting nearer to natural speech, for example the Wolfram Alpha service (a so-called “computational engine”).  You can be sure that if Google truly cracks the problem of natural language search (which may or may not be equivalent to – perhaps a very dumb – AI), it will change the world (again!).  Indeed Google’s founders have stated that Google’s aim is to develop an Artificial Intelligence by way of Search, and there have been some startling successes: Google’s language translation service is apparently very good, and has resulted from a statistical approach enabled by massive data sets.
What Type of Company is Google, Anyway?
Google might be construed as an “information” company – after all, it wants to “provide access to all the world’s data”.  But there’s an important distinction to be made here – Google only cares about data insomuch as it is useful to someone (typically, consumers or businesses) – it does not care about information per se.  (That’s not to say Google will lose your data!).  The point is that Google is above all a technology company – it is enabling and automating the use of technology, predominantly software, but increasingly also hardware, to solve all sorts of engineering problems, and lots of data just happen to be the input.  Storing millions of search results, using millions of documents in different languages in order to automate translation, and many other examples support this view.  Google as a company is a master engineering problem solver, including solving some of its own internal problems.  Many of its products are happy accidents, or the results of its famous “20% time”, where its employees are given one day a week to pursue their own interests.  Google is like a giant R&D lab that also happens to be a corporation.  It also places huge importance on hiring the right people (smart ones), because it knows that great solutions come from clever minds – in fact CEO Larry Page personally signs off on every new hire.
What We Can Learn
The humble web start-up right through to the big unwieldy enterprise can learn much from Google’s approach, particularly if your products or services are targeting the mass consumer or massive business arenas:
  • Don’t discount the ability of technology to be a game changer.  Google has disrupted many industries.
  • User experience matters.  Strip away everything but the essentials to get the job done.  What’s left should work well.
  • Make sure your core product is healthy and pursue improvement and innovation as aggressively as you can.
  • Keep your technology and processes as open as possible.  Closed solutions harm innovation and the sharing that helps problem solving.
  • Scale through technology – automate as much as possible.
  • Give some of your product(s) away for free.

Further Reading: “Google Thinks Small” , Google’s “Ten Things We Know To Be True”, “How Google Works”

Pandoc for Document writing

I recently needed to write a brief report for university, and I was about to start it in Word, when I remembered that I had recently heard about pandoc by John MacFarlane.

Pandoc is written in Haskell (a language I’m trying to learn), and so I thought it would be fun and appropriate to give it a try. Happily, the Haskell environment is available on Windows and Linux (and I’d already installed it on both), so I installed it and got started.

As we all know, my editor of choice is Vim, so I fired it up and started a new document (let’s call it “test.markdown”). Initially I started with:

Brief Comments
==

Herein I describe my this document and the ...

Then I compiled this with the following pandoc command:

pandoc test.markdown -o test.pdf

Indeed this does produce a pdf! I was happy. It was, of course, formatted as if it had gone through a LaTeX stage; and indeed this is the case. Beautiful; “what more could I want!” I thought to myself.

Well, it turns out I shortly wanted an actual title for the pdf; the above produces only headings. So I soon found myself reading (probably with a little bit too much enthusiasm) the documentation on pandoc’s extended markdown format.

I noted with much happiness that the LaTeX-style math I included in the document was rendered appropriately, and you can do equation referencing in the typical way (just write the appropriate LaTeX command).
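For instance, a labelled equation can be written directly in the markdown file as raw LaTeX (this particular snippet is my own illustration, not one taken from the pandoc manual), and referenced later with \ref:

\begin{equation} \label{eq:energy}
    E = mc^{2}
\end{equation}

As equation \ref{eq:energy} shows, ...

When the output is produced via LaTeX (as with the PDF above), the raw LaTeX is passed through untouched and the reference resolves as usual.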

I was also very happy to learn that the correct way to start a document with a title is as follows:

% Title
% Author
% 

Brief Comments
==

Herein I describe my this document and the ...

(if you leave the line below Author blank, it will generate the date; otherwise you can simply write the appropriate date there). And I was also really overjoyed to find the section on bibtex support and bibliography style handling via CSL. In particular, the pandoc documentation will direct you to a github repository that contains an amazing number of bibliography styles.

From here I did as any good vim user would; I tried to find a vim plugin. I found one, called vim-pandoc. However, after a little bit of use on Windows, I noted that it was really slow in some aspects (mainly because it uses inline python in the plugin, instead of vim code). I noticed some other problems as well, so with the project being on github I decided to fork it: silky/vim-pandoc. My version will probably be in a little bit of flux over the next few days, but will hopefully stabilise shortly thereafter.

All-in-all using pandoc for the small report I had to write was successful, and I do hope to try it with future documents containing maths. In particular the markdown format also matches nicely with my other vim-based notes (so I could convert them if I decided it was appropriate) and it’s just plain easy to read and nice to use.

Another reason I was attracted to pandoc was because of the ability to output to slide formats (including beamer), so I’m really excited to give that a go.

And of course, this blog post itself was written in pandoc (output to .html)! So, maybe you will consider it the next time you need to write something!

Subscription payments

Selling things on the internet is quite easy and there are plenty of ways to process credit cards and accept online payments. However, subscriptions and recurring payments aren’t easy. They aren’t easy at all. Subscription changes, card failures, multiple subscriptions for a single user (different workbench products in our case) and multiple-currency billing make it hard. Dealing with banks’ 80s-style systems makes it even harder.

When we were first working out how to cope with online payments we found lots of good-looking options, all of which were only available to American companies or via PayPal (Recurly, CheddarGetter, Spreedly). Eventually we found RBS Worldpay. Worldpay turned out to be too inflexible and far too much effort to set up. The time required was also insane: from the start of our application to being ready to take payments took 7 months and significant amounts of cash. We gave up on Worldpay when we were trying (hard) to integrate it. Their system for recurring payments is so ill-documented and inflexible as to be useless. Users appear to require a separate login and password for Worldpay. This is unacceptable.

Currently we’re looking into Saasy, from FastSpring. They respond fantastically quickly to support requests and got our testing site up and approved us in a day. So far it looks great, flexible and easy, but I’ll post another blog when we’ve got it all live. I’ll also put some code for the server-side part in Python using CherryPy up on github, when it’s a bit more polished.
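In the meantime, here is a rough sketch of the kind of CherryPy handler involved in receiving order notifications. The URL path and field names (such as “OrderID”) are placeholders of my own, not the provider’s actual postback format, so treat it as an illustration only.

import cherrypy

class OrderNotification(object):
    @cherrypy.expose
    def index(self, **params):
        # The payment provider posts order details as form fields.
        # "OrderID" is a placeholder name, not necessarily the provider's real field.
        order_id = params.get("OrderID", "unknown")
        # ... look up the user, activate or extend their subscription, record the payment ...
        cherrypy.log("Received order notification: %s" % order_id)
        return "OK"

if __name__ == "__main__":
    cherrypy.quickstart(OrderNotification(), "/notify")

The real version will also need to verify that each notification genuinely came from the payment provider before acting on it.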

Loki

Debugging Internet Traffic

Below are some tools and thoughts that may be useful to people who are having difficulties trying to debug issues related to internet traffic.
There are a number of other tools (e.g. Firebug, Wireshark/Ethereal, etc.) that are not mentioned below. However, I thought a quick one-pager on some of the tools I have used and found helpful might be useful to someone else.

Tamper Data (Firefox plug-in)

https://addons.mozilla.org/en-US/firefox/addon/tamper-data/
This is a Firefox plugin which allows users to view the exact data they’re sending out through their browser. It’s useful for debugging traffic, as well as those times when your browser has remembered your password and you’ve forgotten it (as you can see the password that is being sent). It also gives you the opportunity to intercept this data and change it, which can also be useful for debugging purposes. The disadvantage with this tool, though, is that it will only work on data sent through Firefox. Fiddler2 (below) can be used to intercept traffic from a wider range of sources (e.g. iexplorer, firefox, wget, curl, … and other applications).

wget and curl

http://www.gnu.org/s/wget/
http://curl.haxx.se
These tools are very similar and are useful for sending web requests via command line. They both work in Linux and Windows (and other OSes). I was using this to help debug an issue we were having at a client site that was not allowing our newt traffic to be sent to our servers. It means you can easily change things like proxy settings, username/passwords, URLs, the posted data – without having to recompile an exe and copy it onto the client machine. I personally found curl worked better than wget. I couldn’t get the username/password information to be sent correctly with wget. There was plenty of help on the net, but it just didn’t seem to work for me for some reason. A typical example of the types of curl requests I was sending looked like this:
curl -L -x proxy_details:proxy_port --user username:password --output outputfile.html http://website.com/method --data "user_info=stuffhere%3Bmorestuffhere%3Bextra_info=morestuffhere"
These command line tools can of course be called from scripts as well if required.
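If you do end up scripting these requests, the same call can also be made directly from Python with the requests library; below is a rough equivalent of the curl command above, where the proxy, credentials, URL and posted fields are all placeholders.

import requests

# Rough equivalent of the curl command above; every value here is a placeholder.
proxies = {"http": "http://proxy_details:proxy_port"}

response = requests.post(
    "http://website.com/method",
    data={"user_info": "stuffhere;morestuffhere", "extra_info": "morestuffhere"},
    auth=("username", "password"),   # like curl's --user
    proxies=proxies,
    allow_redirects=True,            # like curl's -L
)

print(response.status_code)
print(response.text)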

Fiddler2

http://www.fiddler2.com/fiddler2/
I found this tool particularly useful when diagnosing firewall/proxy issues as I could see exactly why various requests from curl/wget were, or were not, working.

I’m certainly not an expert on Fiddler2, and I’m not about to try and tell you all the ins and outs of how to use this tool: firstly because I don’t know all of them, secondly because a quick Google search will help you there.
However, some of the features that I have found useful are:

  • It can monitor traffic from different sources. The “Process” column in the left pane in the above image shows that curl and firefox traffic has been detected.
  • The upper pane on the right shows the request details of the selected item in the left pane:
    • Headers
    • TextView
    • WebForms
    • HexView
    • Auth (useful if you want to see how usernames and passwords are being handled)
    • Raw
    • XML
    • JSON
  • The lower pane on the right gives the response details of the selected item in the left pane:
    • Auth
    • Caching
    • Privacy
    • Raw
    • XML
    • JSON
  • Another one of the cool features of Fiddler2 is that it can be used to debug encrypted traffic. It does this by installing a security certificate on your computer. Ideally you would only install this security certificate on a machine used for debugging purposes, and one that is not passing sensitive information over the internet, as it poses a security risk. It should be removed from the computer once you have finished debugging (it’s easy enough to put another back in the future if required).

Custom Excel Charts: Scatter Plots

 

I am often required to create some very specific charts for clients that are far outside of the capability of regular Excel charting functionality. I use VBA to create these custom charts in a dynamic way to visualise outputs of mathematical modelling. These outputs are important to clients as they allow them to view the complex outputs in simple, familiar ways.

I have in the past created L-Shaped Graphs for clients that helped optimise the use of space for a dynamic Excel Dashboard that needed to be constrained to an A4 print out.

More recently, I have been required to create a custom ‘March Chart’ that plots the movements of different work crews over space and time. I received a print out of an old ‘March Chart’ that had been manually created by the client, and was asked to replicate it exactly in Excel so that it could be viewed as an output of a mathematical model.

 

THE POWER OF SCATTER PLOTS

When creating a custom chart in Excel, you must use one of the existing charts as a template. This template can then be twisted and abused into doing what you need through tricks and manipulation within VBA.

For the March Chart project, I discovered just how very useful a scatter plot can be as the template for an advanced Excel chart. The scatter plot can be thought of as a blank canvas on to which x,y data points can be plotted. Each data point can then be joined to another data point, creating a straight line. A series of sequential data points with straight lines joining them, called a ‘data series’, can be used to create paths and even ‘curves’. With this functionality in mind, you essentially have a blank canvas that you can draw anything on in Excel with VBA!

This all seems far too good, so Excel had to put some barriers in our way to make it a little trickier.

  • For starters, there is the ever-present issue of backward compatibility with Excel 2003. The object model used to create charts changed from Excel 2003 to Excel 2007, so it is very important to be aware of this, and to find out early if your client is using Excel 2003. Fortunately for me, the client for the ‘March Chart’ had upgraded to at least Excel 2007, so this was not an issue.
  • More specific to the advanced use of scatter plots, Excel limits you to 255 data series on a chart. Each sequence of coordinates that join together to make a continuous line are defined as a data series. This means that you can only have 255 distinct continuous lines on your chart.
  • Each of these data series can only consist of 255 data points. Unless your chart is extremely big and going to be shown on a very large screen, this is pretty easy to manage. There are two options:

    • Split the data series into multiple data series. If you make the two data series the same colour, and begin the second data series at the last point of the first data series, then they will appear to be a continuous line. This is a good option if you are not close to the limit of data series; however it can create more issues to overcome, as an additional legend entry will be created.

    • For any data series that has more than 255 potential data points, skip over every second, third, etc. point (depending on the actual number) so that the number of data points in the plotted data series remains under 255. This is my preferred method as no additional work is required in fixing the legend, and it is in my opinion neater. A sketch of this approach is given below.
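To make the second option concrete, here is a small sketch of the thinning step in Python (the actual March Chart code is VBA, so this is just the idea rather than the implementation): pick a step size so that the sampled series stays under the limit, and make sure the final point is kept so the line still ends where the data ends.

def downsample(points, max_points=255):
    # Thin a long series so it fits under the per-series point limit:
    # take every k-th point, then make sure the final point is still included
    # so the plotted line ends where the real data ends.
    if len(points) <= max_points:
        return list(points)
    step = -(-len(points) // (max_points - 1))  # ceiling division
    sampled = list(points[::step])
    if sampled[-1] != points[-1]:
        sampled.append(points[-1])
    return sampled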

The following is an example output of the March Chart showing different colours for different work crew types. You can track the geographic location over time of each work crew very simply with this visualisation.

 

Excel 2D Data Lookup using INDEX with MATCH

INDEX with MATCH enables lookup of a cell with an unknown reference in a matrix with row and column labels.

The two formulas:

 

INDEX

Syntax: INDEX(array, row_num, [column_num])

Index is very simple and the syntax is fairly self explanatory.

INDEX takes an array of data as the first parameter, a row number as the second and a column number as the third. The function then returns the value in the array referenced by the row and column specified.

An example array in Excel:

To return ‘Black’ using the INDEX function use the formula:

=INDEX(B2:D4,3,2)

Note that the second parameter refers to the 3rd row in the array B2:D4 and not the 3rd row of the spreadsheet. This applies for the column reference also.

When used like this, INDEX appears to be fairly limited in its practical applications: if you already know the row and column number of a cell in the array, you can generally reference the cell directly.

However, when used with MATCH, INDEX becomes one of the most powerful and useful functions in Excel.

 

MATCH

Syntax: MATCH(lookup_value, lookup_array, [match_type])

MATCH is again quite simple on its own; however, it requires a little more consideration and care than INDEX.

MATCH takes a lookup value as its first parameter and a lookup array as the second. For our purposes the lookup array needs to be a single dimension. This can be row wise or column wise.

MATCH then returns the index of the lookup value if it is found in the lookup array.

However, false positives are easy to produce, because the third parameter of MATCH, [match_type], takes one of three values: 1 finds the largest value that is less than or equal to the lookup value (the lookup array must be sorted in ascending order), 0 requires an exact match, and -1 finds the smallest value that is greater than or equal to the lookup value (the lookup array must be sorted in descending order).

The default value is 1, and this can cause issues: if you forget to add this optional parameter then you could very easily get incorrect lookups occurring that are hard to notice. This is because the lookup will still return a value even if it cannot find exactly what you are looking for; MATCH will return some value that is ‘less than’ the lookup value, which is very rarely what you actually want. It is important when using MATCH with INDEX for matrix lookups to remember to set the match_type parameter equal to 0.

 

An example array in Excel:

To return the row index of the array for the value “Colour 2”:

 

=MATCH(“Colour 2”,A2:A4,0)

This will return: 2

Note that this is the row number of the lookup value “Colour 2” in the lookup array and not in the worksheet.

 

INDEX with MATCH

MATCH is used to provide logic to the row_num and column_num parameters of INDEX.

If a matrix in Excel has labels for both the rows and columns, then INDEX with MATCH can lookup a specific cell in the matrix with ease.

An example of such a matrix:

INDEX with MATCH can be used with this matrix to answer questions such as ‘What is Colour 2 of Group 3?’

Repeating from above, the syntax for INDEX is:

INDEX(array, row_num, [column_num])

The input array for this example is B2:D4.

The row_num and column_num parameters need to be found using MATCH as shown above.

row_num: MATCH(“Colour 2”,A2:A4,0) will return 2

col_num: MATCH(“Group 3”,B1:D1,0) will return 3

These can then be put into INDEX to create our formula:

=INDEX(B2:D4,MATCH(“Colour 2”,A2:A4,0),MATCH(“Group 3”,B1:D1,0))

This will return pink.
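If it helps to see the logic outside Excel, here is a rough Python analogue of what the two functions are doing (these helper functions are purely illustrative, not part of Excel): MATCH with match_type 0 is an exact-match search returning a 1-based position, and INDEX is a 1-based lookup into the array, relative to the array rather than the worksheet.

def excel_match(lookup_value, lookup_array):
    # match_type = 0 behaviour: exact match, returning a 1-based position
    return lookup_array.index(lookup_value) + 1

def excel_index(array, row_num, col_num):
    # 1-based lookup into a 2D array (a list of rows), like INDEX(array, row_num, column_num)
    return array[row_num - 1][col_num - 1]

# =INDEX(B2:D4, MATCH("Colour 2", A2:A4, 0), MATCH("Group 3", B1:D1, 0)) becomes:
# excel_index(matrix, excel_match("Colour 2", row_labels), excel_match("Group 3", col_labels))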

 

I now use INDEX with MATCH nearly every time I do data analysis and manipulation in Excel. Clients often have data in 2D Matrices, and it is no longer a hurdle for me to use this data with ease.


Tech Start Up Business Lessons

Biarri commenced as a commercial maths start up almost two years ago. In this time we have learnt a lot.
We focus hard on deploying optimisation and quantitative analytics in accessible ways to deliver the power of mathematics quickly and cheaply. Our just launched workbench (www.biarriworkbench.com) is a great example of this – providing monthly low cost rental of powerful maths engines available over a browser.

While we have been building products and models we have also been building a business and have learnt a few things along the way. Below are a few of the business lessons we have learnt growing a tech start up in Australia.

  1. Be in the cloud – because we were delivering our optimisation workbench using the cloud, we sought out cloud services for our internal business needs. Our accounting software, CRM, email and timesheets are all rented from Software as a Service companies. We learnt a lot about what makes a good web app by using these services and we saved a lot of capital cost upfront. Specifically let me say that SAASU (www.saasu.com.au) is a really good accounting system for a small business – much easier to use than MYOB or quicken in my view.
  2. Always push back against one-sided contract terms from big corporates – we find you will almost always get at least what you ask for. In-house lawyers and legal departments will always try it on, especially when they are dealing with a small business – push back hard; there is always some flex.
  3. Not all phone companies are the same – one large Australian telco sells conference-calling-enabled handsets while their network does not support conference (e.g. 3-way) calling. This is not disclosed up front – we found out when we tried our first conf call. Ask the question, be wary of penguins and remember Skype is your friend.

Hope these few thoughts help. There is more we are learning each day, so we will stick up some more thoughts soon.