When I started in this industry there was no such thing as a CIO, though it wasn’t long in coming. IT was usually found in the Finance department under the CFO (who had absolutely NO CLUE about IT at all). I typed memos on a PC using some character-based word processor but had to print the memos out and put them into internal mail because we didn’t have email! At the time, companies owned and managed their own big mainframes, and corporate information systems were generally character-based and accessed via a terminal, or, in advanced cases, through a terminal emulator running on a PC – woo hoo! There was no concept of “data as an asset” and it was bloody expensive to buy storage, so every effort was made to minimise the size of any database. ERP was the hottest thing ever and was going to revolutionise business by eliminating all those pesky employees required to manually execute transactions.
So, what’s different a quarter of a century on? Lots of things, obviously; I’ll just cherry-pick a few to make my point. Everyone who bought into the ERP marketing hype discovered its falsities the hard way, and the message shifted from “you can get rid of employees by automating your transactions” to “think what you can do with access to all of the data that’s now available!” The computing platform and apps have shifted so far they’re on another planet; who needs a word processor now that our email programs are so sophisticated? Do companies even have internal mail rooms any more? “Data is an asset” and, with relatively cheap storage, companies have lots and lots and lots of it. We have Chief Information Officers (CIOs) who are supposedly responsible for the company’s information but who, judging by a quick search for a modern definition of the role, seem to be mainly focused on the appropriate use of technology within the organisation. Now, Analytics is going to revolutionise business!
OK, I’ve bought into that line about analytics. It’s really cool stuff. However, analytics is data hungry. In fact, it’s famished. But, it doesn’t need just any data. It needs good, clean, sanitary data! “So, what is that?” you ask.
Let me illustrate with an example of what it is NOT; I’ve got to be a bit cryptic to protect the innocent, but hopefully you’ll get the idea.
Let’s take a company that has over 10 years of sales data in a 120GB database. The level of detail tracked for each purchase is fantastic! Almost overwhelming, in fact, as there are hundreds of tables to mine. We know each product purchased, quantity and date; each transaction is lovingly coded with a 3-character transaction type and 3-character discount code (if any) amongst a plethora of other highly informative codes. Codes relate back to a “master” table where you can find the highly informative description.
“Wow!” “Great!”, you think. Now we can use those handy codes to look for buying patterns and, maybe, predict sales. If we are good, we might be able to look for a slowdown in sales and trigger a “why don’t you introduce discount x” message which, again if we’re good, will result in a boost in sales which we can track (testing our hypothesis and adding that information to the mix).
Everything seems good. Then you realise that the end-users have the ability, and right, to add any code they want at any time, even if it means the same as a previously used code (just with a slightly different 3-character code). Even better, they can reuse a code with a totally different description (hence meaning) from year-to-year or product-to-product! This data is now pretty much useless because there is no consistency in the data at all. We can’t look for patterns programmatically. A human would have to trawl through the hundreds of thousands of records to recode everything consistently.
Talking with the customer, it turns out that the department interested in doing the sales analysis has no control over the department doing data entry. The department doing data entry is following its own agenda and doesn’t see why it should be inconvenienced by entering data according to the whims of another department.
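To give a feel for how quickly this kind of mess shows up, here is a minimal sketch of an audit over the “master” table. Everything in it is an assumption for illustration: the `(year, code, description)` row layout, the sample codes and descriptions, and the `normalise` and `audit_codes` helpers are all made up, not the customer’s actual schema. It only flags the two problems described above; it does not fix them.

```python
from collections import defaultdict

# Hypothetical rows from the "master" code table: (year, code, description).
master_rows = [
    (2010, "D10", "10% seasonal discount"),
    (2011, "D10", "Damaged stock clearance"),  # same code, different meaning
    (2011, "SD1", "10% Seasonal  Discount"),   # same meaning, different code
    (2012, "STF", "Staff purchase"),
]


def normalise(text):
    """Lower-case and collapse whitespace so trivial variations compare equal."""
    return " ".join(text.lower().split())


def audit_codes(rows):
    """Return (reused_codes, duplicate_meanings) for a list of master rows."""
    meanings_by_code = defaultdict(set)
    codes_by_meaning = defaultdict(set)
    for _year, code, description in rows:
        meaning = normalise(description)
        meanings_by_code[code].add(meaning)
        codes_by_meaning[meaning].add(code)
    # A code with more than one meaning has been reused.
    reused = {c: sorted(m) for c, m in meanings_by_code.items() if len(m) > 1}
    # A meaning with more than one code has been entered twice.
    duplicated = {m: sorted(c) for m, c in codes_by_meaning.items() if len(c) > 1}
    return reused, duplicated


reused_codes, duplicate_meanings = audit_codes(master_rows)
print("Codes with more than one meaning:", reused_codes)
print("Meanings with more than one code:", duplicate_meanings)
```

A check like this only catches descriptions that match after trivial clean-up. Two descriptions that mean the same thing in different words would still need a human to spot them, which is exactly the problem.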
At another customer, the spatial data that is used to catalogue assets is owned by the spatial specialists who enter the data. The spatial database has been designed to provide information on which assets are located where (the final state of the asset). It does not support the question: “what do I need to do to install the asset?” For example, installation of an asset might require significant infrastructure to be created. Let’s say a platform needs to be constructed and some holes need to be dug for supports. Someone has gone out and assessed the site ahead of time (it’s going to be on concrete so we need to get through the concrete first, which is harder than just going into grass, and then need to make sure the concrete is fixed after installation). That information is held in a separate Excel (ugh) file with a reference back to the asset identifier, but it is not supplied in the spatial database. Why? Because it’s not relevant to the final state of the asset, only the construction of the asset. Once construction is complete they don’t care about the fact that it’s installed over concrete. So, in planning construction someone has to manually review the Excel files against the spatial database to plan the cost, timing and means of constructing the asset. The spatial specialists don’t see why the construction information should be entered into the database; it will take them longer to update the database and the information needs to be maintained until it becomes obsolete (after construction). Yet, with that data in the database, the cost, timing and means of construction could be generated automatically, saving time and avoiding the errors introduced by the manual process!
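As a rough sketch of what “automatically generated” could look like once the site assessment sits alongside the asset record: the asset identifiers, the `surface` field, the step names and the `construction_plan` helper below are all hypothetical, invented purely to illustrate the idea, and a real plan would also need costs and timings.

```python
# Hypothetical asset records, as they might come from the spatial database.
assets = {
    "A-001": {"description": "asset on a paved site"},
    "A-002": {"description": "asset on a grassed site"},
    "A-003": {"description": "asset with no assessment yet"},
}

# Hypothetical site assessments, as they might be read from the Excel file.
site_assessments = [
    {"asset_id": "A-001", "surface": "concrete"},
    {"asset_id": "A-002", "surface": "grass"},
]

# Assumed work common to every installation, plus extra work per surface type.
BASE_STEPS = ["dig holes for supports", "construct platform"]
PREPARATION = {"concrete": ["cut through concrete"], "grass": []}
MAKE_GOOD = {"concrete": ["repair concrete"], "grass": []}


def construction_plan(assets, assessments):
    """Map each asset id to its ordered construction steps.

    Assets without a usable assessment map to None so they can be
    sent for manual review instead of being silently guessed at.
    """
    surface_by_asset = {row["asset_id"]: row["surface"] for row in assessments}
    plan = {}
    for asset_id in assets:
        surface = surface_by_asset.get(asset_id)
        if surface not in PREPARATION:
            plan[asset_id] = None
            continue
        plan[asset_id] = PREPARATION[surface] + BASE_STEPS + MAKE_GOOD[surface]
    return plan


plan = construction_plan(assets, site_assessments)
for asset_id, steps in plan.items():
    print(asset_id, steps if steps is not None else "needs manual review")
```

The lookup by asset identifier is the whole trick. It is the same cross-referencing someone currently does by hand between the Excel files and the spatial database.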
Am I the only one who finds these situations bizarre? Irritating? Annoying? Unbelievable?
Remember the tree-swing cartoons? http://www.businessballs.com/treeswing.htm
How were these issues resolved in the manufacturing industry? Someone realised that sales increased and costs reduced when they got it right! And those companies who didn’t solve the problem eventually went out of business. Simple, right?
So, I pose the following questions:
- Are companies that aren’t able to solve this problem with their data going to die a painful death, as those that can solve it overtake them through the “power of analytics”? I think, yes! And they deserve to die! (Though what will their employees do?)
- Who in the organisation has ultimate responsibility for ensuring that data meets the organisation’s needs today and into the future? I naively assumed that the CIO would make sure that data would be relevant and useful across the entire organisation, across the years (as much as possible). However, this does not seem to be the case. Departments are still fighting over who owns what data and don’t seem to be willing to help each other out for the overall good of the company. Surely we don’t need to pay yet another executive obscene amounts of money to get this right?
- Maybe the Universe is just trying to send me a message here by sending me all the difficult cases?
- Maybe I’m just being overly cynical due to lack of sleep and food…
Here’s to better data!