Thanks to the Big Data era, we now have a menagerie of new tools and techniques for working with data. New ones seem to crop up every day, each specialized for a particular problem. While this may make certain analysis work easier – perhaps even fun – you can still go pretty far with everyone’s old favorite, the spreadsheet. Just ask John Foreman, whose book *Data Smart* works through data science techniques in Excel.
I can see why the modern analytics crowd would laugh at spreadsheets. Compared to working in Python or R – especially in interactive environments such as RStudio or IPython – a spreadsheet’s point-and-click interface can feel pretty clumsy. Spreadsheets are also largely tuned for working with numbers, which makes them less than stellar for text analysis. Last, and perhaps most important, spreadsheets don’t really do Big Data, which is all the rage these days.
Who, then, would want to use spreadsheets? They’re so yesterday.
Still, even in this era of Big Data™, not everyone needs large-scale data analysis. Nor is everyone hot to build some next-generation predictive model. That may come as a shocker if you’re knee-deep in terabytes of ad data, but quite a bit of the business world involves small- to mid-sized datasets on such staples as sales, marketing, and operations. Sums, means, and the occasional linear regression go a long way in those cases. There’s hardly even a need to automate the whole process. Dull? Not to the people who run these businesses, no.
Spreadsheets offer this crowd basic data analysis with a low barrier to entry. The graphical interface works well for exploratory analysis. Menus and click-through wizards spare people from memorizing function names and arguments. This puts calculation power in the hands of people who can’t (or don’t want to) crack open programming tools.
(This strength is also a spreadsheet’s greatest weakness: that same ease of use, and the “misuse of Excel” it invites, is why so many spreadsheets end up serving as makeshift databases, or even as data exchange formats.)
With some experience, a person can become proficient in a spreadsheet’s more advanced functions. Before you laugh at this, remember: spreadsheets have been Wall Street power tools for ages. Some traders have turned to R and Python for their analysis work, but a good many still trust Excel.
Granted, spreadsheets can’t do everything. That’s part of why R, Python, Hadoop, and other more modern tools are so appealing: they fill the gaps left by Excel. The question, then, isn’t “should my team use spreadsheets for data analysis work?” but “when does it make sense to use a spreadsheet versus something else?”¹ The tools complement one another more than they compete, and they play to different strengths. In some cases, the fancy Hadoop and Python work serves to boil down a dataset to something an analyst can run through Excel.
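As a rough illustration of that hand-off, here is a minimal Python sketch of a “boil it down” step: it collapses raw sales records into per-region totals, counts, and means, then writes a small CSV that a spreadsheet can open. The record layout (`region` and `amount` fields) and the function names are hypothetical, chosen only for illustration; a real pipeline might do the heavy lifting in Hadoop or pandas instead.

```python
import csv
import io
from collections import defaultdict


def summarize_sales(rows):
    """Collapse raw sale records into per-region total, count, and mean.

    Assumes (hypothetically) that each row is a dict with 'region' and
    'amount' keys, as csv.DictReader would produce from a raw export.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        region = row["region"]
        totals[region] += float(row["amount"])
        counts[region] += 1
    # One small row per region: easy to open and chart in a spreadsheet.
    return [
        {"region": r, "total": totals[r], "count": counts[r],
         "mean": totals[r] / counts[r]}
        for r in sorted(totals)
    ]


def write_summary_csv(summary, out):
    """Write the summary rows to a file-like object as CSV."""
    writer = csv.DictWriter(out, fieldnames=["region", "total", "count", "mean"])
    writer.writeheader()
    writer.writerows(summary)


# Example: three raw records become two summary rows.
raw = [
    {"region": "east", "amount": "10"},
    {"region": "west", "amount": "5"},
    {"region": "east", "amount": "20"},
]
summary = summarize_sales(raw)
buf = io.StringIO()
write_summary_csv(summary, buf)
```

In practice `out` would be a file opened with `newline=""`, and the resulting CSV is the “something an analyst can run through Excel.”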
Have you ever skipped over the hip data tools in favor of good old spreadsheets? Let me know.
Do you want to put your company's data to good use? Whether it fits in a spreadsheet or a Hadoop cluster, I want to help. Please [contact me](/contact/) to start the discussion.
¹ A lot could be said on this topic, and most of it is a grey area. To keep this post short, I’ll save that for another day.