The Importance of Data Infrastructure
2016-11-21 | tags: AI data literacy

A successful data science shop requires more than just data scientists.

Many firms assume that the key to winning the data race is to hire several data scientists. This is not entirely wrong -- you'll certainly need people to analyze the data -- but nor is it entirely correct. Having data scientists on your team is only one of several necessary ingredients.

By analogy, consider the world of medicine: surgery involves more than surgeons. Think of all of the people, procedures, and spaces that must be in place and running smoothly so that your surgeon can focus on your operation: scheduling, room preparation, patient preparation, and anesthesiology are just a few. In a well-run hospital, neither you nor the surgeon need be aware of most of what goes on behind the scenes. Everything just works.

You want the same for your firm's data science practice. Effective use of data requires developing a culture of experimentation, in which people (even those who do not consider themselves to be data scientists) are able to develop and test hypotheses at will. In turn, that requires that people be able to get to the data they need quickly so they can analyze it. They don't want to waste time fumbling around looking for the data, sussing out what each field means, or connecting their tools to it. They want things to just work so they can focus on the task at hand.

How do you make your data shop run like the hospital in our example? How can you enable your data scientists to skip the data access and data prep, so they can get straight to the analysis? You'll need a solid, well-maintained data infrastructure based on the following:

In turn, a proper data infrastructure requires some key hires:

In an ideal world, you would have all of this infrastructure in place before your first data scientist joins. It's tougher to implement all of this after your data efforts are already underway, and the longer you wait, the tougher it gets to establish the necessary policies and boundaries.

Furthermore, to develop and maintain a solid data infrastructure requires discipline because you'll sometimes trade short-term gains for long-term stability. For example: it's tempting to just throw data into the repository and start analyzing it. It takes extra time and effort to first update your data dictionary -- to track the dataset's source provider, location in the repository, and meaning of each field -- and people are likely to resist because doing this extra work slows them down. You may require someone in a senior leadership role to enforce the rules so that short-term thinking doesn't take over.
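
To make the data dictionary idea concrete, here is a minimal sketch in Python. It is only an illustration, not a recommended tool: the `DatasetEntry` and `DataDictionary` names, the sample dataset, and the rule that every field must carry a description are all assumptions made for this example. It records the three things mentioned above (source provider, location in the repository, and meaning of each field) and refuses to register a dataset whose fields are undocumented.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DatasetEntry:
    """One data dictionary record (hypothetical structure, for illustration)."""
    name: str
    source_provider: str   # who supplied the data
    location: str          # where it lives in the repository
    fields: Dict[str, str] = field(default_factory=dict)  # field name -> meaning


class DataDictionary:
    """In-memory registry of dataset entries (a sketch, not a real product)."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Enforce the discipline up front: no documented fields, no entry.
        undocumented = [
            name for name, meaning in entry.fields.items() if not meaning.strip()
        ]
        if not entry.fields or undocumented:
            raise ValueError(
                f"dataset {entry.name!r} has undocumented fields: {undocumented}"
            )
        self._entries[entry.name] = entry

    def describe(self, dataset, field_name):
        # Look up what a field means without asking around.
        return self._entries[dataset].fields[field_name]


# Example usage with made-up values.
catalog = DataDictionary()
catalog.register(
    DatasetEntry(
        name="orders",
        source_provider="example vendor",
        location="warehouse/sales/orders",
        fields={"amount": "Order total in USD", "placed_at": "Order timestamp, UTC"},
    )
)
print(catalog.describe("orders", "amount"))
```

In practice the registry might live in a shared database or a catalog tool rather than in memory; the point is that the check runs before the data is used, which is exactly the extra step people are tempted to skip.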

In closing: a solid data infrastructure smooths the road for data science efforts. Data scientists (and analysts, and anyone else) do their best and most efficient job when they have stable, ready access to data that is up-to-date, fit for purpose, and well-documented. To invest in such a data infrastructure is to invest in the long-term success of your firm's data science activities.

Do you want to lay the groundwork for your company's data science efforts? Contact me to get started.
