Why do we need data scientists, then?

Posted by Q McCallum on 2023-07-06

(Photo by Carlos Muza on Unsplash)

I’ve been running a series of blog posts on hiring in the data space. Having written about improving data hiring, finding that first data scientist, and hiring antipatterns, I put out a call on LinkedIn asking for other topics that I should cover before closing out.

Fractional CFO/OO Sierra Hinson offered this gem:

“Do we really need data scientists anymore with the rise of off the shelf AI/ML models 😉”

(Full disclosure: Sierra and I know each other and we’ve worked together on multiple client engagements.)

Beyond chatbot models like OpenAI’s GPT-4, I’ll extend Sierra’s question to include AI as a Service (AIaaS) offerings, automated machine learning (autoML) tools, and prebuilt models like those on TensorFlow Hub and Hugging Face Hub. All of these are available with just a few mouse clicks and maybe a credit card. So why the hell do we need to hire data scientists, then?

The short version is: Frankly, some of you don’t.

Before you delete those data scientist job postings, keep in mind: It’s not a question of either/or. It’s a matter of “under what circumstances” and “how many.”

We’ve seen this before

Does your company need e-mail hosting or an online store? Do you want to post your thoughts to a blog? Fifteen or twenty years ago this required a lot of effort. You had to roll up your sleeves and write code, buy servers, and probably hire an IT department to manage it all. This was a distraction from your main business, but it was the only option.

These days mass-market SaaS tools like Google Workspace, MS Office 365, Squarespace, and WordPress.com have completely rewritten that story. You get all of the benefits of technology without touching a line of code. And if you still need some infrastructure for custom application development, AWS or Google Cloud will provide that on demand.

Still, plenty of companies run their own datacenters and develop custom applications. Are they laggards? Not necessarily. They just have data privacy concerns, special app integration issues, or other needs that aren’t met by the SaaS tools. They may also have grown accustomed to their self-managed technology infrastructure that predates the likes of Squarespace and cloud-based mail providers.

So while turnkey SaaS tools reduce some burden for existing companies, their real claim to fame is creating the on-ramp for new companies that wouldn’t otherwise have access.

Purpose-built

That same point holds for the new era of AI-related tools.

The out-of-the-box LLMs, autoML tools, and other such services are great under certain circumstances. If you’re a tiny company that doesn’t have an in-house technology team, or you’re a larger company with simple needs, then you’re all set.

But these mass-market tools are geared to a mass-market audience. Sometimes your plans call for something more specific:

  • You want predictions tailored to your customers’ behavior, so you need to train ML models on your own data. For example, ChatGPT was trained on the general internet and often fails on deep, domain-specific questions. The finance-focused BloombergGPT should perform much better on questions about the stock market.
  • Your privacy concerns make you hesitant to send any data outside of your company’s walls. Some AIaaS providers state in their TOS that they’ll use your data to improve their models. What if you’re handling sensitive patient data or internal product metrics?
  • The major SaaS tools only offer the most common, mainstream ML techniques and algorithms. If the providers feel that your challenge isn’t sufficiently common, they won’t build support for it into their tools. (Some days I could really use a cloud-based service to run unsupervised learning jobs. Hint hint, people. Hint. Hint.)

That takes us back to Sierra’s question: when you find yourself in this boat, it’s time to build that team of data scientists or ML engineers. They’ll develop solutions focused on your company’s data and business model.

That covers the “under what circumstances” point I raised earlier. To wrap up, let’s take a look at “how many.”

A mix of both

Let’s say you plan to build custom ML models. Your company will need data scientists and ML engineers, sure. But you might operate with a smaller team if you support their work with off-the-shelf models and AIaaS tools.

Starting with a prebuilt or autoML model, for example, gives your data team extra time to build something that outperforms the off-the-shelf solution. You get the benefit of a shortened time-to-market (instant gratification) plus the power of a custom model down the road (long-term benefit).
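For the technically inclined, here's a minimal sketch of what "outperforms the off-the-shelf solution" could look like in practice: score both models on the same held-out examples, and only switch when the custom one wins by a meaningful margin. Everything in it is hypothetical and for illustration only. The two predictor functions are toy stand-ins for a vendor model and an in-house model, and the 0.02 threshold is an arbitrary assumption, not a recommendation.

```python
from typing import Callable, Sequence, Tuple

# A "model" here is just any function that maps input text to a label.
Predictor = Callable[[str], str]


def accuracy(predict: Predictor, examples: Sequence[Tuple[str, str]]) -> float:
    """Fraction of held-out (text, expected_label) pairs predicted correctly."""
    if not examples:
        raise ValueError("need at least one held-out example")
    correct = sum(1 for text, label in examples if predict(text) == label)
    return correct / len(examples)


def pick_model(baseline: Predictor, custom: Predictor,
               examples: Sequence[Tuple[str, str]],
               min_gain: float = 0.02) -> str:
    """Return "custom" only if it beats the baseline by at least min_gain."""
    gain = accuracy(custom, examples) - accuracy(baseline, examples)
    return "custom" if gain >= min_gain else "baseline"


# Hypothetical stand-ins: a generic vendor model and a model tuned on your data.
def off_the_shelf(text: str) -> str:
    return "positive" if "good" in text else "negative"


def in_house(text: str) -> str:
    return "positive" if ("good" in text or "solid" in text) else "negative"


# Made-up held-out examples; in real life these come from your own data.
held_out = [
    ("good service", "positive"),
    ("solid product", "positive"),
    ("slow delivery", "negative"),
    ("bad support", "negative"),
]

print(pick_model(off_the_shelf, in_house, held_out))
```

The point of the threshold is that the off-the-shelf model stays in production until your team has something demonstrably better, so the business keeps getting value while the custom work is under way.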

In the right hands, a mass-market tool can be a real force multiplier for your company’s AI projects.

So, now it’s your turn to ask: “How do we make the most of the entire landscape of off-the-shelf models, AIaaS/AI-based SaaS offerings, and our internal data team?”

What else would you like to know?

Do you have a question about building or restructuring your company’s ML/AI team? Let me know and I may cover that in a future blog post.

For more personalized assistance on starting or evaluating your AI efforts, please enquire about my consulting services.