Common Mistakes in Data Science Hiring: Part 2

Posted by Q McCallum on 2018-01-30

Are you having trouble hiring data scientists? Or, once you hire them, do they not stick around? You may be tripping over your own feet.

This is the second of two posts on the topic of common mistakes in data science hiring. You’ll want to read the first post if you haven’t already.

Problem 6: unrealistic interview techniques or metrics

Media coverage of data science often mentions the salaries: high pay for young people, and for intellectually stimulating work, to boot. Who wouldn’t want that? Companies have developed interview processes to weed out the hopefuls. Some of these processes are so unpleasant and unrealistic that they turn away the very candidates they want to hire.

For example, say you use off-the-shelf tools like scikit-learn, but you still want candidates to whiteboard pseudocode for implementing K-means clustering. You send someone a take-home project that requires several days’ effort to complete. Maybe you reject anyone who hasn’t done a Kaggle competition, or who doesn’t have anything on GitHub, or who doesn’t have a certain academic pedigree.

This is less of a filter and more of an obstacle course. It doesn’t test the skills candidates will actually use on the job, and (most importantly) it drives away qualified candidates: they know they’re in high demand, and they look elsewhere.

Solution: develop realistic interviews.

As a baseline, limit technical questions to tools you actually use and the ways you use them. You can dig deeper into theoretical knowledge by asking candidates how (or, even, whether) they would apply a given technique to a problem. Ask experienced candidates open-ended questions about their work, and give them the room to go in-depth so you can see how they’ve tackled problems in the past.

If you insist on some kind of demo/whiteboard session – and, really, you don’t want this – make sure it’s damned close to what they’ll really do on the job.

(For a deeper look at good, realistic interview practices, I encourage you to read Greg Reda’s “Hiring Data Scientists” post. It’s a short read with a lot of useful info.)

Problem 7: making the data scientist your first data hire

Unless you already have a solid data infrastructure and internal business intelligence (BI) practice, you’ll need a data engineer to build pipelines and otherwise prepare data for the data scientist to use. A lot of companies skip this step because it’s not “real data science,” and that is a costly mistake.

If you hire the data scientist first, they won’t have any data to use, so they’ll get bored and leave. Or, they’ll assume the role of the data engineer and resent you for it – remember, they signed up for a different job! – which is usually a precursor to leaving. All of this assumes you’re able to hire them in the first place, because the good data scientists will catch on during the interview that you’re not ready, and they’ll turn you down.

Solution: hire a data engineer first.

Your peers or superiors may balk – “I thought we were going to do data science, but this person you hired is spending all day with the database!” – but it’s up to you to stand firm and explain why this is a necessary first step. (Feel free to quote me: you can copy/paste the two preceding paragraphs into your e-mails to those people.)

As an added perk, hiring a data engineer reduces the scope of work for the data scientist, because your data scientist will spend less time on data prep.[1] You can therefore scale back on the data scientist job requirements, which should help you close on that role even faster.

You can soften this requirement if your first hire is a very experienced data professional – someone who has a well-rounded technology background in addition to pure data skills – and you already have a strong BI function in-house. In this case, most of the data foundation is already in place, and the data scientist can serve as their own data engineer to start. Do this only if you expect the data team will grow, and this person will in turn hire a data engineer or data scientist to take over some of their work.

But that leads into my next point:

Problem 8: hiring a junior-level data scientist first

One side-effect of data scientists being in such strong demand is that they command similarly strong paychecks. Some companies – especially startups – cringe at the salary expectations of a seasoned professional. They try to save money by hiring someone with little to no workplace experience, either someone fresh out of school or someone whose only exposure to data science is one of the “boot camp”-style programs.

This is one of those decisions that looks good on paper, because it’s easy to quantify the savings on salary. It’s a bad idea in practice, mostly because it’s unfair to the person you’ve hired: as the first and only data scientist, they can’t turn to someone more experienced when they get stuck. (They may not even realize they’re stuck.) Having to learn everything on their own makes it harder for them to develop their skills. There’s even a chance this person will get frustrated and leave.

Solution: get an experienced practitioner for your first data hire.

Someone who has been around the proverbial block will be able to move quickly and with minimal assistance, which means you’ll see a faster return on your data science investment. These people command a higher salary than an entry-level candidate, but they pay for themselves.

Should you ever hire a junior data scientist? Certainly! The key is to first build a team of experienced data scientists and otherwise create an environment that will help that person grow.

Problem 9: insisting on DIY

It’s a chicken-and-egg scenario: in order to implement these solutions and hire that first data scientist, you’ll need someone with data science experience to take point. If you already had in-house data experience, though, you would have already implemented these solutions!

Some companies try to fake their way out of it: they lump software developers and data scientists into the same group (“it’s all just tech, right?”) and appoint the lead developer or CTO to lead the hiring effort. That point person skips developing a data strategy and moves straight to assembling a job description. The point person has tech skills but lacks a data background, so the job posting doesn’t make sense, and HR and the recruiters go on a wild goose chase. The candidates show up for the interview and realize there’s no data science talent in-house – no one speaks their language, so no one can truly evaluate their skills and experience – and the interviews fall apart.

What about the people who manage to make it through the interview? The point person keeps saying: “Yeah, they’re good, but they’re not quite what we’re looking for… Next!” The position, unsurprisingly, remains open for several months, sometimes more than a year.

Solution: get outside help to write the job posting and interview candidates.

If you don’t already have data science experience in-house, engage an experienced data professional to work with you on the job descriptions and interviews. This person would not replace your HR and recruiter teams, but enhance their work by narrowing their scope and working with candidates.

Problem 10: demanding instant results

Hopefully, you’ve been able to use the preceding tips to improve your approach to data science hiring. You’ve developed a realistic data strategy, cleaned up your interview practices, and engaged some experienced outside help to shape the job postings and interview candidates. Now what? You now swing open your front door and the data scientist applicants just storm in, right?

Solution: be patient.

All of the solutions mentioned here will help you to put your best foot forward and stack the deck in your favor. You’ll still need to take an active role in your search so you can source prospects and interview candidates. Even when you’re doing all the right things, you’re still competing with other companies for data talent. Give it time.

Are you having trouble building or growing your data science team? I want to help. I can help. Please [contact me](/contact/) to start the discussion.

  1. A data scientist will still spend some time on data prep, mostly as part of feature engineering, but they’ll spend a lot less time on it if a data engineer has laid the groundwork.