Maybe your next data science interview is lined up and you don’t know which questions to ask when it is your turn, or you are preparing for a day of work shadowing and want to make the most of it. Either way, you don’t want to waste your chance to ask the right questions and get a good picture of what you can expect from your potential future job.

If you aren’t a candidate applying for a job but a data science team lead or department head, this will also be an interesting read for you. You can check whether your team or department does a good job at facilitating great data science work. Of course, this list of questions is not exhaustive, but if you can get a positive answer to all of these, you have found a really fine environment for data scientists to flourish.

Question 1: How much control will I have over my compute infrastructure?

This topic is critical because the amount of control you have over your compute infrastructure will have a huge impact on your productivity. You can waste immense amounts of time working around restrictions in this area, leaving you little time to work on actual data science problems.

Since the compute infrastructure you need as a data scientist is made up of many individual parts, there is a lot to consider here:

  1. Are you allowed to install whatever software you need for your job on your work computer? Some companies have completely locked down their machines and only allow installing software that has gone through the company’s official internal software repackaging process. This process can take months for a single piece of software. It is not a lot of fun if you are waiting for essentials like your Anaconda Python or RStudio.
  2. Do you have admin access on your work computer? Many companies use custom operating system images with settings you won’t encounter on private computers. An example of this is a local firewall rule that breaks Docker functionality. If you don’t have admin access to your machine, you can’t change these things easily and will be left with a broken setup. Also, some software, such as database drivers, usually needs admin access to be installed. Without admin access you’ll be out of luck.
  3. How free is the internet access at the company? Can you simply pip install, install.packages or docker pull and move on with your life or is internet access restricted? Internet access restrictions can go from “you have to go through our corporate HTTP proxy”, which is not a big deal if you know how to do it, to “our HTTP proxy requires authentication”, which can lead to more elaborate workarounds, to “you have to work offline”, which leads to situations like “if you need a Python / R package or docker image, you have to open an IT support ticket so they can install the package / image for you, and by the way, this may take one to two weeks”.
  4. Can I crunch data on my personal work computer? As long as the data you need to work with is not too big for your personal work computer, it is advantageous to stay on your own machine. You can easily use your favorite IDE, don’t have to worry about disturbing colleagues’ workloads with yours and just generally keep productivity high by staying in your familiar environment. However, some companies have policies that completely disallow putting (confidential) company data on your personal work computer, let alone processing it there. Will you need to create an SSH tunnel every time you want to access your Jupyter notebook, or write your code in Vim via SSH? Are you supposed to do all of your work in this great new web interface that unfortunately doesn’t have any of the advanced editing features your IDE does, let alone any way to version your code? You had better know before you take the job.
  5. Am I allowed to work in the cloud? Being able to use cloud services can speed up things considerably. Instead of waiting for weeks or even months to get a virtual machine for that quick proof-of-concept (PoC) you wanted to do, you can spin up a machine and test things in minutes.
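For the mildest of the internet restrictions in item 3, a corporate HTTP proxy with or without authentication, the workaround usually amounts to setting a few environment variables. The following is a minimal sketch in Python, not a definitive recipe. The host `proxy.example.com`, the port, and the credentials are hypothetical placeholders, and the helper names `build_proxy_url` and `proxy_environment` are made up for this illustration. pip reads these variables; whether other tools in your setup honor them is an assumption you should verify.

```python
import os
from urllib.parse import quote


def build_proxy_url(host, port, user=None, password=None):
    """Return an HTTP proxy URL, percent-encoding any credentials."""
    if user is None:
        return f"http://{host}:{port}"
    # Characters such as '@' or ':' inside credentials must be escaped,
    # otherwise the URL would be split at the wrong place.
    credentials = quote(user, safe="")
    if password is not None:
        credentials += ":" + quote(password, safe="")
    return f"http://{credentials}@{host}:{port}"


def proxy_environment(proxy_url, base_env=None):
    """Copy an environment and add the usual proxy variables to it."""
    env = dict(os.environ if base_env is None else base_env)
    # Tools differ in which spelling they read, so set both cases.
    for name in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
        env[name] = proxy_url
    return env


if __name__ == "__main__":
    # Hypothetical proxy address and credentials, for illustration only.
    url = build_proxy_url("proxy.example.com", 8080, "jane", "p@ss:word")
    env = proxy_environment(url, base_env={})
    print(env["HTTPS_PROXY"])
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when calling `python -m pip install ...`, which keeps the proxy settings out of your global shell configuration.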

Some of these restrictions might seem overly protective to you, but for companies in highly regulated industries that handle very confidential data, such as banks and insurers, this is an important topic and the restrictions have been put in place for good reasons.

In fact, it is in the best interest of all companies to set up their IT infrastructure securely so as to keep out attackers and prevent data leaks. This, however, may inadvertently create hurdles for your data science work. Especially in companies that are just getting started with data science, nobody might have noticed these problems before because they don’t impact the typical Outlook + PowerPoint office workflow.

Look out for this one! You don’t want to spend half of your tenure at your future employer fighting the IT security team, finding workarounds for things that should be simple or flat out being unable to do your work properly.

Question 2: What is your tech stack?

Asking this question is a good way to find out how mature the team is. Are they running their first PoCs or do they deploy data science artifacts regularly? They should at least be able to tell you what the main programming language of the team is and what machine learning frameworks they work with.

Following up on this, you can ask how they deploy batch prediction jobs, live prediction services and dashboards. If they begin to stumble here, you know that there will be a lot of foundational work to be done. That’s OK if you like infrastructure work and have the required breadth in your skillset. Still, you should ask what their plan is to build up these capabilities and who is supposed to do this work. You don’t want to end up in a project crunch where you have hard deadlines but no infrastructure to deploy your work once it is finished.

Another reason for asking this question is to find out whether you will be able to work with technology you love or will have to work with technology you hate. Also think about your next round of job applications after this one. Which skills will you have added to your CV during your time at this company, and will they make it easy or hard to score your next but one data science job?

Question 3: What roles does a data scientist play in your team?

Are there dedicated project managers for data science projects? Are there data engineers who set up and maintain the data infrastructure and deployments? Are there software engineers who build data applications around your machine learning models?

In many companies, especially the ones that are just starting out in data science, the data scientist is on their own and is expected to play all of the aforementioned roles. Again, you might like this if you like to be a jack of all trades, but you might also prefer to specialize.

If the company is one of the places that expects data scientists to do all of these tasks, you should first consider whether you have the required skillset to do all of these things properly. Learning new things is great, but you should make sure to stay realistic here.

Question 4: How is data provisioning typically handled?

Is there a well-oiled process in place to get data provisioned for data science projects at this company? Let them walk you through the process. What will you as a data scientist have to do to get the data you need for your project? What bureaucratic hurdles exist? Also, how long does it typically take at this company to get data for a data science project?

Of course, there will be many sources of data in a company, and different teams will have different processes in place to provision data. But typically, most of the data you will use comes from one or two other teams, and your potential future boss should know how easy it is to work with them. It is good to know beforehand whether this topic will be a constant struggle or won’t pose an issue.

Question 5: How is personal data handled in data science projects?

Not every company has customers in the European Union. Thus, not every company needs to abide by the GDPR. Nevertheless, the proper handling of personal data is becoming a serious matter for more and more companies globally.

For some employers, such as insurers, most of the data that you work with will contain personal information. In this case, data science work can become very cumbersome. Depending on the regulations the company has to comply with and the strictness with which the legal or data protection department interprets them, your projects can be delayed by weeks of internal discussions, if not canceled completely.

The decisions that are made here can be very different from company to company, even for the same topic. Personally, I am a supporter of protecting personal data, especially when it comes to critical data like health insurance records. Thus, I try to choose working environments with more non-personal than personal data when possible.

Question 6: How is capacity managed in the team?

This is a topic that is not specific to data science, but is still something that can ruin an otherwise great working environment if done wrong. If you are anything like me, you like to deliver good quality work in your projects. One prerequisite for this is that you have enough time to produce said high quality.

Although it is well known that working on many projects at the same time is prohibitively expensive in terms of time lost to project switching (see the table below or Todd Waits’s great visualization and the corresponding article), many workplaces still happily stack one project after another on an employee’s back. The accompanying justification is often that the team or department wants to portray itself as fast to the rest of the company, which it can’t if it puts project requests on a many-month waiting list. It takes some discipline on the part of the team lead to keep team members focused.

Number of Simultaneous Projects   Percent of Working Time Available per Project   Loss to Context Switching
1                                 100%                                            0%
2                                 40%                                             20%
3                                 20%                                             40%
4                                 10%                                             60%
5                                 5%                                              75%

Table from: Weinberg, Gerald M. (1992) Quality Software Management: Systems Thinking. Dorset House, p. 284.
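To make the percentages in the table above more tangible, they can be converted into hours. The short Python sketch below does this under the assumption of a 40-hour working week, which is my choice for illustration and not part of Weinberg’s table. The function names are likewise made up for this example.

```python
# Rows from the table above:
# (simultaneous projects, percent of time per project, percent lost to switching)
WEINBERG_TABLE = [
    (1, 100, 0),
    (2, 40, 20),
    (3, 20, 40),
    (4, 10, 60),
    (5, 5, 75),
]


def hours_per_project(n_projects, week_hours=40.0):
    """Hours per week that remain for each single project."""
    for projects, share, _loss in WEINBERG_TABLE:
        if projects == n_projects:
            return week_hours * share / 100
    raise ValueError(f"The table has no row for {n_projects} projects")


def total_productive_hours(n_projects, week_hours=40.0):
    """Hours per week spent on actual project work across all projects."""
    return n_projects * hours_per_project(n_projects, week_hours)


if __name__ == "__main__":
    for projects, _share, loss in WEINBERG_TABLE:
        print(
            f"{projects} project(s): "
            f"{hours_per_project(projects):.1f} h each, "
            f"{total_productive_hours(projects):.1f} h productive, "
            f"{loss}% lost to switching"
        )
```

Under that assumption, four parallel projects leave 4 hours per project per week, and five parallel projects leave only 10 productive hours in total.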

Another problem with too many simultaneous projects is the mounting pressure experienced by an employee. Each project has a sponsor awaiting timely results. These people really don’t like getting used to the fact that their project progresses at half a day’s work per week. Instinctively they will try to speed things up by putting some pressure on you. Combining multiple people like this with the continuous struggle to find time to work on any of the projects (see table again) is a great recipe for burnout.

A final problem with broken capacity management is less tangible. I have noticed that for me personally, a lot of the thought process within a project happens subconsciously. This leads to the well-known shower-time epiphanies. However, these don’t happen if I work on too many different things at once.

Question 7: Can I meet my future colleagues?

Your future colleagues will be a big factor when it comes to your job satisfaction. Most places will be fine with you meeting your future colleagues or will even actively endorse it. This makes sense, since your employer benefits from a good team atmosphere, too.

Apart from the interpersonal fit, you will want to find out the seniority level of your peers. It is always good to surround yourself with senior people. If you are a rookie, this is useful because being able to just ask somebody whenever you’re stuck and get a quick explanation of a concept you need but don’t know will vastly accelerate your learning. If you are a senior yourself, this still holds true because different people will know different things and even if you run into problems for which nobody on the team has an answer, you have somebody for a helpful discussion.

Question 8: What are some of your recent data science projects?

Although we have covered a lot of ground with questions one to seven, you can never comprehensively find all the pros and cons of a workplace with a fixed set of questions. This question gives you the chance to fill in some of the gaps. Also, you can find out if the types of projects the team works on are exciting to you or not.

Wrap up

You can never really know what it will be like working at a place until you have spent considerable time actually working there. However, asking the above questions can help you avoid many of the typical pitfalls and increase your chances of finding a job that you will actually like.

If this helped you get more out of your interview or work shadowing or helped you identify a gap in your company’s data science environment, please let me know at mail@haveagreatdata.com. I’d be happy to hear from you! Also, maybe you have found a topic missing from my article. I’d love to hear about that, too! Finally, be sure to subscribe to my mailing list to get notified when I publish my next article.