Data Science Use Case Evaluation

This guide provides a three-step process for evaluating and prioritizing data science use cases. First, information about the projects is gathered to enable informed decision-making. Then, obviously bad candidates are delisted, after which the remaining use cases are prioritized.

Understanding Use Cases

To understand a use case well enough to judge its feasibility and prioritize it, some information has to be gathered. Asking a set of guiding questions facilitates a discussion that aims to turn use cases from vague ideas into concrete project concepts.

Step-by-Step: Creating a Docker Image for Your Data Science Project

Docker is now widely used in software development and, together with Kubernetes, has become the de facto standard for building scalable systems in the cloud. Sooner or later, many data scientists need Docker to deploy their projects, or at least become curious enough to try it out. This article shows you, step by step, how to create a Docker image from your conda environment. We assume you have already installed Docker; beyond that, the article should be fairly self-contained.
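To make the idea more concrete, here is a minimal sketch (not necessarily the approach the article takes) that renders a Dockerfile for a conda-based project from Python. It assumes the environment has been exported to a file named environment.yml (for example with `conda env export --no-builds > environment.yml`), that the `name:` entry in that file matches the hypothetical `env_name` argument, and that the project's entry point is a script such as the hypothetical main.py. The continuumio/miniconda3 base image is one common choice, not the only one.

```python
from string import Template

# Hypothetical template. Assumptions: the conda environment was exported to
# environment.yml, its `name:` matches env_name, and the project is started
# by running a single Python script (entrypoint).
DOCKERFILE_TEMPLATE = Template("""\
FROM continuumio/miniconda3

WORKDIR /app

# Build the conda environment first, so Docker can cache this layer
# for as long as environment.yml does not change.
COPY environment.yml .
RUN conda env create -f environment.yml

# Copy the rest of the project afterwards.
COPY . .

# Run the entry point inside the conda environment.
CMD ["conda", "run", "--no-capture-output", "-n", "$env_name", "python", "$entrypoint"]
""")


def render_dockerfile(env_name: str, entrypoint: str = "main.py") -> str:
    """Return the text of a Dockerfile for a project based on a conda environment."""
    return DOCKERFILE_TEMPLATE.substitute(env_name=env_name, entrypoint=entrypoint)


if __name__ == "__main__":
    # Print the result; redirect it to a file named Dockerfile to use it.
    print(render_dockerfile("my-env"))
```

Copying environment.yml before the rest of the code is a deliberate choice: Docker caches each layer, so the slow environment build only reruns when the dependencies change, not on every code edit.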

How I Manage Python Dependencies in Data Science Projects

The sheer size of the Python package management ecosystem can be daunting, not only to beginners but also to Python veterans, as the number of available package managers seems to be ever-increasing. Should you go with the default pip and its requirements.txt file, or should you use Pipenv and a corresponding Pipfile? Maybe you should go with the modern Poetry and a pyproject.toml. Or would you rather use the data-science-specific Conda and an environment file?
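Each of these options leaves a characteristic file in the project, so a small sketch can guess which manager a project uses from those marker files. It is a heuristic under stated assumptions: `detect_dependency_manager` and its precedence order are hypothetical, environment.yml is Conda's conventional file name, and because pyproject.toml is also used by other build tools, it is only attributed to Poetry when it contains a `[tool.poetry` table.

```python
from pathlib import Path

# Hypothetical precedence: more specific tools are checked first, because a
# Conda or Pipenv project may also ship a requirements.txt for convenience.
MARKERS = [
    ("environment.yml", "conda"),
    ("environment.yaml", "conda"),
    ("Pipfile", "pipenv"),
    ("pyproject.toml", "poetry"),
    ("requirements.txt", "pip"),
]


def detect_dependency_manager(project_dir):
    """Guess the dependency manager of a project from its marker files."""
    root = Path(project_dir)
    for filename, tool in MARKERS:
        path = root / filename
        if not path.is_file():
            continue
        # pyproject.toml is shared by many build tools, so only count it as
        # Poetry when it contains a [tool.poetry...] table.
        if filename == "pyproject.toml" and "[tool.poetry" not in path.read_text(encoding="utf-8"):
            continue
        return tool
    return None  # no known marker file found
```

For a folder that contains only a requirements.txt, the function returns "pip"; for a folder with no marker file at all, it returns None.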

Podcast Appearance: Coding With Holger

I recently had the honor of being invited as a guest on Holger Steinhauer’s podcast Coding with Holger. As an avid podcast listener myself, I was very excited about this opportunity. In the episode, we chat about all things data science: its definition, technologies, testing in data science and much more. If you want to check it out, give the episode a listen.

Why You Don’t Want to Use CSV Files

“Microsoft has incompatible versions of CSV files between its own applications, and in some cases between different versions of the same application (Excel being the obvious example here).” (Eric S. Raymond)

We have all been there: you are assigned an exciting new data science project, you talk to your enthusiastic clients, generate grand ideas and visions, and can’t wait to sit down and write the first lines of code. And then it happens.

Eight Questions You Might Want to Ask in Your Next Data Science Interview

Perhaps you have your next data science interview lined up and don’t know what to ask when it is your turn, or you are preparing for a day of work shadowing and want to make the most of it. Either way, you don’t want to waste your chance to ask the right questions and get a good picture of what to expect from your potential future job.

Will Data Science Be Automated?

If you are a data scientist or plan on becoming one, you probably know that brief sensation of unease in your stomach every time you see somebody on the internet claim that data science is ripe for automation. Just yesterday, in a discussion on Hacker News, I read the following question directed at a data scientist: “Can you please describe what part of your job CANNOT be automated?” Although it was meant as a serious question, I feel it has a threatening ring to it.