How to Create a Reliable Python Environment Using Docker: Tips for Data Science

It’s really important to work and develop your code in a trusted and isolated environment—an environment that’s easy to set up and lets you start working on your project right away.

Docker: Key terms and definitions

What’s Docker?

Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. Docker simplifies and accelerates your workflow while giving you the freedom to innovate with your choice of tools, application stacks, and deployment environments for each project.

Containers?

Containers are similar to virtual machines, but they don’t carry a full OS, only the libraries and other dependencies needed to run the project. Multiple containers can run on the same machine and share its OS kernel, each running as an isolated process in user space. Containers also take up less space than VMs (container images are typically tens of MBs in size).

Why Docker?

Docker packages your code together with its dependencies, so your software behaves the same way regardless of where it’s deployed.

In this tutorial, I’ll simply show you a way to manage your projects/environments with Docker, even if you’re not familiar with Docker!

If this is your first time using Docker, download Docker Community Edition.

I prefer to have an isolated environment for every project I’m working on, so it’s very important for me to quickly create a ready-to-go environment that has two core components:

Baseline Dependencies: These are dependencies that I use in almost every project, e.g. Python, NumPy, Pandas, and Jupyter Notebooks.
Specific Dependencies: Dependencies for a specific, single project.

To do this, I create a main Docker image packaged with all the baseline dependencies that I need to start a project. Then, to initialize an environment for a new project, I just run the image and work directly in the resulting container. Later, I can commit that container if needed, which generates a new Docker image from the container I’m working on. I can also stop and start the container whenever I want.
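
Committing a container, as described above, can be sketched as follows. This is a minimal example, assuming a container named containerName already exists; the image name proj1-snapshot is hypothetical. The script prints the command and only runs it when Docker is installed and the container is found.

```shell
# Illustrative names: replace them with your own container and image names
CONTAINER_NAME="containerName"     # existing container to snapshot
SNAPSHOT_IMAGE="proj1-snapshot"    # hypothetical name for the new image

# docker commit <container> <image> creates an image from a container
set -- commit "$CONTAINER_NAME" "$SNAPSHOT_IMAGE"
echo "docker $*"

# Run the command only if Docker is available and the container exists
if command -v docker >/dev/null 2>&1 \
    && docker container inspect "$CONTAINER_NAME" >/dev/null 2>&1; then
    docker "$@"
fi
```

The resulting image can then be used as the starting point for another container, in the same way as the baseline image.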

Here’s how to do it:

Step 1:
Start by creating a Dockerfile and a requirements.txt file containing all the baseline dependencies you’ll need.

This is the requirements.txt I use:
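
A minimal sketch of such a file, built only from the baseline dependencies named earlier. The unpinned versions and the choice of the notebook package for Jupyter are assumptions, not the author’s exact list. Python itself is not listed, since it comes from the Docker image rather than from pip.

```shell
# Write a baseline requirements.txt into the current directory
cat > requirements.txt <<'EOF'
numpy
pandas
notebook
EOF

# Show the result
cat requirements.txt
```

Pinning versions (for example numpy==<version>) makes rebuilds more reproducible, at the cost of updating the pins by hand.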

And this is the Dockerfile I use:
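
A minimal sketch of what such a Dockerfile might look like. The base image tag python:3.11-slim and the /workspace directory are assumptions chosen for illustration, not the author’s original file.

```shell
# Write a baseline Dockerfile into the current directory
cat > Dockerfile <<'EOF'
# Base image: the tag selects the Python version (3.11 is an assumption)
FROM python:3.11-slim

# Working directory inside the container (hypothetical path)
WORKDIR /workspace

# Install the baseline dependencies into the image
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# Port used by Jupyter
EXPOSE 8888

# Start in a shell so the container can be used interactively
CMD ["bash"]
EOF

# Show the result
cat Dockerfile
```

Copying requirements.txt before anything else lets Docker reuse the cached pip layer as long as the dependency list is unchanged.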

Of course, you’re free to customize these files to your own preferences; for example, if you want to use a different Python version, such as 2.7, you can change it in the Dockerfile.

Step 2:
From the directory that contains both files, build the baseline image using this command:
docker build -t proj-baseline .

Step 3:
Start a new project—or in other words, run a new container on which you can develop your project.

First, you have to create a new directory for the project’s workspace. For example, I’ll create a new folder for my project’s workspace with the path /home/maher/Desktop/proj1.

Use this command to run the container:
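
A minimal sketch of such a command, assembled from the flags explained in this step. The mount point /workspace inside the container and the trailing bash are assumptions; the other values come from the earlier steps. The script prints the command and only starts the container when Docker is installed and a terminal is attached.

```shell
# Illustrative values: adjust them to your own project
HOST_DIR="/home/maher/Desktop/proj1"   # workspace folder on the host
CONTAINER_DIR="/workspace"             # hypothetical mount point in the container
CONTAINER_NAME="containerName"         # name used later with docker start/exec
IMAGE="proj-baseline"                  # image built in Step 2

# Assemble the arguments: interactive terminal, port mapping, name, volume
set -- run -it \
    -p 8888:8888 \
    --name "$CONTAINER_NAME" \
    -v "$HOST_DIR:$CONTAINER_DIR" \
    "$IMAGE" bash

# Show the full command
echo "docker $*"

# Start the container only when Docker is installed and a terminal is attached
if command -v docker >/dev/null 2>&1 && [ -t 0 ]; then
    docker "$@"
fi
```

Typed by hand, this is the single line: docker run -it -p 8888:8888 --name containerName -v /home/maher/Desktop/proj1:/workspace proj-baseline bash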

The -p 8888:8888 option maps port 8888 on your host machine to port 8888 inside the container, so that you can open Jupyter Notebooks in your browser.

The --name containerName option names the container, so that we can later start it using this name.

The -v [path1]:[path2] option mounts path1 (the path of your workspace on the host) at path2 inside the container, so files saved in either location appear in both.

Step 4 (Close the container):
After you’re done working, stop the container by typing exit in its terminal.

Step 5 (Open the container):
To reopen a terminal in the container and continue your work, type this command: docker start -ai containerName

Step 6 (Open new terminal):
To open additional terminals inside the running container, you can use this command: docker exec -it containerName bash

Open a Jupyter Notebook:
To use a Jupyter Notebook inside the container and show it in your browser, run jupyter notebook --ip 0.0.0.0 --allow-root inside the container. Jupyter then prints its startup log in the terminal, including a URL that contains a login token.

Copy the token (for example, 573dfddaad7e15420048b63a247e9a75d7c9ab55d99611fc) from that URL.

Now open your browser, type the URL localhost:8888/tree, and paste the token in the text input labeled “Password or token”.
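
If you’d rather not pick the token out of the URL by hand, shell parameter expansion can do it. A minimal sketch, assuming the URL has the form http://127.0.0.1:8888/?token=<token>; the exact host and path may differ between Jupyter versions.

```shell
# Example URL as printed by Jupyter (host and path are assumptions)
url="http://127.0.0.1:8888/?token=573dfddaad7e15420048b63a247e9a75d7c9ab55d99611fc"

# Remove everything up to and including "token=" to keep only the token
token="${url#*token=}"
echo "$token"
```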

You can now create completely isolated environments and work on each one of them separately. From here you can expand your use of Docker and learn how to deploy your work to the cloud, either to expand your computational resources or to deploy your work on a production server.

Have a nice time Making your Machine CREATIVE.
