Using Docker to Build Data Science Environments with Anaconda

I have been using Docker to create environments for data science work. With Docker, I can create these environments painlessly and get the same, consistent result every time. Having used Docker for environment creation, I find it hard to imagine doing it any other way.

For more information, I encourage you to check out the Anaconda images and this blog post about using Anaconda and Docker.

Step 1: Create VM and update OS as necessary

I created a virtual machine on VMware using CentOS 7 and made it accessible through bridged networking. I used the CentOS minimal installation, as the VM only needs the basic components to run Docker. We will need to access the VM via SSH, and the port for the Jupyter notebook instance (8080 in my case) must be open.

Step 2: Access the VM via SSH (as a non-root user called docker_admin) and install Git with the sudo command.

Step 3: Install Docker for the non-root docker_admin user. Verify the installation with the command “docker image ls”. Log out and back in first so that the new docker group membership takes effect.

More information on installing Docker CE can be found here and here.

It boils down to:

sudo yum install -y yum-utils device-mapper-persistent-data lvm2
sudo yum-config-manager --add-repo \
  https://download.docker.com/linux/centos/docker-ce.repo
sudo yum -y install docker-ce
sudo systemctl start docker && sudo systemctl enable docker
sudo usermod -aG docker docker_admin
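
Before moving on, it can help to confirm that the install and the group change both took effect. The following is a minimal sketch, not part of the original setup; the helper name user_in_group is hypothetical, and the script only reports status without changing anything.

```shell
# Return 0 if user $1 belongs to group $2 (hypothetical helper for this post).
user_in_group() {
  # id -nG prints the user's group names separated by spaces
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF "$2"
}

# Is the docker CLI installed and on PATH?
if command -v docker >/dev/null 2>&1; then
  echo "docker CLI found: $(docker --version)"
else
  echo "docker CLI not found on PATH"
fi

# Did the usermod command add docker_admin to the docker group?
if user_in_group docker_admin docker; then
  echo "docker_admin is in the docker group"
else
  echo "docker_admin is not in the docker group (or the user does not exist)"
fi
```

The group database is updated as soon as usermod runs, but a shell that is already open keeps its old group list, so docker commands without sudo only work after logging out and back in.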

Step 4: For my environments, I need to clone some Python template scripts. Skip this step if you do not need them.

git clone https://github.com/daines-analytics/template-latest.git examples

For my environments, I also need to make some environment variables accessible to the scripts. Again, this step may not be mandatory for your installation.

scp docker_env.txt docker_admin@<IP_Address>:/home/docker_admin
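
The file passed to Docker's --env-file option (used in Step 7) is plain text with one VARIABLE=value pair per line. Here is a minimal sketch of creating one on the machine you copy it from. The variable names and values are placeholders of my own, not the ones the template scripts actually read.

```shell
# Path of the file to create; defaults to the name used in this post.
ENV_FILE="${ENV_FILE:-docker_env.txt}"

# Write one VARIABLE=value pair per line. Values are used verbatim,
# so do not wrap them in quotes. Lines starting with # are ignored.
cat > "$ENV_FILE" <<'EOF'
# Placeholder variables; replace with what your scripts expect
NOTIFY_EMAIL=someone@example.com
RANDOM_SEED=42
EOF

# The file may hold credentials, so keep it readable by the owner only.
chmod 600 "$ENV_FILE"
```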

Step 5: Create the Dockerfile or use the one from the template directory

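# Base image with the Anaconda distribution preinstalled
FROM continuumio/anaconda3
LABEL com.dainesanalytics.anaconda.version=v1.0
# Port the Jupyter notebook server listens on
EXPOSE 8080
# Add packages without changing the versions already installed in the base image
RUN conda install -c conda-forge -y --freeze-installed imbalanced-learn xgboost
# Create a non-root user and run everything from here on as that user
RUN useradd -ms /bin/bash dev_user
USER dev_user
WORKDIR /home/dev_user
# Copy the template scripts into the user's home directory
COPY --chown=dev_user:dev_user examples/ /home/dev_user
# Start Jupyter on all interfaces so the published port is reachable from outside the container
CMD ["/opt/conda/bin/jupyter", "notebook", "--ip=0.0.0.0", "--port=8080", "--no-browser", "--notebook-dir=/home/dev_user"]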

Step 6: Build the Docker image with the command:

docker image build -t anaconda3/nonroot:v1 .
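
Because the Dockerfile copies examples/ from the build context, the build fails if it is run from a directory that lacks either the Dockerfile or the cloned examples folder. A small pre-flight check, sketched here with a hypothetical helper named build_context_ok, can catch that before Docker starts pulling the base image.

```shell
# Return 0 if directory $1 (default: current directory) contains both
# a Dockerfile and an examples/ folder, as the build in this post expects.
build_context_ok() {
  dir="${1:-.}"
  [ -f "$dir/Dockerfile" ] && [ -d "$dir/examples" ]
}

if build_context_ok .; then
  echo "Build context looks complete; run: docker image build -t anaconda3/nonroot:v1 ."
else
  echo "Missing Dockerfile or examples/ in $(pwd)" >&2
fi
```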

Step 7: Run the Docker container with the command:

docker container run --rm --env-file docker_env.txt -p 8080:8080 \
  --name jupyter-server anaconda3/nonroot:v1
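
Assuming Jupyter's default token authentication is left on, the server prints a URL containing a login token when it starts. The container runs in the foreground, so that URL appears in the same terminal. The sketch below shows one way to pull the token out of the log text; the helper name extract_token and the sample token are made up for illustration, and the pattern assumes the classic notebook log format with ?token= in the URL.

```shell
# Read Jupyter log text on stdin and print the first token found.
extract_token() {
  sed -n 's/.*token=\([0-9a-f]*\).*/\1/p' | head -n 1
}

# Example with a sample log line (this token value is made up):
sample='    http://127.0.0.1:8080/?token=0123abcd4567ef89'
printf '%s\n' "$sample" | extract_token

# Against the live container, from a second SSH session
# (Jupyter logs to stderr, hence the 2>&1):
#   docker container logs jupyter-server 2>&1 | extract_token
```

With the token in hand, the notebook should be reachable from a browser at http://<IP_Address>:8080/?token=<token>, using the VM's address rather than the 127.0.0.1 shown in the log.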

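Step 8: When we are done with the container, or before shutting down the virtual machine, we can stop the container with the command below. The container name, jupyter-server, works in place of the ID. Because the container was started with --rm, Docker removes it once it stops.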

docker container stop [container ID]

The templates (Python and Docker) can be found here on GitHub.