I have been using Docker to create environments for data science work. With Docker, I can build those environments painlessly and reproduce them consistently. After using Docker for environment creation, it is hard to imagine doing it any other way.
For more information, I encourage you to check out the Anaconda images and this blog post about using Anaconda and Docker.
Step 1: Create VM and update OS as necessary
I created a virtual machine on VMware running CentOS 7 and made it accessible through bridged networking. I used CentOS’ minimal installation because Docker needs only the basic components to run. We will need to access the VM via SSH, and the port for the Jupyter notebook instance (8080 in my case) must be open.
Step 2: Access the VM via SSH (as a non-root user called docker_admin) and install Git with the sudo command.
Step 3: Install Docker for the non-root docker_admin user. Verify the installation with the command "docker image ls".
More information on installing Docker CE can be found here and here.
It boils down to:
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum -y install docker-ce
sudo systemctl start docker && sudo systemctl enable docker
sudo usermod -aG docker docker_admin
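One thing to watch for after the usermod command: the new group membership only applies to login sessions started afterwards, so "docker image ls" can fail with a permission error until docker_admin logs out and back in. As a rough sketch, the check could be wrapped in a small helper like the one below. The function name check_docker is hypothetical, not part of Docker.

```shell
# Hypothetical helper (not part of Docker): report whether the current
# user can talk to the Docker daemon without sudo.
check_docker() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker CLI not found; is docker-ce installed?"
        return 1
    fi
    # id -nG lists the group names of the current session
    if ! id -nG | tr ' ' '\n' | grep -qx docker; then
        echo "not in the docker group yet; log out and back in"
        return 1
    fi
    if docker image ls >/dev/null 2>&1; then
        echo "docker is usable"
    else
        echo "cannot reach the Docker daemon; is the service running?"
        return 1
    fi
}
```

Run check_docker in a fresh SSH session as docker_admin; a non-zero exit status means one of the three checks failed.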
Step 4: For my environments, I need to clone some Python template scripts. This step is optional if you do not need them.
git clone https://github.com/daines-analytics/template-latest.git examples
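For my environments, I also need to make some environment variables accessible to the scripts, so I copy a file of variables from my local machine to the VM. Again, this step may not be necessary for your installation; adjust the user name and destination to match your VM.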
scp docker_env.txt cloud_user@<IP_Address>:/home/cloud_user
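For readers who have not used an environment file before, here is a minimal sketch of what one can look like. Docker's --env-file option reads plain NAME=value lines, skips lines starting with "#", and takes values literally (quotes are not stripped). The variable names below are made up for illustration, and check_env_file is a hypothetical helper that is stricter than Docker itself.

```shell
# Write a sample file to a temporary location so that a real
# docker_env.txt is not overwritten. Variable names are hypothetical.
env_file="${TMPDIR:-/tmp}/docker_env.example.txt"
cat > "$env_file" <<'EOF'
# Lines starting with '#' are ignored by --env-file
MAIL_SERVER=smtp.example.com
NOTIFY_STATUS=TRUE
EOF

# Hypothetical helper: print any line that is not blank, a comment,
# or NAME=value, and fail if such a line exists.
check_env_file() {
    if grep -Evn '^(#.*|[[:space:]]*|[A-Za-z_][A-Za-z0-9_]*=.*)$' "$1"; then
        return 1
    fi
    echo "ok: $1"
}

check_env_file "$env_file"
```

Inside the container, the scripts can then read these values as ordinary environment variables.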
Step 5: Create the Dockerfile or use the one from the template directory
FROM continuumio/anaconda3
LABEL com.dainesanalytics.anaconda.version=v1.0
EXPOSE 8080
RUN conda install -c conda-forge -y --freeze-installed imbalanced-learn xgboost
RUN useradd -ms /bin/bash dev_user
USER dev_user
WORKDIR /home/dev_user
COPY --chown=dev_user:dev_user examples/ /home/dev_user
CMD /opt/conda/bin/jupyter notebook --ip=0.0.0.0 --port=8080 --no-browser --notebook-dir=/home/dev_user
Step 6: Build the Docker image with the command:
docker image build -t anaconda3/nonroot:v1 .
Step 7: Run the Docker container with the command:
docker container run --rm --env-file docker_env.txt -p 8080:8080 --name jupyter-server anaconda3/nonroot:v1
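The command above runs in the foreground, so the Jupyter startup messages, including the login URL, appear directly in the terminal. If you would rather keep the terminal free, one possible variation is to start the container detached with -d and read the URL from the logs. This is only a sketch: the two function names are hypothetical, and it assumes the notebook server in the image uses its default token authentication and prints a URL containing "token=".

```shell
# Hypothetical helper: same run command as above, plus -d to detach.
start_jupyter() {
    docker container run -d --rm --env-file docker_env.txt \
        -p 8080:8080 --name jupyter-server anaconda3/nonroot:v1
}

# Hypothetical helper: show the log lines that carry the login token.
# Jupyter writes its log to stderr, hence the 2>&1.
show_jupyter_url() {
    docker container logs jupyter-server 2>&1 | grep 'token='
}

# Usage on the VM (give the server a few seconds to start):
#   start_jupyter && sleep 5 && show_jupyter_url
```

The URL in the log uses the container's own host name; replace it with the VM's IP address when opening it in a browser.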
Step 8: After we are done with the container and/or the virtual machine, we can shut down the container with the command:
docker container stop [container ID]
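Because the container was started with --name jupyter-server, the name can be used in place of the container ID. A small sketch of that, with a hypothetical stop_jupyter helper that first checks whether a matching container is running:

```shell
# Hypothetical helper: stop the named container if it is running.
# Note that the name filter matches substrings, so keep names distinct.
stop_jupyter() {
    name="${1:-jupyter-server}"
    id=$(docker container ls --quiet --filter "name=$name" 2>/dev/null || true)
    if [ -z "$id" ]; then
        echo "no running container named $name"
        return 0
    fi
    docker container stop "$name"
}

# Usage on the VM:
#   stop_jupyter
```

Since the container was started with --rm, Docker removes it automatically once it stops.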
The templates (Python and Docker) can be found here on GitHub.