I have been using Docker to create environments for data science work. With Docker, I can create these environments painlessly, and they come out the same way every time. Having used Docker for environment creation, I find it hard to imagine doing it any other way.
Step 1: Create a VM and update the OS as necessary
I created virtual machines on VMware using CentOS 7 and made them accessible through bridged networking. I used the CentOS minimal installation, because Docker only needs the basic components. We will need SSH access to the VM, and port 8787 must be open for the RStudio Server instance.
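Step 1 on a minimal CentOS 7 install can be sketched as the shell functions below. This assumes firewalld, the CentOS 7 default firewall, is running and that the account has sudo rights. The DRY_RUN switch and the function names are hypothetical conveniences for illustration, not part of the original setup.

```shell
#!/usr/bin/env bash
# Sketch for Step 1, assuming firewalld (the CentOS 7 default) is active.
# DRY_RUN and the function names are hypothetical helpers for illustration.
set -euo pipefail

RSTUDIO_PORT=8787

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

prepare_vm() {
  run sudo yum -y update                                              # update the OS packages
  run sudo firewall-cmd --permanent --add-service=ssh                 # keep SSH reachable
  run sudo firewall-cmd --permanent --add-port="${RSTUDIO_PORT}/tcp"  # RStudio Server port
  run sudo firewall-cmd --reload                                      # load the permanent rules
}
```

On the VM, source the file and call prepare_vm; with DRY_RUN=1 it only prints the commands it would run.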
Step 2: Access the VM via SSH as a non-root user called docker_admin (an account with sudo rights), and install Git with the sudo command.
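A minimal sketch of Step 2 follows. The VM address is a placeholder you supply, and docker_admin is assumed to be in the wheel group, which grants sudo on CentOS 7. The function names and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch for Step 2. The VM IP is a placeholder argument; docker_admin is
# assumed to be in the wheel group so that sudo works on CentOS 7.
set -euo pipefail

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# On your workstation: open an SSH session on the VM as docker_admin.
connect_vm() {
  local vm_ip="${1:?usage: connect_vm VM_IP}"
  run ssh "docker_admin@${vm_ip}"
}

# On the VM: install Git and confirm the binary is available.
install_git() {
  run sudo yum -y install git
  run git --version
}
```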
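Step 3: Install Docker and allow the non-root docker_admin user to run it. Once the commands below finish, log out and back in (new group membership only applies to new login sessions), then verify the installation with the command docker image ls.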
It boils down to:
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum -y install docker-ce
sudo systemctl start docker && sudo systemctl enable docker
sudo usermod -aG docker docker_admin
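The verification in Step 3 can be extended into a small check, sketched below under the assumption that it runs as docker_admin in a fresh login session. The helper names are hypothetical; systemctl is-active --quiet and id -nG are standard commands on CentOS 7.

```shell
#!/usr/bin/env bash
# Sketch of post-install checks for Step 3. Run as docker_admin in a fresh
# login session: group changes from usermod apply only to new sessions.
set -euo pipefail

# Succeed if the space-separated group list in $1 contains the group name $2.
has_group() {
  [[ " $1 " == *" $2 "* ]]
}

check_docker_setup() {
  if ! has_group "$(id -nG)" docker; then
    echo "not in the docker group yet; log out and back in" >&2
    return 1
  fi
  if ! systemctl is-active --quiet docker; then
    echo "the docker service is not running" >&2
    return 1
  fi
  docker image ls   # the verification command from Step 3
}
```

Matching on whole, space-delimited names keeps has_group from confusing docker with similarly named groups such as dockerroot.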
Step 4: For my environments, I need to clone some R template scripts into the directory that will hold the Dockerfile, because the Dockerfile copies the examples directory into the image. This step is optional.
git clone https://github.com/daines-analytics/template-latest.git examples
For my environments, I also need to make some environment variables available to the scripts through a .Renviron file. Again, this step may be optional for your installation.
scp .Renviron docker_admin@<IP_Address>:/home/docker_admin
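If you do not have a .Renviron yet, the sketch below creates one. R reads NAME=value pairs from ~/.Renviron when a session starts, which is how the scripts inside the container pick up the variables. MAIL_SENDER and MAIL_RECIPIENT are hypothetical placeholders; substitute whatever names your own scripts read with Sys.getenv().

```shell
#!/usr/bin/env bash
# Sketch for Step 4: create a .Renviron in the directory that will hold the
# Dockerfile. MAIL_SENDER and MAIL_RECIPIENT are hypothetical placeholders;
# use the names your own scripts read with Sys.getenv().
set -euo pipefail

write_renviron() {
  local target="${1:-.Renviron}"
  # One NAME=value pair per line; R reads ~/.Renviron when a session starts.
  cat > "$target" <<'EOF'
MAIL_SENDER=sender@example.com
MAIL_RECIPIENT=recipient@example.com
EOF
  chmod 600 "$target"   # the file may hold credentials, so keep it private
}
```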
Step 5: Create a Dockerfile, or use the one from the template directory
FROM rocker/verse
LABEL com.dainesanalytics.rstudio.version=v1.0
RUN Rscript -e "install.packages(c('knitr', 'tidyverse', 'caret', 'corrplot', 'mailR', 'DMwR', 'ROCR', 'Hmisc', 'randomForest', 'e1071', 'elasticnet', 'gbm', 'xgboost'))"
COPY --chown=rstudio:rstudio .Renviron /home/rstudio
COPY --chown=rstudio:rstudio examples/ /home/rstudio
My environments require many machine learning packages, but you can trim the list to what your own work needs. If you skipped Step 4, remove the two COPY lines as well, because the build fails when a COPY source is missing.
Step 6: Build the Docker image with the command:
docker image build -t rstudio/nonroot:v1 .
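Because a missing COPY source stops the build partway through, a quick pre-build check can save time. The sketch below assumes the build context contains exactly what the Step 5 Dockerfile copies; the function names and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch for Step 6: check that everything the Dockerfile copies is present
# before building. The path list mirrors the Dockerfile in Step 5; drop
# .Renviron or examples if you removed those COPY lines.
set -euo pipefail

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

check_context() {
  local dir="${1:-.}" path missing=0
  for path in Dockerfile .Renviron examples; do
    if [ ! -e "$dir/$path" ]; then
      echo "missing from build context: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}

build_image() {
  local dir="${1:-.}"
  check_context "$dir" || return 1
  run docker image build -t rstudio/nonroot:v1 "$dir"
}
```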
Step 7: Run the Docker container with the command:
docker container run --rm -e PASSWORD=rserver -p 8787:8787 --name rstudio-server rstudio/nonroot:v1
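The command above runs the container in the foreground, which ties up the SSH session. A sketch of the same run with -d (detached) follows, plus a way to read the server's output; the function names and DRY_RUN switch are hypothetical, while the docker flags are the ones from Step 7.

```shell
#!/usr/bin/env bash
# Sketch for Step 7: start the same container in the background (-d) so the
# SSH session stays usable. The function names and DRY_RUN are hypothetical.
set -euo pipefail

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

start_rstudio() {
  local password="${1:?usage: start_rstudio PASSWORD}"
  run docker container run -d --rm -e "PASSWORD=${password}" \
    -p 8787:8787 --name rstudio-server rstudio/nonroot:v1
}

# Show the container's startup output, e.g. to confirm the server started.
show_logs() {
  run docker container logs rstudio-server
}
```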
The password can be any string, but the rocker image requires one to be set. Log in at http://<IP_Address>:8787 with the user name rstudio and this password.
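Step 8: When we are done with the container and/or the virtual machine, we can stop the container with the command below. The container name (rstudio-server) works in place of the ID, and because the container was started with --rm, Docker removes it once it stops.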
docker container stop [container ID]
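A teardown sketch for Step 8 follows, stopping the container by the name assigned in Step 7 and optionally powering off the VM. The function name, its yes/no argument, and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch for Step 8: stop the container by the name given in Step 7 and,
# optionally, power off the VM. Function names and DRY_RUN are hypothetical.
set -euo pipefail

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

teardown() {
  local poweroff="${1:-no}"
  run docker container stop rstudio-server   # --rm then removes the container
  if [ "$poweroff" = yes ]; then
    run sudo systemctl poweroff               # shut down the VM as well
  fi
}
```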
The templates (R and Docker) can be found on GitHub.