Provisioning AWS Ubuntu-based EC2 for Python Deep Learning Projects

Below are my notes for provisioning a Python3/Jupyter server using AWS’ Ubuntu AMI. These notes captured my installation process during the week of 25 November 2019.

My goal for this document is to list the various steps required to create a GPU-equipped Linux instance in AWS that can be used for machine/deep learning projects in Python.

Key Abbreviations

  • AWS: Amazon Web Services
  • VPC: Virtual Private Cloud
  • EC2: Elastic Compute Cloud
  • IAM: Identity and Access Management
  • AMI: Amazon Machine Image


I needed to find a workable configuration for modeling deep learning problems using a GPU-equipped Linux EC2 instance in AWS. The Linux instance needs to support Python 3, Jupyter Notebook/Lab, and TensorFlow 2.0.

AWS does offer deep learning AMIs that can support Python 3, Jupyter, and TensorFlow 1.x. I used this exercise to build a customized Ubuntu Linux instance with TensorFlow 2.0.

Background and Prerequisite Information

The following tools and assumptions were present before the provisioning of the cloud instance.

  • AWS Console with the necessary rights and configuration elements to launch an instance. I had configured a VPC subnet, an IAM role, a security group, and a key pair for setting up the instance.
  • A web browser
  • An SSH client

AMI: I performed the following steps using the Ubuntu Server 18.04 LTS x86 AMI (ami-04b9e92b5572fa0d1).

EC2: I used the GPU instance p2.xlarge with 4 CPUs and 61 GiB of memory.

VPC: This exercise requires only a subnet that is accessible via the Internet.

Security Group: I configured the security group to allow only TCP port 22 from any IP address because I had planned to use an SSH tunnel to access the Jupyter server.

IAM Role: I assign all my AWS instances to an IAM role by default. For this exercise, an IAM role is not critical.

Key Pair: I attached the instance to an existing key pair. The key pair is necessary to access the instance via the SSH protocol.

Additional Reference: TensorFlow > Install > GPU Support

Provisioning the AWS instance

Step 1) Create and launch the instance. Access the instance with the ssh command (with tunneling enabled for port 8888):

ssh -i ~/.ssh/da-ml-keypair.pem -L localhost:8888:localhost:8888 ubuntu@<instance IP>
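As an optional sanity check, a short Python sketch can confirm that the local end of the tunnel (port 8888, as mapped above) is reachable once the Jupyter server is running:

```python
import socket

def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Prints True once the tunnel and the Jupyter server are both up,
# False otherwise.
print(is_port_open("localhost", 8888))
```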

Step 2) Update the Ubuntu installation.

sudo apt-get update

sudo apt-get upgrade -y

Step 3) Add the NVIDIA package repositories. The repository .deb packages and signing key below are the ones listed in the TensorFlow GPU install guide for Ubuntu 18.04; download each .deb with wget before installing it.

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb

sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb

sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub

sudo apt-get update

wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb

sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb

sudo apt-get update

Step 4) Install the NVIDIA driver (version 440 as of this writing).

sudo apt-get install -y nvidia-driver-440

Step 5) Reboot the instance and ssh back into it. Run the nvidia-smi command to check that GPUs are visible.
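If you want to check the driver version from a script rather than by eyeballing the nvidia-smi banner, its header line can be parsed. This is a rough sketch; the sample banner below is illustrative, not captured from the instance:

```python
import re

def parse_driver_version(smi_output):
    """Extract the driver version from nvidia-smi banner text."""
    match = re.search(r"Driver Version:\s*([\d.]+)", smi_output)
    return match.group(1) if match else None

# Illustrative banner line; the real text would come from running
# nvidia-smi, e.g. via subprocess.run(["nvidia-smi"], ...).stdout
sample = "| NVIDIA-SMI 440.33.01   Driver Version: 440.33.01   CUDA Version: 10.2 |"
print(parse_driver_version(sample))  # 440.33.01
```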

Step 6) Install CUDA development and runtime libraries. (Large install with ~4GB of files)

sudo apt-get install -y cuda-10-0 libcudnn7 libcudnn7-dev

Step 7) Verify Python installation and install/upgrade PIP as necessary. TensorFlow 2 packages require a pip version >19.0.

sudo apt install -y python3-pip

sudo pip3 install --upgrade pip
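To double-check the pip requirement (TensorFlow 2 needs pip >19.0), a small version comparison works, assuming plain dot-separated numeric versions:

```python
def version_tuple(version):
    """Turn a dotted version string like '19.3.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

def pip_is_new_enough(version, minimum="19.0"):
    """True if version is strictly greater than minimum."""
    return version_tuple(version) > version_tuple(minimum)

# `pip3 --version` reports something like "pip 19.3.1 from ...";
# pass just the number here.
print(pip_is_new_enough("19.3.1"))  # True
print(pip_is_new_enough("9.0.1"))   # False
```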

Step 8) Install the necessary Python machine/deep learning modules.

sudo pip3 install numpy scipy matplotlib ipython sympy pandas jupyterlab

sudo pip3 install PyMySQL imbalanced-learn xgboost scikit-learn statsmodels

sudo pip3 install seaborn pmdarima lxml html5lib requests beautifulsoup4

sudo pip3 install tensorflow tensorflow-gpu keras
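After the installs finish, a quick way to confirm that everything is importable, without starting each package up, is to probe with importlib. Note that some import names differ from the pip package names above (scikit-learn imports as sklearn, beautifulsoup4 as bs4, PyMySQL as pymysql); the module list here is abbreviated:

```python
import importlib.util

def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# An empty list means all of these are installed.
print(missing_modules(["numpy", "pandas", "sklearn", "bs4", "tensorflow"]))
```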

Step 9) Start up IPython and verify the TensorFlow and GPU installations.

In [1]: import tensorflow as tf

In [2]: tf.__version__

In [3]: tf.config.experimental.list_physical_devices()

The physical device types of CPU and GPU (not just XLA_CPU and XLA_GPU) should appear.
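The same check can be wrapped in a small helper so that scripts fail fast when no real GPU is visible. This is a sketch; the filtering mirrors the point above that XLA_GPU alone does not count:

```python
def has_real_gpu(device_types):
    """True if 'GPU' (not just 'XLA_GPU') appears among the device types."""
    return "GPU" in device_types

# With TensorFlow available, the types would come from:
#   types = [d.device_type for d in tf.config.experimental.list_physical_devices()]
print(has_real_gpu(["CPU", "XLA_CPU", "XLA_GPU"]))        # False
print(has_real_gpu(["CPU", "XLA_CPU", "GPU", "XLA_GPU"]))  # True
```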

Step 10) Start up the Jupyter Notebook/Lab server.

jupyter lab --no-browser

Step 11) Start a browser session to access the Jupyter Notebook/Lab server in AWS. Run the web browser on the same workstation where the ssh tunneling session was started, and open the http://localhost:8888 URL (including the access token) that the Jupyter server prints at startup.

There you have it! A working Jupyter server on an AWS cloud instance that you can access via a secured protocol. Now load up your favorite scripts and let them run.

Step 12) Shut down the unused GPU instance to save money.