Below are my notes for provisioning an R server using the Amazon Linux 2 AMI on AWS. These notes capture my installation process from the week of 23 July 2018.
My goal for this document is to list the reference points where you can find step-by-step setup instructions for this provisioning task. This post also comments on the obstacles I ran into during provisioning and what I needed to do to get past them.
You can find the installation and configuration notes here on the website.
- AWS: Amazon Web Services
- VPC: Virtual Private Cloud
- EC2: Elastic Compute Cloud
- IAM: Identity and Access Management
- AMI: Amazon Machine Image
- DLAMI-DG: Deep Learning AMI Developer Guide
I needed to find workable configurations for modeling machine learning problems by exploring the use of AWS EC2 instances. The source code was written in R and stored as an R Markdown (.Rmd) script.
The baseline performance was defined by running the R script on a Dell Latitude E7450 with an Intel i7-5600U CPU at 2.60GHz, 16GB RAM, and Windows 10. While this can be a decent configuration for ML modeling, occasionally we may need a system with more memory or more CPUs for the tasks at hand.
Background and Prerequisite Information
The following tools and assumptions were present prior to the provisioning of the cloud instance.
- AWS Console with the necessary rights and configuration elements to launch an instance. I had configured a VPC subnet, an IAM role, a security group, and a key pair for setting up the instance.
- AWS Deep Learning AMI Developer Guide, released June 6, 2018
- Web browsers
AWS Configuration Notes
AMI: I performed the following steps using Amazon Linux 2 AMI with an m5.large general-purpose instance.
VPC: This exercise requires only a subnet that is accessible via the Internet.
Security Group: I configured the security group to allow only TCP port 22 from any IP address because I had planned to use an SSH tunnel to access the R server.
IAM Role: I assign all my AWS instances to an IAM role by default. For this exercise, an IAM role is not critical.
Key Pair: I attached the instance to an existing key pair. The key pair is necessary to access the instance via the SSH protocol.
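The console settings above could also be expressed with the AWS CLI. The following is a minimal sketch under stated assumptions, not the procedure I followed: every ID and name is a hypothetical placeholder, and the script only prints the commands so that you can review them before running anything against your account.

```shell
#!/bin/sh
# Hypothetical placeholders: substitute the IDs from your own account.
AMI_ID="ami-xxxxxxxx"        # Amazon Linux 2 AMI ID for your region
SUBNET_ID="subnet-xxxxxxxx"  # Internet-accessible subnet in the VPC
SG_ID="sg-xxxxxxxx"          # security group for the instance
KEY_NAME="my-key-pair"       # name of an existing key pair

# Print the rule that opens TCP port 22 to any IP address.
ingress_cmd() {
  printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port 22 --cidr 0.0.0.0/0\n' "$SG_ID"
}

# Print the command that launches one m5.large instance.
launch_cmd() {
  printf 'aws ec2 run-instances --image-id %s --instance-type m5.large --subnet-id %s --security-group-ids %s --key-name %s --count 1\n' \
    "$AMI_ID" "$SUBNET_ID" "$SG_ID" "$KEY_NAME"
}

ingress_cmd
launch_cmd
```

Copy a printed line into a shell that has the AWS CLI configured once the placeholders have been replaced with real values.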
Provision an instance with the Amazon Linux 2 AMI
Step 1) Create and launch the instance. I used an m5.large instance as the starting point.
Step 2) Install R base package (R3.4 as of this writing).
$ sudo amazon-linux-extras install R3.4
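A quick check after this step confirms that R is on the PATH. This is only a sketch of one way to verify the install; the helper name `r_status` is my own and is not part of R or the extras tooling.

```shell
#!/bin/sh
# r_status is a hypothetical helper name used for illustration.
r_status() {
  if command -v Rscript >/dev/null 2>&1; then
    # R.version.string is a built-in R variable holding the version text.
    Rscript -e 'cat(R.version.string, "\n")'
  else
    echo "Rscript not found on PATH"
  fi
}

r_status
```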
Step 3) Install RStudio Server with the following command, run from the directory that holds the downloaded RPM file. Check www.rstudio.org for the latest release of the server.
$ sudo yum install rstudio-server-rhel-1.1.456-x86_64.rpm
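The install command expects the RPM file in the current directory. The sketch below prints a download, install, and verify sequence for a given version. The download host `download2.rstudio.org` is an assumption on my part and may have changed, so confirm the link on the RStudio download page first. `rpm_name` is a hypothetical helper that follows the file-name pattern of the RPM used in this step.

```shell
#!/bin/sh
VERSION="1.1.456"                         # release used in these notes
BASE_URL="https://download2.rstudio.org"  # ASSUMPTION: verify before use

# Build the RPM file name for a given RStudio Server version.
rpm_name() {
  printf 'rstudio-server-rhel-%s-x86_64.rpm\n' "$1"
}

RPM="$(rpm_name "$VERSION")"

# Print the commands for review instead of running them.
echo "wget $BASE_URL/$RPM"
echo "sudo yum install -y $RPM"
echo "sudo rstudio-server verify-installation"
```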
Step 4) Configure the client workstation to connect to the R server. I configured my Windows workstation to connect to the R server using an SSH tunnel. The DLAMI-DG document has a write-up on how to do this for Windows, Linux, and MacOS clients (pages 15-20).
See the PuTTY screenshot below for configuring an SSH tunnel.
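On a Linux or macOS client, an OpenSSH local port forward does the same job as the PuTTY tunnel. The sketch below prints such a command. The key path and host name are hypothetical placeholders, `ec2-user` is the default login on Amazon Linux 2, and 8787 is the port RStudio Server listens on by default.

```shell
#!/bin/sh
KEY_FILE="$HOME/.ssh/my-key-pair.pem"           # hypothetical key path
HOST="ec2-xx-xx-xx-xx.compute-1.amazonaws.com"  # hypothetical public DNS name
PORT=8787                                       # RStudio Server default port

# -N: run no remote command; -L: forward local PORT to PORT on the instance.
tunnel_cmd() {
  printf 'ssh -i %s -N -L %s:localhost:%s ec2-user@%s\n' \
    "$KEY_FILE" "$PORT" "$PORT" "$HOST"
}

tunnel_cmd
```

While the printed command is running in a terminal, traffic sent to port 8787 on the workstation travels through the SSH connection to the same port on the instance, which is why the security group only needs port 22 open.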
Step 5) Install Git.
$ sudo yum install -y git
Step 6) Add a user to access the R server.
$ sudo useradd rstudio
$ echo rstudio:rstudio | sudo chpasswd
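The password above matches the user name, which is easy to guess. With only port 22 open the exposure is limited, but a random password costs little. A minimal sketch, assuming a 20-character alphanumeric password is acceptable; `gen_password` is a hypothetical helper name.

```shell
#!/bin/sh
# Draw 20 alphanumeric characters from the kernel's random source.
gen_password() {
  LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom | head -c 20
}

PW="$(gen_password)"
echo "Generated password for rstudio: $PW"

# To apply it on the instance (left commented so the sketch is safe to run):
# echo "rstudio:$PW" | sudo chpasswd
```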
Step 7) Start a browser on the workstation running the SSH tunnel and point it to the URL http://localhost:8787. A login screen should appear.
Step 8) Go to the Terminal tab and run the “git clone” commands to copy my R scripts from GitHub to the cloud server. Locate the R script and run it.
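The commands typed into the Terminal tab look roughly like the sketch below. The repository URL and script name are hypothetical placeholders, and rendering assumes the `rmarkdown` package is already installed in R.

```shell
#!/bin/sh
REPO_URL="https://github.com/your-user/your-repo.git"  # hypothetical repository
SCRIPT="analysis.Rmd"                                  # hypothetical script name

# Derive the directory that git clone creates from the repository URL.
repo_dir() {
  basename "$1" .git
}

# Print the commands for review instead of running them.
echo "git clone $REPO_URL"
echo "cd $(repo_dir "$REPO_URL")"
# rmarkdown::render() knits the .Rmd file into its output document.
echo "Rscript -e 'rmarkdown::render(\"$SCRIPT\")'"
```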
There you have it! A working R server on an AWS cloud instance that you can access via a secure protocol. Now install your favorite packages and let the scripts run.
Compared to a client workstation, the right type of cloud instance can help our modeling effort. For anyone attempting a similar installation, I hope these instructions help in some way. My next step is to further automate the instance creation with a CloudFormation script. I will write down what I run into and share my findings later.