Running RStudio Server on Amazon EC2

I recently purchased an ASUS Chromebook as I wanted an affordable and portable (lightweight and decent battery life) laptop. The Chromebook can run Linux (Debian by default) using ChromeOS's Linux development environment, so it essentially has everything I need for work.

RStudio is an essential tool for my work. I could install and run it locally but, as you might expect, it's a bit slow on a Chromebook. As an alternative, I tried Posit Cloud, a service that makes it extremely easy to use RStudio (and Jupyter Notebook) in the cloud. A free account comes with 25 hours of usage that is refreshed every month. If that's not enough, you can upgrade to the Plus plan, which costs 5 USD + tax and adds 50 hours (75 hours in total). The free and Plus plans are both limited to 1 CPU and 1GB of memory. Some objects created in my work definitely occupy more than 1GB of memory, so that's a deal-breaker. The Cloud Premium plan provides more resources (200 hours, 16GB memory, and 4 CPUs) but costs 99 USD + tax a month. In this post I set up my own RStudio cloud service on an EC2 instance, which I think will be cheaper than Posit Cloud but requires some background knowledge of Amazon Web Services and Unix.

To follow this post, you will need an AWS account. It should cost you less than 0.10 USD + tax if you terminate the EC2 instance and delete the EBS storage within an hour. If you have followed this post and it cost more, please let me know! If you're eligible for the AWS Free Tier, please select options that lie within the free tier instead.

The first step is to launch an EC2 instance. I used Ubuntu 22.04 on a t2.medium instance type, which costs 0.0608 USD per hour in the Tokyo region, with an existing key pair. This type of instance can only use Amazon's EBS storage (please read this post to get more acquainted with EBS), which incurs separate costs on top of the EC2 instance cost. For the storage, I created a 20G EBS volume (gp2), which I estimated would cost 2.4 USD if I retained it for a month. If this volume is deleted within an hour, it should cost about 0.003 USD. In the security group's inbound rules, add your IP as the SSH source and also open port 8889 by selecting Custom TCP and entering 0.0.0.0/0 as the Source, which allows access from any IP address. Double-check your settings and then click Launch instance.
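
The figures above can be checked with a little arithmetic. The sketch below is only an estimate: it assumes a 720-hour (30-day) month and a gp2 price of 0.12 USD per GB-month, which is what 2.4 USD for 20G implies. The function name estimate_cost is mine, and current AWS pricing may differ.

```shell
# Rough cost estimate for a short test session (prices quoted in this post, Tokyo region).
ec2_per_hour=0.0608      # t2.medium, USD per hour
ebs_per_gb_month=0.12    # gp2, USD per GB-month (assumption: 2.4 USD / 20 GB)
volume_gb=20
hours_per_month=720      # assumption: 30-day month

estimate_cost() {
  # $1 = number of hours the instance and the volume exist
  awk -v h="$1" -v ec2="$ec2_per_hour" -v ebs="$ebs_per_gb_month" \
      -v gb="$volume_gb" -v hpm="$hours_per_month" \
      'BEGIN { printf "%.4f\n", h * ec2 + h * gb * ebs / hpm }'
}

estimate_cost 1   # prints 0.0641
```

One hour of the instance plus the volume comes to roughly 0.064 USD before tax, which is consistent with the "less than 0.10 USD" estimate.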

On the instance page, note the public IP address; we need it both to SSH into the instance and to access RStudio. Then SSH into the instance using your key pair.

ssh -i my_key ubuntu@43.206.161.48
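
If ssh complains about an unprotected private key file, the key's permissions are probably too open. Here is a small check, assuming GNU stat (as found on Ubuntu and on the Chromebook's Debian environment); the function name key_is_private is just for illustration.

```shell
key_is_private() {
  # Succeeds when the file mode is 400 or 600 (readable by the owner only).
  local mode
  mode=$(stat -c '%a' "$1") || return 1
  [ "$mode" = "400" ] || [ "$mode" = "600" ]
}

# Example usage (my_key is the key file used in the ssh command):
#   key_is_private my_key || chmod 600 my_key
```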

We will use Docker to run RStudio Server, so we need to install Docker on Ubuntu first.

sudo apt-get update
sudo apt-get install \
    ca-certificates \
    curl \
    gnupg \
    lsb-release \
    unzip

sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin

sudo usermod -aG docker ubuntu
logout

We added the ubuntu user to the docker group, so we need to log out and log back in for the change to take effect. After logging back in, run the hello-world image; you should see the Hello from Docker! message if all went well.

ssh -i my_key ubuntu@43.206.161.48
docker run hello-world

# --snip--
# Hello from Docker!
# This message shows that your installation appears to be working correctly.
# --snip--
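
If docker run fails with a permission error instead, the group change has probably not been picked up by the current session. A quick way to check is sketched below; the helper name in_docker_group is hypothetical, and it optionally takes a group list as an argument so it can be tried out without changing anything.

```shell
in_docker_group() {
  # $1 = space-separated group list; defaults to the current user's groups
  local groups="${1:-$(id -nG)}"
  case " $groups " in
    *" docker "*) return 0 ;;
    *) return 1 ;;
  esac
}

if in_docker_group; then
  echo "docker group is active for this session"
else
  echo "not in the docker group yet: log out and back in (or run: newgrp docker)"
fi
```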

We'll install the AWS Command Line Interface (AWS CLI) on this instance so that we can transfer our analysis results to S3. Run the following steps:

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

aws --version
# aws-cli/2.9.4 Python/3.9.11 Linux/5.15.0-1026-aws exe/x86_64.ubuntu.22 prompt/off

You can follow this post on how to set up S3. I have created a bucket called davetangrstudioec2.

# enter IAM details
aws configure

echo hello > test.txt
aws s3 cp test.txt s3://davetangrstudioec2/
# upload: ./test.txt to s3://davetangrstudioec2/test.txt

aws s3 rm s3://davetangrstudioec2/test.txt
# delete: s3://davetangrstudioec2/test.txt
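
For more than a single file, aws s3 sync can copy a whole directory. The sketch below is one way to wrap it; the function names and the ~/results directory are hypothetical, and --dryrun makes the command only print what it would upload.

```shell
bucket="davetangrstudioec2"   # bucket name from this post; replace with your own

s3_uri() {
  # Build s3://bucket/<directory name>/ for a local directory.
  printf 's3://%s/%s/\n' "$bucket" "$(basename "$1")"
}

backup_dir() {
  # --dryrun only lists what would be copied; remove it to really upload.
  aws s3 sync "$1" "$(s3_uri "$1")" --dryrun
}

s3_uri "$HOME/results"   # prints s3://davetangrstudioec2/results/
```

Running backup_dir ~/results on the instance would then show the planned uploads, assuming aws configure has been completed as above.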

Alternatively, instead of S3, you can use GitHub to store your analysis results. Follow this guide to generate an SSH key. Then create ~/.ssh/config with the following contents:

Host github.com
 HostName github.com
 User git
 IdentityFile ~/location/of/key

Change the permissions of the config and SSH key and check whether the setup worked.

chmod 600 ~/.ssh/config
chmod 600 ~/location/of/key

ssh -T git@github.com
# Hi davetang! You've successfully authenticated, but GitHub does not provide shell access.

Remember to configure your Git username and email.

git config --global user.name "Langdon Alger"
git config --global user.email "langdon_alger@gmail.com"

Finally, run RStudio Server with Docker on the EC2 instance.

git clone https://github.com/davetang/learning_docker.git
cd learning_docker/rstudio
./run_rstudio.sh

# --snip--
# rstudio_ml listening on port 8889

Now open your favourite browser and enter the IP address of your instance with port 8889. In my case I entered http://43.206.161.48:8889 and arrived at the Sign in to RStudio login screen. The username is rstudio and password is password.

The run_rstudio.sh script we ran earlier to start the RStudio Server container mounts ~/github to /home/rstudio/work. You can modify run_rstudio.sh to mount /home/ubuntu to /home/rstudio instead if you like, so that when you open RStudio you can see all your files on the instance in the Files pane. Since AWS CLI and GitHub have been set up, you can copy or push your changes from the instance to more persistent (and cheaper) storage options.
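
To give an idea of what such a script does, here is a minimal sketch. It is not the contents of run_rstudio.sh: I am assuming the public rocker/rstudio image, which runs RStudio Server on port 8787 inside the container and reads the login password from the PASSWORD environment variable. The function name start_rstudio is made up for this example.

```shell
start_rstudio() {
  # $1 = password for the rstudio user (required)
  # $2 = host directory to mount (defaults to ~/github)
  local password="${1:?usage: start_rstudio PASSWORD [HOST_DIR]}"
  local workdir="${2:-$HOME/github}"

  # Assumption: rocker/rstudio image; host port 8889 maps to container port 8787.
  docker run --rm -d \
    -p 8889:8787 \
    -e PASSWORD="$password" \
    -v "$workdir":/home/rstudio/work \
    rocker/rstudio
}

# Example usage:
#   start_rstudio 'something-better-than-password'
```

Because port 8889 is open to any IP address in the security group, choosing your own password here is probably a good idea.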

As mentioned before, keeping a 20G EBS gp2 volume permanently (without snapshots) should cost 2.4 USD a month. After installing all the required tools, less than 5G was used, so you can use a 10G volume instead to save some money if you don't need much space.

df -h
# Filesystem      Size  Used Avail Use% Mounted on
# /dev/root        20G  4.7G   15G  25% /
# tmpfs           2.0G     0  2.0G   0% /dev/shm
# tmpfs           785M  948K  784M   1% /run
# tmpfs           5.0M     0  5.0M   0% /run/lock
# /dev/xvda15     105M  5.3M  100M   5% /boot/efi
# tmpfs           393M  4.0K  393M   1% /run/user/1000

Another way to save some money is to put all the steps in this post into a script and, each time you want to use RStudio, start a new instance and run the script to perform all the required setup. When you're finished, store your changes in S3 or GitHub and then terminate the instance and delete the EBS volume. We could create an Amazon Machine Image from our instance instead of setting up all the tools from scratch, but the image is stored in S3 and will cost a little, and it doesn't take long to set up all the required tools.

Simply download the script, make it executable, and run it!

wget https://gist.githubusercontent.com/davetang/7bc90ba0cb35acb32fa2271fb2d5f25e/raw/3d9b2f69558872b29ecfdb96dc51a1b4caf427bf/ec2_docker_setup.sh

chmod 755 ec2_docker_setup.sh
sudo ./ec2_docker_setup.sh

One nice thing about the EBS volume is that it is independent of an instance. Therefore, if we need more computational resources for an analysis, we can start an instance type with more resources, then attach and mount the same EBS volume.

A t2.xlarge instance (4 CPUs and 16G memory), which is similar to the resources provided by the Cloud Premium plan on Posit Cloud, costs 0.2432 USD an hour (on demand) in the Tokyo region. 200 hours on this instance will cost a minimum of 48.64 USD for computation (there may be additional charges I'm not aware of), which is still cheaper than the 99 USD Cloud Premium plan, even when paying for the EBS volume. However, as mentioned before, setting up our own RStudio cloud requires a lot more work and comes with potential surprises in costs (which is why you should set up a billing alert if you do plan on using AWS).
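
The comparison can be reproduced with the numbers quoted in this post. This is only a sketch: all figures are pre-tax, it ignores data transfer and any other charges, and the function name compare_costs is mine.

```shell
compare_costs() {
  # hours and rate: t2.xlarge on demand in Tokyo; ebs: 20G gp2 for a month;
  # posit: Cloud Premium plan. All values in USD, as quoted in this post.
  awk -v hours=200 -v rate=0.2432 -v ebs=2.4 -v posit=99 'BEGIN {
    ec2 = hours * rate
    printf "EC2 compute: %.2f USD\n", ec2
    printf "EC2 + EBS: %.2f USD\n", ec2 + ebs
    printf "Posit Cloud Premium: %.2f USD\n", posit
    printf "Difference: %.2f USD\n", posit - (ec2 + ebs)
  }'
}

compare_costs
# EC2 compute: 48.64 USD
# EC2 + EBS: 51.04 USD
# Posit Cloud Premium: 99.00 USD
# Difference: 47.96 USD
```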

Finally, if you were just following this post and have no intention of using this setup, make sure to terminate your instance and check that the EBS volume has been deleted.
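
One way to check for leftover volumes is with the AWS CLI. The sketch below assumes the CLI is configured for the region you used and that your IAM user is allowed to call the EC2 API; the function name list_unattached_volumes is hypothetical.

```shell
list_unattached_volumes() {
  # Volumes in the "available" state are not attached to any instance
  # but are still billed.
  aws ec2 describe-volumes \
    --filters Name=status,Values=available \
    --query 'Volumes[].VolumeId' \
    --output text
}

# Example usage:
#   list_unattached_volumes
# A leftover volume can then be removed with:
#   aws ec2 delete-volume --volume-id <volume id from the list>
```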

This work is licensed under a Creative Commons Attribution 4.0 International License.