End-to-End Kubernetes with Rancher, RKE2, K3s, Fleet, Longhorn, and NeuVector

The full journey from nothing to production

Setting Up the Foundation

The Infrastructure

In the following steps, we are going to create the initial infrastructure for our learning project. These are the servers that will be used to host the cluster, the registry, and any other services that we may need:

  • A workspace Kubernetes cluster: We will use K3s to deploy a single-node Kubernetes cluster that will be used as a workspace to manage the other clusters. Rancher Manager will be installed on this cluster.

We will also need a private registry for Docker images and a Git server to host our code. To optimize resources, we will deploy both services on the same cluster (the workspace cluster). However, we will consider them as separate services (even if they are running on the same cluster) throughout this guide.

We will also install and configure other tools on this cluster.

  • A workload cluster: We are going to create a cluster running RKE2 and use Rancher to manage it. The size of this cluster will change throughout the guide as we add more components and services to it.

This cluster will be used to run our applications. When we put a GitOps workflow in place using Fleet, we will use this cluster to deploy our sample application. It will also be used to deploy Longhorn and NeuVector later.

Other components and services will be added to these clusters later in the guide.

For both clusters, we are going to use DigitalOcean to create Ubuntu 24.04 servers to host the clusters. DigitalOcean can be replaced with any other cloud provider, including private providers like OpenStack or VMware. The choice of DigitalOcean is based on the simplicity and ease of use of the platform (you can use my referral link to get $200 in free credit for 60 days on DigitalOcean).

We are also going to use the following versions:

  • K3s: We will use v1.31.4+k3s1.
  • RKE2: We will use v1.31.4+rke2r1.

If you want to use another Ubuntu version, another distribution, or another version of K3s or RKE2, check the K3s and RKE2 support matrices to make sure the versions are compatible with the operating system you would like to use.
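
Both version strings follow the same convention: the part before the `+` is the upstream Kubernetes release, and the part after it is the distribution's own packaging revision. A quick sketch of splitting one apart with shell parameter expansion:

```shell
# v1.31.4+k3s1 = Kubernetes v1.31.4, K3s packaging revision 1
K3S_VERSION="v1.31.4+k3s1"
echo "Kubernetes release: ${K3S_VERSION%%+*}"   # strips from the first '+'
echo "K3s revision:       ${K3S_VERSION##*+}"   # keeps what follows the '+'
```

This is why matching a K3s workspace cluster with an RKE2 workload cluster on the same `v1.31.4` base keeps the two clusters on the same Kubernetes minor version.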

These are the machines that we are going to create for our two clusters:

  • workspace: This is the name of the server that we are going to use to deploy the first cluster (the workspace).
  • rke2-controlplane-01: This is the name of the control plane node of the second Kubernetes cluster (the workload cluster).
  • rke2-worker-01: This is the name of the worker node of the second Kubernetes cluster.
  • rke2-extlb-01: We will use this machine in an external load balancer pool for the second Kubernetes cluster. More details will be provided later.
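
These names are passed straight to Terraform later, so it's worth checking that they are valid droplet names up front. DigitalOcean droplet names are used as hostnames, so a rough sketch of the rule (letters, digits, dots, and dashes, not starting or ending with a separator) can be checked with `grep`:

```shell
# Rough sanity check of the machine names against hostname-style rules.
# This is a sketch of the constraint, not DigitalOcean's exact validator.
for name in workspace rke2-controlplane-01 rke2-worker-01 rke2-extlb-01; do
  if echo "$name" | grep -Eq '^[A-Za-z0-9]([A-Za-z0-9.-]*[A-Za-z0-9])?$'; then
    echo "$name: ok"
  else
    echo "$name: invalid"
  fi
done
```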

It's important to note that in a production-grade environment, not all of your nodes should have public IPs. However, for simplicity, and since networking is not the focus at this point, we are going to assign public IPs to all the machines. We are also using DigitalOcean as a cloud provider, and by default, all Droplets (virtual machines) are created with a public IP. DigitalOcean provides a firewall service that can be used to restrict access to the machines even when they have a public IP; many other cloud providers offer similar services, such as AWS Security Groups and Google Cloud VPC firewall rules.

For the size of the machines, we are going to use 4 vCPU and 8GB RAM machines (DigitalOcean's s-4vcpu-8gb size), which are good enough for running K3s and RKE2 clusters. You can use smaller or larger machines depending on your needs, but make sure to follow the official K3s and RKE2 recommendations and requirements.

We are going to start with a minimal cluster with one control plane node and one worker node. Adding additional nodes to the cluster is easy once the cluster is up and running. Let's start.

Install zip, unzip, and jq on your local machine, then install Terraform.

# install zip, unzip, and jq
sudo apt update && sudo apt install -y zip unzip jq

# Set the Terraform version
TERRAFORM_VERSION="1.10.3"
TERRAFORM_ZIP="terraform_${TERRAFORM_VERSION}_linux_amd64.zip"
TERRAFORM_URL="https://releases.hashicorp.com/terraform/${TERRAFORM_VERSION}/${TERRAFORM_ZIP}"

# Download and extract the Terraform binary
curl -LO $TERRAFORM_URL
unzip $TERRAFORM_ZIP
sudo mv terraform /usr/local/bin/
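
The hard-coded `linux_amd64` archive only fits one platform. If you are on macOS or an ARM machine, a small sketch like the following derives the right archive name from `uname` instead (it assumes HashiCorp's `terraform_<version>_<os>_<arch>.zip` release naming scheme):

```shell
# Derive the Terraform archive name for the current OS/architecture.
TERRAFORM_VERSION="1.10.3"
OS="$(uname -s | tr '[:upper:]' '[:lower:]')"   # e.g. linux, darwin
ARCH="$(uname -m)"
case "$ARCH" in
  x86_64)        ARCH="amd64" ;;
  aarch64|arm64) ARCH="arm64" ;;
esac
echo "terraform_${TERRAFORM_VERSION}_${OS}_${ARCH}.zip"
```

Substitute the printed name for `TERRAFORM_ZIP` in the download step above, then run `terraform version` to confirm the binary works.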

Create a directory where we will store some initial files.

# First of all, choose a directory where you want to store the files we will use.
PROJECT_NAME="learning-rancher"

# Create the folder structure
mkdir -p $PROJECT_NAME

Generate SSH keys for the servers. We will use the same key for all the servers.

# Use a unique name for the SSH key to avoid clobbering
# other keys in your ~/.ssh directory
SSH_UNIQUE_NAME="$HOME/.ssh/$PROJECT_NAME"

# generate the key pair (public and private)
# The `<<< y` answers the confirmation prompt, so an existing
# key with this name WILL be overwritten
ssh-keygen -t rsa \
    -b 4096 \
    -C "$PROJECT_NAME" \
    -f $SSH_UNIQUE_NAME -N "" \
    <<< y

# add the key to the ssh-agent
ssh-add $SSH_UNIQUE_NAME
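
Optionally, an entry in ~/.ssh/config saves typing the key path on every login. This is a sketch: the Host patterns match the machine names chosen above, the IdentityFile path assumes the default PROJECT_NAME of learning-rancher, and HostName lines can only be filled in once the droplets have IPs.

```
Host workspace rke2-*
    User root
    IdentityFile ~/.ssh/learning-rancher
    StrictHostKeyChecking accept-new
```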

Export the DigitalOcean token as well as the other variables that we will use later when creating the servers with Terraform.

# Export the DigitalOcean token.
# Get one here: https://cloud.digitalocean.com/account/api/tokens
export DIGITALOCEAN_TOKEN="[CHANGE_ME]"

# Choose the best region for you.
# More options here: https://www.digitalocean.com/docs/platform/availability-matrix/
export DIGITALOCEAN_REGION="fra1"

# I recommend using Ubuntu 24.04 for this project.
export DIGITALOCEAN_IMAGE="ubuntu-24-04-x64"

# SSH key variables
export DIGITALOCEAN_SSH_KEY_NAME="$SSH_UNIQUE_NAME"
export DIGITALOCEAN_SSH_PUBLIC_KEY_PATH="$SSH_UNIQUE_NAME.pub"
export DIGITALOCEAN_SSH_PRIVATE_KEY_PATH="$SSH_UNIQUE_NAME"

# VPC variables.
# You can use the default VPC or create a new one.
# Use doctl to get the VPC UUID (`doctl vpcs list`)
export DIGITALOCEAN_VPC_UUID="[CHANGE_ME]"
export DIGITALOCEAN_PROJECT_NAME="$PROJECT_NAME"

# Workspace cluster variables
export DIGITALOCEAN_WORKSPACE_VM_NAME="workspace"
export DIGITALOCEAN_WORKSPACE_VM_SIZE="s-4vcpu-8gb"

# Workload cluster variables
export DIGITALOCEAN_WORKLOAD_VMS_NAMES='["rke2-controlplane-01", "rke2-worker-01", "rke2-extlb-01"]'
export DIGITALOCEAN_WORKLOAD_VMS_SIZE="s-4vcpu-8gb"
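
Since these values are baked into the Terraform files in the next step, it's worth catching forgotten placeholders now. A small guard (a sketch; extend the list with any other variables you want checked):

```shell
# Flag any variable still unset or left at the "[CHANGE_ME]" placeholder.
for var in DIGITALOCEAN_TOKEN DIGITALOCEAN_VPC_UUID; do
  eval "val=\${$var:-}"
  if [ -z "$val" ] || [ "$val" = "[CHANGE_ME]" ]; then
    echo "$var still needs a real value"
  else
    echo "$var: set"
  fi
done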

Create a Terraform variables file that will store the variables for all the servers.

# Create a Terraform variable file.
cat << EOF > $PROJECT_NAME/variables.tf
variable "region" {
  default = "${DIGITALOCEAN_REGION}"
}
variable "image" {
  default = "${DIGITALOCEAN_IMAGE}"
}
variable "vpc_uuid" {
  default = "${DIGITALOCEAN_VPC_UUID}"
}
variable "workspace_vm_size" {
  default = "${DIGITALOCEAN_WORKSPACE_VM_SIZE}"
}
variable "workspace_vm_name" {
  default = "${DIGITALOCEAN_WORKSPACE_VM_NAME}"
}
variable "workload_vms_size" {
  default = "${DIGITALOCEAN_WORKLOAD_VMS_SIZE}"
}
variable "workload_vms_names" {
  default = ${DIGITALOCEAN_WORKLOAD_VMS_NAMES}
}
variable "project_name" {
  default = "${DIGITALOCEAN_PROJECT_NAME}"
}
variable "ssh_key_name" {
  default = "${DIGITALOCEAN_SSH_KEY_NAME}"
}
variable "ssh_public_key_path" {
  default = "${DIGITALOCEAN_SSH_PUBLIC_KEY_PATH}"
}
variable "ssh_private_key_path" {
  default = "${DIGITALOCEAN_SSH_PRIVATE_KEY_PATH}"
}
EOF

Let's move to creating the Terraform script that will launch our infrastructure.

cat << EOF > $PROJECT_NAME/main.tf
terraform {
  required_providers {
    digitalocean = {
      source = "digitalocean/digitalocean"
      version = "~> 2.0"
    }
  }
}

# Create a DigitalOcean project
resource "digitalocean_project" "learning_rancher_project" {
  name        = var.project_name
  description = "Project for managing Rancher VMs"
  purpose     = "Development"
  environment = "Development"
}

data "digitalocean_project" "project" {
  depends_on = [digitalocean_project.learning_rancher_project]
  name       = var.project_name
}

# Resource: Define the SSH key to be used for all VMs
resource "digitalocean_ssh_key" "default_ssh_key" {
  name       = var.ssh_key_name
  public_key = file(var.ssh_public_key_path)
}

# Resource: Workload VMs
resource "digitalocean_droplet" "workload_vms" {
  for_each = { for name in var.workload_vms_names : name => name }

  image      = var.image
  name       = each.value
  region     = var.region
  size       = var.workload_vms_size
  ssh_keys   = [digitalocean_ssh_key.default_ssh_key.id]
  monitoring = false
  vpc_uuid   = var.vpc_uuid

  connection {
    agent       = false
