Cloud RADIUS: How to Build a Highly Available and Secure Authentication Cluster
Creating a high availability (HA) RADIUS cluster in the cloud is a complex but crucial step in ensuring that your network authentication and authorization services are always available to your users. In this blog post, I will discuss the right way to create an HA RADIUS cluster in the cloud.
Step 1: Choose the right cloud provider
The first step in creating an HA RADIUS cluster in the cloud is to choose the right cloud provider. There are several popular cloud providers, such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). Each provider has its own strengths and weaknesses, so it is important to carefully evaluate your options and choose the one that best meets your needs. Location matters too: RADIUS is sensitive to latency, because clients retransmit or fail over when a reply does not arrive within a short timeout, so choose a cloud region close to your own network.
In my design, I make use of all three cloud providers and spread the RADIUS cluster across the US East and West Coasts, as well as into Europe. This adds to the cost but provides a virtually always-on design. Only a disaster of global proportions could take all three locations down.
Step 2: Create a virtual private cloud (VPC)
Once you have chosen a cloud provider, the next step is to create a virtual private cloud (VPC). A VPC allows you to create a virtual network in the cloud, which can be used to create and configure your RADIUS cluster. This includes creating subnets, security groups, and network ACLs. Then connect the VPC to your own network with a secure site-to-site VPN tunnel. All AAA RADIUS traffic from inside your network will travel securely over this tunnel, with minimal additional delay. Cross-cloud cluster synchronisation and replication traffic will also make use of secure VPN tunnels.
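Because traffic is routed between your network and several VPCs, each VPC needs an address range that does not overlap with the others. The following is a minimal sketch of planning those ranges with Python's standard ipaddress module. The supernet 10.64.0.0/14, the cloud labels, and the helper names are assumptions made for illustration, not part of my design.

```python
import ipaddress
import itertools

# Hypothetical private supernet; pick one that does not clash with your on-premises ranges.
SUPERNET = ipaddress.ip_network("10.64.0.0/14")
CLOUDS = ["aws-us-east", "azure-us-west", "gcp-europe"]  # illustrative labels


def plan_address_space(supernet, clouds, subnet_prefix=24, subnets_per_cloud=2):
    """Give each cloud its own /16 block and carve a few subnets out of it."""
    blocks = supernet.subnets(new_prefix=16)  # generator of /16 networks
    plan = {}
    for cloud, block in zip(clouds, blocks):
        subnets = list(
            itertools.islice(block.subnets(new_prefix=subnet_prefix), subnets_per_cloud)
        )
        plan[cloud] = {"vpc": block, "subnets": subnets}
    return plan


def overlaps(plan):
    """Return True if any two VPC ranges in the plan overlap."""
    vpcs = [entry["vpc"] for entry in plan.values()]
    return any(a.overlaps(b) for a, b in itertools.combinations(vpcs, 2))


if __name__ == "__main__":
    for cloud, entry in plan_address_space(SUPERNET, CLOUDS).items():
        print(cloud, entry["vpc"], [str(s) for s in entry["subnets"]])
```

Checking for overlaps before you create anything is cheap, whereas renumbering a VPC after the tunnels are up is not.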
Step 3: Create a load balancer
To ensure that your RADIUS cluster is highly available, it is important to create a load balancer. A load balancer distributes incoming traffic across multiple servers, which helps to ensure that your network authentication and authorization services are always available to your users. This is especially important in a cloud environment where instances can fail or be terminated.
In my design, I make use of multiple load balancers. The internal network uses a load balancer to determine to which cloud to send a RADIUS packet. Once the packet arrives at the destination cloud, the cloud load balancer then directs the packet to the appropriate internal RADIUS container that will process the request.
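Note that load balancers can mask the original source IP address. A RADIUS server normally uses the source address of a request to identify the client and select its shared secret, so masking makes it tricky for the downstream RADIUS server to recognise the client and know where to send the reply.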
Step 4: Create a MySQL database cluster
For RADIUS to work across multiple clouds and Kubernetes pods, it needs a shared database. I use Percona Server for MySQL for this purpose, and to ensure high availability the database is also clustered across the same three clouds, in an active-active design. This increases not only resilience but also performance, as load balancing is performed across all database nodes.
The secure VPN tunnel is used to perform database replication and coordinate transactions.
Step 5: Create a RADIUS cluster
The next step in creating an HA RADIUS cluster in the cloud is to create the actual RADIUS cluster. There are several different ways to do this, depending on your cloud provider. For example, on AWS, you can place your RADIUS instances behind the Elastic Load Balancing service. On Azure, you can use the Azure Load Balancer service. And on GCP, you can use the Cloud Load Balancing service.
In my design, I use containers to run multiple RADIUS instances on each cloud. On the Azure cloud, I make use of Azure Kubernetes Service (AKS), on the AWS cloud I make use of Amazon Elastic Kubernetes Service (EKS), and on Google cloud, I use Google Kubernetes Engine (GKE). All three come with their own internal load balancers and scale automatically by spinning up additional nodes as required.
Step 6: Test and Monitor
Once you have created the HA RADIUS cluster, it is important to test and monitor it to ensure that it is working as expected. This can include testing the authentication and authorization services, monitoring the load balancer, and keeping an eye on the overall health of the cluster.
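One way to test the authentication service end to end is to send a synthetic Access-Request and check the reply code. The following is a minimal sketch using only the Python standard library, with the packet layout and User-Password hiding taken from RFC 2865. The function names, the test account, and the shared secret are assumptions for illustration. The sketch does not add a Message-Authenticator attribute, which some servers require, and it does not verify the Response Authenticator on the reply.

```python
import hashlib
import os
import socket
import struct

ACCESS_REQUEST = 1  # RADIUS packet code (RFC 2865)
USER_NAME = 1       # attribute types (RFC 2865)
USER_PASSWORD = 2


def hide_password(password: bytes, secret: bytes, authenticator: bytes) -> bytes:
    """Obfuscate User-Password as described in RFC 2865, section 5.2."""
    padded = password + b"\x00" * (-len(password) % 16)
    if not padded:
        padded = b"\x00" * 16
    result = b""
    previous = authenticator
    for i in range(0, len(padded), 16):
        digest = hashlib.md5(secret + previous).digest()
        block = bytes(p ^ d for p, d in zip(padded[i:i + 16], digest))
        result += block
        previous = block  # the next block is keyed on this ciphertext block
    return result


def attribute(attr_type: int, value: bytes) -> bytes:
    """Encode one attribute as type, length (including the 2-byte header), value."""
    return struct.pack("!BB", attr_type, len(value) + 2) + value


def build_access_request(username, password, secret, identifier=0, authenticator=None):
    """Build a PAP Access-Request; a random authenticator is used unless one is given."""
    if authenticator is None:
        authenticator = os.urandom(16)
    attrs = attribute(USER_NAME, username.encode())
    attrs += attribute(USER_PASSWORD, hide_password(password.encode(), secret, authenticator))
    header = struct.pack("!BBH", ACCESS_REQUEST, identifier, 20 + len(attrs))
    return header + authenticator + attrs


def send_probe(host, port, packet, timeout=3.0):
    """Send the packet over UDP and return the reply code (2 = Access-Accept, 3 = Access-Reject)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (host, port))
        reply, _ = sock.recvfrom(4096)
    return reply[0]
```

Run against each cloud's entry point on the standard authentication port 1812 with a dedicated test account, a probe like this can feed a health check: a timeout or an unexpected reply code is a signal that the path through that cloud needs attention.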
Each cloud provider has its own monitoring tools, but for a complete overview, I suggest building a combined dashboard in a cloud-based performance monitoring tool like Datadog.
The result – a highly resilient RADIUS Cluster
By following these steps, you can create a highly available RADIUS cluster in the cloud that will help ensure that your network authentication and authorization services are always available to your users.
If you follow my design, you will end up with a RADIUS cloud that has multiple levels of redundancy. Should an individual pod within one of the Kubernetes clusters fail, the cloud container management system will automatically redirect traffic to a healthy pod whilst it recycles the faulty pod.
Should an entire cloud go down, the local load balancer will simply stop using it and redirect traffic to one of the other two clouds. In the unlikely event that all cloud providers are unavailable, I suggest using a disaster-recovery RADIUS server, which will simply accept all authentication and authorization requests until the outage has been resolved.
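The failover behaviour described above can be illustrated with a small sketch: rotate across the clouds that are currently healthy, and hand traffic to the disaster server only when none are left. The class, the cloud labels, and the fallback name are hypothetical. A real load balancer implements this internally, and how a cloud gets marked healthy or unhealthy (for example, by a periodic RADIUS probe) is an assumption left outside the sketch.

```python
from itertools import count

CLOUDS = ["aws", "azure", "gcp"]       # illustrative labels
DISASTER_FALLBACK = "disaster-radius"  # hypothetical name for the accept-all server


class CloudSelector:
    """Round-robin over healthy clouds, with a last-resort fallback target."""

    def __init__(self, clouds, fallback):
        self.clouds = list(clouds)
        self.fallback = fallback
        self.healthy = set(self.clouds)
        self._counter = count()

    def mark(self, cloud, is_healthy):
        """Record the result of the latest health check for one cloud."""
        if is_healthy:
            self.healthy.add(cloud)
        else:
            self.healthy.discard(cloud)

    def next_target(self):
        """Return the next healthy cloud, or the fallback if every cloud is down."""
        candidates = [c for c in self.clouds if c in self.healthy]
        if not candidates:
            return self.fallback
        return candidates[next(self._counter) % len(candidates)]


if __name__ == "__main__":
    selector = CloudSelector(CLOUDS, DISASTER_FALLBACK)
    selector.mark("azure", False)  # simulate one cloud failing its health check
    print([selector.next_target() for _ in range(4)])
```

Keeping the fallback out of the normal rotation means the accept-all server only ever sees traffic during a total outage.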
Final comments
Always keep your RADIUS servers patched and updated to ensure that you are protected against any known vulnerabilities.
It is also critical to use a backup solution so that you can restore your cluster after an accidental deletion or failure.
Finally, keep in mind that creating an HA RADIUS cluster in the cloud is a complex process, and it may be beneficial to work with a cloud expert or managed service provider to help you implement and maintain it.