5. Production-Ready Kubernetes Clusters

Learning Objectives

By the end of this chapter, you will be able to:

  • Identify the requirements of Kubernetes cluster setup
  • Create a production-ready Kubernetes cluster in Google Cloud Platform (GCP)
  • Manage cluster autoscaling to add new servers to a Kubernetes cluster
  • Migrate applications in production clusters

In this chapter, we will learn about the key considerations for the setup of Kubernetes. Following that, we will also study the different Kubernetes platform options. Then, we move on to creating a production-ready Kubernetes cluster on a cloud platform and performing administrative tasks.

Introduction

In the previous chapter, we created a Kubernetes cluster for the development environment and installed applications into it. In this chapter, the focus will be on production-ready Kubernetes clusters and how to administer them for better availability, reliability, and cost optimization.

Kubernetes is the de facto system for managing microservices running as containers in the cloud. It is widely adopted in the industry by both start-ups and large enterprises for running various kinds of applications, including data analysis tools, serverless apps, and databases. Scalability, high availability, reliability, and security are the key features of Kubernetes that enable its adoption. Let's assume that you have decided to use Kubernetes, and hence you need a reliable and observable cluster setup for development and production. Before choosing a Kubernetes provider and deciding how to operate your applications, there are critical considerations that depend on your requirements, budget, and team. There are four key considerations to analyze:

  • Service Quality: Kubernetes runs microservices in a highly available and reliable way. However, it is critical to install and operate Kubernetes itself reliably and robustly. Let's assume you have installed the Kubernetes control plane on a single node in the cluster, and that node was disconnected due to a network problem. Since you have lost connectivity to the Kubernetes API server, you will not be able to check the status of your applications or operate them. Therefore, it is essential to evaluate the service quality of the Kubernetes cluster you need for your production environment.
  • Monitoring: Kubernetes runs containers that are distributed to the nodes and enables checking their logs and statuses. Let's assume that you rolled out a new version of your application yesterday. Today, you want to check how the latest version is behaving in terms of errors, crashes, and response times. Therefore, you need a monitoring system integrated into your Kubernetes cluster to capture logs and metrics. The collected data is essential for troubleshooting and diagnosis in a production-ready cluster.
  • Security: Kubernetes components and client tools work in a secure way to manage the applications running in the cluster. However, you need to have specific roles and authorization levels defined for your organization to operate Kubernetes clusters securely. Hence, it is essential to choose a Kubernetes provider platform that you can securely connect to and share with your customers and colleagues.
  • Operations: Kubernetes is the host of all applications, including services with data compliance, auditing, and enterprise-level requirements. Let's assume you are running the backend and frontend of your online banking application system on Kubernetes. For a chartered bank in your country, the audit logs of your applications should be accessible. Since you have deployed your entire system on Kubernetes, the platform should enable fetching, archiving, and storing the audit logs. Therefore, the operational capability of the Kubernetes platform is essential for a production-ready cluster setup.

In this chapter, these considerations will be discussed for each of the Kubernetes platform options to help you decide how to install and operate your Kubernetes clusters.

Kubernetes Setup

Kubernetes is a flexible system that can be installed on various platforms, from Raspberry Pi to high-end servers in data centers. Each platform comes with its advantages and disadvantages in terms of service quality, monitoring, security, and operations. Kubernetes manages applications as containers and creates an abstraction layer over the infrastructure. Let's imagine that you set up Kubernetes on three old servers in your basement and then install the Proof of Concept (PoC) of your new project. When the project becomes successful, you want to scale your application and move to a cloud provider such as Amazon Web Services (AWS). Since your application is designed to run on Kubernetes and does not depend on the infrastructure, porting it to another Kubernetes installation is straightforward.

In the previous chapter, we studied the development environment setup using minikube, the official local cluster tool of the Kubernetes project. In this section, production-level Kubernetes platforms will be presented. The Kubernetes platforms for production can be grouped into three categories according to their abstraction layers:

Figure 5.1: Kubernetes platforms

Let's now look at each of these types, one by one.

Managed Platforms

Managed platforms provide Kubernetes as a Service, and all underlying services run under the control of cloud providers. It is easy to set up and scale these clusters since the cloud providers handle all infrastructural operations. Leading cloud providers such as GCP, AWS, and Microsoft Azure offer managed Kubernetes solutions that integrate with their other cloud services, such as container registries, identity services, and storage services. The most popular managed Kubernetes solutions are listed here, followed by a sample cluster-creation command:

  • Google Kubernetes Engine (GKE): GKE is the most mature managed service on the market, and Google provides it as a part of GCP.
  • Azure Kubernetes Service (AKS): AKS is the Kubernetes solution provided by Microsoft as a part of the Azure platform.
  • Amazon Elastic Container Service for Kubernetes (EKS): EKS is the managed Kubernetes service of AWS.
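As an illustration of how little setup a managed platform requires, a single command-line call is usually enough to create a small GKE cluster. The following is only a sketch with illustrative values; in Exercise 13, the cluster used in this chapter will instead be created through the GCP console:

# my-cluster, the node count, and the zone are illustrative values
gcloud container clusters create my-cluster --num-nodes 1 --zone us-central1-a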

Turnkey Platforms

Turnkey solutions focus on installing and operating the Kubernetes control plane in the cloud or in on-premise systems. Users of turnkey platforms provide information about the infrastructure, and the turnkey platforms handle the Kubernetes setup. Turnkey platforms offer better flexibility in setup configurations and infrastructure options. These platforms are mostly designed by organizations with rich experience in Kubernetes and cloud systems such as Heptio or CoreOS.

If turnkey platforms are installed on cloud providers such as AWS, the infrastructure is managed by the cloud provider, and the turnkey platform manages Kubernetes. However, when the turnkey platform is installed on on-premise systems, in-house teams should handle the infrastructure operations.

Custom Platforms

Custom installation of Kubernetes is possible if your use case does not fit into any managed or turnkey solutions. For instance, you can use Gardener (https://gardener.cloud) or OpenShift (https://www.openshift.com) to install Kubernetes clusters to cloud providers, on-premise data centers, on-premise virtual machines (VMs), or bare-metal servers. While the custom platforms offer more flexible Kubernetes installations, they also come with special operations and maintenance efforts.

In the following sections, we will create a managed Kubernetes cluster in GKE and administer it. GKE offers the most mature platform and a superior customer experience on the market.

Google Kubernetes Engine

GKE provides a managed Kubernetes platform backed by the experience that Google has of running containerized services for more than a decade. GKE clusters are production-ready and scalable, and they support upstream Kubernetes versions. In addition, GKE focuses on improving the development experience by eliminating the installation, management, and operation needs of Kubernetes clusters.

While GKE improves the developer experience, it also minimizes the cost of running Kubernetes clusters. It only charges for the nodes in the cluster and provides the Kubernetes control plane free of charge. In other words, GKE delivers a reliable, scalable, and robust Kubernetes control plane without any cost. For the servers that run the workload of your applications, the usual GCP Compute Engine pricing is applied. For instance, let's assume that you will start with two n1-standard-1 (vCPUs: 1, RAM: 3.75 GB) nodes:

The calculation would be as follows:

  • Total hours per month: 1,460
  • Instance type: n1-standard-1
  • GCE Instance Cost: USD 48.54
  • Kubernetes Engine Cost: USD 0.00
  • Estimated Component Cost: USD 48.54 per 1 month
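The monthly figure follows directly from the per-hour instance price. Assuming an effective rate of roughly USD 0.0332 per hour for an n1-standard-1 instance (the sustained-use price implied by the totals above), the arithmetic is approximately:

2 nodes × 730 hours × USD 0.0332 per hour ≈ USD 48.54 per month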

If your application needs to scale with higher usage and you require 10 servers instead of 2, the cost will also scale linearly:

  • Total hours per month: 7,300
  • Instance type: n1-standard-1
  • GCE Instance Cost: USD 242.72
  • Kubernetes Engine Cost: USD 0.00
  • Estimated Component Cost: USD 242.72 per 1 month

This calculation shows that GKE does not charge for the Kubernetes control plane and provides a reliable, scalable, and robust Kubernetes API for every cluster. In addition, the cost increases linearly as the cluster scales, which makes it easier to plan and operate Kubernetes clusters.

In the following exercise, you will create a managed Kubernetes cluster in GKE and connect to it.

Note

In order to complete this exercise, you need to have an active GCP account. You can create an account on its official website: https://console.cloud.google.com/start.

Exercise 13: Creating a Kubernetes Cluster on GCP

In this exercise, we will create a Kubernetes cluster in GKE and connect to it securely to check node statuses. The Google Cloud Platform dashboard and CLI tools provide a high-quality developer experience. Therefore, if you need a production-ready Kubernetes cluster, you can have a fully functioning control plane and server nodes in less than 10 minutes.

To complete the exercise, we need to ensure the following steps are executed:

  1. Click Kubernetes Engine in the left menu under Compute on the Google Cloud Platform home page, as shown in the following figure:
    Figure 5.2: Google Cloud Platform home page
  2. Click Create Cluster on the Clusters page, as shown in the following figure:
    Figure 5.3: Cluster view
  3. Select Your first cluster from the Cluster templates on the left and write serverless as the name. Click Create at the bottom of the page, as shown in the following figure:
    Figure 5.4: Cluster creation
  4. Wait a couple of minutes until the cluster icon becomes green and then click the Connect button, as you can see in the following figure:
    Figure 5.5: Cluster list
  5. Click Run in Cloud Shell in the Connect to the cluster window, as shown in the following figure:
    Figure 5.6: Connect to the cluster view
  6. Wait until the cloud shell is open and available and press Enter when the command is shown, as you can see in the following figure:
    Figure 5.7: Cloud shell

    The output shows that the authentication data for the cluster is fetched, and the kubeconfig entry is ready to use.

  7. Check the nodes with the following command in the cloud shell:

    kubectl get nodes

    Since the cluster is created with a single node pool of one node, there is only one node connected to the cluster, as you can see in the following figure:

    Figure 5.8: Node list
  8. Check for the pods running in the cluster with the following command in the cloud shell:

    kubectl get pods --all-namespaces

    Since GKE manages the control plane, there are no pods for api-server, etcd, or scheduler in the kube-system namespace. There are only networking and metrics pods running in the cluster, as shown in the following screenshot:

Figure 5.9: Pod list

With this exercise, you have created a production-ready Kubernetes cluster on GKE. Within a couple of minutes, GKE created a managed Kubernetes control plane and connected the servers to the cluster. In the following sections, administering clusters for production environments will be discussed, and the Kubernetes cluster from this exercise will be expanded.
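Note that the Connect button runs a gcloud command behind the scenes to fetch the cluster credentials. If you later want to connect to the same cluster from another terminal, a command along the following lines updates your kubeconfig; it assumes the serverless cluster name and the us-central1-a zone used in the rest of this chapter, so adjust the zone if yours differs:

gcloud container clusters get-credentials serverless --zone us-central1-a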

Autoscaling Kubernetes Clusters

Kubernetes clusters are designed to run scalable applications reliably. In other words, if the Kubernetes cluster runs 10 instances of your application today, it should also support running 100 instances in the future. There are two mainstream methods to reach this level of flexibility: redundancy and autoscaling. Let's assume that the 10 instances of your application are running on 3 servers in your cluster. With redundancy, you need at least 27 extra idle servers to be capable of running 100 instances in the future. This also means paying for the idle servers, as well as the operational and maintenance costs. With autoscaling, you need automated procedures to create or remove servers. Autoscaling ensures that there are no excessive idle servers and minimizes the costs while meeting the scalability requirements.

GKE Cluster Autoscaler is the out-of-the-box solution for handling autoscaling in Kubernetes clusters. When it is enabled, it automatically adds new servers if there is no capacity left for the workload. Similarly, when the servers are underutilized, the autoscaler removes the redundant servers. Furthermore, the autoscaler has a defined minimum and maximum number of servers to avoid limitless increases or decreases. In the following exercise, the GKE cluster autoscaler will be enabled for the Kubernetes cluster. Then the automatic scaling of the servers will be demonstrated by changing the workload in the cluster.
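For reference, autoscaling does not have to be enabled after the fact; it can also be switched on when a cluster is first created. The following command is only a sketch with an illustrative cluster name and node limits; the exercise below enables autoscaling on the existing serverless cluster instead:

# the cluster name and node limits below are illustrative
gcloud container clusters create scalable-cluster --num-nodes 3 \
--enable-autoscaling --min-nodes 1 --max-nodes 10 --zone us-central1-a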

Exercise 14: Autoscaling a GKE Cluster in Production

In this exercise, we will enable and utilize the GKE cluster autoscaler in a production cluster. Let's assume that you need a large number of replicas of your application running in the cluster. However, this is not currently possible since you have a small number of servers. Therefore, you need to enable autoscaling and see how new servers are created automatically.

To successfully complete the exercise, we need to ensure the following steps are executed:

  1. Install nginx in the cluster by running the following command in the cloud shell:

    kubectl create deployment workload --image=nginx

    This command creates a deployment named workload from the nginx image, as depicted in the following figure:

    Figure 5.10: Deployment creation
  2. Scale the workload deployment to 25 replicas by running the following command in the cloud shell:

    kubectl scale deployment workload --replicas=25

    This command increases the number of replicas of the workload deployment, as shown in the following figure:

    Figure 5.11: Deployment scaling up
  3. Check the number of running pods with the following command:

    kubectl get deployment workload

    Since there is only 1 node in the cluster, not all 25 replicas of nginx can run in the cluster. Currently, only 5 instances are running, as shown in the following figure:

    Figure 5.12: Deployment status
  4. Enable autoscaling for the node pool of the cluster using the following command:

    gcloud container clusters update serverless --enable-autoscaling \
    --min-nodes 1 --max-nodes 10 --zone us-central1-a \
    --node-pool pool-1

    Note

    Change the zone parameter if your cluster is running in another zone.

    This command enables autoscaling for the Kubernetes cluster with a minimum of 1 and a maximum of 10 nodes, as shown in the following figure:

    Figure 5.13: Enabling autoscaler

    This command can take a couple of minutes, showing the Updating serverless... prompt while it creates the required resources.

  5. Wait a couple of minutes and check for the number of nodes by using the following command:

    kubectl get nodes

    With autoscaling enabled, GKE ensures that there are enough nodes to run the workload in the cluster. The node pool is scaled up to four nodes, as shown in the following figure:

    Figure 5.14: Node list
  6. Check the number of running pods with the following command:

    kubectl get deployment workload

    Since there are 4 nodes in the cluster, all 25 replicas of nginx can run in the cluster, as shown in the following figure:

    Figure 5.15: Deployment status
  7. Delete the deployment with the following command:

    kubectl delete deployment workload

    The output should be as follows:

    Figure 5.16: Deployment deletion
  8. Disable autoscaling for the node pool of the cluster by using the following command:

    gcloud container clusters update serverless --no-enable-autoscaling \
    --node-pool pool-1 --zone us-central1-a

    Note

    Change the zone parameter if your cluster is running in another zone.

    You should see the output shown in the following figure:

Figure 5.17: Disabling autoscaling

In this exercise, we saw the GKE cluster autoscaler in action. When the autoscaler is enabled, it increases the number of servers when the cluster is out of capacity for the current workload. Although it seems straightforward, it is a compelling feature of Kubernetes platforms. It removes the burden of manually checking your cluster utilization and taking action. It is even more critical for serverless applications, where user demand is highly variable.

Let's assume you have deployed a serverless function to your Kubernetes cluster with autoscaling enabled. The cluster autoscaler will automatically increase the number of nodes when your functions are called frequently and then delete the nodes when your functions are not invoked. Therefore, it is essential to check the autoscaling capability of the Kubernetes platform for serverless applications. In the following section, migrating applications in production environments will be discussed, as it is another important cluster administration task.

Application Migration in Kubernetes Clusters

Kubernetes distributes applications to servers and keeps them running reliably and robustly. Servers in the cluster could be VMs or bare-metal instances with different technical specifications. Let's assume you have connected only standard VMs to your Kubernetes cluster and they are running various types of applications. If one of your upcoming data analytics libraries requires GPUs to operate faster, you need to connect servers with GPUs. Similarly, if your database application requires SSD disks for faster I/O operations, you need to connect servers with SSD access. These kinds of application requirements result in different node pools in your cluster, and you need to configure Kubernetes workloads to run on those particular nodes. Taints are used to mark some nodes as reserved for special types of workloads, and pods are marked with tolerations if they run those specific types of workloads. Kubernetes supports workload distribution to special nodes with taints and tolerations working in harmony:

  • Taints are applied to nodes to indicate that the node should not have any pods that do not tolerate the taints.
  • Tolerations are applied to pods to allow pods to be scheduled on nodes with taints.

For instance, if you want to run only database instances on your nodes with SSDs, you first need to taint your nodes:

kubectl taint nodes disk-node-1 ssd=true:NoSchedule

With this command, disk-node-1 will only accept pods that have the following toleration in their definition:

tolerations:
- key: "ssd"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"

Taints and tolerations work in harmony to assign pods to specific nodes as a part of Kubernetes scheduling. In addition, Kubernetes supports safely removing servers from the cluster by using the kubectl drain command. It is particularly helpful if you want to take some nodes offline for maintenance or retirement. In the following exercise, an application running in the Kubernetes cluster will be migrated to a particular set of new nodes.
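For instance, to safely evict the workloads from a single node before maintenance and then return it to service, you could run commands along these lines; node-1 is a placeholder node name, and --ignore-daemonsets is usually required because DaemonSet-managed pods cannot be evicted:

# node-1 is a placeholder node name
kubectl drain node-1 --ignore-daemonsets
kubectl uncordon node-1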

Exercise 15: Migrating Applications Running in a GKE Cluster

This exercise aims to teach us how to perform migration activities in a production cluster. Let's assume that you are running a backend application in your Kubernetes cluster. With the recent changes, you have enhanced your application with better memory management and want to run it on memory-optimized servers. Therefore, you will create a new node pool and migrate your application instances into it.

To successfully complete the exercise, we need to ensure the following steps are executed:

  1. Install the backend application in the cluster by running the following command in the cloud shell:

    kubectl create deployment backend --image=nginx

    This command creates a deployment named backend from an nginx image, as you can see in the following figure:

    Figure 5.18: Deployment creation
  2. Scale the backend deployment to 10 replicas by running the following command in the cloud shell:

    kubectl scale deployment backend --replicas=10

    This command increases the number of replicas of the backend deployment, as shown in the following figure:

    Figure 5.19: Deployment scaling up
  3. Check the number of running pods and their nodes with the following command:

    kubectl get pods -o wide

    All 10 replicas of the deployment are running successfully on the 4 nodes, as you can see in the following figure:

    Figure 5.20: Deployment status
  4. Create a node pool in GCP with higher-memory machines:

    gcloud container node-pools create high-memory-pool --cluster=serverless \
    --zone us-central1-a --machine-type=n1-highmem-2 --num-nodes=2

    Note

    Change the zone parameter if your cluster is running in another zone.

    This command creates a new node pool named high-memory-pool in the serverless cluster with the machine type n1-highmem-2 and two servers, as you can see in the following figure:

    Figure 5.21: Node pool creation

    This command can take a couple of minutes, showing the Creating node pool high-memory-pool prompt while it creates the required resources.

  5. Wait for a couple of minutes and check the nodes in the cluster:

    kubectl get nodes

    This command lists the nodes in the cluster, and we expect to see two extra high-memory nodes, as shown in the following figure:

    Figure 5.22: Cluster nodes
  6. Drain the old nodes so that Kubernetes will migrate applications to new nodes:

    kubectl drain -l cloud.google.com/gke-nodepool=pool-1 --ignore-daemonsets

    This command removes the workloads from all nodes with the label cloud.google.com/gke-nodepool=pool-1, as shown in the following figure:

    Figure 5.23: Node removal
  7. Check the running pods and their nodes with the following command:

    kubectl get pods -o wide

    All 10 replicas of the deployment are running successfully on the new high-memory nodes, as shown in the following figure:

    Figure 5.24: Deployment status
  8. Delete the old node pool with the following command:

    gcloud container node-pools delete pool-1 --cluster serverless --zone us-central1-a

    Note

    Change the zone parameter if your cluster is running in another zone.

    This command deletes the old node pool, which is not being used, as you can see in the following figure:

Figure 5.25: Node pool deletion

In this exercise, we have migrated the running application to new nodes with better technical specs. Using the Kubernetes primitives and GKE node pools, it is possible to migrate applications to a particular set of nodes without downtime. In the following activity, you will use autoscaling and Kubernetes taints to run serverless functions while minimizing the cost.

Activity 5: Minimizing the Costs of Serverless Functions in a GKE Cluster

The aim of this activity is to perform administrative tasks in a production cluster to run serverless functions while minimizing costs. Let's assume that your backend application is already running in your Kubernetes cluster. Now you want to install some serverless functions to connect to the backend. However, the backend instances are running on memory-optimized servers, which are costly for also running serverless functions. Therefore, you need to add preemptible servers, which are cheaper. Preemptible VMs are already available in GCP; however, they have lower service quality and a maximum lifespan of 24 hours. Therefore, you should configure the node pool to be autoscaled and to run only serverless functions. Otherwise, your backend instances could also be scheduled on preemptible VMs and degrade the overall performance.

At the end of the activity, you will have functions connecting to the backend instances, as shown in the following figure:

Figure 5.26: Backend checker functions

Backend instances will run on high-memory nodes and function instances will run on preemptible servers, as shown in the following figure:

Figure 5.27: Kubernetes pods and the corresponding nodes

Note

In order to complete the activity, you should use the cluster from Exercise 15 with the backend deployment running.

Execute the following steps to complete the activity:

  1. Create a new node pool with preemptible servers.
  2. Taint the preemptible servers to run only serverless functions.
  3. Create a Kubernetes service to reach backend pods.
  4. Create a CronJob to connect to the backend service every minute. The CronJob definition should have tolerations to run on preemptible servers.
  5. Check the node assignments of the CronJob functions.
  6. Check the logs of the CronJob function instances.
  7. Clean up the backend deployment and the serverless functions.
  8. Remove the Kubernetes cluster if you do not need it anymore.

    Note

    The solution to the activity can be found on page 412.

Summary

In this chapter, we first described the four key considerations for analyzing the requirements of a Kubernetes cluster setup. Then we studied the three groups of Kubernetes platforms: managed, turnkey, and custom. Each Kubernetes platform was explained, along with its responsibility levels for infrastructure, Kubernetes, and applications. Following that, we created a production-ready Kubernetes cluster on GKE. Since Kubernetes is designed to run scalable applications, we studied how to deal with increasing or decreasing workloads by autoscaling. Furthermore, we also looked at application migration without downtime in production clusters to illustrate how to move your applications to servers with higher memory. Finally, we performed autoscaling and migration activities with a serverless function running in a production cluster to minimize costs. Kubernetes and serverless applications work together to create reliable, robust, and scalable future-proof environments. Therefore, it is essential to know how to install and operate Kubernetes clusters for production.

In the next chapter, we will study the upcoming serverless features in Kubernetes. We will also study Virtual Kubelet in detail and deploy stateless containers on GKE.