Chapter 2. Understanding the Technologies
As I noted in the first chapter, Jenkins X is designed to make creating and using CI/CD pipelines in cloud environments a simple process for you. It does this, in large part, by automating the running and control of several powerful applications, including Kubernetes, Helm, and Git, among others. A thorough understanding of these technologies isn’t required to use Jenkins X. But knowing their basics will help you understand how Jenkins X works, the broader set of capabilities it has, how it can be applied to a wider range of situations, and how it can be extended to handle others.
In this chapter, I’ll provide you with an overview of the core technologies used by Jenkins X and how it makes use of them. The technology discussions in this chapter will be organized into three categories:
- Cloud - containers via Docker, container orchestration via Kubernetes, and cluster updates via Helm
- Pipeline - multibranch pipelines, Jenkinsfiles, Blue Ocean, Tekton pipelines, Prow, Lighthouse
- Supporting - Nexus, ChartMuseum, etc.
If you are already familiar with these technologies, feel free to move on to chapter 3. However, I still encourage you to review the list of technologies covered here to make sure you have a basic familiarity with them, as you are certain to run into their use and terminology when working with Jenkins X.
We’ll start with the section on Cloud.
Technologies for working in the Cloud
Since Jenkins X is targeted at creating CI/CD pipelines that run well in any cloud, it’s important to first understand what is meant by running well in a cloud. I’ll briefly cover some guiding principles for that.
After that, and building on that background, I’ll cover the primary technologies that allow you to run applications well in the cloud - containers, Kubernetes, and Helm.
Guiding Principles for Running Well in the Cloud
Cloud environments typically have the following benefits/requirements:
- Ability to automatically have infrastructure provisioned
- Ability to spin up applications on demand
- Ability to install, configure, and launch applications automatically
- Automatic monitoring, alerts, and system management
- Ability to scale (add or remove additional instances of an application) as needed
- Ability to completely stop and remove running instances automatically
- Ability to resume interrupted or terminated processing from one instance by spinning up a new instance
- Billing for only the resources that are used during the time the application is run
Applications that can run this way are referred to as cloud-ready or cloud-native. Traditional applications can have significant challenges running in this way. In the past, developers have produced software that was intended to be started and monitored (usually with human oversight). And, the code was intended to run on infrastructure that was already provisioned and would run continuously. Legacy systems of this type can encounter significant barriers to running well in cloud environments.
Ideally, software today is architected to run with the cloud requirements in mind. But for both new software and legacy applications, the functionality and environments provided by containers, and orchestration tools such as Kubernetes, can provide the ideal mechanisms for running well in the cloud.
A container is a way of packaging software so that applications and their dependencies have a self-contained environment to run in, are insulated from the host OS and other applications, and are easily ported to other environments.
They have become a standard unit for talking about, and working with, software. Similar to an individual computer, they function like a fully provisioned machine installed with all the software needed to run an application. The pieces within a container can include the application itself, the runtime environment, dependencies, configuration, tools, etc.
This may sound similar to a virtual machine (VM). However, containers are not VMs. For a typical VM to be used, you need a set of technology layers:
- The host machine and its operating system
- An application called a hypervisor that is installed on the host, and understands how to load and run the VM from an image file
- The VM’s environment, including the operating system running within the VM
- The runtime application and associated environment that run within the VM
A container’s resources for execution come from “carving out” a portion of the host operating system’s resources - grouping them in such a way that the container thinks it is running on its own system. Figure 2-1 illustrates the difference between a VM and a container.
A container is created from an image that is put together from a file specifying what goes into the image. The next section explains more of that process.
In the context of containers, an image is a software stack where software has been pre-installed, along with any needed configuration and environment setup. This pre-configured stack is used as a basis for creating running containers. Images can be stored in image repositories and pulled down to build containers from.
Images are built up in layers – where each layer roughly corresponds to an installed piece of software, data being copied in, or configuration being done. By leveraging functionality built into the Linux operating system, the multiple layers can be viewed as a single filesystem – looking through from the topmost layer to the layers below. Layers in an image are read-only, and images can share layers between them.
To create a container, a thin read-write layer is added on top of an image’s layers.
The most common application to create images and containers today is Docker. Understanding more about how Docker works is not required, but again, is useful in the overall view of how the pieces for Jenkins X fit together.
Docker is a technology for creating and managing containers. It leverages core pieces of Linux that actually provide the container functionality – cgroups, union filesystem, and namespaces. Docker provides a formal approach for utilizing these to create “spaces” for containers. The pieces of Docker include:
- a RESTful interface for services
- a description format for containers
- an API for orchestration
- a format for creating images by specifying content and operations (a Dockerfile)
- a command line interface for working with containers and images
For many people, the terms “Docker” and “containers” have become synonymous. However, Docker is really just the first well-known application that provided an easy and formalized way to create and manage containers.
The approach that Docker takes to creating images and containers is fairly straightforward. It starts with a specification in a Dockerfile.
A Dockerfile is a text file that defines how to construct a Docker image. It generally consists of instructions like FROM, USER, RUN, COPY, etc. that specify a base image (an operating system image that may also be configured with one or more applications), and then layer on other applications, configuration, and environment settings. Execution of most instructions in a Dockerfile results in a new layer being added to the image.
The Docker command “build” is used to create an image from a Dockerfile. Typically, the first command that the build operation encounters in the Dockerfile is to pull down a base image. Then the remaining steps in the Dockerfile are executed, adding additional layers to the image. Listing 2-1 shows the contents of a custom Dockerfile and then what happens when you run the docker build command and it creates an image.
$ cat Dockerfile_roar_db_image
FROM mysql:5.5.45
COPY docker-entrypoint-initdb.d /docker-entrypoint-initdb.d/
ENTRYPOINT ["/entrypoint.sh"]
CMD ["mysqld"]

$ docker build -f Dockerfile_roar_db_image -t db-image:v1 .
Sending build context to Docker daemon  15.62MB
Step 1/4 : FROM mysql:5.5.45
5.5.45: Pulling from library/mysql
8f47f7c36e43: Pull complete
a3ed95caeb02: Pull complete
795c16ff48e5: Pull complete
66db5e7b465f: Pull complete
30727d3e8c17: Pull complete
0da5ab937ea0: Pull complete
3ab3e6238011: Pull complete
2a9b8556f8f0: Pull complete
8625c7357277: Pull complete
11b81446c751: Pull complete
Digest: sha256:72a09a61824bdaf652e701fcbf0ee12f5b132d8fdeaf1629ce42960375de03cb
Status: Downloaded newer image for mysql:5.5.45
 ---> ba16edb35eb9
Step 2/4 : COPY docker-entrypoint-initdb.d /docker-entrypoint-initdb.d/
 ---> 3f00bed1e02a
Step 3/4 : ENTRYPOINT ["/entrypoint.sh"]
 ---> Running in 982e7343f323
Removing intermediate container 982e7343f323
 ---> 24f4da862742
Step 4/4 : CMD ["mysqld"]
 ---> Running in 6302abd9512a
Removing intermediate container 6302abd9512a
 ---> 19e729dca1ee
Successfully built cc7493e1e1e5

$ docker images
REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
db-image     v1       cc7493e1e1e5   10 minutes ago   213MB
mysql        5.5.45   ba16edb35eb9   4 years ago      213MB

Listing 2-1: Process of creating an image
The layers that are downloaded or created during this process are stored in the local filesystem in the Docker area. The layers are given a unique id and can be reused by multiple images. This works because they are considered immutable (not able to be changed) - read-only. Figure 2-2 shows how you can think of the layers being “stacked” together to form an image.
Once an image is available, the Docker command “run” can be used to create a container from an image. The run operation adds a thin, writable layer onto the image layers. This added layer is where any new content or changes can be placed while still retaining all of the pieces that have been installed and configured in the read-only layers of the underlying image.
Figure 2-3 shows the makeup of a container based on the image we created in Figure 2-2.
And, because images are read-only and containers just add a layer on top of the image, multiple containers can be created based on the same image.
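For example, using the db-image:v1 image built in Listing 2-1, containers could be created like this (an illustrative session; the container names are made up, and the exact IDs will differ on your system):

```shell
# Create and start a container from the image, running detached (-d).
# The run command adds the thin writable layer on top of the image layers.
$ docker run -d --name roar-db db-image:v1

# A second container can be started from the same image, because the
# image layers are read-only and shared between containers.
$ docker run -d --name roar-db-2 db-image:v1

# List the running containers.
$ docker ps
```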
Image Names and Tags
In figures 2-2 and 2-3 above, you can see the long hexadecimal strings associated with the layers. These are the kind of identifiers that Docker attaches to the layers and ultimately to the image and container. However, those identifiers are not very user-friendly. For that reason, Docker provides a way to associate human readable names, known as Docker tags, to images and containers. These tags can be added to an image via the “-t” option in the Docker build command, or via the actual Docker “tag” command.
In reality, tags are just aliases to the IDs of the image - similar to how a tag in source control refers to a particular commit. The word “tag” is actually a bit overloaded here. The full identifier you apply to an image consists of an image name and a separate “tag” part. For example, the tag command itself has this format:
docker tag SOURCE_IMAGE[:TAG] TARGET_IMAGE[:TAG]
The SOURCE_IMAGE and TARGET_IMAGE parts here can be either the original identifier or a meaningful name such as “mysql” or “ubuntu-stable”. Although the :TAG part can be any string you want, it is usually set to a string or set of numbers that identifies the version of the named item - for example “ubuntu:lucid” or “ubuntu:12.04”.
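As a quick sketch, here are both ways of attaching names (the registry and repository names here are invented for illustration):

```shell
# Tag at build time with the -t option:
$ docker build -f Dockerfile_roar_db_image -t db-image:v1 .

# Or add another alias to an existing image with the tag command:
$ docker tag db-image:v1 myregistry.example.com/roar/db-image:1.0

# Both names now refer to the same underlying image ID:
$ docker images
```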
Docker images and the tag “latest”
Docker supports a default tag named “latest”. That means if you don’t supply the :TAG part of an image identifier, Docker will automatically set it to “latest” as in “ubuntu:latest”. However, you need to be aware that this identifier does NOT mean that this is the latest built version of the image. Rather it is simply the latest image that was built that did not have an explicit value supplied for the “:TAG” part of the identifier.
Finally, in this section on containers, I’ll present an analogy for a way to think about containers and images. If this is useful to you, feel free to adopt it. If not, you can ignore it.
An Analogy for thinking about Containers and Images
Figure 2-4 represents the analogy I’ll be talking about here around layers, images and containers. Take a look at it and then I’ll explain what the various pieces represent.
If you’ve ever installed software on a computer, you can think of that as laying down a layer of software on the machine. In fact, if you were to look at this from the standpoint of setting up a computer from scratch, there are multiple “layers” of software that need to be installed - the operating system as a base layer, and then any kind of applications like office ones, antivirus, etc. As we are layering on all of these different software pieces, we are effectively creating an image composed of stacked layers. This is similar to how a Docker image is constructed.
In a company, once those layers are arranged in a way that the company wants and approves of, they are often saved off as a single disk image to be used for quickly provisioning future systems. You can think of that disk image (that is composed of layers of installed software configured to work together) as being like an image for a container.
And just like a disk image of the software can be stored - ready to be used to provision a new system for the company - a Docker image may be stored in a registry (repository), ready to be used for a container. And even if not installed on a running system, the disk image is still there in its complete form.
You could also compare the process of Docker adding the writable layer to the image (to create a container) with installing the disk image on a new computer and then adding the area for a user. The user area on a system provides a “layer” where they can have a profile and write/store their own files, like the purpose of the writable layer in a container.
So, when the user space is set up on a computer and we turn the system on, we have an analogy to a running “container”. Like the user area, there is a writable area that has a prebuilt software image underlying it. So a container can be thought of like a standalone computer provisioned from a software image.
Most modern applications that use containers use multiple containers. The more complex the application, the more containers it is likely to have (assuming functionality is well compartmentalized – such as individual microservices). Deploying, monitoring, scaling, and ensuring these containers are up, running, and can network appropriately can be a complex task. This is where the tool discussed in the next section, Kubernetes, comes in.
Kubernetes is a framework for managing sets of containers. It is responsible for creating and maintaining running sets of containers to conform to a desired deployment state. This includes starting up running instances, ensuring a set number of running instances always exist, spinning up a replacement if one goes down, scaling the number of instances to meet workload (if desired), and generally removing the need for any manual intervention to maintain the desired state.
A full discussion of Kubernetes is well beyond the scope of this book. And there are many resources available to describe it in depth.1 In this book, we’ll limit our discussion of Kubernetes to only what you need to know to understand or work with Jenkins X.
Kubernetes is often abbreviated as “K8S” – sometimes pronounced “cates”. The number 8 here is a reference to the 8 letters in the word between the first letter “k” and the ending letter “s” - thus K8S.
While Kubernetes manages containers, managing Kubernetes itself is done through a declarative process. The types and configuration of native Kubernetes objects are defined via specifications in YAML or JSON files and supplied to Kubernetes through the command line interface or an API call. These specification files are generally referred to as Kubernetes manifests or “specs”.
Kubernetes is a “declared-state” system. That means that it works diligently to ensure that the set and state of objects running in its cluster of machines matches with the desired set and state of objects declared in the manifests provided to it. The function in Kubernetes that works to make sure the state of the cluster matches the declared specifications is called a “reconciliation loop”.
In Kubernetes, the term “cluster” refers to a set of master and worker machines working together to schedule and manage containers in Kubernetes objects.
These machines may be actual “bare metal” systems or virtual machines. The master node receives requests and specifications through the Kubernetes command line interface and the Kubernetes API. It runs processes like the reconciliation loop and other controllers, and also manages scheduling for the system.
The worker nodes handle actually hosting and running the Kubernetes objects.
The items that the reconciliation loop deals with in the cluster form the core of Kubernetes and provide the functionality needed around the containers. In order to understand how this works, you need to understand the native objects that Kubernetes creates and works with. I’ll describe those in the next few sections - what they are, and the function they provide. We’ll start with the most basic Kubernetes object - the pod.
Pods are the smallest deployable units in Kubernetes. Pods wrap around one or more containers and any volumes they have. (A volume is an area to store data, such as a disk.) The pod specification includes information about what images to base the containers on, system requirements, and identifying information. Containers in a pod share network and storage resources. You can think of a pod as a sort of “virtual host” for the containers in it - much as a single system might host a set of Docker containers outside of Kubernetes.
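A minimal pod specification might look like the following sketch (a hypothetical example; the labels and resource values are placeholders, and the image name is borrowed from the examples elsewhere in this chapter):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: roar-web-pod              # identifying information
  labels:
    app: roar-web
spec:
  containers:
  - name: roar-web                # a container inside the pod
    image: localhost:5000/roar-web-v1   # image to base the container on
    ports:
    - containerPort: 8080
    resources:                    # system requirements for scheduling
      requests:
        memory: "64Mi"
        cpu: "250m"
```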
Individual pods are ephemeral - they may be deleted and a new one started in their place. Likewise, the containers inside of them will also be deleted or terminated if the pod goes away. Problems during execution or challenges with system resources may also lead to the pod shutting down or being evicted from the cluster. To avoid having to manually monitor the number of pods and manually start new ones, we need a way to ensure that we always have the needed number of pods available to support the expected or actual load. Ensuring that is the job of our next type of Kubernetes object - deployments.
Deployments instantiate pods and also manage pod replicas - additional copies of pods to handle a workload. The deployment ensures that, if the actual number of pods is less than the desired number of replicas, additional pods are spun up to get to the desired number. A construct called a ReplicaSet is used to ensure that a given number of any particular pod are running at any point in time.
For example, if the deployment declares that we want 5 instances of pod A running, and there were 5, but one failed and now there are only 4, the deployment controller will spin up another one to keep the specified 5 running. In relation to pods, deployments can be used for a number of tasks, such as rolling out a ReplicaSet, scaling to more or fewer pods, pausing/resuming, and cleanup.
Deployments and replicasets can ensure that a desired number of pods are maintained in the cluster. However, they don’t ensure that the same pods will always be available - only a certain replica count. Since individual pods may still come and go, you can’t rely on being able to always attach to the same pod via methods such as an IP address. So you need a way to have a dedicated IP address to attach to - regardless of what pods are used on the back end. This is where Kubernetes services come in.
Services provide a virtual IP (and optionally load balancing) for a set of pods, along with policies for accessing them. Within Kubernetes, each pod gets its own IP address. However, since pods may be terminated at some point or may be sharing workloads, we do not want to be tied to any particular pod’s IP address. The service allows us to have one IP address and to route traffic to one or more pods on the backend transparently. For accessing items in a cluster, you typically have a Kubernetes ingress.
An ingress is a set of rules/apis that define how you allow external access to services in a Kubernetes cluster. Ingresses can also provide load balancing, virtual hosts, and SSL functionality.
In basic terms, you configure access by defining rules that specify which inbound connections can reach which services in your cluster. Ingresses allow you to have your set of routing rules in the single Ingress resource.
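As a sketch of what such a rule looks like, here is a hypothetical ingress specification that routes inbound traffic for a host to one of the services from this chapter’s examples (the hostname is invented, and the apiVersion shown here varies by Kubernetes release):

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: roar-web-ingress
spec:
  rules:
  - host: roar.example.com       # virtual host for inbound connections
    http:
      paths:
      - path: /                  # which inbound paths this rule covers
        backend:
          serviceName: roar-web  # route matching requests to this service
          servicePort: 8089
```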
Pods, ingresses, deployments, and services are all grouped together for related functions in Kubernetes namespaces. Namespaces are the domains of Kubernetes – the way that a cluster is divided up for use. For example, the set of pods that provide the default Kubernetes functionality are set up and run in the “kube-system” namespace. Which namespace is used by default is controlled by the system’s context.
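Working with namespaces from the command line is straightforward; for example (an illustrative session, using a made-up namespace name):

```shell
# List the namespaces in the cluster, including kube-system:
$ kubectl get namespaces

# Create a new namespace to group a set of related objects:
$ kubectl create namespace roar

# Make it the default namespace for the current context, so later
# commands don't need an explicit -n option:
$ kubectl config set-context --current --namespace=roar
```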
The Kubernetes Command Line
When working with Kubernetes, most of your direct interfacing will be via the Kubernetes command line. The command we run for that is “kubectl” (short for “kube control” but often pronounced “kube cuttle”). There are a couple of primary ways to interact with a cluster via kubectl. For example, you can take a yaml file with a specification for a pod, service, deployment, etc. and apply it to the cluster with “kubectl apply -f <filename>”. Listing 2-2 shows a yaml file with a deployment and service specification.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: roar-web
  labels:
    name: roar-web
  namespace: roar
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: roar-web
    spec:
      containers:
      - name: roar-web
        image: localhost:5000/roar-web-v1
        ports:
        - name: web
          containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: roar-web
  labels:
    name: roar-web
  namespace: roar
spec:
  type: NodePort
  ports:
  - port: 8089
    targetPort: 8080
    nodePort: 31789
  selector:
    name: roar-web
Listing 2-2: Yaml file specification for Kubernetes deployment and service objects
This means that, assuming the file is syntactically correct, Kubernetes will try to make the current state of the cluster match the declared state in the yaml file. Along with applying changes, the command line can be used to create, examine, and/or delete any of the various entities, “exec” into a running pod, get the logs from a pod, and so on. If you want an operation to apply to a particular namespace, you can supply the -n <namespace> option or set that namespace as the default one in the context. Listing 2-3 shows an example of using the command line in various ways to get a list of services from a namespace.
$ kubectl get service -n roar2
NAME       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)
mysql      ClusterIP   10.109.196.123   <none>        3306/TCP
roar-web   NodePort    10.106.132.164   <none>        8089:30318

$ kubectl -n roar2 get svc
NAME       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)
mysql      ClusterIP   10.109.196.123   <none>        3306/TCP
roar-web   NodePort    10.106.132.164   <none>        8089:30318

$ kubectl get -n roar2 all | grep service
service/mysql      ClusterIP   10.109.196.123   <none>   3306/TCP
service/roar-web   NodePort    10.106.132.164   <none>   8089:30318
Listing 2-3: Using the Kubernetes command line
This quick summary is just to give those who don’t have Kubernetes experience an idea of some of the basic concepts. We’ll cover any specific commands you might want to use in more detail throughout the book.
As you can begin to see, the combination of containers and Kubernetes provides the foundational functionality we need for being cloud-friendly (spin-up, spin-down, scaling, etc.) When Jenkins X is deployed, it is deployed in its own “jx” namespace (by default). And each application that Jenkins X makes use of gets its own pod in that namespace. Listing 2-4 shows the output of the pods running in the jx namespace for a deployed Jenkins X.
$ kubectl get pods -n jx
NAME
jenkins-64b6bc7589-h9xr2
jenkins-x-chartmuseum-d87cbb789-2zfb6
jenkins-x-controllerrole-9c55bdb47-8m8rm
jenkins-x-controllerteam-56d5884f7f-ndfmb
jenkins-x-controllerworkflow-c77d44668-fvx9r
jenkins-x-docker-registry-69d666d455-8nppm
jenkins-x-heapster-ff6df6848-jjdjf
jenkins-x-nexus-6bc788447f-gwqpw
Listing 2-4: List of pods in a Jenkins X namespace
Finally, in this section, as I did for containers, I’ll present an analogy for a way to think about Kubernetes. Again, if this is useful to you, feel free to adopt it. If not, you can ignore it.
An Analogy for thinking about Kubernetes
Recall the analogy from earlier in the chapter where I compared containers to physical computers staged with all the different layers of software they needed to run. Carrying that analogy further, one way to think about a Kubernetes system is as a data center for those computers.
Figure 2-5 shows a visual of these ideas. You can gather from this analogy that Kubernetes is concerned with many of the same goals as a data center would be, such as uptime, scaling, and redundancy.
The overall functions of a typical data center can be summarized as:
- Provide systems to service needs (regardless of the applications)
- Keep systems up and running
- Add or remove systems depending on load
- Deal with systems that are having problems
- Deploy new systems when needed
- Provide simple access to pools of systems
From that perspective, you can make the following comparisons:
You can think of a container in a pod as being like a server in a rack. It’s a standalone system enclosed in a supporting structure.
You can think of a pod of containers as being like a rack of servers - both designed to enclose/support multiple systems. And if one server goes down, there is an attempt to restart. If it can’t be restarted, it is “evicted” (removed from the rack), and a new one is “spun up” to put in its place in the rack.
A deployment can be compared to multiple server racks. Just as it has a replica count to specify the number of pods that should be used to support a need/load, so you might have a number of server racks to support a need/load. Likewise, if additional systems are needed to scale up to handle load, new server racks can be added, just as a deployment could add new pods. And if they’re no longer needed, the pods/racks can be removed.
A service can be compared to a central login server or control computer that transparently connects, on the back end, to a set of other systems and monitors them. The service provides a way to connect to any of a set of pods transparently. This allows doing work with them without having to know/care which one you’re actually using. A load balancing system in a data center could also be a good comparison.
Finally, a namespace can be compared to a server room - where there is a collection of racks, control and monitoring systems, etc. It encompasses those systems and provides an environment for them to run in - along with a designated location where they exist.
One other note to mention here is that Kubernetes also provides a visual dashboard interface. This dashboard does not allow you to modify the cluster or objects within it, but provides a graphical environment for listing those objects and drilling in to get more information about them. Figure 2-6 shows an instance of the Kubernetes dashboard running over a basic cluster and drilling into the system namespace - kube-system.
This should provide the basics of what you need to understand about containers, Docker, and Kubernetes for working with Jenkins X. But there is one other tool, featuring prominently in working with Kubernetes, that Jenkins X leverages - Helm. I’ll discuss that tool last in this section.
Helm provides a way to install and manage related sets of Kubernetes specifications. These specifications can be for any valid Kubernetes objects such as services, deployments, etc. and can be for any number of them. So this is especially useful for managing specifications for multiple projects or products.
A set of running services, deployments, etc. resulting from an install by Helm into a Kubernetes cluster is called a “release”. Helm maintains information about each release, including its history. This allows releases to be rolled back to previous versions as well as easily upgraded via the Helm command line.
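A typical release lifecycle might look like the following (an illustrative session using Helm 3 command syntax; the chart and release names are placeholders):

```shell
# Install a chart into the cluster, creating a release named "roar":
$ helm install roar ./roar-chart

# Upgrade the release after changing the chart or its values:
$ helm upgrade roar ./roar-chart

# View the history Helm keeps for the release:
$ helm history roar

# Roll back to an earlier revision if the upgrade misbehaves:
$ helm rollback roar 1
```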
Helm provides additional utility for Kubernetes specifications through a templating capability. This means that the traditional specification formats can have placeholders in them, in the form of Go templates, instead of hard-coded values. In other words, Helm templates are just specifications for Kubernetes objects with places to plug in passed-in values, and calls to functions that return values to use.
When Helm ultimately “renders” one of these, it will fill in any values passed in, run any functions specified and produce a fully complete specification file (manifest) that can be passed into Kubernetes. The values used to fill in the templates can come either from separate values files, or be passed on the command line. The benefits of templating include:
- Not having to hard-code values in the specifications
- Not having to repeat values throughout similar specifications
- The ability to override default values
- Template reuse for multiple instances given the above
- Ability to compute and modify rendered values in templates through the use of template functions
Template functions for use with Helm are provided by Go templates and the Sprig library of template functions. Examples of functions include simple transformation ones such as changing text to upper case or quoting it in the rendered template. More complex functions can also be created to do things such as read in data in one form, and fill in the template in a different format. Or functions can do calculations such as adding an offset to a starting port passed in on the command line.
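As a sketch of what this looks like in practice, here is a fragment of a hypothetical service template and a matching values file (the names, values, and functions shown are invented for illustration; upper, quote, and add are standard Sprig/Go template functions):

```yaml
# templates/service.yaml - placeholders in Go template syntax
apiVersion: v1
kind: Service
metadata:
  name: {{ .Values.serviceName }}
  labels:
    app: {{ .Values.serviceName | upper | quote }}   # Sprig transformations
spec:
  type: NodePort
  ports:
  - port: {{ .Values.basePort }}
    targetPort: {{ add .Values.basePort 1 }}         # computed via a function
---
# values.yaml - default values, overridable on the command line
# serviceName: roar-web
# basePort: 8089
```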
Helm organizes the set of Kubernetes specifications for an application in a structure known as a “chart”. Typically a chart includes the various specification files as templates in a templates directory. A chart can also include simple files that specify the chart name/version as well as helper functions.
Charts may also specify other required charts to be included through a requirements file.
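Laid out on disk, a simple chart might look like this (a typical structure; the chart name is illustrative):

```
roar-chart/
  Chart.yaml          # chart name and version
  values.yaml         # default values for the templates
  requirements.yaml   # other required charts to include
  templates/
    deployment.yaml   # templated Kubernetes specifications
    service.yaml
    _helpers.tpl      # helper template functions
```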
To assist in managing and organizing charts, Helm provides a repository implementation and format for storing charts as packages. As an alternative, charts can simply be managed and stored as text files in a source repository such as Git.
In Jenkins X, Helm is used to render and apply manifests for Kubernetes into different namespaces and environments. In this way, by putting the desired values into the templates, Helm creates Kubernetes manifests for a particular namespace/environment. Then, by applying those generated manifests, the appropriate Kubernetes objects actually get created as needed for a particular project.
This concludes the discussion of the cloud technologies used in Jenkins X. Certainly there is much more that could be said, but I’ll cover additional information in the chapters that follow where needed. Now I’ll move on to the technologies that Jenkins X uses for constructing and running CI/CD pipelines.
Technologies used for CI/CD pipelines
In this section, I’ll discuss the different technologies that Jenkins X uses to construct and run CI/CD pipelines. These technologies are central to Jenkins X, since pipelines are the core of its functionality.
You might imagine that Jenkins X would use Jenkins itself. And, for one option early in the life of the tool, it did. This was referred to as the “static” option for implementing the CI/CD processes. But that option was inefficient and problematic and so was removed completely. Jenkins X now exclusively uses a more “Kubernetes-native” way to implement the CI/CD processes. (This was formerly referred to as the serverless or “next gen” option.)
Since Jenkins X is completely different from, and does not use, traditional Jenkins, it is worthwhile to take a moment and clarify the differences in the versions of Jenkins that have been made available over the years. Understanding the differences between these is not technically required, and you can feel free to skip over this part if you prefer and move on to the up and running content in chapter 3.
But for those new to Jenkins X, a quick explanation may be useful - both on what the different versions are for and how they differ. So I’ll cover that next.
Whether or not you choose to skip that background, note that within this section I’ll also cover the pipeline technologies themselves.
From Hudson to Jenkins X, the incarnations of the technology associated with the “Jenkins” name have undergone a significant amount of change over the years.
Take a look at Table 2-1. It shows the various Jenkins “platforms” that have been developed over time.
Table 2-1: Jenkins platforms over time

| Platform | Purpose | Implementation | Job definition | Container support | Unit of work |
| --- | --- | --- | --- | --- | --- |
| Hudson (original and commercial) | CI | Hudson application + plugins | Web-based forms | Minimal | Jobs |
| Jenkins | CI and by extension CD | Jenkins application + plugins | Web-based forms | Minimal | Jobs |
| Jenkins 2 | CI and by extension CD | Jenkins application + plugins + shared libraries + containers | Pipeline jobs and Jenkinsfiles | Integrated | Pipelines |
| Jenkins X | CI and CD | Kubernetes + containers + Helm + Prow/Lighthouse + Tekton pipelines + Git + more… | Jx command line (under the covers Helm) | Native | Environments (Dev, Staging, Production) |
Let’s take a brief look at each one of these in turn.
Hudson was the original version of the traditional Jenkins application. After Oracle bought Sun (the company where Hudson was being developed), the new owners decided to take Hudson in a more commercial direction. Jenkins was then forked from the Hudson source to allow the project to continue as a community-sponsored, open-source effort. Since the Jenkins line is what evolved into the tooling we know today, we’ll discuss features of that tooling in the next section and just note that Hudson continued on as a commercial offering.
Here I’m referring to the traditional Jenkins (forked off of the original Hudson tooling after that was retargeted for commercial use). At its core, this is a tool for doing CI and job monitoring. (CI here refers to Continuous Integration – the process of initiating a build/test process based on changes in a source code repository.) The primary functionality is provided by the Jenkins application itself and whatever plugins are installed for it.
The basic setup for working with traditional Jenkins goes something like this. Applications (compilers, servers, repositories) are installed on systems. Jenkins is configured to know where the applications are installed and how to get to them via web forms in a global configuration section. Figure 2-7 shows an example of a global configuration form - in this case for the Gradle build engine.
Individual jobs in Jenkins are defined via web-based forms (drop-downs, text-entry boxes, etc.). These forms allow users to describe the environment and specify the applications needed for the job, as well as the basic steps to execute as part of it.
Jenkins provides jobs of certain pre-defined types to make setup simpler. These include ones tailored for specific technologies such as Maven. It also includes the popular and ubiquitous Freestyle type which allows the user more generic options for the various pieces. Figure 2-8 shows an example of a Freestyle job that uses the Gradle configuration from figure 2-7.
The next evolution of Jenkins materialized with the release of Jenkins version 2.0 and beyond – what I am referring to here with the umbrella term “Jenkins 2”. The emphasis in Jenkins 2 is on “pipelines-as-code” – that is, being able to write your pipelines as instructions in text, just as you would any script or program, instead of using forms in the browser. Further, you are able to put the pipeline code in an external text file named “Jenkinsfile” and store that in a branch of your source code repository. This allows you to treat your pipeline code just like any other source code – tracking changes to it over time, easily comparing versions, performing code reviews, etc. Figure 2-9 shows an example of such a Jenkinsfile stored in a GitHub project.
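To give a flavor of what such a file contains, here is a minimal declarative Jenkinsfile sketch (the Gradle commands in the stages are placeholders for whatever your project actually builds with):

```groovy
// Minimal declarative Jenkinsfile (stage contents are illustrative)
pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                sh './gradlew assemble'
            }
        }
        stage('Test') {
            steps {
                sh './gradlew test'
            }
        }
    }
}
```

Because this lives in the repository alongside the application source, a change to the pipeline is reviewed and versioned exactly like a change to the application itself.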
In addition to Jenkinsfiles, Jenkins 2 provides several other innovations and improvements for the Jenkins tooling. Those include
- Multibranch Pipelines
- Automatic Builds with Pull Requests
- Shared Libraries
- More integration with Docker and better support for containers
I’ll discuss those items next.
Multibranch Pipelines
Jenkins 2 also introduced the concept and implementation of “Multibranch Pipelines”. The basic idea of the Multibranch Pipelines functionality is automatic generation of Jenkins jobs (within an instance of the Jenkins application) based on the presence of a Jenkinsfile in a branch of a source control project.
This lets you store the pipeline code externally with a project’s source code and treat it just like another source file for continuous integration. Since Jenkins can create internal jobs based on the external file, the pipeline effectively becomes just more source. This gets you closer to the DevOps ideal of everything-as-code.
More specifically, Multibranch Pipelines follow this basic workflow:
- Users create their pipeline code in a Jenkinsfile and push it into a branch in a project in source control
- They then create a Multibranch Pipeline project in Jenkins and point it to the project in source control as seen in Figure 2-10
- Jenkins will scan the branches of the project in source control looking for Jenkinsfiles
- If it finds a Jenkinsfile in a particular branch, Jenkins will automatically create a read-only project to run the code specified in the Jenkinsfile
- Jenkins will then execute the code to do the initial build
Automatic Builds with Pull Requests
Jenkins 2 leverages the Pull Request functionality provided by several Git hosting services. (See the sidebar for information on what a Pull Request is if you’re unfamiliar with the concept.) With the Jenkins 2 pipeline-as-code functionality, a new Pull Request in source control will also trigger a new build in Jenkins. This build is used for verifying correctness of the code in the Multibranch Pipeline project. Likewise, when the Pull Request is merged, Jenkins does another build to verify everything is good (build-wise) after the merge.
In both cases, results of the Jenkins operations are reported back to the source control system and registered there. This serves as a pass/fail gate for deciding whether or not to proceed with the next steps of the pipeline. Figure 2-11 shows an example of an automatic build running from a Pull Request, and figure 2-12 shows the automatic tracking of this in source control.
Plugins and Shared Libraries
The ability to write pipelines as code is not entirely new with Jenkins 2 – early versions of Jenkins had a set of workflow plugins that allowed a limited version of this. However, Jenkins 2 incorporates this concept as expected core functionality. Along with the changes in the application itself, plugins were required to evolve – to be restartable and also to provide “steps” that the Jenkins pipeline scripts could call. For example, if you have the Git plugin installed, then you would have a “git” step that you could call in your pipeline script to pull code.
Beyond just the individual steps, it is useful to be able to share code, encapsulate functionality, and abstract out complexity from everyday pipeline scripts. This problem was solved by the introduction of shared libraries - a set of pipeline steps and other supporting code that can be stored in a source code project with a defined structure and pulled in to any pipeline scripts that need it. An example structure of a shared library can be seen in Listing 2-5.
.
├── resources
├── src
│   └── org
│       └── foo
│           └── utilities.groovy
└── vars
    └── mailUser.groovy
Listing 2-5: Structure of a shared pipeline library
Figure 2-13 shows how this shared library is configured in Jenkins.
And Listing 2-6 shows an example of code to pull it into a script.
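The library name used here (“my-shared-library”) is illustrative and must match the name the library is configured under in Jenkins; the arguments to the custom step depend on how vars/mailUser.groovy is written:

```groovy
// Pulling in a configured shared library with declarative syntax
pipeline {
    agent any
    libraries {
        lib('my-shared-library')
    }
    stages {
        stage('Notify') {
            steps {
                // Custom step supplied by vars/mailUser.groovy in the library
                mailUser 'build-results'
            }
        }
    }
}
```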
Listing 2-6: Pulling in a shared library (declarative syntax)
Docker and Container Integration Improvements
When most people think of containers today, they still think of Docker. Docker is a set of commands and specifications for creating and working with containers. With the Docker plugin installed, Jenkins 2 provides significant support for working directly with containers. This includes an available “docker” global variable (similar to a pipeline step) and a set of methods around it for simplifying use of containers in your pipeline code. For example, there is an “inside” method which allows the user to do the following with one call:
- Pull a docker image if not already available
- Create a container from the image
- Map the Jenkins working directory into the container (assuming filesystem access)
- Execute any shell calls in the pipeline script directly in the container
- At the end, stop and remove the container
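The steps above can be sketched in a scripted pipeline like this (the image name and Maven command are illustrative):

```groovy
// Scripted pipeline: run build steps inside a container via the
// Docker plugin's "inside" method (image and command are illustrative)
node {
    checkout scm
    docker.image('maven:3-jdk-11').inside {
        // Runs inside the container, with the Jenkins workspace mapped in;
        // the container is stopped and removed automatically afterwards
        sh 'mvn -B clean verify'
    }
}
```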
These simplified ways to work with containers in Jenkins open up a number of possibilities for doing work in containerized environments that formerly would have required much more direct configuration to accomplish. This means we have less specification, setup, monitoring, etc. within the Jenkins jobs – we can take care of all of that within the containers. In fact, there is a whole philosophy around leveraging containers to maximum benefit called “Use Jenkins Less”, as described in the sidebar.
This brings me to Jenkins X. As referenced earlier, Jenkins X doesn’t use Jenkins or any application that is spun up and kept running. It leverages other technologies to create and run pipelines. That may seem surprising and confusing, but I’ll explain more about it (briefly) in the next section, and go into much more detail later in the book.
(Serverless) Jenkins X
Jenkins X without Jenkins seems counter-intuitive. If Jenkins is the basis of creating and running pipelines, why would you not have it present in all versions of Jenkins X? One reason is the overhead that having an application like Jenkins running in the cluster requires. If you think about the Jenkins application, it was really intended for long-running, browser-based instances. While we can wrap this in a container to get more “cloud-like” control, this is not optimal. A better approach is a “serverless” implementation.
The term “serverless” sounds like you don’t need any servers to run your code. In most cases, the term instead refers to the ability to define a process that only spins up resources, such as a server, when a workload is submitted – i.e., as needed. Then, when done, the resources can be deleted or directed elsewhere. Implementations of this idea are already in wide use, such as AWS’s Lambda functions.
In Jenkins X, the term means no server – or, in this case, no Jenkins. Instead, Jenkins X takes the approach of creating pipelines as Kubernetes-native objects. What this means is that the pipeline steps are “written out” in a YAML format similar to Kubernetes manifests and run as pods in a cluster. This is done with two primary technologies: Tekton Pipelines and Prow/Lighthouse.
The idea behind Tekton Pipelines is to define pipelines that are Kubernetes-native: Tekton encapsulates pipeline steps in containers and pods and runs them on Kubernetes clusters. Such pipelines are well suited for the Kubernetes environment and run closer to the execution layer. Additionally, this approach has less overhead and is easier to manage with the Kubernetes tooling.
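As a flavor of the format, a trivial Tekton task might look like the following sketch (the apiVersion and image may vary by Tekton release; a real task would also declare workspaces or resources to obtain the source):

```yaml
# A trivial Tekton task: each step runs in its own container
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: build-and-test
spec:
  steps:
    - name: build
      image: golang:1.17
      script: go build ./...
    - name: test
      image: golang:1.17
      script: go test ./...
```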
The disadvantage of Tekton pipelines is that they are written at a very low level as YAML files and require a lot of coding for even simple tasks. To get the advantages of Tekton pipelines without the disadvantages, Jenkins X leverages a tool called “Prow” (being updated to “Lighthouse”) to trigger creation of the Tekton pipeline specifications based on events from Git, driven by a jenkins-x.yml file – a concept similar to the Jenkinsfile used in Jenkins 2.
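The exact jenkins-x.yml schema has varied across Jenkins X releases, but as a rough sketch of the general shape (the build pack, agent image, and commands here are illustrative):

```yaml
# Rough sketch of a jenkins-x.yml (schema details vary by release)
buildPack: none
pipelineConfig:
  pipelines:
    pullRequest:
      pipeline:
        agent:
          image: go
        stages:
          - name: ci
            steps:
              - command: make test
    release:
      pipeline:
        agent:
          image: go
        stages:
          - name: build
            steps:
              - command: make build
```

Compared with hand-written Tekton YAML, this is a much higher-level description; Jenkins X translates it into the corresponding Tekton tasks and pipelines for you.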
Prow and Lighthouse
Per the Prow GitHub site, “Prow is a Kubernetes based CI/CD system. Jobs can be triggered by various types of events and report their status to many different services. In addition to job execution, Prow provides GitHub automation in the form of policy enforcement, chat-ops via /foo style commands, and automatic PR merging.” That’s a lot to digest. But for our purposes, all we really need to know for now is that Prow is the connection between Git and Tekton. This means that operations you do in the Jenkins X workflow that update Git can trigger Prow to execute CI/CD pipeline operations in Tekton.
Prow can receive input in two ways – either via comments added to a PR or via notification from a webhook from a Git repository. When Prow gets notified via a webhook that a request has been made, it passes the request on to the “Jenkins X Pipeline Operator”. (See the sidebar for a description of what an operator in Kubernetes means.) This Pipeline Operator then gets the pipeline specification stored in a jenkins-x.yml file from the repository and translates it into Tekton tasks and/or Tekton pipelines.
Lighthouse is a further evolution of Prow. It includes all of the functionality of Prow, plus more. The most significant difference is that Lighthouse can work with any of the multiple Git providers that Jenkins X supports; it is not limited to GitHub as Prow is.
Lighthouse was forked from Prow by the Jenkins X community in order to create a tool that could handle the multiple Git providers.
The Pipeline Operator we mentioned above is an instance of a Kubernetes Operator. A Kubernetes Operator is a set of code designed to manage a non-native (custom) application running in Kubernetes. It is intended to be able to respond to Kubernetes lifecycle events such as scaling, backups, updates, etc. The operator lets you build in extra logic and custom handling. The idea is that the operator allows the custom application/object to respond to Kubernetes events in the way that Kubernetes would expect, while also providing custom processing and executing extra logic in the way that the application/object needs.
So, with the use of Prow, Tekton pipelines, and some other tooling, Jenkins X lets you create and execute tasks and pipelines on-demand in a form that Kubernetes can more easily and natively work with.
Jenkins 2 and Serverless
There actually was exploration at one point of using a true Jenkins instance for the serverless flavor. The idea was to start up a Jenkins instance “on-demand” when work needed to be done and then have it go away immediately afterwards. Ultimately it was decided that this was too much overhead and was trying to coerce Jenkins into running in a way that it wasn’t designed for.
Which Jenkins to choose?
Obviously, since you’re reading this book, you have an interest in Jenkins X. But you may still be wondering whether it’s better to use one of the older versions, or why anyone would. Here are a few thoughts on how to choose which version is best for you:
- If you have older, legacy jobs in traditional Jenkins and you don’t need to migrate them to a cloud environment, you don’t need Jenkins X. However, you should consider whether it’s worthwhile to migrate them to pipelines-as-code and use Jenkins 2 if you intend to make significant further changes to them. (Processes for doing this are outlined in the Jenkins 2 – Up and Running book.)
- If you have existing jobs in a Jenkins 2 environment and don’t need to migrate them to a cloud or Kubernetes environment, then you may be better off just leaving them as is.
- If you want your jobs to run in the cloud and you can support them running in containers, then you should consider migrating them to Jenkins X. (The most likely way to do this would be with the Jenkins X import command that we’ll talk about later in the book.)
With the technologies discussed so far, Jenkins X has the basis for managing automation in the cloud and pipeline spaces. However, it still needs some other technologies to form a complete system. I discuss those remaining technologies next.
Supporting Technologies
In this section, I’ll briefly discuss a few of the supporting technologies that Jenkins X uses and what they are used for. These include ChartMuseum, Skaffold, Ksync, Nexus, exposecontroller, and Monocular, as well as several Prow-related components (Crier, Deck, Tide) and Kaniko.
By the term “supporting” here, I mean they do not implement critical core functionality for Jenkins X. Rather they help simplify processes or fill in gaps in functionality.
For some of these, it may become possible in the near future to replace them with alternative applications that do a similar task.
Earlier in this chapter, we discussed the Helm tool and how it provides a way to install and manage applications in Kubernetes. Helm does this by using charts – sets of templates and associated files with helper functions and values that get rendered into the templates.
To help with managing the sets of files associated with a chart, Helm provides the ability to “package” a chart into a single object and store packaged charts in “chart repositories”. A chart repository is basically an HTTP server that provides access to store and retrieve the packaged charts; it also maintains an index of them.
ChartMuseum is an open-source implementation of such a repository for packaged Helm charts. It has a command-line interface and also an API for working with charts – uploading, deleting, getting a list of the charts, etc. It is written in Go and is made to be compatible with the major cloud storage backends.
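For example, the ChartMuseum HTTP API can be exercised with plain curl commands like these (the server URL and chart name are illustrative):

```shell
# Upload a packaged chart to a ChartMuseum instance
curl --data-binary "@mychart-0.1.0.tgz" http://chartmuseum.example.com/api/charts

# List the charts in the repository
curl http://chartmuseum.example.com/api/charts

# Delete a specific chart version
curl -X DELETE http://chartmuseum.example.com/api/charts/mychart/0.1.0
```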
Skaffold is a command-line tool that makes it easier to do CD for Kubernetes applications. It allows you to iterate on your app’s source locally and then deploy it to Kubernetes clusters. It helps with managing the workflow for apps – from build to deployment – and with promoting applications through different “levels” such as test, staging, and production.
To be more specific, it can handle doing builds via Docker and redeploying apps using either the Kubernetes command line or Helm.
Jenkins X makes use of Skaffold to create Docker images in its pipelines.
Skaffold has a “dev” command that does the following:
- Watches your source for changes
- Syncs files to pods (if marked as syncable)
- Builds artifacts from source
- Tests built artifacts using container-structure-tests
- Tags, pushes, and deploys artifacts
- Monitors the deployed artifacts
- Cleans up deployed artifacts at exit
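That workflow is driven by a skaffold.yaml file in the project. A minimal sketch might look like the following (the apiVersion varies by Skaffold release, and the image and chart names are illustrative):

```yaml
# Minimal skaffold.yaml sketch
apiVersion: skaffold/v2beta12
kind: Config
build:
  artifacts:
    - image: myorg/myapp       # built from the Dockerfile in this directory
deploy:
  helm:
    releases:
      - name: myapp
        chartPath: charts/myapp
```

With a file like this in place, running `skaffold dev` watches the source, rebuilds the image on changes, and redeploys the Helm release to the cluster.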
As you can see, Skaffold is a very powerful tool that enables some much-needed functionality for Jenkins X.
As the name implies, Ksync is used to sync (update) content. In this case, it syncs files on your local file system with containers running in a cluster. One of the major benefits this tool provides is that it allows you to use an IDE to work with objects inside the cluster.
Jenkins X uses Ksync to help implement DevPods – developer-facing pods with IDEs to make changes easily. More about DevPods can be found in chapter ???
Nexus is an artifact repository. An artifact repository is used to store (usually binary) artifacts, much like a source control repository is used to store source. In Jenkins X, Nexus is used as a cache for dependencies to improve build times. Nexus is an example of an application that could potentially be replaced with another application with similar functionality – in this case, Artifactory. But as of the time of this writing, that option does not yet exist.
Monocular is a web UI that is used for search and discovery of Helm charts. It is used to make it easy to find and view the Helm charts being used in Jenkins X. Note that it is optional and not required to run Jenkins X.
Jenkins X installs an NGINX ingress controller if needed, with an external load balancer pointing to its Kubernetes service. Jenkins X also generates all the needed Kubernetes ingress rules using a tool named “exposecontroller”.
Exposecontroller’s job is to create the ingress rules or OpenShift routes, or to otherwise modify services, in order to expose them. In Jenkins X, it is executed as a Kubernetes job triggered by Helm when an application is installed to the cluster. It allows Jenkins X to control the ingress rules from a single place, instead of each app having to know how to expose its service outside of the cluster.
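In the fabric8/Jenkins X tooling, services are typically marked for exposure with an annotation that exposecontroller looks for; a sketch might look like this (the service name and port are illustrative):

```yaml
# Marking a service for exposure via exposecontroller
apiVersion: v1
kind: Service
metadata:
  name: myapp
  annotations:
    fabric8.io/expose: "true"   # annotation recognized by exposecontroller
spec:
  ports:
    - port: 80
  selector:
    app: myapp
```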
It also allows for switching between protocols, such as HTTP and HTTPS, and integrates with TLS certificate projects like cert-manager.
Crier is a Kubernetes controller. A controller in Kubernetes essentially watches the status of resources running in the cluster, notifies the system if changes need to be made or problems with a resource occur, and ultimately drives any needed adjustments. Crier watches custom resources associated with Prow jobs and notifies applications external to Jenkins X, such as GitHub, when the status of Prow jobs changes.
Deck is a UI that shows a list of recent Prow jobs and status information about them. This is similar to the way the traditional Jenkins and Jenkins 2 UIs would show lists and status of Jenkins jobs.
Tide is used to automatically merge Pull Requests (PRs) when they meet a set of predefined criteria. It can also automatically retest PRs. It is not event-driven, but polls periodically to see if there is work it needs to do.
Kaniko is an application to build images for containers (from a Dockerfile) within a container or Kubernetes cluster.
Kaniko executes the commands from a Dockerfile in the user space and so doesn’t require a Docker daemon. This means it can build images in places where Docker can’t easily be executed.
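Kaniko is typically run as a pod in the cluster; a sketch might look like the following (the registry, repository URL, and image tag are illustrative):

```yaml
# Running the Kaniko executor as a pod
apiVersion: v1
kind: Pod
metadata:
  name: kaniko-build
spec:
  restartPolicy: Never
  containers:
    - name: kaniko
      image: gcr.io/kaniko-project/executor:latest
      args:
        - --dockerfile=Dockerfile
        - --context=git://github.com/myorg/myapp.git
        - --destination=myregistry.example.com/myapp:latest
```

Because the executor runs entirely in the container’s user space, no privileged access to a Docker daemon on the node is required.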
That concludes the overview of the main technologies that Jenkins X uses to accomplish its cloud, pipeline and supporting tasks. As you can see there is a lot to coordinate here and a lot that Jenkins X is managing for you. There is one more fundamental approach used in Jenkins X that you should be aware of - GitOps.
Managing By GitOps
The default approach in Jenkins X is to manage the workflow from development through production via GitOps. Managing by GitOps here means a couple of things:
- Jenkins X is creating Git repositories for storing and tracking the state of your environments and related pieces
- When you run a selected set of commands in Jenkins X that modify content such as the environment or apps, Jenkins X will create Pull Requests in the dev environment’s Git repository.
GitOps is an implementation of the idea of storing and managing everything in a Git repository and having the cluster updated via some process that watches for changes in that repository. When a change is requested, a Pull Request is created for you to approve. In this way, you get to see what changes would be made to the system (by looking at the diffs in the Pull Request) before merging. If the Pull Request is merged, the change is committed to the file(s) in Git and then the corresponding updates are made in the cluster. This lets you use the abilities of Git (tracking, diffing, reset/revert) to manage changes to the cluster.
Within Jenkins X, GitOps is now the default. That implies a --gitops option for operations. However, there may still be some specific options to certain commands that only work if you are NOT managing your environment with GitOps. For those, the --nogitops argument needs to be added.
In this chapter, I explored the underlying technologies that Jenkins X uses to accomplish much of its functionality. I also looked at the gamut of Jenkins versions out there - from Hudson to Serverless Jenkins X - and how they differ from each other.
While Jenkins X is essentially a completely different application than the other versions of Jenkins, it is worth considering which “Jenkins” is the best fit for your particular purpose. You can use the outlines and summaries of the respective versions to guide you in that decision.
The chapter also explored the underlying technologies used in Jenkins X. The key thing to remember here is that you do not have to understand all of these technologies to use Jenkins X. Jenkins X handles the complexity and coordination of these technologies for you, weaving them together to create a CI/CD pipeline and integrated DevOps flow.
Now that you understand the general ideas around Jenkins X, in the next chapter I’ll go through how to get set up to use it, including how to configure it and get it up and running on different cloud providers.
1 See for example: https://kubernetes.io/docs/home/ or https://www.katacoda.com/courses/kubernetes