Chapter 1. IBM Spectrum Scale and Containers Introduction – IBM Spectrum Scale CSI Driver for Container Persistent Storage

IBM Spectrum Scale and Containers Introduction
IBM Spectrum Scale is a proven, scalable, high-performance data and file management solution. It provides world-class storage management with extreme scalability, flash-accelerated performance, automated policy-based storage tiering from flash through disk to tape, and support for various protocols, such as NFS, SMB, Object, HDFS, and iSCSI. Containers can leverage its performance, Information Lifecycle Management (ILM), scalability, and multisite data management to get the same flexibility for storage that they already have at run time.
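This chapter covers the following topics:
1.1 Abstract
1.2 Assumptions
1.3 Key concepts and terminology
1.4 Introduction to persistent storage for containers
Understanding these topics helps with the subsequent chapters about planning, use cases, and troubleshooting. Although reading this chapter is not mandatory, it is recommended for all readers.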
1.1 Abstract
IBM Spectrum Scale is a cluster file system that provides concurrent access to a single file system or set of file systems from multiple nodes. The nodes can be SAN attached, network attached, a mixture of SAN attached and network attached, or in a shared nothing cluster configuration. This enables high-performance access to this common set of data to support a scale-out solution or to provide a high-availability platform.
IBM Spectrum Scale has many features beyond common data access, including data replication, policy-based storage management, and multisite operations. You can create a cluster of IBM AIX® nodes, Linux nodes, Windows server nodes, or a mix of all three. IBM Spectrum Scale can run on virtualized instances, providing common data access in environments that leverage logical partitioning or other hypervisors. Multiple IBM Spectrum Scale clusters can share data within a location or across wide area network (WAN) connections.
Container adoption is increasing in all industries, and containers sprawl across multiple nodes in a cluster. Effective management of containers is necessary because they will probably reach far greater numbers than virtual machines have today. Kubernetes is the standard container management platform in use. Data management is critically important, but it is often overlooked because the first workloads to be containerized are ephemeral.
For data management, many drivers with different specifications were available. To unify them, a specification named Container Storage Interface (CSI) was created, and it is now adopted by all major Container Orchestrator Systems.
Although other container orchestration systems exist, Kubernetes became the standard framework for container management. It is a very flexible open source platform that is used as the base for most cloud providers' and software companies' container orchestration systems.
Red Hat OpenShift is one of the most reliable enterprise-grade container orchestration systems based on Kubernetes, designed and optimized to easily deploy web applications and services. OpenShift enables developers to focus on the code, while the platform takes care of all of the complex IT operations and processes.
The CSI Driver for IBM file storage enables IBM Spectrum Scale to be used as persistent storage for stateful applications running in Kubernetes clusters. Through the Container Storage Interface Driver for IBM file storage, Kubernetes persistent volumes (PVs) can be provisioned from IBM Spectrum Scale. Therefore, containers can be used with stateful microservices, such as database applications (MongoDB, PostgreSQL, and so on).
1.2 Assumptions
This IBM Redpaper publication assumes that you have basic knowledge of IBM Spectrum Scale. If you need more information while reading, see IBM Knowledge Center for IBM Spectrum Scale.
We also assume you have Kubernetes and OpenShift knowledge. To learn more about Kubernetes, see the following website:
To learn about Red Hat OpenShift, see the following websites:
1.3 Key concepts and terminology
The information in this section can help you remember some of the content that you will need to follow the examples in this document. If you are familiar with any of these topics, skip to the next section.
1.3.1 IBM Spectrum Scale
The following major IBM Spectrum Scale topics are used in this publication. For each topic, see the IBM Spectrum Scale documentation if you want to deepen your skills at any time:
File sets
Active File Management
Multi-cluster and Remote Mounts
IBM Spectrum Scale on AWS
IBM Spectrum Scale GUI
IBM Spectrum Scale Developer Edition
IBM Spectrum Scale REST API
If you need the documentation for a different version, use the version selector to go to the one that you want. Remember that CSI is supported only from a certain IBM Spectrum Scale version onward.
1.3.2 Container runtime
A container runtime environment is a logical grouping of libraries that are used throughout the lifecycle of containers. It is responsible for containers while they are running (for example, listing running instances), for container management (start and stop), and for image management (pull, push, load, and save). A good explanation of this topic can be found on the following website:
1.3.3 Container Orchestration System
Containers are the best way to run and maintain applications that are built on a microservices architecture. This architecture creates a better environment for continuous improvement because each part of the application can be upgraded and deployed independently. It also means that many containers are used to support a single application. The components must be able to communicate securely across many nodes while blocking access from containers that should not reach the provided services.
Kubernetes: Although other container orchestration systems exist, Kubernetes became the standard framework for container management. It is a very flexible, open source platform used as the base for most cloud providers' and software companies' container orchestration systems.
OpenShift: Red Hat OpenShift is one of the most reliable enterprise-grade container orchestration systems, designed and optimized to easily deploy web applications and services. Categorized as a cloud development Platform as a Service (PaaS), OpenShift enables developers to focus on code while the platform handles all of the complex IT operations and processes.
1.4 Introduction to persistent storage for containers Flex volumes
A Persistent Volume (PV) is a unit of storage in the cluster that has been provisioned by an administrator or dynamically provisioned through a storage driver or plug-in. A Persistent Volume Claim (PVC) is a request for storage by a user (for example, when creating a pod). The PVC is bound to a PV that satisfies the request, and the pod uses the PV through that claim.
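The relationship between a claim and a volume can be illustrated with a small model. The following Python sketch is a deliberately simplified stand-in for the matching that Kubernetes performs, not its actual implementation: it assumes that a claim binds to the smallest unclaimed volume that has the same storage class, enough capacity, and all of the requested access modes. The class names, field names, and the volume and claim names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PersistentVolume:
    name: str
    capacity_gib: int
    access_modes: List[str]
    storage_class: str = ""
    claimed_by: Optional[str] = None  # name of the bound claim, if any


@dataclass
class PersistentVolumeClaim:
    name: str
    request_gib: int
    access_modes: List[str]
    storage_class: str = ""


def bind(pvc: PersistentVolumeClaim,
         volumes: List[PersistentVolume]) -> Optional[PersistentVolume]:
    """Bind the claim to the smallest unclaimed volume that satisfies it."""
    candidates = [
        pv for pv in volumes
        if pv.claimed_by is None
        and pv.storage_class == pvc.storage_class
        and pv.capacity_gib >= pvc.request_gib
        and set(pvc.access_modes) <= set(pv.access_modes)
    ]
    if not candidates:
        return None  # in this model, the claim stays pending
    best = min(candidates, key=lambda pv: pv.capacity_gib)
    best.claimed_by = pvc.name
    return best


# Hypothetical volumes and claim, for illustration only.
volumes = [
    PersistentVolume("pv-small", 5, ["ReadWriteMany"]),
    PersistentVolume("pv-large", 50, ["ReadWriteMany"]),
]
claim = PersistentVolumeClaim("data-claim", 10, ["ReadWriteMany"])
bound = bind(claim, volumes)
print(bound.name if bound else "Pending")
```

In this model, the 10 GiB claim cannot use the 5 GiB volume, so it binds to the 50 GiB volume. A claim that no volume can satisfy returns None, which corresponds to a claim that remains pending until a suitable volume exists.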
1.4.1 Static provisioning
A cluster administrator creates several Persistent Volumes up front. PVs carry the details of the real storage that is available for use by cluster users. This approach requires the administrator to know and set the storage requirements up front. It is useful when existing data on an IBM Spectrum Scale cluster needs to be made available as a persistent volume for containers.
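As a minimal sketch of static provisioning, the following Python script builds a Persistent Volume manifest and a matching claim, and prints them as JSON, which kubectl can read in the same way as YAML. Several values are assumptions made for illustration: the driver name should be verified against your installation, the volume handle is a placeholder because its format is specific to the driver, and the object names and size are hypothetical.

```python
import json

# Assumed CSI driver name; verify it against the driver that is installed.
DRIVER_NAME = "spectrumscale.csi.ibm.com"


def static_pv(name: str, size: str, volume_handle: str) -> dict:
    """Build a PersistentVolume that points at storage that already exists."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteMany"],
            # Retain keeps the existing data when the claim is deleted.
            "persistentVolumeReclaimPolicy": "Retain",
            "csi": {"driver": DRIVER_NAME, "volumeHandle": volume_handle},
        },
    }


def claim_for(pv: dict, claim_name: str) -> dict:
    """Build a PersistentVolumeClaim that is pre-bound to the given volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": claim_name},
        "spec": {
            "accessModes": pv["spec"]["accessModes"],
            # An empty class name prevents dynamic provisioning for this claim.
            "storageClassName": "",
            "volumeName": pv["metadata"]["name"],
            "resources": {
                "requests": {"storage": pv["spec"]["capacity"]["storage"]}
            },
        },
    }


# Hypothetical names; the volume handle is a placeholder, not a real handle.
pv = static_pv("existing-data-pv", "10Gi", "<driver-specific-volume-handle>")
pvc = claim_for(pv, "existing-data-claim")
manifests = {"apiVersion": "v1", "kind": "List", "items": [pv, pvc]}
print(json.dumps(manifests, indent=2))
```

The claim names the volume explicitly through volumeName, so the administrator decides which existing data the claim receives instead of leaving the choice to the cluster.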
1.4.2 Dynamic provisioning
Dynamic volume provisioning enables storage volumes to be created on-demand. Without dynamic provisioning, cluster administrators must manually make calls to their cloud or storage provider to create new storage volumes, and then create Persistent Volumes. The dynamic provisioning feature eliminates the need for cluster administrators to pre-provision storage. Instead, it automatically provisions storage when it is requested by users.
The implementation of dynamic volume provisioning is based on Storage Class objects. A cluster administrator can define as many Storage Class objects as needed, each specifying a volume plug-in (also known as a provisioner) that provisions a volume and the set of parameters to pass to that provisioner when provisioning. A cluster administrator can define and expose multiple types of storage (from the same or different storage systems) within a cluster, each with a custom set of parameters.
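To make the role of Storage Class objects concrete, the following Python sketch defines two classes that use the same provisioner with different parameters, plus a claim that requests storage from one of them by name. It is an illustration under stated assumptions: the provisioner name should be verified against your installation, the parameter key volBackendFs and the file system names fs-flash and fs-capacity are illustrative and driver-specific, and all object names are hypothetical.

```python
import json

# Assumed provisioner name; verify it against the driver that is installed.
PROVISIONER = "spectrumscale.csi.ibm.com"


def storage_class(name: str, parameters: dict,
                  reclaim_policy: str = "Delete") -> dict:
    """Build a StorageClass that passes the parameters to the provisioner."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": PROVISIONER,
        "parameters": dict(parameters),
        "reclaimPolicy": reclaim_policy,
    }


def dynamic_claim(name: str, class_name: str, size: str) -> dict:
    """Build a claim; a volume is created on demand by the named class."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": class_name,
            "resources": {"requests": {"storage": size}},
        },
    }


# Two storage types exposed from the same provisioner. The parameter key
# and the file system names are illustrative assumptions.
fast = storage_class("scale-fast", {"volBackendFs": "fs-flash"})
capacity = storage_class("scale-capacity", {"volBackendFs": "fs-capacity"})
pvc = dynamic_claim("db-data", "scale-fast", "20Gi")

manifests = {"apiVersion": "v1", "kind": "List", "items": [fast, capacity, pvc]}
print(json.dumps(manifests, indent=2))
```

The user who creates the claim refers only to the class name. The details of where and how the volume is created stay in the Storage Class object that the administrator defined.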
1.4.3 Container Storage Interface (CSI)
CSI is the result of a collaborative initiative to unify the storage interface of Container Orchestrator Systems (COS), such as Kubernetes, Mesos, Docker Swarm, Cloud Foundry, and so on, together with storage vendors (such as IBM, Ceph, Portworx, and NetApp). A single CSI implementation for a storage vendor should work with all COS. CSI defines a specification that storage vendors implement in a CSI driver. CSI is based on gRPC, a general-purpose remote procedure call (RPC) framework. It provides extended volume management functions, such as snapshots, clones, and volume expansion.
1.4.4 Advantages of using IBM Spectrum Scale storage for containers
There are many advantages to using IBM Spectrum Scale storage for containers:
IBM Spectrum Scale is used in high-performance computing and delivers high performance.
It provides very flexible ways to provision storage:
 – All servers can have direct access to the physical disks, or only designated servers can, whichever best fits the purpose.
 – The disks can be attached using different networks, such as FC, Ethernet, or InfiniBand.
 – Storage can be provisioned on predefined arrays.
 – You can use your own storage (including cloud storage).
 – It can be created as a shared-nothing cluster with two or three copies of the data for resiliency and data availability.