Chapter 3. Solutions and Use Cases – IBM Spectrum Scale CSI Driver for Container Persistent Storage

Solutions and Use Cases
IBM Spectrum Scale along with the IBM Spectrum Scale CSI driver offers unique capabilities to support solutions and use cases for the containerized applications landscape.
Moving to containerized applications is one of the foundations for exploiting the benefits of multicloud environments. Modern container-based cloud environments provide:
Improved agility through more efficient resource utilization and faster workload deployment times
Improved application elasticity by leveraging the container environment scheduling and auto-scaling capabilities
Improved security through service isolation and multi-tenancy
Efficient reuse of applications and services in a DevOps environment
Simplified and accelerated application deployment through ready-to-use CI/CD solution stacks
When considering application Lift and Shift, refactoring, or a green-field approach to the transformation of applications into a container-based microservice architecture, IBM recommends the IBM Garage™ Method approach, details of which can be found at the following website:
The IBM Garage method is IBM’s approach to enable business, development, and operations to continuously design, deliver, and validate new solutions leveraging cloud technologies.
Furthermore, it is recommended to consider a Cloud Adoption Briefing:
Or use the Cloud Transformation Advisor software to get application transformation advice:
This chapter describes various use cases for applications that are already containerized. Understanding these use cases helps with the subsequent chapters on IBM Spectrum Scale CSI driver planning, deployment, and usage. Although not mandatory, this chapter is recommended for all readers.
This chapter covers the following topics:
3.1 Multiple containers accessing the same data
Microservices applications that require persistent data to be shared across multiple services can follow one of two approaches.
With the independent reader/writer approach, each service reads and writes independent fragments of the application data. Data distribution and synchronization are handled at the application level. An example of this pattern is an application that replicates data across multiple service instances to provide high availability or load balancing. The horizontal sharding of the MongoDB NoSQL database uses this pattern to distribute workload across multiple server instances.
A different approach is reading and writing the same data by leveraging a shared storage solution across the different service instance hosts. This approach would be used if maximum data reuse is required while the data availability is handled by the underlying storage solution. IBM Spectrum Scale as a high performance, very resilient, and highly available storage solution strongly supports this approach.
Figure 3-1 on page 13 shows a Kubernetes cluster with an example microservices application consisting of two containers (Microservice A and Microservice B). Both containers read and write common data that the IBM Spectrum Scale file system exposes to Microservice A and Microservice B through the CSI driver.
Figure 3-1 Basic multi reader/writer microservices use case
The backing image store for a container registry (like Red Hat Quay) in a highly available setup is a practical example for this use case. Multiple container registry service instances need access to shared container images that are stored on a shared file system. If one instance should fail, the remaining instances still have access to the container image data.
Because container images can be substantial in size (up to hundreds of MB), storing them only once on a shared IBM Spectrum Scale file system is very cost-efficient. When the container registry is configured to provide active-active HA for scalability, the simultaneous access to images requires a high-performing storage backend, such as IBM Spectrum Scale.
This capability can be achieved by using the ReadWriteMany access mode in the persistent volume claim configuration, which enables multiple pods to consume the same persistent volume at the same time. For details about using ReadWriteMany access mode during volume creation, see Chapter 5, “Deployment and Administration” on page 29.
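As a minimal sketch of such a claim, the following Python snippet builds a PersistentVolumeClaim manifest that requests ReadWriteMany access and prints it as JSON, which kubectl accepts in place of YAML. The claim name, storage class name, and size are hypothetical placeholders and are not taken from this publication.

```python
import json


def build_shared_pvc(name, storage_class, size="10Gi"):
    """Return a PersistentVolumeClaim manifest that requests ReadWriteMany access."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteMany lets pods on several nodes mount the volume read-write.
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical names: adjust to the storage class defined in your cluster.
pvc = build_shared_pvc("registry-images", "spectrum-scale-fileset")
print(json.dumps(pvc, indent=2))
```

The printed document can be saved to a file and applied with kubectl; every registry pod that references the claim then mounts the same data.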
3.2 Multi-protocol access
Research institutes use data collection scientific instruments that rely on NAS protocols, such as NFS and SMB, to deliver the data they generate.
Automotive companies participating in the area of advanced driver-assistance systems (ADAS) use similar devices to ingest the data collected within the cars through NAS protocols to centralized storage systems within their corporate network. When using IBM Spectrum Scale for these use cases, data can be ingested through the IBM Spectrum Scale CES protocol services for NFS and SMB, and then be used and analyzed by containerized services running in a Kubernetes cluster.
This capability of consuming NFS data from a Kubernetes/OpenShift container can be achieved by using the static provisioning feature of the IBM Spectrum Scale CSI driver: the directories that contain the data written by NAS applications are statically provisioned as volumes so that they become accessible to containers.
Such a scenario is depicted in Figure 3-2, where the IBM Spectrum Scale cluster has additional IBM Spectrum Scale Cluster Export Service (Protocol) nodes outside the Kubernetes cluster that could be used for high-performance data ingest. The data is then analyzed using big data analytics (BDA) or artificial intelligence (AI) Machine Learning/Deep Learning (ML/DL) applications, which are commonly deployed in a containerized way. After analysis, the results are written to some persistent storage as well, but because this is usually a different volume than the one used for ingest, this is not shown in Figure 3-2.
Figure 3-2 Multi-protocol use case
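The static provisioning step for the ingest directory might look like the following sketch, which builds a PersistentVolume manifest for an existing directory. The driver name and the layout of the volume handle are assumptions that should be verified against the documentation of the CSI driver release in use; the cluster ID, file system UID, and path are hypothetical placeholders.

```python
import json

# Assumed CSI driver name; verify it for the driver release that you deploy.
CSI_DRIVER = "spectrumscale.csi.ibm.com"


def build_static_pv(name, cluster_id, fs_uid, path, size="100Gi"):
    """Return a PersistentVolume manifest for an existing directory."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteMany"],
            # Retain keeps the ingested data when the claim is deleted.
            "persistentVolumeReclaimPolicy": "Retain",
            "csi": {
                "driver": CSI_DRIVER,
                # Assumed handle layout: cluster ID, file system UID, directory path.
                "volumeHandle": f"{cluster_id};{fs_uid};path={path}",
            },
        },
    }


# Hypothetical identifiers for a directory that NFS or SMB clients write to.
pv = build_static_pv("adas-ingest", "1234567890", "0A0B0C0D:5E5F6A6B", "/ibm/fs1/ingest")
print(json.dumps(pv, indent=2))
```

A claim that binds to this volume gives the analytics containers access to the same files that the NAS clients delivered through the CES protocol nodes.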
3.3 Multi-tenancy use case
In organizations with multiple levels of departments, microservices applications often require persistent data to be stored and shared across departments. Data can be ingested through applications into an IBM Spectrum Scale file system and reused for reporting and archive purposes.
Universities participate in multiple research projects, and researchers produce data during their research or need access to existing data that is backed by an IBM Spectrum Scale file system. The data can be consumed by using the IBM Spectrum Scale CSI driver in microservices applications on container orchestrator platforms, such as Kubernetes and OpenShift.
Such a scenario is depicted in Figure 3-3, where the University has multiple colleges and the colleges have multiple schools, such as school of engineering, school of management, and so on. In each school, researchers might be working on multiple research projects. Today, the container orchestrator platform environment can be used to run microservice applications in the university, and data can be ingested through container instances to IBM Spectrum Scale file system.
Figure 3-3 University use case with multiple colleges with multiple schools
In Figure 3-3, consider that a university has one IBM Spectrum Scale cluster and that the admin can create multiple file systems underneath:
Each college can be assigned one file system.
Each school can be assigned one independent file set based on its storage requirements (multiple file set-based policies can be applied to this file set as required, and snapshots can also be taken at the file set level).
Each research project can be assigned one dependent file set, whose parent file set is the independent file set of the school under which the project is running.
Each research team member can be assigned a lightweight directory inside the dependent file set. The Kubernetes/OpenShift admin can create users per storageclass for each research project.
For detailed information about how to achieve this file set and lightweight directory-based provisioning in a Kubernetes/OpenShift container orchestrator environment, see Chapter 5, “Deployment and Administration” on page 29.
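The hierarchy above can be sketched as one storage class per level. The following Python snippet builds a storage class for a research project (dependent file set under the school's independent file set) and one for team member directories (lightweight volumes). The provisioner name and the parameter names (volBackendFs, clusterId, filesetType, parentFileset, volDirBasePath) are assumptions to verify against the driver documentation, and all file system, file set, and path names are hypothetical.

```python
PROVISIONER = "spectrumscale.csi.ibm.com"  # assumed provisioner name


def fileset_storage_class(name, filesystem, cluster_id, fileset_type, parent_fileset=None):
    """Storage class whose volumes are created as file sets (parameter names assumed)."""
    if fileset_type not in ("independent", "dependent"):
        raise ValueError(f"unknown fileset type: {fileset_type}")
    params = {
        "volBackendFs": filesystem,
        "clusterId": cluster_id,
        "filesetType": fileset_type,
    }
    if fileset_type == "dependent":
        if not parent_fileset:
            raise ValueError("a dependent file set needs a parent file set")
        params["parentFileset"] = parent_fileset
    return _storage_class(name, params)


def lightweight_storage_class(name, filesystem, base_path):
    """Storage class whose volumes are created as plain directories (parameter names assumed)."""
    return _storage_class(name, {"volBackendFs": filesystem, "volDirBasePath": base_path})


def _storage_class(name, params):
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": PROVISIONER,
        "parameters": params,
    }


# Hypothetical layout: one file system per college, one file set per school.
project_sc = fileset_storage_class(
    "engineering-project-a", "college1fs", "1234567890", "dependent",
    parent_fileset="school-of-engineering",
)
member_sc = lightweight_storage_class(
    "project-a-members", "college1fs", "school-of-engineering/project-a/members",
)
print(project_sc["parameters"])
print(member_sc["parameters"])
```

Keeping one storage class per project makes it straightforward for the Kubernetes/OpenShift admin to grant each research team access to only its own class.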
3.4 Remote file system access use cases
IBM Spectrum Scale provides a way to access a file system from another IBM Spectrum Scale cluster. Such remote file system access is described in IBM Knowledge Center for IBM Spectrum Scale in the section related to Accessing a remote GPFS file system.
Some of the common use cases when remote file system access is used are:
Separation of IBM Spectrum Scale clients and IBM Spectrum Scale NSD servers for administration purposes
Network isolation of multiple clients within a cluster
Configuring different protocol authentication mechanisms for different protocol (CES) clusters
The same use cases also apply to the Kubernetes and OpenShift environments. For these environments, other than setting up remote file system access, additional configuration is required during deployment and configuration. This section describes the use case, deployment, and configuration considerations when using remote file system access.
3.4.1 Separation of IBM Spectrum Scale clients and NSD servers
You can set up a different IBM Spectrum Scale cluster for clients that are part of a Kubernetes cluster to isolate their administration from IBM Spectrum Scale NSD server administration.
Figure 3-4 shows the architecture when IBM Spectrum Scale clients in a Kubernetes cluster are configured in a separate IBM Spectrum Scale cluster.
Figure 3-4 Remote file system access: IBM Spectrum Scale configuration
The configuration in Figure 3-4 on page 16 shows setting up two different IBM Spectrum Scale clusters as follows:
IBM Spectrum Scale Storage cluster: You can configure NSD servers and GUI servers as part of this cluster (referred to as the remote cluster in subsequent paragraphs). The remote GUI servers are used by the IBM Spectrum Scale CSI driver to create dynamic persistent volumes that are mapped to new dependent or independent filesets.
IBM Spectrum Scale Client cluster: In this cluster, you can configure all Scale client nodes that are part of Kubernetes cluster including Scale clients on Kubernetes infrastructure nodes and worker nodes. You also need to configure local IBM Spectrum Scale GUI servers as part of this cluster that are used to create static or dynamic light-weight volumes based on directories.
Optionally, you can configure all the Scale clients on Kubernetes worker nodes and Scale NSD servers as part of the same high-speed network for exchanging data or daemon traffic while keeping administration isolated within each cluster.
The configuration for this environment involves the following steps:
1. Enable remote access to the IBM Spectrum Scale file system in the IBM Spectrum Scale Storage cluster and mount it on the nodes of the IBM Spectrum Scale Client cluster. See the Accessing a remote GPFS file system section in IBM Knowledge Center for IBM Spectrum Scale for instructions on setting up remote file system access.
2. Configure a remotely mounted file system as local storage along with a remote GUI server and local GUI server.
3. Configure PV using the remotely mounted storage as local storage.
For details about configuring a remote cluster with IBM Spectrum Scale CSI driver, see Chapter 5, “Deployment and Administration” on page 29.
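The division of work between the two GUI servers described above can be illustrated with a small lookup: fileset-based volumes are created through the GUI server of the storage (remote) cluster, and lightweight directory-based volumes through the GUI server of the client (local) cluster. This is only an illustration of the routing, not driver code; the host names and volume type labels are hypothetical.

```python
# Hypothetical host names; substitute the GUI servers of your own clusters.
GUI_SERVERS = {
    "remote": "gui.storage-cluster.example.com",
    "local": "gui.client-cluster.example.com",
}


def gui_for_volume(volume_type):
    """Return the GUI server that handles creation of the given volume type."""
    if volume_type in ("independent-fileset", "dependent-fileset"):
        # Filesets are created in the cluster that owns the file system.
        return GUI_SERVERS["remote"]
    if volume_type == "lightweight":
        # Directory-based volumes are handled by the client cluster GUI.
        return GUI_SERVERS["local"]
    raise ValueError(f"unknown volume type: {volume_type}")


for kind in ("independent-fileset", "dependent-fileset", "lightweight"):
    print(kind, "->", gui_for_volume(kind))
```

Both GUI servers therefore need to be reachable from the Kubernetes cluster and registered with the driver during deployment.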
3.5 Kubernetes multi-cluster use cases
Companies often need to operate multiple Kubernetes/OpenShift clusters, depending on the workload and service levels required for their applications. In some cases, the same set of data must be shared not only within a single cluster but also across multiple clusters. IBM Spectrum Scale provides scalability and flexibility in performance and capacity with a single storage namespace, which enables users to centralize their data management and avoid data silos.
Typical use cases for data sharing across multiple clusters include:
AI and Big Data: Several applications run in an AI and Big Data pipeline, including “Data Injection”, “Data Preparation/Processing”, “Training,” and “Inference”. In that case, one Kubernetes/OpenShift cluster is used by many applications that generate a large amount of data, and another Kubernetes/OpenShift cluster is used to run analytics over that data. To support the whole pipeline, the data must be shared between the two Kubernetes/OpenShift clusters. Rather than creating multiple copies of the data, the IBM Spectrum Scale CSI driver provides a better approach to share data between two or more clusters.
High Availability: Users set up multiple clusters and run application pods on them to ensure high availability. For instance, users set up two Kubernetes/OpenShift clusters, run the production pods on one cluster, and prepare a hot/cold standby pod on the other cluster.
In that case, the same set of data must be shared between the production pods and the hot/cold standby pod. The IBM Spectrum Scale CSI driver can create a Persistent Volume from existing directories, which enables users to easily share the data across multiple clusters.
The configuration in Figure 3-5 describes the data sharing model between two Kubernetes clusters. Users can share existing directories with the Static provisioning volume. The access control can be managed by applying the accessMode for the Persistent Volume.
IBM Spectrum Scale supports the following access modes:
ReadWriteOnce (RWO): A single node can perform read and write.
ReadWriteMany (RWX): Multiple nodes can perform read and write.
Figure 3-5 Data sharing model between two Kubernetes clusters
Note: When sharing a volume that was created by dynamic provisioning, the path of the volume is automatically deleted when the Persistent Volume Claim is deleted, because the reclaim policy of a dynamically provisioned volume is “Delete” by default. It is recommended to use static volume provisioning with existing directories and the “Retain” reclaim policy.
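The recommendation in this note can be turned into a simple pre-flight check. The following sketch inspects a PersistentVolume manifest (as a Python dictionary) and reports settings that are risky for a volume shared between clusters; the function name and the sample manifest are illustrative only.

```python
def sharing_warnings(pv):
    """Return warnings for a PersistentVolume that is shared across clusters."""
    spec = pv.get("spec", {})
    warnings = []
    policy = spec.get("persistentVolumeReclaimPolicy")
    if policy != "Retain":
        warnings.append(
            f"reclaim policy is {policy!r}; use 'Retain' so that deleting the "
            "claim in one cluster does not remove the shared data"
        )
    if "ReadWriteMany" not in spec.get("accessModes", []):
        warnings.append("access mode is not ReadWriteMany; only one node can read and write")
    return warnings


# Illustrative manifest of a dynamically provisioned volume with default settings.
dynamic_pv = {
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "persistentVolumeReclaimPolicy": "Delete",
    }
}
for warning in sharing_warnings(dynamic_pv):
    print("WARNING:", warning)
```

A statically provisioned volume with the “Retain” policy and ReadWriteMany access produces no warnings.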
3.6 CSI driver for multisite use cases (AFM use cases)
IBM Spectrum Scale has an asynchronous caching mechanism, Active File Management (AFM), that can be used to implement multisite use cases. This can also be achieved on a public cloud setup, provided that AFM, the GUI, and the CSI driver are supported there, which enables hybrid cloud.
Note: As of the writing of this Redpaper publication, IBM Spectrum Scale on AWS 1.2.0 does not enable GUI by default so it would not be supported in this version. In a future version when all of the needed features are supported, the multisite feature will enable hybrid multi-cloud out-of-the-box.
Static provisioning on directories that reside inside of an AFM file set should be used to provide persistent volumes, ensuring that data will be used as is and not deleted after the pod is deleted. Make sure that you are familiar with all of the concepts and limitations of AFM, because they will all apply to the multisite use case. See the following documentation:
Depending on the use case, a different AFM mode can be used to cache data across sites. The two use cases shown here are the Independent Writer (IW) mode and the Single Writer (SW) with Read Only (RO) modes. In our example, we use a MongoDB stateful pod that uses the statically provisioned volume on all clusters.
The Independent Writer AFM mode works by updating all the cache sites with the latest metadata and data pointers found in the home cluster. All writes to the file set are asynchronously sent to the home, and no locking is done across caches.
Because of this, the user implementation of the use case must ensure correct usage to avoid unintended data corruption. Therefore, MongoDB must never be running on more than one site at a time. In this use case, the only automation needed is the periodic prefetch of the data, which minimizes the data transfer that is needed when a cache site is activated. Figure 3-6 depicts the implementation of this case.
Figure 3-6 Multisite use case with AFM independent writer
Implementing an Independent Writer architecture also implies that from time to time the cache must go back to the home and check whether any data has changed, and if so, replicate any metadata updates that are needed. Be aware that this can add latency, because each cache needs to perform this check against the home regularly.
To avoid the downsides of the Independent Writer mode in a multi-cloud case, one might use the Read Only and Single Writer modes. The mode changes on each cache need to be automated, and write locking is enforced on all RO caches. The client must ensure that all caches that are not active are Read Only when implementing this use case, to avoid unwanted writes corrupting data. If the workload is running on home, all caches must be RO. If it is running on one of the caches, only the cache that is running the workload must be a writer; all other sites must be RO. This scenario is shown in Figure 3-7.
Figure 3-7 Multisite use case with AFM Single Writer
Although this mode does not have the data change verification delay of the Independent Writer mode, it requires automation to change the mode from SW to RO and vice versa when migrating the workload across clusters.
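The mode assignment that such automation has to enforce can be sketched as a small function: given the home site and the site where the workload currently runs, it returns the mode that each cache must have. This shows only the decision logic described above; changing the mode on a real AFM file set is outside the scope of the sketch, and the site names are hypothetical.

```python
def afm_cache_modes(cache_sites, home, active):
    """Return the required AFM mode for every cache site.

    cache_sites: names of the cache sites
    home:        name of the home site
    active:      site where the workload currently runs (home or one cache)
    """
    if active != home and active not in cache_sites:
        raise ValueError(f"unknown active site: {active}")
    modes = {}
    for site in cache_sites:
        # Only the cache that runs the workload may write; all others are read-only.
        modes[site] = "SW" if site == active else "RO"
    return modes


# Workload running on a cache: that cache is the single writer.
print(afm_cache_modes(["site-a", "site-b"], home="home", active="site-a"))
# Workload running on home: every cache is read-only.
print(afm_cache_modes(["site-a", "site-b"], home="home", active="home"))
```

When the workload migrates, the automation compares the current modes with the returned mapping and changes only the caches that differ.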
3.7 Compressed Volumes use case
This section describes how one could leverage IBM Spectrum Scale’s file compression feature in order to achieve volume compression for persistent volumes for containers.
IBM Spectrum Scale Compression
File compression is an important technology for storage products. Compression improves overall storage space efficiency. It also reduces I/O bandwidth consumption, resulting in reduced load on the storage back-end. Further, caching compressed data on the client increases the apparent cache size. IBM Spectrum Scale 5.0.0 supports the LZ4 compression algorithm. LZ4 is a much faster compression algorithm, with decompression speed up to 5 times better than zlib.
When to use file compression
ZLIB is intended primarily for cold objects and files. It favors saving space over read-access speed. Compressing other types of data can result in performance degradation. LZ4 is intended primarily for active data and favors read-access speed over maximized space saving.
Sample use case
Due to legal and compliance requirements, applications need to maintain data/logs for a predefined period. If such applications are deployed in a container environment that uses Persistent Storage volume for storing its data/logs, one can use the compression functionality available with IBM Spectrum Scale for compressing the files in the persistent volume. Usually logs have a good compression ratio.
To use the IBM Spectrum Scale compression functionality with the IBM Spectrum Scale CSI driver, complete the following steps:
1. Create PV/PVC.
2. Write a compression policy based on your requirements.
3. Apply the compression policy based on your requirement.
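As a hedged illustration of step 2, the following Python helper generates the text of a compression policy rule for log files older than a given number of days. The rule form (MIGRATE with a COMPRESS clause, 'z' for zlib and 'lz4' for LZ4) is an assumption based on the policy language and should be checked against the file compression policy documentation; the rule name, file name pattern, and age threshold are hypothetical.

```python
def compression_rule(rule_name, name_pattern, min_age_days, algorithm="z"):
    """Return policy rule text that compresses matching files (syntax assumed)."""
    if algorithm not in ("z", "lz4"):
        raise ValueError("algorithm must be 'z' (zlib, cold data) or 'lz4' (active data)")
    if min_age_days < 0:
        raise ValueError("min_age_days must not be negative")
    return (
        f"RULE '{rule_name}' MIGRATE COMPRESS('{algorithm}')\n"
        f"  WHERE NAME LIKE '{name_pattern}'\n"
        f"    AND (DAYS(CURRENT_TIMESTAMP) - DAYS(MODIFICATION_TIME)) > {min_age_days}\n"
    )


# Cold application logs: favor space savings, so use zlib.
rule = compression_rule("compress_old_logs", "%.log", 30)
print(rule)
```

The generated text would be saved to a policy file and applied in step 3 with the mmapplypolicy command against the file system or directory that backs the persistent volume.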
For detailed steps and an example, see A.2, “Compression use case” on page 61. See the mmapplypolicy command and file compression policy for more details: