Problem determination and troubleshooting
The state and behavior of the IBM Spectrum Scale CSI driver depend on various Kubernetes and IBM Spectrum Scale components working together. Problem determination in this area involves collecting debug data from these components and using that data for troubleshooting.
This chapter provides the following information:
• The process of log data collection
• How to use collected data for debugging issues
• Sample error scenarios during deployment and configuration
• Sample runtime issues
7.1 How to collect debug data for IBM Spectrum Scale CSI driver
IBM Spectrum Scale CSI driver provides a tool (spectrum-scale-driver-snap.sh) to collect the driver debug data. This tool gathers the state of the required Kubernetes resources, such as nodes, pods, and service accounts, and collects statefulset and daemonset logs from all nodes. It also collects the definitions of resources in the given namespace that carry the label product=ibm-spectrum-scale-csi. The collected data is stored in the given output directory.
The usage format and parameters of the debug data collection tool are as follows:
spectrum-scale-driver-snap.sh [-n namespace] [-o output-dir] [-h]
-n: Debug data for CSI resources under this namespace will be collected.
    If not specified, the default namespace is used. The tool returns an error
    if CSI is not running under the given namespace.
-o: Output directory where debug data will be stored. If not specified,
    the debug data is stored in the current directory.
-h: Prints the usage.
Download the tool from the following site:
Sample output of the command is shown in Example 7-1.
Example 7-1 Sample output from the execution of CSI driver debug data collection tool
# spectrum-scale-driver-snap.sh -n ibm-spectrum-scale-csi-driver
Collecting "ibm-spectrum-scale-csi" logs...
The log files will be saved in the folder [./ibm-spectrum-scale-csi-logs_11-18-2019-01:58:22]
oc logs --namespace csi StatefulSet/ibm-spectrum-scale-csi-attacher
oc logs --namespace csi StatefulSet/ibm-spectrum-scale-csi-provisioner
oc logs --namespace csi pod/ibm-spectrum-scale-csi-cr4zt
oc logs --namespace csi pod/ibm-spectrum-scale-csi-lwjnk
oc logs --namespace csi pod/ibm-spectrum-scale-csi-lzscw
oc logs --namespace csi pod/ibm-spectrum-scale-csi-npczx
oc describe CSIScaleOperator --namespace ibm-spectrum-scale-csi-driver
oc logs --namespace ibm-spectrum-scale-csi-driver pod/ibm-spectrum-scale-csi-operator-
oc describe all,cm,secret,storageclass,pvc,ds,serviceaccount -l product=ibm-spectrum-scale-csi --namespace csi
oc describe clusterroles/external-provisioner-runner clusterrolebindings/csi-provisioner-role clusterroles/external-attacher-runner clusterrolebindings/csi-provisioner-role clusterroles/csi-nodeplugin clusterrolebindings/csi-nodeplugin --namespace csi
oc get all,cm,secret,storageclass,pvc,ds,serviceaccount --namespace csi -l product=ibm-spectrum-scale-csi
oc get pod --namespace ibm-spectrum-scale-csi-driver -o wide -l product=ibm-spectrum-scale-csi
oc get configmap spectrum-scale-config --namespace csi -o yaml
oc get nodes
oc describe nodes
oc describe scc csiaccess
oc cluster-info dump --namespaces kube-system --output-directory=./ibm-spectrum-scale-csi-logs_11-18-2019-01:58:22
Finished collecting "ibm-spectrum-scale-csi" logs in the folder -> ./ibm-spectrum-scale-csi-logs_11-18-2019-01:58:22
The resultant folder contains the following files with debug information:
ibm-spectrum-scale-csi-xxxxx-driver-registrar.log and ibm-spectrum-scale-csi-xxxxx.log are daemonset logs present for every worker node where the driver is running. For detailed descriptions of these files, see the following section.
7.2 Understanding log files
The first step of troubleshooting is to check the state of the system and services. For the CSI driver to function properly, the Kubernetes or OpenShift cluster must be in a good state and properly configured.
7.2.1 Checking the state of Kubernetes cluster
Here are the descriptions of the Kubernetes cluster log files:
This file contains the description of all nodes in the cluster and their status. Things to check here are the node roles, labels, any taints that are applied, system information, and a list of pods that are running on this node. This file also gives information about specific node conditions, such as MemoryPressure, DiskPressure, and PIDPressure, which might cause a node to fail or not be in a ready state.
This file contains a list of CSI driver resources, such as pods, serviceaccounts, statefulsets, and their status.
This file contains detailed information about CSI driver resources, such as pods, serviceaccounts, statefulsets, and containers running within the pods. Things to check here are any events listed under the pods and statefulsets. If labels are used for resource creation, then this file contains information about storageclasses and PVCs.
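The node condition check described above can be sketched as a small filter over saved `oc describe nodes` (or `kubectl describe nodes`) output. The function name check_node_conditions is hypothetical, and the sketch assumes that each node block starts with a "Name:" line and lists its conditions as rows of the form "Type Status ...", which is how recent client versions print them. The file that holds the output is passed as an argument because its name depends on the collected data.

```shell
# Report nodes whose conditions suggest trouble, given a file that holds
# saved `oc describe nodes` output.
# Assumption: each node block starts with "Name:" and lists conditions
# as "<Type> <Status> ..." rows.
check_node_conditions() {
    awk '
        # Remember which node the following condition rows belong to.
        $1 == "Name:" { node = $2 }

        # A pressure condition with status True indicates a resource problem.
        ($1 == "MemoryPressure" || $1 == "DiskPressure" || $1 == "PIDPressure") && $2 == "True" {
            print node ": " $1 " is True"
        }

        # A Ready condition that is False or Unknown means the node is not ready.
        $1 == "Ready" && ($2 == "False" || $2 == "Unknown") {
            print node ": Ready is " $2
        }
    ' "$1"
}

# Example call (not executed here; the file name is a placeholder):
#   check_node_conditions ./describe-nodes-output.txt
```

An empty result means that no node reported a pressure condition or a not-ready state in the saved output; it does not rule out other problems, such as taints or missing labels.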
7.2.2 Checking for issues during driver initialization
Here are the descriptions of logs to check during driver initialization:
This file contains the configmap details for the CSI driver, which includes cluster, file system, and GUI server details.
The driver registrar log (ibm-spectrum-scale-csi-xxxxx-driver-registrar.log) contains logs for the registration of the CSI driver with kubelet.
The driver log (ibm-spectrum-scale-csi-xxxxx.log) contains detailed logs of the CSI driver initialization process.
7.2.3 Checking for issues during provisioning of volumes
Here are the descriptions of logs to check for issues during provisioning of volumes:
The provisioner log (ibm-spectrum-scale-csi-provisioner.log) records the volume creation and deletion requests. Use this file to determine which request failed, and identify the PVC name from the failed request. For example: pvc-f531f55d-d90f-4ee0-8ad9-e81c55fe5684
Use this file to look for detailed logs of the failed request. The failed request can be looked up by searching for the PVC name identified from ibm-spectrum-scale-csi-provisioner.log.
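The two lookups described above can be sketched with grep. Both function names are hypothetical, and the first function assumes that failed requests are logged on lines that contain "error" or "failed"; the actual log wording might differ, so adjust the pattern to what the log shows.

```shell
# List the distinct PVC-style names (pvc-<UUID>) that appear on error
# lines of a provisioner log.
# Assumption: failed requests are logged on lines that contain "error"
# or "failed" (case-insensitive).
failed_pvc_names() {
    grep -iE 'error|failed' "$1" |
        grep -oE 'pvc-[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}' |
        sort -u
}

# Print every line, with file name and line number, in the collected
# logs under directory $2 that mentions the name given in $1.
find_volume_in_logs() {
    grep -rnF -- "$1" "$2"
}

# Example calls (not executed here; the directory is a placeholder):
#   failed_pvc_names ./snap-dir/ibm-spectrum-scale-csi-provisioner.log
#   find_volume_in_logs pvc-f531f55d-d90f-4ee0-8ad9-e81c55fe5684 ./snap-dir
```

Searching the whole output directory rather than a single file shows the same request as it passes through the provisioner and the driver, which helps to locate the component that reported the failure first.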
7.2.4 Checking for issues during attaching of volumes
Here are the descriptions of logs to check for issues during attaching of volumes:
The attacher log (ibm-spectrum-scale-csi-attacher.log) records the volume mount and unmount requests. Use this file to determine which request failed, and identify the volume ID from the failed request. For example: volume_id:"17797813605352210071;AC10D811:5DA2D1D1;
Use this file to look for detailed logs of the failed volume attach request. The failed request can be looked up by searching for the volume ID identified from ibm-spectrum-scale-csi-attacher.log.
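As a sketch of the volume ID lookup, the following function reports which collected log files mention a given volume ID and how many lines in each file match. The function name volume_id_hits is hypothetical. The ID is matched as a fixed string because it contains characters such as ";" and ":", and it must be quoted on the command line so that the shell does not treat ";" as a command separator.

```shell
# Show which log files under directory $2 mention the volume ID in $1,
# and how many lines in each file match.
volume_id_hits() {
    # grep -c prints "file:count" for every file searched. The awk filter
    # keeps only files with at least one match; the count is the last
    # ":"-separated field.
    grep -rcF -- "$1" "$2" | awk -F: '$NF > 0'
}

# Example call (not executed here). The ID is the portion shown in the
# example in the text, and the directory is a placeholder:
#   volume_id_hits '17797813605352210071;AC10D811:5DA2D1D1' ./snap-dir
```

Files with the highest counts are a reasonable place to start reading, because repeated entries for one volume ID often correspond to retried requests.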
For more details about troubleshooting and error scenarios, see IBM Knowledge Center for IBM Spectrum Scale Container Storage Interface Driver Troubleshooting: