Chapter 4. Configuration and management recommended practices – Implementing and Managing a High-performance Enterprise Infrastructure with Nutanix on IBM Power Systems

Configuration and management recommended practices
This chapter describes recommended management practices that are drawn from the experience that was gathered during the project.
This chapter includes the following topics:
4.1 Nutanix disaster recovery
In this section, we describe how to configure disaster recovery (DR) in a Nutanix cluster.
4.1.1 Implementing disaster recovery
Nutanix provides native backup and DR capabilities, which let users back up and restore objects that run in local or cloud (Xi) environments.
To start the data protection process, open the Data Protection option in the Nutanix Web Console, as shown in Figure 4-1, and complete the following steps.
Figure 4-1 Selecting the Data Protection option
1. In the Data Protection dashboard, click Protection Domain and select Async DR from the drop-down list. The Protection Domain window opens, as shown in Figure 4-2.
Figure 4-2 Nutanix Web Console Protection Domain pane
2. Enter the information about the remote site, including whether a proxy is used, and select either the backup or the disaster recovery option, as shown in Figure 4-3.
Figure 4-3 Remote Site configuration window
Figure 4-4 shows the details for the remote site, the running replications, and an overview of the disaster recovery status, including metrics, alerts, and events.
Figure 4-4 Configuring Disaster Recovery window
Figure 4-5 shows the name and the current setting of the DR configuration.
Figure 4-5 Data Protection Remote Site configuration window
Figure 4-6 shows the table, overview, and current settings of the DR restore configuration.
Figure 4-6 Nutanix Web Console - Restoring the VM
Figure 4-7 shows the restored snapshot, its name, and the type of the protection domain.
Figure 4-7 Restore Snapshot window
Nutanix also supports several types of protection strategies, including one-to-one and one-to-many replication. You implement these replication strategies by configuring protection domains and remote sites in the Nutanix Web Console.
A protection domain is a defined set of virtual machines and volume groups or storage containers.
Replication is an important component of an enterprise data protection solution. It helps ensure that critical data and applications can be replicated to another site or environment.
Replication options include Per-VM Backup, which lets you designate specific VMs for backup to a different site, and Selective Bidirectional Replication, which replicates selected VMs in either direction between sites. Together, these options provide a flexible replication solution that accommodates various enterprise topologies.
4.2 Configuring a protection domain (async DR)
This procedure shows how to create a protection domain that supports backup snapshots for selected VMs and volume groups by using asynchronous data replication.
Before you start, ensure that you meet the protection domain guidelines for configuring Async DR. These guidelines are provided by Nutanix and are available at this web page.
4.2.1 General guidelines
For successful replication, all local Controller VMs must be able to communicate with all remote Controller VMs.
Nutanix recommends that any ESXi clusters that implement a DR configuration be registered with vCenter Server. This recommendation also applies to a cross-hypervisor disaster recovery (CHDR) configuration.
For VM migration as part of data replication to succeed in an ESXi hypervisor environment, check that you configured forward (DNS A) and reverse (DNS PTR) DNS entries for each ESXi management host on the DNS servers that are used by the Nutanix cluster.
Note: The hardware page on the web console shows the host name and hypervisor IP address for each management host. The ncli host ls command also lists each hypervisor’s IP address.
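The forward and reverse DNS requirement can be checked before you configure replication. The following Python sketch is only an illustration, not a Nutanix tool: the function name check_dns and the host names in the usage note are hypothetical, and the resolver functions are passed as parameters so that the logic can be exercised without a live DNS server. It must be run against the same DNS servers that the Nutanix cluster uses for the result to be meaningful.

```python
import socket


def check_dns(hostnames, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return a dict that maps each host name to a list of DNS problems.

    An empty list means that the forward (A) and reverse (PTR) lookups agree.
    """
    report = {}
    for name in hostnames:
        problems = []
        try:
            ip = forward(name)  # forward lookup (DNS A record)
        except OSError as err:
            report[name] = [f"no A record: {err}"]
            continue
        try:
            ptr_name = reverse(ip)[0]  # reverse lookup (DNS PTR record)
        except OSError as err:
            problems.append(f"no PTR record for {ip}: {err}")
        else:
            # Compare names case-insensitively and ignore a trailing dot.
            if ptr_name.rstrip(".").lower() != name.rstrip(".").lower():
                problems.append(f"PTR for {ip} is {ptr_name}, expected {name}")
        report[name] = problems
    return report
```

For example, check_dns(["esx01.example.com", "esx02.example.com"]) (placeholder names) returns an empty problem list for each ESXi management host whose A and PTR records match.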
When configuring encryption of data replication, set up an encrypted site-to-site tunnel and specify the tunnel IP address when you create the remote site (specify the tunnel IP address in the addresses parameter).
For cases when bandwidth between sites is limited, set a limit on the bandwidth that replication uses by specifying the Maximum Bandwidth parameter.
Note: A consistency group is a subset of the entities in a protection domain. A consistency group typically must not contain more than 20 entities.
One-time snapshots never expire by default, so specify a retention time when you create them. Consider the following points:
Do not include the source and the destination cluster under the same data center because VMs can be deleted from both the source and the destination clusters after the migration process.
Do not change the VM or its configuration on the source cluster after the VM is powered off by the system during the migration of the protection domain. Otherwise, the changes that you made are lost.
Note: If you use the same vCenter Server to manage the primary and remote sites, do not use the same storage container name on both sites.
Do not have VMs with the same name on the primary and the secondary clusters. Otherwise, the recovery procedures can be affected.
If a VM in the protection domain has other tasks in progress, the snapshot operation for the protection domain is retried up to six times. The snapshot operation succeeds only if the ongoing tasks on the VM complete within these six retries; otherwise, it fails.
If you use Prism to take one or more storage snapshots of a protection domain that includes a VM that also includes VMware snapshots, a risk exists that the VM snapshot might become corrupted under certain circumstances. To avoid this issue, apply the following best practices:
Ensure that a VMware snapshot (through vCenter) does not exist when a storage snapshot (Prism) is taken.
Perform storage snapshots during times when VM snapshots are less likely to occur.
Restore VMs only from a storage snapshot that was taken when no VMware snapshots existed for the VMs or the VMs were powered off when the storage snapshot was taken.
Schedule Nutanix storage snapshots to have limited or no overlap with manual or backup initiated VM snapshots.
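One way to apply the scheduling practices above is to compare the planned storage snapshot times against the windows in which backup-initiated or manual VM snapshots are expected. The following Python sketch shows that comparison under stated assumptions: the function names and the 15-minute safety margin are illustrative choices of ours, not Nutanix settings, and the schedule data must come from your own backup and Prism configuration.

```python
from datetime import datetime, timedelta


def overlaps(start_a, end_a, start_b, end_b):
    """Return True if the half-open intervals [start_a, end_a) and [start_b, end_b) intersect."""
    return start_a < end_b and start_b < end_a


def conflicting_snapshots(storage_snapshot_times, vm_snapshot_windows,
                          margin=timedelta(minutes=15)):
    """Return the storage snapshot times that fall within `margin` of a VM snapshot window.

    storage_snapshot_times: iterable of datetime objects (planned Prism snapshots).
    vm_snapshot_windows: iterable of (start, end) datetime pairs (expected VMware snapshots).
    """
    conflicts = []
    for snap_time in storage_snapshot_times:
        for start, end in vm_snapshot_windows:
            if overlaps(snap_time - margin, snap_time + margin, start, end):
                conflicts.append(snap_time)
                break  # one conflict is enough to flag this snapshot time
    return conflicts
```

Any time that the function returns is a candidate for rescheduling so that the storage snapshot and the VM snapshot do not coincide.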
Do not combine VMware-level encryption of a virtual machine or virtual disk with an Async DR configuration because this combination is not supported.
To protect VMs that were created by VMware View Composer or Citrix XenDesktop, Nutanix recommends adding files that are associated with the VM gold image to a protection domain. Use the nCLI command to create the protection domain.
For example, to protect the replica-ABC.vmdk file in a protection domain that is named vmware1-pd and a consistency group that is named vmware1-cg, run the following command:
ncli> protection-domain protect \
files=/container1/view-gold-image/replica-ABC.vmdk name=vmware1-pd cg-name=vmware1-cg
Note: You must disable the VMware View pool from the View Composer. For more information about disabling View Pool from the View Composer, see VMware View documentation at this website.
If the local and remote sites are running ESXi and are registered with vCenter Server, MAC address retention after a failover works in the following ways:
If you configure a static MAC address for a virtual network adapter on the local site, the adapter retains its MAC address after the protection domain is activated at the remote site and the VM is powered on.
If you configure automatic MAC address assignment for the virtual network card on the local site, and both sites (local and remote) are registered with the same vCenter Server, the MAC address changes when the VM is registered at the remote site.
If you configure automatic MAC address assignment for the virtual network card on the local site, and the local and remote sites are registered with separate vCenter Servers, the MAC address changes only after the VM is powered on at the remote site.
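The three MAC address cases above can be summarized as a small decision function. The following Python sketch only restates the rules in this section for ESXi sites that are registered with vCenter Server; the names MacBehavior and mac_after_failover are ours and do not correspond to any Nutanix or VMware API.

```python
from enum import Enum


class MacBehavior(Enum):
    RETAINED = "MAC address is retained after activation and power-on"
    CHANGES_AT_REGISTRATION = "MAC address changes when the VM is registered at the remote site"
    CHANGES_AT_POWER_ON = "MAC address changes only after the VM is powered on at the remote site"


def mac_after_failover(static_mac: bool, same_vcenter: bool) -> MacBehavior:
    """Return the expected MAC address behavior after a failover.

    static_mac: True if the adapter has a static MAC address on the local site.
    same_vcenter: True if both sites are registered with the same vCenter Server.
    """
    if static_mac:
        # A static MAC address is kept regardless of the vCenter topology.
        return MacBehavior.RETAINED
    if same_vcenter:
        return MacBehavior.CHANGES_AT_REGISTRATION
    return MacBehavior.CHANGES_AT_POWER_ON
```

A function like this can be useful in a pre-failover checklist to flag VMs whose applications or licenses depend on a fixed MAC address.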
At any site (primary or remote), all the nodes in the Nutanix cluster must be part of the same ESXi host cluster and the network must be available on all the nodes in the Nutanix cluster.
If you are deploying an intrusion prevention system (IPS) appliance or software, consider whether any configured filters or other network monitoring aids might block packets that are transferred during replication operations. You can add the IP address of any appliances or systems that are running the software to the whitelist, as described in Configuring a Filesystem Whitelist, which is available at this web page.
When you create VMs that you intend to protect, consider the following recommendations:
(Hyper-V): Create each VM in its own unique folder instead of using the default folder. VMs that are created in the default folder cannot be protected.
(Hyper-V): Path-prefix must not be reused for the same VMs in a protection domain.
When a VM running NGT is restored from a hypervisor-based snapshot that was created before NGT was installed on the VM, the VM is restored without NGT. Therefore, any native snapshots of the VM that were created after restoration are based on stale NGT information. To avoid this issue, reinstall or disable NGT on the VM after restoration.
Consider the following general limitations for Async DR:
Protection domains can have no more than 200 entities (VMs or volume groups). It is recommended that each application, together with the set of entities that it comprises, is protected by its own protection domain.
Because restoring a VM does not allow for VMX editing, VM characteristics (such as MAC addresses) can be in conflict with other VMs in the cluster.
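To be in a protection domain, a VM must reside entirely on a Nutanix data store (no external storage).
Data replication between sites is not encrypted by the replication process itself. Encryption relies on the connection between the sites (for example, an encrypted site-to-site tunnel).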
It is not possible to make snapshots of entire file systems or storage containers.
The shortest possible snapshot interval is one hour (one snapshot per hour).
Consistency groups cannot define boot ordering.
Inactivating a protection domain deletes the entities from the cluster. Deleting a protection domain removes all the snapshots that are associated with the protection domain from the cluster. Therefore, do not inactivate and then delete a protection domain that contains VMs. Delete the protection domain without inactivating it, or remove the VMs from the protection domain before deleting it.
Attention: If you inactivate and then delete a protection domain that contains VMs, the VMs in the protection domain are deleted.
Some VMs might not appear in the web console or in the nCLI command results during rolling upgrades, planned Controller VM maintenance, or when hosts are down or unavailable. Some protection domain operations, such as snapshots or protection, might also fail in this case.
The following limitations apply to the inclusion of related entities in a protection domain:
If the number of entities in a consistency group exceeds 10, protection of related entities fails. However, if you want to include more than 10 entities in a protection domain, protect the entities in separate consistency groups within the same protection domain.
Even if two related entities are in separate consistency groups, their attachment configuration is included in the snapshots and restored during recovery if they are in the same protection domain.
Snapshot creation and recovery of attachments are not supported if you configured volume groups with the following characteristics:
 – Challenge-Handshake Authentication Protocol (CHAP)
The iSCSI target secret is cleared after a volume group is restored from a snapshot.
 – IP addresses
If you attach volume groups to VMs by whitelisting the IP addresses of the VMs, entities are recovered, but their attachment configuration is not recovered. You must manually reattach the entities after recovery.
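The entity limits in this section (no more than 200 entities per protection domain and no more than 10 entities per consistency group when related entities are protected) can be checked when you plan a protection domain. The following Python sketch is a planning aid only, under our own assumptions: the function name and the generated consistency group names are hypothetical, and the simple sequential split does not account for which VMs must be snapshotted together for application consistency.

```python
MAX_PD_ENTITIES = 200  # protection domain limit stated in this section
MAX_CG_ENTITIES = 10   # consistency group limit for protecting related entities


def plan_consistency_groups(pd_name, entities):
    """Split a list of entity names into consistency groups of at most 10 entities.

    Returns a dict that maps a generated group name to its list of entities.
    Raises ValueError if the protection domain would exceed 200 entities.
    """
    entities = list(entities)
    if len(entities) > MAX_PD_ENTITIES:
        raise ValueError(
            f"{pd_name}: {len(entities)} entities exceed the limit of {MAX_PD_ENTITIES}"
        )
    groups = {}
    for start in range(0, len(entities), MAX_CG_ENTITIES):
        group_name = f"{pd_name}-cg{start // MAX_CG_ENTITIES + 1}"  # hypothetical naming scheme
        groups[group_name] = entities[start:start + MAX_CG_ENTITIES]
    return groups
```

In practice, you would first group the entities that belong to the same application and split a group only when it exceeds the limit.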
The following operating systems are supported:
Microsoft Windows Server 2008 R2 and Microsoft Windows Server 2012 R2.
Red Hat Enterprise Linux 6.7 and 6.8.
Oracle Linux 6.7 and 7.2.
Consider the following vSphere environment-specific limitations:
Nutanix native snapshots cannot be used to protect VMs on which VMware fault tolerance is enabled.
VMs that are connected to a vSphere Distributed Switch (dvSwitch) are not connected to their port group after failover. After failover, you must manually connect such VMs to their port group.
Consider the following Hyper-V environment-specific limitations:
A disaster replication snapshot fails and raises an alert if one of the following conditions occurs:
 – Any VM files (for example, configuration, snapshots, virtual disks, and ISOs) are on non-Nutanix storage containers.
 – The virtual disks that are associated with a VM are in different directories or folders. That is, all virtual disks that are associated with a VM must be in a single directory or folder.
 – A VM’s folder and its snapshot folder are in different directory or folder paths. The snapshot folder is typically a subfolder of the VM’s folder. It must be under the VM folder or the replication fails.
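The two folder layout conditions above can be verified from the VM's file paths before you add a Hyper-V VM to a protection domain. The following Python sketch is an illustration under our own assumptions: the function name and the example paths are hypothetical, and the check covers only the folder layout. It does not verify that the files are on Nutanix storage containers.

```python
from pathlib import PureWindowsPath


def layout_problems(vm_folder, disk_paths, snapshot_folder):
    """Return a list of folder layout problems that would make replication fail."""
    problems = []

    # All virtual disks of the VM must be in a single directory.
    disk_dirs = {PureWindowsPath(path).parent for path in disk_paths}
    if len(disk_dirs) > 1:
        found = sorted(str(d) for d in disk_dirs)
        problems.append(f"virtual disks are spread over several folders: {found}")

    # The snapshot folder must be located under the VM folder.
    vm = PureWindowsPath(vm_folder)
    snap = PureWindowsPath(snapshot_folder)
    if vm not in snap.parents:
        problems.append(f"snapshot folder {snap} is not under VM folder {vm}")

    return problems
```

An empty list means that the layout satisfies both conditions. PureWindowsPath is used so that the check compares Windows and UNC paths case-insensitively and can run on any operating system.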
The Run As account must be a domain account and must have local administrator privileges on the Nutanix hosts. This account can be a domain administrator account. When the Nutanix hosts are joined to the domain, domain administrator accounts automatically receive administrator privileges on the hosts. If the domain account that is used as the Run As account in SCVMM is not a domain administrator account, you must manually add it to the list of local administrators on each host by running the sconfig command.
Nutanix does not support Hyper-V Replica VMs in an Async DR protection domain.
The names of the Hyper-V virtual switches on the primary and remote sites must be the same; otherwise, the restore fails.
If the base VM and the differencing disk are in the same protection domain, an in-place restore fails after the protection domain is migrated to the secondary site.
Any VMs that were created during hypervisor upgrade (including VMs that were created by using DR operations) experience downtime if they are not configured for high availability and the corresponding node is yet to be upgraded. It is recommended that you not create any VMs manually or by using DR operations during the hypervisor upgrade process.
After a VM is migrated, its entry is not removed from Failover Cluster Manager on the source cluster.
If a highly available VM is protected, clones of the VM do not inherit the high availability configuration.
A highly available VM is no longer highly available after it is migrated to a remote site, unless an entry with the same VMID exists in Failover Cluster Manager.
During an in-place restore or failback operation, the HA property of a VM is honored only if the entry is not cleaned up in Failover Cluster Manager.
The state of the HA property of a VM is not captured when a snapshot is created. If a VM’s VMID entry is present in Failover Cluster Manager when it is being restored from its snapshot, the VM is HA protected.
The owner node of a VM can change when a disaster recovery operation, such as in-place restoration or migration, is performed on the VM.
For more information, see the Nutanix-provided practice guide to set up DR, which is available at this web page.