Chapter 1. Ethernet architectures for SAP workloads

This chapter summarizes several technologies and puts them into the context of different SAP landscapes. In SAP landscapes, Shared Ethernet Adapters (SEAs) on 10 Gbps infrastructures are the dominant and most viable deployment. New technologies are emerging that will change this picture, especially for larger SAP S/4HANA and Business Suite environments.
This chapter describes the following topics:
1.1, "Preferred Ethernet cards for Linux on IBM POWER9 processor-based servers for SAP workloads"
1.2, "Ethernet technology introduction"
1.3, "Ethernet tuning for SAP networks"
1.4, "VIOS configuration for SAP network requirements"
1.1 Preferred Ethernet cards for Linux on IBM POWER9 processor-based servers for SAP workloads
At the time of writing, the preferred Ethernet cards that were tested in the context of SAP workloads in a Linux environment on IBM POWER9™ processor-based servers are shown
in Table 1-1.
Table 1-1 Preferred Ethernet cards
| Description | Low-profile feature code | Full-height feature code |
|---|---|---|
| PCIe3 LP 2-Port 10 GbE NIC & RoCE SR/Cu Adapter | EC2R | EC2S |
| PCIe3 LP 2-Port 25/10 GbE NIC & RoCE SR/Cu Adapter | EC2T | EC2U |
| PCIe3 LP 2-port 100/40 GbE NIC & RoCE QSFP28 Adapter x16 | EC3L | EC3M |
| PCIe4 LP 2-port 100/40 GbE NIC & RoCE QSFP28 Adapter x16 | EC67 | EC66 |
When using virtual network interface cards (vNICs) with 25 or 100 Gb adapters, use the latest firmware levels to ensure the highest processor (core) savings and the best performance.
 
Note: During the development of this publication, the team did not explicitly test the cards with transceivers.
1.2 Ethernet technology introduction
IBM PowerVM® provides different options for virtualizing the network connectivity of a Linux logical partition (LPAR). You must determine whether dedicated network adapters can be assigned to the LPAR to achieve the highest network bandwidth and lowest latency or whether you can use Virtual I/O Server (VIOS) and use its advanced flexibility and reliability options. IBM Power Systems servers can be configured in mixed modes with some LPARs that are configured with dedicated or shared adapters and other LPARs that use VIOS.
1.2.1 Dedicated and physical sharing
An Ethernet network card can be directly dedicated to one LPAR (all ports are bound to a single LPAR), or it can be shared by using Single Root I/O Virtualization (SR-IOV) technology. SR-IOV provides logical ports to share the physical ports across multiple LPARs.
The tradeoff for these deployment options is lower latency and better throughput at the cost of losing Live Partition Mobility (LPM). Without VIOS, all traffic leaves the server and comes back to it over the external network, unlike internal virtual LAN (vLAN) configurations with VIOS.
The typical use cases that these two options provide are as follows:
A single LPAR uses the full central electronics complex (CEC) without LPM (for example, database replication to a second node).
Small deployments on S-class servers.
Large SAP S/4HANA/Business Suite databases (appserver traffic).
Dedicated adapters
If you are using dedicated adapters, all physical adapters are assigned directly to the client LPAR. The adapter is exclusively bound to one particular partition, including all its ports. Dedicated adapters provide the best possible performance and latency, but do not allow any resource sharing.
Single Root I/O Virtualization
SR-IOV is an enhanced network virtualization technology on Power Systems servers. In SR-IOV shared mode, the physical network adapter is assigned to and managed by the
IBM PowerVM Hypervisor. The physical SR-IOV adapter has multiple physical ports that are connected to external network switches. On POWER9, the different ports of the adapter can be equipped with different transceivers to allow operations with different network speeds.
In this case, no VIOS partition is required, and sharing is possible by enabling the SR-IOV adapter in SR-IOV shared mode. The ratio between LPARs and required adapters, occupied PCI slots, and used network ports is improved because of better resource utilization. Depending on the adapter type, a different number of virtual functions is possible. The number of virtual functions defines the granularity for partitioning the adapter. For more information, see "How many logical ports/VFs are supported per adapter?"
Each LPAR receives an SR-IOV logical port (Figure 1-1) with ensured capacity and bandwidth that is assigned according to the defined number of virtual functions.
Figure 1-1 SR-IOV logical port view assignment per LPAR
The entitled capacity is the ensured amount of adapter bandwidth, which can be exceeded if the port has available bandwidth.¹
Remote direct memory access (RDMA) technology minimizes the memory copy (memcopy) operations that are required in the network layers. Therefore, SR-IOV provides more packets per second with lower latency and lower CPU consumption than SEA technology. Workloads that use many small packets can benefit from a latency perspective, such as transactional workloads where many appservers send their requests to a single database.
For each logical port that is assigned to an LPAR or configured for later use, a small amount of network bandwidth is reserved and is not available to any other LPAR. For example, if a 10 Gbps port is assigned to 48 LPARs (2% each) and only one LPAR communicates heavily while all other LPARs are idle on the network, the maximum throughput for the busy LPAR is about 5 Gbps. When all LPARs are actively communicating, the limit is not noticeable because the sum of all communication channels is limited by the total bandwidth of the adapter.
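The following minimal Python sketch illustrates this capacity arithmetic. The function name and the example values mirror the scenario above and are illustrative only; they are not part of any IBM tool or interface.

```python
# Illustrative sketch of the SR-IOV logical port capacity arithmetic from the
# example above. The function and values are hypothetical, not an IBM tool.

def entitled_bandwidth_gbps(port_speed_gbps: float, capacity_pct: float) -> float:
    """Guaranteed (entitled) bandwidth for one SR-IOV logical port."""
    return port_speed_gbps * capacity_pct / 100.0

port_speed = 10.0          # physical port speed in Gbps
lpars = 48                 # logical ports configured on this physical port
capacity_per_lpar = 2.0    # entitled capacity per logical port, in percent

total_reserved_pct = lpars * capacity_per_lpar
assert total_reserved_pct <= 100.0, "entitled capacities must not exceed 100%"

print(f"Guaranteed per LPAR: {entitled_bandwidth_gbps(port_speed, capacity_per_lpar):.1f} Gbps")
print(f"Total reserved:      {total_reserved_pct:.0f}% of the physical port")
# A busy LPAR can exceed its entitlement and use spare bandwidth, but as noted
# in the text, the reservations for idle logical ports keep a single busy LPAR
# below line speed (about 5 Gbps in this example).
```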
1.2.2 Network virtualization
There are three key technologies for network virtualization in PowerVM. All of them require a dual-VIOS setup. They are valuable network sharing options and support LPM:
Virtual Ethernet is used for internal LPAR to LPAR communication when all LPARs are within the same Power System server. Virtual Ethernet does not use a physical network adapter, and it provides high bandwidth and low latency.
SEA has different means of implementation, and it has been the dominant network virtualization technique for more than a decade. Virtual Ethernet is extended to the external network infrastructure by using physical network adapters that are assigned to the VIOS.
vNIC, a newer technology, uses SR-IOV and addresses the disadvantages of SEA (high CPU utilization, limits on the number of packets per second, and higher latency) when using high-speed network adapters. For 25 Gbps and faster ports, this technology is starting to appear in a few SAP client deployments.
 
Note: For LPARs running production or production-like workloads, a dual-VIOS configuration is mandatory to meet the availability requirements and limit both planned and unplanned downtime.
Virtual Ethernet adapter
The virtual Ethernet adapter (internal vLAN) allows communication between LPARs within one physical Power Systems server. The IBM POWER Hypervisor acts as an internal network switch, which in traditional 10 Gb environments provides at least twice the throughput at lower latency because no external network traffic is involved. It can also be configured when using port speeds of more than 10 Gbps, but the internal vLAN speeds did not increase with the stack that was tested in 2019 for this document.
VIOS Shared Ethernet Adapter
For the SEA, the physical network adapter or the SR-IOV adapter (in promiscuous mode) and all its ports are assigned to a VIOS partition. Virtual adapters are provided to the client partitions by mapping, inside the VIOS, a single physical port or SR-IOV logical port in promiscuous mode to multiple virtual ports. The virtual ports are then assigned to the LPARs.
The throughput scalability for a multiple-LPAR setup is excellent, but it comes with a CPU cost on the VIOS for mapping between virtual and physical ports and for the memory copies in the various layers. For environments up to 10 Gbps network speed, an SEA setup is a good tradeoff for optimizing utilization and providing redundancy at low cost. For environments with high-speed adapters (25 Gbps, 40 Gbps, and 100 Gbps), an SEA implementation does not allow a single LPAR to fully use that bandwidth, but multiple LPARs together can, which reduces the number of physical adapters.
 
Note: You must still have redundant physical adapters.
SEA can be configured to share the load in a dual-VIOS environment or as a simple failover configuration. For more information, see "Shared Ethernet adapters for load sharing" and "Shared Ethernet Adapter failover" in IBM Knowledge Center.
vNIC
vNIC is a virtual adapter type that became available in December 2015 (at that time, it was restricted to the AIX operating system (OS)). SUSE released the ibmvnic driver in 2019, which is described at the SUSE Blog.
For SAP landscapes, vNIC is a future-oriented solution that provides higher adapter bandwidth, lower latency, and reduced CPU usage. Because the SEA virtualization cost and latency are acceptable today, there is no technical need to move to this new technology yet.
The vNIC technology enables advanced virtualization features such as LPM with SR-IOV adapter sharing, and it uses SR-IOV quality of service (QoS).
To configure a vNIC client, the SR-IOV adapter must be configured in SR-IOV shared mode, and free capacity to feed the used logical ports must be available. When an LPAR is activated with a client vNIC virtual adapter (Figure 1-2), or when a client vNIC virtual adapter is added to a partition dynamically by a dynamic LPAR (DLPAR) operation, the Hardware Management Console (HMC) and the platform firmware automatically create the vNIC server and the SR-IOV logical port backing device, and dynamically add them to the VIOS.
Figure 1-2 Client LPAR vNIC
The vNIC configuration requires the enhanced GUI in the HMC² or the HMC REST interface. When a vNIC adapter is added to an LPAR, all necessary adapters on the VIOS (SR-IOV logical port and vNIC server adapter) and on the LPAR (vNIC client adapter) are created in one step. No extra configuration is required on the VIOS.
vNIC provides two concepts for failover: active-passive failover with multiple backing devices, and link aggregation. Both concepts meet the LPM requirements and preserve LPM capability.
vNIC failover
vNIC failover provides a high availability (HA) solution at the LPAR level. A vNIC client adapter can be backed by multiple logical ports (up to six) to avoid a single point of failure. Only one logical port is connected to the vNIC client concurrently (the active backing device has the highest priority). If the active backing device fails, then the hypervisor selects a new backing device according to the next highest priority.
Active-backup link aggregation technologies like Linux bonding active-backup mode can be used to provide network failover capability and sharing of the physical port (Figure 1-3 on page 7). To ensure detection of logical link failures, a network address to ping must be configured to monitor the link. For Linux active-backup mode, the fail_over_mac value must be set to active (fail_over_mac=1) or follow (fail_over_mac=2).
Figure 1-3 vNIC bonding
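The following minimal Python sketch, which assumes an existing bond device named bond0, reads the standard Linux bonding attributes from sysfs to verify the settings that are described above (active-backup mode, the fail_over_mac value, and an ARP ping target).

```python
# Minimal sketch: verify an existing Linux active-backup bond over vNIC ports
# by reading the standard bonding attributes in sysfs. The device name "bond0"
# is an assumption; adjust it to your environment.
from pathlib import Path

def read_bonding_attr(bond: str, attr: str) -> str:
    path = Path(f"/sys/class/net/{bond}/bonding/{attr}")
    return path.read_text().strip() if path.exists() else "<not set>"

bond = "bond0"
for attr in ("mode", "fail_over_mac", "arp_ip_target",
             "slaves", "active_slave", "miimon"):
    print(f"{attr:15}: {read_bonding_attr(bond, attr)}")

# Expected values for the setup described in the text:
#   mode          : active-backup 1
#   fail_over_mac : active 1   (or: follow 2)
#   arp_ip_target : the address of a reachable host that is used to detect
#                   logical link failures
```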
Multiple vNIC client virtual adapters can be aggregated to a single bonding device in the client LPAR to achieve higher bandwidth.
A set of requirements must be met to support network bonding (Figure 1-4):
Each vNIC client must have a single backing device. When the vNIC client is defined with multiple backing devices, then link aggregation is not possible.
Each SR-IOV physical port must not be shared with other vNIC servers. Per physical port, only one LPAR can be assigned. It is a best practice to configure the logical port with a capacity of 100% (to prevent sharing it with other LPARs).
 
Note: When using high-speed network adapters, check that the Linux service irqbalance is installed and active.
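The following small sketch checks whether irqbalance is installed and active. It assumes a systemd-based Linux distribution and uses only the standard systemctl command.

```python
# Small sketch: check that the irqbalance service is installed and running on
# a systemd-based Linux distribution, as recommended in the note above.
import shutil
import subprocess

if shutil.which("irqbalance") is None:
    print("irqbalance is not installed")
else:
    state = subprocess.run(["systemctl", "is-active", "irqbalance"],
                           capture_output=True, text=True).stdout.strip()
    print(f"irqbalance service state: {state}")  # expected: "active"
```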
Figure 1-4 Sample architecture for using bonding and vNIC for filer attachments
For LPM, the target system must have an adapter in SR-IOV shared mode with an available logical port and available capacity (virtual functions) on a physical port. If labels are correctly set on the SR-IOV ports, then during LPM the correct physical ports are automatically selected based on the label name.
1.2.3 Selecting the correct technology for SAP landscapes
When you select the correct technology for your SAP landscapes, you typically make the following decisions during the client architecture workshops:
Which IBM Ethernet virtualization and sharing capabilities are wanted (LPM capability, rolling maintenance, or sharing).
What the different network needs are. For example, hot-standby databases such as IBM DB2® High Availability Disaster Recovery (HADR) or SAP HANA System Replication (HSR) have different needs than appserver communication. These needs can be characterized by transmissions per second (TPS), latency, and packet sizes.
Client sample
SAP HANA is installed with HSR for the Business Suite application. The application servers use 160 cores in total, generate more than 300,000 TPS, and require low-latency I/O for small packets on the HANA LPAR. Therefore, the application servers are on 10 Gbps SEA, but the HANA DB, where all the I/O is bundled, is configured with SR-IOV.
 
Note: After this step is complete, cross-verify that the planned number of adapters fits into the selected server model, including non-network cards (for example, Fibre Channel (FC) cards).
Comparing the different network virtualization and sharing options
Table 1-2 summarizes the various options.
Table 1-2 Comparison of Ethernet technologies on Power Systems servers
| Technology | LPM | QoS | Direct-access performance | Link aggregation | Requires VIOS | > 400,000 TPS per 25 Gbps port | Physical adapter sharing |
|---|---|---|---|---|---|---|---|
| Dedicated network adapter | No | N/A | Yes | Yes | No | Yes | No. Each adapter is assigned directly to an LPAR. |
| SR-IOV | No | Yes | Yes | Yes* | No | Yes | Yes. An SR-IOV logical port is created, and virtual functions are assigned. |
| vNIC | Yes | Yes | No | Yes* | Yes | Not yet | For vNIC failover, full sharing flexibility. For link aggregation, a full port must be dedicated. |
| SEA | Yes | No | No | Yes | Yes | No | Yes. A virtual Ethernet client adapter is used. |

* IEEE 802.3ad/802.1AX (LACP) is supported for SR-IOV and vNIC. The requirement is a one-to-one relationship between the physical port and the logical port: configure only one logical port per physical port by setting the logical port capacity to 100%.
1.3 Ethernet tuning for SAP networks
There are different networks in SAP landscapes, and some have different needs based on the application workload. This section highlights the key areas of Ethernet tuning, but does not necessarily cover all aspects:
Appserver to DB server: 10 Gbps cards + SEA are often used on the application server.
Transactional workloads: The dominant deployment is based on 10 Gbps SEA with load sharing and internal vLAN. Transactional workloads can result in many small packets that need low latency. More interrupt queues can help improve the performance on the DB. In the past, the solution used dedicated adapters, but with the SR-IOV and vNIC options, more flexibility is available. Typically, it is sufficient to use this deployment on the DB side because all the application servers centralize their I/O requests there (one DB to n application servers).
Analytical workloads: These workloads tend to send fewer, larger packets to and from the DB server. In most cases, this communication does not require special handling, and unified connectivity is the objective. The dominant deployment is based on 10 Gbps SEA with load sharing. SEA still delivers the best sharing characteristics when used in 10 Gbps environments for many applications with small bandwidth needs and no need for bandwidth control. When moving to higher speeds, SEA is not the preferred option, but it can be considered an intermediate step.
Backup by using BACKINT.
SAP BACKINT is a backup mechanism where data is read directly from memory and sent to the backup server. If this method is too slow, it can affect DB availability and responsiveness, depending on the backup method that is configured. If the performance is insufficient, throughput must be increased (latency is not the problem because large packets are written). Throughput can be increased either by using multiple interrupt request (IRQ) queues (more adapters inside the LPAR) or by providing higher bandwidth (25 Gbps cards) together with jumbo frames and a large send offload (LSO) configuration. The offloading of large packets is mandatory to benefit from jumbo frames.
 
Note: The maximum speed cannot go beyond the storage speed to where the backup is written. For more information about using jumbo frames, see Network Configurations for HANA Workloads on IBM Power Servers, found at SAP HANA on IBM Power Systems and IBM System Storage - Guides.
Database to database.
Databases in scale-out deployments such as HANA have specific I/O patterns for internode communication. For SAP HANA, see Network Configurations for HANA Workloads on IBM Power Servers, found at SAP HANA on IBM Power Systems and IBM System Storage - Guides. For other databases, contact your database vendor.
Database replication for hot-standby solutions typically creates many small packets. Hence, the number of parallel interrupt queues determines the efficiency (see the sketch after this list for a way to inspect the queue count).
Filer attachment and internet Small Computer Systems Interface (iSCSI) boot have different patterns because they are storage I/O, which often requires high bandwidth and high-speed adapters. Existing client deployments are based on bonded 25 Gbps ports, but other deployment options are possible too. InfiniBand can also be used in some cases to benefit from RDMA, but without LPM capability.
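As a quick check on Linux, the number of processing queues (channels) that an adapter exposes can be inspected as in the following sketch. The interface name eth0 is a placeholder, and the ethtool utility is assumed to be installed.

```python
# Sketch: inspect how many processing queues (channels) a network interface
# provides, which corresponds to the parallel interrupt queues discussed in
# this list. The interface name "eth0" is a placeholder, and the ethtool
# utility must be installed.
import subprocess

iface = "eth0"
channels = subprocess.run(["ethtool", "-l", iface],
                          capture_output=True, text=True)
print(channels.stdout)

# Counting the interface's lines in /proc/interrupts gives a similar picture.
with open("/proc/interrupts") as interrupts:
    queues = [line for line in interrupts if iface in line]
print(f"{iface}: {len(queues)} interrupt lines in /proc/interrupts")
```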
1.3.1 Optimizing the network configuration for throughput on 10 Gbps by using SEA and jumbo frames
These instructions focus on throughput optimization in SAP landscapes, for example, for backups or HANA scale-out deployments when using SEA.
Here are items to check before planning for jumbo frames:
Deployments require a backbone that supports large packets end-to-end to avoid performance impacts.
Just setting the maximum transmission unit (MTU) to 9000 is not sufficient. For more information, see “Configuring your Network for SAP HANA”, found at SAP HANA on IBM Power Systems and IBM System Storage - Guides.
When small packets are sent, jumbo-frame-enabled landscapes and MTU=1500 landscapes do not show a difference in performance.
SAP does not use jumbo frames by default.
For 10 Gbps adapters in an environment that can use jumbo frames (that is, an MTU of 9000), see "Configuring your Network for SAP HANA", found at SAP HANA on IBM Power Systems and IBM System Storage - Guides. This information is also applicable to higher speeds, but it has not been verified.
Background
This section describes the metrics that control how network packets are assembled.
The MTU is the maximum size of a single data unit of digital communications that can be transmitted over a network. The MTU size is an inherent property of a physical network interface, and it is measured in bytes. The default MTU for an Ethernet frame is 1500. An MTU of 9000 is referred to as a jumbo frame. The maximum segment size (MSS) is the maximum data payload for a socket, and it is derived from the MTU. For a TCP session, each peer announces the MSS during the 3-way handshake.
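The following minimal sketch shows the relationship between the MTU and the MSS for IPv4 TCP. It assumes 20-byte IP and TCP headers with no options; options reduce the MSS further.

```python
# Sketch of the MTU/MSS relationship for IPv4 TCP, assuming 20-byte IP and
# TCP headers with no options (options reduce the MSS further).
IPV4_HEADER_BYTES = 20
TCP_HEADER_BYTES = 20

def mss_for_mtu(mtu: int) -> int:
    """Maximum TCP payload per packet for a given MTU."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES

print(mss_for_mtu(1500))  # 1460 for the default Ethernet MTU
print(mss_for_mtu(9000))  # 8960 for a jumbo frame
```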
Implementing jumbo frames together with Platform Large Send Offload (PLSO) is the only way to reduce the processing impact and CPU cycles when large packets are transmitted and to achieve a throughput of more than 9 Gbps on a 10 Gb adapter, as verified by the SAP Hardware Configuration Check Tool (HWCCT) for SAP HANA multi-node.
One major prerequisite for implementing jumbo frames is that all network components across the whole chain from sender to receiver can handle the large MTU setting. Hosts or networks that have an MTU setting of 1500 can become unreachable after the MTU is set to 9000. If the infrastructure does not allow for MTU 9000, the MTU size must remain at the default value.
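One common way to verify end-to-end jumbo frame support is to send non-fragmentable ICMP packets of the corresponding payload size, as in the following sketch. The -M do option of the Linux ping command sets the don't-fragment flag, and 8972 bytes equals 9000 minus the 20-byte IP header and the 8-byte ICMP header. The host name is a placeholder.

```python
# Sketch: verify end-to-end jumbo frame support by sending non-fragmentable
# ICMP echo requests with an 8972-byte payload (9000 minus the 20-byte IP
# header and the 8-byte ICMP header). "target.example.com" is a placeholder
# for a host on the jumbo-frame network; the Linux iputils ping is assumed.
import subprocess

def path_supports_mtu(host: str, mtu: int = 9000) -> bool:
    payload = mtu - 20 - 8
    result = subprocess.run(
        ["ping", "-M", "do", "-c", "3", "-s", str(payload), host],
        capture_output=True, text=True)
    return result.returncode == 0

print(path_supports_mtu("target.example.com"))
```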
Setting the MTU alone is not sufficient. Large send offload (LSO) and large receive offload (LRO) are additional techniques that improve network throughput and lower CPU utilization, and they must also be implemented.
For outbound communication, LSO aggregates multiple packets into a larger buffer that is passed to the network interface. The network interface then splits the aggregated data into separate packets according to the MTU size because the server cannot send frames that are larger than the MTU that is supported by the network. When LSO is disabled, the OS is responsible for breaking up the data into segments according to the MSS. With LSO enabled, the OS can bypass data segmentation and send larger data chunks directly to the adapter.
LRO is the counterpart of LSO for inbound communication. Multiple incoming packets from a single stream are aggregated into a larger buffer before they are passed up the networking stack, thus reducing the number of packets that must be processed.
If the network adapter supports LSO and is a dedicated adapter for the LPAR, the LSO option is enabled by default. Especially for data streaming workloads (such as FTP, RCP, backup, and similar bulk data movement), LSO can improve performance on 10-Gigabit Ethernet and faster adapters.
If the default MTU size of 1500 is used, PLSO is still beneficial, but maximum throughput on a 10 Gb adapter can be expected to be 6 - 8 Gbps. Without PLSO, it goes down to 3 - 4.5 Gbps for a single LPAR.
For virtual Ethernet adapters and SEA devices, LSO is disabled by default because of interoperability problems with older OS releases. This issue must be addressed if LSO
is configured.
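The current MTU and offload settings of a Linux interface can be checked with a sketch like the following one. The interface name eth0 is a placeholder, and the ethtool utility is assumed to be installed.

```python
# Sketch: report the MTU and the segmentation and receive offload settings of
# one interface by using sysfs and "ethtool -k". The interface name "eth0" is
# a placeholder, and the ethtool utility must be installed.
import subprocess
from pathlib import Path

iface = "eth0"
mtu = Path(f"/sys/class/net/{iface}/mtu").read_text().strip()
print(f"{iface} MTU: {mtu}")

offloads = subprocess.run(["ethtool", "-k", iface],
                          capture_output=True, text=True).stdout
for line in offloads.splitlines():
    if line.strip().startswith(("tcp-segmentation-offload",
                                "generic-segmentation-offload",
                                "large-receive-offload")):
        print(line.strip())
```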
1.3.2 Latency optimization
If your focus is to reduce latency and you can forgo VIOS-based virtualization, then SR-IOV port sharing is the preferred option. For more information, see the latest publications at the IBM Redbooks website because this feature is constantly evolving.
1.4 VIOS configuration for SAP network requirements
The sizing requirements for every VIOS deployment for SAP landscapes are as follows:
Use exactly two or four VIOSs, no more and no fewer.
Start with a minimum of two dedicated cores for E-class servers with production workloads.
Implement VIOS core utilization monitoring to prevent I/O bottlenecks that are caused by undersized VIOSs (either stand-alone monitoring or by using the saphostagent that is deployed on the VIOS).
For FC virtualization, use only N_Port ID Virtualization (NPIV) because some infrastructure functions rely on it. NPIV also avoids adding more cores to the VIOS and provides better latency.
Start with 16 GB of memory per VIOS:
 – NPIV uses a minimum of 128 MB of memory per virtual connection. This memory is used by the hypervisor. Add the calculated memory for NPIV, as shown in the sketch after this list.
 – Other requirements might apply.
NUMA placement and the locality of the VIOS to the adapter matter.
PLSO helps reduce CPU usage in all cases, not only when jumbo frames are used.
For the convenience of the system administrator and to not lose any virtualization capabilities, install the VIOSs on internal solid-state drives (SSDs) or, with supported cards on POWER9 processor-based servers, on Non-Volatile Memory Express (NVMe) devices (consider redundancy).
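The memory sizing rule for NPIV can be expressed as a simple calculation. The following sketch follows the guidance in this list (16 GB base per VIOS plus at least 128 MB per NPIV virtual connection); the function name and the example value of 32 connections are illustrative only.

```python
# Illustrative sketch of the VIOS memory sizing rule from the list above:
# start with 16 GB per VIOS and add at least 128 MB per NPIV virtual
# connection. The function name and the example of 32 connections are
# hypothetical.
def vios_memory_gb(npiv_connections: int,
                   base_gb: float = 16.0,
                   mb_per_connection: float = 128.0) -> float:
    return base_gb + npiv_connections * mb_per_connection / 1024.0

print(vios_memory_gb(32))  # 32 NPIV client connections -> 20.0 GB
```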
The investment in VIOS pays off well for your landscape:
Because physical adapters are shared by multiple VIO clients (LPARs), fewer adapters are needed.
Flexibility increases because an LPAR can be added as needed at any time without adding new hardware.
You can respond faster to changing circumstances.
Pooling physical adapters at the VIOS results in higher bandwidth than assigning exclusive adapters per client LPAR.
You have better disaster recovery (DR) and easier administration of resources.
LPM, Simplified Remote Restart, and DR are facilitated.
Planned downtime for server administration is reduced to zero.
 

¹ Each used logical port reserves a tiny portion of the adapter that cannot be used by other logical ports. This portion is noticeable only when many LPARs are configured but a workload runs on only a single one that tries to reach line speed.
² Since 2019, the HMC comes with the enhanced GUI by default.