To improve performance, make sure that HW LRO is enabled.
Receive Queue Interrupt Moderation
An armed CQ generates an event when either of the following conditions is met:
- The number of completions generated since the one that triggered the last event reaches a preset number
- The timer has expired and an event is pending
The timer can be set to restart either upon event generation or upon completion generation.
Restarting the timer upon completion generation affects the interrupt rate. While a burst of incoming packets is being received, the timer does not reach its limit; therefore, the interrupt rate depends on the size of the packets.
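The dependence on packet size can be illustrated with a rough calculation. The sketch below is not part of the driver: burst_irq_rate is a hypothetical helper, and it assumes a sustained burst at full line rate, one completion per packet, no framing overhead, and that events are triggered only by the completion count because the timer keeps restarting.

```sh
# burst_irq_rate LINK_BPS PACKET_BYTES COALESCE_PKTS
# Rough events per second during a sustained burst when the timer restarts
# on every completion, so only the completion count triggers events.
# Assumptions (not from the driver): one completion per packet, no framing
# overhead, link fully saturated.
burst_irq_rate() {
    awk -v bps="$1" -v bytes="$2" -v pkts="$3" \
        'BEGIN { printf "%d\n", bps / (8 * bytes) / pkts }'
}

# Hypothetical 10 Gb/s link with 32 completions per event:
burst_irq_rate 10000000000 1500 32   # large packets
burst_irq_rate 10000000000 64 32     # small packets
```

Under these assumptions, 1500-byte packets give roughly 26 thousand events per second, while 64-byte packets give roughly 610 thousand, which is why small-packet bursts are the more demanding case for this mode.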
#> sysctl dev.mce.<N>.conf.rx_coalesce_mode=[0/1/2/3]
0: For timer restart upon event generation.
1: For timer restart upon completion generation.
2: For timer restart upon event generation where usecs and pkts values are adaptive/dynamic, depending on the traffic type and network usage.
3: For timer restart upon completion generation where usecs and pkts values are adaptive/dynamic, depending on the traffic type and network usage.
#> sysctl dev.mce.<N>.conf.rx_coalesce_pkts=<x>
#> sysctl dev.mce.<N>.conf.rx_coalesce_usecs=<x>
Note: The default values are:
- dev.mce.1.conf.rx_coalesce_mode: 1 - The timer restarts upon completion generation
- dev.mce.1.conf.rx_coalesce_pkts: 32 - An interrupt is generated after 32 completions
- dev.mce.1.conf.rx_coalesce_usecs: 3 - The timer counts down 3 microseconds
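The three settings are usually changed together, so it can help to generate the commands in one place. The following is a minimal sketch: coalesce_cmds is a hypothetical helper name, it only prints the sysctl commands shown above rather than running them, and it checks nothing beyond the mode being one of the four documented values.

```sh
# coalesce_cmds N MODE PKTS USECS
# Print (do not run) the sysctl commands that configure receive interrupt
# moderation on interface mce<N>. The helper name is hypothetical; the
# sysctl names are the ones listed in this section.
coalesce_cmds() {
    n=$1
    mode=$2
    pkts=$3
    usecs=$4
    case "$mode" in
        0|1|2|3) ;;
        *) echo "rx_coalesce_mode must be 0, 1, 2 or 3" >&2; return 1 ;;
    esac
    printf 'sysctl dev.mce.%s.conf.rx_coalesce_mode=%s\n' "$n" "$mode"
    printf 'sysctl dev.mce.%s.conf.rx_coalesce_pkts=%s\n' "$n" "$pkts"
    printf 'sysctl dev.mce.%s.conf.rx_coalesce_usecs=%s\n' "$n" "$usecs"
}

# Example: the default values listed above, for interface mce1.
coalesce_cmds 1 1 32 3
```

After reviewing the printed commands, they can be applied on the FreeBSD host by piping the output to sh as root.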
Tuning for NUMA Architecture
Single NUMA Architecture
On a server with a single NUMA node, no NUMA-specific tuning is required. However, avoid using core number 0 for interrupts and applications.
- Find a CPU list:
#> sysctl -a | grep "group level=\"2\"" -A 1
<group level="2" cache-level="2">
<cpu count="12" mask="fff">0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11</cpu>
- Tune the NICs to work on the desired cores.
Find the device that matches the interface:
#> sysctl -a | grep mce | grep mlx
dev.mce.<N>.conf.device_name: mlx5_core1
dev.mce.<N>.conf.device_name: mlx5_core0
Find the device interrupts:
#> vmstat -ia | grep mlx5_core0 | awk '{print $1}' | sed s/irq// | sed s/://
269
270
271
…
Bind each interrupt to the desired core:
#> cpuset -x 269 -l 1
#> cpuset -x 270 -l 2
#> cpuset -x 271 -l 3
…
Bind the application to the desired cores:
#> cpuset -l 1-11 <app name> <server flag>
#> cpuset -l 1-11 <app name> <client flag> <IP>
Specifying a range of CPUs when using the cpuset command will allow the application to choose any of them. This is important for applications that execute on multiple threads.
The range argument is not supported for interrupt binding.
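Because each interrupt needs its own cpuset call, the lookup and binding steps above can be scripted. The sketch below is one possible approach, not a tool shipped with the driver: irqs_for_device and irq_bind_cmds are hypothetical helper names, the first reads vmstat -ia output on standard input, and the second only prints the cpuset commands so they can be reviewed first. It assumes interrupt lines begin with a field such as irq269:, as in the pipeline above.

```sh
# irqs_for_device DEVICE
# Read `vmstat -ia` output on stdin and print the IRQ number of every line
# that mentions DEVICE (same substring match as the grep in the manual step).
irqs_for_device() {
    awk -v dev="$1" 'index($0, dev) {
        sub(/^irq/, "", $1)   # drop the "irq" prefix
        sub(/:$/, "", $1)     # drop the trailing colon
        print $1
    }'
}

# irq_bind_cmds [FIRST_CORE]
# Read IRQ numbers on stdin and print one cpuset command per IRQ, assigning
# consecutive cores starting at FIRST_CORE (default 1, leaving core 0 free).
irq_bind_cmds() {
    core=${1:-1}
    while read -r irq; do
        printf 'cpuset -x %s -l %s\n' "$irq" "$core"
        core=$((core + 1))
    done
}

# On the FreeBSD host (prints the commands without running them):
#   vmstat -ia | irqs_for_device mlx5_core0 | irq_bind_cmds 1
```

This sketch does not check that the core numbers stay within the CPU list found earlier, so the printed commands should be compared against that list before they are run.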
Dual NUMA Architecture
- Find the CPU list closest to the NIC.
Find the device that matches the interface:
#> sysctl -a | grep mce | grep mlx
dev.mce.3.conf.device_name: mlx5_core3
dev.mce.2.conf.device_name: mlx5_core2
dev.mce.1.conf.device_name: mlx5_core1
dev.mce.0.conf.device_name: mlx5_core0
Find the NIC's PCI location:
#> sysctl -a | grep mlx5_core.0 | grep parent
dev.mlx5_core.0.%parent: pci3
Usually, low PCI locations are closest to NUMA number 0, and high PCI locations are closest to NUMA number 1. To verify the locations, find the NIC's pcib by PCI location:
#> sysctl -a | grep pci.3.%parent
dev.pci.3.%parent: pcib3
In "handle", PCI0 is the value for locations near NUMA0, and PCI1 is the value for locations near NUMA1.
Find the cores list of the closest NUMA:
#> sysctl -a | grep "group level=\"2\"" -A 1
<group level="2" cache-level="2">
<cpu count="12" mask="fff">0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11</cpu>
--
<group level="2" cache-level="2">
<cpu count="12" mask="fff000">12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23</cpu>
Note: Each list of cores refers to a different NUMA.
- Tune the NICs to work on the desired cores.
- Pin both interrupts and application processes to the relevant cores.
- Find the closest NUMA to the NIC
- Find the device interrupts:
#> vmstat -ia | grep mlx5_core0 | awk '{print $1}' | sed s/irq// | sed s/://
304
305
306
…
- Bind each interrupt to a core from the closest NUMA cores list.
Note: It is best to avoid core number 0.
#> cpuset -x 304 -l 1
#> cpuset -x 305 -l 2
#> cpuset -x 306 -l 3
...
- Bind the application to the closest NUMA cores list.
Note: It is best to avoid core number 0.
#> cpuset -l 1-11 <app name> <server flag>
#> cpuset -l 1-11 <app name> <client flag> <IP>
- For best performance, change the CPU’s BIOS configuration to performance mode.
Because of FreeBSD's internal card memory allocation mechanism at boot, it is preferable to install the NIC in a NUMA-0 slot for maximum performance.
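The dual-NUMA binding steps can be sketched as two small helpers. Both names (numa_core_list and bind_plan) are hypothetical and not part of FreeBSD or the driver. The sketch assumes that the groups in the topology output appear in NUMA order, which should be confirmed with the PCI "handle" check described above. It prints the cpuset commands instead of running them.

```sh
# numa_core_list INDEX
# Read the output of `sysctl -a | grep "group level=\"2\"" -A 1` on stdin and
# print the CPU list of the INDEX-th group (0-based).
# Assumption: group order matches NUMA order.
numa_core_list() {
    awk -v want="$1" '
        /<cpu count=/ {
            if (n++ == want) {
                sub(/^.*<cpu[^>]*>/, "")   # strip the opening tag
                sub(/<\/cpu>.*$/, "")      # strip the closing tag
                print
            }
        }'
}

# bind_plan "CORE, CORE, ..." IRQ [IRQ ...]
# Print one cpuset command per IRQ, cycling through the given cores and
# skipping core 0, as recommended above.
bind_plan() {
    core_list=$1
    shift
    printf '%s\n' "$@" | awk -v cores="$core_list" '
        BEGIN {
            gsub(/ /, "", cores)
            n = split(cores, all, ",")
            for (i = 1; i <= n; i++)
                if (all[i] != 0) use[++m] = all[i]
        }
        NF && m > 0 { printf "cpuset -x %s -l %s\n", $1, use[(k++ % m) + 1] }'
}

# On the FreeBSD host, for a NIC near the second group (index 1):
#   cores=$(sysctl -a | grep "group level=\"2\"" -A 1 | numa_core_list 1)
#   bind_plan "$cores" 304 305 306
```

Cycling through the cores means that a NIC with more interrupts than usable cores still gets every interrupt bound, at the cost of some cores serving more than one interrupt.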