DOCA SDK Documentation

SR-IOV

Single Root I/O Virtualization (SR-IOV) enables a single physical PCIe device to expose multiple virtual instances on the PCIe bus. Each instance, known as a virtual function (VF), acts as an independent PCIe device while sharing the physical function (PF)'s resources.

NVIDIA® ConnectX® adapters support up to 127 VFs per port, each of which can be provisioned and managed independently. SR-IOV is typically used with an SR-IOV-enabled hypervisor to provide virtual machines with direct hardware access to network interfaces, improving throughput and reducing CPU overhead.

This section describes how to configure SR-IOV in a Red Hat Enterprise Linux (RHEL) environment using ConnectX VPI adapters.

System Requirements

To configure and use SR-IOV, ensure the following prerequisites are met:

  • Installed MLNX_OFED driver

  • A server or blade with an SR-IOV-capable BIOS

  • A hypervisor that supports SR-IOV (for example, Red Hat Enterprise Linux Server 6 or later)

  • A ConnectX VPI adapter supporting SR-IOV

BIOS and Kernel Setup

The figures used in this section are for illustration purposes only. For further information, refer to your BIOS User Manual.

  1. Enable "SR-IOV" in the system BIOS.

  2. Enable "Intel Virtualization Technology" (VT-d).

  3. Install a hypervisor that supports SR-IOV.

  4. Update the GRUB configuration to enable IOMMU: 
    Example for Intel systems (/boot/grub/grub.conf):

    default=0
    timeout=5
    splashimage=(hd0,0)/grub/splash.xpm.gz
    hiddenmenu
    title Red Hat Enterprise Linux Server (4.x.x)
            root (hd0,0)
            kernel /vmlinuz-4.x.x ro root=/dev/VolGroup00/LogVol00 rhgb quiet intel_iommu=on
            initrd /initrd-4.x.x.img
    

    Ensure the parameter intel_iommu=on is present. On newer systems using /boot/grub2/grub.cfg, add the parameter to the line starting with linux16.
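
After rebooting, one way to confirm that the parameter took effect is to inspect the running kernel's command line. The following is a minimal sketch and not part of the original procedure: the function name has_intel_iommu is hypothetical, and it only looks for the literal intel_iommu=on token in /proc/cmdline (or in a file passed as an argument).

```shell
# Hypothetical helper: succeed if the kernel command line contains intel_iommu=on.
# Reads /proc/cmdline by default; pass another file to check a saved command line.
has_intel_iommu() (
    cmdline_file="${1:-/proc/cmdline}"
    set -f                                  # no glob expansion while word-splitting
    for param in $(cat "$cmdline_file"); do
        if [ "$param" = "intel_iommu=on" ]; then
            exit 0                          # found (exits only this function's subshell)
        fi
    done
    exit 1                                  # parameter not present
)
```

For example, `has_intel_iommu && echo "intel_iommu=on is set"` prints the message only when the running kernel was booted with the parameter.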

Configuring SR-IOV (Ethernet)

For configuration details, refer to the community guide HowTo Configure SR-IOV for ConnectX-4/ConnectX-5/ConnectX-6 with KVM (Ethernet).

Configuring SR-IOV (InfiniBand)

  1. Install MLNX_OFED for Linux with SR-IOV support.

  2. Verify SR-IOV enablement in the firmware:

    mlxconfig -d /dev/mst/mt4115_pciconf0 q
    

    Example output: 

    SRIOV_EN              1
    NUM_OF_VFS            8
    

    To modify these settings, if needed: 

    mlxconfig -d /dev/mst/mt4115_pciconf0 set SRIOV_EN=1 NUM_OF_VFS=16
    
  3. Reboot the server.

  4. Create VFs. Depending on your kernel version, use one of the following sysfs files:

    • Standard (for newer kernels):

      echo <num_vfs> > /sys/class/infiniband/mlx5_0/device/sriov_numvfs

    • Legacy (for older kernels):

      echo <num_vfs> > /sys/class/infiniband/mlx5_0/device/mlx5_num_vfs

    The sriov_numvfs file is only present if intel_iommu=on was set in GRUB.

    Rules:

    • You can change the number of VFs only when none are assigned.

    • If VFs are assigned to VMs, the count cannot be changed.

    • Unloading the PF driver removes SR-IOV only if no VFs are assigned.

    • When the PF driver is reloaded, assigned VFs become operational again (the VF driver may need to be restarted).

  5. Verify VF creation.

    lspci | grep Mellanox
    

    Example output: 

    08:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
    08:00.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
    08:00.2 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
    08:00.3 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
    08:00.4 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
    08:00.5 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
    
  6. Configure each VF. Sysfs entries are available under /sys/class/infiniband/mlx5_<PF_INDEX>/device/sriov/. Example output: 

    sriov/
    ├── 0/
    │   ├── node
    │   ├── port
    │   └── policy
    ├── 1/
    │   ├── node
    │   ├── port
    │   └── policy
    └── 2/
        ├── node
        ├── port
        └── policy
    
    • Node GUID: 

      echo 00:11:22:33:44:55:1:0 > /sys/class/infiniband/mlx5_0/device/sriov/0/node

    • Port GUID: 

      echo 00:11:22:33:44:55:2:0 > /sys/class/infiniband/mlx5_0/device/sriov/0/port

    • Policy (/sys/class/infiniband/<PF>/device/sriov/<index>/policy) – Defines VF port behavior. Options:

      Value     Description
      Down      Port state remains down
      Up        Sets port to Initialize, allowing the SM to bring it up
      Follow    Mirrors the physical port's state

      By default, all VF policies initialize as Down, except VPort0, which defaults to Follow.

  7. Enable virtualization in OpenSM by adding the following to /etc/opensm/opensm.conf:

    virt_enabled 2

    OpenSM and related InfiniBand tools (e.g., iblinkinfo, ibqueryerr) must run on the PF, not the VF. In multi-PF configurations, OpenSM should run on host0.
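
The per-VF settings from step 6 can be grouped into one helper. This is a sketch under stated assumptions rather than a vendor-provided tool: the function name configure_ib_vf and its argument order are hypothetical, and it only accepts the three policy values listed above. It must run as root on the hypervisor when pointed at the real sysfs tree.

```shell
# Hypothetical helper: set node GUID, port GUID, and policy for one InfiniBand VF.
# Usage: configure_ib_vf <sriov_dir> <vf_index> <node_guid> <port_guid> <policy>
# <sriov_dir> is, for example, /sys/class/infiniband/mlx5_0/device/sriov
configure_ib_vf() (
    sriov_dir="$1"; vf="$2"; node_guid="$3"; port_guid="$4"; policy="$5"
    # Accept only the policy values documented for the policy file.
    case "$policy" in
        Down|Up|Follow) ;;
        *) echo "invalid policy: $policy" >&2; exit 1 ;;
    esac
    vf_dir="$sriov_dir/$vf"
    [ -d "$vf_dir" ] || { echo "no such VF directory: $vf_dir" >&2; exit 1; }
    echo "$node_guid" > "$vf_dir/node"   || exit 1
    echo "$port_guid" > "$vf_dir/port"   || exit 1
    echo "$policy"    > "$vf_dir/policy" || exit 1
)
```

For example, `configure_ib_vf /sys/class/infiniband/mlx5_0/device/sriov 0 00:11:22:33:44:55:1:0 00:11:22:33:44:55:2:0 Follow` writes the same GUIDs as the echo commands in step 6 and sets VF 0 to follow the physical port state.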

VF Initialization and Binding

Because the same mlx5_core driver handles both PFs and VFs, the PF driver attempts to initialize all VFs by default.
To assign a VF to a virtual machine, unbind it from the PF driver first:

  1. Identify the VF PCIe address:

    lspci -D
    

    Example: 

    0000:09:00.2
    
  2. Unbind from PF driver:

    echo 0000:09:00.2 > /sys/bus/pci/drivers/mlx5_core/unbind
    
  3. Bind again (if needed):

    echo 0000:09:00.2 > /sys/bus/pci/drivers/mlx5_core/bind
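
The unbind and bind steps can be wrapped in a small function that checks the address format before touching sysfs. This is an illustrative sketch: the name vf_driver_ctl and the optional third argument are hypothetical, and the address check only verifies the domain:bus:device.function shape printed by lspci -D.

```shell
# Hypothetical helper: unbind or bind a VF to mlx5_core by PCIe address.
# Usage: vf_driver_ctl <unbind|bind> <pci_address> [driver_dir]
# driver_dir defaults to /sys/bus/pci/drivers/mlx5_core
vf_driver_ctl() (
    action="$1"; addr="$2"; driver_dir="${3:-/sys/bus/pci/drivers/mlx5_core}"
    case "$action" in
        unbind|bind) ;;
        *) echo "action must be unbind or bind" >&2; exit 1 ;;
    esac
    # Expect a full domain:bus:device.function address, as printed by lspci -D.
    case "$addr" in
        [0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F].[0-7]) ;;
        *) echo "unexpected PCIe address: $addr" >&2; exit 1 ;;
    esac
    # Writing the address to the driver's unbind/bind file performs the action.
    echo "$addr" > "$driver_dir/$action"
)
```

For example, `vf_driver_ctl unbind 0000:09:00.2` is equivalent to the echo command in step 2, and a short address such as 09:00.2 is rejected before anything is written.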
    

PCIe BDF Mapping of PFs and VFs

PCIe addresses are sequential across PFs and VFs.

For example, if the card's PCIe slot is 05:00 and it has two ports:

Function       PCIe BDF Range     Description
PF0            05:00.0            PF for port 0
PF1            05:00.1            PF for port 1
VFs for PF0    05:00.2–05:00.4    VFs 0–2 for PF0 (mlx5_0)
VFs for PF1    05:00.5–05:00.7    VFs 0–2 for PF1 (mlx5_1)
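
The mapping in the table above can be expressed as simple arithmetic: the function number of a VF is the number of PFs, plus the VFs that belong to lower-numbered PFs, plus the VF's own index. The sketch below assumes exactly the sequential layout shown in the example and only handles function numbers 0 through 7; the function name vf_bdf and its arguments are hypothetical.

```shell
# Hypothetical helper: print the BDF of VF <vf_index> on PF <pf_index>, assuming
# the sequential layout from the example (PFs first, then each PF's VFs in order).
# Usage: vf_bdf <bus:device> <num_pfs> <vfs_per_pf> <pf_index> <vf_index>
vf_bdf() (
    slot="$1"; num_pfs="$2"; vfs_per_pf="$3"; pf="$4"; vf="$5"
    # Reject indices that are out of range for the given configuration.
    [ "$pf" -lt "$num_pfs" ] && [ "$vf" -lt "$vfs_per_pf" ] || exit 1
    fn=$(( num_pfs + pf * vfs_per_pf + vf ))
    # This sketch covers only function numbers 0-7 on a single device number.
    [ "$fn" -le 7 ] || exit 1
    echo "$slot.$fn"
)
```

For example, `vf_bdf 05:00 2 3 1 0` prints 05:00.5, matching the first VF of PF1 in the example.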

Additional SR-IOV Configurations

Assigning VF to Virtual Machine

This section describes how to attach an SR-IOV VF to a VM on a Red Hat KVM host using virt-manager.

  1. Run virt-manager.

  2. Double-click the VM and open its Properties.

  3. Go to Details → Add Hardware → PCI Host Device.

  4. Select the NVIDIA VF by its PCIe address (e.g., 00:03.1).

  5. Reboot the VM if it's running; otherwise, start it.

  6. Inside the guest, verify the device is present:

    lspci | grep Mellanox
    

    Example: 

    01:00.0 Infiniband controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
    
  7. (Optional) Configure the guest interface (e.g., via /etc/sysconfig/network-scripts/ifcfg-ethX). 

    VF MACs are randomly assigned by default; you don’t need to set one unless you require a stable MAC.

Ethernet VF Configuration (Host)

You can configure VFs via iproute2 (preferred) or sysfs.

  • Using ip (preferred)

    ip link set { dev <PF_DEVICE> | group <DEVGROUP> } [ up | down ] \
      vf <NUM> [ mac <LLADDR> ] [ vlan <VLANID> [ qos <VLAN-QOS> ] ] \
      [ spoofchk { on | off } ] \
      [ state { enable | disable | auto } ]

  • Using sysfs (example layout, ConnectX-4)

    /sys/class/net/<PF>/device/sriov/<VF>/
    ├── config
    ├── link_state
    ├── mac
    ├── mac_list
    ├── max_tx_rate
    ├── min_tx_rate
    ├── spoofcheck
    ├── stats
    ├── trunk
    └── trust


VLAN Modes: VGT vs VST

  • VGT (VLAN Guest Tagging) – Guest tags/untags its own traffic. (Default)

  • VST (VLAN Switch Tagging) – Hypervisor enforces a VLAN/QoS for the VF; outgoing untagged/priority-tagged traffic is tagged by the hypervisor; incoming VLAN tags are stripped.

Configure VST:

ip link set dev <PF_DEVICE> vf <NUM> vlan <VLAN_ID> [qos <QOS>]
# Example:
ip link set dev eth2 vf 2 vlan 10 qos 3   # enable VST with VLAN 10, QoS 3
ip link set dev eth2 vf 2 vlan 0          # revert to VGT

Additional Ethernet VF Options

  • Guest MAC (set a stable MAC before the guest driver loads):

    ip link set dev <PF_DEVICE> vf <NUM> mac <LLADDR>

    For legacy/ConnectX-4 guests (no random MAC), always configure via ip link.

  • Spoof checking (kernel ≥ 3.1):

    ip link set dev <PF_DEVICE> vf <NUM> spoofchk [on | off]
    
  • Guest link state:

    ip link set dev <PF_DEVICE> vf <NUM> state [enable | disable | auto]
    

VF Statistics (sysfs)

Virtual function statistics can be queried via sysfs:

cat /sys/class/infiniband/mlx5_2/device/sriov/2/stats
tx_packets : 5011
tx_bytes   : 4450870
tx_dropped : 0
rx_packets : 5003
rx_bytes   : 4450222
rx_broadcast : 0
rx_multicast : 0
tx_broadcast : 0
tx_multicast : 8
rx_dropped : 0
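
When only one counter is needed, for example in a monitoring script, the stats file can be filtered with awk. This is a minimal sketch that assumes the "name : value" line format shown above; the function name vf_stat is hypothetical.

```shell
# Hypothetical helper: print the value of one counter from a VF stats file.
# Usage: vf_stat <stats_file> <counter_name>
vf_stat() (
    stats_file="$1"; counter="$2"
    # Lines look like "tx_packets : 5011"; split on ":" and strip the padding.
    awk -F':' -v name="$counter" '
        { key = $1; val = $2; gsub(/[ \t]/, "", key); gsub(/[ \t]/, "", val) }
        key == name { print val; found = 1 }
        END { exit found ? 0 : 1 }
    ' "$stats_file"
)
```

For example, `vf_stat /sys/class/infiniband/mlx5_2/device/sriov/2/stats tx_bytes` prints only the tx_bytes value and returns a non-zero status if the counter is not present.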

Mapping VFs to Ports

Use ip link (iproute2 v2.6.34~3 or later):

ip link

Example (excerpt):

61: p1p1: ...
    vf 0 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 38 MAC ff:ff:ff:ff:ff:ff, vlan 65535, spoof checking off, link-state disable

A MAC of ff:ff:ff:ff:ff:ff indicates the VF is not assigned to this net device's port. 

You can still configure such VFs from this PF; changes apply to the VF’s actual port owner.
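
To list the VFs that belong to the other port, the ip link output can be filtered for the ff:ff:ff:ff:ff:ff marker. The sketch below assumes the line format shown in the excerpt above ("vf N MAC ..."); other iproute2 versions may format these lines differently, in which case the pattern needs adjusting. The function name foreign_vfs is hypothetical.

```shell
# Hypothetical helper: read "ip link" output on stdin and print the indices of
# VFs whose MAC is ff:ff:ff:ff:ff:ff (not assigned to this net device's port).
foreign_vfs() {
    awk '$1 == "vf" && $3 == "MAC" {
        mac = $4
        sub(/,$/, "", mac)          # drop the trailing comma after the MAC
        if (mac == "ff:ff:ff:ff:ff:ff") print $2
    }'
}
```

For example, `ip link show dev p1p1 | foreign_vfs` prints one VF index per line; with the excerpt above as input it prints 38.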

RoCE Support

RoCE is supported on VFs and can be used with VLANs. The hypervisor GID table has 16 entries; the remaining 112 entries are shared across VFs. With more than 56 VFs, some VFs may have only a single GID entry, which is insufficient if the VF's Ethernet interface is assigned an IP address. Plan VF counts accordingly.

VGT+ (Virtual Guest Tagging Plus)

VGT+ lets a VF tag its own packets while enforcing an administrative VLAN trunk policy that defines which VLANs are allowed.

  • No default VLAN is defined by VGT+.

  • Outgoing packets are forwarded only if they match allowed VLANs.

  • Incoming packets are delivered to the VF only if allowed by policy. 

    In SR-IOV, the default operating mode is VGT.

Enable VGT+ (set allowed VLAN ranges):

# Enable VLAN range(s) on VF 0 of PF eth5:
echo "add <start_vid> <end_vid>" > /sys/class/net/eth5/device/sriov/0/trunk

# Examples:
echo "add 4 15"  > /sys/class/net/eth5/device/sriov/0/trunk
echo "add 17 17" > /sys/class/net/eth5/device/sriov/0/trunk

# VLAN 0 means untagged and priority-tagged traffic is allowed.
# Disable VGT+ (remove all VLANs):
echo "rem 0 4095" > /sys/class/net/eth5/device/sriov/0/trunk
# Remove a specific range/ID:
echo "rem 4 15"   > /sys/class/net/eth5/device/sriov/0/trunk
echo "rem 17 17"  > /sys/class/net/eth5/device/sriov/0/trunk
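
Because the trunk file takes a free-form string, a thin wrapper can catch reversed or out-of-range VLAN IDs before they are written. This is a sketch, not a documented interface: the function name vgt_plus_trunk is hypothetical, and the 0-4095 bounds are taken from the "rem 0 4095" example above.

```shell
# Hypothetical helper: add or remove an allowed VLAN range on a VF trunk file.
# Usage: vgt_plus_trunk <trunk_file> <add|rem> <start_vid> <end_vid>
# <trunk_file> is, for example, /sys/class/net/eth5/device/sriov/0/trunk
vgt_plus_trunk() (
    trunk_file="$1"; op="$2"; start="$3"; end="$4"
    case "$op" in
        add|rem) ;;
        *) echo "operation must be add or rem" >&2; exit 1 ;;
    esac
    # VLAN IDs must lie within 0-4095 and the range must not be reversed.
    [ "$start" -ge 0 ] && [ "$end" -le 4095 ] && [ "$start" -le "$end" ] || {
        echo "invalid VLAN range: $start $end" >&2
        exit 1
    }
    echo "$op $start $end" > "$trunk_file"
)
```

For example, `vgt_plus_trunk /sys/class/net/eth5/device/sriov/0/trunk add 4 15` writes the same string as the first example above, while `add 15 4` is rejected without touching the file.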

SR-IOV Advanced Security

MAC Anti-Spoofing

Prevents a VF from sending frames with a MAC different from the one assigned by the admin. Disabled by default.

  • Using ip (kernel ≥ 3.10):

    ip link set ens785f1 vf 0 spoofchk on   # enable
    ip link set ens785f1 vf 0 spoofchk off  # disable

  • Using sysfs: 

    echo "ON"  > /sys/class/net/ens785f1/device/sriov/0/spoofcheck
    echo "OFF" > /sys/class/net/ens785f1/device/sriov/0/spoofcheck

This setting is non-persistent across driver restarts.


Rate Limit per VF

See the community post HowTo Configure Rate Limit per VF for ConnectX-4/ConnectX-5/ConnectX-6. Per-VF files (e.g., /sys/class/net/<ifname>/device/sriov/<vf_num>/max_tx_rate) still apply.

Rate Limit per Group of VFs

Group VFs and apply a rate limit to the group; the effective limit for each VF is the lower of the VF's own limit and the group's available bandwidth share.

Configuration outline:

  1. When supported, the driver exposes /sys/class/net/<ifname>/device/sriov/groups/.

  2. All VFs start in group 0.

  3. Move a VF to a group:

    echo 7 > /sys/class/net/<ifname>/device/sriov/5/group
    
  4. Set group max rate:

    echo 5000 > /sys/class/net/<ifname>/device/sriov/groups/7/max_tx_rate

  5. Inspect the VF and group settings:

    • VF stats include the group ID:

      cat /sys/class/net/<ifname>/device/sriov/<vf_num>/stats

    • Group config shows the current rate limit and member count:

      cat /sys/class/net/<ifname>/device/sriov/groups/<group_id>/config
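
Steps 3 and 4 above can be combined so that a VF is never moved into a group without the group limit also being set. The following is a sketch under the assumption that the driver exposes the group and groups/<id>/max_tx_rate files exactly as described in the outline; the function name limit_vf_group is hypothetical.

```shell
# Hypothetical helper: move a VF into a rate-limit group and cap the group.
# Usage: limit_vf_group <sriov_dir> <vf_num> <group_id> <max_tx_rate>
# <sriov_dir> is, for example, /sys/class/net/<ifname>/device/sriov
limit_vf_group() (
    sriov_dir="$1"; vf="$2"; group="$3"; rate="$4"
    # Step 3: assign the VF to the group.
    echo "$group" > "$sriov_dir/$vf/group" || exit 1
    # Step 4: set the group's maximum transmit rate (same units as "echo 5000").
    echo "$rate" > "$sriov_dir/groups/$group/max_tx_rate" || exit 1
)
```

For example, `limit_vf_group /sys/class/net/<ifname>/device/sriov 5 7 5000` reproduces the two echo commands from the outline for VF 5 and group 7.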

Bandwidth Guarantee per Group of VFs

Guarantee a minimum transmit rate per group; ensure the sum of group minimums ≤ line rate.

Example (40 Gb/s link):

echo 20000 > /sys/class/net/<ifname>/device/sriov/groups/1/min_tx_rate
echo 5000 > /sys/class/net/<ifname>/device/sriov/groups/2/min_tx_rate
echo 15000 > /sys/class/net/<ifname>/device/sriov/groups/3/min_tx_rate


  • Group 1: 20 Gb/s

  • Group 2: 5 Gb/s

  • Group 3: 15 Gb/s

  • Groups with a min_tx_rate of 0 have no guarantee.

You can still set per-VF min rates to split a group’s guarantee among member VFs (sum should not exceed the group minimum).
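
Before writing the minimums, it can help to check that they fit within the link. The sketch below only does the arithmetic; it assumes the values are in Mb/s (inferred from 20000 corresponding to 20 Gb/s in the example above), and the function name check_group_minimums is hypothetical.

```shell
# Hypothetical helper: check that the group minimum rates fit within the line rate.
# Usage: check_group_minimums <line_rate_mbps> <min_rate_mbps>...
check_group_minimums() (
    line_rate="$1"; shift
    total=0
    for rate in "$@"; do
        total=$(( total + rate ))
    done
    echo "total guaranteed: $total of $line_rate Mb/s"
    # Succeed only if the sum of the guarantees does not exceed the line rate.
    [ "$total" -le "$line_rate" ]
)
```

For the 40 Gb/s example, `check_group_minimums 40000 20000 5000 15000` prints "total guaranteed: 40000 of 40000 Mb/s" and succeeds; raising any one of the three values makes it fail.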

Privileged VFs

Trusted VFs can receive a limited set of PF-like privileges (e.g., entering promiscuous mode).

  • Using ip (kernel ≥ 4.5):

    ip link set ens785f1 vf 0 trust on
    ip link set ens785f1 vf 0 trust off

  • Using sysfs:

    echo "ON"  > /sys/class/net/ens785f1/device/sriov/0/trust
    echo "OFF" > /sys/class/net/ens785f1/device/sriov/0/trust

Probed VFs

Probing VFs consumes resources. Disable probing if you don’t need to monitor VMs:

  • Kernel ≥ 4.12 (preferred) – use sriov_drivers_autoprobe (PCIe sysfs).

  • Older kernels – use mlx5_core module param probe_vf:

    echo 0 > /sys/module/mlx5_core/parameters/probe_vf
    

For more information on how to probe VFs, see the community post HowTo Configure and Probe VFs on mlx5 Drivers.
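
On kernels that provide sriov_drivers_autoprobe, the file lives in the PF's PCIe sysfs directory and takes 0 or 1. The sketch below wraps that write with basic checks; the function name set_vf_autoprobe is hypothetical, and the PF address in the usage example is an assumption chosen to match the VF address used earlier. The setting is expected to apply to VFs created after the change, so set it before writing to sriov_numvfs.

```shell
# Hypothetical helper: enable (1) or disable (0) automatic VF driver probing.
# Usage: set_vf_autoprobe <pf_pci_dir> <0|1>
# <pf_pci_dir> is the PF's PCIe sysfs directory, e.g. /sys/bus/pci/devices/0000:09:00.0
set_vf_autoprobe() (
    pf_dir="$1"; value="$2"
    case "$value" in
        0|1) ;;
        *) echo "value must be 0 or 1" >&2; exit 1 ;;
    esac
    # The file is absent on kernels or devices without autoprobe control.
    [ -e "$pf_dir/sriov_drivers_autoprobe" ] || {
        echo "sriov_drivers_autoprobe not found under $pf_dir" >&2
        exit 1
    }
    echo "$value" > "$pf_dir/sriov_drivers_autoprobe"
)
```

For example, `set_vf_autoprobe /sys/bus/pci/devices/0000:09:00.0 0` disables probing for VFs subsequently created on that PF.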

VF Promiscuous and All-Multicast Modes 

Only trusted VFs can enable these modes.

  • Promiscuous Mode (receive unmatched and all multicast traffic):

    ifconfig eth2 promisc     # enable
    ifconfig eth2 -promisc    # disable

  • All-Multicast Mode (receive all multicast on the port):

    ifconfig eth2 allmulti    # enable
    ifconfig eth2 -allmulti   # disable

Uninstalling the SR-IOV Driver

  1. Detach all VFs from VMs or stop the VMs that use VFs. 

    Stopping the driver while VMs are using VFs may hang the host.

  2. Run the uninstall script:

    /usr/sbin/ofed_uninstall.sh
    Follow the prompts. Example output (truncated): 
    This program will uninstall all OFED packages on your machine.
    Do you want to continue? [y/N]: y
    ...

  3. Reboot the server.
