UFM Infra Rootless Deployment with Podman (Restricted to Oracle Linux Only)
Prerequisites
-
Download the UFM and plugins bundle tar file to /tmp.
-
Extract the contents using the command:
tar -xvf <bundle tar>
This archive (tar file) includes the following components:
-
Relevant UFM container image
-
Relevant FAST-API container image
-
Relevant Infra container image (for internal Redis usage). Refer to Redis-Related Configuration for more information.
-
Default plugin bundle for UFM
-
UFM-HA package
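The version-extraction step later in this procedure reads the image path from a `UFM_IMAGE_FILE` variable that these instructions never set. Below is a minimal sketch for locating the image after extraction. It assumes the UFM image file follows the `ufm_<version>...-docker.img.gz` naming shown in that step. `find_ufm_image` is a hypothetical helper and is not part of the bundle.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of the UFM container image inside an
# extracted bundle directory. Assumes the file name follows the
# ufm_<version>...-docker.img.gz pattern used in the version-extraction step.
find_ufm_image() {
    local bundle_dir="$1" img
    img=$(find "$bundle_dir" -type f -name 'ufm_*-docker.img.gz' | sort | head -n 1)
    if [ -z "$img" ]; then
        echo "No UFM image found under $bundle_dir" >&2
        return 1
    fi
    printf '%s\n' "$img"
}

# Usage (the directory is a placeholder for wherever the bundle was extracted):
# UFM_IMAGE_FILE=$(find_ufm_image /tmp/<extracted bundle dir>)
```

If your bundle names the image differently, adjust the `-name` pattern.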
HA Installation Requirements
To enable the UFM Infra feature, UFM HA must be installed in a new mode (`external-storage`), using a new product (`enterprise-multinode`).
Additionally, NFS must be configured as follows:
NFS Setup Prerequisites
-
Select a dedicated NFS server to host the shared directories.
-
Create a shared directory on the NFS server for UFM configuration and logs.
-
Install the NFS client on each UFM node if not already present.
Enable HA ports in Firewall
If your firewall rules block non-standard ports, open the ports used by the high-availability services so that these services can communicate with each other across the HA nodes. To do so, run the following commands:
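firewall-cmd --permanent --add-service=high-availability
# Reload the rules so that the permanent configuration is applied to the running firewall
firewall-cmd --reload
# Alternatively, to open the ports for the current runtime only (lost on the next reload or reboot), run only:
# firewall-cmd --add-service=high-availability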
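To confirm that the rule is in effect after the reload, you can inspect the running configuration. Below is a small sketch. `list_has_item` is a hypothetical helper that reads, on stdin, the space-separated list printed by `firewall-cmd --list-services` and matches the service name exactly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a space-separated list (as printed by
# "firewall-cmd --list-services") on stdin and succeed if ITEM is in it.
list_has_item() {
    local item="$1"
    # Put one entry per line, then require an exact, fixed-string line match
    tr ' ' '\n' | grep -qxF -- "$item"
}

# Usage:
# firewall-cmd --list-services | list_has_item high-availability && echo "high-availability is allowed"
```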
Create and Mount the UFM Directory
At this stage, apply point #2 (Mount the UFM directory) on the master machine only.
The UFM directory is mounted on the other nodes later, as described in Running in HA Mode.
-
Create the UFM directory:
mkdir -p /opt/ufm/shared_files
-
Mount the UFM directory:
-
If using NFS 4.2:
mount -t nfs4 -o context="system_u:object_r:container_file_t:s0" <server>:/shared_folder /opt/ufm/shared_files
-
If using NFS 3:
mount -t nfs -o vers=3,context="system_u:object_r:container_file_t:s0" <server>:/shared_folder /opt/ufm/shared_files
-
Ensure the NFS version and mount options are compatible with the NFS server.
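Before continuing, confirm that /opt/ufm/shared_files is actually an NFS mount and not an empty local directory, because later steps write UFM files there. Below is a minimal sketch that parses /proc/mounts. `is_nfs_mounted` is a hypothetical helper, and its optional second argument exists only so that the function can be tested against a sample file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if TARGET is listed in a mounts table
# (default /proc/mounts) with filesystem type nfs or nfs4.
is_nfs_mounted() {
    local target="$1" mounts_file="${2:-/proc/mounts}"
    # /proc/mounts fields: device, mount point, fs type, options, dump, pass
    awk -v t="$target" '$2 == t && $3 ~ /^nfs4?$/ { found = 1 } END { exit !found }' "$mounts_file"
}

# Usage:
# is_nfs_mounted /opt/ufm/shared_files || echo "/opt/ufm/shared_files is not NFS-mounted" >&2
```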
-
Verify that the following HA packages are installed: pcs, pacemaker, and corosync. Install them if they are missing.
-
Follow the HA installation steps in Run the HA Installation.
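You can also script the HA package check above. Below is a minimal sketch. It assumes an RPM-based system such as Oracle Linux, where `rpm -q --quiet` exits non-zero for a package that is not installed. `missing_ha_packages` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each required HA package that is not installed
# and return 1 if any are missing.
missing_ha_packages() {
    local pkg missing=0
    for pkg in pcs pacemaker corosync; do
        if ! rpm -q --quiet "$pkg"; then
            echo "$pkg"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
# missing_ha_packages || echo "Install the packages listed above before continuing" >&2
```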
Run the HA Installation
Follow the HA installation instructions at UFM High-Availability Installation and Configuration.
When running the HA installation script, use the following command:
./install.sh -p enterprise-multinode -l /opt/ufm/shared_files
-
The -l flag must always point to the shared directory path: /opt/ufm/shared_files.
-
There is no need to provide the DRBD disk argument to the installation script.
Installation Instructions
-
Check firewall status:
systemctl status firewalld
-
Configure the firewall (if active):
# Check whether firewalld is running
systemctl status firewalld
# Permanently add port 8443 to firewalld
firewall-cmd --permanent --add-port=8443/tcp
# Reload the firewalld configuration
firewall-cmd --reload
-
Create UFM directory:
mkdir -p /opt/ufm
-
Create UFM group:
groupadd ufmadm -g 733
-
Create a UFM user:
useradd -d /opt/ufm -m -u 733 -g ufmadm ufmadm
-
Set directory ownership:
chown -R ufmadm:ufmadm /opt/ufm
chown -R ufmadm:ufmadm /opt/ufm/shared_files
-
Configure SubUID and SubGID:
echo "ufmadm:100000:65536" >> /etc/subuid
echo "ufmadm:100000:65536" >> /etc/subgid
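Appending with `>>` adds a duplicate entry every time this step is re-run. Below is a sketch of an idempotent alternative. It writes the same `ufmadm:100000:65536` range only when the file has no entry for the user yet. `add_subid_range` is a hypothetical helper and must be run as root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "USER:START:COUNT" to a subordinate ID file
# (/etc/subuid or /etc/subgid) only if USER has no entry there yet.
add_subid_range() {
    local file="$1" user="$2" start="$3" count="$4"
    if grep -q "^${user}:" "$file" 2>/dev/null; then
        echo "$file already has an entry for $user; leaving it unchanged"
    else
        echo "${user}:${start}:${count}" >> "$file"
    fi
}

# Usage (as root):
# add_subid_range /etc/subuid ufmadm 100000 65536
# add_subid_range /etc/subgid ufmadm 100000 65536
```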
-
Enable login linger for the UFM user:
loginctl enable-linger ufmadm
-
Configure rootless Podman storage:
sudo -u ufmadm mkdir -p /opt/ufm/.config/containers
cat <<EOF | sudo -u ufmadm tee /opt/ufm/.config/containers/storage.conf > /dev/null
[storage]
driver = "overlay"
runroot = "/run/user/733"
EOF
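The runroot value /run/user/733 matches the runtime directory of ufmadm only because the user was created with UID 733. systemd-logind names each user's runtime directory /run/user/<UID>. If the UID differs on a node, you must adjust storage.conf accordingly. Below is a minimal check, assuming GNU grep. `runroot_matches_uid` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the storage.conf at CONF sets
# runroot = "/run/user/<UID of USER>".
runroot_matches_uid() {
    local conf="$1" user="$2" uid
    uid=$(id -u "$user") || return 1
    grep -qE "^[[:space:]]*runroot[[:space:]]*=[[:space:]]*\"/run/user/${uid}\"" "$conf"
}

# Usage:
# runroot_matches_uid /opt/ufm/.config/containers/storage.conf ufmadm || echo "runroot does not match the ufmadm UID" >&2
```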
-
Create Podman UFM socket:
cat <<EOF > /usr/lib/systemd/system/podman-ufm.socket
[Unit]
Description=Podman API Socket For Nvidia UFM
[Socket]
SocketUser=ufmadm
SocketGroup=ufmadm
ListenStream=%t/podman-ufm/podman-ufm.sock
SocketMode=0660
[Install]
WantedBy=sockets.target
EOF
-
Create Podman UFM service:
cat <<EOF > /usr/lib/systemd/system/podman-ufm.service
[Unit]
Description=Podman API Service for Nvidia UFM
Requires=podman-ufm.socket
After=podman-ufm.socket
StartLimitIntervalSec=0
[Service]
Delegate=true
Type=exec
User=ufmadm
Group=ufmadm
KillMode=process
Environment=LOGGING="--log-level=info"
ExecStart=/usr/bin/podman \$LOGGING system service
LimitMEMLOCK=infinity
[Install]
WantedBy=default.target
EOF
-
Create Podman cleanup service:
cat <<EOF > /usr/lib/systemd/system/podman-ufm-cleanup.service
[Unit]
Description=podman-ufm-cleanup - clean stuck rootless containers at boot
After=podman-ufm.service
Before=ufm-enterprise.service
[Service]
Type=oneshot
User=ufmadm
Group=ufmadm
ExecStart=/usr/bin/podman system migrate
[Install]
WantedBy=multi-user.target
EOF
-
Enable and start Podman services:
systemctl daemon-reload
systemctl enable --now podman-ufm.socket
systemctl enable --now podman-ufm.service
systemctl enable --now podman-ufm-cleanup.service
-
Create udev rules for InfiniBand devices:
cat <<EOF > /etc/udev/rules.d/70-umad.rules
KERNEL=="umad*", SUBSYSTEM=="infiniband_mad", MODE="0600", OWNER="ufmadm", GROUP="ufmadm"
KERNEL=="issm*", SUBSYSTEM=="infiniband_mad", MODE="0600", OWNER="ufmadm", GROUP="ufmadm"
EOF
udevadm control --reload-rules
udevadm trigger
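After `udevadm trigger` runs, the umad and issm device nodes should belong to ufmadm. Below is a minimal check. It assumes the devices appear under /dev/infiniband, their usual location, and that GNU coreutils `stat` is available. `check_dev_owner` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every existing device whose owner is not USER
# and return 1 if any were found. Unmatched glob patterns are skipped.
check_dev_owner() {
    local user="$1" dev bad=0
    shift
    for dev in "$@"; do
        [ -e "$dev" ] || continue
        if [ "$(stat -c '%U' "$dev")" != "$user" ]; then
            echo "wrong owner: $dev"
            bad=1
        fi
    done
    return "$bad"
}

# Usage:
# check_dev_owner ufmadm /dev/infiniband/umad* /dev/infiniband/issm*
```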
-
Clean and create UFM directories:
rm -rf /opt/ufm/systemd
sudo -u ufmadm mkdir -p /opt/ufm/ufm_plugins_data
sudo -u ufmadm mkdir -p /opt/ufm/systemd
sudo -u ufmadm mkdir -p /opt/ufm/etc/apache2
-
Load UFM image and extract version:
# UFM_IMAGE_FILE must hold the path to the UFM image file extracted from the bundle
# Extract the UFM version from the file name (e.g., ufm_6.22.0-7.ubuntu24.x86_64-docker.img.gz -> 6_22_0_7)
UFM_VERSION=$(basename "$UFM_IMAGE_FILE" | sed 's/ufm_\([0-9][^.]*\.[^.]*\.[^.]*-[^.]*\)\.ubuntu.*/\1/' | tr '.-' '_')
echo "UFM Version: $UFM_VERSION"
# Load the UFM image
sudo -u ufmadm podman load -i "$UFM_IMAGE_FILE"
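If the image file name does not match the expected pattern, sed leaves it unchanged. UFM_VERSION then silently becomes a mangled file name, which propagates into the shared directory name. The sketch below applies the same transformation but fails instead. `ufm_version_from_file` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the UFM version (e.g. 6_22_0_7) from an image
# file name with the same sed/tr transformation, and fail on a malformed result.
ufm_version_from_file() {
    local version
    version=$(basename "$1" | sed 's/ufm_\([0-9][^.]*\.[^.]*\.[^.]*-[^.]*\)\.ubuntu.*/\1/' | tr '.-' '__')
    # Accept only digit groups separated by underscores; anything else means sed did not match
    if printf '%s\n' "$version" | grep -Eqx '[0-9]+(_[0-9]+)+'; then
        printf '%s\n' "$version"
    else
        echo "Unexpected UFM image file name: $1" >&2
        return 1
    fi
}

# Usage:
# UFM_VERSION=$(ufm_version_from_file "$UFM_IMAGE_FILE") || exit 1
```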
-
Create version-specific directory and soft link:
# Create the version-specific directory in shared storage
sudo -u ufmadm mkdir -p /opt/ufm/shared_files/ufm-${UFM_VERSION}
# Remove the existing files link, if it exists
rm -f /opt/ufm/files
# Create a soft link to the version-specific directory
sudo -u ufmadm ln -s /opt/ufm/shared_files/ufm-${UFM_VERSION} /opt/ufm/files
# Verify the soft link
ls -la /opt/ufm/files
-
Run UFM installer:
sudo -u ufmadm podman run -it --rm --name=ufm_installer \
  -v /run/podman-ufm/podman-ufm.sock:/var/run/docker.sock \
  -v /opt/ufm/:/installation/ufm_files/ \
  -v /opt/ufm/files:/installation/ufm_files/files \
  -v /opt/ufm/systemd:/etc/systemd_files/ \
  mellanox/ufm-enterprise:latest \
  --install \
  --fabric-interface ib0 \
  --rootless \
  --plugin-path /opt/ufm/ufm_plugins_data \
  --ufm-user ufmadm \
  --ufm-group ufmadm \
  --ufm-infra
Note: Replace `ib0` with your InfiniBand interface name if it is not ib0.
Note: All other UFM install flags are supported and can be added to the command.
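The installer mounts /opt/ufm/files into its container. Before running it, check that this path is the soft link created in the "Create version-specific directory and soft link" step and not a plain directory. If /opt/ufm/files already existed as a directory, `rm -f` would not remove it, and `ln -s` would create the link inside it instead. Below is a minimal check. `link_points_to` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if LINK is a symbolic link whose target is exactly EXPECTED.
link_points_to() {
    local link="$1" expected="$2"
    [ -L "$link" ] && [ "$(readlink "$link")" = "$expected" ]
}

# Usage:
# link_points_to /opt/ufm/files "/opt/ufm/shared_files/ufm-${UFM_VERSION}" || echo "/opt/ufm/files is not the expected soft link" >&2
```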
-
Load the Redis image (if not using external Redis):
sudo -u ufmadm podman load -i "<PATH TO GIVEN REDIS IMAGE>"
-
Load Fast API Plugin image:
sudo -u ufmadm podman run --hostname $HOSTNAME --rm --name=ufm_plugin_mgmt --entrypoint="" \
  -v /run/podman-ufm/podman-ufm.sock:/var/run/docker.sock \
  -v /opt/ufm/files:/opt/ufm/shared_config_files \
  -v /dev/log:/dev/log \
  -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
  -v /lib/modules:/lib/modules:ro \
  -v /opt/ufm/ufm_plugins_data:/opt/ufm/ufm_plugins_data \
  -e UFM_CONTEXT=ufm-infra \
  mellanox/ufm-enterprise:latest \
  /opt/ufm/scripts/manage_ufm_plugins.sh add -p fast_api -t ${FAST_API_VERSION} -c ufm-infra
Note: Set FAST_API_VERSION to the version of the FAST-API plugin image included in the bundle before running this command.
-
Install service files:
mv /opt/ufm/systemd/ufm-enterprise.service /etc/systemd/system/ufm-enterprise.service
mv /opt/ufm/systemd/ufm-infra.service /etc/systemd/system/ufm-infra.service
systemctl daemon-reload
To start UFM as a standalone instance, run:
systemctl daemon-reload
systemctl start ufm-infra
systemctl start ufm-enterprise
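To confirm from a script that both services started, poll their state with `systemctl is-active`. Below is a minimal sketch. `wait_for_active` is a hypothetical helper, and the 120-second default timeout is an arbitrary assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll until UNIT reports active, or give up after
# TIMEOUT seconds (default 120, an arbitrary choice).
wait_for_active() {
    local unit="$1" timeout="${2:-120}" waited=0
    until systemctl is-active --quiet "$unit"; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "$unit did not become active within ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "$unit is active"
}

# Usage:
# wait_for_active ufm-infra && wait_for_active ufm-enterprise
```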
Running in HA Mode
Do not manually start any services.
-
Ensure UFM and UFM-HA are installed on all nodes as described in the above sections.
-
Mount the shared directory at /opt/ufm/shared_files on all standby nodes, as described in point #2 (Mount the UFM directory).
-
On one node, edit the HA configuration file:
/etc/ufm_ha/ha_nodes.cfg
Fill in the parameters for each node:
[Node.1]
# valid role options: master/standby
role = master
# Mandatory
primary_ip =
# Mandatory if dual_link = true
secondary_ip =
[Node.2]
role = standby
primary_ip =
secondary_ip =
[Node.3]
role = standby
primary_ip =
secondary_ip =
-
Ensure the file sync mode is set to external-storage and that the shared file system is mounted before HA configuration:
[FileSync]
# valid options are: drbd/external-storage
# in case of external-storage the user MUST mount the file system PRIOR to ha configuration
mode = external-storage
-
Copy the edited file to all nodes at the same path.
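A malformed ha_nodes.cfg is easier to catch before cluster configuration than after it. Below is a minimal sketch of a pre-flight check that assumes the `key = value` layout shown above. `check_ha_nodes_cfg` is a hypothetical helper. Its rules are inferred from the file's comments and the steps above, not taken from UFM-HA itself:

- exactly one master
- a primary_ip for every node
- mode = external-storage

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for ha_nodes.cfg. Assumed rules:
#   - exactly one [Node.N] section has role = master
#   - every [Node.N] section has a non-empty primary_ip
#   - the [FileSync] section sets mode = external-storage
# Prints one line per problem and returns non-zero if any rule fails.
check_ha_nodes_cfg() {
    awk '
        /^[ \t]*#/ { next }                      # skip comment lines
        /^[ \t]*\[/ {                            # section header, e.g. [Node.1]
            section = $0
            gsub(/[ \t]/, "", section)
            gsub(/\[|\]/, "", section)
            if (section ~ /^Node\./) nodes[section] = 1
            next
        }
        index($0, "=") > 0 {                     # key = value line
            eq = index($0, "=")
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            sub(/#.*/, "", val)                  # drop inline comments
            gsub(/[ \t]/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (section ~ /^Node\./ && key == "role" && val == "master") masters++
            if (section ~ /^Node\./ && key == "primary_ip" && val != "") has_ip[section] = 1
            if (section == "FileSync" && key == "mode") mode = val
        }
        END {
            bad = 0
            if (masters != 1) { print "expected exactly one master, found " masters + 0; bad = 1 }
            for (n in nodes) if (!(n in has_ip)) { print n ": primary_ip is empty"; bad = 1 }
            if (mode != "external-storage") { print "[FileSync] mode is \"" mode "\", expected external-storage"; bad = 1 }
            exit bad
        }
    ' "$1"
}

# Usage:
# check_ha_nodes_cfg /etc/ufm_ha/ha_nodes.cfg
```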
-
Configure the cluster, starting from standby nodes and ending with the master node:
ufm_ha_cluster config -p <password>
Use the same password on all nodes.
-
After finishing the configuration on all nodes, run:
ufm_ha_cluster status
-
Start the cluster:
ufm_ha_cluster start
-
Check cluster status again to ensure all services have started successfully.