NVIDIA UFM High-Availability User Guide

Installation and Configuration

Installation

The UFM HA package can be downloaded by running the following command:

wget http://www.mellanox.com/downloads/UFM/ufm_ha_6.2.1-5.tgz


Install the UFM HA package and the required UFM products on both machines (master and standby). The installation order does not matter.

To install the UFM-HA package:

  • Untar the ufm-ha package: 

    tar xvzf ufm-ha-<version>.tgz
    


  • Go to the extracted directory and run the installation script. For example: 

    ./install.sh -l /opt/ufm/files/ -d /dev/sda5 -p enterprise
    

    For NFS support, run the installation script without the -d option. For example:

    ./install.sh -l /opt/ufm/files/ -p enterprise 
    


    Option    Description
    -l        Sync files location. Must always be /opt/ufm/files/
    -d        Disk name for DRBD, for example /dev/sda5. Not needed when using NFS.
    -p        Product name. Use "enterprise" for UFM Enterprise.
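
The two invocations above differ only in the -d option. The following is a minimal sketch of a wrapper that assembles the command line for either case. The function name build_install_cmd and the mode names drbd and nfs are hypothetical and not part of the UFM HA package; the function only prints the command and does not run it.

```shell
# Hypothetical helper: print the install.sh command line for a given sync mode.
# Only the -l, -d, and -p options documented above are used.
build_install_cmd() {
    local mode="$1" disk="${2:-}"
    case "$mode" in
        drbd)
            # DRBD needs a disk name, for example /dev/sda5
            if [ -z "$disk" ]; then
                echo "error: drbd mode needs a disk name" >&2
                return 1
            fi
            echo "./install.sh -l /opt/ufm/files/ -d $disk -p enterprise"
            ;;
        nfs)
            # NFS does not use the -d option
            echo "./install.sh -l /opt/ufm/files/ -p enterprise"
            ;;
        *)
            echo "error: unknown mode '$mode' (expected drbd or nfs)" >&2
            return 1
            ;;
    esac
}
```

For example, build_install_cmd drbd /dev/sda5 prints the first command shown above, and build_install_cmd nfs prints the second.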


To upgrade an existing ufm_ha installation to a newer version, run the following command: 

./install.sh -u



UFM HA scripts are installed under /usr/bin.

Configuration

There are two methods to configure the HA cluster, depending on how the configuration procedure is orchestrated:

Manual (per-node) configuration
The user runs the configuration steps separately on each node, starting with the standby node.
This method does not require SSH trust between the nodes.

Orchestrated (cluster-wide) configuration
The ufm_ha_cluster procedure runs once on the master node and orchestrates the configuration remotely across all nodes.
This method requires passwordless SSH trust between the nodes.

Configure HA in Manual (Per-Node) Mode

Use this method when you prefer to control the configuration process on each node individually, or when SSH trust cannot be established between the HA servers.

In this mode, the user is responsible for running the configuration commands separately on each node using ufm_ha_cluster config.

You can view all available configuration options in the Help menu:

ufm_ha_cluster config -h  

Usage: 

ufm_ha_cluster config [<options>] 


To configure the HA cluster in per-node mode, you must configure all standby nodes first and the master node last.

This order is required to ensure the cluster is correctly created on the master node; if the master is configured first, synchronization with standby nodes may fail.


Example for a 2-node cluster:

On the first node (Standby):

ufm_ha_cluster config -r standby \
                      -e <peer primary IP> \
                      -l <local primary IP> \
                      -E <peer secondary IP> \
                      -L <local secondary IP> \
                      -p <cluster_password> [...]

On the second node (Master): 

ufm_ha_cluster config -r master \
                      -e <peer primary IP> \
                      -l <local primary IP> \
                      -E <peer secondary IP> \
                      -L <local secondary IP> \
                      -p <cluster_password> -i <virtual-ip> [...]
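
The required order (all standby nodes first, master last) can be sketched as a small helper that sorts a node list before the per-node commands are run. The input format (one "role address" pair per line) and the function name config_order are assumptions made for illustration; they are not part of ufm_ha_cluster.

```shell
# Hypothetical helper: read "role address" lines on stdin and print them
# in the order the nodes should be configured: standby nodes, then master.
config_order() {
    local input
    input="$(cat)"
    # Standby nodes first, in their original order
    printf '%s\n' "$input" | grep '^standby ' || true
    # Master node last
    printf '%s\n' "$input" | grep '^master ' || true
}
```

Each printed line then identifies the node on which to run the matching ufm_ha_cluster config command next.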


All available configuration options for the ufm_ha_cluster config command are listed in the table below:

Option                                  Description
-r, --role <node role>                  Node role (master or standby)
-e, --peer-primary-ip <ip address>      Peer node primary IP address (mandatory)
-l, --local-primary-ip <ip address>     Local node primary IP address (mandatory)
-E, --peer-secondary-ip <ip address>    Peer node secondary IP address (mandatory)
-L, --local-secondary-ip <ip address>   Local node secondary IP address (mandatory)
-i, --virtual-ip <virtual-ip>           Cluster virtual IP (auto-detects IPv4/IPv6)
    --virtual-ip4 <virtual-ip>          IPv4 virtual IP (deprecated; use --virtual-ip)
    --virtual-ip6 <virtual-ip>          IPv6 virtual IP (deprecated; use --virtual-ip)
    --vip-interface <interface>         Network interface for VIP (e.g., lo for BGP)
-N, --no-vip                            Configure HA without virtual IP
-M, --ignore-mgmt-failure               Ignore management interface status if VIP is configured. Will not fail over if the master node's secondary IP is down.
    --file-sync-mode <mode>             File sync mode: drbd or external-storage
    --drbd-data-mode <mode>             DRBD data mode: ordered or journal
-D, --drbd-dual-primary                 Enable DRBD dual-primary mode for UFM active-active (default: single-primary)
    --enable-multinode                  Add UFM Infra services (ufm-redis-mgr, ufm-infra). Use this option for UFM Infra setup.
    --enable-single-link                Enable single network interface mode
    --ha-params-file                    Path to ha_nodes.cfg configuration file (default: /etc/ufm_ha/ha_nodes.cfg)
-p, --hacluster-pwd <pwd>               HA cluster user password. Must be at least 8 characters long.
    --configure-all-nodes               Configure all cluster nodes (via SSH) before configuring master. Run only from master node. Requires SSH trust.
-h, --help                              Show this message
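
Because the HA cluster password must be at least 8 characters long, it can be worth checking the length before invoking the configuration command. The following is a minimal sketch; the function name valid_hacluster_pwd is hypothetical, and it checks only the length requirement stated above, not any other rule the tool may enforce.

```shell
# Hypothetical pre-check: succeed only if the password has at least 8 characters.
valid_hacluster_pwd() {
    local password="$1"
    [ "${#password}" -ge 8 ]
}
```

A script could call valid_hacluster_pwd "$PASSWORD" and proceed to ufm_ha_cluster config only when it succeeds.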


Modify the configuration command options to match your specific setup (network, storage, etc.).


For PCS version 0.9.X, when the HA sync interface uses back-to-back ports with local IP addresses, add the relevant IP addresses and hostnames to the /etc/hosts file. This allows the HA configuration to resolve each hostname to the specific IP address in use.
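
As a rough illustration of the /etc/hosts requirement, the sketch below checks whether a hosts file maps a given hostname to a given IP address. The function name hosts_has_entry is hypothetical, and the hostnames and addresses you pass to it depend on your own setup.

```shell
# Hypothetical check: does a hosts file map the given hostname to the given IP?
# Usage: hosts_has_entry <ip> <hostname> [file]   (file defaults to /etc/hosts)
hosts_has_entry() {
    local ip="$1" name="$2" file="${3:-/etc/hosts}"
    awk -v ip="$ip" -v name="$name" '
        /^[[:space:]]*#/ { next }            # skip comment lines
        $1 == ip {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break          # stop at an inline comment
                if ($i == name) found = 1
            }
        }
        END { exit found ? 0 : 1 }
    ' "$file"
}
```

Running the check on each node for both the local and the peer entry helps confirm that the names resolve to the back-to-back addresses before the HA configuration is started.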


When configuring UFM HA on Oracle Linux, make sure SELinux is disabled. You can check the SELinux status with the sestatus command.
If SELinux is enabled, follow the steps below to disable it:

  • Run vi /etc/selinux/config 

  • Set SELINUX=disabled 

  • Reboot the machine 

  • Verify that SELinux is disabled by running sestatus.
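
The manual edit in the steps above can also be scripted. The following is a minimal sketch that assumes GNU sed (for the -i option); the function name disable_selinux_in_config is hypothetical. It changes only the configuration file, so the reboot and the sestatus verification are still required.

```shell
# Sketch: set SELINUX=disabled in an SELinux config file.
# Usage: disable_selinux_in_config [file]   (file defaults to /etc/selinux/config)
disable_selinux_in_config() {
    local file="${1:-/etc/selinux/config}"
    if grep -q '^SELINUX=' "$file"; then
        # Replace the existing setting; SELINUXTYPE= lines are not matched
        sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$file"
    else
        # No setting present: append one
        echo 'SELINUX=disabled' >> "$file"
    fi
}
```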

Configure HA in Orchestrated (Cluster-Wide) Mode

Use this method when SSH trust is established between the cluster nodes.

In this mode, a single ufm_ha_cluster config command is executed on the master node, and the tool orchestrates the configuration process remotely across all cluster nodes.

To configure all nodes, add the --configure-all-nodes option to the ufm_ha_cluster config command, as shown in the example below.

ufm_ha_cluster config -r master --configure-all-nodes <...>


The --configure-all-nodes option requires an SSH connection to the standby server. If SSH trust is not configured, you are prompted to enter the SSH password of the standby server during configuration.
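
To find out in advance whether SSH trust is in place, a passwordless login can be tested from the master node. The following is a minimal sketch; the function name has_ssh_trust is hypothetical, while BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
# Hypothetical pre-check: succeed only if passwordless SSH to the host works.
# BatchMode=yes makes ssh fail instead of prompting for a password.
has_ssh_trust() {
    local host="$1"
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null
}
```

If the check fails for the standby server, expect the password prompt described above when running with --configure-all-nodes.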


DRBD Configuration  

DRBD is used to sync the file system between the two nodes. It is the default sync method unless the configuration file states otherwise (see "Using File Configuration" below).

The DRBD disk that was assigned during the installation phase is mounted as a file system directory. The default mount option is "data=ordered". It can be overridden in the "DRBD" section of the configuration file by setting the data option to "journal", which offers the highest level of data integrity but can impact write performance.

NFS File Sharing

The NFS synchronization mechanism can be used instead of DRBD. Multi-Nodes Support (described below) can be used only with the NFS synchronization mechanism. To activate NFS file sharing, define the following parameters:

  • Mode: NFS

  • NFS Server

  • Shared Folder

Ensure that the NFS server supports NFSv4. It is recommended that the NFS server is not one of the UFM-HA nodes. 

Refer to the Using File Configuration section below for details on configuring HA with NFS. 

HA Configuration for UFM Infra (Active-Active)

UFM-HA active-active mode manages additional services required for UFM active-active deployments (UFM-Infra) and provides additional storage options for this setup.

Multinode Option

The --enable-multinode option enables UFM infra services (ufm-redis-mgr, ufm-infra).


ufm_ha_cluster config ... --enable-multinode

Or set in /etc/ufm_ha/ha_nodes.cfg:

[General]
enable_multinode = true


Storage Options

Option                    Use Case
External Storage (NFS)    External NFS server available
DRBD Dual-Primary         No external storage server required (Ubuntu 24.04+)


External Storage (NFS)

CLI:

ufm_ha_cluster config --role <standby|master> \
                      -l <local-ip> -e <peer-ip> \
                      --file-sync-mode external-storage \
                      --enable-multinode

Or set in /etc/ufm_ha/ha_nodes.cfg:

[FileSync]
mode = external-storage

Note: The NFS mount must be configured separately on both nodes before running the HA configuration.


DRBD Dual-Primary

Prerequisites: Ubuntu 24.04+, ocfs2-tools, port 7777 open.

CLI:

ufm_ha_cluster config --role <standby|master> \
                      -l <local-ip> -e <peer-ip> \
                      --drbd-dual-primary \
                      --enable-multinode

Multi-Nodes Support

The UFM-HA cluster can comprise more than two nodes. One of these nodes serves as the master, while the others operate in standby mode.

To configure multiple nodes, users must populate the configuration file '/etc/ufm_ha/ha_nodes.cfg' on all nodes (ensuring that the file is identical across all nodes).

This file contains details about each participating node, including:

  • Role: Master/Standby

  • Primary IP address

  • Secondary IP address
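
Since the configuration file must be identical on all nodes, comparing checksums is one way to verify this. The following is a minimal sketch that compares two local copies of the file; the function name same_config is hypothetical, and how the peer's copy is fetched for comparison (for example, with scp) is left as an assumption.

```shell
# Hypothetical check: succeed if two copies of ha_nodes.cfg have the same
# SHA-256 checksum. Usage: same_config <file1> <file2>
same_config() {
    local sum_a sum_b
    sum_a="$(sha256sum < "$1" | awk '{print $1}')"
    sum_b="$(sha256sum < "$2" | awk '{print $1}')"
    [ "$sum_a" = "$sum_b" ]
}
```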

Using File Configuration

The '/etc/ufm_ha/ha_nodes.cfg' file contains all the necessary information for HA configuration and can serve as a replacement for command-line configuration.

For security reasons, the password is the only setting that is not saved in the file.

After filling in the configuration file, run the following command: 

ufm_ha_cluster config -p <password>


Configure the standby nodes first and the master node last.

Configuration File

The sample configuration file includes up to three sections for nodes, but users can add additional sections as needed. 

[General]
# Connection mode 
# in case dual_link is true, each node must have primary and secondary IPs
dual_link = true

# enable ufm-infra add-on; default is false
enable_multinode = false

# automatic failure cleanup interval (in hours)
# will perform cleanup of failures to enable automatic failover
automatic_failure_cleanup_interval = 24

[Node.1]
role = master
primary_ip =
secondary_ip =

[Node.2]
role = standby
primary_ip =
secondary_ip =

# Add other Node.x sections if needed.

[Virtual]
# If virtual IP should not be added, set `no_vip = true`
no_vip =

virtual_ip =

ignore_mgmt_failure = false
# when using BGP virtual IP, you must use the loopback interface, set `interface = lo`
# in other cases we let the pcs to decide on the relevant network interface.
interface =

[FileSync]
# valid options are: drbd/external-storage
# in case of external-storage the user MUST mount the files system PRIOR to ha configuration
mode = drbd

[DRBD]
# fill in case the FileSync.mode is drbd
# drbd data mode. options are: ordered/journal (default is ordered)
# data=journal offers the highest level of data integrity,
# but it can impact write performance.
# primary_mode = single/dual (default is single)
data = ordered
primary_mode = single
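
Because a cluster has one master and the remaining nodes are standbys, a quick sanity check of the file is to count the nodes whose role is master. The following is a minimal sketch; the function name count_masters is hypothetical, and it relies only on the "role = master" line format shown in the sample file above.

```shell
# Hypothetical sanity check: print how many nodes have "role = master".
# Usage: count_masters [file]   (file defaults to /etc/ufm_ha/ha_nodes.cfg)
count_masters() {
    local file="${1:-/etc/ufm_ha/ha_nodes.cfg}"
    # Count lines of the form "role = master"; commented-out lines do not match
    grep -Ec '^[[:space:]]*role[[:space:]]*=[[:space:]]*master[[:space:]]*$' "$file"
}
```

A result other than 1 suggests that the node sections need to be reviewed before running ufm_ha_cluster config.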

UFM HA Cluster Operations

Show UFM HA version

Run the following command to show UFM HA version: 

ufm_ha_cluster version

Starting UFM HA Cluster

Before starting the UFM cluster, ensure that the DRBD sync is completed.

To start UFM HA cluster:

 ufm_ha_cluster start
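
The precondition above (DRBD sync completed before starting the cluster) can be checked by inspecting the DRBD status text. The following is a rough sketch under the assumption that an unfinished sync shows up in the status output as an Inconsistent disk state or a SyncSource/SyncTarget replication state; the function name drbd_sync_done is hypothetical, and the exact output depends on the installed DRBD version.

```shell
# Hypothetical check: read DRBD status text on stdin and succeed only if
# no line reports an Inconsistent disk or an ongoing resync.
drbd_sync_done() {
    ! grep -Eq 'Inconsistent|SyncSource|SyncTarget'
}
```

The status text could be piped in from drbdadm status or from /proc/drbd, depending on what the installed DRBD version provides.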

Checking UFM Cluster Status

To check UFM HA cluster status: 

ufm_ha_cluster status 

Stopping UFM HA Cluster

To stop UFM HA cluster: 

ufm_ha_cluster stop 

Takeover Services

Run the takeover command on the standby machine to make it the master.

ufm_ha_cluster takeover

Master Failover

Run the failover command on the master machine to make it the standby.

ufm_ha_cluster failover

Automatic cleanup of failed actions

When an action fails on one of the HA nodes (for example, a DRBD failure, a service failure, or any other HA resource failure), the failed node is no longer a candidate for automatic failover until the failed actions are cleaned up. To clean up failed actions manually, run the following command:

pcs resource cleanup

UFM-HA automatically cleans up failed actions every 24 hours. To change this interval, set automatic_failure_cleanup_interval in the General section of the ha_nodes.cfg configuration file. See section "Configuration File" above.

Replacing the Standby Node

  • Install the HA package on the new standby node.

  • Disconnect the old standby node and run the following command on the master node:

    ufm_ha_cluster detach
    


  • Configure the new standby node; refer to the Configuration section under Installation and Configuration.

  • Connect the new standby to the cluster by running the command on the master node:

    ufm_ha_cluster attach -l <local primary ip address> -e <peer primary ip address> -E <peer secondary ip address> -p <cluster_password> 
    


Uninstalling UFM HA 

To uninstall UFM HA, first stop the cluster and then run the uninstallation command as follows:  

/opt/ufm/ufm_ha/uninstall_ha.sh 

