NVIDIA UFM Enterprise Appliance Software User Manual

Appendix - UFM Clustered Telemetry

UFM Clustered Telemetry is an advanced feature that distributes telemetry data collection across multiple host channel adapters (HCAs) in your InfiniBand fabric. Spreading the collection workload across several telemetry instances improves performance and scalability in large-scale deployments.

Key Benefits

  • Better Performance: Workload distribution across multiple instances reduces collection bottlenecks

  • HCA Utilization: Leverages multiple network adapters for parallel data collection

  • Scalability: Handles larger fabric deployments more efficiently

  • Flexibility: Customizable instance distribution based on your infrastructure

Configuring Clustered Telemetry in Multi-Node Mode

To set up clustered telemetry in multi-node mode, run the following CLI commands in order, on the master node, the standby node, or both, as indicated in each step:

1. Configure HA (Active-Active) on Both Nodes

Run the appropriate command on each machine:

On the Standby (slave) Node:

# ufm ha configure standby 3.3.3.2 3.3.3.1 10.236.17.102 10.236.17.101 10.236.17.103 123456 multi-node

On the Master Node:

# ufm ha configure master 3.3.3.1 3.3.3.2 10.236.17.101 10.236.17.102 10.236.17.103 123456 multi-node
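The two commands above differ only in the role keyword and in the order of the addresses within the first and second address pairs; the fifth address, the numeric value, and the `multi-node` keyword are identical on both nodes. The Python sketch below rebuilds both lines from a single description so the values stay consistent. The field and parameter names (`first_ip`, `second_ip`, `shared_ip`, `secret`) are labels inferred from that pattern, not documented parameter names, so check the `ufm ha configure` command reference for the actual meaning of each positional argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Addresses of one appliance; field names are illustrative assumptions."""
    role: str        # "master" or "standby", as used by `ufm ha configure`
    first_ip: str    # this node's address in the first address pair
    second_ip: str   # this node's address in the second address pair


def ha_configure_command(node, peer, shared_ip, secret, mode="multi-node"):
    """Build the `ufm ha configure` line for `node`, mirroring the example above.

    The argument order is copied from the documented commands: own address
    then peer address for each pair, followed by the values common to both.
    """
    args = [
        "ufm", "ha", "configure", node.role,
        node.first_ip, peer.first_ip,
        node.second_ip, peer.second_ip,
        shared_ip, secret, mode,
    ]
    return " ".join(args)


# Values taken from the example commands in this step.
master = Node("master", "3.3.3.1", "10.236.17.101")
standby = Node("standby", "3.3.3.2", "10.236.17.102")

standby_cmd = ha_configure_command(standby, master, "10.236.17.103", "123456")
master_cmd = ha_configure_command(master, standby, "10.236.17.103", "123456")

print(standby_cmd)
print(master_cmd)
```

The script only prints the command lines; it does not run them on the appliance.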

2. Enable Infrastructure Mode (Run on Both Nodes)

# ufm infra-mode --enable

3. Enable Clustered Telemetry Mode (Master Node Only)

# ufm telemetry utm-mode --enable

4. Start UFM (Run on Both Nodes)

# ufm start
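As a compact summary of steps 1 to 4, the following Python sketch lists which commands each node runs and in what order. It only prints a checklist and executes nothing on the appliance. The `PLAN` table and the `commands_for` helper are hypothetical names used for illustration, and the `<...>` placeholder stands for the step 1 arguments.

```python
# Each entry: (step number, command, roles that run it).
# Commands are copied from the steps above; nothing here is executed.
PLAN = [
    (1, "ufm ha configure {role} <addresses and options from step 1>", {"master", "standby"}),
    (2, "ufm infra-mode --enable", {"master", "standby"}),
    (3, "ufm telemetry utm-mode --enable", {"master"}),
    (4, "ufm start", {"master", "standby"}),
]


def commands_for(role):
    """Return the ordered (step, command) pairs that apply to one node role."""
    if role not in {"master", "standby"}:
        raise ValueError(f"unknown role: {role!r}")
    return [(step, cmd.format(role=role)) for step, cmd, roles in PLAN if role in roles]


for role in ("standby", "master"):
    print(f"{role} node:")
    for step, cmd in commands_for(role):
        print(f"  step {step}: {cmd}")
```

The standby checklist skips step 3, which matches the "Master Node Only" note on that step.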

