Plugin Release Notes
Changes and New Features
| Plugin Version | Feature |
|---|---|
| 1.25.1-3 | N/A |
Bug Fixes
| Plugin Version | Bug Fix |
|---|---|
| 1.25.1-3 | N/A |
Overview
The UFM Telemetry Manager (UTM) plugin partitions IB-fabric monitoring across multiple UFM Telemetry Instances (TIs) for high-scale clusters. UTM assigns fabric ports to TIs deterministically using consistent hashing, optionally with redundancy, and manages their lifecycle: health monitoring, port assignment updates, and targeted restarts on fabric topology changes.
Key capabilities:
- Stable port distribution: each port is assigned to a specific TI by consistent hashing, so the port-to-TI mapping does not reshuffle on every TI restart.
- Configurable redundancy: a port can be monitored by multiple TIs simultaneously (port_redundancy_factor), so a TI failure causes zero monitoring gap on its ports.
- Targeted restart: when a topology change adds new ports, only the TIs that own the new ports are restarted; unaffected TIs keep collecting uninterrupted.
- TI failure handling: failed TIs are kept in the active assignment during a grace period to absorb transient failures; if the TI does not recover, its ports are redistributed across the surviving TIs.
UTM runs two telemetry groups by default: primary (high-frequency port counters) and secondary (broader counter set, lower frequency). Each group independently covers 100% of the fabric. UFM controls how many instances are created in each group via primary_count and secondary_count in the [Telemetry] section of gv.cfg; see [UFM Clustered Telemetry].
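The port distribution, redundancy, and targeted-restart behavior described above can be illustrated with a small hash ring. This is a minimal sketch of generic consistent hashing with virtual nodes, not UTM's actual implementation: the class name `HashRing`, the helper `tis_to_restart`, the SHA-256 hash, and the virtual-node count are all assumptions made for illustration.

```python
import bisect
import hashlib


def _h(key: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative, not UTM's code)."""

    def __init__(self, tis, vnodes=64):
        # Each TI is placed on the ring `vnodes` times to even out the load.
        self._ring = sorted((_h(f"{ti}#{i}"), ti) for ti in tis for i in range(vnodes))
        self._keys = [k for k, _ in self._ring]
        self._ti_count = len(set(tis))

    def owners(self, port: str, redundancy: int = 1):
        """Return the distinct TIs that monitor `port`, clamped to the TI count."""
        want = max(1, min(redundancy, self._ti_count))
        start = bisect.bisect(self._keys, _h(port))
        found = []
        # Walk clockwise from the port's position, collecting distinct TIs.
        for step in range(len(self._ring)):
            ti = self._ring[(start + step) % len(self._ring)][1]
            if ti not in found:
                found.append(ti)
                if len(found) == want:
                    break
        return found


def tis_to_restart(ring, new_ports, redundancy=1):
    """TIs that own at least one newly added port; the rest keep running."""
    return {ti for port in new_ports for ti in ring.owners(port, redundancy)}
```

In this sketch, removing a TI from the ring only changes the owner of ports that TI was monitoring, which is the property that keeps the mapping stable across restarts and failures.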
Deployment
UTM is deployed as a UFM plugin. Two deployment paths are supported.
UFM Plugin Mode
The UTM plugin can be added either via the Command Line Interface or the Web UI.
CLI Deployment
To add the plugin:
/opt/ufm/scripts/manage_ufm_plugins.sh add -p utm
To remove the plugin:
/opt/ufm/scripts/manage_ufm_plugins.sh remove -p utm
Web-UI Deployment
- Navigate to the UFM Web UI and click Settings in the left panel.
- Open the Plugin Management tab.
- Right-click on the UTM plugin row and select Add.
- Open Telemetry Status in the left panel to access the UTM UI.
To stop the plugin: in Plugin Management, right-click the UTM row and select Disable.
Kubernetes Deployment
For deploying UTM in Kubernetes alongside UFM Enterprise, see the UFM Clustered Telemetry on Kubernetes section of [UFM Clustered Telemetry].
Configuration
The UTM configuration file utm_config.ini is located at /opt/ufm/files/conf/plugins/utm/utm_config.ini. UTM restarts its main process automatically when the file changes.
Key Tunables
The table below lists the settings operators most often tune. Settings not listed here ship with sensible defaults and should not normally need to change.
| Section | Key | Default | Description |
|---|---|---|---|
| | | | Number of TIs each port is monitored by. Set to ≥2 to eliminate the monitoring gap on TI failure. Values larger than the number of live TIs are clamped at runtime; invalid values (≤0 or non-numeric) fall back to |
| | | | Seconds between fabric snapshot fetches from UFM. |
| | | | When |
| | | | Log verbosity ( |
| | | | TI URLs. Servers under |
| | | | Set to |
| | | | Port ranges for the primary and secondary groups in HA mode. |
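The clamping and fallback rule for the redundancy factor (named port_redundancy_factor in the Overview) can be sketched as follows. This is an illustration of the rule as worded in the table, not UTM's code: the function name `effective_redundancy` is hypothetical, and the fallback value is a required argument because the shipped default is not stated here.

```python
def effective_redundancy(raw_value: str, live_tis: int, fallback: int) -> int:
    """Resolve the redundancy factor from its raw config string.

    `fallback` is deliberately required: the default UTM falls back to is
    not given in this section, so the caller must supply it (assumption).
    """
    try:
        value = int(raw_value.strip())
    except ValueError:
        return fallback          # non-numeric value
    if value <= 0:
        return fallback          # zero or negative value
    return min(value, live_tis)  # clamp to the number of live TIs
```

For example, with three live TIs a configured value of 8 resolves to 3, while "abc" or "0" resolves to whatever fallback the caller passes in.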
Authentication
UTM authenticates to the UFM REST API using either a token or a username and password. Token authentication is preferred where available; username/password is the fallback.
Token authentication (recommended):
Write the API token to a file and point ufm_token_file at it:
[ufm]
ufm_token_file = /config/ufm_token
If the file exists and is non-empty, UTM uses token auth automatically.
Username/password authentication (fallback):
For non-default UFM credentials:
[ufm]
ufm_user = <user>
ufm_pass = <password>
UTM falls back to username/password when no token file is configured.
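The selection rule above can be sketched in a few lines. This is a hedged illustration, not UTM's implementation: the function name `select_auth` and its return shape are hypothetical, and the sketch assumes that a configured but missing or empty token file also falls back to username/password.

```python
import configparser
from pathlib import Path


def select_auth(config_path: str):
    """Return ("token", token) or ("password", (user, password)).

    A token file that exists and is non-empty wins; otherwise the
    username/password pair from the [ufm] section is used.
    """
    # interpolation=None so a '%' in a password is read literally.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read(config_path)
    ufm = cfg["ufm"] if cfg.has_section("ufm") else {}

    token_file = ufm.get("ufm_token_file", "")
    if token_file:
        path = Path(token_file)
        if path.is_file():
            token = path.read_text().strip()
            if token:
                return ("token", token)

    # Assumption: when ufm_user / ufm_pass are absent, UTM uses its own
    # default credentials; this sketch returns None instead of guessing them.
    return ("password", (ufm.get("ufm_user"), ufm.get("ufm_pass")))
```

Calling `select_auth("/opt/ufm/files/conf/plugins/utm/utm_config.ini")` would then report which mode the current configuration selects under these assumptions.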
GUI
The Telemetry Status page is accessible from the UFM Web UI sidebar under Telemetry Status, or directly at http://<utm-host>:8888/files/index.html.
The page contains:
- Top pane: general info; controls to add a TI URL for monitoring; refresh-interval selector.
- Group panes: one panel per telemetry group, showing every TI in the group with status and counters.
- Bottom pane: system events with history navigation.
TI Status Fields
| Field | Description |
|---|---|
| URL | TI URL ( |
| Group | Telemetry group the TI belongs to (e.g. |
| Mode | |
| Status | |
| Uptime | TI uptime in human-readable format. |
| Collected ports | Ports successfully collected in the last sample (with |
| Configured ports | Ports configured to be sampled by this TI. |
| Enabled / Discovered ports | Enabled and discovered ports of the fabric (per UTM's view). |
| Iteration time | Total iteration time of the last data-collection cycle. |
TI Management Actions
Right-click a TI row to:
- Pause: pause a running TI; its ports are redistributed to other TIs in the group.
- Resume: resume a paused TI.
- Exclude: pause and remove the TI from its group (the TI itself stays on the host). Empty groups are removed automatically.
REST API
All GUI features (TI management, monitoring, configuration) are accessible via REST.
Accessing the API
In UFM plugin mode (proxied through UFM):
curl -k -u <user>:<pass> https://<UFM_HOST>/ufmRest/plugin/utm/<COMMAND>
Direct (e.g. K8s pod, port-forward):
curl http://<UTM_HOST>:8888/<COMMAND>
In plugin mode UTM listens on plain HTTP on port 8888; HTTPS termination is handled by UFM's proxy.
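For scripts that need to target both access modes, the two URL forms above can be generated by one helper. This is a small sketch; the function name `utm_url` is hypothetical, and it assumes UTM decodes percent-encoded query parameters in the standard way.

```python
from urllib.parse import urlencode


def utm_url(command: str, host: str, plugin_mode: bool, **params: str) -> str:
    """Build the URL for a UTM REST command in either access mode."""
    if plugin_mode:
        # Proxied through UFM, which terminates HTTPS.
        base = f"https://{host}/ufmRest/plugin/utm/{command}"
    else:
        # Direct access to UTM's plain-HTTP listener on port 8888.
        base = f"http://{host}:8888/{command}"
    return f"{base}?{urlencode(params)}" if params else base
```

The helper only builds the URL; in plugin mode the request still needs UFM credentials, as in the `curl -u` form above.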
Common Commands
The examples below use the direct form; substitute the proxied form for plugin mode.
# List all UTM endpoints
curl http://127.0.0.1:8888/help
# Status of monitored TIs
curl http://127.0.0.1:8888/status
# Add an externally-running TI to a monitoring group
curl 'http://127.0.0.1:8888/add_server?url=http://127.0.0.1:9001&group=primary'
# Pause / resume / remove a monitored TI
curl 'http://127.0.0.1:8888/pause_server?url=http://127.0.0.1:9001'
curl 'http://127.0.0.1:8888/start_server?url=http://127.0.0.1:9001'
curl 'http://127.0.0.1:8888/remove_server?url=http://127.0.0.1:9001'
# Spawn TIs by count, with automatic round-robin HCA allocation
curl -X POST 'http://127.0.0.1:8888/host/create_sessions?group=primary&count=2&sample_rate=30'
# Stop a TI by session id
curl 'http://127.0.0.1:8888/host/remove_telemetry?session_id=<id>'
POST /host/create_sessions response codes:
- 200: at least one session created
- 400: invalid params
- 409: group already has running instances
- 500: all failed
- 503: no HCA in Active+LinkUp state
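An automation script can turn the response codes listed above into readable outcomes. The sketch below only restates those codes; the names `CREATE_SESSIONS_OUTCOMES` and `interpret_create_sessions` are hypothetical, and the handling of unlisted codes is an assumption.

```python
# Status codes for POST /host/create_sessions, as listed in this section.
CREATE_SESSIONS_OUTCOMES = {
    200: "at least one session created",
    400: "invalid parameters",
    409: "group already has running instances",
    500: "all session creations failed",
    503: "no HCA in Active+LinkUp state",
}


def interpret_create_sessions(status: int):
    """Return (succeeded, message) for a POST /host/create_sessions status."""
    # Assumption: any code not listed above is reported as unexpected.
    message = CREATE_SESSIONS_OUTCOMES.get(status, f"unexpected status {status}")
    return status == 200, message
```

Because 200 only means that at least one session was created, a caller that requested several sessions may still want to confirm the resulting TIs through the status endpoint.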