NVIDIA UFM Enterprise User Manual

Events and Alarms

UFM offers comprehensive diagnostics for your InfiniBand fabric, covering a range of categories:

  1. Fabric configurations

  2. Fabric topology

  3. Hardware issues

  4. Communication errors

  5. Maintenance

  6. Security

  7. Switch module status

  8. NVIDIA SHARP notifications

Events are notifications generated by UFM that indicate issues in the InfiniBand fabric within the categories listed above. Alarms, by contrast, are urgent notifications derived from events; many events can be configured to raise alarms based on customer preferences.
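
The relationship between events and alarms can be illustrated with a minimal sketch. The names used here (`Event`, `ALARM_POLICY`, `derive_alarms`) and the sample event names are hypothetical, chosen for illustration only, and are not part of UFM. The sketch assumes nothing more than a per-event policy that decides whether an event also raises an alarm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str      # hypothetical event name
    category: str  # one of the diagnostic categories listed above
    source: str    # fabric object the event refers to


# Hypothetical policy: event name -> whether the event also raises an alarm.
ALARM_POLICY = {
    "Link is Down": True,
    "Node is Up": False,
}


def derive_alarms(events, policy=ALARM_POLICY):
    """Return the events that the policy marks as alarms."""
    # Events missing from the policy are treated as "no alarm".
    return [e for e in events if policy.get(e.name, False)]


events = [
    Event("Link is Down", "Fabric topology", "switch-1/port-3"),
    Event("Node is Up", "Fabric topology", "host-7"),
]
alarms = derive_alarms(events)
print([a.name for a in alarms])
```

In this sketch every alarm is backed by an event, but not every event becomes an alarm, which mirrors the distinction drawn above.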

These diagnostics run both before applications are launched and during standard operation. They help network administrators troubleshoot the fabric and alert them to potential network issues before they escalate.

Events can originate from various sources:

  • SM traps

  • SHARP AM traps

  • UFM internal analysis, encompassing:

      • Internal detection of topology changes

      • Internal fabric analysis (based on IBDiagnet)

      • Internal monitoring of managed switches

      • Maintenance activities (device action tracking, licensing, cable integrity)

  • Threshold-crossing events determined by telemetry counter readings
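
As a rough illustration of the last source, the sketch below shows one way a threshold-crossing check over counter readings could work. The function name `threshold_crossings`, the sample readings, and the rule that a crossing means rising from below the threshold to at or above it are all assumptions made for this example; they do not describe how UFM evaluates telemetry counters.

```python
def threshold_crossings(readings, threshold):
    """Yield (index, value) whenever a reading rises from below the
    threshold to at or above it."""
    previous = None
    for i, value in enumerate(readings):
        # The first reading has no predecessor, so it counts as a
        # crossing if it is already at or above the threshold.
        was_below = previous is None or previous < threshold
        if was_below and value >= threshold:
            yield i, value
        previous = value


# Hypothetical counter readings sampled at regular intervals.
samples = [0, 2, 5, 7, 3, 6]
crossings = list(threshold_crossings(samples, threshold=5))
print(crossings)
```

Only the transition is reported, so a counter that stays above the threshold for several samples produces a single event rather than one per sample.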


Events

  • WebUI: UFM events can be viewed via the Events and Alarms WebUI view. Refer to Events & Alarms.
    REST API: Events REST API

  • WebUI: For device-specific events, refer to the Events
    REST API: N/A

  • WebUI: Configuration of events is managed within the Events Policy tab in the Settings window.
    REST API: Events Policy REST API

Alarms

  • WebUI: UFM alarms can be viewed via the Events and Alarms WebUI view. Refer to Events & Alarms.
    REST API: Alarms REST API

  • WebUI: Configuration of alarms is managed within the Events Policy tab in the Settings window.
    REST API: N/A
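
For the REST API entries above, the following sketch builds, without sending, GET requests for events and alarms. The `/ufmRest/app/events` and `/ufmRest/app/alarms` paths, the use of HTTP basic authentication, the host name, and the credentials are assumptions made for this example and should be checked against the Events REST API and Alarms REST API references. The helper `build_ufm_request` is hypothetical.

```python
import base64
import urllib.request


def build_ufm_request(host, resource, username, password):
    """Build (but do not send) a GET request for a UFM REST resource."""
    # Assumed path layout; confirm it against the REST API reference.
    url = f"https://{host}/ufmRest/app/{resource}"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


# Placeholder host and credentials, for illustration only.
events_request = build_ufm_request("ufm.example.com", "events", "admin", "secret")
alarms_request = build_ufm_request("ufm.example.com", "alarms", "admin", "secret")
print(events_request.full_url)
print(alarms_request.full_url)
```

Passing a request to `urllib.request.urlopen` would send it; that step is left out here because it depends on the server's address, certificate setup, and credentials.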

For a list of all UFM-supported events, refer to Threshold-Crossing Events Reference.
