NVIDIA UFM Enterprise User Manual

System Backup and Restore

A full system backup creates a complete snapshot of your UFM system, including Docker images and configurations. It can be used for disaster recovery and for downgrading to the backed-up version.

The backup includes:

  • UFM Enterprise Docker image

  • All plugin Docker images

  • UFM configuration files

  • Plugin configurations

  • Install arguments

  • UFM version and plugin versions

Preserved During Restore (Not Replaced):

  • PKEY configurations (/opt/ufm/files/conf/opensm/partitions.conf)

  • Unhealthy ports configuration (/opt/ufm/files/conf/opensm/opensm-health-policy.conf)

  • Existing backups (/opt/ufm/files/backup/opt/ufm/backup)

Only a single full system backup is kept at any time; creating a new backup overwrites the existing one.
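
Because a new backup overwrites the previous one, it may be useful to keep a copy of the existing backup first. The following is a minimal sketch, not part of the UFM tooling: the function name archive_full_backup and the archive directory /mnt/external/ufm-backup-archive are hypothetical, and the source directory is the default backup location listed on this page.

```shell
#!/usr/bin/env bash
# Sketch only: keep a timestamped copy of the current full backup before
# creating a new one. The function name and archive location are hypothetical.

archive_full_backup() {
    local src="${1:-/opt/ufm/backup/downgrade}"               # default location from this page
    local dest_root="${2:-/mnt/external/ufm-backup-archive}"  # hypothetical archive directory
    local stamp dest

    # Nothing to archive if no backup exists yet.
    if [ ! -d "$src" ] || [ -z "$(ls -A "$src" 2>/dev/null)" ]; then
        echo "no existing backup in $src" >&2
        return 1
    fi

    stamp="$(date +%Y%m%d-%H%M%S)"
    dest="$dest_root/downgrade-$stamp"
    mkdir -p "$dest" || return 1

    # -a preserves permissions, ownership, and timestamps.
    cp -a "$src/." "$dest/" || return 1

    # Print the archive path so the caller can record it.
    echo "$dest"
}
```

A possible sequence is to call archive_full_backup and, if it succeeds, run ufm_versions_mgr backup. The archive must have enough free space for all Docker images in the backup.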

Storage Location Considerations

Standalone (SA):

  • Full backup stored at /opt/ufm/backup/downgrade/1/

  • Accessible to the local system only

High Availability (HA):

  • Full backup stored at /opt/ufm/backup/downgrade/1/ on the master node's local storage

  • NOT stored on DRBD shared storage (unlike configuration snapshots)

  • Important: If HA failover occurs, the new master will NOT have access to backups created on the old master

  • User Action Required: After failover, if you need to restore a backup created on the old master, you must manually copy /opt/ufm/backup/downgrade/ from the old master to the new master node
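
The manual copy after a failover could look like the sketch below. It assumes that rsync is installed on both nodes and that the new master can reach the old master over SSH; the host name passed to the function is a placeholder, and the function name build_backup_copy_cmd is hypothetical. The function only prints the command so that it can be reviewed before it is run.

```shell
#!/usr/bin/env bash
# Sketch only: build the command that pulls the full backup from the previous
# master after a failover. Run on the NEW master. The host name is a
# placeholder; SSH access to the old master is assumed.

BACKUP_DIR="/opt/ufm/backup/downgrade/"   # path from this page

# Print the rsync command instead of running it, so it can be reviewed first.
build_backup_copy_cmd() {
    local old_master="$1"
    if [ -z "$old_master" ]; then
        echo "usage: build_backup_copy_cmd <old-master-host>" >&2
        return 1
    fi
    # -a keeps permissions and timestamps; the trailing slashes copy the
    # directory contents into the same path on the local node.
    printf 'rsync -a %s:%s %s\n' "$old_master" "$BACKUP_DIR" "$BACKUP_DIR"
}
```

After running the printed command, ufm_versions_mgr backup --list on the new master can be used to confirm that the backup is visible.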

Create Full System Backup

During backup, the tool performs schema validation on some of the configuration files. If validation fails, the backup operation stops and reports an error.


Basic Backup

Create a full system backup. Run:

ufm_versions_mgr backup


Backup with Options

Create a backup with a label. Run: 

ufm_versions_mgr backup --label "Before production deployment"


Create a backup in a custom location. Run:

ufm_versions_mgr backup --backup-dir /mnt/external/ufm-backup


Create a backup with more workers for faster operation. Run:

ufm_versions_mgr backup --max-workers 10


Command Options

Option               Description

--label TEXT         Optional description for backup

--backup-dir PATH    Custom backup location (default: /opt/ufm/backup/downgrade/)

--max-workers N      Number of parallel workers (4-10, default: 8)

--list               List existing backup

--dry-run            Preview operation

--verbose            Enable detailed output
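
Since --max-workers accepts only values from 4 to 10, a wrapper script can check the value before the tool is called. The sketch below is illustrative: the function names valid_max_workers and backup_cmd_with_workers are hypothetical, and the range and default come from the table above.

```shell
#!/usr/bin/env bash
# Sketch only: validate a --max-workers value against the documented range
# (4-10, default 8) before building the backup command.

valid_max_workers() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$1" -ge 4 ] && [ "$1" -le 10 ]
}

# Print the backup command for the given worker count (default: 8).
backup_cmd_with_workers() {
    local workers="${1:-8}"
    if ! valid_max_workers "$workers"; then
        echo "max-workers must be between 4 and 10, got: $workers" >&2
        return 1
    fi
    echo "ufm_versions_mgr backup --max-workers $workers"
}
```

For example, backup_cmd_with_workers 10 prints the command shown earlier in this section, while backup_cmd_with_workers 2 returns an error without calling the tool.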

Backup List 

List the existing full system backup. Run:

ufm_versions_mgr backup --list


Preview Backup (Dry-Run)

Preview backup operation. Run: 

ufm_versions_mgr backup --dry-run


The UFM service continues to run during the backup, with no downtime.
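
The preview and the backup can be chained in a small script. The sketch below assumes that ufm_versions_mgr returns a non-zero exit status when the dry run detects a problem, which this page does not state. The function name backup_with_preview and the UFM_MGR override variable are hypothetical and exist only to make the sketch testable.

```shell
#!/usr/bin/env bash
# Sketch only: run the dry-run preview first, then the real backup.
# Assumption: a non-zero exit status from the dry run signals a problem.

backup_with_preview() {
    local label="$1"
    local mgr="${UFM_MGR:-ufm_versions_mgr}"   # UFM_MGR is a hypothetical override

    # Step 1: preview the operation.
    if ! "$mgr" backup --dry-run; then
        echo "dry run failed; backup not started" >&2
        return 1
    fi

    # Step 2: create the backup, with a label if one was given.
    if [ -n "$label" ]; then
        "$mgr" backup --label "$label"
    else
        "$mgr" backup
    fi
}
```

Calling backup_with_preview "Before production deployment" runs the two commands in order and stops after the preview if it fails.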


Restore Full System

Standalone (SA) Restore

  1. Preview restore. Run:

    ufm_versions_mgr restore --dry-run

  2. Restore full system. Run:

    ufm_versions_mgr restore


Restore Process:


  1. Validate backup integrity

  2. Save current PKEY and health policy configurations

  3. Stop UFM service

  4. Uninstall current UFM

  5. Load Docker images from backup

  6. Install UFM with saved install arguments

  7. Restore configurations

  8. Restore preserved files (PKEY, health policies)

  9. Start UFM service

The UFM service is stopped for the duration of the restore operation.
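
Because the restore stops the UFM service, a wrapper that previews the operation and asks for confirmation can help avoid an accidental outage. This is a minimal sketch under the same assumption as any script around this tool, namely that a failed dry run returns a non-zero exit status. The function name restore_with_confirmation and the UFM_MGR override variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: preview the restore, then ask for confirmation before the
# real restore, which stops the UFM service.

restore_with_confirmation() {
    local mgr="${UFM_MGR:-ufm_versions_mgr}"   # UFM_MGR is a hypothetical override
    local answer

    # Step 1: preview. Assumes a non-zero exit status signals a problem.
    if ! "$mgr" restore --dry-run; then
        echo "preview failed; restore not started" >&2
        return 1
    fi

    # Step 2: explicit confirmation, read from standard input.
    printf 'UFM will be stopped during the restore. Continue? [y/N] ' >&2
    read -r answer
    case "$answer" in
        y|Y|yes|YES) "$mgr" restore ;;
        *) echo "restore cancelled" >&2; return 2 ;;
    esac
}
```

Any answer other than y or yes cancels the operation, so the service keeps running unless the restore is confirmed explicitly.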

High Availability (HA) Restore

Restore the full system in an HA environment. Run on the master node:

ufm_versions_mgr restore


Restore Process

  1. Validate backup integrity

  2. Save current PKEY and health policy configurations

  3. Stop HA cluster

  4. Get standby node IP from HA configuration

  5. Uninstall UFM on master

  6. Load Docker images on master

  7. SSH to standby: Uninstall UFM and load images

  8. Install UFM on master with saved install arguments

  9. Install UFM on standby with saved install arguments

  10. Restore configurations

  11. Restore preserved files

  12. Start HA cluster

  • The restore must be executed from the master node.

  • SSH trust with the standby node is set up automatically.

  • The entire HA cluster is offline for the duration of the restore.

Restore from Custom Location

Restore from a custom backup directory. Run:

ufm_versions_mgr restore --backup-dir /mnt/external/ufm-backup


Preserved Files

During full system restore, the following files from the current system are preserved:


File / Directories                                      Reason

PKey Configurations                                     Represents current fabric partitioning.
/opt/ufm/files/conf/opensm/partitions.conf              Changing PKEYs can disrupt running workloads.

Unhealthy Ports Configuration                           Current health policies should survive
/opt/ufm/files/conf/opensm/opensm-health-policy.conf    version changes.

Existing Backups                                        Maintain ability to restore again.
/opt/ufm/files/backup/opt/ufm/backup
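
To confirm that the preserved configuration files were left unchanged, their checksums can be recorded before the restore and compared afterwards. The sketch below is illustrative and not part of the UFM tooling: the function name checksum_preserved is hypothetical, sha256sum from GNU coreutils is assumed to be available, and the default paths are the two configuration files listed above.

```shell
#!/usr/bin/env bash
# Sketch only: print "<sha256>  <path>" for each preserved file that exists.
# With no arguments, the two configuration files listed on this page are used.

checksum_preserved() {
    if [ "$#" -eq 0 ]; then
        set -- /opt/ufm/files/conf/opensm/partitions.conf \
               /opt/ufm/files/conf/opensm/opensm-health-policy.conf
    fi
    local f
    for f in "$@"; do
        # Skip files that are not present on this system.
        if [ -f "$f" ]; then
            sha256sum "$f" || return 1
        fi
    done
}
```

One way to use it is to save the output of checksum_preserved to a file before running ufm_versions_mgr restore, save it again to a second file afterwards, and compare the two files with diff. Identical output indicates that the files were preserved.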

