NVIDIA UFM Cyber-AI Documentation

UFM Cyber-AI OS Upgrade

Upgrading the UFM Cyber-AI operating system is supported from up to two previous GA software versions (GA-1 or GA-2).
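The GA-1/GA-2 rule can be checked with a small Bash function. This is a minimal sketch, not part of the UFM tooling. The function name is_supported_source is hypothetical, and so is its input: an ordered list of GA releases (oldest first) that you would take from the release notes.

```shell
# Hypothetical helper (not part of UFM): succeeds when TARGET is one or two
# GA releases newer than INSTALLED (GA-1 or GA-2).
# Usage: is_supported_source INSTALLED TARGET GA_RELEASE...
#   GA_RELEASE... is an assumed list of GA versions ordered oldest to newest.
is_supported_source() {
    local installed=$1 target=$2
    shift 2
    local i=0 installed_idx=-1 target_idx=-1 v
    # Find the position of both versions in the GA list.
    for v in "$@"; do
        if [ "$v" = "$installed" ]; then installed_idx=$i; fi
        if [ "$v" = "$target" ]; then target_idx=$i; fi
        i=$((i + 1))
    done
    # Versions missing from the list are treated as unsupported.
    if [ "$installed_idx" -lt 0 ] || [ "$target_idx" -lt 0 ]; then
        return 1
    fi
    local gap=$((target_idx - installed_idx))
    # Supported only when the target is exactly one or two GA releases ahead.
    [ "$gap" -ge 1 ] && [ "$gap" -le 2 ]
}
```

For example, with a hypothetical array GA_RELEASES, you could run is_supported_source "$INSTALLED" "$TARGET" "${GA_RELEASES[@]}" before starting the upgrade.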

This section provides a step-by-step guide for upgrading the UFM Cyber-AI operating system.

Each UFM Cyber-AI Appliance software release includes an additional tar file with an -omu.tar suffix (OMU stands for OS Manufacture and Upgrade).

This tar file can be used to re-manufacture the server and to upgrade the operating system/software on the server.

Hardware Platform Support:

Starting with version 2.14.1, UFM Cyber-AI provides different OMU files for different hardware platforms.

For older UFM 4.0 appliance hardware based on Ubuntu 18, use ufm-cyberai-appliance-<version>-<revision>-omu.tar.

Important Notes:

Hardware Compatibility: Ensure you download and use the correct OMU tar file that matches your hardware platform.
Using the incorrect OMU file for your hardware may result in installation failures or operational issues.
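Because the correct OMU file depends on the hardware platform, a quick OS check can serve as a first hint. The Bash function below is a hypothetical helper, not a UFM tool. It reads the standard os-release file (the path can be overridden, which is mainly useful for testing) and succeeds when the server runs Ubuntu 18, the OS of the older UFM 4.0 appliance hardware mentioned above. The OS version alone does not identify the hardware, so treat the result as a sanity check only.

```shell
# Hypothetical helper (not part of UFM): succeeds when the os-release file
# reports Ubuntu 18.x, the OS of the older UFM 4.0 appliance hardware.
# Usage: is_ubuntu18 [OS_RELEASE_FILE]   (defaults to /etc/os-release)
is_ubuntu18() {
    local os_release=${1:-/etc/os-release}
    local id version
    [ -r "$os_release" ] || return 1
    # Source the file inside command substitutions (subshells) so that its
    # variables do not leak into the calling shell.
    id=$(. "$os_release" && printf '%s' "${ID:-}")
    version=$(. "$os_release" && printf '%s' "${VERSION_ID:-}")
    if [ "$id" != "ubuntu" ]; then
        return 1
    fi
    case $version in
        18.*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: is_ubuntu18 && echo "Ubuntu 18 detected".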

Extracting the Software

  1. Copy the OMU tar file to a temporary directory on the server:
    Cyber-AI: ufm-cyberai-appliance-<version>-<revision>-omu.tar

  2. Extract the contents of the tar file to /tmp:

    Bash
    tar xf ./ufm-cyberai-appliance-<version>-<revision>-omu.tar -C /tmp/
    
  3. Change to the extracted directory:

    Bash
    cd /tmp/ufm-cyberai-appliance-<version>-<revision>-omu
    
  4. An upgrade script and an ISO file are included in the extracted directory:

    Bash
    ls -1 ./
    ufm-os-upgrade.sh
    ufm-cyberai-appliance-<version>-<revision>.iso
    

    The following flags are available in the upgrade script help.

    Bash
    ./ufm-os-upgrade.sh --help
    ufm-os-upgrade.sh will upgrade and install OS packages.
    
    IMPORTANT!!! a reboot is mandatory after the finalization of this script,
    kernel and kernel models will not work properly until the server is rebooted.
    
    Additional SW installations will be automatically invoked after reboot,
    a message will pop on all open terminals with the installation status:
    "UFM-OS-FIRSTBOOT-FAILURE" - if installation is failed.
    "UFM-OS-FIRSTBOOT-SUCCESS" - if installation succeeded.
    
    additional info will be available in "/var/log/ufm_os_upgrade_<UFM-OS-VERSION>.log" log file.
    
    syntax: ufm-os-upgrade.sh [options]
    
    options
    --appliance-sw-upgrade   upgrade ufm_appliance SW as well, default is to upgrade OS only, P.S. only applicable for StandAlone installations.
    
    -d,--debug            debug info will be visible on the screen.
    
    -r,--reboot           Automatically reboot the server when upgrade is finished.
                          P.S. if secure boot is enabled and a new certificate is enrolled 
                          the server will not automatically reboot even if this flag is set.
    
    -y,--yes              wont prompt for user acknowledgements.
    
    -h,--help             print this help message.
    

    Important: A system reboot is mandatory once the upgrade procedure completes. The -r flag can be used to reboot the server automatically at the end of the upgrade. Some kernel modules may not work properly until the server is rebooted.
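The extraction steps above can be wrapped in a Bash function that also checks the directory contents. This is a sketch, not a UFM tool: extract_omu is a hypothetical name. It relies on the behavior shown in the steps above, namely that the tar file unpacks into a directory with the same name minus the .tar suffix, containing ufm-os-upgrade.sh and an .iso image.

```shell
# Hypothetical helper (not part of UFM): extracts an OMU tar file and checks
# that the upgrade script and an ISO image are present.
# Usage: extract_omu OMU_TAR [DEST_DIR]   (DEST_DIR defaults to /tmp)
# Prints the extracted directory on success.
extract_omu() {
    local tarball=$1 dest=${2:-/tmp}
    local name dir
    # Directory name = tar file name without the .tar suffix.
    name=$(basename "$tarball" .tar)
    case $name in
        *-omu) ;;
        *) echo "not an OMU tar file: $tarball" >&2; return 1 ;;
    esac
    tar xf "$tarball" -C "$dest" || return 1
    dir="$dest/$name"
    if [ ! -f "$dir/ufm-os-upgrade.sh" ]; then
        echo "ufm-os-upgrade.sh not found in $dir" >&2
        return 1
    fi
    # The exact ISO name is release specific, so accept any *.iso file.
    if ! ls "$dir"/*.iso >/dev/null 2>&1; then
        echo "no ISO image found in $dir" >&2
        return 1
    fi
    printf '%s\n' "$dir"
}
```

For example, with the actual version and revision substituted: dir=$(extract_omu ./ufm-cyberai-appliance-<version>-<revision>-omu.tar) && cd "$dir". Chaining with && keeps the shell from running cd with an empty path if the checks fail.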

Upgrading in Standalone Mode 

  1. Stop the UFM and Cyber-AI services:

    Bash
    systemctl stop ufm-enterprise.service
    systemctl stop ufm-cyberai.service
    
  2. Run the upgrade script:

    A system reboot is mandatory once the upgrade procedure completes. The -r flag can be used to reboot the server automatically.

    To bypass user prompts, run the script with the -y flag. Note that -y alone does not reboot the server; to reboot automatically, combine -y with -r. The --appliance-sw-upgrade flag can additionally be used to upgrade both the UFM Enterprise Appliance SW and the Cyber-AI SW; this upgrade is not performed by default. In the following example, the server reboots automatically once the upgrade completes.

    Bash
    ./ufm-os-upgrade.sh -y -r 
    

    The following example adds the --appliance-sw-upgrade flag, so the UFM Enterprise Appliance SW is upgraded as well:

    Bash
    ./ufm-os-upgrade.sh -y -r --appliance-sw-upgrade
    
  3. After the reboot completes, a systemd service (ufm-os-firstboot.service) runs the remainder of the upgrade procedure. Once it finishes, a message with the installation status is displayed on all open terminals:
    "UFM-OS-FIRSTBOOT-FAILURE" - if the installation failed.
    "UFM-OS-FIRSTBOOT-SUCCESS" - if the installation succeeded.
    Example:
    https://confluence.nvidia.com/download/attachments/1675246323/image2023-1-15_15-55-45.png?version=1&modificationDate=1673993342637&api=v2

    To check the status manually, run systemctl status ufm-os-firstboot.service. If the service has already completed, an error message states that the service does not exist; in that case, check the /var/log/ufm-os-firstboot.log log file instead.

    Bash
    systemctl status ufm-os-firstboot.service
    

    Example:
    https://confluence.nvidia.com/download/attachments/1675246323/image2023-1-15_15-57-16.png?version=1&modificationDate=1673993343507&api=v2
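When systemctl status ufm-os-firstboot.service reports that the service no longer exists, the log file has to be inspected instead. The Bash function below sketches that check. It assumes, without confirmation from this document, that /var/log/ufm-os-firstboot.log contains the same UFM-OS-FIRSTBOOT-SUCCESS and UFM-OS-FIRSTBOOT-FAILURE markers that are shown on the terminals; firstboot_result is a hypothetical name. If the markers are not logged, read the log manually.

```shell
# Hypothetical helper (not part of UFM): reports the firstboot result by
# searching a log file for the status markers shown on the terminals.
# ASSUMPTION: the firstboot log records the same UFM-OS-FIRSTBOOT-* markers.
# Usage: firstboot_result [LOG_FILE]  (defaults to /var/log/ufm-os-firstboot.log)
# Prints success, failure or unknown; returns 0 only on success.
firstboot_result() {
    local log=${1:-/var/log/ufm-os-firstboot.log}
    if [ ! -r "$log" ]; then
        echo unknown
        return 2
    fi
    # Check for failure first so that a failure marker is never masked.
    if grep -q 'UFM-OS-FIRSTBOOT-FAILURE' "$log"; then
        echo failure
        return 1
    fi
    if grep -q 'UFM-OS-FIRSTBOOT-SUCCESS' "$log"; then
        echo success
        return 0
    fi
    echo unknown
    return 2
}
```

The same check applies after the reboot in the High-Availability procedure.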

Upgrading in High-Availability Mode

In HA mode, upgrade the standby node first and then the master node. Each node is upgraded in the same way as in standalone (SA) mode.

If the standby node is unavailable, the upgrade can be run on the master node only; however, additional steps are required after the appliance is upgraded.

  1. [On the standby node]: Copy and extract the OMU tar file to a temporary directory, as described in Extracting the Software.

  2. [On the standby node]: Run the upgrade script.

    A system reboot is mandatory once the upgrade procedure completes. The -r flag can be used to reboot the server automatically.

    The --appliance-sw-upgrade flag cannot be used to upgrade the UFM Enterprise Appliance SW in HA mode; if the flag is supplied, the upgrade is not performed.

    The -y flag skips user prompts. It does not reboot the server on its own; to reboot automatically, combine it with the -r flag.
    In the following example, the server reboots automatically once the upgrade completes:

    Bash
    cd /tmp/ufm-cyberai-appliance-<version>-<revision>-omu
    ./ufm-os-upgrade.sh -y -r 
    
  3. If the -r flag was not used and you answered "No" when the script asked whether to reboot, reboot the server manually:

    Bash
    reboot now
    
  4. After the reboot completes, a systemd service (ufm-os-firstboot.service) runs the remainder of the upgrade procedure. Once it finishes, a message with the installation status is displayed on all open terminals:
    "UFM-OS-FIRSTBOOT-FAILURE" - if the installation failed.
    "UFM-OS-FIRSTBOOT-SUCCESS" - if the installation succeeded.
    Example:

    https://confluence.nvidia.com/download/attachments/1675246323/image2023-1-15_15-55-45.png?version=1&modificationDate=1673993342637&api=v2

    To check the status manually, run systemctl status ufm-os-firstboot.service. If the service has already completed, an error message states that the service does not exist; in that case, check the /var/log/ufm-os-firstboot.log log file instead.

    Bash
    systemctl status ufm-os-firstboot.service
    

    Example:
    https://confluence.nvidia.com/download/attachments/1675246323/image2023-1-15_15-57-16.png?version=1&modificationDate=1673993343507&api=v2

  5. After the standby node has finished the upgrade, check the HA cluster status:

    Bash
    ufm_ha_cluster status
    

    https://confluence.nvidia.com/download/attachments/1675246323/image2023-3-16_21-11-14.png?version=1&modificationDate=1679019073487&api=v2

    All nodes in the cluster are expected to be operational, with the current (upgraded) node in standby mode (shown as Secondary under DRBD_ROLE).

  6. [On the master node]: Initiate a UFM failover to the standby node. The upgraded node takes over as the master, and the current node transitions to standby.

    Bash
    ufm_ha_cluster failover
    

    Wait until all the resources of UFM are up and functioning correctly on the upgraded node.

  7. Repeat the same procedure on the node that has not yet been upgraded, which is now functioning as the standby node.
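The flag rules described in the standalone and HA procedures can be summarized in a small Bash function. upgrade_args is a hypothetical helper, not part of UFM; it only combines the documented -y, -r, and --appliance-sw-upgrade flags and refuses --appliance-sw-upgrade in HA mode.

```shell
# Hypothetical helper (not part of UFM): prints the ufm-os-upgrade.sh options
# for a given mode and enforces the HA restriction.
# Usage: upgrade_args MODE AUTO_REBOOT APPLIANCE_SW
#   MODE: standalone | ha; AUTO_REBOOT and APPLIANCE_SW: yes | no
upgrade_args() {
    local mode=$1 auto_reboot=$2 appliance_sw=$3
    case $mode in
        standalone|ha) ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    # -y skips user prompts; it does not reboot the server on its own.
    local args="-y"
    # -r is required for an automatic reboot at the end of the upgrade.
    if [ "$auto_reboot" = yes ]; then
        args="$args -r"
    fi
    if [ "$appliance_sw" = yes ]; then
        # --appliance-sw-upgrade is only applicable to standalone installations.
        if [ "$mode" = ha ]; then
            echo "--appliance-sw-upgrade is not supported in HA mode" >&2
            return 1
        fi
        args="$args --appliance-sw-upgrade"
    fi
    printf '%s\n' "$args"
}
```

For example: args=$(upgrade_args ha yes no) && ./ufm-os-upgrade.sh $args. Here $args is left unquoted on purpose so that each flag becomes a separate argument, and the && ensures the script never runs with an empty flag list after a rejected combination.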

