NVIDIA UFM Cable Validation Tool

Bringup CLI

Running Bringup CLI

Running the bringup CLI can be done in two ways:

  1. Direct Execution in the Container:

    docker exec -it cables_bringup bringupcli

  2. Alternatively, start a bash session in the container and run the appropriate CLI command for the fabric type from anywhere within the container:

    • Start a bash session in the container:

      docker exec -it cables_bringup bash

    • Execute the desired CLI command:

      # Stop the cvt service
      supervisorctl stop cvt-service
      # Run bringupcli, killing any other CLI sessions
      bringupcli -k

Bringupcli Usage

bringupcli accepts optional command-line arguments. See the usage below for details:

root@r-ufm65:/# bringupcli -h
usage: bringupcli [-h] [-V] [-k] [-d]

Optional Arguments


Argument                     Description
---------------------------  ------------------------------------
-h, --help                   Show this help message and exit
-v, --version                Show program version number and exit
-k, --kill-other-sessions    Kill other CLI sessions if existent
-d, --daemon                 Run as daemon


Bringupcli Commands

Load Topology Commands

load_topo

Description: Loads a topo file if the fabric is InfiniBand, or a dot file if the fabric is Ethernet.

 load_topo <filename> dns=<true|false> cluster=<cluster name>

Parameters:

  • filename: The absolute path to the topo/dot file.

  • dns (Optional): When true (the default), the tool assumes that DNS is active and that the switches can be reached by hostname.

  • cluster (Optional): The cluster name. If omitted, the cluster name is set to 'default'.

load_ptp

Description: Loads PTP topology file (Excel file). 

 load_ptp <filename> format=<legacy_ib|legacy_eth|unified_topo> dns=<true|false> cluster=<cluster name> sheets=<comma separated sheets> dc_layout=<file path> hca_mapping=<file path>

Parameters:

  • filename: The absolute path to the PTP file.

  • format: The format of the PTP file, either legacy or unified topology:

    • legacy_ib: loads a legacy InfiniBand PTP file.

    • legacy_eth: loads a legacy Ethernet PTP file.

    • unified_topo: loads the unified PTP file, which supports the IB, Ethernet, XDR, and NVLink protocols.

  • dns (Optional): When true (the default), the tool assumes that DNS is active and that the switches can be reached by hostname.

  • cluster (Optional): The cluster name. If omitted, the cluster name is set to 'default'.

  • sheets (Optional): A comma-separated list of sheet names. If provided, only the specified sheets from the Excel file are loaded. If not provided, all sheets in the file are loaded.

  • dc_layout (Optional, Ethernet Fabric Only): A CSV file that describes the data center layout. For more information, see DC Floor Layout File.

  • hca_mapping (Optional, InfiniBand and NVOS Fabrics Only): A CSV file that defines the relationship between port numbers and HCA names. For more information, see HCA Mapping File.

load_ip 

Description: Loads switch IP addresses. Use this command when DNS is inactive.

load_ip <filename> cluster=<cluster name>

Loads the IP/switch-name mapping, which allows the tool to reach each switch via the REST API and retrieve its local topology, GUID, and so on. The file contains one IP address and hostname pair per line. This file is used together with a 'topo' file when DNS is unavailable.

An IP file example:

# A comment
10.0.0.30 switch1
10.0.0.31 switch2

Parameters:

  • filename: The absolute path to the IP file.

  • cluster (Optional): The cluster name. If omitted, the cluster name is set to 'default'.
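Because a malformed address in the IP file can leave a switch unreachable, it may help to check the file before loading it. The following is a minimal sketch, not part of the tool: check_ip_file is a hypothetical helper, and it assumes that lines starting with '#' are comments, that blank lines are ignored, and that all addresses are IPv4.

```shell
# check_ip_file FILE
# Hypothetical helper (not part of bringupcli): prints every line that is not
# an "<IPv4 address> <hostname>" pair and returns non-zero if any were found.
# Assumptions: '#' starts a comment line; blank lines are ignored.
check_ip_file() {
  awk '
    /^[[:space:]]*(#|$)/ { next }   # skip comment lines and blank lines
    {
      # A valid line has exactly two fields and a four-octet address.
      ok = (NF == 2) && (split($1, octet, ".") == 4)
      # Each octet must be a decimal number no larger than 255.
      for (i = 1; ok && i <= 4; i++)
        if (octet[i] !~ /^[0-9]+$/ || octet[i] + 0 > 255) ok = 0
      if (!ok) { printf "line %d: %s\n", NR, $0; bad = 1 }
    }
    END { exit bad }
  ' "$1"
}
```

For example, running check_ip_file on a file that contains the line "10.0.30 switch1" reports that line, because the address has only three octets.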


load

Description: Loads both IP addresses and topo files. 

load <topo filename> <ip filename> cluster=<cluster name>

Loads the .topo and .ip files.

Note: If multiple files describe the topology, use the following commands instead:

load_ip   file.ip
load_topo file1.topo file2.topo file3.topo

Parameters:

  • topo filename: The absolute path for the topology file directory

  • ip filename: The absolute path for the IP file directory

  • cluster (Optional): The cluster name. If omitted, the cluster name is set to 'default'.

load_clusters 

Note: This command will be deprecated in an upcoming release.

load_clusters <filename> dns=<true|false>

The clusters file should have the following format, where the topo file must be in xlsx format and the IP file is optional:

cluster_name, topo_file, ip_file
CLUSTER1, cluster1_topo.xlsx, cluster1.ip
CLUSTER2, cluster2_topo.xlsx,

Parameters:

  • filename: The absolute path to the clusters file.

  • dns (Optional): When true (the default), the tool assumes that DNS is active and that the switches can be reached by hostname.

Validations Commands

show_clusters

Description: Shows the list of loaded clusters, as loaded from the clusters file.

show_clusters

show_switches

Description: Shows the list of loaded switches, as loaded from the topology file.

show_switches cluster=<cluster_name>

Parameters:

  • cluster (Optional): The cluster name. If provided, only the switches in the given cluster are shown.

Output Example: 

MQM8700 sw-hdr-proton01
-----------------------
MQM8700 sw-hdr-proton01 P3 --> swx-proton03 mlx5_0 P1
MQM8700 sw-hdr-proton01 P4 --> swx-proton04 mlx5_2 P1
MQM8700 ufm-sw-hdr01
--------------------
MQM8700 ufm-sw-hdr01 P1 --> ufm-sw-hdr02 P1
MQM8700 ufm-sw-hdr02
--------------------
MQM8700 ufm-sw-hdr02 P1 --> ufm-sw-hdr01 P1

check_switch_status

Description: Checks the switch connectivity status (Ping/JSON-API/Agent).

check_switch_status cluster=<cluster name>

Parameters:

  • cluster (Optional): The cluster name. If provided, only the switches in the given cluster are checked.

Output Example: 

Host IP ping JSONAPI Agent
----------------------------- ------------- ---- ---- -----
sw-hdr-proton01.mtr.labs.mlnx 209.44.74 True True True
ufm-sw-hdr01.mtr.labs.mlnx 10.209.36.113 True True True
ufm-sw-hdr02.mtr.labs.mlnx 10.209.36.122 True True True

start_validation

Description: Push topology to switches and get validation reports. 

start_validation timeout=<n> cluster=<cluster_name> 

Parameters:

  • cluster (Optional): The cluster name. If provided, validation is started only on the switches in the given cluster.

  • timeout (Optional): The time after which validation stops. n is a number followed by a unit: seconds (s), minutes (m), hours (h), or days (d). For example, timeout=20m or timeout=2h.

If timeout is not provided, use the stop_validation command to stop it.
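To make the unit suffixes concrete, the sketch below converts a timeout value into seconds. It is only an illustration of the format described above: timeout_to_seconds is a hypothetical helper, not something the tool provides, and it assumes the value is a whole number followed by exactly one unit letter.

```shell
# timeout_to_seconds VALUE
# Hypothetical helper (not part of bringupcli): prints the number of seconds
# for values such as 20m or 2h, and returns 1 for anything it cannot parse.
timeout_to_seconds() {
  number=${1%[smhd]}     # strip one trailing unit letter, if present
  unit=${1#"$number"}    # whatever was stripped is the unit
  case $number in
    ''|*[!0-9]*|0?*) return 1 ;;   # empty, non-numeric, or leading zero
  esac
  case $unit in
    s) factor=1 ;;
    m) factor=60 ;;
    h) factor=3600 ;;
    d) factor=86400 ;;
    *) return 1 ;;                 # missing unit
  esac
  echo $((number * factor))
}

timeout_to_seconds 20m   # prints 1200
```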

stop_validation

Description: Stops the validation routine and unsubscribes from switch updates.

stop_validation

Troubleshooting


deploy_single_agent

Description: Deploys an agent on a specific node. This command will be deprecated soon; use deploy_agents instead.

deploy_single_agent <switch-ip/host-ip>

deploy_all_agents

Description: Deploys agents on loaded nodes that have no agents. This command will be deprecated soon; use deploy_agents instead.

deploy_all_agents

deploy_agents

Description: Deploys agents on all or specific nodes.

deploy_agents all
deploy_agents node_ip1 node_ip2

remove_all_agents

Description: Removes agents from loaded nodes that have agents. This command will be deprecated soon; use remove_agents instead.

remove_all_agents

remove_single_agent

Description: Removes an agent from a specific node. This command will be deprecated soon; use remove_agents instead.

remove_single_agent <switch-ip/host-ip>

remove_agents

Description: Removes agents from all or specific nodes.

remove_agents all
remove_agents node_ip1 node_ip2


Set Credentials Commands

show_creds 

Description: Displays the credentials for all nodes.

show_creds [format=json|report]

These credentials are used for communication with switches and hosts.

Parameters:

  • format (Optional): The format of the displayed data, either json or report.

set_default_creds 

Description: Sets the default switch/host credentials, overriding the built-in default credentials.

set_default_creds user=<user> pwd=<pwd> type=<switch|host> save=<true|false>

These credentials are used for communication with any switch that does not have specific credentials.

Parameters:

  • user: The user name.

  • pwd: The password.

  • type (Optional): Either switch or host. The default value is switch.

  • save (Optional): If save is set to true (the default), the credentials are saved, encrypted, to a file.

set_node_creds

Description: Sets the credentials for a specific switch/host. Use it when the switch credentials differ from the defaults.

set_node_creds <switch> user=<user> pwd=<pwd> save=<true|false>

Parameters:

  • switch: The switch name.

  • user: The user name.

  • pwd: The password.

  • save (Optional): If save is set to true (the default), the credentials are saved, encrypted, to a file.

remove_node_creds

Description: Removes the credentials for a specific switch/host.

remove_node_creds <switch>

Parameters:

  • switch: switch name

set_credential_profile 

Description: Sets the credentials for a credential profile name specified in the unified topology PTP file.

set_credential_profile <credential_profile_name> user=<user> pwd=<pwd> [save=true|false]

Parameters:

  • credential_profile_name: The credential profile name.

  • user: The user name.

  • pwd: The password.

  • save (Optional): If save is set to true (the default), the credentials are saved, encrypted, to a file.

remove_credential_profile

Description: Removes the credentials associated with a credential profile.

remove_credential_profile <credential_profile_name>

Parameters:

  • credential_profile_name: Credential Profile name

Web User Commands

add_web_login

Description: Adds a new user who can log in to the web GUI, in addition to the default 'admin' login.

add_web_login user=<user> pwd=<pwd> account_type=<account_type>

Parameters:

  • user: username

  • pwd: password

  • account_type: The account type: cabler, admin, nvidia, or developer.

delete_web_user

Description: Delete a web user account. 

delete_web_user user=<user>

Parameters:

  • user: username.

show_web_users

Description: Shows the web users added along with their account types.

show_web_users

Example output:

admin: admin
nUser: nvidia
dUser: developer

update_web_user

Description: Update the specified web user, allowing changes to the password, account type, or both. 

update_web_user user=<user> pwd=<pwd> account_type=<account_type>

Parameters:

  • user: username

  • pwd: password

  • account_type: The account type: cabler, admin, nvidia, or developer.

Other Commands

show_switch_history 

Description: Lists the data files collected from switches over a recent period (by default, the last week).

show_switch_history <switch names> past=<n> start_time=<n> end_time=<n> prev=<n>

Parameters:

  • switch names (Optional): A space-delimited list of switch names. If no switches are provided, data for all switches is returned.

  • past (Optional): The history interval. By default, it is set to one week (past=1w).

  • start_time, end_time (Optional): The time period to search for data. If no time is provided, data from the last week is returned.

  • prev (Optional): Retrieves data for the previous period, from the specified time until now.
    'prev' format: num[day|week]

amber_show_latest

Description: Shows latest collected amber data from switches

amber_show_latest

show_incoming_reports 

Description: Shows or hides incoming reports.

show_incoming_reports show=<true|false>

Parameters:

  • show (Optional): The default value is true.

show_statistics

Description: Shows a summary of all issues.

show_statistics

Output Example:

2024-11-21 12:58:46.726351 - show statistics for default cluster:
Status Number of occurrences Nodes affected
------------------------------------ --------------------- --------------
No Transceiver 0 0
Link Down, No signal 0 0
Admin Down 0 0
ErrDisable - Flap 0 0
ErrDisable - Rx 0 0
Negotiation fail 0 0
Wrong-neighbor 0 0
Wrong-port 0 0
Unknown-neighbor 0 0
Extra-cable 0 0
Underperforming-link (BER) 0 0
Flapping-link 0 0
Anomalous-port (Signal, Temperature) 0 0
Unreachable-device 0 0
Correctly-wired 0
No report

show_switches  

Description: Shows the current loaded switches

show_switches

Example output:

MQM8700 sw-hdr-proton01, rack: PXX, unit: 30
--------------------------------------------
MQM8700 sw-hdr-proton01 P3 --> swx-proton03 mlx5_0 P1
MQM8700 sw-hdr-proton01 P4 --> swx-proton04 mlx5_2 P1

add_certificate 

Description: Updates the SSL certificate file used by Apache for secure connections.

add_certificate <crt_file> <key_file>

The provided file must be a valid SSL certificate file in crt format. The old certificate file is backed up before it is replaced with the new one.

version

Description: Shows application version.

exit 

Description: Exits the application.

help 

Description: Shows a list of commands. For help on a specific command, run help <command> 

Initializing the Tool Using Bringup CLI

To initialize the tool, perform the following:

  1. Run bringupcli

  2. Open bringup GUI

  3. Load the fabric topology file using one of the Load Topology Commands or from the bringup GUI

  4. Set the credentials for the switches.

  5. Deploy the agent on all switches.

     Note: wget or curl must be installed on the switches and hosts, because it is used to download the agent image from the collector.

  6. Start validation from the GUI or with the start_validation command.
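Steps 3 to 6 above can be sketched as a short command sequence. The helper below only prints the commands so they can be reviewed and then typed at the bringupcli prompt; it does not run them. print_bringup_commands and the path /data/fabric_ptp.xlsx are hypothetical, the unified PTP format is assumed, and the <user> and <pwd> placeholders are left for the operator to fill in.

```shell
# print_bringup_commands PTP_FILE [TIMEOUT]
# Hypothetical helper (not part of bringupcli): prints, in order, the CLI
# commands for loading a unified PTP file, setting default switch
# credentials, deploying agents, and starting validation.
print_bringup_commands() {
  ptp_file=$1
  timeout=${2:-20m}   # assumed default, taken from the timeout=20m example
  cat <<EOF
load_ptp $ptp_file format=unified_topo
set_default_creds user=<user> pwd=<pwd> type=switch
deploy_agents all
start_validation timeout=$timeout
EOF
}

# Example with a hypothetical file path and a two-hour validation window.
print_bringup_commands /data/fabric_ptp.xlsx 2h
```

If the fabric uses a legacy file or a topo/dot file instead, the first command would change to the matching load command from the Load Topology Commands section.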

Running bringup GUI

  1. Open the following URL in the browser: https://<bringup_machine_ip>/cables_validation

  2. Enter credentials in the login page.

  3. Optionally, replace the default self-signed certificate, which is located in the container at:

    SSLCertificateFile ${BRINGUP_CONF_APACHE_PATH}/certs/cv-cert.crt
    SSLCertificateKeyFile ${BRINGUP_CONF_APACHE_PATH}/private/cv-cert.key

