NMX Manager (NMX-M) Documentation

Bring-Up

Overview

The Bring-Up process is a fully automated method for configuring and registering a designated switch tray with the NMX Manager. This process enables telemetry collection and NVLink domain management.

Each switch tray is configured so that both the nmx-controller and nmx-telemetry services are active, and a mutual TLS (mTLS)-secured gRPC connection is established and maintained.

Switch Profile API

The NMX Manager provides endpoints for securely managing switch credentials via switch profiles.

Default profile: username admin, password admin. NVIDIA recommends changing or overriding these credentials for enhanced security.

| Auth Required | Action | Endpoint | Description |
|---|---|---|---|
| ro-user, rw-user | Retrieve | GET /nmx/v1/switch-profiles | Retrieve a list of switch profiles |
| rw-user | Create | POST /nmx/v1/switch-profiles | Create a new switch profile |
| ro-user, rw-user | Retrieve | GET /nmx/v1/switch-profiles/{id} | Retrieve a specific switch profile |
| rw-user | Delete | DELETE /nmx/v1/switch-profiles/{id} | Delete a switch profile (except default) |
| rw-user | Update | PATCH /nmx/v1/switch-profiles/{id} | Update an existing switch profile |

The following example updates the password of an existing switch profile:

Bash
curl -X 'PATCH' \
  'https://<NMX-Manager-API>/nmx/v1/switch-profiles/<switch-profile-id>' \
  -H 'accept: */*' \
  -H 'Content-Type: application/json' \
  -d '{
  "Password": "<new-password>"
}'
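
For scripted use, the same PATCH call can be assembled in Python. The following is a minimal sketch using only the standard library. The base URL, profile ID, and password values are placeholders, and authentication is omitted because the source does not show how credentials are passed. The function only builds the request object; sending it is left to the caller.

```python
import json
import urllib.request


def build_profile_patch(base_url, profile_id, new_password):
    """Build (but do not send) a PATCH request that updates a switch profile password."""
    url = f"{base_url.rstrip('/')}/nmx/v1/switch-profiles/{profile_id}"
    # The "Password" field name is taken from the curl example above.
    body = json.dumps({"Password": new_password}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json", "Accept": "*/*"},
    )


# Hypothetical host and profile ID, for illustration only.
request = build_profile_patch("https://nmx-manager.example", "profile-123", "s3cret")
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it, once the authentication and TLS settings required by the deployment are added.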

Bring-Up API

Bring-up runs asynchronously. Each request creates a bring-up operation that tracks the progress of one or more switches.

| Auth Required | Action | Endpoint | Description |
|---|---|---|---|
| ro-user, rw-user | Retrieve | GET /nmx/v1/bring-up | Retrieve bring-up operations with optional filters (pending, in-progress, failed, completed) |
| rw-user | Create | POST /nmx/v1/bring-up | Initiate a new bring-up process for one or more switches |
| ro-user, rw-user | Retrieve | GET /nmx/v1/bring-up/{id} | Get bring-up status for a specific operation |

Usage Examples

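The following examples demonstrate how to start a new bring-up operation using a POST request. In order, they show: a single switch with a global switch profile; two switches with the global ProfileID left empty; two switches where only the second specifies its own profile; and two switches that each specify their own profile.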

Bash
curl -X 'POST' \
  'https://<NMX-Manager-API>/nmx/v1/bring-up' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'FmConfig=@<fm-config-file>' \
  -F 'ProfileID=<switch-profile-id>' \
  -F 'Switches={
  "Address": "<switch-IP-Address-or-hostname>"
}'

curl -X 'POST' \
  'https://<NMX-Manager-API>/nmx/v1/bring-up' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'FmConfig=@<fm-config-file>' \
  -F 'ProfileID=' \
  -F 'Switches={
  "Address": "<switch-A-IP-Address-or-hostname>"
}' \
  -F 'Switches={
  "Address": "<switch-B-IP-Address-or-hostname>"
}'

curl -X 'POST' \
  'https://<NMX-Manager-API>/nmx/v1/bring-up' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'FmConfig=@<fm-config-file>' \
  -F 'ProfileID=' \
  -F 'Switches={
  "Address": "<switch-A-IP-Address-or-hostname>"
}' \
  -F 'Switches={
  "Address": "<switch-B-IP-Address-or-hostname>",
  "ProfileID": "<custom-switch-profile-id>"
}'

curl -X 'POST' \
  'https://<NMX-Manager-API>/nmx/v1/bring-up' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'FmConfig=@<fm-config-file>' \
  -F 'ProfileID=' \
  -F 'Switches={
  "Address": "<switch-A-IP-Address-or-hostname>",
  "ProfileID": "<custom-switch-profile-A-id>"
}' \
  -F 'Switches={
  "Address": "<switch-B-IP-Address-or-hostname>",
  "ProfileID": "<custom-switch-profile-B-id>"
}'
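
When the switch list comes from an inventory rather than being typed by hand, the repeated `Switches` form fields can be generated. The sketch below assumes the field names shown in the curl examples above (`FmConfig`, `ProfileID`, `Switches`); the function names and sample addresses are hypothetical. It only builds the argument list and does not invoke curl.

```python
import json


def bring_up_form_fields(fm_config_path, switches, global_profile_id=""):
    """Return the multipart form fields as (name, value) pairs, in curl -F order."""
    fields = [
        ("FmConfig", f"@{fm_config_path}"),   # "@" tells curl to upload the file
        ("ProfileID", global_profile_id),     # may be left empty, as in the examples
    ]
    for switch in switches:
        entry = {"Address": switch["Address"]}
        if switch.get("ProfileID"):
            # Per-switch profile, only emitted when one is given.
            entry["ProfileID"] = switch["ProfileID"]
        fields.append(("Switches", json.dumps(entry)))
    return fields


def curl_argv(base_url, fields):
    """Build a curl argument list suitable for subprocess.run (not executed here)."""
    argv = ["curl", "-X", "POST", f"{base_url.rstrip('/')}/nmx/v1/bring-up",
            "-H", "accept: application/json"]
    for name, value in fields:
        # curl sets the multipart Content-Type itself when -F is used.
        argv += ["-F", f"{name}={value}"]
    return argv


fields = bring_up_form_fields(
    "fm.cfg",
    [{"Address": "10.0.0.1"}, {"Address": "10.0.0.2", "ProfileID": "p2"}],
)
argv = curl_argv("https://nmx-manager.example/", fields)
print(argv)
```

Building an argument list rather than a shell string avoids quoting problems with the JSON values.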

fm_config / SDN Config

  • Provide an fm_config / SDN config file that matches all switches included in this POST request.

  • For multiple topologies, use separate POST requests for each topology.

Find more information and examples: 

Switch Profile

  • Ensure you have a switch profile with the required switch credentials. If not, create one in advance via POST /nmx/v1/switch-profiles.

  • A single bring-up request may specify a global switch profile for all switches, or a separate switch profile for each listed switch.

  • If a switch profile is specified for a particular switch, it takes precedence over the global profile specified in the request.
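
The precedence rule above can be expressed as a small lookup. This is an illustrative sketch of the documented behavior, not the NMX Manager's implementation; the function name is hypothetical. A result of `None` means the request names no profile for that switch. Presumably the default profile then applies, but the source does not state this explicitly.

```python
def effective_profiles(switches, global_profile_id=None):
    """Map each switch address to the profile ID that a request would select for it."""
    resolved = {}
    for switch in switches:
        # A per-switch ProfileID takes precedence over the global one.
        resolved[switch["Address"]] = (
            switch.get("ProfileID") or global_profile_id or None
        )
    return resolved


result = effective_profiles(
    [{"Address": "switch-a"}, {"Address": "switch-b", "ProfileID": "custom-b"}],
    global_profile_id="global-1",
)
print(result)
```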

If all initial validations pass, an HTTP 202 Accepted response is returned with a JSON body containing a bring-up operation ID to track the process:

JSON
{
    "operationId": "682880baaf653727786b618f"
}

To track operation progress:

Bash
curl -X 'GET' \
  'https://<NMX-Manager-API>/nmx/v1/bring-up/682880baaf653727786b618f' \
  -H 'accept: application/json'
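
Because bring-up is asynchronous, a client typically repeats this GET until the operation finishes. The following sketch shows one way to do that. The polling interval and poll limit are arbitrary assumptions, the function names are hypothetical, and authentication and TLS settings are omitted. The status fetcher is passed in as a parameter so the loop can be exercised without a live NMX Manager.

```python
import json
import time
import urllib.request

TERMINAL_STATES = {"completed", "failed"}


def make_http_fetch(base_url):
    """Return a function that GETs /nmx/v1/bring-up/{id} and decodes the JSON body."""
    def fetch(operation_id):
        url = f"{base_url.rstrip('/')}/nmx/v1/bring-up/{operation_id}"
        request = urllib.request.Request(url, headers={"Accept": "application/json"})
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return fetch


def wait_for_bring_up(fetch, operation_id, interval_s=5.0, max_polls=240,
                      sleep=time.sleep):
    """Poll until the operation-level Status is terminal, then return the last body."""
    for _ in range(max_polls):
        operation = fetch(operation_id)
        if operation.get("Status") in TERMINAL_STATES:
            return operation
        sleep(interval_s)
    raise TimeoutError(f"bring-up {operation_id} did not finish in {max_polls} polls")


# Offline demonstration with canned responses instead of a real server.
canned = iter([{"Status": "in-progress"}, {"Status": "in-progress"},
               {"Status": "completed"}])
waits = []
final = wait_for_bring_up(lambda op_id: next(canned), "682880baaf653727786b618f",
                          sleep=waits.append)
print(final["Status"], "after", len(waits), "waits")
```

Against a real deployment, `make_http_fetch("https://<NMX-Manager-API>")` would be passed as `fetch`.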

Step-by-Step Bring-Up Workflow

Initial Response

Until the NMX Manager begins processing a switch, that switch's status is "pending", even though the overall operation status may already read "in-progress". The switch status changes as soon as the NMX Manager begins the bring-up process.

JSON
{
    "CreatedAt": "2025-05-15T07:07:31.428Z",
    "ID": "682880baaf653727786b618f",
    "Status": "in-progress",
    "Switches": [
        {
            "Address": "<switch-IP-Address-or-hostname>",
            "CurrentStep": "Initial bring-up task",
            "StartedAt": "2025-05-15T07:07:31.485Z",
            "Status": "pending",
            "StatusDetails": "",
            "UpdatedAt": "2025-05-15T07:07:31.485Z"
        }
    ],
    "UpdatedAt": "2025-05-15T07:07:31.485Z"
}
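
The response carries two levels of status: one for the operation and one per switch. A minimal sketch of reading both is shown below. The field names come from the response above; the address `10.0.0.5` and the function name are placeholders.

```python
import json

SAMPLE = """
{
    "ID": "682880baaf653727786b618f",
    "Status": "in-progress",
    "Switches": [
        {
            "Address": "10.0.0.5",
            "CurrentStep": "Initial bring-up task",
            "Status": "pending",
            "StatusDetails": ""
        }
    ]
}
"""


def switch_lines(operation):
    """Format one 'address: status (current step)' line per switch."""
    return [
        f"{switch['Address']}: {switch['Status']} ({switch['CurrentStep']})"
        for switch in operation.get("Switches", [])
    ]


operation = json.loads(SAMPLE)
print("operation:", operation["Status"])
for line in switch_lines(operation):
    print(line)
```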

Bring-Up Execution Steps

Step 1: Pre Bring-up Validation

The NMX Manager sends an API request to the switch tray to check whether the nmx-controller and nmx-telemetry services are already active.

If the services are detected, bring-up is skipped for that switch to avoid overwriting the existing configuration. Bring-up can be performed only once per switch: even if the services are later shut down, the NMX Manager remembers the earlier bring-up and blocks repeated attempts unless the user explicitly deregisters the services from the NMX Manager.

JSON
{
    "CreatedAt": "2025-05-15T07:07:31.428Z",
    "ID": "682880baaf653727786b618f",
    "Status": "in-progress",
    "Switches": [
        {
            "Address": "<switch-IP-Address-or-hostname>",
            "CurrentStep": "Step 1: Is switch configured request",
            "StartedAt": "2025-05-15T07:07:31.485Z",
            "Status": "in-progress",
            "StatusDetails": "Step 1: Is switch configured request: sent to switch-gateway.",
            "UpdatedAt": "2025-05-15T07:07:31.515Z"
        }
    ],
    "UpdatedAt": "2025-05-15T07:07:31.515Z"
}

Step 2: Enable Cluster

The NMX Manager instructs the switch tray to start the nmx-controller and nmx-telemetry services required for cluster operations.

JSON
{
    "CreatedAt": "2025-05-15T07:07:31.428Z",
    "ID": "682880baaf653727786b618f",
    "Status": "in-progress",
    "Switches": [
        {
            "Address": "<switch-IP-Address-or-hostname>",
            "CurrentStep": "Step 2: Enable cluster request",
            "StartedAt": "2025-05-15T07:07:31.485Z",
            "Status": "in-progress",
            "StatusDetails": "Step 2: Enable cluster request: sent to switch-gateway.",
            "UpdatedAt": "2025-05-15T07:07:37.278Z"
        }
    ],
    "UpdatedAt": "2025-05-15T07:07:37.278Z"
}

Step 3: Import And Configure Certificates

The NMX Manager generates the required certificates and sends them to the switch. Certificates are stored locally and configured for both nmx-controller and nmx-telemetry.

Each configuration action is processed asynchronously via the NVOS API, and job success is verified by polling.

JSON
{
    "CreatedAt": "2025-05-15T07:07:31.428Z",
    "ID": "682880baaf653727786b618f",
    "Status": "in-progress",
    "Switches": [
        {
            "Address": "<switch-IP-Address-or-hostname>",
            "CurrentStep": "Step 3: Import certificates request",
            "StartedAt": "2025-05-15T07:07:31.485Z",
            "Status": "in-progress",
            "StatusDetails": "Step 3: Import certificates request: sent to switch-gateway.",
            "UpdatedAt": "2025-05-15T07:07:59.62Z"
        }
    ],
    "UpdatedAt": "2025-05-15T07:07:59.62Z"
}

Step 4: Enable mTLS For NMX Services

The NMX Manager sends a request to configure both services for mTLS encryption, ensuring secure gRPC communication. Each operation is tracked and validated.

JSON
{
    "CreatedAt": "2025-05-15T07:07:31.428Z",
    "ID": "682880baaf653727786b618f",
    "Status": "in-progress",
    "Switches": [
        {
            "Address": "<switch-IP-Address-or-hostname>",
            "CurrentStep": "Step 4: Enable encryption request",
            "StartedAt": "2025-05-15T07:07:31.485Z",
            "Status": "in-progress",
            "StatusDetails": "Step 4: Enable encryption request: sent to switch-gateway.",
            "UpdatedAt": "2025-05-15T07:08:39.559Z"
        }
    ],
    "UpdatedAt": "2025-05-15T07:08:39.559Z"
}

Step 5: Import and Configure SDN Config

The NMX Manager uploads the SDN config (fm_config) provided in the bring-up request to the switch file system. nmx-controller is configured accordingly. Jobs are tracked and confirmed.

JSON
{
    "CreatedAt": "2025-05-15T07:07:31.428Z",
    "ID": "682880baaf653727786b618f",
    "Status": "in-progress",
    "Switches": [
        {
            "Address": "<switch-IP-Address-or-hostname>",
            "CurrentStep": "Step 5: Install SDN request",
            "StartedAt": "2025-05-15T07:07:31.485Z",
            "Status": "in-progress",
            "StatusDetails": "Step 5: Install SDN request: sent to switch-gateway.",
            "UpdatedAt": "2025-05-15T07:09:02.492Z"
        }
    ],
    "UpdatedAt": "2025-05-15T07:09:02.492Z"
}

Step 6: Wait for NMX Controller Status And Register

The NMX Manager polls the controller until its addition-info field reports CONTROL_PLANE_STATE_CONFIGURED. Once confirmed, registration begins using a secure gRPC connection.

JSON
{
    "CreatedAt": "2025-05-15T07:07:31.428Z",
    "ID": "682880baaf653727786b618f",
    "Status": "in-progress",
    "Switches": [
        {
            "Address": "<switch-IP-Address-or-hostname>",
            "CurrentStep": "Step 6: Wait configured request",
            "StartedAt": "2025-05-15T07:07:31.485Z",
            "Status": "in-progress",
            "StatusDetails": "Step 6: Wait configured request: sent to switch-gateway.",
            "UpdatedAt": "2025-05-15T07:09:17.31Z"
        }
    ],
    "UpdatedAt": "2025-05-15T07:09:17.31Z"
}

Step 7: Register NMX Telemetry

NMX Manager initiates registration of nmx-telemetry over a secure gRPC connection. This is also tracked and processed by backend components.

JSON
{
    "CreatedAt": "2025-05-15T07:07:31.428Z",
    "ID": "682880baaf653727786b618f",
    "Status": "in-progress",
    "Switches": [
        {
            "Address": "<switch-IP-Address-or-hostname>",
            "CurrentStep": "Registration request",
            "StartedAt": "2025-05-15T07:07:31.485Z",
            "Status": "in-progress",
            "StatusDetails": "Step 7: NMX Telemetry Registration request: sent to inventory.",
            "UpdatedAt": "2025-05-15T07:09:55.687Z"
        }
    ],
    "UpdatedAt": "2025-05-15T07:09:55.687Z"
}
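
In the sample responses above, `StatusDetails` begins with `Step N:` for every step, whereas `CurrentStep` omits the prefix at Step 7. A client that wants a numeric progress indicator could therefore parse `StatusDetails`. The sketch below assumes that prefix format holds in general, which the source does not guarantee; the function name is hypothetical.

```python
import re

TOTAL_STEPS = 7  # the workflow above lists Steps 1 through 7
STEP_PATTERN = re.compile(r"^Step (\d+): (.*)$")


def parse_step(status_details):
    """Return (step_number, description), or None if there is no 'Step N:' prefix."""
    match = STEP_PATTERN.match(status_details)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


parsed = parse_step("Step 7: NMX Telemetry Registration request: sent to inventory.")
if parsed is not None:
    number, description = parsed
    print(f"step {number} of {TOTAL_STEPS}: {description}")
```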

Final Result: Registration Completed

If successful, both nmx-controller and nmx-telemetry are registered, and their ObjectIDs are returned.

JSON
{
    "CreatedAt": "2025-05-15T07:18:06.216Z",
    "ID": "682880baaf653727786b618f",
    "Status": "completed",
    "Switches": [
        {
            "Address": "<switch-IP-Address-or-hostname>",
            "CurrentStep": "Registration response",
            "NMX-Controller-ID": "682595b7799bc550eec18a77",
            "NMX-Telemetry-ID": "682595b8799bc550eec18a78",
            "StartedAt": "2025-05-15T07:18:06.298Z",
            "Status": "completed",
            "StatusDetails": "bring-up completed successfully",
            "UpdatedAt": "2025-05-15T07:20:25.754Z"
        }
    ],
    "UpdatedAt": "2025-05-15T07:20:25.761Z"
}
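
The returned ObjectIDs can be collected per switch for later use. This is a minimal sketch based on the field names in the response above (`NMX-Controller-ID`, `NMX-Telemetry-ID`); the function name and the address `10.0.0.5` are placeholders.

```python
def registered_ids(operation):
    """Map each completed switch address to its (controller ID, telemetry ID) pair."""
    ids = {}
    for switch in operation.get("Switches", []):
        if switch.get("Status") == "completed":
            ids[switch["Address"]] = (
                switch.get("NMX-Controller-ID"),
                switch.get("NMX-Telemetry-ID"),
            )
    return ids


completed_operation = {
    "Status": "completed",
    "Switches": [
        {
            "Address": "10.0.0.5",
            "NMX-Controller-ID": "682595b7799bc550eec18a77",
            "NMX-Telemetry-ID": "682595b8799bc550eec18a78",
            "Status": "completed",
        }
    ],
}
print(registered_ids(completed_operation))
```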

Operational Considerations

Asynchronous Processing
  • All POST requests return HTTP 202 Accepted with an operationId and a Location header.

  • Operations continue until each sub-task (per switch) reaches a terminal state: failed or completed.

  • Sub-tasks progress independently and in parallel.
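
Since each switch is an independent sub-task, a client may want to group switches by outcome rather than rely on the operation-level status alone. The sketch below treats `completed` and `failed` as terminal, as stated above, and counts every other status as still active. The function names are hypothetical.

```python
TERMINAL_STATES = {"completed", "failed"}


def partition_switches(operation):
    """Group switch addresses into completed, failed, and still-active sub-tasks."""
    groups = {"completed": [], "failed": [], "active": []}
    for switch in operation.get("Switches", []):
        status = switch.get("Status")
        key = status if status in TERMINAL_STATES else "active"
        groups[key].append(switch["Address"])
    return groups


def all_subtasks_finished(operation):
    """True once no sub-task remains in a non-terminal state."""
    return not partition_switches(operation)["active"]


mixed = {
    "Switches": [
        {"Address": "a", "Status": "completed"},
        {"Address": "b", "Status": "in-progress"},
        {"Address": "c", "Status": "failed"},
        {"Address": "d", "Status": "pending"},
    ]
}
print(partition_switches(mixed), all_subtasks_finished(mixed))
```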

Cancellation
  • Bring-up operations cannot be canceled once started.

Timeouts
  • If no progress is made within 10 minutes (configurable), the operation is marked as failed.

  • As long as one sub-task is progressing, the operation continues. Progress is indicated by the UpdatedAt field.
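
A monitoring client can mirror this rule to warn before the server-side timeout fires. The following sketch compares the most recent `UpdatedAt` among non-terminal sub-tasks against a limit. It is a client-side approximation under the assumption that the 10-minute default is in effect, not the NMX Manager's own logic. The timestamp parser handles the variable-length fractional seconds seen in the sample responses (for example `.62Z`).

```python
from datetime import datetime, timedelta, timezone

TERMINAL_STATES = {"completed", "failed"}


def parse_timestamp(text):
    """Parse timestamps such as '2025-05-15T07:07:59.62Z' into aware UTC datetimes."""
    base, _, fraction = text.rstrip("Z").partition(".")
    moment = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    # Pad or trim the fraction to microseconds: "62" becomes 620000.
    microseconds = int((fraction + "000000")[:6]) if fraction else 0
    return moment + timedelta(microseconds=microseconds)


def is_stalled(operation, now, limit=timedelta(minutes=10)):
    """True if no active sub-task has updated within the limit."""
    stamps = [
        parse_timestamp(switch["UpdatedAt"])
        for switch in operation.get("Switches", [])
        if switch.get("Status") not in TERMINAL_STATES
    ]
    if not stamps:
        return False  # nothing is active, so nothing can stall
    return now - max(stamps) > limit


running = {
    "Switches": [
        {"Address": "a", "Status": "in-progress", "UpdatedAt": "2025-05-15T07:07:59.62Z"},
        {"Address": "b", "Status": "in-progress", "UpdatedAt": "2025-05-15T07:05:00.000Z"},
    ]
}
print(is_stalled(running, datetime(2025, 5, 15, 7, 18, 0, tzinfo=timezone.utc)))
```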

Troubleshooting

If bring-up fails, perform a cleanup:

  1. SSH into the switch. 

  2. Retrieve existing certificates: 

    nv show system security ca-certificate
    nv show system security certificate

  3. Delete NMX-added certificates: 

    nv action delete system security certificate <certificate_name>
    nv action delete system security ca-certificate <ca_certificate_name>

  4. Remove SDN config: 

    nv action delete sdn config apps nmx-controller type fm_config files fm_config.cfg

  5. Reset SDN configuration: 

    nv action reset sdn factory-default

  6. Disable cluster state: 

    nv set cluster state disabled
    nv config apply

  7. Clean up certificate and config files: 

    rm /tmp/cert.p12 /tmp/ca-cert.crt /tmp/fm_config.cfg
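
When several switches need the same cleanup, the command sequence in steps 3 to 7 can be generated per switch and reviewed before it is run. The sketch below only assembles the commands listed above as strings; it does not open an SSH session or execute anything. The certificate names are placeholders that must come from the output of step 2, and the function name is hypothetical.

```python
import shlex


def cleanup_commands(certificates, ca_certificates):
    """Return the cleanup commands from the steps above, in order, as strings."""
    commands = []
    for name in certificates:
        commands.append(
            f"nv action delete system security certificate {shlex.quote(name)}")
    for name in ca_certificates:
        commands.append(
            f"nv action delete system security ca-certificate {shlex.quote(name)}")
    commands += [
        "nv action delete sdn config apps nmx-controller type fm_config files fm_config.cfg",
        "nv action reset sdn factory-default",
        "nv set cluster state disabled",
        "nv config apply",
        "rm /tmp/cert.p12 /tmp/ca-cert.crt /tmp/fm_config.cfg",
    ]
    return commands


# Hypothetical certificate names, for illustration only.
commands = cleanup_commands(["nmx-cert"], ["nmx-ca"])
for command in commands:
    print(command)
```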
    

