The NMX-T instance runs a gRPC server that allows clients to retrieve application information and subscribe to telemetry data. The full protobuf definition of the gRPC interface, nmx-telemetry.proto, is located in the ./proto subdirectory of the package installation directory.
service TelemetryService {
  rpc Hello(ClientHello) returns (ServerHello);
  rpc SubscribeTelemetryData(TelemetrySubscription) returns (stream TelemetryData);
}
Interface Security
The gRPC interface can optionally be secured with TLS or mTLS. By default, the gRPC interface runs unsecured. The following security modes are available:
-
disabled - no transport security is enforced
-
tls - TLS encryption is enforced, and the client can verify the trust of the gRPC server
-
mtls - mutual TLS is enforced, and the gRPC server also verifies the trust of each connecting client
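As an illustration of how a client might match these three modes, the following Python sketch uses the grpcio package. The function names, the target address, and the certificate file arguments are hypothetical; how certificates are provisioned is not specified here, so treat this as one possible arrangement rather than the documented procedure.

```python
from typing import Optional

# Credential files each mode needs on the client side. This mapping is an
# assumption based on how TLS and mTLS generally work, not on NMX-T docs.
REQUIRED_FILES = {
    "disabled": (),
    "tls": ("ca_cert",),
    "mtls": ("ca_cert", "client_key", "client_cert"),
}


def missing_files(mode: str, **paths: Optional[str]) -> list:
    """Return the names of credential files the mode needs but were not given."""
    if mode not in REQUIRED_FILES:
        raise ValueError(f"unknown security mode: {mode!r}")
    return [name for name in REQUIRED_FILES[mode] if not paths.get(name)]


def _read(path: str) -> bytes:
    with open(path, "rb") as handle:
        return handle.read()


def open_channel(target, mode="disabled", ca_cert=None, client_key=None, client_cert=None):
    """Open a gRPC channel to `target` (host:port) for the given security mode."""
    import grpc  # imported lazily so the helpers above work without grpcio

    missing = missing_files(mode, ca_cert=ca_cert, client_key=client_key, client_cert=client_cert)
    if missing:
        raise ValueError(f"mode {mode!r} requires: {', '.join(missing)}")
    if mode == "disabled":
        return grpc.insecure_channel(target)
    if mode == "tls":
        # The client verifies the server against the given CA certificate.
        creds = grpc.ssl_channel_credentials(root_certificates=_read(ca_cert))
    else:
        # mtls: the client also presents its own certificate and key.
        creds = grpc.ssl_channel_credentials(
            root_certificates=_read(ca_cert),
            private_key=_read(client_key),
            certificate_chain=_read(client_cert),
        )
    return grpc.secure_channel(target, creds)
```

Keeping the per-mode requirements in one table lets a client report a missing file before it attempts to connect.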
Interface Enabling/Disabling
The gRPC interface can be enabled or disabled. By default, it is enabled.
The nmx-telemetry-grpc-interface parameter in the user_config.json file controls whether the interface is on or off.
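To inspect the current setting without assuming where the parameter sits inside user_config.json or what value type it uses (neither is stated here), a small Python sketch can search the parsed file for the key. The helper names are hypothetical.

```python
import json

PARAM = "nmx-telemetry-grpc-interface"


def find_param(node, key=PARAM):
    """Yield every value stored under `key`, at any nesting depth."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            else:
                yield from find_param(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_param(item, key)


def read_param(path, key=PARAM):
    """Load a JSON config file and return all values found for `key`."""
    with open(path, "r", encoding="utf-8") as handle:
        return list(find_param(json.load(handle), key))
```

Calling read_param("user_config.json") returns an empty list when the parameter is absent from the file.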
Application Information
The "Hello" remote procedure call is used to synchronize the client and server versions, and if needed, enforce version matching and adjust the logic accordingly.
service TelemetryService {
  rpc Hello(ClientHello) returns (ServerHello);
}
Client parameters for the handshake:
message ClientHello {
  string gatewayId = 1;
  ProtoMsgMajorVersion major_version = 2;
  ProtoMsgMinorVersion minor_version = 3;
}
In addition to other application-specific data, the telemetry service returns the application instance and environment identifiers:
-
domain_uuid - environment domain identifier, the unique identifier of the GB200 instance
-
app_uuid - application instance unique identifier
-
app_ver - application version string
Telemetry Data Subscription
The SubscribeTelemetryData remote procedure call enables clients to receive a stream of telemetry data collected by NMX-Telemetry.
service TelemetryService {
  rpc SubscribeTelemetryData(TelemetrySubscription) returns (stream TelemetryData);
}
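A server-streaming RPC such as SubscribeTelemetryData is typically consumed in Python by iterating over the call's return value. The sketch below assumes a stub object generated from nmx-telemetry.proto by the gRPC Python tooling, so that it exposes a SubscribeTelemetryData method; the consume function and its parameters are hypothetical names used for illustration.

```python
def consume(stub, subscription, on_message, limit=None):
    """Iterate over the telemetry stream, passing each message to on_message.

    stub         -- object exposing SubscribeTelemetryData(request) -> iterator
    subscription -- a TelemetrySubscription request message
    limit        -- optional cap on the number of messages to read
    Returns the number of messages handled.
    """
    handled = 0
    if limit is not None and limit <= 0:
        return handled
    for data in stub.SubscribeTelemetryData(subscription):
        on_message(data)
        handled += 1
        if limit is not None and handled >= limit:
            break  # stop reading; the caller may then close the channel
    return handled
```

Passing the stub in as an argument keeps the loop independent of how the channel was created and makes it easy to exercise with a stand-in object.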
Subscription Parameters
The TelemetrySubscription message defines the subscription parameters.
message TelemetrySubscription {
  string data_type = 1; // * | ib_counters | sys_log | gpu_counters
  string source_id = 2;
  string source_tag = 3;
}
Set the parameter values to select the types or sources of data to receive, or leave the values blank to subscribe to all available data.
-
data_type - type of data to subscribe to: an empty string or an asterisk (*) subscribes to all data types, and a comma-separated list of data types gives a fine-grained subscription
-
source_id - identifier of the data source to get data from
-
source_tag - data source tag
Leave all the parameters empty to receive all telemetry data as it is collected, without any filtering or pre-selection.
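The data_type convention above (empty or * for everything, otherwise a comma-separated list) can be captured in two small helpers. This is a client-side Python sketch with hypothetical function names; it mirrors the described semantics and is not the server's implementation.

```python
def build_data_type(types=None):
    """Build the data_type string: empty for all types, else comma-separated."""
    if not types:
        return ""
    return ",".join(t.strip() for t in types)


def selects(data_type_filter, data_type):
    """Return True if a data_type filter string covers the given data type."""
    wanted = [t.strip() for t in data_type_filter.split(",") if t.strip()]
    if not wanted or "*" in wanted:
        return True  # empty string or asterisk subscribes to everything
    return data_type in wanted
```

For example, build_data_type(["ib_counters", "sys_log"]) produces the string to place in the data_type field of a TelemetrySubscription.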
Telemetry Data Response
The telemetry data response includes metadata fields and the actual data payload. The format of the payload may vary depending on the type of data received.
message TelemetryData {
  string aggregator_id = 1;
  string source_id = 2;
  string source_tag = 3;
  string data_type = 4;
  int64 timestamp = 6;
  Encoding encoding_type = 7;
  bytes message = 8;
}
Metadata fields describe the payload:
-
aggregator_id - the unique identifier of the application domain (Oberon domain UUID)
-
data_type - the name of the type of data the payload contains, for example "counters"
-
source_id - identifier of the data source: the device GUID for NVLink telemetry counters, the switch IP and port for gNMI aggregation, or the server IP for syslog message aggregation
-
timestamp - the time at which the message was formed, in microseconds
-
encoding_type - a hint for interpreting the payload, either JSON or BYTES
-
message - the actual data payload, as described in the section below
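As a sketch of how these metadata fields might be used, the Python snippet below converts the microsecond timestamp to a UTC datetime and decodes a JSON payload. It assumes that the timestamp counts microseconds since the Unix epoch (the sample values in this document are consistent with that, but it is not stated explicitly) and that encoding_type is available as the string "JSON". The dataclass is a hypothetical stand-in for the generated TelemetryData message.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass
class Telemetry:
    """Hypothetical stand-in for the generated TelemetryData message."""
    data_type: str
    source_id: str
    timestamp: int        # microseconds (assumed to be since the Unix epoch)
    encoding_type: str    # assumed to be exposed as a name such as "JSON"
    message: bytes


def formed_at(item):
    """Convert the microsecond timestamp to an aware UTC datetime."""
    return EPOCH + timedelta(microseconds=item.timestamp)


def decode(item):
    """Parse JSON payloads; return other encodings as raw bytes."""
    if item.encoding_type == "JSON":
        return json.loads(item.message.decode("utf-8"))
    return item.message
```

Adding a timedelta to the epoch keeps the conversion in integer arithmetic, so no microsecond precision is lost.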
Telemetry Data Payload
For example, a message representing an event of type nvl_packet_types_counters may have the following values:
aggregator_id = b954ce10-be66-4d75-a538-405ac8517c38
data_type = nvl_packet_types_counters
source_id = 0x1070fd030058c216
source_tag = nvlink
Telemetry data, including counters and events, is presented as comma-separated values (CSV) embedded in JSON.
Each JSON object in the payload consists of:
-
timestamp: The time at which the data was collected.
-
fields: A comma-separated list of the data fields contained in the payload.
-
values: A list of strings, each holding one comma-separated row of values that correspond, in order, to the fields.
A message payload of data type nvl_packet_types_counters may look like the following:
[
  {
    "timestamp": 100,
    "fields": "node_guid,port_guid,port_num,port_rcv_ibg1_nvl_pkts,port_rcv_ibg1_non_nvl_pkts,port_rcv_ibg2_pkts,port_xmit_ibg1_nvl_pkts,port_xmit_ibg1_non_nvl_pkts,port_xmit_ibg2_pkts",
    "values": [
      "0x1070fd0300580000,0x1070fd030058c216,9,0,0,0,0,0,0",
      "0x1070fd0300580002,0x1070fd030058c216,9,0,0,0,0,0,0"
    ]
  },
  {
    "timestamp": 200,
    "fields": "node_guid,port_guid,port_num,port_rcv_ibg1_nvl_pkts,port_rcv_ibg1_non_nvl_pkts,port_rcv_ibg2_pkts,port_xmit_ibg1_nvl_pkts,port_xmit_ibg1_non_nvl_pkts,port_xmit_ibg2_pkts",
    "values": [
      "0x1070fd0300580000,0x1070fd030058c216,9,0,0,0,0,0,0",
      "0x1070fd0300580002,0x1070fd030058c216,9,0,0,0,0,0,0"
    ]
  }
]
Another example is the data payload of the "counters" data type:
[
  {
    "timestamp": 1729872473718869,
    "fields": "node_guid,port_guid,port_num,node_description,roundtrip_time_port_counters_extended",
    "values": [
      "0xb83fd20300f9b7dc,0xb83fd20300f9b7dc,1,swx-proton03-bf3-2 HCA-1,,0"
    ]
  }
]
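The CSV-in-JSON layout described above can be expanded into one dictionary per value row. The following Python sketch uses only the standard library; the function name is hypothetical, and values are left as strings because the column types are not specified here.

```python
import json


def parse_payload(raw):
    """Expand a CSV-in-JSON telemetry payload into one dict per value row."""
    records = []
    for block in json.loads(raw):
        fields = block["fields"].split(",")
        for row in block["values"]:
            values = row.split(",")
            if len(values) != len(fields):
                raise ValueError(
                    f"expected {len(fields)} values, got {len(values)}: {row!r}"
                )
            record = dict(zip(fields, values))
            # Attach the block-level timestamp to every row of that block.
            record["timestamp"] = block["timestamp"]
            records.append(record)
    return records
```

A row whose value count differs from the field count raises ValueError rather than being silently misaligned, which leaves the decision on how to handle such rows to the caller.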
gNMI Aggregation Data
The TelemetryData response produced by gNMI aggregation consists of the following:
-
aggregator_id: The unique identifier for the application domain (Oberon domain UUID).
-
data_type: The name of the gNMI subscription.
-
source_id: The address and port of the gNMI target from which the data is being aggregated.
-
timestamp: The time, in microseconds, when the message was formed.
-
encoding_type: A hint for interpreting the payload, which could be either JSON or PROTO.
-
message: The gNMI update response received from the aggregation target, either in its original binary form (encoded in PROTO) or as a JSON representation of the gNMI update message.
For example, a JSON-marshalled gNMI response could look like the following:
{
  "update": {
    "prefix": {
      "elem": [
        {
          "name": "interfaces"
        },
        {
          "key": {
            "name": "fnma1p1"
          },
          "name": "interface"
        }
      ],
      "target": "netq"
    },
    "timestamp": "1729513043599315230",
    "update": [
      {
        "path": {
          "elem": [
            {
              "name": "state"
            },
            {
              "name": "counters"
            },
            {
              "name": "in-octets"
            }
          ]
        },
        "val": {
          "uintVal": "353952"
        }
      }
    ]
  }
}
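A JSON-marshalled gNMI response like the one above is easier to work with once the prefix and each update path are joined into a single path string. The Python sketch below does this with the standard library; the function names are hypothetical, and only the fields visible in the example are handled.

```python
import json


def path_to_str(path):
    """Render a gNMI path dict as 'a/b[key=value]/c'."""
    parts = []
    for elem in path.get("elem", []):
        keys = "".join(
            f"[{k}={v}]" for k, v in sorted(elem.get("key", {}).items())
        )
        parts.append(elem["name"] + keys)
    return "/".join(parts)


def flatten_notification(raw):
    """Return (full_path, value) pairs from a JSON-marshalled gNMI response."""
    notification = json.loads(raw)["update"]
    prefix = path_to_str(notification.get("prefix", {}))
    pairs = []
    for update in notification.get("update", []):
        path = path_to_str(update["path"])
        full = "/".join(p for p in (prefix, path) if p)
        # "val" holds a single typed entry such as {"uintVal": "353952"}.
        kind, value = next(iter(update["val"].items()))
        if kind in ("uintVal", "intVal"):
            value = int(value)  # 64-bit integers arrive as JSON strings
        pairs.append((full, value))
    return pairs
```

Applied to the example response, this yields the path interfaces/interface[name=fnma1p1]/state/counters/in-octets paired with the integer 353952.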
Syslog Aggregation Data
The TelemetryData response produced by syslog collection consists of the following:
-
aggregator_id: The unique identifier for the application domain (Oberon domain UUID).
-
data_type: The value "log_message".
-
source_id: The address and port of the log message's source.
-
source_tag: The name of the process that sent the log message.
-
timestamp: The time, in microseconds, when the message was generated.
-
encoding_type: The encoding format, either JSON or ASCII.
-
message: The syslog message, which may be in its original text form (encoded in BYTES) or a JSON-serialized OpenTelemetry message.
Example:
{
  "time_unix_nano": 1731603557000000000,
  "observed_time_unix_nano": 1731596357165630000,
  "severity_number": 10,
  "severity_text": "notice",
  "body": {
    "Value": {
      "StringValue": "Nov 14 16:59:17 swx-proton04: Hey!"
    }
  },
  "attributes": [
    {
      "key": "facility",
      "value": {
        "Value": {
          "IntValue": 1
        }
      }
    },
    {
      "key": "hostname",
      "value": {
        "Value": {
          "StringValue": "swx-proton04"
        }
      }
    },
    {
      "key": "message",
      "value": {
        "Value": {
          "StringValue": "Hey!"
        }
      }
    },
    {
      "key": "priority",
      "value": {
        "Value": {
          "IntValue": 13
        }
      }
    },
    {
      "key": "appname",
      "value": {
        "Value": {
          "StringValue": "bash"
        }
      }
    }
  ]
}
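The nested Value wrappers in the JSON-serialized log record above can be flattened for easier filtering. The following Python sketch assumes the wrapper shape shown in the example; the function names are hypothetical. It also splits the syslog priority into facility and severity using the standard syslog relation priority = facility * 8 + severity.

```python
import json


def unwrap(value):
    """Extract the scalar from a {"Value": {"StringValue": ...}} wrapper."""
    inner = value["Value"]
    return next(iter(inner.values()))


def flatten_log(raw):
    """Return the log body plus a flat dict of its attributes."""
    record = json.loads(raw)
    attrs = {a["key"]: unwrap(a["value"]) for a in record.get("attributes", [])}
    return {
        "body": unwrap(record["body"]),
        "severity": record.get("severity_text"),
        "attributes": attrs,
    }


def split_priority(priority):
    """Split a syslog priority into (facility, severity)."""
    return divmod(priority, 8)
```

For the example record, the priority 13 splits into facility 1 and severity 5, which agrees with the facility attribute and the "notice" severity text.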