NMX Telemetry (NMX-T) Documentation

Prometheus Metrics Endpoint

NMX Telemetry provides an HTTP endpoint for seamless integration with monitoring systems that operate in poll mode and support Prometheus, CSV, or JSON data formats. The endpoint only returns the most recent data sample, and users cannot access statistics for past time points.

curl --silent  http://0.0.0.0:9352/xcset/nvlink_domain_telemetry

The Prometheus interface port is defined in Interfaces.

The Prometheus interface security configuration is handled in Interface Configuration.

Data Formatting

By default, the metrics endpoint provides data in Prometheus format. It can also render data as CSV or JSON, which may be more convenient for some consumers and can reduce payload size. The rendering format is selected with the csv and json path prefixes.

Get metrics as comma-separated values:

curl --silent  http://0.0.0.0:9352/csv/xcset/nvlink_domain_telemetry

Get metrics as JSON objects:

curl --silent  http://0.0.0.0:9352/json/xcset/nvlink_domain_telemetry
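When the endpoint is scraped from a script, the format prefixes can be wrapped in a small helper. The following Python sketch only builds the request URLs. The function name metrics_url is hypothetical, and the host and port are simply the ones used in the curl examples above.

```python
# Hypothetical helper: builds endpoint URLs for the three rendering formats.
BASE = "http://0.0.0.0:9352"  # host and port as used in the curl examples


def metrics_url(path="xcset/nvlink_domain_telemetry", fmt=None, base=BASE):
    """Return the URL for `path`, optionally prefixed with 'csv' or 'json'."""
    if fmt not in (None, "csv", "json"):
        raise ValueError("fmt must be None (Prometheus), 'csv', or 'json'")
    prefix = f"/{fmt}" if fmt else ""
    return f"{base}{prefix}/{path}"


if __name__ == "__main__":
    for fmt in (None, "csv", "json"):
        print(metrics_url(fmt=fmt))
```

Leaving fmt unset yields the default Prometheus rendering.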

Metrics Selection

An HTTP endpoint can deliver all sampled data via the default /xcset/nvlink_domain_telemetry URL.

If no URL prefix is specified, the filter file will be searched in both the cset and fset folders. If both contain files with the same name, both filters will be applied.

Counter Sets (cset)

A cset file contains one token expression per line, used to filter data records with "type"="counters".

# List of available counters:
node_guid
port_guid
port_num
lid
link_down_counter
link_error_recovery_counter
symbol_error_counter
port_rcv_remote_physical_errors
port_rcv_errors
port_xmit_discard
port_rcv_switch_relay_errors
excessive_buffer_errors
...

Tokens are the actual name 'fragments' to be matched:

  • port$: Matches names that end with the token "port".

  • ^port: Matches names that start with the token "port".

  • ^port$: Matches names that are exactly "port".

  • port+xmit: Matches names that contain both the tokens "port" and "xmit".

  • port-support: Matches names that contain the token "port" but exclude those with the token "support".

  • -port: Excludes names that contain the token "port".

To disable counter export, insert a single-line token that doesn't match anything.
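The token rules above can be illustrated with a short Python sketch. It is an interpretation of the documented rules for a single token expression, not the NMX-T implementation. The function names are hypothetical, and the way several lines of a cset file combine is not modeled here.

```python
import re


def _token_matches(token, name):
    """Match one token, honoring the ^ (start) and $ (end) anchors."""
    starts, ends = token.startswith("^"), token.endswith("$")
    core = token.lstrip("^").rstrip("$")
    if starts and ends:
        return name == core
    if starts:
        return name.startswith(core)
    if ends:
        return name.endswith(core)
    return core in name


def matches(expr, name):
    """True if `name` passes a token expression such as 'port+xmit-support'."""
    include, exclude = [], []
    # A leading '-' marks an excluded token; '+' or no sign marks a required one.
    for sign, token in re.findall(r"([+-]?)([^+-]+)", expr):
        (exclude if sign == "-" else include).append(token)
    return (all(_token_matches(t, name) for t in include)
            and not any(_token_matches(t, name) for t in exclude))


if __name__ == "__main__":
    counters = ["port_guid", "port_num", "lid", "port_rcv_errors",
                "port_xmit_discard", "port_rcv_switch_relay_errors"]
    print([c for c in counters if matches("port+rcv-switch", c)])
```

With the counter names used in the usage lines, the expression port+rcv-switch keeps port_rcv_errors and drops port_rcv_switch_relay_errors.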

Field Sets (fset)

Fset consists of multiple blocks, each beginning with a header line in the format [event_type_name], followed by tokens under that header. The Fset file is used to filter data with "type"="events". Event type names can be prefixed to apply the same tokens to all matching types. For example, to filter all ethtool events, use [ethtool_event_*].

[type_name_1]
tokens
[type_name_2]
tokens
[type_name_3]
tokens
...

Tokens are the actual name 'fragments' to be matched:

  • port$: Matches names ending with the token "port".

  • ^port: Matches names starting with the token "port".

  • ^port$: Matches names that are exactly "port".

  • port+xmit: Matches names containing both the tokens "port" and "xmit".

  • port-support: Matches names containing the token "port" but excluding those that also contain the token "support".

  • -port: Excludes all names containing the token "port".

To match multiple tokens simultaneously, use the format "tok1+tok2+tok3". Exclusion tokens are also supported: for example, the line "tok1+tok2-tok3-tok4" selects names that match both tok1 and tok2, while excluding those that match tok3 or tok4.

Meta fields are user-defined additional fields, which come in two types: aliases and new constant fields.

  • Aliases: Add the data from the field "exact_name" to the meta fields of the record under the new name "alias_name". Each field can have only one alias. Aliases match only exact names and will appear in the data record, even if the field is disabled by the fset. Example:

meta_field_alias:exact_name=alias_name

  • Constants: Add a new field called "new_field_name" with the constant data string "constant_value" to the meta fields. Field names must be unique. Example:

meta_field_add:new_field_name=constant_value

The following example will export all "switch_fan" events and "CableInfo" events filtered by the token "port":

[switch_fan]

[CableInfo]
port


To see which event type names are available, refer to the NVL5 Metrics Schema.

Corner Cases

  • An empty fset file will export all events.

  • Tokens that appear before the first [event_type] header, or without any header, will be ignored.

  • If the fset file cannot be opened, a warning will be displayed, and all event types will be exported.
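The block structure of an fset file and the corner cases above can be sketched as a small Python parser. This is an illustration under stated assumptions, not the NMX-T parser: the function names are hypothetical, a header ending in "*" is treated as a prefix match, and meta field lines are not handled specially.

```python
def parse_fset(text):
    """Parse fset text into {header_name: [token lines]}."""
    blocks, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A header line opens a new block; later tokens belong to it.
            current = blocks.setdefault(line[1:-1], [])
        elif current is not None:
            current.append(line)
        # Tokens before the first header are ignored.
    return blocks


def tokens_for(event_type, blocks):
    """Return the tokens of the first header matching `event_type`, else None.

    Assumption: a header ending in '*' matches every type with that prefix.
    """
    for header, tokens in blocks.items():
        if header == event_type or (
            header.endswith("*") and event_type.startswith(header[:-1])
        ):
            return tokens
    return None


if __name__ == "__main__":
    print(parse_fset("[switch_fan]\n\n[CableInfo]\nport\n"))
```

An empty result from parse_fset corresponds to the empty-file case, in which all events are exported.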

Both events and counters can be extended with aliased fields and new constant fields.

  • “meta_field_aliases:exact_name=alias_name” will add a new field or counter with the name “alias_name” and copy the value from the existing field or counter “exact_name”.

  • “meta_field_add:new_name=constant_value” will add a new field or counter with the name “new_name” and the value “constant_value”.

New fields must have unique names; otherwise, they will be ignored.

Extended Counter Sets

The HTTP server offers an optional Extended Counter Set (xcset) selection mechanism in addition to the counter set (cset) and field set (fset) filtering. An Extended Counter Set produces an output record that combines data from "counters" and "events" data records with the same index, typically the guid/port_num in the context of NMX Telemetry. To define an extended counter set, place a file or group of files with the .xcset extension in the designated directory or alongside the existing field or counter sets.

Each line of the file may include:

  • Selection of a counter with an optional alias in the format “counter[=alias]”

  • Selection of a type's field with an optional alias in the format “type.field[=alias]”

  • Reference to another file to be included in the format “file.xcset”

Extended counter set files are searched in the same directory as the source xcset.

Aliases are optional, but if provided, they will be used to name the selected counter or field in the output. Empty lines and comments (starting with "#") are ignored.
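The three kinds of xcset lines can be told apart with a few string operations. The Python sketch below is a hypothetical classifier written from the format description above. It is not the NMX-T parser, and it ignores anything that follows a semicolon on the line.

```python
def parse_selection(line):
    """Classify one xcset line; returns None for blank lines and comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    if line.endswith(".xcset"):
        # Reference to another extended counter set file.
        return {"kind": "include", "file": line}
    selector = line.split(";", 1)[0]  # anything after ';' is ignored here
    name, _, alias = selector.partition("=")
    if "." in name:
        event_type, field = name.split(".", 1)
        return {"kind": "field", "type": event_type, "field": field,
                "alias": alias or None}
    return {"kind": "counter", "counter": name, "alias": alias or None}


if __name__ == "__main__":
    for text in ("# comment", "common.xcset", "port_guid",
                 "CableInfo.cable_vendor=cable_vendor"):
        print(repr(text), "->", parse_selection(text))
```

The file name common.xcset in the usage lines is made up for illustration.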

Rendering Hints

Extended counter sets support rendering hints to modify the attribution and representation of metric values.

These hints are provided as a comma-separated list of key=value pairs, placed after the field selection line, following the semicolon (;) character.

counter[=alias];key[=value][,key[=value]]*

For example:

port_guid;label,hex,default=undefined
hw_port_state;lookup=printable_port_states
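To make the hint syntax concrete, the following Python sketch splits a selection line into its selector and a dictionary of hints. The function name parse_hints is hypothetical, and the sketch assumes hint values contain no commas.

```python
def parse_hints(line):
    """Split 'selector;key[=value],...' into (selector, {key: value}).

    Hints given without '=' map to None; 'key=' maps to an empty string.
    """
    selector, _, hint_text = line.strip().partition(";")
    hints = {}
    for item in filter(None, hint_text.split(",")):
        key, sep, value = item.partition("=")
        hints[key.strip()] = value.strip() if sep else None
    return selector, hints


if __name__ == "__main__":
    print(parse_hints("port_guid;label,hex,default=undefined"))
    print(parse_hints("hw_port_state;lookup=printable_port_states"))
```

Keeping a bare key (None) distinct from an empty value ("") preserves the difference between a flag such as hex and an explicitly empty setting such as lookup=.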

Supported rendering hints are the following:

| Key | Value | Description | Example |
|---|---|---|---|
| hex | n/a | Renders a numeric value in hexadecimal. | port_num;hex |
| label | n/a | Marks the field as a Prometheus label. | host_name;label |
| default | value | Sets the string to render when data for the field is missing. | temperature;default=unknown |
| const | value | Adds the marked field to the output with the given constant value. | context;const=oberon |
| lookup | name | Uses the named lookup table to replace the value when rendering. | hw_port_state;lookup=printable_port_states |

Value Lookups

Extended counter sets support value replacement using lookup tables.

One or more lookup tables can be defined separately or as part of the xcset file. The location of the lookup table is the same for all xcset files.

Lookup Table Definition 


| Name | Required/Optional | Type | Description |
|---|---|---|---|
| lookup | Required | Keyword | Declares a lookup element. |
| mask | Optional | Keyword | Declares the key value as a mask. If not present, the value is matched exactly. |
| name | Required | string | Field name. |
| key | Required | unsigned long long | The original value to be replaced. |
| value | Optional | string | The string that replaces the key. If not present, the original value is shown. |

Example lookup definitions:

lookup:link_speed_active:0:  UNKNOWN
lookup:link_speed_active:1:  SDR
lookup:link_speed_active:2:  DDR

lookup:CableInfo.cable_vendor:1:Oth

lookup:mask:fastRecoveryOverFlow:1:  num_errors
lookup:mask:fastRecoveryOverFlow:32: consecutive_normal

Lookup Value Usage 

| Using | Example | Description |
|---|---|---|
| implicit | CableInfo.cable_vendor=cable_vendor | Value 1 will be replaced by Oth. The lookup key should be equal to the field name. |
| implicit | fastRecoveryOverFlow=fastRecoveryOverFlow | Value 1 will be replaced by num_errors. Value 33 will be replaced by num_errors,consecutive_normal. |
| explicit in xcset | CableInfo.cable_vendor=cable_vendor;lookup= | Disable lookup for CableInfo.cable_vendor. |
| explicit in xcset | lookup:hello:1:hello world | Value 1 will be replaced by 'hello world'. |
| | CableInfo.cable_vendor=cable_vendor;lookup=hello | |
| explicit in xcset | CableInfo.cable_vendor=cable_vendor;lookup=vendor | Value 1 will be 1. |
| | lookup:vendor:1: | |
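The exact and mask lookup behavior described above can be reproduced with a short Python sketch. It is an interpretation of the documented examples, not the NMX-T implementation, and the function names parse_lookups and render are hypothetical. For a mask table, the sketch joins the strings of all keys whose bits are set in the value, which agrees with the example where 33 becomes num_errors,consecutive_normal.

```python
def parse_lookups(lines):
    """Build {name: {"mask": bool, "entries": {key: value}}} from definitions."""
    tables = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("lookup:"):
            continue
        rest = line[len("lookup:"):]
        is_mask = rest.startswith("mask:")
        if is_mask:
            rest = rest[len("mask:"):]
        parts = rest.split(":", 2)
        name, key = parts[0], int(parts[1])
        value = parts[2].strip() if len(parts) > 2 else ""
        table = tables.setdefault(name, {"mask": is_mask, "entries": {}})
        table["entries"][key] = value
    return tables


def render(tables, name, raw):
    """Return the replacement string for `raw`, or str(raw) if none applies."""
    table = tables.get(name)
    if table is None:
        return str(raw)
    if table["mask"]:
        # Collect the strings of every key whose bits are set in the value.
        hits = [v for k, v in sorted(table["entries"].items()) if raw & k and v]
        return ",".join(hits) if hits else str(raw)
    # An empty or missing replacement falls back to the original value.
    return table["entries"].get(raw) or str(raw)


if __name__ == "__main__":
    tables = parse_lookups([
        "lookup:CableInfo.cable_vendor:1:Oth",
        "lookup:mask:fastRecoveryOverFlow:1:  num_errors",
        "lookup:mask:fastRecoveryOverFlow:32: consecutive_normal",
    ])
    print(render(tables, "CableInfo.cable_vendor", 1))
    print(render(tables, "fastRecoveryOverFlow", 33))
```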

Output result in Prometheus without lookup:

hw_port_state{hca="mlx5_2"}  1 1716905830122
hw_port_state{hca="mlx5_2"}  2 1716905830122

Output in each format without lookup:

Prometheus:

hw_port_state{hca="mlx5_2"}  1 100500
rx_bytes{hca="mlx5_2"}  100 100700
hw_port_state{hca="mlx5_2"}  2 100500
rx_bytes{hca="mlx5_2"}  150 100700

CSV:

timestamp,hca,hw_port_state,rx_bytes
100500,mlx5_2,1,100
100700,mlx5_2,2,150

JSON:

{"timestamp": 100500, "hca": "mlx5_2", "hw_port_state": 1, "rx_bytes": 100},
{"timestamp": 100700, "hca": "mlx5_2", "hw_port_state": 2, "rx_bytes": 150},

Output with the following lookup:

lookup:hw_port_state:1:Active

String as label: false

Prometheus:

rx_bytes{hca="mlx5_2"}  100 100700
rx_bytes{hca="mlx5_2"}  150 100700

CSV:

timestamp,hca,hw_port_state,rx_bytes
100500,mlx5_2,Active,100
100700,mlx5_2,2,150

JSON:

{"timestamp": 100500, "hca": "mlx5_2", "hw_port_state": "Active", "rx_bytes": 100},
{"timestamp": 100700, "hca": "mlx5_2", "hw_port_state": "2", "rx_bytes": 150}

String as label: true

Prometheus:

rx_bytes{hca="mlx5_2", hw_port_state="Active"}  100 100500
rx_bytes{hca="mlx5_2", hw_port_state="2"}  150 100700

CSV:

timestamp,hca,hw_port_state,rx_bytes
100500,mlx5_2,Active,100
100700,mlx5_2,2,150

JSON:

{"timestamp": 100500, "hca": "mlx5_2", "hw_port_state": "Active", "rx_bytes": 100},
{"timestamp": 100700, "hca": "mlx5_2", "hw_port_state": "2","rx_bytes": 150}
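A consumer that reads the Prometheus rendering may want to split each sample line into its parts. The Python sketch below is a minimal parser for lines shaped like the examples above, written for illustration only. The function name parse_sample is hypothetical, and escaped quotes inside label values are not handled.

```python
import re

# name, optional {labels}, value, optional integer timestamp
_SAMPLE = re.compile(r'^(\w+)(?:\{(.*)\})?\s+(\S+)(?:\s+(\d+))?$')
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line):
    """Parse 'name{labels} value [timestamp]' into a 4-tuple."""
    m = _SAMPLE.match(line.strip())
    if m is None:
        raise ValueError(f"not a sample line: {line!r}")
    name, labels, value, ts = m.groups()
    return (name, dict(_LABEL.findall(labels or "")), float(value),
            int(ts) if ts else None)


if __name__ == "__main__":
    print(parse_sample('rx_bytes{hca="mlx5_2", hw_port_state="Active"}  100 100500'))
```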

Data Filtering

The NMX Telemetry Prometheus endpoint offers data filtering capabilities to control the selection of metrics it outputs.

Filter operations and operands are provided as HTTP query string parameters. Multiple filters in a single HTTP request are combined using a logical AND.

The general format of the filter query parameter is "<field-name><operation><operands>", where:

  • field-name: The name of the counter or event field to which the operation applies.

  • operation: One of the operations from the list below.

  • operands: One or more operands used to evaluate the filter.

For example:

curl --silent "http://0.0.0.0:9352/xcset/nvlink_domain_telemetry?guid__eq__100500"

Supported filters are:

| Operation | Description | Applies to the field of type | Example |
|---|---|---|---|
| eq | Metrics value is equal to the given operand | floating point, decimal, string | xmit_rate__eq__10000 |
| ne | Metrics value is not equal to the given operand | floating point, decimal, string | xmit_rate__ne__10000 |
| gt | Greater than the given operand | floating point, decimal, string | xmit_rate__gt__10000 |
| lt | Less than the given operand | floating point, decimal, string | xmit_rate__lt__10000 |
| ge | Greater than or equal to the given operand | floating point, decimal, string | xmit_rate__ge__10000 |
| le | Less than or equal to the given operand | floating point, decimal, string | xmit_rate__le__10000 |
| bitand | Bitwise AND operation | decimal | state__bitand__7 |
| bitor | Bitwise OR operation | decimal | state__bitor__13 |
| in | Metrics value is in the list of given values | floating point, decimal, string | state__in__1__2__3 |
| shard | Applies a hashing function to select shard N out of K possible shards | floating point, decimal, string | port_guid__shard__1__3 |
| contains | String value contains the given value (substring) | string | name__contains__mlx |
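Filter terms follow a regular pattern, so they can be assembled programmatically. The Python sketch below builds terms of the form field__operation__operands and joins several of them with "&". The function names are hypothetical, the base URL is the one from the curl examples, and no escaping of operands is attempted.

```python
# Operation names as listed in the table of supported filters.
OPERATIONS = {"eq", "ne", "gt", "lt", "ge", "le",
              "bitand", "bitor", "in", "shard", "contains"}


def build_filter(field, operation, *operands):
    """Return one filter term such as 'state__in__1__2__3'."""
    if operation not in OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    if not operands:
        raise ValueError("at least one operand is required")
    return "__".join([field, operation, *map(str, operands)])


def filtered_url(base, path, *filters):
    """Join filter terms with '&'; the endpoint combines them with logical AND."""
    query = "&".join(filters)
    return f"{base}/{path}" + (f"?{query}" if query else "")


if __name__ == "__main__":
    print(filtered_url("http://0.0.0.0:9352", "xcset/nvlink_domain_telemetry",
                       build_filter("port_num", "eq", 1),
                       build_filter("state", "in", 1, 2, 3)))
```

When the resulting URL is passed to curl from a shell, it should be quoted so that "&" is not interpreted by the shell.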

Sharding Data Requests

The shard data filter is particularly useful when the metrics scraping load needs to be distributed across time or across multiple consumers. Several examples of sharding queries follow.

  • Sharding of counters and events by the node GUID, serialized as CSV.

    curl -v "http://0.0.0.0:9352/csv/xcset/nvlink_domain_telemetry?num_shards=2&shard=0&sharding_field=node_guid"

  • Sharding, plus filtering by port number.

    curl -v "http://0.0.0.0:9352/csv/xcset/nvlink_domain_telemetry?num_shards=2&shard=0&sharding_field=node_guid&port_num__eq__1"

  • Counter set explicitly selected.

    curl -v "http://0.0.0.0:9352/csv/cset/minimal?num_shards=2&shard=0&sharding_field=node_guid"

  • Field set explicitly selected, sharding by the port number (the field is named “port”).

    curl -v "http://0.0.0.0:9352/csv/fset/low_freq?num_shards=2&shard=0&sharding_field=port"
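To spread scraping across several consumers, each consumer can request its own shard index. The Python sketch below generates one URL per shard using the query parameters shown in the examples above (num_shards, shard, sharding_field). The function name shard_urls and the extra_filters parameter are hypothetical.

```python
from urllib.parse import urlencode


def shard_urls(base, path, num_shards, sharding_field, extra_filters=()):
    """Return one URL per shard index, 0 .. num_shards - 1."""
    urls = []
    for shard in range(num_shards):
        query = urlencode({"num_shards": num_shards, "shard": shard,
                           "sharding_field": sharding_field})
        # Optional filter terms such as 'port_num__eq__1' are appended as-is.
        for term in extra_filters:
            query += f"&{term}"
        urls.append(f"{base}/{path}?{query}")
    return urls


if __name__ == "__main__":
    for url in shard_urls("http://0.0.0.0:9352",
                          "csv/xcset/nvlink_domain_telemetry", 2, "node_guid"):
        print(url)
```

Each consumer would then poll only the URL for its own shard.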
