The Point-to-Point Excel file centralizes all the physical information of the project and explicitly describes how to connect each cable. For the list of supported cables, see NVIDIA LinkX Cables and Transceivers.
To create the Excel file:
-
Open an Excel file (you are welcome to download and use this template file: https://content.mellanox.com/PTP Template/ptp-example.xls).
-
Create 2 sheets as explained below:
-
-
Legend – describes basic properties for each element of the cluster. Each element should include the following properties:
-
Name – describes the naming convention for each element. Best practice is to use the element's base name with a * before and after it (for example, *dgx*).
-
Model – element model
The “Model” is the “device format” as described in “/usr/share/ibdm2.1.1/ibnl”. If the model used is not part of the supported list, create a new one as follows:
-
Switch/HCA – whether the element is a switch or an HCA
-
Speed – element speed
-
Comments – general comments
NDR Example:
| Name | Model | Switch/HCA | Speed | Comments |
| --- | --- | --- | --- | --- |
| *dgx* | HCA_12 | hca | 4x-100G | NDR |
| *clf* | MQM9700 | switch | 4x-100G | NDR |
| *csp* | MQM9700 | switch | 4x-100G | NDR |
XDR Example:
| Name | Model | Switch/HCA | Speed | Comments |
| --- | --- | --- | --- | --- |
| *dgx* | HCA_12 | hca | 4x-200G | XDR |
| *clf* | Q3400-RA | switch | 4x-200G | XDR |
| *csp* | Q3400-RA | switch | 4x-200G | XDR |
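To illustrate how a Legend row relates to a concrete device name, here is a minimal Python sketch. It assumes that the * in the Name column behaves like a shell-style wildcard; the `LegendEntry` class and the `find_legend_entry` helper are hypothetical names introduced only for this illustration and are not part of any NVIDIA tool.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class LegendEntry:
    name: str      # naming pattern, e.g. "*dgx*"
    model: str     # element model
    kind: str      # "hca" or "switch"
    speed: str     # element speed
    comments: str  # general comments


# Rows copied from the NDR example.
NDR_LEGEND = [
    LegendEntry("*dgx*", "HCA_12", "hca", "4x-100G", "NDR"),
    LegendEntry("*clf*", "MQM9700", "switch", "4x-100G", "NDR"),
    LegendEntry("*csp*", "MQM9700", "switch", "4x-100G", "NDR"),
]


def find_legend_entry(device_name: str, legend: list) -> Optional[LegendEntry]:
    """Return the first Legend row whose pattern matches device_name, else None.

    Assumption: "*" is treated as a shell-style wildcard (case-sensitive).
    """
    for entry in legend:
        if fnmatchcase(device_name, entry.name):
            return entry
    return None
```

Under that wildcard assumption, a device named cl02s01dgx01 would resolve to the *dgx* row, and a name that matches no pattern would return `None`, which is a cheap way to spot devices that are missing from the Legend sheet.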
-
-
PTP – explicitly describes how to connect each cable. The table has two main parts, Source and Destination, which contain mostly the same columns. Each line should include the following for each end of the cable:
-
Rack – device rack
-
U – device location in the rack
-
Name – name of the device (must comply with the naming convention specified for the device type in the Legend sheet)
-
HCA/port – HCA name and port (in the Destination part, port only)
| Source Rack | Source U | Source Name | Source HCA/port | Destination Rack | Destination U | Destination Name | Destination Port |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SU1-1 A22 | 3 | cl02s01dgx01 | 1 | Leaves SU1 A38 | 25 | cl02s01clf01 | 1 |
| SU1-1 A22 | 3 | cl02s01dgx01 | 2 | Leaves SU1 A38 | 27 | cl02s01clf02 | 1 |
| SU1-1 A22 | 3 | cl02s01dgx01 | 3 | Leaves SU1 A38 | 29 | cl02s01clf03 | 1 |
| SU1-1 A22 | 3 | cl02s01dgx01 | 4 | Leaves SU1 A38 | 31 | cl02s01clf04 | 1 |
| SU1-1 A22 | 3 | cl02s01dgx01 | 5 | Leaves SU1 A38 | 33 | cl02s01clf05 | 1 |
| SU1-1 A22 | 3 | cl02s01dgx01 | 6 | Leaves SU1 A38 | 35 | cl02s01clf06 | 1 |
| SU1-1 A22 | 3 | cl02s01dgx01 | 7 | Leaves SU1 A38 | 37 | cl02s01clf07 | 1 |
| SU1-1 A22 | 3 | cl02s01dgx01 | 8 | Leaves SU1 A38 | 39 | cl02s01clf08 | 1 |
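As a rough illustration of the PTP rows, the following Python sketch models the Name and port columns of the eight example cables and checks that no device port is used by more than one cable. The `Link` type and the `reused_endpoints` helper are hypothetical, the Rack and U columns are left out for brevity, and treating ports as integers is an assumption based on the example values.

```python
from collections import Counter
from typing import NamedTuple


class Link(NamedTuple):
    src_name: str  # Source "Name" column
    src_port: int  # Source "HCA/port" column (integers assumed, as in the example)
    dst_name: str  # Destination "Name" column
    dst_port: int  # Destination "Port" column (physical switch port)


# Name and port columns of the eight example rows; Rack and U are omitted.
LINKS = [Link("cl02s01dgx01", i, f"cl02s01clf{i:02d}", 1) for i in range(1, 9)]


def reused_endpoints(links):
    """Return the (device, port) pairs that appear on more than one cable."""
    counts = Counter()
    for link in links:
        counts[(link.src_name, link.src_port)] += 1
        counts[(link.dst_name, link.dst_port)] += 1
    return sorted(endpoint for endpoint, n in counts.items() if n > 1)
```

For the example data, `reused_endpoints(LINKS)` returns an empty list. A non-empty result would suggest a copy-paste mistake in the sheet, since one physical port cannot terminate two cables.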
-
NOTES:
-
The destination device should always be a switch (HCAs should always be specified in the Source part).
-
For switches, use real (physical) port numbers.
-
HCA ports can be named or enumerated as you wish, but you must verify that there is a proper mapping from the HCA port enumeration to the real HCA interface names (this mapping is referred to on the next step's page).
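The first two notes can be turned into a simple per-row check. The sketch below is only an illustration: `LEGEND_KINDS`, `element_kind`, and `check_row` are hypothetical names, the * in the Legend names is assumed to act as a shell-style wildcard, and physical switch ports are assumed to be numbered from 1.

```python
from fnmatch import fnmatchcase

# Naming pattern -> element type, taken from the Legend examples.
LEGEND_KINDS = {"*dgx*": "hca", "*clf*": "switch", "*csp*": "switch"}


def element_kind(name):
    """Return "hca", "switch", or None if no Legend pattern matches."""
    for pattern, kind in LEGEND_KINDS.items():
        if fnmatchcase(name, pattern):
            return kind
    return None


def check_row(src_name, dst_name, dst_port):
    """Return a list of problems found in one PTP row (empty if none)."""
    problems = []
    if element_kind(src_name) is None:
        problems.append(f"source {src_name!r} matches no Legend entry")
    if element_kind(dst_name) != "switch":
        problems.append(f"destination {dst_name!r} is not a switch")
    # Assumption: physical switch ports are positive integers.
    if not (isinstance(dst_port, int) and dst_port >= 1):
        problems.append(f"destination port {dst_port!r} is not a physical port number")
    return problems
```

For instance, `check_row("cl02s01dgx01", "cl02s01clf01", 1)` reports no problems, while swapping the source and destination names flags the row because the destination is then an HCA.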
In the provided examples, the element name *dgx* denotes the device with the identifier cl02s01dgx01.
Make sure that names are clear and meaningful, and that each name describes the element, its role, and its location in both the topology and the cluster.