When debugging a system, it is important to quickly identify the root of a problem. The diagnostic commands give insight into the physical-layer components, showing information such as the cable status (plugged/unplugged) or whether Auto-Negotiation has failed.
PHY Firmware Indication (0—1023)
Below is a list of possible output messages:
| Monitor opcode | Detailed Description | Detailed Mitigation |
|---|---|---|
| 0 - No issue observed | | Wait 5 seconds and check again. If the message continues, check peer side. |
| 1 - Port is closed by command | PAOS down command; also used for port shutdown, for example. | Check who sent the command to close the port and reopen it. |
| 2 - AN failure | Both sides did not agree on speed/FEC or DME is missing. | Debug steps: |
| 3 - AN failure | Ack not received. | Not relevant for NDR. |
| 4 - AN failure | Next-page exchange failed. | |
| 5 - Link training failure. | Frame lock not acquired. | |
| 6 - Link training failure. | Link inhibit timeout. | |
| 7 - Link training failure. | Link partner did not set receiver ready. | |
| 8 - Link training failure. | Tuning did not complete. | |
| 9 - Logical mismatch between link partners | Did not acquire block lock. | |
| 10 - Logical mismatch between link partners | Did not acquire AM lock (NO FEC). | |
| 11 - Logical mismatch between link partners | Did not get align_status. AN is done but the signal is not locked. Very rare. | |
| 12 - Logical mismatch between link partners | FC FEC is not locked. | |
| 13 - Logical mismatch between link partners | RS FEC is not locked. | |
| 14 - Remote fault received | | Wait 5 seconds and check again. If the message continues, check peer side. |
| 15 - Bad signal integrity | Low raw BER. Let the link run for a minimum time before checking. | The link is up, but with low raw BER. Steps: |
| 16 - Cable compliance code mismatch (protocol mismatch between cable and port) | | |
| 17 - Bad signal integrity | | Not relevant for NDR. |
| 18, 20 - Internal error | | |
| 19 - Internal error | | |
| 20 - Stamping of non-NVIDIA Cables/Modules | | Replace the cable with an NVIDIA cable. |
| 21 - Down by PortInfo MAD | | Check who sent the command to close the port and reopen it. |
| 22 - Internal error | | Not relevant for the field. |
| 23 - Internal error | Calibration failure. | |
| 24 - EDR speed is not allowed due to cable stamping: EDR stamping | Cable is invalid. | Replace the cable with an NVIDIA cable. |
| 25 - FDR10 speed is not allowed due to cable stamping: FDR10 stamping | | |
| 26 - Port is closed due to cable stamping: Ethernet_compliance_code_zero | | |
| 27 - Port is closed due to cable stamping: 56GE stamping | | |
| 28 - Port is closed due to cable stamping: non-NVIDIA QSFP28 | | |
| 29 - Port is closed due to cable stamping: non-NVIDIA SFP28 | | |
| 30 - Port is closed, no backplane enabled speed over backplane channel | | Check that the port is configured correctly: same speeds, width and FECs, or that AN is fully enabled. |
| 31 - Port is closed, no passive protocol enabled over passive copper channel | | |
| 32 - Port is closed, no active protocol enabled over active channel | | |
| 33 - Port width does not match the port speed enabled | | |
| 34 - Local Speed degradation | | The link is up, but with lower speed than expected. Steps: |
| 35 - Remote Speed degradation | | Review remote side status. |
| 36 - No Partner detected during force mode; 37 - Partial link indication during force mode | | Debug steps: |
| 38 - AN failure | FEC mismatch during override. | |
| 39 - AN failure | No HCD. | |
| 40 | N/A | Not relevant for NDR. |
| 41 - Port is closed, module can't be set to the enabled rate | | |
| 42 - Bad SI, cable is configured to non optimal rate | | Check that the port is configured correctly: same speeds, width and FECs, or that AN is fully enabled. |
| 43 - No Partner Detected in Force Mode and Fast Link Up | | Not relevant for NDR. |
| 44-47 | N/A | |
| 48 - Bad signal integrity | | |
| 49 - Bad signal integrity | | |
| 50 - Internal error | | |
| 51 - HST speed mismatch | | |
| 52 - Bad signal integrity | | The link is up, but with low raw BER. Steps: |
| 53 - Link failure due to MCB at link up | | Wait 10 seconds; if the message reappears, share information from both sides and toggle the link. |
| 54 - PLR didn't get Rx good non sync cell | | |
| 55 - PSI fatal error | | |
| 56 - module_lanes_frequency_not_synced | | Not relevant for NDR. |
| 57 - Signal not detected; 59 - Did not get module conf done | Power detection in the SerDes is not detected. | |
| 58 | N/A | Not relevant for NDR. |
| 128 - Troubleshooting in process | | Wait 3 seconds and run the command again. |
| 1023 - Info not available | | Wait 10 seconds; if the message reappears, share information from both sides and run a power cycle. |
| 1024 - Cable is unplugged | No physical transceiver detected in the cage. | Plug in a transceiver. Also verify that no one ran a command that simulates an unplugged transceiver. |
| 1025 - Long Range for non-Mellanox cable/module. | Long-range non-NVIDIA cables are not supported. | Replace the cable with an NVIDIA cable. |
| 1026 - Bus stuck (I2C Data or clock shorted) | Received failure on the I2C EEPROM communication line. | Reset the transceiver (disable/enable). If the issue continues, collect information and data, then run a power cycle. |
| 1027 - Bad/unsupported EEPROM | Failed to read the EEPROM from the transceiver, or the transceiver ID is not recognized. | Test with another approved transceiver. If the issue continues, collect data and share it. |
| 1028 - Part number list | Transceiver is not permitted by the vendor list. | Replace the cable with a cable from the supported list. |
| 1029 - Unsupported cable. | SFP transceiver is not supported. | |
| 1030 - Module temperature shutdown | Transceiver temperature exceeded the allowed threshold. | Check the cable temperature and cool the environment if it is indeed too hot. |
| 1031 - Shorted cable | Overcurrent received on the transceiver. | Bad transceiver; test with a different transceiver. |
| 1032 - Power budget exceeded | Board power budget has been exceeded. | Review the power supported by the transceiver and the board INI. |
| 1033 - Management forced down the port | Module shutdown by server command. | Review the server commands. |
| 1034 - Module is disabled by command | Transceiver admin status is disabled. | Enable admin status. |
| 1036 - Module's PMD type is not enabled (see PMTPS). | Transceiver type is not supported. | Replace the transceiver. |
| 1037 | N/A | Not relevant for NDR. |
| 1038 | N/A | |
| 1039 | N/A | |
| 1040 - PCIe system power slot exceeded | | |
| 1041 | N/A | |
| 1042 - Module state machine fault | | |
| 1043 - Module's stamping speed degeneration | | |
| 1044 - Module's stamping speed degeneration | HDR speed is not supported. | Replace the cable with an NVIDIA cable. |
| 1045 - Module's stamping speed degeneration | EDR speed is not supported. | |
| 1046 - Module's stamping speed degeneration | FDR10 speed is not supported. | |
| 1047 - Modules DataPath FSM fault | Failed to configure speed (application) by the transceiver. | Wait 10 seconds; if the message reappears, share information from both sides and run a power cycle. |
| 1048 - Modules DataPath FSM fault | | |
| Core/Driver (2048-3071): | | |
| 2048 - MPR Violation (Under 64 bytes between two starts). | | Wait 10 seconds; if the message reappears, share information from both sides and run a power cycle. |
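
The opcode ranges in the table can be used to tell at a glance which component raised a code. The following is a minimal Python sketch of that lookup, not part of the switch software. The range names for 0-1023 and 2048-3071 come from this page; the label for 1024-2047 is an assumption, since the table lists codes 1024-1048 (cable and module conditions) without naming their range. The function name `classify` is hypothetical.

```python
# Ranges named on this page: PHY firmware (0-1023), Core/Driver (2048-3071).
# ASSUMPTION: the 1024-2047 label is illustrative; the page does not name it.
RANGES = [
    (0, 1023, "PHY firmware"),
    (1024, 2047, "module/cable (assumed label)"),
    (2048, 3071, "core/driver"),
]


def classify(code: int) -> str:
    """Return the range label for a link-diagnostics opcode."""
    for low, high, label in RANGES:
        if low <= code <= high:
            return label
    return "unknown"


if __name__ == "__main__":
    for code in (2, 1024, 2048):
        print(code, classify(code))
```

Keeping the ranges in a list rather than in chained `if` statements makes it easy to adjust the assumed middle range if the vendor documentation names it differently.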
Link Diagnostic Commands
show interfaces ib link-diagnostics
|
|
show interfaces ib [device/port] link-diagnostics
Displays link diagnostics for a specific InfiniBand device/port, or for all InfiniBand ports if none is specified. |
|
|
Syntax Description |
N/A |
|
|
Default |
N/A |
|
|
Configuration Mode |
config |
|
|
History |
3.6.4000 |
|
|
Example |
switch (config) # show interfaces ib link-diagnostics
----------------------------------------------------------------------
Interface    Code    Status
----------------------------------------------------------------------
IB1/1        0       The port is Active.
IB1/2        0       The port is Active.
IB1/3        1024    Cable unplugged
IB1/4        1024    Cable unplugged
IB1/5        1024    Cable unplugged
IB1/6        1024    Cable unplugged
IB1/7        1024    Cable unplugged
IB1/8        1024    Cable unplugged
IB1/9        1024    Cable unplugged
IB1/10       1024    Cable unplugged
IB1/11       1024    Cable unplugged
IB1/12       1024    Cable unplugged
IB1/13       1024    Cable unplugged
IB1/14       1024    Cable unplugged
IB1/15       1024    Cable unplugged
IB1/16       1024    Cable unplugged
IB1/17       1024    Cable unplugged
IB1/18       1024    Cable unplugged
IB1/19       1024    Cable unplugged
IB1/20       1024    Cable unplugged
IB1/21       1024    Cable unplugged
IB1/22       1024    Cable unplugged
IB1/23       1024    Cable unplugged
IB1/24       1024    Cable unplugged
IB1/25       1024    Cable unplugged
IB1/26       1024    Cable unplugged
IB1/27       1024    Cable unplugged
IB1/28       1024    Cable unplugged
IB1/29       1024    Cable unplugged
IB1/30       1024    Cable unplugged
IB1/31       1024    Cable unplugged
IB1/32       1024    Cable unplugged
IB1/33       1024    Cable unplugged
IB1/34       1024    Cable unplugged
IB1/35       1       The port is closed by command.
IB1/36       2       Auto-Negotiation failure.. |
|
|
Related Commands |
|
|
|
Notes |
|
|
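
When the command output is captured to text (for example, from a saved terminal session), the per-port codes can be extracted for further processing. The following Python sketch is one way to do that under stated assumptions: each data row is assumed to be an interface name starting with `IB`, a numeric code, and a free-text status separated by whitespace, as in the example above. The names `LinkDiag` and `parse_link_diagnostics` are hypothetical and not part of any NVIDIA tool.

```python
import re
from typing import List, NamedTuple


class LinkDiag(NamedTuple):
    interface: str
    code: int
    status: str


# ASSUMED row layout: interface (e.g. IB1/36), numeric code, free-text status.
ROW = re.compile(r"^(IB[\d/]+)\s+(\d+)\s+(.*\S)\s*$")


def parse_link_diagnostics(text: str) -> List[LinkDiag]:
    """Extract (interface, code, status) rows from captured command output."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:  # the prompt, header, and dashed rules do not match
            rows.append(LinkDiag(match.group(1), int(match.group(2)), match.group(3)))
    return rows


sample = """\
switch (config) # show interfaces ib link-diagnostics
----------------------------------------------------------------------
Interface    Code    Status
----------------------------------------------------------------------
IB1/1        0       The port is Active.
IB1/3        1024    Cable unplugged
IB1/35       1       The port is closed by command.
"""

# Keep only ports that report a non-zero code.
problems = [row for row in parse_link_diagnostics(sample) if row.code != 0]
for row in problems:
    print(row.interface, row.code, row.status)
```

Filtering on a non-zero code is a convenient first pass, since code 0 means no issue was observed.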
show interfaces ib internal leaf link-diagnostics
|
|
show interfaces ib internal leaf <module/port> link-diagnostics
Displays link diagnostics for a specific InfiniBand internal leaf module/port. |
|
|
Syntax Description |
N/A |
|
|
Default |
N/A |
|
|
Configuration Mode |
config |
|
|
History |
3.6.4000 |
|
|
Example |
switch (config) # show interfaces ib internal leaf 1 link-diagnostics
----------------------------------------------------------------------
Interface    Code    Status
----------------------------------------------------------------------
IB1/1/19     0       No issue was observed
IB1/1/20     0       No issue was observed
IB1/1/21     0       No issue was observed
IB1/1/22     0       No issue was observed
IB1/1/23     0       No issue was observed
IB1/1/24     0       No issue was observed
IB1/1/25     0       No issue was observed
IB1/1/26     0       No issue was observed
IB1/1/27     0       No issue was observed
IB1/1/28     0       No issue was observed
IB1/1/29     0       No issue was observed
IB1/1/30     0       No issue was observed |
|
|
Related Commands |
|
|
|
Notes |
|
|
show interfaces ib internal spine link-diagnostics
|
|
show interfaces ib internal spine <module/port> link-diagnostics
Displays link diagnostics for a specific InfiniBand internal spine module/port. |
|
|
Syntax Description |
N/A |
|
|
Default |
N/A |
|
|
Configuration Mode |
config |
|
|
History |
3.6.4000 |
|
|
Example |
switch (config) # show interfaces ib internal spine 3/1/1 link-diagnostics
-----------------------------------------------------------------------
Interface    Code    Status
-----------------------------------------------------------------------
IB3/1/1      0       No issue was observed |
|
|
Related Commands |
|
|
|
Notes |
|
|
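
Several mitigations in the opcode table ask the operator to wait a few seconds and run the command again (for example, 3 seconds for code 128 and 10 seconds for code 1023). The following Python sketch shows that wait-and-recheck pattern in isolation. It is an illustration under assumptions: `read_code` is a hypothetical caller-supplied function that returns the current code for one port (how it obtains the code is outside this sketch), and `RECHECK_SECONDS` holds only a subset of the wait times listed in the table.

```python
import time
from typing import Callable, Optional, Tuple

# Wait times in seconds, copied from the mitigation column for these codes.
# This is a subset; code 0 is left out so healthy ports are not re-polled.
RECHECK_SECONDS = {14: 5, 53: 10, 128: 3, 1023: 10, 1047: 10, 2048: 10}


def recheck(
    read_code: Callable[[], int],
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Optional[int]]:
    """Read a code; if the table gives a wait time, wait once and read again.

    Returns (first_code, second_code). second_code is None when no wait
    time is listed for first_code.
    """
    first = read_code()
    delay = RECHECK_SECONDS.get(first)
    if delay is None:
        return first, None
    sleep(delay)
    return first, read_code()


if __name__ == "__main__":
    # Simulated readings: "troubleshooting in process", then "no issue".
    readings = iter([128, 0])
    waits = []
    print(recheck(lambda: next(readings), sleep=waits.append), waits)
```

Passing `sleep` as a parameter lets the logic be exercised without real delays, as the simulated run at the bottom does.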