The following table provides a list of bugs fixed in this SHARP version.
|
Internal Ref. |
Issue |
|---|---|
|
3844898 |
Description: Fixed the issue where sharp_am failed to allocate resources for new job requests due to scattered links and unmatched trees, despite a sufficient number of links available. |
|
Keywords: SHARP, Report No resource |
|
|
Discovered in Release: 3.5.1 |
|
|
Fixed in Release: 3.8.0 |
|
|
3438393 |
Description: Fixed the issue where resource limitations were ignored and no limits were set for any application when using dynamic trees allocation and Quasi Fat Tree (QFT)-oriented logic with reservation_mode enabled. |
|
Keywords: Dynamic trees allocation; QFT; resource limitation |
|
|
Discovered in Release: 3.3.0 |
|
|
Fixed in Release: 3.8.0 |
|
|
3971970 |
Description: Fixed the issue where |
|
Keywords: Syslog |
|
|
Discovered in Release: 3.5.0 |
|
|
Fixed in Release: 3.8.0 |
|
|
3478803 |
Description: Fixed the issue where obtaining topology information ( |
|
Keywords: SHARP topology API |
|
|
Discovered in Release: 3.5.0 |
|
|
Fixed in Release: 3.8.0 |
|
|
3844898 |
Description: Fixed the issue where sharp_am failed to allocate resources for new job requests due to scattered links and unmatched trees, despite a sufficient number of links available. |
|
Keywords: SHARP, Report No resource |
|
|
Discovered in Release: 3.5.1 |
|
|
Fixed in Release: 3.7.0 |
|
|
3696666 |
Description: Fixed the issue where libsharp could not communicate with sharp_am on systems that exclusively used IPv6 addresses without IPv4 addresses. Now, both libsharp and sharp_am can utilize either IPv4 or IPv6, depending on the machine configuration. |
|
Keywords: sharp_am, libsharp, tcp/ip, smx |
|
|
Discovered in Release: 3.5.1 |
|
|
Fixed in Release: 3.7.0 |
|
|
3686321
|
Description: When upgrading UFM from previous versions to UFM 6.15.x, This leads to failure in saving reservation and job information, so in case of a restart of |
|
Keywords: |
|
|
Discovered in Release: 3.5.0 |
|
|
Fixed in Release: 3.6.0 |
|
|
3724093 |
Description: Fixed the issue where libsharp, when communicating with sharp_am via UCX, automatically selected the first available IB adapter instead of the adapter specified for the data path. |
|
Keywords: |
|
|
Discovered in Release: 3.5.1 |
|
|
Fixed in Release: 3.6.0 |
|
|
3665349 |
Description: Fixed an issue where sharp_am failed to detect an abnormal termination of an application executing a SHARP job, which resulted in the failure to properly clean up its resources. |
|
Keywords: |
|
|
Discovered in Release: 3.6.0 |
|
|
Fixed in Release: 3.6.0 |
|
|
3646010 |
Description: Fixed an issue in sharp_am where it failed to support virtual ports when OpenSM topology policies were employed, and sharp_am was configured to utilize only one of the sub-topologies. |
|
Keywords: |
|
|
Discovered in Release: 3.6.0 |
|
|
Fixed in Release: 3.6.0 |
|
|
3609384 |
Description: Fixed issues concerning |
|
Keywords: |
|
|
Discovered in Release: 3.4.0 |
|
|
Fixed in Release: 3.5.0 |
|
|
3541153 |
Description: Fixed an issue where client application is abnormally terminated before the sharp_coll_finalize method, |
|
Keywords: |
|
|
Discovered in Release: 3.4.0 |
|
|
Fixed in Release: 3.5.0 |
|
|
3400293 |
Description: Fixed an issue in libsharp where it failed to respond to messages from the SM while searching for Service Records, causing the SM to print timeout messages. |
|
Keywords: sharp_am; openSM |
|
|
Discovered in Release: 3.1.0 |
|
|
Fixed in Release: 3.4.0 |
|
|
3479721 |
Description: Fixed the issue where sharp_am mishandled hypercube topologies, incorrectly treating different switches as duplicates. |
|
Keywords: sharp_am; hypercube |
|
|
Discovered in Release: 3.3.0 |
|
|
Fixed in Release: 3.4.0 |
|
|
3496440 |
Description: Fixed the issue in sharp_am where excessive log messages were printed for each disconnected or restarted compute host. The information is now consolidated into a single log message that contains either a summary of the disconnected hosts or a list of those hosts.
|
|
Keywords: sharp_am |
|
|
Discovered in Release: 3.3.0 |
|
|
Fixed in Release: 3.4.0 |
|
|
3336788
|
Description: Fixed the issue in Firmware where MAD error responses might have been received in libsharp. |
|
Keywords: sharp_am; libsharp |
|
|
Discovered in Release: 3.2.0 |
|
|
Fixed in Release: 3.3.0 (Quantum-2 Firmware 31.2010.6064) |
|
|
3343503
|
Description: Fixed the issue where sharp_am installed from MLNX_OFED used an invalid range of job IDs, resulting in occasional errors when trying to establish new SHARP jobs. |
|
Keywords: MLNX_OFED; sharp_am |
|
|
Discovered in Release: 3.2.0 |
|
|
Fixed in Release: 3.3.0 |
|
|
3368381
|
Description: Fixed the issue where too few retries were made to resend failed libsharp GroupJoin MADs, causing SHARP jobs to fail before they started. |
|
Keywords: libsharp; MADs |
|
|
Discovered in Release: 3.0.0 |
|
|
Fixed in Release: 3.3.0 |
|
|
3393902
|
Description: Fixed the issue where re-created virtual ports were not recognized by sharp_am, so the correct tree was not built for them. As a result, SAT jobs encountered ibv_poll_cq failures in libsharp. |
|
Keywords: Virtual port; sharp_am; libsharp; SAT; ibv_poll_cq |
|
|
Discovered in Release: 3.2.0 |
|
|
Fixed in Release: 3.3.0 |
|
|
3404474
|
Description: Fixed an issue where a failed allocation of all hosts for an application, requested via the /app/sharp/resources REST API, returned a successful job instead of an error. |
|
Keywords: REST API; allocation |
|
|
Discovered in Release: 3.2.0 |
|
|
Fixed in Release: 3.3.0 |
|
|
3406186
|
Description: Fixed an issue where SHARP AM failed to handle reports from OpenSM if some switch ports were down or isolated. |
|
Keywords: Aggregation Manager; Aggregation Node; OpenSM |
|
|
Discovered in Release: 3.2.0 |
|
|
Fixed in Release: 3.3.0 |
|
|
3236363
|
Description: Fixed the way physical link failures between switches are handled. In the event of a link failure, a SHARP job utilizing the link has to be stopped; however, this has no effect on other current or future jobs. |
|
Keywords: Aggregation Manager; sharp_am; Link Failure |
|
|
Discovered in Release: 3.1.0 |
|
|
Fixed in Release: 3.2.0 |
|
|
3230585
|
Description: Fixed the issue where, when operating in dynamic trees mode, ibdiagnet might have printed warning messages about the existence of multiple distinct trees with the same tree ID. |
|
Keywords: Dynamic tree; ibdiagnet |
|
|
Discovered in Release: 3.1.0 |
|
|
Fixed in Release: 3.2.0 |
|
|
3226743
|
Description: Fixed the issue where, when a management host was not connected to a leaf switch, sharp_am might have printed a number of warning messages about trees that could not reach all aggregation nodes.
|
|
Keywords: Aggregation Manager; sharp_am; leaf; GUID |
|
|
Discovered in Release: 3.0.1 |
|
|
Fixed in Release: 3.2.0 |
|
|
3274564
|
Description: Fixed an issue where the sharp_benchmark bash script did not operate on all bash versions. |
|
Keywords: sharp_benchmark |
|
|
Discovered in Release: 3.1.1 |
|
|
Fixed in Release: 3.2.0 |
|
|
3262936
|
Description: Fixed the issue where a crash took place during sharp_am reboot while physical links were hanging between switches in the fabric. |
|
Keywords: sharp_am; physical links; crash |
|
|
Discovered in Release: 3.1.0 |
|
|
Fixed in Release: 3.1.1 LTS |
|
|
3192770
|
Description: Fixed the issue where SHARP jobs failed when using virtual interfaces configured with SR-IOV. |
|
Keywords: SR-IOV |
|
|
Discovered in Release: 3.0.0 |
|
|
Fixed in Release: 3.1.0 |
|
|
3163697
|
Description: Fixed the issue where, when the client application used more than 1024 file descriptors (the limit defined by FD_SETSIZE), libsharp could not use any additional file descriptors. Using poll() instead of select() enables the full range of file descriptors allowed by Linux. |
|
Keywords: File descriptor; libsharp; HCOLL; HPC-X |
|
|
Discovered in Release: 3.0.0 |
|
|
Fixed in Release: 3.1.0 |
|
|
3192770
|
Description: Fixed the issue where SHARP jobs failed when using virtual interfaces configured with SR-IOV. |
|
Keywords: SR-IOV |
|
|
Discovered in Release: 3.0.0 |
|
|
Fixed in Release: 3.0.1 |
|
|
3163697
|
Description: Fixed the issue where, when the client application used more than 1024 file descriptors (the limit defined by FD_SETSIZE), libsharp could not use any additional file descriptors. Using poll() instead of select() enables the full range of file descriptors allowed by Linux. |
|
Keywords: File descriptor; libsharp; HCOLL |
|
|
Discovered in Release: 3.0.0 |
|
|
Fixed in Release: 3.0.1 |
|
|
2995739
|
Description: The sharp_am daemon is no longer removed when performing an rpm upgrade; it is overridden instead. |
|
Keywords: Aggregation Manager; rpm |
|
|
Discovered in Release: 2.6.1 |
|
|
Fixed in Release: 2.7.0 |
|
|
2972970
|
Description: Fixed the issue where completion of SHARP installation using sharp_daemons_setup.sh script depended on python availability. |
|
Keywords: Aggregation Manager |
|
|
Discovered in Release: 2.6.1 |
|
|
Fixed in Release: 2.7.0 |
|
|
2749073 |
Description: SHARP AM reports the rediscovery of aggregation nodes on every topology change. |
|
Keywords: Aggregation Manager |
|
|
Workaround: N/A |
|
|
Discovered in Release: 2.5.0 |
|
|
2736102
|
Description: SHARP AM and SHARPD overwrite backlog files after restart when log rotation is enabled. |
|
Keywords: Aggregation Manager, SHARPD, log file |
|
|
Workaround: N/A |
|
|
Discovered in Release: 2.5.0 |
|
|
2700530
|
Description: Terminating a job process during job initialization, before a job request is sent to the Aggregation Manager, might result in job resource leakage in the SHARP Aggregation Manager. |
|
Workaround: N/A |
|
|
Keywords: SHARPD, Aggregation Manager |
|
|
Discovered in Release: 2.5.0 |
|
|
2726821
|
Description: Terminating SHARPD while the job process is still running will result in job resource leakage in SHARP Aggregation Manager. |
|
Workaround: Terminate SHARPD after terminating the job processes. |
|
|
Keywords: SHARPD, Aggregation Manager |
|
|
2795902
|
Description: SHARPD might allocate handlers on GPU when running with UCX. |
|
Keywords: SHARPD, SMX, UCX |
|
|
Discovered in Release: 2.5.0 |
|
|
Workaround: Disable UCX |
|
|
2770210
|
Description: Syslog verbosity depends on log file verbosity. |
|
Keywords: SHARPD, Aggregation Manager |
|
|
Discovered in Release: 2.5.0 |
|
|
Workaround: N/A |
|
|
2825519
|
Description: Aggregation Manager continues to run after SM failover. |
|
Keywords: Aggregation Manager |
|
|
Discovered in Release: 2.5.0 |
|
|
Workaround: Stop AM daemon manually |
|
|
2754175
|
Description: SHARP Aggregation Manager might allocate bad links for jobs after receiving timeouts from Aggregation Nodes. |
|
Workaround: Restart corresponding switch or restart SHARP Aggregation Manager. |
|
|
Keywords: Aggregation Manager |
|
|
Discovered in Release: 2.5.0 |
|
|
2796317
|
Description: SHARP jobs may hang when all of the following apply: SHARP is running in reservation mode (i.e., SHARP allocation is enabled), the reservation is created with a limited PKEY, and configuring the reservation PKEY on the tree is enabled. |
|
Workaround: The PKEY used for creating the reservation should be "full" (the most significant bit should be set, e.g., 0x805c instead of 0x5a). |
|
|
Keywords: Aggregation Manager, Reservations, PKEY, UFM |
|
|
Discovered in Release: 2.5.0 |
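
The workaround for issue 2796317 depends on the membership bit of the PKEY. The following is a minimal sketch, assuming the standard InfiniBand convention that a P_Key is a 16-bit value whose most significant bit marks full membership. The helper names `is_full_member` and `to_full_member` are hypothetical and are not part of SHARP or UFM; they only illustrate how a PKEY could be checked before it is used to create a reservation.

```python
# Illustrative only: these helpers are hypothetical, not a SHARP or UFM API.
# Assumption: a P_Key is 16 bits wide and its most significant bit
# (0x8000) indicates full membership; a cleared bit means limited membership.

FULL_MEMBERSHIP_BIT = 0x8000


def _validate(pkey: int) -> None:
    """Reject values that do not fit in 16 bits."""
    if not 0 <= pkey <= 0xFFFF:
        raise ValueError(f"PKEY out of 16-bit range: {pkey:#x}")


def is_full_member(pkey: int) -> bool:
    """Return True if the PKEY has the full-membership bit set."""
    _validate(pkey)
    return bool(pkey & FULL_MEMBERSHIP_BIT)


def to_full_member(pkey: int) -> int:
    """Return the same PKEY with the full-membership bit set."""
    _validate(pkey)
    return pkey | FULL_MEMBERSHIP_BIT


if __name__ == "__main__":
    for value in (0x805C, 0x5A):
        kind = "full" if is_full_member(value) else "limited"
        print(f"{value:#06x}: {kind}")
```

With these helpers, `is_full_member(0x5a)` is false and `to_full_member(0x5a)` yields `0x805a`, so a script that creates reservations could reject or convert a limited PKEY before submitting it.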