Changes and New Features History

| Feature/Change | Description |
|---|---|
| Rev 2.6.1 | |
| General | Added support for running libsharp_coll from SHARP 2.6.1 with SHARPD from SHARP 2.4.0 – 2.6.1. |
| General | Added information about updatable configuration parameters to the configuration file and help menu. |
| Network | Added support for keep-alive on connections to SHARPD. |
| Network | Added support for asynchronous connections. |
| Network | Disabled the UCX listener by default in the SHARP Aggregation Manager. |
| AM | Added support for non-default subnet prefixes. |
| AM | Added support for DF+ topologies with more than two-level islands. |
| SHARPD | Added support for caching the AM address. |
| Rev 2.5.0 | |
| Resource Management | Added support for exclusive lock requests for streaming aggregation jobs. |
| Network | Enabled connection keep-alive between SHARPD and Aggregation Manager. |
| Rev 2.4.3 | |
| General | Added support for identifying Aggregation Nodes based on SMDB. |
| General | Improved minhop tables calculation. |
| General | Added a new API for querying events. |
| Rev 2.1.4 | |
| sharp_am/sharpd/libsharp_coll: Streaming Aggregation | Added support for Streaming Aggregation over ConnectX-6 adapter card and Quantum switch. |
| libsharp_coll: GPU Accelerator | Added support for NVIDIA GPU buffers. |
| sharp_am: OOB | Added support for identifying the topology type from the OpenSM SMDB file. |
| sharp_am: Reboot | Fixed an issue where recovery failed after reboot of all switches in the cluster. |
| Rev 2.0.0 | |
| sharp_am/sharpd/libsharp_coll | Added support for the following NVIDIA Quantum switch capabilities: |
| sharp_am/sharpd: Resource Management | Added support for enabling and disabling reproducibility on the job level. |
| sharp_am/sharpd: Subnet Management | Added support for controlling the SA key for SA operations. |
| libsharp_coll: GPUDirect | Added support for CUDA GPUDirect and GPUDirect RDMA. |
| Rev 1.8.1 | |
| Aggregation Manager (sharp_am): Resiliency | Added support for waiting for jobs to end prior to performing fabric reinitialization on AM startup. |
| Mellanox SHARP Daemon (sharpd): Out-of-Box Improvements | Socket-based activation is now enabled by default when installed from RPM/MLNX_OFED. |

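
The Rev 2.6.1 compatibility entry (libsharp_coll from SHARP 2.6.1 with SHARPD from SHARP 2.4.0 – 2.6.1) can be read as an inclusive version range. The following is a minimal sketch of such a check, assuming versions are plain dotted strings. The helper names are hypothetical and are not part of any SHARP tool or API.

```python
def parse_version(text):
    """Turn a dotted version string such as '2.4.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Range stated in the Rev 2.6.1 entry: libsharp_coll 2.6.1 runs with
# SHARPD 2.4.0 through 2.6.1 (inclusive).
SHARPD_MIN = parse_version("2.4.0")
SHARPD_MAX = parse_version("2.6.1")


def sharpd_compatible_with_coll_2_6_1(sharpd_version):
    """Hypothetical helper: True if sharpd_version lies in the stated range."""
    return SHARPD_MIN <= parse_version(sharpd_version) <= SHARPD_MAX


if __name__ == "__main__":
    for candidate in ("2.1.4", "2.4.0", "2.5.0", "2.6.1"):
        print(candidate, sharpd_compatible_with_coll_2_6_1(candidate))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes once a version component reaches two digits.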
Parameters Change History

| Parameter | Component | Description |
|---|---|---|
| Rev 2.6.1 | | |
| dump_dir | sharp_am | Update: Changed default to /var/log |
| smx_enabled_protocols | sharp_am | Update: Changed default from 7 to 6 (disable UCX by default) |
| ib_mad_timeout | sharp_am | Update: Changed default from 200 to 500 |
| sr_mad_timeout | sharpd | New parameter: Controls the timeout for ServiceRecord queries. Default: 10000 milliseconds |
| sr_mad_retries | sharpd | New parameter: Controls the number of retries for ServiceRecord queries. Default: 3 retries |
| Rev 2.5.0 | | |
| smx_keepalive_interval | sharp_am/sharpd | New parameter: Keep-alive interval in seconds; 0 disables keep-alive. Default: 60 seconds |
| smx_incoming_conn_keepalive_interval | sharp_am | New parameter: Keep-alive interval for incoming connections; 0 disables it. Default: 300 seconds |
| enable_exclusive_lock | sharp_am | New parameter: Enables/disables the exclusive lock feature. Default: True |
| enable_topology_api | sharp_am | New parameter: Enables/disables the Topology API feature. Default: True |
| max_trees_to_build | sharp_am | New parameter: Controls the number of trees for AM to build. Default: 126 |
| Rev 2.4.3 | | |
| ib_max_mads_on_wire | sharp_am | Modified behavior: Changed default from 100 to 4096 |
| ib_qpc_local_ack_timeout | sharp_am | Modified behavior: Changed default from 0x1F to 0x12 |
| ib_sat_qpc_local_ack_timeout | sharp_am | Modified behavior: Changed default from 0x1F to 0x12 |
| ib_qpc_timeout_retry_limit | sharp_am | Modified behavior: Changed default from 7 to 6 |
| ib_sat_qpc_timeout_retry_limit | sharp_am | Modified behavior: Changed default from 7 to 6 |
| Rev 2.0.0 | | |
| control_path_version | sharp_am | New parameter |
| max_compute_ports_per_agg_node | sharp_am | Modified behavior: When set to 0, the AN radix is set to the maximal radix value. Default: 0 |
| default_reproducibility | sharp_am | New parameter: Controls the default reproducibility mode for jobs. Default: TRUE |
| ib_sa_key | sharp_am | New parameter: Controls the SA key for SA operations. Default: 0x1 |
| coll_job_quota_max_payload_per_ost | sharp_job_quota | Modified behavior: Changed default value to 1024. |
| SHARP_COLL_MAX_PAYLOAD_SIZE | libsharp_coll | Removed |
| SHARP_COLL_NUM_SHARP_COLL_REQ | libsharp_coll | Removed |
| SHARP_COLL_ENABLE_REPRODUCIBLE_MODE | libsharp_coll | New parameter: Controls the job reproducibility mode: 0 – Use default. 1 – No reproducibility. 2 – Reproducibility. |
| SHARP_COLL_ENABLE_CUDA | libsharp_coll | New parameter: Enables CUDA GPUDirect. |
| SHARP_COLL_ENABLE_GPU_DIRECT_RDMA | libsharp_coll | New parameter: Enables GPUDirect RDMA. |
| Rev 1.8.1 | | |
| pending_mode_timeout | sharp_am | New parameter: Defines AM waiting time for jobs to complete prior to fabric re-initialization upon startup. |
| job_info_polling_interval | sharp_am | New parameter: Defines job status polling interval when waiting for jobs to complete upon startup. |
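
The `smx_enabled_protocols` change from 7 to 6 "to disable UCX" suggests that the value is a bitmask in which one bit stands for UCX. The sketch below works through that arithmetic. Treating the parameter as a bitmask is an assumption drawn only from the table entry, the meaning of the two remaining bits is not stated here and is left unnamed, and the helper names are hypothetical.

```python
OLD_DEFAULT = 7  # 0b111, default before Rev 2.6.1
NEW_DEFAULT = 6  # 0b110, default from Rev 2.6.1


def cleared_bits(old, new):
    """Return the bits that are set in `old` but no longer set in `new`."""
    return old & ~new


# Assumption: the single bit cleared by the new default is the UCX bit.
UCX_BIT = cleared_bits(OLD_DEFAULT, NEW_DEFAULT)


def ucx_enabled(mask):
    """Hypothetical helper: True if the inferred UCX bit is set in `mask`."""
    return bool(mask & UCX_BIT)


if __name__ == "__main__":
    print(f"inferred UCX bit: {UCX_BIT:#05b}")
    print("UCX enabled with old default:", ucx_enabled(OLD_DEFAULT))
    print("UCX enabled with new default:", ucx_enabled(NEW_DEFAULT))
```

Under this reading, a site that still needs the UCX listener would set the parameter back to 7. Confirm the bit meanings against the sharp_am help output before relying on them.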