Introduction
ClusterKit is a multipurpose node-assessment tool for high-performance computing (HPC) clusters, designed to run the following tests:
- General Assessments: Latency, bandwidth, effective bandwidth, memory bandwidth, ordered ring bandwidth, and random ring bandwidth
- GPU Communication Tests: Memory bandwidth, GPU-GPU latency and bandwidth, GPU-Host latency and bandwidth, and NCCL bandwidth and latency
- Collective Evaluations: Barrier, allreduce, broadcast, alltoall, and NCCL
- Bisectional Bandwidth
- CPU/GPU Stress
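The ordered and random ring tests differ only in how ranks are paired. This section does not define them. The sketch below assumes they follow the HPC Challenge effective-bandwidth convention. In an ordered ring, each rank talks to its neighbor in natural rank order. In a random ring, the ranks are first shuffled into a random cyclic order. The function names are hypothetical and are not part of ClusterKit.

```python
import random


def ordered_ring(num_ranks):
    """Pair each rank with its right-hand neighbor in natural rank order (hypothetical helper)."""
    return [(r, (r + 1) % num_ranks) for r in range(num_ranks)]


def random_ring(num_ranks, seed=None):
    """Shuffle ranks into a random cyclic order, then pair neighbors in that order (hypothetical helper)."""
    order = list(range(num_ranks))
    random.Random(seed).shuffle(order)  # seeded so a run can be reproduced
    return [(order[i], order[(i + 1) % num_ranks]) for i in range(num_ranks)]
```

For example, `ordered_ring(4)` yields `[(0, 1), (1, 2), (2, 3), (3, 0)]`. A random ring still forms a single cycle through every rank. However, neighbors are no longer likely to share a node or switch, so it usually stresses the interconnect more than the ordered ring does.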
ClusterKit Requirements
- It is recommended to install ClusterKit in a shared directory.
- If no such directory exists, make sure that all scripts are available on every host under the exact same path.
- Either SLURM or passwordless SSH connectivity between all hosts.
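Before a run, it can help to verify both requirements: passwordless SSH must work, and the scripts must exist at the same path on every host. The sketch below is a hypothetical pre-flight helper, not part of ClusterKit. It uses OpenSSH's `BatchMode=yes` so that a missing key fails instead of prompting for a password. It also relies on ssh's convention of returning exit status 255 for connection errors. The script path and hostnames in the usage note are placeholders.

```python
import shlex
import subprocess


def build_check_command(host, script_path, connect_timeout=10):
    """Build an ssh command that tests for script_path on host without prompting."""
    remote = "test -f " + shlex.quote(script_path)  # quote paths that contain spaces
    return [
        "ssh",
        "-o", "BatchMode=yes",                        # never prompt for a password
        "-o", f"ConnectTimeout={connect_timeout}",    # give up on unreachable hosts
        host,
        remote,
    ]


def check_hosts(hosts, script_path, run=subprocess.run):
    """Return {host: reason} for every host that fails the check (empty dict = all good)."""
    failures = {}
    for host in hosts:
        cmd = build_check_command(host, script_path)
        try:
            result = run(cmd, capture_output=True, text=True, timeout=30)
        except FileNotFoundError:
            failures[host] = "ssh client not installed"
            continue
        except subprocess.TimeoutExpired:
            failures[host] = "ssh timed out"
            continue
        if result.returncode == 255:
            # ssh itself failed: host unreachable, key not accepted, etc.
            failures[host] = "ssh failed: " + result.stderr.strip()
        elif result.returncode != 0:
            # ssh worked, but `test -f` reported the file missing
            failures[host] = f"{script_path} not found"
    return failures
```

Usage (hypothetical path and hosts): `check_hosts(["node01", "node02"], "/shared/clusterkit/clusterkit.sh")`. The `run` parameter is injectable, so the logic can be tested without a real cluster. On a SLURM-managed system, an equivalent check could be launched through the scheduler instead of SSH.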