NVIDIA Collective Communications Library https://developer.nvidia.com/nccl
2025/07: Verified that CUDA Toolkit 12 does not ship NCCL (no nccl.h header; the .so files were not checked).
Build it yourself: https://github.com/NVIDIA/nccl
After cloning the repo, run: make -j src.build
That builds the required header (really just nccl.h) and the .so files, ready to use:
[zh-ge@gekko build]$ pwd
/home/zh-ge/git/nccl/build
[zh-ge@gekko build]$ ls -lh
total 16K
drwxr-xr-x 2 zh-ge ppl 4.0K Jul 17 15:04 bin
drwxr-xr-x 2 zh-ge ppl 4.0K Jul 17 15:04 include
drwxr-xr-x 3 zh-ge ppl 4.0K Jul 17 15:08 lib
drwxr-xr-x 9 zh-ge ppl 4.0K Jul 17 15:04 obj
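
To compile against this self-built NCCL, point the compiler at the build tree. A sketch, assuming the paths from the listing above and a hypothetical source file test_nccl.cu; adjust to your layout:

nvcc test_nccl.cu -o test_nccl \
    -I/home/zh-ge/git/nccl/build/include \
    -L/home/zh-ge/git/nccl/build/lib -lnccl
# The .so is not on the default search path, so point the loader at it:
LD_LIBRARY_PATH=/home/zh-ge/git/nccl/build/lib:$LD_LIBRARY_PATH ./test_nccl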
// Generated by AI
#include <nccl.h>
#include <cuda_runtime.h>

ncclComm_t comm;
int nranks = 4;  // 4 GPUs in total
int myrank = 0;  // this rank's index (and its GPU)
cudaSetDevice(myrank);

// Every rank must use the same id: rank 0 creates it and shares it
// out of band (e.g. via MPI, see below). ncclCommInitRank blocks
// until all nranks ranks have called it with that id.
ncclUniqueId id;
ncclGetUniqueId(&id);
ncclCommInitRank(&comm, nranks, id, myrank);

float* sendbuff;
float* recvbuff;
size_t count = 1024;
cudaMalloc(&sendbuff, count * sizeof(float));
cudaMalloc(&recvbuff, count * sizeof(float));

// Sum-reduce count floats across all ranks; every rank gets the
// result in recvbuff. The last argument is the CUDA stream; 0 means
// the default stream.
ncclAllReduce((const void*)sendbuff, (void*)recvbuff, count,
              ncclFloat, ncclSum, comm, 0);

// The collective is asynchronous; wait for it before tearing down.
cudaStreamSynchronize(0);

ncclCommDestroy(comm);
cudaFree(sendbuff);
cudaFree(recvbuff);
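
The snippet above shows the calls one rank makes; on its own it would hang, because ncclCommInitRank waits for all 4 ranks. A minimal sketch of the usual multi-process bootstrap, assuming one process per GPU and an MPI installation (MPI is my addition, not from the notes above):

#include <mpi.h>
#include <nccl.h>
#include <cuda_runtime.h>

int main(int argc, char* argv[]) {
    int rank, nranks;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    // Rank 0 creates the unique id; MPI broadcasts it to every rank.
    ncclUniqueId id;
    if (rank == 0) ncclGetUniqueId(&id);
    MPI_Bcast(&id, sizeof(id), MPI_BYTE, 0, MPI_COMM_WORLD);

    cudaSetDevice(rank);  // assumes ranks map 1:1 to local GPUs
    ncclComm_t comm;
    ncclCommInitRank(&comm, nranks, id, rank);

    // ... collectives as in the snippet above ...

    ncclCommDestroy(comm);
    MPI_Finalize();
    return 0;
}

For all GPUs driven by a single process, ncclCommInitAll can initialize every communicator in one call instead, with no id exchange needed.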