custom allreduce cuda kernel (#20703)

Commit

1 year ago

### Description

Conditionally route to a custom AllReduce kernel when the buffer size and the number of GPUs meet certain requirements. Otherwise, keep using NCCL's AllReduce.

### Motivation and Context

---------

Co-authored-by: Ye Wang <wangye@microsoft.com@h100vm-ort.kxelwkzfzxguje5bxvwxxs135a.gvxx.internal.cloudapp.net>
Co-authored-by: Your Name <you@example.com>
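The routing rule in the description can be sketched as a small dispatch function. This is a minimal illustration under stated assumptions, not the code from this commit. The commit does not say which buffer sizes or GPU counts qualify. The 8 MiB cap and the {2, 4, 8} GPU counts below are hypothetical placeholders, and `SelectAllReduceBackend`, `IsSupportedGpuCount`, and `AllReduceBackend` are illustrative names. The sketch only captures the shape of the decision: use the custom kernel when both conditions hold, and fall back to NCCL in every other case.

```cpp
#include <cassert>
#include <cstddef>

// Which AllReduce implementation to launch (hypothetical enum for illustration).
enum class AllReduceBackend { kCustom, kNccl };

// HYPOTHETICAL size limit: the real threshold is not stated in the commit.
constexpr std::size_t kMaxCustomAllReduceBytes = 8u * 1024u * 1024u;  // 8 MiB

// HYPOTHETICAL GPU-count requirement: assumes only 2, 4, or 8 ranks qualify.
bool IsSupportedGpuCount(int world_size) {
  return world_size == 2 || world_size == 4 || world_size == 8;
}

// Picks the custom kernel only when both conditions hold; otherwise keeps NCCL.
AllReduceBackend SelectAllReduceBackend(std::size_t buffer_bytes, int world_size) {
  assert(world_size > 0 && "world_size must be positive");
  if (buffer_bytes <= kMaxCustomAllReduceBytes && IsSupportedGpuCount(world_size)) {
    return AllReduceBackend::kCustom;
  }
  return AllReduceBackend::kNccl;
}
```

With these placeholder values, a 1 KiB buffer on 4 GPUs selects the custom kernel. The same buffer on 3 GPUs, or a 64 MiB buffer on 8 GPUs, stays on NCCL. Because NCCL is the default branch, any configuration that fails a check keeps the existing behavior.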

References

#20703 - custom allreduce cuda kernel

Author

wangyems

Parents

9daed556

onnxruntime f35dd140 - custom allreduce cuda kernel (#20703)
