onnxruntime
8f34c8c8 - Introduce collective ops to ort inference build (#14399)

Commit

3 years ago

Introduce collective ops to ort inference build (#14399)

### Description

Introduce collective ops into the onnxruntime inference build, including:

1. AllReduce and AllGather schemas in the contrib ops, controlled by the USE_MPI flag
2. AllReduce and AllGather kernels in the CUDA EP, controlled by the ORT_USE_NCCL flag

### Motivation and Context

Enable the collective ops in the onnxruntime inference build so that we can run distributed inference across multiple GPUs. The original ncclAllReduce ops in the training build require quite complex configuration, which is not suitable for the inference case, and they are already broken, so we introduce a new implementation.

---------

Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>

Author

souptc

Parents

b539c364
