onnxruntime
6c7da5e9 - Optimize CUDA Sum op kernel and refactor CUDA elementwise variadic input op kernels (#4418)

Commit

5 years ago

Optimize CUDA Sum op kernel and refactor CUDA elementwise variadic input op kernels (#4418)

For the special case where all variadic inputs of a kernel are the same shape (i.e. no broadcasting is required) and there are few enough of them, we perform the entire computation in a single kernel. The general implementation (which was previously used for this special case) handles broadcasting by repeatedly invoking a binary kernel on successive inputs.
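To illustrate the difference between the two strategies, here is a minimal CPU-side C++ sketch. It is not the onnxruntime code: `Tensor`, `sum_pairwise`, `sum_fused`, and the `passes` counter are hypothetical names, each full loop over the output stands in for one kernel launch, broadcasting is left out, and the limit on the number of inputs mentioned above is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a flat, same-shape tensor buffer.
using Tensor = std::vector<float>;

// General strategy: fold the inputs pairwise. Each extra input costs one
// full pass over the output, which models one binary kernel launch.
Tensor sum_pairwise(const std::vector<Tensor>& inputs, int* passes) {
    assert(!inputs.empty());
    Tensor out = inputs[0];
    *passes = 0;
    for (std::size_t k = 1; k < inputs.size(); ++k) {
        assert(inputs[k].size() == out.size());  // no broadcasting in this sketch
        for (std::size_t i = 0; i < out.size(); ++i) {
            out[i] += inputs[k][i];
        }
        ++*passes;
    }
    return out;
}

// Special case: all inputs share one shape, so each output element can be
// accumulated from every input in a single pass (one kernel launch).
Tensor sum_fused(const std::vector<Tensor>& inputs, int* passes) {
    assert(!inputs.empty());
    const std::size_t n = inputs[0].size();
    for (const Tensor& t : inputs) {
        assert(t.size() == n);  // precondition: identical shapes
    }
    Tensor out(n);
    for (std::size_t i = 0; i < n; ++i) {
        float acc = inputs[0][i];
        for (std::size_t k = 1; k < inputs.size(); ++k) {
            acc += inputs[k][i];
        }
        out[i] = acc;  // the output is written exactly once per element
    }
    *passes = 1;
    return out;
}
```

With N inputs, the pairwise version makes N - 1 passes and re-reads and rewrites the intermediate result each time, while the fused version reads each input once and writes the output once. Both accumulate in the same order, so they produce identical results for same-shape inputs.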

References

#4418 - Optimize CUDA Sum op kernel and refactor CUDA elementwise variadic input op kernels

Author

edgchen1

Parents

04586fc0
