pytorch
37513a11 - Use explicit templates in CUDALoops kernels (#44286)

Commit
4 years ago
Use explicit templates in CUDALoops kernels (#44286)

Summary:
Reland attempt of https://github.com/pytorch/pytorch/pull/41059

Use explicit templates instead of lambdas to reduce binary size by 100-200Kb per arch per CU without affecting perf, namely:
  BinaryMulDivKernel.cu 3.8Mb -> 3.5Mb
  CompareEQKernel.cu 1.8Mb -> 1.7Mb
  BinaryAddSubKernel.cu 2.0Mb -> 1.8Mb
  BinaryBitwiseOpsKernels.cu 2.6Mb -> 2.3Mb

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44286

Reviewed By: ngimel

Differential Revision: D23859691

Pulled By: malfet

fbshipit-source-id: 2c4e86f35e0f94a62294dc5d52a3ba364db23e2d
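
The commit message does not show the code change itself, so the following is only a minimal host-side C++ sketch of the general pattern it names: passing a named functor template to a generic loop instead of a lambda. The names `apply_binary`, `MulFunctor`, `mul_with_lambda`, and `mul_with_functor` are hypothetical and are not PyTorch's actual CUDALoops API. It does not reproduce the binary-size measurements above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a generic elementwise loop.
// This is NOT PyTorch's gpu_kernel; it only illustrates the call shape.
template <typename T, typename Op>
void apply_binary(const std::vector<T>& a, const std::vector<T>& b,
                  std::vector<T>& out, Op op) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = op(a[i], b[i]);
    }
}

// Lambda style: the operation is an unnamed closure type that the
// compiler generates at this call site.
template <typename T>
void mul_with_lambda(const std::vector<T>& a, const std::vector<T>& b,
                     std::vector<T>& out) {
    apply_binary(a, b, out, [](T x, T y) { return x * y; });
}

// Explicit-template style: the operation is a named functor template,
// so the loop is instantiated on MulFunctor<T> rather than on a closure type.
template <typename T>
struct MulFunctor {
    T operator()(T x, T y) const { return x * y; }
};

template <typename T>
void mul_with_functor(const std::vector<T>& a, const std::vector<T>& b,
                      std::vector<T>& out) {
    apply_binary(a, b, out, MulFunctor<T>{});
}
```

Both variants compute the same result; the difference is only in which type the generic loop is instantiated with, which is the kind of change the summary credits for the smaller object files.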