SemanticDiff

pytorch
e3742807 - Use explicit templates in CUDALoops kernels (#41059)

Commit

4 years ago

Use explicit templates in CUDALoops kernels (#41059)

Summary: Follow-up to https://github.com/pytorch/pytorch/pull/40992

Use explicit templates instead of lambdas to reduce binary size by 100-200Kb per arch per CU without affecting perf, namely:
BinaryMulDivKernel.cu 3.8Mb -> 3.5Mb
CompareEQKernel.cu 1.8Mb -> 1.7Mb
BinaryAddSubKernel.cu 2.0Mb -> 1.8Mb
BinaryBitwiseOpsKernels.cu 2.6Mb -> 2.3Mb

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41059
Differential Revision: D22458928
Pulled By: malfet
fbshipit-source-id: cca623bb6e769cfe372977b08463d98b1a02dd14
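
The lambda-to-explicit-template change described in the commit message can be illustrated with a minimal host-only C++ sketch. This is not the code from the commit: `apply_binary`, `add_with_lambda`, `add_with_functor`, and `ScaledAddFunctor` are hypothetical names chosen for illustration, and a plain loop stands in for the CUDA kernel launch. It only shows the shape of the refactoring: the same element-wise operation is passed to a templated loop once as a capturing lambda and once as a named functor template.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a templated element-wise loop. The real CUDA
// helpers launch a kernel; a host loop keeps this sketch runnable anywhere.
template <typename T, typename Func>
void apply_binary(const std::vector<T>& a, const std::vector<T>& b,
                  std::vector<T>& out, Func f) {
  out.resize(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = f(a[i], b[i]);
  }
}

// Lambda style: the operation is an unnamed closure type created at the
// call site, so apply_binary is instantiated with that closure type.
template <typename T>
std::vector<T> add_with_lambda(const std::vector<T>& a,
                               const std::vector<T>& b, T alpha) {
  std::vector<T> out;
  apply_binary(a, b, out, [alpha](T x, T y) -> T { return x + alpha * y; });
  return out;
}

// Explicit-template style: the operation is a named functor template
// (hypothetical name) that stores the captured state as a member.
template <typename T>
struct ScaledAddFunctor {
  explicit ScaledAddFunctor(T a) : alpha(a) {}
  T operator()(T x, T y) const { return x + alpha * y; }

 private:
  T alpha;
};

template <typename T>
std::vector<T> add_with_functor(const std::vector<T>& a,
                                const std::vector<T>& b, T alpha) {
  std::vector<T> out;
  apply_binary(a, b, out, ScaledAddFunctor<T>(alpha));
  return out;
}
```

Both variants compute the same result; the difference is only in the type handed to the templated loop. The sketch does not reproduce the binary-size effect reported in the commit, which depends on the CUDA toolchain and on how the kernels are instantiated.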

Author

malfet

Committer

facebook-github-bot
