Add symmetric version of gpu_kernel_with_scalars
`gpu_kernel_with_scalars` produces 3 calls to `gpu_kernel` for
`f(a, b)`, where either of `a` or `b` can be a CPU scalar. If `f` happens to
be symmetric (i.e. `f(a, b) == f(b, a)`), then only 2 calls to
`gpu_kernel` are needed, reducing the CUDA context size.
On my build targeting a single CUDA architecture, this reduces
`torch_cuda_cu.so` by 24.5 MB.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78989
Approved by: https://github.com/ngimel