pytorch
dde7fff0 - [PyTorch] Avoid refcount bumps in addmm_out_cuda_impl (#54935)

Committed 3 years ago
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54935

Removes a number of avoidable copies of Tensor objects in addmm_out_cuda_impl; each such copy results in a refcount bump.

ghstack-source-id: 125216023

Test Plan:
Compared the percentage of self time spent in addmm_out_cuda_impl while running the following sample under `perf record`:

```
import torch
import torch.nn as nn

m = nn.Linear(1024, 1024).cuda().half()
x = torch.randn(16, 1024).cuda().half()
# Loop forever so that perf record can collect samples.
while True:
    y = m(x)
```

The share of self time spent in addmm_out_cuda_impl decreased from 0.74% to 0.56%.

Reviewed By: ngimel

Differential Revision: D27420388

fbshipit-source-id: d2c5e4c4899cd02c60c45735b2d72c4ed913f6e8
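The point about copies causing refcount bumps can be illustrated with a toy model. This is a hypothetical sketch, not PyTorch code: `TensorImpl`, `Tensor`, `impl_by_value`, `impl_by_reference`, and `count_increments` are made-up names. It assumes, loosely following the C++ `Tensor` handle, that a handle shares a reference-counted implementation, so copying the handle increments the shared count, while binding a `const Tensor&` reuses the caller's handle and leaves the count alone. In the real C++ code these increments are atomic operations, which is one reason avoiding them can be measurable in a profile.

```python
class TensorImpl:
    """Hypothetical shared state behind every handle; counts live owners."""

    def __init__(self) -> None:
        self.refcount = 0
        self.increments = 0  # total bumps, a stand-in for atomic increments


class Tensor:
    """Hypothetical owning handle, loosely modeled on a C++ Tensor."""

    def __init__(self, impl: TensorImpl) -> None:
        self.impl = impl
        impl.refcount += 1
        impl.increments += 1

    def copy(self) -> "Tensor":
        # Copying a handle creates a new owner, so the shared count goes up.
        return Tensor(self.impl)

    def release(self) -> None:
        # Destroying an owner decrements the shared count.
        self.impl.refcount -= 1


def impl_by_value(self_: Tensor, mat1: Tensor, mat2: Tensor) -> None:
    # Locals initialized by copy, like `Tensor t = arg;` in C++.
    local_copies = [self_.copy(), mat1.copy(), mat2.copy()]
    # ... the matrix multiply would run here ...
    for t in local_copies:
        t.release()


def impl_by_reference(self_: Tensor, mat1: Tensor, mat2: Tensor) -> None:
    # Locals bound by reference, like `const Tensor& t = arg;`: no new owners.
    local_refs = [self_, mat1, mat2]
    # ... the matrix multiply would run here ...
    del local_refs


def count_increments(fn) -> int:
    # Build three fresh handles, call fn, and report how many bumps it caused.
    impls = [TensorImpl() for _ in range(3)]
    tensors = [Tensor(i) for i in impls]
    before = sum(i.increments for i in impls)
    fn(*tensors)
    return sum(i.increments for i in impls) - before


if __name__ == "__main__":
    print(count_increments(impl_by_value))      # 3
    print(count_increments(impl_by_reference))  # 0
```

In general, borrowing by reference is only safe when the caller's handle outlives every use of the reference, which holds for arguments used within a single call; that is the kind of case where a copy can be dropped.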