pytorch
a8653f35 - One more small Perf Tweak to fill_ (#110294)

1 year ago
One more small Perf Tweak to fill_ (#110294)

# Summary

Perf win from checking which device the tensors are on.

## Before this PR

```Shell
CPU | CPU: 1.3328152848407626
GPU | GPU: 6.614773320034146
CPU | GPU: 29.027153505012393
GPU | CPU: 17.22372299991548
```

## After this PR

```Shell
CPU | CPU: 1.4241038949694484
GPU | GPU: 7.060713530518115
CPU | GPU: 15.149936103262007
GPU | CPU: 5.774620908778161
```

#### Repro Script

```Python
import torch

# benchmark_torch_function_in_microseconds is not defined in this commit
# message; it is assumed to be a timing helper that returns microseconds.
# Each label reads "<device of a> | <device of amax>".

a = torch.tensor([0.2, 0.5], device="cpu")
amax = torch.tensor(0.5, device="cpu")
print(f"CPU | CPU: {benchmark_torch_function_in_microseconds(torch.fill_, a, amax)}")

a = torch.tensor([0.2, 0.5], device="cuda")
amax = torch.tensor(0.5, device="cuda")
print(f"GPU | GPU: {benchmark_torch_function_in_microseconds(torch.fill_, a, amax)}")

a = torch.tensor([0.2, 0.5], device="cpu")
amax = torch.tensor(0.5, device="cuda")
print(f"CPU | GPU: {benchmark_torch_function_in_microseconds(torch.fill_, a, amax)}")

a = torch.tensor([0.2, 0.5], device="cuda")
amax = torch.tensor(0.5, device="cpu")
print(f"GPU | CPU: {benchmark_torch_function_in_microseconds(torch.fill_, a, amax)}")
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110294
Approved by: https://github.com/mikaylagawarecki
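To make the before/after numbers easier to compare, the sketch below computes the per-case speedup (before divided by after). The timings are copied from the commit message. The unit is assumed to be microseconds, based on the helper's name. The `speedups` function and the dictionary layout are illustrative and are not part of the PR.

```python
# Timings copied from the commit message. Keys are
# (device of the filled tensor `a`, device of the value tensor `amax`).
# Unit assumed to be microseconds, inferred from the helper name.
before = {
    ("CPU", "CPU"): 1.3328152848407626,
    ("GPU", "GPU"): 6.614773320034146,
    ("CPU", "GPU"): 29.027153505012393,
    ("GPU", "CPU"): 17.22372299991548,
}
after = {
    ("CPU", "CPU"): 1.4241038949694484,
    ("GPU", "GPU"): 7.060713530518115,
    ("CPU", "GPU"): 15.149936103262007,
    ("GPU", "CPU"): 5.774620908778161,
}


def speedups(before, after):
    """Return before/after ratios; a value above 1 means the PR is faster."""
    return {case: before[case] / after[case] for case in before}


if __name__ == "__main__":
    for (dst, src), ratio in speedups(before, after).items():
        print(f"{dst} | {src}: {ratio:.2f}x")
```

On these figures the mixed-device cases improve by roughly 1.92x (CPU | GPU) and 2.98x (GPU | CPU), while the same-device cases come out about 6% slower in this single run. The commit message does not say whether that difference is measurement noise.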