pytorch
4930ae7f - [MPS] Add roll op (#95168)

Reuse the CPU implementation here, as there is currently no native roll implementation in the MPS API (if there is one, please let me know). Compared to falling back to the CPU with `PYTORCH_ENABLE_MPS_FALLBACK=1`, this way we keep tensors on MPS. Did a small benchmark:

```python
import time

import torch

for num in [10, 100, 1000, 10000]:
    for shft in [1, 5]:
        sz = num * num

        # Time roll on CPU
        x = torch.arange(sz, device="cpu").view(num, num)
        s = time.time()
        r = torch.roll(x, shft)
        cpu_e = time.time() - s

        # Time roll on MPS
        x = torch.arange(sz, device="mps").view(num, num)
        s = time.time()
        r = torch.roll(x, shft)
        mps_e = time.time() - s

        print(f"size: ({num}, {num}) shft: {shft} cpu: {cpu_e} mps: {mps_e}")
```

```
size: (10, 10) shft: 1 cpu: 0.00015163421630859375 mps: 0.003078937530517578
size: (10, 10) shft: 5 cpu: 6.794929504394531e-05 mps: 0.0014979839324951172
size: (100, 100) shft: 1 cpu: 0.0001621246337890625 mps: 0.0016200542449951172
size: (100, 100) shft: 5 cpu: 0.00016379356384277344 mps: 0.00154876708984375
size: (1000, 1000) shft: 1 cpu: 0.0022068023681640625 mps: 0.0017690658569335938
size: (1000, 1000) shft: 5 cpu: 0.009071111679077148 mps: 0.0020020008087158203
size: (10000, 10000) shft: 1 cpu: 0.16785407066345215 mps: 0.011695146560668945
size: (10000, 10000) shft: 5 cpu: 0.1160881519317627 mps: 0.011452913284301758
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95168
Approved by: https://github.com/albanD
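For context, "reuse the CPU implementation" refers to expressing `roll` through device-agnostic tensor ops (slicing and concatenation) that run directly on MPS tensors, rather than copying the data back to the CPU. The sketch below is illustrative only, not the actual PyTorch kernel: `roll_via_cat` is a hypothetical helper showing the narrow-plus-cat idea for a single dimension.

```python
import torch


def roll_via_cat(x: torch.Tensor, shift: int, dim: int = 0) -> torch.Tensor:
    """Hypothetical sketch: roll along one dim using narrow + cat.

    Both ops are device-agnostic, so the data never leaves MPS.
    """
    size = x.size(dim)
    s = shift % size  # normalize negative / oversized shifts
    if s == 0:
        return x.clone()
    # The last `s` slices wrap around to the front.
    tail = x.narrow(dim, size - s, s)
    head = x.narrow(dim, 0, size - s)
    return torch.cat((tail, head), dim=dim)


if __name__ == "__main__":
    device = "mps" if torch.backends.mps.is_available() else "cpu"
    x = torch.arange(12, device=device).view(3, 4)
    # Sanity check against the built-in op
    assert torch.equal(roll_via_cat(x, 1, dim=1), torch.roll(x, 1, dims=1))
```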