pytorch
e4e151d6 - [inductor] Inline ComputedBuffer computation when there are no reads (#102000)

Commit
1 year ago
[inductor] Inline ComputedBuffer computation when there are no reads (#102000)

When inductor compiles the following example,

```python
def flip(x):
    idx = torch.arange(x.shape[0] - 1, -1, -1, device=x.device)
    return x[idx], idx
```

The return of `idx` forces it to be realized into a `ComputedBuffer` and the downstream index call inserts a corresponding load and indirect_indexing:

```python
tmp0 = tl.load(in_ptr0 + (x1), None)
tmp1 = triton_helpers.promote_to_tensor(tmp0)
tl.device_assert((0 <= tmp1) & (tmp1 < 128), "index out of bounds: 0 <= tmp1 < 128")
tmp2 = tl.load(in_ptr1 + (x0 + (128*tmp0)), None)
```

However, if we can inline the index expression from the buffer's computation we instead get direct indexing (and half the loads):

```python
tmp0 = tl.load(in_ptr0 + (127 + ((-1)*x0)), None)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102000
Approved by: https://github.com/lezcano
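
The difference between the two kernels can be illustrated with a small pure-Python model. This is only a sketch, not Inductor code: it assumes a 1-D input of length 128 (matching the constant in the kernels), and the helper names `make_reversed_index`, `gather_indirect`, and `gather_inlined` are hypothetical. It counts memory reads to show why inlining the index expression halves the loads.

```python
N = 128  # assumed length, matching the 128 that appears in the generated kernels


def make_reversed_index(n):
    # Same values as torch.arange(n - 1, -1, -1), built without torch.
    return list(range(n - 1, -1, -1))


def gather_indirect(x, idx):
    """Model of indirect indexing: read the index buffer, then read the data."""
    out, loads = [], 0
    for x0 in range(len(idx)):
        i = idx[x0]  # first read: the materialized index buffer
        loads += 1
        assert 0 <= i < len(x), "index out of bounds"
        out.append(x[i])  # second read: the data at the loaded index
        loads += 1
    return out, loads


def gather_inlined(x):
    """Model of direct indexing: the index is computed from x0, never read."""
    n = len(x)
    out, loads = [], 0
    for x0 in range(n):
        out.append(x[(n - 1) + (-1) * x0])  # single read: the data only
        loads += 1
    return out, loads


x = [float(v) for v in range(N)]
idx = make_reversed_index(N)

out_indirect, loads_indirect = gather_indirect(x, idx)
out_inlined, loads_inlined = gather_inlined(x)
```

Both functions produce the same reversed output. In this model the indirect version performs two reads per element and needs a bounds check, while the inlined version performs one read and needs no check because the index is a known affine function of `x0`.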