pytorch
5fe834af - [inductor] Insert triton barrier before storing to inplace buffers (#100769)

The linked issue demonstrates a Triton bug where a load broadcast over multiple warps may see the result of a store that happens later in the Triton program. The workaround is to add a barrier before storing, which enforces that all warps have already read the data. For example, in `test_embedding_var_mean` we now generate:

```python
tl.debug_barrier()
tl.store(in_out_ptr1 + (tl.broadcast_to(x0, [XBLOCK, 1])), tmp17, None)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100769
Approved by: https://github.com/jansel, https://github.com/ngimel
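To illustrate where the barrier sits relative to the in-place load and store, here is a minimal, self-contained sketch of the pattern. It is not the actual kernel inductor generates for `test_embedding_var_mean`; the kernel name `mean_inplace_kernel`, the buffer layout, and the launch parameters are illustrative assumptions. A single program reduces a buffer and writes the result back into element 0 of that same buffer, so every warp must finish its broadcast load before any warp stores:

```python
import torch
import triton
import triton.language as tl


@triton.jit
def mean_inplace_kernel(in_out_ptr0, rnumel, RBLOCK: tl.constexpr):
    # Hypothetical in-place reduction: the output scalar aliases
    # element 0 of the input buffer (an "in_out" buffer).
    rindex = tl.arange(0, RBLOCK)[None, :]
    # Broadcast load: all warps in this block read the buffer,
    # including offset 0, which the store below overwrites.
    tmp0 = tl.load(in_out_ptr0 + rindex, None)
    tmp1 = tl.sum(tmp0, 1)[:, None] / rnumel
    # The workaround from this commit: force every warp to finish
    # its load before any warp executes the in-place store.
    tl.debug_barrier()
    x0 = tl.zeros([1, 1], tl.int32)  # the single output position
    tl.store(in_out_ptr0 + tl.broadcast_to(x0, [1, 1]), tmp1, None)


# Usage sketch (assumes a CUDA device and triton installed):
buf = torch.randn(1024, device="cuda")
expected = buf.mean()
mean_inplace_kernel[(1,)](buf, 1024, RBLOCK=1024)
torch.testing.assert_close(buf[0], expected)
```

Without the `tl.debug_barrier()`, a warp that has not yet loaded offset 0 could observe the freshly stored mean instead of the original data, which is exactly the reordering the linked issue describes.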