pytorch
4e21fc20 - In inductor triton generated code, avoid masking when numel=1 (#91254)

Commit
2 years ago
This implements an idea from @lezcano: if a generated triton kernel has `xnumel=1`, then `xmask` is just `0<1`, which is always true, so it can be dropped from all `load`/`store`/`where` calls.

The `xnumel=1` case comes up relatively often when code for reductions is being generated. @lezcano reported some performance gains in micro-benchmarks (see comment below), and it is a very simple change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91254
Approved by: https://github.com/jansel, https://github.com/ngimel
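To illustrate the idea, here is a minimal sketch of a code generator that omits masks for dimensions with a single element. It is not the inductor implementation: `mask_terms` and `emit_load` are hypothetical helpers, and the `xmask`/`rmask` and `in_ptr0` names are assumed for illustration only.

```python
def mask_terms(numels):
    """Return the mask names that are still needed.

    `numels` maps an index prefix (e.g. "x", "r") to its number of
    elements. A dimension with numel == 1 has a mask that is always
    true, so it is skipped. (Hypothetical helper, not inductor's API.)
    """
    return [f"{prefix}mask" for prefix, numel in numels.items() if numel != 1]


def emit_load(ptr, index, numels):
    """Build the source text of a `tl.load` call, masking only where needed."""
    masks = mask_terms(numels)
    if masks:
        return f"tl.load({ptr} + {index}, {' & '.join(masks)})"
    # Every dimension has a single element: no mask argument at all.
    return f"tl.load({ptr} + {index})"


# Reduction over 1024 elements into a single output (xnumel=1):
print(emit_load("in_ptr0", "r1", {"x": 1, "r": 1024}))
# Same reduction with 8 outputs (xnumel=8): both masks are kept.
print(emit_load("in_ptr0", "r1 + 1024*x0", {"x": 8, "r": 1024}))
```

In the first call only `rmask` survives, which is the case this commit targets; in the second, both `xmask` and `rmask` are still emitted.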
Author
Fabio Rocha