pytorch
e4036ed7 - [inductor] Lower masked_scatter on CUDA (#108803)

Commit
1 year ago
This decomposes `masked_scatter` into `aten.cumsum` and a single pointwise kernel, similar to what eager mode does. The lowering is applied only on CUDA: on CPU the eager implementation isn't split into two passes like this, so the decomposition would cause a slowdown there.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108803
Approved by: https://github.com/lezcano
ghstack dependencies: #108802
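To illustrate the two-pass idea described in the message, here is a minimal pure-Python sketch on flat lists. It is not the inductor lowering itself: the function name `masked_scatter_sketch` and the list-based inputs are assumptions made for illustration, and `itertools.accumulate` stands in for the role `aten.cumsum` plays.

```python
from itertools import accumulate


def masked_scatter_sketch(self_vals, mask, source):
    """Illustrative only: masked_scatter as a prefix sum plus one pointwise pass.

    Hypothetical helper operating on flat Python lists, not PyTorch tensors.
    """
    # Pass 1: inclusive prefix sum over the mask (stand-in for aten.cumsum).
    inclusive = list(accumulate(int(m) for m in mask))
    if inclusive and inclusive[-1] > len(source):
        raise ValueError("source has fewer elements than the mask selects")

    # Pass 2: pointwise. Each output element depends only on its own position.
    # For a selected position i, inclusive[i] - 1 counts the selected
    # positions before it, which is the index of the source element to take.
    return [
        source[inclusive[i] - 1] if mask[i] else self_vals[i]
        for i in range(len(mask))
    ]


result = masked_scatter_sketch(
    [0, 0, 0, 0, 0],
    [True, False, True, True, False],
    [10, 20, 30, 40],
)
print(result)  # [10, 0, 20, 30, 0]
```

Because the second pass reads only the prefix-sum value at its own index, it can run as one element-wise kernel, which is presumably why this shape suits a GPU.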