pytorch
aed9bee0 - [inductor] Lower masked_scatter on CUDA (#108803)

Commit
1 year ago
[inductor] Lower masked_scatter on CUDA (#108803)

This decomposes masked_scatter into `aten.cumsum` and a single pointwise kernel, which is similar to what is done in eager. I only do this for CUDA because the CPU implementation isn't split into two passes like this, so the decomposition would cause a slowdown there.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108803
Approved by: https://github.com/lezcano
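The two-pass structure described in the commit message can be sketched in plain Python. This is only an illustration of the idea (a prefix sum over the mask, then one independent pointwise pass), not the code added to Inductor; the function name `masked_scatter_two_pass` and the flat-list representation are assumptions made for the example.

```python
from itertools import accumulate


def masked_scatter_two_pass(inp, mask, source):
    """Illustrative sketch: scatter `source` into `inp` wherever `mask` is True."""
    if len(inp) != len(mask):
        raise ValueError("inp and mask must have the same length")

    # Pass 1: inclusive prefix sum of the mask (the role aten.cumsum plays).
    # For a True position i, prefix[i] - 1 is the index into `source`.
    prefix = list(accumulate(int(m) for m in mask))

    needed = prefix[-1] if prefix else 0
    if needed > len(source):
        raise ValueError("source has fewer elements than mask has True entries")

    # Pass 2: pointwise. Each output element depends only on its own
    # position, so it could be computed by a single parallel kernel.
    return [source[p - 1] if m else x for x, m, p in zip(inp, mask, prefix)]


result = masked_scatter_two_pass(
    [0, 0, 0, 0, 0],
    [True, False, True, True, False],
    [10, 20, 30, 40],
)
print(result)  # [10, 0, 20, 30, 0]
```

Because the second pass has no dependency between output elements, all of the sequential work is confined to the prefix sum. This fits the commit message's point that the split suits CUDA but would be slower on CPU, where the operation is not split into two passes.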
  • test/inductor
    • test_torchinductor_codegen_dynamic_shapes.py
  • torch/_inductor
    • decomposition.py
    • inductor_prims.py
    • lowering.py