[inductor] new way to compile f64 libdevice calls (#87189)
Porting over [torchdynamo/#1633](https://github.com/pytorch/torchdynamo/pull/1633)
`torch/_inductor/codegen/triton.py` now defines `libdevice_<function>` variants
of some functions. When registering a pointwise op via `register_pointwise`,
you can request dispatch to those variants for the float64 dtype by setting
`use_libdevice_for_f64=True`.
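A minimal sketch of the dispatch idea, for illustration only: the class, helper names, and signatures below are made up and do not mirror the actual inductor code, only the convention that a `libdevice_<function>` variant is picked for float64 inputs.

```python
import torch


class TritonOverridesSketch:
    """Illustrative string-building codegen helpers (not the real backend)."""

    @staticmethod
    def sqrt(x: str) -> str:
        return f"tl.sqrt({x})"

    @staticmethod
    def libdevice_sqrt(x: str) -> str:
        # libdevice variant, intended for float64 operands
        return f"tl.libdevice.sqrt({x})"


def codegen_sqrt(arg: str, dtype: torch.dtype, use_libdevice_for_f64: bool = True) -> str:
    """Pick the libdevice variant when the input is float64 and the flag is set."""
    if use_libdevice_for_f64 and dtype == torch.float64:
        return TritonOverridesSketch.libdevice_sqrt(arg)
    return TritonOverridesSketch.sqrt(arg)


print(codegen_sqrt("tmp0", torch.float32))  # tl.sqrt(tmp0)
print(codegen_sqrt("tmp0", torch.float64))  # tl.libdevice.sqrt(tmp0)
```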
Other minor changes:
- In Triton, `sigmoid` now codegens to `tl.sigmoid`
- `silu` now comes from a decomposition rather than a lowering
- Some test skips are no longer necessary; they were removed or converted to xfails
Switching to `tl.sigmoid` gives exactly the same performance.
Moving `silu` to a decomposition does not change anything; the same Triton code is generated.
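For illustration only, here is a hand-written kernel in the spirit of what the decomposed `silu` produces (this is not the generated code; the kernel name, signature, and launch parameters are made up): `silu(x) = x * sigmoid(x)`, with the sigmoid emitted as `tl.sigmoid` rather than an expanded `1 / (1 + tl.exp(-x))`.

```python
import triton
import triton.language as tl


@triton.jit
def silu_kernel(in_ptr, out_ptr, numel, BLOCK: tl.constexpr):
    # silu(x) = x * sigmoid(x); sigmoid is emitted directly as tl.sigmoid
    offsets = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < numel
    x = tl.load(in_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x * tl.sigmoid(x), mask=mask)
```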
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87189
Approved by: https://github.com/ngimel