pytorch
be94dba4 - [NNC] fix support for FP16 in CudaCodgen (#44209)

Commit

4 years ago

[NNC] fix support for FP16 in CudaCodgen (#44209)

Summary:
Fixes a bug where FP16 values could be incorrectly cast to a half type that doesn't have a cast operator by inserting the cuda specific cast to float during handling of the Cast node, not as a wrapper around printing Loads and Stores. Two main changes: the HalfChecker now inserts the casts to float explicitly in the IR, and the PrioritizeLoad mutator now consumes both Loads and a Cast which immediately preceded a load.

Tested with test_jit_fuser_te.py and test_tensorexpr.py, plus C++ tests obv.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44209

Reviewed By: izdeby

Differential Revision: D23575577

Pulled By: nickgg

fbshipit-source-id: 808605aeb2af812758f96f9fdc11b07e08053b46
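
The idea in the summary, making the half-to-float cast an explicit IR node instead of a side effect of printing Loads and Stores, can be illustrated with a toy rewrite pass. This is a minimal sketch and not the NNC implementation: `Expr`, `load`, `cast`, `add`, `insertHalfCasts`, and `toCuda` are hypothetical names invented for illustration, and the real HalfChecker and PrioritizeLoad classes work on NNC's own IR. The only real API referenced is the pair of CUDA intrinsics `__half2float` and `__float2half` from `cuda_fp16.h`, which the toy printer emits as plain text.

```cpp
#include <memory>
#include <string>
#include <utility>

// Toy IR, hypothetical and far simpler than NNC's real node classes.
enum class DType { Half, Float };

struct Expr {
  enum class Kind { Load, Cast, Add } kind;
  DType dtype;
  std::string buf;                 // Load only: buffer name
  std::shared_ptr<Expr> lhs, rhs;  // Cast uses lhs; Add uses both
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr load(const std::string& buf, DType t) {
  return std::make_shared<Expr>(Expr{Expr::Kind::Load, t, buf, nullptr, nullptr});
}
ExprPtr cast(DType t, ExprPtr src) {
  return std::make_shared<Expr>(Expr{Expr::Kind::Cast, t, "", std::move(src), nullptr});
}
ExprPtr add(ExprPtr a, ExprPtr b) {
  DType t = a->dtype;  // read before the operands are moved
  return std::make_shared<Expr>(Expr{Expr::Kind::Add, t, "", std::move(a), std::move(b)});
}

// Assumed analogue of the HalfChecker change: every Half-typed Load is
// wrapped in an explicit Cast to Float, so the cast lives in the IR.
ExprPtr insertHalfCasts(const ExprPtr& e) {
  switch (e->kind) {
    case Expr::Kind::Load:
      return e->dtype == DType::Half ? cast(DType::Float, e) : e;
    case Expr::Kind::Cast:
      // A Cast directly over a Load is already explicit; leave it alone.
      if (e->lhs->kind == Expr::Kind::Load) return e;
      return cast(e->dtype, insertHalfCasts(e->lhs));
    case Expr::Kind::Add:
      return add(insertHalfCasts(e->lhs), insertHalfCasts(e->rhs));
  }
  return e;
}

// Toy printer: the CUDA-specific conversion is emitted only while handling
// a Cast node, never as a wrapper around a Load.
std::string toCuda(const ExprPtr& e) {
  switch (e->kind) {
    case Expr::Kind::Load:
      return e->buf + "[i]";
    case Expr::Kind::Cast:
      if (e->dtype == DType::Float && e->lhs->dtype == DType::Half)
        return "__half2float(" + toCuda(e->lhs) + ")";
      if (e->dtype == DType::Half && e->lhs->dtype == DType::Float)
        return "__float2half(" + toCuda(e->lhs) + ")";
      return toCuda(e->lhs);  // same-type cast prints nothing extra
    case Expr::Kind::Add:
      return "(" + toCuda(e->lhs) + " + " + toCuda(e->rhs) + ")";
  }
  return "";
}
```

In this sketch, calling `toCuda(insertHalfCasts(add(load("a", DType::Half), load("b", DType::Half))))` yields an expression in which each half load is wrapped in `__half2float`, while a Float load passes through unchanged. Because the Cast and its Load form a single subtree, a later pass that hoists loads (the role the summary gives to PrioritizeLoad) can move the pair together.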

Author

nickgg

Committer

facebook-github-bot

Parents

9f54bcc5
