[NNC] Fix two more bugs in Cuda Half support (#46129)
Summary:
Fixes two bugs reported in https://github.com/pytorch/pytorch/issues/45953 that could break the NNC CUDA codegen when using Half floats:
1. The Registerizer generates new scalars with the type of the load being replaced, and has no CUDA-specific logic to avoid the Half type. I've added a quick mutator to coerce these scalars to Float, similar to the existing load casting rules.
2. We weren't handling explicit casts to Half inserted by the user (in the report, the user is the JIT). This is addressed by replacing these with casts to Float, since that's the type we do Half math in.
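
For illustration only, the two rewrites above can be sketched on a toy expression tree. This is a minimal sketch and not the code in this PR: `Dtype`, `Expr`, `makeVar`, `makeCast` and `promoteHalfToFloat` are hypothetical names invented for the example and do not correspond to the NNC IR classes or mutators. It only shows the idea that a Half-typed scalar and an explicit cast to Half both end up typed as Float.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy stand-ins for illustration only; these are NOT the NNC IR classes.
enum class Dtype { Half, Float, Int };

struct Expr {
  enum class Kind { Var, Cast };
  Kind kind;
  Dtype dtype;                    // result type of this node
  std::string name;               // used by Var nodes only
  std::shared_ptr<Expr> operand;  // used by Cast nodes only
};

using ExprPtr = std::shared_ptr<Expr>;

ExprPtr makeVar(std::string name, Dtype dtype) {
  return std::make_shared<Expr>(
      Expr{Expr::Kind::Var, dtype, std::move(name), nullptr});
}

ExprPtr makeCast(Dtype dtype, ExprPtr operand) {
  assert(operand != nullptr);  // a cast always needs something to cast
  return std::make_shared<Expr>(
      Expr{Expr::Kind::Cast, dtype, "", std::move(operand)});
}

// Hypothetical rewrite: builds a new tree in which every Half-typed
// scalar and every cast to Half is retyped to Float. The input tree
// is left unmodified.
ExprPtr promoteHalfToFloat(const ExprPtr& e) {
  const Dtype dtype = (e->dtype == Dtype::Half) ? Dtype::Float : e->dtype;
  if (e->kind == Expr::Kind::Var) {
    return makeVar(e->name, dtype);
  }
  return makeCast(dtype, promoteHalfToFloat(e->operand));
}
```

In this sketch, a scalar built with `makeVar("tmp", Dtype::Half)` comes back typed as `Dtype::Float`, and `makeCast(Dtype::Half, x)` comes back as a cast to `Dtype::Float`. Nodes of other types keep their type.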
Fixes https://github.com/pytorch/pytorch/issues/45953.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46129
Reviewed By: glaringlee
Differential Revision: D24253639
Pulled By: nickgg
fbshipit-source-id: 3fef826eab00355c81edcfabb1030332cae595ac