[NNC] Fix an issue in CUDA fusion with fp16 scalar vars coerced to float (#47229)
Summary:
Fixes an issue where fp16 scalars created by the registerizer could be referenced as floats, causing invalid conversions that crashed NVRTC compilation. I also noticed that we were inserting patterns like `float(half(float(X)))`, so I added a pass inside the CudaHalfScalarRewriter to collapse them.
Fixes https://github.com/pytorch/pytorch/issues/47138
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47229
Reviewed By: agolynski
Differential Revision: D24706475
Pulled By: nickgg
fbshipit-source-id: 9df72bbbf203353009e98b9cce7ab735efff8b21