Nvfuser - Type Promotion Fix
Fixes type promotion failures reported in [Issue 76046](https://github.com/pytorch/pytorch/issues/76046)
1. Updated the nvfuser type promotion rule for codegen kernels;
2. Updated casting of nvfuser kernel outputs to respect the profiled/TorchScript scalar type (see the sketch below);
3. Updated type_inference.cpp to only update device/scalar_type when profiling information is missing.
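A minimal repro sketch of the behavior item 2 targets, assuming a CUDA build with nvfuser enabled; the function, shapes, and warm-up count below are illustrative and not taken from the test suite:

```python
import torch

def fn(x):
    # softmax with an explicit dtype: eager mode computes in, and returns,
    # torch.float32 regardless of the input dtype
    return torch.nn.functional.softmax(x, dim=-1, dtype=torch.float32)

scripted = torch.jit.script(fn)
x = torch.randn(4, 8, device="cuda", dtype=torch.float16)

# warm-up runs so the profiling executor records dtypes and the fuser kicks in
for _ in range(3):
    out = scripted(x)

# the fused kernel's output dtype should match the eager / profiled dtype
assert out.dtype == fn(x).dtype == torch.float32
```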
Additional type promotion fixes for the following tests (the eager-mode semantics they exercise are sketched after the list):
- test_nvfuser_correctness_softmax_with_dtype_cuda_float32
- test_nvfuser_correctness_softmax_with_dtype_cuda_bfloat16
- test_nvfuser_correctness_softmax_with_dtype_cuda_float16
- test_nvfuser_correctness_log_softmax_dtype_cuda_bfloat16
- test_nvfuser_correctness_log_softmax_dtype_cuda_bool
- test_nvfuser_correctness_log_softmax_dtype_cuda_float16
- test_nvfuser_correctness_log_softmax_dtype_cuda_float32
- test_nvfuser_correctness_sum_cuda_int32
- test_nvfuser_correctness_sum_to_size_cuda_int32
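For reference, a sketch of the eager-mode promotion rules these tests check against (assumed semantics of the stock PyTorch ops, not code from this PR):

```python
import torch

# sum over an integral tensor promotes to the default integer dtype (int64)
i32 = torch.arange(6, dtype=torch.int32).reshape(2, 3)
assert i32.sum().dtype == torch.int64

# log_softmax with an explicit dtype casts the input before computing,
# so even a bool input yields a float32 result
b = torch.ones(2, 3, dtype=torch.bool)
out = torch.nn.functional.log_softmax(b, dim=-1, dtype=torch.float32)
assert out.dtype == torch.float32
```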
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76343
Approved by: https://github.com/jjsjann123, https://github.com/mruberry