Add loss operators to fp32 cast policy of AutocastCPU (#81689)
### Description
Add loss operators to the fp32 cast policy of AutocastCPU to improve the accuracy of BFloat16 training. This has no performance impact on fp32 and only a slight performance impact on bf16 training.
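
A minimal sketch of the intended behavior, assuming a PyTorch build that includes this change and that `mse_loss` is one of the loss operators covered by the fp32 cast policy. The model, shapes, and tensor names are illustrative only and are not taken from this PR.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy model and data, for illustration only.
model = torch.nn.Linear(8, 4)
x = torch.randn(16, 8)
target = torch.randn(16, 4)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    # linear is on the lower-precision list, so the output is bf16.
    out = model(x)
    # Assumption: mse_loss is covered by the fp32 cast policy, so both
    # bf16 inputs are cast up to fp32 before the loss is computed.
    loss = F.mse_loss(out, target.to(torch.bfloat16))

print(out.dtype)   # dtype of the forward output under autocast
print(loss.dtype)  # dtype of the loss under autocast
```

With the policy in place, the forward pass still runs in bf16 and only the loss computation is promoted to fp32, which is where the small bf16 overhead comes from.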
#### Remove _convolution
conv transpose does not fully support bf16 yet, and in graph mode it is replaced by _convolution. If _convolution is in the lower-precision cast policy, it throws dtype-related errors, so _convolution still needs to be removed from that policy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81689
Approved by: https://github.com/malfet