Avoid saving self for `softmax` and `log_softmax` (#65242)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64000
- Updates the double backward formula to compute the gradient in terms of the output instead of self, so self no longer needs to be saved for backward (see the sketch after this list)
- ~~In some of the error messages, we still refer to the dtype of the input, even though we are now checking the dtype of the output~~
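As a minimal sketch (not the PR's actual derivative formulas), the snippet below illustrates why the forward output alone is enough for the backward of `softmax` and `log_softmax`: with `y = softmax(x)`, the gradient is `y * (grad_y - (y * grad_y).sum(dim, keepdim=True))`, and with `y = log_softmax(x)` it is `grad_y - y.exp() * grad_y.sum(dim, keepdim=True)`. The tensor shapes, `dim`, and variable names are illustrative assumptions.

```python
import torch

# Illustrative check that softmax / log_softmax backward can be written
# purely in terms of the forward output, so `self` need not be saved.
dim = -1
x = torch.randn(3, 5, dtype=torch.double, requires_grad=True)
grad_y = torch.randn(3, 5, dtype=torch.double)

# softmax: grad_x = y * (grad_y - (y * grad_y).sum(dim, keepdim=True))
y = torch.softmax(x, dim)
manual = y.detach() * (grad_y - (y.detach() * grad_y).sum(dim, keepdim=True))
auto, = torch.autograd.grad(y, x, grad_y)
print(torch.allclose(manual, auto))  # True

# log_softmax: grad_x = grad_y - y.exp() * grad_y.sum(dim, keepdim=True)
x2 = x.detach().clone().requires_grad_()
y2 = torch.log_softmax(x2, dim)
manual2 = grad_y - y2.detach().exp() * grad_y.sum(dim, keepdim=True)
auto2, = torch.autograd.grad(y2, x2, grad_y)
print(torch.allclose(manual2, auto2))  # True
```

Because both expressions depend only on the output and the incoming gradient, the autograd formulas can save the output (which is needed anyway) rather than the input.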
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65242
Reviewed By: albanD
Differential Revision: D31238123
Pulled By: soulitzer
fbshipit-source-id: afd319d3676d9ef8d81607e0e8c2a3e6d09f68e4