Update shallow_copy_and_detach for nested tensor impls (#83002)
# Summary
This change fixes a bug encountered while adding more backward formulas for nested tensor ops. When a derivative is defined that stores the "result" for use in the backward pass, the output of the forward op is saved with:
```cpp
if (grad_fn) {
grad_fn->result_ = SavedVariable(result, true);
}
```
SavedVariable calls a series of functions that eventually invoke shallow_copy_and_detach. When https://github.com/pytorch/pytorch/blob/c179597753f20089056115542d3b9c74ab8b864c/c10/core/TensorImpl.cpp#L533 is hit, this calls sizes_custom(), which is not implemented for nested tensors and therefore errors. I also noticed that, since nested tensors use a different storage format (two tensors rather than `storage_`), we should actually be calling the NestedTensorImpl constructor.
This PR overrides shallow_copy_and_detach in the derived class and ensures that shallow copying works correctly for nested tensors.
## Update
- Added the softmax derivative in this PR, since it is a direct use case that was blocked by shallow_copy_and_detach not working correctly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83002
Approved by: https://github.com/albanD