Fix backward compatibility of torch.amp.custom_fwd for PyTorch < 2.4 (#7920)
`torch.amp.custom_fwd` was introduced in PyTorch 2.4, so installing
DeepSpeed from source with an older PyTorch fails because `setup.py`
triggers an import of the function.
This PR adds a fallback to `torch.cuda.amp.custom_fwd`, which is used
when the installed PyTorch is older than 2.4.
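
For illustration only, the fallback can be sketched roughly as below. This is not the code from the PR: the helper name `resolve_custom_fwd`, the feature check via `hasattr`, and the default `device_type="cuda"` are assumptions made for the example, and the `SimpleNamespace` objects are stand-ins so the snippet runs without PyTorch installed.

```python
import functools
from types import SimpleNamespace


def resolve_custom_fwd(torch_module, device_type="cuda"):
    """Pick a custom_fwd decorator that exists on the given torch module.

    Hypothetical helper for illustration; not part of DeepSpeed or PyTorch.
    """
    amp = getattr(torch_module, "amp", None)
    if hasattr(amp, "custom_fwd"):
        # PyTorch >= 2.4: torch.amp.custom_fwd requires a device_type keyword.
        return functools.partial(amp.custom_fwd, device_type=device_type)
    # PyTorch < 2.4: only the CUDA-specific decorator is available.
    return torch_module.cuda.amp.custom_fwd


# Stand-ins that mimic the two API shapes (assumed signatures, no real torch).
def _new_custom_fwd(fwd=None, *, device_type, cast_inputs=None):
    return ("torch.amp", device_type)


def _old_custom_fwd(fwd=None, *, cast_inputs=None):
    return ("torch.cuda.amp",)


_cuda = SimpleNamespace(amp=SimpleNamespace(custom_fwd=_old_custom_fwd))
fake_new_torch = SimpleNamespace(amp=SimpleNamespace(custom_fwd=_new_custom_fwd), cuda=_cuda)
fake_old_torch = SimpleNamespace(amp=SimpleNamespace(), cuda=_cuda)
```

With a real installation one would pass the imported `torch` module, e.g. `custom_fwd = resolve_custom_fwd(torch)`, and then apply `@custom_fwd` to the `forward` of a `torch.autograd.Function`. Checking for the attribute rather than parsing the version string is a design choice of this sketch; the PR may detect the version differently.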
---------
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>