Add bfloat16 support to foreach torch empty (#90437)
# Summary
When training a bfloat16 model with `SGD(..., foreach=True)`, I found that training errored because CUDA had no bfloat16 support for `empty`.
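The scenario above can be reproduced roughly as follows. This is a minimal sketch, not taken from the PR: the toy `nn.Linear` model, the layer sizes, the learning rate and the momentum value are illustrative assumptions. On CUDA it exercises the bfloat16 foreach path this change targets. Without a GPU it falls back to CPU and only checks that a foreach SGD step updates bfloat16 parameters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # deterministic init and inputs for the sketch

# Use CUDA when available (the path this PR fixes); otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Illustrative toy model, cast entirely to bfloat16.
model = nn.Linear(4, 2).to(device=device, dtype=torch.bfloat16)

# foreach=True selects the multi-tensor (foreach) SGD implementation.
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, foreach=True)

x = torch.randn(8, 4, device=device, dtype=torch.bfloat16)
loss = model(x).float().pow(2).mean()  # compute the loss in float32 for stability

opt.zero_grad()
loss.backward()

# Snapshot the parameters, take one optimizer step, and check they moved.
before = [p.detach().clone() for p in model.parameters()]
opt.step()
changed = any(not torch.equal(b, p) for b, p in zip(before, model.parameters()))
print(f"device={device}, parameters updated: {changed}")
```

Casting the loss to float32 before reducing keeps the sketch focused on the optimizer step rather than on bfloat16 reduction precision.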
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90437
Approved by: https://github.com/soumith