torch.flip, torch.fliplr, and torch.flipud: Half support for CPU and BFloat16 support for CPU & CUDA (#49895)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49889
Also adds BFloat16 support for these ops on both CPU and CUDA.
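A minimal sketch of what this change enables (assuming a PyTorch build that includes it): `torch.flip`, `torch.fliplr`, and `torch.flipud` now accept `torch.float16` tensors on CPU and `torch.bfloat16` tensors on CPU (and CUDA), where previously these dtypes raised a "not implemented" error on CPU.

```python
import torch

# torch.flip reverses a tensor along the given dims; with this change,
# Half (float16) works on CPU and BFloat16 works on both CPU and CUDA.
x = torch.arange(6, dtype=torch.bfloat16).reshape(2, 3)
# x = [[0, 1, 2],
#      [3, 4, 5]]

flipped = torch.flip(x, dims=[0, 1])  # reverse both rows and columns
lr = torch.fliplr(x)                  # reverse along dim 1 (columns)
ud = torch.flipud(x)                  # reverse along dim 0 (rows)

# Half on CPU, which issue #49889 reported as unsupported:
half = torch.flip(torch.arange(4, dtype=torch.float16), dims=[0])
```

Here `flipped` is `[[5, 4, 3], [2, 1, 0]]` and `half` is `[3., 2., 1., 0.]`, all in the low-precision dtype of the input.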
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49895
Reviewed By: mrshenli
Differential Revision: D25746272
Pulled By: mruberry
fbshipit-source-id: 0b6a9bc13ae60c22729a0aea002ed857c36f14ff