Enabled `roll` & `diag` for BFloat16 dtype on CUDA (#57916)
Summary:
Enabled `torch.roll` and `torch.diag` for the BFloat16 dtype on CUDA.
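
A minimal usage sketch of the two ops on a BFloat16 tensor, not taken from the PR itself. The tensor values and the CPU fallback are assumptions made so the snippet runs on machines without a GPU; the change described here concerns the CUDA path only.

```python
import torch

# Assumption: fall back to CPU when no CUDA device is available,
# so the sketch stays runnable; the PR targets the CUDA kernels.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Small integers are exactly representable in bfloat16.
x = torch.arange(6, dtype=torch.float32).reshape(2, 3)
x = x.to(device=device, dtype=torch.bfloat16)

# Shift each row one position to the right, wrapping around.
rolled = torch.roll(x, shifts=1, dims=1)

# Extract the main diagonal of the 2x3 matrix.
diagonal = torch.diag(x)

print(rolled.dtype, rolled.float().cpu().tolist())
print(diagonal.dtype, diagonal.float().cpu().tolist())
```

Both results keep the `torch.bfloat16` dtype; the conversion to float before printing is only for readable output.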
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57916
Reviewed By: agolynski
Differential Revision: D28393534
Pulled By: ngimel
fbshipit-source-id: fc1d8555b23a75f8b24c2ad826f89cd4e14cf487