`topk` on CUDA supports `bfloat16` (#59977)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56176 via https://github.com/pytorch/pytorch/issues/58196
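
A minimal sketch of the call this change is meant to enable, not taken from the PR itself. The helper name `topk_bfloat16` and the sample values are hypothetical; the CPU fallback is an assumption added only so the snippet runs on machines without a GPU, and is outside the scope of this change.

```python
import torch


def topk_bfloat16(values, k):
    """Hypothetical helper: run torch.topk on a bfloat16 tensor.

    Uses CUDA when available (the case this change targets); otherwise
    falls back to CPU purely so the sketch stays runnable.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.tensor(values, dtype=torch.float32).to(device=device, dtype=torch.bfloat16)
    top_values, top_indices = torch.topk(x, k)
    # Convert back to plain Python lists for easy inspection.
    return top_values.float().cpu().tolist(), top_indices.cpu().tolist()


if __name__ == "__main__":
    # Sample values are exactly representable in bfloat16.
    print(topk_bfloat16([1.0, 4.0, 2.0, 8.0], k=2))
```

The sample values are small powers of two, so converting to `bfloat16` introduces no rounding and the result can be compared exactly.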
CC zasdfgbnm ngimel ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59977
Reviewed By: mrshenli
Differential Revision: D29315018
Pulled By: ngimel
fbshipit-source-id: 0a87e7f155a97225fc6b2ec5dc0dc38a23156b41