[ROCm] Enable BFloat16 type for pooling ops (#34166)
Summary:
This PR enables the bfloat16 type for pooling ops on ROCm. It also adds a bfloat16 implementation of atomicAdd, since the pooling ops use it.
Note: The changes inside the lambda function blocks are indentation only, since the blocks are now wrapped inside the `AT_SKIP_BFLOAT16_IF_NOT_ROCM` macro.
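For readers unfamiliar with why a 16-bit type needs its own atomicAdd: GPU hardware generally offers compare-and-swap on 32-bit words, so a 16-bit add is commonly emulated with a CAS loop over the enclosing word. The following is a minimal host-side C++ sketch of that idea, not the code in this PR. It assumes `std::atomic<uint32_t>` as a stand-in for the device CAS, and the names `bf16_to_float`, `float_to_bf16`, and `atomic_add_bf16` are hypothetical helpers introduced only for illustration.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Hypothetical helper: widen raw bfloat16 bits to float (bf16 is the top 16 bits of a float).
static inline float bf16_to_float(std::uint16_t bits) {
    std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
    float out;
    std::memcpy(&out, &wide, sizeof(out));
    return out;
}

// Hypothetical helper: narrow float to bfloat16 bits with round-to-nearest-even.
// NaN handling is omitted to keep the sketch short.
static inline std::uint16_t float_to_bf16(float value) {
    std::uint32_t wide;
    std::memcpy(&wide, &value, sizeof(wide));
    std::uint32_t rounding_bias = 0x7FFFu + ((wide >> 16) & 1u);
    return static_cast<std::uint16_t>((wide + rounding_bias) >> 16);
}

// Atomically add `value` to the bfloat16 stored in one half of a 32-bit word.
// Assumption: std::atomic stands in for a device-side 32-bit compare-and-swap.
static inline void atomic_add_bf16(std::atomic<std::uint32_t>& word,
                                   bool high_half, float value) {
    std::uint32_t old_word = word.load();
    std::uint32_t new_word;
    do {
        // Extract the 16-bit lane we own from the current snapshot.
        std::uint16_t old_bits = static_cast<std::uint16_t>(
            high_half ? (old_word >> 16) : (old_word & 0xFFFFu));
        // Do the addition in float, then narrow back to bfloat16.
        std::uint16_t sum_bits = float_to_bf16(bf16_to_float(old_bits) + value);
        // Splice the result back, leaving the neighbouring lane untouched.
        new_word = high_half
            ? ((old_word & 0x0000FFFFu) | (static_cast<std::uint32_t>(sum_bits) << 16))
            : ((old_word & 0xFFFF0000u) | static_cast<std::uint32_t>(sum_bits));
        // On failure, compare_exchange_weak reloads old_word and the loop retries.
    } while (!word.compare_exchange_weak(old_word, new_word));
}
```

The retry loop is what keeps the neighbouring 16-bit value intact when another thread updates it concurrently: the swap only succeeds if the whole 32-bit word is unchanged since it was read.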
iotamudelta ezyang bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34166
Differential Revision: D20263421
Pulled By: ezyang
fbshipit-source-id: 3f4199ec57522e638ec29f45e22c6ec919b7816d