DeepSpeed
dbc1b075 - Fix ROCm BF16 conversion intrinsics in inference v2 (#7843) (#7846)

Fix ROCm BF16 conversion intrinsics in inference v2 (#7843) (#7846)

Fixes #7843

On HIP/ROCm (the AMD path), several CUDA-style BF16 intrinsics used in the code are not provided, e.g.:

- `__ll2bfloat16_rn`
- `__int2bfloat16_rn`
- `__short2bfloat16_rn`
- `__bfloat162uint_rn`

This causes compilation errors on HIP platforms. This PR introduces fallback paths using functions available on the HIP platform, mirroring the [conversion utils in csrc](https://github.com/deepspeedai/DeepSpeed/blob/2c362837b0ef906ea7e7506bab3a625faa945cdd/csrc/includes/conversion_utils.h#L351).

The conversion paths are:

- int/uint -> bf16: convert to float (or double for 64-bit), then to bf16.
- bf16 -> int/uint: convert bf16 to float, then to the integer type.
- float -> bf16: build the bf16 value via supported HIP helpers.

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
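The conversion paths above can be sketched in host-side C++ with a minimal software BF16 (the upper 16 bits of an IEEE-754 float, rounded to nearest-even, which is what the `_rn` suffix on the CUDA intrinsics denotes). This is an illustrative model only, not the PR's actual HIP code; the function names `float_to_bf16`, `int_to_bf16`, `ll_to_bf16`, and `bf16_to_uint` are hypothetical stand-ins for the missing intrinsics.

```cpp
#include <cstdint>
#include <cstring>

// Software bf16: keep the top 16 bits of a float, rounding to
// nearest-even. (NaN handling is omitted in this sketch.)
static uint16_t float_to_bf16(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<uint16_t>((bits + rounding_bias) >> 16);
}

static float bf16_to_float(uint16_t h) {
    uint32_t bits = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

// int/uint -> bf16: go through float first (stand-in for __int2bfloat16_rn).
static uint16_t int_to_bf16(int32_t v) {
    return float_to_bf16(static_cast<float>(v));
}

// 64-bit -> bf16: widen through double to avoid losing precision in the
// intermediate step (stand-in for __ll2bfloat16_rn).
static uint16_t ll_to_bf16(int64_t v) {
    return float_to_bf16(static_cast<float>(static_cast<double>(v)));
}

// bf16 -> uint: widen to float, then convert to the integer type
// (stand-in for __bfloat162uint_rn).
static uint32_t bf16_to_uint(uint16_t h) {
    return static_cast<uint32_t>(bf16_to_float(h));
}
```

Small integers survive the round trip exactly because they are representable in bf16's 8-bit mantissa, e.g. `bf16_to_uint(int_to_bf16(42))` yields `42`.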