[AMD][ROCm] Improve support of AMD (#7448)
This patch fixes several build issues in the CUDA part of the
DeepSpeed library.
The share of passing unit tests improved (tested on RDNA hardware,
gfx110x and gfx12x).
Before:
collected 5298 items / 15 skipped
2773 failed, 862 passed, 1665 skipped, 13 errors
After:
collected 5851 items / 11 skipped
4187 failed, 1373 passed, 292 skipped, 10 errors
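For readers who want the pass rates behind these summaries, here is a minimal sketch that derives them from the counts above. The choice of denominator (all reported outcomes: failed + passed + skipped + errors) is an assumption of this sketch, not something the test run defines, and `pass_rate` is a hypothetical helper name.

```python
# Outcome counts copied from the two pytest summaries above.
before = {"failed": 2773, "passed": 862, "skipped": 1665, "errors": 13}
after = {"failed": 4187, "passed": 1373, "skipped": 292, "errors": 10}


def pass_rate(counts):
    """Return passed tests as a percentage of all reported outcomes.

    Assumption: the denominator is failed + passed + skipped + errors.
    """
    return 100.0 * counts["passed"] / sum(counts.values())


for label, counts in (("before", before), ("after", after)):
    total = sum(counts.values())
    print(f"{label}: {counts['passed']} of {total} passed "
          f"({pass_rate(counts):.1f}%)")
```

Under that assumption the pass rate goes from roughly 16% to roughly 23%. The failure count also rises because far fewer tests are skipped (1665 before, 292 after), so more tests actually run.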
Note on testing **fp_quantizer (DS_BUILD_FP_QUANTIZER)** via
`tests/unit/ops/fp_quantizer/test_fp_quant.py`: this test depends on
QPyTorch, which must be patched before running on AMD. Please apply
https://github.com/Tiiiger/QPyTorch/pull/71 first.
---------
Signed-off-by: Artem Kuzmitckii <artem.kuzmitckii@amd.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>