Rocm warp size fix (#5402)
This PR enables building the below extensions for AMD GPUs with warp
size 32.
- transformer_inference
- quantizer
- random_ltd
This PR works stand-alone for torch version <=2.0. For the latest
versions, https://github.com/microsoft/DeepSpeed/pull/5401 is required
to be merged in addition to this PR.
Unit test results (rocm/pytorch:rocm6.1_ubuntu20.04_py3.9_pytorch_2.1.2)
on NAVI3x:
**transformer_inference:**
pytest --color=yes --durations=0 --verbose -s -m "inference_ops" -rF -n
4 unit/ops/transformer/inference
Before this PR:
===== 674 failed, 622 skipped, 8 warnings, 1728 errors in 69.37s
(0:01:09) =====
After this PR:
========== 476 failed, 1062 passed, 1486 skipped, 8 warnings in 9.31s
==========
**quantizer:**
pytest --color=yes --durations=0 --verbose -s -m "inference_ops" -rF -n
4 unit/ops/quantizer
Before this PR:
==== 244 failed, 8 warnings in 30.53s ====
After this PR:
====== 186 failed, 58 passed, 8 warnings in 8.89s ======
I could not find random_ltd related unit tests to run.
Fixes:
https://github.com/microsoft/DeepSpeed/issues/4753
https://github.com/microsoft/DeepSpeed/issues/5474
https://github.com/ROCm/DeepSpeed/issues/68
cc: @jithunnair-amd
---------
Co-authored-by: rraminen@amd.com <rraminen>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>