DeepSpeed
069ec31c - Fix fp8 gemm (#7265)

Commit

57 days ago

Fix fp8 gemm (#7265)

This PR addresses issue https://github.com/deepspeedai/DeepSpeed/issues/7236.

I may have reverted some of the recent changes introduced in this [PR](https://github.com/deepspeedai/DeepSpeed/pull/6932); this was necessary to remove a misaligned address issue in the CUDA kernel. I will get back to this and try to make the necessary changes for the other pass.

cc: @mrwyattii @jeffra

---------

Co-authored-by: Reza Yazdani <reza.yazdani@snowflake.com>
Co-authored-by: Reza Yazdani <rezay@microsoft.com>
Co-authored-by: Jeff Rasley <jeffra45@gmail.com>
Co-authored-by: Michael Wyatt <michael.wyatt@snowflake.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>

References

#7265 - Fix fp8 gemm

Author

RezaYazdaniAminabadi

Parents

e1ba9e61

Files (4)

csrc/fp_quantizer
- fp_quantize.cpp
deepspeed/ops/fp_quantizer
- fp8_gemm_triton.py
- quantize.py
op_builder
- fp_quantizer.py
