DeepSpeed
069ec31c - Fix fp8 gemm (#7265)

Commit
57 days ago
Fix fp8 gemm (#7265)

This PR addresses this issue https://github.com/deepspeedai/DeepSpeed/issues/7236. I might have reverted some of the recent changes introduced in this [PR](https://github.com/deepspeedai/DeepSpeed/pull/6932), which was necessary to remove a misaligned address issue on the CUDA kernel. I will get back to this and try to make the necessary changes for the other pass.

cc: @mrwyattii @jeffra

---------

Co-authored-by: Reza Yazdani <reza.yazdani@snowflake.com>
Co-authored-by: Reza Yazdani <rezay@microsoft.com>
Co-authored-by: Jeff Rasley <jeffra45@gmail.com>
Co-authored-by: Michael Wyatt <michael.wyatt@snowflake.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
  • csrc/fp_quantizer
    • fp_quantize.cpp
  • deepspeed/ops/fp_quantizer
    • fp8_gemm_triton.py
    • quantize.py
  • op_builder
    • fp_quantizer.py