vllm
[AMD][FP8] Using MI300 FP8 format on ROCm for block_quant
#12134
Merged

gshtras commented 164 days ago (edited)

Requantizing FP8 weights into the NANOO format on the ROCm platform.
Conditionally using e4m3fnuz where appropriate.

This is essential for DeepSeek V3 support.
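The core idea is that MI300 natively uses the e4m3fnuz (NANOO) FP8 encoding rather than the OCP e4m3fn encoding most checkpoints ship with: the same bit pattern decodes to half the value in e4m3fnuz, and the e4m3fn negative-zero pattern (0x80) decodes to NaN. A minimal sketch of such a conversion is below, assuming PyTorch's float8 dtypes; the helper name and its exact placement in the block-quant path are illustrative, not necessarily how this PR wires it in.

```python
import torch


def requantize_e4m3fn_to_e4m3fnuz(
    weight: torch.Tensor,        # FP8 weights stored as torch.float8_e4m3fn
    weight_scale: torch.Tensor,  # per-block (or per-tensor) dequant scales
):
    """Reinterpret OCP e4m3fn weights as MI300-native e4m3fnuz (NANOO).

    The same bit pattern represents half the value in e4m3fnuz, so the
    scale is doubled to keep weight * scale unchanged after dequantization.
    """
    assert weight.dtype == torch.float8_e4m3fn
    as_int8 = weight.view(torch.int8)
    # 0x80 (-128) is negative zero in e4m3fn but NaN in e4m3fnuz; clear it.
    as_int8[as_int8 == -128] = 0
    weight_fnuz = as_int8.view(torch.float8_e4m3fnuz)
    return weight_fnuz, weight_scale * 2.0
```

On ROCm the block-quant GEMMs would then operate on the fnuz tensors directly, matching the hardware's native FP8 format instead of relying on the e4m3fn interpretation.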

gshtras added commit 1d54e3cb: Requantizing fp8 weights into NANOO format on rocm platform. Conditio…
github-actions commented 164 days ago

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which starts a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

mgoin approved these changes on 2025-01-17
mgoin commented 163 days ago

Makes sense and LGTM. With these changes have you validated DSv3 on MI300?

mgoin added the ready label
gshtras commented 163 days ago (👍 1)

Makes sense and LGTM. With these changes have you validated DSv3 on MI300?

In combination with the flash attention version from ROCm/vllm, which will be coming in the following weeks, together with Llama 3.2 support.

Converting the weights this way was required to turn NaNs into a reasonable PPL score.

mgoin enabled auto-merge (squash) 163 days ago
mgoin merged b5b57e30 into main 163 days ago
