vllm
fa0050db - [Core] Default to using per_token quantization for fp8 when cutlass is supported. (#8651)

Commit

336 days ago

[Core] Default to using per_token quantization for fp8 when cutlass is supported. (#8651)

Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
Co-authored-by: mgoin <michael@neuralmagic.com>

References

#8651 - [Core] Default to using per_token quantization for fp8 when cutlass is supported.

Author

elfiegg

Parents

cd9d06fb
