vllm
fa0050db
- [Core] Default to using per_token quantization for fp8 when cutlass is supported. (#8651)
Commit
336 days ago
[Core] Default to using per_token quantization for fp8 when cutlass is supported. (#8651)

Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
References
#8651 - [Core] Default to using per_token quantization for fp8 when cutlass is supported.
Author
elfiegg
Parents
cd9d06fb