vllm
98e7f223 - enable skipping of SW attention layers when using FP8 KV cache (#33695)
Commit
38 days ago
enable skipping of SW attention layers when using FP8 KV cache (#33695)

Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
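A minimal sketch of the idea behind this commit: with an FP8-quantized KV cache, sliding-window (SW) attention layers may now be skipped rather than force-included. All names below (`Layer`, `layers_to_process`, `kv_cache_dtype`, `skip_sw_layers`) are illustrative assumptions, not vLLM's actual API.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Layer:
    index: int
    sliding_window: bool  # True for sliding-window attention layers

def layers_to_process(layers: List[Layer], kv_cache_dtype: str,
                      skip_sw_layers: bool) -> List[int]:
    """Return indices of layers participating in the KV-cache pass.

    Hypothetical logic: previously an FP8 KV cache implied every layer
    had to participate; this change lets SW layers be skipped.
    """
    if kv_cache_dtype == "fp8" and skip_sw_layers:
        # Keep only full-attention layers; SW layers are skipped.
        return [l.index for l in layers if not l.sliding_window]
    # Otherwise every layer participates.
    return [l.index for l in layers]

layers = [Layer(0, False), Layer(1, True), Layer(2, False), Layer(3, True)]
print(layers_to_process(layers, "fp8", skip_sw_layers=True))   # full-attention layers only
print(layers_to_process(layers, "fp8", skip_sw_layers=False))  # all layers
```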
References
#33695 - enable skipping of SW attention layers when using FP8 KV cache
Author
jmkuebler
Parents
b111f8a6