text-generation-inference
72ab60fd - Use FP8 KV cache when specified by compressed-tensors (#2761)

1 year ago
The compressed-tensors configuration can also specify how the KV cache is configured. Use an FP8 KV cache when the configuration requests one; all other KV cache options and types are ignored for now.
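The check described above could be sketched roughly as follows. This is an illustration, not the code from the commit: the function name `kv_cache_dtype_from_config` is hypothetical, the key names `kv_cache_scheme`, `type`, and `num_bits` are assumptions about how the compressed-tensors quantization config is laid out, and the returned string `"fp8"` is a placeholder for whatever dtype identifier the server actually uses.

```python
from typing import Any, Dict, Optional


def kv_cache_dtype_from_config(quantization_config: Dict[str, Any]) -> Optional[str]:
    """Return "fp8" if the config asks for an FP8 KV cache, else None.

    Assumed layout (not confirmed by the commit message):
        {"kv_cache_scheme": {"type": "float", "num_bits": 8, ...}}
    """
    scheme = quantization_config.get("kv_cache_scheme")
    if not isinstance(scheme, dict):
        # No KV cache scheme given: keep the default KV cache dtype.
        return None

    # Only an 8-bit float scheme is honoured; every other option or type
    # is ignored, mirroring the "ignored for now" note in the message.
    is_fp8 = scheme.get("type") == "float" and scheme.get("num_bits") == 8
    return "fp8" if is_fp8 else None
```

Returning `None` for anything that is not an 8-bit float scheme means unsupported KV cache settings fall back to the default rather than raising an error.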