text-generation-inference
72ab60fd - Use FP8 KV cache when specified by compressed-tensors (#2761)

Commit

1 year ago

Use FP8 KV cache when specified by compressed-tensors (#2761)

The compressed-tensors configuration can specify the configuration of the KV cache as well. Use an FP8 KV cache when the configuration tells us to do so (all other options and types are ignored for now).
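The behaviour described in the commit message can be illustrated with a small sketch. This is not the code from the commit. It assumes the quantization config is available as a plain dictionary, that the KV cache settings live under a `kv_cache_scheme` key with `type` and `num_bits` fields, and it uses a hypothetical helper name (`kv_cache_dtype_from_config`) and a hypothetical string label (`"fp8_e4m3fn"`) in place of a real dtype object.

```python
from typing import Any, Mapping, Optional


def kv_cache_dtype_from_config(
    quantization_config: Optional[Mapping[str, Any]],
) -> Optional[str]:
    """Return a KV cache dtype label if the config asks for FP8, else None.

    Hypothetical helper: the key names ("kv_cache_scheme", "type",
    "num_bits") and the returned label are assumptions for illustration.
    """
    if not quantization_config:
        return None

    scheme = quantization_config.get("kv_cache_scheme")
    if not isinstance(scheme, Mapping):
        # No KV cache section: leave the default dtype in place.
        return None

    # Only an 8-bit float scheme is honoured; every other option or
    # type is ignored, mirroring "ignored for now" in the message.
    if scheme.get("type") == "float" and scheme.get("num_bits") == 8:
        return "fp8_e4m3fn"
    return None


# Example usage with made-up configs.
fp8_config = {"kv_cache_scheme": {"type": "float", "num_bits": 8}}
int8_config = {"kv_cache_scheme": {"type": "int", "num_bits": 8}}
print(kv_cache_dtype_from_config(fp8_config))
print(kv_cache_dtype_from_config(int8_config))
```

Returning `None` for anything other than an 8-bit float scheme keeps the default KV cache dtype untouched, so unsupported schemes fall back silently instead of raising an error.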

References

#2761 - Use FP8 KV cache when specified by compressed-tensors

Author

danieldk

Parents

289aa485
