Quantized KV Cache #30483
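For context, this PR adds a quantized key/value cache to `generate`, with quanto and (later) HQQ backends. Below is a minimal usage sketch, assuming quanto is installed and that the `cache_implementation="quantized"` / `cache_config` generate kwargs match what this PR introduces; the model checkpoint and prompt are purely illustrative.

```python
# Sketch: generation with a quantized KV cache (quanto backend).
# Assumes the cache_implementation / cache_config kwargs added by this PR.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("The KV cache can be quantized to", return_tensors="pt").to(model.device)

# Keys/values are stored in 4-bit instead of fp16, trading some latency for memory.
out = model.generate(
    **inputs,
    max_new_tokens=32,
    cache_implementation="quantized",
    cache_config={"backend": "quanto", "nbits": 4},
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```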
gante commented on 2024-05-01
zucchini-nlp marked this pull request as ready for review 1 year ago
gante commented on 2024-05-02
gante approved these changes on 2024-05-08
zucchini-nlp changed the title from "[POC] Quantized KV Cache" to "Quantized KV Cache" 1 year ago
clean-up (16731be7)
Update src/transformers/cache_utils.py (cf00de65)
Update src/transformers/cache_utils.py (6de0d8a6)
Update src/transformers/cache_utils.py (5a87cbbd)
fixup (bfe0804b)
Update tests/quantization/quanto_integration/test_quanto.py (62abc33f)
Update src/transformers/generation/configuration_utils.py (519682ca)
more suggestions (6f19ceea)
mapping if torch available (2b8e0420)
Merge branch 'main' into quant (4c65b5bf)
run tests & add 'support_quantized' flag (1d9cf15e)
Merge branch 'main' into quant (a58aa9df)
fix jamba test (d1813392)
revert, will be fixed by another PR (e658c9f5)
Merge branch 'main' into quant (9397dda5)
codestyle (045e4063)
HQQ and versatile cache classes (3193b435)
final update (126ce846)
gante approved these changes on 2024-05-23
Merge "main" (0d1df5f0)
typo (89413d38)
make tests happy (9d0f6686)