transformers
Quantized KV Cache #30483
Merged

zucchini-nlp merged 21 commits into huggingface:main from zucchini-nlp:quant
zucchini-nlp
zucchini-nlp requested a review from gante 1 year ago
zucchini-nlp requested a review from younesbelkada 1 year ago
zucchini-nlp
younesbelkada
younesbelkada commented on 2024-04-25
zucchini-nlp
younesbelkada
HuggingFaceDocBuilderDev
gante
gante commented on 2024-05-01
gante
gante commented on 2024-05-01
zucchini-nlp marked this pull request as ready for review 1 year ago
gante
gante commented on 2024-05-02
younesbelkada
younesbelkada commented on 2024-05-02
zucchini-nlp
gante
gante approved these changes on 2024-05-08
gante
gante requested a review from ArthurZucker 1 year ago
ArthurZucker
ArthurZucker commented on 2024-05-09
zucchini-nlp changed the title from "[POC] Quantized KV Cache" to "Quantized KV Cache" 1 year ago
zucchini-nlp clean-up
16731be7
zucchini-nlp force pushed to 16731be7 1 year ago
zucchini-nlp Update src/transformers/cache_utils.py
cf00de65
zucchini-nlp Update src/transformers/cache_utils.py
6de0d8a6
zucchini-nlp Update src/transformers/cache_utils.py
5a87cbbd
zucchini-nlp fixup
bfe0804b
zucchini-nlp requested a review from ArthurZucker 1 year ago
younesbelkada
younesbelkada approved these changes on 2024-05-10
ArthurZucker
ArthurZucker commented on 2024-05-10
zucchini-nlp Update tests/quantization/quanto_integration/test_quanto.py
62abc33f
zucchini-nlp Update src/transformers/generation/configuration_utils.py
519682ca
zucchini-nlp more suggestions
6f19ceea
zucchini-nlp mapping if torch available
2b8e0420
zucchini-nlp requested a review from ArthurZucker 1 year ago
ArthurZucker
ArthurZucker approved these changes on 2024-05-13
ydshieh
ArthurZucker
gante
zucchini-nlp Merge branch 'main' into quant
4c65b5bf
zucchini-nlp run tests & add 'support_quantized' flag
1d9cf15e
zucchini-nlp Merge branch 'main' into quant
a58aa9df
zucchini-nlp fix jamba test
d1813392
zucchini-nlp revert, will be fixed by another PR
e658c9f5
zucchini-nlp
ArthurZucker
zucchini-nlp Merge branch 'main' into quant
9397dda5
zucchini-nlp codestyle
045e4063
zucchini-nlp
zucchini-nlp HQQ and versatile cache classes
3193b435
ArthurZucker
zucchini-nlp final update
126ce846
zucchini-nlp
gante
gante approved these changes on 2024-05-23
zucchini-nlp Merge "main"
0d1df5f0
zucchini-nlp typo
89413d38
zucchini-nlp make tests happy
9d0f6686
zucchini-nlp
zucchini-nlp merged d583f131 into main 1 year ago
ydshieh
zucchini-nlp
ydshieh

