Quantized KV Cache #30483
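For context, this PR adds a quantized key/value cache to `generate`, with quanto and (later) HQQ backends. Below is a minimal usage sketch, assuming quanto is installed and that the `cache_implementation="quantized"` / `cache_config` generate kwargs match what this PR introduces; the model checkpoint and prompt are purely illustrative.

```python
# Sketch: generation with a quantized KV cache (quanto backend).
# Assumes the cache_implementation / cache_config kwargs added by this PR.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("The KV cache can be quantized to", return_tensors="pt").to(model.device)

# Keys/values are stored in 4-bit instead of fp16, trading some latency for memory.
out = model.generate(
    **inputs,
    max_new_tokens=32,
    cache_implementation="quantized",
    cache_config={"backend": "quanto", "nbits": 4},
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```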
gante commented on 2024-05-01
zucchini-nlp marked this pull request as ready for review 1 year ago
gante commented on 2024-05-02
gante approved these changes on 2024-05-08
zucchini-nlp changed the title from "[POC] Quantized KV Cache" to "Quantized KV Cache" 1 year ago
clean-up (16731be7)
Update src/transformers/cache_utils.py (cf00de65)
Update src/transformers/cache_utils.py (6de0d8a6)
Update src/transformers/cache_utils.py (5a87cbbd)
fixup (bfe0804b)
Update tests/quantization/quanto_integration/test_quanto.py (62abc33f)
Update src/transformers/generation/configuration_utils.py (519682ca)
more suggestions (6f19ceea)
mapping if torch available (2b8e0420)
Merge branch 'main' into quant (4c65b5bf)
run tests & add 'support_quantized' flag (1d9cf15e)
Merge branch 'main' into quant (a58aa9df)
fix jamba test (d1813392)
revert, will be fixed by another PR (e658c9f5)
Merge branch 'main' into quant (9397dda5)
codestyle (045e4063)
HQQ and versatile cache classes (3193b435)
final update (126ce846)
gante approved these changes on 2024-05-23
Merge "main" (0d1df5f0)
typo (89413d38)
make tests happy (9d0f6686)