text-generation-inference
Add support for FP8 KV cache scales
#2628
Merged

danieldk merged 3 commits into main from feature/fp8-kv-cache-scale
danieldk force-pushed from 52bbb233 to fb9bd07c 1 year ago
danieldk force-pushed from fb9bd07c to 08c0b3f2 1 year ago
danieldk force-pushed from 08c0b3f2 to 98efcb49 1 year ago
danieldk marked this pull request as ready for review 1 year ago
danieldk Add support for FP8 KV cache scales (ba4ac963)
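
The core change loads per-layer key/value scales from the checkpoint and applies them when writing to the FP8 KV cache. A minimal sketch of the idea (not TGI's actual code; all names below are illustrative assumptions):

```python
# A minimal sketch (not TGI's actual code) of applying per-layer KV
# scales when quantizing key/value tensors into an FP8 KV cache.
import torch

def quantize_to_fp8(x: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Quantize a fp16/bf16 tensor to float8_e4m3fn with a per-tensor scale.

    Dividing by the scale maps the tensor's observed dynamic range onto
    the representable FP8 range; dequantization multiplies back.
    """
    finfo = torch.finfo(torch.float8_e4m3fn)
    return (x / scale).clamp(finfo.min, finfo.max).to(torch.float8_e4m3fn)

def store_kv(
    key: torch.Tensor,
    value: torch.Tensor,
    key_cache: torch.Tensor,    # FP8 cache, indexed by slot
    value_cache: torch.Tensor,  # FP8 cache, indexed by slot
    slots: torch.Tensor,
    k_scale: torch.Tensor,      # per-layer scale loaded from the checkpoint
    v_scale: torch.Tensor,      # per-layer scale loaded from the checkpoint
) -> None:
    # Without checkpoint-provided scales, 1.0 is the only safe default.
    key_cache[slots] = quantize_to_fp8(key, k_scale)
    value_cache[slots] = quantize_to_fp8(value, v_scale)
```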
danieldk Update FP8 KV cache test to use checkpoint with scales (1f18cb6a)
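
The updated test exercises a checkpoint that actually ships KV scales. FP8 checkpoints produced by common quantizers typically store these as extra per-layer scale tensors alongside the attention weights; a quick way to check whether a given checkpoint carries them (the `k_scale`/`v_scale` suffixes follow common convention and are an assumption, not confirmed by this PR):

```python
# A hedged sketch: list KV cache scale tensors in a safetensors checkpoint.
from safetensors import safe_open

with safe_open("model.safetensors", framework="pt") as f:
    scale_names = [k for k in f.keys() if k.endswith(("k_scale", "v_scale"))]
    for name in sorted(scale_names):
        print(name, f.get_tensor(name).item())
```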
danieldk force-pushed from 98efcb49 to 1f18cb6a 1 year ago
mht-sharma commented on 2024-10-24
danieldk `can_scale`: check that the attention is flashinfer (a68fae05)
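
Per this commit message, scaled FP8 KV cache writes are only supported with the flashinfer attention backend, so `can_scale` gates on it and falls back to unscaled storage otherwise. A minimal sketch of such a guard, assuming the backend is selected via an `ATTENTION` environment variable; the signature is illustrative, not TGI's actual one:

```python
# A minimal sketch of a `can_scale`-style guard, following the commit
# message: scaled FP8 KV cache writes require the flashinfer backend.
# The ATTENTION environment variable lookup is an assumption.
import os
import torch

ATTENTION = os.environ.get("ATTENTION", "paged")

def can_scale(kv_dtype: torch.dtype, scale: float) -> bool:
    """Return True only when a non-trivial KV scale can be applied."""
    if scale == 1.0:
        return False  # unit scale: scaled and unscaled paths coincide
    if kv_dtype != torch.float8_e4m3fn:
        return False  # scales only apply to an FP8 cache
    # Other backends fall back to unscaled (scale == 1.0) storage.
    return ATTENTION == "flashinfer"
```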
danieldk requested a review from mht-sharma 1 year ago
mht-sharma approved these changes on 2024-10-24
danieldk merged eab07f74 into main 1 year ago
danieldk deleted the feature/fp8-kv-cache-scale branch 1 year ago
