text-generation-inference
Add support for FP8 KV cache scales
#2628
Merged

danieldk merged 3 commits into main from feature/fp8-kv-cache-scale
danieldk force-pushed from 52bbb233 to fb9bd07c 1 year ago
danieldk force-pushed from fb9bd07c to 08c0b3f2 1 year ago
danieldk force-pushed from 08c0b3f2 to 98efcb49 1 year ago
danieldk marked this pull request as ready for review 1 year ago
danieldk Add support for FP8 KV cache scales (ba4ac963)
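
The core change loads per-layer key/value scales from the checkpoint and applies them when writing to the FP8 KV cache. A minimal sketch of the idea (not TGI's actual code; all names below are illustrative assumptions):

```python
# A minimal sketch (not TGI's actual code) of applying per-layer KV
# scales when quantizing key/value tensors into an FP8 KV cache.
import torch

def quantize_to_fp8(x: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Quantize a fp16/bf16 tensor to float8_e4m3fn with a per-tensor scale.

    Dividing by the scale maps the tensor's observed dynamic range onto
    the representable FP8 range; dequantization multiplies back.
    """
    finfo = torch.finfo(torch.float8_e4m3fn)
    return (x / scale).clamp(finfo.min, finfo.max).to(torch.float8_e4m3fn)

def store_kv(
    key: torch.Tensor,
    value: torch.Tensor,
    key_cache: torch.Tensor,    # FP8 cache, indexed by slot
    value_cache: torch.Tensor,  # FP8 cache, indexed by slot
    slots: torch.Tensor,
    k_scale: torch.Tensor,      # per-layer scale loaded from the checkpoint
    v_scale: torch.Tensor,      # per-layer scale loaded from the checkpoint
) -> None:
    # Without checkpoint-provided scales, 1.0 is the only safe default.
    key_cache[slots] = quantize_to_fp8(key, k_scale)
    value_cache[slots] = quantize_to_fp8(value, v_scale)
```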
danieldk Update FP8 KV cache test to use checkpoint with scales (1f18cb6a)
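
The updated test exercises a checkpoint that actually ships KV scales. FP8 checkpoints produced by common quantizers typically store these as extra per-layer scale tensors alongside the attention weights; a quick way to check whether a given checkpoint carries them (the `k_scale`/`v_scale` suffixes follow common convention and are an assumption, not confirmed by this PR):

```python
# A hedged sketch: list KV cache scale tensors in a safetensors checkpoint.
from safetensors import safe_open

with safe_open("model.safetensors", framework="pt") as f:
    scale_names = [k for k in f.keys() if k.endswith(("k_scale", "v_scale"))]
    for name in sorted(scale_names):
        print(name, f.get_tensor(name).item())
```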
danieldk force-pushed from 98efcb49 to 1f18cb6a 1 year ago
mht-sharma commented on 2024-10-24
danieldk `can_scale`: check that the attention is flashinfer (a68fae05)
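
Per this commit message, scaled FP8 KV cache writes are only supported with the flashinfer attention backend, so `can_scale` gates on it and falls back to unscaled storage otherwise. A minimal sketch of such a guard, assuming the backend is selected via an `ATTENTION` environment variable; the signature is illustrative, not TGI's actual one:

```python
# A minimal sketch of a `can_scale`-style guard, following the commit
# message: scaled FP8 KV cache writes require the flashinfer backend.
# The ATTENTION environment variable lookup is an assumption.
import os
import torch

ATTENTION = os.environ.get("ATTENTION", "paged")

def can_scale(kv_dtype: torch.dtype, scale: float) -> bool:
    """Return True only when a non-trivial KV scale can be applied."""
    if scale == 1.0:
        return False  # unit scale: scaled and unscaled paths coincide
    if kv_dtype != torch.float8_e4m3fn:
        return False  # scales only apply to an FP8 cache
    # Other backends fall back to unscaled (scale == 1.0) storage.
    return ATTENTION == "flashinfer"
```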
danieldk requested a review from mht-sharma 1 year ago
mht-sharma approved these changes on 2024-10-24
danieldk merged eab07f74 into main 1 year ago
danieldk deleted the feature/fp8-kv-cache-scale branch 1 year ago
