Auto max prefill #2797

Narsil merged 19 commits into main from auto_max_prefill
Narsil Attempt at automatic max batch prefill.
54d3c815
Narsil Taking into account number of shards.
fa912440
Narsil Adding more cards.
23c0a20d
Narsil Adding A100 + H100
5bcb3e6a
Narsil Adding a few more cards.
e85dc0a0
Narsil Logprobs cost too much.
96ad65b5
Narsil h100 better name, and keep factor of 2
748dce60
Narsil Damn inflated sparse tflops.
3a53e8c2
Narsil Typo in h100.
3ec9259b
Narsil Updated the flops calculation (checked with fvcore).
9fab7c66
Narsil chunking by default.
db111495
Narsil Fix prefix caching for chat completion since we removed logprobs.
1352f708
Narsil More tests.
13e6d522
Narsil Dropping all the prefill logprobs.
f6998f84
Narsil Add a flag that enables users to get logprobs back.
3a86afc7
Narsil Repairing prompt token counting.
3ed703c2
Narsil Fixing a few tests.
a78b6fd1
HuggingFaceDocBuilderDev
Narsil Remove some scaffolding.
ca8a115a
Narsil Attempting to reduce the issues (workarounds for now).
f022ecfa
Narsil requested a review from danieldk 1 year ago
Narsil merged 5df80590 into main 1 year ago
Narsil deleted the auto_max_prefill branch 1 year ago
danieldk commented on 2024-12-06