Attempt at automatic max batch prefill.
54d3c815
Taking into account number of shards.
fa912440
Adding more cards.
23c0a20d
Adding A100 + H100
5bcb3e6a
Adding a few more cards.
e85dc0a0
Logprobs cost too much.
96ad65b5
h100 better name, and keep factor of 2
748dce60
Damn inflated sparse tflops.
3a53e8c2
Typo in h100.
3ec9259b
Updated the flops calculation (checked with fvcore).
9fab7c66
chunking by default.
db111495
Fix prefix caching for chat completion since we removed logprobs.
1352f708
More tests.
13e6d522
Dropping all the prefill logprobs.
f6998f84
Add a flag that enables users to get logprobs back.
3a86afc7
Repairing prompt token counting.
3ed703c2
Fixing a few tests.
a78b6fd1
Remove some scaffolding.
ca8a115a
Attempting to reduce the issues (workarounds for now).
f022ecfa
Narsil merged 5df80590 into main 1 year ago
Narsil deleted the auto_max_prefill branch 1 year ago