text-generation-inference

feat(server): auto max_batch_total_tokens for flash att models #630 (Merged)

OlivierDehaene merged 19 commits into main from feat/automatic_max
Commits (19, all by OlivierDehaene):

b165f8b7  feat(server): auto max_batch_total_tokens for flash att models
4201a8be  fix default value
a6b128b2  fix default value
086d0c22  update logs
d2e38435  pad to block size
79616a87  add block size parameter
de892fb4  revert back to normal allocator
160a50af  cleanup

OlivierDehaene force-pushed from d3115082 to 160a50af 2 years ago

1686a7c0  add syncs
36a9bddd  use max_memory_reserved
45d24bea  sleep to connect to the CI runner
99568eef  add tmate
05d2a77e  reset peak memory
0111869a  use less memory
8793ae58  add clear cache when batch is finished
7f399cd8  revert
0a028018  try 0.99
406b0940  0.985
2934543a  0.98

OlivierDehaene merged fe80f536 into main 2 years ago
OlivierDehaene deleted the feat/automatic_max branch 2 years ago
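
Read in sequence, the commit messages above outline the approach: run a warmup forward pass, reset and then read the CUDA allocator's peak reserved memory, keep a ~0.98 safety fraction of total device memory, convert the leftover bytes into a token budget for the KV cache, and pad the result to the attention block size. Below is a minimal sketch of that flow, assuming PyTorch; the function name, the `warmup` callable, `cache_bytes_per_token`, and the block size of 16 are illustrative assumptions, not TGI's actual code.

```python
# A minimal sketch of the warmup-based sizing the commit trail points at
# ("reset peak memory", "use max_memory_reserved", "pad to block size",
# "try 0.99" -> "0.985" -> "0.98"). All names and constants here are
# illustrative assumptions, not text-generation-inference's actual code.
from typing import Callable

import torch

MEMORY_FRACTION = 0.98  # safety margin the commits eventually settled on
BLOCK_SIZE = 16         # assumed paged KV-cache block size


def estimate_max_batch_total_tokens(
    warmup: Callable[[], None], cache_bytes_per_token: int
) -> int:
    """Run a warmup pass, then size the token budget from leftover memory."""
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()

    # Warm up with the largest prefill that must be supported, so peak
    # activation memory is accounted for before sizing the cache.
    warmup()
    torch.cuda.synchronize()

    total_bytes = torch.cuda.get_device_properties(0).total_memory
    peak_reserved = torch.cuda.max_memory_reserved(0)

    # Keep only a fraction of total memory, subtract what the warmup
    # reserved, and convert the remainder into a token count.
    free_bytes = int(total_bytes * MEMORY_FRACTION) - peak_reserved
    num_tokens = free_bytes // cache_bytes_per_token

    # Pad down to a whole number of cache blocks.
    return (num_tokens // BLOCK_SIZE) * BLOCK_SIZE
```

The "try 0.99", "0.985", "0.98" commits suggest the safety fraction was tuned empirically, backing off until the CI runners no longer ran out of memory.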