feat(server): auto max_batch_total_tokens for flash att models #630
feat(server): auto max_batch_total_tokens for flash att models
b165f8b7
fix default value
4201a8be
fix default value
a6b128b2
update logs
086d0c22
pad to block size
d2e38435
add block size parameter
79616a87
revert back to normal allocator
de892fb4
cleanup
160a50af
add syncs
1686a7c0
use max_memory_reserved
36a9bddd
sleep to connect to the CI runner
45d24bea
add tmate
99568eef
reset peak memory
05d2a77e
use less memory
0111869a
add clear cache when batch is finished
8793ae58
revert
7f399cd8
try 0.99
0a028018
0.985
406b0940
0.98
2934543a