feat: add triton kernels to decrease latency of large batches #2687
feat: add triton kernels to decrease latency of large batches
ea66379e
cast to int32
d1e95cea
fix kernel
347f3f51
fix kernel
a7465ba6
OlivierDehaene marked this pull request as ready for review 1 year ago
disable triton on rocm
2b25e9a9
fix speculation
b4ebfa52
add slots filtering kernel
50b394d4
Assignees
No one assigned