text-generation-inference
feat: add triton kernels to decrease latency of large batches
#2687
Merged

OlivierDehaene merged 7 commits into main from feat/triton_prepare
OlivierDehaene feat: add triton kernels to decrease latency of large batches
ea66379e
OlivierDehaene cast to int32
d1e95cea
OlivierDehaene fix kernel
347f3f51
OlivierDehaene fix kernel
a7465ba6
OlivierDehaene force pushed from 04019bec to a7465ba6 1 year ago
OlivierDehaene requested a review from Narsil 1 year ago
OlivierDehaene marked this pull request as ready for review 1 year ago
OlivierDehaene disable triton on rocm
2b25e9a9
OlivierDehaene fix speculation
b4ebfa52
OlivierDehaene add slots filtering kernel
50b394d4
OlivierDehaene merged 6f88bd93 into main 1 year ago
OlivierDehaene deleted the feat/triton_prepare branch 1 year ago
