text-generation-inference
6f88bd93
- feat: add triton kernels to decrease latency of large batches (#2687)
Commit
1 year ago
feat: add triton kernels to decrease latency of large batches (#2687)

* feat: add triton kernels to decrease latency of large batches
* cast to int32
* fix kernel
* fix kernel
* disable triton on rocm
* fix speculation
* add slots filtering kernel
References
#2687 - feat: add triton kernels to decrease latency of large batches
Author
OlivierDehaene
Parents
0f346a32