text-generation-inference
6f88bd93 - feat: add triton kernels to decrease latency of large batches (#2687)

Committed 1 year ago
feat: add triton kernels to decrease latency of large batches (#2687)

* feat: add triton kernels to decrease latency of large batches
* cast to int32
* fix kernel
* fix kernel
* disable triton on rocm
* fix speculation
* add slots filtering kernel
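The commit message does not say what the slots filtering kernel computes. Assuming "slots" means the per-token KV-cache slot indices of each request, and that filtering means compacting them when finished requests are dropped from a batch, a plain-Python reference of that gather might look like the sketch below. `filter_slots` and its parameters are hypothetical names chosen for illustration, not TGI's API; a Triton kernel would do the same kind of copy on the GPU in parallel instead of looping on the host.

```python
from typing import List, Sequence, Tuple


def filter_slots(
    slots: Sequence[int],
    starts: Sequence[int],
    lengths: Sequence[int],
    keep: Sequence[int],
) -> Tuple[List[int], List[int]]:
    """Hypothetical CPU reference for compacting KV-cache slots.

    Assumption: `slots` is a flat list of slot ids, request i owns
    slots[starts[i] : starts[i] + lengths[i]], and `keep` lists the
    indices of the requests that stay in the batch.
    Returns the compacted slot list and each kept request's new start offset.
    """
    new_slots: List[int] = []
    new_starts: List[int] = []
    for i in keep:
        # The kept request begins wherever the compacted output currently ends.
        new_starts.append(len(new_slots))
        new_slots.extend(slots[starts[i] : starts[i] + lengths[i]])
    return new_slots, new_starts


# Three requests own 3, 2 and 4 slots; request 1 has finished and is dropped.
slots = [10, 11, 12, 20, 21, 30, 31, 32, 33]
new_slots, new_starts = filter_slots(slots, starts=[0, 3, 5], lengths=[3, 2, 4], keep=[0, 2])
# new_slots == [10, 11, 12, 30, 31, 32, 33], new_starts == [0, 3]
```

Because each kept request's output offset depends only on the lengths of the requests kept before it, the offsets can be computed with a prefix sum and every range copied independently. That independence is what makes this kind of gather a natural fit for a GPU kernel on large batches.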