jax
54edaf6e - [Mosaic GPU] Add a warp specialized kernel with a separate TMA warp

1 year ago
With this kernel we can significantly improve the performance of large head_dim kernels, reaching ~62% utilization at a 4k sequence length and ~71% at 32k.

TODO: the two kernels are quite similar, and it should be possible to merge them into one.

PiperOrigin-RevId: 647597865
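The commit message does not describe the kernel's internals. The following is only a minimal conceptual sketch, written with plain Python threads rather than Mosaic GPU, of the producer/consumer split that "a separate TMA warp" usually implies. One producer fills a ring of shared-memory stages while a compute group consumes them, and per-stage "empty" and "full" barriers keep the two in lockstep. The function name `warp_specialized_pipeline`, the `num_stages` parameter, and the sum-of-squares stand-in for matrix multiplication are all hypothetical. None of this is the Mosaic GPU API.

```python
import threading


def warp_specialized_pipeline(blocks, num_stages=2):
    """Hypothetical model: a 'TMA' producer and a 'compute' consumer
    sharing a ring buffer of num_stages shared-memory slots."""
    slots = [None] * num_stages
    # 'empty' barriers: the producer may overwrite this slot.
    empty = [threading.Semaphore(1) for _ in range(num_stages)]
    # 'full' barriers: this slot holds data ready for the consumer.
    full = [threading.Semaphore(0) for _ in range(num_stages)]
    results = []

    def tma_warp():
        for i, block in enumerate(blocks):
            s = i % num_stages
            empty[s].acquire()      # wait until the consumer has released the slot
            slots[s] = list(block)  # stand-in for an asynchronous TMA copy
            full[s].release()       # signal that the slot is ready

    def compute_warpgroup():
        for i in range(len(blocks)):
            s = i % num_stages
            full[s].acquire()       # wait for the producer's data
            results.append(sum(x * x for x in slots[s]))  # stand-in for the MMA work
            empty[s].release()      # hand the slot back to the producer

    producer = threading.Thread(target=tma_warp)
    consumer = threading.Thread(target=compute_warpgroup)
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results


print(warp_specialized_pipeline([[1, 2], [3], [0, 4]], num_stages=2))  # [5, 9, 16]
```

With `num_stages` greater than 1, the producer can load the next block while the consumer is still working on the current one. That overlap of memory traffic and compute is the point of dedicating a warp to TMA.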