jax
54edaf6e - [Mosaic GPU] Add a warp specialized kernel with a separate TMA warp

Commit

1 year ago

[Mosaic GPU] Add a warp specialized kernel with a separate TMA warp

With this kernel we're able to significantly improve the performance of large head_dim kernels, reaching ~62% utilization for 4k sequence length and ~71% for 32k.

TODO: the two kernels are quite similar and it should be possible to collapse them into one

PiperOrigin-RevId: 647597865

References

#22143 - [Mosaic GPU] Add a warp specialized kernel with a separate TMA warp

Author

apaszke

Committer

a-googler

Parents

24b42eed
