jax
af44ec5c - [Mosaic TPU] Allow small sublane tilings in second/third minor transpose for unpacked dtypes instead of forcing native tiling, which can be more efficient when the second minor dimension of the input tensor is small.

Commit
50 days ago
To unify the implementation, the existing three-stage algorithm for 8x8 blocks is modified by changing its shuffle patterns and index pairing. For small sublane tilings, fewer rounds of the algorithm are run.
PiperOrigin-RevId: 826670330
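For readers unfamiliar with staged transposes, the sketch below shows the general idea in plain Python. It is not the Mosaic TPU code and does not reproduce the shuffle patterns or index pairing from this commit. It only illustrates a well-known butterfly scheme, in which an n x n block (n a power of two) is transposed in log2(n) rounds, so an 8x8 block takes three rounds and a smaller block takes fewer. The function name `butterfly_transpose` and the mapping from "smaller tiling" to "smaller square block" are assumptions made for illustration.

```python
def butterfly_transpose(block):
    """Transpose a square power-of-two block in log2(n) pairwise-swap rounds.

    Illustrative only: a generic butterfly transpose, not the Mosaic TPU
    implementation. Returns the transposed block and the number of rounds.
    """
    n = len(block)
    assert n > 0 and n & (n - 1) == 0, "side must be a power of two"
    assert all(len(row) == n for row in block), "block must be square"

    out = [list(row) for row in block]  # work on a copy
    rounds = 0
    d = 1  # stride for this round: 1, 2, 4, ...
    while d < n:
        # Pair row i with row i + d and column j with column j + d, then
        # swap the two off-diagonal entries of each 2x2 arrangement. This
        # exchanges one bit of the row index with the same bit of the
        # column index.
        for i in range(n):
            if i & d:
                continue
            for j in range(n):
                if j & d:
                    continue
                out[i][j + d], out[i + d][j] = out[i + d][j], out[i][j + d]
        d *= 2
        rounds += 1
    return out, rounds


if __name__ == "__main__":
    block8 = [[r * 8 + c for c in range(8)] for r in range(8)]
    transposed, rounds = butterfly_transpose(block8)
    print(rounds)  # number of rounds used for an 8x8 block
```

Each round swaps one bit of the row index with the matching bit of the column index, which is why the round count scales with the logarithm of the block side rather than with the side itself.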