jax
6b93b358 - [Mosaic:TPU] Efficient relayout with internal scratch

Committed 1 year ago
With this CL, all retilings (x*packing1, 128) <-> (y*packing2, 128) should now be supported for any dtype.

Relayout through an internal scratch buffer significantly speeds up the existing retiling paths on TPUv4 and earlier, and (packing, 128) retiling on TPUv5.

This CL also adds all previously missing retiling support, including sublane-increasing retiling and packed-type retiling.

PiperOrigin-RevId: 676982957
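The idea of relayout through scratch can be illustrated with a toy model: store every tile of the old layout to a flat scratch buffer, then load the buffer back in tiles of the new height. This is a sketch under stated assumptions, not the Mosaic implementation. The names `to_tiles` and `retile_via_scratch` are hypothetical, and the model ignores vector-register capacity, dtype packing, and hardware memory layout. It keeps only the "store in one tiling, load in another" structure.

```python
from typing import List, Sequence

Row = List[int]
Tile = List[Row]  # hypothetical model of one (tile_rows, lanes) tile


def to_tiles(rows: Sequence[Row], tile_rows: int, pad: int = 0) -> List[Tile]:
    """Split a (num_rows, lanes) array into (tile_rows, lanes) tiles, padding the last tile."""
    lanes = len(rows[0])
    tiles: List[Tile] = []
    for start in range(0, len(rows), tile_rows):
        tile = [list(r) for r in rows[start:start + tile_rows]]
        # Pad a partial final tile with `pad` rows so every tile has the same shape.
        tile += [[pad] * lanes for _ in range(tile_rows - len(tile))]
        tiles.append(tile)
    return tiles


def retile_via_scratch(tiles: Sequence[Tile], num_rows: int,
                       new_tile_rows: int, pad: int = 0) -> List[Tile]:
    """Toy relayout: round-trip tiles of one height through scratch into tiles of another."""
    # Store: write every tile's rows, in order, to a flat row-major scratch buffer.
    scratch: List[Row] = []
    for tile in tiles:
        scratch.extend(tile)
    # Discard the old tiling's padding rows; only the first num_rows rows are real data.
    scratch = scratch[:num_rows]
    # Load: read the scratch buffer back in windows of new_tile_rows rows.
    return to_tiles(scratch, new_tile_rows, pad)


lanes = 128
data = [[r * lanes + c for c in range(lanes)] for r in range(20)]
old = to_tiles(data, 8)                         # e.g. an (8, 128) tiling
new = retile_via_scratch(old, len(data), 16)    # e.g. a (16, 128) tiling
print(len(old), len(new))  # 3 2
```

Going through memory decouples the source and target tilings: the store only needs to know the old tile shape and the load only needs to know the new one, so any pair of tile heights is handled by the same two steps.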