jax
209f6cd6 - [Mosaic GPU] Profiler improvements

Commit
1 year ago
[Mosaic GPU] Profiler improvements

1. Each process now corresponds to an SM, showing how many blocks are executing concurrently.
2. The timeline now accounts for the start offset of each block, instead of aligning them together. This makes a lot more sense in the SM view.
3. We now use inline PTX to emit profiler events. This sometimes slightly pessimizes code generation, but allows us to predicate out write on all threads other than the leader of each warpgroup, improving the trace quality.
4. We make sure each trace is monotonic. I can't explain why but the clocks can behave very weirdly, potentially due to rescheduling on the SASS level. We now fix up all backward movements and emit a warning if big shifts have been detected.

PiperOrigin-RevId: 659911268
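
To illustrate item 4, here is a minimal sketch of one way to make a trace monotonic. It is not the code from this commit: the function name `make_monotonic`, the `warn_threshold` parameter and its default, and the choice to clamp each backward step to the previous timestamp are all assumptions made for illustration.

```python
import warnings


def make_monotonic(timestamps, warn_threshold=1000):
    """Return a non-decreasing copy of `timestamps` and the largest backward shift.

    Hypothetical helper: any timestamp smaller than its (already fixed)
    predecessor is clamped to that predecessor. `warn_threshold` is an
    assumed tuning knob, in the same units as the timestamps.
    """
    fixed = []
    max_shift = 0
    prev = None
    for t in timestamps:
        if prev is not None and t < prev:
            # The clock moved backward: record how far, then clamp.
            max_shift = max(max_shift, prev - t)
            t = prev
        fixed.append(t)
        prev = t
    if max_shift > warn_threshold:
        warnings.warn(
            f"Large backward clock shift detected: {max_shift} ticks"
        )
    return fixed, max_shift


# Small backward movements are fixed up silently.
fixed, shift = make_monotonic([0, 5, 3, 8, 7, 20])
print(fixed, shift)
```

Clamping keeps event order and never moves an event earlier than one already emitted, at the cost of producing zero-length intervals where the clock went backward. Other fix-ups, such as shifting all later events forward, would be equally plausible.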