llvm-project
11bd0289 - [MLIR][GPU] Fix async.yield gpu.async.token lowering race (#190717)

Commit
19 days ago
[MLIR][GPU] Fix async.yield gpu.async.token lowering race (#190717)

Root cause of #170833 (flakiness of `Integration/GPU/CUDA/async.mlir` on the Tesla T4 mlir-nvidia buildbot).

In `gpu-to-llvm`, two patterns matched `async.yield` with the same benefit: the structural `ConvertYieldOpTypes` from `populateAsyncStructuralTypeConversionsAndLegality` (which just retypes operands), and `ConvertAsyncYieldToGpuRuntimeCallPattern` (which also creates and records an event on the stream backing each `gpu.async.token` operand). When the IR contained `gpu.launch_func`, the dialect-conversion framework picked the structural pattern, silently dropping the event record. The `async.execute` then yielded a stream pointer where its consumers expected an event, and the host await ended up calling `cuEventSynchronize` on a stream pointer. That call returns an error without waiting, so the host raced against the GPU.

The fix registers `ConvertAsyncYieldToGpuRuntimeCallPattern` with pattern benefit 2 so it wins on yields carrying `gpu.async.token` operands. The structural rewriter still handles yields without token operands.

Also adds a new test, `lower-async-to-gpu-runtime-calls.mlir`, to check the correct IR shape of `async.yield` after a `gpu.launch_func`.

Assisted-by: Claude
Fixes #170833
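
The selection behaviour described in the message can be illustrated with a small standalone model. This is only a sketch and does not use the MLIR API: `ToyYield`, `ToyPattern`, `selectPattern`, `makePatterns`, and `checkToyModel` are hypothetical names invented for illustration, and breaking benefit ties by registration order is an assumption of this toy, not a statement about how the dialect-conversion framework resolves ties.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy stand-in for an async.yield op: it only records whether any operand
// is a gpu.async.token. (Hypothetical type, not part of MLIR.)
struct ToyYield {
  bool hasGpuTokenOperand;
};

// Toy stand-in for a rewrite pattern: a name, a benefit, and a match test.
struct ToyPattern {
  std::string name;
  unsigned benefit;
  std::function<bool(const ToyYield &)> matches;
};

// Returns the name of the first matching pattern when patterns are tried in
// descending benefit order. Ties keep registration order (toy assumption).
std::string selectPattern(std::vector<ToyPattern> patterns,
                          const ToyYield &op) {
  std::stable_sort(patterns.begin(), patterns.end(),
                   [](const ToyPattern &a, const ToyPattern &b) {
                     return a.benefit > b.benefit;
                   });
  for (const ToyPattern &p : patterns)
    if (p.matches(op))
      return p.name;
  return "<none>";
}

// Registers a structural pattern (benefit 1, matches every yield) and a
// GPU runtime pattern (given benefit, matches only yields with a token).
std::vector<ToyPattern> makePatterns(unsigned gpuBenefit) {
  return {
      {"structural", 1, [](const ToyYield &) { return true; }},
      {"gpu-runtime", gpuBenefit,
       [](const ToyYield &y) { return y.hasGpuTokenOperand; }},
  };
}

// Walks through the three cases discussed in the commit message.
void checkToyModel() {
  // Equal benefit: the structural pattern can win on a token-carrying yield.
  assert(selectPattern(makePatterns(1), ToyYield{true}) == "structural");
  // Benefit 2: the GPU runtime pattern wins on a token-carrying yield.
  assert(selectPattern(makePatterns(2), ToyYield{true}) == "gpu-runtime");
  // Yields without token operands still go to the structural pattern.
  assert(selectPattern(makePatterns(2), ToyYield{false}) == "structural");
}
```

In this model, raising the benefit only changes the outcome for yields that the GPU runtime pattern matches, which mirrors the message's point that the structural rewriter keeps handling yields without token operands.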