llvm-project
bd3088ec - [mlir][sparse][gpu] fix sparse GPU codegen out buffer (#189221)

Commit
14 days ago
When lowering sparse tensor operations to GPU code with `-sparse-gpu-codegen`, the generated `gpu.memcpy` op for the device-to-host copy targeted the wrong buffer. In my case, it copied back only the input positions buffer and not the output buffer, which left the output buffer in host memory empty.

The `SparseGPUCodegen` pass assumes that the first buffer is the output buffer. This assumption does not always hold: in my case the first buffer was the input positions buffer, so it was the only buffer copied back to the host.

This change removes the assumption and replaces it with an analysis that checks for `memref::StoreOp` and write MemoryEffects. It also adds a regression test that covers the problematic edge case.

Assisted by Gemini 3.1 Pro for finding the issue of using incorrect buffers in the `gpu.memcpy` op in the lowered code.
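The idea behind the fix can be illustrated with a small, self-contained C++ sketch. This is a toy model and not the code from the commit: `ToyOp`, `Kind`, and `writtenBuffers` are hypothetical names, and the real pass works on MLIR operations rather than on this struct. It only shows the shape of the analysis: instead of assuming buffer 0 is the output, scan the kernel body and collect every buffer that is the target of a store or of an op with a write effect.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for an operation in the kernel body (not an MLIR type).
enum class Kind { Load, Store, Other };

struct ToyOp {
  Kind kind;
  std::size_t buffer;   // index of the kernel buffer the op touches
  bool hasWriteEffect;  // models a declared write memory effect on that buffer
};

// Returns the indices of all buffers the kernel may write. These are the
// buffers that need a device-to-host copy once the kernel has finished.
std::set<std::size_t> writtenBuffers(const std::vector<ToyOp> &body,
                                     std::size_t numBuffers) {
  std::set<std::size_t> written;
  for (const ToyOp &op : body) {
    assert(op.buffer < numBuffers && "op refers to an unknown buffer");
    // A store always writes; other ops count only if they declare a write.
    if (op.kind == Kind::Store || op.hasWriteEffect)
      written.insert(op.buffer);
  }
  return written;
}
```

In a kernel whose first buffer is only read (for example an input positions buffer) and whose third buffer is stored to, this sketch selects only the third buffer for copy-back, which is the behavior the commit message describes as the intended one.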