Thanks for the fix!
Hi @alexm-neuralmagic, I'm going to revert this PR because it doesn't compile on CUDA 12.1 and 11.8 and our release pipeline is currently stuck there. Sorry!
https://github.com/vllm-project/vllm/actions/runs/9311899829/job/25631836467#step:8:1873
ptxas /tmp/tmpxft_0000a2c3_00000000-9_marlin_24_cuda_kernel.compute_80.ptx, line 557; fatal : Parsing error near ':': syntax error
PTX 8.5 (released on May 9, 2024) introduced a new modifier, ordered_metadata, for the mma.sp 2:4 warp-level sparse instruction. This modifier requires the indices in the sparsity metadata to be sorted in increasing order, starting from the LSB. In our case, format_24.py (lines 96-102) already meets this requirement because of the following encodings:
Therefore, we can simply add the new modifier and nothing else is required.
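To illustrate the ordering requirement, here is a minimal Python sketch. It is not the code in format_24.py. It assumes the common 2:4 layout in which each group of four elements is described by one 4-bit nibble holding two 2-bit indices, with the first kept index in the low bits. The helper names (encode_group, is_ordered, metadata_is_ordered) are hypothetical and exist only for this example.

```python
from itertools import combinations


def encode_group(kept):
    """Pack the two kept positions (0..3) of a 4-element group into a nibble.

    Assumed layout: the smaller index goes in bits 0-1, the larger in bits 2-3.
    """
    lo, hi = sorted(kept)
    return lo | (hi << 2)


def is_ordered(nibble):
    """Return True if the two 2-bit indices increase starting from the LSB."""
    idx0 = nibble & 0b11
    idx1 = (nibble >> 2) & 0b11
    return idx0 < idx1


def metadata_is_ordered(word, bits=32):
    """Check every nibble of a packed metadata word (width given by bits)."""
    return all(is_ordered((word >> shift) & 0xF) for shift in range(0, bits, 4))


# The six ways to keep 2 of 4 elements, encoded under the assumed layout.
VALID_NIBBLES = sorted(encode_group(pair) for pair in combinations(range(4), 2))

if __name__ == "__main__":
    print([format(n, "04b") for n in VALID_NIBBLES])
    print(all(is_ordered(n) for n in VALID_NIBBLES))
```

Under this assumed layout, every nibble produced by encode_group passes the check, whereas a nibble such as 0b0001 (index 1 stored before index 0) fails it.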