llama.cpp
c8a2417d - CUDA: experimental native mxfp4 support for blackwell (#17906)

Commit

198 days ago

CUDA: experimental native mxfp4 support for blackwell (#17906)

* CUDA: experimental native mxfp4 support for blackwell
* optimize load_tiles
* optimize quantize_mxfp4
* cleanup
* first pass review: formatting
* use interleaved layout for mma
* mmq: add assert for size
* use __nv_fp4x4_e2m1
* use iter_k as 512, cleanup
* Use 1200 as blackwell instead of 1000
* address review comments
* mmq: fix stride
* quantize.cu: use reference impl of e8m0 scale
* address review comments
* add 120f-virtual + minor fixes

---------

Co-authored-by: Aman Gupta <aman>
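
The message refers to two parts of the MXFP4 format without describing them: the E8M0 block scale and the E2M1 element type (`__nv_fp4x4_e2m1`). As an illustration only, the host-side C++ sketch below quantizes one 32-element block in the way the OCP Microscaling (MX) specification describes. It is not the code from this commit. The names `MxBlock`, `e8m0_from_amax`, `e2m1_encode`, `e2m1_decode`, `quantize_block` and `dequantize_block` are hypothetical, the nibble packing order is an assumption, inputs are assumed finite, and rounding is plain nearest-value with no round-to-even tie handling.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// The eight non-negative magnitudes representable in E2M1.
// The array index equals the 3-bit code (2 exponent bits, 1 mantissa bit).
static const float kE2M1[8] = {0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 3.0f, 4.0f, 6.0f};

// Hypothetical block layout: one E8M0 scale byte plus 32 packed 4-bit codes.
struct MxBlock {
    uint8_t scale;   // biased exponent, value = 2^(scale - 127)
    uint8_t qs[16];  // two E2M1 codes per byte (packing order assumed)
};

// Shared scale per the MX spec convention:
// exponent = floor(log2(amax)) - 2, where 2 is the exponent of the
// largest E2M1 value (6 = 1.5 * 2^2). 255 is reserved for NaN.
inline uint8_t e8m0_from_amax(float amax) {
    if (amax == 0.0f) return 127;  // arbitrary choice for an all-zero block
    int e = std::ilogb(amax) - 2;  // ilogb gives floor(log2(amax))
    return static_cast<uint8_t>(std::min(std::max(e + 127, 0), 254));
}

// Nearest-value encode; magnitudes above 6 saturate to 6.
inline uint8_t e2m1_encode(float v) {
    float a = std::fabs(v);
    int best = 0;
    for (int i = 1; i < 8; ++i) {
        if (std::fabs(a - kE2M1[i]) < std::fabs(a - kE2M1[best])) best = i;
    }
    return static_cast<uint8_t>((v < 0.0f ? 8 : 0) | best);  // bit 3 = sign
}

inline float e2m1_decode(uint8_t code) {
    float m = kE2M1[code & 7];
    return (code & 8) ? -m : m;
}

inline void quantize_block(const float* x, MxBlock& out) {
    float amax = 0.0f;
    for (int i = 0; i < 32; ++i) amax = std::max(amax, std::fabs(x[i]));
    out.scale = e8m0_from_amax(amax);
    float inv = std::ldexp(1.0f, 127 - out.scale);  // 1 / 2^(scale - 127)
    for (int i = 0; i < 16; ++i) {
        uint8_t lo = e2m1_encode(x[2 * i] * inv);
        uint8_t hi = e2m1_encode(x[2 * i + 1] * inv);
        out.qs[i] = static_cast<uint8_t>(lo | (hi << 4));
    }
}

inline void dequantize_block(const MxBlock& in, float* y) {
    float d = std::ldexp(1.0f, in.scale - 127);
    for (int i = 0; i < 16; ++i) {
        y[2 * i]     = d * e2m1_decode(in.qs[i] & 0x0F);
        y[2 * i + 1] = d * e2m1_decode(in.qs[i] >> 4);
    }
}
```

Under these assumptions, a block whose largest magnitude is 6 gets a scale byte of 127 (a scale of 1), so values that are already E2M1-representable survive a round trip unchanged.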

References

#17906 - CUDA: experimental native mxfp4 support for blackwell

Author

am17an

Parents

54132f1b
