transformers
dd70a8cb - Fix MXFP4 quantizer validation to allow CPU inference with dequantize option (#39953)

Commit

296 days ago

Fix MXFP4 quantizer validation to allow CPU inference with dequantize option (#39953) * Fix MXFP4 quantizer validation to enable CPU dequantization Move dequantize check before CUDA availability check to allow CPU inference when quantization_config.dequantize is True. This enables users to run MXFP4 models on CPU by automatically converting them to BF16 format. * Add tests for MXFP4 quantizer CPU dequantization validation * fix: format mxfp4 test file with ruff

References

#39953 - Fix MXFP4 quantizer validation to allow CPU inference with dequantize option

Author

returnL

Parents

82eb67e6

transformers dd70a8cb - Fix MXFP4 quantizer validation to allow CPU inference with dequantize option (#39953)

transformers
dd70a8cb - Fix MXFP4 quantizer validation to allow CPU inference with dequantize option (#39953)