transformers
dd70a8cb - Fix MXFP4 quantizer validation to allow CPU inference with dequantize option (#39953)

Commit
208 days ago
Fix MXFP4 quantizer validation to allow CPU inference with dequantize option (#39953) * Fix MXFP4 quantizer validation to enable CPU dequantization Move dequantize check before CUDA availability check to allow CPU inference when quantization_config.dequantize is True. This enables users to run MXFP4 models on CPU by automatically converting them to BF16 format. * Add tests for MXFP4 quantizer CPU dequantization validation * fix: format mxfp4 test file with ruff
Author
Parents
Loading