onnxruntime
bf76a0b7 - feat(quantization): add calibration cache to quantize_static (#28221)

Commit

13 days ago

feat(quantization): add calibration cache to quantize_static (#28221)

## Summary

- Add an optional `calibration_cache_path` parameter to `quantize_static()` so users can save and reload the calibration result (`TensorsData`) across runs.
- Avoids re-running the expensive calibration inference pass when iterating on post-calibration options such as `nodes_to_exclude`, `activation_type`, or `weight_type`.
- Cache format is JSON, mirroring the encoder already used by `write_calibration_table`, so there is no new serialization surface area.

## Motivation

Fixes #21908. Users commonly re-run `quantize_static` multiple times on the same model and calibration dataset while varying the set of excluded nodes or the quant types, to trade off accuracy vs. speed. Today, every call repeats the full calibration inference loop even though the calibration result is identical, which is costly on large calibration datasets. There was no supported way to persist the computed tensor ranges: `write_calibration_table` writes a lossy table (drops histogram data) and has no paired reader. This PR closes that gap.

## Changes

- `python/tools/quantization/calibrate.py`:
  - Add `TensorData.from_dict` and `TensorsData.from_dict` classmethods (inverse of existing `to_dict`).
  - Add module-level `_CalibrationCacheEncoder(json.JSONEncoder)`, `save_tensors_data(tensors, path)`, and `load_tensors_data(path)`. The encoder handles `TensorData`/`TensorsData`/`np.ndarray`/`CalibrationMethod`/numpy scalars. Writes are atomic (tmp file + `os.replace`) and auto-create parent directories.
- `python/tools/quantization/quantize.py`:
  - `quantize_static` gains `calibration_cache_path: str | Path | None = None`. If the path exists, calibration is skipped and ranges are loaded from the cache. If the path is new, calibration runs and the result is saved. Raises `ValueError` if the cached `calibration_method` does not match the caller's `calibrate_method`.
  - `calibration_data_reader` becomes optional; at least one of it or an existing cache must be provided, else `ValueError`.
- `python/tools/quantization/__init__.py`: export `TensorData`, `TensorsData`, `save_tensors_data`, `load_tensors_data`.
- Tests: new `TestCalibrationCache` in `test/python/quantization/test_calibration.py` covering MinMax roundtrip, Entropy roundtrip (with histogram), missing-path error, parent-dir auto-creation, numpy scalar `bins` handling, method-mismatch guard, end-to-end `quantize_static` cache hit/miss, and `ValueError` when neither reader nor cache is provided.

## Test Plan

- `python -m pytest onnxruntime/test/python/quantization/test_calibration.py::TestCalibrationCache -v`
- `python -m pytest onnxruntime/test/python/quantization/test_calibration.py::TestCalibrateMinMaxCalibrator -v` (regression)
- `lintrunner -a` on changed files: clean.

## Backward Compatibility

`calibration_data_reader` changes from required-positional to optional-keyword. Existing call sites, whether positional or keyword, continue to work unchanged. The new behavior is only engaged when `calibration_cache_path` is provided.
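
## Illustrative sketch (not the PR's code)

The cache behavior described above (atomic JSON write via a temp file plus `os.replace`, parent-directory creation, cache hit/miss, and the method-mismatch guard) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the code in `calibrate.py` or `quantize.py`: the names `save_cache`, `load_cache`, and `get_ranges`, the `ranges` key, and the plain-dict payload are all hypothetical stand-ins for `save_tensors_data`, `load_tensors_data`, and `TensorsData`, whose real signatures and file layout may differ.

```python
import json
import os
import tempfile
from pathlib import Path


def save_cache(data: dict, path) -> None:
    """Write `data` as JSON atomically (hypothetical stand-in for save_tensors_data)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # auto-create parent directories
    # Assumption: the temp file lives in the target directory so that
    # os.replace stays on one filesystem and swaps the file in one step.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file if the write or the replace failed.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise


def load_cache(path) -> dict:
    """Read a cache written by save_cache (hypothetical stand-in for load_tensors_data)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def get_ranges(method: str, cache_path=None, calibrate=None) -> dict:
    """Return calibration ranges, reusing the cache when it exists.

    `calibrate` is a hypothetical zero-argument callable standing in for
    the calibration inference pass.
    """
    if cache_path is not None and Path(cache_path).exists():
        cached = load_cache(cache_path)
        # Guard: ranges computed with one method must not be reused for another.
        if cached["calibration_method"] != method:
            raise ValueError(
                f"cache was built with {cached['calibration_method']!r}, "
                f"but {method!r} was requested"
            )
        return cached["ranges"]  # cache hit: calibration is skipped
    if calibrate is None:
        raise ValueError("provide a calibration source or an existing cache")
    ranges = calibrate()  # cache miss: run the expensive pass once
    if cache_path is not None:
        save_cache({"calibration_method": method, "ranges": ranges}, cache_path)
    return ranges
```

With this shape, a first call such as `get_ranges("MinMax", "out/cache.json", run_calibration)` runs calibration and writes the file, and a later `get_ranges("MinMax", "out/cache.json")` returns the stored ranges without a calibration source. Here `run_calibration` is again a placeholder name.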

References

#28221 - feat(quantization): add calibration cache to quantize_static

Author

Rishi-Dave

Parents

e3c34da4
