auto-round
bbfb992c - Add per-WrapperLinear activation checkpointing for memory reduction

Committed 7 days ago
Add per-WrapperLinear activation checkpointing for memory reduction

Enable `--enable_activation_checkpointing` to reduce peak GPU memory during tuning by wrapping each WrapperLinear forward in torch.utils.checkpoint. During backward, only one layer's QDQ intermediates are recomputed at a time instead of all layers' being held simultaneously.

On Qwen3-30B-A3B (128-expert MoE, MXFP8, 10 iters) this cuts peak VRAM from ~80 GB to ~13 GB (an 85% reduction) with ~3.5% time overhead and identical quantization quality.

Key changes:
- wrapper.py: WrapperLinear gains `enable_activation_checkpointing`; forward() dispatches to _checkpointed_forward -> _forward_impl
- compressors/base.py: passes the flag through the wrapper_block() call
- compressors/config.py: add to ExtraConfig + TuningExtraConfig
- autoround.py: add to AutoRound.__new__() signature
- __main__.py: add --enable_activation_checkpointing CLI flag
- compressors/utils.py: block_forward_with_activation_checkpointing helper (kept for optional manual use)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: yiliu30 <yi4.liu@intel.com>
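The forward-dispatch pattern the commit describes — `forward()` routing to `_checkpointed_forward`, which re-runs `_forward_impl` under `torch.utils.checkpoint` during backward — can be sketched roughly as below. This is a minimal illustration, not the actual auto-round code: the QDQ body is a placeholder (the real wrapper fake-quantizes weights and activations before the matmul), and the `torch.is_grad_enabled()` guard is an assumption about how inference calls would be handled.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class WrapperLinear(nn.Module):
    """Sketch of per-layer activation checkpointing around a QDQ forward."""

    def __init__(self, orig_layer: nn.Linear,
                 enable_activation_checkpointing: bool = False):
        super().__init__()
        self.orig_layer = orig_layer
        self.enable_activation_checkpointing = enable_activation_checkpointing

    def _forward_impl(self, x: torch.Tensor) -> torch.Tensor:
        # Placeholder for the quantize-dequantize step; the real wrapper
        # fake-quantizes self.orig_layer.weight (and activations) here.
        qdq_weight = self.orig_layer.weight
        return nn.functional.linear(x, qdq_weight, self.orig_layer.bias)

    def _checkpointed_forward(self, x: torch.Tensor) -> torch.Tensor:
        # Don't store QDQ intermediates; recompute this one layer's
        # forward during backward instead.
        return checkpoint(self._forward_impl, x, use_reentrant=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.enable_activation_checkpointing and torch.is_grad_enabled():
            return self._checkpointed_forward(x)
        return self._forward_impl(x)
```

Because checkpointing is applied per wrapped layer rather than per block, only the currently back-propagating layer's intermediates are live at any moment, which is what trades the ~3.5% recompute overhead for the large peak-memory drop.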