transformers
61cafd99 - ENH: Add support for LoRA hotswapping (#41297)

LoRA hotswapping has been available in PEFT since 0.15.0. There is already a diffusers integration (https://github.com/huggingface/diffusers/pull/9453), but the transformers integration was still missing this feature. This PR remedies that.

Hotswapping allows swapping different LoRA adapters in place instead of loading multiple adapters and switching between them. Not only can this save memory and potentially speed up loading, but the biggest advantage is that if the model is compiled, we can hotswap without triggering recompilation (loading a separate adapter would require recompilation).

There are some caveats to using this feature, most notably that only LoRA is supported. This was fine for diffusers, as it only works with LoRA, but the transformers integration works with other PEFT methods too. However, LoRA should be by far the most common method, so this should be fine for now. This and the other caveats have been documented.

To make the usage more intuitive, hotswapping is now auto-enabled after calling model.enable_peft_hotswap(). For this, we detect whether enable_peft_hotswap() was called *and* whether the adapter being loaded is *not* the first adapter (because the first adapter cannot be hotswapped; it needs to be loaded normally).
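The workflow described above could be sketched roughly as follows. This is a hedged illustration, not code from the PR: the model checkpoint and adapter repository names are placeholders, and the exact call pattern around enable_peft_hotswap() is an assumption based on the commit message.

```python
# Sketch of the hotswapping workflow (placeholder model/adapter names;
# exact API details may differ from this assumption).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("some-org/some-base-model")

# Opt in to hotswapping before loading any adapter.
model.enable_peft_hotswap()

# The first adapter cannot be hotswapped; it is loaded normally.
model.load_adapter("some-org/lora-adapter-1")

# Optionally compile once; later swaps should not trigger recompilation.
model = torch.compile(model)

# Because enable_peft_hotswap() was called and this is not the first
# adapter, its weights are swapped in place rather than loaded alongside.
model.load_adapter("some-org/lora-adapter-2")
```

The key ordering constraint from the commit message: enable_peft_hotswap() must be called before the first adapter is loaded, and only the second and subsequent adapters are swapped in place.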