ENH: Add support for LoRA hotswapping (#41297)
LoRA hotswapping has been available in PEFT since 0.15.0. There is
already a diffusers
integration (https://github.com/huggingface/diffusers/pull/9453), but
the transformers integration was still missing this feature. This PR
remedies this.
Hotswapping allows swapping different LoRA adapters in place instead of
loading multiple adapters and switching between them. Not only can this
be advantageous for saving memory and potentially for quicker loading,
the biggest advantage is that if the model is compiled, we can hotswap
without triggering recompilation (loading a separate adapter would
require recompilation).
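To illustrate why in-place swapping avoids recompilation, here is a minimal, self-contained sketch (all names here are hypothetical stand-ins, not the actual PEFT/transformers implementation): the existing weight containers are overwritten in place rather than replaced by new objects, so a compiled graph referring to them stays valid.

```python
class LoraLayer:
    """Stand-in for a LoRA layer holding adapter weight containers."""
    def __init__(self, lora_a, lora_b):
        self.lora_a = list(lora_a)  # stand-in for a weight tensor
        self.lora_b = list(lora_b)

def hotswap(layer, new_a, new_b):
    # Overwrite the existing containers *in place* (analogous to
    # tensor.copy_ in PyTorch) instead of binding new objects. Because
    # the objects a compiled graph points at stay the same, no
    # recompilation is needed.
    layer.lora_a[:] = new_a
    layer.lora_b[:] = new_b

layer = LoraLayer([1.0, 2.0], [3.0, 4.0])
a_before = id(layer.lora_a)
hotswap(layer, [5.0, 6.0], [7.0, 8.0])
assert id(layer.lora_a) == a_before  # same object, new values
```

Loading a separate adapter, by contrast, registers new modules and tensors, which changes the graph and forces a recompile.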
There are some caveats to using this feature, most notably that only
LoRA is supported. This was fine for diffusers, as it only works with
LoRA, but the transformers integration works with other PEFT methods
too. However, LoRA should be by far the most common method, so this
should be fine for now. This and other caveats have been documented.
To make the usage more intuitive, hotswapping is now auto-enabled after
calling model.enable_peft_hotswap(). For this, we detect whether
enable_peft_hotswap() was called *and* whether the adapter being loaded
is *not* the first adapter (the first adapter cannot be hotswapped; it
needs to be loaded normally).
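The auto-enable detection can be sketched roughly as follows (hypothetical attribute and helper names; the actual transformers code differs):

```python
class Model:
    """Toy model tracking PEFT adapters, to sketch the detection logic."""
    def __init__(self):
        self._hotswap_enabled = False
        self.adapters = {}

    def enable_peft_hotswap(self):
        # User opts in to hotswapping before loading adapters.
        self._hotswap_enabled = True

    def load_adapter(self, name, weights):
        # Hotswap only if the user opted in *and* this is not the first
        # adapter -- the first adapter must be loaded normally.
        if self._hotswap_enabled and self.adapters:
            self._hotswap(weights)
        else:
            self.adapters[name] = weights

    def _hotswap(self, weights):
        # Swap the new weights in place of the already-loaded adapter.
        first = next(iter(self.adapters))
        self.adapters[first] = weights

model = Model()
model.enable_peft_hotswap()
model.load_adapter("a", {"w": 1})  # first adapter: loaded normally
model.load_adapter("b", {"w": 2})  # second adapter: hotswapped in place
```

With this, users only have to call enable_peft_hotswap() once; subsequent adapter loads are swapped automatically.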