a33b1789 - ENH Caching option for DoRA inference (#2661)

ENH Caching option for DoRA inference (#2661)

Resolves #2651

This PR adds caching of the LoRA weight and the DoRA weight norm for faster inference. Since the weights don't change during inference, there is no need to recalculate them for a DoRA module on each forward pass. During training, recalculation is needed, so there is no caching when the module has training=True.

The cache does not eliminate every possible duplicate calculation. For instance, the weight norm is calculated during module initialization and then again during the first inference forward pass; only from the second forward pass onward are the weight norms served from the cache.

The PR includes a script to measure the effect of caching. On my machine, I get:

avg time LoRA:                      0.0717 sec
avg time DoRA no caching:           0.1718 sec
avg time DoRA with caching:         0.0840 sec
memory LoRA:                        15612.00 MB
memory DoRA no caching:             16212.00 MB
memory DoRA with caching:           22118.00 MB
DoRA time overhead no caching:      139.52%
DoRA time overhead with caching:    17.08%
DoRA memory overhead no caching:    3.84%
DoRA memory overhead with caching:  41.67%

Thus, caching can significantly reduce inference time, but at a noticeable cost in memory.
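The caching policy described above (recompute every step while training=True, compute once and reuse during inference) can be sketched as follows. This is a minimal illustration with hypothetical names, not PEFT's actual DoRA implementation; a plain list of floats stands in for the weight tensor and a counter instruments how often the norm is actually computed.

```python
class DoraLikeModule:
    """Toy module illustrating lazy caching of an expensive weight norm."""

    def __init__(self, weight):
        self.weight = weight          # list of floats standing in for a tensor
        self.training = True          # mirrors torch.nn.Module.training
        self._cached_norm = None      # populated lazily during inference
        self.norm_computations = 0    # instrumentation for this example

    def _compute_weight_norm(self):
        # The "expensive" operation we want to avoid repeating at inference.
        self.norm_computations += 1
        return sum(w * w for w in self.weight) ** 0.5

    def weight_norm(self):
        if self.training:
            # Weights change every optimizer step: always recompute, never cache.
            return self._compute_weight_norm()
        if self._cached_norm is None:
            # First inference forward pass: compute once, then reuse.
            self._cached_norm = self._compute_weight_norm()
        return self._cached_norm

    def eval(self):
        self.training = False
        return self


m = DoraLikeModule([3.0, 4.0]).eval()
m.weight_norm()  # computed on the first inference pass
m.weight_norm()  # served from the cache on subsequent passes
assert m.norm_computations == 1
```

Note that, as in the commit message, the cache only helps from the second inference forward pass onward; the first pass still pays the full computation cost, and any caching scheme like this trades extra memory (the stored values) for the saved recomputation time.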