ENH Caching option for DoRA inference (#2661)
Resolves #2651
This PR adds caching of the LoRA weight and the DoRA weight norm for
faster inference. Since the weights don't change during inference, there
is no need to recalculate them for a DoRA module on each forward pass.
During training, the weights do change, so recalculation is needed and
no caching takes place while the module has `training=True`.
The cache does not eliminate every possible duplicate calculation. For
instance, the weight norm is calculated during module initialization and
then again during the first forward pass at inference time; only from
the second forward pass onward is the cached weight norm reused.
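To illustrate the idea, here is a minimal sketch of weight-norm caching in a simplified DoRA-like layer. The class, attribute names (`_cached_norm`), and shapes are illustrative assumptions, not PEFT's actual implementation; a real implementation would also need to invalidate the cache whenever the weights change.

```python
import torch
import torch.nn as nn

class DoraLikeLayer(nn.Module):
    """Simplified DoRA-style layer that caches the weight norm at inference."""

    def __init__(self, in_features, out_features, r=4):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.magnitude = nn.Parameter(torch.ones(out_features))
        self._cached_norm = None  # populated only in eval mode

    def _weight_norm(self, weight):
        # Recompute in training mode (weights change every step);
        # reuse the cached value from the second inference pass onward.
        if self.training or self._cached_norm is None:
            norm = weight.norm(p=2, dim=1, keepdim=True)
            if not self.training:
                self._cached_norm = norm
            return norm
        return self._cached_norm

    def forward(self, x):
        delta = self.lora_B @ self.lora_A          # (out, in) LoRA update
        weight = self.base.weight + delta          # combined weight
        norm = self._weight_norm(weight)           # cached when eval()
        weight = self.magnitude.unsqueeze(1) * weight / norm
        return x @ weight.t()
```

After `layer.eval()`, the first forward pass fills `_cached_norm` and subsequent passes skip the norm computation, which is the source of the speedup measured below.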
The PR includes a script to measure the effect of caching. On my
machine, I get:
avg time LoRA: 0.0717 sec
avg time DoRA no caching: 0.1718 sec
avg time DoRA with caching: 0.0840 sec
memory LoRA: 15612.00 MB
memory DoRA no caching: 16212.00 MB
memory DoRA with caching: 22118.00 MB
DoRA time overhead no caching: 139.52%
DoRA time overhead with caching: 17.08%
DoRA memory overhead no caching: 3.84%
DoRA memory overhead with caching: 41.67%
Thus, caching can significantly reduce inference time but at a
noticeable cost in memory.