Thanks a lot for this PR. I only skimmed it for now and have yet to read the details from the paper but this looks quite promising!
When looking at the implementation, I saw that the way the Eva weights are initialized sits a bit "outside" of how we normally do this in PEFT. The normal way is to have this on the layer level, check for instance here:
peft/src/peft/tuners/lora/layer.py, lines 128 to 139 (commit 5758a7e)
My guess why this is implemented differently here is that there is currently one pass through the modules that determines the individual ranks and creates the state dict. If this was refactored such that the weights were initialized on the layer level as usual, we would need two passes, one to adjust the ranks in the config and then the actual weight initialization on the layer level. Would it be possible to refactor the code to enable that?
My main concern with the current approach is that the weight initialization currently is happening in one place of the code but now there would be another place, making the code more disorganized. Second, IIUC, we would initialize the weights once and then override them with Eva parameters, which is wasteful. Finally, if we do this on the layer level, that could solve the multi-GPU issue, right?
Apart from this, what I noticed while skimming is that there is a big class IncrementalPCA which looks like a rewrite of the equivalent sklearn class using torch. My question is: Did you implement this from scratch or is it taken from somewhere? This has implications on how well this is tested etc.
I also had difficulties understanding the SVDHook and HashHook, could you please add a small docstring? Are these hooks permanently added?
Regarding torch.compile, we could add a test for that and see if it works. But this is not a necessity for this PR to land in PEFT, just something that's nice to have.
Going forward, we'll also have to add a few unit tests, but I'm fine with iterating on the implementation first and then adding the tests.
Thanks a lot for the quick feedback @BenjaminBossan !
I added docstrings to SVDHook and HashHook, hope this makes their purpose a bit more clear.
Regarding your other concerns about the implementation:
I agree that it's not ideal that our proposed init method is not on the layer level like the other inits. I thought a lot about where in the code to put the init and unfortunately I think it would be quite difficult to put the init directly in the layer (or at least I haven't found an elegant way to do it). The way eva is initialized is that we do multiple forward passes through the model and incrementally perform SVD on the input activations of the target layers (usually more than 10 forward passes). The resulting singular vectors are used to initialize the Lora A matrices. So unfortunately 2 passes would also not be enough. We would also need to somehow pass an instance of EvaConfig to the LoraLayer.
What we could maybe do is create the eva_state_dict as it is created now and then somehow pass the state dict to LoraModel._create_and_replace and then the relevant tensors to the update_layer methods of individual layers.
I did also add a statement in the update_layer method to initialize the Lora B matrices with zeros as needed for eva. For this reason we do not initialize weights twice, as the A matrices are not initialized in the update_layer method.
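To illustrate the flow described above, a rough sketch (this is not the actual implementation; apart from the standard register_forward_hook, treat the names and attributes as placeholders, and SVDHook, while it exists in the PR, may have a different interface):

model = ...                 # base model
lora_config = ...           # LoraConfig with init_lora_weights="eva"
eva_config = ...
data_iter = ...             # iterator over finetuning batches
num_forward_passes = 10     # usually more than 10 passes are needed

hooks = {}
for name, module in model.named_modules():
    if name in lora_config.target_modules:
        hook = SVDHook(eva_config)
        module.register_forward_hook(hook)   # the hook sees the layer's input activations
        hooks[name] = hook

for _ in range(num_forward_passes):
    model(**next(data_iter))                 # each pass updates the incremental SVD inside the hooks

# the collected singular vectors later become the lora_A weights (ranks can differ per layer)
eva_state_dict = {name: hook.components for name, hook in hooks.items()}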
Regarding the new eva code: IncrementalPCA is indeed largely taken from the sklearn class, so a lot of the code is directly copied from there. What we added is seamless gpu support by setting the device for all internal variables / buffers to the device of the first input tensor. We also integrated support for torch.svd_lowrank to significantly speed up computations. We ran several tests internally to check that the results are equivalent to running the sklearn implementation on cpu.
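To give an idea of where the speed-up comes from, a minimal, self-contained illustration of a truncated SVD with torch.svd_lowrank (the sizes and the q/niter values are arbitrary examples, not the settings used in the PR):

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
activations = torch.randn(4096, 1024, device=device)   # stand-in for stacked input activations of one layer
rank = 16
# only the top-`rank` components are computed instead of a full SVD
U, S, V = torch.svd_lowrank(activations, q=rank, niter=2)
components = V.T                                        # shape (rank, 1024)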
Thanks for the further explanations. Let me discuss the points separately.
I see the issue with the cycling through the data. This is indeed tricky but I think it's worth putting some time into considering alternative solutions.
Right now, the dataloader is added to the config. We should really try to avoid this. The config should be a small, json-serializable object. This means it can't really contain any reference to the data. Moreover, there are so many unknowns about the data and how the model is called that I don't think we should have code like:
inputs = {k: v.to(device) for k, v in next(iter(dataloader)).items() if k != "labels"}
model(**inputs)
Let me throw out an alternative: Let's say that we had a function called initialize_lora_eva_weights or something like that. The API would be:
# normal PEFT code
base_model = ...
lora_config = LoraConfig(...)
model = get_peft_model(base_model, lora_config)
# new eva code
eva_config = EvaConfig(...)
sample_data = ... # user needs to define this instead of relying on data loader
def forward_fn(model):
    return model(**sample_data)
model = initialize_lora_eva_weights(model, config=eva_config, forward_fn=forward_fn, adapter_name="default")
This would make it so that Eva is a function that is applied to the LoRA model after it was initialized. This avoids the issues I mentioned above.
The disadvantages I see with this approach:
Do you think this would work with the Eva algorithm or is there something I'm missing? LMK what you think.
I wonder if we could use some of the sklearn unit tests and add them here to ensure that the IncrementalPCA produces the correct results. That way, we have the safety of knowing it works as expected without too much effort of adding our own unit tests.
Of course, this is all a bit outside of scope for PEFT, but if there is no external package that provides the functionality, it would be okay to add it to PEFT.
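For illustration, a ported test could be shaped roughly like this (assuming the torch class keeps a sklearn-like fit / components_ API; the tolerances are arbitrary):

import numpy as np
import torch
from sklearn.decomposition import IncrementalPCA as SklearnIPCA

def test_matches_sklearn(TorchIncrementalPCA):
    X = np.random.RandomState(0).randn(200, 20).astype(np.float32)
    ref = SklearnIPCA(n_components=5, batch_size=50).fit(X)
    ours = TorchIncrementalPCA(n_components=5, batch_size=50).fit(torch.from_numpy(X))
    # components are only defined up to a sign flip, so compare absolute values
    np.testing.assert_allclose(
        np.abs(np.asarray(ours.components_)), np.abs(ref.components_), rtol=1e-3, atol=1e-4
    )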
I was wondering already if having the dataloader in the config is a good idea since you probably only want primitive types there. I think your suggested solution is a sensible one. I gave it some thought and don't really have a better suggestion following the restrictions we discussed.
A few questions for clarification:
- Would eva still be a valid value for init_lora_weights in LoraConfig? I am a bit worried about discoverability of eva if LoraConfig would then just be normal lora settings and someone needs to know about the function initialize_lora_eva_weights to use eva.
- In your suggested solution, would sample data not be a dataloader?
- One question about how we get the layer activations for SVD. In SVDHook we currently have it implemented like this:
try:
    states = input.detach()
except AttributeError:
    states = input[0].detach()
So for cases where the input is a tuple, we just take the first value. Not super important at this stage I guess but I just wanted to ask if you think we should add an argument so the user can define which inputs we should use.
It's a good idea to take some from sklearn. All tests for this class are in this file. Right now there is nothing like this available anywhere. I built my own package torch-incremental-pca but I just assumed you don't want any dependencies on small packages like this, so I just copied the class over here. I did however create a pull request to have this feature implemented in pytorch directly, so in the future we might be able to remove this from peft again.
I was wondering already if having the dataloader in the config is a good idea since you probably only want primitive types there. I think your suggested solution is a sensible one. I gave it some thought and don't really have a better suggestion following the restrictions we discussed.
Okay, then let's go with this general approach. Maybe we have some ideas for improvements along the way, then we can still pivot.
- Would eva still be a valid value for init_lora_weights in LoraConfig? I am a bit worried about discoverability of eva if LoraConfig would then just be normal lora settings and someone needs to know about the function initialize_lora_eva_weights to use eva.
Yes, let's keep it as an option. By itself, it wouldn't do anything, but it helps with discoverability. We could also consider setting some kind of flag and when users want to use the model with Eva init but before calling the function we discussed, the flag could be checked and raise an error/warning. Moreover, with this init option being set, we can signal to other packages that they need to call the function (e.g. trl could automatically call this in their SFTTrainer).
- In your suggested solution, would sample data not be a dataloader?
Exactly, I don't think we absolutely need a dataloader, do we? AFAICT, we only need it to get a batch of data. If it needs to be a different batch each time, we could do:
from itertools import cycle

iter_dl = iter(cycle(dataloader))  # optionally tqdm

def forward_fn(model):
    sample_data = next(iter_dl)
    return model(**sample_data)
Is it necessary to define a custom forward function for the eva init? Usually it should be the same inputs that the model receives during normal finetuning anyway. I did adapt the line you mentioned to filter out the label key from the inputs dict.
My reasoning for this is that there are too many possibilities for how the model could be called, we can't cover all of them. The assumption that we can just filter out labels and then call model(**inputs) might mostly work for standard HF LLM architectures, but this method should also work with other model types, right? Let's keep the possibilities open.
But thinking more about it, I'd say what we can do is to keep your existing API of passing the dataloader and give the option for a custom forward_fn on top. That way, there is a more convenient way for the standard models and a more flexible and advanced way for all the others.
- One question about how we get the layer activations for SVD.
I'd definitely rather have an isinstance check than a try ... except for this. But I'm not sure if that's what you were asking.
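For example, inside the hook (where torch is already available) the check could look roughly like this, just as a sketch:

if isinstance(input, torch.Tensor):
    states = input.detach()
elif isinstance(input, (tuple, list)):
    states = input[0].detach()
else:
    raise ValueError(f"unexpected input type for the SVD hook: {type(input)}")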
I just assumed you don't want any dependencies on small packages like this so I just copied the class over here.
That's correct. We could make it an optional dependency, as it is only needed for Eva. But we could also copy the code over to PEFT ("vendoring"), which is similar to what we did with BufferDict. What I would like to see: If you copy some of the sklearn tests to your package and ensure that they pass, we wouldn't need them in PEFT too, we could just say: "check this URL for the original code". This way, there is a bit less clutter.
I did however create a pull request to have this feature implemented in pytorch directly so in the future we might be able to remove this from peft again.
Nice. Of course, even if it lands, we can't use it immediately as we want to support older PyTorch versions. But definitely add a comment with a link to the PR. I would imagine, however, that PyTorch won't be too keen on the sklearn API, but let's see.
@BenjaminBossan before going ahead with this implementation of calling a function/method after get_peft_model is called, I just wanted to propose one alternative idea to get your thoughts on it.
Instead of using LoraModel we create a subclass EvaModel. This model class initially would not have any adapter layers but we would add the forward hooks needed for the SVD computations. In the __init__ we would have something like
class EvaModel(LoraModel):
    def __init__(self, model, config, adapter_name):
        self.model = model
        for layer in config.target_modules:
            module = get_module(self.model, layer)
            module.register_forward_hook(SVDHook(config))
During the first few forward passes of finetuning we would not actually update any weights but use the activations to compute the eva state dict. Once SVD has converged for all layers, we would switch the internal state of the model to finetuning (with some boolean variable) and at that point only add the adapters with the appropriate eva values for the lora A matrices.
Hope this makes sense. That way we could leverage the finetuning loop directly. It seems a bit more seamless and works without needing any additional function calls after get_peft_model. But of course there might be some hidden issues I am not considering at the moment.
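A very rough sketch of what that phase switch could look like (all names here are invented for illustration and this is not the PR implementation; the hooks are assumed to expose a converged flag and the adapter injection is hidden behind a callback):

import torch.nn as nn

class EvaPhaseWrapper(nn.Module):
    def __init__(self, model, hooks, inject_adapters_fn):
        super().__init__()
        self.model = model
        self.hooks = hooks                          # dict: layer name -> SVD hook with a .converged flag
        self.inject_adapters_fn = inject_adapters_fn
        self.svd_finished = False

    def forward(self, *args, **kwargs):
        out = self.model(*args, **kwargs)
        if not self.svd_finished and all(h.converged for h in self.hooks.values()):
            # switch from the SVD phase to regular finetuning
            self.inject_adapters_fn(self.model, self.hooks)
            self.svd_finished = True
        return out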
I like the proposal to seamlessly integrate this into the training loop. I'm not sure if we need a separate class for that, but you could come up with a prototype and I can check if there is a good way to integrate it without a separate class.
With this, we would need to make sure that it works with vanilla training loops but also frameworks like Trainer from transformers. If it does, I think this could be the optimal solution.
@BenjaminBossan I started implementing this approach of using the first few batches for the initialization and unfortunately I realised that there are at least 2 major issues with it:
These issues make me wonder if some variant of my initially proposed solution would be feasible (computing the eva state dict before initializing the adapters).
One idea I would have which avoids passing a dataloader or something similar through the config would be to enable the get_peft_model function to accept some optional keyword arguments through **kwargs and passing the dataloader as well as a model forward_fn this way.
def get_peft_model(
    model: PreTrainedModel,
    peft_config: PeftConfig,
    adapter_name: str = "default",
    mixed: bool = False,
    autocast_adapter_dtype: bool = True,
    revision: Optional[str] = None,
    **model_init_kwargs,
):
    ...
    PeftModel(model, peft_config, adapter_name=adapter_name, autocast_adapter_dtype=autocast_adapter_dtype, **model_init_kwargs)
    ...
peft_model = get_peft_model(model, peft_config, eva_dataloader=dataloader, eva_forward_fn=eva_forward_fn)
We also don't directly have to modify the __init__ method of LoraModel but could define a _pre_injection_hook for LoraModel, which should also do the trick.
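As a sketch of that idea, assuming the hook is called on the bare base model before any adapter layers are injected (compute_eva_state_dict below is only a placeholder for the actual eva computation):

from peft.tuners.lora import LoraModel

def compute_eva_state_dict(model, config):
    # placeholder for the forward passes + incremental SVD described earlier
    return {}

class EvaLoraModel(LoraModel):
    def _pre_injection_hook(self, model, config, adapter_name):
        if getattr(config, "init_lora_weights", None) == "eva":
            self.eva_state_dict = compute_eva_state_dict(model, config)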
Let me know what you think about the issues I mentioned with the previously proposed approaches. If my suggested approach is not feasible, I am fine with going ahead with the implementation suggested by you with an initialize_lora_eva_weights function (this will unfortunately also suffer from the downside mentioned above that already initialized lora weights will take up gpu memory that could potentially be useful for the svd computation, which we might want to run with a higher batch size than finetuning).
I started implementing this approach of using the first few batches for the initialization and unfortunately I realised that there are at least 2 major issues with it:
Good points. There would probably be ways to work around this or accept these shortfalls, but I agree that this is not ideal.
One idea I would have which avoids passing a dataloader or something similar through the config would be to enable the get_peft_model function to accept some optional keyword arguments through **kwargs and passing the dataloader as well as a model forward_fn this way.
I'm a bit reluctant to add more options to get_peft_model, especially when they're only used in a very narrow use case (e.g. this option only works for LoRA but get_peft_model is the entry point for all PEFT methods). Therefore, I tend to prefer the option of having a dedicated function.
To avoid initializing the LoRA weights through get_peft_model, I have a proposal. We recently added an option low_cpu_mem_usage=True to PEFT, which results in the PEFT weights being initialized on the meta device. This option has not been propagated to get_peft_model because until now, there was no reason to have it there. However, with Eva, it would be useful. So we could add the option there and then just pass it along to PeftModel in this line, which already accepts that argument. That way, we avoid initializing the LoRA weights. WDYT about that?
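In terms of usage, that could look roughly like this (a sketch assuming low_cpu_mem_usage gets added to get_peft_model as described, with initialize_lora_eva_weights as proposed earlier):

base_model = ...
forward_fn = ...   # as in the example above
eva_config = EvaConfig(...)
lora_config = LoraConfig(init_lora_weights="eva")   # plus the usual LoRA settings
# LoRA weights would be created on the meta device, i.e. not materialized yet
model = get_peft_model(base_model, lora_config, low_cpu_mem_usage=True)
# this call would then materialize the LoRA weights with the eva values
model = initialize_lora_eva_weights(model, config=eva_config, forward_fn=forward_fn, adapter_name="default")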
@BenjaminBossan thanks for your suggestions, I implemented a prototype solution now for the discussed approach. A few questions:
- I moved the IncrementalPCA class which has been adapted from sklearn to peft.utils.incremental_pca.py because I thought it is not strictly part of eva and in theory could be used for other things, does that make sense?
- Ideally users should be able to do from peft import initialize_lora_eva_weights. Would it be suitable to then put this function in peft.utils or should it be in peft.tuners.lora? I didn't see any other functions being pulled directly into the peft namespace from the tuners directory.
- I was wondering if we should use low_cpu_mem_usage=True as a default for eva. That way we wouldn't even need to have it as an argument for get_peft_model.
I implemented a prototype solution now for the discussed approach.
Thanks a lot. I think the API is decent and given the trade-offs we discussed, probably a good compromise.
- I moved the IncrementalPCA class which has been adapted from sklearn to peft.utils.incremental_pca.py because I thought it is not strictly part of eva and in theory could be used for other things, does that make sense?
Yes, that's fine. In the end, I think it doesn't really matter. What's important is that it is not in the peft namespace (i.e. from peft import IncrementalPCA should not be possible), apart from that it doesn't matter too much. We can always move it later.
- Ideally users should be able to do from peft import initialize_lora_eva_weights. Would it be suitable to then put this function in peft.utils or should it be in peft.tuners.lora? I didn't see any other functions being pulled directly into the peft namespace from the tuners directory.
We do pull objects from peft.tuners into the peft namespace, otherwise from peft import LoraConfig would not be possible :) As for initialize_lora_eva_weights: Right now, this only works for LoRA and it's not planned to change that, right? Therefore, I would keep it in the tuners/lora directory. There, import it into __init__, then from there into the __init__ from tuners/, and from there into the __init__ from peft.
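Spelled out, the re-export chain would look something like this (the eva module name inside tuners/lora is just an assumption):

# src/peft/tuners/lora/__init__.py
from .eva import initialize_lora_eva_weights

# src/peft/tuners/__init__.py
from .lora import LoraConfig, LoraModel, initialize_lora_eva_weights

# src/peft/__init__.py
from .tuners import LoraConfig, LoraModel, initialize_lora_eva_weights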
- I was wondering if we should use low_cpu_mem_usage=True as a default for eva. That way we wouldn't even need to have it as an argument for get_peft_model.
How exactly would you do that, given that the get_peft_model call is done by the user?
@BenjaminBossan thanks again for your feedback!
We do pull objects from peft.tuners into the peft namespace, otherwise from peft import LoraConfig would not be possible :)
I pulled initialize_lora_eva_weights and get_eva_state_dict into the peft namespace now. I asked because I only saw classes being pulled from tuners, no functions, so I was wondering if there is some rule for that.
How exactly would you do that, given that the get_peft_model call is done by the user?
I thought we could just have a check in the LoraModel.__init__ along the lines of:
low_cpu_mem_usage = init_lora_weights == "eva" or low_cpu_mem_usage
Regarding tests, I adapted some tests from sklearn for the incremental_pca class in tests/test_incremental_pca.py. Could you maybe advise me what the minimum test setup should look like for eva?
Thanks a lot for these updates to the PR.
Please always run make style after you make changes to the code so that the linter will pass.
I pulled initialize_lora_eva_weights and get_eva_state_dict into the peft namespace now. I asked because I only saw classes being pulled from tuners, no functions, so I was wondering if there is some rule for that.
Ah okay, no worries, there are no rules about that.
I thought we could just have a check in the LoraModel.__init__ along the lines of:
low_cpu_mem_usage = init_lora_weights == "eva" or low_cpu_mem_usage
I see. I think I'd rather not automatically switch this, maybe users have a specific reason why they need low_cpu_mem_usage=False despite using Eva. What you could do is give a warning if you find this, advising users that passing low_cpu_mem_usage=True is most likely what they want.
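For illustration, such a warning could be as simple as this (names, wording and placement are only a sketch):

import warnings

def warn_if_eva_without_low_cpu_mem_usage(init_lora_weights, low_cpu_mem_usage):
    if init_lora_weights == "eva" and not low_cpu_mem_usage:
        warnings.warn(
            "init_lora_weights='eva' is usually combined with low_cpu_mem_usage=True, "
            "otherwise the LoRA weights are first initialized normally and then overwritten."
        )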
Regarding tests, I adapted some tests from sklearn for the incremental_pca class in tests/test_incremental_pca.py.
I think what you have is more than enough, the test coverage for the class is very good. Let's just add a comment on where you adopted these tests from.
Could you maybe advise me what the minimum test setup should look like for eva?
This is a very good question. From my understanding, we can't really check the adjusted weights to say that after applying Eva, the weights should have this or that property. Just as an example, for PiSSA/OLoRA, we have tests that they reduce the quantization error. But I don't think we can have something similar here, right?
What we could do is just call initialize_lora_eva_weights with a couple of different options to ensure that there is no error being raised. This helps with the test coverage but it wouldn't really help with ensuring that Eva actually works as described in the paper.
Maybe we can expand the included example to replicate one of the findings from the paper. That way, we would have a high confidence that it works as promised, WDYT about that?
Alternatively, we could create small self-contained training runs in the test, using a small model and dataset. Then we would train the model once with and once without Eva. The test would be e.g. that with Eva, the model converges faster. This would be a more functional test but of course it has the danger of being flaky/seed dependent and could be slow to run.
@BenjaminBossan thanks for all your suggestions, I integrated all of it into the pull request.
Regarding testing, I added a new class TestEvaInitialization to tests_initialization.py. The tests basically run eva initialization three times for different seeds and check that the average cosine similarity between the same svd component from different seeds is above a certain threshold. This invariance to batch ordering is actually also one of the findings from our paper. I fear doing a finetuning run would probably take up too many resources / take too long for a test case.
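Conceptually, the check boils down to something like this (a simplified sketch; comparing only two seeds and the threshold value are illustrative):

import torch

def assert_seed_invariance(state_dict_a, state_dict_b, threshold=0.75):
    sims = []
    for key in state_dict_a:
        a, b = state_dict_a[key], state_dict_b[key]
        # compare matching SVD components row by row; sign flips don't matter
        sims.append(torch.nn.functional.cosine_similarity(a, b, dim=1).abs().mean())
    assert torch.stack(sims).mean() > threshold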
Overall I defined five test cases:
- the get_eva_state_dict function which returns the state_dict for eva. We use a non-peft model because testing peft models is covered with the next test
- initialize_lora_eva_weights with a LoRA model, which internally also calls get_eva_state_dict
- the __post_init__ method of EvaConfig for invalid settings
- LoraConfig for valid eva config settings
During testing I noticed a potential issue with how the loftq_config field is defined in LoraConfig. I added an issue and could also create a separate pull request for it.
Thanks a lot for all the updates. I did a more in-depth review now and found a few potential issues, please check my comments. Apart from those, could you please:
- run make style after all changes are done

    )
    module.lora_A[adapter_name].weight.copy_(w)
else:
    module.update_layer(
What would be a situation when this can occur? During the unit testing, it is not covered.
It's mostly because we don't support embedding and conv layers at the moment. So if adapters are added to those layers, we can't initialize them with eva, so the next best thing would be to do normal lora initialization by default. Do you think a separate test case is needed here?
if previous_components is not None:
    components = self.svd.components_
    if len(components.shape) == 1:
Could you add a short comment when this would happen?
done, it's only the case during the first svd step. Would you do it differently?
@BenjaminBossan I integrated all of your suggestions in the code, many thanks (for details I left comments to the separate conversations you opened about suggested changes).
Three general questions:
- Eva works out of the box with bitsandbytes. I included a gpu-dependent test in test_initialization with pytest.mark.skipif(not torch.cuda.is_available(), reason="test requires a GPU") to test bitsandbytes. Is that fine to have all EVA tests in one class in this script or would I need to move this test to test_common_gpu or test_gpu_examples?
- under examples/eva_finetuning/README.md I added sections using the [[autotest]] command to automatically use the docstrings of the eva related functions. Is that fine or does that not work for the examples section?
- To make using eva more seamless, do you think it's a good idea to just have to specify init_lora_weights="eva" and if EvaConfig is not defined / none it is initialized with the defaults (i.e. EvaConfig())?
EDIT:
I added three checks to initialize_lora_eva_weights, among them that model is a PeftModel. I also removed peft_config as an argument from the function since it's anyway an attribute of every PeftModel.
Thanks a lot for your excellent work. Not much is missing to be able to merge this.
Eva works out of the box with bitsandbytes. I included a gpu-dependent test in test_initialization with pytest.mark.skipif(not torch.cuda.is_available(), reason="test requires a GPU") to test bitsandbytes. Is that fine to have all EVA tests in one class in this script or would I need to move this test to test_common_gpu or test_gpu_examples?
For this test to be actually run on our nightly CI with GPUs, you'd have to move it, I think test_gpu_examples.py is the right place (just create a new test class there).
under examples/eva_finetuning/README.md I added sections using the [[autotest]] command to automatically use the docstrings of the eva related functions. Is that fine or does that not work for the examples section?
Hmm, not sure what you mean, I'm not familiar with autotest. Could you please elaborate?
To make using eva more seamless, do you think it's a good idea to just have to specify init_lora_weights="eva" and if EvaConfig is not defined / none it is initialized with the defaults (i.e. EvaConfig())?
I think it makes sense to do that if the default config works well enough out of the box. If we want the users to actively set the Eva parameters, it is better to make this step explicit. You can judge better than I can whether the default parameters should work well enough.
    module.update_layer(r=w.size(0), init_lora_weights=peft_config.init_lora_weights, **update_layer_kwargs)
    module.lora_A[adapter_name].weight.copy_(w)
else:
    module.update_layer(r=peft_config.r, init_lora_weights=True, **update_layer_kwargs)
    missing_eva_inits.append(name)
    print(name, type(module))
What would be a situation where this occurs? AFAICT it's not covered in tests. Also, let's remove the print statement.
Added a test for this: test_missing_eva_inits. Since EVA at the moment does not support embedding layers for example, we are doing standard lora init if such layers have adapters.
Thank you again for your suggestions @BenjaminBossan, I worked them all into the pull request.
- I moved the gpu-dependent test to test_gpu_examples.py. It was necessary to duplicate some code but I guess that's fine.
- I reworked get_indices_fn for customizing EVA. I made it a bit more generic and easier for users to understand. I renamed the callable argument in get_eva_state_dict and initialize_lora_eva_weights to prepare_model_inputs_fn now. It can be used to modify the model inputs and have access to the result in the SVD hooks. I adjusted the tests accordingly.
- I discussed the default values in EvaConfig with my coauthors. Some parameters like rho or user_label_mask have a setting that should work well for most applications but nevertheless it can be important to adjust the default settings. We were wondering if we should initialize a default EvaConfig if it's not specified but raise a warning explaining that it might be important to change the parameters in some cases. In either case I also updated the docstring of EvaConfig to mention this.
Thanks a lot for all the updates. The CI is a bit flaky at the moment, so don't worry about some failing test runs.
While reviewing the code, I found one major blocker that needs to be addressed before we can merge. The issue is that Eva redistributes the LoRA ranks, right? However, this is not reflected in the LoraConfig. Therefore, if a user saves the model and then tries to load it later, they'll get errors, e.g.:
size mismatch for base_model.model.model.layers.0.self_attn.o_proj.lora_A.default.weight: copying a param with shape torch.Size([19, 2048]) from checkpoint, the shape in current model is torch.Size([16, 2048])
So unless I'm missing something, the solution for this would be to get the new ranks for all the LoRA weights and store them in a dict mapping the module name to the rank. This dict should be used to override the rank_pattern attribute of the LoraConfig. This should allow to successfully load the model. Could you please take a look at that and also extend the tests to cover this?
Regarding the alpha values, I'm not sure if those also need to be adjusted. E.g. if I have a LoRA rank of 16 and alpha of 32, the LoRA scaling would be 32/16. If the rank is changed to 19 but alpha stays at 32, the scaling would decrease to 32/19. Probably it's not a big deal, but I wanted to mention that.
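For illustration, the rank_pattern update could be done roughly like this (rank_pattern is the existing LoraConfig attribute, a dict mapping module names to ranks; the key format of the eva state dict is an assumption here):

def update_rank_pattern(peft_config, eva_state_dict):
    for name, w in eva_state_dict.items():
        if name.endswith("lora_A.weight"):
            module_name = name.rsplit(".lora_A.weight", 1)[0]
            peft_config.rank_pattern[module_name] = w.shape[0]   # new rank = number of kept components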
I also added a section for eva under package reference, please let me know if it's okay like this. I think it would be beneficial to find info about the EVA functions in the documentation.
Yes, that should be okay. Unfortunately, the doc preview appears to be broken at the moment, but if there is anything wrong with the rendering there, it'll be easy enough to fix later.
We were wondering if we should initialize a default EvaConfig if it's not specified but raise a warning explaining that it might be important to change the parameters in some cases.
From my point of view, the current solution is good.
@BenjaminBossan thanks for raising the issue with saving/loading eva models, I didn't think about this at first. It should be fixed now. I am updating the rank_pattern in init_lora_eva_weights.
I also found another potential issue that was not accounted for so far. It is possible that after eva initialization some adapter layers could get assigned rank 0. In those cases we would need to remove the adapter layer again. We would also need to update target_modules in the peft config. I also added test cases for this and the above issue. To test the rank=0 case it was necessary to move some functionality out of init_lora_eva_weights into a new function load_eva_state_dict. Otherwise it would not have been possible to modify the eva_state_dict (i.e. assigning one layer rank 0) in the test script. init_lora_eva_weights now just combines the two lower level functions get_eva_state_dict and load_eva_state_dict.
Regarding alpha values, it should be fine as is for now. For the experiments in our paper we did some initial tests to adjust alpha for different ranks after the redistribution but it didn't lead to an improvement in performance. I saw that the AdaLoRA implementation also did not adjust alpha for different ranks, is that right? I did however add a use_alpha_pattern option in EvaConfig now which is False by default. If this is set to True, alpha values will be adjusted after redistribution to reflect the new ranks.
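For reference, one way such an adjustment could keep the scaling lora_alpha / r constant after redistribution (whether this matches the PR implementation exactly is an assumption):

def adjusted_alpha(lora_alpha, old_rank, new_rank):
    # chosen so that adjusted_alpha / new_rank == lora_alpha / old_rank
    return lora_alpha * new_rank / old_rank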
Regarding docs, I built it locally and everything looks fine.
Thanks for your continued work. The adjustments for saving and loading look good. Also, nice catch with the rank 0 issue.
I have a couple of smaller comments, but the big picture looks good.
@BenjaminBossan thanks for your thoroughness, there were still some edge cases I didn't consider.
Besides the things you pointed out I updated some things additionally (sorry that there are still so many changes in the PR, but I keep discovering potential issues for users or ways to make the api simpler for users):
- I adjusted get_eva_state_dict for the case that LoraConfig contains a rank pattern. For these cases max_components can be different for different layers, so this variable is now a dict with keys being the adapter module names.
- I removed load_eva_state_dict and integrated the functionality into initialize_lora_eva_weights. This function now accepts an optional eva_state_dict argument.
One question: When loading the eva_state_dict I need to get the module name in the base model based off the module name in the peft model. Is this a valid way to do that:
name_in_base_model = name.replace("base_model.model.", "")
Thanks for your continued work on this PR, this is really excellent work.
Regarding your latest changes, they make sense to me, thanks for explaining. I did another check and flagged some smaller issues, mainly around making this approach more flexible for custom models. Otherwise, this PR is good to be merged.
One question: When loading the eva_state_dict I need to get the module name in the base model based off the module name in the peft model. Is this a valid way to do that:
Yes, I think the way it's implemented is good. For LoRA, we will always have this prefix.
@BenjaminBossan, thanks, I integrated your suggestions and left some comments in the open threads:
- when calling update_layer is needed
- None check for dataloader
Thanks for the updates, I think we're almost good to go. Just 2 small comments from my side.
when calling update_layer is needed
Sorry, I can't find the comment, could you please paste it again? Just to summarize, my main concern is that we may call module.update_layer even if it's not necessary, namely when the module was not initialized with low_cpu_mem_usage=True (i.e. when it is on a real device, not meta, we would not need to initialize it).
@BenjaminBossan this is the discussion about update_layer.
For the other two open conversations I added a comment directly to the conversation.
I replied to your latest comments. Also, there is a small merge conflict, could you please take care of it? Thanks.
Great, thanks for your feedback, I made some small changes. I think the only open issue is the potential circular import in constants.py.
Small note: I changed the default value of rho back to 1.0 as this works well in most settings.
@sirluk Some Eva tests are failing:
https://github.com/huggingface/peft/actions/runs/11698806717/job/32754904346?pr=2142#step:6:1195
Is it possible that you tested this with a GPU and there it's fast enough, but on the CPU CI runners, it's too slow? Maybe we should remove the timing assertions, as they tend to be flaky?
Thanks for pointing this out, I was indeed testing on a gpu machine so it was automatically running everything on gpu. I removed the speed test now.
Thanks a lot for ironing out the last kinks. The PR now LGTM. Really excellent work, thanks for that and for your patience.
I hope that many users will give it a try, as it looks quite promising.
Great, thanks for your great feedback throughout the process!
Great PR!
@sirluk I just wanted to make you aware of PR #2231 which introduces a (at least superficially) similar method to EVA, which also benefited from our discussion here to improve the API. The PR includes a small table that summarizes a couple of methods including EVA, maybe you can double check that it's accurate.
@BenjaminBossan, thanks for making us aware of this recent PR. It is indeed very relevant and interesting to us as well. I'm glad our discussion benefits the integration of other data-driven methods into PEFT. We believe the table is not entirely correct with respect to EVA. From their paper, "Knowledge Maintenance mode" in CorDA simply means using the smallest components for initialization, rather than the largest ones. This is necessary for methods like CorDA and PiSSA because they modify the base model weights, meaning the base model cannot be restored after fine-tuning. We intentionally followed the design of LoRA and left the base model untouched as this allows restoring the base model's weights and therefore full knowledge preservation. We just recently added a discussion paragraph on that in the updated version of our paper which will be on arXiv soon.
(tagging @paischer101 as he is the other joint first-author of EVA)
@sirluk Thanks for the reply. Somehow I didn't get a notification for your message or missed it, which is why I didn't respond earlier.
The CorDA PR is almost finished but there is still an opportunity to ask the authors to correct the table. From your comment, it was not quite clear to me what exactly was not entirely correct. Could you perhaps post directly on that PR and mention what could be improved?
@BenjaminBossan thanks for the response. Since EVA does not alter the base model weights, it is able to recover the base model and therefore allows full knowledge maintenance, i.e. the last column in the table should be a "yes" for EVA as well. I will make a post on the PR to clarify.
Hey @BenjaminBossan , just wanted to let you know there is a mistake in the latest peft release notes describing EVA. It says that we use "SVD of the base layer weights", which is not correct. EVA is a data-driven method that performs incremental SVD on minibatches of finetuning data. The result is used to initialize LoRA and perform an adaptive rank redistribution. Could you maybe change that so that potential users are not confused?
@sirluk Thanks for notifying us. I changed the release text accordingly.
Description
In our work "One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation" (paper) we introduce a new data-driven initialization method for LoRA adapters. We will present this work at ENLSP (oral) and AFM at NeurIPS'24.
Eva initializes LoRA in a data-driven manner based on information of the downstream data using SVD. It also adaptively allocates ranks throughout the model.
Implementation Details
- A new EvaConfig which at minimum needs to receive a value for the dataloader argument. Other parameters have default values which worked well in our experiments. The instantiated EvaConfig is passed as a parameter to LoraConfig (we took inspiration from the LoftQConfig implementation).
- In LoraConfig, "init_lora_weights" needs to be set to "eva".
- We adapted the __init__ method of LoraModel to create a state dict for eva initialization before initializing the parent class, to be able to populate the rank_pattern argument in LoraConfig. This way we directly initialize with a modified rank distribution, allowing us to directly load the eva_state_dict.
Open Questions
- We are not sure if self.model.load_state_dict(eva_state_dict, strict=False) is always a valid way to initialize, e.g. when the model is wrapped with torch.compile.
Example Usage
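A sketch reconstructing the usage implied by the description above (the API changed during the review, see the discussion; the argument names follow the description and may not match the final implementation):

from peft import EvaConfig, LoraConfig, get_peft_model

base_model = ...
dataloader = ...   # finetuning data used for the SVD passes

eva_config = EvaConfig(dataloader=dataloader)
lora_config = LoraConfig(
    r=16,
    target_modules=["q_proj", "v_proj"],
    init_lora_weights="eva",
    eva_config=eva_config,
)
# with this setup, the eva initialization runs during model creation
model = get_peft_model(base_model, lora_config)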