peft
[FEAT] New LoRA Initialization Method: Explained Variance Adaptation
#2142
Merged


BenjaminBossan merged 65 commits into huggingface:main from sirluk:main
sirluk
sirluk251 days ago👍 3

Description

In our work "One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation" (paper) we introduce a new data-driven initialization method for LoRA adapters. We will present this work at ENLSP (oral) and AFM at NeurIPS'24.

Eva initializes LoRA weights in a data-driven manner by performing SVD on activations of the downstream data, and it adaptively allocates ranks across the layers of the model.
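As a rough illustration of the idea (a minimal sketch with made-up shapes and rank, not the code from this PR): the top right-singular vectors of a layer's input activations become the initialization for the LoRA A matrix, while B starts at zero so the adapter output is zero at initialization.

import torch

in_features, out_features, r = 1024, 1024, 16
X = torch.randn(256, in_features)        # input activations collected for one target layer
_, _, Vh = torch.linalg.svd(X, full_matrices=False)
lora_A = Vh[:r]                          # (r, in_features): top right-singular vectors
lora_B = torch.zeros(out_features, r)    # zero init, so the adapter starts as a no-op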

Implementation Details

  1. As the initialization is data-driven, we created a new config class EvaConfig, which at minimum needs to receive a value for the dataloader argument. The other parameters have default values which worked well in our experiments. The instantiated EvaConfig is passed as a parameter to LoraConfig (we took inspiration from the LoftQConfig implementation).
    In LoraConfig, "init_lora_weights" needs to be set to "eva".
  2. We modify the __init__ method of LoraModel to create a state dict for eva initialization before initializing the parent class, so that we can populate the rank_pattern argument in LoraConfig. This way we directly initialize with a modified rank distribution, allowing us to load the eva_state_dict directly.
  3. All other code necessary for the initialization is self-contained in peft/tuners/lora/eva.py
  4. Under examples/eva_finetuning/eva_finetuning.py we added an example script for eva initialization

Open Questions

  • Not sure if self.model.load_state_dict(eva_state_dict, strict=False) is always a valid way to initialize, e.g. when the model is wrapped with torch.compile.
  • We have currently only tested a single-GPU setting. We are not sure what the best way is to add multi-GPU support (only run the init on rank 0 and copy to the other ranks when done?).

Example Usage

eva_config = EvaConfig(
    dataloader = dataloader
)
peft_config = LoraConfig(
    r = 16,
    lora_alpha = 1,
    target_modules = ['q_proj', 'k_proj', 'v_proj', 'o_proj'],
    init_lora_weights = "eva",
    eva_config = eva_config,
)
peft_model = get_peft_model(model, peft_config)
BenjaminBossan
BenjaminBossan251 days ago

Thanks a lot for this PR. I only skimmed it for now and have yet to read the details from the paper but this looks quite promising!

When looking at the implementation, I saw that the way the Eva weights are initialized sits a bit "outside" of how we normally do this in PEFT. The normal way is to have this on the layer level, check for instance here:

# for inits that require access to the base weight, use gather_param_ctx so that the weight is gathered when using DeepSpeed
if isinstance(init_lora_weights, str) and init_lora_weights.startswith("pissa"):
    with gather_params_ctx(self.get_base_layer().weight):
        self.pissa_init(adapter_name, init_lora_weights)
elif isinstance(init_lora_weights, str) and init_lora_weights.lower() == "olora":
    with gather_params_ctx(self.get_base_layer().weight):
        self.olora_init(adapter_name)
elif init_lora_weights == "loftq":
    with gather_params_ctx(self.get_base_layer().weight):
        self.loftq_init(adapter_name)
elif init_lora_weights:
    self.reset_lora_parameters(adapter_name, init_lora_weights)

My guess why this is implemented differently here is that there is currently one pass through the modules that determines the individual ranks and creates the state dict. If this were refactored so that the weights are initialized on the layer level as usual, we would need two passes: one to adjust the ranks in the config, and then the actual weight initialization on the layer level. Would it be possible to refactor the code to enable that?

My main concern with the current approach is that weight initialization currently happens in one place in the code, but now there would be another place, making the code more disorganized. Second, IIUC, we would initialize the weights once and then override them with Eva parameters, which is wasteful. Finally, if we do this on the layer level, that could solve the multi-GPU issue, right?

Apart from this, what I noticed while skimming is that there is a big class IncrementalPCA which looks like a rewrite of the equivalent sklearn class using torch. My question is: Did you implement this from scratch or is it taken from somewhere? This has implications on how well this is tested etc.

I also had difficulties understanding the SVDHook and HashHook, could you please add a small docstring? Are these hooks permanently added?

Regarding torch.compile, we could add a test for that and see if it works. But this is not a necessity for this PR to land in PEFT, just something that's nice to have.

Going forward, we'll also have to add a few unit tests, but I'm fine with iterating on the implementation first and then adding the tests.

sirluk
sirluk250 days ago (edited 250 days ago)

Thanks a lot for the quick feedback @BenjaminBossan !

I added docstrings to SVDHook and HashHook, hope this makes their purpose a bit clearer.

Regarding your other concerns about the implementation:

  • I agree that it's not ideal that our proposed init method does not work on the layer level like the other inits. I thought a lot about where in the code to put the init, and unfortunately I think it would be quite difficult to put it directly in the layer (or at least I haven't found an elegant way to do it). The way eva is initialized is that we do multiple forward passes through the model and incrementally perform SVD on the input activations of the target layers (usually more than 10 forward passes). The resulting singular vectors are used to initialize the LoRA A matrices, so unfortunately 2 passes would not be enough either. We would also need to somehow pass an instance of EvaConfig to the LoraLayer. (A rough sketch of this flow is included at the end of this comment.)
    What we could maybe do is create the eva_state_dict as it is created now and then somehow pass the state dict to LoraModel._create_and_replace and then the relevant tensors to the update_layer methods of the individual layers.
    I did also add a statement in the update_layer method to initialize the LoRA B matrices with zeros, as needed for eva. For this reason we do not initialize weights twice, as the A matrices are not initialized in the update_layer method.

  • Regarding the new eva code: IncrementalPCA is indeed largely taken from the sklearn class, so a lot of the code is directly copied from there. What we added is seamless gpu support by setting the device for all internal variables / buffers to the device of the first input tensor. We also integrated support for torch.svd_lowrank to significantly speed up computations. We ran several internal tests to check that the results are equivalent to running the sklearn implementation on cpu.
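As referenced above, a rough sketch of that flow (assumption-heavy pseudocode rather than the actual code in peft/tuners/lora/eva.py; it assumes a model, a dataloader and a simple name-based module filter):

import torch

class ActivationSVDHook:
    """Collects a layer's input activations and keeps the current top right-singular vectors."""

    def __init__(self, rank):
        self.rank = rank
        self.components = None

    def __call__(self, module, inputs, output):
        x = inputs[0].detach().reshape(-1, inputs[0].shape[-1])
        # placeholder for the incremental update; the PR uses an IncrementalPCA class for this
        _, _, Vh = torch.linalg.svd(x, full_matrices=False)
        self.components = Vh[: self.rank]

hooks = {}
for name, module in model.named_modules():
    if name.endswith(("q_proj", "k_proj", "v_proj", "o_proj")):
        hook = ActivationSVDHook(rank=16)
        hooks[name] = (hook, module.register_forward_hook(hook))

for batch in dataloader:  # usually more than 10 forward passes until the components converge
    model(**{k: v for k, v in batch.items() if k != "labels"})

for hook, handle in hooks.values():
    handle.remove()  # the collected components then initialize the lora_A matrices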

BenjaminBossan
BenjaminBossan250 days ago

Thanks for the further explanations. Let me discuss the points separately.

Initialization

I see the issue with the cycling through the data. This is indeed tricky but I think it's worth putting some time into considering alternative solutions.

Right now, the dataloader is added to the config. We should really try to avoid this. The config should be a small, json-serializable object. This means it can't really contain any reference to the data. Moreover, there are so many unknowns about the data and how the model is called that I don't think we should have code like:

    inputs = {k: v.to(device) for k, v in next(iter(dataloader)).items() if k != "labels"}
    model(**inputs)

Let me throw out an alternative: Let's say that we had a function called initialize_lora_eva_weights or something like that. The API would be:

# normal PEFT code
base_model = ...
lora_config = LoraConfig(...)
model = get_peft_model(base_model, lora_config)

# new eva code
eva_config = EvaConfig(...)
sample_data = ...  # user needs to define this instead of relying on data loader

def forward_fn(model):
    return model(**sample_data)

model = initialize_lora_eva_weights(model, config=eva_config, forward_fn=forward_fn, adapter_name="default")

This would make it so that Eva is a function that is applied to the LoRA model after it was initialized. This avoids the issues I mentioned above.

The disadvantages I see with this approach:

  1. Eva initialization is a separate step. Therefore, packages that want to support it have to explicitly account for it.
  2. We would initialize the weights twice (but it's not really worse than right now).

Do you think this would work with the Eva algorithm or is there something I'm missing? LMK what you think.

IncrementalPCA

I wonder if we could use some of the sklearn unit tests and add them here to ensure that the IncrementalPCA produces the correct results. That way, we have the safety of knowing it works as expected without too much effort of adding our own unit tests.

Of course, this is all a bit outside of scope for PEFT, but if there is no external package that provides the functionality, it would be okay to add it to PEFT.

sirluk
sirluk250 days ago

Implementation

I was wondering already if having the dataloader in the config is a good idea, since you probably only want primitive types there. I think your suggested solution is a sensible one. I gave it some thought and don't really have a better suggestion given the restrictions we discussed.

A few questions for clarification:

  • Would eva still be a valid value for init_lora_weights in LoraConfig? I am a bit worried about the discoverability of eva if LoraConfig then only contains normal lora settings and someone needs to know about the function initialize_lora_eva_weights to use eva.
  • In your suggested solution, would sample data not be a dataloader?
  • Is it necessary to define a custom forward function for the eva init? Usually the inputs should be the same ones the model receives during normal finetuning anyway. I did adapt the line you mentioned, filtering out the label key from the inputs dict.
  • One question about how we get the layer activations for SVD. In the forward hook SVDHook we currently have it implemented like this:
try:
    states = input.detach()
except AttributeError:
    states = input[0].detach()

So for cases where the input is a tuple, we just take the first value. Not super important at this stage I guess, but I just wanted to ask if you think we should add an argument so that the user can define which inputs we should use.

IncrementalPCA

It's a good idea to take some tests from sklearn. All tests for this class are in this file. Right now there is nothing like this available anywhere. I built my own package torch-incremental-pca, but I just assumed you don't want dependencies on small packages like this, so I just copied the class over here. I did however create a pull request to have this feature implemented in pytorch directly so in the future we might be able to remove this from peft again.

BenjaminBossan
BenjaminBossan250 days ago

I was wondering already if having the dataloader in the config is a good idea, since you probably only want primitive types there. I think your suggested solution is a sensible one. I gave it some thought and don't really have a better suggestion given the restrictions we discussed.

Okay, then let's go with this general approach. Maybe we have some ideas for improvements along the way, then we can still pivot.

  • Would eva still be a valid value for init_lora_weights in LoraConfig? I am a bit worried about the discoverability of eva if LoraConfig then only contains normal lora settings and someone needs to know about the function initialize_lora_eva_weights to use eva.

Yes, let's keep it as an option. By itself it wouldn't do anything, but it helps with discoverability. We could also consider setting some kind of flag: when users want to use the model with Eva init but haven't yet called the function we discussed, the flag could be checked and an error/warning raised. Moreover, with this init option being set, we can signal to other packages that they need to call the function (e.g. trl could automatically call this in their SFTTrainer).

  • In your suggested solution, would sample data not be a dataloader?

Exactly, I don't think we absolutely need a dataloader, do we? AFAICT, we only need it to get a batch of data. If it needs to be a different batch each time, we could do:

from itertools import cycle

iter_dl = iter(cycle(dataloader))  # optionally tqdm

def forward_fn(model):
    sample_data = next(iter_dl)
    return model(**sample_data)

Is it necessary to define a custom forward function for the eva init? Usually the inputs should be the same ones the model receives during normal finetuning anyway. I did adapt the line you mentioned, filtering out the label key from the inputs dict.

My reasoning for this is that there are too many possibilities for how the model could be called; we can't cover all of them. The assumption that we can just filter out labels and then call model(**inputs) might mostly work for standard HF LLM architectures, but this method should also work with other model types, right? Let's keep the possibilities open.

But thinking more about it, I'd say we can keep your existing API of passing the dataloader and offer the option of a custom forward_fn on top. That way, there is a more convenient path for standard models and a more flexible, advanced path for all the others.
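For standard models, the convenient default could look roughly like this (just a sketch, assuming HF-style batch dicts; the actual default is up to the implementation):

def default_forward_fn(model, inputs):
    # drop the labels, since only the forward activations are needed for the SVD
    inputs = {k: v for k, v in inputs.items() if k != "labels"}
    return model(**inputs)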

  • One question about how we get the layer activations for SVD.

I'd definitely rather have an instance check than a try ... except for this. But I'm not sure if that's what you were asking.
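For example, something along these lines (a quick sketch of the hook's input handling):

if isinstance(input, torch.Tensor):
    states = input.detach()
elif isinstance(input, (tuple, list)):
    states = input[0].detach()
else:
    raise ValueError(f"unsupported input type: {type(input)}")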

I just assumed you don't want dependencies on small packages like this, so I just copied the class over here.

That's correct. We could make it an optional dependency, as it is only needed for Eva. But we could also copy the code over to PEFT ("vendoring"), which is similar to what we did with BufferDict. What I would like to see: if you copy some of the sklearn tests to your package and ensure that they pass, we wouldn't need them in PEFT too; we could just say "check this URL for the original code". This way, there is a bit less clutter.

I did however create a pull request to have this feature implemented in pytorch directly so in the future we might be able to remove this from peft again.

Nice. Of course, even if it lands, we can't use it immediately as we want to support older PyTorch versions. But definitely add a comment with a link to the PR. I would imagine, however, that PyTorch won't be too keen on the sklearn API, but let's see.

sirluk
sirluk249 days ago

@BenjaminBossan before going ahead with the implementation of calling a function/method after get_peft_model, I just wanted to propose one alternative idea to get your thoughts on it.

Instead of using LoraModel, we create a subclass EvaModel. This model class initially would not have any adapter layers, but we would add the forward hooks needed for the SVD computations. In the __init__ we would have something like:

class EvaModel(LoraModel):

    def __init__(self, model, config, adapter_name):
        self.model = model
        for layer in config.target_modules:
            module = get_module(self.model, layer)  # placeholder helper to fetch a submodule by name
            module.register_forward_hook(SVDHook(config))

During the first few forward passes of finetuning we would not actually update any weights but would use the activations to compute the eva state dict. Once SVD has converged for all layers, we would switch the internal state of the model to finetuning (with some boolean variable) and only at that point add the adapters with the appropriate eva values for the lora A matrices.

Hope this makes sense. That way we could leverage the finetuning loop directly. It seems a bit more seamless and works without needing any additional function calls after get_peft_model. But of course there might be some hidden issues I am not considering at the moment.

BenjaminBossan
BenjaminBossan249 days ago👍 1

I like the proposal to seamlessly integrate this into the training loop. I'm not sure if we need a separate class for that, but you could come up with a prototype and I can check if there is a good way to integrate it without a separate class.

With this, we would need to make sure that it works with vanilla training loops but also frameworks like Trainer from transformers. If it does, I think this could be the optimal solution.

sirluk
sirluk247 days ago (edited 247 days ago)

@BenjaminBossan I started implementing this approach of using the first few batches for the initialization and unfortunately I realised that there are at least 2 major issues with it:

  • The LoRA layers must be initialized at the beginning of training so that they can be passed to the optimizer. This is especially important if it needs to work out of the box with the HuggingFace Trainer. However, this means that the LoRA weights would unnecessarily take up GPU memory that could be used for the SVD computation. This is, by the way, also an issue with the solution you proposed of doing a normal LoRA init first and then calling a function to initialize eva.
  • Further, if you use a learning rate schedule, it would already adjust the learning rate while you are still executing the SVD steps. This means that when you start fine-tuning, the learning rate differs from the intended initial learning rate, which I would assume is especially problematic if there is a warmup.

These issues make me wonder if some variant of my initially proposed solution would be feasible (computing the eva state dict before initializing the adapters).

One idea I would have which avoids passing a dataloader or something similar through the config would be to enable the get_peft_model function to accept some optional keyword arguments through **kwargs and pass the dataloader as well as a model forward_fn this way.

def get_peft_model(
    model: PreTrainedModel,
    peft_config: PeftConfig,
    adapter_name: str = "default",
    mixed: bool = False,
    autocast_adapter_dtype: bool = True,
    revision: Optional[str] = None,
    **model_init_kwargs,
):
    ...
    PeftModel(model, peft_config, adapter_name=adapter_name, autocast_adapter_dtype=autocast_adapter_dtype, **model_init_kwargs)
    ...



peft_model = get_peft_model(model, peft_config, eva_dataloader=dataloader, eva_forward_fn=eva_forward_fn)

We also wouldn't have to directly modify the __init__ method of LoraModel; defining a _pre_injection_hook for LoraModel should also do the trick.
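A rough sketch of what that could look like (hypothetical; it assumes the _pre_injection_hook from BaseTuner runs on the base model before the adapter layers are injected, and the helper and attribute names are placeholders):

class LoraModel(BaseTuner):
    def _pre_injection_hook(self, model, config, adapter_name):
        # hypothetical: compute the eva state dict from the bare base model and store it,
        # so the rank distribution and weights can be applied during layer injection
        if getattr(config, "init_lora_weights", None) == "eva":
            self._eva_state_dict = get_eva_state_dict(model, config, **self._model_init_kwargs)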

Let me know what you think about the issues I mentioned with the previously proposed approaches. If my suggested approach is not feasible, I am fine with going ahead with the implementation you suggested with an initialize_lora_eva_weights function (this will unfortunately also suffer from the downside mentioned above that the already initialized lora weights will take up gpu memory that could potentially be useful for the svd computation, which we might want to run with a higher batch size than finetuning).

BenjaminBossan
BenjaminBossan246 days ago (edited 245 days ago)

I started implementing this approach of using the first few batches for the initialization and unfortunately I realised that there are at least 2 major issues with it:

Good points. There would probably be ways to work around this or accept these shortfalls, but I agree that this is not ideal.

One idea I would have which avoids passing a dataloader or something similar through the config would be to enable the get_peft_model function to accept some optional keyword arguments through **kwargs and pass the dataloader as well as a model forward_fn this way.

I'm a bit reluctant to add more options to get_peft_model, especially when they're only used in a very narrow use case (e.g. this option only works for LoRA but get_peft_model is the entry point for all PEFT methods). Therefore, I tend to prefer the option of having a dedicated function.

To avoid initializing the LoRA weights through get_peft_model, I have a proposal. We recently added an option low_cpu_mem_usage=True to PEFT, which results in the PEFT weights being initialized on the meta device. This option has not been propagated to get_peft_model because until now, there was no reason to have it there. However, with Eva, it would be useful. So we could add the option there and then just pass it along to PeftModel in this line, which already accepts that argument. That way, we avoid initializing the LoRA weights. WDYT about that?
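In terms of usage, that would look roughly like this (a sketch; this is essentially the pattern the example script later adopts):

peft_model = get_peft_model(model, peft_config, low_cpu_mem_usage=True)  # adapter weights on meta device
# ... the Eva initialization function then materializes and fills in the adapter weights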

sirluk
sirluk245 days ago

@BenjaminBossan thanks for your suggestions, I implemented a prototype solution now for the discussed approach. A few questions:

  • I moved the IncrementalPCA class, which has been adapted from sklearn, to peft.utils.incremental_pca.py because I thought it is not strictly part of eva and in theory could be used for other things. Does that make sense?
  • Ideally users should be able to do from peft import initialize_lora_eva_weights. Would it be suitable to put this function in peft.utils, or should it be in peft.tuners.lora? I didn't see any other functions being pulled directly into the peft namespace from the tuners directory.
  • I was wondering if we should use low_cpu_mem_usage=True as a default for eva. That way we wouldn't even need to have it as an argument for get_peft_model.
BenjaminBossan
BenjaminBossan commented on 2024-10-16
BenjaminBossan245 days ago

I implemented a prototype solution now for the discussed approach.

Thanks a lot. I think the API is decent and given the trade-offs we discussed, probably a good compromise.

  • I moved the IncrementalPCA class, which has been adapted from sklearn, to peft.utils.incremental_pca.py because I thought it is not strictly part of eva and in theory could be used for other things. Does that make sense?

Yes, that's fine. In the end, I think it doesn't really matter. What's important is that it is not in the peft namespace (i.e. from peft import IncrementalPCA should not be possible), apart from that it doesn't matter too much. We can always move it later.

  • Ideally users should be able to do from peft import initialize_lora_eva_weights. Would it be suitable to put this function in peft.utils, or should it be in peft.tuners.lora? I didn't see any other functions being pulled directly into the peft namespace from the tuners directory.

We do pull objects from peft.tuners into the peft namespace, otherwise from peft import LoraConfig would not be possible :) As for initialize_lora_eva_weights: right now this only works for LoRA and it's not planned to change that, right? Therefore, I would keep it in the tuners/lora directory. There, import it into the __init__, then from there into the __init__ of tuners/, and from there into the __init__ of peft.
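In other words, roughly this chain of re-exports (a sketch, not the exact final lines):

# src/peft/tuners/lora/__init__.py
from .eva import initialize_lora_eva_weights

# src/peft/tuners/__init__.py
from .lora import LoraConfig, LoraModel, initialize_lora_eva_weights

# src/peft/__init__.py
from .tuners import initialize_lora_eva_weights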

  • I was wondering if we should use low_cpu_mem_usage=True as a default for eva. That way we wouldn't even need to have it as an argument for get_peft_model.

How exactly would you do that, given that the get_peft_model is done by the user?

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/config.py
elif self.init_lora_weights == "eva":
    if self.eva_config is None:
        raise ValueError("`eva_config` must be specified when `init_lora_weights` is 'eva'.")
BenjaminBossan245 days ago👍 1

Let's also check the reverse, i.e. if self.init_lora_weights != "eva", there should be no eva_config.
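i.e. something along these lines in __post_init__ (a sketch; whether to warn or raise is discussed below, and the check that was eventually added shows up in a later diff):

elif self.init_lora_weights != "eva" and self.eva_config is not None:
    warnings.warn("`eva_config` specified but will be ignored when `init_lora_weights` is not 'eva'.")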

sirluk
sirluk241 days ago

@BenjaminBossan thanks again for your feedback!

We do pull objects from peft.tuners into the peft namespace, otherwise from peft import LoraConfig would not be possible :)

I pulled initialize_lora_eva_weights and get_eva_state_dict into the peft namespace now. I asked because I only saw classes being pulled from tuners, no functions so I was wondering if there is some rule for that.

How exactly would you do that, given that the get_peft_model is done by the user?

I thought we could just not have a check in the LoraModel.__init__

low_cpu_mem_usage = (init_lora_weights == "eva") or low_cpu_mem_usage

Regarding tests I adapted some tests from sklearn for the incremental_pca class in tests/test_incremental_pca.py. Could you maybe advise me on what the minimum test setup should look like for eva?

lukash-jku initial commit
ec630939
lukash-jku add target modules based on eva_state_dict
dc26e7e5
lukash-jku remove default args
70171c8a
lukash-jku cleanup
f537b8ed
lukash-jku set correct prefix for state dict loading
a3ae4bae
lukash-jku revert
dd5a38b1
sirluk update docstrings, minor changes
2cfdf0c9
sirluk update docstrings, integrate peft changes
f3f89c6b
sirluk fix issues related to config having either type PeftConfig or dict
b71b4222
sirluk move state_dict modification to function
571eca8b
sirluk update paper link
bec1f80d
sirluk remove comments
fc343675
sirluk add documentation
0e8d29ce
sirluk add docstrings
c41e8204
sirluk update
eff0c9da
sirluk fix docs and add default evaconfig
1c912726
sirluk add eva init to peft namespace
57799713
sirluk add test for lowrank argument
b5968d29
sirluk add check
a397418f
sirluk simplify arguments
57b72b47
sirluk update docstrings
073f0a1e
sirluk optimize indices calc
6d0e0d7d
sirluk force pushed from 2779bf34 to 6d0e0d7d 240 days ago
BenjaminBossan
BenjaminBossan requested changes on 2024-10-21
BenjaminBossan239 days ago

Thanks a lot for these updates to the PR.

Please always run make style after you made changes to the code so that the linter will pass.

I pulled initialize_lora_eva_weights and get_eva_state_dict into the peft namespace now. I asked because I only saw classes being pulled from tuners, no functions so I was wondering if there is some rule for that.

Ah okay, no worries, there are no rules about that.

I thought we could just not have a check in the LoraModel.__init__

low_cpu_mem_usage = (init_lora_weights == "eva") or low_cpu_mem_usage

I see. I think I'd rather not automatically switch this, maybe users have a specific reason why they need low_cpu_mem_usage=False despite using Eva. What you could do is give a warning if you find this, advising users that passing low_cpu_mem_usage=True is most likely what they want.

Regarding tests I adapted some tests from sklearn for the incremental_pca class in tests/test_incremental_pca.py.

I think what you have is more than enough, the test coverage for the class is very good. Let's just add a comment on where you adopted these tests from.

Could you maybe advise me on what the minimum test setup should look like for eva?

This is a very good question. From my understanding, we can't really check the adjusted weights to say that after applying Eva, the weights should have this or that property. Just as an example, for PiSSA/OLoRA, we have tests that they reduce the quantization error. But I don't think we can have something similar here, right?

What we could do is just call initialize_lora_eva_weights with a couple of different options to ensure that no error is raised. This helps with the test coverage, but it wouldn't really help with ensuring that Eva actually works as described in the paper.

Maybe we can expand the included example to replicate one of the findings from the paper. That way, we would have a high confidence that it works as promised, WDYT about that?

Alternatively, we could create small self-contained training runs in the test, using a small model and dataset. Then we would train the model once with and once without Eva. The test would be e.g. that with Eva, the model converges faster. This would be a more functional test but of course it has the danger of being flaky/seed dependent and could be slow to run.

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/config.py
elif self.init_lora_weights == "eva" and self.eva_config is None:
    raise ValueError("`eva_config` must be specified when `init_lora_weights` is 'eva'.")
elif self.init_lora_weights != "eva" and self.eva_config is not None:
    warnings.warn("`eva_config` specified but will be ignored when `init_lora_weights` is not 'eva'.")
BenjaminBossan239 days ago

Would it not be better to raise an error here or are there valid use cases for this?

sirluk239 days ago

I thought that maybe when someone is comparing multiple lora init methods, they can just keep the eva_config inside the lora_config and only need to change the init_lora_weights parameter.

BenjaminBossan239 days ago

I see, I guess that could happen. Then I'm fine with leaving it as is.

Conversation is marked as resolved
Show resolved
docs/source/developer_guides/lora.md
75 target_modules = ['q_proj', 'k_proj', 'v_proj', 'o_proj'],
76 init_lora_weights = "eva",
77)
78
peft_model = get_peft_model(model, peft_config, low_cpu_mem_usage=True)
BenjaminBossan239 days ago

Let's mention quickly in the text that users should pass low_cpu_mem_usage=True as an optimization.

Conversation is marked as resolved
Show resolved
examples/eva_finetuning/eva_finetuning.py
1
import torch
BenjaminBossan239 days ago

Let's add the license notice to the top. Also, a short README.md would be helpful for users to understand this example.

Conversation is marked as resolved
Show resolved
examples/eva_finetuning/eva_finetuning.py
55)
56peft_model = get_peft_model(model, peft_config, low_cpu_mem_usage=True)
57
58
initialize_lora_eva_weights(peft_model, peft_config, dataloader, device=svd_device)
BenjaminBossan239 days ago

From here on out, the user should continue with normal LoRA fine-tuning, right? Let's add a comment here to clarify this.

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
381 The dataloader to use for the forward pass.
382 forward_fn: callable
383 The forward function to use for the forward pass.
384
```model(**inputs)``` is used if forward_fn is not provided.
BenjaminBossan239 days ago
Suggested change
```model(**inputs)``` is used if forward_fn is not provided.
`model(**inputs)` is used if forward_fn is not provided.
Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
411 if device is not None:
412 recursive_apply(model, device)
413
414
model.disable_adapter()
BenjaminBossan239 days ago

This can be removed, right?

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
359 model,
360 peft_config,
361 dataloader,
362
forward_fn: callable = lambda model, inputs: model(**inputs),
BenjaminBossan239 days ago

Let's make this optional, and if None is passed, define a local function and use that for forward_fn. Let's avoid assigning a lambda function.

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
392 adapter_name: (str)
393 The name of the adapter to initialize the weights for.
394 prepare_activations_fn: (Union[callable, Dict[str, callable], None])
395
If a layer recieves multiple inputs as a list, per default the first input is used.
BenjaminBossan239 days ago
Suggested change
If a layer recieves multiple inputs as a list, per default the first input is used.
If a layer receives multiple inputs as a list, per default the first input is used.
Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
1
import torch
BenjaminBossan239 days ago

Let's add the license notice.

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
300 layer_hook_map = {**dict(zip(hooks.keys(), hooks.keys())), **equal_inputs_map}
301
302 # start svd calculation
303
pbar = tqdm(iter(cycle(dataloader)), position=0, leave=False)
BenjaminBossan239 days ago

Just wondering: Should we make the progress bar optional, e.g. via an extra argument? Maybe some users don't want it.

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
311
312 for name, hook in hooks.items():
313 module = model.get_submodule(name)
314
module._forward_hooks.clear()
BenjaminBossan239 days ago

Hmm, should we call hook.remove() instead? What if there are other, unrelated forward hooks on the model?

sirluk239 days ago

that's true, I implemented logic now to store the active hook handles. Calling hook.remove() directly doesn't work; we need to call handle.remove().

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
265 device = get_device_with_meta_params(model)
266 model.eval()
267
268
hooks = {}
BenjaminBossan239 days ago

Do we need a check at the end that all the hooks have been cleaned up or is that not necessary?

sirluk239 days ago

good idea! I implemented a check now

    # check all svd hooks have been removed
    remaining_svd_hooks = {n for n, m in model.named_modules() for v in m._forward_hooks.values() if isinstance(v, SVDHook)}
    if len(remaining_svd_hooks) > 0:
        raise ValueError(f"All SVD hooks should have been removed but the following layers still have active hooks: {remaining_svd_hooks}")
Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
161 """
162 Get the device of the model's parameters. Useful if some parameters are on meta device.
163 """
164
devices = list(set([p.device for p in model.parameters() if str(p.device) != "meta"]))
BenjaminBossan239 days ago

You can use a set comprehension: {p.device for ...}.

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/config.py
91 device: str = field(default="cuda", metadata={"help": "Device to use for EVA initialization"})
92
93 def __post_init__(self):
94
assert self.rho >= 1.0, "early_stop_rho must be >= 1"
BenjaminBossan239 days ago

Let's completely avoid all asserts except for tests. Instead, raise a ValueError here. Same for the other asserts used below.

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/config.py
72@dataclass
73class EvaConfig:
74 """
75
This is the sub-configuration class to store the configuration for a data-drive initialization via EVA.
BenjaminBossan239 days ago

Let's add a link to the paper here as well. Also, what's always nice to have for users is to give some indication when/how to change the parameters. E.g. "increase rho if ...".

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/config.py
88 use_label_mask: bool = field(default=False, metadata={"help": "Use label mask for EVA initialization"})
89 label_mask_value: int = field(default=-100, metadata={"help": "if use_label_mask=True the value to look for to mask out ignored tokens"})
90 whiten: bool = field(default=False, metadata={"help": "Apply whitening to singular vectors"})
91
device: str = field(default="cuda", metadata={"help": "Device to use for EVA initialization"})
BenjaminBossan239 days ago

Do we need this? I don't think so.

sirluk Update src/peft/tuners/lora/eva.py
3574109f
sirluk add warning
91e87e6f
sirluk update
a133e8a1
sirluk add check if all hooks have been removed
523f1cda
sirluk extend documentation
261c81e6
sirluk make style
566ebf04
sirluk add tests for eva
e940efab
sirluk
sirluk238 days ago (edited 238 days ago)

@BenjaminBossan thanks for all your suggestions, I integrated all of it into the pull request.

Regarding testing, I added a new class TestEvaInitialization to tests/test_initialization.py. The tests basically run eva initialization three times with different seeds and check that the average cosine similarity between the same svd component from different seeds is above a certain threshold. This invariance to batch ordering is actually also one of the findings of our paper. I fear a finetuning run would probably take up too many resources / take too long for a test case.
Overall I defined five test cases (a condensed sketch of the consistency check follows after the list):

  1. Check the get_eva_state_dict function which returns the state_dict for eva. We use a non-peft model because testing peft models is covered with the next test
  2. Check initialize_lora_eva_weights with a LoRA model which internally also calls get_eva_state_dict
  3. Check the __post_init__ method of EvaConfig for invalid settings.
  4. and 5. check LoraConfig for valid eva config settings
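As mentioned above, a condensed sketch of the consistency check behind the first two test cases (threshold and tensor layout simplified; the real tests live in tests/test_initialization.py):

import itertools
import torch

def assert_seed_consistency(state_dicts, threshold=0.75):
    # state_dicts: eva state dicts computed with different seeds / batch orderings
    for sd_a, sd_b in itertools.combinations(state_dicts, 2):
        for name in sd_a:
            cos = torch.nn.functional.cosine_similarity(sd_a[name], sd_b[name], dim=-1)
            assert cos.abs().mean() > threshold, f"{name}: SVD components differ across seeds"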

During testing I noticed a potential issue with how the loftq_config field is defined in LoraConfig. I added an issue and could also create a separate pull request for it.

sirluk add licence notice
ba82bd51
sirluk add tests for lora_config with eva
202f933a
BenjaminBossan
BenjaminBossan requested changes on 2024-10-23
BenjaminBossan237 days ago

Thanks a lot for all the updates. I did a more in-depth review now and found a few potential issues, please check my comments. Apart from those, could you please:

  1. Always run make style after all changes are done
  2. Resolve the merge conflict.
  3. Clarify: Should it work with quantization like bnb? I think so but maybe it can be made explicit or even a small test be added.
Conversation is marked as resolved
Show resolved
examples/eva_finetuning/README.md
1# EVA: Efficient Vector Adaptation
2
## Introduction ([Paper](https://arxiv.org/abs/2404.02948), [code](https://github.com/GraphPKU/EVA))
BenjaminBossan237 days ago

The code link appears to be invalid (private repo maybe?), but I don't think we need it at all, since the code will be in PEFT once this is merged.

sirluk235 days ago

thanks, fixed the link! I would actually leave the code link in; we have a lot of scripts there which allow people to reproduce the results from the paper.

Conversation is marked as resolved
Show resolved
examples/eva_finetuning/README.md
48When `initialize_lora_eva_weights` is called, it will load the components calculated by EVA into the model. After this continue with standard LoRA finetuning.
49
50## Getting eva_state_dict without loading the adapter weights
51
If you want to get the eva_state_dict without loading the adapter weights, you can do the following:
BenjaminBossan237 days ago

Could you please add a sentence on why users may want to get this?

sirluk235 days ago

done

Conversation is marked as resolved
Show resolved
examples/eva_finetuning/README.md
93## Citation
94In case you find our work useful, please consider citing it.
95
96
@article{paischer2024eva,
BenjaminBossan237 days ago

Let's enclose this in ``` to make it more readable.

Conversation is marked as resolved
Show resolved
examples/eva_finetuning/README.md
64# load dataset
65dataset = load_dataset("Rowan/hellaswag")
66dataset = dataset.map(
67
lambda x: tokenizer(x['ctx'], padding='max_length', truncation=True, max_length=max_seq_len),
BenjaminBossan237 days ago

A couple of values like max_seq_len, rank etc. are not defined here. Let's define them, at the top maybe, so that people can actually copy+paste the code and it works.

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
178 """
179 Get the device of the model's parameters. Useful if some parameters are on meta device.
180 """
181
devices = list({p.device for p in model.parameters() if str(p.device) != "meta"})
BenjaminBossan237 days ago
Suggested change
devices = list({p.device for p in model.parameters() if str(p.device) != "meta"})
devices = list({p.device for p in model.parameters() if p.device.type != "meta"})
Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
282 else:
283
284 def _check_fn(name, module):
285
return any(name.endswith(t) for t in peft_config.target_modules)
BenjaminBossan237 days ago

This is a bit too simplistic, just as an example, target_modules can also be a regex. I think you should be able to use model._check_target_module_exists(peft_config, name).

sirluk235 days ago

thanks for the suggestion, I improved this part a bit now.

  • for peft models I now just check hasattr(module, "base_layer"), and I defined a new variable UNSUPPORTED_LORA_MODULES for module types that are not used for eva (atm lora.Embedding and lora.Conv layers are not supported)
  • for non-peft models I am now using check_target_module_exists and filtering for only torch.nn.Linear and transformers.pytorch_utils.Conv1D layers
Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
451 """
452 orig_device = get_device_with_meta_params(model)
453 if device is not None:
454
recursive_apply(model, device)
BenjaminBossan237 days ago

What would be a situation where we need to pass device and run this function?

sirluk235 days ago

I guess it's fine to remove it and rely on the user to move the model to the appropriate device before calling this function

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
446 ```
447
448 Returns:
449
-------- model: (torch.nn.Module)
BenjaminBossan237 days ago

missing line break

sirluk235 days ago

I guess make style did this as I was using the wrong docstring format

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
474 if not isinstance(module, LoraLayer):
475 continue
476 if name in eva_state_dict:
477
w = eva_state_dict.pop(name)
BenjaminBossan237 days ago

Let's use named parameters here instead of relying on position. I think this can also simplify the code: Outside of the loop, create a kwargs dict with all the arguments that are the same in both calls, then only pass the arguments that differ.

sirluk235 days ago

make sense, done

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
337 indices = get_indices_fn(inputs, peft_config)
338
339 for name in list(hooks.keys()):
340
hook, handle = hooks.get(name)
BenjaminBossan237 days ago

Why not hooks[name]?

sirluk235 days ago

switched to hooks[name]

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
377 }
378 if len(remaining_svd_hooks) > 0:
379 raise ValueError(
380
f"All SVD hooks should have been removed but the following layers still have active hooks: {remaining_svd_hooks}"
BenjaminBossan237 days ago

What should a user do in this case? IIUC, this would be a bug and they should report an issue on GH. If that's the case, please add that instruction to the error message.

sirluk235 days ago

added a message that an issue should be created

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
385 if rank == 0:
386 continue
387 hook = hooks[layer_hook_map[name]][0]
388
assert torch.all(hook.converged[:rank]) # this should never happen because we check for convergence
BenjaminBossan237 days ago

Let's raise a proper error here.

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
230 PCA method to compute the SVD components. The function also checks for convergence of the computed components using
231 cosine similarity. The rank distribution for each layer is determined based on the explained variance ratio.
232
233
Parameters:
BenjaminBossan237 days ago

You're using the numpydoc docstring convention here (which sklearn also uses), but we use the (google?) docstring convention in PEFT. Could you please adjust it here and throughout?

sirluk235 days ago

done

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
255 ```
256
257 Returns:
258
-------- eva_state_dict: (dict)
BenjaminBossan237 days ago

missing line break

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
36 layer during the forward pass of a neural network. The hook also tracks convergence of the computed components
37 using cosine similarity between the current and previous components.
38
39
Attributes:
BenjaminBossan237 days ago

"Attributes" should be "Parameters" but as mentioned elsewhere, we use a different docstring convention altogether.

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/config.py
97
98 def __post_init__(self):
99 if self.rho < 1.0:
100
raise ValueError("`rho` must be >= 1.")
BenjaminBossan237 days ago

Would it make sense to also add a check on tau?

sirluk235 days ago

yes, for sure that makes sense, I added a check

Conversation is marked as resolved
Show resolved
tests/test_initialization.py
1567
1568 @pytest.fixture
1569 def dataset(self, tokenizer):
1570
dataset = load_dataset("Rowan/hellaswag", split="train")
BenjaminBossan237 days ago

I'd rather not use this dataset here in the tests, as it's downloaded from an external source. Would the test also work with, for instance, "ybelkada/english_quotes_copy", which already use in many other tests? Maybe that can also help speed up testing.

sirluk235 days ago

done

Conversation is marked as resolved
Show resolved
tests/test_initialization.py
1579 @pytest.fixture
1580 def model(self):
1581 model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
1582
model.transformer.h = torch.nn.ModuleList([model.transformer.h[0]]) # truncate to 1 layer
BenjaminBossan237 days ago

Just wondering if we should maybe keep 2 layers to cover edge cases a little bit better while still being fast.

sirluk235 days ago

that's a good point!

Conversation is marked as resolved
Show resolved
tests/test_initialization.py
1585 @pytest.fixture
1586 def peft_config(self):
1587 return LoraConfig(
1588
r=16, lora_alpha=1, target_modules=["c_attn"], init_lora_weights="eva", eva_config=EvaConfig(rho=2)
BenjaminBossan237 days ago

Is there a specific reason for these a bit strange parameters: r=16, alpha=1?

sirluk235 days ago

it's the setting for most experiments in our paper

Conversation is marked as resolved
Show resolved
tests/test_initialization.py
1592 def peft_model(self, model, peft_config):
1593 return get_peft_model(deepcopy(model), peft_config, low_cpu_mem_usage=True)
1594
1595
def test_eva_state_dict_consistency(self, model, dataset, peft_config):
BenjaminBossan237 days ago

Could you please add a short explanation of what is tested here and why?

Conversation is marked as resolved
Show resolved
tests/test_initialization.py
1617 f"is not greater than {self.COSINE_SIMILARITY_THRESHOLD}"
1618 )
1619
1620
def test_eva_initialization_consistency(self, peft_model, dataset, peft_config):
BenjaminBossan237 days ago

Could you please add a short explanation of what is tested here and why?

Conversation is marked as resolved
Show resolved
tests/test_initialization.py
1557 COSINE_SIMILARITY_THRESHOLD = 0.75
1558 NUM_SEEDS = 3
1559 BATCH_SIZE = 4
1560
MAX_LENGTH = 512
BenjaminBossan237 days ago

Maybe we can go down here for speed?

Conversation is marked as resolved
Show resolved
examples/eva_finetuning/eva_finetuning.py
65
66initialize_lora_eva_weights(peft_model, peft_config, dataloader, device=svd_device)
67
68
# from this point on, you can use the model as you would use a normal LoRA model
BenjaminBossan237 days ago

Just wondering: Would it make sense to actually include the fine-tuning code here, ideally using an example from/close to the paper. Or alternatively, add a second script with a complete example?

sirluk235 days ago (edited 235 days ago)

yes, I guess it makes sense to have a complete example; anyway, it's basically just initializing the Trainer class. I added the necessary code to run a full example which is close to one of the experiments from our paper

src/peft/tuners/lora/eva.py
486 )
487 module.lora_A[adapter_name].weight.copy_(w)
488 else:
489
module.update_layer(
BenjaminBossan237 days ago

What would be a situation when this can occur? During the unit testing, it is not covered.

sirluk235 days ago (edited 235 days ago)

It's mostly because we don't support embedding and conv layers at the moment. So if adapters are added to those layers, we can't initialize them with eva, and the next best thing is to do normal lora initialization by default. Do you think a separate test case is needed here?

Conversation is marked as resolved
Show resolved
tests/test_initialization.py
1658 with pytest.warns(
1659 UserWarning, match="`eva_config` specified but will be ignored when `init_lora_weights` is not 'eva'."
1660 ):
1661
LoraConfig(init_lora_weights=True, eva_config=EvaConfig())
BenjaminBossan237 days ago

Can we have small tests for:

  • whiten=True
  • prepare_activations_fn being a mapping
sirluk235 days ago

I added a whitening setting for test_eva_state_dict_consistency. To test prepare_activations_fn I added a test test_eva_state_dict_prepare_activations_mapping

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
303 inputs = next(iter(dataloader))
304 forward_fn(model, move_inputs_to_device(inputs, device))
305 hash_dict = {k: h[0].hashed_inputs[0] for k, h in hooks.items()}
306
equal_inputs_map = {vv: v[0] for v in find_equal_values(hash_dict).values() for vv in v[1:]}
BenjaminBossan237 days ago

I don't really understand what should happen to have equal_inputs_map, could you please add a comment?

sirluk235 days ago

Added a comment. It contains information about which layers receive the same input activations (e.g. queries, keys and values). For those groups we only need to compute the svd once.
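An illustrative sketch of that grouping (not the PR code; find_equal_values here is a stand-in mirroring the described behavior of the helper):

from collections import defaultdict

def find_equal_values(hash_dict):
    # group layer names whose hashed inputs are identical
    groups = defaultdict(list)
    for name, h in hash_dict.items():
        groups[h].append(name)
    return {h: names for h, names in groups.items() if len(names) > 1}

hash_dict = {"q_proj": "abc", "k_proj": "abc", "v_proj": "abc", "o_proj": "def"}
equal_inputs_map = {vv: v[0] for v in find_equal_values(hash_dict).values() for vv in v[1:]}
# -> {"k_proj": "q_proj", "v_proj": "q_proj"}: SVD only runs for "q_proj" and "o_proj"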

src/peft/tuners/lora/eva.py
90
91 if previous_components is not None:
92 components = self.svd.components_
93
if len(components.shape) == 1:
BenjaminBossan237 days ago

Could you add a short comment when this would happen?

sirluk235 days ago (edited 235 days ago)

done, it's only the case during the first svd step. Would you do it differently?

sirluk update
c6a5fc52
sirluk fix tau range
413be29c
sirluk Merge branch 'main' into main
5ad5f00d
sirluk update tau tests
b1bcf02f
sirluk
sirluk235 days ago (edited 234 days ago)

@BenjaminBossan I integrated all of your suggestions in the code, many thanks (for details I left comments to the separate conversations you opened about suggested changes).

Three general questions:

  • Eva works out of the box with bitsandbytes. I included a gpu-dependent test in test_initialization with pytest.mark.skipif(not torch.cuda.is_available(), reason="test requires a GPU") to test bitsandbytes. Is that fine to have all EVA tests in one class in this script or would I need to move this test to test_common_gpu or test_gpu_examples?
  • under examples/eva_finetuning/README.md I added sections using the [[autotest]] command to automatically use the docstrings of the eva related functions. Is that fine or does that not work for the examples section?
  • To make using eva more seamless, do you think its a good idea to just have to specify init_lora_weights="eva" and if EvaConfig is not defined / none it is initialized with the defaults (i.e. EvaConfig())?

EDIT:
I added three checks to initialize_lora_eva_weights.

  1. Check that model is a PeftModel
  2. check that there is only one active adapter in the peftmodel.
  3. verify that init_lora_weights = "eva"

I also removed peft_config as an argument from the function since it's an attribute of every PeftModel anyway

sirluk add validity checks to initialize_lora_eva_weights
e8908394
sirluk style
ebb0ac62
sirluk extend documentation and small fixes
81fbc284
BenjaminBossan
BenjaminBossan requested changes on 2024-10-28
BenjaminBossan232 days ago

Thanks a lot for your excellent work. Not much is missing to be able to merge this.

Eva works out of the box with bitsandbytes. I included a gpu-dependent test in test_initialization with pytest.mark.skipif(not torch.cuda.is_available(), reason="test requires a GPU") to test bitsandbytes. Is that fine to have all EVA tests in one class in this script or would I need to move this test to test_common_gpu or test_gpu_examples?

For this test to be actually run on our nightly CI with GPUs, you'd have to move it, I think test_gpu_examples.py is the right place (just create a new test class there).

under examples/eva_finetuning/README.md I added sections using the [[autotest]] command to automatically use the docstrings of the eva related functions. Is that fine or does that not work for the examples section?

Hmm, not sure what you mean, I'm not familiar with autotest. Could you please elaborate?

To make using eva more seamless, do you think its a good idea to just have to specify init_lora_weights="eva" and if EvaConfig is not defined / none it is initialized with the defaults (i.e. EvaConfig())?

I think it makes sense to do that if the default config works well enough out of the box. If we want the users to actively set the Eva parameters, it is better to make this step explicit. You can judge better than I can whether the default parameters should work well enough.

Conversation is marked as resolved
Show resolved
examples/eva_finetuning/README.md
133
134### prepare_layer_inputs_fn
135
136
`prepare_layer_inputs_fn` can be used to preprocess the layer inputs before passing them to the SVD algorithm. `prepare_layer_inputs_fn` recieves two arguments: `inputs` and `peft_config`. It can either be a callable or a dictionary where the keys are the layer names and the values are callables. By default the following logic is used:
BenjaminBossan232 days ago
Suggested change
`prepare_layer_inputs_fn` can be used to preprocess the layer inputs before passing them to the SVD algorithm. `prepare_layer_inputs_fn` recieves two arguments: `inputs` and `peft_config`. It can either be a callable or a dictionary where the keys are the layer names and the values are callables. By default the following logic is used:
`prepare_layer_inputs_fn` can be used to preprocess the layer inputs before passing them to the SVD algorithm. `prepare_layer_inputs_fn` receives two arguments: `inputs` and `peft_config`. It can either be a callable or a dictionary where the keys are the layer names and the values are callables. By default the following logic is used:
Conversation is marked as resolved
Show resolved
examples/eva_finetuning/README.md
BenjaminBossan232 days ago🚀 1

Super good README!

Conversation is marked as resolved
Show resolved
examples/eva_finetuning/utils.py
BenjaminBossan232 days ago

Missing copyright notice.

Conversation is marked as resolved
Show resolved
src/peft/mapping.py
163164 The revision of the base model. If this isn't set, the saved peft model will load the `main` revision for
164165 the base model
166 low_cpu_mem_usage (`bool`, `optional`, defaults to `False`):
167
Create empty adapter weights on meta device. Useful to speed up the loading process.
BenjaminBossan232 days ago

Normally, users should have no reason to pass low_cpu_mem_usage=True when calling get_peft_model, but as is, the description does not make that clear. Let's add:

"Leave this setting as False if you intend on training the model, unless the adapter weights will be replaced by different weights before training starts".

sirluk231 days ago

added this

Conversation is marked as resolved
Show resolved
examples/eva_finetuning/eva_finetuning.py
59# setup peft config
60eva_config = EvaConfig(rho=rho)
61peft_config = LoraConfig(
62
r=rank, lora_alpha=alpha, target_modules=target_modules, init_lora_weights=True, eva_config=eva_config
BenjaminBossan232 days ago
Suggested change
r=rank, lora_alpha=alpha, target_modules=target_modules, init_lora_weights=True, eva_config=eva_config
r=rank, lora_alpha=alpha, target_modules=target_modules, init_lora_weights="eva", eva_config=eva_config
sirluk231 days ago

thanks for pointing out this error!

src/peft/tuners/lora/eva.py
    module.update_layer(r=w.size(0), init_lora_weights=peft_config.init_lora_weights, **update_layer_kwargs)
    module.lora_A[adapter_name].weight.copy_(w)
else:
    module.update_layer(r=peft_config.r, init_lora_weights=True, **update_layer_kwargs)
    missing_eva_inits.append(name)
    print(name, type(module))
BenjaminBossan232 days ago

What would be a situation where this occurs? AFAICT it's not covered in tests. Also, let's remove the print statement.

sirluk231 days ago

Added a test for this: test_missing_eva_inits. Since EVA at the moment does not support embedding layers, for example, we do standard lora init if such layers have adapters.

Conversation is marked as resolved
Show resolved
src/peft/utils/incremental_pca.py
if lowrank:
    if lowrank_q is None:
        assert (
            n_components is not None
        ), "n_components must be specified when using lowrank mode with lowrank_q=None."
        lowrank_q = n_components * 2
    assert lowrank_q >= n_components, "lowrank_q must be greater than or equal to n_components."
BenjaminBossan232 days ago

Let's raise proper errors instead of using asserts.

sirluk231 days ago

done

Conversation is marked as resolved
Show resolved
src/peft/utils/incremental_pca.py
    return last_mean, last_variance, last_sample_count

if last_sample_count > 0:
    assert last_mean is not None, "last_mean should not be None if last_sample_count > 0."
    assert last_variance is not None, "last_variance should not be None if last_sample_count > 0."
BenjaminBossan232 days ago

Let's raise proper errors instead of using asserts.

sirluk improve customization options
62ae35c2
sirluk Merge pull request #3 from sirluk/simplify_entrypoints
0b316d36
sirluk update documentation
7c753b6b
sirluk update docs
404022f0
sirluk error to warning
724fd1c0
sirluk make style
d79672b1
sirluk fix type
ad646da1
sirluk
sirluk231 days ago

Thank you again for your suggestions @BenjaminBossan, I worked them all into the pull request.

  • I moved the test for bitsandbytes to test_gpu_examples.py. It was necessary to duplicate some code but I guess that's fine.
  • I updated the documentation a bit. I also added a section for EVA under package reference, please let me know if it's okay like this. I think it would be beneficial to be able to find info about the EVA functions in the documentation.
  • Since my last message in this pull request I also made changes to the entrypoint get_indices_fn for customizing EVA. I made it a bit more generic and easier for users to understand. I renamed the callable argument in get_eva_state_dict and initialize_lora_eva_weights to prepare_model_inputs_fn now. It can be used to modify the model inputs and have access to the result in the SVD hooks. I adjusted the tests accordingly.
  • I discussed the topic of a default EvaConfig with my coauthors. Some parameters like rho or use_label_mask have a setting that should work well for most applications, but nevertheless it can be important to adjust the default settings. We were wondering if we should initialize a default EvaConfig if it's not specified but raise a warning explaining that it might be important to change the parameters in some cases. In either case I also updated the docstring of EvaConfig to mention this.
HuggingFaceDocBuilderDev
HuggingFaceDocBuilderDev230 days ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

BenjaminBossan
BenjaminBossan requested changes on 2024-10-30
BenjaminBossan231 days ago

Thanks a lot for all the updates. The CI is a bit flaky at the moment, so don't worry about some failing test runs.

While reviewing the code, I found one major blocker that needs to be addressed before we can merge. The issue is that Eva redistributes the LoRA ranks, right? However, this is not reflected in the LoraConfig. Therefore, if a user saves the model and then tries to load it later, they'll get errors, e.g.:

size mismatch for base_model.model.model.layers.0.self_attn.o_proj.lora_A.default.weight: copying a param with shape torch.Size([19, 2048]) from checkpoint, the shape in current model is torch.Size([16, 2048])

So unless I'm missing something, the solution for this would be to get the new ranks for all the LoRA weights and store them in a dict mapping the module name to the rank. This dict should be used to override the rank_pattern attribute of the LoraConfig. This should make it possible to load the model successfully. Could you please take a look at that and also extend the tests to cover this?

Regarding the alpha values, I'm not sure if those also need to be adjusted. E.g. if I have a LoRA rank of 16 and alpha of 32, the LoRA scaling would be 32/16. If the rank is changed to 19 but alpha stays at 32, the scaling would decrease to 32/19. Probably it's not a big deal, but I wanted to mention that.

I also added a section for eva under package reference, please let me know if its okay like this. I think it would be beneficial to find info about the EVA functions in the documentation.

Yes, that should be okay. Unfortunately, the doc preview appears to be broken at the moment, but if there is anything wrong with the rendering there, it'll be easy enough to fix later.

We were wondering if we should initialize a default EvaConfig if its not specified but raise a warning explaining that it might be important to change the parameters in some cases.

From my point of view, the current solution is good.
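For context, a rough sketch of the kind of fix requested above (names such as eva_state_dict and peft_model are assumptions, not the PR's exact code): collect the rank Eva chose for each module and record it in the config's rank_pattern so the checkpoint can be reloaded with matching shapes.

# Hypothetical sketch: record Eva's redistributed ranks in the LoraConfig
rank_pattern = {}
for name, lora_A_weight in eva_state_dict.items():
    key = name.replace("base_model.model.", "")  # match the keys LoraConfig.rank_pattern expects
    rank_pattern[key] = lora_A_weight.size(0)    # rank = number of rows of lora_A
peft_model.peft_config["default"].rank_pattern = rank_pattern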

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
387     # forward for one batch to check which layer inputs are equal to avoid unneeded svd calculations
388     forward_fn(model, inputs)
389     hash_dict = {k: h[0].hashed_inputs[0] for k, h in hooks.items()}
390     # equal input maps groups layers which recieve the same input. One layer is defined as the key and receives an svd hook. For the remaining layers the svd results can be skipped.
BenjaminBossan231 days ago

Managed to find the last one ;)

Suggested change
# equal input maps groups layers which recieve the same input. One layer is defined as the key and receives an svd hook. For the remaining layers the svd results can be skipped.
# equal input maps groups layers which receive the same input. One layer is defined as the key and receives an svd hook. For the remaining layers the svd results can be skipped.
Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
297         padding tokens and should not be used for SVD.
298         `peft.tuners.lora.eva.prepare_model_inputs_fn_language_modeling` is used by default. Any function defined
299         here expects two arguments: `model_input` and `peft_config` and should return a 2d tensor. Default logic:
300         ```
BenjaminBossan231 days ago

I think pasting the whole code here is a bit too much and also makes it easy for the code to go out of sync.

sirluk230 days ago

fair point, I removed those code snippets from the docstrings

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
311 default settings works for language modeling and model_inputs is the mask used to determine which indices
312 should be used for SVD (created by `prepare_model_inputs_fn_language_modeling`). Default logic:
313 ```
314
def prepare_layer_inputs_fn_language_modeling(layer_input, model_input, layer_name) -> torch.Tensor:
BenjaminBossan230 days ago

Same argument about pasting the code here.

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
518 padding tokens and should not be used for SVD.
519 `peft.tuners.lora.eva.prepare_model_inputs_fn_language_modeling` is used by default. Any function defined
520 here expects two arguments: `model_input` and `peft_config` and should return a 2d tensor. Default logic:
521
```
BenjaminBossan230 days ago

Same argument about pasting the code here.

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
531 receives multiple inputs as a list, per default the first input is used. The default settings works for
532 language modeling and model_inputs is the mask used to determine which indices should be used for SVD
533 (created by `prepare_model_inputs_fn_language_modeling`). Default logic:
534
```
BenjaminBossan230 days ago

Same argument about pasting the code here.

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
174     return {k: v for k, v in value_dict.items() if len(v) > 1}
175
176
177 def recursive_apply(module: torch.nn.Module, device: Union[str, torch.device]):
BenjaminBossan230 days ago

I think this function can be removed.

sirluk230 days ago

removed

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
596 f"the following layers were initialized with init_lora_weights=True because they were not found in the eva state_dict: {missing_eva_inits} "
597 f"currently the following lora modules are not supported: {UNSUPPORTED_LORA_MODULES}"
598 )
599
return model
BenjaminBossan230 days ago

IIUC, the model is fully modified in-place. So there isn't really a reason to return it here, right? If true, I would remove the return value. Otherwise, people may think that this works:

model_before_eva = ...
model_after_eva = initialize_lora_eva_weights(model, ...)
# continue using model_before_eva
sirluk230 days ago

That is true, I figured I could do the same as get_peft_model, which also modifies the model in place but returns it as well. I removed the return statement now.

Conversation is marked as resolved
Show resolved
examples/eva_finetuning/eva_finetuning.py
BenjaminBossan230 days ago

Nice example. I tried it locally with a smaller model and train loss with Eva enabled is consistently better than without it.

sirluk230 days ago

great to hear!

sirluk fix potential issues
51874665
sirluk
sirluk230 days ago (edited 229 days ago)

@BenjaminBossan thanks for raising the issue with saving/loading eva models, I didn't think about this at first. It should be fixed now. I am updating the rank_pattern in init_lora_eva_weights.

I also found another potential issue that was not accounted for so far. It is possible that after eva initialization some adapter layers could get assigned rank 0. In those cases we would need to remove the adapter layer again. We would also need to update target_modules in the peft config. I also added test cases for this and the above issue. To test the rank=0 case it was necessary to move some functionality out of init_lora_eva_weights into a new function load_eva_state_dict. Otherwise it would not have been possible to modify the eva_state_dict (i.e. assigning one layer rank 0) in the test script. init_lora_eva_weights now just combines the two lower level functions get_eva_state_dict and load_eva_state_dict.

Regarding alpha values, it should be fine as is for now. For the experiments in our paper we did some initial tests to adjust alpha for different ranks after the redistribution but it didn't lead to an improvement in performance. I saw that the AdaLoRA implementation also did not adjust alpha for different ranks, is that right? I did however add a use_alpha_pattern option in EvaConfig now which is False by default. If this is set to True, alpha values will be adjusted after redistribution to reflect the new ranks.

Regarding docs, I built it locally and everything looks fine.
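To illustrate the use_alpha_pattern option described above (a sketch, not the PR's implementation): when a layer's rank changes from r to new_rank, alpha is rescaled so that the effective LoRA scaling alpha/r stays constant.

def adjusted_alpha(lora_alpha: float, r: int, new_rank: int) -> float:
    # keep the effective scaling lora_alpha / r constant after rank redistribution
    return lora_alpha * new_rank / r

# e.g. r=16, alpha=32 gives scaling 2.0; after redistribution to rank 19,
# adjusted_alpha(32, 16, 19) == 38.0, and 38/19 is still 2.0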

sirluk add option to adjust alpha after redistribution
b6b7b7bc
sirluk Merge pull request #4 from sirluk/alpha_pattern
13ffc56b
BenjaminBossan
BenjaminBossan requested changes on 2024-10-31
BenjaminBossan229 days ago

Thanks for your continued work. The adjustments for saving and loading look good. Also, nice catch with the rank 0 issue.

I have a couple of smaller comments, but the big picture looks good.

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
571 591             show_progress_bar=show_progress_bar,
572 592         )
573 593
574 # assert all lora layers are contained in eva_state_dict
575 update_layer_kwargs = {
576 "adapter_name": adapter_name,
577 "lora_alpha": peft_config.lora_alpha,
578 "lora_dropout": peft_config.lora_dropout,
579 "use_rslora": peft_config.use_rslora,
580 "use_dora": peft_config.use_dora,
581 }
582 missing_eva_inits = []
583 for name, module in model.named_modules():
584 if not isinstance(module, LoraLayer):
585 continue
586 if name in eva_state_dict:
587 w = eva_state_dict.pop(name)
588 module.update_layer(r=w.size(0), init_lora_weights=peft_config.init_lora_weights, **update_layer_kwargs)
589 module.lora_A[adapter_name].weight.copy_(w)
590 else:
591 module.update_layer(r=peft_config.r, init_lora_weights=True, **update_layer_kwargs)
592 missing_eva_inits.append(name)
593
594 if missing_eva_inits:
595 warnings.warn(
596 f"the following layers were initialized with init_lora_weights=True because they were not found in the eva state_dict: {missing_eva_inits} "
597 f"currently the following lora modules are not supported: {UNSUPPORTED_LORA_MODULES}"
598 )
599 return model
594     load_eva_state_dict(model, eva_state_dict, adapter_name, _check_valid=False)
BenjaminBossan229 days ago

Could you please explain why we don't check validity here? A short comment would be nice to have. Tbh I would like to get rid of the parameter, not a fan of "private" arguments.

sirluk229 days ago (edited 229 days ago)

I did it this way because when calling init_lora_eva_weights we would otherwise check validity twice (_check_valid is called at the beginning of init_lora_eva_weights and then again inside load_eva_state_dict). For that reason I defined the private argument, but I am also not super keen on keeping it this way. I changed it now so that there is a private function _load_eva_state_dict that doesn't call _check_valid and is used by both load_eva_state_dict and init_lora_eva_weights. So this private argument is gone now.
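Roughly, the structure described here looks like the following sketch (signatures and helper names are illustrative, not the PR's exact API):

def _check_valid(model, adapter_name):
    ...  # raises if the model/adapter is not set up for EVA

def get_eva_state_dict(model, dataloader):
    ...  # runs the incremental SVD and returns {module_name: lora_A_weight}

def _load_eva_state_dict(model, eva_state_dict, adapter_name):
    ...  # does the actual loading, without any validity check

def load_eva_state_dict(model, eva_state_dict, adapter_name="default"):
    _check_valid(model, adapter_name)  # public entry point validates once
    _load_eva_state_dict(model, eva_state_dict, adapter_name)

def initialize_lora_eva_weights(model, dataloader=None, eva_state_dict=None, adapter_name="default"):
    _check_valid(model, adapter_name)  # also validates exactly once
    if eva_state_dict is None:
        eva_state_dict = get_eva_state_dict(model, dataloader)
    _load_eva_state_dict(model, eva_state_dict, adapter_name)  # no second check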

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
527
528     if missing_eva_inits:
529         warnings.warn(
530             f"the following layers were initialized with init_lora_weights=True because they were not found in the eva state_dict: {missing_eva_inits} "
BenjaminBossan229 days ago

Let's break this line as it exceeds 120 chars.

sirluk226 days ago

done

Conversation is marked as resolved
Show resolved
tests/test_initialization.py
1750     # initialized model can be saved and loaded correctly.
1751     @pytest.mark.parametrize("has_rank_zero", [True, False])
1752     def test_load_eva_state_dict(self, model, dataset, peft_config, has_rank_zero):
1753         tmp_dirname = tempfile.mkdtemp()
BenjaminBossan229 days ago

Let's use the tmp_path fixture of pytest. Same in the test below.

sirluk226 days ago

done
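For reference, a minimal, self-contained illustration of the tmp_path fixture suggested here (not the actual PEFT test):

def test_roundtrip_with_tmp_path(tmp_path):
    # pytest injects a fresh pathlib.Path per test and cleans it up afterwards,
    # replacing manual tempfile.mkdtemp() bookkeeping
    save_dir = tmp_path / "eva_adapter"
    save_dir.mkdir()
    (save_dir / "adapter_config.json").write_text("{}")
    assert (save_dir / "adapter_config.json").exists()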

Conversation is marked as resolved
Show resolved
tests/test_initialization.py
1759         )
1760         peft_model = get_peft_model(deepcopy(model), peft_config)
1761         sd = get_eva_state_dict(model, peft_config, dataloader)
1762         if has_rank_zero:
1763             k = "transformer.h.0.attn.c_attn"
1764             sd[k] = sd[k][:0]
BenjaminBossan229 days ago

I have a bit of trouble understanding this. It seems like this code sets the size of the weight of this layer to 0. Is this equivalent to having a LoRA adapter with rank 0? Or does "rank" here refer to something else?

sirluk229 days ago

that's correct, it simulates how a rank-zero layer in the state dict would look: a tensor with size 0 in the first dimension

Conversation is marked as resolved
Show resolved
tests/test_initialization.py
1765         load_eva_state_dict(peft_model, sd)
1766         # test that models that initialized `load_eva_state_dict` can be saved and loaded correctly
1767         peft_model.save_pretrained(tmp_dirname)
1768         peft_model = PeftModel.from_pretrained(model, tmp_dirname, torch_device=self.DEVICE, low_cpu_mem_usage=True)
BenjaminBossan229 days ago

Let's add a line to call forward on the model, just to be extra safe that it was loaded correctly. Same in the test below.

sirluk226 days ago

done

Conversation is marked as resolved
Show resolved
tests/test_initialization.py
17421746 f"is not greater than {self.COSINE_SIMILARITY_THRESHOLD}"
17431747 )
17441748
1749
# Test that the `get_eva_state_dict` function can be used to initialize a model with EVA weights and that the
BenjaminBossan229 days ago

Let's move these descriptions inside of the functions.

sirluk226 days ago

done

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
505             new_rank = w.size(0)
506             if new_rank == 0:
507                 parent, _, target_name = _get_submodules(model, name)
508                 setattr(parent, target_name, module.base_layer)
BenjaminBossan229 days ago
Suggested change
setattr(parent, target_name, module.base_layer)
setattr(parent, target_name, module.get_base_layer())
Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
512             rank_pattern[pattern_key] = new_rank
513             new_alpha = peft_config.lora_alpha
514             if peft_config.eva_config.use_alpha_pattern:
515                 new_alpha = peft_config.lora_alpha * new_rank / peft_config.r
BenjaminBossan229 days ago

IIUC, there is an edge case where this is incorrect. Users can pass their own rank_pattern, in which case the original rank would not necessarily be the peft_config.r.

sirluk226 days ago (edited 226 days ago)

Thanks for pointing this out. I fixed it now so that the function checks if there is an entry in rank_pattern for the layer
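A sketch of the fix described here (matching logic simplified and hypothetical): the original rank is looked up in a user-supplied rank_pattern, falling back to peft_config.r only if no entry matches.

def compute_new_alpha(peft_config, layer_name: str, new_rank: int) -> float:
    # hypothetical sketch; the real pattern matching in PEFT is more involved
    old_rank = peft_config.r
    for pattern, rank in peft_config.rank_pattern.items():
        if layer_name.endswith(pattern):
            old_rank = rank
            break
    return peft_config.lora_alpha * new_rank / old_rank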

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
518             module.lora_A[adapter_name].weight.copy_(w)
519             new_target_modules.append(name)
520         else:
521             module.update_layer(
BenjaminBossan229 days ago

For my understanding, is this needed because the module's weight could be on meta device? If yes, can we check that first and only call upate_layer if needed?

sirluk228 days ago

it is needed because if we do not have an eva state dict entry for that layer we would like to use standard lora initialization for that layer.

BenjaminBossan225 days ago

And that is only required if the layer's parameters are on meta device, otherwise they are already properly initialized, right?

sirluk225 days ago (edited 225 days ago)

Actually, when get_peft_model is called all adapters would be initialized with init_lora_weights="eva", which only initializes lora_B appropriately with zeros but not lora_A. So if we don't have an entry in the eva state dict to load, it's best to just call update_layer again with standard LoRA init.

def update_layer(...):
        ...
        elif init_lora_weights == "eva":
            nn.init.zeros_(self.lora_B[adapter_name].weight)
BenjaminBossan224 days ago

Ah yes, I forgot about that part, now it makes sense.

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
526     # update target modules if some lora layers have been removed due to their EVA rank being 0
527     new_target_modules = new_target_modules + missing_eva_inits
528     other_module_names = [n for n, _ in model.named_modules() if n not in new_target_modules]
529     new_target_modules = _find_minimal_target_modules(new_target_modules, other_module_names)
BenjaminBossan229 days ago

Nice idea. But let's only "compress" the target modules if len(target_modules) >= MIN_TARGET_MODULES_FOR_OPTIMIZATION

sirluk226 days ago

done
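In other words, the snippet above gains a length guard, roughly like this sketch (constant and helper usage assumed, not copied from the final code):

if len(new_target_modules) >= MIN_TARGET_MODULES_FOR_OPTIMIZATION:
    other_module_names = [n for n, _ in model.named_modules() if n not in new_target_modules]
    new_target_modules = _find_minimal_target_modules(new_target_modules, other_module_names)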

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/config.py
95         If use_label_mask=True the value to look for to mask out ignored tokens. Default is -100.
96     whiten (`bool`): Apply whitening to singular vectors. Default is False.
97         Whitening has been shown to be beneficial for EVA in the vision domain.
98     use_alpha_pattern (`bool`):
BenjaminBossan229 days ago

Thanks for adding this. I think we can find a better name here, since the fact that this uses alpha_pattern under the hood is not really relevant for the users. How about: "adjust_alpha" or "adjust_scaling".

sirluk226 days ago

done. We also did some checks and in the language modeling setting adjusting alpha seems to work a bit better. Since I would assume a large share of users use peft for language modeling tasks, I set the default to True now.

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
534
535     # when use_alpha_pattern is True, lora scaling factors have been adjusted after the rank redistribution
536     if peft_config.eva_config.use_alpha_pattern:
537         model.peft_config[adapter_name].alpha_pattern = alpha_pattern
BenjaminBossan229 days ago

This would be safe to override (instead of updating) even if the user has provided their own alpha_pattern, right? I think so, just wondering out loud.

sirluk226 days ago

I would say it is safe now to override as we are incorporating alpha values from alpha_pattern into the new alpha_pattern dict.

Conversation is marked as resolved
Show resolved
tests/test_initialization.py
1716         assert mean_cosine_similarity > self.COSINE_SIMILARITY_THRESHOLD, (
1717             f"Mean absolute cosine similarity {mean_cosine_similarity:.4f} "
1718             f"is not greater than {self.COSINE_SIMILARITY_THRESHOLD}"
1719         )
BenjaminBossan229 days ago

When using alpha_pattern=True, can we add an extra check? I think we should be able to collect the scaling attributes of the LoRA layers before and after adjusting, and they should be the same, right? Maybe this could be a separate test.

sirluk226 days ago

added test_eva_state_dict_adjust_scaling_factors
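Roughly, the check being asked for could look like this sketch (attribute and function names assumed from the surrounding discussion, not the final test):

scalings_before = {
    name: module.scaling["default"]
    for name, module in peft_model.named_modules()
    if isinstance(module, LoraLayer)
}
initialize_lora_eva_weights(peft_model, dataloader=dataloader)
for name, module in peft_model.named_modules():
    if isinstance(module, LoraLayer):
        # alpha was rescaled together with the rank, so the effective scaling is unchanged
        assert module.scaling["default"] == pytest.approx(scalings_before[name])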

sirluk Update src/peft/tuners/lora/eva.py
17f5bf14
sirluk fix edge cases
1f03a954
sirluk Merge pull request #5 from sirluk/alpha_pattern
f74b0447
sirluk Merge branch 'main' into main
12d497f9
sirluk account for layer pattern
e6099695
sirluk split up print
7e4505f4
sirluk
sirluk226 days ago

@BenjaminBossan thanks for your thoroughness, there were still some edge cases I didn't consider.

Besides the things you pointed out I updated some things additionally (sorry that there are still so many changes in the PR, but I keep discovering potential issues for users or ways to make the API simpler for them):

  • Fixed an issue in get_eva_state_dict if LoraConfig contains a rank pattern. For these cases max_components can be different for different layers so this variable is now a dict with keys being the adapter module names.
  • removed function load_eva_state_dict and integrated functionality into initialize_lora_eva_weights. This function now accepts an optional eva_state_dict argument.

One question: When loading the eva_state_dict I need to get the module name in the base model based off the module name in the peft model. Is this a valid way to do that:

name_in_base_model = name.replace("base_model.model.", "")
sirluk fix rank_budget in case of rank_pattern
bec670fb
BenjaminBossan
BenjaminBossan requested changes on 2024-11-04
BenjaminBossan225 days ago

Thanks for your continued work on this PR, this is really excellent work.

Regarding your latest changes, they make sense to me, thanks for explaining. I did another check and flagged some smaller issues, mainly around making this approach more flexible for custom models. Otherwise, this PR is good to be merged.

One question: When loading the eva_state_dict I need to get the module name in the base model based off the module name in the peft model. Is this a valid way to do that:

Yes, I think the way it's implemented is good. For LoRA, we will always have this prefix.

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
35 from .layer import Embedding, LoraLayer, _ConvNd
36
37
38 UNSUPPORTED_LORA_MODULES = (Embedding, _ConvNd)
BenjaminBossan225 days ago

For my understanding, it is better to define the unsupported LoRA modules than the supported LoRA modules because Eva would generally work with newly added quantization methods, which typically work on linear layers? I'm mainly wondering whether it is less error prone to define the supported LoRA modules instead. At the end of the day, if an unsupported LoRA module slips through, this is probably leading to an error and the check is mostly for a better error message, right?

sirluk225 days ago

fair point, the only advantage is having a better error message, I can invert this to SUPPORTED_LORA_MODULES

sirluk225 days ago

I was just reminded during testing that the actual reason I did it this way is to automatically include bnb linear layers. Otherwise I would need to do a conditional import, I guess, if bnb is available. Would you still say it's better to do it this way?

BenjaminBossan225 days ago

No, that's why I mentioned

Eva would generally work with newly added quantization methods

For me, this was more to verify the reasoning here. When new LoRA layers are added that are not supported here, we'll most likely forget to update this constant, but it's not the end of the world. However, I think we can make it a bit more likely not to miss this if we move UNSUPPORTED_LORA_MODULES to utils/constants.py and rename it to UNSUPPORTED_EVA_LORA_MODULES or something like this.

sirluk225 days ago

moving this to constants.py doesn't work unfortunately due to some circular import issue with PeftConfig

BenjaminBossan224 days ago

Hmm, any idea how that can be? In the line above, we already import from peft.utils.constants, so it's unclear to me how a circular import could arise here.

sirluk224 days ago

It has something to do with me now needing to import from lora/layer.py. I think the circular dependency goes something like this:

  • in constants.py we now import from lora/layer.py
  • lora/layer.py imports from lora/config.py
  • lora/config.py imports from peft/config.py
  • peft/config.py imports from utils/other.py
  • utils/other.py imports from constants.py
BenjaminBossan222 days ago

Ah yes, that makes sense. Then I wonder if we can define the constant inside of lora/__init__.py. Just somewhere more visible than here :)

sirluk222 days ago

hmm unfortunately this does not fix the issue. Putting the variable in lora/__init__.py still raises an error.

BenjaminBossan220 days ago

All right, then let's just keep it as is.

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
659 raise ValueError("`initialize_lora_eva_weights` can only be used with `init_lora_weights='eva'`")
660
661 # compute svd
662
if eva_state_dict is None:
BenjaminBossan225 days ago

Let's add a check here that dataloader is not None. Otherwise, I think we'll hit

https://github.com/huggingface/peft/pull/2142/files#diff-36e504fe6dcda50b98764f8961b977a65a976f187bc391385299fba1fc07704bR566-R567

which can be confusing to the user.

sirluk225 days ago

But for init_lora_eva_weights the dataloader can actually be None if a state dict is provided. I added a None check to _get_eva_state_dict, where the dataloader is actually needed.

BenjaminBossan225 days ago

But for init_lora_eva_weights the dataloader can actually be None if state dict is provided.

Well, here we already checked that eva_state_dict is None, so that would be covered.

I added a None check to _get_eva_state_dict where the dataloader is actually needed

That's also fine, then the type annotation should be updated to dataloader: Optional[Iterable]

sirluk225 days ago

sorry, which type annotation do you mean? I saw it's not updated in the docstring of init_lora_eva_weights and added it there.

In get_eva_state_dict and _get_eva_state_dict it's not really optional as it will always raise an error.

BenjaminBossan224 days ago

IMO it is optional because the function accepts and expects that None could be passed. I would expect a type checker to complain otherwise, as the get_eva_state_dict call here passes dataloader, which according to this function could be None.

It's no big deal overall, I personally don't use type checkers on this code base, but I know that some users complain sometimes because their IDE warns about the wrong type being used.

sirluk224 days ago👍 1

Okay that makes sense, I moved the None check to init_lora_eva_weights, which was your original suggestion anyway. So in the other functions the dataloader argument is not optional anymore now.
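The moved check is essentially the following sketch (wording of the error message assumed):

# inside initialize_lora_eva_weights, before computing the SVD
if eva_state_dict is None and dataloader is None:
    raise ValueError("Either `eva_state_dict` or `dataloader` must be provided.")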

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
200 """
201 Move the inputs to the specified device. Adapted from hf.Trainer.
202 """
203
if isinstance(inputs, Mapping):
BenjaminBossan225 days ago

I think we can add a small optimization here: If we first check if the input has a .to method, we can call that (and remove the tensor check below). This is because some HF inputs, like those coming from tokenizers, are a mapping but they also have a .to method, which we can make use of.

sirluk225 days ago

ahh interesting, I actually took this logic from the transformers.Trainer class. I will update the checks
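The updated check order could look roughly like this sketch (a hypothetical helper, not the exact code that landed):

from collections.abc import Mapping
import warnings


def move_inputs_to_device(inputs, device):
    # objects exposing `.to` (tensors, tokenizer BatchEncodings, ...) are moved directly
    if hasattr(inputs, "to"):
        return inputs.to(device)
    if isinstance(inputs, Mapping):
        return {k: move_inputs_to_device(v, device) for k, v in inputs.items()}
    if isinstance(inputs, (list, tuple)):
        return type(inputs)(move_inputs_to_device(v, device) for v in inputs)
    warnings.warn(f"input of type {type(inputs)} could not be moved to the correct device")
    return inputs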

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
207     elif isinstance(inputs, torch.Tensor):
208         return inputs.to(device)
209     else:
210         raise ValueError(f"unsupported input type: {type(inputs)}")
BenjaminBossan225 days ago

I wonder if we really need to raise here. Perhaps the model somehow takes care of moving to the correct device. I think a warning is sufficient here.

sirluk225 days ago

sounds good

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
192 """
193 devices = list({p.device for p in model.parameters() if p.device.type != "meta"})
194 if len(devices) > 1:
195
raise ValueError("model has multiple devices")
BenjaminBossan225 days ago

I wonder if we should really raise an error here. Maybe the user has some kind of custom model that can correctly deal with the devices, and raising here would only stop them unnecessarily. How about giving a warning and skipping the move_inputs_to_device(inputs, device) if device could not be determined?

sirluk225 days ago

sounds good

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
443 "use_rslora": peft_config.use_rslora,
444 "use_dora": peft_config.use_dora,
445 }
446
# rank_pattern_keys() = list(chain(peft_config.rank_pattern.keys(), peft_config.alpha_pattern.keys()))
BenjaminBossan225 days ago👍 1

Can this be removed?

Conversation is marked as resolved
Show resolved
src/peft/tuners/lora/eva.py
179     return {k: v for k, v in value_dict.items() if len(v) > 1}
180
181
182 def get_target_name_key(name, pattern_keys):
BenjaminBossan225 days ago

This can be replaced by get_pattern_key if #2195 is merged first, right? Let's wait with this PR for the other one then.

sirluk225 days ago

yes correct!

sirluk Merge branch 'huggingface:main' into main
b13d0144
sirluk small fixes
b24500c6
sirluk missing return statement
3119dc42
sirluk
sirluk225 days ago

@BenjaminBossan, thanks, I integrated your suggestions and left some comments on the open threads:

  • when calling update_layer is needed
  • unsupported or supported lora modules
  • None check for dataloader
BenjaminBossan
BenjaminBossan commented on 2024-11-04
BenjaminBossan225 days ago

Thanks for the updates, I think we're almost good to go. Just 2 small comments from my side.

when calling update_layer is needed

Sorry, I can't find the comment, could you please paste it again? Just to summarize, my main concern is that we may call module.update_layer even if it's not necessary, namely when the module was not initialized with low_cpu_mem_usage=True (i.e. when it is on a real device, not meta, we would not need to initialize it).

sirluk
sirluk225 days ago

@BenjaminBossan this is the discussion about update_layer.

For the other two open conversation I added a comment directly to the conversation.

sirluk sirluk changed the title New LoRA Initialization Method: Explained Variance Adaptation [FEAT] New LoRA Initialization Method: Explained Variance Adaptation 225 days ago
BenjaminBossan
BenjaminBossan commented on 2024-11-05
BenjaminBossan224 days ago

I replied to your latest comments. Also, there is a small merge conflict, could you please take care? Thanks.

sirluk Merge branch 'main' into main
ec408974
sirluk move dataloader none check
05f99ba6
sirluk adjust default value
c628afe5
sirluk update test threshold
69896629
sirluk
sirluk224 days ago

Great, thanks for your feedback, I made some small changes. I think the only open issue is the potential circular import with constants.py.

Small note: I changed the default value of rho back to 1.0 as this works well in most settings.

sirluk update docs
5982139b
BenjaminBossan
BenjaminBossan220 days ago

@sirluk Some Eva tests are failing:

https://github.com/huggingface/peft/actions/runs/11698806717/job/32754904346?pr=2142#step:6:1195

Is it possible that you tested this with a GPU and there it's fast enough, but on the CPU CI runners, it's too slow? Maybe we should remove the timing assertions, as they tend to be flaky?

sirluk remove speed test and update docs
b5c6b8fe
sirluk
sirluk220 days ago

Thanks for pointing this out, I was indeed testing on a GPU machine so it was automatically running everything on GPU. I removed the speed test now.

sirluk fix typo
2c6fb37e
BenjaminBossan
BenjaminBossan approved these changes on 2024-11-12
BenjaminBossan217 days ago

Thanks a lot for ironing out the last kinks. The PR now LGTM. Really excellent work, thanks for that and for your patience.

I hope that many users will give it a try, as it looks quite promising.

BenjaminBossan BenjaminBossan merged 221965b7 into main 217 days ago
sirluk
sirluk217 days ago

Great, thanks for your great feedback throughout the process!

julien-c
julien-c217 days ago❤ 3

Great PR!

BenjaminBossan
BenjaminBossan196 days ago

@sirluk I just wanted to make you aware of PR #2231 which introduces a (at least superficially) similar method to EVA, which also benefited from our discussion here to improve the API. The PR includes a small table that summarizes a couple of methods including EVA, maybe you can double check that it's accurate.

sirluk
sirluk196 days ago

@BenjaminBossan, thanks for making us aware of this recent PR. It is indeed very relevant and interesting to us as well. I'm glad our discussion benefits the integration of other data-driven methods into PEFT. We believe the table is not entirely correct with respect to EVA. From their paper, "Knowledge Maintenance mode" in CorDA simply means using the smallest components for initialization, rather than the largest ones. This is necessary for methods like CorDA and PiSSA because they modify the base model weights, meaning the base model cannot be restored after fine-tuning. We intentionally followed the design of LoRA and left the base model untouched as this allows restoring the base model's weights and therefore full knowledge preservation. We just recently added a discussion paragraph on that in the updated version of our paper which will be on arXiv soon.
(tagging @paischer101 as he is the other joint first-author of EVA)

BenjaminBossan
BenjaminBossan187 days ago

@sirluk Thanks for the reply. Somehow I didn't get a notification for your message or missed it, which is why I didn't respond earlier.

The CorDA PR is almost finished but there is still an opportunity to ask the authors to correct the table. From your comment, it was not quite clear to me what exactly was not entirely correct. Could you perhaps post directly on that PR and mention what could be improved?

paischer101
paischer101187 days ago

@BenjaminBossan thanks for the response. Since EVA does not alter the base model weights, it is able to recover the base model and therefore allows full knowledge maintenance, i.e. the last column in the table should be a "yes" for EVA as well. I will make a post on the PR to clarify.

sirluk
sirluk185 days ago

Hey @BenjaminBossan, just wanted to let you know there is a mistake in the latest peft release notes describing EVA. It says that we use "SVD of the base layer weights", which is not correct. EVA is a data-driven method that performs incremental SVD on minibatches of finetuning data. The result is used to initialize LoRA and perform an adaptive rank redistribution. Could you maybe change that so that potential users are not confused?

githubnemo
githubnemo180 days ago👍 1

@sirluk Thanks for notifying us. I changed the release text accordingly.
