Fix PEFT x MoEs (#43261)
* current changes
* finally!
* collection is good
* what kinda works
* nit
* fix name
* small nits
* introduce loading info and config?
* try to remove some duplication
* trying to simplify, it's really not that hard, is it?
* nit
* is this better?
* update
* fix
* better?
* small fix
* force change lora
* push
* up
* replace gate_up_
* push
* Updated Ben (#43319)
* Getting closer (#43327)
It was necessary to flatten the LoRA weights for the 3D MoE case, as LoRA always
expects 2D weights (coming from nn.Linear).
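A minimal sketch of that flattening, assuming a fused 3D expert weight of shape (num_experts, in_features, out_features); the shapes and names here are hypothetical, not the PR's actual conversion code:

```python
import torch

# Hypothetical fused MoE expert weight: one 3D tensor holding all experts.
num_experts, in_features, out_features = 8, 4096, 14336
expert_weight = torch.randn(num_experts, in_features, out_features)

# LoRA layers in PEFT wrap nn.Linear and expect a 2D weight, so the expert
# dimension is folded into the feature dimension before LoRA is applied.
flat_weight = expert_weight.reshape(num_experts * in_features, out_features)
print(flat_weight.shape)  # torch.Size([32768, 14336])
```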
* style
* bring back eval()
* nits
* Revert "bring back eval()"
This reverts commit bcee589f720ad6ab43b0cda1eccf6c406427fd29.
* fix quantizer
* fix
* fix key mapping not recognized
* fix kwargs shenanigans
* fix more kwargs passing
* up
* fix `use_safetensors=False` call?
* nits?
* properly pass use_safetensors=False
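A sketch of the code path being exercised here, assuming a hypothetical checkpoint id; `use_safetensors=False` forces loading the PyTorch `.bin` weights instead of safetensors:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-moe-model",  # hypothetical checkpoint
    use_safetensors=False,      # load the .bin weights rather than safetensors
)
```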
* fix
* style
* default factory
* style
* simplify
* fix custom adapter_state_dict
* small updates
* nit
* style
* Fix Mixtral loading
* the rank needed to be set to 2*r for the concatenated gate/up projection
  parameter so that PEFT allocates 2*r and matches the converted
  weights (using rank_pattern)
* the weights needed to be transposed to match their counterparts
* MoE in PEFT assumes (experts, in, out), but the Mixtral MoE is transposed,
  so we need to patch this assumption in PEFT for now (see the sketch after
  this list)
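A sketch of the rank_pattern idea, with hypothetical module names: because the gate and up projections are concatenated into a single fused parameter, PEFT has to allocate rank 2*r for it so the LoRA weights line up with the converted checkpoint.

```python
from peft import LoraConfig

r = 8
lora_config = LoraConfig(
    r=r,
    target_modules=["gate_up_proj", "down_proj"],  # hypothetical target names
    # Double the rank only for the concatenated gate/up projection parameter.
    rank_pattern={"gate_up_proj": 2 * r},
)

# Layout mismatch mentioned above: PEFT's MoE handling assumes
# (experts, in_features, out_features), while the Mixtral experts store the
# transposed layout, so the converted weights are transposed to match
# (e.g. something like `weight.transpose(1, 2)` during conversion).
```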
* Make style
* Fix error messages
* hardcode checking if .bin works
* fix another test
* fix regex renaming patterns
* nits
* help debug tests
* style
* Patch `update_layer` instead of `_get_in_out_features`
The latter does not exist in released PEFT versions and
therefore is not an ideal target for this PR :)
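A minimal sketch of that patching approach, assuming the released PEFT layout (`peft.tuners.lora.layer.LoraLayer`); the hook body is a placeholder, not the PR's actual patch:

```python
from peft.tuners.lora.layer import LoraLayer

_original_update_layer = LoraLayer.update_layer

def _patched_update_layer(self, *args, **kwargs):
    # Adjust expert-weight layout assumptions here, before PEFT allocates
    # the standard LoRA parameters (placeholder for the actual fix).
    return _original_update_layer(self, *args, **kwargs)

# `update_layer` exists in released PEFT versions, unlike the private
# `_get_in_out_features` helper, so it is the safer target to wrap.
LoraLayer.update_layer = _patched_update_layer
```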
* Handle Qwen2 conversion similarly to Mixtral
* updates, explicit, simplify
* style
* nit
* fix `httpx.LocalProtocolError: Illegal header value b'unknown/None; hf_hub/1.3.2; python/3.13.2; torch/2.9.1; transformers/5.0.0.dev0;`
* some of the last nits
---------
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: nemo <git@ningu.net>