transformers
3af2eb7a - Fix PEFT x MoEs (#43261)

Commit
93 days ago
Fix PEFT x MoEs (#43261)

* current changes
* finally!
* collection is giid
* what kinda works
* nit
* fix name
* small nits
* introduce loading info and config?
* try to remove some duplication
* trying to simplify its really not that hard is it?
* nit
* is this better?
* update
* fix
* better?
* small fix
* force change lora
* push
* up
* replace gate_up_
* push
* Updated Ben (#43319)
* Getting closer (#43327)

  It was necessary to flatten the LoRA weights for 3d MoE, as LoRA always expected 2d weights (being nn.Linear).
* style
* bring back eval()
* nits
* Revert "bring back eval()"

  This reverts commit bcee589f720ad6ab43b0cda1eccf6c406427fd29.
* fix quantizer
* fix
* fix key mapping not recognized
* fix kwargs shinannigans
* fix more kwargs passing
* up
* fix `use_safetensors=False` call?
* nits?
* properly pass use_safetensors=False
* fix
* style
* defaut factory
* style
* simplify
* fix custom adapter_state_dict
* small updates
* nit
* style
* Fix mixtral loading
  * rank needed to be set to 2*r for concatenated gate up projection parameter so that PEFT allocates 2*r and matches the converted weights (using rank_pattern)
  * the weights needed to be transposed to match the counter parts
  * MoE in PEFT assumes (experts, in, out) but Mixtral MoE is transposed so we need to patch this assumption in PEFT for now
* Make style
* Fix error messages
* hardcode checking if .bin works
* fix another test
* fix regex renaming patterns
* nits
* help debug tests
* style
* Patch `update_layer` instead of `_get_in_out_features`

  The latter does not exist in released PEFT versions and therefore is not an ideal target for this PR :)
* Handle Qwen2 conversion similarly to mixtral
* updates, explicit, simplify
* style
* nit
* fix `httpx.LocalProtocolError: Illegal header value b'unknown/None; hf_hub/1.3.2; python/3.13.2; torch/2.9.1; transformers/5.0.0.dev0;`
* some of the last nits

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: nemo <git@ningu.net>
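
The "rank needed to be set to 2*r" note follows from simple matrix arithmetic, which the sketch below illustrates. It is not code from this PR: `fuse_lora`, `matmul`, and `rand_matrix` are hypothetical helpers, and the `(out, in)` weight layout with gate stacked above up is an assumption made for the illustration (the commit message notes that the real Mixtral layout is transposed relative to what PEFT expects). Two separate rank-r LoRA pairs for `gate` and `up` can be written as a single pair for the concatenated projection only if the fused pair has rank 2r: the A matrices are stacked and the B matrices are placed block-diagonally.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rand_matrix(rows, cols, rng):
    """Small integer matrix, so that equality checks are exact."""
    return [[rng.randint(-3, 3) for _ in range(cols)] for _ in range(rows)]


def fuse_lora(a_gate, b_gate, a_up, b_up):
    """Fuse two LoRA pairs into one pair for a stacked [gate; up] weight.

    Assumed shapes (illustration only): A is (r, in), B is (out, r).
    Returns A_fused of shape (2r, in) and B_fused of shape (2*out, 2r).
    """
    r_gate, r_up = len(a_gate), len(a_up)
    # Stack the A matrices along the rank dimension.
    a_fused = a_gate + a_up
    # Place the B matrices block-diagonally so each half only sees its own A.
    b_fused = [row + [0] * r_up for row in b_gate] + [[0] * r_gate + row for row in b_up]
    return a_fused, b_fused


rng = random.Random(0)
in_features, out_features, r = 5, 4, 2

a_gate = rand_matrix(r, in_features, rng)
a_up = rand_matrix(r, in_features, rng)
b_gate = rand_matrix(out_features, r, rng)
b_up = rand_matrix(out_features, r, rng)

a_fused, b_fused = fuse_lora(a_gate, b_gate, a_up, b_up)

# Update for the fused projection versus the two separate updates stacked row-wise.
delta_fused = matmul(b_fused, a_fused)
delta_separate = matmul(b_gate, a_gate) + matmul(b_up, a_up)

print(len(a_fused))                   # rank of the fused adapter: 2 * r
print(delta_fused == delta_separate)  # True
```

Because the fused adapter carries both updates at once, a rank-r allocation for the concatenated parameter would be too small to hold the converted weights, which is consistent with the commit's use of `rank_pattern` to request 2*r for that parameter.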