transformers
f5722a54 - [DeepSpeed] add weight_mapping to _load_state_dict_into_zero3_model (#43303)

Commit
3 days ago
* add weight_mapping to _load_state_dict_into_zero3_model
* initial test
* simpler
* preserve metadata
* do not call rename_source_key twice
* raise error instead of warning
* Free memory by clearing source tensors
* formatting
* comment about deepspeed partitioning
* error out if trying to use deepspeed + DEEPSPEED_TP
* supported soon
* use LoadStateDictConfig
* Update src/transformers/integrations/deepspeed.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/integrations/deepspeed.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/integrations/deepspeed.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/integrations/deepspeed.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/integrations/deepspeed.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/integrations/deepspeed.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* iterate by key
* fix formatting
* free memory as tensors are processed
* create a meta dict for matching

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>