[DeepSpeed] add weight_mapping to _load_state_dict_into_zero3_model (#43303)
* add weight_mapping to _load_state_dict_into_zero3_model
* initial test
* simpler
* preserve metadata
* do not call rename_source_key twice
* raise error instead of warning
* Free memory by clearing source tensors
* formatting
* comment about deepspeed partitioning
* error out if trying to use deepspeed + DEEPSPEED_TP
* supported soon
* use LoadStateDictConfig
* Update src/transformers/integrations/deepspeed.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* iterate by key
* fix formatting
* free memory as tensors are processed
* create a meta dict for matching
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>