Extend save_pretrained to offloaded models (#27412)
* added hidden subset
* debugged hidden subset contrastive search
* added contrastive search compression
* debugged compressed contrastive search
* memory reduction for contrastive search
* debugged mem red
* added low memory option feature
* debugged mem optimization output stack
* debugged mem optimization output stack
* debugged low mem
* added low mem cache
* fixed 2047 tensor view
* debugged 2042 past key val inputs
* reformatted tensors
* changed low mem output
* final clean
* removed subset hidden csearch
* fixed hidden device
* fixed hidden device
* changed compressor dtype
* removed hstate compression
* integrated csearch in generate
* test csearch integration into generation
* fixed csearch kwarg integration with generation
* final wrap and added doc
* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* added debug print
* direct hstate cat
* direct hstate cat
* direct hstate cat debug
* direct hstate cat debug
* expanded full hidden state stack
* expanded full hidden state stack
* matched dims for hstates
* matched dims for hstates
* logits fix
* equality test
* equality hidden debug
* debug
* added prints for debug
* added prints for debug
* equality check
* switched squeeze dim
* input format debug
* tracing top_k_ids
* removed trace
* added test context
* added jitter
* added jitter
* added jitter
* returned state
* rebuilt past key value reconstruction
* debugged
* cleaned traces
* added selection for pkv
* changed output to dict
* cleaned
* cleaned
* cleaned up contrastive search test
* moved low_memory kwarg
* debugged
* changed low mem test batch size to 1
* removed output
* debugged test input shape
* reformatted csearch test
* added trace
* removed unsqueeze on final forward pass
* replaced unsqueeze with view
* removed traces
* cleaned
* debugged model kwargs
* removed special models from test
* ran make quality
* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* refactored
* refactored
* refactored
* make fixup
* renamed flag sequential
* renamed flag sequential
* iterative onloading
* black style and test utils
* added traces for integrated test
* debugged
* added traces
* make style
* removed traces, make style
* included suggestions and added test
* debugged test
* added offload module check and make style
* is_accelerate_available and make style
* added test decorator
* changed test model and config spec
* added offload condition
* added lazy loading for each shard
* debugged
* modified sharding
* debugged
* added traces
* removed safe serialization
* no index overload
* trace on safe save ptrs
* added ptr condition
* debugged
* debugged ptr
* moved module map init
* remake shard only for offloaded modules
* refactored
* debugged
* refactored
* debugged
* cleaned and make style
* cleaned and make style
* added trace
* sparse module map
* debugged
* removed module map conditional
* refactored
* debug
* debugged
* added traces
* added shard mem trace
* added shard mem trace
* removed underlying storage check
* refactored
* memory leak removal and make style
* cleaned
* swapped test decs and make style
* added mem checks and make style
* added free mem warning
* implemented some suggestions
* moved onloading to accelerate
* refactored for accelerate integration
* cleaned test
* make style
* debugged offload map name
* cleaned and make style
* replaced meta device check for sharding
* cleaned and make style
* implemented some suggestions
* more suggestions
* update warning
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* more suggestions
* make style
* new make style
* Update src/transformers/modeling_utils.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>