[saving] Simplify general logic (#42766)
* parallelize and clean up
* simplify offloading
* fix
* oops
* add env variable to deactivate
* revert threading, since safetensors does not release the GIL
* comment
* create helper
* move it to accelerate integration