DeepSpeed
3110c388 - params partition for skip_init (#4722)

Commit
1 year ago
Some models use `skip_init` to initialize weights. `skip_init` first initializes the module on a meta device in the module's `__init__` and then calls `to_empty()`. This conflicts with DeepSpeed's mechanism for hooking `module.__init__`: `_post_init_method` must not run until `skip_init` has finished. However, because `from ... import skip_init` typically happens outside the context, there seems to be no good way to hook `skip_init` directly. The approach here is therefore to delay the execution of `_post_init_method`. Known affected models include HuggingFace models such as chatglm2 and chatglm3.

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
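
The ordering problem described in this commit message can be illustrated with a small pure-Python toy. This is only a sketch of the idea, not DeepSpeed's or PyTorch's code: `Param`, `Linear`, `toy_skip_init`, `post_init`, `hook_init`, and `run` are hypothetical names made up for the illustration, and the point at which the delayed step is flushed (when the hook is removed) is an assumption of the toy rather than a description of the actual change.

```python
import functools


class Param:
    """Toy stand-in for a tensor: it only records which device it lives on."""

    def __init__(self, device):
        self.device = device


class Linear:
    """Toy module whose weight is created on the requested device."""

    def __init__(self, device="cpu"):
        self.weight = Param(device)

    def to_empty(self, device):
        # Re-create the weight on the target device, uninitialized.
        self.weight = Param(device)
        return self


def toy_skip_init(module_cls):
    """Mimics the skip_init pattern: build on 'meta', then materialize."""
    return module_cls(device="meta").to_empty(device="cpu")


partition_log = []  # devices observed by the post-init step


def post_init(module):
    """Hypothetical stand-in for a post-init step that partitions params."""
    partition_log.append(module.weight.device)


def hook_init(cls, delayed):
    """Wrap cls.__init__; run post_init right away or defer it."""
    orig = cls.__init__
    pending = []

    @functools.wraps(orig)
    def wrapped(self, *args, **kwargs):
        orig(self, *args, **kwargs)
        if delayed:
            pending.append(self)  # defer until the hook is removed
        else:
            post_init(self)  # runs before to_empty() has been called

    cls.__init__ = wrapped

    def restore_and_flush():
        cls.__init__ = orig
        for module in pending:
            post_init(module)

    return restore_and_flush


def run(delayed):
    partition_log.clear()
    finish = hook_init(Linear, delayed)
    toy_skip_init(Linear)
    finish()
    return list(partition_log)


eager = run(delayed=False)    # post-init sees the 'meta' placeholder
deferred = run(delayed=True)  # post-init sees the materialized 'cpu' weight
print(eager, deferred)
```

In the eager case the post-init step fires as soon as `__init__` returns, while the weight is still a meta placeholder. Deferring it lets `to_empty()` run first, which is the behavior the commit message asks for.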