DeepSpeed
4fc2c8e7 - Fix llama meta tensor loading in AutoTP and kernel injected inference (#3608)

Fix llama meta tensor loading in AutoTP and kernel injected inference (#3608)

* Adapt to Llama when using meta tensor to load
* Fix gated mlp parameter mp
* Re-enable meta tensor for kernel injection; fix layer params loading in meta tensor
* Revert mlp_inter_mp for gated mlp as it is fixed
* Monkey patch for fixing llama output
* Fix formatting
* Add comment

---------

Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Co-authored-by: Lev Kurilenko <113481193+lekurile@users.noreply.github.com>
Files changed:
  • deepspeed/module_inject/containers/llama.py
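
For context, meta tensor loading builds the model skeleton without allocating real weight storage, and DeepSpeed then materializes the weights from checkpoint shards while applying AutoTP or kernel injection; this is the path the commit fixes for Llama. Below is a minimal sketch of that flow, assuming the Hugging Face transformers API. The model name, tp_size, and checkpoint descriptor are illustrative, not taken from the commit.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

import deepspeed

# Build the Llama model skeleton on the meta device: no real weight
# storage is allocated, so even large checkpoints fit on one host.
# (Model name is illustrative.)
config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")
with torch.device("meta"):
    model = AutoModelForCausalLM.from_config(config)

# init_inference with a checkpoint descriptor lets DeepSpeed materialize
# the meta tensors shard-by-shard while sharding them across ranks
# (AutoTP) or replacing layers with fused kernels (kernel injection).
# tp_size and the checkpoint json path are illustrative.
engine = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": 2},
    dtype=torch.float16,
    checkpoint="checkpoints.json",
    replace_with_kernel_inject=True,
)
```

With `replace_with_kernel_inject=True`, the container in deepspeed/module_inject/containers/llama.py handles mapping the Llama checkpoint parameters (including the gated MLP weights mentioned in the commit message) onto the injected modules.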