DeepSpeed
4fc2c8e7 - Fix llama meta tensor loading in AutoTP and kernel injected inference (#3608)

Fix llama meta tensor loading in AutoTP and kernel injected inference (#3608)

* Adapt to Llama when using meta tensor to load
* Fix gated mlp parameter mp
* Re-enable meta tensor for kernel injection; fix layer params loading in meta tensor
* Revert mlp_inter_mp for gated mlp as it is fixed
* Monkey patch for fixing llama output
* Fix formatting
* Add comment

---------

Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Co-authored-by: Lev Kurilenko <113481193+lekurile@users.noreply.github.com>
Files changed:
  • deepspeed/module_inject/containers/llama.py
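
For context, meta tensor loading builds the model skeleton without allocating real weight storage, and DeepSpeed then materializes the weights from checkpoint shards while applying AutoTP or kernel injection; this is the path the commit fixes for Llama. Below is a minimal sketch of that flow, assuming the Hugging Face transformers API. The model name, tp_size, and checkpoint descriptor are illustrative, not taken from the commit.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

import deepspeed

# Build the Llama model skeleton on the meta device: no real weight
# storage is allocated, so even large checkpoints fit on one host.
# (Model name is illustrative.)
config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")
with torch.device("meta"):
    model = AutoModelForCausalLM.from_config(config)

# init_inference with a checkpoint descriptor lets DeepSpeed materialize
# the meta tensors shard-by-shard while sharding them across ranks
# (AutoTP) or replacing layers with fused kernels (kernel injection).
# tp_size and the checkpoint json path are illustrative.
engine = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": 2},
    dtype=torch.float16,
    checkpoint="checkpoints.json",
    replace_with_kernel_inject=True,
)
```

With `replace_with_kernel_inject=True`, the container in deepspeed/module_inject/containers/llama.py handles mapping the Llama checkpoint parameters (including the gated MLP weights mentioned in the commit message) onto the injected modules.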