DeepSpeed
0b25630a - Add arctic model support by adding w2 to all_reduce (#6856)

Commit

1 year ago

Add arctic model support by adding w2 to all_reduce (#6856)

As title says.

The default behavior of the Arctic model produces shape issues with AutoTP because the MLP layer performs `w2 * act(w1*w3)`. However, the method provided to fix Mixtral-8x7B in #5257 does not work, since the Arctic MLP is also used within a ModuleList for the MoE. This results in MLP weights hiding behind individual experts as layers `#.w#`, which is not caught by the fix in #5257. This commit adds the check directly within replace, where it can check the actual layer names for the `w2` key in the model to patch with `all_reduce`.

---------

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
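
The name-based check described in the message can be illustrated with a minimal sketch. This is not DeepSpeed's actual code: the helper `needs_all_reduce`, the `ALL_REDUCE_KEYS` tuple, and the layer names are all hypothetical, chosen only to show how matching on the last component of a layer name catches `w2` even when it sits behind a ModuleList index such as `0.w2`.

```python
# Hypothetical illustration; names and helper are assumptions, not DeepSpeed API.
ALL_REDUCE_KEYS = ("w2",)  # assumed list of layer keys whose output needs all_reduce


def needs_all_reduce(layer_name: str, keys=ALL_REDUCE_KEYS) -> bool:
    """Return True when the last dotted component of the name is a listed key."""
    return layer_name.rsplit(".", 1)[-1] in keys


# Made-up layer names in the style of an MoE whose experts live in a ModuleList.
layer_names = [
    "experts.0.w1",
    "experts.0.w2",
    "experts.0.w3",
    "experts.1.w2",
    "residual_mlp.w2",
]

# Only the w2 projections are selected, regardless of the expert index in front.
to_patch = [name for name in layer_names if needs_all_reduce(name)]
print(to_patch)
```

Matching on the final name component rather than on the parent module type is what lets the check see through the per-expert indices.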

References

#6856 - Add arctic model support by adding w2 to all_reduce

Author

pi314ever

Parents

4cd1d974
