DeepSpeed
7af3a4be - add zero3 `module_granularity_threshold` to zero optimization. (#6649)

Commit
1 year ago
This PR adds Z3 coalesced fetch to zero optimization. Some existing logic could already be reused for this, but it was hard to recognize as an optimization option (I only discovered it while trying to implement this change).

The benefit of this approach is lower host overhead: it removes many hooks and reduces the cost of recursively fetching parameters, especially in fine-grained models such as those with a large number of MoE experts. This is particularly helpful for host-sensitive devices (such as HPU), where it achieved a 40% performance improvement in our customer workloads.

FYI @delock @deepcharm

---------

Co-authored-by: Ma, Guokai <guokai.ma@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
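
For readers who want to try the option, here is a minimal sketch of a ZeRO stage 3 config that sets the threshold. The commit title only names `module_granularity_threshold`; the full key `stage3_module_granularity_threshold`, the helper `build_zero3_config`, and the value `32768` are assumptions made for illustration and should be checked against the DeepSpeed configuration documentation for your version.

```python
import json


def build_zero3_config(module_granularity_threshold: int = 0) -> dict:
    """Hypothetical helper: build a DeepSpeed-style config dict for ZeRO stage 3.

    The key name "stage3_module_granularity_threshold" is an assumption based on
    the option named in this commit; verify it against your DeepSpeed version.
    """
    return {
        "train_micro_batch_size_per_gpu": 1,
        "zero_optimization": {
            "stage": 3,
            # Assumed semantics: modules whose granularity falls below this
            # threshold are fetched as one coalesced unit rather than through
            # per-submodule hooks. 0 is assumed to leave the feature off.
            "stage3_module_granularity_threshold": module_granularity_threshold,
        },
    }


# Illustrative value only; a suitable threshold depends on the model.
config = build_zero3_config(module_granularity_threshold=32768)
print(json.dumps(config, indent=2))
```

The resulting dict (or its JSON dump) would typically be passed as the DeepSpeed config when initializing the engine. Models with many small submodules, such as MoE layers with numerous experts, are the case this commit describes as benefiting most.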