DeepSpeed
bf16+pipeline parallelism
#1801
Merged


tjruwase merged 52 commits into master from olruwase/bf16-updates
jeffra bf16 updates
fb0dc00f
tjruwase Got bf16 working
6eb4f1fa
tjruwase fp32 reduction; flattened tensors
a3d3576e
tjruwase bf16+zero_stage_1 first cut
6f5ebc37
tjruwase finish zero_stage 1 sharding
819abe2a
tjruwase Matching fp16 with debugging codes
e48035b7
tjruwase Matching loss with fp16
82450539
tjruwase Fix gradient clipping
15293139
tjruwase bf16 gradient clipping fix
27e5b956
tjruwase requested a review from ShadenSmith 3 years ago
tjruwase requested a review from jeffra 3 years ago
tjruwase requested a review from samyam 3 years ago
tjruwase requested a review from conglongli 3 years ago
tjruwase requested a review from awan-10 3 years ago
tjruwase requested a review from cli99 3 years ago
tjruwase requested a review from eltonzheng 3 years ago
tjruwase requested a review from minjiaz 3 years ago
tjruwase requested a review from RezaYazdaniAminabadi 3 years ago
tjruwase force pushed from ed26ef43 to 27e5b956 3 years ago
tjruwase Unscale grad norm
f4977024
tjruwase Fix grad norm scaling
0ad7c7d3
tjruwase Enable loading fp16_zero_1 into bf16_zero_1 engine and vice versa
b81d862f
tjruwase Fix clip_grad key error
35ea3808
tjruwase Reduce tied weight gradients
37011a92
tjruwase Rebase with master
8fbd4bfd
tjruwase Fix grad norm for moe
61d51fd6
tjruwase removed review request from conglongli 3 years ago
tjruwase removed review request from awan-10 3 years ago
tjruwase removed review request from cli99 3 years ago
tjruwase removed review request from minjiaz 3 years ago
tjruwase removed review request from RezaYazdaniAminabadi 3 years ago
tjruwase requested a review from duli2012 3 years ago
jeffra Merge branch 'master' into olruwase/bf16-updates
3ee61cdb
jeffra Merge branch 'master' into olruwase/bf16-updates
46cc2ce3
tjruwase Reduce specified gradients
de3616ca
tjruwase Merge branch 'olruwase/reduce_specified_gradients' of github.com:micr…
89e054d8
tjruwase Use O(n) instead of O(n^2)
ab61edb0
tjruwase Remove optimizer restriction for bf16
b7d64fd7
tjruwase Link bf16 & fp32 params
19198688
tjruwase Clip gradients of last stage tied weights
77b649d1
tjruwase Merge branch 'master' into olruwase/bf16-updates
4a505ecd
tjruwase Merge branch 'master' into olruwase/bf16-updates
ff99cb25
jeffra commented on 2022-03-15
jeffra Merge branch 'master' into olruwase/bf16-updates
20fdba35
ShadenSmith approved these changes on 2022-03-15
tjruwase Merge branch 'master' into olruwase/bf16-updates
86fa437d
jeffra Merge branch 'master' into olruwase/bf16-updates
71499a8a
jeffra Merge branch 'master' into olruwase/bf16-updates
7e7fa60b
tjruwase Simplify tied weights reduction logic
2aa612a6
tjruwase Merge branch 'master' into olruwase/bf16-updates
2cd21f15
tjruwase Merge branch 'olruwase/bf16-updates' of github.com:microsoft/DeepSpee…
a4cbf0c1
tjruwase Merge branch 'master' into olruwase/bf16-updates
67ea260f
tjruwase Merge branch 'master' into olruwase/bf16-updates
6a4d6e67
tjruwase Merge branch 'master' into olruwase/bf16-updates
4e1dcfd1
tjruwase Also clip all tp rank parameters
e24814a1
tjruwase Merge branch 'olruwase/bf16-updates' of github.com:microsoft/DeepSpee…
88cdf61c
tjruwase lp to hp mapping
20697bc4
tjruwase Link lp/hp/optim state; Refresh links after checkpoint load
4e8f7fff
tjruwase Merge branch 'master' into olruwase/bf16-updates
52a2f109
tjruwase Merge branch 'olruwase/bf16-updates' of github.com:microsoft/DeepSpee…
3ed57035
tjruwase Remove debug print
5481b864
tjruwase Remove debug print
d911e672
thomasw21 commented on 2022-03-29
tjruwase Simplify zero_grad logic
144f6527
tjruwase fp32 accessors
bb70816f
tjruwase Merge branch 'master' into olruwase/bf16-updates
89b4b3f1
tjruwase Merge branch 'master' into olruwase/bf16-updates
a9bfaee9
thomasw21 commented on 2022-03-31
tjruwase Fix update bug
fa4ff11d
tjruwase Merge branch 'olruwase/bf16-updates' of github.com:microsoft/DeepSpee…
cfd56385
tjruwase Merge branch 'master' into olruwase/bf16-updates
5ea1c60f
tjruwase Merge branch 'master' into olruwase/bf16-updates
0e2a1c50
tjruwase merged 56c52238 into master 3 years ago
kisseternity commented on 2022-08-22
mrwyattii deleted the olruwase/bf16-updates branch 2 years ago
