xla
ZeRO1: Add bucketting logic to control the size of tensors for all-gather/reduce-scatter
#6025
Merged
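As a rough illustration of what size-capped bucketing means here, the sketch below groups consecutive tensors so that each bucket's total size stays under a cap. It is not the code from this PR: the function name `bucket_by_size`, its parameters, and the greedy grouping policy are all hypothetical, and the collective calls themselves are left out.

```python
from typing import List, Sequence


def bucket_by_size(sizes_bytes: Sequence[int],
                   bucket_cap_bytes: int) -> List[List[int]]:
    """Greedily group consecutive tensor indices into size-capped buckets.

    Hypothetical helper for illustration only; not the PR's implementation.
    A tensor larger than the cap is placed in a bucket of its own.
    """
    if bucket_cap_bytes <= 0:
        raise ValueError("bucket_cap_bytes must be positive")

    buckets: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for idx, size in enumerate(sizes_bytes):
        # Close the current bucket if adding this tensor would exceed the cap.
        if current and current_bytes + size > bucket_cap_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(idx)
        current_bytes += size
    # Flush the last bucket; an empty bucket is never emitted.
    if current:
        buckets.append(current)
    return buckets
```

Under these assumptions, each returned bucket would be handed to a single all-gather or reduce-scatter call, so the cap bounds how much data one collective moves at a time.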
JackCaoG merged 15 commits into master from jeffhataws_zero1_fixes2
jeffhataws force pushed from 62c5109a to e692a81c 1 year ago
jeffhataws force pushed from 7c3d92da to 84a509d0 1 year ago
jeffhataws added backport_2.2
jeffhataws requested a review from alanwaketan 1 year ago
jeffhataws requested a review from JackCaoG 1 year ago
jeffhataws force pushed from 84a509d0 to 285a766c 1 year ago
jeffhataws force pushed from a4532576 to 6022c917 1 year ago
JackCaoG added backport_2.3
jeffhataws commented on 2024-03-13
jeffhataws commented on 2024-03-13
jeffhataws commented on 2024-03-15
add bucketting logic to control the size of tensors for all-gather an… (90eda151)
Yapf lint fixes (46a069af)
handle the case when groups is none (8e79997b)
update zero1 (5a87467e)
yapf lint fixes (b354c277)
Fix missing curly brackets in assertion msg (22e29d37)
Fixing FAL issue when sharded params are initialized with torch.double (96c61cd6)
Yapf fixes (6b7ce8fa)
Fix indices and variable names (a5de71af)
Checking of <tensor>.numel for output tensors cause error in GPU runtime (77b2ad17)
jeffhataws force pushed from a8f050e5 to 77b2ad17 1 year ago
jeffhataws commented on 2024-03-16
Avoid passing empty input buckets (ae348b24)
hgt312 commented on 2024-03-19
jeffhataws force pushed from 173ef47d to 13965fd3 1 year ago
Fix indent for 2 lines in ZeRO1 (shard.grad = grad_shard, index += 1) (85863703)
jeffhataws force pushed from 13965fd3 to 85863703 1 year ago
jeffhataws commented on 2024-03-20
Refactor bucketized all-gather/reduce-scatter functions; add bucket_c… (675e7a11)
jeffhataws force pushed from ec4b1e05 to 675e7a11 1 year ago
jeffhataws requested a review from hgt312 1 year ago
hgt312 approved these changes on 2024-03-20
JackCaoG commented on 2024-03-20
JackCaoG commented on 2024-03-20
JackCaoG commented on 2024-03-20
JackCaoG commented on 2024-03-20
JackCaoG commented on 2024-03-20
JackCaoG commented on 2024-03-20
Refactor bucketing logic into a class, shared by all-gather/reduce-sc… (d7c99588)
Remove bucket-cap division logic; separate bucket cap for allgather/r… (5006388f)
jeffhataws requested a review from JackCaoG 1 year ago
JackCaoG approved these changes on 2024-03-21
jeffhataws commented on 2024-03-22
JackCaoG merged e75677f1 into master 1 year ago
jeffhataws deleted the jeffhataws_zero1_fixes2 branch 306 days ago
Reviewers: JackCaoG, hgt312, alanwaketan
Assignees: No one assigned
Labels: backport_2.2, backport_2.3
Milestone: No milestone