xla
ZeRO1: Add bucketing logic to control the size of tensors for all-gather/reduce-scatter
#6025
Merged
Commits (15)
- add bucketing logic to control the size of tensors for all-gather and reduce-scatter (aws-rhsoln, 1 year ago)
- Yapf lint fixes (jeffhataws, 1 year ago)
- handle the case when groups is None (aws-rhsoln, 1 year ago)
- update zero1 (hgt312, 1 year ago)
- yapf lint fixes (jeffhataws, 1 year ago)
- Fix missing curly brackets in assertion msg (jeffhataws, 1 year ago)
- Fixing FAL issue when sharded params are initialized with torch.double (amithrm, 1 year ago)
- Yapf fixes (jeffhataws, 1 year ago)
- Fix indices and variable names (jeffhataws, 1 year ago)
- Checking <tensor>.numel for output tensors causes an error in the GPU runtime (jeffhataws, 1 year ago)
- Avoid passing empty input buckets (jeffhataws, 1 year ago)
- Fix indent for 2 lines in ZeRO1 (shard.grad = grad_shard, index += 1) (jeffhataws, 1 year ago)
- Refactor bucketized all-gather/reduce-scatter functions; add bucket_cap_mb arg (jeffhataws, 1 year ago)
- Refactor bucketing logic into a class, shared by all-gather/reduce-scatter (jeffhataws, 1 year ago)
- Remove bucket-cap division logic; separate bucket caps for all-gather/reduce-scatter (jeffhataws, 1 year ago)
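The commits above converge on a single bucketing helper, shared by all-gather and reduce-scatter, that coalesces tensors into buckets capped by a configurable size (the bucket_cap_mb argument) and avoids emitting empty buckets. A minimal sketch of that idea, with illustrative names and byte counts standing in for real tensors (this is not the actual torch_xla implementation):

```python
class Bucketizer:
  """Groups items into size-capped buckets for batched collective ops."""

  def __init__(self, bucket_cap_mb):
    self._cap = bucket_cap_mb * 1024 * 1024  # cap in bytes
    self._bucket, self._bytes = [], 0

  def add(self, item, nbytes):
    """Add an item; returns a full bucket when the cap would be exceeded."""
    flushed = None
    # Flush first if this item would overflow a non-empty bucket. A single
    # oversized item still gets a bucket of its own rather than being dropped.
    if self._bucket and self._bytes + nbytes > self._cap:
      flushed = self.flush()
    self._bucket.append(item)
    self._bytes += nbytes
    return flushed

  def flush(self):
    """Return the pending bucket (possibly empty) and reset state."""
    bucket, self._bucket, self._bytes = self._bucket, [], 0
    return bucket


def bucketize(items, sizes, bucket_cap_mb):
  """Split items (with per-item byte sizes) into size-capped buckets."""
  b = Bucketizer(bucket_cap_mb)
  buckets = []
  for item, nbytes in zip(items, sizes):
    full = b.add(item, nbytes)
    if full:
      buckets.append(full)
  last = b.flush()
  if last:  # never pass an empty final bucket to the collective
    buckets.append(last)
  return buckets
```

In the real optimizer each bucket would be flattened and fed to one all-gather or reduce-scatter call, amortizing per-op overhead; keeping separate caps for the two collectives (as the last commit does) lets each be tuned independently.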