xla
ZeRO1: Add bucketing logic to control the size of tensors for all-gather/reduce-scatter
#6025
Merged

Commits
  • add bucketing logic to control the size of tensors for all-gather and reduce-scatter
    aws-rhsoln committed 1 year ago
  • Yapf lint fixes
    jeffhataws committed 1 year ago
  • handle the case when groups is none
    aws-rhsoln committed 1 year ago
  • update zero1
    hgt312 committed 1 year ago
  • yapf lint fixes
    jeffhataws committed 1 year ago
  • Fix missing curly brackets in assertion msg
    jeffhataws committed 1 year ago
  • Fixing FAL issue when sharded params are initialized with torch.double
    amithrm committed 1 year ago
  • Yapf fixes
    jeffhataws committed 1 year ago
  • Fix indices and variable names
    jeffhataws committed 1 year ago
  • Checking of <tensor>.numel for output tensors causes an error in GPU runtime
    jeffhataws committed 1 year ago
  • Avoid passing empty input buckets
    jeffhataws committed 1 year ago
  • Fix indent for 2 lines in ZeRO1 (shard.grad = grad_shard, index += 1)
    jeffhataws committed 1 year ago
  • Refactor bucketized all-gather/reduce-scatter functions; add bucket_cap_mb arg
    jeffhataws committed 1 year ago
  • Refactor bucketing logic into a class, shared by all-gather/reduce-scatter
    jeffhataws committed 1 year ago
  • Remove bucket-cap division logic; separate bucket cap for allgather/reducescatter
    jeffhataws committed 1 year ago
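
For readers unfamiliar with the idea named in the title, here is a minimal sketch of size-capped bucketing in plain Python. It is not the code from this PR: the function name `bucketize` and its signature are hypothetical, and only the `bucket_cap_mb` name is borrowed from the commit messages above. The sketch assumes tensors are grouped greedily, in order, so that each bucket's total size stays under a cap, and that each bucket would then be passed to one collective call.

```python
from typing import List, Sequence

MB = 1024 * 1024


def bucketize(sizes_bytes: Sequence[int], bucket_cap_mb: float) -> List[List[int]]:
    """Group tensor indices into buckets whose total size stays under a cap.

    Hypothetical helper for illustration only. `sizes_bytes[i]` is the byte
    size of tensor i. A tensor larger than the cap gets a bucket of its own.
    """
    cap = bucket_cap_mb * MB
    buckets: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for idx, nbytes in enumerate(sizes_bytes):
        # Close the current bucket if adding this tensor would exceed the cap.
        if current and current_bytes + nbytes > cap:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(idx)
        current_bytes += nbytes
    # Emit the trailing bucket only if it is non-empty, so no empty
    # bucket is ever handed to a collective.
    if current:
        buckets.append(current)
    return buckets


sizes = [30 * MB, 30 * MB, 30 * MB, 100 * MB, 5 * MB]
print(bucketize(sizes, bucket_cap_mb=64))
```

In a real optimizer each returned index list would select the tensors for one all-gather or reduce-scatter call; using separate caps for the two collectives, as the last commit describes, would simply mean calling such a helper with different `bucket_cap_mb` values.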