DeepSpeed
Fix potential random layout inconsistency issues in sparse attention modules
#534
Merged

Commits
  • 1) Register layout as a buffer of the module so that checkpoints can be saved/loaded correctly; 2) Add a broadcast of the layout at the beginning to ensure different processes have a consistent layout during distributed training.
    Zhun Liu committed 5 years ago
  • Add docstring for max_seq_length argument in SparseSelfAttention
    zhunliu committed 5 years ago
  • Merge branch 'master' into master
    jeffra committed 5 years ago
  • Merge branch 'master' into master
    jeffra committed 5 years ago
  • Merge branch 'master' into master
    jeffra committed 5 years ago
  • Merge branch 'master' into master
    jeffra committed 5 years ago
  • Merge branch 'master' into master
    jeffra committed 5 years ago
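
The two fixes in the first commit can be sketched in PyTorch as follows. This is an illustrative sketch, not DeepSpeed's actual `SparseSelfAttention` implementation: the class name `SparseAttentionSketch` and the layout shape are assumptions, but the two mechanisms shown are the ones the commit describes — `register_buffer` so the layout is part of `state_dict()` and survives checkpoint save/load, and `torch.distributed.broadcast` from rank 0 so a randomly generated layout is identical on every process.

```python
import torch
import torch.nn as nn
import torch.distributed as dist


class SparseAttentionSketch(nn.Module):
    """Hypothetical module illustrating the two fixes from the commit."""

    def __init__(self, layout: torch.Tensor):
        super().__init__()
        # Fix 1: register the layout as a buffer instead of a plain Python
        # attribute, so it is included in state_dict() and therefore
        # saved/restored with checkpoints.
        self.register_buffer("layout", layout)
        # Fix 2: if distributed training is initialized, broadcast the
        # layout from rank 0 so a randomly generated layout is consistent
        # across all processes.
        if dist.is_available() and dist.is_initialized():
            dist.broadcast(self.layout, src=0)


# Example: a random block-sparse layout (shape is illustrative).
layout = (torch.rand(1, 4, 4) > 0.5).long()
module = SparseAttentionSketch(layout)
# The layout now round-trips through checkpointing via state_dict().
assert "layout" in module.state_dict()
```

Without fix 1, reloading a checkpoint would silently regenerate a different random layout; without fix 2, each rank could generate its own layout and the attention patterns would diverge across processes.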