DeepSpeed
Tensor parallelism for Mixture of Experts
#2074
Merged

Commits
  • add tensor parallelism support for non-expert groups
    siddharth9820 committed 3 years ago
  • non-expert tensor parallelism - drop tokens before a2a
    siddharth9820 committed 3 years ago
  • support tensor parallelism for non-experts
    siddharth9820 committed 3 years ago
  • fix formatting
    siddharth9820 committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    siddharth9820 committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    siddharth9820 committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    siddharth9820 committed 3 years ago
  • migrate code for dropping tokens from megatron
    siddharth9820 committed 3 years ago
  • change gather function name
    siddharth9820 committed 3 years ago
  • fall back to previous error message
    siddharth9820 committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    siddharth9820 committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    siddharth9820 committed 3 years ago
  • formatting changes
    siddharth9820 committed 3 years ago
  • change function names
    siddharth9820 committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    siddharth9820 committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    siddharth9820 committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    siddharth9820 committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    tjruwase committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    siddharth9820 committed 3 years ago
  • fix number of local experts
    siddharth9820 committed 3 years ago
  • Merge branch 'moe-tensor-parallelism' of github.com:microsoft/DeepSpeed into moe-tensor-parallelism
    siddharth9820 committed 3 years ago
  • fix documentation
    siddharth9820 committed 3 years ago
  • correct log statement
    siddharth9820 committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    siddharth9820 committed 3 years ago
  • roll back ep-size setting code
    siddharth9820 committed 3 years ago
  • Merge branch 'moe-tensor-parallelism' of github.com:microsoft/DeepSpeed into moe-tensor-parallelism
    siddharth9820 committed 3 years ago
  • add detailed comments
    siddharth9820 committed 3 years ago
  • restore function in groupy.py
    siddharth9820 committed 3 years ago
  • better comments
    siddharth9820 committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    siddharth9820 committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    siddharth9820 committed 3 years ago
  • remove code that changes ep_size and convert it to asserts
    siddharth9820 committed 3 years ago
  • correct groups
    siddharth9820 committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    siddharth9820 committed 3 years ago
  • correction
    siddharth9820 committed 3 years ago
  • add copyright
    siddharth9820 committed 3 years ago
  • correction
    siddharth9820 committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    siddharth9820 committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    siddharth9820 committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    siddharth9820 committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    awan-10 committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    siddharth9820 committed 3 years ago
  • formatting changes
    siddharth9820 committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    siddharth9820 committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    siddharth9820 committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    tjruwase committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    siddharth9820 committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    tjruwase committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    siddharth9820 committed 3 years ago
  • add unit tests
    siddharth9820 committed 3 years ago
  • small change
    siddharth9820 committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    siddharth9820 committed 3 years ago
  • remove amp from tests
    siddharth9820 committed 3 years ago
  • Merge branch 'moe-tensor-parallelism' of github.com:microsoft/DeepSpeed into moe-tensor-parallelism
    siddharth9820 committed 3 years ago
  • Merge branch 'master' into moe-tensor-parallelism
    tjruwase committed 3 years ago