DeepSpeed
c17dc33c - Using explicit GPU upcast for ZeRO-Offload (#6962)

163 days ago
Following discussion in [PR-6670](https://github.com/microsoft/DeepSpeed/pull/6670), the explicit upcast turned out to be much faster than the implicit upcast, so this PR replaces the implicit upcast with an explicit one.

Backward-pass (BWD) time on a 3B-parameter model:

| Option | BWD (ms) | Speedup |
|----------------|----------|---------|
| Before PR-6670 | 25603.30 | 1x |
| After PR-6670 | 1174.31 | 21.8x |
| After this PR | 309.2 | 82.8x |
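The commit message does not show the diff to `stage3.py`. As a minimal sketch, assuming the change amounts to casting a half-precision gradient to fp32 on the GPU before copying it into an fp32 CPU buffer (the function names below are hypothetical, not DeepSpeed APIs), the two variants look roughly like this:

```python
import torch


def offload_grad_implicit(grad: torch.Tensor, cpu_fp32_buf: torch.Tensor) -> None:
    # Hypothetical helper. Implicit upcast: one copy_ changes both device and
    # dtype. For a blocking device-to-host copy, PyTorch may perform the
    # fp16 -> fp32 conversion on the CPU after the transfer.
    cpu_fp32_buf.copy_(grad)


def offload_grad_explicit(grad: torch.Tensor, cpu_fp32_buf: torch.Tensor) -> None:
    # Hypothetical helper. Explicit upcast: convert to fp32 on the gradient's
    # own device (the GPU under ZeRO-Offload), then do a same-dtype copy to host.
    cpu_fp32_buf.copy_(grad.float())


if __name__ == "__main__":
    # Falls back to CPU so the sketch also runs without a GPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    grad = torch.randn(1024, device=device, dtype=torch.float16)
    buf_implicit = torch.empty(1024, dtype=torch.float32)
    buf_explicit = torch.empty(1024, dtype=torch.float32)
    offload_grad_implicit(grad, buf_implicit)
    offload_grad_explicit(grad, buf_explicit)
    # fp16 -> fp32 is exact, so both paths must produce identical values.
    assert torch.equal(buf_implicit, buf_explicit)
```

Both paths give bit-identical results; only where the conversion runs differs. The explicit version allocates a temporary fp32 tensor on the GPU and moves twice as many bytes over PCIe as an fp16 transfer would, so the measured speedup presumably comes from keeping the elementwise conversion off the CPU.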
Files changed:
  • deepspeed/runtime/zero/stage3.py