DeepSpeed
c17dc33c - Using explicit GPU upcast for ZeRO-Offload (#6962)

Commit

163 days ago

Using explicit GPU upcast for ZeRO-Offload (#6962)

Following the discussion in [PR-6670](https://github.com/microsoft/DeepSpeed/pull/6670), explicit upcast is much more efficient than implicit upcast. This PR replaces the implicit upcast with an explicit one. The results on a 3B model are shown below:

| Option | BWD (ms) | Speedup |
|------------|-----|------|
| Before PR-6670 | 25603.30 | 1x |
| After PR-6670 | 1174.31 | 21.8x |
| After this PR | 309.2 | 82.8x |
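To illustrate the distinction the commit message draws, here is a minimal PyTorch sketch. It is not the code from `stage3.py`: the helper names `copy_grad_implicit` and `copy_grad_explicit`, the buffer size, and the use of bfloat16 are all assumptions made for illustration. In the implicit form, `copy_` receives a low-precision source and an fp32 destination and handles the dtype conversion itself. In the explicit form, the source is first cast to the destination dtype on its own device (the GPU, when one is available) and only then copied to the host buffer.

```python
import torch


def copy_grad_implicit(dst_fp32: torch.Tensor, src_lowp: torch.Tensor) -> None:
    # Implicit upcast: copy_ is given mismatched dtypes and converts internally.
    dst_fp32.copy_(src_lowp)


def copy_grad_explicit(dst_fp32: torch.Tensor, src_lowp: torch.Tensor) -> None:
    # Explicit upcast: cast on the source device first, then copy same-dtype data.
    dst_fp32.copy_(src_lowp.to(dtype=dst_fp32.dtype))


# Hypothetical setup: a low-precision gradient on the accelerator (falls back to
# CPU when CUDA is unavailable) and fp32 host buffers standing in for offloaded state.
use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
numel = 1024  # illustrative size only

grad_lowp = torch.randn(numel, device=device).to(torch.bfloat16)
host_implicit = torch.empty(numel, dtype=torch.float32, pin_memory=use_cuda)
host_explicit = torch.empty(numel, dtype=torch.float32, pin_memory=use_cuda)

copy_grad_implicit(host_implicit, grad_lowp)
copy_grad_explicit(host_explicit, grad_lowp)

# Both paths yield the same fp32 values; only where the cast happens differs.
same_values = torch.equal(host_implicit, host_explicit)
```

Both helpers produce identical results, so the choice between them is purely a performance matter. The timings in the table above come from the commit message, not from this sketch, and any difference observed here will depend on hardware and tensor size.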

References

#6962 - Using explicit GPU upcast for ZeRO-Offload

Author

xylian86

Parents

8d1bc0a0

Files (1)

deepspeed/runtime/zero
- stage3.py
