DeepSpeed
ff1c5435 - fix memcpy issue on backward for zero-infinity (#6670)

Commit
1 year ago
fix memcpy issue on backward for zero-infinity (#6670)

This PR is similar to [PR#5301](https://github.com/microsoft/DeepSpeed/pull/5301): it reduces device-to-host (D2H) copy time by using pinned memory.

Previously, the D2H memcpy was the bottleneck during the final backward pass of each iteration for ZeRO-Infinity (offload), as shown in Trace-1. The new version eliminates that bottleneck, as shown in Trace-2.

_Trace-1_
<img width="480" alt="image" src="https://github.com/user-attachments/assets/891e3770-351b-4e03-8a59-b491bc44d03b">

_Trace-2_
<img width="192" alt="image" src="https://github.com/user-attachments/assets/f1cf9037-77f8-42a6-adc8-d5c6bacde0aa">

cc @tjruwase

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
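For readers unfamiliar with the technique, here is a minimal PyTorch sketch of a D2H copy into a preallocated pinned host buffer. It is not the code from this PR: the helper names `make_host_buffer` and `offload_to_host` are hypothetical, and the sketch assumes only that the speedup comes from copying into page-locked memory, which lets the copy run asynchronously instead of staging through pageable memory. On a machine without CUDA it falls back to an ordinary CPU buffer so it still runs.

```python
import torch


def make_host_buffer(numel: int, dtype: torch.dtype) -> torch.Tensor:
    """Allocate a reusable host buffer (hypothetical helper, not DeepSpeed API)."""
    # Pinned (page-locked) memory needs a CUDA runtime; otherwise use pageable memory.
    pin = torch.cuda.is_available()
    return torch.empty(numel, dtype=dtype, pin_memory=pin)


def offload_to_host(src: torch.Tensor, host_buf: torch.Tensor) -> torch.Tensor:
    """Copy src into the front of host_buf and return that view (hypothetical helper)."""
    dst = host_buf.narrow(0, 0, src.numel())
    # non_blocking=True only overlaps with compute when the destination is pinned.
    dst.copy_(src.reshape(-1), non_blocking=True)
    return dst


device = "cuda" if torch.cuda.is_available() else "cpu"
grad = torch.arange(8, dtype=torch.float32, device=device)  # stand-in for a gradient
host_buf = make_host_buffer(grad.numel(), grad.dtype)
host_grad = offload_to_host(grad, host_buf)

if device == "cuda":
    # The async copy must finish before the CPU reads host_grad.
    torch.cuda.current_stream().synchronize()
```

Allocating the buffer once and reusing it across iterations matters in practice, since pinning memory is itself expensive.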