DeepSpeed
4fea41f4 - Remove assumption that padding only occurs on last rank (#6974)

Commit
317 days ago
As discussed in [PR-6918](https://github.com/microsoft/DeepSpeed/pull/6918), padding can occur on multiple ranks with large DP degrees. For example, with:

- Flattened tensor size: 266240
- DP degree: 768
- Alignment: 1536
- Required padding: 1024 (1536 * 174 - 266240)
- Per-rank partition size: 348 (1536 * 174 / 768)

the padding occurs on the last three ranks. The 1024 padded elements exceed two full partitions (2 * 348 = 696), so the last two ranks hold only padding and the third-to-last rank holds the remaining 328 padded elements.

This PR removes the single-rank padding assumption so that these more general cases are handled.

---------

Co-authored-by: Sam Foreman <saforem2@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
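The arithmetic in the example from the commit message can be checked with a short sketch. This is not DeepSpeed's implementation: `padding_per_rank` is a hypothetical helper, and it assumes the flattened tensor is rounded up to a multiple of the alignment and then split into equal contiguous partitions, one per DP rank, with the alignment itself a multiple of the DP degree (as in the example, where 1536 = 2 * 768).

```python
def padding_per_rank(numel, dp_degree, alignment):
    """Hypothetical helper: report how many padded elements each rank holds.

    Assumes equal contiguous partitions and alignment % dp_degree == 0,
    which matches the numbers in the commit message but is otherwise
    an assumption of this sketch.
    """
    assert alignment % dp_degree == 0, "sketch assumes alignment is a multiple of DP degree"

    # Round the flattened size up to the next multiple of the alignment.
    padded_numel = -(-numel // alignment) * alignment
    partition_size = padded_numel // dp_degree

    pads = {}
    for rank in range(dp_degree):
        start = rank * partition_size
        end = start + partition_size
        # Elements of this partition that lie past the real data are padding.
        pad = max(0, end - max(start, numel))
        if pad:
            pads[rank] = pad
    return padded_numel, partition_size, pads


padded_numel, partition_size, pads = padding_per_rank(266240, 768, 1536)
print(padded_numel, partition_size, pads)
```

With the values from the commit message, the padding lands on ranks 765, 766, and 767, so any logic that looks for padding only on rank 767 would miss 676 of the 1024 padded elements.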