DeepSpeed
43166463 - Sort and dedupe -gencode flags emitted by op_builder.builder (#8021)

Commit
18 days ago
Sort and dedupe -gencode flags emitted by op_builder.builder (#8021)

## Summary

- Sort and dedupe `ccs` in `CUDAOpBuilder.compute_capability_args` so the emitted `-gencode` flags are deterministic regardless of the order in which architectures appear in `TORCH_CUDA_ARCH_LIST` or `cross_compile_archs`.
- This matches PyTorch's own canonicalisation, which already sorts the gencode sequence (noted in #7871 while investigating #7863).
- Also dedupe, so repeated arches do not produce duplicate `-gencode` entries.

## Why

Issue #7871 observed that PyTorch sorts `-gencode` flags, but DeepSpeed emits them in the order the entries appear in `TORCH_CUDA_ARCH_LIST`. That order dependence contributed to the regression discussed in #7863.

The non-JIT branch in `op_builder/builder.py` did not sort or dedupe before iterating over `self.ccs()`, so a setting such as `TORCH_CUDA_ARCH_LIST="8.0;7.5;8.0;7.0"` produced an out-of-order, duplicated flag sequence. The JIT branch already sorts (line 669), so this change brings the non-JIT branch in line.

## Changes

- `op_builder/builder.py`: after `filter_ccs`, sort and dedupe `ccs` by numeric `(major, minor)`, stripping any `+PTX` suffix for comparison. The downstream `+PTX` handling at the emission site is preserved.
- `tests/unit/ops/test_op_builder.py`: new `test_non_jit_branch_sorts_and_dedupes_gencode_flags` covering the unsorted, duplicated input case. The existing `test_non_jit_branch_unchanged` continues to pass.

## Test plan

- [x] `pytest tests/unit/ops/test_op_builder.py -x -v` (7 passed, including the new test and the prior non-JIT regression test)
- [x] `yapf` (no diff)
- [x] `codespell` (clean)

Fixes #7871

---------

Signed-off-by: Aditya Singh <adisin650@gmail.com>
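The canonicalisation described above can be sketched as standalone Python. This is a minimal sketch, not the code merged in this commit. The names `canonicalize_ccs`, `gencode_flags` and `_arch_key` are hypothetical. It assumes each entry is a plain `major.minor` string with an optional `+PTX` suffix. It also assumes that an arch listed both with and without `+PTX` collapses into a single `+PTX` entry, so that no PTX request is lost during dedupe.

```python
from typing import Dict, Iterable, List, Tuple

PTX_SUFFIX = "+PTX"


def _arch_key(cc: str) -> Tuple[int, int]:
    """Hypothetical helper: numeric (major, minor) key for e.g. '8.6+PTX'."""
    base = cc.strip()
    if base.endswith(PTX_SUFFIX):
        base = base[: -len(PTX_SUFFIX)]  # ignore +PTX when comparing arches
    major, minor = base.split(".")
    return int(major), int(minor)


def canonicalize_ccs(ccs: Iterable[str]) -> List[str]:
    """Hypothetical helper: sort arches by numeric (major, minor), drop repeats.

    Assumption: if the same arch appears with and without '+PTX', the merged
    entry keeps '+PTX'.
    """
    wants_ptx: Dict[Tuple[int, int], bool] = {}
    for cc in ccs:
        cc = cc.strip()
        if not cc:
            continue  # tolerate empty fields, e.g. from a trailing ';'
        key = _arch_key(cc)
        wants_ptx[key] = wants_ptx.get(key, False) or cc.endswith(PTX_SUFFIX)
    return [
        f"{major}.{minor}" + (PTX_SUFFIX if wants_ptx[(major, minor)] else "")
        for major, minor in sorted(wants_ptx)  # integer tuples, not strings
    ]


def gencode_flags(ccs: Iterable[str]) -> List[str]:
    """Hypothetical helper: one SASS flag per arch, plus PTX for '+PTX' entries."""
    args = []
    for cc in canonicalize_ccs(ccs):
        major, minor = _arch_key(cc)
        num = f"{major}{minor}"
        args.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if cc.endswith(PTX_SUFFIX):
            # Also embed PTX so the kernels can be JIT-compiled for newer GPUs.
            args.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return args


if __name__ == "__main__":
    # The out-of-order, duplicated input from the description above.
    for flag in gencode_flags("8.0;7.5;8.0;7.0".split(";")):
        print(flag)
```

Sorting on integer tuples rather than raw strings matters once two-digit majors appear. As strings, `"10.0"` sorts before `"7.5"`, but the `(major, minor)` key orders them as 7.5, then 10.0.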