DeepSpeed
819af0e5 - Fix DeepCompile all-gather scheduler candidate selection (#8033)

Commit
34 days ago
Fix DeepCompile all-gather scheduler candidate selection (#8033)

This PR fixes issues with the heuristic in DeepCompile's scheduler:

- Fix a candidate-selection bug in `fast_free_schedule()`: the scheduler computed the zero-`free_acc_mem` candidate subset, but then sorted the full runnable set instead of that subset.
- Keep the existing local scheduling heuristic, but rank candidates with graph-local all-gather pressure metrics before release-side cost when a low-live release path is available.
- Add deterministic CPU-only FX scheduler regressions for the zero-free filter, pressure ordering, fallback candidate ordering, and single-all-gather ordering.

## Rationale

`fast_free_schedule()` is a local heuristic for reducing gathered-parameter live ranges. This patch keeps that model, but fixes a general selection inconsistency: when at least one runnable candidate can reach release without additional all-gathers, the scheduler should choose from that zero-`free_acc_mem` subset. The previous code used the subset only as a branch condition, then ranked all runnable candidates by `free_cost`, so it could select a candidate that still required additional all-gathers before release.

After preserving the zero-`free_acc_mem` filter, the ordering uses only workload-independent graph pressure signals already available to the scheduler: scheduled all-gather count, all-gather byte pressure, release-side cost, and a stable node-name tie breaker.

In the fallback path, where every candidate still requires additional all-gathers, `free_acc_mem` remains the primary selector and the scheduler preserves the previous boundary of scheduling only through `schedule_until_ag`; this avoids making a memory-budget decision without tracking already-live gathered parameters.

## Testing

- `python -m pytest tests/unit/compile/test_list_schedule.py -q`
- `pre-commit run --all-files`

---------

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
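
## Illustrative sketch

The selection inconsistency described in the rationale can be illustrated with a small standalone sketch. This is not the DeepSpeed implementation: the `Candidate` dataclass, its field names `ag_count` and `ag_bytes`, and the functions `select_buggy` and `select_fixed` are hypothetical names introduced here, and the sketch assumes that lower values are preferred for every metric. Only `free_acc_mem` and `free_cost` are names taken from the commit message. The secondary key in the fallback path is also an assumption; the message states only that `free_acc_mem` stays the primary selector there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a runnable all-gather candidate."""
    name: str          # node name, used as a stable tie breaker
    free_acc_mem: int  # memory of additional all-gathers needed before release
    ag_count: int      # assumed field: scheduled all-gather count
    ag_bytes: int      # assumed field: all-gather byte pressure
    free_cost: int     # release-side cost


def select_buggy(candidates):
    """Pre-fix behavior as described: the zero-free subset is only a branch
    condition, and the full runnable set is ranked by free_cost."""
    zero_free = [c for c in candidates if c.free_acc_mem == 0]
    if zero_free:
        return min(candidates, key=lambda c: c.free_cost)  # wrong set
    return min(candidates, key=lambda c: c.free_acc_mem)


def select_fixed(candidates):
    """Post-fix behavior as described: rank within the zero-free subset by
    pressure metrics, then release-side cost, then node name."""
    zero_free = [c for c in candidates if c.free_acc_mem == 0]
    if zero_free:
        return min(
            zero_free,
            key=lambda c: (c.ag_count, c.ag_bytes, c.free_cost, c.name),
        )
    # Fallback: every candidate still needs more all-gathers, so
    # free_acc_mem is the primary selector (name tie breaker is assumed).
    return min(candidates, key=lambda c: (c.free_acc_mem, c.name))


runnable = [
    Candidate("a", free_acc_mem=100, ag_count=1, ag_bytes=10, free_cost=1),
    Candidate("b", free_acc_mem=0, ag_count=2, ag_bytes=64, free_cost=5),
    Candidate("c", free_acc_mem=0, ag_count=1, ag_bytes=128, free_cost=9),
]
fallback = [
    Candidate("x", free_acc_mem=200, ag_count=1, ag_bytes=10, free_cost=1),
    Candidate("y", free_acc_mem=100, ag_count=3, ag_bytes=90, free_cost=7),
]

print(select_buggy(runnable).name)
print(select_fixed(runnable).name)
print(select_fixed(fallback).name)
```

In this toy input, the pre-fix ranking picks `a` because it has the lowest `free_cost`, even though `a` still needs additional all-gathers before release. The post-fix ranking restricts the choice to `b` and `c` and picks `c` on all-gather count.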