[CI] diff driven test selection (#8077)

Commit

8 days ago

TLDR: Analyze the PR's diff and filter out tests that don't exercise what has changed, potentially cutting runtime and expense by 95-99% most of the time.

Detailed: DeepSpeed's CI takes forever, most of the time burning $$ and wasting dev time for no reason, since most changes require just a few tests to run.

HF Transformers has a system that selects which tests to run based on the PR's diff. Sylvain Gugger wrote it many years ago, since that repo probably has thousands of tests by now. DeepSpeed's CI isn't too bad, but it can easily take hours.

So I asked Claude Opus 4.8 to replicate the system for DeepSpeed. Please have a look. It looks super complicated, so I'm not sure how easy it'd be to maintain/operate unless we always use AI to continue maintaining it. I asked Claude to leave a detailed state file so that it or another model could pick up where it left off.

I started with just the slowest, costliest workload, `.github/workflows/modal-torch-latest.yml`, to see if it works well. If you're happy, we can replicate it to the rest of the workloads.

CC: @loadams, @tjruwase - please tag others if you think they would be helpful to discuss this.

---------

Signed-off-by: Stas Bekman <stas@stason.org>
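To make the idea in the TLDR concrete, here is a minimal sketch of diff-driven test selection. It is not the code from this PR: the function name `select_tests`, the `test_deps` mapping, and all file paths are hypothetical, and the sketch assumes a dependency map from each test file to the source files it exercises has already been built. Any changed file that the map does not know about (for example a build or CI config file) conservatively triggers the full suite.

```python
from typing import Dict, Iterable, List, Set


def select_tests(changed_files: Iterable[str],
                 test_deps: Dict[str, Set[str]]) -> List[str]:
    """Return the tests worth running for a set of changed files.

    test_deps maps a test file to the source files it exercises
    (a hypothetical, precomputed dependency map).
    """
    changed = set(changed_files)

    # Every file the map knows about: the tests themselves plus their deps.
    known = set(test_deps) | set().union(*test_deps.values())

    # Conservative fallback: an unmapped file could affect anything,
    # so run the whole suite.
    if any(path not in known for path in changed):
        return sorted(test_deps)

    # Keep a test if it was edited itself or if one of its deps changed.
    return sorted(
        test for test, deps in test_deps.items()
        if test in changed or deps & changed
    )


# Hypothetical dependency map used only for illustration.
EXAMPLE_DEPS = {
    "tests/test_a.py": {"pkg/a.py"},
    "tests/test_b.py": {"pkg/a.py", "pkg/b.py"},
}
```

With this map, a change to `pkg/b.py` alone selects only `tests/test_b.py`, while a change to an unmapped file such as `setup.py` falls back to both tests. The fallback trades some savings for safety, which matters when a skipped test could hide a regression.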

References

#8077 - [CI] diff driven test selection

Author

stas00

Parents

a1632458

DeepSpeed f3460829 - [CI] diff driven test selection (#8077)