pytorch
2ac6ee7f - Migrate jobs: `windows.4xlarge`->`windows.4xlarge.nonephemeral` (#100548)

1 year ago

Migrate jobs: `windows.4xlarge`->`windows.4xlarge.nonephemeral` (#100548)

This is a reopening of the PR https://github.com/pytorch/pytorch/pull/100377

# About this PR

Due to increased pressure on our Windows runners, and the elevated cost of instantiating and tearing down those instances, we want to migrate them from ephemeral to nonephemeral.

Possible impacts include breakages or misbehavior in CI jobs that leave the runners in a bad state. Other possible impacts relate to resource exhaustion, especially disk space (though memory might also be a contender), as CI leftovers pile up on those instances.

As a middle-of-the-road approach, nonephemeral instances are currently rotated stochastically: older instances get higher priority for termination when demand is lower.

The instance definitions can be found here: https://github.com/pytorch/test-infra/pull/4072

This is the first step in a multi-step approach in which we will migrate away from all ephemeral Windows instances and follow the lead of `windows.g5.4xlarge.nvidia.gpu`, in order to help reduce queue times for those instances.
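The stochastic rotation of nonephemeral instances can be pictured with a small sketch. It assumes a hypothetical `Instance` record carrying each runner's age and a caller that has already decided how many idle runners to drop because demand fell; `Instance` and `pick_for_termination` are illustrative names, and this is not the actual scale-down code in `pytorch/test-infra`. Each pick is drawn with probability proportional to age, without replacement, so older instances are likely, but not guaranteed, to be terminated first.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Instance:
    """Hypothetical view of one idle nonephemeral Windows runner."""
    instance_id: str
    age_hours: float  # time since the instance was launched


def pick_for_termination(
    idle: Sequence[Instance],
    count: int,
    rng: Optional[random.Random] = None,
) -> List[Instance]:
    """Pick `count` idle instances to terminate, favouring older ones.

    Each draw is weighted by age and made without replacement, so the
    oldest runners are the most likely, but not certain, to be rotated out.
    """
    rng = rng if rng is not None else random.Random()
    pool = list(idle)
    chosen: List[Instance] = []
    for _ in range(min(count, len(pool))):
        # Floor the weight so brand-new instances can still be picked and
        # random.choices never sees an all-zero weight list.
        weights = [max(inst.age_hours, 1e-6) for inst in pool]
        victim = rng.choices(pool, weights=weights, k=1)[0]
        pool.remove(victim)
        chosen.append(victim)
    return chosen


if __name__ == "__main__":
    fleet = [
        Instance("i-old", 72.0),
        Instance("i-mid", 12.0),
        Instance("i-new", 0.5),
    ]
    # Demand dropped and the caller decided two runners can go.
    for inst in pick_for_termination(fleet, 2, random.Random(0)):
        print(inst.instance_id, inst.age_hours)
```

Weighting by age, rather than always terminating the oldest instance, keeps the rotation stochastic while still making long-lived runners, and the CI leftovers they accumulate, the usual first candidates for removal.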
The planned phases are:

* migrate `windows.4xlarge` to `windows.4xlarge.nonephemeral` instances under `pytorch/pytorch`
* migrate `windows.8xlarge.nvidia.gpu` to `windows.8xlarge.nvidia.gpu.nonephemeral` instances under `pytorch/pytorch`
* submit PRs to all repositories under the `pytorch/` organization to migrate `windows.4xlarge` to `windows.4xlarge.nonephemeral`
* submit PRs to all repositories under the `pytorch/` organization to migrate `windows.8xlarge.nvidia.gpu` to `windows.8xlarge.nvidia.gpu.nonephemeral`
* retire `windows.4xlarge` and `windows.8xlarge.nvidia.gpu`
* evaluate and begin adopting `windows.g5.4xlarge.nvidia.gpu` to replace `windows.8xlarge.nvidia.gpu.nonephemeral` in other repositories and use cases (proposed by @huydhn)

The reasoning behind this phased approach is to narrow the set of possible causes to investigate if particular CI jobs misbehave.

# Copilot Summary

### <samp>🤖 Generated by Copilot at 579d87a</samp>

This pull request migrates some windows workflows to use `nonephemeral` runners for better performance and reliability. It also adds support for new Python and CUDA versions for some binary builds. It affects the following files: `.github/templates/windows_binary_build_workflow.yml.j2`, `.github/workflows/generated-windows-binary-*.yml`, `.github/workflows/pull.yml`, `.github/actionlint.yaml`, `.github/workflows/_win-build.yml`, `.github/workflows/periodic.yml`, and `.github/workflows/trunk.yml`.
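Each migration step in the plan comes down to swapping a `runs-on` runner label in the workflow files. The sketch below is a hypothetical helper; `migrate_runner_labels`, `PHASE_1`, `PHASE_2` and `SAMPLE` are illustrative names and not part of this PR, which changed the workflow files and the `.j2` template listed above. It highlights one pitfall: `windows.4xlarge` is a prefix of `windows.4xlarge.nonephemeral`, so a plain string replace would corrupt labels that were already migrated.

```python
import re
from typing import Dict

# Hypothetical label mappings derived from the phased plan: phase 1 only
# moves the CPU runners, phase 2 adds the GPU runners.
PHASE_1 = {"windows.4xlarge": "windows.4xlarge.nonephemeral"}
PHASE_2 = {
    **PHASE_1,
    "windows.8xlarge.nvidia.gpu": "windows.8xlarge.nvidia.gpu.nonephemeral",
}


def migrate_runner_labels(workflow_text: str, mapping: Dict[str, str]) -> str:
    """Rewrite exact runner labels in a workflow file's text.

    A label only matches when it is not part of a longer label, i.e. it is
    neither preceded nor followed by a word character, '.' or '-'. This keeps
    the rewrite idempotent: 'windows.4xlarge.nonephemeral' is left alone.
    """
    for old, new in mapping.items():
        pattern = re.compile(r"(?<![\w.-])" + re.escape(old) + r"(?![\w.-])")
        workflow_text = pattern.sub(new, workflow_text)
    return workflow_text


SAMPLE = """jobs:
  build:
    runs-on: windows.4xlarge
  test:
    runs-on: windows.4xlarge.nonephemeral
  gpu-test:
    runs-on: windows.8xlarge.nvidia.gpu
"""

if __name__ == "__main__":
    # Phase 1: only the build job changes; test is already migrated and
    # the GPU job is left for phase 2.
    print(migrate_runner_labels(SAMPLE, PHASE_1))
```

Because the match is anchored on label boundaries, running the helper twice with the same mapping yields the same text, which matters when some workflows in a repository were already migrated by hand.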
# Copilot Poem

### <samp>🤖 Generated by Copilot at 579d87a</samp>

> _We're breaking free from the ephemeral chains_
> _We're running on the nonephemeral lanes_
> _We're building faster, testing stronger, supporting newer_
> _We're the non-ephemeral runners of fire_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100377
Approved by: https://github.com/huydhn, https://github.com/malfet, https://github.com/atalman
(cherry picked from commit 7caac545b1d8e5de797c9593981c9578685dba81)

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100548
Approved by: https://github.com/jeanschmidt, https://github.com/janeyx99

Author

jeanschmidt

Committer

pytorchmergebot

Parents

843ead13
