pytorch
51ff408f - Add retry when cleaning up Windows workspace (#102051)

1 year ago
Windows flakiness strikes again. A new flaky issue has started appearing on HUD: tearing down the Windows workspace fails with a `Device or resource busy` error when running `rm -rf ./*` on the workspace, for example https://github.com/pytorch/pytorch/actions/runs/5051845102/jobs/9064107717. It happens on both build and test jobs.

I have looked through all commits since last weekend, but nothing stands out or is Windows-related. The error means that some process still holds the directory, but it's unclear which one, since all CI processes should have been stopped by then (https://github.com/pytorch/pytorch/pull/101460), with the sole exception of the runner daemon itself.

On the other hand, the issue is flaky: the next job on the same failed runner can clean up the workspace fine when checking out PyTorch (https://github.com/pytorch/pytorch/blob/main/.github/actions/checkout-pytorch/action.yml#L21-L35). For example, `i-0ec1767a38ec93b4e` failed at https://github.com/pytorch/pytorch/actions/runs/5051845102/jobs/9064107717, and its immediate next job succeeded: https://github.com/pytorch/pytorch/actions/runs/5052147504/jobs/9064717085. So I think adding a retry should help mitigate this.

Related to https://github.com/pytorch/test-infra/pull/4206 (not the same root cause; I came up with https://github.com/pytorch/test-infra/pull/4206 while working on this PR).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102051
Approved by: https://github.com/kit1980