Switch Windows CI jobs to G5 runners (#91727)
### Changelist
* Change the Windows `TORCH_CUDA_ARCH_LIST` from `7.0` to `8.6` to be compatible with the NVIDIA A10G GPU
* Correctly disable some tests that require flash attention, which is not available on Windows at the moment. This has since been fixed by https://github.com/pytorch/pytorch/pull/91979
* The G5 runner has an `AMD EPYC 7R32` CPU, not an Intel one
* This seems to change the behavior of `GetDefaultMobileCPUAllocator` in `cpu_profiling_allocator_test` and might need further investigation (TODO: TRACKING ISSUE). In the meantime, the test has been updated to use `GetDefaultCPUAllocator` instead of `GetDefaultMobileCPUAllocator` for the mobile build
* One periodic test, `test_cpu_gpu_parity_nn_Conv3d_cuda_float32`, also fails with a tensor-not-close error when comparing grad tensors between CPU and GPU. This is fixed by turning off TF32 for the test.
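The TF32 fix in the last bullet can be sketched with PyTorch's public backend toggles. The helper name `run_without_tf32` is hypothetical (the PR may use an existing test decorator instead); the `torch.backends` flags themselves are real PyTorch APIs:

```python
import torch

def run_without_tf32(fn):
    """Hypothetical helper: run `fn` with TF32 matmul/cuDNN kernels
    disabled, restoring the previous flags afterwards. On Ampere GPUs
    (e.g. the A10G), TF32 trades precision for speed, which can push
    CPU/GPU parity checks past their tolerance."""
    prev_matmul = torch.backends.cuda.matmul.allow_tf32
    prev_cudnn = torch.backends.cudnn.allow_tf32
    torch.backends.cuda.matmul.allow_tf32 = False
    torch.backends.cudnn.allow_tf32 = False
    try:
        return fn()
    finally:
        # Restore global state so other tests are unaffected
        torch.backends.cuda.matmul.allow_tf32 = prev_matmul
        torch.backends.cudnn.allow_tf32 = prev_cudnn
```

Wrapping only the failing test keeps TF32 enabled everywhere else, so the suite does not lose the Ampere speedup globally.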
### Performance gain
* (CURRENT) p3.2xlarge - https://hud.pytorch.org/tts shows each Windows CUDA shard (1-5 + functorch) takes about 2 hours to finish
* (NEW RUNNER) g5.4xlarge - A very rough estimate of the duration is 1h30m per shard, a gain of about half an hour (**25%**)
### Pricing
On demand hourly rate:
* (CURRENT) p3.2xlarge: $3.428. Total = total hours spent on Windows CUDA tests * 3.428
* (NEW RUNNER) g5.4xlarge: $2.36. Total = total hours spent on Windows CUDA tests * duration ratio (0.75) * 2.36
So the current runner is not only more expensive but also slower. Switching to G5 runners for Windows should cut the cost by (3.428 - 0.75 * 2.36) / 3.428 = **~48%**
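As a sanity check, the savings formula can be evaluated directly (the rates and durations are the ones quoted in this PR, not assumptions):

```python
# Back-of-the-envelope cost comparison for the runner switch.
P3_RATE = 3.428            # p3.2xlarge on-demand $/hour (current runner)
G5_RATE = 2.36             # g5.4xlarge on-demand $/hour (new runner)
DURATION_RATIO = 1.5 / 2.0 # ~1h30m per shard on G5 vs ~2h on p3

def cost_savings_fraction(old_rate, new_rate, duration_ratio):
    """Fraction of per-shard cost saved: the new runner is both
    cheaper per hour and finishes in a fraction of the time."""
    return (old_rate - duration_ratio * new_rate) / old_rate

savings = cost_savings_fraction(P3_RATE, G5_RATE, DURATION_RATIO)
print(f"{savings:.1%}")  # → 48.4%
```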
### Rolling out
https://github.com/pytorch/test-infra/pull/1376 needs to be reviewed and approved to ensure sufficient runner capacity before this PR can be merged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91727
Approved by: https://github.com/ZainRizvi, https://github.com/malfet, https://github.com/seemethere