pytorch
3e55fc91 - [pet] Remove additional @record in elastic_launch to fix file existing error

Commit
3 years ago
Summary:
Since `launch_agent()` in api.py is already decorated with `record`, we can remove the extra usage in elastic_launch. This also fixes the FileExistsError bug on MAST.

To confirm this assumption, we ran an experiment in D27901961 that counts how many times `record` is invoked.

Test Plan:
```
fbpkg build -E torchelastic_distributed_sum
buck run mode/dev-nosan //pytorch/elastic/torchelastic/tsm/fb/cli:tsm -- run_ddp --scheduler mast --fbpkg torchelastic_distributed_sum:fde7879 --nnodes 1 --nproc_per_node 1 --resource T1 --run_cfg hpcIdentity=oncall_dai_pet,hpcClusterUuid=MastNaoTestCluster main.par
```
https://www.internalfb.com/mast/job/tsm_wilsonhong-torchelastic_distributed_sum_a92f97e7

Reviewed By: borovsky-d

Differential Revision: D27902034

fbshipit-source-id: e08b02d4b9c7a7c70fbb0dbcb24b95af55d2ea95
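The following minimal sketch shows how applying an error-recording decorator twice can produce a FileExistsError. It rests on one assumption made only for illustration: the decorator writes the error to a file opened in exclusive-create mode (`"x"`). The `record` decorator, the `ERROR_FILE` environment variable, the invocation counter, and the `elastic_launch_before`/`elastic_launch_after` functions are hypothetical stand-ins. They are not torchelastic's actual implementation of `record`.

```python
import functools
import json
import os
import tempfile

# Hypothetical counter, mirroring the "how many times is record invoked" experiment.
INVOCATIONS = {"count": 0}


def record(fn):
    """Hypothetical error-recording decorator (not the real torchelastic one).

    On an uncaught exception it writes the error to the file named by the
    hypothetical ERROR_FILE env var, using exclusive-create mode, then re-raises.
    """
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        INVOCATIONS["count"] += 1
        try:
            return fn(*args, **kwargs)
        except Exception as e:
            # Mode "x" raises FileExistsError if an inner wrapper already wrote it.
            with open(os.environ["ERROR_FILE"], "x") as f:
                json.dump({"message": repr(e)}, f)
            raise
    return wrapper


@record
def launch_agent():
    raise RuntimeError("worker failed")


def elastic_launch_before():
    # Extra @record layered on top of the already-decorated launch_agent.
    return record(launch_agent)()


def elastic_launch_after():
    # After the change: rely only on launch_agent's own decoration.
    return launch_agent()


def run(entry):
    """Run an entry point; return (exception type name, record invocations)."""
    INVOCATIONS["count"] = 0
    with tempfile.TemporaryDirectory() as d:
        os.environ["ERROR_FILE"] = os.path.join(d, "error.json")
        try:
            entry()
        except Exception as e:
            return type(e).__name__, INVOCATIONS["count"]
    return None, INVOCATIONS["count"]


if __name__ == "__main__":
    print(run(elastic_launch_before))
    print(run(elastic_launch_after))
```

Under these assumptions, the doubly decorated path invokes `record` twice. The outer wrapper then fails because the inner one has already created the error file, so it raises FileExistsError instead of the original RuntimeError. With a single decoration, `record` runs once and the original error propagates.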
Author
Wilson Hong