pytorch
8c6d352b - Log a new "timer expired" event to Scuba in file_based_local_timer (#85861)

2 years ago
Summary:
The "kill worker process" event was logged to Scuba only when the worker process was actually reaped. This change adds a new "timer expired" event that is logged whenever a timer expires, regardless of whether the worker process is reaped. The new event will help us collect data before we enable the JustKnob that kills the worker process on timeout.

Test Plan:

### Unit Test

```
buck test mode/dev-nosan //caffe2/test/distributed/elastic/agent/server/test:local_agent_test
```

```
Test Session: https://www.internalfb.com/intern/testinfra/testrun/7318349508929624
RE: reSessionID-ea464c43-54e7-44f2-942b-14ea8aa98c74  Up: 10.5 KiB  Down: 1.1 MiB
Jobs completed: 100. Time elapsed: 3206.9s. Cache hits: 91%. Commands: 11 (cached: 10, remote: 1, local: 0)
Tests finished: Pass 55. Fail 0. Fatal 0. Skip 0. 0 builds failed
```

--------

```
buck test mode/dev-nosan //caffe2/test/distributed/elastic/agent/server/test/fb:local_agent_fb_internal_test
```

```
Test Session: https://www.internalfb.com/intern/testinfra/testrun/6473924579130483
RE: reSessionID-231a47b7-a43d-4c0f-9f73-64713ffcbbd3  Up: 5.7 MiB  Down: 1.9 GiB
Jobs completed: 182156. Time elapsed: 282.4s. Cache hits: 99%. Commands: 72112 (cached: 72107, remote: 1, local: 4)
Tests finished: Pass 2. Fail 0. Fatal 0. Skip 0. 0 builds failed
```

Differential Revision: D39903376

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85861

Approved by: https://github.com/d4l3k
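The logging change in the summary can be sketched as below. This is a minimal, hypothetical illustration, not the actual `file_based_local_timer` code. `TimerWatchdog`, `ExpiredTimer`, `handle_expired`, `log_event` (a stand-in for the Scuba logger), `reap`, and `kill_on_timeout` (a stand-in for the JustKnob gate) are all invented names. The sketch shows that "timer expired" is recorded for every expired timer. "kill worker process" is recorded only when killing is enabled and the worker is actually reaped.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ExpiredTimer:
    # Hypothetical record describing a timer that has expired.
    worker_pid: int
    scope_id: str


@dataclass
class TimerWatchdog:
    # Hypothetical stand-in for the JustKnob that enables killing on timeout.
    kill_on_timeout: bool
    # Hypothetical stand-in for the Scuba event logger: (event_name, metadata).
    log_event: Callable[[str, Dict[str, object]], None]
    # Hypothetical reaper: returns True only if the worker was actually reaped.
    reap: Callable[[int], bool]

    def handle_expired(self, timers: List[ExpiredTimer]) -> None:
        for t in timers:
            metadata = {"pid": t.worker_pid, "scope_id": t.scope_id}
            # New event: always logged, whether or not the worker gets killed.
            self.log_event("timer expired", metadata)
            if not self.kill_on_timeout:
                # Killing is gated off, so only the data-collection event is logged.
                continue
            if self.reap(t.worker_pid):
                # Pre-existing event: logged only when the worker was really reaped.
                self.log_event("kill worker process", metadata)


# Usage: with killing disabled, only the new event is recorded.
events: List[Tuple[str, Dict[str, object]]] = []
watchdog = TimerWatchdog(
    kill_on_timeout=False,
    log_event=lambda name, md: events.append((name, md)),
    reap=lambda pid: True,
)
watchdog.handle_expired([ExpiredTimer(worker_pid=123, scope_id="train_step")])
print([name for name, _ in events])  # ['timer expired']
```

Keeping the "timer expired" log call ahead of the gate means the data-collection event does not depend on the JustKnob. That is the point of collecting data before killing is turned on.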