pytorch
4a88f71f - Fix potential naming clash when writing traces with tensorboard_trace_handler (#97392)


Commit

1 year ago

Fix potential naming clash when writing traces with tensorboard_trace_handler (#97392)

Fixes https://github.com/pytorch/pytorch/issues/82915

This rare flaky test caught my attention today when it failed on MacOS in https://github.com/pytorch/pytorch/actions/runs/4494182574/jobs/7906827531. The test expected 3 traces to be written but found only 2.

Looking more closely at the `tensorboard_trace_handler` function, there is a potential filename clash. The number of milliseconds since the epoch is used as part of the name: `"{}.{}.pt.trace.json".format(worker_name, int(time.time() * 1000))`. Because `tensorboard_trace_handler` is used as a callback handler in the test, it is invoked in quick succession and the resulting names are very close to each other (about 1 millisecond apart), i.e.

```
huydo-mbp_13494.1679526197252.pt.trace.json
huydo-mbp_13494.1679526197253.pt.trace.json
huydo-mbp_13494.1679526197250.pt.trace.json
```

Switching to nanoseconds reduces the chance of two or more traces having the same timestamp while keeping the naming convention intact, i.e. `huydo-mbp_13804.1679526325182878000.pt.trace.json`

I suspect that this is also the cause of the flakiness on Windows.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97392
Approved by: https://github.com/malfet, https://github.com/aaronenyeshi
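The clash described above can be illustrated with a minimal sketch. This is not the code from the PR: the helpers `trace_file_name`, `ms_name`, and `ns_name`, the worker name `worker_1`, and the 0.4 ms spacing between calls are all hypothetical, chosen only to show how a millisecond timestamp can collapse several calls into one filename while a nanosecond timestamp keeps them apart. Integer division on a nanosecond value stands in for `int(time.time() * 1000)` so the result is deterministic.

```python
import time


def trace_file_name(worker_name, timestamp):
    # Same naming pattern as quoted in the commit message.
    return "{}.{}.pt.trace.json".format(worker_name, timestamp)


def ms_name(worker_name, now_ns):
    # Millisecond resolution: truncate the nanosecond clock to milliseconds.
    return trace_file_name(worker_name, now_ns // 1_000_000)


def ns_name(worker_name, now_ns):
    # Nanosecond resolution: use the clock value as-is.
    return trace_file_name(worker_name, now_ns)


# Three hypothetical handler invocations, 0.4 ms apart (an assumed spacing).
base_ns = 1679526197252000000
stamps = [base_ns + i * 400_000 for i in range(3)]

# Using sets mimics files overwriting each other when names are equal.
ms_names = {ms_name("worker_1", t) for t in stamps}
ns_names = {ns_name("worker_1", t) for t in stamps}

print(len(ms_names), "distinct name(s) at millisecond resolution")
print(len(ns_names), "distinct name(s) at nanosecond resolution")

# In live code the nanosecond value could come from the standard library:
live_name = ns_name("worker_1", time.time_ns())
```

Nanosecond resolution lowers the probability of a clash but does not rule it out, which matches the commit message's wording that the change "reduces the chance" of identical timestamps.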

Author

huydhn

Committer

pytorchmergebot

Parents

d499b7d7
