[torch/elastic] Pretty print the failure message captured by @record (#64036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64036
This PR revises the internal `_format_failure()` method to pretty-print the error message captured in a subprocess by the `@record` decorator.
With this PR, a failure log is formatted as follows:
```
Root Cause:
[0]:
time: 2021-08-26_17:12:07
rank: 0 (local_rank: 0)
exitcode: 1 (pid: 8045)
error_file: /tmp/torchelastic_6cj9eppm/6d9d844a-6ce4-4838-93ed-1639a9525b00_rec9kuv3/attempt_0/0/error.json
msg:
{
"message": "ValueError: Test",
"extraInfo": {
"py_callstack": [
" File \"/data/home/balioglu/fail.py\", line 7, in <module>\n main()\n",
" File \"/fsx/users/balioglu/repos/pytorch/torch/distributed/elastic/multiprocessing/errors/__init__.py\", line 373, in wrapper\n error_handler.record_exception(e)\n",
" File \"/fsx/users/balioglu/repos/pytorch/torch/distributed/elastic/multiprocessing/errors/error_handler.py\", line 86, in record_exception\n _write_error(e, self._get_error_file_path())\n",
" File \"/fsx/users/balioglu/repos/pytorch/torch/distributed/elastic/multiprocessing/errors/error_handler.py\", line 26, in _write_error\n \"py_callstack\": traceback.format_stack(),\n"
],
"timestamp": "1629997927"
}
}
```
In contrast, the old formatting was:
```
Root Cause:
[0]:
time: 2021-08-26_17:15:50
rank: 0 (local_rank: 0)
exitcode: 1 (pid: 9417)
error_file: /tmp/torchelastic_22pwarnq/19f22638-848c-4b8f-8379-677f34fc44e7_u43o9vs7/attempt_0/0/error.json
msg: "{'message': 'ValueError: Test', 'extraInfo': {'py_callstack': 'Traceback (most recent call last):\n File "/fsx/users/balioglu/repos/pytorch/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 351, in wrapper\n return f(*args, **kwargs)\n File "/data/home/balioglu/fail.py", line 5, in main\n raise ValueError("BALIOGLU")\nValueError: BALIOGLU\n', 'timestamp': '1629998150'}}"
```
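For illustration only, the difference between the two formats can be sketched as below. This is not the actual `_format_failure()` code: `format_msg` is a hypothetical helper, and the assumption is that the captured message arrives as a dict parsed from `error.json`, which is serialized with indentation instead of being printed as a one-line `repr`.

```python
import json


def format_msg(msg, indent=2):
    """Hypothetical helper (not part of torch): pretty-print a captured message."""
    # Assumption: dict messages come from a parsed error.json file.
    if isinstance(msg, dict):
        return json.dumps(msg, indent=indent)
    # Anything else (e.g. a plain string) is shown as is.
    return str(msg)


failure = {
    "message": "ValueError: Test",
    "extraInfo": {"timestamp": "1629997927"},
}

pretty = format_msg(failure)
# Start the JSON on its own line so that it lines up under the "msg:" label.
print("msg:\n" + pretty)
```

Serializing with `json.dumps` keeps the output valid JSON, so the message can still be copied out of the log and parsed.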
ghstack-source-id: 136761768
Test Plan: Run the existing unit tests.
Reviewed By: kiukchung
Differential Revision: D30579025
fbshipit-source-id: 37df0b7c7ec9b620355766122986c2c77e8495ae