add nesting to TORCH_SHOW_DISPATCH_TRACE (#87751)
Added indents to the `TORCH_SHOW_DISPATCH_TRACE` output so that you can more easily see the call tree from the dispatcher. This is definitely slower, but it is all guarded under the `DEBUG` build.
I know we have the PyDispatcher now, but I still found this helpful for debugging. Example output:
```
[call] op=[aten::ones], key=[BackendSelect]
[redispatch] op=[aten::ones], key=[CPU]
[call] op=[aten::empty.memory_format], key=[BackendSelect]
[redispatch] op=[aten::empty.memory_format], key=[CPU]
[call] op=[aten::fill_.Scalar], key=[CPU]
[call] op=[aten::clone], key=[AutogradCPU]
[redispatch] op=[aten::clone], key=[CPU]
[call] op=[aten::empty_strided], key=[BackendSelect]
[redispatch] op=[aten::empty_strided], key=[CPU]
[call] op=[aten::copy_], key=[CPU]
[call] op=[aten::view], key=[PythonTLSSnapshot]
[redispatchBoxed] op=[aten::view], key=[AutogradCPU]
[redispatch] op=[aten::view], key=[ADInplaceOrView]
[redispatch] op=[aten::view], key=[Functionalize]
[call] op=[aten::view], key=[PythonTLSSnapshot]
[redispatchBoxed] op=[aten::view], key=[Meta]
[call] op=[aten::view], key=[PythonTLSSnapshot]
[redispatchBoxed] op=[aten::view], key=[Python]
[callBoxed] op=[aten::view], key=[CPU]
[call] op=[aten::clone], key=[PythonTLSSnapshot]
[redispatchBoxed] op=[aten::clone], key=[AutogradCPU]
[redispatch] op=[aten::clone], key=[Functionalize]
[callBoxed] op=[aten::clone], key=[PythonTLSSnapshot]
[redispatchBoxed] op=[aten::clone], key=[Python]
[callBoxed] op=[aten::clone], key=[CPU]
[call] op=[aten::empty_strided], key=[BackendSelect]
[redispatch] op=[aten::empty_strided], key=[CPU]
[call] op=[aten::copy_], key=[CPU]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87751
Approved by: https://github.com/ezyang, https://github.com/zou3519