[pytorch] Name the main and autograd threads for better debugging (#121170)
The main thread and the autograd one are latency critical threads. They launch CPU/GPU/Accelerator kernels and if for some reason they get preempted, the rank can become a straggler in a distributed training application. By naming these threads we can debug performance issues that impact the latency sensitive threads.
I used Kineto traces to verify if the thread names were propagated:
<img width="851" alt="Screenshot 2024-03-04 at 3 07 43 PM" src="https://github.com/pytorch/pytorch/assets/23515689/68b4a09c-b8e5-4f14-a5c0-6593f866c03f">
Also:
```
nvidia-smi
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3065920 C ...me#python#py_version_3_10 1968MiB |
| 1 N/A N/A 3065926 C ...me#python#py_version_3_10 1978MiB |
| 2 N/A N/A 3065930 C ...me#python#py_version_3_10 2084MiB |
| 3 N/A N/A 3065936 C ...me#python#py_version_3_10 2016MiB |
| 4 N/A N/A 3065939 C ...me#python#py_version_3_10 1998MiB |
| 5 N/A N/A 3065943 C ...me#python#py_version_3_10 2070MiB |
| 6 N/A N/A 3065948 C ...me#python#py_version_3_10 2026MiB |
| 7 N/A N/A 3065952 C ...me#python#py_version_3_10 2070MiB |
+-----------------------------------------------------------------------------+
[me@myhost ~]$ ps -T -p 3065920
PID SPID TTY TIME CMD
3065920 3065920 pts/14 00:01:04 pt_main_thread
...
3065920 3092181 pts/14 00:00:40 pt_autograd_d0
3065920 3092182 pts/14 00:00:00 pt_autograd_d1
3065920 3092183 pts/14 00:00:00 pt_autograd_d2
3065920 3092184 pts/14 00:00:00 pt_autograd_d3
3065920 3092185 pts/14 00:00:00 pt_autograd_d4
3065920 3092186 pts/14 00:00:00 pt_autograd_d5
3065920 3092187 pts/14 00:00:00 pt_autograd_d6
3065920 3092188 pts/14 00:00:00 pt_autograd_d7
...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121170
Approved by: https://github.com/albanD