pytorch
b8862104 - [profiler] Add kineto init delay when used in daemon mode (#120276)


Committed 212 days ago

[profiler] Add kineto init delay when used in daemon mode (#120276)

Fixes #112389

## About

The PyTorch (Kineto) profiler registers with the profiling daemon Dynolog to enable on-demand profiling. The user should only need to set the env variable `KINETO_USE_DAEMON`. To make this work, the Kineto library has to be initialized early, rather than lazily on a PyTorch profiler call. This initialization happens in a static initializer.

- The Kineto init function registers a callback using the CUDA CUPTI library: https://github.com/pytorch/kineto/blob/main/libkineto/src/init.cpp#L130-L148
- However, this requires that dynamic linking to `libcupti.so` has already taken place.
- My current understanding is that static initializers of compilation units can run before this dynamic linking has happened, which leads to the segfault reported in #112389.

![image](https://github.com/pytorch/pytorch/assets/6922212/29c9e79b-8080-4198-aaae-8a5696dccaec)

## Workaround

We add a delay to the initialization, configurable through the env variable `KINETO_DAEMON_INIT_DELAY_S`. It may not be the best solution, but it could help resolve the issue.
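The sketch below shows one way such a delayed, env-gated initialization could be structured in C++. It is a minimal sketch under stated assumptions, not the code from this PR: `hypotheticalKinetoInit`, `parseInitDelaySeconds`, `scheduleDaemonInit` and `g_initCalls` are hypothetical names, and the real Kineto init routine is represented only by a stub. It assumes that an unset, non-numeric or non-positive `KINETO_DAEMON_INIT_DELAY_S` means "no delay", and that a positive delay moves the init onto a detached background thread so the static initializer itself returns immediately.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical counter so the effect of the stub below can be observed.
std::atomic<int> g_initCalls{0};

// Hypothetical stand-in for the real Kineto init routine (which registers the
// daemon config loader); here it only counts how often it was called.
void hypotheticalKinetoInit() { g_initCalls.fetch_add(1); }

// Hypothetical helper: parse KINETO_DAEMON_INIT_DELAY_S. Returns 0 (no delay)
// when the value is unset, does not start with an integer, or is not positive.
int parseInitDelaySeconds(const char* value) {
  if (value == nullptr) {
    return 0;
  }
  try {
    const int delay = std::stoi(value);
    return delay > 0 ? delay : 0;
  } catch (const std::exception&) {
    // std::invalid_argument (e.g. "abc") or std::out_of_range.
    return 0;
  }
}

// Hypothetical helper: run init inline when there is no delay; otherwise
// sleep on a detached background thread first, so the caller (a static
// initializer) is not blocked. Returns true if init was deferred.
bool scheduleDaemonInit(int delaySeconds, void (*init)()) {
  assert(init != nullptr);
  if (delaySeconds <= 0) {
    init();
    return false;
  }
  std::thread([delaySeconds, init]() {
    std::this_thread::sleep_for(std::chrono::seconds(delaySeconds));
    init();
  }).detach();
  return true;
}

namespace {
// Static initializer: its constructor runs before main(), but it only acts
// when KINETO_USE_DAEMON is present in the environment.
struct DaemonModeInit {
  DaemonModeInit() {
    if (std::getenv("KINETO_USE_DAEMON") != nullptr) {
      scheduleDaemonInit(
          parseInitDelaySeconds(std::getenv("KINETO_DAEMON_INIT_DELAY_S")),
          hypotheticalKinetoInit);
    }
  }
};
DaemonModeInit g_daemonModeInit;
}  // namespace
```

This choice is consistent with the logs in the Testing section: with `KINETO_DAEMON_INIT_DELAY_S=5` the registration messages carry a different thread ID in the `PID:TID` field (`2359155:2359214`), while with the invalid value `abc` they come from the main thread (`2369647:2369647`), just as in the run without any delay.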
## Testing

Tested with [linear_model_example.py](https://github.com/facebookincubator/dynolog/blob/main/scripts/pytorch/linear_model_example.py). First export the daemon env variable `KINETO_USE_DAEMON`.

### Without any delay

```
>$ python3 linear_model_example.py
INFO:2024-02-21 19:34:50 2366287:2366287 init.cpp:131] Registering daemon config loader, cpuOnly = 1
INFO:2024-02-21 19:34:50 2366287:2366287 DaemonConfigLoader.cpp:63] Setting communication fabric enabled = 1
INFO:2024-02-21 19:34:50 2366287:2366287 IpcFabricConfigClient.cpp:93] Setting up IPC Fabric at endpoint: dynoconfigclientb8f91363-d8d6-47a7-9103-197661e28397 status = initialized
INFO:2024-02-21 19:34:50 2366287:2366287 DaemonConfigLoader.cpp:63] Setting communication fabric enabled = 1
INFO:2024-02-21 19:34:50 2366287:2366287 DaemonConfigLoader.cpp:63] Setting communication fabric enabled = 1
cpu
99 1385.468505859375
```

### With 5 seconds delay

```
>$ KINETO_DAEMON_INIT_DELAY_S=5 python3 linear_model_example.py
cpu
99 284.82305908203125
10099 8.817167282104492
INFO:2024-02-21 19:34:26 2359155:2359214 init.cpp:131] Registering daemon config loader, cpuOnly = 1
ERROR: External init callback must run in same thread as registerClient (1782580992 != -1922169024)
INFO:2024-02-21 19:34:26 2359155:2359214 DaemonConfigLoader.cpp:63] Setting communication fabric enabled = 1
INFO:2024-02-21 19:34:26 2359155:2359214 IpcFabricConfigClient.cpp:93] Setting up IPC Fabric at endpoint: dynoconfigclient49270a3f-e913-4ea6-b9e0-cc90a853a869 status = initialized
INFO:2024-02-21 19:34:26 2359155:2359214 DaemonConfigLoader.cpp:63] Setting communication fabric enabled = 1
INFO:2024-02-21 19:34:26 2359155:2359214 DaemonConfigLoader.cpp:63] Setting communication fabric enabled = 1
20099 8.817167282104492
```

### With an invalid delay

```
>$ KINETO_DAEMON_INIT_DELAY_S=abc python3 linear_model_example.py
INFO:2024-02-21 19:35:02 2369647:2369647 init.cpp:131] Registering daemon config loader, cpuOnly = 1
INFO:2024-02-21 19:35:02 2369647:2369647 DaemonConfigLoader.cpp:63] Setting communication fabric enabled = 1
INFO:2024-02-21 19:35:02 2369647:2369647 IpcFabricConfigClient.cpp:93] Setting up IPC Fabric at endpoint: dynoconfigclient0e12a349-af7b-4322-901d-1ff22f91fd4c status = initialized
INFO:2024-02-21 19:35:02 2369647:2369647 DaemonConfigLoader.cpp:63] Setting communication fabric enabled = 1
INFO:2024-02-21 19:35:02 2369647:2369647 DaemonConfigLoader.cpp:63] Setting communication fabric enabled = 1
cpu
```

### Unit test updated as well.

## Impact

This should not affect general users: the early initialization only occurs if `KINETO_USE_DAEMON` is set in the environment.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120276
Approved by: https://github.com/anupambhatnagar, https://github.com/aaronenyeshi

Author

briancoutinho

Committer

pytorchmergebot

Parents

be0ee934