pytorch
d91d21fd - [submodule kineto] Enable profiler connection to daemon during init for cpu only jobs (#118320)

Commit
285 days ago
[submodule kineto] Enable profiler connection to daemon during init for cpu only jobs (#118320)

Fixes #112389 and https://github.com/facebookincubator/dynolog/issues/208

This PR enables profiler initialization for CPU only use cases. The main goal is to enable on-demand profiling with a daemon when using the CPU only mode of PyTorch.

* When CUDA is available, the profiler is initialized on first CUDA stream creation (or lazily when the profiler is run).
* Since the CUDA stream creation callback does not exist in CPU only PyTorch, the profiler is never initialized on its own.
* Thus the job does not register with Dynolog when the `KINETO_USE_DAEMON` environment variable is set.

Part of the fix is in Kineto (https://github.com/pytorch/kineto/pull/861); we point to it in PyTorch. The change in PyTorch is to correctly set the `cpuOnly` argument.

## Test Plan:

Build PyTorch from source with USE_CUDA=0 so we have a CPU only build. Git hash = `a40951defd87b9a5e582cf9112bf7a8bd0930c79` (see instructions in the PyTorch repo).

For the setup, we run the dynolog daemon in another terminal:
```
buck2 run dynolog/src:dynolog -- --enable_ipc_monitor &
```

Now run an example model in PyTorch - see [linear_model.py](https://github.com/facebookincubator/dynolog/blob/main/scripts/pytorch/linear_model_example.py), and set the device to 'cpu' inside the code instead of 'cuda'.
```
export KINETO_USE_DAEMON=1
python linear_model_example.py
```

The output shows the profiler registering with dynolog:
```
(pytorch) [bcoutinho@devgpu038.ftw6 ~/local/pytorch (main)]$ python linear_model_example.py
INFO:2024-01-25 11:08:53 1807792:1807792 init.cpp:122] Registering daemon config loader, cpuOnly = 1
INFO:2024-01-25 11:08:53 1807792:1807792 DaemonConfigLoader.cpp:63] Setting communication fabric enabled = 1
INFO:2024-01-25 11:08:53 1807792:1807792 IpcFabricConfigClient.cpp:93] Setting up IPC Fabric at endpoint: dynoconfigclient0dc36b8a-e14c-4260-958b-4b2e7d15e986 status = initialized
INFO:2024-01-25 11:08:53 1807792:1807792 DaemonConfigLoader.cpp:63] Setting communication fabric enabled = 1
INFO:2024-01-25 11:08:53 1807792:1807792 DaemonConfigLoader.cpp:63] Setting communication fabric enabled = 1
```

We can also collect a trace using:
```
[bcoutinho@devgpu038.ftw6 ~/fbsource/fbcode (3bc85f968)]$ buck2 run dynolog/cli:dyno -- gputrace --log-file /tmp/test.json
Kineto config =
ACTIVITIES_LOG_FILE=/tmp/test.json
PROFILE_START_TIME=0
ACTIVITIES_DURATION_MSECS=500
PROFILE_REPORT_INPUT_SHAPES=false
PROFILE_PROFILE_MEMORY=false
PROFILE_WITH_STACK=false
PROFILE_WITH_FLOPS=false
PROFILE_WITH_MODULES=false
response length = 147
response = {"activityProfilersBusy":0,"activityProfilersTriggered":[1807792],"eventProfilersBusy":0,"eventProfilersTriggered":[],"processesMatched":[1807792]}
Matched 1 processes
Trace output files will be written to:
    /tmp/test_1807792.json
```

The trace file contains the expected trace.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118320
Approved by: https://github.com/aaronenyeshi
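When scripting the test plan, it can help to check which trace files to expect from the `dyno` response. The sketch below is not part of dynolog or Kineto: `expected_trace_files` is a hypothetical helper, and the naming rule (PID inserted before the file extension) is an assumption inferred only from the single `/tmp/test.json` to `/tmp/test_1807792.json` example in the output above.

```python
import json
from pathlib import PurePosixPath


def expected_trace_files(log_file: str, response_json: str) -> list[str]:
    """Guess the per-process trace paths for a dyno gputrace request.

    Hypothetical helper: assumes each triggered PID is inserted before the
    extension of --log-file, as seen in the commit's sample output.
    """
    response = json.loads(response_json)
    base = PurePosixPath(log_file)
    return [
        str(base.with_name(f"{base.stem}_{pid}{base.suffix}"))
        for pid in response.get("activityProfilersTriggered", [])
    ]


# Response string copied from the sample output in the test plan.
response = (
    '{"activityProfilersBusy":0,"activityProfilersTriggered":[1807792],'
    '"eventProfilersBusy":0,"eventProfilersTriggered":[],'
    '"processesMatched":[1807792]}'
)
print(expected_trace_files("/tmp/test.json", response))
```

An empty `activityProfilersTriggered` list yields no paths, which would suggest the CPU only process never registered with the daemon.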