openvino
dbc544df - [GPU] Defer allocations of inputs (#35126)

Commit

17 days ago

[GPU] Defer allocations of inputs (#35126)

In `input_layout_node`, try to skip early memory allocations to avoid a memory increase for large inputs. This optimization reduces the total peak memory of the phi silica application from 10 GB to 6 GB.

### Details:

Before this change, `allocate_mem` was skipped (set to false) in some cases, for example for dynamic shapes and internal networks. This PR forces it to always be false and also handles the cases where the inputs are expected to be present (check for null, or allocate a temporary buffer for simplicity).

See the early version of the presentation: https://intel-my.sharepoint.com/:p:/p/bartlomiej_filipek/IQCJ4tTQG0XHQYjMm_FAUlyIAf2FeLPfsthPN6xxJt4TD-I?e=DOG6DL

### Tickets:

CVS-178139

### AI Assistance:

- AI assistance used: yes
- If yes, summarize how AI was used: AI generated most of the code after several iterations. Manually tested and debugged on the phi silica script app.

## Perf/mem Comparison:

### Using benchmark_app.exe, LunarLake 5 236V, 16GB, iGPU

| Model | PR Avg FPS | Master Avg FPS | Δ FPS | PR Compile RAM | Master Compile RAM | Δ RAM |
|-----------|--------|--------|--------|----------|----------|------------|
| YOLOv3 | ~115.5 | ~115.0 | ~+0.4% | ~256 MB | ~256 MB | ≈0 |
| PSD2 | ~55.3 | ~55.9 | ~-1.1% | ~841 MB | ~825 MB | ~+16 MB PR |
| PSD7 | ~5.28 | ~5.27 | ~+0.2% | ~1362 MB | ~1361 MB | ≈0 |
| PSR | ~5.45 | ~5.47 | ~-0.4% | ~5633 MB | ~5633 MB | ≈0 |
| ResNet-50 | ~1160 | ~1158 | ~+0.2% | ~1070 MB | ~1066 MB | ≈0 |

- PR: binaries compiled with this PR
- Master: OpenVINO master as of 14th April, 2075ff44dc539d2cadfe07ec8bea39623ad300f5

### MCT Real weight system

Results from the MTC team, running on real weight system (11th May): "All models look accurate compared to CPU outputs", "We may still see image quality regression due to a past OV change"

| Model | Cosine_Sim_Avg | L2_norm_avg |
|---------------------------|----------------|-------------|
| Model_PSD1_v0_qdq | 0.9999 | 17.9488 |
| Model_PSD2_v0_qdq | 0.9994 | 9.6158 |
| Model_PSD3_v1_0_201_qdq | 0.9999 | 30.0694 |
| Model_PSD4_v0_qdq | 0.9992 | 120.0805 |
| Model_PSD5_1_v1_0_295_qdq | 0.9996 | 3.4721 |
| Model_PSD6_v0_qdq | 0.9969 | 7.6443 |
| Model_PSD7_v0_qdq | 1.0000 | 3.8851 |
| Model_PSD8_v1_0_297_qdq | 0.9997 | 164.4600 |

`Cosine_Sim_Avg` and `L2_norm_avg` are averaged across several outputs for a given model. Comparison: ov_4th_may_with_pr_35126 against ov_latest from early May.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
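
The deferral described under Details can be illustrated with a small standalone sketch. It is not the code from this PR and uses no OpenVINO types: `Buffer`, `InputNode`, `allocate`, `allocated_bytes`, `set_data`, `output_memory`, and `try_output_memory` are all hypothetical names chosen for illustration. The sketch assumes only what the message states: the input node no longer allocates up front, callers that can tolerate a missing buffer check for null, and callers that require one get a temporary buffer allocated on demand.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device buffer; not an OpenVINO type.
struct Buffer {
    std::vector<unsigned char> bytes;
    explicit Buffer(std::size_t n) : bytes(n) {}
};

// Running total of bytes allocated by nodes, so the effect of deferral is observable.
inline std::size_t& allocated_bytes() {
    static std::size_t total = 0;
    return total;
}

// Hypothetical allocator used by the node for its own (temporary) buffers.
inline std::shared_ptr<Buffer> allocate(std::size_t n) {
    allocated_bytes() += n;
    return std::make_shared<Buffer>(n);
}

// Hypothetical input node: construction never allocates (the "always false" case).
class InputNode {
public:
    explicit InputNode(std::size_t size_bytes) : size_(size_bytes) {}

    bool has_memory() const { return mem_ != nullptr; }

    // Bind a buffer supplied by the caller; nothing is allocated or copied here.
    void set_data(std::shared_ptr<Buffer> user) {
        assert(user && user->bytes.size() == size_);
        mem_ = std::move(user);
    }

    // For callers that can handle a missing buffer: returns nullptr if none is bound.
    const Buffer* try_output_memory() const { return mem_.get(); }

    // For callers that expect a buffer: allocate a temporary one on first use.
    Buffer& output_memory() {
        if (!mem_) {
            mem_ = allocate(size_);
        }
        return *mem_;
    }

private:
    std::size_t size_;
    std::shared_ptr<Buffer> mem_;
};
```

In this sketch a node that later receives a caller-provided buffer through `set_data` never pays for an allocation of its own, which is the kind of saving the commit message attributes to large inputs. The on-demand path in `output_memory` trades that saving for simplicity where a buffer is required.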

References

#35126 - [GPU] Defer allocations of inputs

Author

intbf

Parents

7876e430
