[GPU] Defer allocations of inputs (#35126)
In `input_layout_node`, skip early memory allocation so that large
inputs do not increase memory use.
This optimization reduces the peak memory of the Phi Silica
application from 10 GB to 6 GB.
### Details:
Before this change, `allocate_mem` was set to false (allocation skipped)
only in some cases, for example for dynamic shapes and internal networks.
This PR always sets it to false and also handles the cases where the
input memory is expected to be present (by checking for null or, for
simplicity, allocating a temporary buffer).
See the early version of the presentation:
https://intel-my.sharepoint.com/:p:/p/bartlomiej_filipek/IQCJ4tTQG0XHQYjMm_FAUlyIAf2FeLPfsthPN6xxJt4TD-I?e=DOG6DL
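
The deferral described above can be illustrated with a minimal sketch. It is not the plugin code: `InputSlot` and its methods are hypothetical names invented for this example, and only the `allocate_mem` flag comes from the description. It assumes the two fallbacks mentioned (a null check, or a temporary buffer) map to `has_memory()` and `memory()`.

```python
from typing import Optional


class InputSlot:
    """Hypothetical stand-in for an input node; not an OpenVINO class."""

    def __init__(self, size_bytes: int, allocate_mem: bool = False) -> None:
        self.size_bytes = size_bytes
        # With allocate_mem=False nothing is reserved when the slot is built.
        self.buffer: Optional[bytearray] = (
            bytearray(size_bytes) if allocate_mem else None
        )

    def has_memory(self) -> bool:
        # Null check for callers that can tolerate a missing buffer.
        return self.buffer is not None

    def set_user_memory(self, data: bytearray) -> None:
        # Attach the caller's buffer directly; no second copy is allocated.
        if len(data) != self.size_bytes:
            raise ValueError("unexpected input size")
        self.buffer = data

    def memory(self) -> bytearray:
        # Callers that require memory get a temporary buffer on first use.
        if self.buffer is None:
            self.buffer = bytearray(self.size_bytes)
        return self.buffer

    def allocated_bytes(self) -> int:
        return 0 if self.buffer is None else len(self.buffer)
```

In this sketch a slot built with the default `allocate_mem=False` reports `allocated_bytes() == 0` until either the user attaches a buffer or a consumer calls `memory()`.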
### Tickets:
CVS-178139
### AI Assistance:
- AI assistance used: yes
- If yes, summarize how AI was used: AI generated most of the code after
several iterations. Manually tested and debugged on the Phi Silica
script app.
## Perf/mem Comparison:
### Using benchmark_app.exe, LunarLake 5 236V, 16GB, iGPU
| Model | PR Avg FPS | Master Avg FPS | Δ FPS | PR Compile RAM | Master Compile RAM | Δ RAM |
|---|---|---|---|---|---|---|
| YOLOv3 | ~115.5 | ~115.0 | ~+0.4% | ~256 MB | ~256 MB | ≈0 |
| PSD2 | ~55.3 | ~55.9 | ~-1.1% | ~841 MB | ~825 MB | ~+16 MB (PR higher) |
| PSD7 | ~5.28 | ~5.27 | ~+0.2% | ~1362 MB | ~1361 MB | ≈0 |
| PSR | ~5.45 | ~5.47 | ~-0.4% | ~5633 MB | ~5633 MB | ≈0 |
| ResNet-50 | ~1160 | ~1158 | ~+0.2% | ~1070 MB | ~1066 MB | ≈0 |
- PR: binaries compiled with this PR
- Master: OpenVINO master as of 14th April, commit
2075ff44dc539d2cadfe07ec8bea39623ad300f5
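
For reference, the delta columns can be reproduced with a short sketch. It assumes Δ FPS is the relative difference `(PR - Master) / Master` and Δ RAM is the plain difference `PR - Master`; the description does not state the formulas, and the function names are hypothetical.

```python
def relative_delta_percent(pr_value: float, master_value: float) -> float:
    """Relative change of the PR build against master, in percent."""
    return (pr_value - master_value) / master_value * 100.0


def absolute_delta(pr_value: float, master_value: float) -> float:
    """Absolute change of the PR build against master, in the input unit."""
    return pr_value - master_value


# YOLOv3 FPS and PSD2 compile RAM, using the rounded values from the table.
print(round(relative_delta_percent(115.5, 115.0), 1))  # 0.4
print(absolute_delta(841, 825))  # 16
```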
### MCT Real weight system
Results from the MTC team, running on a real weight system (11th May):
"All models look accurate compared to CPU outputs" and "We may still see
image quality regression due to a past OV change".
| Model | Cosine_Sim_Avg | L2_norm_avg |
|---------------------------|----------------|-------------|
| Model_PSD1_v0_qdq | 0.9999 | 17.9488 |
| Model_PSD2_v0_qdq | 0.9994 | 9.6158 |
| Model_PSD3_v1_0_201_qdq | 0.9999 | 30.0694 |
| Model_PSD4_v0_qdq | 0.9992 | 120.0805 |
| Model_PSD5_1_v1_0_295_qdq | 0.9996 | 3.4721 |
| Model_PSD6_v0_qdq | 0.9969 | 7.6443 |
| Model_PSD7_v0_qdq | 1.0000 | 3.8851 |
| Model_PSD8_v1_0_297_qdq | 0.9997 | 164.4600 |
- `Cosine_Sim_Avg` and `L2_norm_avg`: averaged across several outputs for
a given model
- Builds compared: `ov_4th_may_with_pr_35126` against `ov_latest` from early May
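
A minimal sketch of how per-model averages like `Cosine_Sim_Avg` and `L2_norm_avg` could be computed. The exact definitions used for the table are not given here, so this assumes standard cosine similarity, the Euclidean norm of the element-wise difference, and an arithmetic mean over a model's outputs. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    # Assumes both vectors are non-zero and have the same length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def l2_distance(a: Vector, b: Vector) -> float:
    # Euclidean norm of the element-wise difference.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_metrics(pairs: List[Tuple[Vector, Vector]]) -> Tuple[float, float]:
    """Average both metrics over (reference, tested) output pairs."""
    cos = [cosine_similarity(ref, out) for ref, out in pairs]
    l2 = [l2_distance(ref, out) for ref, out in pairs]
    return sum(cos) / len(cos), sum(l2) / len(l2)
```

Each pair holds one flattened output tensor from the reference build and the matching tensor from the tested build.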
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>