openvino
3641e3e6 - [GPU] Fix perf drop of 4bit-KV-cache of VLM (#35831)

Commit
9 days ago
Some VLMs showed a performance drop when the 4-bit KV cache is enabled with the PA backend.

### Details:
When KV_CACHE_PRECISION=u4 is set globally, all non-PA SDPA nodes, including Vision Encoder self-attention, are blocked from using the fast sdpa_micro__prefill kernel and fall back to the slower sdpa_opt__multi_reg kernel. This happens because the check in sdpa_opt.cpp reads the global config via get_kv_cache_precision() without verifying whether the SDPA node actually uses a compressed KV cache.

### Tickets:
- CVS-185922

### AI Assistance:
- *AI assistance used: no / yes*
- *If yes, summarize how AI was used and what human validation was performed (build/tests/manual checks).*

Analyze validation report to summarize performance issue table of VLMs in the report.

---------

Signed-off-by: Min, Byung il <byungil.min@intel.com>
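The kernel-selection problem described under Details can be illustrated with a minimal, self-contained sketch. This is not the code from sdpa_opt.cpp: the types `GlobalConfig` and `SdpaNode`, the field `uses_compressed_kv_cache`, and the helper functions are hypothetical stand-ins, and the "after" predicate is only one plausible shape of the fix, assuming the node itself can report whether it reads a compressed KV cache.

```cpp
#include <string>

// Hypothetical stand-ins for illustration; not the OpenVINO GPU plugin types.
enum class KVCachePrecision { f16, i8, u4 };

struct GlobalConfig {
    KVCachePrecision kv_cache_precision = KVCachePrecision::f16;
};

struct SdpaNode {
    // Assumed per-node flag: true only if this SDPA node reads a compressed KV cache.
    bool uses_compressed_kv_cache = false;
};

// Problem as described: the decision depends on the global setting alone,
// so every non-PA SDPA node is blocked once u4 is requested globally.
bool micro_kernel_allowed_before(const GlobalConfig& cfg, const SdpaNode&) {
    return cfg.kv_cache_precision != KVCachePrecision::u4;
}

// Assumed shape of the fix: block the micro kernel only when the node
// actually uses a compressed KV cache.
bool micro_kernel_allowed_after(const GlobalConfig& cfg, const SdpaNode& node) {
    const bool is_4bit = cfg.kv_cache_precision == KVCachePrecision::u4;
    return !(is_4bit && node.uses_compressed_kv_cache);
}

// Maps the decision to the kernel names mentioned in the commit message.
std::string select_kernel(bool micro_allowed) {
    return micro_allowed ? "sdpa_micro__prefill" : "sdpa_opt__multi_reg";
}
```

Under these assumptions, a Vision Encoder self-attention node (no compressed KV cache) falls back to sdpa_opt__multi_reg with the "before" predicate when u4 is set globally, and selects sdpa_micro__prefill with the "after" predicate. A node that does read a 4-bit KV cache is still kept off the micro kernel.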