openvino
3641e3e6 - [GPU] Fix perf drop of 4bit-KV-cache of VLM (#35831)

Commit

9 days ago

[GPU] Fix perf drop of 4bit-KV-cache of VLM (#35831)

Some VLMs showed a performance drop when 4-bit KV cache is enabled with the PA backend.

### Details:
When KV_CACHE_PRECISION=u4 is set globally, all non-PA SDPA nodes, including Vision Encoder self-attention, are blocked from using the fast sdpa_micro__prefill kernel and fall back to the slower sdpa_opt__multi_reg kernel. This happens because the check in sdpa_opt.cpp reads the global config get_kv_cache_precision() without verifying whether the SDPA node actually uses a compressed KV cache.

### Tickets:
- CVS-185922

### AI Assistance:
- *AI assistance used: no / yes*
- *If yes, summarize how AI was used and what human validation was performed (build/tests/manual checks).* Analyze validation report to summarize performance issue table of VLMs in the report.

---------

Signed-off-by: Min, Byung il <byungil.min@intel.com>
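The kernel-selection issue described under Details can be illustrated with a small standalone sketch. This is not the code from sdpa_opt.cpp: the types `GlobalConfig` and `SdpaNode`, the `uses_compressed_kv_cache` flag, and both selection functions are hypothetical stand-ins, assuming the fix amounts to combining the global precision setting with a per-node check.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; these are NOT the real
// OpenVINO GPU plugin types or APIs.
enum class KVCachePrecision { f16, u8, u4 };
enum class SdpaKernel { micro_prefill, opt_multi_reg };

struct GlobalConfig {
    KVCachePrecision kv_cache_precision = KVCachePrecision::f16;
};

struct SdpaNode {
    // Assumed per-node property: true only when this SDPA node really
    // reads a compressed KV cache (e.g. decoder attention, not a vision encoder).
    bool uses_compressed_kv_cache = false;
};

// Behaviour described as the problem: the global setting alone decides,
// so every SDPA node loses the fast kernel once u4 is configured.
inline SdpaKernel select_kernel_global_only(const GlobalConfig& cfg, const SdpaNode&) {
    return cfg.kv_cache_precision == KVCachePrecision::u4 ? SdpaKernel::opt_multi_reg
                                                          : SdpaKernel::micro_prefill;
}

// Sketch of the intended behaviour: fall back to the slower kernel only when
// the node itself uses a compressed KV cache.
inline SdpaKernel select_kernel_per_node(const GlobalConfig& cfg, const SdpaNode& node) {
    const bool blocked = cfg.kv_cache_precision == KVCachePrecision::u4 &&
                         node.uses_compressed_kv_cache;
    return blocked ? SdpaKernel::opt_multi_reg : SdpaKernel::micro_prefill;
}
```

Under these assumptions, a vision-encoder node with `uses_compressed_kv_cache == false` keeps the fast path even when the global precision is u4, while a node that does use the compressed cache still takes the fallback kernel.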

References

#35831 - [GPU] Fix perf drop of 4bit-KV-cache of VLM

Author

byungilm

Parents

c63bd58e
