openvino
96b59f7f - [pa support SD]support pa kv cache reorder for post kv-update after speculative decode (#34875)

Commit
2 days ago
### Details:
- Split the KV reorder implementation out of https://github.com/openvinotoolkit/openvino/pull/33638.
- Implement a new internal op, pa_kv_reorder, and enable its unit tests.
  <img width="1186" height="283" alt="image" src="https://github.com/user-attachments/assets/e533abba-0047-406b-b623-4f0dfeeb0fc4" />
- Works together with GENAI PR https://github.com/openvinotoolkit/openvino.genai/pull/3199 and OV PR https://github.com/openvinotoolkit/openvino/pull/34644.
- To run the pipeline: `./eagle_speculative_decoding_lm main_model draft_model max_output eagle3_branch_factor depth num_assistant_tokens prompt`
- Perf gain (branch_factor = 8, depth = 3, num_assistant_tokens = 15):
  <img width="962" height="158" alt="image" src="https://github.com/user-attachments/assets/b0bf753a-f954-4470-92e1-69eb1d23b63a" />

### Tickets:
- CVS-184480

### AI Assistance:
- *AI assistance used: yes*
- *If yes, summarize how AI was used and what human validation was performed (build/tests/manual checks).*

---------

Signed-off-by: fishbell <bell.song@intel.com>
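
For readers unfamiliar with the idea behind a post-verification KV reorder, the following is a minimal conceptual sketch, not the pa_kv_reorder implementation from this commit. It assumes a paged cache addressed through a block table, and that after speculative decoding the accepted draft tokens sit at scattered logical positions that must be compacted into contiguous slots. All names here (`BLOCK_SIZE`, `locate`, `reorder_kv`) are hypothetical, and strings stand in for real key/value tensors.

```python
# Hypothetical sketch: not the pa_kv_reorder implementation from the commit.
BLOCK_SIZE = 4  # assumed number of token slots per cache block


def locate(block_table, pos):
    """Map a logical token position to (physical block id, offset in block)."""
    return block_table[pos // BLOCK_SIZE], pos % BLOCK_SIZE


def reorder_kv(cache, block_table, src_positions, dst_positions):
    """Copy cache entries from src logical slots to dst logical slots, in place.

    Assumes src positions are strictly increasing and dst[i] <= src[i],
    so an in-order copy never overwrites a source entry it still needs.
    """
    for src, dst in zip(src_positions, dst_positions):
        if src == dst:
            continue  # entry is already in its final slot
        sb, so = locate(block_table, src)
        db, do = locate(block_table, dst)
        cache[db][do] = cache[sb][so]


# Physical cache: 4 blocks of BLOCK_SIZE slots each.
cache = [[None] * BLOCK_SIZE for _ in range(4)]
block_table = [2, 0, 3]  # logical block i lives in physical block block_table[i]

# 3 committed tokens followed by 6 draft candidates at logical positions 3..8.
for pos in range(9):
    b, o = locate(block_table, pos)
    cache[b][o] = f"kv{pos}"

# Suppose verification accepted the candidates at positions 3, 5 and 8;
# they should end up contiguous at positions 3, 4 and 5.
accepted = [3, 5, 8]
targets = [3, 4, 5]
reorder_kv(cache, block_table, accepted, targets)

compacted = [cache[b][o] for b, o in (locate(block_table, p) for p in range(6))]
print(compacted)  # ['kv0', 'kv1', 'kv2', 'kv3', 'kv5', 'kv8']
```

The in-place copy is only safe under the ordering assumption stated in the docstring; a reorder with arbitrary source and destination pairs would need a temporary buffer or a dependency-aware copy order.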