openvino
f1c20045 - [CPU][PA] Align bidirectional image attention and sliding window interaction with reference (#35446)

Commit

26 days ago

[CPU][PA] Align bidirectional image attention and sliding window interaction with reference (#35446)

### Details:
 - When both bidirectional image attention and a sliding window were active, the CPU plugin computed the attention start index as `start_idx = image_group_end - sliding_window`. The correct behavior (implemented by transformers and the GPU plugin) is not to clip the image attention, but to pass the image group as a whole block, regardless of the sliding window size.
 - Example: assume 6 image tokens and a sliding window of size 5.
   - **Before** the fix, the attention mask is `[1, 6)`: the first token is cut.
   - **After** the fix, the attention mask is `[0, 6)`: the full group.

   When the attention mask is calculated for a text token, the attention is regular causal/sliding window, regardless of whether the previous tokens were image or text, which matches the transformers implementation.

### Tickets:
 - CVS-185393

### AI Assistance:
 - *AI assistance used*: yes, to write tests and make sure the changes don't impact other aspects/inputs of PA

---------

Signed-off-by: p-wysocki <przemyslaw.wysocki@intel.com>
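The start-index change described in this commit message can be illustrated with a minimal Python sketch. This is not the CPU plugin code: `find_image_group`, `attention_range`, and the `clip_image_group` flag are hypothetical names introduced here. The sketch assumes half-open `[start, end)` ranges and a sliding window that covers the query token plus the `sliding_window - 1` tokens before it. It also assumes that an image token's range is the union of its image group and its causal window. Clamping the clipped start at 0 is likewise an assumption.

```python
from typing import List, Optional, Tuple

Range = Tuple[int, int]  # half-open [start, end) range of key positions


def find_image_group(pos: int, image_groups: List[Range]) -> Optional[Range]:
    """Return the image group containing pos, or None for a text token."""
    for start, end in image_groups:
        if start <= pos < end:
            return (start, end)
    return None


def attention_range(
    query_pos: int,
    sliding_window: int,
    image_groups: List[Range],
    clip_image_group: bool = False,
) -> Range:
    """Key range attended to by the token at query_pos.

    clip_image_group=True models the pre-fix formula quoted in the commit
    message; False models the behavior described as correct.
    """
    # Regular causal sliding window (assumed to include the query token).
    causal_end = query_pos + 1
    causal_start = max(0, causal_end - sliding_window)

    group = find_image_group(query_pos, image_groups)
    if group is None:
        # Text token: plain causal/sliding-window attention.
        return (causal_start, causal_end)

    group_start, group_end = group
    if clip_image_group:
        # Pre-fix: start_idx = image_group_end - sliding_window
        # (clamping at 0 is an assumption of this sketch).
        return (max(0, group_end - sliding_window), group_end)

    # Post-fix: the whole image group is visible, whatever the window size.
    # Extending back to causal_start is an assumption of this sketch.
    return (min(group_start, causal_start), group_end)


if __name__ == "__main__":
    groups = [(0, 6)]  # 6 image tokens, as in the commit's example
    print(attention_range(0, 5, groups, clip_image_group=True))
    print(attention_range(0, 5, groups))
```

With 6 image tokens and a window of 5, the clipped variant yields `(1, 6)` and the unclipped variant yields `(0, 6)`, matching the before and after masks in the example.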

References

#35446 - [CPU][PA] Align bidirectional image attention and sliding window interaction with reference

Author

p-wysocki

Parents

540f6b2c
