[CPU][PA] Align bidirectional image attention and sliding window interaction with reference (#35446)
### Details:
- When both bidirectional image attention and a sliding window were
active, the CPU plugin computed the attention start index as
`start_idx = image_group_end - sliding_window`,
which clipped the image group to the sliding window.
The reference behavior (implemented by transformers and the GPU
plugin) does not clip image attention: the image group is attended as a
whole block, regardless of the sliding window size.
- Example: assume one group of 6 image tokens and a sliding window of size 5.
**Before** the fix:
Attention mask: `[1, 6)` - first token cut
**After** the fix:
Attention mask: `[0, 6)` - full group
When the attention mask is calculated for a text token, attention is
regular causal/sliding-window attention, regardless of whether the previous
tokens were image or text, which matches the transformers implementation.
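
A minimal Python sketch of the range logic described above, for illustration only; it is not the plugin's code. The function names and the half-open `(start, end)` tuples are hypothetical, and the text-token helper assumes the window counts the query token itself, which the description does not specify.

```python
def image_range_before_fix(group_start, group_end, sliding_window):
    """Old CPU behavior as described above: the group is clipped by the window."""
    # start_idx = image_group_end - sliding_window, kept inside the group here.
    start = max(group_start, group_end - sliding_window)
    return (start, group_end)


def image_range_after_fix(group_start, group_end, sliding_window):
    """Fixed behavior: the whole image group is visible."""
    # sliding_window is intentionally unused: it no longer clips the group.
    return (group_start, group_end)


def text_range(token_idx, sliding_window):
    """Causal sliding window for a text token.

    Assumed convention: the window includes the query token itself.
    """
    start = max(0, token_idx - sliding_window + 1)
    return (start, token_idx + 1)


# The example above: 6 image tokens, sliding window of size 5.
print(image_range_before_fix(0, 6, 5))  # (1, 6)
print(image_range_after_fix(0, 6, 5))   # (0, 6)
```

The text-token helper never looks at token types, mirroring the statement that text tokens use regular causal/sliding-window attention whether the preceding tokens were image or text.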
### Tickets:
- CVS-185393
### AI Assistance:
- *AI assistance used*: yes, to write tests and to check that the changes
don't affect other aspects/inputs of PA
---------
Signed-off-by: p-wysocki <przemyslaw.wysocki@intel.com>