[GPU] Enable SDPA Fusion for 3D inputs and implicit GQA broadcasting (#36029)
This PR extends the UnsqueezeBroadcastReshapeSDPAFusion pass to match
a wider range of attention topologies, specifically multi-query
attention (MQA) and grouped-query attention (GQA) patterns.
Key Enhancements:
1) 3D Input Support: The pass now matches and reshapes 3D Key/Value
tensors that feed into Unsqueeze or Reshape nodes.
2) GQA Implicit Broadcasting: Previously, the fusion aborted if the
model required expanding KV heads to match a larger number of Query
heads. This PR introduces dynamic shape extraction to bypass explicit
Broadcast nodes. The SDPA kernel now performs the broadcast implicitly
in registers, for example mapping 2 KV heads across 32 Query heads
(a 1:16 ratio). This avoids materializing the expanded Key/Value
tensors and saves the memory bandwidth they would otherwise consume.
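
The head mapping behind the implicit broadcast can be sketched as
follows. This is an illustrative pure-Python model, not the kernel or
the pass itself: the function names are hypothetical, and it assumes
the usual GQA layout in which each KV head serves a contiguous group of
Query heads, which is what an Unsqueeze -> Broadcast -> Reshape chain
produces.

```python
def kv_head_for_query_head(q_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """Return the KV head that a given Query head reads from (hypothetical helper)."""
    if num_kv_heads <= 0 or num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a positive multiple of num_kv_heads")
    if not 0 <= q_head < num_q_heads:
        raise IndexError("q_head out of range")
    group_size = num_q_heads // num_kv_heads  # Query heads per KV head
    return q_head // group_size


def explicit_broadcast(kv_heads: list, num_q_heads: int) -> list:
    """Model the unfused path: repeat each KV head group_size times, then flatten."""
    if not kv_heads or num_q_heads % len(kv_heads) != 0:
        raise ValueError("num_q_heads must be a positive multiple of the KV head count")
    group_size = num_q_heads // len(kv_heads)
    # Unsqueeze + Broadcast duplicates each head; Reshape flattens the groups.
    return [head for head in kv_heads for _ in range(group_size)]


if __name__ == "__main__":
    kv = ["kv0", "kv1"]
    expanded = explicit_broadcast(kv, num_q_heads=32)
    # The index mapping selects the same head without building `expanded`.
    for q in range(32):
        assert expanded[q] == kv[kv_head_for_query_head(q, 32, 2)]
```

With 2 KV heads and 32 Query heads, Query heads 0-15 read KV head 0 and
Query heads 16-31 read KV head 1, so the expanded tensor never has to be
written out.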
### Tickets:
- CVS-186566, CVS-187072
### AI Assistance:
- *AI assistance used: yes*
- Debug and test generation