Make head_size parameter dependent on qkv_hidden_size (#12933)
**Description**: Add qkv_hidden_size support to the CUDA Attention layer
implementation.
Changes include:
- Modify unit tests to exercise both the GPU and CPU implementations
- Add an overload of the CUDA kernel `AddBiasTransposeQKV` to support the
scenario where V_HIDDEN_SIZE != QK_HIDDEN_SIZE
- Replace uses of `head_size` with `qkv_head_sizes[0]` or
`qkv_head_sizes[2]` as appropriate
- Modify function signatures to pass through `qkv_hidden_sizes` or
`qkv_head_sizes`
Note that this feature is not currently supported in the ROCm EP or in
quantized attention.
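
To clarify the relationship between the two attributes, here is a minimal
illustrative sketch (not the actual ONNX Runtime code; the function name
`get_qkv_head_sizes` is hypothetical) of how per-component head sizes can
be derived from `qkv_hidden_sizes` and `num_heads` when the V hidden size
differs from the Q/K hidden size:

```python
# Hypothetical sketch: deriving per-component head sizes when the
# V hidden size differs from the Q/K hidden size.

def get_qkv_head_sizes(qkv_hidden_sizes, num_heads):
    """qkv_hidden_sizes is [q_hidden, k_hidden, v_hidden].

    Q and K must share a head size so that Q*K^T is well defined;
    V may use a different head size. Each hidden size must be
    divisible by num_heads.
    """
    q, k, v = qkv_hidden_sizes
    assert q == k, "Q and K hidden sizes must match"
    assert all(h % num_heads == 0 for h in (q, k, v))
    return [q // num_heads, k // num_heads, v // num_heads]

# Example: 12 heads, Q/K hidden size 768, V hidden size 1536
# yields head sizes [64, 64, 128].
sizes = get_qkv_head_sizes([768, 768, 1536], num_heads=12)
print(sizes)
```

With equal hidden sizes this reduces to the single `head_size` used
before this change; the `qkv_head_sizes[0]`/`qkv_head_sizes[2]`
distinction only matters when V differs.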
**Motivation and Context**
- Why is this change required? What problem does it solve? The current
CUDA implementation of the attention layer doesn't support the
`qkv_hidden_size` parameter added to the CPU implementation in PR
[8039](https://github.com/microsoft/onnxruntime/pull/8039).
Co-authored-by: Peter Mcaughan <petermca@microsoft.com>