Add support for custom position ids and attention bias to GQA CPU operator (#23944)
### Description
- Added support for custom position ids and attention masks to the GQA CPU operator (FP32 and FP16); see the wiring sketch after this list
- Added MLAS eltwise add kernel for mask application for FP32 and FP16
- Added unit tests for the added eltwise add MLAS kernel
- Modified python tests to test the new GQA inputs
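As a rough illustration of how the new inputs plug in, here is a minimal sketch that builds a `GroupQueryAttention` node (com.microsoft domain) with the extra inputs appended. The input names (`position_ids`, `attention_bias`), their positions, and the shapes shown are assumptions for illustration, not the authoritative operator spec.

```python
import numpy as np
from onnx import helper

num_heads, kv_num_heads, head_size = 32, 32, 96
sequence_length, total_sequence_length = 6, 50

# GroupQueryAttention node with the two new optional inputs appended;
# empty key/value input names indicate packed-QKV mode.
gqa_node = helper.make_node(
    "GroupQueryAttention",
    inputs=[
        "packed_qkv",             # query carries the packed QKV tensor
        "", "",                   # key, value (unused in packed-QKV mode)
        "past_key", "past_value",
        "seqlens_k", "total_sequence_length",
        "cos_cache", "sin_cache",
        "position_ids",           # new: custom position ids (assumed input name)
        "attention_bias",         # new: additive attention mask/bias (assumed input name)
    ],
    outputs=["output", "present_key", "present_value"],
    domain="com.microsoft",
    num_heads=num_heads,
    kv_num_heads=kv_num_heads,
    do_rotary=1,
    local_window_size=-1,
)

# Plausible shapes for the new inputs (assumed, mirroring other attention ops):
#   position_ids:   (batch_size, sequence_length), int64
#   attention_bias: (batch_size, 1, sequence_length, total_sequence_length), QKV dtype
position_ids = np.arange(
    total_sequence_length - sequence_length, total_sequence_length, dtype=np.int64
)[None, :]
attention_bias = np.zeros((1, 1, sequence_length, total_sequence_length), dtype=np.float32)
```

Masked positions in `attention_bias` would typically hold a large negative value that is added to the QK scores, which is the elementwise-add step the new MLAS kernel is meant to handle on the CPU.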
### Motivation and Context
Custom position ids and attention masks are required to
implement speculative decoding in PhiSilica.
### Benchmarks
All benchmarks were executed on the GQA op configuration that will
be used in the PhiSilica speculative decoding scenario (a shape
sketch follows the list):
- num_heads: 32
- kv_num_heads: 32
- do_rotary: 1
- local_window_size: -1
- head_size: 96
- sequence_length: 6
- packed_qkv: True
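For context, a back-of-the-envelope sketch of the tensor sizes this configuration implies; the packed-QKV layout and the attention-bias shape are assumptions mirroring common attention-op conventions, not taken from the benchmark harness itself.

```python
# Rough shapes implied by the benchmark configuration above (assumed layout).
num_heads, kv_num_heads, head_size = 32, 32, 96
batch_size, sequence_length = 1, 6

# Packed QKV: Q, K and V heads concatenated along the hidden dimension.
packed_hidden = (num_heads + 2 * kv_num_heads) * head_size  # (32 + 64) * 96 = 9216
packed_qkv_shape = (batch_size, sequence_length, packed_hidden)

# The benchmark sweeps total sequence length (the table columns below);
# only the assumed attention-bias input grows with it.
for total_sequence_length in (50, 100, 250, 500, 1000, 2000, 4000):
    bias_shape = (batch_size, 1, sequence_length, total_sequence_length)
    print(packed_qkv_shape, bias_shape)
```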
Benchmarks were executed on Cadmus with a Snapdragon(R) X 12-core X1E80100
@ 3.40 GHz.
In the tables below, the column headers are the total sequence lengths
used for benchmarking, and the rows indicate whether the attention bias
was used. Values are average inference time in ms over 100000 runs.
#### Fp16 results
| Total sequence length | 50 | 100 | 250 | 500 | 750 | 1000 | 1500 | 2000 | 2500 | 3000 | 3500 | 4000 |
|:-----------------|:---------|:---------|:---------|:---------|:---------|:---------|:---------|:--------|:--------|:--------|:--------|:--------|
| Without bias | 0.284054 | 0.257449 | 0.275806 | 0.334123 | 0.458324 | 0.614133 | 0.912791 | 1.38585 | 1.92186 | 2.39203 | 2.88808 | 3.46262 |
| With bias | 0.250926 | 0.253072 | 0.279724 | 0.337774 | 0.499058 | 0.585388 | 0.914316 | 1.40701 | 1.87311 | 2.47475 | 3.3906 | 3.47474 |
| Runtime increase | -11.66% | -1.7% | +1.42% | +1.09% | +8.89% | -4.68% | +0.17% | +1.53% | -2.54% | +3.46% | +17.4% | +0.35% |
#### Fp32 results
| Total sequence length | 50 | 100 | 250 | 500 | 750 | 1000 | 1500 | 2000 | 2500 | 3000 | 3500 | 4000 |
|:-----------------|:---------|:---------|:---------|:---------|:---------|:---------|:--------|:--------|:--------|:--------|:--------|:--------|
| Without bias | 0.259049 | 0.270541 | 0.304583 | 0.376708 | 0.554013 | 0.633217 | 1.20696 | 1.65985 | 1.95169 | 2.45807 | 3.05637 | 4.05169 |
| With bias | 0.261631 | 0.268002 | 0.300853 | 0.370452 | 0.529865 | 0.735216 | 1.43493 | 1.4385 | 1.99028 | 2.3858 | 2.99425 | 4.80197 |
| Runtime increase | +1.0% | -0.94% | -1.22% | -1.66% | -4.36% | +16.11% | +18.89% | -13.34% | +1.98% | -2.94% | -2.03% | +18.52% |
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>