Add BF16 kernels in several ops for Gemma-3 (#26102)
### Description
This PR adds missing bfloat16 kernels for several ops in both the
`ai.onnx` and `com.microsoft` domains:
1. `SkipLayerNormalization` [contrib
op](https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#commicrosoftskiplayernormalization)
2. `Conv` for [opset 22](https://onnx.ai/onnx/operators/onnx__Conv.html)
3. `Pow` for [opset 15](https://onnx.ai/onnx/operators/onnx__Pow.html)
4. `AveragePool` for [opset
22](https://onnx.ai/onnx/operators/onnx__AveragePool.html)

This PR also enables weight-only quantization of a bfloat16 `MatMul` op
to a bfloat16 `MatMulNBits` [contrib
op](https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#commicrosoftmatmulnbits).
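For reviewers less familiar with the format, the snippet below is a small NumPy sketch (not code from this PR) of what bfloat16 rounding does to float32 values: bfloat16 keeps only the top 16 bits of a float32, so the kernels added here trade mantissa precision for the same dynamic range. The helper name `to_bfloat16` is hypothetical and purely illustrative.

```python
import numpy as np

def to_bfloat16(x: np.ndarray) -> np.ndarray:
    """Emulate float32 -> bfloat16 rounding (round-to-nearest-even),
    returning the rounded values widened back to float32 for inspection.
    NaN payloads are not specially handled in this sketch."""
    bits = x.astype(np.float32).view(np.uint32)
    # bfloat16 keeps the top 16 bits of a float32; adding 0x7FFF plus the
    # lowest kept mantissa bit before truncating gives round-to-nearest-even
    rounded = bits + 0x7FFF + ((bits >> 16) & 1)
    return (rounded & 0xFFFF0000).view(np.float32)

# pi in float32 (0x40490FDB) rounds to 3.140625 (0x40490000) in bfloat16
print(to_bfloat16(np.array([1.0, np.pi], dtype=np.float32)))
```

Exactly representable values such as 1.0 pass through unchanged, while values with more than 7 mantissa bits are rounded, which is why op outputs can differ slightly from their float32 counterparts.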
### Motivation and Context
This PR enables running ONNX models from the Gemma-3 family that are
generated with bfloat16 precision.