[QNN-EP] Add MatMulNBits translation for GPU (#26340)
### Description
Add support for translating the MatMulNBits contrib op to the QNN FullyConnected operation with INT4 block-quantized weights.
Implementation details:
- Translate MatMulNBits to FullyConnected in the OpBuilder
- Support QNN_QUANTIZATION_ENCODING_BLOCK for INT4 weights
- Pass the INT4 weights and quantization parameters to QNN as BlockQuantization encoding params
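To illustrate the semantics being translated, here is a minimal NumPy sketch of block-dequantizing INT4 weights and applying them as a FullyConnected (matmul). This is an illustrative reference only, not the QNN-EP implementation; the function names, layouts (weights shaped `(K, N)` with per-block scales/zero-points along `K`), and unpacked 4-bit values are assumptions for the example.

```python
import numpy as np

def dequantize_int4_blocks(q, scales, zero_points, block_size):
    """Dequantize block-quantized INT4 weights (illustrative layout).

    q:           (K, N) int8 array of unpacked 4-bit values in [0, 15]
    scales:      (K // block_size, N) per-block scales
    zero_points: (K // block_size, N) per-block zero points
    """
    k, n = q.shape
    num_blocks = k // block_size
    # Group rows into blocks so each block shares one scale/zero-point.
    qb = q.reshape(num_blocks, block_size, n).astype(np.float32)
    deq = (qb - zero_points[:, None, :]) * scales[:, None, :]
    return deq.reshape(k, n)

def matmul_nbits_reference(x, q, scales, zero_points, block_size):
    """Reference: MatMulNBits is equivalent to a FullyConnected
    (plain matmul) over the block-dequantized weights."""
    return x @ dequantize_int4_blocks(q, scales, zero_points, block_size)
```

The QNN BlockQuantization encoding carries the same per-block scale/offset information, so the backend can consume the packed INT4 weights directly instead of materializing the dequantized matrix as this sketch does.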
Testing:
- Added new unit tests for the MatMulNBits -> QNN-GPU path
- Validated all OnnxRuntime tests
- Validated the following LLMs through the Olive and ORT-GenAI execution flow:
  - Llama 3.2 1B
  - Qwen2.5
  - DeepSeek-R1-Qwen 1.5B
  - Phi-3.5-mini-instruct
### Motivation and Context
Running an LLM through Olive's INT4 quantization pass produces a model containing MatMulNBits contrib ops.
To run these ops via QNN-EP, MatMulNBits is translated to the QNN FullyConnected op with INT4 weights.
---------
Co-authored-by: tirupath-qti <tirupath@qti.qualcomm.com>