[HLSL][Matrix] Add `half` type overloads to `mul` and exercise them (#185506)
PR #184882 was missing `half` type-specific overloads for `mul`.
This PR adds the missing `half` overloads for `mul`, along with
additional codegen tests for the `half` type.
It also adds f16 tests for the lowering of `llvm.matrix.multiply`.
The offload test suite already has a `mul.fp16` test that exercises `half`
types at runtime, so no change is needed there.
Assisted-by: claude-opus-4.6