vulkan: Use larger loads in scalar/coopmat1 matmul (#15729)
I think glslang will translate an access like x[i][1].z to
OpAccessChain ... x, i, 1, 2
OpLoad float16_t ...
rather than loading all of x[i] in a single OpLoad. Change the
code to explicitly load the vector/matrix.