[Matrix] Fix cbuffers support for matrix element expr (#185471)
Fixes #184877.
This change has three parts:
1. copy the padded cbuffer from memory to a local alloca
2. switch to using the new `getFlattenedIndex` helpers for index
generation
3. convert row-major indices to column-major indices in codegen, depending
on LangOptions
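Steps 2 and 3 both reduce to mapping a matrix subscript onto an offset in flat
storage. Below is a minimal sketch that assumes a dense matrix with no cbuffer
padding. `MatrixLayout`, `flattenedIndex`, and `rowMajorToColumnMajor` are
hypothetical names chosen for illustration. They are not the signatures of the
`getFlattenedIndex` helpers in Clang, and the real code reads the layout from
LangOptions instead of an enum.

```cpp
#include <cstddef>

// Hypothetical layout tag, for illustration only. Clang's codegen consults
// LangOptions rather than an enum like this one.
enum class MatrixLayout { RowMajor, ColumnMajor };

// Hypothetical helper (not Clang's getFlattenedIndex). It maps a (row, col)
// subscript on a numRows x numCols matrix to an offset in dense flat storage.
constexpr std::size_t flattenedIndex(std::size_t row, std::size_t col,
                                     std::size_t numRows, std::size_t numCols,
                                     MatrixLayout layout) {
  return layout == MatrixLayout::RowMajor
             ? row * numCols + col   // each row is contiguous
             : col * numRows + row;  // each column is contiguous
}

// Takes a row-major flat index and returns the column-major flat index that
// addresses the same (row, col) element.
constexpr std::size_t rowMajorToColumnMajor(std::size_t rowMajorIdx,
                                            std::size_t numRows,
                                            std::size_t numCols) {
  std::size_t row = rowMajorIdx / numCols;
  std::size_t col = rowMajorIdx % numCols;
  return flattenedIndex(row, col, numRows, numCols, MatrixLayout::ColumnMajor);
}

// In a 2x3 matrix, element (0, 1) lives at offset 1 in row-major storage
// but at offset 2 in column-major storage.
static_assert(flattenedIndex(0, 1, 2, 3, MatrixLayout::RowMajor) == 1,
              "row-major offset of (0, 1)");
static_assert(flattenedIndex(0, 1, 2, 3, MatrixLayout::ColumnMajor) == 2,
              "column-major offset of (0, 1)");
static_assert(rowMajorToColumnMajor(1, 2, 3) == 2,
              "row-major offset 1 maps to column-major offset 2");
```

Because the helpers are `constexpr`, the `static_assert`s verify the mapping at
compile time. For square matrices the conversion is a transpose of the index.
For non-square matrices the two strides differ (`numCols` versus `numRows`),
so applying the wrong layout addresses the wrong element.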