MLAS: MlasSgemm refactoring (#1749)
Refactor the SGEMM kernels to resynchronize the code between Windows/Linux and remove unneeded binary bloat from a different zero/add mode kernel. Another goal is to get to a cleaner state for then doing a DGEMM kernel.