Make requantize a qgemm post processor (#7850)
Description:
Change the requantize interface so that it can process the output block by block. This enables us to make requantize a post processor of QGEMM.
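As a rough illustration of the block-by-block idea (not the actual code in this change), the sketch below requantizes each tile of the int32 accumulator as soon as that tile is computed. The names `RequantizeProcessor` and `QGemmBlocked`, the per-tensor scale and zero point, and the naive inner loops are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical post processor: requantizes one block of the int32
// accumulator matrix into a uint8 output matrix.
struct RequantizeProcessor {
    std::uint8_t* output;      // destination matrix (row-major)
    std::size_t ldo;           // leading dimension of the output
    float scale;               // assumed per-tensor requantization scale
    std::uint8_t zero_point;   // assumed per-tensor output zero point

    void Process(const std::int32_t* c, std::size_t ldc,
                 std::size_t start_m, std::size_t start_n,
                 std::size_t count_m, std::size_t count_n) const {
        for (std::size_t m = start_m; m < start_m + count_m; ++m) {
            for (std::size_t n = start_n; n < start_n + count_n; ++n) {
                float v = std::nearbyint(static_cast<float>(c[m * ldc + n]) * scale) +
                          static_cast<float>(zero_point);
                v = std::min(255.0f, std::max(0.0f, v));  // saturate to uint8 range
                output[m * ldo + n] = static_cast<std::uint8_t>(v);
            }
        }
    }
};

// Hypothetical naive blocked QGEMM: C = A * B with int32 accumulation.
// The post processor runs on each tile right after the tile is finished,
// so no separate whole-matrix requantize pass is needed afterwards.
void QGemmBlocked(const std::uint8_t* a, const std::uint8_t* b, std::int32_t* c,
                  std::size_t M, std::size_t N, std::size_t K,
                  std::size_t block_m, std::size_t block_n,
                  const RequantizeProcessor* post) {
    assert(block_m > 0 && block_n > 0);
    for (std::size_t m0 = 0; m0 < M; m0 += block_m) {
        const std::size_t cm = std::min(block_m, M - m0);
        for (std::size_t n0 = 0; n0 < N; n0 += block_n) {
            const std::size_t cn = std::min(block_n, N - n0);
            for (std::size_t m = m0; m < m0 + cm; ++m) {
                for (std::size_t n = n0; n < n0 + cn; ++n) {
                    std::int32_t acc = 0;
                    for (std::size_t k = 0; k < K; ++k) {
                        acc += static_cast<std::int32_t>(a[m * K + k]) *
                               static_cast<std::int32_t>(b[k * N + n]);
                    }
                    c[m * N + n] = acc;
                }
            }
            if (post != nullptr) {
                post->Process(c, N, m0, n0, cm, cn);  // requantize this tile only
            }
        }
    }
}
```

Because each tile is fully requantized inside the GEMM call, every GEMM in a batch becomes self-contained in this sketch, which is the property that lets the batch be split across threads.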
Motivation and Context
Previous changes showed that we can improve performance by parallelizing batched GEMM. Unfortunately, we could not parallelize the batched GEMM in quantize_linear_matmul because of the requantize operation at the end of each GEMM. By changing requantize to be a QGEMM post processor, we can now parallelize the batch operation.
Co-authored-by: Chen Fu <fuchen@microsoft.com>