ZCode FastFormers changes (#5827)
* Add FBGEMM submodule
* Add FBGEMM-based per-channel quantization
* Add missing logic for pre-layernorm transformer model fusion
* Add support for structured pruning architecture (FastFormers)
* Fix windows build
* Add a default behavior for backward compatibility when head_size is not present
* Remove FBGEMM and default to tensor-wise quantization; column-wise quantization will be enabled later
* Fix some unit test errors
* Fix windows compile error and unit test errors
* Delete the option removed from upstream
* Address review comments and fix a merge error
* Remove commented out code
* Add non-zero zero-point support
* Support A and B scales of any dimensions
* Fix build breaks
* Fix MSVC warning
* Fix a bug where original float value names were not checked before being treated as non-existent
* Clean up head size
* Clean up python tools
* Enable per column quantization
* Fix quantized-weight cleanup bug
* A few code clean-ups
* Some code clean-up
* Some code clean-up
* Change option name
* Update default value
* Rename option and parameter names
* Apply missing argument name change
* Add tests for quantization options for attention and matmul
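The pre-layernorm fusion mentioned above targets models whose transformer blocks normalize *before* each sublayer rather than after the residual add. A minimal numpy sketch of the two orderings (this illustrates the op pattern a fusion pass would match, not the actual onnxruntime fusion code; `sublayer` stands in for attention or FFN):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the last (hidden) dimension.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def pre_ln_block(x, sublayer):
    # Pre-LayerNorm: normalize first, then apply the sublayer and add the residual.
    return x + sublayer(layer_norm(x))

def post_ln_block(x, sublayer):
    # Post-LayerNorm (original BERT ordering): residual add first, then normalize.
    return layer_norm(x + sublayer(x))
```

Because LayerNorm sits at a different point in the graph, a fusion pattern written for the post-LN ordering will not match a pre-LN model, which is why the separate fusion logic was needed.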
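Several entries above concern per-column (per-channel) quantization with non-zero zero points. A minimal numpy sketch of the idea, assuming asymmetric uint8 quantization (this is an illustration of the technique, not the onnxruntime quantization tool's implementation):

```python
import numpy as np

def quantize_per_column(B, num_bits=8):
    """Asymmetric per-column quantization of a 2-D weight matrix.

    Each column gets its own scale and (possibly non-zero) zero point,
    so a column with a small range is not crushed by one coarse
    tensor-wide scale.
    """
    qmin, qmax = 0, 2 ** num_bits - 1            # uint8 range [0, 255]
    col_min = np.minimum(B.min(axis=0), 0.0)      # include 0 so it is exactly representable
    col_max = np.maximum(B.max(axis=0), 0.0)
    scale = (col_max - col_min) / (qmax - qmin)
    scale = np.where(scale == 0, 1.0, scale)      # guard all-zero columns
    zero_point = np.round(qmin - col_min / scale).astype(np.uint8)
    q = np.clip(np.round(B / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Recover approximate float values: (q - zp) * scale, column-wise.
    return (q.astype(np.float32) - zero_point) * scale
```

Tensor-wise quantization is the same computation with a single min/max over the whole matrix; moving to per-column scales and zero points is what the "Enable per column quantization" change exposes as an option.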
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
Co-authored-by: Lei Zhang <zhang.huanning@hotmail.com>