Support MXFP/NVFP quantization of lm_head (#1051)
* fix FP8 export bug
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* refine ExLlama backend CUDA UT
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* add act_max hook for the lm_head layer; enable MXFP/NVFP lm_head export
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix typo
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* fix typo
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix UT typo
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* refine logs; fix pack_layer for AWQ & GPTQ
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* refine logs; fix pack_layer for AWQ & GPTQ
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add AWQ & GPTQ lm_head UT
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix local path
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
---------
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>