fix fp_layers issue and force FP16 on CUDA for autoround format inference (#326)
* fix merge error
* fix fp_layers issues
* Loosen the restrictions of lm-eval
* fix and add UT
* fix
* API usage does not support fuzzy match
* fix UT bug
---------
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Co-authored-by: WeiweiZhang1 <weiwei1.zhang@intel.com>