onnxruntime
Route fp16 HQNBIT_CompInt8 (4-bit and 8-bit) through fp32 MLAS path in MatMulNBits
#27820
Merged

Commits
  • Route fp16 HQNBIT_CompInt8 through fp32 MLAS path for 4-bit and 8-bit
    jambayk committed 91 days ago
  • Remove dead HQ4BitGemm_CompInt8 and HQ8BitGemm_CompInt8 MLAS code
    jambayk committed 90 days ago
  • lint
    jambayk committed 90 days ago
  • Fix HQNBIT_CompInt8 PrePack bugs for 4-bit and 8-bit
    jambayk committed 90 days ago
  • Address review: ORT_ENFORCE for scales, move SQNBIT check to GetComputeType
    jambayk committed 90 days ago