onnxruntime
d0b08af3 - Implementation of QAttention for the DNNL execution provider (#10004)

Commit

4 years ago

Implementation of QAttention for the DNNL execution provider (#10004)

* Add QAttention to DNNL EP

Add QAttention to DNNL EP (limited support and disable for gpu)
update ONEDNN version to 2.4.4
bug fix in getcapability
add memory debug print

Signed-off-by: Wang <zhaoyang.wang@intel.com>

* Address Code Review + MatMulInteger Fix

clean up code and add comments
fix matmulinteger and add fusion rule to enable initialized vector weight zero points of 0s
update DNNL_TAG to v2.5

Signed-off-by: Wang <zhaoyang.wang@intel.com>

* Linux Compile Fix + rollback ONEDNN to 2.4.4

Signed-off-by: Zhaoyang Wang <zhaoyang.wang@intel.com>

* Fix QAttention Debug build

Signed-off-by: Wang <zhaoyang.wang@intel.com>

* Fix QAttention build if USE_DNNL not specified

Signed-off-by: George Nash <george.nash@intel.com>

Co-authored-by: Wang <zhaoyang.wang@intel.com>
Co-authored-by: MTC <63478620+jeyblu@users.noreply.github.com>

References

#10004 - Implementation of QAttention for the DNNL execution provider

Author

georgen117

Parents

78775532
