onnxruntime
d7b48f82 - [CUDA] Correct after_gather_dim for nibbled uint4 index (#26484)

Commit
41 days ago
### Description

In the CUDA backend, `after_gather_dim` now supports only the uint8 dtype. This PR makes sure the indexing in `gather_block_quantized` is correct for nibble-packed 4-bit weights.

### Motivation and Context

This allows `token_embeddings` and `lm_head` to be tied in 4-bit weights, which saves space and compresses models further.
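To illustrate why indexing needs care when two 4-bit values share one byte, here is a minimal pure-Python sketch. It is not the CUDA kernel from this PR: the helper names `pack_uint4` and `gather_packed_row` are hypothetical, and the low-nibble-first packing order is an assumption made for the example. The point it shows is that the row width is counted in logical 4-bit elements while the storage is addressed in bytes, so each logical index has to be split into a byte index and a nibble selector.

```python
def pack_uint4(values):
    """Pack an even-length list of 4-bit values into bytes.

    Assumption for this sketch: the first value of each pair goes
    into the low nibble, the second into the high nibble.
    """
    if len(values) % 2:
        raise ValueError("expected an even number of 4-bit values")
    return [
        (values[i] & 0xF) | ((values[i + 1] & 0xF) << 4)
        for i in range(0, len(values), 2)
    ]


def gather_packed_row(packed, row, hidden):
    """Gather one logical row from a flat, nibble-packed table.

    `hidden` is the number of logical 4-bit elements per row,
    not the number of bytes per row.
    """
    out = []
    for col in range(hidden):
        elem = row * hidden + col   # logical element index
        byte = packed[elem // 2]    # two elements are stored per byte
        if elem % 2:
            out.append((byte >> 4) & 0xF)  # odd index: high nibble
        else:
            out.append(byte & 0xF)         # even index: low nibble
    return out


# A toy 3 x 4 "embedding table" of 4-bit values.
hidden = 4
table = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
packed = pack_uint4([v for r in table for v in r])

# Gather rows for token ids 2 and 0.
rows = [gather_packed_row(packed, token, hidden) for token in (2, 0)]
```

In this toy layout the 12 logical values occupy 6 bytes, so a row stride computed in bytes instead of logical elements (or the reverse) would be off by a factor of two and would read the wrong rows.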