onnxruntime
d7b48f82 - [CUDA] Correct after_gather_dim for nibbled uint4 index (#26484)

Commit

41 days ago

[CUDA] Correct after_gather_dim for nibbled uint4 index (#26484)

### Description

The after_gather_dim in the CUDA backend currently supports only the uint8 dtype. This PR ensures that indexing in gather_block_quantized is correct for nibbled 4-bit weights, where two 4-bit values are packed into each uint8 byte.

### Motivation and Context

This allows token_embeddings and lm_head to be tied with 4-bit weights, which saves space and compresses models further.
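To illustrate why indexing needs care with nibbled 4-bit weights, here is a minimal pure-Python sketch of gathering rows from a packed uint4 matrix. It is not the CUDA kernel from this commit: the helper names (`pack_uint4`, `read_uint4`, `gather_rows`) are hypothetical, and the low-nibble-first packing order is an assumption made for the example. The point it shows is that offsets must be computed in 4-bit elements and only then converted to a byte index and a nibble selector.

```python
def pack_uint4(values):
    """Pack an even-length list of 4-bit values into bytes.

    Assumption for this sketch: the even-indexed value goes in the low nibble.
    """
    if len(values) % 2:
        raise ValueError("expected an even number of 4-bit values")
    if any(not 0 <= v <= 15 for v in values):
        raise ValueError("values must fit in 4 bits")
    return bytes(values[i] | (values[i + 1] << 4) for i in range(0, len(values), 2))


def read_uint4(packed, logical_index):
    """Read one 4-bit value by its logical (element) index."""
    byte = packed[logical_index // 2]      # two elements share each byte
    if logical_index % 2:
        return (byte >> 4) & 0xF           # odd element: high nibble
    return byte & 0xF                      # even element: low nibble


def gather_rows(packed, logical_cols, row_ids):
    """Gather rows of a [rows, logical_cols] uint4 matrix stored packed."""
    out = []
    for r in row_ids:
        base = r * logical_cols            # offset counted in 4-bit elements, not bytes
        out.append([read_uint4(packed, base + c) for c in range(logical_cols)])
    return out


table = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
packed = pack_uint4([v for row in table for v in row])
print(gather_rows(packed, 4, [2, 0]))
```

In this sketch the stored row is `logical_cols // 2` bytes wide, so using the byte count where the element count is expected (or the reverse) shifts every gathered row by a factor of two.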

References

#26484 - [CUDA] Correct after_gather_dim for nibbled uint4 index

Author

jixiongdeng

Parents

760eea48
