onnxruntime
16d7f551 - lora conv1d replacement (#16643)

Commit
2 years ago
lora conv1d replacement (#16643)

LoRA code uses conv1d to do the projection for qkv. The conv1d calculation is mathematically equivalent to matmul, and matmul is much faster than conv1d.

The substitution made by the graph optimizer is:
1 conv1d >> 2 split + 1 squeeze + group_num matmul + 1 concat

With this optimizer, we see a 10%+ improvement on one 1P model.
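The equivalence behind the substitution can be checked numerically. The following is a minimal NumPy sketch, not the optimizer's actual code: it assumes a kernel size of 1, stride 1 and no padding or bias (the commit message does not state these conditions), and the function names `grouped_conv1d_k1` and `split_matmul_concat` are hypothetical names chosen for illustration.

```python
import numpy as np


def grouped_conv1d_k1(x, w, groups):
    """Naive reference: grouped conv1d with kernel size 1, stride 1, no padding.

    x has shape (N, C_in, L); w has shape (C_out, C_in // groups, 1).
    """
    n, c_in, length = x.shape
    c_out = w.shape[0]
    in_per_group = c_in // groups
    out_per_group = c_out // groups
    y = np.zeros((n, c_out, length), dtype=x.dtype)
    for g in range(groups):
        for o in range(out_per_group):
            oc = g * out_per_group + o
            for i in range(in_per_group):
                ic = g * in_per_group + i
                # With kernel size 1, each output position is a weighted
                # sum over the input channels of the same group.
                y[:, oc, :] += w[oc, i, 0] * x[:, ic, :]
    return y


def split_matmul_concat(x, w, groups):
    """Same computation as 2 split + 1 squeeze + group_num matmul + 1 concat."""
    w2d = np.squeeze(w, axis=-1)              # squeeze: drop the size-1 kernel axis
    x_parts = np.split(x, groups, axis=1)     # split 1: input channels per group
    w_parts = np.split(w2d, groups, axis=0)   # split 2: output channels per group
    # One matmul per group: (out_per_group, in_per_group) @ (N, in_per_group, L)
    outs = [wp @ xp for wp, xp in zip(w_parts, x_parts)]
    return np.concatenate(outs, axis=1)       # concat along the channel axis
```

Calling both functions on the same random `x` and `w` and comparing the results with `np.allclose` is a quick way to confirm that the two paths agree for a given `groups` value.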