onnxruntime
af89496f - Allow generic pipeline to accept some params for cross attention (#16519)

Commit
2 years ago
Allow `GemmSoftmaxGemmPermuteGenericPipeline<T>` to be used in some cross-attention cases, so that rocBLAS is chosen instead of CK when rocBLAS performs better on small problems. The improvement is a ~20% end-to-end time reduction on some test cases for Whisper large.

**Note:** This is needed because CK has a performance issue when the sequence length is only 1, which should be improved in the future.