Allow generic pipeline to accept some params for cross attention (#16519)
Allow `GemmSoftmaxGemmPermuteGenericPipeline<T>` to be used for some
cross-attention cases, so that rocBLAS is chosen instead of CK when
rocBLAS performs better on small problems. This gives a ~20% end-to-end
time reduction on some test cases for whisper large.
**Note:** This is needed because CK has a performance issue when the
sequence length is only 1, which should be improved in the future.
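
As a rough illustration of the selection described above, the sketch below
picks a backend from the problem size. It is not the code in this change:
`AttentionProblem`, `Backend`, `ChooseBackend`, and the threshold of 1 are
hypothetical names and values chosen only to show the idea, and the real
pipeline may decide differently (for example through tuning).

```cpp
#include <cstdint>

// Hypothetical problem description; the field names are illustrative only
// and do not mirror the actual parameter struct used by the pipeline.
struct AttentionProblem {
  int64_t batch;
  int64_t num_heads;
  int64_t sequence_length;
  int64_t head_size;
};

// Hypothetical backend tags for the two code paths discussed above.
enum class Backend {
  kCK,              // fused CK kernel
  kGenericRocblas,  // generic pipeline built on rocBLAS GEMMs
};

// Assumed heuristic: fall back to the generic rocBLAS-based pipeline when
// the sequence length is 1, where CK is reported to be slow.
constexpr Backend ChooseBackend(const AttentionProblem& p) {
  return p.sequence_length <= 1 ? Backend::kGenericRocblas : Backend::kCK;
}
```

With this sketch, a decoding step with `sequence_length == 1` would take the
generic path, while longer sequences would stay on CK.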