onnxruntime
730fab30 - Refactor Attention CUDA kernel (#17578)

Refactor Attention CUDA kernel (#17578)

* Break QkvToContext into smaller functions; each fused and unfused kernel gets its own function.
* Move the DecoderAttention kernel to a separate file.
* Move the KV-cache-related kernels to attention_kv_cache.cu.

### Motivation and Context
To make the code easier to maintain.
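The shape of the refactor can be sketched as a thin dispatcher over per-kernel functions. The following is a minimal, hypothetical C++/CUDA outline, not onnxruntime's actual code: only QkvToContext, DecoderAttention, and attention_kv_cache.cu come from the commit; AttentionParams, FusedAttention, and UnfusedAttention are illustrative stand-ins, and the real signatures differ.

```cpp
// Sketch only: illustrates the post-refactor structure under the
// assumptions named above; not onnxruntime's real API.
#include <cuda_runtime.h>
#include <cstdio>

struct AttentionParams {
  bool use_fused_kernel = false;  // a fused MHA kernel is available for this shape
  bool is_decoder = false;        // take the decoder-attention path
  // ... tensor pointers, batch/sequence/head sizes, scale, etc.
};

// After the refactor, each kernel family lives in its own function
// (and, per the commit, its own .cu file). Bodies are stubs here.
cudaError_t FusedAttention(const AttentionParams&, cudaStream_t) {
  return cudaSuccess;  // real version would launch the fused kernel
}
cudaError_t UnfusedAttention(const AttentionParams&, cudaStream_t) {
  return cudaSuccess;  // real version would run gemm + softmax + gemm
}
cudaError_t DecoderAttention(const AttentionParams&, cudaStream_t) {
  return cudaSuccess;  // real version moved to its own file per the commit
}

// QkvToContext shrinks to a dispatcher over the extracted functions.
cudaError_t QkvToContext(const AttentionParams& p, cudaStream_t stream) {
  if (p.is_decoder) return DecoderAttention(p, stream);
  if (p.use_fused_kernel) return FusedAttention(p, stream);
  return UnfusedAttention(p, stream);
}

int main() {
  AttentionParams p;
  p.use_fused_kernel = true;
  std::printf("dispatch ok: %d\n",
              QkvToContext(p, /*stream=*/nullptr) == cudaSuccess);
  return 0;
}
```

Keeping QkvToContext as a router and isolating each fused/unfused path behind its own function is what makes the stated goal, easier maintenance, concrete: each path can then be edited, tested, or moved to its own file independently.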